Regarding using GPUs in PCIe x8 versus x16 slots, here are some numbers for interest's sake for those who like reading my essays.
Please check my math and numbers as I quickly threw this post online.
The PCIe 3.0 per-lane specification is 8 gigatransfers per second (GT/s), which, after 128b/130b encoding overhead, equates to a usable data transfer rate of roughly 985 MB per second per lane.
A PCIe 3.0 slot running with 8 Lanes (x8) would therefore be able to transfer 8 x 985 = 7880 MB per second (7.88 GB/s).
A PCIe 3.0 slot running with 16 Lanes (x16) would therefore be able to transfer 16 x 985 = 15760 MB per second (15.76 GB/s).
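As a quick sanity check on the per-lane arithmetic, here is a minimal Python sketch (the 985 MB/s per-lane figure is the PCIe 3.0 number used above):

```python
# PCIe 3.0: 8 GT/s per lane; after 128b/130b encoding, ~985 MB/s usable per lane
PER_LANE_MBPS = 985

def slot_bandwidth_mbps(lanes):
    """Approximate usable PCIe 3.0 bandwidth for a slot with the given lane count."""
    return lanes * PER_LANE_MBPS

print(slot_bandwidth_mbps(8))   # x8  -> 7880 MB/s (7.88 GB/s)
print(slot_bandwidth_mbps(16))  # x16 -> 15760 MB/s (15.76 GB/s)
```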
The NVIDIA GTX 980 is rated at 224 GB/s memory bandwidth.
The AMD R9 290 is rated at 320 GB/s memory bandwidth.
Both are substantially faster than PCIe 3.0 bandwidth.
A single 1920x1080 128-bit RGBA frame data buffer is ((2,073,600 * 4) * 4) = 33,177,600 bytes (~32MB).
Note that this example is assuming a 32-bit floating point value for each color component per pixel.
I am not privy to how BMD stores the raw uncompressed frame data in memory.
At 30 frames per second that is (33,177,600 * 30) = 995,328,000 bytes (~995MB) per second.
And each frame is transferred twice over the PCIe bus: first from main memory to the GPU, then, after the Compute kernel processes it, back from the GPU to main memory. So we multiply the data transfer by 2.
So that is (995,328,000 * 2) = 1,990,656,000 bytes (~2GB) per second.
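The frame buffer arithmetic above can be checked with a few lines of Python (assuming, as noted, 4 color components at 32-bit float per pixel):

```python
# 1920x1080 frame, RGBA, 32-bit float per component = 16 bytes per pixel
WIDTH, HEIGHT = 1920, 1080
COMPONENTS = 4            # R, G, B, A
BYTES_PER_COMPONENT = 4   # 32-bit float

frame_bytes = WIDTH * HEIGHT * COMPONENTS * BYTES_PER_COMPONENT
per_second_bytes = frame_bytes * 30   # 30 frames per second
bus_bytes = per_second_bytes * 2      # each frame crosses the bus twice

print(frame_bytes)       # 33177600   (~32 MB per frame)
print(per_second_bytes)  # 995328000  (~995 MB per second)
print(bus_bytes)         # 1990656000 (~2 GB per second over the bus)
```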
Since 8 Lanes can transfer 7.88GB per second, and the ballpark data transfer as proposed above for HD 128-bit 30fps is ~2GB per second, it is easy to see that a PCIe 3.0 x8 slot can move that amount of data without breaking a sweat.
However, it isn't totally that simple (few things in life are).
Since x16 has twice the total bandwidth of x8, it also has twice the burst bandwidth if we measured the possible data transfer during a specific time-frame in milliseconds.
"Burst", with respect to data transfer, is a high-bandwidth transfer over a short period of time.
This is relevant because the frame data compute process typically occurs in five stages:
1. CPU preparation of frame data (pre-processing, initiating transfer setup, etc.).
2. Frame data bus transfer from main memory to GPU. <- PCIe performance related.
3. Compute kernel processing of frame data on GPU cores.
4. Frame data bus transfer from GPU to main memory. <- PCIe performance related.
5. CPU usage of frame data (post-processing, etc.).
Note that the time spent in each stage is not equal, i.e. not 20% per stage.
The number of milliseconds spent in bus transfer will be reasonably constant for the same frame size, while the number of milliseconds spent in Compute will vary with the assigned effects, etc.
What I am trying to show is that the frame data transfer between main memory and GPU is not a steady, continual process where the data is metered out consistently and evenly over the span of every second.
Instead, the data transfer typically occurs as two burst transfers per frame: CPU to GPU and GPU to CPU.
This matters because, as the burst transfer speed of the PCIe bus decreases, any "jitter" or "lag" in the frame compute cycle increases when measured against a constant metric such as frames-per-second.
Conversely, as the bus burst speed increases, any "jitter" or "lag" decreases.
If we measure the bus transfer portion as a specific number of milliseconds, and compare it to the number of milliseconds between frames at 30 frames-per-second, then whenever a bus transfer pushes a frame's processing beyond its per-frame time, the result is a frames-per-second flutter.
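To illustrate the flutter effect, here is a small hypothetical simulation. The compute and bus times are invented purely for illustration; only the ~33.33 ms frame budget comes from the 30 fps figure above:

```python
import random

FRAME_BUDGET_MS = 1000 / 30   # ~33.33 ms between frames at 30 fps

random.seed(1)                # fixed seed so the run is repeatable
overruns = 0
for _ in range(300):
    compute_ms = 27.0                    # hypothetical CPU + GPU time per frame
    bus_ms = random.uniform(4.0, 8.0)    # hypothetical bus time, both directions
    if compute_ms + bus_ms > FRAME_BUDGET_MS:
        overruns += 1                    # frame missed its slot -> visible flutter

print(overruns, "of 300 frames overran their per-frame time")
```

With a slower (x8) bus the `bus_ms` range shifts upward and more frames overrun; with a faster (x16) bus it shifts downward and fewer do.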
So, a person may not see any really distinct and noticeable visual difference between PCIe slots running at x8 or x16. But as they increase the video frame rate, which increases the number of data buffer transfers per second, and/or increase the video frame resolution, which increases the size of each data buffer transferred, jitter will usually increase, which is seen as a fluttering in frames-per-second throughput.
Having a computer system where every GPU is in an x16 slot simply grants you the performance overhead for a more stable and consistent frames-per-second data transfer, plus the headroom to allow future increases in frame rate or frame resolution.
So regarding x8 versus x16: if the amount of time spent in bus transfers is halved, that can reduce any apparent frames-per-second jitter and fluttering, and it also leaves more time available for compute functions.
Regarding benchmarking x8 versus x16 for Compute/CUDA use, I have not found any good comparative tests online that are relevant for video frame editing or bi-directional data buffer cases.
There are numerous articles on x8-vs-x16 related to video game rendering, but this is not relevant for video editing Compute/CUDA, since video game rendering is mainly uni-directional.
Even the CUDA-Z tool AFAIK does not test bi-directional performance, and is limited to integer, floating point, and bandwidth tests only.
I could write a sample Compute application to test this, to some degree, but it would not be entirely relevant to the performance of other software such as Resolve, since I do not know how they are managing memory or the code in their compute kernels.
FYI: I also write 3D software tools for video game developers and VFX use, so I deal with CPU-GPU data transfer requirements all of the time.
Regarding any "bottleneck" with Resolve and GPUs.
When processing frames, especially if there is a lot of math processing (filters, denoise, blur, etc.), probably 75%-90% of the time is spent on the CPU and GPU, and 10%-25% on bus transfers.
So a faster CPU and a faster GPU or multiple GPUs with more Compute Cores should make a larger difference than whether the video card(s) are in an x8 or x16 slot.
*edit*
Here is a quick explanation using visual ASCII art.
This is for example only and any values are not meant to be taken literally.
To achieve 30 frames per second, each frame must be fully processed and sent to the final output display within approximately 33 milliseconds (1000 / 30 = 33.333...).
This example is a simplistic view of what may occur within one frame's time duration.
The values chosen are strictly for example purposes and may not reflect actual circumstances.
c = CPU pre-processing of data
b = bus transfer of frame data from main memory to gpu
g = GPU processing of data
B = bus transfer of frame data from gpu to main memory
C = CPU post-processing of data
i = idle time, occurs when the frame process time is faster than 33 milliseconds, assuming framerate syncing is enabled
This is an example timing of a frame process cycle when the GPU is in an x16 slot.
- Code: Select all
frame |<-- 33 milliseconds per frame -->| frame
-----------------------------------------------
......|ccccccbbgggggggggggggggBBCCCCCCii|......
-----------------------------------------------
Now plug the GPU into an x8 slot, and assume the bus transfer time doubles, since x8 has half the bandwidth of the x16 slot. Note that the bus times b and B are twice as long here.
- Code: Select all
frame |<-- 33 milliseconds per frame -->| frame
-----------------------------------------------
......|ccccccbbbbgggggggggggggggBBBBCCCCCC|....
-----------------------------------------------
We can see that the frame process cycle requires more than 33 milliseconds to complete.
If we were imposing a sync timing lock for 30 fps, the frame misses its 33 ms slot and must wait for the next one, so the rendered framerate for this frame would drop in half, from 30 to 15 frames per second.
If the sync is free-running, the framerate would simply be less than 30 (35 ms per frame, or 28.57 fps, in this example).
If the frame process cycle time varies by a few milliseconds for each frame, which it typically will, we will see fluctuating playback framerates.
From this it is also easy to see where increasing the CPU performance or the GPU performance will decrease their time duration in the frame processing cycle, and also improve framerate.
And it is easy to see where the overall performance of the software is reliant on multiple components in the computer system.
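The two timelines above can also be expressed numerically. The per-stage millisecond values below are hypothetical, chosen only to roughly match the ASCII diagrams:

```python
FRAME_BUDGET_MS = 1000 / 30  # ~33.33 ms per frame at 30 fps

def frame_cycle_ms(cpu_pre, bus_to_gpu, gpu, bus_to_cpu, cpu_post):
    """Total time for one frame process cycle, in milliseconds."""
    return cpu_pre + bus_to_gpu + gpu + bus_to_cpu + cpu_post

x16_ms = frame_cycle_ms(6, 2, 15, 2, 6)  # x16 slot: 31 ms, fits the budget
x8_ms = frame_cycle_ms(6, 4, 15, 4, 6)   # x8 slot: bus times doubled, 35 ms

print(x16_ms <= FRAME_BUDGET_MS)   # True  -> some idle time remains
print(x8_ms <= FRAME_BUDGET_MS)    # False -> frame overruns its slot
print(round(1000 / x8_ms, 2))      # 28.57 fps if free-running
```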
David R. Green - Demenzun Media Inc. - Author Composer Filmmaker Programmer