Multiple GPU performance question


Irakly Shanidze

  • Posts: 92
  • Joined: Fri May 26, 2017 8:56 pm
  • Location: USA

Multiple GPU performance question

PostWed Apr 17, 2019 3:45 am

Today I was working on a 2-minute video consisting of a 50/50 mix of ProRes HQ footage from a BM Video Assist 4K and H.264 iPhone X clips. Resolve was configured to use all three GPUs with auto selection of the compute engine. The first time, it took 26:29 to render, which was truly unexpected given my hardware setup. While it was doing its thing, I entertained myself by watching the Activity Monitor GPU window. It was surprising to see that both FirePro D700 units were working their asses off, while the Radeon Pro Vega 64 was clearly slacking with a load of less than 50%.

Just to see what happens, I deselected both D700s and manually selected Metal. This time, the very same video took 12:41 to render, which was consistent with the amount of noise reduction applied.
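
For scale, the two timings above can be put side by side with a quick arithmetic sketch. The helper name `to_seconds` is made up for illustration; the only inputs are the two render times quoted in this post.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'minutes:seconds' string such as '26:29' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


all_three_gpus = to_seconds("26:29")  # all three GPUs, auto compute engine
vega_only = to_seconds("12:41")       # D700s deselected, Metal selected

speedup = all_three_gpus / vega_only         # how many times faster
time_saved = 1 - vega_only / all_three_gpus  # fraction of render time saved

print(f"{all_three_gpus} s vs {vega_only} s")
print(f"speedup: {speedup:.2f}x, time saved: {time_saved:.0%}")
```

In other words, the Vega-only render was a little over twice as fast, or roughly half the wall-clock time.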

So, why am I telling you all this? Until tonight I was a true believer that three GPUs were better than one, and while the most powerful unit would do the heavy lifting, its weaker brothers would be just helping here and there. Apparently, I was wrong. Or, perhaps, I do not understand how to configure the system correctly...

Did anyone experience the same thing? If not, what am I doing wrong?
Leica SL/BM Video Assist 4K, Panasonic Lumix DC-S1
DaVinci Resolve Studio 18.6
Mac Studio M1 • 32GB RAM

Michael_Andreas

  • Posts: 1672
  • Joined: Sat Jan 05, 2019 9:40 pm
  • Real Name: Michael Andreas

Re: Multiple GPU performance question

PostWed Apr 17, 2019 4:15 am

I believe that DR will not use GPUs from different manufacturers simultaneously. Not familiar with those models, is that the situation here?
_________________________________________________
DR Studio 17.4.1 Win10Pro 21H1/19043.1320 - i7-6700K@4GHz, 32GB RAM
RTX 2070 8GB, "Studio" driver 472.39
OS,Library: 1TB SSD - Project: 1TB SSD - Cache: 1TB NVMe

Irakly Shanidze

  • Posts: 92
  • Joined: Fri May 26, 2017 8:56 pm
  • Location: USA

Re: Multiple GPU performance question

PostWed Apr 17, 2019 6:10 am

Michael_Andreas wrote:I believe that DR will not use GPUs from different manufacturers simultaneously. Not familiar with those models, is that the situation here?


AMD Radeon Pro Vega 64 and AMD FirePro D700 are apparently made by the same people and all supported natively by OS X.

Just to be clear, all three GPUs ARE working simultaneously, but Resolve puts more load on the slower D700s, and both of them are maxed out, while the Vega 64 is working at half of its capacity.

Irakly Shanidze

  • Posts: 92
  • Joined: Fri May 26, 2017 8:56 pm
  • Location: USA

Re: Multiple GPU performance question

PostWed Apr 17, 2019 6:15 am

Irakly Shanidze wrote:
Michael_Andreas wrote:I believe that DR will not use GPUs from different manufacturers simultaneously. Not familiar with those models, is that the situation here?


AMD Radeon Pro Vega 64 and AMD FirePro D700 are apparently made by the same people and all supported natively by OS X.

Just to give you the scale of things, two D700 GPUs perform similarly to an Nvidia 1070, while the Vega 64 is approximately 30% better.

Just to be clear, all three GPUs ARE working simultaneously, but Resolve puts more load on the slower D700s, and both of them are maxed out, while the Vega 64 is working at half of its capacity.

Frank Glencairn

  • Posts: 1801
  • Joined: Wed Aug 22, 2012 7:07 am
  • Location: Germany

Re: Multiple GPU performance question

PostWed Apr 17, 2019 6:18 am

Here is an eye-opening read for you

https://www.pugetsystems.com/labs/artic ... n-Xp-1060/
http://frankglencairn.wordpress.com/

I told you so :-)

Hendrik Proosa

  • Posts: 3015
  • Joined: Wed Aug 22, 2012 6:53 am
  • Location: Estonia

Re: Multiple GPU performance question

PostWed Apr 17, 2019 6:28 am

My guess is that each GPU is served frames in sequential order (with three GPUs: f1 > gpu1, f2 > gpu2, f3 > gpu3, f4 > gpu1...) and in your case one of them finishes its work faster and for some reason sits idle until it gets another frame to crunch on. It should get a new frame as soon as it finishes though, so one thing to try is to render to a format that is not sequential in its data layout, for example an image sequence. Then frame 4, for example, could be written while frames 2 and 3 are still being processed, which cannot happen with video files.
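
The guess above can be illustrated with a toy simulation. This is only a sketch under stated assumptions: the per-frame times (1.0 s for each D700, 0.25 s for the Vega 64) are invented for illustration, not measured, and nothing here is known to reflect how Resolve actually schedules frames. It compares strict round-robin dispatch with a scheme where each GPU pulls the next frame as soon as it is free.

```python
import heapq


def round_robin_time(frame_count, frame_times):
    """Frame k always goes to GPU k % n, whether or not that GPU is free."""
    busy = [0.0] * len(frame_times)
    for frame in range(frame_count):
        gpu = frame % len(frame_times)
        busy[gpu] += frame_times[gpu]
    # The render is done when the most heavily loaded GPU is done.
    return max(busy)


def pull_queue_time(frame_count, frame_times):
    """Each GPU takes the next frame as soon as it finishes the previous one."""
    # Heap of (time at which the GPU becomes free, GPU index).
    free_at = [(0.0, gpu) for gpu in range(len(frame_times))]
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(frame_count):
        start, gpu = heapq.heappop(free_at)
        done = start + frame_times[gpu]
        finish = max(finish, done)
        heapq.heappush(free_at, (done, gpu))
    return finish


# Hypothetical seconds per frame: two slow cards and one fast card.
times = [1.0, 1.0, 0.25]
frames = 120

print("round robin, 3 GPUs:", round_robin_time(frames, times))
print("pull queue, 3 GPUs: ", pull_queue_time(frames, times))
print("fast GPU alone:     ", round_robin_time(frames, [0.25]))
```

With these made-up numbers, round-robin over all three cards ends up slower than the fast card on its own, because the fast card spends most of its time waiting for its turn. That matches the symptom described in the first post, but it does not prove that this is the cause.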
I do stuff.

Andrew Kolakowski

  • Posts: 9209
  • Joined: Tue Sep 11, 2012 10:20 am
  • Location: Poland

Re: Multiple GPU performance question

PostWed Apr 17, 2019 8:04 am

For reading, nothing stops you from reading a video file the same way as an image sequence, especially when the format is I-frame based. The fact that it's in a container is not a problem: you can still access many frames up front. Most apps cache frames into RAM, which is actually good for processing, playback, etc.
Even for writing, a good app can cache frames in RAM and then write them in chunks of more than one frame. Processing is the slow part; writing is a relatively fast process.
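
As an illustration of the caching idea, here is a minimal sketch of a reorder buffer: frames may finish processing in any order, are held in RAM, and are released to the writer strictly in frame order. The function name and the sample data are hypothetical, and whether Resolve does anything like this is not known from this thread.

```python
def write_in_order(completed):
    """Yield (index, data) in frame order from frames that finished in any order.

    `completed` is an iterable of (frame_index, frame_data) pairs in the order
    the frames finished processing. Frames that arrive early wait in `pending`.
    """
    pending = {}
    next_index = 0
    for index, data in completed:
        pending[index] = data
        # Flush every frame that is now contiguous with what was already written.
        while next_index in pending:
            yield next_index, pending.pop(next_index)
            next_index += 1


# Frame 1 finishes before frame 0, and frame 3 before frame 2.
finished = [(1, "b"), (0, "a"), (3, "d"), (2, "c")]
for index, data in write_in_order(finished):
    print(index, data)
```

The cost of this approach is memory: the buffer grows while a slow frame holds up everything behind it.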

Hendrik Proosa

  • Posts: 3015
  • Joined: Wed Aug 22, 2012 6:53 am
  • Location: Estonia

Re: Multiple GPU performance question

PostWed Apr 17, 2019 8:21 am

Andrew Kolakowski wrote:Even for writing good app can cache frames into RAM and then write them at bigger than 1 chunks. Processing is the slow part, writing is relatively fast process.

You can, but does it? Reading is sequential anyway, ripping random frames out of order for rendering is unnecessary and gives no benefit. Distributing them and processing what comes back is what makes the difference.

Sam Steti

  • Posts: 2470
  • Joined: Tue Jun 17, 2014 7:29 am
  • Location: France

Re: Multiple GPU performance question

PostWed Apr 17, 2019 8:57 am

Irakly, basically if you want to wipe out a few questions in your mind, first try to put GPUs with exactly the same amount of VRAM...
Then, remember that the dual GPUs of the MP 6.1 (the 2013 Mac Pro) are "seen" as one (this is, by the way, the only computer that can have both of its GPUs working in the free version of Resolve, for this reason), so try OpenCL and Metal with different combinations.
Finally, also try to dedicate a GPU to GUI only, you may be surprised too...

For the record, BMD has stopped reckoning multiple GPUs these last months, but I know it works well with 2 if you have the same amount (no connection with the brand as I read). For 3, I don't know except for external disclosures...
*MacMini M1 16 Go - Ext nvme SSDs on TB3 - 14 To HD in 2 x 4 disks USB3 towers
*Legacy MacPro 8core Xeons, 32 Go ram, 2 x gtx 980 ti, 3SSDs including RAID
*Resolve Studio everywhere, Fusion Studio too
*https://www.buymeacoffee.com/videorhin

MishaEngel

  • Posts: 1432
  • Joined: Wed Aug 29, 2018 12:18 am
  • Real Name: Misha Engel

Re: Multiple GPU performance question

PostWed Apr 17, 2019 1:55 pm

Don't use the D700 GPUs; turn them off in Resolve and use only the Vega 64. It's super fast.

Irakly Shanidze

  • Posts: 92
  • Joined: Fri May 26, 2017 8:56 pm
  • Location: USA

Re: Multiple GPU performance question

PostWed Apr 17, 2019 4:36 pm

MishaEngel wrote:Don't use the D700 GPU's turn them off in Resolve, only use the VEGA 64, it's super fast.


I already did. The test was more than convincing. My guess is that the GPUs are not used in parallel, and the D700 pair becomes a bottleneck.

Irakly Shanidze

  • Posts: 92
  • Joined: Fri May 26, 2017 8:56 pm
  • Location: USA

Re: Multiple GPU performance question

PostWed Apr 17, 2019 4:44 pm

Sam Steti wrote:Irakly, basically if you want to wipe out a few questions in your mind, first try to put GPUs with exactly the same amount of VRAM...
Then, remember that dual GPUs of MP 6.1 are "seen" as one (this is btw the only computer that can have its 2 GPUs working in Resolve free version for this reason), so try OpenCL and Metal with different combinations.
Finally, also try to dedicate a GPU to GUI only, you may be surprised too...

For the record, BMD has stopped reckoning multiple GPUs these last months, but I know it works well with 2 if you have the same amount (no connection with the brand as I read). For 3, I don't know except for external disclosures...


Sam, I am not sure if this is entirely true. I tested different combinations, and it is quite possible to have only one D700 do the work, at least according to Activity Monitor, and the impact on performance is quite noticeable.

As testing Resolve performance with different GPU setups is not my full-time job, I'm not going to downgrade to High Sierra and replace the Vega 64 with an Nvidia 980 Ti just out of morbid curiosity. My goal is to find the optimal configuration with what I have now. At this point it is clear that excluding the onboard GPUs roughly doubles rendering speed (26:29 down to 12:41). It would have been nice to squeeze just a little bit more blood from this stone, but if it is not doable, so be it.

Irakly Shanidze

  • Posts: 92
  • Joined: Fri May 26, 2017 8:56 pm
  • Location: USA

Re: Multiple GPU performance question

PostWed Apr 17, 2019 4:50 pm

Frank Glencairn wrote:Here is some eye-opening read for you

https://www.pugetsystems.com/labs/artic ... n-Xp-1060/


Thank you Frank, very interesting read :)
I'm aware that when working with compressed formats like H.264, a faster CPU makes more of a difference than GPU power. With identical GPUs, the performance drop they saw with four Titan X cards was most likely due to more CPU power being spent on coordinating data streams for four graphics cards than was gained by adding the fourth unit. It gave me the idea that the same thing was probably happening in my case.

Sam Steti

  • Posts: 2470
  • Joined: Tue Jun 17, 2014 7:29 am
  • Location: France

Re: Multiple GPU performance question

PostWed Apr 17, 2019 6:24 pm

Irakly Shanidze wrote:
Sam Steti wrote:Irakly, basically if you want to wipe out a few questions in your mind, first try to put GPUs with exactly the same amount of VRAM...
Then, remember that dual GPUs of MP 6.1 are "seen" as one (this is btw the only computer that can have its 2 GPUs working in Resolve free version for this reason), so try OpenCL and Metal with different combinations.
Finally, also try to dedicate a GPU to GUI only, you may be surprised too...

For the record, BMD has stopped reckoning multiple GPUs these last months, but I know it works well with 2 if you have the same amount (no connection with the brand as I read). For 3, I don't know except for external disclosures...


Sam, I am not sure if this is entirely true. I tested different combinations, and it is quite possible to have only one D700 do the work, at least according to the Activity Monitor, and the impact on performance is quite noticeable.
I know, I know... That's not what's at stake

As testing Resolve performance with different GPU setups is not my full-time job, I'm not going to downgrade to High Sierra and replace Vega 64 with Nvidia 980Ti just out of the morbid curiosity.
Did anyone suggest anything like that? Did you read my specs as a recommendation?
Hey man, didn't want to bother you, you will find your answers anyway, even here in Resolve's forum; therefore no problem, your investigation will surely end positively. I'm confident and wish you the best.

Jed Mitchell

  • Posts: 165
  • Joined: Tue Nov 03, 2015 11:04 pm
  • Location: New York, NY

Re: Multiple GPU performance question

PostWed Apr 17, 2019 8:31 pm

I've been doing a lot of testing of different GPU combinations lately to find out where the sweet spot is for my work, and the best setup advice I can offer is... it depends.

The cards I've tested recently, in different combinations both inside the case & externally on Macs:

+ D700s
+ Vega 64 FE
+ Vega 64 stock
+ GTX 980
+ GTX 1080 Ti
+ RTX 2080 Ti
+ RTX Titan


This is all with a focus on online editing, beauty and other finishing work so my testing suite is generally pretty demanding and doesn't use legacy benchmarks like the popular "standard candle" from LGG or the simple tests Puget runs. I love those guys but I think those tests are pretty misleading for all but the lightest workloads.

I also break up my tests into playback speed, processing speed and VRAM bottlenecks (and then combine them). The same hardware won't solve problems to the same degree in all 3 categories.

Here are a couple general things I've noticed:

+ Image processing doesn't seem to be all linear *or* all parallel -- it's a mix of operations depending on the codecs and requested operations. You shouldn't expect performance of one operation to correlate directly to another, generally you've got to test specifically what you want to do.

+ Playback performance comes down to the codec more than the card, so in the case of DNx the reads are entirely sequential and the codec doesn't create a bottleneck so performance scales linearly. In the case of R3Ds, the compression ratio and debayer settings drastically change the performance of other operations, I think partially because R3Ds are temporally encoded and so there is some latency between processing the current "frame" and requesting the next frame for decompression.

+ For R3D playback, more GPUs don't scale up as quickly as faster single GPUs, and all of them hit a decompression bottleneck on the CPU before I can find a meaningful debayer performance ceiling in a UHD timeline (the largest playback resolution I care about).

+ Heterogeneous multi-GPU setups will synchronize to the lowest amount of VRAM, at what appears to be the slowest VRAM speed. Thus, VRAM limited tasks like temporal noise reduction at high resolutions will bottleneck at the lowest common denominator.

+ The above being said, there doesn't seem to be a performance "falloff" related to VRAM quantity -- there's just a hard cliff where you run into the "GPU memory full" errors. Up to that cliff it's all a level playing field. The 11GB in the X080 Ti cards is just baaarely enough for retouching in UHD. Get more if you can afford it.

+ VRAM speed *does* seem to make a difference for image processing tasks.

+ Higher clocked cores & VRAM seem to scale faster for image processing than total number of cores.

+ More powerful single GPUs scale less quickly than multiple homogeneous GPUs of lower individual power for image processing: 2x 1080 Ti's are much more powerful than 1x RTX Titan.

+ Heterogeneous multi-GPU setups do not scale as well as homogeneous multi-GPU setups for image processing: 2x 1080 Tis are faster in most cases than a 1080 Ti + RTX Titan.

+ Using display GPU for compute doesn't have any impact on any of these cards.

+ Combining internal GPUs with eGPUs... works, but is moderately less stable on the iMac Pros we've tried to use them on for extended periods. These are in Sonnet cases, not BMD eGPUs, so take that for whatever it is.


Your mileage may vary but I'd recommend sticking to what BMD has been saying for a while: either get 2-3 perfectly matched GPUs or a single powerful GPU. Don't mix & match and don't bother running a separate card for the GUI.
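
The VRAM observation in the list above amounts to a lowest-common-denominator rule. Here is a minimal sketch, assuming the rule holds exactly as described (it is an observation from testing, not documented Resolve behaviour). The function name is hypothetical; the 11 GB figure comes from the post, and 24 GB is the published VRAM of the RTX Titan.

```python
def shared_vram_budget_gb(vram_per_card_gb):
    """Usable VRAM per card if a mixed setup syncs to its smallest card."""
    if not vram_per_card_gb:
        raise ValueError("need at least one GPU")
    return min(vram_per_card_gb)


# Matched pair: nothing is lost.
print(shared_vram_budget_gb([11, 11]))  # 2x 1080 Ti

# Mixed pair: the larger card is limited to the smaller card's capacity.
print(shared_vram_budget_gb([11, 24]))  # 1080 Ti + RTX Titan
```

Under this rule, adding a smaller card to a larger one lowers the point at which "GPU memory full" errors appear for VRAM-heavy tasks such as temporal noise reduction.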
"It's amazing what you can do when you don't know you can't do it."


Systems:
R16.2.3 | Win10 | i9 7940X | 128GB RAM | 1x RTX Titan | 960Pro cache disk
R16.2.3 | Win10 | i9 7940X | 128GB RAM | 1x 2080 Ti | 660p cache disk

Irakly Shanidze

  • Posts: 92
  • Joined: Fri May 26, 2017 8:56 pm
  • Location: USA

Re: Multiple GPU performance question

PostWed Apr 17, 2019 9:27 pm

Thank you Jed! This is excellent. That pretty much sums everything up in a clear and persuasive fashion.

I have only one relevant question left: do I understand correctly that a 6-core 3.5GHz CPU does a better job in Resolve than the 10-core 3.0GHz? I am just wondering whether I should upgrade, or spend $250 on something more useful like a new tennis racket.

Just out of curiosity, how did Vega 64 do against the 2080 Ti?

The only advice of yours that I won't follow is not to run a monitor on a separate GPU. My Trashcan is sitting right next to the main display, so plugging it into an onboard Thunderbolt port is more a matter of convenience. It seems like the decision about which GPU the monitor is connected to is inconsequential for system performance.


Jed Mitchell

  • Posts: 165
  • Joined: Tue Nov 03, 2015 11:04 pm
  • Location: New York, NY

Re: Multiple GPU performance question

PostWed Apr 17, 2019 10:08 pm

Irakly Shanidze wrote:I have only one relevant question left: do I understand correctly that 6-core 3.5GHz CPU does a better job in Resolve than the 10-core 3.0GHz?


Unfortunately the answer still seems to be... it depends!

I had an old 4 core i7 (I mean like ~2011 era) in a machine until last year paired with 2x 980s and it was absolutely fine compared to maxed-out Trashcans... except where R3Ds came into play. With R3Ds the bottleneck is as much at the CPU as it is the GPU -- this is probably true of some other codecs like Cineform & H264 / H265 but I never work with those so I can't really say.

There is a wiiide bell curve in Resolve that I think tops out somewhere between 12-18 physical cores, but in general for compressed RAW decoding the extra cores are worth the money. For uncompressed or I-frame-only codecs it's not as relevant.

If you do other tasks on the same system like 3D or compositing the whole equation changes, of course.

Irakly Shanidze wrote:Just out of curiosity, how did Vega 64 do against the 2080 Ti?


Stock Vega 64 traded blows with a single 1080 Ti, coming out maybe 5% behind. The Vega 64 FE was closer: behind in some tests, ahead in others. The single 2080 Ti has been ~20% faster than the single 1080 Ti, actually closer in performance to 2x 1080 Ti and dead even with a single RTX Titan.

It's pricey but I have to say it's worth it (I don't have a Radeon VII to test though, which I suspect is an even better deal for only a small performance hit).

Irakly Shanidze wrote:The only advice of yours that I won't follow is not to run a monitor on a separate GPU.


Totally -- wasn't saying *not* to do it, there's just nothing to gain from that setting (that I've been able to measure).
"It's amazing what you can do when you don't know you can't do it."



MishaEngel

  • Posts: 1432
  • Joined: Wed Aug 29, 2018 12:18 am
  • Real Name: Misha Engel

Re: Multiple GPU performance question

PostWed Apr 17, 2019 10:13 pm

Irakly Shanidze wrote:Just out of curiosity, how did Vega 64 do against the 2080 Ti?


Vega 64 is supported by up-to-date macOS versions; Nvidia cards are not.

https://forum.blackmagicdesign.com/viewtopic.php?f=21&t=88238


Under Windows 10 you can see the performance here:

https://www.pugetsystems.com/labs/articles/DaVinci-Resolve-15-AMD-Radeon-VII-16GB-Performance-1382/

RX Vega 64 is doing pretty good for the price point.

Irakly Shanidze

  • Posts: 92
  • Joined: Fri May 26, 2017 8:56 pm
  • Location: USA

Re: Multiple GPU performance question

PostThu Apr 18, 2019 3:54 am

The whole reason I started using the Trashcan for video editing in the first place was that Apple made all of us face the hard choice between Nvidia and Mojave (and most likely anything thereafter). I was perfectly happy with a twin 980 Ti setup, and the idea of using an eGPU did not seem at all appealing, mostly because I thought that TB2 wouldn't be able to handle the 4K exchange with the Vega 64. Now, however, I can see that it ended up being a much better idea.


MishaEngel

  • Posts: 1432
  • Joined: Wed Aug 29, 2018 12:18 am
  • Real Name: Misha Engel

Re: Multiple GPU performance question

PostThu Apr 18, 2019 2:38 pm

The Trashcan has 2x D700 GPUs, which are made by AMD. They have 6 GB of VRAM each, with a memory bandwidth of 263 GB/s and a peak fp32 performance of 3.482 Tflops.

The RX Vega 64 has 8 GB of VRAM, with a memory bandwidth of 484 GB/s and a peak fp32 performance of 12.665 Tflops.

The advantage of using the Vega over the D700s is that you don't overheat your Trashcan, and it's a hell of a lot faster. Your Thunderbolt connection will be your bottleneck.
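
To make the comparison concrete, here is a small arithmetic sketch using only the figures quoted above. The variable names are made up for illustration, and peak fp32 and memory bandwidth are theoretical ceilings, not Resolve render throughput.

```python
# Per-card figures as quoted in this post.
d700 = {"vram_gb": 6, "bandwidth_gb_s": 263, "fp32_tflops": 3.482}
vega64 = {"vram_gb": 8, "bandwidth_gb_s": 484, "fp32_tflops": 12.665}

# Best case for the pair of D700s: perfect scaling across both cards.
d700_pair_tflops = 2 * d700["fp32_tflops"]

fp32_ratio_vs_pair = vega64["fp32_tflops"] / d700_pair_tflops
fp32_ratio_vs_one = vega64["fp32_tflops"] / d700["fp32_tflops"]
bandwidth_ratio = vega64["bandwidth_gb_s"] / d700["bandwidth_gb_s"]

print(f"D700 pair, peak fp32: {d700_pair_tflops:.3f} Tflops")
print(f"Vega 64 vs D700 pair (fp32): {fp32_ratio_vs_pair:.2f}x")
print(f"Vega 64 vs one D700 (fp32):  {fp32_ratio_vs_one:.2f}x")
print(f"Vega 64 vs one D700 (memory bandwidth): {bandwidth_ratio:.2f}x")
```

Even with perfect scaling, the two D700s together stay well below a single Vega 64 on paper.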

Irakly Shanidze

  • Posts: 92
  • Joined: Fri May 26, 2017 8:56 pm
  • Location: USA

Re: Multiple GPU performance question

PostThu Apr 18, 2019 4:58 pm

MishaEngel wrote:The trash can has 2x D700 GPU's, those are made by AMD. They have 6 GB VRAM each with a memory bandwidth of 263 GB/s and a peak fp32 performance of 3.482 Tflops.

The RX VEGA 64 has 8 GB VRAM with a memory bandwidth of 484 GB/s and a peak fp32 performance of 12.665 Tflops.

The advantage of using the VEGA over the D700's is that you don't overheat your trashcan and it's a hell of a lot faster. Your Thunderbolt connection will be your bottle neck.


I'm going to test TB2 against TB3 again using a 2017 MacBook Pro. The first time I tried it without disabling the onboard GPU for compute, and Resolve performance on the MacBook Pro was about 30% worse. Then I ran Geekbench 4 on both machines, and the Trashcan scored 165K vs 140K for the MacBook Pro. This result was puzzling, as possible thermal throttling of the MBP should not have been an issue, since in both cases the GPU was external.

Andrew Kolakowski

  • Posts: 9209
  • Joined: Tue Sep 11, 2012 10:20 am
  • Location: Poland

Re: Multiple GPU performance question

PostThu Apr 18, 2019 7:51 pm

It all depends on what you do. If you push data to the GPU and let it process it (as many testing tools do to check pure GPU performance), then TB3 or even TB2 won't matter much (as long as you don't measure the time for pushing data to the GPU). If you have to constantly move data between GPU and CPU, then you hit a bottleneck (for example: read the source video file, decode it on the CPU, push it to the GPU for processing, get it back for export).

Dermot Shane

  • Posts: 2720
  • Joined: Tue Nov 11, 2014 6:48 pm
  • Location: Vancouver, Canada

Re: Multiple GPU performance question

PostThu Apr 18, 2019 8:24 pm

Or in the worst case, moving 11 frames of video for each frame processed (NR at the 5-frame setting) when the source video is 8K... and the timeline raster is UHD.

I'd not want to try that with TB2 + a client needing a master ASAP... but it would be an interesting real-world test.
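
A rough back-of-the-envelope sketch of that worst case follows. Everything beyond the "11 frames per frame processed" figure is an assumption: frames are taken to cross the link uncompressed as RGBA with 32-bit float channels (16 bytes per pixel), all 11 frames are re-sent for every output frame with no caching on the GPU, and the link runs at its nominal rate (20 Gbit/s for Thunderbolt 2, 40 Gbit/s for Thunderbolt 3). Real PCIe throughput over Thunderbolt is lower, so these are optimistic ceilings.

```python
BYTES_PER_PIXEL = 16  # assumption: RGBA, 32-bit float per channel


def max_fps_over_link(width, height, frames_per_output, link_gbit_per_s):
    """Upper bound on output frames per second if the link is the only limit."""
    bytes_per_frame = width * height * BYTES_PER_PIXEL
    bytes_per_output = bytes_per_frame * frames_per_output
    link_bytes_per_s = link_gbit_per_s * 1e9 / 8
    return link_bytes_per_s / bytes_per_output


NR_FRAMES = 11  # from the post: 11 frames moved per frame processed

cases = [
    ("8K over TB2", 7680, 4320, 20),
    ("8K over TB3", 7680, 4320, 40),
    ("UHD over TB2", 3840, 2160, 20),
]
for label, width, height, gbit in cases:
    fps = max_fps_over_link(width, height, NR_FRAMES, gbit)
    print(f"{label}: at most {fps:.2f} fps")
```

Under these pessimistic assumptions the link alone would cap 8K temporal NR at well under one frame per second, which is why caching frames on the GPU side matters so much.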

Irakly Shanidze

  • Posts: 92
  • Joined: Fri May 26, 2017 8:56 pm
  • Location: USA

Re: Multiple GPU performance question

PostThu Apr 18, 2019 10:03 pm

Dermot Shane wrote:or in the worst case, moveing 11 frames of video for each frame processed (NR at 5fr setting) and the source video is 8k.... and the timeline raster is UHD

i'd not want to try that with Tb2 + a client needing a master ASAP... but it would be a interesting real world test


4K master is not a problem. Resolve is stable, not a single sign of going over limits. If you want me to test it with 8K footage, just maybe send me a 2-3 minute clip, and let's see what happens.

plettplett

  • Posts: 11
  • Joined: Fri Feb 28, 2020 1:22 am
  • Real Name: chris spence

Re: Multiple GPU performance question

PostSat Oct 03, 2020 7:13 pm

Would anyone know if 3 Nvidia GPUs in a system (in a non-SLI config) could be accessed by an OFX plugin (Linux or Windows)? If yes, would the Resolve version need to be "Studio"?

I'm trying to find a way to test-run my CUDA OFX plugin concurrently across 3 GPUs (I only have access to 1 GPU).

Thanks,
Chris
