Andrew Kolakowski wrote:Jean Claude wrote:I understand that some users complain about differences in results depending on whether or not they use the Studio version.
Now, you say that CUDA does not know how to do certain things: but if you pay, CUDA and other frameworks can do a lot.
I do not have a very powerful machine (already a year old), yet it runs very fast and I am not complaining. So?
(And concerning GPU programming: it is quite astonishing what one can manage to do.) It is not the public documentation that defines the limits. Are you registered as an NVIDIA developer?
But whether it is BMD or anyone else: do you really believe a software vendor will hand over its trade secrets? Seriously.
Nothing to do with secrets.
It's all about the nature of the processes. A GPU (via CUDA or OpenCL) loves work that can be heavily parallelised across its thousands of processing units. If a process can be split into many independent subprocesses, then the GPU is your friend. The problem is that h264/h265 decoding can't be heavily parallelised, so writing a decoder in CUDA or OpenCL is difficult, and in the end it wouldn't be very fast anyway. This is why today's GPUs have dedicated units just for this, separate from the main processing engine.
This is also the reason why effects like blur are not so fast on the GPU: they can't be fully parallelised, whereas something like a brightness adjustment can be split all the way down to one pixel per GPU thread (since the pixels are totally independent). You can't, for example, decode an h264 stream per pixel, because of its complexity and architecture.
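To make the per-pixel independence point concrete, here is a hypothetical CPU emulation of the GPU model (plain Python, nothing to do with Resolve's actual code): a brightness "kernel" touches only its own pixel, so it could run as one GPU thread per pixel with no coordination at all.

```python
# Hypothetical CPU sketch of the GPU model: a "kernel" is called once
# per pixel, the way one GPU thread per pixel would be.

def brightness_kernel(pixel, gain):
    # Reads and writes only its own pixel value: no neighbour reads,
    # no shared state, so every pixel can be processed in parallel.
    return min(1.0, pixel * gain)

def launch(frame, kernel, *args):
    # Stand-in for launching one GPU thread per pixel.
    return [[kernel(p, *args) for p in row] for row in frame]

frame = [[0.2, 0.4], [0.6, 0.8]]   # tiny one-channel "frame", values 0.0-1.0
out = launch(frame, brightness_kernel, 1.5)
```

A blur, by contrast, would need each kernel invocation to read its neighbours' data, which is exactly why it parallelises less cleanly.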
There are codecs designed for GPUs from the ground up:
https://www.daniel2.com but these are rather intermediate codecs with low complexity. The problem is that decoding them eats into the main GPU processing power, leaving Resolve less to work with.
Andrew,
A few small details (I think you are mixing things up).
- For H264, and long-GOP formats in general, decoding runs either on the CPU (very often) or on the GPU, if the GPU knows how to handle it and the hardware methods/functions are exposed. That is only my view, but nothing prevents using the GPU for specific calculations, as long as a complete frame is correctly reconstructed at the end for further processing, including during long-GOP decoding.
(Not necessarily 100% optimised, but as long as part of it is parallelised, it is time gained.)
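The idea that part of a long-GOP decode can be parallelised even when the whole cannot can be sketched with a toy example (hypothetical, not real h264 internals): the bitstream stage is inherently serial because each step depends on the previous decoder state, but once coefficients are recovered, a per-sample reconstruction step is independent and could run one GPU thread per sample.

```python
# Toy sketch (NOT real h264): a serial stage followed by a
# parallelisable stage.

def serial_entropy_decode(deltas):
    # Each value depends on the previously decoded one, so this
    # stage cannot be split across threads.
    values, state = [], 0
    for d in deltas:
        state += d
        values.append(state)
    return values

def parallel_reconstruct(values, scale):
    # Each output depends only on its own input: one thread per
    # sample would work here.
    return [v * scale for v in values]

coeffs = serial_entropy_decode([3, -1, 2, 2])
samples = parallel_reconstruct(coeffs, 0.5)
```

The overall pipeline is only as parallel as its most serial stage, which is why a pure CUDA/OpenCL decoder gains little; but offloading the independent stages still saves time, as long as a complete frame comes out at the end.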
- For GPU processing, i.e. OFX, to my knowledge (I only know the public part):
At the input of an OFX plugin there is always a complete frame in 32-bit RGBA.
Each pixel (RGBA) is seen as a FLOAT4 (4 floats), each channel addressable from 0.0 to 1.0.
Each frame maps to a matrix of dimensions X, Y: one entry per pixel.
Each pixel can be addressed by its X, Y coordinates.
Since we parallelise over X, Y, we must consolidate at the end to return a complete frame, in the right order, to the processing pipeline.
If a blur (for example) has to be computed, it is up to the programmer to set up a grouping of pixels. There are higher-level parallelisation functions that operate on groups of pixels, for example to compute averages, and these in turn rely on the low-level per-pixel calculations. They are just logical building blocks, assembled according to the programmer's needs.
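A minimal sketch of that model (hypothetical plain Python, not the real OFX API): a frame as an X-by-Y matrix of FLOAT4 pixels in the 0.0-1.0 range, a blur built as an average over a grouping of neighbouring pixels, and the per-pixel results consolidated back into a complete frame in the right order.

```python
# Hypothetical CPU sketch of the frame model described above: one
# frame = an X-by-Y matrix with one FLOAT4 (R, G, B, A) pixel per
# entry, every channel in the range 0.0 to 1.0.

def make_frame(width, height, fill=(0.0, 0.0, 0.0, 1.0)):
    # frame[y][x] addresses one pixel by its X, Y coordinates.
    return [[list(fill) for _ in range(width)] for _ in range(height)]

def blur_pixel(frame, x, y):
    # Blur needs a grouping of neighbouring pixels: average a 3x3
    # neighbourhood per channel. Unlike a pure per-pixel adjustment,
    # this kernel must read other pixels' data.
    h, w = len(frame), len(frame[0])
    group = [frame[ny][nx]
             for ny in range(max(0, y - 1), min(h, y + 2))
             for nx in range(max(0, x - 1), min(w, x + 2))]
    return [sum(p[c] for p in group) / len(group) for c in range(4)]

def blur_frame(frame):
    # Consolidate the per-pixel results back into a complete frame,
    # in the right order, before returning it to the pipeline.
    return [[blur_pixel(frame, x, y) for x in range(len(frame[0]))]
            for y in range(len(frame))]

frame = make_frame(3, 3)
frame[1][1] = [0.9, 0.9, 0.9, 1.0]   # one bright pixel in the centre
out = blur_frame(frame)
```

The bright centre pixel spreads its energy over each 3x3 group it belongs to, which is exactly the "grouping of pixels" the programmer has to arrange on top of the per-pixel primitives.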
Regarding the GPU and H264: no claims should be made without knowing. Concerning 4:2:2 decoding (which is rather where I stepped in): it would not surprise me.
Now think what you like; this is only personal feedback.
:-/