Andrew Kolakowski wrote:Jean Claude wrote:I understand that some users complain about differences in results depending on whether or not they use the Studio version.
Now, you say that CUDA does not know how to do certain things: but if you pay, CUDA and other frameworks can do a lot.
I do not have a very powerful machine (already a year old), yet it runs very fast and I am not complaining. So?
(And concerning GPU programming: it is quite astonishing what one can manage to do.) It is not the public documentation that defines the limits. Are you registered as an NVIDIA developer?
But whether it is BMD or anyone else: do you really believe a software vendor will hand over its trade secrets? Seriously.
Nothing to do with secrets.
It's all about the nature of the processes. A GPU (via CUDA or OpenCL) loves work that can be heavily parallelised across its thousands of processing units. If a process can be split into many independent subprocesses, then the GPU is your friend. The problem is that h264/h265 decoding can't be heavily parallelised, so writing a decoder in CUDA or OpenCL is difficult, and in the end it wouldn't be very fast anyway. This is why today's GPUs have dedicated units just for this, separate from the main processing engine.
This is also the reason why effects like blur are not so fast on the GPU: they can't be fully parallelised, whereas something like a brightness adjustment can be split all the way down to one pixel per GPU thread (since the pixels are totally independent). You can't, for example, decode an h264 stream per pixel, because of its complexity and architecture.
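To make the per-pixel independence point concrete, here is a hypothetical CPU emulation of the GPU model (plain Python, nothing to do with Resolve's actual code): a brightness "kernel" touches only its own pixel, so it could run as one GPU thread per pixel with no coordination at all.

```python
# Hypothetical CPU sketch of the GPU model: a "kernel" is called once
# per pixel, the way one GPU thread per pixel would be.

def brightness_kernel(pixel, gain):
    # Reads and writes only its own pixel value: no neighbour reads,
    # no shared state, so every pixel can be processed in parallel.
    return min(1.0, pixel * gain)

def launch(frame, kernel, *args):
    # Stand-in for launching one GPU thread per pixel.
    return [[kernel(p, *args) for p in row] for row in frame]

frame = [[0.2, 0.4], [0.6, 0.8]]   # tiny one-channel "frame", values 0.0-1.0
out = launch(frame, brightness_kernel, 1.5)
```

A blur, by contrast, would need each kernel invocation to read its neighbours' data, which is exactly why it parallelises less cleanly.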
There are codecs designed for GPUs from the ground up:
https://www.daniel2.com but these are rather intermediate codecs with low complexity. The problem is that decoding them eats into the main GPU processing power, leaving Resolve less to work with.
Andrew,
A few small details (I think you are mixing things up).
- For H264, and long-GOP formats in general, decoding runs either on the CPU (very often) or on the GPU, if the GPU knows how to handle it and the hardware methods/functions are exposed. That is only my view, but nothing prevents using the GPU for specific calculations, as long as a complete frame is correctly reconstructed at the end for further processing, including during long-GOP decoding.
(Not necessarily 100% optimised, but as long as part of it is parallelised, it is time gained.)
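The idea that part of a long-GOP decode can be parallelised even when the whole cannot can be sketched with a toy example (hypothetical, not real h264 internals): the bitstream stage is inherently serial because each step depends on the previous decoder state, but once coefficients are recovered, a per-sample reconstruction step is independent and could run one GPU thread per sample.

```python
# Toy sketch (NOT real h264): a serial stage followed by a
# parallelisable stage.

def serial_entropy_decode(deltas):
    # Each value depends on the previously decoded one, so this
    # stage cannot be split across threads.
    values, state = [], 0
    for d in deltas:
        state += d
        values.append(state)
    return values

def parallel_reconstruct(values, scale):
    # Each output depends only on its own input: one thread per
    # sample would work here.
    return [v * scale for v in values]

coeffs = serial_entropy_decode([3, -1, 2, 2])
samples = parallel_reconstruct(coeffs, 0.5)
```

The overall pipeline is only as parallel as its most serial stage, which is why a pure CUDA/OpenCL decoder gains little; but offloading the independent stages still saves time, as long as a complete frame comes out at the end.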
- For GPU processing, i.e. OFX, to my knowledge (I only know the public part):
At the input of an OFX plugin there is always a complete frame in 32-bit RGBA.
Each pixel (RGBA) is seen as a FLOAT4 (4 floats), each channel addressable from 0.0 to 1.0.
Each frame maps to a matrix of dimensions X, Y: one entry per pixel.
Each pixel can be addressed by its X, Y coordinates.
Since we parallelise over X, Y, we must consolidate at the end to return a complete frame, in the right order, to the processing pipeline.
If a blur (for example) has to be computed, it is up to the programmer to set up a grouping of pixels. There are higher-level parallelisation functions that operate on groups of pixels, for example to compute averages, and these in turn rely on the low-level per-pixel calculations. They are just logical building blocks, assembled according to the programmer's needs.
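A minimal sketch of that model (hypothetical plain Python, not the real OFX API): a frame as an X-by-Y matrix of FLOAT4 pixels in the 0.0-1.0 range, a blur built as an average over a grouping of neighbouring pixels, and the per-pixel results consolidated back into a complete frame in the right order.

```python
# Hypothetical CPU sketch of the frame model described above: one
# frame = an X-by-Y matrix with one FLOAT4 (R, G, B, A) pixel per
# entry, every channel in the range 0.0 to 1.0.

def make_frame(width, height, fill=(0.0, 0.0, 0.0, 1.0)):
    # frame[y][x] addresses one pixel by its X, Y coordinates.
    return [[list(fill) for _ in range(width)] for _ in range(height)]

def blur_pixel(frame, x, y):
    # Blur needs a grouping of neighbouring pixels: average a 3x3
    # neighbourhood per channel. Unlike a pure per-pixel adjustment,
    # this kernel must read other pixels' data.
    h, w = len(frame), len(frame[0])
    group = [frame[ny][nx]
             for ny in range(max(0, y - 1), min(h, y + 2))
             for nx in range(max(0, x - 1), min(w, x + 2))]
    return [sum(p[c] for p in group) / len(group) for c in range(4)]

def blur_frame(frame):
    # Consolidate the per-pixel results back into a complete frame,
    # in the right order, before returning it to the pipeline.
    return [[blur_pixel(frame, x, y) for x in range(len(frame[0]))]
            for y in range(len(frame))]

frame = make_frame(3, 3)
frame[1][1] = [0.9, 0.9, 0.9, 1.0]   # one bright pixel in the centre
out = blur_frame(frame)
```

The bright centre pixel spreads its energy over each 3x3 group it belongs to, which is exactly the "grouping of pixels" the programmer has to arrange on top of the per-pixel primitives.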
Regarding the GPU and H264: no claims should be made without knowing. Concerning 4:2:2 decoding (which is rather where I stepped in): it would not surprise me.
Now think what you like; this is only personal feedback.
:-/