VMFXBV wrote: ...I never understood the obsession Apple people have with ProRes (since the M1). I could edit heavy ProRes files on an FX8350 (still do when all our other machines are busy rendering other stuff), and that CPU is extremely old by today's standards. It's an extremely light codec for CPUs... you don't really need an ASIC for it. Same for DNxHR.
150fps vs 900 fps for final rendering? Who really cares...
I used to think the same thing about the "Afterburner" card in the 2019 Mac Pro, which accelerated ProRes and ProRes RAW decoding. I thought ProRes IS optimized media, so why have a special card (or, later, ProRes engines in Apple Silicon) to accelerate it? Possible answers:
- Even though 4K ProRes 422 and similar flavors are relatively lightweight, decoding ProRes RAW carries a heavier compute burden, which shows up especially when handling 8K multicam. The Afterburner card and the subsequent Apple Silicon ProRes accelerators were partly aimed at ProRes RAW, not just ProRes.
- Even for regular ProRes, it's not always about the final export. On large documentary or scripted productions, you may have several multicam crews whose media must be offloaded and transcoded to some mezzanine codec. An M1 Ultra Mac Studio running Apple Compressor can transcode 4K ProRes to 1080p ProRes Proxy at about 800 frames/sec. However, that task is parallelizable, so you could split the files across several slower Windows or Apple machines and achieve the same result.
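Because that transcode is parallelizable, the practical question is how to divide the clips. Below is a minimal sketch, assuming each clip's cost is just its frame count divided by a machine's throughput and ignoring copy and network time; the clip names, frame counts and machine rates are hypothetical, not measurements. It uses a greedy longest-job-first assignment, giving each clip to whichever machine would finish it soonest.

```python
# Sketch: divide a batch of transcode jobs across machines of different speeds.
# Assumption (hypothetical model): a clip's cost is frames / machine frames-per-sec;
# media copy, storage and network time are ignored.


def assign_clips(
    clips: dict[str, int], machine_fps: dict[str, float]
) -> tuple[dict[str, list[str]], float]:
    """Greedy longest-job-first: give each clip to the machine that would finish it soonest.

    Returns the per-machine clip lists and the makespan (seconds until the last machine is done).
    """
    plan: dict[str, list[str]] = {m: [] for m in machine_fps}
    busy_until = {m: 0.0 for m in machine_fps}
    # Largest clips first; ties broken by name so the result is deterministic.
    for clip, frames in sorted(clips.items(), key=lambda kv: (-kv[1], kv[0])):
        best = min(machine_fps, key=lambda m: busy_until[m] + frames / machine_fps[m])
        busy_until[best] += frames / machine_fps[best]
        plan[best].append(clip)
    return plan, max(busy_until.values())


if __name__ == "__main__":
    # Hypothetical batch: eight 20-minute clips at 30 fps (36,000 frames each).
    clips = {f"cam{i}.mov": 36_000 for i in range(8)}
    _, one_fast = assign_clips(clips, {"studio": 800.0})
    plan, four_slow = assign_clips(clips, {f"pc{i}": 200.0 for i in range(4)})
    print(one_fast, four_slow)  # 360.0 360.0
    print(plan)
```

Under this simple model, one machine at 800 frames/sec and four machines at 200 frames/sec each finish the same batch in 360 seconds, which is the "same result" point above; in practice, moving the media between machines adds time the sketch does not count.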
Regarding the dual encoders on nVidia's RTX 4000 series doubling the rate of ProRes-to-H265 exports: the M1 Max has two H264/H265 encoders and the M1 Ultra has four, yet no NLE I've tested can use them in parallel on a single stream. In other words, single-stream scalability is currently zero. That raises the question of how nVidia got the results shown in the original post, where the RTX 4090 encoded to H265 at 2x the rate of the RTX 3080 Ti.
If the H265 Long GOP output used "closed" GOPs, which are self-contained, in theory they could split the input file, dispatch the pieces to different parallel threads or processes, and then concatenate the results. With "open" GOPs, where frames at the start of one GOP reference frames in the previous GOP, I don't think a simple split works, because one GOP needs information from another.
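The split-and-concatenate idea can be sketched as choosing segment boundaries that land on GOP starts, so each parallel encoder begins its piece with a keyframe and the pieces can be joined end to end. This is a hypothetical helper, assuming a fixed GOP length, closed GOPs, and an intra-only source such as ProRes that can be cut at any frame; real encoders often insert extra keyframes at scene cuts, and the actual encode and concatenation steps are not shown.

```python
# Sketch: plan GOP-aligned segments for parallel Long GOP encoding.
# Assumptions (hypothetical): fixed GOP length, closed GOPs, intra-only source
# (e.g. ProRes) so the input itself can be cut at any frame.


def gop_aligned_segments(total_frames: int, gop_length: int, workers: int) -> list[tuple[int, int]]:
    """Return half-open (start, end) frame ranges, at most one per worker,
    each starting on a GOP boundary so it can be encoded independently."""
    if total_frames <= 0 or gop_length <= 0 or workers <= 0:
        raise ValueError("arguments must be positive")
    gop_count = -(-total_frames // gop_length)   # ceiling division: number of GOPs
    gops_per_segment = -(-gop_count // workers)  # round up so at most `workers` segments result
    step = gops_per_segment * gop_length
    return [(start, min(start + step, total_frames)) for start in range(0, total_frames, step)]


if __name__ == "__main__":
    segments = gop_aligned_segments(total_frames=1000, gop_length=60, workers=4)
    print(segments)  # [(0, 300), (300, 600), (600, 900), (900, 1000)]
```

Each range would go to a separate encoder process, and because no segment references frames outside itself, the encoded pieces can be concatenated in order.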
Or they could simply encode multiple input files to H265 in parallel, one file per process. Given two H265 encoders, that would be about 2x faster.
Apple's Compressor already does that if you enable the advanced preference "Enable additional Compressor Instances". On the M1 Ultra it can transcode 4K ProRes 422 source to 720p HEVC at about 1,000 frames/sec, aggregated across four input files and four output files. The scaling is not perfect but useful: 1 instance yields 418 frames/sec, 2 instances 790 frames/sec, and 4 instances 960 frames/sec. It achieves the same rate for 8-bit 4:2:0 HEVC and 10-bit 4:2:2 HEVC.
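To put "not perfect but useful" in numbers, the Compressor figures can be converted into speedup and parallel efficiency (throughput with n instances divided by n times the single-instance throughput). This is plain arithmetic on the measurements reported above, nothing more.

```python
# Speedup and parallel efficiency from the Compressor instance measurements above.
MEASURED_FPS = {1: 418, 2: 790, 4: 960}  # instances -> aggregate frames/sec (M1 Ultra)


def scaling(fps_by_instances: dict[int, float]) -> dict[int, tuple[float, float]]:
    """Map instance count -> (speedup vs. 1 instance, efficiency = speedup / instances)."""
    base = fps_by_instances[1]
    return {n: (fps / base, fps / (n * base)) for n, fps in fps_by_instances.items()}


if __name__ == "__main__":
    for n, (speedup, efficiency) in scaling(MEASURED_FPS).items():
        print(f"{n} instance(s): {speedup:.2f}x speedup, {efficiency:.0%} efficiency")
```

That works out to about 1.89x (94% efficiency) with two instances and 2.30x (57%) with four, so the second instance is nearly free while the third and fourth add less.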
That is only possible because the M1 Ultra has four separate H264/HEVC accelerators, in addition to the four ProRes accelerators.
It would be nice to have effective decode and encode acceleration of single-stream 10-bit 4:2:2 H264/H265. Even if software could harness multiple accelerators that way, the benefit would be limited to the M1 Max (which has one decoder but two encoders) and the M1 Ultra (which has two decoders and four encoders), plus any non-Apple CPUs/GPUs that add multiple accelerators in their new designs.
Currently on the M1 Ultra, the multiple accelerators help with H264/H265 multicam (at least in FCP; I also tested Resolve Studio and I think they helped there too). But they don't help with smoother editing or faster export of single-stream H264/H265. That is already pretty quick, but it would be better if multiple accelerators could be leveraged on single-stream Long GOP formats.