I got an answer to my question from the BMD support team; maybe it will be useful for others:
The latency depends on several factors, but capture with GPUDirect should have low latency.
When using NVIDIA GPUDirect, frames are passed directly from the frame buffer on the card to your GPU. This bypasses the CPU for extremely fast frame delivery.
The InputLoopThrough (CPU) sample provides latency measurements for input, processing and output. The "processing" section of the sample is just simulated processing time using delays, but you can use the same measurement techniques to check latency in your own pipeline.
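To illustrate the general idea of per-stage timing, here is a minimal Python sketch. It is not the InputLoopThrough code and does not use the DeckLink SDK: `StageTimer` and the `"processing"` stage name are hypothetical, and the 5 ms sleep only stands in for real per-frame work, much like the simulated delay the sample is described as using. It measures host-side wall-clock time only, so it would not capture latency inside the capture or output hardware.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Hypothetical helper: collects wall-clock durations per named stage (ms)."""

    def __init__(self):
        self.samples = {}

    @contextmanager
    def stage(self, name):
        # Monotonic, high-resolution clock; unaffected by system time changes.
        start = time.perf_counter_ns()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter_ns() - start) / 1e6
            self.samples.setdefault(name, []).append(elapsed_ms)

    def mean_ms(self, name):
        values = self.samples[name]
        return sum(values) / len(values)


timer = StageTimer()
for _ in range(5):
    with timer.stage("processing"):
        time.sleep(0.005)  # stand-in for real per-frame work (assumption)

print(f"processing: {timer.mean_ms('processing'):.1f} ms")
```

Averaging over many frames matters here, since a single measurement can be skewed by scheduler jitter.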
For reference, on my MacBook Pro M2 with InputLoopThrough, I am getting a capture latency of roughly 35ms, processing latency of 5ms and output latency of roughly 58ms. In total, this is roughly 98ms, or about 3 frames.
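The "about 3 frames" figure can be checked with a little arithmetic. The sketch below sums the three quoted latencies and converts the total to frames. The frame rate is not stated in the reply; 30 fps is an assumption that happens to match the "about 3 frames" estimate, and `latency_in_frames` is just an illustrative helper name.

```python
def latency_in_frames(latency_ms, fps):
    """Convert a latency in milliseconds to a number of frames at the given rate."""
    return latency_ms * fps / 1000.0


# Approximate figures quoted in the support reply.
stages_ms = {"capture": 35, "processing": 5, "output": 58}
total_ms = sum(stages_ms.values())

ASSUMED_FPS = 30.0  # assumption: the reply does not state the video mode
frames = latency_in_frames(total_ms, ASSUMED_FPS)

print(f"{total_ms} ms = {frames:.2f} frames at {ASSUMED_FPS:g} fps")
# prints "98 ms = 2.94 frames at 30 fps"
```

At a different frame rate the same 98ms would correspond to a different frame count (for example, about 5.9 frames at 60 fps), so it is worth comparing latencies in milliseconds rather than frames.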
However, this is with CPU processing and not GPU processing with GPUDirect. The GPUDirect path will perform differently from InputLoopThrough's CPU processing, but unfortunately the GPUDirect sample does not include latency measurements.