Decklink 4k Loopthrough Latency

rephil

  • Posts: 1
  • Joined: Wed Aug 10, 2022 2:29 pm
  • Real Name: Philip Butenko

Posted: Wed Aug 10, 2022 6:37 pm

Hey Everyone!

Disclaimer: I have read, I think, every post on the Blackmagic forum about DeckLink latency, but I don't feel that those posts were specific enough, or helped me understand the DeckLink cards well enough, for my current issue. So here goes nothing.

The Setup: I have an HD1080i50 camera input (it's an endoscope used in hospitals) connected to my computer, which has the standard DeckLink 4K card. An interface written in OpenCV is overlaid on the input signal, and the output signal is sent to a screen that displays it in HD1080i50.

The Problem: I want to reduce the loop-through latency from my current ~110 ms to less than 60 ms, measured from when the frame is detected on the input wire to when it has finished being sent out on the output wire. The one other requirement is that I must be able to draw an overlay on the video stream. I'm currently doing this with OpenCV via x86mem on the CPU. This overlay is not static.

What I have tried:
Using the Blackmagic DeckLink SDK 12.4, I compiled the InputLoopThrough sample with reference lock off and:
    BMDDisplayMode = bmdModeHD1080i50
    BMDPixelFormat = bmdFormat10BitYUV
I also removed the frame and audio processing code but otherwise left the method intact.
This gave latencies that looked like this:
Average latency: Input = 42.42 ms, Processing = 0.23 ms, Output = 69.47 ms
That is a total of around 110 ms. It varies by perhaps 20 ms, which I assume is partly because reference lock is not being used.
My main project shows the same number of frames of latency as the InputLoopThrough sample, which tells me that drawing the overlay takes almost no time compared with the input read time and the output write time. So looking into GPUDirect doesn't seem necessary for me at the moment.
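A quick sketch of the arithmetic behind these numbers. The three averages are the ones reported above; the 40 ms frame duration assumes 1080i50 delivers 25 interlaced frames per second:

```python
# Average per-stage latencies reported by the InputLoopThrough sample (ms).
input_ms = 42.42
processing_ms = 0.23
output_ms = 69.47

# Assumption: 1080i50 carries 25 interlaced frames per second, i.e. 40 ms per frame.
FRAME_MS = 1000 / 25

total_ms = input_ms + processing_ms + output_ms
frames_of_latency = total_ms / FRAME_MS

print(f"total loop-through latency: {total_ms:.2f} ms")
print(f"in frames: {frames_of_latency:.2f}")
```

So the measured delay is roughly 2.8 frames, while the 60 ms goal corresponds to 1.5 frames.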

Questions
1. Are these latency numbers normal for Blackmagic DeckLink 4K cards in general, or are there faster ones?
2. Are there any tweaks or properties that could further reduce the input and/or output latency (even if the benefit is minor)?
3. How much can I expect reference lock to reduce the latency, if I had it?
4. I don't need the audio channels. Would removing the audio code improve performance significantly?
5. If I have to look at other hardware solutions, is there any you would particularly suggest?

If you can answer any of these questions, feel free to join the conversation. Thank you so much in advance!

petrno

  • Posts: 17
  • Joined: Sat Jan 16, 2021 9:00 pm
  • Real Name: Petr Novak

Re: Decklink 4k Loopthrough Latency

Posted: Tue Aug 16, 2022 6:29 am

Hi Phil,

    Average latency: Input = 42.42 ms


The above is standard for 1080i50: a frame lasts 40 ms, and the DeckLink card, driver and SDK spend around 2.5 ms processing it. This 42.5 ms is measured from the moment the frame starts on the SDI wire to the moment the whole frame is available in your application. By the time the frame is available in your application, the SDI wire is already 2.5 ms into the next frame.

With this architecture (the complete frame is made available to the application, rather than being provided "on the fly"), there does not seem to be any way to cut this, except perhaps reducing the 2.5 ms spent in the Blackmagic ecosystem as a whole.
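The input figure follows directly from this store-and-forward model: one full frame duration plus a roughly fixed processing overhead. A minimal sketch, assuming the ~2.5 ms overhead quoted here stays the same for other display modes (that is an assumption, not something measured in this thread):

```python
def input_latency_ms(frames_per_second: float, overhead_ms: float = 2.5) -> float:
    """Estimated capture latency for a store-and-forward card.

    The whole frame has to arrive on the SDI wire (one frame duration) before
    the card, driver and SDK hand it to the application (overhead_ms later).
    overhead_ms = 2.5 is the figure quoted for 1080i50; other modes may differ.
    """
    frame_ms = 1000.0 / frames_per_second
    return frame_ms + overhead_ms


print(f"1080i50 (25 frames/s): {input_latency_ms(25):.1f} ms")  # compare: measured 42.42 ms
print(f"50 frames/s mode: {input_latency_ms(50):.1f} ms")       # hypothetical, same overhead assumed
```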

    Output = 69.47 ms

You mention you do not synchronise with RefIn, so the above number will actually vary over time, because your input frame rate is not exactly the same as your output frame rate. You mention your input source is an endoscope, so I doubt it has a RefIn input, but if it does, you should really connect the same RefIn signal to both it and the DeckLink card, so that your input and output frame rates are the same. Depending on which clock is faster, you will get dropped or duplicated frames on the output after some time, and how often this happens will depend on the difference between the input and output frame rates.
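To get a feel for how often those drops or duplicates happen: if the input and output clocks run at slightly different rates, the output drifts by one whole frame every 1 / |f_in - f_out| seconds, and the same drift is why the Output figure wanders in between. A minimal sketch; the 40 ppm clock offset in the example is a made-up figure, not a measured property of any endoscope or DeckLink card:

```python
import math


def seconds_between_slips(input_fps: float, output_fps: float) -> float:
    """Time for a free-running output to drift one full frame against the input.

    Each such slip forces the output to drop a frame (input faster) or repeat
    one (input slower). Locked clocks, e.g. both fed from the same RefIn,
    never slip.
    """
    diff = abs(input_fps - output_fps)
    if diff == 0:
        return math.inf
    return 1.0 / diff


def slip_kind(input_fps: float, output_fps: float) -> str:
    """Which kind of output glitch the clock mismatch produces."""
    if input_fps > output_fps:
        return "dropped"
    if input_fps < output_fps:
        return "duplicated"
    return "none"


# Hypothetical example: the camera clock runs 40 ppm fast relative to the card.
camera_fps = 25 * (1 + 40e-6)
card_fps = 25.0
print(f"one {slip_kind(camera_fps, card_fps)} frame every "
      f"{seconds_between_slips(camera_fps, card_fps):.0f} s")
```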

Your application will never achieve 0 ms processing time, and even if it did, you have already lost 2.5 ms on input. So, with RefIn, your absolute best case looks like this:

- frame x is being read by the input of the card
- frame x-1 is being processed by your application and by the DeckLink ecosystem
- frame x-2 is being sent by the output of the card

So your theoretical best case would be a delay of 2 frames, which for 1080i50 is 80 ms.
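The three-stage pipeline above can be written out as a tiny simulation. It only restates the reasoning in this reply (one frame per stage, 40 ms per 1080i50 frame); the stage names are illustrative labels, not SDK terms:

```python
FRAME_MS = 40.0  # duration of one 1080i50 frame


def stage_occupancy(slot: int) -> dict:
    """Which frame each stage handles during a given frame slot.

    While frame `slot` is being captured, the previous frame is being
    processed and the one before that is being played out.
    """
    return {"input": slot, "processing": slot - 1, "output": slot - 2}


def slots_until_output(x: int) -> int:
    """Count frame slots from the start of capturing frame x until it is played out."""
    slot = x
    while stage_occupancy(slot)["output"] != x:
        slot += 1
    return slot - x


def best_case_latency_ms(x: int = 0, frame_ms: float = FRAME_MS) -> float:
    """Input-to-output delay of frame x in this idealised pipeline."""
    return slots_until_output(x) * frame_ms


for slot in range(2, 5):
    print(f"slot {slot}: {stage_occupancy(slot)}")
print(f"best case: {best_case_latency_ms():.0f} ms")
```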

The InputLoopThrough example sets up an output buffer (called "prerollFrames" in the example) so that if you are slightly late, you do not drop the frame. This can be set in the application. The minimum buffer size (in frames) depends on the card (see the value of the BMDDeckLinkMinimumPrerollFrames attribute for your card), but usually it is at least 1 frame. So instead of 80 ms, you would measure a total delay of 120 ms, and the Output = xx.xx ms figure would show a number around 75 ms (2 frames, i.e. 80 ms, minus the time above 40 ms spent on input, minus the delay in your application).

In any case, with proper synchronisation your delay should always be a multiple of the duration of a single frame (40 ms in your case), so you can never achieve a 60 ms delay. Given the "store-and-forward" architecture, your best theoretical delay is 80 ms, and in practice more likely 120 ms.
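The preroll arithmetic and the "whole frames only" rule can be checked in a few lines. A minimal sketch, assuming (as described above) that each preroll frame adds exactly one frame period and that the input overhead is about 2.5 ms; `preroll_frames` here is just a number, not a call into the SDK:

```python
FRAME_MS = 40.0          # one 1080i50 frame
INPUT_OVERHEAD_MS = 2.5  # card + driver + SDK time quoted earlier in the thread


def genlocked_total_ms(preroll_frames: int, frame_ms: float = FRAME_MS) -> float:
    """Total delay with RefIn lock: the 2-frame pipeline plus the preroll buffer."""
    return (2 + preroll_frames) * frame_ms


def expected_output_ms(preroll_frames: int, processing_ms: float,
                       frame_ms: float = FRAME_MS) -> float:
    """What the sample's Output figure should show: total minus input minus processing."""
    input_ms = frame_ms + INPUT_OVERHEAD_MS
    return genlocked_total_ms(preroll_frames, frame_ms) - input_ms - processing_ms


def meets_target(target_ms: float, preroll_frames: int = 0) -> bool:
    """With proper sync the delay is a whole number of frames, never less than 2."""
    return genlocked_total_ms(preroll_frames) <= target_ms


print(genlocked_total_ms(1))                  # 120.0 ms with a 1-frame preroll
print(round(expected_output_ms(1, 0.23), 2))  # roughly the "around 75 ms" figure
print(meets_target(60))                       # False: 60 ms is 1.5 frames
```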

If you critically depend on a latency below 80 ms, you would have to use a dedicated hardware-based device that mixes your overlay with the input signal using a "cut-through" principle rather than "store-and-forward". Without synchronisation, even then the delay would be 40 ms (1 frame).

I hope this helps,
Petr
--
Petr Novak

boyu2022

  • Posts: 3
  • Joined: Tue Aug 23, 2022 3:42 pm
  • Real Name: Bo Yu

Re: Decklink 4k Loopthrough Latency

Posted: Tue Aug 23, 2022 3:49 pm

Just wanted to jump in on this conversation. Thanks, Petr, for the explanation; this makes a lot of sense. I still have a question, though. When the input video is interlaced (i.e. 1080i50), isn't the frame rate actually 50 frames per second from the capture card's perspective? Does the card de-interlace each field and treat it as a frame, or does it wait for both fields and de-interlace them together? Which de-interlacing method is typically used, and is there a way to choose which one to use?

petrno

  • Posts: 17
  • Joined: Sat Jan 16, 2021 9:00 pm
  • Real Name: Petr Novak

Re: Decklink 4k Loopthrough Latency

Posted: Thu Aug 25, 2022 2:08 pm

Hi,

The interlaced frame is delivered as 1 frame (2 fields), and the fields are marked to show which is which. They are not de-interlaced, just put together, so the API is frame-oriented, not field-oriented. This means your frame rate for 1080i50 will be 25 frames per second, and each frame will contain 2 fields.
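Since each delivered frame holds both fields woven together line by line, an application that wants to work per field has to pull them apart itself. A minimal sketch on a plain list of rows; it assumes the first field sits on the even rows (0, 2, 4, ...), whereas a real application should go by the field order the API reports for the mode:

```python
FRAMES_PER_SECOND = 25                     # what the API delivers for 1080i50
FIELDS_PER_SECOND = 2 * FRAMES_PER_SECOND  # 50 fields per second on the wire


def split_fields(frame_rows: list) -> tuple:
    """Separate a woven interlaced frame into its two fields.

    Assumption: the first field occupies the even rows and the second field
    the odd rows. Swap the results if the mode reports the other field order.
    """
    return frame_rows[0::2], frame_rows[1::2]


def weave_fields(first: list, second: list) -> list:
    """Interleave two equal-height fields back into one frame (no de-interlacing)."""
    frame = []
    for a, b in zip(first, second):
        frame.extend((a, b))
    return frame


rows = [f"line {n}" for n in range(1080)]  # stand-in for 1080 rows of pixels
field1, field2 = split_fields(rows)
print(len(field1), len(field2))            # 540 540
print(weave_fields(field1, field2) == rows)  # True
```

Weaving the fields back together is what the delivered frame already looks like; the point is only that "1080i50" means 25 woven frames per second from the API's perspective.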

I hope this helps.

Petr
--
Petr Novak
