Blackmagic Forum

Tue Apr 15, 2025 3:04 am

I have just selected AI timeline transcription on my podcast timeline to generate a whole transcript.
It has failed to identify and transcribe probably 75% of the content in the timeline.

There are two audio tracks - one for me and one for the guest.

I have attached a sample screenshot.

Attachment (Screenshot 2025-04-15 at 2.56.10 pm.png): Excerpt of timeline showing the extent of guest audio and the results seen in the transcription window for this segment of the timeline. This inaccuracy repeats throughout the whole timeline of a 1 hour podcast.

After taking 45 minutes to create the transcript, when I look at it (if you look at this sample graphic), you'll see it has identified some dialogue from me ending at 23:46 where the playhead is located.

The entire block of guest dialogue through to 27 minutes is completely missed and transcribed as "hmm".

Virtually all of the guest dialogue has been completely missed by the transcription tool in this hour long podcast.

The transcription accuracy of this tool, through version 19 and now into 20, appears to remain pretty terrible. I'm pleased to see timecodes and speaker names can now be exported in version 20, but the core job of turning audio into text is performing extremely badly so far (to the point of being unusable).

Not great for a paid feature!

Thu Apr 17, 2025 6:00 am

Update for anyone following this thread.

Thanks very much to Johnny from Blackmagic for messaging me and helping with this. I'll share the problem and the work-around from his advice, which I've tested and which has fixed the problem for my use case.

It turns out the AI transcription feature works on the lowest audio track. If you have dialogue on separate audio tracks, with blank space or overtalking, then it causes the AI transcription feature a problem and it will indeed skip the audio on the higher tracks.

If there is an audio gap, it then looks for other audio on higher tracks and will include it, which is why I was getting 'bitty' results on the raw podcast content. Layered audio tracks with simultaneous dialogue or 'blank space' where dialogue is present in a higher track was causing the problems.
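To make that behaviour easier to picture, here is a rough toy model in Python. It is only my reading of the explanation above, not how Resolve actually works internally: I'm assuming each track is a list of clip intervals in seconds, that index 0 is the "lowest" track, and that any clip on a lower track (even a silent stretch of it) hides whatever overlaps it on higher tracks. The function names and the numbers are made up for illustration.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_seconds, end_seconds), hypothetical units


def subtract(interval: Interval, blockers: List[Interval]) -> List[Interval]:
    """Return the parts of `interval` not covered by any interval in `blockers`."""
    pieces = [interval]
    for b_start, b_end in blockers:
        next_pieces = []
        for start, end in pieces:
            if b_end <= start or b_start >= end:
                # No overlap with this blocker: keep the piece unchanged.
                next_pieces.append((start, end))
                continue
            if start < b_start:
                next_pieces.append((start, b_start))  # part before the blocker
            if b_end < end:
                next_pieces.append((b_end, end))      # part after the blocker
        pieces = next_pieces
    return pieces


def heard_regions(tracks: List[List[Interval]]) -> List[List[Interval]]:
    """For each track (lowest first), list the regions this toy model would transcribe.

    Assumption: a higher track is only 'heard' where no lower track has a clip.
    """
    heard = []
    covered: List[Interval] = []
    for clips in tracks:
        visible = []
        for clip in clips:
            visible.extend(subtract(clip, covered))
        heard.append(visible)
        covered.extend(clips)
    return heard


# Host clip spans the whole stretch (including silence); guest talks 23:46 to 27:00.
blocked = heard_regions([[(0.0, 1620.0)], [(1426.0, 1620.0)]])
# Host clip ends at 23:46, leaving a gap that the guest track can fill.
with_gap = heard_regions([[(0.0, 1426.0)], [(1400.0, 1620.0)]])
```

In the first case the guest track ends up with nothing transcribed at all, which matches what I was seeing. In the second, only the part of the guest clip that falls in the gap gets through, which would explain the 'bitty' results.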

The fastest solution to producing a complete transcript, which I've just tested and it seems to have worked very well, was to bounce my mix to a single completed track (using 'Timeline -> Audio -> Bounce mix to track').

That has created the whole podcast dialogue as a single track.
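For anyone wondering why this helps, the idea of a bounce can be sketched in a few lines of Python. This is purely conceptual and is not what Resolve does under the hood (a real bounce also applies levels, panning and effects). I'm assuming each track is just a list of float samples between -1.0 and 1.0 at the same sample rate; `bounce_to_single_track` is a made-up name.

```python
from typing import List


def bounce_to_single_track(tracks: List[List[float]], limit: float = 1.0) -> List[float]:
    """Sum several tracks sample by sample into one track, clamping to +/- limit."""
    length = max((len(t) for t in tracks), default=0)
    mix = []
    for i in range(length):
        # Tracks shorter than the mix simply contribute nothing past their end.
        total = sum(t[i] for t in tracks if i < len(t))
        mix.append(max(-limit, min(limit, total)))
    return mix


host = [0.5, 0.0, 0.0]          # host speaks first, then goes quiet
guest = [0.0, 0.25, 0.75, 0.9]  # guest answers while the host track is silent
single = bounce_to_single_track([host, guest])
```

After the mix there is only one track left, so every speaker's dialogue sits in the signal the transcription tool actually reads.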

I've then moved that to a new timeline to test it and run the AI transcription and it is much, much better. Speaker detection has worked well and (although I haven't manually checked it over entirely), it's looking pretty good - and speaker names and timecodes are included in the export now.

As the whole dialogue is on a single track as a single clip, it does also mean that the transcript can be corrected when switching from 'timeline' to 'clip' mode (in the top left of the transcription box). This is also useful, as it means errors can be corrected from within the transcript editor and then exported to a correct transcript (txt file).

If you're finding the transcript is not accurate or is missing content, this may be your issue - look for overlapping dialogue across tracks and consider bouncing these segments to a single track.

Hope this all helps others and thanks again to Johnny for such prompt help.

Fri Apr 18, 2025 6:42 pm

Thanks for posting the follow up. I did the same and it solved 90% of the problems I was having while transcribing.

However, something I can't quite understand (since the neural engine must be the same) is how I get perfect subtitles every time, yet transcription misses some very long and obvious silences.

Update:
I was wrong. Transcription is still messed up. If I create subtitles and afterwards import them as a transcription, silences are detected, and some long parts that were previously just skipped are now transcribed correctly.

It seems that there are different model settings for subtitles and for transcription, and without being able to tune them, results are inconsistent (unusable).

Thu May 08, 2025 4:22 am

This was super helpful and seems to have worked in my case. I do hope that there's a more robust implementation in the works!

Wed May 28, 2025 1:11 pm

Thanks for the update and workaround. Avid's new-ish transcription tool works with multiple audio tracks in timelines very well IMO. You can turn on/off tracks in the transcription panel and it will only show the tracks you want. This is insanely useful when working with sync map sequences where each audio track has a different speaker...a lot of the time in a completely different area or conversation - like in my case, multiple mic'd up athletes across an entire field during practice, for instance.

Really hoping BM will be able to keep up with Avid and update the DaVinci transcription tool to match and/or surpass Media Composer with implementations like this.

Timeline transcription failing very badly.
