Free Transcriptions in Resolve using OpenAI Whisper


Octavian Mot

  • Posts: 286
  • Joined: Mon Aug 25, 2014 2:42 pm
  • Location: Germany

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostThu Nov 10, 2022 10:48 am

Uli Plank wrote:OMG, AI poetry from silence!

:-D :-D
The hallucinations seem to be more pronounced in other languages besides English.

But, as I was explaining in the README:
there is a human-like AI trapped in your machine doing your job for you on a mechanical typewriter with missing keys... It has the right to have day-dreams too.

If it's hallucinating over non-silent segments of the audio, the solution is to select those segments using V and then re-transcribe only those time intervals by pressing T.
Trying to keep it together at mots.us
Taming AI for filmmaking at StoryToolkit.ai

Octavian Mot

  • Posts: 286
  • Joined: Mon Aug 25, 2014 2:42 pm
  • Location: Germany

Re: Free Transcriptions and Semantic Search using AI for Res

PostThu Nov 10, 2022 10:56 am

By the way, the Windows standalone version for machines with CUDA GPUs is up on the release page https://github.com/octimot/StoryToolkit ... ses/latest.

It's in a super alpha version, but it should work in most cases if you follow the installation instructions.

Feel free to try it out and let us know if something's weird.

Gary Hango

  • Posts: 886
  • Joined: Mon Apr 09, 2018 10:35 pm
  • Location: Left Coast
  • Real Name: Gary Hango

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostThu Nov 10, 2022 2:55 pm

Uli Plank wrote:OMG, AI poetry from silence!

a i cummings.
Microsoft Windows 10 Pro x64
Intel(R) Core(TM) i7-6700, 3.40GHz, 32.0 GB
MB: MSI, BIOS: American Megatrends Inc. A.60, 12/17/2015
NVIDIA GeForce GTX 960, 2Gb
Resolve 18.1.1.0007(Free)

rnbaker

  • Posts: 60
  • Joined: Thu Aug 22, 2019 10:40 am
  • Real Name: Ray Baker

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostThu Nov 10, 2022 4:05 pm

Octavian,
Thanks for the incredible tool. This blows away the auto-transcription built into Premiere Pro and everything else I have tried. I just installed 0.17.1 for Windows and it worked great but I just want to share some warning/error messages with you...

1) "WARNING: FFMPEG_BINARY env variable is empty. Looking for ffmpeg in PATH."
2) "Unable to find module DaVinciResolveScript from $PYTHONPATH - trying default locations"
3) "librosa\util\decorators.py:88: UserWarning: PySoundFile failed. Trying audioread instead.
return f(*args, **kwargs)"

I didn't get any messages like this with 0.16.18, which I successfully installed and used yesterday. Also, I have FFmpeg installed via Chocolatey and even force-reinstalled it today after I saw the message, but I still get the same message after the reinstall.

All of that said, the transcription worked flawlessly with 0.17.1 but I thought I would share those messages anyway just in case they might help you with StoryToolkitAI development and optimization.

Thanks again and keep up the great work!

(By the way, sorry, I don't have a GitHub account yet but I will soon.)

rnbaker

  • Posts: 60
  • Joined: Thu Aug 22, 2019 10:40 am
  • Real Name: Ray Baker

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostThu Nov 10, 2022 4:41 pm

Robert Arnold wrote:It *does* in fact hallucinate when it encounters silence for a long time. I accidentally gave it a timeline that was missing audio, and it came up with this:

-------------

I have attached my sunflower seeds to possession.
So what do you say?
I'm fine, thank you.
Sure, I have a sunflower seed.
Let's go out to see it.

--------------

haha, I did the exact same thing. I just grabbed a random 1-minute clip to test it on, and after I ran it, I found out it just had some instrumentals on it. This is what I got...

[00:00.000 --> 00:05.880] Chart, Please eat
[01:00.000 --> 01:14.120] Thank you.

So, Mr.(?) Chart, waited almost a minute to respond and then took almost 15 seconds to start and finish, "thank you" or I guess...
"ttttttttttttthhhhhhhhhhhhaaaaaaaaaaaaaaaaannnnnnnkkkkkkkk
yyyyyyooooooooooooooooooooooooouuuuuuu"

lol

Octavian Mot

  • Posts: 286
  • Joined: Mon Aug 25, 2014 2:42 pm
  • Location: Germany

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostThu Nov 10, 2022 4:49 pm

@rnbaker

Thanks for the feedback! I'm glad that it works for you!

None of the warnings are anything to worry about.

1) "WARNING: FFMPEG_BINARY env variable is empty. Looking for ffmpeg in PATH."

This means that the tool didn't find ffmpeg through the environment variable, so it's looking in PATH. Since there's no warning/error after that, it means that FFmpeg was found.
2) "Unable to find module DaVinciResolveScript from $PYTHONPATH - trying default locations"

Same logic as above: this is the Resolve API throwing a warning. As long as Resolve is connected with the app, it means the module was found later in one of the default locations.
3) "librosa\util\decorators.py:88: UserWarning: PySoundFile failed. Trying audioread instead.
return f(*args, **kwargs)"

This says that it's using audioread instead of PySoundFile, but it's more or less the same thing for what we need.
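
For readers who want to see the lookup order from the first warning spelled out, here is a minimal Python sketch: check the FFMPEG_BINARY environment variable first, then fall back to searching PATH. The function name find_ffmpeg is hypothetical, and this is only an illustration of the idea, not necessarily how StoryToolkitAI does it internally.

```python
import os
import shutil


def find_ffmpeg(env=None):
    """Return a path to ffmpeg, or None if neither lookup succeeds.

    Hypothetical helper: checks FFMPEG_BINARY first, then searches PATH.
    """
    env = os.environ if env is None else env

    # 1) An explicit override through the environment variable
    candidate = (env.get("FFMPEG_BINARY") or "").strip()
    if candidate:
        return candidate

    # 2) Fall back to a normal executable search on PATH
    return shutil.which("ffmpeg", path=env.get("PATH"))
```

If this returns None, neither lookup found anything, which is the case where a real error (rather than a warning) would be expected.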


@everyone

I'm curious if folks are using the Advanced Search feature.

This is the first step towards really making use of AI to find concepts in the transcribed footage, and we're getting more and more used to it in our editing room. We're going to incorporate more AI models/features soon, hopefully for video footage too.

Octavian Mot

  • Posts: 286
  • Joined: Mon Aug 25, 2014 2:42 pm
  • Location: Germany

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostThu Nov 10, 2022 4:54 pm

rnbaker wrote:
Robert Arnold wrote:It *does* in fact hallucinate when it encounters silence for a long time. I accidentally gave it a timeline that was missing audio, and it came up with this:

-------------

I have attached my sunflower seeds to possession.
So what do you say?
I'm fine, thank you.
Sure, I have a sunflower seed.
Let's go out to see it.

--------------

haha, I did the exact same thing. I just grabbed a random 1-minute clip to test it on, and after I ran it, I found out it just had some instrumentals on it. This is what I got...

[00:00.000 --> 00:05.880] Chart, Please eat
[01:00.000 --> 01:14.120] Thank you.

So, Mr.(?) Chart, waited almost a minute to respond and then took almost 15 seconds to start and finish, "thank you" or I guess...
"ttttttttttttthhhhhhhhhhhhaaaaaaaaaaaaaaaaannnnnnnkkkkkkkk
yyyyyyooooooooooooooooooooooooouuuuuuu"

lol

:D :D :D I'm seriously considering posting these somewhere on the GitHub page....
Last edited by Octavian Mot on Thu Nov 10, 2022 4:55 pm, edited 1 time in total.

rnbaker

  • Posts: 60
  • Joined: Thu Aug 22, 2019 10:40 am
  • Real Name: Ray Baker

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostThu Nov 10, 2022 4:55 pm

Octavian Mot wrote:@rnbaker

Thanks for the feedback! I'm glad that it works for you!

None of the warnings are anything to worry about.

No problem and thanks for the info!

Octavian Mot wrote: :D :D :D I'm seriously considering we should post these somewhere on the GitHub page....

lol, yep, you should!

Robert Arnold

  • Posts: 447
  • Joined: Tue Oct 30, 2012 11:53 pm

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostThu Nov 10, 2022 6:29 pm

Octavian Mot wrote:
Uli Plank wrote:OMG, AI poetry from silence!

:-D :-D
The hallucinations seem to be more pronounced in other languages besides English.

But, as I was explaining in the README:
there is a human-like AI trapped in your machine doing your job for you on a mechanical typewriter with missing keys... It has the right to have day-dreams too.

If it's hallucinating over non-silent segments of the audio, the solution is to select those segments using V and then re-transcribe only those time intervals by pressing T.


Even if the actual software weren't like some sort of miracle, just the README is worth the download!

BTW, I have an RX 6900XT running in my Hackintosh, which uses Metal. Any chance of GPU utilization in future versions?

CodeTech

  • Posts: 110
  • Joined: Tue Jul 16, 2019 12:00 am
  • Location: Calgary
  • Real Name: Geoff Allan

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostThu Nov 10, 2022 7:38 pm

Attempting to get this working in Windows, no luck so far.

So I've installed StoryToolkitAI.0.17.1.WIN.exe and it runs. It finds FFmpeg, then displays
"Unable to find module DaVinciResolveScript from $PYTHONPATH - trying default locations".

I've ensured that Resolve is set to local scripting.

I run Resolve, load up a project, then run StoryToolkitAI.

I don't see any "Transcribe Timeline" button.

After about a minute, the StoryToolkitAI window disappears and nothing happens.

What am I missing??

Edited: I uninstalled a version of Python that I had installed last week while trying to get this working before. Now it runs.
Last edited by CodeTech on Thu Nov 10, 2022 8:23 pm, edited 1 time in total.
Resolve Studio 19b1, Win11, i7-13700K, 64GB DDR5 RAM, 3080Ti (12GB) (552.22 Studio), 5TB M.2 SSDs, 90TB+ HDs, Three 4K monitors: 27", 28", 43", Focusrite Scarlett 2i4, Presonus Studio Monitors

infinityespi

  • Posts: 42
  • Joined: Tue Nov 26, 2019 3:58 pm
  • Real Name: Eduardo Espinosa

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostThu Nov 10, 2022 7:50 pm

I found that the tool is a bit slower than PP's; however, the actual transcription is a lot more accurate. Love the integration and the fact that I can export the subtitles right onto the timeline.
DaVinci Resolve Studio 17
macOS Big Sur
Imac 2021
M1
16 gigs of Ram
https://www.infinityespi.com

Octavian Mot

  • Posts: 286
  • Joined: Mon Aug 25, 2014 2:42 pm
  • Location: Germany

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostThu Nov 10, 2022 8:47 pm

CodeTech wrote:Edited: I uninstalled a version of Python that I had installed last week while trying to get this working before. Now it runs.


I'm glad that it worked out! From all the reports I see, it seems that in most cases, the tool can't connect to the Resolve API on Windows due to a messed up Python environment / installation.
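
When running the tool from source, a quick way to see what the Python environment looks like is to print which interpreter is in use and whether the Resolve scripting module can be located. The sketch below is only a diagnostic aid under assumptions: the function name resolve_scripting_report is hypothetical, the RESOLVE_SCRIPT_API and RESOLVE_SCRIPT_LIB variables are the ones described in Resolve's own scripting README, and the result only describes the interpreter that runs the snippet.

```python
import importlib.util
import os
import sys


def resolve_scripting_report(env=None):
    """Collect basic facts about the Python setup used for Resolve scripting.

    Hypothetical diagnostic helper; it does not connect to Resolve.
    """
    env = os.environ if env is None else env
    return {
        "python_version": "%d.%d" % sys.version_info[:2],
        "python_executable": sys.executable,
        # Variables described in Resolve's scripting README
        "RESOLVE_SCRIPT_API": env.get("RESOLVE_SCRIPT_API"),
        "RESOLVE_SCRIPT_LIB": env.get("RESOLVE_SCRIPT_LIB"),
        "PYTHONPATH": env.get("PYTHONPATH"),
        # True only if this interpreter can find the module on its search path
        "module_importable": importlib.util.find_spec("DaVinciResolveScript") is not None,
    }


if __name__ == "__main__":
    for key, value in resolve_scripting_report().items():
        print(f"{key}: {value}")
```

If several Python installations are present, comparing python_executable with the one you expect is usually the first thing to check.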

CodeTech

  • Posts: 110
  • Joined: Tue Jul 16, 2019 12:00 am
  • Location: Calgary
  • Real Name: Geoff Allan

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostThu Nov 10, 2022 10:13 pm

Now that I've played around with this for a while, I'm very impressed.
It did German->English perfectly.
Some of my own raw videos confused it slightly, but for the most part even quiet speech was correctly picked up and transcribed well.

This is an incredibly useful tool!

JimBaloney

  • Posts: 1
  • Joined: Fri Nov 11, 2022 3:35 pm
  • Real Name: Jim Adams

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostFri Nov 11, 2022 3:41 pm

I've just stumbled upon this project while researching solutions to another related issue. I'm curious, are there plans to integrate a text-to-speech engine so as to provide a means to generate alternate-language audio for projects? That would be useful for the training video projects that I am working on. BTW awesome work.

ColinMcT

  • Posts: 54
  • Joined: Sat Jul 10, 2021 11:14 pm
  • Real Name: Colin McTaggart

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostSat Nov 12, 2022 12:17 am

I have a friend who codes Python for a living. I've asked him to help, but he's very busy and may not have time for a while to set this up for me on my machine. I have no idea how to do this myself. Is anyone able to give me a simple dot-point explanation of how to set it up on my iMac, please?
DaVinci Resolve Studio 17.4.3
2019 27" iMac 40gig ram
BM Speed Editor
BM Editor Keyboard
Ultrastudio mini 4K
BM Micro Panel

Octavian Mot

  • Posts: 286
  • Joined: Mon Aug 25, 2014 2:42 pm
  • Location: Germany

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostSat Nov 12, 2022 5:17 am

ColinMcT wrote: is anyone able to give me a simple dot point explanation on how to set up on my iMac please?

Sure, download then see the instructions here:
https://github.com/octimot/StoryToolkitAI/releases.

They should be straightforward, but if you run into something we'll try to help.

Joelarvidsson

  • Posts: 201
  • Joined: Fri Oct 12, 2012 6:18 am

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostSat Nov 12, 2022 8:30 am

Is there a way in Davinci to write out subtitles one word at the time like on YouTube? I'm sure it can be done, but not with a set format. The name of the subtitle format that stores the time for each word slips my mind, but I would love to get it to work in DVR.
Resolve Studio 18.5, Studio driver 536.99
Supermicro 2 Intel Xeon E5-2687W 3.10GHz processors. 64GB ram. GTX 1080Ti GPU Samsung PM893 SSD for system, Intel ssd for cache. Windows 10 pro. Qnap gnap TS-1685 for media. 925MB/s & 1062MB/s

Octavian Mot

  • Posts: 286
  • Joined: Mon Aug 25, 2014 2:42 pm
  • Location: Germany

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostSat Nov 12, 2022 9:06 am

Joelarvidsson wrote:Is there a way in Davinci to write out subtitles one word at the time like on YouTube?

I don't think toggling each word individually is possible using subtitles in Resolve. Maybe with a Text+ via Fusion Comp.

Also, the AI used for StoryToolkitAI will not give you word-level timings in your transcript, but only start and end times for each phrase, because it's focused more on the meaning and context of what is being said instead of reproducing each word individually - that's one of the reasons why the results are significantly better than other speech-to-text models. However, if that's something interesting to folks, we could consider an update which aligns the transcript to audio at the word level...
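
To make the difference between phrase-level and word-level timings concrete, here is a deliberately naive Python sketch that spreads one phrase's start and end times across its words in proportion to word length. This is an assumption-laden illustration, not real alignment and not something Whisper or StoryToolkitAI does: the function name approximate_word_timings is hypothetical, and the resulting times are guesses that ignore pauses and speaking rate.

```python
def approximate_word_timings(text, start, end):
    """Guess per-word (word, start, end) tuples for one phrase.

    Naive approach: each word gets a share of the phrase duration
    proportional to its character count. Not audio-based alignment.
    """
    words = text.split()
    if not words or end <= start:
        return []

    total_chars = sum(len(word) for word in words)
    duration = end - start

    timings = []
    cursor = start
    for word in words:
        word_end = cursor + duration * len(word) / total_chars
        timings.append((word, round(cursor, 3), round(word_end, 3)))
        cursor = word_end
    return timings
```

A proper solution would align the words against the audio itself, which is what the update mentioned above would have to do.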

CougerJoe

  • Posts: 345
  • Joined: Wed Sep 18, 2019 5:15 am
  • Real Name: bob brady

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostSat Nov 12, 2022 10:35 am

I get "Userwarning: PySoundFile failed. Trying audioread instead"

It's working fine, but what's the downside of not using PySoundFile (whatever that is)? You said it's much the same thing, but I saw that when PySoundFile tried to run it used all my VRAM, while audioread doesn't. Normally with AI software, the more VRAM you can use, the faster/better.
I'm using the Windows executable version.

Thanks. Very nice app!

Videoneth

  • Posts: 1698
  • Joined: Fri Nov 13, 2020 11:03 pm
  • Warnings: 1
  • Real Name: Maxwell Allington

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostSat Nov 12, 2022 11:10 am

Octavian Mot wrote:
Joelarvidsson wrote:Is there a way in Davinci to write out subtitles one word at the time like on YouTube?

I don't think toggling each word individually is possible using subtitles in Resolve. Maybe with a Text+ via Fusion Comp.

Also, the AI used for StoryToolkitAI will not give you word-level timings in your transcript, but only start and end times for each phrase, because it's focused more on the meaning and context of what is being said instead of reproducing each word individually - that's one of the reasons why the results are significantly better than other speech-to-text models. However, if that's something interesting to folks, we could consider an update which aligns the transcript to audio at the word level...


but it could : https://github.com/jianfch/stable-ts
This script modifies methods of Whisper's model to gain access to the predicted timestamp tokens of each word (token) without needing additional inference. It also stabilizes the timestamps down to the word (token) level to ensure chronology.


It works well. Of course, it doesn't serve the same purpose as a regular transcript. I used it myself so that I don't have to break sentences down into words for animation.
Windows 10
19b
nVidia 3090 - 552.22

Octavian Mot

  • Posts: 286
  • Joined: Mon Aug 25, 2014 2:42 pm
  • Location: Germany

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostSat Nov 12, 2022 11:13 am

CougerJoe wrote:I get "Userwarning: PySoundFile failed. Trying audioread instead"

It's working fine but what's the negative to not using PySoundFile (whatever that is) You said it's much the same thing but I saw when PySoundFile tried to run it used all my Vram, but audioread doesn't. Normally with AI software the most VRAM you can use is fastest/best
I'm using Windows executable version


This is probably happening because Resolve's "Audio Only" render preset is actually rendering a QuickTime MOV file instead of a WAV file. So, the tool is re-converting that to Linear PCM, and only after that is the file passed to the AI. Unfortunately, I couldn't find a way to select Wave, Linear PCM using the API.

In our studio, we use a "transcription_WAV" render preset: go to the Resolve Render Page, select the Audio Only preset, make sure that the "Export Video" in the Video tab is disabled, then, in the "Audio" tab, select the "Wave" format and "Linear PCM" as codec. Then save this preset as "transcription_WAV", and the next time you transcribe, you should see Resolve rendering .wav files for transcriptions instead of .mov, and the process will use a bit less resources and take less time.
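
For anyone curious what a MOV-to-Linear-PCM conversion step looks like outside Resolve, here is a minimal Python sketch that builds an ffmpeg command for it. It is only an illustration under assumptions: the function name build_wav_command is hypothetical, the 16 kHz mono settings are chosen because that is what Whisper works with internally, and the tool's own conversion may use different settings or a different library.

```python
def build_wav_command(src, dst, sample_rate=16000, ffmpeg="ffmpeg"):
    """Build an ffmpeg argument list that extracts audio as 16-bit PCM WAV.

    Hypothetical helper; it only builds the command and runs nothing.
    """
    return [
        ffmpeg,
        "-y",                    # overwrite the output file if it exists
        "-i", src,               # input file, e.g. the rendered .mov
        "-vn",                   # ignore any video stream
        "-acodec", "pcm_s16le",  # Linear PCM, 16-bit little-endian
        "-ar", str(sample_rate), # output sample rate in Hz
        "-ac", "1",              # downmix to mono
        dst,
    ]
```

The resulting list can be handed to subprocess.run(cmd, check=True) on a machine where ffmpeg is installed. Rendering WAV directly from Resolve with the preset described above makes this extra step unnecessary.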

I'll write this up in the README with the new update. Thanks for the feedback!

BTW: I'm grateful to Blackmagic for even providing the API in the first place, but we're definitely pushing the boundaries on what can be done with it. Finding workarounds is really an adventure for most features that involve Resolve in the tool. :D

Octavian Mot

  • Posts: 286
  • Joined: Mon Aug 25, 2014 2:42 pm
  • Location: Germany

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostSat Nov 12, 2022 11:25 am

Videoneth wrote:Is there a way in Davinci to write out subtitles one word at the time like on YouTube?
but it could : https://github.com/jianfch/stable-ts

I'd love to make that available, but unfortunately the results are not good in that example. The algorithm has the tendency to round the seconds for each word (you can see that on the link you sent) and we're losing a lot of precision and most likely context like that. Feel free to open an issue / feature request on the GitHub page and we can decompose the problem in a more technical manner. :-D

We see this tool evolving into something like an AI Assistant Editor which is able to help you find relevant content in your footage, rather than just a transcription tool. So, the transcriptions are just a means to an end and, as long as they're precise, word timings are not really necessary for the bigger picture in my opinion.

CougerJoe

  • Posts: 345
  • Joined: Wed Sep 18, 2019 5:15 am
  • Real Name: bob brady

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostSat Nov 12, 2022 12:07 pm

Octavian Mot wrote:
In our studio, we use a "transcription_WAV" render preset: go to the Resolve Render Page, select the Audio Only preset, make sure that the "Export Video" in the Video tab is disabled, then, in the "Audio" tab, select the "Wave" format and "Linear PCM" as codec. Then save this preset as "transcription_WAV", and the next time you transcribe, you should see Resolve rendering .wav files for transcriptions instead of .mov, and the process will use a bit less resources and take less time.

I'll write this up in the README with the new update. Thanks for the feedback!



That worked perfectly, thank you!
Although it appeared to be using more GPU and more VRAM, it was actually 8% slower doing the same transcription as before. The time I'm referring to is the one shown here:
INFO: Finished transcription for Timeline 1 in XX seconds

Single-word subtitles are popular on social media, this sort of thing:
https://www.youtube.com/shorts/dZklZVaU4AI
It would be a bonus if we could do that easily. I will need to look into TEXT+ , see if there is a method to quickly convert subs to TEXT+

Octavian Mot

  • Posts: 286
  • Joined: Mon Aug 25, 2014 2:42 pm
  • Location: Germany

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostSat Nov 12, 2022 12:35 pm

CougerJoe wrote:I will need to look into TEXT+ , see if there is a method to quickly convert subs to TEXT+

There's already a similar request here: https://github.com/octimot/StoryToolkitAI/issues/14

If you find out something useful, let us know!

rnbaker

  • Posts: 60
  • Joined: Thu Aug 22, 2019 10:40 am
  • Real Name: Ray Baker

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostSat Nov 12, 2022 6:38 pm

CougerJoe wrote:It would be a bonus if we could do that easily. I will need to look into TEXT+ , see if there is a method to quickly convert subs to TEXT+

Someone recently started a thread on reddit about this. There's not much there though except maybe https://resolver.tools/subsimple/ (it is only a workaround at best obviously but maybe it will help a bit)

https://www.reddit.com/r/davinciresolve ... _keyframe/

And, please let us know if you find something better or if you start a thread on here (BMD's forums).

CougerJoe

  • Posts: 345
  • Joined: Wed Sep 18, 2019 5:15 am
  • Real Name: bob brady

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostSun Nov 13, 2022 2:02 am

rnbaker wrote:
CougerJoe wrote:It would be a bonus if we could do that easily. I will need to look into TEXT+ , see if there is a method to quickly convert subs to TEXT+

Someone recently started a thread on reddit about this. There's not much there though except maybe https://resolver.tools/subsimple/ (it is only a workaround at best obviously but maybe it will help a bit)

https://www.reddit.com/r/davinciresolve ... _keyframe/

And, please let us know if you find something better or if you start a thread on here (BMD's forums).


I shall. Looks like that workaround involves a $300 tool

@Octavian Mot Have a look at this video, it loses sync around 1:50, regains sync around 3:15 , do you understand the cause?
https://streamable.com/thcieo

rnbaker

  • Posts: 60
  • Joined: Thu Aug 22, 2019 10:40 am
  • Real Name: Ray Baker

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostSun Nov 13, 2022 2:15 am

CougerJoe wrote:
I shall. Looks like that workaround involves a $300 tool

@Octavian Mot Have a look at this video, it loses sync around 1:50, regains sync around 3:15 , do you understand the cause?
https://streamable.com/thcieo

Wow, that's too bad about the workaround. I was wondering about the subtitles losing sync too because I had that happen a bit with StoryToolkitAI but I just manually adjusted the subtitles to put them back in sync. I was just hoping it would get fixed in a future release and really I should have said something too but I am certainly glad you did! However, also, I think my project is 23.976 fps and I am thinking that might be it because he had said something about DR API recognizing it at 23 fps at times (or something like that) and causing issues for StoryToolkitAI.

CougerJoe

  • Posts: 345
  • Joined: Wed Sep 18, 2019 5:15 am
  • Real Name: bob brady

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostSun Nov 13, 2022 3:25 am

rnbaker wrote: I was wondering about the subtitles losing sync too because I had that happen a bit with StoryToolkitAI but I just manually adjusted the subtitles to put them back in sync. I was just hoping it would get fixed in a future release and really I should have said something too but I am certainly glad you did! However, also, I think my project is 23.976 fps and I am thinking that might be it because he had said something about DR API recognizing it at 23 fps at times (or something like that) and causing issues for StoryToolkitAI.


I tried the same video using Translation instead of transcribe, and Large models instead of Medium, it synced subs perfectly throughout the whole video. Unsure if either of those options helped or if desync is some transient bug

(This is the video I was testing)

rnbaker

  • Posts: 60
  • Joined: Thu Aug 22, 2019 10:40 am
  • Real Name: Ray Baker

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostSun Nov 13, 2022 1:43 pm

CougerJoe wrote:I tried the same video using Translation instead of transcribe, and Large models instead of Medium, it synced subs perfectly throughout the whole video. Unsure if either of those options helped or if desync is some transient bug

@Octavian for you too obviously...

That is interesting and I was thinking about it more, and obviously while it desync'ed on my 23.976fps timeline/project, the issue with the 23.976fps is with the transcript sync highlighting when you have the timeline open in DR, but the subtitle creation is based on audio files and not video files so fps shouldn't matter.

Also, I was using the medium English-only model and normal transcribing mode when StoryToolkitAI desync'ed, and from what I saw in your test video, it seems very close to what you experienced. I would say the desync'ing I experienced was always within +/- .25 to 2.5 seconds and would return to being in sync once again and then out of sync and then back into sync again, and on and on.

Octavian Mot

  • Posts: 286
  • Joined: Mon Aug 25, 2014 2:42 pm
  • Location: Germany

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostSun Nov 13, 2022 6:04 pm

Thanks for the feedback! I really appreciate it!

There's a bunch of info to unpack, and I need to look over all your comments with the tool in front of me. It would be great if you guys could use the issues tab on the GitHub page, since we can deal with them in order and benefit from the community over there. Some have already been debated there, I think.

Word based animation etc.
Regarding the text+ feature, I see it more like a VFX upgrade / nice to have, rather than pure editing, so my instinct tells me that we should focus on editing features first, unless we find help. A big next step would be to push the AI search features as much as possible, since we're also directly benefiting from them on our current projects - I'd love to explain that more some time, btw.

About losing sync
The large model is better than the medium model (especially on non-English languages), but also has its biases here and there.

Another thing to mention is that the 23.976fps timelines are problematic for the Resolve API since we're getting either a 23fps or 24fps rounded integer from Resolve, instead of the correct float - see known issues on the Github page. I think I've reported this months ago on the forum here too and others have confirmed it.
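
To give a feel for why an integer frame rate matters here, the following Python sketch computes how far a position drifts when a 23.976 fps timeline (24000/1001) is treated as 23 or 24 fps. It is illustrative arithmetic only: the function names are hypothetical, and it does not claim to reproduce exactly how the tool or the Resolve API converts time to frames.

```python
TRUE_FPS = 24000 / 1001  # the exact rate usually written as 23.976


def frame_error(seconds, reported_fps, true_fps=TRUE_FPS):
    """Frames of error at `seconds` when using reported_fps instead of true_fps."""
    return seconds * (reported_fps - true_fps)


def time_error(seconds, reported_fps, true_fps=TRUE_FPS):
    """The same error expressed in seconds of real timeline time."""
    return frame_error(seconds, reported_fps, true_fps) / true_fps


if __name__ == "__main__":
    for reported in (23, 24):
        for position in (60, 600, 1001):
            print(reported, position, round(time_error(position, reported), 3))
```

With 24 fps the error grows slowly (24 frames, or about one second, after 1001 seconds), while 23 fps drifts much faster and in the opposite direction.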

But, this might also just be the AI getting lazy every now and then and just acting like a child :-)

Currently, you can re-align phrases using shortcuts directly from the app, and I think we could automate the re-alignment with another AI model soon. Again, if you have feedback on whether this would be useful, it would be amazing to debate it on GitHub.

Another thing that helps in our editing room, is to select the segments that are off with V, and simply re-transcribe using key T - I would avoid however re-transcribing segments that are less than 20-30 sec long with anything less than the large model because the AI will be missing a lot of context to get to better results than in the first pass (I might be wrong though)

studio1492

  • Posts: 654
  • Joined: Fri Oct 26, 2018 6:22 pm
  • Real Name: Fran Navas

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostMon Nov 14, 2022 3:00 pm

+1 Just subscribed to this thread.

I've tested this utility on macOS and it works great. Sync is far from perfect, as are translations, and the install process is not a road of roses, but once it runs, it is fully worth it, as it saves 95% of the work compared to handmade captions. I hope BMD acquires this utility to be included in future versions of DVR Studio, while the author gets compensated for such incredible work. Thanks
- MBP 14" M1 Pro 16GB, 1TB, 10 core CPU, 16 core GPU.
- Resolve Studio 18.6.4 @ macOS 13.6.2
- Mini Panel v2.0
- Speed Editor (gathering dust until killer custom keys arrive)
- Synology DS218

Paddywack0

  • Posts: 62
  • Joined: Wed Sep 01, 2021 6:04 pm
  • Real Name: Nick Elborough

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostMon Nov 14, 2022 8:41 pm

Just installed and started using this on Win10 and it is amazing. Works really well. Although the tool is in its infancy, the developer is really responsive to questions and requests if you message him on GitHub.
Windows 10 Pro - Version Build 19043.1766
HP z820 24core Dual 3.5 Xeon E5
128Gb Memory
RTX 2080ti graphics

Vadim Tyupalov

  • Posts: 18
  • Joined: Tue Feb 09, 2021 9:23 pm
  • Real Name: Vadim Tyupalov

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostTue Nov 15, 2022 9:59 am

I've tried it and it works really well! A 50-minute interview was transcribed from Russian in only 5 minutes on an RTX 2060 6GB card, and it's faster and even more accurate than the Adobe Sensei algorithm. A couple of questions:
1) Can you add support for translating into other languages supported by Whisper, not only English? I guess it could be done the same way you choose the transcription language: you would then also choose the translation language. It could be very handy for adding multiple subtitles to YouTube content, because Whisper's accuracy is far better than Google's automatic translation.

2) Can you add support for exporting a .txt file with timestamps? It's very useful when you are working with a journalist, because they can find their way around the video faster with the text.

Also, I'm waiting for the speaker recognition feature; it would be very handy for the job I described in question 2.

By the way, you've done a great job and I really appreciate it. Speech transcription is the last reason I still have to use Premiere Pro sometimes, and now the time of not using it at all is closer than ever. I hope you will release a nice app soon, so that the installation process will be easier and won't require the command line to operate this tool.

Octavian Mot

  • Posts: 286
  • Joined: Mon Aug 25, 2014 2:42 pm
  • Location: Germany

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostTue Nov 15, 2022 3:38 pm

Vadim Tyupalov wrote:1) Can you add support for translating into other languages that Whisper supports, not only English? I guess it could work the same way as choosing the transcription language: you would then choose the language to translate the transcription into. It would be very handy for adding multiple subtitle tracks to YouTube content, because Whisper's accuracy is far better than Google's automatic translation.
Unfortunately, translating into languages other than English cannot be done with the current Whisper models, since they have only been trained to translate into English. Maybe someone wants to take up the challenge and train models for other languages. We could discuss it on the GitHub page.
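
For anyone curious, here is a rough sketch of how the two Whisper tasks are selected, assuming the open-source openai-whisper Python package. The helper names `whisper_options` and `run_whisper` and the file name are made up for illustration, and this is not necessarily how StoryToolkitAI calls Whisper internally.

```python
def whisper_options(source_language=None, translate_to_english=False):
    """Build keyword options for Whisper's transcribe() call.

    Whisper offers two tasks: "transcribe" (text stays in the spoken
    language) and "translate" (output is always English).
    """
    options = {"task": "translate" if translate_to_english else "transcribe"}
    if source_language is not None:
        # Language code such as "ru" or "ur"; if omitted, Whisper auto-detects.
        options["language"] = source_language
    return options


def run_whisper(audio_path, model_name="medium", **options):
    """Run Whisper on a file (requires `pip install openai-whisper`)."""
    import whisper  # imported lazily so the helper above works without it

    model = whisper.load_model(model_name)
    return model.transcribe(audio_path, **options)


# Hypothetical usage: a Russian interview, translated into English text.
# result = run_whisper("interview.wav", **whisper_options("ru", translate_to_english=True))
# print(result["text"])
```

There is no option for a target language other than English, which is why a separate translation step would be needed for other subtitle languages.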

Vadim Tyupalov wrote:2) Can you add support for exporting a .txt file with timestamps? It's very useful when you are working with a journalist, because they can find their way around the video faster with the text.
This was just added on the non-standalone version (see discussion here: https://github.com/octimot/StoryToolkit ... 1315419549) and will soon be available on the standalone release.
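
As an illustration of what such an export can look like, here is a minimal sketch that turns Whisper-style segments (dicts with `start`, `end` and `text`) into timestamped text lines. The function names and the `[HH:MM:SS]` layout are assumptions for the example, not the format the tool actually writes.

```python
def format_timestamp(seconds):
    """Format seconds as HH:MM:SS (fractions of a second are truncated)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def segments_to_text(segments):
    """Return one '[HH:MM:SS] text' line per transcript segment."""
    lines = []
    for seg in segments:
        lines.append(f"[{format_timestamp(seg['start'])}] {seg['text'].strip()}")
    return "\n".join(lines)


# Made-up example segments in the shape Whisper returns them.
example = [
    {"start": 0.0, "end": 4.2, "text": " Hello and welcome."},
    {"start": 3725.9, "end": 3730.0, "text": " That was an hour ago."},
]

print(segments_to_text(example))
```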

studio1492 wrote:Sync is far to be perfect, as well as translations
It would be super helpful to give more details on the issues page over on Github, since a lot of the results can be improved via transcription settings or simply by using a larger model. Things like source language, audio length etc. would be good to know.

studio1492 wrote:the install process is not a road of roses
This will be simplified even more soon!
Trying to keep it together at mots.us
Taming AI for filmmaking at StoryToolkit.ai

Graham Kay

  • Posts: 21
  • Joined: Sun May 12, 2019 3:45 pm
  • Real Name: Graham Kay

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostTue Nov 15, 2022 6:53 pm

Thank you, Octavian!

This is pretty amazing. As others have said, it's way ahead of the auto transcription tool in Premiere Pro - mainly in terms of how accurate it is, but also, despite not being integrated into Resolve, it's actually more straightforward and efficient to use once it is up and running.

I tested it on dialogue recorded by children speaking a regional accent (of English), which Premiere consistently fails with. It was completely accurate, even when one speaker came up with a spoonerism; the AI put the right consonants back in the right places! Wow. It also knew to capitalise proper nouns that weren't necessarily obvious from the context.

I also tested it with poorly recorded dialogue of elderly Urdu speakers and asked it to transcribe and translate into English. Again, flawless.

I will be using this a lot and look forward to future refinements.

Thanks again - I don't usually make the effort to write feedback on these things, but this has blown me away.
____________________________

DR Studio 18.0
Win10Pro 21H2/19044.2130
i7-3770K @ 3.5GHz
32GB RAM
GTX 1060 6GB "Studio" driver 512.15

Octavian Mot

  • Posts: 286
  • Joined: Mon Aug 25, 2014 2:42 pm
  • Location: Germany

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostSat Nov 19, 2022 7:40 am

Thank you for taking the time to write down your feedback!

I just uploaded a new standalone app version which took into consideration some of your ideas: https://github.com/octimot/StoryToolkitAI/releases

Among other things, we've been playing in our editing room with the new Transcript Groups feature, which lets the user select segments from the transcript and turn them into groups that can be used later for different operations. We use it a lot to group what people say by topic or even by speaker. Although it might not seem important, this is a prerequisite for starting work on automatic speaker recognition, and it will also be used to filter advanced search results (i.e. you could perform searches only on certain groups if you want to).
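
To make the idea concrete, here is a toy sketch of groups used as a search filter. The data layout (group name mapped to a set of segment indices) and the function `search_segments` are purely hypothetical, and the search is a plain substring match rather than the semantic search the tool performs.

```python
def search_segments(segments, query, groups=None, only_groups=None):
    """Return indices of segments whose text contains `query` (case-insensitive).

    groups: dict mapping a group name to a set of segment indices.
    only_groups: optional list of group names to restrict the search to.
    """
    if only_groups is None:
        candidates = range(len(segments))
    else:
        allowed = set()
        for name in only_groups:
            allowed |= set((groups or {}).get(name, ()))
        candidates = sorted(allowed)
    needle = query.lower()
    return [i for i in candidates if needle in segments[i]["text"].lower()]


# Made-up transcript and groups.
segments = [
    {"text": "We started filming in spring."},
    {"text": "The budget was tight."},
    {"text": "Filming wrapped in autumn."},
]
groups = {"schedule": {0, 2}, "money": {1}}

print(search_segments(segments, "filming"))                     # whole transcript
print(search_segments(segments, "filming", groups, ["money"]))  # one group only
```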

We're currently planning a feature that would allow you to search semantically within your own Resolve marker notes, and even to use Resolve markers to divide transcript sections into groups.

FYI, after doing more testing it seems that the complicated installation steps might not be needed for most users. I tried to explain more about it on the release page (see Installation section).
Trying to keep it together at mots.us
Taming AI for filmmaking at StoryToolkit.ai

Robert Niessner

  • Posts: 5022
  • Joined: Thu Feb 21, 2013 9:51 am
  • Location: Graz, Austria

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostSat Nov 19, 2022 12:07 pm

Octavian, very cool work and very generous to make this freely available, thank you!

A quick question, as I haven't found an answer to this:
Would this also run with Resolve 17 or is there something in version 18 only you need specifically for this to work?
Saying "Thx for help!" is not a crime.
--------------------------------
Robert Niessner
LAUFBILDkommission
Graz / Austria
--------------------------------
Blackmagic Camera Blog (German):
http://laufbildkommission.wordpress.com

Read the blog in English via Google Translate:
http://tinyurl.com/pjf6a3m

Octavian Mot

  • Posts: 286
  • Joined: Mon Aug 25, 2014 2:42 pm
  • Location: Germany

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostSat Nov 19, 2022 3:29 pm

Robert Niessner wrote:Would this also run with Resolve 17 or is there something in version 18 only you need specifically for this to work?

We've been using an iteration of this since Resolve 17, and only made small changes to the Resolve API communication module, so I don't see a reason for it not to work, unless some API functionality was dropped since - and in that case, only those features might not work. (Quick edit: 17 was using Python 3.6, so it might be that the code used there is not up to date to support some of the features available in 3.9 - a version that Resolve 18 supports, and we use for the tool - but, again, this might not be a problem)
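
If you want to check which interpreter you are running before trying it with Resolve 17, a quick sketch like this would do. The 3.9 minimum is taken from the note above; the helper name `meets_minimum` is made up for the example.

```python
import sys


def meets_minimum(version_info, minimum=(3, 9)):
    """True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if not meets_minimum(sys.version_info):
    print(
        f"Warning: running Python {sys.version_info.major}.{sys.version_info.minor}; "
        "some features may need 3.9 or newer."
    )
```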

From a broader perspective, you don't need to have Resolve installed on your machine, so if you simply need transcripts, translations to English, SRT subtitles to import into Resolve or other NLEs, or even perform advanced searches on transcripts (or your own existing subtitles), you can just start it up without Resolve and use your own audio to transcribe etc. This is also useful I think for situations where you need an assistant, a producer etc. to just review and group stuff for you on the transcripts, while not having Resolve on their machines.
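
For reference, this is roughly what writing SRT subtitles from Whisper-style segments involves. It is a minimal sketch assuming segments are dicts with `start`, `end` and `text`; the function names are hypothetical and this is not the tool's own export code.

```python
def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm, the timestamp form used in SRT files."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Build SRT text: numbered cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)


# Made-up example segment.
print(segments_to_srt([{"start": 0.0, "end": 2.5, "text": " Hello."}]))
```

The resulting text can be saved with a .srt extension and imported into Resolve or another NLE.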

The thing that the integration with Resolve does is that it opens a new way of navigating timelines and finding spoken text within timelines in Resolve and that really speeds things up for us in the edit. And some fun features like copying markers between timelines and clips, or rendering out stills from markers etc.
Trying to keep it together at mots.us
Taming AI for filmmaking at StoryToolkit.ai

Robert Niessner

  • Posts: 5022
  • Joined: Thu Feb 21, 2013 9:51 am
  • Location: Graz, Austria

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostSat Nov 19, 2022 4:55 pm

Very cool, thanks for this detailed answer. Already thought that the Python version might be the only difference. Hopefully all the Python versions I already installed to play with Stable Diffusion and other AI software won't meddle with this.

I'll test your tool with Resolve 17 as soon as possible and will give feedback of any issues I might encounter with the older Python version of Resolve 17.
Saying "Thx for help!" is not a crime.
--------------------------------
Robert Niessner
LAUFBILDkommission
Graz / Austria
--------------------------------
Blackmagic Camera Blog (German):
http://laufbildkommission.wordpress.com

Read the blog in English via Google Translate:
http://tinyurl.com/pjf6a3m

rnbaker

  • Posts: 60
  • Joined: Thu Aug 22, 2019 10:40 am
  • Real Name: Ray Baker

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostSat Nov 19, 2022 6:44 pm

Octavian,
I just want to update you about StoryToolkitAI going out of sync. I tried a completely different project on a completely different machine (I spun up a Paperspace machine with an Nvidia A4000), using 0.17.1 and the large model, transcribing English-to-English only. I got the same results as with the medium English-only model on 0.16.16 on the other project I told you about before: it stays in sync for a while, then goes out of sync (by roughly 0.25 to 2.5 seconds in either direction), then syncs again, and so on. I also retried the project I mentioned before (this time with 0.17.1 and the large model, again English-to-English only) and got the same pattern of in sync, out of sync, and so on.

These are longer projects (1+ hours), so maybe it is related to that, but it happens not only at the end and in the middle of the project but also at the beginning (first few minutes). Also, @CougarJoe experienced it with a shorter video but solved it by using translate + transcribe and the large model. I thought the problem might be solved just by using the large model, but unfortunately it wasn't. However, I will try again with translate + transcribe (even though it is just English-to-English) on 0.17.5 and report back. Anyway, could you maybe add this as a known issue? It seems @studio1492 experienced it as well, and I assume others have too.

By the way, I didn't install DaVinci Resolve on the Paperspace machine and just did a manual transcription (of .wav files that I created in DR on my personal computer). After every transcribed segment it gave an error saying that it couldn't find DR's API or something like that (sorry, it was very late so I didn't pay close attention). A suggestion would be to suppress this error after the first 2-3 times StoryToolkitAI encounters it, but obviously, please ignore this suggestion if it's my fault for not installing DR.
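
The kind of suppression suggested here could look something like the sketch below, which uses only Python's standard `logging` module. The class name, the message text and the limit of 3 are assumptions for illustration, not anything taken from the tool's code.

```python
import logging


class LimitedWarning:
    """Log a warning at most `max_repeats` times, then stay quiet."""

    def __init__(self, logger, message, max_repeats=3):
        self.logger = logger
        self.message = message
        self.max_repeats = max_repeats
        self.count = 0

    def warn(self):
        """Return True if the warning was logged, False if it was suppressed."""
        self.count += 1
        if self.count > self.max_repeats:
            return False
        # Tell the user on the last logged occurrence that it will go quiet.
        suffix = " (further warnings suppressed)" if self.count == self.max_repeats else ""
        self.logger.warning(self.message + suffix)
        return True


logger = logging.getLogger("transcriber")
resolve_missing = LimitedWarning(logger, "Resolve API not reachable")

# Simulate five transcription segments finishing without Resolve running.
results = [resolve_missing.warn() for _ in range(5)]
```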

Finally, on that Paperspace machine, I noticed that StoryToolkitAI transcription used only 7-10% of the A4000 at most, and only a few percent of the CPU. I don't know whether ST is restricted by OpenAI Whisper and can't scale to make better use of the hardware, but if it can, maybe add that to the list of things to look at in the future, along with support for multi-GPU setups.

Sorry for not opening a GitHub account yet but I will post my next update over there and here. Also, I hope I don't come off as complaining or difficult because this is awesome and I just want it to improve, and if I had any Python knowledge and time, I would definitely volunteer but hopefully in the future, I will!

Thanks for everything!

Octavian Mot

  • Posts: 286
  • Joined: Mon Aug 25, 2014 2:42 pm
  • Location: Germany

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostSun Nov 20, 2022 2:29 pm

@rnbaker

Thanks for taking the time to test and write down your feedback!

About transcription sync
I will write it up as a known issue soon. To add to what I was saying earlier on the issue: if we don't find a way to prompt Whisper correctly (as we do for dialogue and punctuation - see initial prompt in README), this can only be solved with an additional AI model that aligns the text after Whisper has done the transcription. It's a longer and a bit more technical conversation to have, which is why I suggest moving it to the GitHub page.

For our editing room, the current priority is to have correct transcriptions at approximately the right times, since we're doing a lot of semantic search at the moment and are also planning to add more AI functionality on that side (to help us find content more efficiently inside our footage). I realize that this tool is also super helpful for a transcribe->translate->subtitles workflow for some folks, so frame-accurate transcription timing will be addressed in the future. In other words: that's not the hard part for AI, we just need time to code it...

GPU usage etc.
This is another conversation that might turn technical very fast. The short version: yes, optimizations are needed and will be done and more benchmarks on different GPUs and machines like the one you were mentioning are much needed, so thanks again!
Trying to keep it together at mots.us
Taming AI for filmmaking at StoryToolkit.ai

Videoneth

  • Posts: 1698
  • Joined: Fri Nov 13, 2020 11:03 pm
  • Warnings: 1
  • Real Name: Maxwell Allington

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostMon Nov 21, 2022 12:21 am

I just did a git pull
But it's the first time I opened it with Resolve 18.1 so I don't know if it's because of Resolve or something else

:o
Attachments
bug.jpg (43.65 KiB)
Windows 10
19b
nVidia 3090 - 552.22

Octavian Mot

  • Posts: 286
  • Joined: Mon Aug 25, 2014 2:42 pm
  • Location: Germany

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostMon Nov 21, 2022 6:27 am

Others are reporting this for the non-standalone version, but we can't reproduce the issue on our machines. What screen size and resolution are you using?

Although we could pretend it's a design choice and leave it like that :D :D

Update: I pushed an update which hopefully fixed the problem. @Videoneth try a git pull and if it didn't help please DM me here on the forum or open an issue on Github.
Trying to keep it together at mots.us
Taming AI for filmmaking at StoryToolkit.ai

Videoneth

  • Posts: 1698
  • Joined: Fri Nov 13, 2020 11:03 pm
  • Warnings: 1
  • Real Name: Maxwell Allington

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostThu Nov 24, 2022 7:20 pm

Octavian Mot wrote:Others are reporting this for the non-standalone version, but we can't reproduce the issue on our machines. What screen size and resolution are you using?

Although we could pretend it's a design choice and leave it like that :D :D

Update: I pushed an update which hopefully fixed the problem. @Videoneth try a git pull and if it didn't help please DM me here on the forum or open an issue on Github.


I'm on Windows, scaling at 125%, and Resolve 18.1.1. I just did a git pull.
Maybe it is a problem related to how Resolve 18.1.1 handles resolution now :?

EDIT: just saw your last paragraph, gonna git pull now, but it says it's "Already up to date"...

It seems to work now, cool! Gonna use it now for a project, thanks again for your tool!
Attachments
screenshot.jpg (40.92 KiB)
Last edited by Videoneth on Thu Nov 24, 2022 7:29 pm, edited 1 time in total.
Windows 10
19b
nVidia 3090 - 552.22

Videoneth

  • Posts: 1698
  • Joined: Fri Nov 13, 2020 11:03 pm
  • Warnings: 1
  • Real Name: Maxwell Allington

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostThu Nov 24, 2022 7:28 pm

Btw, I'm curious, would it be possible to use customtkinter, so it could inherit the "theme" of the os? and match the dark tone of Resolve

I was watching a tutorial on it :
Windows 10
19b
nVidia 3090 - 552.22

Octavian Mot

  • Posts: 286
  • Joined: Mon Aug 25, 2014 2:42 pm
  • Location: Germany

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostThu Nov 24, 2022 8:17 pm

Videoneth wrote:Btw, I'm curious, would it be possible to use customtkinter, so it could inherit the "theme" of the os? and match the dark tone of Resolve


Sure, but isn't the wonderful style of the 2000s coming back soon? I'd hate to change the GUI theme to match 2030, and then find out that we have to go back to a Windows 98 SE look. :lol: :D

On a serious note, around 50% of the code that I wrote is the logic that connects all these wonderful AI models with our editing needs, but the other 50% is basically GUI and interaction, so a major change could mean rewriting a lot of the code. So I think it's prudent to first focus on polishing the main features that make our editing easier, add a bit more AI magic (better search, integration with even more advanced AI, footage ingesting for labelling and classification etc.), and only then focus on the design. Unless we find help... :)

Having said that, changing the color of the theme and maybe matching the buttons according to your Windows theme could be trivial (this is already the case on MacOS). Feel free to open this feature request / issue on Github and maybe we can work something out.
Trying to keep it together at mots.us
Taming AI for filmmaking at StoryToolkit.ai

David Johns

  • Posts: 12
  • Joined: Fri Jan 25, 2019 8:31 pm
  • Location: Birmingham, England
  • Real Name: David Johns

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostWed Dec 07, 2022 2:43 pm

I don't really understand GitHub or coding or anything, but I just wanted to post my grateful thanks for this amazing tool, which I just tried for the first time. On a ten-minute video, it got just one word wrong! And it left out loads of ums and other wasteful stuff which YouTube's auto captions leave in. THANK YOU!!!!

David
Resolve Studio 18.6 on a Macbook Air M1 with 16GB RAM

Ellory Yu

  • Posts: 4011
  • Joined: Wed Jul 30, 2014 5:25 pm

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostThu Dec 08, 2022 12:12 pm

Is there support coming for OpenCL gpu?
URSA Mini Pro 4.6K G2, Blackmagic Design Pocket Cinema Camera 6K, Panasonic GH5
PC Workstation Core I7 64Gb, 2 x AMD R9 390X 8Gb, Blackmagic Design DeckLink 4K Mini Monitor, Windows 10 Pro 64-bit, Resolve Studio 18, BM Micro Panel & Speed Editor

benoit

  • Posts: 41
  • Joined: Sat Sep 22, 2018 10:05 am
  • Location: Bretagne
  • Real Name: benoit evano

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostTue Dec 13, 2022 12:27 pm

Works very well for French transcription.
Super... thank you.
intel 6 core i7 5820k 3.3Ghz
34Gb ram
1x GTX 1070 ti
intensity card
Window10 - Resolve Studio 18.6.2

Cindrivani

  • Posts: 53
  • Joined: Fri Dec 03, 2021 7:50 pm
  • Location: Paris, France
  • Real Name: Fontaine David

Re: Free Transcriptions in Resolve using OpenAI Whisper

PostThu Dec 15, 2022 11:05 am

benoit wrote:Works very well for French transcription.
Super... thank you.


I agree, French transcription is much better than Premiere Pro's, but slower (the one probably explains the other).
About 95% of the words are well chosen and correctly spelled, and 99% of the cut points are well placed.

Thank you so much ! 8-)
DVR Studio 18.6.4 | MacOs 12.7.1 | MacPro 5.1 dual X5680 DDR3 96 Go | RadeonVII
Scratchdisk Sonnet M.2 4x4 2To
SpeedEditor | MicroPanel | DeckLink MiniMonitor 4K