Blackmagic Forum

Thu Nov 10, 2022 10:48 am

Uli Plank wrote:OMG, AI poetry from silence!

The hallucinations seem to be more pronounced in other languages besides English.

But, as I was explaining in the README:

there is a human-like AI trapped in your machine doing your job for you on a mechanical typewriter with missing keys... It has the right to have day-dreams too.

If it's hallucinating over non-silent segments of the audio, the solution is to select those segments using V and then re-transcribe only those time intervals by pressing T.
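For anyone calling Whisper from their own scripts rather than through the app, one possible way to catch this kind of "poetry from silence" is the `no_speech_prob` value that Whisper attaches to each segment. A minimal sketch, assuming segment dictionaries shaped like Whisper's output; the function name and the 0.6 cut-off are illustrative assumptions, and nothing here is confirmed to be what StoryToolkitAI does:

```python
from typing import Dict, List, Tuple


def split_by_speech_probability(
    segments: List[Dict], max_no_speech_prob: float = 0.6
) -> Tuple[List[Dict], List[Dict]]:
    """Split segments into (kept, suspicious) using each segment's no_speech_prob."""
    kept, suspicious = [], []
    for segment in segments:
        # Segments without the field are treated as speech (probability 0.0).
        if segment.get("no_speech_prob", 0.0) > max_no_speech_prob:
            suspicious.append(segment)
        else:
            kept.append(segment)
    return kept, suspicious


# Hand-written example data, not real Whisper output.
example_segments = [
    {"start": 0.0, "end": 5.88, "text": "Chart, Please eat", "no_speech_prob": 0.91},
    {"start": 5.88, "end": 9.2, "text": "Hello and welcome.", "no_speech_prob": 0.02},
]
kept, suspicious = split_by_speech_probability(example_segments)
print([s["text"] for s in suspicious])
```

Flagging the suspicious segments for review, rather than deleting them outright, is probably safer, since quiet speech can also score high.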

Thu Nov 10, 2022 10:56 am

By the way, the Windows standalone version for machines with CUDA GPUs is up on the release page https://github.com/octimot/StoryToolkit ... ses/latest.

It's in a super alpha version, but it should work in most cases if you follow the installation instructions.

Feel free to try it out and let us know if something's weird.

Thu Nov 10, 2022 2:55 pm

Uli Plank wrote:OMG, AI poetry from silence!

a i cummings.

Thu Nov 10, 2022 4:05 pm

Octavian,
Thanks for the incredible tool. This blows away the auto-transcription built into Premiere Pro and everything else I have tried. I just installed 0.17.1 for Windows and it worked great but I just want to share some warning/error messages with you...

1) "WARNING: FFMPEG_BINARY env variable is empty. Looking for ffmpeg in PATH."
2) "Unable to find module DaVinciResolveScript from $PYTHONPATH - trying default locations"
3) "librosa\util\decorators.py:88: UserWarning: PySoundFile failed. Trying audioread instead.
return f(*args, **kwargs)"

I didn't get any messages like this with 0.16.18, which I successfully installed and used yesterday. Also, I have FFMPEG installed via Chocolatey and even force-reinstalled it today after I saw the message, and I still get the message after the reinstall.

All of that said, the transcription worked flawlessly with 0.17.1 but I thought I would share those messages anyway just in case they might help you with StoryToolkitAI development and optimization.

Thanks again and keep up the great work!

(By the way, sorry, I don't have a GitHub account yet but I will soon.)

Thu Nov 10, 2022 4:41 pm

Robert Arnold wrote:It *does* in fact hallucinate when it encounters silence for a long time. I accidentally gave it a timeline that was missing audio, and it came up with this:

-------------

I have attached my sunflower seeds to possession.
So what do you say?
I'm fine, thank you.
Sure, I have a sunflower seed.
Let's go out to see it.

--------------

haha, I did the exact same thing. I just grabbed a random 1-minute clip to test it on, and after I ran it, I found out it just had some instrumentals on it. This is what I got...

[00:00.000 --> 00:05.880] Chart, Please eat
[01:00.000 --> 01:14.120] Thank you.

So, Mr.(?) Chart, waited almost a minute to respond and then took almost 15 seconds to start and finish, "thank you" or I guess...
"ttttttttttttthhhhhhhhhhhhaaaaaaaaaaaaaaaaannnnnnnkkkkkkkk
yyyyyyooooooooooooooooooooooooouuuuuuu"

lol

Thu Nov 10, 2022 4:49 pm

@rnbaker

Thanks for the feedback! I'm glad that it works for you!

None of the warnings are anything to worry about.

1) "WARNING: FFMPEG_BINARY env variable is empty. Looking for ffmpeg in PATH."

This means that the FFMPEG_BINARY environment variable isn't set, so the tool looks for ffmpeg in PATH instead. Since there's no warning/error after that, it means that FFMPEG was found.

2) "Unable to find module DaVinciResolveScript from $PYTHONPATH - trying default locations"

Same logic as above: this is the Resolve API throwing a warning. But as long as Resolve is connected with the app, it means that it found the module later on.

3) "librosa\util\decorators.py:88: UserWarning: PySoundFile failed. Trying audioread instead.
return f(*args, **kwargs)"

This says that it's using audioread instead of PySoundFile, but it's more or less the same thing for what we need.
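The first warning follows a common lookup pattern: check an environment variable, then fall back to PATH. A minimal sketch of that pattern, not the tool's actual code; `find_ffmpeg` is a hypothetical helper and only the warning text is taken from the message quoted above:

```python
import os
import shutil
from typing import Optional


def find_ffmpeg(env_var: str = "FFMPEG_BINARY") -> Optional[str]:
    """Return a path to ffmpeg, preferring the environment variable over PATH."""
    configured = os.environ.get(env_var, "").strip()
    if configured:
        return configured
    print(f"WARNING: {env_var} env variable is empty. Looking for ffmpeg in PATH.")
    # shutil.which returns None when no executable named "ffmpeg" is on PATH.
    return shutil.which("ffmpeg")
```

If this returns a path, the warning is harmless; only a `None` result would mean ffmpeg is really missing.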

@everyone

I'm curious if folks are using the Advanced Search feature.

This is the first step towards really making use of AI to find concepts in the transcribed footage, and we're getting more and more used to it in our editing room. We're going to incorporate more AI models/features soon, hopefully for video footage too.

Thu Nov 10, 2022 4:54 pm

rnbaker wrote:
Robert Arnold wrote:It *does* in fact hallucinate when it encounters silence for a long time. I accidentally gave it a timeline that was missing audio, and it came up with this:

-------------

I have attached my sunflower seeds to possession.
So what do you say?
I'm fine, thank you.
Sure, I have a sunflower seed.
Let's go out to see it.

--------------

haha, I did the exact same thing. I just grabbed a random 1-minute clip to test it on, and after I ran it, I found out it just had some instrumentals on it. This is what I got...

[00:00.000 --> 00:05.880] Chart, Please eat
[01:00.000 --> 01:14.120] Thank you.

So, Mr.(?) Chart, waited almost a minute to respond and then took almost 15 seconds to start and finish, "thank you" or I guess...
"ttttttttttttthhhhhhhhhhhhaaaaaaaaaaaaaaaaannnnnnnkkkkkkkk
yyyyyyooooooooooooooooooooooooouuuuuuu"

lol

I'm seriously considering we should post these somewhere on the GitHub page....

Thu Nov 10, 2022 4:55 pm

Octavian Mot wrote:@rnbaker

Thanks for the feedback! I'm glad that it works for you!

None of the warnings are anything to worry about.

No problem and thanks for the info!

Octavian Mot wrote: I'm seriously considering we should post these somewhere on the GitHub page....

lol, yep, you should!

Thu Nov 10, 2022 6:29 pm

Octavian Mot wrote:
Uli Plank wrote:OMG, AI poetry from silence!

The hallucinations seem to be more pronounced in other languages besides English.

But, as I was explaining in the README:
there is a human-like AI trapped in your machine doing your job for you on a mechanical typewriter with missing keys... It has the right to have day-dreams too.

If it's hallucinating over non-silent segments of the audio, the solution is to select those segments using V and then re-transcribe only those time intervals by pressing T.

Even if the actual software weren't like some sort of miracle, just the READ ME is worth the download!

BTW, I have an RX 6900XT running in my Hackintosh, which uses metal. Any chance of GPU utilization in future versions?

Thu Nov 10, 2022 7:38 pm

Attempting to get this working in Windows, no luck so far.

So I've installed StoryToolkitAI.0.17.1.WIN.exe and it runs. It finds FFmpeg, then displays
"Unable to find module DaVinciResolveScript from $PYTHONPATH - trying default locations".

I've ensured that Resolve is set to local scripting.

I run Resolve, load up a project, then run StoryToolkitAI.

I don't see any "Transcribe Timeline" button.

After about a minute, the StoryToolkitAI window disappears and nothing happens.

What am I missing??

Edited: I uninstalled a version of Python that I had installed last week while trying to get this working before. Now it runs.

Thu Nov 10, 2022 7:50 pm

I found that the tool is a bit slower than PP's; however, the actual transcription is a lot more accurate. Love the integration and the fact that I can export the subtitles right onto the timeline.

Thu Nov 10, 2022 8:47 pm

CodeTech wrote:Edited: I uninstalled a version of Python that I had installed last week while trying to get this working before. Now it runs.

I'm glad that it worked out! From all the reports I see, it seems that in most cases, the tool can't connect to the Resolve API on Windows due to a messed up Python environment / installation.
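For reference, the "trying default locations" message matches a fallback pattern along these lines. This is only a sketch: the folder paths are written from memory of the scripting README that ships with Resolve and should be checked against your own install, and both function names are hypothetical:

```python
import os
import sys
from typing import Optional


def default_resolve_modules_dir(platform: str = sys.platform) -> Optional[str]:
    """Return the assumed default Resolve scripting Modules folder for a platform."""
    if platform.startswith("darwin"):
        return ("/Library/Application Support/Blackmagic Design/"
                "DaVinci Resolve/Developer/Scripting/Modules")
    if platform.startswith("win"):
        program_data = os.environ.get("PROGRAMDATA", r"C:\ProgramData")
        return (program_data
                + r"\Blackmagic Design\DaVinci Resolve\Support\Developer\Scripting\Modules")
    if platform.startswith("linux"):
        return "/opt/resolve/Developer/Scripting/Modules"
    return None


def import_resolve_script():
    """Try a plain import first, then retry with the default folder on sys.path."""
    try:
        import DaVinciResolveScript as dvr_script
        return dvr_script
    except ImportError:
        modules_dir = default_resolve_modules_dir()
        if modules_dir and os.path.isdir(modules_dir):
            sys.path.append(modules_dir)
            try:
                import DaVinciResolveScript as dvr_script
                return dvr_script
            except ImportError:
                pass
    return None
```

A second Python installation can change which interpreter and which `sys.path` are in play, which would be consistent with the uninstall fixing things above.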

Thu Nov 10, 2022 10:13 pm

Now that I've played around with this for a while, I'm very impressed.
It did German->English perfectly.
Some of my own raw videos confused it slightly, but for the most part even quiet speech was correctly picked up and transcribed well.

This is an incredibly useful tool!

Fri Nov 11, 2022 3:41 pm

I've just stumbled upon this project while researching solutions to another related issue. I'm curious, are there plans to integrate a text to speech engine so as to provide a means to generate alternate language audio for projects? That would be useful for the training video projects that I am working on. BTW, awesome work.

Sat Nov 12, 2022 12:17 am

I have a friend who codes Python for a living. I've asked him to help, but he's very busy and may not have time for a while to set this up for me on my machine. I have no idea how to do this myself; is anyone able to give me a simple dot point explanation on how to set up on my iMac please?

Sat Nov 12, 2022 5:17 am

ColinMcT wrote: is anyone able to give me a simple dot point explanation on how to set up on my iMac please?

Sure, download it and then follow the instructions here:
https://github.com/octimot/StoryToolkitAI/releases.

They should be straightforward, but if you run into something we'll try to help.

Sat Nov 12, 2022 8:30 am

Is there a way in Davinci to write out subtitles one word at a time like on YouTube? I'm sure it can be done, but not with a set format. The name of the subtitle format that stores the time for each word has slipped my mind. But I would love to get it to work in DVR.

Sat Nov 12, 2022 9:06 am

Joelarvidsson wrote:Is there a way in Davinci to write out subtitles one word at a time like on YouTube?

I don't think toggling each word individually is possible using subtitles in Resolve. Maybe with a Text+ via Fusion Comp.

Also, the AI used for StoryToolkitAI will not give you word-level timings in your transcript, but only start and end times for each phrase, because it's focused more on the meaning and context of what is being said instead of reproducing each word individually - that's one of the reasons why the results are significantly better than other speech-to-text models. However, if that's something interesting to folks, we could consider an update which aligns the transcript to audio at the word level...
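Phrase-level start and end times are still enough to build regular subtitles. A minimal sketch, assuming segments are dictionaries with `start`, `end` (in seconds) and `text` keys, as in Whisper's output; the function names are made up and this is not necessarily how StoryToolkitAI writes its SRT files:

```python
from typing import Dict, List


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: List[Dict]) -> str:
    """Turn phrase-level segments into numbered SRT blocks."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(segment['start'])} --> {srt_timestamp(segment['end'])}\n"
            f"{segment['text'].strip()}\n"
        )
    # Blocks are separated by one blank line, as SRT expects.
    return "\n".join(blocks)
```

Word-by-word subtitles would need one such block per word, which is exactly the timing information the phrase-level output does not carry.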

Sat Nov 12, 2022 10:35 am

I get "Userwarning: PySoundFile failed. Trying audioread instead"

It's working fine, but what's the downside of not using PySoundFile (whatever that is)? You said it's much the same thing, but I saw that when PySoundFile tried to run it used all my VRAM, while audioread doesn't. Normally with AI software, the more VRAM you can use, the faster/better.
I'm using the Windows executable version.

Thanks. Very nice app!

Sat Nov 12, 2022 11:10 am

Octavian Mot wrote:
Joelarvidsson wrote:Is there a way in Davinci to write out subtitles one word at a time like on YouTube?

I don't think toggling each word individually is possible using subtitles in Resolve. Maybe with a Text+ via Fusion Comp.

Also, the AI used for StoryToolkitAI will not give you word-level timings in your transcript, but only start and end times for each phrase, because it's focused more on the meaning and context of what is being said instead of reproducing each word individually - that's one of the reasons why the results are significantly better than other speech-to-text models. However, if that's something interesting to folks, we could consider an update which aligns the transcript to audio at the word level...

but it could : https://github.com/jianfch/stable-ts

This script modifies methods of Whisper's model to gain access to the predicted timestamp tokens of each word (token) without needing additional inference. It also stabilizes the timestamps down to the word (token) level to ensure chronology.

It works well. Of course, it doesn't have the same use as a regular transcript. I used it myself so I don't have to break sentences down into words for animation.

Sat Nov 12, 2022 11:13 am

CougerJoe wrote:I get "Userwarning: PySoundFile failed. Trying audioread instead"

It's working fine, but what's the downside of not using PySoundFile (whatever that is)? You said it's much the same thing, but I saw that when PySoundFile tried to run it used all my VRAM, while audioread doesn't. Normally with AI software, the more VRAM you can use, the faster/better.
I'm using the Windows executable version.

This is probably happening because Resolve's "Audio Only" render preset is actually rendering a QuickTime MOV file instead of a wav file. So, the tool is re-converting that to Linear PCM, and only after that is the file passed to the AI. Unfortunately, I couldn't find a way to select Wave, Linear PCM using the API.

In our studio, we use a "transcription_WAV" render preset: go to the Resolve Render Page, select the Audio Only preset, make sure that the "Export Video" in the Video tab is disabled, then, in the "Audio" tab, select the "Wave" format and "Linear PCM" as codec. Then save this preset as "transcription_WAV", and the next time you transcribe, you should see Resolve rendering .wav files for transcriptions instead of .mov, and the process will use a bit less resources and take less time.
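If you already have a .mov from the default preset, the re-conversion step can also be done by hand with ffmpeg. A rough sketch of the command, built in Python; the helper name is hypothetical, and the 16 kHz mono settings are an assumption based on the audio format Whisper works with, not a confirmed detail of the tool:

```python
from pathlib import Path
from typing import List


def build_wav_command(source: str, sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg command that extracts mono 16-bit PCM audio to a .wav file."""
    target = str(Path(source).with_suffix(".wav"))
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", source,             # input file, e.g. Resolve's Audio Only .mov
        "-vn",                    # ignore any video stream
        "-acodec", "pcm_s16le",   # 16-bit little-endian Linear PCM
        "-ac", "1",               # downmix to mono
        "-ar", str(sample_rate),  # resample to the requested rate
        target,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once ffmpeg is on PATH.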

I'll write this up in the README with the new update. Thanks for the feedback!

BTW: I'm grateful to Blackmagic for even providing the API in the first place, but we're definitely pushing the boundaries on what can be done with it. Finding workarounds is really an adventure for most features that involve Resolve in the tool.

Sat Nov 12, 2022 11:25 am

Videoneth wrote:Is there a way in Davinci to write out subtitles one word at the time like on YouTube?
but it could : https://github.com/jianfch/stable-ts

I'd love to make that available, but unfortunately the results are not good in that example. The algorithm has the tendency to round the seconds for each word (you can see that on the link you sent) and we're losing a lot of precision and most likely context like that. Feel free to open an issue / feature request on the GitHub page and we can decompose the problem in a more technical manner. :-D

We see this tool evolving into something like an AI Assistant Editor which is able to help you find relevant content in your footage, rather than just a transcription tool. So, the transcriptions are just a means to an end and, as long as they're precise, word timings are not really necessary for the bigger picture in my opinion.

Sat Nov 12, 2022 12:07 pm

Octavian Mot wrote:
In our studio, we use a "transcription_WAV" render preset: go to the Resolve Render Page, select the Audio Only preset, make sure that the "Export Video" in the Video tab is disabled, then, in the "Audio" tab, select the "Wave" format and "Linear PCM" as codec. Then save this preset as "transcription_WAV", and the next time you transcribe, you should see Resolve rendering .wav files for transcriptions instead of .mov, and the process will use a bit less resources and take less time.

I'll write this up in the README with the new update. Thanks for the feedback!

That worked perfectly, thank you!
Although it appeared to be using more GPU and more VRAM, it was actually 8% slower doing the same transcription as before. The time I refer to is the one that shows here:
INFO: Finished transcription for Timeline 1 in XX seconds

Single word subtitles are popular on social media, this sort of thing
https://www.youtube.com/shorts/dZklZVaU4AI
It would be a bonus if we could do that easily. I will need to look into TEXT+ , see if there is a method to quickly convert subs to TEXT+

Sat Nov 12, 2022 12:35 pm

CougerJoe wrote:I will need to look into TEXT+ , see if there is a method to quickly convert subs to TEXT+

There's already a similar request here: https://github.com/octimot/StoryToolkitAI/issues/14

If you find out something useful, let us know!

Sat Nov 12, 2022 6:38 pm

CougerJoe wrote:It would be a bonus if we could do that easily. I will need to look into TEXT+ , see if there is a method to quickly convert subs to TEXT+

Someone recently started a thread on reddit about this. There's not much there though except maybe https://resolver.tools/subsimple/ (it is only a workaround at best obviously but maybe it will help a bit)

https://www.reddit.com/r/davinciresolve ... _keyframe/

And, please let us know if you find something better or if you start a thread on here (BMD's forums).

Sun Nov 13, 2022 2:02 am

rnbaker wrote:
CougerJoe wrote:It would be a bonus if we could do that easily. I will need to look into TEXT+ , see if there is a method to quickly convert subs to TEXT+

Someone recently started a thread on reddit about this. There's not much there though except maybe https://resolver.tools/subsimple/ (it is only a workaround at best obviously but maybe it will help a bit)

https://www.reddit.com/r/davinciresolve ... _keyframe/

And, please let us know if you find something better or if you start a thread on here (BMD's forums).

I shall. Looks like that workaround involves a $300 tool

@Octavian Mot Have a look at this video, it loses sync around 1:50, regains sync around 3:15 , do you understand the cause?
https://streamable.com/thcieo

Sun Nov 13, 2022 2:15 am

CougerJoe wrote:
I shall. Looks like that workaround involves a $300 tool

@Octavian Mot Have a look at this video, it loses sync around 1:50, regains sync around 3:15 , do you understand the cause?
https://streamable.com/thcieo

Wow, that's too bad about the workaround. I was wondering about the subtitles losing sync too because I had that happen a bit with StoryToolkitAI but I just manually adjusted the subtitles to put them back in sync. I was just hoping it would get fixed in a future release and really I should have said something too but I am certainly glad you did! However, also, I think my project is 23.976 fps and I am thinking that might be it because he had said something about DR API recognizing it at 23 fps at times (or something like that) and causing issues for StoryToolkitAI.

Sun Nov 13, 2022 3:25 am

rnbaker wrote: I was wondering about the subtitles losing sync too because I had that happen a bit with StoryToolkitAI but I just manually adjusted the subtitles to put them back in sync. I was just hoping it would get fixed in a future release and really I should have said something too but I am certainly glad you did! However, also, I think my project is 23.976 fps and I am thinking that might be it because he had said something about DR API recognizing it at 23 fps at times (or something like that) and causing issues for StoryToolkitAI.

I tried the same video using Translation instead of transcribe, and Large models instead of Medium, it synced subs perfectly throughout the whole video. Unsure if either of those options helped or if desync is some transient bug

(This is the video I was testing: [embedded video])

Sun Nov 13, 2022 1:43 pm

CougerJoe wrote:I tried the same video using Translation instead of transcribe, and Large models instead of Medium, it synced subs perfectly throughout the whole video. Unsure if either of those options helped or if desync is some transient bug

@Octavian for you too obviously...

That is interesting and I was thinking about it more, and obviously while it desync'ed on my 23.976fps timeline/project, the issue with the 23.976fps is with the transcript sync highlighting when you have the timeline open in DR, but the subtitle creation is based on audio files and not video files so fps shouldn't matter.

Also, I was using the medium English-only model and normal transcribing mode when StoryToolkitAI desync'ed, and from what I saw in your test video, it seems very close to what you experienced. I would say the desync'ing I experienced was always within +/- .25 to 2.5 seconds and would return to being in sync once again and then out of sync and then back into sync again, and on and on.

Sun Nov 13, 2022 6:04 pm

Thanks for the feedback! I really appreciate it!

There's a bunch of info to unpack, and I need to look over all your comments with the tool in front of me. It would be great if you guys could use the issues tab on the GitHub page, since we can deal with them in order and benefit from the community over there. Some have been debated there already, I think.

Word based animation etc.
Regarding the text+ feature, I see it more like a VFX upgrade / nice to have, rather than pure editing, so my instinct tells me that we should focus on editing features first, unless we find help. A big next step would be to push the AI search features as much as possible, since we're also directly benefiting from them on our current projects - I'd love to explain that more some time, btw.

About losing sync
The large model is better than the medium model (especially on non-English languages), but also has its biases here and there.

Another thing to mention is that the 23.976fps timelines are problematic for the Resolve API since we're getting either a 23fps or 24fps rounded integer from Resolve, instead of the correct float - see known issues on the Github page. I think I've reported this months ago on the forum here too and others have confirmed it.
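To see why a rounded frame rate matters, here is a small worked example. It assumes that "23.976" means exactly 24000/1001 frames per second and that a transcript time is converted to a frame number with the rounded rate; both function names are made up for illustration and this is not the tool's actual code:

```python
from fractions import Fraction

TRUE_FPS = Fraction(24000, 1001)  # the exact rate usually written as 23.976


def frame_at(seconds: float, fps) -> int:
    """Return the frame index that a time in seconds maps to at a given frame rate."""
    return int(Fraction(seconds) * Fraction(fps))


def drift_seconds(seconds: float, reported_fps: int) -> float:
    """How far from the intended time a frame computed with the wrong rate lands."""
    wrong_frame = frame_at(seconds, reported_fps)
    # Time at which that frame number actually plays on a 24000/1001 timeline.
    actual_time = wrong_frame / TRUE_FPS
    return float(actual_time - seconds)


for reported in (23, 24):
    print(reported, drift_seconds(600, reported))
```

Under these assumptions, a position ten minutes into the timeline lands about 0.6 seconds late with a reported 24 fps, and more than 24 seconds early with a reported 23 fps. The error grows with the distance from the start of the timeline.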

But, this might also just be the AI getting lazy every now and then and just acting like a child :-)

Currently, you can re-align phrases using shortcuts directly from the app and I think we could automate the re-alignment with another AI model soon - again, for feedback if this would be useful, it would be amazing to debate it on Github.

Another thing that helps in our editing room, is to select the segments that are off with V, and simply re-transcribe using key T - I would avoid however re-transcribing segments that are less than 20-30 sec long with anything less than the large model because the AI will be missing a lot of context to get to better results than in the first pass (I might be wrong though)
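That rule of thumb could be turned into a small helper that widens a selection that is too short before re-transcribing it. This is purely an illustration: `pad_interval` is a hypothetical function, the 30-second minimum comes from the estimate above, and the app itself is not known to do this:

```python
from typing import Tuple


def pad_interval(start: float, end: float, audio_length: float,
                 min_length: float = 30.0) -> Tuple[float, float]:
    """Widen [start, end] to at least min_length seconds, clamped to the audio."""
    missing = min_length - (end - start)
    if missing <= 0:
        return start, end
    # Pad both sides equally, without leaving the audio.
    new_start = max(0.0, start - missing / 2)
    new_end = min(audio_length, end + missing / 2)
    # If one side hit a boundary, give the leftover padding to the other side.
    shortfall = min_length - (new_end - new_start)
    if shortfall > 0:
        if new_start == 0.0:
            new_end = min(audio_length, new_end + shortfall)
        else:
            new_start = max(0.0, new_start - shortfall)
    return new_start, new_end
```

The trade-off is that the neighbouring phrases get re-transcribed too, so any manual fixes inside the widened range would be overwritten.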

Mon Nov 14, 2022 3:00 pm

+1 Just subscribed to this thread.

I've tested this utility on macOS and it works great. Sync is far from perfect, as are translations, and the install process is no bed of roses, but once it runs, it is fully worth it, as it saves 95% of the work compared with handmade captions. I hope BMD acquires this utility to include it in future versions of DVR Studio, while the author gets compensated for such incredible work. Thanks

Mon Nov 14, 2022 8:41 pm

Just installed and started using this on Win10 and it is amazing. Works really well. Although in its infancy the developer is really responsive to questions and requests if you message him on the Github.

Tue Nov 15, 2022 9:59 am

I've tried it and it works really well! A 50-minute interview in Russian was transcribed in only 5 minutes on an RTX 2060 6GB card, and it's faster and even more accurate than the Adobe Sensei algorithm. A couple of questions:
1) Can you add support for translating into other languages that are supported by Whisper, not only English? I guess it could be done the same way as choosing the transcription language: you would then choose the translation language for your transcription. It would be very handy for adding multiple subtitles to YouTube content, because the accuracy of Whisper is far better than Google's automatic translation.

2) Can you add support for exporting a .txt file with timestamps? It's very useful when you are working with a journalist, because they can orient themselves in the video faster with text.

Also, I'm waiting for the speaker recognition feature; it would be very handy for the job I described in question 2.

By the way, you've done a great job and I really appreciate it. Speech transcription is the last reason I still have to use Premiere Pro sometimes, and now the time of not using it at all is closer than ever. I hope you will release a polished app soon, and that the installation process will become easier and won't require the command line to operate the tool.

Tue Nov 15, 2022 3:38 pm

Vadim Tyupalov wrote:1) Can you add support for translating into other languages that are supported by Whisper, not only English? I guess it could be done the same way as choosing the transcription language: you would then choose the translation language for your transcription. It would be very handy for adding multiple subtitles to YouTube content, because the accuracy of Whisper is far better than Google's automatic translation.

Unfortunately, translations to languages other than English cannot be done with the current Whisper models, since they have only been trained to translate to English. Maybe someone wants to take on the challenge and train some other language models. We could discuss it on the GitHub page.

Vadim Tyupalov wrote:2) Can you add support for exporting a .txt file with timestamps? It's very useful when you are working with a journalist, because they can orient themselves in the video faster with text.

This was just added on the non-standalone version (see discussion here: https://github.com/octimot/StoryToolkit ... 1315419549) and will soon be available on the standalone release.
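Until that lands in the standalone release, a timestamped text file is easy to produce from segment data. A minimal sketch that mimics the `[MM:SS.mmm --> MM:SS.mmm]` style shown earlier in the thread; the function names are hypothetical and the app's real export format may differ:

```python
from typing import Dict, List


def clock(seconds: float) -> str:
    """Format seconds as MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    minutes, remainder = divmod(total_ms, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{minutes:02d}:{secs:02d}.{millis:03d}"


def segments_to_text(segments: List[Dict]) -> str:
    """Render one '[start --> end] text' line per segment."""
    lines = [
        f"[{clock(s['start'])} --> {clock(s['end'])}] {s['text'].strip()}"
        for s in segments
    ]
    return "\n".join(lines) + "\n"
```

Writing the result to disk is then a one-liner, for example `Path("interview.txt").write_text(text, encoding="utf-8")` with `pathlib`.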

studio1492 wrote:Sync is far from perfect, as are translations

It would be super helpful to give more details on the issues page over on Github, since a lot of the results can be improved via transcription settings or simply by using a larger model. Things like source language, audio length etc. would be good to know.

studio1492 wrote:the install process is no bed of roses

This will be simplified even more soon!

Tue Nov 15, 2022 6:53 pm

Thank you, Octavian!

This is pretty amazing. As others have said, it's way ahead of the auto transcription tool in Premiere Pro - mainly in terms of how accurate it is, but also, despite not being integrated into Resolve, it's actually more straightforward and efficient to use once it is up and running.

I tested it on dialogue recorded by children speaking a regional accent (of English), which Premiere consistently fails with. It was completely accurate, even when one speaker came up with a spoonerism; the AI put the right consonants back in the right places! Wow. It also knew to capitalise proper nouns that weren't necessarily obvious from the context.

I also tested it with poorly recorded dialogue of elderly Urdu speakers and asked it to transcribe and translate into English. Again, flawless.

I will be using this a lot and look forward to future refinements.

Thanks again - I don't usually make the effort to write feedback on these things, but this has blown me away.

Sat Nov 19, 2022 7:40 am

Thank you for taking the time to write down your feedback!

I just uploaded a new standalone app version which took into consideration some of your ideas: https://github.com/octimot/StoryToolkitAI/releases

Besides other things, we've been playing in our editing room with the new Transcript Groups feature that allows the user to select segments from the transcript and turn them into groups which may be used later for different operations. We use it a lot to group what people say by topics or even by speakers. Although it might not seem important, this is a prerequisite to start the work on auto speaker recognition, and it will also be used to filter out advanced search content (i.e. you could perform searches only on certain groups if you want to).

We're currently planning a feature that would allow you to search semantically within your own Resolve marker notes, and even to use Resolve markers to divide transcript sections into groups.

FYI, after doing more testing it seems that the complicated installation steps might not be needed for most users. I tried to explain more about it on the release page (see Installation section).

Sat Nov 19, 2022 12:07 pm

Octavian, very cool work and very generous to make this freely available, thank you!

A quick question, as I haven't found an answer to this:
Would this also run with Resolve 17 or is there something in version 18 only you need specifically for this to work?

Sat Nov 19, 2022 3:29 pm

Robert Niessner wrote:Would this also run with Resolve 17 or is there something in version 18 only you need specifically for this to work?

We've been using an iteration of this since Resolve 17, and only made small changes to the Resolve API communication module, so I don't see a reason for it not to work, unless some API functionality was dropped since - and in that case, only those features might not work. (Quick edit: 17 was using Python 3.6, so it might be that the code used there is not up to date to support some of the features available in 3.9 - a version that Resolve 18 supports, and we use for the tool - but, again, this might not be a problem)

From a broader perspective, you don't need to have Resolve installed on your machine, so if you simply need transcripts, translations to English, SRT subtitles to import into Resolve or other NLEs, or even perform advanced searches on transcripts (or your own existing subtitles), you can just start it up without Resolve and use your own audio to transcribe etc. This is also useful I think for situations where you need an assistant, a producer etc. to just review and group stuff for you on the transcripts, while not having Resolve on their machines.

The thing that the integration with Resolve does is that it opens a new way of navigating timelines and finding spoken text within timelines in Resolve and that really speeds things up for us in the edit. And some fun features like copying markers between timelines and clips, or rendering out stills from markers etc.

Sat Nov 19, 2022 4:55 pm

Very cool, thanks for this detailed answer. Already thought that the Python version might be the only difference. Hopefully all the Python versions I already installed to play with Stable Diffusion and other AI software won't meddle with this.

I'll test your tool with Resolve 17 as soon as possible and will give feedback of any issues I might encounter with the older Python version of Resolve 17.

Sat Nov 19, 2022 6:44 pm

Octavian,
I just want to update you about StoryToolkitAI going out of sync and that I tried a totally different project with the large model only transcribing English-to-English on a totally different machine (I spun up a Paperspace machine with an Nvidia A4000) using 0.17.1. And, I got the same results as using the medium English-only model with 0.16.16 on the other project that I told you about before and that is, it would stay in sync for a while and then go out of sync (+/-.25 to 2.5 seconds) and sync again and out of sync and on and on. I also tried the project I talked about before (but this time also with 0.17.1 and the large model, and it was also only English-to-English transcribing) and got the same results as before sync, out of sync, and on and on.

These are longer projects (1+ hours) so maybe it is related to that but at the same time, it happens not only at the end and middle of the project but also at the beginning (first few minutes). Also, @CougerJoe experienced it with a shorter video but solved it by using translate + transcribe and the large model. I thought maybe the problem would be solved by just using the large model but unfortunately, it obviously wasn't. However, I will try it again with translate + transcribe (even though it is just English-to-English) on 0.17.5 and report back. Anyway, could you maybe add this as a known issue as it seems also @studio1492 experienced this and I assume others have experienced this also?

By the way, I didn't install Davinci Resolve on the Paperspace machine and just did a manual transcribe (of .wav files that I created in DR on my personal computer) and it would give errors after every transcribing segment that it couldn't find DR's API or something like that (sorry it was very late so I didn't pay close attention). I guess a suggestion would be to maybe add something that stops this error after the first 2-3 times that StoryToolkitAI encounters this but obviously, please ignore this suggestion if it's my fault for not installing DR.

Finally, on that Paperspace machine, I noticed that StoryToolkitAI transcription only used 7-10% maximum of the A4000 and also, only a few percent of the CPU. I don't know if ST is restricted by OpenAI Whisper and can't scale and better utilize the system hardware but if ST can, maybe add that to the list of things to look at for the future including support for multiple GPU setups.

Sorry for not opening a GitHub account yet but I will post my next update over there and here. Also, I hope I don't come off as complaining or difficult because this is awesome and I just want it to improve, and if I had any Python knowledge and time, I would definitely volunteer but hopefully in the future, I will!

Thanks for everything!

Sun Nov 20, 2022 2:29 pm

@rnbaker

Thanks for taking the time to test and write down your feedback!

About transcription sync
I will write it up as a known issue soon. To add to what I was saying earlier on the issue: if we don't find a way to prompt Whisper correctly (as we do for dialogue and punctuation - see initial prompt in README), this can only be solved with an additional AI model that aligns the text after Whisper has done the transcription. It's a longer and a bit more technical conversation to have, so that's why I suggest moving it to the GitHub page.

For our editing room, the current priority is to have correct transcriptions at approximately the right times, since we're doing a lot of semantic search at the moment but also planning to add more AI functionality on that side (to help us find content more efficiently inside our footage). I realize that this tool is super helpful for a transcribe->translate->subtitles workflow too for some folks, so getting transcription times at frame level will be fixed in the future. In other words: that's not the hard part for AI, we just need time to code it...
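
To illustrate what "frame level" means here, a minimal sketch that snaps a segment time in seconds to the nearest frame and formats it as a timecode. The function name is hypothetical, it assumes an integer, non-drop-frame rate, and it only shows the representation; it does not fix the drift itself, which needs the alignment step described above:

```python
def seconds_to_timecode(seconds, fps=24):
    """Snap a time in seconds to the nearest frame, formatted as HH:MM:SS:FF.

    Assumes an integer, non-drop-frame frame rate.
    """
    total_frames = round(seconds * fps)       # nearest whole frame
    frames = total_frames % fps               # frame number within the second
    total_seconds = total_frames // fps
    hours, rem = divmod(total_seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}:{frames:02d}"
```

For example, a segment starting at 1.5 seconds on a 24 fps timeline lands on frame 12 of second 1.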

GPU usage etc.
This is another conversation that might turn technical very fast. The short version: yes, optimizations are needed and will be done and more benchmarks on different GPUs and machines like the one you were mentioning are much needed, so thanks again!

Mon Nov 21, 2022 12:21 am

I just did a git pull
But it's the first time I opened it with Resolve 18.1 so I don't know if it's because of Resolve or something else

Mon Nov 21, 2022 6:27 am

Others are reporting this for the non-standalone version, but we can't reproduce the issue on our machines. What screen size and resolution are you using?

Although we could pretend it's a design choice and leave it like that

Update: I pushed an update which hopefully fixes the problem. @Videoneth try a git pull and if it doesn't help, please DM me here on the forum or open an issue on GitHub.

Thu Nov 24, 2022 7:20 pm

Octavian Mot wrote:Others are reporting this for the non-standalone version, but we can't reproduce the issue on our machines. What screen size and resolution are you using?

Although we could pretend it's a design choice and leave it like that

Update: I pushed an update which hopefully fixes the problem. @Videoneth try a git pull and if it doesn't help, please DM me here on the forum or open an issue on GitHub.

I'm on Windows, scaling at 125%, and Resolve 18.1.1. I just did a git pull.
Maybe it is a problem related to how Resolve 18.1.1 handles resolution now.

EDIT, just saw your last paragraph, gonna git pull now, but it says it's "Already up to date"...

It seems to work now, cool! Gonna use it now for a project, thanks again for your tool!

Thu Nov 24, 2022 7:28 pm

Btw, I'm curious, would it be possible to use customtkinter, so it could inherit the "theme" of the os? and match the dark tone of Resolve

I was watching a tutorial on it :

Thu Nov 24, 2022 8:17 pm

Videoneth wrote:Btw, I'm curious, would it be possible to use customtkinter, so it could inherit the "theme" of the os? and match the dark tone of Resolve

Sure, but isn't the wonderful style of the 2000s coming back soon? I'd hate to change the GUI theme to match 2030, and then find out that we have to go back to a Windows 98 SE look. :lol:

On a serious note, around 50% of the code that I wrote is the logic that connects all these wonderful AI models with our editing needs, but the other 50% is basically GUI and interaction, so a major change could mean rewriting a lot of the code. So I think it's prudent to first focus on polishing the main features that make our editing easier, add a bit more AI magic (better search, integration with even more advanced AI, footage ingesting for labelling and classification etc.), and only then focus on the design. Unless we find help...

Having said that, changing the color of the theme and maybe matching the buttons to your Windows theme could be trivial (this is already the case on macOS). Feel free to open this feature request / issue on GitHub and maybe we can work something out.

Wed Dec 07, 2022 2:43 pm

I don't really understand GitHub or coding or anything but just wanted to post my grateful thanks for this amazing tool, which I just tried for the first time. On a ten-minute video, it got just one word wrong! And it left out loads of ums and other wasteful stuff which YouTube's auto captions leaves in. THANK YOU!!!!

David

Thu Dec 08, 2022 12:12 pm

Is there support coming for OpenCL GPUs?

Tue Dec 13, 2022 12:27 pm

Works very well for French transcription.
Super... thank you.

Thu Dec 15, 2022 11:05 am

benoit wrote:Works very well for French transcription.
Super... thank you.

I agree, French transcription is much better than PP's extraction, but slower (the one surely explains the other).
Words are 95% well chosen and well written, and cut points are 99% well placed.

Thank you so much ! 8-)
