Blackmagic Forum

Thu May 08, 2025 5:44 am

Good morning. As I reported a few days ago, I had run into a big problem with 20 beta 2: after training a model of my voice, I tried to reuse it through the voice converter on another existing audio clip, and as soon as I launched the command the program closed unexpectedly.
Yesterday I upgraded to the new beta 3 release and the problem is solved. I would say it works well, although naturally there is still room for improvement.
Keep in mind that I recorded parts of a book for 10 minutes using Audacity and a Tonor 30 USB microphone, then imported the MP3 file into DVR20 beta 2 and ran the audio learning procedure in the faster mode,
so I assume that using the slower one everything should improve.
A big thank you to the development team, always excellent and very quick to fix bugs. Really, thanks, you are unbeatable.

I am happy.
Marco from Milan

Thu May 08, 2025 6:08 am

levideo wrote: I recorded parts of a book for 10 minutes, then imported the MP3 file into DVR20 beta 2 and ran the audio learning procedure in the faster mode,
so I assume that using the slower one everything should improve.

Would you happen to know how long training on 10 minutes of audio took with your 12GB 4070 GPU in the faster mode?

Thu May 08, 2025 7:29 am

Hello, I have to admit that I started the operation and then left the PC for, if I remember correctly, a couple of hours.

I didn't know that it's possible to monitor the remaining time through an icon at the bottom right.

I will redo the operation in the slower, and therefore more accurate, mode, noting both the remaining time shown and the actual duration, and then report back to you.

I think that to maximize the use of this function, it is necessary to use only the longer procedure.

Additionally, it would be nice to have suggestions from the development team on how to best create the audio sample, to have guidelines on what to read and how to best refine everything.

For example, I would like to download video tutorials in other languages from YouTube, import them into DaVinci Resolve Studio 20, and have it transcribe them with subtitles in the desired language. Then I would like to work out a way to convert the subtitles into speech in another language and run the result through the voice converter with the desired voice.

What do you think?

Let me know, thank you very much, bye, Marco
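
As a rough illustration of the subtitle-to-voice idea in the post above, here is a minimal sketch that reads subtitle cues so each line could be handed to a text-to-speech service and placed back at its original time. It assumes the subtitle track has been exported as an .srt file; the `Cue` class and `parse_srt` function are hypothetical names for this example, and nothing here calls Resolve or any TTS service.

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds from the start of the clip
    end: float
    text: str


# SRT timestamps look like 00:01:02,345 (some tools write a dot instead of a comma).
_STAMP = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def _to_seconds(stamp: str) -> float:
    match = _STAMP.search(stamp)
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = map(int, match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(srt_text: str) -> list[Cue]:
    """Split SRT text into cues; blocks without a timing line are skipped."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        timing = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if timing is None:
            continue
        start, end = lines[timing].split("-->", 1)
        # Join multi-line subtitles into one sentence for the TTS step.
        text = " ".join(line.strip() for line in lines[timing + 1:] if line.strip())
        cues.append(Cue(_to_seconds(start), _to_seconds(end), text))
    return cues


# Made-up sample subtitles, only to show the shape of the data.
sample = "1\n00:00:01,000 --> 00:00:03,500\nHello\nworld\n\n2\n00:00:04,000 --> 00:00:05,000\nCiao\n"
for cue in parse_srt(sample):
    print(f"{cue.start:.1f}-{cue.end:.1f}s: {cue.text}")
```

Each cue's text would then go to whatever TTS service is chosen, and the generated clip could be run through the voice converter afterwards.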

Thu May 08, 2025 8:41 am

levideo wrote:
I think that to maximize the use of this function, it is necessary to use only the longer procedure.

Additionally, it would be nice to have suggestions from the development team on how to best create the audio sample, to have guidelines on what to read and how to best refine everything.

Yeah, I agree: train in Better mode unless there is a clear reason to use Faster. I've only tried it twice. My 5-minute audio sample trained reasonably well on Better, but my 9m20s sample (a different voice), also trained on Better, sounded terrible. The samples seemed like the same quality to me, but neither was a perfect studio recording. I will stick to a single voice and keep increasing the sample duration to see if I notice it getting better.

Trained voices should be usable for text to speech; it's the logical progression. Beta 3 is the first V20 I'm trying, and I was surprised to see that the generated voice model can't be used for text to speech. There is no text to speech of any sort.

I found this but don't know anything about it

Thu May 08, 2025 10:08 am

CougerJoe wrote:
levideo wrote:
I think that to maximize the use of this function, it is necessary to use only the longer procedure.

Additionally, it would be nice to have suggestions from the development team on how to best create the audio sample, to have guidelines on what to read and how to best refine everything.

Yeah, I agree: train in Better mode unless there is a clear reason to use Faster. I've only tried it twice. My 5-minute audio sample trained reasonably well on Better, but my 9m20s sample (a different voice), also trained on Better, sounded terrible. The samples seemed like the same quality to me, but neither was a perfect studio recording. I will stick to a single voice and keep increasing the sample duration to see if I notice it getting better.

Trained voices should be usable for text to speech; it's the logical progression. Beta 3 is the first V20 I'm trying, and I was surprised to see that the generated voice model can't be used for text to speech. There is no text to speech of any sort.

I found this but don't know anything about it

Very, very interesting. I saw that the plugin costs very little. Do you think it can be trusted, and will it work well with DaVinci Resolve Studio 20 beta 3? I'll try to contact the creator and see what he tells me.
Let me know. Thank you very much, bye, Marco

Thu May 08, 2025 11:21 pm

Yeah, that's the problem: there's not a lot of information, I don't know if it's compatible with V20B3, and there's no way to trial it first.

Fri May 09, 2025 6:02 am

CougerJoe wrote: Yeah, that's the problem: there's not a lot of information, I don't know if it's compatible with V20B3, and there's no way to trial it first.

Thank you for your reply. In fact it only costs €3.50 as a one-off payment with no monthly subscription, a really negligible price. I have asked the designer about compatibility with version 20 and am waiting for his reply. I will keep you updated on everything.
Bye bye, Marco

Fri May 09, 2025 8:59 am

levideo wrote:
Thank you for your reply. In fact it only costs €3.50 as a one-off payment with no monthly subscription, a really negligible price. I have asked the designer about compatibility with version 20 and am waiting for his reply. I will keep you updated on everything.
Bye bye, Marco

It looks like it interfaces with subscription services: Azure at $15 per million characters and MiniMax at $100 per million for its best quality. The Azure price does seem very reasonable, with 0.5 million characters free each month.
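
To put those prices in perspective, here is a small sketch of the arithmetic. It uses only the figures quoted in this post (not checked against the providers' current price lists) and assumes the free allowance is simply subtracted from the monthly total; the 2 million character workload and the function name `monthly_tts_cost` are made up for illustration.

```python
def monthly_tts_cost(characters: int, usd_per_million: float,
                     free_characters: int = 0) -> float:
    """Cost in USD for one month, after removing any free allowance."""
    billable = max(0, characters - free_characters)
    return billable / 1_000_000 * usd_per_million


# Hypothetical workload: 2 million characters of subtitles in a month.
workload = 2_000_000

# Prices as quoted in the post above (assumptions, not verified).
azure = monthly_tts_cost(workload, 15.0, free_characters=500_000)
minimax = monthly_tts_cost(workload, 100.0)

print(f"Azure:   ${azure:.2f}")
print(f"MiniMax: ${minimax:.2f}")
```

Anything at or below 0.5 million characters a month would cost nothing on the Azure figures above.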

I'd be happier using my trained Resolve AI voices as TTS in Resolve; maybe this will happen soon. Also, I trained a voice using 10- and 20-minute voice samples. In limited trials they were still quite similar in quality, and not very realistic. I will try a 1-hour sample next. Training on the 20-minute sample took 14 minutes and used between 17 and 18GB of VRAM on an RTX 4090. Going by tensor cores alone, your 4070 would train a 20-minute sample in about 45 minutes; in practice, however, it doesn't look to work like that, which I think is a VRAM issue.
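
The "tensor cores alone" estimate can be sketched as a simple proportional scaling. This is only a back-of-the-envelope model: the tensor-core counts below (512 for the RTX 4090, 184 for the RTX 4070) are assumptions to check against NVIDIA's spec sheets, the function name is hypothetical, and the model ignores clocks, memory bandwidth and VRAM limits.

```python
def scaled_training_minutes(reference_minutes: float,
                            reference_units: int,
                            target_units: int) -> float:
    """Scale a measured training time by the ratio of compute units."""
    if target_units <= 0:
        raise ValueError("target_units must be positive")
    return reference_minutes * reference_units / target_units


# Measured in the post above: 14 minutes for a 20-minute sample on an RTX 4090.
# Tensor-core counts are assumptions, not taken from this thread.
estimate = scaled_training_minutes(14, reference_units=512, target_units=184)
print(f"Estimated time on the 4070: {estimate:.0f} minutes")
```

With these assumed counts the estimate comes out a little under 40 minutes, the same ballpark as the figure quoted above. As the post says, real runs may not scale this way if VRAM is the bottleneck.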

Fri May 09, 2025 9:09 am

CougerJoe wrote:
levideo wrote:
Thank you for your reply. In fact it only costs €3.50 as a one-off payment with no monthly subscription, a really negligible price. I have asked the designer about compatibility with version 20 and am waiting for his reply. I will keep you updated on everything.
Bye bye, Marco

It looks like it interfaces with subscription services: Azure at $15 per million characters and MiniMax at $100 per million for its best quality. The Azure price does seem very reasonable, with 0.5 million characters free each month.

I'd be happier using my trained Resolve AI voices as TTS in Resolve; maybe this will happen soon. Also, I trained a voice using 10- and 20-minute voice samples. In limited trials they were still quite similar in quality, and not very realistic. I will try a 1-hour sample next. Training on the 20-minute sample took 14 minutes and used between 17 and 18GB of VRAM on an RTX 4090. Going by tensor cores alone, your 4070 would train a 20-minute sample in about 45 minutes; in practice, however, it doesn't look to work like that, which I think is a VRAM issue.

Hi friend, thanks a lot for your reply.
Here is the reply from the plugin's developer:

@LevideoHD Yes, this plugin does not require a monthly subscription, as you can use your API keys to access various services. If you do not have an API key, we also offer some free Azure-based voice options. Additionally, the plugin is compatible with all versions of DaVinci Resolve.

Fri May 09, 2025 9:15 am

What else do you suggest I ask the plugin designer?
Here is my PC hardware configuration:

https://ibb.co/4nqKxVCn

Fri May 09, 2025 9:21 am

Isn't that GPU error 702 caused by the RTX 3060 Ti graphics card which perhaps has lower hardware specifications than necessary?


Fri May 09, 2025 11:45 pm

levideo wrote:Isn't that GPU error 702 caused by the RTX 3060 Ti graphics card which perhaps has lower hardware specifications than necessary?

Most likely. It looks like 8GB+ is needed for AI Voice Convert, but maybe 12GB+ for training? We need more people with various VRAM sizes to try it out. At certain VRAM amounts training may not crash, but if it takes 5+ hours to train on a sample a few minutes long, as one person reported, with results that weren't very good, it may not seem worth using.
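
For anyone who wants to report VRAM figures while training, here is a minimal sketch that reads used and total memory from `nvidia-smi` (NVIDIA cards only). The query flags are standard `nvidia-smi` options; the function names and the sample line are made up for illustration and are not measurements from this thread.

```python
import subprocess

# Ask nvidia-smi for one CSV line per GPU, values in MiB without units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text: str) -> list[dict]:
    """Turn nvidia-smi CSV output into a list of per-GPU readings."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # Split from the right so a comma in the GPU name cannot break parsing.
        name, used, total = (field.strip() for field in line.rsplit(",", 2))
        gpus.append({"name": name, "used_mib": int(used), "total_mib": int(total)})
    return gpus


def read_gpu_memory() -> list[dict]:
    """Run nvidia-smi and parse its output (requires an NVIDIA driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)


# Illustrative sample line, not a real measurement.
sample_output = "NVIDIA GeForce RTX 4090, 17800, 24564\n"
for gpu in parse_gpu_memory(sample_output):
    print(f"{gpu['name']}: {gpu['used_mib']} / {gpu['total_mib']} MiB in use")
```

Calling `read_gpu_memory()` every few seconds during training and noting the peak would give comparable numbers across different cards.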

As for that TTS Resolve plugin, if you do buy and install it I'd be interested in your review in the main Resolve section, covering installation complexity, bugs and what needs improving. I saw it's still in beta.

Sat May 10, 2025 2:33 pm

Hi guys
Have any of you tried cleaning the voice first? How does DVR react to a recording that has been cleaned up with Voice Isolation?

Sun May 11, 2025 5:27 am

AndrewTheGreat wrote: Hi guys
Have any of you tried cleaning the voice first? How does DVR react to a recording that has been cleaned up with Voice Isolation?

The problem I find with Voice Isolation is that it's not as AI-smart as it should be. A human can track a voice correctly, knowing that some syllables are quieter than others and that a person may raise and lower their voice or move closer to and further from the mic. Voice Isolation doesn't understand that and clips out parts of the voice. So while you may want to use 100%, you might have to settle for 30-50%, and even then parts of the voice can still be missing.

After voice convert I've had sections of words missing, but I may not have had Voice Isolation set correctly. I see Voice Isolation the same way as Magic Mask 1: not as intelligent as it should have been. MM2 is a nice upgrade, but Voice Isolation is still the same and still has the same limitations as the original Nvidia RTX Voice noise cancellation.

Fri May 16, 2025 6:21 pm

CougerJoe wrote:
AndrewTheGreat wrote: Hi guys
Have any of you tried cleaning the voice first? How does DVR react to a recording that has been cleaned up with Voice Isolation?

The problem I find with Voice Isolation is that it's not as AI-smart as it should be. A human can track a voice correctly, knowing that some syllables are quieter than others and that a person may raise and lower their voice or move closer to and further from the mic. Voice Isolation doesn't understand that and clips out parts of the voice. So while you may want to use 100%, you might have to settle for 30-50%, and even then parts of the voice can still be missing.

After voice convert I've had sections of words missing, but I may not have had Voice Isolation set correctly. I see Voice Isolation the same way as Magic Mask 1: not as intelligent as it should have been. MM2 is a nice upgrade, but Voice Isolation is still the same and still has the same limitations as the original Nvidia RTX Voice noise cancellation.

Hello, thank you for sharing your experience.
In my case, after training on a 26-minute sample of my voice, I did not encounter any problems using my voice through the voice converter. There were no pronunciation defects like the ones you mentioned; the only issue is that the voice does not sound true to life. We must admit that on this point we expect great improvements from the program.
We will see what the future holds.
Greetings to everyone, bye, Marco

Sat May 17, 2025 12:03 pm

I'd also like to know how this AI feature works with non-English speech. When Adobe introduced speech enhance in Premiere Pro, which is basically a revoicing model wrapped in a speech cleaning and denoising tool, it used to add a strong English accent to non-English speech.

Sat May 17, 2025 3:16 pm

AndrewTheGreat wrote: I'd also like to know how this AI feature works with non-English speech. When Adobe introduced speech enhance in Premiere Pro, which is basically a revoicing model wrapped in a speech cleaning and denoising tool, it used to add a strong English accent to non-English speech.

Hi, I have to say that when you select one of the 4 built-in voices (US English, 2 male and 2 female) and apply the voice converter, the converted Italian audio does not pick up any noticeable American English inflection. I don't know Premiere, but on this point I have to say that in DaVinci they are absolutely usable. It remains to be understood how to add other voices directly in DaVinci, other than by sampling them. It would be desirable to have a greater number of native voices that can be used directly.

Sun May 18, 2025 9:24 am

levideo wrote:
AndrewTheGreat wrote: I'd also like to know how this AI feature works with non-English speech. When Adobe introduced speech enhance in Premiere Pro, which is basically a revoicing model wrapped in a speech cleaning and denoising tool, it used to add a strong English accent to non-English speech.

Hi, I have to say that when you select one of the 4 built-in voices (US English, 2 male and 2 female) and apply the voice converter, the converted Italian audio does not pick up any noticeable American English inflection. I don't know Premiere, but on this point I have to say that in DaVinci they are absolutely usable. It remains to be understood how to add other voices directly in DaVinci, other than by sampling them. It would be desirable to have a greater number of native voices that can be used directly.

Great news, thank you so much, Marco

Mon May 19, 2025 10:12 am

AndrewTheGreat wrote:
levideo wrote:
AndrewTheGreat wrote: I'd also like to know how this AI feature works with non-English speech. When Adobe introduced speech enhance in Premiere Pro, which is basically a revoicing model wrapped in a speech cleaning and denoising tool, it used to add a strong English accent to non-English speech.

Hi, I have to say that when you select one of the 4 built-in voices (US English, 2 male and 2 female) and apply the voice converter, the converted Italian audio does not pick up any noticeable American English inflection. I don't know Premiere, but on this point I have to say that in DaVinci they are absolutely usable. It remains to be understood how to add other voices directly in DaVinci, other than by sampling them. It would be desirable to have a greater number of native voices that can be used directly.

Great news, thank you so much, Marco

Hi friend, you're welcome. Let's hope other people can add practical suggestions to this thread on how to add new voices using script files and so on.
Bye, Marco

Beta 3 resolve problem audio learning and voice converter.
