Blackmagic Forum

Sat Apr 05, 2025 6:31 pm

I've fed it a clip of mine with me talking for a few minutes. It launched VoiceTraining.exe in the background, and it's now been running for about 2 hours. There's no progress meter and Resolve simply said the process would be completed in the background.

I'm just curious if anyone else has run this and what their experience was?

According to Task Manager it's using 70% CPU and 5GB of RAM, and the GPU is taking zero hit. My system is a 9700X/9070/32GB DDR5, if it matters. It seems Blackmagic may have rushed this feature out, since it lacks a basic progress meter.

I'll try doing this on my M4 Macbook Air to see how it fares there once this is finished to compare I guess.

*edit* It just finished, but the final result was horrible. The clip I sent it was recorded outside, and it seems to be injecting wind noise into my voice... I'm going to train one recorded indoors in a controlled environment.

Also, I just noticed there is a progress meter tucked away in the bottom right of DR's UI.

Sat Apr 05, 2025 11:01 pm

The clip I sent it was recorded outside and it seems to be injecting wind noise into my voice

So it learned the sound of your voice quite well based on the training data. (hehe)

Maybe a little Vocal Isolation or Noise Reduction first.
I am interested in hearing more about this feature.

Sun Apr 06, 2025 1:59 am

jabooberman wrote: According to Task Manager it's using 70% CPU and 5GB of RAM, and the GPU is taking zero hit. My system is a 9700X/9070/32GB DDR5, if it matters. It seems Blackmagic may have rushed this feature out, since it lacks a basic progress meter.

I think both Voice Isolation and Dialogue Leveler started out as CPU-only and were slow, but they were eventually moved over to GPU processing and no longer cause an obvious slowdown during export or transcript/subtitle generation.

Sun Apr 06, 2025 9:39 am

Yes, I'm finding this very, very slow. I gave it a good 30 minutes of training data (on Fast mode), and the little progress bar at the bottom implies it's going to take 20 hours. In the meantime my computer is very slow! M1 Max, 32GB. I also notice you can switch between GPU and Apple neural processing (or Auto), but so far both seem equally, grindingly slow. I'm guessing a lot of this AI stuff is processed locally but isn't optimised yet, as I'm getting very, very slow results just doing normal transcriptions (way slower than 19.1.4).

Sun Apr 06, 2025 1:17 pm

trinderfilms wrote: Yes, I'm finding this very, very slow. I gave it a good 30 minutes of training data (on Fast mode), and the little progress bar at the bottom implies it's going to take 20 hours. In the meantime my computer is very slow! M1 Max, 32GB. I also notice you can switch between GPU and Apple neural processing (or Auto), but so far both seem equally, grindingly slow. I'm guessing a lot of this AI stuff is processed locally but isn't optimised yet, as I'm getting very, very slow results just doing normal transcriptions (way slower than 19.1.4).

The documentation for new features stated that a 10-minute sample of training data could take several hours. It didn't specify whether that was on a mediocre or a high-end system. When I get a chance, I'll try it on my M1 Max with a short file. I'd love to hear some performance results. Not that the duration matters much given the use case, but it would be great to know ahead of time (to manage expectations, if nothing else).

Mon Apr 07, 2025 12:49 am

I installed v20 Studio on my MacBook M1 Max and used a 10-minute audio file containing dialogue, and according to the estimate it looks like it's going to take less than 3 hours. This was on 'Better' mode (the default).

Tue Apr 08, 2025 9:37 am

Can't see any progress meter and nothing happened after a few hours...

UPDATE: It was fixed after I upgraded to Beta 2.

Fri Apr 25, 2025 5:17 pm

I only gave it a one-minute file, but it doesn't make any progress at all.

Also, CPU/GPU usage shows no difference whether it's paused or running (in the background).

Mac M1. Does it need extra security permissions or something?

Tue May 06, 2025 11:28 am

Beta 3: training on 5 minutes of audio took 10 minutes or less.
The GPU cycled on and off throughout; 50% of the time it was inactive, with only minor CPU use. Resolve used close to 18GB of VRAM.

The result sounded quite good and realistic for many words, but it was spliced with words that were obviously AI-created, and I wouldn't say it sounded much like the voice it was trained on.

A narration with fan noise is re-voiced with the Trained AI Voice.

Tue May 06, 2025 4:43 pm

CougerJoe wrote: Beta 3: training on 5 minutes of audio took 10 minutes or less.
The GPU cycled on and off throughout; 50% of the time it was inactive, with only minor CPU use. Resolve used close to 18GB of VRAM.

The result sounded quite good and realistic for many words, but it was spliced with words that were obviously AI-created, and I wouldn't say it sounded much like the voice it was trained on.

A narration with fan noise is re-voiced with the Trained AI Voice.

Oh, this is going to wear the hell out of some people's SSDs in those M1 machines.

But it might also force GPU upgrades for some PC users (if they want to use it regularly).

Wed May 07, 2025 12:32 am

Trensharo wrote:
But it might also force GPU upgrades for some PC users (if they want to use it regularly).

That's what would be interesting to find out: what is the minimum VRAM requirement to avoid crashing and also avoid taking a ridiculous amount of time to generate the model? I'm unsure whether Resolve scaled to the 24GB of VRAM on my GPU, resulting in nearly 18GB of VRAM use, or whether it actually needs that much. What happens with a 16/12/8GB VRAM card? Does the data fit within VRAM, or does it not, causing slow processing and possibly crashes?

Also, after downloading and installing the Voice Trainer, consider restarting Resolve so the new engine gets optimised. From what I've read, optimisation uses 'best tactics' for each GPU; it most likely relates to the presence or absence of Tensor cores, and it may also consider the GPU's VRAM (but probably not).

I tried again after engine optimisation. I'm not sure whether the difference was due to optimisation or something else, but this time training on 9m20s of sample audio took 8 minutes, and peak VRAM was 12.2GB. Voice conversion mostly used just under 8GB of VRAM, with peaks over 10GB. The result from the new AI model was not very good; I had better results training on the previous 5 minutes of audio, though with different samples. The sources were of similar quality: not perfect, with minor background hiss.

Voice training (v.20 beta)... how long does it take?
