Desperate need for "Fault Tolerance" in Render Queue items


Mel Matsuoka

  • Posts: 1438
  • Joined: Wed Aug 22, 2012 9:54 am
  • Location: Buffalo, NY

Desperate need for "Fault Tolerance" in Render Queue items

Sun Sep 06, 2020 9:52 pm

I set up a transcode of 14 hours' worth of footage last night, only to wake up this morning to an error popup in the Render Queue, informing me that Resolve was unable to decode a file. The error apparently popped up only about an hour into the rendering process, after I had gone to bed.

The worst thing about this error is that it's a modal warning dialog that actually STOPS the entire Render Queue process, and doesn't try to continue on until a human being sees the error and manually dismisses the error popup, whereupon the Render Queue continues rendering other jobs in the queue. However, it will NOT resume rendering the actual Queue item that contains the errant file!

This is usually not a big deal when rendering out a :30 commercial. But when you are counting on Resolve to transcode hours and hours of footage in an unattended, overnight Render Queue, this is downright catastrophic, especially if you have tight delivery turnaround times on a transcode job. Needless to say, I was not a happy camper when I was greeted by the error this morning.

Resolve really needs to have a "fault tolerance" mechanism for cases like this. If it encounters a decode or write error on a clip in a Queue, it should still continue processing all the other clips in the queue, on the assumption that the error was isolated. Then, ONLY at the end of the rendering process, should it warn you that errors were encountered. Ideally it would also write out an actual error log file listing the specifics of each error, so the user can use the log file to diagnose each problem clip separately. This way, even if a batch has errors in it, it would still successfully churn through all the other clips that don't have any issues, and would prevent huge chunks of lost time.
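To make the request concrete, here is a minimal Python sketch of the "continue on error, report at the end" behavior described above. It does not use Resolve's scripting API; `RenderError`, `QueueReport`, `run_queue` and `fake_render` are hypothetical names invented purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


class RenderError(Exception):
    """Hypothetical error raised when a clip fails to decode or write."""


@dataclass
class QueueReport:
    rendered: list = field(default_factory=list)
    failed: list = field(default_factory=list)  # (clip, error message) pairs


def run_queue(clips: Iterable[str], render_clip: Callable[[str], None]) -> QueueReport:
    """Render every clip, recording failures instead of halting the queue."""
    report = QueueReport()
    for clip in clips:
        try:
            render_clip(clip)
        except RenderError as exc:
            # Treat the failure as isolated: note it and move on.
            report.failed.append((clip, str(exc)))
        else:
            report.rendered.append(clip)
    return report


def fake_render(clip: str) -> None:
    """Stand-in renderer that fails on one clip, for demonstration only."""
    if clip == "A002.mov":
        raise RenderError("unable to decode file")


report = run_queue(["A001.mov", "A002.mov", "A003.mov"], fake_render)
# Errors are reported once, after the whole queue has been processed.
for clip, message in report.failed:
    print(f"FAILED {clip}: {message}")
```

The point of the design is that the `except` branch only records the problem, so one bad clip costs one clip rather than the rest of the night.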

This is also tangentially related to Resolve's poor error-reporting/logging in other areas of the app, specifically the Media Management panel.
Blog: https://PostProductive.tv
YouTube: https://www.youtube.com/@postproductive

franciscovaldez

  • Posts: 465
  • Joined: Wed Aug 22, 2012 4:52 pm

Re: Desperate need for "Fault Tolerance" in Render Queue ite

Sun Sep 06, 2020 10:34 pm

+1

It would be great if it had a pop up window at the end with all the unfinished renders and thumbnails of the culprit frames.
MacBook Pro 13"
M2
UltraStudio 4K

Mac Pro
2.7 GHz 12-Core Intel Xeon E5
64 GB 1866 MHz DDR3
AMD FirePro D700 6 GB

Mel Matsuoka

  • Posts: 1438
  • Joined: Wed Aug 22, 2012 9:54 am
  • Location: Buffalo, NY

Re: Desperate need for "Fault Tolerance" in Render Queue ite

Sun Sep 06, 2020 10:44 pm

franciscovaldez wrote:+1

It would be great if it had a pop up window at the end with all the unfinished renders and thumbnails of the culprit frames.


I would find it more useful to have an actual, machine-parseable text file log (so you could write scripts that process the error log), with the exact timecode or framecount offset where the error was detected, if possible. I'm not really sure how useful thumbnails would be, because if they can't decode properly, what would there be to show as a thumbnail?
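As a rough illustration of what a machine-parseable log could look like, the Python sketch below writes one JSON object per line (the JSON Lines convention) and reads it back. The field names `clip`, `frame_offset` and `message` are assumptions made for this example, not a format that Resolve produces.

```python
import io
import json


def write_error(log, clip, frame_offset, message):
    """Append one error as a single JSON object per line (JSON Lines)."""
    record = {"clip": clip, "frame_offset": frame_offset, "message": message}
    log.write(json.dumps(record) + "\n")


def read_errors(log):
    """Parse the log back into a list of dictionaries."""
    return [json.loads(line) for line in log if line.strip()]


# An in-memory buffer stands in for a log file on disk.
log = io.StringIO()
write_error(log, "A002.mov", 1440, "unable to decode file")
write_error(log, "B017.mov", 86, "write error")

log.seek(0)
errors = read_errors(log)
failed_clips = [entry["clip"] for entry in errors]
print(failed_clips)
```

Because each line is an independent record, a script can filter or re-queue the failed clips without any custom parsing.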
Blog: https://PostProductive.tv
YouTube: https://www.youtube.com/@postproductive

franciscovaldez

  • Posts: 465
  • Joined: Wed Aug 22, 2012 4:52 pm

Re: Desperate need for "Fault Tolerance" in Render Queue ite

Sun Sep 06, 2020 11:48 pm

I would guess, as you see them in your timeline. Could be a frame or the clip thumbnail.

What I mean is that it could be useful to have a visual idea of the clip that is causing the problem, and not just the coordinates pointing to it.

I have no clue how to write scripts, and I'm not very proficient at converting a frame count to actual reel timecode.
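For reference, the frame-count-to-timecode conversion mentioned above is short when the frame rate is a whole number. The Python sketch below assumes non-drop-frame timecode and an integer frame rate; drop-frame rates such as 29.97 need extra handling that is not shown. The function name and parameters are made up for this example.

```python
def frames_to_timecode(frame_count: int, fps: int, start_frame: int = 0) -> str:
    """Convert a frame offset to HH:MM:SS:FF (non-drop-frame, integer fps)."""
    total = start_frame + frame_count
    frames = total % fps
    seconds = (total // fps) % 60
    minutes = (total // (fps * 60)) % 60
    hours = total // (fps * 3600)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"


# 1440 frames at 24 fps is exactly one minute.
print(frames_to_timecode(1440, 24))  # 00:01:00:00

# Same offset on a reel whose timecode starts at 01:00:00:00 (86400 frames at 24 fps).
print(frames_to_timecode(1440, 24, start_frame=86400))  # 01:01:00:00
```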

Also, it would be cool if we could resume stopped renders. Let's say you paused, or your computer crashed, at 97%. It would be great to be able to pick it up from there instead of starting from scratch... that's what I like about exporting frame sequences.
MacBook Pro 13"
M2
UltraStudio 4K

Mac Pro
2.7 GHz 12-Core Intel Xeon E5
64 GB 1866 MHz DDR3
AMD FirePro D700 6 GB

Mel Matsuoka

  • Posts: 1438
  • Joined: Wed Aug 22, 2012 9:54 am
  • Location: Buffalo, NY

Re: Desperate need for "Fault Tolerance" in Render Queue ite

Mon Sep 07, 2020 12:05 am

franciscovaldez wrote:Also, it would be cool if we could resume stopped renders. Let's say you paused, or your computer crashed, at 97%. It would be great to be able to pick it up from there instead of starting from scratch... that's what I like about exporting frame sequences.


This I completely agree with. After all, Resolve is basically a glorified, highly specific GUI over an SQL-based database. One would think that it should be able to know what it has rendered in the past (even across different projects), which renders have failed, etc., and give you options for how to proceed when a Render Queue may conflict with what already exists in a specific folder on the filesystem.
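A minimal Python sketch of that idea follows: the queue records each finished job in a small state file and skips those jobs on the next run. It makes no assumptions about Resolve's actual database; the JSON state file, `load_done` and `run_resumable` are hypothetical and exist only to show the bookkeeping.

```python
import json
import tempfile
from pathlib import Path


def load_done(state_file: Path) -> set:
    """Return the set of job names already recorded as finished."""
    if state_file.exists():
        return set(json.loads(state_file.read_text()))
    return set()


def run_resumable(jobs, render_job, state_file: Path) -> list:
    """Render only the jobs not yet recorded, saving progress after each one."""
    done = load_done(state_file)
    rendered_now = []
    for job in jobs:
        if job in done:
            continue  # Finished in an earlier run, so skip it.
        render_job(job)
        done.add(job)
        # Persist after every job so a crash loses at most the job in progress.
        state_file.write_text(json.dumps(sorted(done)))
        rendered_now.append(job)
    return rendered_now


with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "queue_state.json"
    first_pass = run_resumable(["reel_01", "reel_02"], lambda job: None, state)
    # A later run with one extra job only renders the new one.
    second_pass = run_resumable(["reel_01", "reel_02", "reel_03"], lambda job: None, state)
    print(first_pass, second_pass)
```

This resumes at the granularity of whole jobs; resuming inside a single partially written file would need support from the encoder itself.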
Blog: https://PostProductive.tv
YouTube: https://www.youtube.com/@postproductive

AndrewKeil

  • Posts: 326
  • Joined: Sat Jan 25, 2020 11:27 pm
  • Location: Belfast, Northern Ireland
  • Real Name: Andrew Keil

Re: Desperate need for "Fault Tolerance" in Render Queue ite

Mon Sep 07, 2020 9:49 am

+1

Vit Reiter

  • Posts: 1010
  • Joined: Mon Sep 04, 2017 5:36 pm

Re: Desperate need for "Fault Tolerance" in Render Queue ite

Mon Sep 07, 2020 10:09 am

It's a big DaVinci benefit that it stops rendering a corrupted item. But it could skip the corrupted item and render the other items in the queue. The error pop-up window would not need to appear until the entire queue has rendered, and an icon and processing information could appear in the nodes on the right.
DaVinci Resolve 18.6.6 Studio (macOS Monterey 12.7.6)
Mac Pro 2013, AMD FirePro D700, 64GB RAM

Film Editor, Colorist, DIT, Datalab technician
linkedin.com/in/vít-reiter-film-editor

Jim Simon

  • Posts: 36051
  • Joined: Fri Dec 23, 2016 1:47 am

Re: Desperate need for "Fault Tolerance" in Render Queue ite

Mon Sep 07, 2020 3:04 pm

I don't like this idea. If there's an error, I want to know right away.
My Biases:

You NEED training.
You NEED a desktop.
You NEED a calibrated (non-computer) display.

Hendrik Proosa

  • Posts: 3389
  • Joined: Wed Aug 22, 2012 6:53 am
  • Location: Estonia

Re: Desperate need for "Fault Tolerance" in Render Queue ite

Mon Sep 07, 2020 3:28 pm

Jim Simon wrote:I don't like this idea. If there's an error, I want to know right away.

So you prefer not rendering to rendering? There can be a checkbox for users who want to babysit Resolve every step of the way.
I do stuff

Jim Simon

  • Posts: 36051
  • Joined: Fri Dec 23, 2016 1:47 am

Re: Desperate need for "Fault Tolerance" in Render Queue ite

Mon Sep 07, 2020 4:48 pm

If there's an error, then yes I prefer to stop the render and handle it.
My Biases:

You NEED training.
You NEED a desktop.
You NEED a calibrated (non-computer) display.

Mel Matsuoka

  • Posts: 1438
  • Joined: Wed Aug 22, 2012 9:54 am
  • Location: Buffalo, NY

Re: Desperate need for "Fault Tolerance" in Render Queue ite

Mon Sep 07, 2020 5:53 pm

Jim Simon wrote:If there's an error, then yes I prefer to stop the render and handle it.


So if you have 14 hours of rushes to transcode, and only 1 of the clips out of 2000 clips fails to decode at 1:00 AM while you’re sleeping, and you have to deliver those proxies to post-production first thing in the morning, you’d rather that the rest of the clips NOT get rendered because only 1 clip out of the entire set of data may be corrupted?

Sorry, but that makes absolutely no sense.
Blog: https://PostProductive.tv
YouTube: https://www.youtube.com/@postproductive

Mel Matsuoka

  • Posts: 1438
  • Joined: Wed Aug 22, 2012 9:54 am
  • Location: Buffalo, NY

Desperate need for "Fault Tolerance" in Render Queue items

Mon Sep 07, 2020 6:49 pm

Vit Reiter wrote:It's a big DaVinci benefit that it stops rendering a corrupted item. But it could skip the corrupted item and render the other items in the queue. The error pop-up window would not need to appear until the entire queue has rendered, and an icon and processing information could appear in the nodes on the right.


I totally get where Jim is coming from, as far as wanting to know when errors have occurred as they happen. I am not advocating for the contrary.

I do think Resolve should still warn you in real time when an error has occurred, just not in a modal dialog that blocks the rest of the queue from rendering. I can’t think of a good use case for that behavior, with the possible exception of a situation where the queue may overwrite files that already exist (in which case it already warns you when you click the Render button), or where you may clobber source files (which Resolve should be hard-coded NOT to do under any circumstance).
Blog: https://PostProductive.tv
YouTube: https://www.youtube.com/@postproductive

Vit Reiter

  • Posts: 1010
  • Joined: Mon Sep 04, 2017 5:36 pm

Re: Desperate need for

Mon Sep 07, 2020 8:06 pm

Mel Matsuoka wrote:I do think Resolve should still warn you in real-time when an error has occurred, but just not in a modal dialog that blocks the rest of the queue from rendering.
I have no problem with real-time alerts. If processing of a node stops, a notification would appear immediately in that node and DaVinci would start processing the next node in the queue. The pop-up at the end of the queue would then notify you once again that one or more nodes stopped processing during rendering. But you would have the error notification in the node immediately.

The way DaVinci warns us is not that important. It is important that one rendering node does not stop the entire queue.
DaVinci Resolve 18.6.6 Studio (macOS Monterey 12.7.6)
Mac Pro 2013, AMD FirePro D700, 64GB RAM

Film Editor, Colorist, DIT, Datalab technician
linkedin.com/in/vít-reiter-film-editor

tlegvold

  • Posts: 799
  • Joined: Tue Nov 26, 2019 12:03 am
  • Location: Los Angeles
  • Real Name: Thor Legvold

Re: Desperate need for "Fault Tolerance" in Render Queue ite

Mon Sep 07, 2020 9:36 pm

Alerts are important, real time or not.

However, an alert should NOT block the rest of the render queue. Like the OP, I had a queue with 150 jobs to output. I set it up, started it and went out to do other things. When I returned, the whole process had stopped because of one error in one queued job. I would have much preferred that it continue through all of the jobs in the queue until it finished, jumping over any with errors (yes, pop up a dialog panel or write a log file flagging the jobs that failed so they can be re-run).

Had this been an overnight job, I would have missed the deadline.

Computers excel at automating repetitive tasks, to free us up to focus on other things. I don't want to have to babysit Resolve when I've set up a batch render queue and gone off to take care of other things while it's working.
2019 Mac Pro 16 Core CPU 192GB RAM | AMD Radeon W5700X 16GB | OS X Sonoma 14.7.4
Fairlight A.A. CC-2 | SX-36 | Audio Editor (FAE) | Studio Console |
2023 16" M3 Max MacBook Pro 64GB RAM | OS X Sonoma 14.7.1 | iPad Pro 13" M4 iPadOS 17.7

Mel Matsuoka

  • Posts: 1438
  • Joined: Wed Aug 22, 2012 9:54 am
  • Location: Buffalo, NY

Re: Desperate need for "Fault Tolerance" in Render Queue ite

Mon Sep 07, 2020 10:05 pm

tlegvold wrote:However, an alert should NOT block the rest of the render queue


As an addendum to this, this morning, I experienced the same issue on yet another overnight render. Except this time, there were no actual errors in the read/write process of the Render Queue. In this case, the issue was a spurious "Remote Grading" error popup dialog (which I've been getting randomly over the past month for some reason, as I reported in this thread), which popped up overnight.

The popup caused the Render Queue to pause. Dismissing the notification caused the Queue to continue rendering.

This is clearly undesirable default behavior for an active Render Queue, and really should be fixed.
Blog: https://PostProductive.tv
YouTube: https://www.youtube.com/@postproductive

Timo92

  • Posts: 284
  • Joined: Sun Aug 23, 2020 4:23 am
  • Real Name: Timo Teubert

Re: Desperate need for "Fault Tolerance" in Render Queue ite

Thu Sep 10, 2020 3:38 am

+1 for having a machine-parseable text file log with real time alerts if an error is encountered during rendering (maybe with a sound alert?). Also, no disturbing pop-ups during rendering, and always trying to continue the rendering process.

Dan Olson

  • Posts: 43
  • Joined: Mon Mar 13, 2017 5:27 am

Re: Desperate need for "Fault Tolerance" in Render Queue ite

Fri Sep 11, 2020 3:39 am

tlegvold wrote:Alerts are important, real time or not.

However, an alert should NOT block the rest of the render queue.

Yeah, agreed, the simple answer is just to have the alert pop up, visually, while the render queue moves on. For single-item exports, like a final delivery, this is fundamentally the same. Error pops up, queue checks for next item, there are no more, queue stops.

But on a batch process I would really, really, really rather just deal with the stuff that failed rather than stop everything.
MacOS 10.15.5, MacPro 3.5GHz, Radeon Pro W5700X, 32GB DDR4