Glitches on BRAW footage, only after copy to SSD


filmograma

  • Posts: 5
  • Joined: Sat Nov 06, 2021 5:58 pm
  • Real Name: Nathan Jackson

Glitches on BRAW footage, only after copy to SSD

PostSat Nov 27, 2021 6:09 pm

Hi, I've regularly been getting artefacts on BRAW footage from my Pocket Cinema Camera 4K, seemingly at random. I originally spoke to BM about it because I thought it was a problem with the camera. However, when I check the footage on the camera SSD (Samsung T5), the glitch is not visible. It is visible in Resolve and in the BRAW player when playing the copied footage, but not when playing back from the camera's SSD.

Could this be an issue with my internal SSD? Somehow the footage is getting corrupted during the copy? It seems odd because it's such a minor corruption. The footage plays back perfectly except for these imperfections.

When I find a clip with a glitch I can replace the copied clip with the original version from the Samsung T5 and the glitch disappears.

Pictures attached. It happens on a single frame of maybe 1 of every 4 clips.
Attachments
Screenshot 2021-11-27 at 19.05.18.jpg (526.43 KiB)
Screenshot 2021-11-27 at 19.02.18.jpg (213.06 KiB)

smunaut

  • Posts: 498
  • Joined: Sat Jan 30, 2021 6:15 pm
  • Real Name: Sylvain Munaut

Re: Glitches on BRAW footage, only after copy to SSD

PostMon Nov 29, 2021 10:12 am

It can either be an issue with your SSD, or possibly your RAM.

I've actually had both issues recently (on different systems). On one, my internal 2TB SSD started failing: when I computed checksums (sha256) of the BRAW files from the original media (which I thankfully still had), they didn't match the ones computed from my internal SSD. I ran a full SMART self-test on the SSD and it indeed failed, reporting read errors and some reallocated sectors. I replaced the SSD on that machine and have had no more issues.

On another system, the corruption didn't come from the SSD but happened during the copy itself, and was traced to flaky RAM. The system was otherwise working fine, but a memory stress test (memtest86) showed a few defective locations in the RAM. When copying 100 GB of footage, some of it ended up in those locations and got corrupted. I replaced the RAM sticks and never had the issue again.

In general, what I do now is:

(1) Compute a checksum (sha256) of the original file on one system.
(2) On another computer (or after just unplugging and replugging the drive, to avoid caching), I copy the file to my internal drive.
(3) I then compute the checksum by reading the file from my internal drive and validate that the two match.
(4) I also keep a text file with all the footage checksums next to the footage, so I can always revalidate that the files didn't somehow end up corrupted later on.
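
A minimal Python sketch of steps (1) to (4), assuming the footage sits in one folder. The function names and the manifest file name `checksums.sha256` are made up for illustration; the manifest uses a "hash, two spaces, relative path" layout similar to what `sha256sum` prints.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(folder, manifest_name="checksums.sha256"):
    """Hash every file under `folder` and store the results next to the footage."""
    folder = Path(folder)
    lines = []
    for path in sorted(folder.rglob("*")):
        if path.is_file() and path.name != manifest_name:
            relative = path.relative_to(folder).as_posix()
            lines.append(f"{sha256_of(path)}  {relative}")
    (folder / manifest_name).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


def verify_manifest(folder, manifest_name="checksums.sha256"):
    """Return the relative paths that are missing or whose hash no longer matches."""
    folder = Path(folder)
    bad = []
    manifest = (folder / manifest_name).read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue
        expected, relative = line.split("  ", 1)
        target = folder / relative
        if not target.is_file() or sha256_of(target) != expected:
            bad.append(relative)
    return bad
```

One possible use: call `write_manifest()` on the original folder, copy the folder (manifest included) to the internal drive, then call `verify_manifest()` on the copy. An empty list means every file still matches. The same call can be repeated months later to revalidate.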
Resolve Studio - Ryzen 5800X3D - AMD RX6600 / NVidia RTX 4070 (switching between the 2) - Linux

kfriis

  • Posts: 358
  • Joined: Sun Oct 10, 2021 10:14 am
  • Real Name: Kurt Friis Hansen

Re: Glitches on BRAW footage, only after copy to SSD

PostMon Nov 29, 2021 11:40 am

@smunaut Your approach works, but it can take an extreme amount of time before an error in internal RAM shows up.

I once discovered that my backups were periodically wrong (a single byte in one or two files, actually only one flipped bit). That’s why backups always need verification of all the transferred files. Otherwise….?!
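
When a checksum mismatch turns up, it can help to see how small the damage is. The following is a rough sketch, not any particular tool's method, that compares an original file with its copy and reports each differing byte along with how many bits differ in it. The function name and the `limit` parameter are hypothetical, and only the common length of the two files is compared.

```python
def find_bit_flips(original_path, copy_path, chunk_size=1024 * 1024, limit=20):
    """Return a list of (byte_offset, original_byte, copy_byte, flipped_bit_count).

    Only the common prefix of the two files is compared; a size difference
    should be checked separately. At most `limit` differences are reported.
    """
    differences = []
    offset = 0
    with open(original_path, "rb") as original, open(copy_path, "rb") as copy:
        while len(differences) < limit:
            a = original.read(chunk_size)
            b = copy.read(chunk_size)
            if not a and not b:
                break
            if a != b:
                for index, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        flipped = bin(x ^ y).count("1")  # number of differing bits
                        differences.append((offset + index, x, y, flipped))
                        if len(differences) >= limit:
                            break
            offset += max(len(a), len(b))
    return differences
```

A result with a single entry and a flipped-bit count of 1 would match the "one flipped bit" pattern described above, which points more towards RAM or the transfer path than towards a whole failed sector.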

The problem was found in RAM with a memory test program, but it took several days. The test ordinarily runs from low to high memory, but that did not trigger the bit flip. When I then forced a reverse of the test direction, the flip (luckily) happened roughly every second test run.
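
Why the scan direction can matter is easier to see in a toy simulation. The sketch below does not test real RAM and is not a model of the actual defect; it assumes one hypothetical "coupling" fault in which a 0 to 1 write to one cell forces its lower neighbour to 1. All class and function names are invented for the example.

```python
class FaultyMemory:
    """Simulated bit-addressable memory with one hypothetical coupling fault.

    A 0 -> 1 write to `aggressor` also forces the `victim` cell to 1.
    """

    def __init__(self, size, aggressor, victim):
        self.cells = [0] * size
        self.aggressor = aggressor
        self.victim = victim

    def read(self, address):
        return self.cells[address]

    def write(self, address, value):
        rising = self.cells[address] == 0 and value == 1
        self.cells[address] = value
        if address == self.aggressor and rising:
            self.cells[self.victim] = 1  # the simulated defect


def march_pass(memory, addresses, expected, new_value):
    """Read each cell (expecting `expected`), then write `new_value`.

    Returns the addresses where the value read was not the expected one.
    """
    failures = []
    for address in addresses:
        if memory.read(address) != expected:
            failures.append(address)
        memory.write(address, new_value)
    return failures


def run_test(memory, size, descending):
    """Zero the memory, then run two read/write passes in a single direction."""
    order = range(size - 1, -1, -1) if descending else range(size)
    for address in range(size):
        memory.write(address, 0)
    failures = march_pass(memory, order, 0, 1)
    failures += march_pass(memory, order, 1, 0)
    return failures


memory = FaultyMemory(16, aggressor=9, victim=8)
print("ascending :", run_test(memory, 16, descending=False))
print("descending:", run_test(memory, 16, descending=True))
```

In this toy case the ascending scan writes the victim before the aggressor, so the forced 1 lands on a cell that already holds 1 and nothing looks wrong. The descending scan touches the aggressor first and then reads an unexpected 1 from the victim. Real memory testers run far more patterns than this, and an intermittent fault adds another layer on top.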

The solution to the detection problem is ECC memory (one extra check bit per byte), where a single-bit error can be detected (and corrected on the fly), and you can enable a warning when this happens. This works, and it’s both heartstopping and heartening when an alarm from the motherboard is received, but if your livelihood depends on… ehh?

The original poster did not mention their hardware, but there are problems on new M1-based hardware (incl. Pro and Max) when large copies are involved, often from external drives (directly connected or, more frequently, connected via a hub or dock). I have personally seen it happen in Big Sur 11.3 and later (currently 11.6.1) on M1, but never on an Intel-based MacBook Pro 2018 with the same Big Sur version and similar or identical tasks performed on identical (moved and reconnected) hardware and files. Reports indicate that the current macOS 12 release versions have similar problems.

In my case, the internal SSD is not really involved as an active member of the key operation.

The symptoms are hard to catch. Let’s say you have a 300 GByte copy from a Thunderbolt external SSD to a NAS. It takes forever, we all know that. When you return, the computer asks you to enter the password, not just Touch ID as usual. When you have logged in after the OS-forced (re)boot, you find that one of your CPU cores (never the same one) has panicked (a timeout of some 90 seconds was established).

In one case, I was working on the machine while the copy took place. Suddenly keyboard entry felt like “molasses”, and a bit later everything froze. Some 90 seconds later, the machine rebooted. The problem exists for both Intel (Rosetta) and M1 code, but the common denominator is that the programs were initially designed with basic disk I/O handling code for PCs, and they die in the midst of a large data transfer (target file corrupted, too short or with the wrong date) when huge transfers of typically large video files are involved.

Look for kernel panic on Big Sur etc. in the Apple user and developer forums if you’re interested. I have found no way around the problem.

Regards

smunaut

  • Posts: 498
  • Joined: Sat Jan 30, 2021 6:15 pm
  • Real Name: Sylvain Munaut

Re: Glitches on BRAW footage, only after copy to SSD

PostMon Nov 29, 2021 12:35 pm

"Extreme use of time" ?

- SSD self-test is ~ 4h but you only need to do it rarely. I do it every 3 months.
- Memory test is ~ 2h to make 4 full passes over 64G of RAM. And again, you only need to do it once in a while or when you notice a possible issue.

Usually for both of these, I just run them overnight.

And keeping and checking checksums on ingest is just basic data protection and should be done regardless.
It takes me ~ 30 min to copy 300G of footage from the T5 SSD to my NAS and then another 10-15 min to compute the checksums from the copied files and from the original drive.
Of course this only uses 1 CPU core, so the machine is perfectly usable for other tasks while that's running. I don't just sit there waiting for it to be done.

ECC memory is great and definitely should be considered when building a workstation. But it's also not the only source of errors. You could have SSD errors, you could also have USB transfer errors, and then you could also have double bit errors in memory.

DaVinci Resolve actually has a module in Media management that copies / ingests footage and computes checksums during the copy, for the same purpose. (Although I prefer to do it externally, since I apply the same treatment to audio and other media files, not just camera BRAW.)
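
For the external route, hashing while copying can be sketched in a few lines of Python. This is not how Resolve implements it, just an illustration of the idea under simple assumptions; `copy_with_checksum` and `verified_copy` are hypothetical names. The source is hashed as it is read, then the destination is read back and hashed separately.

```python
import hashlib
import os


def copy_with_checksum(source, destination, chunk_size=1024 * 1024):
    """Copy `source` to `destination`; return the SHA-256 of the bytes read from source."""
    digest = hashlib.sha256()
    with open(source, "rb") as src, open(destination, "wb") as dst:
        for chunk in iter(lambda: src.read(chunk_size), b""):
            digest.update(chunk)
            dst.write(chunk)
        dst.flush()
        os.fsync(dst.fileno())  # ask the OS to push the data to the drive
    return digest.hexdigest()


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file on disk."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verified_copy(source, destination):
    """Copy a file, re-read the copy, and raise if the two checksums differ."""
    source_hash = copy_with_checksum(source, destination)
    destination_hash = file_sha256(destination)
    if source_hash != destination_hash:
        raise IOError(f"checksum mismatch after copying {source} to {destination}")
    return source_hash
```

Note that reading the destination straight after writing it may be served from the operating system's cache rather than from the drive, so a second verification after remounting the drive or rebooting is a stronger check.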

kfriis

  • Posts: 358
  • Joined: Sun Oct 10, 2021 10:14 am
  • Real Name: Kurt Friis Hansen

Re: Glitches on BRAW footage, only after copy to SSD

PostMon Nov 29, 2021 1:10 pm

smunaut wrote: "Extreme use of time" ? [cut…]

Have you ever tried tracking down a PERIODIC error in RAM?

Especially one that does NOT turn up on a “bread and butter” scan, and only shows its ugly face if one bit is accessed BEFORE the next LOWER bit in the chip? Not the other way around. To top it all, the error was PERIODIC: in my case it appeared on about every second consecutive run. Mixing scans up and down showed nothing wrong. Typically two, seldom three, consecutive decrementing scans flipped the bit.

That takes a LOOONG time to find, and until I discovered that my backups were affected, God only knows what else was affected!!!

A problem that any ECC capable motherboard flags with ease!

With ECC, the moment the system discovers one or more “flips”, the system acts. If it’s just one flip, which can be corrected reliably on the fly, the system issues a warning (you decide the level). Then you are told: this stick, this bit, in that position. Presto. If you have two banks of identical sticks, remove the stick in question, and deactivate/remove the other sticks, and continue work (with half the RAM). Order a new stick. Presto!

If more than one bit in one byte flips, everything halts (in theory, that is not always detected, but it would typically involve more than one chip on a “licorice stick”, which typically leads to other, more dramatic errors too). You know, sometimes you just have to follow the smoke ;-)
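
The "correct one flip, halt on two" behaviour can be illustrated with a toy single-error-correcting, double-error-detecting code. The sketch below uses an extended Hamming code on 4 data bits purely for illustration. Real ECC modules protect much wider words with different codes, and the function names here are invented.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit codeword, returned as a list of bits.

    Index 0 holds an overall parity bit; indexes 1..7 form a Hamming(7,4) code
    with check bits at positions 1, 2 and 4.
    """
    data = [(nibble >> i) & 1 for i in range(4)]  # data[0] is the lowest bit
    code = [0] * 8
    code[3], code[5], code[6], code[7] = data
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2  # makes the total number of 1 bits even
    return code


def decode(code):
    """Return (status, nibble) with status 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for position in range(1, 8):
        if code[position]:
            syndrome ^= position  # XOR of the positions of all set bits
    parity_ok = sum(code) % 2 == 0
    if syndrome == 0 and parity_ok:
        status = "ok"
    elif not parity_ok:
        # Odd number of flips: treat it as a single flip and repair it.
        # A syndrome of 0 here means the overall parity bit itself flipped.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Parity looks fine but the syndrome is non-zero: two bits flipped.
        return "uncorrectable", None
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble


word = encode(0b1011)
word[6] ^= 1                 # one flipped bit
print(decode(word))          # status is 'corrected', data recovered
word[2] ^= 1                 # a second flipped bit in the same word
print(decode(word))          # status is 'uncorrectable'
```

Three or more flips in the same word can slip through or be "corrected" to the wrong value in a code like this, which fits the remark above that multi-bit errors are not always detected.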

Regards

smunaut

  • Posts: 498
  • Joined: Sat Jan 30, 2021 6:15 pm
  • Real Name: Sylvain Munaut

Re: Glitches on BRAW footage, only after copy to SSD

PostMon Nov 29, 2021 1:22 pm

Just because it _can_ be an elusive error doesn't mean it's not worth running the quick checks first ...

A significant share of errors can be caught by run-of-the-mill memtests. There is no point looking for zebras if you haven't even tried looking for horses first.

Not everyone has the budget to replace their whole hardware setup with one with full ECC memory.
Video editing is not a primary source of income for me, but that doesn't mean I want to have errors in my files. If there are steps I can take to drastically improve reliability that don't require me to buy any hardware, I believe they're worth taking.

(and as I stated above, memory errors are only 1 of the many possible ways files can get corrupted)

kfriis

  • Posts: 358
  • Joined: Sun Oct 10, 2021 10:14 am
  • Real Name: Kurt Friis Hansen

Re: Glitches on BRAW footage, only after copy to SSD

PostMon Nov 29, 2021 3:03 pm

smunaut wrote:[....cut]Not everyone has the budget to replace their whole hardware setup with one with full ECC memory.[cut....]


I'm not disagreeing with you per se, but a lot of people do not even know about the benefits of ECC RAM/motherboard support, and even their advisers are often totally "blind" on that issue (in engineering circles and amongst top level developers, plus big iron admins, it's a given that you need this. It's often far too expensive not to, also insurance wise, you know: "professional negligence"). If someone invests between 5k and 10k SGD/AUD/USD or more in a video rig (monitors and peripherals excluded), minor price differences for the motherboard and actual ECC RAM do not really count.

I can forgive people who don't know (and don't have to), but NOT their advisers. There is of course the odd gamer cum video editor who rejects ECC memory on the grounds that it is less usable for heavy overclocking (if you have ever heard a true overclocker trying to... you learn a completely new vocabulary, not contained in any English dictionary). That guy doesn't believe in backups either, so no problem there (I know for sure, since he hasn't started yet, since the last total data loss when his rig decided to enter a sudden "indian smoke signalling" state, and he asked me if.... I was polite, even as he promised huge values of coin for any help in retrieving "anything" ;-))

I can accept that there are limits to the options for portable gear, like notebooks, and for hobby use, but I've seen too many businesses run into severe trouble trying to save a few to a few hundred quid, or because of bad advice (unnecessarily, when the small difference in price would have been accepted from a data safety viewpoint anyway).

The more we "display" the options and benefits of gear that supports ECC memory, the bigger the chance that those who really need it will actually take note.

Regards
