HighPoint NVMe M.2 RAID (EVO 970) fails unexpectedly


Adriano Castaldini

  • Posts: 188
  • Joined: Sat Mar 30, 2013 4:10 am

HighPoint NVMe M.2 RAID (EVO 970) fails unexpectedly

Posted: Thu Nov 07, 2019 1:21 am

I bought a HighPoint SSD7101A-1 to build an NVMe RAID 0 from 4x 970 EVOs.
I built the RAID with the WebGUI app and formatted it as HFS+.
After a while (even when the system is idle) the RAID fails unexpectedly: the alarm sounds and WebGUI reports “Disk 'Samsung SSD 970 EVO 2TB-S464NB0M400143A' at Controller1-Enclosure1-Device3 failed”. One could call it a drive failure, but every time the problem happens WebGUI reports a different failed drive! And if I simply restart the OS, the RAID is back (with all the files) and seems to work fine. (I can only suppose that if it were an actual drive failure, I would hear the alarm immediately after restarting the OS.)

So, what can it be? A problem with the controller's firmware/BIOS? A loose connection in the NVMe sockets on the controller? Is the controller damaged? Or is one of the drives actually damaged?
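One way to narrow this down would be to tally the failure events by slot and by drive serial number. Repeated failures of the same serial would suggest that drive, repeated failures of the same slot would suggest the card, and failures spread across all of them would suggest something shared (firmware, power, heat). Below is a minimal Python sketch, assuming the WebGUI events can be saved as plain text lines in the same format as the message quoted above. The function name and the regular expression are illustrative assumptions, not part of any HighPoint tool.

```python
import re
from collections import Counter

# Pattern modeled on the single WebGUI message quoted in this post.
# Assumption: other failure events follow the same "model-serial" layout.
FAILED = re.compile(
    r"Disk '(?P<model>.+)-(?P<serial>[^-']+)' at (?P<slot>\S+) failed"
)

def tally_failures(lines):
    """Count failure events per slot and per drive serial number."""
    by_slot, by_serial = Counter(), Counter()
    for line in lines:
        match = FAILED.search(line)
        if match:
            by_slot[match.group("slot")] += 1
            by_serial[match.group("serial")] += 1
    return by_slot, by_serial

# The only real event available here is the one quoted above.
events = [
    "Disk 'Samsung SSD 970 EVO 2TB-S464NB0M400143A' "
    "at Controller1-Enclosure1-Device3 failed",
]
slots, serials = tally_failures(events)
print("By slot:", dict(slots))
print("By serial:", dict(serials))
```

With a longer event history, comparing the two counters would show whether the failures follow a drive, a slot, or neither.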

I can't figure out how to solve this issue.
My hardware is a hackintosh with 2x Radeon VII, OSX on a 970 EVO 1TB, footage on the PCIe RAID (4x 2TB 970 EVOs on the HighPoint), 64GB RAM, GA-X99P-SLI, i7-6950X, DeckLink Mini Monitor 4K, two 8-drive TB2 RAIDs (G-Speed), DaVinci 16.1.1 on Mojave.

Please, can someone help me?

Thanks a lot.

Mark Foster

  • Posts: 58
  • Joined: Tue Oct 27, 2015 10:59 am

Re: HighPoint NVMe M.2 RAID (EVO 970) fails unexpectedly

Posted: Sat Nov 09, 2019 12:03 pm

Maybe the drives are running too hot.

What does the WebGUI say under SHI?
MacPro 5.1 2x3,46GHz 96GB
SSD7101A-1
RocketU 1344A
radeon VII
BMD ext 4k
2 x apple 27"
MacPro 5.1 2x3,46GHz 96GB
TITAN RIDGE
RocketU 1344A
radeon VII
LG27UD88
MBP 15" 2017 3.1GHz i7 16GB/2TB
MBP 15" 2014 2.8GHz i7 16GB/1TB
OS 10.14.6

Adriano Castaldini

  • Posts: 188
  • Joined: Sat Mar 30, 2013 4:10 am

Re: HighPoint NVMe M.2 RAID (EVO 970) fails unexpectedly

Posted: Sun Nov 10, 2019 5:28 pm

Thanks, Mark, for your reply.
I monitored SHI over the last couple of days. I would have liked to see a disk failure in order to know at which temperature it happens, but “unfortunately” the disks never failed this time! BUT... above 120°F apps tend to crash. (WebGUI sets the temperature threshold at 140°F.)
Anyway, these are the temperatures:
1. Without the cover heatsink, during a stress routine the disks easily reach 129°F (by stress routine I mean: Blackmagic Speed Test in a loop + VLC playing a ProRes 4444 file + writing a 256GB RAW video file + playing it with MlRawViewer);
2. With the cover heatsink, during the same stress routine, the disks never exceed 105°F;
3. With the cover heatsink, DaVinci previews a heavily graded RAW clip on the fly (with massive DeNoise, nodes, layers, and some effects) and plays it in a loop in real time (24fps) thanks to a couple of Radeon VII cards, and after about 10 minutes SHI reports 123°F for the disks.
After this last routine, DaVinci didn't quit correctly: I had to restart the OS, so I consider this a crash-like event.
After the first routine, too, VLC froze and I needed to restart the OS.
My conclusion is that, in spite of the 140°F threshold set in SHI, above 120°F the disks don't fail but the system does!
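For anyone more used to Celsius, the readings above can be converted with the standard formula C = (F - 32) x 5/9. A small Python sketch follows; the dictionary labels are just my shorthand for the figures reported in this post.

```python
def f_to_c(deg_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32) * 5 / 9

# Figures taken from this post; the labels are shorthand, not SHI field names.
readings_f = {
    "stress routine, no heatsink": 129,
    "stress routine, with heatsink": 105,
    "DaVinci playback, with heatsink": 123,
    "point where apps start crashing": 120,
    "WebGUI threshold": 140,
}

for label, deg_f in readings_f.items():
    print(f"{label}: {deg_f} °F = {f_to_c(deg_f):.1f} °C")
```

That works out to roughly 53.9 °C, 40.6 °C, 50.6 °C, 48.9 °C and 60.0 °C respectively.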

Are your temperatures similar to mine?

I'm also curious because I had to replace the original white thermal pad (provided with the HighPoint controller) with a slimmer one. The provided pad was too thick (1.5mm): with the disks mounted in the card, I noticed that the PCB bowed when I screwed the cover on. So I replaced it with a 0.5mm thermal pad (Alphacool 14W).

Did you experience the same problem? Did your PCB bow because of the thermal pad thickness?

Thanks a lot.
