Recover a RAID5 Array on Linux with healthy disks

Intel Atom failures

I know the title sounds a bit weird, and you may ask why you would need to recover a RAID5 array when all your disks are healthy, right?

To understand what is going on, you need to know that my DS1515+ has an Intel Atom C2538 (source: Synology CPU / NAS Type). This CPU family recently caused a lot of issues in the IT industry (remember the Cisco clock issue? 🙂).

Erratum AVR54 in the C2000 specification update clearly states the following: “system may experience inability to boot or may cease operation”. My NAS started rebooting regularly, and it crashed completely before I could back up the last delta of data.

At first, Synology denied any abnormal failure rate on this specific hardware while admitting a flaw (!). Synology then extended the warranty of all the NAS platforms affected by this hardware flaw.

 

Recovering the data using the GUI. (fail)

I immediately opened a case with Synology, which sent me another DS1515+ pretty quickly. (I still had to pay for express shipping.)

After I inserted my disks into the newly received NAS, I noticed that it was beeping and trying to recover my RAID5 array without any luck. DSM told me that the RAID 5 array was down but that all disks were healthy.


 

I waited until the parity check had finished to see whether the Synology was silently trying to recover the volume. Unfortunately, after 10 hours, nothing had appeared in my volume list.

I decided to dig and found plenty of useful information provided by Linux experts in the Synology community (shout-out to them).

Here is what I have done:

Recovering the volume using the CLI. (Semi-success)

First, I wanted to check my RAID information:
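The output from this step is not reproduced here. As a minimal sketch, the kernel's software RAID status is normally readable from /proc/mdstat; the helper name `list_md_arrays` is hypothetical and only filters that file down to the per-array state lines.

```shell
# Print the state line of every md array listed in an mdstat-style file.
# The path defaults to /proc/mdstat; pass another file to try it offline.
list_md_arrays() {
    mdstat_file="${1:-/proc/mdstat}"
    # Array lines start with the device name, e.g. "md2 : active raid5 ..."
    grep -E '^md[0-9]+ :' "$mdstat_file"
}
```

On the NAS itself, calling `list_md_arrays` with no argument reads /proc/mdstat directly; a plain `cat /proc/mdstat` shows the same information with the member and block details included.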

I knew md0 and md1 (system + swap) were fine, but md2 (the actual data) was not behaving properly even though my disks were “fine”. The RAID 5 state was clean and the number of disks matched what I have (/dev/sd[abcde]3). Let’s look at the state of the RAID 5 array in more detail:
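The detailed output is not reproduced here. A rough sketch of how the array and its members can be inspected with mdadm follows; `--detail` reports on the assembled array and `--examine` reads the RAID superblock of each member partition. The `run` wrapper and the `RUN` variable are hypothetical safety helpers: by default the commands are only printed, and they execute only when `RUN=1` is set.

```shell
# Echo each command instead of running it unless RUN=1 is set
# (a hypothetical safety switch, not an mdadm feature).
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# Show the array summary, then the superblock of every member partition.
inspect_array() {
    array="$1"; shift
    run mdadm --detail "$array"
    for member in "$@"; do
        run mdadm --examine "$member"
    done
}

inspect_array /dev/md2 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3
```

Both commands only read metadata, so they are safe to run as root on the NAS before attempting anything that changes the array.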

The partitions looked good when I ran fdisk -l, so I tried to stop and reassemble the RAID 5 array.
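The exact commands are not shown here, so the following is only a sketch of a stop-and-reassemble sequence with mdadm, using the array and partitions named above. The `run` wrapper and `RUN` variable are hypothetical: the commands are printed by default and executed only with `RUN=1`. Plain `--assemble` is assumed; stronger options such as `--force` exist but are not assumed to have been needed.

```shell
# Echo each command instead of running it unless RUN=1 is set
# (a hypothetical safety switch, not an mdadm feature).
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# Stop an array, then reassemble it from the given member partitions.
reassemble_array() {
    array="$1"; shift
    run mdadm --stop "$array" &&
    run mdadm --assemble "$array" "$@"
}

reassemble_array /dev/md2 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3
```

Reviewing the printed commands first is cheap insurance when the only copy of the data sits on those disks.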

When I tried to mount my partition as mentioned in the above link from the Synology forums, I had the following issue:

So I stopped the RAID array, reassembled it again, and checked the status of my array with dmesg:
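The kernel messages themselves are not reproduced here. As a small sketch, the kernel log can be narrowed to the lines about the array and the filesystem journal; `filter_array_messages` is a hypothetical helper, and matching on `ext4` and `jbd2` assumes the data volume is ext4.

```shell
# Keep only kernel log lines that mention the array or the filesystem journal.
# md2 comes from the post; the ext4 and jbd2 patterns assume an ext4 volume.
filter_array_messages() {
    grep -iE 'md2|ext4|jbd2'
}

# Usage on the NAS: dmesg | filter_array_messages | tail -n 20
```

Piping through `tail` keeps the most recent messages, which are the ones produced by the latest assemble or mount attempt.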

Ah! The journal fails to load, so let’s try to mount the volume without loading the journal. (Do not plan to do this for long-term use of your NAS; my immediate concern was data recovery.)
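The mount command is not shown here. One option that fits a journal error is ext4's `noload` mount option combined with `ro`, which mounts the filesystem read-only without loading the journal. That the volume is ext4 and that this was the option used are both assumptions; /dev/md2 and /recovery come from the text. The `run` wrapper and `RUN` variable are hypothetical and only print the commands unless `RUN=1` is set.

```shell
# Echo each command instead of running it unless RUN=1 is set
# (a hypothetical safety switch, not a mount feature).
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# Mount a device read-only, skipping the ext4 journal (assumes ext4).
mount_without_journal() {
    device="$1"
    mountpoint="$2"
    run mkdir -p "$mountpoint" &&
    run mount -o ro,noload "$device" "$mountpoint"
}

mount_without_journal /dev/md2 /recovery
```

Skipping the journal means recent, uncommitted changes may be missing or inconsistent, which is why this suits a one-off copy of the data and not day-to-day use.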

The mount appeared to succeed, so let’s check whether I could see anything in the /recovery folder:

I could see my folders (some names are changed for obvious privacy reasons), but I was wondering how to retrieve my data now. The GUI couldn’t see a volume, and I couldn’t install any package on the volume in the GUI (because it didn’t see any), so I couldn’t use FTP at all.

I had another empty box running Linux, so I decided to do an rsync backup from the old NAS (failed volume) to another NAS.
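The rsync invocation is not shown here, so this is only a sketch of a copy over SSH from the mounted /recovery folder. The user, host name and destination path are hypothetical placeholders, and the `run` wrapper with its `RUN` variable is a hypothetical helper that prints the command unless `RUN=1` is set.

```shell
# Echo each command instead of running it unless RUN=1 is set
# (a hypothetical safety switch, not an rsync feature).
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# Copy the contents of a directory to a local or remote destination.
# -a preserves permissions and timestamps, -v lists files,
# -h prints human-readable sizes, --progress shows per-file progress.
copy_recovered_data() {
    source_dir="$1"    # trailing slash: copy the contents, not the directory itself
    destination="$2"
    run rsync -avh --progress "$source_dir" "$destination"
}

# Hypothetical destination: replace user, host and path with your own.
copy_recovered_data /recovery/ backup@backup-box.example:/volume1/recovered/
```

Because rsync skips files that are already up to date at the destination, the same command can simply be run again if a multi-terabyte transfer is interrupted.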

I am so happy I retrieved the delta of data and learned from my mistakes. I need to automate my backups, run them more frequently, and send them to more than one device. Now I just have to wait for a few TB of data to copy.

I am far from being a Linux guru, but I know my way around bash. Network engineers, you need to understand how Linux works. I had a very interesting conversation with Pete Lumbis a year ago at the Software-Defined Enterprise Conference & Expo about how to learn Linux. Pete and I had the same observation:

Most of the Linux courses I tried did not meet my expectations. I was quickly bored, even though I had to keep watching and reading to understand everything. In this case I preferred to dig into the technology by reading articles and getting my hands very dirty.

Nic