Intel Atom failures
I know the title sounds a bit weird, and you may ask why you would need to recover a RAID5 array when all your disks are healthy, right?
To understand what is going on: my DS1515+ has an Intel Atom C2538 CPU (source: Synology CPU / NAS Type). That CPU family recently caused a lot of issues in the IT industry (remember the Cisco clock issue? 🙂 ).
Erratum AVR54 in the C2000 specification update clearly states the following: “system may experience inability to boot or may cease operation”. My NAS was starting to reboot regularly, and it crashed completely before I could back up the last delta of data.
At first, Synology denied any abnormal failure rate on this specific hardware while still admitting a flaw (!). Synology then extended the warranty of all the NAS platforms affected by this hardware flaw.
Recovering the data using the GUI. (fail)
I immediately opened a case with Synology, who sent me another DS1515+ pretty quickly (I still had to pay for express shipping).
After I inserted my disks into the newly received NAS, I noticed that the new NAS was beeping and trying to recover my RAID5 array, without any luck. DSM told me that the RAID 5 array was down but all disks were healthy.

I waited until the parity check had finished to verify whether the Synology was silently trying to recover the volume. Unfortunately, after 10 hours nothing appeared in my volume list.
I decided to dig and found plenty of useful information provided by Linux experts in the Synology community (shout-out to them).
Here is what I have done:
Recovering the volume using the CLI. (Semi-success)
First, I wanted to check my RAID information:
```
admin@Syno-Home:/volume1$ cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid5 sda3[0] sde3[4] sdd3[3] sdc3[2] sdb3[1]
      11701777664 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]

md1 : active raid1 sda2[0] sdb2[1] sdc2[2] sdd2[3] sde2[4]
      2097088 blocks [5/5] [UUUUU]

md0 : active raid1 sda1[0] sdb1[1] sdc1[2] sdd1[3] sde1[4]
      2490176 blocks [5/5] [UUUUU]
```
I knew md0 and md1 (system + swap) were fine, but md2 (the actual data) was not behaving properly even though my disks were “fine”. The RAID 5 state is clean, and the number of disks matches what I have (/dev/sd[abcde]3). Let’s look at the state of the RAID 5 array in more detail:
```
admin@Syno-Home:/volume1$ sudo mdadm --detail /dev/md2
/dev/md2:
        Version : 1.2
  Creation Time : Wed Aug 19 16:28:10 2015
     Raid Level : raid5
     Array Size : 11701777664 (11159.69 GiB 11982.62 GB)
  Used Dev Size : 2925444416 (2789.92 GiB 2995.66 GB)
   Raid Devices : 5
  Total Devices : 5
    Persistence : Superblock is persistent

    Update Time : Thu May 24 21:49:28 2018
          State : clean
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : Syno-Home:2  (local to host Syno-Home)
           UUID : e55ddd73:47653974:e2639cbd:36b06604
         Events : 4660

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       8       19        1      active sync   /dev/sdb3
       2       8       35        2      active sync   /dev/sdc3
       3       8       51        3      active sync   /dev/sdd3
       4       8       67        4      active sync   /dev/sde3
```
The partitions looked good when I ran fdisk -l, so I tried to stop and reassemble the RAID 5 array.
```
ash-4.3# mdadm --stop /dev/md2
mdadm: stopped /dev/md2
ash-4.3# mdadm --assemble --force --run /dev/md2 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3 -v
mdadm: looking for devices for /dev/md2
mdadm: /dev/sda3 is identified as a member of /dev/md2, slot 0.
mdadm: /dev/sdb3 is identified as a member of /dev/md2, slot 1.
mdadm: /dev/sdc3 is identified as a member of /dev/md2, slot 2.
mdadm: /dev/sdd3 is identified as a member of /dev/md2, slot 3.
mdadm: /dev/sde3 is identified as a member of /dev/md2, slot 4.
mdadm: added /dev/sdb3 to /dev/md2 as 1
mdadm: added /dev/sdc3 to /dev/md2 as 2
mdadm: added /dev/sdd3 to /dev/md2 as 3
mdadm: added /dev/sde3 to /dev/md2 as 4
mdadm: added /dev/sda3 to /dev/md2 as 0
mdadm: /dev/md2 has been started with 5 drives.
```
When I tried to mount my partition as mentioned in the above link from the Synology forums, I ran into the following issue:
```
ash-4.3# mount -o ro /dev/md2 /recovery
mount: wrong fs type, bad option, bad superblock on /dev/md2,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.

ash-4.3#
ash-4.3# dmesg | tail
[ 4343.120646] md/raid:md2: raid level 5 active with 5 out of 5 devices, algorithm 2
[ 4343.129021] RAID conf printout:
[ 4343.129024]  --- level:5 rd:5 wd:5
[ 4343.129027]  disk 0, o:1, dev:sda3
[ 4343.129030]  disk 1, o:1, dev:sdb3
[ 4343.129032]  disk 2, o:1, dev:sdc3
[ 4343.129034]  disk 3, o:1, dev:sdd3
[ 4343.129037]  disk 4, o:1, dev:sde3
[ 4343.129083] md2: detected capacity change from 0 to 11982620327936
[ 4343.136496]  md2: unknown partition table
```
So I stopped the RAID array, reassembled it again, and checked the status of my array with dmesg:
```
ash-4.3# dmesg | tail
[ 4343.129032]  disk 2, o:1, dev:sdc3
[ 4343.129034]  disk 3, o:1, dev:sdd3
[ 4343.129037]  disk 4, o:1, dev:sde3
[ 4343.129083] md2: detected capacity change from 0 to 11982620327936
[ 4343.136496]  md2: unknown partition table
[ 4409.403524] EXT4-fs (md2): INFO: recovery required on readonly filesystem
[ 4409.411127] EXT4-fs (md2): write access will be enabled during recovery
[ 4409.419424] EXT4-fs (md2): barriers disabled
[ 4409.774205] JBD2: journal transaction 23508701 on md2-8 is corrupt.
[ 4409.781228] EXT4-fs (md2): error loading journal
ash-4.3#
```
AH! The journal is corrupt and fails to load, so let’s try mounting the filesystem without replaying the journal (do not plan on doing that for long-term use of your NAS; my immediate concern was data recovery).
```
ash-4.3# mount -o ro,noload /dev/md2 /recovery
ash-4.3#
```
The mount seemed to succeed, so let’s check whether I could see anything in the /recovery folder:
```
ash-4.3# ls
@appstore     @autoupdate  @database   yyyyyyy  lost+found  NetBackup  yyyy     @smbd.core             synoquota.db     @tmp
aquota.group  Backup       yyyy        @eaDir   Media       Nicolas    @S2S     @synoaudiod.core       @syslog-ng.core  video
aquota.user   @cloudsync   @download   @iSCSI   music       photo      yyyyyyy  @SYNO.FileStatio.core  Temp
```
I could see my folders (some names are changed for obvious privacy reasons), but I was wondering how to retrieve my data now… The GUI couldn’t see a volume, and I couldn’t install any package on the volume through the GUI (because it didn’t see one), so I couldn’t use FTP at all.
I had another empty box with Linux running on it, so I decided to do an rsync backup from the old NAS (failed volume) to another NAS.
```
ash-4.3# rsync -gloptruv /volume1/yyyy/ admin@IP:/volume1/Backup
The authenticity of host 'IP (IP)' can't be established.
ECDSA key fingerprint is SHA256:SHA256ASH:)
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'IP' (ECDSA) to the list of known hosts.
admin@IP's password:
Could not chdir to home directory /var/services/homes/admin: No such file or directory
sending incremental file list
.file.
.file.
.file.
.file.
.file.
.file.
.file.
.file.
.file.
.file.
.file.
```
I am so happy I retrieved the delta of data, and I learned from my mistakes: I need to automate my backups, run them more frequently, and keep them on more than one device. Now I just have to wait for a few TB of data to copy.
I am far from being a Linux guru, but I know my way around bash. Network engineers, you need to understand how Linux works. I had a very interesting conversation with Pete Lumbis a year ago at the Software-Defined Enterprise Conference & Expo about how to learn Linux… Pete and I made the same observation:
Most of the Linux courses I tried did not meet my expectations; I got bored quickly, even though I had to keep watching/reading to understand everything. In this case, I preferred to dig into the technology by reading articles and getting my hands very dirty.
Nic