
Recover a RAID5 Array on Linux with healthy disks

Intel Atom failures

I know the title sounds a bit weird, and you may ask why you would need to recover a RAID5 array when all your disks are healthy, right?

To understand what is going on: my DS1515+ is built around an Intel Atom C2538 (source: Synology CPU / NAS Type), a CPU that recently caused a lot of issues in the IT industry (remember the Cisco clock issue? 🙂 ).

Erratum AVR54 of the C2000 specification update clearly states the following: “system may experience inability to boot or may cease operation”. My NAS had started to reboot regularly, and it crashed completely before I could back up the last delta of data.

At first, Synology denied any abnormal failure rate on this specific hardware while admitting a flaw (!). It then extended the warranty of all the NAS platforms affected.

 

Recovering the data using the GUI. (Fail)

I immediately opened a case with Synology, who sent me another DS1515+ pretty quickly (I still had to pay for express shipping, though).

After I inserted my disks into the newly received NAS, I noticed that it was beeping and trying to recover my RAID 5 array, without any luck. DSM told me that the RAID 5 array was down but all disks were healthy.


 

I waited until the parity check was performed to verify whether the Synology was silently trying to recover the volume. Unfortunately, after 10 hours, nothing appeared in my volume list.

I decided to dig around and found plenty of useful information provided by Linux experts in the Synology community (shout-out to them).

Here is what I did:

Recovering the volume using the CLI. (Semi-success)

First, I wanted to check my RAID information:
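On DSM this boils down to the stock Linux mdadm tooling over SSH; a minimal sketch of the check (run as root) looks like this:

    # List every md array with its member disks and current state
    cat /proc/mdstat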

I knew md0 and md1 (system + swap) were fine, but md2 (actual data) was not behaving properly even though my disks were “fine”. The RAID 5 state is clean and the number of disks matches what I have (/dev/sd[abcde]3). Let’s look at the state of the RAID 5 array in more detail:
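Something along these lines gives the detailed view (assuming the data array is /dev/md2, as on my unit):

    # Show the detailed state, member disks and event counters of the array
    mdadm --detail /dev/md2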

The partitions looked good when I ran fdisk -l, so I tried to stop and reassemble the RAID 5 array:
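The stop/reassemble step is roughly the following (a sketch; the /dev/sd[abcde]3 members match my 5-disk layout, adapt them to yours):

    # Stop the broken array, then force-reassemble it from its member partitions
    mdadm --stop /dev/md2
    mdadm --assemble --force /dev/md2 /dev/sd[abcde]3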

When I tried to mount my partition as mentioned in the above link from the Synology forums, I had the following issue:
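For reference, the failing attempt was roughly this (/recovery is simply an empty directory I used as a mount point):

    # Create a scratch mount point and try to mount the reassembled array
    mkdir -p /recovery
    mount /dev/md2 /recovery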

So I stopped the RAID array, reassembled it once more, and checked the status of my array with dmesg:
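Again, a sketch of the idea:

    # Stop and force-reassemble once more, then look at the kernel messages
    mdadm --stop /dev/md2
    mdadm --assemble --force /dev/md2 /dev/sd[abcde]3
    dmesg | tail -n 30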

AH! The journal has an issue when loading, so let’s try to mount the volume despite the journal problem. (Do not plan on doing that for long-term use of your NAS; my immediate concern was data recovery.)
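A common recovery trick for this situation (and my assumption of what applies here) is to mount the ext4 filesystem read-only while telling the kernel not to load the damaged journal:

    # Mount read-only and skip the damaged ext4 journal (data recovery only!)
    mount -t ext4 -o ro,noload /dev/md2 /recovery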

The mount seemed to have worked, so let’s check if I could see something in the /recovery folder:
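A simple listing is enough for that:

    # Sanity check: list the top-level folders of the recovered volume
    ls -l /recovery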

I could see my folders (some names are changed for obvious privacy reasons), but I was wondering how to retrieve my data now… The GUI couldn’t see a volume, and I couldn’t install any package on the volume in the GUI (because it didn’t see any), so I couldn’t FTP at all.

Luckily, I had another empty box with Linux running on it, so I decided to run an rsync backup from the old NAS (failed volume) to that other box:
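The copy itself is a plain rsync over SSH; the hostname and destination path below are placeholders:

    # Push the recovered data to the other Linux box over SSH
    # (backup-box and /backup/ds1515 are hypothetical names)
    rsync -avh --progress /recovery/ root@backup-box:/backup/ds1515/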

I am so happy I retrieved the delta of data, and I learned from my mistakes: I need to automate the backups, run them more frequently, and keep them on more than one device. Now I just have to wait until a few TB of data are copied.

I am far from being a Linux guru, but I know my way around bash. Network engineers, you need to understand how Linux works. I had a very interesting conversation with Pete Lumbis a year ago at the Software-Defined Enterprise Conference & Expo about how to learn Linux… Pete and I made the same observation:

Most of the Linux courses I tried did not meet my expectations; I was quickly bored, even though I kept watching/reading to understand everything. In this case, I preferred to dig into the technology by reading articles and getting my hands very dirty.

Nic

Hyper-converged infrastructure – Part 1: Is it a real thing?

Recently I was lucky enough to play with Cisco Hyperflex in a lab, and since it was fun to play with, I decided to write a basic blog post about the hyper-converged infrastructure concept (experts, you can move along and read something else 🙂 ). It has really piqued my interest. I know I may be late to the game, but better late than never, right? 🙂

Legacy IT Infrastructure

Back in the day, you had to maintain separate silos to run a complete infrastructure (it is still true, by the way, but networks, servers, and storage are progressively merging into a single IT platform… sorry, I meant “cloud”):

  • Compute (System and Virtualization)
  • Storage
  • Network (Network and Security)
  • Application

You had to install and maintain multiple sub-infrastructures in order to run the IT services of your company.

If you wanted to deploy a greenfield infrastructure for your data center, here is a brief summary of what you needed:

  • Physical servers (Owners: System team)
  • Hypervisors (Owners: System team)
  • Operating system (Owners: System team) 
  • Network infrastructure (Owners: Network team)
    • Routing – Switching
    • Security (VPN, Cybersecurity)
    • Load Balancers
  • Storage arrays (Owners: Storage team)
  • Applications for the business to run. (Owners: IT applications team)

Each silo has its own experts and its own language (LUN + FLOGI vs GPO + AD vs OSPF, BGP, and TLS). As you can guess, provisioning new applications and services was complicated and time-consuming for any business (even in a brownfield IT environment). Once everything was running, the IT team was in charge of maintaining the infrastructure, and one of the drawbacks was dealing with several manufacturers (and potentially partners) to do so.

Converged Infrastructure and simplification

In the late 2000s, major manufacturers saw an opportunity to reduce the complexity of the complete data center stack, and converged infrastructure was born.

With the emergence of cloud applications, EMC and Cisco created a joint venture, Acadia, which was later renamed VCE (for VMware, Cisco, EMC). The purpose of that company was to sell converged infrastructure products, and Vblock was the flagship product. You could buy an already provisioned rack, customized according to your preferences. The Vblock was composed of the following individual products:

  • Storage Array: EMC VNX/VMAX 
  • Storage Networking: Cisco Nexus, Cisco MDS
  • Servers: Cisco UCS C or UCS B
  • Networking: Cisco Nexus
  • Virtualization: vSphere

VCE was in charge of configuring (or customizing, I should say) the Vblock according to your needs and preferences.

Once the rack was delivered, you “just” had to plug it into your data center networking infrastructure and everything would be connected. Servers were ready to be deployed.

Going that way, you could save time and trouble. Agility is also a big selling point for these kinds of architectures. 

As you can see, the footprint of these products was still substantial. In this case, you had to deal with a single manufacturer, but the main drawback was product flexibility: you could not install just any version on your Cisco Nexus, because VCE was very strict about supported versions.

Hyper-converged Infrastructure and horizontal scaling

Hyper-converged is a term that has been around since 2012. The main difference between converged and hyper-converged infrastructure is definitely the storage:

  • Converged infrastructure:
    • Centralized array accessible over a traditional storage network (FC with FSPF, or iSCSI/NFS)
  • Hyper-converged infrastructure:
    • Distributed drives in each server, forming a centralized file system.

A hyper-converged system is built to be adaptable: it scales horizontally while reducing the footprint by a significant amount. If you just want to try it, perform a setup with a few hosts; if the solution works for you, add nodes to the cluster horizontally and you will increase both performance and redundancy. This way, you can consolidate your compute and storage infrastructure.

Horizontal scaling is a familiar concept for many network engineers (Clos fabrics, anyone?).

In my opinion, it is a natural evolution of the Data Center compute and storage infrastructure.

There are several “hyper-converged” manufacturers on the market.

My next post will be about deploying a Cisco Hyperflex infrastructure.

Thanks for reading!