
Recover a RAID5 Array on Linux with healthy disks

Intel Atom failures

I know the title sounds a bit weird, and you may ask why you would need to recover a RAID5 array when all your disks are healthy, right?

To understand what is going on, you need to know that my DS1515+ has an Intel Atom C2538 (source: Synology CPU / NAS Type). This CPU family recently caused a lot of issues in the IT industry (remember the Cisco clock issue? 🙂).

Erratum AVR54 in the C2000 Specification Update clearly states the following: “system may experience inability to boot or may cease operation”. My NAS started rebooting regularly, and it crashed completely before I could back up the last delta of data.

At first, Synology denied any abnormal failure rate on this specific hardware while admitting a flaw (!). Synology then extended the warranty of all the NAS platforms affected by this hardware flaw.

 

Recovering the data using the GUI. (fail)

I immediately opened a case with Synology, who sent me another DS1515+ pretty quickly (I still had to pay for express shipping).

After I inserted my disks into the newly received NAS, I noticed that it was beeping and trying to recover my RAID5 array without any luck. DSM told me that the RAID 5 array was down but that all disks were healthy.


 

I waited until the parity check had finished to see whether the Synology was silently trying to recover the volume. Unfortunately, after 10 hours, nothing appeared in my volume list.

I decided to dig and found plenty of useful information provided by a Linux expert in the Synology community (shout-out to him/her).

Here is what I have done:

Recovering the volume using the CLI. (Semi-success)

First I wanted to check my raid information:
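The output from this step is not reproduced here. As a minimal sketch, assuming a standard Linux md setup, the kernel's view of the arrays can be read from /proc/mdstat. The helper name list_md_arrays is hypothetical, and the optional file argument exists only so the function can be tried on a saved copy.

```shell
# Hypothetical helper: print the summary line of each md array.
# Reads /proc/mdstat by default; pass another path to inspect a saved copy.
list_md_arrays() {
    mdstat_file="${1:-/proc/mdstat}"
    # Array summary lines start with the device name, e.g. "md2 : active raid5 ..."
    grep -E '^md[0-9]+ :' "$mdstat_file"
}

# Example: list_md_arrays
```

Running plain `cat /proc/mdstat` gives the same information with the per-array block counts and member status included.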

I knew md0 and md1 (system + swap) were fine but md2 (actual data) was not behaving properly even though my disks were “fine”. The RAID 5 state was clean and the number of disks matched what I have (/dev/sd[abcde]3). Let’s look at the state of the RAID 5 array in more detail:
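The exact command used for this step is not shown. A rough sketch of how the detail could be gathered with mdadm follows; the helper name inspect_array is hypothetical, and the device names in the example come from the text above.

```shell
# Hypothetical helper: show the array-level view, then the member superblocks.
# Usage: inspect_array ARRAY MEMBER...
inspect_array() {
    array="$1"
    shift
    # Array state, member count and which devices are active or missing
    mdadm --detail "$array"
    # Superblock of each member partition (event counts, device role)
    mdadm --examine "$@"
}

# Example (needs root):
# inspect_array /dev/md2 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3
```

Comparing the event counts reported by `--examine` across members is a common way to spot a partition that has fallen out of sync with the rest.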

The partitions looked good when I ran fdisk -l, so I tried to stop and reassemble the RAID 5 array.
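The commands for this step are not preserved, so what follows is only a sketch of a stop-and-reassemble sequence with mdadm. The helper name reassemble_array is hypothetical, and whether extra options such as --force were needed in this case is not known.

```shell
# Hypothetical helper: stop an md array and assemble it again from named members.
# Usage: reassemble_array ARRAY MEMBER...
reassemble_array() {
    array="$1"
    shift
    # Release the array; this fails if a filesystem on it is still mounted
    mdadm --stop "$array" || return 1
    # Assemble the array again from the listed member partitions
    mdadm --assemble --verbose "$array" "$@"
}

# Example (needs root):
# reassemble_array /dev/md2 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3
```

Naming the members explicitly, rather than letting mdadm scan for them, keeps the operation limited to the partitions you have already checked.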

When I tried to mount my partition as mentioned in the above link from the Synology forums, I had the following issue:

So I stopped the RAID array, reassembled it again, and checked with dmesg what the status of my array was:
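The dmesg output itself is not reproduced here. As a small sketch, the kernel log can be narrowed down to the lines that matter for this kind of problem; the helper name filter_storage_messages and the choice of patterns are assumptions.

```shell
# Hypothetical helper: keep only kernel log lines that mention md arrays,
# ext3/ext4 or the jbd2 journal layer. Reads log text on standard input.
filter_storage_messages() {
    grep -iE 'md[0-9]+|ext[34]|jbd2'
}

# Example: dmesg | filter_storage_messages | tail -n 20
```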

Aha! The journal has an issue when loading, so let’s try to mount it and load the journal. (Do not plan on doing this for long-term use of your NAS; my immediate concern was data recovery.)
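The mount command used here is not shown, so the following is an assumption rather than a record of what was run. One option that exists for ext4 when the journal cannot be loaded is a read-only mount with noload, which mounts the filesystem without loading the journal. The sketch assumes the filesystem is ext4 and sits directly on /dev/md2 (no LVM layer); the helper name is hypothetical, and /recovery is the mount point named in the text.

```shell
# Hypothetical helper: mount an ext4 filesystem read-only without loading its journal.
# Usage: mount_readonly_nojournal DEVICE MOUNTPOINT
mount_readonly_nojournal() {
    device="$1"
    mountpoint="$2"
    mkdir -p "$mountpoint" || return 1
    # ro: never write to the damaged volume; noload: do not load the journal
    mount -t ext4 -o ro,noload "$device" "$mountpoint"
}

# Example (needs root): mount_readonly_nojournal /dev/md2 /recovery
```

Because the journal is skipped, recent writes may be missing or inconsistent, which is why this only makes sense for copying data off the volume.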

The mount seemed to have worked, so let’s check whether I can see anything in the /recovery folder:

I could see my folders (some names are changed for obvious privacy reasons), but I was wondering how to retrieve my data now. The GUI couldn’t see a volume, and I couldn’t install any package on the volume in the GUI (because it didn’t see any), so I couldn’t use FTP at all.

I had another empty box running Linux, so I decided to do an rsync backup from the old NAS (failed volume) to another NAS.
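The rsync invocation is not shown either. A minimal sketch of such a copy, assuming rsync is available on both machines and the destination is reachable over SSH, could look like this; the helper name, the user and the destination path in the example are hypothetical.

```shell
# Hypothetical helper: copy the contents of a directory to another host.
# Usage: copy_to_backup SOURCE_DIR DEST   (DEST like user@host:/path)
copy_to_backup() {
    src="$1"
    dest="$2"
    # -a: preserve permissions, times and symlinks; -v: list files as they go;
    # --partial: keep partially transferred files so an interrupted run can resume
    rsync -av --partial --progress "${src%/}/" "$dest"
}

# Example: copy_to_backup /recovery backup@192.0.2.10:/volume1/restore
```

The trailing slash added to the source makes rsync copy the contents of /recovery rather than creating a nested recovery directory on the destination.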

I am so happy I retrieved the delta of data and learned from my mistakes. I need to back up automatically, more frequently, and to more than one device. Now I just have to wait until a few TB of data have been copied.

I am far from being a Linux guru but I know my way around bash. Network engineers, you need to understand how Linux works. I had a very interesting conversation with Pete Lumbis a year ago at the Software-Defined Enterprise Conference & Expo about how to learn Linux. Pete and I had the same observation:

Most of the Linux courses I tried did not meet my expectations. I was quickly bored even though I had to keep watching/reading to understand everything. In this case, I preferred to dig into the technology by reading articles and getting my hands very dirty.

Nic

From Network Engineer v1.0 to v2.0

I recently relocated to the US from France/Switzerland and I have been so busy the past 2 years working on that process. Yes, it takes that long!

I have been asked about career advice twice this week and I wanted to share my thoughts about it.

Networking in 2008

I think we all agree that the networking field has been very static for the past 15 years. One of the ways to provide a better network experience to the users/applications was to add more bandwidth (or invest in WAN optimization). OSPF/BGP/EIGRP/MPLS and spanning tree haven’t changed much since 2002, right?

 
The paradigm of all the networking manufacturers was about releasing new hardware that could provide more bandwidth and availability. As engineers, we had to know networking protocols but we also had to understand the specifics of networking hardware. It was very useful to understand how the 6500 Crossbar was switching packets internally. Another example was the StackWise technology: who remembers that the 3750 v2 could not locally switch without sending packets on the ring?

Every device had a specific function in the network (which is still true to some extent). Engineers were doing what vendors told them to do and they had to standardize their deployment (Access – Distribution – Core). It was a safe bet to design a network using the 3-tier architecture mentioned previously.

 

Some networking engineers are self-educated up to a certain point, and one of the ways to learn networking back in the day was to read a Cisco Press book, buy some hardware (2950 – 3600) on eBay and do some labs on your own or through a third-party training company. For these engineers, the way to get a job was to climb the traditional certification pyramid (CCENT – CCNA – CCNP – CCIE). While this is still kinda relevant, the CCIE does not automatically open doors to any job anymore. Matt Oswalt published a quote that makes total sense: “vendor certs are basically a way of putting the vendor in control of your career. On the other hand, fundamental knowledge puts YOU in control”.

I have a dual CCIE and studied very hard to get where I am today but the journey is far from being over (hopefully). I need to be a little less focused on proprietary certifications and get some open source knowledge as well. (Damn CCDE, you are tempting, but I need to resist!)

Linux/Python skills were definitely not mandatory in any of the job descriptions back in the day. But as you can guess, they are becoming more and more of a requirement nowadays.

I was recently invited to a very interesting dinner with CIOs of Fortune 100 companies. They are all aware of the ongoing networking transition. They admitted it was not an easy plan to embrace this evolution but they are already preparing their teams for it.

Speaking of technologies, which ones are we talking about? Do we need to know everything in IT? The answer is obviously “No”, but it is valuable to at least understand how all the systems interconnect.

Here is what a job description looked like back in the day (2008):

 

The need for evolution

I am writing this blog post because our field is changing and our skills need to evolve with the networking trends. Engineers are the core of the networking industry. We all have a critical function in every organization that is willing to undertake its “business digital” transformation. We need to prepare ourselves to evolve with the upcoming technologies.
I am willing to create a blog post series on how to tackle your own networking evolution. Please do not get me wrong, we still need to understand the bits and bytes of all the networking protocols in order to provide connectivity. This will never go away (hopefully), and there is no working overlay unless the underlay has been designed carefully. What needs to evolve is the way we provision services for our customers/users/applications. When was the last time you heard that the networking team was taking too long to provide connectivity between A and B?

 

Networking in 2014+

Short story long, network engineers have to stay relevant throughout the years. 

Today it would be a bit different: you are definitely still expected to know everything above, right? (Except maybe CiscoWorks and CatOS 🙂)

Himawan Nugroho made a great Cisco Live presentation that I attended in Milan: BRKSDN-4005 – CCIE Skill transformation to SDN kungfu. The most interesting slide for me is the following one: 

 

He confirmed what I was explaining above. You still have to be an expert at traditional routing/switching but also have a broader knowledge of the following technologies:  Linux and Operating Systems, Scripting, Overlays (proprietary and standards) and network virtualization. 

Some new protocols and ways to provide network connectivity have recently emerged. Some of them are already dead (TRILL anyone?) and others are being used worldwide in different flavors (VXLAN anyone?).

We see plenty of blog posts about the eternal question: should we learn how to script/code?

My take on this is that you should be able to automate your network and most of your tasks. You should not consider going too deep (for now). We are not required to become full-time developers.

Some of the items on this list are not necessarily new, but network engineers can no longer avoid being aware of them. This is by no means an exhaustive list but it gives you an indication of what the current trends are in our industry. Feel free to drop a comment if you think something valuable should be added.


Acquiring all of these skills does not happen overnight, so I will publish quite a few blog posts about how I am preparing my own evolution. Let me know in the comments below what you liked or disliked, or if you have any questions.

Nic