All posts by Nicolas Michel

Creating a Net-DevOps environment.

TL;DR: Code is here. Help yourself 🙂

Introduction to the Net-DevOps Container:

Recently, Ethan Banks posted a very interesting blog post in which he struggled a little bit to set up a Python environment. If I understood correctly, he wanted to grow his skill set, in particular around NetDevOps. He fairly pointed out that it can be a bit complicated to handle all the dependencies you might need. Since you also want to start fresh between projects, you want to spend the least amount of time resolving these kinds of issues and maximize your time on something that is valuable: learning how to automate, or just automating, your network environment.

Credit: xkcd #1987, inspired by Ethan’s site, to illustrate my point.

A lot of people are in a similar situation and are not sure how to start their journey toward DevOps / full-stack engineering. The first lessons of Ivan Pepelnjak’s great Network Automation course invite us to create a lab environment so that we can practice safely (aka “don’t mess with your prod”)… You can use EVE-NG / GNS3 / VIRL / Vagrant to emulate the devices, but I also wanted an environment where I could run the code… and not have to rely on my corporate laptop’s OS.

I was in that situation a few weeks ago and decided to set up an environment that would meet the following requirements:

  • Must be able to run scripts.
  • Changes to that environment must be quick and easy to apply; we don’t have much time to troubleshoot dependencies.
  • Must let me edit code in my favorite code editor rather than vim (sorry guys… my beard is not long enough 🙂).
  • Must be able to run on any machine (PC / Mac / server) so that I get the same behavior everywhere.

All these requirements led me to the wonderful world of Docker!

By definition, Docker is a tool engineered and released with one goal in mind: create and deploy applications using containers. It was natural to add Docker to the NetDevOps portfolio and to create a “NetDevOps container” that would allow me to work with efficiency in mind. I wasn’t very familiar with it until last year, so it was also a good opportunity to learn it.

To build a container, you need to choose a base image, and I decided to go with Ubuntu 18.04 because I am familiar with it and have enjoyed it throughout the years. Ubuntu is very popular across the world and its community is one of the best (if not the best).

Then I installed all the regular, well-known Linux tools that I use as a network engineer: fping/hping, curl, htop, iperf, netcat, nmap, openssh-client, snmpwalk (yeah!), tcpdump, tshark, telnet (!), wget, vim and zsh. I am pretty sure you have used them at least once as well…

From a NetDevOps standpoint, I installed most of what I needed as well: Python 2, Python 3, PowerShell (for NSX), pip, and Ansible 2.7.4. Libraries are not left behind; they are critical and mandatory for your scripts to run. We are network engineers, not full-time developers (or real developers, I should say), so chances are we will use the same libraries over and over again (e.g. netmiko, napalm, nornir, xmltodict, PyYAML…). Hank from Cisco DevNet has released an awesome video that demonstrates the useful libraries a network engineer should know. I have listed these libraries in a requirements.txt file that is copied and installed with pip when the Docker image is built. There is still some work to be done to configure Ansible at this point, but I have mostly what I need…
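
To make this a bit more concrete, here is a minimal sketch of what such a Dockerfile can look like; the package list and the pip step are illustrative rather than a copy of the actual file in my repository:

    # Minimal sketch, not the actual Dockerfile from the repo
    FROM ubuntu:18.04

    # Avoid interactive prompts during apt installs (tshark asks a question)
    ENV DEBIAN_FRONTEND=noninteractive

    # Classic networking tools mentioned above (illustrative package list)
    RUN apt-get update && apt-get install -y \
        curl fping htop iperf netcat nmap openssh-client \
        snmp tcpdump telnet tshark vim wget zsh \
        python python3 python3-pip

    # NetDevOps Python libraries (netmiko, napalm, nornir, ...) listed in requirements.txt
    COPY requirements.txt /tmp/requirements.txt
    RUN pip3 install -r /tmp/requirements.txt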

Demonstration:

First, I need to build the image (I use the term “bake” when I talk about containers, as it helps neophytes understand better) so that I can consume it.
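
Baking boils down to a docker build run from the directory that contains the Dockerfile. The image name below is just an example tag; pick whatever you like:

    $ docker build -t netdevops:latest .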

Now that you have baked (built) the image using the recipe (the Dockerfile), you are ready to consume the container.
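
Running it is a single docker run. Here is the kind of command I use, with the volume mapping described a bit further down so that code edited on the laptop shows up inside the container; the paths, user name and image tag are examples to adapt to your own setup:

    $ docker run -it --rm -v ~/projects:/home/nic/projects netdevops:latest zsh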

We are now in the container and ready to automate! We have access to our networking tools as well as our NetDevOps tools.

Obviously, if you want to use this particular container, you should probably change a few things to accommodate your needs. For example, this container creates a user ‘nic’ with a home directory of the same name… you might want to change that. Also, I mapped a folder from my laptop into the container so that I could use my laptop’s editor to work on my code but execute it in the container.

I am still far from being an expert in NetDevOps, so if you have suggestions or comments that could help me improve this, please let me know!

I have also uploaded a series of videos of this work here. Refer to videos 3 and 4 to see it in action.

Recover a RAID5 Array on Linux with healthy disks

Intel Atom failures

I know the title sounds a bit weird, and you may ask why you would need to recover a RAID5 array when all your disks are healthy, right?

To understand what is going on: my DS1515+ has an Intel Atom C2538 (source: Synology CPU / NAS Type), a CPU that recently caused a lot of issues in the IT industry (remember the Cisco clock issue? 🙂).

Erratum AVR54 of the C2000 specification update clearly states the following: “system may experience inability to boot or may cease operation”. My NAS started rebooting regularly and completely crashed before I could back up the last delta of data.

At first, Synology denied any abnormal failure rate on this specific hardware while admitting a flaw (!). Synology then extended the warranty of all the NAS platforms affected by this hardware flaw.

 

Recovering the data using the GUI. (fail)

I immediately opened a case with Synology, who sent me another DS1515+ pretty quickly (I still had to pay for express shipping).

After I inserted my disks into the newly received NAS, I noticed that it was beeping and trying to recover my RAID5 array, without any luck. DSM told me that the RAID 5 array was down but all disks were healthy.


 

I waited until the parity check had completed to see whether the Synology was silently trying to recover the volume. Unfortunately, after 10 hours, nothing appeared in my volume list.

I decided to dig in and found plenty of useful information provided by Linux experts in the Synology community (shout-out to them).

Here is what I have done:

Recovering the volume using the CLI. (Semi-success)

First, I wanted to check my RAID information:
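
On the Synology this means opening an SSH session as root and looking at /proc/mdstat, which gives a one-screen summary of every md array (I am only showing the command here, not my original output):

    # cat /proc/mdstat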

I knew md0 and md1 (system + swap) were fine, but md2 (the actual data) was not behaving properly even though my disks were “fine”. The RAID 5 state was clean and the number of disks matched what I have (/dev/sd[abcde]3). Let’s look at the state of the RAID 5 array in more detail:
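
That detail comes from mdadm (md2 being the data array on this 5-bay model; the device name may differ on yours):

    # mdadm --detail /dev/md2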

The partitions looked good when I ran fdisk -l, so I tried to stop and reassemble the RAID 5 array.
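
Stopping and reassembling looked roughly like this; the --force flag is my assumption of what it takes to bring a not-quite-clean array back together, so double-check against the forum thread before running it on your own data:

    # mdadm --stop /dev/md2
    # mdadm --assemble --force /dev/md2 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3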

When I tried to mount my partition as mentioned in the above link from the Synology forums, I had the following issue:

So I stopped the RAID array, reassembled it again, and checked with dmesg what the status of my array was:

Ah! The journal has an issue when loading, so let’s try to mount it and load the journal. (Do not plan on this for long-term use of your NAS; my immediate concern was data recovery.)

The mount seems to have worked, so let’s check whether I can see something in the /recovery folder:

I could see my folders (some names are changed for obvious privacy reasons), but I was wondering how to retrieve my data now… The GUI couldn’t see a volume and I couldn’t install any package on the volume from the GUI (because it didn’t see any), so I couldn’t use FTP at all.

So I had another empty box with Linux running on it, and I decided to do an rsync backup from the old NAS (failed volume) to another NAS.
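
The copy itself was a plain rsync from the mounted /recovery folder to the other box over SSH; the hostname and destination path below are placeholders:

    # rsync -avh --progress /recovery/ backup-box:/volume1/recovery-backup/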

I am so happy I retrieved the delta of data, and I learned from my mistakes: I need to automate my backups, run them more frequently, and keep them on more than one device. Now I just have to wait while a few TB of data are copied.

I am far from being a Linux guru, but I know my way around bash. Network engineers, you need to understand how Linux works. I had a very interesting conversation with Pete Lumbis a year ago at the Software-Defined Enterprise Conference & Expo about how to learn Linux… Pete and I made the same observation:

Most of the Linux courses I tried did not meet my expectations; I was quickly bored, even though I kept watching/reading to understand everything. In this case I preferred to dig into the technology by reading articles and getting my hands very dirty.

Nic

Sorting lists in Python

During my Python studies, I came across something that didn’t make much sense to me so I had to learn and investigate (with the help of experts).

What you can usually do in Python is modify a variable and assign the result back to the same variable. Because a piece of code is usually worth much more than an explanation:
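
A trivial example of that pattern, straight from the interactive interpreter (the variable is made up):

    >>> hostname = "SW-CORE-01"
    >>> hostname = hostname.lower()
    >>> hostname
    'sw-core-01'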

When you want to sort a list, that behavior is a bit different:

Let’s pretend I have a list of ARP entries from my switch:

If I want to sort it and reassign the value to the previously used variable, I would use this code (let’s pretend arp_entries is my variable that contains all these entries):
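
Here is what that looks like with a made-up list of entries standing in for my real ARP table. Notice the surprise at the end: list.sort() returns None, so reassigning its result throws the list away:

    >>> arp_entries = ["10.1.1.20", "10.1.1.5", "10.1.1.12"]
    >>> arp_entries = arp_entries.sort()
    >>> print(arp_entries)
    None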

According to the official Python documentation, Python lists have a built-in list.sort() method that modifies the list in place. Let’s verify this:
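
Verifying with the same sample list (note that these entries are strings, so the sort is lexicographic rather than numeric):

    >>> arp_entries = ["10.1.1.20", "10.1.1.5", "10.1.1.12"]
    >>> arp_entries.sort()
    >>> arp_entries
    ['10.1.1.12', '10.1.1.20', '10.1.1.5']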

There is also a sorted() function that can do the job if you want to keep the original list intact:
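
With sorted(), the original list stays untouched and a new sorted list is returned:

    >>> arp_entries = ["10.1.1.20", "10.1.1.5", "10.1.1.12"]
    >>> sorted(arp_entries)
    ['10.1.1.12', '10.1.1.20', '10.1.1.5']
    >>> arp_entries
    ['10.1.1.20', '10.1.1.5', '10.1.1.12']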

I was testing this because I am currently working through the free Python class run by Kirk Byers at https://pynet.twb-tech.com/. I strongly recommend that course if you have very little programming experience. I will talk about it in a future blog post, but in the meantime, have a look at Kirk’s website. It’s awesome!

Thanks to Kirk, Nicholas Russo and Greg Mueller for the hints and help provided on the Network to Code Slack (run by Jason Edelman).

Nic

Hyper-converged infrastructure – Part 2: Planning a Cisco HyperFlex deployment

I recently got the chance to deploy a Cisco HyperFlex solution composed of 3 Cisco HX nodes in my home lab. As a result, I wanted to share my experience with this technology, which is new to me. If you do not really know what all the “hyperconverged infrastructure” hype is about, you can read an introduction here.

Cisco eased our job by releasing a pre-installation spreadsheet, and it is very important to read that document with great attention. It will allow you to prepare the baseline of your HCI infrastructure. The installation is very straightforward once all the requirements are met. The HX infrastructure has an important peculiarity: it is very, very, very (did I say very?) sensitive. If a single requirement is not met, the installation will stall and you will be in a delicate situation, because you may have to wipe the servers and restart the process. As a result, you could lose precious hours.

Cisco provides a way to automate the deployment and manage your HX cluster: the HX installer, which interacts with Cisco UCSM, vCenter, and the Cisco HX servers.

It is especially relevant to note that the Cisco HX servers are tightly integrated with all the components described in the picture below:

HyperFlex Software versions.

As usual with this kind of deployment, you have to make sure that every version running in your environment is supported. We will run version 2.1(1b) in our lab and will upgrade to 2.5 at a later time. We also need to make sure that UCS Manager on our Fabric Interconnects is running 3.1(2g).

In addition, the dedicated vCenter that we will use runs release 6.0 U3 with Enterprise Plus licenses.

Node requirements.

You cannot install fewer than 3 nodes in a Cisco HyperFlex cluster. Because the HX solution is very sensitive, it is mandatory to keep the following parameters consistent across the nodes:

  • VLAN IDs
  • Credentials 
  • SSH must be enabled
  • DNS and NTP
  • VMware vSphere installed.

Network requirements.

First of all, the HyperFlex solution requires several subnets to manage and operate the cluster.

We will segment these different types of traffic using 4 VLANs (a switch configuration sketch follows the list):

  • Management traffic subnet: this dedicated subnet will be used for vCenter to reach the ESXi servers. It will also be used to manage the storage cluster.
    • VLAN 210: 10.22.210.0/24
  • Data traffic subnet: this subnet is used to transport the storage data and HX Data Platform replication.
    • VLAN 212: 10.22.212.0/24
  • vMotion network: self-explanatory.
    • VLAN 213: 10.22.213.0/24
  • VM network: self-explanatory.
    • VLAN 211: 10.22.211.0/24
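
On the upstream switch side (the Nexus 5548s mentioned below), this translates into VLAN definitions along these lines; the VLAN names are mine, only the IDs come from the plan above:

    vlan 210
      name HX-Management
    vlan 211
      name HX-VM-Network
    vlan 212
      name HX-Storage-Data
    vlan 213
      name HX-vMotion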

Here is how we will assign IP addresses to our cluster:

UCSM Requirements.

We also need to assign IP addresses to the UCS Manager Fabric Interconnects, which will be connected to our Nexus 5548:

  • Cluster IP Address: 
    • 10.22.210.9
  • FI-A IP Address:
    • 10.22.210.10
  • FI-B IP Address:
    • 10.22.210.11
  • A pool of IPs for KVM:
    • 10.22.210.15-20
  • MAC Pool Prefix:
    • 00:25:B5:A0

 

DNS Requirements.

It is a best practice to use DNS entries in your network to manage your ESXi servers. Here we will use one DNS A record per node to manage each ESXi server. The vCenter, Fabric Interconnects, and HX installer will also each have one.

The list below shows all the DNS entries I used for this lab:

  • srv-hx-fi
    • 10.22.210.9
  • srv-hx-fi-a
    • 10.22.210.10
  • srv-hx-fi-b
    • 10.22.210.11
  • srv-hx-esxi-01
    • 10.22.210.30
  • srv-hx-esxi-02
    • 10.22.210.31
  • srv-hx-esxi-03
    • 10.22.210.32
  • srv-hx-installer
    • 10.22.210.211
  • srv-hx-vc
    • 10.22.210.210

This sounds very basic, but it is CRITICAL that these steps are performed PRIOR to any deployment; otherwise you will waste a lot of time trying to recover (at some point you would have to wipe your servers and reinstall a custom ESXi image on each one).

Finally, in the next blog post, I will show how to install the vCenter, the Fabric Interconnects, and the HX installer needed for the HyperFlex deployment.

In conclusion, do not hesitate to leave a comment and let me know if you encountered any issues while planning your deployment.

Thanks for reading!  

Hyper-converged infrastructure – Part 1: Is it a real thing?

Recently I was lucky enough to play with Cisco HyperFlex in a lab, and since it was fun to play with, I decided to write a basic blog post about the hyper-converged infrastructure concept (experts, you can move along and read something else 🙂). It has really piqued my interest. I know I may be late to the game, but better late than never, right? 🙂

Legacy IT Infrastructure

Back in the day, you had to have separate silos to maintain a complete infrastructure (it is still true, by the way, but it is becoming more and more common for networks, servers, and storage to progressively form a single IT platform… sorry, I meant “cloud”):

  • Compute (systems and virtualization)
  • Storage
  • Network (Network and Security)
  • Application

You had to install and maintain multiple sub-infrastructures in order to run the IT services in your company.

If you wanted to deploy a greenfield infrastructure for your data center, here is a brief summary of what you needed:

  • Physical servers (Owners: System team)
  • Hypervisors (Owners: System team)
  • Operating system (Owners: System team) 
  • Network infrastructure (Owners: Network team)
    • Routing – Switching
    • Security (VPN, Cybersecurity)
    • Load Balancers
  • Storage arrays (Owners: Storage team)
  • Applications for the business to run. (Owners: IT applications team)

Each silo has its own experts and language (LUN + FLOGI vs. GPO + AD vs. OSPF, BGP and TLS). As you can guess, it was a bit complicated and slow to provision new applications and services for the business (even in a brownfield IT environment). Once everything was running, the IT team was in charge of maintaining the infrastructure, and one of the drawbacks was dealing with several manufacturers (and potentially partners) to keep it all supported…

Converged Infrastructure and simplification

In the late 2000s, well-known manufacturers saw an opportunity to simplify the complete data center stack, and converged infrastructure was born.

With the emergence of cloud applications, EMC and Cisco created a joint venture, Acadia, which was later renamed VCE (for VMware, Cisco, EMC). The purpose of that company was to sell converged infrastructure products, and the Vblock was its flagship product. You could buy an already provisioned rack, customized according to your preferences. The Vblock was composed of the following individual products:

  • Storage Array: EMC VNX/VMAX 
  • Storage Networking: Cisco Nexus, Cisco MDS
  • Servers: Cisco UCS C or UCS B
  • Networking: Cisco Nexus
  • Virtualization: vSphere

VCE was in charge of configuring (or customizing, I should say) the Vblock according to your needs and preferences.

Once the rack was delivered, you “just” had to plug it into your data center network infrastructure and everything would be connected. Servers were ready to be deployed.

Going that way, you could save time and trouble. Agility is also a big selling point for these kinds of architectures. 

As you can see, the footprint of these products was still substantial. In this case you deal with a single manufacturer, but the main drawback is product flexibility: you could not install just any version on your Cisco Nexus, because VCE was very strict about supported versions.

Hyper-converged Infrastructure and horizontal scaling

Hyper-converged is a term that has been around since 2012. The main difference between converged and hyper-converged infrastructure is definitely the storage:

  • Converged infrastructure:
    • A centralized array accessible over a traditional storage network (FC with FSPF, or iSCSI/NFS).
  • Hyper-converged infrastructure:
    • Distributed drives in each server, forming a centralized file system.

A hyper-converged system is adaptable: it scales horizontally while reducing the footprint by a significant amount. If you just want to try it, perform a setup with a few hosts; if the solution works for you, add nodes to the cluster horizontally and you will increase your performance and redundancy. This way, you can consolidate your compute and storage infrastructure.

Horizontal scaling is a familiar concept for many network engineers (Clos Fabrics anyone?)

In my opinion, it is a natural evolution of the Data Center compute and storage infrastructure.

There are several “Hyper-converged” manufacturers on the market:

My next post will be about deploying a Cisco Hyperflex infrastructure.

Thanks for reading!