All posts by Nicolas Michel

Cisco Meraki vMX 100 deployment in Azure

Generalities

There are many ways to connect your on-premises data center workloads to Microsoft Azure. I own the full Meraki suite at home and have enjoyed it for the past three years; it provides all the features I need. I also have some workloads in Microsoft Azure and wanted to access them over a private, encrypted network instead of reaching them on their public IPs. Meraki offers the ability to deploy a vMX100 in Microsoft Azure. You can deploy a vMX100 in either Azure or AWS, and it will join your full-mesh VPN like any other MX device you own.

It supports up to 500 Mbps of VPN throughput, which is sufficient for many organizations. From a licensing standpoint, you just need a Meraki license: LIC-VMX100-1YR (1 year), LIC-VMX100-3YR (3 years), or LIC-VMX100-5YR (5 years). Microsoft will charge you monthly for the managed application itself.

From a design standpoint, traditional Meraki MX appliances can be configured in either VPN concentrator or NAT mode. In NAT mode, the appliance has 2 interfaces (upstream and downstream) and performs Network Address Translation as a traditional firewall would. In concentrator mode, the MX has a single interface connected to the upstream network. This is the only mode supported by the vMX100 in Microsoft Azure.

Limitations

When you deploy the vMX100 in your Azure and Meraki infrastructure for the first time, it works well and the vMX100 fetches its configuration quickly. However, if you delete all the objects and start from scratch, you will trigger a bug that Meraki has identified. Although I don't have the technical details, Meraki TAC will manually apply a fix that triggers a synchronisation between the Meraki cloud and the vMX100 in Microsoft Azure.

Logical Diagram

The diagram below is not 100% accurate, since the vMX100 supports only the one-arm VPN concentrator mode, but from a logical standpoint it gives a good general idea of what we are trying to achieve. The home internal network uses 192.168.10.0/24, the servers in Azure use 172.16.10.0/24, and the vMX100 uses 172.16.0.0/24 with a single interface for both downstream and upstream traffic. We will see in detail how to interconnect the Azure Linux virtual machines with the Meraki vMX100 one-arm VPN concentrator.

Initial Setup – Meraki

First, you need to add the vMX100 license received from Cisco to the Meraki Dashboard.

Cisco Meraki Dashboard – Licensing – Adding the vMX100

Once the vMX100 license is installed, we can claim the device. We will do that in a new network. It is important that the network type is set to “Security appliance” with a default configuration.

Cisco Meraki Dashboard – Adding a new network

We can now see that the appliance is ready in the Meraki Dashboard and that it will come with a basic configuration. It is now time to deploy the vMX in Azure.

Cisco Meraki Dashboard – Appliance

Initial Setup – Microsoft Azure

As you can see in the screenshot below, our Azure infrastructure is empty and we will configure it so that it can host the vMX100 and some servers.

Azure – Resource Group initial

The Cisco Meraki vMX100 is publicly available in the Azure Marketplace as a managed application. This means that when you deploy the vMX100, a dedicated resource group is created specifically for that service. That resource group will host every crucial component of the solution (virtual machine, storage, networking).

First, we will create a dedicated resource group and virtual network for the vMX network interface (172.16.0.0/24).

Azure – Resource group

Once the resource group for the vMX interface is created, we need to create a Virtual Network (vNet) for it.

Azure – vNet Meraki LAN creation – Step 1

Azure – vNet Meraki LAN creation – Step 2
Azure – vNet Meraki LAN creation – Step 3

In this step, make sure you specify the right subnet for your Meraki vMX interface; it will be assigned automatically to the vMX when it is deployed. In our example, the Meraki interface will use an IP address in the 172.16.0.0/24 range.

Azure – vNet Meraki LAN creation – Step 4
Azure – vNet Meraki LAN creation – Step 5
Azure – vNet Meraki LAN creation – Step 6
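If you prefer the CLI over the portal, the same resource group and vNet can be created with the Azure CLI. A sketch, where the resource names and location are examples of mine, not required values:

```shell
# Resource group for the Meraki LAN objects (name/location are examples)
az group create --name rg-meraki-lan --location westeurope

# vNet and subnet for the vMX interface (172.16.0.0/24, as in the walkthrough)
az network vnet create \
  --resource-group rg-meraki-lan \
  --name vnet-meraki-lan \
  --address-prefix 172.16.0.0/24 \
  --subnet-name snet-meraki-vmx \
  --subnet-prefix 172.16.0.0/24
```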

Once the resource group and virtual network are created, we are ready to install our vMX100 appliance in Microsoft Azure.

This is what we have created so far.

vMX100 deployment in Microsoft Azure

We are now ready to deploy our vMX100 in Microsoft Azure as a managed application. A token must be generated from the Meraki dashboard to identify your tenant when you deploy the vMX100. Once you generate the vMX100 token, you have one hour to deploy the virtual machine in Azure before the token expires.

Meraki Azure – vMX100 Token
Meraki Azure – vMX100 Deployment 1

When configuring the basic settings of the vMX100, you will need to enter the Meraki token generated previously (reminder: the token is valid for one hour). The resource group for the vMX100 must be NEW and empty; you cannot reuse the resource group previously created for the Meraki interface. The reason is that the vMX100 is deployed as a managed application and requires its own resource group.

Meraki Azure – vMX100 Deployment 2
Meraki Azure – vMX100 Deployment 3

After that, you need to map the right vNet and subnet for the virtual machine. Here, you will reuse the previously created objects:

Meraki Azure – vMX100 Deployment 4

Next, you specify the size of the virtual machine you need. Meraki doesn't document different performance specifications for each size, so I went with the cheapest.

Meraki Azure – vMX100 Deployment 5

Once everything is set up, finish the process by purchasing the vMX100 subscription.

Meraki Azure – vMX100 Deployment 6

Wait for the virtual machine deployment to complete, then check in the Meraki dashboard that the vMX100 in Azure has successfully fetched its configuration.

Meraki Azure – vMX100 Deployment 7

Up to this point, this is what has been created in Microsoft Azure.

Meraki Azure – vMX100 Deployment 8

Let's verify in the Meraki Dashboard that the vMX100 is online and able to fetch its configuration.

Meraki Azure – vMX100 Deployment 9

If you browse to the public IP of the vMX100, you can check that it's healthy and download logs if needed (the serial number of the appliance is the login credential; there is no password).

Meraki Azure – vMX100 Verifications

VPN Configuration

We can now start configuring the actual VPN and deploy some virtual machines. Make sure that both the vMX100 and the other Meraki Security Appliances (MX) are part of the VPN and are configured as hubs.

Meraki Azure – VPN Configuration.

We can check the VPN status in the Meraki dashboard.

Meraki Azure – VPN Status


Now that the VPN is up, we can verify connectivity by pinging the vMX100 interface.

Virtual machine deployment

It is now time to deploy some virtual machines in Azure and create the peering between them and the Meraki vMX100.

To do that, we need to deploy a resource group and a virtual network. These two objects will be used by the Linux virtual machine hosted in our Microsoft Azure instance. The subnet used inside the vNet will be 172.16.10.0/24.

Creating a resource group for the Azure Servers
Creating a vNet for the Azure Servers
Creating a vNet for the Azure Servers

Now that we have the underlying infrastructure ready for the servers, we can deploy the virtual machines:

Meraki Azure – VM deployment 1
Meraki Azure – VM deployment – Disk
Meraki Azure – VM deployment – Network (172.16.10.0/24)
Meraki Azure – VM deployment

Azure Routing Table and vNet Peering

The last step is to create a route for the internal home network pointing to the single network interface of the vMX100 (in Cisco terms: ip route 192.168.0.0 255.255.0.0 172.16.0.4). A peering between the two virtual networks (Azure Meraki LAN and Azure Servers) is also mandatory to allow communication between them.

Route Table Creation

Meraki Azure – Route Table

The route table must belong to the vNet previously created.

Meraki Azure – Route table 2
Meraki Azure -Route table 3
Meraki Azure – Route table 4
Meraki Azure – Route table – ip route 192.168.0.0 255.255.0.0 172.16.10.4

Meraki Azure – Route table

Now that the route table has been created, we need to associate the right server subnets with it.

Meraki Azure – Route Table and Subnet Association
Meraki Azure – Route Table and Subnet associated
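For reference, the route-table steps above map to roughly these Azure CLI commands. Resource names are examples; the next-hop IP must match the address your vMX actually received (172.16.0.4 in this walkthrough):

```shell
# Create the route table and the route toward the home network via the vMX
az network route-table create --resource-group rg-azure-servers --name rt-home
az network route-table route create \
  --resource-group rg-azure-servers \
  --route-table-name rt-home \
  --name home-networks \
  --address-prefix 192.168.0.0/16 \
  --next-hop-type VirtualAppliance \
  --next-hop-ip-address 172.16.0.4

# Associate the route table with the server subnet
az network vnet subnet update \
  --resource-group rg-azure-servers \
  --vnet-name vnet-azure-servers \
  --name snet-servers \
  --route-table rt-home
```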

vNet Peering

Finally, the last task needed to provide connectivity between Azure and your on-premises network is to create a peering between the two vNets created earlier.

In the Azure GUI, you can create both peerings in a single task, one for each direction (Servers to Meraki LAN and Meraki LAN to Servers).

Meraki Azure – Peering configuration

Verify that the peerings are in the Connected state.

Meraki Azure – Peering verification
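If you script the peering instead, note that peering across resource groups requires the remote vNet's resource ID. A hedged sketch with my example names:

```shell
# Grab the resource IDs of both vNets
MERAKI_VNET_ID=$(az network vnet show -g rg-meraki-lan -n vnet-meraki-lan --query id -o tsv)
SERVERS_VNET_ID=$(az network vnet show -g rg-azure-servers -n vnet-azure-servers --query id -o tsv)

# One peering per direction
az network vnet peering create -g rg-meraki-lan --vnet-name vnet-meraki-lan \
  --name meraki-lan-to-servers --remote-vnet "$SERVERS_VNET_ID" --allow-vnet-access
az network vnet peering create -g rg-azure-servers --vnet-name vnet-azure-servers \
  --name servers-to-meraki-lan --remote-vnet "$MERAKI_VNET_ID" --allow-vnet-access
```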

Final representation – Microsoft Azure Objects

Here is a representation of the objects we have created in Azure so far:

Testing

Finally, we can test if we have the connectivity to Azure using a Virtual Private Network.

This was the manual way of interconnecting your Azure instances with your home or data center workloads; it is definitely possible to automate it. Let me know what you think or if you have any questions.

Creating a Net-DevOps environment.

TL;DR : Code is here. Help yourself 🙂

Introduction to the Net-DevOps Container:

Recently, Ethan Banks posted a very interesting blog post in which he struggled a bit to set up a Python environment. If I understood correctly, he wanted to grow his skill set, in particular around NetDevOps. He fairly pointed out that it can be complicated to handle all the dependencies you might need. Since you also want to start fresh between projects, you want to spend the least amount of time resolving these kinds of issues and maximize your time on something valuable: learning how to automate, or simply automating, your network environment.

Credit xkcd #1987 and inspired by Ethan’s site to illustrate my point.

A lot of people are in a similar situation and are not sure how to start their journey toward DevOps / full-stack engineering. The first lessons of Ivan Pepelnjak's great Network Automation course invite us to create a lab environment so that we can practice safely (aka “don't mess with your prod”). You can use EVE-NG, GNS3, VIRL, or Vagrant to emulate the devices, but I also wanted an environment where I could run the code and not have to rely on my corporate laptop's OS.

I was in that situation a few weeks ago and decided to set up an environment that would meet the following requirements:

  • Must be able to run scripts
  • Changes in that environment must be quick and easy to install. We don’t have much time to troubleshoot the dependencies.
  • Must let me edit code in my favorite code editor rather than vim (sorry guys … my beard is not long enough 🙂 )
  • Must be able to run the environment on any machine (PC / MAC / Server) so that I can experience the same behavior everywhere.

All these requirements led me to the wonderful world of Docker !

By definition, Docker is a tool engineered and released with one goal in mind: create and deploy applications using containers. It was natural to add Docker to the NetDevOps portfolio, and it felt natural to create a “NetDevOps container” that would let me work with efficiency in mind. I wasn't very familiar with Docker until last year, so it was a good opportunity to learn it.

To build a container, you need to choose a base image, and I decided to go with Ubuntu 18.04 because I was familiar with it and have enjoyed it throughout the years. Ubuntu is very popular across the world and its community is one of the best (if not the best).

Then I installed all the regular, well-known Linux tools that I use as a network engineer: fping/hping, curl, htop, iperf, netcat, nmap, openssh-client, snmp-walker (yeah !), tcpdump, tshark, telnet (!), wget, vim and zsh. I am pretty sure you have used them at least once as well.

From a NetDevOps standpoint, I installed most of what I needed as well: Python 2, Python 3, PowerShell (for NSX), pip, and Ansible 2.7.4. Libraries are not left behind; they are critical and mandatory for your scripts to run. We are network engineers, not full-time developers (or real developers, I should say), so chances are we will use the same libraries over and over (e.g. netmiko, napalm, nornir, xmltodict, PyYAML …). Hank from Cisco DevNet has released an awesome video demonstrating the useful libraries a network engineer should use. I put these libraries into a requirements.txt file that is copied and installed with pip when the Docker image is built. There is still some work to be done to configure Ansible at this point, but I have mostly what I need.
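For illustration, a requirements.txt covering the libraries mentioned above could be as simple as this (my real file may pin specific versions):

```
netmiko
napalm
nornir
xmltodict
PyYAML
```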

Demonstration:

First, I need to build the image (I use the term “bake” when I talk about containers, as it helps neophytes understand) so that I can consume it.

Now that you have baked (built) the image using the recipe (the Dockerfile), you are ready to consume the container.
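To make the bake-and-consume steps concrete, the commands look roughly like this (the image tag and the laptop-to-container folder mapping are examples, not fixed values):

```shell
# Bake (build) the image from the Dockerfile in the current directory
docker build -t netdevops:latest .

# Consume (run) it interactively, mapping a local folder into the container
# so code edited on the laptop can be executed inside the container
docker run -it --rm -v "$(pwd)/code:/home/nic/code" netdevops:latest zsh
```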

We are now in the container and ready to automate! We have access to our networking tools as well as our NetDevOps tools.

Obviously, if you want to use this particular container, you should change a few things to accommodate your needs. For example, this container creates a user ‘nic’ with a home directory of the same name; you might want to change that. Also, I mapped a folder from my laptop's drive to a folder in the container so that I could edit my code in my laptop's editor but execute it in the container.

I am still far from being an expert in NetDevOps, so if you have suggestions or comments that could improve this, please let me know!

I have also uploaded a series of videos of this work here. Refer to videos 3 and 4 to see it in action.

Recover a RAID5 Array on Linux with healthy disks

Intel Atom failures

I know the title sounds a bit weird, and you may ask why you would need to recover a RAID5 array when all your disks are healthy, right?

To understand what is going on: my DS1515+ has an Intel Atom C2538 (source: Synology CPU / NAS Type), a CPU that recently caused a lot of issues in the IT industry (remember the Cisco clock issue? 🙂).

Errata AVR54 of the C2000 specification update clearly states the following: “system may experience inability to boot or may cease operation”. My NAS started rebooting regularly and completely crashed before I could back up the last delta of data.

At first, Synology denied any abnormal failure rate on this specific hardware while admitting a flaw (!). Synology then extended the warranty of all the NAS platforms affected by this hardware flaw.


Recovering the data using the GUI. (fail)

I immediately opened a case with Synology, who sent me another DS1515+ pretty quickly (I still had to pay for express shipping).

After I inserted my disks into the newly received NAS, I noticed that it was beeping and trying to recover my RAID5 array without any luck. DSM told me that the RAID5 array was down but that all disks were healthy.



I waited until the parity check finished to see if the Synology was silently recovering the volume. Unfortunately, after 10 hours, nothing appeared in my volume list.

I decided to dig around and found plenty of useful information provided by Linux experts in the Synology community (shout-out to them).

Here is what I have done:

Recovering the volume using the CLI. (Semi-success)

First I wanted to check my raid information:
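The quickest overview comes from /proc/mdstat (run as root on the NAS; your output will obviously differ):

```shell
cat /proc/mdstat   # lists md0/md1/md2 and which /dev/sdXN members each uses
```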

I knew md0 and md1 (system + swap) were fine, but md2 (the actual data) was not behaving properly even though my disks were “fine”. The RAID5 state is clean and the number of disks matches what I have (/dev/sd[abcde]3). Let's look at the state of the RAID5 array in more detail:
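mdadm can print the detailed state of a given array:

```shell
mdadm --detail /dev/md2   # state, member devices, and any failed/missing disks
```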

The partitions looked good when I ran fdisk -l, so I tried to stop and reassemble the RAID5 array.
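The stop/reassemble sequence was along these lines. The member partitions /dev/sd[abcde]3 are the ones identified earlier; --force is what makes mdadm accept an array it considers unclean, so use it only for recovery:

```shell
mdadm --stop /dev/md2
mdadm --assemble --force /dev/md2 /dev/sd[abcde]3
```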

When I tried to mount my partition as mentioned in the link above from the Synology forums, I ran into the following issue:

So I stopped the RAID array, reassembled it again, and checked the status of my array with dmesg:

Ah! The journal has an issue when loading, so let's try to mount the filesystem while working around the journal. (Do not do this for long-term use of your NAS; my immediate concern was data recovery.)
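I believe the trick here is ext4's noload option, which mounts read-only without replaying the journal — the usual way around a broken journal during recovery (assuming an ext4 volume; the mount point is an example):

```shell
mkdir -p /recovery
mount -t ext4 -o ro,noload /dev/md2 /recovery
```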

The mount seems to have worked, so let's check whether I can see anything in the /recovery folder:

I could see my folders (some names are changed for obvious privacy reasons), but I was wondering how to retrieve my data now. The GUI couldn't see a volume, and I couldn't install any package on the volume from the GUI (because it didn't see any), so I couldn't use FTP at all.

I had another empty box running Linux, so I decided to rsync the data from the old NAS (failed volume) to another NAS.
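The copy itself was a plain rsync; the target IP and paths below are placeholders, not my real ones:

```shell
rsync -avh --progress /recovery/ admin@192.168.1.60:/volume1/restore/
```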

I am so happy I retrieved the delta of data and learned from my mistakes: I need to automate my backups more frequently and back up to more than one device. Now I just have to wait while a few TB of data copy over.

I am far from being a Linux guru, but I know my way around bash. Network engineers: you need to understand how Linux works. I had a very interesting conversation with Pete Lumbis a year ago at the Software-Defined Enterprise Conference & Expo about how to learn Linux, and Pete and I made the same observation:

Most of the Linux courses I tried did not meet my expectations; I quickly got bored even though I had to keep watching and reading to understand everything. In this case, I preferred to dig into the technology by reading articles and getting my hands very dirty.

Nic

Sorting list in Python

During my Python studies, I came across something that didn’t make much sense to me so I had to learn and investigate (with the help of experts).

What you can usually do in Python is modify a variable and assign the result back to the same variable. Because a piece of code is usually worth much more than an explanation:
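For instance (these are my own examples, not the original screenshot):

```python
hostname = "switch-core-01"
hostname = hostname.upper()  # the result is assigned back to the same variable
print(hostname)              # SWITCH-CORE-01

vlan_id = 100
vlan_id = vlan_id + 10
print(vlan_id)               # 110
```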

When you want to sort a list, the behavior is a bit different.

Let's pretend I have a list of ARP entries from my switch:

If I wanted to sort it and reassign the result to the same variable, I would naively write this (arp_entries being the variable that contains all the entries):
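The original entries were shown in a screenshot, so here is a hypothetical list together with the reassignment pattern from earlier — which is exactly the trap:

```python
# Hypothetical ARP entries (the real ones were shown in a screenshot)
arp_entries = ["10.1.1.30", "10.1.1.10", "10.1.1.20"]

# Reusing the "modify and reassign" pattern does NOT work here:
arp_entries = arp_entries.sort()
print(arp_entries)  # None -- list.sort() returns None, so the list is lost
```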

According to the official Python documentation, Python lists have a built-in list.sort() method that modifies the list in place. Let's verify this:
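A quick check, again with a made-up list:

```python
arp_entries = ["10.1.1.30", "10.1.1.10", "10.1.1.20"]
result = arp_entries.sort()  # sorts the list in place...
print(result)                # None -- ...and returns nothing useful
print(arp_entries)           # ['10.1.1.10', '10.1.1.20', '10.1.1.30']
```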

There is also a built-in sorted() function that does the job if you want to keep the original list intact:
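With sorted(), the reassignment pattern works because a new list is returned (same made-up entries):

```python
arp_entries = ["10.1.1.30", "10.1.1.10", "10.1.1.20"]
sorted_entries = sorted(arp_entries)  # returns a new, sorted list
print(sorted_entries)                 # ['10.1.1.10', '10.1.1.20', '10.1.1.30']
print(arp_entries)                    # the original list is untouched
```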

I was testing this because I am currently working through the free Python class run by Kirk Byers at https://pynet.twb-tech.com/ . I strongly recommend that course if you have only a little programming experience. I will talk about it in a future blog post, but in the meantime, have a look at Kirk's website. It's awesome!

Thanks to Kirk, Nicholas Russo and Greg Mueller for the hints and help provided on Slack (Network to Code, run by Jason Edelman).

Nic

Hyper-converged infrastructure – Part 2: Planning a Cisco HyperFlex deployment

I recently got the chance to deploy a Cisco HyperFlex solution composed of 3 Cisco HX nodes in my home lab, so I wanted to share my experience with this technology, which is new to me. If you do not really know what the “hyperconverged infrastructure” hype is all about, you can read an introduction here.

Cisco eased our job by releasing a pre-installation spreadsheet, and it is very important to read that document with great attention. It will allow you to prepare the baseline of your HCI infrastructure. The installation is very straightforward once all the requirements are met, but the HX infrastructure has an important peculiarity: it is very, very, very (did I say very?) sensitive. If a single requirement is not met, the installation will stall and you will be in a delicate situation, because you may have to wipe the servers and restart the process. As a result, you could lose precious hours.

Cisco provides a way to automate the deployment and manage your HX cluster: the HX installer, which interacts with the Cisco UCSM, the vCenter, and the Cisco HX servers.

It is especially relevant to note that the Cisco HX servers are tightly integrated with all the components described in the picture below:

HyperFlex Software versions.

As usual with this kind of deployment, you have to make sure that every version running in your environment is supported. We will run version 2.1(1b) in our lab and upgrade to 2.5 at a later time. We need to make sure that our Fabric Interconnect UCS Manager is running 3.1(2g).

In addition, the dedicated vCenter we will use is running release 6.0 U3 with Enterprise Plus licenses.

Node requirements.

You cannot install fewer than 3 nodes in a Cisco HyperFlex cluster. Because the HX solution is very sensitive, the following parameters must be consistent across the nodes:

  • VLAN IDs
  • Credentials 
  • SSH must be enabled
  • DNS and NTP
  • VMware vSphere installed.

Network requirements.

First of all, the HyperFlex solution requires several subnets to manage and operate the cluster.

We will segment these different types of traffic using 4 VLANs:

  • Management Traffic subnet: This dedicated subnet will be used in order for the vCenter to contact the ESXi server. It will also be used to manage the storage cluster.
    • VLAN 210: 10.22.210.0/24
  • Data Traffic subnet: This subnet is used to transport the storage data and HX Data Platform replication
    • VLAN 212: 10.22.212.0/24
  • vMotion Network: Self-explanatory
    • VLAN 213: 10.22.213.0/24
  • VM Network: Self-explanatory
    • VLAN 211: 10.22.211.0/24

Here is how we will assign IP addresses to our cluster:

UCSM Requirements.

We also need to assign IP addresses for the UCS Manager Fabric Interconnect that will be connected to our Nexus 5548:

  • Cluster IP Address: 
    • 10.22.210.9
  • FI-A IP Address:
    • 10.22.210.10
  • FI-B IP Address:
    • 10.22.210.11
  • A pool of IP for KVM:
    • 10.22.210.15-20
  • MAC Pool Prefix:
    • 00:25:B5:A0


DNS Requirements.

It is a best practice to use DNS entries in your network to manage your ESXi servers. Here we will use one DNS A record per node to manage each ESXi server. The vCenter, Fabric Interconnects, and HX installer will also have one each.

The list below shows all the DNS entries I used for this lab:

  • srv-hx-fi
    • 10.22.210.9
  • srv-hx-fi-a
    • 10.22.210.10
  • srv-hx-fi-b
    • 10.22.210.11
  • srv-hx-esxi-01
    • 10.22.210.30
  • srv-hx-esxi-02
    • 10.22.210.31
  • srv-hx-esxi-03
    • 10.22.210.32
  • srv-hx-installer
    • 10.22.210.211
  • srv-hx-vc
    • 10.22.210.210
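For a quick lab, the same mappings can be summarized in an /etc/hosts-style fragment (proper DNS A records remain the real requirement for the deployment):

```
10.22.210.9    srv-hx-fi
10.22.210.10   srv-hx-fi-a
10.22.210.11   srv-hx-fi-b
10.22.210.30   srv-hx-esxi-01
10.22.210.31   srv-hx-esxi-02
10.22.210.32   srv-hx-esxi-03
10.22.210.210  srv-hx-vc
10.22.210.211  srv-hx-installer
```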

This sounds very basic, but it is CRITICAL that these steps are performed PRIOR to any deployment; otherwise you will waste a lot of time trying to recover (at some point you would have to wipe your servers and reinstall a custom ESXi image on each one).

In the next blog post, I will show how to install the vCenter, the Fabric Interconnects, and the HX installer needed for the HyperFlex deployment.

Do not hesitate to leave a comment to let me know if you encountered any issues while planning your deployment.

Thanks for reading!