One of the things we are really proud of within the EVO:RAIL team is how easy and quick it is to set up the first EVO:RAIL appliance and then, using our auto-discovery method, scale out the environment by adding subsequent EVO:RAIL appliances. As ever, the experience is highly dependent on the network pre-requisites being in place, so as long as you RTFM, you should be good to go.

In case you don’t know, EVO:RAIL implements a version of “Zero Network Configuration”, a widely recognized RFC standard. VMware’s implementation is called VMware Loudmouth (it was originally to be called LoudSpeaker), and it is a service that runs both on EVO:RAIL’s built-in vCenter Server Appliance and on the four server nodes running VMware ESXi in the 2U enclosure. Below you can see the Loudmouth status on the first ESXi server node and on the vCenter Server Appliance:

Screen Shot 2014-10-23 at 12.18.52
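Loudmouth itself isn’t something you configure or script against directly, but if you’re curious what zero-configuration discovery looks like in general, here is a minimal Python sketch using the python-zeroconf library. It simply browses for a service type on the local network; the “_evorail._tcp.local.” string is a made-up placeholder for illustration, not the real Loudmouth service identifier.

```python
# Minimal zeroconf/mDNS browse sketch (pip install zeroconf).
# NOTE: "_evorail._tcp.local." is a hypothetical service type used purely for illustration.
from zeroconf import ServiceBrowser, Zeroconf


class Listener:
    def add_service(self, zc, type_, name):
        info = zc.get_service_info(type_, name)
        print("Discovered:", name, info)

    def remove_service(self, zc, type_, name):
        print("Service gone:", name)

    def update_service(self, zc, type_, name):
        pass


zc = Zeroconf()
browser = ServiceBrowser(zc, "_evorail._tcp.local.", Listener())
try:
    input("Browsing for services; press Enter to stop...\n")
finally:
    zc.close()
```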

The User Guide for EVO:RAIL outlines the network requirements for the appliance, but it’s worth restating them here, and also flagging up the consequences if you don’t meet these pre-requisites. Firstly, EVO:RAIL requires a physical switch that supports 10Gbps networking (the switch isn’t part of the offering, but most of the Qualified EVO:RAIL Partners (QEPs) would be happy to sell you one if you don’t have one already!). Both RJ-45 and SFP+ connections are supported, and some QEPs have two EVO:RAIL offerings to cover both, although some support just one connection type at present.

IPv6 support is required on all the ports on the switch, and you need at least eight ports free to plug in the two 10Gbps NICs (vmnic0/1) that each of the four server nodes in the EVO:RAIL uses. You may actually want a separate switch for the single 1Gbps NIC that each server node has for BMC/iLO/DRAC-style access. These could be plugged into the 10Gbps switch if you have spare ports, or you might prefer to plug them into a spare 1Gbps switch if you have ports free there; after all, those 10Gbps ports are precious… The User Guide has a good diagram that illustrates this, and of course it is supported to have vmnic0 plugged into one switch and vmnic1 plugged into another, to provide switch redundancy.

Screen Shot 2014-10-23 at 11.55.03

Whilst EVO:RAIL doesn’t require VLANs as such, the vast majority of people will want to use them. Personally, I would strongly recommend that if the switch is brand new, or hasn’t been used with vSphere before, you ensure the VLAN configuration is in place BEFORE you even rack-and-stack the EVO:RAIL. This will lead to a much smoother out-of-the-box experience.

In total you will need at least three VLANs to cover the vMotion, Virtual SAN and virtual machine traffic. Of course, you can have as many VLANs as you need to separate your various VMs. By default, management traffic is not tagged by the EVO:RAIL appliance and will reside on the default VLAN of the physical switch (normally VLAN0 or VLAN1, depending on the vendor). You’ll notice that in the EVO:RAIL UI there isn’t an option to set VLAN tagging for the VMware ESXi hosts’ management network:

Screen Shot 2014-10-23 at 12.30.55

The EVO:RAIL could have set a VLAN ID here, but of course doing that would have disconnected you from the appliance, and potentially it would mean a re-IP of the local management machine, and even re-patching it to the right VLAN. So, by default, the EVO:RAIL uses the default network of VLAN0/1 on the switch. If you want or need to change the management network, you could use the Web Client or the vSphere Desktop Client to enable tagging on the Management VMkernel portgroup and the management virtual machine network. For me this is similar to the days when I used the “Ultimate Deployment Appliance” to install ESXi across the network via PXE boot. In that setup I needed to use a native VLAN, as you don’t get VLAN tagging support during a PXE boot. It was easiest to use VLAN0/1 to do the install and have ESXi use that as the management network, rather than go through a reconfiguration afterwards.
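If you do decide to tag the management network after the fact, the change can also be scripted. Below is a hedged pyVmomi sketch that sets a VLAN ID on the standard “Management Network” portgroup of a single host; the vCenter address, credentials, host name and VLAN ID are all placeholders, and remember that applying the wrong tag here will disconnect you from the host.

```python
# pyVmomi sketch: tag the "Management Network" portgroup on one host with a VLAN ID.
# All names, credentials and the VLAN ID below are placeholders for illustration only.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()            # lab convenience only
si = SmartConnect(host="vcsa.example.com",
                  user="administrator@vsphere.local",
                  pwd="********", sslContext=ctx)
content = si.RetrieveContent()

view = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
host = next(h for h in view.view if h.name == "esxi-node01.example.com")  # placeholder host
net_sys = host.configManager.networkSystem

for pg in host.config.network.portgroup:
    if pg.spec.name == "Management Network":      # default portgroup name on a standard vSwitch
        spec = pg.spec
        spec.vlanId = 100                         # placeholder management VLAN ID
        net_sys.UpdatePortGroup(pgName=spec.name, portgrp=spec)
        print("Tagged %s with VLAN %d" % (spec.name, spec.vlanId))

Disconnect(si)
```

You would repeat this for each of the four nodes (and for the management VM portgroup), ideally from a machine that is already patched into the target VLAN.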

You do, of course, get the option to set the VLAN tagging ID for vMotion, Virtual SAN and your VM networks:

Screen Shot 2014-10-23 at 12.31.54 Screen Shot 2014-10-23 at 12.32.53 Screen Shot 2014-10-23 at 12.34.12
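Once the appliance is built, it is worth sanity-checking that the tags you typed into the UI actually landed on the portgroups of every host. This read-only pyVmomi sketch simply prints the VLAN ID of every standard-vSwitch portgroup on every host; again, the vCenter address and credentials are placeholders.

```python
# pyVmomi sketch: list the VLAN ID of every standard-vSwitch portgroup on every host.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()
si = SmartConnect(host="vcsa.example.com",
                  user="administrator@vsphere.local",
                  pwd="********", sslContext=ctx)
content = si.RetrieveContent()

view = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
for host in view.view:
    for pg in host.config.network.portgroup:
        print("%-30s %-25s VLAN %s" % (host.name, pg.spec.name, pg.spec.vlanId))

Disconnect(si)
```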

Now, here’s the important part: both the Virtual SAN and Management VLANs need multicast enabled for both IPv4 and IPv6. This has always been a requirement for Virtual SAN, and EVO:RAIL requires it for the auto-discovery process to work correctly. You could enable multicast for IPv4 and IPv6 on ALL the ports of the physical switch (desirable if the customer isn’t using VLANs at all), or alternatively enable it just on the two networks that require it (Management and Virtual SAN). If you are using multiple switches, IPv4 and IPv6 multicast traffic must be able to pass between the switches across the inter-switch link (ISL).

To allow multicast traffic to pass through, you have two options for either all EVO:RAIL ports on your TOR switch or for the Virtual SAN and management VLANs (if you have VLANs configured):

1) Enable IGMP Snooping on your TOR switches AND enable IGMP Querier. By default, most switches enable IGMP Snooping, but disable IGMP Querier

alternatively

2) Disable IGMP Snooping on your switches. This option may lead to additional multicast traffic on your network.

In my experience folks often over-react to the idea of multicast. I’m not sure why that is, given that in this case it is used to transmit a relatively small amount of traffic in the form of metadata. Perhaps old-timers like me are thinking back to the days when technologies like Symantec Ghost or multicast video had the potential to flatten a network with excessive multicast traffic and bring it to its knees. But we are not talking about that sort of volume of traffic in the case of Virtual SAN or EVO:RAIL. In case you don’t know (and I didn’t before joining the EVO:RAIL team), IGMP Snooping software examines IGMP protocol messages within a VLAN to discover which interfaces are connected to hosts or other devices interested in receiving this traffic. Using that interface information, IGMP Snooping can reduce bandwidth consumption in a multi-access LAN environment and avoid flooding an entire VLAN. IGMP Snooping also tracks ports that are attached to multicast-capable routers, to help manage IGMP membership report forwarding, and it responds to topology change notifications. For IPv6, MLD (Multicast Listener Discovery) is essentially the equivalent of IGMP (Internet Group Management Protocol) in IPv4.
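If you want to convince yourself that multicast is actually being forwarded on a given VLAN before you let EVO:RAIL loose on it, a quick smoke test is to run a sender and a receiver on two machines patched into that VLAN. The sketch below uses plain Python sockets with a placeholder group and port (it does not use the specific groups Virtual SAN registers, and it only exercises IPv4; the IPv6/MLD path would need a separate check).

```python
# IPv4 multicast smoke test: run "python mcast_test.py recv" on one machine and
# "python mcast_test.py" on another machine in the same VLAN.
# The group and port are placeholders, not the groups Virtual SAN itself uses.
import socket
import struct
import sys

GROUP, PORT = "239.1.2.3", 5000   # placeholder multicast group/port

def receiver():
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))
    mreq = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    print("Listening on %s:%d ..." % (GROUP, PORT))
    while True:
        data, addr = sock.recvfrom(1024)
        print("Received %r from %s" % (data, addr[0]))

def sender():
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)  # stay on the local segment
    sock.sendto(b"evo-rail multicast test", (GROUP, PORT))
    print("Test datagram sent to %s:%d" % (GROUP, PORT))

if __name__ == "__main__":
    receiver() if sys.argv[1:] == ["recv"] else sender()
```

If the test datagram never arrives, revisit the IGMP Snooping/Querier options above before blaming EVO:RAIL or Virtual SAN.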

Note: Text in italics is quoted directly from the EVO:RAIL user guide!

So, the 60-million-dollar question: what happens if you choose to totally ignore these recommendations and pre-requisites? Well, firstly, it strikes me as obvious that configuring EVO:RAIL to speak to VLANs that don’t even exist is going to result in all manner of problems. You really have two options: either do the VLAN work on the physical switch first (my personal recommendation) or do not use VLANs at all and have a flat network. What’s not a good idea is to set up EVO:RAIL without any VLAN configuration and then reconfigure the physical switch to use VLANs after the fact. That would mean the vSphere networking on each of the physical hosts would need reconfiguring to apply the VLAN ID for tagging purposes on each portgroup.

The above seems pretty straightforward to me, and I’m really sorry if it sounds like teaching Grandma to suck eggs. The multicast requirements are less obvious, as most folks will be as new to EVO:RAIL as I am (this is my 9th week!). What happens if the multicast requirements aren’t met on either the Management or the Virtual SAN networks? Firstly, on the EVO:RAIL Management network, if multicast is not enabled then the servers that make up the EVO:RAIL will not be discovered, and you will see an error message at around the 2% mark of the configuration process.

As for Virtual SAN, the EVO:RAIL configuration will continue and complete, but you will find that Virtual SAN shows individual disk groups created for each appliance node rather than a single Virtual SAN datastore pooling the storage (HDD/SSD) of all the nodes in the cluster. It’s perhaps best to use the vSphere Web Client, which is available from the EVO:RAIL Management UI, to inspect the status of the Virtual SAN configuration. From there you can:

  1. Navigate to the Home tab, select Hosts & Clusters
  2. Expand the Marvin-Datacenter and Marvin-Virtual-SAN-Cluster to show you now have four hosts provided by the EVO:RAIL Appliance
  3. With the Marvin-Virtual-SAN-Cluster selected, click the Manage Tab, and under the Settings column, select General under Virtual SAN.
  4. Confirm that all four hosts are marked as “Eligible” and that the network status is “Normal”. If multicast has not been enabled, you will see the status “Misconfiguration Detected” instead.

 

webclient

Note: This screen grab is taken from the new version of the HOL I’ve been working on.

A more realistic view from an actual EVO:RAIL appliance would look like this:

vsan-validation

If you’re seeing “Misconfiguration Detected”, you will most likely find that the root of the problem is multicast-related. If you are troubleshooting multicast issues with Virtual SAN, a good place to start is this post on the vSphere blog written by my colleague, Joe Cook:

http://blogs.vmware.com/vsphere/2014/09/virtual-san-networking-guidelines-multicast.html
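If you prefer to pull the same per-host status from a script rather than clicking through the Web Client, a short pyVmomi sketch along these lines should do it. It assumes the default EVO:RAIL object names (Marvin-Virtual-SAN-Cluster) with placeholder credentials, and uses the QueryHostStatus() call on each host’s Virtual SAN system.

```python
# pyVmomi sketch: report the Virtual SAN cluster status of each host in the EVO:RAIL cluster.
# Cluster name, vCenter address and credentials are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()
si = SmartConnect(host="vcsa.example.com",
                  user="administrator@vsphere.local",
                  pwd="********", sslContext=ctx)
content = si.RetrieveContent()

view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == "Marvin-Virtual-SAN-Cluster")

for host in cluster.host:
    status = host.configManager.vsanSystem.QueryHostStatus()
    print("%-35s health=%-10s members=%d" % (host.name, status.health, len(status.memberUuid)))

Disconnect(si)
```

On a healthy four-node appliance you would expect each host to report four cluster members; if each host only sees itself, you are almost certainly looking at the multicast misconfiguration described above.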

Summary:

  • Get your switch configured first with the right settings. This is true of almost any hyper-converged environment.
  • If you get an error at around the 2-3% mark during the EVO:RAIL configuration process, check your multicast settings for the management network (VLAN0 or VLAN1, depending on the vendor), then click Try Again.
  • After the build of the first EVO:RAIL appliance, check your Virtual SAN settings to make sure there are no unpleasant messages and that all the server nodes are contributing storage to the same Virtual SAN datastore.