This is a topic I’ve written about previously in a blogpost called “Getting the RTFM In place”. Enterprise products are generally easy to set up so long as you have your pre-requisites in place – fail to do that and your mileage will vary significantly. You’d be surprised (or perhaps not!) how frequently things go awry in life simply because the basics haven’t been addressed. So despite having written about this before I want to touch on this topic again – this time adding steps the guy charged with the setup routine can follow. My apologies in advance if anyone thinks I’m teaching Grandma how to suck eggs. To be honest most of this is well documented in our EVO:RAIL User Guide. But heck, who downloads and reads an 8-page PDF before holding their nose and jumping in with both feet?
So the first thing you need to confirm is – do you have a 10GbE switch? EVO:RAIL requires a 10GbE switch primarily to ensure VSAN performs well under ALL conditions (for instance a rebuild caused by a hardware failure…). The switch must also support IPv4/IPv6 Multicast, which EVO:RAIL needs for the auto-discovery process carried out by the VMware Loudmouth service/daemon. In case you don’t know, VMware Loudmouth is an implementation of “Zero Network Configuration”, which is used for three core processes:
- Building the first appliance
- Adding an additional appliance
- Adding a replacement node (where the original node has experienced a hardware failure)
This 10GbE switch needs at least 8 free ports (RJ-45 or SFP+) to plug in the 2x10Gbps ports on each of the 4 nodes in the 2U chassis (for the mathematically challenged that’s 2×4=8). Optionally, you would need a further 4 ports for the BMC (DRAC/ILO/RAC). Of course, you also need a spare port for communicating with the EVO:RAIL Configuration UI (this is served by the VMware-Marvin daemon that runs inside the vCenter Server Appliance, which is on the same management network as the ESXi hosts).
Note: This screen grab comes from our new “Network User Guide” available here: http://www.vmware.com/files/pdf/products/evorail/vmware-evorail-network-user-guide.pdf
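The port arithmetic above can be sketched in a few lines of Python. This is only an illustration of the counting described in the text; the function name and its parameters are made up for this example and are not part of any EVO:RAIL tooling.

```python
def switch_ports_needed(nodes=4, data_ports_per_node=2,
                        include_bmc=False, config_ports=1):
    """Count the switch ports one appliance needs.

    nodes and data_ports_per_node mirror the 4-node, 2x10GbE layout in the
    text. include_bmc adds the optional one-BMC-port-per-node. config_ports
    is the spare port used to reach the Configuration UI.
    (All names here are hypothetical, chosen for this sketch.)
    """
    ports = nodes * data_ports_per_node
    if include_bmc:
        ports += nodes
    return ports + config_ports


print(switch_ports_needed())                  # 8 data ports + 1 spare = 9
print(switch_ports_needed(include_bmc=True))  # 9 + 4 BMC ports = 13
```

Scaling the `nodes` argument gives a rough feel for how quickly ports are consumed as further appliances are added, assuming each one has the same layout.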
The first EVO:RAIL you set up is likely to be your first ever experience of the appliance. True, you can get a simulated experience using the VMware Hands-on-Lab, but that does not perfectly match the real deal. As ever, simulation is never the same as the real experience (no sniggering at the back there, boy!). ESXi is pre-installed at the factory, so you might ask what kind of TCP/IP configuration is in place by default. The answer is none. The default management network (vmk0 on Standard Switch vSwitch0) is set to use DHCP. If there is no DHCP server on the default (V)LAN for the management network, the ESXi host falls back to an IPv4 “Link Local” address in the 169.254.y.z format. It should also receive an IPv6 “Link Local” address at the same time.
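If you are jotting down the addresses shown on each node, a quick way to tell whether a node got a DHCP lease or fell back to auto-IP is to check whether the address is link-local. Here is a minimal sketch using Python’s standard ipaddress module; the helper name and the sample addresses are just illustrations.

```python
import ipaddress


def is_link_local(address):
    """Return True for an IPv4 (169.254.0.0/16) or IPv6 (fe80::/10)
    link-local address. The helper name is hypothetical."""
    # Drop an interface scope such as "%vmk0" before parsing.
    bare = address.split("%")[0]
    return ipaddress.ip_address(bare).is_link_local


# Sample addresses for illustration only.
for addr in ("169.254.12.34", "fe80::1%vmk0", "192.168.10.200"):
    print(addr, is_link_local(addr))
```

An IPv4 address that comes back as link-local suggests no DHCP server answered on the default management (V)LAN, which is fine for EVO:RAIL as explained next.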
Q. Do you need to set a static IP address of any type for the ESXi host before progressing to the EVO:RAIL Configuration UI?
A. No.
In fact doing so is likely to complicate matters, and could indeed cause problems. The EVO:RAIL Configuration engine will automagically create a new “Management Network” using a static IP address you provide in the EVO:RAIL Configuration UI. At the same time it renames the old “Management Network” to “Marvin Management”.
The main thing I would say is that when people are experiencing network problems it’s very tempting to “temporarily” assign a manual IP address to each of the EVO:RAIL nodes to carry out basic ping tests. This isn’t a good idea because:
a.) It can create more problems than it resolves, especially if the issue resides at a physical level – and
b.) EVO:RAIL uses auto-IP, and these auto-IP addresses can be used to run the same diagnostic tests anyway.
So what kind of validation of the network could be done before you even start to touch the EVO:RAIL Configuration UI?
Test1. Ping between EVO:RAIL Nodes:
Well, first you could get a BMC connection going to each of the EVO:RAIL nodes. This would at least show you the ESXi “Direct Console User Interface” (DCUI). What you should see is that the yet-to-be-configured ESXi host has an auto-IP address for IPv4 and IPv6. If a DHCP server is present on the network, the IPv4 address is likely to be a valid IP address for that network instead. The IPv6 address is likely to be an auto-assigned address unless you’re so ahead of the curve that you have IPv6 DHCP scopes listening on your network.
By default SSH is NOT enabled in VMware ESXi, so if you want to run tests such as a ping using these auto-IP addresses you have two options. The first is to use the DCUI over a BMC connection: enable the ESXi Shell, press Alt+F1 to start an interactive command prompt, then log in as ‘root’ with the EVO:RAIL’s default password of “Passw0rd!”.
Alternatively, you could give your workstation a valid IP address, use the vSphere Client to connect to each of the nodes, and temporarily enable SSH. Remember SSH can also be temporarily enabled using the DCUI. Either way I would recommend turning off SSH once you’re done, as this is best practice generally – plus leaving it on creates unhelpful health alarms in vCenter and in the EVO:RAIL Management UI. This is why I prefer to do this via the DCUI console.
From there you could carry out basic tests such as pinging each of the other 3 nodes in the chassis. This at least tests basic connectivity and ensures that the four nodes can communicate with each other.
- If this fails it’s likely one of the nodes isn’t plugged in properly at a physical level…
- Or that one of the physical ports is exclusively set for a particular VLAN and isn’t available to the default management (V)LAN.
Test2. Ping from vCenter to each of the EVO:RAIL Nodes:
Another test you could do is to validate whether EVO:RAIL’s built-in vCenter can communicate with the EVO:RAIL nodes. Unlike the EVO:RAIL nodes in the appliance, the vCenter Server Appliance defaults to a static IPv4 address and a dynamic, auto-assigned IPv6 address. There is no need to re-IP the vCSA from its default IP address of 192.168.10.200. The EVO:RAIL Configuration engine can automatically re-IP the vCenter (if required).
Given that the nodes in the EVO:RAIL are unlikely to have an IPv4 address in the same range, the best way to do a ping test here is to use the IPv6 address automatically assigned to the vCSA. This can be done by first using the vSphere Client to access node01, and then using a remote console to get a window on the vCenter Server Appliance.
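To see why an IPv4 ping is unlikely to work at this stage, you can check whether a node’s address falls inside the vCSA’s default network. The sketch below uses the default vCSA address from the text, but the /24 prefix length is my assumption (the source does not state the mask), and the node addresses are examples only.

```python
import ipaddress

# Default vCSA address from the text. The /24 prefix is an ASSUMPTION.
VCSA_NETWORK = ipaddress.ip_interface("192.168.10.200/24").network


def same_ipv4_subnet(node_address, network=VCSA_NETWORK):
    """Return True if node_address sits inside the given IPv4 network."""
    return ipaddress.ip_address(node_address) in network


# Example node addresses, for illustration only.
print(same_ipv4_subnet("169.254.12.34"))   # an auto-IP (link-local) node
print(same_ipv4_subnet("192.168.10.11"))   # a node that happens to match
```

A node on a 169.254.y.z auto-IP address is outside that network, which is why the IPv6 link-local address is the more practical choice for this test.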
You can log into the vCenter using ‘root’ and the default password set by the Qualified EVO:RAIL Partner (QEP) at the factory, which is ‘Passw0rd!’. Using the “ip” command you can retrieve the IPv6 address, and then use this to ping the vCSA from one of the EVO:RAIL nodes.
This should produce valid replies for each of the EVO:RAIL nodes. In EVO:RAIL each of the four nodes resides on the same network as the vCenter Appliance by default; if they did not, the configuration and validation would fail. Again, a failed response is likely to indicate:
- One of the nodes isn’t plugged in properly at a physical level…
- One of the physical ports is exclusively set for a particular VLAN and isn’t available to the default management (V)LAN.
Test3. Multicast on the Management Network:
One of the requirements for EVO:RAIL is support for IPv4 and IPv6 multicast on the Management Network. You can run IPv6 ping tests using a multicast IP address to see what responses come back on the network. This test might not work if support for this functionality has been turned off by whoever manages the switch. It’s worth mentioning that ALL IPv6-enabled devices listening on that multicast address could respond, so you would see replies not just from the EVO:RAIL nodes, the vCSA and Log Insight, but from other IPv6-enabled devices too, such as a router, a switch IP address and others.
In this case ping -6 -I vmk0 ff02::1 is used on node01 in the EVO:RAIL (ff02::1 is the IPv6 link-local all-nodes multicast address).
In the screen grab b6e=node01, b72=node02, b76=node03, b7a=node04, 14d7=vCenter. There is also a response from 1b64, which I believe is my router.
- If this fails it’s likely one of the nodes isn’t plugged in properly at a physical level…
- One of the physical ports is exclusively set for a particular VLAN and isn’t available to the default (V)LAN.
- Multicast for IPv4/6 isn’t enabled on this network
- …or this test doesn’t work because the functionality required for it on the physical switch has been turned off…
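If you capture the ping output to a text file, matching the replies against the nodes you expect is easy to script. The sketch below is only an illustration: the “64 bytes from <address>:” reply format is an assumption about how the ping output looks, the sample addresses are invented (only the final groups come from the mapping above), and the function name is hypothetical.

```python
import re

# Final address group of each expected responder, from the mapping in the text.
EXPECTED = {"b6e": "node01", "b72": "node02", "b76": "node03",
            "b7a": "node04", "14d7": "vCenter"}

# ASSUMED reply format: "64 bytes from <address>: icmp_seq=..."
REPLY = re.compile(r"bytes from ([0-9a-fA-F:.]+?)(?:%\w+)?:\s")


def summarise_replies(ping_output, expected=EXPECTED):
    """Return (missing expected names, unknown responder addresses)."""
    seen = {m.group(1).lower() for m in REPLY.finditer(ping_output)}
    # Index each responder by the last group of its IPv6 address.
    by_suffix = {addr.rsplit(":", 1)[-1]: addr for addr in seen}
    missing = sorted(name for sfx, name in expected.items()
                     if sfx not in by_suffix)
    unknown = sorted(addr for sfx, addr in by_suffix.items()
                     if sfx not in expected)
    return missing, unknown


# Invented sample output: node04 (b7a) is deliberately absent.
sample = """\
64 bytes from fe80::1:b6e: icmp_seq=0 time=0.1 ms
64 bytes from fe80::1:b72: icmp_seq=0 time=0.3 ms
64 bytes from fe80::1:b76: icmp_seq=0 time=0.3 ms
64 bytes from fe80::1:14d7: icmp_seq=0 time=0.4 ms
64 bytes from fe80::1:1b64: icmp_seq=0 time=0.9 ms
"""
missing, unknown = summarise_replies(sample)
print("missing:", missing)
print("unknown:", unknown)
```

A name in the “missing” list points you at the node whose cabling or switch port VLAN settings are worth checking first; an “unknown” address is simply another IPv6 device on the segment, such as the router mentioned above.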
Another way to test for multicast functionality is using the tcpdump utility on an ESXi host. This allows you to listen for VMware Loudmouth traffic. As you might remember, “VMware Loudmouth” is an implementation of Zero Network Configuration, where systems can communicate with each other without needing IP addresses assigned for that subnet (such as 192, 172, 10) and instead use multicast as the way to communicate. VMware Loudmouth listens on UDP port 5353. The exact command is:
tcpdump-uw -i vmk0 -s0 -t -c 20 udp port 5353
Note: -i (interface), -s0 (entire packet), -t (no timestamp), -c (frames to capture)
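Once you have a capture, the useful question is usually “which systems are actually sending on port 5353?”. The following is a rough sketch that pulls the unique senders out of saved tcpdump text. The line shape (“IP6 <src>.<port> > <dst>.<port>: ...”) is an assumption about the output, the port may be printed as a service name rather than a number, and the sample lines, addresses and function name are all invented for illustration.

```python
import re

# ASSUMED line shape: "IP6 <src>.<port> > <dst>.<port>: ..."
PACKET = re.compile(r"^IP6? (\S+)\.([\w-]+) > (\S+)\.([\w-]+):")

# The destination port may appear as a number or as a service name.
PORT_LABELS = {"5353", "mdns"}


def port_5353_senders(capture_text):
    """Return the sorted unique source addresses sending to UDP 5353."""
    senders = set()
    for line in capture_text.splitlines():
        match = PACKET.match(line.strip())
        if match and match.group(4) in PORT_LABELS:
            senders.add(match.group(1))
    return sorted(senders)


# Invented sample capture; the last line is unrelated traffic.
sample_capture = """\
IP6 fe80::1:b6e.5353 > ff02::fb.5353: 0 PTR (QM)? _example._tcp.local. (40)
IP 192.168.10.200.5353 > 224.0.0.251.5353: 0 PTR (QM)? _example._tcp.local. (40)
IP6 fe80::1:b6e.5353 > ff02::fb.5353: 0 PTR (QM)? _example._tcp.local. (40)
IP 192.168.10.200.40000 > 192.168.10.1.53: 1+ A? example.local. (31)
"""
print(port_5353_senders(sample_capture))
```

If a node you expect never shows up as a sender, that is a hint that its multicast traffic is not reaching the host where you ran the capture.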
Conclusion:
In this article I’ve focused on the multicast requirement for management traffic, but of course, VSAN has similar requirements too. If, after EVO:RAIL has completed its work, you have issues with VSAN, you might be interested in a blogpost from the VSAN team that shows how to do diagnostic checks for VSAN multicast networking:
http://blogs.vmware.com/vsphere/2014/09/virtual-san-networking-guidelines-multicast.html
If all is right in the world the EVO:RAIL appliance should take 15 minutes or less to set up and configure. However, like any technology, it has pre-requisites, especially around the networking. What I’ve tried to do in this blogpost is peel back the curtain to expose those pre-requisites and offer up some practical steps that can be used to validate that networking. I suspect over time that EVO:RAIL will do even more validation of pre-requisites than it already does today. I also suspect that customers who are keen to use EVO:RAIL across their environment will iron out networking wrinkles in the PoC phases, such that when they need to set up a new location the physical switch is already pre-configured with these defaults to ensure that 15 minute experience. What I would also love to see is our QEPs helping customers with this – so alongside bundling the appliance with their own value-add, they would also bundle the networking with EVO:RAIL as well. Many QEPs are doing that right now; for instance, if you visit the SuperMicro page for EVO:RAIL they have links to their switches as well. Of course, we’d always want to leave it to the customer where they source those switches from…