Now I’m done with vApps and their networking (for the moment – there’s still VPN/Load-Balancing/DNAT to think about!!!) – I’m working my way gradually through all the labs in the “VMware vCloud: Deploy and Manage VMware Cloud” course. My plan is to finish these labs, and blog my experiences along the way. Once I’m done with these labs, I think I will be ready to attack the vCAT documentation – I’m also hoping to find time to RTFM the official admin guides that ship with vCD, in the hope of picking up tidbits that aren’t covered in the official course. That’s something I think students often forget. Most vendor-based courseware takes you through the most common configurations and questions – it rarely takes you through every-single-little-setting, otherwise you’d never complete a training course!

One of the chapters in the course deals with monitoring. There are a number of “status areas” to check out, and they take the shape of a number of views:

Task/Events View

Screen Shot 2012-12-20 at 14.50.11.png

Note: SysAdmin View…

Screen Shot 2012-12-20 at 14.54.33.png

Note: OrgAdmin View… It was a bit scary looking at this tasks/events log – not because of the error messages – but looking at the number of events, which was north of 5,000, I realized how much work I’d been putting into vCloud Director!

Log Files:

Of course there are system logs that help troubleshoot the vCloud Director software itself. These logs are accessible directly on the vCD cell itself, in /opt/vmware/cloud-director/logs. There are six core logs (which rotate) called:

  • cell.log – console output from the vCD Cell.
  • vcloud-container-debug.log – Debugging log messages from the cell
  • vcloud-container-info.log – Any warnings or errors in the cell
  • vmware-vcd-watchdog.log – Log of the Watchdog service – which restarts the vCD service if it hangs or stops
  • diagnostics.log – Diagnostic information (not enabled by default, and I believe it is only enabled under instruction from VMware Support)
  • 2012_12_20.request.log – From what I can gather these are rather vanilla Apache HTTP request log files

From what I can gather the most valuable logs for admin troubleshooting are probably vcloud-container-info.log and vmware-vcd-watchdog.log…

Screen Shot 2012-12-20 at 15.08.02.png

Note: You’ll notice a number of JMX log files. JMX stands for Java Management Extensions. This is an internal system used for managing and monitoring applications – each of the objects monitored is called an MBean (Managed Bean). Is it me, or is this reference to coffee in Java no longer cool, but deeply irritating and unfunny…?

cell.log, if you open it with vi, generally shows the start-up information of the vCD cell itself:

Screen Shot 2012-12-20 at 15.10.06.png

vcloud-container-debug.log shows the internal communications and processes happening in the cell:

Screen Shot 2012-12-20 at 15.13.23.png

vcloud-container-info.log shows warnings and error messages. Doing a cat or tail on the file with | grep WARN will retrieve all the warning messages:

Screen Shot 2012-12-20 at 15.23.23.png
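If you would rather post-process the matches than just eyeball them, the same filter can be sketched in Python. This is only a minimal sketch: the function name and the sample lines are made up for illustration, and it does a plain substring match just like grep WARN, without assuming anything about the real layout of the vCD log lines.

```python
def warning_lines(lines, needle="WARN"):
    """Return the lines that contain `needle`, like `grep WARN` would."""
    return [line.rstrip("\n") for line in lines if needle in line]


# Hypothetical sample lines, NOT real vCD log output.
sample = [
    "first line INFO something started\n",
    "second line WARN something looks off\n",
    "third line ERROR something failed\n",
]

for line in warning_lines(sample):
    print(line)
```

To run it against the real file you would pass an open file handle instead of the sample list, for example the vcloud-container-info.log file in /opt/vmware/cloud-director/logs.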

vmware-vcd-watchdog.log shows attempts by the watchdog service to restart the core vmware-vcd service:

Screen Shot 2012-12-20 at 15.24.43.png

Status of vCloud Cell & Connection To vCenter:

Screen Shot 2012-12-20 at 15.00.21.png

Monitoring Organization Resource Consumption:

Screen Shot 2012-12-20 at 15.02.01.png

Monitoring Organization Virtual Datacenters Consumption:

Screen Shot 2012-12-20 at 15.03.27.png

Syslog and vCD

Before I get started I wanted to name check a rather good blogpost by fellow vExpert, Gabrie Van Zanten. He documents the process of using SysLog to monitor Firewall activity/changes on the Organizational and vApp Networks using a Splunk based Syslog Service. I didn’t use Gabrie’s post to write this – but it sounds like this “splunk” thing is increasingly popular, and might be of interest to folks:

http://www.gabesvirtualworld.com/vmware-vcloud-5-1-network-troubleshooting/

For me the only problem I have with “splunk” is the name: it sounds stomach-churningly similar to some other word which will remain unmentionable.

As for me – I’m using the Syslog service running on the vCenter Server Appliance – in fact many of my other VMware Appliances that offer a Syslog configuration (such as the vCNS Manager aka vShield Manager) are configured for it. To tell you the truth I’ve never bothered trying to access these logs, so I thought I might give it a try and see what gives.

Before you begin, you need to enable vCloud Director for Syslogging. This is done as the SysAdmin under System, Administration, General:

Screen Shot 2012-12-20 at 16.48.28.png

If these settings change, you can use the “Synchronize Syslog Settings” option which appears on the right-click menu of each Organization’s vCNS Edge Gateway or vApp Network vCNS Edge Gateway. As far as I can see in vCloud Director 5.1, so long as you set the Syslog IP address before you create any Organization/vApp Networks, the option for syslogging should be there…

Screen Shot 2012-12-20 at 16.55.35.png

Note: Synchronize Syslog Server Settings on an Organization Network

Screen Shot 2012-12-20 at 16.57.07.png

Note: Synchronize syslog server settings on the properties of a vApp. Mmm, not sure why capitalization applies on one menu, and not another! But let’s not split hairs or get too pedantic, shall we?

You should be able to confirm the correct syslog settings are applied by right-clicking the Organization Network or vApp Network and checking the Syslog Settings tab:

Screen Shot 2012-12-20 at 17.25.37.png

Throughout all my work with networking in previous posts I’ve actually been disabling the Firewall in the GoS, the vApp Network and the Organization Network. That’s very unrealistic given the firewalls of the vCNS Edge Gateway appliances are always turned on. I’m going to turn on the vApp Network’s Firewall and also enable logging…

Screen Shot 2012-12-20 at 17.32.32.png

Once applied it immediately stops any inbound traffic to the vApp – because the default firewall rule on a vApp allows ALL traffic OUTBOUND, but NOTHING INBOUND. In this case I was making the change on the Web vApp, which works on the 192.168.15.x network. After clicking OK and applying the configuration, the inbound ping I was doing stopped responding – of course, the outbound ping I was doing kept on working.

Screen Shot 2012-12-20 at 17.33.54.png

If I wanted to allow inbound traffic to pass through the vApp Firewall, I could create a rule that is the opposite of this default rule – one that says any source traffic coming in externally is allowed to access the internal destination on any protocol. This is the same as turning the firewall off I guess… The important thing to note from a Syslog perspective is that logging is enabled on a per-rule basis:

Screen Shot 2012-12-20 at 17.43.11.png

We could easily modify this to allow only inbound pings from the Organization Network – by changing the pull-down list for Protocol from ANY to ICMP. This would allow ping traffic to pass, but other traffic like Microsoft Remote Desktop Protocol (TCP 3389) or Web Services (TCP 80) and so on would be blocked.

https://www.michellelaverick.com/images/pingaccess.png
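To make the logic of that rule set concrete, here is a toy model in Python. It is emphatically not how the vCNS Edge Gateway is implemented: the Rule class, the decide function and the "first matching rule wins" ordering are all my own assumptions for illustration. The only things taken from the behaviour described above are the default (everything outbound allowed, nothing inbound) and the single inbound ICMP rule.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    direction: str   # "in" or "out"
    protocol: str    # "ANY", "ICMP", "TCP", ...
    action: str      # "allow" or "deny"


def decide(rules, direction, protocol):
    """Return the action of the first matching rule (assumed ordering)."""
    for rule in rules:
        if rule.direction == direction and rule.protocol in ("ANY", protocol):
            return rule.action
    # Default behaviour described in the post: all outbound, nothing inbound.
    return "allow" if direction == "out" else "deny"


# One user rule: allow inbound ICMP only.
rules = [Rule("in", "ICMP", "allow")]

print(decide(rules, "in", "ICMP"))   # inbound ping
print(decide(rules, "in", "TCP"))    # inbound RDP or HTTP
print(decide(rules, "out", "TCP"))   # any outbound traffic
```

With the ICMP rule in place the inbound ping is allowed, inbound TCP falls through to the default and is denied, and outbound traffic is untouched. Swap the protocol to "ANY" and you are back to the "firewall effectively off" rule.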

Then I went looking for the Syslog files on the vCenter Server. This was a little bit harder than first thought – but I did eventually locate them. I was attempting a TCP 3389 connection to 192.168.15.101. I expected to see a syslog entry for the vApp Network, but instead I found the entries in the syslog for the Organization Network. That I think makes sense because the traffic came from the External Network >>> Organization Network >>> vApp Network.

The syslog files for the vCenter Server Appliance are located in:

/var/log/remote/ and then there is a directory named after the IP of my Edge Gateway on the Organization Network – ::ffff:192.168.3.10/. Within this directory was a folder for the month, 2012-12, with a log file for each day – the most recent being the day I was writing this blogpost: messages-2012-12-20
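That layout is regular enough to build the path for any given day. A minimal Python sketch, assuming the pattern I saw on my appliance (root, then a per-host directory, then a YYYY-MM folder, then messages-YYYY-MM-DD) holds generally; the function name is made up:

```python
from datetime import date
from pathlib import PurePosixPath


def remote_log_path(host_dir, day, root="/var/log/remote"):
    """Build <root>/<host_dir>/<YYYY-MM>/messages-<YYYY-MM-DD>."""
    month = day.strftime("%Y-%m")
    name = "messages-" + day.isoformat()
    return PurePosixPath(root) / host_dir / month / name


path = remote_log_path("::ffff:192.168.3.10", date(2012, 12, 20))
print(path)
```

For my Edge Gateway and the day of this post, that prints /var/log/remote/::ffff:192.168.3.10/2012-12/messages-2012-12-20.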

By piping the cat command into grep I was able to search for any entry containing the word “DROP”, indicating the packet had been dropped by the firewall. Before attempting this, I had a go at using Microsoft RDP to gain access to the VM, something that failed through a lack of access via the vApp Firewall.

cat  messages-2012-12-20 | grep DROP

2012-12-20T18:39:05.000+00:00 firewall [759194f4-bb9e-4262-be20-5d247c821722]: DROP_2IN=vNic_0 OUT=vNic_1 SRC=192.168.3.198 DST=192.168.15.101 LEN=40 TOS=0x00 PREC=0x00 TTL=128 ID=0 PROTO=TCP SPT=1062 DPT=3389 WINDOW=40980 RES=0x00 SYN URGP=0 MARK=0x421

SRC=192.168.3.198 was the machine I was making the request from, and DST=192.168.15.101 was where I was trying to get to. The destination port attempted was DPT=3389, the port number used for the RDP connection. After adding a rule to allow inbound RDP like so…

https://www.michellelaverick.com/images/rdpaccess.png

The second attempt to access the VM via RDP was successful like so:

2012-12-20T19:02:51.000+00:00 firewall [759194f4-bb9e-4262-be20-5d247c821722]: ACCEPT_2IN= OUT=vNic_1 SRC=192.168.3.198 DST=192.168.15.101 LEN=48 TOS=0x00 PREC=0x00 TTL=126 ID=25500 DF PROTO=TCP SPT=1520 DPT=3389 WINDOW=64512 RES=0x00 SYN URGP=0 MARK=0x1

I could tell which Firewall Rule was responsible for the access from the “ACCEPT_2IN=” part of the line: this indicated it was the 2nd rule in the vApp Network firewall rules that was responsible for allowing the access.

https://www.michellelaverick.com/images/rule2.png
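If you want to pull those details out of the log programmatically, the entries are easy enough to pick apart. Below is a rough Python sketch that works on the two lines quoted above. The function name is hypothetical, and two things are assumptions based only on those two samples: that the tag is always DROP or ACCEPT followed by an underscore and a number, and (following my reading above) that the number is the position of the rule. Note that in both samples the tag runs straight into IN= with no space, so the parser tolerates that.

```python
import re

# Action tag such as "DROP_2" or "ACCEPT_2", immediately followed by "IN=".
ACTION_RE = re.compile(r"\b(DROP|ACCEPT)_(\d+)\s*(?=IN=)")
# KEY=value pairs; the value may be empty (e.g. "IN=").
FIELD_RE = re.compile(r"\b([A-Z]+)=(\S*)")


def parse_firewall_line(line):
    """Return action, rule number and key fields, or None if no tag found."""
    match = ACTION_RE.search(line)
    if match is None:
        return None
    fields = dict(FIELD_RE.findall(line[match.end():]))
    return {
        "action": match.group(1),
        "rule": int(match.group(2)),
        "src": fields.get("SRC"),
        "dst": fields.get("DST"),
        "proto": fields.get("PROTO"),
        "dport": int(fields["DPT"]) if "DPT" in fields else None,
    }


drop_line = (
    "2012-12-20T18:39:05.000+00:00 firewall "
    "[759194f4-bb9e-4262-be20-5d247c821722]: DROP_2IN=vNic_0 OUT=vNic_1 "
    "SRC=192.168.3.198 DST=192.168.15.101 LEN=40 TOS=0x00 PREC=0x00 TTL=128 "
    "ID=0 PROTO=TCP SPT=1062 DPT=3389 WINDOW=40980 RES=0x00 SYN URGP=0 MARK=0x421"
)
accept_line = (
    "2012-12-20T19:02:51.000+00:00 firewall "
    "[759194f4-bb9e-4262-be20-5d247c821722]: ACCEPT_2IN= OUT=vNic_1 "
    "SRC=192.168.3.198 DST=192.168.15.101 LEN=48 TOS=0x00 PREC=0x00 TTL=126 "
    "ID=25500 DF PROTO=TCP SPT=1520 DPT=3389 WINDOW=64512 RES=0x00 SYN URGP=0 MARK=0x1"
)

for line in (drop_line, accept_line):
    print(parse_firewall_line(line))
```

Both entries come back with rule 2, source 192.168.3.198, destination 192.168.15.101 and destination port 3389; only the action differs.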

The only REALLY odd thing about picking up SysLogs from vCenter is the naming of the SysLog directories – which are named after hosts. I found entries for things I wasn’t expecting such as “SALESDESKTOP05”. It turned out I had stale and out-of-date records in my DNS. I once had a desktop called “SALESDESKTOP05” with an IP of 192.168.3.21. It turned out my vCNS Edge Gateway for the COIG Organization had the same IP as well. SysLog must have done some sort of lookup on the name.

https://www.michellelaverick.com/images/syslogbumpdns.png
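I don’t know exactly how the appliance’s syslog service picks its directory names, so treat the following purely as an illustration of my guess: a reverse lookup on the sender’s IP, falling back to the IP itself when nothing resolves. The function names are hypothetical, and the stale_dns function is a stand-in for a DNS server holding the out-of-date record, so the sketch runs without touching a real network.

```python
import socket


def syslog_dir_name(ip, resolver=socket.gethostbyaddr):
    """Return the reverse-DNS name for `ip`, or the IP if none resolves."""
    try:
        hostname, _aliases, _addresses = resolver(ip)
        return hostname
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        return ip


def stale_dns(ip):
    """Fake resolver holding the stale record described in the post."""
    records = {"192.168.3.21": "SALESDESKTOP05"}
    if ip in records:
        return records[ip], [], [ip]
    raise socket.herror("no PTR record")


print(syslog_dir_name("192.168.3.21", resolver=stale_dns))
print(syslog_dir_name("192.168.3.10", resolver=stale_dns))
```

With the stale record in place the Edge Gateway on 192.168.3.21 ends up filed under SALESDESKTOP05, while an address with no record keeps its IP as the name, which matches what I saw in /var/log/remote/.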

Clearing out these bum records, enabling automatic scavenging and setting some names for the Edge Gateway interfaces cleaned this up over time… I would hope in a Production environment DNS would be handled more appropriately than in a lab environment, where you tend to re-use and blow away IPs/A records frequently. I’ve since cleared out a lot of my static records, and enabled scavenging of stale records on the DNS…