Solaris, VMware & VGT mode

Today I had the strangest of problems. In a VMware-based testbed with a bunch of mixed systems (F5 virtual appliances, a Linux host and three Solaris servers), the Solaris hosts had severe connectivity issues. Specifically, with all systems connected to VLAN 162 (L3 addressing: 172.16.2.0/24), anything TCP-related from the Solaris hosts failed, while the F5 and Linux virtual machines had no problems whatsoever.

To figure out what was wrong, I quickly fired up my trusty tcpdump on the load balancer. Then I sent a simple ICMP echo (ping) from a Solaris host to the load balancer to see what happens:

solaris-1# ping 172.16.2.1
172.16.2.1 is alive


[root@loadbalancer-1:Active] config # tcpdump -q -n -i ingress-lb not arp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ingress-lb, link-type EN10MB (Ethernet), capture size 108 bytes
19:43:05.048167 IP 172.16.2.21 > 172.16.2.1: ICMP echo request, id 12766, seq 0, length 64
19:43:05.048215 IP 172.16.2.1 > 172.16.2.21: ICMP echo reply, id 12766, seq 0, length 64

Nice. ICMP works, and everything looks fine in the packet capture. Now let's try some TCP traffic for a change:

solaris-1:/root# telnet 172.16.2.1 22
Trying 172.16.2.1...


[root@loadbalancer-1:Active] config # tcpdump -q -n -i ingress-lb not arp
19:44:06.816663 IP 172.16.233.49.windb > 172.16.2.1.ssh: tcp 0
19:44:07.949006 IP 172.16.233.48.windb > 172.16.2.1.ssh: tcp 0
19:44:10.215576 IP 172.16.233.47.windb > 172.16.2.1.ssh: tcp 0
19:44:14.730324 IP 172.16.233.46.windb > 172.16.2.1.ssh: tcp 0
19:44:23.739898 IP 172.16.85.195.windb > 172.16.2.1.ssh: tcp 0
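With five packets the odd source addresses are easy to eyeball, but in a busier capture a small filter helps. Below is a minimal Python sketch, assuming tcpdump's `-q -n` one-line output format shown above and that every legitimate sender lives on VLAN 162's 172.16.2.0/24. `suspicious_sources`, `EXPECTED_NET` and `SAMPLE_CAPTURE` are hypothetical names for illustration, not part of tcpdump or any F5 tool.

```python
import ipaddress
import re

# Assumption: every legitimate sender sits on VLAN 162's subnet.
EXPECTED_NET = ipaddress.ip_network("172.16.2.0/24")

# Matches tcpdump -q -n one-liners such as
#   19:44:06.816663 IP 172.16.233.49.windb > 172.16.2.1.ssh: tcp 0
# The source port suffix (a number or a service name like "windb") is optional,
# so ICMP lines without ports match too.
LINE_RE = re.compile(r"^\S+ IP (\d+\.\d+\.\d+\.\d+)(?:\.\S+)? > ")


def suspicious_sources(lines):
    """Yield (line, source) for IPv4 packets whose source is outside EXPECTED_NET."""
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # tcpdump banners, ARP and other non-IPv4 lines
        try:
            src = ipaddress.ip_address(match.group(1))
        except ValueError:
            continue  # not a valid dotted-quad address
        if src not in EXPECTED_NET:
            yield line, str(src)


# Sample lines copied from the captures in this post.
SAMPLE_CAPTURE = """\
19:43:05.048167 IP 172.16.2.21 > 172.16.2.1: ICMP echo request, id 12766, seq 0, length 64
19:44:06.816663 IP 172.16.233.49.windb > 172.16.2.1.ssh: tcp 0
19:44:23.739898 IP 172.16.85.195.windb > 172.16.2.1.ssh: tcp 0
"""

for _, src in suspicious_sources(SAMPLE_CAPTURE.splitlines()):
    print("bogus source:", src)
# prints "bogus source: 172.16.233.49" and "bogus source: 172.16.85.195"
```

For a real capture, save tcpdump's output to a file and pass the file's lines instead of the sample string.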

D'oh. The packets reach the load balancer all right, but the source IP is corrupted: instead of 172.16.2.21, each connection attempt arrives from a different bogus address (172.16.233.49, 172.16.233.48 and so on, down to 172.16.85.195). Googling didn't really help: other people have run into this or similar issues, but nobody had posted a solution. Pinging my skilled colleague Leonidas didn't help either; he was as baffled as I was. And then it hit me.

solaris-1# echo "Clickety-click; disabling checksum offload" && echo "set ip:dohwcksum=0" >> /etc/system
Clickety-click; disabling checksum offload

(/etc/system is only read at boot time, so the setting takes effect after a reboot.)

solaris-1:/root# telnet 172.16.2.1 22
Trying 172.16.2.1...
Connected to 172.16.2.1.
Escape character is '^]'.
SSH-2.0-OpenSSH_4.3

Ah, the joy! Too bad Leonidas had signed off for the day two minutes before I figured this out, so all I can do is brag about it here on my blog 🙂
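For the curious: setting `ip:dohwcksum=0` should make the Solaris IP stack compute TCP checksums itself instead of leaving that job to the (virtual) NIC. That computation is the standard Internet checksum (RFC 1071, applied to TCP as specified in RFC 793): the ones' complement of the 16-bit ones' complement sum over an IPv4 pseudo-header (source and destination addresses, protocol number, TCP length) followed by the TCP segment. The sketch below is a minimal Python illustration of that arithmetic, not Solaris kernel code; `ones_complement_sum16`, `tcp_checksum` and `syn_segment` are hypothetical helper names, and the ports and sequence number are made up.

```python
import ipaddress
import struct


def ones_complement_sum16(data: bytes) -> int:
    """RFC 1071 16-bit ones' complement sum; odd-length input is zero-padded."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def tcp_checksum(src: str, dst: str, segment: bytes) -> int:
    """Internet checksum over the IPv4 pseudo-header plus the TCP segment.

    Pass the segment with its checksum field zeroed to compute a checksum,
    or with the checksum filled in to verify it (a valid segment yields 0).
    """
    pseudo_header = struct.pack(
        "!4s4sBBH",
        ipaddress.IPv4Address(src).packed,
        ipaddress.IPv4Address(dst).packed,
        0,             # reserved, always zero
        6,             # IP protocol number for TCP
        len(segment),  # TCP length: header plus payload
    )
    return ~ones_complement_sum16(pseudo_header + segment) & 0xFFFF


def syn_segment(sport: int, dport: int, seq: int, checksum: int = 0) -> bytes:
    """Hypothetical helper: a bare 20-byte TCP SYN header (no options, no data)."""
    offset_byte = 5 << 4  # data offset = 5 32-bit words
    flags = 0x02          # SYN
    window = 65535
    return struct.pack("!HHIIBBHHH", sport, dport, seq, 0,
                       offset_byte, flags, window, checksum, 0)


# Addresses from the post; source port 40000 and seq 1000 are made up.
SRC, DST = "172.16.2.21", "172.16.2.1"
csum = tcp_checksum(SRC, DST, syn_segment(40000, 22, 1000))
segment = syn_segment(40000, 22, 1000, csum)

print(tcp_checksum(SRC, DST, segment))              # 0: checksum verifies
print(tcp_checksum("172.16.233.49", DST, segment))  # non-zero: source changed
```

Because the pseudo-header includes the source address, a segment whose source address is altered after the checksum was computed no longer verifies, as the last line shows.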
