Posts Tagged ‘solaris’

Xenserver: fake xen tools for Solaris 10 guests

March 31, 2013

Note: you hopefully appreciate that this is completely unsupported.

Xenserver doesn’t enable the Shutdown / Reboot buttons for VMs that don’t have the XenServer tools installed. This is an issue for my Solaris 10 guests, since no tools are available for this platform, and it has been bugging me for some time.

So I went ahead and dug into the XenServer tools for Linux. Turns out the only thing they do is update a bunch of parameters in XenStore. However, Solaris 10 doesn’t have a device path for XenStore, putting us back at square one. Or not?

Not really. Turns out that Xen Tools installation is like orgasm memory. Sure, it’s a lot better if one has them, but if not, one can modify the appropriate XenStore parameters from dom0 and fake it. XenServer couldn’t care less how the parameters were modified; as long as they are tweaked in the proper order, the Suspend/Reboot/Shutdown buttons are enabled. So first get the dom-id of your Solaris VM:


[root@dom0 log]# xe vm-list name-label=i-2-323-VM params=dom-id
dom-id ( RO) : 154

and

[root@dom0 log]# xenstore-write /local/domain/154/attr/PVAddons/Installed 1
[root@dom0 log]# xenstore-write /local/domain/154/attr/PVAddons/MajorVersion 6
[root@dom0 log]# xenstore-write /local/domain/154/attr/PVAddons/MinorVersion 1
[root@dom0 log]# xenstore-write /local/domain/154/attr/PVAddons/MicroVersion 0
[root@dom0 log]# xenstore-write /local/domain/154/attr/PVAddons/BuildVersion 59235
[root@dom0 log]# xenstore-write /local/domain/154/attr/PVAddons/os/class "SunOS"
[root@dom0 log]# xenstore-write /local/domain/154/attr/PVAddons/os/major "5"
[root@dom0 log]# xenstore-write /local/domain/154/attr/PVAddons/os/minor "10"
[root@dom0 log]# xenstore-write /local/domain/154/data/updated 1
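The whole sequence can be scripted so it survives VM restarts; a minimal sketch, assuming `xe` and `xenstore-write` are in the PATH on dom0 (the function name and the `--minimal` lookup are my additions):

```shell
# fake_tools: write the PVAddons keys XenServer expects, in order.
# Sketch only -- run on dom0; version numbers match XenServer 6.1 above.
fake_tools() {
    domid=$(xe vm-list name-label="$1" params=dom-id --minimal)
    base="/local/domain/$domid/attr/PVAddons"
    xenstore-write "$base/Installed" 1
    xenstore-write "$base/MajorVersion" 6
    xenstore-write "$base/MinorVersion" 1
    xenstore-write "$base/MicroVersion" 0
    xenstore-write "$base/BuildVersion" 59235
    xenstore-write "$base/os/class" "SunOS"
    xenstore-write "$base/os/major" "5"
    xenstore-write "$base/os/minor" "10"
    # data/updated goes last: it is what prompts XenServer to re-read the keys
    xenstore-write "/local/domain/$domid/data/updated" 1
}
```

Since dom-ids change on every reboot, a helper like this is handy to re-run after each guest restart.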

The above is enough to enable the shutdown/reboot/suspend buttons. Unfortunately, in the process it also sets the “platform: viridian: true” parameter, which doesn’t play nicely with Solaris VMs.

[root@dom0 log]# xe vm-list name-label=i-2-323-VM params=uuid
uuid ( RO)    : 5dc51848-bc9c-dd70-b670-2c7d263a7fe5
[root@dom0 log]# xe vm-param-remove param-name=platform param-key=viridian uuid=[...]

… and watch the “Force shutdown” and “Force reboot” buttons disappear.
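The two commands can be chained so the uuid never has to be pasted by hand; a small sketch (only the `--minimal` flag and the function wrapper are mine):

```shell
# drop_viridian: remove the viridian platform flag for a VM, by name.
# Sketch only -- assumes the xe CLI on dom0.
drop_viridian() {
    uuid=$(xe vm-list name-label="$1" params=uuid --minimal)
    xe vm-param-remove param-name=platform param-key=viridian uuid="$uuid"
}
```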

So what works?

  1. Reboot: this does a clean reboot of the Solaris 10 domU
  2. Live migrate: not extensively tested, but a test VM does keep network connectivity after a single live migration.

Unfortunately, shutdown only kind of works. Hitting the button does initiate a clean shutdown of the Solaris domU, but the guest never seems to do an ACPI poweroff and gets stuck at “Press any key to reboot”. This is proving a slightly tougher nut to crack.

Update 2013/04/01: I’ve wasted a few too many hours on “shutdown” not working. Maybe I’ll revisit this in the future, but I’m calling it quits for now.

Cloudstack: OS type & xenserver templates

March 6, 2013

I’ve been using Cloudstack for circa a month now for virtualising Solaris workloads. It has mostly been working like a charm, once I applied the appropriate workarounds (cf. my relevant findings, courtesy of my IoannisB Citrix identity). However, one thing has been bugging me for some time:


# xe vm-list name-label=i-2-271-VM params=name-description
name-description ( RW) : Template which allows VM installation from install media

My Solaris VMs are launched using the generic Xenserver template. This is not really to my liking for two reasons. Firstly, I have to apply the viridian:false modification to the default template. Secondly, there is no way to tell whether a VM is a Solaris one or not using the Xenserver CLI.

The fix is to have Cloudstack use the “Solaris 10 (experimental)” template for my Solaris workloads.

  1. Download the cloudstack source code and uncompress it to a folder of your choice.
  2. Apply a rather simple diff to the CitrixHelper.java file:
    $ diff CitrixHelper.java.orig CitrixHelper.java
    425a426
    > _xenServer600GuestOsMap.put("Sun Solaris 10(64-bit)", "Solaris 10 (experimental)");
    542a544
    > _xenServer602GuestOsMap.put("Sun Solaris 10(64-bit)", "Solaris 10 (experimental)");
  3. Build the JAR files, per the instructions on page 16 of the Cloudstack installation guide. There is no need to build DEB or RPM packages.
  4. Replace /usr/share/java/cloud-plugin-hypervisor-xen.jar with the cloud-plugin-hypervisor-xen-4.0.0-incubating-SNAPSHOT.jar built in the step above.
  5. Restart the management server.

Slashdot geeks may want to add a Step 6: Profit. Launch a Solaris 10 64-bit instance again and enjoy:


# xe vm-list name-label=i-2-272-VM params=name-description
name-description ( RW) : Clones of this template will automatically provision their storage when first booted and then reconfigure themselves with the optimal settings for Solaris 10 (experimental).

Solaris + xenserver + ovswitch

February 28, 2013

This had been troubling me for quite some time; hopefully someone else can save a few hours by bumping into this post.

For some reason my Solaris 10 virtual machines on Xenserver failed when the Distributed Virtual Switch Controller was also running. I didn’t really troubleshoot the issue until recently, since I could live without cross-server private networks. This no longer being the case, I decided to look into it again.

Fast forward a couple of hours: after losing quite some time trying various tricks on the VM (disabling NIC checksum offload, lowering MTUs etc.) to no avail, I concluded that it must be a hypervisor issue. Digging into the openvswitch tools revealed something interesting.


[root@xenserver ~]# ovs-vsctl list-ports xapi25
vif42.0
tap47.0
vif47.0
vif6.0

Specifically, for my Linux VMs only a vifX.Y interface was added to the bridge, while for my Solaris ones both a tapX.Y and a vifX.Y were. Clickety-click.


[root@xenserver ~]# ovs-vsctl del-port xapi25 tap47.0

Et voila! Network connectivity to the Solaris VM works like a charm. Now to make this change permanent:


[root@xenserver ~]# diff /etc/xensource/scripts/vif.orig /etc/xensource/scripts/vif
134c134,138
< $vsctl --timeout=30 -- --if-exists del-port $dev -- add-port $bridge $dev $vif_details
---
> if [[ $dev != tap* ]]; then
> $vsctl --timeout=30 -- --if-exists del-port $dev -- add-port $bridge $dev $vif_details
> else
> echo Skipping command $vsctl --timeout=30 -- --if-exists del-port $dev -- add-port $bridge $dev $vif_details
> fi

I am not really certain of the ugly side-effects that this may have. But it does the trick for me.

Update 2013/03/10: A better workaround is to apply the above behaviour only to Solaris VMs. For example, assuming these are based on the “Solaris 10 (experimental)” template, the following snippet skips the offending command only for the Solaris VMs:

if [[ $dev != tap* ]]; then
    $vsctl --timeout=30 -- --if-exists del-port $dev -- add-port $bridge $dev $vif_details
else
    xe vm-list dom-id=$DOMID params=name-description | grep -q 'Solaris 10' || \
        $vsctl --timeout=30 -- --if-exists del-port $dev -- add-port $bridge $dev $vif_details
fi
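Patching the vif script only covers ports added from now on. For tap ports that already sit on the bridges, the same cleanup as the manual `del-port` above can be swept across every bridge at once; a hypothetical helper (bridge and port names as in the `ovs-vsctl` listing above):

```shell
# sweep_taps: remove every tapX.Y port from every openvswitch bridge.
# Sketch only -- assumes ovs-vsctl on the XenServer host.
sweep_taps() {
    for br in $(ovs-vsctl list-br); do
        for port in $(ovs-vsctl list-ports "$br"); do
            case "$port" in
                tap*) ovs-vsctl del-port "$br" "$port" ;;
            esac
        done
    done
}
```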

Solaris, VMWare & VGT mode

February 16, 2012

Today I had the strangest of problems. In a VMWare-based testbed with a bunch of mixed systems (F5 virtual appliances, a Linux host, 3 Solaris servers) I was facing severe connectivity issues with the Solaris hosts. Specifically, with all systems connected on VLAN 162 (L3 addressing: 172.16.2.0/24), anything TCP-related from the Solaris hosts failed. The F5 and Linux virtual machines had no problem whatsoever.

I quickly fired up my trusty tcpdump to figure out what was wrong, then issued a simple ICMP ping from a Solaris host to the load balancer to see what happens:

solaris-1# ping 172.16.2.1
172.16.2.1 is alive


[root@loadbalancer-1:Active] config # tcpdump -q -n -i ingress-lb not arp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ingress-lb, link-type EN10MB (Ethernet), capture size 108 bytes
19:43:05.048167 IP 172.16.2.21 > 172.16.2.1: ICMP echo request, id 12766, seq 0, length 64
19:43:05.048215 IP 172.16.2.1 > 172.16.2.21: ICMP echo reply, id 12766, seq 0, length 64

Nice. ICMP works. Everything looks nice in the packet capture. Now let’s try some TCP traffic for a change:

solaris-1:/root# telnet 172.16.2.1 22
Trying 172.16.2.1...


[root@loadbalancer-1:Active] config # tcpdump -q -n -i ingress-lb not arp
19:44:06.816663 IP 172.16.233.49.windb > 172.16.2.1.ssh: tcp 0
19:44:07.949006 IP 172.16.233.48.windb > 172.16.2.1.ssh: tcp 0
19:44:10.215576 IP 172.16.233.47.windb > 172.16.2.1.ssh: tcp 0
19:44:14.730324 IP 172.16.233.46.windb > 172.16.2.1.ssh: tcp 0
19:44:23.739898 IP 172.16.85.195.windb > 172.16.2.1.ssh: tcp 0

D’oh. The packet reaches the load balancer alright, but the source IP is corrupted. Googling didn’t really help; other people have run into this or similar issues, but no solution. Pinging my skilled colleague Leonidas didn’t help either; he was as baffled by what was happening as I was. And then it hit me.

solaris-1# echo "Clickety-click; disabling checksum offload" && echo "set ip:dohwcksum=0" >> /etc/system
Clickety-click; disabling checksum offload

solaris-1:/root# telnet 172.16.2.1 22
Trying 172.16.2.1...
Connected to 172.16.2.1.
Escape character is '^]'.
SSH-2.0-OpenSSH_4.3
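For the record, the same `/etc/system` line can be appended idempotently, so re-running the fix never duplicates it; a small sketch (the file path is parameterized purely so it can be tried safely outside /etc):

```shell
# disable_hwcksum: append "set ip:dohwcksum=0" to an /etc/system-style
# file, once only. Sketch -- call as: disable_hwcksum /etc/system
disable_hwcksum() {
    f="$1"
    grep -q '^set ip:dohwcksum=0' "$f" 2>/dev/null || \
        echo 'set ip:dohwcksum=0' >> "$f"
}
```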

Ah, the joy! Too bad that two minutes after I figured this out Leonidas had signed off for the day, so I can only brag about it in my blog 🙂

Solaris: cloning an iSCSI LUN

October 21, 2010

While I eventually settled on a combination of a ramdisk and golden Solaris container images for a diskless-boot architectural prototype I had to implement for dayjob, I did initially toy with iSCSI.

I ended up rejecting iSCSI mainly due to the additional requirements it places on the storage subsystem. A single ramdisk may be used by multiple nodes in the cluster; each client loads the ramdisk and then customizes the filesystem with host-specific parameters in local RAM. Contrast this with iSCSI, which requires a separate iSCSI LUN per client. The cost is not just the extra storage (which could be minimal given cloning and deduplication); there is an increased management cost (maintaining 10 LUNs vs. a single ramdisk) as well as increased CAPEX and OPEX due to the presence of an extra SAN. Specifically, you can’t really expect to have a highly available iSCSI solution with non-dedicated h/w, whereas a similar HA solution with ramdisks is trivial to set up and just needs two DHCP + TFTP servers (coupled with NIC bonding for extra redundancy).

That said, I thought I’d write up some high-level notes on the pain of cloning an iSCSI LUN containing a Solaris installation. I can use them as a reference in the future, or (if I’m lucky) someone will run into this blog post and suggest a more graceful approach.

  1. Set up an iSCSI LUN: it doesn’t really matter how you do it. For my setup I used the Solaris iSCSI target (greetz to @c0t0d0s0 for yet another excellent tutorial).
  2. Install Solaris on the iSCSI LUN: Captain Jack provides a thorough step-by-step guide with screenshots of the relevant steps (I will admit wondering whether one could automate the process with Jumpstart and pre-install scripts, but I never got there).
  3. Boot the newly installed node for the first time, make any site-specific changes you need and then shut it down. Forget this LUN from now on; it will be your “golden image”.
  4. Clone the iSCSI LUN to a new one. This step really depends on your SAN. If you are using ZFS, the steps are probably as simple as the following:

    # zfs snapshot rpool/iscsi/lun0@golden
    # zfs clone rpool/iscsi/lun0@golden rpool/iscsi/lun1

  5. Add the LUN to an existing or new iSCSI target and get its GUID:

    # iscsitadm create target -u 1 -b /dev/zvol/rdsk/rpool/iscsi/lun1 -t mytarget
    # iscsitadm list target -v mytarget
    Target: mytarget
        iSCSI Name: iqn.1986-03.com.sun:02:9c23130f-1d8e-6b20-8e95-a6ab8a227924.mytarget
        Connections: 1
            Initiator:
                iSCSI Name: iqn.1986-03.com.sun:01:ba78c2f3ffff.49b911ad
                Alias: unknown
        ACL list:
        TPGT list:
        LUN information:
    ...
            LUN: 1
                GUID: 600144f04caf16fb00000c29324dee00
                VID: SUN
                PID: SOLARIS
                Type: disk
                Size: 4.0G
                Backing store: /dev/zvol/rdsk/rpool/iscsi/lun1
                Status: online
    ...

  6. Configure a new system to boot from your newly created iSCSI LUN. Here is what a DHCP reservation for gPXE looks like:

    host  {
      hardware ethernet ;
      fixed-address                   ;
      option routers                  ;
      option subnet-mask              ;
      option domain-name-servers      ;
      filename                      "";
      # iscsi root-path format: iscsi:[server]:[protocol]:[port]:[LUN]:[target IQN]
      option root-path
        "iscsi::::1:iqn.1986-03.com.sun:02:9c23130f-1d8e-6b20-8e95-a6ab8a227924.mytarget";
    }
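The clone-and-export steps can be wrapped into a tiny helper for stamping out additional LUNs; a sketch assuming the ZFS layout and target name used above (the function and the numbering scheme are my own):

```shell
# clone_lun: clone the golden LUN (lun0) into lunN and expose it on the
# existing iSCSI target. Sketch only -- assumes the rpool/iscsi layout
# and the "mytarget" target from the steps above.
clone_lun() {
    n="$1"
    zfs snapshot rpool/iscsi/lun0@golden 2>/dev/null  # no-op if it already exists
    zfs clone rpool/iscsi/lun0@golden "rpool/iscsi/lun$n"
    iscsitadm create target -u "$n" \
        -b "/dev/zvol/rdsk/rpool/iscsi/lun$n" -t mytarget
}
```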
Neat. You installed Solaris on a LUN and you cloned the LUN. One would expect that you can repeat this process as many times as necessary and, by changing just the LUN id in gPXE, boot as many Solaris systems as you want, right? WRONG!

Turns out the Solaris installer “burns” the iSCSI boot device identifier into the root filesystem during installation. In fact, it does a pretty good job of “burning” it all over the place, making your life miserable when it comes to cloning an iSCSI LUN and re-using it for another system. So you have to jump through some extra hoops, otherwise you will just get a nice kernel panic. The following steps assume you are using UFS (don’t ask!) but they should work similarly with ZFS.

  1. Mount the newly cloned iSCSI LUN from a Solaris system. This could be the iSCSI target itself, if you are using Solaris for that task. Do notice the slight difference between the iSCSI target device and the device we are actually mounting:

    # iscsiadm modify discovery -t enable
    # iscsiadm list target -S
    Target: iqn.1986-03.com.sun:02:9c23130f-1d8e-6b20-8e95-a6ab8a227924.mytarget
            Alias: asmrootufs
            TPGT: 1
            ISID: 4000002a0000
            Connections: 1
            LUN: 0
                 Vendor:  SUN
                 Product: SOLARIS
                 OS Device Name: /dev/rdsk/c2t600144F04CADE09C00000C29324DEE00d0s2
            LUN: 1
                 Vendor:  SUN
                 Product: SOLARIS
                 OS Device Name: /dev/rdsk/c2t600144F04CAF16FB00000C29324DEE00d0s2
    ...
    # ls -l /dev/rdsk/c2t600144F04CAF16FB00000C29324DEE00d0s2
    lrwxrwxrwx  -> ../../devices/scsi_vhci/disk@g600144f04caf16fb00000c29324dee00:c,raw
    # mount /devices/scsi_vhci/disk\@g600144f04caf16fb00000c29324dee00\:a /mnt/foo/

  2. Keep a note of the disk path above, “/devices/scsi_vhci/disk@g600144f04caf16fb00000c29324dee00:a”; you’re going to need it.
  3. Edit the files ./boot/solaris/bootenv.rc, etc/path_to_inst and etc/vfstab. In them you will find references to the iSCSI LUN 0 device which was used as our golden image (cf. the iscsiadm output above). Change these to the “/devices” path corresponding to our iSCSI LUN 1.
  4. Do a recursive grep (find /mnt/foo -type f | xargs grep) for any other occurrences of the old iSCSI LUN. I think the step above covers everything, but I played this from an old note and it may miss something.
  5. Update the boot archive in the new LUN:

    # bootadm update-archive -R /mnt/foo

  6. Manually create the required symlink under /dev/dsk:

    # cd /mnt/foo/dev/dsk
    # ln -s ../../devices/scsi_vhci/disk\@g600144f04caf16fb00000c29324dee00\:a c2t600144F04CAF16FB00000C29324DEE00d0s0

  7. Unmount “/mnt/foo” and reboot your target node; now everything should work like a charm.
  8. Profit!
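The edit-the-burned-references step amounts to a search-and-replace of the golden LUN’s GUID in a handful of files; a hypothetical sketch over the mounted clone (GUIDs as in the iscsiadm listing above, function name mine):

```shell
# rewrite_guid: swap the golden image GUID for the clone's GUID in the
# files that "burn" the boot device path. Sketch only -- run against
# the mount point of the cloned LUN, e.g.: rewrite_guid /mnt/foo
OLD=600144f04cade09c00000c29324dee00   # LUN 0, the golden image
NEW=600144f04caf16fb00000c29324dee00   # LUN 1, the clone
rewrite_guid() {
    root="$1"
    for f in boot/solaris/bootenv.rc etc/path_to_inst etc/vfstab; do
        [ -f "$root/$f" ] || continue
        sed "s/$OLD/$NEW/g" "$root/$f" > "$root/$f.tmp" &&
            mv "$root/$f.tmp" "$root/$f"
    done
}
```

A recursive grep afterwards (as in the steps above) remains the sanity check that nothing was missed.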

Solaris: passing parameters from GRUB to the O/S

October 13, 2010

The problem with ramdisks is that every system ends up identical to the others. Hence you need a way to uniquely identify and pre-customize each system fairly early in the boot process.

A clever way to do this is via the MAC address of the 1st network interface. This works nicely with Linux, where the 1st network interface is always eth0. It doesn’t work as well in Solaris though, for a couple of reasons:

  1. the network interface name depends on the NIC, i.e. e1000g0 for Intel NICs, bnx0 for Broadcom 1GbE, bnxe0 for Broadcom 10GbE ones etc.
  2. with Solaris the network interface and hence its MAC address is not visible with ifconfig until it is “plumbed”. And you can’t really expect the interface to be plumbed early in the boot process

Hence I considered a different approach. “With GRUB it is very easy to specify kernel and boot options in the boot menu“, so let’s go ahead and define a custom option in our menu.lst (which is centrally controlled by the DHCP/TFTP boot server) … :


# cat /tftpboot/menu.lst.01000C29A6B8E8
default=0
timeout=30
min_mem64 1024
title Solaris_10 Ramdisk
kernel$ /I86PC.Solaris_10-7/multiboot kernel/$ISADIR/unix -B hostname='random-hostname'
module$ /I86PC.Solaris_10-7/$ISADIR/ramdisk.img

… and read this option once the O/S boots. Easy enough. Only, an hour or so later I still couldn’t find any documentation on how to read my custom boot parameter.

[… clickety click …]

Turns out that the custom parameter is available using the prtconf command.


# prtconf -v /devices | sed -n '/hostname/{;n;p;}' |cut -f 2 -d \'
random-hostname

Easy to implement, hard to find out and remember.
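To actually consume the value, the pipeline above can be tucked into a function for use from an early boot script; a sketch (the function name is mine; the prtconf pipeline is the one shown above):

```shell
# bootparam_hostname: read the custom -B hostname boot parameter set in
# menu.lst, via prtconf. Sketch only -- Solaris x86.
bootparam_hostname() {
    prtconf -v /devices | sed -n '/hostname/{;n;p;}' | cut -f 2 -d \'
}
```

Something like `bootparam_hostname > /etc/nodename` early in boot would then be my (hypothetical) way to de-clone each ramdisk-booted system.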

Solaris diskless ramdisk boot

October 10, 2010

Working on a diskless architecture for dayjob, I needed to implement diskless boot. The options for the root filesystem were, briefly, the following:

  1. Ramdisk
  2. iSCSI
  3. NFS

Option 2 is nice, but it comes with a “single point of failure” gotcha, namely the iSCSI server. Sure, you can have a second one, but I don’t really want to find out what would happen if the SAN with the root filesystem went offline. Option 3 is also nice, but for me it was a slightly convoluted mess. Hence I opted for the ramdisk approach.

Thankfully, another kind soul has done something similar already. Here are some extra notes that I’ve found of use:

  1. Make a local copy of the root_archive command and adjust the UFS overhead to 50% instead of 10%. This will allow for some empty disk space in your root filesystem
  2. Remove the miniroot ramdisks that get installed with Solaris (/boot/x86.miniroot-safe and /boot/amd64/x86.miniroot-safe); you are building a ramdisk on your own, you don’t need to bundle two more within it
  3. Remove the boot_archive under /platform/i86pc

The above allows cramming a Solaris Core installation with a few extras (SSH, bash) into just 122 (compressed) MBytes.


# du -sh /var/tmp/ramdisk.img
122M /var/tmp/ramdisk.img

This is it. Copy the ramdisk to your TFTP boot server, configure GRUB accordingly:


# cat /tftpboot/menu.lst.01000C29A6B8E8
default=0
timeout=30
min_mem64 1024
title Solaris_10 Ramdisk
kernel$ /I86PC.Solaris_10-7/multiboot kernel/$ISADIR/unix
module$ /I86PC.Solaris_10-7/$ISADIR/ramdisk.img

configure your DHCP server appropriately for a network boot and … profit 🙂
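The menu.lst filename encodes the client: a 01 prefix (Ethernet) followed by the MAC address in uppercase hex without colons, as in menu.lst.01000C29A6B8E8 above. A hypothetical helper for computing the per-host path:

```shell
# menu_path: map a MAC address, e.g. 00:0c:29:a6:b8:e8, to its GRUB
# menu file under /tftpboot. Sketch only -- naming scheme as above.
menu_path() {
    printf '/tftpboot/menu.lst.01%s\n' \
        "$(echo "$1" | tr -d ':' | tr 'a-f' 'A-F')"
}
```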

This still leaves open the issue of DHCP server high availability; however, that should be easy to tackle.

The truth about (Solaris and) Xen

July 8, 2010

Why is Solaris not good enough to act as a hypervisor requiring Sun to ship Xen in xVM?

via Tales of a Code Monkey

Because what used to be Sun didn’t suffer from the NIH syndrome that seems to haunt Linux developers. And while Linux may one day dominate everything, that doesn’t necessarily mean it’s the best thing since sliced bread, the same way that x86 is not. It will just mean that market dynamics will have won.

Until then I am thankful that, contrary to RedHat, some people still appreciate that KVM is an emerging technology and Xen the leading FOSS virtualization solution. Thanks to them I can happily type uptime:

# cat /etc/release
Solaris 10 10/09 s10x_u8wos_08a X86
Copyright 2009 Sun Microsystems, Inc. All Rights Reserved.
Use is subject to license terms.
Assembled 16 September 2009
# uptime
9:46pm up 1:10, 1 user, load average: 1.14, 1.98, 2.19

and get a consistently low load average in my Solaris VM, rather than one in excess of 15 like I did with KVM.

Tip of the day: pax and removal of leading slashes

March 24, 2010

This came up at dayjob a couple of weeks ago and again today.

A nice feature of GNU tar is the removal of leading slashes when extracting an archive. For instance, if an archive foobar.tar contains the files /foo and /bar, the following commands on a GNU/Linux system:

cp foobar.tar /test
cd /test
tar xvf foobar.tar

will generate /test/foo and /test/bar. Not so on a Solaris 10 system, where the files will be placed in the root directory.

Thankfully there is a trivial workaround to remove the leading slash:

pax -rv -s ',^/,,' -f foobar.tar

The workaround is now documented in a convenient place, rather than me having to search for it in either the bash history of a random system or an interesting but rather lengthy manpage.

Solaris: the bars

March 4, 2010

A picture is worth a thousand words. A graph a little bit more 🙂

It’s a slight pity that the respective binaries are closed source, but compared to mpstat(1M) they paint a much nicer picture. And contrary to the claim on the page, the x86 binaries work on Solaris 10 as well.