Archive for the ‘sysadmin’ Category

NetScaler fun with OpenStack keys and userdata

April 17, 2016

One of the things that’s been bugging me about NetScaler and OpenStack is the lack of basic integration. The management network is configured via DHCP on first boot, or via config drive and userdata if DHCP is not available, but the appliance neither imports SSH keys nor runs userdata scripts as part of its initial configuration.

Thankfully, the above limitation may be easily alleviated using the nsbefore.sh and nsafter.sh boot-time configuration backdoors. Here is a sample nsbefore.sh for VPX, based on the OpenStack docs, that handles the import of SSH keys:

root@ns# cat /nsconfig/nsbefore.sh
#!/usr/bin/bash
# Fetch public key using HTTP
ATTEMPTS=10
FAILED=0
while [ ! -f /nsconfig/ssh/authorized_keys ]; do
  curl -f http://169.254.169.254/latest/meta-data/public-keys/0/openssh-key > /tmp/metadata-key 2>/dev/null
  if [ $? -eq 0 ]; then
    cat /tmp/metadata-key >> /nsconfig/ssh/authorized_keys
    chmod 0600 /nsconfig/ssh/authorized_keys
    rm -f /tmp/metadata-key
    echo "Successfully retrieved public key from instance metadata"
    echo "*****************"
    echo "AUTHORIZED KEYS"
    echo "*****************"
    cat /nsconfig/ssh/authorized_keys
    echo "*****************"
  else
    FAILED=`expr $FAILED + 1`
    if [ $FAILED -ge $ATTEMPTS ]; then
      echo "Failed to retrieve public key from instance metadata after $FAILED attempts, quitting"
      break
    fi
    echo "Could not retrieve public key from instance metadata (attempt #$FAILED/$ATTEMPTS), retrying in 5 seconds..."
    ifconfig 0/1
    sleep 5
  fi
done

Courtesy of the Red Hat documentation, here is a simple nsafter.sh that retrieves and runs a userdata script:

#!/usr/bin/bash

# Fetch userdata using HTTP
ATTEMPTS=10
FAILED=0
while [ ! -f /nsconfig/userdata ]; do
  curl -f http://169.254.169.254/openstack/2012-08-10/user_data > /tmp/userdata 2>/dev/null
  if [ $? -eq 0 ]; then
    cat /tmp/userdata >> /nsconfig/userdata
    chmod 0700 /nsconfig/userdata
    rm -f /tmp/userdata
    echo "Successfully retrieved userdata"
    echo "*****************"
    echo "USERDATA"
    echo "*****************"
    cat /nsconfig/userdata
    echo "*****************"
    /nsconfig/userdata
  else
    FAILED=`expr $FAILED + 1`
    if [ $FAILED -ge $ATTEMPTS ]; then
      echo "Failed to retrieve public key from instance metadata after $FAILED attempts, quitting"
      break
    fi
    echo "Could not retrieve public key from instance metadata (attempt #$FAILED/$ATTEMPTS), retrying in 5 seconds..."
    sleep 5
  fi
done

Simple enough. Now to put these to the test:

  1. Create a simple HEAT template
    # more template
    ################################################################################
    heat_template_version: 2015-10-15
    
    ################################################################################
    
    description: >
      Simple template to deploy a NetScaler with floating IP
    
    ################################################################################
    
    resources:
      testvpx:
        type: OS::Nova::Server
        properties:
          key_name: mysshkey
          image: NS_userdata
          flavor: m1.vpx
          networks:
            - network: private_network
          user_data_format: "RAW"
          user_data:
            get_file: provision.sh
    
      testvpx_floating_ip:
        type: OS::Neutron::FloatingIP
        properties:
          floating_network: external_network
    
      testvpx_float_association:
        type: OS::Neutron::FloatingIPAssociation
        properties:
          floatingip_id: { get_resource: testvpx_floating_ip }
          port_id: {get_attr: [testvpx, addresses, private_network, 0, port]}
    
  2. Import into Glance a NetScaler image with the above nsbefore.sh and nsafter.sh changes baked in; name it NS_userdata (a sketch of the Glance import is shown after this list)
  3. Create a simple test provisioning script
    # cat provision.sh
    #!/usr/bin/bash
    
    echo foo
    touch /var/tmp/foobar
    echo bar >> /var/tmp/foobar
    
    nscli -U :nsroot:nsroot add ns ip 172.16.30.40 255.255.255.0
    
  4. Create a stack and identify the NetScaler floating IP address
    # heat stack-create -f template vpx__userdata
    +--------------------------------------+------------------+--------------------+---------------------+--------------+
    | id                                   | stack_name       | stack_status       | creation_time       | updated_time |
    +--------------------------------------+------------------+--------------------+---------------------+--------------+
    | 540cb3d2-3b21-443c-a43b-10c745d28498 | vpx__userdata    | CREATE_IN_PROGRESS | 2016-04-17T16:49:49 | None         |
    +--------------------------------------+------------------+--------------------+---------------------+--------------+
    # nova list | grep testvpx
    | 77388ebc-97e8-4a74-b863-40e822cb88c7 | vpx__userdata-testvpx-t3r3avxl7unc        | ACTIVE | -          | Running     | private_network=192.168.100.200, 10.78.16.139
    

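Regarding step 2, the Glance import itself is nothing NetScaler-specific. A minimal sketch, assuming a KVM deployment and a qcow2 VPX image named NSVPX.qcow2 that already contains the nsbefore.sh and nsafter.sh changes (the file name and disk format are placeholders, adjust to your image):

# glance image-create --name NS_userdata \
    --disk-format qcow2 --container-format bare \
    --file NSVPX.qcow2
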
This should be it. To verify that everything went smoothly, SSH into the instance using your private SSH key and run “sh ns ip” to confirm that the provisioning script executed properly.

# ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i privatekey.pem nsroot@10.78.16.139
Warning: Permanently added '10.78.16.139' (RSA) to the list of known hosts.
###############################################################################
#                                                                             #
#        WARNING: Access to this system is for authorized users only          #
#         Disconnect IMMEDIATELY if you are not an authorized user!           #
#                                                                             #
###############################################################################

Last login: Sun Apr 17 16:51:06 2016 from 10.78.16.59
 Done
> sh ns ip
        Ipaddress        Traffic Domain  Type             Mode     Arp      Icmp     Vserver  State
        ---------        --------------  ----             ----     ---      ----     -------  ------
1)      192.168.100.200  0               NetScaler IP     Active   Enabled  Enabled  NA       Enabled
2)      172.16.30.40     0               SNIP             Active   Enabled  Enabled  NA       Enabled

ceph: a quick critique

November 21, 2013

The other day I went ahead and had a short rant about Ceph on Twitter:

This prompted a response from Ian Colle, and I somehow managed to get myself to write a short blog post explaining my thoughts.

A good place to start is the ceph-deploy tool. I think this tweet sums up how I feel about the existence of the tool in the first place:

Now, the tool itself could be great (more on that later). And it’s OK to involve it in a quick start guide of sorts. But I would have hoped that the deep dive sections provided some more insight into what is happening under the hood.

That said, the ceph guys have decided to go ahead with ceph-deploy. Maybe it cuts the docs size in half (bundling what used to be 10+ steps into a single ceph-deploy invocation), maybe it makes user errors fewer and support much easier. So I bit the bullet and went ahead with it. I installed Ubuntu 13.10, typed “apt-get install ceph*” on my admin and my two test nodes and tried to start hacking away. A day later I was nowhere near having a working cluster, with monitor health displaying 2 OSDs, 0 in, 0 up. It wasn’t a full day of work, but it was frustrating. At the end of the day I gave up and decided to declare the Ubuntu Saucy packages broken.

Now, I appreciate that InkTank may have nothing to do with the packages in the default Ubuntu repos. It may not provide them, it may not test against them; in fact most of their guides recommend using the repositories at ceph.com. But they’re there. And if something is in the repo, people expect it to work.

Having finally bitten the bullet, I decided to go ahead with the “official” ceph-deploy and packages. This was not without its problems. Locating the packages for Ubuntu Saucy took a little more time than it had to, and even having resolved that I kept running into issues. It turns out that if at any point “you want to start over”, purgedata is not enough. Turns out that this is a known problem too. “apt-get install --reinstall” fixed things for me and voila, I had a ceph cluster.
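
For the record, “starting over” ends up looking roughly like the sketch below; the node names (node1, node2) are placeholders and the exact package list for the reinstall step is a guess:

# on the admin node: wipe data and keys from the previous attempt
ceph-deploy purgedata node1 node2
ceph-deploy forgetkeys

# purgedata alone was not enough for me (a known issue); reinstalling the
# packages on the affected node(s) got things back into shape
apt-get install --reinstall ceph ceph-common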

Neat. “ceph health” indicated my 2 OSDs up and running, I could mount the pool from a client, etc. Let me take a look at ceph.conf:


# cat /etc/ceph/ceph.conf
[global]
fsid = 2e36c280-4b7f-4474-aa87-9fe317388060
mon_initial_members = foo
mon_host = W.X.Y.Z
auth_supported = cephx
osd_journal_size = 1024
filestore_xattr_use_omap = true

This is it. No sections for my one monitor, my one MDS, my 2 OSDs. If you have read Configuring Ceph, congrats. You are still none the wiser about where all these configuration settings are stored. I’ll find out. Eventually.

Was this the end of my problems? Almost. I went ahead, rebooted my small test cluster (2 servers; 1x MON/MDS/OSD, 1x OSD) and noticed the following:


ceph> health
HEALTH_WARN mds cluster is degraded

Thankfully that was an easy one. Clickety-click:


osd-0# mount /dev/sdb3 /var/lib/ceph/osd/ceph-0
osd-1# mount /dev/sdb3 /var/lib/ceph/osd/ceph-1
# ceph
ceph> health
HEALTH_OK

Does it work? Like a charm. But the frustration over what seem to be silly bugs kept mounting throughout the process, and it shouldn’t have. This is the simplest setup one could possibly come up with, just a couple of nodes. I was using the latest Ubuntu, not some niche distribution like Gentoo or Arch, nor a distro with outdated packages like CentOS/RHEL/Debian-stable. I should have had this up and running in an hour, not a couple of days, so that I could hack at stuff of more direct interest to me.

Getting back to my original tweet: I exaggerated. You can certainly grab an expert from InkTank and have them help you set up Ceph. Or you can invest some time of your own. But I still wanted this to be simpler.

Netscaler 10.x, XenServer 6.2 and Cloudstack

November 1, 2013

Quick tech note. After my Cloudstack testbed upgrade my Netscaler VMs no longer booted. The issue was similar to the one described in CTX135226 but affected both Netscaler 9.3 and 10.1 VMs.

After wasting a few good hours, it turns out that this is caused by the presence of a DVD drive in the VM. The mere presence of a DVD drive, even with no ISO loaded, somehow messes up the Netscaler boot process under XenServer 6.2. The workaround is trivial:

  1. Spot the DVD drive (hdd) using the vbd-list command

    # xe vbd-list vm-name-label=NetScaler\ Virtual\ Appliance
    uuid ( RO) : 5e218b95-68e9-37b2-860f-8b97678e2c02
    vm-uuid ( RO): ab229b8f-2b18-4931-b80b-30ab8265b843
    vm-name-label ( RO): NetScaler Virtual Appliance
    vdi-uuid ( RO): 5bcf6d76-1d2f-411a-9dd7-469aad0f86ae
    empty ( RO): false
    device ( RO): hdd

    uuid ( RO) : 9d6ff62c-93fa-3af5-b4b5-c5c66a0556e4
    vm-uuid ( RO): ab229b8f-2b18-4931-b80b-30ab8265b843
    vm-name-label ( RO): NetScaler Virtual Appliance
    vdi-uuid ( RO): 7b1fa1ff-7388-4e5a-8c67-00e4d943a470
    empty ( RO): false
    device ( RO): hda

  2. Destroy it

    # xe vbd-destroy uuid=5e218b95-68e9-37b2-860f-8b97678e2c02

  3. Power on the Netscaler VM again; this time it should work like a charm

This is good enough if you have access to the XenServer host running the VM. It is not as good a workaround for a Cloudstack environment, which requires you to jump through a few extra hoops. Specifically, if you are using XenServer you can update the StartAnswer method in CitrixResourceBase.java to invoke the createVbd function selectively: if the guest OS type matches “Other install media” and the disk type is ISO, skip the VBD creation. Here is the relevant sample code snippet:


public StartAnswer execute(StartCommand cmd) {
...
    String guestOsTypeName = getGuestOsType(vmSpec.getOs(), vmSpec.getBootloader() == BootloaderType.CD);
    for (DiskTO disk : vmSpec.getDisks()) {
        Volume.Type type = disk.getType();
        // skip the (empty) DVD drive VBD for "Other install media" guests such as the NetScaler VPX
        if (type == Volume.Type.ISO && guestOsTypeName.toLowerCase().contains("other install media")) {
            s_logger.debug("mperedim: skipping VBD creation");
        } else {
            createVbd(conn, disk, vmName, vm, vmSpec.getBootloader());
        }
    }
...
One can improve this workaround by mapping Netscaler VPXs to a dedicated OS template, so that DVD drives are still created for other VMs that get mapped to the “Other install media” XS template.

XML is like violence …

April 18, 2010

There is an infamous quote that goes by:

XML is like violence. If it doesn’t solve your problems, you are not using enough.

It’s only a pity that the guys working on the new OpenSolaris Automated Installer have either not heard of it or are fraking morons. I mean, I really can’t see a good reason why they have replaced something like:

# wc -l ramu/sysidcfg
9 ramu/sysidcfg
# wc -l profiles/default
10 profiles/default

20 lines of simple and easy to edit text with:

# wc -l *
100 ai_manifest.defval.xml
496 ai_manifest.rng
171 ai_manifest.xml
87 default.xml
854 total

almost a thousand lines spread across a big, ugly, PITA-to-edit set of files. Which can’t even do what Jumpstart can yet.

And I will be kind enough not to comment on IPS; not now at least.

Tip of the day: pax and removal of leading slashes

March 24, 2010

This came up at dayjob a couple of weeks ago and again today.

A nice feature of GNU tar is the removal of leading slashes when extracting an archive. For instance, if an archive foobar.tar contains the files /foo and /bar, the following commands on a GNU/Linux system:

cp foobar.tar /test
cd /test
tar xvf foobar.tar

will generate /test/foo and /test/bar. Not so on a Solaris 10 system, where the files will be placed in the root directory.

Thankfully there is a trivial workaround to remove the leading slash:

pax -rv -s ',^/,,' -f foobar.tar

A workaround now documented in a convenient place, rather than me having to search for it in either the bash history of a random system or an interesting but rather lengthy manpage.
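
To reproduce the behaviour end to end, GNU tar can build a suitable test archive (its -P/--absolute-names flag keeps the leading slashes) and the pax invocation above can then be verified against it; the paths below are just an example:

# on a GNU/Linux box: store absolute paths in the archive
tar -cPf foobar.tar /foo /bar

# on the Solaris 10 box: extract under /test with the leading slash stripped
mkdir -p /test && cd /test
pax -rv -s ',^/,,' -f foobar.tar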

Solaris: IRQ and CPU affinity

March 2, 2010

Today I faced an interesting problem. To cut a long story short, Solaris 10, 05/09 release, seemed to assign network card IRQs in the following manner on a 16-way X4450 server I am testing:

# echo '::interrupts -d' | mdb -k | egrep 'CPU|e1000g'
IRQ Vector IPL Bus Type CPU Share APIC/INT# Driver Name(s)
51 0x60 6 MSI 9 1 - e1000g#0
52 0x61 6 MSI 12 1 - e1000g#2
53 0x62 6 MSI 13 1 - e1000g#1
54 0x63 6 MSI 14 1 - e1000g#3

While this seems nice, 1 CPU core per NIC, it’s not. psrinfo provides some further insight:

# psrinfo -pv
The physical processor has 4 virtual processors (0 4-6)
x86 (chipid 0x0 GenuineIntel family 6 model 15 step 11 clock 2933 MHz)
Intel(r) CPU @ 2.93GHz
The physical processor has 4 virtual processors (1 7-9)
x86 (chipid 0x2 GenuineIntel family 6 model 15 step 11 clock 2933 MHz)
Intel(r) CPU @ 2.93GHz
The physical processor has 4 virtual processors (2 10-12)
x86 (chipid 0x4 GenuineIntel family 6 model 15 step 11 clock 2933 MHz)
Intel(r) CPU @ 2.93GHz
The physical processor has 4 virtual processors (3 13-15)
x86 (chipid 0x6 GenuineIntel family 6 model 15 step 11 clock 2933 MHz)
Intel(r) CPU @ 2.93GHz

In short, the default e1000g IRQ assignment seems to have a “preference” towards the last physical CPU. On my end I’d prefer a more balanced approach, something like pinning each driver instance to a core on a different physical CPU.

A number of webpage views later, intrd was not available on my system, hacking the network driver was not a valid option (it was almost midnight; and besides, installing a driver of my own at one of the largest mobile operators in Europe? give me a break) and I found myself cursing at being unable to set the SMP affinity of each IRQ on my own. I asked my Sun expert to help and almost gave up.

Then somehow I decided not to give up, ran into the Solaris 10/09 What’s New notes and found pcitool (which, curiously enough, doesn’t even have its manpage at docs.sun.com). Party time!

  • Copy the SUNWio-tools package from a Solaris 10/09 DVD and install it


# cd /net/jumpstart/Sol10_U8_x86/Solaris_10/Product
# find SUNWio-tools | cpio -p -dum -v /var/tmp
# cd /var/tmp
# yes | pkgadd -d . SUNWio-tools

  • Figure out the PCI nexus where your e1000g cards are present


# cat /etc/path_to_inst | grep e1000
"/pci@0,0/pci8086,3605@2/pci8086,3500@0/pci8086,3518@2/pci108e,4836@0" 0 "e1000g"
"/pci@0,0/pci8086,3605@2/pci8086,3500@0/pci8086,3518@2/pci108e,4836@0,1" 1 "e1000g"
"/pci@0,0/pci8086,2690@1c/pci108e,4836@0" 2 "e1000g"
"/pci@0,0/pci8086,2690@1c/pci108e,4836@0,1" 3 "e1000g"

  • Figure out the “ino” for each e1000g card


# pcitool /pci@0,0 -i
[snip]
ino 60 mapped to cpu 9
Device: /pci@0,0/pci8086,3605@2/pci8086,3500@0/pci8086,3518@2/pci108e,4836@0
Driver: e1000g, instance 0

ino 61 mapped to cpu c
Device: /pci@0,0/pci8086,2690@1c/pci108e,4836@0
Driver: e1000g, instance 2

ino 62 mapped to cpu d
Device: /pci@0,0/pci8086,3605@2/pci8086,3500@0/pci8086,3518@2/pci108e,4836@0,1
Driver: e1000g, instance 1

ino 63 mapped to cpu e
Device: /pci@0,0/pci8086,2690@1c/pci108e,4836@0,1
Driver: e1000g, instance 3
[/snip]

  • Assign each interrupt to the desired CPU (in this example we go with CPUs 6, 9, 12 and 15; note that pcitool takes the CPU ID in hex, hence cpu=c and cpu=f)


# pcitool /pci@0,0 -i ino=60 -w cpu=6
# pcitool /pci@0,0 -i ino=61 -w cpu=9
# pcitool /pci@0,0 -i ino=62 -w cpu=c
# pcitool /pci@0,0 -i ino=63 -w cpu=f

  • Profit!


# echo '::interrupts -d' | mdb -k | egrep 'CPU|e1000g'
IRQ Vector IPL Bus Type CPU Share APIC/INT# Driver Name(s)
51 0x60 6 MSI 6 1 - e1000g#0
52 0x61 6 MSI 9 1 - e1000g#2
53 0x62 6 MSI 12 1 - e1000g#1
54 0x63 6 MSI 15 1 - e1000g#3

Update: My Solaris go-to guy suggested the following:


eeprom(1M):

acpi-user-options=2
e1000g_intpt_bind_cpus=0,0,2,2

Not as generic, and almost totally undocumented, but it should do the trick.
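
Assuming these are indeed applied via eeprom(1M), as the note suggests, the invocation would look roughly as follows (boot-time properties only take effect after a reboot):

# eeprom acpi-user-options=2
# eeprom e1000g_intpt_bind_cpus=0,0,2,2
# init 6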

POLA and traceroute

March 1, 2010

The principle (or rule or law) of least astonishment (or surprise), often abbreviated to POLA, applies to user interface design, programming language design, and ergonomics. It states that, when two elements of an interface conflict, or are ambiguous, the behaviour should be that which will least surprise the human user or programmer at the time the conflict arises.

(via Wikipedia)

Now mtr might be the new cool kid on the block, but some of us are either hanging out in the wrong neighborhood or just plain old. We surely deserve traceroute installed by default on our Ubuntu server alongside mtr, don’t we?

P.S. FWIW RHEL seems to take POLA a little bit further, taking into consideration even the poor guys in the village.


# ls -la `which tracert`
lrwxrwxrwx 1 root root 10 Oct 27 16:28 /bin/tracert -> traceroute

rsyslog evaluation

January 21, 2010

I recently had to carry out an evaluation of centralized logging facilities for my $dayjob, which included some experimentation (and some frustration) with Linux. I opted to benchmark rsyslog and, with the help of the excellent loggen utility that’s part of syslog-ng, managed to get a setup in place that emulated 8 syslog clients pushing messages to a centralized syslog server.

A few days later, here are some interesting findings:

  1. UDP stands for Unreliable Datagram Protocol: sure, you can add some app-level logic to get around the limitations of UDP, but syslog doesn’t have any. In my experiments messages started getting lost at as low as 3,000 messages per second (mind you, this was on a dedicated network backed by a 10GbE switch and links)
  2. Unix sockets (the /dev/log device) are reliable but don’t scale as well as a TCP listening socket
  3. Rsyslog rulesets are your friend. Just remember to create a separate queue per ruleset so that you take advantage of the multiple threads that your CPU probably supports (see the configuration sketch after this list)
  4. Logging to an HDD introduces a healthy performance penalty. Make sure that you try out logging to the “/dev/null” device to see how much you could gain by reducing the impact of disk I/O, e.g. by moving to RAID-1+0 or even a network storage facility (NFS, iSCSI LUNs, etc.). Trying to optimize your filesystem doesn’t hurt either.
  5. When every last bit of performance has been squeezed, check whether you have already tried increasing OMFileIOBufferSize to 128k and MainMsgQueueSize to 100k.
  6. Know your limits: there is no point in storing your logs in a database if your database can take 4000 writes per second but your expectations are for 8000 new messages per second. Buffering can help but only that much.
  7. KISS: Keep it simple. rsyslog (and syslog-ng for that matter) is extremely powerful but all those nice features may hurt performance. If you have to implement complex rules and filters and (…) make sure that you figure out the best and most efficient way to do so. Once you have, go back to step (6) and see if it’s efficient enough.
  8. Plan ahead: just because it works for now doesn’t mean it’s going to work in one month or one year.
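
To make items 3 and 5 a bit more concrete, here is a minimal sketch of the relevant part of the rsyslog configuration. It assumes a reasonably recent rsyslog (v5 or later, legacy directive syntax) with a TCP listener on port 514; file names and values are illustrative only:

# cat /etc/rsyslog.d/central.conf
$ModLoad imtcp

# a large main message queue (item 5)
$MainMsgQueueSize 100000

# a dedicated ruleset, with its own queue, for the TCP input (item 3),
# plus a bigger omfile I/O buffer (item 5)
$Ruleset remote
$RulesetCreateMainQueue on
$OMFileIOBufferSize 128k
*.* /var/log/central/messages

# bind the TCP listener to the ruleset above and start it
$InputTCPServerBindRuleset remote
$InputTCPServerRun 514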

Making sure that I followed the above advice, my centralized logging server scales to 250,000 messages per second using 512-byte messages. This is enough to kill a Gigabit link, so it’s not too bad 🙂 The hardware for the centralized log server was an HP BL460c blade with a single X5550 processor and 6 SAS 15K 146GB hard disk drives, configured in a RAID-0 fashion using the built-in RAID controller.

Wrapping up this effort, I briefly evaluated syslog-ng and found its performance to be notably worse. With minimal tuning I could barely exceed 50,000 messages per second, and given its creators’ claims of 75,000 messages per second I doubt it could scale anywhere near the levels I was able to achieve with rsyslog.

The bigger picture

January 8, 2009

Perforce is cool. Really cool. Sure, it may be centralized, require a network connection all the time and have a strange commit flow (yes, p4 open, I am looking at you). However, the GUI client is fantastic, the CLI tool is excellent (with well thought-out command names), p4 help almost always saves the day, the documentation is great and the technotes can be a life saver. And to add to that, they offer a proxy. A Perforce proxy is no substitute for a distributed SCM (since it must in turn always be connected to the main server), and it does not automatically make the user experience identical to having the main repository on the local LAN, but it is a welcome add-on.

Lately people noticed our Perforce proxy being rather slow. Admittedly it’s an old machine, with a measly (by today’s standards) 512MB of RAM. So what was the verdict? Let’s buy a new one! YEAH! And let’s make sure that it can expand to 32GB; the entry-level model that grows to “just” 8GB is probably not enough.

I am frequently impressed by how often people end up confined to their micro-world and fail to see the big picture. Good product QA engineers who could be great if they got a better understanding of the underlying O/S. Same for development engineers. Customer support engineers who have an excellent grasp of the customer and provide prompt responses for documented issues, but with poor troubleshooting skills, resorting to product development engineers for the slightest of problems (this is not an excuse for poor documentation, btw). And the list goes on.

For what it’s worth, per the Perforce technical notes 8GB would probably be a little bit too much for the new server; 512MB would be more than enough. With a careful setup (disabling unnecessary daemons of the underlying O/S) even 256MB could suffice (and yes, this means that we may not need to buy a new server).