The “German style” upgrade

While the trigger for this post may have been a series of unfortunate events and the ugly flame bait that followed in my inbox some time last week, the inspiration is a certain German customer of Bytemobile.

So what exactly is the “German style” of upgrading? Here are a few rough notes based on more than 5 on-site visits I’ve had to the aforementioned customer to support the upgrade of Bytemobile’s products.

  1. Test environment: This isn’t an environment with a bunch of (virtual) machines where new features are experimented with. With the exception of scale, it mirrors every single detail of the live environment. Nothing goes live before being installed in the test environment, with the installation documented to the finest and most ridiculous detail you can think of (e.g. UTP cable colors) and rigorously tested in both a positive and a negative manner (yes, this includes stuff like pulling PSUs, including redundant ones, to test failover). And once it goes live, both the production and the “failover” release remain installed in the test environment. There is no such thing as a “simple configuration change”.
  2. Night shifts: No upgrade takes place during business hours. A bunch of people from both the customer and the vendors involved actually stay up all night. The upgrade process is kicked off after midnight. It happens gradually, with a small but measurable number of users being transitioned to the new environment. All elements and units related to the new system are closely monitored for any abnormalities or alarms. A full barrage of end-to-end tests is performed to spot anything out of the ordinary.
  3. Rollback: Rollback is always a possibility accounted for in the project schedule. In fact, the rollback window is scheduled before the sun rises and traffic volumes start increasing (typically around 6 in the morning). In the event that something goes wrong, the fewer people notice, the better.
  4. Day shift: Regardless of the outcome, the people who stayed up all night get to enjoy lots of sleep, handing over to the day shift either physically or over a conference call if remote personnel are involved. Traffic volumes are now starting to build up, hence the day shift has its hands quite full: they continue to run tests and actively monitor the system to make sure that it handles the load gracefully.
  5. Failure is an option: When complex systems and multiple vendors are involved, failure is not only an option but part of the business. Everyone works hard to avoid it, but it may and will occasionally happen. When it does, it’s not dealt with hastily. People make sure that they have ample time to understand what went wrong and why, as well as why they failed to catch it in their rigorous prior testing and planning. Unless the issue was a really minor one and easily fixed (e.g. the network administrator neglecting to set an alarm clock; not that this has ever happened), the next activation is planned for at least a week into the future, more if the root cause is hard to address or other upgrades take place at that time.
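For the code-minded, the night-shift decision described in notes 2 and 3 can be sketched roughly as follows. This is only an illustration, not the customer's actual procedure: the `Checkpoint` fields, the rollout shares in `ROLLOUT_STEPS`, the zero-tolerance health rule and the exact 06:00 deadline are all assumptions made up for the example.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class Checkpoint:
    """Snapshot taken during the night shift (all fields are hypothetical)."""
    now: time               # wall-clock time of the check
    share_migrated: float   # fraction of users on the new release, 0.0 to 1.0
    active_alarms: int      # alarms raised by the monitored elements
    failed_e2e_tests: int   # end-to-end tests that did not pass


# Assumed values: the post only says traffic picks up "around 6".
ROLLBACK_DEADLINE = time(6, 0)
# Assumed rollout shares; the post only says "small but measurable".
ROLLOUT_STEPS = [0.01, 0.05, 0.25, 1.0]


def next_action(cp: Checkpoint) -> str:
    """Return 'rollback', 'hold' or 'advance' for the given checkpoint."""
    healthy = cp.active_alarms == 0 and cp.failed_e2e_tests == 0
    if not healthy:
        # Any alarm or failed test triggers the rollback plan.
        return "rollback"
    if cp.now >= ROLLBACK_DEADLINE:
        # Past the window: stop widening the rollout, just keep monitoring.
        return "hold"
    if cp.share_migrated >= ROLLOUT_STEPS[-1]:
        # Everyone is already on the new release.
        return "hold"
    return "advance"


def next_share(current: float) -> float:
    """Return the next larger rollout step, or the current share if done."""
    for step in ROLLOUT_STEPS:
        if step > current:
            return step
    return current
```

The sketch only widens the rollout while everything is quiet and the rollback window is still open, which is the spirit of the notes above; a real procedure would of course involve people, not a single function.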

Dedicated to the unnamed engineers who feel that flipping a switch and working long hours to clean up the mess caused by poor in-advance testing and planning has any “German” qualities.



3 Responses to “The “German style” upgrade”

  1. Tweets that mention The “German style” upgrade « ~mperedim/weblog -- Says:

    […] This post was mentioned on Twitter by Yiorgos Adamopoulos, mperedim. mperedim said: new post: the German upgrade style – […]

  2. Michael Iatrou Says:

    This is the typical case for any decent telco and broadcaster I have come across. It’s an enormous pleasure working with these people, just because dealing with “the other guys” is superfluously painful!

    P.S. The “German” part is not that relevant to my experience: it could be as well “American”, “Australian”, “British”, “Danish”, etc

    • mperedim Says:

      This is the typical case for any decent telco and broadcaster … it could be “American”, “Australian” etc

      Correct. Unlike the blog post, which is inspired by personal experience, the title is inspired by the flame bait in my inbox 🙂
