I got back home from Toronto somewhere around 02:30.
Warning: Never do this alone!
Well, good thing I didn’t… Some of that gear can be heavy, so a second set of hands is useful. Loading it into a personal vehicle is also easier with two people - good thing I don’t have a tiny compact car. I can just see this happening in a Smart car or an Innocenti (yes, shudder).
The drive can be rather monotonous as it takes a good 4.5 hours to do the run to Mississauga. Unless you really like solitude and are not prone to highway hypnosis, make sure you have someone to talk to. The drive goes a lot faster that way.
After arrival, you have the joy of installing shelves in a rack (yes, these systems have no rails). And, of course, the shelves chew up rack space on top of what the systems themselves need. In this case, there was sufficient spare space due to some fortuitous prior planning. Don’t forget to have power available either. Oh yeah, and network connectivity. Your plans will end rather abruptly if you didn’t plan for your network connections. In actual fact, the hardware install was rather smooth: we went from wheeling the gear into the data center to powered up in less than 30 minutes.
The real problems start after the fun part is completed…
This particular system hosts a quite old legacy application that originally took up a rather significant chunk of data center real estate a few years ago. Back in 2006, I said that it could be squeezed into 4U of rack space. There was disbelief, but a week’s work proved that all of the components could run on a Sun T2000 with an attached disk array. To make it plain what we are using here, the T2000 is configured with an 8-core CoolThreads (UltraSPARC T1) processor running four threads per core, so 32 hardware threads, and has the maximum memory available at the time it was purchased, a rather respectable 32 GB. Attached is a StorEdge 3510 unit with the FC-AL RAID controller (twelve 72 GB drives: with one hot spare and one drive’s worth of parity, that leaves 10 x 72 GB, or roughly 720 GB, of usable RAID 5 storage).
Prior to the move, the system had to be backed up in a manner that would allow for a fast recovery if necessary. The solution was a 1 TB HDD (significantly cheaper than the array) to hold the data for an emergency. A test that I ran last year showed that the backup could be expected to take 16 hours across a gigabit network (did I mention the array was slow?). There were a few other concerns as well, but the final backup was created, was usable, and was carefully packed for shipment if necessary.
I think that a restore would take approximately 24 hours, assuming the writes take somewhat less than a 50% performance hit compared to the backup. At least that step was not necessary.
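Back-of-the-envelope, assuming close to the full 720 GB had to move, the numbers work out roughly like this:

    720 GB in 16 h  =  45 GB/h  =  about 12.5 MB/s    (nowhere near gigabit wire speed, so the array is the bottleneck)
    at a full 50% write penalty (about 6 MB/s) a restore would run about 32 h; ~24 h assumes the penalty is closer to a third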
Right… system power on… We powered it up, logged onto the console, verified that everything was running in each zone, that the database was operating as expected and that all processes appeared to be running. It’s rather early on Saturday morning at this point. Did I mention that I left the office mid-afternoon on Friday and drove straight to the data center? Anyway, we did take a break for a meal after the hardware install, which went on longer than originally planned. We met up with a co-worker based out of Toronto; since we normally only ever talk on the phone, the social side went on for a while. After returning from the meal, we did the sanity test previously described, made sure that we had the network addresses properly identified and the external ranges for the customer allocated. Time for sleep…
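For what it’s worth, that sanity test amounts to something along these lines (the zone name here is a placeholder):

    # From the global zone: are all the zones actually running?
    zoneadm list -cv

    # Inside each zone: any SMF services in trouble?
    zlogin appzone svcs -xv

    # Spot-check that the database and application processes are present
    zlogin appzone ps -ef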
6 hours later, it’s time to change the addresses, plug in the network, set up the firewall rules, change DNS and get out of Dodge… Right… Did I mention this is a LEGACY application, originally on Solaris 8 and now running on Solaris 10 in zones?
First item - Reduction of network interfaces. This box was multi-homed in the past to talk to a management network, a storage network, and a production network. Now it lives in a single VLAN (there’s a quick inventory sketch after this list).
Second item - New IP addresses. Can’t use the old ones; we have a different scheme in this data center.
Third item - We don’t use Solaris daily, so it’s always a challenge to remember the quirks, especially since we only started using Solaris 10 for this specific system. It is sufficiently different from older versions that you constantly get caught doing things the old way, only to find they no longer work as expected.
Fourth item - Zones are not a full OS. Don’t forget that.
Fifth item - Nothing is ever as easy as it looks :)
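Before touching anything, it’s worth taking stock of what the box currently thinks its network looks like; something along these lines (output obviously varies):

    # Global zone: every plumbed interface, including each zone's logical aliases
    ifconfig -a

    # And the routing table - going from three networks down to one VLAN changes this too
    netstat -rn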
So… off we go. Make the appropriate host file changes, restart networking, and most things look good… wait, the adapter aliases still have the old addresses… right, the zone configuration files need to be edited and the changes made there. Sun provides tools to edit these; use them unless you are feeling either confident or lucky. Don’t forget to run the sanity checker before you restart the newly changed zones.
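As a rough sketch of what that edit looks like with zonecfg, assuming a zone called appzone and made-up addresses (the exact selector depends on how the net resource was defined):

    # See what the zone currently has configured
    zonecfg -z appzone info net

    # Change the address with zonecfg rather than hand-editing the XML
    zonecfg -z appzone
    zonecfg:appzone> select net address=10.10.1.5
    zonecfg:appzone:net> set address=192.168.20.5
    zonecfg:appzone:net> end
    zonecfg:appzone> verify
    zonecfg:appzone> commit
    zonecfg:appzone> exit

    # Sanity-check against the running system, then restart the zone
    zoneadm -z appzone verify
    zoneadm -z appzone reboot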
OK, we have most of what we want, but some things are still off. I can ping everything except “my own interface address” by name. No matter what we do, the names appear to resolve to the old addresses… messing with nsswitch.conf, resolv.conf and the host files does not resolve it. Eventually, my partner in crime (remember I said not to do this alone?) finds an additional file in /etc/inet - ipnodes - that has the old addresses in it. Making the changes there solves a few more items.
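For anyone else caught by this: on Solaris 10 the resolver consults /etc/inet/ipnodes as well as /etc/hosts for local lookups, so both files have to agree. Roughly (placeholder names and addresses):

    # /etc/inet/ipnodes still carried entries like:
    #   10.10.1.5     appzone
    # which had to be updated to match the new hosts file:
    #   192.168.20.5  appzone

    # Then confirm what the resolver actually hands back
    getent hosts appzone
    getent ipnodes appzone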
OK, accessing each zone’s interface from the same zone now works. Some of the legacy items need this.
The management tools for the software show things as down. Well, look at that, embedded addresses. Whatever happened to using a name?
More fixing…
Things are still not right. It’s now 18:00 and we haven’t eaten since 09:00. I’m of the opinion we have to be close. Maybe a break for food and some time away will bring fresh perspective. We are now at the point of deciding whether we need a hotel room for another night and whether we need to consider backing out and returning to Ottawa. This is a path I do not want to go down.
1 hour for food and talk. What a difference the break makes. We need to search the file system for config files. The tools show that old addresses are still being used somewhere.
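The hunt itself is nothing fancy; something along these lines, with placeholder paths and the old subnet as the search string (in practice we narrowed it to the application and web server trees rather than the whole disk):

    # Which files still mention the old addresses?
    find /opt /etc -type f 2>/dev/null | xargs grep -l '10\.10\.1\.' 2>/dev/null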
The ancient web server platform has a registry file with (you guessed it) embedded addresses. This is Unix, not Windows. A registry file?
More fixing…
This time we look for a registry file for the app server - hey, there is one with… embedded addresses…
More fixing…
We have managed to get one instance up and running now. The other is being stubborn. Eventually I figure it out. We need another host file entry so that it doesn’t try searching DNS, as this is all behind a firewall that doesn’t permit returning through the same interface. External addresses == very bad :)
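Roughly what that came down to (hypothetical names and addresses): with the entry in the local files and files listed ahead of dns in nsswitch.conf, the lookup never leaves the box.

    # /etc/hosts (and /etc/inet/ipnodes) in the stubborn zone:
    #   192.168.20.6   appzone2

    # /etc/nsswitch.conf - make sure local files win before DNS is consulted:
    #   hosts:    files dns
    #   ipnodes:  files dns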
Shortly after this, we had success.
Anyway, around 21:00, it was time to pack up and drive back. The actual road trip started closer to 22:00, and we were back in Ottawa by 02:30.
I guess the short answer is always have someone with you to bounce ideas off of :D