2005-11-17

Unconventional firmware update on a p5-520

Today I connected a p5-520 to one of my HMCs, in order to partition it and manage it remotely afterwards. Until today the machine had been running in stand-alone (non-partitioned) mode. The plan was to "encapsulate" the existing OS inside an LPAR, assigning to it the resources it was using and removing those it wasn't. A straight RJ45 twisted-pair cable had been run beforehand to connect the machine to the private HMC DHCP LAN.

After I connected the cable, the HMC discovered the new managed system automatically after about 5 minutes. So far so good. Note here that when a machine is transitioned like this from stand-alone to partitioned mode, you end up with a single LPAR designated as the service partition for the managed system. The name of the LPAR matches the system serial number, and it has only one profile, named simply "default". I decided to delete this LPAR and start from scratch, so I had to go into the managed system's properties and set the service partition to "unassigned" before I could delete it.

Now I create my new LPAR with its default (and only) profile. That's when the strangeness started. After creation, I go back into the profile to verify the resources, and discover that all the fields are greyed out (inactive), and no modifications are possible!! I can't add or remove any resources, nor change the name of the LPAR or profile! I know this is not right because I have a dozen other LPARs on the same HMC (but in different managed systems) without this kind of problem.

My first guess is that this is a firmware problem, so I check the machine's firmware level and find that it is at level SF223, the latest being SF235 at this moment. Between them there are several releases with severity HIPER, as shown here. That pretty much decided it for me, so I downloaded the latest SF235 level to another machine accessible via FTP. According to the docs, system firmware can be updated either from the HMC or from an AIX partition (which I presume must be the service partition; I didn't read that part in detail). I decided to update from the HMC, and maybe try the other method on another occasion.

Before starting the firmware update, I deleted the partition and profile that I had created, powered the machine down and brought it back to the "partition standby" state.

The update involved several phases:

1) fetching the firmware package (one RPM package and one XML file) from the other machine via FTP
2) installing the updates
3) powering down the system
4) apparently rebooting the service processor (at one point the operator panel value showed "Firmware not ready")
5) powering on the system into the "standby" state

During phase 2 I got a huge scare when looking in parallel at the firmware description (see above link), and finding this:

Before attempting to load this system firmware please ensure that your HMC software has been upgraded to Version 5, Release 1.0.

Shit!! My HMC is at level 4.5.0! I could see myself calling IBM, saying that I attempted a firmware update on an outdated HMC, getting brushed off for not reading the procedure carefully enough, etc, while users were already calling asking when the machine is coming back up. I would be humiliated twice on the same day (the first time for a failed RAM upgrade on the same machine, but that's another story).

Meanwhile, phase 2 is taking forever, and cancelling or backing out is impossible (except by killing the window and/or rebooting the HMC, but I didn't have the guts). So I just waited and prayed. Luckily for me, I had asked the users for a whole day's downtime beforehand, so I told them not to expect the machine today, and was ready to go to the site (the machine is 1.5 hours away by train and bus), look at the LEDs and call IBM from there (I hate debugging this kind of problem without physical access to the machine). Then I saw the process going into phase 3 and my hopes were restored. The whole thing took in excess of 30 minutes, and then - success!! WHEW. Just in case, I rebooted the HMC before creating the LPAR again. Then I recreated the LPAR and profile, went back into the profile, and no more greyed fields! Problem resolved. I boot the system, and everything is fine and dandy. I haven't felt this kind of relief in a long time.

So, in spite of IBM's recommendation (or should I say "requirement"), installing firmware SF235 on a p5-520 via an HMC at level 4.5.0 works. Don't know if it will work for other models, but I don't think I'll take my chances :-)
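
For next time, the scare could have been avoided with a trivial pre-flight check that compares the HMC level against the prerequisite stated in the firmware description before anything is installed. Below is a minimal sketch in Python. The function names are made up for illustration, the levels are typed in by hand rather than queried from the HMC, and writing "Version 5, Release 1.0" as the dotted string "5.1.0" is my own assumption about how to express that requirement.

```python
def parse_level(text):
    """Turn a dotted level string such as '4.5.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def hmc_meets_prerequisite(hmc_level, required_level):
    """Return True if the HMC level is at or above the required level.

    Both arguments are dotted strings entered by hand (hypothetical helper,
    nothing here talks to a real HMC).
    """
    have = parse_level(hmc_level)
    need = parse_level(required_level)
    # Pad the shorter tuple with zeros so '5.1' and '5.1.0' compare as equal.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


if __name__ == "__main__":
    # Levels from this post: HMC at 4.5.0, SF235 asks for Version 5, Release 1.0
    # (written here as "5.1.0", which is an assumption about the notation).
    if hmc_meets_prerequisite("4.5.0", "5.1.0"):
        print("HMC level OK, go ahead")
    else:
        print("HMC level too low, upgrade the HMC first")
```

Comparing tuples of integers rather than raw strings matters here: as strings, "4.10.0" would sort below "4.5.0", which is exactly the kind of mistake you don't want to make right before a firmware update.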
