At the Hadrian Hotel

Wednesday, June 11, 2008

How Not to Upgrade Your Computer Room UPS

The Computer Science Department at Princeton University needed to replace its computer room UPS. The old unit was a PowerWare 9330-40, a 208V 40 kVA UPS, installed in July 2005. When this box was first installed, the load was around 60%. Over time it had crept up to over 90%, which convinced us that an upgrade was in order.

The new UPS is a PowerWare 9390-80, a 480V 80 kVA UPS that fills two 6-foot-tall racks and weighs something near 4000 lbs; we had to add extra supports under the raised floor to handle the increased weight. The UPS and its wall-mounted bypass switch cabinet arrived a few weeks ago, were moved into the computer room, and were wired together in preparation for the big day.
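
For a rough sense of why doubling the rating buys so much headroom, here is a small back-of-the-envelope sketch. It assumes the quoted load percentages are fractions of each unit's rated kVA, ignores power factor, and assumes the load is unchanged by the swap; the constant and function names are purely illustrative.

```python
# Back-of-the-envelope UPS headroom arithmetic (illustrative names only).
# Assumes "load %" means the fraction of the unit's rated kVA in use,
# ignores power factor, and assumes the load is the same after the swap.

OLD_RATING_KVA = 40.0  # PowerWare 9330-40
NEW_RATING_KVA = 80.0  # PowerWare 9390-80


def load_kva(rating_kva: float, load_fraction: float) -> float:
    """Convert a percentage-of-rating reading into kVA."""
    return rating_kva * load_fraction


def load_fraction_on(rating_kva: float, kva: float) -> float:
    """Fraction of a UPS's rating that a given kVA load would use."""
    return kva / rating_kva


initial = load_kva(OLD_RATING_KVA, 0.60)  # load when the old unit went in
current = load_kva(OLD_RATING_KVA, 0.90)  # load just before the upgrade
after = load_fraction_on(NEW_RATING_KVA, current)  # same load on the new unit

print(f"load at install: {initial:.0f} kVA")
print(f"load before upgrade: {current:.0f} kVA")
print(f"same load on the 80 kVA unit: {after:.0%}")
```

Under those assumptions, the roughly 36 kVA that was pushing the old unit past 90% would sit at about 45% of the new one.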

That day was Tuesday, June 10, when we had scheduled a full computer room downtime from 6 AM until 2 PM. We figured that this should be plenty of time to unwire the old 208V UPS, re-wire our Liebert power distribution center back to factory specs (we had to change things around in order to accommodate a 208V UPS in a room with a 480V power feed), and get the new UPS and the computers and network gear up and running. Well, as they say, "Two out of three ain't bad."

We had everything in the room shut down and powered off somewhere between 6:00 and 6:30, allowing the electrician to get started re-wiring at 7 AM. The PowerWare technician arrived around 9 and started his prep and check work on the new UPS. He needed to connect all of the batteries together and test voltages, check over all of the pre-wiring work, as well as a bunch of other stuff required before the unit could be turned on.

Most of the morning, everything seemed to be going along fine. The electrician finished ripping out the old UPS wiring hacks and got the new UPS and bypass switch wired in the way they should be. The UPS technician did his initial power-on and started testing the system. Some time was lost when the electrician had trouble getting lugs installed on a breaker, but that was only about a 30-minute diversion. Then the first problem hit...

There are three panel boards in the Liebert system for power distribution, each of which has a 225A main breaker. When we went to put some test load on the UPS, one of those main breakers refused to stay on. It didn't trip; the handle just slowly moved from "on" to "off" on its own. It never made that satisfying "click" when you turned it on. Since about half of our load was on that panel board, this was going to be a problem.

The electrician kept working at it, and eventually the breaker clicked into the on position and stayed there. We're planning on having it replaced the next time we take the room down, but we decided to live with it for now. If the breaker goes, we'll just be replacing it a bit sooner than we planned.

At some point, the PowerWare technician decided that he didn't like the path some of the signal wires took between the two UPS cabinets and re-routed them. During that re-wiring process, he discovered that he could no longer get the battery breaker to turn on, rendering the UPS essentially useless. He puzzled over this for quite a while (hours, in fact) and decided that we needed a replacement circuit breaker for the battery cabinet. Since this was a new UPS start-up and not a failure situation in the eyes of the company, the earliest we could get a replacement breaker was 8:30 AM Wednesday. Even getting this required quite a bit of the technician's time on the phone.

We had slightly underestimated how long the electrician would need to complete his tasks, but if that and the 225A breaker had been the only issues, we would have missed our downtime window by only about an hour. However, the problems with the new UPS stretched our downtime until 6 PM, four hours past the end of our window (12 hours of downtime in all). At this point, we had to decide whether or not to bring up the computer room without a UPS. As there were thunderstorms predicted overnight, we decided to leave everything off. Considering that the lights blinked out later that night, I believe we made the correct decision.

Fast forward to Wednesday at 8:30 AM (26.5 hours of downtime). Two replacement battery cabinet circuit breakers were delivered by FedEx. The PowerWare technician arrived around 9:15 and got to work. He was not able to find the hex wrench he needed to remove the wires from the breaker, so one of us went to track one down. By the time we found a wrench, the technician had found his own and was back at it. The replacement breaker was installed, and when the technician tried to turn it on, it didn't work either. By 11 AM (29 hours of downtime), the cause of the problem was still unknown. By 1 PM (31 hours of downtime), after examining his schematics (which didn't appear to exactly match the UPS we had) and making a number of phone calls, the technician finally figured out that the breaker needed 48V available in order to be switched on, and that the fuse protecting that 48V circuit had blown. This is an $11 part that had taken, by this time, over 8 hours of effort to identify as the source of the problem. The total cost of this UPS is around $80,000.

Of course, the technician didn't have one of these fuses in his van. Two of my co-workers called around to a number of electrical supply houses and eventually found some. By around 3 PM (33 hours of downtime), the technician had replaced the fuse and was running his final checks. We decided to test the EPO (Emergency Power Off) switch before we put any load on the system. It did absolutely nothing. The wire from the EPO switches had never been connected to the UPS control input. Once again the UPS was opened up and the wire connected. We tested again, and the UPS only went into bypass; it didn't shut down. This is not exactly what we would expect a "power off" button to do. The technician studied the documentation, changed some settings, and we tried again, with the same result. It turns out that the EPO switch has to be held for at least 3 seconds for the UPS to actually shut down. None of us like this, but I guess we'll have to live with it for now.

We got our operational training so that we could properly take the UPS in and out of maintenance bypass, and we started to bring up the room. It was now around 5 PM (35 hours of downtime). With everything down for so long, we decided that caution was the order of the day when we brought the world back up. What normally takes 45 minutes to an hour took 90 minutes. By 6:30 PM (36.5 hours after our adventure started), everything was back up and we were starting to receive all of the email that had been queued up for us around the world. All of us in the Computer Science Department are hoping that we never have to do this again.

Monday, June 09, 2008
Originally uploaded by Chris Tengi
Today, Apple announced the re-branding of .Mac. At least, I think that was their intent...

Saturday, June 07, 2008

Back in the Saddle

Yesterday, I rode my bicycle to work for the first time in about 15 years. The reason for the long hiatus was mostly inertia. It was always: "I really should try riding to work again." Now, the last time I did this, I had a front tire blow-out on the way home and went flying over the handlebars, making a perfect 3-point landing on my knee, hand and head. Fortunately I was wearing a helmet and gloves, so my knee was the only part of me that sustained any damage. I'm firmly convinced that if I hadn't been wearing a helmet, my injuries would have been far worse than a scraped up knee. This little incident is one of the reasons that I'm a strong advocate for bicycle helmets.

But back to the current ride...

Like many Americans, I don't get nearly enough exercise, so I thought that riding my bike to work a couple of times a week might be a good thing to do, now that the weather has gotten nice. At 8.6 miles each way, it's not a bad little ride. It took me about 45 minutes each way in light traffic. Of course, it wasn't the traffic that made the ride take that long; it was just me.

In my car it takes about 20 minutes to go from home to work, so taking only a little more than twice that by bike was actually pretty good. And the ride itself was very pleasant, despite the fact that most of it was along 8- to 12-inch shoulders. Both in the morning and the evening, drivers gave me plenty of room as they passed, and it was 68°F for my morning ride in and 86°F for the afternoon ride home.

The forecast for Monday is 97°F, so I don't think riding then will be such a good idea, but Wednesday and Friday look good, so maybe I'll get in 2 rides next week. With any luck, I'll be back in the saddle and the only thing I'll have to worry about is getting saddle sore.
