The new UPS is a PowerWare 9390-80, a 480V 80 kVA UPS that fills two 6-foot-tall racks and weighs in at something near 4000 lbs -- we needed to add supports under the raised floor to handle the increased weight load. The UPS and its wall-mounted bypass switch cabinet arrived a few weeks ago, were put into the computer room, and were wired together in preparation for the big day.
That day was Tuesday, June 10, when we had scheduled a full computer room downtime from 6 AM until 2 PM. We figured that this should be plenty of time to unwire the old 208V UPS, re-wire our Liebert power distribution center back to factory specs (we had to change things around in order to accommodate a 208V UPS in a room with a 480V power feed), and get the new UPS and the computers and network gear up and running. Well, as they say, "Two out of three ain't bad."
We had everything in the room shut down and powered off somewhere between 6:00 and 6:30, allowing the electrician to get started re-wiring at 7 AM. The PowerWare technician arrived around 9 and started his prep and check work on the new UPS. He needed to connect all of the batteries together, test voltages, check over all of the pre-wiring work, and do a bunch of other stuff required before the unit could be turned on.
For most of the morning, everything seemed to be going along fine. The electrician finished ripping out the old UPS wiring hacks and got the new UPS and bypass switch wired in the way they should be. The UPS technician did his initial power-on and started testing the system. Some time was lost when the electrician had trouble getting lugs installed on a breaker, but that was only about a 30 minute diversion. Then the first problem hit...
There are three panel boards in the Liebert system for power distribution, each of which has a 225A main breaker. When we went to put some test load on the UPS, one of those main breakers refused to stay on. It didn't trip, the handle just slowly moved from "on" to "off" on its own. It never made that satisfying "click" when you turned it on. Since about half of our load was on that panel board, this was going to be a problem.
The electrician kept working at it, and eventually the breaker clicked into the on position and stayed there. We're planning on having it replaced the next time we take the room down, but we decided to live with it for now. If the breaker goes, we'll just be replacing it a bit sooner than we planned.
At some point, the PowerWare technician decided that he didn't like the path that some of the signal wires took between the two UPS cabinets and decided they should be re-routed. During that re-wiring process, he discovered that he could no longer get the battery breaker to turn on, rendering the UPS essentially useless. He puzzled over this for quite a while (hours, in fact) and decided that we needed a replacement circuit breaker for the battery cabinet. Since this was a new UPS start-up and not a failure situation in the eyes of the company, the earliest we could get a replacement breaker was Wednesday morning at 8:30 AM. Even getting this required quite a bit of the technician's time on the phone.
We had slightly underestimated how long the electrician would need to complete his tasks, but if that and the 225A breaker had been the only issues, we would have missed our downtime window by only about an hour. However, the problems with the new UPS stretched our downtime until 6 PM, four additional hours (for 12 hours of downtime). At this point, we had to decide whether or not to bring up the computer room without a UPS. As there were thunderstorms predicted overnight, we decided to leave everything off. Considering that the lights blinked out later that night, I believe we made the correct decision.
Fast forward to Wednesday at 8:30 AM (26.5 hours of downtime). Two replacement battery cabinet circuit breakers were delivered by FedEx. The PowerWare technician arrived around 9:15 and got to work. He was not able to find the hex wrench he needed to remove the wires from the breaker, so one of us went to track one down. By the time we found a wrench, the technician had found his own and had started on the breaker. The replacement breaker was installed, and when the technician tried to turn it on, it didn't work either. By 11 AM (29 hours of downtime), the cause of the problem was still unknown. By 1 PM (31 hours of downtime), after examining his schematics (which didn't appear to exactly match the UPS we had) and making a number of phone calls, the technician finally figured out that the breaker required 48V available in order to be switched on and that the fuse protecting that 48V circuit had blown. This is an $11 part that had taken, by this time, over 8 hours of effort to identify as the source of the problem. The total cost of this UPS is around $80,000.
Of course, the technician didn't have one of these fuses in his van. Two of my co-workers called around to a number of electrical houses and eventually found some. By around 3 PM (33 hours of downtime), the technician had replaced the fuse and was running his final checks. We decided to test the EPO (Emergency Power Off) switch before we put any load on the system. It did absolutely nothing. The wire from the EPO switches had never been connected to the UPS control input. Once again the UPS was opened up and the wire connected. We tested again, and this time the UPS went into bypass but didn't shut down. This is not exactly what we would have expected a "power off" button to do. The technician did some studying of the documentation, changed some settings, and we tried again, with the same result. It turns out that we need to hold the EPO switch for at least 3 seconds for the UPS to actually shut down. None of us like this, but I guess we'll have to live with it for now.
We got our operational training so that we could properly take the UPS in and out of maintenance bypass, and we started to bring up the room. It was now around 5 PM (35 hours of downtime). With everything down for so long, we decided that caution was the order of the day when we brought the world back up. What normally takes 45 minutes to an hour took 90 minutes. By 6:30 PM (36.5 hours after our adventure started), everything was back up and we were starting to receive all of the email that had been queued up for us around the world. All of us in the Computer Science Department are hoping that we never have to do this again.
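For anyone who wants to double-check the running downtime totals quoted above, here is a minimal sketch that recomputes them from the clock times in this post. The year 2008 is an assumption (the post only says Tuesday, June 10, and 2008 is a year in which that date fell on a Tuesday), and the names `START`, `MILESTONES`, `hours_down`, and `timeline` are made up for this illustration.

```python
from datetime import datetime

# Assumption: the year is 2008; the post gives only "Tuesday, June 10".
# Only the elapsed time matters here, so any year with a matching
# Tuesday/Wednesday pair would give the same totals.
START = datetime(2008, 6, 10, 6, 0)  # Tuesday 6 AM, start of the downtime window

# Clock times taken from the narrative; labels are paraphrased.
MILESTONES = [
    ("Decided to leave the room off overnight", datetime(2008, 6, 10, 18, 0)),
    ("Replacement breakers delivered", datetime(2008, 6, 11, 8, 30)),
    ("Cause still unknown", datetime(2008, 6, 11, 11, 0)),
    ("Blown 48V fuse identified", datetime(2008, 6, 11, 13, 0)),
    ("Fuse replaced, final checks", datetime(2008, 6, 11, 15, 0)),
    ("Started bringing up the room", datetime(2008, 6, 11, 17, 0)),
    ("Everything back up", datetime(2008, 6, 11, 18, 30)),
]


def hours_down(when, start=START):
    """Return the hours elapsed between the start of the downtime and `when`."""
    return (when - start).total_seconds() / 3600


def timeline():
    """Return (label, hours of downtime) pairs for each milestone."""
    return [(label, hours_down(when)) for label, when in MILESTONES]


if __name__ == "__main__":
    for label, hours in timeline():
        print(f"{hours:5.1f} h  {label}")
```

The scheduled window was 6 AM to 2 PM, or 8 hours, so the final figure works out to 28.5 hours past the planned end.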