At the Hadrian Hotel

Saturday, December 06, 2008

Shosholoza

My son Eric neglected to tell me that he had a duet at last night's winter concert (his last one of high school). If I had known, I might have brought something better than my Canon PowerShot A710 to record this. Oh, Eric is the boy on the right. :-)

Wednesday, October 22, 2008

Cluster Node-Locking with Torque and Maui

These are mostly notes to myself so that I can figure out how to do this more quickly next time...

We needed to add some nodes to a Rocks 4.1 cluster where members of a particular lab were to have exclusive use of the nodes for a period of time. So, we had to find a way to allow these folks to submit jobs that would run only on the new nodes and to also prevent anybody else from running on the nodes. We chose a belt-and-suspenders approach using features of both Torque (PBS) and Maui.

Previously, we had a single "default" queue for all users of this cluster. We added a "vision" queue for the users of the new machines so that they would be able to explicitly request that their jobs run on the new hardware. This queue specifies ACLs for the node list as well as the users allowed to submit jobs to the queue. In addition, there is a "neednodes" resource specified that gives Maui a clue as to where any jobs in this queue can be run. Here are the commands we ran to set up the queue:

qmgr -c "create queue vision queue_type=execution"
qmgr -c "set queue vision resources_default.neednodes = vision"
qmgr -c "set queue vision acl_hosts=compute-0-22+compute-0-23+compute-0-24"
qmgr -c "set queue vision acl_host_enable = false"
qmgr -c "set queue vision acl_users=user1"
qmgr -c "set queue vision acl_users+=user2"
qmgr -c "set queue vision acl_users+=user3"
qmgr -c "set queue vision acl_user_enable=true"
qmgr -c "set queue vision enabled = True"
qmgr -c "set queue vision started = True"

The acl_host_enable = false setting causes Torque to treat the acl_hosts list as the nodes on which jobs should be queued, rather than as the nodes allowed to run the qsub command. Note that there does not appear to be a way to set multiple acl_users in a single command. While a "list queue" command will show the users as a comma-separated list, trying to set the ACL that way produces a syntax error, and the same happens if you use a plus sign the way we did for the hosts ACL.
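For example (using the same placeholder user names as above), a one-shot assignment along these lines is the sort of thing that drew the syntax error for us, which is why the incremental acl_users+= commands were used instead:

qmgr -c "set queue vision acl_users=user1,user2,user3"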

In addition to setting up the vision queue, changes were needed to the default queue and to the Torque nodes file which, in our case, was /opt/torque/server_priv/nodes but generically would be found at $TORQUE_HOME/server_priv/nodes. We added a "neednodes" resource to the default queue as we did for the vision queue:
qmgr -c "set queue default resources_default.neednodes = general"


For each of the 3 new machines, we appended the word "vision" to the line defining the node like so:
compute-0-22.local np=4 vision

For the rest of the nodes in the file, we added the word "general" like so:
compute-0-0.local np=4 general

After restarting the pbs_server and maui daemons, the end result was that anybody could submit jobs to the default queue and they would run on any node except the 3 nodes dedicated to the vision lab. Only specific users could submit jobs to the vision queue, and those jobs would run only on the 3 new machines. This is just what we were looking for. If we ever want to allow everybody to use the new nodes from the default queue, I believe that it should be as simple as appending the word "general" to the "vision" nodes in the server_priv/nodes file.
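For completeness, here is roughly what the two cases look like from a user's point of view (myjob.sh is just a stand-in for a real job script, and this assumes the server's default queue is still "default"): a vision lab member explicitly requests the new queue, while everybody else submits as before:

qsub -q vision myjob.sh
qsub myjob.sh

And if we do open the new nodes up later, each vision node's entry in the nodes file would presumably just gain the extra property, e.g.:

compute-0-22.local np=4 vision general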


Monday, September 22, 2008

Space Shuttles and Rainbows


A co-worker of mine pointed me to this picture that he saw on digg. That prompted me to go off and find the whole series, which I tagged on del.icio.us.


Wednesday, August 20, 2008

It's the Little Things

The other day, I discovered yet another little useful thing about the iPhone 3G. I don't know if the original iPhone did this, but the new one remembers volume settings in the iPod application. I normally use a cassette adapter to listen to podcasts when I drive into work. The first time I plugged in the iPhone earbuds/mic, I lowered the volume from the "full" I needed for the cassette adapter. A little later, I tried the built-in speaker and had to adjust the volume again.

The next time I went to listen in the car, I plugged in the adapter and the volume jumped all the way to full on its own. I unplugged the adapter and the volume dropped back to the value for the built-in speaker. I then tried the earbuds and the volume slider moved again. I know this may seem like a little thing, but it's kinda nice that I don't have to muck with the volume every time I want to listen a different way....


Drinking the Kool-Aid

It looks like the iPhone Kool-Aid has been flowing in my department, and it seems that everybody but my manager has imbibed. At first, it was just Steve who really wanted one. Then, I think he slipped something into my coffee one morning and I decided to give it a go as well. One-by-one, 4 more guys in the group decided to take the plunge (cue the big red pitcher saying "Oh yeah!"). Even the guy who really wanted the Blackberry Bold decided to go for it. So far, Shazam and SimplifyMedia are the hit applications....



Wednesday, June 11, 2008

How Not to Upgrade Your Computer Room UPS

The Computer Science Department at Princeton University needed to replace its computer room UPS. The old unit was a PowerWare 9330-40, a 208V 40 kVA UPS, installed in July 2005. When this box was first installed, the load was around 60%. Over time this has crept up to over 90%, causing us to believe that an upgrade was in order.

The new UPS is a PowerWare 9390-80, a 480V 80 kVA UPS that fills two 6 foot tall racks and weighs in at something near 4000 lbs -- we needed to add additional supports under the raised floor to handle the increased weight load. The UPS and its wall-mounted bypass switch cabinet arrived a few weeks ago, were put into the computer room, and were wired together in preparation for the big day.

That day was Tuesday, June 10, when we had scheduled a full computer room downtime from 6 AM until 2 PM. We figured that this should be plenty of time to unwire the old 208V UPS, re-wire our Liebert power distribution center back to factory specs (we had to change things around in order to accommodate a 208V UPS in a room with a 480V power feed), and get the new UPS and the computers and network gear up and running. Well as they say, "Two out of three ain't bad."

We had everything in the room shut down and powered off somewhere between 6:00 and 6:30, allowing the electrician to get started re-wiring at 7 AM. The PowerWare technician arrived around 9 and started his prep and check work on the new UPS. He needed to connect all of the batteries together and test voltages, check over all of the pre-wiring work, as well as a bunch of other stuff required before the unit could be turned on.

For most of the morning, everything seemed to be going along fine. The electrician finished ripping out the old UPS wiring hacks and got the new UPS and bypass switch wired in the way they should be. The UPS technician did his initial power-on and started testing the system. Some time was lost when the electrician had trouble getting lugs installed on a breaker, but that was only about a 30 minute diversion. Then the first problem hit...

There are three panel boards in the Liebert system for power distribution, each of which has a 225A main breaker. When we went to put some test load on the UPS, one of those main breakers refused to stay on. It didn't trip, the handle just slowly moved from "on" to "off" on its own. It never made that satisfying "click" when you turned it on. Since about half of our load was on that panel board, this was going to be a problem.

The electrician kept working at it, and eventually the breaker clicked into the on position and stayed there. We're planning on having it replaced the next time we take the room down, but we decided to live with it for now. If the breaker goes, we'll just be replacing it a bit sooner than we planned.

At some point, the PowerWare technician decided that he didn't like the path that some of the signal wires took between the two UPS cabinets and decided they should be re-routed. During that re-wiring process, he discovered that he could no longer get the battery breaker to turn on, rendering the UPS essentially useless. He puzzled over this for quite a while (hours, in fact) and decided that we needed a replacement circuit breaker for the battery cabinet. Since this was a new UPS start-up and not a failure situation in the eyes of the company, the earliest we could get a replacement breaker was Wednesday morning at 8:30 AM. Even getting this required quite a bit of the technician's time on the phone.

We had underestimated a little on how long the electrician would need to complete his tasks, but if that and the 225A breaker were the only issues, we would have only missed our downtime window by about an hour. However, the problems with the new UPS stretched our downtime until 6 PM, four additional hours (for 12 hours of downtime). At this point, we had to decide whether or not to bring up the computer room without a UPS. As there were thunderstorms predicted overnight, we decided to leave everything off. Considering that the lights blinked out later that night, I believe we made the correct decision.

Fast forward to Wednesday at 8:30 AM (26.5 hours of downtime). Two replacement battery cabinet circuit breakers were delivered by FedEx. The PowerWare technician arrived around 9:15 and got to work. He was not able to find the hex wrench he needed to remove the wires from the breaker, so one of us went to track one down. By the time we found a wrench, the technician had located his and gotten started. The replacement breaker was installed, and when the technician tried to turn it on, it didn't work either. By 11 AM (29 hours of downtime), the cause of the problem was still unknown. By 1 PM (31 hours of downtime), after examining his schematics (which didn't appear to exactly match the UPS we had) and making a number of phone calls, the technician finally figured out that the breaker required 48V available in order to be switched on and that the fuse protecting that 48V circuit had blown. This is an $11 part that had taken, by this time, over 8 hours of effort to identify as the source of the problem. The total cost of this UPS is around $80,000.

Of course, the technician didn't have one of these fuses in his van. Two of my co-workers called around to a number of electrical houses and eventually found some. By around 3 PM (33 hours of downtime), the technician had replaced the fuse and was running his final checks. We decided to test the EPO (Emergency Power Off) switch before we put any load on the system. It did absolutely nothing. The wire from the EPO switches had never been connected to the UPS control input. Once again the UPS was opened up and the wire connected. We tested again and the UPS only went into bypass, but didn't shut down. This is not exactly what we would have expected a "power off" button to do. The technician did some studying of the documentation, changed some settings, and we tried again, with the same result. It turns out that we need to hold the EPO switch for at least 3 seconds for the UPS to actually shut down. None of us like this, but I guess we'll have to live with it for now.

We got our operational training so that we could properly take the UPS in and out of maintenance bypass, and we started to bring up the room. It was now around 5 PM (35 hours of downtime). With everything down for so long, we decided that caution was the order of the day when we brought the world back up. What normally takes 45 minutes to an hour took 90 minutes. By 6:30 PM (36.5 hours after our adventure started), everything was back up and we were starting to receive all of the EMail that had been queued up for us around the world. All of us in the Computer Science Department are hoping that we never have to do this again.


Monday, June 09, 2008

me.com


[Photo: "me.com," originally uploaded by Chris Tengi]
Today, Apple announced me.com, the re-branding of .Mac. At least, I think that was their intent....


Saturday, June 07, 2008

Back in the Saddle

Yesterday, I rode my bicycle to work for the first time in about 15 years. The reason for the long hiatus was mostly inertia. It was always: "I really should try riding to work again." Now, the last time I did this, I had a front tire blow-out on the way home and went flying over the handlebars, making a perfect 3-point landing on my knee, hand and head. Fortunately I was wearing a helmet and gloves, so my knee was the only part of me that sustained any damage. I'm firmly convinced that if I hadn't been wearing a helmet, my injuries would have been far worse than a scraped up knee. This little incident is one of the reasons that I'm a strong advocate for bicycle helmets.

But back to the current ride...

Like many Americans, I don't get nearly enough exercise, so I thought that riding my bike to work a couple of times a week might be a good thing to do, now that the weather has gotten nice. At 8.6 miles each way, it's not a bad little ride. It took me about 45 minutes both ways in light traffic. Of course, it wasn't the traffic that made the ride take that long, it was just me.

In my car it takes about 20 minutes to go from home to work, so only taking a little more than twice that for a bike ride was actually pretty good. And the ride itself was very pleasant, despite the fact that most of it was along 8 to 12 inch shoulders. Both in the morning and the evening, drivers gave me plenty of room as they passed, and the temperature was 68F for my morning ride in and 86F for the afternoon ride home.

The forecast for Monday is 97F, so I don't think riding then will be such a good idea, but Wednesday and Friday look good, so maybe I'll get in 2 rides next week. With any luck, I'll be back in the saddle and the only thing I'll have to worry about is getting saddle sore.


Friday, April 04, 2008

Project 365 (+42)

Yesterday, I finally finished up my flickr Project 365 work. I really didn't plan on it taking exactly 42 days more than the 365, but you know that 42 is the answer to the ultimate question of Life, the Universe, and Everything, so nobody should be surprised. I actually took the last picture on Feb. 14 and uploaded the last batch on April 2, but I didn't finish sending the last ones to the Project 365 Group until yesterday.

For those of you that don't know, the idea behind Project 365 is to take a new picture (or 2, or 3, or 4, or....) every day, with the goal of improving your photographic skill and technique. Over the course of a year, I've taken literally thousands of pictures (gotta love digital cameras!). I've gotten some OK shots, some really nice ones, and some that never made it off of the camera. I got a new camera, broke the new camera, went back to the old camera, and then switched back again after getting the new one repaired. Some of the shots came easily, and some were desperation shots taken just before (or in some cases shortly after) midnight. During the last few months, I started really obsessing about getting a picture every day and my wife suggested, rightly so, that I should just quit. However, I really wanted to see this through, so I just kept at it.

So, did I get a new picture every day? No, but I did take one to be used on the days I missed. Did this project make me a better photographer? Maybe yes, maybe no. I did learn how to use some of the features of my cameras that allowed me to take pictures I would not have otherwise been able to take, so that's something gained. Would I do this again, or something like it, in the future? No, I have too many other things to do. That being said, I don't have any regrets about the project and wouldn't caution others against it. I would just suggest that they go in with their eyes open and that they not get obsessive about it. It's all just a bunch of bits anyway. ;-)



Tuesday, March 25, 2008

Eric's Whirlwind Spring Break College Tour


It's that time of year when parents of high school juniors all across the country venture forth to visit colleges and universities. We, as you might expect, are no exception. Follow our progress at http://cjtengi.jaiku.com

Thursday, March 06, 2008

I Love Planned Economy

Last night at ETech 08, Comrade Nikita Chrusov held a BOF session where he told us all about the leading-edge technology used by his country. He ended his presentation with a song....




Thursday, February 21, 2008

Disk Encryption May Not Protect Your Data

Ed Felten and his research group have found a fairly easy way to defeat disk encryption technologies used by MS-Windows, Mac OS X, and Linux. It turns out that encryption keys in DRAM can be recovered fairly easily if you have physical access to a laptop either powered-on or in sleep mode.


Wednesday, February 20, 2008

LoC Photo Collection on Flickr

I just came across this post in the flickr blog talking about "The Commons." It looks like the Library of Congress has made a chunk of their photo collection available on flickr. This is really cool. From what I've heard, the LoC has quite a collection. Not only have they made these photos available, they're encouraging flickr users to tag them, making everything searchable. And if that isn't enough, the LoC includes a "Persistent URL" in the photo description that takes you to a page where you can download a higher-res JPEG or their archival TIFF file. The 137 MB TIFF I downloaded was 1800x1800 DPI, which made zooming and panning quite a bit of fun.

Sunday, February 17, 2008

Bonjour Printing to a Linux CUPS Server

Bonjour (aka ZeroConf - mostly) printing looks like a great way to print from a Mac to a Bonjour-capable printer on the same IP subnet. However, on a network where the printers are not on the same subnet as the hosts that want to use them, something else needs to be done.

In our case, we use a Linux box running CUPS with the cups-lpd listener. All of the printers are on their own private subnet that can only be accessed from within the Computer Science department. All printing is intended to go through the CUPS server, which very nicely takes care of format conversions as needed. This architecture also gives us the ability to do print job accounting and to move jobs to other printers should one fail with jobs in the queue.

The problem is that anybody with a Mac or PC who wants to print something needs to add a network printer definition to their machine. Thanks to the Bonjour capabilities of the Mac, we have been able to use static DNS entries to define and advertise all of our print queues to any Mac user. Here's what we did....

For our printer advertisements we are using statically defined entries in our DNS files to enable DNS Service Discovery (DNS-SD) without the need for the Multicast DNS used by Bonjour-enabled printers to advertise themselves. We start with some boilerplate resource records in our main zone file:

;; Service Browsing
b._dns-sd._udp PTR @
lb._dns-sd._udp PTR @

;; Available Services
_services._dns-sd._udp PTR _ipp._tcp
_services._dns-sd._udp PTR _printer._tcp

While we only use LPD for printing (the '_printer' record above), I've included the '_ipp' record for those who prefer a more modern printing protocol.

The next thing to do is define the print queues themselves, making sure to specify the various DNS records required by Bonjour printing support. Most of the information was found in the Apple Bonjour Printing Specification, but there were a few bits here and there that I put together from various searches around the 'net. I also used a bit of tcpdump sleuthing to discover what actual Macs and printers announce when advertising Bonjour printers. With all that said, here is an example print queue entry:

_printer._tcp PTR hp_218._printer._tcp
hp_218._printer._tcp SRV 0 0 515 lpdrelay.example.edu.
TXT "txtvers=1" \
"qtotl=1" \
"rp=hp_218" \
"ty=HP LaserJet 4000 Series PS" \
"product=(HP LaserJet 4000 Series)" \
"transparent=t" \
"copies=t" \
"duplex=t" \
"color=f" \
"pdl=application/pdf,application/postscript"

Please note that the "\"-escaped newlines are only included for readability. One key insight I picked up from a mailing list post was that the TXT record fields all needed to be on a single line (at least for the version of BIND I'm running). The Bonjour printing specification allows for only a single TXT record, but requires multiple attributes.
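In other words, what actually lives in the zone file is one long line; reassembled from the pieces above, the record looks like this:

hp_218._printer._tcp TXT "txtvers=1" "qtotl=1" "rp=hp_218" "ty=HP LaserJet 4000 Series PS" "product=(HP LaserJet 4000 Series)" "transparent=t" "copies=t" "duplex=t" "color=f" "pdl=application/pdf,application/postscript"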

The txtvers, qtotl, transparent, copies and pdl attributes are the same for all of our printers. The rp attribute specifies the name of the print queue on the CUPS server. The ty attribute provides a "display name" for the user's printer browser and product needs to be the same value as the Product specification in the printer's PPD file. I suspect that the values for color and duplex are fairly obvious. :-)
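If you want to sanity-check what a Mac will actually see, the dns-sd tool that ships with OS X should be able to browse and resolve these records directly (here example.edu stands in for your own unicast search domain):

dns-sd -B _printer._tcp example.edu.
dns-sd -L hp_218 _printer._tcp example.edu.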

In our DNS configuration file, we have a definition similar to the one above for every printer that our CUPS server is driving, and while we are currently using the LPD protocol, we could just as easily use IPP for printing. For IPP, the above example would require 2 simple changes: all instances of _printer would be replaced with _ipp and the 515 port number in the SRV record would be changed to 631. Both configurations have been tested and appear to work just fine.
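To spell that out, the IPP flavor of the hp_218 entry from above would become the following, with the TXT record left unchanged:

_ipp._tcp PTR hp_218._ipp._tcp
hp_218._ipp._tcp SRV 0 0 631 lpdrelay.example.edu.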

If you have any interest in reading further, I have saved a number of bookmarks concerning Bonjour and DNS-SD in del.icio.us.

Monday, January 14, 2008

Computing in the Cloud: Possession and ownership of data

The first panel on Monday afternoon concerned the possession and ownership of data. Moderated by Ed Felten, the panelists were Tim Lee, Joel Reidenberg, and Marc Rotenberg.

The panelists presented their positions, answered audience questions, and responded to each other's points. I don't feel that I can do justice to the panelists and what they said, so I'll just give a little flavor of the discussion. When the video recording of the session becomes available, I'll add it at the bottom of this post.

Tim Lee started things off with the position that privacy is governed by a series of trade-offs. Some data sharing is a pre-requisite for any useful online service and users are generally willing to give up some privacy in return for a valuable service. Some users will be willing to share more information for more value. Tim also spoke a bit about the history of browser cookies, GMail, and the Facebook news feed. All 3 of these things were initially looked at negatively by at least some segment of the user community. In all 3 cases, users became more accepting as they learned more about how the technologies worked, what "opt-out" options existed, and what benefits the users could derive from the technologies. A key point Tim made is that having private companies collecting data about you is less troubling than having the government doing the same. If you don't like the policies of a particular service provider, you can choose not to use that provider, as there are others around with different policies. There are no such choices available when it comes to the government.

Joel Reidenberg focused on 3 sets of implications: ownership of data, embedded values in the architecture, and irony. Data ownership is really how you get to use the bits and bytes. Fair information practice standards provide a control here. However, if data usage is based on a user consent model and the user doesn't understand it, how can the model be effective? Joel also raised the question of whether data on social networking sites is public or private. Despite what many users may think, the data is generally public and can be accessed by anybody (including law enforcement). Next up, Joel talked about how privacy values are embedded in the architecture of a given technology. With the Facebook Beacon fiasco, we got to see "how the data mining sausage was made" and it bothered quite a few people. We got to see what was going on behind the scenes in a way that was quite graphic when compared to GMail's ad scanning. Joel said that data privacy rules have to focus on effective transparency and proposed that a data usage rule set should travel along with data wherever it goes. Finally, he spoke of the irony that cloud computing actually opens the doors for privacy enhancement. Centralized data holders are easier to find, regulate and prosecute. However, we will need more cooperation in the future between lawmakers and standards bodies if we are to have effective data privacy standards and rules.

Marc Rotenberg gave an introduction to privacy culture. He presented the concept of fair information practices where the entity that collects data on individuals takes on obligations for security, accuracy and rights of access, among others. The custodian of the data has the responsibility to prevent "bad things" from happening to the data. Privacy people by and large believe that technology can be a solution to privacy problems, but the techniques need to be evaluated: having secure encryption keys will protect your data, but having a key escrow system will erode that protection in at least some (if not all) cases. Anonymity is critical to privacy. A person's actual identity should not be required to determine if they have the credentials to use a given service. Also, there is a paradox in that much of privacy is about transparency. Imposing obligations on custodians to be more open and accountable about the data they collect makes it easier to ensure that the data will only be used in known ways. The greater the secrecy about how data is being collected, the greater the possibility that it can be used in negative ways without people learning about it.

Because I was trying to actually pay attention to what was being said while taking notes, I feel that I may have given short shrift to all 3 presenters, and I encourage you to watch the video of the panel (once it becomes available - please check back for an update). That way you'll hear first-hand what they had to say. You'll also get to hear the lively debate that took place during the rebuttal and audience question section.

UPDATE: The video recordings from the workshop are now available at the Princeton UChannel.

Computing in the Cloud Workshop

Today is day one of the Computing in the Cloud Workshop being presented by Princeton's Center for Information Technology Policy. After opening remarks by H. Vincent Poor, Dean of the School of Engineering and Applied Science, Ed Felten got things rolling with "Computing in the Cloud: What, How and Why."

Starting with definitions of Cloud Computing from John Markoff, Wikipedia, the MIT Technology Review and Eric Schmidt, Ed then went on to expand on them and delve into the history and some of the implications. It's all about location, but why does it matter where the data and software actually is? Possession of data implies control, and control implies power. Whoever owns the systems on which data resides has the ultimate control of how that data is retained and who has access. If, for example, all of your EMail is in your Google Mail account, how confident are you that what you delete is actually gone forever? Are you confident that your data on a third party server will not be accessible by anybody else, except as you decide? If the government presents a subpoena to the holder of your data, what, if anything, will be released?

Ed also gave a broad overview of how we got here, talking about the swing back and forth between centralized computing and a more distributed model. Early on, computers were big and expensive so there was an economic incentive to have the users come to the computer. This was followed by timesharing, where users had terminals at remote locations (such as their office), but the actual computer was still in a large, air-conditioned room somewhere. In the late 1970s and early 1980s PCs and Sun Workstations (for example) were available at a low enough cost that individual users could now have local computing. This gave users more autonomy and the potential for a richer user interface, but at the loss of the lower cost per user, expert management and higher utilization that centralized computing facilities could provide.

During the 1980s and 1990s, the client-server computing model gained popularity but was soon overtaken by the World Wide Web. This swing of the pendulum took us back to a more centralized model of computing where all of the data and manipulations took place on a remote computer and the results were displayed locally, as in early timesharing. In the early 2000s, the web browser became more like a computing device as AJAX and other programming models came into existence. More like the client-server model, some computation takes place on the remote "back end" and some on the local computer. This brings with it all of the complexities of client-server computing along with those inherent in trying to shoe-horn a computing engine into the browser. In addition, these applications are typically written in multiple programming languages such as SQL for database access, PHP for page generation, and combinations of HTML, XML and Javascript for local processing and display.

The tools and infrastructure available today make many interesting "real-time" applications possible. For example, during the Iowa Caucuses, the Democratic Party was able to utilize infrastructure from Amazon to present an "Iowa Democratic Party Caucus Results" web page that was kept updated as results came in and was not adversely affected by the amount of traffic the page received. The tools of today also allow the creation of sites such as Facebook and ebay. Sites such as these would not have been easily created in the past.

With disk storage prices dropping and as a side-effect of the AJAX-type programming model, data is continually building up in remote data centers. It is in the best interest of the data center owner to hold onto that data for as long as possible as there is probably some value that can be extracted from it along the way.

There are additional concerns and implications to having your data on somebody else's server. How portable is your data? Can you easily extract it and move it to another provider if you so choose? Does your current provider have data retention policies that meet your needs? When you access your data, how secure and private is the connection between your computer and the provider's site? If an intermediary has lots of customer data and makes it difficult for customers to move that data, the provider gains market power.

Concerns such as those above can be addressed in a number of ways. If a cloud computing provider is a "community" then the members of that community have a say about how their data is managed. A provider may also decide that they won't "be evil" and if you trust them to follow through, you may feel more secure about your data. There are also the options of ex post regulations that would control how a provider that has already amassed data must manage the data or ex ante agreements where the provider makes promises up-front as to how they will deal with data on their servers.

A number of the above issues and concerns were addressed in the first afternoon panel discussion: "Possession and Ownership of Data."

UPDATE: The video recordings of the workshop are now available at the Princeton UChannel.