At the Hadrian Hotel

Monday, January 14, 2008

Computing in the Cloud Workshop

Today is day one of the Computing in the Cloud Workshop being presented by Princeton's Center for Information Technology Policy. After opening remarks by H. Vincent Poor, Dean of the School of Engineering and Applied Science, Ed Felten got things rolling with "Computing in the Cloud: What, How and Why."

Starting with definitions of Cloud Computing from John Markoff, Wikipedia, the MIT Technology Review and Eric Schmidt, Ed then went on to expand on them and delve into the history and some of the implications. It's all about location, but why does it matter where the data and software actually are? Possession of data implies control, and control implies power. Whoever owns the systems on which data resides has ultimate control over how that data is retained and who has access to it. If, for example, all of your email is in your Google Mail account, how confident are you that what you delete is actually gone forever? Are you confident that your data on a third-party server will not be accessible to anybody else, except as you decide? If the government presents a subpoena to the holder of your data, what, if anything, will be released?

Ed also gave a broad overview of how we got here, talking about the swing back and forth between centralized computing and a more distributed model. Early on, computers were big and expensive, so there was an economic incentive to have the users come to the computer. This was followed by timesharing, where users had terminals at remote locations (such as their offices), but the actual computer was still in a large, air-conditioned room somewhere. In the late 1970s and early 1980s, PCs and Sun workstations (for example) became available at a low enough cost that individual users could have local computing. This gave users more autonomy and the potential for a richer user interface, but meant giving up the lower cost per user, expert management and higher utilization that centralized computing facilities could provide.

During the 1980s and 1990s, the client-server computing model gained popularity but was soon overtaken by the World Wide Web. This swing of the pendulum took us back to a more centralized model of computing, where all of the data and manipulation lived on a remote computer and only the results were displayed locally, much as in early timesharing. In the early 2000s, the web browser became more like a computing device as AJAX and other programming models came into existence. More like the client-server model, some computation takes place on the remote "back end" and some on the local computer. This brings with it all of the complexities of client-server computing, along with those inherent in trying to shoe-horn a computing engine into the browser. In addition, these applications are typically written in multiple programming languages, such as SQL for database access, PHP for page generation, and combinations of HTML, XML and JavaScript for local processing and display.
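To make the split concrete, here is a minimal sketch (my own illustration, not code from the talk) of the AJAX pattern described above: a hypothetical back end, say PHP querying a SQL database, returns JSON at an assumed /api/items endpoint, and the browser-side script fetches it and updates the page in place.

```typescript
// Minimal AJAX-style sketch. The remote "back end" does the data work and
// returns JSON; this browser-side code does the local processing and display.
// The /api/items URL and the "items" element id are assumptions for illustration.

interface Item {
  name: string;
  value: number;
}

function refreshItems(): void {
  const xhr = new XMLHttpRequest();   // the asynchronous request at the heart of AJAX
  xhr.open("GET", "/api/items");
  xhr.onload = () => {
    const items: Item[] = JSON.parse(xhr.responseText);

    // Local computation: build the HTML rows in the browser, then update
    // just that part of the page, with no full page reload.
    const rows = items
      .map(i => `<tr><td>${i.name}</td><td>${i.value}</td></tr>`)
      .join("");
    const target = document.getElementById("items");
    if (target) {
      target.innerHTML = `<table>${rows}</table>`;
    }
  };
  xhr.send();
}

refreshItems();
```

Even this toy version shows where the client-server complexities come back in: the page has to cope with slow or failed requests, stale data, and keeping the back end and the in-browser view consistent.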

The modern tools and infrastructure available today make many interesting "real-time" applications possible. For example, during the Iowa Caucuses, the Democratic Party was able to use infrastructure from Amazon to present an "Iowa Democratic Party Caucus Results" web page that was kept updated as results came in and was not adversely affected by the amount of traffic the page received. Today's tools also allow the creation of sites such as Facebook and eBay, which would not have been easy to build in the past.

With disk storage prices dropping and as a side effect of the AJAX-style programming model, data is continually building up in remote data centers. It is in the best interest of the data center owner to hold onto that data for as long as possible, since there is probably some value that can be extracted from it along the way.

There are additional concerns and implications to having your data on somebody else's server. How portable is your data? Can you easily extract it and move it to another provider if you so choose? Does your current provider have data retention policies that meet your needs? When you access your data, how secure and private is the connection between your computer and the provider's site? And if a provider holds a lot of customer data and makes it difficult for customers to move that data, it gains market power.

Concerns such as those above can be addressed in a number of ways. If a cloud computing provider is a "community," then the members of that community have a say in how their data is managed. A provider may also decide that it won't "be evil," and if you trust it to follow through, you may feel more secure about your data. There are also the options of ex post regulations, which would control how a provider that has already amassed data must manage it, or ex ante agreements, in which the provider makes promises up front about how it will deal with data on its servers.

A number of the above issues and concerns were addressed in the first afternoon panel discussion: "Possession and Ownership of Data."

UPDATE: The video recordings of the workshop are now available at the Princeton UChannel.

