Monday, January 14, 2008
Starting with definitions of Cloud Computing from John Markoff, Wikipedia, the MIT Technology Review and Eric Schmidt, Ed then went on to expand on them and delve into the history and some of the implications. It's all about location, but why does it matter where the data and software actually are? Possession of data implies control, and control implies power. Whoever owns the systems on which data resides has ultimate control over how that data is retained and who has access to it. If, for example, all of your email is in your Google Mail account, how confident are you that what you delete is actually gone forever? Are you confident that your data on a third-party server will not be accessible by anybody else, except as you decide? If the government presents a subpoena to the holder of your data, what, if anything, will be released?
Ed also gave a broad overview of how we got here, talking about the swing back and forth between centralized computing and a more distributed model. Early on, computers were big and expensive, so there was an economic incentive to have the users come to the computer. This was followed by timesharing, where users had terminals at remote locations (such as their offices), but the actual computer was still in a large, air-conditioned room somewhere. In the late 1970s and early 1980s, PCs and Sun Workstations (for example) became available at a low enough cost that individual users could have local computing. This gave users more autonomy and the potential for a richer user interface, but at the cost of the lower per-user expense, expert management and higher utilization that centralized computing facilities could provide.
The modern tools and infrastructure available today make many interesting "real-time" applications possible. For example, during the Iowa Caucuses, the Democratic Party was able to use infrastructure from Amazon to present an "Iowa Democratic Party Caucus Results" web page that was kept updated as results came in and was not adversely affected by the amount of traffic the page received. The tools of today also allow the creation of sites such as Facebook and eBay. Sites such as these could not have been easily created in the past.
With disk storage prices dropping and as a side-effect of the AJAX-type programming model, data is continually building up in remote data centers. It is in the best interest of the data center owner to hold onto that data for as long as possible as there is probably some value that can be extracted from it along the way.
There are additional concerns and implications to having your data on somebody else's server. How portable is your data? Can you easily extract it and move it to another provider if you so choose? Does your current provider have data retention policies that meet your needs? When you access your data, how secure and private is the connection between your computer and the provider's site? If an intermediary holds lots of customer data and makes it difficult for customers to move that data, the provider gains market power.
Concerns such as those above can be addressed in a number of ways. If a cloud computing provider is a "community", then the members of that community have a say in how their data is managed. A provider may also decide that it won't "be evil", and if you trust it to follow through, you may feel more secure about your data. There are also the options of ex post regulations, which would control how a provider that has already amassed data must manage it, or ex ante agreements, where the provider makes promises up-front about how it will handle data on its servers.
A number of the above issues and concerns were addressed in the first afternoon panel discussion: "Possession and Ownership of Data."
UPDATE: The video recordings of the workshop are now available at the Princeton UChannel.