
Deep Into Cloud Computing

I haven’t posted at all in the last few months, so now it’s time to share some more useful links about Cloud Computing and everything that is part of it.

Xkoto Resources (database virtualization)

Open Kernel Labs Community

Useful Cloud Computing Blogs from High Scalability Website

And last but not least, an interesting post discussing Google’s Cloud approach from x86Virtualization.

For the research-minded readers, I would recommend having a look at a lecture offered by Ashraf Aboulnaga from the University of Waterloo:

Advanced Topics in Data Bases: Databases in Cloud Computing Environments

22.09.2008 Update links:

Paper: On Delivering Embarrassingly Distributed Cloud Services

Aster In-Database MapReduce

Setting up a VMware ESX test environment for free

Internet-Scale Services

Every now and then, I like to spend some time reading articles and papers about developing and administering database information systems. Today I came across the LISA conference (Large Installation System Administration), and in particular the paper On Designing and Deploying Internet-Scale Services, where you can read a lot of important advice on how to develop and administer a big online platform, and much more. If you are interested, you can also look at the other papers from LISA 2007.

Update: A Service-Oriented Data Grid: Beyond Storage Virtualization, at USENIX LISA07

Another interesting post on the database topic that I found on the Labnotes blog is

Read Consistency: Dumb Databases, Smart Services – the article is a bit long, but it makes good points reminding us what matters in the DB world.

The Bigtable behind Google

Google is developing its own distributed storage system, which has a really interesting structure. It is described in the paper:

Abstract

“Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products. In this paper we describe the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, and we describe the design and implementation of Bigtable.”
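To make the data model a bit more concrete: the paper describes Bigtable as a sparse, sorted map from (row key, column, timestamp) to an uninterpreted value, with columns grouped into families. Below is a minimal single-machine Python sketch of that addressing scheme; names like TinyTable are my own illustration, not Google’s API, and all of the distribution, tablet and compression machinery is left out:

```python
# Minimal sketch of Bigtable's data model (not Google's API): a sparse,
# sorted map from (row key, column, timestamp) -> value. Column names
# follow the "family:qualifier" convention described in the paper.
import time
from collections import defaultdict


class TinyTable:
    def __init__(self):
        # row key -> column name -> list of (timestamp, value), newest first
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, value, ts=None):
        ts = ts if ts is not None else time.time()
        cells = self._rows[row][column]
        cells.append((ts, value))
        cells.sort(reverse=True)  # keep the newest version first

    def get(self, row, column, ts=None):
        """Return the newest value at or before `ts` (latest if ts is None)."""
        for cell_ts, value in self._rows[row][column]:
            if ts is None or cell_ts <= ts:
                return value
        return None

    def scan(self, start_row, end_row):
        """Row keys are kept in sorted order, so range scans are cheap."""
        for row in sorted(self._rows):
            if start_row <= row < end_row:
                yield row, self._rows[row]


# Hypothetical usage, loosely mirroring the webtable example from the paper:
table = TinyTable()
table.put("com.example.www", "contents:", "<html>...</html>")
table.put("com.example.www", "anchor:cnnsi.com", "Example")
print(table.get("com.example.www", "contents:"))
```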

Bigtable: A Distributed Storage System for Structured Data and a Google presentation at the University of Washington:

Update: Thanks to a friend, I got to know Google’s MapReduce model for simplified data processing on large clusters.

Abstract

“MapReduce is a programming model and an associated implementation for processing and generating large datasets that is amenable to a broad variety of real-world tasks. Users specify the computation in terms of a map and a reduce function, and the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks. Programmers find the system easy to use: more than ten thousand distinct MapReduce programs have been implemented internally at Google over the past four years, and an average of one hundred thousand MapReduce jobs are executed on Google’s clusters every day, processing a total of more than twenty petabytes of data per day.”

MapReduce: simplified data processing on large clusters or here
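To see what “specify the computation in terms of a map and a reduce function” means in practice, here is a minimal single-process sketch of the programming model using the classic word-count example. The function names and the toy run_mapreduce driver are my own illustration, not Google’s implementation; the real system runs the same structure in parallel across a cluster and handles machine failures, which is exactly the part omitted here:

```python
# Minimal single-process sketch of the MapReduce programming model:
# the user supplies a map function and a reduce function, and the tiny
# "runtime" below groups the intermediate (key, value) pairs by key.
from collections import defaultdict


def map_fn(doc_name, text):
    """Map: emit (word, 1) for every word in the document."""
    for word in text.split():
        yield word.lower(), 1


def reduce_fn(word, counts):
    """Reduce: sum all counts emitted for a word."""
    yield word, sum(counts)


def run_mapreduce(inputs, map_fn, reduce_fn):
    # Shuffle phase: group intermediate values by key.
    intermediate = defaultdict(list)
    for key, value in inputs:
        for ikey, ivalue in map_fn(key, value):
            intermediate[ikey].append(ivalue)
    # Reduce phase: one call per distinct intermediate key.
    results = []
    for ikey, ivalues in sorted(intermediate.items()):
        results.extend(reduce_fn(ikey, ivalues))
    return results


docs = [("doc1", "the quick brown fox"), ("doc2", "the lazy dog")]
print(run_mapreduce(docs, map_fn, reduce_fn))
# [('brown', 1), ('dog', 1), ('fox', 1), ('lazy', 1), ('quick', 1), ('the', 2)]
```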

Update: Overview of Google’s system architecture from the Seattle Conference on Scalability 2007.

System Abstractions for Handling Large Datasets

Jeff Dean, Google, Inc.:

Update (April 2008 presentation & slides): Behind The Scenes of Google Scalability