

Google File System (GFS) is a distributed file system developed at Google by Sanjay Ghemawat, Howard Gobioff and Shun-Tak Leung to handle the rapid growth of the company’s infrastructure. Google was looking for a way to improve the stability and reliability of its services, and concluded that the only reasonable way to do that was to build its own file system, since no existing system could handle such a massive number of requests spread over such a huge number of servers.

GFS supports a gigantic user population on top of very cheap computer equipment that tends to break down quite often. From its early days, Google used cheap computers running Linux to meet the need for storage space for information gathered by its web crawlers (and by other services, like Gmail). That caused problems with reliability and efficiency, so the company had to create reliable software to run on unreliable hardware. It was a complete change in the economics of IT companies.

The system was designed to be scalable, flexible and easy to expand. It rarely removes or overwrites data, but rather appends to it, because adding new data is an easier and faster way of storing it than updating and deleting old data in place. GFS is also designed to withstand numerous hardware failures and human errors. That’s why each file in the system is stored in three copies by default (even more for files in high demand) on separate servers: if one server goes down, two backup servers are ready to take the request and complete the search. And all of that happens in milliseconds.
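The replication idea above can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not Google's actual placement or failover logic: the function names `place_replicas` and `pick_live_replica`, the server names, and the random placement policy are all made up for the example.

```python
import random

REPLICATION_FACTOR = 3  # default number of copies mentioned in the text


def place_replicas(servers, copies=REPLICATION_FACTOR):
    """Pick `copies` distinct servers to hold one piece of data.

    Random choice is an assumption made for this sketch; a real system
    would also weigh things like disk usage and rack layout.
    """
    if len(servers) < copies:
        raise ValueError("not enough servers for the requested copies")
    return random.sample(servers, copies)


def pick_live_replica(replicas, down):
    """Return the first replica that is not in the `down` set, or None."""
    for server in replicas:
        if server not in down:
            return server
    return None


if __name__ == "__main__":
    replicas = ["cs1", "cs2", "cs3"]
    # cs1 has failed, so the request falls through to the next copy.
    print(pick_live_replica(replicas, down={"cs1"}))
```

With three copies on separate servers, a request only fails if all three servers are down at the same time.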

Files managed by GFS typically range from hundreds of megabytes to several gigabytes, as it is better to operate on a few huge files than to deal with millions of small ones. To manage them efficiently, GFS stores data in chunks of 64 megabytes each. Chunks are similar to clusters or sectors in a regular file system: a chunk is the smallest unit of data that the system manages.

For example, to store 128 megabytes of data GFS will use 2 chunks. But what about files smaller than 64 megabytes, say 20 megabytes? Fortunately there are so few files that small that Google doesn’t bother optimizing for them. Typical files span multiple chunks.
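The chunk arithmetic is simple enough to check in code. Here is a minimal Python sketch, assuming a fixed chunk size of 64 megabytes; the helper names `chunks_needed` and `chunk_index` are invented for this example.

```python
MB = 1024 * 1024
CHUNK_SIZE = 64 * MB  # chunk size described in the text


def chunks_needed(file_size_bytes):
    """Number of chunks a file of the given size occupies (rounded up)."""
    if file_size_bytes < 0:
        raise ValueError("file size cannot be negative")
    # Integer ceiling division: a partly filled last chunk still counts.
    return -(-file_size_bytes // CHUNK_SIZE)


def chunk_index(byte_offset):
    """Index of the chunk that contains the given byte offset."""
    return byte_offset // CHUNK_SIZE


if __name__ == "__main__":
    print(chunks_needed(128 * MB))  # the 128 MB example from the text
    print(chunks_needed(20 * MB))   # a small file still takes one chunk
    print(chunk_index(100 * MB))    # byte 100 MB falls in the second chunk
```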

GFS consists of a master server and chunk servers. The master server holds the metadata, such as file names, file sizes and the location of each file’s chunks on the chunk servers. When a client requests a file, the master server points it to the right chunk server, and the file is read from there.
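To make the division of labour concrete, here is a toy Python model of the master's lookup. It is a sketch under stated assumptions, not the real GFS interface: the `Master` class, its `add_file` and `lookup` methods, the chunk handle strings and the server names are all hypothetical.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks, as described above


@dataclass
class Master:
    """Holds metadata only; the file contents live on the chunk servers."""
    # file name -> ordered list of chunk handles
    files: dict = field(default_factory=dict)
    # chunk handle -> list of chunk servers holding a copy
    locations: dict = field(default_factory=dict)

    def add_file(self, name, chunks):
        """Register a file given a list of (handle, servers) pairs."""
        self.files[name] = [handle for handle, _ in chunks]
        for handle, servers in chunks:
            self.locations[handle] = list(servers)

    def lookup(self, name, byte_offset):
        """Tell the client which chunk holds the offset and where it is."""
        index = byte_offset // CHUNK_SIZE
        handle = self.files[name][index]
        return handle, self.locations[handle]


if __name__ == "__main__":
    master = Master()
    master.add_file("crawl.log", [
        ("chunk-0", ["cs1", "cs2", "cs3"]),
        ("chunk-1", ["cs2", "cs4", "cs5"]),
    ])
    # The client asks the master, then reads from a chunk server directly.
    print(master.lookup("crawl.log", 70 * 1024 * 1024))
```

The point of the design is that the master answers only small metadata questions, while the bulk data traffic goes straight between clients and chunk servers.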

Very good and more technical articles about Google File System can be found here:
http://storagemojo.com/?page_id=152
http://www.baselinemag.com/article2/0,1397,1985047,00.asp

http://labs.google.com/papers/gfs-sosp2003.pdf (PDF file)
