
Design Assumptions
The design of the Google File System (GFS) was driven by key observations of Google's application workloads and technological environment. It rests on several assumptions:
- Node failure is the norm. The system is built from many inexpensive commodity components that often fail. It must constantly monitor itself and detect, tolerate, and recover promptly from component failures on a routine basis.
- The system stores a modest number of large files. We expect a few million files, each typically 100 MB or larger in size. Multi-GB files are the common case and should be managed efficiently. Small files must be supported, but we need not optimize for them.
- The workloads consist primarily of large streaming reads, small random reads, and large, sequential appending writes.
- The system must efficiently implement well-defined semantics for multiple clients that concurrently append to the same file. Our files are often used as producer-consumer queues or for many-way merging.
- High sustained bandwidth is more important than low latency.
Architecture
A GFS cluster consists of a single master and multiple chunkservers, and is accessed by multiple clients.

Files are divided into fixed-size chunks. Each chunk is identified by a globally unique 64-bit chunk handle assigned by the master at the time of chunk creation. The large chunk size (64 MB) reduces clients' need to interact with the master, reduces network overhead by letting a client keep a persistent TCP connection to a chunkserver across many operations, and reduces the size of the metadata stored on the master. Chunkservers store chunks on local disks as Linux files and read or write chunk data specified by a chunk handle and byte range. For reliability, each chunk is replicated on multiple chunkservers (three replicas by default).
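As a concrete illustration of chunk addressing, here is a minimal Go sketch of how a client could turn a byte offset into the chunk index it sends to the master; `chunkIndex` and `ChunkHandle` are hypothetical names for illustration, not GFS's actual client API.

```go
package main

import "fmt"

// ChunkSize is the fixed chunk size discussed above.
const ChunkSize = 64 << 20 // 64 MB = 67,108,864 bytes

// ChunkHandle is the globally unique 64-bit identifier the master
// assigns at chunk creation time.
type ChunkHandle uint64

// chunkIndex translates a byte offset within a file into the index
// of the chunk containing it; the client sends (file name, index)
// to the master and gets back the chunk handle and replica locations.
func chunkIndex(offset int64) int64 {
	return offset / ChunkSize
}

func main() {
	// A read at byte 200,000,000 lands in the third chunk of the file.
	fmt.Println(chunkIndex(200_000_000)) // 2
}
```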
The master maintains all file system metadata. This includes the namespace, access control information, the mapping from files to chunks, and the current locations of chunks. It also controls system-wide activities such as chunk lease management, garbage collection of orphaned chunks, and chunk migration between chunkservers. The master periodically communicates with each chunkserver in HeartBeat messages to give it instructions and collect its state.
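To make the HeartBeat exchange concrete, below is a hedged sketch of what the report and reply might carry and how the master could refresh its chunk-location table from it. The message fields and `handleHeartBeat` are assumptions for illustration, not the real protocol.

```go
package main

import "fmt"

type ChunkHandle uint64

// HeartBeat is a chunkserver's periodic report to the master; the
// field set here is an illustrative assumption, not GFS's format.
type HeartBeat struct {
	ServerAddr string        // chunkserver network address
	Chunks     []ChunkHandle // chunk replicas this server holds
}

// HeartBeatReply carries instructions back to the chunkserver,
// for example handles of orphaned chunks it should delete.
type HeartBeatReply struct {
	DeleteChunks []ChunkHandle
}

// handleHeartBeat is the master side: refresh the (volatile) chunk
// location table from the report and hand back any pending work.
func handleHeartBeat(locations map[ChunkHandle]map[string]bool, hb HeartBeat) HeartBeatReply {
	for _, h := range hb.Chunks {
		if locations[h] == nil {
			locations[h] = make(map[string]bool)
		}
		locations[h][hb.ServerAddr] = true
	}
	// A real master would also diff the report against its
	// file-to-chunk mapping to spot orphaned chunks.
	return HeartBeatReply{}
}

func main() {
	locations := make(map[ChunkHandle]map[string]bool)
	handleHeartBeat(locations, HeartBeat{ServerAddr: "cs1:7070", Chunks: []ChunkHandle{42}})
	fmt.Println(locations) // map[42:map[cs1:7070:true]]
}
```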
Clients interact with the master for metadata operations, but all data-bearing communication goes directly to the chunkservers, keeping the master from becoming a bottleneck. Neither the client nor the chunkserver caches file data; clients do, however, cache metadata.
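The control/data separation can be sketched as follows: one metadata round trip to the master (whose result the client may cache), then a direct byte-range read from a chunkserver. All interfaces here (`Master`, `Chunkserver`, `Read`) are hypothetical, and the sketch assumes the requested range stays within a single chunk.

```go
// Package gfs sketches the read path under the assumptions above.
package gfs

type ChunkHandle uint64

const ChunkSize = 64 << 20 // 64 MB

// Master answers metadata queries only:
// (file name, chunk index) -> (chunk handle, replica locations).
type Master interface {
	FindChunk(path string, chunkIndex int64) (handle ChunkHandle, replicas []string)
}

// Chunkserver serves the actual bytes for a handle and byte range.
type Chunkserver interface {
	ReadChunk(h ChunkHandle, offset, length int64) ([]byte, error)
}

// Read shows the division of labor on the read path.
func Read(m Master, connect func(addr string) Chunkserver,
	path string, offset, length int64) ([]byte, error) {
	// One metadata round trip; the client may cache this result
	// keyed by (path, chunk index) for subsequent reads.
	handle, replicas := m.FindChunk(path, offset/ChunkSize)
	// Data flows directly between client and chunkserver; the
	// master never moves file bytes.
	cs := connect(replicas[0]) // any replica can serve a read
	return cs.ReadChunk(handle, offset%ChunkSize, length)
}
```

Because metadata responses are tiny and cacheable while file data is huge, this split is what lets a single master coordinate a large cluster without sitting on the data path.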
—Metadata
The master stores three major types of metadata:
- the file and chunk namespace
- the mapping from files to chunks
- the locations of each chunk’s replicas
All metadata is kept in the master's memory. The first two types are also kept persistent by logging mutations to an operation log stored on the master's local disk and replicated on remote machines.
Using a log lets the master update its persistent state simply and reliably, without risking inconsistencies in the event of a master crash. The operation log also serves as a logical timeline that defines the order of concurrent operations. The master does not store chunk location information persistently. Instead, it asks each chunkserver about its chunks at master startup and whenever a chunkserver joins the cluster, and keeps itself up to date thereafter through the regular HeartBeat messages.
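A minimal sketch of the operation-log idea, assuming a simple JSON record format of my own invention: each metadata mutation is appended and flushed to disk before it is applied in memory, and recovery replays the log in order.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
)

// LogRecord is one logged metadata mutation (simplified).
type LogRecord struct {
	Op   string `json:"op"`   // e.g. "create"
	Path string `json:"path"` // file the mutation touches
}

// appendRecord durably logs a mutation before the master applies it
// to its in-memory state; log order defines the serial order of
// concurrent operations.
func appendRecord(f *os.File, r LogRecord) error {
	b, err := json.Marshal(r)
	if err != nil {
		return err
	}
	if _, err := f.Write(append(b, '\n')); err != nil {
		return err
	}
	return f.Sync() // flush before acknowledging the client
}

// replay rebuilds in-memory state at master startup by re-applying
// every logged mutation in order.
func replay(f *os.File, apply func(LogRecord)) error {
	s := bufio.NewScanner(f)
	for s.Scan() {
		var r LogRecord
		if err := json.Unmarshal(s.Bytes(), &r); err != nil {
			return err
		}
		apply(r)
	}
	return s.Err()
}

func main() {
	f, _ := os.CreateTemp("", "oplog")
	defer os.Remove(f.Name())
	appendRecord(f, LogRecord{Op: "create", Path: "/logs/a"})
	f.Seek(0, 0)
	replay(f, func(r LogRecord) { fmt.Println(r.Op, r.Path) })
}
```

Because replay is deterministic, re-applying the log reconstructs the master's state exactly; the real master also checkpoints its state periodically so the log it must replay stays short.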
—Consistency Model: relaxed consistency
First, file namespace mutations (e.g., file creation) are handled exclusively by the master, whose namespace locking guarantees their atomicity and correctness.
The state of a file region after a data mutation depends on the type of mutation (write or record append), whether it succeeds or fails, and whether there are concurrent mutations. In the paper's terms, a region is consistent if all clients always see the same data regardless of which replica they read from, and defined if it is consistent and clients see the mutation's writes in their entirety.
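The paper summarizes the resulting guarantees in its Table 1; the sketch below restates that outcome matrix in code, with hypothetical type and function names (`RegionState`, `outcome`) that are not part of GFS.

```go
package main

import "fmt"

// RegionState is the state of a file region after a mutation, per
// the GFS paper's definitions of "consistent" and "defined".
type RegionState string

const (
	Defined      RegionState = "defined"
	Consistent   RegionState = "consistent but undefined"
	Inconsistent RegionState = "inconsistent"
	// Record appends succeed atomically at least once, but may
	// leave padding or duplicates between the defined records.
	DefinedInterspersed RegionState = "defined interspersed with inconsistent"
)

// outcome restates the paper's Table 1: the resulting region state
// for each mutation type and execution scenario.
func outcome(mutation string, concurrent, failed bool) RegionState {
	switch {
	case failed:
		return Inconsistent
	case mutation == "append":
		return DefinedInterspersed
	case concurrent: // concurrent successful writes
		return Consistent
	default: // serial successful write
		return Defined
	}
}

func main() {
	fmt.Println(outcome("write", false, false)) // defined
	fmt.Println(outcome("write", true, false))  // consistent but undefined
	fmt.Println(outcome("append", true, false)) // defined interspersed with inconsistent
}
```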



