- Incremental scalability: New chunkserver nodes can be
added as storage needs increase; the system automatically
adapts to the new nodes.
- Availability: Replication is used to provide
availability due to chunk server failures. Typically,
files are replicated 3-way.
- Per file degree of replication: The degree of
replication is configurable on a per file basis, with a
max. limit of 64.
- Re-replication: Whenever the degree of replication
for a file drops below the configured amount (such as,
due to an extended chunkserver outage), the metaserver
forces the block to be re-replicated on the remaining
chunk servers. Re-replication is done in the background
without overwhelming the system.
- Rack-aware data placement: The chunk placement algorithm is rack-aware. Whereever possible, it places chunks on different racks.
- Re-balancing: Periodically, the meta-server may
rebalance the chunks amongst chunkservers. This is done
to help with balancing disk space utilization amongst
- Data integrity: To handle disk corruptions to data
blocks, data blocks are checksummed. Checksum
verification is done on each read; whenever there is a
checksum mismatch, re-replication is used to recover the
- File writes: The system follows the standard model.
When an application creates a file, the filename becomes
part of the filesystem namespace. For performance, writes
are cached at the CloudStore client library. Periodically, the
cache is flushed and data is pushed out to the
chunkservers. Also, applications can force data to be
flushed to the chunkservers. In either case, once data is
flushed to the server, it is available for reading.
- Leases: CloudStore client library uses caching to improve
performance. Leases are used to support cache
- Chunk versioning: Versioning is used to detect stale
- Client side fail-over: The client library is
resilient to chunksever failures. During reads, if the
client library determines that the chunkserver it is
communicating with is unreachable, the client library
will fail-over to another chunkserver and continue the
read. This fail-over is transparent to the
- Language support: CloudStore client library can be accessed
from C++, Java, and Python.
- FUSE support on Linux: By mounting CloudStore via FUSE, this
allows existing linux utilities (such as, ls) to
interface with CloudStore.
- Tools: A shell binary is included in the set of
tools. This allows users to navigate the filesystem tree
using utilities such as, cp, ls, mkdir, rmdir, rm, mv.
Tools to also monitor the chunk/meta-servers are
- Deploy scripts: To simplify launching CloudStore servers, a
set of scripts to (1) install CloudStore binaries on a set of
nodes, (2) start/stop CloudStore servers on a set of nodes are
- Job placement support: The CloudStore client library exports
an API to determine the location of a byte range of a
file. Job placement systems built on top of CloudStore can
leverage this API to schedule jobs appropriately.
- Local read optimization: When applications are run on
the same nodes as chunkservers, the CloudStore client library
contains an optimization for reading data locally. That
is, if the chunk is stored on the same node as the one on
which the application is executing, data is read from the