Kendrick Boyd: In-Flight Data Management for Distributed Storage Systems

Student's Name: 
Kendrick Boyd
Advisor's Name: 
Scott Brandt
Home University: 
Lawrence University
Attachments: 
Kendrick_Boyd.jpg (image, 89.44 KB)
boyd.pdf (PDF, 149.26 KB)
boyd_diagram.pdf (PDF, 124.11 KB)
boyd_report.pdf (PDF, 57.44 KB)
Year: 
2006

During the summer of 2006 I worked on distributed storage systems with the Storage Systems Research Center (SSRC) at the University of California, Santa Cruz. Simulations, especially in physics, are frequently run on high-performance clusters of thousands of computers. When a large calculation is spread across that many computers, storing the resulting data becomes a considerable obstacle. To address large-scale storage, SSRC is developing a distributed storage system called Ceph, whose goal is to handle multiple petabytes of storage and billions of files ranging in size from bytes to terabytes while allowing thousands of clients to interact with the storage system.

I worked on a small extension of Ceph called in-flight data management, which would allow clients to read and write not only directly to the storage devices but also to other clients' caches. Currently, file access latencies are much higher when a file is shared by multiple clients. When one client has a file open and is modifying its data, a second client must wait to open the file until the first client closes it and transmits the changes back to the storage device. However, this “in-flight” data is already available in the first client's cache, so with in-flight data management another client could read the updated data from that cache immediately instead of waiting to read it from the storage device.
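
The read path described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not Ceph code: the names StorageDevice, Client, read, and the in_flight dictionary that records which client holds unwritten changes are all hypothetical, and the sketch ignores networking, locking, and coherence entirely.

```python
class StorageDevice:
    """Toy stand-in for a storage device: file name -> committed data."""

    def __init__(self):
        self.files = {}


class Client:
    """Toy client whose cache holds modified data not yet written back."""

    def __init__(self, name):
        self.name = name
        self.cache = {}

    def write(self, in_flight, path, data):
        # Keep the change in the local cache and record (hypothetically,
        # in a shared in_flight table) that this client holds in-flight data.
        self.cache[path] = data
        in_flight[path] = self

    def close(self, in_flight, storage, path):
        # Transmit the cached changes back to the storage device.
        storage.files[path] = self.cache.pop(path)
        in_flight.pop(path, None)


def read(path, in_flight, storage):
    """Return (data, source), preferring in-flight data in a client's cache."""
    writer = in_flight.get(path)
    if writer is not None:
        # Another client is modifying the file: read from its cache
        # instead of waiting for the write-back to the storage device.
        return writer.cache[path], "cache of " + writer.name
    return storage.files[path], "storage device"


storage = StorageDevice()
storage.files["results.dat"] = b"old"
in_flight = {}

first = Client("client A")
first.write(in_flight, "results.dat", b"new")
print(read("results.dat", in_flight, storage))  # served from client A's cache

first.close(in_flight, storage, "results.dat")
print(read("results.dat", in_flight, storage))  # served from the storage device
```

In this sketch a second client sees the updated data as soon as the first client has written it to its cache. Without the in_flight lookup, the same read would have to wait until close() had copied the changes to the storage device.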

During this project, I mapped out the design space for in-flight data management by identifying the variables that determine how it could be implemented. These variables range from whether clients should be able to write to other clients' caches to how to preserve coherence and propagate modifications to existing cached copies of the modified data. My advisors and I then agreed on a set of design requirements, and I selected the options that most efficiently fulfilled those requirements.

Although we have not implemented and tested in-flight data management in Ceph, we expect a performance gain for simulations on high-performance clusters. Further, allowing clients to read and write directly to other clients' caches begins to move the storage system into the cluster and away from special-purpose storage devices. Eventually this could lead to a cluster's storage existing entirely within the clients, without any need for dedicated storage machines.