Kendrick Boyd: In-Flight Data Management for Distributed Storage Systems
Student's Name:
Kendrick Boyd
Advisor's Name:
Scott Brandt
Home University:
Lawrence University
Year:
2006
I worked on a small extension of Ceph called in-flight data management, which would allow clients to read from and write to not only the storage devices directly but also other clients' caches. Currently, file access latencies are much higher when a file is shared by multiple clients. When a file is already open on a client that is modifying the data, a second client must wait to open the file until the first client closes it and transmits the changes back to the storage device. However, this “in-flight” data is available in the first client's cache, so another client could read the updated data from that cache immediately instead of waiting to read it from the storage device.
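The read path described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not Ceph code: the `StorageDevice` and `Client` classes, the `read` function, and the `in_flight` flag are all hypothetical names invented for this example, and the caches are plain in-memory dictionaries.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class StorageDevice:
    """Hypothetical stand-in for a storage device holding committed data."""
    files: Dict[str, bytes] = field(default_factory=dict)


@dataclass
class Client:
    """Hypothetical client whose cache holds modified ("in-flight") data."""
    name: str
    dirty_cache: Dict[str, bytes] = field(default_factory=dict)

    def write(self, path: str, data: bytes) -> None:
        # The modification stays in the client's cache until it is flushed.
        self.dirty_cache[path] = data

    def flush(self, path: str, storage: StorageDevice) -> None:
        # Transmit the change back to the storage device.
        storage.files[path] = self.dirty_cache.pop(path)


def read(path: str, storage: StorageDevice, clients: List[Client],
         in_flight: bool = True) -> Tuple[Optional[bytes], str]:
    """Return (data, source) for a read issued by a second client."""
    for client in clients:
        if path in client.dirty_cache:
            if in_flight:
                # Serve the updated data straight from the writer's cache.
                return client.dirty_cache[path], f"cache of {client.name}"
            # Without in-flight data management the reader has to wait.
            return None, "blocked until writer flushes"
    # No client holds modified data, so the storage device is current.
    return storage.files.get(path), "storage device"


storage = StorageDevice({"results.dat": b"old"})
writer = Client("A")
writer.write("results.dat", b"new")

print(read("results.dat", storage, [writer], in_flight=False))
print(read("results.dat", storage, [writer], in_flight=True))
writer.flush("results.dat", storage)
print(read("results.dat", storage, [writer]))
```

In this model the second call returns the updated bytes from client A's cache while the first reports that the reader is blocked; after the flush, the same data comes from the storage device. The sketch deliberately leaves out coherence and writes into other clients' caches.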
During this project, I mapped out the design space for in-flight data management, identifying the variables that determine how it could be implemented. These range from whether clients should be able to write to other clients' caches to how to preserve coherence and propagate modifications to caches that already hold a copy of the modified data. Eventually my advisors and I settled on a set of design requirements, and I selected the options that most efficiently fulfilled them.
Although we have not implemented and tested in-flight data management in Ceph, we expect a performance gain for simulations on high-performance clusters. Further, allowing clients to read from and write to other clients' caches directly begins to move the storage system into the cluster and away from special storage devices, which could eventually lead to a cluster's storage existing entirely within the clients, with no need for dedicated storage machines.
| Attachment | Size |
|---|---|
| Kendrick_Boyd.jpg | 89.44 KB |
| boyd.pdf | 149.26 KB |
| boyd_diagram.pdf | 124.11 KB |
| boyd_report.pdf | 57.44 KB |



