Michael McThrow: CLIP: A Compact, Load-balancing Index Placement Function

Student's Name: 
Michael McThrow
Advisor's Name: 
Carlos Maltzahn and Scott Brandt
Home University: 
Cal Poly San Luis Obispo
Attachments:
mcthrow-nugget.doc (21 KB)
mcthrow-report.pdf (276.87 KB)
mcthrow-poster.pdf (285.25 KB)
mmcthrow_diagram.pdf (14.04 KB)
Year: 
2007

Michael McThrow will be a third-year computer science student at California Polytechnic State University in San Luis Obispo, CA, in Fall 2007. This summer he worked with professors Carlos Maltzahn and Scott Brandt at the UC Santa Cruz Storage Systems Research Center (SSRC) on making file search in large-scale file systems as easy as searching for documents on the World Wide Web. Part of what makes web search so efficient is the use of inverted indices, which map each term to the pages that contain it. Search engines don't "surf the web" every time they receive a query; they search the index instead. The SSRC had already developed Ceph, a petabyte-scale distributed file system, and Michael and his advisors decided to use an inverted index for searching in it. But how can the index be stored without hurting Ceph's performance?

They proposed CLIP, a compact, load-balancing index placement function. CLIP builds on CRUSH, a hash function developed as part of Ceph that places data on Ceph's storage nodes (called OSDs). CLIP fits a power function to the distribution of terms and uses it to map each term, given its rank in the distribution, to the number of OSDs needed to properly load-balance the update stream. Michael was very intrigued by this research and enjoyed his time at SURF-IT. His experiences this summer encouraged him to pursue a PhD, and he looks forward to a career in research.
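
The rank-to-OSD mapping described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the function names, the log-log least-squares fit, and the `per_osd_rate` and `max_osds` parameters are all hypothetical, and the sketch stops at deciding how many OSDs a term should get (the actual placement via CRUSH is not modeled).

```python
import math


def fit_power_law(freqs):
    """Fit f(rank) = c * rank**(-alpha) to observed term frequencies.

    `freqs` lists positive frequencies ordered by rank (most frequent
    first) and needs at least two entries. The fit is an ordinary
    least-squares line in log-log space, an assumed choice for this sketch.
    """
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    alpha = -slope
    c = math.exp(mean_y - slope * mean_x)
    return c, alpha


def osds_for_rank(rank, c, alpha, per_osd_rate, max_osds):
    """Return how many OSDs a term of the given rank would be spread over.

    `per_osd_rate` (updates one OSD is assumed to absorb) and `max_osds`
    (cluster size) are hypothetical tuning parameters.
    """
    expected_updates = c * rank ** (-alpha)
    needed = math.ceil(expected_updates / per_osd_rate)
    # Every term needs at least one OSD and cannot use more than exist.
    return max(1, min(max_osds, needed))
```

With a fitted `c` and `alpha`, calling `osds_for_rank` for ranks 1, 2, 3, and so on yields a non-increasing sequence: the most frequent terms are spread across many OSDs, while the long tail of rare terms lands on a single OSD each.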