Nadine Amsel: Load Balancing in File Systems

Student's Name: Nadine Amsel
Advisor's Name: Carlos Maltzahn
Home University: Hamilton College
Attachment                  Size
amsel_poster_FINAL.ppt      538.5 KB
Amsel_Report_final-1.doc    331.5 KB
Amsel-nugget.doc            64.5 KB
Year: 2007

Nadine Amsel, a senior Computer Science major at Hamilton College, worked with Dr. Carlos Maltzahn in the Storage Systems Research Center at the University of California, Santa Cruz. Their research investigated distributed search and indexing in petabyte-scale file systems. Large file systems are essential for government agencies and business enterprises with substantial data storage needs, and stored data must be found quickly and efficiently. Nadine’s work involved dynamic load balancing, a method used to replicate and distribute data evenly among storage devices. The goal of this research was to determine how quickly load-balancing schemes need to adapt in a large data collection.

To study load balancing, time-stamped query traces from AOL searches were analyzed to determine the magnitude of Object Storage Device (OSD) overload. Each term in a query is stored on exactly one OSD, and the load of an OSD is the number of queries it receives in each minute of the three months covered by the traces. The traces were analyzed with different numbers of OSDs (128, 1K, and 64K nodes) and different overload thresholds (10, 30, and 50).
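
The load measurement described above can be illustrated with a minimal Python sketch. It is not the evaluation program used in the project. It assumes several things the summary does not state: terms are placed on OSDs by hashing, a query counts once toward every OSD that holds at least one of its terms, the trace is a list of (Unix timestamp, query string) pairs, and an OSD is overloaded when its per-minute count exceeds the threshold. All function names are hypothetical.

```python
import hashlib
from collections import Counter


def osd_for_term(term, num_osds):
    """Map a term to exactly one OSD (assumption: hash-based placement)."""
    digest = hashlib.md5(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_osds


def per_minute_load(trace, num_osds):
    """Count the queries each OSD receives in each minute.

    trace: iterable of (unix_seconds, query_string) pairs (assumed format).
    Returns a Counter keyed by (minute_index, osd_id).
    """
    load = Counter()
    for timestamp, query in trace:
        minute = timestamp // 60
        # A query touches every OSD that stores one of its terms;
        # the set ensures it is counted once per OSD.
        osds = {osd_for_term(term, num_osds) for term in query.lower().split()}
        for osd in osds:
            load[(minute, osd)] += 1
    return load


def overloaded(load, threshold):
    """Return the (minute, osd) pairs whose load exceeds the threshold."""
    return sorted(key for key, count in load.items() if count > threshold)
```

Running `per_minute_load` once per configuration (for example 128, 1024, and 65536 OSDs) and passing the result to `overloaded` with thresholds of 10, 30, and 50 would reproduce the shape of the experiment, though not necessarily its exact counting rules.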

At the completion of this project, Nadine concluded that the query workload leads to overload even when it is distributed over a large number of nodes. One overloaded node can slow down the whole system. Therefore, performance under this workload cannot be effectively improved only by increasing the number of OSDs (see graph). Load-balancing mechanisms need to adapt on a minute-by-minute basis, and any mechanism that takes longer than an hour to adapt will not be able to handle 99% of the workload changes.

Future work involves changing the evaluation program to account for occasional one-minute breaks in which the load happens to dip under the threshold. An overload period would be declared complete only if there were no overload for at least two minutes, instead of one. This could show that the majority of overload periods are actually much longer, and could reveal different patterns of overload.
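
The proposed change to how overload periods are delimited can be sketched as follows. This is an illustrative sketch, not the project's evaluation program. It assumes the overloaded minutes of a single OSD are available as integer minute indices, and the function name and the `min_gap` parameter are hypothetical. With `min_gap=1` a period ends at the first minute without overload; with `min_gap=2` a single quiet minute no longer splits a period.

```python
def overload_periods(overloaded_minutes, min_gap=1):
    """Group overloaded minute indices into (start, end) periods.

    A period is declared complete only after at least `min_gap`
    consecutive minutes without overload.
    """
    periods = []
    for minute in sorted(set(overloaded_minutes)):
        # Number of quiet minutes between the last period and this minute.
        if periods and minute - periods[-1][1] - 1 < min_gap:
            periods[-1][1] = minute  # gap too short: extend current period
        else:
            periods.append([minute, minute])  # start a new period
    return [(start, end) for start, end in periods]


minutes = [1, 2, 4, 5, 9]
print(overload_periods(minutes, min_gap=1))  # [(1, 2), (4, 5), (9, 9)]
print(overload_periods(minutes, min_gap=2))  # [(1, 5), (9, 9)]
```

In this small example the one-minute dip at minute 3 splits the overload into two periods under the current rule but is absorbed into a single five-minute period under the two-minute rule, which is the effect the future work aims to measure.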