April 19, 2013 - 11:00
Conference Room - DEIB
Dipartimento di Elettronica, Informazione e Bioingegneria
Politecnico di Milano
Via Ponzio, 34 - Milano
Lecture by Li Zhang, IBM
MapReduce is becoming a dominant paradigm for big data analytics. Schedulers are critical to the performance of MapReduce/Hadoop in the presence of multiple jobs with different characteristics and performance goals. Although current Hadoop schedulers are quite successful, they still have room for improvement: map tasks and reduce tasks are not jointly optimized, even though there is a strong dependence between them. This can cause job starvation and poor data locality. In this talk, we present the design of a resource-aware scheduler for Hadoop. It couples the progress of MapTasks and ReduceTasks, using Wait Scheduling for ReduceTasks and Random Peeking Scheduling for MapTasks to jointly optimize task placement. This mitigates the starvation problem and improves overall data locality. Our extensive experiments demonstrate significant improvements in job response times.
Contact:
Danilo Ardagna