Professional Joomla Templates Free

λFlow: Automatic Pushdown of Dataflow Operators Close to the Data

Research Area: Cloud Computing Year: 2019
Type of Publication: In Proceedings
Authors:
Book title: CCGrid'19
Pages: 112-121
BibTex:
Abstract:
Modern data analytics infrastructures are composed of physically disaggregated compute and storage clusters. Thus, dataflow analytics engines, such as Apache Spark or Flink, are left with no choice but to transfer datasets to the compute cluster prior to their actual processing. For large data volumes, this becomes problematic, since it involves massive data transfers that exhaust network bandwidth, that waste compute cluster memory, and that become a performance barrier. To overcome this problem, we present λFlow: a framework for automatically pushing dataflow operators (e.g., map, flatMap, filter, etc.) down onto the storage layer. The novelty of λFlow is that it manages the pushdown granularity at the operator level, which makes it a unique problem. To wit, it requires addressing several challenges, such as how to encapsulate dataflow operators and execute them on the storage cluster, and how to keep track of dependencies such that operators can be pushed down safely~onto the storage layer. Our evaluation reports significant reductions~in resource usage for a large variety of IO-bound jobs. For instance, λFlow was able to reduce both network bandwidth and memory requirements by 90\% in Spark. Our Flink experiments also prove the extensibility of λFlow to other engines.
Joomla Templates
Joomla Templates
Joomla Templates
Joomla Templates
Joomla Templates