Apache Launches Apache Arrow, an Open-Source Project for Big Data

The Apache Software Foundation, already influential in Big Data, aims to shape the field further with its new project, Apache Arrow.

The Apache Software Foundation has launched Arrow as a top-level project. It is based on code from the related Apache Drill project, with reported improvements of more than 100x on analytic workloads. Arrow is designed to provide a high-performance data layer for columnar in-memory analytics across disparate systems.

According to the Apache Software Foundation, “Arrow enables multi-system workloads by eliminating cross-system communication overhead.”

Code committers to the project include developers from other Apache big-data projects such as Calcite, Cassandra, Drill, Hadoop, HBase, Impala, Kudu, Parquet, Phoenix, Spark and Storm.

According to Jacques Nadeau, vice president of the new project and of Apache Drill, “The open-source community has joined forces on Apache Arrow. We anticipate the majority of the world’s data will be processed through Arrow within the next few years.”

According to the Foundation, “In many workloads, between 70 percent and 80 percent of CPU cycles are spent serializing and deserializing data. Arrow alleviates that burden by enabling data to be shared among systems and processed with no serialization, deserialization or memory copies.”
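
The difference between serializing data and sharing it in place can be illustrated with a minimal sketch. It uses only the Python standard library and stays inside a single process, so it is not Arrow's API and does not show cross-system sharing; the `prices` column and its values are made up for illustration.

```python
import pickle
from array import array

# One column of 64-bit floats stored contiguously, as a columnar layout would.
# The name and values are hypothetical.
prices = array("d", [10.5, 20.25, 30.0, 40.75])

# Approach 1: serialize and deserialize. Every value is copied into a byte
# payload and then copied again into a new array.
payload = pickle.dumps(prices)
copied = pickle.loads(payload)

# Approach 2: hand the consumer a view over the same buffer, with no copy.
shared = memoryview(prices)

# A write by the producer is visible through the view but not in the copy.
prices[0] = 99.0
print(shared[0])  # the view reads the producer's buffer
print(copied[0])  # the deserialized copy still holds the old value
```

The view sees the updated value because it points at the same memory, while the deserialized copy does not. Avoiding that copy step is the kind of saving the Foundation's statement describes.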

According to Ted Dunning, vice president of the Apache Incubator and a member of the Apache Arrow Project Management Committee, “An industry-standard columnar in-memory data layer enables users to combine multiple systems, applications and programming languages in a single workload without the usual overhead.”

Arrow supports complex data with dynamic schemas in addition to traditional relational data. For example, it can handle JSON data, which is commonly used in Internet-of-Things (IoT) workloads, modern applications and log files.
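
To make the idea of a dynamic schema concrete, the following sketch pivots JSON records with differing fields into columns. It assumes nothing about Arrow's own implementation: the `to_columns` helper and the sample log lines are hypothetical, and only the Python standard library is used.

```python
import json

# Hypothetical IoT log lines; later records add or omit fields.
lines = [
    '{"device": "a1", "temp": 21.5}',
    '{"device": "b2", "temp": 19.0, "battery": 87}',
    '{"device": "c3", "battery": 54}',
]


def to_columns(json_lines):
    """Pivot JSON records into one list per field, using None for gaps."""
    records = [json.loads(line) for line in json_lines]
    # Collect field names in first-seen order so the schema can grow
    # as new fields appear in later records.
    fields = []
    for record in records:
        for name in record:
            if name not in fields:
                fields.append(name)
    return {name: [r.get(name) for r in records] for name in fields}


columns = to_columns(lines)
print(columns["temp"])
print(columns["battery"])
```

Each field becomes its own column, and a record that lacks a field contributes a null (`None`) in that position, so records with different shapes can still sit in one columnar structure.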

Apache Arrow software is available under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project.
