MapReduce was fun and pretty useful while it lasted, but it looks like Spark is set to take the reins as the primary processing framework for new Hadoop workloads. The technology took a meaningful, if not huge, step toward that end on Thursday when the Apache Software Foundation announced that Spark is now a top-level project.
Spark has already garnered a large and vocal community of users and contributors because it’s faster than MapReduce (in memory and on disk) and easier to program. This means it’s well suited for next-generation big data applications that might require lower-latency queries, real-time processing or iterative computations on the same data (i.e., machine learning). Spark’s creators from the University of California, Berkeley, have created a company called Databricks to commercialize the technology.
Spark is technically a standalone project, but it was always designed to work with the Hadoop Distributed File System…
View original post 208 more words