Editor’s Note: Vaibhav Nivargi is the founder and chief architect of ClearStory Data, a data analytics service provider.
This week the fast-growing Apache Spark community is gathering in New York City to celebrate and collaborate on one of the most popular open source projects today.
Launched in U.C. Berkeley’s AMPLab in 2009, Apache Spark has caught on like wildfire over the last year and a half. Spark had more than 465 contributors in 2014, making it the most active project in the Apache Software Foundation and among big data open source projects globally.
Early on, we bet on the cluster-computing platform ourselves, rather than building our own software from scratch.
Spark’s in-memory, parallel processing can run programs up to 100X faster than Hadoop MapReduce in memory and up to 10X faster on disk. This allows dozens of data sources to be blended and harmonized at once.