Apache Spark is an open source framework for distributed computation, initially developed by the AMPLab at the University of California, Berkeley, and now a project of the Apache Software Foundation. Unlike Hadoop's implementation of MapReduce, which relies on disk storage, Spark uses primitives that are held in memory, which lets it achieve up to 100 times the performance of Hadoop in certain applications. Because data loaded into memory can be queried repeatedly, Spark is especially well suited to machine learning and interactive data analysis.