Apache Spark

Original author: Matei Zaharia
Developer: Apache Spark
Initial release: May 26, 2014
Stable release: 4.0.1 (Scala 2.13) / September 6, 2025
Written in: Scala
Operating system: Windows, macOS, Linux
Available in: Scala, Java, SQL, Python, R, C#, F#
Type: Data analytics, machine learning algorithms
License: Apache License 2.0
Website: spark.apache.org
Repository: Spark Repository

Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming clusters with implicit data parallelism and fault tolerance. Spark was originally developed at the University of California, Berkeley's AMPLab starting in 2009; in 2013, its codebase was donated to the Apache Software Foundation, which has maintained it since.
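The phrase "implicit data parallelism and fault tolerance" can be illustrated with a toy model. This sketch is not Spark's API: the class `ToyDataset` and its methods are hypothetical, a local thread pool stands in for cluster executors, and the recovery step assumes an RDD-style approach in which a lost partition is rebuilt by replaying its recorded transformations (its lineage) rather than restored from a replica. The user writes only an ordinary `map` and `reduce`; splitting the data into partitions and scheduling the work is left to the engine.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


class ToyDataset:
    """Hypothetical, minimal stand-in for a partitioned dataset (not Spark's API)."""

    def __init__(self, data, num_partitions, transforms=()):
        self.data = list(data)
        self.num_partitions = num_partitions
        # Lineage: the chain of element-wise functions to apply to the source data.
        self.transforms = tuple(transforms)

    def map(self, fn):
        # Transformations are lazy: they only extend the lineage, nothing runs yet.
        return ToyDataset(self.data, self.num_partitions, self.transforms + (fn,))

    def _source_slice(self, i):
        # Split the source data into contiguous slices, one per partition.
        size = -(-len(self.data) // self.num_partitions)  # ceiling division
        return self.data[i * size:(i + 1) * size]

    def compute_partition(self, i):
        # (Re)compute one partition from its source slice by replaying the lineage.
        out = self._source_slice(i)
        for fn in self.transforms:
            out = [fn(x) for x in out]
        return out

    def reduce(self, fn, lost=frozenset()):
        # Action: compute all partitions concurrently on a thread pool.
        with ThreadPoolExecutor(max_workers=self.num_partitions) as pool:
            parts = list(pool.map(self.compute_partition, range(self.num_partitions)))
        # Simulate losing some partition results (e.g., a failed worker) ...
        for i in lost:
            parts[i] = None
        # ... and recover them by recomputing from lineage.
        parts = [p if p is not None else self.compute_partition(i)
                 for i, p in enumerate(parts)]
        # Reduce each non-empty partition locally, then combine the partial results.
        partials = [reduce(fn, p) for p in parts if p]
        return reduce(fn, partials)


squares = ToyDataset(range(1, 101), num_partitions=4).map(lambda x: x * x)
print(squares.reduce(lambda a, b: a + b))            # 338350
print(squares.reduce(lambda a, b: a + b, lost={2}))  # 338350, partition 2 recomputed
```

In PySpark the same computation would be written against the real API as `sc.parallelize(range(1, 101), 4).map(lambda x: x * x).reduce(lambda a, b: a + b)`, where `sc` is a `SparkContext`; the engine, not the user, distributes the partitions across the cluster.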