Home
Spark is a MapReduce-like cluster computing framework designed to support low-latency iterative jobs and interactive use from an interpreter. It is written in Scala, a high-level language for the JVM, and exposes a clean language-integrated syntax that makes it easy to write parallel jobs. Spark runs on top of the Apache Mesos cluster manager.
Get Spark by checking out the master branch of the Git repository:
git clone git://github.com/mesos/spark.git
Spark requires Scala 2.9 or 2.8. In addition, to run Spark on a cluster, you will need to install Mesos, using the steps in Running Spark on Mesos. However, if you just want to run Spark on a single machine (possibly using multiple cores), you do not need Mesos.
To build and run Spark, you will need to have Scala's bin directory in your PATH, or you will need to set the SCALA_HOME environment variable to point to where you've installed Scala. Scala must be accessible through one of these methods on Mesos slave nodes as well as on the master.
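The check below is a minimal sketch for confirming that a machine satisfies one of the two methods above before building. The function name find_scala is hypothetical, and the order in which the two methods are tried is an assumption; Spark's own scripts may resolve Scala differently.

```shell
#!/bin/sh
# Hypothetical helper (not part of Spark): report how Scala can be found.
# Assumption: SCALA_HOME is checked first, then PATH.
find_scala() {
  if [ -n "${SCALA_HOME:-}" ] && [ -x "$SCALA_HOME/bin/scala" ]; then
    # SCALA_HOME points at an installation containing bin/scala
    echo "SCALA_HOME: $SCALA_HOME/bin/scala"
  elif command -v scala >/dev/null 2>&1; then
    # Scala's bin directory is already on PATH
    echo "PATH: $(command -v scala)"
  else
    echo "scala not found: add Scala's bin directory to PATH or set SCALA_HOME" >&2
    return 1
  fi
}
```

Running find_scala on the master and on each Mesos slave node is a quick way to spot a node where neither method is set up.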
Spark uses Simple Build Tool, which is bundled with it.
To build Spark, run
sbt/sbt compile
in the top-level Spark directory.
To build with Scala 2.8 instead:
- Check out the scala-2.8 git branch, in the top-level Spark directory, with:
git checkout -b scala-2.8 --track origin/scala-2.8
- Update dependencies and build Spark with:
sbt/sbt update compile
Spark comes with a number of sample programs in the examples directory. To run one of the samples, use ./run <class> <params> in the top-level Spark directory (the run script sets up the appropriate paths and launches that program). For example, ./run spark.examples.SparkPi will run a sample program that estimates Pi. Each of the examples prints usage help if no params are given.
Note that all of the sample programs take a <host> parameter that is the Mesos master to connect to. This can be a Mesos master URL, or local to run locally with one thread, or local[N] to run locally with N threads. You should start by using local for testing.
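To make the three forms of the <host> parameter concrete, here is a small illustrative sketch that classifies a value the way the paragraph above describes. The function describe_master is hypothetical and is not how Spark itself parses the argument; anything that is not local or local[N] is simply assumed to be a Mesos master URL.

```shell
#!/bin/sh
# Illustrative only (not Spark's parser): classify a <host> argument.
describe_master() {
  case "$1" in
    local)
      # Plain "local" means a single local thread
      echo "local threads=1" ;;
    local\[*\])
      # Strip the "local[" prefix and "]" suffix to get N
      n=${1#local\[}
      n=${n%\]}
      case "$n" in
        ''|*[!0-9]*) echo "invalid: $1" >&2; return 1 ;;
        *) echo "local threads=$n" ;;
      esac ;;
    *)
      # Assumption: everything else is treated as a Mesos master URL
      echo "mesos master=$1" ;;
  esac
}
```

When passing the local[N] form on a command line, for example ./run spark.examples.SparkPi 'local[2]', it is safest to quote the argument, since square brackets are glob characters in most shells.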
Finally, Spark can be used interactively from a modified version of the Scala interpreter that you can start through ./spark-shell. This is a great way to learn Spark.
- Spark Programming Guide
- Configuration
- Running Spark on Amazon EC2 (easy way to launch a Spark cluster on EC2)
- Running Spark on Mesos (to deploy to a cluster)
- Spark Debugger
- Bagel Programming Guide
- Spark Homepage
- Introduction to Spark on Matei Zaharia's website
- Paper describing the programming model
To keep up with Spark development or get help, sign up for the spark-users mailing list.