Profiling Spark Using YourKit
This page has been moved to the Apache Spark confluence wiki: https://cwiki.apache.org/confluence/display/SPARK/Wiki+Homepage
Here are instructions on profiling Spark applications using YourKit Java Profiler.
-
After logging into the master node, download the YourKit Java Profiler for Linux from the YourKit downloads page (at the time of writing, the latest version is yjp-12.0.5-linux.tar.bz2; you will need to substitute different paths if using a newer version). This file is fairly large (~100 MB) and the YourKit downloads site is somewhat slow, so consider mirroring the file or including it on a custom AMI.
-
Untar this file somewhere (in /root in our case):
tar xvjf yjp-12.0.5-linux.tar.bz2
-
Copy the expanded YourKit files to each node using copy-dir:
~/spark-ec2/copy-dir /root/yjp-12.0.5
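Because the archive name and the unpacked directory both embed the version, it can help to derive one from the other so that a newer release needs only a single change. The following is a minimal sketch, not part of the original instructions: the helper names yjp_dir_name and print_yjp_install_steps are hypothetical, and it assumes newer archives keep the yjp-<version>-linux.tar.bz2 naming and unpack to yjp-<version>, as 12.0.5 does here. It only prints the commands so they can be reviewed first.

```shell
# Archive name from the download step; change this one line for a newer version.
YJP_ARCHIVE=yjp-12.0.5-linux.tar.bz2

# Hypothetical helper: map the archive name to the directory it unpacks to.
# Assumes the "yjp-<version>-linux.tar.bz2" -> "yjp-<version>" naming seen above.
yjp_dir_name() {
  name="${1##*/}"          # drop any leading directory
  name="${name%.tar.bz2}"  # drop the archive extension
  echo "${name%-linux}"    # drop the platform suffix
}

# Print the unpack and copy commands for review rather than running them.
# /root is the unpack location used in this walkthrough.
print_yjp_install_steps() {
  echo "tar xvjf $1"
  echo "~/spark-ec2/copy-dir /root/$(yjp_dir_name "$1")"
}
```

Calling print_yjp_install_steps "$YJP_ARCHIVE" prints the two commands from the steps above, with the directory computed from the archive name.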
-
Configure the Spark JVMs to use the YourKit profiling agent by editing ~/spark/conf/spark-env.sh and adding the lines:
SPARK_DAEMON_JAVA_OPTS+=" -agentpath:/root/yjp-12.0.5/bin/linux-x86-64/libyjpagent.so=sampling"
export SPARK_DAEMON_JAVA_OPTS
SPARK_JAVA_OPTS+=" -agentpath:/root/yjp-12.0.5/bin/linux-x86-64/libyjpagent.so=sampling"
export SPARK_JAVA_OPTS
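If this step is scripted rather than done by hand, re-running the script should not append the agent options a second time. One possible way to guard against that is sketched below; add_yourkit_agent is a hypothetical helper name, and the agent path is the one assumed throughout this page.

```shell
# Agent path assumed from the steps above; adjust for your install.
AGENT=/root/yjp-12.0.5/bin/linux-x86-64/libyjpagent.so

# Hypothetical helper: append the YourKit agent options to a spark-env.sh
# file, unless an -agentpath entry for this agent is already present.
add_yourkit_agent() {
  env_file="$1"
  agent="$2"
  # "--" stops option parsing so the leading "-" in the pattern is safe.
  if grep -qF -- "-agentpath:${agent}" "$env_file" 2>/dev/null; then
    return 0
  fi
  cat >> "$env_file" <<EOF
SPARK_DAEMON_JAVA_OPTS+=" -agentpath:${agent}=sampling"
export SPARK_DAEMON_JAVA_OPTS
SPARK_JAVA_OPTS+=" -agentpath:${agent}=sampling"
export SPARK_JAVA_OPTS
EOF
}

# Usage (commented out so that sourcing this snippet changes nothing):
# add_yourkit_agent ~/spark/conf/spark-env.sh "$AGENT"
```

The check matches on the agent path only, so changing the agent options (for example replacing sampling) still requires editing the file by hand.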
-
Copy the updated configuration to each node:
~/spark-ec2/copy-dir ~/spark/conf/spark-env.sh
-
Restart your Spark cluster:
~/spark/bin/stop-all.sh
~/spark/bin/start-all.sh
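Copying the configuration and restarting the cluster are usually repeated together each time spark-env.sh changes. A rough sketch of a wrapper follows; run and redeploy_spark_env are hypothetical names, and the DRY_RUN variable is an assumption added here so the commands can be previewed before touching the cluster. Each command runs only if the previous one succeeded.

```shell
# Run a command, or just print it when DRY_RUN=1 (hypothetical convention).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Push the updated spark-env.sh to every node, then restart the cluster.
# Paths are the ones used in the steps above.
redeploy_spark_env() {
  run "$HOME/spark-ec2/copy-dir" "$HOME/spark/conf/spark-env.sh" &&
  run "$HOME/spark/bin/stop-all.sh" &&
  run "$HOME/spark/bin/start-all.sh"
}
```

Setting DRY_RUN=1 before calling redeploy_spark_env prints the three commands without executing them.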
-
By default, the YourKit profiler agents use ports 10001-10010. To connect the YourKit desktop application to the remote profiler agents, you'll have to open these ports in the cluster's EC2 security groups.
To do this, sign in to the AWS Management Console. Go to the EC2 section and select Security Groups from the Network & Security section on the left side of the page. Find the security groups corresponding to your cluster; if you launched a cluster named test_cluster, then you will want to modify the settings for the test_cluster-slaves and test_cluster-master security groups. For each group, select it from the list, click the Inbound tab, and create a new Custom TCP Rule opening the port range 10001-10010. Finally, click Apply Rule Changes. Make sure to do this for both security groups.
Note: by default, spark-ec2 re-uses security groups: if you stop this cluster and launch another cluster with the same name, your security group settings will be re-used.
-
Launch the YourKit profiler on your desktop.
-
Select "Connect to remote application..." from the welcome screen and enter the address of your Spark master or worker machine, e.g. ec2-*-*-*-*.compute-1.amazonaws.com
-
YourKit should now be connected to the remote profiling agent. It may take a few moments for profiling information to appear.
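If the connection fails, the security group rules for ports 10001-10010 are the first thing to check. As an alternative to the console steps earlier on this page, the same rules could probably be added with the AWS CLI. The sketch below only prints the commands so they can be reviewed first. It is not part of the original instructions: print_yourkit_ingress_rules is a hypothetical helper, the source CIDR (here a documentation-range placeholder) is an assumption, and selecting a group by --group-name may not work outside EC2-Classic or a default VPC, where a group ID would be needed instead.

```shell
# Hypothetical helper: print one AWS CLI call per security group created by
# spark-ec2 for the given cluster name. Nothing is executed.
#   $1 = cluster name (e.g. test_cluster)
#   $2 = source CIDR allowed to reach the agents (an assumption; the console
#        steps do not specify a source)
print_yourkit_ingress_rules() {
  cluster="$1"
  cidr="$2"
  for role in master slaves; do
    echo "aws ec2 authorize-security-group-ingress" \
         "--group-name ${cluster}-${role}" \
         "--protocol tcp --port 10001-10010 --cidr ${cidr}"
  done
}

# Usage (203.0.113.5/32 is a placeholder for your desktop's public IP):
# print_yourkit_ingress_rules test_cluster 203.0.113.5/32
```

Restricting the rule to a single address, rather than opening the range to everyone, limits who can attach a profiler to the cluster's JVMs.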
See the YourKit documentation for the full list of profiler agent startup options.