@Copyright 2013-2017 Indiana University Apache License 2.0
@Author: Bingjing Zhang
Harp is a framework for machine learning applications.
- A Hadoop plugin. It currently supports Hadoop versions 2.6.0 through 2.7.3.
- Hierarchical data abstraction (arrays/objects, partitions/tables)
- Pool based memory management
- Collective + event-driven programming model (distributed computing)
- Dynamic Scheduler + Static Scheduler (multi-threading)
1. Install Maven by following the official Maven installation instructions
mvn clean package
cp harp-project/target/harp-project-1.0-SNAPSHOT.jar $HADOOP_HOME/share/hadoop/mapreduce/
cp third_party/fastutil-7.0.13.jar $HADOOP_HOME/share/hadoop/mapreduce/
6. Edit mapred-site.xml in $HADOOP_HOME/etc/hadoop and add the memory and Java options settings for map-collective tasks. For example:
<property>
<name>mapreduce.map.collective.memory.mb</name>
<value>512</value>
</property>
<property>
<name>mapreduce.map.collective.java.opts</name>
<value>-Xmx256m -Xms256m</value>
</property>
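After editing the file, a quick check that both property names are present can catch typos before a job is submitted. The following is a minimal sketch, assuming a POSIX shell with grep available. The function name check_harp_props and the MAPRED_SITE variable are hypothetical helpers for illustration, not part of Harp or Hadoop.

```shell
#!/bin/sh
# Hypothetical helper: verifies that mapred-site.xml names both
# map-collective properties from the step above. It checks only that
# the property names are present, not that their values are sensible.
MAPRED_SITE="${MAPRED_SITE:-$HADOOP_HOME/etc/hadoop/mapred-site.xml}"

check_harp_props() {
    # $1: path to mapred-site.xml; returns 0 only if both names appear
    for prop in mapreduce.map.collective.memory.mb \
                mapreduce.map.collective.java.opts; do
        if ! grep -q "<name>$prop</name>" "$1"; then
            echo "missing property: $prop" >&2
            return 1
        fi
    done
    return 0
}

# Run the check only when the file exists at the assumed location.
if [ -f "$MAPRED_SITE" ]; then
    check_harp_props "$MAPRED_SITE" && echo "map-collective properties found"
fi
```

Note that the heap size in the java.opts value (-Xmx256m in the example) should stay below the container size set in memory.mb (512 in the example), since the JVM needs room beyond the heap.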
jobConf.set("mapreduce.framework.name", "map-collective");
cp harp-app/target/harp-app-1.0-SNAPSHOT.jar $HADOOP_HOME
cd $HADOOP_HOME
sbin/start-dfs.sh
sbin/start-yarn.sh
hadoop jar harp-app-1.0-SNAPSHOT.jar edu.iu.kmeans.regroupallgather.KMeansLauncher <num of points> <num of centroids> <vector size> <num of point files per worker> <number of map tasks> <num threads> <number of iterations> <work dir> <local points dir>
hadoop jar harp-app-1.0-SNAPSHOT.jar edu.iu.kmeans.regroupallgather.KMeansLauncher 1000 10 100 5 2 2 10 /kmeans /tmp/kmeans
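With nine positional arguments it is easy to misorder them. The following is a minimal sketch, assuming a POSIX shell, that names each argument before assembling the same command as the example above. The variable names are illustrative only and are not defined by Harp.

```shell
#!/bin/sh
# Illustrative variable names (not defined by Harp); the values match
# the example invocation above.
NUM_POINTS=1000
NUM_CENTROIDS=10
VECTOR_SIZE=100
FILES_PER_WORKER=5
NUM_MAP_TASKS=2
NUM_THREADS=2
NUM_ITERATIONS=10
WORK_DIR=/kmeans
LOCAL_POINTS_DIR=/tmp/kmeans

# Assemble the launcher command in the argument order given in the usage line.
HARP_CMD="hadoop jar harp-app-1.0-SNAPSHOT.jar edu.iu.kmeans.regroupallgather.KMeansLauncher $NUM_POINTS $NUM_CENTROIDS $VECTOR_SIZE $FILES_PER_WORKER $NUM_MAP_TASKS $NUM_THREADS $NUM_ITERATIONS $WORK_DIR $LOCAL_POINTS_DIR"

# Print the command for review instead of running it directly.
echo "$HARP_CMD"
```

Once the printed command looks right, it can be run from $HADOOP_HOME with eval "$HARP_CMD".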