In our project, we used AWS to start a Hadoop cluster for development purposes. However, the application we created can run on any Hadoop cluster. Please email us if you wish to test our application on AWS, as we will need to share AWS credentials with you.
- Start a Hadoop cluster.
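  If you are running a self-managed cluster rather than a managed service such as Amazon EMR, the stock Hadoop scripts can bring up HDFS and YARN. This is a minimal sketch, assuming `HADOOP_HOME` is set and the cluster is already configured:

  ```bash
  # Start the HDFS daemons (NameNode and DataNodes)
  $HADOOP_HOME/sbin/start-dfs.sh

  # Start the YARN daemons (ResourceManager and NodeManagers)
  $HADOOP_HOME/sbin/start-yarn.sh
  ```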
- Copy the `CZ4123_project-1.0-SNAPSHOT.jar` from your local machine to the Hadoop master node.
- Copy the `weatherData.csv` from your local machine to the Hadoop master node.
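  Both files can be copied with `scp`, for example. This is a sketch only; the key file, user name, and master hostname below are placeholders for your own values:

  ```bash
  # Copy the application jar and the dataset to the master node.
  # Replace my-key.pem, hadoop, and <master-public-dns> with your own values.
  scp -i my-key.pem CZ4123_project-1.0-SNAPSHOT.jar hadoop@<master-public-dns>:~
  scp -i my-key.pem weatherData.csv hadoop@<master-public-dns>:~
  ```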
- Place the `weatherData.csv` in HDFS with the following commands:

  ```bash
  hdfs dfs -mkdir -p CZ4123/input
  hdfs dfs -put weatherData.csv CZ4123/input
  ```
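  To confirm the upload, you can list the HDFS directory with the standard HDFS shell:

  ```bash
  # Should show weatherData.csv with its size and permissions
  hdfs dfs -ls CZ4123/input
  ```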
- Run the `CZ4123_project-1.0-SNAPSHOT.jar` file with the following command:

  ```bash
  hadoop jar CZ4123_project-1.0-SNAPSHOT.jar
  ```
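  While a MapReduce job launched by the application is running, you can check its progress from another terminal with the standard YARN CLI:

  ```bash
  # List running applications and their progress
  yarn application -list
  ```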
- Follow the instructions shown in the UI. The input path for choice 1 should be `CZ4123/input/weatherData.csv`. The output path for choice 1 can be left as the default. The input and output paths for choices 2 - 6 can be left as the defaults.
- After choices 1 - 6 have been executed, the final output file will be located at `CZ4123/kmean/part-r-00000` in HDFS. This file contains the resulting clusters.
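  To inspect or retrieve the final output, you can use the standard HDFS shell commands:

  ```bash
  # Print the clusters to the terminal
  hdfs dfs -cat CZ4123/kmean/part-r-00000

  # Or copy the file back to the local filesystem
  hdfs dfs -get CZ4123/kmean/part-r-00000 ./kmeans-output.txt
  ```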