In our project, we used AWS to start a Hadoop cluster for development purposes. However, the application we created can run on any Hadoop cluster. Please email us if you wish to test our application on AWS, as we will need to share AWS credentials with you.
- Start a Hadoop cluster.
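  If you are running a self-managed cluster rather than a managed service such as Amazon EMR, the stock Hadoop scripts can bring up HDFS and YARN. This is a minimal sketch, assuming `HADOOP_HOME` is set and the cluster is already configured:

  ```bash
  # Start the HDFS daemons (NameNode and DataNodes)
  $HADOOP_HOME/sbin/start-dfs.sh

  # Start the YARN daemons (ResourceManager and NodeManagers)
  $HADOOP_HOME/sbin/start-yarn.sh
  ```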
- Copy the `CZ4123_project-1.0-SNAPSHOT.jar` from your local machine to the Hadoop master node.
- Copy the `weatherData.csv` from your local machine to the Hadoop master node.
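  Both files can be copied with `scp`, for example. This is a sketch only; the key file, user name, and master hostname below are placeholders for your own values:

  ```bash
  # Copy the application jar and the dataset to the master node.
  # Replace my-key.pem, hadoop, and <master-public-dns> with your own values.
  scp -i my-key.pem CZ4123_project-1.0-SNAPSHOT.jar hadoop@<master-public-dns>:~
  scp -i my-key.pem weatherData.csv hadoop@<master-public-dns>:~
  ```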
- Place the `weatherData.csv` in HDFS with the following commands:

  ```bash
  hdfs dfs -mkdir -p CZ4123/input
  hdfs dfs -put weatherData.csv CZ4123/input
  ```
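  To confirm the upload, you can list the HDFS directory with the standard HDFS shell:

  ```bash
  # Should show weatherData.csv with its size and permissions
  hdfs dfs -ls CZ4123/input
  ```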
- Run the `CZ4123_project-1.0-SNAPSHOT.jar` file with the following command:

  ```bash
  hadoop jar CZ4123_project-1.0-SNAPSHOT.jar
  ```
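  While a MapReduce job launched by the application is running, you can check its progress from another terminal with the standard YARN CLI:

  ```bash
  # List running applications and their progress
  yarn application -list
  ```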
- Follow the instructions shown in the UI. The input path for choice 1 should be `CZ4123/input/weatherData.csv`. The output path for choice 1 can be left as the default. The input and output paths for choices 2 - 6 can be left as the defaults.
- After choices 1 - 6 have been executed, the final output file will be located at `CZ4123/kmean/part-r-00000` in HDFS. This file contains the resulting clusters.
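  To inspect or retrieve the final output, you can use the standard HDFS shell commands:

  ```bash
  # Print the clusters to the terminal
  hdfs dfs -cat CZ4123/kmean/part-r-00000

  # Or copy the file back to the local filesystem
  hdfs dfs -get CZ4123/kmean/part-r-00000 ./kmeans-output.txt
  ```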