This repository contains a simple implementations of z-score data standardization and k-means clustring algorithms created using CUDA framework and C++ language. Created program runs both parallel and single-threaded versions of aforementioned algorithms to compare the performance.
To build the project one should entry the project directory and execute following commands:
mkdir build
cd build
cmake ..
make
In addition, one can decide to build the project using either Release
of Debug
configuration. To achieve this, one should add -DCMAKE_BUILD_TYPE=Release
or -DCMAKE_BUILD_TYPE=Debug
option. To run this project, one should type:
./CUDA_Data_Mining_cpp <number_of_clusters> <number_of_iterations> <data_file> <columns_to_omit>
There are four command line arguments this program takes, but the last one could be optional, depending on used dataset. They have a following meaning:
<number_of_clusters>
- number of clusters and initial centroids program should create<number_of_iterations>
- number of iteration of the K-Means algorithm that should be performed<data_file>
- path to file with data<colums_to_omit>
- indexes of dataset columns that should not be loaded from file (useful when the dataset contains non-numerical data, that can not be handled by this implementation)
For example, to create 3 clusters in 100 iteration using the iris dataset, one should type:
./CUDA_Data_Mining_cpp 3 100 ../data/iris.data 4
This repository contains following directories and files:
common
- contains C++ code responsible for common functionalities, for example: parsing command line arguments, loading dataset from file, handling CUDA errors and choosing initial centroidscpu
- contains C++ code with single-threaded implementation of data standardization and k-means algorithmsgpu
- contains C++ code with CUDA implementation of data standardization and k-means algorithmspython_scripts
- contains utility scripts for generating test data and plotting resultsdata
- contains example dataset (Iris) used for testingmain.cu
file - application entry pointCMakeLists.txt
- CMake file that I used to build and test the project