Based on KMeans. Parallelism is implemented with OpenMP (CUDA support is not yet implemented); communication is done with MPI (MPICH 1.4.1p).
(If you use Visual Studio) link the MPICH library by following these instructions:
- http://nick-goodman.blogspot.co.il/2012/02/using-mpich-from-ms-visual-studio-2010.html
- http://www.screencast.com/t/YwAEXhPlMT (make sure you are using x64 as the platform)
Implemented for points in the (x, y) plane; each point moves in a circle around its center.
The given input is:
- N - number of points (each point is given as a, b, radius)
- K - number of clusters wanted
- delta_t - the time step between calculations of the cluster centers
- T - the time interval
- LIMIT - the maximum number of iterations of the KMeans algorithm
The input is read from a file in the following format:
line 1: N K delta_t T LIMIT
line 2 - line N+1: point_index x y radius
An example input is provided in this repository in the file kmeans10.txt (300k points).
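For illustration, here is a minimal C sketch of how an input point and its circular motion could be represented; the struct layout, the angular velocity parameter omega, and the motion equations are assumptions for this sketch, not taken from the repository code:

```c
#include <math.h>

/* Hypothetical representation of an input point: the center (a, b) of the
   circle the point moves on, and that circle's radius. */
typedef struct {
    double a, b;    /* center of the point's circle */
    double radius;  /* radius of the circle */
} Point;

/* Position of the point at time t, assuming a constant angular velocity
   omega (the actual angular velocity used by the program is not given here). */
void point_position(const Point *p, double t, double omega,
                    double *x, double *y) {
    *x = p->a + p->radius * cos(omega * t);
    *y = p->b + p->radius * sin(omega * t);
}
```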
The output is a file with the delta_t at which the cluster centers were closest to each other. Format example:
d = 14.5
delta_t = 120.3
centers of cluster:
12.4 55.6
2.4 90.15
66.0 45.2
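A minimal sketch of writing this output format in C; the function name, file handling, and the flat center arrays are illustrative assumptions:

```c
#include <stdio.h>

/* Write the results file in the format shown above.
   cx/cy hold the x and y coordinates of the K cluster centers. */
int write_results(const char *path, double d, double delta_t,
                  const double *cx, const double *cy, int k) {
    FILE *f = fopen(path, "w");
    if (!f) return -1;
    fprintf(f, "d = %g\n", d);
    fprintf(f, "delta_t = %g\n", delta_t);
    fprintf(f, "centers of cluster:\n");
    for (int i = 0; i < k; i++)
        fprintf(f, "%g %g\n", cx[i], cy[i]);
    fclose(f);
    return 0;
}
```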
The KMeans algorithm is implemented in its naive form and is run T / delta_t times (once per time increment). Pseudocode:
for each delta_t increment of T do
    for n = 1 to LIMIT do:
        for each cluster out of K clusters do
            number_of_points = 0
            set_initial_center_to_be_the_next_point_of_the_points_array()
            set_current_centers_array() // for later comparison of whether the cluster changed
        for each current_point in the points array do
            min_distance = get_distance_from_cluster(0)
            min_cluster = cluster_0
            for each cluster c = 1 to K-1 do:
                new_distance = get_distance_from_cluster(c)
                if new_distance < min_distance
                    min_distance = new_distance
                    min_cluster = c
            add_point_to_cluster(current_point, min_cluster)
        for each cluster c out of K do:
            cluster_recenter(c) // goes over all of the cluster's local points and recalculates its center
        not_changed_clusters = 0
        for each j = 0 to K-1 do
            if centers are close in approximation
                not_changed_clusters++
        if not_changed_clusters == K
            break // converged for this delta_t increment
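For a concrete view of the assignment step in the pseudocode above, here is a minimal C sketch; the flat array layout and helper names are assumptions, not the repository's actual code:

```c
#include <float.h>

/* Squared Euclidean distance between a point and a cluster center. */
static double dist2(double px, double py, double cx, double cy) {
    double dx = px - cx, dy = py - cy;
    return dx * dx + dy * dy;
}

/* Assign every point to its nearest cluster center.
   points_x/points_y hold N point coordinates, centers_x/centers_y hold the
   K cluster centers, and assignment[i] receives the chosen cluster index. */
void assign_points(int n, int k,
                   const double *points_x, const double *points_y,
                   const double *centers_x, const double *centers_y,
                   int *assignment) {
    for (int i = 0; i < n; i++) {
        double min_distance = DBL_MAX;
        int min_cluster = 0;
        for (int c = 0; c < k; c++) {
            double d = dist2(points_x[i], points_y[i],
                             centers_x[c], centers_y[c]);
            if (d < min_distance) {
                min_distance = d;
                min_cluster = c;
            }
        }
        assignment[i] = min_cluster;
    }
}
```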
number_of_increments = T / delta_t
num_points = N
num_clusters = K
limit = LIMIT
O(number_of_increments * num_points * num_clusters * limit)
The idea is to distribute the hard work, which is the KMeans algorithm itself, between processes. Each process (which can be hosted on a different CPU core or on a different machine, simulated on the Windows OS with the winmpiexec.exe application) is in charge of a chunk of the whole problem scope. The MASTER process (id=0) reads the input data and distributes it to all other processes (broadcast). Based on a simple calculation, each process gets a number of increments (chunks of delta_t out of the time interval T) and performs the KMeans algorithm on its chunks. After all processes have finished, the MASTER gathers the results, which are represented in a dedicated information data type that includes only the data relevant to the results, and performs the calculation to find the closest cluster centers as described in the problem scope section. The MASTER is the one who writes the results into the final results file.
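A minimal sketch of this MPI flow, with a hypothetical Result struct; the real program defines a dedicated MPI datatype for the results, while this sketch sends raw bytes for brevity (which assumes a homogeneous cluster):

```c
#include <mpi.h>
#include <stdlib.h>

#define MASTER 0

/* Hypothetical per-process result: the best (smallest) center distance
   found, the matching time, and the K cluster centers as x,y pairs. */
typedef struct {
    double best_d;
    double best_t;
    double centers[2 * 64]; /* assumes K <= 64 for simplicity */
} Result;

void run(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double params[5]; /* N, K, delta_t, T, LIMIT */
    if (rank == MASTER) {
        /* parse the input file into params (and the point array) here */
    }
    /* MASTER broadcasts the parameters (and point data) to all processes. */
    MPI_Bcast(params, 5, MPI_DOUBLE, MASTER, MPI_COMM_WORLD);

    int total_increments = (int)(params[3] / params[2]); /* T / delta_t */
    Result local = {0};
    /* Each rank handles increments rank, rank+size, rank+2*size, ... */
    for (int i = rank; i < total_increments; i += size) {
        /* run KMeans at time i * delta_t and keep the best result in local */
    }

    /* MASTER gathers one Result per process and picks the overall best. */
    Result *all = NULL;
    if (rank == MASTER) all = malloc(size * sizeof(Result));
    MPI_Gather(&local, sizeof(Result), MPI_BYTE,
               all, sizeof(Result), MPI_BYTE, MASTER, MPI_COMM_WORLD);

    if (rank == MASTER) free(all);
    MPI_Finalize();
}
```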
What MPI parallelism improves in the complexity is the number_of_increments factor, which in the context of the problem scope can, with enough processes, be reduced to 1. A constant for MPI initialization and data transfer is added, but since it is a constant it can be ignored as the problem scales up. The complexity turns into:
O((num_points * num_clusters * limit) + MPI_INIT_CONSTANT)
From the results above we can see that MPI parallelism gains us much more computing power as the number of processes grows along with the number of cores. Note, though, that when the number of processes is equal to or greater than the number of cores on the machine, results tend to be inconsistent and more random. This happens because the cores the OS uses to manage I/O depend on the I/O itself: more interrupts in the background mean more context switches, and therefore more time to compute.
To get even more impressive results I added OpenMP parallelism to the calculation of the KMeans algorithm, which takes most of the processing power in the program. The limitation of OpenMP is that you need to find the places where you can use it: it fits only pieces of code that are loosely coupled and have no effect on each other, otherwise you can end up with a deadlock. That means I could not add it to pieces of code that allocate memory and where the next operation depends on the previous one. For those places I could add parallelism with CUDA, which is yet to be implemented and is on the to-do list.
The place where I could add OpenMP is the step where the algorithm recalculates the centers of the clusters; it is great for for-loops and was easy to implement. Each time this piece of code is invoked, it runs over the whole set of moving points attached to the current cluster and calculates its center with a simple averaging function. Although it seems like a simple function, it can potentially run over all N points, so parallelism is really needed there. We can also see that OpenMP support adds some complexity to the initialization of the program, and hence some extra time when running with a low number of processes; as we add more processes the results get better and better, so in the long run the library improves the results, though not as much as we would expect. In my opinion I should look for more ways to parallelize the code and make it more decoupled, and then the effect will be much better.
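For reference, here is a minimal sketch of the recentering step with OpenMP, the kind of loosely coupled loop described above; the array layout and function signature are assumptions for this sketch:

```c
#include <omp.h>

/* Recompute one cluster's center as the average of its member points.
   assignment[i] names the cluster of point i; the result is written to
   *cx and *cy. */
void cluster_recenter(int n, int cluster,
                      const double *points_x, const double *points_y,
                      const int *assignment, double *cx, double *cy) {
    double sum_x = 0.0, sum_y = 0.0;
    long count = 0;

    /* Iterations are independent: each point only contributes to the sums,
       so a reduction avoids any shared-state races between threads. */
    #pragma omp parallel for reduction(+:sum_x, sum_y, count)
    for (int i = 0; i < n; i++) {
        if (assignment[i] == cluster) {
            sum_x += points_x[i];
            sum_y += points_y[i];
            count++;
        }
    }
    if (count > 0) {
        *cx = sum_x / count;
        *cy = sum_y / count;
    }
}
```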