Task 1:
The file cluster1.csv contains 500 data points, each with two features. Python is used to partition the data set into three clusters with k-means (the cluster count of three is given in the problem statement of the course assignment). The resulting clusters are plotted. The k-means algorithm is run with different centroid seeds, and the clustering performance is evaluated visually.
Explain why the data set contains five clusters instead of three.
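The Task 1 procedure could be sketched roughly as follows. This is a minimal illustration, not the assignment's actual code. It assumes scikit-learn, NumPy, and matplotlib are available. Since the layout of cluster1.csv is not described here, it uses synthetic stand-in data (500 two-feature points drawn from five blobs). The helper names `run_kmeans_with_seeds` and `plot_clusters` are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def run_kmeans_with_seeds(X, n_clusters=3, seeds=(0, 1, 2, 3, 4)):
    """Fit k-means once per seed; return {seed: (labels, centers, inertia)}."""
    results = {}
    for seed in seeds:
        # init="random" and n_init=1 so each run reflects one random initialisation
        km = KMeans(n_clusters=n_clusters, init="random", n_init=1, random_state=seed)
        labels = km.fit_predict(X)
        results[seed] = (labels, km.cluster_centers_, km.inertia_)
    return results


def plot_clusters(X, labels, centers, title=""):
    """Scatter plot of the points coloured by cluster, with centroids marked."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    plt.scatter(X[:, 0], X[:, 1], c=labels, s=10)
    plt.scatter(centers[:, 0], centers[:, 1], c="red", marker="x", s=80)
    plt.title(title)
    plt.show()


# Stand-in data (assumption): 500 points, two features, five underlying blobs.
# With the real file, something like np.loadtxt("cluster1.csv", delimiter=",")
# might be used instead, depending on how the CSV is laid out.
X, _ = make_blobs(n_samples=500, centers=5, n_features=2, random_state=42)

results = run_kmeans_with_seeds(X)
for seed, (labels, centers, inertia) in results.items():
    sizes = np.bincount(labels, minlength=3)
    print(f"seed={seed}  inertia={inertia:.1f}  cluster sizes={sizes}")
```

Calling `plot_clusters(X, *results[0][:2], title="seed=0")` would show one run. When three clusters are requested on data with more natural groups, different seeds may merge different groups, which is the kind of variation the visual comparison is meant to reveal.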
Task 2:
The file cluster2.csv contains 1000 data points, each with two features. For this data set, neither the number of clusters nor the best-suited model (such as k-means or GMM) is specified. The code therefore evaluates each model while identifying the optimal number of clusters.
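One way the Task 2 evaluation might look is sketched below. The source does not say which metrics the code uses, so the silhouette score (for both models) and BIC (for the GMM) are assumptions here, as are the candidate range of 2 to 8 clusters, the synthetic stand-in data, and the hypothetical helper name `evaluate_models`.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture


def evaluate_models(X, k_values=range(2, 9), seed=0):
    """Score k-means and a Gaussian mixture model for each candidate cluster count."""
    rows = []
    for k in k_values:
        km_labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        gmm = GaussianMixture(n_components=k, n_init=3, random_state=seed).fit(X)
        gmm_labels = gmm.predict(X)
        rows.append({
            "k": k,
            # Silhouette: higher is better, range [-1, 1]
            "kmeans_silhouette": silhouette_score(X, km_labels),
            "gmm_silhouette": silhouette_score(X, gmm_labels),
            # BIC: lower is better; defined only for the probabilistic model
            "gmm_bic": gmm.bic(X),
        })
    return rows


# Stand-in data (assumption): 1000 two-feature points; the real input would be
# loaded from cluster2.csv, whose exact layout is not described here.
X, _ = make_blobs(n_samples=1000, centers=4, n_features=2, random_state=7)

rows = evaluate_models(X)
for r in rows:
    print(f"k={r['k']}  kmeans_sil={r['kmeans_silhouette']:.3f}  "
          f"gmm_sil={r['gmm_silhouette']:.3f}  gmm_bic={r['gmm_bic']:.1f}")

best_kmeans = max(rows, key=lambda r: r["kmeans_silhouette"])
best_gmm = min(rows, key=lambda r: r["gmm_bic"])
print("k-means best k by silhouette:", best_kmeans["k"])
print("GMM best k by BIC:", best_gmm["k"])
```

The two criteria need not agree on a single cluster count, so the scores are best read alongside a plot of the resulting clusters.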