ML_Coursework

1st Objective (partitioning clustering)

Pre-processing tasks (2 marks for scaling and 5 marks for outlier detection/removal)
Determine the number of cluster centres by showing all necessary steps/methods via “automated” tools (1 mark for each one of these “automated” tools)
Perform k-means analysis for the chosen k (using all attributes) and show all requested outputs
Show the silhouette plot (2 marks) and discuss this output in light of this k-means attempt (2 marks)
Apply PCA to this vehicle dataset and show all related R outputs (2 marks). Create a new dataset whose attributes are the PCs needed to reach a cumulative score above 92%, and discuss your choice (2 marks).
Determine the number of cluster centres by showing all necessary steps/methods via “automated” tools (1 mark for each one of these “automated” tools)
Perform k-means analysis on this PCA-based dataset for the chosen k and show all requested outputs
Show the silhouette plot (2 marks) and discuss this output in light of this PCA-based k-means attempt (2 marks)
Implement and show the Calinski-Harabasz index. Provide a brief discussion of the outcome of this index.
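
As an illustration of the last item, the Calinski-Harabasz index compares between-cluster dispersion with within-cluster dispersion: CH = (B / (k - 1)) / (W / (n - k)), where B is the size-weighted sum of squared distances from each cluster centroid to the overall mean, W is the sum of squared distances from each point to its own centroid, n is the number of points and k the number of clusters. The sketch below is a minimal, language-agnostic illustration in plain Python (the coursework itself targets R); the function name `calinski_harabasz` and the toy data are assumptions made for the example, not part of the repository.

```python
from collections import defaultdict


def calinski_harabasz(points, labels):
    """Calinski-Harabasz index for points (sequences of floats) and cluster labels."""
    n = len(points)
    dim = len(points[0])

    # Group the points by their cluster label.
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("need 2 <= k < n clusters")

    def mean(rows):
        return [sum(r[j] for r in rows) / len(rows) for j in range(dim)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall = mean(points)
    between = 0.0  # B: size-weighted dispersion of centroids around the overall mean
    within = 0.0   # W: dispersion of points around their own centroid
    for rows in clusters.values():
        centroid = mean(rows)
        between += len(rows) * sq_dist(centroid, overall)
        within += sum(sq_dist(r, centroid) for r in rows)

    if within == 0:
        return float("inf")  # every point coincides with its centroid
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical toy data: two well-separated pairs of points.
toy_points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
toy_labels = [0, 0, 1, 1]
print(calinski_harabasz(toy_points, toy_labels))
```

Higher values indicate more compact, better separated clusters, so the index is typically computed for several candidate values of k and compared across them.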

2nd Objective (MLP for electricity load forecasting)

Brief discussion of the various methods used for defining the input vector in electricity load forecasting problems
Evidence of various adopted input vectors and the related input/output matrices for both “AR” (4 marks) and “NARX” (3 marks) based approaches
Evidence of correct normalisation/de-normalisation (3 marks) and brief discussion of its necessity for MLP networks (3 marks)
Implement a number of MLPs for the “AR” approach, varying the internal structure (layers/nodes), input variables and network parameters, and show their performances (based on testing data) in the comparison table using the provided statistical indices (4 marks for structures with different input vectors, 8 marks for different internal NN structures).
Discussion of the meaning of these four statistical indices (2 marks for each index)
Creation of the comparison matrix for the “AR” case
Discuss the issue of “efficiency” with your two best NN structures (for the “AR” approach)
Implement a number of MLPs for the “NARX” approach, following the same procedure as the previous “AR” case. Provide a brief discussion. (2 marks for structures with different input vectors, 4 marks for different internal NN structures, 2 marks for the comparison table and 2 marks for the discussion).
Provide your best results both graphically (your prediction output vs. desired output) and via performance indices (2 marks for the graphical display and 2 marks for showing the requested statistical indices)
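
To make the input/output matrix and normalisation items above more concrete, here is a minimal sketch in plain Python (the coursework itself targets R). It builds an “AR” style matrix in which each input row holds the previous `n_lags` load values and the output is the next value, applies min-max normalisation, and de-normalises again so that errors can be reported in the original units. All function names and the toy series are hypothetical. RMSE and MAE are included only as common examples of error measures; the source does not name its four indices, so these are an assumption. A “NARX” style matrix would add further columns from other variables to each input row.

```python
import math


def build_ar_matrix(series, n_lags):
    """Return (inputs, outputs): each input row is the n_lags values before its output."""
    if n_lags < 1 or n_lags >= len(series):
        raise ValueError("n_lags must satisfy 1 <= n_lags < len(series)")
    inputs = [list(series[t - n_lags:t]) for t in range(n_lags, len(series))]
    outputs = [series[t] for t in range(n_lags, len(series))]
    return inputs, outputs


def minmax_fit(values):
    """Return the (min, max) pair used for scaling; fit this on training data only."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant series")
    return lo, hi


def normalise(values, lo, hi):
    """Map values from [lo, hi] to [0, 1]."""
    return [(v - lo) / (hi - lo) for v in values]


def denormalise(values, lo, hi):
    """Invert normalise(): map values from [0, 1] back to the original units."""
    return [v * (hi - lo) + lo for v in values]


def rmse(actual, predicted):
    """Root mean squared error (assumed example index)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error (assumed example index)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical toy load series, not taken from the coursework data.
load = [100.0, 120.0, 110.0, 130.0, 125.0, 140.0]
X, y = build_ar_matrix(load, n_lags=2)
lo, hi = minmax_fit(load)
X_scaled = [normalise(row, lo, hi) for row in X]
y_scaled = normalise(y, lo, hi)

# A network would be trained on X_scaled / y_scaled; its scaled predictions are
# de-normalised before computing the indices so errors are in the original units.
y_restored = denormalise(y_scaled, lo, hi)
print(X[0], y[0], y_scaled[0], rmse(y, y_restored))
```

Scaling matters for MLPs because inputs with large ranges can saturate sigmoid-type activations and slow training, while de-normalising the predictions keeps the reported indices comparable with the original load values.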
