- Pre-processing tasks (2 marks for scaling and 5 marks for outlier detection/removal)
- Determine the number of cluster centres by showing all necessary steps/methods via “automated” tools (1 mark for each one of these “automated” tools)
- K-means analysis for the chosen k (all attributes used) and show all requested outputs
- Show the silhouette plot (2 marks) and provide related discussion on this output, following this k-means attempt (2 marks)
- Apply PCA to this vehicle dataset and show all related R outputs (2 marks). Create a new dataset whose attributes are those PCs that give a cumulative score > 92%, and provide a discussion for your choice (2 marks).
- Determine the number of cluster centres for this “pca”-based dataset by showing all necessary steps/methods via “automated” tools (1 mark for each one of these “automated” tools)
- K-means analysis for this “pca”-based dataset for the chosen k and show all requested outputs
- Show the silhouette plot (2 marks) and provide related discussion on this output, following this “pca-based” k-means attempt (2 marks)
- Implement and show the Calinski-Harabasz index. Provide a brief discussion on the outcome of this index.
- Brief discussion of the various methods used for defining the input vector in electricity load forecasting problems
- Evidence of various adopted input vectors and the related input/output matrices for both “AR” (4 marks) and “NARX” (3 marks) based approaches
- Evidence of correct normalisation/de-normalisation (3 marks) and brief discussion of its necessity for MLP networks (3 marks)
- Implement a number of MLPs for the “AR” approach, using various internal structures (layers/nodes), input variables and network parameters, and show their performances (based on testing data) in the comparison table through the provided stat. indices (4 marks for structures with different input vectors, 8 marks for different internal NN structures).
- Discussion of the meaning of these four stat. indices (2 marks for each index)
- Creation of the comparison matrix for the “AR” case
- Discuss the issue of “efficiency” with your two best NN structures (for the “AR” approach)
- Implement a number of MLPs for the “NARX” approach, following the same procedure as the previous “AR” case. Provide a brief discussion. (2 marks for structures with different input vectors, 4 marks for different internal NN structures, 2 marks for the comparison table and 2 marks for the discussion).
- Provide your best results both graphically (your prediction output vs. desired output) and via performance indices (2 marks for the graphical display and 2 marks for showing the requested statistical indices)
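
As an illustration of two of the forecasting items above (the “AR” input/output matrices and the normalisation/de-normalisation evidence), the following is a minimal sketch only. It assumes min-max scaling and a plain lagged-value input vector; the function names, the lag count and the load values are all hypothetical and made up for the example, and the coursework itself may expect a different tool or scaling method.

```python
# Minimal sketch (assumptions: min-max scaling, lagged-value "AR" inputs).
# All names and the sample data below are hypothetical illustrations.

def make_ar_matrices(series, n_lags):
    """Build AR input/output matrices.

    Each input row holds the n_lags values preceding time t,
    and the matching output is the value at time t.
    """
    inputs = [list(series[t - n_lags:t]) for t in range(n_lags, len(series))]
    outputs = [series[t] for t in range(n_lags, len(series))]
    return inputs, outputs


def minmax_fit(values):
    """Return the (min, max) pair used for scaling and un-scaling."""
    return min(values), max(values)


def minmax_scale(values, lo, hi):
    """Map values into [0, 1] using a previously fitted min and max."""
    return [(v - lo) / (hi - lo) for v in values]


def minmax_unscale(scaled, lo, hi):
    """Invert minmax_scale, returning values in the original units."""
    return [s * (hi - lo) + lo for s in scaled]


# Made-up load readings, purely for illustration.
load = [10.0, 12.0, 11.0, 15.0, 14.0, 18.0]

inputs, outputs = make_ar_matrices(load, n_lags=2)

lo, hi = minmax_fit(load)
scaled_outputs = minmax_scale(outputs, lo, hi)
restored_outputs = minmax_unscale(scaled_outputs, lo, hi)

for row, target in zip(inputs, outputs):
    print(row, "->", target)
```

The round trip through `minmax_scale` and `minmax_unscale` is the kind of check that shows de-normalisation returns predictions to the original units before the statistical indices are computed. In practice the minimum and maximum are typically fitted on the training portion only.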