csv and report

cadia-lvl · Nov 13, 2023 · dc1ffbb
1 parent 93002c4
commit dc1ffbb
Showing 3 changed files with 252 additions and 64 deletions.
diff --git a/src/ProgressReport.ipynb b/src/ProgressReport.ipynb
@@ -30,6 +30,8 @@
     "\n",
     "\\textit{Supervised by} Stefán Ólafsson and Hrafn Loftsson\n",
     "\n",
+    "\\textit{Examined by} Sigurjón Ingi Garðarsson\n",
+    "\n",
     "\\hfill\n",
     "\n",
     "\n",
@@ -51,13 +53,119 @@
     "\n",
     "This report aims to provide a summary of the work we did when doing the research and development of the Sentiment Analysis on Icelandic text. Given the research-oriented nature of our project, as opposed to corporate work, we opted for a Kanban approach rather than Scrum. We started well in advance, with initial preparations and research activities commencing in late July to early August. This timeframe allowed us to familiarize ourselves with the intricacies of machine learning, particularly since only one team member possessed prior experience in Machine Learning and Deep Learning.\n",
     "\n",
-    "We've allocated a collective 40 hours per week for all team members, distributed across a span of 20 weeks, aiming to complete this project within this timeframe. This amounts to a total of 800 hours dedicated to the project. We expect the burndown to go under the planned line in October since we are picking up the pace but still keeping the 40 hours as a median.\n",
-    "\n",
-    "We created tasks in JIRA and set goals before each status meeting, details of the status meeting is shown in the document. We also created a Github repository for the project and used it to store all the code and documentation. \n",
+    "We've allocated a collective 40 hours per week for all team members, distributed across a span of 20 weeks, aiming to complete this project within this timeframe. This amounts to a total of 800 hours dedicated to the project. \n",
+    "We used Discord for communication and held three mandatory standup meetings per week, adding more when needed. We used a GitHub repository to store the Jupyter notebooks, the Python scripts and the CSV dataset, and JIRA to create tasks and plan the work. We set specific goals before each status meeting; these meetings were held in October, November and December. This document summarizes in detail the work we did.\n",
     "\n",
     "# Project Overview\n",
     "\n",
-    "The project is a research project that aims to evaluate the performance of Icelandic sentiment analysis models trained on translated data. The project is supervised by Stefán Ólafsson and Hrafn Loftsson. Output of the project is expected to be a research paper, dataset, and a trained model."
+    "This is a research project that aims to evaluate the performance of Icelandic sentiment analysis models trained on data machine-translated with Google Translate and Miðeind Translate. The project is supervised by Stefán Ólafsson and Hrafn Loftsson. The expected outputs are a research paper, a dataset, and trained models."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "97821be4",
+   "metadata": {},
+   "source": [
+    "# Team Members\n",
+    "\n",
+    "- Ólafur Aron Jóhannsson\n",
+    "\n",
+    "- Birkir\n",
+    "\n",
+    "- Eysteinn"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "bcf38db9",
+   "metadata": {},
+   "source": [
+    "# Initial project proposal"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "67baccdc",
+   "metadata": {},
+   "source": [
+    "# Work Plan\n",
+    "\n",
+    "Our work plan was to set specific goals before each status meeting and to amend them according to the time we had. The initial project proposal appears in its own chapter so that it can be compared with the amended goals, and later chapters describe each phase of the work in more detail. The goals for each status meeting were roughly as follows:\n",
+    "\n",
+    "## Status meeting 1\n",
+    "\n",
+    "- Translate IMDB data using Google and Miðeind into usable CSV datasets\n",
+    "- Pre-process translated data (remove noise, HTML, lemmatize) for the machine learning classifiers\n",
+    "- Train and predict three baseline machine learning classifiers\n",
+    "  - Naive Bayes, Logistic Regression and Support Vector Classifier\n",
+    "- Train on the English dataset as well as the Miðeind and Google translations as a sanity check\n",
+    "- Start writing the final report and the results\n",
+    "- Try out the models on another dataset (in this case, hand-written Icelandic data from OfficialStation)\n",
+    "\n",
+    "## Status meeting 2\n",
+    "\n",
+    "- Pre-process translated data (remove noise, HTML, lemmatize) for the neural network models\n",
+    "- Train and predict three neural network models\n",
+    "  - RoBERTa for English and IceBERT/Electra for the translated Google/Miðeind datasets\n",
+    "- Get datasets from OfficialStation and Kvikmyndaryni and run the models on them\n",
+    "\n",
+    "## Status meeting 3\n",
+    "\n",
+    "- Get all results into the final report\n",
+    "- Finish the presentation slides\n",
+    "- Finish the progress report"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2bea1b1d",
+   "metadata": {},
+   "source": [
+    "# Phase 1 (Late July - August)\n",
+    "\n",
+    "## Beginning of project\n",
+    "\n",
+    "The project began when Eysteinn and Ólafur Aron got in touch by email while looking for team members for the final project. Eysteinn had been working on a project related to neural networks and suggested doing something similar for this one. Soon afterwards, Birkir and Júlíus also joined the team. We created the Discord channel for communication at this stage.\n",
+    "\n",
+    "\n",
+    "## Project idea\n",
+    "\n",
+    "We contacted Hrafn Loftsson at Reykjavik University and told him we wanted to work on something related to neural networks. He asked us to come up with a few ideas and bring them to him. After several brainstorming sessions on Discord, we proposed the following ideas:\n",
+    "\n",
+    "- Text simplification idea for Icelandic\n",
+    "- AI to fix vocabulary and spelling\n",
+    "- GameQA extension\n",
+    "- Sentiment analysis\n",
+    "- AI poem generator\n",
+    "- Automatic classification of emails with AI\n",
+    "\n",
+    "Hrafn suggested that we focus on sentiment analysis. This was in early August, and soon afterwards we met with him to discuss the ideas.\n",
+    "\n",
+    "## Initial project proposal\n",
+    "\n",
+    "After the meeting we started working on the project proposal to submit to Kári, and we also began familiarizing ourselves with AI and machine learning. Around the end of August we submitted the project proposal through Canvas; it looked like this:\n",
+    "\n",
+    "\n",
+    "\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ab2b3e58",
+   "metadata": {},
+   "source": [
+    "# Phase 2 (September-October)\n",
+    "\n",
+    "In early September we created the GitHub repository and the JIRA board. We also decided at that time to use Kanban rather than Scrum, mainly because it was difficult to plan sprints and tasks precisely beforehand.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3b179a8e",
+   "metadata": {},
+   "source": [
+    "# Phase 3 (November-December)"
    ]
   },
   {