Macedonian LLM eval 🇲🇰

This repository is adapted from the original work by Aleksa Gordić. If you find this work useful, please consider citing or acknowledging the original source.

🎯 What is currently covered:

Common sense reasoning: Hellaswag, Winogrande, PIQA, OpenbookQA, ARC-Easy, ARC-Challenge
World knowledge: NaturalQuestions
Reading comprehension: BoolQ

You can find the Macedonian LLM eval dataset on HuggingFace. The dataset was translated from Serbian to Macedonian using the Google Translate API. The Serbian dataset was selected as the source instead of English because Serbian and Macedonian are closer from a linguistic standpoint, making Serbian a better starting point for translation. Additionally, the Serbian dataset was refined using GPT-4, which, according to the original report, significantly improved the quality of the translation.

Quality check was conducted on the translated Macedonian dataset, and the translations were deemed to be of good quality.

📊 Latest Results - January 16, 2025

Model	Version	ARC Easy	ARC Challenge	Bool Q	HellaSwag	Openbook QA	PIQA	NQ Open	WinoGrande
MKLLM-7B-Instruct	7B	0.5034	0.3003	0.7878	0.4328	0.2940	0.6420	0.0432	0.6148
BLOOM	7B	0.2774	0.1800	0.5028	0.2664	0.1580	0.5316	0	0.4964
Phi-3.5-mini	3.8B	0.2887	0.1877	0.6028	0.2634	0.1640	0.5256	0.0025	0.5193
Mistral	7B	0.4625	0.2867	0.7593	0.3722	0.2180	0.5783	0.0241	0.5612
Mistral-Nemo	12B	0.4718	0.3191	0.8086	0.3997	0.2420	0.6066	0.0291	0.6062
Qwen2.5	7B	0.3906	0.2534	0.7789	0.3390	0.2160	0.5598	0.0042	0.5351
LLaMA 3.1	8B	0.4453	0.2824	0.7639	0.3740	0.2520	0.5865	0.0335	0.5683
LLaMA 3.2	3B	0.3224	0.2329	0.6624	0.2976	0.2060	0.5462	0.0044	0.5059
🏆LLaMA 3.3 - 8bit	70B	0.5808	0.3686	0.8511	0.4656	0.2820	0.6600	0.0878	0.6093
domestic-yak-instruct	8B	0.5467	0.3362	0.7865	0.4480	0.3020	0.6910	0.0457	0.6267

📋Evaluation

To run the evaluation using the current version of macedonian-llm-eval you can follow the steps below:

Prerequisites

Before running the evaluation, ensure you have installed the necessary dependencies. First create an environment, e.g:

conda create -n mk_eval python==3.10
conda activate mk_eval

Then run:

pip install -e .

Run Evaluation

To evaluate a specific language model on a specific task run:

python3 main.py --language "Macedonian" --model hf-causal-experimental --model_args "pretrained=microsoft/Phi-3.5-mini-instruct" --tasks arc_challenge,arc_easy,boolq,hellaswag,openbookqa,piqa,winogrande --batch_size 8 --output_path "results_eval"

Info: You can run the evaluation for Serbian and Slovenian as well, just swap Macedonian with either one of them.

🈂️ (Optional) Translation

running this this will eat your google cloud credits or will bill you if you're already in the billing mode (this happens after you spend free credits and then deliberately enable billing again).
you can use your free credits to translate 500.000 chars / month!
if this is the first time you're creating a gcloud project you'll have 300$ of free credits!

Prerequisites

Before you begin, ensure you meet the following requirements:

For Linux Users:

Git and Miniconda

For Windows Users:

Windows Subsystem for Linux (WSL2). If you don't have WSL2 installed, follow these steps in Windows cmd/powershell in administrator mode:

wsl --install

// Check version and distribution name.
wsl -l -v

// Set the newly downloaded linux distro as default.
wsl --set-default <distribution name>

Install Git from the WSL terminal.

sudo apt update
sudo apt install git
git --version

Install Miniconda from the WSL terminal.

mkdir -p ~/miniconda3

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh

bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3

rm -rf ~/miniconda3/miniconda.sh

// Initialize conda with bash.
~/miniconda3/bin/conda init bash

Follow the instructions below on WSL.

📗 Instructions for translating from Serbian into Macedonian

First let's setup a minimal Python program that makes sure you can run Google Translate on your local machine.

Create a Google Console project (https://console.cloud.google.com/)
Enable Google Translation API -> to enable it you have to setup the billing and input your credit card details (a note regarding safety: you'll have 300$ of free credit (if this is the first time you're doing it) and no one can spend money from your credit card unless all those free credits are spent and you re-enable the billing again! if you already had it setup in that case you have 500.000 chars/month for free!)
Install Google Cloud CLI (gsutil) on your machine (see this: https://cloud.google.com/storage/docs/gsutil_install/)

a.) Download the Linux archive file (find latest version from link above)

curl -O https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-cli-455.0.0-linux-x86_64.tar.gz

b.) Extract the contents from the archive file above.

tar -xf google-cloud-cli-455.0.0-linux-x86_64.tar.gz

c.) Run installation script. ./google-cloud-sdk/install.sh

d.) Initiate and authenticate your account. ./google-cloud-sdk/bin/gcloud init

e.) Create a credentials file with gcloud auth application-default login
Create and setting up the conda env

a.) Open a terminal (if on Windows use the WSL terminal, if you're on Linux just use your terminal conda will already be in the PATH)

b.) Run conda create -n sr_mk_translate python=3.10 -y

c.) Run conda activate sr_mk_translate

d.) Run pip install google-cloud-translate
To run translation following these steps:

a.) Download the Serbian LLM evaluation dataset Serbian LLM Eval, or any other dataset of choice (just make sure to change the source language in translate.py - maybe the logic as well).

b.) Place the dataset in a data/ folder.

c.) Navigate to the translate/ directory.

d.) Set up the config.py file.

e.) Run the translation script:
```
python translate.py
```

🤝 How to Contribute?

We welcome contributions to the Macedonian LLM Eval! If you'd like to contribute, here’s how you can get involved:

Translate Popular Benchmarks:
- Identify benchmarks that have not yet been translated into Macedonian. For example, PubmedQA, SQuAD, or any other popular datasets.
- Translate the dataset into Macedonian using appropriate tools or methods (e.g., Google Translate API).
Fork and Modify the Repository:
- Fork this repo.
- Modify the necessary parts of the repository to support the new dataset. This includes:
  - Updating the evaluation script (lm_eval/tasks/<dataset_name>.py) to include the new benchmark.
  - Refer to existing implementations (e.g., ARC, SuperGLUE, HellaSwag) for guidance on how to implement evaluation logic.
Update and Modify the Script:
- Edit the evaluation script to include the new benchmark.
- Ensure all changes are tested and documented.
Open a PR:
- Open a PR to submit your changes.
- In your PR description, detail the following:
  - The benchmark you translated.
  - The modifications you made to the code.
  - How your changes were tested.
- If applicable, attach the modified evaluation script to your PR.

📝 TODOs

⬜️ Add COPA-MK to the eval (https://huggingface.co/datasets/classla/COPA-MK)

Name		Name	Last commit message	Last commit date
Latest commit History 49 Commits
.vscode		.vscode
docs		docs
lm_eval		lm_eval
scripts		scripts
templates		templates
tests		tests
translate		translate
translated		translated
.coveragerc		.coveragerc
.flake8		.flake8
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CITATION.bib		CITATION.bib
CODEOWNERS		CODEOWNERS
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
ignore.txt		ignore.txt
main.py		main.py
pile_statistics.json		pile_statistics.json
requirements.txt		requirements.txt
run_eval.sh		run_eval.sh
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Macedonian LLM eval 🇲🇰

🎯 What is currently covered:

📊 Latest Results - January 16, 2025

📋Evaluation

Prerequisites

Run Evaluation

🈂️ (Optional) Translation

Prerequisites

📗 Instructions for translating from Serbian into Macedonian

🤝 How to Contribute?

📝 TODOs

About

Releases 1

Packages

Contributors 3

Languages

License

LVSTCK/macedonian-llm-eval

Folders and files

Latest commit

History

Repository files navigation

Macedonian LLM eval 🇲🇰

🎯 What is currently covered:

📊 Latest Results - January 16, 2025

📋Evaluation

Prerequisites

Run Evaluation

🈂️ (Optional) Translation

Prerequisites

📗 Instructions for translating from Serbian into Macedonian

🤝 How to Contribute?

📝 TODOs

About

Resources

License

Stars

Watchers

Forks

Releases 1

Packages 0

Contributors 3

Languages

Packages