Quran recitations split by Aya #2994

obadx · 2024-12-04T04:48:20Z

Peace be upon you.
May Allah reward you with goodness for this effort.
I am working on an AI Quran recitation teacher project, and I need recitations split by Aya as a training dataset for the segmenter part of the project.

Could you please give me access to your recitations split by Aya?

ahmedre · 2024-12-04T21:33:18Z

Peace be upon you too, and the mercy and blessings of Allah.
May Allah reward you with goodness.

if you need full mp3s (i.e. one mp3 per sura) and timing files, I recommend using the Quran.com API for this.

if you want an mp3 per ayah, you can try everyayah.com
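
For the per-ayah route, a small helper can build the download URL for a given sura and ayah. This is a minimal sketch, not an official client: the base URL, the `Alafasy_128kbps` folder name, and the zero-padded `SSSAAA.mp3` file naming are assumptions about how everyayah.com lays out its files and should be checked against the site before use.

```python
def ayah_audio_url(
    sura: int,
    ayah: int,
    reciter_folder: str = "Alafasy_128kbps",  # assumed folder name
    base_url: str = "https://everyayah.com/data",  # assumed base URL
) -> str:
    """Build a per-ayah mp3 URL using an assumed SSSAAA.mp3 naming scheme."""
    if not 1 <= sura <= 114:
        raise ValueError("sura must be between 1 and 114")
    if ayah < 1:
        raise ValueError("ayah must be a positive integer")
    # Both numbers are zero-padded to three digits, e.g. 002255 for 2:255.
    return f"{base_url}/{reciter_folder}/{sura:03d}{ayah:03d}.mp3"


print(ayah_audio_url(2, 255))
```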

obadx · 2024-12-05T04:59:34Z

May Allah reward you with the best reward 😍😍
Another question: how do you generate the timings for every ayah?

ahmedre · 2024-12-05T19:31:14Z

typically, this is done using machine learning models - initially it was this project, but others have built other models since.

@nabil6391 maybe you can shed some light since most of the best recent work on this was done by the gtaf.org team.

obadx · 2024-12-06T19:09:00Z

Thank you for your help. Let me clarify what we are building:

We aim to build an AI recitation teacher similar to tarteel.ai, not as an application but as an open-source component that can be included in any application, with these new features:
- The machine learning model can run robustly on edge devices.
- The AI teacher can detect errors in the articulation points of letters (makharij al-huruf) and in their attributes (sifat), such as tafkhim and tarqiq (heavy and light articulation), as well as errors in tajweed and Quranic pronunciation rules.

To do this we need two things (data, machine learning model):

The data is a tuple (recitation, phonetic script representing tajweed and Quranic pronunciation rules).
For the machine learning model, we have to select and tweak more than one candidate, such as Whisper, Conformer, ...

The data

The data is the most important part. No data of the form (recitation, phonetic script representing tajweed and Quranic pronunciation rules) is available, so what can we do?

- Gather raw recitations by professional reciters from the web.
- Split these recitations by pause (waqf, وقف), not by ayah (this is the purpose of the issue I'm raising). Why by pause and not by ayah?
  - Some tajweed rules apply only at a pause, such as madd al-'arid lil-sukun (مد العارض للسكون), madd al-'iwad (مد العوض), and the tarqiq and tafkhim of the letter ra (ترقيق وتفخيم الراء).
  - If the reciter recites the ayah in more than one stretch, pausing partway through, splitting by ayah would miss the annotation of these rules.
- Use a machine learning model to extract Imlaey transcripts like this
- Convert the Imlaey script to the phonetic script.
- Are we done? No: the data we create this way contains no errors at all, because these are perfect recitations. What to do?
- Use forced alignment techniques to induce errors, such as shortening the madd or replacing س with ص.
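
The pause-splitting step can be illustrated with a simple energy threshold. This is only a sketch under stated assumptions, not the segmenter the project intends to train: the function name, the 20 ms frame size, the energy threshold, and the 300 ms minimum pause are all hypothetical values, and real recitations (with reverb and breath noise) would need a more robust detector.

```python
from typing import List, Sequence, Tuple


def split_on_pauses(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: int = 20,              # hypothetical frame size
    energy_threshold: float = 1e-3,  # hypothetical silence threshold
    min_pause_ms: int = 300,         # hypothetical minimum pause length
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) spans of speech separated by long silences."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    min_pause_frames = max(1, min_pause_ms // frame_ms)

    # Mark each frame as voiced when its mean energy exceeds the threshold.
    voiced = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        voiced.append(sum(x * x for x in frame) / len(frame) > energy_threshold)

    segments: List[Tuple[float, float]] = []
    seg_start = None
    silent_run = 0
    for i, is_voiced in enumerate(voiced):
        if is_voiced:
            if seg_start is None:
                seg_start = i
            silent_run = 0
        elif seg_start is not None:
            silent_run += 1
            if silent_run >= min_pause_frames:
                # Close the segment at the first frame of the silent run.
                seg_end = i - silent_run + 1
                segments.append((seg_start * frame_len / sample_rate,
                                 seg_end * frame_len / sample_rate))
                seg_start, silent_run = None, 0

    if seg_start is not None:
        # Audio ended mid-segment: drop any trailing short silence.
        end_sample = min((len(voiced) - silent_run) * frame_len, len(samples))
        segments.append((seg_start * frame_len / sample_rate,
                         end_sample / sample_rate))
    return segments
```

Silences shorter than the minimum pause stay inside a segment, so short gaps within continuous recitation do not cut it apart.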

Are we done? No, we need to gather actual data (recitations annotated by expert reciters) after releasing the model, to make it better.

If you have further suggestions, ideas, or contributions, I will be happy to hear them @ahmedre @nabil6391
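
The error-induction idea can be sketched on the label side only. This is a simplification and not the forced-alignment approach itself: it corrupts a phonetic sequence and records where the errors were placed, while the matching audio manipulation is not shown. The `Phone` type, the `length` field (a count of beats for a madd), and the shortening target of 2 are hypothetical; only the س to ص substitution comes from the comment above.

```python
import random
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Phone:
    symbol: str
    length: int = 1  # hypothetical: beats held, > 2 for a lengthened madd


# Only this pair is mentioned in the thread; extend it with care.
SUBSTITUTIONS = {"س": "ص"}


def inject_errors(
    phones: List[Phone], error_rate: float, rng: random.Random
) -> Tuple[List[Phone], List[Tuple[int, str]]]:
    """Return a corrupted copy of `phones` plus (index, error_type) labels."""
    corrupted: List[Phone] = []
    labels: List[Tuple[int, str]] = []
    for index, phone in enumerate(phones):
        if rng.random() < error_rate:
            if phone.symbol in SUBSTITUTIONS:
                # Swap the letter for its confusable counterpart.
                corrupted.append(replace(phone, symbol=SUBSTITUTIONS[phone.symbol]))
                labels.append((index, "substitution"))
                continue
            if phone.length > 2:
                # Shorten a long madd down to two beats.
                corrupted.append(replace(phone, length=2))
                labels.append((index, "short_madd"))
                continue
        corrupted.append(phone)
    return corrupted, labels
```

Passing a seeded `random.Random` keeps the generated errors reproducible across runs.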

ahmedre · 2024-12-06T19:15:38Z

I'll let Nabil answer since he knows this space better - but will share a few small things - first, the Tarteel dataset used to be open source, but it is closed now. They do link to the DeepSearch-Quran project which might be interesting to you.

obadx · 2024-12-07T15:54:49Z

> Peace be upon you too, and the mercy and blessings of Allah. May Allah reward you with goodness.
>
> if you need full mp3s (i.e. one mp3 per sura) and timing files, I recommend using the Quran.com API for this.
>
> if you want an mp3 per ayah, you can try everyayah.com

I noticed something strange when calling the /recitations endpoint of api.quran.com: I got only 12 recitations, but the quran_android application offers more than 50!

The full output of the /recitations endpoint:

{
    "recitations": [
        {
            "id": 2,
            "reciter_name": "AbdulBaset AbdulSamad",
            "style": "Murattal",
            "translated_name": {
                "name": "AbdulBaset AbdulSamad",
                "language_name": "english"
            }
        },
        {
            "id": 1,
            "reciter_name": "AbdulBaset AbdulSamad",
            "style": "Mujawwad",
            "translated_name": {
                "name": "AbdulBaset AbdulSamad",
                "language_name": "english"
            }
        },
        {
            "id": 3,
            "reciter_name": "Abdur-Rahman as-Sudais",
            "style": null,
            "translated_name": {
                "name": "Abdur-Rahman as-Sudais",
                "language_name": "english"
            }
        },
        {
            "id": 4,
            "reciter_name": "Abu Bakr al-Shatri",
            "style": null,
            "translated_name": {
                "name": "Abu Bakr al-Shatri",
                "language_name": "english"
            }
        },
        {
            "id": 5,
            "reciter_name": "Hani ar-Rifai",
            "style": null,
            "translated_name": {
                "name": "Hani ar-Rifai",
                "language_name": "english"
            }
        },
        {
            "id": 12,
            "reciter_name": "Mahmoud Khalil Al-Husary",
            "style": "Muallim",
            "translated_name": {
                "name": "Mahmoud Khalil Al-Husary",
                "language_name": "english"
            }
        },
        {
            "id": 6,
            "reciter_name": "Mahmoud Khalil Al-Husary",
            "style": null,
            "translated_name": {
                "name": "Mahmoud Khalil Al-Husary",
                "language_name": "english"
            }
        },
        {
            "id": 7,
            "reciter_name": "Mishari Rashid al-`Afasy",
            "style": null,
            "translated_name": {
                "name": "Mishari Rashid al-`Afasy",
                "language_name": "english"
            }
        },
        {
            "id": 9,
            "reciter_name": "Mohamed Siddiq al-Minshawi",
            "style": "Murattal",
            "translated_name": {
                "name": "Mohamed Siddiq al-Minshawi",
                "language_name": "english"
            }
        },
        {
            "id": 8,
            "reciter_name": "Mohamed Siddiq al-Minshawi",
            "style": "Mujawwad",
            "translated_name": {
                "name": "Mohamed Siddiq al-Minshawi",
                "language_name": "english"
            }
        },
        {
            "id": 10,
            "reciter_name": "Sa`ud ash-Shuraym",
            "style": null,
            "translated_name": {
                "name": "Sa`ud ash-Shuraym",
                "language_name": "english"
            }
        },
        {
            "id": 11,
            "reciter_name": "Mohamed al-Tablawi",
            "style": null,
            "translated_name": {
                "name": "Mohamed al-Tablawi",
                "language_name": "english"
            }
        }
    ]
}

@ahmedre
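
A response of this shape can be summarized per reciter with a few lines of Python. This is a minimal sketch that assumes only the field names visible in the output above (`recitations`, `reciter_name`, `style`); the function name and the `"unspecified"` placeholder for a null style are hypothetical, and the sample payload is a shortened copy of the response rather than a live API call.

```python
import json
from collections import defaultdict
from typing import Dict, List


def styles_by_reciter(payload: dict) -> Dict[str, List[str]]:
    """Group recitation styles under each reciter name."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entry in payload["recitations"]:
        # A null style in the response is reported as "unspecified".
        grouped[entry["reciter_name"]].append(entry["style"] or "unspecified")
    return dict(grouped)


# Shortened copy of the response above, used here instead of a network call.
sample = json.loads("""
{"recitations": [
  {"id": 2, "reciter_name": "AbdulBaset AbdulSamad", "style": "Murattal"},
  {"id": 1, "reciter_name": "AbdulBaset AbdulSamad", "style": "Mujawwad"},
  {"id": 3, "reciter_name": "Abdur-Rahman as-Sudais", "style": null}
]}
""")

summary = styles_by_reciter(sample)
print(len(sample["recitations"]), "recitations,", len(summary), "reciters")
```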

ahmedre · 2024-12-07T17:29:41Z

today they are not synced together, but that's one of the things we hope to do in sha' Allah in the future.

obadx · 2024-12-07T20:17:26Z

I'm very sorry for this long thread, and truly very grateful for your help. May Allah reward you with the best reward. I hope to contribute with you on future projects, in sha' Allah. Could you please give me the recitation metadata used in the Android app (the app's API schema, links, and timings)? More data means better results, and it will be hard for me to wait until it is published on the API, in sha' Allah. @ahmedre

Note: As you know, api.quran.com and the Quran web app offer a limited set of recitations, while the Android app has the richest collection.

nabil6391 · 2024-12-08T04:12:38Z

Assalamu Alaikum. MashaAllah brother, you have a lot of knowledge in this area, and yes, you are heading in the right direction.

I have used https://everyayah.com/ for training the model; obviously I had to filter out ayahs longer than 30s.
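
The duration filter mentioned above could look like the following. This is a sketch under assumptions: the `AyahClip` type and its fields are hypothetical, and durations are taken from start and end timings rather than read from the mp3 files. The comment does not say why 30 s was chosen; one plausible reason is that models such as Whisper operate on 30-second input windows.

```python
from typing import List, NamedTuple


class AyahClip(NamedTuple):
    sura: int
    ayah: int
    start_s: float  # hypothetical: clip start time in seconds
    end_s: float    # hypothetical: clip end time in seconds

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s


def keep_short_clips(clips: List[AyahClip], max_seconds: float = 30.0) -> List[AyahClip]:
    """Keep only the clips whose duration does not exceed `max_seconds`."""
    return [clip for clip in clips if clip.duration_s <= max_seconds]
```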

We at gtaf.org share some of your goals, and we are considering open-sourcing our model as well, in sha Allah. If you want to know more and collaborate, just contact me at Nabil@GTAF (nabil6391) on Discord or [email protected].

obadx · 2024-12-08T04:30:53Z

I have sent you a friend request on Discord; my username is abdullah.aml. @nabil6391

nacer80 · 2024-12-11T10:11:02Z

Peace be upon you, and the mercy and blessings of Allah.

I hope everyone is in good health.
I have noticed many attempts to find a way to determine the timing (timestamps) of Quranic words within a verse or an entire surah, but so far there is no precise tool for this. We all know the linguistic complexities of the Arabic language in general and of the Quran in particular: diacritical marks such as tanween in its various forms, and tajweed rules like ikhfa, idgham, and iqlab, can make it difficult to separate some words from each other. The matter becomes more complicated with other recitations like Warsh, where the hamza can sometimes be merged with the following letter, as in the word "الأرض" (the earth).

In general, we hope that all brothers will intensify their efforts and cooperate to find a model for converting audio to text and extracting the timing of each word specifically for the Quran and developing it.
May Allah reward you all.
