This service was born out of a need to convert currency amounts (e.g. "120,329 Shillings") and other numbers (e.g. "23.45 kilograms") into intelligible audio in Swahili for IVR messages to rural Tanzanians. When we talked to some developers from Jumo, we realised that this was a shared need, and thought this would be a great partnership opportunity.
We figured that this library could be very useful for others working to develop IVR applications for financial inclusion in Africa and around the world, and thus decided to release it publicly under an open source license. The hope for this project is that others can add their own languages and benefit financial inclusion projects around the world.
Yes. That's right.
Since 2017 we have been working alongside Teller and Ker-twang on building more inclusive financial services across a number of countries and markets. We learned that many people receive text messages from mobile money operators or cash disbursement programmes. For low-income people with poor literacy or poor eyesight, reading text messages with numbers on a screen isn't a great experience.
We've been crafting IVR (Interactive Voice Response) experiences that aim to engage these very people, and thus needed a way to read out numbers to them in their own native language. We believe that this library can be a small piece in enabling organizations around the world to better engage their customers over voice.
DFS Talk is able to convert complicated numbers into audio by stitching together individual audio files. There's no magic going on here with complex speech synthesis; we are just appending different words until it makes a number.
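As a rough illustration of the stitching idea, the sketch below joins pre-recorded clips in the order the words are spoken. The `stitchClips` helper and the in-memory `clips` map are hypothetical, and whether the service joins raw bytes like this or re-encodes the audio is an assumption; real audio formats may need their headers handled before clips can be appended cleanly.

```javascript
// Hypothetical helper: join pre-recorded clips in spoken order.
// `clips` is a Map from a word to the raw bytes (Buffer) of its recording.
function stitchClips(words, clips) {
  const parts = words.map((word) => {
    const clip = clips.get(word);
    if (!clip) {
      throw new Error(`No recording for "${word}"`);
    }
    return clip;
  });
  // Assumption: the clips share one encoding, so they can simply be appended.
  return Buffer.concat(parts);
}
```

For 119, the words "one", "hundred", "nineteen" would each resolve to one clip, and the output is those three clips back to back.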
Below are some examples of numbers that have been converted to audio:
(coming soon!)
This is a draft API, and is currently under review before wider implementation.
The API is specified in YAML using the OpenAPI v2 spec. You can find the API spec at ./swagger.yaml. You can also browse the API with the Swagger UI Editor here.
Supported Languages
Language | Code | Description |
---|---|---|
English | en_AU_male | Male English with Australian accent (courtesy of yours truly) |
Swahili | sw_TZ_male (coming soon) | Male Swahili with Tanzanian accent |
Swahili | sw_KE_male (coming soon) | Male Swahili with Kenyan accent |
Authentication is performed using Basic Auth. Simply include an auth header: Authorization: Basic <base64Encode('username:password')>
in your requests.
For example, where username=email@example.com and password=password, the auth header value is:
Basic ZW1haWxAZXhhbXBsZS5jb206cGFzc3dvcmQ=
e.g.:
curl -X POST "https://us-central1-dfs-talk.cloudfunctions.net/number/" \
-H "accept: application/json" \
-H "authorization: Basic ZW1haWxAZXhhbXBsZS5jb206cGFzc3dvcmQ=" \
-H "Content-Type: application/json" \
-d "{ \"language\": \"en_AU_male\", \"number\": 1032}"
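The header value used above can be derived in Node.js with the built-in `Buffer` class. This is a minimal sketch; the `basicAuthHeader` function name is hypothetical and not part of the library.

```javascript
// Build the value of the Authorization header for HTTP Basic Auth.
// basicAuthHeader is an illustrative name, not part of the DFS Talk API.
function basicAuthHeader(username, password) {
  const token = Buffer.from(`${username}:${password}`, 'utf8').toString('base64');
  return `Basic ${token}`;
}

console.log(basicAuthHeader('email@example.com', 'password'));
// prints "Basic ZW1haWxAZXhhbXBsZS5jb206cGFzc3dvcmQ="
```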
Below is a summary of the key endpoints to use the library. You can browse the full API here.
Generates audio for a given number and language code.
Request Format:
{
"language": "<supported language code>",
"number": integer
}
Response Format:
{
"expiry": <time in seconds>,
"url": "<url of the generated audio. Will be deleted after the expiry time>"
}
curl -X POST "https://us-central1-dfs-talk.cloudfunctions.net/number/" \
-H "accept: application/json" \
-H "authorization: Basic ZW1haWxAZXhhbXBsZS5jb206cGFzc3dvcmQ=" \
-H "Content-Type: application/json" \
-d "{ \"language\": \"en_AU_male\", \"number\": 239572}"
Response:
{
"expiry": 86400,
"url": "https://www.googleapis.com/download/storage/v1/b/dfs-talk.appspot.com/o/generated%2Fee19ce03-5a41-4e90-a113-dfc043c57d4e.mp3?alt=media&token=1111222233334444"
}
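The same request can be made from Node.js. The sketch below assumes Node 18 or later (for the global `fetch`) and treats `expiry` as a number of seconds counted from when the response is received, which is an interpretation rather than something the spec here confirms. The function names are hypothetical.

```javascript
const ENDPOINT = 'https://us-central1-dfs-talk.cloudfunctions.net/number/';

// Assemble the fetch() arguments for a request to the number endpoint.
function buildNumberRequest(language, number, authHeader) {
  if (!Number.isInteger(number)) {
    throw new TypeError('number must be an integer');
  }
  return {
    url: ENDPOINT,
    options: {
      method: 'POST',
      headers: {
        accept: 'application/json',
        authorization: authHeader,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ language, number }),
    },
  };
}

// Assumption: `expiry` is relative to the time the response was received.
function expiryDate(response, receivedAt = new Date()) {
  return new Date(receivedAt.getTime() + response.expiry * 1000);
}

// Send the request and return the parsed body: { expiry, url }.
async function requestNumberAudio(language, number, authHeader) {
  const { url, options } = buildNumberRequest(language, number, authHeader);
  const res = await fetch(url, options);
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}`);
  }
  return res.json();
}
```

Since the generated audio is deleted after the expiry time, a caller that needs the file for longer would likely want to download it before `expiryDate` passes.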
coming soon!
We have a live deployment that is currently in private Alpha testing, and not yet ready for the public. To request access, get in touch with us at: lewis [at] vesselstech [dot] com.
If you would like to contribute a new language or audio for a given language, please get in touch with us at: lewis [at] vesselstech [dot] com.
The process will vary from language to language, but roughly follows these steps:
- Determine which unique words for a language need to be recorded. For example, in English this means all numbers from 0-20, all tens (10, 20, 30, etc.), hundred, thousand, and a few more such as 'minus', 'point' and 'and'.
- Write a function in JavaScript that takes a digit within a given number and returns a word or words. E.g. 119 in English is converted into "one", "hundred", "nineteen".
- Record the audio for each word, and add it to
functions/audio/<language_code>/
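The steps above can be sketched for English as follows. This is an illustrative sketch, not the library's own implementation: the function names, the supported range (integers up to 999,999 in magnitude) and the `<word>.mp3` file naming are all assumptions. It omits 'and' so that 119 matches the example given above.

```javascript
const ONES = ['zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven',
  'eight', 'nine', 'ten', 'eleven', 'twelve', 'thirteen', 'fourteen',
  'fifteen', 'sixteen', 'seventeen', 'eighteen', 'nineteen'];
const TENS = ['', '', 'twenty', 'thirty', 'forty', 'fifty', 'sixty',
  'seventy', 'eighty', 'ninety'];

// Words for an integer from 1 to 999 (returns [] for 0).
function belowThousand(n) {
  const words = [];
  if (n >= 100) {
    words.push(ONES[Math.floor(n / 100)], 'hundred');
    n %= 100;
  }
  if (n >= 20) {
    words.push(TENS[Math.floor(n / 10)]);
    n %= 10;
  }
  if (n > 0) {
    words.push(ONES[n]);
  }
  return words;
}

// Split an integer into the ordered list of words that need a recording.
function numberToWords(n) {
  if (!Number.isInteger(n) || Math.abs(n) > 999999) {
    throw new RangeError('expected an integer between -999999 and 999999');
  }
  if (n === 0) {
    return ['zero'];
  }
  const words = [];
  if (n < 0) {
    words.push('minus');
    n = -n;
  }
  if (n >= 1000) {
    words.push(...belowThousand(Math.floor(n / 1000)), 'thousand');
    n %= 1000;
  }
  words.push(...belowThousand(n));
  return words;
}

// Assumption: one recording per word, stored as <word>.mp3.
function audioPaths(words, languageCode) {
  return words.map((word) => `functions/audio/${languageCode}/${word}.mp3`);
}

console.log(numberToWords(119));
// prints [ 'one', 'hundred', 'nineteen' ]
```

Keeping the word list separate from the file lookup means a new language only needs its own word function and its own folder of recordings.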
firebase login #only if you haven't logged in in some time
cd functions && npm install
cd ..
make switch-prod
make deploy
DFS Talk is licensed under the GNU GPL v3.0 license. See the LICENSE file for more info.
Copyright (c) 2019 Vessels Tech
All of this work is made possible with generous funding from DFSLab.