[arm] Re-scrapes Armenian data (#338)
* Added French phonemic phones list. Added filter French phonemic tsv.

* Added French phonemic phones.

* Updated Changelog.

* Added phones

* Added filtered phonemic wordlist

* Added Serbo-Croatian phonemes and filtered TSV files.

* Updated summaries for Serbo-Croatian phones.

* Updated CHANGELOG.

* Fixed formatting of Serbo-Croat phones file and CHANGELOG.

* Updated fork to match upstream.

* Updated fork to match upstream

* Delete .DS_Store

I don't know where this file came from...

* Delete .DS_Store

* Delete hbs_phonemic_phones.txt

* Delete .DS_Store

* [ita] Adds phoneme list, filtered phonemic TSV file

* Updates CHANGELOG

* Adds updated README and language summary

* Updates CHANGELOG with issue number for Italian phone list

* Adds Adyghe phones, filtered Adyghe data

* Updated CHANGELOG

* Adds Bulgarian phone list, filtered Bulgarian data

* Postprocesses with filtered Bulgarian data

* Updates changelog

* Adds Icelandic phones, filtered TSV file

* Updates changelog

* Adds Slovenian phones, filtered Slovenian data

* Updates changelog

* Add normalization to list_phones.py

* Updates changelog

* Reformats list_phones.py

* Adds Welsh phoneme lists, filtered Welsh TSV data

* Updates changelog

* Updates  with instructions to re-scrape

* Updates changelog

* Updates

* Updates data/phones/README.md

* Adds Vietnamese phones, Vietnamese TSV files

* Updates changelog

* Adds Hindi  file, new/updated TSV files

* Updates changelog

* Fixes Serbo-Croatian phones

* Updates CHANGELOG

* Revert "Adds Hindi  file, new/updated TSV files"

This reverts commit 964c3be.

* Adds Portuguese .phones files, re-scraped TSV data

* Rescrapes Portuguese data

* Updates changelog

* Adds Burmese phones, updated Burmese data

* Updates changelog

* Adds Japanese phone list. Rescrapes Japanese data

* Updates changelog

* Removes data/tsv/jpn_hira_phonemic.tsv

* Adds Azerbaijani phones, updated TSV data

* Updates changelog

* Adds Turkish phones, rescraped Turkish data

* Updates changelog

* Adds Maltese phones, updated data

* Updates changelog

* Adds Latvian phones, updated Latvian data

* Updates changelog

* Adds Khmer phones and updated TSV data

* Updates changelog

* Adds Østnorsk (Bokmål) phones and updated TSV data

* Updates changelog

* Fixes typo

* Update data/phones/README.md

* Update changelog

* Re-scrapes Armenian data. Fixes error in West Armenian phone list

* Updates changelog

Co-authored-by: Kyle Gorman <[email protected]>
ajmalanoski and kylebgorman authored Jan 27, 2021
1 parent 1e85909 commit 67e43a6
Showing 8 changed files with 3,447 additions and 339 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -107,6 +107,8 @@ Unreleased
 - Updated `data/phones/README.md` to specify that `.phones` files should be
 in NFC normalization form. (\#333)
 - Kurdish (`kur`) and Opata (`opt`) removed from `languages.json`. (\#334)
+- Re-scraped Armenian data. Fixed an error in West Armenian phone list.
+(\#338)

 #### Fixed

8 changes: 4 additions & 4 deletions data/README.md
@@ -13,10 +13,10 @@
 | [TSV](tsv/grc_phonemic.tsv) | grc | Ancient Greek (to 1453) | Ancient Greek | True | Phonemic | 90,222 |
 | [TSV](tsv/ara_phonemic.tsv) | ara | Arabic | Arabic | False | Phonemic | 7,279 |
 | [TSV](tsv/arc_hebr_phonemic.tsv) | arc | Imperial Aramaic (700-300 BCE); Official Aramaic (700-300 BCE) | Aramaic (Hebrew) | False | Phonemic | 1,156 |
-| [TSV](tsv/arm_e_phonetic.tsv) | arm | Armenian | Armenian (Eastern Armenian, standard) | True | Phonetic | 14,129 |
-| [TSV](tsv/arm_e_phonetic_filtered.tsv) | arm | Armenian | Armenian (Eastern Armenian, standard) | True | Phonetic_filtered | 14,122 |
-| [TSV](tsv/arm_w_phonetic.tsv) | arm | Armenian | Armenian (Western Armenian, standard) | True | Phonetic | 13,035 |
-| [TSV](tsv/arm_w_phonetic_filtered.tsv) | arm | Armenian | Armenian (Western Armenian, standard) | True | Phonetic_filtered | 12,073 |
+| [TSV](tsv/arm_e_phonetic.tsv) | arm | Armenian | Armenian (Eastern Armenian, standard) | True | Phonetic | 14,182 |
+| [TSV](tsv/arm_e_phonetic_filtered.tsv) | arm | Armenian | Armenian (Eastern Armenian, standard) | True | Phonetic_filtered | 14,177 |
+| [TSV](tsv/arm_w_phonetic.tsv) | arm | Armenian | Armenian (Western Armenian, standard) | True | Phonetic | 14,065 |
+| [TSV](tsv/arm_w_phonetic_filtered.tsv) | arm | Armenian | Armenian (Western Armenian, standard) | True | Phonetic_filtered | 14,040 |
 | [TSV](tsv/rup_phonetic.tsv) | rup | Macedo-Romanian | Aromanian | True | Phonetic | 149 |
 | [TSV](tsv/asm_phonemic.tsv) | asm | Assamese | Assamese | False | Phonemic | 2,354 |
 | [TSV](tsv/ast_phonetic.tsv) | ast | Asturian | Asturian | True | Phonetic | 133 |
8 changes: 4 additions & 4 deletions data/languages_summary.tsv
@@ -11,10 +11,10 @@ ale_phonemic.tsv ale Aleut Aleut True Phonemic 104
 grc_phonemic.tsv grc Ancient Greek (to 1453) Ancient Greek True Phonemic 90222
 ara_phonemic.tsv ara Arabic Arabic False Phonemic 7279
 arc_hebr_phonemic.tsv arc Imperial Aramaic (700-300 BCE); Official Aramaic (700-300 BCE) Aramaic (Hebrew) False Phonemic 1156
-arm_e_phonetic.tsv arm Armenian Armenian (Eastern Armenian, standard) True Phonetic 14129
-arm_e_phonetic_filtered.tsv arm Armenian Armenian (Eastern Armenian, standard) True Phonetic_filtered 14122
-arm_w_phonetic.tsv arm Armenian Armenian (Western Armenian, standard) True Phonetic 13035
-arm_w_phonetic_filtered.tsv arm Armenian Armenian (Western Armenian, standard) True Phonetic_filtered 12073
+arm_e_phonetic.tsv arm Armenian Armenian (Eastern Armenian, standard) True Phonetic 14182
+arm_e_phonetic_filtered.tsv arm Armenian Armenian (Eastern Armenian, standard) True Phonetic_filtered 14177
+arm_w_phonetic.tsv arm Armenian Armenian (Western Armenian, standard) True Phonetic 14065
+arm_w_phonetic_filtered.tsv arm Armenian Armenian (Western Armenian, standard) True Phonetic_filtered 14040
 rup_phonetic.tsv rup Macedo-Romanian Aromanian True Phonetic 149
 asm_phonemic.tsv asm Assamese Assamese False Phonemic 2354
 ast_phonetic.tsv ast Asturian Asturian True Phonetic 133
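
The size of the re-scrape is easier to judge as deltas than as raw counts. The sketch below takes the four Armenian entry counts from the summary rows above and computes the change per file, plus the share of entries that survive filtering before and after. The dictionary layout and the helper name `retention` are illustrative and are not part of the repository.

```python
# Entry counts copied from the Armenian summary rows: (before, after).
counts = {
    "arm_e_phonetic": (14129, 14182),
    "arm_e_phonetic_filtered": (14122, 14177),
    "arm_w_phonetic": (13035, 14065),
    "arm_w_phonetic_filtered": (12073, 14040),
}


def retention(unfiltered: int, filtered: int) -> float:
    """Fraction of scraped entries that survive phone-list filtering."""
    return filtered / unfiltered


# Per-file change in entry count.
deltas = {name: after - before for name, (before, after) in counts.items()}
for name, (before, after) in counts.items():
    print(f"{name}: {before} -> {after} ({deltas[name]:+d})")

# Western Armenian retention, before and after the re-scrape.
w_before = retention(counts["arm_w_phonetic"][0], counts["arm_w_phonetic_filtered"][0])
w_after = retention(counts["arm_w_phonetic"][1], counts["arm_w_phonetic_filtered"][1])
print(f"Western Armenian retention: {w_before:.1%} -> {w_after:.1%}")
```

The Western Armenian filtered file grows by far more than the unfiltered one, so retention rises from about 93% to over 99%. That is consistent with the phone list fix in this commit, although the commit itself does not state the cause.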
1 change: 1 addition & 0 deletions data/phones/arm_w_phonetic.phones
@@ -8,6 +8,7 @@
 i
 ɔ
 u
+ʏ # allophone that is automatically transcribed from word-medial յու /ju/
 #
 # Older entries might contain [o] or [e]. These have been fixed. You shouldn't find these phones anymore.
 #
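
To illustrate how a phone list like this one relates to the filtered TSV files, here is a minimal sketch. It is not the repository's actual script. It assumes that a `.phones` file holds one phone per line, that `#` starts a comment, and that pronunciations are space-separated segments. The function names `read_phones`, `not_nfc`, and `keep_entry` are hypothetical.

```python
import unicodedata


def read_phones(lines):
    """Collect phones from .phones-style lines, skipping comments and blanks."""
    phones = set()
    for line in lines:
        # Assumption: everything after '#' is a comment.
        phone = line.split("#", 1)[0].strip()
        if phone:
            phones.add(phone)
    return phones


def not_nfc(phones):
    """Return the phones that are not in NFC normalization form."""
    return sorted(p for p in phones if unicodedata.normalize("NFC", p) != p)


def keep_entry(pron, phones):
    """True if every space-separated segment of pron is in the phone list."""
    return all(seg in phones for seg in pron.split())


# Sample lines modeled on the hunk above.
sample = [
    "i",
    "ɔ",
    "u",
    "ʏ # allophone that is automatically transcribed from word-medial յու /ju/",
    "#",
    "# Older entries might contain [o] or [e].",
]
phones = read_phones(sample)
print(sorted(phones))
print(not_nfc(phones))
print(keep_entry("u ʏ", phones), keep_entry("u o", phones))
```

Under these assumptions, a pronunciation containing `ʏ` is dropped from the filtered file whenever the phone list lacks it, which is why a single missing line in the list can remove many entries.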