diff --git a/index.js b/index.js index 2cc98e26f..7c7c0869f 100644 --- a/index.js +++ b/index.js @@ -136,6 +136,7 @@ URLS=[ "tf/convert/xml.html", "tf/convert/xmlCustom.html", "tf/convert/addnlp.html", +"tf/convert/makewatm.html", "tf/convert/tei.html", "tf/convert/watm.html", "tf/convert/mql.html", @@ -3585,7 +3586,7 @@ INDEX=[ { "ref":"tf.about.releases", "url":72, -"doc":" Release notes ! hint \"Consult the tutorials after changes\" When we change the API, we make sure that the tutorials show off all possibilities. See the app-specific tutorials via tf.about.corpora . - The TEI converter is still in active development. If you need the latest version, clone the TF repo and in its top-level directory run the command: sh pip install -e . 12 12.4 12.4.2 2024-04-24 Tiny fixes in the TEI and WATM conversions. 12.4.1 2024-04-24 Improvements in the TEI and WATM conversions. 12.4.0 2024-04-21 Support for [Marimo notebooks](https: docs.marimo.io/index.html). TF detects when its run in a notebook, and also in what kind of notebook: ipython (~ Jupyter) or marimo . When it needs to display material in the output of a cell, it will choose the methods that are suitable for the kind of notebook it is working in. 12.3 12.3.7 2024-04-19 Improvements in tf.convert.watm : the resulting data is much more compact, because: you can choose to export it as TSV instead of JSON; no annotations of type node are produced anymore, they only served to map annotations and text segments to TF nodes; now that mapping is exported as a simple TSV file; you can opt to exclude an arbitrary set of tags from being exported as WATM annotations. 12.3.6 2024-04-16 Minimal fixes in tf.convert.tei : it can handle a biography. Fixed prettyTuple() when passed _asString=True : it did not pass this on to pretty() which caused a Python error. 12.3.5 2024-03-26 extra functionality: When adding types with tf.dataset.modify you can link nodes of a newly added type to nodes that were added as the preiviously added type. This is a bit of a limited and ad hoc extension of the functionality of this function. I needed a quick fix to add nodes for entities and entity occurrences at the same time and link them with edges. This is for the corpus [ CLARIAH/wp6-missieven ](https: github.com/CLARIAH/wp6-missieven). fix: the express download of a dataset (complete.zip) was nit triggered in all cases where it should. 12.3.4 2024-02-26 The output of tf.convert.watm has been changed. It now generates token files per section, where you can configure the TF section level for that. The syntax for targets has been changed: more things are possible. Tests have been adapted and strengthened. 12.3.3 2024-02-20 Fix in tf.advanced.repo.publishRelease : it did not work if you are on a branch named main because master was hard-coded in the source code. Now it takes the branch name from the app context. Do not forget to specify branch: main under the provenanceSpecs in your /app/config.yaml . Many thanks to Tony Jorg for reporting this error. 12.3.1,2 2024-02-15 Minor improvements to the WATM converter, and an update of its docs. 12.3.0 2024-02-08 A new data export conversion, from TF to WATM. See tf.convert.watm . WATM is a not yet hardened data format that powers the publishing line for text and annotations built by Team Text at [KNAW/HuC Digital Infrastructure](https: di.huc.knaw.nl/text-analysis-en.html). 
Currently this export is used for the corpora [Mondriaan Proeftuin](https: github.com/annotation/mondriaan) [Suriano Letters](https: gitlab.huc.knaw.nl/suriano/letters) [TransLatin Corpus](https: gitlab.huc.knaw.nl/translatin/corpus) Small fixes in tf.convert.addnlp : when the NLP data is integrated in the TF dataset, the NLP-generated features will get some metadata 12.2 12.2.8,9,10 2024-01-24/25 TF can auto-download extra data with a TF dataset, e.g. a directory with named entities ( ner ) as in the [suriano corpus](https: gitlab.huc.knaw.nl/suriano/letters). However, this only worked when the repo was in the github backend and the extra data had been packed for express-download and attached to a release. Now it also works with the normal download methods using the GitHub and GitLab APIs. So, after the move of Suriano from GitHub to GitLab, this functionality is still available. There was a glitch in the layout of the NER tool which caused section labels to be chopped off at the margin, only in notebooks. Thats has been fixed by moving some CSS code from one file to an other. 12.2.7 2024-01-23 There were issues with starting up the Text-Fabric browser: If the system could not start the browser, the TF stopped the webserver. That is not helpful, because one can always open a browser and enter the url in the address bar. Now TF shows the url rather prominently when it does not open a browser. If debug mode is on, Flask reloads the whole process, and that might include opening the browser as well. Now Flask only opens the browser after the startup of the webserver, and not anymore after successive reloads. 12.2.6 2024-01-15 Somehow the express way of downloading data (via complete.zip attached to the latest release) did not get triggered in all cases where it should. It is now triggered in more cases than before. 12.2.5 2023-12-18 Small fix in NER browser: prevent submitting the form if the focus is in a textarea field or in an input field that does not have type=submit. 12.2.3,4 2023-12-09 Writing support for Ugaritic, thanks to Martijn Naaijer and Christian H\u00f8jgaard for converting a Ugaritic corpus to TF. Fix in display functions (continued): The logic of feature display, fixed in the previous version, was not effective when things are displayed in the TF browser. Because in the TF browser the features of the last query were passed as extraFeatures instead of tupleFeatures . This has been fixed by using tupleFeatures in the TF browser as well. 12.2.2 2023-12-02 Fix in display functions, thanks to Tony Jurg: if you do A.pretty(x, queryFeatures=False, extraFeatures=\"yy zz\") the extra features were not shown. So there was no obvious way to control exactly the features that you want to show in a display. That has been fixed. Further clarification: the node features that are used by a query are stored in the display option tupleFeatures . That is what causes them to be displayed in subsequent display statements. You can also explicitly set/pass the tupleFeatures parameter. However, the fact that queryFeatures=False prohibited the display of features mentioned in extraFeatures was against the intuitions. Improvements in the PageXML conversion. 
There are token features str , after that reflect the logical tokens There are token features rstr , rafter that reflect the physical tokens The distincition between logical and physical is that physical token triplets with the soft hyphen as the middle one, are joined to one logical token; this happens across line boundaries, but also region and page boundaries. 12.2.0,1 2023-11-28 New conversion tool: from PageXML. Still in its infancy. It uses the [PageXML tools](https: github.com/knaw-huc/pagexml) by Marijn Koolen. For an example see [translatin/logic](https: gitlab.huc.knaw.nl/translatin/logic/-/blob/main/tools/convertPlain.ipynb?ref_type=heads). Fix: TF did not fetch an earlier version of a corpus if the newest release contains a complete.zip (which only has the latest version). For some technical reason that still escapes me, the TF browser was slow to start. Fixed it by saying threaded=True to Flask, as suggested on [stackoverflow](https: stackoverflow.com/a/11150849/15236220) From now on: TF does not try to download complete.zip if you pass a version argument to the use() command. 12.1 12.1.6,7 2023-11-15 Various fixes: Some package data was not included for the NER annotation tool. In the NER tool, the highlighting of hits of the search pattern is now exact, it was sometimes off. Deleted tf.tools.docsright again, but developed it further in [docsright](https: github.com/annotation/docsright). 12.1.5 2023-11-02 Improvement in dependencies. Text-Fabric is no longer mandatory dependent on openpyxl , pandas , pyarrow , lxml . The optional dependencies on pygithub and python-gitlab remain, but most users will never need them, because TF can also fetch the complete.zip that is available as release asset for most corpora. Whenever TF invokes a module that is not in the mandatory dependencies, it will act gracefully, providing hints to install the modules in question. 12.1.3,4 2023-11-01 API change in the Annotator: Calling the annotator is now easier: A.makeNer() (No need to make an additional import statement.) This will give you access to all annotation methods, including using a spreadsheet to read annotation instructions from. Removal of deprecated commands (on the command line) in version 11: text-fabric (has become tf ) text-fabric-zip (has become tf-zip ) text-fabric-make (has become tf-make ) Bug fixes: [ 81](https: github.com/annotation/text-fabric/issues/81) and [ 82](https: github.com/annotation/text-fabric/issues/82) Spell-checked all bits of the TF docs here (33,000 lines). Wrote a script tf.tools.docsright to separate the code content from the markdown content, and to strip bits from the markdown content that lead to false positives for the spell checker. Then had the Vim spell checker run over those lines and corrected all mistakes by hand. Still, there might be grammar errors and content inaccuracies. 12.1.4 follows 12.1.3. quickly, because in corpora without a NER configuration file, TF did not start up properly. 12.1.1,2 2023-10-29 Bug fix: the mechanism to make individual exceptions when adding named entities in the tf.browser.ner.annotate tool was broken. Thanks to Daniel Swanson for spotting it. Additional fixes and enhancements. 12.1.0 2023-10-28 New stuff In the TF browser there will be a new tab in the vertical sidebar: Annotate , which will give access to manual annotation tools. I am developing the first one, a tool to annotate named entities efficiently, both in the TF browser and in a Jupyter Notebook. Reed more in tf.about.annotate . 
These tools will let you save your work as files on your own computer. In tf.convert.addnlp we can now extract more NLP information besides tokens and sentences: part-of-speech, morphological tagging, lemmatisation, named entity recognition Fixes in the TEI converter. 12.0 12.0.6,7 2023-09-13 Trivial fix in code that exports the data from a job in the TF browser. In the meanwhile there is unfinished business in the Annotate tab in the TF browser, that will come into production in the upcoming 12.1 release. The Chrome browser has an attractive feature that other browsers such as Safari lack: It supports the CSS property [content-visibility](https: developer.mozilla.org/en-US/docs/Web/CSS/content-visibility). With this property you can prevent the browser to do the expensive rendering of content that is not visible on the screen. That makes it possible to load a lot of content in a single page without tripping up the browser. You also need the [ IntersectionObserver API](https: developer.mozilla.org/en-US/docs/Web/API/Intersection_Observer_API), but that is generally supported by browsers. With the help of that API you can restrict the binding of event listeners to elements that are visible on the screen. So, you can open the TF browser in Chrome by passing the option chrome . But if Chrome is not installed, it will open in the default browser anyway. Also, when the opening of the browser fails somehow, the web server is stopped. 12.0.5 2023-07-10 Fixed references to static files that still went to /server instead of /browser . This has to do with the new approach to the TF browser. 12.0.0-4 2023-07-05 Simplification The TF browser no longer works with a separate process that holds the TF corpus data. Instead, the web server (flask) loads the corpus itself. This will restrict the usage of the TF browser to local-single-user scenarios. TF no longer exposes the installation options [browser, pandas] pip install 'text-fabric[browser]' pip install 'text-fabric[pandas]' If you work with Pandas (like exporting to Pandas) you have to install it yourself: pip install pandas pyarrow The TF browser is always supported. The reason to have these distinct capabilities was that there are python libraries involved that do not install on the iPad. The simplification of the TF browser makes it possible to be no longer dependent on these modules. Hence, TF can be installed on the iPad, although the TF browser works is not working there yet. But the auto-downloading of data from GitHub / GitLab works. Minor things Header. After loading a dataset, a header is shown with shows all kinds of information about the corpus. But so far, it did not show the TF app settings. Now they are included in the header. There are two kinds: the explicitly given settings and the derived and computed settings. The latter ones will be suppressed when loading a dataset in a Jupyter notebook, because these settings can become quite big. You can still get them with A.showContext() . In the TF browser they will be always included, you find it in the Corpus tab. - Older releases See tf.about.releasesold ." +"doc":" Release notes ! hint \"Consult the tutorials after changes\" When we change the API, we make sure that the tutorials show off all possibilities. See the app-specific tutorials via tf.about.corpora . - The TEI converter is still in active development. If you need the latest version, clone the TF repo and in its top-level directory run the command: sh pip install -e . 
12 12.4 12.4.3 2024-05-08 Fix in TF browser, spotted by Jorik Groen. When exporting query results, the values of features used in the query were not written to the table at all. The expected behaviour was that features used in the query lead to extra columns in the exported table. It has been fixed. The cause was an earlier fix in the display of features in query results. This new fix only affects the export function from the browser, not the advanced.display.export function, which did not have this bug. 12.4.2 2024-04-24 Tiny fixes in the TEI and WATM conversions. 12.4.1 2024-04-24 Improvements in the TEI and WATM conversions. 12.4.0 2024-04-21 Support for [Marimo notebooks](https: docs.marimo.io/index.html). TF detects when it is run in a notebook, and also in what kind of notebook: ipython (~ Jupyter) or marimo . When it needs to display material in the output of a cell, it will choose the methods that are suitable for the kind of notebook it is working in. 12.3 12.3.7 2024-04-19 Improvements in tf.convert.watm : the resulting data is much more compact, because: you can choose to export it as TSV instead of JSON; no annotations of type node are produced anymore, they only served to map annotations and text segments to TF nodes; now that mapping is exported as a simple TSV file; you can opt to exclude an arbitrary set of tags from being exported as WATM annotations. 12.3.6 2024-04-16 Minimal fixes in tf.convert.tei : it can handle a biography. Fixed prettyTuple() when passed _asString=True : it did not pass this on to pretty() which caused a Python error. 12.3.5 2024-03-26 extra functionality: When adding types with tf.dataset.modify you can link nodes of a newly added type to nodes that were added as the previously added type. This is a bit of a limited and ad hoc extension of the functionality of this function. I needed a quick fix to add nodes for entities and entity occurrences at the same time and link them with edges. This is for the corpus [ CLARIAH/wp6-missieven ](https: github.com/CLARIAH/wp6-missieven). fix: the express download of a dataset (complete.zip) was not triggered in all cases where it should. 12.3.4 2024-02-26 The output of tf.convert.watm has been changed. It now generates token files per section, where you can configure the TF section level for that. The syntax for targets has been changed: more things are possible. Tests have been adapted and strengthened. 12.3.3 2024-02-20 Fix in tf.advanced.repo.publishRelease : it did not work if you are on a branch named main because master was hard-coded in the source code. Now it takes the branch name from the app context. Do not forget to specify branch: main under the provenanceSpecs in your /app/config.yaml . Many thanks to Tony Jurg for reporting this error. 12.3.1,2 2024-02-15 Minor improvements to the WATM converter, and an update of its docs. 12.3.0 2024-02-08 A new data export conversion, from TF to WATM. See tf.convert.watm . WATM is a not yet hardened data format that powers the publishing line for text and annotations built by Team Text at [KNAW/HuC Digital Infrastructure](https: di.huc.knaw.nl/text-analysis-en.html). 
Currently this export is used for the corpora [Mondriaan Proeftuin](https: github.com/annotation/mondriaan) [Suriano Letters](https: gitlab.huc.knaw.nl/suriano/letters) [TransLatin Corpus](https: gitlab.huc.knaw.nl/translatin/corpus) Small fixes in tf.convert.addnlp : when the NLP data is integrated in the TF dataset, the NLP-generated features will get some metadata. 12.2 12.2.8,9,10 2024-01-24/25 TF can auto-download extra data with a TF dataset, e.g. a directory with named entities ( ner ) as in the [suriano corpus](https: gitlab.huc.knaw.nl/suriano/letters). However, this only worked when the repo was in the GitHub backend and the extra data had been packed for express-download and attached to a release. Now it also works with the normal download methods using the GitHub and GitLab APIs. So, after the move of Suriano from GitHub to GitLab, this functionality is still available. There was a glitch in the layout of the NER tool which caused section labels to be chopped off at the margin, only in notebooks. That has been fixed by moving some CSS code from one file to another. 12.2.7 2024-01-23 There were issues with starting up the Text-Fabric browser: If the system could not start the browser, TF stopped the webserver. That is not helpful, because one can always open a browser and enter the url in the address bar. Now TF shows the url rather prominently when it does not open a browser. If debug mode is on, Flask reloads the whole process, and that might include opening the browser as well. Now Flask only opens the browser after the startup of the webserver, and no longer after successive reloads. 12.2.6 2024-01-15 Somehow the express way of downloading data (via complete.zip attached to the latest release) did not get triggered in all cases where it should. It is now triggered in more cases than before. 12.2.5 2023-12-18 Small fix in NER browser: prevent submitting the form if the focus is in a textarea field or in an input field that does not have type=submit. 12.2.3,4 2023-12-09 Writing support for Ugaritic, thanks to Martijn Naaijer and Christian H\u00f8jgaard for converting a Ugaritic corpus to TF. Fix in display functions (continued): The logic of feature display, fixed in the previous version, was not effective when things are displayed in the TF browser, because in the TF browser the features of the last query were passed as extraFeatures instead of tupleFeatures . This has been fixed by using tupleFeatures in the TF browser as well. 12.2.2 2023-12-02 Fix in display functions, thanks to Tony Jurg: if you do A.pretty(x, queryFeatures=False, extraFeatures=\"yy zz\") the extra features were not shown. So there was no obvious way to control exactly the features that you want to show in a display. That has been fixed. Further clarification: the node features that are used by a query are stored in the display option tupleFeatures . That is what causes them to be displayed in subsequent display statements. You can also explicitly set/pass the tupleFeatures parameter. However, the fact that queryFeatures=False prohibited the display of features mentioned in extraFeatures was counter-intuitive. Improvements in the PageXML conversion. 
There are token features str , after that reflect the logical tokens. There are token features rstr , rafter that reflect the physical tokens. The distinction between logical and physical is that physical token triplets with the soft hyphen as the middle one are joined to one logical token; this happens across line boundaries, but also region and page boundaries. 12.2.0,1 2023-11-28 New conversion tool: from PageXML. Still in its infancy. It uses the [PageXML tools](https: github.com/knaw-huc/pagexml) by Marijn Koolen. For an example see [translatin/logic](https: gitlab.huc.knaw.nl/translatin/logic/-/blob/main/tools/convertPlain.ipynb?ref_type=heads). Fix: TF did not fetch an earlier version of a corpus if the newest release contains a complete.zip (which only has the latest version). For some technical reason that still escapes me, the TF browser was slow to start. Fixed it by passing threaded=True to Flask, as suggested on [stackoverflow](https: stackoverflow.com/a/11150849/15236220). From now on: TF does not try to download complete.zip if you pass a version argument to the use() command. 12.1 12.1.6,7 2023-11-15 Various fixes: Some package data was not included for the NER annotation tool. In the NER tool, the highlighting of hits of the search pattern is now exact; it was sometimes off. Deleted tf.tools.docsright again, but developed it further in [docsright](https: github.com/annotation/docsright). 12.1.5 2023-11-02 Improvement in dependencies. Text-Fabric no longer has mandatory dependencies on openpyxl , pandas , pyarrow , lxml . The optional dependencies on pygithub and python-gitlab remain, but most users will never need them, because TF can also fetch the complete.zip that is available as a release asset for most corpora. Whenever TF invokes a module that is not in the mandatory dependencies, it will act gracefully, providing hints to install the modules in question. 12.1.3,4 2023-11-01 API change in the Annotator: Calling the annotator is now easier: A.makeNer() (No need to make an additional import statement.) This will give you access to all annotation methods, including using a spreadsheet to read annotation instructions from. Removal of deprecated commands (on the command line) in version 11: text-fabric (has become tf ) text-fabric-zip (has become tf-zip ) text-fabric-make (has become tf-make ) Bug fixes: [ 81](https: github.com/annotation/text-fabric/issues/81) and [ 82](https: github.com/annotation/text-fabric/issues/82) Spell-checked all bits of the TF docs here (33,000 lines). Wrote a script tf.tools.docsright to separate the code content from the markdown content, and to strip bits from the markdown content that lead to false positives for the spell checker. Then had the Vim spell checker run over those lines and corrected all mistakes by hand. Still, there might be grammar errors and content inaccuracies. 12.1.4 follows 12.1.3 quickly, because in corpora without a NER configuration file, TF did not start up properly. 12.1.1,2 2023-10-29 Bug fix: the mechanism to make individual exceptions when adding named entities in the tf.browser.ner.annotate tool was broken. Thanks to Daniel Swanson for spotting it. Additional fixes and enhancements. 12.1.0 2023-10-28 New stuff In the TF browser there will be a new tab in the vertical sidebar: Annotate , which will give access to manual annotation tools. I am developing the first one, a tool to annotate named entities efficiently, both in the TF browser and in a Jupyter Notebook. Read more in tf.about.annotate . 
These tools will let you save your work as files on your own computer. In tf.convert.addnlp we can now extract more NLP information besides tokens and sentences: part-of-speech, morphological tagging, lemmatisation, named entity recognition Fixes in the TEI converter. 12.0 12.0.6,7 2023-09-13 Trivial fix in code that exports the data from a job in the TF browser. In the meanwhile there is unfinished business in the Annotate tab in the TF browser, that will come into production in the upcoming 12.1 release. The Chrome browser has an attractive feature that other browsers such as Safari lack: It supports the CSS property [content-visibility](https: developer.mozilla.org/en-US/docs/Web/CSS/content-visibility). With this property you can prevent the browser to do the expensive rendering of content that is not visible on the screen. That makes it possible to load a lot of content in a single page without tripping up the browser. You also need the [ IntersectionObserver API](https: developer.mozilla.org/en-US/docs/Web/API/Intersection_Observer_API), but that is generally supported by browsers. With the help of that API you can restrict the binding of event listeners to elements that are visible on the screen. So, you can open the TF browser in Chrome by passing the option chrome . But if Chrome is not installed, it will open in the default browser anyway. Also, when the opening of the browser fails somehow, the web server is stopped. 12.0.5 2023-07-10 Fixed references to static files that still went to /server instead of /browser . This has to do with the new approach to the TF browser. 12.0.0-4 2023-07-05 Simplification The TF browser no longer works with a separate process that holds the TF corpus data. Instead, the web server (flask) loads the corpus itself. This will restrict the usage of the TF browser to local-single-user scenarios. TF no longer exposes the installation options [browser, pandas] pip install 'text-fabric[browser]' pip install 'text-fabric[pandas]' If you work with Pandas (like exporting to Pandas) you have to install it yourself: pip install pandas pyarrow The TF browser is always supported. The reason to have these distinct capabilities was that there are python libraries involved that do not install on the iPad. The simplification of the TF browser makes it possible to be no longer dependent on these modules. Hence, TF can be installed on the iPad, although the TF browser works is not working there yet. But the auto-downloading of data from GitHub / GitLab works. Minor things Header. After loading a dataset, a header is shown with shows all kinds of information about the corpus. But so far, it did not show the TF app settings. Now they are included in the header. There are two kinds: the explicitly given settings and the derived and computed settings. The latter ones will be suppressed when loading a dataset in a Jupyter notebook, because these settings can become quite big. You can still get them with A.showContext() . In the TF browser they will be always included, you find it in the Corpus tab. - Older releases See tf.about.releasesold ." }, { "ref":"tf.about.clientmanual", @@ -7133,96 +7134,148 @@ INDEX=[ "func":1 }, { -"ref":"tf.convert.tei", +"ref":"tf.convert.makewatm", +"url":137, +"doc":"" +}, +{ +"ref":"tf.convert.makewatm.MakeWATM", "url":137, +"doc":"Base class for running conversions to WATM. This class has methods to convert corpora from TEI or PageXML to TF and then to WATM. 
But if the corpus needs additional preparation, you can make a sub class based on this with additional tasks defined an implemented. Any class M in m.py based on this class can be called from the command line as follows: python m.py flags tasks If you base a superclass on this, you can register the additional tasks as follows: For each extra task xxx, write an method doTask_xxx(self) Then provide for each task a simple doc line and register them all by: self.setOptions( taskSpecs=( (task1, docLine1), (task2, docLine2), . ), flagSpecs=( (flag1, docLine1), (flag2, docLine2), . ), ) Localize upon creation. When an object of this class is initialized, we assume that the script doing it is localized in the programs directory in a corpus repo. Parameters fileLoc: string The full path of the file that creates an instance of this class." +}, +{ +"ref":"tf.convert.makewatm.MakeWATM.doTask_tei2tf", +"url":137, +"doc":"", +"func":1 +}, +{ +"ref":"tf.convert.makewatm.MakeWATM.doTask_page2tf", +"url":137, +"doc":"", +"func":1 +}, +{ +"ref":"tf.convert.makewatm.MakeWATM.doTask_watm", +"url":137, +"doc":"", +"func":1 +}, +{ +"ref":"tf.convert.makewatm.MakeWATM.doTask_watms", +"url":137, +"doc":"", +"func":1 +}, +{ +"ref":"tf.convert.makewatm.MakeWATM.run", +"url":137, +"doc":"", +"func":1 +}, +{ +"ref":"tf.convert.makewatm.MakeWATM.setOptions", +"url":137, +"doc":"", +"func":1 +}, +{ +"ref":"tf.convert.makewatm.MakeWATM.main", +"url":137, +"doc":"", +"func":1 +}, +{ +"ref":"tf.convert.tei", +"url":138, "doc":" TEI import You can convert any TEI source into TF by specifying a few details about the source. TF then invokes the tf.convert.walker machinery to produce a TF dataset out of the source. TF knows the TEI elements, because it will read and parse the complete TEI schema. From this the set of complex, mixed elements is distilled. If the TEI source conforms to a customised TEI schema, it will be detected and the importer will read it and override the generic information of the TEI elements. It is also possible to pass a choice of template and adaptation in a processing instruction. This does not influence validation, but it may influence further processing. If the TEI consists of multiple source files, it is possible to specify different templates and adaptations for different files. The possible values for models, templates, and adaptations should be declared in the configuration file. For each model there should be a corresponding schema in the schema directory, either an RNG or an XSD file. The converter goes the extra mile: it generates a TF app and documentation (an about.md file and a transcription.md file), in such a way that the TF browser is instantly usable. The TEI conversion is rather straightforward because of some conventions that cannot be changed. Configuration and customization We assume that you have a programs directory at the top-level of your repo. In this directory we'll look for two optional files: a file tei.yaml in which you specify a bunch of values to get the conversion off the ground. a file tei.py in which you define custom functions that are executed at certain specific hooks: transform(text) which takes a text string argument and delivers a text string as result. The converter will call this on every TEI input file it reads before feeding it to the XML parser. This can be used to solve some quirks in the input, e.g. 
replacing two consecutive commas ( ) by a single unicode character ( \u201e = 201E); beforeTag : just before the walker starts processing the start tag of a TEI element; beforeChildren : just after processing the start tag, but before processing the element content (text and child elements); afterChildren : just after processing the complete element content (text and child elements), but before processing the end tag of the TEI element; afterTag : just after processing the end tag of a TEI element. The before and after functions should take the following arguments cv : the walker converter object; cur : the dictionary with information that has been gathered during the conversion so far and that can be used to dump new information into; it is nonlocal, i.e. all invocations of the hooks get the same dictionary object passed to them; xnode : the LXML node corresponding to the TEI element; tag : the tag name of the element, without namespaces; this is a bit redundant, because it can also be extracted from the xnode , but it is convenient. atts : the attributes (names and values) of the element, without namespaces; this is a bit redundant, because it can also be extracted from the xnode , but it is convenient. These functions should not return anything, but they can write things to the cur dictionary. And they can create slots, nodes, and terminate them, in short, they can do every cv -based action that is needed. You can define these functions out of this context, but it is good to know what information in cur is guaranteed to be available: xnest : the stack of XML tag names seen at this point; tnest : the stack of TF nodes built at this point; tsiblings (only if sibling nodes are being recorded): the list of preceding TF nodes corresponding to the TEI sibling elements of the current TEI element. Keys and values of the tei.yaml file generic dict, optional {} Metadata for all generated TF features. The actual source version of the TEI files does not have to be stated here, it will be inserted based on the version that the converter will actually use. That version depends on the tei argument passed to the program. The key under which the source version will be inserted is teiVersion . extra dict, optional {} Instructions and metadata for specific generated TF features, namely those that have not been generated by the vanilla TEI conversion, but by extra code in one of the customised hooks. The dict is keyed by feature name, the values are again dictionaries. These value dictionaries have a key meta under which any number of metadata key value pairs, such as description=\"xxx\" . If you put the string \u00abbase\u00bb in such a field, it will be expanded on the basis of the contents of the path key, see below. You must provide the key valueType and pass int or str there, depending on the values of the feature. You may provide extra keys, such as conversionMethod=\"derived\" , so that other programs can determine what to do with these features. The information in this dict will also end up in the generated feature docs. Besides the meta key, there may also be the keys path , and nodeType . Together they contain an instruction to produce a feature value from element content that can be found on the current stack of XML nodes and attributes. The value found will be put in the feature in question for the node of type specified in nodeType that is recently constructed. 
Example: yaml extra: letterid: meta: description: The identifier of a letter; \u00abbase\u00bb valueType: str conversionMethod: derived conversionCode: tt path: - idno: type: letterId - altIdentifier - msIdentifier - msDesc - sourceDesc nodeType: letter feature: letterid The meaning is: if, while parsing the XML, I encounter an element idno , and if that element has an attribute type with value letterId , and if it has parent altIdentifier , and grandparent msIdentifier , and great-grandparent msDesc , and great-great-grandparent sourceDesc , then look up the last created node of type letter and get the text content of the current XML node (the idno one), and put it in the feature letterid for that node. Moreover, the feature letterid gets metadata as specified under the key meta , where the description will be filled with the text The identifier of a letter; the content is taken from sourceDesc/msDesc/msIdentifier/altIdentifier/idno[type=letterId] models list, optional [] Which TEI-based schemas are to be used. For each model there should be an XSD or RNG file with that name in the schema directory. The tei_all schema is known to TF, no need to specify that one. We'll try a RelaxNG schema ( .rng ) first. If that exists, we use it for validation with JING, and we also convert it with TRANG to an XSD schema, which we use for analysing the schema: we want to know which elements are mixed and pure. If there is no RelaxNG schema, we try an XSD schema ( .xsd ). If that exists, we can do the analysis, and we will use it also for validation. ! note \"Problems with RelaxNG validation\" RelaxNG validation is not always reliable when performed with LXML, or any tool based on libxml , for that matter. That's why we try to avoid it. Even if we translate the RelaxNG schema to an XSD schema by means of TRANG, the resulting validation is not always reliable. So we use JING to validate the RelaxNG schema. See also [JING-TRANG](https: code.google.com/archive/p/jing-trang/downloads). templates list, optional [] Which template(s) are to be used. A template is just a keyword, associated with an XML file, that can be used to switch to a specific kind of processing, such as letter , bibliolist , artworklist . You may specify an element or processing instruction with an attribute that triggers the template for the file in which it is found. This will be retrieved from the file before XML parsing starts. For example, python templateTrigger=\"?editem@template\" will read the file and extract the value of the template attribute of the editem processing instruction and use that as the template for this file. If no template is found in this way, the empty template is assumed. adaptations list, optional [] Which adaptations(s) are to be used. An adaptation is just a keyword, associated with an XML file, that can be used to switch to a specific kind of processing. It is meant to trigger tweaks on top of the behaviour of a template. You may specify an element or processing instruction with an attribute that triggers the adaptation for the file in which it is found. This will be retrieved from the file before XML parsing starts. For example, python adaptationTrigger=\"?editem@adaptation\" will read the file and extract the value of the adaptation attribute of the editem processing instruction and use that as the adaptation for this file. If no adaptation is found in this way, the empty adaptation is assumed. prelim boolean, optional True Whether to work with the pre TF versions. 
Use this if you convert TEI to a preliminary TF dataset, which will receive NLP additions later on. That version will then lose the pre . granularity string, optional token What to take the basic entities (slots). Possible values: word : words are slots, even if they cross element boundaries. This leads to some imprecisions: words containing an element boundary will belong to just one of both elements around the boundary. char : all individual characters are separate slots. Very precise, but the dataset gets expensive with so many slots. token : every sequence of alphanumeric characters becomes a token, in sofar there is no intervening markup. Non alphanumeric characters become separate tokens. There are some additional rules: . or , tightly surrounded by digits also count as tokens. The datasets with granularity word and token have features str for the string content of the slots, and after for the material after the slots. In the case of word , the feature after can contain whitespace and punctuation characters, in the case of token , it only contains whitespace. If not, the characters are taken as basic entities. If you use an NLP pipeline to detect tokens, use the value False . The preliminary dataset is then based on characters, but the final dataset that we build from there is based on tokens, which are mostly words and non-word characters. parentEdges boolean, optional True Whether to create edges between nodes that correspond to XML elements and their parents. siblingEdges boolean, optional False Whether to create edges between nodes that correspond to XML elements and siblings. Edges will be created between each sibling and its preceding siblings. If you use these edges in the binary way, you can also find the following siblings. The edges are labeled with the distance between the siblings, adjacent siblings get distance 1. ! caution \"Overwhelming space requirement\" If the corpus is divided into relatively few elements that each have very many direct children, the number of sibling edges is comparable to the size of the corpus squared. That means that the TF dataset will consist for 50-99% of sibling edges! An example is [ ETCBC/nestle1904 ](https: github.com/ETCBC/nestle1904) (Greek New Testament) where each book element has all of its sentences as direct children. In that dataset, the siblings would occupy 40% of the size, and we have taken care not to produce sibling edges for sentences. procins boolean, optional False If True, processing instructions will be treated. Processing instruction will be converted as if it were an empty element named foo with attribute bar with value xxx . lineModel dict, optional False If not passed, or an empty dict, line model I is assumed. A line model must be specified with the parameters relevant for the model: python dict( model=\"I\", ) (model I does not require any parameters) or python dict( model=\"II\", element=\"p\", nodeType=\"ln\", ) For model II, the default parameters are: python element=\"p\", nodeType=\"ln\", Model I is the default, and nothing special happens to the elements. In model II the elements translate to nodes of type ln , which span content, whereas the original lb elements just mark positions. Instead of ln , you can also specify another node type by the parameter element . We assume that the material that the elements divide up is the material that corresponds to their parent element. Instead of , you can also specify another element in the parameter element . 
We assume that lines start and end at the start and end of the elements and the elements. For the material between these boundaries, we build ln nodes. If an element follows a start tag without intervening slots, a ln node will be created but not linked to slots, and it will be deleted later in the conversion. Likewise, if an element is followed by an end tag without intervening slots, a ln node is created that is not linked to slots. The attributes of the elements become features of the ln node that starts with that element. If there is no explicit element at the start of a paragraph, the first ln node of that paragraph gets no features. pageModel dict, optional False If not passed, or an empty dict, page model I is assumed. A page model must be specified with the parameters relevant for the model: python dict( model=\"I\", ) (model I does not require any parameters) or python dict( model=\"II\", element=\"div\", attributes=dict(type=[\"original\", \"translation\"]), pbAtTop=True, nodeType=\"page\", ) For model II, the default parameters are: python element=\"div\", pbAtTop=True, nodeType=\"page\", attributes={}, Model I is the default, and nothing special happens to the elements. In model II the elements translate to nodes of type page , which span content, whereas the original pb elements just mark positions. Instead of page , you can also specify another node type by the parameter element . We assume that the material that the elements divide up is the material that corresponds to their parent element. Instead of , you can also specify another element in the parameter element . If you want to restrict the parent elements of pages, you can do so by specifying attributes, like type=\"original\" . Then only parents that carry those attributes will be chopped up into pages. You can specify multiple values for each attribute. Elements that carry one of these values are candidates for having their content divided into pages. We assume that the material to be divided starts with a (as the TEI-guidelines prescribe) and we translate it to a page element that we close either at the next or at the end of the div . But if you specify pbAtTop=False , we assume that the marks the end of the corresponding page element. We start the first page at the start of the enclosing element. If there is material between the last and the end of the enclosing element, we generate an extra page node without features. sectionModel dict, optional {} If not passed, or an empty dict, section model I is assumed. A section model must be specified with the parameters relevant for the model: python dict( model=\"II\", levels=[\"chapter\", \"chunk\"], element=\"head\", attributes=dict(rend=\"h3\"), ) (model I does not require the element and attribute parameters) or python dict( model=\"I\", levels=[\"folder\", \"file\", \"chunk\"], ) This section model (I) accepts a few other parameters: python backMatter=\"backmatter\" This is the name of the folder that should not be treated as an ordinary folder, but as the folder with the sources for the back-matter, such as references, lists, indices, bibliography, biographies, etc. python drillDownDivs=True Whether the chunks are the immediate children of body elements, or whether we should drill through all intervening div levels. For model II, the default parameters are: python element=\"head\" levels=[\"chapter\", \"chunk\"], attributes={} In model I, there are three section levels in total. 
The corpus is divided in folders (section level 1), files (section level 2), and chunks within files. The parameter levels allows you to choose names for the node types of these section levels. In model II, there are 2 section levels in total. The corpus consists of a single file, and section nodes will be added for nodes at various levels, mainly outermost and elements and their siblings of other element types. The section heading for the second level is taken from elements in the neighbourhood, whose name is given in the parameter element , but only if they carry some attributes, which can be specified in the attributes parameter. Usage Command-line sh tf-fromtei tasks flags From Python python from tf.convert.tei import TEI T = TEI() T.task( tasks, flags) For a short overview the tasks and flags, see HELP . Tasks We have the following conversion tasks: 1. check : makes and inventory of all XML elements and attributes used. 1. convert : produces actual TF files by converting XML files. 1. load : loads the generated TF for the first time, by which the pre-computation step is triggered. During pre-computation some checks are performed. Once this has succeeded, we have a workable TF dataset. 1. app : creates or updates a corpus specific TF app with minimal sensible settings, plus basic documentation. 1. apptoken : updates a corpus specific TF app from a character-based dataset to a token-based dataset. 1. browse : starts the TF browser on the newly created dataset. Tasks can be run by passing any choice of task keywords to the TEI.task() method. Note on versions The TEI source files come in versions, indicated with a data. The converter picks the most recent one, unless you specify an other one: python tf-from-tei tei=-2 previous version tf-from-tei tei=0 first version tf-from-tei tei=3 third version tf-from-tei tei=2019-12-23 explicit version The resulting TF data is independently versioned, like 1.2.3 or 1.2.3pre . When the converter runs, by default it overwrites the most recent version, unless you specify another one. It looks at the latest version and then bumps a part of the version number. python tf-fromtei tf=3 minor version, 1.2.3 becomes 1.2.4; 1.2.3pre becomes 1.2.4pre tf-fromtei tf=2 intermediate version, 1.2.3 becomes 1.3.0 tf-fromtei tf=1 major version, 1.2.3 becomes 2.0.0 tf-fromtei tf=1.8.3 explicit version Examples Exactly how you can call the methods of this module is demonstrated in the small corpus of 14 letter by the Dutch artist Piet Mondriaan. [Mondriaan](https: nbviewer.org/github/annotation/mondriaan/blob/master/programs/convertExpress.ipynb)." }, { "ref":"tf.convert.tei.makeCssInfo", -"url":137, +"url":138, "doc":"Make the CSS info for the style sheet.", "func":1 }, { "ref":"tf.convert.tei.getRefs", -"url":137, +"url":138, "doc":"", "func":1 }, { "ref":"tf.convert.tei.TEI", -"url":137, +"url":138, "doc":"Converts TEI to TF. For documentation of the resulting encoding, read the [transcription template](https: github.com/annotation/text-fabric/blob/master/tf/convert/app/transcription.md). Below we describe how to control the conversion machinery. We adopt a fair bit of \"convention over configuration\" here, in order to lessen the burden for the user of specifying so many details. Based on current directory from where the script is called, it defines all the ingredients to carry out a tf.convert.walker conversion of the TEI input. This function is assumed to work in the context of a repository, i.e. 
a directory on your computer relative to which the input directory exists, and various output directories: tf , app , docs . Your current directory must be at ~/backend/org/repo/relative where ~ is your home directory; backend is an online back-end name, like github , gitlab , git.huc.knaw.nl ; org is an organization, person, or group in the back-end; repo is a repository in the org . relative is a directory path within the repo (0 or more components) This is only about the directory structure on your local computer; it is not required that you have online incarnations of your repository in that back-end. Even your local repository does not have to be a git repository. The only thing that matters is that the full path to your repo can be parsed as a sequence of home/backend/org/repo/relative . Relative to this directory the program expects and creates input / output directories. Input directories tei Location of the TEI-XML sources. If it does not exist, the program aborts with an error. Several levels of subdirectories are assumed: 1. the version of the source (this could be a date string). 1. volumes / collections of documents. The subdirectory __ignore__ is ignored. 1. the TEI documents themselves, conforming to the TEI schema or some customization of it. schema TEI or other XML schemas against which the sources can be validated. They should be XSD or RNG files. ! note \"Multiple XSD files\" When you started with a RNG file and used tf.tools.xmlschema to convert it to XSD, you may have got multiple XSD files. One of them has the same base name as the original RNG file, and you should pass that name. It will import the remaining XSD files, so do not throw them away. We use these files as custom TEI schemas, but to be sure, we still analyse the full TEI schema and use the schemas here as a set of overriding element definitions. Output directories report Directory to write the results of the check task to: an inventory of elements / attributes encountered, and possible validation errors. If the directory does not exist, it will be created. The default value is . (i.e. the current directory in which the script is invoked). tf The directory under which the TF output file (with extension .tf ) are placed. If it does not exist, it will be created. The TF files will be generated in a folder named by a version number, passed as tfVersion . app and docs Location of additional TF app configuration and documentation files. If they do not exist, they will be created with some sensible default settings and generated documentation. These settings can be overridden in the app/config_custom.yaml file. Also a default display.css file and a logo are added. Custom content for these files can be provided in files with _custom appended to their base name. docs Location of additional documentation. This can be generated or hand-written material, or a mixture of the two. Parameters tei: string, optional If empty, use the latest version under the tei directory with sources. Otherwise it should be a valid integer, and it is the index in the sorted list of versions there. 0 or latest : latest version; -1 , -2 , . : previous version, version before previous, .; 1 , 2 , .: first version, second version, everything else that is not a number is an explicit version If the value cannot be parsed as an integer, it is used as the exact version name. tf: string, optional If empty, the TF version used will be the latest one under the tf directory. 
If the parameter prelim was used in the initialization of the TEI object, only versions ending in pre will be taken into account. If it can be parsed as the integers 1, 2, or 3 it will bump the latest relevant TF version: 0 or latest : overwrite the latest version 1 will bump the major version 2 will bump the intermediate version 3 will bump the minor version everything else is an explicit version Otherwise, the value is taken as the exact version name. verbose: integer, optional -1 Produce no (-1), some (0) or many (1) progress and reporting messages" }, { "ref":"tf.convert.tei.TEI.readSchemas", -"url":137, +"url":138, "doc":"", "func":1 }, { "ref":"tf.convert.tei.TEI.getSwitches", -"url":137, +"url":138, "doc":"", "func":1 }, { "ref":"tf.convert.tei.TEI.getParser", -"url":137, +"url":138, "doc":"Configure the LXML parser. See [parser options](https: lxml.de/parsing.html parser-options). Returns - object A configured LXML parse object.", "func":1 }, { "ref":"tf.convert.tei.TEI.getXML", -"url":137, +"url":138, "doc":"Make an inventory of the TEI source files. Returns - tuple of tuple | string If section model I is in force: The outer tuple has sorted entries corresponding to folders under the TEI input directory. Each such entry consists of the folder name and an inner tuple that contains the file names in that folder, sorted. If section model II is in force: It is the name of the single XML file.", "func":1 }, { "ref":"tf.convert.tei.TEI.checkTask", -"url":137, +"url":138, "doc":"Implementation of the \"check\" task. It validates the TEI, but only if a schema file has been passed explicitly when constructing the TEI() object. Then it makes an inventory of all elements and attributes in the TEI files. If tags are used in multiple namespaces, it will be reported. ! caution \"Conflation of namespaces\" The TEI to TF conversion does construct node types and attributes without taking namespaces into account. However, the parsing process is namespace aware. The inventory lists all elements and attributes, and many attribute values. But is represents any digit with n , and some attributes that contain ids or keywords, are reduced to the value x . This information reduction helps to get a clear overview. It writes reports to the reportPath : errors.txt : validation errors elements.txt : element / attribute inventory.", "func":1 }, { "ref":"tf.convert.tei.TEI.getConverter", -"url":137, +"url":138, "doc":"Initializes a converter. Returns - object The tf.convert.walker.CV converter object, initialized.", "func":1 }, { "ref":"tf.convert.tei.TEI.getDirector", -"url":137, +"url":138, "doc":"Factory for the director function. The tf.convert.walker relies on a corpus dependent director function that walks through the source data and spits out actions that produces the TF dataset. The director function that walks through the TEI input must be conditioned by the properties defined in the TEI schema and the customised schema, if any, that describes the source. Also some special additions need to be programmed, such as an extra section level, word boundaries, etc. We collect all needed data, store it, and define a local director function that has access to this data. Returns - function The local director function that has been constructed.", "func":1 }, { "ref":"tf.convert.tei.TEI.convertTask", -"url":137, +"url":138, "doc":"Implementation of the \"convert\" task. It sets up the tf.convert.walker machinery and runs it. 
Returns - boolean Whether the conversion was successful.", "func":1 }, { "ref":"tf.convert.tei.TEI.loadTask", -"url":137, +"url":138, "doc":"Implementation of the \"load\" task. It loads the TF data that resides in the directory where the \"convert\" task deliver its results. During loading there are additional checks. If they succeed, we have evidence that we have a valid TF dataset. Also, during the first load intensive pre-computation of TF data takes place, the results of which will be cached in the invisible .tf directory there. That makes the TF data ready to be loaded fast, next time it is needed. Returns - boolean Whether the loading was successful.", "func":1 }, { "ref":"tf.convert.tei.TEI.appTask", -"url":137, +"url":138, "doc":"Implementation of the \"app\" task. It creates / updates a corpus-specific app plus specific documentation files. There should be a valid TF dataset in place, because some settings in the app derive from it. It will also read custom additions that are present in the target app directory. These files are: about_custom.md : A markdown file with specific colophon information about the dataset. In the generated file, this information will be put at the start. transcription_custom.md : A markdown file with specific encoding information about the dataset. In the generated file, this information will be put at the start. config_custom.yaml : A YAML file with configuration data that will be merged into the generated config.yaml. app_custom.py : A python file with named snippets of code to be inserted at corresponding places in the generated app.py display_custom.css : Additional CSS definitions that will be appended to the generated display.css . If the TF app for this resource needs custom code, this is the way to retain that code between automatic generation of files. Returns - boolean Whether the operation was successful.", "func":1 }, { "ref":"tf.convert.tei.TEI.browseTask", -"url":137, +"url":138, "doc":"Implementation of the \"browse\" task. It gives a shell command to start the TF browser on the newly created corpus. There should be a valid TF dataset and app configuration in place Returns - boolean Whether the operation was successful.", "func":1 }, { "ref":"tf.convert.tei.TEI.task", -"url":137, +"url":138, "doc":"Carry out any task, possibly modified by any flag. This is a higher level function that can execute a selection of tasks. The tasks will be executed in a fixed order: check , convert , load , app , apptoken , browse . But you can select which one(s) must be executed. If multiple tasks must be executed and one fails, the subsequent tasks will not be executed. Parameters check: boolean, optional False Whether to carry out the check task. convert: boolean, optional False Whether to carry out the convert task. load: boolean, optional False Whether to carry out the load task. app: boolean, optional False Whether to carry out the app task. apptoken: boolean, optional False Whether to carry out the apptoken task. 
browse: boolean, optional False Whether to carry out the browse task\" verbose: integer, optional -1 Produce no (-1), some (0) or many (1) progress and reporting messages validate: boolean, optional True Whether to perform XML validation during the check task Returns - boolean Whether all tasks have executed successfully.", "func":1 }, @@ -7240,351 +7293,351 @@ INDEX=[ }, { "ref":"tf.convert.tei.main", -"url":137, +"url":138, "doc":"", "func":1 }, { "ref":"tf.convert.watm", -"url":138, +"url":139, "doc":"Export to Web Annotation Text Model The situation This module can export a TF corpus to WATM (Web Annotation Text Model), which is the input format of the suite of systems developed by Team Text for serving text plus annotations over the web. If we can convert TF corpora to WATM, then we have an avenue to the [KNAW/HuC/DI/Team-Text](https: di.huc.knaw.nl/text-analysis-en.html) web publishing machinery. Given the fact that TF can already convert TEI and PageXML corpora, this completes a pipeline from source to publication. We have done this for the following corpora: [ mondriaan/letters ](https: github.com/annotation/mondriaan) [ translatin/corpus ](https: gitlab.huc.knaw.nl/translatin/corpus) [ suriano/letters ](https: gitlab.huc.knaw.nl/suriano/letters) All these corpora need distinct preprocessing steps before they are \"canalized\" into TF, see the illustration below. ![confluence]( /images/text-confluence.jpg) At the same time, [Maarten van Gompel](https: github.com/proycon) is also making pipelines to the Team-Text publishing street. He uses his [STAM](https: github.com/annotation/stam) software to build a [pipeline](https: github.com/knaw-huc/brieven-van-hooft-pipeline/blob/main/README.md) from a corpus of letters by P.C. Hooft in Folia format to text segments and web annotations. Excursion: STAM ![stam]( /images/stam.png) We now have two sytems, [STAM](https: github.com/annotation/stam) and Text-Fabric that can untangle text and markup. They are implemented very differently, and have a different flavour, but at the same time they share the preference of separating the textual data from all the data around the text. intent STAM : make it easier for tools and infrastructure to handle texts with annotations. TF : support researchers in analysing textual corpora. implementation STAM : Rust + [Python bindings](https: github.com/annotation/stam-python). TF : Pure Python. organization STAM : very neatly in a core with extensions. TF : core data functionality in tf.core modules, search functionality in tf.search modules, lots of other functions are included in the code with varying degrees of integration and orderliness! standards STAM : actively seeks to interoperate with existing standards, but internally it uses its own way of organizing the data. TF : also relies on a few simple conventions w.r.t. data organization and efficient serialization. These conventions are documented. It has several import and export functions, e.g. from TEI, PageXML, MQL, and to MQL, TSV. But it prefers to input and output data in minimalistic streams, without the often redundant strings that are attached to standard formats. model STAM : very generic w.r.t. annotations, annotations can target annotations and /or text segments. TF : [graph model](https: annotation.github.io/text-fabric/tf/about/datamodel.html) where nodes stand for textual positions and subsets of them, nodes and edges can have features, which are the raw material of annotations, but annotations are not a TF concept. 
query language STAM : [STAMQL](https: github.com/annotation/stam/tree/master/extensions/stam-query), evolving as an SQL-like language, with user-friendly primitives for annotations. TF : [TF-Query](https: annotation.github.io/text-fabric/tf/about/searchusage.html), a noise-free language for graph templates with reasonable performance. display STAM : In development, see stam view in [STAM tools](https: github.com/annotation/stam-tools). TF : Powerful functions to display corpus fragments with highlighting in tf.advanced . The challenge is to build generic display functions that detect the peculiarities of the corpora. API STAM : in Rust and Python. TF : Python. GUI STAM : not yet. TF : locally served web interface for browsing and searching the corpus. Both libraries can be used to manage corpus data in intricate ways for research and publishing purposes. How STAM and Text-Fabric will evolve in the dynamic landscape of corpora, analytical methods and AI, is something we cannot predict. For now, their different flavour and intent will define their appeal to the different categories of users. The general idea The idea of WATM is, like the idea of Text-Fabric, to untangle the text from its markup. Everything outside the text itself is coded in annotations. Annotations look a lot like TF features, but they are a bit more general. Annotations can also annotate annotations, not only pieces of text. We need this extra generality, because unlike TF, WATM does not have a concept of node. The only parallel are the slot nodes of TF, which corresponds to the tokens of the text in WATM. Every node in TF is linked to a set of slot nodes. As such it can be mapped to an annotation to the corresponding tokens. Features of such nodes can be mapped to annotations on annotations. TF also has edges. These can be mapped to WATM annotations whose targets are pairs: one for the thing the edge is from , and one for the thing the edge is to . These things are typical annotations that correspond to TF nodes, since TF edges are links between TF nodes. If the TF dataset itself is the result of converting an XML file (e.g TEI or PageXML), then there is a further correspondence between the XML and the TF: elements translate into nodes; element tags translate into node types; attributes translate into features; values of attributes translate into values of features. In our terminology below we assume that the TF data comes from XML files, but this is not essential. Whenever we talk about elements and tags , you may read nodes and node types if the TF dataset does not have an XML precursor. Likewise, for attributes you may read features . The specifics We generate tokens and annotations out of a TF dataset. Here is what we deliver and in what form. The files are either .tsv or .json , dependent on the configuration setting asTsv in the watm.yaml file in the project. a bunch of files text-0. ext , text-1. ext : containing a list of tokenlike segments; Each file corresponds with a section in the TF dataset; the level of the sections that correspond with these files is given in the watm.yaml config file, under the key textRepoLevel . It can have the values 1 (top level), 2 , and 3 (lower levels). a bunch of files anno-1. ext , anno-2. ext , .: all generated annotations; We pack at most 400,000 annotations in one file, that keeps their size below 50MB, so that they still can live in a git directory without large file support. The numbering in the anno- i . ext files it independent of the numbering in the text- i .json files! 
a pair of files anno2node.tsv and pos2node.tsv that map annotations resp. text positions to their corresponding TF nodes. Format of the text files A text-i.json is a JSON file with the following structure: { \"_ordered_segments\": [ \"token1 \", \"token2 \", . ] } These tokens may contain newlines and tabs. A text-i.tsv is a TSV file with the following structure: token token1 token2 . The first line is a header line with fixed content: token . Newlines and tabs must be escaped in TSV files. We do that by \\n and \\t . each token1 , token2 , . corresponds to one token; the item contains the text of the token plus the subsequent whitespace, if any; if the corpus is converted from TEI, we skip all material inside the TEI-header. Tokens Tokens correspond to the slot nodes in the TF dataset. Depending on the original format of the corpus we have the following specifics. TEI corpora The base type is t , the atomic token. Atomic tokens are tokens as they come from some NLP processing, except when tokens contain element boundaries. In those cases tokens are split in fragments between the element boundaries. It is guaranteed that a text segment that corresponds to a t does not contain element boundaries. The original, unsplit tokens are also present in the annotations, they have type token . Tokens have the attributes str and after , both may be empty. PageXML corpora The base type is token , it is available without NLP processing. Tokens have the attributes str and after , both may be empty. They may also have the attributes rstr and rafter . str is the logical string value of a token, after is empty or a space: what comes after the token before the next token. rstr is the raw string value of a token, when it deviates from the logical value , otherwise no value. rafter analogously. Example token | 1 | 2 | 3 | 4 | 5 - | - | - | - | - | - rstr | empty | efflagitan | \u00ac | do | empty str | improb\u00e8 | efflagitando | empty | empty | tandem Format of the annotation files The anno-1.json file is a JSON file with the following structure: { \"a000001\": [ \"element\", \"tei\", \"p\", \"0:10-60\" ], \"a000002\": [ \"element\", \"tei\", \"p\", \"0:60-70\" ], . } An anno-i.tsv is a TSV file with the following structure: annoid kind namespace body target a000001 element tei p 0:10-60 a000002 element tei p 0:60-70 . The first line is a header line with fixed content: the field names separated by tabs. Newlines and tabs must be escaped in TSV files. We do that by \\n and \\t . When reading these lines, it is best to collect the information in a dict, keyed by the annoid , whose values are lists of the remaining fields, just as in the JSON.
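A minimal sketch of such a reader, assuming the escaping convention just described and a hypothetical file name anno-1.tsv : python
def readAnnoTsv(path):
    # collect the annotations in a dict keyed by annotation id,
    # with the remaining fields in a list, just as in the JSON variant
    annos = {}
    with open(path, encoding="utf8") as fh:
        next(fh)  # skip the header line: annoid kind namespace body target
        for line in fh:
            annoId, kind, namespace, body, target = line.rstrip("\n").split("\t")
            # undo the escaping of newlines and tabs in the body field
            body = body.replace("\\t", "\t").replace("\\n", "\n")
            annos[annoId] = [kind, namespace, body, target]
    return annos

annos = readAnnoTsv("anno-1.tsv")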
You get a big dictionary, keyed by annotation ids and each value is the data of an annotation, divided in the following fields: kind : the kind of annotation: element : targets the text location where an element occurs, the body is the element name; pi : targets the text location where a processing instruction occurs, the body is the target of the pi ; attribute : targets an annotation (an element or pi ), the body has the shape name = value , the name and value of the attribute in question; edge : targets two node annotations, the body has the shape name or name = value , where name is the name of the edge and value is the label of the edge if the edge has a label; format : targets an individual token, the body is a formatting property for that token, all tokens in note elements get a format annotation with body note ; anno : targets an arbitrary annotation or text range, body has an arbitrary value; can be used for extra annotations, e.g. in the Mondriaan corpus to provide an URL to an artwork derived from an element. namespace : the namespace of the annotation; an indicator where the information comes from. Possible values: pagexml : annotation comes from the PageXML, possibly indirectly, e.g. h , w , x , y tei : annotation comes [literally](https: annotation.github.io/text-fabric/tf/convert/helpers.html tf.convert.helpers.CM_LIT) from the TEI guidelines or the PageXML specification, or is [processed](https: annotation.github.io/text-fabric/tf/convert/helpers.html tf.convert.helpers.CM_LITP) straightforwardly from it; tf : annotation is [composed](https: annotation.github.io/text-fabric/tf/convert/helpers.html tf.convert.helpers.CM_LITC) in a more intricate way from the original source or even [added](https: annotation.github.io/text-fabric/tf/convert/helpers.html tf.convert.helpers.CM_PROV) to it; nlp : annotation is generated as a result of [NLP processing](https: annotation.github.io/text-fabric/tf/convert/helpers.html tf.convert.helpers.CM_NLP); tt : annotation is derived from other material in the source for the benefit of the Team Text infrastructure. Defined in the watm.yaml file next to this program. Currently used for annotations that derive from project specific requirements. body : the body of an annotation (probably the kind and body fields together will make up the body of the resulting web annotation); target : a string specifying the target of the annotation, of the following kinds: single this is a target pointing to a single thing, either: fn:bbb : a single token fn:bbb-eee : a range of text segments in the _ordered_segments in the file text-fn.json ; the token at position eee is not included. It is guaranteed that bbb ttt where fff is a \"from\" target and ttt is a \"to\" target; both targets can vary independently between a range and an annotation id. N.B. It is allowed that fff and ttt target segments in distinct text-i.json files. In this case, it is not implied that the intermediate tokens are part of the target, because this target conveys the information that the body of the annotation is a property of the pair (fff, ttt) . If fff and ttt target segments, than they must both contain a file specifier, even if both target a segment in the same token file. Configuration In the file config.yaml (in the directory where the program runs) certain parameters can be set: textRepoLevel : the TF section level for which individual textRepo json files will be made. Default: 1 : the top level. Other possible values: 2 and 3 (lower levels). 
Only the special TF section levels can be specified, not arbitrary node types. Because we must guarantee that all tokens in the corpus fall under one of the nodes belonging to this section level. excludeElements : the names of elements for which no annotations will be generated. All node and edge features that target those elements will be filtered, so that there are no annotations that target non-existing annotations. asTsv : the text and anno files are written as tsv instead of json. The text files consist of one token per line. The newline token is written as . The anno files are written as one anno per line. The tab separated fields are anno id , kind , namespace , body , target . Any tab or newline in the body must be written as resp. . The tsv files will have exactly one header line. Caveat The WATM representation of the corpus is a faithful and complete representation of the TF dataset and hence of the TEI/PageXML source from which the TF dataset has been converted. Well, don't take this too literally, probably there are aspects where the different representations differ. I am aware of the following: If the TF has nodes whose slots are not an interval, the WATM will smooth that over: the target of those nodes will be the complete interval from its first slot to its last slot, including the gaps. The program will show warnings when this happens. Cases where this can happen are instances of text-critical elements in the TEI, where variant readings are given. When we construct sentences by means of NLP, we will exclude the non-chosen readings from the sentence, but these occupy slots between the start and the end of the sentence. Other cases occur where tokens, coming from the NLP, have been split because of intervening elements, which may leave an empty token. In such cases, the fragments of the original token are what ends up as tokens in the output, and they have the node type t , and not token . The TEI to TF conversion has lost the exact embedding of elements in the following case: Suppose element A contains the same words as element B. Then the TF data does not know whether A is a child of B or the other way round. This is repairable by adding parenthood edges between nodes when constructing the TF data. We should then also convert these TF edges to WATM annotations, for which we need structured targets: If n is the parent of m , we must make an annotation with body \"parent\" and target [n, m] . Something similar holds for the sibling relationship: if two nodes are adjacent in a TF dataset, we do not know whether they are siblings elements in the original XML. It is also possible to add sibling edges to the TF dataset. See tf.convert.tei under parentEdges and siblingEdges . The TF to WATM conversion forgets the types of feature values: it does not make a distinction between the integer 1 and the string \"1\" . This is repairable by creating annotations with structured bodies like {\"att\": value} instead of strings like att=value as we do now. In practice, the meaning of the features in TF are known, and hence the attributes in the WATM data, so this is not a blocking problem for now. The excludeElements setting will prevent some TF information from reaching the WATM." }, { "ref":"tf.convert.watm.rep", -"url":138, +"url":139, "doc":"Represent a boolean status for a message to the console. Parameters status: boolean Returns - string", "func":1 }, { "ref":"tf.convert.watm.WATM", -"url":138, +"url":139, "doc":"The export machinery is exposed as a class, wrapped around a TF dataset. 
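A minimal usage sketch, where yourOrg/yourRepo stands for your own corpus and the methods used are documented below: python
from tf.app import use
from tf.convert.watm import WATM

A = use("yourOrg/yourRepo")         # load the TF dataset of the corpus
WA = WATM(A, "tei", skipMeta=True)  # nsOrig="tei" for a corpus that comes from TEI

WA.makeText()   # generate the token files
WA.makeAnno()   # generate the annotations
WA.writeAll()   # write text and annotation files to disk
WA.testAll()    # check whether WATM and TF agree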
Wrap the WATM exporter around a TF dataset. Given an already loaded TF dataset, we make an inventory of all data we need to perform an export to WATM. Parameters app: object A loaded TF dataset, as obtained by a call use( .) . See tf.app.use nsOrig: string A namespace corresponding to the format of the original, pre-Text-Fabric representation. For example tei for a TEI corpus, pagexml for a PageXML corpus. The namespace is not related to XML namespaces, it is merely a device to categorize the resulting annotations. skipMeta: boolean, optional False Only relevant for TEI corpora. If True, all material in the TEI Header will not be converted to tokens in the text. More precisely: all TF slots for which the feature is_meta has a true-ish value will be skipped. If there is no feature is_meta in the dataset, the setting of skipMeta will have no effect: nothing will be excluded. extra: dictionary, optional {} The data for extra annotations, which will be generated on the fly under the namespace anno . The keys are the names of features/attributes, the value for each key is a dictionary that maps nodes to values. silent: boolean, optional False Whether to suppress output to the console" }, { "ref":"tf.convert.watm.WATM.makeText", -"url":138, +"url":139, "doc":"Creates the text data. The text is a list of tokens and will be stored in member text in this object. Additionally, the mapping from slot numbers in the TF data to indices in this list is stored in member waFromTF .", "func":1 }, { "ref":"tf.convert.watm.WATM.mkAnno", -"url":138, +"url":139, "doc":"Make a single annotation and return its id. Parameters kind: string The kind of annotation. ns: string The namespace of the annotation. body: string The body of the annotation. target: string or tuple of strings The target of the annotation.", "func":1 }, { "ref":"tf.convert.watm.WATM.makeAnno", -"url":138, +"url":139, "doc":"Make all annotations. The annotations are stored in a big list, in member anno of this object. The mapping from slots to indices in the list of tokens is now extended with the mapping from nodes to corresponding node annotations. So member waFromTF is now a full mapping from all nodes in TF to tokens and/or annotations in WATM.", "func":1 }, { "ref":"tf.convert.watm.WATM.writeAll", -"url":138, +"url":139, "doc":"Write text and annotation data to disk. The data will be written as JSON files, or, is asTsv is in force, as TSV files. When the annotation data grows larger than a certain threshold, it will be divided over several files. The annotations are sorted by annotation id.", "func":1 }, { "ref":"tf.convert.watm.WATM.numEqual", -"url":138, +"url":139, "doc":"Compare two numbers and report the outcome. Used for testing the WATM conversion. Parameters nTF: integer The number as it is counted from the original TF dataset. nWA: integer The number as it is counted from the generated WATM dataset. Returns - boolean Whether the two values are equal.", "func":1 }, { "ref":"tf.convert.watm.WATM.strEqual", -"url":138, +"url":139, "doc":"Compare two strings and report the outcome. Used for testing the WATM conversion. Parameters nTF: string The string as encountered in the original TF dataset. nWA: string The string as encountered in the generated WATM dataset. Returns - boolean Whether the two values are equal.", "func":1 }, { "ref":"tf.convert.watm.WATM.testAll", -"url":138, +"url":139, "doc":"Test all aspects of the WATM conversion. 
For all kinds of information, such as nodes, edges, features, tokens, annotations, we check whether the parts that should correspond between the TF dataset and the WATM annotations do so indeed. We present some statistics, and highlight the mismatches. Parameters condensed: boolean, optional False If silent has been passed to the object, there is still some output for each corpus, namely whether all tests have passed. If condensed is True, we suppress this output. Returns - boolean Whether all things that must agree do indeed agree.", "func":1 }, { "ref":"tf.convert.watm.WATM.testSetup", -"url":138, +"url":139, "doc":"Prepare the tests. We read the WATM dataset and store the tokens in member testTokens and the annotations in the member testAnnotations , and the node mapping in the member nodeFromAid . We unpack targets if they contain structured information.", "func":1 }, { "ref":"tf.convert.watm.WATM.testText", -"url":138, +"url":139, "doc":"Test the text. We test the number of tokens and the equality of the resulting text: whether the TF and WATM datasets agree on it. Returns - boolean Whether all these tests succeed.", "func":1 }, { "ref":"tf.convert.watm.WATM.testElements", -"url":138, +"url":139, "doc":"Test the elements. We test the annotations representing elements/processing instructions and check whether they correspond 1-1 to the non-slot nodes in the TF dataset. Returns - boolean Whether all these tests succeed.", "func":1 }, { "ref":"tf.convert.watm.WATM.testAttributes", -"url":138, +"url":139, "doc":"Test the attributes. We test whether attributes and features correspond to each other. Some attributes in the original TEI are converted in a special way into TF features: this holds for the rend attribute. Basically, a value rend=\"italic\" is translated into feature is_italic=1 . In turn, these features have been translated into annotations of kind format . We test them separately. Returns - boolean Whether all these tests succeed.", "func":1 }, { "ref":"tf.convert.watm.WATM.testExtra", -"url":138, +"url":139, "doc":"Test the extra data for on-the-fly annotations. Annotations that have been generated out of the data stored in the extra parameter with which the object has been initialized, all got the kind anno . Now we check these annotations against the data that went into it. Returns - boolean Whether all these tests succeed.", "func":1 }, { "ref":"tf.convert.watm.WATM.testEdges", -"url":138, +"url":139, "doc":"Test the edges. Edges in TF are links between nodes, and they translate into annotations of kind edge which target a pair of annotations: the from annotation, and the to annotation. Here we check whether the TF edges are faithfully and completely parallelled by annotations. Returns - boolean Whether all these tests succeed.", "func":1 }, { "ref":"tf.convert.watm.WATMS", -"url":138, +"url":139, "doc":"Export corpora that are divided over multiple TF datasets. We set up and run WATM objects for each TF dataset, and generate results for them separately. We assume that all corpora have been generated by the same method and originate from the same original format. They must reside in the same repository, in adjacent directories under the tf top-level directory of the repo. Collect the parameters for the WATM machinery. We will initialize many WATM objects with mostly the same parameters. These are collected when we initialize this object. Parameters org: string The organization of all TF datasets. repo: string The repo of all TF datasets. 
backend: string The backend of all TF datasets. nsOrig: string The original namespace of all TF datasets. See tf.convert.watm.WATM . skipMeta: boolean, optional False See tf.convert.watm.WATM . extra: dictionary, optional {} See tf.convert.watm.WATM . silent: boolean, optional False Whether to operate in silence." }, { "ref":"tf.convert.watm.WATMS.produce", -"url":138, +"url":139, "doc":"Convert all relevant TF datasets. Parameters doc: string, optional None Subdirectory where one of the TF datasets resides. If passed, only this dataset will be converted. Otherwise all datasets will be converted.", "func":1 }, { "ref":"tf.convert.mql", -"url":139, +"url":140, "doc":" MQL You can interchange with [MQL data](https: emdros.org). TF can read and write MQL dumps. An MQL dump is a text file, like an SQL dump. It contains the instructions to create and fill a complete database. Correspondence TF and MQL After exporting a TF dataset to MQL, the resulting MQL database has the following properties with respect to the TF dataset it comes from: the TF slots correspond exactly with the MQL monads and have the same numbers; provided the monad numbers in the MQL dump are consecutive. In MQL this is not obligatory. Even if there gaps in the monads sequence, we will fill the holes during conversion, so the slots are tightly consecutive; the TF nodes correspond exactly with the MQL objects and have the same numbers Node features in MQL The values of TF features are of two types, int and str , and they translate to corresponding MQL types integer and string . The actual values do not undergo any transformation. That means that in MQL queries, you use quotes if the feature is a string feature. Only if the feature is a number feature, you may omit the quotes: [word sp='verb'] [verse chapter=1 and verse=1] Enumeration types It is attractive to use enumeration types for the values of a feature, where ever possible, because then you can query those features in MQL with IN and without quotes: [chapter book IN (Genesis, Exodus)] We will generate enumerations for eligible features. Integer values can already be queried like this, even if they are not part of an enumeration. So we restrict ourselves to node features with string values. We put the following extra restrictions: the number of distinct values is less than 1000 all values must be legal C names, in practice: starting with a letter, followed by letters, digits, or _ . The letters can only be plain ASCII letters, uppercase and lowercase. Features that comply with these restrictions will get an enumeration type. Currently, we provide no ways to configure this in more detail. Instead of creating separate enumeration types for individual features, we collect all enumerated values for all those features into one big enumeration type. The reason is that MQL considers equal values in different types as distinct values. If we had separate types, we could never compare values for different features. There is no place for edge values in MQL. There is only one concept of feature in MQL: object features, which are node features. But TF edges without values can be seen as node features: nodes are mapped onto sets of nodes to which the edges go. And that notion is supported by MQL: edge features are translated into MQL features of type LIST OF id_d , i.e. lists of object identifiers. ! caution \"Legal names in MQL\" MQL names for databases, object types and features must be valid C identifiers (yes, the computer language C). 
The requirements for names are: start with a letter (ASCII, upper-case or lower-case) followed by any sequence of ASCII upper / lower-case letters or digits or underscores ( _ ) avoid being a reserved word in the C language So, we have to change names coming from TF if they are invalid in MQL. We do that by replacing illegal characters by _ , and, if the result does not start with a letter, we prepend an x . We do not check whether the name is a reserved C word. With these provisos: the given dbName corresponds to the MQL database name the TF otypes correspond to the MQL objects the TF features correspond to the MQL features The MQL export is usually quite massive (500MB for the Hebrew Bible). It can be compressed greatly, especially by the program bzip2 . ! caution \"Existing database\" If you try to import an MQL file in Emdros, and there exists already a file or directory with the same name as the MQL database, your import will fail spectacularly. So do not do that. A good way to prevent clashes: export the MQL to outside your ~/text-fabric-data directory, e.g. to ~/Downloads ; before importing the MQL file, delete the previous copy; Delete existing copy: sh cd ~/Downloads rm dataset ; mql -b 3 < dataset.mql " }, { "ref":"tf.convert.mql.exportMQL", -"url":139, +"url":140, "doc":"Exports the complete TF dataset into a single MQL database. Parameters app: object A tf.advanced.app.App object, which holds the corpus data that will be exported to MQL. mqlDb: string Name of the MQL database exportDir: string, optional None Directory where the MQL data will be saved. If None is given, it will end up in the same repo as the dataset, in a new top-level subdirectory called mql . The exported data will be written to file exportDir/mqlDb.mql . If exportDir starts with ~ , the ~ will be expanded to your home directory. Likewise, .. will be expanded to the parent of the current directory, and . to the current directory, both only at the start of exportDir . Returns - None See Also tf.convert.mql", "func":1 }, { "ref":"tf.convert.mql.importMQL", -"url":139, +"url":140, "doc":"Converts an MQL database dump to a TF dataset. Parameters mqlFile: string Path to the file which contains the MQL code. saveDir: string Path to where a new TF app will be created. silent: string How silent the newly created TF object must be. slotType: string You have to tell which object type in the MQL file acts as the slot type, because TF cannot see that on its own. otext: dict You can pass the information about sections and text formats as the parameter otext . This info will end up in the otext.tf feature. Pass it as a dictionary of keys and values, like so: otext = { 'fmt:text-trans-plain': '{glyphs}{trailer}', 'sectionFeatures': 'book,chapter,verse', } meta: dict Likewise, you can add a dictionary keyed by features that will be added to the metadata of the corresponding features. You may also add metadata for the empty feature \"\" , this will be added to the metadata of all features. Handy to add provenance data there. Example: meta = { \"\": dict( dataset='DLC', datasetName='Digital Language Corpus', author=\"That's me\", ), \"sp\": dict( description=\"part-of-speech\", ), } ! note \"description\" TF will display all metadata information under the key description in a more prominent place than the other metadata. ! caution \" value type \" Do not pass the value types of the features here.
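By way of illustration, a minimal sketch of both directions; the corpus name, the file paths and the slot type are placeholders: python
from tf.app import use
from tf.convert.mql import exportMQL, importMQL

# export: from a loaded TF dataset to an MQL dump outside your data directory
A = use("yourOrg/yourRepo")
exportMQL(A, "mydataset", exportDir="~/Downloads")  # writes ~/Downloads/mydataset.mql

# import: from an MQL dump back to a TF dataset
TF = importMQL(
    "~/Downloads/mydataset.mql",
    saveDir="~/Downloads/mydataset-tf",
    slotType="word",
    otext={
        "fmt:text-trans-plain": "{glyphs}{trailer}",
        "sectionFeatures": "book,chapter,verse",
    },
)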
Returns - object A tf.core.fabric.FabricCore object holding the conversion result of the MQL data into TF.", "func":1 }, { "ref":"tf.convert.mql.MQL", -"url":139, +"url":140, "doc":"" }, { "ref":"tf.convert.mql.MQL.write", -"url":139, +"url":140, "doc":"", "func":1 }, { "ref":"tf.convert.mql.makeuni", -"url":139, +"url":140, "doc":"Make proper UNICODE of a text that contains byte escape codes such as backslash xb6 ", "func":1 }, { "ref":"tf.convert.mql.uni", -"url":139, +"url":140, "doc":"", "func":1 }, { "ref":"tf.convert.mql.tfFromMql", -"url":139, +"url":140, "doc":"Generate TF from MQL Parameters tmObj: object A tf.core.timestamp.Timestamp object mqlFile, slotType, otype, meta: mixed See tf.convert.mql.importMQL ", "func":1 }, { "ref":"tf.convert.mql.parseMql", -"url":139, +"url":140, "doc":"", "func":1 }, { "ref":"tf.convert.mql.tfFromData", -"url":139, +"url":140, "doc":"", "func":1 }, { "ref":"tf.convert.helpers", -"url":140, +"url":141, "doc":"" }, { "ref":"tf.convert.helpers.SECTION_MODELS", -"url":140, +"url":141, "doc":"Models for sections. A section is a part of the corpus that is defined by a set of files, or by elements within a single TEI source file. A model" }, { "ref":"tf.convert.helpers.SECTION_MODEL_DEFAULT", -"url":140, +"url":141, "doc":"Default model for sections." }, { "ref":"tf.convert.helpers.CM_LIT", -"url":140, +"url":141, "doc":"The value is taken literally from a TEI attribute. Code tei , since there is a 1-1 correspondence with the TEI source." }, { "ref":"tf.convert.helpers.CM_LITP", -"url":140, +"url":141, "doc":"The value results from straightforward processing of material in the TEI. Code tei , since there is a direct correspondence with the TEI source. Straightforward means: by taking into account the semantics of XML. Examples: Generated white-space based on whether elements are pure or mixed; Edges between parent and child elements, or sibling elements." }, { "ref":"tf.convert.helpers.CM_LITC", -"url":140, +"url":141, "doc":"The value is results from more intricate processing of material in the TEI. More intricate means : we derive data that goes beyond pure XML syntax. Examples: The values of the rend attributes are translated into rend_ value features; Adding features is_meta (being inside the TEI-header) and is_note (being inside a note); The feature that gives the content of a (character) slot; Decomposing strings into words material and after-word material. Code tf , since this is for the benefit of the resulting TF dataset." }, { "ref":"tf.convert.helpers.CM_PROV", -"url":140, +"url":141, "doc":"The value is added by the conversion to TF w.r.t. the material in the TEI. Examples: Slots in empty elements, in order to anchor the element to the text sequence; Section levels, based on the folder and file that the TEI source is in; A section level within the TEI, defined from several elements and the way they are nested; Code tf , since this is for the benefit of the resulting TF dataset." }, { "ref":"tf.convert.helpers.CM_NLP", -"url":140, +"url":141, "doc":"The value is added by an NLP pipeline w.r.t. the material in the TEI. Code nlp , since this comes from third party software. Examples: The feature nsent which gives the sentence number in the corpus. Sentences are not encoded in the TEI, but detected by an NLP program such as Spacy." }, { "ref":"tf.convert.helpers.CONVERSION_METHODS", -"url":140, +"url":141, "doc":"Information about the conversion. 
When we produce TF features, we specify a bit of information in the feature metadata as how we arrived at the specific value. That information ends up in two keys: conversionMethod : with values any of: CM_LIT CM_LITP CM_LITC CM_PROV CM_NLP conversionCode : the value is derived from conversionMethod by looking it up in this table. These values can be used to qualify the name of the attribute for further processing. For example, if you have a feature n that originates literally from the TEI, you could pass it on as tei:n . But if you have a feature chapter that is provided by the conversion, you could pass it on as tf:chapter . This passing on is a matter of other software, that takes the generated TF as input and processes it further, e.g. as annotations. ! note \"More methods and codes\" The TEI conversion is customizable by providing your own methods to several hooks in the program. These hooks may generate extra features, which you can give metadata in the tei.yaml file next to the tei.py file where you define the custom functions. It is advised to state appropriate values for the conversionMethod and conversionCode fields of these features. Examples: A feature country is derived from specific elements in the TEI Header, and defined for nodes of type letter . This happens in order to support the software of Team Text that shows the text on a webpage. In such a case you could define conversionMethod=\"derived\" conversionCode=\"tt\"" }, { "ref":"tf.convert.helpers.getWhites", -"url":140, +"url":141, "doc":"", "func":1 }, { "ref":"tf.convert.helpers.tokenize", -"url":140, +"url":141, "doc":"", "func":1 }, { "ref":"tf.convert.helpers.repTokens", -"url":140, +"url":141, "doc":"", "func":1 }, { "ref":"tf.convert.helpers.checkModel", -"url":140, +"url":141, "doc":"", "func":1 }, { "ref":"tf.convert.helpers.matchModel", -"url":140, +"url":141, "doc":"", "func":1 }, { "ref":"tf.convert.helpers.setUp", -"url":140, +"url":141, "doc":"", "func":1 }, { "ref":"tf.convert.helpers.tweakTrans", -"url":140, +"url":141, "doc":"", "func":1 }, { "ref":"tf.convert.helpers.lookupSource", -"url":140, +"url":141, "doc":"Looks up information from the current XML stack. The current XML stack contains the ancestry of the current node, including the current node itself. It is a list of components, corresponding to the path from the root node to the current node. Each component is a tuple, consisting of the tag name and the attributes of an XML node. Against this stack a sequence of instructions, given in specs , is executed. These instructions collect information from the stack, under certain conditions, and put that information into a feature, as value for a certain node. Here is an example of a single instruction: Parameters cv: object The converter object, needed to issue actions. cur: dict Various pieces of data collected during walking and relevant for some next steps in the walk. specs: tuple A sequence of instructions what to look for. Each instruction has the following parts: pathSpec nodeType featureName The effect is: The pathSpec is compared to the current XML stack. If it matches the current node, the text content of the current node or one of its attributes will be collected and put in a feature with name featureName , for the current TF node of type nodeType . The pathSpec is a list of components. The first component should match the top of the XML stack, the second component the element that is below the top, etc. 
Each component is a tuple of a tag name; a dictionary of attribute values; The first component may have a tag name that has @ plus an attribute name appended to it. That means that the information will be extracted from that attribute, not from the content of the element.", "func":1 }, { "ref":"tf.convert.pagexml", -"url":141, +"url":142, "doc":"" }, { "ref":"tf.convert.pagexml.setUp", -"url":141, +"url":142, "doc":"", "func":1 }, { "ref":"tf.convert.pagexml.diverge", -"url":141, +"url":142, "doc":"", "func":1 }, { "ref":"tf.convert.pagexml.tokenLogic", -"url":141, +"url":142, "doc":"", "func":1 }, { "ref":"tf.convert.pagexml.emptySlot", -"url":141, +"url":142, "doc":"", "func":1 }, { "ref":"tf.convert.pagexml.linebreakSlot", -"url":141, +"url":142, "doc":"", "func":1 }, { "ref":"tf.convert.pagexml.walkObject", -"url":141, +"url":142, "doc":"Internal function to deal with a single element. Will be called recursively. Parameters cv: object The converter object, needed to issue actions. cur: dict Various pieces of data collected during walking and relevant for some next steps in the walk. The subdictionary cur[\"node\"] is used to store the currently generated nodes by node type. bj xode: object An PageXML object.", "func":1 }, { "ref":"tf.convert.pagexml.PageXML", -"url":141, +"url":142, "doc":"Converts PageXML to TF. Below we describe how to control the conversion machinery. Based on current directory from where the script is called, it defines all the ingredients to carry out a tf.convert.walker conversion of the PageXML input. This function is assumed to work in the context of a repository, i.e. a directory on your computer relative to which the input directory exists, and various output directories: tf , app , docs . The repoDir must be at ~/backend/org/repo/relative where ~ is your home directory; backend is an online back-end name, like github , gitlab , git.huc.knaw.nl ; org is an organization, person, or group in the back-end; repo is a repository in the org . relative is a directory path within the repo (0 or more components) This is only about the directory structure on your local computer; it is not required that you have online incarnations of your repository in that back-end. Even your local repository does not have to be a git repository. The only thing that matters is that the full path to your repo can be parsed as a sequence of home/backend/org/repo/relative . Relative to this directory the program expects and creates input / output directories. source/version directory The source directory is specified by sourceDir , and within it are version directories. Document directories These are the top-level directories within the version directories. They correspond to individual documents. Documents typically contain a set of pages. Input directories per document image : contain the scan images meta : contain metadata files page : contain the PageXML files The files in image and page have names that consist of a 4-digit number with leading zeros, and any two files with the same name in image and page represent the same document. Output directories tf The directory under which the TF output file (with extension .tf ) are placed. If it does not exist, it will be created. The TF files will be generated in a folder named by a version number, passed as tfVersion . app and docs Location of additional TF app configuration and documentation files. If they do not exist, they will be created with some sensible default settings and generated documentation. 
These settings can be overridden in the app/config_custom.yaml file. Also a default display.css file and a logo are added. docs Location of additional documentation. This can be generated or hand-written material, or a mixture of the two. Parameters sourceDir: string The location of the source directory repoDir: string The location of the target repo where the TF data is generated. source: string, optional If empty, use the latest version under the source directory with sources. Otherwise it should be a valid integer, and it is the index in the sorted list of versions there. 0 or latest : latest version; -1 , -2 , . : previous version, version before previous, .; 1 , 2 , .: first version, second version, everything else that is not a number is an explicit version If the value cannot be parsed as an integer, it is used as the exact version name. tf: string, optional If empty, the TF version used will be the latest one under the tf directory. If it can be parsed as the integers 1, 2, or 3 it will bump the latest relevant TF version: 0 or latest : overwrite the latest version 1 will bump the major version 2 will bump the intermediate version 3 will bump the minor version everything else is an explicit version Otherwise, the value is taken as the exact version name. verbose: integer, optional -1 Produce no (-1), some (0) or many (1) progress and reporting messages" }, { "ref":"tf.convert.pagexml.PageXML.getDirector", -"url":141, +"url":142, "doc":"Factory for the director function. The tf.convert.walker relies on a corpus dependent director function that walks through the source data and spits out actions that produces the TF dataset. Also some special additions need to be programmed, such as an extra section level, word boundaries, etc. We collect all needed data, store it, and define a local director function that has access to this data. Returns - function The local director function that has been constructed.", "func":1 }, { "ref":"tf.convert.pagexml.PageXML.getConverter", -"url":141, +"url":142, "doc":"Initializes a converter. Returns - object The tf.convert.walker.CV converter object, initialized.", "func":1 }, { "ref":"tf.convert.pagexml.PageXML.convertTask", -"url":141, +"url":142, "doc":"Implementation of the \"convert\" task. It sets up the tf.convert.walker machinery and runs it. Returns - boolean Whether the conversion was successful.", "func":1 }, { "ref":"tf.convert.pagexml.PageXML.loadTask", -"url":141, +"url":142, "doc":"Implementation of the \"load\" task. It loads the TF data that resides in the directory where the \"convert\" task deliver its results. During loading there are additional checks. If they succeed, we have evidence that we have a valid TF dataset. Also, during the first load intensive pre-computation of TF data takes place, the results of which will be cached in the invisible .tf directory there. That makes the TF data ready to be loaded fast, next time it is needed. Returns - boolean Whether the loading was successful.", "func":1 }, { "ref":"tf.convert.pagexml.PageXML.appTask", -"url":141, +"url":142, "doc":"Implementation of the \"app\" task. It creates / updates a corpus-specific app plus specific documentation files. There should be a valid TF dataset in place, because some settings in the app derive from it. It will also read custom additions that are present in the target app directory. These files are: about_custom.md : A markdown file with specific colophon information about the dataset. In the generated file, this information will be put at the start. 
transcription_custom.md : A markdown file with specific encoding information about the dataset. In the generated file, this information will be put at the start. config_custom.yaml : A YAML file with configuration data that will be merged into the generated config.yaml. app_custom.py : A python file with named snippets of code to be inserted at corresponding places in the generated app.py display_custom.css : Additional CSS definitions that will be appended to the generated display.css . If the TF app for this resource needs custom code, this is the way to retain that code between automatic generation of files. Returns - boolean Whether the operation was successful.", "func":1 }, { "ref":"tf.convert.pagexml.PageXML.browseTask", -"url":141, +"url":142, "doc":"Implementation of the \"browse\" task. It gives a shell command to start the TF browser on the newly created corpus. There should be a valid TF dataset and app configuration in place Returns - boolean Whether the operation was successful.", "func":1 }, { "ref":"tf.convert.pagexml.PageXML.task", -"url":141, +"url":142, "doc":"Carry out any task, possibly modified by any flag. This is a higher level function that can execute a selection of tasks. The tasks will be executed in a fixed order: convert , load , app , browse . But you can select which one(s) must be executed. If multiple tasks must be executed and one fails, the subsequent tasks will not be executed. Parameters convert: boolean, optional False Whether to carry out the convert task. load: boolean, optional False Whether to carry out the load task. app: boolean, optional False Whether to carry out the app task. browse: boolean, optional False Whether to carry out the browse task\" verbose: integer, optional -1 Produce no (-1), some (0) or many (1) progress and reporting messages Returns - boolean Whether all tasks have executed successfully.", "func":1 }, @@ -7602,346 +7655,346 @@ INDEX=[ }, { "ref":"tf.convert.pagexml.main", -"url":141, +"url":142, "doc":"", "func":1 }, { "ref":"tf.writing", -"url":142, +"url":143, "doc":" Writing systems support Transliteration tables for various writing systems. One can pass a language code to TF. When TF displays text (e.g. in tf.advanced.display ) the language code may trigger the writing direction and the choice of font. Here are the ones that have an effect: iso | language - | - akk | akkadian hbo | hebrew syc | syriac uga | ugaritic ara | arabic grc | greek cld | neo aramaic Default: : string " }, { "ref":"tf.writing.transcription", -"url":143, +"url":144, "doc":" Transcription TF has support for several writing systems, by means of transcription tables and fonts that will be invoked when displaying the main text. It also calls functions to use these tables for converting Hebrew and Syriac text material to transliterated representations and back. There is also a phonetic transcription for Hebrew, designed in [phono.ipynb](https: nbviewer.jupyter.org/github/etcbc/phono/blob/master/programs/phono.ipynb) Character tables and fonts hbo Hebrew tf.writing.hebrew : full list of characters covered by the ETCBC and phonetic transcriptions Font Ezra SIL . syc Syriac tf.writing.syriac : full list of characters covered by the ETCBC transcriptions Font Estrangelo Edessa . ara Arabic tf.writing.arabic : full list of characters covered by the transcription used for the Quran Font AmiriQuran . grc Greek Font Gentium . akk Akkadian Font Santakku . uga Ugaritic Font Santakku . cld Neo Aramaic Font CharisSIL-R ." 
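For example, converting between ETCBC transliteration and UNICODE Hebrew with the Transcription class documented below: python
from tf.writing.transcription import Transcription

tr = Transcription()

# ETCBC transliteration to UNICODE Hebrew
print(Transcription.to_hebrew("HAC.@MA73JIm"))

# and back, from UNICODE Hebrew to ETCBC transliteration
print(tr.from_hebrew("\u05d4\u05b8\u05d0\u05b8\u05bd\u05e8\u05b6\u05e5\u05c3"))  # -> H@>@95REY00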
}, { "ref":"tf.writing.transcription.Transcription", -"url":143, +"url":144, "doc":"Conversion between UNICODE and various transcriptions. Usage notes: Invoke the transcription functionality as follows: from tf.writing.transcription import Transcription Some of the attributes and methods below are class attributes, others are instance attributes. A class attribute aaa can be retrieved by saying python Transcription.aaa To retrieve an instance attribute, you need an instance first, like python tr = Transcription() tr.aaa " }, { "ref":"tf.writing.transcription.Transcription.decomp", -"url":143, +"url":144, "doc":"" }, { "ref":"tf.writing.transcription.Transcription.hebrew_mapping", -"url":143, +"url":144, "doc":"Maps all ETCBC transliteration character combinations for Hebrew to UNICODE. Example: sof-pasuq: python Transcription.hebrew_mapping['00'] Output: \u05c3 " }, { "ref":"tf.writing.transcription.Transcription.hebrew_cons", -"url":143, +"url":144, "doc":"" }, { "ref":"tf.writing.transcription.Transcription.trans_final_pat", -"url":143, +"url":144, "doc":"" }, { "ref":"tf.writing.transcription.Transcription.trans_hebrew_pat", -"url":143, +"url":144, "doc":"" }, { "ref":"tf.writing.transcription.Transcription.swap_accent_pat", -"url":143, +"url":144, "doc":"" }, { "ref":"tf.writing.transcription.Transcription.remove_accent_pat", -"url":143, +"url":144, "doc":"" }, { "ref":"tf.writing.transcription.Transcription.remove_point_pat", -"url":143, +"url":144, "doc":"" }, { "ref":"tf.writing.transcription.Transcription.remove_psn_pat", -"url":143, +"url":144, "doc":"" }, { "ref":"tf.writing.transcription.Transcription.remove_psq_pat", -"url":143, +"url":144, "doc":"" }, { "ref":"tf.writing.transcription.Transcription.shin_pat", -"url":143, +"url":144, "doc":"" }, { "ref":"tf.writing.transcription.Transcription.ph_simple_pat", -"url":143, +"url":144, "doc":"" }, { "ref":"tf.writing.transcription.Transcription.noorigspace", -"url":143, +"url":144, "doc":"" }, { "ref":"tf.writing.transcription.Transcription.ugaritic_mappingi", -"url":143, +"url":144, "doc":"Maps Ugaritic unicode characters to their conventional transliteration characters. Unidentified characters: x (damaged ?) / (alternative ?) only twice, in atyp\u02e4tba/r and xxxxl/d\u2026 , (comma) only once in a very long word starting at 551 . km,ad . (brackets marking uncertainty ?) \u2026 (unreadable ?) 00a0 (non-breaking space)" }, { "ref":"tf.writing.transcription.Transcription.ugaritic_mapping", -"url":143, +"url":144, "doc":"" }, { "ref":"tf.writing.transcription.Transcription.syriac_mapping_simple", -"url":143, +"url":144, "doc":"" }, { "ref":"tf.writing.transcription.Transcription.syriac_mapping_pil", -"url":143, +"url":144, "doc":"" }, { "ref":"tf.writing.transcription.Transcription.syriac_mapping", -"url":143, +"url":144, "doc":"Maps all ETCBC transliteration character combinations for Syriac to UNICODE. Example: semkath-final: python Transcription.syriac_mapping['s'] Output: \u0724 " }, { "ref":"tf.writing.transcription.Transcription.trans_syriac_pat", -"url":143, +"url":144, "doc":"" }, { "ref":"tf.writing.transcription.Transcription.arabic_mapping", -"url":143, +"url":144, "doc":"Maps an Arabic transliteration character to UNICODE. This is the mapping used in the Quran representation on tanzil.net. 
Example: beh python Transcription.syriac_mapping['b'] Output: \u0628 Maps an Arabic letter in UNICODE to its transliteration Example: beh transliteration python Transcription.syriac_mapping['\u0628'] Output: b " }, { "ref":"tf.writing.transcription.Transcription.arabic_mappingi", -"url":143, +"url":144, "doc":"" }, { "ref":"tf.writing.transcription.Transcription.arabicTrans", -"url":143, +"url":144, "doc":"More Arabic transcriptions: column 1: custom [Quran-tanzil](http: tanzil.net/ 1:1), slightly extended column 2: ascii resp. latin plus diacritics also known as betacode. We use a list compiled by [Peter Verkinderen](https: pverkind.github.io/betacodeTranscriber/js/betacode.js) column 4: standard (Library of Congress) (to-be filled). We use the [arabic romanization list of 2012](https: www.loc.gov/catdir/cpso/romanization/arabic.pdf) We refrain of from applying rules that cannot be computed without lexical/grammatical/dialectical knowledge of the arabic language." }, { "ref":"tf.writing.transcription.Transcription.arabicTransQuran", -"url":143, +"url":144, "doc":"" }, { "ref":"tf.writing.transcription.Transcription.arabicTransAscii", -"url":143, +"url":144, "doc":"" }, { "ref":"tf.writing.transcription.Transcription.arabicTransLatin", -"url":143, +"url":144, "doc":"" }, { "ref":"tf.writing.transcription.Transcription.arabicTransStandard", -"url":143, +"url":144, "doc":"" }, { "ref":"tf.writing.transcription.Transcription.ara", -"url":143, +"url":144, "doc":"" }, { "ref":"tf.writing.transcription.Transcription.qur", -"url":143, +"url":144, "doc":"" }, { "ref":"tf.writing.transcription.Transcription.asc", -"url":143, +"url":144, "doc":"" }, { "ref":"tf.writing.transcription.Transcription.lat", -"url":143, +"url":144, "doc":"" }, { "ref":"tf.writing.transcription.Transcription.std", -"url":143, +"url":144, "doc":"" }, { "ref":"tf.writing.transcription.Transcription.quranFromArabic", -"url":143, +"url":144, "doc":"", "func":1 }, { "ref":"tf.writing.transcription.Transcription.asciiFromArabic", -"url":143, +"url":144, "doc":"", "func":1 }, { "ref":"tf.writing.transcription.Transcription.latinFromArabic", -"url":143, +"url":144, "doc":"", "func":1 }, { "ref":"tf.writing.transcription.Transcription.standardFromArabic", -"url":143, +"url":144, "doc":"", "func":1 }, { "ref":"tf.writing.transcription.Transcription.sycSplitPunc", -"url":143, +"url":144, "doc":"", "func":1 }, { "ref":"tf.writing.transcription.Transcription.suffix_and_finales", -"url":143, +"url":144, "doc":"Given an ETCBC transliteration, split it into the word material and the interword material that follows it (space, punctuation). Replace the last consonant of the word material by its final form, if applicable. Output a tuple with the modified word material and the interword material. 
Example: python Transcription.suffix_and_finales('71T_H@>@95REY00') Output: ('71T_H@>@95REy', '00 ') Note that the Y has been replaced by y .", "func":1 }, { "ref":"tf.writing.transcription.Transcription.suppress_space", -"url":143, +"url":144, "doc":"Given an ETCBC transliteration of a word, match the end of the word for punctuation and spacing characters ( sof pasuq , paseq , nun hafukha , setumah , petuhah , space, no-space) Example: python Transcription.suppress_space('B.:&') Transcription.suppress_space('B.@R@74>') Transcription.suppress_space('71T_H@>@95REY00') Output: None ", "func":1 }, { "ref":"tf.writing.transcription.Transcription.to_etcbc_v", -"url":143, +"url":144, "doc":"Given an ETCBC transliteration of a fully pointed word, strip all the non-vowel pointing (i.e. the accents). Example: python Transcription.to_etcbc_v('HAC.@MA73JIm') Output: HAC.@MAJIm ", "func":1 }, { "ref":"tf.writing.transcription.Transcription.to_etcbc_c", -"url":143, +"url":144, "doc":"Given an ETCBC transliteration of a fully pointed word, strip everything except the consonants. Punctuation will also be stripped. Example: python Transcription.to_etcbc_c('HAC.@MA73JIm') Output: H MJM Note that the pointed shin ( C ) is replaced by an unpointed one ( ).", "func":1 }, { "ref":"tf.writing.transcription.Transcription.to_hebrew", -"url":143, +"url":144, "doc":"Given a transliteration of a fully pointed word, produce the word in UNICODE Hebrew. Care will be taken that vowel pointing will be added to consonants before accent pointing. Example: python Transcription.to_hebrew('HAC.@MA73JIm') Output: \u05d4\u05b7\ufb2a\u05bc\u05b8\u05de\u05b7\u0596\u05d9\u05b4\u05dd ", "func":1 }, { "ref":"tf.writing.transcription.Transcription.to_hebrew_v", -"url":143, +"url":144, "doc":"Given a transliteration of a fully pointed word, produce the word in UNICODE Hebrew, but without the accents. Example: python Transcription.to_hebrew_v('HAC.@MA73JIm') Output: \u05d4\u05b7\ufb2a\u05bc\u05b8\u05de\u05b7\u05d9\u05b4\u05dd ", "func":1 }, { "ref":"tf.writing.transcription.Transcription.to_hebrew_c", -"url":143, +"url":144, "doc":"Given a transliteration of a fully pointed word, produce the word in UNICODE Hebrew, but without the pointing. Example: python Transcription.to_hebrew_c('HAC.@MA73JIm') Output: \u05d4\u05e9\u05de\u05d9\u05de Note that final consonant forms are not being used.", "func":1 }, { "ref":"tf.writing.transcription.Transcription.to_hebrew_x", -"url":143, +"url":144, "doc":"Given a transliteration of a fully pointed word, produce the word in UNICODE Hebrew, but without the pointing. Vowel pointing and accent pointing will be applied in the order given by the input word. Example: python Transcription.to_hebrew_x('HAC.@MA73JIm') Output: \u05d4\u05b7\ufb2a\u05bc\u05b8\u05de\u05b7\u0596\u05d9\u05b4\u05dd ", "func":1 }, { "ref":"tf.writing.transcription.Transcription.ph_simplify", -"url":143, +"url":144, "doc":"Given a phonological transliteration of a fully pointed word, produce a more coarse phonological transliteration. 
Example: python Transcription.ph_simplify('\u0294\u1d49l\u014dh\u02c8\u00eem') Transcription.ph_simplify('m\u0101q\u02c8\u00f4m') Transcription.ph_simplify('kol') Output: \u0294l\u014dh\u00eem m\u00e5q\u00f4m k\u00e5l Note that the simplified version transliterates the qamets gadol and qatan to the same character.", "func":1 }, { "ref":"tf.writing.transcription.Transcription.from_hebrew", -"url":143, +"url":144, "doc":"Given a fully pointed word in UNICODE Hebrew, produce the word in ETCBC transliteration. Example: python tr.from_hebrew('\u05d4\u05b8\u05d0\u05b8\u05bd\u05e8\u05b6\u05e5\u05c3') Output: H@>@95REy00 ", "func":1 }, { "ref":"tf.writing.transcription.Transcription.to_syriac", -"url":143, +"url":144, "doc":"Given a word in ETCBC transliteration, produce the word in UNICODE Syriac. Example: python tr.to_syriac('MKSJN') Output: \u0721\u071f\u0723\u071d\u0722 ", "func":1 }, { "ref":"tf.writing.transcription.Transcription.from_syriac", -"url":143, +"url":144, "doc":"Given a word in UNICODE Syriac, produce the word in ETCBC transliteration. Example: python tr.from_syriac('\u0721\u071f\u0723\u071d\u0722') Output: MKSJN ", "func":1 }, { "ref":"tf.writing.transcription.Transcription.can_to_syriac", -"url":143, +"url":144, "doc":"", "func":1 }, { "ref":"tf.writing.transcription.Transcription.can_from_syriac", -"url":143, +"url":144, "doc":"", "func":1 }, { "ref":"tf.writing.transcription.Transcription.to_ugaritic", -"url":143, +"url":144, "doc":"Given a word in transliteration, produce the word in UNICODE Ugaritic. k\u1e6fbx \ud800\udf8b\ud800\udf98\ud800\udf81x Example: python Transcription.to_ugaritic('k\u1e6fbx') Output: \ud800\udf8b\ud800\udf98\ud800\udf81x ", "func":1 }, { "ref":"tf.writing.transcription.Transcription.from_ugaritic", -"url":143, +"url":144, "doc":"Given a word in UNICODE Ugaritic, produce the word in transliteration. Example: python Transcription.from_ugaritic('\ud800\udf8b\ud800\udf98\ud800\udf81x') Output: k\u1e6fbx ", "func":1 }, { "ref":"tf.writing.transcription.Transcription.to_arabic", -"url":143, +"url":144, "doc":"Given a word in transliteration, produce the word in UNICODE Arabic. Example: python Transcription.to_arabic('bisomi') Output: \u0628\u0650\u0633\u0652\u0645\u0650 ", "func":1 }, { "ref":"tf.writing.transcription.Transcription.from_arabic", -"url":143, +"url":144, "doc":"Given a word in UNICODE Arabic, produce the word in transliteration. 
Example: python Transcription.from_arabic('\u0628\u0650\u0633\u0652\u0645\u0650') Output: bisomi ", "func":1 }, { "ref":"tf.writing.greek", -"url":144, +"url":145, "doc":" Greek characters [Greek script in UNICODE](https: en.wikipedia.org/wiki/Greek_alphabet Greek_in_Unicode)" }, { "ref":"tf.writing.arabic", -"url":145, +"url":146, "doc":" Arabic characters @font-face { font-family: AmiriQuran; src: url('https: github.com/annotation/text-fabric/blob/master/tf/browser/static/fonts/AmiriQuran.woff2') format('woff2'), url('https: github.com/annotation/text-fabric/blob/master/tf/browser/static/fonts/AmiriQuran.woff') format('woff'), url('https: github.com/annotation/text-fabric/blob/master/tf/browser/static/fonts/AmiriQuran.ttf') format('truetype'); } body { font-family: sans-serif; } table.chars { border-collapse: collapse; } table.chars thead tr { color: ffffff; background-color: 444444; } table.chars tbody td { border: 2px solid bbbbbb; padding: 0.1em 0.5em; } h1.chars { margin-top: 1em; } .t { font-family: monospace; font-size: large; color: 0000ff; } .g { font-family: \"AmiriQuran\", sans-serif; font-size: x-large; } .p { font-family: monospace; font-size: large; color: 666600; } .r { font-family: sans-serif; font-size: small; color: 555555; } .n { font-family: sans-serif; color: 990000; font-size: small; } .u { font-family: monospace; color: 990000; } Letters quran / tanzil ASCII latin standard glyph remarks name UNICODE ' ' \u02be ' \u0621 ARABIC LETTER HAMZA 0621 A & x005f;a \u0101 \u0101 \u0627 ARABIC LETTER ALEF 0627 b b b b \u0628 ARABIC LETTER BEH 0628 p =t \u0167 t \u0629 ARABIC LETTER TEH MARBUTA 0629 t t t t \u062a ARABIC LETTER TEH 062a v & x005f;t \u1e6f th \u062b ARABIC LETTER THEH 062b j j \u01e7 j \u062c ARABIC LETTER JEEM 062c H & x002a;h \u1e25 \u1e25 \u062d ARABIC LETTER HAH 062d x & x005f;h \u1e2b kh \u062e ARABIC LETTER KHAH 062e d d d d \u062f ARABIC LETTER DAL 062f & x002a; & x005f;d \u1e0f dh \u0630 ARABIC LETTER THAL 0630 r r r r \u0631 ARABIC LETTER REH 0631 z z z z \u0632 ARABIC LETTER ZAIN 0632 s s s s \u0633 ARABIC LETTER SEEN 0633 $ ^s \u0161 sh \u0634 ARABIC LETTER SHEEN 0634 S & x002a;s \u1e63 \u1e63 \u0635 ARABIC LETTER SAD 0635 D & x002a;d \u1e0d \u1e0d \u0636 ARABIC LETTER DAD 0636 T & x002a;t \u1e6d \u1e6d \u0637 ARABIC LETTER TAH 0637 Z & x002a;z \u1e93 \u1e93 \u0638 ARABIC LETTER ZAH 0638 E \u02bf \u0639 ARABIC LETTER AIN 0639 g & x002a;g \u0121 gh \u063a ARABIC LETTER GHAIN 063a f f f f \u0641 ARABIC LETTER FEH 0641 q & x002a;k \u1e33 q \u0642 ARABIC LETTER QAF 0642 k k k k \u0643 ARABIC LETTER KAF 0643 l l l l \u0644 ARABIC LETTER LAM 0644 m m m m \u0645 ARABIC LETTER MEEM 0645 n n n n \u0646 ARABIC LETTER NOON 0646 h h h h \u0647 ARABIC LETTER HEH 0647 w w w w \u0648 ARABIC LETTER WAW 0648 Y /a \u00e1 \u0101 \u0649 ARABIC LETTER ALEF MAKSURA 0649 y y y y \u064a ARABIC LETTER YEH 064a { a a a \u0671 ARABIC LETTER ALEF WASLA 0671 G g g g \u06af ARABIC LETTER GAF 06af J y Y y \u06af ARABIC LETTER FARSI YEH 06cc Numerals quran / tanzil ASCII latin standard glyph remarks name UNICODE 0 0 0 0 & x0660; ARABIC INDIC DIGIT ZERO 0660 1 1 1 1 & x0661; ARABIC INDIC DIGIT ONE 0661 2 2 2 2 & x0662; ARABIC INDIC DIGIT TWO 0662 3 3 3 3 & x0663; ARABIC INDIC DIGIT THREE 0663 4 4 4 4 & x0664; ARABIC INDIC DIGIT FOUR 0664 5 5 5 5 & x0665; ARABIC INDIC DIGIT FIVE 0665 6 6 6 6 & x0666; ARABIC INDIC DIGIT SIX 0666 7 7 7 7 & x0667; ARABIC INDIC DIGIT SEVEN 0667 8 8 8 8 & x0668; ARABIC INDIC DIGIT EIGHT 0668 9 9 9 9 & x0669; ARABIC INDIC DIGIT NINE 0669 Stops quran / 
tanzil ASCII latin standard glyph remarks name UNICODE - . . . \u06ea ARABIC EMPTY CENTRE LOW STOP 06ea + . . . \u06eb ARABIC EMPTY CENTRE HIGH STOP 06eb % . . . \u06ec ARABIC ROUNDED HIGH STOP WITH FILLED CENTRE 06ec Letters (modified) quran / tanzil ASCII latin standard glyph remarks name UNICODE & x0060; ~a \u00e3 \u0670 ARABIC LETTER SUPERSCRIPT ALEF 0670 \u00bb & x005f;a \u0101 \u0101 \u0670\u0622 ARABIC LETTER ALEF WITH MADDA ABOVE 0622 : s S s \u06dc ARABIC SMALL HIGH SEEN 06dc [ m M M \u06e2 ARABIC SMALL HIGH MEEM ISOLATED FORM 06e2 ; s S S \u06e3 ARABIC SMALL LOW SEEN 06e3 , w W W \u06e5 ARABIC SMALL WAW 06e5 . y Y Y \u06e6 ARABIC SMALL YEH 06e6 M j J j \u06da ARABIC SMALL HIGH JEEM 06da ! n N N \u06e8 ARABIC SMALL HIGH NOON 06e8 ] m M M \u06ed ARABIC SMALL LOW MEEM 06ed Letters (combined) quran / tanzil ASCII latin standard glyph remarks name UNICODE > & x005f;a \u0101 \u0101 \u0623 ARABIC LETTER ALEF WITH HAMZA ABOVE 0623 & ' \u02be ' \u0624 ARABIC LETTER WAW WITH HAMZA ABOVE 0624 /td> & x005f;a \u0101 \u0101 \u0625 ARABIC LETTER ALEF WITH HAMZA BELOW 0625 } ' \u02be y \u0626 ARABIC LETTER YEH WITH HAMZA ABOVE 0626 SlY & x002a;sl/a \u1e63l\u00e1 \u1e63la \u06d6 ARABIC SMALL HIGH LIGATURE SAD WITH LAM WITH ALEF MAKSURA 06d6 Lengthening quran / tanzil ASCII latin standard glyph remarks name UNICODE & x005f; \u0640 ARABIC TATWEEL 0640 Vowel diacritics quran / tanzil ASCII latin standard glyph remarks name UNICODE F a& x002a;n a\u207f an \u064b ARABIC FATHATAN 064b N u& x002a;n u\u207f un \u064c ARABIC DAMMATAN 064c K i& x002a;n i\u207f in \u064d ARABIC KASRATAN 064d a a a a \u064e ARABIC FATHA 064e u u u u \u064f ARABIC DAMMA 064f i i i i \u0650 ARABIC KASRA 0650 Non-vocalic diacritics quran / tanzil ASCII latin standard glyph remarks name UNICODE ~ u u \u016bw \u0651 ARABIC SHADDA 0651 o a a a \u0652 ARABIC SUKUN 0652 ^ & x005f;a \u0101 \u0101 \u0653 ARABIC MADDAH ABOVE 0653 ' \u02be \u0101 \u0654 ARABIC HAMZA ABOVE 0654 = ' \u02be \u0101 \u0655 ARABIC HAMZA BELOW 0655 @ 0 0 0 \u06df ARABIC SMALL HIGH ROUNDED ZERO 06df \" 0 0 0 \u06e0 ARABIC SMALL HIGH UPRIGHT RECTANGULAR ZERO 06e0 Separators quran / tanzil ASCII latin standard glyph remarks name UNICODE SPACE 0020 See also [Arabic script in UNICODE](https: en.wikipedia.org/wiki/Arabic_script_in_Unicode) [Arabic diacritics](https: en.wikipedia.org/wiki/Arabic_diacritics harakat) [Beta code](https: pverkind.github.io/betacodeTranscriber/js/betacode.js) [Library of Congress](https: www.loc.gov/catdir/cpso/romanization/arabic.pdf)" }, { "ref":"tf.writing.hebrew", -"url":146, +"url":147, "doc":" Hebrew characters @font-face { font-family: \"Ezra SIL\"; src: url('https: github.com/annotation/text-fabric/blob/master/tf/browser/static/fonts/SILEOT.ttf?raw=true'); src: url('https: github.com/annotation/text-fabric/blob/master/tf/browser/static/fonts/SILEOT.woff?raw=true') format('woff'); } body { font-family: sans-serif; } table.chars { border-collapse: collapse; } table.chars thead tr { color: ffffff; background-color: 444444; } table.chars tbody td { border: 2px solid bbbbbb; padding: 0.1em 0.5em; } h1.chars { margin-top: 1em; } .t { font-family: monospace; font-size: large; color: 0000ff; } .g { font-family: \"Ezra SIL\", sans-serif; font-size: x-large; } .p { font-family: monospace; font-size: large; color: 666600; } .r { font-family: sans-serif; font-size: small; color: 555555; } .n { font-family: sans-serif; color: 990000; font-size: small; } .u { font-family: monospace; color: 990000; } ! 
note \"Disclaimer\" This just a look-up table, not a full exposition of the organization of the Masoretic system. ! abstract \"Transcriptions\" The ETCBC transcription is used by the ETCBC. It has entries for all accents, but not for text-critical annotations such as uncertainty, and correction. The Abegg transcription is used in the Dead Sea scrolls. It has no entries for accents, but it has a repertoire of text-critical marks. We have back translated the latter to ETCBC -compatible variants and entered them in the ETCBC column, although they are not strictly ETCBC marks. ! abstract \"Phonetics\" The phonetic representation is meant as a tentative 1-1 correspondence with pronunciation, not with the script. See [phono.ipynb](https: nbviewer.jupyter.org/github/ETCBC/phono/blob/master/programs/phono.ipynb), where the phonetic transcription is computed and thoroughly documented. Consonants ! abstract \"Details\" For most consonants: an inner dot is a dagesh forte . For the \u05d1\u05d2\u05d3\u05db\u05e4\u05ea consonants: an inner dot is either a dagesh forte or a dagesh lene . When the \u05d4 contains a dot, it is called a mappiq . transcription ( ETCBC ) transcription (Abegg) glyph phonetic remarks name UNICODE > a \u05d0 \u0294 when not mater lectionis letter alef 05D0 B b \u05d1 bb b v forte lene normal letter bet 05D1 G g \u05d2 gg g \u1e21 forte lene normal letter gimel 05D2 D d \u05d3 dd d \u1e0f forte lene normal letter dalet 05D3 H h \u05d4 h also with mappiq ; when not mater lectionis letter he 05D4 W w \u05d5 ww w \u00fb forte when not part of a long vowel with dagesh as vowel letter vav 05D5 Z z \u05d6 zz z forte normal letter zayin 05D6 X j \u05d7 \u1e25 letter het 05D7 V f \u05d8 \u1e6d letter tet 05D8 J y \u05d9 yy y \u02b8 forte when not part of long vowel in front of final \u05d5 letter yod 05D9 K k \u05db kk k \u1e35 forte lene normal letter kaf 05DB k K \u05da k \u1e35 forte normal letter final kaf 05DA L l \u05dc ll l forte normal letter lamed 05DC M m \u05de mm m forte normal letter mem 05DE m M \u05dd m letter final mem 05DD N n \u05e0 nn n forte normal letter nun 05E0 n N \u05df n letter final nun 05DF S s \u05e1 ss s forte normal letter samekh 05E1 < o \u05e2 \u0295 letter ayin 05E2 P p \u05e4 pp p f forte lene normal letter pe 05E4 p P \u05e3 p f forte normal letter final pe 05E3 Y x \u05e6 \u1e63\u1e63 \u1e63 forte normal letter tsadi 05E6 y X \u05e5 \u1e63 letter final tsadi 05E5 Q q \u05e7 qq q forte normal letter qof 05E7 R r \u05e8 rr r forte normal letter resh 05E8 C \u05e9 \u015d letter shin without dot 05E9 C v \u05e9\u05c1 \u0161\u0161 \u0161 forte normal letter shin with shin dot FB2A F c \u05e9\u05c2 \u015b\u015b \u015b forte normal letter shin with sin dot FB2B T t \u05ea tt t \u1e6f forte lene normal letter tav 05EA Vowels ! caution \"Qere Ketiv\" The phonetics follows the qere , not the ketiv , when they are different. In that case a is added. ! caution \"Tetragrammaton\" The tetragrammaton \u05d9\u05d4\u05d5\u05d4 is (vowel)-pointed in different ways; the phonetics follows the pointing, but the tetragrammaton is put between [ ] . 
transcription ( ETCBC ) transcription (Abegg) glyph phonetic remarks name UNICODE A A \u00c5 \u05b7 a \u2090 normal furtive point patah 05B7 :A S \u05b2 \u1d43 point hataf patah 05B2 @ D \u2202 \u00ce \u05b8 \u0101 o gadol qatan point qamats 05B8 :@ F \u0192 \u00cf \u05b3 \u1d52 point hataf qamats 05B3 E R \u00ae \u2030 \u05b6 e e\u02b8 normal with following \u05d9 point segol 05B6 :E T \u05b1 \u1d49 \u1d49\u02b8 normal with following \u05d9 point hataf segol 05B1 ; E \u00e9 \u00b4 \u05b5 \u00ea \u0113 with following \u05d9 alone point tsere 05B5 I I \u02c6 \u00ee \u00ca \u05b4 \u00ee i with following \u05d9 alone point hiriq 05B4 O O \u00f8 \u05b9 \u00f4 \u014d with following \u05d5 alone point holam 05B9 U U \u00fc \u00a8 \u05bb u point qubuts 05BB : V \u221a J \u25ca \u05b0 \u1d4a left out if silent point sheva 05B0 Other points and marks transcription ( ETCBC ) transcription (Abegg) glyph phonetic remarks name UNICODE . ; \u2026 \u00da \u00a5 \u03a9 \u05bc point dagesh or mapiq 05BC .c \u05c1 point shin dot 05C1 .f \u05c2 point sin dot 05C2 , \u05bf point rafe 05BF 35 \u05bd \u02c8 point meteg 05BD 45 \u05bd \u02c8 point meteg 05BD 75 \u05bd \u02c8 point meteg 05BD 95 \u05bd \u02c8 point meteg 05BD 52 \u05c4 \u02c8 mark upper dot 05C4 53 \u05c5 \u02c8 mark lower dot 05C5 & 42; \u05af mark masora circle 05AF Punctuation ! abstract \"Details\" Some specialities in the Masoretic system are not reflected in the phonetics: setumah \u05e1 ; petuhah \u05e3 ; nun-hafuka \u0307\u05c6 . transcription ( ETCBC ) transcription (Abegg) glyph phonetic remarks name UNICODE 00 . \u05c3 . punctuation sof pasuq 05C3 n\u0303 \u05c6 punctuation nun hafukha 05C6 & - \u05be - punctuation maqaf 05BE & 95; (non breaking space) space 0020 0000 \u00b1 Dead Sea scrolls. We use as Hebrew character a double sof pasuq. paleo-divider 05C3 05C3 ' / \u05f3 Dead Sea scrolls. We use as Hebrew character a geresh. morpheme-break 05F3 Hybrid ! abstract \"Details\" There is a character that is mostly punctuation, but that can also influence the nature of some accents occurring in the word before. Such a character is a hybrid between punctuation and accent. See also the documentation of the BHSA about [cantillation](https: ETCBC.github.io/bhsa/cantillation/). transcription glyph phonetic remarks name UNICODE 05 \u05c0 punctuation paseq 05C0 Accents ! abstract \"Details\" Some accents play a role in deciding whether a schwa is silent or mobile and whether a qamets is gadol or qatan . In the phonetics those accents appear as \u02c8 or \u02cc . Implied accents are also added. 
transcription glyph phonetic remarks name UNICODE 94 \u05a7 \u02c8 accent darga 05A7 13 \u05ad \u02c8 accent dehi 05AD 92 \u0591 \u02c8 accent etnahta 0591 61 \u059c \u02c8 accent geresh 059C 11 \u059d \u02c8 accent geresh muqdam 059D 62 \u059e \u02c8 accent gershayim 059E 64 \u05ac \u02c8 accent iluy 05AC 70 \u05a4 \u02c8 accent mahapakh 05A4 71 \u05a5 \u02cc accent merkha 05A5 72 \u05a6 \u02c8 accent merkha kefula 05A6 74 \u05a3 \u02c8 accent munah 05A3 60 \u05ab \u02c8 accent ole 05AB 03 \u0599 accent pashta 0599 83 \u05a1 \u02c8 accent pazer 05A1 33 \u05a8 \u02c8 accent qadma 05A8 63 \u05a8 \u02cc accent qadma 05A8 84 \u059f \u02c8 accent qarney para 059F 81 \u0597 \u02c8 accent revia 0597 01 \u0592 accent segol 0592 65 \u0593 \u02c8 accent shalshelet 0593 04 \u05a9 accent telisha qetana 05A9 24 \u05a9 accent telisha qetana 05A9 14 \u05a0 accent telisha gedola 05A0 44 \u05a0 accent telisha gedola 05A0 91 \u059b \u02c8 accent tevir 059B 73 \u0596 \u02cc accent tipeha 0596 93 \u05aa \u02c8 accent yerah ben yomo 05AA 10 \u059a \u02c8 accent yetiv 059A 80 \u0594 \u02c8 accent zaqef qatan 0594 85 \u0595 \u02c8 accent zaqef gadol 0595 82 \u0598 \u02c8 accent zarqa 0598 02 \u05ae \u02c8 accent zinor 05AE Numerals ! abstract \"Details\" These signs occur in the Dead Sea scrolls. We represent them with conventional Hebrew characters for numbers and use the geresh accent or another accent to mark the letter as a numeral. The ETCBC codes are obtained by translating back from the UNICODE. transcription (ETCBC) transcription (Abegg) glyph remarks name >' A \u05d0\u059c number 1 >52 \u00e5 \u05d0\u05c4 alternative for 1, often at the end of a number, we use the upper dot to distinguish it from the other 1 number 1 >53 B \u05d0\u05c5 alternative for 1, often at the end of a number, we use the lower dot to distinguish it from the other 1 number 1 >35 \u222b \u05d0\u05bd alternative for 1, often at the end of a number, we use the meteg to distinguish it from the other 1 number 1 J' C \u05d9\u059c number 10 k' D \u05da\u059c number 20 Q' F \u05e7\u059c number 100 & + \u05be we use the maqaf to represent addition between numbers add Text-critical ! abstract \"Details\" These signs occur in the Dead Sea scrolls. They are used to indicate uncertainty and editing acts by ancient scribes or modern editors. They do not have an associated glyph in UNICODE. The ETCBC does not have codes for them, but we propose an ETCBC-compatible encoding for them. The ETCBC codes are surrounded by space, except for the brackets, where a space at the side of the ( or ) is not necessary. Codes that are marked as flag apply to the preceding character. Codes that are marked as brackets apply to the material within them. transcription (Abegg) transcription ( ETCBC ) remarks name 0 \u03b5 token missing ? ? token uncertain (degree 1) & 92; token uncertain (degree 2) \ufffd ? token uncertain (degree 3) \u00d8 ? flag, applies to preceding character uncertain (degree 1) \u00ab flag, applies to preceding character uncertain (degree 2) \u00bb ? 
flag, applies to preceding character uncertain (degree 3) & 124; flag, applies to preceding character uncertain (degree 4) \u00ab \u00bb ( ) brackets uncertain (degree 2) \u2264 \u2265 (- -) brackets vacat (empty space) ( ) ( ) brackets alternative [ ] [ ] brackets reconstruction (modern) { } { } brackets removed (modern) {& 123; {& 123; brackets removed (ancient) < > (< >) brackets correction (modern) << >> (<< >>) brackets correction (ancient) ^ ^ (^ ^) brackets correction (supralinear, ancient) " }, { "ref":"tf.writing.syriac", -"url":147, +"url":148, "doc":" Syriac Characters @font-face { font-family: \"Estrangelo Edessa\"; src: url('https: github.com/annotation/text-fabric/blob/master/tf/browser/static/fonts/SyrCOMEdessa.otf?raw=true'); src: url('https: github.com/annotation/text-fabric/blob/master/tf/browser/static/fonts/SyrCOMEdessa.woff?raw=true') format('woff'); } body { font-family: sans-serif; } table.chars { border-collapse: collapse; } table.chars thead tr { color: ffffff; background-color: 444444; } table.chars tbody td { border: 2px solid bbbbbb; padding: 0.1em 0.5em; } h1.chars { margin-top: 1em; } .t { font-family: monospace; font-size: large; color: 0000ff; } .g { font-family: \"Estrangelo Edessa\", sans-serif; font-size: x-large; } .p { font-family: monospace; font-size: large; color: 666600; } .r { font-family: sans-serif; font-size: small; color: 555555; } .n { font-family: sans-serif; color: 990000; font-size: small; } .u { font-family: monospace; color: 990000; } Letters transcription glyph phonetic remarks name UNICODE > \u0710 alaph 0710 B \u0712 beth 0712 G \u0713 gamal 0713 D \u0715 dalat 0715 H \u0717 he 0717 W \u0718 waw 0718 Z \u0719 zain 0719 X \u071a heth 071A V \u071b teth 071B J \u071d yod 071D K \u071f kaf 071F L \u0720 lamad 0720 M \u0721 mim 0721 N \u0722 nun 0722 S \u0723 semkath 0723 < \u0725 e 0725 P \u0726 pe 0726 Y \u0728 tsade 0728 Q \u0729 qof 0729 R \u072a resh 072A C \u072b shin 072B T \u072c taw 072C Word-bound diacritics transcription glyph phonetic remarks name UNICODE \" \u0308 seyame 0308 \u0323 diacritical dot below 0323 ^ \u0307 diacritical dot above 0307 Non-vocalic letter-bound diacritics transcription glyph phonetic remarks name UNICODE ^! \u0743 unclear (syriac two vertical dots above) 0743 vocalic letter-bound diacritics transcription glyph phonetic remarks name UNICODE : shewa A \u0733 qamets 0733 A1 \u0734 zeqapa 0734 A2 \u0735 zeqofo 0735 O \u073f holem, rewaha 073F @ \u0730 patah 0730 @1 \u0731 petaha 0731 @2 \u0732 petoho 0732 E \u0736 segol 0736 E1 \u0737 revasa arrika 0737 E2 \u0738 revoso 0738 I \u073a hireq 073A I1 \u073b hevoso 073B U \u073d qubbuts 073D U1 \u073e esoso 073E Punctuation transcription glyph phonetic remarks name UNICODE & 92; \u0709 tahtaya, metkashpana (WS), meshalyana (WS) 0709 =. . pasuqa 002E = \u0707 elaya 0707 =: : shewaya (WS), zauga (ES) 003A =^ \u0706 unclear (SYRIAC COLON SKEWED LEFT) 0706 =/ \u0707 elaya 0707 =& 92; \u0706 unclear (SYRIAC COLON SKEWED LEFT) 0706 ^: \u0703 taksa (WS), zauga elaya (ES) 0703 ^& 92; \u0708 unclear (SYRIAC SUPRALINEAR COLON SKEWED LEFT) 0708 Pericope markers transcription glyph phonetic remarks name UNICODE & 42; \u0700 rosette 0700 . 
\u00b7 common dot in caesuras 00B7 & 95; \u2014 dash in caesuras 2014 o \u2022 large dot in caesuras 2022 .md" }, { "ref":"tf.writing.neoaramaic", -"url":148, +"url":149, "doc":" Neo Aramaic transcriptions body { font-family: sans-serif; } pre.chars { border-collapse: collapse; color: 000080; font-family: monospace; font-size: medium; line-height: 1.0; } The following table is provided by the collectors of the [NENA](https: github.com/CambridgeSemiticsLab/nena_corpus) corpus at [Cambridge Semitics Lab](https: github.com/CambridgeSemiticsLab). There is also a [PDF]( /images/neoaramaic.pdf) of the table below. Vowel inventory and conversions Special vowel signs \u250f \u2501\u2533 \u252f \u252f \u252f \u2501\u252f \u2501\u252f \u252f \u2501\u252f \u2501\u252f\u2501\u252f \u252f\u2501\u252f\u2501\u252f \u252f \u252f\u2501\u252f\u2501\u252f \u252f\u2501\u252f \u252f \u252f\u2501\u252f\u2501\u2513\u0010 \u2503 \u2503\u00e1 \u2502\u00e0 \u2502\u0101 \u2502\u0101\u0300 \u2502\u0101\u0301 \u2502\u0103 \u2502\u1eaf \u2502\u1eb1 \u2502e\u2502\u0113 \u2502\u025b\u2502i\u2502\u012b \u2502\u012d \u2502\u0259\u2502o\u2502\u014d \u2502u\u2502\u016b \u2502\u016d \u2502\u0131\u2502\u0251\u2503 \u2520 \u2500\u2542 \u253c \u253c \u253c \u2500\u253c \u2500\u253c \u253c \u2500\u253c \u2500\u253c\u2500\u253c \u253c\u2500\u253c\u2500\u253c \u253c \u253c\u2500\u253c\u2500\u253c \u253c\u2500\u253c \u253c \u253c\u2500\u253c\u2500\u2528 \u2503precise match\u2503a'\u2502a \u2502a-\u2502a- \u2502a-'\u2502a>\u2502a>'\u2502a> \u2502e\u2502e-\u25023\u2502i\u2502i-\u2502i>\u25029\u2502o\u2502o-\u2502u\u2502u-\u2502u Symbol inventory for conversions Special signs alphabetical \u250f \u2501\u2533 \u252f\u2501\u252f\u2501\u252f \u252f \u252f \u2501\u252f \u2501\u252f \u252f\u2501\u252f\u2501\u252f \u252f \u252f\u2501\u252f \u252f \u252f \u252f \u2501\u252f \u252f \u252f \u252f \u252f \u252f\u2501\u252f\u2501\u252f \u252f \u2513\u0010 \u2503 \u2503\u02be \u2502\u02bf\u2502c\u2502c\u032d \u2502\u010d \u2502\u010d\u032d \u2502\u010d\u0323 \u2502\u1e0d \u2502\u00f0\u2502\u00f0\u0323\u2502\u0121 \u2502\u1e25 \u2502\u025f\u2502k\u032d \u2502\u1e37 \u2502\u1e43 \u2502p\u032d,p\u030c\u2502p\u0323 \u2502\u1e5b \u2502\u1e63 \u2502\u0161 \u2502\u1e71 \u2502\u1e6d\u2502\u03b8\u2502\u017e \u2502\u1e93 \u2503 \u2520 \u2500\u2542 \u253c\u2500\u253c\u2500\u253c \u253c \u253c \u2500\u253c \u2500\u253c \u253c\u2500\u253c\u2500\u253c \u253c \u253c\u2500\u253c \u253c \u253c \u253c \u2500\u253c \u253c \u253c \u253c \u253c \u253c\u2500\u253c\u2500\u253c \u253c \u2528 \u2503precise match\u2503) \u2502(\u2502c\u2502c c\u2502>c c.\u2502d.\u25026\u25026\u2502g.\u2502h.\u25024\u2502k s\u2502t z\u2502z.\u2503 \u2503lite \u2503) \u2502(\u2502c\u2502c \u25025 \u2502 \u2502% \u2502D \u25026\u2502^\u2502G \u2502H \u25024\u2502& \u2502L \u2502M \u2502p \u2502P \u2502R \u2502S \u2502$ \u2502+ \u2502T\u25028\u25027 \u2502Z \u2503 \u2503fuzzy_all \u2503ignore\u2502(\u2502 \u2502 \u25025 \u25025 \u25025 \u2502d \u2502d\u2502d\u2502g \u2502h \u2502 \u2502 \u2502l \u2502m \u2502p \u2502p \u2502r \u2502s \u2502s \u2502t \u2502t\u2502 \u2502z \u2502z \u2503 \u2503fuzzy_Urmi \u2503 \u2502 \u2502k\u2502k \u2502 \u2502 \u2502 \u2502 \u2502 \u2502 \u2502 \u2502 \u2502g\u2502q \u2502 \u2502 \u2502 \u2502 \u2502 \u2502 \u2502 \u2502 \u2502 \u2502 \u2502 \u2502 \u2503 \u2503fuzzy_Barwar \u2503 \u2502 \u2502 \u2502 \u2502 \u2502 \u2502 \u2502 \u2502d\u2502 \u2502 \u2502 \u2502 \u2502k \u2502 \u2502 \u2502 \u2502 \u2502 \u2502 \u2502 \u2502 \u2502 \u2502t\u2502 \u2502 
\u2503 \u2517 \u2501\u253b \u2537\u2501\u2537\u2501\u2537 \u2537 \u2537 \u2501\u2537 \u2501\u2537 \u2537\u2501\u2537\u2501\u2537 \u2537 \u2537\u2501\u2537 \u2537 \u2537 \u2537 \u2501\u2537 \u2537 \u2537 \u2537 \u2537 \u2537\u2501\u2537\u2501\u2537 \u2537 \u251b\u0010 Capitals \u250f \u2501\u2533\u2501\u252f\u2501\u252f\u2501\u252f\u2501\u252f\u2501\u252f\u2501\u252f\u2501\u252f\u2501\u252f\u2501\u252f\u2501\u252f\u2501\u252f\u2501\u252f\u2501\u252f\u2501\u252f\u2501\u2513\u0010 \u2503 \u2503\u1e0d\u2502\u0121\u2502\u1e25\u2502\u1e37\u2502\u1e43\u2502p\u0323\u2502\u1e5b\u2502\u1e63\u2502\u1e6d\u2502\u1e93\u2502\u0101\u2502\u0113\u2502\u012b\u2502\u014d\u2502\u016b\u2503 \u2520 \u2500\u2542\u2500\u253c\u2500\u253c\u2500\u253c\u2500\u253c\u2500\u253c\u2500\u253c\u2500\u253c\u2500\u253c\u2500\u253c\u2500\u253c\u2500\u253c\u2500\u253c\u2500\u253c\u2500\u253c\u2500\u2528 \u2503lite \u2503D\u2502G\u2502H\u2502L\u2502M\u2502P\u2502R\u2502S\u2502T\u2502Z\u2502A\u2502E\u2502I\u2502O\u2502U\u2503 \u2517 \u2501\u253b\u2501\u2537\u2501\u2537\u2501\u2537\u2501\u2537\u2501\u2537\u2501\u2537\u2501\u2537\u2501\u2537\u2501\u2537\u2501\u2537\u2501\u2537\u2501\u2537\u2501\u2537\u2501\u2537\u2501\u251b\u0010 Special symbols \u250f \u2501\u2533\u2501\u252f\u2501\u252f\u2501\u252f\u2501\u252f\u2501\u252f\u2501\u252f\u2501\u2513\u0010 \u2503 \u2503\u010d\u032d\u2502\u010d\u0323\u2502\u00f0\u0323\u2502k\u032d\u2502\u1e71\u2502\u0161\u2502\u0103\u2503 \u2520 \u2500\u2542\u2500\u253c\u2500\u253c\u2500\u253c\u2500\u253c\u2500\u253c\u2500\u253c\u2500\u2528 \u2503lite \u2503 \u2502%\u2502^\u2502&\u2502+\u2502$\u2502@\u2503 \u2517 \u2501\u253b\u2501\u2537\u2501\u2537\u2501\u2537\u2501\u2537\u2501\u2537\u2501\u2537\u2501\u251b\u0010 Numbers \u250f \u2501\u2533\u2501\u252f\u2501\u252f\u2501\u252f\u2501\u252f\u2501\u252f\u2501\u252f\u2501\u252f\u2501\u2513\u0010 \u2503 \u2503\u016d\u2502\u025b\u2502\u025f\u2502\u010d\u2502\u00f0\u2502\u017e\u2502\u03b8\u2502\u0259\u2503 \u2520 \u2500\u2542\u2500\u253c\u2500\u253c\u2500\u253c\u2500\u253c\u2500\u253c\u2500\u253c\u2500\u253c\u2500\u2528 \u2503lite \u25032\u25023\u25024\u25025\u25026\u25027\u25028\u25029\u2503 \u2517 \u2501\u253b\u2501\u2537\u2501\u2537\u2501\u2537\u2501\u2537\u2501\u2537\u2501\u2537\u2501\u2537\u2501\u251b\u0010 Consonant phoneme inventory: Lite and fuzzy conversions Legend lt = lite fz = fuzzy fzUr = fuzzy Urmi \u0454 = empty \u250f \u2533 \u2501\u2533 \u2533 \u2533 \u2501\u2533 \u2501\u2533 \u2533 \u2501\u2533 \u2501\u2513\u0010 \u2503 \u2503labial \u2503dental- \u2503palatal-\u2503palatal\u2503(post-)\u2503uvular\u2503pharyn-\u2503laryn- \u2503 \u2503 \u2503 \u2503alveolar\u2503alveolar\u2503 \u2503velar \u2503 \u2503geal \u2503geal \u2503 \u2523 \u254b \u2501\u252f \u252f \u254b\u2501\u252f \u252f \u2501\u254b\u2501\u252f \u252f \u2501\u254b\u2501\u252f \u252f \u254b\u2501\u252f \u252f \u254b \u252f \u2501\u254b\u2501\u252f \u252f \u254b\u2501\u252f \u252f \u252b \u2503Stops/affricates \u2503 \u2502lt\u2502fz\u2503 \u2502lt\u2502fz \u2503 \u2502lt\u2502fz \u2503 \u2502lt\u2502fz\u2503 \u2502lt\u2502fz\u2503 \u2502 \u2503 \u2502lt\u2502fz\u2503 \u2502lt\u2502fz\u2503 \u2503 \u2503 \u2502 \u2502 \u2503 \u2502 \u2502 \u2503 \u2502 \u2502 \u2503 \u2502 \u2502Ur\u2503 \u2502 \u2502Ur\u2503 \u2502 \u2503 \u2502 \u2502 \u2503 \u2502 \u2502 \u2503 \u2520 \u2542 \u2500\u253c \u253c \u2542\u2500\u253c \u253c \u2500\u2542\u2500\u253c \u253c \u2500\u2542\u2500\u253c \u253c \u2542\u2500\u253c \u253c \u2542 \u253c \u2500\u2542\u2500\u2534 \u2534 \u2542\u2500\u253c 
\u253c \u2528 \u2503Unvoiced aspirated \u2503p \u2502p \u2502 \u2503t\u2502t \u2502t \u2503\u010d\u25025 \u25025 \u2503c\u2502c \u2502k \u2503k\u2502k \u2502k \u2503 q\u2502q \u2503 \u2503\u02be\u2502) \u2502\u0454 \u2503 \u2503Unvoiced unaspirated\u2503p\u032d,p\u030c\u2502p \u2502p \u2503\u1e71\u2502+ \u2502t \u2503\u010d\u032d\u2502 \u25025 \u2503c\u032d\u2502c \u2502k \u2503k\u032d\u2502& \u2502q \u2503 \u2502 \u2503 \u2503 \u2502 \u2502 \u2503 \u2503Voiced \u2503b \u2502b \u2502 \u2503d\u2502d \u2502d \u2503j\u2502j \u2502j \u2503\u025f\u25024 \u2502g \u2503g\u2502g \u2502g \u2503 \u2502 \u2503 \u2503 \u2502 \u2502 \u2503 \u2503Emphatic \u2503p\u0323 \u2502P \u2502p \u2503\u1e6d\u2502T \u2502t \u2503\u010d\u2502% \u25025 \u2503 \u2502 \u2502 \u2503 \u2502 \u2502 \u2503 \u2502 \u2503 \u2503 \u2502 \u2502 \u2503 \u2503 \u2503 \u2502 \u2502 \u2503\u1e0d\u2502D \u2502d \u2503 \u2502 \u2502 \u2503 \u2502 \u2502 \u2503 \u2502 \u2502 \u2503 \u2502 \u2503 \u2503 \u2502 \u2502 \u2503 \u2523 \u254b \u2501\u2537 \u2537 \u254b\u2501\u2537 \u2537 \u2501\u254b\u2501\u2537 \u2537 \u2501\u253b\u2501\u2537 \u2537 \u254b\u2501\u2537 \u2537 \u254b \u2537 \u2501\u254b \u2501\u254b\u2501\u2537 \u2537 \u252b \u2503Fricatives \u2503 \u2503 \u2503 \u2503 \u2503 \u2503 \u2503 \u2503 \u2520 \u2542 \u2500\u252c \u252c \u2542\u2500\u252c \u252c \u2500\u2528 \u2520\u2500\u252c \u252c \u2528 \u2520\u2500\u252c \u252c \u2542\u2500\u252c \u252c \u2528 \u2503Unvoiced \u2503f \u2502f \u2502f \u2503\u03b8\u25028 \u2502t \u2503 \u2503x\u2502x \u2502x \u2503 \u2503\u1e25\u2502H \u2502h \u2503h\u2502h \u2502h \u2503 \u2503Voiced \u2503v \u2502v \u2502w \u2503\u00f0\u25026 \u2502d \u2503 \u2503\u0121\u2502G \u2502g \u2503 \u2503 \u2502 \u2502 \u2503 \u2502 \u2502 \u2503 \u2503Emphatic \u2503 \u2502 \u2502 \u2503\u00f0\u0323\u2502^ \u2502d \u2503 \u2503 \u2502 \u2502 \u2503 \u2503\u02bf\u2502( \u2502( \u2503 \u2502 \u2502 \u2503 \u2523 \u254b \u2501\u2537 \u2537 \u254b\u2501\u2537 \u2537 \u2501\u254b \u2513 \u2517\u2501\u2537 \u2537 \u251b \u2517\u2501\u2537 \u2537 \u253b\u2501\u2537 \u2537 \u252b \u2503Sibilants \u2503 \u2503 \u2503 \u2503 \u2503 \u2520 \u2528 \u2520\u2500\u252c \u252c \u2500\u2542\u2500\u252c \u252c \u2500\u2528 \u2503 \u2503Unvoiced \u2503 \u2503s\u2502s \u2502s \u2503\u0161\u2502$ \u2502s \u2503 \u2503 \u2503Voiced \u2503 \u2503z\u2502z \u2502z \u2503\u017e\u25027 \u2502z \u2503 \u2503 \u2503Emphatic \u2503 \u2503\u1e63\u2502S \u2502s \u2503 \u2502 \u2502 \u2503 \u2503 \u2503 \u2503 \u2503\u1e93\u2502Z \u2502z \u2503 \u2502 \u2502 \u2503 \u2503 \u2523 \u254b \u2501\u254b\u2501\u2537 \u2537 \u2501\u254b\u2501\u2537 \u2537 \u2501\u251b \u2503 \u2503Nasals \u2503 \u2503 \u2503 \u2503 \u2520 \u2542 \u2500\u252c \u252c \u2542\u2500\u252c \u252c \u2500\u2528 \u2503 \u2503Plain \u2503m \u2502m \u2502m \u2503n\u2502n \u2502n \u2503 \u2503 \u2503Emphatic \u2503\u1e43 \u2502M \u2502m \u2503 \u2502 \u2502 \u2503 \u2503 \u2523 \u254b \u2501\u2537 \u2537 \u254b\u2501\u2537 \u2537 \u2501\u252b \u2503 \u2503Laterals \u2503 \u2503 \u2503 \u2503 \u2520 \u2528 \u2520\u2500\u252c \u252c \u2500\u2528 \u2503 \u2503Plain \u2503 \u2503l\u2502l \u2502l \u2503 \u2503 \u2503Emphatic \u2503 \u2503\u1e37\u2502L \u2502l \u2503 \u2503 \u2523 \u254b \u2501\u254b\u2501\u2537 \u2537 \u2501\u252b \u250f \u2501\u2513 \u2503 \u2503Other approximants \u2503 \u2503 \u2503 \u2503 \u2503 \u2503 \u2520 \u2542 \u2500\u252c \u252c \u2542\u2500\u252c \u252c \u2500\u2528 \u2520\u2500\u252c \u252c \u2528 \u2503 \u2503Plain \u2503w \u2502w \u2502w 
\u2503r\u2502r \u2502r \u2503 \u2503y\u2502y \u2502y \u2503 \u2503 \u2503Emphatic \u2503 \u2502 \u2502 \u2503\u1e5b\u2502R \u2502r \u2503 \u2503 \u2502 \u2502 \u2503 \u2503 \u2517 \u253b \u2501\u2537 \u2537 \u253b\u2501\u2537 \u2537 \u2501\u253b \u253b\u2501\u2537 \u2537 \u253b \u251b\u0010 " }, { "ref":"tf.writing.ugaritic", -"url":149, +"url":150, "doc":" Ugaritic Characters @font-face { font-family: \"Santakku\"; src: local('Santakku'), url('/browser/static/fonts/Santakku.woff') format('woff'), url('https: github.com/annotation/text-fabric/blob/master/tf/browser/static/fonts/Santakku.woff?raw=true') format('woff'); } body { font-family: sans-serif; } table.chars { border-collapse: collapse; } table.chars thead tr { color: ffffff; background-color: 444444; } table.chars tbody td { border: 2px solid bbbbbb; padding: 0.1em 0.5em; } h1.chars { margin-top: 1em; } .t { font-family: monospace; font-size: large; color: 0000ff; } .g { font-family: \"Santakku\", sans-serif; font-size: x-large; } .p { font-family: monospace; font-size: large; color: 666600; } .r { font-family: sans-serif; font-size: small; color: 555555; } .n { font-family: sans-serif; color: 990000; font-size: small; } .u { font-family: monospace; color: 990000; } Letters and word separator \u0383 \u038c transcription glyph phonetic remarks name UNICODE a \ud800\udf80 \u0294a alpa 10380 b \ud800\udf81 b beta 10381 g \ud800\udf82 g gamla 10382 \u1e2b \ud800\udf83 x kha 10383 d \ud800\udf84 d delta 10384 h \ud800\udf85 h ho 10385 w \ud800\udf86 w wo 10386 z \ud800\udf87 z zeta 10387 \u1e25 \ud800\udf88 \u0127 hota 10388 \u1e6d \ud800\udf89 t\u02e4 tet 10389 y \ud800\udf8a j yod 1038A k \ud800\udf8b k kaf 1038B \u0161 \ud800\udf8c \u0283 shin 1038C l \ud800\udf8d l lamda 1038D m \ud800\udf8e m mem 1038E \u1e0f \ud800\udf8f \u00f0 dhal 1038F n \ud800\udf90 n nun 10390 \u1e93 \ud800\udf91 \u03b8\u02e4 zu 10391 s \ud800\udf92 s samka 10392 \u02e4 \ud800\udf93 \u0295 ain 10393 p \ud800\udf94 p pu 10394 \u1e63 \ud800\udf95 s\u02e4 sade 10395 q \ud800\udf96 q qopa 10396 r \ud800\udf97 r rasha 10397 \u1e6f \ud800\udf98 \u03b8 thanna 10398 \u0121 \ud800\udf99 \u0263 ghain 10399 t \ud800\udf9a t to 1039A i \ud800\udf9b \u0294i i 1039B u \ud800\udf9c \u0294u u 1039C s2 \ud800\udf9d su ssu 1039D . \ud800\udf9f divider 1039F " } ] \ No newline at end of file diff --git a/tf/about/annotate.html b/tf/about/annotate.html index dabaf10be..cb6909c03 100644 --- a/tf/about/annotate.html +++ b/tf/about/annotate.html @@ -298,7 +298,7 @@
"""
.. include:: ../docs/about/annotate.md
diff --git a/tf/about/annotateBrowser.html b/tf/about/annotateBrowser.html
index f3b764e0a..a631fb5cd 100644
--- a/tf/about/annotateBrowser.html
+++ b/tf/about/annotateBrowser.html
@@ -267,7 +267,7 @@ Programming
Expand source code
-Browse git
+Browse git
"""
.. include:: ../docs/about/annotateBrowser.md
diff --git a/tf/about/apps.html b/tf/about/apps.html
index a7954f0bd..be6ea86da 100644
--- a/tf/about/apps.html
+++ b/tf/about/apps.html
@@ -115,7 +115,7 @@ Two contexts
Expand source code
-Browse git
+Browse git
"""
.. include:: ../docs/about/apps.md
diff --git a/tf/about/background.html b/tf/about/background.html
index b58408d14..e823387b2 100644
--- a/tf/about/background.html
+++ b/tf/about/background.html
@@ -155,7 +155,7 @@ History
Expand source code
-Browse git
+Browse git
"""
.. include:: ../docs/about/background.md
diff --git a/tf/about/browser.html b/tf/about/browser.html
index 83dc001b1..4fefa69f6 100644
--- a/tf/about/browser.html
+++ b/tf/about/browser.html
@@ -174,7 +174,7 @@ UNICODE in Excel CSVs
Expand source code
-Browse git
+Browse git
"""
.. include:: ../docs/about/browser.md
diff --git a/tf/about/clientmanual.html b/tf/about/clientmanual.html
index c1464588e..8534cb313 100644
--- a/tf/about/clientmanual.html
+++ b/tf/about/clientmanual.html
@@ -565,7 +565,7 @@ Credits
Expand source code
-Browse git
+Browse git
"""
.. include:: ../docs/about/clientmanual.md
diff --git a/tf/about/code.html b/tf/about/code.html
index c85ff5c80..a094df1f9 100644
--- a/tf/about/code.html
+++ b/tf/about/code.html
@@ -98,7 +98,7 @@ Writing
Expand source code
-Browse git
+Browse git
"""
.. include:: ../docs/about/code.md
diff --git a/tf/about/corpora.html b/tf/about/corpora.html
index 5f09c358f..04a051d1c 100644
--- a/tf/about/corpora.html
+++ b/tf/about/corpora.html
@@ -351,7 +351,7 @@ Extra data
Expand source code
-Browse git
+Browse git
"""
.. include:: ../docs/about/corpora.md
diff --git a/tf/about/datamodel.html b/tf/about/datamodel.html
index 9187c15cd..11804c8dd 100644
--- a/tf/about/datamodel.html
+++ b/tf/about/datamodel.html
@@ -265,7 +265,7 @@ Serializing and pre-computing
Expand source code
-Browse git
+Browse git
"""
.. include:: ../docs/about/datamodel.md
diff --git a/tf/about/datasharing.html b/tf/about/datasharing.html
index 03a7a1362..193888099 100644
--- a/tf/about/datasharing.html
+++ b/tf/about/datasharing.html
@@ -362,7 +362,7 @@ More modules at the same time
Expand source code
-Browse git
+Browse git
"""
.. include:: ../docs/about/datasharing.md
diff --git a/tf/about/displaydesign.html b/tf/about/displaydesign.html
index 657af490a..5872bb433 100644
--- a/tf/about/displaydesign.html
+++ b/tf/about/displaydesign.html
@@ -151,7 +151,7 @@ Output
Expand source code
-Browse git
+Browse git
"""
.. include:: ../docs/about/displaydesign.md
diff --git a/tf/about/faq.html b/tf/about/faq.html
index 462601d48..4e7dc4f6c 100644
--- a/tf/about/faq.html
+++ b/tf/about/faq.html
@@ -161,7 +161,7 @@ GitHub Rate Limit Exceeded!
Expand source code
-Browse git
+Browse git
"""
.. include:: ../docs/about/faq.md
diff --git a/tf/about/fileformats.html b/tf/about/fileformats.html
index 8c93d628f..ebe17b2ad 100644
--- a/tf/about/fileformats.html
+++ b/tf/about/fileformats.html
@@ -158,7 +158,7 @@ Single values
Expand source code
-Browse git
+Browse git
"""
.. include:: ../docs/about/fileformats.md
diff --git a/tf/about/index.html b/tf/about/index.html
index cc57eec1b..d40bfd9b0 100644
--- a/tf/about/index.html
+++ b/tf/about/index.html
@@ -33,7 +33,7 @@ Documents
Expand source code
-Browse git
+Browse git
"""
# Documents
diff --git a/tf/about/install.html b/tf/about/install.html
index 8d7a47aa3..4d7befe77 100644
--- a/tf/about/install.html
+++ b/tf/about/install.html
@@ -108,7 +108,7 @@ Note for Linux users
Expand source code
-Browse git
+Browse git
"""
.. include:: ../docs/about/install.md
diff --git a/tf/about/manual.html b/tf/about/manual.html
index 8fe39011f..17845041e 100644
--- a/tf/about/manual.html
+++ b/tf/about/manual.html
@@ -390,7 +390,7 @@ Keyboard shortcuts
Expand source code
-Browse git
+Browse git
"""
.. include:: ../docs/about/manual.md
diff --git a/tf/about/optimizations.html b/tf/about/optimizations.html
index d60bcc002..51f7778f4 100644
--- a/tf/about/optimizations.html
+++ b/tf/about/optimizations.html
@@ -187,7 +187,7 @@ Edge features
Expand source code
-Browse git
+Browse git
"""
.. include:: ../docs/about/optimizations.md
diff --git a/tf/about/releases.html b/tf/about/releases.html
index 67b709532..43189c8f0 100644
--- a/tf/about/releases.html
+++ b/tf/about/releases.html
@@ -43,6 +43,17 @@ Release notes
12
12.4
+12.4.3
+2024-05-08
+Fix in TF browser, spotted by Jorik Groen.
+When exporting query results, the values of the features used in the query were not
+written to the table at all.
+The expected behaviour was that features used in the query would lead to extra columns
+in the exported table.
+It has been fixed. The cause was an earlier fix in the display of features in query
+results.
+This new fix only affects the export function from the browser, not the
+advanced.display.export
function, which did not have this bug.
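For illustration, a minimal sketch of the behaviour this fix restores: a feature mentioned in a search template should come back as an extra column when the results are exported. The corpus name (annotation/banks) and the feature (letters) are assumptions for the example, not taken from this changeset.

```python
# Minimal sketch; corpus and feature names are illustrative assumptions.
from tf.app import use

A = use("annotation/banks")

# The trailing * puts no constraint on the value, but marks the feature
# for display, so it should travel along into the export.
results = A.search("""
word letters*
""")

# The feature values should reappear as extra columns in results.tsv.
A.export(results, toFile="results.tsv")
```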
12.4.2
2024-04-24
Tiny fixes in the TEI and WATM conversions.
@@ -374,7 +385,7 @@ Older releases
Expand source code
-Browse git
+Browse git
"""
.. include:: ../docs/about/releases.md
@@ -448,6 +459,7 @@ Index
Release notes
- 12
- 12.4
+- 12.4.3
- 12.4.2
- 12.4.1
- 12.4.0
diff --git a/tf/about/releasesold.html b/tf/about/releasesold.html
index c3e1e7813..96881a321 100644
--- a/tf/about/releasesold.html
+++ b/tf/about/releasesold.html
@@ -3430,7 +3430,7 @@ Changed
Expand source code
-Browse git
+Browse git
"""
.. include:: ../docs/about/releasesold.md
diff --git a/tf/about/searchdesign.html b/tf/about/searchdesign.html
index 9a521b97b..9bd9fa7e8 100644
--- a/tf/about/searchdesign.html
+++ b/tf/about/searchdesign.html
@@ -477,7 +477,7 @@ Small-first strategy
Expand source code
-Browse git
+Browse git
"""
.. include:: ../docs/about/searchdesign.md
diff --git a/tf/about/searchusage.html b/tf/about/searchusage.html
index 62c6e82b5..f94275379 100644
--- a/tf/about/searchusage.html
+++ b/tf/about/searchusage.html
@@ -776,7 +776,7 @@ Implementation
Expand source code
-Browse git
+Browse git
"""
.. include:: ../docs/about/searchusage.md
diff --git a/tf/about/tests.html b/tf/about/tests.html
index 719d02b25..3b394fd70 100644
--- a/tf/about/tests.html
+++ b/tf/about/tests.html
@@ -71,7 +71,7 @@ Relations
Expand source code
-Browse git
+Browse git
"""
.. include:: ../docs/about/tests.md
diff --git a/tf/about/use.html b/tf/about/use.html
index a0e83b1d3..fe1eb1c0a 100644
--- a/tf/about/use.html
+++ b/tf/about/use.html
@@ -90,7 +90,7 @@ TF API
Expand source code
-Browse git
+Browse git
"""
.. include:: ../docs/about/use.md
diff --git a/tf/about/usefunc.html b/tf/about/usefunc.html
index 36478eed5..ca472ff29 100644
--- a/tf/about/usefunc.html
+++ b/tf/about/usefunc.html
@@ -419,7 +419,7 @@ Prevent data loading
Expand source code
-Browse git
+Browse git
"""
.. include:: ../docs/about/usefunc.md
diff --git a/tf/about/variants.html b/tf/about/variants.html
index de587244d..1c8f2e5e5 100644
--- a/tf/about/variants.html
+++ b/tf/about/variants.html
@@ -438,7 +438,7 @@ The stack
Expand source code
-Browse git
+Browse git
"""
.. include:: ../docs/about/variants.md
diff --git a/tf/about/volumes.html b/tf/about/volumes.html
index d97f364cf..e0d4c587b 100644
--- a/tf/about/volumes.html
+++ b/tf/about/volumes.html
@@ -323,7 +323,7 @@ Reflection
Expand source code
-Browse git
+Browse git
"""
.. include:: ../docs/about/volumes.md
diff --git a/tf/advanced/annotate.html b/tf/advanced/annotate.html
index 5bb09526c..6da258030 100644
--- a/tf/advanced/annotate.html
+++ b/tf/advanced/annotate.html
@@ -34,7 +34,7 @@ Module tf.advanced.annotate
Expand source code
-Browse git
+Browse git
"""
Enable manual annotation APIs.
@@ -84,7 +84,7 @@ Parameters
Expand source code
-Browse git
+Browse git
def annotateApi(app):
"""Produce the interchange functions API.
@@ -106,7 +106,7 @@ Parameters
Expand source code
-Browse git
+Browse git
def makeNer(app):
return NER(app)
diff --git a/tf/advanced/app.html b/tf/advanced/app.html
index 4e8697c68..a93df83cf 100644
--- a/tf/advanced/app.html
+++ b/tf/advanced/app.html
@@ -31,7 +31,7 @@ Module tf.advanced.app
Expand source code
-Browse git
+Browse git
import types
import traceback
@@ -834,7 +834,7 @@ Parameters
Expand source code
-Browse git
+Browse git
def findApp(
appName,
@@ -1129,7 +1129,7 @@ Returns
Expand source code
-Browse git
+Browse git
def loadApp(silent=DEEP):
"""Loads a given TF app or loads the TF app based on the working directory.
@@ -1197,7 +1197,7 @@ See Also
Expand source code
-Browse git
+Browse git
def useApp(appName, backend):
"""Make use of a corpus.
@@ -1299,7 +1299,7 @@ Parameters
Expand source code
-Browse git
+Browse git
class App:
def __init__(
@@ -1665,7 +1665,7 @@ Parameters
Expand source code
-Browse git
+Browse git
def hoist(self, hoist, silent=None):
"""Hoist the API handles of this TF app to the global scope.
@@ -1733,7 +1733,7 @@ Returns
Expand source code
-Browse git
+Browse git
def load(self, features, silent=SILENT_D):
"""Loads extra features in addition to the main dataset.
@@ -1771,7 +1771,7 @@ Returns
Expand source code
-Browse git
+Browse git
def reinit(self):
"""TF Apps may override this method.
@@ -1806,7 +1806,7 @@ Parameters
Expand source code
-Browse git
+Browse git
def reuse(self, hoist=False):
"""Re-initialize the app.
diff --git a/tf/advanced/condense.html b/tf/advanced/condense.html
index 93c85aca7..c6ec9f2bd 100644
--- a/tf/advanced/condense.html
+++ b/tf/advanced/condense.html
@@ -31,7 +31,7 @@ Module tf.advanced.condense
Expand source code
-Browse git
+Browse git
def condense(api, tuples, condenseType, multiple=False):
F = api.F
@@ -135,7 +135,7 @@ Functions
Expand source code
-Browse git
+Browse git
def condense(api, tuples, condenseType, multiple=False):
F = api.F
@@ -192,7 +192,7 @@ Functions
Expand source code
-Browse git
+Browse git
def condenseSet(api, tup, condenseType):
F = api.F
diff --git a/tf/advanced/data.html b/tf/advanced/data.html
index 64c8dbe8e..88dccad02 100644
--- a/tf/advanced/data.html
+++ b/tf/advanced/data.html
@@ -31,7 +31,7 @@ Module tf.advanced.data
Expand source code
-Browse git
+Browse git
from ..core.helpers import itemize
from ..core.files import backendRep, expandDir, prefixSlash, normpath
@@ -496,7 +496,7 @@ Parameters
Expand source code
-Browse git
+Browse git
def getModulesData(*args):
"""Retrieve all data for a corpus.
@@ -556,7 +556,7 @@ Parameters
Expand source code
-Browse git
+Browse git
class AppData:
def __init__(
@@ -989,7 +989,7 @@ See Also
Expand source code
-Browse git
+Browse git
def getExtra(self):
"""Get the extra data specified by the settings of the corpus.
@@ -1051,7 +1051,7 @@ See Also
Expand source code
-Browse git
+Browse git
def getMain(self):
"""Get the main data of the corpus.
@@ -1126,7 +1126,7 @@ Parameters
Expand source code
-Browse git
+Browse git
def getModule(
self,
@@ -1271,7 +1271,7 @@ Parameters
Expand source code
-Browse git
+Browse git
def getModules(self):
"""Get data from additional local directories.
@@ -1342,7 +1342,7 @@ Parameters
Expand source code
-Browse git
+Browse git
def getRefs(self):
"""Get data from additional modules.
@@ -1397,7 +1397,7 @@ See Also
Expand source code
-Browse git
+Browse git
def getStandard(self):
"""Get the data of the standard modules specified by the settings of the corpus.
diff --git a/tf/advanced/display.html b/tf/advanced/display.html
index 52d06abc5..81013f558 100644
--- a/tf/advanced/display.html
+++ b/tf/advanced/display.html
@@ -69,7 +69,7 @@ See also
Expand source code
-Browse git
+Browse git
"""
# Display
@@ -1126,7 +1126,7 @@ Parameters
Expand source code
-Browse git
+Browse git
def displayApi(app, silent=SILENT_D):
"""Produce the display API.
@@ -1193,7 +1193,7 @@ Parameters
Expand source code
-Browse git
+Browse git
def displayReset(app, *options):
"""Restore display parameters to their defaults.
@@ -1251,7 +1251,7 @@ See Also
Expand source code
-Browse git
+Browse git
def displaySetup(app, *show, **options):
"""Set up all display parameters.
@@ -1309,7 +1309,7 @@ See Also
Expand source code
-Browse git
+Browse git
def displayShow(app, *options):
"""Show display parameters.
@@ -1444,7 +1444,7 @@ Results
Expand source code
-Browse git
+Browse git
def export(app, tuples, toDir=None, toFile="results.tsv", **options):
"""Exports an iterable of tuples of nodes to an Excel friendly TSV file.
@@ -1589,7 +1589,7 @@ Returns
Expand source code
-Browse git
+Browse git
def getCss(app):
"""Export the CSS for this app.
@@ -1639,7 +1639,7 @@ Returns
Expand source code
-Browse git
+Browse git
def getToolCss(app, tool):
"""Export the CSS for a tool of this app.
@@ -1689,7 +1689,7 @@ Returns
Expand source code
-Browse git
+Browse git
def loadCss(app):
"""Load the CSS for this app.
@@ -1768,7 +1768,7 @@ Returns
Expand source code
-Browse git
+Browse git
def loadToolCss(app, tool, extraCss):
"""Load the Tool CSS for this app.
@@ -1840,7 +1840,7 @@ Result
Expand source code
-Browse git
+Browse git
def plain(app, n, _inTuple=False, _asString=False, explain=False, **options):
"""Display the plain text of a node.
@@ -1926,7 +1926,7 @@ Result
Expand source code
-Browse git
+Browse git
def plainTuple(
app,
@@ -2169,7 +2169,7 @@ Result
Expand source code
-Browse git
+Browse git
def pretty(app, n, explain=False, _asString=False, **options):
"""Displays the material that corresponds to a node in a graphical way.
@@ -2248,7 +2248,7 @@ Result
Expand source code
-Browse git
+Browse git
def prettyTuple(app, tup, seq=None, _asString=False, item=RESULT, **options):
"""Displays the material that corresponds to a tuple of nodes in a graphical way.
@@ -2361,7 +2361,7 @@ Result
Expand source code
-Browse git
+Browse git
def show(app, tuples, _asString=False, **options):
"""Displays an iterable of tuples of nodes.
@@ -2464,7 +2464,7 @@ Parameters
Expand source code
-Browse git
+Browse git
def table(app, tuples, _asString=False, **options):
"""Plain displays of an iterable of tuples of nodes in a table.
diff --git a/tf/advanced/find.html b/tf/advanced/find.html
index 0641225bf..a40e80687 100644
--- a/tf/advanced/find.html
+++ b/tf/advanced/find.html
@@ -31,7 +31,7 @@ Module tf.advanced.find
Expand source code
-Browse git
+Browse git
import sys
from importlib import util
@@ -270,7 +270,7 @@ Returns
Expand source code
-Browse git
+Browse git
def findAppClass(appName, appPath):
"""Find the class definition of an app.
@@ -325,7 +325,7 @@ See Also
Expand source code
-Browse git
+Browse git
def findAppConfig(
appName,
@@ -474,7 +474,7 @@ Parameters
Expand source code
-Browse git
+Browse git
def loadModule(moduleName, *args):
"""Load a module dynamically, by name.
diff --git a/tf/advanced/helpers.html b/tf/advanced/helpers.html
index 3f9e3d3f9..c173bf950 100644
--- a/tf/advanced/helpers.html
+++ b/tf/advanced/helpers.html
@@ -31,7 +31,7 @@ Module tf.advanced.helpers
Expand source code
-Browse git
+Browse git
import collections
from textwrap import dedent
@@ -864,7 +864,7 @@ Functions
Expand source code
-Browse git
+Browse git
def backendRepl(match):
thisBackend.append(match.group(1))
@@ -894,7 +894,7 @@ Returns
Expand source code
-Browse git
+Browse git
def dh(html, inNb="ipython", unexpand=False):
"""Display HTML.
@@ -952,7 +952,7 @@ Returns
Expand source code
-Browse git
+Browse git
def dm(md, inNb="ipython", unexpand=False):
"""Display markdown.
@@ -995,7 +995,7 @@ Returns
Expand source code
-Browse git
+Browse git
def getHeaderTypes(app, tuples):
api = app.api
@@ -1034,7 +1034,7 @@ Returns
Expand source code
-Browse git
+Browse git
def getHeaders(app, tuples):
headerTypes = getHeaderTypes(app, tuples)
@@ -1058,7 +1058,7 @@ Returns
Expand source code
-Browse git
+Browse git
def getLocalDir(backend, cfg, local, version):
provenanceSpec = cfg.get("provenanceSpec", {})
@@ -1098,7 +1098,7 @@ Parameters
Expand source code
-Browse git
+Browse git
def getResultsX(app, results, features, condenseType, fmt=None):
"""Transform a uniform iterable of nodes into a table with extra information.
@@ -1194,7 +1194,7 @@ Parameters
Expand source code
-Browse git
+Browse git
def getRowsX(app, tuples, features, condenseType, fmt=None):
"""Transform an iterable of nodes into a table with extra information.
@@ -1218,7 +1218,7 @@ Parameters
Expand source code
-Browse git
+Browse git
def getText(
app, isPretty, n, nType, outer, first, last, level, passage, descend, options=None
@@ -1311,7 +1311,7 @@ Parameters
Expand source code
-Browse git
+Browse git
def getTuplesX(app, results, condenseType, fmt=None):
"""Transform a non-uniform iterable of nodes into a table with extra information.
@@ -1379,7 +1379,7 @@ Parameters
Expand source code
-Browse git
+Browse git
def getValue(app, n, nType, feat, suppress, math=False):
F = app.api.F
@@ -1408,7 +1408,7 @@ Parameters
Expand source code
-Browse git
+Browse git
def hData(x):
if not x:
@@ -1439,7 +1439,7 @@ Parameters
Expand source code
-Browse git
+Browse git
def hDict(x, outer=False):
elem = f"{'o' if outer else 'u'}l"
@@ -1469,7 +1469,7 @@ Parameters