emojis fail in PDF #661

Open
sydb opened this issue Feb 2, 2024 · 3 comments

@sydb
Member

sydb commented Feb 2, 2024

Emojis in the prose and in examples come out as useless boxes (either with an X inside the box or not) that cannot even be copied-and-pasted.
See, e.g., section 9.6.1 on page 312 of this recent cmc-features build of the PDF.

This may be related to #151.

@joeytakeda
Contributor

After falling down a fairly deep rabbit hole, I have a feeling this is going to be very difficult due to the complexity of emoji rendering in TeX. Here's a good summary of the state of things (at least as of 2023): https://www.overleaf.com/learn/latex/Articles/An_overview_of_technologies_supporting_the_use_of_colour_emoji_fonts_in_LaTeX

Some other resources of note (both discuss lualatex and xelatex; I believe we use the latter):

https://tex.stackexchange.com/questions/639067/defining-a-fallback-font-for-all-missing-characters
https://tex.stackexchange.com/questions/497403/how-to-use-noto-color-emoji-with-lualatex/500180

Suffice it to say that this will not be trivial in latex. That said, this would be much easier to do in FO, since I believe we could just use a fallback font (e.g. https://fonts.google.com/noto/specimen/Noto+Color+Emoji).

Perhaps this is another reason we should add a switch to allow building the PDF in FO as per #208.

@joeytakeda
Contributor

Since Unicode provides a nice test set of all emoji characters (https://www.unicode.org/Public/emoji/latest/emoji-test.txt), I ran a little experiment:

  1. I converted this Unicode document into a TEI file (mostly just a bunch of painful regular expressions)
  2. Then ran (in the docker container) ../bin/teitopdf testEmojis.xml to create a PDF, which, as noted above, shows none of the emojis (and in fact doesn't even show boxes where they ought to be)
  3. Then ran ../bin/teitofo testEmojis.xml to create an FO file
  4. Downloaded both the Noto Color Emoji and Noto Emoji fonts from Google and stashed them in my /TEIC/ folder
  5. Created an Apache FOP configuration file to embed the downloaded font, naming the font family "Emoji" (see footnote 1)
  6. Then downloaded FOP 2.10 (the most recent version; the Stylesheets currently use 2.6, which doesn't seem to work for the emoji Unicode range)
  7. Then ran sh ../fop-2.10/fop/fop -c emoji-cfg.xml testEmojis.fo testEmojis.pdf
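
For anyone wanting to reproduce step 1, here is a minimal Python sketch of turning emoji-test.txt data lines into TEI table rows. It is not the conversion actually used for the attached files; the function names and the `<row>`/`<cell>` layout are assumptions made for illustration, and it assumes data lines of the form `1F600 ; fully-qualified # 😀 E1.0 grinning face`.

```python
import re
from xml.sax.saxutils import escape

# Data lines look like:
#   1F600 ; fully-qualified # 😀 E1.0 grinning face
# Comment lines start with "#" and are skipped because they do not match.
LINE = re.compile(
    r"^(?P<cps>[0-9A-F ]+?)\s*;\s*(?P<status>[\w-]+)\s*#\s*"
    r"(?P<emoji>\S+)\s+E(?P<version>[\d.]+)\s+(?P<name>.+)$"
)


def parse_line(line):
    """Return a dict for a data line, or None for comments and blank lines."""
    match = LINE.match(line.strip())
    if match is None:
        return None
    # Rebuild the emoji from its code points rather than trusting the comment.
    chars = "".join(chr(int(cp, 16)) for cp in match["cps"].split())
    return {"emoji": chars, "status": match["status"], "name": match["name"]}


def to_tei_rows(lines):
    """Render fully-qualified emoji as TEI <row> elements (hypothetical layout)."""
    rows = []
    for line in lines:
        entry = parse_line(line)
        if entry and entry["status"] == "fully-qualified":
            rows.append(
                f"<row><cell>{escape(entry['emoji'])}</cell>"
                f"<cell>{escape(entry['name'])}</cell></row>"
            )
    return rows
```

The rows would still need to be placed inside a `<table>` in a valid TEI document before running teitopdf or teitofo on it.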

For reference, I've attached the TEI, HTML (produced using bin/teitohtml), and latex and FO PDFs here: emojis.zip

So unless I'm missing something about xelatex (which isn't unlikely by any means), I'd say the most straightforward way to handle this for the TEI Guidelines build specifically is to replace each emoji character with a graphic, since AFAICT emoji fonts can't really be typeset properly. This would mean, I think, that we need to tag each emoji with a <g> that points to a <glyph> that has a <graphic> representation (probably grabbed from Noto Color Emoji or some such).
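
To make the <g>/<glyph> idea concrete, here is a rough Python sketch of what the tagging could look like. Everything specific in it is an assumption: the `emoji-…` ID scheme, the `images/emoji` directory, the PNG file naming, and the helper names are invented for illustration, and the list of emoji to wrap is supplied by the caller.

```python
import re
from xml.sax.saxutils import escape


def glyph_id(emoji):
    """Build a hypothetical xml:id such as 'emoji-1f44d' from the code points."""
    return "emoji-" + "-".join(f"{ord(ch):x}" for ch in emoji)


def glyph_declaration(emoji, image_dir="images/emoji"):
    """Return a <glyph> holding a <graphic>; image_dir and file naming are assumptions."""
    gid = glyph_id(emoji)
    return f'<glyph xml:id="{gid}"><graphic url="{image_dir}/{gid}.png"/></glyph>'


def wrap_emojis(text, emojis):
    """Escape text for XML and wrap each known emoji in <g ref="#...">."""
    if not emojis:
        return escape(text)
    # Longest first, so multi-code-point sequences win over their prefixes.
    ordered = sorted(emojis, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(e) for e in ordered))
    parts, last = [], 0
    for match in pattern.finditer(text):
        parts.append(escape(text[last:match.start()]))
        parts.append(f'<g ref="#{glyph_id(match[0])}">{match[0]}</g>')
        last = match.end()
    parts.append(escape(text[last:]))
    return "".join(parts)
```

For example, `wrap_emojis("Nice 👍", ["👍"])` would produce `Nice <g ref="#emoji-1f44d">👍</g>`, and the matching `glyph_declaration("👍")` output would go wherever the glyph declarations live in the header. The PDF stylesheets would then need to render the <graphic> in place of the character.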

If we go that route, then we should probably flag for others the problem they will encounter if they want to use emojis in their own files (e.g. in their documentation) and convert them to PDF via teitopdf. One can detect the presence of emojis with a RegEx (Unicode provides one in 1.4.9), but they note that it will probably raise some false positives (see footnote 2).
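
As a rough illustration of the detection step, here is a Python sketch that flags emoji-like characters using a few hand-picked code point ranges. This is not the expression Unicode provides (Python's built-in `re` module has no `\p{Emoji}` property class); the ranges are my own approximation, so it will flag some symbols that are not emoji and miss emoji that fall outside these blocks.

```python
import re

# Hand-picked ranges: an approximation, not the Unicode-provided expression.
EMOJI_LIKE = re.compile(
    "["
    "\U0001F1E6-\U0001F1FF"  # regional indicators (used for flags)
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, extended symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "]"
)


def find_emoji_like(text):
    """Return (offset, character) pairs for every emoji-like character in text."""
    return [(m.start(), m[0]) for m in EMOJI_LIKE.finditer(text)]
```

A build could run something like this over the text nodes of a source file and emit a warning for each hit, rather than silently dropping the characters in the PDF.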

--

Footnotes

  1. Something like: <font embed-url="../../../../../NotoEmoji-Regular.ttf"><font-triplet name="Emoji" style="normal" weight="normal"/></font>

  2. See also this StackOverflow thread, which provides some interesting alternatives and caveats re: RegEx capturing of emojis

@peterstadler
Member

Wow, many thanks @joeytakeda for all that thorough research so far!

5 participants