Based on sign/translate#114.
People without previous SignWriting experience have a hard time understanding SignWriting notation.
This project aims to provide an alternative view of SignWriting, using computer-generated illustrations of the signs.
We use multiple data sources that include both SignWriting and illustrations:
- Vokabeltrainer - Swiss-German lexicon
- SignPuddle LSF-CH - Swiss-French lexicon
The illustrations depict different people, usually in grayscale. We use ChatGPT to generate a prompt describing every illustration.
All images are then created at 512x512. For example:

> An illustration of a woman with short hair, with orange arrows. The background is white and there is a watermark text '@signecriture.org'.
(Example pair: SignWriting control image and the corresponding illustration.)
The prompt should include whether this is an image or an illustration, whether it is colored or black and white, whether the subject is a man or a woman, the hair style, and the watermark (see `train/prompt.json` for values).
- `create_images.py` - Generate parallel images - we create parallel files with the same name in directories `train/A` and `train/B` to include the SignWriting (B) and illustration (A) in the same resolution (512x512) (see the first sketch after this list).
- `create_prompts.py` - Generate prompts - we use ChatGPT to generate the prompt for every illustration. All of the prompts are then stored in `train/prompt.json` (a `JSONL` file with `{source: ..., target: ..., prompt: ...}`; see the second sketch after this list). The cost is about $5 per 1000 illustrations.
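A minimal sketch of the pairing step performed by `create_images.py` (the file names, the white-canvas padding, and the resampling filter are assumptions, not the actual implementation):

```python
import os
from PIL import Image

def save_pair(name: str, illustration_path: str, signwriting_path: str, size=(512, 512)):
    """Save an illustration/SignWriting pair under the same file name in train/A and train/B."""
    for directory, path in (("train/A", illustration_path), ("train/B", signwriting_path)):
        os.makedirs(directory, exist_ok=True)
        image = Image.open(path).convert("RGB")
        image.thumbnail(size, Image.LANCZOS)       # fit into 512x512, keeping the aspect ratio
        canvas = Image.new("RGB", size, "white")   # pad on a white background (assumption)
        canvas.paste(image, ((size[0] - image.width) // 2, (size[1] - image.height) // 2))
        canvas.save(os.path.join(directory, f"{name}.png"))

# Hypothetical source paths, for illustration only.
save_pair("00001", "raw/illustrations/00001.png", "raw/signwriting/00001.png")
```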
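And a sketch of the `JSONL` records that `create_prompts.py` writes to `train/prompt.json`. Only the `{source, target, prompt}` fields come from the description above; the model name, the exact instructions, and whether the illustration image is sent to ChatGPT are assumptions:

```python
import base64
import json
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def describe_illustration(path: str) -> str:
    """Ask ChatGPT to describe one illustration (model and wording are assumptions)."""
    with open(path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe this illustration: image or illustration, colored or "
                         "black and white, man or woman, hair style, and any watermark."},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

# Append one JSON object per line (hypothetical file names).
with open("train/prompt.json", "a", encoding="utf-8") as f:
    record = {
        "source": "00001.png",  # SignWriting image in train/B (assumption)
        "target": "00001.png",  # illustration in train/A (assumption)
        "prompt": describe_illustration("train/A/00001.png"),
    }
    f.write(json.dumps(record) + "\n")
```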
We train a ControlNet model to control Stable Diffusion: given the prompt and the SignWriting image, it generates the relevant illustration. This process benefits from the pretrained generative image diffusion model.
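A rough sketch of how the paired data above could be loaded for ControlNet training (a hypothetical PyTorch dataset; the field names are illustrative, the source/target mapping is an assumption, and the actual training loop is not shown):

```python
import json
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class SignWritingIllustrationDataset(Dataset):
    """Yields (SignWriting control, illustration target, prompt) triplets from train/."""

    def __init__(self, root: str = "train"):
        self.root = Path(root)
        with open(self.root / "prompt.json", encoding="utf-8") as f:
            self.records = [json.loads(line) for line in f if line.strip()]
        self.to_tensor = transforms.ToTensor()

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        record = self.records[idx]
        control = Image.open(self.root / "B" / Path(record["source"]).name).convert("RGB")
        target = Image.open(self.root / "A" / Path(record["target"]).name).convert("RGB")
        return {
            "conditioning_pixel_values": self.to_tensor(control),  # SignWriting (control)
            "pixel_values": self.to_tensor(target),                # illustration (target)
            "caption": record["prompt"],
        }
```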
At inference time, we still provide the control image of the new SignWriting, but we can control the prompt. For example, we can always say "An illustration of a man with short hair." for consistency of character. This also removes any watermarks from the output, since in the training data, watermarked illustrations are prompted with the watermark.
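A minimal inference sketch using the `diffusers` library; the checkpoint paths, base model, and sampler settings are placeholders rather than the project's actual configuration:

```python
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Load the trained ControlNet and attach it to a Stable Diffusion base model (paths are placeholders).
controlnet = ControlNetModel.from_pretrained("path/to/signwriting-controlnet", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# The new SignWriting image is the control; a fixed prompt keeps the character consistent.
control = Image.open("sign.png").convert("RGB").resize((512, 512))
prompt = "An illustration of a man with short hair."
illustration = pipe(prompt, image=control, num_inference_steps=30).images[0]
illustration.save("illustration.png")
```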
As diffusion models struggle to generate illustrations, we use the image-to-image pipeline with an initial white image. Unfortunately, while the model generates illustrations, they do not follow the SignWriting.
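A sketch of that image-to-image variant, assuming the conditioning is still applied through `StableDiffusionControlNetImg2ImgPipeline` (whether ControlNet is kept in this experiment, and the `strength` value, are assumptions):

```python
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline

controlnet = ControlNetModel.from_pretrained("path/to/signwriting-controlnet", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

white = Image.new("RGB", (512, 512), "white")                       # initial image: plain white
control = Image.open("sign.png").convert("RGB").resize((512, 512))  # SignWriting control image
result = pipe(
    "An illustration of a man with short hair.",
    image=white,            # image-to-image starting point
    control_image=control,  # SignWriting conditioning
    strength=0.9,           # how far to deviate from the white image (assumption)
).images[0]
result.save("illustration_img2img.png")
```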
Here is a comparison of the results: