SignWriting Illustration

Based on sign/translate#114.

People without previous SignWriting experience have a hard time understanding SignWriting notation.

This project aims to provide an alternative view of SignWriting, using computer-generated illustrations of the signs.

Data

We use multiple data sources with SignWriting and illustrations:

  1. Vokabeltrainer - Swiss-German lexicon
  2. SignPuddle LSF-CH - Swiss-French lexicon

The illustrations depict different people and are usually in grayscale. We use ChatGPT to generate a prompt describing each illustration.

Examples

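| | 00004 | 00007 | 00015 |
|---|---|---|---|
| Video | (video) | (video) | (video) |
| SignWriting | (image) | (image) | (image) |
| Illustration | (image) | (image) | (image) |
| Prompt | An illustration of a person with short hair, with black arrows. | An illustration of a woman with short hair, with black arrows. | An illustration of a man with short hair. The arrows are black. |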

All images are then created at 512x512 and paired with a prompt, for example: "An illustration of a woman with short hair, with orange arrows. The background is white and there is a watermark text '@signecriture.org'."

[Figure: a control image (B) beside its illustration (A)]

Training

Prompt information

The prompt should state whether this is an image or an illustration, whether it is colored or black and white, whether it shows a man or a woman, the hair style, and any watermark (see train/prompt.json for values).
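To make the prompt structure concrete, here is a minimal sketch that assembles a prompt from a few attributes. The class and field names (`IllustrationAttributes`, `arrow_color`, and so on) are hypothetical and cover only a subset of the attributes listed above; in this project the prompts are written by ChatGPT, not by a template like this one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IllustrationAttributes:
    """Hypothetical attribute set; the real values live in train/prompt.json."""
    kind: str = "illustration"          # "image" or "illustration"
    subject: str = "person"             # "man", "woman", or "person"
    hair: str = "short hair"
    arrow_color: Optional[str] = None   # e.g. "black" or "orange"
    background: Optional[str] = None    # e.g. "white"
    watermark: Optional[str] = None     # e.g. "@signecriture.org"


def build_prompt(attrs: IllustrationAttributes) -> str:
    """Compose a prompt in the style of the examples in this README."""
    article = "An" if attrs.kind[0].lower() in "aeiou" else "A"
    prompt = f"{article} {attrs.kind} of a {attrs.subject} with {attrs.hair}"
    if attrs.arrow_color:
        prompt += f", with {attrs.arrow_color} arrows"
    prompt += "."

    # Optional second sentence about the background and the watermark.
    extras = []
    if attrs.background:
        extras.append(f"the background is {attrs.background}")
    if attrs.watermark:
        extras.append(f"there is a watermark text '{attrs.watermark}'")
    if extras:
        tail = " and ".join(extras)
        prompt += " " + tail[0].upper() + tail[1:] + "."
    return prompt
```

Leaving `watermark` unset produces a prompt that does not mention a watermark at all, which is the form one would use when a clean output is wanted.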

Data Preparation

  1. create_images.py - Generate parallel images. We create files with the same name in the directories train/A and train/B, holding the illustration (A) and the SignWriting (B) at the same resolution (512x512).
  2. create_prompts.py - Generate prompts. We use ChatGPT to generate a prompt for every illustration. The prompts are stored in train/prompt.json (a JSONL file with {source: ..., target: ..., prompt: ...} on each line). The cost is about $5 per 1000 illustrations.
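The layout described above can be sketched with the standard library alone. This is not the code of create_images.py or create_prompts.py: the function names are hypothetical, and the mapping of `source` to the SignWriting file (B) and `target` to the illustration (A), as well as the relative path format, are assumptions.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, Iterator


def paired_records(train_dir: Path, prompts: Dict[str, str]) -> Iterator[dict]:
    """Yield one record per file name that exists in both A and B."""
    a_dir, b_dir = Path(train_dir) / "A", Path(train_dir) / "B"
    for illustration in sorted(a_dir.iterdir()):
        name = illustration.name
        # Skip illustrations without a SignWriting twin or without a prompt.
        if not (b_dir / name).exists() or name not in prompts:
            continue
        yield {
            "source": f"B/{name}",   # assumed: SignWriting control image
            "target": f"A/{name}",   # assumed: illustration to generate
            "prompt": prompts[name],
        }


def write_jsonl(records: Iterable[dict], path: Path) -> None:
    """Write one JSON object per line (JSONL)."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Matching strictly by file name keeps the two directories parallel: an illustration that has no SignWriting counterpart never reaches the training file.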

Model Training

We train a ControlNet model to control Stable Diffusion: given the prompt and the SignWriting image, it generates the corresponding illustration. This approach benefits from the pretrained image diffusion model.

Inference

At inference time, we still provide the new SignWriting image as the control image, but we are free to choose the prompt. For example, we can always use "An illustration of a man with short hair." to keep the character consistent. This also keeps watermarks out of the output: because watermarked training illustrations mention the watermark in their prompts, a prompt that omits it produces an illustration without one.
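A minimal sketch of this inference step with the `diffusers` library is shown below. The function names, the checkpoint arguments (`controlnet_path`, `base_model`), the seed handling, and the use of a CUDA device with float16 weights are all assumptions; the README does not say which base Stable Diffusion checkpoint or which inference script is used.

```python
FIXED_PROMPT = "An illustration of a man with short hair."


def load_pipeline(controlnet_path: str, base_model: str):
    """Load Stable Diffusion with the trained ControlNet attached.

    Heavy dependencies are imported lazily so this module can be imported
    on a machine without torch or diffusers installed.
    """
    import torch
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

    controlnet = ControlNetModel.from_pretrained(
        controlnet_path, torch_dtype=torch.float16
    )
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        base_model, controlnet=controlnet, torch_dtype=torch.float16
    )
    return pipe.to("cuda")  # assumption: a CUDA GPU is available


def illustrate(pipe, signwriting_path: str, prompt: str = FIXED_PROMPT, seed: int = 0):
    """Generate one illustration for a SignWriting image."""
    import torch
    from PIL import Image

    # The control image uses the same 512x512 resolution as the training data.
    control = Image.open(signwriting_path).convert("RGB").resize((512, 512))
    generator = torch.Generator(device="cpu").manual_seed(seed)
    return pipe(prompt, image=control, generator=generator).images[0]
```

Keeping the prompt as a constant default means every sign is rendered with the same described character unless the caller overrides it.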

As diffusion models struggle to generate illustrations, we use the image-to-image pipeline, starting from an all-white initial image. Unfortunately, although this pipeline does generate illustrations, they do not follow the SignWriting.
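The image-to-image variant can be sketched as follows, again with `diffusers`. The function name, the checkpoint arguments, and the `strength` default are hypothetical; the README does not state the settings that were actually used.

```python
def illustrate_from_white(
    controlnet_path: str,
    base_model: str,
    signwriting_path: str,
    prompt: str,
    strength: float = 0.9,  # hypothetical value, not taken from the project
):
    """Run ControlNet image-to-image starting from an all-white image."""
    # Imported lazily so the module loads without torch or diffusers.
    import torch
    from PIL import Image
    from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline

    controlnet = ControlNetModel.from_pretrained(
        controlnet_path, torch_dtype=torch.float16
    )
    pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
        base_model, controlnet=controlnet, torch_dtype=torch.float16
    ).to("cuda")  # assumption: a CUDA GPU is available

    control = Image.open(signwriting_path).convert("RGB").resize((512, 512))
    white = Image.new("RGB", (512, 512), "white")  # the initial image

    # `image` is the starting point for denoising; `control_image` conditions it.
    result = pipe(prompt, image=white, control_image=control, strength=strength)
    return result.images[0]
```

In this pipeline `strength` sets how far the output may move away from the initial image, so it is the natural parameter to vary when comparing against the plain ControlNet pipeline.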

Here is a comparison of the results:

[Figure: results of the ControlNet Pipeline and of the ControlNet Image-to-Image Pipeline]