The lack of automatic pose evaluation metrics is a major obstacle in the development of sign language generation models.
This repository houses a suite of automatic evaluation metrics tailored to sign language poses, including the metrics proposed by Ham2Pose [1] as well as custom metrics of our own design. Evaluating isolated signs and evaluating continuous signing present distinct challenges, and our methods reflect this distinction.
## Qualitative Evaluation
To qualitatively demonstrate the efficacy of these evaluation metrics, we implement a nearest-neighbor search for selected signs from the TODO corpus. The rationale is straightforward: under an effective metric, a sign's nearest neighbors in the corpus should be perceptually similar signs, so inspecting them reveals how well the metric captures the nuances of sign language transcription and translation.
Using a sample of the corpus, we compute the any-to-any scores for each metric. Intuitively, given two random signs, a good metric should produce a poor score, since most sign pairs are unrelated. This should be reflected in the distribution of scores, which should be skewed toward low scores.
INSERT TABLE HERE
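As a rough sketch of this procedure (the `poses` list and the `metric` callable below are placeholders, not this toolkit's actual API; any function that scores a pair of poses would do), the any-to-any score matrix and nearest neighbors could be computed as follows:

```python
# Sketch: any-to-any scoring and nearest-neighbor lookup over a corpus sample.
# `poses` and `metric` are hypothetical stand-ins: a list of loaded poses and
# any callable of the form metric(hypothesis, reference) -> float.
import numpy as np

def any_to_any_scores(poses, metric):
    """Compute the full pairwise score matrix for a list of poses."""
    n = len(poses)
    scores = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            scores[i, j] = metric(poses[i], poses[j])
    return scores

def nearest_neighbors(scores, higher_is_better=False):
    """Return each sign's nearest neighbor, excluding the sign itself."""
    masked = scores.copy()
    np.fill_diagonal(masked, -np.inf if higher_is_better else np.inf)
    return masked.argmax(axis=1) if higher_is_better else masked.argmin(axis=1)
```

A histogram of the off-diagonal entries of the score matrix then shows whether the distribution is skewed toward low scores, as expected.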
## Quantitative Evaluation

Given an isolated-sign corpus such as AUTSL [2], we repeat the evaluation of Ham2Pose [1] using our metrics.
We also repeat the experiments of Atwell et al. [3] to evaluate the bias of our metrics with respect to different protected attributes.
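Purely as an illustration of the disaggregation involved (the `scores` and `groups` inputs are hypothetical and not part of this toolkit), comparing score statistics across attribute groups might look like this:

```python
# Sketch: summarize metric scores per protected-attribute group.
# `scores` is one metric score per evaluated sign; `groups` holds the
# corresponding attribute value for each sign (both hypothetical inputs).
from collections import defaultdict
from statistics import mean, stdev

def scores_by_group(scores, groups):
    """Mean and standard deviation of metric scores for each attribute group."""
    buckets = defaultdict(list)
    for score, group in zip(scores, groups):
        buckets[group].append(score)
    return {group: (mean(vals), stdev(vals) if len(vals) > 1 else 0.0)
            for group, vals in buckets.items()}
```

Large, systematic gaps between groups would suggest the metric is biased along that attribute.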
Finally, we evaluate each metric in the context of continuous signing, using both our continuous metrics and our segmented metrics, and correlate the resulting scores with human judgments.
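A minimal sketch of that correlation step, assuming parallel lists of metric scores and human ratings for the same outputs (`scipy` is used here for convenience and is not necessarily a dependency of this toolkit):

```python
# Sketch: correlate automatic metric scores with human judgments.
from scipy.stats import pearsonr, spearmanr

metric_scores = [0.12, 0.47, 0.33, 0.90]  # hypothetical metric outputs
human_ratings = [1.0, 3.5, 2.0, 4.5]      # hypothetical human judgments

rho, rho_p = spearmanr(metric_scores, human_ratings)
r, r_p = pearsonr(metric_scores, human_ratings)
print(f"Spearman rho={rho:.3f} (p={rho_p:.3f}); Pearson r={r:.3f} (p={r_p:.3f})")
```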
TODO: list the evaluation metrics here.
## Citation

If you use our toolkit in your research or projects, please consider citing the work:
```bibtex
@misc{pose-evaluation2024,
  title={Pose Evaluation: Metrics for Evaluating Sign Language Generation Models},
  author={Zifan Jiang and Colin Leong and Amit Moryossef},
  howpublished={\url{https://github.com/sign-language-processing/pose-evaluation}},
  year={2024}
}
```
## Contributions

- Zifan, Colin, and Amit developed the evaluation metrics and tools.
- Zifan, Anne, and Lisa conducted the qualitative and quantitative evaluations.
## References

1. Rotem Shalev-Arkushin, Amit Moryossef, and Ohad Fried. 2023. Ham2Pose: Animating Sign Language Notation into Pose Sequences. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
2. Ozge Mercanoglu Sincan and Hacer Yalim Keles. 2020. AUTSL: A Large Scale Multi-Modal Turkish Sign Language Dataset and Baseline Methods. IEEE Access.
3. Atwell et al. 2024. Studying and Mitigating Biases in Sign Language Understanding Models. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 268–283, Miami, Florida, USA. Association for Computational Linguistics.