fsw similarity metrics #3

Merged
merged 22 commits into from
Jan 16, 2024
Changes from 21 commits
4 changes: 2 additions & 2 deletions README.md
@@ -22,6 +22,7 @@ the more effective the evaluation metric is in capturing the nuances of sign lan
- ✅ [Tokenized BLEU](signwriting_evaluation/metrics/bleu.py) - BLEU score for tokenized SignWriting FSW strings.
- ✅ [chrF](signwriting_evaluation/metrics/chrf.py) - chrF score for untokenized SignWriting FSW strings.
- ✅ [CLIPScore](signwriting_evaluation/metrics/clipscore.py) - CLIPScore between SignWriting images. (Using the original CLIP model)
- ✅ [SymbolDistance](signwriting_evaluation/metrics/symbol_distance.py) - symbol distance score for SignWriting FSW strings [(README)](signwriting_evaluation/metrics/symbol_distance.md).

## Qualitative Evaluation

@@ -80,5 +81,4 @@ For each sign and metric, either the first match is incorrect, or there is a mor
[^4]: Jack Hessel, Ari Holtzman, Maxwell Forbes, Ronan Le Bras, and Yejin Choi.
2021. [CLIPScore: A Reference-free Evaluation Metric for Image Captioning](https://aclanthology.org/2021.emnlp-main.595/).
In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 7514–7528, Online
-and
-Punta Cana, Dominican Republic. Association for Computational Linguistics.
+and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Binary file added assets/equations/graph1.png
Binary file added assets/equations/graph2.png
3 changes: 2 additions & 1 deletion pyproject.toml
@@ -20,7 +20,8 @@ dev = [
"pylint",
# to plot metric evaluation results
"matplotlib",
-"numpy"
+"numpy",
+"scipy"
]

[tool.yapf]
3 changes: 2 additions & 1 deletion signwriting_evaluation/evaluation/closest_matches.py
@@ -4,11 +4,11 @@
import numpy as np

from signwriting.visualizer.visualize import signwriting_to_image

from signwriting_evaluation.metrics.base import SignWritingMetric
from signwriting_evaluation.metrics.bleu import SignWritingBLEU
from signwriting_evaluation.metrics.chrf import SignWritingCHRF
from signwriting_evaluation.metrics.clip import SignWritingCLIPScore
from signwriting_evaluation.metrics.symbol_distance import SignWritingSimilarityMetric


CURRENT_DIR = Path(__file__).parent
@@ -84,6 +84,7 @@ def metrics_distribution(signs: list[str], metrics: list[SignWritingMetric]):
print(f"Found {len(single_signs)} signs")

all_metrics = [
SignWritingSimilarityMetric(),
SignWritingBLEU(),
SignWritingCHRF(),
SignWritingCLIPScore(cache_directory=None),
35 changes: 35 additions & 0 deletions signwriting_evaluation/metrics/symbol_distance.md
@@ -0,0 +1,35 @@
# Evaluation metric for SignWriting
### Introduction
This code introduces a novel metric for assessing the similarity of two phrases written
in Formal SignWriting (FSW). Unlike generic string comparison methods such as BLEU and chrF, our approach
is tailored to the characteristics and rules of SignWriting, so the evaluation is task-specific.

### Evaluation Method
Our method accounts for key aspects of SignWriting, such as:

- Symbols are organized in the FSW dictionary by type (e.g., hand shapes, motion, touch), and proximity
in the dictionary indicates visual and semantic closeness.
- The symbols forming a sign can be written in different orders and still represent the same visual output.
- Each part of a symbol has a distinct meaning and importance: symbol type, facing direction, angle, and position.

### Main concept
The evaluation process has three main stages:
1. Symbol Distance Function: Measures the distance between two symbols based on SignWriting rules, applying custom
weights to the different kinds of symbol differences.
2. Distance Normalization: Divides each distance by the maximum possible distance and applies the following non-linear
function, which spreads out small distances.

![Graph of f(x) = x^{\frac{1}{3}}](/assets/equations/graph1.png)

$$
f(x) = x^{\frac{1}{3}}
$$

3. Matching and Grading: Generalizes symbol distances to a similarity score for entire signs. The Hungarian algorithm
matches the symbols of the two signs at minimum total cost, and the mean matched cost is combined with a length penalty
in a weighted mean. The weight is computed from the relative length difference using the formula below.

![Graph of f(x) = x^{\frac{3}{2}}](/assets/equations/graph2.png)


$$
f(x) = x^{\frac{3}{2}}
$$
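
As a rough illustration of stages 2 and 3, the sketch below plugs made-up numbers into the two formulas. The function names and sample values are hypothetical; only the exponents 1/3 and 3/2, the `+ 1` for the box symbol, and the way the pieces are combined are taken from `symbol_distance.py` in this PR.

```python
def normalize(distance: float, max_distance: float) -> float:
    """Stage 2: scale into [0, 1], then apply the cube root f(x) = x^(1/3)."""
    return (distance / max_distance) ** (1 / 3)


def length_weight(hyp_len: int, ref_len: int) -> float:
    """Stage 3: weight of the length penalty, f(x) = x^(3/2)."""
    # Relative length difference; the +1 accounts for the box symbol.
    length_error = abs(hyp_len - ref_len) / (max(hyp_len, ref_len) + 1)
    return length_error ** 1.5


def combine(mean_cost: float, weight: float) -> float:
    """Weighted mean of a full penalty (1.0) and the mean matched cost."""
    return weight + mean_cost * (1 - weight)


# The cube root lifts small distances: 0.1% of the maximum already costs about 0.1.
small = normalize(1.0, 1000.0)
# Equal lengths give zero weight, so only the matched costs count.
same_length = combine(0.2, length_weight(4, 4))
# A 2-vs-4 symbol mismatch pushes the error toward 1.
shorter = combine(0.2, length_weight(2, 4))
print(round(small, 3), same_length, round(shorter, 3))
```

The final score in the PR is then `(1 - error) ** 2`, so an error of 0 maps to a score of 1.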
93 changes: 93 additions & 0 deletions signwriting_evaluation/metrics/symbol_distance.py
@@ -0,0 +1,93 @@
from math import sqrt, exp
from typing import Tuple

import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial import distance as dis
from signwriting.types import Sign, SignSymbol
from signwriting.formats.fsw_to_sign import fsw_to_sign
from signwriting_evaluation.metrics.base import SignWritingMetric


class SignWritingSimilarityMetric(SignWritingMetric):
    def __init__(self):
        super().__init__("SymbolsDistances")
        self.symbol_classes = {
            'hand_shapes': range(0x100, 0x205),
            'contact_symbols': range(0x205, 0x2FF),
            'etc': range(0x2FF, 0x38C)
        }
        self.weight = {
            "shape": 5,  # same weight as switching parallelization
            "facing": 5/3,  # more important than angle, not as much as shape and orientation
            "angle": 5/24,  # lowest importance out of the criteria
            "parallel": 5,  # parallelization is 3 columns compared to 1 for the facing direction
            "positional": 1/10,  # may be big values
            "normalized_factor": 1 / 3,  # fitting shape of function
            "exp_factor": 1.5,  # exponential distribution
            "class_penalty": 250,  # big penalty for each class type passed
        }
        self.max_distance = self.calculate_distance({"symbol": "S10000", "position": (250, 250)},
                                                    {"symbol": "S38b07", "position": (750, 750)})

    def get_attributes(self, symbol: SignSymbol) -> Tuple[int, int, int, bool]:
        shape = int(symbol['symbol'][1:4], 16)
        facing = int(symbol['symbol'][4], 16)
        angle = int(symbol['symbol'][5], 16)
        parallel = facing > 2
        return shape, facing, angle, parallel

    def calculate_distance(self, hyp: SignSymbol, ref: SignSymbol) -> float:
        hyp_veq = self.get_attributes(hyp)
        ref_veq = self.get_attributes(ref)

        hyp_class = next((i for i, r in enumerate(self.symbol_classes.values()) if hyp_veq[0] in r), None)
        ref_class = next((i for i, r in enumerate(self.symbol_classes.values()) if ref_veq[0] in r), None)

        # get_attributes returns (shape, facing, angle, parallel), while the weights below are listed as
        # (shape, angle, facing, parallel, positional): facing is scaled by the "angle" weight, angle by the
        # "facing" weight, and zip ignores the fifth (positional) weight.
        hyp_veq = tuple(val * weight for val, weight in zip(hyp_veq, [self.weight["shape"], self.weight["angle"],
                                                                      self.weight["facing"], self.weight["parallel"],
                                                                      self.weight["positional"]]))
        ref_veq = tuple(val * weight for val, weight in zip(ref_veq, [self.weight["shape"], self.weight["angle"],
                                                                      self.weight["facing"], self.weight["parallel"],
                                                                      self.weight["positional"]]))

Review comment (Contributor): this could also be a function btw weigh_vector

        distance = (dis.euclidean(hyp_veq, ref_veq) +
                    self.weight["positional"] * dis.euclidean(hyp["position"], ref["position"]))
        distance = distance + abs(hyp_class - ref_class) * self.weight["class_penalty"]
        return distance

    def normalized_distance(self, unnormalized: float) -> float:
        return pow(unnormalized / self.max_distance, self.weight["normalized_factor"])

    def symbols_score(self, hyp: SignSymbol, ref: SignSymbol) -> float:
        distance = self.calculate_distance(hyp, ref)
        normalized = self.normalized_distance(distance)
        return normalized

    def length_acc(self, hyp: Sign, ref: Sign) -> float:
        hyp = hyp["symbols"]
        ref = ref["symbols"]
        # plus 1 for the box symbol
        return abs(len(hyp) - len(ref)) / (max(len(hyp), len(ref)) + 1)

    def error_rate(self, hyp: Sign, ref: Sign) -> float:
        # Calculate the error rate for a given hypothesis and reference.
        if (not hyp["symbols"] and ref["symbols"]) or (hyp["symbols"] and not ref["symbols"]):
            return 1
        cost_matrix = np.array(
            [self.symbols_score(first, second) for first in hyp["symbols"] for second in ref["symbols"]])
        cost_matrix = cost_matrix.reshape(len(hyp["symbols"]), -1)
        # Find the lowest cost matching
        row_ind, col_ind = linear_sum_assignment(cost_matrix)
        pairs = list(zip(row_ind, col_ind))
        # Mean cost over the matched pairs
        values = [cost_matrix[row, col] for row, col in pairs]
        mean_cost = sum(values) / len(values)
        # Blend the mean cost with a length penalty, weighted by the relative length difference
        length_error = self.length_acc(hyp, ref)
        length_weight = pow(length_error, self.weight["exp_factor"])
        return length_weight + mean_cost * (1 - length_weight)

    def score(self, hypothesis: str, reference: str) -> float:
        # Calculate the evaluation score for a given hypothesis and reference.
        hyp = fsw_to_sign(hypothesis)
        ref = fsw_to_sign(reference)
        return pow(1 - self.error_rate(hyp, ref), 2)
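
To make the attribute decoding concrete, here is a small standalone sketch that splits an FSW symbol key the same way `get_attributes` does and looks up its class index. `decode_symbol` and `class_index` are hypothetical helper names that are not part of this PR; the ranges are copied from `symbol_classes`.

```python
from typing import Optional, Tuple

# Ranges copied from SignWritingSimilarityMetric.symbol_classes.
SYMBOL_CLASSES = {
    "hand_shapes": range(0x100, 0x205),
    "contact_symbols": range(0x205, 0x2FF),
    "etc": range(0x2FF, 0x38C),
}


def decode_symbol(key: str) -> Tuple[int, int, int, bool]:
    """Split a key such as 'S15a11' into (shape, facing, angle, parallel)."""
    shape = int(key[1:4], 16)   # three hex digits after the leading 'S'
    facing = int(key[4], 16)    # next hex digit
    angle = int(key[5], 16)     # last hex digit
    parallel = facing > 2
    return shape, facing, angle, parallel


def class_index(shape: int) -> Optional[int]:
    """Index of the first class whose range contains the shape, else None."""
    return next((i for i, r in enumerate(SYMBOL_CLASSES.values()) if shape in r), None)


shape, facing, angle, parallel = decode_symbol("S15a11")
print(hex(shape), facing, angle, parallel, class_index(shape))

# Number of classes between two symbols; the metric multiplies this by class_penalty.
other_shape = decode_symbol("S37602")[0]
class_gap = abs(class_index(other_shape) - class_index(shape))
print(class_gap)
```

Note that `class_index` returns `None` for a shape outside all three ranges, in which case the subtraction would raise a `TypeError`.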
24 changes: 24 additions & 0 deletions signwriting_evaluation/metrics/test_symbol_distance.py
@@ -0,0 +1,24 @@
import unittest
from signwriting_evaluation.metrics.symbol_distance import SignWritingSimilarityMetric


class TestSignWritingSymbolDistance(unittest.TestCase):
    def setUp(self):
        self.metric = SignWritingSimilarityMetric()

    def test_score(self):
        hypothesis = "M530x538S37602508x462S15a11493x494S20e00488x510S22f03469x517"
        reference = "M519x534S37900497x466S3770b497x485S15a51491x501S22f03481x513"
        score = self.metric.score(hypothesis, reference)
        self.assertIsInstance(score, float)  # Check if the score is a float
        self.assertAlmostEqual(score, 0.5040447299637176)

        hypothesis = "M530x538S37602508x462S15a11493x494S20e00488x510S22f03469x517"
        reference = "M530x538S22f03469x517S37602508x462S20e00488x510S15a11493x494"
        score = self.metric.score(hypothesis, reference)
        self.assertIsInstance(score, float)  # Check if the score is a float
        self.assertAlmostEqual(score, 1)


if __name__ == '__main__':
    unittest.main()
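
The second assertion in `test_score` holds because symbols are matched by cost rather than by their position in the FSW string. The sketch below illustrates that idea on a toy cost matrix. It uses a brute-force search over permutations in place of the Hungarian algorithm (`linear_sum_assignment`) that the metric calls, which is only practical for a handful of symbols. `best_matching` and the cost values are made up for illustration.

```python
from itertools import permutations
from typing import List, Tuple


def best_matching(cost: List[List[float]]) -> Tuple[List[Tuple[int, int]], float]:
    """Return the minimum-cost row-to-column pairing and its mean cost.

    Assumes len(cost) <= len(cost[0]), i.e. no more rows than columns.
    """
    n_rows, n_cols = len(cost), len(cost[0])
    best_pairs: List[Tuple[int, int]] = []
    best_total = float("inf")
    # Try every way of assigning a distinct column to each row.
    for cols in permutations(range(n_cols), n_rows):
        total = sum(cost[r][c] for r, c in enumerate(cols))
        if total < best_total:
            best_pairs, best_total = list(enumerate(cols)), total
    return best_pairs, best_total / n_rows


# Rows: hypothesis symbols; columns: the same symbols in a different order.
# A zero means "identical symbol"; the other entries are arbitrary positive costs.
reordered = [
    [0.7, 0.0, 0.9],
    [0.8, 0.6, 0.0],
    [0.0, 0.5, 0.4],
]
pairs, mean_cost = best_matching(reordered)
print(pairs, mean_cost)  # each symbol is paired with its zero-cost twin
```

With a mean cost of 0 and equal lengths, the error rate is 0 and the score is 1, which is what the test asserts for the reordered sign.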