Demo notebook NE display using HTML #56

Open: fexfl wants to merge 17 commits into main


fexfl (Collaborator) commented Jan 12, 2025

Implemented a function to highlight named entities (NEs) in the performance notebook using HTML.

  • Added per_list, org_list, loc_list and misc_list attributes to the pseudonymization class; they can be accessed for the currently processed email
  • Adjusted the pseudonymize_per method, which now uses the class attribute per_list instead of receiving the list as a parameter
  • Adjusted the tests for the pseudonymization methods accordingly
  • Added the highlighting function to notebook/demo.ipynb
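For illustration, a minimal sketch of such a highlighting function in Python. It assumes the four entity lists are plain lists of strings, like the per_list, org_list, loc_list and misc_list attributes described above; the function name highlight_entities and the COLORS mapping are hypothetical and not taken from notebook/demo.ipynb. Like the approach in this PR, it marks every occurrence of each entity.

```python
import html
import re

# Hypothetical background colour per entity type (any CSS colours would do).
COLORS = {"PER": "#ffb3b3", "ORG": "#b3d9ff", "LOC": "#b3ffb3", "MISC": "#ffe0b3"}


def highlight_entities(text, per_list, org_list, loc_list, misc_list):
    """Return text as HTML with every occurrence of each entity wrapped in <mark>.

    Hypothetical helper: the four lists stand in for the per_list, org_list,
    loc_list and misc_list attributes of the pseudonymization class.
    """
    labels = {}
    for label, entities in (("PER", per_list), ("ORG", org_list),
                            ("LOC", loc_list), ("MISC", misc_list)):
        for entity in entities:
            if entity:  # skip empty strings, which would match everywhere
                labels.setdefault(entity, label)  # first type wins on duplicates
    if not labels:
        return html.escape(text)
    # Try longer entities first so "New York City" wins over "New York".
    pattern = re.compile(
        "|".join(re.escape(e) for e in sorted(labels, key=len, reverse=True))
    )
    parts, last = [], 0
    for match in pattern.finditer(text):
        label = labels[match.group(0)]
        parts.append(html.escape(text[last:match.start()]))  # plain text before the match
        parts.append(
            f'<mark style="background-color: {COLORS[label]}" title="{label}">'
            f"{html.escape(match.group(0))}</mark>"
        )
        last = match.end()
    parts.append(html.escape(text[last:]))  # remaining plain text
    return "".join(parts)
```

In a notebook, the result can be rendered with IPython's display(HTML(...)). Because this sketch uses plain substring matching, an entity such as "Bob" is also marked inside "Bobby"; wrapping each alternative in \b word boundaries is one way to avoid that.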

fexfl requested a review from iulusoy on January 12, 2025 16:30

codecov bot commented Jan 12, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.90%. Comparing base (dd841fc) to head (6be53b5).

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #56      +/-   ##
==========================================
+ Coverage   93.47%   93.90%   +0.43%     
==========================================
  Files           4        4              
  Lines         383      394      +11     
==========================================
+ Hits          358      370      +12     
+ Misses         25       24       -1     


fexfl (Collaborator, Author) commented Jan 12, 2025

This still addresses the demonstration notebook (#40), which lacked a way to visualize the results. The visualization can still be improved: this method highlights every occurrence of an ORG, LOC or MISC entity, while the pseudonymization process actually replaces only one occurrence at a time. As a result, parts of the text can be highlighted even though they were not replaced during pseudonymization. For PER, the highlighting is accurate, since the pseudonymization class also replaces every occurrence of each name.
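One possible way to make the ORG, LOC and MISC highlighting accurate is to highlight by position rather than by string matching. This sketch assumes the pseudonymization step were extended to record the (start, end, label) character offsets of each replacement it actually makes; that record and the function name highlight_spans are hypothetical and not part of this PR.

```python
import html


def highlight_spans(text, spans):
    """Wrap only the given (start, end, label) character spans in <mark> tags.

    `spans` is a hypothetical record of the replacements actually made, so a
    repeated mention that was not replaced stays unhighlighted.
    """
    parts, last = [], 0
    for start, end, label in sorted(spans):
        if start < last:  # skip overlapping spans instead of nesting tags
            continue
        parts.append(html.escape(text[last:start]))  # unchanged text before the span
        parts.append(
            f'<mark title="{html.escape(label)}">{html.escape(text[start:end])}</mark>'
        )
        last = end
    parts.append(html.escape(text[last:]))  # unchanged text after the last span
    return "".join(parts)
```

With this approach, the second "Acme" in "Acme hired Bo. Acme grew." stays unmarked if only the first one was recorded as replaced.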

iulusoy (Member) commented Jan 13, 2025

This crashes when too many emails are parsed, though; something to keep in mind for further testing. In two separate tests (in VS Code and in jupyter-notebook), nothing was displayed anymore or the kernel died.
