Fix: adds datacite tests for Zenodo records, moves _detag function fr… #75

Merged
merged 4 commits into from
Nov 7, 2023
seasidesparrow
Member

…om JATSParser to BaseBeautifulSoupParser and renames tagsets from JATS_ to HTML_

modified:   adsingestp/parsers/base.py
modified:   adsingestp/parsers/datacite.py
modified:   adsingestp/parsers/jats.py
new file:   tests/stubdata/input/zenodo_test.xml
new file:   tests/stubdata/input/zenodo_test2.xml
new file:   tests/stubdata/input/zenodo_test3.xml
new file:   tests/stubdata/input/zenodo_test4.xml
modified:   tests/stubdata/output/datacite_schema3.1_example-full.json
modified:   tests/stubdata/output/datacite_schema4.1_example-full.json
modified:   tests/stubdata/output/datacite_schema4.1_example-software.json
modified:   tests/stubdata/output/datacite_schema4_example-habanero-pdsdataset.json
new file:   tests/stubdata/output/zenodo_test.json
new file:   tests/stubdata/output/zenodo_test2.json
new file:   tests/stubdata/output/zenodo_test3.json
new file:   tests/stubdata/output/zenodo_test4.json
modified:   tests/test_datacite.py

 	modified:   adsingestp/parsers/datacite.py
 	modified:   adsingestp/parsers/jats.py
 	modified:   tests/stubdata/input/zenodo_test4.xml
@codecov-commenter

codecov-commenter commented Nov 5, 2023

Codecov Report

Attention: 7 lines in your changes are missing coverage. Please review.

Comparison is base (39ae85a) 88.54% compared to head (68ccd8b) 89.79%.
Report is 32 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #75      +/-   ##
==========================================
+ Coverage   88.54%   89.79%   +1.25%     
==========================================
  Files          24       25       +1     
  Lines        2496     2616     +120     
==========================================
+ Hits         2210     2349     +139     
+ Misses        286      267      -19     
Files Coverage Δ
adsingestp/parsers/crossref.py 92.13% <100.00%> (+0.05%) ⬆️
tests/test_base.py 100.00% <100.00%> (ø)
tests/test_datacite.py 91.17% <100.00%> (+0.26%) ⬆️
tests/test_jats.py 95.06% <100.00%> (+0.12%) ⬆️
adsingestp/parsers/datacite.py 91.44% <92.30%> (+2.61%) ⬆️
adsingestp/parsers/base.py 93.49% <94.11%> (+0.16%) ⬆️
adsingestp/parsers/jats.py 93.33% <96.49%> (+2.22%) ⬆️

... and 2 files with indirect coverage changes


@seasidesparrow
Member Author

Also adds OA/license capture to the Datacite parser.
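
For readers unfamiliar with how open-access and license information appears in DataCite XML, here is a minimal sketch using only the standard library. It is not the code from this PR: the function name `extract_rights`, the output dictionary layout, and the sample record are all assumptions made for illustration. It relies only on the DataCite kernel-4 convention of `<rights>` elements with a `rightsURI` attribute inside `<rightsList>`.

```python
import xml.etree.ElementTree as ET

# DataCite kernel-4 namespace; the "dc" prefix is just a local alias.
DATACITE_NS = {"dc": "http://datacite.org/schema/kernel-4"}

# Illustrative record only; it is NOT one of the zenodo_test*.xml stub files.
SAMPLE = """<resource xmlns="http://datacite.org/schema/kernel-4">
  <rightsList>
    <rights rightsURI="https://creativecommons.org/licenses/by/4.0/legalcode">Creative Commons Attribution 4.0 International</rights>
    <rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights>
  </rightsList>
</resource>"""


def extract_rights(xml_text):
    """Hypothetical helper: collect license entries and an open-access flag."""
    root = ET.fromstring(xml_text)
    licenses = []
    open_access = False
    for node in root.findall("./dc:rightsList/dc:rights", DATACITE_NS):
        uri = node.get("rightsURI", "")
        label = (node.text or "").strip()
        if uri.endswith("openAccess"):
            # Access-level statement rather than a license.
            open_access = True
        else:
            licenses.append({"label": label, "uri": uri})
    return {"open_access": open_access, "licenses": licenses}


result = extract_rights(SAMPLE)
print(result)
```

The actual parser in `adsingestp/parsers/datacite.py` may structure its output differently; the sketch only shows which XML elements carry the information.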

@@ -471,6 +471,35 @@ class BaseBeautifulSoupParser(IngestBase):
out of the input XML stream.
"""

fix_ampersand = re.compile(r"(&amp;)(.*?)(;)")

Member

I don't know if you have ever encountered this in this context: I have seen cases where ampersands got encoded as __amp__amp;

Member Author

Do you remember which publisher(s) specifically? I'm looking for an example to make a unit test with.

Member

This file has an example: /proj/ads/references/sources/MNRAS/0423/iss4.wiley2.xml
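
To make the thread above concrete, here is a minimal sketch of how a pattern like `fix_ampersand` could be applied, and why the `__amp__amp;` form would slip past it. The regex is copied from the diff hunk; the helper `repair_double_escaped` and both sample strings are assumptions for illustration, not the parser's actual code or data from the referenced file.

```python
import re

# Pattern copied from the diff hunk under review.
fix_ampersand = re.compile(r"(&amp;)(.*?)(;)")


def repair_double_escaped(text):
    """Hypothetical helper: rewrite '&amp;NAME;' as '&NAME;'."""
    # Group 2 holds whatever sits between '&amp;' and the next ';'.
    return fix_ampersand.sub(r"&\2;", text)


# Double-escaped entities are collapsed to single-escaped ones.
sample = "T &amp;gt; 10 K and &amp;alpha; = 2"
repaired = repair_double_escaped(sample)
print(repaired)

# The form mentioned in the review contains no literal '&amp;',
# so the pattern never matches and the text passes through unchanged.
odd = "Smith __amp__amp; Jones"
untouched = repair_double_escaped(odd)
print(untouched)
```

Note that the lazy `.*?` stops at the first `;` after `&amp;`, so a genuine ampersand followed later by an unrelated semicolon (for example `A &amp; B; C`) would also be rewritten. A unit test built from a real publisher example would show whether that matters in practice.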

 	modified:   adsingestp/parsers/base.py
 	new file:   tests/test_base.py
 	modified:   tests/test_base.py
@seasidesparrow seasidesparrow merged commit 9d806ea into adsabs:main Nov 7, 2023
4 checks passed
@seasidesparrow seasidesparrow deleted the dc_zenodo_tests.20231105 branch November 7, 2023 14:28
3 participants