-
-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Minor fixes to work with new Zenodo backend #192
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for investigating the strange new old world of the Zenodo API!
I got the deposition deletion test + the metadata test assertion working with this diff:
diff --git a/src/pudl_archiver/depositors/zenodo.py b/src/pudl_archiver/depositors/zenodo.py
index 17afc72..1159ff9 100644
--- a/src/pudl_archiver/depositors/zenodo.py
+++ b/src/pudl_archiver/depositors/zenodo.py
@@ -16,25 +16,22 @@ logger = logging.getLogger(f"catalystcoop.{__name__}")
class ZenodoClientException(Exception):
"""Captures the JSON error information from Zenodo."""
- def __init__(self, kwargs):
+ def __init__(self, status, message, errors=None):
"""Constructor.
Args:
kwargs: dictionary with "response" mapping to the actual
aiohttp.ClientResponse and "json" mapping to the JSON content.
"""
- self.kwargs = kwargs
- self.status = kwargs["response"].status
- self.message = kwargs["json"].get("message", {})
- self.errors = kwargs["json"].get("errors", {})
+ self.status = status
+ self.message = message
+ self.errors = errors
def __str__(self):
- """The JSON has all we really care about."""
- return f"ZenodoClientException({self.kwargs['json']})"
+ return repr(self)
def __repr__(self):
- """But the kwargs are useful for recreating this object."""
- return f"ZenodoClientException({repr(self.kwargs)})"
+ return f"ZenodoClientException(status={self.status}, message={self.message}, errors={self.errors})"
class ZenodoDepositor:
@@ -94,9 +91,19 @@ class ZenodoDepositor:
async def run_request():
response = await session.request(method, url, **kwargs)
if response.status >= 400:
- raise ZenodoClientException(
- {"response": response, "json": await response.json()}
- )
+ if response.headers["Content-Type"] == "application/json":
+ json_resp = await response.json()
+ raise ZenodoClientException(
+ status=response.status,
+ message=json_resp.get("message"),
+ errors=json_resp.get("errors"),
+ )
+ else:
+ message = await response.text()
+ raise ZenodoClientException(
+ status=response.status,
+ message=message,
+ )
if parse_json:
return await response.json()
return response
diff --git a/tests/integration/zenodo_depositor_test.py b/tests/integration/zenodo_depositor_test.py
index 89f4e52..8af762b 100644
--- a/tests/integration/zenodo_depositor_test.py
+++ b/tests/integration/zenodo_depositor_test.py
@@ -46,13 +46,13 @@ async def empty_deposition(depositor):
],
description="Test dataset for the sandbox, thanks!",
version="1.0.0",
- license="CC0-1.0",
+ license="cc-zero",
keywords=["test"],
)
deposition = await depositor.create_deposition(deposition_metadata)
- # assert clean_metadata(deposition.metadata) == clean_metadata(deposition_metadata)
+ assert clean_metadata(deposition.metadata) == clean_metadata(deposition_metadata)
assert deposition.state == "unsubmitted"
return deposition
@@ -97,15 +97,15 @@ async def test_publish_empty(depositor, empty_deposition, mocker):
mocker.patch("asyncio.sleep", mocker.AsyncMock())
with pytest.raises(ZenodoClientException) as excinfo:
await depositor.publish_deposition(empty_deposition)
- error_json = excinfo.value.kwargs["json"]
- assert "validation error" in error_json["message"].lower()
- assert "missing uploaded files" in error_json["errors"][0]["messages"][0].lower()
+ assert "validation error" in excinfo.value.message.lower()
+ assert "missing uploaded files" in excinfo.value.errors[0]["messages"][0].lower()
@pytest.mark.asyncio()
-async def test_delete_deposition(depositor, initial_deposition):
+async def test_delete_deposition(depositor, initial_deposition, mocker):
"""Make a new draft, delete it, and see that the conceptdoi still points
at the original."""
+ mocker.patch("asyncio.sleep", mocker.AsyncMock())
draft = await depositor.get_new_version(initial_deposition)
latest = await get_latest(
@@ -113,13 +113,13 @@ async def test_delete_deposition(depositor, initial_deposition):
)
assert latest.id_ == draft.id_
assert not latest.submitted
+
+ # Currently, Zenodo server will delete the deposition, but return an error;
+ # on retry it will throw a 404 since the deposition is already deleted
with pytest.raises(ZenodoClientException) as excinfo:
await depositor.delete_deposition(draft)
-
- # Reconfigure to meet server flakiness on deletions.
if excinfo:
- error_json = excinfo.value.kwargs["json"]
- assert "persistent identifier does not exist" in error_json["message"].lower()
+ assert "persistent identifier does not exist" in excinfo.value.message.lower()
else:
latest = await get_latest(
depositor, initial_deposition.conceptdoi, published_only=True
Note that something funny happened with the most recent EIA-861 archive, resulting in "Draft" URLs showing up in the {
"profile": "data-resource",
"name": "eia861-1990.zip",
"path": "https://zenodo.org/api/records/10093091/draft/files/eia861-1990.zip/content",
"remote_url": "https://zenodo.org/api/records/10093091/draft/files/eia861-1990.zip/content",
"title": "eia861-1990.zip",
"parts": {
"year": 1990
},
"encoding": "utf-8",
"mediatype": "application/zip",
"format": ".zip",
"bytes": 1195700,
"hash": "47660a1b8df008ae0e94998fb71c1cde"
}, |
How was that archive created? We should probably just use the "canonical" path logic from the New API Branch when creating |
The 861 archive was created from this branch to unblock work on new data integration - clearly more has changed on the backend than I'd caught, though! I can update the link referenced to in this PR before it gets merged and create a new production archive. |
It seems like maybe the draft path shouldn't still be a problem (though using the simple canonical path still seems like a good idea) given that the return cls(
name=file.filename,
path=file.links.download,
remote_url=file.links.download,
title=filename.name,
mediatype=mt,
parts=parts,
bytes=file.filesize,
hash=file.checksum,
format=filename.suffix,
) I attempted creating a new In the process I noticed that after running it with Then running the archiver a 3rd time, it didn't have that issue any more. So it seems like the 2nd run cleaned up the issue whatever it was, but couldn't actually continue to create a draft archive. So it seems like maybe
Re-running after the above, I get a nominally successful new draft archive created, but then it identifies that there have been no changes relative to the previous version, and attempts to delete the archive, but runs into errors in that process... despite apparently successfully deleting the draft archive.
Because it never gets to the point of creating a new archive I can't see what paths it producing in the |
Looks like the "resources": [
{
"profile": "data-resource",
"name": "eia860m-2015-07.xlsx",
"path": "https://zenodo.org/api/records/10086240/draft/files/eia860m-2015-07.xlsx/content",
"remote_url": "https://zenodo.org/api/records/10086240/draft/files/eia860m-2015-07.xlsx/content",
"title": "eia860m-2015-07.xlsx",
"parts": {
"year_month": "2015-07"
},
"encoding": "utf-8",
"mediatype": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
"format": ".xlsx",
"bytes": 2730620,
"hash": "f37c121bd05397e41f4f19c0149a7cd4"
}, |
The web interface also seems to be struggling with deletion of draft records like this one |
I just pushed up a change that:
The dry-run stuff that's going in is similar to some stuff I was seeing when running tests last week / 2 weeks ago:
This "delete succeeds in reality, but times out" thing was happening on sandbox and eventually stopped happening. But until then, I suppose we could include something in the delete logic of "If we get a 404 on deletion, in particular, we treat that as "deletion was successful!" |
Unfortunately we can't just create this URL with f'{base_url}/records/{record_id}/files/{filename}' because that would require merging Deposition scope and FileLinks scopes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all looks good to me!
The Zenodo backend migration has resulted in changed behaviors of the API, even without the full API migration noted in #184. These changes are accommodated here:
get_new_version
endpoint no longer returns theversion
consistently, so we get it from the earlierGET
response metadata.delete_deposition
method has a new behavior that errors our existing tests, and behavior has been changed to catch errors in response to deletionget_new_version
canonical
method for file links that removes anyapi
,draft
orcontent
references to get the most stable version of a file's link