Minor fixes to work with new Zenodo backend #192

Merged: 8 commits into main from version-fix, Nov 21, 2023

@e-belfer e-belfer commented Nov 8, 2023

The Zenodo backend migration has changed the API's behavior, even without the full API migration noted in #184. This PR accommodates those changes:

  • The get_new_version endpoint no longer returns the version consistently, so we take it from the metadata of the earlier GET response.
  • The delete_deposition call now behaves differently on the server, which breaks our existing tests; the method now catches errors returned in response to deletion.
  • A few tests are updated to reflect the changed error messages and behaviors around deletion and get_new_version.
  • A new canonical method for file links removes any api, draft, or content references to get the most stable version of a file's link.
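
As a rough illustration of the last bullet, canonicalizing a file link could look like the sketch below. The function name `canonical_file_url` and the exact stripping rules are assumptions made for illustration, not the code in this PR; the input is the draft link reported further down in this conversation.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_file_url(url: str) -> str:
    """Strip "api", "draft", and a trailing "content" segment from a file URL.

    Hypothetical helper: the rules here are inferred from the PR description.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    # Drop a leading "api" segment.
    if segments and segments[0] == "api":
        segments = segments[1:]
    # Drop a trailing "content" segment.
    if segments and segments[-1] == "content":
        segments = segments[:-1]
    # Drop a "draft" segment that directly follows the record id.
    if len(segments) > 2 and segments[0] == "records" and segments[2] == "draft":
        del segments[2]
    return urlunsplit((parts.scheme, parts.netloc, "/" + "/".join(segments), "", ""))


print(
    canonical_file_url(
        "https://zenodo.org/api/records/10093091/draft/files/eia861-1990.zip/content"
    )
)
# prints https://zenodo.org/records/10093091/files/eia861-1990.zip
```

A file literally named `content` would lose its last path segment under this rule, so a real implementation would want a stricter check.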

@e-belfer e-belfer requested a review from jdangerx November 8, 2023 21:42
@e-belfer e-belfer self-assigned this Nov 8, 2023

@jdangerx jdangerx left a comment

Thanks for investigating the strange new old world of the Zenodo API!

I got the deposition deletion test + the metadata test assertion working with this diff:

diff --git a/src/pudl_archiver/depositors/zenodo.py b/src/pudl_archiver/depositors/zenodo.py
index 17afc72..1159ff9 100644
--- a/src/pudl_archiver/depositors/zenodo.py
+++ b/src/pudl_archiver/depositors/zenodo.py
@@ -16,25 +16,22 @@ logger = logging.getLogger(f"catalystcoop.{__name__}")
 class ZenodoClientException(Exception):
     """Captures the JSON error information from Zenodo."""
 
-    def __init__(self, kwargs):
+    def __init__(self, status, message, errors=None):
         """Constructor.
 
         Args:
             kwargs: dictionary with "response" mapping to the actual
                 aiohttp.ClientResponse and "json" mapping to the JSON content.
         """
-        self.kwargs = kwargs
-        self.status = kwargs["response"].status
-        self.message = kwargs["json"].get("message", {})
-        self.errors = kwargs["json"].get("errors", {})
+        self.status = status
+        self.message = message
+        self.errors = errors
 
     def __str__(self):
-        """The JSON has all we really care about."""
-        return f"ZenodoClientException({self.kwargs['json']})"
+        return repr(self)
 
     def __repr__(self):
-        """But the kwargs are useful for recreating this object."""
-        return f"ZenodoClientException({repr(self.kwargs)})"
+        return f"ZenodoClientException(status={self.status}, message={self.message}, errors={self.errors})"
 
 
 class ZenodoDepositor:
@@ -94,9 +91,19 @@ class ZenodoDepositor:
             async def run_request():
                 response = await session.request(method, url, **kwargs)
                 if response.status >= 400:
-                    raise ZenodoClientException(
-                        {"response": response, "json": await response.json()}
-                    )
+                    if response.headers["Content-Type"] == "application/json":
+                        json_resp = await response.json()
+                        raise ZenodoClientException(
+                            status=response.status,
+                            message=json_resp.get("message"),
+                            errors=json_resp.get("errors"),
+                        )
+                    else:
+                        message = await response.text()
+                        raise ZenodoClientException(
+                            status=response.status,
+                            message=message,
+                        )
                 if parse_json:
                     return await response.json()
                 return response
diff --git a/tests/integration/zenodo_depositor_test.py b/tests/integration/zenodo_depositor_test.py
index 89f4e52..8af762b 100644
--- a/tests/integration/zenodo_depositor_test.py
+++ b/tests/integration/zenodo_depositor_test.py
@@ -46,13 +46,13 @@ async def empty_deposition(depositor):
         ],
         description="Test dataset for the sandbox, thanks!",
         version="1.0.0",
-        license="CC0-1.0",
+        license="cc-zero",
         keywords=["test"],
     )
 
     deposition = await depositor.create_deposition(deposition_metadata)
 
-    # assert clean_metadata(deposition.metadata) == clean_metadata(deposition_metadata)
+    assert clean_metadata(deposition.metadata) == clean_metadata(deposition_metadata)
     assert deposition.state == "unsubmitted"
     return deposition
 
@@ -97,15 +97,15 @@ async def test_publish_empty(depositor, empty_deposition, mocker):
     mocker.patch("asyncio.sleep", mocker.AsyncMock())
     with pytest.raises(ZenodoClientException) as excinfo:
         await depositor.publish_deposition(empty_deposition)
-    error_json = excinfo.value.kwargs["json"]
-    assert "validation error" in error_json["message"].lower()
-    assert "missing uploaded files" in error_json["errors"][0]["messages"][0].lower()
+    assert "validation error" in excinfo.value.message.lower()
+    assert "missing uploaded files" in excinfo.value.errors[0]["messages"][0].lower()
 
 
 @pytest.mark.asyncio()
-async def test_delete_deposition(depositor, initial_deposition):
+async def test_delete_deposition(depositor, initial_deposition, mocker):
     """Make a new draft, delete it, and see that the conceptdoi still points
     at the original."""
+    mocker.patch("asyncio.sleep", mocker.AsyncMock())
     draft = await depositor.get_new_version(initial_deposition)
 
     latest = await get_latest(
@@ -113,13 +113,13 @@ async def test_delete_deposition(depositor, initial_deposition):
     )
     assert latest.id_ == draft.id_
     assert not latest.submitted
+
+    # Currently, Zenodo server will delete the deposition, but return an error;
+    # on retry it will throw a 404 since the deposition is already deleted
     with pytest.raises(ZenodoClientException) as excinfo:
         await depositor.delete_deposition(draft)
-
-    # Reconfigure to meet server flakiness on deletions.
     if excinfo:
-        error_json = excinfo.value.kwargs["json"]
-        assert "persistent identifier does not exist" in error_json["message"].lower()
+        assert "persistent identifier does not exist" in excinfo.value.message.lower()
     else:
         latest = await get_latest(
             depositor, initial_deposition.conceptdoi, published_only=True
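
The error handling in the diff above picks between a JSON body and a plain-text body. A standalone sketch of that branching is below. `ApiError` and `error_from_response` are hypothetical stand-ins, not names from this repository. Unlike the diff, which compares the header for exact equality, the sketch compares only the media type, on the unconfirmed assumption that the server might append parameters such as a charset.

```python
import json


class ApiError(Exception):
    """Hypothetical stand-in for the exception class in the diff."""

    def __init__(self, status, message, errors=None):
        super().__init__(message)
        self.status = status
        self.message = message
        self.errors = errors

    def __repr__(self):
        return (
            f"ApiError(status={self.status}, "
            f"message={self.message}, errors={self.errors})"
        )


def error_from_response(status: int, content_type: str, body: str) -> ApiError:
    """Build an ApiError from a JSON body when possible, else from raw text."""
    # Compare only the media type so "application/json; charset=utf-8" matches.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json":
        try:
            payload = json.loads(body)
        except json.JSONDecodeError:
            payload = None
        if isinstance(payload, dict):
            return ApiError(status, payload.get("message"), payload.get("errors"))
    # Non-JSON (or unparseable) bodies, such as an HTML 504 page, are kept as text.
    return ApiError(status, body)
```

With this shape, an HTML gateway-timeout page ends up in `message` with `errors` left as `None`, which matches the `ZenodoClientError(status=504, ...)` lines in the logs further down.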

@zaneselvans

Note that something funny happened with the most recent EIA-861 archive: "Draft" URLs are showing up in the datapackage.json, which results in 403: Forbidden errors.

        {
            "profile": "data-resource",
            "name": "eia861-1990.zip",
            "path": "https://zenodo.org/api/records/10093091/draft/files/eia861-1990.zip/content",
            "remote_url": "https://zenodo.org/api/records/10093091/draft/files/eia861-1990.zip/content",
            "title": "eia861-1990.zip",
            "parts": {
                "year": 1990
            },
            "encoding": "utf-8",
            "mediatype": "application/zip",
            "format": ".zip",
            "bytes": 1195700,
            "hash": "47660a1b8df008ae0e94998fb71c1cde"
        },

@jdangerx

How was that archive created? We should probably just use the "canonical" path logic from the New API Branch when creating datapackage.json, even when we're using the old API.

@e-belfer

The 861 archive was created from this branch to unblock work on new data integration. Clearly more has changed on the backend than I'd caught, though! I can update the link referenced in this PR before it gets merged and create a new production archive.

@zaneselvans

zaneselvans commented Nov 17, 2023

It seems like the draft path shouldn't still be a problem (though using the simple canonical path still seems like a good idea), given that path and remote_url now use exactly the same input in the datapackage construction. And if it is still an issue, that's kinda wild!

        return cls(
            name=file.filename,
            path=file.links.download,
            remote_url=file.links.download,
            title=filename.name,
            mediatype=mt,
            parts=parts,
            bytes=file.filesize,
            hash=file.checksum,
            format=filename.suffix,
        )

I tried creating a new eia861 archive with --dry-run to see whether the draft path was still a problem, and discovered that we hadn't fixed the CLI to produce a summary_file of type Path, so I fixed that.

In the process I noticed that, after running with --dry-run, the next attempt to run the archiver script failed with errors and timeouts, apparently because it needed to delete something left over on the server from the previous --dry-run. That was surprising.

Running the archiver a third time, the issue was gone. So it seems the second run cleaned up whatever the problem was but couldn't go on to create a draft archive, which suggests that --dry-run is leaving some junk behind that needs to be dealt with somehow.

pudl_archiver --datasets eia861 --summary-file eia861-summary.json
2023-11-17 17:42:31 [    INFO] catalystcoop.pudl_archiver.archivers.classes:102 Archiving eia861
2023-11-17 17:42:31 [    INFO] catalystcoop.pudl_archiver.depositors.zenodo:92 GET https://zenodo.org/api/deposit/depositions - Query depositions for 10.5281/zenodo.4127028
2023-11-17 17:42:34 [    INFO] catalystcoop.pudl_archiver.depositors.zenodo:92 GET https://zenodo.org/api/deposit/depositions/10152542 - Get deposition for 10152542
2023-11-17 17:42:35 [    INFO] catalystcoop.pudl_archiver.depositors.zenodo:92 DELETE https://zenodo.org/api/deposit/depositions/10152542 - Deleting deposition
2023-11-17 17:43:06 [    INFO] catalystcoop.pudl_archiver.utils:46 Error while executing <coroutine object ZenodoDepositor._make_requester.<locals>.requester.<locals>.run_request at 0x292e6d360> (try #1, retry in 2s): <class 'pudl_archiver.depositors.zenodo.ZenodoClientError'> - ZenodoClientError(status=504, message=<html><body><h1>504 Gateway Time-out</h1>

Re-running after the above, I get a nominally successful new draft archive. The archiver then finds no changes relative to the previous version and attempts to delete the draft, but runs into errors in that process, despite apparently deleting the draft successfully.

pudl_archiver --datasets eia861 --summary-file eia861-summary.json
2023-11-17 17:49:04 [    INFO] catalystcoop.pudl_archiver.archivers.classes:102 Archiving eia861
2023-11-17 17:49:04 [    INFO] catalystcoop.pudl_archiver.depositors.zenodo:92 GET https://zenodo.org/api/deposit/depositions - Query depositions for 10.5281/zenodo.4127028
2023-11-17 17:49:07 [    INFO] catalystcoop.pudl_archiver.depositors.zenodo:92 GET https://zenodo.org/api/deposit/depositions/10093091 - Get deposition for 10093091
2023-11-17 17:49:09 [    INFO] catalystcoop.pudl_archiver.depositors.zenodo:92 POST https://zenodo.org/api/deposit/depositions/10093091/actions/newversion - Creating new version
2023-11-17 17:49:14 [    INFO] catalystcoop.pudl_archiver.depositors.zenodo:92 PUT https://zenodo.org/api/deposit/depositions/10152579 - Updating version number from 8.0.0 (10152579) to 9.0.0
2023-11-17 17:49:19 [    INFO] catalystcoop.pudl_archiver.archivers.classes:358 Downloaded /var/folders/ps/6jyqvztj5fq_tvhwx5h59d7w0000z8/T/tmpud0an70w/eia861-1997.zip.
2023-11-17 17:49:20 [    INFO] catalystcoop.pudl_archiver.archivers.classes:358 Downloaded /var/folders/ps/6jyqvztj5fq_tvhwx5h59d7w0000z8/T/tmpud0an70w/eia861-2020.zip.
2023-11-17 17:49:22 [    INFO] catalystcoop.pudl_archiver.archivers.classes:358 Downloaded /var/folders/ps/6jyqvztj5fq_tvhwx5h59d7w0000z8/T/tmpud0an70w/eia861-2001.zip.
2023-11-17 17:49:22 [    INFO] catalystcoop.pudl_archiver.archivers.classes:358 Downloaded /var/folders/ps/6jyqvztj5fq_tvhwx5h59d7w0000z8/T/tmpud0an70w/eia861-1991.zip.
2023-11-17 17:49:22 [    INFO] catalystcoop.pudl_archiver.archivers.classes:358 Downloaded /var/folders/ps/6jyqvztj5fq_tvhwx5h59d7w0000z8/T/tmpud0an70w/eia861-1995.zip.
2023-11-17 17:49:25 [    INFO] catalystcoop.pudl_archiver.archivers.classes:358 Downloaded /var/folders/ps/6jyqvztj5fq_tvhwx5h59d7w0000z8/T/tmpud0an70w/eia861-2010.zip.
2023-11-17 17:49:25 [    INFO] catalystcoop.pudl_archiver.archivers.classes:358 Downloaded /var/folders/ps/6jyqvztj5fq_tvhwx5h59d7w0000z8/T/tmpud0an70w/eia861-2006.zip.
2023-11-17 17:49:25 [    INFO] catalystcoop.pudl_archiver.archivers.classes:358 Downloaded /var/folders/ps/6jyqvztj5fq_tvhwx5h59d7w0000z8/T/tmpud0an70w/eia861-2005.zip.
2023-11-17 17:49:25 [    INFO] catalystcoop.pudl_archiver.archivers.classes:358 Downloaded /var/folders/ps/6jyqvztj5fq_tvhwx5h59d7w0000z8/T/tmpud0an70w/eia861-2019.zip.
2023-11-17 17:49:25 [    INFO] catalystcoop.pudl_archiver.archivers.classes:358 Downloaded /var/folders/ps/6jyqvztj5fq_tvhwx5h59d7w0000z8/T/tmpud0an70w/eia861-1994.zip.
2023-11-17 17:49:25 [    INFO] catalystcoop.pudl_archiver.archivers.classes:358 Downloaded /var/folders/ps/6jyqvztj5fq_tvhwx5h59d7w0000z8/T/tmpud0an70w/eia861-2014.zip.
2023-11-17 17:49:26 [    INFO] catalystcoop.pudl_archiver.archivers.classes:358 Downloaded /var/folders/ps/6jyqvztj5fq_tvhwx5h59d7w0000z8/T/tmpud0an70w/eia861-2000.zip.
2023-11-17 17:49:26 [    INFO] catalystcoop.pudl_archiver.archivers.classes:358 Downloaded /var/folders/ps/6jyqvztj5fq_tvhwx5h59d7w0000z8/T/tmpud0an70w/eia861-2008.zip.
2023-11-17 17:49:27 [    INFO] catalystcoop.pudl_archiver.archivers.classes:358 Downloaded /var/folders/ps/6jyqvztj5fq_tvhwx5h59d7w0000z8/T/tmpud0an70w/eia861-2002.zip.
2023-11-17 17:49:28 [    INFO] catalystcoop.pudl_archiver.archivers.classes:358 Downloaded /var/folders/ps/6jyqvztj5fq_tvhwx5h59d7w0000z8/T/tmpud0an70w/eia861-2009.zip.
2023-11-17 17:49:28 [    INFO] catalystcoop.pudl_archiver.archivers.classes:358 Downloaded /var/folders/ps/6jyqvztj5fq_tvhwx5h59d7w0000z8/T/tmpud0an70w/eia861-2004.zip.
2023-11-17 17:49:29 [    INFO] catalystcoop.pudl_archiver.archivers.classes:358 Downloaded /var/folders/ps/6jyqvztj5fq_tvhwx5h59d7w0000z8/T/tmpud0an70w/eia861-1999.zip.
2023-11-17 17:49:29 [    INFO] catalystcoop.pudl_archiver.archivers.classes:358 Downloaded /var/folders/ps/6jyqvztj5fq_tvhwx5h59d7w0000z8/T/tmpud0an70w/eia861-1990.zip.
2023-11-17 17:49:30 [    INFO] catalystcoop.pudl_archiver.archivers.classes:358 Downloaded /var/folders/ps/6jyqvztj5fq_tvhwx5h59d7w0000z8/T/tmpud0an70w/eia861-2007.zip.
2023-11-17 17:49:30 [    INFO] catalystcoop.pudl_archiver.archivers.classes:358 Downloaded /var/folders/ps/6jyqvztj5fq_tvhwx5h59d7w0000z8/T/tmpud0an70w/eia861-2003.zip.
2023-11-17 17:49:31 [    INFO] catalystcoop.pudl_archiver.archivers.classes:358 Downloaded /var/folders/ps/6jyqvztj5fq_tvhwx5h59d7w0000z8/T/tmpud0an70w/eia861-2012.zip.
2023-11-17 17:49:32 [    INFO] catalystcoop.pudl_archiver.archivers.classes:358 Downloaded /var/folders/ps/6jyqvztj5fq_tvhwx5h59d7w0000z8/T/tmpud0an70w/eia861-2022.zip.
2023-11-17 17:49:32 [    INFO] catalystcoop.pudl_archiver.archivers.classes:358 Downloaded /var/folders/ps/6jyqvztj5fq_tvhwx5h59d7w0000z8/T/tmpud0an70w/eia861-2011.zip.
2023-11-17 17:49:32 [    INFO] catalystcoop.pudl_archiver.archivers.classes:358 Downloaded /var/folders/ps/6jyqvztj5fq_tvhwx5h59d7w0000z8/T/tmpud0an70w/eia861-2016.zip.
2023-11-17 17:49:32 [    INFO] catalystcoop.pudl_archiver.archivers.classes:358 Downloaded /var/folders/ps/6jyqvztj5fq_tvhwx5h59d7w0000z8/T/tmpud0an70w/eia861-2013.zip.
2023-11-17 17:49:33 [    INFO] catalystcoop.pudl_archiver.archivers.classes:358 Downloaded /var/folders/ps/6jyqvztj5fq_tvhwx5h59d7w0000z8/T/tmpud0an70w/eia861-1998.zip.
2023-11-17 17:49:35 [    INFO] catalystcoop.pudl_archiver.archivers.classes:358 Downloaded /var/folders/ps/6jyqvztj5fq_tvhwx5h59d7w0000z8/T/tmpud0an70w/eia861-1993.zip.
2023-11-17 17:49:36 [    INFO] catalystcoop.pudl_archiver.archivers.classes:358 Downloaded /var/folders/ps/6jyqvztj5fq_tvhwx5h59d7w0000z8/T/tmpud0an70w/eia861-1992.zip.
2023-11-17 17:49:37 [    INFO] catalystcoop.pudl_archiver.archivers.classes:358 Downloaded /var/folders/ps/6jyqvztj5fq_tvhwx5h59d7w0000z8/T/tmpud0an70w/eia861-1996.zip.
2023-11-17 17:49:38 [    INFO] catalystcoop.pudl_archiver.archivers.classes:358 Downloaded /var/folders/ps/6jyqvztj5fq_tvhwx5h59d7w0000z8/T/tmpud0an70w/eia861-2018.zip.
2023-11-17 17:49:49 [    INFO] catalystcoop.pudl_archiver.archivers.classes:358 Downloaded /var/folders/ps/6jyqvztj5fq_tvhwx5h59d7w0000z8/T/tmpud0an70w/eia861-2021.zip.
2023-11-17 17:50:02 [    INFO] catalystcoop.pudl_archiver.archivers.classes:358 Downloaded /var/folders/ps/6jyqvztj5fq_tvhwx5h59d7w0000z8/T/tmpud0an70w/eia861-2017.zip.
2023-11-17 17:50:06 [    INFO] catalystcoop.pudl_archiver.archivers.classes:358 Downloaded /var/folders/ps/6jyqvztj5fq_tvhwx5h59d7w0000z8/T/tmpud0an70w/eia861-2015.zip.
2023-11-17 17:50:06 [    INFO] catalystcoop.pudl_archiver.depositors.zenodo:92 GET https://zenodo.org/api/deposit/depositions/10152579 - Get deposition for 10152579
2023-11-17 17:50:08 [    INFO] catalystcoop.pudl_archiver.orchestrator:266 No changes detected.
2023-11-17 17:50:08 [    INFO] catalystcoop.pudl_archiver.depositors.zenodo:92 DELETE https://zenodo.org/api/deposit/depositions/10152579 - Deleting deposition
2023-11-17 17:50:39 [    INFO] catalystcoop.pudl_archiver.utils:46 Error while executing <coroutine object ZenodoDepositor._make_requester.<locals>.requester.<locals>.run_request at 0x13a55be20> (try #1, retry in 2s): <class 'pudl_archiver.depositors.zenodo.ZenodoClientError'> - ZenodoClientError(status=504, message=<html><body><h1>504 Gateway Time-out</h1>
The server didn't respond in time.
</body></html>
, errors=None)
2023-11-17 17:51:12 [    INFO] catalystcoop.pudl_archiver.utils:46 Error while executing <coroutine object ZenodoDepositor._make_requester.<locals>.requester.<locals>.run_request at 0x135d815a0> (try #2, retry in 4s): <class 'pudl_archiver.depositors.zenodo.ZenodoClientError'> - ZenodoClientError(status=504, message=<html><body><h1>504 Gateway Time-out</h1>
The server didn't respond in time.
</body></html>
, errors=None)
2023-11-17 17:51:22 [    INFO] catalystcoop.pudl_archiver.utils:46 Error while executing <coroutine object ZenodoDepositor._make_requester.<locals>.requester.<locals>.run_request at 0x13a55be20> (try #3, retry in 8s): <class 'pudl_archiver.depositors.zenodo.ZenodoClientError'> - ZenodoClientError(status=500, message=The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application., errors=None)
2023-11-17 17:51:31 [    INFO] catalystcoop.pudl_archiver.utils:46 Error while executing <coroutine object ZenodoDepositor._make_requester.<locals>.requester.<locals>.run_request at 0x135d815a0> (try #4, retry in 16s): <class 'pudl_archiver.depositors.zenodo.ZenodoClientError'> - ZenodoClientError(status=404, message=The persistent identifier does not exist., errors=None)
2023-11-17 17:51:48 [    INFO] catalystcoop.pudl_archiver.utils:46 Error while executing <coroutine object ZenodoDepositor._make_requester.<locals>.requester.<locals>.run_request at 0x13a55be20> (try #5, retry in 32s): <class 'pudl_archiver.depositors.zenodo.ZenodoClientError'> - ZenodoClientError(status=404, message=The persistent identifier does not exist., errors=None)
2023-11-17 17:52:21 [    INFO] catalystcoop.pudl_archiver.utils:46 Error while executing <coroutine object ZenodoDepositor._make_requester.<locals>.requester.<locals>.run_request at 0x135d815a0> (try #6, retry in 64s): <class 'pudl_archiver.depositors.zenodo.ZenodoClientError'> - ZenodoClientError(status=404, message=The persistent identifier does not exist., errors=None)
Encountered exceptions, showing traceback for last one: ["('eia861', ZenodoClientError(status=404, message=The persistent identifier does not exist., errors=None))"]
Traceback (most recent call last):
  File "/Users/zane/miniforge3/envs/pudl-archiver/bin/pudl_archiver", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/zane/code/catalyst/pudl-archiver/src/pudl_archiver/cli.py", line 81, in main
    asyncio.run(archive_datasets(**vars(args)))
  File "/Users/zane/miniforge3/envs/pudl-archiver/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-archiver/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-archiver/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/Users/zane/code/catalyst/pudl-archiver/src/pudl_archiver/__init__.py", line 106, in archive_datasets
    raise exceptions[-1][1]
  File "/Users/zane/code/catalyst/pudl-archiver/src/pudl_archiver/orchestrator.py", line 267, in run
    await self.depositor.delete_deposition(self.new_deposition)
  File "/Users/zane/code/catalyst/pudl-archiver/src/pudl_archiver/depositors/zenodo.py", line 322, in delete_deposition
    await self.request(
  File "/Users/zane/code/catalyst/pudl-archiver/src/pudl_archiver/depositors/zenodo.py", line 113, in requester
    response = await retry_async(
               ^^^^^^^^^^^^^^^^^^
  File "/Users/zane/code/catalyst/pudl-archiver/src/pudl_archiver/utils.py", line 44, in retry_async
    raise e
  File "/Users/zane/code/catalyst/pudl-archiver/src/pudl_archiver/utils.py", line 41, in retry_async
    return await coro
           ^^^^^^^^^^
  File "/Users/zane/code/catalyst/pudl-archiver/src/pudl_archiver/depositors/zenodo.py", line 99, in run_request
    raise ZenodoClientError(
pudl_archiver.depositors.zenodo.ZenodoClientError: ZenodoClientError(status=404, message=The persistent identifier does not exist., errors=None)

Because it never gets to the point of creating a new archive, I can't see what paths it is producing in the datapackage.json.

@zaneselvans

zaneselvans commented Nov 18, 2023

Looks like the draft issue also appears in the most recent EIA860M, but here draft appears in both URLs!

    "resources": [
        {
            "profile": "data-resource",
            "name": "eia860m-2015-07.xlsx",
            "path": "https://zenodo.org/api/records/10086240/draft/files/eia860m-2015-07.xlsx/content",
            "remote_url": "https://zenodo.org/api/records/10086240/draft/files/eia860m-2015-07.xlsx/content",
            "title": "eia860m-2015-07.xlsx",
            "parts": {
                "year_month": "2015-07"
            },
            "encoding": "utf-8",
            "mediatype": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
            "format": ".xlsx",
            "bytes": 2730620,
            "hash": "f37c121bd05397e41f4f19c0149a7cd4"
        },

@zaneselvans

The web interface also seems to be struggling with deletion of draft records like this one

@jdangerx

jdangerx commented Nov 20, 2023

I just pushed up a change that:

  • gets tests to pass! some of the weird behavior we were working around has now gone away!
  • uses the canonical URL, albeit in a sort of brittle way - going to try to refactor that a little bit more.

The dry-run stuff that's going on is similar to what I was seeing when running tests a week or two ago:

  1. first dry run makes a draft, populates it
  2. second dry run deletes the existing draft, but for some reason Zenodo times out instead of responding with a 200. Then, on retry, we get 404s since the draft is now gone. When we finish retrying, second dry run errors out
  3. third dry run sees no draft, doesn't have to delete it

This "delete succeeds in reality, but times out" thing was happening on sandbox and eventually stopped happening. Until it stops here too, I suppose we could add something to the delete logic along the lines of "if we get a 404 on deletion, in particular, treat that as a successful deletion."
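
A minimal sketch of that idea, for discussion only: `ClientError` and `delete_tolerating_404` are hypothetical names, `delete_once` stands in for whatever coroutine function issues the DELETE, and the backoff between retries that the real requester applies is left out.

```python
import asyncio


class ClientError(Exception):
    """Hypothetical stand-in for the client exception used in this PR."""

    def __init__(self, status, message):
        super().__init__(message)
        self.status = status
        self.message = message


async def delete_tolerating_404(delete_once, max_tries=4):
    """Retry a delete, treating a 404 as "already deleted" rather than a failure."""
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    last_error = None
    for _ in range(max_tries):
        try:
            await delete_once()
            return
        except ClientError as err:
            if err.status == 404:
                # The draft is gone, presumably removed by an earlier attempt
                # whose response timed out; count that as success.
                return
            last_error = err
    # Every attempt failed with something other than a 404.
    raise last_error


async def _demo():
    # Simulate the sequence described above: a timeout, then a 404 on retry.
    statuses = iter([504, 404])

    async def fake_delete():
        raise ClientError(next(statuses), "simulated response")

    await delete_tolerating_404(fake_delete)


if __name__ == "__main__":
    asyncio.run(_demo())
```

One trade-off: a 404 on the very first attempt is also treated as success, which would hide a delete aimed at the wrong deposition id. Only accepting a 404 after at least one failed attempt would be a stricter variant.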

@jdangerx jdangerx dismissed their stale review November 20, 2023 21:56

changes have been made!

Unfortunately we can't just create this URL with f'{base_url}/records/{record_id}/files/{filename}' because that would require merging Deposition scope and FileLinks scopes.
@jdangerx jdangerx marked this pull request as ready for review November 20, 2023 23:12

@zschira zschira left a comment

This all looks good to me!

@jdangerx jdangerx merged commit 4a2b307 into main Nov 21, 2023
2 checks passed
@jdangerx jdangerx deleted the version-fix branch November 21, 2023 22:04