Script to sync a local directory up to a Zenodo DOI #3127

Merged
merged 8 commits into from
Dec 14, 2023

Conversation

zaneselvans
Member

@zaneselvans zaneselvans commented Dec 6, 2023

PR Overview

Just struggling with the Zenodo API for the moment. Can easily do a manual upload of the files as well.

PR Checklist

  • Merge the most recent version of the branch you are merging into (probably dev).
  • All CI checks are passing. Run tests locally to debug failures
  • Make sure you've included good docstrings.
  • For major data coverage & analysis changes, run data validation tests
  • Include unit tests for new functions and classes.
  • Defensive data quality/sanity checks in analyses & data processing functions.
  • Update the release notes and reference the PR and related issues.
  • Do your own explanatory review of the PR to help the reviewer understand what's going on and identify issues preemptively.

@zaneselvans zaneselvans added zenodo Issues having to do with Zenodo data archiving and retrieval. release Tasks directly related to data and software releases. labels Dec 6, 2023
@zaneselvans zaneselvans self-assigned this Dec 6, 2023
@jdangerx jdangerx force-pushed the zenodo-data-autorelease branch from 89a1fbc to 8eda506 Compare December 12, 2023 19:09
@jdangerx jdangerx changed the title WIP: very drafty Zenodo data release script. Script to sync a local directory up to a Zenodo DOI Dec 12, 2023
@jdangerx jdangerx force-pushed the zenodo-data-autorelease branch from 2aa371c to e6348b7 Compare December 12, 2023 19:51
@jdangerx jdangerx marked this pull request as ready for review December 12, 2023 19:51
@jdangerx
Member

So you should be able to use this script to sync a local directory up to a Zenodo DOI - and it will leave a draft for you to manually approve, or just YOLO it and publish (with the --publish flag).

I think this functionality is useful enough to merge into dev, and the next PR will have to handle:

  • getting Zenodo creds into nightly build runner - either through GH secrets or Google Secret Manager
  • only publishing when v20xx tag is pushed

I wonder if that is enough funkiness to warrant moving gcp_pudl_etl.sh into a Python script...
I also wonder how much effort it would be to use Google Batch for this instead of a VM, since we already have this all Dockerized.

Member Author

@zaneselvans zaneselvans left a comment

Mostly questions.

devtools/zenodo/zenodo_data_release.py (3 review threads, outdated and resolved)

codecov bot commented Dec 12, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (6293199) 92.7% compared to head (8e88c26) 92.7%.
Report is 6 commits behind head on dev.

Additional details and impacted files
@@          Coverage Diff          @@
##             dev   #3127   +/-   ##
=====================================
  Coverage   92.7%   92.7%           
=====================================
  Files        134     134           
  Lines      12597   12597           
=====================================
+ Hits       11672   11673    +1     
+ Misses       925     924    -1     

@jdangerx jdangerx force-pushed the zenodo-data-autorelease branch from 2bc73e4 to b8b4fc6 Compare December 12, 2023 21:43
@zaneselvans
Member Author

zaneselvans commented Dec 13, 2023

Notes on attempted use:

  • We should display the default values for all the command line args.
  • It's a little unusual to have required arguments also require a --flag to identify them. The --source-dir feels like an argument not an option to me. Not sure about the --record-id. It kinda feels like the source directory is the "from" and the record ID is the "to", but "from" a directory "to" an integer is kinda weird.
  • We should log each filename as it's being uploaded.
  • I tried making an update to this sandbox archive using the nightly build outputs from last night with the following command:
./zenodo_data_release.py --env sandbox --concept-rec-id 5563 --no-publish --source-dir ../../../no-backups/nightly

and got the following error:

INFO:root:Getting new version for 5563
INFO:root:Draft 6640 has 22 existing files, deleting...
INFO:root:Syncing files from LocalSource(path=PosixPath('../../../no-backups/nightly')) to draft 6640...
Traceback (most recent call last):
  File "/Users/zane/code/catalyst/pudl/devtools/zenodo/./zenodo_data_release.py", line 343, in <module>
    pudl_zenodo_data_release()
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/code/catalyst/pudl/devtools/zenodo/./zenodo_data_release.py", line 331, in pudl_zenodo_data_release
    .sync_directory(source_data)
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/code/catalyst/pudl/devtools/zenodo/./zenodo_data_release.py", line 236, in sync_directory
    self.zenodo_client.create_deposition_file(self.record_id, name, blob)
  File "/Users/zane/code/catalyst/pudl/devtools/zenodo/./zenodo_data_release.py", line 150, in create_deposition_file
    response = requests.post(
               ^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/site-packages/requests/api.py", line 115, in post
    return request("post", url, data=data, json=json, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/site-packages/requests/sessions.py", line 575, in request
    prep = self.prepare_request(req)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/site-packages/requests/sessions.py", line 486, in prepare_request
    p.prepare(
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/site-packages/requests/models.py", line 371, in prepare
    self.prepare_body(data, files, json)
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/site-packages/requests/models.py", line 556, in prepare_body
    (body, content_type) = self._encode_files(files, data)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/site-packages/requests/models.py", line 191, in _encode_files
    fdata = fp.read()
            ^^^^^^^^^
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa2 in position 82: invalid start byte
  • Logging in, I looked at the drafts and found 6640 still there, unpublished. Two small JSON files had successfully uploaded. No metadata was filled in. So maybe Requests doesn't like getting binary data here? Maybe we need to do f.open("rb")? Will that work for both the UTF-8 and binary files?
  • Attempting to run the update again without altering the draft at all, it successfully finds the draft with 2 files and removes them.
  • Logging the filenames, I can see that the next file it's attempting to upload is the big EPA CEMS Parquet file, and it fails with the same error as above again.
  • I tried a 3rd time, but now using the actual record ID, rather than the ID associated with the concept DOI, and it all worked the same, so it seems like we don't need to restrict the ID being used to be the concept record ID and could use --record-id instead, which feels more legible to me, and it's easy to pull the ID out of the URL by hand, whereas finding the concept record ID in the sidebar of the page can be a little obscure. For the automated uploads we'll want to actually use the concept record ID so we never have to update it. But actually do we even want users to be able to specify any random record ID? There's only one lineage of data releases on Zenodo and we probably want to keep it that way, and making it easy to accidentally create a new version of any of our archives and put the wrong data there seems dangerous. Maybe we should just store the sandbox/production data release record IDs in the script itself.
  • When we call Path.iterdir() to get the list of files to upload, we should check that they're actually files, and not subdirectories. Or skip subdirectories. Right now it crashes if it finds a directory.
  • I switched to using f.open("rb") and tried uploading again and got a different error:
./zenodo_data_release.py --env sandbox --no-publish ../../../no-backups/nightly
INFO:root:Getting new version for 5563
INFO:root:Draft 6640 has 2 existing files, deleting...
INFO:root:Syncing files from LocalSource(path=PosixPath('../../../no-backups/nightly')) to draft 6640...
INFO:root:Uploading ferc60_xbrl_taxonomy_metadata.json to new deposition...
INFO:root:Uploading ferc6_xbrl_taxonomy_metadata.json to new deposition...
INFO:root:Uploading hourly_emissions_epacems.parquet to new deposition...
Traceback (most recent call last):
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/site-packages/urllib3/connectionpool.py", line 715, in urlopen
    httplib_response = self._make_request(
                       ^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/site-packages/urllib3/connectionpool.py", line 416, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/site-packages/urllib3/connection.py", line 244, in request
    super(HTTPConnection, self).request(method, url, body=body, headers=headers)
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/http/client.py", line 1286, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/http/client.py", line 1332, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/http/client.py", line 1281, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/http/client.py", line 1080, in _send_output
    self.send(chunk)
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/http/client.py", line 1002, in send
    self.sock.sendall(data)
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/ssl.py", line 1274, in sendall
    v = self.send(byte_view[count:])
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/ssl.py", line 1243, in send
    return self._sslobj.write(data)
           ^^^^^^^^^^^^^^^^^^^^^^^^
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:2427)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/site-packages/urllib3/connectionpool.py", line 799, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='sandbox.zenodo.org', port=443): Max retries exceeded with url: /api/deposit/depositions/6640/files (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2427)')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/zane/code/catalyst/pudl/devtools/zenodo/./zenodo_data_release.py", line 349, in <module>
    pudl_zenodo_data_release()
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/code/catalyst/pudl/devtools/zenodo/./zenodo_data_release.py", line 337, in pudl_zenodo_data_release
    .sync_directory(source_data)
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/code/catalyst/pudl/devtools/zenodo/./zenodo_data_release.py", line 237, in sync_directory
    self.zenodo_client.create_deposition_file(self.record_id, name, blob)
  File "/Users/zane/code/catalyst/pudl/devtools/zenodo/./zenodo_data_release.py", line 151, in create_deposition_file
    response = requests.post(
               ^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/site-packages/requests/api.py", line 115, in post
    return request("post", url, data=data, json=json, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/site-packages/requests/adapters.py", line 517, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='sandbox.zenodo.org', port=443): Max retries exceeded with url: /api/deposit/depositions/6640/files (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2427)')))
  • I temporarily removed the big Parquet file and larger SQLite DBs from the source directory and tried again. This time it successfully uploaded one of the smaller SQLite DBs (MD5 sum matches local file) before crashing when encountering a subdirectory. So it wasn't the binary Parquet data that was causing an issue.
  • Inspecting the uploaded JSON files, they look none the worse for being treated as binary data.
  • Removing all the larger files and the subdirectory, the smaller files (up to 14MB) upload successfully and it gives me a link to the draft to review. Publication date has been updated, Title, Resource Type, and Creators seem to have come across from the previous version.
  • Licensing information failed to come across from the previous version which is CC-BY-4.0. The new archive is listed as public domain CC-0 which is incorrect.
  • Trying another upload after adding back in all of the files <100MB in size. Pleasantly, it seems unfazed by my reloading the draft upload on the web while it's uploading the files, so I can see them as they come in. It took 6m41s but all of the medium-sized files uploaded just fine.
  • Adding back in all but the Parquet file, it chokes on pudl.sqlite.gz, which is ~2.8GB, with the same kind of error as the Parquet file. Is there a filesize limit with these SSL objects?
INFO:root:Uploading pudl.sqlite.gz to new deposition...
Traceback (most recent call last):
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/site-packages/urllib3/connectionpool.py", line 715, in urlopen
    httplib_response = self._make_request(
                       ^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/site-packages/urllib3/connectionpool.py", line 416, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/site-packages/urllib3/connection.py", line 244, in request
    super(HTTPConnection, self).request(method, url, body=body, headers=headers)
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/http/client.py", line 1286, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/http/client.py", line 1332, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/http/client.py", line 1281, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/http/client.py", line 1080, in _send_output
    self.send(chunk)
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/http/client.py", line 1002, in send
    self.sock.sendall(data)
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/ssl.py", line 1274, in sendall
    v = self.send(byte_view[count:])
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/ssl.py", line 1243, in send
    return self._sslobj.write(data)
           ^^^^^^^^^^^^^^^^^^^^^^^^
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:2427)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/site-packages/urllib3/connectionpool.py", line 799, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='sandbox.zenodo.org', port=443): Max retries exceeded with url: /api/deposit/depositions/6640/files (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2427)')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/zane/code/catalyst/pudl/devtools/zenodo/./zenodo_data_release.py", line 349, in <module>
    pudl_zenodo_data_release()
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/code/catalyst/pudl/devtools/zenodo/./zenodo_data_release.py", line 337, in pudl_zenodo_data_release
    .sync_directory(source_data)
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/code/catalyst/pudl/devtools/zenodo/./zenodo_data_release.py", line 237, in sync_directory
    self.zenodo_client.create_deposition_file(self.record_id, name, blob)
  File "/Users/zane/code/catalyst/pudl/devtools/zenodo/./zenodo_data_release.py", line 151, in create_deposition_file
    response = requests.post(
               ^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/site-packages/requests/api.py", line 115, in post
    return request("post", url, data=data, json=json, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.11/site-packages/requests/adapters.py", line 517, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='sandbox.zenodo.org', port=443): Max retries exceeded with url: /api/deposit/depositions/6640/files (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2427)')))
  • Oh, wait. The old-old Zenodo API has a 100MB filesize limit, and to upload larger files you have to use the bucket_url. I tested with one of the smaller files that's still bigger than 100MB and yes, it does fail.
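
For reference, the bucket upload described in the last bullet can be sketched roughly as below. This is a minimal illustration, not the code in zenodo_data_release.py: the helper name `upload_via_bucket` and the injected `put` callable are hypothetical, chosen so the function can be exercised without network access. It assumes Zenodo's documented flow, where the deposition JSON exposes a bucket URL under `links.bucket` and each file is sent with a PUT to `<bucket_url>/<filename>`, with the body streamed from a file opened in binary mode.

```python
from pathlib import Path
from typing import Any, Callable
from urllib.parse import quote


def upload_via_bucket(
    put: Callable[..., Any],  # hypothetical injection point, e.g. requests.put
    bucket_url: str,          # assumed to come from deposition["links"]["bucket"]
    path: Path,
    token: str,
) -> Any:
    """Send one local file to a Zenodo bucket with a single PUT request."""
    # The target URL is the bucket URL plus the (URL-quoted) file name.
    url = f"{bucket_url.rstrip('/')}/{quote(path.name)}"
    # Binary mode avoids the UnicodeDecodeError seen with text mode. Passing the
    # open handle (rather than its contents) lets a client such as requests
    # stream the body instead of loading the whole file into memory.
    with path.open("rb") as blob:
        return put(url, data=blob, params={"access_token": token})
```

With the requests library installed, a call might look like `upload_via_bucket(requests.put, deposition["links"]["bucket"], Path("pudl.sqlite.gz"), token)`, where `deposition` and `token` are placeholders for values obtained elsewhere.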

@zaneselvans zaneselvans marked this pull request as draft December 13, 2023 06:22
@jdangerx
Member

jdangerx commented Dec 13, 2023

Thanks for the digging, I would have definitely gotten confused by the bucket API thing! Here's my TODO list for fixes:

TODO

  • use f.open("rb")
  • log the license we saw on the most recent record, or just force CC-BY-4.0 if I can't figure this out.
  • Use bucket API! Ugh.
  • Clean up command line args
  • add logs per filename
  • make better log format :))
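
Three of the items above (binary reads, per-file logs, a nicer log format) plus the subdirectory crash noted earlier fit in one small sketch. It is an illustration under assumptions, not the script's actual code: `files_to_upload`, `sync_files`, the logger name, the log format string, and the `upload(name, blob)` callable are all hypothetical.

```python
import logging
from pathlib import Path
from typing import BinaryIO, Callable

# One possible log format: timestamp, level, logger name, message.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("zenodo_sync_sketch")  # hypothetical logger name


def files_to_upload(source_dir: Path) -> list[Path]:
    """Return regular files directly inside source_dir, skipping subdirectories."""
    return sorted(p for p in source_dir.iterdir() if p.is_file())


def sync_files(source_dir: Path, upload: Callable[[str, BinaryIO], None]) -> list[str]:
    """Call upload(name, blob) for each regular file, logging each name first."""
    uploaded = []
    for path in files_to_upload(source_dir):
        logger.info("Uploading %s to draft...", path.name)
        # "rb" works for JSON, SQLite and Parquet alike: bytes are sent as-is.
        with path.open("rb") as blob:
            upload(path.name, blob)
        uploaded.append(path.name)
    return uploaded
```

Skipping subdirectories silently is one choice; raising an error or logging a warning for each skipped directory would be equally reasonable if a stray directory should be treated as a mistake.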

@jdangerx jdangerx force-pushed the zenodo-data-autorelease branch from d394f80 to 7c4f107 Compare December 13, 2023 17:14
@jdangerx
Member

I do generally prefer "keyword" vs. "positional" arguments because they are much harder to misuse - so keeping that in for now.

The Whole Dang Nightly Builds Output also made it to Sandbox! With the following command:

./devtools/zenodo/zenodo_data_release.py --rec-id 5563 --source-dir ~/scratch/pudl_2023.12.01/

It took about 2 hours, 15 minutes.
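
To illustrate the keyword-style interface discussed here, below is a small sketch using the standard library's argparse. The real script is built on click (as the tracebacks above show), so this is a stand-in and not its implementation. The option names follow the flags mentioned in this thread; the default values, choices, and help strings are assumptions.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser where every input is a named option, not a positional."""
    parser = argparse.ArgumentParser(
        description="Sync a local directory up to a Zenodo record (sketch).",
        # Adds each option's default value to its help message.
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument(
        "--env",
        choices=["sandbox", "production"],
        default="sandbox",  # assumed default, chosen to be the safe one
        help="Zenodo environment to target.",
    )
    parser.add_argument(
        "--source-dir",
        type=Path,
        required=True,
        help="Directory whose files get uploaded.",
    )
    parser.add_argument(
        "--record-id",
        type=int,
        required=True,
        help="ID of the Zenodo record to create a new version of.",
    )
    # BooleanOptionalAction (Python 3.9+) generates both --publish and --no-publish.
    parser.add_argument(
        "--publish",
        action=argparse.BooleanOptionalAction,
        default=False,  # assumed: leave a draft for manual review unless asked
        help="Publish the draft instead of leaving it for manual review.",
    )
    return parser
```

Because both required inputs are named, swapping their order on the command line cannot silently send the record ID where the directory was expected.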

@jdangerx jdangerx marked this pull request as ready for review December 13, 2023 19:38
Member Author

@zaneselvans zaneselvans left a comment

I suggested some clarifications on the help messages, and would like to know what you think about embedding the target record IDs in the script, at least as defaults.

devtools/zenodo/zenodo_data_release.py (7 review threads, outdated and resolved)
@zaneselvans
Member Author

@jdangerx jdangerx enabled auto-merge (squash) December 13, 2023 21:23
Member

@jdangerx jdangerx left a comment

Weirdly, this became a PR where I wrote the code and Zane was reviewing. But he made a ✅ so I'm making the official GH PR-approval.

@zaneselvans
Member Author

@jdangerx Will the auto-merge go through if the branch is out of date with the base branch? I feel like several PRs of mine have gotten stuck lately because of this.

@jdangerx
Member

jdangerx commented Dec 13, 2023

I think it will work if and only if the PR is still "auto-mergeable" / there are no conflicts. We'll see!

@jdangerx
Member

Looks like the answer, from another PR, is "no." So I'm gonna click that 'update branch' button.

@jdangerx jdangerx force-pushed the zenodo-data-autorelease branch from 87e1a8d to 8e88c26 Compare December 13, 2023 22:05
@zaneselvans zaneselvans disabled auto-merge December 14, 2023 01:07
@zaneselvans zaneselvans merged commit d96694d into dev Dec 14, 2023
16 checks passed
@zaneselvans zaneselvans deleted the zenodo-data-autorelease branch December 14, 2023 01:08