Add duplicate identifier check of harvest source #5039

Open
1 task
rshewitt opened this issue Jan 14, 2025 · 2 comments
Labels
H2.0/Harvest-Runner Harvest Source Processing for Harvesting 2.0

Comments

@rshewitt
Contributor

User Story

To detect erroneous records in the harvest runner process, datagov wants to add a check in external_records_to_id_hash that throws an error when duplicate identifiers are found.

Acceptance Criteria

[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]

  • GIVEN a harvest source with duplicate record identifiers
    WHEN an identifier is found to already exist in HarvestSource.external_records
    THEN an error is thrown
    [AND optionally another verifiable outcome]
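
A minimal sketch of the check described by these criteria, written as a standalone stand-in rather than the harvester's actual method. The record shape (a dict with an "identifier" key), the SHA-256 hashing of the serialized record, and the DuplicateIdentifierError class are all assumptions made for illustration; the real external_records_to_id_hash may differ on each point.

```python
import hashlib
import json


class DuplicateIdentifierError(Exception):
    """Hypothetical exception; the real harvester may use a different class."""


def external_records_to_id_hash(records):
    """Map each record's identifier to a hash of its content.

    Stand-in for the harvester method of the same name. Raises
    DuplicateIdentifierError on the first identifier seen twice.
    """
    id_hash = {}
    for record in records:
        identifier = record["identifier"]  # assumed key name
        if identifier in id_hash:
            raise DuplicateIdentifierError(
                f"duplicate identifier in harvest source: {identifier!r}"
            )
        # Assumed hashing scheme: SHA-256 over a key-sorted JSON dump.
        serialized = json.dumps(record, sort_keys=True).encode("utf-8")
        id_hash[identifier] = hashlib.sha256(serialized).hexdigest()
    return id_hash
```

Raising on the first repeat stops the whole source from being processed, which matches the "THEN an error is thrown" criterion as written.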

Background

[Any helpful contextual notes or links to artifacts/evidence, if needed]

Security Considerations (required)

[Any security concerns that might be implicated in the change. "None" is OK, just be explicit here!]

Sketch

[Notes or a checklist reflecting our understanding of the selected approach]

rshewitt added the H2.0/Harvest-Runner Harvest Source Processing for Harvesting 2.0 label Jan 14, 2025
@FuhuXia
Member

FuhuXia commented Jan 14, 2025

The error should be at the record level. The first record with a given identifier should be processed; any other records with the same identifier should have a record-level error raised.
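
One way to read this suggestion, sketched under assumptions: keep the first record for each identifier and collect an error entry for every later repeat instead of aborting. The function name split_duplicates, the "identifier" key, and the shape of the error entries are hypothetical and not taken from the harvester code.

```python
def split_duplicates(records):
    """Keep the first record per identifier; report later repeats.

    Returns (unique, errors), where unique maps identifier -> first record
    and errors lists one entry per repeated record, in source order.
    """
    unique = {}
    errors = []
    for position, record in enumerate(records):
        identifier = record["identifier"]  # assumed key name
        if identifier in unique:
            # Hypothetical error shape; position is the index in the source.
            errors.append(
                {
                    "position": position,
                    "identifier": identifier,
                    "message": "duplicate identifier, record skipped",
                }
            )
        else:
            unique[identifier] = record
    return unique, errors
```

The first occurrence continues through the normal pipeline, and the errors list carries enough context to be attached to whatever record-level error reporting exists.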

@rshewitt
Contributor Author

This occurs before we write the compare to the database, so there's no record we can associate the error with.
