Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Export additional columns from requests collection #52

Open
wants to merge 11 commits into
base: develop
Choose a base branch
from

Conversation

DJensen94
Copy link

🗣 Description

In order to keep cyhy data in sync with the mini data lake we need to pull additional fields. This PR adds additional fields to the projection from the requests table.

💭 Motivation and context

This will allow the mini data lake to keep in sync with the cyhy database, and it will also allow the ASM Visibility Dashboard to use these additional fields when generating dashboards and creating reports

🧪 Testing

Verified all added projections are fields in the requests collection

✅ Pre-approval checklist

  • This PR has an informative and human-readable title.
  • Changes are limited to a single goal - eschew scope creep!
  • All future TODOs are captured in issues, which are referenced
    in code comments.
  • All relevant type-of-change labels have been added.
  • I have read the CONTRIBUTING document.
  • These code changes follow cisagov code standards.
  • All relevant repo and/or project documentation has been updated
    to reflect the changes in this PR.
  • Tests have been added and/or modified to cover the changes in this PR.
  • All new and existing tests pass.

✅ Pre-merge checklist

  • Revert dependencies to default branches.
  • Finalize version.

✅ Post-merge checklist

  • Create a release.

DJensen94 added 2 commits May 6, 2024 13:08
Update projection for the requests query to pull additional fields
remove blank spaces
run linter to format the changes
@DJensen94
Copy link
Author

@dav3r Here is the PR with the updated projection

Copy link
Member

@mcdonnnj mcdonnnj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe with this pull request the projection becomes everything per the schema. If that is the case we should consider using an empty projection.

Since we are pulling all columns from the collection we can just pass an empty projection
@DJensen94
Copy link
Author

I believe with this pull request the projection becomes everything per the schema. If that is the case we should consider using an empty projection.

I agree that makes more sense, just wasn't sure if we wanted to be explicit in stating all fields, but this commit removes the projections: e1f73a9

aws_jobs/cyhy-data-extract.py Outdated Show resolved Hide resolved
aws_jobs/cyhy-data-extract.py Outdated Show resolved Hide resolved
@@ -390,6 +390,8 @@ def main():
"agency.location": True,
"agency.name": True,
"agency.type": True,
"agency.contacts": True,
"key": True,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The key is sensitive data, which is why it was never exported in the past. Please provide confirmation via email that the CyHy data owner and the owner of the receiving system (the Analytics Environment) understand and accept the risks of sharing this data.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With e1f73a9, the key field is no longer explicitly mentioned, but the concern above still stands.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reached out to get that approval

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DJensen94 Do you have any updates on this?

@jsf9k jsf9k added the improvement This issue or pull request will add new or improve existing functionality label May 6, 2024
Copy link
Member

@jsf9k jsf9k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm requesting that a comment be added.

aws_jobs/cyhy-data-extract.py Outdated Show resolved Hide resolved
DJensen94 and others added 4 commits May 6, 2024 15:32
Add comment to note that the empty projection exports all fields, which can contain sensitive data

Co-authored-by: Shane Frasier <[email protected]>
update projection to exclude data that wasn't necessary in the mini data lake
Alphabetize projection fields
reformat spacing via the linter
@jsf9k
Copy link
Member

jsf9k commented May 7, 2024

@DJensen94 - You needn't run the linter manually. See the instructions for setting up pre-commit locally.

DJensen94 added 2 commits May 13, 2024 12:06
extract the snapshots to give historical counts to the data lake
fix spacing in dictionary
@dav3r
Copy link
Member

dav3r commented May 14, 2024

@DJensen94 Please update the PR description to clearly specify which new fields and collections are being added in this PR, as well as the reason(s) why they are now required. Thanks!

we are removing snapshots pull and putting it into its own PR
@dav3r
Copy link
Member

dav3r commented May 14, 2024

Regarding 9b4879e, why are you putting the snapshots change in a separate PR?

@DJensen94
Copy link
Author

Regarding 9b4879e, why are you putting the snapshots change in a separate PR?

I was asked to move this to a separate pull request since we are still awaiting approval for the key and contacts and the snapshots pull is more time sensitive.

@dav3r
Copy link
Member

dav3r commented Oct 7, 2024

@DJensen94 @climber-girl @jessiebeals - Is this PR still needed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement This issue or pull request will add new or improve existing functionality
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

4 participants