Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Efficient Repository Updates instead of Refetching #103

Merged
merged 9 commits into from
Sep 28, 2024

Conversation

GODrums
Copy link
Collaborator

@GODrums GODrums commented Sep 24, 2024

Motivation

With our CRON-job, we want to make updating the repository and all its elements like PullRequests, Reviews and Comments as efficient as possible. This entails only fetching updated elements and updating entities instead of having to recreate them.

Description

This PR modifies the Github sync algorithm to allow for efficient refetching and updates:

  • The sync algorithm now uses a cutOffDate to determine how many PRs have to be refetched
  • The sync algorithm now checks for existing entities before creating new ones
  • Fields in entities can now be updated with the update-function of their respective converters
  • Relationships in entities now propagate changes via REFRESH instead of MERGE

Remark
The GitHubDataSyncService may get overly complex at some point. I would appreciate any suggestions on how to improve its structure or separate the logic into multiple classes,

Checklist

General

  • PR title is clear and descriptive
  • PR description explains the purpose and changes
  • Code follows project coding standards
  • Self-review of the code has been done
  • Changes have been tested locally
  • Screenshots have been attached (if applicable)
  • Documentation has been updated (if applicable)

Server

  • Code is performant and follows best practices
  • No security vulnerabilities introduced
  • Proper error handling has been implemented
  • Added tests for new functionality
  • Changes have been tested in different environments (if applicable)

@GODrums GODrums added enhancement New feature or request application-server labels Sep 24, 2024
@GODrums GODrums self-assigned this Sep 24, 2024
@github-actions github-actions bot added bug Something isn't working size:L This PR changes 100-499 lines, ignoring generated files. labels Sep 24, 2024
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can simplify the sync service a bit. coplit suggested the following:
To simplify the GitHubDataSyncService class, we can break down its responsibilities into smaller, more manageable methods and classes. This will make the code more modular and easier to maintain. Here are some steps to achieve this:
Extract GitHub Initialization: Move the GitHub client initialization to a separate method.
Extract Repository Synchronization: Move the repository synchronization logic to a separate method.
Extract Pull Request Handling: Move the pull request handling logic to a separate method.
Extract Review Handling: Move the review handling logic to a separate method.
Extract Comment Handling: Move the comment handling logic to a separate method.

Copy link
Collaborator Author

@GODrums GODrums Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really liked the suggestion for the Github client init (done in ce3b04a). The other ones are already kinda separated in the suggested manner.
My main point of improvement would be finding a cleaner way of implementing the "get entity from Cache/DB/Create New"-methods. Ideally, this should be externalized using some abstract generalization (for example in the entities) or directly moved to a dedicated service. I wasn't able to find an appropriate solution yet though.

@FelixTJDietrich
Copy link
Collaborator

Code LGTM, thank you!

@FelixTJDietrich FelixTJDietrich merged commit bff82a1 into develop Sep 28, 2024
3 checks passed
@FelixTJDietrich FelixTJDietrich deleted the fix/cron-data-refreshes branch September 28, 2024 18:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
application-server bug Something isn't working enhancement New feature or request size:L This PR changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants