
feat: new ta tasks #976

Merged
merged 3 commits into from
Jan 15, 2025

joseph-sentry
Contributor

This PR creates new test analytics (TA) processor and finisher tasks and uses them in the upload task behind a feature flag, for a smooth rollout.
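A minimal sketch of what flag-gated dispatch can look like, for illustration only. The unit test for this PR patches `tasks.upload.NEW_TA_TASKS.check_value`, so the real flag does expose a `check_value` method, but the `RolloutFlag` class, its allowlist, the `check_value(identifier, default)` signature and the task names returned here are assumptions, not the worker's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class RolloutFlag:
    """Hypothetical stand-in for a rollout flag such as NEW_TA_TASKS."""

    enabled_ids: set = field(default_factory=set)

    def check_value(self, identifier, default=False):
        # Assumed semantics: opted-in identifiers get True, all others the default.
        return True if identifier in self.enabled_ids else default


# Hypothetical: only repo 42 is opted in to the new pipeline.
NEW_TA_TASKS = RolloutFlag(enabled_ids={42})


def choose_ta_pipeline(repoid):
    """Return (processor, finisher) task names for a repo; names are illustrative."""
    if NEW_TA_TASKS.check_value(identifier=repoid, default=False):
        return ("ta_processor", "ta_finisher")
    return ("test_results_processor", "test_results_finisher")


print(choose_ta_pipeline(42))  # ('ta_processor', 'ta_finisher')
print(choose_ta_pipeline(7))  # ('test_results_processor', 'test_results_finisher')
```

Because the flag is evaluated per repository in this sketch, the old and new paths can run side by side while the rollout widens, and a test can force the new path simply by patching `check_value` to return `True`.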

joseph-sentry requested a review from a team December 19, 2024 21:39

sentry-io bot commented Dec 19, 2024

🔍 Existing Issues For Review

Your pull request is modifying functions with the following pre-existing issues:

📄 File: tasks/upload.py

Function Unhandled Issue
_schedule_test_results_processing_task [**TypeError: unsupported operand type(s) for



This PR includes changes to shared. Please review them here: https://github.com/codecov/shared/compare/2674ae99811767e63151590906691aed4c5ce1f9...


codecov-staging bot commented Dec 19, 2024

Codecov Report

Attention: Patch coverage is 94.89292% with 31 lines in your changes missing coverage. Please review.

✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| tasks/ta_finisher.py | 83.13% | 28 Missing ⚠️ |
| tasks/test_results_processor.py | 92.85% | 2 Missing ⚠️ |
| tasks/ta_processor.py | 98.64% | 1 Missing ⚠️ |


codecov bot commented Dec 19, 2024

Codecov Report

Attention: Patch coverage is 94.89292% with 31 lines in your changes missing coverage. Please review.

Project coverage is 97.74%. Comparing base (ac302e7) to head (86c8b7d).
Report is 2 commits behind head on main.

✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| tasks/ta_finisher.py | 83.13% | 28 Missing ⚠️ |
| tasks/test_results_processor.py | 92.85% | 2 Missing ⚠️ |
| tasks/ta_processor.py | 98.64% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #976      +/-   ##
==========================================
- Coverage   97.79%   97.74%   -0.05%     
==========================================
  Files         447      451       +4     
  Lines       36175    36653     +478     
==========================================
+ Hits        35376    35828     +452     
- Misses        799      825      +26     
| Flag | Coverage Δ |
|---|---|
| integration | 42.57% <67.05%> (+0.42%) ⬆️ |
| unit | 90.19% <57.82%> (-0.28%) ⬇️ |

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

⚠️ Impact Analysis from Codecov is deprecated and will be sunset on Jan 31, 2025.


codecov-qa bot commented Dec 19, 2024

❌ 1 test failed:

| Tests completed | Failed | Passed | Skipped |
|---|---|---|---|
| 1778 | 1 | 1777 | 4 |
View the top 1 failed tests by shortest run time
tasks/tests/unit/test_upload_task.py::TestUploadTaskIntegration::test_upload_task_call_new_ta_tasks
Stack Traces | 0.156s run time
self = <worker.tasks.tests.unit.test_upload_task.TestUploadTaskIntegration object at 0x7f41e4a81fd0>
mocker = <pytest_mock.plugin.MockFixture object at 0x7f41ddb33290>
mock_configuration = <shared.config.ConfigHelper object at 0x7f41dd5b9d30>
dbsession = <sqlalchemy.orm.session.Session object at 0x7f41dda43110>
codecov_vcr = <vcr.cassette.Cassette object at 0x7f41ddb30e90>
mock_storage = <shared.storage.memory.MemoryStorageService object at 0x7f41dda2ef30>
mock_redis = <worker.tasks.tests.unit.test_upload_task.FakeRedis object at 0x7f41dd2306b0>
celery_app = <Celery celery.tests at 0x7f41dd258cd0>

    def test_upload_task_call_new_ta_tasks(
        self,
        mocker,
        mock_configuration,
        dbsession,
        codecov_vcr,
        mock_storage,
        mock_redis,
        celery_app,
    ):
        chord = mocker.patch("tasks.upload.chord")
        _ = mocker.patch("tasks.upload.NEW_TA_TASKS.check_value", return_value=True)
        storage_path = ".../C3C4715CA57C910D11D5EB899FC86A7E/4c4e4654ac25037ae869caeb3619d485970b6304/a84d445c-9c1e-434f-8275-f18f1f320f81.txt"
        redis_queue = [{"url": storage_path, "build_code": "some_random_build"}]
        jsonified_redis_queue = [json.dumps(x) for x in redis_queue]
        mocker.patch.object(UploadTask, "app", celery_app)
    
        mock_repo_provider_service = AsyncMock()
        mock_repo_provider_service.get_commit.return_value = {
            "author": {
                "id": "123",
                "username": "456",
                "email": "789",
                "name": "101",
            },
            "message": "hello world",
            "parents": [],
            "timestamp": str(datetime.now()),
        }
        mock_repo_provider_service.get_ancestors_tree.return_value = {"parents": []}
        mock_repo_provider_service.get_pull_request.return_value = {
            "head": {"branch": "main"},
            "base": {},
        }
        mock_repo_provider_service.list_top_level_files.return_value = [
            {"name": "codecov.yml", "path": "codecov.yml"}
        ]
        mock_repo_provider_service.get_source.return_value = {
            "content": """
            codecov:
                max_report_age: 1y ago
            """
        }
    
        mocker.patch(
            "tasks.upload.get_repo_provider_service",
            return_value=mock_repo_provider_service,
        )
        mocker.patch("tasks.upload.hasattr", return_value=False)
        commit = CommitFactory.create(
            message="",
            commitid="abf6d4df662c47e32460020ab14abf9303581429",
            repository__owner__oauth_token="GHTZB+Mi+.../ubudnSKTJYb/fgN4hRJVJYSIErtidEsCLDJBb8DZzkbXqLujHAnv28aKShXddE/OffwRuwKug==",
            repository__owner__username="ThiagoCodecov",
            repository__owner__service="github",
            repository__yaml={"codecov": {"max_report_age": "1y ago"}},
            repository__name="example-python",
            pullid=1,
            # Setting the time to _before_ patch centric default YAMLs start date of 2024-04-30
            repository__owner__createstamp=datetime(2023, 1, 1, tzinfo=timezone.utc),
            branch="main",
        )
        dbsession.add(commit)
        dbsession.flush()
        dbsession.refresh(commit)
    
        mock_redis.lists[f"uploads/{commit.repoid}/{commit.commitid}/test_results"] = (
            jsonified_redis_queue
        )
    
>       UploadTask().run_impl(
            dbsession,
            commit.repoid,
            commit.commitid,
            report_type="test_results",
        )

.../tests/unit/test_upload_task.py:512: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tasks/upload.py:353: in run_impl
    return self.run_impl_within_lock(
.../local/lib/python3.13.../site-packages/sentry_sdk/tracing_utils.py:673: in func_with_tracing
    return func(*args, **kwargs)
tasks/upload.py:535: in run_impl_within_lock
    self._bulk_insert_coverage_measurements(measurements=measurements)
tasks/upload.py:570: in _bulk_insert_coverage_measurements
    bulk_insert_coverage_measurements(measurements=measurements)
.../local/lib/python3.13.../shared/upload/utils.py:47: in bulk_insert_coverage_measurements
    with transaction.atomic():
.../local/lib/python3.13.../django/db/transaction.py:198: in __enter__
    if not connection.get_autocommit():
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <DatabaseWrapper vendor='postgresql' alias='default'>

    def get_autocommit(self):
        """Get the autocommit state."""
>       self.ensure_connection()
E       RuntimeError: Database access not allowed, use the "django_db" mark, or the "db" or "transactional_db" fixtures to enable it.

.../local/lib/python3.13.../backends/base/base.py:464: RuntimeError


codecov-public-qa bot commented Dec 19, 2024

❌ 1 test failed:

| Tests completed | Failed | Passed | Skipped |
|---|---|---|---|
| 1778 | 1 | 1777 | 4 |
View the top 1 failed tests by shortest run time
tasks/tests/unit/test_upload_task.py::TestUploadTaskIntegration::test_upload_task_call_new_ta_tasks


github-actions bot commented Dec 19, 2024

✅ All tests successful. No failed tests were found.


tasks/upload.py (outdated)
Comment on lines 660 to 662
arguments_list=list(chunk),
)
for chunk in itertools.batched(argument_list, CHUNK_SIZE)
Contributor


Not sure if we still want to run these in batches, or rather one upload per task?

Contributor Author


One upload per task seems reasonable now that we aren't writing to the DB in the processor.
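To make the trade-off in this thread concrete, here is a small sketch of how the processor calls could be planned either way. `plan_processor_calls` and the shape of the per-upload arguments are hypothetical; the batched branch mirrors the `itertools.batched(argument_list, CHUNK_SIZE)` call quoted from the diff, with a fallback for Python versions before 3.12, where `itertools.batched` does not exist.

```python
try:
    from itertools import batched  # Python 3.12+
except ImportError:  # fallback for older interpreters
    from itertools import islice

    def batched(iterable, n):
        # Yield successive tuples of at most n items, like itertools.batched.
        it = iter(iterable)
        while chunk := tuple(islice(it, n)):
            yield chunk


def plan_processor_calls(argument_list, chunk_size=None):
    """Group per-upload arguments into one list per processor task (hypothetical).

    chunk_size=None models "one upload per task"; an integer models the
    batched scheme from the diff.
    """
    if chunk_size is None:
        return [[arguments] for arguments in argument_list]
    return [list(chunk) for chunk in batched(argument_list, chunk_size)]


uploads = [{"upload_id": i} for i in range(5)]  # hypothetical argument shape
print([len(c) for c in plan_processor_calls(uploads, chunk_size=2)])  # [2, 2, 1]
print(len(plan_processor_calls(uploads)))  # 5
```

One general consideration: with one upload per task, a failure or retry only touches a single upload, at the cost of scheduling more tasks.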


def test_test_analytics(dbsession, mocker, celery_app):
    url = "literally/whatever"
    storage_service = get_appropriate_storage_service(None)
Contributor


Maybe you want to use the mock storage provider for this?

Comment on lines 110 to 116
mocker.patch.object(TAProcessorTask, "app", celery_app)
mocker.patch.object(TAFinisherTask, "app", celery_app)

hello = celery_app.register_task(ProcessFlakesTask())
_ = celery_app.tasks[hello.name]
goodbye = celery_app.register_task(CacheTestRollupsTask())
_ = celery_app.tasks[goodbye.name]
Contributor


I have never seen this pattern. What does it do?

Contributor Author


Without this, when the finisher tried to call those tasks, they weren't registered in the mocked Celery app, so what I was trying to do here is add them to it.

I replaced this with some code that is hopefully clearer.

user-agent:
- Default
method: GET
uri: https://api.github.com/repos/ThiagoCodecov/example-python/commits/abf6d4df662c47e32460020ab14abf9303581429
Contributor


Can you instead mock whatever call makes this request, rather than relying on VCR?

Comment on lines 254 to 256
for upload in uploads:
    repo_flag_ids = get_repo_flag_ids(db_session, repoid, upload.flag_names)
    if upload.state == "processed":
Contributor


This loop has a couple of problems:

  • you are querying all uploads from the DB, but only ever run the code on processed ones
  • you only append to tests_to_write and friends, but never clear them between uploads
  • save_tests and friends run for every upload, which, combined with never clearing tests_to_write, means you insert the same tests over and over again, as many times as there are uploads
  • you unconditionally set state = "finished" for all the uploads, including ones that already have that state
  • the intermediate msgpack file is never cleared.
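A hedged sketch of a loop that avoids the problems listed above. The `Upload` dataclass and the `load_intermediate`, `save_tests` and `delete_intermediate` callables are hypothetical placeholders for the real model, the msgpack read, the DB writes and the storage cleanup; in the real task the state filter would also move into the DB query.

```python
from dataclasses import dataclass, field


@dataclass
class Upload:
    """Hypothetical, minimal stand-in for the upload model."""

    id: int
    state: str
    flag_names: list = field(default_factory=list)


def finish_uploads(uploads, load_intermediate, save_tests, delete_intermediate):
    # Only uploads the processor has handled; ideally filtered in the query.
    processed = [u for u in uploads if u.state == "processed"]

    # One buffer for the whole run, filled once per processed upload...
    tests_to_write = []
    for upload in processed:
        tests_to_write.extend(load_intermediate(upload))

    # ...and written exactly once, so rows are not re-inserted per upload.
    if tests_to_write:
        save_tests(tests_to_write)

    # Only transition uploads handled in this run, then drop their
    # intermediate (msgpack) data now that it has been persisted.
    for upload in processed:
        upload.state = "finished"
        delete_intermediate(upload)

    return [u.id for u in processed]


uploads = [Upload(1, "processed"), Upload(2, "finished"), Upload(3, "processed")]
saved = []
finish_uploads(uploads, lambda u: [f"test-{u.id}"], saved.append, lambda u: None)
print(saved)  # [['test-1', 'test-3']]
```

Saving once after the loop, rather than clearing the buffer per upload, is one of two equivalent fixes for the duplicate inserts; it also keeps the number of writes independent of the number of uploads.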

joseph-sentry
Contributor Author

@Swatinem Sorry, I got confused and rebased and force-pushed, but I really just added 5 new commits on top of the existing ones and didn't modify any of them.


Comment on lines +25 to 26
report__commit__repository__repoid=repo_id,
report__commit__commitid=commit_id,
Contributor


I believe commit_id already uniquely identifies the report, so no need for an additional repository join.

tasks/tests/unit/test_upload_task.py (resolved)
tasks/tests/unit/test_ta_processor_task.py (resolved)
services/ta_finishing.py (outdated, resolved)
services/ta_finishing.py (outdated, resolved)

github-actions bot commented Jan 9, 2025

This PR includes changes to shared. Please review them here: https://github.com/codecov/shared/compare/609e56d2aa30b26d44cddaba0e1ebd79ba954ac9...

@@ -1,6 +0,0 @@
import os
Contributor


I’m 👍🏻 on removing these if they won’t ever be used again, but it’s probably best to do that in a separate PR.


This commit essentially does 3 things:
- creates the new ta_processor and ta_finisher tasks
  - these differ from the old tasks in how they use upload states
  - they also use the ta_storage module to persist data to BigQuery (BQ)
- updates the version of the test results parser being used
  - we've gone from parsing individual JUnit XML files to parsing the
    entire raw upload at once
- creates the ta_storage module
  - the ta_storage module serves as an abstraction for persisting data
    to both Postgres (PG) and BQ
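A minimal sketch of the kind of abstraction described for `ta_storage`: callers write through one interface, and each backend decides how to persist. Every name here (`TAStorage`, `write_testruns`, `InMemoryStorage`, `FanOutStorage`) is hypothetical; the real module's interface may differ, and the in-memory stand-ins only collect rows instead of talking to Postgres or BigQuery.

```python
from typing import Protocol


class TAStorage(Protocol):
    """Hypothetical interface shared by all test-analytics storage backends."""

    def write_testruns(self, commit_sha: str, testruns: list) -> None: ...


class InMemoryStorage:
    """Stand-in for a PG- or BQ-backed writer; it just collects rows."""

    def __init__(self, name: str):
        self.name = name
        self.rows = []

    def write_testruns(self, commit_sha, testruns):
        self.rows.extend({"commit_sha": commit_sha, **run} for run in testruns)


class FanOutStorage:
    """Writes every batch to each configured backend in turn."""

    def __init__(self, *backends: TAStorage):
        self.backends = backends

    def write_testruns(self, commit_sha, testruns):
        for backend in self.backends:
            backend.write_testruns(commit_sha, testruns)


pg, bq = InMemoryStorage("pg"), InMemoryStorage("bq")
storage = FanOutStorage(pg, bq)
storage.write_testruns("abc123", [{"name": "test_a", "outcome": "pass"}])
print(len(pg.rows), len(bq.rows))  # 1 1
```

In this sketch the processor only depends on the interface, so turning BQ writes on or off is just a matter of which backends are passed in.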

This PR includes changes to shared. Please review them here: https://github.com/codecov/shared/compare/de4b37bc5a736317c6e7c93f9c58e9ae07f8c96b...

joseph-sentry added this pull request to the merge queue Jan 15, 2025
Merged via the queue into main with commit 2c7bd18 Jan 15, 2025
18 of 27 checks passed
joseph-sentry deleted the joseph/new-ta-tasks branch January 15, 2025 20:33