async release channel #1512
Draft: technillogue wants to merge 55 commits into main from async
Conversation
Signed-off-by: technillogue <[email protected]>
technillogue force-pushed the async branch 7 times, most recently from 03659b2 to f57474d on February 19, 2024 at 23:53.
technillogue force-pushed the async branch 2 times, most recently from d6cc65d to 1e8c300 on February 21, 2024 at 21:16.
* have runner return asyncio.Task instead of AsyncFuture
* make tests async and fix them
* delete remaining runner thread code :)
* review changes to tests and server (reverts commit 828eee9)

Signed-off-by: technillogue <[email protected]>
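The shape change described above can be sketched as follows: instead of a bespoke future type, the runner hands back a plain `asyncio.Task`, which callers can await, attach callbacks to, or cancel. The names below are illustrative, not cog's actual API.

```python
import asyncio

async def _run_prediction(payload: str) -> str:
    # stand-in for the real prediction work
    await asyncio.sleep(0)
    return payload.upper()

def predict(payload: str) -> "asyncio.Task[str]":
    # returning a Task means the caller can await it, add done
    # callbacks, or cancel it with the standard asyncio machinery
    return asyncio.get_running_loop().create_task(_run_prediction(payload))

async def main() -> str:
    task = predict("ok")
    return await task

print(asyncio.run(main()))
```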
this is the first step towards supporting continuous batching and concurrent predictions. eventually, you will be able to configure it so your predict function is called concurrently

* bare minimum to support async predict
* add async tests

Signed-off-by: technillogue <[email protected]>
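A minimal sketch of what an async predictor might look like under this change; the class shape and method names here are assumptions for illustration, not cog's exact interface.

```python
import asyncio

class Predictor:
    async def setup(self) -> None:
        # async setup shares the event loop with predict
        self.model = "loaded"

    async def predict(self, prompt: str) -> str:
        # awaiting here yields control, letting other predictions
        # run concurrently on the same loop
        await asyncio.sleep(0)
        return f"{self.model}: {prompt}"

async def main() -> str:
    p = Predictor()
    await p.setup()
    return await p.predict("hi")

print(asyncio.run(main()))
```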
* conditionally create the event loop if the predictor is async, and add a path for hypothetical async setup
* don't use an async loop for predict if predict is not async
* add test cases for a shared loop across setup and predict, plus asyncio.run in setup (reverts commit b533c6b)

Signed-off-by: technillogue <[email protected]>
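The conditional dispatch described above can be sketched with `inspect.iscoroutinefunction`: only spin up an event loop when the predictor is actually async. `run_predict` is a hypothetical helper name, not cog's.

```python
import asyncio
import inspect

def run_predict(predict, *args):
    # only create an event loop when the predictor is a coroutine function
    if inspect.iscoroutinefunction(predict):
        return asyncio.run(predict(*args))
    # sync predictors run directly, with no loop involved
    return predict(*args)

def sync_predict(x: int) -> int:
    return x + 1

async def async_predict(x: int) -> int:
    await asyncio.sleep(0)
    return x + 2

print(run_predict(sync_predict, 1), run_predict(async_predict, 1))
```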
Signed-off-by: technillogue <[email protected]>
* async Worker._wait and its consequences
* AsyncPipe so that we can process the idempotent endpoint and cancellation rather than _wait blocking the event loop
* test_prediction_cancel can be flaky on some machines
* separate _process_list to be less surprising than isasyncgen
* sleep wasn't needed
* suggestions from review
* suggestions from review
* even more suggestions from review

Signed-off-by: technillogue <[email protected]>
Co-authored-by: Nick Stenning <[email protected]>
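A hedged sketch of the AsyncPipe idea: wrap the blocking pipe receive in `run_in_executor` so waiting on events from the child process doesn't block the event loop, leaving it free to serve the idempotent endpoint and cancellation. cog's real implementation differs; the class here is illustrative.

```python
import asyncio
import multiprocessing

class AsyncPipe:
    """Async wrapper around one end of a multiprocessing.Pipe."""

    def __init__(self, conn):
        self.conn = conn

    async def recv(self):
        loop = asyncio.get_running_loop()
        # the blocking recv runs in the default thread pool, so the
        # event loop stays responsive while we wait for child events
        return await loop.run_in_executor(None, self.conn.recv)

async def main():
    parent, child = multiprocessing.Pipe()
    child.send({"status": "done"})
    pipe = AsyncPipe(parent)
    return await pipe.recv()

print(asyncio.run(main()))
```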
Signed-off-by: technillogue <[email protected]>
* race utility for racing awaitables
* start mux, tag events with id, read pipe in a task, get events from mux
* use async pipe for async child loop
* _shutting_down vs _terminating
* race with shutdown event
* keep reading events during shutdown, but call terminate after the last Done
* emit heartbeats from mux.read
* don't use _wait; instead, setup reads events from the mux too
* worker semaphore and prediction ctx
* where _wait used to raise a fatal error, have _read_events set an error on Mux, and then Mux.read can raise the error in the right context. otherwise, the exception is stuck in a task and doesn't propagate correctly
* fix event loop errors for <3.9
* keep track of predictions in flight explicitly and use that to route logs
* don't wait for executor shutdown
* progress: check for cancelation in task done_handler
* let mux check if the child is alive and set mux shutdown after leaving the read event loop
* close pipe when exiting
* predict requires IDLE or PROCESSING
* try adding a BUSY state distinct from PROCESSING when we no longer have capacity
* move resetting events to setup() instead of _read_events(). previously this was in _read_events because it's a coroutine that will have the correct event loop. however, _read_events actually gets created in a task, which can run *after* the first mux.read call by setup. since setup is now the first async entrypoint in worker and in tests, we can safely move it there
* state_from_predictions_in_flight instead of checking the value of the semaphore
* make prediction_ctx "private"

Signed-off-by: technillogue <[email protected]>
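A "race" helper of the kind described above can be sketched with `asyncio.wait` and `FIRST_COMPLETED`: return the result of whichever awaitable finishes first (for example, an event read vs. a shutdown event) and cancel the rest. This is an illustrative version, not cog's.

```python
import asyncio

async def race(*aws):
    # wrap each awaitable in a Task so asyncio.wait accepts them
    tasks = [asyncio.ensure_future(a) for a in aws]
    done, pending = await asyncio.wait(
        tasks, return_when=asyncio.FIRST_COMPLETED
    )
    # losers are cancelled so they don't linger in the background
    for task in pending:
        task.cancel()
    return done.pop().result()

async def main():
    async def slow():
        await asyncio.sleep(10)
        return "slow"

    async def fast():
        return "fast"

    return await race(slow(), fast())

print(asyncio.run(main()))
```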
* add concurrency to config
* this basically works!
* more descriptive names for predict functions
* maybe pass through prediction id and try to make cancelation do both?
* don't cancel from the signal handler if a loop is running; expose worker busy state to runner
* move handle_event_stream to PredictionEventHandler
* make setup and canceling work
* drop some checks around cancelation
* try out eager_predict_state_change
* keep track of multiple runner prediction tasks to make the idempotent endpoint return the same result and fix tests somewhat
* fix idempotent tests
* fix remaining errors?
* worker predict_generator shouldn't be eager
* wip: make the stuff that handles events and sends webhooks etc async
* drop Runner._result
* drop comments
* inline client code
* get started
* inline webhooks
* move clients into runner, switch to httpx, move create_event_handler into runner
* add some comments
* more notes
* rip out webhooks and most of files and put them in a new ClientManager that handles most of everything. inline upload_files for that
* move create_event_handler into PredictionEventHandler.__init__
* fix one test
* break out Path.validate into value_to_path and inline get_filename and File.validate
* split out URLPath into BackwardsCompatibleDataURLTempFilePath and URLThatCanBeConvertedToPath with the download part of URLFile inlined
* let's make DataURLTempFilePath also use convert and move value_to_path back to Path.validate
* use httpx for downloading input urls and follow redirects
* take get_filename back out for tests
* don't upload in http and delete cog/files.py
* drop should_cancel
* prediction->request
* split up predict/inner/prediction_ctx into enter_predict/exit_predict/prediction_ctx/inner_async_predict/predict/good_predict as one way to do it. however, exposing all of those for runner predict enter/coro exit still sucks, but this is still an improvement
* bigish change: inline predict_and_handle_errors
* inline make_error_handler into setup
* move runner.setup into runner.Runner.setup
* add concurrency to config in go
* try explicitly using prediction_ctx __enter__ and __exit__
* make runner setup more correct and marginally better
* fix a few tests
* notes
* wip ClientManager.convert
* relax setup argument requirement to str
* glom worker into runner
* add logging message
* fix prediction retry and improve logging
* split out handle_event
* use CURL_CA_BUNDLE for file upload
* clean up comments
* dubious upload fix
* small fixes
* attempt to add context logging?
* tweak names
* fix error for predictionOutputType(multi=False)
* improve comments
* fix lints
* skip worker and webhook tests since those were erroring on removed imports. fix or xfail runner tests
* upload in http instead of PredictionEventHandler. this makes tests pass and fixes some problems with validation, but also prevents streaming files and causes new problems. also xfail all the responses tests that need to be replaced with respx
* format
* fix some new-style type signatures and drop 3.8 support
* drop 3.7 in code

Signed-off-by: technillogue <[email protected]>
This reverts commit 335f67b. Signed-off-by: technillogue <[email protected]>
* input downloads, output uploads, and webhooks are now handled by ClientManager, which persists for the lifetime of the runner, allowing us to reuse connections, which may significantly help with large uploads.
* although I was originally going to drop output_file_prefix, it's not actually hard to maintain. the behavior is changed now, and objects are uploaded as soon as they're outputted rather than after the prediction is completed.
* there's an ugly hack with uploading an empty body to get the redirect instead of making the api time out from trying to upload a 140GB file. that can be fixed by implementing an MPU endpoint and/or a "fetch upload url" endpoint.
* the behavior of the non-idempotent endpoint is changed; the id is now randomly generated if it's not provided in the body. this isn't strictly required for this change alone, but is hard to carve out.
* the behavior of Path is changed significantly. see https://www.notion.so/replicate/Cog-Setup-Path-Problem-2fc41d40bcaf47579ccd8b2f4c71ee24

Signed-off-by: technillogue <[email protected]>
Co-authored-by: Mattt <[email protected]>
Signed-off-by: technillogue <[email protected]>
steps to carve up:
* wip
* some tweaks
* ignore some type errors
* test connection roundtrip
* add notes from python source code

Signed-off-by: technillogue <[email protected]>
Signed-off-by: technillogue <[email protected]>
* optimize webhook serialization and logging
* optimize logging by binding structlog proxies
* fix tests

Signed-off-by: technillogue <[email protected]>
Signed-off-by: technillogue <[email protected]>
* log traceback correctly
* use repr(exception) instead of str(exception) if str(exception) is blank

Signed-off-by: technillogue <[email protected]>
Signed-off-by: technillogue <[email protected]>
…by uvicorn and I hadn't noticed. This reverts commit 3ca8aec.
We have a problem in production where a broken model is not correctly shutting down when requested, which means that director comes back up, sees a healthy model (status READY/BUSY), and starts sending it new predictions, even though it's supposed to be shutting down. For now, try to improve the situation by poisoning the model healthcheck on shutdown. This doesn't solve the underlying problem, but it should stop us from losing more predictions to a known-broken pod.
This was due to conflicts in the dependencies.

=== Errors

```
Error: ../../../go/pkg/mod/github.com/anaskhan96/[email protected]/soup.go:20:2: missing go.sum entry for module providing package golang.org/x/net/html (imported by github.com/anaskhan96/soup); to add:
    go get github.com/anaskhan96/[email protected]
Error: ../../../go/pkg/mod/github.com/anaskhan96/[email protected]/soup.go:21:2: missing go.sum entry for module providing package golang.org/x/net/html/charset (imported by github.com/anaskhan96/soup); to add:
    go get github.com/anaskhan96/[email protected]
```

Running `go mod tidy` fixes the issue, and this commit contains the updated go.mod and go.sum files.
Based on the implementation in #1698 for sync cog. If the request to /predict contains the headers `traceparent` and `tracestate` defined by W3C Trace Context[^1], then these headers are forwarded on to the webhook and upload calls. This allows observability systems to link requests passing through cog.

[^1]: https://www.w3.org/TR/trace-context/

Signed-off-by: technillogue <[email protected]>
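The forwarding logic above amounts to copying two well-known headers from the incoming request onto outgoing webhook and upload requests. A minimal sketch (function name and header dicts are illustrative):

```python
# W3C Trace Context header names; HTTP header names are case-insensitive
TRACE_HEADERS = ("traceparent", "tracestate")

def extract_trace_context(request_headers: dict) -> dict:
    """Pick out only the trace-context headers from an incoming request."""
    return {
        k: v for k, v in request_headers.items() if k.lower() in TRACE_HEADERS
    }

incoming = {
    "traceparent": "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01",
    "tracestate": "vendor=value",
    "content-type": "application/json",
}
# merge trace context into the headers for the outgoing webhook/upload call
outgoing = {"user-agent": "cog"}
outgoing.update(extract_trace_context(incoming))
print(sorted(outgoing))
```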
* Cast TraceContext into Mapping[str, str] to fix linter
* Include prediction id in the upload request
* Extract ChunkFileReader into top-level class

Based on #1667. This PR introduces two small changes to the file upload interface.

1. We now allow downstream services to include the destination of the asset in a `Location` header, rather than assuming that it's the same as the final upload url (either the one passed via `--upload-url` or the result of a 307 redirect response).
2. We now include the `X-Prediction-Id` header in the upload request; this allows the downstream client to potentially do configuration/routing based on the prediction ID. This ID should be considered unsafe and needs to be validated by the downstream service.

Co-authored-by: Mattt Zmuda <[email protected]>
* unconditionally mark predictions as cancelled without waiting for cancellation to succeed. while I struggle with #1786, we see queue spikes because not responding to cancellation promptly causes the pod to get restarted. this is a dirty hack to pretend like cancellation works immediately. as soon as we fix the race condition (and possibly any issues with task.cancel() behaving differently from signal handlers), we can drop this.
* make Mux.write a sync method. this makes the cancel patch cleaner, but might be a small speedup for high-throughput outputs
* gate this behind a feature flag

Signed-off-by: technillogue <[email protected]>
* start with just changing Exception to BaseException to catch cancellation
* add much more shutdown logging
…ons to complete (#1843)

* move runner.terminate into shutdown, make it async, and document the behavior of Server.stop, should_exit, force_exit, and the app shutdown handler
* remove BaseException handlers or re-raise
* fix tests

Signed-off-by: technillogue <[email protected]>
We would like `predict` functions to be able to return a remote URL rather than a local file on disk and have it behave like a file object, so that when it is passed to the file uploader, the file is streamed from the remote to the destination provided.

```py
class Predictor(BasePredictor):
    def predict(self, **kwargs) -> File:
        return URLFile("https://replicate.delivery/czjl/9MBNrffKcxoqY0iprW66NF8MZaNeH322a27yE0sjFGtKMXLnA/hello.webp")
```

This PR modifies the URLFile class to use `urllib.request.urlopen` instead of `requests` as the HTTP client for reading the file data. The `urllib.response.addinfourl` interface conforms to the one used by `urllib3.response.HTTPResponse`, so I don't think we have any gaps here; longer term we probably want to figure out a more reliable shim. But as this branch is mostly used for language models, we're probably okay.
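The shim described above relies on `urllib.request.urlopen` returning a file-like object (read/readline/close), so a thin lazy wrapper can stand in for a local file. A hedged sketch with an illustrative class name (the demo uses a `data:` URL so no network access is needed):

```python
import urllib.request

class LazyURLFile:
    """File-like object that opens its URL on first read and streams it."""

    def __init__(self, url: str):
        self.url = url
        self._resp = None

    def read(self, n: int = -1) -> bytes:
        if self._resp is None:
            # opened lazily on first read; the addinfourl response is
            # itself file-like, so we delegate directly to it
            self._resp = urllib.request.urlopen(self.url)
        return self._resp.read(n)

# data: URLs are handled by urllib's built-in DataHandler
f = LazyURLFile("data:text/plain,hello")
print(f.read())
```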
This works around an issue where the basename of the URL may not actually contain a file extension, so the uploader logic cannot infer the mime type for the file.
Support for the webp mime type has only recently been introduced in Python 3.13.0 and is currently inconsistently implemented across different platforms. Confusingly, webp is supported in local development on macOS but not when building the docker image of a cog model. This is either because it's not defined in the system mime.types file of the Linux image or because a dev dependency is manually adding it; I've not done the work to fully understand which.

This commit introduces a function, called in the init script for the cog package, that patches the global mimetypes registry to understand the .webp extension and image/webp mime type. This will be a no-op on systems that already understand the type. This fixes a bug whereby files with the .webp extension are uploaded to the --upload-url with the incorrect application/octet-stream header.
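The patch described above can be sketched as a small startup function that registers the type with the stdlib `mimetypes` registry; the function name is an assumption, but `mimetypes.add_type` is the standard API and this is a no-op where .webp is already known.

```python
import mimetypes

def install_mime_extensions() -> None:
    # register image/webp globally if this platform's registry lacks it;
    # harmless no-op on systems (e.g. Python 3.13+) that already know it
    if mimetypes.guess_type("out.webp")[0] != "image/webp":
        mimetypes.add_type("image/webp", ".webp")

install_mime_extensions()
print(mimetypes.guess_type("prediction.webp")[0])
```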
This matches many of our other repositories and will ensure people who've bought into that environment will be using the right golang.
This allows developers to set local environment variables in a .env file if they wish, without needing to check that in or gitignore it.
The commit 3805e2e introduced the `filename` keyword argument to the `URLFile` constructor. However, we do not correctly propagate that value when the instance is pickled, resulting in the `URLFile` that is passed to the upload handler missing that attribute. This PR updates the code to stash the `name` when pickling and extract it again when unpickling. The `__getattr__` function then supports returning the underlying `name` value rather than proxying to the underlying request object.

I also ran into a small bug whereby the `__del__` method was triggering a network request, because some private attributes accessed during teardown would trigger the `__wrapper__` code. I've overridden the superclass method to disable this, though I'm unclear if this is just the test suite doing this cleanup.
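The stash-and-restore fix can be sketched with `__getstate__`/`__setstate__`: carry `name` through the pickle round-trip explicitly rather than relying on proxied attributes. This is a simplified stand-in (the real URLFile lazily wraps an HTTP response); the class name and no-op `__del__` are illustrative.

```python
import pickle

class URLFileLike:
    def __init__(self, url, filename=None):
        self.url = url
        if filename is not None:
            self.name = filename

    def __getstate__(self):
        # stash name alongside the url so it survives pickling even
        # though the real class proxies most attributes elsewhere
        return {"url": self.url, "name": getattr(self, "name", None)}

    def __setstate__(self, state):
        self.url = state["url"]
        if state["name"] is not None:
            self.name = state["name"]

    def __del__(self):
        # no-op: avoid triggering lazy network access during teardown
        pass

f = URLFileLike("https://example.com/hello.webp", filename="hello.webp")
g = pickle.loads(pickle.dumps(f))
print(g.name)
```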
It turns out some providers require these headers to be present otherwise they'll give an error response. This issue is not present in the requests version of the codebase because the requests library provides default headers on our behalf.
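The fix amounts to merging explicit defaults into every outgoing request's headers, since the httpx-based client does not supply the ones requests used to add for us. A sketch with hypothetical header values and helper name:

```python
# default headers some providers require; values here are illustrative
DEFAULT_HEADERS = {
    "User-Agent": "cog-worker/1.0",
    "Accept": "*/*",
}

def with_default_headers(headers: dict) -> dict:
    """Merge per-request headers over the defaults (request wins on conflict)."""
    merged = dict(DEFAULT_HEADERS)
    merged.update(headers)
    return merged

print(with_default_headers({"X-Prediction-Id": "abc123"}))
```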
httpx 0.28.0 changed a few behaviors that break things for us. Pin to <0.28.0 so things keep working.
this PR tracks the async release channel for the 0.10.0 release