Allow stale metadata requests to be repeated #127
Conversation
This bug fix is needed to test the executor handover feature.
The reference count determines how long we wait before allowing a new request.
Before, if the last request time was set but no reference count was recorded, `fetch_metadata` would've crashed when it tried to compute the wait time. Now, we use a short constant wait time in this situation.
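As a rough sketch of that fallback (the names `allowed_to_retry`, `FALLBACK_WAIT`, and `expected_wait` below are illustrative, not the identifiers actually used in `fetch_metadata`):

```python
from datetime import datetime, timedelta, timezone

# Assumed short constant wait used when no reference count was recorded.
FALLBACK_WAIT = timedelta(seconds=30)

def allowed_to_retry(last_request_time, reference_count, expected_wait):
    """Decide whether a stale metadata request may be repeated."""
    if last_request_time is None:
        return True  # no earlier request to wait for
    if reference_count is None:
        # Previously this case crashed while computing the wait time;
        # the fix falls back to a short constant wait instead.
        wait = FALLBACK_WAIT
    else:
        wait = expected_wait(reference_count)
    return datetime.now(timezone.utc) - last_request_time > wait
```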
If you see this, let me know roughly your expected time frame for responding. Given the modest number of hours before the kickoff, I may need to address the comments as best I can myself, depending on your availability. Of course, all else being equal, it's preferable for you to respond as you see fit.
In the meantime, I will run and test the new code.
Just came back to Numberscope for the evening! I'll remove the debug log messages and bump up the timeout; that should take well under an hour. I haven't looked at the "Bad Gateway" issue yet, so I don't know how long that might take to address.
Note I pushed a small doc change for something that just bit me, so please pull before addressing the review comments (presuming that you'll want to add any further commit(s)).
On the flip side, this is the only bad thing I have so far gotten the dev server to do (for example, getting factors worked while it was still downloading backrefs, presuming that downloading wasn't disrupted by the server error), so I am hopeful that once we resolve this and the above comments, this will be good to merge. :-)
Metadata requests for sequences with tens or low hundreds of backrefs should take around half the wait time given by this curve.
…kscope into executor_handover
I got this too, from a "crawling too fast" response! The crash happens because the "crawling too fast" page is plain text rather than JSON. But that gives us an easy way to detect this kind of response! I'll write a test for it. If you check your logs, you should see a log entry for a "request issue" event; you can get its contents as a Python dictionary by entering …
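Roughly, the detection can be as simple as attempting to decode the response as JSON. A minimal sketch, assuming the OEIS is fetched with Requests (the function and exception names here are made up, not backscope's actual code):

```python
import requests

class CrawlingTooFast(Exception):
    """Hypothetical exception for the plain-text throttling page."""

def fetch_oeis_json(url):
    response = requests.get(url, timeout=10)
    try:
        # The normal OEIS reply is JSON ...
        return response.json()
    except ValueError:
        # ... but the "crawling too fast" page is plain text, so decoding
        # fails here; log a "request issue" event and tell the caller to back off.
        raise CrawlingTooFast(response.text[:80])
```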
Sounds like really good progress. Make sure it's clear when you think this is ready for further review, thanks.
Update: I'm still working on the test simulating the non-JSON "crawling too fast" response that's currently crashing …
I've drafted a test (…). @gwhitney, @katestange: I'd appreciate suggestions for how these endpoints should respond when they get a "crawling too fast" response from the OEIS.
I would generally follow the guidelines in https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/503, both for backscope's response to its requestor and for backscope's behavior toward the OEIS. In other words, if the database doesn't have the info being requested, and the OEIS returns a 503 when backscope tries to get it, backscope should return a 503 with a clear explanation and the same Retry-After it got from the OEIS. And whenever it gets a 503, backscope should put a moratorium on new requests to the OEIS for the number of seconds indicated. That moratorium should ideally hold across all threads, including any backref downloads in progress. I suspect you need an entry in some admin table like …
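In rough terms, the 503 pass-through might look like the sketch below, assuming a Flask-style endpoint; the exception class and the `oeis_moratorium` store are hypothetical, not existing backscope code:

```python
import time
from flask import Flask, jsonify

app = Flask(__name__)
oeis_moratorium = {"until": 0.0}  # shared cooldown on new OEIS requests

class OEISThrottled(Exception):
    """Raised when the OEIS answers 503 with a Retry-After header."""
    def __init__(self, retry_after):
        super().__init__("OEIS is throttling us")
        self.retry_after = retry_after

@app.errorhandler(OEISThrottled)
def pass_along_503(error):
    # Remember when we may contact the OEIS again ...
    oeis_moratorium["until"] = time.time() + error.retry_after
    # ... and give the requestor the same 503 and Retry-After we received.
    response = jsonify({"error": "OEIS is rate-limiting us; please retry later"})
    response.status_code = 503
    response.headers["Retry-After"] = str(error.retry_after)
    return response
```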
Validate with mock OEIS tests simulating situations where B-files or search results are unavailable.
This code that handles "crawling too fast" responses without crashing is ready for review (@gwhitney, @katestange)! Right now, the mock OEIS tests that validate this feature are pretty repetitive. They're mostly copied-and-pasted boilerplate, with small adjustments to call different endpoints with different simulated OEIS behavior. I'd like to eventually consolidate these tests into descendants of an abstract mock OEIS test, which would handle the boilerplate. I could do that now if it would make this PR easier to review.
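For concreteness, the consolidation I have in mind might look roughly like the sketch below; all names are illustrative, and this isn't code that exists in the PR:

```python
import unittest
from unittest import mock

class MockOEISTestCase(unittest.TestCase):
    # Concrete subclasses set these; the base class holds the boilerplate.
    endpoint = None        # e.g. "/api/get_oeis_metadata/A000045"
    oeis_response = None   # simulated OEIS reply text

    def setUp(self):
        if self.endpoint is None:
            self.skipTest("abstract mock-OEIS test case")

    def call_endpoint(self):
        raise NotImplementedError  # subclasses call the test client here

    def test_endpoint_survives_simulated_oeis(self):
        # Patch the outgoing OEIS request with the simulated behavior,
        # then hit the endpoint and make sure nothing crashes.
        with mock.patch("requests.get") as fake_get:
            fake_get.return_value.status_code = 200
            fake_get.return_value.text = self.oeis_response
            self.call_endpoint()

class CrawlingTooFastMetadataTest(MockOEISTestCase):
    endpoint = "/api/get_oeis_metadata/A000045"
    oeis_response = "The OEIS server believes you are crawling too fast."

    def call_endpoint(self):
        pass  # placeholder for a Flask test-client request to self.endpoint
```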
Returning a 503 error when we get a "crawling too fast" response sounds like a good idea in the long run. In 1a5cc55, I've done something simpler and more consistent with the way other exceptions are handled: just return the exception with a 200 status code. Should we improve on that behavior now, or in a later pull request? Respecting the cool-down period we get from the OEIS might be more involved. I think that's better left for a later PR, especially because, to stop back-reference downloading, we need to split the metadata download into smaller chunks.
I can verify that the PR installs and runs on my machine. I tried various endpoints. The metadata worked nicely. A factoring endpoint …
Right now tests pass, but …
Thanks for looking, Kate! @Vectornaut, remove the draft status when you've addressed the latest observed crash and it's ready for further review. Much appreciated.
That's odd! In your log file, you should find an entry logging this exception. Could you check whether there's a corresponding "request issue" entry just before it, and post that entry if so? (If you don't want to hunt for it, you could also just post the whole log file.) One thing that could cause this crash is the OEIS returning the "crawling too fast" page with a 200 (success) status code, which wouldn't be caught in the log.
Just pulled executor_handover, activated my venv, and ran …
This test is failing because a temporary change to the OEIS site (the black banner that reads "The OEIS mourns the passing of Jim Simons and is grateful to the Simons Foundation for its support...") is interpreted as an unexpected response. You can see the test pass by changing …
I left the test as-is because I'd guessed the banner would be removed by the time we were ready to merge.
Can we change the test in some way? We don't really want our tests to be susceptible to such vagaries of the site...
Instead of checking most of the logged response, just check the first few headers, up to the 'Date:' key.
The nonexistent sequence test now checks only the very beginning of the logged response: the first few headers, up to the `Date:` key.
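In case it helps review, here's a sketch of the kind of prefix check this describes; the helper name and log format are made up, not the actual test code:

```python
def logged_response_prefix(logged_response, last_key="Date:"):
    """Keep only the lines up to and including the one starting with
    `last_key`, so later content (like a temporary site banner) is ignored."""
    kept = []
    for line in logged_response.splitlines():
        kept.append(line)
        if line.startswith(last_key):
            break
    return "\n".join(kept)

# In the test, compare just the stable prefix of the two logs:
# assert logged_response_prefix(actual) == logged_response_prefix(expected)
```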
Tests now pass here. Will finish review ASAP.
Sorry that ended up being piecemeal rather than a single review; I went in presuming the ultimate comment count would be smaller. Anyhow, thanks for your patience; I think I have been through everything now. The performance seems quite solid; there are just some code formatting, commenting, and factoring points. I've posted all the comments I have, and as soon as we get them resolved, this should be good to merge.
This should be ready for review again! I've added a comment to each open conversation that I've addressed. |
OK, this is looking pretty good. All of the previously open points appear to have been addressed. I haven't been able to get it to crash by repeatedly requesting the Virahanka-Fibonacci (A000045) metadata, although it also hasn't yet run long enough to have grabbed all that data. Sadly, there is still a bug, although I am not sure whether it's a new one; it's a bit of a pain to go back and forth and try the following in main to see if it works there, but if you need me to do that, just let me know. In any case, it would be great to fix this before merging: if you start the server with the code in this PR (using …
On the plus side, after a couple hours, the code in this PR did manage to collect all 5701 backreferences to A000045, and in the meantime I never caught it having any trouble with other requests, including other requests concerning A000045. So things appear to be working well, and the only obstacle I know of to merging this is the `name_and_values` error described in the previous comment.
OK, I am going to go ahead and merge this, because I think there is a good chance that the next PR will fix the one remaining issue in this PR, and we need a working backend to make progress with the frontend.
This has been extensively reviewed over a long period, and there is a known remaining bug, but I think that bug is addressed by the next PR and even if not, we can fix it before merging that PR. So approving.
This is a single-commit version of #135. It addresses #133 by changing the search query in `get_oeis_name_and_values`. The Requests library automatically URL-encodes the query, so the `:` in the query gets sent as `%3A`. This pull request is based on #127, which should be merged first. The history associated with this pull request starts at commit 1eec7cd.
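The encoding behavior is easy to confirm with Requests directly; the query value below is just an illustration, not necessarily the exact query that `get_oeis_name_and_values` sends:

```python
import requests

# Requests URL-encodes query parameters automatically, so a ':' in the
# search string is transmitted as '%3A'.
prepared = requests.Request(
    "GET",
    "https://oeis.org/search",
    params={"q": "id:A000045", "fmt": "json"},
).prepare()
print(prepared.url)
# https://oeis.org/search?q=id%3AA000045&fmt=json
```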
This addresses issue numberscope#154, using the same technique as PR numberscope#127.
This addresses issue #69 by replacing the flag that says metadata has been requested with a timestamp that says when the metadata was last requested. If the last request was long enough ago that it seems unlikely to succeed, we allow a new request to be started.
Mechanism
When a metadata request starts, it records the time and the reference count, so later threads can judge how likely it is to ever succeed.
When a metadata request finishes, it writes what it found to the database in the following situations:
To do
This partially addresses issue #33. To completely resolve that issue, we need to set up a similar system for repeating factorization requests.
Likewise, to address #104, we need such a system for repeating value requests.