Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inputs/Resources being updated too often (I think) #34

Open
hellais opened this issue Sep 17, 2018 · 0 comments
Open

Inputs/Resources being updated too often (I think) #34

hellais opened this issue Sep 17, 2018 · 0 comments
Labels
bug Something isn't working

Comments

@hellais
Copy link
Member

hellais commented Sep 17, 2018

@joelanders commented on Fri Nov 11 2016

Fell into a rabbit hole trying to understand some of the logic around resources and inputs.
I want to have a fresh look at this, but I think we can delete some code that does nothing / does things redundantly.

  1. The UpdateInputsAndResources(ScheduledTask) has under first_run, resources.check_for_update("ZZ"), which updates the things in (for example) {install_prefix}/var/lib/ooni/resources/citizenlab-test-lists/.
  2. UpdateInputsAndResources(ScheduledTask) under task calls both resources.check_for_update() and input_store.update()

I think things are being updated too often because:

  • input_store.update() calls resources.check_for_update() in its definition (so you shouldn't have to call one right after the other as in (2) above).
  • if input_store.update() is being called in task, there's no reason to have the first_run entry`
    • there's this create_input_store flag being plumbed around (defaulting to True) that would keep .update() from being called, but it doesn't actually seem to be set from anywhere.

But I've got my brain in a knot, so I'll need to check this.


@hellais commented on Thu Nov 24 2016

Thanks for looking into this.

Answering inline:

input_store.update() calls resources.check_for_update() in its definition (so you shouldn't have to call one right after the other as in (2) above).

The update call in input_store.update() (I assume you are referring to: https://github.com/TheTorProject/ooni-probe/blob/4ee1a90275bdd9fd120823beb5eaa44fe6aa2563/ooni/deck/store.py#L41) will only run when we notice that the input for our country is unavailable (this is done because we only download inputs for the country we detected the user is in and if they change country we will have to download fresh inputs).

That is to say that if the update check in https://github.com/TheTorProject/ooni-probe/blob/4ee1a90275bdd9fd120823beb5eaa44fe6aa2563/ooni/agent/scheduler.py#L165 downloads all the inputs it needs, then in the above line it will never run.

if input_store.update() is being called in task, there's no reason to have the first_run entry`

The first_run entry is needed, because on first boot we don't know the country of the user (because we haven't downloaded the geoip database files yet), so we need to trigger a first update of inputs with uknown country (ZZ) so that we are initialised with the required files.

I agree that maybe it can be cleaned up a bit, but just re-reading the code I don't believe we are updating too often.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant