Skip to content
This repository has been archived by the owner on Jun 1, 2019. It is now read-only.

update-commit-db _might_ miss commits #15

Open
steveklabnik opened this issue Jan 12, 2017 · 15 comments
Open

update-commit-db _might_ miss commits #15

steveklabnik opened this issue Jan 12, 2017 · 15 comments

Comments

@steveklabnik
Copy link
Member

It only looks at the first page of commits. If we get very busy, it might not catch all of them.

This should be alleviated once a release happens, but master will be missing in the meantime. So ideally it should go back enough to not miss anything.

@boxtown
Copy link
Contributor

boxtown commented Apr 2, 2017

I'll pick this up. Idea is to loop through pages grabbing commits until you either run out of pages or hit the SHA of the latest commit in the latest release for a project currently in the database. Quick question though, what is the purpose of the visible field in the Releases table?

@steveklabnik
Copy link
Member Author

Idea is to loop through pages grabbing commits until you either run out of pages or hit the SHA of the latest commit in the latest release for a project currently in the database

Exactly.

Quick question though, what is the purpose of the visible field in the Releases table?

In order to write the release blog post, I need to know how many people have contributed. In order to do that, I have to create the release. But then it's up before the actual announcement. So, basically, a new release will sit as false for a few days, and then set to true on release day.

Given that this process is only done with the master release, which will always be true, you shouldn't have to worry about it :)

@boxtown
Copy link
Contributor

boxtown commented Apr 3, 2017 via email

@steveklabnik
Copy link
Member Author

Thoughts?

Can you elaborate a bit more at the high level of how this works? That is, I'm not sure what adding the dates would get you, exactly.

@boxtown
Copy link
Contributor

boxtown commented Apr 3, 2017 via email

@steveklabnik
Copy link
Member Author

We need a way to say "what was the last commit saved for the latest non master release"

Ah, I see. Right. Hm.

Honestly, with stuff like #97, I'm starting to wonder if the right approach here is to just always dynamically calculate things. Originally I chose not to because I wasn't sure about heroku; however, crates.io is hosted there, and it seems to work? I'm not sure how that works though; maybe @carols10cents could help enlighten us here.

@carols10cents
Copy link
Member

Sorry, I don't have much context here about what you'd like to know... what about heroku are you not sure? The performance of database queries? The performance of web requests? Are you on free or paid dynos, free or paid database? What are you currently caching and considering dynamically calculating? How often do you need to update the data and query the data?

Crates.io does have some values we've tried to cache, and have had problems invalidating the cache at the right time in the right ways. We're now dynamically calculating the max_version of a crate and that seems to be performing okay.

Version downloads, however, are still being aggregated by a background worker script-- version downloads get updated in one table within a download request, then the background worker rolls up those counts to a downloads field on the versions table, on the crates table, and finally on a total_downloads field in the metadata table. I'm a little nervous about getting rid of this script because of postgres locking, since download requests are what happen most often.

Is any of that helpful...?

@boxtown
Copy link
Contributor

boxtown commented Apr 4, 2017

@steveklabnik Do you mean calculating commits, contributions, etc. in memory from the Github API responses? I think it could be pretty feasible, especially if you just cache it all in Redis. Would probably make setting up the dev environment simpler as well

@steveklabnik
Copy link
Member Author

@carols10cents sorry, I should have written more, sigh.

The way that thanks currently works is basically by mirroring a git db as a postgres db. i was thinking maybe instead, keeping a local git repo on the dyno would work instead.

Given crates.io has to deal with the index, I thought maybe it had it locally, but that could be wrong...

Do you mean calculating commits, contributions, etc. in memory from the Github API responses?

or from a local git repo.

@carols10cents
Copy link
Member

Aha, i see. Yes, crates.io does have a local checkout of the git repo on heroku -- this code sets it up. GIT_REPO_CHECKOUT is set to /app/tmp/index-checkout on crates.io and staging, and I have it set to ./tmp/index-co on my mirror that also runs on heroku, and both seem to be working fine.

As long as you don't expect that directory to be the same across dyno instances or through restarts (that code checks out the git repo on server start if it's not there), then it should be fine.

@carols10cents
Copy link
Member

@steveklabnik
Copy link
Member Author

steveklabnik commented Apr 5, 2017

As long as you don't expect that directory to be the same across dyno instances or through restarts (that code checks out the git repo on server start if it's not there), then it should be fine.

We will never have more than one instance, and even then, can make it work regardless 👍

Thank you! ❤️

So, @boxtown , I think we can do it this way. Do you have any interest in helping out? This is a much larger PR than this ticket originally imagined...

@steveklabnik
Copy link
Member Author

Actually, @boxtown let me give some stuff a try first, and then we can decide what to do here 😄

@boxtown
Copy link
Contributor

boxtown commented Apr 5, 2017 via email

@steveklabnik steveklabnik mentioned this issue Apr 6, 2017
6 tasks
@steveklabnik
Copy link
Member Author

@boxtown so yeah #99

basically, after this merges, this specific issue can be closed; #13 would always get all the commits, since we're doing it from git natively.

i'm not closing this yet because i haven't actually landed that PR.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants