update-commit-db _might_ miss commits #15

steveklabnik · 2017-01-12T22:04:33Z

It only looks at the first page of commits. If we get very busy, it might not catch all of them.

This should be alleviated once a release happens, but master will be missing in the meantime. So ideally it should go back enough to not miss anything.

boxtown · 2017-04-02T17:44:31Z

I'll pick this up. Idea is to loop through pages grabbing commits until you either run out of pages or hit the SHA of the latest commit in the latest release for a project currently in the database. Quick question though, what is the purpose of the visible field in the Releases table?

steveklabnik · 2017-04-03T14:21:26Z

Idea is to loop through pages grabbing commits until you either run out of pages or hit the SHA of the latest commit in the latest release for a project currently in the database

Exactly.

Quick question though, what is the purpose of the visible field in the Releases table?

In order to write the release blog post, I need to know how many people have contributed. In order to do that, I have to create the release. But then it's up before the actual announcement. So, basically, a new release will sit as false for a few days, and then set to true on release day.

Given that this process is only done with the master release, which will always be true, you shouldn't have to worry about it :)

boxtown · 2017-04-03T14:59:20Z

Having hacked on it a little, there's two ways I can see implementing this. The first assumes that commits are inserted in order which is a hacky assumption BUT requires very little change to the current code. The second would add a date column to the commits table in order to set a sort order for commits but is a much more involved change. Thoughts? Sorry for the terrible formatting

…

On Apr 3, 2017 10:21 AM, "Steve Klabnik" ***@***.***> wrote: Idea is to loop through pages grabbing commits until you either run out of pages or hit the SHA of the latest commit in the latest release for a project currently in the database Exactly. Quick question though, what is the purpose of the visible field in the Releases table? In order to write the release blog post, I need to know how many people have contributed. In order to do that, I have to create the release. But then it's up before the actual announcement. So, basically, a new release will sit as false for a few days, and then set to true on release day. Given that this process is only done with the master release, which will always be true, you shouldn't have to worry about it :) — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#15 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ACwkvoSsPdyahjv6R1HqHODi8R7SKIASks5rsQBngaJpZM4LiSko> .

steveklabnik · 2017-04-03T15:31:02Z

Thoughts?

Can you elaborate a bit more at the high level of how this works? That is, I'm not sure what adding the dates would get you, exactly.

boxtown · 2017-04-03T16:23:55Z

We need a way to say "what was the last commit saved for the latest non master release". We could assume that commits are inserted in order which means we wouldn't need dates, or we could use the date_approved field. Unless there is some other mechanism for ordering commits that I'm not aware of

…

On Apr 3, 2017 11:31 AM, "Steve Klabnik" ***@***.***> wrote: Thoughts? Can you elaborate a bit more at the high level of how this works? That is, I'm not sure what adding the dates would get you, exactly. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#15 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ACwkvqckUSg-aZrJ42L6pk-EfUnjSBlPks5rsRC3gaJpZM4LiSko> .

steveklabnik · 2017-04-03T16:36:49Z

We need a way to say "what was the last commit saved for the latest non master release"

Ah, I see. Right. Hm.

Honestly, with stuff like #97, I'm starting to wonder if the right approach here is to just always dynamically calculate things. Originally I chose not to because I wasn't sure about heroku; however, crates.io is hosted there, and it seems to work? I'm not sure how that works though; maybe @carols10cents could help enlighten us here.

carols10cents · 2017-04-03T23:34:21Z

Sorry, I don't have much context here about what you'd like to know... what about heroku are you not sure? The performance of database queries? The performance of web requests? Are you on free or paid dynos, free or paid database? What are you currently caching and considering dynamically calculating? How often do you need to update the data and query the data?

Crates.io does have some values we've tried to cache, and have had problems invalidating the cache at the right time in the right ways. We're now dynamically calculating the max_version of a crate and that seems to be performing okay.

Version downloads, however, are still being aggregated by a background worker script-- version downloads get updated in one table within a download request, then the background worker rolls up those counts to a downloads field on the versions table, on the crates table, and finally on a total_downloads field in the metadata table. I'm a little nervous about getting rid of this script because of postgres locking, since download requests are what happen most often.

Is any of that helpful...?

boxtown · 2017-04-04T20:08:52Z

@steveklabnik Do you mean calculating commits, contributions, etc. in memory from the Github API responses? I think it could be pretty feasible, especially if you just cache it all in Redis. Would probably make setting up the dev environment simpler as well

steveklabnik · 2017-04-04T21:31:42Z

@carols10cents sorry, I should have written more, sigh.

The way that thanks currently works is basically by mirroring a git db as a postgres db. i was thinking maybe instead, keeping a local git repo on the dyno would work instead.

Given crates.io has to deal with the index, I thought maybe it had it locally, but that could be wrong...

Do you mean calculating commits, contributions, etc. in memory from the Github API responses?

or from a local git repo.

carols10cents · 2017-04-05T01:40:18Z

Aha, i see. Yes, crates.io does have a local checkout of the git repo on heroku -- this code sets it up. GIT_REPO_CHECKOUT is set to /app/tmp/index-checkout on crates.io and staging, and I have it set to ./tmp/index-co on my mirror that also runs on heroku, and both seem to be working fine.

As long as you don't expect that directory to be the same across dyno instances or through restarts (that code checks out the git repo on server start if it's not there), then it should be fine.

carols10cents · 2017-04-05T01:43:00Z

Here's the official heroku docs

steveklabnik · 2017-04-05T14:05:58Z

As long as you don't expect that directory to be the same across dyno instances or through restarts (that code checks out the git repo on server start if it's not there), then it should be fine.

We will never have more than one instance, and even then, can make it work regardless 👍

Thank you! ❤️

So, @boxtown , I think we can do it this way. Do you have any interest in helping out? This is a much larger PR than this ticket originally imagined...

steveklabnik · 2017-04-05T19:18:51Z

Actually, @boxtown let me give some stuff a try first, and then we can decide what to do here 😄

boxtown · 2017-04-05T19:22:25Z

👍

…

On Apr 5, 2017 3:18 PM, "Steve Klabnik" ***@***.***> wrote: Actually, @boxtown <https://github.com/boxtown> let me give some stuff a try first, and then we can decide what to do here 😄 — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#15 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ACwkvmQZ_xo9a8OTr8GRznmXjnd67Lctks5rs-kbgaJpZM4LiSko> .

steveklabnik · 2017-04-06T17:53:58Z

@boxtown so yeah #99

basically, after this merges, this specific issue can be closed; #13 would always get all the commits, since we're doing it from git natively.

i'm not closing this yet because i haven't actually landed that PR.

steveklabnik added bug help wanted labels Jan 12, 2017

steveklabnik mentioned this issue Apr 6, 2017

Rework #99

Closed

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

update-commit-db _might_ miss commits #15

update-commit-db _might_ miss commits #15

steveklabnik commented Jan 12, 2017

boxtown commented Apr 2, 2017 •

edited

Loading

steveklabnik commented Apr 3, 2017

boxtown commented Apr 3, 2017 via email

steveklabnik commented Apr 3, 2017

boxtown commented Apr 3, 2017 via email

steveklabnik commented Apr 3, 2017

carols10cents commented Apr 3, 2017

boxtown commented Apr 4, 2017

steveklabnik commented Apr 4, 2017

carols10cents commented Apr 5, 2017

carols10cents commented Apr 5, 2017

steveklabnik commented Apr 5, 2017 •

edited

Loading

steveklabnik commented Apr 5, 2017

boxtown commented Apr 5, 2017 via email

steveklabnik commented Apr 6, 2017

update-commit-db _might_ miss commits #15

update-commit-db _might_ miss commits #15

Comments

steveklabnik commented Jan 12, 2017

boxtown commented Apr 2, 2017 • edited Loading

steveklabnik commented Apr 3, 2017

boxtown commented Apr 3, 2017 via email

steveklabnik commented Apr 3, 2017

boxtown commented Apr 3, 2017 via email

steveklabnik commented Apr 3, 2017

carols10cents commented Apr 3, 2017

boxtown commented Apr 4, 2017

steveklabnik commented Apr 4, 2017

carols10cents commented Apr 5, 2017

carols10cents commented Apr 5, 2017

steveklabnik commented Apr 5, 2017 • edited Loading

steveklabnik commented Apr 5, 2017

boxtown commented Apr 5, 2017 via email

steveklabnik commented Apr 6, 2017

boxtown commented Apr 2, 2017 •

edited

Loading

steveklabnik commented Apr 5, 2017 •

edited

Loading