-
Notifications
You must be signed in to change notification settings - Fork 33
update-commit-db _might_ miss commits #15
Comments
I'll pick this up. Idea is to loop through pages grabbing commits until you either run out of pages or hit the SHA of the latest commit in the latest release for a project currently in the database. Quick question though, what is the purpose of the |
Exactly.
In order to write the release blog post, I need to know how many people have contributed. In order to do that, I have to create the release. But then it's up before the actual announcement. So, basically, a new release will sit as Given that this process is only done with the |
Having hacked on it a little, there's two ways I can see implementing this.
The first assumes that commits are inserted in order which is a hacky
assumption BUT requires very little change to the current code. The second
would add a date column to the commits table in order to set a sort order
for commits but is a much more involved change. Thoughts? Sorry for the
terrible formatting
…On Apr 3, 2017 10:21 AM, "Steve Klabnik" ***@***.***> wrote:
Idea is to loop through pages grabbing commits until you either run out of
pages or hit the SHA of the latest commit in the latest release for a
project currently in the database
Exactly.
Quick question though, what is the purpose of the visible field in the
Releases table?
In order to write the release blog post, I need to know how many people
have contributed. In order to do that, I have to create the release. But
then it's up before the actual announcement. So, basically, a new release
will sit as false for a few days, and then set to true on release day.
Given that this process is only done with the master release, which will
always be true, you shouldn't have to worry about it :)
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#15 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ACwkvoSsPdyahjv6R1HqHODi8R7SKIASks5rsQBngaJpZM4LiSko>
.
|
Can you elaborate a bit more at the high level of how this works? That is, I'm not sure what adding the dates would get you, exactly. |
We need a way to say "what was the last commit saved for the latest non
master release". We could assume that commits are inserted in order which
means we wouldn't need dates, or we could use the date_approved field.
Unless there is some other mechanism for ordering commits that I'm not
aware of
…On Apr 3, 2017 11:31 AM, "Steve Klabnik" ***@***.***> wrote:
Thoughts?
Can you elaborate a bit more at the high level of how this works? That is,
I'm not sure what adding the dates would get you, exactly.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#15 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ACwkvqckUSg-aZrJ42L6pk-EfUnjSBlPks5rsRC3gaJpZM4LiSko>
.
|
Ah, I see. Right. Hm. Honestly, with stuff like #97, I'm starting to wonder if the right approach here is to just always dynamically calculate things. Originally I chose not to because I wasn't sure about heroku; however, crates.io is hosted there, and it seems to work? I'm not sure how that works though; maybe @carols10cents could help enlighten us here. |
Sorry, I don't have much context here about what you'd like to know... what about heroku are you not sure? The performance of database queries? The performance of web requests? Are you on free or paid dynos, free or paid database? What are you currently caching and considering dynamically calculating? How often do you need to update the data and query the data? Crates.io does have some values we've tried to cache, and have had problems invalidating the cache at the right time in the right ways. We're now dynamically calculating the Version downloads, however, are still being aggregated by a background worker script-- version downloads get updated in one table within a download request, then the background worker rolls up those counts to a Is any of that helpful...? |
@steveklabnik Do you mean calculating commits, contributions, etc. in memory from the Github API responses? I think it could be pretty feasible, especially if you just cache it all in Redis. Would probably make setting up the dev environment simpler as well |
@carols10cents sorry, I should have written more, sigh. The way that thanks currently works is basically by mirroring a git db as a postgres db. i was thinking maybe instead, keeping a local git repo on the dyno would work instead. Given crates.io has to deal with the index, I thought maybe it had it locally, but that could be wrong...
or from a local git repo. |
Aha, i see. Yes, crates.io does have a local checkout of the git repo on heroku -- this code sets it up. GIT_REPO_CHECKOUT is set to As long as you don't expect that directory to be the same across dyno instances or through restarts (that code checks out the git repo on server start if it's not there), then it should be fine. |
We will never have more than one instance, and even then, can make it work regardless 👍 Thank you! ❤️ So, @boxtown , I think we can do it this way. Do you have any interest in helping out? This is a much larger PR than this ticket originally imagined... |
Actually, @boxtown let me give some stuff a try first, and then we can decide what to do here 😄 |
👍
…On Apr 5, 2017 3:18 PM, "Steve Klabnik" ***@***.***> wrote:
Actually, @boxtown <https://github.com/boxtown> let me give some stuff a
try first, and then we can decide what to do here 😄
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#15 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ACwkvmQZ_xo9a8OTr8GRznmXjnd67Lctks5rs-kbgaJpZM4LiSko>
.
|
It only looks at the first page of commits. If we get very busy, it might not catch all of them.
This should be alleviated once a release happens, but master will be missing in the meantime. So ideally it should go back enough to not miss anything.
The text was updated successfully, but these errors were encountered: