Skip to content
This repository has been archived by the owner on Jun 1, 2019. It is now read-only.

Greatly improve performance of 'populate' #97

Merged

Conversation

Aaron1011
Copy link
Member

@Aaron1011 Aaron1011 commented Mar 25, 2017

This PR makes several changes to speed up parsing and storing commits and
authors:

  • 'libgit2' is now used to read commits and authors from disk.
    Previously, the (formatted) output of 'git log' was parsed, incurring
    significant overhead in string parsing
  • Authors are now cached in memory, instead of being complete re-fetched
    for every release. This allows database queries for finding/creating
    authors to be avoided entirely for most releases - some corner cases
    such as backported commits prevent all commits from being discovered at
    the start

Since libgit2 does not support the 'mailmap' feature of git, this is
implemented within the 'mailmap' module.

On my machine, this commit consistently brings the runtime of populate from more than 5 minutes to 13-16 seconds.

I've written up a short Python script (requires requests, runs on Python 2 and 3) to compare the output of each rendered release page. This script expects the webserver from the current master to be running at localhost:2000, while the webserver from this PR should be running at localhost:1334. Except for the following differences, the rendered pages should be identical character-for-character:

  • Dan W. is now displayed as Dan W. This is a consequence of libgit2 implementing git's removal of 'crud' when it reads commits from disk. This is slightly different from git itself, which only removes 'crud' when the commit is first made - the stripping of trailing characters is not performed by git log. This is a fairly minor issue, and could be rectified if necessary by a mailmap entry in rust removing the 'invalid' period.
  • Several authors appear to be removed (e.g. bluss). This is a consequence of the current version of thanks using different format specifiers in populate than in releases. Specifically, the former properly used the mailmap'd name, while the latter does not. My PR ensure that all author names and emails are properly mailmap'd, removing the incorrect authors from the contributors list.
  • The contributer Mario (as seen in this commit: rust-lang/rust@ec4a3cc3710eba9b31e00) has their name incorrectly parsed as 'Sopena Mario'. This is due to the fact that the email address for the commit was set to 'Mario Sopena', causing this code which expected emails to lack whitespace to fail. Since my PR uses libgit2, this is no longer an issue - Mario's name is now displayed correctly in the contributor list.
  • The 'All time' page has many commits moved around within their rank. This is because the ordering within a contribution rank is determined by whatever order the database returns the records it, and therefore isn't guaranteed to be consistent. However, all of the committers are still assigned their proper rank.

Make several changes to speed up parsing and storing commits and
authors:

* 'libgit2' is now used to read commits and authors from disk.
Previously, the (formatted) output of 'git log' was parsed, incurring
significant overhead in string parsing
* Authors are now cached in memory, instead of being complete re-fetched
for every release. This allows database queries for finding/creating
authors to be avoided entirely for most releases - some corner cases
such as backported commits prevent all commits from being discovered at
the start

Since libgit2 does not support the 'mailmap' feature of git, this is
implemented within the 'mailmap' module.
@steveklabnik
Copy link
Member

Thank you so much! This is exactly what I wanted to do, but didn't have the time, for all of the reasons you've cited. Amazing.

Looks like Travis is a bit mad: https://travis-ci.org/rust-lang-nursery/thanks/builds/214854741#L359

I've written up a short Python script (requires requests, runs on Python 2 and 3) to compare the output of each rendered release page.

I wonder if this could help with #11; any thoughts there? Wouldn't have to be in this PR.

Let's see if we can't get Travis fixed up, and then I'll review this more closely, but I'm very excited 🎊

@steveklabnik
Copy link
Member

It looks like http://stackoverflow.com/a/33203355/24817 might be the answer?

@Aaron1011
Copy link
Member Author

@steveklabnik: Thanks, that fixed it!

@steveklabnik steveklabnik merged commit 4c3aaa0 into rust-lang-nursery:master Mar 27, 2017
@steveklabnik
Copy link
Member

Awesome, this is wonderful. Thank you so so much!

steveklabnik added a commit that referenced this pull request Mar 27, 2017
steveklabnik added a commit that referenced this pull request Mar 27, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants