Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

STYLE: Add historical bulk changes to git blame ignore #5162

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ebrahimebrahim
Copy link
Contributor

@ebrahimebrahim ebrahimebrahim commented Jan 23, 2025

Close #2623

In case this needs to be re-done in the future, I used the following script in a python environment with GitPython installed:

import git
from pathlib import Path

def find_style_commits_with_large_changes(repo : git.Repo, prefix="STYLE", file_change_threshold=100) -> list[str]:
    matching_commits = []
    for commit in repo.iter_commits():
        if commit.message.strip().startswith(prefix):
            num_files_changed = len(commit.stats.files)
            if num_files_changed >= file_change_threshold:
                matching_commits.append(commit.hexsha)

    return matching_commits


def get_commits_already_ignored(blame_ignore_revs_path) -> list[str]:
    return [
        line.strip()
        for line in blame_ignore_revs_path.read_text().split('\n')
        if line and not line.startswith('#')
    ]

if __name__ == "__main__":
    repo = git.Repo('.')
    style_commits = find_style_commits_with_large_changes(repo)
    blame_ignore_revs_path = Path("./.git-blame-ignore-revs")
    commits_already_ignored = get_commits_already_ignored(blame_ignore_revs_path)
    commits_to_add = [commit_hash for commit_hash in style_commits if commit_hash not in commits_already_ignored]
    entries_to_add : list[str] = []
    for commit_hash in commits_to_add:
        commit_message_title = repo.commit(commit_hash).message.split('\n')[0]
        entries_to_add.append(f"# {commit_message_title}\n{commit_hash}")
    text_to_add = '\n'.join(entries_to_add)
    with blame_ignore_revs_path.open('a') as file:
        file.write(text_to_add)

The file_change_threshold was 100, so the commits being added to the git-blame-ignore are those commits that both start with "STYLE" and that affect at least 100 files.


PR Checklist

  • No API changes were made (or the changes have been approved)
  • No major design changes were made (or the changes have been approved)
  • Added test (or behavior not changed)
  • Updated API documentation (or API not changed)
  • Added license to new files (if any)
  • Added Python wrapping to new files (if any) as described in ITK Software Guide Section 9.5
  • Added ITK examples for all new major features (if any)

Refer to the ITK Software Guide for
further development details if necessary.

@github-actions github-actions bot added the type:Style Style changes: no logic impact (indentation, comments, naming) label Jan 23, 2025
@ebrahimebrahim ebrahimebrahim force-pushed the add_historical_bulk_changes_to_git_blame_ignore branch from 5750500 to e264440 Compare January 23, 2025 21:46
@ebrahimebrahim ebrahimebrahim force-pushed the add_historical_bulk_changes_to_git_blame_ignore branch from e264440 to cd92cf1 Compare January 23, 2025 21:56
Copy link
Contributor

@Leengit Leengit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@N-Dekker
Copy link
Contributor

N-Dekker commented Jan 24, 2025

Thanks, but I'm not sure about the whole .git-blame-ignore-revs feature. When you want to find out what particular commit changed the code, why would you want to always ignore commits that affect more than 100 files?

Would it not be possible for such a commit to introduce a significant change? I have seen large commits that include some manual changes, possibly introducing unexpected changes.

@ebrahimebrahim
Copy link
Contributor Author

@N-Dekker It's not about the number of files really, it's more about the fact that they are STYLE changes

Would it not be possible for such a commit to introduce a significant change? I have seen large commits that include some manual changes, possibly introducing unexpected changes.

By definition if a change is marked as STYLE then the author is expressing that the change is not significant. These changes are largely not relevant to what one is looking for during a blame.

Also, this is all optional because the user can make git blame behave as they wish (--ignore-revs-file versus --no-ignore-revs), disabling in case one is actually interested in blaming through style changes

@N-Dekker
Copy link
Contributor

It's not about the number of files really, it's more about the fact that they are STYLE changes

@ebrahimebrahim Suppose you want to find out when a particular bug is introduced. In general it might also be introduced by a STYLE commit, right? So you still want to see STYLE commits in a blame.

this is all optional because the user can make git blame behave as they wish (--ignore-revs-file versus --no-ignore-revs)

Very interesting, thanks. I think I want to use that option: --no-ignore-revs! Do you happen to know if it can be enabled for TortoiseGit and the GitHub website?

For example, when I look at https://github.com/InsightSoftwareConsortium/ITK/blame/master/Modules/Core/Common/include/itkObject.h and press [Blame], it says:

(i) Ignoring revisions in .git-blame-ignore-revs.

In the TortoiseGitBlame settings, I don't see how to switch off ignoring revisions either 🤷

@ebrahimebrahim
Copy link
Contributor Author

Do you happen to know if it can be enabled for TortoiseGit and the GitHub website?

I'm not familiar with tortoisegit but based on this issue it looks like the --no-ignore-revs is the default behavior as they haven't added the "feature" yet -- idk if that info is up to date

As for github I don't see any way to turn it off :/

In general it might also be introduced by a STYLE commit, right? So you still want to see STYLE commits in a blame.

I guess it comes down to what we feel ought to be the default behavior. If we want the default to be to not ignore style commits then one approach may be to rename the .git-blame-ignore-revs file so that github cannot find it -- then users who want to ignore style commits can explicitly use --ignore-revs-file to point to the renamed file. My sense is that the default behavior ought to be to ignore style commits, since they are largely red herrings when seeking out what caused a certain change or broke something, but it's up for debate

Comment on lines +99 to +102
# STYLE: Prefer local initialization
e23a4dbd1d19c3fecd221a25e6f4c5825dc21e00
# STYLE: Variable can be made constexpr
e1acaf5aa8d23c82594fd1a7cc9282b3c9786b84
Copy link
Contributor

@N-Dekker N-Dekker Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for repeating myself, but I believe that a commit that makes the initialization of variables more local (e23a4db) or adds more constexpr (e1acaf5) might unexpectedly (probably unintendedly) cause some behavior change, and might be relevant when doing a git blame.

I would rather not ignore anything other than formatting changes (by Python black, clang-format, etc).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is certainly possible as Ebrahim pointed out, but is usually undesirable. We want to make the default more convenient for most people.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm maybe the issue is a liberal interpretation of what really counts as a style commit. I agree that those two commits one may well want to include in a blame. I wouldn't have called those "STYLE".

I'll wait to see how if this discussion continues but I'm open to the option renaming .git-blame-ignore-revs to something that wouldn't be used by default by github's blame.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps remove those two commits from the list, under the theory that they shouldn't have been labeled as STYLE?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to make the default more convenient for most people.

Why do people use git blame in the first place? I typically use it to find exactly when a specific feature or bug was introduced. A bug might also be introduced, accidentally, by a STYLE commit. My use case typically involves multiple "blame prior to change XXX/blame previous version" steps, iteratively, to find the relevant code change.

Of course, I may not be like "most people", I don't know. 🤷

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is, with a mature codebase like ITK, a lot of the steps in git blame tracing can be style commits. And when something is hard, no one likes doing it. Style commits are, by definition, not likely to cause problems, but modify a lot of files, and therefore feature in blame tracing prominently. Unless they can be automatically ignored.

@thewtex
Copy link
Member

thewtex commented Jan 24, 2025

I think 100 is too small here -- 1000 files is closer to a "bulk change" in a project as large as ITK.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type:Style Style changes: no logic impact (indentation, comments, naming)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add historical bulk changes to .git-blame-ignore-revs
5 participants