CI: Check commit message compliance #5248

Open: norihiro wants to merge 1 commit into master

norihiro
Contributor

@norihiro norihiro commented Sep 6, 2021

Description

This PR adds these checks for commit messages in each pull request.

  • Title consists of module name(s) and subject or just subject.
  • If there are two or more module names, the module names have to be separated by a comma followed by an optional space.
    • If a module name does not contain a slash (/), it is looked up among directories at most 3 levels deep, submodules, and top-level file names.
    • If a module name contains one or more slashes (/), that hierarchy must exist starting at the top level. For example, data/locale will be rejected.
    • Note: The extension of a top-level file name is removed before comparing. For example, CONTRIBUTING is accepted but CONTRIBUTING.rst is not.
    • Note: Should we allow UI/updater such as 47e441b?
  • The first word of the subject starts with an upper case letter and its remaining characters are lower case letters.
    "Don't" and a word with a hyphen in the middle are also acceptable.
  • The title does not end with a dot (.).
  • The title is at most 72 characters, including the module prefix.
  • If the subject exceeds 50 characters excluding the module prefix, display a warning message.
  • Titles for commits of revert and merge are ignored.
  • Full description lines are 72 columns max.
    • Lines with a single word (not having any space) will be ignored, such as URL-only lines.
    • Lines with Co-Authored-By: will be ignored.
    • Note: Sometimes this might be ignored such as the URL reference in fcb6df1. Do we also want to ignore lines containing https?:// for example?
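
The module-name lookup described in the list above could be sketched roughly as follows. This is a minimal illustration and not the code in CI/check-log-msg.py: the function name `module_exists`, the `MAX_DEPTH` constant, and the idea of passing the tree as plain path lists are assumptions made so the example runs without a repository.

```python
from pathlib import PurePosixPath

MAX_DEPTH = 3  # assumption: "3-level depth" means at most three path components


def module_exists(name, directories, top_level_files):
    """Return True if `name` is an acceptable module prefix for the given tree.

    `directories` holds repository-relative directory paths (submodule paths
    included); `top_level_files` holds file names found at the repository root.
    """
    if "/" in name:
        # A name with a slash must exist as written, starting at the top level.
        return name in directories
    for directory in directories:
        parts = PurePosixPath(directory).parts
        if parts and len(parts) <= MAX_DEPTH and parts[-1] == name:
            return True
    # Top-level files are compared with their extension removed.
    return any(PurePosixPath(f).stem == name for f in top_level_files)
```

With a tree containing `UI/data/locale` and a root file `CONTRIBUTING.rst`, this sketch accepts `locale` and `CONTRIBUTING` but rejects `data/locale` and `CONTRIBUTING.rst`, matching the examples in the list.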

The empty checkboxes above might need further discussion.

I'd like to highlight there are some limitations:

  • The script cannot detect a title not starting from a present-tense verb.
  • The script cannot detect full description lines folded at less than 72 columns.
  • Module prefixes vs. modified files are not checked.
  • Removed submodules won't be checked properly for the module name prefix. This is because the gitpython module does not provide an interface to the submodules on each commit tree. To fix this, need to implement a parser for the .gitmodules file.
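
For the last limitation, a .gitmodules parser does not need to be large. The sketch below is only an assumption of what such a parser might look like, not part of this PR; `submodule_paths` is a hypothetical helper, and the file content for a given commit would have to be obtained separately (for example with `git show <commit>:.gitmodules`).

```python
import re

SECTION_RE = re.compile(r'^\[submodule "[^"]+"\]$')
KEY_VALUE_RE = re.compile(r"^\s*(?P<key>[\w.-]+)\s*=\s*(?P<value>.*?)\s*$")


def submodule_paths(gitmodules_text):
    """Return the `path` value of every [submodule "..."] section."""
    paths = []
    in_submodule = False
    for line in gitmodules_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", ";")):
            continue  # blank line or comment
        if stripped.startswith("["):
            # Only keys inside [submodule "..."] sections are of interest.
            in_submodule = SECTION_RE.match(stripped) is not None
            continue
        match = KEY_VALUE_RE.match(line)
        if in_submodule and match and match.group("key") == "path":
            paths.append(match.group("value"))
    return paths
```

The returned paths could then be added to the set of names that are accepted as module prefixes for that commit.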

In CI, the commit messages are retrieved through the GitHub REST API. Locally, you can run a command like this:

gh api repos/obsproject/obs-studio/pulls/5248/commits | ./CI/check-log-msg.py -v -j -
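
For readers unfamiliar with the JSON that this endpoint returns, here is a minimal sketch of how the piped input could be split into titles and description lines. It is not the script's actual code; `load_commits` is a hypothetical name, and the sketch relies only on the `sha` and `commit.message` fields of each array element.

```python
import json


def load_commits(json_text):
    """Return (short_sha, title, body_lines) tuples from the API response."""
    commits = []
    for item in json.loads(json_text):
        lines = item["commit"]["message"].splitlines()
        title = lines[0] if lines else ""
        # lines[1:] still contains the empty second line, if there is one.
        commits.append((item["sha"][:7], title, lines[1:]))
    return commits
```

Reading from a pipe would then be a matter of calling `load_commits(sys.stdin.read())`.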

To check commit messages without using the GitHub REST API, you can specify a revision range in any form that git log accepts.
For example, the command below checks the commits on HEAD that are newer than origin/master. This is useful for checking the messages before opening a PR.

./CI/check-log-msg.py -v -c origin/master..

Motivation and Context

There are a lot of violations of the commit guidelines.
This PR makes it possible to check the commit messages before a PR is merged.

How Has This Been Tested?

The script was tested against old commit messages, and no false errors were found other than those covered by the limitations above.

Errors found in commits after 28.0.0 are listed below.

  • Error: commit 03691eb: Too long title, 91 characters, limit 72:
    • win-capture: Remove the redundant "-" in the CSGO launch option and Steam url language code
  • Error: commit 29db52a: Invalid title text:
    • Save/Load source UUID in scene item data
  • Error: commit ecaa546: Too long description in a line, 74 characters, limit 72:
    • In this PR clock_gettime_nsec_np is used to convert from mach tick units
  • Error: commit bc9eee9: Too long description in a line, 97 characters, limit 72:
  • Error: commit 8abc352: Too long description in a line, 74 characters, limit 72:
  • Error: commit 634fd32: unknown module name 'CI/cmake'
  • Error: commit e593335: Invalid title text:
    • NVENC error logging improvements
  • Error: commit 559925e: 2nd line is not empty.
  • Error: commit 26725fa: Too long description in a line, 75 characters, limit 72:
    • source toolbar. This also does a refactor of the function to enable/disable
  • Error: commit dee7ef8: Too long description in a line, 85 characters, limit 72:
    • Fixes: 137966e ("libobs-opengl: Try to use the platform display if available")
  • Error: commit 5810571: Too long description in a line, 76 characters, limit 72:
    • Fixes: bcb04bb ("libobs: Open a separate X11 connection for hotkeys")
  • Error: commit 189c693: Too long description in a line, 74 characters, limit 72:
    • Also ensure that null conversion (C++ only for GCC) is enabled because its
  • Error: commit bf00ef1: Too long description in a line, 73 characters, limit 72:
    • Now uses GetIfEntry2 which supports 64-bit values for reporting speed, so
  • Error: commit bf00ef1: Too long description in a line, 76 characters, limit 72:
    • additional log line if the interface error counters are non-zero to possibly
  • Error: commit bf00ef1: Too long description in a line, 74 characters, limit 72:
    • help identify physical faults. Finally the transmit and receive speeds are
  • Error: commit bf00ef1: Too long description in a line, 74 characters, limit 72:
    • logged independently so that asynchronous mediums such as Wi-Fi that might
  • Error: commit 321776e: Invalid title text:
    • update Mildom servers
  • Error: commit ace5188: Too long title, 76 characters, limit 72:
    • obs-filters: disable NVIDIA FX audio model loading when SDK is not installed
  • Error: commit ace5188: Invalid title text:
    • disable NVIDIA FX audio model loading when SDK is not installed
  • Error: commit 930c65e: Too long description in a line, 73 characters, limit 72:
    • infinite time until disk is full and int overflows. Similarly, if no data
  • Error: commit 852d537: Too long description in a line, 73 characters, limit 72:
    • It actually use the recording encoder while restoring the stream encoder.
  • Error: commit c15cd23: unknown module name 'obs-filter'
  • Error: commit cdc9313: unknown module name 'cmake/libobs'
  • Error: commit a463326: unknown module name 'ffmpeg'
  • Error: commit a463326: Invalid title text:
    • fix cqp rate control on svtav1
  • Error: commit 777a8f8: unknown module name 'ffmpeg'
  • Error: commit 777a8f8: Invalid title text:
    • fix "cqp" mode for libaom
  • Error: commit fcb6df1: Too long description in a line, 80 characters, limit 72:
  • Error: commit fcb6df1: Too long description in a line, 95 characters, limit 72:
  • Error: commit f7086f2: Invalid title text:
    • NVIDIA Background Removal variable mask refresh
  • Error: commit d851f1d: Invalid title text:
    • Fix capturing UHD/4K YUV on Kona HDMI.
  • Error: commit 097e9cc: unknown module name 'mac-videtoolbox'
  • Error: commit 44c8249: Too long description in a line, 78 characters, limit 72:
    • The hw/sw encoder selection is enforced by the encoder IDs, so these flags are
  • Error: commit 761530d: Too long description in a line, 85 characters, limit 72:
    • Utilize the systems ProRes software and hardware encoders on supported configurations
  • Error: commit 641ec29: Too long description in a line, 73 characters, limit 72:
    • By default, ffmpeg-mux is guessing at the codec format of submitted data.
  • Error: commit 8779f05: Invalid title text:
    • fix build on non-x86 Linux platforms
  • Error: commit dba3401: unknown module name 'obs-filter'
  • Error: commit 6c474fe: Invalid title text:
    • use NvEncGetSequenceParams for NVENC header
  • Error: commit ffbcbaece: Too long description in a line, 76 characters, limit 72:
      • Make the settings a scroll area. This makes it work on smaller screens and
  • Error: commit ffbcbaece: Too long description in a line, 73 characters, limit 72:
      • Moved the remember settings checkbox to the top of the settings because
  • Error: commit 88a51db: Too long description in a line, 74 characters, limit 72:
    • May also improve performance of the main window, regardless of dock state.
  • Error: commit 88a51db: Too long description in a line, 80 characters, limit 72:
  • Error: commit 47e441b: unknown module name 'UI/updater'
  • Error: commit 47e441b: Invalid title text:
    • CMake: Add /utf-8 to MSVC command line
  • Error: commit 1cae3d4: unknown module name 'UI/updater'
  • Error: commit 7d853fb: unknown module name 'UI/updater'
  • Error: commit 7396c21: unknown module name 'UI/updater'
  • Error: commit e87a97e: unknown module name 'UI/updater'
  • Error: commit 31414d2: unknown module name 'UI/updater'
  • Error: commit 854d759: unknown module name 'UI/updater'

Types of changes

  • New feature (non-breaking change which adds functionality)

Checklist:

  • My code has been run through clang-format.
    • Instead of clang-format, the code is checked with loganalyzer's pycodestyle setup.
  • I have read the contributing document.
  • My code is not on the master branch.
  • The code has been tested.
  • All commit messages are properly formatted and commits squashed where appropriate.
  • I have included updates to all appropriate documentation.

@norihiro norihiro force-pushed the check-commit-message branch from ad65ea7 to 000db7e Compare September 6, 2021 15:28
@norihiro norihiro marked this pull request as draft September 6, 2021 15:29
@gxalpha gxalpha mentioned this pull request Sep 6, 2021
6 tasks
@norihiro norihiro force-pushed the check-commit-message branch from 000db7e to 0e4e42c Compare September 6, 2021 15:52
@norihiro norihiro marked this pull request as ready for review September 6, 2021 16:11
@RytoEX
Member

RytoEX commented Sep 6, 2021

This is essentially a different version of #5000. Please see the concerns listed there.

I don't see these concerns covered in the description:

  • There are occasions where the commit prefix contains a slash
  • the commit message subject contains the word "Revert" before the rest of the message follows in quotes
  • the commit message has no prefix
  • the commit message subject has more than 50 characters after the prefix because making it shorter makes it non-descriptive and our hard limit is 72 characters total

The title is 50 characters at max excluding module prefix.

Again, to me, the 50 character limit is not a hard limit. There are some cases where making a commit subject fit within 50 characters (excluding the prefix) makes it less comprehensible than if you had used 53 characters and still came in under 72 characters total. The 50 character limit is a suggestion or guideline that should be taken seriously, but the absolute hard limit is 72 characters total, as that is when GitHub will wrap the text.

@notr1ch
Member

notr1ch commented Sep 6, 2021

Agreed on the 50 characters being more of a suggestion. I frequently go over that trying to make a concise title, but definitely the 72 hard limit should apply.

@norihiro norihiro force-pushed the check-commit-message branch 2 times, most recently from 0bb53c2 to e781c43 Compare September 9, 2021 23:55
@norihiro norihiro marked this pull request as draft September 10, 2021 00:24
@norihiro norihiro marked this pull request as ready for review September 10, 2021 01:42
@norihiro
Contributor Author

The concerns mentioned by @RytoEX are addressed in the commit message and the description of this PR.
The subject is checked not to exceed 72 characters. If it exceeds 50 characters, just a warning message will be displayed but won't fail.
I wrote the script in awk. If another language is preferable, please let me know.

@RytoEX
Member

RytoEX commented Sep 10, 2021

The concerns mentioned by @RytoEX are addressed in the commit message and the description of this PR.
The subject is checked not to exceed 72 characters. If it exceeds 50 characters, just a warning message will be displayed but won't fail.
I wrote the script in awk. If another language is preferable, please let me know.

The subject is 72 characters at max excluding module prefix.

This is incorrect. The hard limit on maximum length of the subject, including the prefix, is 72 characters. GitHub's UI will wrap the subject if it goes beyond 72 characters.
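
The two limits under discussion can be illustrated with a short sketch. It is an assumption-laden example rather than the script in this PR: `check_title_length` is a hypothetical name, the module prefix is assumed to be everything before the first ": ", and revert and merge commits are assumed to be recognizable by their first word.

```python
HARD_LIMIT = 72  # whole title, prefix included
SOFT_LIMIT = 50  # subject only, prefix excluded


def check_title_length(title):
    """Return (errors, warnings) for one commit title."""
    errors, warnings = [], []
    # Assumption: revert and merge commits are recognized by their first word.
    if title.startswith(("Revert ", "Merge ")):
        return errors, warnings
    # Assumption: the module prefix is everything before the first ": ".
    prefix, separator, subject = title.partition(": ")
    if not separator:
        subject = title  # no prefix, so the whole title is the subject
    if len(title) > HARD_LIMIT:
        errors.append(
            f"Too long title, {len(title)} characters, limit {HARD_LIMIT}")
    if len(subject) > SOFT_LIMIT:
        warnings.append(
            f"Long subject, {len(subject)} characters, suggested {SOFT_LIMIT}")
    return errors, warnings
```

Under these assumptions, a 53-character subject behind a short prefix produces only a warning, while a 76-character title produces an error.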

@RytoEX
Member

RytoEX commented Sep 10, 2021

Error: commit 84b4257: first word: De-escalate

"De-escalate" is technically a correct spelling of this verb. "Deescalate" is also acceptable. I don't know that it's easily detectable if a hyphen used in the first word is correct or not.

@norihiro norihiro force-pushed the check-commit-message branch 2 times, most recently from 88ec435 to b419832 Compare September 10, 2021 02:48
@norihiro
Contributor Author

norihiro commented Sep 10, 2021

The title line criteria are revised.

  • Hard limit to 72 characters including module prefix
  • Warning at 50 characters excluding module prefix

The revised script accepts a first word with a single hyphen in the middle, such as "De-escalate". It also accepts words in which single hyphens appear several times, such as "Click-and-drag". Two or more consecutive hyphens, as in "De--escalate", are rejected.
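
One way to express this first-word rule as a regular expression is sketched below. The pattern and the name `first_word_ok` are illustrative assumptions, not necessarily what the script uses.

```python
import re

# One capital letter, then lower case letters, then any number of
# "-lowercase" groups. "Don't" is allowed as a special case.
FIRST_WORD_RE = re.compile(r"Don't|[A-Z][a-z]*(?:-[a-z]+)*")


def first_word_ok(subject):
    """True if the first word of the subject satisfies the capitalization rule."""
    words = subject.split()
    return bool(words) and FIRST_WORD_RE.fullmatch(words[0]) is not None
```

Because every hyphen must be followed by at least one lower case letter, "De-escalate" and "Click-and-drag" pass, while "De--escalate", "FIx", and "add" do not.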

The description is also updated.

@WizardCM WizardCM added CI Enhancement Improvement to existing functionality labels Sep 11, 2021
@WizardCM WizardCM self-requested a review September 12, 2021 00:44
Member

@WizardCM WizardCM left a comment

So, my method for testing this PR was to grab every commit since 27.0.0 (35c07bb) and run it through the script.

Of 289 recent commits

  • Errors: 39
  • Warnings: 26
  • Info: 26

More specifically:

  • 34 description lines were too long (~76/72 characters)
  • 26 had titles too long (~52/50 characters)
  • 26 had prefixes that were considered "not frequent"
  • 5 had an "invalid" first word

Overall, I'm happy with the results and can see a benefit to having this workflow. Thank you so much for putting in the work! :)

My primary concern looking through the prefix results & the code that triggered them is that check_module_name() is hardcoded and covers only a subset of the actual prefixes that may appear.

Here are the ones that it complained about:

UI,obs-transitions:
decklink-output-ui:
flatpak:
graphics-hook:
libobs,
libobs,deps/media-playback:
libobs-d3d11:
libobs-winrt,
libobs-winrt:
mac-capture:
mac-syphon:
mac-virtualcam:
obs-ffmpeg,
obs-frontend-api:
obs-qsv11:
text-freetype2:

I personally think that this check could be as simple as checking whether items in the commit prefix (comma or slash separated) match any one of the following:

  • the modified file's current directory
  • the modified file's root-level directory
  • the modified file's name without extension
  • if within a plugin subdirectory, the plugin's directory name
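
A rough sketch of the matching suggested in the list above follows. It is illustrative only and not code from this PR: the helper names are hypothetical, and it assumes plugins live under a top-level `plugins` directory. Candidates from all modified files are pooled, so a commit touching several directories can use any of them as a prefix item.

```python
import re
from pathlib import PurePosixPath

PLUGIN_ROOT = "plugins"  # assumption: plugins live under a top-level directory


def candidate_prefixes(modified_path):
    """Names that would be acceptable prefix items for one modified file."""
    path = PurePosixPath(modified_path)
    candidates = {path.stem}  # file name without extension
    parents = path.parent.parts
    if parents:
        candidates.add(parents[-1])  # the file's current directory
        candidates.add(parents[0])   # the file's root-level directory
    if len(parents) >= 2 and parents[0] == PLUGIN_ROOT:
        candidates.add(parents[1])   # the plugin's directory name
    return candidates


def prefix_matches(prefix, modified_paths):
    """True if every comma- or slash-separated prefix item matches some file."""
    items = [item for item in re.split(r"[,/]\s*", prefix) if item]
    allowed = set()
    for modified_path in modified_paths:
        allowed |= candidate_prefixes(modified_path)
    return bool(items) and all(item in allowed for item in items)
```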

While I realise this is an 'Info' level warning, it can also be noisy.

Finally, for the "first word" check, I think it might be a little too aggressive:

first word: FIx
first word: pthread_mutex_init_recursive
first word: DrawSrgbDecompressPremultiplied
first word: add
first word: NVENC

Catching "FIx" and "add" is good, but function names and component names are valid first words in a commit title (in my opinion).


I did have issues running this script outside of the GitHub Actions workflow, and was wondering if there's something we could do about that - specifically, my awk is in /usr/bin/awk and not /bin/awk.

And I'd also like to see some comments in the awk script, specifically around the purposes/goals of some of the regular expressions.

@norihiro norihiro force-pushed the check-commit-message branch from b419832 to 166157b Compare September 13, 2021 14:28
@RytoEX
Member

RytoEX commented Sep 13, 2021

I personally think that this check could be as simple as checking whether items in the commit prefix (comma or slash separated) match any one of the following:

* the modified file's current directory

* the modified file's root-level directory

* the modified file's name without extension

* if within a plugin subdirectory, the plugin's directory name

As far as I understand it, this script isn't checking modified files at all. It's just checking the output of git log for commit messages to check the commit message prefix against a predetermined list of options. Parsing the git log contents for modified files is way more work.

In any case, what about commits that modify files in multiple directories?

Finally, for the "first word" check, I think it might be a little too aggressive:

Catching "FIx" and "add" is good, but function names and component names are valid first words in a commit title (in my opinion).

Technically, a function name is probably against the commit guidelines unless it's treated as a verb, but there's no easy way to detect "is this being used as a verb or not".

All of these issues though continue to make me think that a lot of these either need to be non-blocking warnings, or question if the signal-to-noise ratio will be high enough to justify implementing this, per the previous PR thread.

I did have issues running this script outside of the GitHub Actions workflow, and was wondering if there's something we could do about that - specifically, my awk is in /usr/bin/awk and not /bin/awk.

And I'd also like to see some comments in the awk script, specifically around the purposes/goals of some of the regular expressions.

I'll be honest, I kind of dislike that it's written in awk for a couple of reasons. Git for Windows doesn't come with awk installed, and while it can be installed on Windows, I think it's unlikely that most Windows-based devs would have it installed. I just don't know if there's an overwhelming reason to try to use a different tool than awk, because most other tools will have similar problems, aside from my opinion that awk is perhaps a bit esoteric and complicated compared to other possible tools.

I agree that at minimum, this requires additional comments/documentation, because I don't expect every dev to understand how awk, or regex, works.

@norihiro
Contributor Author

I personally think that this check could be as simple as checking whether items in the commit prefix (comma or slash separated) match any one of the following:

Let me consider checking the prefix against the files changed by each commit.

Technically, a function name is probably against the commit guidelines unless it's treated as a verb, but there's no easy way to detect "is this being used as a verb or not".

I thought function names are always nouns.
Detecting function names might look something like the patterns below (possibly with digits as well), but I'm afraid they would not cover all functions.

  • lower_case_letters_with_underscore
  • CapitalizedWord
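
The two patterns could be written as regular expressions along these lines. This is a sketch under the assumption that an identifier is either snake_case with at least one underscore or at least two capitalized segments; `looks_like_identifier` is a hypothetical name.

```python
import re

# lower_case_letters_with_underscore: at least one underscore is required,
# so a plain lower case word such as "add" is not treated as an identifier.
SNAKE_CASE_RE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)+")
# CapitalizedWord: two or more capitalized segments, digits allowed.
CAMEL_CASE_RE = re.compile(r"(?:[A-Z][a-z0-9]+){2,}")


def looks_like_identifier(word):
    """True if the word resembles a function or type name."""
    return bool(SNAKE_CASE_RE.fullmatch(word) or CAMEL_CASE_RE.fullmatch(word))
```

With these patterns, pthread_mutex_init_recursive and DrawSrgbDecompressPremultiplied are recognized, while "FIx", "add", and all-caps component names such as NVENC are not, so the latter would still need separate handling.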

I'll be honest, I kind of dislike that it's written in awk for a couple of reasons.

Then, how about Python 3? I guess more developers are familiar with Python. Visual Studio includes Python. The regex module (re) is included in its standard library. There is also a third-party git-python library, though I have never tried it. We could use it or just retrieve the log by calling the git executable. One implementation concern is non-ASCII characters, which usually cause an error unless the environment is properly configured.

@WizardCM WizardCM requested a review from PatTheMav December 18, 2021 23:32
@PatTheMav
Member

PatTheMav commented Dec 19, 2021

While I appreciate the idea in general and also the work put into this, I am not sure about added value:

Workflows do not run on all PRs automatically - this has to be triggered manually after review by a maintainer, which means that a human review has taken place yielding the same information as this check. But it is specifically PRs by non-regulars which might run afoul of this.

I share the reservations with regards to using awk and wonder if the same could be achieved by querying the GitHub API via curl and parsing the result via jq (to avoid a checkout just to check the git log), though that depends on whether we can query information for all commits associated to a PR that way.

Last, I maintain my reservation about introducing too many single-job workflows in parallel, we have too many of those running already, leading to an overcrowded "checks" area where one cannot see the forest for the trees.

EDIT: I also see the unfortunate side-effect of hardcoding "good" commit prefixes, which would require adjusting this PR whenever we add a new plugin or component, which will happen once or twice, but will fall by the wayside every time after that.

@norihiro norihiro marked this pull request as draft December 20, 2021 03:13
@norihiro
Contributor Author

I will consider your valuable comments.

@norihiro norihiro force-pushed the check-commit-message branch 2 times, most recently from a6a5cfc to 40cd840 Compare March 21, 2023 04:44
@norihiro norihiro force-pushed the check-commit-message branch 2 times, most recently from e6f2439 to abd134e Compare March 21, 2023 08:21
@norihiro norihiro marked this pull request as ready for review March 21, 2023 08:26
@norihiro
Contributor Author

norihiro commented Mar 21, 2023

I'm sorry for my late response. I've revised the script.

  • The check script is rewritten in python.
  • The commit history is taken using GitHub REST API.
  • The module prefix is checked with the tree (not checking the modified files by the commit but just the tree at the commit).

@norihiro norihiro requested review from WizardCM and removed request for PatTheMav March 21, 2023 08:40
@norihiro
Contributor Author

I'm afraid I accidentally clicked something and removed the review request for PatTheMav.

@norihiro norihiro force-pushed the check-commit-message branch from abd134e to fba9389 Compare April 3, 2023 12:09
This commit adds these checks.
- Title consists of module name(s) and subject or just subject.
- If there are two or more module names, the module names have to be
  separated by a comma followed by an optional space.
- First word of the subject starts with an upper case letter and the
  remaining characters in the first word are lower case letters.
  "Don't" and a word with a hyphen in the middle are also acceptable.
- The title is 72 characters at max including module prefix.
- If the subject exceeds 50 characters excluding the module prefix,
  display a warning message.
- Titles for commits of revert and merge are ignored.
- Full description lines are 72 columns max.
  Links and co-author names are excluded from this check.
@norihiro norihiro force-pushed the check-commit-message branch from fba9389 to 58f0f2d Compare November 22, 2023 12:23
@norihiro
Contributor Author

I made a small modification to the script so that reference links in the commit message, like the one below, are excluded from the line-length check.

  [1] https://www.example.com/long/long/long/long/reference-link/that-exceeds-the-72-column-rule
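
Put together, the description-line check with its exclusions might look like the sketch below. The regular expressions and the name `too_long_lines` are assumptions for illustration and may differ from what the script does.

```python
import re

LIMIT = 72
CO_AUTHOR_RE = re.compile(r"^co-authored-by:", re.IGNORECASE)
# A numbered reference such as "[1] https://..." with nothing else on the line.
LINK_REF_RE = re.compile(r"^\s*\[\d+\]:?\s+https?://\S+\s*$")


def too_long_lines(body_lines):
    """Return (line_number, length) for each description line over the limit."""
    result = []
    for number, line in enumerate(body_lines, start=1):
        if len(line) <= LIMIT:
            continue
        if " " not in line.strip():
            continue  # single word, for example a URL-only line
        if CO_AUTHOR_RE.match(line) or LINK_REF_RE.match(line):
            continue  # co-author trailer or numbered reference link
        result.append((number, len(line)))
    return result
```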

@PatTheMav
Member

It would probably make more sense to convert this into a repository action and run it as part of the check-format workflow and (if necessary) guard it behind a condition to only run on PR pull events.

Labels
CI Enhancement Improvement to existing functionality
5 participants