This tool computes the complexity of a specified contribution to a git repository. A contribution is one or more commits specified by their commit hashes. Alternatively, if commit messages contain references to issue numbers, a contribution can be specified by a regular expression matching a certain set of commits.
The tool reports a contribution complexity on the scale low, moderate, medium, elevated, high.
That value identifies whether a contribution was simple to make (value low) or whether it consists of multiple intricate changes (value high) that were difficult to integrate into the system.
For example, the storage engine of Apache Cassandra (DBMS) was refactored for version 3 to better support certain concepts of the query language and to allow for future performance optimizations; see ticket CASSANDRA-8099.
The corresponding commit modifies almost 50k lines in 645 files and contains many non-trivial changes.
On the other hand, a bug that under certain circumstances prevented streaming between cluster nodes was fixed with a tiny patch modifying 15 lines in two files.
For humans inspecting the two contributions, it is quickly clear that the former is far more complex to implement than the latter.
This tool is meant to automate the identification of contributions of various complexities, either for inclusion in a CI setup or for research.
$ pip install contribution-complexity
You can run the tool either by specifying a list of commits or by providing a regular expression that matches commit messages containing references to an issue number.
$ contribcompl commits <path_to_repo> <commit_shas>...
$ contribcompl issue <path_to_repo> <issue_regex>...
For example,
$ git clone git@github.com:apache/Cassandra.git /tmp/cassandra
$ contribcompl commits /tmp/cassandra 021df085074b761f2b3539355ecfc4c237a54a76 2f1d6c7254342af98c2919bd74d37b9944c41a6b
ContributionComplexity.LOW
$ contribcompl issue /tmp/cassandra 'CASSANDRA-8099( |$)'
ContributionComplexity.HIGH
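For use in a CI setup, the printed result can be parsed and compared against a threshold. The following is a minimal sketch, not part of the tool: it assumes the CLI prints the level as ContributionComplexity.<NAME> (as in the two outputs above) and that the three intermediate levels are spelled MODERATE, MEDIUM, and ELEVATED in upper case. The function names `parse_level`, `exceeds`, and `gate`, and the MEDIUM threshold, are hypothetical choices for illustration.

```python
import re
import subprocess

# Levels in ascending order, following the scale listed in this README.
# Only LOW and HIGH appear in the sample output; the other spellings are assumed.
SCALE = ["LOW", "MODERATE", "MEDIUM", "ELEVATED", "HIGH"]


def parse_level(cli_output: str) -> str:
    """Extract the level name from output such as 'ContributionComplexity.LOW'."""
    match = re.search(r"ContributionComplexity\.([A-Z]+)", cli_output)
    if match is None or match.group(1) not in SCALE:
        raise ValueError(f"unexpected contribcompl output: {cli_output!r}")
    return match.group(1)


def exceeds(level: str, threshold: str) -> bool:
    """Return True if level ranks strictly above threshold on the scale."""
    return SCALE.index(level) > SCALE.index(threshold)


def gate(path_to_repo: str, commit_shas: list, threshold: str = "MEDIUM") -> int:
    """Run the CLI and return 1 if the contribution exceeds the threshold, else 0."""
    result = subprocess.run(
        ["contribcompl", "commits", path_to_repo, *commit_shas],
        capture_output=True,
        text=True,
        check=True,  # raise if the CLI itself fails
    )
    level = parse_level(result.stdout)
    print(f"contribution complexity: {level}")
    return 1 if exceeds(level, threshold) else 0
```

A CI job could pass the return value of `gate(...)` to `sys.exit` so that contributions above the chosen threshold fail the build or trigger an extra review step.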
from contribution_complexity.compute import find_commits_for_issue
from contribution_complexity.metrics import compute_contrib_compl
issue_re = "CASSANDRA-8099( |$)"
path_to_repo = "/tmp/cassandra"
commit_shas = find_commits_for_issue(path_to_repo, issue_re)
contribcompl = compute_contrib_compl(path_to_repo, commit_shas)
print(contribcompl)
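The trailing `( |$)` in the issue pattern keeps it from matching longer issue numbers that merely start with the same digits. The sketch below illustrates this with Python's standard `re` module on made-up commit subjects; it assumes the tool applies the pattern with search semantics, which this README does not state.

```python
import re

issue_re = "CASSANDRA-8099( |$)"

# Made-up commit subjects, for illustration only.
messages = [
    "CASSANDRA-8099 refactor storage engine",
    "Follow-up to CASSANDRA-8099",
    "Fix CASSANDRA-80991",
    "CASSANDRA-8099: add tests",
]

# Keep only the subjects in which the pattern is found somewhere.
matching = [m for m in messages if re.search(issue_re, m)]
print(matching)
```

Only the first two subjects match. The third refers to a different issue, and the fourth is skipped because the issue number is followed by a colon rather than a space or the end of the message, so the pattern may need adjusting to a repository's commit message conventions.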
See CITATION.bib.
- Vagrant with DigitalOcean plugin
- A DigitalOcean account
- SSH keys registered with DigitalOcean
- The SSH key name in the environment variable SSH_KEY_NAME
- A DigitalOcean API token in the environment variable DIGITAL_OCEAN_TOKEN
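Before provisioning, it can help to confirm that both environment variables are set. This is a small optional sketch, not part of the repository; the helper name `missing_vars` is hypothetical.

```python
import os

# Environment variables named in the requirements above.
REQUIRED_VARS = ("SSH_KEY_NAME", "DIGITAL_OCEAN_TOKEN")


def missing_vars(env=os.environ):
    """Return the required variable names that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```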
- Set your GitHub API key in the Vagrantfile, i.e., replace <PUT_YOUR_KEY_HERE> on line 33 with your key.
- Run vagrant up in this directory, which will bring up and configure a VM accordingly. It will automatically start the experiment recreation, which will take some hours to run.
- Once done, you have all results on the VM (log onto the machine with vagrant ssh) in the directory /vagrant/data/.
The experiment is described in experiment/run_experiment.sh.
The logo is adapted from a [flaticon icon](https://www.flaticon.com/free-icon/puzzle_808497?term=contribution&page=1&position=16&page=1&position=16&related_id=808497&origin=search). Proper attribution to the original: