Improving Parallelism in git-grep

Matheus Tavares Bernardino
Supervisor: Prof. Dr. Alfredo Goldman
Capstone Project, Bachelor of Computer Science, IME-USP, 2019


Abstract

Version control systems have become standard in medium to large software development projects, and Git has become the most popular among them¹. Because it is used to manage a wide variety of projects of different magnitudes (in both content and history size), it must be built to scale. With this in mind, Git's grep command was made parallel using a producer-consumer mechanism. However, when operating on Git's internal object store (e.g. for a search in older revisions), the multithreaded version was slower than the sequential code. For this reason, threads were disabled in the object store case.
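To make the producer-consumer idea concrete, here is a minimal sketch using POSIX threads. It is not Git's actual implementation: the `work_queue` structure, the `count_matching_texts` function, the queue capacity, and the use of `strstr` as the "search" are all assumptions chosen for illustration. One producer enqueues in-memory texts, and several consumer threads dequeue them and search outside the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define QUEUE_CAP 4    /* hypothetical capacity, chosen for illustration */
#define MAX_THREADS 16

/* Hypothetical bounded work queue; not Git's actual data structure. */
struct work_queue {
	const char *items[QUEUE_CAP];
	size_t head, count;
	int done;               /* set once the producer has no more work */
	const char *needle;     /* read-only after initialization */
	int matches;            /* number of texts containing needle */
	pthread_mutex_t lock;   /* guards every field except needle */
	pthread_cond_t not_empty, not_full;
};

static void *consumer(void *arg)
{
	struct work_queue *q = arg;

	for (;;) {
		const char *text;

		pthread_mutex_lock(&q->lock);
		while (q->count == 0 && !q->done)
			pthread_cond_wait(&q->not_empty, &q->lock);
		if (q->count == 0) {    /* producer finished, queue drained */
			pthread_mutex_unlock(&q->lock);
			return NULL;
		}
		text = q->items[q->head];
		q->head = (q->head + 1) % QUEUE_CAP;
		q->count--;
		pthread_cond_signal(&q->not_full);
		pthread_mutex_unlock(&q->lock);

		/* The search itself runs without the lock held. */
		if (strstr(text, q->needle)) {
			pthread_mutex_lock(&q->lock);
			q->matches++;
			pthread_mutex_unlock(&q->lock);
		}
	}
}

/* Returns how many of the n texts contain needle, or -1 on thread failure. */
int count_matching_texts(const char **texts, size_t n,
			 const char *needle, int nthreads)
{
	struct work_queue q = { .needle = needle };
	pthread_t threads[MAX_THREADS];
	int i, started = 0;
	size_t j;

	assert(needle && (texts || n == 0));
	if (nthreads < 1)
		nthreads = 1;
	if (nthreads > MAX_THREADS)
		nthreads = MAX_THREADS;
	pthread_mutex_init(&q.lock, NULL);
	pthread_cond_init(&q.not_empty, NULL);
	pthread_cond_init(&q.not_full, NULL);

	for (i = 0; i < nthreads; i++)
		if (!pthread_create(&threads[started], NULL, consumer, &q))
			started++;

	/* Producer: block while the queue is full, then enqueue. */
	for (j = 0; started && j < n; j++) {
		pthread_mutex_lock(&q.lock);
		while (q.count == QUEUE_CAP)
			pthread_cond_wait(&q.not_full, &q.lock);
		q.items[(q.head + q.count) % QUEUE_CAP] = texts[j];
		q.count++;
		pthread_cond_signal(&q.not_empty);
		pthread_mutex_unlock(&q.lock);
	}

	/* Tell every waiting consumer that no more work will arrive. */
	pthread_mutex_lock(&q.lock);
	q.done = 1;
	pthread_cond_broadcast(&q.not_empty);
	pthread_mutex_unlock(&q.lock);

	for (i = 0; i < started; i++)
		pthread_join(threads[i], NULL);
	pthread_cond_destroy(&q.not_full);
	pthread_cond_destroy(&q.not_empty);
	pthread_mutex_destroy(&q.lock);
	return started ? q.matches : -1;
}
```

In this sketch the lock protects only the queue and the match counter, so consumers can search different texts at the same time. The bounded queue keeps the producer from running arbitrarily far ahead of the workers.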

The present work aims to contribute to the Git project by improving the parallelization of the grep command and re-enabling threads for all cases. git-grep was analyzed to locate its hotspots, i.e. where it was consuming the most time, and to investigate how they could be mitigated. Among other findings, these studies showed that the object decompression routines accounted for up to one third of git-grep's total execution time. These routines, despite being thread-safe, had to be serialized in the first threaded implementation because of the surrounding thread-unsafe object reading machinery.

By refactoring git-grep and the object reading code, parallel access to the decompression code was enabled, and a speedup of more than 3x was achieved (running git-grep with 8 threads on 4 cores with hyper-threading). This allowed threads to be re-enabled with good performance. Additionally, some potential race conditions in git-grep's code and a bug with submodules were fixed.
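The general idea of letting a thread-safe decompression step run outside a lock that protects thread-unsafe bookkeeping can be sketched as follows. This is a toy under stated assumptions, not Git's object reading code: `toy_store`, `read_object`, and `decode_all` are hypothetical names, and a tiny run-length decoder (`rle_decode`) stands in for the real decompression routines.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 8

/* Hypothetical stand-in for a thread-unsafe object store. */
struct toy_store {
	const char **packed;    /* run-length encoded objects, e.g. "3a2b" */
	size_t nr;              /* fixed after setup */
	size_t lookups;         /* shared mutable state: needs the lock */
	pthread_mutex_t lock;
};

/*
 * Thread-safe "decompression": it touches only its arguments.
 * Each pair is a digit 1-9 followed by the byte to repeat.
 * Returns the decoded length (at most cap).
 */
static size_t rle_decode(const char *in, char *out, size_t cap)
{
	size_t len = 0;

	while (in[0] >= '1' && in[0] <= '9' && in[1]) {
		int run = in[0] - '0';

		while (run-- > 0 && len < cap)
			out[len++] = in[1];
		in += 2;
	}
	return len;
}

size_t read_object(struct toy_store *s, size_t id, char *out, size_t cap)
{
	const char *packed;

	/* Critical section: only the thread-unsafe bookkeeping. */
	pthread_mutex_lock(&s->lock);
	assert(id < s->nr);
	packed = s->packed[id];
	s->lookups++;
	pthread_mutex_unlock(&s->lock);

	/* Lock released: several threads may decode at the same time. */
	return rle_decode(packed, out, cap);
}

struct job {
	struct toy_store *s;
	size_t first, step;     /* this worker reads first, first+step, ... */
	size_t total;           /* decoded bytes seen by this worker */
};

static void *worker(void *arg)
{
	struct job *j = arg;
	char buf[64];
	size_t id;

	for (id = j->first; id < j->s->nr; id += j->step)
		j->total += read_object(j->s, id, buf, sizeof(buf));
	return NULL;
}

/* Decodes every object once, split across threads; returns total bytes. */
size_t decode_all(struct toy_store *s, int nthreads)
{
	pthread_t tid[MAX_WORKERS];
	struct job jobs[MAX_WORKERS];
	int started[MAX_WORKERS];
	size_t total = 0;
	int i;

	if (nthreads < 1)
		nthreads = 1;
	if (nthreads > MAX_WORKERS)
		nthreads = MAX_WORKERS;
	for (i = 0; i < nthreads; i++) {
		jobs[i] = (struct job){ s, (size_t)i, (size_t)nthreads, 0 };
		started[i] = !pthread_create(&tid[i], NULL, worker, &jobs[i]);
		if (!started[i])
			worker(&jobs[i]);   /* fall back to running inline */
	}
	for (i = 0; i < nthreads; i++) {
		if (started[i])
			pthread_join(tid[i], NULL);
		total += jobs[i].total;
	}
	return total;
}
```

Holding the lock across the whole of `read_object` would also be correct, but it would serialize the decoding step, which is the situation the first threaded implementation was in. Shrinking the critical section to the bookkeeping is what lets the expensive part overlap between threads in this sketch.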

Additional Resources

Part of this project was developed during Google Summer of Code 2019. To allow my mentors and the Git community to keep track of my progress, I kept a weekly-updated blog at https://matheustavares.gitlab.io/gsoc.

Footnotes:

Contact info at: https://matheustavares.gitlab.io/

Page built with the Jekyll template good-clean-read (MIT License)