git » git-arr » commit 535aa33

Speed up finding the git directory for a repository

author Alberto Bertogli
2025-05-18 22:32:28 UTC
committer Alberto Bertogli
2025-05-18 22:36:26 UTC
parent 5048b18f3690e6da95a6f325cf659b84397153bd

When loading the config, we need to find the location of the git
directory for all repositories.

We do this even when operating on a single repo, because it is needed
to load the config and to generate the top-level index.

Today, this is implemented by calling `git rev-parse --git-dir`;
however, that ends up taking a relatively significant amount of time
when regenerating a single repo.

To speed things up, this patch introduces a simpler heuristic that
should be good enough for most purposes, since we expect the paths
given to us to be repositories already.
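
As a rough illustration of the heuristic described above, the standalone
sketch below treats a path as a git directory when it contains both an
`objects` and a `refs` subdirectory. The names `looks_like_git_dir`,
`find_git_dir_sketch` and `demo` are hypothetical and not part of git-arr,
and returning None when nothing matches is an assumption of this sketch.

```python
import os
import tempfile


def looks_like_git_dir(p):
    """Crude check: True if p has both "objects" and "refs" subdirectories."""
    return os.path.isdir(os.path.join(p, "objects")) and os.path.isdir(
        os.path.join(p, "refs")
    )


def find_git_dir_sketch(path):
    """Return the first candidate that looks like a git dir, else None.

    Tries the path itself (bare repo layout) and then path/.git (regular
    checkout layout). Returning None on failure is an assumption here.
    """
    for candidate in (path, os.path.join(path, ".git")):
        if looks_like_git_dir(candidate):
            return candidate
    return None


def demo():
    """Build fake bare and non-bare layouts in a temp dir and probe them."""
    with tempfile.TemporaryDirectory() as tmp:
        bare = os.path.join(tmp, "bare.git")
        work = os.path.join(tmp, "work")
        empty = os.path.join(tmp, "empty")
        for d in ("objects", "refs"):
            os.makedirs(os.path.join(bare, d))
            os.makedirs(os.path.join(work, ".git", d))
        os.makedirs(empty)
        return (
            find_git_dir_sketch(bare) == bare,
            find_git_dir_sketch(work) == os.path.join(work, ".git"),
            find_git_dir_sketch(empty) is None,
        )
```

Note that a check like this never looks at `HEAD` or asks git anything, so
an unrelated directory that happens to contain `objects` and `refs` would
be accepted. That trade-off only makes sense because the paths are
expected to be repositories already.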

On a config with 34 repos, this results in a ~15% speedup for a no-op
regeneration of a repo with 4 branches and 600 commits.
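
One possible way to get a feel for the per-repo cost difference is a small
micro-benchmark like the sketch below. It is not how the figure above was
measured; that figure refers to a whole regeneration run. The function
names are hypothetical, and invoking git with the candidate path as the
working directory is an assumption that may differ from how git-arr's
`git.run_git` passes the path.

```python
import os
import shutil
import subprocess
import tempfile
import timeit


def check_with_git(p):
    """Ask git itself; this spawns one process per call."""
    result = subprocess.run(
        ["git", "rev-parse", "--git-dir"],
        cwd=p,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def check_with_isdir(p):
    """Filesystem-only heuristic: two directory checks, no subprocess."""
    return os.path.isdir(os.path.join(p, "objects")) and os.path.isdir(
        os.path.join(p, "refs")
    )


def compare(n=20):
    """Time n calls of each check; returns a dict of seconds per approach."""
    with tempfile.TemporaryDirectory() as tmp:
        # Minimal fake layout; enough for the heuristic, not a real repo.
        os.makedirs(os.path.join(tmp, "objects"))
        os.makedirs(os.path.join(tmp, "refs"))
        timings = {
            "isdir": timeit.timeit(lambda: check_with_isdir(tmp), number=n)
        }
        # Only time the git variant when a git binary is available.
        if shutil.which("git"):
            timings["git"] = timeit.timeit(
                lambda: check_with_git(tmp), number=n
            )
        return timings
```

Calling `compare()` returns the total time for each approach. The absolute
numbers depend heavily on the machine and on process start-up cost, so
they are only useful as a relative comparison.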

git-arr +10 -6

diff --git a/git-arr b/git-arr
index 5fbf1d0..70bffd4 100755
--- a/git-arr
+++ b/git-arr
@@ -149,12 +149,16 @@ def find_git_dir(path):
     """
 
     def check(p):
-        """A dirty check for whether this is a git dir or not."""
-        # Note silent stderr because we expect this to fail and don't want the
-        # noise; and also we strip the final \n from the output.
-        return git.run_git(
-            p, ["rev-parse", "--git-dir"], silent_stderr=True
-        ).read()[:-1]
+        "True if p is a git directory, False otherwise."
+        # This is a very crude heuristic, but works well enough for our needs,
+        # since we expect the directories to be given to us to be git repos.
+        # We used to do this by calling `git rev-parse --git-dir`, but it ends
+        # up taking a (relatively) significant amount of time, as we have to
+        # do it for all repos even if we just want to (re-)generate a single
+        # one.
+        if os.path.isdir(p + "/objects") and os.path.isdir(p + "/refs"):
+            return True
+        return False
 
     for p in [path, path + "/.git"]:
         if check(p):