    perf: reduce number of native git calls when extracting a codebase (#3105) · 0beeb501
    Committed by Ryan Groch
    Currently, `collectFiles` calls `isGitIgnored` on every (non-excluded)
    recursion step. Despite the caching, we frequently execute Git just to
    check whether a single file or directory is gitignored, so the number of
    Git invocations scales with the number of files in the user's app.
    
    This amounts to a substantial number of Git invocations. For smaller
    projects it could be dozens; for larger projects it could be thousands.
    It's particularly a problem for native Git, because each `exec` call
    comes with a lot of overhead even though Git itself is quite fast.
    
    Although I'm not 100% sure, I suspect that this was the underlying cause
    of both #2795 and #1642, because:
    1. Both mention Dyad freezing when dealing with larger projects, and
    this issue is far more noticeable for large projects.
    2. Both specifically mention that the freeze happens upon opening their
    project, which is when `collectFiles` runs.
    3. I was able to replicate the crash consistently on Windows 10 and
    inconsistently on Linux Mint by importing a large project into Dyad. I
    don't yet have a good automated test for this, though.
    
    The solution in this PR hands the responsibility of traversing the
    app's files to native Git instead of doing it manually. This means that
    we'll only have one Git invocation per call to the function formerly
    named `collectFiles`.
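
As a rough sketch of the single-invocation idea (not the code in this PR), assuming `git ls-files` is used as the listing command: the names `listFilesWithOneGitCall`, `parseNulSeparated` and `isInExcludedDir` are hypothetical, and the `EXCLUDED_DIRS` set here is only an illustrative subset.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Illustrative subset only; the real EXCLUDED_DIRS list lives in the codebase.
const EXCLUDED_DIRS = new Set<string>(["node_modules", ".next"]);

// Split NUL-separated output (produced by `-z`) into paths,
// dropping the empty entry left by the trailing NUL.
function parseNulSeparated(stdout: string): string[] {
  return stdout.split("\0").filter((entry) => entry.length > 0);
}

// True if any directory component of the path is in EXCLUDED_DIRS.
// `git ls-files` prints forward-slash paths relative to the working directory.
function isInExcludedDir(relativePath: string): boolean {
  return relativePath
    .split("/")
    .slice(0, -1)
    .some((part) => EXCLUDED_DIRS.has(part));
}

// Hypothetical helper: one Git process lists tracked files (--cached) plus
// untracked files that are not gitignored (--others --exclude-standard).
async function listFilesWithOneGitCall(appPath: string): Promise<string[]> {
  const { stdout } = await execFileAsync(
    "git",
    ["ls-files", "--cached", "--others", "--exclude-standard", "-z"],
    { cwd: appPath, maxBuffer: 64 * 1024 * 1024 },
  );
  return parseNulSeparated(stdout).filter((p) => !isInExcludedDir(p));
}
```

With this shape, the cost is one process spawn regardless of project size. Note that `--cached` also lists tracked files that have been deleted from the working tree, so a real implementation would probably need to tolerate missing files when reading them.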
    
    I've also done my best to keep the output of `collectFilesNativeGit` as
    close as possible to the original `collectFiles`. The ordering of the
    files will be different, but I don't think that should make a difference
    given that we later sort them anyway.
    
    Some alternatives I've thought of if we decide we want to keep the
    current traversal logic:
    - Run `git check-ignore` on batches of files (e.g. each result of
    `fsAsync.readdir`) rather than one at a time. This would still result in
    multiple Git calls, though.
    - Run `git check-ignore` on all of the files at once at the end of
    `collectFiles`. We wouldn't be able to prune gitignored directories in
    our traversal, but at least we'd still avoid the directories in
    `EXCLUDED_DIRS`, such as `node_modules` and `.next`.
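
Either alternative could be sketched roughly as follows, assuming paths are fed to `git check-ignore` over stdin with `-z --stdin`. `findIgnoredPaths` and the two small helpers are hypothetical names, not functions from the codebase.

```typescript
import { spawn } from "node:child_process";

// Encode paths as NUL-terminated records for `git check-ignore -z --stdin`.
function encodeNulTerminated(paths: string[]): string {
  return paths.map((p) => p + "\0").join("");
}

// Split NUL-separated output into paths, dropping empty entries.
function decodeNulSeparated(output: string): string[] {
  return output.split("\0").filter((entry) => entry.length > 0);
}

// Hypothetical helper: checks a whole batch of paths with one Git process
// and returns the subset that Git reports as ignored.
function findIgnoredPaths(appPath: string, paths: string[]): Promise<Set<string>> {
  if (paths.length === 0) return Promise.resolve(new Set<string>());
  return new Promise<Set<string>>((resolve, reject) => {
    const child = spawn("git", ["check-ignore", "-z", "--stdin"], { cwd: appPath });
    let stdout = "";
    child.stdout.setEncoding("utf8");
    child.stdout.on("data", (chunk: string) => {
      stdout += chunk;
    });
    child.stderr.resume(); // drain stderr so the child never blocks on it
    child.on("error", reject);
    child.on("close", (code) => {
      // Exit 0: at least one path is ignored; exit 1: none are; other codes are errors.
      if (code === 0 || code === 1) resolve(new Set(decodeNulSeparated(stdout)));
      else reject(new Error(`git check-ignore exited with code ${code}`));
    });
    child.stdin.on("error", reject);
    child.stdin.end(encodeNulTerminated(paths));
  });
}
```

Calling this once per `fsAsync.readdir` result would correspond to the first alternative; calling it once with every collected path would correspond to the second.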
    
    I've left the logic of `collectFiles` untouched for isomorphic-git for
    now. There might be a good way to optimize that as well, but it will
    likely be a bit different because isomorphic-git has different
    capabilities than native Git.
    
    ---------
    Co-authored-by: Claude <noreply@anthropic.com>
git_types.ts 2.1 KB