perf: reduce number of native git calls when extracting a codebase (#3105)
Currently, `collectFiles` calls `isGitIgnored` on each (non-excluded)
recursion. Although there is caching, we still frequently execute Git
just to check whether an individual file or directory is gitignored, so
the number of Git invocations scales with the number of files in the
user's app.
This amounts to a substantial number of Git invocations. For smaller
projects it could be dozens; for larger projects it could be thousands.
It's particularly a problem for native Git, because each `exec` call
comes with a lot of overhead even though Git itself is quite fast.
Although I'm not 100% sure, I suspect that this was the underlying cause
of both #2795 and #1642, because:
1. Both mention Dyad freezing when dealing with larger projects, and
this issue is far more noticeable for large projects.
2. Both specifically mention that the freeze happens upon opening their
project, which is when `collectFiles` runs.
3. I was able to replicate the crash consistently on Windows 10 and
inconsistently on Linux Mint by importing a large project into Dyad. I
don't yet have a good automated test for this, though.
The solution that I wrote for this PR puts the responsibility of
traversing the app's files onto native Git instead of doing it manually.
This means that we'll only have one Git invocation per call to the
function formerly named `collectFiles`.
I've also done my best to keep the output of `collectFilesNativeGit` as
close as possible to the original `collectFiles`. The ordering of the
files will be different, but I don't think that should make a difference
given that we later sort them anyway.
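
As a rough illustration of the "let Git do the traversal" idea, here is a minimal sketch. It is not the code from this PR: the helper names `parseNulSeparated` and `listNonIgnoredFiles` are hypothetical, and the choice of `git ls-files --cached --others --exclude-standard -z` is an assumption about one way to get tracked plus untracked-but-not-ignored files in a single call.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Split NUL-separated Git output (produced by `-z`) into individual paths.
// Hypothetical helper, not part of the PR.
export function parseNulSeparated(output: string): string[] {
  return output.split("\0").filter((entry) => entry.length > 0);
}

// Hypothetical sketch: one Git invocation lists every tracked file plus
// every untracked file that is not gitignored, relative to `appPath`.
export async function listNonIgnoredFiles(appPath: string): Promise<string[]> {
  const { stdout } = await execFileAsync(
    "git",
    ["ls-files", "--cached", "--others", "--exclude-standard", "-z"],
    // Large repos can produce a lot of output; raise the buffer limit.
    { cwd: appPath, maxBuffer: 64 * 1024 * 1024 },
  );
  // Git's ordering differs from a manual directory walk, so sort here
  // if the caller depends on a stable order.
  return parseNulSeparated(stdout).sort();
}
```

Two caveats with this sketch: `--cached` also lists tracked files that have been deleted from the working tree, and tracked files are listed even if they match a gitignore pattern, so a real implementation would likely need extra filtering (including the `EXCLUDED_DIRS` check).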
Some alternatives I've thought of if we decide we want to keep the
current traversal logic:
- Run `git check-ignore` on batches of files (e.g. each result of
`fsAsync.readdir`) rather than one at a time. This would still result in
multiple Git calls, though.
- Run `git check-ignore` on all of the files at once at the end of
`collectFiles`. We wouldn't be able to prune gitignored directories in
our traversal, but at least we'd still avoid the directories in
`EXCLUDED_DIRS`, such as `node_modules` and `.next`.
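
For reference, both alternatives above boil down to feeding many paths to a single `git check-ignore` process. A minimal sketch, assuming `git check-ignore --stdin -z` (NUL-terminated paths in, NUL-terminated ignored paths out); the names `encodePaths`, `decodePaths`, and `checkIgnoredBatch` are hypothetical and not part of this PR:

```typescript
import { spawn } from "node:child_process";

// Encode paths as NUL-terminated records for `git check-ignore --stdin -z`.
export function encodePaths(paths: string[]): string {
  return paths.map((p) => p + "\0").join("");
}

// Decode the NUL-terminated list of ignored paths that Git writes back.
export function decodePaths(output: string): string[] {
  return output.split("\0").filter((p) => p.length > 0);
}

// Hypothetical sketch: check a whole batch of paths with one Git process.
export function checkIgnoredBatch(
  cwd: string,
  paths: string[],
): Promise<Set<string>> {
  return new Promise((resolve, reject) => {
    if (paths.length === 0) {
      resolve(new Set());
      return;
    }
    const child = spawn("git", ["check-ignore", "--stdin", "-z"], { cwd });
    let stdout = "";
    child.stdout.setEncoding("utf8");
    child.stdout.on("data", (chunk: string) => {
      stdout += chunk;
    });
    child.on("error", reject);
    child.stdin.on("error", reject);
    child.on("close", (code) => {
      // Exit code 0: at least one path is ignored; 1: none are ignored.
      // Anything else is treated as a failure.
      if (code === 0 || code === 1) {
        resolve(new Set(decodePaths(stdout)));
      } else {
        reject(new Error(`git check-ignore exited with code ${code}`));
      }
    });
    child.stdin.end(encodePaths(paths));
  });
}
```

Calling this once per `fsAsync.readdir` result would correspond to the first alternative; calling it once with every collected path would correspond to the second.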
I've left the logic of `collectFiles` untouched for isomorphic-git for
now. There might be a good way to optimize that as well, but it will
likely be a bit different because isomorphic-git has different
capabilities than native Git.
---------
Co-authored-by: Claude <noreply@anthropic.com>
Showing src/__tests__/git_utils.test.ts 0 → 100644