Unverified commit eb1ebdb2, author: Ryan Groch, committer: GitHub

remove edit_file tool from pro agent (#3268)

A few notes:

- The fallback rule (i.e. if `search_replace` fails twice, use `write_file` instead) is included.
- I also included instructions to use multiple `search_replace` calls for moderately large edits with distinct sections. My general observation has been that models often lean towards `write_file` rather than `search_replace` when the choice is ambiguous, so I _think_ it should be okay to nudge them towards `search_replace` a little bit more. Please correct me if I'm wrong about this.
- Gemini pointed out that this can lead to a race condition if two `search_replace` calls run simultaneously on the same file. I've added locks to `search_replace` and `write_file` to account for this just in case.
- Another option would be to extend `search_replace` to account for multiple changes so they can get batched, but this would be a larger change.
- I have not changed the basic agent. I can do that if desired.
- I did do some testing to check that models can still manage with the change of prompt. I haven't noticed any issues.

The following snapshots/fixtures have been updated:

- src/\_\_tests\_\_/\_\_snapshots\_\_/local_agent_prompt.test.ts.snap
- e2e-tests/snapshots/local_agent_basic.spec.ts_local-agent---dump-request-1.txt
- e2e-tests/snapshots/local_agent_basic.spec.ts_local-agent---read-then-edit-1.aria.yml
- e2e-tests/snapshots/local_agent_basic.spec.ts_after-edit.txt
- e2e-tests/snapshots/local_agent_advanced.spec.ts_local-agent---mention-apps-1.txt
- e2e-tests/snapshots/local_agent_auto.spec.ts_local-agent---auto-model-1.txt
- e2e-tests/fixtures/engine/local-agent/read-then-edit.ts

Which affect the following tests:

- src/\_\_tests\_\_/local_agent_prompt.test.ts
- e2e-tests/local_agent_basic.spec.ts
- e2e-tests/local_agent_auto.spec.ts
- e2e-tests/local_agent_summarize.spec.ts
- e2e-tests/local_agent_advanced.spec.ts

These tests appear to pass.

This PR would also leave a lot of unused code related to `edit_file`, which might be worth removing (not sure whether to do this).
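The locking mentioned in the notes can be pictured roughly as follows. This is a minimal TypeScript sketch, not the code from this commit: `withFileLock`, `fileLocks`, the in-memory `files` map, and this `searchReplace` helper are hypothetical names, and the sketch assumes both tools run inside a single Node.js process, so one promise chain per path is enough to serialize them.

```typescript
// Hypothetical sketch: serialize async edits that target the same file path.
// Assumes a single Node.js process; none of these names come from the commit.
const fileLocks = new Map<string, Promise<void>>();

async function withFileLock<T>(
  filePath: string,
  task: () => Promise<T>,
): Promise<T> {
  // Whatever is currently queued for this path must finish first.
  const previous = fileLocks.get(filePath) ?? Promise.resolve();
  let release!: () => void;
  const current = new Promise<void>((resolve) => {
    release = resolve;
  });
  // The next caller waits until this task has released the lock.
  const tail = previous.then(() => current);
  fileLocks.set(filePath, tail);

  await previous;
  try {
    return await task();
  } finally {
    release();
    // Drop the entry once nobody else is queued behind this task.
    if (fileLocks.get(filePath) === tail) fileLocks.delete(filePath);
  }
}

// In-memory stand-in for the file system, used only for this illustration.
const files = new Map<string, string>([["src/App.tsx", "alpha beta"]]);

async function searchReplace(
  filePath: string,
  oldString: string,
  newString: string,
): Promise<void> {
  await withFileLock(filePath, async () => {
    const before = files.get(filePath) ?? "";
    // Simulated async gap between read and write, where a race could occur.
    await Promise.resolve();
    // A replacer function avoids special "$" patterns in the replacement.
    files.set(filePath, before.replace(oldString, () => newString));
  });
}
```

Without the lock, two concurrent calls on the same path would both read the original text and the later write would silently discard the earlier one. With it, the second call only reads after the first has written.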
Parent bb2eadfe
 import type { LocalAgentFixture } from "../../../../testing/fake-llm-server/localAgentTypes";
 export const fixture: LocalAgentFixture = {
-description: "Read a file, then edit it with edit_file",
+description: "Read a file, then edit it with search_replace",
 turns: [
 {
 text: "Let me first read the current file contents to understand what we're working with.",
@@ -15,16 +15,14 @@ export const fixture: LocalAgentFixture = {
 ],
 },
 {
-text: "Now I'll update the welcome message to say Hello World instead.",
+text: "Now I'll update the welcome message to say UPDATED imported app instead.",
 toolCalls: [
 {
-name: "edit_file",
+name: "search_replace",
 args: {
-path: "src/App.tsx",
-content: `// ... existing code ...
-const App = () => <div>UPDATED imported app</div>;
-// ... existing code ...`,
-description: "Update welcome message",
+file_path: "src/App.tsx",
+old_string: "const App = () => <div>Minimal imported app</div>;",
+new_string: "const App = () => <div>UPDATED imported app</div>;",
 },
 },
 ],
@@ -34,4 +32,3 @@ const App = () => <div>UPDATED imported app</div>;
 },
 ],
 };
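The fixture's `search_replace` call passes `file_path`, `old_string`, and `new_string`. A minimal sketch of what applying such a call to file contents might look like is shown here. It assumes an exact, unique match is required, which is inferred from the prompt's wording about text that "cannot be matched uniquely". `applySearchReplace` is a hypothetical helper, not the tool's actual implementation.

```typescript
// Hypothetical helper: apply one exact-match replacement to file contents.
// Assumption: the target text must occur exactly once, otherwise the call fails.
function applySearchReplace(
  content: string,
  oldString: string,
  newString: string,
): string {
  if (oldString === "") {
    throw new Error("old_string must not be empty");
  }
  const first = content.indexOf(oldString);
  if (first === -1) {
    throw new Error("old_string not found");
  }
  // A second occurrence anywhere after the first makes the edit ambiguous.
  if (content.indexOf(oldString, first + 1) !== -1) {
    throw new Error("old_string matches more than once");
  }
  return (
    content.slice(0, first) + newString + content.slice(first + oldString.length)
  );
}
```

Failing loudly on a missing or ambiguous match is what gives a caller something to count when deciding whether to fall back to `write_file`.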
@@ -44,36 +44,6 @@
 }
 }
 },
-{
-"type": "function",
-"function": {
-"name": "edit_file",
-"description": "\n## When to Use edit_file\n\nUse the `edit_file` tool when you need to modify **a section or function** within an existing file. The edit output will be read by a less intelligent model, which will quickly apply the edit. You should make it clear what the edit is, while also minimizing the unchanged code you write.\n\n**Use only ONE edit_file call per file.** If you need to make multiple changes to the same file, include all edits in sequence within a single call using `// ... existing code ...` comments between them.\n\n## When NOT to Use edit_file\n\nDo NOT use this tool when:\n- You are making a **small, surgical edit** (1-3 lines) like fixing a typo, renaming a variable, updating a single value, or changing an import. Use `search_replace` instead for these precise changes.\n- You are creating a brand-new file (use `write_file` instead).\n- You are rewriting most of an existing file (in those cases, use `write_file` to output the complete file instead).\n\n## Basic Format\n\nWhen writing the edit, you should specify each edit in sequence, with the special comment // ... existing code ... to represent unchanged code in between edited lines.\n\nBasic example:\n```\nedit_file(path=\"file.js\", instructions=\"I am adding error handling to the fetchData function and updating the return type.\", content=\"\"\"\n// ... existing code ...\nFIRST_EDIT\n// ... existing code ...\nSECOND_EDIT\n// ... existing code ...\nTHIRD_EDIT\n// ... existing code ...\n\"\"\")\n```\n\n## General Principles\n\nYou should bias towards repeating as few lines of the original file as possible to convey the change.\n\nNEVER show unmodified code in the edit, unless sufficient context of unchanged lines around the code you're editing is needed to resolve ambiguity.\n\nDO NOT omit spans of pre-existing code without using the // ... existing code ... comment to indicate its absence.\n\n## Example: Basic Edit\n```\nedit_file(path=\"LandingPage.tsx\", instructions=\"I am changing the return statement in LandingPage to render a div with 'hello' instead of the previous content.\", content=\"\"\"\n// ... existing code ...\n\nconst LandingPage = () => {\n // ... existing code ...\n return (\n <div>hello</div>\n );\n};\n\n// ... existing code ...\n\"\"\")\n```\n\n## Example: Deleting Code\n\n**When deleting code, you must provide surrounding context and leave an explicit comment indicating what was removed.**\n```\nedit_file(path=\"utils.ts\", instructions=\"I am removing the deprecatedHelper function located between currentHelper and anotherHelper.\", content=\"\"\"\n// ... existing code ...\n\nexport function currentHelper() {\n return \"active\";\n}\n\n// REMOVED: deprecatedHelper() function\n\nexport function anotherHelper() {\n return \"working\";\n}\n\n// ... existing code ...\n\"\"\")\n```\n",
-"parameters": {
-"$schema": "http://json-schema.org/draft-07/schema#",
-"type": "object",
-"properties": {
-"path": {
-"type": "string",
-"description": "The file path relative to the app root"
-},
-"content": {
-"type": "string",
-"description": "The updated code snippet to apply"
-},
-"instructions": {
-"description": "Instructions for the edit. A single sentence describing what you are going to do for the sketched edit. This helps the less intelligent model apply the edit correctly. Use first person to describe what you are doing. Don't repeat what you've said in previous messages. Use it to disambiguate any uncertainty in the edit.",
-"type": "string"
-}
-},
-"required": [
-"path",
-"content"
-],
-"additionalProperties": false
-}
-}
-},
 {
 "type": "function",
 "function": {
......
 === src/App.tsx ===
-TURBO EDITED filePath
\ No newline at end of file
+const App = () => <div>UPDATED imported app</div>;
+export default App;
@@ -52,36 +52,6 @@
 }
 }
 },
-{
-"type": "function",
-"function": {
-"name": "edit_file",
-"description": "\n## When to Use edit_file\n\nUse the `edit_file` tool when you need to modify **a section or function** within an existing file. The edit output will be read by a less intelligent model, which will quickly apply the edit. You should make it clear what the edit is, while also minimizing the unchanged code you write.\n\n**Use only ONE edit_file call per file.** If you need to make multiple changes to the same file, include all edits in sequence within a single call using `// ... existing code ...` comments between them.\n\n## When NOT to Use edit_file\n\nDo NOT use this tool when:\n- You are making a **small, surgical edit** (1-3 lines) like fixing a typo, renaming a variable, updating a single value, or changing an import. Use `search_replace` instead for these precise changes.\n- You are creating a brand-new file (use `write_file` instead).\n- You are rewriting most of an existing file (in those cases, use `write_file` to output the complete file instead).\n\n## Basic Format\n\nWhen writing the edit, you should specify each edit in sequence, with the special comment // ... existing code ... to represent unchanged code in between edited lines.\n\nBasic example:\n```\nedit_file(path=\"file.js\", instructions=\"I am adding error handling to the fetchData function and updating the return type.\", content=\"\"\"\n// ... existing code ...\nFIRST_EDIT\n// ... existing code ...\nSECOND_EDIT\n// ... existing code ...\nTHIRD_EDIT\n// ... existing code ...\n\"\"\")\n```\n\n## General Principles\n\nYou should bias towards repeating as few lines of the original file as possible to convey the change.\n\nNEVER show unmodified code in the edit, unless sufficient context of unchanged lines around the code you're editing is needed to resolve ambiguity.\n\nDO NOT omit spans of pre-existing code without using the // ... existing code ... comment to indicate its absence.\n\n## Example: Basic Edit\n```\nedit_file(path=\"LandingPage.tsx\", instructions=\"I am changing the return statement in LandingPage to render a div with 'hello' instead of the previous content.\", content=\"\"\"\n// ... existing code ...\n\nconst LandingPage = () => {\n // ... existing code ...\n return (\n <div>hello</div>\n );\n};\n\n// ... existing code ...\n\"\"\")\n```\n\n## Example: Deleting Code\n\n**When deleting code, you must provide surrounding context and leave an explicit comment indicating what was removed.**\n```\nedit_file(path=\"utils.ts\", instructions=\"I am removing the deprecatedHelper function located between currentHelper and anotherHelper.\", content=\"\"\"\n// ... existing code ...\n\nexport function currentHelper() {\n return \"active\";\n}\n\n// REMOVED: deprecatedHelper() function\n\nexport function anotherHelper() {\n return \"working\";\n}\n\n// ... existing code ...\n\"\"\")\n```\n",
-"parameters": {
-"$schema": "http://json-schema.org/draft-07/schema#",
-"type": "object",
-"properties": {
-"path": {
-"type": "string",
-"description": "The file path relative to the app root"
-},
-"content": {
-"type": "string",
-"description": "The updated code snippet to apply"
-},
-"instructions": {
-"description": "Instructions for the edit. A single sentence describing what you are going to do for the sketched edit. This helps the less intelligent model apply the edit correctly. Use first person to describe what you are doing. Don't repeat what you've said in previous messages. Use it to disambiguate any uncertainty in the edit.",
-"type": "string"
-}
-},
-"required": [
-"path",
-"content"
-],
-"additionalProperties": false
-}
-}
-},
 {
 "type": "function",
 "function": {
......
@@ -15,6 +15,8 @@
 - text: claude-opus-4-5
 - img
 - text: less than a minute ago
+- img
+- text: "Version 2: (1 files changed)"
 - button "Copy Request ID":
 - img
 - text: ""
@@ -22,18 +24,23 @@
 - paragraph: Let me first read the current file contents to understand what we're working with.
 - img
 - text: Read src/App.tsx
-- paragraph: Now I'll update the welcome message to say Hello World instead.
-- button "App.tsx src/App.tsx Turbo Edit":
+- paragraph: Now I'll update the welcome message to say UPDATED imported app instead.
+- button "Search & Replace App.tsx src/App.tsx":
 - img
 - text: ""
 - img
+- text: ""
 - paragraph: Done! I've updated the title from 'Minimal imported app' to 'UPDATED imported app'. The change has been applied successfully.
 - button "Copy":
 - img
 - img
+- text: Approved
+- img
 - text: claude-opus-4-5
 - img
 - text: less than a minute ago
+- img
+- text: "Version 3: (1 files changed)"
 - button "Copy Request ID":
 - img
 - text: ""
......
@@ -56,23 +56,24 @@ You have tools at your disposal to solve the coding task. Follow these rules reg
 <tool_calling_best_practices>
 - **Read before writing**: Use \`read_file\` and \`list_files\` to understand the codebase before making changes
-- **Use \`edit_file\` for edits**: For modifying existing files, prefer \`edit_file\` over \`write_file\`
+- **Prefer \`search_replace\` for edits**: For small to medium edits on existing files, use \`search_replace\` rather than rewriting the whole file
 - **Be surgical**: Only change what's necessary to accomplish the task
 - **Handle errors gracefully**: If a tool fails, explain the issue and suggest alternatives
 </tool_calling_best_practices>
 <file_editing_tool_selection>
-You have three tools for editing files. Choose based on the scope of your change:
+You have two tools for editing files. Choose based on the scope of your change:
 | Scope | Tool | Examples |
 |-------|------|----------|
-| **Small** (a few lines) | \`search_replace\` or \`edit_file\` | Fix a typo, rename a variable, update a value, change an import |
-| **Medium** (one function or section) | \`edit_file\` | Rewrite a function, add a new component, modify multiple related lines |
-| **Large** (most of the file) | \`write_file\` | Major refactor, rewrite a module, create a new file |
+| **Small to medium** (a few lines up to one function or contiguous section) | Single \`search_replace\` | Fix a typo, rename a variable, update a value, change an import, rewrite a function, modify multiple related lines |
+| **Moderately large** (changes spread across multiple parts of the file, up to about half of it) | Multiple \`search_replace\` calls, one per distinct region | Update several functions, change an import plus update its call sites, refactor a few related sections |
+| **Large** (rewriting the majority of the file, or creating a new file) | \`write_file\` | Major refactor that touches most of the file, rewrite a module end-to-end, create a new file |
-**Tips:**
-- \`edit_file\` supports \`// ... existing code ...\` markers to skip unchanged sections
-- When in doubt, prefer \`search_replace\` for precision or \`write_file\` for simplicity
+Lean toward \`search_replace\` when in doubt — for moderately large edits, prefer several targeted \`search_replace\` calls over one \`write_file\`. Use \`write_file\` when less than half of the original file will remain.
+**Fallback rule:**
+If \`search_replace\` fails twice in a row on the same edit (e.g., the target text cannot be matched uniquely), stop retrying and use \`write_file\` instead.
 **Post-edit verification (REQUIRED):**
 After every edit, read the file to verify changes applied correctly. If something went wrong, try a different tool and verify again.
@@ -85,7 +86,7 @@ After every edit, read the file to verify changes applied correctly. If somethin
 **Skip when:** the request is specific and concrete (e.g. "Fix the login button", "Change color from blue to green").
 The tool accepts ONLY a \`questions\` array (no empty objects). It returns the user's answers as the tool result.
 3. **Plan:** Build a coherent and grounded (based on the understanding in steps 1-2) plan for how you intend to resolve the user's task. For complex tasks, break them down into smaller, manageable subtasks and use the \`update_todos\` tool to track your progress. Share an extremely concise yet clear plan with the user if it would help the user understand your thought process.
-4. **Implement:** Use the available tools (e.g., \`edit_file\`, \`write_file\`, ...) to act on the plan, strictly adhering to the project's established conventions. When debugging, add targeted console.log statements to trace data flow and identify root causes. **Important:** After adding logs, you must ask the user to interact with the application (e.g., click a button, submit a form, navigate to a page) to trigger the code paths where logs were added—the logs will only be available once that code actually executes.
+4. **Implement:** Use the available tools (e.g., \`search_replace\`, \`write_file\`, ...) to act on the plan, strictly adhering to the project's established conventions. When debugging, add targeted console.log statements to trace data flow and identify root causes. **Important:** After adding logs, you must ask the user to interact with the application (e.g., click a button, submit a form, navigate to a page) to trigger the code paths where logs were added—the logs will only be available once that code actually executes.
 5. **Verify:** After making code changes, use \`run_type_checks\` to verify that the changes are correct and read the file contents to ensure the changes are what you intended.
 6. **Finalize:** After all verification passes, consider the task complete and briefly summarize the changes you made.
 </development_workflow>
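The fallback rule in this prompt is an instruction to the model rather than harness logic, but the decision it describes can be sketched in TypeScript. Everything here is illustrative: `EditAttemptTracker`, `EditTool`, and the `editId` key are hypothetical names, and the threshold of two consecutive failures is taken from the prompt text.

```typescript
// Hypothetical bookkeeping for the prompt's fallback rule; not code from this commit.
type EditTool = "search_replace" | "write_file";

// "Fails twice in a row on the same edit" per the prompt wording.
const MAX_CONSECUTIVE_FAILURES = 2;

class EditAttemptTracker {
  private readonly failures = new Map<string, number>();

  // Record a failed search_replace attempt for one logical edit.
  recordFailure(editId: string): void {
    this.failures.set(editId, (this.failures.get(editId) ?? 0) + 1);
  }

  // A success ends the streak, so the count starts over.
  recordSuccess(editId: string): void {
    this.failures.delete(editId);
  }

  // After two failures in a row, stop retrying and rewrite the file instead.
  nextTool(editId: string): EditTool {
    const count = this.failures.get(editId) ?? 0;
    return count >= MAX_CONSECUTIVE_FAILURES ? "write_file" : "search_replace";
  }
}
```

Keying the count by edit rather than by file keeps an unrelated earlier failure in the same file from forcing a full rewrite.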
......
 # Evals
-LLM eval suite for tool-use quality. Six suites run the same 16 cases and
+LLM eval suite for tool-use quality. Five suites run the same 16 cases and
 the same three models (Claude Sonnet 4.6, GPT 5.4, Gemini 3 Flash) but with
 different tool sets and system prompts:
 | Suite name | Tools available | System prompt |
-| ------------------------ | ------------------------------------------- | -------------------------------------------- |
+| ------------------------ | ------------------------------ | -------------------------------------------- |
 | `search_replace` | `search_replace` only | Minimal custom "precise code editor" prompt |
 | `search_replace_few` | `search_replace` only | Variant prompt encouraging fewer tool calls |
-| `edit_file` | `edit_file` only | Minimal custom `edit_file` prompt |
 | `basic_agent` | `search_replace`, `write_file` | Production `LOCAL_AGENT_BASIC_SYSTEM_PROMPT` |
-| `pro_agent` | `search_replace`, `edit_file`, `write_file` | Production `LOCAL_AGENT_SYSTEM_PROMPT` (Pro) |
-| `pro_agent_experimental` | `search_replace`, `edit_file`, `write_file` | Editable copy of the Pro prompt for tweaking |
+| `pro_agent` | `search_replace`, `write_file` | Production `LOCAL_AGENT_SYSTEM_PROMPT` (Pro) |
+| `pro_agent_experimental` | `search_replace`, `write_file` | Editable copy of the Pro prompt for tweaking |
 Each case gives the model a real source file plus an editing instruction,
 runs the model with the suite's tools wired up, applies the produced edits,
@@ -21,9 +20,7 @@ instruction.
 ## Prerequisites
 All models are routed through the Dyad Engine gateway, so you only need one
-credential: a Dyad Pro API key, exposed as `DYAD_PRO_API_KEY`. The
-`edit_file` tool additionally calls the engine's `/tools/turbo-file-edit`
-endpoint to apply sketched edits — that uses the same key.
+credential: a Dyad Pro API key, exposed as `DYAD_PRO_API_KEY`.
 The suite is skipped entirely when `DYAD_PRO_API_KEY` is unset — no tests will
 fail, they just won't run. This keeps regular `vitest run` safe for contributors
@@ -63,11 +60,9 @@ EVAL_SUITE=all EVAL_MODEL=all DYAD_PRO_API_KEY="..." npm run eval
 **Heads up — this is expensive.** A full `all`/`all` run issues one
 generation per (suite × model × case) triple plus one judge call per case,
-across 6 suites, 3 models, and 16 cases. The `edit_file`, `pro_agent`, and
-`pro_agent_experimental` suites also make additional engine calls for each
-sketched edit the model produces through `edit_file`. Expect dozens of LLM requests, some of which run reasoning
-models on 300+ line fixtures. Use sparingly; prefer narrow filters during
-development.
+across 5 suites, 3 models, and 16 cases. Expect dozens of LLM requests,
+some of which run reasoning models on 300+ line fixtures. Use sparingly;
+prefer narrow filters during development.
 ### Running a single suite
@@ -82,13 +77,13 @@ EVAL_SUITE=search_replace EVAL_MODEL=all DYAD_PRO_API_KEY="..." npm run eval
 # The basic_agent suite (Basic agent prompt, search_replace + write_file)
 EVAL_SUITE=basic_agent EVAL_MODEL=all DYAD_PRO_API_KEY="..." npm run eval
-# The pro_agent suite (Pro agent prompt, search_replace + edit_file + write_file)
+# The pro_agent suite (Pro agent prompt, search_replace + write_file)
 EVAL_SUITE=pro_agent EVAL_MODEL=all DYAD_PRO_API_KEY="..." npm run eval
``` ```
 Note: `EVAL_SUITE` matches suite `name`s exactly (case-insensitive), and
 accepts a comma-separated list for multiple suites (e.g.
-`EVAL_SUITE=search_replace,edit_file`). Unknown names error out with the
+`EVAL_SUITE=search_replace,basic_agent`). Unknown names error out with the
 available list.
 ### Running a single case
@@ -173,7 +168,6 @@ directory:
 - `eval-results/search_replace/`
 - `eval-results/search_replace_few/`
-- `eval-results/edit_file/`
 - `eval-results/basic_agent/`
 - `eval-results/pro_agent/`
 - `eval-results/pro_agent_experimental/`
@@ -206,8 +200,8 @@ one folder. Folder names sort chronologically under `ls`.
 - `toolCalls` — every tool call the model made. Each entry records
 `toolName`, `filePath`, an `args` map (keyed by the tool's parameter names,
 so `old_string`/`new_string` for `search_replace`, `content` for
-`write_file`, `content`/`instructions` for `edit_file`), the file before
-and after the call, and a unified diff of just that call.
+`write_file`), the file before and after the call, and a unified diff of
+just that call.
 - `diff` — unified diff from the original fixture to the final file
 (i.e. the cumulative effect of all tool calls).
 - `judge` — the judge's verdict: `label`, `modelName`, `durationMs`,
@@ -251,5 +245,4 @@ call. The split view contains the raw pieces as standalone files:
 target file's extension (for syntax highlighting); non-string args become
 JSON blobs. So a `search_replace` call produces `old_string.ts` and
 `new_string.ts`; a `write_file` call produces `content.ts` and
-`description.ts`; an `edit_file` call produces `content.ts` and
-`instructions.ts`.
+`description.ts`.
@@ -48,7 +48,7 @@ export interface ToolCallRecord {
 filePath: string;
 // Raw tool input arguments, keyed by the tool's parameter names
 // (e.g. `old_string`/`new_string` for search_replace, `content` for
-// write_file, `content`/`instructions` for edit_file).
+// write_file).
 args: Record<string, unknown>;
 fileBefore: string;
 fileAfter: string;
......
@@ -13,8 +13,7 @@ export type EvalProvider = "anthropic" | "openai" | "google";
 // the judge model.
 export const GPT_5_4 = "gpt-5.4";
-// Single source of truth for the Dyad Engine URL across the eval helpers
-// and any out-of-band fetches the harness makes (e.g. turbo-file-edit).
+// Single source of truth for the Dyad Engine URL across the eval helpers.
 export const DYAD_ENGINE_URL =
 process.env.DYAD_ENGINE_URL ?? "https://engine.dyad.sh/v1";
......
@@ -16,11 +16,6 @@ export const SIMPLE_SEARCH_REPLACE_SYSTEM_PROMPT =
 "use the search_replace tool. You may call it multiple times " +
 "to make sequential edits. Do not explain.";
-export const SIMPLE_EDIT_FILE_SYSTEM_PROMPT =
-"You are a precise code editor. When asked to change a file, " +
-"use the edit_file tool. You may call it multiple times " +
-"to make sequential edits. Do not explain.";
 export const SEARCH_REPLACE_FEW_SYSTEM_PROMPT =
 SIMPLE_SEARCH_REPLACE_SYSTEM_PROMPT +
 " Aim to use as few tool calls as possible — ideally a single call " +
@@ -88,23 +83,24 @@ You have tools at your disposal to solve the coding task. Follow these rules reg
 const PRO_TOOL_CALLING_BEST_PRACTICES_BLOCK = `<tool_calling_best_practices>
 - **Read before writing**: Use \`read_file\` and \`list_files\` to understand the codebase before making changes
-- **Use \`edit_file\` for edits**: For modifying existing files, prefer \`edit_file\` over \`write_file\`
+- **Prefer \`search_replace\` for edits**: For small to medium edits on existing files, use \`search_replace\` rather than rewriting the whole file
 - **Be surgical**: Only change what's necessary to accomplish the task
 - **Handle errors gracefully**: If a tool fails, explain the issue and suggest alternatives
 </tool_calling_best_practices>`;
 const PRO_FILE_EDITING_TOOL_SELECTION_BLOCK = `<file_editing_tool_selection>
-You have three tools for editing files. Choose based on the scope of your change:
+You have two tools for editing files. Choose based on the scope of your change:
 | Scope | Tool | Examples |
 |-------|------|----------|
-| **Small** (a few lines) | \`search_replace\` or \`edit_file\` | Fix a typo, rename a variable, update a value, change an import |
-| **Medium** (one function or section) | \`edit_file\` | Rewrite a function, add a new component, modify multiple related lines |
-| **Large** (most of the file) | \`write_file\` | Major refactor, rewrite a module, create a new file |
+| **Small to medium** (a few lines up to one function or contiguous section) | Single \`search_replace\` | Fix a typo, rename a variable, update a value, change an import, rewrite a function, modify multiple related lines |
+| **Moderately large** (changes spread across multiple parts of the file, up to about half of it) | Multiple \`search_replace\` calls, one per distinct region | Update several functions, change an import plus update its call sites, refactor a few related sections |
+| **Large** (rewriting the majority of the file, or creating a new file) | \`write_file\` | Major refactor that touches most of the file, rewrite a module end-to-end, create a new file |
-**Tips:**
-- \`edit_file\` supports \`// ... existing code ...\` markers to skip unchanged sections
-- When in doubt, prefer \`search_replace\` for precision or \`write_file\` for simplicity
+Lean toward \`search_replace\` when in doubt — for moderately large edits, prefer several targeted \`search_replace\` calls over one \`write_file\`. Use \`write_file\` when less than half of the original file will remain.
+**Fallback rule:**
+If \`search_replace\` fails twice in a row on the same edit (e.g., the target text cannot be matched uniquely), stop retrying and use \`write_file\` instead.
 **Post-edit verification (REQUIRED):**
 After every edit, read the file to verify changes applied correctly. If something went wrong, try a different tool and verify again.
...@@ -117,7 +113,7 @@ const PRO_DEVELOPMENT_WORKFLOW_BLOCK = `<development_workflow> ...@@ -117,7 +113,7 @@ const PRO_DEVELOPMENT_WORKFLOW_BLOCK = `<development_workflow>
**Skip when:** the request is specific and concrete (e.g. "Fix the login button", "Change color from blue to green"). **Skip when:** the request is specific and concrete (e.g. "Fix the login button", "Change color from blue to green").
The tool accepts ONLY a \`questions\` array (no empty objects). It returns the user's answers as the tool result. The tool accepts ONLY a \`questions\` array (no empty objects). It returns the user's answers as the tool result.
3. **Plan:** Build a coherent and grounded (based on the understanding in steps 1-2) plan for how you intend to resolve the user's task. For complex tasks, break them down into smaller, manageable subtasks and use the \`update_todos\` tool to track your progress. Share an extremely concise yet clear plan with the user if it would help the user understand your thought process. 3. **Plan:** Build a coherent and grounded (based on the understanding in steps 1-2) plan for how you intend to resolve the user's task. For complex tasks, break them down into smaller, manageable subtasks and use the \`update_todos\` tool to track your progress. Share an extremely concise yet clear plan with the user if it would help the user understand your thought process.
4. **Implement:** Use the available tools (e.g., \`edit_file\`, \`write_file\`, ...) to act on the plan, strictly adhering to the project's established conventions. When debugging, add targeted console.log statements to trace data flow and identify root causes. **Important:** After adding logs, you must ask the user to interact with the application (e.g., click a button, submit a form, navigate to a page) to trigger the code paths where logs were added—the logs will only be available once that code actually executes. 4. **Implement:** Use the available tools (e.g., \`search_replace\`, \`write_file\`, ...) to act on the plan, strictly adhering to the project's established conventions. When debugging, add targeted console.log statements to trace data flow and identify root causes. **Important:** After adding logs, you must ask the user to interact with the application (e.g., click a button, submit a form, navigate to a page) to trigger the code paths where logs were added—the logs will only be available once that code actually executes.
5. **Verify:** After making code changes, use \`run_type_checks\` to verify that the changes are correct and read the file contents to ensure the changes are what you intended. 5. **Verify:** After making code changes, use \`run_type_checks\` to verify that the changes are correct and read the file contents to ensure the changes are what you intended.
6. **Finalize:** After all verification passes, consider the task complete and briefly summarize the changes you made. 6. **Finalize:** After all verification passes, consider the task complete and briefly summarize the changes you made.
</development_workflow>`; </development_workflow>`;
......
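The scope table in the prompt can be read as a small decision procedure. The sketch below is only an illustration of that reading: `chooseEditTool`, its option names, and the use of line counts as a proxy for "how much of the file remains" are assumptions, not code from this PR. The 0.5 cutoff mirrors the prompt's "less than half of the original file will remain".

```typescript
// Hypothetical sketch of the tool-selection table; not part of the PR.
type EditPlan =
  | { tool: "search_replace"; calls: number }
  | { tool: "write_file" };

interface EditScope {
  fileExists: boolean; // false when the edit creates a new file
  totalLines: number; // size of the original file
  changedLines: number; // lines the edit would rewrite (assumed proxy)
  regions: number; // number of distinct, non-adjacent regions touched
}

function chooseEditTool(scope: EditScope): EditPlan {
  // New or empty files have nothing to search in, so write them whole.
  if (!scope.fileExists || scope.totalLines === 0) {
    return { tool: "write_file" };
  }
  const remainingFraction = 1 - scope.changedLines / scope.totalLines;
  // "Use write_file when less than half of the original file will remain."
  if (remainingFraction < 0.5) {
    return { tool: "write_file" };
  }
  // Otherwise lean toward search_replace, one call per distinct region.
  return { tool: "search_replace", calls: Math.max(1, scope.regions) };
}
```

An agent does not compute these numbers explicitly; the sketch only makes the boundaries of the three rows concrete.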
@@ -2,10 +2,8 @@ import { describe, it } from "vitest";
 import { generateText, stepCountIs, type Tool } from "ai";
 import { readFileSync } from "node:fs";
 import { basename, resolve } from "node:path";
-import { randomUUID } from "node:crypto";
 import { searchReplaceTool } from "@/pro/main/ipc/handlers/local_agent/tools/search_replace";
 import { writeFileTool } from "@/pro/main/ipc/handlers/local_agent/tools/write_file";
-import { editFileTool } from "@/pro/main/ipc/handlers/local_agent/tools/edit_file";
 import { applySearchReplace } from "@/pro/main/ipc/processors/search_replace_processor";
 import { escapeSearchReplaceMarkers } from "@/pro/shared/search_replace_markers";
 import { constructLocalAgentPrompt } from "@/prompts/local_agent_prompt";
@@ -14,7 +12,6 @@ import {
   GEMINI_3_FLASH,
 } from "@/ipc/shared/language_model_constants";
 import {
-  DYAD_ENGINE_URL,
   GPT_5_4,
   getEvalModel,
   hasDyadProKey,
@@ -31,7 +28,6 @@ import {
 import { createUnifiedDiff } from "./helpers/unified_diff";
 import {
   SIMPLE_SEARCH_REPLACE_SYSTEM_PROMPT,
-  SIMPLE_EDIT_FILE_SYSTEM_PROMPT,
   SEARCH_REPLACE_FEW_SYSTEM_PROMPT,
   PRO_AGENT_EXPERIMENTAL_SYSTEM_PROMPT,
 } from "./helpers/prompts";
...@@ -435,53 +431,6 @@ function applySearchReplaceEdit( ...@@ -435,53 +431,6 @@ function applySearchReplaceEdit(
return applied.content!; return applied.content!;
} }
// Stand-in for the production `edit_file` tool's engine call. Mirrors
// `callTurboFileEdit` in src/pro/main/ipc/handlers/local_agent/tools/edit_file.ts
// but reaches the engine directly (no AgentContext required). The base URL is
// imported from `helpers/get_eval_model` so this and the SDK provider can't
// drift apart.
async function turboFileEdit(params: {
path: string;
content: string;
originalContent: string;
instructions?: string;
signal?: AbortSignal;
}): Promise<string> {
const apiKey = process.env.DYAD_PRO_API_KEY;
if (!apiKey) {
throw new Error(
"DYAD_PRO_API_KEY is required to run eval suites that use edit_file",
);
}
const response = await fetch(`${DYAD_ENGINE_URL}/tools/turbo-file-edit`, {
method: "POST",
headers: {
"Content-Type": "application/json",
Authorization: `Bearer ${apiKey}`,
"X-Dyad-Request-Id": randomUUID(),
},
body: JSON.stringify({
path: params.path,
content: params.content,
originalContent: params.originalContent,
instructions: params.instructions ?? "",
}),
signal: params.signal,
});
if (!response.ok) {
const errorText = await response.text();
throw new Error(
`turbo-file-edit failed: ${response.status} ${response.statusText} - ${errorText}`,
);
}
const data = (await response.json()) as { result?: unknown };
if (typeof data.result !== "string") {
throw new Error("turbo-file-edit returned unexpected payload (no result)");
}
return data.result;
}
// ── Tool factories ───────────────────────────────────────────── // ── Tool factories ─────────────────────────────────────────────
// //
// Each factory returns an AI-SDK tool whose `execute` mutates the // Each factory returns an AI-SDK tool whose `execute` mutates the
...@@ -628,66 +577,6 @@ function writeFileHarnessTool( ...@@ -628,66 +577,6 @@ function writeFileHarnessTool(
}; };
} }
function editFileHarnessTool(
state: ToolRunState,
c: EvalCase,
label: string,
): Tool {
return {
description: editFileTool.description,
inputSchema: editFileTool.inputSchema,
execute: async (args) => {
const fileBefore = state.content;
const recordArgs = {
path: args.path,
content: args.content,
instructions: args.instructions ?? "",
};
try {
if (!pathMatchesCase(args.path, c.fileName)) {
throw new Error(
`${label} / ${c.name} edit_file targeted wrong file: ` +
`got "${args.path}", expected "${c.fileName}"`,
);
}
const newContent = await turboFileEdit({
path: args.path,
content: args.content,
originalContent: state.content,
instructions: args.instructions,
signal: state.abortSignal,
});
state.content = newContent;
state.toolCalls.push(
makeRecord(
"edit_file",
args.path,
recordArgs,
fileBefore,
state.content,
state.toolCalls.length,
),
);
return `Successfully edited ${args.path}`;
} catch (err) {
const message = err instanceof Error ? err.message : String(err);
state.toolCalls.push(
makeRecord(
"edit_file",
args.path ?? c.fileName,
recordArgs,
fileBefore,
fileBefore,
state.toolCalls.length,
{ succeeded: false, error: message },
),
);
throw err;
}
},
};
}
// ── Suite configs ────────────────────────────────────────────── // ── Suite configs ──────────────────────────────────────────────
interface SuiteConfig { interface SuiteConfig {
@@ -718,14 +607,6 @@ const SUITES: SuiteConfig[] = [
       search_replace: searchReplaceHarnessTool(state, c, label),
     }),
   },
-  {
-    name: "edit_file",
-    displayName: "edit_file",
-    systemPrompt: SIMPLE_EDIT_FILE_SYSTEM_PROMPT,
-    buildTools: (state, c, label) => ({
-      edit_file: editFileHarnessTool(state, c, label),
-    }),
-  },
   {
     name: "basic_agent",
     displayName: "basic_agent (search_replace + write_file)",
@@ -739,11 +620,10 @@ const SUITES: SuiteConfig[] = [
   },
   {
     name: "pro_agent",
-    displayName: "pro_agent (search_replace + edit_file + write_file)",
+    displayName: "pro_agent (search_replace + write_file)",
     systemPrompt: constructLocalAgentPrompt(undefined),
     buildTools: (state, c, label) => ({
       search_replace: searchReplaceHarnessTool(state, c, label),
-      edit_file: editFileHarnessTool(state, c, label),
       write_file: writeFileHarnessTool(state, c, label),
     }),
   },
@@ -756,7 +636,6 @@ const SUITES: SuiteConfig[] = [
     systemPrompt: PRO_AGENT_EXPERIMENTAL_SYSTEM_PROMPT,
     buildTools: (state, c, label) => ({
       search_replace: searchReplaceHarnessTool(state, c, label),
-      edit_file: editFileHarnessTool(state, c, label),
       write_file: writeFileHarnessTool(state, c, label),
     }),
   },
@@ -981,7 +860,7 @@ async function runCase(
 // against every model by accident is expensive, so the caller must opt
 // in explicitly. Use `all` to mean "run everything". `EVAL_SUITE` matches
 // suite names exactly (comma-separated for multiple, e.g.
-// `EVAL_SUITE=search_replace,edit_file`) so that `search_replace` does
+// `EVAL_SUITE=search_replace,basic_agent`) so that `search_replace` does
 // not also pick up `search_replace_few`. `EVAL_MODEL` is a
 // case-insensitive substring match against model label or id.
......
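The comment above describes two different matching rules: exact, comma-separated matching for `EVAL_SUITE` and case-insensitive substring matching for `EVAL_MODEL`. A minimal sketch of both follows. `selectSuites`, `selectModels`, and `ModelInfo` are hypothetical names, and treating an unset `EVAL_SUITE` as "run nothing" is an assumption drawn from the "must opt in explicitly" wording; the real harness may implement this differently.

```typescript
// Hypothetical helpers for illustration; not the eval harness's own code.
interface ModelInfo {
  label: string;
  id: string;
}

function selectSuites(
  allNames: readonly string[],
  evalSuite: string | undefined,
): string[] {
  // Assumption: without an explicit opt-in, no suite runs.
  if (!evalSuite) return [];
  const requested = evalSuite
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
  if (requested.includes("all")) return [...allNames];
  // Exact comparison, so "search_replace" does not select "search_replace_few".
  return allNames.filter((name) => requested.includes(name));
}

function selectModels(
  models: readonly ModelInfo[],
  evalModel: string,
): ModelInfo[] {
  const needle = evalModel.toLowerCase();
  // Case-insensitive substring match against either the label or the id.
  return models.filter(
    (m) =>
      m.label.toLowerCase().includes(needle) ||
      m.id.toLowerCase().includes(needle),
  );
}
```

A prefix or substring match for suites would make `EVAL_SUITE=search_replace` silently run `search_replace_few` as well, which is the cost the comment is trying to avoid.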
 const locks = new Map<number | string, Promise<void>>();
+/**
+ * Build the lock ID used to serialize writes to a single file path.
+ * Some tool calls (e.g. `write_file` and `search_replace`) must use
+ * this so they don't race against each other on the same file.
+ */
+export function getFileWriteKey(filePath: string): string {
+  return `filewrite:${filePath}`;
+}
 /**
  * Executes a function with a lock on the lock ID.
  * Uses promise-chaining so that queued operations execute serially,
......
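The body of `withLock` is elided in this view, so the following is only a sketch of the promise-chaining idea its doc comment describes, not the project's implementation. `withKeyedLock`, `chains`, `files`, and `appendLine` are hypothetical names; the `filewrite:` key prefix is taken from `getFileWriteKey` above. The in-memory `files` map stands in for the filesystem so the read-modify-write race is easy to see.

```typescript
// Hypothetical sketch of a promise-chaining keyed lock; not the PR's code.
const chains = new Map<string, Promise<void>>();

async function withKeyedLock<T>(
  key: string,
  fn: () => Promise<T>,
): Promise<T> {
  // Queue behind whatever is already running for this key.
  const previous = chains.get(key) ?? Promise.resolve();
  const result = previous.then(() => fn());
  // The stored tail swallows errors so one failure does not block later callers.
  const tail: Promise<void> = result.then(
    () => undefined,
    () => undefined,
  );
  chains.set(key, tail);
  try {
    return await result;
  } finally {
    // Drop the entry once nothing else has queued behind this call.
    if (chains.get(key) === tail) chains.delete(key);
  }
}

// In-memory stand-in for a file, used to demonstrate a read-modify-write edit.
const files = new Map<string, string>();

async function appendLine(path: string, line: string): Promise<void> {
  await withKeyedLock(`filewrite:${path}`, async () => {
    const before = files.get(path) ?? "";
    // Simulated I/O gap between the read and the write.
    await new Promise<void>((resolve) => setTimeout(resolve, 5));
    files.set(path, before + line + "\n");
  });
}
```

Without the lock, two concurrent `appendLine` calls on the same path would both read the same `before` value and the later write would discard the earlier one, which is the race the PR description mentions for simultaneous `search_replace` calls.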
@@ -21,7 +21,6 @@ import { getSupabaseProjectInfoTool } from "./tools/get_supabase_project_info";
 import { setChatSummaryTool } from "./tools/set_chat_summary";
 import { addIntegrationTool } from "./tools/add_integration";
 import { readLogsTool } from "./tools/read_logs";
-import { editFileTool } from "./tools/edit_file";
 import { searchReplaceTool } from "./tools/search_replace";
 import { webSearchTool } from "./tools/web_search";
 import { webCrawlTool } from "./tools/web_crawl";
@@ -70,7 +69,6 @@ function getToolErrorSummary(error: unknown): string {
 // Combined tool definitions array
 export const TOOL_DEFINITIONS: readonly ToolDefinition[] = [
   writeFileTool,
-  editFileTool,
   searchReplaceTool,
   copyFileTool,
   deleteFileTool,
@@ -420,7 +418,6 @@ function trackFileEditTool(
   if (!ctx.fileEditTracker[filePath]) {
     ctx.fileEditTracker[filePath] = {
       write_file: 0,
-      edit_file: 0,
       search_replace: 0,
     };
   }
......
import fs from "node:fs";
import path from "node:path";
import { z } from "zod";
import log from "electron-log";
import { ToolDefinition, AgentContext, escapeXmlAttr } from "./types";
import { safeJoin } from "@/ipc/utils/path_utils";
import { deploySupabaseFunction } from "../../../../../../supabase_admin/supabase_management_client";
import {
isServerFunction,
isSharedServerModule,
} from "../../../../../../supabase_admin/supabase_utils";
import { engineFetch } from "./engine_fetch";
import { DyadError, DyadErrorKind } from "@/errors/dyad_error";
import { queueCloudSandboxSnapshotSync } from "@/ipc/utils/cloud_sandbox_provider";
const readFile = fs.promises.readFile;
const logger = log.scope("edit_file");
const editFileSchema = z.object({
path: z.string().describe("The file path relative to the app root"),
content: z.string().describe("The updated code snippet to apply"),
instructions: z
.string()
.optional()
.describe(
"Instructions for the edit. A single sentence describing what you are going to do for the sketched edit. This helps the less intelligent model apply the edit correctly. Use first person to describe what you are doing. Don't repeat what you've said in previous messages. Use it to disambiguate any uncertainty in the edit.",
),
});
const turboFileEditResponseSchema = z.object({
result: z.string(),
});
async function callTurboFileEdit(
params: {
path: string;
content: string;
originalContent: string;
instructions?: string;
},
ctx: AgentContext,
): Promise<string> {
const response = await engineFetch(ctx, "/tools/turbo-file-edit", {
method: "POST",
body: JSON.stringify({
path: params.path,
content: params.content,
originalContent: params.originalContent,
instructions: params.instructions ?? "",
}),
});
if (!response.ok) {
const errorText = await response.text();
throw new Error(
`File edit failed: ${response.status} ${response.statusText} - ${errorText}`,
);
}
const data = turboFileEditResponseSchema.parse(await response.json());
return data.result;
}
const DESCRIPTION = `
## When to Use edit_file
Use the \`edit_file\` tool when you need to modify **a section or function** within an existing file. The edit output will be read by a less intelligent model, which will quickly apply the edit. You should make it clear what the edit is, while also minimizing the unchanged code you write.
**Use only ONE edit_file call per file.** If you need to make multiple changes to the same file, include all edits in sequence within a single call using \`// ... existing code ...\` comments between them.
## When NOT to Use edit_file
Do NOT use this tool when:
- You are making a **small, surgical edit** (1-3 lines) like fixing a typo, renaming a variable, updating a single value, or changing an import. Use \`search_replace\` instead for these precise changes.
- You are creating a brand-new file (use \`write_file\` instead).
- You are rewriting most of an existing file (in those cases, use \`write_file\` to output the complete file instead).
## Basic Format
When writing the edit, you should specify each edit in sequence, with the special comment // ... existing code ... to represent unchanged code in between edited lines.
Basic example:
\`\`\`
edit_file(path="file.js", instructions="I am adding error handling to the fetchData function and updating the return type.", content="""
// ... existing code ...
FIRST_EDIT
// ... existing code ...
SECOND_EDIT
// ... existing code ...
THIRD_EDIT
// ... existing code ...
""")
\`\`\`
## General Principles
You should bias towards repeating as few lines of the original file as possible to convey the change.
NEVER show unmodified code in the edit, unless sufficient context of unchanged lines around the code you're editing is needed to resolve ambiguity.
DO NOT omit spans of pre-existing code without using the // ... existing code ... comment to indicate its absence.
## Example: Basic Edit
\`\`\`
edit_file(path="LandingPage.tsx", instructions="I am changing the return statement in LandingPage to render a div with 'hello' instead of the previous content.", content="""
// ... existing code ...
const LandingPage = () => {
// ... existing code ...
return (
<div>hello</div>
);
};
// ... existing code ...
""")
\`\`\`
## Example: Deleting Code
**When deleting code, you must provide surrounding context and leave an explicit comment indicating what was removed.**
\`\`\`
edit_file(path="utils.ts", instructions="I am removing the deprecatedHelper function located between currentHelper and anotherHelper.", content="""
// ... existing code ...
export function currentHelper() {
return "active";
}
// REMOVED: deprecatedHelper() function
export function anotherHelper() {
return "working";
}
// ... existing code ...
""")
\`\`\`
`;
export const editFileTool: ToolDefinition<z.infer<typeof editFileSchema>> = {
name: "edit_file",
description: DESCRIPTION,
inputSchema: editFileSchema,
defaultConsent: "always",
modifiesState: true,
// Requires Dyad Pro engine API
isEnabled: (ctx) => ctx.isDyadPro,
getConsentPreview: (args) => `Edit ${args.path}`,
buildXml: (args, isComplete) => {
if (!args.path) return undefined;
let xml = `<dyad-edit path="${escapeXmlAttr(args.path)}" description="${escapeXmlAttr(args.instructions ?? "")}">\n${args.content ?? ""}`;
if (isComplete) {
xml += "\n</dyad-edit>";
}
return xml;
},
execute: async (args, ctx: AgentContext) => {
const fullFilePath = safeJoin(ctx.appPath, args.path);
// Track if this is a shared module
if (isSharedServerModule(args.path)) {
ctx.isSharedModulesChanged = true;
}
// Read original file content
if (!fs.existsSync(fullFilePath)) {
throw new DyadError(
`File does not exist: ${args.path}`,
DyadErrorKind.NotFound,
);
}
const originalContent = await readFile(fullFilePath, "utf8");
// Call the turbo-file-edit endpoint
const newContent = await callTurboFileEdit(
{
path: args.path,
content: args.content,
originalContent,
instructions: args.instructions,
},
ctx,
);
if (!newContent) {
throw new Error(
"Failed to extract content from turbo-file-edit response",
);
}
// Ensure directory exists
const dirPath = path.dirname(fullFilePath);
fs.mkdirSync(dirPath, { recursive: true });
// Write file content
fs.writeFileSync(fullFilePath, newContent);
logger.log(`Successfully edited file: ${fullFilePath}`);
queueCloudSandboxSnapshotSync({
appId: ctx.appId,
changedPaths: [args.path],
});
// Deploy Supabase function if applicable
if (
ctx.supabaseProjectId &&
isServerFunction(args.path) &&
!ctx.isSharedModulesChanged
) {
try {
await deploySupabaseFunction({
supabaseProjectId: ctx.supabaseProjectId,
functionName: path.basename(path.dirname(args.path)),
appPath: ctx.appPath,
organizationSlug: ctx.supabaseOrganizationSlug ?? null,
});
} catch (error) {
return `File edited, but failed to deploy Supabase function: ${error}`;
}
}
return `Successfully edited ${args.path}`;
},
};
@@ -19,6 +19,7 @@ import {
 import { sendTelemetryEvent } from "@/ipc/utils/telemetry";
 import { DyadError, DyadErrorKind } from "@/errors/dyad_error";
 import { queueCloudSandboxSnapshotSync } from "@/ipc/utils/cloud_sandbox_provider";
+import { withLock, getFileWriteKey } from "@/ipc/utils/lock_utils";
 const logger = log.scope("search_replace");
@@ -106,40 +107,42 @@ CRITICAL REQUIREMENTS FOR USING THIS TOOL:
       ctx.isSharedModulesChanged = true;
     }
-    if (!fs.existsSync(fullFilePath)) {
-      throw new DyadError(
-        `File does not exist: ${args.file_path}`,
-        DyadErrorKind.NotFound,
-      );
-    }
-    const original = await fs.promises.readFile(fullFilePath, "utf8");
-    // Construct the operations string in the expected format
-    const escapedOld = escapeSearchReplaceMarkers(args.old_string);
-    const escapedNew = escapeSearchReplaceMarkers(args.new_string);
-    const operations = `<<<<<<< SEARCH\n${escapedOld}\n=======\n${escapedNew}\n>>>>>>> REPLACE`;
-    const result = applySearchReplace(original, operations);
-    if (!result.success || typeof result.content !== "string") {
-      sendTelemetryEvent("local_agent:search_replace:failure", {
-        filePath: args.file_path,
-        error: result.error ?? "unknown",
-      });
-      throw new Error(
-        `Failed to apply search-replace: ${result.error ?? "unknown"}`,
-      );
-    }
-    await fs.promises.writeFile(fullFilePath, result.content);
-    logger.log(`Successfully applied search-replace to: ${fullFilePath}`);
-    queueCloudSandboxSnapshotSync({
-      appId: ctx.appId,
-      changedPaths: [args.file_path],
-    });
-    sendTelemetryEvent("local_agent:search_replace:success", {
-      filePath: args.file_path,
-    });
+    await withLock(getFileWriteKey(fullFilePath), async () => {
+      if (!fs.existsSync(fullFilePath)) {
+        throw new DyadError(
+          `File does not exist: ${args.file_path}`,
+          DyadErrorKind.NotFound,
+        );
+      }
+      const original = await fs.promises.readFile(fullFilePath, "utf8");
+      // Construct the operations string in the expected format
+      const escapedOld = escapeSearchReplaceMarkers(args.old_string);
+      const escapedNew = escapeSearchReplaceMarkers(args.new_string);
+      const operations = `<<<<<<< SEARCH\n${escapedOld}\n=======\n${escapedNew}\n>>>>>>> REPLACE`;
+      const result = applySearchReplace(original, operations);
+      if (!result.success || typeof result.content !== "string") {
+        sendTelemetryEvent("local_agent:search_replace:failure", {
+          filePath: args.file_path,
+          error: result.error ?? "unknown",
+        });
+        throw new Error(
+          `Failed to apply search-replace: ${result.error ?? "unknown"}`,
+        );
+      }
+      await fs.promises.writeFile(fullFilePath, result.content);
+      logger.log(`Successfully applied search-replace to: ${fullFilePath}`);
+      queueCloudSandboxSnapshotSync({
+        appId: ctx.appId,
+        changedPaths: [args.file_path],
+      });
+      sendTelemetryEvent("local_agent:search_replace:success", {
+        filePath: args.file_path,
+      });
+    });
     // Deploy Supabase function if applicable
......
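The prompt's fallback rule mentions a target that "cannot be matched uniquely" as a typical `search_replace` failure. The sketch below shows one way such a uniqueness check could look. It is a simplified stand-in, assuming plain substring matching: `applyUniqueReplace` and `ReplaceResult` are hypothetical, and the real `applySearchReplace` processor (which works on `<<<<<<< SEARCH` / `>>>>>>> REPLACE` blocks) is not shown in this diff and may behave differently.

```typescript
// Hypothetical, simplified stand-in for a search/replace applier; not the PR's code.
type ReplaceResult =
  | { success: true; content: string }
  | { success: false; error: string };

function applyUniqueReplace(
  original: string,
  oldString: string,
  newString: string,
): ReplaceResult {
  if (oldString.length === 0) {
    return { success: false, error: "search text is empty" };
  }
  const first = original.indexOf(oldString);
  if (first === -1) {
    return { success: false, error: "search text not found" };
  }
  // A second occurrence makes the edit ambiguous, so refuse to guess.
  if (original.indexOf(oldString, first + 1) !== -1) {
    return { success: false, error: "search text matches more than once" };
  }
  const content =
    original.slice(0, first) +
    newString +
    original.slice(first + oldString.length);
  return { success: true, content };
}
```

Under this reading, the usual fix for an ambiguous match is to include more surrounding context in the search text before retrying.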
@@ -27,16 +27,11 @@ export {
 export type Todo = AgentTodo;
 /** Tracks which file-editing tools were used on each file path */
-export const FILE_EDIT_TOOL_NAMES = [
-  "write_file",
-  "edit_file",
-  "search_replace",
-] as const;
+export const FILE_EDIT_TOOL_NAMES = ["write_file", "search_replace"] as const;
 export type FileEditToolName = (typeof FILE_EDIT_TOOL_NAMES)[number];
 export interface FileEditTracker {
   [filePath: string]: {
     write_file: number;
-    edit_file: number;
     search_replace: number;
   };
 }
......
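Removing `edit_file` required touching the name tuple, the `FileEditTracker` interface, and the initializer in `trackFileEditTool` separately. As a possible alternative (not something this PR does), the tracker shape could be derived from the tuple so that only one place changes. In the sketch below, `DerivedFileEditTracker`, `emptyCounts`, and `recordEdit` are hypothetical names; only `FILE_EDIT_TOOL_NAMES` and `FileEditToolName` come from the diff.

```typescript
// Hypothetical alternative shape; the PR keeps the hand-written interface.
const FILE_EDIT_TOOL_NAMES = ["write_file", "search_replace"] as const;
type FileEditToolName = (typeof FILE_EDIT_TOOL_NAMES)[number];

// One counter per tool name, derived from the tuple instead of listed by hand.
type ToolCounts = Record<FileEditToolName, number>;
type DerivedFileEditTracker = Record<string, ToolCounts>;

function emptyCounts(): ToolCounts {
  const counts = {} as ToolCounts;
  for (const name of FILE_EDIT_TOOL_NAMES) {
    counts[name] = 0;
  }
  return counts;
}

function recordEdit(
  tracker: DerivedFileEditTracker,
  filePath: string,
  tool: FileEditToolName,
): void {
  // Create the per-file counters on first use, then bump the one for this tool.
  const counts = tracker[filePath] ?? emptyCounts();
  counts[tool] += 1;
  tracker[filePath] = counts;
}
```

The trade-off is readability: the explicit interface in the PR is easier to scan, while the derived form cannot drift out of sync with the tuple.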
@@ -10,6 +10,7 @@ import {
   isSharedServerModule,
 } from "../../../../../../supabase_admin/supabase_utils";
 import { queueCloudSandboxSnapshotSync } from "@/ipc/utils/cloud_sandbox_provider";
+import { withLock, getFileWriteKey } from "@/ipc/utils/lock_utils";
 const logger = log.scope("write_file");
 const writeFileSchema = z.object({
@@ -48,16 +49,18 @@ export const writeFileTool: ToolDefinition<z.infer<typeof writeFileSchema>> = {
       ctx.isSharedModulesChanged = true;
     }
-    // Ensure directory exists
-    const dirPath = path.dirname(fullFilePath);
-    fs.mkdirSync(dirPath, { recursive: true });
-    // Write file content
-    fs.writeFileSync(fullFilePath, args.content);
-    logger.log(`Successfully wrote file: ${fullFilePath}`);
-    queueCloudSandboxSnapshotSync({
-      appId: ctx.appId,
-      changedPaths: [args.path],
-    });
+    await withLock(getFileWriteKey(fullFilePath), async () => {
+      // Ensure directory exists
+      const dirPath = path.dirname(fullFilePath);
+      fs.mkdirSync(dirPath, { recursive: true });
+      // Write file content
+      fs.writeFileSync(fullFilePath, args.content);
+      logger.log(`Successfully wrote file: ${fullFilePath}`);
+      queueCloudSandboxSnapshotSync({
+        appId: ctx.appId,
+        changedPaths: [args.path],
+      });
+    });
     // Deploy Supabase function if applicable
......
@@ -68,23 +68,24 @@ You have tools at your disposal to solve the coding task. Follow these rules reg
 const PRO_TOOL_CALLING_BEST_PRACTICES_BLOCK = `<tool_calling_best_practices>
 - **Read before writing**: Use \`read_file\` and \`list_files\` to understand the codebase before making changes
-- **Use \`edit_file\` for edits**: For modifying existing files, prefer \`edit_file\` over \`write_file\`
+- **Prefer \`search_replace\` for edits**: For small to medium edits on existing files, use \`search_replace\` rather than rewriting the whole file
 - **Be surgical**: Only change what's necessary to accomplish the task
 - **Handle errors gracefully**: If a tool fails, explain the issue and suggest alternatives
 </tool_calling_best_practices>`;
 const PRO_FILE_EDITING_TOOL_SELECTION_BLOCK = `<file_editing_tool_selection>
-You have three tools for editing files. Choose based on the scope of your change:
+You have two tools for editing files. Choose based on the scope of your change:
 | Scope | Tool | Examples |
 |-------|------|----------|
-| **Small** (a few lines) | \`search_replace\` or \`edit_file\` | Fix a typo, rename a variable, update a value, change an import |
-| **Medium** (one function or section) | \`edit_file\` | Rewrite a function, add a new component, modify multiple related lines |
-| **Large** (most of the file) | \`write_file\` | Major refactor, rewrite a module, create a new file |
+| **Small to medium** (a few lines up to one function or contiguous section) | Single \`search_replace\` | Fix a typo, rename a variable, update a value, change an import, rewrite a function, modify multiple related lines |
+| **Moderately large** (changes spread across multiple parts of the file, up to about half of it) | Multiple \`search_replace\` calls, one per distinct region | Update several functions, change an import plus update its call sites, refactor a few related sections |
+| **Large** (rewriting the majority of the file, or creating a new file) | \`write_file\` | Major refactor that touches most of the file, rewrite a module end-to-end, create a new file |
-**Tips:**
-- \`edit_file\` supports \`// ... existing code ...\` markers to skip unchanged sections
-- When in doubt, prefer \`search_replace\` for precision or \`write_file\` for simplicity
+Lean toward \`search_replace\` when in doubt — for moderately large edits, prefer several targeted \`search_replace\` calls over one \`write_file\`. Use \`write_file\` when less than half of the original file will remain.
+**Fallback rule:**
+If \`search_replace\` fails twice in a row on the same edit (e.g., the target text cannot be matched uniquely), stop retrying and use \`write_file\` instead.
 **Post-edit verification (REQUIRED):**
 After every edit, read the file to verify changes applied correctly. If something went wrong, try a different tool and verify again.
...@@ -97,7 +98,7 @@ const PRO_DEVELOPMENT_WORKFLOW_BLOCK = `<development_workflow> ...@@ -97,7 +98,7 @@ const PRO_DEVELOPMENT_WORKFLOW_BLOCK = `<development_workflow>
**Skip when:** the request is specific and concrete (e.g. "Fix the login button", "Change color from blue to green"). **Skip when:** the request is specific and concrete (e.g. "Fix the login button", "Change color from blue to green").
The tool accepts ONLY a \`questions\` array (no empty objects). It returns the user's answers as the tool result. The tool accepts ONLY a \`questions\` array (no empty objects). It returns the user's answers as the tool result.
3. **Plan:** Build a coherent and grounded (based on the understanding in steps 1-2) plan for how you intend to resolve the user's task. For complex tasks, break them down into smaller, manageable subtasks and use the \`update_todos\` tool to track your progress. Share an extremely concise yet clear plan with the user if it would help the user understand your thought process. 3. **Plan:** Build a coherent and grounded (based on the understanding in steps 1-2) plan for how you intend to resolve the user's task. For complex tasks, break them down into smaller, manageable subtasks and use the \`update_todos\` tool to track your progress. Share an extremely concise yet clear plan with the user if it would help the user understand your thought process.
4. **Implement:** Use the available tools (e.g., \`edit_file\`, \`write_file\`, ...) to act on the plan, strictly adhering to the project's established conventions. When debugging, add targeted console.log statements to trace data flow and identify root causes. **Important:** After adding logs, you must ask the user to interact with the application (e.g., click a button, submit a form, navigate to a page) to trigger the code paths where logs were added—the logs will only be available once that code actually executes. 4. **Implement:** Use the available tools (e.g., \`search_replace\`, \`write_file\`, ...) to act on the plan, strictly adhering to the project's established conventions. When debugging, add targeted console.log statements to trace data flow and identify root causes. **Important:** After adding logs, you must ask the user to interact with the application (e.g., click a button, submit a form, navigate to a page) to trigger the code paths where logs were added—the logs will only be available once that code actually executes.
5. **Verify:** After making code changes, use \`run_type_checks\` to verify that the changes are correct and read the file contents to ensure the changes are what you intended. 5. **Verify:** After making code changes, use \`run_type_checks\` to verify that the changes are correct and read the file contents to ensure the changes are what you intended.
6. **Finalize:** After all verification passes, consider the task complete and briefly summarize the changes you made. 6. **Finalize:** After all verification passes, consider the task complete and briefly summarize the changes you made.
</development_workflow>`; </development_workflow>`;
...@@ -209,7 +210,7 @@ When a user explicitly requests custom images, illustrations, or visual media fo ...@@ -209,7 +210,7 @@ When a user explicitly requests custom images, illustrations, or visual media fo
/** /**
* System prompt for Local Agent v2 in Pro mode * System prompt for Local Agent v2 in Pro mode
* Full access to all tools including edit_file, code_search, web_search, web_crawl * Full access to all tools including code_search, web_search, web_crawl
*/ */
export const LOCAL_AGENT_SYSTEM_PROMPT = ` export const LOCAL_AGENT_SYSTEM_PROMPT = `
${ROLE_BLOCK} ${ROLE_BLOCK}
...@@ -233,7 +234,7 @@ ${IMAGE_GENERATION_BLOCK} ...@@ -233,7 +234,7 @@ ${IMAGE_GENERATION_BLOCK}
/** /**
* System prompt for Local Agent v2 in Basic Agent mode (free tier) * System prompt for Local Agent v2 in Basic Agent mode (free tier)
* Limited tools - no edit_file, code_search, web_search, web_crawl * Limited tools - no code_search, web_search, web_crawl
*/ */
export const LOCAL_AGENT_BASIC_SYSTEM_PROMPT = ` export const LOCAL_AGENT_BASIC_SYSTEM_PROMPT = `
${ROLE_BLOCK} ${ROLE_BLOCK}
......
...@@ -524,21 +524,6 @@ app.post("/github/api/test/clear-push-events", handleClearPushEvents); ...@@ -524,21 +524,6 @@ app.post("/github/api/test/clear-push-events", handleClearPushEvents);
// GitHub Git endpoints - intercept all paths with /github/git prefix // GitHub Git endpoints - intercept all paths with /github/git prefix
app.all("/github/git/*", handleGitPush); app.all("/github/git/*", handleGitPush);
// Dyad Engine turbo-file-edit endpoint for edit_file tool
app.post("/engine/v1/tools/turbo-file-edit", (req, res) => {
const { path: filePath, description } = req.body;
console.log(
`* turbo-file-edit: ${filePath} - ${description || "no description"}`,
);
try {
res.json({ result: "TURBO EDITED filePath" });
} catch (error) {
console.error(`* turbo-file-edit error:`, error);
res.status(400).json({ error: String(error) });
}
});
// Dyad Engine code-search endpoint for code_search tool // Dyad Engine code-search endpoint for code_search tool
app.post("/engine/v1/tools/code-search", (req, res) => { app.post("/engine/v1/tools/code-search", (req, res) => {
const { query, filesContext } = req.body; const { query, filesContext } = req.body;
......