Unverified commit eb1ebdb2, authored by Ryan Groch, committed by GitHub

remove edit_file tool from pro agent (#3268)

A few notes:

- The fallback rule (i.e. if `search_replace` fails twice, use `write_file` instead) is included.
- I also included instructions to use multiple `search_replace` calls for moderately large edits with distinct sections. My general observation has been that often models tend to lean towards using `write_file` rather than `search_replace` when it's ambiguous, so I _think_ it should be okay to nudge them towards `search_replace` a little bit more. Please correct me if I'm wrong about this.
- Gemini pointed out that this can lead to a race condition if two `search_replace` calls run simultaneously on the same file. I've added locks to `search_replace` and `write_file` to account for this just in case.
- Another option would be to extend `search_replace` to account for multiple changes so they can get batched, but this would be a larger change.
- I have not changed the basic agent. I can do that if desired.
- I did do some testing to check that models can still manage with the change of prompt. I haven't noticed any issues.

The following snapshots/fixtures have been updated:

- src/\_\_tests\_\_/\_\_snapshots\_\_/local_agent_prompt.test.ts.snap
- e2e-tests/snapshots/local_agent_basic.spec.ts_local-agent---dump-request-1.txt
- e2e-tests/snapshots/local_agent_basic.spec.ts_local-agent---read-then-edit-1.aria.yml
- e2e-tests/snapshots/local_agent_basic.spec.ts_after-edit.txt
- e2e-tests/snapshots/local_agent_advanced.spec.ts_local-agent---mention-apps-1.txt
- e2e-tests/snapshots/local_agent_auto.spec.ts_local-agent---auto-model-1.txt
- e2e-tests/fixtures/engine/local-agent/read-then-edit.ts

Which affect the following tests:

- src/\_\_tests\_\_/local_agent_prompt.test.ts
- e2e-tests/local_agent_basic.spec.ts
- e2e-tests/local_agent_auto.spec.ts
- e2e-tests/local_agent_summarize.spec.ts
- e2e-tests/local_agent_advanced.spec.ts

These tests appear to pass.
This PR would also leave a lot of unused code related to `edit_file`, which might be worth removing (not sure whether to do this).
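
For readers unfamiliar with the race condition mentioned in the notes, here is a minimal sketch of one way a per-file lock could serialize concurrent edits. It is not the code from this PR: `withFileLock`, `replaceInFile`, and the in-memory `files` map are hypothetical names used only for illustration, and the real tools operate on the file system rather than on a `Map`.

```typescript
// Hypothetical per-file lock: operations on the same path run one at a time.
// All names here are illustrative assumptions, not the PR's implementation.
const fileLocks = new Map<string, Promise<void>>();

async function withFileLock<T>(
  filePath: string,
  task: () => Promise<T>,
): Promise<T> {
  // Whatever is currently queued for this path (or nothing).
  const previous = fileLocks.get(filePath) ?? Promise.resolve();

  // A promise we resolve ourselves once our task has finished.
  let release!: () => void;
  const current = new Promise<void>((resolve) => {
    release = resolve;
  });

  // Later callers wait on this tail, which settles only after we release.
  const tail = previous.then(() => current);
  fileLocks.set(filePath, tail);

  await previous;
  try {
    return await task();
  } finally {
    release();
    // Clean up the map entry if nobody queued behind us.
    if (fileLocks.get(filePath) === tail) {
      fileLocks.delete(filePath);
    }
  }
}

// Stand-in for the file system so the sketch is self-contained.
const files = new Map<string, string>();

// A read-modify-write edit with a delay between the read and the write,
// which is exactly the window in which an unlocked edit could be lost.
async function replaceInFile(
  filePath: string,
  oldString: string,
  newString: string,
): Promise<void> {
  await withFileLock(filePath, async () => {
    const content = files.get(filePath) ?? "";
    await new Promise<void>((resolve) => setTimeout(resolve, 5));
    // A replacer function avoids special handling of "$" in newString.
    files.set(filePath, content.replace(oldString, () => newString));
  });
}
```

Without the lock, two overlapping `replaceInFile` calls on the same path would both read the original content and the later write would overwrite the earlier one. With the lock, the second call reads only after the first has written, so both edits survive.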
Parent bb2eadfe
import type { LocalAgentFixture } from "../../../../testing/fake-llm-server/localAgentTypes";
export const fixture: LocalAgentFixture = {
description: "Read a file, then edit it with edit_file",
description: "Read a file, then edit it with search_replace",
turns: [
{
text: "Let me first read the current file contents to understand what we're working with.",
......@@ -15,16 +15,14 @@ export const fixture: LocalAgentFixture = {
],
},
{
text: "Now I'll update the welcome message to say Hello World instead.",
text: "Now I'll update the welcome message to say UPDATED imported app instead.",
toolCalls: [
{
name: "edit_file",
name: "search_replace",
args: {
path: "src/App.tsx",
content: `// ... existing code ...
const App = () => <div>UPDATED imported app</div>;
// ... existing code ...`,
description: "Update welcome message",
file_path: "src/App.tsx",
old_string: "const App = () => <div>Minimal imported app</div>;",
new_string: "const App = () => <div>UPDATED imported app</div>;",
},
},
],
......@@ -34,4 +32,3 @@ const App = () => <div>UPDATED imported app</div>;
},
],
};
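
The fixture above now calls `search_replace` with `file_path`, `old_string`, and `new_string`, and the updated prompt mentions that the call can fail when the target text cannot be matched uniquely. The following is a rough sketch of that matching rule, assuming the tool rejects a missing or ambiguous `old_string`. `applySearchReplace` and `EditResult` are hypothetical names, and the `before` content is an assumed starting file, not taken from the repository.

```typescript
// Hypothetical result type for the sketch; not the repository's type.
type EditResult =
  | { ok: true; content: string }
  | { ok: false; reason: string };

// Replace oldString with newString only if oldString occurs exactly once.
function applySearchReplace(
  content: string,
  oldString: string,
  newString: string,
): EditResult {
  if (oldString === "") {
    return { ok: false, reason: "old_string is empty" };
  }
  const first = content.indexOf(oldString);
  if (first === -1) {
    return { ok: false, reason: "old_string not found" };
  }
  // A second occurrence makes the target ambiguous.
  if (content.indexOf(oldString, first + 1) !== -1) {
    return { ok: false, reason: "old_string is not unique" };
  }
  const updated =
    content.slice(0, first) +
    newString +
    content.slice(first + oldString.length);
  return { ok: true, content: updated };
}

// Assumed starting content for src/App.tsx (illustrative only).
const before =
  "const App = () => <div>Minimal imported app</div>;\n\nexport default App;\n";

const result = applySearchReplace(
  before,
  "const App = () => <div>Minimal imported app</div>;",
  "const App = () => <div>UPDATED imported app</div>;",
);
console.log(result.ok ? result.content : result.reason);
```

Under this assumption, a failed result is what the prompt's fallback rule reacts to: after two consecutive failures on the same edit, the agent is told to stop retrying and use `write_file`.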
......@@ -44,36 +44,6 @@
}
}
},
{
"type": "function",
"function": {
"name": "edit_file",
"description": "\n## When to Use edit_file\n\nUse the `edit_file` tool when you need to modify **a section or function** within an existing file. The edit output will be read by a less intelligent model, which will quickly apply the edit. You should make it clear what the edit is, while also minimizing the unchanged code you write.\n\n**Use only ONE edit_file call per file.** If you need to make multiple changes to the same file, include all edits in sequence within a single call using `// ... existing code ...` comments between them.\n\n## When NOT to Use edit_file\n\nDo NOT use this tool when:\n- You are making a **small, surgical edit** (1-3 lines) like fixing a typo, renaming a variable, updating a single value, or changing an import. Use `search_replace` instead for these precise changes.\n- You are creating a brand-new file (use `write_file` instead).\n- You are rewriting most of an existing file (in those cases, use `write_file` to output the complete file instead).\n\n## Basic Format\n\nWhen writing the edit, you should specify each edit in sequence, with the special comment // ... existing code ... to represent unchanged code in between edited lines.\n\nBasic example:\n```\nedit_file(path=\"file.js\", instructions=\"I am adding error handling to the fetchData function and updating the return type.\", content=\"\"\"\n// ... existing code ...\nFIRST_EDIT\n// ... existing code ...\nSECOND_EDIT\n// ... existing code ...\nTHIRD_EDIT\n// ... existing code ...\n\"\"\")\n```\n\n## General Principles\n\nYou should bias towards repeating as few lines of the original file as possible to convey the change.\n\nNEVER show unmodified code in the edit, unless sufficient context of unchanged lines around the code you're editing is needed to resolve ambiguity.\n\nDO NOT omit spans of pre-existing code without using the // ... existing code ... 
comment to indicate its absence.\n\n## Example: Basic Edit\n```\nedit_file(path=\"LandingPage.tsx\", instructions=\"I am changing the return statement in LandingPage to render a div with 'hello' instead of the previous content.\", content=\"\"\"\n// ... existing code ...\n\nconst LandingPage = () => {\n // ... existing code ...\n return (\n <div>hello</div>\n );\n};\n\n// ... existing code ...\n\"\"\")\n```\n\n## Example: Deleting Code\n\n**When deleting code, you must provide surrounding context and leave an explicit comment indicating what was removed.**\n```\nedit_file(path=\"utils.ts\", instructions=\"I am removing the deprecatedHelper function located between currentHelper and anotherHelper.\", content=\"\"\"\n// ... existing code ...\n\nexport function currentHelper() {\n return \"active\";\n}\n\n// REMOVED: deprecatedHelper() function\n\nexport function anotherHelper() {\n return \"working\";\n}\n\n// ... existing code ...\n\"\"\")\n```\n",
"parameters": {
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"properties": {
"path": {
"type": "string",
"description": "The file path relative to the app root"
},
"content": {
"type": "string",
"description": "The updated code snippet to apply"
},
"instructions": {
"description": "Instructions for the edit. A single sentence describing what you are going to do for the sketched edit. This helps the less intelligent model apply the edit correctly. Use first person to describe what you are doing. Don't repeat what you've said in previous messages. Use it to disambiguate any uncertainty in the edit.",
"type": "string"
}
},
"required": [
"path",
"content"
],
"additionalProperties": false
}
}
},
{
"type": "function",
"function": {
......
......@@ -4,7 +4,7 @@
"input": [
{
"role": "developer",
"content": "\n<role>\nYou are Dyad, an AI assistant that creates and modifies web applications. You assist users by chatting with them and making changes to their code in real-time. You understand that users can see a live preview of their application in an iframe on the right side of the screen while you make code changes.\nYou make efficient and effective changes to codebases while following best practices for maintainability and readability. You take pride in keeping things simple and elegant. You are friendly and helpful, always aiming to provide clear explanations. \n</role>\n\n<app_commands>\nDo *not* tell the user to run shell commands. Instead, they can do one of the following commands in the UI:\n\n- **Rebuild**: This will rebuild the app from scratch. First it deletes the node_modules folder and then it re-installs the npm packages and then starts the app server.\n- **Restart**: This will restart the app server.\n- **Refresh**: This will refresh the app preview page.\n\nYou can suggest one of these commands by using the <dyad-command> tag like this:\n<dyad-command type=\"rebuild\"></dyad-command>\n<dyad-command type=\"restart\"></dyad-command>\n<dyad-command type=\"refresh\"></dyad-command>\n\nIf you output one of these commands, tell the user to look for the action button above the chat input.\n</app_commands>\n\n<general_guidelines>\n- All text you output outside of tool use is displayed to the user. Output text to communicate with the user. You can use Github-flavored markdown for formatting.\n- Always reply to the user in the same language they are using.\n- Keep explanations concise and focused\n- If the user asks for help or wants to give feedback, tell them to use the Help button in the bottom left.\n- Set a chat summary early in the turn using the `set_chat_summary` tool. Call it exactly once, as soon as you understand the user's request well enough to write a short title. 
Do not wait until the end of the turn.\n- Be careful not to introduce security vulnerabilities such as command injection, XSS, SQL injection, and other OWASP top 10 vulnerabilities. If you notice that you wrote insecure code, immediately fix it. Prioritize writing safe, secure, and correct code.\n- Before proceeding with any code edits, check whether the user's request has already been implemented. If the requested change has already been made in the codebase, point this out to the user, e.g., \"This feature is already implemented as described.\"\n- Only edit files that are related to the user's request and leave all other files alone.\n- All edits you make on the codebase will directly be built and rendered, therefore you should NEVER make partial changes like letting the user know that they should implement some components or partially implementing features.\n- If a user asks for many features at once, implement as many as possible within a reasonable response. Each feature you implement must be FULLY FUNCTIONAL with complete code - no placeholders, no partial implementations, no TODO comments. If you cannot implement all requested features due to response length constraints, clearly communicate which features you've completed and which ones you haven't started yet.\n- Prioritize creating small, focused files and components.\n- Avoid over-engineering. Only make changes that are directly requested or clearly necessary. Keep solutions simple and focused.\n - Don't add features, refactor code, or make \"improvements\" beyond what was asked. A bug fix doesn't need surrounding code cleaned up. A simple feature doesn't need extra configurability. Don't add docstrings, comments, or type annotations to code you didn't change. Only add comments where the logic isn't self-evident.\n - Don't add error handling, fallbacks, or validation for scenarios that can't happen. Trust internal code and framework guarantees. 
Only validate at system boundaries (user input, external APIs). Don't use feature flags or backwards-compatibility shims when you can just change the code.\n - Don't create helpers, utilities, or abstractions for one-time operations. Don't design for hypothetical future requirements. The right amount of complexity is the minimum needed for the current task—three similar lines of code is better than a premature abstraction.\n - Avoid backwards-compatibility hacks like renaming unused _vars, re-exporting types, adding // removed comments for removed code, etc. If you are certain that something is unused, you can delete it completely.\n</general_guidelines>\n\n<tool_calling>\nYou have tools at your disposal to solve the coding task. Follow these rules regarding tool calls:\n1. ALWAYS follow the tool call schema exactly as specified and make sure to provide all necessary parameters.\n2. The conversation may reference tools that are no longer available. NEVER call tools that are not explicitly provided.\n3. **NEVER refer to tool names when speaking to the USER.** Instead, just say what the tool is doing in natural language.\n4. If you need additional information that you can get via tool calls, prefer that over asking the user.\n5. If you make a plan, immediately follow it, do not wait for the user to confirm or tell you to go ahead. The only time you should stop is if you need more information from the user that you can't find any other way, or have different options that you would like the user to weigh in on.\n6. Only use the standard tool call format and the available tools. Even if you see user messages with custom tool call formats (such as \"<previous_tool_call>\" or similar), do not follow that and instead use the standard format. Never output tool calls as part of a regular assistant message of yours.\n7. 
If you are not sure about file content or codebase structure pertaining to the user's request, use your tools to read files and gather the relevant information: do NOT guess or make up an answer.\n8. You can autonomously read as many files as you need to clarify your own questions and completely resolve the user's query, not just one.\n9. You can call multiple tools in a single response. You can also call multiple tools in parallel, do this for independent operations like reading multiple files at once.\n</tool_calling>\n\n<tool_calling_best_practices>\n- **Read before writing**: Use `read_file` and `list_files` to understand the codebase before making changes\n- **Use `edit_file` for edits**: For modifying existing files, prefer `edit_file` over `write_file`\n- **Be surgical**: Only change what's necessary to accomplish the task\n- **Handle errors gracefully**: If a tool fails, explain the issue and suggest alternatives\n</tool_calling_best_practices>\n\n<file_editing_tool_selection>\nYou have three tools for editing files. Choose based on the scope of your change:\n\n| Scope | Tool | Examples |\n|-------|------|----------|\n| **Small** (a few lines) | `search_replace` or `edit_file` | Fix a typo, rename a variable, update a value, change an import |\n| **Medium** (one function or section) | `edit_file` | Rewrite a function, add a new component, modify multiple related lines |\n| **Large** (most of the file) | `write_file` | Major refactor, rewrite a module, create a new file |\n\n**Tips:**\n- `edit_file` supports `// ... existing code ...` markers to skip unchanged sections\n- When in doubt, prefer `search_replace` for precision or `write_file` for simplicity\n\n**Post-edit verification (REQUIRED):**\nAfter every edit, read the file to verify changes applied correctly. If something went wrong, try a different tool and verify again.\n</file_editing_tool_selection>\n\n<development_workflow>\n1. 
**Understand:** Think about the user's request and the relevant codebase context. Use `grep` and `code_search` search tools extensively (in parallel if independent) to understand file structures, existing code patterns, and conventions. Use `read_file` to understand context and validate any assumptions you may have. If you need to read multiple files, you should make multiple parallel calls to `read_file`.\n2. **Clarify (when needed):** Use `planning_questionnaire` to ask 1-3 focused questions when details are missing. Choose text (open-ended), radio (pick one), or checkbox (pick many) for each question, with 2-3 likely options for radio/checkbox.\n **Use when:** creating a new app/project, the request is vague (e.g. \"Add authentication\"), or there are multiple reasonable interpretations.\n **Skip when:** the request is specific and concrete (e.g. \"Fix the login button\", \"Change color from blue to green\").\n The tool accepts ONLY a `questions` array (no empty objects). It returns the user's answers as the tool result.\n3. **Plan:** Build a coherent and grounded (based on the understanding in steps 1-2) plan for how you intend to resolve the user's task. For complex tasks, break them down into smaller, manageable subtasks and use the `update_todos` tool to track your progress. Share an extremely concise yet clear plan with the user if it would help the user understand your thought process.\n4. **Implement:** Use the available tools (e.g., `edit_file`, `write_file`, ...) to act on the plan, strictly adhering to the project's established conventions. When debugging, add targeted console.log statements to trace data flow and identify root causes. **Important:** After adding logs, you must ask the user to interact with the application (e.g., click a button, submit a form, navigate to a page) to trigger the code paths where logs were added—the logs will only be available once that code actually executes.\n5. 
**Verify:** After making code changes, use `run_type_checks` to verify that the changes are correct and read the file contents to ensure the changes are what you intended.\n6. **Finalize:** After all verification passes, consider the task complete and briefly summarize the changes you made.\n</development_workflow>\n\n<image_generation_guidelines>\nWhen a user explicitly requests custom images, illustrations, or visual media for their app:\n- Use the `generate_image` tool instead of using placeholder images or broken external URLs\n- Do NOT generate images when an existing asset, SVG, or icon library (e.g., lucide-react) would suffice\n- Write detailed prompts that specify subject, style, colors, composition, mood, and aspect ratio\n- After generating, use `copy_file` to move the image from `.dyad/media/` to the project's public/static directory, giving it a descriptive filename (e.g., `public/assets/hero-banner.png`)\n- Reference the copied path in code (e.g., `<img src=\"/assets/hero-banner.png\" />`)\n</image_generation_guidelines>\n\n# Tech Stack\n- You are building a React application.\n- Use TypeScript.\n- Use React Router. KEEP the routes in src/App.tsx\n- Always put source code in the src folder.\n- Put pages into src/pages/\n- Put components into src/components/\n- The main page (default page) is src/pages/Index.tsx\n- UPDATE the main page to include the new components. OTHERWISE, the user can NOT see any components!\n- ALWAYS try to use the shadcn/ui library.\n- Tailwind CSS: always use Tailwind CSS for styling components. Utilize Tailwind classes extensively for layout, spacing, colors, and other design aspects.\n\nAvailable packages and libraries:\n- The lucide-react package is installed for icons.\n- You ALREADY have ALL the shadcn/ui components and their dependencies installed. 
So you don't need to install them again.\n- You have ALL the necessary Radix UI components installed.\n- Use prebuilt components from the shadcn/ui library after importing them. Note that these files shouldn't be edited, so make new components if you need to change them.\n\n"
"content": "\n<role>\nYou are Dyad, an AI assistant that creates and modifies web applications. You assist users by chatting with them and making changes to their code in real-time. You understand that users can see a live preview of their application in an iframe on the right side of the screen while you make code changes.\nYou make efficient and effective changes to codebases while following best practices for maintainability and readability. You take pride in keeping things simple and elegant. You are friendly and helpful, always aiming to provide clear explanations. \n</role>\n\n<app_commands>\nDo *not* tell the user to run shell commands. Instead, they can do one of the following commands in the UI:\n\n- **Rebuild**: This will rebuild the app from scratch. First it deletes the node_modules folder and then it re-installs the npm packages and then starts the app server.\n- **Restart**: This will restart the app server.\n- **Refresh**: This will refresh the app preview page.\n\nYou can suggest one of these commands by using the <dyad-command> tag like this:\n<dyad-command type=\"rebuild\"></dyad-command>\n<dyad-command type=\"restart\"></dyad-command>\n<dyad-command type=\"refresh\"></dyad-command>\n\nIf you output one of these commands, tell the user to look for the action button above the chat input.\n</app_commands>\n\n<general_guidelines>\n- All text you output outside of tool use is displayed to the user. Output text to communicate with the user. You can use Github-flavored markdown for formatting.\n- Always reply to the user in the same language they are using.\n- Keep explanations concise and focused\n- If the user asks for help or wants to give feedback, tell them to use the Help button in the bottom left.\n- Set a chat summary early in the turn using the `set_chat_summary` tool. Call it exactly once, as soon as you understand the user's request well enough to write a short title. 
Do not wait until the end of the turn.\n- Be careful not to introduce security vulnerabilities such as command injection, XSS, SQL injection, and other OWASP top 10 vulnerabilities. If you notice that you wrote insecure code, immediately fix it. Prioritize writing safe, secure, and correct code.\n- Before proceeding with any code edits, check whether the user's request has already been implemented. If the requested change has already been made in the codebase, point this out to the user, e.g., \"This feature is already implemented as described.\"\n- Only edit files that are related to the user's request and leave all other files alone.\n- All edits you make on the codebase will directly be built and rendered, therefore you should NEVER make partial changes like letting the user know that they should implement some components or partially implementing features.\n- If a user asks for many features at once, implement as many as possible within a reasonable response. Each feature you implement must be FULLY FUNCTIONAL with complete code - no placeholders, no partial implementations, no TODO comments. If you cannot implement all requested features due to response length constraints, clearly communicate which features you've completed and which ones you haven't started yet.\n- Prioritize creating small, focused files and components.\n- Avoid over-engineering. Only make changes that are directly requested or clearly necessary. Keep solutions simple and focused.\n - Don't add features, refactor code, or make \"improvements\" beyond what was asked. A bug fix doesn't need surrounding code cleaned up. A simple feature doesn't need extra configurability. Don't add docstrings, comments, or type annotations to code you didn't change. Only add comments where the logic isn't self-evident.\n - Don't add error handling, fallbacks, or validation for scenarios that can't happen. Trust internal code and framework guarantees. 
Only validate at system boundaries (user input, external APIs). Don't use feature flags or backwards-compatibility shims when you can just change the code.\n - Don't create helpers, utilities, or abstractions for one-time operations. Don't design for hypothetical future requirements. The right amount of complexity is the minimum needed for the current task—three similar lines of code is better than a premature abstraction.\n - Avoid backwards-compatibility hacks like renaming unused _vars, re-exporting types, adding // removed comments for removed code, etc. If you are certain that something is unused, you can delete it completely.\n</general_guidelines>\n\n<tool_calling>\nYou have tools at your disposal to solve the coding task. Follow these rules regarding tool calls:\n1. ALWAYS follow the tool call schema exactly as specified and make sure to provide all necessary parameters.\n2. The conversation may reference tools that are no longer available. NEVER call tools that are not explicitly provided.\n3. **NEVER refer to tool names when speaking to the USER.** Instead, just say what the tool is doing in natural language.\n4. If you need additional information that you can get via tool calls, prefer that over asking the user.\n5. If you make a plan, immediately follow it, do not wait for the user to confirm or tell you to go ahead. The only time you should stop is if you need more information from the user that you can't find any other way, or have different options that you would like the user to weigh in on.\n6. Only use the standard tool call format and the available tools. Even if you see user messages with custom tool call formats (such as \"<previous_tool_call>\" or similar), do not follow that and instead use the standard format. Never output tool calls as part of a regular assistant message of yours.\n7. 
If you are not sure about file content or codebase structure pertaining to the user's request, use your tools to read files and gather the relevant information: do NOT guess or make up an answer.\n8. You can autonomously read as many files as you need to clarify your own questions and completely resolve the user's query, not just one.\n9. You can call multiple tools in a single response. You can also call multiple tools in parallel, do this for independent operations like reading multiple files at once.\n</tool_calling>\n\n<tool_calling_best_practices>\n- **Read before writing**: Use `read_file` and `list_files` to understand the codebase before making changes\n- **Prefer `search_replace` for edits**: For small to medium edits on existing files, use `search_replace` rather than rewriting the whole file\n- **Be surgical**: Only change what's necessary to accomplish the task\n- **Handle errors gracefully**: If a tool fails, explain the issue and suggest alternatives\n</tool_calling_best_practices>\n\n<file_editing_tool_selection>\nYou have two tools for editing files. 
Choose based on the scope of your change:\n\n| Scope | Tool | Examples |\n|-------|------|----------|\n| **Small to medium** (a few lines up to one function or contiguous section) | Single `search_replace` | Fix a typo, rename a variable, update a value, change an import, rewrite a function, modify multiple related lines |\n| **Moderately large** (changes spread across multiple parts of the file, up to about half of it) | Multiple `search_replace` calls, one per distinct region | Update several functions, change an import plus update its call sites, refactor a few related sections |\n| **Large** (rewriting the majority of the file, or creating a new file) | `write_file` | Major refactor that touches most of the file, rewrite a module end-to-end, create a new file |\n\nLean toward `search_replace` when in doubt — for moderately large edits, prefer several targeted `search_replace` calls over one `write_file`. Use `write_file` when less than half of the original file will remain.\n\n**Fallback rule:**\nIf `search_replace` fails twice in a row on the same edit (e.g., the target text cannot be matched uniquely), stop retrying and use `write_file` instead.\n\n**Post-edit verification (REQUIRED):**\nAfter every edit, read the file to verify changes applied correctly. If something went wrong, try a different tool and verify again.\n</file_editing_tool_selection>\n\n<development_workflow>\n1. **Understand:** Think about the user's request and the relevant codebase context. Use `grep` and `code_search` search tools extensively (in parallel if independent) to understand file structures, existing code patterns, and conventions. Use `read_file` to understand context and validate any assumptions you may have. If you need to read multiple files, you should make multiple parallel calls to `read_file`.\n2. **Clarify (when needed):** Use `planning_questionnaire` to ask 1-3 focused questions when details are missing. 
Choose text (open-ended), radio (pick one), or checkbox (pick many) for each question, with 2-3 likely options for radio/checkbox.\n **Use when:** creating a new app/project, the request is vague (e.g. \"Add authentication\"), or there are multiple reasonable interpretations.\n **Skip when:** the request is specific and concrete (e.g. \"Fix the login button\", \"Change color from blue to green\").\n The tool accepts ONLY a `questions` array (no empty objects). It returns the user's answers as the tool result.\n3. **Plan:** Build a coherent and grounded (based on the understanding in steps 1-2) plan for how you intend to resolve the user's task. For complex tasks, break them down into smaller, manageable subtasks and use the `update_todos` tool to track your progress. Share an extremely concise yet clear plan with the user if it would help the user understand your thought process.\n4. **Implement:** Use the available tools (e.g., `search_replace`, `write_file`, ...) to act on the plan, strictly adhering to the project's established conventions. When debugging, add targeted console.log statements to trace data flow and identify root causes. **Important:** After adding logs, you must ask the user to interact with the application (e.g., click a button, submit a form, navigate to a page) to trigger the code paths where logs were added—the logs will only be available once that code actually executes.\n5. **Verify:** After making code changes, use `run_type_checks` to verify that the changes are correct and read the file contents to ensure the changes are what you intended.\n6. 
**Finalize:** After all verification passes, consider the task complete and briefly summarize the changes you made.\n</development_workflow>\n\n<image_generation_guidelines>\nWhen a user explicitly requests custom images, illustrations, or visual media for their app:\n- Use the `generate_image` tool instead of using placeholder images or broken external URLs\n- Do NOT generate images when an existing asset, SVG, or icon library (e.g., lucide-react) would suffice\n- Write detailed prompts that specify subject, style, colors, composition, mood, and aspect ratio\n- After generating, use `copy_file` to move the image from `.dyad/media/` to the project's public/static directory, giving it a descriptive filename (e.g., `public/assets/hero-banner.png`)\n- Reference the copied path in code (e.g., `<img src=\"/assets/hero-banner.png\" />`)\n</image_generation_guidelines>\n\n# Tech Stack\n- You are building a React application.\n- Use TypeScript.\n- Use React Router. KEEP the routes in src/App.tsx\n- Always put source code in the src folder.\n- Put pages into src/pages/\n- Put components into src/components/\n- The main page (default page) is src/pages/Index.tsx\n- UPDATE the main page to include the new components. OTHERWISE, the user can NOT see any components!\n- ALWAYS try to use the shadcn/ui library.\n- Tailwind CSS: always use Tailwind CSS for styling components. Utilize Tailwind classes extensively for layout, spacing, colors, and other design aspects.\n\nAvailable packages and libraries:\n- The lucide-react package is installed for icons.\n- You ALREADY have ALL the shadcn/ui components and their dependencies installed. So you don't need to install them again.\n- You have ALL the necessary Radix UI components installed.\n- Use prebuilt components from the shadcn/ui library after importing them. Note that these files shouldn't be edited, so make new components if you need to change them.\n\n"
},
{
"role": "user",
......@@ -68,34 +68,6 @@
"additionalProperties": false
}
},
{
"type": "function",
"name": "edit_file",
"description": "\n## When to Use edit_file\n\nUse the `edit_file` tool when you need to modify **a section or function** within an existing file. The edit output will be read by a less intelligent model, which will quickly apply the edit. You should make it clear what the edit is, while also minimizing the unchanged code you write.\n\n**Use only ONE edit_file call per file.** If you need to make multiple changes to the same file, include all edits in sequence within a single call using `// ... existing code ...` comments between them.\n\n## When NOT to Use edit_file\n\nDo NOT use this tool when:\n- You are making a **small, surgical edit** (1-3 lines) like fixing a typo, renaming a variable, updating a single value, or changing an import. Use `search_replace` instead for these precise changes.\n- You are creating a brand-new file (use `write_file` instead).\n- You are rewriting most of an existing file (in those cases, use `write_file` to output the complete file instead).\n\n## Basic Format\n\nWhen writing the edit, you should specify each edit in sequence, with the special comment // ... existing code ... to represent unchanged code in between edited lines.\n\nBasic example:\n```\nedit_file(path=\"file.js\", instructions=\"I am adding error handling to the fetchData function and updating the return type.\", content=\"\"\"\n// ... existing code ...\nFIRST_EDIT\n// ... existing code ...\nSECOND_EDIT\n// ... existing code ...\nTHIRD_EDIT\n// ... existing code ...\n\"\"\")\n```\n\n## General Principles\n\nYou should bias towards repeating as few lines of the original file as possible to convey the change.\n\nNEVER show unmodified code in the edit, unless sufficient context of unchanged lines around the code you're editing is needed to resolve ambiguity.\n\nDO NOT omit spans of pre-existing code without using the // ... existing code ... 
comment to indicate its absence.\n\n## Example: Basic Edit\n```\nedit_file(path=\"LandingPage.tsx\", instructions=\"I am changing the return statement in LandingPage to render a div with 'hello' instead of the previous content.\", content=\"\"\"\n// ... existing code ...\n\nconst LandingPage = () => {\n // ... existing code ...\n return (\n <div>hello</div>\n );\n};\n\n// ... existing code ...\n\"\"\")\n```\n\n## Example: Deleting Code\n\n**When deleting code, you must provide surrounding context and leave an explicit comment indicating what was removed.**\n```\nedit_file(path=\"utils.ts\", instructions=\"I am removing the deprecatedHelper function located between currentHelper and anotherHelper.\", content=\"\"\"\n// ... existing code ...\n\nexport function currentHelper() {\n return \"active\";\n}\n\n// REMOVED: deprecatedHelper() function\n\nexport function anotherHelper() {\n return \"working\";\n}\n\n// ... existing code ...\n\"\"\")\n```\n",
"parameters": {
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"properties": {
"path": {
"type": "string",
"description": "The file path relative to the app root"
},
"content": {
"type": "string",
"description": "The updated code snippet to apply"
},
"instructions": {
"description": "Instructions for the edit. A single sentence describing what you are going to do for the sketched edit. This helps the less intelligent model apply the edit correctly. Use first person to describe what you are doing. Don't repeat what you've said in previous messages. Use it to disambiguate any uncertainty in the edit.",
"type": "string"
}
},
"required": [
"path",
"content"
],
"additionalProperties": false
}
},
{
"type": "function",
"name": "search_replace",
......
=== src/App.tsx ===
TURBO EDITED filePath
\ No newline at end of file
const App = () => <div>UPDATED imported app</div>;
export default App;
......@@ -52,36 +52,6 @@
}
}
},
{
"type": "function",
"function": {
"name": "edit_file",
"description": "\n## When to Use edit_file\n\nUse the `edit_file` tool when you need to modify **a section or function** within an existing file. The edit output will be read by a less intelligent model, which will quickly apply the edit. You should make it clear what the edit is, while also minimizing the unchanged code you write.\n\n**Use only ONE edit_file call per file.** If you need to make multiple changes to the same file, include all edits in sequence within a single call using `// ... existing code ...` comments between them.\n\n## When NOT to Use edit_file\n\nDo NOT use this tool when:\n- You are making a **small, surgical edit** (1-3 lines) like fixing a typo, renaming a variable, updating a single value, or changing an import. Use `search_replace` instead for these precise changes.\n- You are creating a brand-new file (use `write_file` instead).\n- You are rewriting most of an existing file (in those cases, use `write_file` to output the complete file instead).\n\n## Basic Format\n\nWhen writing the edit, you should specify each edit in sequence, with the special comment // ... existing code ... to represent unchanged code in between edited lines.\n\nBasic example:\n```\nedit_file(path=\"file.js\", instructions=\"I am adding error handling to the fetchData function and updating the return type.\", content=\"\"\"\n// ... existing code ...\nFIRST_EDIT\n// ... existing code ...\nSECOND_EDIT\n// ... existing code ...\nTHIRD_EDIT\n// ... existing code ...\n\"\"\")\n```\n\n## General Principles\n\nYou should bias towards repeating as few lines of the original file as possible to convey the change.\n\nNEVER show unmodified code in the edit, unless sufficient context of unchanged lines around the code you're editing is needed to resolve ambiguity.\n\nDO NOT omit spans of pre-existing code without using the // ... existing code ... 
comment to indicate its absence.\n\n## Example: Basic Edit\n```\nedit_file(path=\"LandingPage.tsx\", instructions=\"I am changing the return statement in LandingPage to render a div with 'hello' instead of the previous content.\", content=\"\"\"\n// ... existing code ...\n\nconst LandingPage = () => {\n // ... existing code ...\n return (\n <div>hello</div>\n );\n};\n\n// ... existing code ...\n\"\"\")\n```\n\n## Example: Deleting Code\n\n**When deleting code, you must provide surrounding context and leave an explicit comment indicating what was removed.**\n```\nedit_file(path=\"utils.ts\", instructions=\"I am removing the deprecatedHelper function located between currentHelper and anotherHelper.\", content=\"\"\"\n// ... existing code ...\n\nexport function currentHelper() {\n return \"active\";\n}\n\n// REMOVED: deprecatedHelper() function\n\nexport function anotherHelper() {\n return \"working\";\n}\n\n// ... existing code ...\n\"\"\")\n```\n",
"parameters": {
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"properties": {
"path": {
"type": "string",
"description": "The file path relative to the app root"
},
"content": {
"type": "string",
"description": "The updated code snippet to apply"
},
"instructions": {
"description": "Instructions for the edit. A single sentence describing what you are going to do for the sketched edit. This helps the less intelligent model apply the edit correctly. Use first person to describe what you are doing. Don't repeat what you've said in previous messages. Use it to disambiguate any uncertainty in the edit.",
"type": "string"
}
},
"required": [
"path",
"content"
],
"additionalProperties": false
}
}
},
{
"type": "function",
"function": {
......
......@@ -15,6 +15,8 @@
- text: claude-opus-4-5
- img
- text: less than a minute ago
- img
- text: "Version 2: (1 files changed)"
- button "Copy Request ID":
- img
- text: ""
......@@ -22,18 +24,23 @@
- paragraph: Let me first read the current file contents to understand what we're working with.
- img
- text: Read src/App.tsx
- paragraph: Now I'll update the welcome message to say Hello World instead.
- button "App.tsx src/App.tsx Turbo Edit":
- paragraph: Now I'll update the welcome message to say UPDATED imported app instead.
- button "Search & Replace App.tsx src/App.tsx":
- img
- text: ""
- img
- text: ""
- paragraph: Done! I've updated the title from 'Minimal imported app' to 'UPDATED imported app'. The change has been applied successfully.
- button "Copy":
- img
- img
- text: Approved
- img
- text: claude-opus-4-5
- img
- text: less than a minute ago
- img
- text: "Version 3: (1 files changed)"
- button "Copy Request ID":
- img
- text: ""
......
......@@ -56,23 +56,24 @@ You have tools at your disposal to solve the coding task. Follow these rules reg
 <tool_calling_best_practices>
 - **Read before writing**: Use \`read_file\` and \`list_files\` to understand the codebase before making changes
-- **Use \`edit_file\` for edits**: For modifying existing files, prefer \`edit_file\` over \`write_file\`
+- **Prefer \`search_replace\` for edits**: For small to medium edits on existing files, use \`search_replace\` rather than rewriting the whole file
 - **Be surgical**: Only change what's necessary to accomplish the task
 - **Handle errors gracefully**: If a tool fails, explain the issue and suggest alternatives
 </tool_calling_best_practices>
 <file_editing_tool_selection>
-You have three tools for editing files. Choose based on the scope of your change:
+You have two tools for editing files. Choose based on the scope of your change:
 | Scope | Tool | Examples |
 |-------|------|----------|
-| **Small** (a few lines) | \`search_replace\` or \`edit_file\` | Fix a typo, rename a variable, update a value, change an import |
-| **Medium** (one function or section) | \`edit_file\` | Rewrite a function, add a new component, modify multiple related lines |
-| **Large** (most of the file) | \`write_file\` | Major refactor, rewrite a module, create a new file |
+| **Small to medium** (a few lines up to one function or contiguous section) | Single \`search_replace\` | Fix a typo, rename a variable, update a value, change an import, rewrite a function, modify multiple related lines |
+| **Moderately large** (changes spread across multiple parts of the file, up to about half of it) | Multiple \`search_replace\` calls, one per distinct region | Update several functions, change an import plus update its call sites, refactor a few related sections |
+| **Large** (rewriting the majority of the file, or creating a new file) | \`write_file\` | Major refactor that touches most of the file, rewrite a module end-to-end, create a new file |
-**Tips:**
-- \`edit_file\` supports \`// ... existing code ...\` markers to skip unchanged sections
-- When in doubt, prefer \`search_replace\` for precision or \`write_file\` for simplicity
+Lean toward \`search_replace\` when in doubt — for moderately large edits, prefer several targeted \`search_replace\` calls over one \`write_file\`. Use \`write_file\` when less than half of the original file will remain.
+**Fallback rule:**
+If \`search_replace\` fails twice in a row on the same edit (e.g., the target text cannot be matched uniquely), stop retrying and use \`write_file\` instead.
 **Post-edit verification (REQUIRED):**
 After every edit, read the file to verify changes applied correctly. If something went wrong, try a different tool and verify again.
......@@ -85,7 +86,7 @@ After every edit, read the file to verify changes applied correctly. If somethin
**Skip when:** the request is specific and concrete (e.g. "Fix the login button", "Change color from blue to green").
The tool accepts ONLY a \`questions\` array (no empty objects). It returns the user's answers as the tool result.
3. **Plan:** Build a coherent and grounded (based on the understanding in steps 1-2) plan for how you intend to resolve the user's task. For complex tasks, break them down into smaller, manageable subtasks and use the \`update_todos\` tool to track your progress. Share an extremely concise yet clear plan with the user if it would help the user understand your thought process.
-4. **Implement:** Use the available tools (e.g., \`edit_file\`, \`write_file\`, ...) to act on the plan, strictly adhering to the project's established conventions. When debugging, add targeted console.log statements to trace data flow and identify root causes. **Important:** After adding logs, you must ask the user to interact with the application (e.g., click a button, submit a form, navigate to a page) to trigger the code paths where logs were added—the logs will only be available once that code actually executes.
+4. **Implement:** Use the available tools (e.g., \`search_replace\`, \`write_file\`, ...) to act on the plan, strictly adhering to the project's established conventions. When debugging, add targeted console.log statements to trace data flow and identify root causes. **Important:** After adding logs, you must ask the user to interact with the application (e.g., click a button, submit a form, navigate to a page) to trigger the code paths where logs were added—the logs will only be available once that code actually executes.
5. **Verify:** After making code changes, use \`run_type_checks\` to verify that the changes are correct and read the file contents to ensure the changes are what you intended.
6. **Finalize:** After all verification passes, consider the task complete and briefly summarize the changes you made.
</development_workflow>
......
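The updated tool-selection table and fallback rule in the prompt above can be restated as a small decision function. This is only an illustrative sketch: in practice the model makes this judgment from the prompt, and the names `EditPlan` and `chooseEditToolSketch`, as well as the exact 0.5 cutoff for "about half", are assumptions that do not appear in the repository.

```typescript
type EditTool = "search_replace" | "write_file";

// Hypothetical description of a planned edit (not a type from the codebase).
interface EditPlan {
  isNewFile: boolean;
  // Estimated share of the original file that will change, from 0 to 1.
  changedFraction: number;
  // Number of separate regions of the file that need to change.
  distinctRegions: number;
  // How many times search_replace has just failed on this same edit.
  consecutiveSearchReplaceFailures: number;
}

function chooseEditToolSketch(plan: EditPlan): { tool: EditTool; calls: number } {
  // New files and majority rewrites go through write_file.
  if (plan.isNewFile) return { tool: "write_file", calls: 1 };
  // Fallback rule: after two failed search_replace attempts, stop retrying.
  if (plan.consecutiveSearchReplaceFailures >= 2) return { tool: "write_file", calls: 1 };
  // Assumed cutoff: less than half of the original file would remain.
  if (plan.changedFraction > 0.5) return { tool: "write_file", calls: 1 };
  // Otherwise lean toward search_replace, one call per distinct region.
  return { tool: "search_replace", calls: Math.max(1, plan.distinctRegions) };
}

console.log(
  chooseEditToolSketch({
    isNewFile: false,
    changedFraction: 0.3,
    distinctRegions: 3,
    consecutiveSearchReplaceFailures: 0,
  }),
);
```

The ordering of the checks reflects one reading of the prompt, in which the fallback rule takes priority over the size-based rows.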
# Evals
-LLM eval suite for tool-use quality. Six suites run the same 16 cases and
+LLM eval suite for tool-use quality. Five suites run the same 16 cases and
 the same three models (Claude Sonnet 4.6, GPT 5.4, Gemini 3 Flash) but with
 different tool sets and system prompts:
-| Suite name | Tools available | System prompt |
-| ------------------------ | ------------------------------------------- | -------------------------------------------- |
-| `search_replace` | `search_replace` only | Minimal custom "precise code editor" prompt |
-| `search_replace_few` | `search_replace` only | Variant prompt encouraging fewer tool calls |
-| `edit_file` | `edit_file` only | Minimal custom `edit_file` prompt |
-| `basic_agent` | `search_replace`, `write_file` | Production `LOCAL_AGENT_BASIC_SYSTEM_PROMPT` |
-| `pro_agent` | `search_replace`, `edit_file`, `write_file` | Production `LOCAL_AGENT_SYSTEM_PROMPT` (Pro) |
-| `pro_agent_experimental` | `search_replace`, `edit_file`, `write_file` | Editable copy of the Pro prompt for tweaking |
+| Suite name | Tools available | System prompt |
+| ------------------------ | ------------------------------ | -------------------------------------------- |
+| `search_replace` | `search_replace` only | Minimal custom "precise code editor" prompt |
+| `search_replace_few` | `search_replace` only | Variant prompt encouraging fewer tool calls |
+| `basic_agent` | `search_replace`, `write_file` | Production `LOCAL_AGENT_BASIC_SYSTEM_PROMPT` |
+| `pro_agent` | `search_replace`, `write_file` | Production `LOCAL_AGENT_SYSTEM_PROMPT` (Pro) |
+| `pro_agent_experimental` | `search_replace`, `write_file` | Editable copy of the Pro prompt for tweaking |
Each case gives the model a real source file plus an editing instruction,
runs the model with the suite's tools wired up, applies the produced edits,
......@@ -21,9 +20,7 @@ instruction.
## Prerequisites
All models are routed through the Dyad Engine gateway, so you only need one
-credential: a Dyad Pro API key, exposed as `DYAD_PRO_API_KEY`. The
-`edit_file` tool additionally calls the engine's `/tools/turbo-file-edit`
-endpoint to apply sketched edits — that uses the same key.
+credential: a Dyad Pro API key, exposed as `DYAD_PRO_API_KEY`.
The suite is skipped entirely when `DYAD_PRO_API_KEY` is unset — no tests will
fail, they just won't run. This keeps regular `vitest run` safe for contributors
......@@ -63,11 +60,9 @@ EVAL_SUITE=all EVAL_MODEL=all DYAD_PRO_API_KEY="..." npm run eval
**Heads up — this is expensive.** A full `all`/`all` run issues one
generation per (suite × model × case) triple plus one judge call per case,
-across 6 suites, 3 models, and 16 cases. The `edit_file`, `pro_agent`, and
-`pro_agent_experimental` suites also make additional engine calls for each
-sketched edit the model produces through `edit_file`. Expect dozens of LLM requests, some of which run reasoning
-models on 300+ line fixtures. Use sparingly; prefer narrow filters during
-development.
+across 5 suites, 3 models, and 16 cases. Expect dozens of LLM requests,
+some of which run reasoning models on 300+ line fixtures. Use sparingly;
+prefer narrow filters during development.
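As a rough check on the cost warning above, the number of generations in a full `all`/`all` run is the product of the three dimensions. The sketch below is illustrative only; `generationCount` is a hypothetical helper, not part of the harness, and judge calls come on top of these figures.

```typescript
// Hypothetical helper: one generation per (suite x model x case) triple.
function generationCount(suites: number, models: number, cases: number): number {
  return suites * models * cases;
}

const afterThisChange = generationCount(5, 3, 16); // 240
const beforeThisChange = generationCount(6, 3, 16); // 288

console.log({ afterThisChange, beforeThisChange });
```

So removing the `edit_file` suite saves 48 generations per full run, before counting the extra engine calls that suite used to make.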
### Running a single suite
......@@ -82,13 +77,13 @@ EVAL_SUITE=search_replace EVAL_MODEL=all DYAD_PRO_API_KEY="..." npm run eval
# The basic_agent suite (Basic agent prompt, search_replace + write_file)
EVAL_SUITE=basic_agent EVAL_MODEL=all DYAD_PRO_API_KEY="..." npm run eval
-# The pro_agent suite (Pro agent prompt, search_replace + edit_file + write_file)
+# The pro_agent suite (Pro agent prompt, search_replace + write_file)
EVAL_SUITE=pro_agent EVAL_MODEL=all DYAD_PRO_API_KEY="..." npm run eval
```
Note: `EVAL_SUITE` matches suite `name`s exactly (case-insensitive), and
accepts a comma-separated list for multiple suites (e.g.
-`EVAL_SUITE=search_replace,edit_file`). Unknown names error out with the
+`EVAL_SUITE=search_replace,basic_agent`). Unknown names error out with the
available list.
### Running a single case
......@@ -173,7 +168,6 @@ directory:
- `eval-results/search_replace/`
- `eval-results/search_replace_few/`
- `eval-results/edit_file/`
- `eval-results/basic_agent/`
- `eval-results/pro_agent/`
- `eval-results/pro_agent_experimental/`
......@@ -206,8 +200,8 @@ one folder. Folder names sort chronologically under `ls`.
 - `toolCalls` — every tool call the model made. Each entry records
 `toolName`, `filePath`, an `args` map (keyed by the tool's parameter names,
 so `old_string`/`new_string` for `search_replace`, `content` for
-`write_file`, `content`/`instructions` for `edit_file`), the file before
-and after the call, and a unified diff of just that call.
+`write_file`), the file before and after the call, and a unified diff of
+just that call.
 - `diff` — unified diff from the original fixture to the final file
 (i.e. the cumulative effect of all tool calls).
 - `judge` — the judge's verdict: `label`, `modelName`, `durationMs`,
......@@ -251,5 +245,4 @@ call. The split view contains the raw pieces as standalone files:
target file's extension (for syntax highlighting); non-string args become
JSON blobs. So a `search_replace` call produces `old_string.ts` and
`new_string.ts`; a `write_file` call produces `content.ts` and
-`description.ts`; an `edit_file` call produces `content.ts` and
-`instructions.ts`.
+`description.ts`.
......@@ -48,7 +48,7 @@ export interface ToolCallRecord {
filePath: string;
// Raw tool input arguments, keyed by the tool's parameter names
// (e.g. `old_string`/`new_string` for search_replace, `content` for
-// write_file, `content`/`instructions` for edit_file).
+// write_file).
args: Record<string, unknown>;
fileBefore: string;
fileAfter: string;
......
......@@ -13,8 +13,7 @@ export type EvalProvider = "anthropic" | "openai" | "google";
// the judge model.
export const GPT_5_4 = "gpt-5.4";
-// Single source of truth for the Dyad Engine URL across the eval helpers
-// and any out-of-band fetches the harness makes (e.g. turbo-file-edit).
+// Single source of truth for the Dyad Engine URL across the eval helpers.
export const DYAD_ENGINE_URL =
process.env.DYAD_ENGINE_URL ?? "https://engine.dyad.sh/v1";
......
......@@ -16,11 +16,6 @@ export const SIMPLE_SEARCH_REPLACE_SYSTEM_PROMPT =
"use the search_replace tool. You may call it multiple times " +
"to make sequential edits. Do not explain.";
export const SIMPLE_EDIT_FILE_SYSTEM_PROMPT =
"You are a precise code editor. When asked to change a file, " +
"use the edit_file tool. You may call it multiple times " +
"to make sequential edits. Do not explain.";
export const SEARCH_REPLACE_FEW_SYSTEM_PROMPT =
SIMPLE_SEARCH_REPLACE_SYSTEM_PROMPT +
" Aim to use as few tool calls as possible — ideally a single call " +
......@@ -88,23 +83,24 @@ You have tools at your disposal to solve the coding task. Follow these rules reg
 const PRO_TOOL_CALLING_BEST_PRACTICES_BLOCK = `<tool_calling_best_practices>
 - **Read before writing**: Use \`read_file\` and \`list_files\` to understand the codebase before making changes
-- **Use \`edit_file\` for edits**: For modifying existing files, prefer \`edit_file\` over \`write_file\`
+- **Prefer \`search_replace\` for edits**: For small to medium edits on existing files, use \`search_replace\` rather than rewriting the whole file
 - **Be surgical**: Only change what's necessary to accomplish the task
 - **Handle errors gracefully**: If a tool fails, explain the issue and suggest alternatives
 </tool_calling_best_practices>`;
 const PRO_FILE_EDITING_TOOL_SELECTION_BLOCK = `<file_editing_tool_selection>
-You have three tools for editing files. Choose based on the scope of your change:
+You have two tools for editing files. Choose based on the scope of your change:
 | Scope | Tool | Examples |
 |-------|------|----------|
-| **Small** (a few lines) | \`search_replace\` or \`edit_file\` | Fix a typo, rename a variable, update a value, change an import |
-| **Medium** (one function or section) | \`edit_file\` | Rewrite a function, add a new component, modify multiple related lines |
-| **Large** (most of the file) | \`write_file\` | Major refactor, rewrite a module, create a new file |
+| **Small to medium** (a few lines up to one function or contiguous section) | Single \`search_replace\` | Fix a typo, rename a variable, update a value, change an import, rewrite a function, modify multiple related lines |
+| **Moderately large** (changes spread across multiple parts of the file, up to about half of it) | Multiple \`search_replace\` calls, one per distinct region | Update several functions, change an import plus update its call sites, refactor a few related sections |
+| **Large** (rewriting the majority of the file, or creating a new file) | \`write_file\` | Major refactor that touches most of the file, rewrite a module end-to-end, create a new file |
+Lean toward \`search_replace\` when in doubt — for moderately large edits, prefer several targeted \`search_replace\` calls over one \`write_file\`. Use \`write_file\` when less than half of the original file will remain.
-**Tips:**
-- \`edit_file\` supports \`// ... existing code ...\` markers to skip unchanged sections
-- When in doubt, prefer \`search_replace\` for precision or \`write_file\` for simplicity
+**Fallback rule:**
+If \`search_replace\` fails twice in a row on the same edit (e.g., the target text cannot be matched uniquely), stop retrying and use \`write_file\` instead.
 **Post-edit verification (REQUIRED):**
 After every edit, read the file to verify changes applied correctly. If something went wrong, try a different tool and verify again.
......@@ -117,7 +113,7 @@ const PRO_DEVELOPMENT_WORKFLOW_BLOCK = `<development_workflow>
**Skip when:** the request is specific and concrete (e.g. "Fix the login button", "Change color from blue to green").
The tool accepts ONLY a \`questions\` array (no empty objects). It returns the user's answers as the tool result.
3. **Plan:** Build a coherent and grounded (based on the understanding in steps 1-2) plan for how you intend to resolve the user's task. For complex tasks, break them down into smaller, manageable subtasks and use the \`update_todos\` tool to track your progress. Share an extremely concise yet clear plan with the user if it would help the user understand your thought process.
-4. **Implement:** Use the available tools (e.g., \`edit_file\`, \`write_file\`, ...) to act on the plan, strictly adhering to the project's established conventions. When debugging, add targeted console.log statements to trace data flow and identify root causes. **Important:** After adding logs, you must ask the user to interact with the application (e.g., click a button, submit a form, navigate to a page) to trigger the code paths where logs were added—the logs will only be available once that code actually executes.
+4. **Implement:** Use the available tools (e.g., \`search_replace\`, \`write_file\`, ...) to act on the plan, strictly adhering to the project's established conventions. When debugging, add targeted console.log statements to trace data flow and identify root causes. **Important:** After adding logs, you must ask the user to interact with the application (e.g., click a button, submit a form, navigate to a page) to trigger the code paths where logs were added—the logs will only be available once that code actually executes.
5. **Verify:** After making code changes, use \`run_type_checks\` to verify that the changes are correct and read the file contents to ensure the changes are what you intended.
6. **Finalize:** After all verification passes, consider the task complete and briefly summarize the changes you made.
</development_workflow>`;
......
......@@ -2,10 +2,8 @@ import { describe, it } from "vitest";
import { generateText, stepCountIs, type Tool } from "ai";
import { readFileSync } from "node:fs";
import { basename, resolve } from "node:path";
import { randomUUID } from "node:crypto";
import { searchReplaceTool } from "@/pro/main/ipc/handlers/local_agent/tools/search_replace";
import { writeFileTool } from "@/pro/main/ipc/handlers/local_agent/tools/write_file";
import { editFileTool } from "@/pro/main/ipc/handlers/local_agent/tools/edit_file";
import { applySearchReplace } from "@/pro/main/ipc/processors/search_replace_processor";
import { escapeSearchReplaceMarkers } from "@/pro/shared/search_replace_markers";
import { constructLocalAgentPrompt } from "@/prompts/local_agent_prompt";
......@@ -14,7 +12,6 @@ import {
GEMINI_3_FLASH,
} from "@/ipc/shared/language_model_constants";
import {
DYAD_ENGINE_URL,
GPT_5_4,
getEvalModel,
hasDyadProKey,
......@@ -31,7 +28,6 @@ import {
import { createUnifiedDiff } from "./helpers/unified_diff";
import {
SIMPLE_SEARCH_REPLACE_SYSTEM_PROMPT,
SIMPLE_EDIT_FILE_SYSTEM_PROMPT,
SEARCH_REPLACE_FEW_SYSTEM_PROMPT,
PRO_AGENT_EXPERIMENTAL_SYSTEM_PROMPT,
} from "./helpers/prompts";
......@@ -435,53 +431,6 @@ function applySearchReplaceEdit(
return applied.content!;
}
// Stand-in for the production `edit_file` tool's engine call. Mirrors
// `callTurboFileEdit` in src/pro/main/ipc/handlers/local_agent/tools/edit_file.ts
// but reaches the engine directly (no AgentContext required). The base URL is
// imported from `helpers/get_eval_model` so this and the SDK provider can't
// drift apart.
async function turboFileEdit(params: {
path: string;
content: string;
originalContent: string;
instructions?: string;
signal?: AbortSignal;
}): Promise<string> {
const apiKey = process.env.DYAD_PRO_API_KEY;
if (!apiKey) {
throw new Error(
"DYAD_PRO_API_KEY is required to run eval suites that use edit_file",
);
}
const response = await fetch(`${DYAD_ENGINE_URL}/tools/turbo-file-edit`, {
method: "POST",
headers: {
"Content-Type": "application/json",
Authorization: `Bearer ${apiKey}`,
"X-Dyad-Request-Id": randomUUID(),
},
body: JSON.stringify({
path: params.path,
content: params.content,
originalContent: params.originalContent,
instructions: params.instructions ?? "",
}),
signal: params.signal,
});
if (!response.ok) {
const errorText = await response.text();
throw new Error(
`turbo-file-edit failed: ${response.status} ${response.statusText} - ${errorText}`,
);
}
const data = (await response.json()) as { result?: unknown };
if (typeof data.result !== "string") {
throw new Error("turbo-file-edit returned unexpected payload (no result)");
}
return data.result;
}
// ── Tool factories ─────────────────────────────────────────────
//
// Each factory returns an AI-SDK tool whose `execute` mutates the
......@@ -628,66 +577,6 @@ function writeFileHarnessTool(
};
}
function editFileHarnessTool(
state: ToolRunState,
c: EvalCase,
label: string,
): Tool {
return {
description: editFileTool.description,
inputSchema: editFileTool.inputSchema,
execute: async (args) => {
const fileBefore = state.content;
const recordArgs = {
path: args.path,
content: args.content,
instructions: args.instructions ?? "",
};
try {
if (!pathMatchesCase(args.path, c.fileName)) {
throw new Error(
`${label} / ${c.name} edit_file targeted wrong file: ` +
`got "${args.path}", expected "${c.fileName}"`,
);
}
const newContent = await turboFileEdit({
path: args.path,
content: args.content,
originalContent: state.content,
instructions: args.instructions,
signal: state.abortSignal,
});
state.content = newContent;
state.toolCalls.push(
makeRecord(
"edit_file",
args.path,
recordArgs,
fileBefore,
state.content,
state.toolCalls.length,
),
);
return `Successfully edited ${args.path}`;
} catch (err) {
const message = err instanceof Error ? err.message : String(err);
state.toolCalls.push(
makeRecord(
"edit_file",
args.path ?? c.fileName,
recordArgs,
fileBefore,
fileBefore,
state.toolCalls.length,
{ succeeded: false, error: message },
),
);
throw err;
}
},
};
}
// ── Suite configs ──────────────────────────────────────────────
interface SuiteConfig {
......@@ -718,14 +607,6 @@ const SUITES: SuiteConfig[] = [
search_replace: searchReplaceHarnessTool(state, c, label),
}),
},
{
name: "edit_file",
displayName: "edit_file",
systemPrompt: SIMPLE_EDIT_FILE_SYSTEM_PROMPT,
buildTools: (state, c, label) => ({
edit_file: editFileHarnessTool(state, c, label),
}),
},
{
name: "basic_agent",
displayName: "basic_agent (search_replace + write_file)",
......@@ -739,11 +620,10 @@ const SUITES: SuiteConfig[] = [
},
{
name: "pro_agent",
-displayName: "pro_agent (search_replace + edit_file + write_file)",
+displayName: "pro_agent (search_replace + write_file)",
 systemPrompt: constructLocalAgentPrompt(undefined),
 buildTools: (state, c, label) => ({
 search_replace: searchReplaceHarnessTool(state, c, label),
-edit_file: editFileHarnessTool(state, c, label),
write_file: writeFileHarnessTool(state, c, label),
}),
},
......@@ -756,7 +636,6 @@ const SUITES: SuiteConfig[] = [
systemPrompt: PRO_AGENT_EXPERIMENTAL_SYSTEM_PROMPT,
buildTools: (state, c, label) => ({
search_replace: searchReplaceHarnessTool(state, c, label),
edit_file: editFileHarnessTool(state, c, label),
write_file: writeFileHarnessTool(state, c, label),
}),
},
......@@ -981,7 +860,7 @@ async function runCase(
// against every model by accident is expensive, so the caller must opt
// in explicitly. Use `all` to mean "run everything". `EVAL_SUITE` matches
// suite names exactly (comma-separated for multiple, e.g.
-// `EVAL_SUITE=search_replace,edit_file`) so that `search_replace` does
+// `EVAL_SUITE=search_replace,basic_agent`) so that `search_replace` does
// not also pick up `search_replace_few`. `EVAL_MODEL` is a
// case-insensitive substring match against model label or id.
......
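The filter comment in the hunk above describes two different matching rules: exact names for `EVAL_SUITE` and substring matching for `EVAL_MODEL`. A minimal sketch of that behavior follows. The names `matchSuites`, `matchModels`, and `ModelEntry` and the error wording are hypothetical, the harness's actual implementation is not shown in this diff, and treating `all` as a wildcard for models is an assumption based on the `EVAL_MODEL=all` usage in the README.

```typescript
// Hypothetical helper: exact, case-insensitive, comma-separated suite matching.
function matchSuites(filter: string, available: string[]): string[] {
  const wanted = filter
    .split(",")
    .map((s) => s.trim().toLowerCase())
    .filter((s) => s.length > 0);
  if (wanted.indexOf("all") !== -1) return available.slice();
  const known = available.map((n) => n.toLowerCase());
  const unknown = wanted.filter((w) => known.indexOf(w) === -1);
  if (unknown.length > 0) {
    // Unknown names error out and list what is available.
    throw new Error(
      `Unknown suite(s): ${unknown.join(", ")}. Available: ${available.join(", ")}`,
    );
  }
  return available.filter((n) => wanted.indexOf(n.toLowerCase()) !== -1);
}

// Hypothetical shape for a model entry.
interface ModelEntry {
  label: string;
  id: string;
}

// Hypothetical helper: case-insensitive substring match on label or id.
function matchModels(filter: string, models: ModelEntry[]): ModelEntry[] {
  const needle = filter.trim().toLowerCase();
  if (needle === "all") return models.slice();
  return models.filter(
    (m) =>
      m.label.toLowerCase().indexOf(needle) !== -1 ||
      m.id.toLowerCase().indexOf(needle) !== -1,
  );
}

const suiteNames = [
  "search_replace",
  "search_replace_few",
  "basic_agent",
  "pro_agent",
  "pro_agent_experimental",
];
console.log(matchSuites("search_replace,basic_agent", suiteNames));
```

Exact matching is what keeps `search_replace` from also selecting `search_replace_few`, and it is why a stale filter such as `edit_file` fails instead of silently running nothing.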
const locks = new Map<number | string, Promise<void>>();
/**
* Build the lock ID used to serialize writes to a single file path.
* Some tool calls (e.g. `write_file` and `search_replace`) must use
* this so they don't race against each other on the same file.
*/
export function getFileWriteKey(filePath: string): string {
return `filewrite:${filePath}`;
}
/**
* Executes a function with a lock on the lock ID.
* Uses promise-chaining so that queued operations execute serially,
......
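The lock helper's body is cut off in the hunk above. Based only on its doc comment (promise-chaining so queued operations run serially), a minimal sketch could look like the following. `withLockSketch` is a hypothetical name and signature; the repository's real function is not shown here and may differ in naming, cleanup, and error handling.

```typescript
const locks = new Map<number | string, Promise<void>>();

// Same key format as getFileWriteKey in the diff above.
function getFileWriteKey(filePath: string): string {
  return `filewrite:${filePath}`;
}

// Hypothetical sketch: run `fn` only after every earlier holder of `lockId` finishes.
async function withLockSketch<T>(
  lockId: number | string,
  fn: () => Promise<T>,
): Promise<T> {
  const previous = locks.get(lockId) ?? Promise.resolve();
  let release!: () => void;
  const current = new Promise<void>((resolve) => {
    release = resolve;
  });
  // The new tail settles once the previous holder and this one are both done.
  const tail = previous.then(() => current);
  locks.set(lockId, tail);
  await previous;
  try {
    return await fn();
  } finally {
    // Always release, even when fn throws, so later callers are not blocked.
    release();
    // Drop the entry if nobody queued behind us, so the map does not grow.
    if (locks.get(lockId) === tail) locks.delete(lockId);
  }
}
```

Under this sketch, `write_file` and `search_replace` would both wrap their file mutation in `withLockSketch(getFileWriteKey(path), ...)`, so two parallel tool calls on the same path run one after the other while calls on different paths stay concurrent.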
......@@ -21,7 +21,6 @@ import { getSupabaseProjectInfoTool } from "./tools/get_supabase_project_info";
import { setChatSummaryTool } from "./tools/set_chat_summary";
import { addIntegrationTool } from "./tools/add_integration";
import { readLogsTool } from "./tools/read_logs";
import { editFileTool } from "./tools/edit_file";
import { searchReplaceTool } from "./tools/search_replace";
import { webSearchTool } from "./tools/web_search";
import { webCrawlTool } from "./tools/web_crawl";
......@@ -70,7 +69,6 @@ function getToolErrorSummary(error: unknown): string {
// Combined tool definitions array
export const TOOL_DEFINITIONS: readonly ToolDefinition[] = [
writeFileTool,
editFileTool,
searchReplaceTool,
copyFileTool,
deleteFileTool,
......@@ -420,7 +418,6 @@ function trackFileEditTool(
if (!ctx.fileEditTracker[filePath]) {
ctx.fileEditTracker[filePath] = {
write_file: 0,
edit_file: 0,
search_replace: 0,
};
}
......
import fs from "node:fs";
import path from "node:path";
import { z } from "zod";
import log from "electron-log";
import { ToolDefinition, AgentContext, escapeXmlAttr } from "./types";
import { safeJoin } from "@/ipc/utils/path_utils";
import { deploySupabaseFunction } from "../../../../../../supabase_admin/supabase_management_client";
import {
isServerFunction,
isSharedServerModule,
} from "../../../../../../supabase_admin/supabase_utils";
import { engineFetch } from "./engine_fetch";
import { DyadError, DyadErrorKind } from "@/errors/dyad_error";
import { queueCloudSandboxSnapshotSync } from "@/ipc/utils/cloud_sandbox_provider";
const readFile = fs.promises.readFile;
const logger = log.scope("edit_file");
const editFileSchema = z.object({
path: z.string().describe("The file path relative to the app root"),
content: z.string().describe("The updated code snippet to apply"),
instructions: z
.string()
.optional()
.describe(
"Instructions for the edit. A single sentence describing what you are going to do for the sketched edit. This helps the less intelligent model apply the edit correctly. Use first person to describe what you are doing. Don't repeat what you've said in previous messages. Use it to disambiguate any uncertainty in the edit.",
),
});
const turboFileEditResponseSchema = z.object({
result: z.string(),
});
async function callTurboFileEdit(
params: {
path: string;
content: string;
originalContent: string;
instructions?: string;
},
ctx: AgentContext,
): Promise<string> {
const response = await engineFetch(ctx, "/tools/turbo-file-edit", {
method: "POST",
body: JSON.stringify({
path: params.path,
content: params.content,
originalContent: params.originalContent,
instructions: params.instructions ?? "",
}),
});
if (!response.ok) {
const errorText = await response.text();
throw new Error(
`File edit failed: ${response.status} ${response.statusText} - ${errorText}`,
);
}
const data = turboFileEditResponseSchema.parse(await response.json());
return data.result;
}
const DESCRIPTION = `
## When to Use edit_file
Use the \`edit_file\` tool when you need to modify **a section or function** within an existing file. The edit output will be read by a less intelligent model, which will quickly apply the edit. You should make it clear what the edit is, while also minimizing the unchanged code you write.
**Use only ONE edit_file call per file.** If you need to make multiple changes to the same file, include all edits in sequence within a single call using \`// ... existing code ...\` comments between them.
## When NOT to Use edit_file
Do NOT use this tool when:
- You are making a **small, surgical edit** (1-3 lines) like fixing a typo, renaming a variable, updating a single value, or changing an import. Use \`search_replace\` instead for these precise changes.
- You are creating a brand-new file (use \`write_file\` instead).
- You are rewriting most of an existing file (in those cases, use \`write_file\` to output the complete file instead).
## Basic Format
When writing the edit, you should specify each edit in sequence, with the special comment // ... existing code ... to represent unchanged code in between edited lines.
Basic example:
\`\`\`
edit_file(path="file.js", instructions="I am adding error handling to the fetchData function and updating the return type.", content="""
// ... existing code ...
FIRST_EDIT
// ... existing code ...
SECOND_EDIT
// ... existing code ...
THIRD_EDIT
// ... existing code ...
""")
\`\`\`
## General Principles
You should bias towards repeating as few lines of the original file as possible to convey the change.
NEVER show unmodified code in the edit, unless sufficient context of unchanged lines around the code you're editing is needed to resolve ambiguity.
DO NOT omit spans of pre-existing code without using the // ... existing code ... comment to indicate its absence.
## Example: Basic Edit
\`\`\`
edit_file(path="LandingPage.tsx", instructions="I am changing the return statement in LandingPage to render a div with 'hello' instead of the previous content.", content="""
// ... existing code ...
const LandingPage = () => {
// ... existing code ...
return (
<div>hello</div>
);
};
// ... existing code ...
""")
\`\`\`
## Example: Deleting Code
**When deleting code, you must provide surrounding context and leave an explicit comment indicating what was removed.**
\`\`\`
edit_file(path="utils.ts", instructions="I am removing the deprecatedHelper function located between currentHelper and anotherHelper.", content="""
// ... existing code ...
export function currentHelper() {
return "active";
}
// REMOVED: deprecatedHelper() function
export function anotherHelper() {
return "working";
}
// ... existing code ...
""")
\`\`\`
`;
export const editFileTool: ToolDefinition<z.infer<typeof editFileSchema>> = {
name: "edit_file",
description: DESCRIPTION,
inputSchema: editFileSchema,
defaultConsent: "always",
modifiesState: true,
// Requires Dyad Pro engine API
isEnabled: (ctx) => ctx.isDyadPro,
getConsentPreview: (args) => `Edit ${args.path}`,
buildXml: (args, isComplete) => {
if (!args.path) return undefined;
let xml = `<dyad-edit path="${escapeXmlAttr(args.path)}" description="${escapeXmlAttr(args.instructions ?? "")}">\n${args.content ?? ""}`;
if (isComplete) {
xml += "\n</dyad-edit>";
}
return xml;
},
execute: async (args, ctx: AgentContext) => {
const fullFilePath = safeJoin(ctx.appPath, args.path);
// Track if this is a shared module
if (isSharedServerModule(args.path)) {
ctx.isSharedModulesChanged = true;
}
// Read original file content
if (!fs.existsSync(fullFilePath)) {
throw new DyadError(
`File does not exist: ${args.path}`,
DyadErrorKind.NotFound,
);
}
const originalContent = await readFile(fullFilePath, "utf8");
// Call the turbo-file-edit endpoint
const newContent = await callTurboFileEdit(
{
path: args.path,
content: args.content,
originalContent,
instructions: args.instructions,
},
ctx,
);
if (!newContent) {
throw new Error(
"Failed to extract content from turbo-file-edit response",
);
}
// Ensure directory exists
const dirPath = path.dirname(fullFilePath);
fs.mkdirSync(dirPath, { recursive: true });
// Write file content
fs.writeFileSync(fullFilePath, newContent);
logger.log(`Successfully edited file: ${fullFilePath}`);
queueCloudSandboxSnapshotSync({
appId: ctx.appId,
changedPaths: [args.path],
});
// Deploy Supabase function if applicable
if (
ctx.supabaseProjectId &&
isServerFunction(args.path) &&
!ctx.isSharedModulesChanged
) {
try {
await deploySupabaseFunction({
supabaseProjectId: ctx.supabaseProjectId,
functionName: path.basename(path.dirname(args.path)),
appPath: ctx.appPath,
organizationSlug: ctx.supabaseOrganizationSlug ?? null,
});
} catch (error) {
return `File edited, but failed to deploy Supabase function: ${error}`;
}
}
return `Successfully edited ${args.path}`;
},
};
......@@ -19,6 +19,7 @@ import {
import { sendTelemetryEvent } from "@/ipc/utils/telemetry";
import { DyadError, DyadErrorKind } from "@/errors/dyad_error";
import { queueCloudSandboxSnapshotSync } from "@/ipc/utils/cloud_sandbox_provider";
import { withLock, getFileWriteKey } from "@/ipc/utils/lock_utils";
const logger = log.scope("search_replace");
......@@ -106,40 +107,42 @@ CRITICAL REQUIREMENTS FOR USING THIS TOOL:
ctx.isSharedModulesChanged = true;
}
if (!fs.existsSync(fullFilePath)) {
throw new DyadError(
`File does not exist: ${args.file_path}`,
DyadErrorKind.NotFound,
);
}
await withLock(getFileWriteKey(fullFilePath), async () => {
if (!fs.existsSync(fullFilePath)) {
throw new DyadError(
`File does not exist: ${args.file_path}`,
DyadErrorKind.NotFound,
);
}
const original = await fs.promises.readFile(fullFilePath, "utf8");
const original = await fs.promises.readFile(fullFilePath, "utf8");
// Construct the operations string in the expected format
const escapedOld = escapeSearchReplaceMarkers(args.old_string);
const escapedNew = escapeSearchReplaceMarkers(args.new_string);
const operations = `<<<<<<< SEARCH\n${escapedOld}\n=======\n${escapedNew}\n>>>>>>> REPLACE`;
// Construct the operations string in the expected format
const escapedOld = escapeSearchReplaceMarkers(args.old_string);
const escapedNew = escapeSearchReplaceMarkers(args.new_string);
const operations = `<<<<<<< SEARCH\n${escapedOld}\n=======\n${escapedNew}\n>>>>>>> REPLACE`;
const result = applySearchReplace(original, operations);
const result = applySearchReplace(original, operations);
if (!result.success || typeof result.content !== "string") {
sendTelemetryEvent("local_agent:search_replace:failure", {
if (!result.success || typeof result.content !== "string") {
sendTelemetryEvent("local_agent:search_replace:failure", {
filePath: args.file_path,
error: result.error ?? "unknown",
});
throw new Error(
`Failed to apply search-replace: ${result.error ?? "unknown"}`,
);
}
await fs.promises.writeFile(fullFilePath, result.content);
logger.log(`Successfully applied search-replace to: ${fullFilePath}`);
queueCloudSandboxSnapshotSync({
appId: ctx.appId,
changedPaths: [args.file_path],
});
sendTelemetryEvent("local_agent:search_replace:success", {
filePath: args.file_path,
error: result.error ?? "unknown",
});
throw new Error(
`Failed to apply search-replace: ${result.error ?? "unknown"}`,
);
}
await fs.promises.writeFile(fullFilePath, result.content);
logger.log(`Successfully applied search-replace to: ${fullFilePath}`);
queueCloudSandboxSnapshotSync({
appId: ctx.appId,
changedPaths: [args.file_path],
});
sendTelemetryEvent("local_agent:search_replace:success", {
filePath: args.file_path,
});
// Deploy Supabase function if applicable
......
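The hunk above wraps the read-modify-write sequence of `search_replace` in `withLock(getFileWriteKey(fullFilePath), ...)`, but the body of `lock_utils` is not part of this diff. As a rough sketch only, assuming a promise-chain lock keyed by a string, the following shows why serializing two edits to the same file avoids a lost update. The bodies of `withLock` and `getFileWriteKey`, the `files` map that stands in for the filesystem, and the `replaceInFile` and `demo` helpers are all hypothetical.

```typescript
// Hypothetical stand-ins for the helpers imported from "@/ipc/utils/lock_utils".
// The real implementation is not shown in this diff and may differ.
const lockTails = new Map<string, Promise<void>>();

// Assumed key format; the real getFileWriteKey may normalize paths differently.
function getFileWriteKey(filePath: string): string {
  return `file-write:${filePath}`;
}

// Runs fn only after every earlier holder of the same key has finished.
async function withLock<T>(key: string, fn: () => Promise<T>): Promise<T> {
  const previous = lockTails.get(key) ?? Promise.resolve();
  let release!: () => void;
  const current = new Promise<void>((resolve) => {
    release = resolve;
  });
  // The tail resolves once both the earlier holders and this holder are done.
  const tail = previous.then(() => current);
  lockTails.set(key, tail);
  await previous;
  try {
    return await fn();
  } finally {
    release();
    // Drop the map entry if nobody queued behind this holder.
    if (lockTails.get(key) === tail) lockTails.delete(key);
  }
}

// In-memory stand-in for the filesystem (assumption, for illustration only).
const files = new Map<string, string>([["a.ts", "one two"]]);

async function replaceInFile(
  filePath: string,
  oldText: string,
  newText: string,
): Promise<void> {
  await withLock(getFileWriteKey(filePath), async () => {
    const original = files.get(filePath) ?? "";
    await Promise.resolve(); // yield, as an asynchronous file read would
    files.set(filePath, original.replace(oldText, newText));
  });
}

// Two concurrent edits to the same file; both changes survive.
async function demo(): Promise<string> {
  await Promise.all([
    replaceInFile("a.ts", "one", "1"),
    replaceInFile("a.ts", "two", "2"),
  ]);
  return files.get("a.ts") ?? "";
}
```

In this sketch `demo()` resolves to `"1 2"`. Without the lock, both calls would read `"one two"` before either wrote, and the second write would overwrite the first.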
......@@ -27,16 +27,11 @@ export {
 export type Todo = AgentTodo;
 /** Tracks which file-editing tools were used on each file path */
-export const FILE_EDIT_TOOL_NAMES = [
-  "write_file",
-  "edit_file",
-  "search_replace",
-] as const;
+export const FILE_EDIT_TOOL_NAMES = ["write_file", "search_replace"] as const;
 export type FileEditToolName = (typeof FILE_EDIT_TOOL_NAMES)[number];
 export interface FileEditTracker {
   [filePath: string]: {
     write_file: number;
-    edit_file: number;
     search_replace: number;
   };
 }
......
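With `edit_file` gone, the tracker only needs counters for the two remaining tools. The sketch below restates the declarations from the hunk above and adds two illustrative helpers. `trackFileEdit` is a hypothetical simplification of `trackFileEditTool` (the real function works on the agent context, which is not reproduced here), and `isFileEditToolName` is an assumed helper that does not appear in the diff.

```typescript
// Same declarations as in the diff, after edit_file was removed.
const FILE_EDIT_TOOL_NAMES = ["write_file", "search_replace"] as const;
type FileEditToolName = (typeof FILE_EDIT_TOOL_NAMES)[number];

interface FileEditTracker {
  [filePath: string]: {
    write_file: number;
    search_replace: number;
  };
}

// Hypothetical simplification of trackFileEditTool: the tracker is passed in
// directly instead of being read from the agent context.
function trackFileEdit(
  tracker: FileEditTracker,
  filePath: string,
  toolName: FileEditToolName,
): void {
  if (!tracker[filePath]) {
    tracker[filePath] = { write_file: 0, search_replace: 0 };
  }
  tracker[filePath][toolName] += 1;
}

// Hypothetical type guard: narrows an arbitrary tool name to a tracked one,
// so a stale "edit_file" call is ignored rather than counted.
function isFileEditToolName(name: string): name is FileEditToolName {
  return (FILE_EDIT_TOOL_NAMES as readonly string[]).includes(name);
}
```

Because `FileEditToolName` is derived from the `as const` tuple, passing `"edit_file"` to `trackFileEdit` becomes a compile-time error once the name is dropped from the tuple.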
......@@ -10,6 +10,7 @@ import {
isSharedServerModule,
} from "../../../../../../supabase_admin/supabase_utils";
import { queueCloudSandboxSnapshotSync } from "@/ipc/utils/cloud_sandbox_provider";
import { withLock, getFileWriteKey } from "@/ipc/utils/lock_utils";
const logger = log.scope("write_file");
const writeFileSchema = z.object({
......@@ -48,16 +49,18 @@ export const writeFileTool: ToolDefinition<z.infer<typeof writeFileSchema>> = {
ctx.isSharedModulesChanged = true;
}
// Ensure directory exists
const dirPath = path.dirname(fullFilePath);
fs.mkdirSync(dirPath, { recursive: true });
await withLock(getFileWriteKey(fullFilePath), async () => {
// Ensure directory exists
const dirPath = path.dirname(fullFilePath);
fs.mkdirSync(dirPath, { recursive: true });
// Write file content
fs.writeFileSync(fullFilePath, args.content);
logger.log(`Successfully wrote file: ${fullFilePath}`);
queueCloudSandboxSnapshotSync({
appId: ctx.appId,
changedPaths: [args.path],
// Write file content
fs.writeFileSync(fullFilePath, args.content);
logger.log(`Successfully wrote file: ${fullFilePath}`);
queueCloudSandboxSnapshotSync({
appId: ctx.appId,
changedPaths: [args.path],
});
});
// Deploy Supabase function if applicable
......
......@@ -68,23 +68,24 @@ You have tools at your disposal to solve the coding task. Follow these rules reg
const PRO_TOOL_CALLING_BEST_PRACTICES_BLOCK = `<tool_calling_best_practices>
- **Read before writing**: Use \`read_file\` and \`list_files\` to understand the codebase before making changes
- **Use \`edit_file\` for edits**: For modifying existing files, prefer \`edit_file\` over \`write_file\`
- **Prefer \`search_replace\` for edits**: For small to medium edits on existing files, use \`search_replace\` rather than rewriting the whole file
- **Be surgical**: Only change what's necessary to accomplish the task
- **Handle errors gracefully**: If a tool fails, explain the issue and suggest alternatives
</tool_calling_best_practices>`;
const PRO_FILE_EDITING_TOOL_SELECTION_BLOCK = `<file_editing_tool_selection>
You have three tools for editing files. Choose based on the scope of your change:
You have two tools for editing files. Choose based on the scope of your change:
| Scope | Tool | Examples |
|-------|------|----------|
| **Small** (a few lines) | \`search_replace\` or \`edit_file\` | Fix a typo, rename a variable, update a value, change an import |
| **Medium** (one function or section) | \`edit_file\` | Rewrite a function, add a new component, modify multiple related lines |
| **Large** (most of the file) | \`write_file\` | Major refactor, rewrite a module, create a new file |
| **Small to medium** (a few lines up to one function or contiguous section) | Single \`search_replace\` | Fix a typo, rename a variable, update a value, change an import, rewrite a function, modify multiple related lines |
| **Moderately large** (changes spread across multiple parts of the file, up to about half of it) | Multiple \`search_replace\` calls, one per distinct region | Update several functions, change an import plus update its call sites, refactor a few related sections |
| **Large** (rewriting the majority of the file, or creating a new file) | \`write_file\` | Major refactor that touches most of the file, rewrite a module end-to-end, create a new file |
**Tips:**
- \`edit_file\` supports \`// ... existing code ...\` markers to skip unchanged sections
- When in doubt, prefer \`search_replace\` for precision or \`write_file\` for simplicity
Lean toward \`search_replace\` when in doubt — for moderately large edits, prefer several targeted \`search_replace\` calls over one \`write_file\`. Use \`write_file\` when less than half of the original file will remain.
**Fallback rule:**
If \`search_replace\` fails twice in a row on the same edit (e.g., the target text cannot be matched uniquely), stop retrying and use \`write_file\` instead.
**Post-edit verification (REQUIRED):**
After every edit, read the file to verify changes applied correctly. If something went wrong, try a different tool and verify again.
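The selection table in the new prompt is guidance for the model rather than code, but the rule it encodes can be sketched as a small function. This is only an illustration: `PlannedEdit`, `ToolChoice`, and `selectEditTool` are hypothetical names, and the 0.5 threshold is one reading of "when less than half of the original file will remain".

```typescript
type EditTool = "search_replace" | "write_file";

// Hypothetical description of an edit the agent is about to make.
interface PlannedEdit {
  fileExists: boolean;
  // Number of distinct, non-adjacent regions that will change.
  regions: number;
  // Rough share of the original file that will be rewritten, from 0 to 1.
  fractionChanged: number;
}

interface ToolChoice {
  tool: EditTool;
  // How many tool calls the edit is expected to take.
  calls: number;
}

function selectEditTool(edit: PlannedEdit): ToolChoice {
  // New files are always written whole.
  if (!edit.fileExists) {
    return { tool: "write_file", calls: 1 };
  }
  // Less than half of the original would remain: rewrite the file.
  if (edit.fractionChanged > 0.5) {
    return { tool: "write_file", calls: 1 };
  }
  // Otherwise lean toward search_replace, one call per distinct region.
  return { tool: "search_replace", calls: Math.max(1, edit.regions) };
}
```

For example, an edit touching three separate functions that together make up about 30% of a file maps to three `search_replace` calls in this sketch.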
......@@ -97,7 +98,7 @@ const PRO_DEVELOPMENT_WORKFLOW_BLOCK = `<development_workflow>
**Skip when:** the request is specific and concrete (e.g. "Fix the login button", "Change color from blue to green").
The tool accepts ONLY a \`questions\` array (no empty objects). It returns the user's answers as the tool result.
3. **Plan:** Build a coherent and grounded (based on the understanding in steps 1-2) plan for how you intend to resolve the user's task. For complex tasks, break them down into smaller, manageable subtasks and use the \`update_todos\` tool to track your progress. Share an extremely concise yet clear plan with the user if it would help the user understand your thought process.
4. **Implement:** Use the available tools (e.g., \`edit_file\`, \`write_file\`, ...) to act on the plan, strictly adhering to the project's established conventions. When debugging, add targeted console.log statements to trace data flow and identify root causes. **Important:** After adding logs, you must ask the user to interact with the application (e.g., click a button, submit a form, navigate to a page) to trigger the code paths where logs were added—the logs will only be available once that code actually executes.
4. **Implement:** Use the available tools (e.g., \`search_replace\`, \`write_file\`, ...) to act on the plan, strictly adhering to the project's established conventions. When debugging, add targeted console.log statements to trace data flow and identify root causes. **Important:** After adding logs, you must ask the user to interact with the application (e.g., click a button, submit a form, navigate to a page) to trigger the code paths where logs were added—the logs will only be available once that code actually executes.
5. **Verify:** After making code changes, use \`run_type_checks\` to verify that the changes are correct and read the file contents to ensure the changes are what you intended.
6. **Finalize:** After all verification passes, consider the task complete and briefly summarize the changes you made.
</development_workflow>`;
......@@ -209,7 +210,7 @@ When a user explicitly requests custom images, illustrations, or visual media fo
 /**
  * System prompt for Local Agent v2 in Pro mode
- * Full access to all tools including edit_file, code_search, web_search, web_crawl
+ * Full access to all tools including code_search, web_search, web_crawl
  */
export const LOCAL_AGENT_SYSTEM_PROMPT = `
${ROLE_BLOCK}
......@@ -233,7 +234,7 @@ ${IMAGE_GENERATION_BLOCK}
 /**
  * System prompt for Local Agent v2 in Basic Agent mode (free tier)
- * Limited tools - no edit_file, code_search, web_search, web_crawl
+ * Limited tools - no code_search, web_search, web_crawl
  */
export const LOCAL_AGENT_BASIC_SYSTEM_PROMPT = `
${ROLE_BLOCK}
......
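The fallback rule added to the prompt (switch to `write_file` after `search_replace` fails twice on the same edit) is carried out by the model, not by application code. The sketch below only illustrates the intended behavior under stated assumptions: `applyUniqueReplace` is a hypothetical stand-in for the project's `applySearchReplace`, which is not shown in full here, and `editWithFallback` is an invented driver.

```typescript
type ReplaceResult =
  | { success: true; content: string }
  | { success: false; error: string };

// Hypothetical replacement routine: succeeds only when oldString occurs
// exactly once, which is the "matched uniquely" condition named in the prompt.
function applyUniqueReplace(
  original: string,
  oldString: string,
  newString: string,
): ReplaceResult {
  if (oldString.length === 0) {
    return { success: false, error: "old_string is empty" };
  }
  const first = original.indexOf(oldString);
  if (first === -1) {
    return { success: false, error: "old_string not found" };
  }
  if (first !== original.lastIndexOf(oldString)) {
    return { success: false, error: "old_string matches more than once" };
  }
  const content =
    original.slice(0, first) + newString + original.slice(first + oldString.length);
  return { success: true, content };
}

const MAX_CONSECUTIVE_FAILURES = 2;

interface ReplaceAttempt {
  oldString: string;
  newString: string;
}

// Tries each attempt in order. Falls back to a full rewrite after two
// consecutive failures, or when the list of attempts runs out.
function editWithFallback(
  original: string,
  attempts: ReplaceAttempt[],
  fullRewrite: string,
): { tool: "search_replace" | "write_file"; content: string } {
  let failures = 0;
  for (const attempt of attempts) {
    const result = applyUniqueReplace(original, attempt.oldString, attempt.newString);
    if (result.success) {
      return { tool: "search_replace", content: result.content };
    }
    failures += 1;
    if (failures >= MAX_CONSECUTIVE_FAILURES) break;
  }
  return { tool: "write_file", content: fullRewrite };
}
```

A first attempt with an ambiguous `old_string` followed by a more specific second attempt still ends in `search_replace`; only two failures in a row trigger the rewrite.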
......@@ -524,21 +524,6 @@ app.post("/github/api/test/clear-push-events", handleClearPushEvents);
// GitHub Git endpoints - intercept all paths with /github/git prefix
app.all("/github/git/*", handleGitPush);
// Dyad Engine turbo-file-edit endpoint for edit_file tool
app.post("/engine/v1/tools/turbo-file-edit", (req, res) => {
const { path: filePath, description } = req.body;
console.log(
`* turbo-file-edit: ${filePath} - ${description || "no description"}`,
);
try {
res.json({ result: "TURBO EDITED filePath" });
} catch (error) {
console.error(`* turbo-file-edit error:`, error);
res.status(400).json({ error: String(error) });
}
});
// Dyad Engine code-search endpoint for code_search tool
app.post("/engine/v1/tools/code-search", (req, res) => {
const { query, filesContext } = req.body;
......