Unverified commit f6584527 · authored by Will Chen · committed by GitHub

plan: add web-fetch-local-agent implementation plan (#2801)

## Summary
- Add planning document for implementing a new `web_fetch` tool for local agent mode
- This tool will fetch and read website content when users share URLs, available to all users (free + Pro)
- Unlike the existing Pro-only `web_crawl` tool, `web_fetch` performs direct local HTTP fetch at zero infrastructure cost

## Test plan
- Manual review of the plan document to ensure implementation strategy is clear
- Implementation will follow the detailed testing strategy outlined in the plan

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Parent: 12c3456a
# Web Fetch Tool for Local Agent Mode
> Generated by swarm planning session on 2026-02-25
## Summary
Add a new `web_fetch` tool to the local agent that fetches and reads website content when users share URLs for reference. Unlike the existing Pro-only `web_crawl` tool (which uses Firecrawl for visual cloning with screenshots), `web_fetch` performs a direct local HTTP fetch from the user's machine, making it available to all users (free + Pro) at zero infrastructure cost.
## Problem Statement
When users paste a URL into the Dyad chat (e.g., "Help me integrate this API: https://docs.stripe.com/api"), the agent cannot access the content behind that URL. Users must manually copy-paste page content, breaking their flow. This is especially painful for developers building with APIs, following tutorials, or referencing documentation — the most common use cases for Dyad's target audience. The existing `web_crawl` tool only activates for "clone/copy/replicate" intent and requires Dyad Pro, leaving a gap for the broader "read this page for context" use case.
## Scope
### In Scope (MVP)
- New `web_fetch` tool that fetches a URL and returns content as markdown
- Available to **all users** (free + Pro) — no `isDyadPro` gate
- LLM-triggered via standard tool call mechanism (not auto-detected)
- HTML-to-markdown conversion using `turndown` + `@mozilla/readability` for content extraction
- Content-Type detection: HTML → markdown, JSON → code block, text → as-is, PDF/images → "not supported" message
- URL scheme validation (`http:` and `https:` only; block `file:`, `ftp:`, `data:`, `javascript:`, `blob:` schemes)
- Private/localhost IPs allowed (consent dialog is sufficient protection)
- Consent-gated with `"ask"` default
- Content truncation at 16,000 characters (matching existing `MAX_TEXT_SNIPPET_LENGTH`)
- Timeout at 10-15 seconds via `AbortController`
- XML streaming preview via `<dyad-web-fetch>` tag
- Clear error messages for timeout, 403/blocked, empty content, unsupported content types
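The scheme check in the list above could be sketched roughly as follows. This is a minimal illustration, not the planned implementation: the helper name `validateFetchUrl` and the `UrlCheck` result shape are assumptions made for the example, and only the standard WHATWG `URL` parser is used.

```typescript
// Hypothetical helper: the name and result shape are illustrative, not from the plan.
const ALLOWED_PROTOCOLS = new Set(["http:", "https:"]);

type UrlCheck =
  | { ok: true; url: URL }
  | { ok: false; reason: string };

function validateFetchUrl(raw: string): UrlCheck {
  let url: URL;
  try {
    // The WHATWG URL parser throws a TypeError on malformed input.
    url = new URL(raw.trim());
  } catch {
    return { ok: false, reason: `Malformed URL: ${raw}` };
  }
  // `protocol` includes the trailing colon, e.g. "https:" or "file:".
  if (!ALLOWED_PROTOCOLS.has(url.protocol)) {
    return { ok: false, reason: `Unsupported URL scheme: ${url.protocol}` };
  }
  // Private and localhost hosts are deliberately not blocked (see the scope list).
  return { ok: true, url };
}
```

An allowlist is used rather than a blocklist, so schemes not named in the plan are rejected by default.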
### Out of Scope (Follow-up)
- Auto-detection of URLs in user input (pre-fetching before LLM runs)
- JavaScript rendering / headless browser for SPAs
- Screenshot capture
- PDF content extraction
- Caching of fetched pages within a session
- Batch consent UI for multiple URLs in one message
- Re-fetch / refresh button on completed cards
- Link preview in chat input area
## User Stories
- As a developer building an app, I want to paste an API documentation URL and have the agent understand its contents, so that I can say "integrate this API" without manually copying docs.
- As a user following a tutorial, I want to share a blog post or tutorial URL with the agent, so that it can follow the instructions and implement what the tutorial describes.
- As a user referencing a design, I want to share a website URL for style reference (without cloning), so that the agent understands the content and direction I'm going for.
- As a free-tier user, I want basic web fetching to work without a Pro subscription, so that I can reference external content in my workflow.
## UX Design
### User Flow
1. User types a message that includes a URL (e.g., "Use the Stripe API docs at https://docs.stripe.com/api to add payments")
2. The LLM recognizes the URL and determines it needs the page content to fulfill the request
3. A consent dialog appears: `Fetch page content: "https://docs.stripe.com/api"`
4. User approves (accept-once / accept-always / decline)
5. A `<dyad-web-fetch>` card appears in the chat showing the URL being fetched with a loading state
6. Content is fetched, processed through Readability + Turndown, truncated if needed, and returned as the tool result
7. The card transitions to a completed state showing the page title (extracted by Readability) and URL
8. The AI continues its response using the fetched content as context
### Key States
- **Loading**: Card with URL, spinner, "Fetching..." label (use existing `DyadStateIndicator` pattern)
- **Completed (HTML)**: Card with page title (extracted by Readability) + URL in muted text, expandable to show markdown preview
- **Completed (JSON)**: Card with `application/json` badge + URL, expandable content as code block
- **Completed (text)**: Card with `text/plain` badge + URL, content displayed as-is
- **Error — Timeout**: "This page couldn't be reached. Check the URL and try again."
- **Error — Blocked (403)**: "This page blocked the request. You may need to copy-paste its content manually."
- **Error — Empty/JS-only**: "This page returned no readable content. It may require JavaScript to render."
- **Warning — Unsupported type**: Amber/warning state (not red error): "PDF files cannot be fetched as text. Try copying the relevant content and pasting it into the chat." (Use `<dyad-output type="warning">`)
- **Truncated**: Show note on card: "Content truncated (showing first 16,000 characters)"
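One way to keep the error and warning copy above in a single place is a small mapping from failure kind to card message. The sketch below is hedged: the `FetchFailure` union, the `CardMessage` shape, and the generic wording for non-PDF unsupported types are assumptions; the other strings are taken from the states listed above.

```typescript
// Hypothetical types: the plan names the states but does not define a type for them.
type FetchFailure =
  | { kind: "timeout" }
  | { kind: "blocked"; status: number }
  | { kind: "empty" }
  | { kind: "unsupported"; contentType: string };

interface CardMessage {
  severity: "error" | "warning";
  text: string;
}

function describeFailure(failure: FetchFailure): CardMessage {
  switch (failure.kind) {
    case "timeout":
      return { severity: "error", text: "This page couldn't be reached. Check the URL and try again." };
    case "blocked":
      return {
        severity: "error",
        text: "This page blocked the request. You may need to copy-paste its content manually.",
      };
    case "empty":
      return {
        severity: "error",
        text: "This page returned no readable content. It may require JavaScript to render.",
      };
    case "unsupported":
      // Unsupported types are a warning (amber) state, not a red error.
      return failure.contentType.startsWith("application/pdf")
        ? {
            severity: "warning",
            text: "PDF files cannot be fetched as text. Try copying the relevant content and pasting it into the chat.",
          }
        : // Assumed wording: the plan only spells out the PDF message.
          { severity: "warning", text: `Content of type ${failure.contentType} is not supported.` };
  }
}
```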
### Interaction Details
- Consent preview text: `Fetch page content: "https://..."` (action-focused, not implementation-detail-focused)
- Card icon: Use `Link` from lucide-react (differentiated from `Globe` for web_search and `ScanQrCode` for web_crawl)
- Badge color: Use `purple` to differentiate from the `blue` used by web_search and web_crawl
- Completed card is collapsed by default with page title visible; expandable to show markdown preview
- When truncation occurs, surface it in the card UI so users understand the AI only saw partial content
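To surface truncation in the card, the tool needs to report whether it cut the content, not just return the shortened string. A minimal sketch, assuming hypothetical names `truncateForModel` and `truncationNote`; the real `MAX_TEXT_SNIPPET_LENGTH` constant lives in the codebase and is redeclared here only to keep the example self-contained.

```typescript
// Value taken from the plan; the real constant is defined elsewhere in the codebase.
const MAX_TEXT_SNIPPET_LENGTH = 16_000;

interface TruncatedContent {
  text: string;
  truncated: boolean;
  originalLength: number;
}

// Hypothetical helper: returns the (possibly shortened) text plus a flag the UI can read.
function truncateForModel(
  content: string,
  maxLength: number = MAX_TEXT_SNIPPET_LENGTH,
): TruncatedContent {
  if (content.length <= maxLength) {
    return { text: content, truncated: false, originalLength: content.length };
  }
  // slice() counts UTF-16 code units, so a surrogate pair at the boundary may be split.
  return { text: content.slice(0, maxLength), truncated: true, originalLength: content.length };
}

// Builds the card note from the plan, or undefined when nothing was cut.
function truncationNote(result: TruncatedContent): string | undefined {
  if (!result.truncated) return undefined;
  const shown = result.text.length.toLocaleString("en-US");
  return `Content truncated (showing first ${shown} characters)`;
}
```

Returning the flag alongside the text lets the renderer show the note without re-deriving it from string lengths.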
### Accessibility
- Consent dialog: keyboard-navigable via standard button focus (existing pattern)
- Expandable cards: Enter/Space to toggle (existing `DyadCard` pattern)
- Screen reader: announce "Web Fetch completed: [page title]" or "Web Fetch failed: [error]"
## Technical Design
### Architecture
New tool following the established `ToolDefinition<T>` pattern. Performs a direct HTTP fetch from the Electron main process using Node.js `fetch()`, processes HTML through `@mozilla/readability` for content extraction, then converts to markdown via `turndown`. Returns the markdown string as the tool result. No changes to existing tools or the agent handler.
**Dependency pipeline:** `fetch(url)` → `linkedom.parseHTML(html)` → `new Readability(doc).parse()` → `new TurndownService().turndown(article.content)` → `truncateText(markdown)`
`linkedom` is required because `@mozilla/readability` operates on a DOM document and Electron's main process doesn't provide one; `turndown` accepts either an HTML string or a DOM node, so it can consume Readability's output directly. `linkedom` is lightweight (~50KB) and much faster than JSDOM.
### Components Affected
- **New file:** `src/pro/main/ipc/handlers/local_agent/tools/web_fetch.ts` — Tool implementation
- **Modified:** `src/pro/main/ipc/handlers/local_agent/tool_definitions.ts` — Import and register `webFetchTool` in `TOOL_DEFINITIONS` array
- **Modified:** `package.json` — Add `turndown`, `@types/turndown`, `linkedom`, `@mozilla/readability` (or `defuddle`)
- **New file (renderer):** `DyadWebFetch` component for rendering the `<dyad-web-fetch>` XML tag in chat
- **No changes to:** `web_crawl.ts`, `engine_fetch.ts`, `local_agent_handler.ts`, `types.ts`
### Data Model Changes
None. The tool returns a string result via the existing `ToolResult` type. No schema or storage changes.
### API Changes
No external API changes. Internally:
- New tool `web_fetch` added to `TOOL_DEFINITIONS` array
- New XML tag `<dyad-web-fetch>` for renderer
### Tool Description (Critical)
The tool description guides LLM behavior and is the single biggest factor in feature success:
```
Fetch and read content from a URL. Works with web pages (returns cleaned markdown) and API endpoints (returns JSON).
### When to Use
Use this tool when the user shares a URL and wants you to reference, understand, or use information from that page. Examples:
- User shares API documentation and asks you to integrate it
- User shares a tutorial or blog post and wants you to follow it
- User shares a web page and asks about its content
- User shares an API endpoint URL and wants you to understand the response
### When NOT to Use
- User wants to CLONE / COPY / REPLICATE / RECREATE a website's visual design — use web_crawl instead
- User mentions a URL in passing without wanting you to read it
- You need to search the web for information (no specific URL) — use web_search instead
### Limitations
- Cannot render JavaScript — some dynamic/SPA pages may return limited content
- Content is truncated to ~16,000 characters for very long pages
- PDF and image files are not supported
```
### Key Implementation Details
````typescript
// web_fetch.ts - Core structure
import { z } from "zod";
// ToolDefinition and escapeXmlContent come from the existing local agent modules.
// DESCRIPTION holds the tool description text shown in the previous section.

const webFetchSchema = z.object({
  url: z.string().describe("URL to fetch"),
});

// URL validation: only http: and https: schemes
// No private IP blocking (user decision: allow with consent)
// Timeout: 10-15 seconds via AbortController
// User-Agent: set a reasonable browser-like string
// Content-Type handling:
//   text/html → Readability extraction → Turndown markdown → truncate
//   application/json → return as ```json code block → truncate
//   text/plain, text/markdown → return as-is → truncate
//   application/pdf, image/* → return "not supported" message
//   other → attempt text extraction, fall back to "not supported"
// Truncation: reuse MAX_TEXT_SNIPPET_LENGTH (16,000 chars) pattern

export const webFetchTool: ToolDefinition<z.infer<typeof webFetchSchema>> = {
  name: "web_fetch",
  description: DESCRIPTION,
  inputSchema: webFetchSchema,
  defaultConsent: "ask",
  // No isEnabled gate: available to all users
  getConsentPreview: (args) => `Fetch page content: "${args.url}"`,
  buildXml: (args, isComplete) => {
    // While the arguments are still streaming in, the URL may not be available yet.
    if (!args.url) return undefined;
    let xml = `<dyad-web-fetch url="${escapeXmlContent(args.url)}">`;
    if (isComplete) xml += "</dyad-web-fetch>";
    return xml;
  },
  execute: async (args, ctx) => {
    // 1. Validate URL scheme (http/https only)
    // 2. Fetch with timeout (AbortController, 15s)
    // 3. Check Content-Type header
    // 4. For HTML: parse with Readability, convert with Turndown
    // 5. For JSON: wrap in code block
    // 6. For text: return as-is
    // 7. For unsupported: return clear message
    // 8. Truncate to MAX_TEXT_SNIPPET_LENGTH
    // 9. Return markdown string as tool result
  },
};
````
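The Content-Type routing described in the comments above could look roughly like this. It is a sketch under stated assumptions: `classifyContentType`, `ContentKind`, and `wrapJson` are hypothetical names, and only the media types the plan lists are recognized.

```typescript
// Hypothetical names: the plan describes the routing but not these helpers.
type ContentKind = "html" | "json" | "text" | "unsupported" | "other";

function classifyContentType(header: string | null): ContentKind {
  if (!header) return "other";
  // Drop parameters such as "; charset=utf-8" and normalize case.
  const [first = ""] = header.split(";");
  const mediaType = first.trim().toLowerCase();
  if (mediaType === "text/html") return "html";
  if (mediaType === "application/json") return "json";
  if (mediaType === "text/plain" || mediaType === "text/markdown") return "text";
  if (mediaType === "application/pdf" || mediaType.startsWith("image/")) return "unsupported";
  // Anything else: the caller may attempt text extraction, then fall back to "not supported".
  return "other";
}

function wrapJson(body: string): string {
  // Pretty-print when the body parses; otherwise pass the raw text through unchanged.
  let pretty = body;
  try {
    pretty = JSON.stringify(JSON.parse(body), null, 2);
  } catch {
    // Keep the raw body.
  }
  const fence = "`".repeat(3);
  return `${fence}json\n${pretty}\n${fence}`;
}
```

Classifying the header in a pure function keeps the routing testable without any network access; `execute` would then switch on the returned kind.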
## Implementation Plan
### Phase 1: Core Tool
- [ ] Add dependencies: `turndown`, `@types/turndown`, `linkedom`, `@mozilla/readability` (evaluate `defuddle` as alternative)
- [ ] Create `src/pro/main/ipc/handlers/local_agent/tools/web_fetch.ts` with:
  - URL scheme validation
  - Fetch with AbortController timeout (15 seconds)
  - Content-Type detection and routing
  - Readability extraction for HTML
  - Turndown markdown conversion
  - JSON/text/unsupported content handling
  - Truncation using existing pattern
  - Proper error messages for common failure modes
- [ ] Register `webFetchTool` in `tool_definitions.ts` TOOL_DEFINITIONS array
- [ ] Write tool description with clear when-to-use / when-not-to-use guidance
### Phase 2: Renderer Component
- [ ] Create `DyadWebFetch` component to render `<dyad-web-fetch>` XML tags
- [ ] Implement loading state (URL + spinner)
- [ ] Implement completed state (page title + URL, expandable markdown preview)
- [ ] Implement error states
- [ ] Show truncation indicator when content was truncated
- [ ] Register in the markdown parser's XML tag handler
### Phase 3: Testing
- [ ] Unit tests for URL validation (scheme checking, malformed URLs)
- [ ] Unit tests for Content-Type handling (HTML, JSON, text, PDF, images)
- [ ] Unit tests for HTML-to-markdown conversion (simple pages, complex pages, empty bodies)
- [ ] Unit tests for truncation behavior
- [ ] Unit tests for timeout/error handling (mock fetch failures, non-200 responses)
- [ ] Integration test: verify tool appears in `buildAgentToolSet` output (no `isEnabled` gate)
- [ ] Manual E2E testing with real URLs in local agent chat
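The timeout tests are easier to write if the fetch call is injectable. The sketch below assumes a hypothetical `fetchWithTimeout` helper that takes the fetch implementation as a parameter; `MinimalResponse` and `FetchLike` are simplified stand-ins for the real `Response` and `fetch` types, and `hangingFetch` is a test double, not production code.

```typescript
// Simplified stand-ins for the real Response and fetch types (assumptions for this sketch).
interface MinimalResponse {
  status: number;
  text(): Promise<string>;
}
type FetchLike = (url: string, init: { signal: AbortSignal }) => Promise<MinimalResponse>;

// Hypothetical helper: aborts the request once timeoutMs has elapsed.
async function fetchWithTimeout(
  url: string,
  timeoutMs: number,
  fetchImpl: FetchLike,
): Promise<MinimalResponse> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetchImpl(url, { signal: controller.signal });
  } finally {
    // Always clear the timer so a fast response does not leave it pending.
    clearTimeout(timer);
  }
}

// Test double: never resolves, and rejects only when the signal is aborted.
const hangingFetch: FetchLike = (_url, { signal }) =>
  new Promise<MinimalResponse>((_resolve, reject) => {
    signal.addEventListener("abort", () => reject(new Error("aborted")));
  });
```

In a unit test, `fetchWithTimeout("https://example.com", 10, hangingFetch)` should reject shortly after 10 ms; in production the global `fetch` would be passed instead, which rejects with an `AbortError` when the signal fires.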
## Testing Strategy
- [ ] Unit test URL scheme validation: verify `file://`, `ftp://`, `data:` are rejected; `http://` and `https://` are accepted
- [ ] Unit test Content-Type routing: verify HTML → readability+turndown, JSON → code block, text → as-is, PDF → error message
- [ ] Unit test HTML conversion with various inputs: simple pages, pages with scripts/styles, empty bodies, non-UTF-8 encoding
- [ ] Unit test truncation: verify content over 16K chars is truncated with indicator
- [ ] Unit test error handling: mock network failures, timeouts, 403/404 responses, non-200 status codes
- [ ] Integration test: verify `webFetchTool` is included in tool set for both Pro and non-Pro contexts
- [ ] Manual test: verify consent dialog, loading card, completed card, error states in the actual UI
- [ ] Manual test: verify tool is NOT triggered for clone/replicate intent (web_crawl should be used instead)
## Risks & Mitigations
| Risk | Likelihood | Impact | Mitigation |
| --------------------------------------------------------------------- | ---------- | ------ | ---------------------------------------------------------------------------------------------- |
| JS-rendered SPAs return minimal content | Medium | Medium | Clear tool description noting limitation; LLM can explain to user; Pro users can use web_crawl |
| LLM confuses web_fetch with web_crawl or web_search | Low | Medium | Precise, mutually-exclusive tool descriptions with explicit when/when-not guidance |
| Large HTML pages block Electron main process during conversion | Low | Medium | Truncate raw HTML before processing; move to worker thread in follow-up if needed |
| Content quality varies across sites (paywalls, anti-bot) | Medium | Low | Return clear error messages; user can fall back to manual copy-paste |
| New dependencies (turndown, readability) introduce maintenance burden | Low | Low | Both are mature, stable libraries with large install bases |
| "Accept always" consent enables unbounded fetch loops | Low | Medium | Monitor; consider per-turn fetch limit in follow-up if abuse is observed |
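If the per-turn fetch limit mentioned in the last row is ever needed, it could be as small as a counter that resets on each user turn. This is purely a follow-up sketch: the `FetchBudget` class and any limit value are assumptions, not part of the plan.

```typescript
// Hypothetical follow-up sketch: neither the class nor a limit value appears in the plan.
class FetchBudget {
  private used = 0;
  private readonly maxPerTurn: number;

  constructor(maxPerTurn: number) {
    this.maxPerTurn = maxPerTurn;
  }

  /** Returns true and records the call if the current turn still has budget left. */
  tryConsume(): boolean {
    if (this.used >= this.maxPerTurn) return false;
    this.used += 1;
    return true;
  }

  /** Call when a new user turn starts. */
  reset(): void {
    this.used = 0;
  }
}
```

The tool's `execute` would call `tryConsume()` before fetching and return a clear message when it yields `false`, so an "accept always" setting cannot drive an unbounded loop.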
## Open Questions
- **Readability vs. Defuddle:** Evaluate `defuddle` (by kepano, built for the Obsidian Web Clipper) as a potential alternative to `@mozilla/readability`. Defuddle may offer better extraction for modern web pages. Decision can be made during implementation based on testing.
- **DOM library:** `linkedom` is the planned DOM implementation, since `@mozilla/readability` requires a DOM document and Electron's main process doesn't provide one (`turndown` accepts an HTML string or a DOM node). `linkedom` is lightweight (~50KB) and much faster than JSDOM.
- **Multiple URLs per message:** When a user pastes 2-5 URLs, the LLM may call `web_fetch` multiple times. Each triggers a separate consent dialog. If this proves disruptive, consider batch consent UI in a follow-up.
- **Stale content:** Fetched content is point-in-time. For long conversations, consider adding timestamps to fetch cards and a re-fetch capability in a follow-up.
## Decision Log
| Decision | Reasoning |
| -------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| New tool (`web_fetch`) rather than extending `web_crawl` | Use cases are fundamentally different (read vs. clone). Separate tools = cleaner code, clearer LLM descriptions, independent consent settings. All 3 roles agreed independently. |
| Available to all users (free + Pro) | Local fetch has zero infrastructure cost. Differentiates free tier. Natural upsell to Pro for enhanced crawl+screenshot. |
| LLM-triggered, not auto-detected | Consistent with existing tool architecture. Auto-detection would require new handler-layer logic and might fetch URLs users didn't intend. |
| Allow private/localhost IPs | Dyad runs locally; SSRF is a server-side threat model. Fetching localhost:3000 or internal docs is a legitimate use case. Consent dialog provides sufficient protection. |
| Include @mozilla/readability in v1 | Dramatically better content extraction (strips nav, footer, ads). Small marginal cost (one extra dependency). All roles agreed. |
| Handle Content-Type gracefully | ~15 lines of code prevents confusing failures for JSON, text, PDF URLs. Better UX for minimal effort. |
| Consent default: "ask" | Consistent with web_crawl and web_search. Network requests to arbitrary external URLs warrant explicit approval. |
| Truncation at 16K characters | Matches existing `MAX_TEXT_SNIPPET_LENGTH`. Prevents context window overflow while providing substantial content. |
| Tool name: `web_fetch` | Consistent with `web_search`, `web_crawl` naming convention. Clear, concise, action-oriented. |
---
_Generated by dyad:swarm-to-plan_