Add dangerous action safeguards plan (#2733)

## Summary

- Add planning documentation for dangerous-action guardrails and implementation approach.
- Document detection and mitigation strategies for potentially destructive operations.
- Define acceptance criteria and rollout/testing recommendations.

## Test plan

- Manual review of the plan document for completeness and consistency.
- Validate markdown formatting and consistency with repository conventions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

a30c45d7 · Will Chen · GitHub · 5b7497fb · a30c45d7
--- a/plans/dangerous-action-guards.md
+++ b/plans/dangerous-action-guards.md
+# Dangerous Action Guards
+
+> Generated by swarm planning session on 2026-02-14
+
+## Summary
+
+Add automatic safety guards that detect and warn users before executing dangerous actions -- destructive SQL queries, malicious npm packages, and suspicious code patterns -- even when auto-approve is enabled. Includes a "dangerous approval override" toggle for power users who want to bypass all safety checks.
+
+## Problem Statement
+
+Users building apps with Dyad can inadvertently (or through prompt injection) execute destructive actions. Today, Dyad's only defense is the consent banner ("Allow once / Always allow / Decline"), which users frequently bypass with auto-approve or "Always allow" settings. Once bypassed, there is **zero validation**:
+
+- SQL queries run as-is -- a single `DROP TABLE` can destroy hours of work
+- Package names are passed directly to shell commands with no validation (and there is an **existing command injection vulnerability** in `executeAddDependency.ts`)
+- File writes from the LLM are completely unscanned
+
+The LLM is an untrusted actor. Prompt injection, hallucination, and model errors can generate destructive operations the user never intended. Auto-approve removes the last line of defense. Users trust Dyad to help them build safely.
+
+## Scope
+
+### In Scope (MVP)
+
+1. **Dangerous SQL detection** -- Heuristic pattern matching for destructive SQL operations (DROP, TRUNCATE, DELETE without WHERE, etc.). Force an enhanced consent prompt even if auto-approve is enabled.
+2. **Malicious npm package detection** -- Input sanitization (fix command injection vulnerability), registry existence check pre-install, `npm audit` post-install for known CVEs.
+3. **Narrow code injection scanning** -- High-confidence pattern detection for reverse shells, crypto miners, credential exfiltration, and obfuscated eval payloads. Near-zero false positive tolerance.
+4. **Enhanced consent banner** -- Danger variant with red/destructive styling, human-readable explanations, and two-button design (no "Always allow" for dangerous actions).
+5. **Dangerous approval override** -- Settings toggle to skip all danger checks, with confirmation dialog requiring typed acknowledgment and persistent UI indicator when active.
+6. **package.json write detection** -- When `write_file` or `search_replace` targets `package.json`, run the same package validation on newly-added dependencies.
+7. **Telemetry** -- Track danger detections, categories, and user decisions (allow/decline) to tune false positive rates.
+
+### Out of Scope (Follow-up)
+
+- LLM-based SQL semantic analysis (expensive, latency, provider dependency)
+- Comprehensive code security scanning beyond the narrow pattern set
+- MCP tool danger detection (MCP tools are opaque -- we don't control their behavior)
+- Typosquatting detection (requires maintaining/fetching popular package lists)
+- Sandboxed SQL execution / dry-run mode
+- Build-mode proposal security risk interception (separate code path from tool consent)
+- Per-category danger guard enable/disable in settings
+
+## User Stories
+
+- As a user with auto-approve enabled, I want Dyad to still warn me before executing destructive SQL so that I don't accidentally lose data.
+- As a user building with Supabase, I want to see exactly why a SQL query was flagged as dangerous so that I can make an informed decision to proceed or decline.
+- As a user adding dependencies, I want Dyad to warn me if a package is known-malicious or has known vulnerabilities so that I don't introduce security issues into my app.
+- As a user, I want to see a clear explanation of why an action was flagged so that I can dismiss false positives confidently.
+- As a power user, I want to disable danger checks entirely so that I can work without interruption when I know what I'm doing.
+- As a user reviewing agent actions (auto-approve OFF), I want danger context in the consent banner so that I can make better-informed decisions about which actions to allow.
+
+## UX Design
+
+### User Flow
+
+**Flow 1: Dangerous SQL detected (auto-approve ON)**
+
+1. User has auto-approve enabled and is iterating on their app
+2. Agent generates a SQL query (e.g., `DROP TABLE users`)
+3. `dangerCheck` on the SQL tool detects destructive pattern
+4. Instead of auto-executing, the system intercepts and shows a **danger consent banner**
+5. Banner shows: "Auto-approve paused: this query will permanently delete the `users` table and all its data"
+6. User clicks "Allow anyway" (destructive style) or "Decline" (default focus)
+7. If approved, execution continues; if declined, the agent gets feedback that the action was blocked
+
+**Flow 2: Dangerous SQL detected (auto-approve OFF)**
+
+1. Agent generates a destructive SQL query
+2. `dangerCheck` detects the pattern
+3. The normal consent banner is shown but with **enhanced danger styling** (red border, ShieldAlert icon, explanation text)
+4. User reviews and decides with better context than the standard consent banner provides
+
+**Flow 3: Malicious npm package detected**
+
+1. Agent attempts to install a package
+2. Package name is validated (sanitization regex) -- invalid names are rejected immediately
+3. Registry existence check confirms the package exists
+4. If the consent banner fires (ask mode or danger-escalated), it includes package metadata
+5. After installation, `npm audit --json` runs and parses results
+6. If vulnerabilities found: critical/high severity shows red `danger` banner; moderate/low shows amber `warning` banner
+7. User reviews advisory details and decides
+
+**Flow 4: Suspicious code detected**
+
+1. Agent writes code via `write_file`, `edit_file`, or `search_replace`
+2. Content is scanned against the high-confidence pattern set
+3. If a pattern matches, a danger banner appears showing the filename, flagged snippet, and a specific explanation (e.g., "This code appears to open a reverse shell connection to an external server")
+4. User reviews and decides
+
+**Flow 5: Enabling dangerous approval override**
+
+1. User navigates to Settings > Safety section
+2. Finds "Skip all danger checks" toggle (default: OFF)
+3. Toggling ON opens a confirmation dialog: "This will skip all safety warnings for dangerous SQL, suspicious packages, and potentially malicious code. Actions will proceed without review."
+4. Dialog requires typing "I understand" to confirm
+5. Once enabled, a persistent shield-off icon appears in the chat header/status bar
+6. Icon is clickable to jump back to the setting
+7. All danger checks are bypassed; normal consent flow still applies per tool settings
+
+### Key States
+
+- **Default (no danger)**: Invisible. Zero friction. Actions proceed normally per consent settings.
+- **Danger detected (auto-approve ON)**: Red/destructive banner with ShieldAlert icon, explanation, two buttons. Auto-approve paused.
+- **Danger detected (auto-approve OFF)**: Enhanced consent banner with red styling and danger explanation. Same two buttons.
+- **Warning detected (lower severity)**: Amber banner with AlertTriangle icon. Used for moderate/low npm advisories and warning-level SQL such as `ALTER TABLE ... DROP COLUMN` or `GRANT` / `REVOKE`.
+- **Checking safety (async)**: Brief inline indicator ("Checking packages...") only for async checks like npm registry lookup. Not shown for instant checks (SQL regex).
+- **Override active**: Persistent shield-off indicator in chat header. All danger checks bypassed.
+- **Check failed/unavailable**: Fail-open with subtle notification: "Safety check unavailable -- proceeding." User knows the guard wasn't active.
+
+### Interaction Details
+
+**Danger consent banner:**
+
+- Visually distinct from standard consent banner: red/destructive color scheme, ShieldAlert icon (not Bot icon)
+- Includes: category label ("Dangerous SQL" / "Vulnerable Package" / "Suspicious Code"), human-readable explanation, expandable content preview
+- Two buttons only: "Allow anyway" (destructive variant) and "Decline" (default style)
+- No "Always allow" option -- you cannot permanently approve dangerous actions by category
+- Not dismissible via X button -- only explicit button clicks
+- Takes priority in consent queue (dangerous items shown first)
+- When auto-approve is ON, banner copy reads "Auto-approve paused: [explanation]"
+
+**Keyboard navigation:**
+
+- "Decline" is default focused (Enter = safe action)
+- "Allow anyway" requires Tab + Enter (deliberate action)
+
+**Queue behavior:**
+
+- If the agent fires five actions under auto-approve and one is dangerous, the four safe actions auto-execute and only the dangerous one pauses for consent
+- Multiple dangerous actions in parallel: show sequentially with queue count
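The queue ordering described above could be sketched as follows. `PendingConsent`, `rank`, and `orderConsentQueue` are hypothetical names, and placing `warning` items between `danger` items and ordinary requests is an assumption; the plan only states that dangerous items are shown first.

```typescript
// Hypothetical shape of a queued consent request; only `dangerInfo` comes from the plan.
type DangerLevel = "warning" | "danger";

interface PendingConsent {
  requestId: string;
  dangerInfo: { level: DangerLevel; message: string } | null;
}

// Lower rank is shown first: danger, then warning, then ordinary requests.
function rank(item: PendingConsent): number {
  if (item.dangerInfo === null) return 2;
  return item.dangerInfo.level === "danger" ? 0 : 1;
}

// Returns a new array. Array.prototype.sort is stable, so arrival order is
// preserved among items that share a rank.
function orderConsentQueue(queue: readonly PendingConsent[]): PendingConsent[] {
  return queue.slice().sort((a, b) => rank(a) - rank(b));
}
```

The remaining queue length after the first item can then be rendered as the queue count on the banner.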
+
+**Danger explanation quality (required templates):**
+
+| Pattern                       | Explanation Template                                                                  |
+| ----------------------------- | ------------------------------------------------------------------------------------- |
+| `DROP TABLE x`                | "This query will permanently delete the `{table}` table and all its data"             |
+| `DROP DATABASE x`             | "This query will permanently delete the entire `{database}` database"                 |
+| `TRUNCATE x`                  | "This query will delete all rows from the `{table}` table"                            |
+| `DELETE FROM x` (no WHERE)    | "This query will delete all rows from the `{table}` table"                            |
+| `ALTER TABLE x DROP COLUMN y` | "This query will permanently remove the `{column}` column from the `{table}` table"   |
+| `GRANT` / `REVOKE`            | "This query modifies database permissions"                                            |
+| npm critical/high advisory    | "Package `{name}` has a known vulnerability: {advisory_title} (severity: {severity})" |
+| npm moderate/low advisory     | "Package `{name}` has a known advisory: {advisory_title} (severity: {severity})"      |
+| Reverse shell pattern         | "This code appears to open a reverse shell connection to an external server"          |
+| Crypto miner pattern          | "This code contains patterns associated with cryptocurrency mining"                   |
+| Credential exfiltration       | "This code appears to send environment variables to an external URL"                  |
+| Obfuscated eval               | "This code contains an obfuscated execution pattern (base64-decoded eval)"            |
+
+### Accessibility
+
+- Not color-alone: danger banner differs via icon (ShieldAlert vs Bot), text label ("Potentially dangerous" vs standard), AND color
+- `aria-live="polite"` on danger banner (not "assertive" -- the agent is paused, no urgency to interrupt)
+- Focus moves to danger banner when it appears; returns to chat input on resolution
+- "Skip all danger checks" toggle uses `aria-describedby` to reference its warning text
+- Confirmation dialog is keyboard-navigable and screen-reader announced
+
+## Technical Design
+
+### Architecture
+
+Add a `dangerCheck` method to the existing `ToolDefinition` interface. This runs before consent and can escalate the consent level from "always" to forced-ask with danger context. The detection logic is per-tool (each tool knows its domain), while the consent escalation is centralized in `buildAgentToolSet`.
+
+```
+Tool invocation → dangerCheck() → if dangerous, force consent with dangerInfo
+                                → if safe, proceed with normal consent flow
+```
+
+New module: `src/pro/main/ipc/handlers/local_agent/danger_detection/` containing:
+
+- `sql_heuristics.ts` -- SQL pattern matching
+- `npm_validation.ts` -- Package name sanitization + registry/audit checks
+- `code_scanning.ts` -- High-confidence malicious code patterns
+- `types.ts` -- Shared types (`DangerCheckResult`)
+
+### Components Affected
+
+| Component             | File(s)                                                                | Change Type                                                                             |
+| --------------------- | ---------------------------------------------------------------------- | --------------------------------------------------------------------------------------- |
+| Tool definition types | `tools/types.ts`                                                       | Add `dangerCheck` to `ToolDefinition` interface                                         |
+| Tool set builder      | `tool_definitions.ts`                                                  | Wire `dangerCheck` into execute wrapper, pass `dangerInfo` to consent request           |
+| SQL tool              | `tools/execute_sql.ts`                                                 | Add SQL danger heuristics via `dangerCheck`                                             |
+| Dependency tool       | `tools/add_dependency.ts`                                              | Add package validation via `dangerCheck`                                                |
+| Dependency processor  | `executeAddDependency.ts`                                              | **Fix command injection**: use `execFile` with array args; add post-install `npm audit` |
+| File write tools      | `tools/write_file.ts`, `tools/edit_file.ts`, `tools/search_replace.ts` | Add code scanning via `dangerCheck`; add package.json filename detection                |
+| Settings schema       | `src/lib/schemas.ts`                                                   | Add `dangerousApprovalOverride` field                                                   |
+| Settings UI           | New "Safety" section in settings                                       | Toggle with confirmation dialog                                                         |
+| Consent banner        | `AgentConsentBanner.tsx`                                               | Danger variant (red styling, two buttons, explanation, priority queue)                  |
+| Consent types         | IPC payload types                                                      | Add `dangerInfo` to consent request                                                     |
+| Chat UI               | Chat header/status area                                                | Persistent shield-off indicator when override is active                                 |
+| Telemetry             | Agent handler                                                          | Emit danger detection events                                                            |
+
+### Data Model Changes
+
+**UserSettings additions (in `schemas.ts`):**
+
+```typescript
+dangerousApprovalOverride: z.boolean().optional(), // default: false
+```
+
+**New types:**
+
+```typescript
+interface DangerCheckResult {
+  level: "warning" | "danger";
+  category: "destructive_sql" | "malicious_package" | "suspicious_code";
+  message: string; // Human-readable explanation (required, specific)
+  details?: string; // Extended details (full query, advisory URL, etc.)
+}
+```
+
+**Extended consent request payload:**
+
+```typescript
+// In agent-tool:consent-request IPC event
+{
+  requestId: string;
+  chatId: number;
+  toolName: string;
+  toolDescription: string;
+  inputPreview: string;
+  dangerInfo: DangerCheckResult | null; // NEW
+}
+```
+
+**Extended ToolDefinition interface:**
+
+```typescript
+interface ToolDefinition<T> {
+  // ... existing fields ...
+  dangerCheck?: (
+    args: T,
+    ctx: AgentContext,
+  ) => Promise<DangerCheckResult | null>;
+}
+```
+
+### API Changes
+
+- **Modified `buildAgentToolSet` execute wrapper**: Before calling `requireConsent`, run `dangerCheck`. If result is non-null and `dangerousApprovalOverride` is not enabled, force consent to "ask" and include `dangerInfo` in the consent request payload.
+- **Modified consent request IPC**: Add `dangerInfo` field to `agent-tool:consent-request` event.
+- **Modified consent response**: When `dangerInfo` is present, only accept `"accept-once"` or `"decline"` (no `"accept-always"`).
+- **New telemetry events**: `danger_check:detected` and `danger_check:user_decision` (category, tool name, and the user's allow/decline decision), plus `danger_check:override` for override enable/disable.
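A minimal sketch of the escalation rule in the wrapper, written as a pure function so it can be unit-tested without IPC. `ConsentSetting`, `ConsentPlan`, and `planConsent` are hypothetical names; the real `buildAgentToolSet` and `requireConsent` signatures are not shown in this plan and may differ.

```typescript
interface DangerCheckResult {
  level: "warning" | "danger";
  category: "destructive_sql" | "malicious_package" | "suspicious_code";
  message: string;
  details?: string;
}

// Assumed names for the stored per-tool consent setting.
type ConsentSetting = "always" | "ask";
type ConsentResponse = "accept-once" | "accept-always" | "decline";

interface ConsentPlan {
  mode: ConsentSetting;
  dangerInfo: DangerCheckResult | null;
  allowedResponses: ConsentResponse[];
}

// Decide how consent should be requested for one tool call.
function planConsent(
  stored: ConsentSetting,
  danger: DangerCheckResult | null,
  overrideEnabled: boolean,
): ConsentPlan {
  if (danger !== null && !overrideEnabled) {
    // Dangerous and not overridden: always ask, and never offer "accept-always".
    return {
      mode: "ask",
      dangerInfo: danger,
      allowedResponses: ["accept-once", "decline"],
    };
  }
  // Safe, or override active: fall through to the normal consent flow.
  return {
    mode: stored,
    dangerInfo: null,
    allowedResponses: ["accept-once", "accept-always", "decline"],
  };
}
```

Keeping the decision separate from the IPC call also gives the integration tests a single place to assert that `"accept-always"` is rejected when `dangerInfo` is present.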
+
+### SQL Danger Heuristics
+
+Patterns to detect (case-insensitive, ignoring SQL comments):
+
+| Pattern                       | Level   | Template                                              |
+| ----------------------------- | ------- | ----------------------------------------------------- |
+| `DROP TABLE`                  | danger  | "permanently delete the `{table}` table"              |
+| `DROP DATABASE`               | danger  | "permanently delete the entire `{database}` database" |
+| `TRUNCATE [TABLE]`            | danger  | "delete all rows from the `{table}` table"            |
+| `DELETE FROM` without `WHERE` | danger  | "delete all rows from the `{table}` table"            |
+| `ALTER TABLE ... DROP COLUMN` | warning | "permanently remove the `{column}` column"            |
+| `GRANT` / `REVOKE`            | warning | "modifies database permissions"                       |
+| `DROP SCHEMA` / `DROP INDEX`  | warning | "permanently delete database object"                  |
+
+Implementation notes:
+
+- Strip SQL comments (`--`, `/* */`) before pattern matching to prevent bypass
+- Handle multi-statement queries (split on `;` and check each)
+- Sub-millisecond execution (regex only, no parsing)
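The implementation notes above could look roughly like this. It is a sketch covering a subset of the table: `Rule`, `RULES`, `stripSqlComments`, and `checkSql` are hypothetical names, and both the comment stripping and the split on `;` are naive in that they do not account for comment markers or semicolons inside string literals.

```typescript
type Level = "warning" | "danger";

interface SqlFinding {
  level: Level;
  message: string;
}

interface Rule {
  pattern: RegExp; // capture group 1, if present, is the object name
  unless?: RegExp; // rule is skipped when this also matches the statement
  level: Level;
  explain: (name: string) => string;
}

const RULES: Rule[] = [
  {
    pattern: /^\s*DROP\s+TABLE\s+(?:IF\s+EXISTS\s+)?([\w."]+)/i,
    level: "danger",
    explain: (t) => `This query will permanently delete the \`${t}\` table and all its data`,
  },
  {
    pattern: /^\s*TRUNCATE\s+(?:TABLE\s+)?([\w."]+)/i,
    level: "danger",
    explain: (t) => `This query will delete all rows from the \`${t}\` table`,
  },
  {
    pattern: /^\s*DELETE\s+FROM\s+([\w."]+)/i,
    unless: /\bWHERE\b/i,
    level: "danger",
    explain: (t) => `This query will delete all rows from the \`${t}\` table`,
  },
  {
    pattern: /^\s*(?:GRANT|REVOKE)\b/i,
    level: "warning",
    explain: () => "This query modifies database permissions",
  },
];

// Remove /* block */ and -- line comments before matching.
function stripSqlComments(sql: string): string {
  return sql.replace(/\/\*[\s\S]*?\*\//g, " ").replace(/--[^\n]*/g, " ");
}

// Returns the most severe finding across all statements, or null if none match.
function checkSql(sql: string): SqlFinding | null {
  let worst: SqlFinding | null = null;
  for (const stmt of stripSqlComments(sql).split(";")) {
    for (const rule of RULES) {
      const m = rule.pattern.exec(stmt);
      if (m === null || (rule.unless !== undefined && rule.unless.test(stmt))) continue;
      const finding: SqlFinding = { level: rule.level, message: rule.explain(m[1] ?? "") };
      if (finding.level === "danger") return finding;
      if (worst === null) worst = finding;
    }
  }
  return worst;
}
```

Anchoring each pattern at the start of a statement keeps a keyword inside a larger statement (for example a column named `drop_table`) from matching, at the cost of missing statements wrapped in other constructs.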
+
+### npm Package Validation
+
+**Pre-install (in `dangerCheck`):**
+
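+1. Validate each package spec against npm naming rules, with the optional version suffix restricted to version/tag characters: `^(@[a-z0-9-~][a-z0-9-._~]*/)?[a-z0-9-~][a-z0-9-._~]*(@[A-Za-z0-9._~^+-]+)?$`
+2. Reject any spec that doesn't match (prevents command injection AND invalid packages). An open-ended suffix such as `(@.*)?` would let shell metacharacters through after the `@`, so the suffix must be bounded.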
+3. Fetch `https://registry.npmjs.org/{package}` to confirm existence and check `deprecated` flag
+
+**Post-install (in `executeAddDependency`):**
+
+1. Run `npm audit --json` or `pnpm audit --json` in the app directory
+2. Parse output for new vulnerabilities
+3. If critical/high: show `danger` banner with advisory details
+4. If moderate/low: show `warning` banner
+5. Cache advisory data locally with 24-hour TTL for repeated installs
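The severity mapping in steps 3 and 4 could be sketched as below. It assumes the audit JSON exposes per-severity counts under `metadata.vulnerabilities`, which should be verified against the npm and pnpm versions Dyad actually ships with. `AuditReport`, `AuditOutcome`, and `classifyAudit` are hypothetical names, and the sketch counts all reported vulnerabilities rather than only new ones.

```typescript
type AuditOutcome = "danger" | "warning" | "clean" | "unavailable";

// Assumed subset of the audit report; only the severity counts are read.
interface AuditReport {
  metadata?: {
    vulnerabilities?: {
      low?: number;
      moderate?: number;
      high?: number;
      critical?: number;
    };
  };
}

// Treat anything that is not a number as zero.
function count(value: unknown): number {
  return typeof value === "number" ? value : 0;
}

// Map raw `audit --json` output to a banner level.
// "unavailable" lets the caller fail open with a notification.
function classifyAudit(rawJson: string): AuditOutcome {
  let report: AuditReport | null;
  try {
    report = JSON.parse(rawJson) as AuditReport | null;
  } catch {
    return "unavailable";
  }
  const counts = report?.metadata?.vulnerabilities;
  if (counts === undefined) return "unavailable";
  if (count(counts.critical) + count(counts.high) > 0) return "danger";
  if (count(counts.moderate) + count(counts.low) > 0) return "warning";
  return "clean";
}
```

Distinguishing `"clean"` from `"unavailable"` matters for the fail-open state: only the latter should trigger the "Safety check unavailable" notification.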
+
+**Command injection fix (immediate, independent):**
+
+- Replace ``exec(`pnpm add ${packageStr}`)`` with `execFile("pnpm", ["add", ...packages])` or equivalent
+- Validate all package name strings before any shell interaction
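A sketch of the validation step that would sit in front of `execFile`. `PACKAGE_SPEC` and `buildAddArgs` are hypothetical names. The regex is a variant of the plan's naming pattern with a bounded version suffix, and rejecting specs that start with `-` is an extra assumption added here so that a name cannot be read as a command-line flag.

```typescript
// Same character sets as the plan's naming regex (hyphen moved to the end of
// each class), with the version suffix limited to version/tag characters.
const PACKAGE_SPEC =
  /^(?:@[a-z0-9~-][a-z0-9._~-]*\/)?[a-z0-9~-][a-z0-9._~-]*(?:@[A-Za-z0-9._~^+-]+)?$/;

// Build the argument array for `pnpm add`, throwing on any invalid spec.
// No shell string is ever assembled, so metacharacters are never interpreted.
function buildAddArgs(packages: readonly string[]): string[] {
  for (const spec of packages) {
    // A leading "-" would be parsed as an option by the package manager.
    if (spec.charAt(0) === "-" || !PACKAGE_SPEC.test(spec)) {
      throw new Error(`Invalid package spec: ${JSON.stringify(spec)}`);
    }
  }
  return ["add", ...packages];
}
```

The result would then be passed as the second argument of Node's `execFile`, for example `execFile("pnpm", buildAddArgs(packages), { cwd }, callback)`, where `cwd` is the app directory.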
+
+### Code Injection Patterns
+
+High-confidence, near-zero false positive patterns:
+
+```typescript
+const DANGER_PATTERNS = [
+  // Reverse shells
+  {
+    pattern: /\b(nc|ncat|netcat)\s+-[a-z]*e\s/i,
+    message: "reverse shell connection",
+  },
+  { pattern: /\/dev\/tcp\//, message: "reverse shell connection" },
+  {
+    // \b keeps "sh" and "cmd" from matching inside words such as "push" or "hash"
+    pattern: /child_process.*?(exec|spawn).*?\b(bash|sh|cmd|powershell)\b/s,
+    message: "shell execution",
+  },
+
+  // Crypto miners
+  {
+    pattern: /\b(coinhive|cryptonight|stratum\+tcp|xmrig)\b/i,
+    message: "cryptocurrency mining",
+  },
+
+  // Credential exfiltration
+  {
+    pattern: /process\.env\b.*?\bfetch\s*\(/s,
+    message: "environment variable exfiltration",
+  },
+  {
+    pattern: /process\.env\b.*?\bhttp/s,
+    message: "environment variable exfiltration",
+  },
+
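+  // Obfuscated payloads (decode-then-eval, or eval wrapping the decode call)
+  {
+    pattern: /\batob\s*\(.*?\beval\b|\beval\s*\(\s*atob\s*\(/s,
+    message: "obfuscated code execution",
+  },
+  {
+    pattern:
+      /Buffer\.from\s*\([^)]+,\s*['"]base64['"]\).*?\beval\b|\beval\s*\(\s*Buffer\.from\s*\([^)]+,\s*['"]base64['"]\)/s,
+    message: "obfuscated code execution",
+  },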
+];
+```
+
+Applied to content in `write_file`, `edit_file` (edit sketch content), and `search_replace` (replacement content). Not applied to the full file to avoid false positives from existing code.
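A minimal sketch of the shared scanner named in the implementation plan (`scanContentForDangers`). It uses only a few illustrative patterns; `CodeFinding`, `SCAN_PATTERNS`, and the 120-character snippet cap are assumptions made for the example.

```typescript
interface CodeFinding {
  message: string; // explanation shown in the danger banner
  snippet: string; // start of the flagged text, capped for display
}

// Illustrative subset of the pattern set, paired with the plan's templates.
const SCAN_PATTERNS: { pattern: RegExp; message: string }[] = [
  {
    pattern: /\/dev\/tcp\//,
    message: "This code appears to open a reverse shell connection to an external server",
  },
  {
    pattern: /\b(coinhive|cryptonight|stratum\+tcp|xmrig)\b/i,
    message: "This code contains patterns associated with cryptocurrency mining",
  },
  {
    pattern: /\beval\s*\(\s*atob\s*\(/,
    message: "This code contains an obfuscated execution pattern (base64-decoded eval)",
  },
];

// Scan only the content being written (not the existing file) and return the
// first match, or null when nothing is flagged.
function scanContentForDangers(content: string): CodeFinding | null {
  for (const { pattern, message } of SCAN_PATTERNS) {
    const m = pattern.exec(content);
    if (m !== null) {
      return { message, snippet: content.slice(m.index, m.index + 120) };
    }
  }
  return null;
}
```

Each file tool's `dangerCheck` would pass in only its own new content: the full body for `write_file`, the edit sketch for `edit_file`, and the replacement text for `search_replace`.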
+
+## Implementation Plan
+
+### Phase 0: Security Fix (Independent, Ship Immediately)
+
+- [ ] Fix command injection in `executeAddDependency.ts` -- replace string interpolation with `execFile` array args and validate package names with the regex before any process is spawned
+- [ ] Add unit tests for package name validation
+
+### Phase 1: Foundation
+
+- [ ] Add `dangerCheck` field to `ToolDefinition` interface in `tools/types.ts`
+- [ ] Add `DangerCheckResult` type to `danger_detection/types.ts`
+- [ ] Wire `dangerCheck` into `buildAgentToolSet` execute wrapper -- run before consent, force "ask" if dangerous
+- [ ] Extend consent request IPC payload with `dangerInfo: DangerCheckResult | null`
+- [ ] Update `AgentConsentBanner.tsx` with danger variant: red styling, ShieldAlert icon, explanation text, two-button layout (no "Always allow"), priority queue ordering, not X-dismissible
+- [ ] Add `aria-live="polite"`, focus management, keyboard defaults (Decline focused)
+- [ ] Add danger detection telemetry: `danger_check:detected`, `danger_check:user_decision`
+
+### Phase 2: SQL Danger Heuristics
+
+- [ ] Implement `sql_heuristics.ts` with pattern matching for destructive operations
+- [ ] Add `dangerCheck` to `executeSqlTool` that calls SQL heuristics
+- [ ] Handle SQL comment stripping, multi-statement queries
+- [ ] Add human-readable explanation templates with table/column name extraction
+- [ ] Unit tests: corpus of dangerous and safe SQL, edge cases (DROP in comments, DELETE with complex WHERE, multi-statement)
+
+### Phase 3: npm Package Validation
+
+- [ ] Implement `npm_validation.ts` with package name sanitization regex
+- [ ] Add pre-install registry existence check (`https://registry.npmjs.org/{package}`)
+- [ ] Add `dangerCheck` to `addDependencyTool` for pre-install validation
+- [ ] Add post-install `npm audit --json` / `pnpm audit --json` parsing in `executeAddDependency.ts`
+- [ ] Map npm advisory severity to danger levels (critical/high = danger, moderate/low = warning)
+- [ ] Add local caching for advisory data (24-hour TTL)
+- [ ] Handle `@version` suffix in package names
+- [ ] Unit tests: valid/invalid names, known vulnerable packages (mocked registry), severity mapping
+
+### Phase 4: Code Injection Scanning
+
+- [ ] Implement `code_scanning.ts` with high-confidence pattern set
+- [ ] Add shared `scanContentForDangers(content: string)` function
+- [ ] Add `dangerCheck` to `writeFileTool`, `editFileTool`, `searchReplaceTool`
+- [ ] For `edit_file`: scan the edit sketch content, not the final merged file
+- [ ] Add package.json detection: if target file is `package.json`, parse diff and run npm validation on new dependencies
+- [ ] Per-pattern explanation templates
+- [ ] Unit tests: known malicious patterns, legitimate code that looks suspicious (build tools, base64 in tests)
+- [ ] Performance benchmark: verify sub-millisecond execution for regex patterns
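The `package.json` detection step could be sketched as a comparison of dependency maps before and after the write. `readDeps` and `newPackagesToValidate` are hypothetical names. Treating a changed spec as "new" and resolving `npm:` aliases to the aliased package are assumptions; the alias question is still open in this plan.

```typescript
type DepMap = { [name: string]: string };

// Dependency sections that are inspected (assumed list).
const DEP_FIELDS = ["dependencies", "devDependencies", "optionalDependencies", "peerDependencies"];

// Collect name -> spec pairs from every dependency section of a package.json string.
function readDeps(json: string): DepMap {
  const pkg = JSON.parse(json) as { [field: string]: unknown };
  const out: DepMap = {};
  for (const field of DEP_FIELDS) {
    const deps = pkg[field];
    if (typeof deps !== "object" || deps === null) continue;
    const section = deps as { [name: string]: unknown };
    for (const name of Object.keys(section)) {
      const spec = section[name];
      if (typeof spec === "string") out[name] = spec;
    }
  }
  return out;
}

// Return the package names that should go through npm validation:
// entries that are new or whose spec changed, with aliases resolved.
function newPackagesToValidate(before: string, after: string): string[] {
  const oldDeps = readDeps(before);
  const newDeps = readDeps(after);
  const result: string[] = [];
  for (const name of Object.keys(newDeps)) {
    const spec = newDeps[name];
    if (spec === undefined || oldDeps[name] === spec) continue;
    // "my-pkg": "npm:real-pkg@1.0.0" installs real-pkg, so validate that name.
    const alias = /^npm:((?:@[^\/@]+\/)?[^@]+)/.exec(spec);
    result.push(alias?.[1] ?? name);
  }
  return result;
}
```

For `search_replace`, the "after" text would be the file content with the replacement applied, which the tool already has to compute before writing.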
+
+### Phase 5: Dangerous Approval Override
+
+- [ ] Add `dangerousApprovalOverride: boolean` to `BaseUserSettingsFields` in `schemas.ts` (default: false)
+- [ ] Wire override check into `buildAgentToolSet` -- skip `dangerCheck` when enabled
+- [ ] Add "Safety" section in settings UI, visually separated from auto-approve
+- [ ] Implement confirmation dialog with "I understand" text input requirement
+- [ ] Add persistent shield-off indicator in chat header when override is active (clickable to jump to setting)
+- [ ] Add telemetry for override enable/disable events
+- [ ] Consider auto-expiry on app update (re-prompt user to re-enable)
+
+## Testing Strategy
+
+- [ ] **Unit tests for SQL heuristics**: Corpus of 50+ dangerous and safe SQL queries. Edge cases: DROP inside comments, DELETE with complex WHERE clauses, multi-statement queries, case variations, GRANT/REVOKE.
+- [ ] **Unit tests for npm validation**: Valid package names, invalid/malicious names, `@scope/package` format, `package@version` format, names with special characters (command injection attempts).
+- [ ] **Unit tests for code scanning**: Known malicious patterns, legitimate code that resembles patterns (build tools using eval, base64 in unit tests, process.env in config files).
+- [ ] **Integration tests**: Verify that `dangerCheck` results flow through the consent system correctly -- forced consent shows danger banner even when consent is "always", danger info appears in banner, "accept-always" is not an option.
+- [ ] **E2E tests**: Simulate agent attempting dangerous SQL with auto-approve ON; verify danger banner appears with correct explanation. Test override toggle flow.
+- [ ] **Regression tests**: Ensure existing auto-approve workflows are not broken for non-dangerous operations. Verify zero-friction happy path.
+- [ ] **Performance tests**: Benchmark SQL heuristics and code scanning to verify sub-millisecond execution on typical inputs.
+
+## Risks & Mitigations
+
+| Risk                                                                   | Likelihood | Impact | Mitigation                                                                                                                       |
+| ---------------------------------------------------------------------- | ---------- | ------ | -------------------------------------------------------------------------------------------------------------------------------- |
+| False positives erode user trust                                       | HIGH       | HIGH   | Start with very high-confidence patterns only. Track override rates via telemetry. Remove patterns that produce false positives. |
+| Command injection via package names (EXISTING)                         | HIGH       | HIGH   | Fix immediately in Phase 0, independent of feature work. Use `execFile` with array args.                                         |
+| Override + auto-approve = zero guardrails                              | MEDIUM     | HIGH   | Track this state in telemetry. Consider auto-expiry on app update. Persistent UI indicator.                                      |
+| Narrow code scanning creates false sense of security                   | MEDIUM     | MEDIUM | Honest messaging: "checks for common malicious patterns" not "security scanning." Document known limitations.                    |
+| npm audit coverage gaps (no typosquats, zero-days)                     | MEDIUM     | MEDIUM | Accept as known limitation. Document. Consider Socket.dev integration in v2.                                                     |
+| Performance impact on file writes from code scanning                   | LOW        | MEDIUM | Regex-only patterns (sub-millisecond). Benchmark before shipping.                                                                |
+| Bypass via indirect paths (write benign script that downloads malware) | MEDIUM     | LOW    | Fundamental limitation of static analysis. Accept and document.                                                                  |
+| npm registry/audit API unavailable (offline/outage)                    | LOW        | LOW    | Fail-open with notification: "Safety check unavailable -- proceeding."                                                           |
+| Pattern list goes stale as threats evolve                              | LOW        | MEDIUM | Keep pattern set small and high-signal. Easy to update (single file).                                                            |
+| MCP tools bypass all danger checks                                     | LOW        | LOW    | Document as known limitation. Out of scope for v1.                                                                               |
+
+## Open Questions
+
+- **Build mode coverage**: The `autoApproveChanges` setting in build mode bypasses the proposal flow, including existing `SecurityRisk` warnings. This feature only covers local-agent mode. Should build mode be covered in v2?
+- **`npm:` protocol aliases**: `package.json` edits could use `"my-pkg": "npm:malicious-pkg@1.0.0"` to bypass name validation. Should we parse these in the package.json detection?
+- **Per-category danger guard settings**: Should users be able to disable SQL checks but keep npm checks? The `category` field on `DangerCheckResult` enables this in the future, but it's not in the MVP.
+- **MCP tool danger detection**: MCP tools are opaque but could execute SQL or install packages. Future option: let MCP server authors declare danger levels in tool metadata.
+
+## Decision Log
+
+| Decision                                                | Reasoning                                                                                                                                                                            |
+| ------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| Heuristic SQL detection over LLM-based                  | LLM adds latency, cost, and provider dependency (violates Backend-Flexible principle). Heuristics catch 95%+ of destructive patterns with zero false positives on the obvious cases. |
+| npm audit advisories over Socket.dev                    | Free, official, no API key needed. Socket.dev is more comprehensive but adds external dependency. Can upgrade later.                                                                 |
+| Include narrow code injection scanning in MVP           | User decided. Scoped to near-zero false positive patterns (reverse shells, crypto miners, credential exfiltration). Performance impact is minimal (regex-only).                      |
+| Include dangerous approval override in MVP              | User decided. Mitigated with confirmation dialog (typed "I understand"), persistent UI indicator, and telemetry tracking.                                                            |
+| Always show danger context (even with auto-approve OFF) | Enhances decision quality for all users. Same consent banner component, just with upgraded styling when danger is detected.                                                          |
+| Advisory (forced consent) over blocking                 | Users can still proceed past warnings. This respects user autonomy while ensuring informed consent. The override toggle is the escape hatch from even this.                          |
+| Two buttons only on danger banner (no "Always allow")   | Permanently auto-approving dangerous actions defeats the purpose. Users approve per-instance or use the global override.                                                             |
+| `dangerCheck` per-tool over centralized detection       | Each tool knows its domain best. SQL heuristics are completely different from npm validation. Co-locating detection with the tool is cleaner and more extensible.                    |
+| Fix command injection independently                     | This is a security bug that exists today, not a feature. Ship the fix immediately without waiting for the full danger guards feature.                                                |
+| Fail-open when checks are unavailable                   | Fail-closed would mean a third-party API outage blocks the user's work. Fail-open with notification is the right balance for a local-first tool.                                     |
+| `aria-live="polite"` over "assertive"                   | The agent is paused waiting for consent -- there's no urgency. "Assertive" would disruptively interrupt screen reader users.                                                         |
+
+---
+
+_Generated by dyad:swarm-to-plan_