Add markdown sanitizer for fix-issue command (#2337)

## Summary

- Add Python script to sanitize GitHub issue markdown (removes HTML comments, zero-width characters, excessive whitespace, details/summary tags)
- Add unit tests with 5 golden input/output pairs plus additional inline tests
- Update fix-issue.md to use sanitizer and proceed directly with implementation for straightforward plans (no remote session question)
- Add goldens directory to format ignore to preserve test data

## Test plan

- Run `python3 .claude/commands/dyad/scripts/test_sanitize_issue_markdown.py` to verify all 13 unit tests pass
- Test the sanitizer directly: `echo "" | python3 .claude/commands/dyad/scripts/sanitize_issue_markdown.py`

#skip-bugbot

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---

## Summary by cubic

Adds a Python markdown sanitizer and integrates it into the fix-issue flow to clean GitHub issue content and enable direct local implementation for straightforward plans.

- **New Features**
  - Added sanitizer script that removes HTML comments, invisible characters, excessive blank lines, and strips details/summary tags while keeping content; normalizes line endings and whitespace.
  - Updated fix-issue.md to run the sanitizer and let simple plans proceed directly to local implementation without the remote session prompt.
  - Included golden files and unit tests (13) to validate sanitizer behavior.
  - Added the goldens directory to formatter and Prettier ignore lists to preserve test fixtures.
- **Bug Fixes**
  - Fixed shell injection risk in fix-issue.md by using printf in the sanitizer step.

<sup>Written for commit 226f9436ba6f338efd6fb798aa327334459647aa. Summary will update on new commits.</sup>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
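
A minimal sketch of the main sanitizer passes summarized above, condensed from the regexes in this change. The name `sanitize_sketch` and the sample issue body are hypothetical and exist only for illustration. The sketch covers a subset of the invisible characters and omits the control-character, empty-tag, and task-list passes, so it is not a substitute for the real script.

```python
import re


def sanitize_sketch(markdown: str) -> str:
    """Condensed illustration of the main passes (hypothetical helper, not the full script)."""
    # HTML comments, including multi-line ones
    text = re.sub(r"<!--[\s\S]*?-->", "", markdown)
    # A subset of the zero-width / invisible characters handled by the script
    text = re.sub(r"[\u200b\u200c\u200d\u2060\ufeff]", "", text)
    # Drop details/summary tags but keep whatever they wrap
    text = re.sub(r"</?(?:details|summary)[^>]*>", "", text, flags=re.IGNORECASE)
    # Normalize line endings, then strip trailing whitespace per line
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    text = "\n".join(line.rstrip() for line in text.split("\n"))
    # Allow at most two consecutive blank lines
    text = re.sub(r"\n{4,}", "\n\n\n", text)
    return text.strip()


# Hypothetical issue body mixing several artifacts
sample = (
    "# Bug\u200b Report\r\n"
    "<!-- hidden note -->\r\n"
    "<details>\r\n"
    "<summary>Logs</summary>\r\n"
    "ERROR: boom   \r\n"
    "</details>\r\n"
)
cleaned = sanitize_sketch(sample)
print(repr(cleaned))
```

Because the comment pass is a plain regex, comment-like text inside inline code or fenced code is stripped as well; the test file in this change notes that behavior as an accepted trade-off.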

dee3a20e · Will Chen · GitHub · f46cdf7a · dee3a20e · dee3a20e
--- a/.claude/commands/dyad/fix-issue.md
+++ b/.claude/commands/dyad/fix-issue.md
 # Fix Issue
-Create a plan to fix a GitHub issue, then send it to be worked on remotely after approval.
+Create a plan to fix a GitHub issue, then implement it locally.
 ## Arguments
@@ -20,18 +20,32 @@ Create a plan to fix a GitHub issue, then send it to be worked on remotely after
   gh issue view <issue-number> --json title,body,comments,labels,assignees
   ```
-2. **Analyze the issue:**
+2. **Sanitize the issue content:**
+   Run the issue body through the sanitization script to remove HTML comments, invisible characters, and other artifacts:
+   ```
+   printf '%s' "$ISSUE_BODY" | python3 .claude/commands/dyad/scripts/sanitize_issue_markdown.py
+   ```
+   This removes:
+   - HTML comments (`<!-- ... -->`)
+   - Zero-width and invisible Unicode characters
+   - Excessive blank lines
+   - HTML details/summary tags (keeping content)
+3. **Analyze the issue:**
   - Understand what the issue is asking for
   - Identify the type of work (bug fix, feature, refactor, etc.)
   - Note any specific requirements or constraints mentioned
-3. **Explore the codebase:**
+4. **Explore the codebase:**
   - Search for relevant files and code related to the issue
   - Understand the current implementation
   - Identify what needs to change
   - Look at existing tests to understand testing patterns used in the project
-4. **Determine testing approach:**
+5. **Determine testing approach:**
   Consider what kind of testing is appropriate for this change:
   - **E2E test**: For user-facing features or complete user flows. Prefer this when the change involves UI interactions or would require mocking many dependencies to unit test.
@@ -40,7 +54,7 @@ Create a plan to fix a GitHub issue, then send it to be worked on remotely after
   Note: Per project guidelines, avoid writing many E2E tests for one feature. Prefer one or two E2E tests with broad coverage. If unsure, ask the user for guidance on testing approach.
-5. **Create a detailed plan:**
+6. **Create a detailed plan:**
   Write a plan that includes:
   - **Summary**: Brief description of the issue and proposed solution
@@ -49,16 +63,14 @@ Create a plan to fix a GitHub issue, then send it to be worked on remotely after
   - **Testing approach**: What tests to add (E2E, unit, or none) and why
   - **Potential risks**: Any concerns or edge cases to consider
-6. **Request plan approval:**
+7. **Execute the plan:**
-   Present the plan to the user and use `ExitPlanMode` to request approval. The plan should be clear enough that it can be executed without further clarification.
-7. **Ask how to proceed:**
-   After the plan is approved, ask the user whether they want to:
+   If the plan is straightforward with no ambiguities or open questions:
-   - **Continue locally**: Implement the plan in the current session
+   - Proceed directly to implementation without asking for approval
-   - **Send to remote**: Push to a remote Claude session for implementation
+   - Implement the plan step by step
+   - Run `/dyad:pr-push` when complete
-8. **Execute based on user choice:**
+   If the plan has significant complexity, multiple valid approaches, or requires user input:
-   - If **local**: Proceed to implement the plan step by step, then run `/dyad:pr-push` when complete
+   - Present the plan to the user and use `ExitPlanMode` to request approval
-   - If **remote**: Use `ExitPlanMode` with `pushToRemote: true` and share the remote session URL with the user
+   - After approval, implement the plan step by step
+   - Run `/dyad:pr-push` when complete
--- a/.claude/commands/dyad/scripts/goldens/details_summary_input.md
+++ b/.claude/commands/dyad/scripts/goldens/details_summary_input.md
+# Bug Report
+<details>
+<summary>Click to expand logs</summary>
+Error log content here:
+```
+ERROR: Something went wrong
+Stack trace follows
+```
+</details>
+## More Info
+Additional context.
+<details open>
+<summary>Open by default</summary>
+This is expanded by default.
+</details>
--- a/.claude/commands/dyad/scripts/goldens/details_summary_output.md
+++ b/.claude/commands/dyad/scripts/goldens/details_summary_output.md
+# Bug Report
+Click to expand logs
+Error log content here:
+```
+ERROR: Something went wrong
+Stack trace follows
+```
+## More Info
+Additional context.
+Open by default
+This is expanded by default.
\ No newline at end of file
--- a/.claude/commands/dyad/scripts/goldens/excessive_whitespace_input.md
+++ b/.claude/commands/dyad/scripts/goldens/excessive_whitespace_input.md
+# Issue Title
+Too many blank lines above.
+And here too.
+## Section
+Content with trailing spaces
+More content.
--- a/.claude/commands/dyad/scripts/goldens/excessive_whitespace_output.md
+++ b/.claude/commands/dyad/scripts/goldens/excessive_whitespace_output.md
+# Issue Title
+Too many blank lines above.
+And here too.
+## Section
+Content with trailing spaces
+More content.
\ No newline at end of file
--- a/.claude/commands/dyad/scripts/goldens/html_comments_input.md
+++ b/.claude/commands/dyad/scripts/goldens/html_comments_input.md
+# Bug Report
+<!-- This is a hidden comment that should be removed -->
+There's a bug in the login flow.
+<!--
+Multi-line comment
+that spans several lines
+and should also be removed
+-->
+## Steps to Reproduce
+1. Go to login page
+2. Enter credentials
+3. Click submit
+<!-- TODO: Add more details -->
--- a/.claude/commands/dyad/scripts/goldens/html_comments_output.md
+++ b/.claude/commands/dyad/scripts/goldens/html_comments_output.md
+# Bug Report
+There's a bug in the login flow.
+## Steps to Reproduce
+1. Go to login page
+2. Enter credentials
+3. Click submit
\ No newline at end of file
--- a/.claude/commands/dyad/scripts/goldens/invisible_chars_input.md
+++ b/.claude/commands/dyad/scripts/goldens/invisible_chars_input.md
+# Feature Request
+This text has zero-width spaces hidden in it.
+And some other invisible characters like  and ‌ and ‍.
+## Description
+Normal text here.
--- a/.claude/commands/dyad/scripts/goldens/invisible_chars_output.md
+++ b/.claude/commands/dyad/scripts/goldens/invisible_chars_output.md
+# Feature Request
+This text has zero-width spaces hidden in it.
+And some other invisible characters like  and  and .
+## Description
+Normal text here.
\ No newline at end of file
--- a/.claude/commands/dyad/scripts/goldens/mixed_input.md
+++ b/.claude/commands/dyad/scripts/goldens/mixed_input.md
+# Complex Issue
+<!-- Hidden metadata: priority=high -->
+This issue has multiple problems.
+<details>
+<summary>Stack trace</summary>
+```
+Error at line 42
+  at foo()
+  at bar()
+```
+</details>
+## Steps to Reproduce
+<!-- TODO: verify these steps -->
+1. Do thing A
+2. Do thing B
+3. See error
+<!-- End of issue -->
--- a/.claude/commands/dyad/scripts/goldens/mixed_output.md
+++ b/.claude/commands/dyad/scripts/goldens/mixed_output.md
+# Complex Issue
+This issue has multiple problems.
+Stack trace
+```
+Error at line 42
+  at foo()
+  at bar()
+```
+## Steps to Reproduce
+1. Do thing A
+2. Do thing B
+3. See error
\ No newline at end of file
--- a/.claude/commands/dyad/scripts/sanitize_issue_markdown.py
+++ b/.claude/commands/dyad/scripts/sanitize_issue_markdown.py
+#!/usr/bin/env python3
+"""
+Sanitize GitHub issue markdown by removing comments, unusual formatting,
+and other artifacts that may confuse LLMs processing the issue.
+"""
+import re
+import sys
+def sanitize_issue_markdown(markdown: str) -> str:
+    """
+    Sanitize GitHub issue markdown content.
+    Removes:
+    - HTML comments (<!-- ... -->)
+    - Zero-width characters and other invisible Unicode
+    - Excessive blank lines (more than 2 consecutive)
+    - Trailing whitespace on each line, and leading/trailing whitespace of the whole document
+    - HTML tags that aren't useful for understanding content
+    - GitHub-specific directives that aren't content
+    Args:
+        markdown: Raw markdown string from GitHub issue
+    Returns:
+        Cleaned markdown string
+    """
+    result = markdown
+    # Remove HTML comments (including multi-line)
+    result = re.sub(r"<!--[\s\S]*?-->", "", result)
+    # Remove zero-width characters and other invisible Unicode
+    # (Zero-width space, non-joiner, joiner, word joiner, BOM/zero-width no-break space, soft hyphen, etc.)
+    result = re.sub(
+        r"[\u200b\u200c\u200d\u2060\ufeff\u00ad\u034f\u061c\u180e]", "", result
+    )
+    # Remove other control characters (except newlines, tabs)
+    result = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]", "", result)
+    # Remove HTML details/summary tags but keep inner content
+    result = re.sub(r"</?(?:details|summary)[^>]*>", "", result, flags=re.IGNORECASE)
+    # Remove empty HTML tags
+    result = re.sub(r"<([a-z]+)[^>]*>\s*</\1>", "", result, flags=re.IGNORECASE)
+    # Remove GitHub task list markers that are just decoration
+    # But keep the actual checkbox content (supports both [x] and [X])
+    result = re.sub(r"^\s*-\s*\[[ xX]\]\s*$", "", result, flags=re.MULTILINE)
+    # Normalize line endings
+    result = result.replace("\r\n", "\n").replace("\r", "\n")
+    # Strip trailing whitespace from each line
+    result = "\n".join(line.rstrip() for line in result.split("\n"))
+    # Collapse more than 2 consecutive blank lines into 2
+    result = re.sub(r"\n{4,}", "\n\n\n", result)
+    # Strip leading/trailing whitespace from the whole document
+    result = result.strip()
+    return result
+def main():
+    """Read from a file argument or stdin, sanitize, write to stdout."""
+    if len(sys.argv) > 1:
+        # Read from file
+        with open(sys.argv[1], "r", encoding="utf-8") as f:
+            content = f.read()
+    else:
+        # Read from stdin
+        content = sys.stdin.read()
+    sanitized = sanitize_issue_markdown(content)
+    print(sanitized)
+if __name__ == "__main__":
+    main()
--- a/.claude/commands/dyad/scripts/test_sanitize_issue_markdown.py
+++ b/.claude/commands/dyad/scripts/test_sanitize_issue_markdown.py
+#!/usr/bin/env python3
+"""
+Unit tests for sanitize_issue_markdown.py using golden input/output pairs.
+"""
+import unittest
+from pathlib import Path
+from sanitize_issue_markdown import sanitize_issue_markdown
+class TestSanitizeIssueMarkdown(unittest.TestCase):
+    """Test the sanitize_issue_markdown function using golden files."""
+    GOLDENS_DIR = Path(__file__).parent / "goldens"
+    def _load_golden_pair(self, name: str) -> tuple[str, str]:
+        """Load a golden input/output pair by name."""
+        input_file = self.GOLDENS_DIR / f"{name}_input.md"
+        output_file = self.GOLDENS_DIR / f"{name}_output.md"
+        with open(input_file, "r", encoding="utf-8") as f:
+            input_content = f.read()
+        with open(output_file, "r", encoding="utf-8") as f:
+            expected_output = f.read()
+        return input_content, expected_output
+    def _run_golden_test(self, name: str):
+        """Run a golden test by name."""
+        input_content, expected_output = self._load_golden_pair(name)
+        actual_output = sanitize_issue_markdown(input_content)
+        self.assertEqual(
+            actual_output,
+            expected_output,
+            f"Golden test '{name}' failed.\n"
+            f"Expected:\n{repr(expected_output)}\n\n"
+            f"Actual:\n{repr(actual_output)}",
+        )
+    def test_html_comments(self):
+        """Test that HTML comments are removed."""
+        self._run_golden_test("html_comments")
+    def test_invisible_chars(self):
+        """Test that invisible/zero-width characters are removed."""
+        self._run_golden_test("invisible_chars")
+    def test_excessive_whitespace(self):
+        """Test that excessive blank lines and trailing whitespace are normalized."""
+        self._run_golden_test("excessive_whitespace")
+    def test_details_summary(self):
+        """Test that details/summary HTML tags are removed but content is kept."""
+        self._run_golden_test("details_summary")
+    def test_mixed(self):
+        """Test a complex issue with multiple types of artifacts."""
+        self._run_golden_test("mixed")
+    def test_empty_input(self):
+        """Test that empty input returns empty output."""
+        self.assertEqual(sanitize_issue_markdown(""), "")
+    def test_plain_text(self):
+        """Test that plain text without artifacts is unchanged."""
+        plain = "# Simple Issue\n\nThis is plain text.\n\n## Section\n\nMore text."
+        self.assertEqual(sanitize_issue_markdown(plain), plain)
+    def test_preserves_code_blocks(self):
+        """Test that code blocks are preserved."""
+        content = """# Issue
+```python
+def foo():
+    # This is a comment in code, not HTML
+    return 42
+```
+More text."""
+        result = sanitize_issue_markdown(content)
+        self.assertIn("# This is a comment in code", result)
+        self.assertIn("def foo():", result)
+    def test_preserves_inline_code(self):
+        """Test that text around inline code survives (comment-like content inside it does not)."""
+        content = "Use `<!-- not a comment -->` for HTML comments."
+        # The sanitizer will still remove the HTML comment even in inline code
+        # because we're doing a simple regex replacement. This is acceptable.
+        result = sanitize_issue_markdown(content)
+        self.assertIn("Use `", result)
+    def test_preserves_links(self):
+        """Test that markdown links are preserved."""
+        content = "Check [this link](https://example.com) for more info."
+        result = sanitize_issue_markdown(content)
+        self.assertEqual(result, content)
+    def test_preserves_images(self):
+        """Test that image references are preserved."""
+        content = "![Screenshot](https://example.com/image.png)"
+        result = sanitize_issue_markdown(content)
+        self.assertEqual(result, content)
+    def test_crlf_normalization(self):
+        """Test that CRLF line endings are normalized to LF."""
+        content = "Line 1\r\nLine 2\r\nLine 3"
+        result = sanitize_issue_markdown(content)
+        self.assertEqual(result, "Line 1\nLine 2\nLine 3")
+    def test_removes_control_characters(self):
+        """Test that control characters are removed."""
+        content = "Hello\x00World\x1fTest"
+        result = sanitize_issue_markdown(content)
+        self.assertEqual(result, "HelloWorldTest")
+def discover_golden_tests():
+    """Discover all golden test pairs in the goldens directory."""
+    goldens_dir = Path(__file__).parent / "goldens"
+    if not goldens_dir.exists():
+        return []
+    input_files = goldens_dir.glob("*_input.md")
+    names = set()
+    for f in input_files:
+        name = f.stem.removesuffix("_input")
+        output_file = goldens_dir / f"{name}_output.md"
+        if output_file.exists():
+            names.add(name)
+    return sorted(names)
+if __name__ == "__main__":
+    # Print discovered golden tests
+    golden_tests = discover_golden_tests()
+    print(f"Discovered {len(golden_tests)} golden test pairs: {golden_tests}")
+    print()
+    # Run tests
+    unittest.main(verbosity=2)
--- a/.oxfmtrc.json
+++ b/.oxfmtrc.json
@@ -8,6 +8,7 @@
    "drizzle/",
    "**/pnpm-lock.yaml",
    "**/snapshots/**",
-    "e2e-tests/fixtures/**"
+    "e2e-tests/fixtures/**",
+    ".claude/commands/dyad/scripts/goldens/**"
  ]
 }
--- a/.prettierignore
+++ b/.prettierignore
@@ -7,3 +7,5 @@ drizzle/
 **/snapshots/**
 # test fixtures
 e2e-tests/fixtures/**
+# sanitize_issue_markdown test goldens
+.claude/commands/dyad/scripts/goldens/**
\ No newline at end of file