Unverified commit dee3a20e, authored by Will Chen, committed by GitHub

Add markdown sanitizer for fix-issue command (#2337)

## Summary

- Add Python script to sanitize GitHub issue markdown (removes HTML comments, zero-width characters, excessive whitespace, details/summary tags)
- Add unit tests with 5 golden input/output pairs plus additional inline tests
- Update fix-issue.md to use sanitizer and proceed directly with implementation for straightforward plans (no remote session question)
- Add goldens directory to format ignore to preserve test data

## Test plan

- Run `python3 .claude/commands/dyad/scripts/test_sanitize_issue_markdown.py` to verify all 13 unit tests pass
- Test the sanitizer directly: `echo "<!-- comment -->" | python3 .claude/commands/dyad/scripts/sanitize_issue_markdown.py`

#skip-bugbot

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---

## Summary by cubic

Adds a Python markdown sanitizer and integrates it into the fix-issue flow to clean GitHub issue content and enable direct local implementation for straightforward plans.

- **New Features**
  - Added sanitizer script that removes HTML comments, invisible characters, excessive blank lines, and strips details/summary tags while keeping content; normalizes line endings and whitespace.
  - Updated fix-issue.md to run the sanitizer and let simple plans proceed directly to local implementation without the remote session prompt.
  - Included golden files and unit tests (13) to validate sanitizer behavior.
  - Added the goldens directory to formatter and Prettier ignore lists to preserve test fixtures.
- **Bug Fixes**
  - Fixed shell injection risk in fix-issue.md by using printf in the sanitizer step.

<sup>Written for commit 226f9436ba6f338efd6fb798aa327334459647aa. Summary will update on new commits.</sup>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Parent f46cdf7a
 # Fix Issue
-Create a plan to fix a GitHub issue, then send it to be worked on remotely after approval.
+Create a plan to fix a GitHub issue, then implement it locally.
 ## Arguments
@@ -20,18 +20,32 @@ Create a plan to fix a GitHub issue, then send it to be worked on remotely after
    gh issue view <issue-number> --json title,body,comments,labels,assignees
    ```
-2. **Analyze the issue:**
+2. **Sanitize the issue content:**
+   Run the issue body through the sanitization script to remove HTML comments, invisible characters, and other artifacts:
+   ```
+   printf '%s' "$ISSUE_BODY" | python3 .claude/commands/dyad/scripts/sanitize_issue_markdown.py
+   ```
+   This removes:
+   - HTML comments (`<!-- ... -->`)
+   - Zero-width and invisible Unicode characters
+   - Excessive blank lines
+   - HTML details/summary tags (keeping content)
+3. **Analyze the issue:**
    - Understand what the issue is asking for
    - Identify the type of work (bug fix, feature, refactor, etc.)
    - Note any specific requirements or constraints mentioned
-3. **Explore the codebase:**
+4. **Explore the codebase:**
    - Search for relevant files and code related to the issue
    - Understand the current implementation
    - Identify what needs to change
    - Look at existing tests to understand testing patterns used in the project
-4. **Determine testing approach:**
+5. **Determine testing approach:**
    Consider what kind of testing is appropriate for this change:
    - **E2E test**: For user-facing features or complete user flows. Prefer this when the change involves UI interactions or would require mocking many dependencies to unit test.
@@ -40,7 +54,7 @@ Create a plan to fix a GitHub issue, then send it to be worked on remotely after
    Note: Per project guidelines, avoid writing many E2E tests for one feature. Prefer one or two E2E tests with broad coverage. If unsure, ask the user for guidance on testing approach.
-5. **Create a detailed plan:**
+6. **Create a detailed plan:**
    Write a plan that includes:
    - **Summary**: Brief description of the issue and proposed solution
@@ -49,16 +63,14 @@ Create a plan to fix a GitHub issue, then send it to be worked on remotely after
    - **Testing approach**: What tests to add (E2E, unit, or none) and why
    - **Potential risks**: Any concerns or edge cases to consider
-6. **Request plan approval:**
-   Present the plan to the user and use `ExitPlanMode` to request approval. The plan should be clear enough that it can be executed without further clarification.
-7. **Ask how to proceed:**
-   After the plan is approved, ask the user whether they want to:
-   - **Continue locally**: Implement the plan in the current session
-   - **Send to remote**: Push to a remote Claude session for implementation
-8. **Execute based on user choice:**
-   - If **local**: Proceed to implement the plan step by step, then run `/dyad:pr-push` when complete
-   - If **remote**: Use `ExitPlanMode` with `pushToRemote: true` and share the remote session URL with the user
+7. **Execute the plan:**
+   If the plan is straightforward with no ambiguities or open questions:
+   - Proceed directly to implementation without asking for approval
+   - Implement the plan step by step
+   - Run `/dyad:pr-push` when complete
+   If the plan has significant complexity, multiple valid approaches, or requires user input:
+   - Present the plan to the user and use `ExitPlanMode` to request approval
+   - After approval, implement the plan step by step
+   - Run `/dyad:pr-push` when complete
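The sanitize step in fix-issue.md hands the issue body to the script on stdin instead of splicing it into a command line. A minimal Python sketch of the same idea follows. The helper name `pipe_body` and the `STAND_IN` child process (which only echoes stdin back) are assumptions made for illustration; they are not part of this commit and do not call the real sanitizer.

```python
import subprocess
import sys

# Hypothetical stand-in for the sanitizer script: a child process that
# simply copies stdin to stdout. Swap in the real script path to use it.
STAND_IN = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]


def pipe_body(body: str, command: list[str] = STAND_IN) -> str:
    """Send untrusted text to a command on stdin, without involving a shell."""
    completed = subprocess.run(
        command,              # argument list, so no shell parses the text
        input=body,           # the issue body travels as data on stdin
        capture_output=True,
        text=True,
        check=True,           # raise if the child exits non-zero
    )
    return completed.stdout


if __name__ == "__main__":
    # Shell metacharacters in the body are passed through as plain text.
    print(pipe_body("$(echo pwned); `id`"))
```

Because the body never appears in an argument string that a shell interprets, characters such as `$(`, backticks, and `;` reach the child process unchanged.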
# Bug Report
<details>
<summary>Click to expand logs</summary>
Error log content here:
```
ERROR: Something went wrong
Stack trace follows
```
</details>
## More Info
Additional context.
<details open>
<summary>Open by default</summary>
This is expanded by default.
</details>
# Bug Report
Click to expand logs
Error log content here:
```
ERROR: Something went wrong
Stack trace follows
```
## More Info
Additional context.
Open by default
This is expanded by default.
\ No newline at end of file
# Issue Title
Too many blank lines above.
And here too.
## Section
Content with trailing spaces
More content.
# Issue Title
Too many blank lines above.
And here too.
## Section
Content with trailing spaces
More content.
\ No newline at end of file
# Bug Report
<!-- This is a hidden comment that should be removed -->
There's a bug in the login flow.
<!--
Multi-line comment
that spans several lines
and should also be removed
-->
## Steps to Reproduce
1. Go to login page
2. Enter credentials
3. Click submit
<!-- TODO: Add more details -->
# Bug Report
There's a bug in the login flow.
## Steps to Reproduce
1. Go to login page
2. Enter credentials
3. Click submit
\ No newline at end of file
# Feature Request
This​ text has zero​-width spaces hidden​ in it.
And some other invisible characters like ​ and ‌ and ‍.
## Description
Normal text here.
# Feature Request
This text has zero-width spaces hidden in it.
And some other invisible characters like and and .
## Description
Normal text here.
\ No newline at end of file
# Complex Issue
<!-- Hidden metadata: priority=high -->
This​ issue has multiple problems.
<details>
<summary>Stack trace</summary>
```
Error at line 42
at foo()
at bar()
```
</details>
## Steps to Reproduce
<!-- TODO: verify these steps -->
1. Do thing A
2. Do thing B
3. See error
<!-- End of issue -->
# Complex Issue
This issue has multiple problems.
Stack trace
```
Error at line 42
at foo()
at bar()
```
## Steps to Reproduce
1. Do thing A
2. Do thing B
3. See error
\ No newline at end of file
#!/usr/bin/env python3
"""
Sanitize GitHub issue markdown by removing comments, unusual formatting,
and other artifacts that may confuse LLMs processing the issue.
"""

import re
import sys


def sanitize_issue_markdown(markdown: str) -> str:
    """
    Sanitize GitHub issue markdown content.

    Removes:
    - HTML comments (<!-- ... -->)
    - Zero-width characters and other invisible Unicode
    - Control characters other than tab, newline, and carriage return
    - Excessive blank lines (more than 2 consecutive)
    - Trailing whitespace on each line
    - <details>/<summary> tags (inner content is kept) and empty HTML tags
    - Task list items that consist only of an empty checkbox marker

    Args:
        markdown: Raw markdown string from GitHub issue

    Returns:
        Cleaned markdown string
    """
    result = markdown

    # Remove HTML comments (including multi-line)
    result = re.sub(r"<!--[\s\S]*?-->", "", result)

    # Remove zero-width characters and other invisible Unicode
    # (zero-width space, non-joiner, joiner, word joiner, zero-width
    # no-break space / BOM, soft hyphen, etc.)
    result = re.sub(
        r"[\u200b\u200c\u200d\u2060\ufeff\u00ad\u034f\u061c\u180e]", "", result
    )

    # Remove other control characters (except tabs, newlines, carriage returns)
    result = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]", "", result)

    # Remove HTML details/summary tags but keep inner content
    result = re.sub(r"</?(?:details|summary)[^>]*>", "", result, flags=re.IGNORECASE)

    # Remove empty HTML tags
    result = re.sub(r"<([a-z]+)[^>]*>\s*</\1>", "", result, flags=re.IGNORECASE)

    # Remove task list items that are only a checkbox marker ([ ], [x] or [X]);
    # items with text after the marker are kept
    result = re.sub(r"^\s*-\s*\[[ xX]\]\s*$", "", result, flags=re.MULTILINE)

    # Normalize line endings
    result = result.replace("\r\n", "\n").replace("\r", "\n")

    # Strip trailing whitespace from each line
    result = "\n".join(line.rstrip() for line in result.split("\n"))

    # Collapse more than 2 consecutive blank lines into 2
    result = re.sub(r"\n{4,}", "\n\n\n", result)

    # Strip leading/trailing whitespace from the whole document
    result = result.strip()

    return result


def main():
    """Read from a file argument or stdin, sanitize, write to stdout."""
    if len(sys.argv) > 1:
        # Read from file
        with open(sys.argv[1], "r", encoding="utf-8") as f:
            content = f.read()
    else:
        # Read from stdin
        content = sys.stdin.read()

    sanitized = sanitize_issue_markdown(content)
    print(sanitized)


if __name__ == "__main__":
    main()
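As a minimal sketch, two of the regex passes from the script can be tried in isolation. The helper names `strip_comments` and `collapse_blank_lines` are hypothetical and exist only for this illustration; the patterns themselves are copied from `sanitize_issue_markdown`.

```python
import re


def strip_comments(text: str) -> str:
    """Remove HTML comments; the non-greedy `*?` stops at the first `-->`."""
    return re.sub(r"<!--[\s\S]*?-->", "", text)


def collapse_blank_lines(text: str) -> str:
    """Reduce runs of four or more newlines to three (two blank lines)."""
    return re.sub(r"\n{4,}", "\n\n\n", text)


if __name__ == "__main__":
    # Two separate comments are removed individually, keeping the text between.
    print(repr(strip_comments("a<!--1-->b<!--2-->c")))
    # A comment spanning lines is removed because [\s\S] also matches newlines.
    print(repr(strip_comments("a<!-- x\ny -->b")))
    # Three blank lines (four newlines) become two blank lines.
    print(repr(collapse_blank_lines("a\n\n\n\nb")))
```

The comment pattern has no awareness of markdown structure, so a comment written inside inline code or a fenced block is removed as well, which matches the caveat noted in the unit tests.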
#!/usr/bin/env python3
"""
Unit tests for sanitize_issue_markdown.py using golden input/output pairs.
"""

import unittest
from pathlib import Path

from sanitize_issue_markdown import sanitize_issue_markdown


class TestSanitizeIssueMarkdown(unittest.TestCase):
    """Test the sanitize_issue_markdown function using golden files."""

    GOLDENS_DIR = Path(__file__).parent / "goldens"

    def _load_golden_pair(self, name: str) -> tuple[str, str]:
        """Load a golden input/output pair by name."""
        input_file = self.GOLDENS_DIR / f"{name}_input.md"
        output_file = self.GOLDENS_DIR / f"{name}_output.md"

        with open(input_file, "r", encoding="utf-8") as f:
            input_content = f.read()
        with open(output_file, "r", encoding="utf-8") as f:
            expected_output = f.read()

        return input_content, expected_output

    def _run_golden_test(self, name: str):
        """Run a golden test by name."""
        input_content, expected_output = self._load_golden_pair(name)
        actual_output = sanitize_issue_markdown(input_content)
        self.assertEqual(
            actual_output,
            expected_output,
            f"Golden test '{name}' failed.\n"
            f"Expected:\n{repr(expected_output)}\n\n"
            f"Actual:\n{repr(actual_output)}",
        )

    def test_html_comments(self):
        """Test that HTML comments are removed."""
        self._run_golden_test("html_comments")

    def test_invisible_chars(self):
        """Test that invisible/zero-width characters are removed."""
        self._run_golden_test("invisible_chars")

    def test_excessive_whitespace(self):
        """Test that excessive blank lines and trailing whitespace are normalized."""
        self._run_golden_test("excessive_whitespace")

    def test_details_summary(self):
        """Test that details/summary HTML tags are removed but content is kept."""
        self._run_golden_test("details_summary")

    def test_mixed(self):
        """Test a complex issue with multiple types of artifacts."""
        self._run_golden_test("mixed")

    def test_empty_input(self):
        """Test that empty input returns empty output."""
        self.assertEqual(sanitize_issue_markdown(""), "")

    def test_plain_text(self):
        """Test that plain text without artifacts is unchanged."""
        plain = "# Simple Issue\n\nThis is plain text.\n\n## Section\n\nMore text."
        self.assertEqual(sanitize_issue_markdown(plain), plain)

    def test_preserves_code_blocks(self):
        """Test that code blocks are preserved."""
        content = """# Issue

```python
def foo():
    # This is a comment in code, not HTML
    return 42
```

More text."""
        result = sanitize_issue_markdown(content)
        self.assertIn("# This is a comment in code", result)
        self.assertIn("def foo():", result)

    def test_preserves_inline_code(self):
        """Test that the text around inline code survives sanitization."""
        content = "Use `<!-- not a comment -->` for HTML comments."
        # The sanitizer will still remove the HTML comment even in inline code
        # because we're doing a simple regex replacement. This is acceptable.
        result = sanitize_issue_markdown(content)
        self.assertIn("Use `", result)

    def test_preserves_links(self):
        """Test that markdown links are preserved."""
        content = "Check [this link](https://example.com) for more info."
        result = sanitize_issue_markdown(content)
        self.assertEqual(result, content)

    def test_preserves_images(self):
        """Test that image references are preserved."""
        content = "![Screenshot](https://example.com/image.png)"
        result = sanitize_issue_markdown(content)
        self.assertEqual(result, content)

    def test_crlf_normalization(self):
        """Test that CRLF line endings are normalized to LF."""
        content = "Line 1\r\nLine 2\r\nLine 3"
        result = sanitize_issue_markdown(content)
        self.assertEqual(result, "Line 1\nLine 2\nLine 3")

    def test_removes_control_characters(self):
        """Test that control characters are removed."""
        content = "Hello\x00World\x1fTest"
        result = sanitize_issue_markdown(content)
        self.assertEqual(result, "HelloWorldTest")


def discover_golden_tests():
    """Discover all golden test pairs in the goldens directory."""
    goldens_dir = Path(__file__).parent / "goldens"
    if not goldens_dir.exists():
        return []

    input_files = goldens_dir.glob("*_input.md")
    names = set()
    for f in input_files:
        # Strip only the trailing "_input" so names that contain "_input"
        # elsewhere are not mangled (str.removesuffix needs Python 3.9+)
        name = f.stem.removesuffix("_input")
        output_file = goldens_dir / f"{name}_output.md"
        if output_file.exists():
            names.add(name)

    return sorted(names)


if __name__ == "__main__":
    # Print discovered golden tests
    golden_tests = discover_golden_tests()
    print(f"Discovered {len(golden_tests)} golden test pairs: {golden_tests}")
    print()

    # Run tests
    unittest.main(verbosity=2)
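In the test file above, `discover_golden_tests` only reports which pairs exist; each golden case is still wired up by hand as its own test method. One possible variation, sketched here under assumptions, drives the comparison from the discovered pairs directly. The names `discover_pairs` and `failing_pairs` are hypothetical, and the `sanitize` parameter stands in for `sanitize_issue_markdown` so the sketch runs without the real module.

```python
from pathlib import Path
from typing import Callable


def discover_pairs(goldens_dir: Path) -> list[str]:
    """Return names that have both <name>_input.md and <name>_output.md."""
    suffix = "_input.md"
    names = []
    for input_file in sorted(goldens_dir.glob(f"*{suffix}")):
        # Slice off only the trailing suffix to recover the pair name.
        name = input_file.name[: -len(suffix)]
        if (goldens_dir / f"{name}_output.md").exists():
            names.append(name)
    return names


def failing_pairs(goldens_dir: Path, sanitize: Callable[[str], str]) -> list[str]:
    """Run every discovered pair through `sanitize` and list the mismatches."""
    failures = []
    for name in discover_pairs(goldens_dir):
        source = (goldens_dir / f"{name}_input.md").read_text(encoding="utf-8")
        expected = (goldens_dir / f"{name}_output.md").read_text(encoding="utf-8")
        if sanitize(source) != expected:
            failures.append(name)
    return failures
```

Inside a `unittest.TestCase`, the same loop could wrap each pair in `self.subTest(name=name)` so that a new golden pair is picked up without adding a test method.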
@@ -8,6 +8,7 @@
     "drizzle/",
     "**/pnpm-lock.yaml",
     "**/snapshots/**",
-    "e2e-tests/fixtures/**"
+    "e2e-tests/fixtures/**",
+    ".claude/commands/dyad/scripts/goldens/**"
   ]
 }
@@ -7,3 +7,5 @@ drizzle/
 **/snapshots/**
 # test fixtures
 e2e-tests/fixtures/**
+# sanitize_issue_markdown test goldens
+.claude/commands/dyad/scripts/goldens/**
\ No newline at end of file