Unverified 提交 6ba51165 authored 作者: Will Chen's avatar Will Chen 提交者: GitHub

Replace prompt-based stop hook with Sonnet-powered analysis (#2331)

## Summary - Replace broken prompt-based Stop hook with command-based hook using Claude Sonnet - Add .claude/hooks/stop-hook.py that reads conversation transcript and uses Sonnet to analyze task completion - Includes infinite loop prevention via stop_hook_active check - Add unit tests for the stop hook ## Test plan - [x] Run pytest .claude/hooks/tests/test_stop_hook.py -v - all 9 tests pass - [ ] Manual testing: verify stop hook fires and correctly analyzes task completion #skip-bugbot <!-- This is an auto-generated description by cubic. --> --- ## Summary by cubic Replaced the broken prompt-based stop hook with a command-based hook that blocks when tasks remain and uses Sonnet analysis as a fallback. Adds loop protection and tests. - **New Features** - Added .claude/hooks/stop-hook.py that blocks when TaskCreate/TaskUpdate show remaining tasks, returning {"decision":"block","reason":...}. If none remain, it analyzes a 32k transcript (middle truncation) with Sonnet. - Added unit tests and a stop_hook_active guard to prevent infinite loops. - **Refactors** - Updated .claude/settings.json to use the command-based hook (30000 ms timeout) instead of the prompt hook. - Added --no-session-persistence to Claude CLI calls in stop-hook.py and permission-request-hook.py. <sup>Written for commit 575426cee9efb0fa7e1f4be64a8405ae2e717a3b. Summary will update on new commits.</sup> <!-- End of auto-generated description by cubic. --> <!-- devin-review-badge-begin --> --- <a href="https://app.devin.ai/review/dyad-sh/dyad/pull/2331"> <picture> <source media="(prefers-color-scheme: dark)" srcset="https://static.devin.ai/assets/gh-open-in-devin-review-dark.svg?v=1"> <img src="https://static.devin.ai/assets/gh-open-in-devin-review-light.svg?v=1" alt="Open with Devin"> </picture> </a> <!-- devin-review-badge-end --> --------- Co-authored-by: 's avatarClaude Opus 4.5 <noreply@anthropic.com>
上级 67fe5048
......@@ -84,6 +84,7 @@ Analyze this request and provide your safety assessment. Respond with ONLY a JSO
"--print",
"--output-format", "text",
"--model", "sonnet",
"--no-session-persistence",
prompt
],
capture_output=True,
......
#!/usr/bin/env python3
"""
AI-Powered Stop Hook
This is a Stop hook that runs when Claude is about to stop working.
It uses Claude Sonnet to analyze the transcript and determine whether
Claude should continue working or is allowed to stop.
The hook is designed to catch premature stopping when tasks are incomplete.
Usage:
Receives JSON on stdin with session info including transcript_path
Outputs JSON with decision="block" and reason to continue, or no output to allow stop
"""
import json
import os
import shutil
import subprocess
import sys
from pathlib import Path
def extract_task_state(transcript_path: str) -> dict:
"""Extract task state from TaskCreate and TaskUpdate tool calls.
Returns a dict with:
- tasks: dict mapping task_id to {subject, status}
- remaining: list of (task_id, subject, status) for incomplete tasks
- total: total number of tasks created
"""
tasks: dict[str, dict] = {}
try:
path = Path(transcript_path).expanduser()
if not path.exists():
return {"tasks": {}, "remaining": [], "total": 0}
lines = path.read_text().strip().split("\n")
for line in lines:
try:
entry = json.loads(line)
if entry.get("type") != "assistant":
continue
content = entry.get("message", {}).get("content", [])
for part in content:
if part.get("type") != "tool_use":
continue
tool_name = part.get("name", "")
tool_input = part.get("input", {})
if tool_name == "TaskCreate":
# TaskCreate assigns sequential IDs starting from 1
task_id = str(len(tasks) + 1)
subject = tool_input.get("subject", f"Task {task_id}")
tasks[task_id] = {
"subject": subject,
"status": "pending"
}
elif tool_name == "TaskUpdate":
task_id = tool_input.get("taskId", "")
if task_id and task_id in tasks:
if "status" in tool_input:
tasks[task_id]["status"] = tool_input["status"]
if "subject" in tool_input:
tasks[task_id]["subject"] = tool_input["subject"]
except json.JSONDecodeError:
continue
except Exception:
return {"tasks": {}, "remaining": [], "total": 0}
# Find remaining (non-completed) tasks
remaining = [
(task_id, info["subject"], info["status"])
for task_id, info in tasks.items()
if info["status"] != "completed"
]
return {
"tasks": tasks,
"remaining": remaining,
"total": len(tasks)
}
def get_claude_path() -> str | None:
"""Find the claude CLI path."""
claude_path = shutil.which("claude")
if claude_path:
return claude_path
home = Path.home()
default_path = home / ".claude" / "local" / "claude"
if default_path.exists() and os.access(default_path, os.X_OK):
return str(default_path)
return None
def read_transcript(transcript_path: str, max_chars: int = 32000) -> str:
"""Read and format the transcript for analysis.
Includes content from the beginning and end of the transcript,
truncating from the middle if needed to stay within limits.
Args:
transcript_path: Path to the JSONL transcript file
max_chars: Maximum characters for the output (default 32000, ~8000 tokens)
"""
try:
path = Path(transcript_path).expanduser()
if not path.exists():
return ""
lines = path.read_text().strip().split("\n")
formatted = []
for line in lines:
try:
entry = json.loads(line)
msg_type = entry.get("type", "unknown")
if msg_type == "user":
content = entry.get("message", {}).get("content", "")
if isinstance(content, list):
text_parts = [
p.get("text", "") for p in content if p.get("type") == "text"
]
content = " ".join(text_parts)
formatted.append(f"USER: {content[:500]}")
elif msg_type == "assistant":
content = entry.get("message", {}).get("content", [])
text_parts = []
tool_uses = []
for part in content:
if part.get("type") == "text":
text_parts.append(part.get("text", "")[:300])
elif part.get("type") == "tool_use":
tool_uses.append(part.get("name", "unknown"))
if text_parts:
formatted.append(f"ASSISTANT: {' '.join(text_parts)}")
if tool_uses:
formatted.append(f"ASSISTANT TOOLS: {', '.join(tool_uses)}")
elif msg_type == "tool_result":
# Just note that a tool result came back
formatted.append("TOOL_RESULT: (received)")
except json.JSONDecodeError:
continue
result = "\n".join(formatted)
if len(result) > max_chars:
# Keep beginning and end, truncate from middle
# Reserve ~40% for beginning, ~60% for end (end is more important)
begin_budget = int(max_chars * 0.4)
end_budget = max_chars - begin_budget - 50 # 50 chars for truncation marker
begin_part = result[:begin_budget]
end_part = result[-end_budget:]
result = begin_part + "\n\n...(middle truncated)...\n\n" + end_part
return result
except OSError:
return ""
def analyze_with_claude(transcript: str, cwd: str) -> dict | None:
"""
Use Claude CLI to analyze whether Claude should continue working.
Returns dict with 'continue' (bool) and 'reason' (str) or None on failure.
"""
claude_path = get_claude_path()
if not claude_path:
return None
if not transcript:
return None
project_dir = os.environ.get("CLAUDE_PROJECT_DIR", cwd)
prompt = f"""You are evaluating whether Claude Code should stop working or continue.
## Recent Conversation
{transcript}
## Analysis Instructions
Analyze the conversation above and determine if Claude should CONTINUE working or is allowed to STOP.
CONTINUE (block stopping) if ANY of these are true:
- Tasks the user requested are not fully completed
- Errors occurred that weren't resolved
- Claude said it would do something but didn't actually do it
- There are obvious next steps that should be done
- Work quality appears partial or incomplete
- There are failing tests or unresolved issues
ALLOW STOP if ALL of these are true:
- All requested tasks are genuinely, fully complete
- No unresolved errors exist
- No obvious follow-up work remains
- Claude has provided a clear summary or completion message
Respond with ONLY a JSON object:
{{"continue": true, "reason": "specific explanation of what still needs to be done"}}
OR
{{"continue": false}}
JSON response:"""
try:
# Use stdin for prompt to avoid command-line length limits with large transcripts
# Timeout is 25s (5s margin under the 30s hook timeout in settings.json)
result = subprocess.run(
[
claude_path,
"--print",
"--output-format", "text",
"--model", "sonnet",
"--no-session-persistence",
"-p", "-"
],
input=prompt,
capture_output=True,
text=True,
timeout=25,
cwd=project_dir,
)
if result.returncode != 0:
return None
response_text = result.stdout.strip()
# Try direct JSON parse
try:
parsed = json.loads(response_text)
if "continue" in parsed:
return parsed
except json.JSONDecodeError:
pass
# Extract JSON from response (handle markdown code fences, etc.)
start_indices = [i for i, c in enumerate(response_text) if c == '{']
for start in start_indices:
depth = 0
in_string = False
escape_next = False
for i, c in enumerate(response_text[start:], start):
if escape_next:
escape_next = False
continue
if c == '\\' and in_string:
escape_next = True
continue
if c == '"' and not escape_next:
in_string = not in_string
continue
if not in_string:
if c == '{':
depth += 1
elif c == '}':
depth -= 1
if depth == 0:
candidate = response_text[start:i + 1]
try:
parsed = json.loads(candidate)
if "continue" in parsed:
return parsed
except json.JSONDecodeError:
pass
break
return None
except (subprocess.SubprocessError, OSError):
return None
def make_block_decision(reason: str) -> dict:
"""Block Claude from stopping - force continuation."""
return {
"decision": "block",
"reason": reason
}
def main():
try:
input_data = json.load(sys.stdin)
except json.JSONDecodeError:
# Can't parse input, allow stop
sys.exit(0)
# Check for infinite loop prevention
if input_data.get("stop_hook_active", False):
# Already continuing due to stop hook, allow stop to prevent infinite loop
sys.exit(0)
transcript_path = input_data.get("transcript_path", "")
cwd = input_data.get("cwd", os.getcwd())
# First, check for remaining tasks (deterministic check)
if transcript_path:
task_state = extract_task_state(transcript_path)
remaining = task_state["remaining"]
total = task_state["total"]
if remaining:
# Build detailed reason with remaining tasks
task_list = "\n".join(
f" - Task {task_id}: \"{subject}\" (status: {status})"
for task_id, subject, status in remaining
)
reason = (
f"{len(remaining)} of {total} tasks remaining:\n{task_list}\n\n"
f"Please complete all tasks before stopping."
)
print(json.dumps(make_block_decision(reason)))
sys.exit(0)
# If no remaining tasks (or no tasks at all), fall back to AI analysis
transcript = read_transcript(transcript_path)
if not transcript:
# No transcript to analyze, allow stop
sys.exit(0)
result = analyze_with_claude(transcript, cwd)
if result is None:
# Analysis failed, allow stop
sys.exit(0)
should_continue = result.get("continue", False)
reason = result.get("reason", "Tasks may be incomplete")
if should_continue:
print(json.dumps(make_block_decision(reason)))
# If not continuing, no output = allow stop
sys.exit(0)
if __name__ == "__main__":
main()
{"type": "user", "message": {"content": "Fix all the PR review comments on this branch"}}
{"type": "assistant", "message": {"content": [{"type": "text", "text": "I'll help you fix the PR review comments. Let me create a task list."}, {"type": "tool_use", "name": "TaskCreate", "input": {"subject": "Determine the PR", "activeForm": "Determining PR"}}]}}
{"type": "tool_result", "content": "Task 1 created"}
{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "TaskCreate", "input": {"subject": "Fetch review threads", "activeForm": "Fetching threads"}}]}}
{"type": "tool_result", "content": "Task 2 created"}
{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "TaskCreate", "input": {"subject": "Address comments", "activeForm": "Addressing comments"}}]}}
{"type": "tool_result", "content": "Task 3 created"}
{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "TaskUpdate", "input": {"taskId": "1", "status": "in_progress"}}]}}
{"type": "tool_result", "content": "Task 1 in_progress"}
{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "TaskUpdate", "input": {"taskId": "1", "status": "completed"}}]}}
{"type": "tool_result", "content": "Task 1 completed"}
{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "TaskUpdate", "input": {"taskId": "2", "status": "in_progress"}}]}}
{"type": "tool_result", "content": "Task 2 in_progress"}
{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "TaskUpdate", "input": {"taskId": "2", "status": "completed"}}]}}
{"type": "tool_result", "content": "Task 2 completed"}
{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "TaskUpdate", "input": {"taskId": "3", "status": "in_progress"}}]}}
{"type": "tool_result", "content": "Task 3 in_progress"}
{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "Edit", "input": {"file_path": "src/main.ts"}}]}}
{"type": "tool_result", "content": "File edited"}
{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "TaskUpdate", "input": {"taskId": "3", "status": "completed"}}]}}
{"type": "tool_result", "content": "Task 3 completed"}
{"type": "assistant", "message": {"content": [{"type": "text", "text": "I've completed all the PR review comments. Here's a summary:\n\n1. Found PR #123\n2. Fetched 2 review threads\n3. Fixed both issues by updating src/main.ts\n\nAll tasks are complete. The changes have been committed and pushed."}]}}
{"type": "user", "message": {"content": "Fix all the PR review comments on this branch"}}
{"type": "assistant", "message": {"content": [{"type": "text", "text": "I'll help you fix the PR review comments. Let me create a task list to track progress."}, {"type": "tool_use", "name": "TaskCreate", "input": {"subject": "Determine the PR to work on", "description": "Get the current branch's PR", "activeForm": "Determining PR"}}]}}
{"type": "tool_result", "content": "Task 1 created"}
{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "TaskCreate", "input": {"subject": "Fetch unresolved PR review threads", "description": "Get all review threads", "activeForm": "Fetching review threads"}}]}}
{"type": "tool_result", "content": "Task 2 created"}
{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "TaskCreate", "input": {"subject": "Process and address review comments", "description": "Fix each comment", "activeForm": "Processing comments"}}]}}
{"type": "tool_result", "content": "Task 3 created"}
{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "TaskCreate", "input": {"subject": "Verify, commit, and push changes", "description": "Run lint, commit, push", "activeForm": "Verifying and pushing"}}]}}
{"type": "tool_result", "content": "Task 4 created"}
{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "TaskCreate", "input": {"subject": "Provide summary to user", "description": "Report what was done", "activeForm": "Providing summary"}}]}}
{"type": "tool_result", "content": "Task 5 created"}
{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "TaskUpdate", "input": {"taskId": "1", "status": "in_progress"}}]}}
{"type": "tool_result", "content": "Task 1 updated to in_progress"}
{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "Bash", "input": {"command": "gh pr view --json number"}}]}}
{"type": "tool_result", "content": "PR #123"}
{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "TaskUpdate", "input": {"taskId": "1", "status": "completed"}}]}}
{"type": "tool_result", "content": "Task 1 completed"}
{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "TaskUpdate", "input": {"taskId": "2", "status": "in_progress"}}]}}
{"type": "tool_result", "content": "Task 2 updated to in_progress"}
{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "Bash", "input": {"command": "gh api graphql"}}]}}
{"type": "tool_result", "content": "Found 2 review threads"}
{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "TaskUpdate", "input": {"taskId": "2", "status": "completed"}}]}}
{"type": "tool_result", "content": "Task 2 completed"}
{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "TaskUpdate", "input": {"taskId": "3", "status": "in_progress"}}]}}
{"type": "tool_result", "content": "Task 3 updated to in_progress"}
{"type": "assistant", "message": {"content": [{"type": "text", "text": "I found 2 review comments to address."}, {"type": "tool_use", "name": "Edit", "input": {"file_path": "src/main.ts", "old_string": "foo", "new_string": "bar"}}]}}
{"type": "tool_result", "content": "File edited"}
{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "TaskUpdate", "input": {"taskId": "3", "status": "completed"}}]}}
{"type": "tool_result", "content": "Task 3 completed"}
{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "TaskUpdate", "input": {"taskId": "4", "status": "in_progress"}}]}}
{"type": "tool_result", "content": "Task 4 updated to in_progress"}
{"type": "assistant", "message": {"content": [{"type": "text", "text": "Now running lint checks..."}, {"type": "tool_use", "name": "Bash", "input": {"command": "npm run lint"}}]}}
{"type": "tool_result", "content": "Lint passed"}
#!/usr/bin/env python3
"""
Tests for the AI-powered Stop hook.
This is a Stop hook that runs when Claude is about to stop working.
It can block stopping to force continuation if tasks are incomplete.
Response format: { "decision": "block", "reason": "..." } or no output to allow stop
"""
import json
import subprocess
import sys
from pathlib import Path
import pytest
# Get the hook path
HOOK_PATH = Path(__file__).parent.parent / "stop-hook.py"
def run_hook(input_data: dict) -> tuple[int, str]:
"""Run the hook with the given input and return (returncode, stdout)."""
input_json = json.dumps(input_data)
result = subprocess.run(
[sys.executable, str(HOOK_PATH)],
input=input_json,
capture_output=True,
text=True,
)
return result.returncode, result.stdout
def parse_response(stdout: str) -> dict | None:
"""Parse the hook response, return None if empty/invalid."""
if not stdout.strip():
return None
try:
return json.loads(stdout)
except json.JSONDecodeError:
return None
class TestHookBasics:
"""Test basic hook behavior without AI."""
def test_invalid_json_allows_stop(self):
"""Invalid JSON should allow stop (exit 0, no output)."""
result = subprocess.run(
[sys.executable, str(HOOK_PATH)],
input="not valid json",
capture_output=True,
text=True,
)
assert result.returncode == 0
assert result.stdout == ""
def test_stop_hook_active_allows_stop(self):
"""When stop_hook_active is true, should allow stop to prevent infinite loop."""
returncode, stdout = run_hook({
"session_id": "test",
"transcript_path": "/nonexistent/path",
"cwd": "/tmp",
"stop_hook_active": True
})
assert returncode == 0
assert stdout == ""
def test_missing_transcript_allows_stop(self):
"""Missing transcript should allow stop."""
returncode, stdout = run_hook({
"session_id": "test",
"cwd": "/tmp",
"stop_hook_active": False
})
assert returncode == 0
assert stdout == ""
def test_nonexistent_transcript_allows_stop(self):
"""Nonexistent transcript path should allow stop."""
returncode, stdout = run_hook({
"session_id": "test",
"transcript_path": "/nonexistent/path/to/transcript.jsonl",
"cwd": "/tmp",
"stop_hook_active": False
})
assert returncode == 0
assert stdout == ""
class TestNoCLI:
"""Test behavior when claude CLI is not available."""
def test_analyze_returns_none_when_no_cli(self, tmp_path, monkeypatch):
"""analyze_with_claude should return None when claude CLI is not found."""
module = load_hook_module()
# Mock get_claude_path to return None (simulating no CLI available)
monkeypatch.setattr(module, "get_claude_path", lambda: None)
assert module.analyze_with_claude("test transcript", str(tmp_path)) is None
def test_no_claude_cli_allows_stop(self, tmp_path):
"""Without claude CLI, should allow stop (no AI analysis possible)."""
# Create a minimal transcript
transcript = tmp_path / "transcript.jsonl"
transcript.write_text('{"type": "user", "message": {"content": "hello"}}\n')
returncode, _ = run_hook({
"session_id": "test",
"transcript_path": str(transcript),
"cwd": str(tmp_path),
"stop_hook_active": False
})
assert returncode == 0
# The hook should complete without error (output depends on CLI availability)
class TestAnalyzeWithClaude:
"""Test analyze_with_claude function with mocked subprocess."""
def test_parses_valid_json_response(self, tmp_path, monkeypatch):
"""Should correctly parse a valid JSON response from CLI."""
module = load_hook_module()
monkeypatch.setattr(module, "get_claude_path", lambda: "/usr/bin/claude")
mock_result = subprocess.CompletedProcess(
args=[],
returncode=0,
stdout='{"continue": true, "reason": "Tasks incomplete"}',
stderr=""
)
monkeypatch.setattr(subprocess, "run", lambda *_args, **_kwargs: mock_result)
result = module.analyze_with_claude("test transcript", str(tmp_path))
assert result is not None
assert result["continue"] is True
assert result["reason"] == "Tasks incomplete"
def test_parses_json_in_markdown_code_fence(self, tmp_path, monkeypatch):
"""Should extract JSON wrapped in markdown code fences."""
module = load_hook_module()
monkeypatch.setattr(module, "get_claude_path", lambda: "/usr/bin/claude")
mock_result = subprocess.CompletedProcess(
args=[],
returncode=0,
stdout='```json\n{"continue": false}\n```',
stderr=""
)
monkeypatch.setattr(subprocess, "run", lambda *_args, **_kwargs: mock_result)
result = module.analyze_with_claude("test transcript", str(tmp_path))
assert result is not None
assert result["continue"] is False
def test_returns_none_for_no_json(self, tmp_path, monkeypatch):
"""Should return None when response contains no JSON."""
module = load_hook_module()
monkeypatch.setattr(module, "get_claude_path", lambda: "/usr/bin/claude")
mock_result = subprocess.CompletedProcess(
args=[],
returncode=0,
stdout="I think the tasks are complete.",
stderr=""
)
monkeypatch.setattr(subprocess, "run", lambda *_args, **_kwargs: mock_result)
result = module.analyze_with_claude("test transcript", str(tmp_path))
assert result is None
def test_returns_none_for_malformed_json(self, tmp_path, monkeypatch):
"""Should return None when JSON is malformed."""
module = load_hook_module()
monkeypatch.setattr(module, "get_claude_path", lambda: "/usr/bin/claude")
mock_result = subprocess.CompletedProcess(
args=[],
returncode=0,
stdout='{"continue": true, reason: incomplete}', # Missing quotes
stderr=""
)
monkeypatch.setattr(subprocess, "run", lambda *_args, **_kwargs: mock_result)
result = module.analyze_with_claude("test transcript", str(tmp_path))
assert result is None
def test_returns_none_on_nonzero_returncode(self, tmp_path, monkeypatch):
"""Should return None when subprocess returns non-zero exit code."""
module = load_hook_module()
monkeypatch.setattr(module, "get_claude_path", lambda: "/usr/bin/claude")
mock_result = subprocess.CompletedProcess(
args=[],
returncode=1,
stdout='{"continue": true}',
stderr="Error: API rate limit"
)
monkeypatch.setattr(subprocess, "run", lambda *_args, **_kwargs: mock_result)
result = module.analyze_with_claude("test transcript", str(tmp_path))
assert result is None
def test_returns_none_on_subprocess_error(self, tmp_path, monkeypatch):
"""Should return None when subprocess raises an error."""
module = load_hook_module()
monkeypatch.setattr(module, "get_claude_path", lambda: "/usr/bin/claude")
def raise_error(*_args, **_kwargs):
raise subprocess.TimeoutExpired(cmd="claude", timeout=25)
monkeypatch.setattr(subprocess, "run", raise_error)
result = module.analyze_with_claude("test transcript", str(tmp_path))
assert result is None
def test_returns_none_for_empty_transcript(self, tmp_path, monkeypatch):
"""Should return None when transcript is empty."""
module = load_hook_module()
monkeypatch.setattr(module, "get_claude_path", lambda: "/usr/bin/claude")
result = module.analyze_with_claude("", str(tmp_path))
assert result is None
class TestResponseFormat:
"""Test that responses follow Stop hook format."""
def test_response_format_documented(self):
"""Response format should be documented in the hook."""
hook_content = HOOK_PATH.read_text()
assert '"decision"' in hook_content
assert '"block"' in hook_content
assert '"reason"' in hook_content
def test_hook_checks_stop_hook_active(self):
"""Hook should check stop_hook_active to prevent infinite loops."""
hook_content = HOOK_PATH.read_text()
assert "stop_hook_active" in hook_content
def load_hook_module():
"""Load the stop hook module for testing."""
import importlib.util
spec = importlib.util.spec_from_file_location("stop_hook", HOOK_PATH)
assert spec is not None
module = importlib.util.module_from_spec(spec)
assert spec.loader is not None
spec.loader.exec_module(module)
return module
class TestTaskStateExtraction:
"""Test task state extraction from transcripts."""
def test_extracts_incomplete_tasks(self):
"""Should correctly identify remaining tasks from incomplete_tasks fixture.
The fixture has:
- 5 tasks created
- Tasks 1, 2, 3 completed
- Task 4 in_progress (not completed)
- Task 5 pending (never started)
=> 2 tasks remaining
"""
module = load_hook_module()
fixture_path = Path(__file__).parent / "fixtures" / "incomplete_tasks.jsonl"
result = module.extract_task_state(str(fixture_path))
assert result["total"] == 5
assert len(result["remaining"]) == 2
# Check the specific remaining tasks
remaining_ids = [task_id for task_id, _, _ in result["remaining"]]
assert "4" in remaining_ids
assert "5" in remaining_ids
# Task 4 should be in_progress
task4 = [(tid, subj, status) for tid, subj, status in result["remaining"] if tid == "4"][0]
assert task4[2] == "in_progress"
assert "Verify" in task4[1] or "commit" in task4[1] or "push" in task4[1]
# Task 5 should be pending
task5 = [(tid, subj, status) for tid, subj, status in result["remaining"] if tid == "5"][0]
assert task5[2] == "pending"
assert "summary" in task5[1].lower()
def test_extracts_completed_tasks(self):
"""Should correctly identify all tasks completed from completed_tasks fixture."""
module = load_hook_module()
fixture_path = Path(__file__).parent / "fixtures" / "completed_tasks.jsonl"
result = module.extract_task_state(str(fixture_path))
assert result["total"] == 3
assert len(result["remaining"]) == 0
def test_handles_nonexistent_file(self):
"""Should handle nonexistent transcript gracefully."""
module = load_hook_module()
result = module.extract_task_state("/nonexistent/path.jsonl")
assert result["total"] == 0
assert len(result["remaining"]) == 0
def test_handles_no_tasks(self, tmp_path):
"""Should handle transcript with no tasks."""
module = load_hook_module()
transcript = tmp_path / "no_tasks.jsonl"
transcript.write_text('{"type": "user", "message": {"content": "hello"}}\n')
result = module.extract_task_state(str(transcript))
assert result["total"] == 0
assert len(result["remaining"]) == 0
class TestRemainingTasksBlocking:
"""Test that hook blocks when tasks are remaining."""
def test_blocks_with_remaining_tasks(self):
"""Hook should block and list remaining tasks when incomplete."""
fixture_path = Path(__file__).parent / "fixtures" / "incomplete_tasks.jsonl"
returncode, stdout = run_hook({
"session_id": "test",
"transcript_path": str(fixture_path),
"cwd": "/tmp",
"stop_hook_active": False
})
assert returncode == 0
response = parse_response(stdout)
assert response is not None
assert response["decision"] == "block"
# Reason should mention the count
assert "2 of 5 tasks remaining" in response["reason"]
# Reason should list the specific tasks
assert "Task 4" in response["reason"]
assert "Task 5" in response["reason"]
assert "in_progress" in response["reason"]
assert "pending" in response["reason"]
def test_allows_stop_with_all_tasks_completed(self, monkeypatch):
"""Hook should allow stop (via AI check) when all tasks are completed."""
fixture_path = Path(__file__).parent / "fixtures" / "completed_tasks.jsonl"
# Remove claude from PATH so AI check is skipped
monkeypatch.setenv("PATH", "/nonexistent")
returncode, stdout = run_hook({
"session_id": "test",
"transcript_path": str(fixture_path),
"cwd": "/tmp",
"stop_hook_active": False
})
assert returncode == 0
# No remaining tasks, and no AI available, so should allow stop
assert parse_response(stdout) is None
class TestTranscriptReading:
"""Test transcript reading functionality."""
def test_reads_incomplete_tasks_fixture(self):
"""Should correctly parse the incomplete_tasks fixture.
This fixture represents a real scenario where:
- 5 tasks were created
- Tasks 1, 2, 3 were completed
- Task 4 is in_progress (never completed)
- Task 5 is pending (never started)
=> 2 tasks remaining
"""
module = load_hook_module()
fixture_path = Path(__file__).parent / "fixtures" / "incomplete_tasks.jsonl"
result = module.read_transcript(str(fixture_path))
# Should contain the user request
assert "USER:" in result
assert "Fix all the PR review comments" in result
# Should show task creation
assert "TaskCreate" in result
# Should show the incomplete state - last message is about lint checks
assert "lint checks" in result
# Should NOT have a completion summary
assert "All tasks are complete" not in result
assert "summary" not in result.lower() or "Providing summary" not in result
def test_reads_completed_tasks_fixture(self):
"""Should correctly parse the completed_tasks fixture.
This fixture represents a scenario where all tasks completed successfully.
"""
module = load_hook_module()
fixture_path = Path(__file__).parent / "fixtures" / "completed_tasks.jsonl"
result = module.read_transcript(str(fixture_path))
# Should contain the user request
assert "USER:" in result
assert "Fix all the PR review comments" in result
# Should have a completion summary at the end
assert "completed all the PR review comments" in result
assert "All tasks are complete" in result
def test_reads_user_messages(self, tmp_path):
"""Should be able to read user messages from transcript."""
module = load_hook_module()
transcript = tmp_path / "transcript.jsonl"
transcript.write_text(
'{"type": "user", "message": {"content": "test message"}}\n'
)
result = module.read_transcript(str(transcript))
assert "USER:" in result
assert "test message" in result
def test_reads_assistant_messages(self, tmp_path):
"""Should be able to read assistant messages from transcript."""
module = load_hook_module()
transcript = tmp_path / "transcript.jsonl"
transcript.write_text(
'{"type": "assistant", "message": {"content": [{"type": "text", "text": "response text"}]}}\n'
)
result = module.read_transcript(str(transcript))
assert "ASSISTANT:" in result
assert "response text" in result
def test_truncates_large_transcripts(self, tmp_path):
"""Should truncate large transcripts from the middle, keeping beginning and end."""
module = load_hook_module()
transcript = tmp_path / "transcript.jsonl"
# Create a large transcript with many messages
lines = []
for i in range(100):
lines.append(f'{{"type": "user", "message": {{"content": "message {i} with some extra content to make it longer"}}}}')
transcript.write_text("\n".join(lines))
# With a small max_chars, should truncate from the middle
result = module.read_transcript(str(transcript), max_chars=500)
assert len(result) <= 600 # Allow buffer for truncation marker
assert "...(middle truncated)..." in result
# Should keep beginning messages (lower numbers)
assert "message 0" in result or "message 1" in result
# Should keep end messages (higher numbers)
assert "message 99" in result or "message 98" in result
if __name__ == "__main__":
pytest.main([__file__, "-v"])
......@@ -154,9 +154,9 @@
{
"hooks": [
{
"type": "prompt",
"prompt": "You are evaluating whether Claude should stop working. Your default bias should be to CONTINUE WORKING (ok: false). Only allow stopping when absolutely certain all work is complete.\n\nContext: $ARGUMENTS\n\nAnalyze the conversation and determine:\n\n1. TASK COMPLETION CHECK\n - Has EVERY task the user requested been fully completed?\n - Not just acknowledged, but actually done with verifiable output?\n - Are there any implied subtasks that haven't been addressed?\n\n2. ERROR CHECK\n - Did any errors occur that weren't resolved?\n - Are there failed operations that need retry?\n - Did Claude say it would do something but didn't actually do it?\n\n3. FOLLOW-UP CHECK\n - Is there obvious next-step work that should be done?\n - Did the user's request imply multiple steps where only some were completed?\n - Would a thorough assistant do more here?\n\n4. QUALITY CHECK\n - Was the work done to completion, not just partially?\n - Are there placeholders, TODOs, or incomplete sections?\n - Did Claude stop prematurely mid-task?\n\nDECISION RULES:\n- If ANY task is incomplete → continue (ok: false)\n- If ANY error is unresolved → continue (ok: false)\n- If there's reasonable follow-up work → continue (ok: false)\n- If Claude said \"I'll do X\" but didn't → continue (ok: false)\n- If work quality is partial/rushed → continue (ok: false)\n- ONLY return ok: true if ALL tasks are genuinely, fully complete AND no errors exist AND no reasonable follow-up remains\n\nRespond with JSON:\n- {\"ok\": false, \"reason\": \"specific explanation of what still needs to be done\"} — DEFAULT, use this unless everything is truly complete\n- {\"ok\": true} — ONLY when absolutely certain all work is finished and nothing remains",
"timeout": 30
"type": "command",
"command": "\"$CLAUDE_PROJECT_DIR\"/.claude/hooks/stop-hook.py",
"timeout": 30000
}
]
}
......
Markdown 格式
0%
您添加了 0 到此讨论。请谨慎行事。
请先完成此评论的编辑!
注册 或者 后发表评论