Replace prompt-based stop hook with Sonnet-powered analysis (#2331)

## Summary - Replace broken prompt-based Stop hook with command-based hook using Claude Sonnet - Add .claude/hooks/stop-hook.py that reads conversation transcript and uses Sonnet to analyze task completion - Includes infinite loop prevention via stop_hook_active check - Add unit tests for the stop hook ## Test plan - [x] Run pytest .claude/hooks/tests/test_stop_hook.py -v - all 9 tests pass - [ ] Manual testing: verify stop hook fires and correctly analyzes task completion #skip-bugbot  --- ## Summary by cubic Replaced the broken prompt-based stop hook with a command-based hook that blocks when tasks remain and uses Sonnet analysis as a fallback. Adds loop protection and tests. - **New Features** - Added .claude/hooks/stop-hook.py that blocks when TaskCreate/TaskUpdate show remaining tasks, returning {"decision":"block","reason":...}. If none remain, it analyzes a 32k transcript (middle truncation) with Sonnet. - Added unit tests and a stop_hook_active guard to prevent infinite loops. - **Refactors** - Updated .claude/settings.json to use the command-based hook (30000 ms timeout) instead of the prompt hook. - Added --no-session-persistence to Claude CLI calls in stop-hook.py and permission-request-hook.py. <sup>Written for commit 575426cee9efb0fa7e1f4be64a8405ae2e717a3b. Summary will update on new commits.</sup>   --- <a href="https://app.devin.ai/review/dyad-sh/dyad/pull/2331"> <picture> <source media="(prefers-color-scheme: dark)" srcset="https://static.devin.ai/assets/gh-open-in-devin-review-dark.svg?v=1"> <img src="https://static.devin.ai/assets/gh-open-in-devin-review-light.svg?v=1" alt="Open with Devin"> </picture> </a>  --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

Replace prompt-based stop hook with Sonnet-powered analysis (#2331)
6ba51165 · Will Chen · GitHub · 67fe5048 · 6ba51165 · 6ba51165
--- a/.claude/hooks/permission-request-hook.py
+++ b/.claude/hooks/permission-request-hook.py
@@ -84,6 +84,7 @@ Analyze this request and provide your safety assessment. Respond with ONLY a JSO
                "--print",
                "--output-format", "text",
                "--model", "sonnet",
+                "--no-session-persistence",
                prompt
            ],
            capture_output=True,

--- a/.claude/hooks/stop-hook.py
+++ b/.claude/hooks/stop-hook.py
+#!/usr/bin/env python3
+"""
+AI-Powered Stop Hook
+
+This is a Stop hook that runs when Claude is about to stop working.
+It uses Claude Sonnet to analyze the transcript and determine whether
+Claude should continue working or is allowed to stop.
+
+The hook is designed to catch premature stopping when tasks are incomplete.
+
+Usage:
+    Receives JSON on stdin with session info including transcript_path
+    Outputs JSON with decision="block" and reason to continue, or no output to allow stop
+"""
+import json
+import os
+import shutil
+import subprocess
+import sys
+from pathlib import Path
+
+
+def extract_task_state(transcript_path: str) -> dict:
+    """Extract task state from TaskCreate and TaskUpdate tool calls.
+
+    Returns a dict with:
+        - tasks: dict mapping task_id to {subject, status}
+        - remaining: list of (task_id, subject, status) for incomplete tasks
+        - total: total number of tasks created
+    """
+    tasks: dict[str, dict] = {}
+
+    try:
+        path = Path(transcript_path).expanduser()
+        if not path.exists():
+            return {"tasks": {}, "remaining": [], "total": 0}
+
+        lines = path.read_text().strip().split("\n")
+
+        for line in lines:
+            try:
+                entry = json.loads(line)
+                if entry.get("type") != "assistant":
+                    continue
+
+                content = entry.get("message", {}).get("content", [])
+                for part in content:
+                    if part.get("type") != "tool_use":
+                        continue
+
+                    tool_name = part.get("name", "")
+                    tool_input = part.get("input", {})
+
+                    if tool_name == "TaskCreate":
+                        # TaskCreate assigns sequential IDs starting from 1
+                        task_id = str(len(tasks) + 1)
+                        subject = tool_input.get("subject", f"Task {task_id}")
+                        tasks[task_id] = {
+                            "subject": subject,
+                            "status": "pending"
+                        }
+
+                    elif tool_name == "TaskUpdate":
+                        task_id = tool_input.get("taskId", "")
+                        if task_id and task_id in tasks:
+                            if "status" in tool_input:
+                                tasks[task_id]["status"] = tool_input["status"]
+                            if "subject" in tool_input:
+                                tasks[task_id]["subject"] = tool_input["subject"]
+
+            except json.JSONDecodeError:
+                continue
+
+    except Exception:
+        return {"tasks": {}, "remaining": [], "total": 0}
+
+    # Find remaining (non-completed) tasks
+    remaining = [
+        (task_id, info["subject"], info["status"])
+        for task_id, info in tasks.items()
+        if info["status"] != "completed"
+    ]
+
+    return {
+        "tasks": tasks,
+        "remaining": remaining,
+        "total": len(tasks)
+    }
+
+
+def get_claude_path() -> str | None:
+    """Find the claude CLI path."""
+    claude_path = shutil.which("claude")
+    if claude_path:
+        return claude_path
+
+    home = Path.home()
+    default_path = home / ".claude" / "local" / "claude"
+    if default_path.exists() and os.access(default_path, os.X_OK):
+        return str(default_path)
+
+    return None
+
+
+def read_transcript(transcript_path: str, max_chars: int = 32000) -> str:
+    """Read and format the transcript for analysis.
+
+    Includes content from the beginning and end of the transcript,
+    truncating from the middle if needed to stay within limits.
+
+    Args:
+        transcript_path: Path to the JSONL transcript file
+        max_chars: Maximum characters for the output (default 32000, ~8000 tokens)
+    """
+    try:
+        path = Path(transcript_path).expanduser()
+        if not path.exists():
+            return ""
+
+        lines = path.read_text().strip().split("\n")
+
+        formatted = []
+        for line in lines:
+            try:
+                entry = json.loads(line)
+                msg_type = entry.get("type", "unknown")
+
+                if msg_type == "user":
+                    content = entry.get("message", {}).get("content", "")
+                    if isinstance(content, list):
+                        text_parts = [
+                            p.get("text", "") for p in content if p.get("type") == "text"
+                        ]
+                        content = " ".join(text_parts)
+                    formatted.append(f"USER: {content[:500]}")
+
+                elif msg_type == "assistant":
+                    content = entry.get("message", {}).get("content", [])
+                    text_parts = []
+                    tool_uses = []
+                    for part in content:
+                        if part.get("type") == "text":
+                            text_parts.append(part.get("text", "")[:300])
+                        elif part.get("type") == "tool_use":
+                            tool_uses.append(part.get("name", "unknown"))
+                    if text_parts:
+                        formatted.append(f"ASSISTANT: {' '.join(text_parts)}")
+                    if tool_uses:
+                        formatted.append(f"ASSISTANT TOOLS: {', '.join(tool_uses)}")
+
+                elif msg_type == "tool_result":
+                    # Just note that a tool result came back
+                    formatted.append("TOOL_RESULT: (received)")
+
+            except json.JSONDecodeError:
+                continue
+
+        result = "\n".join(formatted)
+        if len(result) > max_chars:
+            # Keep beginning and end, truncate from middle
+            # Reserve ~40% for beginning, ~60% for end (end is more important)
+            begin_budget = int(max_chars * 0.4)
+            end_budget = max_chars - begin_budget - 50  # 50 chars for truncation marker
+
+            begin_part = result[:begin_budget]
+            end_part = result[-end_budget:]
+
+            result = begin_part + "\n\n...(middle truncated)...\n\n" + end_part
+        return result
+
+    except OSError:
+        return ""
+
+
+def analyze_with_claude(transcript: str, cwd: str) -> dict | None:
+    """
+    Use Claude CLI to analyze whether Claude should continue working.
+    Returns dict with 'continue' (bool) and 'reason' (str) or None on failure.
+    """
+    claude_path = get_claude_path()
+    if not claude_path:
+        return None
+
+    if not transcript:
+        return None
+
+    project_dir = os.environ.get("CLAUDE_PROJECT_DIR", cwd)
+
+    prompt = f"""You are evaluating whether Claude Code should stop working or continue.
+
+## Recent Conversation
+{transcript}
+
+## Analysis Instructions
+
+Analyze the conversation above and determine if Claude should CONTINUE working or is allowed to STOP.
+
+CONTINUE (block stopping) if ANY of these are true:
+- Tasks the user requested are not fully completed
+- Errors occurred that weren't resolved
+- Claude said it would do something but didn't actually do it
+- There are obvious next steps that should be done
+- Work quality appears partial or incomplete
+- There are failing tests or unresolved issues
+
+ALLOW STOP if ALL of these are true:
+- All requested tasks are genuinely, fully complete
+- No unresolved errors exist
+- No obvious follow-up work remains
+- Claude has provided a clear summary or completion message
+
+Respond with ONLY a JSON object:
+{{"continue": true, "reason": "specific explanation of what still needs to be done"}}
+OR
+{{"continue": false}}
+
+JSON response:"""
+
+    try:
+        # Use stdin for prompt to avoid command-line length limits with large transcripts
+        # Timeout is 25s (5s margin under the 30s hook timeout in settings.json)
+        result = subprocess.run(
+            [
+                claude_path,
+                "--print",
+                "--output-format", "text",
+                "--model", "sonnet",
+                "--no-session-persistence",
+                "-p", "-"
+            ],
+            input=prompt,
+            capture_output=True,
+            text=True,
+            timeout=25,
+            cwd=project_dir,
+        )
+
+        if result.returncode != 0:
+            return None
+
+        response_text = result.stdout.strip()
+
+        # Try direct JSON parse
+        try:
+            parsed = json.loads(response_text)
+            if "continue" in parsed:
+                return parsed
+        except json.JSONDecodeError:
+            pass
+
+        # Extract JSON from response (handle markdown code fences, etc.)
+        start_indices = [i for i, c in enumerate(response_text) if c == '{']
+        for start in start_indices:
+            depth = 0
+            in_string = False
+            escape_next = False
+            for i, c in enumerate(response_text[start:], start):
+                if escape_next:
+                    escape_next = False
+                    continue
+                if c == '\\' and in_string:
+                    escape_next = True
+                    continue
+                if c == '"' and not escape_next:
+                    in_string = not in_string
+                    continue
+                if not in_string:
+                    if c == '{':
+                        depth += 1
+                    elif c == '}':
+                        depth -= 1
+                        if depth == 0:
+                            candidate = response_text[start:i + 1]
+                            try:
+                                parsed = json.loads(candidate)
+                                if "continue" in parsed:
+                                    return parsed
+                            except json.JSONDecodeError:
+                                pass
+                            break
+
+        return None
+
+    except (subprocess.SubprocessError, OSError):
+        return None
+
+
+def make_block_decision(reason: str) -> dict:
+    """Block Claude from stopping - force continuation."""
+    return {
+        "decision": "block",
+        "reason": reason
+    }
+
+
+def main():
+    try:
+        input_data = json.load(sys.stdin)
+    except json.JSONDecodeError:
+        # Can't parse input, allow stop
+        sys.exit(0)
+
+    # Check for infinite loop prevention
+    if input_data.get("stop_hook_active", False):
+        # Already continuing due to stop hook, allow stop to prevent infinite loop
+        sys.exit(0)
+
+    transcript_path = input_data.get("transcript_path", "")
+    cwd = input_data.get("cwd", os.getcwd())
+
+    # First, check for remaining tasks (deterministic check)
+    if transcript_path:
+        task_state = extract_task_state(transcript_path)
+        remaining = task_state["remaining"]
+        total = task_state["total"]
+
+        if remaining:
+            # Build detailed reason with remaining tasks
+            task_list = "\n".join(
+                f"  - Task {task_id}: \"{subject}\" (status: {status})"
+                for task_id, subject, status in remaining
+            )
+            reason = (
+                f"{len(remaining)} of {total} tasks remaining:\n{task_list}\n\n"
+                f"Please complete all tasks before stopping."
+            )
+            print(json.dumps(make_block_decision(reason)))
+            sys.exit(0)
+
+    # If no remaining tasks (or no tasks at all), fall back to AI analysis
+    transcript = read_transcript(transcript_path)
+    if not transcript:
+        # No transcript to analyze, allow stop
+        sys.exit(0)
+
+    result = analyze_with_claude(transcript, cwd)
+
+    if result is None:
+        # Analysis failed, allow stop
+        sys.exit(0)
+
+    should_continue = result.get("continue", False)
+    reason = result.get("reason", "Tasks may be incomplete")
+
+    if should_continue:
+        print(json.dumps(make_block_decision(reason)))
+
+    # If not continuing, no output = allow stop
+    sys.exit(0)
+
+
+if __name__ == "__main__":
+    main()
--- a/.claude/hooks/tests/fixtures/completed_tasks.jsonl
+++ b/.claude/hooks/tests/fixtures/completed_tasks.jsonl
+{"type": "user", "message": {"content": "Fix all the PR review comments on this branch"}}
+{"type": "assistant", "message": {"content": [{"type": "text", "text": "I'll help you fix the PR review comments. Let me create a task list."}, {"type": "tool_use", "name": "TaskCreate", "input": {"subject": "Determine the PR", "activeForm": "Determining PR"}}]}}
+{"type": "tool_result", "content": "Task 1 created"}
+{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "TaskCreate", "input": {"subject": "Fetch review threads", "activeForm": "Fetching threads"}}]}}
+{"type": "tool_result", "content": "Task 2 created"}
+{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "TaskCreate", "input": {"subject": "Address comments", "activeForm": "Addressing comments"}}]}}
+{"type": "tool_result", "content": "Task 3 created"}
+{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "TaskUpdate", "input": {"taskId": "1", "status": "in_progress"}}]}}
+{"type": "tool_result", "content": "Task 1 in_progress"}
+{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "TaskUpdate", "input": {"taskId": "1", "status": "completed"}}]}}
+{"type": "tool_result", "content": "Task 1 completed"}
+{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "TaskUpdate", "input": {"taskId": "2", "status": "in_progress"}}]}}
+{"type": "tool_result", "content": "Task 2 in_progress"}
+{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "TaskUpdate", "input": {"taskId": "2", "status": "completed"}}]}}
+{"type": "tool_result", "content": "Task 2 completed"}
+{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "TaskUpdate", "input": {"taskId": "3", "status": "in_progress"}}]}}
+{"type": "tool_result", "content": "Task 3 in_progress"}
+{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "Edit", "input": {"file_path": "src/main.ts"}}]}}
+{"type": "tool_result", "content": "File edited"}
+{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "TaskUpdate", "input": {"taskId": "3", "status": "completed"}}]}}
+{"type": "tool_result", "content": "Task 3 completed"}
+{"type": "assistant", "message": {"content": [{"type": "text", "text": "I've completed all the PR review comments. Here's a summary:\n\n1. Found PR #123\n2. Fetched 2 review threads\n3. Fixed both issues by updating src/main.ts\n\nAll tasks are complete. The changes have been committed and pushed."}]}}
--- a/.claude/hooks/tests/fixtures/incomplete_tasks.jsonl
+++ b/.claude/hooks/tests/fixtures/incomplete_tasks.jsonl
+{"type": "user", "message": {"content": "Fix all the PR review comments on this branch"}}
+{"type": "assistant", "message": {"content": [{"type": "text", "text": "I'll help you fix the PR review comments. Let me create a task list to track progress."}, {"type": "tool_use", "name": "TaskCreate", "input": {"subject": "Determine the PR to work on", "description": "Get the current branch's PR", "activeForm": "Determining PR"}}]}}
+{"type": "tool_result", "content": "Task 1 created"}
+{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "TaskCreate", "input": {"subject": "Fetch unresolved PR review threads", "description": "Get all review threads", "activeForm": "Fetching review threads"}}]}}
+{"type": "tool_result", "content": "Task 2 created"}
+{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "TaskCreate", "input": {"subject": "Process and address review comments", "description": "Fix each comment", "activeForm": "Processing comments"}}]}}
+{"type": "tool_result", "content": "Task 3 created"}
+{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "TaskCreate", "input": {"subject": "Verify, commit, and push changes", "description": "Run lint, commit, push", "activeForm": "Verifying and pushing"}}]}}
+{"type": "tool_result", "content": "Task 4 created"}
+{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "TaskCreate", "input": {"subject": "Provide summary to user", "description": "Report what was done", "activeForm": "Providing summary"}}]}}
+{"type": "tool_result", "content": "Task 5 created"}
+{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "TaskUpdate", "input": {"taskId": "1", "status": "in_progress"}}]}}
+{"type": "tool_result", "content": "Task 1 updated to in_progress"}
+{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "Bash", "input": {"command": "gh pr view --json number"}}]}}
+{"type": "tool_result", "content": "PR #123"}
+{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "TaskUpdate", "input": {"taskId": "1", "status": "completed"}}]}}
+{"type": "tool_result", "content": "Task 1 completed"}
+{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "TaskUpdate", "input": {"taskId": "2", "status": "in_progress"}}]}}
+{"type": "tool_result", "content": "Task 2 updated to in_progress"}
+{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "Bash", "input": {"command": "gh api graphql"}}]}}
+{"type": "tool_result", "content": "Found 2 review threads"}
+{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "TaskUpdate", "input": {"taskId": "2", "status": "completed"}}]}}
+{"type": "tool_result", "content": "Task 2 completed"}
+{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "TaskUpdate", "input": {"taskId": "3", "status": "in_progress"}}]}}
+{"type": "tool_result", "content": "Task 3 updated to in_progress"}
+{"type": "assistant", "message": {"content": [{"type": "text", "text": "I found 2 review comments to address."}, {"type": "tool_use", "name": "Edit", "input": {"file_path": "src/main.ts", "old_string": "foo", "new_string": "bar"}}]}}
+{"type": "tool_result", "content": "File edited"}
+{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "TaskUpdate", "input": {"taskId": "3", "status": "completed"}}]}}
+{"type": "tool_result", "content": "Task 3 completed"}
+{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "TaskUpdate", "input": {"taskId": "4", "status": "in_progress"}}]}}
+{"type": "tool_result", "content": "Task 4 updated to in_progress"}
+{"type": "assistant", "message": {"content": [{"type": "text", "text": "Now running lint checks..."}, {"type": "tool_use", "name": "Bash", "input": {"command": "npm run lint"}}]}}
+{"type": "tool_result", "content": "Lint passed"}
--- a/.claude/hooks/tests/test_stop_hook.py
+++ b/.claude/hooks/tests/test_stop_hook.py
+#!/usr/bin/env python3
+"""
+Tests for the AI-powered Stop hook.
+
+This is a Stop hook that runs when Claude is about to stop working.
+It can block stopping to force continuation if tasks are incomplete.
+
+Response format: { "decision": "block", "reason": "..." } or no output to allow stop
+"""
+import json
+import subprocess
+import sys
+from pathlib import Path
+
+import pytest
+
+# Get the hook path
+HOOK_PATH = Path(__file__).parent.parent / "stop-hook.py"
+
+
+def run_hook(input_data: dict) -> tuple[int, str]:
+    """Run the hook with the given input and return (returncode, stdout)."""
+    input_json = json.dumps(input_data)
+
+    result = subprocess.run(
+        [sys.executable, str(HOOK_PATH)],
+        input=input_json,
+        capture_output=True,
+        text=True,
+    )
+
+    return result.returncode, result.stdout
+
+
+def parse_response(stdout: str) -> dict | None:
+    """Parse the hook response, return None if empty/invalid."""
+    if not stdout.strip():
+        return None
+    try:
+        return json.loads(stdout)
+    except json.JSONDecodeError:
+        return None
+
+
+class TestHookBasics:
+    """Test basic hook behavior without AI."""
+
+    def test_invalid_json_allows_stop(self):
+        """Invalid JSON should allow stop (exit 0, no output)."""
+        result = subprocess.run(
+            [sys.executable, str(HOOK_PATH)],
+            input="not valid json",
+            capture_output=True,
+            text=True,
+        )
+        assert result.returncode == 0
+        assert result.stdout == ""
+
+    def test_stop_hook_active_allows_stop(self):
+        """When stop_hook_active is true, should allow stop to prevent infinite loop."""
+        returncode, stdout = run_hook({
+            "session_id": "test",
+            "transcript_path": "/nonexistent/path",
+            "cwd": "/tmp",
+            "stop_hook_active": True
+        })
+        assert returncode == 0
+        assert stdout == ""
+
+    def test_missing_transcript_allows_stop(self):
+        """Missing transcript should allow stop."""
+        returncode, stdout = run_hook({
+            "session_id": "test",
+            "cwd": "/tmp",
+            "stop_hook_active": False
+        })
+        assert returncode == 0
+        assert stdout == ""
+
+    def test_nonexistent_transcript_allows_stop(self):
+        """Nonexistent transcript path should allow stop."""
+        returncode, stdout = run_hook({
+            "session_id": "test",
+            "transcript_path": "/nonexistent/path/to/transcript.jsonl",
+            "cwd": "/tmp",
+            "stop_hook_active": False
+        })
+        assert returncode == 0
+        assert stdout == ""
+
+
+class TestNoCLI:
+    """Test behavior when claude CLI is not available."""
+
+    def test_analyze_returns_none_when_no_cli(self, tmp_path, monkeypatch):
+        """analyze_with_claude should return None when claude CLI is not found."""
+        module = load_hook_module()
+
+        # Mock get_claude_path to return None (simulating no CLI available)
+        monkeypatch.setattr(module, "get_claude_path", lambda: None)
+
+        assert module.analyze_with_claude("test transcript", str(tmp_path)) is None
+
+    def test_no_claude_cli_allows_stop(self, tmp_path):
+        """Without claude CLI, should allow stop (no AI analysis possible)."""
+        # Create a minimal transcript
+        transcript = tmp_path / "transcript.jsonl"
+        transcript.write_text('{"type": "user", "message": {"content": "hello"}}\n')
+
+        returncode, _ = run_hook({
+            "session_id": "test",
+            "transcript_path": str(transcript),
+            "cwd": str(tmp_path),
+            "stop_hook_active": False
+        })
+        assert returncode == 0
+        # The hook should complete without error (output depends on CLI availability)
+
+
+class TestAnalyzeWithClaude:
+    """Test analyze_with_claude function with mocked subprocess."""
+
+    def test_parses_valid_json_response(self, tmp_path, monkeypatch):
+        """Should correctly parse a valid JSON response from CLI."""
+        module = load_hook_module()
+        monkeypatch.setattr(module, "get_claude_path", lambda: "/usr/bin/claude")
+
+        mock_result = subprocess.CompletedProcess(
+            args=[],
+            returncode=0,
+            stdout='{"continue": true, "reason": "Tasks incomplete"}',
+            stderr=""
+        )
+        monkeypatch.setattr(subprocess, "run", lambda *_args, **_kwargs: mock_result)
+
+        result = module.analyze_with_claude("test transcript", str(tmp_path))
+        assert result is not None
+        assert result["continue"] is True
+        assert result["reason"] == "Tasks incomplete"
+
+    def test_parses_json_in_markdown_code_fence(self, tmp_path, monkeypatch):
+        """Should extract JSON wrapped in markdown code fences."""
+        module = load_hook_module()
+        monkeypatch.setattr(module, "get_claude_path", lambda: "/usr/bin/claude")
+
+        mock_result = subprocess.CompletedProcess(
+            args=[],
+            returncode=0,
+            stdout='```json\n{"continue": false}\n```',
+            stderr=""
+        )
+        monkeypatch.setattr(subprocess, "run", lambda *_args, **_kwargs: mock_result)
+
+        result = module.analyze_with_claude("test transcript", str(tmp_path))
+        assert result is not None
+        assert result["continue"] is False
+
+    def test_returns_none_for_no_json(self, tmp_path, monkeypatch):
+        """Should return None when response contains no JSON."""
+        module = load_hook_module()
+        monkeypatch.setattr(module, "get_claude_path", lambda: "/usr/bin/claude")
+
+        mock_result = subprocess.CompletedProcess(
+            args=[],
+            returncode=0,
+            stdout="I think the tasks are complete.",
+            stderr=""
+        )
+        monkeypatch.setattr(subprocess, "run", lambda *_args, **_kwargs: mock_result)
+
+        result = module.analyze_with_claude("test transcript", str(tmp_path))
+        assert result is None
+
+    def test_returns_none_for_malformed_json(self, tmp_path, monkeypatch):
+        """Should return None when JSON is malformed."""
+        module = load_hook_module()
+        monkeypatch.setattr(module, "get_claude_path", lambda: "/usr/bin/claude")
+
+        mock_result = subprocess.CompletedProcess(
+            args=[],
+            returncode=0,
+            stdout='{"continue": true, reason: incomplete}',  # Missing quotes
+            stderr=""
+        )
+        monkeypatch.setattr(subprocess, "run", lambda *_args, **_kwargs: mock_result)
+
+        result = module.analyze_with_claude("test transcript", str(tmp_path))
+        assert result is None
+
+    def test_returns_none_on_nonzero_returncode(self, tmp_path, monkeypatch):
+        """Should return None when subprocess returns non-zero exit code."""
+        module = load_hook_module()
+        monkeypatch.setattr(module, "get_claude_path", lambda: "/usr/bin/claude")
+
+        mock_result = subprocess.CompletedProcess(
+            args=[],
+            returncode=1,
+            stdout='{"continue": true}',
+            stderr="Error: API rate limit"
+        )
+        monkeypatch.setattr(subprocess, "run", lambda *_args, **_kwargs: mock_result)
+
+        result = module.analyze_with_claude("test transcript", str(tmp_path))
+        assert result is None
+
+    def test_returns_none_on_subprocess_error(self, tmp_path, monkeypatch):
+        """Should return None when subprocess raises an error."""
+        module = load_hook_module()
+        monkeypatch.setattr(module, "get_claude_path", lambda: "/usr/bin/claude")
+
+        def raise_error(*_args, **_kwargs):
+            raise subprocess.TimeoutExpired(cmd="claude", timeout=25)
+
+        monkeypatch.setattr(subprocess, "run", raise_error)
+
+        result = module.analyze_with_claude("test transcript", str(tmp_path))
+        assert result is None
+
+    def test_returns_none_for_empty_transcript(self, tmp_path, monkeypatch):
+        """Should return None when transcript is empty."""
+        module = load_hook_module()
+        monkeypatch.setattr(module, "get_claude_path", lambda: "/usr/bin/claude")
+
+        result = module.analyze_with_claude("", str(tmp_path))
+        assert result is None
+
+
+class TestResponseFormat:
+    """Test that responses follow Stop hook format."""
+
+    def test_response_format_documented(self):
+        """Response format should be documented in the hook."""
+        hook_content = HOOK_PATH.read_text()
+        assert '"decision"' in hook_content
+        assert '"block"' in hook_content
+        assert '"reason"' in hook_content
+
+    def test_hook_checks_stop_hook_active(self):
+        """Hook should check stop_hook_active to prevent infinite loops."""
+        hook_content = HOOK_PATH.read_text()
+        assert "stop_hook_active" in hook_content
+
+
+def load_hook_module():
+    """Load the stop hook module for testing."""
+    import importlib.util
+    spec = importlib.util.spec_from_file_location("stop_hook", HOOK_PATH)
+    assert spec is not None
+    module = importlib.util.module_from_spec(spec)
+    assert spec.loader is not None
+    spec.loader.exec_module(module)
+    return module
+
+
+class TestTaskStateExtraction:
+    """Test task state extraction from transcripts."""
+
+    def test_extracts_incomplete_tasks(self):
+        """Should correctly identify remaining tasks from incomplete_tasks fixture.
+
+        The fixture has:
+        - 5 tasks created
+        - Tasks 1, 2, 3 completed
+        - Task 4 in_progress (not completed)
+        - Task 5 pending (never started)
+        => 2 tasks remaining
+        """
+        module = load_hook_module()
+        fixture_path = Path(__file__).parent / "fixtures" / "incomplete_tasks.jsonl"
+        result = module.extract_task_state(str(fixture_path))
+
+        assert result["total"] == 5
+        assert len(result["remaining"]) == 2
+
+        # Check the specific remaining tasks
+        remaining_ids = [task_id for task_id, _, _ in result["remaining"]]
+        assert "4" in remaining_ids
+        assert "5" in remaining_ids
+
+        # Task 4 should be in_progress
+        task4 = [(tid, subj, status) for tid, subj, status in result["remaining"] if tid == "4"][0]
+        assert task4[2] == "in_progress"
+        assert "Verify" in task4[1] or "commit" in task4[1] or "push" in task4[1]
+
+        # Task 5 should be pending
+        task5 = [(tid, subj, status) for tid, subj, status in result["remaining"] if tid == "5"][0]
+        assert task5[2] == "pending"
+        assert "summary" in task5[1].lower()
+
+    def test_extracts_completed_tasks(self):
+        """Should correctly identify all tasks completed from completed_tasks fixture."""
+        module = load_hook_module()
+        fixture_path = Path(__file__).parent / "fixtures" / "completed_tasks.jsonl"
+        result = module.extract_task_state(str(fixture_path))
+
+        assert result["total"] == 3
+        assert len(result["remaining"]) == 0
+
+    def test_handles_nonexistent_file(self):
+        """Should handle nonexistent transcript gracefully."""
+        module = load_hook_module()
+        result = module.extract_task_state("/nonexistent/path.jsonl")
+
+        assert result["total"] == 0
+        assert len(result["remaining"]) == 0
+
+    def test_handles_no_tasks(self, tmp_path):
+        """Should handle transcript with no tasks."""
+        module = load_hook_module()
+        transcript = tmp_path / "no_tasks.jsonl"
+        transcript.write_text('{"type": "user", "message": {"content": "hello"}}\n')
+
+        result = module.extract_task_state(str(transcript))
+
+        assert result["total"] == 0
+        assert len(result["remaining"]) == 0
+
+
+class TestRemainingTasksBlocking:
+    """Test that hook blocks when tasks are remaining."""
+
+    def test_blocks_with_remaining_tasks(self):
+        """Hook should block and list remaining tasks when incomplete."""
+        fixture_path = Path(__file__).parent / "fixtures" / "incomplete_tasks.jsonl"
+
+        returncode, stdout = run_hook({
+            "session_id": "test",
+            "transcript_path": str(fixture_path),
+            "cwd": "/tmp",
+            "stop_hook_active": False
+        })
+
+        assert returncode == 0
+        response = parse_response(stdout)
+        assert response is not None
+        assert response["decision"] == "block"
+
+        # Reason should mention the count
+        assert "2 of 5 tasks remaining" in response["reason"]
+
+        # Reason should list the specific tasks
+        assert "Task 4" in response["reason"]
+        assert "Task 5" in response["reason"]
+        assert "in_progress" in response["reason"]
+        assert "pending" in response["reason"]
+
+    def test_allows_stop_with_all_tasks_completed(self, monkeypatch):
+        """Hook should allow stop (via AI check) when all tasks are completed."""
+        fixture_path = Path(__file__).parent / "fixtures" / "completed_tasks.jsonl"
+
+        # Remove claude from PATH so AI check is skipped
+        monkeypatch.setenv("PATH", "/nonexistent")
+
+        returncode, stdout = run_hook({
+            "session_id": "test",
+            "transcript_path": str(fixture_path),
+            "cwd": "/tmp",
+            "stop_hook_active": False
+        })
+
+        assert returncode == 0
+        # No remaining tasks, and no AI available, so should allow stop
+        assert parse_response(stdout) is None
+
+
+class TestTranscriptReading:
+    """Test transcript reading functionality."""
+
+    def test_reads_incomplete_tasks_fixture(self):
+        """Should correctly parse the incomplete_tasks fixture.
+
+        This fixture represents a real scenario where:
+        - 5 tasks were created
+        - Tasks 1, 2, 3 were completed
+        - Task 4 is in_progress (never completed)
+        - Task 5 is pending (never started)
+        => 2 tasks remaining
+        """
+        module = load_hook_module()
+        fixture_path = Path(__file__).parent / "fixtures" / "incomplete_tasks.jsonl"
+        result = module.read_transcript(str(fixture_path))
+
+        # Should contain the user request
+        assert "USER:" in result
+        assert "Fix all the PR review comments" in result
+
+        # Should show task creation
+        assert "TaskCreate" in result
+
+        # Should show the incomplete state - last message is about lint checks
+        assert "lint checks" in result
+
+        # Should NOT have a completion summary
+        assert "All tasks are complete" not in result
+        assert "summary" not in result.lower() or "Providing summary" not in result
+
+    def test_reads_completed_tasks_fixture(self):
+        """Should correctly parse the completed_tasks fixture.
+
+        This fixture represents a scenario where all tasks completed successfully.
+        """
+        module = load_hook_module()
+        fixture_path = Path(__file__).parent / "fixtures" / "completed_tasks.jsonl"
+        result = module.read_transcript(str(fixture_path))
+
+        # Should contain the user request
+        assert "USER:" in result
+        assert "Fix all the PR review comments" in result
+
+        # Should have a completion summary at the end
+        assert "completed all the PR review comments" in result
+        assert "All tasks are complete" in result
+
+    def test_reads_user_messages(self, tmp_path):
+        """Should be able to read user messages from transcript."""
+        module = load_hook_module()
+        transcript = tmp_path / "transcript.jsonl"
+        transcript.write_text(
+            '{"type": "user", "message": {"content": "test message"}}\n'
+        )
+
+        result = module.read_transcript(str(transcript))
+        assert "USER:" in result
+        assert "test message" in result
+
+    def test_reads_assistant_messages(self, tmp_path):
+        """Should be able to read assistant messages from transcript."""
+        module = load_hook_module()
+        transcript = tmp_path / "transcript.jsonl"
+        transcript.write_text(
+            '{"type": "assistant", "message": {"content": [{"type": "text", "text": "response text"}]}}\n'
+        )
+
+        result = module.read_transcript(str(transcript))
+        assert "ASSISTANT:" in result
+        assert "response text" in result
+
+    def test_truncates_large_transcripts(self, tmp_path):
+        """Should truncate large transcripts from the middle, keeping beginning and end."""
+        module = load_hook_module()
+        transcript = tmp_path / "transcript.jsonl"
+        # Create a large transcript with many messages
+        lines = []
+        for i in range(100):
+            lines.append(f'{{"type": "user", "message": {{"content": "message {i} with some extra content to make it longer"}}}}')
+        transcript.write_text("\n".join(lines))
+
+        # With a small max_chars, should truncate from the middle
+        result = module.read_transcript(str(transcript), max_chars=500)
+        assert len(result) <= 600  # Allow buffer for truncation marker
+        assert "...(middle truncated)..." in result
+        # Should keep beginning messages (lower numbers)
+        assert "message 0" in result or "message 1" in result
+        # Should keep end messages (higher numbers)
+        assert "message 99" in result or "message 98" in result
+
+
+if __name__ == "__main__":
+    pytest.main([__file__, "-v"])
--- a/.claude/settings.json
+++ b/.claude/settings.json
@@ -154,9 +154,9 @@
      {
        "hooks": [
          {
-            "type": "prompt",
-            "prompt": "You are evaluating whether Claude should stop working. Your default bias should be to CONTINUE WORKING (ok: false). Only allow stopping when absolutely certain all work is complete.\n\nContext: $ARGUMENTS\n\nAnalyze the conversation and determine:\n\n1. TASK COMPLETION CHECK\n   - Has EVERY task the user requested been fully completed?\n   - Not just acknowledged, but actually done with verifiable output?\n   - Are there any implied subtasks that haven't been addressed?\n\n2. ERROR CHECK\n   - Did any errors occur that weren't resolved?\n   - Are there failed operations that need retry?\n   - Did Claude say it would do something but didn't actually do it?\n\n3. FOLLOW-UP CHECK\n   - Is there obvious next-step work that should be done?\n   - Did the user's request imply multiple steps where only some were completed?\n   - Would a thorough assistant do more here?\n\n4. QUALITY CHECK\n   - Was the work done to completion, not just partially?\n   - Are there placeholders, TODOs, or incomplete sections?\n   - Did Claude stop prematurely mid-task?\n\nDECISION RULES:\n- If ANY task is incomplete → continue (ok: false)\n- If ANY error is unresolved → continue (ok: false)\n- If there's reasonable follow-up work → continue (ok: false)\n- If Claude said \"I'll do X\" but didn't → continue (ok: false)\n- If work quality is partial/rushed → continue (ok: false)\n- ONLY return ok: true if ALL tasks are genuinely, fully complete AND no errors exist AND no reasonable follow-up remains\n\nRespond with JSON:\n- {\"ok\": false, \"reason\": \"specific explanation of what still needs to be done\"} — DEFAULT, use this unless everything is truly complete\n- {\"ok\": true} — ONLY when absolutely certain all work is finished and nothing remains",
-            "timeout": 30
+            "type": "command",
+            "command": "\"$CLAUDE_PROJECT_DIR\"/.claude/hooks/stop-hook.py",
+            "timeout": 30000
          }
        ]
      }