Extend deflake-e2e-recent-commits to scan PRs by wwwillchen/wwwillchen-bot (#2647)

## Summary

- Extends the `deflake-e2e-recent-commits` command to also gather flaky tests from open PRs authored by `wwwillchen` and `wwwillchen-bot`
- Parses Playwright Test Results comments on these PRs to extract flaky test names
- Provides more comprehensive coverage for deflaking efforts by combining main branch CI runs with PR-reported flakes

## Test plan

- Run `/dyad:deflake-e2e-recent-commits` and verify it now scans both main branch CI runs AND open PRs by the specified authors
- Verify flaky tests from PR comments are correctly parsed and added to the deflake list

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---

## Summary by cubic

Extends deflake-e2e-recent-commits to also scan open PRs by wwwillchen and wwwillchen-bot for Playwright-reported flaky tests. This broadens coverage beyond main-branch CI and improves deflaking accuracy.

- **New Features**
  - Lists recent open PRs by wwwillchen and wwwillchen-bot.
  - Parses the latest “Playwright Test Results” bot comment to extract flaky test titles.
  - Merges PR-derived flakes with main-branch results, de-duplicates, and notes PR sources in the summary.
  - Updates no-results message to include PRs (“recent commits or PRs”).

<sup>Written for commit 32766d69227eb2454f45899e5784021161765019. Summary will update on new commits.</sup>

---

> [!NOTE]
> **Low Risk**
> Documentation-only change that broadens the data sources described for collecting flaky tests; no runtime or production code is modified.
>
> **Overview**
> Extends the `.claude` command `deflake-e2e-recent-commits` to **collect flaky Playwright tests from two sources**: recent `main` CI `html-report` artifacts *and* the latest “Playwright Test Results” bot comment on recent open PRs authored by `wwwillchen`/`wwwillchen-bot`.
>
> Updates the instructions to include the PR scanning/parsing workflow, to attribute flakes by source in the final report, and to change the no-flakes message to cover “recent commits or PRs.”
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 32766d69227eb2454f45899e5784021161765019. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
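
The comment parsing described above could be sketched roughly as follows. This is only an illustration, not part of the change: the function names (`parse_flaky_titles`, `spec_file_of`) and the sample comment body are hypothetical, and the line pattern is taken from the command text in this change (a bullet with the test title in backticks followed by "(passed after N retries)"). Accepting the singular "retry" is an extra assumption.

```python
import re

# Assumed line shape, per the command text in this change:
#   - `<spec_file.spec.ts> > <Suite Name> > <Test Name>` (passed after N retries)
# Matching the singular "retry" as well is an assumption, not confirmed by the source.
FLAKY_LINE = re.compile(
    r"^\s*-\s*`(?P<title>[^`]+)`\s*\(passed after \d+ retr(?:y|ies)\)"
)


def parse_flaky_titles(comment_body: str) -> list[str]:
    """Return flaky test titles found in a Playwright Test Results comment body."""
    titles = []
    for line in comment_body.splitlines():
        match = FLAKY_LINE.match(line)
        if match:
            titles.append(match.group("title").strip())
    return titles


def spec_file_of(title: str) -> str:
    """The spec file is everything before the first '>' in the test title."""
    return title.split(">", 1)[0].strip()


# Hypothetical comment body, for illustration only.
SAMPLE_COMMENT = """\
## 🎭 Playwright Test Results

### ⚠️ Flaky Tests
- `chat.spec.ts > Chat > sends a message` (passed after 1 retry)
- `setup.spec.ts > Setup > shows banner` (passed after 2 retries)

### ✅ Passed
- `other.spec.ts > Other > works`
"""

if __name__ == "__main__":
    for title in parse_flaky_titles(SAMPLE_COMMENT):
        print(spec_file_of(title), "|", title)
```

Matching on the "(passed after ...)" suffix, rather than on the section heading, keeps the sketch independent of the exact heading text, at the cost of depending on the bullet wording staying stable.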

c3c6d3e9 · Will Chen · GitHub · 4936b851 · c3c6d3e9
--- a/.claude/commands/dyad/deflake-e2e-recent-commits.md
+++ b/.claude/commands/dyad/deflake-e2e-recent-commits.md
 # Deflake E2E Tests from Recent Commits

-Automatically gather flaky E2E tests from recent CI runs on the main branch and deflake them.
+Automatically gather flaky E2E tests from recent CI runs on the main branch and from recent PRs by wwwillchen/wwwillchen-bot, then deflake them.

 ## Arguments

@@ -44,16 +44,29 @@ Automatically gather flaky E2E tests from recent CI runs on the main branch and

   **Note:** Some runs may not have an html-report artifact (e.g., if they were cancelled early, the merge-reports job didn't complete, or artifacts have expired past the 3-day retention period). Skip these runs and continue to the next one.

-2. **Parse flaky tests from results:**
+2. **Gather flaky tests from recent PRs by wwwillchen and wwwillchen-bot:**

-   From each `results.json`, extract flaky test names. A test is flaky if:
-   - It has multiple results (retries occurred)
-   - The final result status is `"passed"`
-   - At least one prior result has status `"failed"`, `"timedOut"`, or `"interrupted"`
+   In addition to main branch CI runs, scan recent open PRs authored by `wwwillchen` or `wwwillchen-bot` for flaky tests reported in Playwright report comments.

-   The test title format is: `<spec_file.spec.ts> > <Suite Name> > <Test Name>`
+   a. List recent open PRs by these authors:

-   Parse each title to extract the spec file (everything before the first `>`).
+   ```
+   gh pr list --author wwwillchen --state open --limit 10 --json number,title
+   gh pr list --author wwwillchen-bot --state open --limit 10 --json number,title
+   ```
+
+   b. For each PR, find the most recent Playwright Test Results comment (posted by a bot, containing "🎭 Playwright Test Results"):
+
+   ```
+   gh api "repos/{owner}/{repo}/issues/<pr_number>/comments" --jq '[.[] | select(.user.type == "Bot" and (.body | contains("Playwright Test Results")))] | last'
+   ```
+
+   c. Parse the comment body to extract flaky tests. The comment format includes a "⚠️ Flaky Tests" section with test names in backticks:
+   - Look for lines matching the pattern: ``- `<test_title>` (passed after N retries)``
+   - Extract the test title from within the backticks
+   - The test title format is: `<spec_file.spec.ts> > <Suite Name> > <Test Name>`
+
+   d. Add these flaky tests to the overall collection, noting they came from PR #N for the summary

 3. **Deduplicate and rank by frequency:**

@@ -70,7 +83,7 @@ Automatically gather flaky E2E tests from recent CI runs on the main branch and

 4. **Skip if no flaky tests found:**

-   If no flaky tests are found, report "No flaky tests found in recent commits" and stop.
+   If no flaky tests are found, report "No flaky tests found in recent commits or PRs" and stop.

 5. **Install dependencies and build:**

@@ -128,7 +141,8 @@ Automatically gather flaky E2E tests from recent CI runs on the main branch and
 7. **Summarize results:**

   Report:
-   - Total flaky tests found across commits
+   - Total flaky tests found across main branch commits and PRs
+   - Sources of flaky tests (main branch CI runs vs. PR comments from wwwillchen/wwwillchen-bot)
   - Which tests were successfully deflaked
   - What fixes were applied to each
   - Which tests could not be fixed (and why)
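
As an aside to the diff, the source attribution that the summary step now asks for could be tracked with a small structure like the one below. This is a hedged sketch only: `rank_flaky_tests` and the `(title, source)` pair shape are hypothetical, the PR numbers in the sample are made up, and counting one occurrence per observation is an assumption, since the ranking details of step 3 are not shown in this diff.

```python
from collections import Counter, defaultdict


def rank_flaky_tests(observations):
    """Deduplicate (title, source) observations and rank titles by frequency.

    `observations` is an iterable of (test_title, source_label) pairs, where
    the label is a free-form string such as "main" or "PR #<number>".
    Returns a list of (title, count, sorted_sources) tuples.
    """
    counts = Counter()
    sources = defaultdict(set)
    for title, source in observations:
        counts[title] += 1          # one count per sighting (assumed rule)
        sources[title].add(source)  # remember where each flake was seen
    # Most frequent first; ties broken alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(title, count, sorted(sources[title])) for title, count in ranked]


# Hypothetical observations, for illustration only.
SAMPLE_OBSERVATIONS = [
    ("chat.spec.ts > Chat > sends a message", "main"),
    ("chat.spec.ts > Chat > sends a message", "PR #101"),
    ("setup.spec.ts > Setup > shows banner", "PR #102"),
]

if __name__ == "__main__":
    for title, count, seen_in in rank_flaky_tests(SAMPLE_OBSERVATIONS):
        print(f"{count}x {title} (sources: {', '.join(seen_in)})")
```

Keeping the sources as a set per title means the final report can list each origin once even when the same test flakes repeatedly in one place.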