Unverified commit 50a72da9, authored by Will Chen, committed by GitHub

Replace deflake-e2e-recent-prs with deflake-e2e-recent-commits (#2607)

## Summary

- Replaced `deflake-e2e-recent-prs` command with `deflake-e2e-recent-commits` that scans CI workflow runs on main instead of PR comments
- Downloads the `html-report` artifact (`results.json`) from each CI run to extract flaky test data, which works for push events that don't post PR comments
- Updated `claude-deflake-e2e.yml` workflow to use the new command

## Test plan

- [ ] Trigger the `Claude Deflake E2E` workflow manually and verify it correctly scans recent main branch CI runs
- [ ] Verify flaky tests are correctly parsed from `results.json` artifacts

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---

> [!NOTE]
> **Low Risk**
> Low risk doc/workflow tweak that changes how the deflake automation sources flaky tests (GitHub Actions runs/artifacts) but does not touch production code or test logic.
>
> **Overview**
> Updates the deflake automation to **scan recent `main` CI workflow runs** (push events) instead of PR Playwright summary comments, by downloading each run's `html-report` artifact and parsing `results.json` to detect retry-passed tests with prior failures/timeouts.
>
> Adjusts the scheduled `Claude Deflake E2E` workflow to accept `commit_count`, grant `actions: read`, and invoke `/dyad:deflake-e2e-recent-commits` rather than the old PR-based command.
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 0da1e67da43e509577d5b8dc1f155779742d1529. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>

---

## Summary by cubic

Switched the deflake command to scan recent main CI runs and parse html-report results.json to find flaky E2E tests. Updated the Claude Deflake E2E workflow to use commit_count and added actions: read permission.

- **Refactors**
  - List completed main push runs via gh api, fetch 3x commit_count, and filter to success/failure.
  - Download non-expired html-report artifacts; parse results.json with a Node.js script to detect flakes (final passed after fail/timedOut/interrupted).
  - Build "<spec_path.spec.ts> > Suite > Test" titles; group and rank by frequency; clean up artifacts.
  - Skip runs without artifacts; note 3-day artifact retention.
- **Bug Fixes**
  - Updated command doc to reference the TodoWrite tool.

<sup>Written for commit 0da1e67da43e509577d5b8dc1f155779742d1529. Summary will update on new commits.</sup>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Parent 9b9c059b
# Deflake E2E Tests from Recent PRs
# Deflake E2E Tests from Recent Commits
Automatically gather flaky E2E tests from recent PR Playwright summary comments and deflake them.
Automatically gather flaky E2E tests from recent CI runs on the main branch and deflake them.
## Arguments
- `$ARGUMENTS`: (Optional) Number of recent PRs to scan (default: 20)
- `$ARGUMENTS`: (Optional) Number of recent commits to scan (default: 10)
## Task Tracking
**You MUST use the TaskCreate and TaskUpdate tools to track your progress.** At the start, create tasks for each major step below. Mark each task as `in_progress` when you start it and `completed` when you finish.
**You MUST use the TodoWrite tool to track your progress.** At the start, create todos for each major step below. Mark each todo as `in_progress` when you start it and `completed` when you finish.
## Instructions
1. **Gather flaky tests from recent PRs:**
1. **Gather flaky tests from recent CI runs on main:**
Use `gh` to find recent PRs that have Playwright summary comments (search for PRs with `github-actions[bot]` Playwright comments):
List recent CI workflow runs triggered by pushes to main:
```
gh pr list --search 'commenter:github-actions[bot] "Playwright Test Results" in:comments' --state all --limit <PR_COUNT> --json number
gh api "repos/{owner}/{repo}/actions/workflows/ci.yml/runs?branch=main&event=push&per_page=<COMMIT_COUNT * 3>&status=completed" --jq '.workflow_runs[] | select(.conclusion == "success" or .conclusion == "failure") | {id, head_sha, conclusion}'
```
Use `$ARGUMENTS` as the PR count, defaulting to 20 if not provided.
**Note:** We fetch 3x the desired commit count because many runs may be `cancelled` (due to concurrency groups). Filter to only `success` and `failure` conclusions to get runs that actually completed and have artifacts.
For each PR, fetch comments from `github-actions[bot]` that contain the Playwright test results.
Use `$ARGUMENTS` as the commit count, defaulting to 10 if not provided.
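The run-selection logic described above can be sketched in Node.js as follows. This is an illustration, not part of the commit: the helper names `parseCommitCount`, `buildRunsEndpoint`, and `selectUsableRuns` are hypothetical, trimming the filtered list down to the requested commit count is an assumption about the intent of the 3x over-fetch, and the clamp to 100 reflects GitHub's maximum `per_page` value.

```javascript
// Hypothetical helpers; only the endpoint shape, the 3x over-fetch, and the
// success/failure filter come from the command doc.
const DEFAULT_COMMIT_COUNT = 10;

// Interpret $ARGUMENTS as a positive integer, falling back to the default.
function parseCommitCount(rawArgs) {
  const n = Number.parseInt(String(rawArgs ?? "").trim(), 10);
  return Number.isInteger(n) && n > 0 ? n : DEFAULT_COMMIT_COUNT;
}

// Build the path passed to `gh api`; `{owner}` and `{repo}` are left as
// placeholders for gh to substitute.
function buildRunsEndpoint(commitCount) {
  const perPage = Math.min(commitCount * 3, 100); // GitHub caps per_page at 100
  return (
    "repos/{owner}/{repo}/actions/workflows/ci.yml/runs" +
    `?branch=main&event=push&per_page=${perPage}&status=completed`
  );
}

// Drop cancelled/skipped runs, then keep at most `commitCount` of the rest
// (assumption: the API returns runs newest first, so these are the most recent).
function selectUsableRuns(workflowRuns, commitCount) {
  return workflowRuns
    .filter((run) => run.conclusion === "success" || run.conclusion === "failure")
    .slice(0, commitCount)
    .map(({ id, head_sha, conclusion }) => ({ id, head_sha, conclusion }));
}
```

With the default of 10 commits this requests 30 runs per page, which leaves headroom for runs cancelled by concurrency groups.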
**Note:** `{owner}` and `{repo}` are auto-replaced by `gh` CLI. Replace `<pr_number>` with the actual PR number.
For each completed run, download the `html-report` artifact which contains `results.json` with the full Playwright test results:
a. Find the html-report artifact for the run:
```
gh api repos/{owner}/{repo}/issues/<pr_number>/comments --paginate --jq '.[] | select(.user.login == "github-actions[bot]") | select(.body | contains("Playwright Test Results")) | .body'
gh api "repos/{owner}/{repo}/actions/runs/<run_id>/artifacts?per_page=30" --jq '.artifacts[] | select(.name | startswith("html-report")) | select(.expired == false) | .name'
```
2. **Parse flaky tests from comments:**
Extract flaky test names from the "Flaky Tests" section of each comment. Flaky tests appear in this format:
b. Download it using `gh run download`:
```
- `<spec_file.spec.ts> > <test name>` (passed after N retry/retries)
gh run download <run_id> --name <artifact_name> --dir /tmp/playwright-report-<run_id>
```
Parse each line with this pattern to extract the spec file and test name. The spec file is everything before the first `>`.
c. Parse `/tmp/playwright-report-<run_id>/results.json` to extract flaky tests. Write a Node.js script inside the `.claude/` directory to do this parsing. Flaky tests are those where the final result status is `"passed"` but a prior result has status `"failed"`, `"timedOut"`, or `"interrupted"`. The test title is built by joining parent suite titles (including the spec file path) and the test title, separated by `>`.
d. Clean up the downloaded artifact directory after parsing.
**Note:** Some runs may not have an html-report artifact (e.g., if they were cancelled early, the merge-reports job didn't complete, or artifacts have expired past the 3-day retention period). Skip these runs and continue to the next one.
2. **Parse flaky tests from results:**
From each `results.json`, extract flaky test names. A test is flaky if:
- It has multiple results (retries occurred)
- The final result status is `"passed"`
- At least one prior result has status `"failed"`, `"timedOut"`, or `"interrupted"`
The test title format is: `<spec_file.spec.ts> > <Suite Name> > <Test Name>`
Parse each title to extract the spec file (everything before the first `>`).
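A minimal sketch of the flaky-test rule above, assuming `results.json` follows the Playwright JSON reporter layout (nested `suites`, each with `specs`, each spec with `tests`, each test with a `results` array carrying a `status`). The function names `isFlaky` and `collectFlakyTests` are hypothetical, and the assumption that the top-level suite title is the spec file path should be checked against a real artifact.

```javascript
// Statuses that count as a prior failure, as listed in the command doc.
const FAILING_STATUSES = new Set(["failed", "timedOut", "interrupted"]);

// A test is flaky when it was retried, finally passed, and failed earlier.
function isFlaky(results) {
  if (!Array.isArray(results) || results.length < 2) return false;
  const finalResult = results[results.length - 1];
  const priorResults = results.slice(0, -1);
  return (
    finalResult.status === "passed" &&
    priorResults.some((r) => FAILING_STATUSES.has(r.status))
  );
}

// Walk the suite tree and build "<spec file> > <Suite> > <Test>" titles.
function collectFlakyTests(report) {
  const flaky = [];
  const visit = (suite, parentTitles) => {
    const path = suite.title ? [...parentTitles, suite.title] : parentTitles;
    for (const spec of suite.specs ?? []) {
      for (const test of spec.tests ?? []) {
        if (isFlaky(test.results)) {
          const title = [...path, spec.title].join(" > ");
          // The spec file is everything before the first ">".
          const specFile = title.split(">")[0].trim();
          flaky.push({ title, specFile });
        }
      }
    }
    for (const child of suite.suites ?? []) visit(child, path);
  };
  for (const suite of report.suites ?? []) visit(suite, []);
  return flaky;
}
```

A spec that runs under several Playwright projects has one `tests` entry per project, so the same title can appear more than once for a single run; whether to collapse those is left to the ranking step.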
3. **Deduplicate and rank by frequency:**
Count how many times each test appears as flaky across all PRs. Sort by frequency (most flaky first). Group tests by their spec file.
Count how many times each test appears as flaky across all CI runs. Sort by frequency (most flaky first). Group tests by their spec file.
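One way to implement the counting and grouping is sketched below. The function name `rankFlakyTests` and its input shape (one array of flaky titles per CI run) are hypothetical, and counting a test at most once per run is an assumption; the command doc only says to count appearances across runs.

```javascript
// Input: an array with one entry per CI run, each an array of flaky titles
// in the "<spec file> > <Suite> > <Test>" format.
function rankFlakyTests(flakyTitlesPerRun) {
  const counts = new Map();
  for (const titles of flakyTitlesPerRun) {
    // Assumption: a test counts once per run, even if several projects flaked.
    for (const title of new Set(titles)) {
      counts.set(title, (counts.get(title) ?? 0) + 1);
    }
  }

  // Most flaky first; ties are broken alphabetically for stable output.
  const ranked = [...counts]
    .map(([title, count]) => ({
      title,
      specFile: title.split(">")[0].trim(),
      count,
    }))
    .sort((a, b) => b.count - a.count || a.title.localeCompare(b.title));

  // Group by spec file, preserving the ranked order within each group.
  const bySpec = new Map();
  for (const entry of ranked) {
    if (!bySpec.has(entry.specFile)) bySpec.set(entry.specFile, []);
    bySpec.get(entry.specFile).push(entry);
  }
  return { ranked, bySpec };
}
```

Grouping after sorting keeps the flakiest spec file first in the `Map`, so iterating `bySpec` yields rows in a sensible order for a summary.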
Print a summary table:
......@@ -55,7 +70,7 @@ Automatically gather flaky E2E tests from recent PR Playwright summary comments
4. **Skip if no flaky tests found:**
If no flaky tests are found, report "No flaky tests found in recent PRs" and stop.
If no flaky tests are found, report "No flaky tests found in recent commits" and stop.
5. **Install dependencies and build:**
......@@ -113,7 +128,7 @@ Automatically gather flaky E2E tests from recent PR Playwright summary comments
7. **Summarize results:**
Report:
- Total flaky tests found across PRs
- Total flaky tests found across commits
- Which tests were successfully deflaked
- What fixes were applied to each
- Which tests could not be fixed (and why)
......
......@@ -6,8 +6,8 @@ on:
- cron: "0 10 * * *"
workflow_dispatch:
inputs:
pr_count:
description: "Number of recent PRs to scan for flaky tests"
commit_count:
description: "Number of recent commits on main to scan for flaky tests"
required: false
default: "10"
type: string
......@@ -20,6 +20,7 @@ jobs:
- macOS
- ARM64
permissions:
actions: read
contents: write
pull-requests: write
steps:
......@@ -70,4 +71,4 @@ jobs:
claude_args: --model claude-opus-4-6
direct: true
prompt: |
/dyad:deflake-e2e-recent-prs ${{ inputs.pr_count || '10' }}
/dyad:deflake-e2e-recent-commits ${{ inputs.commit_count || '10' }}