test: deflake setup and context E2E flows (#3319)

## Summary

- Deflake setup-flow E2E by clearing the fake OPENAI_API_KEY when a test explicitly wants the setup screen.
- Update custom provider key setup to wait for the saved masked key UI instead of the raw secret text.
- Wait for Smart Context settings persistence and give cloud sandbox undo enough time to finish snapshot reconciliation.

## Root cause

Run: https://github.com/dyad-sh/dyad/actions/runs/25189808932

The red setup-flow failure was caused by two test assumptions drifting from app behavior. The setup-screen fixture could inherit OPENAI_API_KEY from an earlier test in the same worker, causing the setup banner to disappear. The provider helper also waited for the full raw test API key, while the app now saves and renders the key masked as test...2345.

Two retry-only flakes were separate E2E timing contracts: the Smart Context test sent the next dump prompt before the off setting was persisted, and cloud sandbox undo could still be syncing/restarting when the digest poll hit the LONG timeout.

## Why this fix is correct

- Tests that opt into showSetupScreen now get an environment without the fake OpenAI key, matching the setup-screen contract.
- The provider helper waits for the persisted saved-key state the UI actually exposes, without asserting raw secret text.
- The Smart Context test waits on the settings file before sending the prompt that depends on it.
- Cloud sandbox undo already uses EXTRA_LONG for preview startup; applying the same budget to undo reconciliation matches the slower cloud path seen in CI.

## Test plan

- `npm run fmt && npm run lint:fix && npm run ts`
- `npm test`
- `PYTHON=/usr/bin/python3 npm run build`
- `PLAYWRIGHT_HTML_OPEN=never npm run e2e -- e2e-tests/setup_flow.spec.ts`
- `PLAYWRIGHT_HTML_OPEN=never npm run e2e -- e2e-tests/context_manage.spec.ts e2e-tests/cloud_sandbox.spec.ts`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Will Chen <7344640+wwwillchen@users.noreply.github.com>
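
The two setup-flow assumptions described under "Root cause" can be reproduced in isolation. The following is a minimal sketch, not the project's code: `Env`, `applyProviderEnv`, and `setupBannerVisible` are hypothetical names, a plain object stands in for `process.env`, and the banner check is an assumed simplification of how the app decides a provider is configured. Only the `sk-test` value, the `/test.+2345/` pattern, and the masked form `test...2345` come from this change.

```typescript
// Plain object standing in for the worker's shared process.env (assumption).
type Env = Record<string, string | undefined>;

// Mirrors the fixture branch after this change: set the fake key unless the
// test opted into the setup screen, in which case clear any inherited key.
function applyProviderEnv(env: Env, showSetupScreen: boolean): void {
  if (!showSetupScreen) {
    env["OPENAI_API_KEY"] = "sk-test";
  } else {
    delete env["OPENAI_API_KEY"];
  }
}

// Hypothetical stand-in for the app's check: the setup banner is shown only
// when no provider key is present in the environment.
function setupBannerVisible(env: Env): boolean {
  return !env["OPENAI_API_KEY"];
}

// Pattern used by the provider helper after this change.
const maskedKeyPattern = /test.+2345/;

// Raw key typed by the helper, and the masked form the UI now renders.
const rawKey = "test-api-key-12345";
const renderedKey = "test...2345";

// Substring search approximates a text lookup for the raw key (assumption).
const rawKeyFound = renderedKey.indexOf(rawKey) !== -1;
const maskedKeyFound = maskedKeyPattern.test(renderedKey);
```

With one shared `Env` object, calling `applyProviderEnv(env, false)` and then `applyProviderEnv(env, true)` models two tests running in the same worker: without the `else` branch the second test would still see `sk-test` and the banner would stay hidden. Note that `/test.+2345/` also matches the raw key, so the helper tolerates either rendering.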

30300641 · keppo-bot[bot] · GitHub · 971aca34 · 30300641 · 30300641
--- a/e2e-tests/cloud_sandbox.spec.ts
+++ b/e2e-tests/cloud_sandbox.spec.ts
@@ -64,7 +64,7 @@ testSkipIfWindows(
            .textContent({ timeout: Timeout.LONG });
          return digestText?.split(": ").at(-1)?.trim();
        },
-        { timeout: Timeout.LONG },
+        { timeout: Timeout.EXTRA_LONG },
      )
      .not.toBe(updatedDigest);
  },

--- a/e2e-tests/context_manage.spec.ts
+++ b/e2e-tests/context_manage.spec.ts
@@ -286,6 +286,9 @@ test("manage context - smart context", async ({ po }) => {
  // the auto-includes.
  const proModesDialog = await po.openProModesDialog();
  await proModesDialog.setSmartContextMode("off");
+  await expect
+    .poll(() => po.settings.recordSettings().enableProSmartFilesContextMode)
+    .toBe(false);
  await proModesDialog.close();

  await po.sendPrompt("[dump]");

--- a/e2e-tests/helpers/fixtures.ts
+++ b/e2e-tests/helpers/fixtures.ts
@@ -95,6 +95,8 @@ export const test = base.extend<{
      if (!electronConfig.showSetupScreen) {
        // This is just a hack to avoid the AI setup screen.
        process.env.OPENAI_API_KEY = "sk-test";
+      } else {
+        delete process.env.OPENAI_API_KEY;
      }
      const baseTmpDir = os.tmpdir();
      const userDataDir = path.join(baseTmpDir, `dyad-e2e-tests-${Date.now()}`);

--- a/e2e-tests/helpers/page-objects/components/Settings.ts
+++ b/e2e-tests/helpers/page-objects/components/Settings.ts
@@ -205,7 +205,7 @@ export class Settings {
      .fill("test-api-key-12345");
    await this.page.getByRole("button", { name: "Save Key" }).click();
    // Wait for the key to be saved
-    await expect(this.page.getByText("test-api-key-12345")).toBeVisible();
+    await expect(this.page.getByText(/test.+2345/)).toBeVisible();
  }

  async setUpDyadProvider() {

--- a/rules/e2e-testing.md
+++ b/rules/e2e-testing.md
@@ -124,6 +124,8 @@ If `npm run build` fails while rebuilding native modules with `ImportError` from
 - **Navigation to tabs**: Use `await expect(link).toBeVisible({ timeout: Timeout.EXTRA_LONG })` before clicking tab links (especially in `goToAppsTab()`). Electron sidebar links can take time to render during app initialization.
 - **Confirming flakiness**: Use `PLAYWRIGHT_RETRIES=0 PLAYWRIGHT_HTML_OPEN=never npm run e2e -- e2e-tests/<spec> --repeat-each=10` to reproduce flaky tests. `PLAYWRIGHT_RETRIES=0` is critical — CI defaults to 2 retries, hiding flakiness.
 - **`expect(...).toPass()` wrappers**: Give inner Playwright actions/assertions short explicit timeouts. Default 30s click/expect timeouts can consume the whole `toPass()` budget, so the retry wrapper never actually retries.
+- **Setup-screen tests and provider env vars**: E2E worker processes reuse `process.env`, so tests that set fake provider keys (for example `OPENAI_API_KEY`) can affect later tests in the same worker. When a fixture intentionally shows the setup screen, explicitly clear any env key that would make the provider appear configured.
+- **Settings-dependent prompts**: After toggling a setting that affects the next chat request (for example Smart Context mode), wait for the persisted settings state with `expect.poll(() => po.settings.recordSettings().someKey)` before sending the prompt. UI clicks can return before the main-process settings write is visible to the request path.
 - **Monaco file-switch assertions**: For code-editor tests, don't stop at waiting for the editor textbox to appear. Wait until Monaco's active model URI matches the file you clicked; otherwise the test can type into a still-switching editor model and miss real file-switch races.
 - **Monaco race repros**: If a file-editor bug only appears during quick tab/file changes, alternate between the affected files several times in one test before declaring it non-reproducible. A single switch often misses save-vs-switch timing bugs that show up immediately under `--repeat-each`.
 - **GitHub sync success assertions**: Scope "Successfully pushed to GitHub!" assertions to `getByTestId("github-connected-repo")`; the same text can also appear in a toast, causing Playwright strict-mode failures.
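
The "Settings-dependent prompts" rule above relies on polling persisted state rather than trusting that a UI action has completed. The sketch below illustrates that idea without Playwright: `FakeSettingsStore` and `pollUntil` are hypothetical names, the delayed write is an assumed model of the main-process settings write, and `pollUntil` is a simplified stand-in for what `expect.poll(...)` provides, not its implementation.

```typescript
// Hypothetical stand-in for the settings file: a write becomes visible only
// after a delay, the way a persisted write can lag behind the UI click.
class FakeSettingsStore {
  private persisted: Record<string, boolean> = {
    enableProSmartFilesContextMode: true,
  };

  // Simulates an asynchronous write that lands after `delayMs`.
  scheduleWrite(key: string, value: boolean, delayMs: number): void {
    setTimeout(() => {
      this.persisted[key] = value;
    }, delayMs);
  }

  // Returns only what has actually been persisted so far.
  read(key: string): boolean | undefined {
    return this.persisted[key];
  }
}

// Re-evaluates `read` until it returns `expected`, or rejects on timeout.
async function pollUntil<T>(
  read: () => T,
  expected: T,
  timeoutMs: number = 1000,
  intervalMs: number = 10,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    if (read() === expected) return;
    if (Date.now() >= deadline) {
      throw new Error(`Timed out waiting for ${String(expected)}`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

In a test, the call that depends on the setting (here, sending the prompt) would go after `await pollUntil(() => store.read("enableProSmartFilesContextMode"), false)`, so it cannot observe the stale `true` value.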