    Summarize chat trigger (#1890) · 6235f7bb
    Authored by Will Chen
    <!-- CURSOR_SUMMARY -->
    > [!NOTE]
    > Adds a context-limit banner with one-click “summarize into new chat,”
    refactors token counting with react-query, and persists per-message max
    token usage.
    > 
    > - **Chat UX**
    >   - **Context limit banner** (`ContextLimitBanner.tsx`,
    `MessagesList.tsx`): shows when within 40k tokens of `contextWindow`,
    with tooltip and action to summarize into a new chat.
    >   - **Summarize flow**: extracted to `useSummarizeInNewChat` and used in
    chat input and banner; new summarize system prompt
    (`summarize_chat_system_prompt.ts`).
    > - **Token usage & counting**
    >   - **Persist max tokens used per assistant message**: DB migration
    (`messages.max_tokens_used`), schema updates, and saving usage during
    streaming (`chat_stream_handlers.ts`).
    >   - **Token counting refactor** (`useCountTokens.ts`): react-query with
    debounce; returns `estimatedTotalTokens` and `actualMaxTokens`;
    invalidated on model change and stream end; `TokenBar` updated.
    >   - **Surfacing usage**: tooltip on latest assistant message shows total
    tokens (`ChatMessage.tsx`).
    > - **Model/config tweaks**
    >   - Set `auto` model `contextWindow` to `200_000`
    (`language_model_constants.ts`).
    >   - Improve chat auto-scroll dependency (`ChatPanel.tsx`).
    >   - Fix app path validation regex (`app_handlers.ts`).
    > - **Testing & dev server**
    >   - E2E tests for banner and summarize
    (`e2e-tests/context_limit_banner.spec.ts` + fixtures/snapshot).
    >   - Fake LLM server streams usage to simulate high token scenarios
    (`testing/fake-llm-server/*`).
    > 
    > <sup>Written by [Cursor
    Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
    2ae16a14d50699cc772407426419192c2fdf2ec3. This will update automatically
    on new commits. Configure
    [here](https://cursor.com/dashboard?tab=bugbot).</sup>
    <!-- /CURSOR_SUMMARY -->
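
The banner rule summarized above (show when within 40k tokens of `contextWindow`) can be sketched roughly as follows. This is a minimal illustration, not the code in `ContextLimitBanner.tsx`: the constant and function names are hypothetical, and treating exactly 40k remaining tokens as "within" the limit is an assumption.

```typescript
// Hypothetical sketch: only the 40k margin and the 200_000 "auto" context
// window come from the summary; every name here is illustrative.
const CONTEXT_LIMIT_MARGIN = 40_000;
const AUTO_CONTEXT_WINDOW = 200_000;

// Returns true when the remaining context is at or below the margin.
// Assumption: the comparison is inclusive (exactly 40k remaining shows the banner).
function shouldShowContextLimitBanner(
  tokensUsed: number,
  contextWindow: number,
): boolean {
  const remaining = contextWindow - tokensUsed;
  return remaining <= CONTEXT_LIMIT_MARGIN;
}

// Example: 150k used out of 200k leaves 50k, so no banner yet.
const bannerAt150k = shouldShowContextLimitBanner(150_000, AUTO_CONTEXT_WINDOW);
// Example: 165k used leaves 35k, so the banner would show.
const bannerAt165k = shouldShowContextLimitBanner(165_000, AUTO_CONTEXT_WINDOW);
```

With the 200k window set for the `auto` model, a rule like this would start showing the banner at roughly 160k tokens used.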
    
    <!-- This is an auto-generated description by cubic. -->
    ---
    ## Summary by cubic
    Adds a “Summarize into new chat” trigger and a context limit banner to
    help keep conversations focused and avoid hitting model limits. Also
    tracks and surfaces actual token usage per assistant message, with a
    token counting refactor for reliability.
    
    - **New Features**
      - Summarize into new chat from the input or banner; improved system
    prompt with clear output format.
      - Context limit banner shows when within 40k tokens of the model’s
    context window and offers a one-click summarize action.
      - Tooltip on the latest assistant message shows total tokens used.
    
    - **Refactors**
      - Token counting now uses react-query and returns `estimatedTotalTokens`
    and `actualMaxTokens`; counts are invalidated on model change and when
    streaming settles.
      - Persist per-message `max_tokens_used` in the messages table; backend
    aggregates model usage during streaming and saves it.
      - Adjusted default “Auto” model `contextWindow` to 200k for more realistic
    limits.
      - Improved chat scrolling while streaming; fixed app path validation
    regex.
    
    <sup>Written for commit 2ae16a14d50699cc772407426419192c2fdf2ec3.
    Summary will update automatically on new commits.</sup>
    
    <!-- End of auto-generated description by cubic. -->
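
As a rough illustration of the per-message `max_tokens_used` value described above, the sketch below keeps the largest total reported across the usage events of one streamed response. The `UsageEvent` shape and the `maxTokensUsed` helper are hypothetical, and taking the maximum rather than a sum is an assumption based on the column name; the actual logic lives in `chat_stream_handlers.ts` and may differ.

```typescript
// Hypothetical shape for a usage report emitted while streaming.
// Assumption: some events may not carry a token count at all.
interface UsageEvent {
  totalTokens?: number;
}

// Returns the largest totalTokens seen, or null if no event reported usage.
// Assumption: "max tokens used" means the peak total, not a running sum.
function maxTokensUsed(events: UsageEvent[]): number | null {
  let max: number | null = null;
  for (const event of events) {
    if (typeof event.totalTokens !== "number") continue;
    if (max === null || event.totalTokens > max) {
      max = event.totalTokens;
    }
  }
  return max;
}

// Example: the peak of the reported totals would be persisted for the message.
const peakUsage = maxTokensUsed([
  { totalTokens: 1_200 },
  {},
  { totalTokens: 175_000 },
  { totalTokens: 900 },
]);
```

Returning `null` when nothing was reported keeps "no usage data" distinct from "zero tokens", which matters if the stored value later feeds the context limit banner.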