Commit e4cbc1c8 (unverified), authored by Will Chen, committed by GitHub

End turn with set chat summary (#3241)

Parent 661e1438
......@@ -172,7 +172,7 @@
"type": "function",
"function": {
"name": "set_chat_summary",
"description": "Set the title/summary for this chat message. You should always call this message at the end of the turn when you have finished calling all the other tools.",
"description": "Set the title/summary for this chat message. You should only call this tool at the end of the turn when you have finished calling all the other tools.",
"parameters": {
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
......
......@@ -337,7 +337,7 @@
"type": "function",
"function": {
"name": "set_chat_summary",
"description": "Set the title/summary for this chat message. You should always call this message at the end of the turn when you have finished calling all the other tools.",
"description": "Set the title/summary for this chat message. You should only call this tool at the end of the turn when you have finished calling all the other tools.",
"parameters": {
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
......
......@@ -14,3 +14,7 @@ Agent tool definitions live in `src/pro/main/ipc/handlers/local_agent/tools/`. E
## Stream retries
- When extending `handleLocalAgentStream` retry behavior, do not only match transport errors like `"terminated"`. Providers can emit structured stream errors such as `{ type: "error", error: { type: "server_error", ... } }`, and those transient 5xx / rate-limit failures need explicit retry classification too.
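The classification described in this rule could look roughly like the sketch below. It is not the handler's actual code: `isRetryableStreamError`, `RETRYABLE_ERROR_TYPES`, the extra error type names, and the `status` field are all assumptions made for illustration.

```typescript
// Hypothetical classifier; names and the retryable set are illustrative assumptions.
const RETRYABLE_ERROR_TYPES = new Set<string>([
  "server_error",
  "rate_limit_error",
  "overloaded_error",
]);

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

function isRetryableStreamError(err: unknown): boolean {
  // Transport-level failures usually surface as a plain message.
  if (typeof err === "string") return err.includes("terminated");
  if (err instanceof Error) return err.message.includes("terminated");
  if (!isRecord(err)) return false;

  // Structured provider errors: { type: "error", error: { type: "server_error", ... } }
  const inner = isRecord(err.error) ? err.error : err;
  if (typeof inner.type === "string" && RETRYABLE_ERROR_TYPES.has(inner.type)) {
    return true;
  }

  // Assumed numeric status field: treat 429 and 5xx as transient.
  const status = inner.status;
  return typeof status === "number" && (status === 429 || (status >= 500 && status < 600));
}
```

Checking the structured shape separately from the message string keeps a provider-side 5xx from being treated as a permanent failure just because its text does not contain `"terminated"`.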
## Metadata-only stop tools
- If a metadata-only tool such as `set_chat_summary` is added to `stopWhen`, audit downstream pass gates that inspect the final step's `toolCalls`. A final metadata tool call should not suppress safety follow-up passes such as incomplete todo reminders.
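One way to carry out that audit is to filter metadata-only tool names before a gate inspects the final step. A minimal sketch, assuming a hypothetical `METADATA_ONLY_TOOLS` set and `endedWithoutSubstantiveToolCalls` helper; only the `set_chat_summary` name comes from this change, and `write_file` below is a made-up tool name.

```typescript
// Hypothetical names for illustration; only "set_chat_summary" comes from the change itself.
const METADATA_ONLY_TOOLS = new Set<string>(["set_chat_summary"]);

type StepLike = { toolCalls: Array<{ toolName: string }> };

// True when the final step made no tool calls other than metadata-only ones,
// so a follow-up gate can treat the pass as having ended with text.
function endedWithoutSubstantiveToolCalls(steps: StepLike[]): boolean {
  const lastStep = steps.length > 0 ? steps[steps.length - 1] : null;
  if (!lastStep) return true;
  return lastStep.toolCalls.every((call) => METADATA_ONLY_TOOLS.has(call.toolName));
}

// A final step that only sets the summary still counts as a text ending.
console.log(
  endedWithoutSubstantiveToolCalls([{ toolCalls: [{ toolName: "set_chat_summary" }] }]),
);
// A final step that also does real work does not.
console.log(
  endedWithoutSubstantiveToolCalls([
    { toolCalls: [{ toolName: "write_file" }, { toolName: "set_chat_summary" }] },
  ]),
);
```

Keeping the names in one set means a later metadata-only stop tool needs a single addition rather than a new branch in every gate.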
......@@ -33,7 +33,7 @@ If you output one of these commands, tell the user to look for the action button
- All edits you make on the codebase will directly be built and rendered, therefore you should NEVER make partial changes like letting the user know that they should implement some components or partially implementing features.
- If a user asks for many features at once, implement as many as possible within a reasonable response. Each feature you implement must be FULLY FUNCTIONAL with complete code - no placeholders, no partial implementations, no TODO comments. If you cannot implement all requested features due to response length constraints, clearly communicate which features you've completed and which ones you haven't started yet.
- Prioritize creating small, focused files and components.
- Set a chat summary at the end using the \`set_chat_summary\` tool.
- Set a chat summary at the end of a turn using the \`set_chat_summary\` tool.
- Avoid over-engineering. Only make changes that are directly requested or clearly necessary. Keep solutions simple and focused.
- Don't add features, refactor code, or make "improvements" beyond what was asked. A bug fix doesn't need surrounding code cleaned up. A simple feature doesn't need extra configurability. Don't add docstrings, comments, or type annotations to code you didn't change. Only add comments where the logic isn't self-evident.
- Don't add error handling, fallbacks, or validation for scenarios that can't happen. Trust internal code and framework guarantees. Only validate at system boundaries (user input, external APIs). Don't use feature flags or backwards-compatibility shims when you can just change the code.
......@@ -217,7 +217,7 @@ If you output one of these commands, tell the user to look for the action button
- All edits you make on the codebase will directly be built and rendered, therefore you should NEVER make partial changes like letting the user know that they should implement some components or partially implementing features.
- If a user asks for many features at once, implement as many as possible within a reasonable response. Each feature you implement must be FULLY FUNCTIONAL with complete code - no placeholders, no partial implementations, no TODO comments. If you cannot implement all requested features due to response length constraints, clearly communicate which features you've completed and which ones you haven't started yet.
- Prioritize creating small, focused files and components.
- Set a chat summary at the end using the \`set_chat_summary\` tool.
- Set a chat summary at the end of a turn using the \`set_chat_summary\` tool.
- Avoid over-engineering. Only make changes that are directly requested or clearly necessary. Keep solutions simple and focused.
- Don't add features, refactor code, or make "improvements" beyond what was asked. A bug fix doesn't need surrounding code cleaned up. A simple feature doesn't need extra configurability. Don't add docstrings, comments, or type annotations to code you didn't change. Only add comments where the logic isn't self-evident.
- Don't add error handling, fallbacks, or validation for scenarios that can't happen. Trust internal code and framework guarantees. Only validate at system boundaries (user input, external APIs). Don't use feature flags or backwards-compatibility shims when you can just change the code.
......
......@@ -1367,6 +1367,135 @@ describe("handleLocalAgentStream", () => {
});
});
describe("Todo follow-up", () => {
it("runs a follow-up pass when the first pass ends with set_chat_summary and incomplete todos remain", async () => {
// Arrange
const { event } = createFakeEvent();
mockSettings = buildTestSettings({ enableDyadPro: true });
mockChatData = buildTestChat();
vi.mocked(buildAgentToolSet).mockImplementation((ctx) => {
return {
update_todos: {
execute: async (args: any) => {
if (args.merge) {
const todosById = new Map(
ctx.todos.map((todo) => [todo.id, todo]),
);
for (const todo of args.todos) {
const existing = todosById.get(todo.id);
todosById.set(
todo.id,
existing ? { ...existing, ...todo } : todo,
);
}
ctx.todos = Array.from(todosById.values());
} else {
ctx.todos = args.todos;
}
ctx.onUpdateTodos(ctx.todos);
return "Updated todos";
},
},
} as any;
});
const streamMessagesByPass: any[][] = [];
let passCount = 0;
mockStreamTextImpl = (options) => {
passCount += 1;
streamMessagesByPass.push(options.messages ?? []);
if (passCount === 1) {
return {
fullStream: (async function* () {
yield { type: "text-delta", text: "I started the work." };
await options.tools.update_todos.execute({
merge: false,
todos: [
{
id: "todo-1",
content: "Finish the requested work",
status: "pending",
},
],
});
})(),
response: Promise.resolve({
messages: [
{
role: "assistant",
content: [{ type: "text", text: "I started the work." }],
},
],
}),
steps: Promise.resolve([
{
toolCalls: [{ toolName: "set_chat_summary" }],
response: {
messages: [
{
role: "assistant",
content: [{ type: "text", text: "I started the work." }],
},
],
},
},
]),
};
}
return {
fullStream: (async function* () {
await options.tools.update_todos.execute({
merge: true,
todos: [{ id: "todo-1", status: "completed" }],
});
yield { type: "text-delta", text: "Finished the work." };
})(),
response: Promise.resolve({
messages: [
{
role: "assistant",
content: [{ type: "text", text: "Finished the work." }],
},
],
}),
steps: Promise.resolve([{ toolCalls: [] }]),
};
};
// Act
await handleLocalAgentStream(
event,
{ chatId: 1, prompt: "test" },
new AbortController(),
{
placeholderMessageId: 10,
systemPrompt: "You are helpful",
dyadRequestId,
},
);
// Assert
expect(passCount).toBe(2);
const secondPassMessages = streamMessagesByPass[1] ?? [];
const hasTodoReminder = secondPassMessages.some(
(message: any) =>
message.role === "user" &&
Array.isArray(message.content) &&
message.content.some(
(part: any) =>
part.type === "text" &&
typeof part.text === "string" &&
part.text.includes("incomplete todo(s)") &&
part.text.includes("Finish the requested work"),
),
);
expect(hasTodoReminder).toBe(true);
});
});
describe("Abort handling", () => {
it("should stop processing stream chunks when abort signal is triggered", async () => {
// Arrange
......
......@@ -95,6 +95,7 @@ import {
maybeCaptureRetryReplayText,
maybeAppendRetryReplayForRetry,
} from "./retry_replay_utils";
import { setChatSummaryTool } from "./tools/set_chat_summary";
const logger = log.scope("local_agent_handler");
const PLANNING_QUESTIONNAIRE_TOOL_NAME = "planning_questionnaire";
......@@ -713,6 +714,9 @@ export async function handleLocalAgentStream(
tools: allTools,
stopWhen: [
stepCountIs(maxToolCallSteps),
// We instruct AI to only emit set chat summary tool call at the end of the turn.
hasToolCall(setChatSummaryTool.name),
// User needs to explicitly set up integration before AI can continue.
hasToolCall(addIntegrationTool.name),
// In plan mode, also stop after writing a plan or exiting plan mode.
...(planModeOnly
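To make the effect of the new stop condition concrete, here is a sketch of how a list of stop conditions might be evaluated after each step. The `stepCountIs` and `hasToolCall` functions below are simplified stand-ins written for this example, not the AI SDK's own code; `shouldStop`, the step limit of 3, and the `read_file` tool name are likewise assumptions. The sketch assumes the list stops the loop as soon as any one condition holds.

```typescript
// Simplified stand-ins for stop conditions; not the AI SDK's implementation.
type Step = { toolCalls: Array<{ toolName: string }> };
type StopCondition = (state: { steps: Step[] }) => boolean;

// Stop once the number of completed steps reaches the limit.
const stepCountIs = (count: number): StopCondition => ({ steps }) => steps.length >= count;

// Stop when the most recent step called the named tool.
const hasToolCall = (toolName: string): StopCondition => ({ steps }) => {
  const last = steps[steps.length - 1];
  return last !== undefined && last.toolCalls.some((call) => call.toolName === toolName);
};

// Assumed any-match semantics: one satisfied condition ends the loop.
function shouldStop(conditions: StopCondition[], steps: Step[]): boolean {
  return conditions.some((condition) => condition({ steps }));
}

// Illustrative limit of 3 steps; the real maxToolCallSteps value is not shown in the diff.
const conditions: StopCondition[] = [stepCountIs(3), hasToolCall("set_chat_summary")];

const working: Step = { toolCalls: [{ toolName: "read_file" }] };
const summary: Step = { toolCalls: [{ toolName: "set_chat_summary" }] };

console.log(shouldStop(conditions, [working])); // false: keep going
console.log(shouldStop(conditions, [working, summary])); // true: summary ends the turn
```

Under these assumptions a premature `set_chat_summary` call would end the turn early, which is why the tool description now tells the model to call it only at the end.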
......@@ -1203,11 +1207,16 @@ export async function handleLocalAgentStream(
}
// Check if the model ended with text only (no tool calls in the final step).
// A final set_chat_summary call is end-of-turn metadata, so it should not
// suppress the todo safety follow-up when the pass already produced text.
// This is more reliable than passProducedChatText which is set on any text-delta
// during the stream (including preambles before tool calls).
const lastStep = steps.length > 0 ? steps[steps.length - 1] : null;
const passEndedWithText =
passProducedChatText && (!lastStep || lastStep.toolCalls.length === 0);
passProducedChatText &&
(!lastStep ||
lastStep.toolCalls.length === 0 ||
stepOnlyCalledTool(lastStep, setChatSummaryTool.name));
if (
!shouldRunTodoFollowUpPass({
......@@ -1559,6 +1568,18 @@ function isRecord(value: unknown): value is Record<string, unknown> {
return typeof value === "object" && value !== null;
}
function stepOnlyCalledTool(
step: { toolCalls: Array<unknown> },
toolName: string,
): boolean {
return (
step.toolCalls.length > 0 &&
step.toolCalls.every(
(toolCall) => isRecord(toolCall) && toolCall.toolName === toolName,
)
);
}
function shouldRunTodoFollowUpPass(params: {
readOnly: boolean;
planModeOnly: boolean;
......
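The revised `passEndedWithText` check can be exercised in isolation. The sketch below copies `isRecord` and `stepOnlyCalledTool` from the hunks above and wraps the boolean expression in a hypothetical `passEndedWithText` function (in the handler it is an inline `const`) with the tool name hardcoded, so the edge cases are easy to see.

```typescript
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

function stepOnlyCalledTool(
  step: { toolCalls: Array<unknown> },
  toolName: string,
): boolean {
  return (
    step.toolCalls.length > 0 &&
    step.toolCalls.every(
      (toolCall) => isRecord(toolCall) && toolCall.toolName === toolName,
    )
  );
}

// Hypothetical wrapper around the inline expression, for illustration only.
function passEndedWithText(
  passProducedChatText: boolean,
  lastStep: { toolCalls: Array<unknown> } | null,
): boolean {
  return (
    passProducedChatText &&
    (!lastStep ||
      lastStep.toolCalls.length === 0 ||
      stepOnlyCalledTool(lastStep, "set_chat_summary"))
  );
}

const summaryOnly = { toolCalls: [{ toolName: "set_chat_summary" }] };
const mixed = {
  toolCalls: [{ toolName: "update_todos" }, { toolName: "set_chat_summary" }],
};

console.log(passEndedWithText(true, summaryOnly)); // true: the summary alone does not block the follow-up
console.log(passEndedWithText(true, mixed)); // false: the step also called a real tool
console.log(passEndedWithText(false, summaryOnly)); // false: the pass produced no chat text
```

The `length > 0` guard in `stepOnlyCalledTool` matters because `every` returns `true` for an empty array; the empty case is already handled by the separate `toolCalls.length === 0` branch.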
......@@ -13,7 +13,7 @@ export const setChatSummaryTool: ToolDefinition<
> = {
name: "set_chat_summary",
description:
"Set the title/summary for this chat message. You should always call this message at the end of the turn when you have finished calling all the other tools.",
"Set the title/summary for this chat message. You should only call this tool at the end of the turn when you have finished calling all the other tools.",
inputSchema: setChatSummarySchema,
defaultConsent: "always",
......
......@@ -41,7 +41,7 @@ ${COMMON_GUIDELINES}
- All edits you make on the codebase will directly be built and rendered, therefore you should NEVER make partial changes like letting the user know that they should implement some components or partially implementing features.
- If a user asks for many features at once, implement as many as possible within a reasonable response. Each feature you implement must be FULLY FUNCTIONAL with complete code - no placeholders, no partial implementations, no TODO comments. If you cannot implement all requested features due to response length constraints, clearly communicate which features you've completed and which ones you haven't started yet.
- Prioritize creating small, focused files and components.
- Set a chat summary at the end using the \`set_chat_summary\` tool.
- Set a chat summary at the end of a turn using the \`set_chat_summary\` tool.
- Avoid over-engineering. Only make changes that are directly requested or clearly necessary. Keep solutions simple and focused.
- Don't add features, refactor code, or make "improvements" beyond what was asked. A bug fix doesn't need surrounding code cleaned up. A simple feature doesn't need extra configurability. Don't add docstrings, comments, or type annotations to code you didn't change. Only add comments where the logic isn't self-evident.
- Don't add error handling, fallbacks, or validation for scenarios that can't happen. Trust internal code and framework guarantees. Only validate at system boundaries (user input, external APIs). Don't use feature flags or backwards-compatibility shims when you can just change the code.
......