Self-Harness Optimization: Self-Learning AI Harness

Core Thesis

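Harness engineering has been the AI community's new paradigm since 2026-06-08. Anthropic's Boris said it himself: "stop prompting Claude, build loops that prompt themselves". Anthropic is moving "intelligence" out of the LLM and into the harness (cf. the Fable 5 phenomenon: a 70% refusal rate, read as the same model placed in a new harness). Shanghai AI Lab proposes Self-Harness Optimization, in which the LLM optimizes its own harness without changing its weights. It uses a frozen model and a systematic loop: detect failure patterns → propose → validate → update.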

🔥 Fable 5 Complaints (Background)

💬 Hugging Face CTO's Take

"Super high frequency of refusals may backfire on Fable/Anthropic"
"Why pay 2x if 70% probability of refusal?"
"I have 70%+ refusal for science tasks"
Blocked topics: medical / chemistry / biology / cybersecurity
"Hey, can you explain the heart of a human being?" → flagged for cybersecurity risk

💡 Discover AI's Key Hypothesis

"Is Fable 5 really a new LLM, or since it's input a vision language model — is this really a new object or has it just a better harness?"

"Tropic is moving the intelligence out from their LLM system into the harness system."

📊 Fable 5 vs Claude 4.6

Task type	Fable 5	Claude 4.6
Video games	✅ Good	✅ Good
HTML generation	✅ Good	✅ Good
Science tasks	❌ 70%+ refusal	✅ Works
Price	2x	1x

🏗️ Scaffold vs Harness (Core Distinction)

Paper Definition (Shanghai AI Lab)

"We use the term harness to denote the non-parametric scaffolding that governs how a fixed language model is deployed as an agent."

In short: the paper calls "everything that is not the LLM" the harness, and that includes the scaffold.

Discover AI's Own (Finer-Grained) Distinction

Dimension	Scaffold	Harness
Nature	Expertise as workflow	Execution substrate
Question it answers	"How should this problem be solved?"	"With which tool? How often to retry? When to stop?"
Includes	Skills + workflows	Tool calling / file I/O / code exec / memory / loop control
Analogy	OSC: knowledge	OSC: runtime layer
Source of expertise	✅ Yes	❌ No (only an operational layer)

Discover AI's Framework

"LLM running inside an iterative loop, governed by a harness, and shaped by a scaffold of skills and workflows."

4 Harness Optimization Elements

1. Prompts

System / execution / verification / failure-recovery instructions

2. Tools

Tool calling / file I/O / code execution

3. Memory

Context compaction / past experience

4. Policy

Runtime control / routing
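A frozen dataclass is one way to hold these four element groups as editable, non-parametric state. This is a sketch under our own assumptions: the field names, the keys inside each dict, and the example values are illustrative and do not come from the paper.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Harness:
    """Everything around a frozen LLM that an optimizer may edit (no weights)."""
    # 1. Prompts: system / execution / verification / failure-recovery instructions
    prompts: dict[str, str] = field(default_factory=dict)
    # 2. Tools: names of callable tools (tool calling, file I/O, code execution)
    tools: tuple[str, ...] = ()
    # 3. Memory: e.g. a context-compaction rule, sources of past experience
    memory: dict[str, str] = field(default_factory=dict)
    # 4. Policy: runtime control and routing settings
    policy: dict[str, str] = field(default_factory=dict)


# Illustrative starting harness (all values invented).
initial = Harness(
    prompts={"system": "You are a terminal agent.",
             "verification": "Run the tests before declaring success."},
    tools=("bash", "read_file", "write_file"),
    memory={"compaction": "Summarize tool output longer than 200 lines."},
    policy={"max_tool_calls": "60", "route": "default-model"},
)
```

The instance is frozen, so every accepted edit produces a new `Harness` (for example via `dataclasses.replace`). The previous version therefore stays available for rollback.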

🔁 The 4-Step Self-Harness Loop

Evaluate

Collect execution traces · Cluster failed records by verifier-grounded failure signatures · Rank the clusters by actionability

Propose

The frozen LLM (M), given the current harness, acts as the proposer · It receives the failure patterns, the passing behavior, and the previously attempted edits · It proposes minimal candidate modifications (baby steps)

Validate

The same evaluator scores the new proposal · Pass → accept · Fail → reject

Update

Accepted → update the harness · Otherwise → keep the previous version
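The four steps above can be sketched as a plain hill-climbing loop. This is a minimal illustration under stated assumptions, not the Shanghai AI Lab implementation. The harness is a dict mapping element name to instruction text. `evaluate` returns a pass rate plus ranked failure signatures, and `propose` stands in for the frozen LLM. A candidate is accepted only if it strictly beats the current pass rate, which is our reading of "Pass → accept". The toy evaluator and proposer at the bottom are invented for demonstration.

```python
from typing import Callable

Harness = dict[str, str]  # element name -> instruction text
# evaluate(harness) -> (pass rate, failure signatures ranked by actionability)
Evaluate = Callable[[Harness], tuple[float, list[str]]]
# propose(harness, failures, attempted_edits) -> candidate harness (frozen LLM stand-in)
Propose = Callable[[Harness, list[str], list[Harness]], Harness]


def self_harness_optimize(harness: Harness, evaluate: Evaluate,
                          propose: Propose, rounds: int) -> tuple[Harness, float]:
    score, failures = evaluate(harness)                    # 1. Evaluate
    attempted: list[Harness] = []
    for _ in range(rounds):
        if not failures:
            break
        candidate = propose(harness, failures, attempted)  # 2. Propose
        new_score, new_failures = evaluate(candidate)      # 3. Validate (same evaluator)
        attempted.append(candidate)
        if new_score > score:                              # 4. Update on improvement
            harness, score, failures = candidate, new_score, new_failures
        # otherwise the previous harness is kept unchanged
    return harness, score


# --- Toy demo (invented): a task "passes" if its keyword appears in the harness.
TASKS = ["verify", "summarize", "timeout"]


def toy_evaluate(h: Harness) -> tuple[float, list[str]]:
    text = " ".join(h.values())
    failed = [t for t in TASKS if t not in text]
    return (len(TASKS) - len(failed)) / len(TASKS), failed


def toy_propose(h: Harness, failures: list[str], attempted: list[Harness]) -> Harness:
    # Baby step: address only the top-ranked failure with one appended sentence.
    new = dict(h)
    new["execution"] = (h.get("execution", "") + f" Always {failures[0]}.").strip()
    return new
```

Running `self_harness_optimize({"system": "You are a terminal agent."}, toy_evaluate, toy_propose, rounds=5)` fixes one failure per accepted round. A proposer that never changes anything leaves the harness and score untouched.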

Key Design: Baby Steps

"Since it's a self-learning procedure, you know this is like a cool back library versions, you don't want to diverse too much here from your given probability distribution. So, you just make baby steps."

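Why not make large changes: it keeps the distribution stable · it avoids catastrophic forgetting · it supports local optimization (small edits are easy to validate)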
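The quote uses KL divergence as an analogy. A harness is text, not a distribution, so one practical proxy is to cap how much each proposal may change. The sketch below shows one way such a guard could work, under our own assumptions: at most one element may change, and only by a fixed number of words. Edit size is measured with Python's standard `difflib.SequenceMatcher`. Both thresholds are hypothetical.

```python
import difflib


def changed_words(old: str, new: str) -> int:
    """Count words inserted, deleted, or replaced between two instruction texts."""
    sm = difflib.SequenceMatcher(None, old.split(), new.split())
    return sum(max(i2 - i1, j2 - j1)
               for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal")


def is_baby_step(old: dict[str, str], new: dict[str, str],
                 max_elements: int = 1, max_changed_words: int = 30) -> bool:
    """Accept only small, local edits to the harness (hypothetical thresholds)."""
    changed = [k for k in old.keys() | new.keys() if old.get(k) != new.get(k)]
    if len(changed) > max_elements:
        return False
    return all(changed_words(old.get(k, ""), new.get(k, "")) <= max_changed_words
               for k in changed)
```

A guard like this would sit between Propose and Validate. Oversized proposals are then rejected before a full evaluation run is spent on them.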

📊 Clustering & Ranking

Cluster Structure (per cluster)

Field	Meaning
Cluster size	Number of failed records
Representative task	A representative task
Shared trace symptoms	Symptoms shared across the traces
Verifier evidence	Evidence from the verifier
Agent mechanisms	Agent behaviors involved
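The sketch below groups failed records into clusters carrying the five fields above. It assumes each failed trace has already been reduced to a `FailureRecord` with a verifier-grounded signature string. The record type is hypothetical, and so are two other choices that are ours rather than the paper's: the first record serves as the representative task, and shared symptoms are the intersection across the cluster.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureRecord:          # hypothetical summary of one failed trace
    task: str
    signature: str            # verifier-grounded failure signature
    symptoms: frozenset[str]  # symptoms observed in the trace
    verifier_evidence: str
    mechanism: str            # agent behavior behind the failure


@dataclass
class FailureCluster:
    signature: str
    size: int                 # number of failed records
    representative_task: str
    shared_symptoms: frozenset[str]
    verifier_evidence: list[str]
    agent_mechanisms: set[str]


def cluster_failures(records: list[FailureRecord]) -> list[FailureCluster]:
    groups: dict[str, list[FailureRecord]] = defaultdict(list)
    for r in records:
        groups[r.signature].append(r)
    return [
        FailureCluster(
            signature=sig,
            size=len(rs),
            representative_task=rs[0].task,
            # symptoms present in every trace of the cluster
            shared_symptoms=frozenset.intersection(*(r.symptoms for r in rs)),
            verifier_evidence=[r.verifier_evidence for r in rs],
            agent_mechanisms={r.mechanism for r in rs},
        )
        for sig, rs in groups.items()
    ]
```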

Ranking Principle

"Order clusters by estimated actionability — what can we solve immediately, what's simplest to correct, how can we optimize performance immediately? We don't waste 30-min or 1-hour runs to find what's wrong."

The Proposal Includes

Verifier-grounded failure patterns (the detected problems)
Passing behavior (what works)
Summary of previously attempted edits (what has already been tried)
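The sketch below assembles those three inputs, plus the current harness, into the proposer's prompt. The function name and all prompt wording are hypothetical. Only the three input categories come from the source.

```python
def build_proposal_prompt(harness: dict[str, str], failure_patterns: list[str],
                          passing_behavior: list[str], attempted_edits: list[str]) -> str:
    """Assemble the frozen LLM's context for one Propose step (wording is illustrative)."""
    def bullets(items: list[str]) -> str:
        # An empty list renders as an explicit "(none)" bullet.
        return "\n".join(f"- {x}" for x in items) or "- (none)"

    sections = [
        "You are improving an agent harness. Model weights are fixed.",
        "Current harness:\n" + "\n".join(f"[{k}] {v}" for k, v in harness.items()),
        "Verifier-grounded failure patterns (highest actionability first):\n"
        + bullets(failure_patterns),
        "Passing behavior to preserve:\n" + bullets(passing_behavior),
        "Previously attempted edits (do not repeat):\n" + bullets(attempted_edits),
        "Propose the smallest edit to ONE harness element that addresses the "
        "first failure pattern.",
    ]
    return "\n\n".join(sections)
```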

The 10 Initial Harness Elements

#	Element
1	System prompt
2	Memory source
3	Sub-agent
4	Skills
5	Bootstrap instruction
6	Execution instruction
7	Verification instruction
8	Failure recovery instruction
9	Runtime control policy
10	Routing policy

→ Self-harness optimization edits exactly these 10 elements
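The optimizer edits only these ten elements, so one simple safeguard is to make the list explicit and reject edits aimed at anything else. The snake_case identifiers and `apply_edit` below are hypothetical. The sketch returns a new version instead of mutating the old one, so the Update step can keep the previous harness.

```python
HARNESS_ELEMENTS = (
    "system_prompt", "memory_source", "sub_agent", "skills",
    "bootstrap_instruction", "execution_instruction", "verification_instruction",
    "failure_recovery_instruction", "runtime_control_policy", "routing_policy",
)


def apply_edit(harness: dict[str, str], element: str, new_text: str) -> dict[str, str]:
    """Return a new harness version with exactly one element replaced."""
    if element not in HARNESS_ELEMENTS:
        raise ValueError(f"not an editable harness element: {element!r}")
    updated = dict(harness)  # copy: the previous version stays intact
    updated[element] = new_text
    return updated
```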

💡 Example Failure Case

Scenario: the agent is stuck in a tool loop (after 52 calls)

❌ Old behavior

"继续重复 same failed path，stuck in loop"

✅ Self-correction (the harness is edited automatically)

"You appear to be stuck in a tool loop. Stop repeating the same failed path. Summarize the evidence already collected. Choose the smallest remaining implementation step and then run one targeted verification."

💡 Key Capability

The LLM weights are not changed; only the harness instructions are → immediately readable / immediately effective

Faster than fine-tuning + more structured than prompting

📈 Performance Data (Terminal Bench 2)

89 containerized terminal tasks · tool-based execution

Model	Initial pass rate	After self-optimization
MiniMax M2.5	42%	53%
Q&A 3.5	20%	36%
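The gains are easier to compare in both absolute and relative terms. Here is a small worked calculation that uses only the pass rates above:

```python
def gains(before_pct: int, after_pct: int) -> tuple[int, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    delta = after_pct - before_pct
    return delta, round(100 * delta / before_pct, 1)


# Pass rates (%) on Terminal Bench 2, as reported above.
results = {"MiniMax M2.5": (42, 53), "Q&A 3.5": (20, 36)}

for model, (before, after) in results.items():
    pp, rel = gains(before, after)
    print(f"{model}: +{pp} pp ({rel}% relative)")
# MiniMax M2.5: +11 pp (26.2% relative)
# Q&A 3.5: +16 pp (80.0% relative)
```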

⚠️ Limitations

"Of course, since the tasks are rather simple and you have a simple harness, a self-evolving harness, these are the easy wins."

"For a higher complexity system, there's nothing particular to it. It is more or less more of the same."

🔗 Connections to Patrick's Work

🔗 Sam Altman Stanford

Sam hints that Anthropic is "moving intelligence into the harness" (the Fable 5 phenomenon)

Sam "inference underinvested" → 被验证：harness engineering 是 inference 高阶

🔗 K1 (Agent Native KG)

K1 = an excellent example of "harness design" (17 MCP tools + retrieval from 3 sources)

Self-harness = the next step: letting the harness evolve itself

K1 could add a self-harness loop → automatically detect failures → optimize the CLI

🔗 Claude Code / IndyDevDan

IndyDevDan "Claude Code = LLM + 迭代 loop + harness"

"Milestone + commit" = harness 的人工版本

Self-harness = automated milestone + commit

🔗 hybrid-llm-router ⭐

4 harness elements: prompts / tools / memory / policy → memory + policy are the key ones for routing

Self-harness optimization can optimize the "policy" element → a self-learning router policy

"估计 actionability" 排序 = routing decision log 分析

🔗 OpenSwarm

OpenSwarm = multiple harnesses collaborating

Each harness can learn on its own → a self-evolving multi-agent system

"OpenSkill"（Discover AI 之前）= self-evolution skill loop

🗺️ Patrick's Practical Roadmap

Do Now (30 minutes)

Add a case study (self-harness optimization) to the hybrid-llm-router skill docs
Collect traces from existing OpenSwarm / hybrid-llm-router runs
Cluster failure patterns by hand (to learn the approach)

Medium Term (1-2 weeks)

Implement a self-harness loop (on top of the hybrid-llm-router framework)
Work through the 4 elements (prompts/tools/memory/policy) in cluster-ranked order
Baby-step validation

Long Term (1+ month)

Self-evolving multi-agent system（OpenSwarm + self-harness）
Share failure patterns across harnesses (collective learning)
Integrate into the production LiteLLM proxy

💡 Boris's 6/8 Announcement (Anthropic)

"Stop prompting Claude and build loops that prompt them self."

→ Harness engineering = the new paradigm after 2026-06-08

→ Self-harness = the self-improving version of the loop