Claude Code got dynamic workflows, so I opened up the JS and took a look
Claude Code shipped dynamic workflows on 5/28, the same day as Opus 4.8. More important than the headline number of "up to 1,000 subagents" is that the orchestration JS is written by Claude, not run by Claude, and that is worth thinking through.
Claude Code's new dynamic workflows hand the orchestration plan over to a JavaScript script that Claude writes. The runtime executes it with up to 1,000 subagents — 16 concurrent — and Claude's context only sees the final cross-checked answer.
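Ask an AI how many r's are in "strawberry" and it used to answer wrong with total confidence. Newer models mostly get it right now, but why did it get it wrong in the first place? A chat built around a building-block analogy, plus why that underlying cause still hasn't really gone away.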
AI's most dangerous trait isn't that it's wrong sometimes. It's that its tone when wrong is identical to its tone when right. Here's my plain-language take on why, including why it won't just say 'I don't know'.
Not a benchmark or a verdict on which AI is best — just the small habits I picked up from keeping ChatGPT, Claude, and Gemini all open: route by task, give context first, don't expect one perfect answer, and verify the confident-sounding stuff.
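The most misleading thing about AI isn't that it gets things wrong; it's that its tone when wrong is exactly the same as its tone when right. A plain-language look, through the lens of "it keeps guessing the next most fluent word", at why sounding certain is not the same as knowing.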
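No grand theory here, just a few small habits I picked up after using ChatGPT, Gemini, and Claude side by side for a while: send different tasks to different tools, explain things clearly before asking, and don't expect to get it right in one shot.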
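GSM8k at 99%, MMLU just over 90, and HLE already into the 40s by mid-2026. Every new "harder benchmark" looks like it is solving the problem, but the structural issue hasn't changed: we have never verified what a model has learned, only measured whether it has seen the material before.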
GSM8k at 99%, MMLU in the 88-94% noise band, HLE already in the mid-40s by mid-2026. Each round of harder benchmarks looks like progress, but the field never solved the underlying problem: we measure correlation with a test distribution and call it capability.
A relaxed take on Python list comprehensions: translate them back into the equivalent for-loop, and check what's actually true about variable leaking and speed on Python 3.14.
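A minimal sketch of the translation that blurb describes, assuming nothing beyond the standard language. The names `squares`, `squares_loop`, `comprehension_variable_leaks`, and `loop_variable_leaks` are illustrative and not taken from the post. It covers only the rewrite and the scoping question, and makes no claim about speed.

```python
# Comprehension form: filter and transform in one expression.
squares = [n * n for n in range(5) if n % 2 == 0]

# The same logic written as an explicit for-loop.
squares_loop = []
for n in range(5):
    if n % 2 == 0:
        squares_loop.append(n * n)


def comprehension_variable_leaks():
    """Return True if the comprehension's loop variable is visible afterwards."""
    [item_c for item_c in range(3)]
    try:
        item_c  # lookup fails in Python 3: the name is local to the comprehension
        return True
    except NameError:
        return False


def loop_variable_leaks():
    """Return True if the for-loop's variable is visible afterwards."""
    for item_l in range(3):
        pass
    try:
        item_l  # a plain for-loop variable stays bound in the enclosing scope
        return True
    except NameError:
        return False
```

Calling the two helper functions shows the difference in scoping: in Python 3 the comprehension variable does not survive the expression, while the plain loop variable does.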