
My point is that AI inference is a memory trade. Batching helps until it doesn't. Beyond a certain batch size, the KV cache takes over as the limiting factor: every extra user and every extra context token adds memory that must be read again and again during decode. Memory bandwidth bot
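To make the batching argument concrete, here is a rough back-of-envelope sketch in Python. Nothing in it comes from the post: the model shape (32 layers, 8 KV heads, head dimension 128, 2-byte elements), the 16 GB of weights, and the 2 TB/s of memory bandwidth are all hypothetical, and the function names are made up for illustration. It also assumes decode is purely bandwidth-bound, that every user has the same context length, and that each decode step reads the weights once and the whole KV cache once.

```python
# Back-of-envelope model of decode throughput vs. batch size.
# All default values below are hypothetical, chosen only for illustration.

def kv_cache_bytes(batch, context_tokens, layers=32, kv_heads=8,
                   head_dim=128, bytes_per_elem=2):
    """Total KV cache size for `batch` users with `context_tokens` each."""
    # Factor 2: one key tensor and one value tensor per layer.
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem
    return per_token * context_tokens * batch


def decode_step_seconds(batch, context_tokens, weight_bytes=16e9,
                        bandwidth_bytes_per_s=2e12):
    """Lower bound on one decode step, assuming it is bandwidth-bound.

    The weights are read once per step and shared by the whole batch;
    the KV cache is read once per step and grows with every user.
    """
    total_bytes = weight_bytes + kv_cache_bytes(batch, context_tokens)
    return total_bytes / bandwidth_bytes_per_s


def tokens_per_second(batch, context_tokens):
    """Aggregate throughput: each step emits one token per user."""
    return batch / decode_step_seconds(batch, context_tokens)


if __name__ == "__main__":
    context = 8192
    for batch in (1, 2, 8, 32, 64, 128, 256):
        kv_gib = kv_cache_bytes(batch, context) / 2**30
        print(f"batch={batch:4d}  kv_cache={kv_gib:7.1f} GiB  "
              f"throughput={tokens_per_second(batch, context):8.1f} tok/s")
```

Under these assumptions, small batches nearly double throughput when doubled, because the fixed cost of reading the weights is shared across more users. At large batches the KV cache term dominates the bytes read per step, so throughput flattens toward bandwidth divided by the per-user KV cache size, no matter how many more users are added.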

@tengyanai · June 1, 2026