
My point is that AI inference is a memory trade. Batching helps until it doesn’t. Beyond a certain batch size, the KV cache becomes the limiting factor: every extra user and every extra context token adds cache memory that must be re-read on every decode step. Memory bandwidth bot…

@tengyanai · June 1, 2026
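To make the trade-off concrete, here is a back-of-the-envelope sketch in Python. It rests on assumptions that do not come from the post. Decode is treated as purely memory-bandwidth-bound. Each step is modeled as streaming the model weights once for the whole batch plus every sequence's full KV cache. The `ModelConfig` defaults and the `BANDWIDTH` value are hypothetical numbers chosen only for illustration. Under those assumptions, throughput rises with batch size but flattens toward a ceiling set by KV-cache reads alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    # Hypothetical numbers for illustration only, roughly shaped like an
    # ~8B-parameter decoder with grouped-query attention.
    n_layers: int = 32
    n_kv_heads: int = 8
    head_dim: int = 128
    bytes_per_elem: int = 2      # fp16/bf16 KV cache entries
    weight_bytes: float = 16e9   # ~8B parameters at 2 bytes each


BANDWIDTH = 3.35e12  # assumed memory bandwidth in bytes/s (hypothetical)


def kv_bytes_per_token(cfg: ModelConfig) -> int:
    # One K and one V vector per layer per KV head for every cached token.
    return 2 * cfg.n_layers * cfg.n_kv_heads * cfg.head_dim * cfg.bytes_per_elem


def decode_tokens_per_sec(cfg: ModelConfig, batch: int, context_len: int,
                          bandwidth: float = BANDWIDTH) -> float:
    # Memory-bound model of one decode step: the weights are read once and
    # shared by the whole batch, but each sequence's KV cache is read separately.
    kv_read = batch * context_len * kv_bytes_per_token(cfg)
    step_seconds = (cfg.weight_bytes + kv_read) / bandwidth
    return batch / step_seconds  # each sequence emits one token per step


def throughput_ceiling(cfg: ModelConfig, context_len: int,
                       bandwidth: float = BANDWIDTH) -> float:
    # As batch grows, weight reads are amortized away and only KV reads remain.
    return bandwidth / (context_len * kv_bytes_per_token(cfg))


if __name__ == "__main__":
    cfg = ModelConfig()
    ctx = 8192
    for batch in (1, 8, 32, 128, 512):
        tps = decode_tokens_per_sec(cfg, batch, ctx)
        kv_gib = batch * ctx * kv_bytes_per_token(cfg) / 2**30
        print(f"batch={batch:4d}  KV cache={kv_gib:7.1f} GiB  ~{tps:6.0f} tok/s")
    print(f"ceiling ~{throughput_ceiling(cfg, ctx):.0f} tok/s")
```

With these assumed numbers, each sequence at an 8K context holds 1 GiB of KV cache. By roughly batch 15, the KV bytes read per step already match the weight bytes. Past that point, each extra user mostly adds KV traffic rather than throughput. The sketch also ignores capacity: it never checks whether the total cache fits in device memory.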