What Developers Say About Kimi K2 Thinking
Watch technical reviews and hands-on demos from AI researchers, developers, and engineers exploring Kimi K2 Thinking's capabilities

Kimi K2 Thinking Is Insane... (Major Update)
Now waiting for the 20B distilled version

Kimi K2 Thinking Is the Best Open-Source Model - First Impressions and Testing
Kimi's writing is consistently excellent. It reads remarkably human and is rarely flagged by AI detectors.

Kimi K2 in 5 Minutes
Quick correction: the hardware recommended on the Moonshot AI site for running the quantized k2-base requires at least 8 H100s, so the cost is at least 8x what I calculated here. It still lags slightly on feasibility, but the point is that the gap is changing. Apologies for the calculation error!
Performance Benchmarks
See how Kimi K2 Thinking compares with leading AI models on key reasoning, coding, and agentic benchmarks.
Performance Across Key Categories

Comprehensive performance comparison across Agentic & Competitive Coding, Tool Use, and Math & STEM benchmarks
Coding Tasks
Software engineering and competitive programming benchmarks
| Benchmark | K2 Thinking | GPT-5 (High) | Claude Sonnet 4.5 | K2 0905 | DeepSeek-V3.2 |
|---|---|---|---|---|---|
| SWE-bench Verified (w/ tools) | 71.3 | 74.9 | 77.2 | 69.2 | 67.8 |
| SWE-bench Multilingual (w/ tools) | 61.1 | 55.3* | 68.0 | 55.9 | 57.9 |
| LiveCodeBench v6 (no tools) | 83.1 | 87.0* | 64.0* | 56.1* | 74.1 |
| OJ-Bench (cpp) (no tools) | 48.7 | 56.2* | 30.4* | 25.5* | 38.2* |
| Terminal-Bench (w/ simulated tools) | 47.1 | 43.8 | 51.0 | 44.5 | 37.7 |
Reasoning Tasks
Multi-step reasoning, mathematics, and STEM problem-solving
| Benchmark | K2 Thinking | GPT-5 (High) | Claude Sonnet 4.5 | K2 0905 | DeepSeek-V3.2 | Grok-4 |
|---|---|---|---|---|---|---|
| HLE (w/ tools) | 44.9 | 41.7* | 32.0* | 21.7 | 20.3* | 41.0 |
| AIME25 (w/ python) | 99.1 | 99.6 | 100.0 | 75.2 | 58.1* | 98.8 |
| HMMT25 (w/ python) | 95.1 | 96.7 | 88.8* | 70.4 | 49.5* | 93.9 |
| GPQA (no tools) | 84.5 | 85.7 | 83.4 | 74.2 | 79.9 | 87.5 |
* indicates values from third-party reports or unofficial sources
Data source: Official Kimi K2 Thinking Model Card
Quick Start Guide
Deploy Kimi K2 Thinking on your own infrastructure with vLLM. A simple five-step setup gets you to production-grade inference.
Hardware Requirements
Minimum setup for deploying Kimi K2 Thinking:
- 8x GPUs with tensor parallelism (NVIDIA H200 recommended)
- Supports INT4 quantized weights with a 256K context length
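As a rough sanity check on why an 8-GPU node is the floor, the INT4 weights alone can be sized with back-of-the-envelope arithmetic. A minimal sketch, assuming the published 1T-parameter count and the H200's 141 GB HBM spec, and ignoring KV cache and activation memory:

```python
# Back-of-the-envelope memory check for INT4 weights on an 8x H200 node.
total_params = 1e12            # ~1 trillion parameters
bytes_per_param = 0.5          # INT4 = 4 bits = half a byte per weight
weight_gib = total_params * bytes_per_param / 1024**3

h200_hbm_gib = 141             # HBM capacity of a single NVIDIA H200
node_gib = 8 * h200_hbm_gib    # aggregate memory with 8-way tensor parallelism

print(f"weights: ~{weight_gib:.0f} GiB, node capacity: {node_gib} GiB")
```

The weights come to roughly 466 GiB against 1128 GiB of aggregate HBM, which is why the headroom matters: the remainder goes to the KV cache at the 256K context length.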
Install vLLM
Install the vLLM inference framework:
pip install vllm
Download Model
Download the model from Hugging Face:
huggingface-cli download moonshotai/Kimi-K2-Thinking --local-dir ./kimi-k2-thinking
Launch vLLM Server
Start the inference server with the essential parameters:
vllm serve moonshotai/Kimi-K2-Thinking \
--tensor-parallel-size 8 \
--tool-call-parser kimi_k2 \
--reasoning-parser kimi_k2 \
--max-num-batched-tokens 32768
Test Deployment
Verify the deployment is working:
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "moonshotai/Kimi-K2-Thinking",
"messages": [
{"role": "user", "content": "Hello, what is 1+1?"}
]
}'
For the complete deployment guide, including SGLang and KTransformers:
Official Deployment Guide
Kimi K2 Thinking Core Capabilities
Explore the powerful features that make Kimi K2 Thinking an ideal choice for complex reasoning and development workflows.
Deep Chain-of-Thought Reasoning
Multi-step reasoning trained end to end, with a native thinking mode. It maintains logical coherence across 200-300 consecutive tool calls during complex problem solving.
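The long-horizon tool calling described above can be pictured as a simple driver loop. This is a hypothetical sketch of the pattern, not Kimi's actual agent interface: `model_step` and the tool registry are stand-ins for the model's tool-call decisions.

```python
def run_agent(model_step, tools, max_calls=300):
    """Sequential tool-calling loop. `model_step` inspects the call history and
    returns either ("call", tool_name, kwargs) or ("final", answer)."""
    history = []
    for _ in range(max_calls):
        action = model_step(history)
        if action[0] == "final":
            return action[1]
        _, name, kwargs = action
        # Execute the requested tool and feed the result back into the history.
        history.append((name, kwargs, tools[name](**kwargs)))
    raise RuntimeError("exceeded tool-call budget")

# Toy stand-in for the model: call the calculator twice, then answer.
def toy_model(history):
    if len(history) < 2:
        return ("call", "add", {"a": 1, "b": 2})
    return ("final", sum(result for _, _, result in history))

answer = run_agent(toy_model, {"add": lambda a, b: a + b})  # -> 6
```

The `max_calls` budget mirrors the 200-300 sequential-call figure: the loop's only job is to keep the history coherent between steps, which is exactly what becomes hard at that horizon.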
Ultra-Long Context Understanding
An industry-leading 256K-token context window handles entire codebases, long documents, and multi-file projects while preserving full contextual understanding.
Trillion-Parameter MoE Architecture
A 1-trillion-parameter mixture-of-experts design activates 32B parameters per forward pass, delivering top-tier performance at an efficient compute cost.
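The efficiency claim follows from the activation ratio: only a small fraction of the 1T weights participates in any one forward pass. A quick calculation, treating the published figures as exact:

```python
total_params = 1_000_000_000_000   # 1T parameters across all experts
active_params = 32_000_000_000     # 32B activated per forward pass

ratio = active_params / total_params
print(f"active fraction: {ratio:.1%}")   # -> 3.2%
```

So per-token compute scales with the 32B active subset, not the full trillion, which is the whole point of the MoE routing.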
Excellent Coding and Agentic Capabilities
Scores 71.3% on SWE-bench Verified and 83.1% on LiveCodeBench v6. Strong on agentic tasks, reaching 60.2% on BrowseComp and 44.9% on Humanity's Last Exam.
Native INT4 Quantization
Quantization-aware training delivers roughly 2x faster inference at INT4 precision while preserving model quality, making it well suited for production deployment.
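To make "INT4" concrete, here is a minimal sketch of symmetric 4-bit quantization in plain Python. This is illustrative only; the model's actual quantization-aware training scheme is more involved (group-wise scales, learned during training).

```python
def quantize_int4(weights):
    """Symmetric INT4: map floats onto integers in [-8, 7] with a shared scale."""
    scale = max(abs(w) for w in weights) / 7  # 7 = largest positive INT4 value
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.9, -0.31, 0.07, -0.64]
q, s = quantize_int4(w)
restored = dequantize(q, s)  # each value lands within scale/2 of the original
```

Each weight is stored in 4 bits instead of 16, an immediate 4x shrink in weight memory and bandwidth; quantization-aware training teaches the network to tolerate the rounding error this introduces.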
Open Source and Cost-Effective
Released under a modified MIT license, with API pricing of $0.60/M input tokens ($0.15 with cache hits) and $2.50/M output tokens - 60-80% cheaper than GPT-4 and Claude.
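A small helper makes the pricing concrete. This assumes simple linear per-token billing at the rates listed above, with no tiering:

```python
def request_cost_usd(input_tokens, output_tokens, cache_hit=False):
    """Estimate one request's cost from the listed per-million-token prices."""
    input_price = 0.15 if cache_hit else 0.60   # USD per 1M input tokens
    output_price = 2.50                          # USD per 1M output tokens
    return (input_tokens / 1e6) * input_price + (output_tokens / 1e6) * output_price

# e.g. a 100K-token prompt with a 10K-token completion: about $0.085
print(request_cost_usd(100_000, 10_000))
```

Note how output tokens dominate the bill at long generations, and how cache hits cut the input side to a quarter of the normal rate.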
Community Reactions on X
Join the conversation about Kimi K2 Thinking and see what the developer community is sharing about their experience
🚀 Hello, Kimi K2 Thinking!
— Kimi.ai (@Kimi_Moonshot) November 6, 2025
The Open-Source Thinking Agent Model is here.
🔹 SOTA on HLE (44.9%) and BrowseComp (60.2%)
🔹 Executes up to 200-300 sequential tool calls without human interference
🔹 Excels in reasoning, agentic search, and coding
🔹 256K context window
Built…
Kimi K2 Thinking is the new leading open weights model: it demonstrates particular strength in agentic contexts but is very verbose, generating the most tokens of any model in completing our Intelligence Index evals. @Kimi_Moonshot's Kimi K2 Thinking achieves a 67 in the…
— Artificial Analysis (@ArtificialAnlys) November 7, 2025
The new 1 Trillion parameter Kimi K2 Thinking model runs well on 2 M3 Ultras in its native format - no loss in quality!
— Awni Hannun (@awnihannun) November 7, 2025
The model was quantization aware trained (qat) at int4.
Here it generated ~3500 tokens at 15 toks/sec using pipeline-parallelism in mlx-lm:
If Kimi K2 Thinking was truly trained with only $4.6 million, the closed AI labs are cooked.
— Yuchen Jin (@Yuchenj_UW) November 7, 2025
Give me 1 reason why I shouldn't buy this top of the line Mac Studio, download Kimi K2 Thinking (best AI model in the world right now), and let it control the computer autonomously 24/7
— Alex Finn (@AlexFinn) November 7, 2025
A full employee working for me year round
Would anyone want this live streamed?
