New 🚀 Open-Source 1T-Parameter Model: 256K Context, Deep Reasoning Mode

Kimi K2 Thinking: Deep Reasoning AI with Extended Context

A trillion-parameter Mixture-of-Experts (MoE) model designed for deep multi-step reasoning and extended-context understanding. With a 256K-token context window and a native thinking mode, Kimi K2 Thinking reports state-of-the-art results on complex reasoning benchmarks such as HLE and BrowseComp while remaining cost-efficient. It is fully open-source under a Modified MIT license.

Reviews

What Developers Are Saying About Kimi K2 Thinking

Watch technical reviews and live demonstrations from AI researchers, developers, and technology experts exploring what Kimi K2 Thinking can do.

Kimi K2 Thinking is CRAZY... (HUGE UPDATE)

now waiting for a 20B distillation

Kimi K2 Thinking Is The BEST Open Source Model - First Look & Testing

Kimi's writing is always so good. It's human-like and rarely detected in AI detector.

Kimi K2 explained in 5 minutes

Quick correction: the recommended hardware on MoonShot AI's site for running the k2-base is 8 units of h100 for the quantized version so the cost is at least 8x than what i calculated here. It's still a bit behind in feasibility but the point remains that the gap will change. I apologize for the miscalculation!

Benchmark Performance Comparison

See how Kimi K2 Thinking compares with leading AI models across reasoning, coding, and agentic benchmarks.

Performance Across Key Categories

[Figure: Kimi K2 Thinking benchmark comparison across Agentic & Competitive Coding, Tool Use, and Math & STEM benchmarks]

Coding Tasks

Software engineering and competitive programming benchmarks

Benchmark	K2 Thinking	GPT-5 (High)	Claude Sonnet 4.5	K2 0905	DeepSeek-V3.2
SWE-bench Verified (w/ tools)	71.3	74.9	77.2	69.2	67.8
SWE-bench Multilingual (w/ tools)	61.1	55.3*	68.0	55.9	57.9
LiveCodeBench v6 (no tools)	83.1	87.0*	64.0*	56.1*	74.1
OJ-Bench (cpp) (no tools)	48.7	56.2*	30.4*	25.5*	38.2*
Terminal-Bench (w/ simulated tools)	47.1	43.8	51.0	44.5	37.7

Reasoning Tasks

Multi-step reasoning, mathematics, and STEM problem-solving

Benchmark	K2 Thinking	GPT-5 (High)	Claude Sonnet 4.5	K2 0905	DeepSeek-V3.2	Grok-4
HLE (w/ tools)	44.9	41.7*	32.0*	21.7	20.3*	41.0
AIME25 (w/ python)	99.1	99.6	100.0	75.2	58.1*	98.8
HMMT25 (w/ python)	95.1	96.7	88.8*	70.4	49.5*	93.9
GPQA (no tools)	84.5	85.7	83.4	74.2	79.9	87.5

* indicates values from third-party reports or unofficial sources

Data source: Official Kimi K2 Thinking Model Card

Quick Start Guide

Deploy Kimi K2 Thinking on your own infrastructure with vLLM: a simple five-step setup for production-ready inference.

Hardware Requirements

Minimum setup for deploying Kimi K2 Thinking:

•8 GPUs with tensor parallelism (NVIDIA H200 recommended)
•Serves the native INT4 quantized weights with the full 256K context length
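A back-of-the-envelope check of why a single 8-GPU node is the minimum. It assumes every one of the ~1T parameters is stored at 4 bits and ignores quantization scales, the KV cache, and activations; in practice some non-expert weights stay at higher precision, so the real checkpoint is somewhat larger. The 141 GB figure is the HBM capacity of one H200.

```latex
\begin{aligned}
M_{\text{weights}} &\approx N \cdot \frac{b}{8}
  = 1.0\times10^{12} \cdot \frac{4}{8}\ \text{bytes}
  = 5.0\times10^{11}\ \text{bytes} \approx 500\ \text{GB} \\
M_{\text{GPU}} &= 8 \times 141\ \text{GB} = 1128\ \text{GB} \\
M_{\text{GPU}} - M_{\text{weights}} &\approx 628\ \text{GB}
  \quad \text{(KV cache, activations, higher-precision layers)}
\end{aligned}
```

At 16 bits per weight the same arithmetic gives about 2000 GB for the weights alone, more than the 1128 GB on the node, which is why native INT4 matters for an 8-GPU deployment.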

Install vLLM

Install vLLM inference framework:

bash

pip install vllm

Download Model

Download the model from Hugging Face:

bash

huggingface-cli download moonshotai/Kimi-K2-Thinking --local-dir ./kimi-k2-thinking

Launch vLLM Server

Start the inference server with essential parameters:

vLLM Deployment

bash

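# Serve the weights downloaded in the previous step under the Hugging Face model ID.
vllm serve ./kimi-k2-thinking \
  --served-model-name moonshotai/Kimi-K2-Thinking \
  --trust-remote-code \
  --tensor-parallel-size 8 \
  --enable-auto-tool-choice \
  --tool-call-parser kimi_k2 \
  --reasoning-parser kimi_k2 \
  --max-num-batched-tokens 32768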
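Loading roughly a trillion parameters across eight GPUs can take a long time, so scripts that run the test request immediately may hit a server that is still starting. The helper below is a minimal sketch that polls vLLM's `/health` endpoint until it responds successfully. The function name `wait_for_vllm`, the 10-second poll interval, and the 30-minute default timeout are assumptions, not values from the official guide.

```bash
#!/usr/bin/env bash
# Poll a vLLM server's GET /health endpoint until it succeeds or a timeout expires.
# wait_for_vllm is a hypothetical helper; the default timeout (1800 s) is an
# assumption and should be tuned to how long your hardware takes to load the model.
wait_for_vllm() {
  local base_url="${1:-http://localhost:8000}"
  local timeout_s="${2:-1800}"
  local interval_s=10
  local waited=0
  # curl -f treats HTTP error codes as failures; -s keeps the output quiet.
  until curl -sf "${base_url}/health" > /dev/null; do
    if (( waited >= timeout_s )); then
      echo "vLLM not ready after ${timeout_s}s" >&2
      return 1
    fi
    sleep "$interval_s"
    waited=$(( waited + interval_s ))
  done
  echo "vLLM is ready at ${base_url}"
}
```

Typical use is `wait_for_vllm && curl ...`, so the test request only runs once the server reports healthy.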

Test Deployment

Verify the deployment is working:

Test API

bash

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "moonshotai/Kimi-K2-Thinking",
    "messages": [
      {"role": "user", "content": "Hello, what is 1+1?"}
    ]
  }'
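The request above only checks plain chat. A second smoke test can exercise the `kimi_k2` tool-call parser by offering the model a tool. This is a sketch: the `get_weather` tool is hypothetical (the server only sees its JSON schema), and `tool_choice: "auto"` in vLLM requires the server to be started with `--enable-auto-tool-choice` as well as `--tool-call-parser kimi_k2`.

```bash
#!/usr/bin/env bash
# Smoke test for tool calling against the OpenAI-compatible vLLM endpoint.
# get_weather is a HYPOTHETICAL tool used only to see whether a tool call is emitted.
MODEL="moonshotai/Kimi-K2-Thinking"
ENDPOINT="http://localhost:8000/v1/chat/completions"

# Print the JSON request body (the model name is substituted from $MODEL).
tool_request_body() {
  cat <<EOF
{
  "model": "${MODEL}",
  "messages": [
    {"role": "user", "content": "What is the weather in Jakarta right now?"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}
EOF
}

# Send the body to the server; curl reads it from stdin via -d @-.
send_tool_request() {
  tool_request_body | curl -s "${ENDPOINT}" \
    -H "Content-Type: application/json" \
    -d @-
}
```

After sourcing the script, run `send_tool_request`. If the parser is working, you would expect `choices[0].message.tool_calls` in the response to contain a `get_weather` call rather than a plain-text answer.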

For the complete deployment guide, including SGLang and KTransformers:

Official Deployment Guide

Key Capabilities of Kimi K2 Thinking

Discover the features that make Kimi K2 Thinking well suited to complex reasoning and development workflows.

Deep Chain-of-Thought Reasoning

Trained end-to-end for multi-step reasoning with a native thinking mode. It maintains coherent reasoning across up to 200-300 sequential tool calls without human intervention, which makes it suitable for complex, long-horizon problem solving.

Extended Context Understanding

A 256K-token context window makes it possible to process entire codebases, long documents, and multi-file projects while keeping the full context available throughout.
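Before sending a whole repository, it helps to estimate whether it fits. The script below is a rough pre-check under two loud assumptions: "256K" is taken as 262,144 tokens, and about 4 bytes per token is a generic heuristic, not a property of Kimi's tokenizer (non-English text and dense code can tokenize very differently). The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Rough check: will a directory of files fit in a 256K-token context window?
# ASSUMPTIONS (not from Moonshot AI): 256K = 262,144 tokens; ~4 bytes per token.
CONTEXT_TOKENS=262144
BYTES_PER_TOKEN=4

# Print an approximate token count for all regular files under $1,
# skipping anything inside a .git directory.
estimate_tokens() {
  local bytes
  bytes=$(find "$1" -type f -not -path '*/.git/*' -exec cat {} + | wc -c)
  echo $(( bytes / BYTES_PER_TOKEN ))
}

# Report whether the estimate fits; return non-zero if it does not.
fits_in_context() {
  local tokens
  tokens=$(estimate_tokens "$1")
  if (( tokens <= CONTEXT_TOKENS )); then
    echo "fits: ~${tokens} tokens of ${CONTEXT_TOKENS}"
  else
    echo "too large: ~${tokens} tokens of ${CONTEXT_TOKENS}" >&2
    return 1
  fi
}
```

Usage: `fits_in_context ./my-repo`. Leave generous headroom even when it "fits", because the model's reasoning and output tokens share the same window.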

Trillion-Parameter MoE Architecture

A 1-trillion-parameter Mixture-of-Experts design that activates 32B parameters per token, delivering strong performance at a fraction of the compute cost of a dense model of the same size.
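The "32B active out of 1T" idea can be written as a worked equation. This is a generic top-$k$ MoE layer with softmax gating over the selected experts, not necessarily the exact gating function K2 uses: a router $W_r$ scores $N_e$ experts, only the top $k$ experts $E_i$ run for token $x$, and their outputs are mixed. The shared-expert term reflects the K2 model card's description of one always-active shared expert alongside the routed experts (reported there as 384 routed experts with 8 selected per token; verify against the card).

```latex
\begin{aligned}
s(x) &= W_r\,x \in \mathbb{R}^{N_e}, \qquad
  \mathcal{T}(x) = \operatorname{TopK}\big(s(x),\,k\big) \\
g_i(x) &= \frac{\exp s_i(x)}{\sum_{j \in \mathcal{T}(x)} \exp s_j(x)},
  \qquad i \in \mathcal{T}(x) \\
y &= E_{\text{shared}}(x) + \sum_{i \in \mathcal{T}(x)} g_i(x)\,E_i(x) \\
\frac{N_{\text{active}}}{N_{\text{total}}} &\approx
  \frac{32\times10^{9}}{1.0\times10^{12}} = 3.2\% \\
\text{FLOPs per token} &\approx 2\,N_{\text{active}} = 6.4\times10^{10}
\end{aligned}
```

The last line uses the common "2 FLOPs per active parameter" rule of thumb and ignores attention; it illustrates why per-token compute tracks the 32B active parameters rather than the 1T total.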

Superior Coding & Agentic Capabilities

Scores 71.3% on SWE-bench Verified and 83.1% on LiveCodeBench v6. It excels at agentic tasks, with 60.2% on BrowseComp and 44.9% on Humanity's Last Exam (HLE, with tools).

Native INT4 Quantization

Quantization-aware training (QAT) enables roughly 2x faster inference at INT4 precision while preserving model quality for production deployment.
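A sketch of symmetric per-group INT4 weight quantization, the kind of scheme quantization-aware training targets. The group $g$, its size, and the symmetric $[-8, 7]$ integer range are assumptions for illustration; Moonshot AI's exact recipe may differ. During QAT the forward pass uses the dequantized weights $\hat{w}_i$, and gradients flow through the non-differentiable rounding via the straight-through estimator.

```latex
\begin{aligned}
s_g &= \frac{\max_{i \in g} |w_i|}{7} \\
q_i &= \operatorname{clamp}\!\Big(\operatorname{round}\big(w_i / s_g\big),\,-8,\,7\Big) \\
\hat{w}_i &= s_g\, q_i \\
\frac{\partial \mathcal{L}}{\partial w_i} &\approx
  \frac{\partial \mathcal{L}}{\partial \hat{w}_i}
  \quad \text{(straight-through estimator)}
\end{aligned}
```

One plausible reading of the ~2x figure, offered as an interpretation rather than a claim from the model card: single-stream decoding is usually memory-bandwidth-bound, so halving the bytes read per weight (8-bit to 4-bit) can roughly double tokens per second when weight traffic dominates.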

Open-Source & Cost-Efficient

Released under a Modified MIT License, with API pricing of $0.60 per million input tokens ($0.15 on cache hits) and $2.50 per million output tokens, roughly 60-80% cheaper than GPT-4 and Claude.
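Because the model is verbose, it is worth estimating cost from token counts before running large jobs. The function below is a minimal sketch using the per-million-token prices quoted above. The name `k2_cost` is hypothetical, and it assumes reasoning tokens are billed as output tokens, as is typical for thinking models; check the provider's current price list before relying on the numbers.

```bash
#!/usr/bin/env bash
# Estimate the USD cost of one request from token counts.
# Prices per 1M tokens (from the figures above; they may change):
#   input, cache miss: 0.60   input, cache hit: 0.15   output: 2.50
# ASSUMPTION: reasoning ("thinking") tokens are counted as output tokens.
k2_cost() {
  local input_miss="$1" input_hit="$2" output="$3"
  awk -v m="$input_miss" -v h="$input_hit" -v o="$output" \
    'BEGIN { printf "%.4f\n", (m * 0.60 + h * 0.15 + o * 2.50) / 1000000 }'
}
```

For example, `k2_cost 200000 50000 20000` (200K uncached input, 50K cached input, 20K output) prints `0.1775`, showing that output tokens, including long reasoning traces, quickly dominate the bill.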

Community Reactions on X

Join the conversation about Kimi K2 Thinking and see what the developer community is sharing about their experiences.

🚀 Hello, Kimi K2 Thinking!
The Open-Source Thinking Agent Model is here.

🔹 SOTA on HLE (44.9%) and BrowseComp (60.2%)
🔹 Executes up to 200 – 300 sequential tool calls without human interference
🔹 Excels in reasoning, agentic search, and coding
🔹 256K context window

Built… pic.twitter.com/lZCNBIgbV2
— Kimi.ai (@Kimi_Moonshot) November 6, 2025

Kimi K2 Thinking is the new leading open weights model: it demonstrates particular strength in agentic contexts but is very verbose, generating the most tokens of any model in completing our Intelligence Index evals. @Kimi_Moonshot's Kimi K2 Thinking achieves a 67 in the… pic.twitter.com/m6SvpW7iif
— Artificial Analysis (@ArtificialAnlys) November 7, 2025

The new 1 Trillion parameter Kimi K2 Thinking model runs well on 2 M3 Ultras in its native format - no loss in quality!

The model was quantization aware trained (qat) at int4.

Here it generated ~3500 tokens at 15 toks/sec using pipeline-parallelism in mlx-lm: pic.twitter.com/oH5DPi7kAg
— Awni Hannun (@awnihannun) November 7, 2025

If Kimi K2 Thinking was truly trained with only $4.6 million, the close AI labs are cooked. pic.twitter.com/LPbSL0v1U5
— Yuchen Jin (@Yuchenj_UW) November 7, 2025

Give me 1 reason why I shouldn't buy this top of the line Mac Studio, download Kimi K2 Thinking (best AI model in the world right now), and let it control the computer autonomously 24/7

A full employee working for me year round

Would anyone want to this live streamed? pic.twitter.com/6vZd7dyAoP
— Alex Finn (@AlexFinn) November 7, 2025

FAQ

What is Kimi K2 Thinking, and how does it differ from the standard K2?

How does the thinking mode work?

Which use cases is Kimi K2 Thinking best suited for?

How do I access and use Kimi K2 Thinking?

How is pricing structured?

How does Kimi K2 Thinking compare with reasoning models such as o1 and DeepSeek R1?

How does Kimi K2 Thinking balance reasoning depth against speed and cost?

Can I deploy Kimi K2 Thinking locally, and what are the requirements?

What are the best practices for using Kimi K2 Thinking effectively?