vLLM
Speculative Decoding
GPU
Harness Engineering
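Harness Engineering is a development approach in which engineers do not write large amounts of code directly. Instead, they design the environment, rules, and test-feedback systems that let AI agents generate and improve code automatically.
- Effective harnesses for long-running agents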
- Harness engineering: leveraging Codex in an agent-first world
- Minions: Stripe’s one-shot, end-to-end coding agents
- Minions: Stripe’s one-shot, end-to-end coding agents—Part 2
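- Vibe Coding AReaL: Developing a Distributed RL Training Framework with Zero Hand-Typed Code (original title: Vibe Coding AReaL:零手打代码开发分布式 RL 训练框架)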
Quantization
CUTLASS
Triton
- Deep Dive into Triton Internals (Part 1)
- Deep Dive into Triton Internals (Part 2)
- Deep Dive into Triton Internals (Part 3)
FlashAttention
- FlashAttention from First Principles
- FlashAttention — Visually and Exhaustively Explained
- Flash Attention 2.0 with Tri Dao (author)!
- Learning Flash Attention, Explained in Detail (original title: Flash Attention学习过程【详】解)
- ELI5: FlashAttention
- Designing Hardware-Aware Algorithms: FlashAttention
- FlashAttention: Fast and Memory-Efficient Exact Attention With IO-Awareness
