Foundations of Large Models 1

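This post collects notes on the course "Principles and Techniques of Large Models" taught by Mao Yuren at Zhejiang University. Course videos: https://www.bilibili.com/video/BV1PB6XYFET2 Textbook: https://github.com/ZJU-LLMs/Foundations-of-LLMs 1 Preface 1. Definitions of language and intelligence: Language: a system of communication that...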

Aug 4, 2025 LLMs

Installing and Running Transformers from Source

Environment setup. A combination that installs and runs successfully (component and version): CUDA 11.8, Python 3.10.8, vLLM 0.6.4.post1, PyTorch 2.5.1+c...

Jul 17, 2025 Large Models

PagedAttention Paper Reading Notes

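These are reading notes on the paper Efficient Memory Management for Large Language Model Serving with PagedAttention. Background and challenges: to reach high throughput at inference time, large language models (LLMs) need to process enough requests at once, that is, to batch them. Existing systems struggle to do so, mainly because each request's KV cache takes up a large amount of memory and its size changes dynamically....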

Jul 8, 2025 Large Models

Example: Installing and Running vLLM from Source

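vLLM is a high-performance library for LLM inference and serving, focused on fast, easy-to-use, low-cost LLM serving. It uses PagedAttention to manage attention key-value memory efficiently, supports continuous batching, and offers a range of other optimizations. You can use vLLM either by quickly installing a ready-to-run release or by working in source development mode. This post records an example of installing and running vLLM from source, along with notes on pitfalls encountered along the way. 1 Environment setup. First decide which vLLM version you will use, then look up the corresponding...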

Jul 7, 2025 Large Models

FlexGen Paper Reading Notes

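These are reading notes on the paper FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU. Abstract: LLM inference demands substantial compute and memory and usually relies on multiple GPUs. However, for latency-insensitive applications that can be processed in batches, there is a need for high-throughput inference on resource-constrained devices such as a single commodity GPU. The paper proposes Flex...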

Jun 13, 2025 KVCache

LLM and KV Cache Explained

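1 From prompt to output. As a large language model, LLaMA works by taking input text (the prompt) and predicting the next token or word. For example, suppose our prompt is: Quantum mechanics is a fundamental theory in physics that. Based on its training, the LLM tries to continue the sentence: provides insights int...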

May 22, 2025 KVCache

MOONCAKE Paper Reading Notes

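These are reading notes on the paper Mooncake: Trading More Storage for Less Computation — A KVCache-centric Architecture for Serving LLM Chatbot. Abstract: MOONCAKE is the inference platform behind Kimi, an LLM chat service. Its core task is to efficiently schedule distributed inference for LLM requests and to manage caching. 1. ...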

May 12, 2025 KVCache

A Beginner's Guide to Getting Started with Open Source Projects

Open source projects are a cornerstone of software development, offering learning resources and a platform for technological advancement and community collaboration. As a newcomer to the open sourc...

Mar 14, 2025 Other

Getting Started with C-Reduce: A Quick Guide

In software development, simplifying code is essential for improving code quality and maintainability. C-Reduce is a tool designed to reduce code, helping developers quickly pinpoint and fix issues...

Mar 13, 2025 Debugging

HDD: Hierarchical Delta Debugging

These are notes on the paper HDD: Hierarchical Delta Debugging. Abstract: During program debugging, failure-inducing inputs are often large and contain irrelevant information, making debugging mor...

Mar 12, 2025 Debugging