Models

Abstract

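In LangChain, a model is the reasoning engine of agents and LLM applications. It interprets the input, generates text, decides whether a tool needs to be called, interprets tool results, and decides when to produce the final answer. The model's own capabilities directly determine how reliable an agent is. When choosing a model, weigh its instruction following, tool calling, structured output, context length, multimodal support, and reasoning ability.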

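LangChain's value lies in its unified model interface. Provider APIs differ considerably. LangChain lets you call models from OpenAI, Anthropic, Google Gemini, Azure, AWS Bedrock, HuggingFace, OpenRouter, Ollama, and others in much the same way, which lowers the cost of switching models or running experiments.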

Basic usage

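A model can be used in two ways:

With an agent: as the agent's reasoning core, deciding the next action and the final answer.
Standalone: outside the agent loop, called directly for text generation, classification, summarization, information extraction, and similar tasks.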

The simplest way to initialize a model is init_chat_model:

from langchain.chat_models import init_chat_model

model = init_chat_model("openai:gpt-5.4")
response = model.invoke("Explain LangChain in one sentence.")
print(response.text)

You can also instantiate the provider's model class directly. This suits cases that need fine-grained control over parameters:

from langchain_openai import ChatOpenAI

model = ChatOpenAI(
    model="gpt-5.4",
    temperature=0.1,
    max_tokens=1000,
    timeout=30,
)

To use a model in an agent, pass it directly to create_agent. Here it is given as a provider:model string:

from langchain.agents import create_agent

agent = create_agent(
    model="openai:gpt-5.4",
    tools=tools,  # a list of tools defined elsewhere
)

Dynamic model selection

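An agent does not have to stay on one fixed model. A more production-oriented approach is to switch models dynamically based on task complexity, context length, user tier, or cost policy. For example, use a cheaper model for short conversations and switch to a stronger one for long conversations or complex tasks.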

from langchain_openai import ChatOpenAI
from langchain.agents import create_agent
from langchain.agents.middleware import wrap_model_call, ModelRequest, ModelResponse


basic_model = ChatOpenAI(model="gpt-5.4-mini")
advanced_model = ChatOpenAI(model="gpt-5.4")

@wrap_model_call
def dynamic_model_selection(request: ModelRequest, handler) -> ModelResponse:
    """Choose model based on conversation complexity."""
    message_count = len(request.state["messages"])

    if message_count > 10:
        # Use an advanced model for longer conversations
        model = advanced_model
    else:
        model = basic_model

    return handler(request.override(model=model))

agent = create_agent(
    model=basic_model,  # Default model
    tools=tools,  # a list of tools defined elsewhere
    middleware=[dynamic_model_selection],
)

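In this example the default model is basic_model. Before every model call, the middleware inspects the current state. It can then use request.override(model=...) to replace the model actually used for that call.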
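The example above routes on message count alone. A policy that also weighs context length and user tier could be factored into a plain function and called from inside the middleware. This is a minimal sketch, not LangChain code. RoutingInput, choose_model_name, and the thresholds are hypothetical, and character count is used as a rough stand-in for token count.

```python
from dataclasses import dataclass

# Hypothetical thresholds; tune them for your own models and budget.
LONG_CONVERSATION_MESSAGES = 10
LONG_CONTEXT_CHARS = 20_000  # rough proxy for token count


@dataclass
class RoutingInput:
    """Signals a middleware could derive from the request (names are illustrative)."""
    message_count: int
    total_chars: int
    user_tier: str  # e.g. "free" or "pro"


def choose_model_name(signals: RoutingInput) -> str:
    """Return a key ("basic" or "advanced") naming the model to use for this call."""
    if signals.user_tier == "pro":
        return "advanced"  # paying users always get the stronger model
    if signals.message_count > LONG_CONVERSATION_MESSAGES:
        return "advanced"  # long conversation
    if signals.total_chars > LONG_CONTEXT_CHARS:
        return "advanced"  # large context
    return "basic"
```

Inside dynamic_model_selection you could build a RoutingInput from request.state["messages"]. Look the result up in a dict such as {"basic": basic_model, "advanced": advanced_model}, and pass that model to request.override(model=...). Keeping the policy in a pure function makes it easy to unit-test without calling any model.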

Common invocation methods

invoke

invoke is the most direct way to call a model. Pass a single message or a list of messages, and it waits for the complete result.

response = model.invoke("Why is tool calling useful?")
print(response.text)

You can also pass a multi-turn conversation:

conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Translate: I love building applications."},
]

response = model.invoke(conversation)

stream

stream returns the output incrementally. It suits long answers, or front ends that display the result as it is generated.

for chunk in model.stream("Explain vector databases."):
    print(chunk.text, end="", flush=True)

Streaming yields multiple chunks, which can be added together to rebuild the complete AIMessage.
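The accumulation step can be sketched without a live model. ToyChunk and fake_stream are hypothetical stand-ins; real LangChain message chunks also merge tool-call fragments and metadata. The loop itself is the usual pattern. With a real model you would iterate over model.stream(...) instead, since LangChain message chunks support +.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class ToyChunk:
    """Hypothetical stand-in for a streamed message chunk (text only)."""
    text: str

    def __add__(self, other: "ToyChunk") -> "ToyChunk":
        # Adding two chunks concatenates their text.
        return ToyChunk(self.text + other.text)


def fake_stream(answer: str) -> Iterator[ToyChunk]:
    """Hypothetical stream: yields the answer word by word, keeping the spaces."""
    for i, word in enumerate(answer.split(" ")):
        yield ToyChunk(word if i == 0 else " " + word)


def collect(stream: Iterator[ToyChunk]) -> Optional[ToyChunk]:
    """Print chunks as they arrive and return their sum."""
    full: Optional[ToyChunk] = None
    for chunk in stream:
        print(chunk.text, end="", flush=True)  # incremental display
        full = chunk if full is None else full + chunk  # running total
    print()
    return full
```

Starting from None rather than an empty chunk avoids having to construct a "zero" chunk of the right type.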

batch

batch processes multiple independent requests in parallel.

responses = model.batch([
    "Summarize LangChain.",
    "Summarize LangGraph.",
    "Summarize LangSmith.",
])

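With many inputs, you can cap the number of parallel requests with max_concurrency, passed through the config (for example model.batch(inputs, config={"max_concurrency": 5})). This avoids hitting rate limits or consuming too many resources.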
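The effect of a concurrency cap can be seen with the standard library alone. This sketch models the idea with a thread pool; it is not LangChain's own implementation. make_fake_model_call is hypothetical and only sleeps to simulate latency, while recording how many calls ran at once.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def make_fake_model_call():
    """Return a fake model call and a function reporting peak concurrency."""
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def call(prompt: str) -> str:
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        time.sleep(0.05)  # simulate network latency
        with lock:
            state["active"] -= 1
        return f"summary of: {prompt}"

    return call, lambda: state["peak"]


def bounded_batch(call, prompts, max_concurrency: int):
    """Run independent prompts with at most max_concurrency in flight, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        return list(pool.map(call, prompts))
```

pool.map returns results in input order even though calls finish out of order, which matches how batch results line up with their inputs.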

Key parameters

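Supported parameters vary by provider, but common settings include:

model: the model name. With init_chat_model it can also be given in provider:model format.
api_key: the credential for the model service, usually supplied through an environment variable.
temperature: controls randomness. Lower values give more stable output; higher values give more varied output.
max_tokens: caps the length of the output, in tokens.
timeout: the request timeout, usually in seconds.
max_retries: the maximum number of retries after a failure. It helps with network glitches, rate limiting, and transient service errors.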
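The textbook picture of temperature is that the model's logits are divided by the temperature before the softmax. The sketch below shows that picture only. Providers may implement sampling differently, and some models may not accept a temperature at all.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn logits into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


cold = softmax_with_temperature([2.0, 1.0, 0.5], 0.2)
hot = softmax_with_temperature([2.0, 1.0, 0.5], 2.0)
```

With these logits, the top token gets about 99% of the probability at temperature 0.2, but only about 48% at 2.0. This is why low temperatures give more repeatable output.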

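In production, choosing a model name is not enough. You should also set explicit policies for timeouts, retries, concurrency, token cost, and logging and tracing.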
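Provider classes usually handle max_retries internally, so in practice you just set the parameter. This sketch only illustrates what a retry policy with exponential backoff and jitter does. TransientError and call_with_retries are hypothetical. Here, max_retries=2 means at most three attempts in total, which may not match every provider's exact semantics.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error for failures worth retrying (timeouts, 429s, 5xx)."""


def call_with_retries(
    fn: Callable[[], T],
    max_retries: int = 2,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on TransientError with exponential backoff plus jitter."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            delay = base_delay * (2 ** attempt)  # 0.5, 1.0, 2.0, ...
            sleep(delay + random.uniform(0, delay))  # jitter spreads retries out
    raise AssertionError("unreachable")
```

Only errors that are likely to go away on their own should be retried. An authentication error, for example, should fail immediately rather than burn through the retry budget.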

Tool Calling

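Many modern models support tool calling. Based on the user's request, the model produces a structured request naming a tool to call and the arguments to pass to it.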

from langchain.tools import tool

@tool
def get_weather(location: str) -> str:
    """Get the weather at a location."""
    return f"It's sunny in {location}."

model_with_tools = model.bind_tools([get_weather])
response = model_with_tools.invoke("What's the weather like in Boston?")

for tool_call in response.tool_calls:
    print(tool_call["name"], tool_call["args"])

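When you use a model on its own, it only returns tool call requests, and you must run the tool loop yourself. Execute the tools, append their results to the message history, and call the model again to get the final answer. With an agent, this loop runs automatically.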
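The manual loop can be sketched with a scripted fake model so the message flow is visible without an API key. The message dicts, fake_model, and run_tool_loop are hypothetical simplifications, not LangChain classes. With LangChain you would call model_with_tools.invoke(messages) instead. As far as I know, invoking a @tool with a tool call dict returns a ToolMessage you can append directly.

```python
from typing import Any, Callable

Message = dict[str, Any]  # simplified message shape, not LangChain's classes


def get_weather(location: str) -> str:
    """Toy tool with the same behavior as get_weather above."""
    return f"It's sunny in {location}."


TOOLS: dict[str, Callable[..., str]] = {"get_weather": get_weather}


def fake_model(messages: list[Message]) -> Message:
    """Scripted stand-in for a tool-bound model: request a tool, then answer."""
    last = messages[-1]
    if last["role"] == "tool":
        return {"role": "assistant", "content": f"Answer: {last['content']}", "tool_calls": []}
    return {
        "role": "assistant",
        "content": "",
        "tool_calls": [{"id": "call_1", "name": "get_weather", "args": {"location": "Boston"}}],
    }


def run_tool_loop(model: Callable[[list[Message]], Message], user_input: str,
                  max_turns: int = 5) -> list[Message]:
    messages: list[Message] = [{"role": "user", "content": user_input}]
    for _ in range(max_turns):
        ai_msg = model(messages)
        messages.append(ai_msg)  # keep the model's request in the history
        if not ai_msg["tool_calls"]:
            return messages  # no tool requested: this is the final answer
        for call in ai_msg["tool_calls"]:
            result = TOOLS[call["name"]](**call["args"])  # execute the tool
            # Append the result, linked to its request by the tool call id.
            messages.append({"role": "tool", "tool_call_id": call["id"], "content": result})
    raise RuntimeError("tool loop did not finish within max_turns")
```

The max_turns guard matters in real code. A model that keeps requesting tools would otherwise loop forever.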

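Points to note:

You can force the model to call a specific tool, or require it to call at least one tool, typically through the tool_choice argument of bind_tools.
Models that support parallel tool calls can request several independent tools in a single response.
When streaming, tool calls may also arrive incrementally as chunks.
Some providers offer server-side tools, such as web search or a code interpreter. The provider executes these tools and returns the results itself.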

Structured Output

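Structured output makes the model return results that conform to a schema, so downstream code can parse them reliably. Common schema types include:

Pydantic: validates at runtime; suited to serious data extraction.
TypedDict: lightweight; suited to cases that only need type hints and a simple structure.
JSON Schema: highly portable; suited to cross-language use or cases that need precise control over the schema.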
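The difference between Pydantic's runtime validation and TypedDict's type-hint-only nature can be shown without any model. This sketch assumes Pydantic v2 is installed. The sample dicts play the role of model output.

```python
from typing import TypedDict

from pydantic import BaseModel, ValidationError


class Movie(BaseModel):
    title: str
    year: int


class MovieDict(TypedDict):
    title: str
    year: int


raw = {"title": "Inception", "year": "2010"}  # model emitted the year as a string
bad = {"title": "Inception", "year": "twenty ten"}

# Pydantic checks types at runtime (and, in its default lax mode, coerces "2010").
movie = Movie.model_validate(raw)
print(movie.year + 1)  # 2011: year is now an int

try:
    Movie.model_validate(bad)
except ValidationError as err:
    print("rejected:", err.error_count(), "error(s)")

# A TypedDict is only a type hint: nothing is checked at runtime.
unchecked = MovieDict(**bad)
print(unchecked["year"])  # still the string "twenty ten"
```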

Pydantic version:

from pydantic import BaseModel, Field

class Movie(BaseModel):
    title: str = Field(description="The title of the movie")
    year: int = Field(description="The year the movie was released")
    director: str = Field(description="The director of the movie")

model_with_structure = model.with_structured_output(Movie)
response = model_with_structure.invoke("Provide details about the movie Inception")
print(response)

TypedDict version:

from typing_extensions import TypedDict, Annotated

class MovieDict(TypedDict):
    """A movie with details."""
    title: Annotated[str, ..., "The title of the movie"]
    year: Annotated[int, ..., "The year the movie was released"]
    director: Annotated[str, ..., "The director of the movie"]
    rating: Annotated[float, ..., "The movie's rating out of 10"]

model_with_structure = model.with_structured_output(MovieDict)
response = model_with_structure.invoke("Provide details about the movie Inception")
print(response)  # {'title': 'Inception', 'year': 2010, 'director': 'Christopher Nolan', 'rating': 8.8}

JSON Schema version:

json_schema = {
    "title": "Movie",
    "description": "A movie with details",
    "type": "object",
    "properties": {
        "title": {
            "type": "string",
            "description": "The title of the movie"
        },
        "year": {
            "type": "integer",
            "description": "The year the movie was released"
        },
        "director": {
            "type": "string",
            "description": "The director of the movie"
        },
        "rating": {
            "type": "number",
            "description": "The movie's rating out of 10"
        }
    },
    "required": ["title", "year", "director", "rating"]
}

model_with_structure = model.with_structured_output(
    json_schema,
    method="json_schema",
)
response = model_with_structure.invoke("Provide details about the movie Inception")
print(response)  # {'title': 'Inception', 'year': 2010, ...}

To get both the raw model message and the parsed structure, pass include_raw=True.
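With include_raw=True, LangChain's with_structured_output returns a dict with the keys "raw" (the model message), "parsed", and "parsing_error". The helper below is a hypothetical sketch of how calling code might handle that dict, keeping the raw output for debugging when parsing fails.

```python
from typing import Any, Optional


def unpack_structured(result: dict[str, Any]) -> tuple[Optional[Any], Optional[str]]:
    """Split an include_raw=True result into (parsed value, error description)."""
    error = result.get("parsing_error")
    if error is not None:
        raw = result.get("raw")
        # Message objects carry text in .content; fall back to the object itself.
        raw_text = getattr(raw, "content", raw)
        return None, f"{type(error).__name__}: {error} | raw={raw_text!r}"
    return result.get("parsed"), None
```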

Advanced capabilities

Model Profiles

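A LangChain model profile describes a model's capabilities. Examples include its context window, and whether it supports image input, tool calling, or reasoning output. Applications can consult it at runtime to choose a model or adjust their strategy according to those capabilities.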
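Capability-based selection can be sketched as a filter over profile dicts. In recent LangChain versions, a chat model's profile is exposed as a dict. The model names, key names (max_input_tokens, tool_calling, image_inputs), and values below are assumptions for illustration; check them against your installed version.

```python
from typing import Any, Optional

# Hypothetical profiles, listed cheapest first.
PROFILES: dict[str, dict[str, Any]] = {
    "small-model": {"max_input_tokens": 32_000, "tool_calling": True, "image_inputs": False},
    "large-model": {"max_input_tokens": 200_000, "tool_calling": True, "image_inputs": True},
}


def pick_model(profiles: dict[str, dict[str, Any]], needed_tokens: int,
               needs_images: bool = False, needs_tools: bool = False) -> Optional[str]:
    """Return the first model (in listed order) whose profile meets every requirement."""
    for name, profile in profiles.items():
        if profile.get("max_input_tokens", 0) < needed_tokens:
            continue  # context window too small
        if needs_images and not profile.get("image_inputs", False):
            continue
        if needs_tools and not profile.get("tool_calling", False):
            continue
        return name
    return None  # nothing fits: caller should trim input or fail clearly
```

Treating a missing key as "unsupported" is a deliberate, conservative choice. A model is never picked for a capability its profile does not confirm.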

Multimodal

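Some models support multimodal input or output, such as images, audio, and video. In LangChain, multimodal content is usually expressed as message content blocks. Before relying on it, confirm that the target model and provider support the modality in question.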
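A message with content blocks is a list of typed blocks instead of a single string. This sketch builds one text block and one base64 image block. The image block keys (type, base64, mime_type) follow my understanding of LangChain's standard content blocks. Older versions and some providers use other shapes, such as image_url, so treat the keys as an assumption. image_message is a hypothetical helper.

```python
import base64
from typing import Any


def image_message(question: str, image_bytes: bytes, mime_type: str = "image/png") -> dict[str, Any]:
    """Build a user message whose content is a text block followed by an image block."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {
                "type": "image",
                "base64": base64.b64encode(image_bytes).decode("ascii"),
                "mime_type": mime_type,
            },
        ],
    }
```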

Reasoning

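Some models can return explicit reasoning output, or let you set a reasoning effort. For complex tasks this can improve reasoning quality; for simple tasks it may only add latency and cost.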

Local Models

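LangChain also supports local models, for example chat models run locally through Ollama. This suits cases with strict privacy requirements, a need to run offline, or a wish to reduce cloud API costs.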

Prompt Caching

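Many providers offer prompt caching to reduce the latency and cost of repeated context. Caching may be implicit, or it may require explicitly marked cache points. Long system prompts, fixed document context, and repetitive tasks benefit most.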
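Provider caches typically match on an exact prompt prefix. The practical rule is to put static content first and per-request content last, so consecutive requests share a byte-identical prefix. This sketch checks that property by hashing the static part. SYSTEM_PROMPT, its placeholder text, and both helper functions are hypothetical.

```python
import hashlib
import json
from typing import Any

# Hypothetical static context: instructions plus a rarely changing reference document.
SYSTEM_PROMPT = (
    "You are a support assistant for ExampleCo.\n\n"
    "Reference:\n(long, rarely changing product manual text)"
)


def build_messages(question: str) -> list[dict[str, Any]]:
    """Static content first, per-request content last, so the prefix stays identical."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]


def prefix_fingerprint(messages: list[dict[str, Any]], n_static: int = 1) -> str:
    """Hash the static prefix; equal hashes mean the prefix is identical across requests."""
    payload = json.dumps(messages[:n_static], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Putting a timestamp or user ID at the top of the system prompt would change the prefix on every request. That would defeat prefix-based caching, even though the rest of the prompt is unchanged.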

Rate Limiting

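Model services usually enforce rate limits. LangChain lets you pass a rate limiter when initializing a model to throttle how often requests are sent. It limits the number of requests per unit of time, but it does not replace managing a token budget.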
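LangChain ships an InMemoryRateLimiter in langchain_core.rate_limiters, configured with parameters such as requests_per_second and max_bucket_size. The simplified token bucket below is not that class; it illustrates the same idea. The limiter counts requests, not tokens: a request with a huge prompt uses the same slot as a tiny one, which is why it cannot replace token budgeting. The injectable clock is an assumption added so the sketch is testable.

```python
import time
from typing import Callable


class TokenBucketLimiter:
    """Minimal token-bucket request limiter (a sketch, not LangChain's class)."""

    def __init__(self, requests_per_second: float, max_bucket_size: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = requests_per_second
        self.capacity = max_bucket_size  # largest burst allowed
        self.tokens = max_bucket_size    # this sketch starts with a full bucket
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        """Take one request slot if available; False means the caller should wait."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the bucket size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```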

Invocation Config

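When invoking a model, you can pass runtime settings through config, such as run_name, tags, metadata, callbacks, and max_concurrency. These are commonly used for LangSmith tracing, logging and monitoring, cost analysis, and controlling batch jobs.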

Summary

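The model is the capability foundation of a LangChain application. When choosing one, don't just ask which model is strongest. Check whether it supports what the target task needs: tool calling, structured output, streaming, multimodality, context length, cost, latency, rate limits, and observability. LangChain's unified interface lets models be swapped, combined, and routed dynamically, which is an important foundation for building production-grade agents.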