Customizing LLMs within LlamaIndex Abstractions

You can plug these LLM abstractions into the other LlamaIndex modules (indexes, retrievers, query engines, agents) to build advanced workflows over your data.

By default, we use OpenAI's gpt-3.5-turbo model, but you can customize the underlying LLM.

Example: Changing the Underlying LLM

The snippet below shows how to customize the LLM being used.

In this example, we use gpt-4o-mini instead of gpt-3.5-turbo. Available models include gpt-4o-mini, gpt-4o, o3-mini, and others.

from llama_index.core import Settings, VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI

# define LLM
llm = OpenAI(temperature=0.1, model="gpt-4o-mini")

# change the global default LLM
Settings.llm = llm

documents = SimpleDirectoryReader("data").load_data()

# build index
index = VectorStoreIndex.from_documents(documents)

# locally override the LLM
query_engine = index.as_query_engine(llm=llm)
response = query_engine.query(
    "What did the author do after his time at Y Combinator?"
)

Example: Using a Custom LLM Model - Advanced

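To use a custom LLM model, you only need to implement the LLM class (or CustomLLM for a simpler interface). You are responsible for passing the text to the model and returning the newly generated tokens.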

This implementation could be a local model, or even a wrapper around your own API.

Note that for a completely private experience, you also need to set up a local embedding model.

Here is a small boilerplate example:

from typing import Any

from llama_index.core import SimpleDirectoryReader, SummaryIndex
from llama_index.core.llms import (
    CustomLLM,
    CompletionResponse,
    CompletionResponseGen,
    LLMMetadata,
)
from llama_index.core.llms.callbacks import llm_completion_callback
from llama_index.core import Settings


class OurLLM(CustomLLM):
    context_window: int = 3900
    num_output: int = 256
    model_name: str = "custom"
    dummy_response: str = "My response"

    @property
    def metadata(self) -> LLMMetadata:
        """Get LLM metadata."""
        return LLMMetadata(
            context_window=self.context_window,
            num_output=self.num_output,
            model_name=self.model_name,
        )

    @llm_completion_callback()
    def complete(self, prompt: str, **kwargs: Any) -> CompletionResponse:
        return CompletionResponse(text=self.dummy_response)

    @llm_completion_callback()
    def stream_complete(
        self, prompt: str, **kwargs: Any
    ) -> CompletionResponseGen:
        response = ""
        for token in self.dummy_response:
            response += token
            yield CompletionResponse(text=response, delta=token)


# define our LLM
Settings.llm = OurLLM()

# define embed model
Settings.embed_model = "local:BAAI/bge-base-en-v1.5"


# Load your data
documents = SimpleDirectoryReader("./data").load_data()
index = SummaryIndex.from_documents(documents)

# Query and print response
query_engine = index.as_query_engine()
response = query_engine.query("<query_text>")
print(response)
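The `stream_complete` method above follows a simple contract: each yielded response carries the newly generated piece in `delta` and the full text so far in `text`. Note that the boilerplate iterates over a string, so each "token" there is a single character. The sketch below illustrates the same contract with plain Python only; `Chunk` and `stream_tokens` are hypothetical stand-ins, not LlamaIndex classes.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Chunk:
    """Hypothetical stand-in for CompletionResponse (only `text` and `delta`)."""

    text: str
    delta: Optional[str] = None


def stream_tokens(tokens: List[str]) -> Iterator[Chunk]:
    """Yield one chunk per token: `delta` is the new piece, `text` is everything so far."""
    so_far = ""
    for token in tokens:
        so_far += token
        yield Chunk(text=so_far, delta=token)


# Consume the stream and collect every chunk.
chunks = list(stream_tokens(["My", " ", "response"]))
```

Joining the `delta` values reproduces the final `text`, which is what a consumer of a streaming response would typically rely on.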

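Using this method, you can use any LLM. Maybe you have one running locally, or running on your own server. As long as the class is implemented and the generated tokens are returned, it should work. Note that we need to use the prompt helper to customize the prompt sizes, since every model has a slightly different context length.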
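To see why the context length matters, consider the basic arithmetic behind prompt sizing: whatever is reserved for the model's output is no longer available for the prompt. The sketch below uses the `context_window` and `num_output` values from the boilerplate; `prompt_budget` is a hypothetical helper for illustration, and the real prompt helper does more than this single subtraction.

```python
def prompt_budget(context_window: int, num_output: int) -> int:
    """Return how many tokens remain for the prompt after reserving output space."""
    if num_output >= context_window:
        raise ValueError("num_output must be smaller than context_window")
    return context_window - num_output


# Values taken from the OurLLM boilerplate: 3900-token window, 256 output tokens.
budget = prompt_budget(context_window=3900, num_output=256)
```

With these values, 3644 tokens are left for the prompt, so a model with a smaller context window needs correspondingly smaller prompts.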

The decorator is optional, but it provides observability via callbacks on the LLM calls.
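As a rough illustration of what such a decorator buys you, the sketch below shows the general pattern: a wrapper that emits an event before and after each completion call. This is not how `llm_completion_callback` is implemented; `with_callbacks`, `record_event`, and `events` are hypothetical names, and the wrapped `complete` function is a stand-in for a real model call.

```python
import functools
from typing import Any, Callable, Dict, List

# Hypothetical event log standing in for a real callback handler.
events: List[Dict[str, Any]] = []


def record_event(name: str, payload: Dict[str, Any]) -> None:
    """Store an event so it can be inspected later."""
    events.append({"event": name, **payload})


def with_callbacks(func: Callable[..., str]) -> Callable[..., str]:
    """Wrap a completion function so each call emits start and end events."""

    @functools.wraps(func)
    def wrapper(prompt: str, **kwargs: Any) -> str:
        record_event("start", {"prompt": prompt})
        result = func(prompt, **kwargs)
        record_event("end", {"response": result})
        return result

    return wrapper


@with_callbacks
def complete(prompt: str, **kwargs: Any) -> str:
    # Stand-in for a real model call.
    return "My response"
```

Without the decorator the function returns the same result, but nothing is recorded, which is the trade-off the sentence above describes.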

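Note that you may have to adjust the internal prompts to get good performance. Even then, you should use a sufficiently large LLM to ensure it can handle the complex queries that LlamaIndex uses internally, so your mileage may vary.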

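The LlamaIndex documentation lists all default internal prompts, with a separate list for chat-specific prompts. You can also implement your own custom prompts.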