Build RAG with in-line citations¶
This notebook walks through how to implement RAG with in-line citations of source nodes, using Workflows.
Specifically, we will implement CitationQueryEngine, which provides node-based in-line citations in the generated response.
!pip install -U llama-index
import os
os.environ["OPENAI_API_KEY"] = "sk-..."
Download Data¶
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
--2024-08-15 00:23:50--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8000::154, 2606:50c0:8001::154, 2606:50c0:8002::154, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8000::154|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75042 (73K) [text/plain]
Saving to: ‘data/paul_graham/paul_graham_essay.txt’

data/paul_graham/pa 100%[===================>]  73.28K  --.-KB/s    in 0.01s

2024-08-15 00:23:50 (5.27 MB/s) - ‘data/paul_graham/paul_graham_essay.txt’ saved [75042/75042]
Since workflows are async-first, this all runs fine in a notebook. If you are running this in your own code, you will want to use asyncio.run() to start an async event loop if one isn't already running.
async def main():
    <async code>


if __name__ == "__main__":
    import asyncio

    asyncio.run(main())
from llama_index.core.workflow import Event
from llama_index.core.schema import NodeWithScore


class RetrieverEvent(Event):
    """Result of running retrieval"""

    nodes: list[NodeWithScore]


class CreateCitationsEvent(Event):
    """Add citations to the nodes."""

    nodes: list[NodeWithScore]
Citation Prompt Templates¶
Here we define the default CITATION_QA_TEMPLATE, CITATION_REFINE_TEMPLATE, DEFAULT_CITATION_CHUNK_SIZE, and DEFAULT_CITATION_CHUNK_OVERLAP.
from llama_index.core.prompts import PromptTemplate

CITATION_QA_TEMPLATE = PromptTemplate(
    "Please provide an answer based solely on the provided sources. "
    "When referencing information from a source, "
    "cite the appropriate source(s) using their corresponding numbers. "
    "Every answer should include at least one source citation. "
    "Only cite a source when you are explicitly referencing it. "
    "If none of the sources are helpful, you should indicate that. "
    "For example:\n"
    "Source 1:\n"
    "The sky is red in the evening and blue in the morning.\n"
    "Source 2:\n"
    "Water is wet when the sky is red.\n"
    "Query: When is water wet?\n"
    "Answer: Water will be wet when the sky is red [2], "
    "which occurs in the evening [1].\n"
    "Now it's your turn. Below are several numbered sources of information:"
    "\n------\n"
    "{context_str}"
    "\n------\n"
    "Query: {query_str}\n"
    "Answer: "
)

CITATION_REFINE_TEMPLATE = PromptTemplate(
    "Please provide an answer based solely on the provided sources. "
    "When referencing information from a source, "
    "cite the appropriate source(s) using their corresponding numbers. "
    "Every answer should include at least one source citation. "
    "Only cite a source when you are explicitly referencing it. "
    "If none of the sources are helpful, you should indicate that. "
    "For example:\n"
    "Source 1:\n"
    "The sky is red in the evening and blue in the morning.\n"
    "Source 2:\n"
    "Water is wet when the sky is red.\n"
    "Query: When is water wet?\n"
    "Answer: Water will be wet when the sky is red [2], "
    "which occurs in the evening [1].\n"
    "Now it's your turn. "
    "We have provided an existing answer: {existing_answer}"
    "Below are several numbered sources of information. "
    "Use them to refine the existing answer. "
    "If the provided sources are not helpful, you will repeat the existing answer."
    "\nBegin refining!"
    "\n------\n"
    "{context_msg}"
    "\n------\n"
    "Query: {query_str}\n"
    "Answer: "
)

DEFAULT_CITATION_CHUNK_SIZE = 512
DEFAULT_CITATION_CHUNK_OVERLAP = 20
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.workflow import (
    Context,
    Workflow,
    StartEvent,
    StopEvent,
    step,
)

from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

from llama_index.core.schema import (
    MetadataMode,
    NodeWithScore,
    TextNode,
)

from llama_index.core.response_synthesizers import (
    ResponseMode,
    get_response_synthesizer,
)

from typing import Union, List
from llama_index.core.node_parser import SentenceSplitter


class CitationQueryEngineWorkflow(Workflow):
    @step
    async def retrieve(
        self, ctx: Context, ev: StartEvent
    ) -> Union[RetrieverEvent, None]:
        "Entry point for RAG, triggered by a StartEvent with `query`."
        query = ev.get("query")
        if not query:
            return None

        print(f"Query the database with: {query}")

        # store the query in the global context
        await ctx.set("query", query)

        if ev.index is None:
            print("Index is empty, load some documents before querying!")
            return None

        retriever = ev.index.as_retriever(similarity_top_k=2)
        nodes = retriever.retrieve(query)
        print(f"Retrieved {len(nodes)} nodes.")
        return RetrieverEvent(nodes=nodes)

    @step
    async def create_citation_nodes(
        self, ev: RetrieverEvent
    ) -> CreateCitationsEvent:
        """
        Modify retrieved nodes to create granular sources for citations.

        Takes a list of NodeWithScore objects and splits their content
        into smaller chunks, creating new NodeWithScore objects for each chunk.
        Each new node is labeled as a numbered source, allowing for more precise
        citation in query results.

        Args:
            nodes (List[NodeWithScore]): A list of NodeWithScore objects to be processed.

        Returns:
            List[NodeWithScore]: A new list of NodeWithScore objects, where each object
            represents a smaller chunk of the original nodes, labeled as a source.
        """
        nodes = ev.nodes

        new_nodes: List[NodeWithScore] = []

        text_splitter = SentenceSplitter(
            chunk_size=DEFAULT_CITATION_CHUNK_SIZE,
            chunk_overlap=DEFAULT_CITATION_CHUNK_OVERLAP,
        )

        for node in nodes:
            text_chunks = text_splitter.split_text(
                node.node.get_content(metadata_mode=MetadataMode.NONE)
            )

            for text_chunk in text_chunks:
                text = f"Source {len(new_nodes)+1}:\n{text_chunk}\n"

                new_node = NodeWithScore(
                    node=TextNode.parse_obj(node.node), score=node.score
                )
                new_node.node.text = text
                new_nodes.append(new_node)
        return CreateCitationsEvent(nodes=new_nodes)

    @step
    async def synthesize(
        self, ctx: Context, ev: CreateCitationsEvent
    ) -> StopEvent:
        """Synthesize a response using the citation-labeled nodes."""
        llm = OpenAI(model="gpt-4o-mini")
        query = await ctx.get("query", default=None)

        synthesizer = get_response_synthesizer(
            llm=llm,
            text_qa_template=CITATION_QA_TEMPLATE,
            refine_template=CITATION_REFINE_TEMPLATE,
            response_mode=ResponseMode.COMPACT,
            use_async=True,
        )

        response = await synthesizer.asynthesize(query, nodes=ev.nodes)
        return StopEvent(result=response)
And that's it! Let's explore the workflow we wrote a bit.

- We have one entry point (the step that accepts StartEvent)
- The workflow context is used to store the user query
- The nodes are retrieved, citations are created, and finally a response is returned
Create the Index¶
documents = SimpleDirectoryReader("data/paul_graham").load_data()

index = VectorStoreIndex.from_documents(
    documents=documents,
    embed_model=OpenAIEmbedding(model_name="text-embedding-3-small"),
)
documents = SimpleDirectoryReader("data/paul_graham").load_data() index = VectorStoreIndex.from_documents( documents=documents, embed_model=OpenAIEmbedding(model_name="text-embedding-3-small"), )
Run the Workflow!¶
w = CitationQueryEngineWorkflow()
# Run a query
result = await w.run(query="What information do you have", index=index)
Query the database with: What information do you have
Retrieved 2 nodes.
from IPython.display import Markdown, display
display(Markdown(f"{result}"))
The provided sources contain various insights and ideas from Paul Graham regarding programming, writing, and his educational experiences. For instance, he reflects on his early experiences programming on the IBM 1401, where he struggled to create meaningful programs due to the limitations of the technology at the time [2]. He also describes his transition to microcomputers, which made interactive programming far more accessible [3]. Additionally, Graham shares his initial interest in philosophy during college, though he later found it less engaging compared to the fields of artificial intelligence and programming [3]. Overall, the sources highlight his evolution as a writer and programmer, as well as his shifting academic interests.
Check the citations.¶
print(result.source_nodes[0].node.get_text())
Source 1: But after Heroku got bought we had enough money to go back to being self-funded. [15] I've never liked the term "deal flow," because it implies that the number of new startups at any given time is fixed. This is not only false, but it's the purpose of YC to falsify it, by causing startups to be founded that would not otherwise have existed. [16] She reports that they were all different shapes and sizes, because there was a run on air conditioners and she had to get whatever she could, but that they were all heavier than she could carry now. [17] Another problem with HN was a bizarre edge case that occurs when you both write essays and run a forum. When you run a forum, you're assumed to see if not every conversation, at least every conversation involving you. And when you write essays, people post highly imaginative misinterpretations of them on forums. Individually these two phenomena are tedious but bearable, but the combination is disastrous. You actually have to respond to the misinterpretations, because the assumption that you're present in the conversation means that not responding to any sufficiently upvoted misinterpretation reads as a tacit admission that it's correct. But that in turn encourages more; anyone who wants to pick a fight with you senses that now is their chance. [18] The worst thing about leaving YC was not working with Jessica anymore. We'd been working on YC almost the whole time we'd known each other, and we'd neither tried nor wanted to separate it from our personal lives, so leaving was like pulling up a deeply rooted tree. [19] One way to get more precise about the concept of invented vs discovered is to talk about space aliens. Any sufficiently advanced alien civilization would certainly know about the Pythagorean theorem, for example. I believe, though with less certainty, that they would also know about the Lisp in McCarthy's 1960 paper. But if so there's no reason to suppose that this is the limit of the language that might be known to them. Presumably aliens need numbers and errors and I/O too. So it seems likely there exists at least one path out of McCarthy's Lisp along which discoveredness is preserved. Thanks to Trevor Blackwell, John Collison, Patrick Collison, Daniel Gackle, Ralph Hazell, Jessica Livingston, Robert Morris, and Harj Taggar for reading drafts of this.
print(result.source_nodes[1].node.get_text())
Source 2: What I Worked On February 2021 Before college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep. The first programs I tried writing were on the IBM 1401 that our school district used for what was then called "data processing." This was in 9th grade, so I was 13 or 14. The school district's 1401 happened to be in the basement of our junior high school, and my friend Rich Draves and I got permission to use it. It was like a mini Bond villain's lair down there, with all these alien-looking machines — CPU, disk drives, printer, card reader — sitting up on a raised floor under bright fluorescent lights. The language we used was an early version of Fortran. You had to type programs on punch cards, then stack them in the card reader and press a button to load the program into memory and run it. The result would ordinarily be to print something on the spectacularly loud printer. I was puzzled by the 1401. I couldn't figure out what to do with it. And in retrospect there's not much I could have done with it. The only form of input to programs was data stored on punched cards, and I didn't have any data stored on punched cards. The only other option was to do things that didn't rely on any input, like calculate approximations of pi, but I didn't know enough math to do anything interesting of that type. So I'm not surprised I can't remember any programs I wrote, because they can't have done much. My clearest memory is of the moment I learned it was possible for programs not to terminate, when one of mine didn't. On a machine without time-sharing, this was a social as well as a technical error, as the data center manager's expression made clear. With microcomputers, everything changed. Now you could have a computer sitting right in front of you, on a desk, that could respond to your keystrokes as it was running instead of just churning through a stack of punch cards and then stopping.
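As a final sanity check (a sketch not included in the original notebook), you can map the bracketed citation numbers in the answer back to the numbered source nodes, since "Source n" corresponds to result.source_nodes[n - 1]:

import re

# Collect every "[n]" citation that appears in the generated answer.
cited = sorted({int(n) for n in re.findall(r"\[(\d+)\]", str(result))})

for n in cited:
    node = result.source_nodes[n - 1]
    # Print just the "Source n:" header line of each cited chunk.
    print(node.node.get_text().splitlines()[0])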