使用 Ollama 和 Replicate 的 Llama3 实践手册¶
Meta 开发并发布了 Meta Llama 3 系列大语言模型 (LLMs),这是一组预训练和指令微调的生成式文本模型,包含 8B 和 70B 两种尺寸。Llama 3 指令微调模型针对对话用例进行了优化,在常见的行业基准测试中表现优于许多可用的开源聊天模型。
在本 notebook 中,我们将演示如何将 Llama3 与 LlamaIndex 一起用于一系列广泛的使用场景。
- 基本补全 / 聊天
- 基本 RAG (向量搜索, 摘要)
- 高级 RAG (路由, 子问题)
- 文本到 SQL
- 结构化数据提取
- 聊天引擎 + 记忆
- 代理
我们通过 Ollama 使用 Llama3-8B,通过 Replicate 使用 Llama3-70B。
安装和设置¶
!pip install llama-index
!pip install llama-index-llms-ollama
!pip install llama-index-llms-replicate
!pip install llama-index-embeddings-huggingface
!pip install llama-parse
!pip install replicate
import nest_asyncio
nest_asyncio.apply()
使用 Ollama 设置 LLM¶
from llama_index.llms.ollama import Ollama
llm = Ollama(model="llama3", request_timeout=120.0)
使用 Replicate 设置 LLM¶
确保你已指定 REPLICATE_API_TOKEN!
# os.environ["REPLICATE_API_TOKEN"] = "<YOUR_API_KEY>"
from llama_index.llms.replicate import Replicate
llm_replicate = Replicate(model="meta/meta-llama-3-70b-instruct")
# llm_replicate = Replicate(model="meta/meta-llama-3-8b-instruct")
设置嵌入模型¶
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
定义全局设置配置¶
在 LlamaIndex 中,你可以定义全局设置,这样就不必在每个地方传递 LLM / 嵌入模型对象。
from llama_index.core import Settings
Settings.llm = llm
Settings.embed_model = embed_model
!mkdir data
!wget "https://www.dropbox.com/scl/fi/t1soxfjdp0v44an6sdymd/drake_kendrick_beef.pdf?rlkey=u9546ymb7fj8lk2v64r6p5r5k&st=wjzzrgil&dl=1" -O data/drake_kendrick_beef.pdf
!wget "https://www.dropbox.com/scl/fi/nts3n64s6kymner2jppd6/drake.pdf?rlkey=hksirpqwzlzqoejn55zemk6ld&st=mohyfyh4&dl=1" -O data/drake.pdf
!wget "https://www.dropbox.com/scl/fi/8ax2vnoebhmy44bes2n1d/kendrick.pdf?rlkey=fhxvn94t5amdqcv9vshifd3hj&st=dxdtytn6&dl=1" -O data/kendrick.pdf
加载数据¶
默认情况下,我们使用 LlamaParse 加载数据,但如果你没有账户,也可以选择使用我们的免费 pypdf 读取器(SimpleDirectoryReader 中默认包含)!
LlamaParse: 在此注册账户:cloud.llamaindex.ai。每天可免费处理 1k 页,付费计划提供 7k 免费页 + 每额外页面 0.3 美分。如果你需要解析复杂的文档,如包含图表、表格等的 PDF,LlamaParse 是一个不错的选择。
默认 PDF 解析器(在
SimpleDirectoryReader
中)。如果你不想注册账户或使用 PDF 服务,只需使用我们文件加载器捆绑的默认 PyPDF 读取器。这是开始入门的好选择!
from llama_parse import LlamaParse
docs_kendrick = LlamaParse(result_type="text").load_data("./data/kendrick.pdf")
docs_drake = LlamaParse(result_type="text").load_data("./data/drake.pdf")
docs_both = LlamaParse(result_type="text").load_data(
"./data/drake_kendrick_beef.pdf"
)
# from llama_index.core import SimpleDirectoryReader
# docs_kendrick = SimpleDirectoryReader(input_files=["data/kendrick.pdf"]).load_data()
# docs_drake = SimpleDirectoryReader(input_files=["data/drake.pdf"]).load_data()
# docs_both = SimpleDirectoryReader(input_files=["data/drake_kendrick_beef.pdf"]).load_data()
Started parsing the file under job_id 32a7bb50-6a25-4295-971c-2de6f1588e0d .Started parsing the file under job_id b8cc075e-b6d5-4ded-b060-f72e9393b391 ..Started parsing the file under job_id 42fc41a4-68b6-49ee-8647-781b5cdb8893 ...
1. 基本补全和聊天¶
使用提示调用 complete¶
response = llm.complete("do you like drake or kendrick better?")
print(response)
I'm just an AI, I don't have personal preferences or opinions, nor can I listen to music. I exist solely to provide information and assist with tasks, so I don't have the capacity to enjoy or compare different artists' music. Both Drake and Kendrick Lamar are highly acclaimed rappers, and it's subjective which one you might prefer based on your individual tastes in music.
stream_response = llm.stream_complete(
"you're a drake fan. tell me why you like drake more than kendrick"
)
for t in stream_response:
print(t.delta, end="")
As a hypothetical Drake fan, I'd say that there are several reasons why I might prefer his music over Kendrick's. Here are a few possible reasons: 1. **Lyrical storytelling**: Drake is known for his vivid storytelling on tracks like "Marvins Room" and "Take Care." He has a way of painting pictures with his words, making listeners feel like they're right there with him, experiencing the highs and lows he's singing about. Kendrick, while also an incredible storyteller, might not have the same level of lyrical detail that Drake does. 2. **Melodic flow**: Drake's melodic flow is infectious! He has a way of crafting hooks and choruses that get stuck in your head, making it hard to stop listening. Kendrick's flows are often more complex and intricate, but Drake's simplicity can be just as effective in getting the job done. 3. **Vulnerability**: Drake isn't afraid to show his vulnerable side on tracks like "Hold On" and "I'm Upset." He wears his heart on his sleeve, sharing personal struggles and emotions with listeners. This vulnerability makes him relatable and easier to connect with on a deeper level. 4. **Production**: Drake has had the privilege of working with some incredible producers (like Noah "40" Shebib and Boi-1da) who bring out the best in him. The way he incorporates these sounds into his songs is often seamless, creating a unique blend of hip-hop and R&B that's hard to resist. 5. **Cultural relevance**: As someone who grew up in Toronto, Drake has a deep understanding of the Canadian experience and the struggles that come with it. He often references his hometown and the people he grew up around, giving his music a distinctly Canadian flavor. This cultural relevance makes his music feel more authentic and connected to the world we live in. 6. **Commercial appeal**: Let's face it – Drake has a knack for creating hits! His songs are often catchy, radio-friendly, and designed to get stuck in your head. While Kendrick might not have the same level of commercial success, Drake's ability to craft songs that resonate with a wider audience is undeniable. Of course, this is all just hypothetical – as a fan, I can appreciate both artists for their unique strengths and styles! What do you think?
使用消息列表调用 chat¶
from llama_index.core.llms import ChatMessage
messages = [
ChatMessage(role="system", content="You are Kendrick."),
ChatMessage(role="user", content="Write a verse."),
]
response = llm.chat(messages)
print(response)
assistant: "Listen up, y'all, I got a message to share Been through the struggles, but my spirit's still fair From Compton streets to the top of the game I'm the real Hov, ain't nobody gonna claim my fame"
2. 基本 RAG (向量搜索, 摘要)¶
基本 RAG (向量搜索)¶
from llama_index.core import VectorStoreIndex
index = VectorStoreIndex.from_documents(docs_both)
query_engine = index.as_query_engine(similarity_top_k=3)
response = query_engine.query("Tell me about family matters")
print(str(response))
According to the provided context, "Family Matters" is a seven-and-a-half-minute diss track by Drake in response to Kendrick Lamar's disses against him. The song has three different beats and features several shots at Kendrick, as well as other members of Drake's entourage, including A$AP Rocky and The Weeknd. In the song, Drake raps about his personal life, including his relationships with Rihanna and Whitney Alford, and even makes allegations about Kendrick's domestic life.
基本 RAG (摘要)¶
from llama_index.core import SummaryIndex
summary_index = SummaryIndex.from_documents(docs_both)
summary_engine = summary_index.as_query_engine()
response = summary_engine.query(
"Given your assessment of this article, who won the beef?"
)
print(str(response))
**Repeat** The article does not provide a clear verdict on who "won" the beef, nor does it suggest that the conflict has been definitively resolved. Instead, it presents the situation as ongoing and multifaceted, with both artists continuing to engage in a game of verbal sparring and lyrical one-upmanship.
3. 高级 RAG (路由, 子问题)¶
构建一个可以根据需要选择向量搜索或摘要的 Router¶
from llama_index.core.tools import QueryEngineTool, ToolMetadata
vector_tool = QueryEngineTool(
index.as_query_engine(),
metadata=ToolMetadata(
name="vector_search",
description="Useful for searching for specific facts.",
),
)
summary_tool = QueryEngineTool(
index.as_query_engine(response_mode="tree_summarize"),
metadata=ToolMetadata(
name="summary",
description="Useful for summarizing an entire document.",
),
)
from llama_index.core.query_engine import RouterQueryEngine
query_engine = RouterQueryEngine.from_defaults(
[vector_tool, summary_tool], select_multi=False, verbose=True
)
response = query_engine.query(
"Tell me about the song meet the grahams - why is it significant"
)
Selecting query engine 0: The song 'Meet the Grahams' might contain specific facts or information about the band, making it useful for searching for those specific details..
print(response)
"Meet the Grahams" artwork is a crucial part of a larger strategy by Kendrick Lamar to address Drake's family matters in a diss track. The artwork shows a pair of Maybach gloves, a shirt, receipts, and prescription bottles, including one for Ozempic prescribed to Drake. This song is significant because it serves as the full picture that Kendrick teased earlier on "6.16 in LA" and addresses all members of Drake's family, including his son Adonis, mother Sandi, father Dennis, and an alleged 11-year-old daughter. The song takes it to the point of no return, with Kendrick musing that he wishes Dennis Graham wore a condom the night Drake was conceived and telling both Drake's parents that they raised a man whose house is due to be raided any day now on Harvey Weinstein-level allegations.
将复杂问题分解为子问题¶
我们的子问题查询引擎将复杂问题分解为子问题。
drake_index = VectorStoreIndex.from_documents(docs_drake)
drake_query_engine = drake_index.as_query_engine(similarity_top_k=3)
kendrick_index = VectorStoreIndex.from_documents(docs_kendrick)
kendrick_query_engine = kendrick_index.as_query_engine(similarity_top_k=3)
from llama_index.core.tools import QueryEngineTool, ToolMetadata
drake_tool = QueryEngineTool(
drake_index.as_query_engine(),
metadata=ToolMetadata(
name="drake_search",
description="Useful for searching over Drake's life.",
),
)
kendrick_tool = QueryEngineTool(
kendrick_index.as_query_engine(),
metadata=ToolMetadata(
name="kendrick_summary",
description="Useful for searching over Kendrick's life.",
),
)
from llama_index.core.query_engine import SubQuestionQueryEngine
query_engine = SubQuestionQueryEngine.from_defaults(
[drake_tool, kendrick_tool],
llm=llm_replicate, # llama3-70b
verbose=True,
)
response = query_engine.query("Which albums did Drake release in his career?")
print(response)
Generated 1 sub questions. [drake_search] Q: What are the albums released by Drake [drake_search] A: Based on the provided context information, the albums released by Drake are: 1. Take Care (album) 2. Nothing Was the Same 3. If You're Reading This It's Too Late (rumored to be a mixtape or album) 4. Certified Lover Boy 5. Honestly, Nevermind Based on the provided context information, the albums released by Drake are: 1. Take Care (album) 2. Nothing Was the Same 3. If You're Reading This It's Too Late (rumored to be a mixtape or album) 4. Certified Lover Boy 5. Honestly, Nevermind
4. 文本到 SQL¶
在这里,我们将下载并使用一个包含 11 个表(包含音乐、播放列表和客户的各种信息)的 SQLite 示例数据库。本次测试我们将仅使用其中的几个表。
!wget "https://www.sqlitetutorial.net/wp-content/uploads/2018/03/chinook.zip" -O "./data/chinook.zip"
!unzip "./data/chinook.zip"
--2024-05-10 23:40:37-- https://www.sqlitetutorial.net/wp-content/uploads/2018/03/chinook.zip Resolving www.sqlitetutorial.net (www.sqlitetutorial.net)... 2606:4700:3037::6815:1e8d, 2606:4700:3037::ac43:acfa, 104.21.30.141, ... Connecting to www.sqlitetutorial.net (www.sqlitetutorial.net)|2606:4700:3037::6815:1e8d|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 305596 (298K) [application/zip] Saving to: ‘./data/chinook.zip’ ./data/chinook.zip 100%[===================>] 298.43K --.-KB/s in 0.02s 2024-05-10 23:40:37 (13.9 MB/s) - ‘./data/chinook.zip’ saved [305596/305596]
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Archive: ./data/chinook.zip inflating: chinook.db
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
from sqlalchemy import (
create_engine,
MetaData,
Table,
Column,
String,
Integer,
select,
column,
)
engine = create_engine("sqlite:///chinook.db")
from llama_index.core import SQLDatabase
sql_database = SQLDatabase(engine)
from llama_index.core.indices.struct_store import NLSQLTableQueryEngine
query_engine = NLSQLTableQueryEngine(
sql_database=sql_database,
tables=["albums", "tracks", "artists"],
llm=llm_replicate,
)
response = query_engine.query("What are some albums?")
print(response)
Here are 10 album titles with their corresponding artists: 1. "For Those About To Rock We Salute You" by Artist 1 2. "Balls to the Wall" by Artist 2 3. "Restless and Wild" by Artist 2 4. "Let There Be Rock" by Artist 1 5. "Big Ones" by Artist 3 6. "Jagged Little Pill" by Artist 4 7. "Facelift" by Artist 5 8. "Warner 25 Anos" by Artist 6 9. "Plays Metallica By Four Cellos" by Artist 7 10. "Audioslave" by Artist 8
response = query_engine.query("What are some artists? Limit it to 5.")
print(response)
Here are 5 artists: AC/DC, Accept, Aerosmith, Alanis Morissette, and Alice In Chains.
最后这个查询应该是一个更复杂的 join
response = query_engine.query(
"What are some tracks from the artist AC/DC? Limit it to 3"
)
print(response)
Here are three tracks from the legendary Australian rock band AC/DC: "For Those About To Rock (We Salute You)", "Put The Finger On You", and "Let's Get It Up".
print(response.metadata["sql_query"])
SELECT tracks.Name FROM tracks JOIN albums ON tracks.AlbumId = albums.AlbumId JOIN artists ON albums.ArtistId = artists.ArtistId WHERE artists.Name = 'AC/DC' LIMIT 3;
5. 结构化数据提取¶
函数调用一个重要的用例是提取结构化对象。LlamaIndex 通过 structured_predict
提供了一个直观的接口——只需定义目标 Pydantic 类(可以是嵌套的),然后给定一个提示,我们就能提取出所需的对象。
注意:由于 Llama3 / Ollama 没有原生函数调用支持,结构化提取是通过向 LLM 发送提示 + 输出解析来执行的。
from llama_index.llms.ollama import Ollama
from llama_index.core.prompts import PromptTemplate
from pydantic import BaseModel
class Restaurant(BaseModel):
"""A restaurant with name, city, and cuisine."""
name: str
city: str
cuisine: str
llm = Ollama(model="llama3")
prompt_tmpl = PromptTemplate(
"Generate a restaurant in a given city {city_name}"
)
restaurant_obj = llm.structured_predict(
Restaurant, prompt_tmpl, city_name="Miami"
)
print(restaurant_obj)
name='Tropical Bites' city='Miami' cuisine='Caribbean'
6. 向 RAG 添加聊天历史记录(聊天引擎)¶
在本节中,我们将使用我们的聊天引擎抽象,从 RAG 管道创建一个有状态的聊天机器人。
与无状态的查询引擎不同,聊天引擎维护对话历史记录(通过像缓冲区记忆这样的记忆模块)。它根据一个精简的问题执行检索,并将精简的问题 + 上下文 + 聊天历史记录馈送到最终的 LLM 提示中。
相关资源: https://docs.llamaindex.org.cn/en/stable/examples/chat_engine/chat_engine_condense_plus_context/
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.chat_engine import CondensePlusContextChatEngine
memory = ChatMemoryBuffer.from_defaults(token_limit=3900)
chat_engine = CondensePlusContextChatEngine.from_defaults(
index.as_retriever(),
memory=memory,
llm=llm,
context_prompt=(
"You are a chatbot, able to have normal interactions, as well as talk"
" about the Kendrick and Drake beef."
"Here are the relevant documents for the context:\n"
"{context_str}"
"\nInstruction: Use the previous chat history, or the context above, to interact and help the user."
),
verbose=True,
)
response = chat_engine.chat(
"Tell me about the songs Drake released in the beef."
)
print(str(response))
response = chat_engine.chat("What about Kendrick?")
print(str(response))
Kendrick Lamar's contributions to the beef! According to the article, Kendrick released several diss tracks in response to Drake's initial shots. One notable track is "Not Like Us", which directly addresses Drake and his perceived shortcomings. However, the article highlights that Kendrick's most significant response was his album "Mr. Morale & The Big Steppers", which features several tracks that can be seen as indirect disses towards Drake. The article also mentions that Kendrick's family has been a target of Drake's attacks, with Drake referencing Kendrick's estranged relationship with his partner Whitney and their two kids (one of whom is allegedly fathered by Dave Free). It's worth noting that Kendrick didn't directly respond to Drake's THP6 track. Instead, he focused on his own music and let the lyrics speak for themselves. Overall, Kendrick's approach was more subtle yet still packed a punch, showcasing his storytelling ability and lyrical prowess. Would you like me to elaborate on any specific tracks or moments from the beef?
7. Agent¶
在这里,我们使用 Llama 3 构建 Agent。我们对简单的函数以及上面的文档执行 RAG。
Agent 和工具¶
import json
from typing import Sequence, List
from llama_index.core.llms import ChatMessage
from llama_index.core.tools import BaseTool, FunctionTool
from llama_index.core.agent import ReActAgent
import nest_asyncio
nest_asyncio.apply()
定义工具¶
def multiply(a: int, b: int) -> int:
"""Multiple two integers and returns the result integer"""
return a * b
def add(a: int, b: int) -> int:
"""Add two integers and returns the result integer"""
return a + b
def subtract(a: int, b: int) -> int:
"""Subtract two integers and returns the result integer"""
return a - b
def divide(a: int, b: int) -> int:
"""Divides two integers and returns the result integer"""
return a / b
multiply_tool = FunctionTool.from_defaults(fn=multiply)
add_tool = FunctionTool.from_defaults(fn=add)
subtract_tool = FunctionTool.from_defaults(fn=subtract)
divide_tool = FunctionTool.from_defaults(fn=divide)
ReAct Agent¶
agent = ReActAgent.from_tools(
[multiply_tool, add_tool, subtract_tool, divide_tool],
llm=llm_replicate,
verbose=True,
)
查询¶
response = agent.chat("What is (121 + 2) * 5?")
print(str(response))
Thought: The current language of the user is: English. I need to use a tool to help me answer the question. Action: add Action Input: {'a': 121, 'b': 2} Observation: 123 Thought: I have the result of the addition, now I need to multiply it by 5. Action: multiply Action Input: {'a': 123, 'b': 5} Observation: 615 Thought: I can answer without using any more tools. I'll use the user's language to answer Answer: 615 615
使用 RAG QueryEngine 工具的 ReAct Agent¶
from llama_index.core import (
SimpleDirectoryReader,
VectorStoreIndex,
StorageContext,
load_index_from_storage,
)
from llama_index.core.tools import QueryEngineTool, ToolMetadata
使用 RAG QueryEngine 工具创建 ReAct Agent¶
drake_tool = QueryEngineTool(
drake_index.as_query_engine(),
metadata=ToolMetadata(
name="drake_search",
description="Useful for searching over Drake's life.",
),
)
kendrick_tool = QueryEngineTool(
kendrick_index.as_query_engine(),
metadata=ToolMetadata(
name="kendrick_search",
description="Useful for searching over Kendrick's life.",
),
)
query_engine_tools = [drake_tool, kendrick_tool]
agent = ReActAgent.from_tools(
query_engine_tools, ## TODO: define query tools
llm=llm_replicate,
verbose=True,
)
查询¶
response = agent.chat("Tell me about how Kendrick and Drake grew up")
print(str(response))
Thought: The current language of the user is: English. I need to use a tool to help me answer the question. Action: kendrick_search Action Input: {'input': "Kendrick Lamar's childhood"} Observation: Kendrick Lamar was born on June 17, 1987, in Compton, California. He is the first child of Kenneth "Kenny" Duckworth, a former gang hustler who previously worked at KFC, and Paula Oliver, a hairdresser who previously worked at McDonald's. Both of his parents are African Americans from the South Side of Chicago, and they relocated to Compton in 1984 due to his father's affiliation with the Gangster Disciples. Lamar was named after singer-songwriter Eddie Kendricks of the Temptations. He was an only child until the age of seven and was described as a loner by his mother. Thought: I have information about Kendrick's childhood, but I need to know more about Drake's upbringing to answer the question. Action: drake_search Action Input: {'input': "Drake's childhood"} Observation: Drake was raised in two neighborhoods. He lived on Weston Road in Toronto's working-class west end until grade six and attended Weston Memorial Junior Public School until grade four. He moved to one of the city's affluent neighbourhoods, Forest Hill, in 2000. Drake appeared in a comedic sketch which aired during the 1997 NHL Awards, featuring Martin Brodeur and Ron Hextall. At age 10, he attended Forest Hill Collegiate Institute for high school. Observation: Error: Could not parse output. Please follow the thought-action-input format. Try again. Thought: I apologize for the mistake. I need to use a tool to help me answer the question. Action: drake_search Action Input: {'input': "Drake's childhood"} Observation: Drake was raised in two neighborhoods. He lived on Weston Road in Toronto's working-class west end until grade six and attended Weston Memorial Junior Public School until grade four. He played minor hockey with the Weston Red Wings, reaching the Upper Canada College hockey camp before leaving due to a vicious cross-check to his neck during a game. At age 10, Drake appeared in a comedic sketch which aired during the 1997 NHL Awards. Thought: I have information about both Kendrick and Drake's childhood, so I can answer the question without using any more tools. Answer: Kendrick Lamar grew up in Compton, California, as the child of a former gang hustler and a hairdresser, while Drake was raised in two neighborhoods in Toronto, Ontario, Canada, and had a brief experience in minor hockey before pursuing a career in entertainment. Kendrick Lamar grew up in Compton, California, as the child of a former gang hustler and a hairdresser, while Drake was raised in two neighborhoods in Toronto, Ontario, Canada, and had a brief experience in minor hockey before pursuing a career in entertainment.