Llama2 + VectorStoreIndex¶
This notebook walks through the proper setup for using llama-2 with LlamaIndex. Specifically, we look at how to use a VectorStoreIndex.
Setup¶
If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.
In [ ]
%pip install llama-index-llms-replicate
In [ ]
!pip install llama-index
Keys¶
In [ ]
import os
os.environ["OPENAI_API_KEY"] = "sk-..."
os.environ["REPLICATE_API_TOKEN"] = "YOUR_REPLICATE_TOKEN"
Load documents, build the VectorStoreIndex¶
In [ ]
# Optional logging
# import logging
# import sys
# logging.basicConfig(stream=sys.stdout, level=logging.INFO)
# logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from IPython.display import Markdown, display
In [ ]
from llama_index.llms.replicate import Replicate
from llama_index.core.llms.llama_utils import (
messages_to_prompt,
completion_to_prompt,
)
# The replicate endpoint
LLAMA_13B_V2_CHAT = "a16z-infra/llama13b-v2-chat:df7690f1994d94e96ad9d568eac121aecf50684a0b0963b25a41cc40061269e5"
# inject custom system prompt into llama-2
def custom_completion_to_prompt(completion: str) -> str:
return completion_to_prompt(
completion,
system_prompt=(
"You are a Q&A assistant. Your goal is to answer questions as "
"accurately as possible is the instructions and context provided."
),
)
llm = Replicate(
model=LLAMA_13B_V2_CHAT,
temperature=0.01,
# override max tokens since it's interpreted
# as context window instead of max tokens
context_window=4096,
# override completion representation for llama 2
completion_to_prompt=custom_completion_to_prompt,
# if using llama 2 for data agents, also override the message representation
messages_to_prompt=messages_to_prompt,
)
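Before building an index on top of this LLM, it can be worth sanity-checking the setup. The snippet below is a minimal sketch: it prints the prompt produced by the custom completion_to_prompt and runs a single completion against the Replicate endpoint (assuming the REPLICATE_API_TOKEN above is valid; the prompt text is just an illustration).

# Optional sanity check (assumes a valid REPLICATE_API_TOKEN; prompt text is illustrative)
print(custom_completion_to_prompt("What is a vector store index?"))

resp = llm.complete("Hello! Briefly introduce yourself.")
print(resp.text)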
In [ ]
from llama_index.core import Settings
Settings.llm = llm
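Settings.llm only overrides the LLM; embeddings still default to OpenAI, which is why OPENAI_API_KEY was set earlier. If you prefer to pin the embedding model explicitly, a minimal sketch follows (it assumes the llama-index-embeddings-openai package is installed; the model name is just an example).

# Optional: pin the embedding model explicitly.
# Assumes `pip install llama-index-embeddings-openai`; the model name is an example.
from llama_index.core import Settings
from llama_index.embeddings.openai import OpenAIEmbedding

Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")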
Download Data
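The cells below read the Paul Graham essay from ./data/paul_graham/. A typical way to fetch it is sketched here, assuming the sample copy hosted in the run-llama/llama_index GitHub repository (adjust the URL or supply your own documents if the path has moved).

# Fetch the sample essay (URL is an assumption based on the LlamaIndex examples repo).
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'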
In [ ]
# load documents
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
In [ ]
index = VectorStoreIndex.from_documents(documents)
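Building the index embeds every chunk, so for anything beyond a quick experiment it is common to persist the index and reload it later instead of re-embedding on each run. A minimal sketch, assuming the default on-disk storage backend (the ./storage directory name is just an example):

# Persist the index and reload it later (directory name is an example).
from llama_index.core import StorageContext, load_index_from_storage

index.storage_context.persist(persist_dir="./storage")
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)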
Querying¶
In [ ]
# set Logging to DEBUG for more detailed outputs
query_engine = index.as_query_engine()
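as_query_engine accepts retrieval and synthesis options as keyword arguments; for example, similarity_top_k controls how many chunks are retrieved for each query. A small optional sketch (the value 3 is just an illustration):

# Optional: retrieve more context per query (the value is illustrative).
query_engine = index.as_query_engine(similarity_top_k=3)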
In [ ]
response = query_engine.query("What did the author do growing up?")
display(Markdown(f"<b>{response}</b>"))
Based on the context information provided, the author's activities growing up included:
- Writing short stories, which he described as "awful" with "hardly any plot".
- Programming on an IBM 1401 computer in 9th grade, using an early version of Fortran.
- Building simple games, a program to predict how high his model rockets would fly, and a word processor for his father.
- Reading science fiction novels such as Heinlein's The Moon is a Harsh Mistress, which sparked his interest in working on AI.
- Living in Florence, Italy, and walking through the city's streets to the Accademia.
Note that these activities are mentioned in the text and are not based on prior knowledge or assumptions.
Streaming Support¶
In [ ]
query_engine = index.as_query_engine(streaming=True)
response = query_engine.query("What happened at interleaf?")
for token in response.response_gen:
print(token, end="")
Based on the context information provided, it appears that the author worked at Interleaf, a company that made software for creating and managing documents. The author mentions that Interleaf was "on the way down" and that the company's Release Engineering group was large compared to the group that actually wrote the software. It is inferred that Interleaf was experiencing financial difficulties and that the author was nervous about money. However, there is no explicit mention of what specifically happened at Interleaf.
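Instead of iterating over response_gen manually, the streaming response also exposes a convenience method that prints tokens as they arrive:

# Equivalent to the manual loop above.
response = query_engine.query("What happened at interleaf?")
response.print_response_stream()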