Setup
If you are opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.
%pip install llama-index-embeddings-huggingface
%pip install llama-index-llms-openai
!pip install llama-index
import os
import openai
os.environ["OPENAI_API_KEY"] = "sk-..."
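A key written directly into a notebook is easy to leak when the notebook is shared. As a minimal sketch of one alternative, the helper below reuses `OPENAI_API_KEY` if it is already set and otherwise prompts for it without echoing the input. The name `ensure_openai_key` is hypothetical and not part of LlamaIndex or the OpenAI client; `getpass` comes from the Python standard library.

```python
import os
from getpass import getpass


def ensure_openai_key(env=os.environ):
    """Return the OpenAI key from `env`, prompting once if it is missing.

    `ensure_openai_key` is a hypothetical helper written for this example.
    """
    if not env.get("OPENAI_API_KEY"):
        # getpass hides the typed value, so the key never appears in cell output.
        env["OPENAI_API_KEY"] = getpass("OpenAI API key: ")
    return env["OPENAI_API_KEY"]


# Usage (left commented out so the cell does not block waiting for input):
# ensure_openai_key()
```

Calling `ensure_openai_key()` once before `Settings.llm` is configured should be enough, since the OpenAI client reads the key from the environment.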
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings
Settings.llm = OpenAI(model="gpt-3.5-turbo-instruct", temperature=0.1)
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")
/home/loganm/miniconda3/envs/llama-index/lib/python3.11/site-packages/torch/cuda/__init__.py:546: UserWarning: Can't initialize NVML
  warnings.warn("Can't initialize NVML")
Download Data
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
from llama_index.core import SimpleDirectoryReader
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
from llama_index.core import VectorStoreIndex
index = VectorStoreIndex.from_documents(documents)
Run Query
from llama_index.core.postprocessor import LongContextReorder
reorder = LongContextReorder()
reorder_engine = index.as_query_engine(
    node_postprocessors=[reorder], similarity_top_k=5
)
base_engine = index.as_query_engine(similarity_top_k=5)
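The two engines retrieve the same five nodes; the only difference is the order in which the postprocessor hands them to the LLM. The sketch below illustrates the general idea of long-context reordering, which is to place the highest-scoring items at the start and end of the context and the lowest-scoring ones in the middle. It is a standalone illustration, not the library's code: the function name `reorder_for_long_context` and the scores are made up for the example, and the actual `LongContextReorder` implementation may differ in detail.

```python
from typing import List, Tuple


def reorder_for_long_context(
    scored: List[Tuple[str, float]]
) -> List[Tuple[str, float]]:
    """Place the highest-scoring items at both ends of the list.

    Hypothetical helper for illustration; not part of LlamaIndex.
    """
    ascending = sorted(scored, key=lambda pair: pair[1])
    reordered: List[Tuple[str, float]] = []
    for i, pair in enumerate(ascending):
        if i % 2 == 0:
            # Even positions in the ascending list go to the front.
            reordered.insert(0, pair)
        else:
            # Odd positions go to the back.
            reordered.append(pair)
    return reordered


# Made-up similarity scores, listed from best (A) to worst (E).
hits = [("A", 0.9), ("B", 0.8), ("C", 0.7), ("D", 0.6), ("E", 0.5)]
print([name for name, _ in reorder_for_long_context(hits)])
# prints ['A', 'C', 'E', 'D', 'B']
```

With five items, the best hit stays first, the second-best moves to the end, and the weakest lands in the middle. The source listings printed at the end of this notebook show the same pattern.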
from llama_index.core.response.notebook_utils import display_response
base_response = base_engine.query("Did the author meet Sam Altman?")
display_response(base_response)
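Final Response:
Yes, the author met Sam Altman when they asked him to be the president of Y Combinator. This was during the time when the author was in a PhD program in computer science and also pursuing their passion for art. They were applying to art schools and eventually ended up attending RISD.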
reorder_response = reorder_engine.query("Did the author meet Sam Altman?")
display_response(reorder_response)
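Final Response:
Yes, the author met Sam Altman when they asked him to be the president of Y Combinator. The meeting took place at a party at the author's house, where they were introduced by a mutual friend, Jessica Livingston. Jessica later compiled a book of interviews with startup founders, and the author shared their thoughts on the flaws of venture capital with her during her job search at a Boston VC firm.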
Inspect Order Differences
print(base_response.get_formatted_sources())
> Source (Doc id: 81bc66bb-2c45-4697-9f08-9f848bd78b12): [17] As well as HN, I wrote all of YC's internal software in Arc. But while I continued to work ...
> Source (Doc id: bd660905-e4e0-4d02-a113-e3810b59c5d1): [19] One way to get more precise about the concept of invented vs discovered is to talk about spa...
> Source (Doc id: 3932e4a4-f17e-4dd2-9d25-5f0e65910dc5): Not so much because it was badly written as because the problem is so convoluted. When you're wor...
> Source (Doc id: 0d801f0a-4a99-475d-aa7c-ad5d601947ea): [10] Wow, I thought, there's an audience. If I write something and put it on the web, anyone can...
> Source (Doc id: bf726802-4d0d-4ee5-ab2e-ffa8a5461bc4): I was briefly tempted, but they were so slow by present standards; what was the point? No one els...
print(reorder_response.get_formatted_sources())
> Source (Doc id: 81bc66bb-2c45-4697-9f08-9f848bd78b12): [17] As well as HN, I wrote all of YC's internal software in Arc. But while I continued to work ...
> Source (Doc id: 3932e4a4-f17e-4dd2-9d25-5f0e65910dc5): Not so much because it was badly written as because the problem is so convoluted. When you're wor...
> Source (Doc id: bf726802-4d0d-4ee5-ab2e-ffa8a5461bc4): I was briefly tempted, but they were so slow by present standards; what was the point? No one els...
> Source (Doc id: 0d801f0a-4a99-475d-aa7c-ad5d601947ea): [10] Wow, I thought, there's an audience. If I write something and put it on the web, anyone can...
> Source (Doc id: bd660905-e4e0-4d02-a113-e3810b59c5d1): [19] One way to get more precise about the concept of invented vs discovered is to talk about spa...
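Comparing the two listings by eye gets tedious as `similarity_top_k` grows. The sketch below reports which node ids changed position between two responses. It assumes each response exposes `source_nodes`, with the id available as `.node.node_id` on every entry; check this against your installed LlamaIndex version. The helpers `source_ids`, `position_changes`, and `fake_response` are hypothetical, and the demo uses stand-in objects with shortened ids from the output above so that it runs without an index or an API key.

```python
from types import SimpleNamespace


def source_ids(response):
    """List node ids in the order stored on the response (assumed attributes)."""
    return [item.node.node_id for item in response.source_nodes]


def position_changes(base_ids, new_ids):
    """Map each id that moved to its (old_index, new_index) pair."""
    new_index = {node_id: i for i, node_id in enumerate(new_ids)}
    return {
        node_id: (i, new_index[node_id])
        for i, node_id in enumerate(base_ids)
        if node_id in new_index and new_index[node_id] != i
    }


def fake_response(ids):
    """Build a stand-in response object; for demonstration only."""
    return SimpleNamespace(
        source_nodes=[
            SimpleNamespace(node=SimpleNamespace(node_id=node_id))
            for node_id in ids
        ]
    )


# Shortened doc ids taken from the two listings above.
base = fake_response(["81bc", "bd66", "3932", "0d80", "bf72"])
reordered = fake_response(["81bc", "3932", "bf72", "0d80", "bd66"])

changes = position_changes(source_ids(base), source_ids(reordered))
print(changes)
# prints {'bd66': (1, 4), '3932': (2, 1), 'bf72': (4, 2)}
```

In the notebook itself, `position_changes(source_ids(base_response), source_ids(reorder_response))` would run the same comparison on the real responses, provided the attribute assumption holds.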