Elasticsearch
Elasticsearch is a search-engine database that supports full-text search and vector search.
Basic Example
In this basic example, we take a Paul Graham essay, split it into chunks, embed it with an open-source embedding model, load it into Elasticsearch, and then query it. For examples using different retrieval strategies, see the Elasticsearch Vector Store guide.
If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.
%pip install -qU llama-index-vector-stores-elasticsearch llama-index-embeddings-huggingface llama-index
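The code below assumes an Elasticsearch instance is reachable at `http://localhost:9200`. If you don't have one running, one quick way to start a single-node instance for local testing (assuming Docker is installed; the image tag is an example, and security is disabled here for local experimentation only, never for production) is:

```shell
# Start a throwaway single-node Elasticsearch for local testing (no auth).
# Pick a current 8.x tag; 8.13.4 below is just an example.
docker run -d --name es-dev -p 9200:9200 \
  -e "discovery.type=single-node" \
  -e "xpack.security.enabled=false" \
  docker.elastic.co/elasticsearch/elasticsearch:8.13.4
```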
# import
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.vector_stores.elasticsearch import ElasticsearchStore
from llama_index.core import StorageContext
# set up OpenAI
import os
import getpass
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
Download Data
!mkdir -p 'data/paul_graham/'
!wget -nv 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
2024-05-13 15:10:43 URL:https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt [75042/75042] -> "data/paul_graham/paul_graham_essay.txt" [1]
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings
# define embedding function
Settings.embed_model = HuggingFaceEmbedding(
model_name="BAAI/bge-small-en-v1.5"
)
# load documents
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
# define index
vector_store = ElasticsearchStore(
es_url="http://localhost:9200", # see Elasticsearch Vector Store for more authentication options
index_name="paul_graham_essay",
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
documents, storage_context=storage_context
)
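The store above connects to a local, unsecured instance. For a secured deployment such as Elastic Cloud, `ElasticsearchStore` also accepts credential parameters instead of `es_url`. A configuration sketch (the cloud ID and API key values are placeholders you would take from your own deployment):

```python
# Connecting to Elastic Cloud instead of a local instance.
# The es_cloud_id / es_api_key values are placeholders.
from llama_index.vector_stores.elasticsearch import ElasticsearchStore

vector_store = ElasticsearchStore(
    index_name="paul_graham_essay",
    es_cloud_id="<your-cloud-id>",  # from the Elastic Cloud console
    es_api_key="<your-api-key>",    # or pass es_user / es_password instead
)
```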
# Query Data
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(response)
The author worked on writing and programming outside of school. They wrote short stories and tried writing programs on an IBM 1401 computer. They also built a microcomputer kit and started programming on it, writing simple games and a word processor.