Guide: Using Vector Store Index with Existing Weaviate Vector Store¶

If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

%pip install llama-index-vector-stores-weaviate
%pip install llama-index-embeddings-openai

!pip install llama-index

import weaviate

client = weaviate.Client("https://test-cluster-bbn8vqsn.weaviate.network")

Prepare a sample "existing" Weaviate vector store¶

Define schema¶

We create a schema for the "Book" class with 4 properties: title (str), author (str), content (str), and year (int).

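# Start from a clean slate: drop any "Book" class left over from a previous run.
try:
    client.schema.delete_class("Book")
except Exception:
    # The class may not exist yet; that is fine for this example.
    pass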

schema = {
    "classes": [
        {
            "class": "Book",
            "properties": [
                {"name": "title", "dataType": ["text"]},
                {"name": "author", "dataType": ["text"]},
                {"name": "content", "dataType": ["text"]},
                {"name": "year", "dataType": ["int"]},
            ],
        },
    ]
}

if not client.schema.contains(schema):
    client.schema.create(schema)
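
Before loading data, it can help to confirm that each object actually fits the "Book" properties described above. The following is a minimal sketch in plain Python, not part of the weaviate client: `find_schema_problems` and `PYTHON_TYPES` are hypothetical names introduced here, and the type mapping only covers the two data types used in this schema.

```python
# Hypothetical helper for illustration; not part of the weaviate client.
# Maps the Weaviate data types used in the "Book" schema to Python types.
PYTHON_TYPES = {"text": str, "int": int}

book_class = {
    "class": "Book",
    "properties": [
        {"name": "title", "dataType": ["text"]},
        {"name": "author", "dataType": ["text"]},
        {"name": "content", "dataType": ["text"]},
        {"name": "year", "dataType": ["int"]},
    ],
}


def find_schema_problems(obj, class_def):
    """Return a list of mismatches between obj and the class definition."""
    problems = []
    expected = {p["name"]: p["dataType"][0] for p in class_def["properties"]}
    # Every declared property must be present with the expected Python type.
    for name, data_type in expected.items():
        if name not in obj:
            problems.append(f"missing property: {name}")
        elif not isinstance(obj[name], PYTHON_TYPES[data_type]):
            problems.append(f"{name} should be {data_type}")
    # Flag keys that the class does not declare.
    for name in obj:
        if name not in expected:
            problems.append(f"unexpected property: {name}")
    return problems


good = {"title": "1984", "author": "George Orwell", "content": "...", "year": 1949}
bad = {"title": "1984", "author": "George Orwell", "year": "1949"}

print(find_schema_problems(good, book_class))  # []
print(find_schema_problems(bad, book_class))
# ['missing property: content', 'year should be int']
```

A check like this runs locally, so it catches shape errors before any request is sent to the cluster.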

Define sample data¶

We create 4 sample books.

books = [
    {
        "title": "To Kill a Mockingbird",
        "author": "Harper Lee",
        "content": (
            "To Kill a Mockingbird is a novel by Harper Lee published in"
            " 1960..."
        ),
        "year": 1960,
    },
    {
        "title": "1984",
        "author": "George Orwell",
        "content": (
            "1984 is a dystopian novel by George Orwell published in 1949..."
        ),
        "year": 1949,
    },
    {
        "title": "The Great Gatsby",
        "author": "F. Scott Fitzgerald",
        "content": (
            "The Great Gatsby is a novel by F. Scott Fitzgerald published in"
            " 1925..."
        ),
        "year": 1925,
    },
    {
        "title": "Pride and Prejudice",
        "author": "Jane Austen",
        "content": (
            "Pride and Prejudice is a novel by Jane Austen published in"
            " 1813..."
        ),
        "year": 1813,
    },
]

Add data¶

We add the sample books to our Weaviate "Book" class (with an embedding of the content field).

from llama_index.embeddings.openai import OpenAIEmbedding

embed_model = OpenAIEmbedding()

with client.batch as batch:
    for book in books:
        vector = embed_model.get_text_embedding(book["content"])
        batch.add_data_object(
            data_object=book, class_name="Book", vector=vector
        )
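
The loop above pairs each book with a vector computed from its content field. The sketch below separates that pairing step from the upload so it can be inspected offline. It is only an illustration: `fake_embed` is a made-up stand-in for `embed_model.get_text_embedding` and does not produce meaningful embeddings, and `build_batch_items` is a hypothetical helper, not a weaviate or LlamaIndex function.

```python
def fake_embed(text, dim=4):
    """Stand-in for a real embedding model; NOT a meaningful embedding."""
    # Deterministic toy vector built from character codes.
    vector = [0.0] * dim
    for i, ch in enumerate(text):
        vector[i % dim] += ord(ch) / 1000.0
    return vector


def build_batch_items(books, embed, text_field="content"):
    """Pair every object with the vector computed from its text field."""
    items = []
    for book in books:
        items.append(
            {
                "data_object": book,
                "class_name": "Book",
                "vector": embed(book[text_field]),
            }
        )
    return items


sample_books = [
    {
        "title": "1984",
        "author": "George Orwell",
        "content": "1984 is a dystopian novel by George Orwell published in 1949...",
        "year": 1949,
    },
    {
        "title": "Pride and Prejudice",
        "author": "Jane Austen",
        "content": "Pride and Prejudice is a novel by Jane Austen published in 1813...",
        "year": 1813,
    },
]

items = build_batch_items(sample_books, fake_embed)
# All vectors stored in one class should share the same dimensionality.
dims = {len(item["vector"]) for item in items}
print(len(items), dims)  # 2 {4}
```

The keys of each item mirror the keyword arguments passed to `batch.add_data_object` in the cell above, so the same dictionaries could be unpacked into that call.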

Query against the "existing" Weaviate vector store¶

from llama_index.vector_stores.weaviate import WeaviateVectorStore
from llama_index.core import VectorStoreIndex
from llama_index.core.response.pprint_utils import pprint_source_node

You must specify an "index_name" that matches the desired Weaviate class, and select one of that class's properties as the "text" field.

vector_store = WeaviateVectorStore(
    weaviate_client=client, index_name="Book", text_key="content"
)
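
Because a mismatched class name or text property tends to surface only at query time, a small local check can be useful. The sketch below validates the two arguments against a schema dictionary shaped like the `schema` defined earlier. `check_store_arguments` is a hypothetical helper written for this guide, not something provided by LlamaIndex or weaviate.

```python
def check_store_arguments(schema, index_name, text_key):
    """Raise ValueError if index_name or text_key does not fit the schema."""
    classes = {c["class"]: c for c in schema["classes"]}
    if index_name not in classes:
        raise ValueError(f"no class named {index_name!r} in schema")
    props = {p["name"]: p["dataType"] for p in classes[index_name]["properties"]}
    if text_key not in props:
        raise ValueError(f"class {index_name!r} has no property {text_key!r}")
    # The text field should hold text, not a number or a reference.
    if props[text_key] != ["text"]:
        raise ValueError(f"property {text_key!r} is not a text property")


schema_sketch = {
    "classes": [
        {
            "class": "Book",
            "properties": [
                {"name": "title", "dataType": ["text"]},
                {"name": "author", "dataType": ["text"]},
                {"name": "content", "dataType": ["text"]},
                {"name": "year", "dataType": ["int"]},
            ],
        },
    ]
}

# Matches the arguments used in this guide, so no exception is raised.
check_store_arguments(schema_sketch, "Book", "content")

# A numeric property is rejected as the text field.
try:
    check_store_arguments(schema_sketch, "Book", "year")
except ValueError as err:
    print(err)  # property 'year' is not a text property
```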

retriever = VectorStoreIndex.from_vector_store(vector_store).as_retriever(
    similarity_top_k=1
)
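
With similarity_top_k=1 the retriever returns only the single closest match. For intuition, here is a brute-force sketch of top-k selection by cosine similarity. It is an illustration only: Weaviate runs the search server-side with its own vector index, and the 3-dimensional vectors below are made up rather than real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, stored, k=1):
    """Return the k (title, score) pairs most similar to query_vector."""
    scored = [
        (title, cosine_similarity(query_vector, vec))
        for title, vec in stored.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up toy vectors, for illustration only.
stored_vectors = {
    "To Kill a Mockingbird": [0.9, 0.1, 0.0],
    "1984": [0.1, 0.9, 0.1],
    "The Great Gatsby": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]

best_title, best_score = top_k(query, stored_vectors, k=1)[0]
print(best_title)  # To Kill a Mockingbird
```

Raising k returns more candidates in descending order of similarity, which is what a larger similarity_top_k asks the retriever for.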

nodes = retriever.retrieve("What is that book about a bird again?")

Let's inspect the retrieved node. We can see that the book data is loaded as a LlamaIndex Node object, with the "content" field as the main text.

pprint_source_node(nodes[0])

Document ID: cf927ce7-0672-4696-8aae-7e77b33b9659
Similarity: None
Text: author: Harper Lee title: To Kill a Mockingbird year: 1960  To
Kill a Mockingbird is a novel by Harper Lee published in 1960......

The remaining fields should be loaded as metadata (in "metadata").

nodes[0].node.metadata

{'author': 'Harper Lee', 'title': 'To Kill a Mockingbird', 'year': 1960}
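
The split between main text and metadata shown above can be pictured with a few lines of plain Python. This is a conceptual sketch under the assumption that the property named by text_key becomes the node text and every other property becomes metadata; `split_object` is a hypothetical helper and not LlamaIndex's actual implementation.

```python
def split_object(properties, text_key):
    """Separate the text property from the rest, which becomes metadata."""
    text = properties[text_key]
    metadata = {k: v for k, v in properties.items() if k != text_key}
    return text, metadata


stored_object = {
    "title": "To Kill a Mockingbird",
    "author": "Harper Lee",
    "content": "To Kill a Mockingbird is a novel by Harper Lee published in 1960...",
    "year": 1960,
}

text, metadata = split_object(stored_object, text_key="content")
print(text)
print(metadata)
# {'title': 'To Kill a Mockingbird', 'author': 'Harper Lee', 'year': 1960}
```

Choosing a different text_key, such as "title", would move "content" into the metadata instead.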