Guide: Using a Vector Store Index with an Existing Pinecone Vector Store

If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

%pip install llama-index-embeddings-openai
%pip install llama-index-vector-stores-pinecone

!pip install llama-index

import os
import pinecone

# Note: pinecone.init() belongs to the legacy pinecone-client (v2.x) API and was removed in v3.
# environment must match the environment of your Pinecone project.
api_key = os.environ["PINECONE_API_KEY"]
pinecone.init(api_key=api_key, environment="eu-west1-gcp")

Prepare a sample "existing" Pinecone vector store

Create the index

indexes = pinecone.list_indexes()
print(indexes)

['quickstart-index']

if "quickstart-index" not in indexes:
    # dimensions are for text-embedding-ada-002
    pinecone.create_index(
        "quickstart-index", dimension=1536, metric="euclidean", pod_type="p1"
    )
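
The dimension argument has to equal the length of the vectors the embedding model produces (1536 for text-embedding-ada-002), and metric decides how Pinecone compares those vectors. As a rough illustration only (plain Python, not Pinecone's implementation), the sketch below computes the two most common metrics and rejects vectors of mismatched length. It also shows why the choice matters less than it might seem for unit-length vectors: there, squared Euclidean distance equals 2 - 2 × cosine similarity, so both metrics rank neighbours in the same order. The function names and the toy vectors u and v are hypothetical.

```python
import math


def euclidean_distance(a: list[float], b: list[float]) -> float:
    """Straight-line distance between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} != {len(b)}")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} != {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Two unit-length toy vectors (hypothetical, 2-d instead of 1536-d).
u = [0.6, 0.8]
v = [1.0, 0.0]

# For unit vectors both lines print roughly 0.8 (up to float rounding).
print(euclidean_distance(u, v) ** 2)
print(2 - 2 * cosine_similarity(u, v))
```

A vector whose length differs from the index dimension is rejected here with a ValueError. Pinecone likewise refuses vectors that do not match the index dimension.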

pinecone_index = pinecone.Index("quickstart-index")

# Remove any vectors left over from earlier runs so the index starts empty
pinecone_index.delete(delete_all=True)

{}

Define sample data

We create four sample books.

books = [
    {
        "title": "To Kill a Mockingbird",
        "author": "Harper Lee",
        "content": (
            "To Kill a Mockingbird is a novel by Harper Lee published in"
            " 1960..."
        ),
        "year": 1960,
    },
    {
        "title": "1984",
        "author": "George Orwell",
        "content": (
            "1984 is a dystopian novel by George Orwell published in 1949..."
        ),
        "year": 1949,
    },
    {
        "title": "The Great Gatsby",
        "author": "F. Scott Fitzgerald",
        "content": (
            "The Great Gatsby is a novel by F. Scott Fitzgerald published in"
            " 1925..."
        ),
        "year": 1925,
    },
    {
        "title": "Pride and Prejudice",
        "author": "Jane Austen",
        "content": (
            "Pride and Prejudice is a novel by Jane Austen published in"
            " 1813..."
        ),
        "year": 1813,
    },
]

Add data

We add the sample books to our Pinecone index, embedding each book's "content" field and storing the full book record as metadata.

import uuid
from llama_index.embeddings.openai import OpenAIEmbedding

embed_model = OpenAIEmbedding()

entries = []
for book in books:
    vector = embed_model.get_text_embedding(book["content"])
    entries.append(
        {"id": str(uuid.uuid4()), "values": vector, "metadata": book}
    )
pinecone_index.upsert(entries)
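
Each upsert record above is a dict with an "id", the embedding "values", and the raw book as "metadata". To show that shape without network access, the sketch below swaps the OpenAI call for a hypothetical deterministic stand-in, fake_embed (hash bytes scaled to [0, 1], 8 dimensions instead of 1536; not a real embedding). It also adds a hypothetical batched helper, since larger datasets are usually upserted over several requests. The batch size of 100 is an illustrative assumption, not a documented Pinecone limit.

```python
import hashlib
import uuid
from typing import Callable, Iterator


def fake_embed(text: str, dim: int = 8) -> list[float]:
    """Hypothetical offline stand-in for embed_model.get_text_embedding."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()  # 32 bytes
    return [b / 255.0 for b in digest[:dim]]


def build_entries(books: list[dict], embed: Callable[[str], list[float]]) -> list[dict]:
    """Build Pinecone-style upsert records: a random id, the vector, and the book as metadata."""
    return [
        {"id": str(uuid.uuid4()), "values": embed(book["content"]), "metadata": book}
        for book in books
    ]


def batched(items: list, batch_size: int = 100) -> Iterator[list]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start : start + batch_size]


sample_books = [
    {"title": "1984", "author": "George Orwell",
     "content": "1984 is a dystopian novel...", "year": 1949},
    {"title": "Pride and Prejudice", "author": "Jane Austen",
     "content": "Pride and Prejudice is a novel...", "year": 1813},
]
entries = build_entries(sample_books, fake_embed)

for batch in batched(entries, batch_size=1):
    # With a real index this would be: pinecone_index.upsert(batch)
    print(len(batch), batch[0]["metadata"]["title"])
```

Because uuid4 ids are random, rerunning the notebook's cell inserts new copies rather than overwriting old ones. This is one reason the index is cleared first.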

{'upserted_count': 4}

Query the "existing" Pinecone vector store

from llama_index.vector_stores.pinecone import PineconeVectorStore
from llama_index.core import VectorStoreIndex
from llama_index.core.response.pprint_utils import pprint_source_node

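You must tell the vector store which metadata field holds the node text. Here, text_key="content" points it at each record's "content" field.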

vector_store = PineconeVectorStore(
    pinecone_index=pinecone_index, text_key="content"
)
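
Conceptually, text_key splits each stored metadata dict in two: the named field becomes the node's text, and everything else stays as node metadata. The hypothetical helper below (not PineconeVectorStore's actual code) sketches that mapping. It fails loudly when the field is missing, which is the usual symptom of a wrong text_key.

```python
def split_text_and_metadata(metadata: dict, text_key: str) -> tuple[str, dict]:
    """Hypothetical helper: separate the text field from the remaining metadata."""
    if text_key not in metadata:
        raise KeyError(f"record has no {text_key!r} field; check text_key")
    text = metadata[text_key]
    rest = {key: value for key, value in metadata.items() if key != text_key}
    return text, rest


record_metadata = {
    "title": "To Kill a Mockingbird",
    "author": "Harper Lee",
    "content": "To Kill a Mockingbird is a novel by Harper Lee published in 1960...",
    "year": 1960.0,
}
text, meta = split_text_and_metadata(record_metadata, text_key="content")
print(text)
print(meta)
```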

retriever = VectorStoreIndex.from_vector_store(vector_store).as_retriever(
    similarity_top_k=1
)
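
similarity_top_k=1 asks the retriever for only the single closest match to the query embedding. Pinecone does this search server-side. The brute-force sketch below only illustrates what "top-k by Euclidean distance" means. It uses hypothetical 3-d toy vectors and ids, not real embeddings.

```python
import heapq
import math


def top_k_nearest(
    query: list[float], store: dict[str, list[float]], k: int = 1
) -> list[tuple[str, float]]:
    """Brute-force k nearest neighbours by Euclidean distance, closest first."""
    scored = ((doc_id, math.dist(query, vec)) for doc_id, vec in store.items())
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


# Hypothetical toy vectors standing in for book embeddings.
toy_store = {
    "mockingbird": [0.9, 0.1, 0.0],
    "1984": [0.1, 0.9, 0.0],
    "gatsby": [0.0, 0.2, 0.9],
}
print(top_k_nearest([0.8, 0.2, 0.1], toy_store, k=1))
```

Raising k returns more candidates in ascending-distance order. This is the same trade-off as raising similarity_top_k when one match is not enough context.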

nodes = retriever.retrieve("What is that book about a bird again?")

Let's inspect the retrieved node. The book data has been loaded as a LlamaIndex Node object, with the "content" field as its main text.

pprint_source_node(nodes[0])

Document ID: 07e47f1d-cb90-431b-89c7-35462afcda28
Similarity: 0.797243237
Text: author: Harper Lee title: To Kill a Mockingbird year: 1960.0  To
Kill a Mockingbird is a novel by Harper Lee published in 1960......

The remaining fields are loaded as metadata (in node.metadata):

nodes[0].node.metadata

{'author': 'Harper Lee', 'title': 'To Kill a Mockingbird', 'year': 1960.0}
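
Note that year comes back as 1960.0 even though it was upserted as 1960. The round trip through Pinecone returned the integer as a float, which is consistent with Pinecone storing numeric metadata as floating-point numbers. If downstream code expects integers, a small post-processing step can restore them. The helper below is a hypothetical sketch of that step.

```python
def restore_whole_numbers(metadata: dict) -> dict:
    """Convert floats with no fractional part (e.g. 1960.0) back to int; leave other values alone."""
    return {
        key: int(value) if isinstance(value, float) and value.is_integer() else value
        for key, value in metadata.items()
    }


print(restore_whole_numbers({"author": "Harper Lee", "title": "To Kill a Mockingbird", "year": 1960.0}))
# {'author': 'Harper Lee', 'title': 'To Kill a Mockingbird', 'year': 1960}
```

This converts every whole-valued float, so apply it only to fields you know were integers. A genuine float such as a rating of 4.0 would otherwise be changed too.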