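Google GenAI Embeddings

Built on Google's google-genai package, LlamaIndex provides a GoogleGenAIEmbedding class that lets you embed text with Google's GenAI models through both the Gemini and Vertex AI APIs.

If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.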

%pip install llama-index-embeddings-google-genai

import os

os.environ["GOOGLE_API_KEY"] = "..."
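The cell above assigns a placeholder value. As a minimal sketch, assuming the key is supplied through the GOOGLE_API_KEY environment variable, a small guard can fail early with a clear message when the key is missing. The helper name require_google_api_key is hypothetical and not part of LlamaIndex or google-genai.

```python
import os


def require_google_api_key(env=None):
    """Return the API key from the environment, or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes the helper
    easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    key = env.get("GOOGLE_API_KEY", "").strip()
    # Treat an unset variable and the "..." placeholder as missing.
    if not key or key == "...":
        raise RuntimeError(
            "Set GOOGLE_API_KEY before creating the embedding model."
        )
    return key
```

Calling require_google_api_key() before constructing the model surfaces a missing key immediately rather than at the first embedding request.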

Setup

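GoogleGenAIEmbedding is a wrapper around the google-genai package, which means it supports both the Gemini and Vertex AI APIs out of the box.

You can pass in the api_key directly, or pass in a vertexai_config to use the Vertex AI API.

Other options include embed_batch_size, model_name, and embedding_config.

The default model is text-embedding-004.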

from llama_index.embeddings.google_genai import GoogleGenAIEmbedding
from google.genai.types import EmbedContentConfig

embed_model = GoogleGenAIEmbedding(
    model_name="text-embedding-004",
    embed_batch_size=100,
    # can pass in the api key directly
    # api_key="...",
    # or pass in a vertexai_config
    # vertexai_config={
    #     "project": "...",
    #     "location": "...",
    # }
    # can also pass in an embedding_config
    # embedding_config=EmbedContentConfig(...)
)
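To make the embed_batch_size option more concrete, here is a rough sketch of what batching means: a list of inputs is split into consecutive groups of at most that size. The chunk helper is hypothetical and only illustrates the idea; how LlamaIndex batches requests internally may differ.

```python
def chunk(texts, batch_size=100):
    """Yield consecutive slices of `texts`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield texts[start : start + batch_size]


# 250 placeholder inputs split with a batch size of 100.
texts = [f"document {i}" for i in range(250)]
batch_sizes = [len(batch) for batch in chunk(texts, batch_size=100)]
print(batch_sizes)  # [100, 100, 50]
```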

Usage

Sync

embeddings = embed_model.get_text_embedding("Google Gemini Embeddings.")
print(embeddings[:5])
print(f"Dimension of embeddings: {len(embeddings)}")

[0.031099992, 0.02192731, -0.06523498, 0.016788177, 0.0392835]
Dimension of embeddings: 768

embeddings = embed_model.get_query_embedding("Query Google Gemini Embeddings.")
print(embeddings[:5])
print(f"Dimension of embeddings: {len(embeddings)}")

[0.022199392, 0.03671178, -0.06874573, 0.02195774, 0.05475164]
Dimension of embeddings: 768

embeddings = embed_model.get_text_embedding_batch(
    [
        "Google Gemini Embeddings.",
        "Google is awesome.",
        "Llamaindex is awesome.",
    ]
)
print(f"Got {len(embeddings)} embeddings")
print(f"Dimension of embeddings: {len(embeddings[0])}")

Got 3 embeddings
Dimension of embeddings: 768
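Once you have several embedding vectors, a common next step is to compare them. The sketch below computes cosine similarity in plain Python. The three-dimensional vectors are toy values chosen for illustration, not output from the model, and the function name cosine_similarity is our own.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings.
doc_a = [1.0, 0.0, 0.0]
doc_b = [1.0, 1.0, 0.0]
doc_c = [0.0, 0.0, 1.0]

print(cosine_similarity(doc_a, doc_b))  # about 0.707
print(cosine_similarity(doc_a, doc_c))  # 0.0
```

With real output from get_text_embedding_batch, the same function can be called on any two entries of the returned list, for example cosine_similarity(embeddings[0], embeddings[1]).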

Async

embeddings = await embed_model.aget_text_embedding("Google Gemini Embeddings.")
print(embeddings[:5])
print(f"Dimension of embeddings: {len(embeddings)}")

[0.031099992, 0.02192731, -0.06523498, 0.016788177, 0.0392835]
Dimension of embeddings: 768

embeddings = await embed_model.aget_query_embedding(
    "Query Google Gemini Embeddings."
)
print(embeddings[:5])
print(f"Dimension of embeddings: {len(embeddings)}")

[0.022199392, 0.03671178, -0.06874573, 0.02195774, 0.05475164]
Dimension of embeddings: 768

embeddings = await embed_model.aget_text_embedding_batch(
    [
        "Google Gemini Embeddings.",
        "Google is awesome.",
        "Llamaindex is awesome.",
    ]
)
print(f"Got {len(embeddings)} embeddings")
print(f"Dimension of embeddings: {len(embeddings[0])}")

Got 3 embeddings
Dimension of embeddings: 768
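The async cells above use top-level await, which works in a notebook. In a regular Python script the coroutines need an event loop, for example through asyncio.run. The sketch below shows that pattern together with asyncio.gather for issuing several requests concurrently. StandInEmbedding is a hypothetical stand-in that returns toy vectors so the example runs without network access; it is not the real model, and it only mirrors the aget_text_embedding method name used above.

```python
import asyncio


class StandInEmbedding:
    """Hypothetical stand-in that mimics the async method used in this notebook."""

    async def aget_text_embedding(self, text):
        await asyncio.sleep(0)  # yield control, as a network call would
        return [float(len(text)), 0.0, 0.0]  # toy vector, not model output


async def embed_all(model, texts):
    # Start one coroutine per text and wait for all of them.
    # asyncio.gather returns results in the same order as the inputs.
    return await asyncio.gather(
        *(model.aget_text_embedding(text) for text in texts)
    )


def main():
    texts = ["Google Gemini Embeddings.", "Google is awesome."]
    vectors = asyncio.run(embed_all(StandInEmbedding(), texts))
    print(f"Got {len(vectors)} embeddings")
    return vectors
```

Passing the embed_model created in the setup section in place of StandInEmbedding() should work the same way, since it exposes the same async method.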