Azure AI Search
Basic Example
In this notebook, we take a Paul Graham essay, split it into chunks, embed it with an Azure OpenAI embedding model, load it into an Azure AI Search index, and then query it.
If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.
In [ ]
!pip install llama-index
!pip install wget
%pip install llama-index-vector-stores-azureaisearch
%pip install azure-search-documents==11.5.1
%pip install llama-index-embeddings-azure-openai
%pip install llama-index-llms-azure-openai
In [ ]
import logging
import sys

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient
from IPython.display import Markdown, display
from llama_index.core import (
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
)
from llama_index.core.settings import Settings
from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding
from llama_index.vector_stores.azureaisearch import AzureAISearchVectorStore
from llama_index.vector_stores.azureaisearch import (
    IndexManagement,
    MetadataIndexFieldType,
)
Set up Azure OpenAI
In [ ]
aoai_api_key = "YOUR_AZURE_OPENAI_API_KEY"
aoai_endpoint = "YOUR_AZURE_OPENAI_ENDPOINT"
aoai_api_version = "2024-10-21"

llm = AzureOpenAI(
    model="YOUR_AZURE_OPENAI_COMPLETION_MODEL_NAME",
    deployment_name="YOUR_AZURE_OPENAI_COMPLETION_DEPLOYMENT_NAME",
    api_key=aoai_api_key,
    azure_endpoint=aoai_endpoint,
    api_version=aoai_api_version,
)

# You need to deploy your own embedding model as well as your own chat completion model
embed_model = AzureOpenAIEmbedding(
    model="YOUR_AZURE_OPENAI_EMBEDDING_MODEL_NAME",
    deployment_name="YOUR_AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME",
    api_key=aoai_api_key,
    azure_endpoint=aoai_endpoint,
    api_version=aoai_api_version,
)
Set up Azure AI Search
In [ ]
search_service_api_key = "YOUR-AZURE-SEARCH-SERVICE-ADMIN-KEY"
search_service_endpoint = "YOUR-AZURE-SEARCH-SERVICE-ENDPOINT"
search_service_api_version = "2024-07-01"
credential = AzureKeyCredential(search_service_api_key)

# Index name to use
index_name = "llamaindex-vector-demo"

# Use index client to demonstrate creating an index
index_client = SearchIndexClient(
    endpoint=search_service_endpoint,
    credential=credential,
)

# Use search client to demonstrate using an existing index
search_client = SearchClient(
    endpoint=search_service_endpoint,
    index_name=index_name,
    credential=credential,
)
Create an index (if one does not exist)
Demonstrates creating a vector index named "llamaindex-vector-demo" if one does not exist. The index contains the following fields:
| Field Name | OData Type |
|---|---|
| id | Edm.String |
| chunk | Edm.String |
| embedding | Collection(Edm.Single) |
| metadata | Edm.String |
| doc_id | Edm.String |
| author | Edm.String |
| theme | Edm.String |
| director | Edm.String |
In [ ]
metadata_fields = {
    "author": "author",
    "theme": ("topic", MetadataIndexFieldType.STRING),
    "director": "director",
}

vector_store = AzureAISearchVectorStore(
    search_or_index_client=index_client,
    filterable_metadata_field_keys=metadata_fields,
    index_name=index_name,
    index_management=IndexManagement.CREATE_IF_NOT_EXISTS,
    id_field_key="id",
    chunk_field_key="chunk",
    embedding_field_key="embedding",
    embedding_dimensionality=1536,
    metadata_string_field_key="metadata",
    doc_id_field_key="doc_id",
    language_analyzer="en.lucene",
    vector_algorithm_type="exhaustiveKnn",
    # compression_type="binary"  # Option to use "scalar" or "binary". NOTE: compression is only supported for HNSW
)
metadata_fields = { "author": "author", "theme": ("topic", MetadataIndexFieldType.STRING), "director": "director", } vector_store = AzureAISearchVectorStore( search_or_index_client=index_client, filterable_metadata_field_keys=metadata_fields, index_name=index_name, index_management=IndexManagement.CREATE_IF_NOT_EXISTS, id_field_key="id", chunk_field_key="chunk", embedding_field_key="embedding", embedding_dimensionality=1536, metadata_string_field_key="metadata", doc_id_field_key="doc_id", language_analyzer="en.lucene", vector_algorithm_type="exhaustiveKnn", # Option to use "scalar" or "binary". NOTE: compression is only supported for HNSW # compression_type="binary" )
In [ ]
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
Load documents
Load the documents stored in data/paul_graham/ using a SimpleDirectoryReader.
In [ ]
# Load documents
documents = SimpleDirectoryReader("data/paul_graham/").load_data()
storage_context = StorageContext.from_defaults(vector_store=vector_store)
Settings.llm = llm
Settings.embed_model = embed_model
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)
In [ ]
# Query Data
query_engine = index.as_query_engine(similarity_top_k=3)
response = query_engine.query("What did the author do growing up?")
display(Markdown(f"<b>{response}</b>"))
The author focused on writing and programming outside of school, writing short stories and trying programming on an IBM 1401 in 9th grade. Later, the author moved on to programming on microcomputers, eventually convincing his father to buy a TRS-80, on which he began writing simple games and a word processor.
In [ ]
response = query_engine.query(
    "What did the author learn?",
)
display(Markdown(f"<b>{response}</b>"))
response = query_engine.query( "作者学到了什么?", ) display(Markdown(f"{response}"))
The author learned about programming on early computers, the limitations of early AI, the value of working on unprestigious things, and the significance of publishing essays online.
Use an existing index
In [ ]
index_name = "llamaindex-vector-demo"
metadata_fields = {
    "author": "author",
    "theme": ("topic", MetadataIndexFieldType.STRING),
    "director": "director",
}
vector_store = AzureAISearchVectorStore(
    search_or_index_client=search_client,
    filterable_metadata_field_keys=metadata_fields,
    index_management=IndexManagement.VALIDATE_INDEX,
    id_field_key="id",
    chunk_field_key="chunk",
    embedding_field_key="embedding",
    embedding_dimensionality=1536,
    metadata_string_field_key="metadata",
    doc_id_field_key="doc_id",
)
index_name = "llamaindex-vector-demo" metadata_fields = { "author": "author", "theme": ("topic", MetadataIndexFieldType.STRING), "director": "director", } vector_store = AzureAISearchVectorStore( search_or_index_client=search_client, filterable_metadata_field_keys=metadata_fields, index_management=IndexManagement.VALIDATE_INDEX, id_field_key="id", chunk_field_key="chunk", embedding_field_key="embedding", embedding_dimensionality=1536, metadata_string_field_key="metadata", doc_id_field_key="doc_id", )
In [ ]
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    [],
    storage_context=storage_context,
)
In [ ]
query_engine = index.as_query_engine()
response = query_engine.query("What was a hard moment for the author?")
display(Markdown(f"<b>{response}</b>"))
A hard moment for the author was when his mother had a stroke caused by colon cancer, which eventually led to her death.
In [ ]
response = query_engine.query("Who is the author?")
display(Markdown(f"<b>{response}</b>"))
response = query_engine.query("作者是谁?") display(Markdown(f"{response}"))
Paul Graham
In [ ]
import time
query_engine = index.as_query_engine(streaming=True)
response = query_engine.query("What happened at interleaf?")
start_time = time.time()
token_count = 0
for token in response.response_gen:
    print(token, end="")
    token_count += 1
time_elapsed = time.time() - start_time
tokens_per_second = token_count / time_elapsed
print(f"\n\nStreamed output at {tokens_per_second} tokens/s")
The company added a scripting language inspired by Emacs, and made the scripting language a dialect of Lisp.

Streamed output at 64.01633939770672 tokens/s
Add documents to an existing index
In [ ]
response = query_engine.query("What colour is the sky?")
display(Markdown(f"<b>{response}</b>"))
response = query_engine.query("天空是什么颜色的?") display(Markdown(f"{response}"))
The colour of the sky varies depending on factors such as the time of day, weather conditions, and location.
In [ ]
from llama_index.core import Document
index.insert_nodes([Document(text="The sky is indigo today")])
In [ ]
response = query_engine.query("What colour is the sky?")
display(Markdown(f"<b>{response}</b>"))
response = query_engine.query("天空是什么颜色的?") display(Markdown(f"{response}"))
The colour of the sky is indigo.
Filtering
Filters can be applied with the `filters` parameter (using llama-index filter syntax) or by passing the `odata_filters` parameter directly.
In [ ]
from llama_index.core.schema import TextNode
nodes = [
    TextNode(
        text="The Shawshank Redemption",
        metadata={
            "author": "Stephen King",
            "theme": "Friendship",
        },
    ),
    TextNode(
        text="The Godfather",
        metadata={
            "director": "Francis Ford Coppola",
            "theme": "Mafia",
        },
    ),
    TextNode(
        text="Inception",
        metadata={
            "director": "Christopher Nolan",
        },
    ),
]
In [ ]
index.insert_nodes(nodes)
In [ ]
from llama_index.core.vector_stores.types import (
    MetadataFilters,
    MetadataFilter,
    FilterOperator,
    FilterCondition,
)

filters = MetadataFilters(
    filters=[
        MetadataFilter(key="theme", value="Mafia", operator=FilterOperator.EQ)
    ],
    # to apply multiple filters, combine them with an AND, OR, or NOT
    # condition (see the sketch below)
    # condition=FilterCondition.AND
)
retriever = index.as_retriever(filters=filters)
retriever.retrieve("What is inception about?")
Out [ ]
[NodeWithScore(node=TextNode(id_='f0c299d8-1f59-4338-9c4e-06b99855ed23', embedding=None, metadata={'director': 'Francis Ford Coppola', 'theme': 'Mafia'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='The Godfather', mimetype='text/plain', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.8120511)]
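The `condition` argument commented out above combines multiple filters. A minimal sketch, assuming the same movie nodes, that requires both the theme and the director to match:

In [ ]
# Sketch: combine two metadata filters with a logical AND
filters = MetadataFilters(
    filters=[
        MetadataFilter(key="theme", value="Mafia", operator=FilterOperator.EQ),
        MetadataFilter(
            key="director",
            value="Francis Ford Coppola",
            operator=FilterOperator.EQ,
        ),
    ],
    condition=FilterCondition.AND,
)
retriever = index.as_retriever(filters=filters)
retriever.retrieve("What is inception about?")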
Alternatively, pass the odata_filters parameter directly:
In [ ]
odata_filters = "theme eq 'Mafia'"
retriever = index.as_retriever(
    vector_store_kwargs={"odata_filters": odata_filters}
)
retriever.retrieve("What is inception about?")
odata_filters = "theme eq 'Mafia'" retriever = index.as_retriever( vector_store_kwargs={"odata_filters": odata_filters} ) retriever.retrieve("《盗梦空间》讲了什么?")
Query Modes
Four query modes are supported: DEFAULT (vector search), SPARSE, HYBRID, and SEMANTIC_HYBRID.
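Of these, SPARSE is the only mode not demonstrated in the cells below; a minimal sketch, assuming SPARSE maps to a text-only (keyword) query against the chunk field with no embedding lookup:

In [ ]
from llama_index.core.vector_stores.types import VectorStoreQueryMode

# Sketch: SPARSE retrieves by full-text relevance only
sparse_retriever = index.as_retriever(
    vector_store_query_mode=VectorStoreQueryMode.SPARSE
)
sparse_retriever.retrieve("What is inception about?")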
Perform a Vector Search
In [ ]
from llama_index.core.vector_stores.types import VectorStoreQueryMode

default_retriever = index.as_retriever(
    vector_store_query_mode=VectorStoreQueryMode.DEFAULT
)
response = default_retriever.retrieve("What is inception about?")

# Loop through each NodeWithScore in the response
for node_with_score in response:
    node = node_with_score.node  # The TextNode object
    score = node_with_score.score  # The similarity score
    chunk_id = node.id_  # The chunk ID

    # Extract the relevant metadata from the node
    file_name = node.metadata.get("file_name", "Unknown")
    file_path = node.metadata.get("file_path", "Unknown")

    # Extract the text content from the node
    text_content = node.text if node.text else "No content available"

    # Print the results in a user-friendly format
    print(f"Score: {score}")
    print(f"File Name: {file_name}")
    print(f"Id: {chunk_id}")
    print("\nExtracted Content:")
    print(text_content)
    print("\n" + "=" * 40 + " End of Result " + "=" * 40 + "\n")
Score: 0.87485534
File Name: Unknown
Id: b4d2af4e-1de0-4cfe-8b18-629722bc12d7

Extracted Content:
Inception

======================================== End of Result ========================================

Score: 0.8120511
File Name: Unknown
Id: f0c299d8-1f59-4338-9c4e-06b99855ed23

Extracted Content:
The Godfather

======================================== End of Result ========================================
Perform a Hybrid Search
In [ ]
from llama_index.core.vector_stores.types import VectorStoreQueryMode
hybrid_retriever = index.as_retriever(
    vector_store_query_mode=VectorStoreQueryMode.HYBRID
)
hybrid_retriever.retrieve("What is inception about?")
Out [ ]
[NodeWithScore(node=TextNode(id_='18de619a-94ca-46d9-bb17-51bd1b5c041a', embedding=None, metadata={'director': 'Christopher Nolan'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='Inception', mimetype='text/plain', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.03333333507180214), NodeWithScore(node=TextNode(id_='226f09aa-b9f1-40eb-a377-35da6f288fa1', embedding=None, metadata={'file_path': 'c:\\Dev\\llama_index\\docs\\docs\\examples\\vector_stores\\..\\data\\paul_graham\\paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file_type': 'text/plain', 'file_size': 75395, 'creation_date': '2024-08-23', 'last_modified_date': '2024-08-23'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='ba4cac4e-640c-41de-b01d-2c0116f753c8', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'file_path': 'c:\\Dev\\llama_index\\docs\\docs\\examples\\vector_stores\\..\\data\\paul_graham\\paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file_type': 'text/plain', 'file_size': 75395, 'creation_date': '2024-08-23', 'last_modified_date': '2024-08-23'}, hash='aec9c37ce12c52b02feb44993a53c84733fc31d3637224bda1c831b42009fc17'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='b4120ac2-8829-4253-87ac-a714c1e680a2', node_type=<ObjectType.TEXT: '1'>, metadata={'file_path': 'c:\\Dev\\llama_index\\docs\\docs\\examples\\vector_stores\\..\\data\\paul_graham\\paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file_type': 'text/plain', 'file_size': 75395, 'creation_date': '2024-08-23', 'last_modified_date': '2024-08-23'}, hash='65795efeb29d8b8af5b61ff4d7b5607dfb0d262cb13f1eec55bb2e3d30654b28'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='7bf9e654-5bbb-40a8-8cb5-12e3737a837d', node_type=<ObjectType.TEXT: '1'>, metadata={}, hash='2cca6e84108227c4e35fdd947606cebb212464fbe7dc51e4204ee27d62533e54')}, text='I recruited Dan Giffin, who had worked for Viaweb, and two undergrads who wanted summer jobs, and we got to work trying to build what it\'s now clear is about twenty companies and several open source projects worth of software. The language for defining applications would of course be a dialect of Lisp. But I wasn\'t so naive as to assume I could spring an overt Lisp on a general audience; we\'d hide the parentheses, like Dylan did.\r\n\r\nBy then there was a name for the kind of company Viaweb was, an "application service provider," or ASP. This name didn\'t last long before it was replaced by "software as a service," but it was current for long enough that I named this new company after it: it was going to be called Aspra.\r\n\r\nI started working on the application builder, Dan worked on network infrastructure, and the two undergrads worked on the first two services (images and phone calls). But about halfway through the summer I realized I really didn\'t want to run a company — especially not a big one, which it was looking like this would have to be. I\'d only started Viaweb because I needed the money. Now that I didn\'t need money anymore, why was I doing this? If this vision had to be realized as a company, then screw the vision. 
I\'d build a subset that could be done as an open source project.\r\n\r\nMuch to my surprise, the time I spent working on this stuff was not wasted after all. After we started Y Combinator, I would often encounter startups working on parts of this new architecture, and it was very useful to have spent so much time thinking about it and even trying to write some of it.\r\n\r\nThe subset I would build as an open source project was the new Lisp, whose parentheses I now wouldn\'t even have to hide. A lot of Lisp hackers dream of building a new Lisp, partly because one of the distinctive features of the language is that it has dialects, and partly, I think, because we have in our minds a Platonic form of Lisp that all existing dialects fall short of. I certainly did. So at the end of the summer Dan and I switched to working on this new dialect of Lisp, which I called Arc, in a house I bought in Cambridge.\r\n\r\nThe following spring, lightning struck. I was invited to give a talk at a Lisp conference, so I gave one about how we\'d used Lisp at Viaweb. Afterward I put a postscript file of this talk online, on paulgraham.com, which I\'d created years before using Viaweb but had never used for anything. In one day it got 30,000 page views. What on earth had happened? The referring urls showed that someone had posted it on Slashdot. [10]\r\n\r\nWow, I thought, there\'s an audience. If I write something and put it on the web, anyone can read it. That may seem obvious now, but it was surprising then. In the print era there was a narrow channel to readers, guarded by fierce monsters known as editors. The only way to get an audience for anything you wrote was to get it published as a book, or in a newspaper or magazine. Now anyone could publish anything.\r\n\r\nThis had been possible in principle since 1993, but not many people had realized it yet. I had been intimately involved with building the infrastructure of the web for most of that time, and a writer as well, and it had taken me 8 years to realize it. Even then it took me several years to understand the implications. It meant there would be a whole new generation of essays. [11]\r\n\r\nIn the print era, the channel for publishing essays had been vanishingly small. Except for a few officially anointed thinkers who went to the right parties in New York, the only people allowed to publish essays were specialists writing about their specialties. There were so many essays that had never been written, because there had been no way to publish them. Now they could be, and I was going to write them. [12]\r\n\r\nI\'ve worked on several different things, but to the extent there was a turning point where I figured out what to work on, it was when I started publishing essays online. From then on I knew that whatever else I did, I\'d always write essays too.\r\n\r\nI knew that online essays would be a marginal medium at first. Socially they\'d seem more like rants posted by nutjobs on their GeoCities sites than the genteel and beautifully typeset compositions published in The New Yorker. But by this point I knew enough to find that encouraging instead of discouraging.', mimetype='text/plain', start_char_idx=40977, end_char_idx=45334, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.016393441706895828)]
Perform a Hybrid Search with Semantic Reranking
This mode applies semantic reranking to the hybrid search results to improve search relevance.
For more details, see: https://learn.microsoft.com/azure/search/semantic-search-overview
In [ ]
hybrid_retriever = index.as_retriever(
    vector_store_query_mode=VectorStoreQueryMode.SEMANTIC_HYBRID
)
hybrid_retriever.retrieve("What is inception about?")
Out [ ]
[NodeWithScore(node=TextNode(id_='b4d2af4e-1de0-4cfe-8b18-629722bc12d7', embedding=None, metadata={'director': 'Christopher Nolan'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='Inception', mimetype='text/plain', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=2.379289150238037), NodeWithScore(node=TextNode(id_='07d77810-7697-4f16-a945-43b725795641', embedding=None, metadata={'file_path': 'c:\\Dev\\llama_index\\docs\\docs\\examples\\vector_stores\\..\\data\\paul_graham\\paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file_type': 'text/plain', 'file_size': 75395, 'creation_date': '2024-08-23', 'last_modified_date': '2024-08-23'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='e02f3256-42cf-4e96-8a2a-5ba4b5bae898', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'file_path': 'c:\\Dev\\llama_index\\docs\\docs\\examples\\vector_stores\\..\\data\\paul_graham\\paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file_type': 'text/plain', 'file_size': 75395, 'creation_date': '2024-08-23', 'last_modified_date': '2024-08-23'}, hash='aec9c37ce12c52b02feb44993a53c84733fc31d3637224bda1c831b42009fc17'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='022dfb3b-baa4-477b-bd0e-2ab9d8ebbf93', node_type=<ObjectType.TEXT: '1'>, metadata={'file_path': 'c:\\Dev\\llama_index\\docs\\docs\\examples\\vector_stores\\..\\data\\paul_graham\\paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file_type': 'text/plain', 'file_size': 75395, 'creation_date': '2024-08-23', 'last_modified_date': '2024-08-23'}, hash='3cac601cf9b6b268149bedd26c5f6767d9492f245d57c8a11e591cd631ce0b00'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='b8570137-ba64-4838-84b4-baa2fd7322fa', node_type=<ObjectType.TEXT: '1'>, metadata={}, hash='7b37bcc69aec36a618efa128df1f1e8f6bd49b1ab4f57aea517945db504d0d83')}, text='[2]\r\n\r\nI\'m only up to age 25 and already there are such conspicuous patterns. Here I was, yet again about to attend some august institution in the hopes of learning about some prestigious subject, and yet again about to be disappointed. The students and faculty in the painting department at the Accademia were the nicest people you could imagine, but they had long since arrived at an arrangement whereby the students wouldn\'t require the faculty to teach anything, and in return the faculty wouldn\'t require the students to learn anything. And at the same time all involved would adhere outwardly to the conventions of a 19th century atelier. We actually had one of those little stoves, fed with kindling, that you see in 19th century studio paintings, and a nude model sitting as close to it as possible without getting burned. Except hardly anyone else painted her besides me. The rest of the students spent their time chatting or occasionally trying to imitate things they\'d seen in American art magazines.\r\n\r\nOur model turned out to live just down the street from me. She made a living from a combination of modelling and making fakes for a local antique dealer. 
She\'d copy an obscure old painting out of a book, and then he\'d take the copy and maltreat it to make it look old. [3]\r\n\r\nWhile I was a student at the Accademia I started painting still lives in my bedroom at night. These paintings were tiny, because the room was, and because I painted them on leftover scraps of canvas, which was all I could afford at the time. Painting still lives is different from painting people, because the subject, as its name suggests, can\'t move. People can\'t sit for more than about 15 minutes at a time, and when they do they don\'t sit very still. So the traditional m.o. for painting people is to know how to paint a generic person, which you then modify to match the specific person you\'re painting. Whereas a still life you can, if you want, copy pixel by pixel from what you\'re seeing. You don\'t want to stop there, of course, or you get merely photographic accuracy, and what makes a still life interesting is that it\'s been through a head. You want to emphasize the visual cues that tell you, for example, that the reason the color changes suddenly at a certain point is that it\'s the edge of an object. By subtly emphasizing such things you can make paintings that are more realistic than photographs not just in some metaphorical sense, but in the strict information-theoretic sense. [4]\r\n\r\nI liked painting still lives because I was curious about what I was seeing. In everyday life, we aren\'t consciously aware of much we\'re seeing. Most visual perception is handled by low-level processes that merely tell your brain "that\'s a water droplet" without telling you details like where the lightest and darkest points are, or "that\'s a bush" without telling you the shape and position of every leaf. This is a feature of brains, not a bug. In everyday life it would be distracting to notice every leaf on every bush. But when you have to paint something, you have to look more closely, and when you do there\'s a lot to see. You can still be noticing new things after days of trying to paint something people usually take for granted, just as you can after days of trying to write an essay about something people usually take for granted.\r\n\r\nThis is not the only way to paint. I\'m not 100% sure it\'s even a good way to paint. But it seemed a good enough bet to be worth trying.\r\n\r\nOur teacher, professor Ulivi, was a nice guy. He could see I worked hard, and gave me a good grade, which he wrote down in a sort of passport each student had. But the Accademia wasn\'t teaching me anything except Italian, and my money was running out, so at the end of the first year I went back to the US.\r\n\r\nI wanted to go back to RISD, but I was now broke and RISD was very expensive, so I decided to get a job for a year and then return to RISD the next fall. I got one at a company called Interleaf, which made software for creating documents. You mean like Microsoft Word? Exactly. That was how I learned that low end software tends to eat high end software. But Interleaf still had a few years to live yet. [5]\r\n\r\nInterleaf had done something pretty bold. Inspired by Emacs, they\'d added a scripting language, and even made the scripting language a dialect of Lisp.', mimetype='text/plain', start_char_idx=13709, end_char_idx=18066, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.9201899170875549)]