Vectara Managed Index¶
In this notebook we show how to use Vectara with LlamaIndex. Note that this notebook is for Vectara ManagedIndex versions >= 0.4.0.
Vectara is the trusted AI Assistant and Agent platform, focused on enterprise readiness for mission-critical applications.
Vectara provides an end-to-end managed service for Retrieval Augmented Generation (RAG), which includes:
- An integrated API for processing input data, including text extraction from documents and ML-based chunking.
- The state-of-the-art Boomerang embeddings model. Each text chunk is encoded into a vector embedding using Boomerang and stored in Vectara's internal knowledge store (vector + text). Thus, when using Vectara with LlamaIndex you do not need to call a separate embedding model - this happens automatically within the Vectara backend.
- A query service that automatically encodes the query into an embedding and retrieves the most relevant text segments through hybrid search and a variety of reranking strategies, including a multilingual reranker, a maximal marginal relevance (MMR) reranker, a user-defined function reranker, and a chain reranker that provides a way to chain together multiple reranking methods for better control, combining the strengths of the individual methods.
- An option to create a generative summary with a wide selection of LLM summarizers (including Vectara's Mockingbird, trained specifically for RAG-related tasks), based on the retrieved documents and including citations.
See the Vectara API documentation for more information on how to use the API.
The main benefits of using Vectara RAG-as-a-service to build your application are:
- Accuracy and Quality: Vectara provides an end-to-end platform that focuses on eliminating hallucinations, reducing bias, and safeguarding copyright integrity.
- Security: Vectara's platform provides access control - protecting against prompt injection attacks - and meets SOC2 and HIPAA compliance.
- Explainability: Vectara makes it easy to troubleshoot bad results by clearly explaining rephrased queries, LLM prompts, retrieved results, and agent actions.
Getting Started¶
If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.
!pip install llama-index llama-index-indices-managed-vectara
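VectaraIndex picks up your Vectara credentials from environment variables. Below is a minimal setup sketch, assuming you have already created a corpus and an API key in the Vectara console; the values shown are placeholders, and the variable names follow the >= 0.4.0 package.
import os

# Both values are created in the Vectara console; placeholders shown here.
os.environ["VECTARA_CORPUS_KEY"] = "your-corpus-key"
os.environ["VECTARA_API_KEY"] = "your-api-key"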
RAG with LlamaIndex and Vectara¶
There are a few ways you can index your data into Vectara, including:
- Using the from_documents() or insert_file() methods of VectaraIndex
- Uploading files directly in the Vectara console
- Using Vectara's file upload or document index APIs
- Using vectara-ingest, an open source crawler/indexer project
- Using one of our ingest integration partners like Airbyte, Unstructured, or DataVolo
For this purpose, we will use a simple set of small documents, so using VectaraIndex directly for the data ingestion is good enough.
Let's ingest the "AI Bill of Rights" document into our new corpus.
from llama_index.indices.managed.vectara import VectaraIndex
import requests

url = "https://www.whitehouse.gov/wp-content/uploads/2022/10/Blueprint-for-an-AI-Bill-of-Rights.pdf"
response = requests.get(url)
local_path = "ai-bill-of-rights.pdf"
with open(local_path, "wb") as file:
    file.write(response.content)

index = VectaraIndex()
index.insert_file(
    local_path, metadata={"name": "AI bill of rights", "year": 2022}
)
'ai-bill-of-rights.pdf'
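As an alternative to insert_file(), documents that are already in memory can be ingested with the from_documents() method mentioned above. A minimal sketch, with hypothetical sample text:
from llama_index.core import Document

# Hypothetical in-memory documents; chunking and embedding happen in the Vectara backend.
docs = [
    Document(text="AI systems should be safe and effective."),
    Document(text="Automated systems should provide notice and explanation."),
]
index = VectaraIndex.from_documents(docs)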
Running single queries with Vectara Query Engine¶
Now that we've uploaded the document (or if documents were uploaded previously), we can ask questions directly in LlamaIndex. This activates Vectara's RAG pipeline.
To use Vectara's internal LLM for summarization, make sure you specify summary_enabled=True when you generate the query engine. Here's an example:
questions = [
    "What are the risks of AI?",
    "What should we do to prevent bad actors from using AI?",
    "What are the benefits?",
]

qe = index.as_query_engine(
    n_sentences_before=1,
    n_sentences_after=1,
    summary_enabled=True,
    summary_prompt_name="mockingbird-1.0-2024-07-16",
)
qe.query(questions[0]).response
'The risks of AI include biased data and discriminatory outcomes, opaque decision-making processes, and lack of public trust and understanding of algorithmic systems [1]. These risks can have significant impacts on individuals and communities, particularly those who are directly affected by AI systems [5]. To mitigate these risks, it is essential to identify and address potential risks before deployment, and to implement ongoing monitoring and mitigation strategies [2][6]. This includes risk assessments, auditing mechanisms, and public consultation to ensure that AI systems are designed and used in a responsible and transparent manner [2][6]. Additionally, the development of AI systems should be guided by principles that prioritize lawfulness, accuracy, and transparency, and that are regularly monitored and accountable [7].'
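The bracketed citations in the summary correspond to the retrieved passages, which are exposed as source nodes on the standard LlamaIndex response object. A minimal sketch for inspecting them:
response = qe.query(questions[0])
# Each source node carries a retrieved text segment and its relevance score.
for node in response.source_nodes:
    print(node.score, node.node.get_content()[:80])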
If you want the response to be returned in streaming mode, simply set streaming=True:
qe = index.as_query_engine(
    n_sentences_before=1,
    n_sentences_after=1,
    summary_enabled=True,
    summary_prompt_name="mockingbird-1.0-2024-07-16",
    streaming=True,
)
response = qe.query(questions[0])
response.print_response_stream()
The risks of AI include biased data and discriminatory outcomes, opaque decision-making processes, and lack of public trust and understanding of algorithmic systems [1]. These risks can have significant impacts on individuals and communities, particularly those who are directly affected by AI systems [5]. To mitigate these risks, it is essential to identify and address potential risks before deployment, and to implement ongoing monitoring and mitigation strategies [2][6]. This includes risk assessments, auditing mechanisms, and public consultation to ensure that AI systems are designed and used in a responsible and transparent manner [2][6]. Additionally, the development of AI systems should be guided by principles that prioritize lawfulness, accuracy, and transparency, and that are regularly monitored and accountable [7].
Using Vectara Chat¶
Vectara also supports a simple chat mode. In this mode the chat history is maintained by Vectara, so you don't have to worry about it. To use it, simply call as_chat_engine.
(Chat mode always uses Vectara's summarization, so you don't have to explicitly specify summary_enabled=True as before.)
ce = index.as_chat_engine(n_sentences_before=1, n_sentences_after=1)
for q in questions:
    print(f"Question: {q}\n")
    response = ce.chat(q).response
    print(f"Response: {response}\n")
Question: What are the risks of AI?

Response: The risks of AI include potential biases and discriminatory outcomes due to biased data, opaque decision-making processes, and lack of public trust and understanding of algorithmic systems. Mitigating these risks involves ongoing transparency, participatory design, explanations for stakeholders, and public consultation [1]. Industry is developing innovative solutions like risk assessments, auditing mechanisms, and monitoring tools to ensure the safety and efficacy of AI systems [2]. Identifying and mitigating risks before deployment is crucial, focusing on impacts on rights, opportunities, and communities, as well as risks from misuse of the system [6]. The Executive Order on Trustworthy AI in the Federal Government outlines principles for lawful, purposeful, accurate, safe, understandable, responsible, monitored, transparent, and accountable AI use [7].

Question: What should we do to prevent bad actors from using AI?

Response: To prevent bad actors from using AI, we should implement a set of principles and practices to ensure the safe and effective use of AI systems. This includes adhering to specific principles such as legality, respect for values, accuracy, reliability, safety, transparency, and accountability in the design and use of AI [2]. Additionally, entities should follow privacy and security best practices to prevent data leaks and employ audits and impact assessments to identify and mitigate algorithmic discrimination [3][4]. It is crucial to involve the public in discussions about the promises and potential harms of AI technologies to shape policies that protect against discrimination and ensure fairness in the use of automated systems [1][6][7]. By promoting transparency, ongoing monitoring, and public consultation, we can work towards building trust, understanding, and ethical use of AI while safeguarding against misuse by bad actors.

Question: What are the benefits?

Response: The benefits of AI include the potential to build innovative infrastructure, improve customer service through faster responses, and enhance decision-making processes. AI can also lead to transformative improvements in people's lives, protect individuals from potential harms, and ensure the ethical use of automated systems. By incorporating principles for responsible stewardship and trustworthy AI, companies and government agencies can create safe, effective, and transparent AI systems that respect values, ensure accuracy, and promote accountability [1][4][6][7].
Of course, streaming works as well with chat:
ce = index.as_chat_engine(
    n_sentences_before=1, n_sentences_after=1, streaming=True
)
response = ce.stream_chat("Will artificial intelligence rule the government?")
response.print_response_stream()
Artificial intelligence will not rule the government. The government has established principles and guidelines for the ethical use of AI, ensuring it is used responsibly, lawfully, and in alignment with the nation's values. These principles emphasize safety, accountability, transparency, and regular monitoring of AI systems within the federal government [1] [2]. Additionally, there are specific considerations for law enforcement and national security activities, highlighting the need for oversight and adherence to existing policies and safeguards [3]. The government is focused on promoting equity, fairness, civil rights, and racial justice through the use of AI, guided by principles that protect the American public [5]. Transparency and accountability are key aspects to ensure that AI technologies are used in ways that respect people's rights and expectations [7].
Agentic RAG¶
Vectara also has its own package, vectara-agentic, built on top of many of LlamaIndex's capabilities, that makes it easy to implement agentic RAG applications. It allows you to create your own AI assistant with RAG query tools and other custom tools, such as making API calls to retrieve information from financial websites. You can find the full documentation for vectara-agentic here.
Let's create a ReAct agent with a single RAG tool using vectara-agentic (to create a ReAct agent, specify VECTARA_AGENTIC_AGENT_TYPE as "REACT" in your environment).
Vectara does not yet have an LLM that can act as a planning and tool-use agent, so we will need to use another LLM as the driver of the agent's reasoning.
In this demo we are using OpenAI's GPT-4o. Please make sure you have OPENAI_API_KEY defined in your environment, or specify a different LLM with the corresponding key (for the full list of supported LLMs, check out our documentation for setting up your environment).
!pip install -U vectara-agentic
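Before creating the agent, set the environment variables described above; a minimal sketch (the API key value is a placeholder):
import os

# Use a ReAct agent; vectara-agentic supports other agent types as well.
os.environ["VECTARA_AGENTIC_AGENT_TYPE"] = "REACT"
# API key for the OpenAI LLM that drives the agent's reasoning.
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"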
from vectara_agentic.agent import Agent
from IPython.display import display, Markdown
agent = Agent.from_corpus(
    tool_name="query_ai",
    data_description="AI regulations",
    assistant_specialty="artificial intelligence",
    vectara_reranker="mmr",
    vectara_rerank_k=50,
    vectara_summary_num_results=5,
    vectara_summarizer="mockingbird-1.0-2024-07-16",
    verbose=True,
)
response = agent.chat(
    "What are the risks of AI? What are the benefits? Compare and contrast and provide a summary with arguments for and against from experts."
)
display(Markdown(response))
Failed to set up observer (No module named 'phoenix.otel'), ignoring

> Running step 21fe2d4d-c74c-45df-9921-94c7f9e4f670. Step input: What are the risks of AI? What are the benefits? Compare and contrast and provide a summary with arguments for and against from experts.
Thought: The current language of the user is: English. I need to use a tool to help me answer the question.
Action: query_ai
Action Input: {'query': 'risks and benefits of AI, expert opinions'}
Observation: Response: '''According to expert opinions, the risks of AI include biased data and discriminatory outcomes, opaque decision-making processes, and lack of public trust and understanding of algorithmic systems [1]. To mitigate these risks, experts emphasize the importance of ongoing transparency, value-sensitive and participatory design, explanations designed for relevant stakeholders, and public consultation [1]. Additionally, industry is providing innovative solutions to mitigate risks to the safety and efficacy of AI systems, including risk assessments, auditing mechanisms, and documentation procedures [3]. The National Institute of Standards and Technology (NIST) is developing a risk management framework to better manage risks posed to individuals, organizations, and society by AI [3]. Furthermore, the White House Office of Science and Technology Policy has led a year-long process to seek input from people across the country on the issue of algorithmic and data-driven harms and potential remedies [4].'''
References:
[1]: CreationDate='1663695035'; Producer='iLovePDF'; Title='Blueprint for an AI Bill of Rights'; Creator='Adobe Illustrator 26.3 (Macintosh)'; ModDate='1664808078'; name='AI bill of rights'; year='2022'; framework='llama_index'; title='Blueprint for an AI Bill of Rights'.
[3]: CreationDate='1663695035'; Producer='iLovePDF'; Title='Blueprint for an AI Bill of Rights'; Creator='Adobe Illustrator 26.3 (Macintosh)'; ModDate='1664808078'; name='AI bill of rights'; year='2022'; framework='llama_index'; title='Blueprint for an AI Bill of Rights'.
[4]: CreationDate='1663695035'; Producer='iLovePDF'; Title='Blueprint for an AI Bill of Rights'; Creator='Adobe Illustrator 26.3 (Macintosh)'; ModDate='1664808078'; name='AI bill of rights'; year='2022'; framework='llama_index'; title='Blueprint for an AI Bill of Rights'.

> Running step a2b4d751-9f91-4fd9-9004-e276da54b75f. Step input: None
Thought: I can answer without using any more tools. I'll use the user's language to answer
Answer: The risks and benefits of AI are widely discussed among experts, and there are several key points to consider.

**Risks of AI:**
1. **Bias and Discrimination:** AI systems can perpetuate and even amplify biases present in the data they are trained on, leading to discriminatory outcomes.
2. **Opaque Decision-Making:** The decision-making processes of AI systems can be difficult to understand, leading to a lack of transparency.
3. **Public Trust:** There is often a lack of public trust and understanding of how AI systems work, which can hinder their acceptance and use.

To mitigate these risks, experts suggest measures such as ensuring transparency, involving stakeholders in the design process, providing clear explanations, and conducting public consultations. Additionally, there are efforts to develop frameworks and guidelines, such as the National Institute of Standards and Technology (NIST) risk management framework, to manage these risks effectively.

**Benefits of AI:**
1. **Efficiency and Productivity:** AI can automate repetitive tasks, leading to increased efficiency and productivity in various industries.
2. **Innovation:** AI drives innovation by enabling new applications and solutions that were not possible before.
3. **Improved Decision-Making:** AI can process large amounts of data quickly, providing insights that can improve decision-making processes.

**Expert Opinions:** Experts argue for the benefits of AI in terms of its potential to transform industries and improve quality of life. However, they also caution against the risks, emphasizing the need for responsible development and deployment of AI technologies. The balance between leveraging AI's benefits and managing its risks is crucial for its successful integration into society.

References:
- [Blueprint for an AI Bill of Rights](https://www.whitehouse.gov/ostp/ai-bill-of-rights/)

Time taken: 20.452504634857178
The risks and benefits of AI are widely discussed among experts, and there are several key points to consider.

Risks of AI
- Bias and Discrimination: AI systems can perpetuate and even amplify biases present in the data they are trained on, leading to discriminatory outcomes.
- Opaque Decision-Making: The decision-making processes of AI systems can be difficult to understand, leading to a lack of transparency.
- Public Trust: There is often a lack of public trust and understanding of how AI systems work, which can hinder their acceptance and use.

To mitigate these risks, experts suggest measures such as ensuring transparency, involving stakeholders in the design process, providing clear explanations, and conducting public consultations. Additionally, there are efforts to develop frameworks and guidelines, such as the National Institute of Standards and Technology (NIST) risk management framework, to manage these risks effectively.

Benefits of AI
- Efficiency and Productivity: AI can automate repetitive tasks, leading to increased efficiency and productivity in various industries.
- Innovation: AI drives innovation by enabling new applications and solutions that were not possible before.
- Improved Decision-Making: AI can process large amounts of data quickly, providing insights that can improve decision-making processes.

Expert Opinions: Experts argue for the benefits of AI in terms of its potential to transform industries and improve quality of life. However, they also caution against the risks, emphasizing the need for responsible development and deployment of AI technologies. The balance between leveraging AI's benefits and managing its risks is crucial for its successful integration into society.

References
- [Blueprint for an AI Bill of Rights](https://www.whitehouse.gov/ostp/ai-bill-of-rights/)