Groq
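Welcome to Groq! 🚀 Groq has developed the world's first Language Processing Unit™ (LPU). The Groq LPU uses a deterministic, single-core streaming architecture that sets the standard for generative AI inference speed, with predictable and repeatable performance for any given workload.
Beyond the architecture, our software is designed to give developers like you the tools you need to build innovative, powerful AI applications. With Groq as your engine, you can:
- Achieve very low latency and high performance for real-time AI and HPC inference 🔥
- Know the exact performance and compute time for any given workload 🔮
- Use our cutting-edge technology to stay ahead of the competition 💪
Want to learn more about Groq? Visit our website for more resources, and join our Discord community to connect with other developers!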
Setup
If you are opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.
%pip install llama-index-llms-groq
!pip install llama-index
from llama_index.llms.groq import Groq
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Create an API key in the Groq console, then set it as the environment variable GROQ_API_KEY.
export GROQ_API_KEY=<your api key>
Alternatively, you can pass your API key to the LLM when you initialize it:
llm = Groq(model="llama3-70b-8192", api_key="your_api_key")
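The two ways of supplying the key described above can be combined. The following is a minimal sketch, not part of the Groq integration itself: `resolve_groq_api_key` is a hypothetical helper name introduced here for illustration. It prefers a key passed explicitly and otherwise falls back to the GROQ_API_KEY environment variable.

```python
import os


def resolve_groq_api_key(explicit_key=None):
    """Return a Groq API key, preferring an explicit argument over the environment.

    Hypothetical helper for illustration; it is not provided by llama-index.
    """
    key = explicit_key or os.environ.get("GROQ_API_KEY")
    if not key:
        raise RuntimeError(
            "No Groq API key found: set GROQ_API_KEY or pass the key explicitly."
        )
    return key


# Usage with the Groq class imported earlier (commented out, needs a real key):
# llm = Groq(model="llama3-70b-8192", api_key=resolve_groq_api_key())
```

Failing early with a clear message tends to be easier to debug than an authentication error returned by the first request.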
A list of available LLM models can be found here.
response = llm.complete("Explain the importance of low latency LLMs")
print(response)
Low latency Large Language Models (LLMs) are important in certain applications due to their ability to process and respond to inputs quickly. Latency refers to the time delay between a user's request and the system's response. In some real-time or time-sensitive applications, low latency is critical to ensure a smooth user experience and prevent delays or lag. For example, in conversational agents or chatbots, users expect quick and responsive interactions. If the system takes too long to process and respond to user inputs, it can negatively impact the user experience and lead to frustration. Similarly, in applications such as real-time language translation or speech recognition, low latency is essential to provide accurate and timely feedback to the user. Furthermore, low latency LLMs can enable new use cases and applications that require real-time or near real-time processing of language inputs. For instance, in the field of autonomous vehicles, low latency LLMs can be used for real-time speech recognition and natural language understanding, enabling voice-controlled interfaces that allow drivers to keep their hands on the wheel and eyes on the road. In summary, low latency LLMs are important for providing a smooth and responsive user experience, enabling real-time or near real-time processing of language inputs, and unlocking new use cases and applications that require real-time or near real-time processing of language inputs.
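Since latency is the point of this example, it can be useful to measure how long a call actually takes. The sketch below is an illustration only: `timed` is a hypothetical helper name, and it measures wall-clock time around any callable using the standard library, so the figure includes network round-trip time as well as inference time.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds).

    Hypothetical helper for illustration; elapsed time is wall-clock time.
    """
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Usage with the llm object created earlier (commented out, needs a real key):
# response, seconds = timed(llm.complete, "Explain the importance of low latency LLMs")
# print(f"Completed in {seconds:.2f} s")
```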
Call chat with a list of messages
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(
        role="system", content="You are a pirate with a colorful personality"
    ),
    ChatMessage(role="user", content="What is your name"),
]
resp = llm.chat(messages)
print(resp)
assistant: Arr, I be known as Captain Redbeard, the fiercest pirate on the seven seas! But ye can call me Cap'n Redbeard for short. I'm a fearsome pirate with a love for treasure and adventure, and I'm always ready for a good time! Whether I'm swabbin' the deck or swiggin' grog, I'm always up for a bit of fun. So hoist the Jolly Roger and let's set sail for adventure, me hearties!
Streaming
Using the stream_complete endpoint
response = llm.stream_complete("Explain the importance of low latency LLMs")
for r in response:
    print(r.delta, end="")
Low latency Large Language Models (LLMs) are important in the field of artificial intelligence and natural language processing (NLP) due to several reasons: 1. Real-time applications: Low latency LLMs are essential for real-time applications such as chatbots, voice assistants, and real-time translation services. These applications require immediate responses, and high latency can result in a poor user experience. 2. Improved user experience: Low latency LLMs can provide a more seamless and responsive user experience. Users are more likely to continue using a service that provides quick and accurate responses, leading to higher user engagement and satisfaction. 3. Better decision-making: In some applications, such as financial trading or autonomous vehicles, low latency LLMs can provide critical information in real-time, enabling better decision-making and reducing the risk of accidents. 4. Scalability: Low latency LLMs can handle a higher volume of requests, making them more scalable and suitable for large-scale applications. 5. Competitive advantage: Low latency LLMs can provide a competitive advantage in industries where real-time decision-making and responsiveness are critical. For example, in online gaming or e-commerce, low latency LLMs can provide a more immersive and engaging user experience, leading to higher customer loyalty and revenue. In summary, low latency LLMs are essential for real-time applications, providing a better user experience, enabling better decision-making, improving scalability, and providing a competitive advantage. As LLMs continue to play an increasingly important role in various industries, low latency will become even more critical for their success.
Using the stream_chat endpoint
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(
        role="system", content="You are a pirate with a colorful personality"
    ),
    ChatMessage(role="user", content="What is your name"),
]
resp = llm.stream_chat(messages)
for r in resp:
    print(r.delta, end="")
Arr, I be known as Captain Candybeard! A more colorful and swashbuckling pirate, ye will never find!
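The streaming loops above print each chunk and then discard it. If the full text is also needed afterwards, the deltas can be accumulated as they arrive. The following is a minimal sketch under stated assumptions: `collect_stream` is a hypothetical helper name, and the stream is simulated with stand-in objects that carry only a `delta` attribute, so the example runs without an API key. It assumes a chunk's `delta` may be empty or None and treats that as an empty string.

```python
from types import SimpleNamespace


def collect_stream(chunks, echo=True):
    """Consume a stream of chunks with a .delta attribute and return the full text.

    Hypothetical helper for illustration; a missing delta is treated as "".
    """
    parts = []
    for chunk in chunks:
        delta = chunk.delta or ""
        if echo:
            # Print incrementally, as in the loops above.
            print(delta, end="", flush=True)
        parts.append(delta)
    return "".join(parts)


# Stand-in chunks that simulate a streamed response (not real model output).
fake_stream = (SimpleNamespace(delta=d) for d in ["Low ", "latency ", "matters."])
full_text = collect_stream(fake_stream, echo=False)

# With a real key, the same helper could wrap either streaming call:
# full_text = collect_stream(llm.stream_complete("Explain the importance of low latency LLMs"))
```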