
OCI Data Science

OCIDataScienceEmbedding #

Bases: BaseEmbedding

Embedding class for OCI Data Science models.

This class provides methods to generate embeddings with models deployed on Oracle Cloud Infrastructure (OCI) Data Science. It supports both synchronous and asynchronous requests and handles authentication, batching, and other configuration.

Setup

Install the required packages:

pip install -U oracle-ads llama-index-embeddings-oci-data-science

Configure authentication using ads.set_auth(). For example, to authenticate with an OCI resource principal:

import ads
ads.set_auth("resource_principal")

For more details on authentication, see: https://accelerated-data-science.readthedocs.io/en/stable/user_guide/cli/authentication.html
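When running outside an OCI resource (for example, on a local machine), API key authentication is a common alternative. The snippet below is a minimal sketch that assumes a standard OCI config file at ~/.oci/config with a DEFAULT profile; adjust the path and profile name to your environment:

import ads

# Illustrative local setup: path and profile name are placeholders
ads.set_auth(auth="api_key", oci_config_location="~/.oci/config", profile="DEFAULT")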

Ensure you have the policies required to access the OCI Data Science Model Deployment endpoint: https://docs.oracle.com/en-us/iaas/data-science/using/model-dep-policies-auth.htm

To learn how to deploy LLM models in OCI Data Science, see: https://docs.oracle.com/en-us/iaas/data-science/using/ai-quick-actions-model-deploy.htm

Examples

Basic Usage

import ads
from llama_index.embeddings.oci_data_science import OCIDataScienceEmbedding

ads.set_auth(auth="security_token", profile="OC1")

embeddings = OCIDataScienceEmbedding(
    endpoint="https://<MD_OCID>/predict",
)

e1 = embeddings.get_text_embedding("This is a test document")
print(e1)

e2 = embeddings.get_text_embedding_batch([
    "This is a test document",
    "This is another test document"
])
print(e2)
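Retrieval-style queries can be embedded with get_query_embedding from the base embedding class; with this integration, query and document requests go to the same deployed endpoint. A minimal sketch reusing the embeddings instance above:

q = embeddings.get_query_embedding("What does the test document describe?")
print(q)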

Asynchronous Usage

import ads
import asyncio
from llama_index.embeddings.oci_data_science import OCIDataScienceEmbedding

ads.set_auth(auth="security_token", profile="OC1")

embeddings = OCIDataScienceEmbedding(
    endpoint="https://<MD_OCID>/predict",
)

async def async_embedding():
    e1 = await embeddings.aget_query_embedding("This is a test document")
    print(e1)

asyncio.run(async_embedding())
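Batches can also be embedded asynchronously through aget_text_embedding_batch, inherited from the base embedding class. A sketch reusing the same embeddings instance:

async def async_batch_embedding():
    results = await embeddings.aget_text_embedding_batch([
        "This is a test document",
        "This is another test document",
    ])
    print(results)

asyncio.run(async_batch_embedding())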

Attributes

Name               Type            Description
endpoint           str             The URI of the endpoint from the deployed model.
auth               Dict[str, Any]  The authentication dictionary used for OCI API requests.
model_name         str             The name of the OCI Data Science embedding model.
embed_batch_size   int             The batch size for embedding calls.
additional_kwargs  Dict[str, Any]  Additional keyword arguments for the OCI Data Science AI request.
default_headers    Dict[str, str]  The default headers for API requests.
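These attributes map directly onto the constructor, so request-level options can be fixed once when the embedder is created. The sketch below shows the available knobs; the additional_kwargs and default_headers values are illustrative placeholders, not required settings:

embeddings = OCIDataScienceEmbedding(
    endpoint="https://<MD_OCID>/predict",
    model_name="odsc-embeddings",   # default model name
    embed_batch_size=16,
    timeout=120,                    # seconds
    max_retries=5,
    additional_kwargs={"truncate": True},          # hypothetical extra payload field
    default_headers={"x-request-source": "docs"},  # hypothetical custom header
)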

Source code in llama-index-integrations/embeddings/llama-index-embeddings-oci-data-science/llama_index/embeddings/oci_data_science/base.py
class OCIDataScienceEmbedding(BaseEmbedding):
    """
    Embedding class for OCI Data Science models.

    This class provides methods to generate embeddings using models deployed on
    Oracle Cloud Infrastructure (OCI) Data Science. It supports both synchronous
    and asynchronous requests and handles authentication, batching, and other
    configurations.

    Setup:
        Install the required packages:
        ```bash
        pip install -U oracle-ads llama-index-embeddings-oci-data-science
        ```

        Configure authentication using `ads.set_auth()`. For example, to use OCI
        Resource Principal for authentication:
        ```python
        import ads
        ads.set_auth("resource_principal")
        ```

        For more details on authentication, see:
        https://accelerated-data-science.readthedocs.io/en/stable/user_guide/cli/authentication.html

        Ensure you have the required policies to access the OCI Data Science Model
        Deployment endpoint:
        https://docs.oracle.com/en-us/iaas/data-science/using/model-dep-policies-auth.htm

        To learn more about deploying LLM models in OCI Data Science, see:
        https://docs.oracle.com/en-us/iaas/data-science/using/ai-quick-actions-model-deploy.htm

    Examples:
        Basic Usage:
        ```python
        import ads
        from llama_index.embeddings.oci_data_science import OCIDataScienceEmbedding

        ads.set_auth(auth="security_token", profile="OC1")

        embeddings = OCIDataScienceEmbedding(
            endpoint="https://<MD_OCID>/predict",
        )

        e1 = embeddings.get_text_embedding("This is a test document")
        print(e1)

        e2 = embeddings.get_text_embedding_batch([
            "This is a test document",
            "This is another test document"
        ])
        print(e2)
        ```

        Asynchronous Usage:
        ```python
        import ads
        import asyncio
        from llama_index.embeddings.oci_data_science import OCIDataScienceEmbedding

        ads.set_auth(auth="security_token", profile="OC1")

        embeddings = OCIDataScienceEmbedding(
            endpoint="https://<MD_OCID>/predict",
        )

        async def async_embedding():
            e1 = await embeddings.aget_query_embedding("This is a test document")
            print(e1)

        asyncio.run(async_embedding())
        ```

    Attributes:
        endpoint (str): The URI of the endpoint from the deployed model.
        auth (Dict[str, Any]): The authentication dictionary used for OCI API requests.
        model_name (str): The name of the OCI Data Science embedding model.
        embed_batch_size (int): The batch size for embedding calls.
        additional_kwargs (Dict[str, Any]): Additional keyword arguments for the OCI Data Science AI request.
        default_headers (Dict[str, str]): The default headers for API requests.

    """

    endpoint: str = Field(
        default=None, description="The URI of the endpoint from the deployed model."
    )

    auth: Union[Dict[str, Any], None] = Field(
        default_factory=dict,
        exclude=True,
        description=(
            "The authentication dictionary used for OCI API requests. "
            "If not provided, it will be autogenerated based on environment variables."
        ),
    )
    model_name: Optional[str] = Field(
        default=DEFAULT_MODEL,
        description="The name of the OCI Data Science embedding model to use.",
    )

    embed_batch_size: int = Field(
        default=DEFAULT_EMBED_BATCH_SIZE,
        description="The batch size for embedding calls.",
        gt=0,
        le=2048,
    )

    max_retries: int = Field(
        default=DEFAULT_MAX_RETRIES,
        description="The maximum number of API retries.",
        ge=0,
    )

    timeout: float = Field(
        default=DEFAULT_TIMEOUT, description="The timeout to use in seconds.", ge=0
    )

    additional_kwargs: Optional[Dict[str, Any]] = Field(
        default_factory=dict,
        description="Additional keyword arguments for the OCI Data Science AI request.",
    )

    default_headers: Optional[Dict[str, str]] = Field(
        default_factory=dict, description="The default headers for API requests."
    )

    _client: Client = PrivateAttr()
    _async_client: AsyncClient = PrivateAttr()

    def __init__(
        self,
        endpoint: str,
        model_name: Optional[str] = DEFAULT_MODEL,
        auth: Dict[str, Any] = None,
        timeout: Optional[float] = DEFAULT_TIMEOUT,
        max_retries: Optional[int] = DEFAULT_MAX_RETRIES,
        embed_batch_size: int = DEFAULT_EMBED_BATCH_SIZE,
        additional_kwargs: Optional[Dict[str, Any]] = None,
        default_headers: Optional[Dict[str, str]] = None,
        callback_manager: Optional[CallbackManager] = None,
        **kwargs: Any
    ) -> None:
        """
        Initialize the OCIDataScienceEmbedding instance.

        Args:
            endpoint (str): The URI of the endpoint from the deployed model.
            model_name (Optional[str]): The name of the OCI Data Science embedding model to use. Defaults to "odsc-embeddings".
            auth (Optional[Dict[str, Any]]): The authentication dictionary for OCI API requests. Defaults to None.
            timeout (Optional[float]): The timeout setting for the HTTP request in seconds. Defaults to 120.
            max_retries (Optional[int]): The maximum number of retry attempts for the request. Defaults to 5.
            embed_batch_size (int): The batch size for embedding calls. Defaults to DEFAULT_EMBED_BATCH_SIZE.
            additional_kwargs (Optional[Dict[str, Any]]): Additional arguments for the OCI Data Science AI request. Defaults to None.
            default_headers (Optional[Dict[str, str]]): The default headers for API requests. Defaults to None.
            callback_manager (Optional[CallbackManager]): A callback manager for handling events during embedding operations. Defaults to None.
            **kwargs: Additional keyword arguments.

        """
        super().__init__(
            model_name=model_name,
            endpoint=endpoint,
            auth=auth,
            embed_batch_size=embed_batch_size,
            timeout=timeout,
            max_retries=max_retries,
            additional_kwargs=additional_kwargs or {},
            default_headers=default_headers or {},
            callback_manager=callback_manager,
            **kwargs
        )

    @model_validator(mode="before")
    # @_validate_dependency
    def validate_env(cls, values: Dict[str, Any]) -> Dict[str, Any]:
        """
        Validate the environment and dependencies before initialization.

        Args:
            values (Dict[str, Any]): The values passed to the model.

        Returns:
            Dict[str, Any]: The validated values.

        Raises:
            ImportError: If required dependencies are missing.

        """
        return values

    @property
    def client(self) -> Client:
        """
        Return the synchronous client instance.

        Returns:
            Client: The synchronous client for interacting with the OCI Data Science Model Deployment endpoint.

        """
        if not hasattr(self, "_client") or self._client is None:
            self._client = Client(
                endpoint=self.endpoint,
                auth=self.auth,
                retries=self.max_retries,
                timeout=self.timeout,
            )
        return self._client

    @property
    def async_client(self) -> AsyncClient:
        """
        Return the asynchronous client instance.

        Returns:
            AsyncClient: The asynchronous client for interacting with the OCI Data Science Model Deployment endpoint.

        """
        if not hasattr(self, "_async_client") or self._async_client is None:
            self._async_client = AsyncClient(
                endpoint=self.endpoint,
                auth=self.auth,
                retries=self.max_retries,
                timeout=self.timeout,
            )
        return self._async_client

    @classmethod
    def class_name(cls) -> str:
        """
        Get the class name.

        Returns:
            str: The name of the class.

        """
        return "OCIDataScienceEmbedding"

    def _get_query_embedding(self, query: str) -> List[float]:
        """
        Generate an embedding for a query string.

        Args:
            query (str): The query string for which to generate an embedding.

        Returns:
            List[float]: The embedding vector for the query.

        """
        return self.client.embeddings(
            input=query, payload=self.additional_kwargs, headers=self.default_headers
        )["data"][0]["embedding"]

    def _get_text_embedding(self, text: str) -> List[float]:
        """
        Generate an embedding for a text string.

        Args:
            text (str): The text string for which to generate an embedding.

        Returns:
            List[float]: The embedding vector for the text.

        """
        return self.client.embeddings(
            input=text, payload=self.additional_kwargs, headers=self.default_headers
        )["data"][0]["embedding"]

    async def _aget_text_embedding(self, text: str) -> List[float]:
        """
        Asynchronously generate an embedding for a text string.

        Args:
            text (str): The text string for which to generate an embedding.

        Returns:
            List[float]: The embedding vector for the text.

        """
        response = await self.async_client.embeddings(
            input=text, payload=self.additional_kwargs, headers=self.default_headers
        )
        return response["data"][0]["embedding"]

    def _get_text_embeddings(self, texts: List[str]) -> List[List[float]]:
        """
        Generate embeddings for a list of text strings.

        Args:
            texts (List[str]): A list of text strings for which to generate embeddings.

        Returns:
            List[List[float]]: A list of embedding vectors corresponding to the input texts.

        """
        response = self.client.embeddings(
            input=texts, payload=self.additional_kwargs, headers=self.default_headers
        )
        return [raw["embedding"] for raw in response["data"]]

    async def _aget_query_embedding(self, query: str) -> List[float]:
        """
        Asynchronously generate an embedding for a query string.

        Args:
            query (str): The query string for which to generate an embedding.

        Returns:
            List[float]: The embedding vector for the query.

        """
        response = await self.async_client.embeddings(
            input=query, payload=self.additional_kwargs, headers=self.default_headers
        )
        return response["data"][0]["embedding"]

    async def _aget_text_embeddings(self, texts: List[str]) -> List[List[float]]:
        """
        Asynchronously generate embeddings for a list of text strings.

        Args:
            texts (List[str]): A list of text strings for which to generate embeddings.

        Returns:
            List[List[float]]: A list of embedding vectors corresponding to the input texts.

        """
        response = await self.async_client.embeddings(
            input=texts, payload=self.additional_kwargs, headers=self.default_headers
        )
        return [raw["embedding"] for raw in response["data"]]

client property #

client: Client

Return the synchronous client instance.

Returns

Name    Type    Description
Client  Client  The synchronous client for interacting with the OCI Data Science Model Deployment endpoint.
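The client can also be called directly when the raw response is needed. Based on how _get_text_embeddings parses the result, the response is expected to follow an OpenAI-style {"data": [{"embedding": [...]}, ...]} shape; the call below is a sketch, not a documented contract:

response = embeddings.client.embeddings(
    input=["This is a test document"],
    payload=embeddings.additional_kwargs,
    headers=embeddings.default_headers,
)
vectors = [item["embedding"] for item in response["data"]]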

async_client property #

async_client: AsyncClient

Return the asynchronous client instance.

Returns

Name         Type         Description
AsyncClient  AsyncClient  The asynchronous client for interacting with the OCI Data Science Model Deployment endpoint.
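The asynchronous client mirrors the synchronous one; the same call shape applies, awaited inside a coroutine (a sketch):

async def raw_async_embeddings():
    response = await embeddings.async_client.embeddings(
        input=["This is a test document"],
        payload=embeddings.additional_kwargs,
        headers=embeddings.default_headers,
    )
    return [item["embedding"] for item in response["data"]]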

validate_env #

validate_env(values: Dict[str, Any]) -> Dict[str, Any]

Validate the environment and dependencies before initialization.

Parameters

Name    Type            Description                      Default
values  Dict[str, Any]  The values passed to the model.  required

Returns

Type            Description
Dict[str, Any]  The validated values.

Raises

Type         Description
ImportError  If required dependencies are missing.

Source code in llama-index-integrations/embeddings/llama-index-embeddings-oci-data-science/llama_index/embeddings/oci_data_science/base.py
@model_validator(mode="before")
# @_validate_dependency
def validate_env(cls, values: Dict[str, Any]) -> Dict[str, Any]:
    """
    Validate the environment and dependencies before initialization.

    Args:
        values (Dict[str, Any]): The values passed to the model.

    Returns:
        Dict[str, Any]: The validated values.

    Raises:
        ImportError: If required dependencies are missing.

    """
    return values

class_name classmethod #

class_name() -> str

Get the class name.

Returns

Name  Type  Description
str   str   The name of the class.

Source code in llama-index-integrations/embeddings/llama-index-embeddings-oci-data-science/llama_index/embeddings/oci_data_science/base.py
@classmethod
def class_name(cls) -> str:
    """
    Get the class name.

    Returns:
        str: The name of the class.

    """
    return "OCIDataScienceEmbedding"