Embedding Similarity Evaluator
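This notebook demonstrates SemanticSimilarityEvaluator, which evaluates the quality of a question-answering system using semantic similarity.
Specifically, it computes a similarity score between the embedding of the generated answer and the embedding of the reference answer.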
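The core computation can be sketched in plain Python. This is a minimal illustration, assuming the score is cosine similarity (our understanding of what the default similarity mode computes) and that a result passes when the score reaches the 0.8 default threshold mentioned later in this notebook. The helper `cosine_similarity` and the tiny hand-made vectors are hypothetical stand-ins for real embedding-model output, not LlamaIndex code.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional "embeddings"; real models produce hundreds of dimensions.
response_embedding = [0.2, 0.9, 0.1]
reference_embedding = [0.25, 0.8, 0.3]

score = cosine_similarity(response_embedding, reference_embedding)
passing = score >= 0.8  # assumed pass rule: score at or above the threshold
print(f"score={score:.3f} passing={passing}")
```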
If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.
!pip install llama-index
from llama_index.core.evaluation import SemanticSimilarityEvaluator
evaluator = SemanticSimilarityEvaluator()
# This evaluator only uses `response` and `reference`, passing in query does not influence the evaluation
# query = 'What is the color of the sky'
response = "The sky is typically blue"
reference = """The color of the sky can vary depending on several factors, including time of day, weather conditions, and location.
During the day, when the sun is in the sky, the sky often appears blue.
This is because of a phenomenon called Rayleigh scattering, where molecules and particles in the Earth's atmosphere scatter sunlight in all directions, and blue light is scattered more than other colors because it travels as shorter, smaller waves.
This is why we perceive the sky as blue on a clear day.
"""
result = await evaluator.aevaluate(
response=response,
reference=reference,
)
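The cell above uses top-level `await`, which works in Jupyter because the notebook already runs an event loop. In a plain Python script the usual pattern is to drive the coroutine with `asyncio.run`. The sketch below illustrates only that pattern: `toy_embed` (a letter-frequency "embedding") and `toy_aevaluate` are hypothetical stand-ins, not LlamaIndex APIs, and the 0.8 threshold and `>=` rule are assumptions.

```python
import asyncio
import math


async def toy_embed(text: str) -> list[float]:
    # Hypothetical stand-in for an async embedding call: letter frequencies a-z.
    await asyncio.sleep(0)  # yield to the event loop, as a real network call would
    lowered = text.lower()
    return [float(lowered.count(ch)) for ch in "abcdefghijklmnopqrstuvwxyz"]


async def toy_aevaluate(
    response: str, reference: str, threshold: float = 0.8
) -> tuple[float, bool]:
    # Embed both texts concurrently, then score them with cosine similarity.
    resp_vec, ref_vec = await asyncio.gather(toy_embed(response), toy_embed(reference))
    dot = sum(x * y for x, y in zip(resp_vec, ref_vec))
    norm = math.sqrt(sum(x * x for x in resp_vec)) * math.sqrt(sum(y * y for y in ref_vec))
    score = dot / norm if norm else 0.0
    return score, score >= threshold


# In a script there is no running event loop, so asyncio.run() creates one,
# runs the coroutine to completion, and closes the loop.
score, passing = asyncio.run(
    toy_aevaluate("The sky is typically blue", "During the day, the sky often appears blue.")
)
print(f"score={score:.3f} passing={passing}")
```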
print("Score: ", result.score)
print("Passing: ", result.passing) # default similarity threshold is 0.8
Score: 0.874911773340899
Passing: True
response = "Sorry, I do not have sufficient context to answer this question."
reference = """The color of the sky can vary depending on several factors, including time of day, weather conditions, and location.
During the day, when the sun is in the sky, the sky often appears blue.
This is because of a phenomenon called Rayleigh scattering, where molecules and particles in the Earth's atmosphere scatter sunlight in all directions, and blue light is scattered more than other colors because it travels as shorter, smaller waves.
This is why we perceive the sky as blue on a clear day.
"""
result = await evaluator.aevaluate(
response=response,
reference=reference,
)
print("Score: ", result.score)
print("Passing: ", result.passing) # default similarity threshold is 0.8
Score: 0.7221738929165528
Passing: False
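Each Passing value follows from comparing the score with the threshold. The check below replays the two scores printed above, assuming the rule is `score >= similarity_threshold` (our reading of the evaluator, not something this notebook states); `is_passing` is a hypothetical helper.

```python
def is_passing(score: float, threshold: float = 0.8) -> bool:
    # Assumed rule: a result passes when its similarity reaches the threshold.
    return score >= threshold


recorded_scores = {
    "The sky is typically blue": 0.874911773340899,
    "Sorry, I do not have sufficient context to answer this question.": 0.7221738929165528,
}

for response, score in recorded_scores.items():
    print(f"{score:.4f} -> passing={is_passing(score)}  ({response!r})")
```

With these same two scores, a threshold of 0.6 would let the refusal pass as well, so the threshold is worth tuning to the score range of whichever embedding model you use.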
Customization
from llama_index.core.evaluation import SemanticSimilarityEvaluator
from llama_index.core.embeddings import SimilarityMode, resolve_embed_model
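# "local" loads a HuggingFace embedding model; this assumes llama-index-embeddings-huggingface is installed
embed_model = resolve_embed_model("local")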
evaluator = SemanticSimilarityEvaluator(
embed_model=embed_model,
similarity_mode=SimilarityMode.DEFAULT,
similarity_threshold=0.6,
)
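similarity_mode selects how two embeddings are compared. As we understand LlamaIndex's similarity helper, DEFAULT is cosine similarity, DOT_PRODUCT is the raw dot product, and EUCLIDEAN is the negated Euclidean distance (so that a larger value still means "more similar"). The pure-Python sketch below mirrors those three formulas under that assumption; the local `Mode` enum and `similarity` function are illustrative stand-ins, not the library's code.

```python
import math
from enum import Enum


class Mode(str, Enum):
    # Illustrative stand-in mirroring the member names of SimilarityMode.
    DEFAULT = "cosine"
    DOT_PRODUCT = "dot_product"
    EUCLIDEAN = "euclidean"


def similarity(a: list[float], b: list[float], mode: Mode = Mode.DEFAULT) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    if mode is Mode.DOT_PRODUCT:
        # Unbounded; grows with the vectors' lengths.
        return dot
    if mode is Mode.EUCLIDEAN:
        # Negated distance: identical vectors score 0, farther vectors score lower.
        return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    # DEFAULT: cosine similarity, in [-1, 1] regardless of vector length.
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


a, b = [3.0, 4.0], [6.0, 8.0]  # same direction, different lengths
for mode in Mode:
    print(mode.name, similarity(a, b, mode))
```

Only cosine similarity is bounded and insensitive to vector length, which is one reason a fixed threshold such as 0.6 is easiest to interpret with SimilarityMode.DEFAULT.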
response = "The sky is yellow."
reference = "The sky is blue."
result = await evaluator.aevaluate(
response=response,
reference=reference,
)
print("Score: ", result.score)
print("Passing: ", result.passing)
Score: 0.9178505509625874
Passing: True
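Note that a high score does not mean the answer is correct.
Embedding similarity mainly captures "relevancy": because the response and the reference both talk about "the sky" and its color, they are semantically similar, even though one says yellow and the other says blue.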
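A crude way to see why contradictory sentences can still score high: with a bag-of-words vector (a deliberately simplistic, hypothetical stand-in for a learned embedding), "The sky is yellow." and "The sky is blue." share three of their four words, so their cosine similarity is already 0.75 even though they disagree on the one word that matters. Learned embeddings are far richer, but they likewise reward shared topic and wording.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    # Lowercased word counts; punctuation is dropped.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Only words present in both texts contribute to the dot product.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


score = cosine(bag_of_words("The sky is yellow."), bag_of_words("The sky is blue."))
print(f"bag-of-words cosine: {score:.2f}")  # prints "bag-of-words cosine: 0.75"
```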