Similarity and Ambiguity Index
Similarity measures how closely two or more pieces of training content resemble each other. It is determined by comparing the content of each source and assessing their degree of resemblance. Highly similar content can cause the LLM to hallucinate responses, because it may confuse overlapping passages.
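This article does not document how the similarity index is actually computed. Purely as an illustration of the idea, the sketch below scores two passages with cosine similarity over bag-of-words counts. The tokenizer, the scoring method, and the example passages are all assumptions; a real system could instead compare embedding vectors.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    # Lowercase the text and count alphanumeric word tokens
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Return a score in [0, 1]; 1.0 means identical word distributions."""
    va, vb = bag_of_words(a), bag_of_words(b)
    # Dot product over the words both passages share
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    # Guard against empty passages, which have no words to compare
    return dot / norm if norm else 0.0


# Hypothetical example passages from three content sources
refund_a = "Refunds are processed within 5 business days."
refund_b = "Refunds are processed within 7 business days."
shipping = "Orders ship from our warehouse every Monday."

print(round(cosine_similarity(refund_a, refund_b), 2))  # 0.86
print(round(cosine_similarity(refund_a, shipping), 2))  # 0.0
```

The two refund passages score about 0.86 because they share six of their seven words, while the refund and shipping passages share none and score 0.0. The two refund passages also disagree on the number of days, which is the kind of conflicting information the ambiguity index is meant to surface.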
Ambiguity measures the extent to which two or more pieces of training content provide conflicting information for the same context. It is determined by comparing each paragraph of text from one content source against all other content sources. Ambiguous content can lead to different answers for the same query.
Generating the Similarity and Ambiguity Reports
You can generate Similarity and Ambiguity Reports for your content sources using the options beneath the summary bar, where you can also see the date and time the last report was generated.
Similarity Report
The similarity report compares the content of each source with the other content sources in your Twin's knowledge base. Sources with high similarity are marked with a red icon; view their reports to check which content is similar.
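One way to picture the red-icon flagging is as a threshold applied to pairwise similarity scores. The sketch below assumes a score has already been computed for each pair of sources (for example, with a measure such as cosine similarity). The 0.8 cutoff, the function name `flag_high_similarity`, and the source names are hypothetical; the threshold the product actually uses is not documented here.

```python
# Hypothetical cutoff; the product's real threshold is not documented.
HIGH_SIMILARITY_THRESHOLD = 0.8


def flag_high_similarity(
    scores: dict[tuple[str, str], float],
    threshold: float = HIGH_SIMILARITY_THRESHOLD,
) -> dict[str, list[str]]:
    """Map each flagged source to the other sources it is highly similar to."""
    flagged: dict[str, list[str]] = {}
    for (a, b), score in scores.items():
        if score >= threshold:
            # Similarity is symmetric, so flag both sources in the pair
            flagged.setdefault(a, []).append(b)
            flagged.setdefault(b, []).append(a)
    return flagged


# Hypothetical precomputed scores for each pair of content sources
scores = {
    ("faq.pdf", "policy.docx"): 0.91,
    ("faq.pdf", "website"): 0.35,
    ("policy.docx", "website"): 0.42,
}

for source, matches in sorted(flag_high_similarity(scores).items()):
    print(source, "->", matches)
# faq.pdf -> ['policy.docx']
# policy.docx -> ['faq.pdf']
```

Only the pair above the cutoff is flagged, and both sources in that pair point to each other, mirroring how a report for either source would list the other as similar content.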
To generate the similarity report for all content sources, click Generate Similarity Report at the top. Once it is generated, open a source's report by clicking the View Report icon next to its index. The report highlights content from other sources that is similar to the selected source, and a red icon marks instances of high similarity.
Identifying and addressing instances of high similarity is important, because redundant information reduces the model's processing efficiency.
Ambiguity Report
You can also generate an ambiguity report to assess the ambiguity index, which highlights contradictory information between content sources. To generate the ambiguity report for all content sources, click Generate Ambiguity Report at the top. Once it is generated, open a source's report by clicking the View Report icon next to its index. The report identifies and highlights ambiguous content across sources.
Identifying and addressing instances of high ambiguity is important because it helps prevent the Twin from sharing incorrect information. Within the report, a red icon marks instances of high ambiguity.