Use ai.similarity with pandas

The ai.similarity function compares text by meaning. Compare one column with a single reference value or with pairwise values in another column.

Overview

The ai.similarity function extends the pandas Series class.

To calculate the semantic similarity between each input row and a single common text value, call the function on a text column of a pandas DataFrame. The function can also calculate pairwise similarity between each input row and the corresponding value in another column that has the same dimensions as the input column.

The function returns a pandas Series that contains similarity scores, which can be stored in a new DataFrame column.

Syntax

df["similarity"] = df["col1"].ai.similarity("value")

Parameters

Name: other (required)
Description: Either of the following:
- A string that contains a single common text value, which is compared with every input row to compute the similarity scores.
- A pandas Series with the same dimensions as the input. Its text values are compared row by row with the input to compute pairwise similarity scores.
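The Syntax section shows only the single-value form. A minimal sketch of the pairwise form follows, assuming a notebook runtime where the .ai accessor is already available. The helper name add_pairwise_similarity and the "company" column are hypothetical and exist only for illustration.

```python
import pandas as pd


def add_pairwise_similarity(df: pd.DataFrame, left: str, right: str) -> pd.DataFrame:
    """Return a copy of df with a 'similarity' column that compares
    the `left` and `right` columns row by row (hypothetical helper)."""
    out = df.copy()
    # Passing a Series instead of a string requests pairwise scoring:
    # row i of `left` is compared with row i of `right`.
    out["similarity"] = out[left].ai.similarity(out[right])
    return out


# Illustrative input; the "company" column is made up for this sketch.
df = pd.DataFrame({
    "name": ["Bill Gates", "Satya Nadella", "Joan of Arc"],
    "company": ["Microsoft", "Microsoft", "Microsoft"],
})

# Call this only in a runtime where the .ai accessor is available:
# scored = add_pairwise_similarity(df, "name", "company")
```

Both columns must have the same number of rows, because each score pairs one value from the input with the value in the same position of the other Series.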

Returns

The function returns a pandas Series that contains similarity scores for each input text row. The output similarity scores are relative, and they're best used for ranking. Score values can range from -1 (opposites) to 1 (identical). A score value of 0 indicates that the values are unrelated in meaning.
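Because the scores are relative, a common way to use them is to sort rows and keep the top matches. The sketch below uses plain pandas only. The helper name top_matches is hypothetical, and the score values are placeholders chosen for illustration, not real output from ai.similarity.

```python
import pandas as pd


def top_matches(df: pd.DataFrame, score_col: str = "similarity", n: int = 2) -> pd.DataFrame:
    """Return the n rows with the highest similarity score, best first."""
    return (
        df.sort_values(score_col, ascending=False)
          .head(n)
          .reset_index(drop=True)
    )


# Placeholder scores for illustration only; real values come from ai.similarity.
scored = pd.DataFrame({
    "name": ["Bill Gates", "Satya Nadella", "Joan of Arc"],
    "similarity": [0.4, 0.5, 0.1],
})

print(top_matches(scored, n=2))
```

Ranking like this relies only on the order of the scores, which is consistent with treating them as relative rather than absolute measures.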

Example

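# This code uses AI. Always review output for mistakes.
# Assumes a notebook runtime where the AI functions are already set up,
# so that the .ai accessor and display() are available.

import pandas as pd

df = pd.DataFrame([
        "Bill Gates",
        "Satya Nadella",
        "Joan of Arc"
    ], columns=["name"])

# Score each name against the single reference value "Microsoft".
df["similarity"] = df["name"].ai.similarity("Microsoft")
display(df)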

Output:

Screenshot of a data frame with columns 'name' and 'similarity'. The 'similarity' column contains similarity scores for the names and input word.