How to perform vector search and find the semantic similarity of documents in Python?
Last updated 20 Apr 2024
Question
How to perform vector search and find the semantic similarity of documents in Python?
Answer
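To perform vector similarity search in Python, first create an index that stores the documents together with their vector embeddings; this is the index you query later to find similar documents. The DIM attribute must match the size of the vectors produced by the embedding model. The model all-distilroberta-v1 produces 768-dimensional embeddings, so set DIM to 768, as in this example: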
FT.CREATE vss_index ON HASH PREFIX 1 "doc:" SCHEMA name TEXT content TEXT creation NUMERIC SORTABLE update NUMERIC SORTABLE content_embedding VECTOR FLAT 6 TYPE FLOAT32 DIM 768 DISTANCE_METRIC COSINE
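In the VECTOR field definition, the number after the algorithm name (FLAT 6) is the count of attribute tokens that follow it: TYPE FLOAT32 DIM 768 DISTANCE_METRIC COSINE is six tokens. The sketch below builds the same command as a token list so that count is computed rather than hand-maintained. The helper names vector_field_args and ft_create_args are hypothetical, not part of any library.

```python
# Hypothetical helpers: build the FT.CREATE command above as a list of tokens.


def vector_field_args(field: str, dim: int, metric: str = "COSINE",
                      algorithm: str = "FLAT") -> list[str]:
    """Return the SCHEMA tokens for one VECTOR field."""
    attrs = ["TYPE", "FLOAT32", "DIM", str(dim), "DISTANCE_METRIC", metric]
    # The attribute count (6 here) is written right after the algorithm name
    return [field, "VECTOR", algorithm, str(len(attrs)), *attrs]


def ft_create_args(index: str, prefix: str, dim: int) -> list[str]:
    """Return the full FT.CREATE command used in this article."""
    return [
        "FT.CREATE", index, "ON", "HASH", "PREFIX", "1", prefix,
        "SCHEMA",
        "name", "TEXT",
        "content", "TEXT",
        "creation", "NUMERIC", "SORTABLE",
        "update", "NUMERIC", "SORTABLE",
        *vector_field_args("content_embedding", dim),
    ]


if __name__ == "__main__":
    # 768 is the embedding size of all-distilroberta-v1
    print(" ".join(ft_create_args("vss_index", "doc:", 768)))
```

With redis-py, such a list could be sent as-is with conn.execute_command(*ft_create_args("vss_index", "doc:", 768)).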
Modeling documents
Next, import the embedding library. To use all-distilroberta-v1, import the SentenceTransformer class from the sentence-transformers package. NumPy is also needed, because the embedding is converted with np.float32 below.
import numpy as np
from sentence_transformers import SentenceTransformer
Now produce a vector representation of the document: use the model to compute the embedding of the document content, then serialize it as FLOAT32 bytes to match TYPE FLOAT32 in the index schema:
content = "This is an arbitrary content"
model = SentenceTransformer('sentence-transformers/all-distilroberta-v1')
embedding = model.encode(content).astype(np.float32).tobytes()
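The call .astype(np.float32).tobytes() turns the embedding into raw 32-bit floats, 4 bytes per dimension, so a 768-dimensional vector becomes a 3072-byte string. The sketch below shows the same packing, and its reverse, with the standard-library array module instead of NumPy. It assumes native byte order, which is what np.float32.tobytes() also uses by default; to_blob and from_blob are hypothetical helper names.

```python
from array import array

DIM = 768  # must equal the DIM declared in FT.CREATE


def to_blob(vector: list[float]) -> bytes:
    """Pack floats as FLOAT32 in native byte order (same layout as np.float32 tobytes())."""
    if len(vector) != DIM:
        raise ValueError(f"expected {DIM} values, got {len(vector)}")
    return array("f", vector).tobytes()


def from_blob(blob: bytes) -> list[float]:
    """Unpack a FLOAT32 blob, e.g. one read back from the Hash, into Python floats."""
    values = array("f")
    values.frombytes(blob)
    return values.tolist()


if __name__ == "__main__":
    blob = to_blob([0.5] * DIM)
    print(len(blob))             # 3072 (4 bytes per dimension)
    print(from_blob(blob)[:3])   # [0.5, 0.5, 0.5]
```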
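Now you can store the embedding, together with the other document fields, in a Hash whose key starts with the doc: prefix, so that the index picks it up automatically: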
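import redis
conn = redis.Redis(host="localhost", port=6379)  # adjust to your Redis server
pk = 1  # document identifier, used to build the key "doc:1"
doc = {"content_embedding": embedding,
       "name": "Document's title",
       "content": content}
conn.hset("doc:{}".format(pk), mapping=doc)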
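The index only picks up Hashes whose keys start with the PREFIX declared in FT.CREATE (doc:), and the vector blob has to contain DIM × 4 bytes to match TYPE FLOAT32 DIM 768. The hypothetical helper make_doc below is a sketch rather than part of any library. It checks both conditions before anything is written, and its result could then be passed to conn.hset(key, mapping=mapping).

```python
PREFIX = "doc:"     # must match PREFIX in FT.CREATE
DIM = 768           # must match DIM in FT.CREATE
FLOAT32_SIZE = 4    # bytes per FLOAT32 component


def make_doc(pk: int, name: str, content: str, embedding: bytes) -> tuple[str, dict]:
    """Hypothetical helper: return (key, mapping) ready for conn.hset(key, mapping=mapping)."""
    expected = DIM * FLOAT32_SIZE
    if len(embedding) != expected:
        raise ValueError(f"embedding must be {expected} bytes, got {len(embedding)}")
    key = f"{PREFIX}{pk}"
    mapping = {"name": name, "content": content, "content_embedding": embedding}
    return key, mapping


if __name__ == "__main__":
    # bytes(3072) is only a placeholder blob of zeros, not a real embedding
    key, mapping = make_doc(42, "Document's title", "This is an arbitrary content",
                            bytes(DIM * FLOAT32_SIZE))
    print(key)              # doc:42
    print(sorted(mapping))  # ['content', 'content_embedding', 'name']
```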
Searching for similar documents
To search for documents similar to a given document, first compute the embedding of that document with the same model you used to build the database of vector embeddings:
model = SentenceTransformer('sentence-transformers/all-distilroberta-v1')
new_embedding = model.encode(content).astype(np.float32).tobytes()
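The index compares new_embedding with the stored vectors using the COSINE metric declared in FT.CREATE. Redis reports cosine distance, defined as 1 minus the cosine similarity, so 0 means the vectors point the same way and 2 means they point in opposite directions. The pure-Python sketch below computes that value on toy vectors to show what the score means. It is not Redis's internal implementation.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity: 0 = same direction, 1 = orthogonal, 2 = opposite."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_distance([1.0, 0.0], [2.0, 0.0]))   # 0.0 (same direction)
    print(cosine_distance([1.0, 0.0], [0.0, 3.0]))   # 1.0 (orthogonal)
    print(cosine_distance([1.0, 0.0], [-1.0, 0.0]))  # 2.0 (opposite)
```

Because only direction matters, vector length has no effect on the score. Lower scores therefore mean more similar documents.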
Then run a KNN query against the index to retrieve the three most similar documents:
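from redis.commands.search.query import Query
# Top 3 neighbours in content_embedding; with COSINE, a lower score means more similar
q = Query("*=>[KNN 3 @content_embedding $vec AS score]").sort_by("score").return_fields("name", "score").dialect(2)
res = conn.ft("vss_index").search(q, query_params={"vec": new_embedding})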
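The FLAT algorithm declared in FT.CREATE performs exact, brute-force search. Conceptually, KNN 3 compares the query vector with every indexed vector and keeps the three with the smallest COSINE distance. The sketch below illustrates that idea on toy 2-dimensional vectors. The docs dictionary is a hypothetical stand-in for the indexed Hashes, and this is not how Redis implements the search internally.

```python
import heapq
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity (assumes non-zero vectors of equal length)."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def knn(query: list[float], docs: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Exact k-nearest-neighbour search by scanning every stored vector."""
    scored = ((key, cosine_distance(query, vec)) for key, vec in docs.items())
    # Keep the k smallest distances, closest first
    return heapq.nsmallest(k, scored, key=lambda item: item[1])


if __name__ == "__main__":
    docs = {
        "doc:1": [1.0, 0.0],
        "doc:2": [0.0, 1.0],
        "doc:3": [1.0, 1.0],
        "doc:4": [-1.0, 0.0],
    }
    for key, score in knn([1.0, 0.1], docs, k=3):
        print(key, round(score, 4))  # doc:1, then doc:3, then doc:2
```

Scanning every vector gives exact results, but the cost grows linearly with the number of documents. That trade-off is why Redis also offers an approximate HNSW index.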