This tutorial demonstrates how to perform semantic search on product images using LangChain (OpenAI) and Redis. Specifically, we'll cover generating text summaries of product images with OpenAI, storing the summary embeddings in Redis, and querying them with vector similarity search.
LangChain is an innovative library for building language model applications. It offers a structured way to combine different components like language models (e.g., OpenAI's models), storage solutions (like Redis), and custom logic. This modular approach facilitates the creation of sophisticated AI applications.
OpenAI provides advanced language models like GPT-3, which have revolutionized the field with their ability to understand and generate human-like text. These models form the backbone of many modern AI applications, including semantic text/image search and chatbots.
Below is the command to clone the source code for the application used in this tutorial:
git clone --branch v9.2.0 https://github.com/redis-developer/redis-microservices-ecommerce-solutions
Let's take a look at the architecture of the demo application:
- products service: handles querying products from the database and returning them to the frontend
- orders service: handles validating and creating orders
- order history service: handles querying a customer's order history
- payments service: handles processing orders for payment
- api gateway: unifies the services under a single endpoint
- mongodb/postgresql: serves as the write-optimized database for storing orders, order history, products, etc.

You don't need to use MongoDB/PostgreSQL as your write-optimized database in the demo application; you can use other Prisma-supported databases as well. This is just an example.
The e-commerce microservices application consists of a frontend, built using Next.js with TailwindCSS. The application backend uses Node.js. The data is stored in Redis and either MongoDB or PostgreSQL, using Prisma. Below are screenshots showcasing the frontend of the e-commerce app.
Dashboard: Displays a list of products with different search functionalities, configurable in the settings page.
Settings: Accessible by clicking the gear icon at the top right of the dashboard. Control the search bar, chatbot visibility, and other features here.
Dashboard (Semantic Text Search): Configured for semantic text search, the search bar enables natural language queries. Example: "pure cotton blue shirts."
Dashboard (Semantic Image-Based Queries): Configured for semantic image summary search, the search bar allows for image-based queries. Example: "Left chest nike logo."
Chat Bot: Located at the bottom right corner of the page, assisting in product searches and detailed views.
Selecting a product in the chat displays its details on the dashboard.
Shopping Cart: Add products to the cart and check out using the "Buy Now" button.
Order History: Post-purchase, the 'Orders' link in the top navigation bar shows the order status and history.
Admin Panel: Accessible via the 'admin' link in the top navigation. Displays purchase statistics and trending products.
Sign up for an OpenAI account to get the API key used in the demo (add the OPEN_AI_API_KEY variable to your .env file). You can also refer to the OpenAI API documentation for more information.
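For reference, the entry in your .env file would look something like this (the value below is just a placeholder, not a real key):

OPEN_AI_API_KEY="your-openai-api-key"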
In this tutorial, we'll use a simplified e-commerce dataset. Specifically, our JSON structure includes product details and a key named styleImages_default_imageURL, which links to an image of the product. This image will be the focus of our AI-driven semantic search.
const products = [
{
productId: '11000',
price: 3995,
productDisplayName: 'Puma Men Slick 3HD Yellow Black Watches',
variantName: 'Slick 3HD Yellow',
brandName: 'Puma',
// Additional product details...
styleImages_default_imageURL:
'http://host.docker.internal:8080/images/11000.jpg',
// Other properties...
},
// Additional products...
];
The following code segment outlines the process of generating a text summary for a product image using OpenAI's capabilities. We'll first convert the image URL to a base64 string using the fetchImageAndConvertToBase64 function, and then use the getOpenAIImageSummary function to have OpenAI generate a summary of the image.
import axios from 'axios';
import { Prisma } from '@prisma/client';
import { ChatOpenAI, ChatOpenAICallOptions } from 'langchain/chat_models/openai';
import { HumanMessage } from 'langchain/schema';
import { Document } from 'langchain/document';
import { OpenAIEmbeddings } from 'langchain/embeddings/openai';
import { RedisVectorStore } from 'langchain/vectorstores/redis';
let llm: ChatOpenAI<ChatOpenAICallOptions>;
// Instantiates the LangChain ChatOpenAI instance
const getOpenAIVisionInstance = (_openAIApiKey: string) => {
// OpenAI supports images alongside text in input messages via its gpt-4-vision-preview model.
if (!llm) {
llm = new ChatOpenAI({
openAIApiKey: _openAIApiKey,
modelName: 'gpt-4-vision-preview',
maxTokens: 1024,
});
}
return llm;
};
const fetchImageAndConvertToBase64 = async (_imageURL: string) => {
let base64Image = '';
try {
const response = await axios.get(_imageURL, {
responseType: 'arraybuffer',
});
// Convert image to Base64
base64Image = Buffer.from(response.data, 'binary').toString('base64');
} catch (error) {
console.error(
`Error fetching or converting the image: ${_imageURL}`,
error,
);
}
return base64Image;
};
// Generates an OpenAI summary for a given base64 image string
const getOpenAIImageSummary = async (
_openAIApiKey: string,
_base64Image: string,
_product: Prisma.ProductCreateInput,
) => {
/*
Reference : https://js.langchain.com/docs/integrations/chat/openai#multimodal-messages
- This function utilizes OpenAI's multimodal capabilities to generate a summary from the image.
- It constructs a prompt that combines the product description with the image.
- OpenAI's vision model then processes this prompt to generate a detailed summary.
*/
let imageSummary = '';
try {
if (_openAIApiKey && _base64Image && _product) {
const llmInst = getOpenAIVisionInstance(_openAIApiKey);
const text = `Below are the product details and image of an e-commerce product for reference. Please provide a comprehensive analysis of the product depicted in the image.
Product Details:
${_product.productDescriptors_description_value}
Image:
`;
// Constructing a multimodal message combining text and image
const imagePromptMessage = new HumanMessage({
content: [
{
type: 'text',
text: text,
},
{
type: 'image_url',
image_url: {
url: `data:image/jpeg;base64,${_base64Image}`,
detail: 'high', // low, high (if you want more detail)
},
},
],
});
// Invoking the LangChain ChatOpenAI model with the constructed message
const response = await llmInst.invoke([imagePromptMessage]);
if (response?.content) {
imageSummary = <string>response.content;
}
}
} catch (err) {
console.log(
`Error generating OpenAIImageSummary for product id ${_product.productId}`,
err,
);
}
return imageSummary;
};
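As a quick sanity check, you could generate a summary for a single product like this (a minimal sketch, assuming the products array shown earlier, an OPEN_AI_API_KEY environment variable, and an enclosing async function; this snippet is illustrative and not part of the demo app's source):

// Illustrative usage only
const demoProduct = products[0];
const base64Image = await fetchImageAndConvertToBase64(
  demoProduct.styleImages_default_imageURL,
);
const summary = await getOpenAIImageSummary(
  process.env.OPEN_AI_API_KEY || '',
  base64Image,
  demoProduct,
);
console.log(summary);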
The following section demonstrates the result of the above process. We'll use the image of a Puma T-shirt and generate a summary using OpenAI's capabilities.
The comprehensive summary generated by the OpenAI model is as follows:
This product is a black round neck T-shirt featuring a design consistent with the Puma brand aesthetic, which includes their iconic leaping cat logo in a contrasting yellow color placed prominently across the chest area. The T-shirt is made from 100% cotton, suggesting it is likely to be breathable and soft to the touch. It has a classic short-sleeve design with a ribbed neckline for added texture and durability. There is also mention of a vented hem, which may offer additional comfort and mobility.
The T-shirt is described to have a 'comfort' fit, which typically means it is designed to be neither too tight nor too loose, allowing for ease of movement without being baggy. This could be ideal for casual wear or active use.
Care instructions are also comprehensive, advising a gentle machine wash with similar colors in cool water at 30 degrees Celsius, indicating it is relatively easy to care for. However, one should avoid bleaching, tumble drying, and dry cleaning it, but a warm iron is permissible.
Looking at the image provided:
- The T-shirt appears to fit the model well, in accordance with the described 'comfort' fit.
- The color contrast between the T-shirt and the graphic gives the garment a modern, sporty look.
- The model is paired with denim jeans, showcasing the T-shirt's versatility for casual occasions. However, the product description suggests it can be part of an athletic ensemble when combined with Puma shorts and shoes.
- Considering the model's statistics, prospective buyers could infer how this T-shirt might fit on a person with similar measurements.
Overall, the T-shirt is positioned as a versatile item suitable for both lifestyle and sporting activities, with a strong brand identity through the graphic, and is likely comfortable and easy to maintain based on the product details provided.
The addImageSummaryEmbeddingsToRedis function plays a critical role in integrating AI-generated image summaries with Redis. This process involves two main steps:

- Using the getImageSummaryVectorDocuments function, we transform image summaries into vector documents. This transformation is crucial as it converts textual summaries into a format suitable for Redis storage.
- The seedImageSummaryEmbeddings function is then employed to store these vector documents in Redis. This step is essential for enabling efficient retrieval and search capabilities within the Redis database.

// Function to generate vector documents from image summaries
const getImageSummaryVectorDocuments = async (
_products: Prisma.ProductCreateInput[],
_openAIApiKey: string,
) => {
const vectorDocs: Document[] = [];
if (_products?.length > 0) {
let count = 1;
for (let product of _products) {
if (product) {
let imageURL = product.styleImages_default_imageURL; //cdn url
const imageData = await fetchImageAndConvertToBase64(imageURL);
const imageSummary = await getOpenAIImageSummary(
_openAIApiKey,
imageData,
product,
);
console.log(
`openAI imageSummary #${count++} generated for product id: ${
product.productId
}`,
);
if (imageSummary) {
let doc = new Document({
metadata: {
productId: product.productId,
imageURL: imageURL,
},
pageContent: imageSummary,
});
vectorDocs.push(doc);
}
}
}
}
return vectorDocs;
};
// Seeding vector documents into Redis
const seedImageSummaryEmbeddings = async (
vectorDocs: Document[],
_redisClient: NodeRedisClientType,
_openAIApiKey: string,
) => {
if (vectorDocs?.length && _redisClient && _openAIApiKey) {
const embeddings = new OpenAIEmbeddings({
openAIApiKey: _openAIApiKey,
});
const vectorStore = await RedisVectorStore.fromDocuments(
vectorDocs,
embeddings,
{
redisClient: _redisClient,
indexName: 'openAIProductImgIdx',
keyPrefix: 'openAIProductImgText:',
},
);
console.log('seeding imageSummaryEmbeddings completed');
}
};
const addImageSummaryEmbeddingsToRedis = async (
_products: Prisma.ProductCreateInput[],
_redisClient: NodeRedisClientType,
_openAIApiKey: string,
) => {
const vectorDocs = await getImageSummaryVectorDocuments(
_products,
_openAIApiKey,
);
await seedImageSummaryEmbeddings(vectorDocs, _redisClient, _openAIApiKey);
};
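Putting it together, seeding could be kicked off with a call like the following (a sketch assuming the products array from earlier and the demo project's getNodeRedisClient helper, inside an async context):

// Illustrative seeding call (not part of the demo app's source)
const redisClient = getNodeRedisClient();
await addImageSummaryEmbeddingsToRedis(
  products,
  redisClient,
  process.env.OPEN_AI_API_KEY || '',
);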
The image below shows the JSON structure of the OpenAI image summary within RedisInsight.
Download RedisInsight to visually explore your Redis data or to engage with raw Redis commands in the workbench.
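For example, in the RedisInsight workbench you can inspect the vector index that RedisVectorStore created above (assuming the same index name):

FT.INFO openAIProductImgIdx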
This section covers the API request and response structure for getProductsByVSSImageSummary, which is essential for retrieving products based on semantic search using image summaries.
Request Format
The example request format for the API is as follows:
POST http://localhost:3000/products/getProductsByVSSImageSummary
{
  "searchText": "Left chest nike logo",

  //optional
  "maxProductCount": 4, // default: 2
  "similarityScoreLimit": 0.2 // default: 0.2
}
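For instance, you could call the endpoint from a Node.js script roughly like this (a sketch assuming the API gateway is running locally on port 3000 as shown above, and an enclosing async context; not part of the demo source):

// Illustrative request
const response = await fetch(
  'http://localhost:3000/products/getProductsByVSSImageSummary',
  {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      searchText: 'Left chest nike logo',
      maxProductCount: 4,
      similarityScoreLimit: 0.2,
    }),
  },
);
const result = await response.json();
console.log(result.data);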
Response Structure
The response from the API is a JSON object containing an array of product details that match the semantic search criteria:
{
"data": [
{
"productId": "10017",
"price": 3995,
"productDisplayName": "Nike Women As The Windru Blue Jackets",
"brandName": "Nike",
"styleImages_default_imageURL": "http://host.docker.internal:8080/products/01/10017/product-img.webp",
"productDescriptors_description_value": " Blue and White jacket made of 100% polyester, with an interior pocket ...",
"stockQty": 25,
"similarityScore": 0.163541972637,
"imageSummary": "The product in the image is a blue and white jacket featuring a design consistent with the provided description. ..."
}
// Additional products...
],
"error": null,
"auth": "SES_fd57d7f4-3deb-418f-9a95-6749cd06e348"
}
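For reference, a client-side TypeScript type for this response could be sketched as follows (field names are taken from the sample above; this interface is illustrative and not part of the demo source):

// Illustrative response shape
interface IVSSImageSummaryProduct {
  productId: string;
  price: number;
  productDisplayName: string;
  brandName: string;
  styleImages_default_imageURL: string;
  productDescriptors_description_value: string;
  stockQty: number;
  similarityScore: number;
  imageSummary: string;
}

interface IVSSImageSummaryResponse {
  data: IVSSImageSummaryProduct[];
  error: unknown; // error shape not shown in the sample
  auth: string;
}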
The backend implementation of this API involves the following steps:

- The getProductsByVSSImageSummary function handles the API request.
- The getSimilarProductsScoreByVSSImageSummary function performs semantic search on image summaries. It integrates with OpenAI's semantic analysis capabilities to interpret the searchText and identify relevant products from the Redis vector store.

const getSimilarProductsScoreByVSSImageSummary = async (
_params: IParamsGetProductsByVSS,
) => {
let {
standAloneQuestion,
openAIApiKey,
//optional
KNN,
scoreLimit,
} = _params;
let vectorDocs: Document[] = [];
const client = getNodeRedisClient();
KNN = KNN || 2;
scoreLimit = scoreLimit || 1;
const embeddings = new OpenAIEmbeddings({
openAIApiKey: openAIApiKey,
});
// create vector store
const vectorStore = new RedisVectorStore(embeddings, {
redisClient: client,
indexName: 'openAIProductImgIdx',
keyPrefix: 'openAIProductImgText:',
});
// search for similar products
const vectorDocsWithScore = await vectorStore.similaritySearchWithScore(
standAloneQuestion,
KNN,
);
// filter by scoreLimit
for (let [doc, score] of vectorDocsWithScore) {
if (score <= scoreLimit) {
doc['similarityScore'] = score;
vectorDocs.push(doc);
}
}
return vectorDocs;
};
const getProductsByVSSImageSummary = async (
productsVSSFilter: IProductsVSSBodyFilter,
) => {
let { searchText, maxProductCount, similarityScoreLimit } = productsVSSFilter;
let products: IProduct[] = [];
const openAIApiKey = process.env.OPEN_AI_API_KEY || '';
maxProductCount = maxProductCount || 2;
similarityScoreLimit = similarityScoreLimit || 0.2;
//VSS search
const vectorDocs = await getSimilarProductsScoreByVSSImageSummary({
standAloneQuestion: searchText,
openAIApiKey: openAIApiKey,
KNN: maxProductCount,
scoreLimit: similarityScoreLimit,
});
if (vectorDocs?.length) {
const productIds = vectorDocs.map((doc) => doc?.metadata?.productId);
//get product with details
products = await getProductByIds(productIds, true);
}
//...
return products;
};
To try it out in the demo application:

- Make sure the Semantic image summary search option is enabled in the settings page.
- Search for Left chest nike logo, and the results will display products like a Nike jacket, characterized by a logo on its left chest, reflecting the query.

Performing semantic search on image summaries is a powerful tool for e-commerce applications. It allows users to search for products based on their descriptions or images, enabling a more intuitive and efficient shopping experience. This tutorial has demonstrated how to integrate OpenAI's semantic analysis capabilities with Redis to create a robust search engine for e-commerce applications.