Building an AI-powered Semantic Search Engine

05 Apr 2025
5 min read

1. Introduction

In this guide, we'll explore how to build a simple yet powerful semantic search engine using
Node.js, MongoDB and vector embeddings generated by the Xenova/multi-qa-MiniLM-L6-cos-v1
model.
This application allows you to:

  • Insert user profiles.
  • Generate embeddings for their bios.
  • Search for them by semantic similarity, which matches on contextual meaning rather than exact keywords.

2. Project Prerequisites

Before running the script, make sure you have the following installed:

2.1 Install Node.js and npm

If you don't have Node.js installed, download it from:

Node.js Official Website

To verify your installation:

node -v
npm -v

2.2 Setting Up MongoDB

Before building the search engine, you need to install MongoDB and create a new database.

2.2.1 Install MongoDB

Follow the steps below for your operating system.

On Ubuntu

1. Import the MongoDB public GPG Key:

wget -qO - https://www.mongodb.org/static/pgp/server-7.0.asc | gpg --dearmor | sudo tee /usr/share/keyrings/mongodb-server-7.0.gpg > /dev/null

2. Create a list file for MongoDB (this example targets Ubuntu 20.04 "focal"; substitute the codename of your release):

echo "deb [ signed-by=/usr/share/keyrings/mongodb-server-7.0.gpg ] https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/7.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-7.0.list

3. Install MongoDB:

sudo apt update
sudo apt install -y mongodb-org

4. Start the MongoDB service:

sudo systemctl start mongod
sudo systemctl enable mongod

5. Verify the installation:

mongod --version

On Windows

1. Download the MongoDB installer from the MongoDB Download Center.

2. Follow the installation instructions.

3. Verify the installation by opening Command Prompt and running:

mongod --version

4. Start the MongoDB service:

net start MongoDB

On macOS

Install MongoDB using Homebrew:

brew tap mongodb/brew
brew install mongodb-community@7.0
brew services start mongodb-community@7.0

2.2.2 Running MongoDB

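If you started MongoDB as a service in the previous step, the server is already running. Otherwise, start it manually:

mongod

To connect to MongoDB, open the MongoDB shell:

mongosh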

✅ You’re now ready to use MongoDB!

2.2.3 Creating a MongoDB Database

1. Open the MongoDB shell by running:

mongosh

2. Create a new database:

use <YOUR_DB_NAME>

3. Create a users collection:

db.createCollection("<YOUR_COLLECTION_NAME>")

4. Verify that the collection was created:

show collections

2.3 Installing Dependencies

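1. Initialize a Node.js project (if not already created):

npm init -y

The code in this guide uses ES module import statements and top-level await, so add "type": "module" to the generated package.json (or name the main file index.mjs).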

2. Install the required dependencies:

npm install express cors body-parser mongodb @xenova/transformers

2.4 Connecting MongoDB to Node.js

To connect MongoDB to your Node.js project, add the following connection code to your index.js file:

// MongoClient is imported from the "mongodb" package (see section 4.1)
const MONGO_URI = "mongodb://localhost:27017";
const DB_NAME = "<YOUR_DB_NAME>";
const COLLECTION_NAME = "<YOUR_COLLECTION_NAME>";

const client = new MongoClient(MONGO_URI);
await client.connect();
const db = client.db(DB_NAME);
const usersCollection = db.collection(COLLECTION_NAME);

3. Project Setup

Technologies Used:

Node.js:

For building the backend server.

MongoDB:

To store user data along with the generated vector embeddings.

Express.js:

To create API routes for CRUD operations.

Xenova/Transformers:

For generating vector embeddings from text.

Cosine Similarity:

To compare embeddings and find the closest matches.

Folder Structure

/index.js → Main backend server file
/package.json → Project dependencies
/mongodb → Database for storing user info and vector embeddings

Environment

  • MongoDB running locally (mongodb://localhost:27017).
  • The application listens on port 3000.

4. Key Components Explained

4.1 Dependencies and Middleware

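import express from "express";
import cors from "cors";
import bodyParser from "body-parser";
import { MongoClient } from "mongodb";
import { pipeline } from "@xenova/transformers";

// Create the Express app and register the middleware used by the routes below
const app = express();
app.use(cors());
app.use(bodyParser.json());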

express:

Web framework used to create the server and handle routing.

cors:

Enables cross-origin resource sharing, making the API accessible from different origins.

body-parser:

Parses incoming JSON request bodies.

MongoClient:

Used to connect to MongoDB.

pipeline:

Factory function from @xenova/transformers that loads a model for a given task (here, feature extraction).

4.2 MongoDB Connection

const MONGO_URI = "mongodb://localhost:27017";
const DB_NAME = "<YOUR_DB_NAME>";
const COLLECTION_NAME = "<YOUR_COLLECTION_NAME>";

const client = new MongoClient(MONGO_URI);
await client.connect();
const db = client.db(DB_NAME);
const usersCollection = db.collection(COLLECTION_NAME);

  • Connects to the local MongoDB server at localhost:27017.
  • Uses the configured database and collection to store the user data.
  • await client.connect() establishes the connection to the MongoDB instance.

4.3 Loading the Embedding Model

const generateEmbedding = await pipeline("feature-extraction", "Xenova/multi-qa-MiniLM-L6-cos-v1");

  • Loads the Xenova/multi-qa-MiniLM-L6-cos-v1 model for text embedding generation. The model files are downloaded the first time the pipeline is created.
  • The feature-extraction pipeline converts text (user bios and search queries) into numerical vector representations.
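The routes later in this guide call this pipeline with pooling: "mean" and normalize: true. As a rough illustration of what those two options mean conceptually, the sketch below averages a set of per-token vectors into one vector and then rescales it to unit length. It is a simplified sketch, not the library's internal code, and the helper names meanPool and l2Normalize are made up for this example.

```javascript
// Conceptual sketch only: illustrates mean pooling and L2 normalization
// on plain arrays. Hypothetical helpers, not part of @xenova/transformers.

// tokenVectors: one numeric array per token, all of the same length.
function meanPool(tokenVectors) {
  const dims = tokenVectors[0].length;
  const pooled = new Array(dims).fill(0);
  for (const vec of tokenVectors) {
    for (let i = 0; i < dims; i++) {
      pooled[i] += vec[i] / tokenVectors.length;
    }
  }
  return pooled;
}

// Rescales a vector so its Euclidean length is 1 (zero vectors are returned unchanged).
function l2Normalize(vec) {
  const norm = Math.sqrt(vec.reduce((sum, v) => sum + v * v, 0));
  return norm === 0 ? vec.slice() : vec.map((v) => v / norm);
}

console.log(meanPool([[1, 0], [3, 4]])); // [ 2, 2 ]
console.log(l2Normalize([3, 4]));        // [ 0.6, 0.8 ]
```

Because normalized vectors have length 1, the cosine similarity between two of them reduces to their dot product.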

4.4 Cosine Similarity Function

function cosineSimilarity(vec1, vec2) {
  const dotProduct = vec1.reduce((sum, val, i) => sum + val * vec2[i], 0);
  const magnitudeA = Math.sqrt(vec1.reduce((sum, val) => sum + val * val, 0));
  const magnitudeB = Math.sqrt(vec2.reduce((sum, val) => sum + val * val, 0));
  return dotProduct / (magnitudeA * magnitudeB);
}

  • Computes the cosine similarity between two vectors of equal length.
  • Formula: similarity = (A · B) / (||A|| × ||B||)
  • Outputs a score between -1 and 1; values closer to 1 indicate greater similarity.
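To make the behaviour of the function concrete, here is a small worked example on two-dimensional vectors. The function is repeated so the snippet runs on its own. The safeCosineSimilarity wrapper is a hypothetical addition, not part of the original code, sketching one way to handle mismatched lengths and zero vectors.

```javascript
// Same function as in this section, repeated so the example is self-contained.
function cosineSimilarity(vec1, vec2) {
  const dotProduct = vec1.reduce((sum, val, i) => sum + val * vec2[i], 0);
  const magnitudeA = Math.sqrt(vec1.reduce((sum, val) => sum + val * val, 0));
  const magnitudeB = Math.sqrt(vec2.reduce((sum, val) => sum + val * val, 0));
  return dotProduct / (magnitudeA * magnitudeB);
}

console.log(cosineSimilarity([1, 0], [1, 0]));  // 1  (same direction)
console.log(cosineSimilarity([1, 0], [0, 1]));  // 0  (orthogonal)
console.log(cosineSimilarity([1, 0], [-1, 0])); // -1 (opposite direction)

// Hypothetical guard (an assumption, not in the original code):
// rejects vectors of different lengths and maps the NaN produced by a
// zero-magnitude vector (0 / 0) to a similarity of 0.
function safeCosineSimilarity(vec1, vec2) {
  if (vec1.length !== vec2.length) {
    throw new Error("Vectors must have the same length");
  }
  const similarity = cosineSimilarity(vec1, vec2);
  return Number.isNaN(similarity) ? 0 : similarity;
}
```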

4.5 Server Initialization

app.listen(3000, "0.0.0.0", () => {
  console.log("Server running on port 3000");
});

  • Starts the server on port 3000.
  • Listens on all network interfaces (0.0.0.0).

4.6 API Routes

4.6.1 Fetching All Users

app.get("/users", async (req, res) => {
  try {
    const users = await usersCollection.find({}).toArray();
    res.json({ totalUsers: users.length, users });
  } catch (error) {
    console.log(error);
    res.status(500).json({ error: "Internal Server Error" });
  }
});

  • Fetches all users from the collection.
  • Returns the total number of users along with their details in JSON format.

4.6.2 Deleting All Users

app.delete("/users", async (req, res) => {
  try {
    const result = await usersCollection.deleteMany({});
    res.json({ message: "All users deleted successfully", deletedCount: result.deletedCount });
  } catch (error) {
    console.error(error);
    res.status(500).json({ error: "Internal Server Error" });
  }
});

  • Deletes all users from the database.
  • Returns the number of deleted users.

4.6.3 Adding New Users

app.post("/add-users", async (req, res) => {
  try {
    const users = req.body.users;
    if (!users || !Array.isArray(users)) {
      return res.status(400).json({ error: "Invalid users data" });
    }
    // Generate an embedding for each user's bio
    for (const user of users) {
      const embeddingTensor = await generateEmbedding(user.bio, { pooling: "mean", normalize: true });
      // Convert the tensor's data to a plain array for MongoDB storage
      user.embedding = Array.from(embeddingTensor.data);
    }
    await usersCollection.insertMany(users);
    res.json({ message: "Users added successfully" });
  } catch (error) {
    res.status(500).json({ error: error.message });
  }
});

  • Accepts a list of users (with name, email, and bio) in the request body.
  • Generates embeddings for each user's bio using the Xenova model.
  • Embeddings are converted from tensors to arrays (Array.from(embeddingTensor.data)) for MongoDB storage.
  • Stores the users with their embeddings in the MongoDB collection.
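The route above only checks that users is an array, so an entry without a bio would reach the embedding step. A minimal sketch of a stricter check is shown below. The helper findInvalidUsers is hypothetical (it is not part of the original code) and assumes that a non-empty string bio is the only hard requirement.

```javascript
// Hypothetical helper, not part of the original code.
// Returns a list of human-readable problems; an empty list means the payload is usable.
function findInvalidUsers(users) {
  if (!Array.isArray(users)) {
    return ["users must be an array"];
  }
  const problems = [];
  users.forEach((user, index) => {
    // Assumption: each user needs a non-empty string bio to embed.
    if (!user || typeof user.bio !== "string" || user.bio.trim() === "") {
      problems.push(`users[${index}] is missing a non-empty bio`);
    }
  });
  return problems;
}

console.log(findInvalidUsers([{ name: "Alice", bio: "AI researcher" }])); // []
console.log(findInvalidUsers([{ name: "Bob" }])); // [ 'users[0] is missing a non-empty bio' ]
```

Inside the route, one option would be to return a 400 response with this list whenever it is non-empty, before any embeddings are generated.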

4.6.4 Searching for Users

app.post("/search", async (req, res) => {
  try {
    const { query } = req.body;
    const queryEmbeddingTensor = await generateEmbedding(query, { pooling: "mean", normalize: true });
    const queryEmbedding = Array.from(queryEmbeddingTensor.data);
    const users = await usersCollection.find().toArray();
    // Score every stored user against the query
    const results = users.map((user) => ({
      ...user,
      similarity: cosineSimilarity(queryEmbedding, user.embedding),
    }));
    // Keep matches scoring at least 0.3, best first, top 5 only
    const filteredResults = results
      .filter((user) => user.similarity >= 0.3)
      .sort((a, b) => b.similarity - a.similarity)
      .slice(0, 5);
    res.json(filteredResults);
  } catch (error) {
    res.status(500).json({ error: "Internal Server Error" });
  }
});

  • Accepts a search query in the request body.
  • Generates an embedding vector for the query.
  • Compares the query vector with each user embedding using cosine similarity.
  • Keeps results with a similarity score of at least 0.3.
  • Returns the top 5 most relevant users, sorted from most to least similar.
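The scoring, filtering, and sorting steps can also be looked at in isolation. The sketch below pulls them into a pure function and runs it on toy two-dimensional vectors instead of real embeddings. The name rankUsers and its options are hypothetical. Unlike the route above, this variant skips users without an embedding and leaves the embedding array out of the results, which is a design choice of this sketch rather than the original behaviour.

```javascript
function cosineSimilarity(vec1, vec2) {
  const dotProduct = vec1.reduce((sum, val, i) => sum + val * vec2[i], 0);
  const magnitudeA = Math.sqrt(vec1.reduce((sum, val) => sum + val * val, 0));
  const magnitudeB = Math.sqrt(vec2.reduce((sum, val) => sum + val * val, 0));
  return dotProduct / (magnitudeA * magnitudeB);
}

// Hypothetical helper, not part of the original code.
// threshold and limit default to the values used in the /search route.
function rankUsers(queryEmbedding, users, { threshold = 0.3, limit = 5 } = {}) {
  return users
    .filter((user) => Array.isArray(user.embedding)) // skip users stored without an embedding
    .map(({ embedding, ...rest }) => ({
      ...rest, // everything except the embedding array
      similarity: cosineSimilarity(queryEmbedding, embedding),
    }))
    .filter((user) => user.similarity >= threshold)
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, limit);
}

// Toy data: 2-dimensional vectors stand in for real embeddings.
const sampleUsers = [
  { name: "A", embedding: [1, 0] },     // same direction as the query
  { name: "B", embedding: [0, 1] },     // orthogonal, filtered out
  { name: "C", embedding: [0.6, 0.8] }, // partially similar
];
console.log(rankUsers([1, 0], sampleUsers).map((user) => user.name)); // [ 'A', 'C' ]
```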

5. Running the Script

Once you've added the code, you can start the application by following these steps:

5.1 Start MongoDB

Ensure MongoDB is running in the background:

mongod --dbpath <your-database-path>

If you're using MongoDB Atlas (Cloud MongoDB), update the connection string in the script with
your Atlas connection URL.
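One way to switch between a local server and Atlas without editing the script is to read the connection settings from environment variables. The sketch below is an assumption, not part of the original project: the helper loadConfig and the variable names MONGO_URI, DB_NAME, COLLECTION_NAME, and PORT are made up for this example, and the defaults mirror the values used earlier in this guide.

```javascript
// Hypothetical helper, not part of the original code.
// Reads settings from an environment-like object and falls back to the
// local defaults used in this guide.
function loadConfig(env = process.env) {
  return {
    mongoUri: env.MONGO_URI || "mongodb://localhost:27017",
    dbName: env.DB_NAME || "<YOUR_DB_NAME>",
    collectionName: env.COLLECTION_NAME || "<YOUR_COLLECTION_NAME>",
    port: Number(env.PORT) || 3000,
  };
}

// With no variables set, the local defaults apply.
console.log(loadConfig({}).mongoUri); // mongodb://localhost:27017
```

With this in place, the server could be started with, for example, MONGO_URI set to your Atlas connection string (Atlas connection strings typically begin with mongodb+srv://), and the values passed to MongoClient and app.listen would come from loadConfig().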

5.2 Start the Node.js Server

Run the server with:

node index.js

Once the server is running, you should see:

Server running on port 3000

5.3 Test the APIs in Postman

Use Postman or any REST client to test the API endpoints.

Add Users

Send a POST request to http://localhost:3000/add-users with the following JSON body:

{
 "users": [
 {
 "name": "Alice",
 "email": "alice@example.com",
 "bio": "AI researcher specializing in deep learning and NLP."
 },
 {
 "name": "Bob",
 "email": "bob@example.com",
 "bio": "Backend developer with expertise in Node.js, MongoDB, and GraphQL."
 }
 ]
}

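Get All Users

Send a GET request to http://localhost:3000/users. The response looks like this (each user also includes its _id and embedding fields, omitted here for brevity):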

{
 "totalUsers": 2,
 "users": [
 {
 "name": "Alice",
 "email": "alice@example.com",
 "bio": "AI researcher specializing in deep learning and NLP."
 },
 {
 "name": "Bob",
 "email": "bob@example.com",
 "bio": "Backend developer with expertise in Node.js, MongoDB, and GraphQL."
 }
 ]
}

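Delete All Users

Send a DELETE request to http://localhost:3000/users. The response reports the number of deleted users in deletedCount.

Search for Users

Send a POST request to http://localhost:3000/search with the following JSON body:

{ "query": "natural language processing" }

Response body: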
[
 {
 "_id": "67d2b0f3010dd52492d9b3be",
 "name": "Alice",
 "email": "alice@example.com",
 "bio": "AI researcher specializing in deep learning and NLP.",
 "embedding": [
 -0.011950308457016945,
 -0.06424717605113983,
 ..
 ..
 ..
 -0.004725316539406776,
 -0.14137263596057892
 ],
 "similarity": 0.62751234530881516
 }
]
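If you would rather script these calls than use Postman, the same requests can be built in Node.js. The sketch below is one possible approach. The helper jsonRequest is hypothetical, and BASE_URL assumes the server is running locally on port 3000 as described above. The commented usage relies on the global fetch available in Node.js 18 and later.

```javascript
// Assumption: the server from this guide is running locally on port 3000.
const BASE_URL = "http://localhost:3000";

// Hypothetical helper: builds the URL and fetch options for a JSON request.
function jsonRequest(method, path, body) {
  const options = { method, headers: { "Content-Type": "application/json" } };
  if (body !== undefined) {
    options.body = JSON.stringify(body);
  }
  return { url: `${BASE_URL}${path}`, options };
}

const searchRequest = jsonRequest("POST", "/search", { query: "natural language processing" });
console.log(searchRequest.url); // http://localhost:3000/search

// Usage (requires the server to be running):
// const response = await fetch(searchRequest.url, searchRequest.options);
// console.log(await response.json());
```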

6. Semantic Search Flow

1. User Data Insertion:

The system accepts new user data, generates vector embeddings, and
stores them in MongoDB.

2. Query Embedding:

When a search query is entered, the system generates its vector
representation.

3. Cosine Similarity Matching:

The query vector is compared with all stored vectors, and the
most similar users are returned.

7. GitHub Repository

You can access the complete source code for this project on GitHub:

https://github.com/DheerajKumarMamidi/AI-Semantic-Search-Engine

8. Conclusion

This project demonstrates how to build a semantic search engine using vector embeddings and cosine similarity. It finds users based on the meaning of the query rather than exact keyword matches, making it a useful starting point for recommendation systems or knowledge retrieval applications. Because each search compares the query against every stored embedding in application code, the approach is best suited to small collections.

Dheeraj Kumar
Technical Project Manager

Tech Lead with 8+ years of experience in Software development, project management, and UI/UX design, specialising in building scalable mobile applications, leading cross-functional teams, and delivering user-centric solutions with a strong focus on performance, quality, and innovation.
