Building an AI-powered Semantic Search Engine

05 Apr 2025
5 min read

1. Introduction

In this guide, we'll explore how to build a simple yet powerful semantic search engine using
Node.js, MongoDB and vector embeddings generated by the Xenova/multi-qa-MiniLM-L6-cos-v1
model.
This application allows you to:

  • Insert user profiles.
  • Generate embeddings for their bios.
  • Search for them using semantic similarity, based on contextual meaning rather than exact keyword matching.

2. Project Prerequisites

Before running the script, make sure you have the following installed:

2.1 Install Node.js and npm

If you don't have Node.js installed, download it from the official Node.js website (https://nodejs.org).

To verify your installation:

node -v
npm -v

2.2 Setting up MongoDB

Before building the search engine, you need to install MongoDB and create a new database.

2.2.1 Install MongoDB

Follow the steps below for your operating system.

On Ubuntu

1. Import the MongoDB public GPG Key:

wget -qO - https://www.mongodb.org/static/pgp/server-7.0.asc | gpg --dearmor | sudo tee /usr/share/keyrings/mongodb-server-7.0.gpg > /dev/null

2. Create a list file for MongoDB:

echo "deb [ signed-by=/usr/share/keyrings/mongodb-server-7.0.gpg ] https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/7.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-7.0.list

3. Install MongoDB:

sudo apt update
sudo apt install -y mongodb-org

4. Start the MongoDB service:

sudo systemctl start mongod
sudo systemctl enable mongod

5. Verify the installation:

mongod --version

On Windows

1. Download the MongoDB installer from the MongoDB Download Center.

2. Run the installer and follow the installation instructions.

3. Verify the installation by opening Command Prompt and running:

mongod --version

4. Start the MongoDB service if it is not already running:

net start MongoDB

On macOS

Install MongoDB using Homebrew:

brew tap mongodb/brew
brew install mongodb-community@7.0
brew services start mongodb-community@7.0

2.2.2 Running MongoDB

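If MongoDB is not already running as a service, start the server manually:

mongod

To connect to MongoDB, open the MongoDB Shell (the legacy mongo shell was removed in MongoDB 6.0, so use mongosh):

mongosh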

✅ You’re now ready to use MongoDB!

2.2.3 Creating a MongoDB Database

1. Open MongoDB shell by running:

mongosh

2. Create a new database:

use <YOUR_DB_NAME>

3. Create a users collection:

db.createCollection("<YOUR_COLLECTION_NAME>")

4. Verify that the collection was created:

show collections

2.3 Installing Dependencies

1. Initialize a Node.js project (if not already created):

npm init -y

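2. Install the required dependencies:

npm install express cors body-parser mongodb @xenova/transformers

3. Enable ES modules by adding "type": "module" to package.json. The code in this guide uses import statements and top-level await, which Node.js only accepts in ES modules.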

2.4 Connecting MongoDB to Node.js

To connect your Node.js project to MongoDB, add the following connection code to your index.js file:

import { MongoClient } from "mongodb";

const MONGO_URI = "mongodb://localhost:27017";
const DB_NAME = "<YOUR_DB_NAME>";
const COLLECTION_NAME = "<YOUR_COLLECTION_NAME>";

// Top-level await works only in ES modules ("type": "module" in package.json).
const client = new MongoClient(MONGO_URI);
await client.connect();
const db = client.db(DB_NAME);
const usersCollection = db.collection(COLLECTION_NAME);

3. Project Setup

Technologies Used:

Node.js:

For building the backend server.

MongoDB:

To store user data along with the generated vector embeddings.

Express.js:

To create API routes for CRUD operations.

Xenova/Transformers:

For generating vector embeddings from text.

Cosine Similarity:

To compare embeddings and find the closest matches.

Folder Structure

/index.js → Main backend server file
/package.json → Project dependencies
/mongodb → Database for storing user info and vector embeddings

Environment

  • MongoDB running locally (mongodb://localhost:27017).
  • The application listens on port 3000.

4. Key Components Explained

4.1 Dependencies and Middleware

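import express from "express";
import cors from "cors";
import bodyParser from "body-parser";
import { MongoClient } from "mongodb";
import { pipeline } from "@xenova/transformers";

// Create the Express app and register the middleware described below.
const app = express();
app.use(cors());
app.use(bodyParser.json());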

express:

Web framework used to create the server and handle routing.

cors:

Allows cross-origin resource sharing, making the API accessible from different origins.

body-parser:

Parses incoming JSON requests.

MongoClient:

Used to connect to MongoDB.

pipeline:

Factory function from @xenova/transformers that loads a model for a given task, here feature extraction with the Xenova embedding model.

4.2 MongoDB Connection

const MONGO_URI = "mongodb://localhost:27017";
const DB_NAME = "<YOUR_DB_NAME>";
const COLLECTION_NAME = "<YOUR_COLLECTION_NAME>";
const client = new MongoClient(MONGO_URI);
await client.connect();
const db = client.db(DB_NAME);
const usersCollection = db.collection(COLLECTION_NAME);

  • Connects to the local MongoDB server at localhost:27017.
  • Uses the configured database and collection to store the user data.
  • await client.connect() establishes the connection to the MongoDB instance.

4.3 Loading the Embedding Model

const generateEmbedding = await pipeline("feature-extraction", "Xenova/multi-qa-MiniLM-L6-cos-v1");

  • Initializes the Xenova model for text embedding generation.
  • The model uses feature-extraction to convert textual data (user bios and search queries) into numerical vector representations.
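
The routes later in this guide call generateEmbedding with { pooling: "mean", normalize: true }. As a rough illustration of what those two options mean, the sketch below averages per-token vectors into a single vector and then scales it to unit length. The helper names meanPool and l2Normalize are hypothetical and exist only for this illustration; the library performs these steps internally on tensors, and this simplified version ignores padding and attention masks.

```javascript
// Illustration only: meanPool and l2Normalize are hypothetical helpers,
// not part of @xenova/transformers.

// Average a list of equal-length token vectors into a single vector.
function meanPool(tokenVectors) {
  const dim = tokenVectors[0].length;
  const sums = new Array(dim).fill(0);
  for (const vec of tokenVectors) {
    for (let i = 0; i < dim; i++) sums[i] += vec[i];
  }
  return sums.map((s) => s / tokenVectors.length);
}

// Scale a vector to unit length (L2 norm of 1).
function l2Normalize(vec) {
  const norm = Math.sqrt(vec.reduce((sum, v) => sum + v * v, 0));
  return norm === 0 ? vec.slice() : vec.map((v) => v / norm);
}

// Two toy 2-dimensional "token" vectors stand in for real model output.
const pooled = meanPool([[1, 2], [5, 6]]);
const unit = l2Normalize(pooled);
console.log(pooled); // [ 3, 4 ]
console.log(unit);   // [ 0.6, 0.8 ]
```

The real model produces 384-dimensional vectors, but the arithmetic is the same.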

4.4 Cosine Similarity Function

function cosineSimilarity(vec1, vec2) {
  // Dot product of the two vectors.
  const dotProduct = vec1.reduce((sum, val, i) => sum + val * vec2[i], 0);
  // Euclidean length (magnitude) of each vector.
  const magnitudeA = Math.sqrt(vec1.reduce((sum, val) => sum + val * val, 0));
  const magnitudeB = Math.sqrt(vec2.reduce((sum, val) => sum + val * val, 0));
  return dotProduct / (magnitudeA * magnitudeB);
}

  • Computes cosine similarity between two vectors.
  • Formula: similarity = (A · B) / (||A|| × ||B||)
  • Outputs a score between -1 and 1; higher values indicate greater similarity.
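
As written, the function returns NaN when either vector has zero magnitude or when the two vectors differ in length. A defensive variant might look like the sketch below. The name safeCosineSimilarity and the choice to return 0 for a zero vector are assumptions made for this example, not part of the original project.

```javascript
// Hypothetical defensive variant of cosineSimilarity (not from the original project).
function safeCosineSimilarity(vec1, vec2) {
  if (vec1.length !== vec2.length) {
    throw new Error(`Vector length mismatch: ${vec1.length} vs ${vec2.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < vec1.length; i++) {
    dot += vec1[i] * vec2[i];
    normA += vec1[i] * vec1[i];
    normB += vec2[i] * vec2[i];
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  // A zero vector has no direction; treat it as "no similarity" instead of NaN.
  return denominator === 0 ? 0 : dot / denominator;
}

console.log(safeCosineSimilarity([1, 0], [1, 0]));  // 1  (same direction)
console.log(safeCosineSimilarity([1, 0], [0, 1]));  // 0  (orthogonal)
console.log(safeCosineSimilarity([1, 0], [-1, 0])); // -1 (opposite direction)
console.log(safeCosineSimilarity([0, 0], [1, 0]));  // 0  (zero vector)
```

Because the embeddings in this guide are generated with normalize: true, both magnitudes are close to 1 and the score is effectively the dot product.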

4.5 Server Initialization

app.listen(3000, "0.0.0.0", () => {
  console.log("Server running on port 3000");
});

  • Starts the server on port 3000.
  • Listens on all network interfaces (0.0.0.0).

4.6 API Routes

4.6.1 Fetching All Users

app.get("/users", async (req, res) => {
  try {
    const users = await usersCollection.find({}).toArray();
    res.json({ totalUsers: users.length, users });
  } catch (error) {
    console.log(error);
    res.status(500).json({ error: "Internal Server Error" });
  }
});

  • Fetches all users from the collection.
  • Returns the total number of users along with their details in JSON format.
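
Each stored document also carries its embedding (384 numbers for this model), so the /users response grows quickly. One option is to drop that field before responding, assuming clients do not need the raw vectors. The helper name withoutEmbedding below is hypothetical, and the sample documents are stand-ins.

```javascript
// Hypothetical helper: return a copy of a user document without its embedding.
function withoutEmbedding(user) {
  const { embedding, ...rest } = user;
  return rest;
}

// Stand-in for documents returned by usersCollection.find({}).toArray().
const users = [
  { name: "Alice", bio: "AI researcher", embedding: [0.1, 0.2, 0.3] },
  { name: "Bob", bio: "Backend developer", embedding: [0.4, 0.5, 0.6] },
];

// Same response shape as the /users route, minus the vectors.
const payload = { totalUsers: users.length, users: users.map(withoutEmbedding) };
console.log(JSON.stringify(payload));
```

The MongoDB driver can also exclude the field at query time with a projection, for example usersCollection.find({}, { projection: { embedding: 0 } }).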

4.6.2 Deleting All Users

app.delete("/users", async (req, res) => {
  try {
    const result = await usersCollection.deleteMany({});
    res.json({ message: "All users deleted successfully", deletedCount: result.deletedCount });
  } catch (error) {
    console.error(error);
    res.status(500).json({ error: "Internal Server Error" });
  }
});

  • Deletes all users from the database.
  • Returns the number of deleted users.

4.6.3 Adding New Users

app.post("/add-users", async (req, res) => {
  try {
    const users = req.body.users;
    if (!users || !Array.isArray(users)) {
      return res.status(400).json({ error: "Invalid users data" });
    }
    for (let user of users) {
      const embeddingTensor = await generateEmbedding(user.bio, { pooling: "mean", normalize: true });
      user.embedding = Array.from(embeddingTensor.data);
    }
    await usersCollection.insertMany(users);
    res.json({ message: "Users added successfully" });
  } catch (error) {
    res.status(500).json({ error: error.message });
  }
});

  • Accepts a list of users (with name, email, and bio) in the request body.
  • Generates embeddings for each user's bio using the Xenova model.
  • Embeddings are converted from tensors to arrays (Array.from(embeddingTensor.data)) for MongoDB storage.
  • Stores the users with their embeddings in the MongoDB collection.
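
The route only checks that users is an array, so a user without a bio is caught only when embedding generation fails. A stricter check could run before any embeddings are generated, as in the sketch below. The function name validateUsers and its error messages are hypothetical, and rejecting an empty array is a choice made for this example.

```javascript
// Hypothetical validator for the /add-users payload (not part of the original code).
// Returns an error message, or null when the payload looks usable.
function validateUsers(users) {
  if (!Array.isArray(users) || users.length === 0) {
    return "users must be a non-empty array";
  }
  for (const [index, user] of users.entries()) {
    if (typeof user !== "object" || user === null) {
      return `users[${index}] must be an object`;
    }
    if (typeof user.bio !== "string" || user.bio.trim() === "") {
      return `users[${index}].bio must be a non-empty string`;
    }
  }
  return null;
}

console.log(validateUsers([{ name: "Alice", bio: "AI researcher" }])); // null
console.log(validateUsers([{ name: "Bob" }])); // users[0].bio must be a non-empty string
```

Inside the route, a non-null result could be returned to the client with status 400 in place of the existing Array.isArray check.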

4.6.4 Searching for Users

app.post("/search", async (req, res) => {
  try {
    const { query } = req.body;
    const queryEmbeddingTensor = await generateEmbedding(query, { pooling: "mean", normalize: true });
    const queryEmbedding = Array.from(queryEmbeddingTensor.data);
    const users = await usersCollection.find().toArray();
    const results = users.map((user) => ({
      ...user,
      similarity: cosineSimilarity(queryEmbedding, user.embedding),
    }));
    const filteredResults = results
      .filter(user => user.similarity >= 0.3)
      .sort((a, b) => b.similarity - a.similarity)
      .slice(0, 5);
    res.json(filteredResults);
  } catch (error) {
    res.status(500).json({ error: "Internal Server Error" });
  }
});

  • Accepts a search query in the request body.
  • Generates an embedding vector for the query.
  • Compares the query vector with each user embedding using cosine similarity.
  • Keeps only results with a similarity score of at least 0.3.
  • Returns up to 5 of the most relevant users, sorted from most to least similar.
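
The scoring, filtering, sorting, and truncation steps can be pulled into one function that is easy to test without a database or a model. This is only a sketch: the name rankBySimilarity and its threshold and limit options are hypothetical, with defaults mirroring the 0.3 and 5 used in the route, and toy 2-dimensional vectors stand in for real embeddings.

```javascript
// Compact cosine similarity, repeated here so the sketch runs on its own.
function cosineSimilarity(a, b) {
  const dot = a.reduce((sum, v, i) => sum + v * b[i], 0);
  const magA = Math.sqrt(a.reduce((sum, v) => sum + v * v, 0));
  const magB = Math.sqrt(b.reduce((sum, v) => sum + v * v, 0));
  return dot / (magA * magB);
}

// Hypothetical helper: score, filter, sort, and truncate in one place.
function rankBySimilarity(queryEmbedding, users, { threshold = 0.3, limit = 5 } = {}) {
  return users
    .map((user) => ({ ...user, similarity: cosineSimilarity(queryEmbedding, user.embedding) }))
    .filter((user) => user.similarity >= threshold)
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, limit);
}

// Toy 2-dimensional vectors stand in for real 384-dimensional embeddings.
const users = [
  { name: "Alice", embedding: [1, 0] },
  { name: "Bob", embedding: [0, 1] },
  { name: "Carol", embedding: [1, 1] },
];

const ranked = rankBySimilarity([1, 0], users);
console.log(ranked.map((user) => user.name)); // [ 'Alice', 'Carol' ]
```

Bob is dropped because his vector is orthogonal to the query (similarity 0), while Carol scores about 0.707 and ranks below Alice's exact match.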

5. Running the Script

Once you've added the code, you can start the application by following these steps:

5.1 Start MongoDB

Ensure MongoDB is running. If it is not already running as a service, start it manually:

mongod --dbpath <your-database-path>

If you're using MongoDB Atlas (Cloud MongoDB), update the connection string in the script with
your Atlas connection URL.
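
One way to switch between a local server and Atlas without editing the script is to read the connection settings from environment variables. The sketch below assumes variable names MONGO_URI, DB_NAME, and COLLECTION_NAME and a helper called loadConfig; all of these are hypothetical choices for this example, with the values from this guide as fallbacks.

```javascript
// Hypothetical helper: read connection settings from the environment,
// falling back to the local defaults used in this guide.
function loadConfig(env = process.env) {
  return {
    mongoUri: env.MONGO_URI ?? "mongodb://localhost:27017",
    dbName: env.DB_NAME ?? "<YOUR_DB_NAME>",
    collectionName: env.COLLECTION_NAME ?? "<YOUR_COLLECTION_NAME>",
  };
}

const config = loadConfig();
// Log only the database name; a connection string may contain credentials.
console.log(`Using database "${config.dbName}"`);
```

The values in config would then replace the hard-coded constants, for example new MongoClient(config.mongoUri), and the server could be started with MONGO_URI set to your Atlas connection string.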

5.2 Start the Node.js Server

Run the server with:

node index.js

Once the server is running, you should see:

Server running on port 3000

5.3 Test the APIs in Postman

Use Postman or any REST client to test the API endpoints.

Add Users (POST /add-users)

Request body:

{
  "users": [
    {
      "name": "Alice",
      "email": "alice@example.com",
      "bio": "AI researcher specializing in deep learning and NLP."
    },
    {
      "name": "Bob",
      "email": "bob@example.com",
      "bio": "Backend developer with expertise in Node.js, MongoDB, and GraphQL."
    }
  ]
}

Get All Users (GET /users)

Response body (each user's _id and embedding fields are omitted here for brevity):

{
  "totalUsers": 2,
  "users": [
    {
      "name": "Alice",
      "email": "alice@example.com",
      "bio": "AI researcher specializing in deep learning and NLP."
    },
    {
      "name": "Bob",
      "email": "bob@example.com",
      "bio": "Backend developer with expertise in Node.js, MongoDB, and GraphQL."
    }
  ]
}

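Delete All Users (DELETE /users)

No request body is needed. The response contains a confirmation message and the number of deleted users (deletedCount).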

Search for Users (POST /search)

Request body:

{ "query": "natural language processing" }

Response body:

[
  {
    "_id": "67d2b0f3010dd52492d9b3be",
    "name": "Alice",
    "email": "alice@example.com",
    "bio": "AI researcher specializing in deep learning and NLP.",
    "embedding": [
      -0.011950308457016945,
      -0.06424717605113983,
      ..
      ..
      ..
      -0.004725316539406776,
      -0.14137263596057892
    ],
    "similarity": 0.62751234530881516
  }
]
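
The same requests can also be scripted instead of being sent from Postman. The sketch below assumes Node.js 18 or later (for the built-in fetch) and the server from this guide running on localhost:3000. The helper names jsonRequest, callApi, and runDemo are hypothetical.

```javascript
// Assumes Node.js 18+ (global fetch) and the server from this guide on localhost:3000.
const BASE_URL = "http://localhost:3000";

// Hypothetical helper: build fetch options for a JSON request.
function jsonRequest(method, body) {
  const options = { method, headers: { "Content-Type": "application/json" } };
  if (body !== undefined) options.body = JSON.stringify(body);
  return options;
}

// Hypothetical helper: call an endpoint and return the parsed JSON response.
async function callApi(path, method = "GET", body) {
  const response = await fetch(`${BASE_URL}${path}`, jsonRequest(method, body));
  if (!response.ok) {
    throw new Error(`${method} ${path} failed with status ${response.status}`);
  }
  return response.json();
}

// Example session: add one user, then search for them.
async function runDemo() {
  await callApi("/add-users", "POST", {
    users: [
      {
        name: "Alice",
        email: "alice@example.com",
        bio: "AI researcher specializing in deep learning and NLP.",
      },
    ],
  });
  const matches = await callApi("/search", "POST", { query: "natural language processing" });
  console.log(matches.map((user) => `${user.name}: ${user.similarity.toFixed(3)}`));
}
```

Call runDemo() once the server is up. The function is deliberately not invoked here, so loading the file does not send any requests.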

6. Semantic Search Flow

1. User Data Insertion:

The system accepts new user data, generates vector embeddings, and
stores them in MongoDB.

2. Query Embedding:

When a search query is entered, the system generates its vector
representation.

3. Cosine Similarity Matching:

The query vector is compared with all stored vectors, and the
most similar users are returned.

7. GitHub Repository

You can access the complete source code for this project on GitHub:

https://github.com/DheerajKumarMamidi/AI-Semantic-Search-Engine

8. Conclusion

This project demonstrates how to build a semantic search engine using vector embeddings and
cosine similarity. It searches for users based on the meaning of the query rather than just matching keywords, which makes it a useful starting point for intelligent recommendation systems or knowledge retrieval applications. Because each search compares the query against every stored embedding, the approach as written is best suited to small collections.

Dheeraj Kumar
Technical Project Manager

Tech Lead with 8+ years of experience in Software development, project management, and UI/UX design, specialising in building scalable mobile applications, leading cross-functional teams, and delivering user-centric solutions with a strong focus on performance, quality, and innovation.
