An Easy Way to Preview Semantic Search in MongoDB Atlas

marcussorealheis
4 min read · Apr 22, 2024

“The combination made my eyes bleed” — WuTang, C.R.E.A.M., 1994

(I wrote this blog post about two years ago, but it was embargoed. It’s not formatted.)

This post is not for everyone. It’s specifically for JavaScript developers, among the least empowered to build AI-enhanced technologies. What follows is not production code. There are better options. This post is meant to help you join the fun in one of the most transformational technologies of the digital era.

If you’re like most developers, you’ve heard a lot about using language models to convert text to dense vectors to enable semantic search, something I’ve blogged about before with much skepticism. Semantic search is a powerful way to surface relevant content without relying on lexical or boolean text matches. However, many teams find it difficult to manage the models and the search engines required for this approach.

With this blog post, I hope to make it easy to prepare data for vector search in MongoDB in just 10 minutes. I’ll walk you through how to use MongoDB Atlas Database, Atlas Triggers, Atlas App Services, and Atlas Search to get up and running quickly and efficiently. Due to the strong consistency in the MongoDB database, and the eventual consistency of Atlas Search, you can reliably use the vector capabilities discussed in this blog for mission-critical use cases like fraud detection in financial services. MongoDB, when run in Atlas, is the only database that I know of with such characteristics.

You will be able to complete this tutorial without my involvement, as a step one. The data engineering piece is often the toughest part. For step two, you need to reach out to me so that I can give you access to the feature and the documentation.

We’ll use MongoDB Atlas Database, Atlas Triggers, Atlas App Services, Atlas Search, and scalethebrain.com.

MongoDB Atlas is a cloud-hosted database service that makes it easy to set up, operate, and scale MongoDB deployments in the cloud. MongoDB Atlas is available on AWS, Azure, and Google Cloud. Atlas enables you to deploy a highly available, fault-tolerant, and scalable MongoDB deployment in the cloud with just a few clicks.

Atlas Triggers is a feature of MongoDB Atlas that allows you to automatically trigger an action in response to a database event, such as a document being inserted into a collection. You can also set up scheduled Triggers that operate on a specified time interval.
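To make the event-driven idea concrete, here is a minimal sketch of the decision a database trigger function has to make when it receives a change event: is this an insert, and does the document have text worth vectorizing? The `operationType` and `fullDocument` fields are standard MongoDB change event fields. The helper name `textToVectorize` and the `name` field are my own assumptions for illustration, not part of any Atlas API.

```javascript
// Hypothetical helper: decide whether a change event carries text we should vectorize.
// Returns { _id, text } for usable inserts, or null for everything else.
function textToVectorize(changeEvent, fieldName) {
  // Only react to newly inserted documents.
  if (changeEvent.operationType !== "insert") return null;

  const doc = changeEvent.fullDocument;
  if (!doc) return null;

  // Skip documents where the field is missing, not a string, or blank.
  const value = doc[fieldName];
  if (typeof value !== "string" || value.trim() === "") return null;

  return { _id: doc._id, text: value };
}

// A hand-written sample event, shaped like an insert change event.
const sampleEvent = {
  operationType: "insert",
  documentKey: { _id: 1 },
  fullDocument: { _id: 1, name: "Ned Stark" },
};

const job = textToVectorize(sampleEvent, "name");
console.log(job);
```

Inside a real trigger function, the change event arrives as the function’s argument, and the returned text would be sent to whatever vectorization service you use.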

Atlas App Services is a set of REST API endpoints that allow you to programmatically access MongoDB Atlas features.

Atlas Search is a feature of MongoDB Atlas that provides full-text search capabilities for your MongoDB data based on Apache Lucene. The subsystem here is important because it’s what enables MongoDB to be a vector database. Apache Lucene constructs a Hierarchical Navigable Small World (HNSW) graph to support calculating similarities between query vectors and field vectors in documents.
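If “calculating similarities between vectors” sounds abstract, here is a small brute-force sketch of the idea. It is not how Lucene does it: the graph exists precisely so the engine does not have to score every document. I’m also assuming cosine similarity here purely for illustration, and the function names and tiny two-dimensional vectors are made up.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat its similarity as 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute force: score every document, sort by score descending, keep the top k.
function topK(queryVector, docs, k) {
  return docs
    .map((doc) => ({ _id: doc._id, score: cosineSimilarity(queryVector, doc.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Made-up documents with made-up vectors.
const docs = [
  { _id: "a", vector: [1, 0] },
  { _id: "b", vector: [0, 1] },
  { _id: "c", vector: [1, 1] },
];

const results = topK([1, 0], docs, 2);
console.log(results);
```

Scoring every document like this gets slow as a collection grows, which is why an approximate graph-based index is worth having.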

scalethebrain.com is a simple site for vectorizing data, with latency low enough for query-time vectorization. At the time of writing this blog post, the service supports only the Universal Sentence Encoder. The service is free and very simple to use.

Setting up the database with the sample dataset and an Atlas Database Trigger.

We need a dataset to work with, so we’ll use the sample dataset available in Atlas but you are free to use your own data if you’d like. We’ll use the MongoDB Atlas free tier to host our database. You can sign up for an account here. There’s a green try free button in the upper right corner so you can skip the tutorial in the link if you’re not interested in all that.

Once you have an account and your sample dataset loaded, we’ll set up two database triggers that look identical.

Great! We now have our dataset and a MongoDB database to store it in. The next step is to create two types of triggers: (1) a scheduled trigger, which we will eventually delete, that vectorizes all the data in a couple of fields in the sample_mflix database’s users collection, and (2) an event-based trigger that vectorizes all new data as it’s being inserted into the database.

Here’s the code for converting all the docs to contain user names in vectors
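As a rough illustration of the shape such a backfill could take, here is a minimal sketch. To be clear about assumptions: this is not the code this post originally pointed to, `fakeEmbed` is a stand-in that returns a tiny made-up vector so the example runs offline, and `name_vectors` is a field name I picked for illustration. The operation format matches what MongoDB’s `bulkWrite` accepts.

```javascript
// Stand-in for a call to a vectorization service (an assumption for this sketch).
// It returns a tiny fake vector so the example runs without network access.
async function fakeEmbed(text) {
  return [text.length, text.charCodeAt(0)];
}

// Build a single updateOne operation that stores a vector on a document.
// `name_vectors` is a hypothetical field name.
function toUpdateOp(id, vector) {
  return {
    updateOne: {
      filter: { _id: id },
      update: { $set: { name_vectors: vector } },
    },
  };
}

// Walk the documents, vectorize each usable name, and collect the update operations.
async function buildVectorUpdates(docs, embed) {
  const ops = [];
  for (const doc of docs) {
    // Skip documents without a non-blank string name.
    if (typeof doc.name !== "string" || doc.name.trim() === "") continue;
    const vector = await embed(doc.name);
    ops.push(toUpdateOp(doc._id, vector));
  }
  return ops;
}

// Made-up documents shaped like entries in a users collection.
const users = [
  { _id: 1, name: "Ned Stark" },
  { _id: 2, name: "" },
  { _id: 3, name: "Robert Baratheon" },
];

buildVectorUpdates(users, fakeEmbed).then((ops) => {
  console.log(JSON.stringify(ops, null, 2));
});
```

In a scheduled trigger, you would read the documents from the collection, swap `fakeEmbed` for a real call to your vectorization service, and hand the resulting operations to the collection’s `bulkWrite` method. Treat that wiring as something to verify against the Atlas Functions documentation.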

Setting up an Atlas App Services Function to vectorize incoming queries.

Atlas App Services is a serverless platform that makes it easy to deploy and run code in response to events. We’ll use it to deploy a function that vectorizes incoming queries so they can be compared against our semantic search index.

To get started, create a new App Service and select “Functions” from the left-hand menu. Click “Add Function” and choose “Incoming Webhook”. Name your function and select a language. For our purposes, we’ll use Node.js.

Next, paste in the following code:

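// This function is the endpoint's request handler.
exports = async function({ query, headers, body }, response) {
  const conn = context.services.get("mongodb-atlas").db("sample_mflix").collection("movies");

  // The search text arrives as the `query` query-string parameter.
  const encodedTextQuery = encodeURIComponent(query.query);

  // Ask the vectorization service to turn the search text into a vector.
  const vectorResponse = await context.http
    .get({ url: "https://scalethebrain.com/" + encodedTextQuery });

  // Assumption: the service responds with a JSON array of numbers.
  const queryVectors = JSON.parse(vectorResponse.body.text());

  // The operator name below is a placeholder until you have early access.
  const searchAggregation = [{
    $search: {
      <marcus.eagan@mongodb.com to get the early access>: {
        path: "plot_vectors",
        query: queryVectors
      }
    }
  }];

  return conn.aggregate(searchAggregation).toArray();
};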

At the bottom of the page, click “Deploy”. This action will create a URL for your function that you can use to POST queries.
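Once the function is deployed, calling it from JavaScript could look something like the sketch below. The endpoint URL is a placeholder (use the one App Services gives you), the helper names are mine, and I’m assuming the search text travels in a `query` query-string parameter, since that is what the handler reads. `fetch` needs Node 18+ or a browser.

```javascript
// Build the request URL, putting the search text in the `query` parameter.
// URL and URLSearchParams take care of encoding spaces and special characters.
function buildSearchUrl(endpointUrl, text) {
  const url = new URL(endpointUrl);
  url.searchParams.set("query", text);
  return url.toString();
}

// Send the query to the deployed function and parse the JSON response.
async function semanticSearch(endpointUrl, text) {
  const res = await fetch(buildSearchUrl(endpointUrl, text), { method: "POST" });
  if (!res.ok) {
    throw new Error(`Search failed with status ${res.status}`);
  }
  return res.json();
}

// Placeholder endpoint, for illustration only.
const exampleUrl = buildSearchUrl("https://example.com/my-endpoint", "space heist comedy");
console.log(exampleUrl);
```

To run a real search, call `semanticSearch` with your own endpoint URL and handle the returned promise.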

Please email me here to get early access to the most efficient and feature-rich vector database technology in the world. It’s also the easiest.

You may wonder, why is he gating the feature? At MongoDB, I want to build with you, not for you. The processes associated with querying or vectorizing data can be expensive and training models can be more expensive. Oftentimes, it doesn’t yield the results you are looking for in the Search use case. For early access, the only payment I ask for is your feedback and design decisions. Many of our customers are driving the future of MongoDB products with us.

If you’re interested in getting early access to the most efficient and feature-rich vector database technology in the world, please email me here. It’s also the easiest system to get up and running, so you can search semantically in just a few minutes.

marcussorealheis

Apache Solr Committer, MongoDB and Weaviate Advisor, Co-Founder at a Futuristic Tools Company