12. Semantic Search

🎯 Learning Goals

  • Perform semantic search by comparing a query to a set of embeddings
  • Create vector embeddings using pre-trained language models
  • Compute semantic similarity using cosine similarity and matrix multiplication

📗 Technical Vocabulary

  • Semantic search
  • Cosine similarity
  • Vector
  • Tensor
  • PyTorch
  • Normalization
 
🌤️ Warm-Up
Imagine you're writing a research paper on climate change. You have 50 articles saved, but you need to find specific information about "how rising ocean temperatures affect coral reefs." Do you:
  • Read all 50 articles from start to finish? (That would take days!)
  • Use Ctrl + F to search for "coral reefs"? (But what if they use terms like "reef ecosystems" instead?)
  • Wish you had an AI assistant that could understand what you're actually looking for?

Semantic Search

Semantic search understands what you mean, not just what you say.
Regular search (like Ctrl + F) matches exact words:
Query: "coral reefs" Document: "The ocean's reef ecosystems are in danger" ❌ NO MATCH!
Semantic search understands meaning:
Query: "coral reefs" Document: "The ocean's reef ecosystems are in danger" ✅ MATCH!

Real-World Applications

Semantic search isn’t only helpful for finding the best resources to finish your homework. It’s used everywhere in applications you use all the time! It’s how:
  • Spotify knows what songs you’ll like
  • Netflix recommends movies
  • Google finds what you’re actually looking for
  • ChatGPT understands your question

Connection to the Final Project

In our last lesson, we touched on how RAG (Retrieval Augmented Generation) improves language model accuracy by giving it access to a relevant knowledge base. For your final project, you'll create a specialized knowledge base about your chosen topic. But how do we find the most relevant information from that knowledge base when generating responses?
Language is tricky! People say the same thing in lots of different ways. Like in our previous example, regular search might miss "reef ecosystems" when you search for "coral reefs."
This is where semantic search comes in! Today's lesson will teach you how semantic search works, providing you with a crucial skill for completing your final project successfully.

How does it work?

As we discussed in the Neural Networks lesson, machines don't understand words in the same way humans do. Before we dive into the technical details, let's visualize what we're trying to do. Imagine that every word or sentence lives in a vast space where similar meanings are placed close together. The word “happy” would be close to “joyful” but far from “sad.” When we perform a semantic search, we're essentially:
  1. Taking your query and finding its location in this meaning space
  2. Looking for all the documents that are closest to that location
  3. Returning these nearby documents as the most relevant results
This “meaning space” is actually a mathematical construct with hundreds of dimensions, which is where tensors come in. Semantic search works by transforming text into high-dimensional numerical representations stored in tensors, which are used to compare meaning. Tensors are mathematical objects that can hold multiple dimensions of data, from single values (scalars) to vectors, matrices, and beyond.

Understanding Embeddings and Tensors

Let's pause and take a closer look at some key terminology that's often confused:
  • Embeddings are the semantic representations of words or sentences. They capture meaning in numerical (vector) form.
  • Tensors are the mathematical structures used to store and manipulate these embeddings efficiently.
Think of it this way: the embedding is the what (the semantic representation), while the tensor is the how (the data structure used to store and process it). In our code, we’ll create embeddings and then store them in tensor format for efficient computation.
Tensors can be organized in different dimensions:
  • A 0D tensor is a single number (scalar)
  • A 1D tensor is a list of numbers (vector)
  • A 2D tensor is a table of numbers (matrix)
  • Higher-dimensional tensors contain even more complex arrangements!
When we convert text to embeddings, we're creating tensors! Each word or sentence becomes a vector (a 1D tensor): a one-dimensional list of numbers. For example, the embedding model we will use today maps each sentence to a vector with 384 components (a 1D tensor holding 384 individual values). This distinction is important: the tensor is 1D in structure (it's a vector), but each vector contains hundreds of components that represent different semantic features!
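The tensor dimensions listed above can be seen directly in PyTorch. This is a minimal sketch, assuming PyTorch is installed; the 4-component vector is invented for readability and stands in for a real 384-component embedding.

```python
import torch

# A 0D tensor: a single number (scalar)
scalar = torch.tensor(3.14)

# A 1D tensor: a list of numbers (vector) — like one sentence embedding.
# Real embedding models output longer vectors (e.g., 384 components);
# we use 4 components here just to keep the example readable.
embedding = torch.tensor([0.1, -0.3, 0.7, 0.2])

# A 2D tensor: a table of numbers (matrix) — like a batch of 3 embeddings
batch = torch.stack([embedding, embedding * 2, embedding - 1])

print(scalar.dim())     # 0 dimensions
print(embedding.dim())  # 1 dimension
print(batch.shape)      # 2 dimensions: 3 rows, 4 columns
```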

The Semantic Search Process

In semantic search, we represent text as tensor vectors, where each vector contains hundreds of numbers that capture the semantic meaning of the text. These tensors allow us to perform powerful mathematical operations that can compare meanings across thousands of text chunks simultaneously. To perform a semantic search, follow these steps:
  1. Break up the document into chunks (sentences or paragraphs).
  2. Use an embedding model to convert those chunks into tensor vectors (typically 384-1536 dimensions) where semantic similarity is preserved in vector space.
  3. Organize these vectors into a 2D tensor (essentially a matrix) where each row represents one chunk's embedding.
  4. Convert the query into a vector using the same embedding model to ensure it exists in the same vector space.
  5. Calculate cosine similarity between the query tensor and all document tensors using matrix multiplication.
  6. Rank the results by similarity score and return the chunks that most closely match the semantic meaning of the query.
This approach allows us to find semantically related content even when the exact words don't match, making search much more intuitive and powerful.
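The steps above can be sketched end-to-end in PyTorch. Because a real embedding model won't fit in a short example, the 3-component vectors below are hand-invented stand-ins: vectors for similar "meanings" are made to point in similar directions, which is exactly what a trained model produces at larger scale.

```python
import torch
import torch.nn.functional as F

# Toy stand-ins for a real embedding model's output (invented for illustration)
chunks = [
    "Coral reefs are threatened by warming oceans.",
    "The stock market rose sharply today.",
    "Reef ecosystems suffer when sea temperatures rise.",
]
chunk_vectors = torch.tensor([
    [0.9, 0.1, 0.3],   # points in a "reef/ocean" direction
    [0.1, 0.9, 0.2],   # points in a "finance" direction
    [0.8, 0.2, 0.4],   # "reef/ocean" direction again
])
query_vector = torch.tensor([0.85, 0.15, 0.35])  # query: "coral reefs"

# Steps 5-6: normalize, compare via matrix multiplication, rank by score
chunk_norm = F.normalize(chunk_vectors, dim=1)  # shape [3, 3]
query_norm = F.normalize(query_vector, dim=0)   # shape [3]
scores = chunk_norm @ query_norm                # cosine similarities, shape [3]
best = torch.argmax(scores).item()

print(chunks[best])  # a reef-related chunk ranks first, not the finance one
```

Notice that the reef chunks win even though neither contains the exact phrase "coral reefs" in the query: the comparison happens in vector space, not on keywords.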

Chunking Strategies

The way you divide your text into chunks can dramatically impact search quality:
  • Too large: Chunks containing multiple topics dilute the semantic focus, potentially returning irrelevant sections alongside relevant ones.
  • Too small: Chunks that break apart related concepts might lose important context.
When implementing your semantic search, experiment with different chunking strategies and observe how they affect the relevance of your search results. Chunks don't need to be uniform in size—you can follow natural content boundaries (like paragraph breaks, section divisions, or topic shifts) rather than forcing arbitrary divisions like fixed word counts, and doing so often produces better results. Remember, the ideal chunk size balances specificity with sufficient context!
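Here is a small comparison of the two strategies on a toy text (the text and word count are invented for illustration): a fixed word count can slice across a topic boundary, while splitting on blank lines keeps each paragraph's topic intact.

```python
# Toy text with two clearly different topics, separated by a blank line
text = (
    "Coral reefs host thousands of marine species. "
    "Rising ocean temperatures cause coral bleaching.\n\n"
    "Hospital administration involves budgeting and staffing. "
    "Patient records must be kept secure."
)

def split_fixed(text, words_per_chunk=8):
    """Arbitrary division: every chunk gets exactly 8 words."""
    words = text.split()
    return [" ".join(words[i:i + words_per_chunk])
            for i in range(0, len(words), words_per_chunk)]

def split_paragraphs(text):
    """Natural boundaries: one chunk per paragraph."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]

fixed = split_fixed(text)        # some chunks mix coral and hospital topics
natural = split_paragraphs(text)  # each chunk stays on one topic
```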
 
Too large of a chunk is like shoving your entire closet into one giant suitcase.
It includes everything, but it’s too heavy, hard to sort through, and may mix unrelated things, like swimsuits and winter coats. You won’t be able to find what you need quickly!
Too small of a chunk is like packing each individual sock into its own tiny pouch.
You end up with too many tiny bags, each one with just a snippet of what you need. It takes forever to unpack, and you lose the bigger picture of your full outfit.
✅ The ideal chunk is like packing by outfit: one bag for beachwear, one for hiking gear, one for fancy dinners. That way, each chunk has a clear focus, but still enough context to be useful.
OpenAI. Chunking Metaphor: Packing for a Trip. 2025. ChatGPT, https://chat.openai.com/.

Cosine Similarity

You can imagine a vector as an arrow pointing in some direction in space. If two arrows point in almost the same direction, this means they have a similar meaning. However, if two arrows point in opposite directions, they have opposing meanings.
Cosine similarity measures the angle between these arrows:
  • 1.0 → the vectors have identical meaning (same direction)
  • 0.0 → the vectors are unrelated (perpendicular)
  • -1.0 → the vectors have opposite meaning (opposite directions)
It’s important to note that although cosine similarity ranges from -1 to 1, most text embeddings tend to have positive similarities! This is because modern embedding models are specifically trained to place semantically related concepts in similar directions in the vector space, making negative similarities rare in practice.
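The three reference values above can be checked with tiny 2D "arrows." This sketch assumes PyTorch is installed and uses invented vectors chosen so the angles are exact.

```python
import torch
import torch.nn.functional as F

a = torch.tensor([1.0, 0.0])
same = torch.tensor([2.0, 0.0])          # same direction, different length
perpendicular = torch.tensor([0.0, 1.0])  # 90° away
opposite = torch.tensor([-1.0, 0.0])      # 180° away

# dim=0 tells PyTorch to compare these single (unbatched) vectors
print(F.cosine_similarity(a, same, dim=0).item())           # 1.0
print(F.cosine_similarity(a, perpendicular, dim=0).item())  # 0.0
print(F.cosine_similarity(a, opposite, dim=0).item())       # -1.0
```

Note that `a` and `same` score 1.0 even though `same` is twice as long: cosine similarity only cares about direction, which is why normalization (coming up shortly) pairs so naturally with it.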

PyTorch

To perform tensor operations and calculate cosine similarity, we’ll use a popular Python library called PyTorch. PyTorch is a machine learning library that specializes in tensor operations that make semantic search efficient. When we use the built-in PyTorch method matmul (matrix multiplication), we're performing thousands of calculations in parallel. Let’s imagine an example where our knowledge base has 50 sentences and we want to calculate the cosine similarity between our query and these 50 chunks.
  • The knowledge base is converted to a tensor with shape [50, 384] (50 sentences, 384 dimensions)
  • The query is also converted to a tensor with shape [384] (a single query with 384 dimensions)
  • The matmul method calculates similarity between the query and all 50 sentences!
This tensor-based approach is why semantic search can compare a query against thousands of document chunks in milliseconds. The code you'll write uses tensors behind the scenes to perform these high-dimensional comparisons efficiently.
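The shapes in the 50-sentence example look like this in code. The embeddings here are random stand-ins (a real knowledge base would come from an embedding model), so only the shapes, not the scores, are meaningful.

```python
import torch
import torch.nn.functional as F

# Random stand-ins for real embeddings, just to show the shapes involved
knowledge_base = torch.randn(50, 384)  # 50 sentences, 384 dimensions each
query = torch.randn(384)               # one query in the same space

# Normalize so the matmul below yields cosine similarities
kb_norm = F.normalize(knowledge_base, dim=1)
q_norm = F.normalize(query, dim=0)

# One matmul compares the query against all 50 sentences at once
scores = torch.matmul(kb_norm, q_norm)
print(scores.shape)  # torch.Size([50]) — one similarity score per sentence
```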

Normalization

Before calculating cosine similarity, we’ll need to normalize the vectors. Normalizing vectors means adjusting their length (magnitude) to 1 while preserving their direction. This normalization process helps the search focus on meaning, not intensity. For vector embeddings, direction represents semantic meaning, while magnitude represents intensity or confidence. By normalizing the vectors, we can ensure that we are only comparing the meaning. To normalize a vector mathematically, divide each component by the vector’s length.
Why does normalization matter? Imagine searching a medical database. Without normalization, a short document mentioning “cancer treatment” briefly might be considered less relevant than a long document that mentions it multiple times but is mostly about hospital administration. Normalization ensures we compare meaning rather than intensity or document length, leading to more accurate search results.
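The "divide each component by the vector's length" rule looks like this for a small invented vector, a classic 3-4-5 example so the arithmetic is easy to follow:

```python
import torch

v = torch.tensor([3.0, 4.0])

# The vector's length (L2 norm): sqrt(3^2 + 4^2) = 5.0
length = torch.norm(v)

# Divide each component by the length: direction preserved, length becomes 1
v_normalized = v / length  # [0.6, 0.8]

print(length.item())                    # 5.0
print(torch.norm(v_normalized).item())  # 1.0
```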
Congratulations! You created a simple AI system that implements a semantic search by converting text to embeddings and calculating cosine similarity. Now let’s modify your semantic search to make it related to a topic you’re interested in!

Semantic Search and RAG

In your final project, you’ll combine your understanding of semantic search and generative AI to build a chatbot that implements RAG (retrieval augmented generation). When a user asks a question, the system will first perform a semantic search to retrieve the most relevant information from the knowledge base and then pass that context along with the question to a large language model, which generates a response based on both. Using RAG in your chatbot final project ensures that your chatbot has access to relevant or specialized information, reducing the likelihood of hallucinations and improving the accuracy of the responses.
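The retrieve-then-generate flow just described can be sketched as follows. This is only an outline: `call_llm` is a hypothetical stand-in for whatever language-model API your project uses (here it simply echoes the prompt so the example stays self-contained), and the one-chunk retriever is a toy placeholder for your real semantic search.

```python
def call_llm(prompt):
    # Hypothetical stand-in for a real language-model API call
    return f"(model response based on: {prompt[:40]}...)"

def rag_answer(question, retrieve_top_chunks):
    # Step 1: semantic search retrieves the most relevant context
    context_chunks = retrieve_top_chunks(question)
    # Step 2: pass that context along with the question to the model
    prompt = ("Context:\n" + "\n".join(context_chunks)
              + f"\n\nQuestion: {question}")
    return call_llm(prompt)

# Toy retriever that always returns one fixed chunk, for illustration
answer = rag_answer(
    "How do rising ocean temperatures affect coral reefs?",
    lambda q: ["Warming oceans cause coral bleaching."],
)
```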

💼 Takeaways

Next time you search for something online, remember there's fascinating math working behind the scenes to understand what you really want, not just matching the exact words you typed.
  • Text can be represented as numbers that capture meaning. You learned how to turn sentences into vector embeddings using a pre-trained language model. These embeddings were stored as tensors that let us represent and manipulate these vectors efficiently, especially when working with batches of text.
  • We can measure how similar two pieces of text are using math. Using cosine similarity and matrix multiplication, you calculated how closely a user’s question matched each chunk of the knowledge base.
  • You built a simple AI system that finds the most relevant information. By combining embeddings and similarity scoring, you created a semantic search tool that answers questions based on meaning, not just keywords.
For a summary of this lesson, check out the 12. Semantic Search One-Pager!