9. Neural Networks

🎯 Learning Goals

  • Explain how complex data is turned into numerical vectors machines can compare
  • Identify the input, hidden, and output layers and describe how data flows through them
  • Define how weights, biases, and activations process information
  • Describe how multiple layers recognize complex patterns

📗 Technical Vocabulary

  • Embeddings
  • Neural network
  • Neurons
  • Layers
  • Parameters
  • Weights
  • Biases
  • Activation functions
  • Deep learning
🌤️
Warm-Up
Let’s play a game! For each word on the list, quickly write down the first word that comes to mind.
  • apple
  • bank
  • light
  • key
  • cold
Now let’s share what word you wrote for each word! For “apple”, maybe you wrote “fruit” or “red” or even “iPhone”. It makes sense that not everyone chose the same word, but chances are that some of us wrote the exact same word! Why do you think that happens?
We understand intuitively that words like “bank” and “money” are related, but how do we teach a computer to understand those relationships? Today, we’ll explore how machines learn to recognize words, meanings, and even context with neural networks.

Embeddings

Machines fundamentally only understand numbers and the relationships between them. To enable a machine to understand language, we must first represent that language numerically. After all, computers process everything—text, images, audio—as numerical data. Remember that sentiment analysis project where we made those simple 3-dimensional vectors to show if messages were positive or negative? Well, that was just the beginning! In the real world of NLP (Natural Language Processing), we create way more complex number representations called embeddings.
Embeddings are high-dimensional numerical vectors that capture the meanings and relationships of words. In other words, they transform words or sentences into sets of numbers (vectors) that preserve their original meaning and contextual connections.
By converting diverse content—such as search queries, songs, photos, or videos—into vectors, machines gain the power to effectively compare, categorize, and understand the content in a way that’s almost human. So how does this work? Let’s explore a simplified example.
Imagine plotting words as points on a two-dimensional graph, where each point represents the numerical vector (or embedding) for that word.
  1. We’ll start with the word “car”, which might be represented by the vector (4.5, 7.1). This vector captures the meaning and nuance of the word “car” in a numerical representation that machines understand.
  2. Next, we’ll plot “automobile”. This word is very similar in meaning to “car”, so its vector might be (4.7, 6.5). Words with similar meanings are numerically similar and tend to be closely positioned in the vector space.
  3. The word “truck” might be represented by (5.6, 7.5). Its meaning is similar to “car” and “automobile”, but it may be slightly farther away due to the nuance of its size.
  4. Next, we’ll plot “bicycle” at (1.5, 5.5). Now this is different! It’s still in the same general domain of transportation, but slightly farther away because it’s not as closely related to the other words.
  5. The word “motorcycle” is similar in meaning to “bicycle”, but includes a motor like a car, so its vector might be (1.6, 6.5).
  6. Finally, a word like “phone” could be represented by the vector (8.3, 2.1), placing it very far away from both clusters!
[Image: word vectors plotted in two-dimensional space]
In this simplified example, the x-axis and y-axis don’t actually have meaning, but we can see how words are somewhat related to each other in space. Words with similar meanings—like "car," "automobile," and "truck" or "bicycle" and "motorcycle"—cluster together on the graph, while a word with a very different meaning, like "phone," is positioned farther away. This illustrates how machines can begin to calculate relationships between words: similar words will have vector embeddings that are numerically similar.
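The "closeness" of these words can actually be measured. Here's a minimal sketch, using the made-up 2D coordinates from the example above, that computes the Euclidean distance between two word vectors (real embeddings have hundreds of dimensions, and their values are learned rather than chosen by hand):

```python
import math

# Toy 2D "embeddings" from the example above.
embeddings = {
    "car":        (4.5, 7.1),
    "automobile": (4.7, 6.5),
    "truck":      (5.6, 7.5),
    "bicycle":    (1.5, 5.5),
    "motorcycle": (1.6, 6.5),
    "phone":      (8.3, 2.1),
}

def distance(word_a, word_b):
    """Euclidean distance between two word vectors: smaller means more similar."""
    (x1, y1), (x2, y2) = embeddings[word_a], embeddings[word_b]
    return math.hypot(x1 - x2, y1 - y2)

print(distance("car", "automobile"))  # small: similar meanings
print(distance("car", "phone"))       # large: very different meanings
```

Running this shows "car" sitting much closer to "automobile" than to "phone", which is exactly the kind of comparison a machine makes when it decides which words are related.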
✏️
Try-It | Vector Embeddings
How would you represent the relationships between the following words in a two-dimensional space?
coffee, espresso, latte, soda, pop, sandwich
When you’re finished, discuss with another scholar how you decided where to plot the words. What other words would have similar vector embedding to these? What words would have very different vector embeddings?
Of course, this is a massive simplification: real-world embeddings exist in much higher-dimensional spaces, often spanning hundreds of dimensions. Each dimension in the vector might represent a different semantic or contextual aspect of the word. For example, the word “date” can have wildly different meanings depending on the context: it can be a fruit, a romantic meeting, or even a calendar day! The vector embedding needs many dimensions to accurately capture the full range of the word’s meanings.

How Do We Create These Complex Vector Embeddings?

Remember our basic 3-dimensional sentiment vectors? We simply counted word frequencies in positive and negative messages to create those vectors. Simple! But for real language understanding, you need hundreds or thousands of dimensions.
This is where embedding models come into play. An embedding model transforms text (words, sentences, or phrases) into meaningful vector representations. The nuanced relationships in human language can't be captured through simple rules or basic calculations. Instead, we need a model capable of learning complex, non-linear patterns from vast amounts of text data. This is precisely what neural networks excel at.

What is a Neural Network?

A neural network is a machine learning program, or model, that makes decisions in a manner similar to the human brain, using processes that mimic the way biological neurons work together to identify phenomena, weigh options, and arrive at conclusions.
At a high level, neural networks are powerful models capable of learning patterns and relationships directly from data without being explicitly programmed. They’re designed to mimic the way our human brains process information. Just like the human brain uses a network of neurons to process signals, artificial neural networks (ANNs) consist of interconnected nodes, often called neurons.
Neural networks excel at identifying complex patterns within large datasets, making them powerful tools for image recognition and natural language processing. However, to harness this power, the data must first be transformed into a numerical format that the network can understand—vector embeddings.

Neural Network Structure

A typical neural network is organized into layers.
  • The input layer receives the raw data.
  • The hidden layers process the input through a series of transformations, refining their understanding with each new layer.
  • The output layer produces the final result.
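Under the hood, data flowing through these layers is just repeated arithmetic. Here's a rough sketch (the layer sizes, random weights, and sigmoid activation are made up for illustration, not taken from any particular model): each layer multiplies its input by a weight matrix, adds biases, and applies an activation function.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    """Activation function: squashes any number into the range (0, 1)."""
    return 1 / (1 + np.exp(-z))

# A tiny made-up network: 4 inputs -> 3 hidden neurons -> 2 outputs.
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)  # input -> hidden
W2, b2 = rng.normal(size=(2, 3)), np.zeros(2)  # hidden -> output

x = np.array([0.5, 0.1, 0.9, 0.3])   # input layer: the raw data
hidden = sigmoid(W1 @ x + b1)        # hidden layer: transformed features
output = sigmoid(W2 @ hidden + b2)   # output layer: the final result

print(output)  # two activations, each between 0 and 1
```

With random weights the outputs are meaningless; training is the process of adjusting the weights and biases until the outputs become useful.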

Example: Recognizing Handwritten Digits

A classic example for exploring how neural networks function is looking at how we might teach a machine to recognize handwritten numerical digits. As a human, it’s probably quite easy for you to identify these symbols as the digits 8 and 9, but what if you needed to tell a machine how to do it? What features would you tell the machine to look for? Explaining how to do it is actually quite challenging!
[Images: handwritten digits 8 and 9]
This is where neural networks come in. The idea is to build software that mimics the way our brains process information, so computers can solve tough problems that are hard for traditional algorithms to solve. Moreover, just as you learn by looking at many examples, neural networks don’t follow a fixed set of instructions for identifying digits. Instead, we show the program lots of examples of hand-drawn numbers, each labeled with the correct digit, and let it figure out the patterns on its own. Over time, the model adjusts and improves, much like you do when you practice a new skill.
To see how neural networks tackle a problem like this one, we’ll watch a series of videos. Open each toggle to watch the video and review the reflection questions to assess your understanding.

Part 1: Neural Networks

Source
💭
Think About It
  • What makes recognizing handwritten digits so challenging from a computer’s perspective?
  • Describe what the digit recognition program takes as input and what it produces as output.

Part 2: The Structure

Source
💭
Think About It
  • What is a neuron in the context of neural networks?
  • What range of values can a neuron hold, and what does this value represent?
  • Why does this network have exactly 784 input neurons and 10 output neurons? What determines these specific numbers?

Part 3: Weights & Biases

Source
💭
Think About It
  • What do the weights between neurons represent?
  • What is the purpose of bias in neural networks?
  • What does “learning” mean in the context of neural networks?

Part 4: Matrix Multiplication

Source
💭
Think About It
  • How is a neural network’s layered approach similar to how humans solve complex problems step-by-step?
  • Explain how matrix multiplication can compute all neuron activations in a layer simultaneously. What are the dimensions of the weight matrix W when going from 784 input neurons to 16 hidden neurons?
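To connect this to the video's numbers: going from 784 input neurons to 16 hidden neurons means a weight matrix W with 16 rows and 784 columns, so a single matrix multiplication computes all 16 weighted sums at once. A sketch with random stand-in values, just to show the shapes:

```python
import numpy as np

rng = np.random.default_rng(42)

x = rng.random(784)             # activations of the 784 input neurons (one per pixel)
W = rng.normal(size=(16, 784))  # one row of weights per hidden neuron
b = rng.normal(size=16)         # one bias per hidden neuron

z = W @ x + b                   # weighted sums for all 16 hidden neurons at once
a = 1 / (1 + np.exp(-z))        # sigmoid turns each sum into an activation

print(a.shape)  # (16,) -- the 16 hidden-layer activations
```

Counting just this layer's parameters gives 16 × 784 weights plus 16 biases, or 12,560 numbers; the later layers contribute the rest of the roughly 13,000 parameters the video mentions.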

Part 5: Recap

Source
💭
Think About It
  • How is a neural network "just a function"? What makes this function "absurdly complicated" compared to typical mathematical functions?
  • If this “simple” network has 13,000 parameters, what does that say about real-world models?

Making the Embedding Connection

In the video, you saw how neural networks learn to recognize handwritten digits by finding complex patterns. But they can tackle much more challenging tasks, like understanding the actual meaning behind text! In an upcoming lesson, you’ll use the all-MiniLM-L6-v2 model to create vector embeddings to represent text. This model is a neural network that has been trained on massive amounts of text data and learned to create 384-dimensional vectors for any input. When you input text, the neural network processes it through 6 layers (the "L6" part), with each layer building a more sophisticated understanding. The final output is a vector with 384 numbers, each representing a different aspect of your text's meaning!
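To picture that shape, here's a very rough, hypothetical sketch: six random matrices stand in for the model's six trained layers (the real all-MiniLM-L6-v2 internals are far more sophisticated), and an input vector is transformed layer by layer into a final 384-dimensional embedding:

```python
import numpy as np

rng = np.random.default_rng(7)
DIM = 384  # the model's embedding size

# Stand-ins for the 6 trained layers: random matrices, for illustration only.
layers = [rng.normal(scale=0.05, size=(DIM, DIM)) for _ in range(6)]

v = rng.random(DIM)      # stand-in for the numerically encoded input text
for W in layers:
    v = np.tanh(W @ v)   # each layer refines the representation

print(v.shape)  # (384,) -- one number per aspect of the text's meaning
```

The real model's layers have learned weights, so its 384 output numbers genuinely capture meaning; this sketch only shows the data flow.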

Types of Neural Networks & Large Language Models

In the video, you saw the basic structure of a simple neural network, but there are actually many different types of neural networks, each suited to a different type of task. These variants of neural networks include convolutional neural networks (CNNs), recurrent neural networks (RNNs), and even transformer networks like the ones used for text generation in models like BERT and GPT.
As you can imagine, neural networks for large language models introduce many more layers of complexity. For example, the input layer of a large language model (LLM) typically contains thousands of neurons. As we saw in the first section of this lesson, word embeddings often have high dimensionality, which directly corresponds to the number of neurons in the first layer.
In addition to an increased number of neurons, LLM neural networks also typically have many hidden layers. For example, ChatGPT’s GPT-3.5 model has 96 hidden layers! These kinds of neural networks with multiple hidden layers are called deep neural networks due to their multilayered architecture and fall under the deep learning category of machine learning. While not all neural networks are deep learning models, all deep learning models are neural networks!
📝
Practice | Neural Networks
Create a 4–6 slide presentation in Canva or Google Slides that explains one of the following concepts to someone who has never heard of it:
  • What a neuron is and what it does
  • How data flows through the input, hidden, and output layers
  • What weights and biases do and why they matter
  • What it means for a neural network to "learn"
  • Why neural networks need so many layers to recognize complex patterns
Your presentation must include:
  • At least one diagram, visual, or example
  • An analogy that makes your concept click and where it breaks down
  • Something you could screen-share and walk someone through
Before you start building, write down:
  1. Which concept did you choose?
  2. What's your analogy or example?
  3. Who is your audience? What can you assume they already know?
Not feeling the presentation format? If you have a strong idea for a different format, go for it! Your alternative must still explain one concept from the list above, include an analogy and its limitations, and be something you could share with the group. A few formats that have worked well: a short comic strip, an annotated diagram, a TikTok-style video, or a written story where each character represents part of the network.
🤖 AI Connection
Once you've chosen your concept and drafted your analogy, test it out on an AI tool: "I'm explaining [CONCEPT] to someone who has never heard of neural networks. I'm using this analogy: [YOUR ANALOGY]. Where does this analogy work well, and where does it break down or become misleading?" Use the feedback to strengthen your analogy before you start building your presentation. Remember, the AI's critique is just one perspective. You might disagree with some of its points, and that's fine!

💼 Takeaways

In this lesson, you learned how neural networks process raw data through layers of neurons to ultimately learn and make predictions about complex inputs like text!
  • Deep learning models translate complex data—like text—into numerical vectors (embeddings) that capture important patterns and meaning.
  • A neural network is organized into layers: the input layer receives raw data, hidden layers process and transform this data into more abstract features, and the output layer delivers the final prediction or result.
  • Each neuron is a fundamental unit that holds a number (its activation). Think of a neuron as a tiny processing unit that takes in values, performs a calculation, and then outputs a result.
    • Weights determine how strongly each input influences a neuron.
    • Biases adjust the weighted sum before it's passed on, shifting the neuron's response.

🌱 Extension Resources:

Neural Network in 5 minutes (Timestamps 0:00 - 4:21)
For a summary of this lesson, check out the 9. Neural Networks One-Pager!