🎯 Learning Goals
- Define and explain the purpose of functions
- Write and use functions with parameters and return values
- Understand the purpose of Python libraries and how they enhance functionality
- Use pandas and numpy to manipulate data and perform mathematical operations
📗 Technical Vocabulary
- Function
- Parameter
- Argument
- Return value
- Library
- numpy
- pandas
- DataFrame
Warm-Up
Imagine this: Every morning, you wake up craving avocado toast. You go to the kitchen and start from scratch—toasting the bread, mashing the avocado, seasoning it, and putting it all together. You do this every single day, step-by-step.
print("1. Toast the bread")
print("2. Mash the avocado")
print("3. Add seasoning")
print("4. Put it all together")
Sounds a little repetitive, right? What if you could automate the process?
In the world of programming, we solve problems like this using functions—reusable blocks of code that perform a task whenever we need them. Instead of rewriting every single step every time we want toast, we can write a function that does it for us. Then, with just one command, we can get our delicious result without repeating ourselves!
Functions
A function is a reusable set of instructions that performs a specific task. Instead of writing the same steps over and over, we define a function once and then call it whenever we need it. Generally, a function:
- Takes input (optional): Some functions need information (like ingredients for a recipe).
- Performs a task: It follows a set of steps to achieve a result.
- Returns an output (optional): Some functions give us back a result (like a finished dish).
- Can be reused: Once created, a function can be called multiple times without rewriting the instructions.
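As a rough preview of how those four traits can look in code, here is a minimal sketch. The function name make_toast and its topping parameter are made up for illustration, and the def syntax it uses is covered in the sections that follow.

```python
def make_toast(topping):
    # Takes input: the topping is supplied by whoever calls the function.
    # Performs a task: builds a short description of the finished toast.
    toast = f"toast with {topping}"
    # Returns an output: hands the description back to the caller.
    return toast


# Can be reused: the same function is called twice with different inputs.
breakfast = make_toast("avocado")
snack = make_toast("peanut butter")

print(breakfast)  # prints: toast with avocado
print(snack)      # prints: toast with peanut butter
```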
Defining a Function
To create a function in Python, use the def keyword, followed by a function name, parentheses (), and a colon. The function name is written in snake_case and describes the function’s job. For example, if we wanted to package up the steps to make avocado toast into a single function, it might look like this:
def make_avo_toast():
    print("1. Toast the bread")
    print("2. Mash the avocado")
    print("3. Add seasoning")
    print("4. Put it all together")
Notice the colon and the indentation! This is required in Python. Everything inside the function must be indented.
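One way to see what the indentation does is to mix indented and unindented lines. In this small sketch (the final print message is invented for illustration), only the indented lines belong to the function, so they wait until the function is called. The unindented line runs as soon as Python reaches it.

```python
def make_avo_toast():
    # Indented lines are part of the function body.
    # They do not run until the function is called.
    print("1. Toast the bread")
    print("2. Mash the avocado")


# This line is not indented, so it is outside the function.
# It runs right away, even though the function has not been called.
print("Function defined, nothing toasted yet!")
```

Running this as written prints only the last message, because nothing has called make_avo_toast() yet.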
Calling a Function
Once a function is defined, you call it to execute the code inside.
# defining the function
def make_avo_toast():
    print("1. Toast the bread")
    print("2. Mash the avocado")
    print("3. Add seasoning")
    print("4. Put it all together")

# calling the function
make_avo_toast()
Use this template to follow along with the Try-It and Practice exercises throughout this lesson!
Try-It | Defining and Calling Functions
- Define a function that prints the steps to make your favorite food.
- Call the function as many times as you want!
Arguments and Parameters
Functions can take parameters, which allow you to pass in values when calling the function.
# defining the function with a parameter
def greet(name):
    print(f"Hello, {name}!")

# calling the function with an argument
greet("Karlie")  # prints "Hello, Karlie!"

# calling the function with a different argument
greet("Beyonce")  # prints "Hello, Beyonce!"
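Parameters are the names listed inside the parentheses of the function definition, such as (name). Arguments are the actual values you pass in when you call the function, such as "Karlie". You can pass different arguments each time.
Return Value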
A function can return a value using the return keyword.
def multiply_numbers(a, b):
    return a * b

result = multiply_numbers(5, 4)
print(result)
The function returns a value instead of just printing it, allowing us to use it later.
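To make the difference between returning and printing concrete, here is a minimal sketch. The second function, print_product, and the variable names are hypothetical and exist only for comparison.

```python
def multiply_numbers(a, b):
    # Hands the result back to the caller so it can be stored or reused.
    return a * b


def print_product(a, b):
    # Only displays the result; with no return statement, it gives back None.
    print(a * b)


area = multiply_numbers(5, 4)
double_area = area * 2        # the returned value can be used in later math

shown = print_product(5, 4)   # prints: 20

print(double_area)            # prints: 40
print(shown)                  # prints: None
```

Because print_product() never returns anything, there is no number left over to work with after the call.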
Try-It | Parameters and Arguments
- Write a function called five_more() that takes one integer as an argument and returns the sum of that number and 5.
- Write a function called celsius_to_fahrenheit() that takes a temperature in Celsius as input and returns the equivalent Fahrenheit temperature using the formula: F = (C × 9/5) + 32.
- Write a function that takes a list of integers as input and returns their sum. Use a for loop to iterate through the list, adding each number to a total sum. Return the final sum. Choose a clear and descriptive name for your function.
- Write a function that takes a string as an argument and “cleans it up” by standardizing the formatting. The function should lowercase the input text, remove unwanted spaces and punctuation (.,!?;:), and return the cleaned-up string after processing.
Python Libraries
Python libraries are collections of pre-written functions and code that we can use instead of writing everything from scratch. Think of them as toolkits that make programming easier by providing ready-made functions for different tasks.
How They Work:
- A library is like a cookbook full of recipes (functions).
- Rather than writing every function yourself, you can import a library and use its functions right away.
- This saves time and helps you avoid errors in repetitive tasks.
Let’s say we want to find the square root of a number. Instead of writing our own square_root function, we can use Python’s built-in math library:
import math  # Import the math library

result = math.sqrt(25)  # Use the sqrt() function
print(result)  # Output: 5.0
Here, math.sqrt() is a function inside the math library. We didn’t have to write the logic ourselves! By using libraries, we reuse existing functions instead of reinventing the wheel. Before using a function from a library, we must first import the library (like we did here with import math).
Just like a good function saves us time in our own code, libraries save us even more time by giving us access to functions written by experts! In this lesson, we’ll explore two of the most popular Python libraries: pandas and numpy.
NumPy
NumPy (Numerical Python) is a library that makes working with numbers, especially large sets of numbers, fast and efficient. It introduces a new data type called an array, which is a collection like a list, but much faster for math operations. Let’s take a look at some of the most useful NumPy functions.
import numpy as np  # import the numpy library

numbers = np.array([1, 2, 3, 4, 5])
print(numbers)  # -> [1 2 3 4 5]
Notice we imported the NumPy library using import numpy and then assigned it an alias with as np. This convention allows access to NumPy tools with a short prefix (np.) and is extremely common.
doubled = numbers * 2  # Multiply every element by 2
print(doubled)  # -> [ 2  4  6  8 10]

square_roots = np.sqrt(numbers)
print(square_roots)  # -> [1.         1.41421356 1.73205081 2.         2.23606798]
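To see what the array is saving us, here is a small side-by-side sketch of the same doubling done with a plain Python list and with a NumPy array. The variable names are chosen for illustration.

```python
import numpy as np

numbers_list = [1, 2, 3, 4, 5]

# With a plain list, we visit each element ourselves.
doubled_list = []
for n in numbers_list:
    doubled_list.append(n * 2)

# Careful: multiplying a list by 2 repeats it instead of doubling each value.
repeated = numbers_list * 2

# With a NumPy array, one expression applies to every element.
numbers = np.array(numbers_list)
doubled = numbers * 2

print(doubled_list)      # prints: [2, 4, 6, 8, 10]
print(repeated)          # prints: [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
print(doubled.tolist())  # prints: [2, 4, 6, 8, 10]
```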
With NumPy arrays, we can quickly perform mathematical operations on every element in the array. In addition to performing an operation on every element, NumPy also includes functions for statistical analysis.
average = np.mean(numbers)
print(average)  # -> 3.0

standard_deviation = np.std(numbers)
print(standard_deviation)  # -> 1.4142135623730951
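These statistics are often combined with array arithmetic. The following is a minimal sketch, not a full recipe, of three such combinations: normalizing values, measuring the distance between two points, and taking a dot product. The points point_a and point_b are invented for illustration.

```python
import numpy as np

numbers = np.array([1, 2, 3, 4, 5])

# Normalize: subtract the mean, then divide by the standard deviation.
normalized = (numbers - np.mean(numbers)) / np.std(numbers)

# Two small example points (invented for illustration).
point_a = np.array([1.0, 2.0, 3.0])
point_b = np.array([4.0, 6.0, 3.0])

# Euclidean distance: square root of the summed squared differences.
distance = np.sqrt(np.sum((point_a - point_b) ** 2))

# Dot product: multiply matching elements, then add the results.
dot_product = np.dot(point_a, point_b)

print(normalized)   # values now have a mean near 0 and a standard deviation near 1
print(distance)     # -> 5.0
print(dot_product)  # -> 25.0
```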
These examples are only scratching the surface of what NumPy can do! Libraries like NumPy are extremely common in Machine Learning for performing data manipulation and math operations. Tools like array arithmetic, np.mean(), and np.sqrt() are commonly used to normalize data, compute distances, and perform matrix operations (e.g., dot products), which are fundamental in many machine learning algorithms.
Try-It | NumPy
- Create a NumPy Array
  - Import the numpy library.
  - Create a one-dimensional array from a Python list of integers.
  - Print the array and its type with the type() function.
- Use NumPy Functions for Statistics: Using your array from the previous exercise, compute the sum, mean, and standard deviation of the array. Print each of the statistics to view the output.
Pandas
Pandas is a powerful library for working with data, especially structured data like tables (think spreadsheets). It introduces two main data structures: Series (a one-dimensional list) and DataFrame (a two-dimensional table).
- Series: A Series is a one-dimensional object that can hold any data type, such as integers, floats, and strings. You can think of a Series object as just one column.
- DataFrame: A DataFrame is a two-dimensional object that can have columns with different types. Different kinds of inputs for a DataFrame include dictionaries, lists, series, and even another DataFrame.
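As a quick sketch of how the two structures relate, the example below builds one of each and then pulls a single column out of the DataFrame. The weather data and variable names are made up for illustration.

```python
import pandas as pd

# A Series: one labeled column of values.
temperatures = pd.Series([68, 72, 75], name="Temperature")

# A DataFrame: several columns, each of which can hold a different type.
weather = pd.DataFrame({
    "City": ["Chicago", "Dallas", "Detroit"],  # strings
    "Temperature": [68, 72, 75],               # integers
    "Raining": [True, False, False],           # booleans
})

# Selecting a single column from a DataFrame gives back a Series.
city_column = weather["City"]

print(isinstance(temperatures, pd.Series))  # prints: True
print(isinstance(weather, pd.DataFrame))    # prints: True
print(isinstance(city_column, pd.Series))   # prints: True
print(weather.shape)                        # prints: (3, 3)
```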
import pandas as pd

data = {
    "Show": ["Stranger Things", "Bridgerton", "Wednesday"],
    "Genre": ["Sci-Fi", "Romance", "Mystery"],
    "Episodes": [42, 32, 16],
}

df = pd.DataFrame(data)  # create a Pandas DataFrame from a Python dictionary
print("DataFrame:\n", df)
# DataFrame:
#                Show    Genre  Episodes
# 0  Stranger Things   Sci-Fi        42
# 1       Bridgerton  Romance        32
# 2        Wednesday  Mystery        16
Similar to using np for numpy, it’s common practice to use pd as an alias for the pandas library. The dictionary called data contained separate lists for each of the fields. By using the DataFrame() constructor function and passing in data as the argument, we created a DataFrame, a two-dimensional table with columns and rows. But you don’t always have to create DataFrames from dictionaries or lists! With pandas, you can also read data from a file.
df = pd.read_csv("sample_data/california_housing_test.csv")
print(df.head())
#    longitude  latitude  housing_median_age  total_rooms  total_bedrooms  \
# 0    -122.05     37.37                27.0       3885.0           661.0
# 1    -118.30     34.26                43.0       1510.0           310.0
# 2    -117.81     33.78                27.0       3589.0           507.0
# 3    -118.36     33.82                28.0         67.0            15.0
# 4    -119.67     36.33                19.0       1241.0           244.0
#
#    population  households  median_income  median_house_value
# 0      1537.0       606.0         6.6085            344700.0
# 1       809.0       277.0         3.5990            176500.0
# 2      1484.0       495.0         5.7934            270500.0
# 3        49.0        11.0         6.1359            330000.0
# 4       850.0       237.0         2.9375             81700.0
Google Colab already has some sample data loaded for you in the sample_data folder. This california_housing_test.csv file holds a subset of data about housing prices in California where each row represents a specific district or block group. Using the read_csv() function, we created a DataFrame from the data in the CSV file! The head() method shows the first 5 rows of the data. We can also see a summary of common statistics with the describe() method.
print(df.describe())
#          longitude    latitude  housing_median_age   total_rooms  \
# count  3000.000000  3000.00000         3000.000000   3000.000000
# mean   -119.589200    35.63539           28.845333   2599.578667
# std       1.994936     2.12967           12.555396   2155.593332
# min    -124.180000    32.56000            1.000000      6.000000
# 25%    -121.810000    33.93000           18.000000   1401.000000
# 50%    -118.485000    34.27000           29.000000   2106.000000
# 75%    -118.020000    37.69000           37.000000   3129.000000
# max    -114.490000    41.92000           52.000000  30450.000000
#
#        total_bedrooms    population  households  median_income  \
# count     3000.000000   3000.000000  3000.00000    3000.000000
# mean       529.950667   1402.798667   489.91200       3.807272
# std        415.654368   1030.543012   365.42271       1.854512
# min          2.000000      5.000000     2.00000       0.499900
# 25%        291.000000    780.000000   273.00000       2.544000
# 50%        437.000000   1155.000000   409.50000       3.487150
# 75%        636.000000   1742.750000   597.25000       4.656475
# max       5419.000000  11935.000000  4930.00000      15.000100
#
#        median_house_value
# count          3000.00000
# mean         205846.27500
# std          113119.68747
# min           22500.00000
# 25%          121200.00000
# 50%          177650.00000
# 75%          263975.00000
# max          500001.00000
We can also filter rows using pandas. We can see that the average of the housing_median_age column is around 29 years (28.845333 years), but there are some districts where the median age is only 1 year! Let’s focus on only the older homes and see how that changes the data.
older_homes = df[df["housing_median_age"] >= 29]
print(older_homes.describe())
#          longitude     latitude  housing_median_age   total_rooms  \
# count  1552.000000  1552.000000         1552.000000   1552.000000
# mean   -119.611211    35.504594           39.016108   1947.558634
# std       1.942280     1.993734            7.021777   1078.403645
# min    -124.180000    32.570000           29.000000     16.000000
# 25%    -121.922500    33.970000           34.000000   1239.000000
# 50%    -118.410000    34.180000           37.000000   1762.500000
# 75%    -118.150000    37.670000           44.000000   2436.250000
# max    -115.560000    41.310000           52.000000  10088.000000
#
#        total_bedrooms   population   households  median_income  \
# count     1552.000000  1552.000000  1552.000000    1552.000000
# mean       417.813789  1161.641753   396.313144       3.583295
# std        237.658434   672.762212   220.361098       1.820155
# min          4.000000     8.000000     3.000000       0.499900
# 25%        266.000000   737.000000   256.000000       2.403100
# 50%        375.500000  1032.500000   358.000000       3.272200
# 75%        519.000000  1439.500000   491.250000       4.281475
# max       2010.000000  6675.000000  1939.000000      15.000100
#
#        median_house_value
# count         1552.000000
# mean        211289.743557
# std         118645.550620
# min          22500.000000
# 25%         125000.000000
# 50%         180050.000000
# 75%         268950.000000
# max         500001.000000
Let’s break down how we did that!
- First, we pulled out the column of median home ages (a Series) from the entire dataset with df["housing_median_age"].
- Then we compared each value in the "housing_median_age" column to the number 29: df["housing_median_age"] >= 29.
- Then we wrapped that comparison inside of df[...] to use boolean indexing, which tells pandas to filter df and keep only the rows where the condition is True.
- Then we assigned the filtered DataFrame to a new variable called older_homes.
- Finally, using the describe() method on older_homes, we can see how the descriptive statistics compare to the overall dataset.
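The same steps can be run one at a time to see the intermediate results. This is a sketch on a tiny stand-in table rather than the real CSV; the four rows and the names homes, ages, and is_older are invented for illustration.

```python
import pandas as pd

# A tiny stand-in for the housing data (values invented for illustration).
homes = pd.DataFrame({
    "housing_median_age": [27.0, 43.0, 19.0, 29.0],
    "total_rooms": [3885.0, 1510.0, 1241.0, 2000.0],
})

# Step 1: pull out the column of ages (a Series).
ages = homes["housing_median_age"]

# Step 2: compare every age to 29, producing one True/False per row.
is_older = ages >= 29

# Step 3: use the True/False values to keep only the matching rows.
older_homes = homes[is_older]

print(is_older.tolist())           # prints: [False, True, False, True]
print(older_homes.index.tolist())  # prints: [1, 3]
print(len(older_homes))            # prints: 2
```

Notice that the kept rows hold on to their original index labels (1 and 3) instead of being renumbered from 0.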
As you can see, the pandas library is extremely useful for loading, exploring, and analyzing structured data. As we continue exploring machine learning, functions such as pd.read_csv(), df.head(), and df.describe() will help you load, inspect, and clean your data. This step is crucial before you feed your data into any machine learning model!
Try-It | Pandas
- Import the pandas library.
- Load the california_housing_test.csv dataset using the read_csv() function.
- Use the head() and describe() methods to explore the dataset.
- Filter the dataset to include only rows where the median house value is at least $200,000.
Practice | Functions & Libraries
- Write a function that accepts a shopper’s subtotal as an argument and returns their total with sales tax. Sales tax in New York is 8.53%. Then, print a statement telling the user their total with sales tax. Remember to round the total to two decimal places. Hint: To find the total with sales tax, multiply the subtotal by 0.0853 and add it to the original subtotal.
- Create a DataFrame named friends from the following dictionary and use the head() and describe() methods to display the first few rows and summary statistics.
friends_dictionary = {
    'Name': ['Karlie', 'Serena', 'Billie', 'Lizzo', 'Demi', 'Olivia'],
    'Age': [29, 40, 20, 33, 29, 22],
    'City': ["Chicago", "Saginaw", "Los Angeles", "Detroit", "Dallas", "NYC"]
}
- Write a function named filter_by_column that accepts a Pandas DataFrame, a column name (string), and a threshold value. The function should return a new DataFrame with only the rows where the specified column's value is less than the threshold. Test your function on the friends DataFrame you created in Exercise 2 to filter friends who are younger than 30 years old.
🤖 AI Connection
You've now used a few pandas methods like head(), describe(), and filtering with brackets. But pandas has hundreds of built-in methods! Ask an AI tool: "I'm learning pandas in Python. What are 3 other useful DataFrame methods for exploring a dataset, and what does each one do? Show me a short example of each." Try running the examples on your friends DataFrame or the California housing dataset. Do they work as described?
Discuss
- How do functions make code more efficient?
- How do libraries like numpy and pandas help us work with data more easily?
💼 Takeaways
- Functions in Python help you write organized, reusable code that can simplify complex tasks and reduce repetition.
- NumPy arrays and the library’s built-in mathematical functions make numerical computations fast and efficient.
- Pandas DataFrames offer easy ways to load, explore, and manipulate data, using methods such as describe() and filtering, which are foundational components of exploring data for machine learning.
For a summary of this lesson, check out the 4. Python: Functions & Libraries One-Pager!