import holoviews as hv
import hvplot.pandas
Climate Coding Challenge
Climate change is impacting the way people live around the world
- Analyze temperature data over time
- Parse date information so that it is represented as a datetime type
- Use operators to convert to different units
- Resample time-series data to different frequencies
Part 1: Overview
Higher highs, lower lows, storms, and smoke – we’re all feeling the effects of climate change. In this workflow, you will take a look at trends in temperature over time in Boulder, CO.
What the fork?! Who wrote this?
Below is a scientific Python workflow. But something’s wrong – the code won’t run! Your task is to follow the instructions to clean and debug the Python code so that it runs.
Don’t worry if you can’t solve every bug right away. We’ll get there! If you are working on one bug for more than about 10 minutes, it’s time to ask for help.
At the end, you’ll repeat the workflow for a location and measurement of your choosing.
Alright! Let’s clean up this code.
Part 2: Wrangle your data
Python packages let you use code written by experts around the world
Because Python is open source, lots of different people and organizations can contribute (including you!). Many contributions are in the form of packages which do not come with a standard Python download.
Learn more about using Python packages. How do you find and use packages? What is the difference between installing and importing packages? When do you need to do each one? This article on Python packages will walk you through the basics.
In the cell below, someone was trying to import the pandas package, which helps us to work with tabular data such as comma-separated value or csv files.
- Correct the typo below to properly import the pandas package under its alias pd.
- Run the cell to import pandas
# Import pandas
import pandsa as pd
See our solution!
# Import pandas
import pandas as pd
Download the practice data
Next, let’s download some climate data from Boulder, CO to practice with. We keep our practice data on GitHub, so that we can check that it still works and make sure it looks just like the data you would download from the original source.
Do you want to download your own climate data from a place of your choosing? We think the sample data we’ve provided is helpful for learning, but hopefully you have some other places and times you want data from. Learn how to modify your NCEI data download in our NCEI Data Library entry.
The cell below contains the URL for the data you will use in this part of the notebook. There are two things to notice about the URL code:
- It is surrounded by quotes – that means Python will interpret it as a string, or text, type, which makes sense for a URL.
- The URL is too long to display as one line on most screens. We’ve put parentheses around it so that we can easily split it into multiple lines by writing two strings – one on each line.
However, we still have a problem - we can’t get the URL back later on because it isn’t saved in a variable. In other words, we need to give the URL a name so that we can request it from Python later (sadly, Python has no ‘hey what was that thingy I typed yesterday?’ function).
One of the most common challenges for new programmers is making sure that your results are stored so you can use them again. In Python, this is called naming, or saving a variable. Learn more in this hands-on activity on using variables from our learning portal.
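As a minimal sketch of both ideas, the cell below builds one string from two pieces inside parentheses and saves it under a name. The address and the name example_url are placeholders for illustration, not the data URL used in this notebook.

```python
# Two strings written back to back inside parentheses are joined into one
example_url = (
    'https://example.com/data'
    '/files/sample.csv'
)

# Because the string has a name, we can ask for it again later
print(example_url)
print(type(example_url))
```

Typing the name on its own as the last line of a notebook cell displays the value in the same way that print() does here.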
- Pick an expressive variable name for the URL.
- Click on the Jupyter tab in the console panel at the bottom of VSCode to see all your variables. Your new url variable will not be there until you define it and run the code.
- At the end of the cell where you define your url variable, call your variable (type out its name) so it can be tested.
('https://github.com/cu-esiil-edu/esiil-learning-portal'
'/releases/download/data-release/climate-foundations-data.csv'
)
See our solution!
ncei_url = (
    'https://github.com/cu-esiil-edu/esiil-learning-portal'
    '/releases/download/data-release/climate-foundations-data.csv'
)
ncei_url
'https://github.com/cu-esiil-edu/esiil-learning-portal/releases/download/data-release/climate-foundations-data.csv'
The pandas library you imported can download data from the internet directly into a type of Python object called a DataFrame. In the code cell below, you can see an attempt to do just this. But there are some problems…
- Leave a space between the # and text in the comment and try making the comment more informative
- Make any changes needed to get this code to run. HINT: The my_url variable doesn’t exist - you need to replace it with the variable name you chose.
- Modify the .read_csv() statement to include the following parameters:
  - index_col='DATE' – this sets the DATE column as the index. Needed for subsetting and resampling later on
  - parse_dates=True – this lets python know that you are working with time-series data, and values in the indexed column are date time objects
  - na_values=['NaN'] – this lets python know how to handle missing values
- Clean up the code by using expressive variable names, expressive column names, PEP-8 compliant code, and descriptive comments
Make sure to call your DataFrame by typing its name as the last line of your code cell. Then, you will be able to run the test cell below and find out if your answer is correct.
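To see what these three parameters do, here is a small sketch that reads a two-row table from text held in memory instead of from a URL. The rows are copied from the Boulder station table shown in this notebook; the name sample_csv and the order of the columns in the text are assumptions made for illustration.

```python
import io

import pandas as pd

# A small stand-in for a downloaded file (column order is assumed)
sample_csv = io.StringIO(
    'STATION,DATE,PRCP,TOBS\n'
    'USC00050848,1893-10-01,0.94,NaN\n'
    'USC00050848,2023-09-26,0.00,74.0\n'
)

sample_df = pd.read_csv(
    sample_csv,
    index_col='DATE',
    parse_dates=True,
    na_values=['NaN'])

# The index now holds dates rather than text
print(type(sample_df.index))
# The 'NaN' text was read as a missing value
print(sample_df['TOBS'].isna())
```

The same call works when the first argument is a URL string rather than an in-memory file.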
climate_df = pd.read_csv(
    my_url,
    index_col='something')
climate_df
See our solution!
# Download the climate data
climate_df = pd.read_csv(
    ncei_url,
    index_col='DATE',
    parse_dates=True,
    na_values=['NaN'])
climate_df
DATE | STATION | PRCP | TOBS |
---|---|---|---|
1893-10-01 | USC00050848 | 0.94 | NaN |
1893-10-02 | USC00050848 | 0.00 | NaN |
1893-10-03 | USC00050848 | 0.00 | NaN |
1893-10-04 | USC00050848 | 0.04 | NaN |
1893-10-05 | USC00050848 | 0.00 | NaN |
... | ... | ... | ... |
2023-09-26 | USC00050848 | 0.00 | 74.0 |
2023-09-27 | USC00050848 | 0.00 | 69.0 |
2023-09-28 | USC00050848 | 0.00 | 73.0 |
2023-09-29 | USC00050848 | 0.00 | 66.0 |
2023-09-30 | USC00050848 | 0.00 | 78.0 |
45971 rows × 3 columns
HINT: Check out the type() function below - you can use it to check that your data is now in a DataFrame type object
# Check that the data was imported into a pandas DataFrame
type(climate_df)
Clean up your DataFrame
You can use double brackets ([[ and ]]) to select only the columns that you want from your DataFrame:
- Change some_column_name to the Precipitation column name and another_column_name to the Observed Temperature column name.
Column names are text values, not variable names, so you need to put them in quotes!
Make sure to call your DataFrame by typing its name as the last line of your code cell. Then, you will be able to run the test cell below and find out if your answer is correct.
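If the double brackets look odd, this sketch may help: the outer pair asks the DataFrame for something, and the inner pair is a list of column names. The table and its column names here are made up for illustration.

```python
import pandas as pd

# A tiny made-up table; the column names are placeholders
pets_df = pd.DataFrame({
    'name': ['Rex', 'Tom'],
    'species': ['dog', 'cat'],
    'age_years': [3, 5],
})

# Double brackets: a list of column names gives back a DataFrame
subset_df = pets_df[['name', 'age_years']]
# Single brackets: one column name gives back a Series
age_series = pets_df['age_years']

print(type(subset_df))
print(type(age_series))
```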
climate_df = climate_df[['some_column_name', 'another_column_name']]
climate_df
See our solution!
# Clean up the DataFrame
climate_df = climate_df[['PRCP', 'TOBS']]
climate_df
DATE | PRCP | TOBS |
---|---|---|
1893-10-01 | 0.94 | NaN |
1893-10-02 | 0.00 | NaN |
1893-10-03 | 0.00 | NaN |
1893-10-04 | 0.04 | NaN |
1893-10-05 | 0.00 | NaN |
... | ... | ... |
2023-09-26 | 0.00 | 74.0 |
2023-09-27 | 0.00 | 69.0 |
2023-09-28 | 0.00 | 73.0 |
2023-09-29 | 0.00 | 66.0 |
2023-09-30 | 0.00 | 78.0 |
45971 rows × 2 columns
Part 3: Convert units
It’s important to keep track of the units of all your data. You don’t want to be like the NASA team who crashed a probe into Mars because different teams used different units!
Use labels to keep track of units for you and your collaborators
One way to keep track of your data’s units is to include the unit in data labels. In the case of a DataFrame, that usually means the column names.
A big part of writing expressive code is descriptive labels. Let’s rename the columns of your dataframe to include units. Complete the following steps:
- Replace dataframe with the name of your DataFrame, and dataframe_units with an expressive new name.
- Check out the documentation for GHCNd data. We downloaded data with “standard” units; find out what that means for both temperature and precipitation.
- Replace 'TOBS_UNIT' and 'PRCP_UNIT' with column names that reference the correct unit for each.
dataframe_units = dataframe.rename(columns={
    'TOBS': 'TOBS_UNIT',
    'PRCP': 'PRCP_UNIT'
})
dataframe
See our solution!
climate_u_df = climate_df.rename(columns={
    'TOBS': 'temp_f',
    'PRCP': 'precip_in'
})
climate_u_df
DATE | precip_in | temp_f |
---|---|---|
1893-10-01 | 0.94 | NaN |
1893-10-02 | 0.00 | NaN |
1893-10-03 | 0.00 | NaN |
1893-10-04 | 0.04 | NaN |
1893-10-05 | 0.00 | NaN |
... | ... | ... |
2023-09-26 | 0.00 | 74.0 |
2023-09-27 | 0.00 | 69.0 |
2023-09-28 | 0.00 | 73.0 |
2023-09-29 | 0.00 | 66.0 |
2023-09-30 | 0.00 | 78.0 |
45971 rows × 2 columns
For scientific applications, it is often useful to have values in metric units
The code below attempts to convert the data to Celsius, using Python mathematical operators, like +, -, *, and /. Mathematical operators in Python work just like a calculator, and that includes using parentheses to designate the order of operations. The equation for converting Fahrenheit temperature to Celsius is:
\[ T_C = (T_F - 32) * \frac{5}{9} \]
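As a quick illustration of order of operations, using plain numbers rather than the temperature data, the sketch below evaluates the same expression with and without parentheses. The variable names are placeholders.

```python
# Multiplication happens before addition unless parentheses say otherwise
without_parentheses = 2 + 3 * 4
with_parentheses = (2 + 3) * 4

print(without_parentheses)  # 14
print(with_parentheses)     # 20
```

Keep this in mind when you compare the code below to the equation.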
This code is not well documented and doesn’t follow PEP-8 guidelines, which has caused the author to miss an important error!
Complete the following steps:
- Replace dataframe with the name of your DataFrame.
- Replace 'old_temperature' with the column name you used; replace 'new_temperature' with an expressive column name.
- THERE IS AN ERROR IN THE CONVERSION MATH - Fix it!
dataframe_units['new_temperature']= dataframe_units['old_temperature']-32*5/9
dataframe_units
See our solution!
climate_u_df['temp_c'] = (climate_u_df['temp_f'] - 32) * 5 / 9

climate_u_df
DATE | precip_in | temp_f | temp_c |
---|---|---|---|
1893-10-01 | 0.94 | NaN | NaN |
1893-10-02 | 0.00 | NaN | NaN |
1893-10-03 | 0.00 | NaN | NaN |
1893-10-04 | 0.04 | NaN | NaN |
1893-10-05 | 0.00 | NaN | NaN |
... | ... | ... | ... |
2023-09-26 | 0.00 | 74.0 | 23.333333 |
2023-09-27 | 0.00 | 69.0 | 20.555556 |
2023-09-28 | 0.00 | 73.0 | 22.777778 |
2023-09-29 | 0.00 | 66.0 | 18.888889 |
2023-09-30 | 0.00 | 78.0 | 25.555556 |
45971 rows × 3 columns
Using the code below as a framework, write and apply a function that converts to Celsius. You should also rewrite this function name to be more expressive.
def convert(temperature):
    """Convert temperature to Celsius"""
    return temperature # Put your equation in here

dataframe['TOBS_C'] = dataframe['TOBS'].apply(convert)
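For a worked example of the same pattern with a different unit, here is a sketch that converts precipitation from inches to millimeters (1 inch is 25.4 mm). The function name, DataFrame name, and column names are placeholders; the two values come from the precipitation column shown earlier.

```python
import pandas as pd


def inches_to_millimeters(length_in):
    """Convert a length in inches to millimeters."""
    return length_in * 25.4


# Two precipitation values from the table above, in inches
precip_df = pd.DataFrame({'precip_in': [0.94, 0.04]})

# .apply() runs the function once for each value in the column
precip_df['precip_mm'] = precip_df['precip_in'].apply(inches_to_millimeters)
# Arithmetic on the whole column gives the same result in one step
precip_df['precip_mm_direct'] = precip_df['precip_in'] * 25.4
print(precip_df)
```

A named function is easier to reuse and to document, while arithmetic on the whole column is usually shorter; either approach is fine here.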
Part 4: Plot your results
You’ll also need some libraries later on: hvplot, an extension to pandas that will allow you to easily make beautiful, interactive plots, and holoviews, a related library that will let you save your plots.
Plot the precipitation column (PRCP) vs time to explore the data
Plotting in Python is easy, but not quite this easy:
climate_df.plot()
Looks like we have both precipitation and temperature on the same plot, and it’s hard to see what it is because it’s missing labels!
Make sure each plot has:
- A title that explains where and when the data are from
- x- and y- axis labels with units where appropriate
- A legend where appropriate
When plotting in Python, you’ll always need to add some instructions on labels and how you want your plot to look.
- Change dataframe to your DataFrame name.
- Change y= to the name of your observed temperature column name.
- Use the title, ylabel, and xlabel parameters to add key text to your plot.
- Adjust the size of your figure using figsize=(x,y) where x is figure width and y is figure height
HINT: labels have to be a type in Python called a string. You can make a string by putting quotes around your label, just like the column names in the sample code (eg y='TOBS').
# Plot the data using .plot
climate_df.plot(
    y='the_precipitation_column',
    title='Title Goes Here',
    xlabel='Horizontal Axis Label Goes Here',
    ylabel='Vertical Axis Label Goes Here')
See our solution!
# Plot the data using .plot
climate_df.plot(
    y='TOBS',
    title='Daily Temperature in Boulder, CO',
    xlabel='Date',
    ylabel='Temperature ($^\circ$F)')
There are many other things you can do to customize your plot. Take a look at the pandas plotting galleries and the documentation of plot to see if there are other changes you want to make to your plot. Some possibilities include:
- Remove the legend since there’s only one data series
- Increase the figure size
- Increase the font size
- Change the colors
- Use a bar graph instead (usually we use lines for time series, but since this is annual it could go either way)
- Add a trend line
Not sure how to do any of these? Try searching the internet, or asking an AI!
Clean up time series plots by resampling
You may notice that your plot looks a little “fuzzy”. This happens when Python is trying to plot a value for every date, but the resolution of the image is too low to actually do that. You can address this issue by resampling the data, or summarizing it over a time period of your choice. In this case, we will resample annually, giving us one data point per year.
- Set the frequency of your final data by replacing DT_OFFSET with a Datetime Offset Code. Check out the table in the pandas datetime documentation to find the one you want (we recommend the start of the year).
- Choose how to summarize each year of data by replacing agg_method_here with a method that will calculate the average annual value. Check out the pandas resampling documentation for a list of common built-in options.
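To make the idea of resampling concrete, here is a toy sketch with four made-up readings spread over two years. The DataFrame names, column name, and values are placeholders for illustration, not the Boulder data.

```python
import pandas as pd

# Four made-up readings spread over two years (placeholder values)
toy_df = pd.DataFrame(
    {'temp_f': [30.0, 80.0, 40.0, 90.0]},
    index=pd.to_datetime(
        ['2022-01-15', '2022-07-15', '2023-01-15', '2023-07-15']))

# 'YS' groups rows by year and labels each group with January 1;
# .mean() then averages the values inside each group
toy_annual_df = toy_df.resample('YS').mean()
print(toy_annual_df)
```

The result has one row per year: 55.0 for 2022 and 65.0 for 2023. Resampling only works like this when the index holds dates, which is why the date column was parsed when the data was loaded.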
ann_climate_df = climate_df.resample('DT_OFFSET').agg_method_here()
ann_climate_df
See our solution!
ann_climate_df = climate_df.resample('YS').mean()
# Store for later
%store ann_climate_df
ann_climate_df
Stored 'ann_climate_df' (DataFrame)
DATE | PRCP | TOBS |
---|---|---|
1893-01-01 | 0.025543 | NaN |
1894-01-01 | 0.058841 | NaN |
1895-01-01 | 0.117090 | NaN |
1896-01-01 | NaN | NaN |
1897-01-01 | 0.068922 | NaN |
... | ... | ... |
2019-01-01 | 0.057644 | 54.426997 |
2020-01-01 | 0.046721 | 57.691460 |
2021-01-01 | 0.056658 | 57.538462 |
2022-01-01 | 0.051479 | 56.139726 |
2023-01-01 | 0.076740 | 58.996337 |
131 rows × 2 columns
- Try plotting your new DataFrame in the cell below. Can you see what is going on more clearly now? Don’t forget to adjust your labels!
# Plot the annual data
See our solution!
# Plot the annual data using .plot
ann_climate_df.plot(
    y='TOBS',
    title='Annual Average Temperature in Boulder, CO',
    xlabel='Year',
    ylabel='Temperature ($^\circ$F)'
)
Create a new Markdown cell below this one.
In the new cell, answer the following questions using a bulleted list in Markdown – what are 2 things you notice about this data? What physical phenomena or data anomaly could be causing each one?
Check specific values with an interactive plot
You can use the .hvplot() method with similar arguments to create an interactive plot.
- Copy your plotting code into the cell below.
- Replace .plot in your code with .hvplot
Now, you should be able to hover over data points and see their values!
# Plot the annual data interactively
See our solution!
# Plot the annual data using .hvplot
ann_climate_plot = ann_climate_df.hvplot(
    y='TOBS',
    title='Annual Average Temperature in Boulder, CO',
    xlabel='Year',
    ylabel='Temperature (deg. F)'
)
ann_climate_plot
Create a new Markdown cell below this one.
Hover over the lowest point on your plot. What is the overall minimum annual average temperature?
BONUS: Save your work
You will need to save your analyses and plots to tell others about what you find.
Just like with any other type of object in Python, if you want to reuse your work, you need to give it a name.
- Go back to your hvplot code, and give your plot a name by assigning it to a variable. HINT: if you still want your plot to display in your notebook, make sure to call its name at the end of the cell.
- Replace my_plot with the name you gave to your plot.
- Replace 'my_plot.html' with the name you want for your plot. If you change the file extension, .html, to .png, you will get an image instead of an interactive webpage, provided you have the necessary libraries installed.
Once you run the code, you should see your saved plot in your files – go ahead and open it up.
You may need to right-click on your file and download it to be able to view it.
hv.save(my_plot, 'my_plot.html')
See our solution!
hv.save(ann_climate_plot, 'annual_climate.html')
The following cell contains package imports that you will need to calculate and plot an OLS Linear trend line. Make sure to run the cell before moving on, and if you have any additional packages you would like to use, add them here later on.
# Advanced options on matplotlib/seaborn/pandas plots
import matplotlib.pyplot as plt
# Common statistical plots for tabular data
import seaborn as sns
# Fit an OLS linear regression
from sklearn.linear_model import LinearRegression
- To get sample code, ask ChatGPT how to fit a linear model to your data. If you’re new to using large language models, go ahead and check out our query
- Copy code that uses the scikit-learn package to perform an OLS linear regression to the code cell below.
- Check out your previous plot. Does it make sense to include all the data when calculating a trend line? Be sure to select out data that meets the OLS assumptions.
We know that some computers, networks, and countries block LLM (large language model) sites, and that LLMs can sometimes perpetuate oppressive or offensive language and ideas. However, LLMs are increasingly standard tools for programming – according to GitHub many developers code 55% faster with LLM assistance. We also see in our classes that LLMs give students the ability to work on complex real-world problems earlier on. We feel it’s worth the trade-off, and at this point we would be doing you a disservice professionally to teach you to code without LLMs. If you can’t access them, don’t worry – we’ll present a variety of options for finding example code. For example, you can also search for an example on a site like StackOverflow (this is how we all learned to code, and with the right question it’s a fantastic resource for any coder to get access to up-to-date information from world experts quickly). You can also use our solutions as a starting point.
# Fit an OLS Linear Regression to the data
See our solution!
ann_climate_df = ann_climate_df.loc['1989':'2024']

# Drop no data values
observations = ann_climate_df.TOBS.dropna()

# Define the dependent variable and independent variable(s)
features = observations.index.year.values.reshape(-1, 1)
response = observations

# Create a Linear Regression model
model = LinearRegression()

# Fit the model on the training data
model.fit(features, response)

# Calculate and print the metrics
print(f'Slope: {model.coef_[0]} degrees per year')
Slope: 0.13079071315632046 degrees per year
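If you are curious what that slope number means, the sketch below works out an OLS slope by hand with NumPy instead of scikit-learn. For a single feature, this formula gives the slope of the same least-squares line. The years and temperatures are made up so that they lie exactly on a line rising 0.5 degrees per year; they are not the Boulder data, and all names are placeholders.

```python
import numpy as np

# Made-up points that lie exactly on a line rising 0.5 degrees per year
years = np.array([2000, 2001, 2002, 2003, 2004])
temps_f = np.array([50.0, 50.5, 51.0, 51.5, 52.0])

# OLS slope: how years and temperatures vary together,
# divided by how much the years vary on their own
year_offsets = years - years.mean()
temp_offsets = temps_f - temps_f.mean()
slope = (year_offsets * temp_offsets).sum() / (year_offsets ** 2).sum()

print(f'Slope: {slope} degrees per year')
```

A positive slope means the values rise over time on average. Its units are the units of your temperature column per year, so it helps to keep the column name and its unit in view when you report it.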
Plot your trend line
Trend lines are often used to help your audience understand and process a time-series plot. In this case, we’ve chosen mean temperature values rather than extremes, so we think OLS is an appropriate model to use to show a trend.
This is a tricky issue. When it comes to a trend line, choosing a model that is technically more appropriate may require much more complex code without resulting in a noticeably different trend line.
We think an OLS trend line is an ok visual tool to indicate the approximate direction and size of a trend. If you are showing standard error, making predictions or inferences based on your model, or calculating probabilities (p-values) based on your model, or making statements about the statistical significance of a trend, we’d suggest reconsidering your choice of model.
- Add values for x (year) and y (temperature) to plot a regression plot. You will have to select out the year from the index values, just like you probably did when fitting your linear model above!
- Label the axes of your plot with the title, xlabel, and ylabel parameters. You can see how to add the degree symbol in the example below. Make sure your labels match what you’re plotting!
# Plot annual average temperature data with a trend line
ax = sns.regplot(
    x=,
    y=,
)
# Set plot labels
ax.set(
    title='',
    xlabel='',
    ylabel='Temperature ($^\circ$F)'
)
# Display the plot without extra text
plt.show()
See our solution!
ax = sns.regplot(
    x=ann_climate_df.index.year,
    y=ann_climate_df.TOBS,
    color='red',
    line_kws={'color': 'black'})
ax.set(
    title='Annual Average Daily Temperature over time in Boulder, CO',
    xlabel='Year',
    ylabel='Temperature ($^\circ$F)'
)
plt.show()
Create a new Markdown cell below this one.
Write a plot headline. Your headline should interpret your plot, unlike a caption which neutrally describes the image.
Is the climate changing? How much? Report the slope of your trend line.
What question do you want to answer with climate data? The options are limitless! To get started, you could think about:
- How is climate change happening in your home town?
- How is climate change different at different latitudes?
- Do heat waves affect urban areas more?