Get your own climate data from Climate Data Online

There are more Earth Observation data online than any one person could ever look at

NASA’s Earth Observing System Data and Information System (EOSDIS) alone manages over 9 PB of data. One petabyte (PB) is roughly 100 times the entire Library of Congress (a good approximation of all the books available in the US). It’s all available to you once you learn how to download what you want.

Here we’re using the NOAA National Centers for Environmental Information (NCEI) Access Data Service application programming interface (API) to request data from their web servers. We will be using data collected as part of the Global Historical Climatology Network daily (GHCNd) program, from NOAA’s Climate Data Online library.

For this example we’re requesting daily summary data in Chicago, IL (station ID USW00094846).

  1. Research the Global Historical Climatology Network - Daily data source.
  2. In the cell below, write a 2-3 sentence description of the data source.
  3. Include a citation of the data (HINT: See the ‘Data Citation’ tab on the GHCNd overview page).

Your description should include:

  • who takes the data
  • where the data were taken
  • what the maximum temperature units are
  • how the data are collected

Access NCEI GHCNd Data from the internet using its API 🖥️ 📡 🖥️

The cell below contains the URL for the data you will use in this part of the notebook. We created this URL by generating what is called an API endpoint using the NCEI API documentation.

What’s an API?

An application programming interface (API) is a way for two or more computer programs or components to communicate with each other. It is a type of software interface, offering a service to other pieces of software (Wikipedia).
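
For a web API like this one, a request is often just a URL: a base address followed by a query string of `name=value` parameters. The sketch below assembles such a URL with Python's standard library. The parameter names and values are copied from the URL used later in this notebook; the helper name `build_query_url` is hypothetical and not part of any NCEI tooling.

```python
from urllib.parse import urlencode


def build_query_url(base_url, params):
    """Join a base URL and a dict of query parameters into one URL."""
    # safe=',' keeps commas readable instead of encoding them as %2C
    return base_url + '?' + urlencode(params, safe=',')


example_url = build_query_url(
    'https://www.ncei.noaa.gov/access/services/data/v1',
    {
        'dataset': 'daily-summaries',
        'dataTypes': 'TOBS,PRCP',
        'stations': 'USC00050848',
        'startDate': '1893-10-01',
        'endDate': '2023-09-30',
        'units': 'standard',
    },
)
print(example_url)
```

Building the URL from a dictionary is one design choice among several; writing the string by hand, as the rest of this notebook does, works just as well for a single request.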

First things first – you will need to import the pandas library to access NCEI data through its URL:

# Import required packages
See our solution!
# Import required packages
import pandas as pd
Try It: Format your URL for readability
  1. Pick an expressive variable name for the URL.
  2. Reformat the URL so that it adheres to the 79-character PEP 8 line limit. You should see two vertical lines in each cell; don’t let your code go past the second line.
  3. At the end of the cell where you define your url variable, call your variable (type out its name) so it can be tested.
stuff23 = ('https://www.ncei.noaa.gov/access/services/da'
'ta/v1?dataset=daily-summaries&dataTypes=TOBS,PRCP&stations=USC00050848&startDate=1893-10-01&endDate=2023-09-30')
stuff23
'https://www.ncei.noaa.gov/access/services/data/v1?dataset=daily-summaries&dataTypes=TOBS,PRCP&stations=USC00050848&startDate=1893-10-01&endDate=2023-09-30'
See our solution!
ncei_url = (
    'https://www.ncei.noaa.gov/access/services/data/v1'
    '?dataset=daily-summaries'
    '&dataTypes=TOBS,PRCP'
    '&stations=USC00050848'
    '&startDate=1893-10-01'
    '&endDate=2023-09-30'
    '&units=standard'
)
ncei_url
'https://www.ncei.noaa.gov/access/services/data/v1?dataset=daily-summaries&dataTypes=TOBS,PRCP&stations=USC00050848&startDate=1893-10-01&endDate=2023-09-30&units=standard'
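
One way to double-check a hand-built URL before requesting data is to split it back into its parameters and read them as a list. The sketch below does this with the standard library; the variable names `check_url` and `query_params` are illustrative, and the URL is a copy of the solution above. A parameter that has lost its name (a bare `&2023-09-30`, for instance) would simply be absent from the result, which makes that kind of slip easy to spot.

```python
from urllib.parse import urlparse, parse_qs

# Copy of the solution URL, repeated here so the cell runs on its own
check_url = (
    'https://www.ncei.noaa.gov/access/services/data/v1'
    '?dataset=daily-summaries'
    '&dataTypes=TOBS,PRCP'
    '&stations=USC00050848'
    '&startDate=1893-10-01'
    '&endDate=2023-09-30'
    '&units=standard'
)

# parse_qs returns a dict mapping each parameter name to a list of values
query_params = parse_qs(urlparse(check_url).query)
for name, values in query_params.items():
    print(name, values)
```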

Download and get started working with NCEI data

Go ahead and use pandas to import data from your API URL into Python. If you didn’t already, you should import the pandas library at the top of this notebook so that others who want to use your code can find it easily.

# Import data into Python from NCEI API
See our solution!
# Download the climate data
climate_df = pd.read_csv(
    ncei_url,
    # index_col='DATE',
    # parse_dates=True,
    # na_values=['NaN']
)

# Check that the download worked
climate_df.head()
       STATION        DATE  PRCP  TOBS
0  USC00050848  1893-10-01  0.94   NaN
1  USC00050848  1893-10-02  0.00   NaN
2  USC00050848  1893-10-03  0.00   NaN
3  USC00050848  1893-10-04  0.04   NaN
4  USC00050848  1893-10-05  0.00   NaN
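
The commented-out arguments in the solution (`index_col`, `parse_dates`, `na_values`) hint at a common next step: using the DATE column as a parsed date index. The sketch below tries those arguments on a small inline copy of the five rows shown above, so it runs without a network connection. The exact CSV layout in `sample_csv` is an assumption for illustration and may not match what the service returns byte for byte.

```python
import io

import pandas as pd

# Assumed CSV text for the first five rows; TOBS is empty (missing)
sample_csv = (
    'STATION,DATE,PRCP,TOBS\n'
    'USC00050848,1893-10-01,0.94,\n'
    'USC00050848,1893-10-02,0.00,\n'
    'USC00050848,1893-10-03,0.00,\n'
    'USC00050848,1893-10-04,0.04,\n'
    'USC00050848,1893-10-05,0.00,\n'
)

sample_df = pd.read_csv(
    io.StringIO(sample_csv),
    index_col='DATE',    # use the date column as the row index
    parse_dates=True,    # convert the index from text to dates
    na_values=['NaN'],   # treat the text 'NaN' as a missing value
)

# The index is now a DatetimeIndex, which enables date-based selection
print(type(sample_df.index).__name__)
print(sample_df.loc['1893-10-04', 'PRCP'])
```

With a date index in place, rows can be selected by date string and later resampled by month or year, which is harder to do while DATE is still plain text.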