The Cameron Peak Fire, Colorado, USA

The Cameron Peak Fire was the largest wildfire in Colorado history, burning 326 square miles (about 209,000 acres).

Observing vegetation health from space

We will look at the destruction and recovery of vegetation using NDVI (Normalized Difference Vegetation Index). How does it work? First, we need to learn about spectral reflectance signatures.

Every object reflects some wavelengths of light more or less than others. We can see this with our eyes, since, for example, plants reflect a lot of green in the summer, and then as that green diminishes in the fall they look more yellow or orange. The image below shows spectral signatures for water, soil, and vegetation:

> Image source: SEOS Project

Healthy vegetation reflects a lot of Near-InfraRed (NIR) radiation. Less healthy vegetation reflects similar amounts of visible light, but less NIR radiation. We don’t see a big drop in green reflectance until the plant is severely stressed or dead. That means that NIR allows us to detect plant stress before we can see it with our eyes.

Healthy leaves reflect a lot of NIR radiation compared to dead or stressed leaves.

> Image source: Spectral signature literature review by px39n

Different plant species have different spectral signatures, but the overall pattern is similar. NDVI compares the amount of NIR reflectance to the amount of Red reflectance, which accounts for many of the differences between species and isolates the health of the plant. The formula for calculating NDVI is:

\[NDVI = \frac{(NIR - Red)}{(NIR + Red)}\]
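To get a feel for the formula, here is a minimal sketch with made-up reflectance values (the numbers are purely illustrative, not measurements):

import numpy as np

# Hypothetical reflectance values (fraction of incoming light reflected)
nir = np.array([0.50, 0.30, 0.15])   # healthy, stressed, dead vegetation
red = np.array([0.08, 0.10, 0.12])

ndvi = (nir - red) / (nir + red)
print(ndvi)  # roughly [0.72, 0.50, 0.11] -- NDVI drops as the vegetation declines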

Read more about NDVI and other vegetation indices:

{{< fa keyboard >}} Import necessary libraries

In the cell below, making sure to keep the packages in order, add packages for:

  • Working with DataFrames
  • Working with GeoDataFrames
  • Making interactive plots of tabular and vector data

{{< fa circle-question >}} What are we using the rest of these packages for? See if you can figure it out as you complete the notebook.

import getpass
import json
import os
import pathlib
from glob import glob

import earthpy.appeears as eaapp
import hvplot.xarray
import rioxarray as rxr
import xarray as xr
See our solution!
import getpass
import json
import os
import pathlib
from glob import glob

import earthpy.appeears as eaapp
import geopandas as gpd
import hvplot.pandas
import hvplot.xarray
import pandas as pd
import rioxarray as rxr
import xarray as xr

We have one more setup task. We’re not going to be able to load all our data directly from the web to Python this time. That means we need to set up a place for it.

GOTCHA ALERT!

A lot of times in Python we say “directory” to mean a “folder” on your computer. The two words mean the same thing in this context.

{{< fa keyboard large >}} Your task

In the cell below, replace ‘my-data-folder’ with a descriptive directory name.

data_dir = os.path.join(pathlib.Path.home(), 'my-data-folder')
# Make the data directory
os.makedirs(data_dir, exist_ok=True)
See our solution!
data_dir = os.path.join(
    pathlib.Path.home(), 'earth-analytics', 'data', 'cameron-peak')
# Make the data directory
os.makedirs(data_dir, exist_ok=True)

Study Area: Cameron Peak Fire Boundary

Earth Data Science data formats

In Earth Data Science, we get data in three main formats:

| Data type | Description | Common file formats | Python type |
|-----------|-------------|---------------------|-------------|
| Time Series | The same data points (e.g. streamflow) collected multiple times over time | Tabular formats (e.g. .csv or .xlsx) | pandas DataFrame |
| Vector | Points, lines, and areas (with coordinates) | Shapefile (often an archive like a .zip file, because a Shapefile is actually a collection of at least 3 files) | geopandas GeoDataFrame |
| Raster | Evenly spaced spatial grid (with coordinates) | GeoTIFF (.tif), NetCDF (.nc), HDF (.hdf) | rioxarray DataArray |
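As a quick illustration of how each format becomes one of those Python types, here is a minimal sketch (the file names are hypothetical placeholders, not files used in this lesson):

import geopandas as gpd
import pandas as pd
import rioxarray as rxr

# Time series (tabular) data -> pandas DataFrame
streamflow_df = pd.read_csv('streamflow.csv')

# Vector data -> geopandas GeoDataFrame (read_file also accepts zipped Shapefiles)
boundary_gdf = gpd.read_file('fire-boundary.zip')

# Raster data -> DataArray opened with rioxarray
elevation_da = rxr.open_rasterio('elevation.tif', masked=True)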
{{< fa glasses large >}} Read more

Check out the sections about vector data and raster data in the textbook.

{{< fa pencil large >}} What do you think?

For this coding challenge, we are interested in the boundary of the Cameron Peak Fire. In the cell below, answer the following question: What data type do you think the fire boundary will be?

{{< fa keyboard large >}} Your task:
  • Search the National Interagency Fire Center Wildfire Boundary catalog for an incident named “Cameron Peak”
  • Copy the API results to your clipboard.
  • Load the data into Python using the geopandas library, e.g. gpd.read_file(url)
  • Call your data at the end of the cell for testing.
# Download the Cameron Peak fire boundary
See our solution!
url = (
    "https://services3.arcgis.com/T4QMspbfLg3qTGWY/arcgis/rest/services"
    "/WFIGS_Interagency_Perimeters/FeatureServer/0/query"
    "?where=poly_IncidentName%20%3D%20'CAMERON%20PEAK'"
    "&outFields=*&outSR=4326&f=json")

gdf = gpd.read_file(url)
gdf
[Output: a GeoDataFrame with 1 row and 115 columns describing the final Cameron Peak Fire perimeter, including poly_IncidentName = Cameron Peak, poly_GISAcres ≈ 208,913, and a MULTIPOLYGON geometry.]

ans_gdf = _
gdf_pts = 0

if isinstance(ans_gdf, gpd.GeoDataFrame):
    print('\u2705 Great work! You downloaded and opened a GeoDataFrame')
    gdf_pts += 2
else:
    print('\u274C Hmm, your answer is not a GeoDataFrame')

print('\u27A1 You earned {} of 2 points for downloading data'.format(gdf_pts))
✅ Great work! You downloaded and opened a GeoDataFrame
➡ You earned 2 of 2 points for downloading data

Site Map

We always want to create a site map when working with geospatial data. This helps us see that we’re looking at the right location, and learn something about the context of the analysis.

{{< fa keyboard large >}} Your task
  1. Plot your Cameron Peak Fire shapefile on an interactive map
  2. Make sure to add a title
  3. Add ESRI World Imagery as the basemap/background using the tiles=... parameter. To get the tiles to match up with your data, you will also need to add the geo=True parameter.
# Plot the Cameron Peak Fire boundary
gdf.hvplot(
    geo=True,
    title='Cameron Peak Fire, 2020',
    tiles='EsriImagery')

Exploring the AppEEARS API for NASA Earthdata access

Before you get started with the data download today, you will need a free NASA Earthdata account if you don’t have one already!

Over the next four cells, you will download MODIS NDVI data for the study period. MODIS is a multispectral instrument that measures Red and NIR reflectance (and so can be used to compute NDVI). There are two MODIS sensors, one on each of two satellites: Terra and Aqua.

Since we’re asking for a special download that only covers our study area, we can’t just find a link to the data - we have to negotiate with the data server. We’re doing this using the AppEEARS API (Application Programming Interface). The API makes it possible for you to request data using code. You can use code from the earthpy library to handle the API request.

{{< fa keyboard large >}} Your task

Often, when we want to do something more complex in code, we find an example and modify it. This download code is already almost a working example. Your task will be:

  1. Replace the start and end dates in the task parameters. Download data from July, when greenery is at its peak in the Northern Hemisphere.
  2. Replace the year range. You should get 3 years before and after the fire so you can see the change!
  3. Replace gdf with the name of your site geodataframe.
  4. Enter your NASA Earthdata username and password when prompted. The prompts can be a little hard to see – look at the top of your screen!

{{< fa circle-question large >}} What would the product and layer name be if you were trying to download Landsat Surface Temperature Analysis Ready Data (ARD) instead of MODIS NDVI?

Important

It can take some time for AppEEARS to process your request - anything from a few minutes to a few hours, depending on how busy they are. You can check your progress by:

  1. Going to the AppEEARS webpage
  2. Clicking the Explore tab
  3. Logging in with your Earthdata account
# Initialize AppeearsDownloader for MODIS NDVI data
ndvi_downloader = eaapp.AppeearsDownloader(
    download_key='cp-ndvi',
    ea_dir=data_dir,
    product='MOD13Q1.061',
    layer='_250m_16_days_NDVI',
    start_date="01-01",
    end_date="01-31",
    recurring=True,
    year_range=[2021, 2021],
    polygon=gdf
)
# Download the prepared download -- this can take some time!
ndvi_downloader.download_files(cache=True)
# Initialize AppeearsDownloader for MODIS NDVI data
ndvi_downloader = eaapp.AppeearsDownloader(
    download_key='cp-ndvi',
    ea_dir=data_dir,
    product='MOD13Q1.061',
    layer='_250m_16_days_NDVI',
    start_date="07-01",
    end_date="07-31",
    recurring=True,
    year_range=[2018, 2023],
    polygon=gdf
)
ndvi_downloader.download_files(cache=True)

Putting it together: Working with multi-file raster datasets in Python

Now you need to load all the downloaded files into Python. Let’s start by getting all the file names. You will also need to extract the date from the filename. Check out the lesson on getting information from filenames in the textbook.
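For example, here is a minimal sketch of the idea with a made-up file name (the real MODIS file names are longer, so the slice positions below will not match yours):

import pandas as pd

# Hypothetical file name that ends with a year + day-of-year code before '.tif'
ndvi_path = 'ndvi_2020185.tif'

# Slice the 7-character date code off the end of the name and parse it
doy = ndvi_path[-11:-4]
date = pd.to_datetime(doy, format='%Y%j')
print(date)  # 2020-07-03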

{{< fa exclamation large >}} GOTCHA ALERT

glob doesn’t necessarily find files in the order you would expect. Make sure to sort your file names like it says in the textbook.

# Get a list of NDVI tif file paths
See our solution!
# Get list of NDVI tif file paths
ndvi_paths = sorted(glob(os.path.join(data_dir, 'cp-ndvi', '*', '*NDVI*.tif')))
len(ndvi_paths)
18

Repeating tasks in Python

Now you should have 18 files! For each file, you need to:

  • Load the file in using the rioxarray library
  • Get the date from the file name
  • Add the date as a dimension coordinate
  • Give your data variable a name
  • Divide by the scale factor of 10000

You don’t want to write out the code for each file! That’s a recipe for copy pasta. Luckily, Python has tools for doing similar tasks repeatedly. In this case, you’ll use one called a for loop.

There’s some code below that uses a for loop in what is called an accumulation pattern to process each file. That means that you will append the result of processing each file to a list, and then merge all the arrays in the list at the end.
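If the accumulation pattern is new to you, here is a minimal, generic sketch of the same idea with plain numbers, before you apply it to rasters:

# Accumulation pattern: start with an empty list, add one result per loop
# iteration, then combine everything at the end
squares = []
for number in range(1, 6):
    squares.append(number ** 2)

print(squares)       # [1, 4, 9, 16, 25]
print(sum(squares))  # 55 -- the final "combine" step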

{{< fa keyboard large >}} Your task
  • Look at the file names. How many characters from the end is the date? doy_start and doy_end are used to extract the day of the year (doy) from the file name. You will need to count characters from the end and change the values to get the right part of the file name. HINT: the index -1 in Python means the last value, -2 second-to-last, and so on.
  • Replace any required variable names with your chosen variable names
  • Change the scale_factor variable to be the correct scale factor for this NDVI dataset (HINT: NDVI should range between 0 and 1 – check your results!)
scale_factor = 1
doy_start = -1
doy_end = -1
See our solution!
scale_factor = 10000
doy_start = -19
doy_end = -12
ndvi_das = []
for ndvi_path in ndvi_paths:
    # Get date from file name
    doy = ndvi_path[doy_start:doy_end]
    date = pd.to_datetime(doy, format='%Y%j')

    # Open dataset
    da = rxr.open_rasterio(ndvi_path, masked=True).squeeze()

    # Add date dimension and clean up metadata
    da = da.assign_coords({'date': date})
    da = da.expand_dims({'date': 1})
    da.name = 'NDVI'

    # Divide by scale factor
    da = da / scale_factor

    # Prepare for concatenation
    ndvi_das.append(da)

len(ndvi_das)
18

Next, stack your arrays by date into a time series using the xr.combine_by_coords() function. You will have to tell it which dimension you want to stack your data in.

See our solution!
ndvi_da = xr.combine_by_coords(ndvi_das, coords=['date'])
ndvi_da
<xarray.Dataset> Size: 4MB
Dimensions:      (date: 18, y: 153, x: 339)
Coordinates:
    band         int64 8B 1
  * x            (x) float64 3kB -105.9 -105.9 -105.9 ... -105.2 -105.2 -105.2
  * y            (y) float64 1kB 40.78 40.78 40.77 40.77 ... 40.47 40.46 40.46
    spatial_ref  int64 8B 0
  * date         (date) datetime64[ns] 144B 2018-06-26 2018-07-12 ... 2023-07-28
Data variables:
    NDVI         (date, y, x) float32 4MB 0.5672 0.5711 0.6027 ... 0.6327 0.6327
{{< fa keyboard large >}} Plot the change in NDVI spatially

Complete the following:

  • Select data from 2021 to 2023 (3 years after the fire)
  • Take the temporal mean (over the date, not spatially)
  • Get the NDVI variable (should be a DataArray, not a Dataset)
  • Repeat for the data from 2018 to 2020 (3 years before the fire)
  • Subtract the 2018-2020 time period from the 2021-2023 time period
  • Plot the result using a diverging color map like cmap=plt.cm.PiYG

There are different types of color maps for different types of data. In this case, we want decreases to be a different color from increases, so we should use a diverging color map. Check out available colormaps in the matplotlib documentation.

{{< fa pepper-hot large >}} For an extra challenge, add the fire boundary to the plot

# Compute the difference in NDVI before and after the fire

# Plot the difference
(
    ndvi_diff.hvplot(x='', y='', cmap='', geo=True)
    *
    gdf.hvplot(geo=True, fill_color=None, line_color='black')
)
See our solution!
ndvi_diff = (
    ndvi_da
        .sel(date=slice('2021', '2023'))
        .mean('date')
        .NDVI 
   - ndvi_da
        .sel(date=slice('2018', '2020'))
        .mean('date')
        .NDVI
)
(
    ndvi_diff.hvplot(x='x', y='y', cmap='PiYG', geo=True)
    *
    gdf.hvplot(geo=True, fill_color=None, line_color='black')
)

Is the NDVI lower within the fire boundary after the fire?

You will compute the mean NDVI inside and outside the fire boundary. First, use the code below to get a GeoDataFrame of the area outside the fire boundary. Your task:

  • Check the variable names - make sure that the code uses your boundary GeoDataFrame
  • How could you test if the geometry was modified correctly? Add some code to take a look at the results.

out_gdf = (
    gpd.GeoDataFrame(geometry=gdf.envelope)
    .overlay(gdf, how='difference'))
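One way to check the geometry (a minimal sketch reusing the variable names above) is to plot the clipped-out area and confirm that the fire perimeter has been removed from it:

# The area outside the fire boundary should look like a rectangle (the
# envelope of the fire) with a fire-perimeter-shaped hole cut out of it
out_gdf.hvplot(
    geo=True,
    tiles='EsriImagery',
    title='Area outside the Cameron Peak Fire boundary')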

Next, clip your DataArray to the areas both inside and outside the fire boundary. You will need to replace the GeoDataFrame name with your own. Check out the lesson on clipping data with the rioxarray library in the textbook.

{{< fa exclamation large >}} GOTCHA ALERT

It’s important to use from_disk=True when clipping large arrays like this. It lets the computer use less memory during the clip - otherwise, you will probably find that the cell below crashes the kernel.

# Clip data to both inside and outside the boundary
See our solution!
ndvi_cp_da = ndvi_da.rio.clip(gdf.geometry, from_disk=True)
ndvi_out_da = ndvi_da.rio.clip(out_gdf.geometry, from_disk=True)
{{< fa keyboard large >}} Your Task

For both inside and outside the fire boundary:

  • Group the data by year
  • Take the mean. You always need to tell reducing methods in xarray what dimensions you want to reduce. When you want to summarize data across all dimensions, you can use the ... syntax, e.g. .mean(...) as a shorthand.
  • Select the NDVI variable
  • Convert to a DataFrame using the to_dataframe() method
  • Join the two DataFrames for plotting using the .join() method. You will need to rename the columns using the lsuffix= and rsuffix= parameters
{{< fa exclamation large >}} GOTCHA ALERT

The DatetimeIndex in pandas is a little different from a datetime dimension in xarray. You will need to use the .dt.year syntax to access information about the year, not just .year.
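For example, a small illustration of the difference (with made-up dates, not the NDVI data):

import pandas as pd
import xarray as xr

dates = pd.to_datetime(['2020-07-03', '2021-07-03'])

# pandas: a DatetimeIndex exposes the year directly
print(dates.year)       # [2020, 2021]

# xarray: a datetime coordinate needs the .dt accessor
da = xr.DataArray([0.55, 0.62], coords={'date': dates}, dims='date')
print(da.date.dt.year)  # DataArray with values [2020, 2021]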

Finally, plot annual July means for both inside and outside the fire boundary on the same plot.


# Compute mean annual July NDVI
See our solution!
# Compute mean annual July NDVI
jul_ndvi_cp_df = (
    ndvi_cp_da
    .groupby(ndvi_cp_da.date.dt.year)
    .mean(...)
    .NDVI.to_dataframe())
jul_ndvi_out_df = (
    ndvi_out_da
    .groupby(ndvi_out_da.date.dt.year)
    .mean(...)
    .NDVI.to_dataframe())

# Plot inside and outside the fire boundary
jul_ndvi_df = (
    jul_ndvi_cp_df[['NDVI']]
    .join(
        jul_ndvi_out_df[['NDVI']], 
        lsuffix=' Burned Area', rsuffix=' Unburned Area')
)

jul_ndvi_df.hvplot(
    title='NDVI before and after the Cameron Peak Fire'
)

Now, take the difference between NDVI inside and outside the fire boundary and plot that. What do you observe? Don’t forget to write a headline and description of your plot!

# Plot difference inside and outside the fire boundary
See our solution!
# Plot difference inside and outside the fire boundary
jul_ndvi_df['difference'] = (
    jul_ndvi_df['NDVI Burned Area']
    - jul_ndvi_df['NDVI Unburned Area'])
jul_ndvi_df.difference.hvplot(
    title='Difference between NDVI within and outside the Cameron Peak Fire'
)

Your turn! Repeat this workflow in a different time and place.

It’s not just fires that affect NDVI! You could look at:

  • Recovery after a natural disaster, like a wildfire or hurricane
  • Drought
  • Deforestation
  • Irrigation
  • Beaver reintroduction