import getpass
import json
import os
import pathlib
from glob import glob
import earthpy.appeears as eaapp
import hvplot.xarray
import rioxarray as rxr
import xarray as xr
The Cameron Peak Fire, Colorado, USA
The Cameron Peak Fire was the largest fire in Colorado history, with 326 square miles burned.
Observing vegetation health from space
We will look at the destruction and recovery of vegetation using NDVI (Normalized Difference Vegetation Index). How does it work? First, we need to learn about spectral reflectance signatures.
Every object reflects some wavelengths of light more or less than others. We can see this with our eyes, since, for example, plants reflect a lot of green in the summer, and then as that green diminishes in the fall they look more yellow or orange. The image below shows spectral signatures for water, soil, and vegetation:
> Image source: SEOS Project
Healthy vegetation reflects a lot of Near-InfraRed (NIR) radiation. Less healthy vegetation reflects a similar amounts of the visible light spectra, but less NIR radiation. We don’t see a huge drop in Green radiation until the plant is very stressed or dead. That means that NIR allows us to get ahead of what we can see with our eyes.
> Image source: Spectral signature literature review by px39n
Different species of plants reflect different spectral signatures, but the pattern of the signatures are similar. NDVI compares the amount of NIR reflectance to the amount of Red reflectance, thus accounting for many of the species differences and isolating the health of the plant. The formula for calculating NDVI is:
\[NDVI = \frac{(NIR - Red)}{(NIR + Red)}\]
Read more about NDVI and other vegetation indices:
See our solution!
import getpass
import json
import os
import pathlib
from glob import glob
import earthpy.appeears as eaapp
import geopandas as gpd
import hvplot.pandas
import hvplot.xarray
import pandas as pd
import rioxarray as rxr
import xarray as xr
We have one more setup task. We’re not going to be able to load all our data directly from the web to Python this time. That means we need to set up a place for it.
= os.path.join(pathlib.Path.home(), 'my-data-folder')
data_dir # Make the data directory
=True) os.makedirs(data_dir, exist_ok
See our solution!
= os.path.join(
data_dir 'earth-analytics', 'data', 'cameron-peak')
pathlib.Path.home(), # Make the data directory
=True) os.makedirs(data_dir, exist_ok
Study Area: Cameron Peak Fire Boundary
Earth Data Science data formats
In Earth Data Science, we get data in three main formats:
Data type | Descriptions | Common file formats | Python type |
---|---|---|---|
Time Series | The same data points (e.g. streamflow) collected multiple times over time | Tabular formats (e.g. .csv, or .xlsx) | pandas DataFrame |
Vector | Points, lines, and areas (with coordinates) | Shapefile (often an archive like a .zip file because a Shapefile is actually a collection of at least 3 files) |
geopandas GeoDataFrame |
Raster | Evenly spaced spatial grid (with coordinates) | GeoTIFF (.tif ), NetCDF (.nc ), HDF (.hdf ) |
rioxarray DataArray |
# Download the Cameron Peak fire boundary
See our solution!
= (
url "https://services3.arcgis.com/T4QMspbfLg3qTGWY/arcgis/rest/services"
"/WFIGS_Interagency_Perimeters/FeatureServer/0/query"
"?where=poly_IncidentName%20%3D%20'CAMERON%20PEAK'"
"&outFields=*&outSR=4326&f=json")
= gpd.read_file(url)
gdf gdf
OBJECTID | poly_SourceOID | poly_IncidentName | poly_FeatureCategory | poly_MapMethod | poly_GISAcres | poly_CreateDate | poly_DateCurrent | poly_PolygonDateTime | poly_IRWINID | ... | attr_Source | attr_IsCpxChild | attr_CpxName | attr_CpxID | attr_SourceGlobalID | GlobalID | Shape__Area | Shape__Length | attr_IncidentComplexityLevel | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 14659 | 10393 | Cameron Peak | Wildfire Final Fire Perimeter | Infrared Image | 208913.3 | NaT | 2023-03-14 15:13:42.810000+00:00 | 2020-10-06 21:30:00+00:00 | {53741A13-D269-4CD5-AF91-02E094B944DA} | ... | FODR | None | None | None | {5F5EC9BA-5F85-495B-8B97-1E0969A1434E} | e8e4ffd4-db84-4d47-bd32-d4b0a8381fff | 0.089957 | 5.541765 | None | MULTIPOLYGON (((-105.88333 40.54598, -105.8837... |
1 rows × 115 columns
= _
ans_gdf = 0
gdf_pts
if isinstance(ans_gdf, gpd.GeoDataFrame):
print('\u2705 Great work! You downloaded and opened a GeoDataFrame')
+=2
gdf_pts else:
print('\u274C Hmm, your answer is not a GeoDataFrame')
print('\u27A1 You earned {} of 2 points for downloading data'.format(gdf_pts))
✅ Great work! You downloaded and opened a GeoDataFrame
➡ You earned 2 of 2 points for downloading data
Site Map
We always want to create a site map when working with geospatial data. This helps us see that we’re looking at the right location, and learn something about the context of the analysis.
# Plot the Cameron Peak Fire boundary
gdf.hvplot(=True,
geo='Cameron Peak Fire, 2020',
title='EsriImagery') tiles
Exploring the AppEEARS API for NASA Earthdata access
Before you get started with the data download today, you will need a free NASA Earthdata account if you don’t have one already!
Over the next four cells, you will download MODIS NDVI data for the study period. MODIS is a multispectral instrument that measures Red and NIR data (and so can be used for NDVI). There are two MODIS sensors on two different platforms: satellites Terra and Aqua.
Since we’re asking for a special download that only covers our study area, we can’t just find a link to the data - we have to negotiate with the data server. We’re doing this using the APPEEARS API (Application Programming Interface). The API makes it possible for you to request data using code. You can use code from the earthpy
library to handle the API request.
It can take some time for Appeears to process your request - anything from a few minutes to a few hours depending on how busy they are. You can check your progress by:
- Going to the Appeears webpage
- Clicking the
Explore
tab - Logging in with your Earthdata account
# Initialize AppeearsDownloader for MODIS NDVI data
= eaapp.AppeearsDownloader(
ndvi_downloader ='cp-ndvi',
download_key=data_dir,
ea_dir='MOD13Q1.061',
product='_250m_16_days_NDVI',
layer="01-01",
start_date="01-31",
end_date=True,
recurring=[2021, 2021],
year_range=gdf
polygon
)# Download the prepared download -- this can take some time!
=True) ndvi_downloader.download_files(cache
# Initialize AppeearsDownloader for MODIS NDVI data
= eaapp.AppeearsDownloader(
ndvi_downloader ='cp-ndvi',
download_key=data_dir,
ea_dir='MOD13Q1.061',
product='_250m_16_days_NDVI',
layer="07-01",
start_date="07-31",
end_date=True,
recurring=[2018, 2023],
year_range=gdf
polygon
)=True) ndvi_downloader.download_files(cache
** Message: 18:00:50.035: Remote error from secret service: org.freedesktop.DBus.Error.ServiceUnknown: The name org.freedesktop.secrets was not provided by any .service files
Putting it together: Working with multi-file raster datasets in Python
Now you need to load all the downloaded files into Python. Let’s start by getting all the file names. You will also need to extract the date from the filename. Check out the lesson on getting information from filenames in the textbook.
# Get a list of NDVI tif file paths
See our solution!
# Get list of NDVI tif file paths
= sorted(glob(os.path.join(data_dir, 'cp-ndvi', '*', '*NDVI*.tif')))
ndvi_paths len(ndvi_paths)
18
Repeating tasks in Python
Now you should have a few dozen files! For each file, you need to:
- Load the file in using the
rioxarray
library - Get the date from the file name
- Add the date as a dimension coordinate
- Give your data variable a name
- Divide by the scale factor of 10000
You don’t want to write out the code for each file! That’s a recipe for copy pasta. Luckily, Python has tools for doing similar tasks repeatedly. In this case, you’ll use one called a for
loop.
There’s some code below that uses a for
loop in what is called an accumulation pattern to process each file. That means that you will save the results of your processing to a list each time you process the files, and then merge all the arrays in the list.
= 1
scale_factor = -1
doy_start = -1 doy_end
See our solution!
= 10000
scale_factor = -19
doy_start = -12 doy_end
= []
ndvi_das for ndvi_path in ndvi_paths:
# Get date from file name
= ndvi_path[doy_start:doy_end]
doy = pd.to_datetime(doy, format='%Y%j')
date
# Open dataset
= rxr.open_rasterio(ndvi_path, masked=True).squeeze()
da
# Add date dimension and clean up metadata
= da.assign_coords({'date': date})
da = da.expand_dims({'date': 1})
da = 'NDVI'
da.name
# Multiple by scale factor
= da / scale_factor
da
# Prepare for concatenation
ndvi_das.append(da)
len(ndvi_das)
18
Next, stack your arrays by date into a time series using the xr.combine_by_coords()
function. You will have to tell it which dimension you want to stack your data in.
See our solution!
= xr.combine_by_coords(ndvi_das, coords=['date'])
ndvi_da ndvi_da
<xarray.Dataset> Size: 4MB Dimensions: (date: 18, y: 153, x: 339) Coordinates: band int64 8B 1 * x (x) float64 3kB -105.9 -105.9 -105.9 ... -105.2 -105.2 -105.2 * y (y) float64 1kB 40.78 40.78 40.77 40.77 ... 40.47 40.46 40.46 spatial_ref int64 8B 0 * date (date) datetime64[ns] 144B 2018-06-26 2018-07-12 ... 2023-07-28 Data variables: NDVI (date, y, x) float32 4MB 0.5672 0.5711 0.6027 ... 0.6327 0.6327
# Compute the difference in NDVI before and after the fire
# Plot the difference
(='', y='', cmap='', geo=True)
ndvi_diff.hvplot(x*
=True, fill_color=None, line_color='black')
gdf.hvplot(geo )
--------------------------------------------------------------------------- NameError Traceback (most recent call last) Cell In[17], line 5 1 # Compute the difference in NDVI before and after the fire 2 3 # Plot the difference 4 ( ----> 5 ndvi_diff.hvplot(x='', y='', cmap='', geo=True) 6 * 7 gdf.hvplot(geo=True, fill_color=None, line_color='black') 8 ) NameError: name 'ndvi_diff' is not defined
See our solution!
= (
ndvi_diff
ndvi_da=slice('2021', '2023'))
.sel(date'date')
.mean(
.NDVI - ndvi_da
=slice('2018', '2020'))
.sel(date'date')
.mean(
.NDVI
)
(='x', y='y', cmap='PiYG', geo=True)
ndvi_diff.hvplot(x*
=True, fill_color=None, line_color='black')
gdf.hvplot(geo )
Is the NDVI lower within the fire boundary after the fire?
You will compute the mean NDVI inside and outside the fire boundary. First, use the code below to get a GeoDataFrame
of the area outside the Reservation. Your task: * Check the variable names - Make sure that the code uses your boundary GeoDataFrame
* How could you test if the geometry was modified correctly? Add some code to take a look at the results.
= (
out_gdf =gdf.envelope)
gpd.GeoDataFrame(geometry='difference')) .overlay(gdf, how
Next, clip your DataArray to the boundaries for both inside and outside the reservation. You will need to replace the GeoDataFrame
name with your own. Check out the lesson on clipping data with the rioxarray
library in the textbook.
# Clip data to both inside and outside the boundary
See our solution!
= ndvi_da.rio.clip(gdf.geometry, from_disk=True)
ndvi_cp_da = ndvi_da.rio.clip(out_gdf.geometry, from_disk=True) ndvi_out_da
Finally, plot annual July means for both inside and outside the Reservation on the same plot.
:::
# Compute mean annual July NDVI
See our solution!
# Compute mean annual July NDVI
= (
jul_ndvi_cp_df
ndvi_cp_da
.groupby(ndvi_cp_da.date.dt.year)
.mean(...)
.NDVI.to_dataframe())= (
jul_ndvi_out_df
ndvi_out_da
.groupby(ndvi_out_da.date.dt.year)
.mean(...)
.NDVI.to_dataframe())
# Plot inside and outside the reservation
= (
jul_ndvi_df 'NDVI']]
jul_ndvi_cp_df[[
.join('NDVI']],
jul_ndvi_out_df[[=' Burned Area', rsuffix=' Unburned Area')
lsuffix
)
jul_ndvi_df.hvplot(='NDVI before and after the Cameron Peak Fire'
title )
Now, take the difference between outside and inside the Reservation and plot that. What do you observe? Don’t forget to write a headline and description of your plot!
# Plot difference inside and outside the reservation
See our solution!
# Plot difference inside and outside the reservation
'difference'] = (
jul_ndvi_df['NDVI Burned Area']
jul_ndvi_df[- jul_ndvi_df['NDVI Unburned Area'])
jul_ndvi_df.difference.hvplot(='Difference between NDVI within and outside the Cameron Peak Fire'
title )
Your turn! Repeat this workflow in a different time and place.
It’s not just fires that affect NDVI! You could look at:
- Recovery after a national disaster, like a wildfire or hurricane
- Drought
- Deforestation
- Irrigation
- Beaver reintroduction