Mapping Tasiyagnunpa (Western Meadowlark) migration

Introduction to vector data operations

Find out which North American ecoregions have the most observations of Western Meadowlark.

Learning Goals:
  • Combine different types of vector data with spatial joins
  • Create a choropleth plot

Tasiyagnunpa (or Western Meadowlark, or Sturnella neglecta) migrates each year to nest on the Great Plains in the United States. Using crowd-sourced observations of these birds, we can see that migration happening throughout the year.

Read more about the Lakota connection to Tasiyagnunpa from Native Sun News Today

Set up your reproducible workflow

Import Python libraries

We will be getting data from a source called GBIF (Global Biodiversity Information Facility). We need a package called pygbif to access the data, which is not included in your environment. Install it by running the cell below:

%%bash
pip install pygbif
Defaulting to user installation because normal site-packages is not writeable
Successfully installed appdirs-1.4.4 attrs-23.2.0 cattrs-23.2.3 contourpy-1.2.1 cycler-0.12.1 exceptiongroup-1.2.1 fonttools-4.53.0 geojson-rewind-1.1.0 geomet-1.1.0 kiwisolver-1.4.5 matplotlib-3.9.0 numpy-1.26.4 pillow-10.3.0 pygbif-0.6.4 python-dateutil-2.9.0.post0 requests-cache-1.2.0 typing-extensions-4.12.1 url-normalize-1.4.3
Your Task: Import packages

Add imports for packages that will help you:

  1. Work with tabular data
  2. Work with geospatial vector data
  3. Make an interactive plot of tabular and/or vector data
import calendar
import os
import pathlib
import requests
import time
import zipfile
from getpass import getpass
from glob import glob

import cartopy.crs as ccrs
import panel as pn
import pygbif.occurrences as occ
import pygbif.species as species
See our solution!
import calendar
import os
import pathlib
import requests
import time
import zipfile
from getpass import getpass
from glob import glob

import cartopy.crs as ccrs
import geopandas as gpd
import hvplot.pandas
import pandas as pd
import panel as pn
import pygbif.occurrences as occ
import pygbif.species as species

Create a folder for your data

For this challenge, you will need to save some data to your computer. We suggest saving it somewhere in your home folder (e.g. /home/username) rather than in your GitHub repository, since data files can easily become too large for GitHub.

Warning

The home directory is different for every user! Your home directory probably won’t exist on someone else’s computer. Make sure to use code like pathlib.Path.home() to compute the home directory on the computer the code is running on. This is key to writing reproducible and interoperable code.
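This pattern can be made concrete with a short sketch (the 'earth-analytics' folder names below match this lesson; everything else is standard library):

```python
import os
import pathlib

# pathlib.Path.home() finds the home directory on whatever
# computer the code runs on -- no hard-coded '/home/username'
home = pathlib.Path.home()

# Build the path from components so separators stay portable, too
data_dir = os.path.join(home, 'earth-analytics', 'data')
print(data_dir)
```

Because the path is computed at run time, the same two lines work on Linux, macOS, and Windows alike.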

Your Task: Create a project folder

The code below will help you get started with making a project directory

  1. Replace 'your-project-directory-name-here' and 'your-gbif-data-directory-name-here' with descriptive names
  2. Run the cell
  3. (OPTIONAL) Check in the terminal that you created the directory using the command ls ~/earth-analytics/data
# Create data directory in the home folder
data_dir = os.path.join(
    # Home directory
    pathlib.Path.home(),
    # Earth analytics data directory
    'earth-analytics',
    'data',
    # Project directory
    'your-project-directory-name-here',
)
os.makedirs(data_dir, exist_ok=True)

# Define the directory name for GBIF data
gbif_dir = os.path.join(data_dir, 'your-gbif-data-directory-name-here')
See our solution!
# Create data directory in the home folder
data_dir = os.path.join(
    pathlib.Path.home(),
    'earth-analytics',
    'data',
    'species-distribution',
)
os.makedirs(data_dir, exist_ok=True)

# Define the directory name for GBIF data
gbif_dir = os.path.join(data_dir, 'meadowlark_observations')

Define your study area – the ecoregions of North America

Track observations of Tasiyagnunpa across the different ecoregions of North America! You should be able to see changes in the number of observations in each ecoregion throughout the year.

Download and save ecoregion boundaries

Your Task
  1. Find the URL for the level III ecoregion boundaries. You can get ecoregion boundaries from the Environmental Protection Agency (EPA).
  2. Replace your/url/here with the URL you found, making sure to format it so it is easily readable.
  3. Change all the variable names to descriptive variable names
  4. Run the cell to download and save the data.
# Set up the ecoregions level III boundary URL
a_url = ("your/url/here")
# Set up a path to save the data on your machine
a_path = os.path.join(data_dir, 'filename.zip')

# Don't download twice
if not os.path.exists(a_path):
    # Download, and don't check the certificate for the EPA
    a_response = requests.get(a_url, verify=False)
    # Save the binary data to a file
    with open(a_path, 'wb') as a_file:
        a_file.write(a_response.content)
MissingSchema: Invalid URL 'your/url/here': No scheme supplied. Perhaps you meant https://your/url/here?
See our solution!
# Set up the ecoregions level III boundary URL
ecoregions_url = (
    "https://gaftp.epa.gov/EPADataCommons/ORD/Ecoregions/cec_na"
    "/NA_CEC_Eco_Level3.zip")
# Set up a path to save the data on your machine
ecoregions_path = os.path.join(data_dir, 'NA_CEC_Eco_Level3.zip')

# Don't download twice
if not os.path.exists(ecoregions_path):
    # Download
    ecoregions_response = requests.get(ecoregions_url, verify=False)
    # Save the data to your file
    with open(ecoregions_path, 'wb') as ecoregions_file:
        ecoregions_file.write(ecoregions_response.content)
/usr/share/miniconda/envs/learning-portal/lib/python3.10/site-packages/urllib3/connectionpool.py:1103: InsecureRequestWarning: Unverified HTTPS request is being made to host 'gaftp.epa.gov'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#tls-warnings
  warnings.warn(

Load the ecoregions into Python

Your task

Load the ecoregion boundaries you downloaded from the EPA:

  1. Replace a_path with the path you created for your ecoregions file.
  2. (optional) Consider renaming and selecting columns to make your GeoDataFrame easier to work with.
  3. Make a quick plot with .plot() to make sure the download worked.
  4. Run the cell to load the data into Python
# Open up the ecoregions boundaries
gdf = gpd.read_file(a_path)

# Name the index so it will match the other data later on
gdf.index.name = 'ecoregion'

# Plot the ecoregions to check download
ERROR:`/vsizip//home/runner/earth-analytics/data/species-distribution/filename.zip' does not exist in the file system, and is not recognized as a supported dataset name.
DriverError: '/vsizip//home/runner/earth-analytics/data/species-distribution/filename.zip' does not exist in the file system, and is not recognized as a supported dataset name.
See our solution!
# Open up the ecoregions boundaries
ecoregions_gdf = (
    gpd.read_file(ecoregions_path)
    .rename(columns={
        'NA_L3NAME': 'name',
        'Shape_Area': 'area'})
    [['name', 'area', 'geometry']]
)

# We'll name the index so it will match the other data
ecoregions_gdf.index.name = 'ecoregion'

# Plot the ecoregions to check download
ecoregions_gdf.plot(edgecolor='black', color='skyblue')

Create a simplified GeoDataFrame for plotting

Plotting larger files can be time-consuming. The code below will streamline plotting with hvplot by simplifying the geometry, projecting it to a Mercator projection that is compatible with geoviews, and cropping off areas in the Arctic.

Your task

Simplify and reproject your ecoregion boundaries for plotting:

  1. Make a copy of your ecoregions GeoDataFrame with the .copy() method, and save it to another variable name. Make sure to do everything else in this cell with your new copy!
  2. Simplify the ecoregions with .simplify(1000), and save it back to the geometry column.
  3. Change the Coordinate Reference System (CRS) to Mercator with .to_crs(ccrs.Mercator())
  4. Use the plotting code in the cell to check that the plotting runs quickly and looks the way you want, making sure to change gdf to YOUR GeoDataFrame name.
# Make a copy of the ecoregions

# Simplify the geometry to speed up processing

# Change the CRS to Mercator for mapping

# Check that the plot runs
gdf.hvplot(geo=True, crs=ccrs.Mercator())
NameError: name 'gdf' is not defined
See our solution!
# Make a copy of the ecoregions
ecoregions_plot_gdf = ecoregions_gdf.copy()

# Simplify the geometry to speed up processing
ecoregions_plot_gdf.geometry = ecoregions_plot_gdf.simplify(1000)

# Change the CRS to Mercator for mapping
ecoregions_plot_gdf = ecoregions_plot_gdf.to_crs(ccrs.Mercator())

# Check that the plot runs
ecoregions_plot_gdf.hvplot(geo=True, crs=ccrs.Mercator())

Access locations and times of Tasiyagnunpa encounters

For this challenge, you will use a database called the Global Biodiversity Information Facility (GBIF). GBIF is compiled from species observation data all over the world, and includes everything from museum specimens to photos taken by citizen scientists in their backyards.

Your task: Explore GBIF

Before you get started, go to the GBIF occurrences search page and explore the data.

Contribute to open data

You can get your own observations added to GBIF using iNaturalist!

Register and log in to GBIF

You will need a GBIF account to complete this challenge. You can use your GitHub account to authenticate with GBIF. Then, run the following code to store your credentials as environment variables.

Tip

If you accidentally enter your credentials wrong, you can set reset_credentials=True instead of reset_credentials=False

reset_credentials = False
# GBIF needs a username, password, and email
credentials = dict(
    GBIF_USER=(input, 'GBIF username:'),
    GBIF_PWD=(getpass, 'GBIF password:'),
    GBIF_EMAIL=(input, 'GBIF email:'),
)
for env_variable, (prompt_func, prompt_text) in credentials.items():
    # Delete credential from environment if requested
    if reset_credentials and (env_variable in os.environ):
        os.environ.pop(env_variable)
    # Ask for credential and save to environment
    if env_variable not in os.environ:
        os.environ[env_variable] = prompt_func(prompt_text)

Get the species key

Your task
  1. Replace the species_name with the name of the species you want to look up
  2. Run the code to get the species key
# Query species
species_info = species.name_lookup(species_name, rank='SPECIES')

# Get the first result
first_result = species_info['results'][0]

# Get the species key (nubKey)
species_key = first_result['nubKey']

# Check the result
first_result['species'], species_key
NameError: name 'species_name' is not defined
See our solution!
# Query species
species_info = species.name_lookup('sturnella neglecta', rank='SPECIES')

# Get the first result
first_result = species_info['results'][0]

# Get the species key (nubKey)
species_key = first_result['nubKey']

# Check the result
first_result['species'], species_key
('Sturnella neglecta', 9596413)

Download data from GBIF

Your task
  1. Replace csv_file_pattern with a string that will match any .csv file when used in the glob function. HINT: the character * represents any number of characters except the file separator (e.g. /)

  2. Add parameters to the GBIF download function, occ.download() to limit your query to:

    • Sturnella neglecta observations
    • in North America (NORTH_AMERICA)
    • from 2023
    • with spatial coordinates.
  3. Then, run the download. This can take a few minutes.
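To check a candidate pattern before committing to the long download, you can try glob against files you create yourself (the file names below are made up for the demonstration):

```python
import os
import tempfile
from glob import glob

# A throwaway directory with two .csv files and one .txt file
demo_dir = tempfile.mkdtemp()
for name in ['a.csv', 'b.csv', 'notes.txt']:
    open(os.path.join(demo_dir, name), 'w').close()

# '*' matches any run of characters except the file separator,
# so '*.csv' matches every .csv file directly inside demo_dir
csv_files = sorted(glob(os.path.join(demo_dir, '*.csv')))
print([os.path.basename(path) for path in csv_files])  # ['a.csv', 'b.csv']
```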

# Only download once
gbif_pattern = os.path.join(gbif_dir, csv_file_pattern)
if not glob(gbif_pattern):
    # Submit query to GBIF
    gbif_query = occ.download([
        "continent = ",
        "speciesKey = ",
        "year = ",
        "hasCoordinate = ",
    ])
    download_key = gbif_query[0]
    if 'GBIF_DOWNLOAD_KEY' not in os.environ:
        os.environ['GBIF_DOWNLOAD_KEY'] = download_key

        # Wait for the download to build
        wait = occ.download_meta(download_key)['status']
        while wait != 'SUCCEEDED':
            wait = occ.download_meta(download_key)['status']
            time.sleep(5)

    # Download GBIF data
    download_info = occ.download_get(
        os.environ['GBIF_DOWNLOAD_KEY'], 
        path=data_dir)

    # Unzip GBIF data
    with zipfile.ZipFile(download_info['path']) as download_zip:
        download_zip.extractall(path=gbif_dir)

# Find the extracted .csv file path
gbif_path = glob(gbif_pattern)[0]
NameError: name 'csv_file_pattern' is not defined
See our solution!
# Only download once
gbif_pattern = os.path.join(gbif_dir, '*.csv')
if not glob(gbif_pattern):
    # Submit query to GBIF
    gbif_query = occ.download([
        "continent = NORTH_AMERICA",
        "speciesKey = 9596413",
        "hasCoordinate = TRUE",
        "year = 2023",
    ])
    download_key = gbif_query[0]

    if 'GBIF_DOWNLOAD_KEY' not in os.environ:
        os.environ['GBIF_DOWNLOAD_KEY'] = download_key

        # Wait for the download to build
        wait = occ.download_meta(download_key)['status']
        while wait != 'SUCCEEDED':
            wait = occ.download_meta(download_key)['status']
            time.sleep(5)

    # Download GBIF data
    download_info = occ.download_get(
        os.environ['GBIF_DOWNLOAD_KEY'], 
        path=data_dir)

    # Unzip GBIF data
    with zipfile.ZipFile(download_info['path']) as download_zip:
        download_zip.extractall(path=gbif_dir)

# Find the extracted .csv file path (take the first result)
gbif_path = glob(gbif_pattern)[0]
INFO:Your download key is 0058859-240506114902167
INFO:Download file size: 26559580 bytes
INFO:On disk at /home/runner/earth-analytics/data/species-distribution/0022779-240506114902167.zip

Load the GBIF data into Python

Your task
  1. Look at the beginning of the file you downloaded using the code below. What do you think the delimiter is?
  2. Run the following code cell. What happens?
  3. Uncomment and modify the parameters of pd.read_csv() below until your data loads successfully and you have only the columns you want.

You can use the following code to look at the beginning of your file:

!head $gbif_path
gbifID  datasetKey  occurrenceID    kingdom phylum  class   order   family  genus   species infraspecificEpithet    taxonRank   scientificName  verbatimScientificName  verbatimScientificNameAuthorship    countryCode locality    stateProvince   occurrenceStatus    individualCount publishingOrgKey    decimalLatitude decimalLongitude    coordinateUncertaintyInMeters   coordinatePrecision elevation   elevationAccuracy   depth   depthAccuracy   eventDate   day month   year    taxonKey    speciesKey  basisOfRecord   institutionCode collectionCode  catalogNumber   recordNumber    identifiedBy    dateIdentified  license rightsHolder    recordedBy  typeStatus  establishmentMeans  lastInterpreted mediaType   issue
4617234711  4fa7b334-ce0d-4e88-aaae-2e0c138d049e    URN:catalog:CLO:EBIRD:OBS1665858440 Animalia    Chordata    Aves    Passeriformes   Icteridae   Sturnella   Sturnella neglecta      SPECIES Sturnella neglecta Audubon, 1844    Sturnella neglecta      US  61575–61597 OR-205, Burns US-OR (43.3530,-118.9791) Oregon  PRESENT 1   e2e717bf-551a-4917-bdc9-4fa0f342c530    43.35301    -118.97914                          2023-03-24  24  3   2023    9596413 9596413 HUMAN_OBSERVATION   CLO EBIRD   OBS1665858440               CC_BY_4_0       obsr945588          2024-04-17T09:01:34.460Z        CONTINENT_DERIVED_FROM_COORDINATES;TAXON_MATCH_TAXON_CONCEPT_ID_IGNORED
4835897733  4fa7b334-ce0d-4e88-aaae-2e0c138d049e    URN:catalog:CLO:EBIRD:OBS1734625931 Animalia    Chordata    Aves    Passeriformes   Icteridae   Sturnella   Sturnella neglecta      SPECIES Sturnella neglecta Audubon, 1844    Sturnella neglecta      US  Boise National Forest, Mountain Home US-ID 43.54993, -115.71172 Idaho   PRESENT 3   e2e717bf-551a-4917-bdc9-4fa0f342c530    43.54993    -115.71172                          2023-05-20  20  5   2023    9596413 9596413 HUMAN_OBSERVATION   CLO EBIRD   OBS1734625931               CC_BY_4_0       obsr924923          2024-04-17T09:02:16.937Z        CONTINENT_DERIVED_FROM_COORDINATES;TAXON_MATCH_TAXON_CONCEPT_ID_IGNORED
4678582942  4fa7b334-ce0d-4e88-aaae-2e0c138d049e    URN:catalog:CLO:EBIRD:OBS1686474056 Animalia    Chordata    Aves    Passeriformes   Icteridae   Sturnella   Sturnella neglecta      SPECIES Sturnella neglecta Audubon, 1844    Sturnella neglecta      US  Utah Lake Parkway Trail (North Shore)   Utah    PRESENT 1   e2e717bf-551a-4917-bdc9-4fa0f342c530    40.362244   -111.88321                          2023-04-15  15  4   2023    9596413 9596413 HUMAN_OBSERVATION   CLO EBIRD   OBS1686474056               CC_BY_4_0       obsr1094054         2024-04-17T08:58:20.137Z        CONTINENT_DERIVED_FROM_COORDINATES;TAXON_MATCH_TAXON_CONCEPT_ID_IGNORED
4650705779  4fa7b334-ce0d-4e88-aaae-2e0c138d049e    URN:catalog:CLO:EBIRD:OBS1843036378 Animalia    Chordata    Aves    Passeriformes   Icteridae   Sturnella   Sturnella neglecta      SPECIES Sturnella neglecta Audubon, 1844    Sturnella neglecta      US  Pt. Reyes--Abbotts Lagoon   California  PRESENT 1   e2e717bf-551a-4917-bdc9-4fa0f342c530    38.11906    -122.95225                          2023-10-04  4   10  2023    9596413 9596413 HUMAN_OBSERVATION   CLO EBIRD   OBS1843036378               CC_BY_4_0       obsr760562          2024-04-17T09:00:02.520Z        CONTINENT_DERIVED_FROM_COORDINATES;TAXON_MATCH_TAXON_CONCEPT_ID_IGNORED
4687482912  4fa7b334-ce0d-4e88-aaae-2e0c138d049e    URN:catalog:CLO:EBIRD:OBS1846220622 Animalia    Chordata    Aves    Passeriformes   Icteridae   Sturnella   Sturnella neglecta      SPECIES Sturnella neglecta Audubon, 1844    Sturnella neglecta      US  Brush Hollow Reservoir  Colorado    PRESENT 2   e2e717bf-551a-4917-bdc9-4fa0f342c530    38.46376    -105.05151                          2023-10-09  9   10  2023    9596413 9596413 HUMAN_OBSERVATION   CLO EBIRD   OBS1846220622               CC_BY_4_0       obsr561415          2024-04-17T09:00:04.581Z        CONTINENT_DERIVED_FROM_COORDINATES;TAXON_MATCH_TAXON_CONCEPT_ID_IGNORED
4745821830  4fa7b334-ce0d-4e88-aaae-2e0c138d049e    URN:catalog:CLO:EBIRD:OBS1659222377 Animalia    Chordata    Aves    Passeriformes   Icteridae   Sturnella   Sturnella neglecta      SPECIES Sturnella neglecta Audubon, 1844    Sturnella neglecta      US  Pella Crossing Open Space   Colorado    PRESENT 1   e2e717bf-551a-4917-bdc9-4fa0f342c530    40.18386    -105.178154                         2023-03-18  18  3   2023    9596413 9596413 HUMAN_OBSERVATION   CLO EBIRD   OBS1659222377               CC_BY_4_0       obsr687420          2024-04-17T08:59:50.781Z        CONTINENT_DERIVED_FROM_COORDINATES;TAXON_MATCH_TAXON_CONCEPT_ID_IGNORED
4619777735  4fa7b334-ce0d-4e88-aaae-2e0c138d049e    URN:catalog:CLO:EBIRD:OBS1744938929 Animalia    Chordata    Aves    Passeriformes   Icteridae   Sturnella   Sturnella neglecta      SPECIES Sturnella neglecta Audubon, 1844    Sturnella neglecta      CA  Last Mountain Bird Observatory  Saskatchewan    PRESENT 1   e2e717bf-551a-4917-bdc9-4fa0f342c530    51.350838   -105.2171                           2023-05-28  28  5   2023    9596413 9596413 HUMAN_OBSERVATION   CLO EBIRD   OBS1744938929               CC_BY_4_0       obsr899688          2024-04-17T09:00:45.961Z        CONTINENT_DERIVED_FROM_COORDINATES;TAXON_MATCH_TAXON_CONCEPT_ID_IGNORED
4684030902  4fa7b334-ce0d-4e88-aaae-2e0c138d049e    URN:catalog:CLO:EBIRD:OBS1636140876 Animalia    Chordata    Aves    Passeriformes   Icteridae   Sturnella   Sturnella neglecta      SPECIES Sturnella neglecta Audubon, 1844    Sturnella neglecta      US  Pacific Commons Linear Park California  PRESENT 11  e2e717bf-551a-4917-bdc9-4fa0f342c530    37.49512    -121.9796                           2023-02-19  19  2   2023    9596413 9596413 HUMAN_OBSERVATION   CLO EBIRD   OBS1636140876               CC_BY_4_0       obsr533369          2024-04-17T08:59:00.937Z        CONTINENT_DERIVED_FROM_COORDINATES;TAXON_MATCH_TAXON_CONCEPT_ID_IGNORED
4840691755  4fa7b334-ce0d-4e88-aaae-2e0c138d049e    URN:catalog:CLO:EBIRD:OBS1752431883 Animalia    Chordata    Aves    Passeriformes   Icteridae   Sturnella   Sturnella neglecta      SPECIES Sturnella neglecta Audubon, 1844    Sturnella neglecta      US  Breeding Bird Survey (BBS)--Danzig stop 20  North Dakota    PRESENT 1   e2e717bf-551a-4917-bdc9-4fa0f342c530    46.186672   -99.347374                          2023-06-06  6   6   2023    9596413 9596413 HUMAN_OBSERVATION   CLO EBIRD   OBS1752431883               CC_BY_4_0       obsr272867          2024-04-17T09:00:13.657Z        CONTINENT_DERIVED_FROM_COORDINATES;TAXON_MATCH_TAXON_CONCEPT_ID_IGNORED
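If the delimiter isn't obvious from the output above, the standard library's csv.Sniffer can often guess it for you. Here is a sketch on a small invented sample in the same shape as the GBIF export:

```python
import csv

# A tiny tab-delimited sample (made-up values)
sample = (
    'gbifID\tspecies\tmonth\n'
    '4617234711\tneglecta\t3\n'
    '4835897733\tneglecta\t5\n')

# Sniffer inspects the text and guesses the dialect,
# including the delimiter character
dialect = csv.Sniffer().sniff(sample)
print(repr(dialect.delimiter))  # '\t'
```

In practice you would pass the first few lines of the downloaded file to sniff() instead of a hand-written sample.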
# Load the GBIF data
gbif_df = pd.read_csv(
    gbif_path, 
    #delimiter='',
    #index_col='',
    #usecols=[]
)
gbif_df.head()
ParserError: Error tokenizing data. C error: Expected 4 fields in line 21, saw 6
See our solution!
# Load the GBIF data
gbif_df = pd.read_csv(
    gbif_path, 
    delimiter='\t',
    index_col='gbifID',
    usecols=['gbifID', 'decimalLatitude', 'decimalLongitude', 'month'])
gbif_df.head()
decimalLatitude decimalLongitude month
gbifID
4617234711 43.353010 -118.97914 3
4835897733 43.549930 -115.71172 5
4678582942 40.362244 -111.88321 4
4650705779 38.119060 -122.95225 10
4687482912 38.463760 -105.05151 10

Convert the GBIF data to a GeoDataFrame

To plot the GBIF data, we need to convert it to a GeoDataFrame first.

Your task
  1. Replace your_dataframe with the name of the DataFrame you just got from GBIF
  2. Replace longitude_column_name and latitude_column_name with column names from your DataFrame
  3. Run the code to get a GeoDataFrame of the GBIF data.
gbif_gdf = (
    gpd.GeoDataFrame(
        your_dataframe, 
        geometry=gpd.points_from_xy(
            your_dataframe.longitude_column_name, 
            your_dataframe.latitude_column_name), 
        crs="EPSG:4326")
    # Select the desired columns
    [[]]
)
gbif_gdf
NameError: name 'your_dataframe' is not defined
See our solution!
gbif_gdf = (
    gpd.GeoDataFrame(
        gbif_df, 
        geometry=gpd.points_from_xy(
            gbif_df.decimalLongitude, 
            gbif_df.decimalLatitude), 
        crs="EPSG:4326")
    # Select the desired columns
    [['month', 'geometry']]
)
gbif_gdf
month geometry
gbifID
4617234711 3 POINT (-118.97914 43.35301)
4835897733 5 POINT (-115.71172 43.54993)
4678582942 4 POINT (-111.88321 40.36224)
4650705779 10 POINT (-122.95225 38.11906)
4687482912 10 POINT (-105.05151 38.46376)
... ... ...
4666761701 4 POINT (-102.40224 35.04218)
4823503698 5 POINT (-104.46048 38.57695)
4760808865 6 POINT (-96.43833 47.08567)
4690775913 6 POINT (-113.22199 53.19315)
4762651338 6 POINT (-105.06943 39.42953)

249042 rows × 2 columns

Count the number of observations in each ecoregion, during each month of 2023

Identify the ecoregion for each observation

You can combine the ecoregions and the observations spatially using a method called .sjoin(), which stands for spatial join.

Further reading

Check out the geopandas documentation on spatial joins to help you figure this one out. You can also ask your favorite LLM (Large Language Model, like ChatGPT).
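As a minimal, self-contained sketch of how .sjoin() combines two layers (the square "ecoregions" and point "observations" below are invented):

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Two square "ecoregions"
regions = gpd.GeoDataFrame(
    {'name': ['west', 'east']},
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Polygon([(1, 0), (2, 0), (2, 1), (1, 1)]),
    ],
    crs='EPSG:4326')

# Three point "observations"
points = gpd.GeoDataFrame(
    {'month': [3, 5, 7]},
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5), Point(1.7, 0.2)],
    crs='EPSG:4326')

# Keep region/point pairs where the region contains the point;
# each matched row carries columns from both layers
joined = regions.sjoin(points, how='inner', predicate='contains')
print(joined[['name', 'month']])
```

Here the 'west' square picks up one observation and the 'east' square picks up two, which is exactly the shape of result the challenge code builds at full scale.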

Your task
  1. Identify the correct values for the how= and predicate= parameters of the spatial join.
  2. Select only the columns you will need for your plot.
  3. Run the code.
gbif_ecoregion_gdf = (
    ecoregions_gdf
    # Match the CRS of the GBIF data and the ecoregions
    .to_crs(gbif_gdf.crs)
    # Find ecoregion for each observation
    .sjoin(
        gbif_gdf,
        how='', 
        predicate='')
    # Select the required columns
    
)
gbif_ecoregion_gdf
ValueError: `how` was "" but is expected to be in ['left', 'right', 'inner']
See our solution!
gbif_ecoregion_gdf = (
    ecoregions_gdf
    # Match the CRS of the GBIF data and the ecoregions
    .to_crs(gbif_gdf.crs)
    # Find ecoregion for each observation
    .sjoin(
        gbif_gdf,
        how='inner', 
        predicate='contains')
    # Select the required columns
    [['month', 'name']]
)
gbif_ecoregion_gdf
month name
ecoregion
57 6 Thompson-Okanogan Plateau
57 9 Thompson-Okanogan Plateau
57 6 Thompson-Okanogan Plateau
57 6 Thompson-Okanogan Plateau
57 8 Thompson-Okanogan Plateau
... ... ...
2545 6 Eastern Cascades Slopes and Foothills
2545 6 Eastern Cascades Slopes and Foothills
2545 5 Eastern Cascades Slopes and Foothills
2545 5 Eastern Cascades Slopes and Foothills
2545 4 Eastern Cascades Slopes and Foothills

248059 rows × 2 columns

Count the observations in each ecoregion each month

Your task:
  1. Replace columns_to_group_by with a list of columns. Keep in mind that you will end up with one row for each group – you want to count the observations in each ecoregion by month.
  2. Select only month/ecoregion combinations that have more than one occurrence recorded, since a single occurrence could be an error.
  3. Use the .groupby() and .mean() methods to compute the mean occurrences by ecoregion and by month.
  4. Run the code – it will normalize the number of occurrences by month and ecoregion.
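The grouping and normalization steps behave like this small pandas sketch (the counts are invented). Dividing a MultiIndexed Series by the group means aligns automatically on the shared index level:

```python
import pandas as pd

# Toy occurrence counts indexed by (ecoregion, month)
occurrences = pd.Series(
    [10, 30, 20, 40],
    index=pd.MultiIndex.from_tuples(
        [(1, 5), (1, 6), (2, 5), (2, 6)],
        names=['ecoregion', 'month']))

# Mean count per ecoregion (over months) and per month (over ecoregions)
by_ecoregion = occurrences.groupby('ecoregion').mean()  # 1 -> 20, 2 -> 30
by_month = occurrences.groupby('month').mean()          # 5 -> 15, 6 -> 35

# Division aligns each (ecoregion, month) count with both means,
# normalizing for how well-observed each ecoregion and month is
norm = occurrences / by_ecoregion / by_month
print(norm)
```

Normalizing this way keeps heavily birded ecoregions and busy summer months from drowning out the migration signal.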
occurrence_df = (
    gbif_ecoregion_gdf
    # For each ecoregion, for each month...
    .groupby(columns_to_group_by)
    # ...count the number of occurrences
    .agg(occurrences=('name', 'count'))
)

# Get rid of rare observations (possible misidentification?)
occurrence_df = occurrence_df[...]

# Take the mean by ecoregion
mean_occurrences_by_ecoregion = (
    occurrence_df
    ...
)
# Take the mean by month
mean_occurrences_by_month = (
    occurrence_df
    ...
)

# Normalize the observations by the monthly mean throughout the year
occurrence_df['norm_occurrences'] = (
    occurrence_df.occurrences 
    / mean_occurrences_by_ecoregion
    / mean_occurrences_by_month
)
occurrence_df
SyntaxError: invalid syntax. Perhaps you forgot a comma? (2944446990.py, line 14)
See our solution!
occurrence_df = (
    gbif_ecoregion_gdf
    # For each ecoregion, for each month...
    .groupby(['ecoregion', 'month'])
    # ...count the number of occurrences
    .agg(occurrences=('name', 'count'))
)

# Get rid of rare observation noise (possible misidentification?)
occurrence_df = occurrence_df[occurrence_df.occurrences>1]

# Take the mean by ecoregion
mean_occurrences_by_ecoregion = (
    occurrence_df
    .groupby(['ecoregion'])
    .mean()
)
# Take the mean by month
mean_occurrences_by_month = (
    occurrence_df
    .groupby(['month'])
    .mean()
)

# Normalize the observations by the monthly mean throughout the year
occurrence_df['norm_occurrences'] = (
    occurrence_df
    / mean_occurrences_by_ecoregion
    / mean_occurrences_by_month
)
occurrence_df
occurrences norm_occurrences
ecoregion month
57 3 132 0.003020
4 397 0.004641
5 660 0.004941
6 481 0.005170
7 182 0.003507
... ... ... ...
2545 8 76 0.003036
9 63 0.002618
10 78 0.002695
11 45 0.001367
12 61 0.001663

983 rows × 2 columns

Plot the Tasiyagnunpa observations by month

Your task
  1. If applicable, replace any variable names with the names you defined previously.
  2. Replace column_name_used_for_shape_color and column_name_used_for_slider with the column names you wish to use.
  3. Customize your plot with your choice of title, tile source, color map, and size.
# Join the occurrences with the plotting GeoDataFrame
occurrence_gdf = ecoregions_plot_gdf.join(occurrence_df)

# Get the plot bounds so they don't change with the slider
xmin, ymin, xmax, ymax = occurrence_gdf.total_bounds

# Plot occurrence by ecoregion and month
migration_plot = (
    occurrence_gdf
    .hvplot(
        c=column_name_used_for_shape_color,
        groupby=column_name_used_for_slider,
        # Use background tiles
        geo=True, crs=ccrs.Mercator(), tiles='CartoLight',
        title="Your Title Here",
        xlim=(xmin, xmax), ylim=(ymin, ymax),
        frame_height=600,
        widget_location='bottom'
    )
)

# Save the plot
migration_plot.save('migration.html', embed=True)

# Show the plot
migration_plot
NameError: name 'column_name_used_for_shape_color' is not defined
See our solution!
# Join the occurrences with the plotting GeoDataFrame
occurrence_gdf = ecoregions_plot_gdf.join(occurrence_df)

# Get the plot bounds so they don't change with the slider
xmin, ymin, xmax, ymax = occurrence_gdf.total_bounds

# Define the slider widget
slider = pn.widgets.DiscreteSlider(
    name='month', 
    options={calendar.month_name[i]: i for i in range(1, 13)}
)

# Plot occurrence by ecoregion and month
migration_plot = (
    occurrence_gdf
    .hvplot(
        c='norm_occurrences',
        groupby='month',
        # Use background tiles
        geo=True, crs=ccrs.Mercator(), tiles='CartoLight',
        title="Tasiyagnunpa migration",
        xlim=(xmin, xmax), ylim=(ymin, ymax),
        frame_height=600,
        colorbar=False,
        widgets={'month': slider},
        widget_location='bottom'
    )
)

# Save the plot
migration_plot.save('migration.html', embed=True)

# Show the plot
migration_plot


Want an EXTRA CHALLENGE?

Notice that the month slider displays numbers instead of the month name. Use pn.widgets.DiscreteSlider() with the options= parameter set to give the months names. You might want to try asking ChatGPT how to do this, or look at the documentation for pn.widgets.DiscreteSlider(). This is pretty tricky!