{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "[](https://colab.research.google.com/github/worldbank/OpenNightLights/blob/master/onl/tutorials/mod4_1_time_series_charts.ipynb)\n", "\n", "# Time Series Charts\n", "\n", "In the last module we worked through some basic operations and visualized the results as map layers. For analytical work, it is also useful to plot the data in charts. For understanding temporal dynamics, a common need when working with remote sensing, you will want a line graph that shows quantities of a variable over time, commonly known as a time series graph.\n", "\n", "### Quick caveat on charts with the Earth Engine Python API\n", "Google Earth Engine (GEE) provides a User Interface (UI) module for creating charts directly in the Editor. It's built on the Google Visualization API, if you're familiar with that from other Google products.\n", "\n", "Unfortunately, the UI module is not available through the Python API, including the `ee` library, but we'll introduce some Python-centric approaches to extracting and visualizing data.\n", "\n", "As an additional constraint, plotting data in this manner requires you to actually extract data from its location on Google servers (\"in the cloud\") to your local machine for visualization. Just as with any data extraction, there are constraints on how much data you can actually move, so it will be prohibitive to plot very large scenes.\n", "\n", "This is a limitation of using the Python API, but for our tutorial, we'll make sure you're familiar with the basic concepts you can use to advance to working with larger, more complex data. If you find you are hitting limits, you will want to look into using the native GEE Editor (i.e. conducting your entire workflow within the cloud) -- as mentioned, it has comprehensive documentation. Or you might consider investing in the resources, time, and training to extract and process this data yourself. 
This is becoming easier to do, and the full world of remote sensing analysis awaits you! But that is beyond the scope of this tutorial.\n", "\n", "In this exercise, we will create a simple time series for VIIRS-DNB values at a specific location. We'll also build on what we've learned about reducing data for a region, such as in {doc}`mod3_4_cell_statistics_band_math`, to create a time series for an entire region.\n", "\n", "**Our tasks in this exercise:**\n", "1. Extract VIIRS time series data and convert it to a pandas dataframe\n", "2. Create a 2014-2020 time series graph from VIIRS-DNB data for a point in Seoul, South Korea\n", "3. Create a 2014-2020 time series graph for Sum Of Lights (SOL) for South Korea\n", "\n", "## Extract time series data and convert to pandas dataframe\n", "\n", "Those familiar with Python know that `pandas` is an indispensable library, a data analysis package built on another indispensable package, `numpy`. We will extract the data from our raster file of VIIRS-DNB radiance and convert it into a pandas dataframe in our local computing space so that we can use the plotting libraries `matplotlib` and `seaborn` to make our line graph.\n", "\n", "First we define our point of interest: the location of Seoul Olympic Stadium (a.k.a. Jamsil Olympic Stadium). Technically, we'll pick a lat/lon coordinate in the stadium and create a 500m buffer around it.\n", "\n", "Then we'll grab a collection of VIIRS-DNB monthly composites from January 2014 to May 2020." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import json\n", "import pandas as pd\n", "\n", "# reminder that if you are installing libraries in a Google Colab instance you will be prompted to restart your kernel\n", "\n", "try:\n", " import geemap, ee\n", " import seaborn as sns\n", " import matplotlib.pyplot as plt\n", "except ModuleNotFoundError:\n", " if 'google.colab' in str(get_ipython()):\n", " print(\"package not found, installing w/ pip in Google Colab...\")\n", " !pip install geemap seaborn matplotlib\n", " else:\n", " print(\"package not found, installing w/ conda...\")\n", " !conda install mamba -c conda-forge -y\n", " !mamba install geemap -c conda-forge -y\n", " !conda install seaborn matplotlib -y\n", " import geemap, ee\n", " import seaborn as sns\n", " import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "try:\n", " ee.Initialize()\n", "except Exception as e:\n", " ee.Authenticate()\n", " ee.Initialize()\n", "\n", "# identify a 500 meter buffer around our Point Of Interest (POI)\n", "poi = ee.Geometry.Point(127.072483, 37.515817).buffer(500)\n", "\n", "# monthly VIIRS-DNB composites, January 2014 through May 2020\n", "viirs = ee.ImageCollection(\"NOAA/VIIRS/DNB/MONTHLY_V1/VCMSLCFG\").filterDate('2014-01-01','2020-05-31')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Extracting all image values in the collection\n", "\n", "To make a time series, we need to get all the values in our collection using the `map` function. We'll create a custom function in Python that takes a single image as an input and reduces the data in a given region (our point of interest in this case).\n", "\n", "We'll get the mean of the pixels in our region and set the scale to 30. 
We'll use the `avg_rad` band.\n", "\n", "We'll then set this reduced value as a property (we'll call it \"mean\") on our image, along with the date, so that each image in the output collection carries the mean radiance of the region of interest." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "def poi_mean(img):\n", " mean = img.reduceRegion(reducer=ee.Reducer.mean(), geometry=poi, scale=30).get('avg_rad')\n", " return img.set('date', img.date().format()).set('mean',mean)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We map this function over every image in our collection to get a new ImageCollection, in which each image now has the mean value for the region of interest and the date as properties. These are the data we'll make our time series from." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "poi_reduced_imgs = viirs.map(poi_mean)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To convert to a pandas dataframe, however, we don't want an ImageCollection, so we will reduce our images to a list of lists:\n", "- for each image, we have a 2-element list that contains that image's date and mean value (for our point of interest)\n", "- each of these lists is itself an element in our outer list, which is what we'll convert to a dataframe" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "nested_list = poi_reduced_imgs.reduceColumns(ee.Reducer.toList(2), ['date','mean']).values().get(0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This nested list can be turned into a dataframe using the `pd.DataFrame` constructor. We'll name the columns \"date\" and \"mean\"." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
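The conversion step can be sketched as follows. In the notebook, the client-side list would come from `nested_list.getInfo()`, which transfers the server-side list to your machine; here a small sample list of the same shape stands in for it so the illustration is self-contained.

```python
import pandas as pd

# In the notebook this would be: data = nested_list.getInfo()
# A sample of the same [date, mean] shape stands in here.
data = [
    ["2014-01-01T00:00:00", 61.927905],
    ["2014-02-01T00:00:00", 51.591837],
]

# each inner [date, mean] pair becomes one row of the dataframe
df = pd.DataFrame(data, columns=["date", "mean"])
print(df)
```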
    date                 mean
0   2014-01-01T00:00:00  61.927905
1   2014-02-01T00:00:00  51.591837
2   2014-03-01T00:00:00  51.378309
3   2014-04-01T00:00:00  59.228776
4   2014-05-01T00:00:00  63.510432
..  ...                  ...
72  2020-01-01T00:00:00  44.474762
73  2020-02-01T00:00:00  38.256775
74  2020-03-01T00:00:00  46.065028
75  2020-04-01T00:00:00  45.411734
76  2020-05-01T00:00:00  35.520984

77 rows × 2 columns
The same data with "date" as a datetime index:

            mean
date
2014-01-01  61.927905
2014-02-01  51.591837
2014-03-01  51.378309
2014-04-01  59.228776
2014-05-01  63.510432
...         ...
2020-01-01  44.474762
2020-02-01  38.256775
2020-03-01  46.065028
2020-04-01  45.411734
2020-05-01  35.520984

77 rows × 1 columns