{ "cells": [ { "cell_type": "markdown", "metadata": { "ExecuteTime": { "end_time": "2020-09-01T02:58:28.765090Z", "start_time": "2020-09-01T02:58:28.761362Z" } }, "source": [ "(content:plotting_maps)=\n", "# Plotting Maps\n", "\n", "In this notebook, we finally plot some visualizations based on [the post-processed collection previously created](content:post_process_collection). For that, we use the Python package `folium` (install it with `pip install folium`)." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "ExecuteTime": { "end_time": "2021-02-24T11:36:00.719523Z", "start_time": "2021-02-24T11:36:00.714696Z" } }, "outputs": [], "source": [ "import folium\n", "import pandas as pd\n", "import numpy as np\n", "import base64\n", "import io\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The maps in this notebook need two inputs:\n", "1. [The post-processed collection (processed_top5_cities.csv)](content:post_process_collection)\n", "2. The geographic data for each location (GeoJSON, Shapefile, KML, or latitude/longitude). This information was included in the [original data (worldcities_fb_keys.csv)](content:json_creation), but it could also be obtained by querying Facebook for [KMLs](content:listing_all_cities_states_in_a_country_region) or from other websites, such as [GADM](https://gadm.org/data.html)." ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "ExecuteTime": { "end_time": "2021-02-24T11:36:18.021530Z", "start_time": "2021-02-24T11:36:17.989150Z" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Key Name Region FullLocation \\\n", "0 2880782 Minato-ku Tokyo Minato-ku, Tokyo, JP \n", "1 2490299 New York New York New York, New York, US \n", "2 2673660 Mexico City Distrito Federal Mexico City, Distrito Federal, MX \n", "3 1035921 Mumbai Maharashtra Mumbai, Maharashtra, IN \n", "4 269969 São Paulo São Paulo (state) São Paulo, São Paulo (state), BR \n", "\n", " both_18-40_2G both_18-40_3G both_18-40_4G both_18-40_AllDevices \\\n", "0 1000 1000 8400 64000 \n", "1 1000 4900 520000 3300000 \n", "2 1200 160000 1000000 7600000 \n", "3 11000 46000 5300000 9000000 \n", "4 1000 45000 510000 5800000 \n", "\n", " both_18-40_Wifi both_18-_2G ... population id fb_query \\\n", "0 34000 1000 ... 35676000.0 1392685764 Tokyo \n", "1 1600000 1000 ... 19354922.0 1840034016 New York \n", "2 4800000 1700 ... 19028000.0 1484247881 Mexico City \n", "3 1700000 14000 ... 18978000.0 1356226629 Mumbai \n", "4 3700000 2000 ... 18845000.0 1076532519 São Paulo \n", "\n", " name key region region_id country_name \\\n", "0 Minato-ku 2880782 Tokyo 1922 Japan \n", "1 New York 2490299 New York 3875 United States \n", "2 Mexico City 2673660 Distrito Federal 2513 Mexico \n", "3 Mumbai 1035921 Maharashtra 1735 India \n", "4 São Paulo 269969 São Paulo (state) 460 Brazil \n", "\n", " country_code type \n", "0 JP city \n", "1 US city \n", "2 MX city \n", "3 IN city \n", "4 BR city \n", "\n", "[5 rows x 83 columns]" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_fb = pd.read_csv(\"processed_top5_cities.csv\")\n", "df_loc = pd.read_csv(\"worldcities_fb_keys.csv\")\n", "\n", "df = pd.merge(df_fb, df_loc, left_on=\"Key\", right_on=\"key\")\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## A Basic Map" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Folium is a Python library that uses the [Leaflet](https://leafletjs.com/) JavaScript library to create interactive maps.\n",
"We recommend taking a look at the [latest folium documentation](https://python-visualization.github.io/folium/quickstart.html) and [the various examples of how to use this library](https://nbviewer.jupyter.org/github/python-visualization/folium/tree/master/examples/).\n", "\n", "The following code creates a simple map for our use case: it places a marker on each city and attaches a `tooltip` (shown on hover) and a `popup` (shown on click) to each marker." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "ExecuteTime": { "end_time": "2021-02-24T11:36:21.328736Z", "start_time": "2021-02-24T11:36:21.298326Z" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "m = folium.Map(location=[0, 0], zoom_start=2, tiles=\"openstreetmap\", control_scale=True)\n", "\n", "for idx, row in df.iterrows():\n", " tooltip = 'City name: %s!' % (row[\"name\"])\n", " html = \"<b>%s</b><br>Total FB pop: %d\" % (row[\"name\"], row[\"both_18-_AllDevices\"])\n", " popup = folium.Popup(html, max_width=450, min_width=450)\n", "\n", " folium.Marker([row[\"lat\"], row[\"lng\"]], popup=popup, tooltip=tooltip).add_to(m)\n", "m" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Clicking on any of the markers opens a popup showing the city name and its total Facebook audience.\n", "This number comes from the column **both_18-_AllDevices**, which, as [explained before](content:post_process_collection), represents the male and female audience aged 18 or older connecting to Facebook through any network.\n", "\n", "```{warning}\n", "Note that the correct column name depends on the criteria used in the collection!\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Improving the Basic Map" ] }, { "cell_type": "markdown", "metadata": { "ExecuteTime": { "end_time": "2020-09-01T03:01:11.277294Z", "start_time": "2020-09-01T03:01:11.271663Z" } }, "source": [ "Next, we define a few functions to plot a much more informative map.\n", "The functions ``getPie`` and ``getPyramid`` can be used in other contexts as well.\n", "The parameter `get_encoded` (named `get_buffer` in ``getPyramid``) decides whether these functions return the chart as a base64-encoded PNG string, ready to be embedded in HTML, or the `matplotlib.pyplot` module, so that the chart can be displayed inline. For the maps, we use `get_encoded=True`." ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "ExecuteTime": { "end_time": "2021-02-24T11:36:23.242631Z", "start_time": "2021-02-24T11:36:23.236069Z" } }, "outputs": [], "source": [ "def getPie(labels, sizes, explode=None, title=None, get_encoded=True):\n", " \n", " # Pie chart, where the slices are plotted clockwise (counterclock=False), starting from the top (startangle=90).\n", " # Examples:\n", " # labels = 'Frogs', 'Hogs', 'Dogs', 'Logs'\n", " # sizes = [15, 30, 45, 10]\n", " # explode = (0, 0.1, 0, 0) # only \"explode\" the 2nd slice (i.e. 'Hogs')\n", "\n", " \n", " def label_format(pct, allvals):\n", " absolute = int(pct/100.*np.sum(allvals))\n", " return \"{:.1f}%\\n({:,d})\".format(pct, absolute)\n", " \n", " # TODO: make it a doughnut chart, see:\n", " # https://matplotlib.org/3.1.1/gallery/pie_and_polar_charts/pie_and_donut_labels.html\n", " \n", " fig1, ax1 = plt.subplots(figsize=(2,2))\n", " ax1.pie(sizes, explode=explode, labels=labels,\n", " autopct=lambda pct: label_format(pct, sizes),\n", " shadow=True, startangle=90, counterclock=False,\n", " wedgeprops={'linewidth': 2, 'edgecolor': \"black\"}\n", " )\n", " \n", " if title:\n", " ax1.set_title(title)\n", " ax1.axis('equal') # Equal aspect ratio ensures that pie is drawn as a circle.\n", "\n", " if get_encoded:\n", " img_buffer = io.BytesIO()\n", " plt.savefig(img_buffer, format='png', transparent=True, bbox_inches=\"tight\")\n", " img_buffer.seek(0)\n", " plt.close()\n", " \n", " return base64.b64encode(img_buffer.getvalue()).decode('UTF-8')\n", " \n", " else:\n", " return plt\n", " " ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "ExecuteTime": { "end_time": "2021-02-24T11:36:24.008666Z", "start_time": "2021-02-24T11:36:23.907173Z" } }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "getPie([\"Frogs\", \"Hogs\", \"Dogs\", \"Logs\"], [15, 30, 45, 10], (0, 0.1, 0, 0), get_encoded=False)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "ExecuteTime": { "end_time": "2021-02-24T11:36:24.558113Z", "start_time": "2021-02-24T11:36:24.458292Z" } }, "outputs": [ { "data": { "text/plain": [ "'iVBORw0KGgoAAAANSUhEUgAAAKgAAAB9CAYAAAA2uCgoAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAvn0lEQVR4nO2deXhcVfnHP++9s2/Z0yRN2zRd2Rvaskhlt2UTWVREVASURVARRNCfK+4ii4IIguygILuIKFCgVChF6L6mTZfsezJZZrv3nt8fd5KmTdomadqZYD7Pkycz95577rkz3znL+77nHFFKMcYY6YqW6gKMMcaeGBPoGGnNmEDHSGvGBDpGWjMm0DHSmjGBjpHWjBqBikhnqsswxoFn1Ah0jP9NRrVARWSWiCwRkZUi8pyIZCWPz00ee1dEbhGR1cnjh4jIUhFZnjw/LbVPMMbeGNUCBR4BblRKHQ6sAn6UPP4gcKVS6ljA7JP+SuB3SqlZwByg6gCWdZ8QETP5w+r5K0l1mQ4EjlQXYLiISAaQqZR6K3noYeBvIpIJBJVS7ySPPwGclXz9LvB/IlIMPKuUKj+QZd5HIskfVj9ERABRSlkHtkj7n9Fegw6E7O6EUuoJ4GwgAvxLRE4ekRvaAjmgiEiJiKwTkbuBD4EJPd0ZEVklIhck02kicreIrBGRl0TkZRH5dPLcr0RkbbK789sD/QyDYdTWoEqpdhFpFZGPK6XeBr4IvKWUahWRDhE5Rim1BPhczzUiUgpUKKV+n3x9OLBwoPyTossDDkr+zUz+nwr4ARcQxP4MEyISAbqBSmB78m9b8v86YIPat8gcr4gsT77eAnwLmAFcopT6moicD8wCjgBygfdFZBFwHFACHAbkJ8vygIhkA+cCM5VSKtnypB2jSaA+EenbZ7wNuBi4R0R8QAVwSfLcZcB9ItIFvAm0J49fAHxBRBJAHXBzT2Yi4gZOwO4OzMYWY9Ygy+ZM/oWAAmDuAGkaRWQx'" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "getPie([\"Frogs\", \"Hogs\", \"Dogs\", \"Logs\"], [15, 30, 45, 10], (0, 0.1, 0, 0), get_encoded=True)[:1000]" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "ExecuteTime": { "end_time": "2021-02-24T11:36:25.054350Z", "start_time": "2021-02-24T11:36:25.047456Z" } }, "outputs": [], "source": [ "def getPyramid(y, labels, data_left, data_right, normalized=False, get_buffer=True):\n", " \n", " # E.g.:\n", " # y = [0-18, 19-25, 26+]\n", " # labels = [female, male]\n", " # data_left = [1000, 
2000, 3000]\n", " # data_right = [2000, 5000, 1000]\n", " \n", " data_left = np.array(data_left)\n", " data_right = np.array(data_right)\n", " \n", " if normalized:\n", " data_left = data_left / data_left.sum()\n", " data_right = data_right / data_right.sum()\n", " \n", " assert data_left.shape == data_right.shape\n", " N = range(0, data_left.shape[0])\n", " \n", " fig1, ax1 = plt.subplots(figsize=(2,2))\n", " \n", " ax1.barh(N, -data_left, label=labels[0])\n", " ax1.barh(N, data_right, label=labels[1])\n", " \n", " ax1.set(yticks=N, yticklabels=y)\n", " \n", " ax1.spines['right'].set_visible(False)\n", " ax1.spines['top'].set_visible(True)\n", " ax1.spines['top'].set_linewidth(2)\n", " \n", " ax1.spines['left'].set_visible(False)\n", " ax1.spines['bottom'].set_visible(True)\n", " ax1.spines['bottom'].set_linewidth(2)\n", " \n", " ax1.legend(loc='upper center', bbox_to_anchor=(0.5, -0.15),\n", " fancybox=True, shadow=True, ncol=5)\n", "\n", " if get_buffer:\n", " img_buffer = io.BytesIO()\n", " plt.savefig(img_buffer, format='png', transparent=True, bbox_inches=\"tight\")\n", " img_buffer.seek(0)\n", " plt.close()\n", " return base64.b64encode(img_buffer.getvalue()).decode('UTF-8')\n", " \n", " else:\n", " return plt\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The function ``getHTML`` builds the HTML content of a popup for a given row, embedding the images generated by these functions."
] }, { "cell_type": "code", "execution_count": 29, "metadata": { "ExecuteTime": { "end_time": "2021-02-24T11:39:08.471720Z", "start_time": "2021-02-24T11:39:08.464370Z" } }, "outputs": [], "source": [ "def getHTML(row):\n", " \n", " total_pop = float(row['both_18-_AllDevices'])\n", " \n", " pie_gender = getPie([\"Male\", \"Female\"], [row[\"male_18-_AllDevices\"], row[\"female_18-_AllDevices\"]])\n", " pie_connectivity = getPie([\"Wifi\", \"2G\", \"3G\", \"4G\"], [row[\"both_18-_Wifi\"], row[\"both_18-_2G\"], \n", " row[\"both_18-_3G\"], row[\"both_18-_4G\"], ])\n", " pie_connectivity_male = getPie([\"Wifi\", \"2G\", \"3G\", \"4G\"], [row[\"male_18-_Wifi\"], row[\"male_18-_2G\"],\n", " row[\"male_18-_3G\"], row[\"male_18-_4G\"]], \n", " title=\"Connectivity (Male)\")\n", " pie_connectivity_female = getPie([\"Wifi\", \"2G\", \"3G\", \"4G\"], [row[\"female_18-_Wifi\"], row[\"female_18-_2G\"],\n", " row[\"female_18-_3G\"], row[\"female_18-_4G\"],], \n", " title=\"Connectivity (Female)\")\n", " \n", " pie_age = getPie([\"18-40\", \"41-54\", \"55+\"], [row[\"both_18-40_AllDevices\"], row[\"both_41-54_AllDevices\"],\n", " row[\"both_55-_AllDevices\"]])\n", " \n", " pyramid_age = getPyramid([\"18-40\", \"41-54\", \"55+\"], [\"female\", \"male\"],\n", " [row[\"female_18-40_AllDevices\"], row[\"female_41-54_AllDevices\"], row[\"female_55-_AllDevices\"]],\n", " [row[\"male_18-40_AllDevices\"], row[\"male_41-54_AllDevices\"], row[\"male_55-_AllDevices\"]]\n", " )\n", " \n", " pyramid_age_perc = getPyramid([\"18-40\", \"41-54\", \"55+\"], [\"female\", \"male\"],\n", " [row[\"female_18-40_AllDevices\"], row[\"female_41-54_AllDevices\"], row[\"female_55-_AllDevices\"]],\n", " [row[\"male_18-40_AllDevices\"], row[\"male_41-54_AllDevices\"], row[\"male_55-_AllDevices\"]],\n", " normalized=True)\n", "\n", " \n", " \n", " html = \"\"\"\n", "

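<div>\n",
" <h4>Location: {name} ({lat:.1f}, {lng:.1f})</h4>\n",
" <h5>Total Population: {total_pop:,.0f}</h5>\n",
" <h5>Gender Distribution</h5>\n",
" <img src=\"data:image/png;base64,{pie_gender}\">\n",
" <h5>Age Distribution</h5>\n",
" <img src=\"data:image/png;base64,{pie_age}\">\n",
" <h5>Age Distribution per Gender</h5>\n",
" <img src=\"data:image/png;base64,{pyramid_age}\">\n",
" <img src=\"data:image/png;base64,{pyramid_age_perc}\">\n",
" <h5>Connectivity</h5>\n",
" <img src=\"data:image/png;base64,{pie_connectivity}\">\n",
" <h5>Connectivity per Gender</h5>\n",
" <img src=\"data:image/png;base64,{pie_connectivity_male}\">\n",
" <img src=\"data:image/png;base64,{pie_connectivity_female}\">\n",
" </div>\n", "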
\n", " \n", " \n", " \n", " \"\"\".format(name=row[\"name\"], \n", " lat=float(row[\"lat\"]), lng=float(row[\"lng\"]), \n", " total_pop=total_pop,\n", " pie_gender=pie_gender,\n", " pie_age=pie_age,\n", " pyramid_age=pyramid_age,\n", " pyramid_age_perc=pyramid_age_perc,\n", " pie_connectivity=pie_connectivity,\n", " pie_connectivity_male=pie_connectivity_male,\n", " pie_connectivity_female=pie_connectivity_female,\n", " \n", " )\n", "\n", " # TRY TO FORMAT WITH https://docs.python.org/3/library/string.html#format-specification-mini-language\n", " return html\n" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "ExecuteTime": { "end_time": "2021-02-24T11:39:44.070326Z", "start_time": "2021-02-24T11:39:41.201818Z" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# There are many tile options; here we use the first one, OpenStreetMap.\n", "# Depending on your folium version, some of these tilesets (e.g. the Mapbox and Stamen ones) may be unavailable or may require an API key.\n", "tile_options = ['openstreetmap', 'cartodbpositron', 'Mapbox Bright', \n", " 'Stamen Terrain', 'Stamen Toner', 'Stamen Watercolor',\n", " 'Mapbox Control Room', 'CartoDB dark_matter', ]\n", "\n", "tiles = tile_options[0]\n", "\n", "m = folium.Map(location=[0, 0], zoom_start=2, tiles=tiles, control_scale=True)\n", "\n", "\n", "for idx, row in df.iterrows():\n", " html = getHTML(row)\n", " popup = folium.Popup(html, max_width=450, min_width=450)\n", "\n", " tooltip = '%s!' % (row[\"name\"])\n", "\n", " folium.Marker([row[\"lat\"], row[\"lng\"]], popup=popup, tooltip=tooltip).add_to(m)\n", "\n", "m" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "ExecuteTime": { "end_time": "2020-09-23T04:22:14.183755Z", "start_time": "2020-09-23T04:22:13.483670Z" } }, "outputs": [], "source": [ "# The resulting map is much easier to explore as a standalone HTML file.\n", "m.save('connectivity_top5_cities.html')" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.3" } }, "nbformat": 4, "nbformat_minor": 4 }