# Ghana and similar peers

Requested by Katherine Anne Stapleton, this notebook showcases a city-level data collection for Ghana and several of its peers.

1. Regional peers (ECOWAS): Benin, Burkina Faso, Côte d’Ivoire, The Gambia, Guinea, Liberia, Mali, Niger, Nigeria, Senegal, Sierra Leone, Togo
2. Structural peers: Cameroon, Côte d’Ivoire, Kenya, Kyrgyz Republic, Mauritania, Myanmar, Nicaragua
3. Aspirational peers: Algeria, Belarus, Colombia, Dominican Republic, Ecuador, Jordan, Paraguay, Peru

We cover all the steps involved in a typical data collection using Facebook Marketing API data, with the exception of plotting a map, which was not needed.

1. [Before the collection starts](#before-the-collection-starts): we begin by acquiring the Facebook IDs and shapefiles for the locations we are interested in.
2. [Data Collection](#collecting-data): this step is where the primary data collection happens; it might take weeks to finish.
3. [Post-processing The Collection](content:post-processing-steps): once the data collection has finished, we post-process the data to create a clean CSV file for data analysis.


In [24]:
import pandas as pd
import ast
import folium
from folium import plugins
import numpy as np
import base64
import io
import os
import matplotlib.pyplot as plt
import branca
import xml.etree.ElementTree as ET
import json
import geopandas as gpd
from pysocialwatcher.post_process import post_process_df_collection, combine_cols
from pysocialwatcher.utils import double_country_conversion
from pysocialwatcher import watcherAPI
from pysocialwatcher.json_builder import JSONBuilder, AgeList, Age, Genders, get_predefined_behavior
from pysocialwatcher.json_builder import LocationList, ScholarityList, Scholarity

import uuid
from folium.features import GeoJson, GeoJsonTooltip, GeoJsonPopup
from branca.colormap import linear
from shapely import wkt

## Before the collection starts

In [3]:
watcher = watcherAPI(api_version="9.0", sleep_time=5) 
watcher.load_credentials_file("credentials.csv")

In [4]:
countries = ["Ghana", "South Africa", "Nigeria", "Kenya", "Mauritius", "India", "Philippines",
"Benin", "Burkina Faso", "Cote d'Ivoire", "Gambia", "Guinea", "Liberia", "Mali", "Niger", 
"Senegal", "Sierra Leone", "Togo", "Cameroon", "Kyrgyzstan", "Mauritania", "Myanmar", "Nicaragua", 
"Algeria", "Belarus", "Colombia", "Dominican Republic", "Ecuador", "Jordan", "Paraguay", "Peru"]
country_codes = []

for country in countries:
    print(country, double_country_conversion(country))
    country_codes.append(double_country_conversion(country))

Ghana GH
South Africa ZA
Nigeria NG
Kenya KE
Mauritius MU
India IN
Philippines PH
Benin BJ
Burkina Faso BF
Cote d'Ivoire CI
Gambia GM
Guinea GW
Liberia LR
Mali ML
Niger NE
Senegal SN
Sierra Leone SL
Togo TG
Cameroon CM
Kyrgyzstan KG
Mauritania MR
Myanmar MM
Nicaragua NI
Algeria DZ
Belarus BY
Colombia CO
Dominican Republic DO
Ecuador EC
Jordan JO
Paraguay PY
Peru PE
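
The printed pairs are worth a quick review before they feed the later steps: `GW` is the ISO 3166-1 alpha-2 code for Guinea-Bissau, whereas Guinea is `GN`. Below is a minimal sketch of such a check, assuming a small hand-maintained reference table. The `EXPECTED` dictionary and the `find_mismatches` helper are hypothetical and not part of pySocialWatcher.

```python
# Hand-maintained reference codes (ISO 3166-1 alpha-2) for a few countries.
# This table is an illustration; extend it to cover the full country list.
EXPECTED = {"Ghana": "GH", "Guinea": "GN", "Niger": "NE", "Nigeria": "NG"}


def find_mismatches(resolved, expected=EXPECTED):
    """Return {country: (resolved_code, expected_code)} where the two disagree."""
    return {
        name: (code, expected[name])
        for name, code in resolved.items()
        if name in expected and code != expected[name]
    }


# Codes copied from the output printed above.
resolved = {"Ghana": "GH", "Guinea": "GW", "Niger": "NE", "Nigeria": "NG"}
mismatches = find_mismatches(resolved)
print(mismatches)  # {'Guinea': ('GW', 'GN')}
```

If a mismatch shows up, one option is to override the entry in `country_codes` by hand before requesting regions and KMLs.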


In [5]:
cities = []
for country_code in ["GH"]: # or 'country_codes' if you are interested in getting all the cities for all the countries
    cities.append(watcher.get_all_cities_given_country_code(country_code))
cities = pd.concat(cities)

Getting cities for region named 'Upper East Region' (id = 1429)
Getting cities that start with a
Getting cities that start with b
Getting cities that start with c
Getting cities that start with d
Getting cities that start with e
Getting cities that start with f
Getting cities that start with g
Getting cities that start with h
Getting cities that start with i
Getting cities that start with j
Getting cities that start with k
Getting cities that start with l
Getting cities that start with m
Getting cities that start with n
Getting cities that start with o
Getting cities that start with p
Getting cities that start with q
Getting cities that start with r
Getting cities that start with s
Getting cities that start with t
Getting cities that start with u
Getting cities that start with v
Getting cities that start with w
Getting cities that start with x
Getting cities that start with y
Getting cities that start with z
Getting cities for region named 'Volta Region' (id = 1427)
Getting cities that

Getting cities that start with y
Getting cities that start with z
Getting cities for region named 'Upper West Region' (id = 1430)
Getting cities that start with a
Getting cities that start with b
Getting cities that start with c
Getting cities that start with d
Getting cities that start with e
Getting cities that start with f
Getting cities that start with g
Getting cities that start with h
Getting cities that start with i
Getting cities that start with j
Getting cities that start with k
Getting cities that start with l
Getting cities that start with m
Getting cities that start with n
Getting cities that start with o
Getting cities that start with p
Getting cities that start with q
Getting cities that start with r
Getting cities that start with s
Getting cities that start with t
Getting cities that start with u
Getting cities that start with v
Getting cities that start with w
Getting cities that start with x
Getting cities that start with y
Getting cities that start with z


In [7]:
cities  # Ghana alone has 170 cities; the full country list yields 13,565

Unnamed: 0,key,name,type,country_code,country_name,region,region_id,supports_region,supports_city,geo_hierarchy_level,geo_hierarchy_name
0,832102,Bolgatanga,city,GH,Ghana,Upper East Region,1429,True,True,,
1,831609,Bawku,city,GH,Ghana,Upper East Region,1429,True,True,,
2,838366,Navrongo,city,GH,Ghana,Upper East Region,1429,True,True,,
3,839764,Paga,city,GH,Ghana,Upper East Region,1429,True,True,,
4,832157,"Bongo, Upper East",city,GH,Ghana,Upper East Region,1429,True,True,,
...,...,...,...,...,...,...,...,...,...,...,...
166,832874,"Daffiama, Upper West",city,GH,Ghana,Upper West Region,1430,True,True,,
167,835520,"Kaleo, Upper West",city,GH,Ghana,Upper West Region,1430,True,True,,
168,841898,Tumu,city,GH,Ghana,Upper West Region,1430,True,True,,
169,835302,"Jirapa, Upper West",city,GH,Ghana,Upper West Region,1430,True,True,,


In [8]:
regions = []
for country_code in country_codes:
    regions.append(watcherAPI.get_KMLs_for_regions_in_country(country_code))
regions = pd.concat(regions)

Obtained 10 regions.
Obtained 10 KMLs.
Obtained 9 regions.
Obtained 9 KMLs.
Obtained 37 regions.
Obtained 37 KMLs.
Obtained 8 regions.
Obtained 8 KMLs.
Obtained 9 regions.
Obtained 9 KMLs.
Obtained 36 regions.
Obtained 36 KMLs.
Obtained 17 regions.
Obtained 17 KMLs.
Obtained 12 regions.
Obtained 12 KMLs.
Obtained 0 regions.
Obtained 28 regions.
Obtained 28 KMLs.
Obtained 6 regions.
Obtained 6 KMLs.
Obtained 9 regions.
Obtained 9 KMLs.
Obtained 15 regions.
Obtained 15 KMLs.
Obtained 9 regions.
Obtained 9 KMLs.
Obtained 8 regions.
Obtained 8 KMLs.
Obtained 14 regions.
Obtained 14 KMLs.
Obtained 4 regions.
Obtained 4 KMLs.
Obtained 5 regions.
Obtained 5 KMLs.
Obtained 10 regions.
Obtained 10 KMLs.
Obtained 8 regions.
Obtained 8 KMLs.
Obtained 13 regions.
Obtained 13 KMLs.
Obtained 0 regions.
Obtained 17 regions.
Obtained 17 KMLs.
Obtained 48 regions.
Obtained 48 KMLs.
Obtained 7 regions.
Obtained 7 KMLs.
Obtained 33 regions.
Obtained 33 KMLs.
Obtained 33 regions.
Obtained 33 KMLs.
Obtaine
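
Two of the calls in the log above report `Obtained 0 regions.` with no KML line following. The sketch below shows one way such cases could be recorded while looping, so the affected countries can be reviewed later. `collect_regions` and the stub `fake_fetch` are hypothetical. In the notebook the fetch function would be `watcherAPI.get_KMLs_for_regions_in_country`, and the assumption here is that its result supports `len()` (or is `None`) when nothing is found.

```python
def collect_regions(country_codes, fetch):
    """Call fetch(code) for every code and separate non-empty from empty results."""
    found, empty = {}, []
    for code in country_codes:
        result = fetch(code)
        if result is None or len(result) == 0:
            empty.append(code)  # remember codes that returned nothing
        else:
            found[code] = result
    return found, empty


def fake_fetch(code):
    """Stub standing in for the real API call (illustrative data only)."""
    data = {"GH": ["Upper East Region", "Volta Region"], "XX": []}
    return data.get(code)


found, empty = collect_regions(["GH", "XX", "YY"], fake_fetch)
print(sorted(found), empty)  # ['GH'] ['XX', 'YY']
```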

In [9]:
# In case we want to plot a map later on.
df_countries = watcherAPI.get_KML_given_geolocation("countries", country_codes)
df_countries

Unnamed: 0,name,kml,key
0,Ghana,<Polygon><outerBoundaryIs><LinearRing><coordin...,GH
1,South Africa,<Polygon><outerBoundaryIs><LinearRing><coordin...,ZA
2,Nigeria,<Polygon><outerBoundaryIs><LinearRing><coordin...,NG
3,Kenya,<Polygon><outerBoundaryIs><LinearRing><coordin...,KE
4,Mauritius,<Polygon><outerBoundaryIs><LinearRing><coordin...,MU
5,India,<Polygon><outerBoundaryIs><LinearRing><coordin...,IN
6,Philippines,<Polygon><outerBoundaryIs><LinearRing><coordin...,PH
7,Benin,<Polygon><outerBoundaryIs><LinearRing><coordin...,BJ
8,Burkina Faso,<Polygon><outerBoundaryIs><LinearRing><coordin...,BF
9,Côte d'Ivoire,<Polygon><outerBoundaryIs><LinearRing><coordin...,CI
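
The `kml` column is truncated in the display above, so its full structure is not visible here. The sketch below assumes each value is a single namespace-free `<Polygon>` whose outer ring holds whitespace-separated `lon,lat` pairs, as in standard KML. Using only the standard library, it shows how such a string could be turned into coordinate tuples for later plotting. `kml_polygon_to_coords` is a hypothetical helper, and countries made of several polygons would need extra handling.

```python
import xml.etree.ElementTree as ET


def kml_polygon_to_coords(kml):
    """Parse the outer ring of a namespace-free KML <Polygon> into (lon, lat) tuples."""
    root = ET.fromstring(kml)
    node = root.find("./outerBoundaryIs/LinearRing/coordinates")
    if node is None or not node.text:
        return []
    coords = []
    for token in node.text.split():
        lon, lat = token.split(",")[:2]  # drop the optional altitude component
        coords.append((float(lon), float(lat)))
    return coords


# Toy polygon, not real boundary data.
toy_kml = (
    "<Polygon><outerBoundaryIs><LinearRing><coordinates>"
    "0,0 1,0 1,1 0,0"
    "</coordinates></LinearRing></outerBoundaryIs></Polygon>"
)
ring = kml_polygon_to_coords(toy_kml)
print(ring)  # [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 0.0)]
```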


In [25]:
loclist = LocationList()
loclist.get_location_list_from_df(cities, city_radius=50)

loc_regions = LocationList()
loc_regions.get_location_list_from_df(regions)
loclist.add(loc_regions)

loc_countries = LocationList()
loc_countries.get_location_list_from_df(df_countries)
loclist.add(loc_countries)

# Ages
ageList = AgeList()
ageList.add(Age(13, None))
ageList.add(Age(18, None))
ageList.add(Age(18, 24))
ageList.add(Age(25, 34))
ageList.add(Age(35, 54))
ageList.add(Age(55, None))

# Education
sl = ScholarityList()
nodegree = Scholarity.from_pre_defined_list("No Degree")
sl.add(nodegree)
highschool = Scholarity.from_pre_defined_list("High School")
sl.add(highschool)
graduated = Scholarity.from_pre_defined_list("Graduated")
sl.add(graduated)
sl.add(None)

# Gender
genders = Genders(male=True, female=True, combined=True)

# Connectivity
# Use the pre-defined "connectivity" behavior (number of users on Wifi, 2G, 3G, 4G)
# and the pre-defined "ios" behavior (number of users on iOS, Android, others)
behavior = get_predefined_behavior(option="connectivity")
ios = get_predefined_behavior(option="ios")
behavior.merge(ios)

jsonb = JSONBuilder(name="ghana_collection", age_list=ageList, location_list=loclist, genders=genders,
                    behavior_groups=behavior, scholarities=sl)

jsonb.jsonfy("ghana_collection.json", split_into_n_pieces=1)


Behavior List None is already included in the list.
Created file ghana_collection.json.


{'name': 'ghana_collection',
 'geo_locations': [{'name': 'cities',
   'values': [{'key': 832102,
     'region': 'Upper East Region',
     'region_id': 1429,
     'country_code': 'GH',
     'name': 'Bolgatanga',
     'distance_unit': 'kilometer',
     'radius': 50}],
   'location_types': ['home', 'recent']},
  {'name': 'cities',
   'values': [{'key': 831609,
     'region': 'Upper East Region',
     'region_id': 1429,
     'country_code': 'GH',
     'name': 'Bawku',
     'distance_unit': 'kilometer',
     'radius': 50}],
   'location_types': ['home', 'recent']},
  {'name': 'cities',
   'values': [{'key': 838366,
     'region': 'Upper East Region',
     'region_id': 1429,
     'country_code': 'GH',
     'name': 'Navrongo',
     'distance_unit': 'kilometer',
     'radius': 50}],
   'location_types': ['home', 'recent']},
  {'name': 'cities',
   'values': [{'key': 839764,
     'region': 'Upper East Region',
     'region_id': 1429,
     'country_code': 'GH',
     'name': 'Paga',
     'distanc

## Collecting Data

This is a very large data collection with thousands of API calls.
There are several ways to speed it up:
1. [Increasing the number of users linked to an app](https://developers.facebook.com/docs/graph-api/overview/rate-limiting#platform_rate_limits)
2. [Getting a business token](https://developers.facebook.com/docs/graph-api/overview/rate-limiting)
3. Using multiple users/tokens at the same time: this only requires multiple rows in the ``credentials.csv`` file
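
To get a feeling for the size of the collection, the number of calls can be approximated as the product of the category counts. The rough sketch below assumes one API call per combination and one call per token every `sleep_time` seconds. Both are simplifications, and the behavior count of 8 is illustrative only. `estimate_collection` is a hypothetical helper, not a pySocialWatcher function.

```python
import math


def estimate_collection(n_locations, n_ages, n_genders, n_scholarities,
                        n_behaviors, sleep_time, n_tokens=1):
    """Return (number of calls, rough lower bound on the duration in hours)."""
    n_calls = math.prod([n_locations, n_ages, n_genders, n_scholarities, n_behaviors])
    hours = n_calls * sleep_time / n_tokens / 3600
    return n_calls, hours


# 170 Ghanaian cities, 6 age groups, 3 gender options and 4 education options
# follow the cells above; n_behaviors=8 is an assumed, illustrative value.
calls, hours = estimate_collection(
    n_locations=170, n_ages=6, n_genders=3, n_scholarities=4,
    n_behaviors=8, sleep_time=7, n_tokens=1,
)
print(calls, round(hours, 1))  # 97920 190.4
```

Under these assumptions, doubling `n_tokens` halves the estimated duration, which is the motivation for the third option in the list above.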

In [None]:
# This cell performs the collection. It might take several days.
watcher = watcherAPI(api_version="9.0", sleep_time=7, outputname="ghana_collection_psw.csv.gz")
df = watcher.run_data_collection("ghana_collection.json", remove_tmp_files=True)
# Omitted for brevity

(content:post-processing-steps)=
## Post-processing step: from pySocialWatcher to something we can understand!

In [26]:
filename = "ghana_collection_psw.csv.gz"

df_in = pd.read_csv(filename)
df_fb = post_process_df_collection(df_in)

location_mapping = df_fb[["Key", "Name", "Region", "FullLocation", "LocationType"]].drop_duplicates()

We then combine the columns to obtain a dataframe in which each row holds the data for a single location.
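
The reshaping described above can be illustrated on toy data with plain pandas. In this sketch the string concatenation is only a stand-in for `combine_cols`, whose exact label format is not shown here, and the audience numbers are made up.

```python
import pandas as pd

# Long format: one row per (location, gender, age group).
long_df = pd.DataFrame({
    "Key": [1, 1, 2, 2],
    "Gender": ["male", "female", "male", "female"],
    "Ages": ["18-24", "18-24", "18-24", "18-24"],
    "mau_audience": [1200, 1500, 3000, 2800],
})

# Build one label per category combination (stand-in for combine_cols).
long_df["combo"] = long_df["Gender"] + "_" + long_df["Ages"]

# Wide format: one row per location, one column per combination.
wide_df = long_df.pivot(index="Key", columns="combo", values="mau_audience").reset_index()
wide_df.columns.name = None
print(wide_df)
```

Note that `pivot` raises an error if a `(Key, combo)` pair occurs more than once, which is why duplicates are dropped before pivoting in the cell that follows.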

In [27]:
cols_to_combine = ["Gender", "Ages"]

if "Device" in df_fb:
    cols_to_combine.append("Device")

if "Education" in df_fb:
    cols_to_combine.append("Education")

    
df_fb = combine_cols(df_fb, cols_to_combine)
df_fb = df_fb.drop_duplicates(subset=["Key", "combo"]) 

df_fb = df_fb.pivot(index="Key", columns="combo", values="mau_audience").reset_index()
df_fb = pd.merge(location_mapping, df_fb)

# Add a "fb_" prefix to every column except the location identifiers
key_mapping = {k: "fb_" + k for k in df_fb.keys() if k not in ["Name", "Region", "FullLocation", "LocationType", "Key"]}
df_fb = df_fb.rename(columns=key_mapping)
df_fb.head()

Unnamed: 0,Key,Name,Region,FullLocation,LocationType,fb_both_13-_2G_AllDegrees,fb_both_13-_2G_Graduated,fb_both_13-_2G_High School,fb_both_13-_2G_No Degree,fb_both_13-_3G_AllDegrees,...,fb_male_55-_Other_High School,fb_male_55-_Other_No Degree,fb_male_55-_Wifi_AllDegrees,fb_male_55-_Wifi_Graduated,fb_male_55-_Wifi_High School,fb_male_55-_Wifi_No Degree,fb_male_55-_iOS_AllDegrees,fb_male_55-_iOS_Graduated,fb_male_55-_iOS_High School,fb_male_55-_iOS_No Degree
0,832102,Bolgatanga,Upper East Region,"Bolgatanga, Upper East Region, GH",city,2200,1000,1000,1400,47000,...,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000
1,831609,Bawku,Upper East Region,"Bawku, Upper East Region, GH",city,1000,1000,1000,1000,16000,...,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000
2,838366,Navrongo,Upper East Region,"Navrongo, Upper East Region, GH",city,1900,1000,1000,1300,41000,...,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000
3,839764,Paga,Upper East Region,"Paga, Upper East Region, GH",city,1800,1000,1000,1200,38000,...,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000
4,832157,"Bongo, Upper East",Upper East Region,"Bongo, Upper East, Upper East Region, GH",city,1900,1000,1000,1200,45000,...,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000
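
Many cells in the table above show exactly 1000. This likely reflects a reporting floor of the audience estimates rather than a real count. That is an assumption, and it should be checked against the Marketing API documentation for the API version in use. The minimal sketch below shows how such values could be flagged as missing before analysis. It uses two columns and two rows copied from the table above, and the constant `FLOOR` is hypothetical.

```python
import pandas as pd

FLOOR = 1000  # assumed reporting floor; verify before relying on it

toy = pd.DataFrame({
    "Key": [832102, 831609],
    "fb_both_13-_2G_AllDegrees": [2200, 1000],
    "fb_both_13-_3G_AllDegrees": [47000, 16000],
})

# Replace values at or below the floor with NaN in the "fb_" columns only.
fb_cols = [c for c in toy.columns if c.startswith("fb_")]
masked = toy.copy()
masked[fb_cols] = masked[fb_cols].mask(masked[fb_cols] <= FLOOR)
print(masked)
```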


In [28]:
df_fb.to_csv("ghana_collection_16Dec2020.csv.gz")
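
Because `to_csv` writes the index by default, reading the file back without further arguments produces an extra `Unnamed: 0` column, as seen in the tables above. The short sketch below round-trips a toy dataframe through a temporary file with an illustrative file name. It relies on pandas inferring gzip compression from the `.gz` extension.

```python
import os
import tempfile

import pandas as pd

df_out = pd.DataFrame({"Key": [832102], "Name": ["Bolgatanga"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.csv.gz")
    df_out.to_csv(path)                        # gzip inferred from ".gz"; index is written
    df_raw = pd.read_csv(path)                 # index comes back as an unnamed column
    df_back = pd.read_csv(path, index_col=0)   # first column restored as the index

print(list(df_raw.columns))
print(list(df_back.columns))
```

Passing `index=False` to `to_csv` is an alternative when the index carries no information.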