(content:json_creation)=
# Creating a custom JSON file for a data collection

This notebook gives an overview of a few recent improvements that help us create a custom data collection.
Note that, for now, the only way to use these features is to install pySocialWatcher from [Joao's GitHub repo](https://github.com/joaopalotti/pySocialWatcher). We need to clone it and install it with ``python setup.py install``.

In [1]:
import pandas as pd
from pysocialwatcher import watcherAPI

# helper functions to build a json file
from pysocialwatcher.json_builder import JSONBuilder, AgeList, Age, Genders, LocationList
from pysocialwatcher.json_builder import get_predefined_behavior

## Creating a collection from a list of locations
One potential use case that we envision is creating a data collection from a list of locations (e.g., city names). In the following example, we will load a list of cities from a file (`worldcities.csv`) and use some of the pySocialWatcher tools to find these cities' internal Facebook unique identifiers.

In [2]:
# Starting from a list of locations, e.g., "worldcities.csv", which contains the most populated
# cities in the world, our first goal is to find their FB ids
df_cities = pd.read_csv("./worldcities.csv")
df_cities.head()

Unnamed: 0,city,city_ascii,lat,lng,country,iso2,iso3,admin_name,capital,population,id
0,Tokyo,Tokyo,35.685,139.7514,Japan,JP,JPN,Tōkyō,primary,35676000.0,1392685764
1,New York,New York,40.6943,-73.9249,United States,US,USA,New York,,19354922.0,1840034016
2,Mexico City,Mexico City,19.4424,-99.131,Mexico,MX,MEX,Ciudad de México,primary,19028000.0,1484247881
3,Mumbai,Mumbai,19.017,72.857,India,IN,IND,Mahārāshtra,admin,18978000.0,1356226629
4,São Paulo,Sao Paulo,-23.5587,-46.625,Brazil,BR,BRA,São Paulo,admin,18845000.0,1076532519


We always start by loading the library with:

In [3]:
watcher = watcherAPI(api_version="9.0", sleep_time=5) 
watcher.load_credentials_file("credentials.csv")

This time we use the ``sleep_time`` parameter to tell pySocialWatcher to sleep for 5 seconds after every API call before issuing the next one, instead of the default 8 seconds.
Note that we can decrease ``sleep_time`` as much as we want (e.g., try using ``0.5``). However, we risk being blocked by Facebook for issuing API calls too often. There are different token levels, but if you only have a standard one, I recommend using a ``sleep_time`` of 5 or 6 seconds.
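
The idea behind ``sleep_time`` can be pictured with a small throttling helper. The following is only a minimal sketch of the general technique, not pySocialWatcher's actual implementation: the ``Throttle`` class and its ``clock`` and ``sleep`` parameters are hypothetical names introduced here for illustration.

```python
import time


class Throttle:
    """Ensure at least `min_interval` seconds pass between consecutive calls."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the helper can be tested without waiting
        self._sleep = sleep
        self._last_call = None   # time of the previous call, None before the first one

    def wait(self):
        """Block until the next call is allowed, then record the call time."""
        if self._last_call is not None:
            remaining = self.min_interval - (self._clock() - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
        self._last_call = self._clock()


# Tiny interval so the demo finishes quickly; the text above suggests 5 or 6 seconds in practice.
throttle = Throttle(min_interval=0.01)
for query in ["Tokyo", "New York", "Mumbai"]:
    throttle.wait()
    print("would issue an API call for", query)  # placeholder for a real request
```

The first call goes through immediately; every later call waits only for whatever part of the interval has not yet elapsed.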

There are multiple ways to find out the internal FB ID of a location, and one of them is using the pySocialWatcher function ``watcherAPI.get_geo_locations_given_query_and_location_type``.

In the following example, we create a helper function ``find_most_likely_city_given_name``, which queries the Graph API for cities with a given name and returns the top match.

In [4]:
def find_most_likely_city_given_name(cityname):
    # Query the Graph API for cities matching `cityname` and keep only the top match
    df = watcherAPI.get_geo_locations_given_query_and_location_type(cityname, "city", limit=1)
    if df.empty:
        return None
    top = df.iloc[0]
    # "name", "region" and "region_id" may be missing for some locations; .get() returns None then
    return (top.get("name"), top["key"], top.get("region"), top.get("region_id"),
            top["country_name"], top["country_code"])

In [5]:
find_most_likely_city_given_name("Belo Horizonte")

('Belo Horizonte', '244661', 'Minas Gerais', 449, 'Brazil', 'BR')

In [6]:
find_most_likely_city_given_name("Washington")

('Washington', '2516005', 'Rhode Island', 3882, 'United States', 'US')

In [7]:
find_most_likely_city_given_name("Washington DC")

('Washington D. C.',
 '2427178',
 'Washington, District of Columbia',
 3851,
 'United States',
 'US')

This simple example shows that this function is not perfect, but it is good enough for this tutorial. Some of the problems here can be alleviated by looking beyond the first match retrieved by the Facebook Graph API or by making the query more specific (e.g., appending the state and country names to the city name).

The second element in the tuple returned by ``find_most_likely_city_given_name`` is the FB identifier for that city (assuming it is a correct match). In general, we highly recommend you spend some time inspecting the full list of retrieved locations instead of only relying on the top one.
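
One way to look beyond the first match is to filter a list of candidates by what we already know about the city. The sketch below is hypothetical: ``pick_best_match`` is not part of pySocialWatcher, and the candidate list is assembled by hand from the two "Washington" outputs shown above, so the result list of a single real query may look different.

```python
def pick_best_match(candidates, country_code=None, region_hint=None):
    """Return the first candidate compatible with the given hints, or None."""
    for candidate in candidates:
        if country_code and candidate.get("country_code") != country_code:
            continue
        region = (candidate.get("region") or "").lower()
        if region_hint and region_hint.lower() not in region:
            continue
        return candidate
    return None


# Values copied from the outputs above; in practice they would come from a query with limit > 1.
candidates = [
    {"name": "Washington", "key": "2516005",
     "region": "Rhode Island", "country_code": "US"},
    {"name": "Washington D. C.", "key": "2427178",
     "region": "Washington, District of Columbia", "country_code": "US"},
]

best = pick_best_match(candidates, country_code="US", region_hint="District of Columbia")
print(best["name"], best["key"])  # Washington D. C. 2427178
```

Candidates are checked in the order returned, so the ranking of the search is preserved among the entries that pass the filters.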

In [8]:
# NOTE: the following code might need hours to process depending on the kind of token that you have.
# To shorten this example, we are only collecting the information for the top 5 cities in the list.
df_cities["fb_query"] = df_cities[["city", "iso2"]].apply(lambda x: "%s" % x["city"], axis=1)

df_cities = df_cities.head(5)
df_cities

Unnamed: 0,city,city_ascii,lat,lng,country,iso2,iso3,admin_name,capital,population,id,fb_query
0,Tokyo,Tokyo,35.685,139.7514,Japan,JP,JPN,Tōkyō,primary,35676000.0,1392685764,Tokyo
1,New York,New York,40.6943,-73.9249,United States,US,USA,New York,,19354922.0,1840034016,New York
2,Mexico City,Mexico City,19.4424,-99.131,Mexico,MX,MEX,Ciudad de México,primary,19028000.0,1484247881,Mexico City
3,Mumbai,Mumbai,19.017,72.857,India,IN,IND,Mahārāshtra,admin,18978000.0,1356226629,Mumbai
4,São Paulo,Sao Paulo,-23.5587,-46.625,Brazil,BR,BRA,São Paulo,admin,18845000.0,1076532519,São Paulo
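
The ``fb_query`` column above contains only the city name. As noted earlier, a more specific query (city plus state and country) may reduce wrong matches. The helper below is a hypothetical sketch of how such a query string could be assembled from the ``city``, ``admin_name`` and ``country`` columns; ``build_fb_query`` is not part of pySocialWatcher, and whether the longer query actually improves the top match has to be checked by hand.

```python
def build_fb_query(city, admin_name=None, country=None):
    """Join the non-empty parts of a location into a single query string."""
    parts = []
    for part in (city, admin_name, country):
        if part is None:
            continue
        if isinstance(part, float) and part != part:  # NaN, as produced for empty CSV cells
            continue
        text = str(part).strip()
        if text:
            parts.append(text)
    return ", ".join(parts)


print(build_fb_query("Tokyo", "Tōkyō", "Japan"))  # Tokyo, Tōkyō, Japan
print(build_fb_query("Mumbai", None, "India"))    # Mumbai, India
```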


In [9]:
# The following code will loop over the list and issue one FB API call per row.
# The results will be saved in the same data frame.
rows = []
for idx, row in df_cities.iterrows():
    print(idx, row["fb_query"])
    r = find_most_likely_city_given_name(row["fb_query"])
    print(idx, row["fb_query"], r)
    if r:
        row = {"idx": idx, "name": r[0], "key": r[1], "region": r[2], "region_id": r[3],
               "country_name": r[4], "country_code": r[5]}
        rows.append(row)

df_cities = pd.merge(df_cities, pd.DataFrame(rows).set_index("idx"), left_index=True, right_index=True, how="outer")
df_cities["type"] = "city"

# Optionally, we can save the output with the FB keys locally
df_cities.to_csv("worldcities_fb_keys.csv", index=False)

0 Tokyo
0 Tokyo ('Minato-ku', '2880782', 'Tokyo', 1922, 'Japan', 'JP')
1 New York
1 New York ('New York', '2490299', 'New York', 3875, 'United States', 'US')
2 Mexico City
2 Mexico City ('Mexico City', '2673660', 'Distrito Federal', 2513, 'Mexico', 'MX')
3 Mumbai
3 Mumbai ('Mumbai', '1035921', 'Maharashtra', 1735, 'India', 'IN')
4 São Paulo
4 São Paulo ('São Paulo', '269969', 'São Paulo (state)', 460, 'Brazil', 'BR')


In [10]:
df_cities.head(5) # Note that only Minato-ku is collected, as if it were the whole of Tokyo.

Unnamed: 0,city,city_ascii,lat,lng,country,iso2,iso3,admin_name,capital,population,id,fb_query,name,key,region,region_id,country_name,country_code,type
0,Tokyo,Tokyo,35.685,139.7514,Japan,JP,JPN,Tōkyō,primary,35676000.0,1392685764,Tokyo,Minato-ku,2880782,Tokyo,1922,Japan,JP,city
1,New York,New York,40.6943,-73.9249,United States,US,USA,New York,,19354922.0,1840034016,New York,New York,2490299,New York,3875,United States,US,city
2,Mexico City,Mexico City,19.4424,-99.131,Mexico,MX,MEX,Ciudad de México,primary,19028000.0,1484247881,Mexico City,Mexico City,2673660,Distrito Federal,2513,Mexico,MX,city
3,Mumbai,Mumbai,19.017,72.857,India,IN,IND,Mahārāshtra,admin,18978000.0,1356226629,Mumbai,Mumbai,1035921,Maharashtra,1735,India,IN,city
4,São Paulo,Sao Paulo,-23.5587,-46.625,Brazil,BR,BRA,São Paulo,admin,18845000.0,1076532519,São Paulo,São Paulo,269969,São Paulo (state),460,Brazil,BR,city


(content:listing_all_cities_states_in_a_country_region)=
## Listing all cities/states in a country/region

Another typical use case is collecting data for all cities in a region or all regions in a country.
``pySocialWatcher`` has a couple of functions that might help us to get started.

For example, ``watcherAPI.get_KMLs_for_regions_in_country`` allows us to retrieve all the regions of a given country:

In [7]:
all_regions = watcherAPI.get_KMLs_for_regions_in_country("US")
all_regions.head(10)

Obtained 51 regions.
Obtained 51 KMLs.


Unnamed: 0,key,name,type,country_code,country_name,supports_region,supports_city,kml
0,3866,Minnesota,region,US,United States,True,True,<Polygon><outerBoundaryIs><LinearRing><coordin...
1,3855,Idaho,region,US,United States,True,True,<Polygon><outerBoundaryIs><LinearRing><coordin...
2,3856,Illinois,region,US,United States,True,True,<Polygon><outerBoundaryIs><LinearRing><coordin...
3,3864,Massachusetts,region,US,United States,True,True,<Polygon><outerBoundaryIs><LinearRing><coordin...
4,3846,Arkansas,region,US,United States,True,True,<Polygon><outerBoundaryIs><LinearRing><coordin...
5,3886,Texas,region,US,United States,True,True,<Polygon><outerBoundaryIs><LinearRing><coordin...
6,3843,Alabama,region,US,United States,True,True,<Polygon><outerBoundaryIs><LinearRing><coordin...
7,3882,Rhode Island,region,US,United States,True,True,<Polygon><outerBoundaryIs><LinearRing><coordin...
8,3862,Maine,region,US,United States,True,True,<Polygon><outerBoundaryIs><LinearRing><coordin...
9,3873,New Jersey,region,US,United States,True,True,<Polygon><outerBoundaryIs><LinearRing><coordin...


Note that ``watcherAPI.get_KMLs_for_regions_in_country`` also retrieves information on how to draw each location on a map using the [Keyhole Markup Language (KML)](https://en.wikipedia.org/wiki/Keyhole_Markup_Language) format.
This will be very useful later when we plot maps.
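
To give an idea of what the ``kml`` column holds, the sketch below parses a KML polygon into a list of (longitude, latitude) pairs with the standard library. It rests on several assumptions: the string contains a single ``<Polygon>`` without an XML namespace, the coordinates in the example are made up, and ``kml_polygon_to_points`` is a hypothetical helper, not a pySocialWatcher function. Real values may contain several polygons and would need extra handling.

```python
import xml.etree.ElementTree as ET

# Made-up polygon with the same element structure as the truncated values shown above.
kml = ("<Polygon><outerBoundaryIs><LinearRing><coordinates>"
       "-71.0,42.0 -70.0,42.0 -70.0,43.0 -71.0,42.0"
       "</coordinates></LinearRing></outerBoundaryIs></Polygon>")


def kml_polygon_to_points(kml_text):
    """Return the outer boundary of a single KML polygon as (lon, lat) tuples."""
    root = ET.fromstring(kml_text)
    coordinates = root.find("./outerBoundaryIs/LinearRing/coordinates")
    points = []
    # KML separates tuples with whitespace; each tuple is lon,lat with an optional altitude.
    for chunk in coordinates.text.split():
        lon, lat = chunk.split(",")[:2]
        points.append((float(lon), float(lat)))
    return points


points = kml_polygon_to_points(kml)
print(points[0])  # (-71.0, 42.0)
```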

Similarly, we can get the KML of countries using ``watcherAPI.get_KML_given_geolocation``:

In [12]:
watcherAPI.get_KML_given_geolocation("countries", ["US"])

Unnamed: 0,name,kml,key
0,United States,<Polygon><outerBoundaryIs><LinearRing><coordin...,US


Last but not least, there is a function to help obtain all (or as many as possible) cities in a given region.

Unfortunately, this is not a trivial task. At the time of writing, this functionality is only provided by ``watcher.get_all_cities_given_country_code``, a hack built upon the basic search functionality of the Facebook Graph API exposed by the function ``watcher.get_geo_locations_given_query_and_location_type``.

First, let's inspect what can be extracted using ``watcher.get_geo_locations_given_query_and_location_type``.

In [13]:
# The function below queries the FB Graph API for "cities" that have "bos" in their names and are located
# in Massachusetts (region_id=3864), United States. 
watcher.get_geo_locations_given_query_and_location_type("bos", ["city"], country_code="US", region_id=3864)

Unnamed: 0,key,name,type,country_code,country_name,region,region_id,supports_region,supports_city,geo_hierarchy_level,geo_hierarchy_name
0,2464828,Boston,city,US,United States,Massachusetts,3864,True,True,,
1,2731172,Readville,neighborhood,US,United States,Massachusetts,3864,True,True,NEIGHBORHOOD,NEIGHBORHOOD
2,2732785,West End,neighborhood,US,United States,Massachusetts,3864,True,True,NEIGHBORHOOD,NEIGHBORHOOD
3,2732101,Deer Island,neighborhood,US,United States,Massachusetts,3864,True,True,NEIGHBORHOOD,NEIGHBORHOOD
4,2732925,North End,neighborhood,US,United States,Massachusetts,3864,True,True,NEIGHBORHOOD,NEIGHBORHOOD
5,2466700,West New Boston,subcity,US,United States,Massachusetts,3864,True,True,SUBCITY,CITY
6,2465869,New Boston,city,US,United States,Massachusetts,3864,True,True,,
7,2732248,Bellevue Hill,neighborhood,US,United States,Massachusetts,3864,True,True,NEIGHBORHOOD,NEIGHBORHOOD
8,2731737,Brook Farm,neighborhood,US,United States,Massachusetts,3864,True,True,NEIGHBORHOOD,NEIGHBORHOOD
9,2733152,Ashmont,neighborhood,US,United States,Massachusetts,3864,True,True,NEIGHBORHOOD,NEIGHBORHOOD


As we can see, for locations in the USA, API version 9.0 retrieves cities, subcities, and neighborhoods.

Although we only asked for cities whose names contain (not necessarily start with) the string _"bos"_ (Boston, New Boston), the API also retrieved neighborhoods in these cities (e.g., Readville, West End, and North End in Boston).

Built upon ``watcher.get_geo_locations_given_query_and_location_type``, the function ``watcher.get_all_cities_given_country_code`` queries Facebook for all the states in a country and then, for each state, queries for all cities that contain an _"a"_, all cities that contain a _"b"_, and so on.

When too many results match a specific search, Facebook throws an error instead of returning the results. In this situation, we are forced to narrow our search by including another letter in the current search. For example, if there are too many cities that contain an _"a"_ in a given state, we have to narrow our search by including a second letter, e.g., searching for all cities in that state that contain _"aa"_, _"ab"_, _"ac"_, and so on. Overall, this is a slow process, but it only needs to be done once.
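
The narrowing strategy described above can be sketched without any network access. Everything in the block below is hypothetical and only mimics the behavior described in the text: ``fake_search`` stands in for the remote search and fails when more than ``limit`` names match, ``TooManyResults`` is an invented exception, and the place names are made up. One limitation of this simplified version is stated in the code: a name in which the query occurs only at the very end is not reached by appending letters, so a complete implementation would need to handle that case separately.

```python
import string


class TooManyResults(Exception):
    """Raised by the stand-in search when a query matches too many names."""


# Made-up names, used only to exercise the sketch.
NAMES = ["alma", "alto", "avon", "bala", "bath", "salem", "adams"]


def fake_search(query, limit=3):
    """Stand-in for a remote search: substring match that fails on large result sets."""
    hits = [name for name in NAMES if query in name]
    if len(hits) > limit:
        raise TooManyResults(query)
    return hits


def collect(query, search=fake_search, alphabet=string.ascii_lowercase):
    """Return the names found for `query`, appending one more letter when the search fails.

    Limitation: names where `query` appears only as the final characters are not
    matched by any `query + letter` search.
    """
    try:
        return set(search(query))
    except TooManyResults:
        found = set()
        for letter in alphabet:
            found |= collect(query + letter, search, alphabet)
        return found


def collect_all(search=fake_search, alphabet=string.ascii_lowercase):
    """Start with every single letter, as described in the text above."""
    found = set()
    for letter in alphabet:
        found |= collect(letter, search, alphabet)
    return found


print(sorted(collect_all()))
```

Results are accumulated in a set because the same name is usually returned by several different queries.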

In [8]:
# This function takes a long time to run; use it carefully and save the results afterwards.
watcher.get_all_cities_given_country_code("US")

(content:jsonbuilder)=
## JSONBuilder: building a json file for a collection with different disaggregations (Locations, Age, Gender, etc.)

``JSONBuilder`` is the class used by `pySocialWatcher` to create a valid json file with all the configuration required to start our data collection.
It requires a set of objects that need to be initialized beforehand (``LocationList``, ``AgeList`` and ``Genders``) and allows the configuration of some optional objects (``behavior_groups`` and ``scholarities``).


Starting with ``LocationList``, a list of locations can be loaded directly from a dataframe like the `df_cities` that we created earlier.

In [15]:
df_cities

Unnamed: 0,city,city_ascii,lat,lng,country,iso2,iso3,admin_name,capital,population,id,fb_query,name,key,region,region_id,country_name,country_code,type
0,Tokyo,Tokyo,35.685,139.7514,Japan,JP,JPN,Tōkyō,primary,35676000.0,1392685764,Tokyo,Minato-ku,2880782,Tokyo,1922,Japan,JP,city
1,New York,New York,40.6943,-73.9249,United States,US,USA,New York,,19354922.0,1840034016,New York,New York,2490299,New York,3875,United States,US,city
2,Mexico City,Mexico City,19.4424,-99.131,Mexico,MX,MEX,Ciudad de México,primary,19028000.0,1484247881,Mexico City,Mexico City,2673660,Distrito Federal,2513,Mexico,MX,city
3,Mumbai,Mumbai,19.017,72.857,India,IN,IND,Mahārāshtra,admin,18978000.0,1356226629,Mumbai,Mumbai,1035921,Maharashtra,1735,India,IN,city
4,São Paulo,Sao Paulo,-23.5587,-46.625,Brazil,BR,BRA,São Paulo,admin,18845000.0,1076532519,São Paulo,São Paulo,269969,São Paulo (state),460,Brazil,BR,city


The required columns are ``key`` (with the FB ids) and ``name``. Other columns like ``region_id`` and ``country_code`` can be loaded as well if they are available.

In [16]:
loclist = LocationList()
loclist.get_location_list_from_df(df_cities)

The gender and age groups that we are interested in can be specified as follows:

In [17]:
# Age Groups
ageList = AgeList()
ageList.add(Age(18, None))
ageList.add(Age(18, 40))
ageList.add(Age(41, 54))
ageList.add(Age(55, None))

# Gender
genders = Genders(male=True, female=True, combined=True)

Two predefined user behaviors are supported by `pySocialWatcher`:
- **'connectivity'**: retrieves information on the primary network used by people to connect to Facebook. Options are 2G, 3G, 4G or Wifi.
- **'phones'**: retrieves information on the kind of mobile device primarily used to connect to Facebook. Options are iOS, Android, or other.

In [18]:
connectivity = get_predefined_behavior(option="connectivity")

Now we just need to put everything together in a `JSONBuilder` object and save it to a file with the ``jsonfy`` method.

In [19]:
jsonb = JSONBuilder(name="test", age_list=ageList, location_list=loclist,
                    genders=genders, behavior_groups=connectivity)

jsonb.jsonfy("top5_cities.json")

Created file top5_cities.json.


{'name': 'test',
 'geo_locations': [{'name': 'cities',
   'values': [{'key': 2880782,
     'region': 'Tokyo',
     'region_id': 1922,
     'country_code': 'JP',
     'name': 'Minato-ku',
     'distance_unit': 'kilometer',
     'radius': 0}],
   'location_types': ['home', 'recent']},
  {'name': 'cities',
   'values': [{'key': 2490299,
     'region': 'New York',
     'region_id': 3875,
     'country_code': 'US',
     'name': 'New York',
     'distance_unit': 'kilometer',
     'radius': 0}],
   'location_types': ['home', 'recent']},
  {'name': 'cities',
   'values': [{'key': 2673660,
     'region': 'Distrito Federal',
     'region_id': 2513,
     'country_code': 'MX',
     'name': 'Mexico City',
     'distance_unit': 'kilometer',
     'radius': 0}],
   'location_types': ['home', 'recent']},
  {'name': 'cities',
   'values': [{'key': 1035921,
     'region': 'Maharashtra',
     'region_id': 1735,
     'country_code': 'IN',
     'name': 'Mumbai',
     'distance_unit': 'kilometer',
     'radi

That is it. If everything went fine, we can start our collection with the file above.
Occasionally, manual adjustments will be needed, but `JSONBuilder` already does a large part of the work.

In [3]:
# Now we can start a data collection with the following lines of code:
watcher = watcherAPI(api_version="9.0", sleep_time=5, outputname="output_psw_top5_cities.csv") 
watcher.load_credentials_file("credentials.csv")
df = watcher.run_data_collection("top5_cities.json", remove_tmp_files=True)

# This collection has 300 API calls; to speed it up, we are using sleep_time = 5.

2021-02-24 14:28:02 donna root[119238] INFO Building Collection Dataframe
2021-02-24 14:28:02 donna root[119238] INFO Total API Requests:300
2021-02-24 14:28:02 donna root[119238] INFO Completed: 0.00
2021-02-24 14:28:02 donna root[119238] INFO Completed: 0.33
2021-02-24 14:28:02 donna root[119238] INFO Completed: 0.67
2021-02-24 14:28:02 donna root[119238] INFO Completed: 1.00
2021-02-24 14:28:02 donna root[119238] INFO Completed: 1.33
2021-02-24 14:28:02 donna root[119238] INFO Completed: 1.67
2021-02-24 14:28:02 donna root[119238] INFO Completed: 2.00
2021-02-24 14:28:02 donna root[119238] INFO Completed: 2.33
2021-02-24 14:28:02 donna root[119238] INFO Completed: 2.67
2021-02-24 14:28:02 donna root[119238] INFO Completed: 3.00
2021-02-24 14:28:02 donna root[119238] INFO Completed: 3.33
2021-02-24 14:28:02 donna root[119238] INFO Completed: 3.67
2021-02-24 14:28:02 donna root[119238] INFO Completed: 4.00
2021-02-24 14:28:02 donna root[119238] INFO Completed: 4.33
2021-02-24 14:28:02

2021-02-24 14:28:02 donna root[119238] INFO Completed: 20.00
2021-02-24 14:28:02 donna root[119238] INFO Completed: 20.33
2021-02-24 14:28:02 donna root[119238] INFO Completed: 20.67
2021-02-24 14:28:02 donna root[119238] INFO Completed: 21.00
2021-02-24 14:28:02 donna root[119238] INFO Completed: 21.33
2021-02-24 14:28:02 donna root[119238] INFO Completed: 21.67
2021-02-24 14:28:02 donna root[119238] INFO Completed: 22.00
2021-02-24 14:28:02 donna root[119238] INFO Completed: 22.33
2021-02-24 14:28:02 donna root[119238] INFO Completed: 22.67
2021-02-24 14:28:02 donna root[119238] INFO Completed: 23.00
2021-02-24 14:28:02 donna root[119238] INFO Completed: 23.33
2021-02-24 14:28:02 donna root[119238] INFO Completed: 23.67
2021-02-24 14:28:02 donna root[119238] INFO Completed: 24.00
2021-02-24 14:28:02 donna root[119238] INFO Completed: 24.33
2021-02-24 14:28:02 donna root[119238] INFO Completed: 24.67
2021-02-24 14:28:02 donna root[119238] INFO Completed: 25.00
2021-02-24 14:28:02 donn

2021-02-24 14:28:02 donna root[119238] INFO Completed: 41.33
2021-02-24 14:28:02 donna root[119238] INFO Completed: 41.67
2021-02-24 14:28:02 donna root[119238] INFO Completed: 42.00
2021-02-24 14:28:02 donna root[119238] INFO Completed: 42.33
2021-02-24 14:28:02 donna root[119238] INFO Completed: 42.67
2021-02-24 14:28:02 donna root[119238] INFO Completed: 43.00
2021-02-24 14:28:02 donna root[119238] INFO Completed: 43.33
2021-02-24 14:28:02 donna root[119238] INFO Completed: 43.67
2021-02-24 14:28:02 donna root[119238] INFO Completed: 44.00
2021-02-24 14:28:02 donna root[119238] INFO Completed: 44.33
2021-02-24 14:28:02 donna root[119238] INFO Completed: 44.67
2021-02-24 14:28:02 donna root[119238] INFO Completed: 45.00
2021-02-24 14:28:02 donna root[119238] INFO Completed: 45.33
2021-02-24 14:28:02 donna root[119238] INFO Completed: 45.67
2021-02-24 14:28:02 donna root[119238] INFO Completed: 46.00
2021-02-24 14:28:02 donna root[119238] INFO Completed: 46.33
2021-02-24 14:28:02 donn

2021-02-24 14:28:02 donna root[119238] INFO Completed: 63.00
2021-02-24 14:28:02 donna root[119238] INFO Completed: 63.33
2021-02-24 14:28:02 donna root[119238] INFO Completed: 63.67
2021-02-24 14:28:02 donna root[119238] INFO Completed: 64.00
2021-02-24 14:28:02 donna root[119238] INFO Completed: 64.33
2021-02-24 14:28:02 donna root[119238] INFO Completed: 64.67
2021-02-24 14:28:02 donna root[119238] INFO Completed: 65.00
2021-02-24 14:28:02 donna root[119238] INFO Completed: 65.33
2021-02-24 14:28:02 donna root[119238] INFO Completed: 65.67
2021-02-24 14:28:02 donna root[119238] INFO Completed: 66.00
2021-02-24 14:28:02 donna root[119238] INFO Completed: 66.33
2021-02-24 14:28:02 donna root[119238] INFO Completed: 66.67
2021-02-24 14:28:02 donna root[119238] INFO Completed: 67.00
2021-02-24 14:28:02 donna root[119238] INFO Completed: 67.33
2021-02-24 14:28:02 donna root[119238] INFO Completed: 67.67
2021-02-24 14:28:02 donna root[119238] INFO Completed: 68.00
2021-02-24 14:28:02 donn

2021-02-24 14:28:02 donna root[119238] INFO Completed: 84.33
2021-02-24 14:28:02 donna root[119238] INFO Completed: 84.67
2021-02-24 14:28:02 donna root[119238] INFO Completed: 85.00
2021-02-24 14:28:02 donna root[119238] INFO Completed: 85.33
2021-02-24 14:28:02 donna root[119238] INFO Completed: 85.67
2021-02-24 14:28:02 donna root[119238] INFO Completed: 86.00
2021-02-24 14:28:02 donna root[119238] INFO Completed: 86.33
2021-02-24 14:28:02 donna root[119238] INFO Completed: 86.67
2021-02-24 14:28:02 donna root[119238] INFO Completed: 87.00
2021-02-24 14:28:02 donna root[119238] INFO Completed: 87.33
2021-02-24 14:28:02 donna root[119238] INFO Completed: 87.67
2021-02-24 14:28:02 donna root[119238] INFO Completed: 88.00
2021-02-24 14:28:02 donna root[119238] INFO Completed: 88.33
2021-02-24 14:28:02 donna root[119238] INFO Completed: 88.67
2021-02-24 14:28:02 donna root[119238] INFO Completed: 89.00
2021-02-24 14:28:02 donna root[119238] INFO Completed: 89.33
2021-02-24 14:28:02 donn

2021-02-24 14:28:08 donna root[119238] INFO Collecting... Completed: 1.67% , 5/300
2021-02-24 14:28:15 donna root[119238] INFO Collecting... Completed: 3.33% , 10/300


2021-02-24 14:28:23 donna root[119238] INFO Collecting... Completed: 5.00% , 15/300
2021-02-24 14:28:29 donna root[119238] INFO Collecting... Completed: 6.67% , 20/300
2021-02-24 14:28:36 donna root[119238] INFO Collecting... Completed: 8.33% , 25/300


2021-02-24 14:28:42 donna root[119238] INFO Collecting... Completed: 10.00% , 30/300
2021-02-24 14:28:48 donna root[119238] INFO Collecting... Completed: 11.67% , 35/300


2021-02-24 14:28:54 donna root[119238] INFO Collecting... Completed: 13.33% , 40/300
2021-02-24 14:29:01 donna root[119238] INFO Collecting... Completed: 15.00% , 45/300
2021-02-24 14:29:07 donna root[119238] INFO Collecting... Completed: 16.67% , 50/300


2021-02-24 14:29:14 donna root[119238] INFO Collecting... Completed: 18.33% , 55/300
2021-02-24 14:29:20 donna root[119238] INFO Collecting... Completed: 20.00% , 60/300


2021-02-24 14:29:26 donna root[119238] INFO Collecting... Completed: 21.67% , 65/300
2021-02-24 14:29:32 donna root[119238] INFO Collecting... Completed: 23.33% , 70/300


2021-02-24 14:29:38 donna root[119238] INFO Collecting... Completed: 25.00% , 75/300
2021-02-24 14:29:45 donna root[119238] INFO Collecting... Completed: 26.67% , 80/300
2021-02-24 14:29:51 donna root[119238] INFO Collecting... Completed: 28.33% , 85/300


2021-02-24 14:29:56 donna root[119238] INFO Collecting... Completed: 30.00% , 90/300
2021-02-24 14:30:03 donna root[119238] INFO Collecting... Completed: 31.67% , 95/300


2021-02-24 14:30:11 donna root[119238] INFO Collecting... Completed: 33.33% , 100/300
2021-02-24 14:30:17 donna root[119238] INFO Collecting... Completed: 35.00% , 105/300
2021-02-24 14:30:23 donna root[119238] INFO Collecting... Completed: 36.67% , 110/300


2021-02-24 14:30:29 donna root[119238] INFO Collecting... Completed: 38.33% , 115/300
2021-02-24 14:30:36 donna root[119238] INFO Collecting... Completed: 40.00% , 120/300


2021-02-24 14:30:42 donna root[119238] INFO Collecting... Completed: 41.67% , 125/300
2021-02-24 14:30:48 donna root[119238] INFO Collecting... Completed: 43.33% , 130/300


2021-02-24 14:30:55 donna root[119238] INFO Collecting... Completed: 45.00% , 135/300
2021-02-24 14:31:01 donna root[119238] INFO Collecting... Completed: 46.67% , 140/300
2021-02-24 14:31:07 donna root[119238] INFO Collecting... Completed: 48.33% , 145/300


2021-02-24 14:31:13 donna root[119238] INFO Collecting... Completed: 50.00% , 150/300
2021-02-24 14:31:19 donna root[119238] INFO Collecting... Completed: 51.67% , 155/300


2021-02-24 14:31:28 donna root[119238] INFO Collecting... Completed: 53.33% , 160/300
2021-02-24 14:31:34 donna root[119238] INFO Collecting... Completed: 55.00% , 165/300
2021-02-24 14:31:40 donna root[119238] INFO Collecting... Completed: 56.67% , 170/300


2021-02-24 14:31:47 donna root[119238] INFO Collecting... Completed: 58.33% , 175/300
2021-02-24 14:31:54 donna root[119238] INFO Collecting... Completed: 60.00% , 180/300


2021-02-24 14:32:01 donna root[119238] INFO Collecting... Completed: 61.67% , 185/300
2021-02-24 14:32:09 donna root[119238] INFO Collecting... Completed: 63.33% , 190/300


2021-02-24 14:32:14 donna root[119238] INFO Collecting... Completed: 65.00% , 195/300
2021-02-24 14:32:20 donna root[119238] INFO Collecting... Completed: 66.67% , 200/300
2021-02-24 14:32:25 donna root[119238] INFO Collecting... Completed: 68.33% , 205/300


2021-02-24 14:32:32 donna root[119238] INFO Collecting... Completed: 70.00% , 210/300
2021-02-24 14:32:38 donna root[119238] INFO Collecting... Completed: 71.67% , 215/300


2021-02-24 14:32:44 donna root[119238] INFO Collecting... Completed: 73.33% , 220/300
2021-02-24 14:32:50 donna root[119238] INFO Collecting... Completed: 75.00% , 225/300
2021-02-24 14:32:56 donna root[119238] INFO Collecting... Completed: 76.67% , 230/300


2021-02-24 14:33:03 donna root[119238] INFO Collecting... Completed: 78.33% , 235/300
2021-02-24 14:33:08 donna root[119238] INFO Collecting... Completed: 80.00% , 240/300


2021-02-24 14:33:14 donna root[119238] INFO Collecting... Completed: 81.67% , 245/300
2021-02-24 14:33:20 donna root[119238] INFO Collecting... Completed: 83.33% , 250/300


2021-02-24 14:33:26 donna root[119238] INFO Collecting... Completed: 85.00% , 255/300
2021-02-24 14:33:31 donna root[119238] INFO Collecting... Completed: 86.67% , 260/300
2021-02-24 14:33:37 donna root[119238] INFO Collecting... Completed: 88.33% , 265/300


2021-02-24 14:33:43 donna root[119238] INFO Collecting... Completed: 90.00% , 270/300
2021-02-24 14:33:49 donna root[119238] INFO Collecting... Completed: 91.67% , 275/300


2021-02-24 14:33:55 donna root[119238] INFO Collecting... Completed: 93.33% , 280/300
2021-02-24 14:34:01 donna root[119238] INFO Collecting... Completed: 95.00% , 285/300
2021-02-24 14:34:07 donna root[119238] INFO Collecting... Completed: 96.67% , 290/300


2021-02-24 14:34:14 donna root[119238] INFO Collecting... Completed: 98.33% , 295/300
2021-02-24 14:34:22 donna root[119238] INFO Saving temporary file: dataframe_collecting_1614166082.csv.gz
2021-02-24 14:34:22 donna root[119238] INFO Data Collection Complete
2021-02-24 14:34:22 donna root[119238] INFO Saving temporary file: dataframe_collecting_1614166082.csv.gz
2021-02-24 14:34:22 donna root[119238] INFO Computing Audience and DAU column
2021-02-24 14:34:22 donna root[119238] INFO Saving after collecting file: output_psw_top5_cities.csv
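
The total of 300 API requests reported in the log above is consistent with one request per combination of location, age group, gender, and behavior option. The sketch below only illustrates this arithmetic. It assumes that the behavior dimension contributes five options (the four connectivity types plus one unfiltered baseline), which is an inference from the total and not something confirmed by the pySocialWatcher documentation.

```python
from itertools import product

# Values taken from the cells above.
locations = ["Minato-ku", "New York", "Mexico City", "Mumbai", "São Paulo"]
age_groups = [(18, None), (18, 40), (41, 54), (55, None)]
genders = ["male", "female", "combined"]

# Assumption: four connectivity options plus one baseline without a behavior filter.
behaviors = [None, "2G", "3G", "4G", "Wifi"]

combinations = list(product(locations, age_groups, genders, behaviors))
print(len(combinations))  # 300
```

Because the count is a product, adding one more age group or doubling the number of locations scales the number of requests, and therefore the collection time, accordingly.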


In the [next notebook](content:post_process_collection), we will see how to post-process this collection into a human-readable format.