5. Collecting and storing data

In this section, we perform the data collection defined in the previous section, picking up where we left off: creating dictionaries that map every attribute value we are interested in to its encoding.

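# one entry per country, each holding its encoded location
location_dict = {loc: [encodeFacet([loc], kind='locations')] for loc in location}

# replace each group of genders/degrees with its encoding;
# the extra 'All' key maps to [None], meant as "do not filter on this attribute"
gender_dict = {k: [encodeFacet(genders, kind='genders')] for k, genders in gender_dict.items()}
gender_dict['All'] = [None]

degrees_dict = {k: [encodeFacet(degrees, kind='degrees')] for k, degrees in degrees_dict.items()}
degrees_dict['All'] = [None]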
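The gender and degree dictionaries are built with the same two steps: encode each group, then add an 'All' key mapping to [None], which presumably tells createRequestDataForAudienceCounts not to filter on that attribute. If more attributes are added later, a small helper can capture the pattern. The sketch below assumes a hypothetical build_lookup helper and uses a stand-in fake_encode function in place of encodeFacet, so it runs without the API code.

```python
def build_lookup(groups, encode, add_all=True):
    """Map each group name to a one-element list holding its encoding.

    groups:  dict of group name -> list of raw attribute values
    encode:  callable turning such a list into a single encoded value
    add_all: if True, add an 'All' entry whose value [None] stands for "no filter"
    """
    lookup = {name: [encode(values)] for name, values in groups.items()}
    if add_all:
        lookup["All"] = [None]
    return lookup


# Hypothetical stand-in for encodeFacet, so the sketch runs on its own
def fake_encode(values):
    return "+".join(values)


print(build_lookup({"Undergraduate & Postgraduate": ["bachelor", "master"]}, fake_encode))
# {'Undergraduate & Postgraduate': ['bachelor+master'], 'All': [None]}
```

With the real encoder, one would call, for example, build_lookup(gender_dict, lambda g: encodeFacet(g, kind='genders')). Passing {loc: [loc] for loc in location} with add_all=False reproduces location_dict, which has no 'All' entry.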

There are several ways to proceed with the data collection at this point. Here, we first build a list of tuples containing all our segments, that is, every combination of location, gender and degree.

import itertools
import pandas as pd

# iterating a dict yields its keys, so this combines the attribute names
segments = list(itertools.product(location_dict, gender_dict, degrees_dict))
print(segments)
[('Nigeria', 'Male', 'High School Diploma'), ('Nigeria', 'Male', "Bachelor's Degree"), ('Nigeria', 'Male', "Master's Degree"), ('Nigeria', 'Male', 'Undergraduate & Postgraduate'), ('Nigeria', 'Male', 'All'), ('Nigeria', 'Female', 'High School Diploma'), ('Nigeria', 'Female', "Bachelor's Degree"), ('Nigeria', 'Female', "Master's Degree"), ('Nigeria', 'Female', 'Undergraduate & Postgraduate'), ('Nigeria', 'Female', 'All'), ('Nigeria', 'All', 'High School Diploma'), ('Nigeria', 'All', "Bachelor's Degree"), ('Nigeria', 'All', "Master's Degree"), ('Nigeria', 'All', 'Undergraduate & Postgraduate'), ('Nigeria', 'All', 'All'), ('Ghana', 'Male', 'High School Diploma'), ('Ghana', 'Male', "Bachelor's Degree"), ('Ghana', 'Male', "Master's Degree"), ('Ghana', 'Male', 'Undergraduate & Postgraduate'), ('Ghana', 'Male', 'All'), ('Ghana', 'Female', 'High School Diploma'), ('Ghana', 'Female', "Bachelor's Degree"), ('Ghana', 'Female', "Master's Degree"), ('Ghana', 'Female', 'Undergraduate & Postgraduate'), ('Ghana', 'Female', 'All'), ('Ghana', 'All', 'High School Diploma'), ('Ghana', 'All', "Bachelor's Degree"), ('Ghana', 'All', "Master's Degree"), ('Ghana', 'All', 'Undergraduate & Postgraduate'), ('Ghana', 'All', 'All'), ('South Africa', 'Male', 'High School Diploma'), ('South Africa', 'Male', "Bachelor's Degree"), ('South Africa', 'Male', "Master's Degree"), ('South Africa', 'Male', 'Undergraduate & Postgraduate'), ('South Africa', 'Male', 'All'), ('South Africa', 'Female', 'High School Diploma'), ('South Africa', 'Female', "Bachelor's Degree"), ('South Africa', 'Female', "Master's Degree"), ('South Africa', 'Female', 'Undergraduate & Postgraduate'), ('South Africa', 'Female', 'All'), ('South Africa', 'All', 'High School Diploma'), ('South Africa', 'All', "Bachelor's Degree"), ('South Africa', 'All', "Master's Degree"), ('South Africa', 'All', 'Undergraduate & Postgraduate'), ('South Africa', 'All', 'All'), ('Kenya', 'Male', 'High School Diploma'), ('Kenya', 'Male', "Bachelor's Degree"), ('Kenya', 'Male', "Master's Degree"), ('Kenya', 'Male', 'Undergraduate & Postgraduate'), ('Kenya', 'Male', 'All'), ('Kenya', 'Female', 'High School Diploma'), ('Kenya', 'Female', "Bachelor's Degree"), ('Kenya', 'Female', "Master's Degree"), ('Kenya', 'Female', 'Undergraduate & Postgraduate'), ('Kenya', 'Female', 'All'), ('Kenya', 'All', 'High School Diploma'), ('Kenya', 'All', "Bachelor's Degree"), ('Kenya', 'All', "Master's Degree"), ('Kenya', 'All', 'Undergraduate & Postgraduate'), ('Kenya', 'All', 'All'), ('Mauritius', 'Male', 'High School Diploma'), ('Mauritius', 'Male', "Bachelor's Degree"), ('Mauritius', 'Male', "Master's Degree"), ('Mauritius', 'Male', 'Undergraduate & Postgraduate'), ('Mauritius', 'Male', 'All'), ('Mauritius', 'Female', 'High School Diploma'), ('Mauritius', 'Female', "Bachelor's Degree"), ('Mauritius', 'Female', "Master's Degree"), ('Mauritius', 'Female', 'Undergraduate & Postgraduate'), ('Mauritius', 'Female', 'All'), ('Mauritius', 'All', 'High School Diploma'), ('Mauritius', 'All', "Bachelor's Degree"), ('Mauritius', 'All', "Master's Degree"), ('Mauritius', 'All', 'Undergraduate & Postgraduate'), ('Mauritius', 'All', 'All')]
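Because iterating over a dictionary yields its keys, itertools.product combines only the attribute names; the encodings are looked up later, inside the collection loop. The result has one tuple per combination, here 5 locations × 3 gender keys × 5 degree keys = 75 segments, with the last dictionary varying fastest, as the printed list shows. The hypothetical make_segments helper below wraps this and checks the count; the toy dictionaries have placeholder values because only their keys matter.

```python
import itertools
import math


def make_segments(*lookups):
    """Return every combination of the lookups' keys as a list of tuples.

    Iterating a dict yields its keys, so product() only sees the names,
    not the encodings; the last lookup's keys vary fastest.
    """
    segments = list(itertools.product(*lookups))
    # one segment per combination of keys
    assert len(segments) == math.prod(len(d) for d in lookups)
    return segments


# Toy lookups: the values are placeholders, only the keys are used
countries = {"Nigeria": ["n"], "Ghana": ["g"]}
genders = {"Male": ["m"], "All": [None]}
print(make_segments(countries, genders))
# [('Nigeria', 'Male'), ('Nigeria', 'All'), ('Ghana', 'Male'), ('Ghana', 'All')]
```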

Now that we have all our segments, we can proceed to:

  1. iterate over this list

  2. look up the encoding of each attribute in the relevant dictionary

  3. create the URL for the final request to the API

  4. query the API and store the result

  5. create a dataframe containing all the data

%%time
columns = ['Country', 'Gender', 'Degree', 'Count']
new_rows = []

for location_name, gender_name, degree_name in segments:
    # look up the encodings; use new names so we don't overwrite the
    # `location` list that was used to build location_dict
    location_enc = location_dict[location_name]
    gender_enc = gender_dict[gender_name]
    degree_enc = degrees_dict[degree_name]

    # generate URL for request
    requestCriteria = createRequestDataForAudienceCounts(locations=location_enc,
                                                         genders=gender_enc,
                                                         degrees=degree_enc)

    # submit GET request
    count = getAudienceCounts(requestCriteria)

    # store the segment and its count as one row
    new_rows.append((location_name, gender_name, degree_name, count))

# construct dataframe
df = pd.DataFrame(new_rows, columns=columns)
CPU times: user 1.68 s, sys: 83.3 ms, total: 1.76 s
Wall time: 38.1 s

We can now check that the dataframe contains the expected data (75 rows, one per segment) and then process it further or save it to disk.

df.sample(10)
         Country  Gender                        Degree    Count
54         Kenya  Female                           All   800000
7        Nigeria  Female               Master's Degree    76000
19         Ghana    Male                           All   900000
66     Mauritius  Female             Bachelor's Degree    14000
9        Nigeria  Female                           All  1300000
28         Ghana     All  Undergraduate & Postgraduate   290000
36  South Africa  Female             Bachelor's Degree   370000
3        Nigeria    Male  Undergraduate & Postgraduate   540000
74     Mauritius     All                           All   330000
59         Kenya     All                           All  2600000
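A random sample is a quick visual check; a few programmatic checks can also catch an incomplete run, such as a failed request or a duplicated segment, before the data is written to disk. The sketch below is one way to do this, assuming getAudienceCounts returns a numeric count. The names check_audience_counts and save_counts and the file name audience_counts.csv are illustrative choices, not part of the original code, and the toy frame reuses two rows from the sample above.

```python
import os
import tempfile

import pandas as pd


def check_audience_counts(df, n_expected):
    """Raise AssertionError if the collected data looks incomplete."""
    # one row per segment
    assert len(df) == n_expected, f"expected {n_expected} rows, got {len(df)}"
    # each (Country, Gender, Degree) combination should appear exactly once
    assert not df.duplicated(subset=["Country", "Gender", "Degree"]).any(), "duplicate segments found"
    # every request should have produced a non-negative count
    assert df["Count"].notna().all(), "missing counts"
    assert (df["Count"] >= 0).all(), "negative counts"


def save_counts(df, path):
    """Write the dataframe to CSV (without the index) and read it back."""
    df.to_csv(path, index=False)
    return pd.read_csv(path)


# Two rows from the sample output, standing in for the full dataframe
toy = pd.DataFrame(
    [("Kenya", "Female", "All", 800000), ("Ghana", "Male", "All", 900000)],
    columns=["Country", "Gender", "Degree", "Count"],
)
check_audience_counts(toy, n_expected=2)

with tempfile.TemporaryDirectory() as tmp:
    reloaded = save_counts(toy, os.path.join(tmp, "audience_counts.csv"))
```

On the real data this would be check_audience_counts(df, n_expected=len(segments)) followed by df.to_csv(...). Reading the file back right after writing it is a cheap way to confirm that the saved copy matches what was collected.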