5. Collecting and storing data
In this section, we will perform the data collection defined in the previous example, starting from where we left off: the creation of several dictionaries containing the encodings of all the attributes we are interested in.
location_dict = {loc:[encodeFacet([loc], kind='locations')] for loc in location}
gender_dict = {k:[encodeFacet(genders, kind='genders')] for k, genders in gender_dict.items()}
gender_dict['All'] = [None]
degrees_dict = {k:[encodeFacet(degrees, kind='degrees')] for k, degrees in degrees_dict.items()}
degrees_dict['All'] = [None]
There are several ways in which the data collection could proceed at this point. In this case, we will first create a list of tuples that contains all our segments.
import itertools
import pandas as pd
segments = list(itertools.product(*[location_dict, gender_dict, degrees_dict]))
print(segments)
[('Nigeria', 'Male', 'High School Diploma'), ('Nigeria', 'Male', "Bachelor's Degree"), ('Nigeria', 'Male', "Master's Degree"), ('Nigeria', 'Male', 'Undergraduate & Postgraduate'), ('Nigeria', 'Male', 'All'), ('Nigeria', 'Female', 'High School Diploma'), ('Nigeria', 'Female', "Bachelor's Degree"), ('Nigeria', 'Female', "Master's Degree"), ('Nigeria', 'Female', 'Undergraduate & Postgraduate'), ('Nigeria', 'Female', 'All'), ('Nigeria', 'All', 'High School Diploma'), ('Nigeria', 'All', "Bachelor's Degree"), ('Nigeria', 'All', "Master's Degree"), ('Nigeria', 'All', 'Undergraduate & Postgraduate'), ('Nigeria', 'All', 'All'), ('Ghana', 'Male', 'High School Diploma'), ('Ghana', 'Male', "Bachelor's Degree"), ('Ghana', 'Male', "Master's Degree"), ('Ghana', 'Male', 'Undergraduate & Postgraduate'), ('Ghana', 'Male', 'All'), ('Ghana', 'Female', 'High School Diploma'), ('Ghana', 'Female', "Bachelor's Degree"), ('Ghana', 'Female', "Master's Degree"), ('Ghana', 'Female', 'Undergraduate & Postgraduate'), ('Ghana', 'Female', 'All'), ('Ghana', 'All', 'High School Diploma'), ('Ghana', 'All', "Bachelor's Degree"), ('Ghana', 'All', "Master's Degree"), ('Ghana', 'All', 'Undergraduate & Postgraduate'), ('Ghana', 'All', 'All'), ('South Africa', 'Male', 'High School Diploma'), ('South Africa', 'Male', "Bachelor's Degree"), ('South Africa', 'Male', "Master's Degree"), ('South Africa', 'Male', 'Undergraduate & Postgraduate'), ('South Africa', 'Male', 'All'), ('South Africa', 'Female', 'High School Diploma'), ('South Africa', 'Female', "Bachelor's Degree"), ('South Africa', 'Female', "Master's Degree"), ('South Africa', 'Female', 'Undergraduate & Postgraduate'), ('South Africa', 'Female', 'All'), ('South Africa', 'All', 'High School Diploma'), ('South Africa', 'All', "Bachelor's Degree"), ('South Africa', 'All', "Master's Degree"), ('South Africa', 'All', 'Undergraduate & Postgraduate'), ('South Africa', 'All', 'All'), ('Kenya', 'Male', 'High School Diploma'), ('Kenya', 'Male', "Bachelor's Degree"), ('Kenya', 'Male', "Master's Degree"), ('Kenya', 'Male', 'Undergraduate & Postgraduate'), ('Kenya', 'Male', 'All'), ('Kenya', 'Female', 'High School Diploma'), ('Kenya', 'Female', "Bachelor's Degree"), ('Kenya', 'Female', "Master's Degree"), ('Kenya', 'Female', 'Undergraduate & Postgraduate'), ('Kenya', 'Female', 'All'), ('Kenya', 'All', 'High School Diploma'), ('Kenya', 'All', "Bachelor's Degree"), ('Kenya', 'All', "Master's Degree"), ('Kenya', 'All', 'Undergraduate & Postgraduate'), ('Kenya', 'All', 'All'), ('Mauritius', 'Male', 'High School Diploma'), ('Mauritius', 'Male', "Bachelor's Degree"), ('Mauritius', 'Male', "Master's Degree"), ('Mauritius', 'Male', 'Undergraduate & Postgraduate'), ('Mauritius', 'Male', 'All'), ('Mauritius', 'Female', 'High School Diploma'), ('Mauritius', 'Female', "Bachelor's Degree"), ('Mauritius', 'Female', "Master's Degree"), ('Mauritius', 'Female', 'Undergraduate & Postgraduate'), ('Mauritius', 'Female', 'All'), ('Mauritius', 'All', 'High School Diploma'), ('Mauritius', 'All', "Bachelor's Degree"), ('Mauritius', 'All', "Master's Degree"), ('Mauritius', 'All', 'Undergraduate & Postgraduate'), ('Mauritius', 'All', 'All')]
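Passing dictionaries to itertools.product iterates over their keys, which is why each tuple in the output holds the human-readable names rather than the encodings. The following minimal sketch uses small stand-in dictionaries (the names ending in _demo and the placeholder values are hypothetical, not real encodings) to illustrate this and to check that the number of segments equals the product of the dictionary sizes.

```python
import itertools

# Stand-in dictionaries: the keys mirror names used above, the values are
# hypothetical placeholders, since only the keys matter to itertools.product.
location_demo = {'Nigeria': ['<loc-1>'], 'Ghana': ['<loc-2>']}
gender_demo = {'Male': ['<g-1>'], 'Female': ['<g-2>'], 'All': [None]}
degrees_demo = {"Bachelor's Degree": ['<d-1>'], 'All': [None]}

# Iterating over a dict yields its keys, so product() combines the names
segments_demo = list(itertools.product(location_demo, gender_demo, degrees_demo))

# The number of segments is the product of the dictionary sizes (2 * 3 * 2)
expected = len(location_demo) * len(gender_demo) * len(degrees_demo)
print(len(segments_demo), expected)
print(segments_demo[0])
```

With the full dictionaries used in this section (5 countries, 3 gender options, 5 degree options), the same calculation gives 75 segments.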
Now that we have all our segments, we can proceed to:
iterate over this list
obtain the encoding of each attribute using the relevant dictionary
create the URL for the final request to the API
query the API and store the result
create a dataframe containing all the data
%%time
columns = ['Country', 'Gender', 'Degree', 'Count']
new_rows = []
for segment in segments:
    # unpack the tuple and get encodings
    location_name, gender_name, degree_name = segment
    location, gender, degree = location_dict[location_name], gender_dict[gender_name], degrees_dict[degree_name]
    # generate URL for request
    requestCriteria = createRequestDataForAudienceCounts(locations=location,
                                                         genders=gender,
                                                         degrees=degree)
    # submit GET request
    count = getAudienceCounts(requestCriteria)
    # store data in a series (row)
    new_row = pd.Series(dtype=object)
    new_row['Country'] = location_name
    new_row['Gender'] = gender_name
    new_row['Degree'] = degree_name
    new_row['Count'] = count
    new_rows.append(new_row.values)
# construct dataframe
df = pd.DataFrame(new_rows, columns=columns)
CPU times: user 1.68 s, sys: 83.3 ms, total: 1.76 s
Wall time: 38.1 s
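The timings show that nearly all of the wall time is spent waiting for the API rather than computing. If one of those requests raised an exception, the loop would stop and the rows collected so far would never reach a dataframe. A minimal sketch of a more tolerant variant follows. The names collect_counts and fake_fetch are hypothetical and introduced only for illustration; fake_fetch stands in for the combination of createRequestDataForAudienceCounts and getAudienceCounts so that the example runs without network access.

```python
import pandas as pd

def collect_counts(segments, fetch_count):
    """Query every segment, recording None when a request fails."""
    rows = []
    for location_name, gender_name, degree_name in segments:
        try:
            count = fetch_count(location_name, gender_name, degree_name)
        except Exception as error:
            # Keep going; the failed segment can be retried later
            print(f"Request failed for {(location_name, gender_name, degree_name)}: {error}")
            count = None
        rows.append({'Country': location_name, 'Gender': gender_name,
                     'Degree': degree_name, 'Count': count})
    return pd.DataFrame(rows, columns=['Country', 'Gender', 'Degree', 'Count'])

# Hypothetical stand-in for the real API call; it fails for one segment
def fake_fetch(location_name, gender_name, degree_name):
    if location_name == 'Ghana':
        raise RuntimeError('simulated network error')
    return 1000

demo_segments = [('Nigeria', 'All', 'All'), ('Ghana', 'All', 'All'), ('Kenya', 'All', 'All')]
demo_df = collect_counts(demo_segments, fake_fetch)
print(demo_df)
```

Rows with a missing count can then be filtered out or re-queried, instead of rerunning the whole collection.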
Now we can check that our dataframe contains the expected data, and process it further or save it to disk.
df.sample(10)
 | Country | Gender | Degree | Count |
---|---|---|---|---|
54 | Kenya | Female | All | 800000 |
7 | Nigeria | Female | Master's Degree | 76000 |
19 | Ghana | Male | All | 900000 |
66 | Mauritius | Female | Bachelor's Degree | 14000 |
9 | Nigeria | Female | All | 1300000 |
28 | Ghana | All | Undergraduate & Postgraduate | 290000 |
36 | South Africa | Female | Bachelor's Degree | 370000 |
3 | Nigeria | Male | Undergraduate & Postgraduate | 540000 |
74 | Mauritius | All | All | 330000 |
59 | Kenya | All | All | 2600000 |
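Saving the dataframe to disk takes a single pandas call. The sketch below rebuilds a few of the rows shown above so that it runs on its own; the name df_demo and the file name audience_counts.csv are arbitrary choices made for this example, and in the notebook one would call to_csv on df directly.

```python
import os
import tempfile
import pandas as pd

# A few rows copied from the sample above, standing in for the full dataframe
df_demo = pd.DataFrame(
    [('Kenya', 'Female', 'All', 800000),
     ('Kenya', 'All', 'All', 2600000),
     ('Mauritius', 'All', 'All', 330000)],
    columns=['Country', 'Gender', 'Degree', 'Count'])

# Write to CSV without the numeric index (arbitrary file name in a temp folder)
path = os.path.join(tempfile.mkdtemp(), 'audience_counts.csv')
df_demo.to_csv(path, index=False)

# Read the file back to confirm that the data survived the round trip
reloaded = pd.read_csv(path)
print(reloaded)
```

Passing index=False leaves out the row numbers, which carry no information here because each row is already identified by its country, gender and degree.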