4. Defining and encoding audience attributes
A typical data collection will encompass many different “audiences”: segments of LinkedIn users defined by several characteristics. Because the number of audiences is the product of the number of values in each dimension (location, education, age group, etc.), it grows exponentially as dimensions are added, and the time needed to complete the data collection can become prohibitive. One way to shorten it significantly is to define the attributes and obtain their Uniform Resource Names (URNs) beforehand.
In the previous section we saw how to generate the URL required for the request by supplying the attributes in text form. While this approach is more intuitive and user-friendly, it already queries LinkedIn’s API behind the scenes to encode each attribute. For example, “Nigeria” was translated into the more API-friendly “(urn:urn%3Ali%3Ageo%3A105365761,name:Nigeria,facetUrn:urn%3Ali%3AadTargetingFacet%3Alocations)”. This means that obtaining a single count can require tens of API queries for particularly elaborate segments.
This need not be the case: the URLs can also be generated from attributes that are already encoded, and the encodings themselves can be obtained with another function available in the package. In this section we will look at how to do that.
from linkedin_functions import *
In this example we will extend the scope of the previous section’s data collection to multiple countries, genders, and education levels.
The first thing to do is to specify lists or dictionaries containing all the attributes we are interested in. When using dictionaries, the keys are used for record-keeping (for instance, as labels in our output dataframe), while the values are encoded and used to create the URL.
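To make the list-versus-dictionary convention concrete, here is a minimal sketch of a hypothetical helper, `as_attribute_dict`, that normalizes either form into a `{label: [values]}` dictionary. It is not part of `linkedin_functions`; the package may handle plain lists differently internally. The assumption is simply that a plain list uses each value as its own label.

```python
def as_attribute_dict(attributes):
    """Normalize a list or dict of attributes into {label: [values]}.

    Hypothetical helper (not part of linkedin_functions): keys are labels
    for record-keeping, values are the lists that would later be encoded.
    """
    if isinstance(attributes, dict):
        # Accept either a single string or a list/tuple of strings as the value.
        return {label: list(values) if isinstance(values, (list, tuple)) else [values]
                for label, values in attributes.items()}
    # A plain list such as ['Nigeria', 'Ghana'] uses each value as its own label.
    return {value: [value] for value in attributes}


print(as_attribute_dict(['Nigeria', 'Ghana']))
print(as_attribute_dict({'Male': ['male'], 'Female': 'female'}))
```

Normalizing early means the rest of the pipeline only has to deal with one shape, whichever form the user chose.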
location = ['Nigeria', 'Ghana', 'South Africa', 'Kenya', 'Mauritius']
gender_dict = {'Male': ['male'],
               'Female': ['female']}
degrees_dict = {'High School Diploma': ['high school'],
                "Bachelor's Degree": ['bachelor degree'],
                "Master's Degree": ['master degree'],
                'Undergraduate & Postgraduate': ['bachelor degree', 'master degree', 'phd']}
N.B. The key ‘Undergraduate & Postgraduate’ is mapped to a list with multiple values: its audience will therefore be composed of users who have any one of them, equivalent to a logical OR.
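The OR semantics can be illustrated with a toy model. The members below are made up, and the matching is done locally purely for illustration; LinkedIn evaluates audiences on its side, and how it does so is not shown here.

```python
# Made-up members for illustration only; each may hold several degrees.
members = [
    {"name": "A", "degrees": {"high school", "bachelor degree"}},
    {"name": "B", "degrees": {"high school"}},
    {"name": "C", "degrees": {"bachelor degree", "phd"}},
]


def in_audience(member, allowed_degrees):
    """True if the member holds ANY of the allowed degrees (a logical OR)."""
    return any(degree in member["degrees"] for degree in allowed_degrees)


undergrad_postgrad = ['bachelor degree', 'master degree', 'phd']
matched = [m["name"] for m in members if in_audience(m, undergrad_postgrad)]
print(matched)  # ['A', 'C']
```

Member B is excluded because none of their degrees appears in the list; A and C each match on at least one value.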
The function we will use to encode the attributes is called “encodeFacet”. It takes as arguments a list with all the values to encode and a “kind” specifying the type of attribute; at the time of writing, the tool accepts 7 different types: ‘locations’, ‘genders’, ‘ageRanges’, ‘degrees’, ‘fieldsOfStudy’, ‘seniorities’, ‘industries’. As always, to identify which values these dimensions can take, we recommend consulting your Advertising Campaign’s page.
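Since a mistyped `kind` (e.g. ‘degree’ instead of ‘degrees’) would only surface once the API is queried, it can be worth checking it up front. The following is a minimal sketch of a hypothetical `check_kind` guard built from the seven types listed above; whether `encodeFacet` already performs such a check is not stated here, and the list may change as LinkedIn updates its targeting options.

```python
# The seven facet types listed above (valid at the time of writing).
VALID_KINDS = ('locations', 'genders', 'ageRanges', 'degrees',
               'fieldsOfStudy', 'seniorities', 'industries')


def check_kind(kind):
    """Hypothetical guard: raise early if `kind` is not a known facet type."""
    if kind not in VALID_KINDS:
        raise ValueError(f"Unknown facet kind {kind!r}; expected one of {VALID_KINDS}")
    return kind


check_kind('degrees')      # passes silently
# check_kind('degree')     # would raise ValueError
```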
We can test the function on the location “Nigeria”:
encodeFacet(['Nigeria'], kind='locations')
'(urn:urn%3Ali%3Ageo%3A105365761,name:Nigeria,facetUrn:urn%3Ali%3AadTargetingFacet%3Alocations)'
As expected, its output is a correctly encoded attribute, which can be supplied to the URL-generating function.
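The encoded string is URL-escaped (`%3A` is `:`, `%20` is a space). The sketch below, using a hypothetical `parse_facet` function, decodes it with the standard library’s `urllib.parse.unquote` to make its three parts readable. The pattern is inferred only from the single-value encodings shown in this section; encodings of several values at once may look different.

```python
import re
from urllib.parse import unquote

# Pattern inferred from the example output above:
# (urn:<urn>,name:<name>,facetUrn:<facet urn>)
FACET_RE = re.compile(r"\(urn:(.*?),name:(.*?),facetUrn:(.*)\)")


def parse_facet(encoded):
    """Split a single-value encoded attribute into URL-decoded parts (hypothetical helper)."""
    match = FACET_RE.fullmatch(encoded)
    if match is None:
        raise ValueError(f"Unexpected encoding: {encoded!r}")
    urn, name, facet_urn = (unquote(part) for part in match.groups())
    return {"urn": urn, "name": name, "facetUrn": facet_urn}


print(parse_facet('(urn:urn%3Ali%3Ageo%3A105365761,name:Nigeria,'
                  'facetUrn:urn%3Ali%3AadTargetingFacet%3Alocations)'))
```

For “Nigeria” this yields the URN `urn:li:geo:105365761` within the `urn:li:adTargetingFacet:locations` facet.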
In this example, we have specified 5 different locations, 2 different genders, and 4 different levels of education. This results in 5 × 2 × 4 = 40 queries to get the audience counts, plus between 3 and 5 queries per count to encode its attributes. However, the individual encodings do not change from one query to the next: the encoding for “Nigeria” is the same whether we are asking how many Nigerian women or Nigerian men have LinkedIn accounts. We can therefore save a lot of time by computing and storing all the encodings before starting the data collection proper.
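The saving can be estimated with a little arithmetic. This sketch assumes, consistently with the “between 3 and 5” figure above but not confirmed by the package documentation, that encoding costs one API query per attribute value. It counts the encoding queries needed when every audience re-encodes its own attributes versus when each value is encoded once.

```python
from itertools import product

# Attribute values from the example above.
locations = ['Nigeria', 'Ghana', 'South Africa', 'Kenya', 'Mauritius']
genders = {'Male': ['male'], 'Female': ['female']}
degrees = {'High School Diploma': ['high school'],
           "Bachelor's Degree": ['bachelor degree'],
           "Master's Degree": ['master degree'],
           'Undergraduate & Postgraduate': ['bachelor degree', 'master degree', 'phd']}

audiences = list(product(locations, genders.values(), degrees.values()))
n_counts = len(audiences)  # 5 * 2 * 4 = 40 count queries

# Naive: each audience encodes 1 location + its gender values + its degree values.
naive_encodings = sum(1 + len(g) + len(d) for _, g, d in audiences)

# Precomputed: each value in each dictionary is encoded exactly once.
precomputed_encodings = (len(locations)
                         + sum(len(v) for v in genders.values())
                         + sum(len(v) for v in degrees.values()))

print(n_counts, naive_encodings, precomputed_encodings)  # 40 140 13
```

Under this assumption the total drops from 40 + 140 = 180 queries to 40 + 13 = 53, and the gap widens as more dimensions are added.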
To compute all the encodings up front, we can create new dictionaries with these commands:
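# Encode each location once; wrap it in a list so every value has the same shape
location_dict = {loc: [encodeFacet([loc], kind='locations')] for loc in location}
# Encode each gender group (all of a key's values go into a single call)
gender_dict = {k: [encodeFacet(genders, kind='genders')] for k, genders in gender_dict.items()}
gender_dict['All'] = [None]  # None: no restriction on gender
degrees_dict = {k: [encodeFacet(degrees, kind='degrees')] for k, degrees in degrees_dict.items()}
degrees_dict['All'] = [None]  # None: no restriction on education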
Now the dictionaries have the required structure, with each attribute mapped to its encoding. You will notice that I have also added an extra key, “All”, mapped to [None], to the gender and education dictionaries: this way I will also gather counts that are not restricted on that dimension, i.e. that include all of its possible values.
location_dict
{'Nigeria': ['(urn:urn%3Ali%3Ageo%3A105365761,name:Nigeria,facetUrn:urn%3Ali%3AadTargetingFacet%3Alocations)'],
'Ghana': ['(urn:urn%3Ali%3Ageo%3A105769538,name:Ghana,facetUrn:urn%3Ali%3AadTargetingFacet%3Alocations)'],
'South Africa': ['(urn:urn%3Ali%3Ageo%3A104035573,name:South%20Africa,facetUrn:urn%3Ali%3AadTargetingFacet%3Alocations)'],
'Kenya': ['(urn:urn%3Ali%3Ageo%3A100710459,name:Kenya,facetUrn:urn%3Ali%3AadTargetingFacet%3Alocations)'],
'Mauritius': ['(urn:urn%3Ali%3Ageo%3A106931611,name:Mauritius,facetUrn:urn%3Ali%3AadTargetingFacet%3Alocations)']}
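One way to walk through every audience defined by these dictionaries is `itertools.product` over their items. The sketch below uses short placeholder strings instead of the real encodings so that it runs on its own, and its `iter_audiences` helper is hypothetical. It assumes, as the “All” key suggests, that a `None` encoding means “leave this dimension unfiltered”, so such entries are dropped from the list of facets.

```python
from itertools import product

# Placeholder encodings standing in for the dictionaries built above.
location_dict = {'Nigeria': ['<enc:Nigeria>'], 'Ghana': ['<enc:Ghana>']}
gender_dict = {'Male': ['<enc:male>'], 'Female': ['<enc:female>'], 'All': [None]}
degrees_dict = {"Bachelor's Degree": ['<enc:bachelor>'], 'All': [None]}


def iter_audiences(*dicts):
    """Yield (labels, encodings) for every combination of dictionary entries.

    None encodings (the 'All' keys) are dropped, leaving that dimension unfiltered.
    """
    for combo in product(*(d.items() for d in dicts)):
        labels = tuple(label for label, _ in combo)
        encodings = [e for _, encs in combo for e in encs if e is not None]
        yield labels, encodings


audiences = list(iter_audiences(location_dict, gender_dict, degrees_dict))
print(len(audiences))   # 2 * 3 * 2 = 12
print(audiences[-1])    # (('Ghana', 'All', 'All'), ['<enc:Ghana>'])
```

With the full dictionaries from this section, the same loop would produce 5 × 3 × 5 = 75 combinations, since “All” adds one entry to both the gender and the education dimension. The label tuples are what would end up as record-keeping columns in the output dataframe.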
In the next section, we will show how to proceed to the data collection stage and store all the information in a pandas dataframe.