8. Repeating a recurrent data collection

There are multiple ways to schedule a data collection. For example, a Linux user can natively use cron to schedule a recurrent job (e.g., run the data collection every day at 1 am). Here we present a simple solution using the Python package schedule (install it with pip install schedule).
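To make the idea of a recurrent job concrete, here is a minimal standard-library sketch of the calculation a scheduler performs for a weekly job: given the current time, find the next occurrence of a given weekday and clock time. The helper name next_weekly_run is hypothetical and is not part of schedule or pysocialwatcher; it only illustrates the idea.

```python
from datetime import datetime, timedelta


def next_weekly_run(now, weekday, hour, minute):
    """Return the first datetime strictly after `now` that falls on
    `weekday` (Monday=0 ... Sunday=6) at hour:minute.

    Hypothetical helper, for illustration only.
    """
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    # Move forward to the requested weekday (0 to 6 days ahead).
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    # If that moment is not in the future, the next run is one week later.
    if candidate <= now:
        candidate += timedelta(weeks=1)
    return candidate


now = datetime(2021, 2, 2, 12, 32, 43)  # a Tuesday, just after 12:32
print(next_weekly_run(now, weekday=1, hour=12, minute=32))
```

A scheduling package handles this bookkeeping for us, so the rest of this section only needs to declare when the job should run.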

import schedule
import time
import os

from glob import glob
from pysocialwatcher import watcherAPI

In this example, we are going to repeat the same sample data collection made in our first example every Tuesday at 12:32 pm.

def data_collection():
    
    # Time in the format month-day-year
    str_time = time.strftime("%m-%d-%y")
    
    # The next three commands start the collection defined in quick_example.json
    # and save the results as dummy_MONTH-DAY-YEAR.csv.gz
    watcher = watcherAPI(api_version="9.0", outputname="dummy_%s.csv.gz" % (str_time)) 
    watcher.load_credentials_file("credentials.csv")
    df = watcher.run_data_collection("quick_example.json", remove_tmp_files=True)
    
    print("DONE! Next collection starting next Tuesday at 12:32pm.")

schedule.every().tuesday.at("12:32").do(data_collection)

while True:
    schedule.run_pending()
    time.sleep(1)
2021-02-02 12:32:00 donna root[2916112] INFO Building Collection Dataframe
2021-02-02 12:32:00 donna root[2916112] WARNING Field not expecified: interests
2021-02-02 12:32:00 donna root[2916112] WARNING Field not expecified: behavior
2021-02-02 12:32:00 donna root[2916112] WARNING Field not expecified: scholarities
2021-02-02 12:32:00 donna root[2916112] WARNING Field not expecified: languages
2021-02-02 12:32:00 donna root[2916112] WARNING Field not expecified: family_statuses
2021-02-02 12:32:00 donna root[2916112] WARNING Field not expecified: relationship_statuses
2021-02-02 12:32:00 donna root[2916112] WARNING Field not expecified: household_composition
2021-02-02 12:32:00 donna root[2916112] INFO Total API Requests:6
2021-02-02 12:32:00 donna root[2916112] INFO Completed: 0.00
2021-02-02 12:32:00 donna root[2916112] WARNING No field: languages
2021-02-02 12:32:00 donna root[2916112] INFO Completed: 16.67
2021-02-02 12:32:00 donna root[2916112] WARNING No field: languages
2021-02-02 12:32:00 donna root[2916112] INFO Completed: 33.33
2021-02-02 12:32:00 donna root[2916112] WARNING No field: languages
2021-02-02 12:32:00 donna root[2916112] INFO Completed: 50.00
2021-02-02 12:32:00 donna root[2916112] WARNING No field: languages
2021-02-02 12:32:00 donna root[2916112] INFO Completed: 66.67
2021-02-02 12:32:00 donna root[2916112] WARNING No field: languages
2021-02-02 12:32:00 donna root[2916112] INFO Completed: 83.33
2021-02-02 12:32:00 donna root[2916112] WARNING No field: languages
2021-02-02 12:32:00 donna root[2916112] INFO Saving Skeleton file: dataframe_skeleton_1612258320.csv.gz
2021-02-02 12:32:00 donna root[2916112] INFO Collecting... Completed: 0.00% , 0/6
2021-02-02 12:32:00 donna root[2916667] WARNING 	Sending in request: {'optimization_goal': 'AD_RECALL_LIFT', 'targeting_spec': '{"geo_locations": {"countries": ["BR"], "location_types": ["home"]}, "age_min": 18, "age_max": null, "genders": [0], "flexible_spec": [], "publisher_platforms": ["facebook"]}', 'access_token': 'EAAM7F8Gp66MBADlruTZBfHkPXvXviZB1iPmhDK9beEjVUk1iGQKIR161ecHMZActHaC8QWxRewZBa2vqidk6qr2ZAirVcD9zLU6vUQrep6R0rZAs3yJygcRLl9n1ZAkLHfiEYZCbBbISfMGV2ZCSfXIJUggmJ0yHygo11Rv77tZBem8wZDZD'}
2021-02-02 12:32:01 donna root[2916112] INFO Collecting... Completed: 16.67% , 1/6
2021-02-02 12:32:08 donna root[2916697] WARNING 	Sending in request: {'optimization_goal': 'AD_RECALL_LIFT', 'targeting_spec': '{"geo_locations": {"countries": ["IT"], "location_types": ["home"]}, "age_min": 18, "age_max": null, "genders": [0], "flexible_spec": [], "publisher_platforms": ["facebook"]}', 'access_token': 'EAAM7F8Gp66MBADlruTZBfHkPXvXviZB1iPmhDK9beEjVUk1iGQKIR161ecHMZActHaC8QWxRewZBa2vqidk6qr2ZAirVcD9zLU6vUQrep6R0rZAs3yJygcRLl9n1ZAkLHfiEYZCbBbISfMGV2ZCSfXIJUggmJ0yHygo11Rv77tZBem8wZDZD'}
2021-02-02 12:32:09 donna root[2916112] INFO Collecting... Completed: 33.33% , 2/6
2021-02-02 12:32:17 donna root[2916796] WARNING 	Sending in request: {'optimization_goal': 'AD_RECALL_LIFT', 'targeting_spec': '{"geo_locations": {"countries": ["BR"], "location_types": ["home"]}, "age_min": 18, "age_max": null, "genders": [1], "flexible_spec": [], "publisher_platforms": ["facebook"]}', 'access_token': 'EAAM7F8Gp66MBADlruTZBfHkPXvXviZB1iPmhDK9beEjVUk1iGQKIR161ecHMZActHaC8QWxRewZBa2vqidk6qr2ZAirVcD9zLU6vUQrep6R0rZAs3yJygcRLl9n1ZAkLHfiEYZCbBbISfMGV2ZCSfXIJUggmJ0yHygo11Rv77tZBem8wZDZD'}
2021-02-02 12:32:18 donna root[2916112] INFO Collecting... Completed: 50.00% , 3/6
2021-02-02 12:32:26 donna root[2916857] WARNING 	Sending in request: {'optimization_goal': 'AD_RECALL_LIFT', 'targeting_spec': '{"geo_locations": {"countries": ["IT"], "location_types": ["home"]}, "age_min": 18, "age_max": null, "genders": [1], "flexible_spec": [], "publisher_platforms": ["facebook"]}', 'access_token': 'EAAM7F8Gp66MBADlruTZBfHkPXvXviZB1iPmhDK9beEjVUk1iGQKIR161ecHMZActHaC8QWxRewZBa2vqidk6qr2ZAirVcD9zLU6vUQrep6R0rZAs3yJygcRLl9n1ZAkLHfiEYZCbBbISfMGV2ZCSfXIJUggmJ0yHygo11Rv77tZBem8wZDZD'}
2021-02-02 12:32:27 donna root[2916112] INFO Collecting... Completed: 66.67% , 4/6
2021-02-02 12:32:34 donna root[2919116] WARNING 	Sending in request: {'optimization_goal': 'AD_RECALL_LIFT', 'targeting_spec': '{"geo_locations": {"countries": ["BR"], "location_types": ["home"]}, "age_min": 18, "age_max": null, "genders": [2], "flexible_spec": [], "publisher_platforms": ["facebook"]}', 'access_token': 'EAAM7F8Gp66MBADlruTZBfHkPXvXviZB1iPmhDK9beEjVUk1iGQKIR161ecHMZActHaC8QWxRewZBa2vqidk6qr2ZAirVcD9zLU6vUQrep6R0rZAs3yJygcRLl9n1ZAkLHfiEYZCbBbISfMGV2ZCSfXIJUggmJ0yHygo11Rv77tZBem8wZDZD'}
2021-02-02 12:32:35 donna root[2916112] INFO Collecting... Completed: 83.33% , 5/6
2021-02-02 12:32:43 donna root[2926427] WARNING 	Sending in request: {'optimization_goal': 'AD_RECALL_LIFT', 'targeting_spec': '{"geo_locations": {"countries": ["IT"], "location_types": ["home"]}, "age_min": 18, "age_max": null, "genders": [2], "flexible_spec": [], "publisher_platforms": ["facebook"]}', 'access_token': 'EAAM7F8Gp66MBADlruTZBfHkPXvXviZB1iPmhDK9beEjVUk1iGQKIR161ecHMZActHaC8QWxRewZBa2vqidk6qr2ZAirVcD9zLU6vUQrep6R0rZAs3yJygcRLl9n1ZAkLHfiEYZCbBbISfMGV2ZCSfXIJUggmJ0yHygo11Rv77tZBem8wZDZD'}
2021-02-02 12:32:43 donna root[2916112] INFO Data Collection Complete
2021-02-02 12:32:43 donna root[2916112] INFO Saving temporary file: dataframe_collecting_1612258320.csv.gz
2021-02-02 12:32:43 donna root[2916112] INFO Computing Audience and DAU column
2021-02-02 12:32:43 donna root[2916112] INFO Saving after collecting file: dummy_02-02-21.csv.gz
DONE! Next collection starting next Tuesday at 12:32pm.
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-3-c5a26eb62a42> in <module>
     16 while True:
     17     schedule.run_pending()
---> 18     time.sleep(1)

KeyboardInterrupt: 

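We just need to keep this script running in the background indefinitely. As the traceback above shows, interrupting it (e.g., with Ctrl+C) stops the loop, and no further collections will be triggered.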
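Because the loop has to run unattended, a single failed collection (for example, a network error) should not end the whole script: an exception raised inside a job propagates out of schedule.run_pending() and stops the while loop. A minimal sketch of a guard, assuming a hypothetical decorator named keep_running that is not part of schedule or pysocialwatcher:

```python
import functools
import logging


def keep_running(job):
    """Wrap a job so that an exception is logged instead of raised.

    Hypothetical helper, for illustration only.
    """
    @functools.wraps(job)
    def wrapper(*args, **kwargs):
        try:
            return job(*args, **kwargs)
        except Exception:
            # Log the full traceback; the caller's loop keeps running.
            logging.exception("Job %s failed", job.__name__)
            return None
    return wrapper


@keep_running
def flaky_job():
    # Stand-in for a data collection that fails.
    raise RuntimeError("simulated collection failure")


result = flaky_job()  # logs the error instead of raising
```

With such a wrapper, the job could be registered as schedule.every().tuesday.at("12:32").do(keep_running(data_collection)). KeyboardInterrupt is not a subclass of Exception, so Ctrl+C still stops the script as before.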