The Civis Python API Client

Stephen Hoover, Lead Data Scientist
August 2017

Civis Platform provides you with a Data Science API which gives you direct access to Civis Platform's cloud-based infrastructure, data science tools, and data. You can query large datasets, train a dozen models at once (and set them to re-train on a schedule), and create or update dashboards to show off your work. Using the Data Science API, you can write code in scripts or notebooks as if you're working on your laptop, but with all the resources of the Civis Platform.

Civis Analytics provides API clients for both Python and R. This notebook introduces you to the abstractions used in the Civis Python API Client and provides a few usage examples. If you aren't running this notebook in the Civis Platform, follow the setup instructions in Section A.3. If you aren't a Civis Platform subscriber, sign up for a free trial today!

In [1]:
print(f"Using Civis Python API Client version {civis.__version__}.")
Using Civis Python API Client version 1.6.0.

1. What's Available?

The Python API client has two kinds of functionality.

First, you can interact directly with the Civis Data Science API by using a civis.APIClient object. This translates the native REST API into Python code, so that you can pass parameters to functions rather than writing out HTTP requests by hand. These functions all immediately return the response from Civis Platform.

The second kind of functionality is higher-level functions which make common tasks easier, such as copying a table from Redshift into a pandas.DataFrame, or training a machine learning model. You can access these functions through the civis namespace.

  • civis.io : Data input, output, and transfer, as well as SQL queries on Redshift tables
  • civis.ml : Machine learning
  • civis.parallel : Tools for doing batch computing in Civis Platform

When you start a new Civis Jupyter notebook, you already have the civis namespace imported and a civis.APIClient object named client created and ready to go!

In [2]:
# Uncomment the following two lines if you run this notebook outside of Civis Platform
#import civis
#client = civis.APIClient()

2. Data Access

You can use the functions provided in the civis.io namespace to move data in and out of Civis Platform. Here are a few examples of how that works. This notebook assumes that all of the data we'll use are in the same database, defined below. If your data aren't in the "Civis Database" database, change the following cell to use the correct name.

In [3]:
DATABASE = "Civis Database"

2.1 Reading a table from Civis

Sometimes you need to move a table from your Civis Redshift cluster into RAM so that you can manipulate it. The civis.io.read_civis function will do that for you.

This is the first example of a wrapper function, which is a special piece of code designed to do a common task (in this case, read a table from your Civis Redshift cluster and return it as a list or a pandas.DataFrame). There are a number of wrapper functions in civis.io designed to assist with getting data in and out of Civis Platform. They are usually easier to use than, for example, working with the raw API endpoints or clicking through the GUI. The recommended best practice is to use wrapper functions whenever possible, rather than the client directly.

In [4]:
# First, use "?" to investigate the parameters of civis.io.read_civis
civis.io.read_civis?

Let's read out a table of data on public transit ridership in Chicago. The docstring tells us that unless use_pandas is True (default=False), the function will return a list. We want a DataFrame here, so set use_pandas to True.

In [5]:
df = civis.io.read_civis(table='public.cta_ridership_daily',
                         database=DATABASE, 
                         use_pandas=True)
print(f"The table's shape is {df.shape}.")
df.head()
The table's shape is (739054, 5).
Out[5]:
station_id stationname date daytype rides
0 40200 Randolph/Wabash 2001-01-01 U 834
1 40870 Francisco 2001-01-01 U 196
2 40060 Belmont-O'Hare 2001-01-02 W 4046
3 40730 Washington/Wells 2001-01-02 W 6788
4 41430 87th 2001-01-02 W 4577

Now that we have the table in our notebook, we can inspect it and use Python functions to modify it. Let's turn it into a table of ridership by month for each station, starting in 2010.

In [6]:
import pandas as pd

df['month'] = pd.DatetimeIndex(df['date']).to_period('M')
rides_post_2009 = df[df['month'] >= pd.Period('2010-01', 'M')]
rides_by_month = (rides_post_2009.groupby(['stationname', 'month'])[['rides']]
                  .sum()
                  .reset_index())
print(f"The grouped table's shape is {rides_by_month.shape}.")
rides_by_month.head()
The grouped table's shape is (8948, 3).
Out[6]:
stationname month rides
0 18th 2010-01 37136
1 18th 2010-02 37605
2 18th 2010-03 42990
3 18th 2010-04 41964
4 18th 2010-05 40943

2.2 Writing tables to Civis

Now that we have our modified data, let's put it back into your Redshift cluster. Use the function civis.io.dataframe_to_civis to do the upload. We'll put it into a table in the "scratch" schema. That's a customary location for tables we don't intend to keep around for long.

In [7]:
rbm_tablename = 'scratch.rides_by_month'
fut = civis.io.dataframe_to_civis(
    df=rides_by_month,
    database=DATABASE,
    table=rbm_tablename,
    distkey='month',
    sortkey1='month',
    existing_table_rows='drop',
)  # This is non-blocking
print(fut)
<CivisFuture at 0x7f105af525f8 state=running>
In [8]:
fut.result()  # This blocks (warning: can take a few minutes to run)
Out[8]:
{'error': None,
 'finished_at': '2017-08-29T14:39:59.000Z',
 'id': 58250695,
 'import_id': 7072332,
 'is_cancel_requested': False,
 'started_at': '2017-08-29T14:39:51.000Z',
 'state': 'succeeded'}

2.3 What is the CivisFuture?

Notice that, although the civis.io.read_civis function waited until your download was done to finish executing, the civis.io.dataframe_to_civis function returned immediately, even though Civis Platform hadn't finished creating your table. When working with the client, you will often need to start jobs that will take some time to complete. To deal with this, the Civis client includes CivisFuture objects, which allow you to process multiple long-running jobs simultaneously.

The CivisFuture object is a subclass of the standard library concurrent.futures.Future object and tracks a Civis Platform run. This abstraction allows you to start multiple jobs at once, rather than wait for one to finish before starting the other. You can keep working while your table creation happens, and only stop to wait (by calling CivisFuture.result() or concurrent.futures.wait) once you reach a step which relies on your run having finished.
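Because CivisFuture subclasses concurrent.futures.Future, the same pattern can be tried with standard-library futures alone. The following is a minimal sketch, not Civis code: fake_platform_run is a hypothetical stand-in for a Civis Platform run, and a thread pool plays the role of the platform.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_platform_run(name, seconds):
    """Hypothetical stand-in for a long-running Civis Platform run."""
    time.sleep(seconds)
    return {"name": name, "state": "succeeded"}


# Submitting is non-blocking: each call returns a Future right away.
executor = ThreadPoolExecutor(max_workers=2)
futures = [
    executor.submit(fake_platform_run, "slow job", 0.3),
    executor.submit(fake_platform_run, "fast job", 0.1),
]

# Handle each job as soon as it finishes, whatever order that is.
finished = []
for fut in as_completed(futures):
    finished.append(fut.result()["name"])  # .result() blocks until the job is done
executor.shutdown()
print(finished)
```

Both jobs run at the same time, so the loop takes roughly as long as the slowest job rather than the sum of the two.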

Find more information on CivisFuture in the User Guide: http://civis-python.readthedocs.io/en/latest/user_guide.html#civis-futures

2.4 Executing a SQL query

You can also use functions in the civis.io namespace to run SQL in Civis Platform as if you were working with Query. You can use this same method in your scripts to create or drop tables, assign permissions, or do anything else you would want to do in Query.

Let's use a query to pull out the March 2015 traffic at one of the stations in downtown Chicago. Here we're immediately asking for the result of the query by calling .result() on the returned CivisFuture.

In [9]:
station_name = "Washington/Wells"
month = "2015-03"
result = civis.io.query_civis(database=DATABASE,
                              sql=(f"SELECT rides FROM {rbm_tablename} "
                                   f"WHERE stationname = '{station_name}' " 
                                   f"and month = '{month}'"),
                             ).result()
print(f"The {station_name} station had {result['result_rows'][0][0]} riders in {month}.")
The Washington/Wells station had 181116 riders in 2015-03.

Now let's clean up that scratch table. We don't need to wait for Civis Platform to finish, so this time we won't block on the output of civis.io.query_civis. Civis Platform will keep running the table action as we move to the next cells of this notebook.

In [10]:
fut_drop = civis.io.query_civis(database=DATABASE, 
                                sql=f"DROP TABLE IF EXISTS {rbm_tablename}")

2.5 Writing and reading files

You can store arbitrary files in Civis Platform by using civis.io.file_to_civis to store and civis.io.civis_to_file to retrieve data. Let's grab the current status of Chicago's bike share network and store the data in a Civis File.

In [11]:
import requests

divvy_api = 'https://feeds.divvybikes.com/stations/stations.json'
bikes = requests.get(divvy_api).json()
print(f"Downloaded data on {len(bikes['stationBeanList'])} stations.")
Downloaded data on 584 stations.

Upload your data by sending it as an open file object to civis.io.file_to_civis.

In [12]:
import io
import json

buf = io.TextIOWrapper(io.BytesIO())  # `json` writes text
json.dump(bikes, buf)
buf.seek(0)
bike_file_id = civis.io.file_to_civis(buf.buffer, 'Divvy status')
print(f"File uploaded to file number {bike_file_id}.")
File uploaded to file number 6008066.
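The buffer handling in the cell above can be tried without Civis Platform. This sketch uses a made-up payload and only the standard library; it shows why the upload is handed buf.buffer (the underlying byte stream) rather than the text wrapper itself.

```python
import io
import json

payload = {"stationBeanList": [{"id": 1, "stationName": "Example St"}]}  # made-up data

buf = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")
json.dump(payload, buf)  # json writes str, so it needs the text wrapper
buf.seek(0)              # flushes the text layer and rewinds the byte stream

raw = buf.buffer.read()  # buf.buffer is the underlying BytesIO, which holds bytes
restored = json.loads(raw.decode("utf-8"))
print(type(raw).__name__, restored == payload)
```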

Then you can use that file ID to download the file into a new buffer.

In [13]:
buf_down = io.TextIOWrapper(io.BytesIO())
civis.io.civis_to_file(bike_file_id, buf_down.buffer)
buf_down.seek(0)
bikes_down = json.load(buf_down)
In [14]:
bikes == bikes_down
Out[14]:
True

Because retrieving JSON from a Civis File is such a common occurrence, there's a simpler function for files you know are formatted in JSON: civis.io.file_to_json. Similarly, if you know that a file is a CSV, you could use civis.io.file_to_dataframe to access it as a pandas.DataFrame.

In [15]:
bikes_again = civis.io.file_to_json(bike_file_id)
print("The file I stored in Civis has data on "
      f"{len(bikes_again['stationBeanList'])} stations.")
The file I stored in Civis has data on 584 stations.

2.6 Other useful I/O functions

The following functions handle moving structured data to and from Civis:

  • civis_to_csv(filename, sql, database[, ...]) Export data from Civis to a local CSV file.
  • csv_to_civis(filename, database, table[, ...]) Upload the contents of a local CSV file to Civis.
  • dataframe_to_civis(df, database, table[, ...]) Upload a pandas.DataFrame into a Civis table.
  • read_civis(table, database[, columns, ...]) Read data from a Civis table.
  • read_civis_sql(sql, database[, use_pandas, ...]) Read data from Civis using a custom SQL string.

3. Machine Learning

In this section, we will walk through how to build a model using CivisML, a Civis Platform feature with a high-level interface in the Civis API client.

You can use CivisML to leverage Civis Platform's infrastructure to do predictive modeling. CivisML is built on scikit-learn, so you have lots of flexibility to define your own modeling algorithms. Check out the official documentation for more information, or read the example on our blog.

3.1 Training your model

To use CivisML, start by constructing a civis.ml.ModelPipeline object. The ModelPipeline defines the algorithm you want to use, as well as the name of the dependent variable. You can then call the train and predict methods to learn from your data or to make new predictions.

Let's use the API client to help us predict which customers are most likely to upgrade to a premium service, using the demo "Brandable" dataset. We can quickly start three different models training by looping over the parameters we want for each.

For this example, we're using Civis's pre-defined algorithms, but if those don't fit your problem, you can create your own algorithms to use.

In [16]:
# Define the algorithms and model parameters to use
MODELS = ['sparse_logistic', 'random_forest_classifier', 'extra_trees_classifier']
DV = 'upgrade'  # Column name in the training table
PKEY = 'brandable_user_id'  # Column name in the training table
EXCLUDE = ['residential_zip']  # Don't train on these columns, if present
training_table = 'brandable_upgrades.brandable_training_data'

Create the training set by joining the Brandable upgrade labels to the customer data.

In [17]:
sql = f"""DROP TABLE IF EXISTS {training_table};
CREATE TABLE {training_table} AS 
(SELECT u.*, p.upgrade FROM brandable_customers.brandable_all_users u 
JOIN brandable_customers.brandable_pilot p 
ON p.brandable_user_id = u.brandable_user_id)"""
civis.io.query_civis(database=DATABASE, sql=sql).result().state
Out[17]:
'succeeded'
In [18]:
from civis.ml import ModelPipeline  

models = {}
for m in MODELS:
    name = f'"{m}" model for {DV}'
    model = ModelPipeline(model=m,
                          dependent_variable=DV,
                          primary_key=PKEY,
                          excluded_columns=EXCLUDE,
                          model_name=name)

    train = model.train(table_name=training_table, database_name=DATABASE)
    models[train] = model
    print(f'Started training the "{name}" model.')
Created custom script 7072357.
Started training the ""sparse_logistic" model for upgrade" model.
Created custom script 7072358.
Started training the ""random_forest_classifier" model for upgrade" model.
Created custom script 7072360.
Started training the ""extra_trees_classifier" model for upgrade" model.

CivisML automatically evaluates the predictive performance of each model using several standard metrics. Now that we've started some models training, we'll check the area under the ROC curve of each model as it finishes training. Once all of the models finish training, we'll pull out the best of them.

In [19]:
from concurrent.futures import as_completed
aucs = {}
for train in as_completed(models):
    if train.succeeded():
        print(f"Model# {train.train_job_id} on DV "
              f"\"{train.metadata['data']['target_columns'][0]}\" "
              f'("{models[train].model_name}") '
              f"has a ROC AUC of {round(train.metrics['roc_auc'], 3)}.")
        aucs[train.metrics['roc_auc']] = train
best_model = models[aucs[max(aucs)]]
print(f"The \"{best_model.model_name}\" model has the best ROC AUC.")
Model# 7072358 on DV "upgrade" (""random_forest_classifier" model for upgrade") has a ROC AUC of 0.809.
Model# 7072357 on DV "upgrade" (""sparse_logistic" model for upgrade") has a ROC AUC of 0.846.
Model# 7072360 on DV "upgrade" (""extra_trees_classifier" model for upgrade") has a ROC AUC of 0.784.
The ""sparse_logistic" model for upgrade" model has the best ROC AUC.
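The cell above keys its dictionary by ROC AUC, so two models with exactly the same score would overwrite one another. One alternative, sketched here with the rounded scores from the output above rather than live training results, is to key by model name and let max pick the highest score.

```python
# ROC AUC values copied from the output above, keyed by algorithm name.
roc_aucs = {
    "random_forest_classifier": 0.809,
    "sparse_logistic": 0.846,
    "extra_trees_classifier": 0.784,
}

# max() walks the keys and compares them by their score.
best_name = max(roc_aucs, key=roc_aucs.get)
print(f"Best model: {best_name} (ROC AUC {roc_aucs[best_name]})")
```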

3.2 Making predictions

Once you've trained a model, you can use it to make predictions. CivisML will automatically parallelize predictions when you have a large dataset, so no matter how big the dataset, you won't need to wait too long. Let's use the best model we found from the previous step to make predictions about which users are most likely to upgrade in the future.

In [20]:
score_table = 'scratch.my_scores_table'
predict = best_model.predict(table_name='brandable_customers.brandable_all_users', 
                             database_name=DATABASE)
Created custom script 7072405.

If you wanted to store the predictions in a Redshift table, you could have provided an output_table parameter. Since this is a relatively small dataset, it's faster to skip the table write and pull down the predictions directly. Let's find the 5% of users who are most likely to upgrade.

In [21]:
predict.table.head()
Out[21]:
upgrade_1
brandable_user_id
000093b8981b93a 0.277570
00056ee5e2b4e58 0.130607
0006ec438f8bc4f 0.279233
000951546d5fa58 0.155526
000cb76061daad7 0.283429
In [22]:
n_users = len(predict.table)
most_likely = (predict.table
               .sort_values(by="upgrade_1", ascending=False))[:int(0.05 * n_users)]
print(f'The most likely {len(most_likely)} of {len(predict.table)} users to upgrade '
      f'have scores ranging from {most_likely.iloc[-1, 0]} to {most_likely.iloc[0, 0]}.')
The most likely 4746 of 94920 users to upgrade have scores ranging from 0.8303479210953384 to 1.0.

4. Direct API Access

You can inspect the client object and read documentation about individual functions just as you would with any other Python code. For example, you can tab-complete after typing "client." to get a list of API "endpoints", and further tab-complete from "client.users." to find a list of API calls related to users. Here's the way you can ask Civis Platform who it thinks you are:

In [23]:
client.users.list_me?
In [24]:
client.users.list_me()
Out[24]:
{'created_at': '2015-05-07T13:27:36.000Z',
 'custom_branding': None,
 'email': 'civistestuser3@gmail.com',
 'feature_flags': {'civis_explore_insights': True,
  'cmo_multitarget': True,
  'container_scripts': True,
  'notebook_api': True,
  'notebook_r_kernel': True,
  'notebook_ui': True,
  'paro_frontend': True,
  'paro_modeling_wizard': True,
  'python_3_scripts': True,
  'r_scripts': True,
  'report_templates': True,
  'script_params': True,
  'table_create_statement': True,
  'table_person_matching': True},
 'groups': [{'id': 10, 'name': 'Demo', 'organization_id': 13},
  {'id': 365, 'name': 'Credentials Test', 'organization_id': 2}],
 'id': 923,
 'initials': 'JS',
 'last_checked_announcements': '2017-08-23T21:48:32.000Z',
 'name': 'Jane Smith',
 'organization_name': 'demo',
 'preferences': {'civis_explore_skip_intro': False,
  'data_pane_collapsed': 'false',
  'data_pane_width': '235',
  'enhancement_index_author_filter': '1491',
  'enhancement_index_order_dir': 'desc',
  'enhancement_index_order_field': 'created_at',
  'export_index_author_filter': '1441',
  'export_index_order_dir': 'asc',
  'export_index_order_field': 'created_at',
  'export_index_status_filter': 'succeeded',
  'import_index_author_filter': '923',
  'import_index_order_dir': 'desc',
  'import_index_order_field': 'created_at',
  'import_index_type_filter': 'GdocImport',
  'model_index_order_dir': 'desc',
  'model_index_order_field': 'updated_at',
  'model_index_thumbnail_view': 'false',
  'notebook_order_dir': 'desc',
  'notebook_order_field': 'created_at',
  'preferred_server_id': 107,
  'project_detail_order_dir': 'asc',
  'project_detail_order_field': 'name',
  'project_index_order_dir': 'asc',
  'project_index_order_field': 'name',
  'report_index_thumbnail_view': 'true',
  'result_index_order_dir': 'desc',
  'result_index_order_field': 'created_at',
  'script_index_order_dir': 'desc',
  'script_index_order_field': 'last_run.updated_at',
  'upgrade_requested': '2017-02-22T21:32:57.649Z',
  'welcome_order_dir': 'desc',
  'welcome_order_field': 'created_at',
  'welcome_status_filter': 'failed,running,scheduled,succeeded'},
 'roles': ['cua', 'sdm'],
 'sign_in_count': 37,
 'username': 'jsmith'}

4.1 Tables

Next, let's list the tables available in a single schema. The Civis Data Science API often uses unique IDs instead of names, and the APIClient gives you convenience functions to look up those IDs if you know the name. In this case, we need to know the database ID of our database, rather than the name.

In [25]:
db_id = client.get_database_id(DATABASE)
my_tables = client.tables.list(database_id=db_id, schema='public')

# Print all tables in the schema
for tt in my_tables:
    if tt['name'].startswith('cta'):
        print(tt['name'])
cta_count
cta_count_test
cta_ridership_daily
cta_ridership_daily_pasttwoyears

Now let's use the API to look up some information about the CTA daily ridership table. my_tables is a list of API responses. Because searching through lists like this is common, the Civis Python API client provides helper functions (civis.find and civis.find_one) which will locate the entry or entries you're interested in. Let's find the ID of the "cta_ridership_daily" table and use that to look up the names and types of each of the columns.

In [26]:
cta_table = civis.find_one(my_tables, name='cta_ridership_daily')
tb_info = client.tables.get(cta_table.id)
col_types = {c.name: c.sql_type for c in tb_info.columns}
print(col_types)
{'station_id': 'integer', 'stationname': 'character varying(1024)', 'date': 'date', 'daytype': 'character varying(1024)', 'rides': 'integer'}

4.2 Paginated responses

Some endpoints may contain a lot of data which Civis Platform will only serve over multiple requests. For example, client.tables.list() will return information on at most 1000 tables in a single call (the default is 50). Therefore, if we need to collect data on 4000 different tables, we'll need to make at least 4 separate requests to get all of the data. (Use the page_num argument to select additional "pages" of data.) To make this easier, the client includes a special iterator parameter on endpoints which may require making multiple requests to get all of the data. These requests could require making a large number of API calls, so use iterator=True sparingly!

Let's pretend that the "public" schema has more tables than we want to list at once and iterate through it to find all of the tables with 5 columns.

In [27]:
# Traditional method for listing tables 
# (set to list a max of 3 different tables)
# This returns multiple tables at the same time.
# Increase the "page_num" to see more tables.
my_three_tables = client.tables.list(database_id=db_id, schema='public',
                                     limit=3, page_num=1)

# Iterating request (will return all available tables, may take some time to run)
# When iterator is set to True, the function yields a single table at a time.
tb_iter = client.tables.list(database_id=db_id, schema='public', iterator=True)
five_col_tbs = [t for t in tb_iter if t['column_count'] == 5]
print(f"Tables with five columns: {[t['name'] for t in five_col_tbs]}.")
Tables with five columns: ['cta_ridership_daily', 'cta_ridership_daily_pasttwoyears', 'iris', 'testimport', 'upgrade_likelihood'].
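To see roughly what an iterating request does, here is a sketch of page-by-page iteration written against a fake endpoint. Both fake_list and iterate_pages are hypothetical names invented for this illustration, and the table records are made up; this is not how the client itself is implemented.

```python
def fake_list(limit=50, page_num=1):
    """Hypothetical paginated endpoint serving 7 made-up table records."""
    all_tables = [{"name": f"table_{i}", "column_count": 3 + i % 3} for i in range(7)]
    start = (page_num - 1) * limit
    return all_tables[start:start + limit]


def iterate_pages(list_func, limit):
    """Yield one record at a time, requesting pages until one comes back empty."""
    page_num = 1
    while True:
        page = list_func(limit=limit, page_num=page_num)
        if not page:
            return
        yield from page
        page_num += 1


five_col = [t["name"] for t in iterate_pages(fake_list, limit=3)
            if t["column_count"] == 5]
print(five_col)
```

With 7 records and a page size of 3, the generator makes four requests (three pages with data and one empty page), which is why iterating over a large listing can be expensive.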

4.3 The API Response

Every time you communicate with the Civis Data Science API, you get a response. In fact, it's a civis.response.Response object. Its contents can be accessed either like a dictionary or as normal attributes. The Response always comes back immediately, even if it's to acknowledge that you've started something that will take a long time to finish. It will contain either the information you've asked for or an acknowledgement of the action you took. Here's an example of the Response when we ask for the status of the best model we built in section 3.

In [28]:
client.scripts.get_containers_runs(best_model.train_result_.job_id, 
                                   best_model.train_result_.run_id)
Out[28]:
{'container_id': 7072357,
 'error': None,
 'finished_at': '2017-08-29T14:44:11.000Z',
 'id': 58250804,
 'is_cancel_requested': False,
 'started_at': '2017-08-29T14:42:45.000Z',
 'state': 'succeeded'}
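The two access styles can be illustrated with a toy class. AttrDict below is a hypothetical name used only for this sketch; it is not the implementation of civis.response.Response, and the values are copied from the output above.

```python
class AttrDict(dict):
    """Toy dict whose keys can also be read as attributes (illustration only)."""

    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key) from None


run_info = AttrDict({"id": 58250804, "state": "succeeded", "error": None})

# Dictionary-style and attribute-style access return the same value.
print(run_info["state"], run_info.state)
```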

5. Build something new

The most flexible way to interact with Civis Platform is by writing your own code and using Civis Platform to run it. For example, you could imagine wanting to write a program that counts from 1 to 100 and replaces every number that's evenly divisible by 3 with "fizz", any number divisible by 5 with "buzz", and numbers divisible by both 3 and 5 with "fizzbuzz". There's no Data Science API function that implements FizzBuzz, so you would need to write that yourself, but you can use Civis Platform to schedule it, share it, and run it in the cloud while you free up your laptop for other purposes. Container Scripts are our general-purpose solution for taking any code and running it in Civis Platform.

Container Scripts become really powerful when you pair the flexibility of bring-your-own-code with the power of the Data Science API. One of our favorite design patterns is writing code that calls the Data Science API as part of a more customized workflow. For example, we might use the Data Science API to pull a table into a pandas dataframe, write special-purpose pandas code for manipulating the dataframe, use the Data Science API again to build a model, write more code to analyze the results of the model, and finally publish those analysis results as a report in Civis Platform. The most sophisticated data science code we write is delivered and shared via Container Scripts because of how easy it is to write software in Python or R (or, really, any language) calling API functions for accessing Civis Platform.
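The FizzBuzz program described above might look like the following. This is ordinary Python with no Civis dependency; it is the kind of self-written code you could hand to a Container Script.

```python
def fizzbuzz(n):
    """Return the FizzBuzz word for n, or n itself as a string."""
    if n % 15 == 0:  # divisible by both 3 and 5
        return "fizzbuzz"
    if n % 3 == 0:
        return "fizz"
    if n % 5 == 0:
        return "buzz"
    return str(n)


for i in range(1, 101):
    print(fizzbuzz(i))
```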

5.1 Creating and running Container Scripts

Let's take our earlier example of checking the status of the Chicago bike share system and package it into a script which we can schedule to run regularly. Here we're writing our task as a function and using cloudpickle, an open-source Python library which can pickle dynamically-defined functions, to send it to Civis Platform. You could also write this code as a text file and run it as a script.

In [29]:
import cloudpickle
import io
import json
import os
import requests

def get_bike_status(api_url=divvy_api):
    bikes = requests.get(api_url).json()
    
    buf = io.TextIOWrapper(io.BytesIO())  # `json` writes text
    json.dump(bikes, buf)
    buf.seek(0)
    bike_file_id = civis.io.file_to_civis(buf.buffer, 'Divvy status')
    print(f"Stored Divvy station data at {bike_file_id}.")

    client = civis.APIClient()
    job_id = os.environ["CIVIS_JOB_ID"]
    run_id = os.environ["CIVIS_RUN_ID"]
    client.scripts.post_containers_runs_outputs(job_id, run_id, "File", bike_file_id)

code_file_id = civis.io.file_to_civis(
    io.BytesIO(cloudpickle.dumps(get_bike_status)), 'Divvy script')
print(f"Uploaded Divvy function to file {code_file_id}.")
Uploaded Divvy function to file 6008167.

Now that we've uploaded the function, we tell Civis Platform to run it. A Container Script consists of an environment (the "Container", which is a Docker container) and a bash command to run inside that container. Civis provides some general-purpose Docker images, or you can use any public Docker image. Here we're using "datascience-python". Note that we're using a specific image tag, rather than the default "latest". It's a good practice to set an image tag. The "latest" tag will change with new releases, and that could unexpectedly cause a job which used to work to start failing.

In this example, I'm storing my code in a Civis Platform file, but Container Scripts can also access code which you've stored in GitHub. A file is great for small, quick examples like this, but GitHub is a better way to handle larger or production code. Version control is your friend!

Like many operations with the Civis Data Science API, running a Container Script takes two steps. First, you create the job (with client.scripts.post_containers). Second, you tell Civis Platform to start running the job. You can use client.scripts.post_containers_runs to start a run (this will return a Response), or you can use the convenience function civis.utils.run_job to start a run. If you use civis.utils.run_job, you'll get back a CivisFuture, which is a convenient way to track when your run has finished.

In [30]:
from concurrent.futures import wait

cmd = f"""civis files download {code_file_id} myscript.pkl; 
python -c "import cloudpickle; cloudpickle.load(open(\\\"myscript.pkl\\\", \\\"rb\\\"))()" """
container_job = client.scripts.post_containers(
    required_resources = {"cpu": 256, "memory": 512, "diskSpace": 2},
    name="Divvy download script",
    docker_command = cmd,
    docker_image_name = "civisanalytics/datascience-python", 
    docker_image_tag = "3.1.0")
run = civis.utils.run_job(container_job.id)
wait([run])
Out[30]:
DoneAndNotDoneFutures(done={<CivisFuture at 0x7f104051aba8 state=succeeded returned Response>}, not_done=set())
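The command in the cell above escapes its inner quotes by hand. As an alternative sketch, the standard-library shlex.quote can do the shell quoting instead. The name example_file_id is a placeholder holding the file ID printed earlier, and this assumes the command is interpreted by a POSIX shell such as bash.

```python
import shlex

example_file_id = 6008167  # placeholder: the file ID printed above; yours will differ
python_code = 'import cloudpickle; cloudpickle.load(open("myscript.pkl", "rb"))()'

download_step = f"civis files download {example_file_id} myscript.pkl"
run_step = f"python -c {shlex.quote(python_code)}"  # wraps the code in single quotes
cmd = f"{download_step}; {run_step}"
print(cmd)
```

The resulting string could be passed as docker_command in place of the hand-escaped version.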

We've stored the bike station data as a JSON in Civis Platform, and set a "run output" on the script which read the data. Run outputs are a way for you to transfer data from one job to another. You can inspect this job to find its run outputs, and use the file ID you find there to retrieve the data about the Chicago bike sharing network.

In [31]:
remote_output_file_id = client.scripts.list_containers_runs_outputs(
    container_job.id, run.poller_args[1])[0].object_id
print(f"Bike data are stored at file# {remote_output_file_id}.")
Bike data are stored at file# 6008168.
In [32]:
import pprint

station_data = civis.io.file_to_json(remote_output_file_id)
for station in station_data['stationBeanList']:
    if station['stationName'] == 'Franklin St & Monroe St':
        pprint.pprint(station)
{'altitude': '',
 'availableBikes': 19,
 'availableDocks': 8,
 'city': 'Chicago',
 'id': 287,
 'is_renting': True,
 'landMark': '057',
 'lastCommunicationTime': '2017-08-29 09:44:05',
 'latitude': 41.880317,
 'location': '',
 'longitude': -87.635185,
 'postalCode': '60606',
 'stAddress1': 'Franklin St & Monroe St',
 'stAddress2': '',
 'stationName': 'Franklin St & Monroe St',
 'status': 'IN_SERVICE',
 'statusKey': 1,
 'statusValue': 'In Service',
 'testStation': False,
 'totalDocks': 27}

5.2 Custom Scripts

Remember that prediction we made about which customers are likely to upgrade? We didn't store it in a table at the time. What if we change our minds? We could download it in this notebook and then use civis.io.dataframe_to_civis to make a new table. (Most of the time this will be the right thing to do.) However, we could also use the "Import from URL" Template to create a Custom Script which will do that for us.

If you (or one of your colleagues) have created an especially useful Container Script which you'll want to run over and over, you can turn it into a Template. Once you have access to a templated script (Civis provides a few that we've found useful), you can run it for yourself by creating a "Custom Script". The Custom Script lets you modify a few parameters and then run the code that your colleague wrote.

If you know the template ID of a template script, you can use client.scripts.post_custom to create a new job. As with the Container Script, we'll use civis.utils.run_job to start the run so that we get back a CivisFuture.

In [33]:
prediction_tablename = 'scratch.brandable_predictions'
template_id = civis.find_one(client.templates.list_scripts(limit=1000), 
                             name='Import from URL').id
url = client.files.get(predict.metadata['output_file_ids'][0])['file_url']
upgrade_prediction_import = client.scripts.post_custom(
    from_template_id=template_id,
    name="Import Brandable Predictions",
    arguments={'URL': url,
               'TABLE_NAME': prediction_tablename,
               'IF_EXISTS': 'drop',
               'DATABASE_NAME': DATABASE})
import_fut = civis.utils.run_job(upgrade_prediction_import.id)
In [34]:
import_fut.result()
Out[34]:
{'created_at': '2017-08-29T14:46:30.000Z',
 'error': None,
 'finished_at': '2017-08-29T14:47:02.000Z',
 'id': 58251036,
 'started_at': '2017-08-29T14:46:31.000Z',
 'state': 'succeeded'}

Finally, let's keep the database tidy and delete this table.

In [35]:
civis.io.query_civis(database=DATABASE, sql=f"DROP TABLE IF EXISTS {prediction_tablename}")
Out[35]:
<CivisFuture at 0x7f10404a4c88 state=running>

6. A Data Science API

This has been a whirlwind tour of the Civis Data Science API. Civis Platform has a lot more features than what we've covered here, such as sharing, enhancements, reports, and more. This tour gives you what you need to get started. Use the API client documentation or the API documentation to get a complete picture of everything the API can do, and contact support@civisanalytics.com if you run into trouble. The Civis Data Science API is a powerful toolbox that you can use to build, scale, and deploy your data science workflows!

Appendix

These sections will give you extra context on what's going on behind the scenes with the Civis API client.

A.1 What is an API client?

API: Application Programming Interface

  • A set of tools for accessing Civis Platform functionality. An API is an official way for two pieces of code to talk to each other
  • Civis Platform itself works by issuing API calls, which are based on HTTP
  • But HTTP calls are unwieldy, so the API clients provide a more streamlined way of making these requests
  • The API clients can be run interactively or in a script

There are Civis API clients in Python and R.

Everything you can do with an API client is supported by a Civis Data Science API Endpoint. You can find complete documentation on these endpoints here: https://api.civisanalytics.com

RESTful API conventions

The Civis Data Science API is "RESTful". That means it adheres to a set of conventions about the components of the API and their relationships. The world wide web uses REST conventions.

The API understands some basic HTTP "verbs":

  • GET → Retrieve information on objects or members [get, list]
  • POST → Create a new item or entry in an item [create]
  • PUT → Replace something [update]
  • DELETE → Delete [delete]
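
The client methods used earlier in this notebook (client.files.get, client.templates.list_scripts, client.scripts.post_custom) each begin with a word that hints at the HTTP verb behind them. The sketch below spells out that naming pattern. The http_verb helper and its lookup table are illustrative only and are not part of the civis package; the prefix rule is an assumption inferred from those method names and from the list above.

```python
# Illustrative lookup from a method-name prefix to an HTTP verb.
# This table is an assumption based on the verb list above,
# not something exported by the civis package.
VERB_BY_PREFIX = {
    "get": "GET",
    "list": "GET",
    "post": "POST",
    "put": "PUT",
    "delete": "DELETE",
}


def http_verb(method_name):
    """Guess the HTTP verb from the first word of a client method name."""
    prefix = method_name.split("_", 1)[0]
    return VERB_BY_PREFIX[prefix]


# Method names taken from calls made earlier in this notebook.
for name in ["get", "list_scripts", "post_custom"]:
    print(name, "->", http_verb(name))
```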

HTTP Status Codes

When you send a request to an API, it will give you a status code. Common codes include:

  • 100-level codes: Informational
  • 200-level codes: Success
    • 200 OK
  • 300-level codes: Redirection
  • 400-level codes: Client error
    • 400 Bad request
    • 401 Unauthorized (authentication failed)
    • 403 Forbidden (you are authenticated, but not permitted to access the resource)
    • 404 Not found
    • 408 Request timeout
    • 409 Conflict in the request, such as an edit conflict
    • 429 Too many requests: You need to wait before you can use the API again
  • 500-level codes: Server error
    • 500 Internal server error

For example, you might see this error if you try to call a list endpoint with page_num=0:

CivisAPIError: (400) invalid 'page_num' 0 - must be an integer greater than zero

The API client has translated the API's reply into a Python exception. The Response object for that error is:

{'code': 400, 
 'error': 'invalid', 
 'errorDescription': "invalid 'page_num' 0 - must be an integer greater than zero"}
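
As a rough illustration of how the hundreds digit of a status code determines its class, here is a small sketch that uses only the Python standard library. The status_class helper is a hypothetical name introduced for this example, and the response dictionary is copied from the error reply shown above.

```python
from http import HTTPStatus

# Response body copied from the error reply shown above.
response = {
    "code": 400,
    "error": "invalid",
    "errorDescription": "invalid 'page_num' 0 - must be an integer greater than zero",
}

# The hundreds digit of a status code selects its class.
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def status_class(code):
    """Return the class of an HTTP status code (hypothetical helper)."""
    return STATUS_CLASSES[code // 100]


code = response["code"]
# HTTPStatus supplies the standard reason phrase for the code.
print(code, HTTPStatus(code).phrase, "-", status_class(code))
print(response["errorDescription"])
```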

A.2 Rate limits and retries

If you query the Civis Data Science API too frequently, Civis Platform may return a 429 error response, indicating that you need to wait a while before you can make another request. The Python API client will automatically wait and resend your request when your rate limit refreshes, so there's nothing for you to do. Be aware, though, that sending too many requests too quickly will make your code pause until the limit resets.

Currently, the rate limit is 1000 requests per 5 minutes. You can check your rate limit by reading Response.headers['X-RateLimit-Limit'] on any Response object that you get back, and Response.calls_remaining tells you how many API calls you have left before the client has to wait.

The Python API client will automatically retry on certain 500-level errors as well. That gives your code extra reliability compared with calling the raw API yourself. The full list of HTTP status codes on which the client will retry is:

In [36]:
civis.civis.RETRY_CODES
Out[36]:
[429, 502, 503, 504]
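
To make the retry idea concrete, here is a minimal sketch of a retry loop with exponential backoff. It is not the client's actual implementation, which handles retries for you. The call_with_retries function, its parameters, and the simulated send callable are all hypothetical names introduced for this example; only the list of status codes comes from the output above.

```python
import time

# Status codes copied from the output above.
RETRY_CODES = [429, 502, 503, 504]


def call_with_retries(send, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    `send` is a hypothetical zero-argument callable returning (status, body).
    """
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status not in RETRY_CODES or attempt == max_attempts:
            return status, body
        # Double the wait after each failed attempt (exponential backoff).
        sleep(base_delay * 2 ** (attempt - 1))


# Simulated server: two retryable failures, then a success.
replies = iter([(503, None), (429, None), (200, {"state": "succeeded"})])
delays = []  # record the requested waits instead of actually sleeping
status, body = call_with_retries(lambda: next(replies), sleep=delays.append)
print(status, body, delays)
```

Passing sleep as a parameter keeps the example fast to run: the waits are recorded in a list rather than spent.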

A.3 Using the Python API client outside of Civis

You can also install the Civis Python API client on your own computer. The API client is available on PyPI; install it with

pip install civis

Once you have the API client installed, you can create the APIClient object:

import civis
client = civis.APIClient()

Setting up an API Key

To make requests to the Civis Data Science API, you will need an API key that is unique to you. To make an API key, navigate to your platform profile page and create a new key using the panel on the lower right hand side. Set your key to expire after 30 days, which is the maximum currently available.

By default, the Python API client will look for your key in the environment variable CIVIS_API_KEY. To add the API key to your environment, copy the key you generated to your clipboard and follow the instructions below.

Keep your API keys safe! Don’t check a key into GitHub or share it in plaintext. If you do, immediately cancel the API key from the user profile page and contact support@civisanalytics.com .

You can add your API key to your bash profile file (OSX or Linux) using a text editor of your choice. The example below uses emacs:

emacs ~/.bash_profile

Make a new line and enter this:

export CIVIS_API_KEY='yourkeyhere'

To save, type: Control-x Control-s.

Then run:

source ~/.bash_profile
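
Before creating the client in a script, it can help to confirm that Python actually sees the environment variable. The sketch below is one way to do that using only the standard library. The read_api_key helper is a hypothetical name introduced here, not part of the civis package; the only detail taken from this notebook is the variable name CIVIS_API_KEY.

```python
import os


def read_api_key(environ=os.environ):
    """Return the key stored in CIVIS_API_KEY, or raise a clear error."""
    key = environ.get("CIVIS_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "CIVIS_API_KEY is not set. Add the export line to your "
            "bash profile and open a new shell."
        )
    return key


# Demonstrated with a stand-in mapping so that no real key appears here.
placeholder_env = {"CIVIS_API_KEY": "yourkeyhere"}
print("Key found, length:", len(read_api_key(placeholder_env)))
```

In your own script, call read_api_key() with no arguments to check the real environment.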

A.4 Where can I go from here?

If you want to learn even more about the Python API client, you can find all of the code on our GitHub page. File feature requests or bug reports either as GitHub issues or with your Client Success representative at support@civisanalytics.com .

For a deeper dive into using CivisML through the Python API client, check out our examples!