Data Science Archives | Civis Analytics

Category: Data Science

Counting Turkeys on Twitter with Apache Spark


by Michael H.

At Civis, we are continuing to work more and more with Apache Spark, a tool which is almost synonymous with #BigData these days. With Thanksgiving coming up, let’s use Spark to analyze some tweets about Thanksgiving from last year (2016). Goals of this post: Do some fun and interesting analyses of Twitter data in time for Turkey Day. Provide an...


Multiple languages, one team: Bringing R and Python together with Civis Platform

by Michael H.

What follows is a brief example of how the Civis Platform enables data scientists to collaborate effectively. Data scientists work in unique contexts and use different programming languages, even within the same organization, making collaboration a constant challenge. Consider two data scientists, whom I’ll call Alice and Bob. Alice and Bob need to work together on a project with a...


Open Sourcing the Civis Data Science API Client for R


by Keith I.

It’s frustrating to reinvent the wheel just to do basic data science. We’ve experienced that here at Civis, which is why we’re always automating these tasks by adding them to the Civis Data Science API. By coding with the Data Science API, you can take advantage of everything Civis Platform can do, including our favorite data science workflows, like CivisML...


Fairness in Data Science

by Henry H.

Here at Civis, we build a lot of models. Most of the time we’re modeling people and their behavior because that’s what we’re particularly good at, but we’re hardly the only ones doing this: as we enter the age of “big data,” more and more industries are applying machine learning techniques to drive person-level decision-making. This comes with exciting...


More Data More Problems: Variable Selection with Multiple Response Variables

by Civis Analytics

More data isn’t always better! This post will go over why and how we removed uninformative variables from a modeling dataset using a custom-built neural network architecture along with cross-checks using more traditional supervised learning algorithms. The end result is a better curated dataset for our model-building process. The Problem: This is kind of weird, right? All you hear about...


CivisML: Scikit-Learn at Scale


by Stephen H.

Late last year, my colleagues on the Social Science team were working on a new survey weighting scheme that would greatly improve the precision of our public opinion data. To make it work, they needed to fit dozens of models for each completed survey. Each survey asks multiple questions, each of which would need to be modeled individually, using an...
