A
Acquiescence Bias, n.

A tendency to select a positive or agreeable response option rather than sharing one’s true feelings. Also known as agreement bias.

Attention Check, n.

A survey question with explicit instructions directing the respondent how to answer properly to “pass” the question. This helps to flag respondents moving inattentively throughout the survey.

Attribution, n.

Attribution analysis seeks to evaluate the success of campaigns in driving sales and highlight any influential links between the two, often down to the level of individual user purchases.

B
Benchmark, n.

An external point of reference, typically drawn from well-respected longitudinal tracking surveys, such as ANES (American National Election Studies) and GSS (General Social Survey), or from first-person behavioral data published by reputable organizations, like the Centers for Disease Control and Prevention or the Bureau of Transportation Statistics. Survey estimates are compared against benchmarks to validate their accuracy.

Bias, n.

In data science, bias refers to intentional or unintentional deviations from the truth in the process of collecting, analyzing, and interpreting data. Following rigorous, objective standards can help reduce bias in research and avoid incorrect conclusions.

C
Causation, n.

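Causation is the act of causing something. In data science, this is a demonstrated relationship in which a change in one variable produces a change in another, as opposed to correlation, in which two variables merely move together.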

Cluster, n.

Generally, a cluster is a group of similar things positioned or occurring close together. In data science, this refers to part of a larger dataset in which all data points are closer to the “cluster center” than to the center of any other data cluster in the dataset.
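
As a minimal sketch of the "closest cluster center" idea, the snippet below assigns each point to its nearest center using Euclidean distance. The points, the centers, and the function name `assign_to_clusters` are hypothetical, and real clustering algorithms such as k-means also re-estimate the centers, which this sketch does not do.

```python
import math

def assign_to_clusters(points, centers):
    """Return, for each point, the index of the nearest center (Euclidean distance)."""
    labels = []
    for point in points:
        distances = [math.dist(point, center) for center in centers]
        labels.append(distances.index(min(distances)))
    return labels

# Hypothetical two-dimensional data and cluster centers.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 7.5)]
centers = [(1.0, 1.5), (8.5, 8.0)]

labels = assign_to_clusters(points, centers)
print(labels)
```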

Confidence Interval, n.

A confidence interval is a range of values within which a given parameter is estimated to lie, at a stated confidence level (for example, 95%). For the same data, a higher confidence level produces a “wider” interval that includes more potential values.
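
To illustrate how the confidence level affects the width of the interval, here is a rough sketch for a survey proportion using the normal approximation. The function name `proportion_ci` and the example counts are assumptions made for illustration, and other interval methods may be more appropriate for small samples or extreme proportions.

```python
import math
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a proportion."""
    p = successes / n
    # z is the standard normal quantile that leaves (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin

# Hypothetical result: 520 of 1,000 respondents answered "yes".
low95, high95 = proportion_ci(520, 1000, confidence=0.95)
low99, high99 = proportion_ci(520, 1000, confidence=0.99)

print(f"95% interval: {low95:.3f} to {high95:.3f}")
print(f"99% interval: {low99:.3f} to {high99:.3f}")
```

With the same data, the 99% interval comes out wider than the 95% interval.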

Cross Tabulations, n.

This is a method of displaying and comparing two or more variables on a two-dimensional grid in order to investigate possible trends or correlations between them.
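
A cross tabulation can be sketched with nothing more than a counter over pairs of answers. The age groups, the answers, and the helper `crosstab` below are made up for illustration.

```python
from collections import Counter

def crosstab(pairs):
    """Count how often each (row value, column value) combination occurs."""
    counts = Counter(pairs)
    rows = sorted({row for row, _ in pairs})
    cols = sorted({col for _, col in pairs})
    # Counter returns 0 for combinations that never occur.
    return {row: {col: counts[(row, col)] for col in cols} for row in rows}

# Hypothetical responses: (age group, answer to a yes/no question).
responses = [
    ("18-34", "Yes"), ("18-34", "No"), ("35-54", "Yes"),
    ("35-54", "Yes"), ("55+", "No"), ("55+", "No"),
]

table = crosstab(responses)
for age_group, answer_counts in table.items():
    print(age_group, answer_counts)
```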

Customer Data Platform (CDP), n.

A Customer Data Platform is a system that collects first-party data from multiple sources, organizes it, and combines it to create unique profiles for each customer. An advanced CDP is key for effective identity resolution.

D
Data Transformation, n.

This refers to the process of converting data between formats, often between different computer systems or databases.

Data Workbench, n.

A data workbench is a user interface for data scientists to use their preferred programming languages, databases, and software applications. It is usually hosted on a local or company-wide system.

Dropoff Rate, n.

The percentage of respondents who entered the survey but did not complete it. Respondents drop off for many reasons (e.g., technical issues, lack of interest, irrelevant survey questions, the monotony of the survey, or an unclear purpose).
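
The arithmetic behind this definition is simple enough to sketch directly. The function name `dropoff_rate` and the counts are hypothetical.

```python
def dropoff_rate(started, completed):
    """Percentage of respondents who started the survey but did not finish it."""
    if started <= 0:
        raise ValueError("started must be positive")
    if not 0 <= completed <= started:
        raise ValueError("completed must be between 0 and started")
    return 100 * (started - completed) / started

# Hypothetical survey: 400 respondents entered, 340 completed.
print(dropoff_rate(400, 340))
```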

E
ETL, v.

ETL stands for “extract, transform, and load,” and is standard practice in data integration. Data is extracted from a source, transformed into an archivable format, and then uploaded to a database.
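
The three stages can be sketched with Python's standard library alone. Everything specific here is an assumption made for illustration: the CSV layout, the `respondents` table, and the cleaning rules. Production pipelines usually rely on dedicated tooling.

```python
import csv
import io
import sqlite3

# Hypothetical raw export with inconsistent spacing and casing.
RAW_CSV = "respondent_id,age,state\n1, 34 ,il\n2,51,NY\n"

def extract(text):
    """Extract: read rows from the source (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: convert types and normalize values."""
    return [
        (int(row["respondent_id"]), int(row["age"].strip()), row["state"].strip().upper())
        for row in rows
    ]

def load(records, conn):
    """Load: write the cleaned records into a database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS respondents "
        "(respondent_id INTEGER PRIMARY KEY, age INTEGER, state TEXT)"
    )
    conn.executemany("INSERT INTO respondents VALUES (?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute(
    "SELECT respondent_id, age, state FROM respondents ORDER BY respondent_id"
).fetchall()
print(loaded)
```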

F
First-Party Data, n.

First-party data is information collected by companies and websites directly from their users or customers. This information is often sensitive Personally Identifiable Information, or PII, and is generally collected during commercial purchases. Increasingly, first-party data is also collected from a range of online interactions.

M
Market Segmentation, n.

Sorting individuals into market segments with others who share similar traits.

MaxDiff, n.

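MaxDiff, also known as “best-worst” scaling, is a survey method grounded in a mathematical theory about decision-making. Respondents are shown a set of items and asked to pick the best and the worst. The theory assumes that, among all possible pairs in the set, people will naturally choose the pair with the greatest perceived difference between its parts.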
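
One simple way to summarize best-worst responses is a count-based score: the number of times an item was picked as best, minus the number of times it was picked as worst, divided by the number of times it was shown. The sketch below assumes that scoring rule and uses made-up items. Analysts often fit statistical choice models instead, which this example does not attempt.

```python
from collections import Counter

def best_worst_scores(tasks):
    """Score each item as (times best - times worst) / times shown."""
    shown, best, worst = Counter(), Counter(), Counter()
    for items, best_item, worst_item in tasks:
        shown.update(items)
        best[best_item] += 1
        worst[worst_item] += 1
    return {item: (best[item] - worst[item]) / shown[item] for item in shown}

# Hypothetical tasks: (items shown, item picked as best, item picked as worst).
ITEMS = ("price", "quality", "speed", "support")
tasks = [
    (ITEMS, "quality", "speed"),
    (ITEMS, "price", "speed"),
    (ITEMS, "quality", "support"),
]

scores = best_worst_scores(tasks)
for item, score in sorted(scores.items(), key=lambda pair: pair[1], reverse=True):
    print(f"{item}: {score:+.2f}")
```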

Media Mix Modeling (MMM), n.

A modeling technique that uses aggregated data to measure the effects of advertising channels to determine how they contribute to an advertiser’s goals. MMM can also be used to create optimizations that help marketers (or advertisers) plan future campaigns based on past performance and resource constraints.

Multi-Touch Attribution, n.

Multi-touch attribution uses first-party data to break down, click by click, the process that brought a customer to the point of sale. Marketers can then assign credit where it is due in terms of the marketing media with which the customer came into contact, and better understand how to replicate that process.

N
Natural Language Processing, n.

Natural Language Processing, also known as NLP, is the wing of computer science devoted to developing artificial intelligences that can understand speech and text on a human level.

O
Online Non-probability Based Panel, n.

A pool of respondents who have agreed to complete surveys via the Internet, made up of volunteers who were recruited online and who often receive some form of compensation for completing surveys, such as small amounts of money, gift cards, in-game points, or frequent flyer miles.

The volunteer or “opt-in” nature of these panels differentiates these samples from probability-based panels, which recruit their members from randomly selected samples of street addresses, email addresses, telephone numbers, etc.

Open Data, n.

Open data is public information that can be freely used, reproduced, or redistributed by anyone, and often refers to scientific, biological, geographic, or civic databases.

P
Panel Marketplace, n.

A consolidated panel provider that is composed of many different sources of panelists, with its own recruitment methods, respondent compensation standards, panel monitoring steps, and so on.

Panelist, n.

A respondent who chooses to participate consistently in online survey panels.

Paradata, n.

Data about how surveys are run and the process of collecting survey data or data sets (like click counts on each question, time taken to submit the response, or overall time spent on the survey).

Personally Identifiable Information (PII), n.

PII is sensitive personal information that can be used to identify individuals. Examples include names, addresses, ages, birthdays, payment information, and bank information, as well as IP addresses and device IDs.

Python, n.

Python is a popular programming and scripting language for general use in constructing and streamlining computer code, including applications that incorporate significant amounts of data.

Q
Quota, n.

The allocations for a set number of respondents within each subgroup (like gender, race, income, and education, or the cross-section of multiple of these subgroups) to ensure the survey is representative of the population of interest for the given research question.

Civis uses a proprietary algorithm to generate “nested quotas,” rather than the marginal quota “buckets” used historically in survey research. By using a precise combination of many different demographic characteristics to create these quotas (for instance, four people who are in a certain age bucket AND of a certain race AND at a certain education level), we achieve higher precision and more accurate demographic representation in our sample than traditional quota buckets, which set quotas on one characteristic at a time, such as gender or race.
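
The general idea of a nested quota, as opposed to the proprietary algorithm itself, can be sketched as a target count for every combination of characteristics. The categories, the joint population shares, and the function `nested_quotas` below are all hypothetical and are not Civis's method.

```python
from itertools import product

ages = ["18-44", "45+"]
educations = ["No college", "College"]

# Hypothetical joint population shares for each (age, education) cell; they sum to 1.
joint_shares = {
    ("18-44", "No college"): 0.30,
    ("18-44", "College"): 0.20,
    ("45+", "No college"): 0.35,
    ("45+", "College"): 0.15,
}

def nested_quotas(shares, sample_size):
    """Target number of respondents for each combination of characteristics.

    Rounding can make the targets sum to slightly more or less than
    sample_size in general; this sketch does not correct for that.
    """
    return {cell: round(share * sample_size) for cell, share in shares.items()}

quotas = nested_quotas(joint_shares, 200)
for cell in product(ages, educations):
    print(cell, quotas[cell])
```

A marginal quota would instead set one target per age group and one per education level, without controlling how the two combine.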

R
Regression, n.

Statistical regression is a method for estimating how two or more variables are related. On a scatter plot of two variables, simple linear regression fits the line that lies closest to the data points, typically by minimizing the sum of squared vertical distances between the points and the line. The slope of that line summarizes, in a visual and instantly digestible way, how much one variable tends to change as the other changes, although a fitted relationship does not by itself establish causation.
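
As an illustration, the line through a scatter plot can be found with ordinary least squares, a common way of fitting a regression line. The data and the function name `fit_line` are invented for this sketch.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data that roughly follows y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```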

Response Consistency, n.

A pattern of answer behavior that is consistently demonstrated over several moments in time, either at multiple points in one survey or across multiple surveys.

S
Sample Size (n size), n.

Sample size refers to the number of subjects in a statistical sample and is used to make observations about the larger population from which the sample is drawn.

Satisfice, v.

Satisficing is a concept from a cognitive theory by Stanford professor Jon Krosnick, which states that survey respondents take certain mental shortcuts to provide quick, “good enough” answers (satisficing) rather than carefully considered answers (optimizing).

Note: Respondents who “optimize” must execute four stages of cognitive processing to answer survey questions optimally. Respondents must: 

  1. Interpret the intended meaning of the question 
  2. Retrieve relevant information from memory
  3. Integrate the information into a summary judgment
  4. Map the judgment onto the response options offered

Social Desirability Bias, n.

The tendency to underreport socially undesirable attitudes and behaviors, and to overreport more desirable attributes.

Statistical Significance, n.

Statistical significance is a determination made by data analysts that the results in a dataset are unlikely to be due to random chance alone. Results are generally considered to be statistically significant when the p-value is 5% or less. The p-value is the probability of seeing results at least as extreme as those observed if there were in fact no real relationship.
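
As one example of how a p-value might be computed, the sketch below runs a two-sided z-test comparing two proportions, such as the share of "yes" answers in two groups. The test choice, the counts, and the function name `two_proportion_p_value` are assumptions made for illustration. Other tests apply to other kinds of data.

```python
import math
from statistics import NormalDist

def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p-value for the difference between two proportions (z-test)."""
    p1, p2 = x1 / n1, x2 / n2
    # Under the "no difference" hypothesis, both groups share one pooled proportion.
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical result: 120 of 1,000 in group A vs. 90 of 1,000 in group B.
p_value = two_proportion_p_value(120, 1000, 90, 1000)
print(f"p-value: {p_value:.3f}")
print("significant at the 5% level" if p_value <= 0.05 else "not significant at the 5% level")
```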

Straightlining, v.

Survey behavior of ticking off the same option in a vertical line for all the questions in a matrix group of questions for expediency’s sake (e.g. choosing “somewhat agree” for every policy issue within a matrix asking about support for four different policies).
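
A basic straightlining check can be sketched as a test for identical answers across a matrix question. The function name `is_straightlining`, the `min_items` threshold, and the sample answers are hypothetical. Identical answers can also reflect genuine opinions, so a flag like this is usually combined with other quality signals.

```python
def is_straightlining(matrix_answers, min_items=3):
    """True when every answered item in a matrix question received the same answer."""
    answered = [answer for answer in matrix_answers if answer is not None]
    return len(answered) >= min_items and len(set(answered)) == 1

# Hypothetical answers to a four-item policy matrix.
respondents = {
    "r1": ["somewhat agree", "somewhat agree", "somewhat agree", "somewhat agree"],
    "r2": ["agree", "disagree", "somewhat agree", "agree"],
}

flagged = [rid for rid, answers in respondents.items() if is_straightlining(answers)]
print(flagged)
```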

Structured Data, n.

Structured data is data that has been formatted, defined, and organized to be easily accessible and understandable to humans and computers, usually in a standardized database.

T
Third-Party Data, n.

User information from a range of sources, purchased by marketers to build more complete profiles of their customers.

Top 2 / Bottom 2 Box Score, n.

This score summarizes the results of a rating-scale survey question, such as a 1–5 or 1–10 satisfaction scale where 1 represents extreme dissatisfaction and 5 or 10 represents extreme satisfaction. Typically, the “Top 2 box score” is the percentage of survey respondents who selected one of the two most favorable rating options (4 or 5 on a 1–5 scale, for example). The “Bottom 2 box score” is the counterpart: the percentage of respondents who selected one of the two least favorable rating options (1 or 2).
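
Both scores can be computed in a few lines. This sketch assumes a scale that starts at 1, and the ratings and the function name `box_scores` are illustrative.

```python
def box_scores(ratings, scale_max=5):
    """Return (top 2 box %, bottom 2 box %) for ratings on a 1..scale_max scale."""
    n = len(ratings)
    top2 = sum(1 for r in ratings if r >= scale_max - 1)
    bottom2 = sum(1 for r in ratings if r <= 2)
    return 100 * top2 / n, 100 * bottom2 / n

# Hypothetical ratings on a 1-5 satisfaction scale.
ratings = [5, 4, 4, 3, 2, 1, 5, 3, 4, 2]

top2_pct, bottom2_pct = box_scores(ratings)
print(f"Top 2 box: {top2_pct}%  Bottom 2 box: {bottom2_pct}%")
```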

Toplines, n.

In data analytics, toplines are the findings from an initial data survey that are then used to establish a foundation that marketers can use to predict customer behavior, including responses to future surveys and questionnaires. Their name derives from where they are placed on a data report, and their role in kicking off a line of data analysis.

Trap Question, n.

A survey question embedded with some type of fake response, intended to catch satisficers who select an impossible or incorrect response option to the question — as a means of gauging respondent participation.

V
Variance, n.

The variance of a dataset is a measure of how much that data varies from the set’s mean, or overall average. It is calculated by squaring each data point’s deviation from the mean and then averaging those squared deviations. The standard deviation is the square root of the variance.
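
Variance can be worked through by hand as the average squared deviation from the mean. The sketch below computes the population variance, which divides by the number of values; the sample variance divides by one less. The data and the function name `population_variance` are made up.

```python
import statistics

def population_variance(values):
    """Average of the squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((value - mean) ** 2 for value in values) / len(values)

# Hypothetical dataset with a mean of 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

variance = population_variance(data)
print(variance)                    # 4.0
print(statistics.pvariance(data))  # the standard library gives the same value
```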

W
Weighting, n.

The post-data collection process of adjusting datasets using a core set of variables, including demographics (such as sex, age, race, and ethnicity, as well as educational attainment and geographic region), to correct any remaining imbalances between the survey sample and the population, even after upfront quotas have been applied. This is the final attempt to ensure representation in the survey, so the results can be properly and reliably applied to the entire population of interest.
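
The simplest form of weighting, post-stratification on a single variable, can be sketched as the population share divided by the sample share for each group. The groups, the shares, and the function name `poststratification_weights` are hypothetical. Weighting in practice typically adjusts for several variables at once, which this example does not show.

```python
from collections import Counter

def poststratification_weights(sample_groups, population_shares):
    """Weight for each group = population share / sample share."""
    n = len(sample_groups)
    counts = Counter(sample_groups)
    return {group: population_shares[group] / (counts[group] / n) for group in counts}

# Hypothetical sample that over-represents college graduates.
sample = ["college"] * 60 + ["no_college"] * 40
population = {"college": 0.35, "no_college": 0.65}

weights = poststratification_weights(sample, population)

# After weighting, the college share of the sample matches the population share.
weighted_total = sum(weights[group] for group in sample)
weighted_college_share = 60 * weights["college"] / weighted_total
print(weights)
print(round(weighted_college_share, 2))
```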
