Most organizations are awash in data. They spend significant amounts of time and money attempting to harness and use vast amounts of information, yet many still find it nearly impossible to identify what data is actually important, what’s missing, and what connects to what (and how).
Data management is critical for membership organizations — nonprofits in particular, but also industry associations, unions, and others — as they look to remain competitive in a modern marketplace where success increasingly depends on data-driven decision making. They need to be able to facilitate the orderly, secure, and intelligent use of their data, and proper storage is the key.
But for some, fully wrapping their heads around what data storage even means — much less understanding which of the many available options is right for them — can make the solution seem just as complex as the problem.
It doesn’t have to be. This article offers a breakdown of the different types of data storage available, helping member-driven organizations decide which is right for them.
Data is typically characterized in one of two ways: either raw (unstructured) or processed (structured). Raw data has not been processed, while processed data has been formatted for a specific purpose.
A list of volunteers would be classified as raw data. So would survey results. But if those results were paired with the names of individual volunteers who completed the survey, and if those volunteers were then matched to giving histories and collated into a spreadsheet to determine whether donors responded differently than non-donors — well, that would be characterized as processed data.
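To make the raw-versus-processed distinction concrete, here is a minimal sketch of the volunteer example above. Everything in it is an assumption made for illustration: the names, the survey scores, the gift amounts, and the two helper functions are hypothetical, and a real organization would likely do this in a spreadsheet or database rather than in a script.

```python
from statistics import mean

# Hypothetical raw inputs: a volunteer list, survey results, and giving histories.
volunteers = {1: "Ana", 2: "Ben", 3: "Cara", 4: "Dev"}
survey_results = [
    {"volunteer_id": 1, "satisfaction": 5},
    {"volunteer_id": 2, "satisfaction": 3},
    {"volunteer_id": 3, "satisfaction": 4},
    {"volunteer_id": 4, "satisfaction": 2},
]
giving_history = {1: [50.0, 25.0], 3: [100.0]}  # volunteer_id -> gift amounts


def build_processed_rows(volunteers, survey_results, giving_history):
    """Join the three raw sources into one row per survey respondent."""
    rows = []
    for result in survey_results:
        vid = result["volunteer_id"]
        rows.append({
            "name": volunteers[vid],
            "satisfaction": result["satisfaction"],
            # A volunteer counts as a donor if any gift is on record.
            "is_donor": bool(giving_history.get(vid)),
        })
    return rows


def average_by_donor_status(rows):
    """Average the survey score separately for donors and non-donors."""
    groups = {True: [], False: []}
    for row in rows:
        groups[row["is_donor"]].append(row["satisfaction"])
    return {
        "donors": mean(groups[True]) if groups[True] else None,
        "non_donors": mean(groups[False]) if groups[False] else None,
    }


rows = build_processed_rows(volunteers, survey_results, giving_history)
summary = average_by_donor_status(rows)
print(summary)
```

The three inputs on their own are raw data. The joined rows and the donor versus non-donor averages are processed data, because they have been shaped to answer one specific question.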
Just as there are different types of data, there are different types of storage options better suited to each:
Data lakes house raw data. A data lake is a low-cost repository that holds large volumes of diverse data, both structured and unstructured, from an array of sources. It allows giving histories, demographic information, email engagement data, transaction history, and event RSVP lists to all float together, even if there’s no specific purpose or apparent connection among them.
There are many reasons why an organization might pick a data lake. It may be contemplating a future use for its data, or may simply wish to ensure certain information is captured and stored somewhere. While this option typically requires an incredibly large storage capacity, the data remains flexible, and can be used and reused as often as the organization wishes.
Further, because the data isn’t structured, a data lake is well-suited to machine learning and AI pattern recognition, which can quickly surface relationships within vast amounts of seemingly disconnected data. A bar association, for example, could store data from social media campaigns directed at its member lawyers, or streaming data from its continuing legal education seminars, in a single data lake to support real-time analysis.
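The idea of storing data as-is and deciding on structure later can be sketched in a few lines. This is only an illustration under stated assumptions: a local temporary folder stands in for the low-cost storage a real data lake would use, and the function names (land_raw_record, read_source), source names, and records are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def land_raw_record(lake_root, source, record_id, payload):
    """Write one record exactly as received, grouped only by its source."""
    folder = Path(lake_root) / source
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{record_id}.json"
    path.write_text(json.dumps(payload), encoding="utf-8")
    return path


def read_source(lake_root, source):
    """Read every record for a source; structure is interpreted at read time."""
    folder = Path(lake_root) / source
    return [
        json.loads(p.read_text(encoding="utf-8"))
        for p in sorted(folder.glob("*.json"))
    ]


# A temporary directory stands in for the lake's storage (an assumption).
lake_root = tempfile.mkdtemp(prefix="lake_")

# Records from different sources, with different fields, are stored side by side.
land_raw_record(lake_root, "event_rsvps", "rsvp-001",
                {"member": "M-17", "event": "Gala", "attending": True})
land_raw_record(lake_root, "email_engagement", "open-001",
                {"member": "M-17", "campaign": "Spring", "opened": True})
land_raw_record(lake_root, "email_engagement", "open-002",
                {"email": "x@example.org", "clicks": 3})

sources = sorted(p.name for p in Path(lake_root).iterdir())
email_records = read_source(lake_root, "email_engagement")
print(sources, len(email_records))
```

Note that the two email engagement records do not share the same fields, and nothing rejects them. That flexibility is the appeal of a lake, and it is also why navigating one tends to require more skill.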
Pros: Flexibility and cost-effectiveness
Cons: Complicated to navigate; usually best accessed by highly skilled data scientists with specialized tools
Data warehouses organize processed data. If a data lake is waves of data crashing onto the shore, then a data warehouse — a digital storage system that connects large volumes of data to power business intelligence, reporting, and analytics — is a shelving system with aisles of data in precise formation.
Data warehouses store only data that will be used, thus requiring far less storage capacity than data lakes. Data in a warehouse has been extracted from multiple sources and then cleaned, filtered, organized, and arranged for a specific purpose (i.e., turned into processed data to enable easy reporting, querying, and other analyses).
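The extract, clean, filter, and organize steps described above can be sketched as follows. This is a simplified illustration, not a recommended warehouse design: Python's built-in sqlite3 module with an in-memory database stands in for a real warehouse product, and the source systems, field names, and cleaning rules are hypothetical.

```python
import sqlite3

# Hypothetical extracts from two source systems, with inconsistent formatting.
direct_mail_gifts = [
    {"email": " Ana@Example.org ", "amount": "50.00"},
    {"email": "ben@example.org", "amount": ""},  # missing amount: filtered out
]
online_gifts = [
    {"email": "ana@example.org", "amount": "25"},
    {"email": "CARA@example.org", "amount": "100.5"},
]


def clean(rows, channel):
    """Normalize emails, parse amounts, and drop rows that cannot be used."""
    cleaned = []
    for row in rows:
        email = row["email"].strip().lower()
        if not email or not row["amount"]:
            continue
        cleaned.append((email, channel, float(row["amount"])))
    return cleaned


# Load the cleaned rows into a table whose format is fixed up front.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gifts ("
    "email TEXT NOT NULL, channel TEXT NOT NULL, amount REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO gifts VALUES (?, ?, ?)",
    clean(direct_mail_gifts, "direct_mail") + clean(online_gifts, "online"),
)

# A reporting query that the standard format makes simple: total giving per donor.
totals = conn.execute(
    "SELECT email, SUM(amount) FROM gifts GROUP BY email ORDER BY email"
).fetchall()
print(totals)
```

Because every row was forced into the same three columns before loading, the report at the end is a single short query. The trade-off is that the table's layout was decided in advance, which is why changing a warehouse after the fact is harder than adding new files to a lake.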
Data warehouses capture data across an entire organization, rather than being limited to one segment. They are particularly well-suited for users like nonprofits, which have data spanning multiple functional units, including their programmatic work, major gifts, direct mail, email, and advertising programs.
Pros: Speed; enhanced business intelligence; data organized into a standard format
Cons: Once the warehouse has been created, it is difficult to change the initial setup
Data lakehouses combine signature aspects of data lakes and data warehouses in one place.
A data lakehouse pairs the structure and data management features of a data warehouse with the low-cost storage and flexibility of a data lake. It provides a single place to store all of an organization’s data — unstructured, structured, and semi-structured — and to use both machine learning/AI and available business intelligence tools.
Pros: Reduced data redundancy; supports direct access via business intelligence tools; applies data governance rules; can effectively contain costs
Cons: Still relatively new and far less mature than other options
Not only can data lakes and warehouses coexist, but their complementary features form two parts of a strategic whole for IT. Integrating both is increasingly central to the development of a cohesive data-storage strategy: one where technology is more manageable, scaling is easier, and workloads can run without the organization needing to maintain its own servers.
The line between storage (lakes) and computation (warehouses) is increasingly blurred. Rather than drawing sharp distinctions based on where and in what format data is stored, organizations can give end users access to data regardless of the infrastructure: any authorized stakeholder using any tool can analyze and process data that is critical to the organization’s continued growth and success.
The first and most critical step for a membership-driven organization is to choose the data storage strategy that’s best for its specific needs. Once it does, a world of possibility opens. The right approach will give the organization a better sense of what data is truly important, what data is missing, and how seemingly disorganized data can be reconfigured and connected in order to measure outcomes, improve performance, and enable the data-informed decision making that is so important to its ultimate goal: furthering its mission and purpose.