The Modern Data Stack: Does My Company Need One?

A data stack is a type of tech stack designed to facilitate data storage, access, and management. Many businesses are adopting a modern data stack to gather, store, transform, and analyze data. Each layer of the stack helps you gain insights from the vast amounts of data you gather during normal operations. Those insights can, in turn, help you to be proactive in discovering growth opportunities.

The Modern Data Stack, or MDS, differs from legacy technologies. Because its tools often use pay-as-you-go pricing models, you can get a speedy start with little initial investment in the technology. Another benefit of the Modern Data Stack is that you are less likely to experience vendor lock-in. You can choose which vendor to work with for each piece of the stack, mixing and matching to assemble the best toolkit to suit your needs.

Where the Modern Data Stack Comes From

The shift from traditional data storage in on-premise data centers and other legacy data-management technologies to a more distributed approach, built from the tools that comprise the modern data stack, can appear complex and frustrating. This post aims to give you an overview of the roadmap your company would need to follow to implement modern data tooling at scale as you consider a modernized operational strategy. Perhaps more importantly, we’ll start by discussing the merits of the modern data stack to answer the question, “Does my company need a Modern Data Stack?” We won’t keep you in suspense long: in most cases, you would benefit from the move to a more modular stack with granular control and centralized storage.

The background to this conversation is enlightening and may help you to see why the modern data stack’s evolution was both inevitable and valuable. Let’s take a minute to consider the rise of the Modern Data Stack. In recent years, an exciting conversation started taking place among data professionals. It sought to explain why data services were being unbundled.

What Do Classified Ads Have To Do With Data Tooling?

This change parallels what we saw in online classified ads when people who once used Craigslist as a centralized place to seek goods and services started to use more specialized businesses. In effect, modern services sprang up, each endeavoring to do just one thing well. Instead of using one site for car buying, ride-sharing, dating, and finding houses to buy or rent, we now use websites and apps dedicated to each of these services. Gone were the days of the general-purpose tool for classified ads. The behavior of people using these tools reflected a shift in attitudes; smaller, more specialized, and modular services were doing a better job of meeting individuals’ needs.

Likewise, while the modern version of data tooling is far more powerful, the core user experience is very similar. If you’re familiar with Craigslist, then Zillow, Uber, Airbnb, and Tinder might also feel quite familiar. If you’re familiar with legacy data tools, then dbt, Snowflake, Airflow, Meltano, and others will also feel like parts of what you have seen before.

The Great Unbundling

So, we’ve established that data services became unbundled, but just how did data tools become a suite of choices to make? There is a natural division of functions within the data landscape, and each function has associated modern tools. You’ll find the ingestion layer at the base, then transformation, storage, BI, and operational analytics. Some stacks still use orchestration tools, and other businesses forgo those.

The most obvious benefit is that companies can adapt their stacks to various operational needs. Stacks, by their nature, are highly customizable. Most of these tools are uniquely focused on data engineers’ and analysts’ needs. A possible downside is that engineers and teams easily toss around specifics that can confuse those unfamiliar with the concepts. Developing an understanding of your business’s data needs may seem daunting. A data consulting agency like ours can help you make the right choices for your use case.

Does This Feel Like Patchwork?

The Modern Data Stack is not meant to be a confusing system of interconnected tools, though at the outset, it can seem like one. Instead, it is highly customizable and can be tailored to suit your organization’s unique needs. To help, we’ll provide a general bird’s-eye-view model you can use to understand the way all these tools fit together. This assortment of tools, which we refer to as the Modern Data Stack, allows you to choose what works best for you and build something customized where there was once a monolithic, one-size-fits-all system purported to care for all your data needs.

This is where a seasoned data professional can help. Someone who understands the various ways a company can put data to use, and who can envision the best ways to divide responsibility for the different parts of the modern data stack, can cut through a lot of the confusion. Teams without guidance can experience many false starts and potentially waste time and money as they assemble their stacks through trial and error.

[Figure: Modern Data Stack Model]

How the Modern Data Stack (MDS) Differs from the Traditional Data Stack (TDS)

Instead of a monolithic application, consider the Modern Data Stack a layered platform. The bottom layer represents your data sources, which might be applications like Salesforce and Google Analytics, databases such as Postgres, Oracle, and SQL Server, and files such as spreadsheets, XML, or JSON.

Next would be the ingestion layer, which extracts data from the various data sources. This is where data engineers set up automated pipelines using tools such as Fivetran, Stitch, or Segment. Airbyte, an open-source integration engine, can also work in this layer. When you do this well, you are setting your team up to work with the freshest data available.
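To make the ingestion step concrete, here is a minimal Python sketch of the extract-and-load pattern these tools automate. It is an illustration under stated assumptions, not how Fivetran, Stitch, Segment, or Airbyte work internally: the two source payloads, the raw_contacts table, and the in-memory SQLite database standing in for a cloud warehouse are all hypothetical.

```python
import csv
import io
import json
import sqlite3
from datetime import datetime, timezone

# Hypothetical source payloads; real pipelines would read from APIs, databases, or files.
CSV_SOURCE = "id,email\n1,ana@example.com\n2,bo@example.com\n"
JSON_SOURCE = '[{"id": 3, "email": "cy@example.com"}]'


def extract(csv_text: str, json_text: str) -> list[dict]:
    """Read records from both sources into a common list-of-dicts shape."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.extend(json.loads(json_text))
    return rows


def load(conn: sqlite3.Connection, rows: list[dict]) -> int:
    """Append raw records to a landing table, stamped with the load time."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_contacts (id TEXT, email TEXT, loaded_at TEXT)"
    )
    loaded_at = datetime.now(timezone.utc).isoformat()
    conn.executemany(
        "INSERT INTO raw_contacts VALUES (?, ?, ?)",
        [(str(r["id"]), r["email"], loaded_at) for r in rows],
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # assumption: stand-in for a cloud warehouse
loaded = load(conn, extract(CSV_SOURCE, JSON_SOURCE))
print(f"loaded {loaded} rows")
```

The data lands unmodified apart from a load timestamp. Cleaning is deliberately left to the transformation layer, which is the usual division of labor in this kind of stack.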

After that, a storage layer might include cloud data warehouses such as Snowflake and Amazon Redshift, and/or data lakes such as Databricks and Amazon S3.

In the transformation layer, you can clean raw data to facilitate subsequent analysis. You can also change the form it takes to enable use in other tools. Example tools for transformation include dbt (data build tool), a SQL-based command-line program that allows data analysts and engineers to transform data, and Matillion. Both of these are purpose-built solutions for cloud data warehouses.
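As a rough illustration of what cleaning raw data can look like, the sketch below expresses a transformation as a single SQL SELECT, which is similar in spirit to how SQL-based transformation tools are used. The raw_orders and clean_orders tables, their columns, and the sample rows are hypothetical, and an in-memory SQLite database stands in for a cloud warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: stand-in for a cloud warehouse
conn.execute("CREATE TABLE raw_orders (order_id TEXT, email TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [
        ("1", "  Ana@Example.com ", "19.90"),  # stray whitespace and mixed case
        ("2", "bo@example.com", "5"),
        ("2", "bo@example.com", "5"),          # duplicate row from a repeated load
        ("3", "cy@example.com", ""),           # missing amount
    ],
)

# The whole transformation is one SELECT: cast types, normalize emails,
# turn empty amounts into NULL, and drop exact duplicates.
CLEAN_ORDERS_SQL = """
CREATE TABLE clean_orders AS
SELECT DISTINCT
    CAST(order_id AS INTEGER)        AS order_id,
    LOWER(TRIM(email))               AS email,
    CAST(NULLIF(amount, '') AS REAL) AS amount
FROM raw_orders
"""
conn.execute(CLEAN_ORDERS_SQL)

clean_rows = conn.execute(
    "SELECT order_id, email, amount FROM clean_orders ORDER BY order_id"
).fetchall()
for row in clean_rows:
    print(row)
```

Keeping the raw table untouched and writing the cleaned result to a separate table means a transformation can be corrected and rerun without re-ingesting anything.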

The operations layer includes tools such as Apache Airflow, an open-source workflow management platform for data engineering pipelines. You could also use Atlan, which connects to your storage layer, assists your data teams with providing access to internal and external data, and automates repetitive tasks.
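The core idea behind workflow management is running tasks in dependency order. The sketch below shows only that idea, using Python's standard-library graphlib rather than Airflow's own API. The task names are hypothetical, and each task is a placeholder that simply records that it ran.

```python
from graphlib import TopologicalSorter

run_log: list[str] = []


def make_task(name: str):
    """Return a placeholder task that only records that it ran."""
    def task() -> None:
        run_log.append(name)
    return task


# Hypothetical pipeline: each key maps to the set of tasks it depends on.
dependencies = {
    "ingest_crm": set(),
    "ingest_web_analytics": set(),
    "transform_customers": {"ingest_crm", "ingest_web_analytics"},
    "refresh_dashboard": {"transform_customers"},
    "sync_to_crm": {"transform_customers"},
}
tasks = {name: make_task(name) for name in dependencies}

# static_order() yields each task only after all of its dependencies.
for name in TopologicalSorter(dependencies).static_order():
    tasks[name]()

print(run_log)
```

A real orchestrator adds scheduling, retries, and monitoring on top of this ordering, which is why some teams adopt one and others with simpler pipelines do without.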

Another layer is the analytics layer. This is where you create dashboards and visualizations with tools such as Looker, Zoho, Power BI, Metabase, and Tableau. You’ll also see SQL query tools here, as well as machine learning modeling tools such as Dataiku. Some even break out a separate layer called operational analytics (sometimes referred to as reverse ETL), served by tools like Hightouch and Census.
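Here is a small sketch of the two ideas in this layer: an aggregate SQL query of the kind a dashboard might run, and the same result reshaped into records that could be sent back to an operational tool, which is the essence of reverse ETL. The clean_orders table, the metric names, and the JSON payload shape are hypothetical; this does not use the Hightouch or Census APIs, and nothing is actually sent anywhere.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: stand-in for a cloud warehouse
conn.execute("CREATE TABLE clean_orders (email TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO clean_orders VALUES (?, ?)",
    [
        ("ana@example.com", 20.0),
        ("ana@example.com", 30.0),
        ("bo@example.com", 5.0),
    ],
)

# Analytics: the kind of aggregate a dashboard tile might display.
metrics = conn.execute(
    """
    SELECT email, COUNT(*) AS orders, SUM(amount) AS lifetime_value
    FROM clean_orders
    GROUP BY email
    ORDER BY email
    """
).fetchall()

# Operational analytics: reshape each row into a record for a downstream tool.
payloads = [
    json.dumps({"email": email, "orders": orders, "lifetime_value": value})
    for email, orders, value in metrics
]
for payload in payloads:
    print(payload)
```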

Data as a Service

The Modern Data Stack is also related to “data as a service”. Data as a service, or DaaS, is any cloud-based software tool for working with data. These tools are built and run on a Software as a Service (SaaS) model.

Factors to Consider As You Decide

Once you’ve decided on the main components of your stack, you might want to consider whether an open-source solution will work. Some companies have avoided open-source tools out of security concerns. Our team can help you assess the merits of such a tool, plan an intelligent implementation, and even guide you through self-hosting if your business needs require it.

Another hot topic: data governance. The Modern Data Stack goes hand in hand with Modern Data Governance. Suitable data governance ensures that data is accessible, reliable, available, and high-quality. At the same time, it supports data security and privacy compliance. Data governance is not just nice to have; it has become a corporate necessity. With the advent of compliance and data privacy regulations such as GDPR and CCPA, businesses must take it into account. A Modern Data Stack can help you adapt to regulations with more agility.

Outcomes: What Can You Expect After Adopting a Modern Data Stack?

With a modern data stack, you can save time, effort, and money. Your organization will benefit from tooling that is faster, more scalable, and more accessible. If you want your business to transition into a data-driven organization, an MDS can help you reach your goals. We’re here to help you. Doing it right is critical for creating business solutions that solve the right problems without creating new ones. To remain competitive, today’s businesses must have actionable, reliable, and up-to-date data. Our data team is ready to help you move to a Modern Data Stack.
