The Modern Data Stack: Does My Company Need One?

A data stack is a type of tech stack designed to facilitate the storage, access, and management of data. Many businesses are adopting a modern data stack to gather, store, transform, and analyze data. Each layer can help you reach your business goals. They enable you to gain insights from the vast amounts of data that you gather during the course of normal operations. This can, in turn, help you to be proactive in discovering opportunities for growth.

The Modern Data Stack, or MDS, differs from legacy technologies. You can get a speedy start with little initial investment because its tools often use pay-as-you-go pricing models. An additional benefit of the Modern Data Stack is that you are less likely to experience vendor lock-in. You can choose which vendor to work with for each piece of the stack, mixing and matching to assemble the best toolkit to suit your needs.

Where the Modern Data Stack Comes From

The shift from storing data in on-premises data centers and other legacy data-management technologies to a more distributed approach, built on the tools that make up the modern data stack, can appear complex and frustrating. This post gives you an overview of the roadmap your company would need to follow to implement modern data tooling at scale as you consider a modernized operational strategy. Perhaps more importantly, we’ll start by discussing the merits of the modern data stack to answer the question “Does my company need a Modern Data Stack?” We won’t keep you in suspense for long: in most cases, you would benefit from moving to a more modular stack with granular control and centralized storage.

The history behind this conversation is enlightening and may help you see why the modern data stack’s evolution was inevitable and valuable. Let’s take a minute to consider the rise of the Modern Data Stack. In recent years, an interesting conversation started taking place among data professionals. It sought to explain the unbundling of services.

What Do Classified Ads Have To Do With Data Tooling?

This change parallels what we saw in the world of online classified ads, when people who once used Craigslist as a centralized place to seek goods and services started to make use of more specialized businesses. In effect, modern services sprang up endeavoring to do just one thing really well. Instead of using one site for car buying, ride sharing, dating, and finding houses to buy or rent, we now use websites and apps dedicated to each of these services. Gone were the days of the general-purpose tool for browsing ads. The behavior of people using these tools reflected a shift in attitudes: smaller, more specialized, and more modular services were doing a better job of meeting individuals’ needs.

Likewise, while the modern version of data tooling is far more powerful, the core user experience is very similar. If you’re familiar with Craigslist, then Zillow, Uber, Airbnb, and Tinder might also feel quite familiar. If you’re familiar with legacy data tools, then dbt, Snowflake, Airflow, Meltano, and others will also feel like parts of what you have seen before.

The Great Unbundling

So, we’ve established that data services became unbundled, but just how did data tools become a suite of choices to make? Basically, there is a natural division of functions within the data landscape, and each has associated modern tools. At the base you’ll find the ingestion layer, then storage, transformation, BI, and operational analytics. Some stacks also make use of orchestration tools, while other businesses choose to forgo them.

The most obvious benefit of this is that companies can now adapt their stacks to a variety of operational needs. Stacks, by their nature, are highly customizable. Most of these tools are focused on the needs of data engineers and data analysts. A possible downside is that engineers and teams easily toss around specifics that can confuse those unfamiliar with the concepts, so developing an understanding of your business’s data needs may seem daunting. A data consulting agency like ours is well-positioned to help you make the right choices for your use case.

Does This Feel Like Patchwork?

The Modern Data Stack is not meant to be a confusing system of interconnected tools, though at the outset it can seem like one. Instead, it is highly customizable and can be tailored to suit your organization’s unique needs. To help, we’ll provide a general bird’s-eye-view model you can use to understand how all these tools fit together. The assortment of tools we refer to as the Modern Data Stack allows you to choose what works best for you and build something customized where there was once a monolithic, one-size-fits-all system that purported to take care of all your data needs.

This is where enlisting a seasoned data professional can really help. Someone who understands the various ways a company can put data to use, and who can envision the best ways to divide responsibility for the different parts of the modern data stack, can cut through a lot of the confusion. Teams without guidance can experience many false starts and potentially waste time and money as they assemble their stacks through trial and error.

Modern Data Stack Model (source)

How the Modern Data Stack (MDS) Differs from the Traditional Data Stack (TDS)

Instead of a monolithic application, think of the Modern Data Stack as a layered platform. The bottom layer represents your data sources which might be applications like Salesforce and Google Analytics, databases such as Postgres, Oracle and SQL Server, and files such as spreadsheets, XML, or JSON.

Next would be the ingestion layer, which extracts data from the various data sources. This is where data engineers set up automated pipelines using tools such as Fivetran, Stitch, or Segment. Airbyte, an open-source integration engine, also works in this layer. When you do this well, you are setting your team up to work with the freshest data available.
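
To make the ingestion step concrete, here is a minimal sketch of the extract-and-load pattern using only the Python standard library. It is not how Fivetran, Stitch, Segment, or Airbyte work internally. The source records, the raw_customers table, and the load_raw helper are all hypothetical, and an in-memory SQLite database stands in for a cloud warehouse.

```python
import json
import sqlite3

# Hypothetical source records, standing in for an API response or exported file.
SOURCE_RECORDS = [
    {"id": 1, "email": "ana@example.com", "plan": "pro"},
    {"id": 2, "email": "ben@example.com", "plan": "free"},
]


def load_raw(conn, records):
    """Land each record unchanged, as a JSON string, in a raw table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_customers "
        "(id INTEGER PRIMARY KEY, payload TEXT)"
    )
    # INSERT OR REPLACE keeps reruns idempotent: the same id overwrites its row.
    conn.executemany(
        "INSERT OR REPLACE INTO raw_customers (id, payload) VALUES (?, ?)",
        [(record["id"], json.dumps(record)) for record in records],
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM raw_customers").fetchone()[0]


conn = sqlite3.connect(":memory:")  # in-memory database standing in for a warehouse
row_count = load_raw(conn, SOURCE_RECORDS)
```

Landing the data untouched and cleaning it later is one common design choice: it keeps the pipeline simple and lets you rerun it safely on a schedule.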

After that, there is a storage layer that might include cloud data warehouses such as Snowflake and Amazon Redshift, and/or data lakes such as Databricks and Amazon S3.

The transformation layer is where you clean raw data to facilitate subsequent analysis. You can also reshape it for use in other tools. Example tools for transformation include dbt (data build tool), a command-line tool that lets data analysts and engineers transform data using SQL, and Matillion. Both are purpose-built solutions for cloud data warehouses.
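
As a rough illustration of what a transformation looks like, the sketch below cleans and aggregates a raw table with a single SQL SELECT statement, then saves the result as a new table. It does not use dbt or Matillion. The raw_orders and customer_revenue tables and their columns are hypothetical, and an in-memory SQLite database stands in for a cloud warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a cloud data warehouse

# Hypothetical raw table as an ingestion tool might land it: untrimmed, mixed case.
conn.execute(
    "CREATE TABLE raw_orders "
    "(order_id INTEGER, customer_email TEXT, amount_cents INTEGER)"
)
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [
        (1, " Ana@Example.com ", 1250),
        (2, "ana@example.com", 800),
        (3, "BEN@example.com", 4000),
    ],
)

# The transformation: normalize emails, then aggregate orders per customer.
conn.execute(
    """
    CREATE TABLE customer_revenue AS
    SELECT
        LOWER(TRIM(customer_email)) AS customer_email,
        COUNT(*) AS order_count,
        SUM(amount_cents) / 100.0 AS revenue
    FROM raw_orders
    GROUP BY LOWER(TRIM(customer_email))
    """
)

customer_revenue = conn.execute(
    "SELECT customer_email, order_count, revenue "
    "FROM customer_revenue ORDER BY customer_email"
).fetchall()
```

Because the cleanup happens inside the warehouse, the raw table stays intact and the transformation can be revised and rerun at any time.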

The operations layer includes tools such as Apache Airflow, an open-source workflow management platform for data engineering pipelines. You could also use Atlan, which connects to your storage layer, helps your data teams provide access to internal and external data, and automates repetitive tasks.
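
The core idea behind workflow management is running tasks in dependency order. The sketch below shows that idea with Python's standard graphlib module (Python 3.9 or later). It is not Airflow code and does not use the Airflow API. The task names, the PIPELINE mapping, and the run_pipeline helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "ingest_crm": set(),
    "ingest_web_analytics": set(),
    "transform_customers": {"ingest_crm", "ingest_web_analytics"},
    "refresh_dashboard": {"transform_customers"},
}


def run_pipeline(pipeline, tasks):
    """Run each task once, only after all of its dependencies have run."""
    completed = []
    for name in TopologicalSorter(pipeline).static_order():
        tasks[name]()
        completed.append(name)
    return completed


# Placeholder task bodies; real ones would call ingestion and transformation tools.
TASKS = {name: (lambda: None) for name in PIPELINE}
execution_order = run_pipeline(PIPELINE, TASKS)
```

A full orchestration tool adds scheduling, retries, and monitoring on top of this ordering, which is part of why some teams adopt one and others do without.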

Another layer is the analytics layer. This is where you create dashboards and visualizations with tools such as Looker, Zoho Analytics, Power BI, Metabase, and Tableau. You’ll also see SQL querying tools here, along with machine learning modeling tools such as Dataiku. Some even separate out a layer called operational analytics (sometimes referred to as reverse ETL), as seen in tools like Hightouch and Census.

Data as a Service

The Modern Data Stack is also related to the concept of “data as a service”. Data as a service, or DaaS, is basically any cloud-based software tool used for working with data. All of these tools are built and run on a Software as a Service, or SaaS, model.

Factors to Consider As You Decide

Once you’ve decided on the main components of your stack, you might want to consider whether an open source solution will work. While some companies prefer tools that are not open source because of security concerns, our team can help you assess the merits of such a tool, plan a smart implementation, and even guide you through self-hosting, if your business requires it.

Another hot topic: data governance. The Modern Data Stack goes hand in hand with Modern Data Governance. Good data governance ensures that data is accessible, reliable, available, and of high quality. At the same time, it supports data security and privacy compliance. Data governance is not just nice to have; it has become a corporate necessity. With the advent of compliance and data privacy regulations such as GDPR and CCPA, businesses must take this into account. A Modern Data Stack can help you comply with regulations in a more agile way.

Outcomes: What Can You Expect After Adopting a Modern Data Stack?

With a modern data stack, you can save time, effort, and money. Your organization will benefit from tooling that is faster, more scalable, and more accessible. If you want your business to transition into a data-driven organization, an MDS can help you reach your goals. We’re here to help you. Doing it right is critical for creating business solutions that solve the right problems and don’t create more problems. Today’s businesses must have actionable, reliable, and up-to-date data to remain competitive. Our data team is ready to help you make the move to a Modern Data Stack.
