Why data warehouses won't die

⏰ Reading Time: 6 minutes ⏰ 

“Data warehouses won’t disappear. They’ll just stop being built by humans.”

Last week, I had several conversations about the future of the data warehouse, and I keep running into two extreme points of view:

On one side, strong voices argue that data warehouses are becoming obsolete.

On the other, experienced practitioners insist they are more important than ever.

This newsletter breaks down both extremes and shows where I agree and where I disagree.

Why this matters now

If you work in data, your role is shifting.

Not gradually. Fundamentally.

For years, value in data teams came from:

  • modeling data
  • cleaning and transforming it
  • building pipelines
  • delivering dashboards

But that entire layer is being automated - fast.

The real question is no longer how we build data warehouses.
It’s whether we should still build them ourselves at all.

The two extremes

1) “Data warehouses will disappear”

This view argues:

→ AI agents will directly understand source systems
→ They will query raw data on demand
→ No need for centralized modeling or transformation layers

Instead of building pipelines, you “talk” to your data.

The logic is simple:

  • AI can interpret schemas
  • AI can track changes in systems
  • AI can dynamically generate queries

So why maintain a rigid, pre-built structure?

In theory, this removes:

  • ETL pipelines
  • semantic layers
  • data modeling overhead

Everything becomes dynamic.

2) “Data warehouses are more important than ever”

Then, there is the opposite view:

  • AI needs structured, reliable data foundations
  • Without clean data, AI agents break down
  • Centralized systems are required for consistency

In short: You need a clean data foundation to make AI useful.

The key concerns:

  • Source systems are messy and inconsistent
  • Business logic is full of hidden “gotchas”
  • No two databases are structured the same

Without a curated layer, AI may misinterpret data, metrics become unreliable, and decisions break.

So in this world:

→ Data warehouses stay
→ Data engineering stays
→ Just augmented by AI

A third perspective: AI builds the warehouse

There’s a middle ground:

Instead of asking:

  • “Do we need data warehouses?”

Ask:

  • “Who builds them?”

And this is my take on the "Do we still need the data warehouse?" discussion:

AI will build and maintain data warehouses - but it still needs them.

  • AI generates transformation logic
  • AI writes tests
  • AI documents schemas
  • AI iterates based on feedback

Over the last 10 years, I've been building data foundations for VC-backed scale-ups, large enterprises and SMEs.

Even before the GenAI craze, I had started developing blueprints, templates and systems that drastically reduced build time, so that on average I needed between 10 and 15 days to build a data foundation from scratch.

On my last project, I built a data foundation without writing a single line of code and brought the build time down to 5 days.

And, based on what I've learned from this project, I'm 99% confident that I can bring this down to 1 day or less.

This was not a startup: it was an SME that had been around for many years, with a lot of legacy in its source systems.

I will write more about that in a future newsletter.

To summarize my view:

The warehouse won’t disappear.

It becomes self-built.

What actually changes

If AI takes over preparation and modeling, then the value moves elsewhere.

Here’s where the leverage shifts:

1) Data capturing becomes ever more critical

Garbage in, garbage out - now at scale.

Capturing high-quality quantitative AND qualitative data is where the magic happens.

When I started out leading my first data team 15 years ago, 99% of my work was around quantitative data.

Today, qualitative data is more important than ever:

  • customer conversations
  • business definitions
  • edge cases (“gotchas”)
  • decision context
  • business goals and hypotheses

In practice, this looks like:

  • documenting KPI definitions through conversations
  • feeding transcripts into AI systems
  • turning implicit knowledge into explicit input

On the project I mentioned above, conversation transcripts about business logic were turned into structured documentation that AI could use to build models.
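To make "structured documentation" a bit more concrete, here is a minimal Python sketch of what one captured KPI definition could look like. This is not the format used on the project: the class name KpiDefinition, its fields, and the example values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class KpiDefinition:
    """One business metric, captured from a stakeholder conversation (hypothetical structure)."""
    name: str
    definition: str
    source_tables: list[str]
    gotchas: list[str] = field(default_factory=list)  # edge cases mentioned in interviews

    def to_context(self) -> str:
        """Render the definition as plain text that can be handed to an AI system."""
        lines = [
            f"KPI: {self.name}",
            f"Definition: {self.definition}",
            f"Sources: {', '.join(self.source_tables)}",
        ]
        lines += [f"Gotcha: {g}" for g in self.gotchas]
        return "\n".join(lines)


# Hypothetical example values, not taken from the project.
net_revenue = KpiDefinition(
    name="net_revenue",
    definition="Invoiced amount minus refunds, excluding internal test orders.",
    source_tables=["orders", "refunds"],
    gotchas=["Orders with customer_type = 'internal' are test orders."],
)

context = net_revenue.to_context()
print(context)
```

The point of the sketch is that the "gotchas" live next to the definition, so whatever builds the model sees the edge cases explicitly instead of having to guess them from the source schema.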

That’s a new kind of data engineering.

2) Data governance becomes a core function

Not governance in the sense of control, restriction and red tape.

But:

  • who (or which agent) can access what
  • when data is used
  • how decisions are made

It’s no longer just people querying dashboards.

It’s:

  • agents making decisions
  • systems triggering actions
  • AI operating autonomously

The challenge is:

→ ensuring the right entity (human or agent)
→ gets the right data
→ at the right time
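As a rough illustration of "right entity, right data, right time", the check can be sketched as a small rule lookup in Python. This is a toy under stated assumptions: AccessRule, is_allowed, the agent and dataset names, and the time windows are all hypothetical, and a real setup would rely on the access controls of the warehouse or platform itself.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class AccessRule:
    """Hypothetical rule: who may read which dataset, and during which daily window."""
    principal: str  # a human user or an agent id
    dataset: str
    start: time
    end: time


def is_allowed(rules, principal, dataset, at):
    """Return True if any rule grants this principal this dataset at the given time of day."""
    return any(
        r.principal == principal and r.dataset == dataset and r.start <= at <= r.end
        for r in rules
    )


# Hypothetical principals and datasets, for illustration only.
RULES = [
    AccessRule("pricing_agent", "sales_mart", time(6, 0), time(22, 0)),
    AccessRule("cfo", "finance_mart", time(0, 0), time(23, 59)),
]

agent_sales = is_allowed(RULES, "pricing_agent", "sales_mart", time(9, 30))
agent_finance = is_allowed(RULES, "pricing_agent", "finance_mart", time(9, 30))
agent_night = is_allowed(RULES, "pricing_agent", "sales_mart", time(23, 0))
print(agent_sales, agent_finance, agent_night)
```

Humans and agents go through the same check here, which is the shift described above: the agent is just another principal that needs explicit rules.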

3) Preparing data becomes commoditized

This is the uncomfortable part.

Tasks like:

  • cleaning data
  • writing transformations
  • building pipelines

are rapidly becoming:

→ automated
→ cheap
→ expected

Semantic layers will become:

  • generated through conversation
  • built incrementally via queries
  • maintained dynamically

Bottom line

The future of data isn’t about better pipelines.

It’s about better inputs.

If you focus only on:

  • modeling
  • transforming
  • visualizing

you’re optimizing a layer that is being automated away.

The real advantage will come from:

  • capturing high-quality data
  • making implicit knowledge explicit
  • controlling how AI uses that data

Because in the end:

AI won’t replace data warehouses.

It will build them based on the data you give it.

And that shifts the game entirely.

Cheers,

Sebastian

P.S.: 👉 What's your take? Feel free to reply to this email. I would love to hear your thoughts and also your experience in building warehouses and semantic layers with AI.

