“Tests were so complex that maintaining them was harder than maintaining the workflows they were supposed to test.”
This was our reality at Rocket Internet, where we went from one extreme to the other in our testing journey. As the Head of Business Analytics responsible for building data infrastructure across portfolio companies such as Zalando and Delivery Hero, I’ve seen firsthand how testing can make or break your data stack.
Too many analytics and data engineers skip testing altogether, assuming it’s QA’s responsibility.
Others, like my team at Rocket Internet, overcompensate with hundreds of failing tests that eventually get ignored.
The sweet spot lies in between: a minimum viable test suite that catches the most critical issues without becoming a maintenance burden.
Before we jump into your Minimum Viable Test Suite, let’s have a look at a few general guidelines when setting up tests.
Before jumping in, it’s worth spending five minutes on my data warehouse layering guide so that you understand the terms I reference in this episode. You can also continue reading and just look them up when you need them.
Ok, now let's get into this:
1 Tests should only use simple SQL statements. If a test requires many lines of code or complex nested queries, it will likely have high maintenance costs.
2 Use different levels of severity for tests and define what they mean (for example: Warnings should be reviewed but don’t stop us from committing and merging code; Errors must not be committed to production).
3 Tests should only trigger Errors or Warnings when failing to act on the alert would cause a real problem. For example, if the source is missing one out of 10 million rows, that’s not a serious problem and it shouldn’t trigger a Warning, let alone an Error. Too many Errors and Warnings lead to notification fatigue, and no one will act on the notifications.
4 Always test for the root cause of an issue, not for the symptom. For example, if you know that a manual process somewhere leads to an increase in CoGs (cost of goods sold) because a user adds duplicated product IDs to a manual input sheet, don’t write a test for unusual fluctuations in CoGs; write a test that checks for duplicated product_ids.
5 Prevention > Detection. Before writing a test that checks for a known issue, always ask yourself if that issue can be prevented. The example above could be prevented by adding a key constraint to product IDs or, even better, by making it hard or impossible to add duplicate IDs in the manual input sheet (e.g. by adding a duplicate checker to the sheet).
6 Clearly separate tests of data sources from tests of processes in the data warehouse. Adding to points 4 and 5, make sure it’s clear whether you’re testing issues under your control or under the control of a data provider. Ideally, you prevent source issues or outsource source tests to the data provider (Data Contracts).
7 Always run tests before committing. Establish rules about what happens if tests fail (e.g. no commit when there is a single Error; Warnings don’t block a commit).
8 Don’t design tests that depend on each other. For example, if you verify that the customer count between the data marts layer and the preprocess layer is consistent, you don’t also need to verify that the customer count between the objects layer and the data marts layer is consistent (if one fails, the other will automatically fail too).
9 Each test should be documented with information about what it does, what the expected result is, what needs to be done if the result is not as expected, and, if known, what the potential causes are (e.g. revenue gets wrongly increased by X%). Ideally, this documentation lives right in the notification sent to the person who needs to act.
10 Avoid tests that are subject to frequent change. Checking that the VAT rate on German transactions is always 19% can make sense if the payment provider has a history of not providing VAT amounts, which leads to wrong net revenues; checking that the price of a certain item is always X will cause high maintenance costs if the price changes often.
11 Separate tests from business alerts. This is similar to point 6. For business alerts, it’s typically business users who need to act, not the data team. Distinguish between alerts triggered by issues in the data pipelines and alerts triggered by changes in business performance.
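To make guidelines 1, 2, and 4 concrete, here is a minimal sketch of a root-cause test: instead of alerting on CoGs fluctuations, it checks the manual input sheet for duplicated product IDs and maps the result to a severity level. The `manual_input` table and its columns are hypothetical, and SQLite stands in for your warehouse so the snippet is runnable.

```python
import sqlite3

# Hypothetical manual input sheet loaded into a staging table (illustrative data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE manual_input (product_id TEXT, cogs REAL);
    INSERT INTO manual_input VALUES ('P1', 9.99), ('P2', 4.50), ('P1', 9.99);
""")

# Test the root cause (duplicate product_ids), not the symptom (CoGs jumps).
# Simple SQL only: one GROUP BY, no nested queries.
duplicates = conn.execute("""
    SELECT product_id, COUNT(*) AS n
    FROM manual_input
    GROUP BY product_id
    HAVING COUNT(*) > 1
""").fetchall()

# Severity mapping: an Error blocks the commit; a Warning is only reviewed.
severity = "Error" if duplicates else "Pass"
print(severity, duplicates)
```

Per guideline 5, the even better move is preventing these duplicates at the source, e.g. with a uniqueness check inside the sheet itself.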
Let’s explore the essential tests for each layer of your data warehouse that will cover 80-90% of potential issues.
Source and Preprocess Layer
Key tests to implement:
→ Stale Data Detection
→ Manual Entry Validation
→ Schema Change Monitoring
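A stale data check can be as simple as comparing the newest load timestamp against an agreed freshness SLA. This sketch assumes a hypothetical `src_orders` table with a `loaded_at` column and a 24-hour SLA; adapt both to your sources.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_orders (order_id TEXT, loaded_at TEXT)")

# Simulate a source whose last load happened two days ago (illustrative data).
stale_ts = (datetime.now(timezone.utc) - timedelta(days=2)).isoformat()
conn.execute("INSERT INTO src_orders VALUES ('O1', ?)", (stale_ts,))

# Stale data detection: warn when the newest row is older than the allowed lag.
max_loaded = conn.execute("SELECT MAX(loaded_at) FROM src_orders").fetchone()[0]
max_age = timedelta(hours=24)  # assumed freshness SLA for this source
is_stale = datetime.now(timezone.utc) - datetime.fromisoformat(max_loaded) > max_age
print("Warning: stale source" if is_stale else "Pass")
```

Since a late source load is the data provider’s responsibility, this is a good candidate for a Warning rather than an Error, or for outsourcing via a Data Contract.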
Objects Layer
Essential validations:
→ Primary Key Integrity
→ Referential Integrity
→ Technical Debt Monitoring
→ Structured Data Validation
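Primary key and referential integrity checks on the objects layer follow the same simple-SQL pattern: one query for duplicate keys, one anti-join for orphaned foreign keys. Table and column names below (`obj_customers`, `obj_orders`) are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE obj_customers (customer_id TEXT);
    CREATE TABLE obj_orders (order_id TEXT, customer_id TEXT);
    INSERT INTO obj_customers VALUES ('C1'), ('C2'), ('C2');  -- duplicate key
    INSERT INTO obj_orders VALUES ('O1', 'C1'), ('O2', 'C9'); -- orphan row
""")

# Primary key integrity: customer_id must be unique.
pk_violations = conn.execute("""
    SELECT customer_id FROM obj_customers
    GROUP BY customer_id HAVING COUNT(*) > 1
""").fetchall()

# Referential integrity: every order must point to an existing customer.
orphans = conn.execute("""
    SELECT o.order_id FROM obj_orders o
    LEFT JOIN obj_customers c ON o.customer_id = c.customer_id
    WHERE c.customer_id IS NULL
""").fetchall()

print(pk_violations, orphans)
```

Both failures are under your control, not the data provider’s, so they belong in the Error category: broken keys in the objects layer silently corrupt every downstream join.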
Data Marts Layer
Critical checks:
→ Primary Key Verification
→ Sanity Checks
→ Historical Data Stability
→ Cross-Reference Validations
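A cross-reference validation compares a key metric across layers, e.g. the customer count in the data mart against the layer it was built from. Following guideline 8, you only need this check once per chain, not between every pair of layers. Table names here are again illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE objects_customers (customer_id TEXT);
    CREATE TABLE dm_customers (customer_id TEXT);
    INSERT INTO objects_customers VALUES ('C1'), ('C2'), ('C3');
    INSERT INTO dm_customers VALUES ('C1'), ('C2');  -- one customer lost in transit
""")

# Cross-reference validation: the data mart must not lose or invent customers
# relative to the upstream layer it is built from.
upstream = conn.execute("SELECT COUNT(*) FROM objects_customers").fetchone()[0]
mart = conn.execute("SELECT COUNT(*) FROM dm_customers").fetchone()[0]
diff = upstream - mart

severity = "Error" if diff != 0 else "Pass"
print(severity, diff)
```

A non-zero difference here points at a bug in your own transformations, so it should block the commit rather than merely warn.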
Testing doesn’t have to be overwhelming or insufficient. Start with these essential tests that cover most critical scenarios. They form a foundation that’s maintainable and effective.
Remember: the goal isn’t to test everything but to test the right things. Your future self (and your stakeholders) will thank you for investing in this minimum viable test suite today.
Build it into your development workflow. Make it a habit. Because finding out about data issues from your stakeholders is much more expensive than catching them early through systematic testing.
Join 500+ future data leaders for tips, strategies, and resources to build impactful data teams and live a better life 🏝
Data Action Mentor Masterclass: Create massive business impact with your data team.
This class is built for ambitious data professionals and data leaders. I spent 17 years building data teams for high-growth companies such as Rocket Internet, Zalando, Takeaway.com, Lazada (acquired by Alibaba), and many more. In this class, I am sharing all my lessons, failures, and successes so that you can make your stakeholders and CEO happy and accelerate your data career.
Impactful Data Teams for Scale-ups
I build data infrastructure and data teams with immediate business impact for global b2c scale-ups and grown-ups in e-commerce, insurance, fintech, and consumer subscription. My proven approach has helped dozens of scale-ups. I build the infrastructure at a fixed price and then empower the business to move rapidly from data to action. If you know a consumer internet scaleup that needs an impactful data team, hit me up!
I run Data Audits to support you in transforming your data team into a strong business partner. You will get a tailor-made list with action items that will help you create massive business impact with your data team.
I am committed to adding actionable, free content to our Data Action Mentor knowledge base to help you on your journey to create massive business impact with your data team