How to stop data firefighting… and start data engineering
Peter Hanssens
—Apr 09, 2024
Data engineering has evolved rapidly, particularly since AI became such a hot topic in the past few years. The intense focus on AI and machine learning requires businesses to invest far more seriously in the systems and tooling that ingest and process data, and to ensure that data is accessible and of high quality.
The evolving role of the data engineer
In my view, the role of the data engineer is evolving into that of a data platform engineer. Data engineering isn’t just about writing Python scripts and pushing data from one system to another; it’s a much more strategic role.
Besides creating the infrastructure, tools and frameworks to support data systems, a data platform engineering team should be building a robust DataOps practice. That might involve implementing CI/CD for data pipelines, setting up alerts so the team is proactively notified of quality and cost issues, and automating things so that data consumers can more easily self-serve. Ultimately, it’s about moving from a reactive role to one that enables others in the business to perform better.
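To make that concrete, here is a minimal sketch of the kind of automation a DataOps practice introduces: a small pipeline transformation paired with a unit test that a CI/CD system could run on every change before it reaches production. The function and field names here are illustrative, not from any particular pipeline.

```python
# A hypothetical pipeline step: clean up email addresses in a batch of
# records, with a unit test that CI can run automatically on each commit.

def normalise_emails(records):
    """Lowercase and strip email addresses, dropping rows without one."""
    cleaned = []
    for record in records:
        email = record.get("email")
        if email:
            # Keep the rest of the record intact, only fix the email field.
            cleaned.append({**record, "email": email.strip().lower()})
    return cleaned

def test_normalise_emails():
    raw = [{"email": "  Alice@Example.COM "}, {"email": None}]
    assert normalise_emails(raw) == [{"email": "alice@example.com"}]

test_normalise_emails()
print("tests passed")
```

Catching a regression here in CI costs minutes; catching it after a consumer reports broken dashboards costs a firefight.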
From reactive to proactive
It’s not easy to shift from a firefighting role to a more proactive one. Data is an incredibly complex field. With the pace of advancements, the number of tools, technologies and skills that data engineers are expected to know is mind-boggling, and it’s only going to grow. Many data teams are under-resourced and don’t have enough time to get everything done. And when there’s a bug that breaks the system, the data consumers in your business WILL let you know. And they’ll want it fixed ASAP.
For whatever reason – because ETL pipelines are complex, because your data systems have a lot of interdependencies, because there’s still tons of technical debt to be refactored – you’ll never eliminate firefighting completely.
But there are some core principles that, applied consistently, can help uplift data engineering teams into more strategic, value-adding work.
In this blog series, I’ll cover the following topics:
- Documentation: Creating effective documentation to understand and efficiently manage data systems.
- Data tagging: Categorising data for better governance, management and retrieval.
- Data quality checks: Implementing regular checks to ensure data integrity and reliability.
- CI/CD with automated unit and integration tests: Enhancing development practices to ensure robust, error-free deployments.
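As a taste of the data quality topic above, here is a minimal sketch of an automated quality check. It runs completeness and uniqueness checks over a hypothetical list-of-dicts dataset; in a real platform you would run the same logic against a warehouse table and wire any failures into your alerting.

```python
# A hypothetical data quality check: flag missing required values and
# duplicate keys, returning human-readable issues for alerting.

def check_quality(rows, required_fields, key_field):
    """Return a list of issues found in the dataset (empty means clean)."""
    issues = []
    seen_keys = set()
    for i, row in enumerate(rows):
        # Completeness: every required field must be present and non-null.
        for field in required_fields:
            if row.get(field) is None:
                issues.append(f"row {i}: missing value for '{field}'")
        # Uniqueness: the key field must not repeat across rows.
        key = row.get(key_field)
        if key in seen_keys:
            issues.append(f"row {i}: duplicate key '{key}'")
        seen_keys.add(key)
    return issues

orders = [
    {"order_id": 1, "amount": 9.99},
    {"order_id": 2, "amount": None},
    {"order_id": 2, "amount": 4.50},
]
print(check_quality(orders, ["order_id", "amount"], "order_id"))
```

Running a check like this on a schedule, and alerting when the issue list is non-empty, is the difference between finding bad data yourself and hearing about it from an angry stakeholder.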
I’ll offer practical advice, tips and tricks, and share how adopting these simple but effective practices can enhance a data team’s capabilities and overall value to the business. For data engineers, a strong DataOps and DevOps culture helps everything ‘gel together’ and positions us to do more innovative work that helps drive the business forward.
Data teams, it's time to start creating more stable data platforms and end the vicious cycle of constant firefighting. Let's get started!