Streamlining Data Migration for an Energy Giant
Cloud Shuttle successfully transformed a leading energy company's data infrastructure by migrating from Matillion to a modern stack using Snowflake, dbt, and Airflow. In just five months, our team of 2.5 FTEs migrated over 300 models while ensuring zero disruption to business operations. The project included implementing automated orchestration, establishing S3 as a handoff point, and conducting thorough regression testing throughout the migration. This strategic modernisation resulted in a 30% increase in operational efficiency, enabling real-time data accessibility and empowering teams with faster, more reliable data processing capabilities for improved decision-making.
Peter Hanssens
—Oct 24, 2024
Driving Data Transformation for a Leading Insurance Provider
When a leading insurance provider found their legacy SSIS systems creating bottlenecks and driving up costs, Cloud Shuttle stepped in with a transformative solution. By migrating over 100 SQL models from SSIS to dbt on Snowflake, we helped modernize their entire data infrastructure with just one FTE. The migration eliminated data silos, simplified ETL processes, and reduced infrastructure costs while providing seamless cross-product insights. This strategic transformation not only solved immediate operational challenges but also positioned the client's data infrastructure for future growth and innovation in the competitive insurance landscape.
Peter Hanssens
—Oct 02, 2024
GraphSummit Sydney 2024 recap: Innovations from the frontier of Data and AI
Neo4j's GraphSummit World Tour 2024 brought together 150 industry leaders in Sydney to showcase how Australian and New Zealand organizations are leveraging graph technologies with AI to solve real-world challenges. Highlights included Commonwealth Bank's unveiling of GraphIT, their network infrastructure digital twin, SparkNZ's innovative use of graph databases for RFP automation, and McKinsey's sobering perspective on GenAI implementation challenges. The summit featured practical workshops on architecting graph applications and enabling GenAI breakthroughs with knowledge graphs, while we demonstrated our DataEngBytes chatbot powered by Neo4j, LangChain, and Amazon Bedrock. As graph databases continue to evolve, the event highlighted their growing importance in everything from supply chain optimisation to drug discovery.
Peter Hanssens
—May 13, 2024
Exploring data challenges: Insights from our community survey
Cloud Shuttle's 2024 survey of data professionals across Australia and New Zealand reveals the pressing challenges facing the industry today. Drawing insights from diverse roles including data managers, engineers, architects, and scientists, the survey highlighted four key pain points: data governance and compliance, data quality, observability and tooling, and data integration. Despite varying company sizes and locations, professionals consistently reported struggles with centralized monitoring, unclear governance frameworks, complex integration challenges, and maintaining data quality. While the sample size was modest, these findings underscore the shared obstacles in modern data management and the need for strategic solutions to enhance data practices across the ANZ region.
Peter Hanssens
—May 06, 2024
Snowflake Data Cloud Summit 2024: Builders keynote
Day 3 of Snowflake Summit 2024 delivered an engaging Builders Keynote that showcased the platform's seamless AI and data capabilities through the return of "Tasty Bytes"—Snowflake's fictional food truck company. Through three practical demonstrations, the team illustrated how to implement customer sentiment analysis, create domain-specific chatbots, and perform network graph analysis, all within Snowflake's unified platform. Highlights included Vimeo's real-world implementation of LLMs for video analysis, Cash App's network science applications using RelationalAI, and practical demonstrations of how Snowflake's tools can turn complex data challenges into actionable insights—all while maintaining data security and governance within the platform.
Peter Hanssens
—May 05, 2024
Snowflake summit: platform keynote recap 2024
Day 2 of Snowflake's Data Cloud Summit packed a punch with major platform announcements focused on Enterprise AI capabilities. Co-founder and President of Product Benoit Dageville kicked things off by positioning Snowflake as the ideal Enterprise AI platform, highlighting five critical components: data, elastic compute, world-class AI models, security and governance, and collaboration. The summit unveiled several exciting updates, including Iceberg Tables going GA across all clouds, enhanced data governance features through Snowflake Horizon, and significant AI/ML developments like Snowflake Copilot reaching GA. A standout announcement was Cortex AI's Fine-Tuning entering public preview, allowing users to customize LLMs at one-tenth the cost of training from scratch. With demonstrations from customer success stories like Siemens and Pizza Hut, Snowflake reinforced its commitment to simplicity, efficiency, and robust data foundations for Enterprise AI applications.
Peter Hanssens
—May 04, 2024
Snowflake Data Cloud Summit 2024: opening keynote recap
Live from San Francisco's Moscone Center, Day 1 of Snowflake's Data Cloud Summit kicked off with CEO Sridhar Ramaswamy's bold vision for Enterprise AI democratization. The opening keynote featured major announcements including Polaris Catalog, an open-source catalog for Apache Iceberg, and two significant NVIDIA partnerships—Snowflake Arctic's support for TensorRT-LLM and NeMo Retriever integration with Cortex AI. A highlight was the virtual appearance of NVIDIA CEO Jensen Huang, who discussed how AI capabilities are now advancing at roughly 2x every six months. The day concluded with an insightful panel of Chief Data Officers from JPMorgan Chase, Ericsson, NYC Health + Hospitals, and Booking.com, exploring how different industries are navigating the enterprise AI landscape while balancing innovation with governance and security.
Peter Hanssens
—May 04, 2024
Bringing the DataEngBytes experience into the GenAI era
As the founder of DataEngBytes, one of Australia's largest data engineering conferences, I wanted to enhance our community experience by building an AI-powered chatbot to answer attendees' questions. The challenge? Large Language Models (LLMs) had zero knowledge about our conference. Enter Retrieval-Augmented Generation (RAG), implemented through a knowledge graph approach using Neo4j and Amazon Bedrock. Rather than relying on vector stores' probabilistic nature, we chose knowledge graphs for their deterministic relationships, allowing the chatbot to provide precise answers about conference dates, venues, attending companies, and event specifics. This proof-of-concept demonstrates how expressing organisational data through relationship-driven knowledge graphs can unlock powerful AI applications that truly capture your data's full potential.
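For readers curious what this pattern looks like in practice, here is a minimal, hypothetical sketch of knowledge-graph RAG using LangChain's GraphCypherQAChain with Neo4j and an Anthropic model on Amazon Bedrock. The connection details, Bedrock model ID, and seed data are placeholders, not the actual DataEngBytes graph model.

```python
# Hypothetical sketch: ground an LLM's answers in a Neo4j knowledge graph.
# Assumes the langchain-community and langchain-aws packages and a running
# Neo4j instance; credentials, model ID, and seed data are placeholders.
from langchain_aws import ChatBedrock
from langchain_community.chains.graph_qa.cypher import GraphCypherQAChain
from langchain_community.graphs import Neo4jGraph

graph = Neo4jGraph(url="bolt://localhost:7687", username="neo4j", password="password")

# Seed a couple of illustrative conference facts as nodes and relationships.
graph.query(
    """
    MERGE (c:Conference {name: 'DataEngBytes Sydney'})
    MERGE (v:Venue {name: 'Example Convention Centre'})
    MERGE (co:Company {name: 'Example Co'})
    MERGE (c)-[:HELD_AT]->(v)
    MERGE (co)-[:ATTENDING]->(c)
    """
)
graph.refresh_schema()

llm = ChatBedrock(
    model_id="anthropic.claude-3-sonnet-20240229-v1:0",
    region_name="ap-southeast-2",
)

# The chain prompts the LLM to write a Cypher query against the graph schema,
# runs it, and answers in natural language from the deterministic query result.
chain = GraphCypherQAChain.from_llm(
    llm=llm,
    graph=graph,
    verbose=True,
    allow_dangerous_requests=True,  # required by recent langchain-community versions
)

print(chain.invoke({"query": "Where is DataEngBytes Sydney being held?"})["result"])
```

Because the answer comes from a Cypher query over explicit relationships rather than a nearest-neighbour search over embeddings, the response is grounded in exactly the facts stored in the graph.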
Peter Hanssens
—Apr 30, 2024
Leading a data platform team to DataOps success
Just as DevOps revolutionised software development, DataOps is transforming how we handle data engineering projects. While both share core CI/CD principles, data engineering pipelines face unique challenges around data quality, schema evolution, and consistency. This guide explores how to implement effective CI/CD practices for data projects, from choosing the right tools like Terraform and GitHub Actions to setting up automated testing and monitoring. By embracing DataOps principles, teams can achieve faster deployments, enhanced collaboration, and more reliable data pipelines that scale with business needs—ultimately delivering better quality insights to data consumers with confidence.
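As one concrete illustration of the kind of automated check that belongs in such a pipeline, below is a hypothetical pytest suite that a GitHub Actions workflow could run against a dbt project, failing the build if any model is missing a description or has no tests declared. The models/ layout and the PyYAML dependency are assumptions for the sketch, not specifics from the guide.

```python
# Hypothetical CI gate for a dbt project: fail the pull-request build if any
# model is missing a description or has no tests declared. Assumes PyYAML is
# installed and that models are documented in YAML files under models/.
from pathlib import Path

import pytest
import yaml


def iter_models():
    """Yield (schema_file, model) for every model declared under models/."""
    for schema_file in Path("models").rglob("*.yml"):
        spec = yaml.safe_load(schema_file.read_text()) or {}
        for model in spec.get("models", []):
            yield schema_file, model


MODELS = list(iter_models())


@pytest.mark.parametrize("schema_file, model", MODELS)
def test_model_has_description(schema_file, model):
    assert model.get("description"), (
        f"{model.get('name')} in {schema_file} is missing a description"
    )


@pytest.mark.parametrize("schema_file, model", MODELS)
def test_model_has_at_least_one_test(schema_file, model):
    # dbt tests can live at the model level or on individual columns
    # (the key is 'tests' in older projects and 'data_tests' from dbt 1.8).
    model_tests = model.get("tests", []) + model.get("data_tests", [])
    column_tests = [
        t
        for column in model.get("columns", [])
        for t in column.get("tests", []) + column.get("data_tests", [])
    ]
    assert model_tests or column_tests, (
        f"{model.get('name')} in {schema_file} declares no dbt tests"
    )
```

In a GitHub Actions workflow, a suite like this would typically run alongside dbt compile and dbt test on every pull request, so schema and quality regressions are caught before anything reaches production.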
Peter Hanssens
—Apr 23, 2024
The perfect recipe: 7 essential data quality checks
Just as no chef would serve a meal made with spoiled ingredients, businesses shouldn't make decisions based on poor-quality data. With bad data costing organizations an average of US$12.9 million annually and the growing importance of reliable data for GenAI applications, implementing robust data quality checks has never been more critical. This guide walks through seven essential data quality checks, covering dimensions such as validity, accuracy, completeness, consistency, uniqueness, and timeliness, along with practical implementation strategies for each. From data validation and completeness checks to monitoring numeric distributions, learn how to transform your data pipelines into a well-oiled kitchen that serves up only the highest-quality data for your business consumers.
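To give a flavour of what such checks can look like in code, here is a small, hypothetical pandas sketch covering four of those dimensions (completeness, uniqueness, validity, and timeliness); the column names, allowed values, and freshness threshold are invented for the example rather than taken from the guide.

```python
# Hypothetical sketch of a few data quality checks with pandas; the column
# names, allowed status values, and 24-hour freshness threshold are invented
# for illustration.
import pandas as pd


def run_quality_checks(df: pd.DataFrame) -> dict[str, bool]:
    checks = {}

    # Completeness: key business columns must not contain nulls.
    checks["completeness"] = bool(df[["order_id", "status"]].notna().all().all())

    # Uniqueness: the primary key must not contain duplicates.
    checks["uniqueness"] = not df["order_id"].duplicated().any()

    # Validity: status must come from the agreed set of values.
    allowed = {"pending", "shipped", "delivered", "cancelled"}
    checks["validity"] = bool(df["status"].dropna().isin(allowed).all())

    # Timeliness: the newest record should be less than 24 hours old.
    checks["timeliness"] = (
        pd.Timestamp.now(tz="UTC") - df["updated_at"].max() < pd.Timedelta(hours=24)
    )

    return checks


if __name__ == "__main__":
    now = pd.Timestamp.now(tz="UTC")
    orders = pd.DataFrame(
        {
            "order_id": [1, 2, 3],
            "status": ["pending", "shipped", "delivered"],
            "updated_at": [now - pd.Timedelta(hours=h) for h in (3, 2, 1)],
        }
    )
    for name, passed in run_quality_checks(orders).items():
        print(f"{name}: {'PASS' if passed else 'FAIL'}")
```

The same assertions translate directly into dbt tests or warehouse-native checks, so they can run automatically on every pipeline execution rather than ad hoc.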
Peter Hanssens
—Apr 22, 2024