AWS Sydney Summit 2024: A Cloud Shuttle re:cap
Peter Hanssens
Apr 16, 2024
The AWS Sydney Summit is the annual get-together in the Asia Pacific (Sydney) region for news, announcements and releases from Amazon Web Services. In 2024, it ran over 3 days:
- Partner Summit: This is the day when service providers (think consultancies that help you migrate to the cloud, or SaaS products that improve your cloud experience) spend the afternoon and evening with the AWS Partner team. Each year during the AWS Partner Summit, the Partner team sets out its vision for the coming year and any new programs planned to accelerate customers’ journeys. In 2024, the Partner Summit was held at the Sydney Hilton Hotel before branching out to various dinners with account managers and partners.
- Builders Day: This is very much the main event. It’s specifically targeted at developers, or what AWS calls “builders”. Sessions are mostly technical and designed for practitioners with several years of experience.
- Innovation Day: Aimed at business leaders and newcomers to the cloud, this day is all about explaining the art of the possible and a vision for how cloud technologies can enable the modern enterprise.
Because I’m an unapologetic techie at heart, I’ll focus this article on my recap and reflections from Day 2 – Builders Day.
Builders Day
The big news from the Summit is that Amazon Bedrock is now available in Sydney. First announced at re:Invent 2023, the Sydney launch means the region joins US East and West, Asia Pacific (Tokyo and Singapore) and Europe (Frankfurt and Paris) in getting its hands on the tech.
Lots of companies struggle to build their own GenAI apps for a variety of reasons (such as complexity, cost or time constraints), so AWS introduced Amazon Bedrock to solve this problem. Bedrock is a fully managed service for building GenAI apps using foundation models from AI companies such as Meta, Mistral AI and others. Many Aussie organisations have data sovereignty concerns to deal with, so being able to use Bedrock onshore is a big win for adoption.
At the time of writing, even more impressive capabilities have rolled out, including model evaluation reaching general availability, guardrails, and the ability to import custom models. This is definitely a space worth watching.
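To make that concrete, here’s a minimal sketch of calling a model through Bedrock from the Sydney region with boto3. The model ID and prompt are illustrative, and which models you can invoke depends on what’s enabled in your account:

```python
import json

import boto3

# Bedrock's data-plane API lives in the "bedrock-runtime" client,
# separate from the "bedrock" control-plane client.
client = boto3.client("bedrock-runtime", region_name="ap-southeast-2")

# Request body for an Anthropic Claude model; other model families
# expect different body schemas.
body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "In one paragraph, why does data residency matter?"}],
})

response = client.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # illustrative model ID
    body=body,
)

# The response body is a streaming blob; read and decode it.
result = json.loads(response["body"].read())
print(result["content"][0]["text"])
```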
Expo floor
It’s no secret that one of the most valuable tracks at any conference or summit is the hallway track. I loved my time walking around the expo hall and had really good chats with the teams at ClickHouse, Cribl and Neo4j. Snowflake had a spectacular booth (side note: I love their polar bear mascot). It was great to see the folks at Databricks as well.
Sessions and talks
GenAI
As you can imagine, a lot of the sessions at the Summit focussed on how best to leverage generative AI. Some of my highlights were:
- Mohammed Ali’s talk on leveraging OpenSearch as a vector database
- Masudur Rahaman Sayem and Abhaya Chauhan gave a fantastic talk on using Apache Flink with Kafka to bring real-time analytics and generative AI together for a consumer sentiment use case.
My key takeaways
What are my biggest takeaways from the summit when it comes to GenAI?
- Amazon Bedrock: Honestly, Bedrock is really, really cool. If you haven’t already, you’ll want to check out its APIs (the sketch earlier in this post is a starting point).
- RAG: Next, Retrieval-Augmented Generation (or RAG for short) is a big thing. It’s a technique for making your GenAI models more accurate and reliable by referencing external authoritative sources beyond their training data; in other words, making your LLMs much more trustworthy. High-quality data is both a prerequisite for and an outcome of RAG. OpenSearch as a vector database is well worth a look, as it can help you implement semantic search, RAG and recommendation engines (see the first sketch after this list).
- Using Apache Flink and Kafka for real-time sentiment analysis: Apache Flink lets you run streaming analytics on top of Kafka, and Masudur and Abhaya’s talk showcased how you can bring generative AI into that process (though they reminded us to factor in cost, as running an LLM API call for every event can get expensive; the second sketch after this list makes that trade-off concrete).
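First, here’s a minimal sketch of the vector-search building block behind RAG, assuming an OpenSearch cluster with the k-NN plugin and the opensearch-py client. The host, index name and toy three-dimensional vectors are all illustrative; real embeddings come from an embedding model and have hundreds of dimensions:

```python
from opensearchpy import OpenSearch

# Illustrative connection details; point this at your own cluster.
client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

# An index that stores documents alongside their embedding vectors.
client.indices.create(
    index="docs",
    body={
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "text": {"type": "text"},
                "embedding": {"type": "knn_vector", "dimension": 3},
            }
        },
    },
)

# Index a document with its (toy) embedding.
client.index(
    index="docs",
    body={"text": "Amazon Bedrock is now available in Sydney", "embedding": [0.1, 0.7, 0.2]},
    refresh=True,
)

# Find the nearest neighbours of a query embedding. In a RAG pipeline,
# these hits become the authoritative context passed to the LLM.
results = client.search(
    index="docs",
    body={"size": 3, "query": {"knn": {"embedding": {"vector": [0.1, 0.6, 0.3], "k": 3}}}},
)
for hit in results["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["text"])
```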
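Second, the talk itself used Apache Flink for the streaming layer; as a deliberately simplified stand-in, here’s a plain Kafka consumer that calls a Bedrock model once per event to label sentiment. The topic, broker and model ID are illustrative, and the one-call-per-event shape is exactly why the speakers’ cost warning matters; in practice you’d batch, sample or pre-filter events first:

```python
import json

import boto3
from kafka import KafkaConsumer  # from the kafka-python package

bedrock = boto3.client("bedrock-runtime", region_name="ap-southeast-2")

consumer = KafkaConsumer(
    "customer-feedback",  # illustrative topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

def classify_sentiment(text: str) -> str:
    """Ask a small, cheaper model for a one-word sentiment label."""
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 8,
        "messages": [{
            "role": "user",
            "content": "Classify this feedback as positive, negative or neutral. "
                       f"Reply with one word only.\n\n{text}",
        }],
    })
    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",  # illustrative model ID
        body=body,
    )
    return json.loads(response["body"].read())["content"][0]["text"].strip().lower()

for event in consumer:
    # One model invocation per event: this is the cost hot spot.
    print(classify_sentiment(event.value["comment"]))
```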
Data engineering
From a pure data stack perspective, I enjoyed the following sessions:
- Chris Horder did a fantastic talk on large-scale transactional data lake formats.
- Allison Quinn and Subhas Gosh did a great presentation on simplifying metadata ingestion.
- Surendar Munimohan and Paul Villena talked about Zero-ETL, which looks super exciting.
- Francis McGregor-Macdonald ran a great whiteboard session on data governance.
- Partha Sahoo presented an interesting session about data observability.
My key takeaways
- Data lakes are gaining maturity. As Chris Horder mentioned in his talk, data lakes are being used for far more than set-and-forget long-term data storage. Big companies are using them for transactional use cases, and his talk was a fascinating exploration of the considerations those use cases bring with them.
- Real-time data analytics and ML. My recommendation: if you want to tackle real-time analytics and machine learning use cases, it’s time to look at a combination of Lake Formation, the Iceberg table format, Athena for querying, and Amazon DataZone for governance (the first sketch after this list shows the Iceberg-on-Athena piece).
- The promise of Zero-ETL: If you’re all in on AWS with your applications and using services like DynamoDB, OpenSearch and Aurora, Zero-ETL can remove much of the complexity of ETL so that you can put your data engineers to better use. It’s starting to land now, so it’s time to give it a go (the second sketch after this list shows what setting up an integration looks like).
- The importance of data observability. Partha Sahoo’s session was a good reminder that data observability is extremely important for making your data actionable. If real-time use cases rely on the data, you need to make sure it’s good to go! Data observability can be a real challenge in highly distributed architectures, but it’s important to start investing in this space.
- Get deep into data governance. Data governance is all about who has access to what and whether that’s good for your business. Be prepared to go deep on Amazon DataZone in Francis McGregor-Macdonald’s session.
- Don’t let your data lakes turn into data swamps. As Allison Quinn and Subhas Gosh remind us in their talk, data lakes easily become data swamps when you fail to capture metadata and tag your data appropriately. Data tagging and metadata ingestion are table stakes now. If you haven’t implemented them yet… just do it and thank yourself later. Tools like Glue crawlers, Lake Formation and DataZone make this a lot easier, so why not leverage those services? (There’s a crawler sketch below.)
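Here’s a minimal sketch of the Iceberg-on-Athena piece of that stack: creating an Iceberg table and running a row-level update through the Athena API with boto3. The database, bucket and table names are illustrative, and it assumes Athena engine version 3 plus an existing results bucket:

```python
import time

import boto3

athena = boto3.client("athena", region_name="ap-southeast-2")

def run_query(sql: str) -> None:
    """Submit a query and poll until it reaches a terminal state."""
    execution_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": "analytics"},  # illustrative database
        ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},  # illustrative bucket
    )["QueryExecutionId"]
    while True:
        status = athena.get_query_execution(QueryExecutionId=execution_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            print(execution_id, state)
            return
        time.sleep(1)

# Declaring table_type=ICEBERG is what unlocks ACID updates and
# time travel on top of plain S3 storage.
run_query("""
    CREATE TABLE IF NOT EXISTS orders (
        order_id string,
        amount double,
        order_ts timestamp
    )
    LOCATION 's3://my-data-lake/orders/'
    TBLPROPERTIES ('table_type'='ICEBERG')
""")

# Row-level mutations like this are exactly what classic
# Hive-style tables can't do.
run_query("UPDATE orders SET amount = 42.0 WHERE order_id = 'abc-123'")
```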
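And here’s a hedged sketch of standing up an Aurora-to-Redshift zero-ETL integration with boto3’s CreateIntegration API. Every ARN and name below is a placeholder, and both the Aurora cluster and the Redshift target must already exist and meet the zero-ETL prerequisites for your engine version (the DynamoDB and OpenSearch paths use different mechanisms):

```python
import boto3

rds = boto3.client("rds", region_name="ap-southeast-2")

integration = rds.create_integration(
    IntegrationName="orders-zero-etl",  # illustrative name
    # Placeholder ARN for the Aurora cluster to replicate from...
    SourceArn="arn:aws:rds:ap-southeast-2:123456789012:cluster:my-aurora-cluster",
    # ...and for the Redshift namespace the data lands in, continuously,
    # with no ETL pipeline to build or maintain.
    TargetArn="arn:aws:redshift-serverless:ap-southeast-2:123456789012:namespace/my-namespace",
)

print(integration["Status"])  # the integration provisions asynchronously
```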
Read my blog post on how a solid data tagging strategy can stop your data lakes from turning into data swamps.
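On the metadata front, here’s a minimal sketch of pointing a Glue crawler at an S3 prefix so table schemas land in the Glue Data Catalog automatically rather than living in someone’s head. The IAM role, bucket and names are illustrative:

```python
import boto3

glue = boto3.client("glue", region_name="ap-southeast-2")

glue.create_crawler(
    Name="orders-crawler",  # illustrative name
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",  # placeholder role with S3 + Glue access
    DatabaseName="analytics",
    Targets={"S3Targets": [{"Path": "s3://my-data-lake/orders/"}]},
    # These tags apply to the crawler itself; table-level tagging and
    # permissions are handled in Lake Formation and DataZone.
    Tags={"owner": "data-platform", "domain": "sales"},
)

# Run it once; in practice you'd put the crawler on a schedule.
glue.start_crawler(Name="orders-crawler")
```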
Conclusion
I attended with the Cloud Shuttle team and our minds are buzzing with excitement and ideas about all the new technologies we could leverage to help our customers solve their problems and uplift their data capabilities. We’ll take some time to settle down, let the ideas percolate, prioritise some quick wins and lay the foundations for longer-term improvements.
If you haven’t had the chance to attend an AWS Summit before (or you haven’t been to one in a while), it’s well worth making space for it in your calendar. You can even catch up on this year’s talks on-demand if you weren’t able to make it (registration is required, but it’s free). Pick the talks that most pique your interest and I guarantee you’ll walk away with some new learnings. I’ll see you at next year’s Summit!
With all the announcements and innovations shared at the AWS Sydney Summit, it’s clear that the landscape of cloud and data technology is evolving very quickly. If you're looking to enhance your data capabilities, streamline your operations with Zero-ETL, or leverage GenAI for your business, Cloud Shuttle is here to guide you every step of the way.