Getting started with Lakekeeper and Trino OPA (Data Governance)

Peter Hanssens

Mar 03, 2025

Lakekeeper Overview

Lakekeeper is an Apache-licensed, high-performance, secure, and user-friendly Apache Iceberg REST catalog implemented in Rust.

For details on Lakekeeper's features, refer to its documentation.

Trino Overview

Trino (formerly PrestoSQL) is an open-source distributed SQL query engine for interactive analytics across many data sources. Trino is designed for high-performance interactive queries and allows you to query where your data lives without moving or copying it.

Environment setup and configuration

  1. Get the project

git clone https://github.com/cloud-shuttle/governance-lakekeeper-trino

  1. Initialize the environment

If Docker is not installed on your system, download and install it first.

cd governance-lakekeeper-trino
cd examples/trino-opa
docker compose up -d

The following components will be deployed in your Docker environment:

  • Jupyter: for running Python notebooks.
  • Trinodb: Trino query engine.
  • Open Policy Agent: OPA engine.
  • Keycloak: Authentication provider.
  • Lakekeeper: REST catalog.
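
Before moving on, it can help to confirm that the containers actually came up. The sketch below is one possible way to do that from Python: it parses the output of docker compose ps --format json and lists services that are not running. The helper names (parse_compose_ps, not_running, compose_states) are hypothetical and not part of the project, and the service names you will see depend on the project's compose file.

```python
import json
import subprocess


def parse_compose_ps(raw: str) -> dict[str, str]:
    """Map service name -> state from `docker compose ps --format json` output.

    Depending on the Compose version, the output is either a single JSON
    array or one JSON object per line, so both shapes are handled.
    """
    raw = raw.strip()
    if not raw:
        return {}
    if raw.startswith("["):
        entries = json.loads(raw)
    else:
        entries = [json.loads(line) for line in raw.splitlines() if line.strip()]
    return {
        entry.get("Service", entry.get("Name", "?")): entry.get("State", "unknown")
        for entry in entries
    }


def not_running(states: dict[str, str]) -> list[str]:
    """Return the sorted names of services whose state is not 'running'."""
    return sorted(name for name, state in states.items() if state != "running")


def compose_states() -> dict[str, str]:
    """Ask Docker Compose for the current states (run from the compose folder)."""
    result = subprocess.run(
        ["docker", "compose", "ps", "--all", "--format", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_compose_ps(result.stdout)
```

Calling not_running(compose_states()) from the folder where you ran docker compose up -d returns the services to look at. One-shot setup containers, if the compose file defines any, will legitimately show up as exited.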

Resources available:

The following users are configured in Keycloak:

  • user peter, password iceberg
  • user anna, password iceberg

Keycloak user:

  • user admin, password admin

  1. Bootstrap Lakekeeper and create data warehouse

In this step, we will define an admin user for Lakekeeper. User peter will be our admin user, and a default project will be assigned to him. To do this:

  • Open the JupyterLab UI.
  • Follow the steps and run the cells in the 01-Bootstrap.ipynb notebook. All notebooks are in the examples/notebooks folder.
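
To give a feel for what bootstrapping involves, here is a minimal sketch of the kind of HTTP request such a notebook might send. It is not the notebook's actual code. The endpoint path /management/v1/bootstrap, the accept-terms-of-use payload, the base URL, and the helper name build_bootstrap_request are all assumptions that should be checked against the Lakekeeper documentation and the notebook itself. The token would be an access token obtained from Keycloak for user peter.

```python
import json
import urllib.request


def build_bootstrap_request(base_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a bootstrap request for a Lakekeeper server.

    Assumptions: the management API lives under /management/v1 and the
    bootstrap call expects the terms of use to be accepted in the body.
    """
    body = json.dumps({"accept-terms-of-use": True}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/management/v1/bootstrap",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical usage; host, port, and token are placeholders:
# request = build_bootstrap_request("http://lakekeeper:8181", access_token)
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Building the request separately from sending it makes it easy to inspect the URL, headers, and body before anything reaches the server.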

  1. Create a data warehouse

This notebook will create a demo warehouse which will contain all our tables and namespaces.

  • Open and run the 02-Create-Warehouse.ipynb notebook.

  1. Load Data

Once our warehouse is created, let's define a namespace, create some tables, and load data into them. Note that these are Iceberg tables whose data is stored in S3-compatible storage (MinIO). This notebook defines, creates, and loads data into the customers, employees, and suppliers Iceberg tables. The CSV files with this data are in the data folder.

  • Run 03-00-Insert-data.ipynb

  1. Prepare Trino

Open and run the 03-01-Trino-Preparation.ipynb notebook. It installs the Trino client, connects to Trino as a human user, and tests the connection. It also creates a Lakekeeper catalog reference. While running the notebook cells, a login prompt will appear. Log in as user peter. Because peter is the admin user in Lakekeeper, he will be able to execute queries from this notebook.
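
The "catalog reference" is a Trino catalog that points at Lakekeeper's Iceberg REST endpoint. As a rough illustration (not the notebook's actual code), the sketch below builds a CREATE CATALOG statement from a dictionary of connector properties. The property names are standard Trino Iceberg connector settings. The catalog name lakekeeper, the REST URI, and the helper name create_catalog_sql are assumptions. The OAuth2-related properties that a secured setup needs are left out on purpose.

```python
def create_catalog_sql(name: str, properties: dict[str, str]) -> str:
    """Render a Trino CREATE CATALOG statement for the Iceberg connector."""

    def quote(value: str) -> str:
        # SQL string literal: escape single quotes by doubling them.
        return "'" + value.replace("'", "''") + "'"

    rendered = ",\n".join(
        f'    "{key}" = {quote(value)}' for key, value in properties.items()
    )
    return f"CREATE CATALOG {name} USING iceberg\nWITH (\n{rendered}\n)"


# Assumed values: the URI is a placeholder, "demo" is the warehouse created earlier.
example_sql = create_catalog_sql(
    "lakekeeper",
    {
        "iceberg.catalog.type": "rest",
        "iceberg.rest-catalog.uri": "http://lakekeeper:8181/catalog",
        "iceberg.rest-catalog.warehouse": "demo",
    },
)
```

Printing example_sql shows the statement you would run through a Trino connection. Note that Trino only accepts CREATE CATALOG when dynamic catalog management is enabled on the server.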

Access Control Test

Remember that we have two users (peter and anna) configured in Keycloak. Peter is the admin user and therefore has all permissions in the catalog. Anna, however, has no permissions at all.

  1. Test access with admin user peter

Open the 03-02-Trino-Query-with-Multiple-Users.ipynb notebook and run the cells under the Use the Catalog (User 1: Peter) heading. Log in as user peter when prompted.

You will see output from the customers table because peter has admin access on the project.

  1. Test access with anna

Open the 03-02-Trino-Query-with-Multiple-Users.ipynb notebook in an incognito window and run the cells under the Use the Catalog (User 2: Anna) heading. Log in as user anna when prompted. You will see an error because she has no access.

Now, let's grant user anna access to the customers table.

  • Open the Lakekeeper UI and log in as user anna
  • Log out
  • Log in as the admin user (peter)
  • Go to Warehouses in the left menu
  • Select the demo warehouse
  • Under NAMESPACE, select the pii namespace
  • Under TABLES, click the customers table
  • Select PERMISSIONS from the tab menu and click GRANT on the far right.
  • Type anna in the search bar and grant the select permission. Hit save afterwards.
  • Now, go back to the notebook and read data from the customers table as user anna. This time, Trino will be able to fetch the data.

This way, Trino respects the access control and permissions set up in the Lakekeeper data catalog.
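
The behaviour we just observed can be summarised in a toy model. The sketch below is purely illustrative and is not how Lakekeeper, OPA, or Trino implement authorization. ToyCatalog and run_query are hypothetical names, and the model only captures the rule seen above: admins may do everything, while other users need an explicit grant on a table.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCatalog:
    """In-memory stand-in for the catalog's permission store."""

    admins: set[str] = field(default_factory=set)
    # Each grant is a (user, table, action) triple.
    grants: set[tuple[str, str, str]] = field(default_factory=set)

    def grant(self, user: str, table: str, action: str) -> None:
        self.grants.add((user, table, action))

    def is_allowed(self, user: str, table: str, action: str) -> bool:
        # Admins pass every check; everyone else needs a matching grant.
        return user in self.admins or (user, table, action) in self.grants


def run_query(catalog: ToyCatalog, user: str, table: str) -> str:
    """Pretend to run SELECT on a table, asking the catalog for permission first."""
    if not catalog.is_allowed(user, table, "select"):
        raise PermissionError(f"Access Denied: {user} cannot select from {table}")
    return f"rows from {table}"
```

With ToyCatalog(admins={"peter"}), a query by peter succeeds and one by anna raises PermissionError. After catalog.grant("anna", "pii.customers", "select"), the same query by anna succeeds, mirroring the steps in the Lakekeeper UI.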

Further Reading