Getting started with Lakekeeper and Trino OPA (Data Governance)

Peter Hanssens
—Mar 03, 2025

Lakekeeper Overview
Lakekeeper is an Apache-licensed, high-performance, secure, and user-friendly Apache Iceberg REST catalog implemented in Rust.
For a full list of Lakekeeper features, refer to the Lakekeeper documentation.
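As a rough illustration of what "Iceberg REST catalog" means for clients: engines such as Trino talk to Lakekeeper over the HTTP API defined by the Iceberg REST catalog specification, and a client's first call is usually GET /v1/config for a given warehouse. The sketch below only builds that URL. The base URI http://localhost:8181/catalog is an assumption about how this setup exposes the catalog, and demo is the warehouse created later in this guide.

```python
from urllib.parse import urlencode

# ASSUMPTION: base URI under which Lakekeeper serves the Iceberg REST API
# in this setup; adjust to your deployment.
CATALOG_URI = "http://localhost:8181/catalog"


def config_url(catalog_uri: str, warehouse: str) -> str:
    """Build the Iceberg REST 'GET /v1/config' URL for a warehouse."""
    base = catalog_uri.rstrip("/")
    return f"{base}/v1/config?{urlencode({'warehouse': warehouse})}"


if __name__ == "__main__":
    print(config_url(CATALOG_URI, "demo"))
```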
Trino Overview
Trino (formerly PrestoSQL) is an open-source, distributed SQL query engine designed for high-performance interactive analytics across many data sources. It lets you query data where it lives, without moving or copying it.
Environment setup and configuration
Get the project
git clone https://github.com/cloud-shuttle/governance-lakekeeper-trino
Initialize the environment
If Docker is not installed on your system, download and install it first. Then run:
cd governance-lakekeeper-trino
cd examples/trino-opa
docker compose up -d
The following components will be deployed in your Docker environment:
- Jupyter: for running Python notebooks.
- Trino: the Trino query engine.
- Open Policy Agent (OPA): the policy engine.
- Keycloak: the authentication provider.
- Lakekeeper: the Iceberg REST catalog.
The following resources are available:
- Trino UI: http://localhost/ui/
- Jupyter: http://localhost:8888
- Keycloak UI: http://localhost:30080
- Lakekeeper UI: http://localhost:8181
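After docker compose up -d, the containers can take a while to start. A minimal readiness check, assuming only that the URLs listed above are reachable from your machine, polls each one until it answers. Any HTTP response, including an error status, is treated as "up", because a login page or an error page still proves the server is listening. The probe function is injectable so the polling loop can be exercised without the stack running.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Dict

# URLs from the resources list in this guide.
SERVICES = {
    "trino": "http://localhost/ui/",
    "jupyter": "http://localhost:8888",
    "keycloak": "http://localhost:30080",
    "lakekeeper": "http://localhost:8181",
}


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # An HTTP error status still means the server is listening.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout: not up yet.
        return False


def wait_for_services(services: Dict[str, str],
                      probe: Callable[[str], bool] = http_probe,
                      timeout_s: float = 120.0,
                      interval_s: float = 2.0) -> Dict[str, bool]:
    """Poll every service until all respond or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    status = {name: False for name in services}
    while True:
        for name, url in services.items():
            if not status[name]:
                status[name] = probe(url)
        if all(status.values()) or time.monotonic() >= deadline:
            return status
        time.sleep(interval_s)


if __name__ == "__main__":
    print(wait_for_services(SERVICES))
```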
The following users are configured in Keycloak:
- user peter, password iceberg
- user anna, password iceberg
Keycloak administrator account:
- user admin, password admin
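For scripted access outside the notebooks, a token for one of these users can be requested from Keycloak's standard OpenID Connect token endpoint. The sketch below builds, but does not send, a password-grant request for peter. The realm name iceberg and client ID trino are hypothetical placeholders, since this guide does not state them, and the password grant only works if direct access grants are enabled for that client in Keycloak.

```python
import urllib.parse
import urllib.request

# HYPOTHETICAL realm and client ID; check the project's Keycloak config.
KEYCLOAK_URL = "http://localhost:30080"
REALM = "iceberg"
CLIENT_ID = "trino"


def build_token_request(keycloak_url: str, realm: str, client_id: str,
                        username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a Keycloak password-grant token request."""
    url = f"{keycloak_url.rstrip('/')}/realms/{realm}/protocol/openid-connect/token"
    form = urllib.parse.urlencode({
        "grant_type": "password",
        "client_id": client_id,
        "username": username,
        "password": password,
    }).encode()
    return urllib.request.Request(
        url, data=form, method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Sending it requires the stack to be running, e.g.:
#   import json
#   req = build_token_request(KEYCLOAK_URL, REALM, CLIENT_ID, "peter", "iceberg")
#   with urllib.request.urlopen(req) as resp:
#       access_token = json.load(resp)["access_token"]
```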
Bootstrap Lakekeeper
In this step, we bootstrap Lakekeeper and define its admin user. User peter will be the admin user, and a default project will be assigned to him. To do this:
- Open the JupyterLab UI.
- Follow the steps and run the cells in the 01-Bootstrap.ipynb notebook. All notebooks are in the examples/notebooks folder.
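Under the hood, bootstrapping is a single authenticated call to Lakekeeper's management API; as we understand it, the user behind the access token becomes the initial admin. The sketch below assumes the endpoint is POST /management/v1/bootstrap with an accept-terms-of-use flag, which is our reading of the Lakekeeper API; check the Lakekeeper documentation for your version before relying on it. It only builds the request.

```python
import json
import urllib.request

# ASSUMPTION: management API on port 8181 with a /management/v1/bootstrap route.
LAKEKEEPER_URL = "http://localhost:8181"


def build_bootstrap_request(base_url: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) a bootstrap request.

    The owner of access_token (peter in this guide) is the user expected
    to become the initial Lakekeeper admin.
    """
    body = json.dumps({"accept-terms-of-use": True}).encode()
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/management/v1/bootstrap",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )
```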
Create a data warehouse
This step creates a warehouse named demo, which will contain all of our namespaces and tables.
- Open and run the 02-Create-Warehouse.ipynb notebook.
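Creating a warehouse tells Lakekeeper where table data lives and which credentials to use for it. As an assumption-laden sketch, the request body for an S3-compatible MinIO store might look like the one built below, sent to POST /management/v1/warehouse. The field names reflect our reading of Lakekeeper's management API, and the bucket, endpoint, region, project ID, and credentials are hypothetical placeholders.

```python
import json
from typing import Any, Dict


def warehouse_payload(name: str, project_id: str, bucket: str,
                      s3_endpoint: str, access_key: str, secret_key: str) -> Dict[str, Any]:
    """Assemble a create-warehouse body for an S3-compatible (MinIO) store.

    ASSUMPTION: field names follow our reading of Lakekeeper's management
    API; region and all argument values are placeholders.
    """
    return {
        "warehouse-name": name,
        "project-id": project_id,
        "storage-profile": {
            "type": "s3",
            "bucket": bucket,
            "key-prefix": name,
            "endpoint": s3_endpoint,
            "region": "local-01",
            "path-style-access": True,  # MinIO is usually addressed path-style
            "flavor": "minio",
            "sts-enabled": True,
        },
        "storage-credential": {
            "type": "s3",
            "credential-type": "access-key",
            "aws-access-key-id": access_key,
            "aws-secret-access-key": secret_key,
        },
    }


if __name__ == "__main__":
    # All values below are HYPOTHETICAL.
    body = warehouse_payload("demo", "<project-id>", "examples",
                             "http://minio:9000", "<access-key>", "<secret-key>")
    print(json.dumps(body, indent=2))
```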
Load Data
Once the warehouse is created, we define a namespace, create some tables, and load data into them. These are Iceberg tables whose data is stored in S3-compatible storage (MinIO). The notebook defines and creates the customers, employees, and suppliers Iceberg tables and loads data into them from the CSV files in the data folder.
- Run the 03-00-Insert-data.ipynb notebook.
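The insert notebook's exact approach is not described here, so the following is only one hypothetical way to load a CSV file through Trino: read the header and rows with Python's csv module and build a parameterized INSERT using qmark (?) placeholders, the parameter style used by the trino Python client. The table name lakekeeper.pii.customers is a placeholder, and every value stays a string, so typed columns would need casts or conversion.

```python
import csv
import io
from typing import Iterable, List, Sequence, Tuple


def quote_ident(name: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def insert_statement(table: str, columns: Sequence[str]) -> str:
    """Build a parameterized INSERT with one qmark ('?') placeholder per column."""
    cols = ", ".join(quote_ident(c) for c in columns)
    marks = ", ".join("?" for _ in columns)
    return f"INSERT INTO {table} ({cols}) VALUES ({marks})"


def csv_to_insert(table: str, lines: Iterable[str]) -> Tuple[str, List[List[str]]]:
    """Read CSV text with a header row; return the SQL and one parameter list per row."""
    reader = csv.reader(lines)
    header = next(reader)
    rows = [row for row in reader if row]  # skip blank lines
    return insert_statement(table, header), rows


if __name__ == "__main__":
    # HYPOTHETICAL table name and sample data, for illustration only.
    sample = io.StringIO("id,name\n1,Ann\n2,Bob\n")
    sql, rows = csv_to_insert("lakekeeper.pii.customers", sample)
    print(sql)
    print(rows)
```

Each row could then be passed as the parameter list to cursor.execute(sql, row) on a trino.dbapi cursor. One statement per row is slow, which is acceptable for small demo files but not for bulk loads.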
Prepare Trino
Open and run the 03-01-Trino-Preparation.ipynb notebook. This notebook installs the Trino Python client, connects to Trino as a human user, and tests the connection. It also creates a Trino catalog that references Lakekeeper. While the cells run, a login prompt will appear; log in as user peter. Because peter is the Lakekeeper admin user, he can execute queries from this notebook.
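Creating a catalog reference from a notebook relies on Trino's dynamic catalog management (catalog.management=dynamic), which enables the CREATE CATALOG statement. The helper below renders such a statement for the Iceberg connector's REST catalog type. The property names are Trino Iceberg connector options; the catalog name lakekeeper, the URI, and the OAuth2 credential are hypothetical values for this setup, and a working configuration would also need S3 file-system properties, which are omitted here.

```python
from typing import Dict


def sql_literal(value: str) -> str:
    """Render a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def create_catalog_sql(name: str, connector: str, props: Dict[str, str]) -> str:
    """Render a Trino CREATE CATALOG statement (requires catalog.management=dynamic)."""
    body = ",\n    ".join(f'"{key}" = {sql_literal(val)}' for key, val in props.items())
    return f"CREATE CATALOG {name} USING {connector}\nWITH (\n    {body}\n)"


# Property names are Trino Iceberg connector options; the values are
# HYPOTHETICAL for this docker compose setup (file-system settings omitted).
LAKEKEEPER_PROPS = {
    "iceberg.catalog.type": "rest",
    "iceberg.rest-catalog.uri": "http://lakekeeper:8181/catalog",
    "iceberg.rest-catalog.warehouse": "demo",
    "iceberg.rest-catalog.security": "OAUTH2",
    "iceberg.rest-catalog.oauth2.credential": "trino:CHANGE_ME",
}

if __name__ == "__main__":
    print(create_catalog_sql("lakekeeper", "iceberg", LAKEKEEPER_PROPS))
```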
Access Control Test
Recall that two users, peter and anna, are configured in Keycloak. Peter is the admin user and therefore has all permissions in the catalog, whereas anna has no permissions at all.
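With Trino's OPA access-control plugin, Trino sends OPA a JSON document for each operation, containing the user's identity and the requested action, and expects a boolean result back. The toy decision function below uses a simplified version of that input shape (context.identity.user, action.operation, action.resource.table) to mimic the behaviour described here: the admin can do anything, and a non-admin can select from a table only after a grant. The in-memory grant store and the catalog name lakekeeper are hypothetical; in this stack the permissions are managed in Lakekeeper, not in a Python dictionary.

```python
from typing import Any, Dict, Set, Tuple

TableRef = Tuple[str, str, str]  # (catalog, schema, table)

# HYPOTHETICAL permission data; in the real stack, grants live in Lakekeeper.
ADMINS: Set[str] = {"peter"}
SELECT_GRANTS: Dict[str, Set[TableRef]] = {"anna": set()}
ALWAYS_ALLOWED = {"ExecuteQuery"}  # let any authenticated user start a query


def decide(request: Dict[str, Any]) -> Dict[str, bool]:
    """Toy allow/deny decision over a Trino OPA-style request document."""
    inp = request["input"]
    user = inp["context"]["identity"]["user"]
    action = inp["action"]
    if user in ADMINS or action["operation"] in ALWAYS_ALLOWED:
        return {"result": True}
    if action["operation"] == "SelectFromColumns":
        t = action["resource"]["table"]
        ref = (t["catalogName"], t["schemaName"], t["tableName"])
        return {"result": ref in SELECT_GRANTS.get(user, set())}
    return {"result": False}  # deny anything not explicitly handled


def select_request(user: str, catalog: str, schema: str, table: str) -> Dict[str, Any]:
    """Build a simplified OPA request for a SELECT on one table."""
    return {
        "input": {
            "context": {"identity": {"user": user, "groups": []}},
            "action": {
                "operation": "SelectFromColumns",
                "resource": {"table": {"catalogName": catalog, "schemaName": schema,
                                       "tableName": table, "columns": []}},
            },
        }
    }
```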
Test access with admin user peter
Open the 03-02-Trino-Query-with-Multiple-Users.ipynb notebook and run the cells under the "Use the Catalog (User 1: Peter)" heading. Log in as user peter when prompted.
You will see output from the customers table, because peter has admin access to the project.
Test access with anna
Open the same 03-02-Trino-Query-with-Multiple-Users.ipynb notebook in an incognito (private) browser window, so that peter's login session is not reused, and run the cells under the "Use the Catalog (User 2: Anna)" heading. Log in as user anna when prompted.
You will see an error, because anna has no access yet.
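The difference between the two users shows up as either rows or an error from the same query. A small helper that works with any DB-API 2.0 connection makes that explicit. Here Python's built-in sqlite3 stands in for Trino so the sketch runs without the stack; with the trino package, a connection from trino.dbapi.connect could be passed instead, and the query would target a name such as lakekeeper.pii.customers (the catalog name is hypothetical).

```python
import sqlite3
from typing import Any, List, Tuple, Union


def try_query(conn: Any, sql: str) -> Tuple[str, Union[List[tuple], str]]:
    """Run sql on a DB-API 2.0 connection; return ("ok", rows) or ("error", message)."""
    cur = conn.cursor()
    try:
        cur.execute(sql)
        return "ok", cur.fetchall()
    except Exception as exc:  # driver-specific error, e.g. an access-denied failure
        return "error", str(exc)
    finally:
        cur.close()


if __name__ == "__main__":
    # sqlite3 stands in for a Trino connection so this runs locally.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    conn.execute("INSERT INTO customers VALUES (1, 'Ann')")
    print(try_query(conn, "SELECT * FROM customers"))
    print(try_query(conn, "SELECT * FROM suppliers"))
```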
Now, let's grant anna access to the customers table.
- Open the Lakekeeper UI and log in as user anna. This first login makes anna known to Lakekeeper, so that she can be found when granting permissions.
- Log out.
- Log in as the admin user, peter.
- Go to Warehouses in the left menu.
- Select the demo warehouse.
- Under NAMESPACE, select the pii namespace.
- Under TABLES, click the customers table.
- Select the PERMISSIONS tab and click GRANT on the far right.
- Type anna in the search bar, grant the select permission, and click Save.
- Go back to the notebook and read data from the customers table as user anna. This time, Trino is able to fetch the data.
This shows that Trino respects the access-control permissions set up in the Lakekeeper data catalog.
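The UI grant could, in principle, also be scripted against Lakekeeper's management API. The sketch below is speculative: the endpoint path, the writes/deletes body shape, the permission name select, and the user ID format are our assumptions about Lakekeeper's permissions API and may not match your version, and the table ID is a hypothetical placeholder. It only builds the request.

```python
import json
import urllib.request


def build_select_grant(base_url: str, table_id: str, user_id: str,
                       token: str) -> urllib.request.Request:
    """Build (but do not send) a request granting select on one table.

    ASSUMPTION: path and body shape are our reading of Lakekeeper's
    permissions API; table_id and user_id are HYPOTHETICAL placeholders.
    """
    body = {"writes": [{"type": "select", "user": user_id}], "deletes": []}
    url = f"{base_url.rstrip('/')}/management/v1/permissions/table/{table_id}/assignments"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )
```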