Selected Talks

Greenplum is no longer OSS: Change of Operations in Mid-flight

Pilar de Teodoro · Spain

The European Space Agency’s archives, which hold terabytes of space data, rely on open-source software for its flexibility and cost-effectiveness. However, we’ve encountered a significant hurdle: Greenplum, the distributed database we use, has shifted from open source to proprietary.

This change directly impacts our mission, which requires a scale-out database that supports specific PostgreSQL extensions like PgSphere and q3c. While open-source software offers numerous advantages, it also carries risks, including unexpected licensing changes that can disrupt vital operations.

In this presentation, we delve into the critical role of open-source software in our project, explore the challenges introduced by Greenplum’s new status, highlight the inherent risks of relying on open-source solutions, and outline our proactive strategy to transition to a new open-source database—ensuring our mission continues seamlessly.

Isolation Levels and MVCC in SQL Databases: A Technical Comparative Study

Franck Pachot · Switzerland

For developers, understanding Isolation Levels and Multi-Version Concurrency Control (MVCC) is crucial to building scalable applications without compromising data integrity through race conditions. Although the default isolation level in most databases is usually adequate, knowing it and using it correctly in your application is essential.

This technical session explores the implementation of these concepts in popular SQL databases such as Oracle, MySQL, SQL Server, PostgreSQL, and YugabyteDB.

We will also examine why traditional SQL standard definitions may not be suitable for MVCC databases and why this knowledge is essential for developers.

Let’s navigate these subtleties and make informed decisions for our application design without getting too academic.
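
To make the point concrete outside the talk itself, here is a minimal sketch of choosing an isolation level per transaction in PostgreSQL; the accounts table and the check-then-withdraw pattern are invented for illustration:

    -- Default is READ COMMITTED; stronger levels trade throughput for
    -- protection against race conditions, at the cost of retries.
    SHOW default_transaction_isolation;

    BEGIN ISOLATION LEVEL SERIALIZABLE;
    SELECT balance FROM accounts WHERE id = 1;    -- application checks the balance
    UPDATE accounts SET balance = balance - 100   -- ... and withdraws if it looked sufficient
      WHERE id = 1;
    COMMIT;
    -- Under READ COMMITTED two sessions can both pass the check and overdraw;
    -- under REPEATABLE READ or SERIALIZABLE one of them fails with
    -- SQLSTATE 40001 and must be retried by the application.

Either stronger level makes the race visible as a serialization failure, so the application has to be written to retry the transaction.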

Scaling the Summit: Mastering Complex Migrations in the Enterprise

Tania Ash · United Kingdom

Let us imagine: you are working at a bank and somehow your leadership falls in love with Postgres. So they give you the task of migrating 19,000 Oracle-based applications to Postgres and other technologies (where applicable). This is exactly what we faced a little over three years ago.

To solve this problem we came up with an operational framework that helped us set the whole enterprise on a path towards a successful migration. Since this framework can easily be scaled up and down, it is now used by a few other customers as the main operational model for modernizing their data and application ecosystems.

In this presentation we will focus on people, processes and culture, as well as some of the technology critical to the success of such complex and risky programs. We will share some challenges we faced and solved, as well as ones still to be solved. We will focus on the practical aspects of how to set up and structure a legacy-to-Postgres migration program, how to make people believe in it, and how to encourage them to innovate and reuse the results of those innovations at enterprise scale.

Exploring Linux memory usage and IO performance with PostgreSQL

Frits Hoogland · Amsterdam

Linux and containers are similar in that they provide operating system resources such as memory and storage. These resources, along with CPU and networking, are key factors in achieving optimal performance. However, databases in general, and therefore PostgreSQL too, rely on high-performance disk IO to function properly. This is widely known.

In this session, we'll explore how memory allocation at the container or operating system level influences the performance of buffered disk IO. We’ll also examine Linux memory management details and the different ways to perform disk IO. From there, we’ll walk through a number of examples that reveal the typical dynamics of cached and non-cached IOs, as well as the differences between the two.

Understanding these dynamics is important for PostgreSQL, but really for any application that depends on disk IO and has performance requirements.
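
As a small illustration (not taken from the talk, and assuming a pgbench-style table), the PostgreSQL side of this layering can be observed with track_io_timing and EXPLAIN (ANALYZE, BUFFERS); blocks reported as "read" left shared_buffers but may still have been served from the Linux page cache rather than the disk, which is precisely the distinction explored here:

    -- Requires superuser or SET privilege on track_io_timing.
    SET track_io_timing = on;

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT count(*) FROM pgbench_accounts;
    -- Output includes lines such as (exact format varies by version):
    --   Buffers: shared hit=... read=...
    --   I/O Timings: read=...
    -- Very low per-block read timings usually indicate page-cache hits
    -- rather than physical disk reads.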

Building Petabyte-Scale Systems on PostgreSQL

Chris Travers · Jakarta

When I was at Adjust we replaced Elasticsearch with an in-house solution built on PostgreSQL, in order to avoid scalability limits in Elasticsearch which we had hit at about 1 PB in size. This talk covers:

  • The design of a system that scales linearly to vast amounts of data
  • Why and how we patched PostgreSQL to support our endeavor
  • Bagger, an open-source project built on our experience

Oracle to Postgres: Happily Ever After Edition

Mayuresh Suresh Bagayatkar · Prague

Get ready to embark on an epic journey from Oracle to PostgreSQL with my talk, “Oracle to Postgres: Happily Ever After Edition.” This engaging session will dive deep into the post-go-live production world as seen through the eyes of a Postgres DBA. You’ll discover practical strategies, indispensable tools, and real-world tips that ensure your migration is not just successful, but a sleep-friendly endeavour. Whether you’re a DBA, developer, or IT manager, this talk will equip you with the knowledge and confidence to transform your database landscape.

Modeling Data Fast, Slow and Back-In-Time: Change-Aware Dimensional Data Modeling in the Modern Data Stack

Chin Hwee Ong · Singapore

Modeling data with versioning enables data teams to track and reproduce history for fast- and slow-changing data over time. In the era of read-optimized cloud data warehouses and ELT (Extract-Load-Transform) workflows, there is still value in integrating dimensional data modeling practices with modern functional approaches in the modern data stack to design efficient and reproducible data models for analytics reporting use cases that require accurate reporting of point-in-time historical dimensions. In this talk, I will be exploring how we can design history-preserving data models and reproducible data transformation workflows by adopting “change-aware” dimensional modeling practices in the modern data stack.
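
A minimal, hypothetical sketch of what such a history-preserving model can look like in SQL (a classic "Type 2" dimension; table and column names are invented):

    -- Each change closes the current row and opens a new one, so reports can
    -- reproduce the dimension as of any point in time.
    CREATE TABLE dim_customer (
        customer_sk  bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY, -- surrogate key
        customer_id  text        NOT NULL,   -- business key
        segment      text        NOT NULL,
        valid_from   timestamptz NOT NULL,
        valid_to     timestamptz NOT NULL DEFAULT 'infinity',
        is_current   boolean     NOT NULL DEFAULT true
    );

    -- Point-in-time report: the dimension as it looked on 2024-01-01.
    SELECT customer_id, segment
    FROM   dim_customer
    WHERE  valid_from <= DATE '2024-01-01'
    AND    valid_to   >  DATE '2024-01-01';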

Large Object Usage in Production - Patterns, Pitfalls and Recommendations

Robins Tharakan · Australia

Having supported customers with a wide variety of usage patterns over the years, one aspect worth highlighting among other learnings is the use of Large Objects.

This talk goes into the following aspects of Large Objects (LO):

  • Why people end up using Large Objects in Postgres (naming misunderstandings, incoming migrations from other engines, existing migration tools, etc.)
  • What should and shouldn’t be stored in Large Objects: their documented and, more importantly, practical limits on what is and isn’t a good idea.
  • How LO usage should be addressed. Ease of migration is understandable, but what contributes to prioritizing LO maintenance? For example, version end-of-life upgrades, pg_upgrade dependencies, logical migration limitations, etc.
  • Maintenance: databases neglected for years (never upgraded), why vacuuming catalog tables is misdirected, how vacuumlo helps and its pitfalls, and other (faster) ways to optimize cleanup.
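
For readers unfamiliar with the feature, a short illustrative sketch of how large objects are typically referenced and how they become orphaned (table and data are invented); this is the situation vacuumlo exists to clean up:

    -- A large object lives in pg_largeobject; user tables only store its OID,
    -- so deleting the referencing row does NOT delete the object itself.
    CREATE TABLE documents (
        id      bigint PRIMARY KEY,
        blob_id oid                     -- reference to a large object
    );

    INSERT INTO documents
    VALUES (1, lo_from_bytea(0, 'some binary payload'::bytea));

    DELETE FROM documents WHERE id = 1;   -- the large object is now orphaned
    -- vacuumlo (a client utility shipped with PostgreSQL) finds OIDs that are no
    -- longer referenced from any oid column and removes them with lo_unlink().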

PostgreSQL on Kubernetes: Dos and Don'ts

Chris Engelbert · Germany

Running databases in containers has been the biggest anti-pattern of the last decade. The world, however, moves on: stateful container workloads are becoming more common, and so are databases in Kubernetes. People love the additional convenience when it comes to deployment, scalability, and operation.

With PostgreSQL on its way to becoming the world’s most beloved database, there are certainly quite a few things to keep in mind when running it on k8s. Let us evaluate the important Dos and, especially, the Don’ts.

What are containers?

Dave Hughes · United Kingdom

Overview of agenda:

  • What containers are
  • What they are used for
  • History of Containers across multiple Unix variants (most of which are still supported by Postgres)
  • Why is any of this relevant? (Containers are operating systems and software virtualisation)
  • What is virtualisation? + history
  • What is an operating system?
  • What does this really mean? - e.g. differences in libc (Linux as an operating system does not include a C library, which is problematic when trying to target Linux as an operating system), differences in software virtualisation implementations (FreeBSD jails vs Docker), differences in the defaults of Linux distributions, differences in binary output (if compiled with llvm instead of gcc)
  • How does any of this affect Postgres? Postgres has official support for multiple platforms and architectures, and provides officially supported Docker containers for Linux based on multiple libc implementations. More importantly, variations across the board have a direct impact on Postgres’s runtime behaviour
  • Specific real world examples of how deploying Postgres in containers encountered problems, all on officially supported platforms.
  • Closing comments on why containers are good, bad and everything in-between - embrace containers but empower yourself with knowledge!
  • Questions

Deep Dive in a SQL Query

Jesús Espino · Spain

In this talk we will explore the inner workings of the Postgres database from the perspective of the execution of a SQL query.

We will follow the same path as the SQL query, going through the parser, the rule system, the optimizer, the query plan, and finally the executor, which gives the user the result through a cursor.

We will learn about the Query Tree, the different kinds of Joins and Scans that you can have and how they work internally, how the optimizer selects the right query plan, and how the executor runs that query plan.

This knowledge will help you better understand how the database works and why your queries are sometimes slow.
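
For a quick look at the result of that pipeline (not part of the talk, and assuming pgbench tables), EXPLAIN shows the plan the optimizer chose and ANALYZE shows what the executor did with it:

    EXPLAIN (ANALYZE)
    SELECT b.bid, count(*)
    FROM   pgbench_accounts a
    JOIN   pgbench_branches b ON b.bid = a.bid
    GROUP BY b.bid;
    -- Typical plan: Seq Scans feeding a Hash Join, topped by a HashAggregate;
    -- these are the scan and join node types the talk describes, chosen by the
    -- planner from cost estimates and then run node by node by the executor.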

Pattern-matching in LISTEN/NOTIFY

Emanuel Calvo · Spain

In this talk, we’ll explore a proposed feature, currently in the commitfest, for LISTEN based on patterns. The feature may never see the light of day, but it would expand the capabilities of Postgres as a transactional message queue, potentially from v18.

Among the capabilities, it can perform group matching, similar to MQ libraries such as ActiveMQ or Kafka.

We’ll include showcases, profiling, and benchmarks.

A similar talk about message queueing in Postgres was given at PGConf.br in 2012. This one will explore new approaches and potential applications for this feature.
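
For context, a minimal sketch of today's exact-name LISTEN/NOTIFY (all of this already exists; the pattern-based subscription is only the proposal discussed in the talk, so no syntax is shown for it):

    -- Session A: subscribe to a channel by exact name.
    LISTEN orders_new;

    -- Session B: publish with the SQL command, or with the function form,
    -- which allows a dynamically built channel name and payload.
    NOTIFY orders_new, 'order 42 created';
    SELECT pg_notify('orders_new', 'order 43 created');

    -- Session A receives the notifications asynchronously (psql, for example,
    -- prints them after the next command completes).
    UNLISTEN orders_new;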

To deploy a monitoring solution, there are many things to understand based on the three observability pillars: metrics, logs, and traces. All three pillars will be explained from the database perspective. Deploying a solution can be straightforward or can become a nightmare.

Improve query plans by "constifying" expressions

Fabrízio de Royes Mello · Brazil

Sven Klemm · Germany

A query plan needs to be computed to execute a query in PostgreSQL. The query only describes what the result should look like. However, to execute it, PostgreSQL needs to know which operators (e.g., filters, joins) and input data (e.g., a table scan or an index scan) have to be used to produce the desired query result. So the query plan contains the exact information on which steps are needed to execute the query.

When a function is used in a predicate, it may or may not be pre-evaluated during query planning, depending on the function’s volatility. IMMUTABLE functions return the same result for the same input, so the result is pre-evaluated during query planning and the function invocation is replaced by a constant value in the query plan. For functions with any other volatility classification, Postgres does not apply this optimization.

In this talk, we’ll present some techniques we implemented in TimescaleDB, where we hacked the planner to inject or replace a function call with the constant value of its result, giving the planner a chance to produce a better execution plan. This is useful because it can potentially improve query performance on partitioned tables by allowing partitions to be pruned at planning time, significantly reducing execution time.
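
To illustrate the volatility point with stock PostgreSQL (the TimescaleDB planner changes themselves are not reproduced here; table and data are invented):

    CREATE TABLE metrics (ts timestamptz, value float8) PARTITION BY RANGE (ts);
    CREATE TABLE metrics_2024 PARTITION OF metrics
        FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
    CREATE TABLE metrics_2025 PARTITION OF metrics
        FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');

    -- A constant (or IMMUTABLE expression) is evaluated during planning, so the
    -- plan contains only the matching partition:
    EXPLAIN SELECT * FROM metrics WHERE ts >= '2025-06-01'::timestamptz;

    -- now() is only STABLE, so the planner cannot fold it into a constant and
    -- keeps every partition in the plan; pruning can then only happen at run time:
    EXPLAIN SELECT * FROM metrics WHERE ts >= now() - interval '1 day';

Constifying such expressions at planning time, as described above, lets the first (pruned) plan shape be produced even for queries written like the second one.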

63 RPis super cluster running StackGres

Jorge Solórzano · Spain

Guillermo Ruiz Esteban · Spain

Imagine harnessing the power of a supercomputer with the simplicity and affordability of Raspberry Pis. In my upcoming talk, I’ll showcase an innovative project in which a supercluster of 63 Raspberry Pi 3B+ boards is shown in live action, deploying a StackGres cluster and “making it work”.

Join me to explore:

  • The Engineering Feat: How 63 Raspberry Pis are assembled and networked into a cohesive Kubernetes supercluster.
  • StackGres Mastery: Leveraging StackGres to orchestrate PostgreSQL databases, ensuring high availability and performance.
  • Challenges and Triumphs: An honest look at the hurdles we faced, from hardware constraints and network configuration woes to software compatibility issues and performance tuning.


Setting up this supercluster wasn’t a walk in the park. We faced numerous challenges, including:

  • Hardware Constraints: Dealing with the limited processing power and memory of each Raspberry Pi and, worst of all, no storage.
  • Network Configuration: Establishing a reliable and high-speed network to connect all 63 units, and a deceptive appearance.
  • Software Integration: Overcoming compatibility issues between the Raspberry Pi hardware and StackGres, fine-tuning the server, and cheating a little by using a desktop-like amd64 computer as the control-plane.
    • The software integration includes the setup of a ZFS pool, sharing iSCSI volumes, network boot, and more.

Fun with SQL: Migrate from MSSQL to PostgreSQL using SQL

Anthony Rafael Sotolongo León · Spain

Migrating structures and data from another database to PostgreSQL is always a challenge. The most common way is to use a third-party tool that can connect to the origin database, make the fixes, and dump the information into the target database.

MSMOV is a PostgreSQL module to facilitate migration from MSSQL to PostgreSQL using the tds_fdw Foreign Data Wrapper, so you can perform the migration with SQL statements directly from the PostgreSQL database, connecting to the origin through the FDW and making the fixes in PostgreSQL.

This component can migrate the following elements: TABLES, CONSTRAINTS, INDEXES, PKS, FKS, UNIQUES, CHECK CONSTRAINTS, SEQUENCES, SYNONYMS.
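
As a hypothetical sketch of the tds_fdw plumbing such a migration builds on (MSMOV’s own interface is not shown; host, database, and credentials are invented, and IMPORT FOREIGN SCHEMA assumes a reasonably recent tds_fdw):

    CREATE EXTENSION IF NOT EXISTS tds_fdw;

    CREATE SERVER mssql_src
        FOREIGN DATA WRAPPER tds_fdw
        OPTIONS (servername 'mssql.example.com', port '1433', database 'SalesDB');

    CREATE USER MAPPING FOR CURRENT_USER
        SERVER mssql_src
        OPTIONS (username 'migration_user', password 'secret');

    -- Expose the source tables locally, then copy and fix up the data in SQL:
    CREATE SCHEMA staging;
    IMPORT FOREIGN SCHEMA dbo FROM SERVER mssql_src INTO staging;
    CREATE TABLE public.customers AS SELECT * FROM staging.customers;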

Scalable PostgreSQL deployment using Containerization: A Technical Exploration

Premnath J · India

This talk elaborates on a comprehensive exploration of containerization strategies tailored for PostgreSQL, emphasizing practical insights and real-world applications. Attendees can expect to gain a deep understanding of how containerization can transform the deployment and management of PostgreSQL databases in modern IT landscapes.

  • What is containerization? Use of containers in Relational Database Systems
  • Core Concepts of PostgreSQL in Containers
  • Deep dive into Statefulness on PostgreSQL
  • Managing backups on PostgreSQL working with containers
  • How to Maintain Data Persistent Volumes for Data Durability
  • Managing Database Maintenance and Monitoring on Containers
  • Optimizing PostgreSQL Performance.
  • Scalability and High Availability
  • Migration and Deployment Strategies

