We execute approximately 100k DDL statements in a single transaction in PostgreSQL. During execution, the corresponding Postgres connection's memory usage grows steadily, and once it can no longer acquire memory (having grown from 10 MB to 2.2 GB on a machine with 3 GB of RAM), the OOM killer terminates it with signal 9 (SIGKILL), which sends Postgres into recovery mode.
BEGIN;

CREATE SCHEMA schema_1;
-- create table stmts - 714
-- alter table add pkey stmts - 714
-- alter table add constraint fkey stmts - 34
-- alter table add unique constraint stmts - 2
-- alter table alter column set default stmts - 9161
-- alter table alter column set not null stmts - 2405
-- alter table add check constraint stmts - 4
-- create unique index stmts - 224
-- create index stmts - 213

CREATE SCHEMA schema_2;
-- same DDL statements as schema_1, up to schema_7
-- ...
CREATE SCHEMA schema_7;

COMMIT;
Including the CREATE SCHEMA statements, 94,304 DDL statements are executed in total (7 schemas × 13,471 statements each, plus the 7 CREATE SCHEMA statements).
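The total follows directly from the per-schema counts listed in the script outline above:

```python
# Per-schema DDL statement counts, taken from the script outline above.
per_schema_counts = {
    "create table": 714,
    "alter table add pkey": 714,
    "alter table add constraint fkey": 34,
    "alter table add unique constraint": 2,
    "alter table alter column set default": 9161,
    "alter table alter column set not null": 2405,
    "alter table add check constraint": 4,
    "create unique index": 224,
    "create index": 213,
}

schemas = 7
statements_per_schema = sum(per_schema_counts.values())  # 13471
total = schemas * (statements_per_schema + 1)  # +1 for each CREATE SCHEMA
print(total)  # 94304
```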
Like several of its commercial competitors, one of the more advanced features of PostgreSQL is its ability to perform transactional DDL via its Write-Ahead Log design. This design supports backing out even large changes to DDL, such as table creation. You can't recover from an add/drop on a database or tablespace, but all other catalog operations are reversible.
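For example, even a table creation can be backed out, since DDL runs inside the transaction like any other statement:

```sql
BEGIN;
CREATE TABLE demo (id int);
ROLLBACK;
-- The table creation is undone; "demo" does not exist after the rollback.
```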
We have even imported approximately 35 GB of data into PostgreSQL in a single transaction without any problem, so why does the Postgres connection require so much memory when executing thousands of DDL statements in a single transaction?
We can work around it temporarily by adding RAM or allocating swap, but the number of schemas created in a single transaction can grow to 50-60 (roughly 1M DDL statements), which would require 100+ GB of RAM or swap; that isn't feasible right now.
PostgreSQL version: 9.6.10
We sync customers' entire databases from their premises (SQL Server) to the cloud (PostgreSQL). Each customer has a different number of databases. The process: all data is exported from SQL Server as CSV and imported into PostgreSQL using temp tables, COPY, and ON CONFLICT DO UPDATE. Each customer maps to a single database in PG, and each individual database in the customer's SQL Server maps to a schema in that customer's PG database.
So, based on the CSV data, we create the schemas dynamically and import the data into them. Per our application design, the data in PG must be strictly consistent at any point in time, with no partial schemas, tables, or data, so we had to do this in a single transaction. We also sync incrementally from the customer to the cloud DB every 3 minutes, so schema creation can happen either in the first sync or in an incremental sync; the probability of creating this many schemas in the first sync itself is very high.
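The per-table import step described above might look roughly like the following sketch. The table name, key column, and CSV path are hypothetical, and the COPY assumes a server-side file (client-side loads would use psql's \copy instead):

```sql
BEGIN;

-- Stage the CSV into a temp table shaped like the target (hypothetical names).
CREATE TEMP TABLE stg_orders (LIKE schema_1.orders INCLUDING DEFAULTS)
    ON COMMIT DROP;

COPY stg_orders FROM '/path/to/orders.csv' WITH (FORMAT csv, HEADER true);

-- Upsert into the target table, keyed on its primary key.
INSERT INTO schema_1.orders
SELECT * FROM stg_orders
ON CONFLICT (order_id) DO UPDATE
    SET amount     = EXCLUDED.amount,
        updated_at = EXCLUDED.updated_at;

COMMIT;
```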