PostgreSQL
Oracle to Postgres migration:
- Installation steps for a PostgreSQL cluster on an Ubuntu machine.
1. Create the file /etc/apt/sources.list.d/pgdg.list and add a line for the repository:
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt/ `lsb_release -cs`-pgdg main" >> /etc/apt/sources.list.d/pgdg.list'
2. Import the repository signing key, and update the package lists
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo apt-get update
3. Ubuntu includes PostgreSQL in its default repositories; to install a specific version from the PGDG repository added above, use apt-get (or another apt-driven tool):
sudo apt-get install postgresql-11
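After installation, you can verify that the cluster is running. On Debian/Ubuntu, the postgresql-common package provides pg_lsclusters; the commands below assume a default install with the standard postgres superuser:

```shell
# List the clusters managed on this machine (version, port, status, data directory).
pg_lsclusters

# Connect to the default cluster as the postgres OS user and check the server version.
sudo -u postgres psql -c 'SELECT version();'
```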
- Important commands and notes:
- If you do not issue a BEGIN command, then each individual statement has an implicit BEGIN and (if successful) COMMIT wrapped around it. A group of statements surrounded by BEGIN and COMMIT is sometimes called a transaction block.
- By default, huge pages are off in AWS RDS/Aurora.
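The implicit-vs-explicit transaction behavior can be illustrated with a minimal sketch (the table and column names are illustrative):

```sql
-- Without BEGIN, each statement is auto-committed on its own.
UPDATE accounts SET balance = balance - 100 WHERE id = 1;

-- Inside an explicit transaction block, both updates commit or roll back together.
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;  -- or ROLLBACK; to undo both statements
```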
- maintenance_work_mem: specifies the maximum amount of memory to be used by maintenance operations, such as VACUUM, CREATE INDEX, and ALTER TABLE ADD FOREIGN KEY.
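For example, maintenance_work_mem can be raised for a single session before a large index build (the value and object names are illustrative):

```sql
-- Raise the maintenance memory limit only for this session, then build the index.
SET maintenance_work_mem = '1GB';
CREATE INDEX idx_orders_customer ON orders (customer_id);
RESET maintenance_work_mem;
```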
Partitioning:
Foreign Data:
PostgreSQL allows you to access data that resides outside PostgreSQL using regular SQL queries, through foreign data wrappers.
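A minimal postgres_fdw setup might look like the following sketch; the server name, credentials, and table definition are all illustrative:

```sql
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- Define the remote server and map the local role to a remote login.
CREATE SERVER remote_pg FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'remote-host', port '5432', dbname 'salesdb');
CREATE USER MAPPING FOR CURRENT_USER SERVER remote_pg
    OPTIONS (user 'app_user', password 'secret');

-- Expose a remote table locally and query it with regular SQL.
CREATE FOREIGN TABLE remote_orders (
    id integer,
    amount numeric
) SERVER remote_pg OPTIONS (schema_name 'public', table_name 'orders');

SELECT count(*) FROM remote_orders;
```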
- ALTER TABLE ... ALTER COLUMN ... SET STATISTICS: larger values increase the time needed to do ANALYZE, but might improve the quality of the planner's estimates. The default is 100.
- Aurora Query Plan Management (QPM)
- Cluster cache management
- Aurora Global database.
- Aurora serverless
In PostgreSQL, pg_toast refers to a system schema used to store large data values that exceed a certain threshold size. It is primarily used for storing "toasted" data: values of variable-length types such as text, varchar, bytea, and varbit that exceed the predefined size limit.
When a table contains large data values that cannot fit within a single database page, PostgreSQL automatically moves that data to an associated table in the pg_toast schema. This process is known as "toasting" (TOAST stands for The Oversized-Attribute Storage Technique) and may also compress the values. Its main purpose is to optimize storage and improve performance by reducing the amount of disk space required for large data values.
When a large data value needs to be stored, PostgreSQL stores a small placeholder value in the main table and moves the actual data to the corresponding pg_toast table.
- pg_toast tables and their associated data are included in the backup and restore process.
- wal_level controls the level of information written to the WAL. wal_level = replica provides the data necessary for physical replication, enabling streaming replication and hot standby setups. wal_level = logical includes additional metadata required for logical decoding, allowing for more flexible replication scenarios where specific tables or transformations need to be replicated.
- The default_statistics_target parameter accepts a value between 1 and 10000, with a default of 100. It controls the level of detail of the statistics that ANALYZE collects per column (the number of most-common-value entries and histogram buckets). Increasing it allows for more accurate statistics but also increases the time and resources required to analyze the data.
- EXPLAIN ANALYZE combines the functionality of EXPLAIN with the actual execution of the query. PostgreSQL not only displays the execution plan but also executes the query and reports real performance information: the actual time taken by each step, the number of rows processed at each stage, and other relevant statistics. VACUUM (VERBOSE) similarly reports details of its own work.
- Things that prevent VACUUM from removing dead tuples: long-running transactions, abandoned replication slots, orphaned prepared transactions, and a standby server with hot_standby_feedback = on.
- For point-in-time recovery, recovery_target, recovery_target_lsn, recovery_target_name, recovery_target_time, or recovery_target_xid can be used.
- Normally the autovacuum background process takes care of dead tuples. The default settings vacuum a table whenever roughly 20% of it has changed. So if you are seeing bloat frequently, make sure that autovacuum is enabled, and adjust the settings according to your data size. Options for removing existing bloat include:
- VACUUM FULL
- CLUSTER
- pg_repack
- REINDEX
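Before reaching for these tools, it helps to check which tables actually carry the most dead tuples; the statistics views make this a one-line query:

```sql
-- Tables with the most dead tuples (a rough bloat indicator)
-- and when they were last vacuumed manually or by autovacuum.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_vacuum,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```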
12. GIN vs GiST index?
In choosing which index type to use, GiST or GIN, consider these performance differences:
- GIN index lookups are about three times faster than GiST.
- GIN indexes take about three times longer to build than GiST.
- GIN indexes are moderately slower to update than GiST indexes, but about 10 times slower if fast-update support was disabled (see Section 58.4.1 for details).
- GIN indexes are two-to-three times larger than GiST indexes.
As a rule of thumb, GIN indexes are best for static data because lookups are faster. For dynamic data, GiST indexes are faster to update. Specifically, GiST indexes are very good for dynamic data and fast if the number of unique words (lexemes) is under 100,000, while GIN indexes will handle 100,000+ lexemes better but are slower to update.
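For full-text search, the two index types are created like this (the table and column names are illustrative):

```sql
-- GIN: faster lookups, slower to build and update; good for mostly static data.
CREATE INDEX idx_docs_gin ON documents
    USING gin (to_tsvector('english', body));

-- GiST: faster to update; good for frequently changing data.
CREATE INDEX idx_docs_gist ON documents
    USING gist (to_tsvector('english', body));

-- Either index can serve a full-text match query.
SELECT id FROM documents
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'postgres & index');
```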
13. postgres_fdw vs dblink?
While the functionality in the dblink extension is similar to that in the foreign data wrapper, the Postgres foreign data wrapper is more SQL standard compliant and can provide improved performance over dblink connections.
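For comparison, a dblink call returns a generic record set whose columns must be spelled out on every call, unlike a foreign table defined once up front (the connection string and columns are illustrative):

```sql
CREATE EXTENSION IF NOT EXISTS dblink;

-- dblink runs an arbitrary query on the remote server; the column list
-- must be declared in the AS clause of each call.
SELECT *
FROM dblink('host=remote-host dbname=salesdb user=app_user password=secret',
            'SELECT id, amount FROM orders')
     AS t(id integer, amount numeric);
```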
14. VACUUM and ANALYZE commands are not replicated in PostgreSQL.
Logical replication currently has the following restrictions or missing functionality. These might be addressed in future releases.
- The database schema and DDL commands are not replicated. The initial schema can be copied by hand using pg_dump --schema-only. Subsequent schema changes would need to be kept in sync manually. (Note, however, that there is no need for the schemas to be absolutely the same on both sides.) Logical replication is robust when schema definitions change in a live database: when the schema is changed on the publisher and replicated data starts arriving at the subscriber but does not fit into the table schema, replication will error until the schema is updated. In many cases, intermittent errors can be avoided by applying additive schema changes to the subscriber first.
- Sequence data is not replicated. The data in serial or identity columns backed by sequences will of course be replicated as part of the table, but the sequence itself would still show the start value on the subscriber. If the subscriber is used as a read-only database, then this should typically not be a problem. If, however, some kind of switchover or failover to the subscriber database is intended, then the sequences would need to be updated to the latest values, either by copying the current data from the publisher (perhaps using pg_dump) or by determining a sufficiently high value from the tables themselves.
- TRUNCATE commands are not replicated. This can, of course, be worked around by using DELETE instead. To avoid accidental TRUNCATE invocations, you can revoke the TRUNCATE privilege from tables.
- Large objects are not replicated. There is no workaround for that, other than storing data in normal tables.
- Replication is only possible from base tables to base tables. That is, the tables on the publication and on the subscription side must be normal tables, not views, materialized views, partition root tables, or foreign tables. In the case of partitions, you can therefore replicate a partition hierarchy one-to-one, but you cannot currently replicate to a differently partitioned setup. Attempts to replicate tables other than base tables will result in an error.
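Within these restrictions, a minimal publication/subscription setup looks like the following sketch (the object names and connection string are illustrative):

```sql
-- On the publisher (requires wal_level = logical):
CREATE PUBLICATION my_pub FOR TABLE orders, customers;

-- On the subscriber, after copying the schema with pg_dump --schema-only:
CREATE SUBSCRIPTION my_sub
    CONNECTION 'host=publisher-host dbname=salesdb user=repl_user password=secret'
    PUBLICATION my_pub;
```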
Advantages of PostgreSQL's MVCC approach:
- No need for an extra storage area where old row versions are kept.
- ROLLBACK does not have to undo anything and is very fast.
- No overflow problem with transactions that modify many rows.
Disadvantages:
- Old, obsolete ("dead") tuples have to be removed from the table eventually (VACUUM).
- Heavily updated tables can become "bloated" with dead tuples.
- Every update requires new index entries to be added, even if no indexed attribute is modified, and modifying an index is much more expensive than modifying the table (order has to be maintained).
To allow for high concurrency, PostgreSQL uses multiversion concurrency control (MVCC) to store rows. However, MVCC has some downsides for update queries. Specifically, updates require new versions of rows to be added to tables. This can also require new index entries for each updated row, and removal of old versions of rows and their index entries can be expensive.
To help reduce the overhead of updates, PostgreSQL has an optimization called heap-only tuples (HOT). This optimization is possible when:
The update does not modify any columns referenced by the table's indexes, including expression and partial indexes.
There is sufficient free space on the page containing the old row for the updated row.
In such cases, heap-only tuples provide two optimizations:
New index entries are not needed to represent updated rows.
Old versions of updated rows can be completely removed during normal operation, including SELECTs, instead of requiring periodic vacuum operations. (This is possible because indexes do not reference their page item identifiers.)
In summary, heap-only tuple updates can only be created if columns used by indexes are not updated. You can increase the likelihood of sufficient page space for HOT updates by decreasing a table's fillfactor. If you don't, HOT updates will still happen because new rows will naturally migrate to new pages and existing pages with sufficient free space for new row versions. The system view pg_stat_all_tables allows monitoring of the occurrence of HOT and non-HOT updates.
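The HOT behavior described above can be encouraged and monitored as sketched below; the table name and fillfactor value are illustrative:

```sql
-- Leave 30% free space on each page so updated row versions can stay
-- on the same page, making HOT updates more likely.
ALTER TABLE orders SET (fillfactor = 70);
VACUUM FULL orders;  -- rewrite the table so existing pages honor the new fillfactor

-- Compare total updates against HOT updates for the table.
SELECT relname, n_tup_upd, n_tup_hot_upd
FROM pg_stat_all_tables
WHERE relname = 'orders';
```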
- Native logical replication doesn't support bi-directional replication.
- pglogical does support bi-directional replication. It provides two main features:
- loop-back prevention.
- conflict resolution.
- Limitations:
- To replicate updates and deletes on tables, pglogical requires a primary key or other valid replica identity, such as a unique constraint. Without this unique identifier, pglogical can't identify the updates and deletes. Additionally, foreign key constraints aren't enforced during the replication process, so any successful operation on the publisher is assumed to be acceptable for the subscriber as well. Because foreign key constraints help maintain the referential integrity of data between child and parent tables, it's important to understand this limitation and ensure that your workload can't cause any integrity violations.
- Another important aspect to understand is the handling of sequences. Although pglogical provides a mechanism to sync up the sequence numbers, this doesn’t happen in real time. A better strategy is to have independent sequences on both nodes. For example, you may consider using only odd numbers on one node and only even numbers on the other node. This makes sure that there is no conflict of the sequence numbers.
- UNLOGGED and TEMPORARY tables cannot be replicated.
- DDL changes aren't automatically replicated between replicating nodes. While you may choose to run DDL changes on all replicating clusters individually, you can also use the pglogical function replicate_ddl_command to run the schema changes locally and replicate them to the other node. If transactions are in transit from one node to another, and at the same time the other node initiates the schema change, the in-transit data changes can't be written to the changed schema, and therefore the replication can break. The safe way to make schema changes to tables being changed on both nodes is to temporarily pause the write traffic.
- Foreign key constraints are not enforced for the replication process: what succeeds on the provider side gets applied to the subscriber even if the FOREIGN KEY would be violated.
- pglogical does not support replication between databases with different encodings. We recommend using UTF-8 encoding in all replicated databases.
- Currently, pglogical replication and administration require superuser privileges.
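For example, a schema change can be run locally and queued for replication in one step with replicate_ddl_command; the table name and replication set below are illustrative:

```sql
-- Executes the DDL on this node and replicates it to subscribers
-- of the named replication set.
SELECT pglogical.replicate_ddl_command(
    'ALTER TABLE public.orders ADD COLUMN note text;',
    ARRAY['default']
);
```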