지난 Microsoft Ignite 2022에서 발표된 내용이라 연계해 간략히 정리.
Azure Cosmos DB for PostgreSQL가 발표되었고, 관련해 OSS 프로젝트인 Citus 정리.

Citus란 무엇인가?
Citus is a PostgreSQL extension that transforms Postgres into a distributed database—so you can achieve high performance at any scale.
https://github.com/citusdata/citus
PostgreSQL에 분산데이터베이스를 제공하는 확장기능.
로컬 구성과 클라우드 환경에서 구성
로컬구성: https://docs.citusdata.com/en/stable/installation/single_node.html
클라우드: https://learn.microsoft.com/azure/cosmos-db/postgresql/quickstart-create-portal
분산테이블 SQL 간단 예제
일반적인 분산 테이블 아키텍처와 코드

아키텍처 컨셉 참조: https://docs.citusdata.com/en/stable/get_started/concepts.html
예제 코드
Creating Distributed Tables
The create_distributed_table UDF will transparently shard your table locally or across the worker nodes:
CREATE TABLE events (
device_id bigint,
event_id bigserial,
event_time timestamptz default now(),
data jsonb not null,
PRIMARY KEY (device_id, event_id)
);
-- distribute the events table across shards placed locally or on the worker nodes
SELECT create_distributed_table('events', 'device_id');
After this operation, queries for a specific device ID will be efficiently routed to a single worker node, while queries across device IDs will be parallelized across the cluster.
-- insert some events
INSERT INTO events (device_id, data)
SELECT s % 100, ('{"measurement":'||random()||'}')::jsonb FROM generate_series(1,1000000) s;
-- get the last 3 events for device 1, routed to a single node
SELECT * FROM events WHERE device_id = 1 ORDER BY event_time DESC, event_id DESC LIMIT 3;
┌───────────┬──────────┬───────────────────────────────┬───────────────────────────────────────┐
│ device_id │ event_id │ event_time │ data │
├───────────┼──────────┼───────────────────────────────┼───────────────────────────────────────┤
│ 1 │ 1999901 │ 2021-03-04 16:00:31.189963+00 │ {"measurement": 0.88722643925054} │
│ 1 │ 1999801 │ 2021-03-04 16:00:31.189963+00 │ {"measurement": 0.6512231304621992} │
│ 1 │ 1999701 │ 2021-03-04 16:00:31.189963+00 │ {"measurement": 0.019368766051897524} │
└───────────┴──────────┴───────────────────────────────┴───────────────────────────────────────┘
(3 rows)
Time: 4.588 ms
-- explain plan for a query that is parallelized across shards, which shows the plan for
-- a query one of the shards and how the aggregation across shards is done
EXPLAIN (VERBOSE ON) SELECT count(*) FROM events;
┌────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├────────────────────────────────────────────────────────────────────────────────────┤
│ Aggregate │
│ Output: COALESCE((pg_catalog.sum(remote_scan.count))::bigint, '0'::bigint) │
│ -> Custom Scan (Citus Adaptive) │
│ ... │
│ -> Task │
│ Query: SELECT count(*) AS count FROM events_102008 events WHERE true │
│ Node: host=localhost port=5432 dbname=postgres │
│ -> Aggregate │
│ -> Seq Scan on public.events_102008 events │
└──────────────────────────────────────
분산데이터베이스의 특성상 instance provision이 자유로운 cloud를 택한것은 좋은 선택.
Postgres를 사용하는 여러 기업 고객들에게 좋은 옵션이 될 것으로 예상.
참고링크
https://github.com/citusdata/citus
https://docs.citusdata.com/en/stable/get_started/what_is_citus.html

