지난 Microsoft Ignite 2022에서 발표된 내용이라 연계해 간략히 정리.
Azure Cosmos DB for PostgreSQL가 발표되었고, 관련해 OSS 프로젝트인 Citus 정리.
Citus란 무엇인가?
Citus is a PostgreSQL extension that transforms Postgres into a distributed database—so you can achieve high performance at any scale.
https://github.com/citusdata/citus
PostgreSQL에 분산데이터베이스를 제공하는 확장기능.
로컬 구성과 클라우드 환경에서 구성
로컬구성: https://docs.citusdata.com/en/stable/installation/single_node.html
클라우드: https://learn.microsoft.com/azure/cosmos-db/postgresql/quickstart-create-portal
분산테이블 SQL 간단 예제
일반적인 분산 테이블 아키텍처와 코드
아키텍처 컨셉 참조: https://docs.citusdata.com/en/stable/get_started/concepts.html
예제 코드
Creating Distributed Tables The create_distributed_table UDF will transparently shard your table locally or across the worker nodes: CREATE TABLE events ( device_id bigint, event_id bigserial, event_time timestamptz default now(), data jsonb not null, PRIMARY KEY (device_id, event_id) ); -- distribute the events table across shards placed locally or on the worker nodes SELECT create_distributed_table('events', 'device_id'); After this operation, queries for a specific device ID will be efficiently routed to a single worker node, while queries across device IDs will be parallelized across the cluster. -- insert some events INSERT INTO events (device_id, data) SELECT s % 100, ('{"measurement":'||random()||'}')::jsonb FROM generate_series(1,1000000) s; -- get the last 3 events for device 1, routed to a single node SELECT * FROM events WHERE device_id = 1 ORDER BY event_time DESC, event_id DESC LIMIT 3; ┌───────────┬──────────┬───────────────────────────────┬───────────────────────────────────────┐ │ device_id │ event_id │ event_time │ data │ ├───────────┼──────────┼───────────────────────────────┼───────────────────────────────────────┤ │ 1 │ 1999901 │ 2021-03-04 16:00:31.189963+00 │ {"measurement": 0.88722643925054} │ │ 1 │ 1999801 │ 2021-03-04 16:00:31.189963+00 │ {"measurement": 0.6512231304621992} │ │ 1 │ 1999701 │ 2021-03-04 16:00:31.189963+00 │ {"measurement": 0.019368766051897524} │ └───────────┴──────────┴───────────────────────────────┴───────────────────────────────────────┘ (3 rows) Time: 4.588 ms -- explain plan for a query that is parallelized across shards, which shows the plan for -- a query one of the shards and how the aggregation across shards is done EXPLAIN (VERBOSE ON) SELECT count(*) FROM events; ┌────────────────────────────────────────────────────────────────────────────────────┐ │ QUERY PLAN │ ├────────────────────────────────────────────────────────────────────────────────────┤ │ Aggregate │ │ Output: COALESCE((pg_catalog.sum(remote_scan.count))::bigint, '0'::bigint) │ │ -> Custom Scan (Citus Adaptive) │ │ ... │ │ -> Task │ │ Query: SELECT count(*) AS count FROM events_102008 events WHERE true │ │ Node: host=localhost port=5432 dbname=postgres │ │ -> Aggregate │ │ -> Seq Scan on public.events_102008 events │ └──────────────────────────────────────
분산데이터베이스의 특성상 instance provision이 자유로운 cloud를 택한것은 좋은 선택.
Postgres를 사용하는 여러 기업 고객들에게 좋은 옵션이 될 것으로 예상.
참고링크
https://github.com/citusdata/citus
https://docs.citusdata.com/en/stable/get_started/what_is_citus.html