SQL Server AlwaysOn synchronous-commit 환경에서 동기화 레이턴시 트러블슈팅

 

·         Version : SQL Server

 

SQL Server AlwaysOn synchronous-commit(동기 커밋환경에서HADR_SYNC_COMMIT 대기 정보를 이용해서 레이턴시 상태를 확인할  있다. HADR_SYNC_COMMIT대기는 SQL Server 원격 복제본의 신호가 트랜잭션을 커밋하기를 기다리고 있음을 나타낸다트래잭션 커밋 대기 시간에 대한 자세한 정보는 아래 링크를 확인한다.

·         Troubleshooting High HADR_SYNC_COMMIT wait type with Always On Availability Groups :  https://blogs.msdn.microsoft.com/sql_server_team/troubleshooting-high-hadr_sync_commit-wait-type-with-always-on-availability-groups/

·         SQL Server 2012 AlwaysOn – Part 12 – Performance Aspects and Performance Monitoring II : https://blogs.msdn.microsoft.com/saponsqlserver/2013/04/24/sql-server-2012-alwayson-part-12-performance-aspects-and-performance-monitoring-ii/

 

  링크를 살펴보면 트랜잭션 지연은 다음  성능 카운터로 평가하는 것을   있다.

·         SQL Server:Database Replica –> Transaction Delay

·         SQL Server:Database Replica –> Mirrored Write Transactions/sec

 

예를 들어 AG에서 노드 성능이 좋지 않아 “SQL Server:Database Replica –> Transaction Delay = 1000ms” 이고 “SQL Server:Database Replica –> Mirrored Write Transactions/sec  = 50”이라고 가정하면 평균적으로 트랜잭션 지연시간은 1000ms/50= 20ms 이다.  동기 커밋에 대한 자세한 정보는 아래 링크를 확인한다.

·         AlwaysON – HADRON Learning Series: – How does AlwaysON Process a Synchronous Commit Request : https://blogs.msdn.microsoft.com/psssql/2011/04/01/alwayson-hadron-learning-series-how-does-alwayson-process-a-synchronous-commit-request/

·         Update adds AlwaysOn extended events and performance counters in SQL Server 2014 or 2016 : https://support.microsoft.com/en-us/help/3173156/update-adds-alwayson-extended-events-and-performance-counters-in-sql-s

 

아래 그림은 동기 커밋 모드에서 로그 블록이 Replica 서버에 커밋되는  단계마다 XEvent 캡쳐되는 흐름을 보여준다.

 


 그림에서 살펴보면 XEvent 추적이 캡처되면 로그 블록 이동의  단계에서 중요한 시점을  수있으므로 트랜잭션 대기 시간의 정확한 위치를   있다일반적으로 지연은  부분으로 나누어진다.

1.       Primary 서버에서 log harden 지속 시간

·         Log_flush_start(step 2)  Log_flush_complete(step 3) 델타 시간은 동일하다.

2.       Replica 서버에서 log harden 지속 시간

·         Log_flush_start(step 10)  Log_flush_complete(step 11) 델타 시간은 동일하다.

3.       네트워크 트래픽의 지속 시간

·         Primary : hadr_log_block_send_complete ->secondary:hadr_transport_receive_log_block_message (step 6-7)

·         Secondary : hadr_lsn_send_complete->primary:hadr_receive_harden_lsn_message (step 12-13)

 

아래 스크립트는XEvent 사용하여  단계에 소요되는 시간을 캡처할  있다.

/* Note: this trace could generate very large amount of data very quickly, depends on the actual transaction rate. On a busy server it can grow several GB per minute, so do not run the script too long to avoid the impact to the production server.  */

 

CREATE EVENT SESSION [AlwaysOn_Data_Movement_Tracing] ON SERVER

ADD EVENT sqlserver.file_write_completed,

ADD EVENT sqlserver.file_write_enqueued,

ADD EVENT sqlserver.hadr_apply_log_block,

ADD EVENT sqlserver.hadr_apply_vlfheader,

ADD EVENT sqlserver.hadr_capture_compressed_log_cache,

ADD EVENT sqlserver.hadr_capture_filestream_wait,

ADD EVENT sqlserver.hadr_capture_log_block,

ADD EVENT sqlserver.hadr_capture_vlfheader,

ADD EVENT sqlserver.hadr_db_commit_mgr_harden,

ADD EVENT sqlserver.hadr_db_commit_mgr_harden_still_waiting,

ADD EVENT sqlserver.hadr_db_commit_mgr_update_harden,

ADD EVENT sqlserver.hadr_filestream_processed_block,

ADD EVENT sqlserver.hadr_log_block_compression,

ADD EVENT sqlserver.hadr_log_block_decompression,

ADD EVENT sqlserver.hadr_log_block_group_commit ,

ADD EVENT sqlserver.hadr_log_block_send_complete,

ADD EVENT sqlserver.hadr_lsn_send_complete,

ADD EVENT sqlserver.hadr_receive_harden_lsn_message,

ADD EVENT sqlserver.hadr_send_harden_lsn_message,

ADD EVENT sqlserver.hadr_transport_flow_control_action,

ADD EVENT sqlserver.hadr_transport_receive_log_block_message,

ADD EVENT sqlserver.log_block_pushed_to_logpool,

ADD EVENT sqlserver.log_flush_complete ,

ADD EVENT sqlserver.log_flush_start,

ADD EVENT sqlserver.recovery_unit_harden_log_timestamps

ADD TARGET package0.event_file(SETfilename=N'c:\mslog\AlwaysOn_Data_Movement_Tracing.xel',max_file_size=(500),max_rollover_files=(4))

WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=ON)

 

GO

 

아래 그림은 XEvent 실행하여 캡처한 결과이다.

[Primary]

 

[Second synchronous replica]

 


참고로 hadr_receive_harden_lsn_message log_block_id(14602889512) 다른 ID(146028889488) 동일하지 않는데,  그이유는 항상 다음의 harden log blockID 즉시 반환하기 때문이다. hadr_db_commit_mgr_update_harden xevent 사용하여 XEvent 상호연관 시킬  있다.

 

위의 XEvent 로그를 사용하여 캡처한 데이터를 사용하여 아래 표처럼 만들어서 트랜잭션 커밋의 상세한 지연시간을 확인할  있다 리스트는 네트워크  log harden process 시간 델타(latency) 나열한 것이며 로그 블록의 압축이나 해제   다른 시간이 발생할 수도 있다.

 

From

To

Lantency

Network:

Primary->Secon

Primary:

hadr_log_block_send_complete

 

2018-03-06 16:56:28.2174613

Secondary:

hadr_transport_receive_log_block_message

 

2018-03-06 16:56:32.1241242

3.907 seconds

Network:

Secondary->Primary

Secondary:hadr_lsn_send_complete

2018-03-06 16:56:32.7863432

Primary:hadr_receive_harden_lsn_message

2018-03-06 16:56:33.3732126

0.587 seconds

LogHarden(Primary)

log_flush_start

2018-03-06 16:56:28.2168580

log_flush_complete

2018-03-06 16:56:28.8785928

0.663 seconds

Log Harden(Secondary)

Log_flush_start

2018-03-06 16:56:32.1242499

Log_flush_complete

2018-03-06 16:56:32.7861231

0.663 seconds

 

위에서 언급했든이  주로 발생하는 대기시간은 다음  부분에서 발생한다.

·         복제본간의 네트워크 대기시간 : 3.907 + 0.857 = 4.494

·         Primary Log harden : 0.663

·         Secondary Log harden : 0.663

 

 트랜잭션 지연시간을 얻기에는Primary 로그 플러시와 네트워크 전송이 동시에 발생하기 때문에 합산할  없어 쉽지 않다네트워크에서 4.494초가 발생하였고 Primary 복제본으로부터 컨펌(hadr_receive_harden_lsn_message:2018-03-06 16:56:33.3732126) 받고 완료한 시간을 (log_flush_complete : 2018-03-06 16 : 56 : 28.8785928) 시간을 계산할때 타임스탬프를 수동으로 계산할 필요가 없다. XEvent   개의 hadr_log_block_group_commit사이의 델타시간을   있기 때문이다아래 예시를 살펴보자.

·         Primary: hadr_log_block_group_commit: 2018-03-06 16:56:28.2167393

·         Primary: hadr_log_block_group_commit: 2018-03-06 16:56:33.3732847

·         Total time to commit=delta of above two timestamps= 5.157 seconds

 

 숫자는 네트워크 전송 시간과Secondary서버의 log harden 더한것과 같다 이유는Secondary 서버의log harden  네트워크를 사용할  있기를 기다려야하므 Primary 마찬가지로 동시에 log harden   없기 때문이다.

 

지금까지 AlwaysOn 동기-커밋 모드에서 복제본간 로그 블록 이동에 대해서 살펴보았다그런데 Primary 서버의 XEvent에서 “hadr_db_commit_mgr_harden_still_waiting” 가끔 발생하는 것을 볼수 있는데  이벤트는 Primary 서버가  Secondary 복제본의 확인 메시지를 기다리고 있을때 2 간격으로 발생(2초로 하드코딩 되어있음)한다. 2 내에 Ack 돌아오면 XEvent 표시되지 않는다.

 

[참고자료]

https://blogs.msdn.microsoft.com/psssql/2018/04/05/troubleshooting-data-movement-latency-between-synchronous-commit-always-on-availability-groups/

 

 

2018-07-05 / Sungwook Kang / http://sqlmvp.kr

 

SQL Server, AG, AlwaysON, delay HADR_SYNC_COMMIT latency Performance transaction



출처: https://sqlmvp.tistory.com/1262?category=618825 [Database Lab]
No. Subject Author Date Views
Notice SQL강좌: 챗GPT와 함께 배우는 SQL Server 무료 강좌 목차와 소개 (2023년 9월 업데이트) 코난(김대우) 2023.08.18 33948
Notice Python 무료 강좌 - 기초, 중급, 머신러닝(2023년 6월 업데이트) 코난(김대우) 2021.01.01 17162
2154 SQL Server 가용성 그룹에 데이터베이스 자동으로 추가하기 jevida(강성욱) 2019.03.26 703
2153 SQL Server MySQL PostgreSQL 비교 jevida(강성욱) 2019.03.26 1205
2152 SQL Server 복잡한 쿼리가 옵티마이저에 미치는 영향과 옵티마이저 timeout jevida(강성욱) 2019.03.26 787
2151 SQL Server 2019 에서 문자열 잘림에 대한 향상된 에러 메시지 반환 jevida(강성욱) 2019.03.26 600
2150 SQL Server 2019 에서 업그레이드된sp_estimate_data_compression_savings 프로시저 (컬럼스토어 압축율 예상) jevida(강성욱) 2019.03.26 623
2149 SQL Server 2019 에서 추가된sys.dm_db_page_info, sys.fn_PageResCracker 기능으로 대기 관련 정보 확인 jevida(강성욱) 2019.03.26 923
2148 SQL Server 2019에서 향상된 Rowstore batch mode jevida(강성욱) 2019.03.26 508
2147 SQL Server 2016부터 도입된 USE HINT를 사용한 추적 플래그 활성화 jevida(강성욱) 2019.03.26 542
2146 In-memory optimized table에 사용되는 Hash Index jevida(강성욱) 2019.03.26 475
2145 VM환경에서 AG를 구성하였을때VSS 백업 동작 변경 jevida(강성욱) 2019.03.25 464
2144 SQL Server 2016 향상된 가용성 그룹 – 데이터베이스 수준의 상태 탐지 장애조치 jevida(강성욱) 2019.03.25 488
2143 SQL Server 2016 대용량 데이터 로드시 최소 로깅(minimal logging) 과Batch Size jevida(강성욱) 2019.03.25 438
2142 SQL Server 설치시 발생하는 1638 오류 jevida(강성욱) 2019.03.25 681
2141 SQL Server 666코드의 고유 식별자 오류 jevida(강성욱) 2019.03.25 365
» SQL Server AlwaysOn synchronous-commit 환경에서 동기화 레이턴시 트러블슈팅 jevida(강성욱) 2019.03.25 468
2139 SQL Server Scheduling and Yielding 트러블슈팅 jevida(강성욱) 2019.03.25 433
2138 SQL Server 2016 Tempdb 경합(contention) 최적화 jevida(강성욱) 2019.03.25 558
2137 XEvent를 사용하여 Auto tuning 작업 모니터링 jevida(강성욱) 2019.03.25 588
2136 SQL Server In-Memory OLTP에 ASP.NET 세션 상태 저장하기 jevida(강성욱) 2019.03.25 450
2135 SQL Server에서 JSON 데이터 저장하기 jevida(강성욱) 2019.03.25 615





XE Login