SQL Server 2019 에서 업그레이드된sp_estimate_data_compression_savings 프로시저 (컬럼스토어 압축율 예상)

 

·         Version : SQL Server 2019

 

SQL Server 2019 CTP 2.0에서 스토리지 엔진의 업그레이드 기능으로 columnstore  columnstore archive 압축에 대한 지원으로sp_estimate_data_compression_savings 프로시저 기능이 업데이트 되었다 저장 프로시저는 SQL Server 2008 이후에 사용되었으며 테이블  인덱스 압축을 고려할때 사용할  있다프로시저에서는 schema_name, object_name, index_id, partition_number  data_compression 5가지 매개 변수가 필요하다.

·         sp_estimate_data_compression_savings : https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sp-estimate-data-compression-savings-transact-sql?view=sql-server-2017

 

SQL Server 2019 이전에는 data_compression 매개변수가 ‘ROW’, ‘PAGE’ 또는 ‘NONE’ 이다. SQL Server 2019 CTP 2.0에서는 ‘COLUMNSTORE’  ‘COLUMNSTORE_ARCHIVE’라는  가지 새로운 옵션을 추가 되었다. rowstore 테이블  인덱스의 경우 압축을 적용하는 방법은 원하는 오브젝트를 다시 작성하면서 압축을 한다아래 스크립트는  압축을 사용하여  테이블을 다시 작성한다.

ALTER TABLE Production.TransactionHistory REBUILD PARTITION = ALL

WITH (DATA_COMPRESSION = ROW);

GO

 

압축 비율은 테이블에 저장되는 데이터 유형에 크게 의존하므로 압축 공간 절약을 예상하는 유일한 좋은 방법은 실제로 데이터를 샘플링 압축하고  결과를 확인하는 것이다 저장소 압축 유형에서 sp_estimate_data_comopression_savings 작동하는 방식은 소스 객체 (5000 페이지 이상) 샘플을 샘플링하고 tempdb 샘플 객체를 만든다그런다음 원하는 압축 상태를 사용하여 샘플 객체가 다시 작성된다추정된 압축비는 원래 샘플 객체 크기를 압축된 샘플 개체 크기와 비교함으로써 계산된다 비율은 원하는 압축 설정이 적용된 경우 전체 원본 개체의 예상 크기를 투영하는데 사용된다압축이 이미 적용된 소스 객체의 경우 data_compression 매개 변수  ‘NONE’ 제공하여 압축이 제거된 상태로 객체 크기를 추정하는데 사용할  있다.

columnstore  압축에서는 컬럼스토어 인덱스를 생성할때 일반적으로 columnstore 또는 columnstore archive 압축에 영향을 미친다 저장소 원본 객체에 ‘COLUMNSTORE’ 또는 ‘COLUMNSTORE_ARCHIVE’라는 data_compression 매개 변수 값을 사용하면 동일한  저장소 객체를 다른 압축 상태와 비교하지 않고  저장소 객체를 동일한 columnstore 객체와 비교한다 과정은 거의 동일하다소스 객체에서 샘플링 하여 tempdb  샘플 객체를 만들고 같은 데이터를 사용하여 동등한 columnstore 인덱스를 만들고소스 객체의 크기와 columnstore 인덱스의 크기를 비교하여 압축 비율을계산한다. 

 

아래 표는 압축 비율을 계산하는데 사용하는 다양한 참조 객체 유형이다이는 ‘COLUMNSTORE’  ‘COLUMNSTORE_ARCHIVE’ data_compression 옵션 모두에 적용된다.

Source Object

Reference Object

Heap

Clustered columnstore index

Clustered index

Clustered columnstore index

Non-clustered index

Non-clustered columnstore index (including the key columns and any included columns of the provided non-clustered index, as well as the partition column of the table, if any)

Non-clustered columnstore index

Non-clustered columnstore index (including the same columns as the provided non-clustered columnstore index)

Clustered columnstore index

Clustered columnstore index

 

column store에서 rowstore 압축 상태로 다른 방향으로 추정할  있다 옵션은 모든  저장소 옵션은 ‘ROW’, ‘PAGE’  ‘NONE’ 적용된다.

Source Object

Reference Object

Clustered columnstore index

Heap

Non-clustered columnstore index

Non-clustered index (including the columns contained in the non-clustered columnstore index as key columns, and the partition column of the table, if any, as an included column)

 

기존 columnstore 원본 객체에 대해 ‘COLUMNSTORE’ 또는 ‘COLUMNSTORE”ARCHIVE’ 옵션을 사용하는 경우 절차는 이전과 동일하게 작동하고 샘플 columnstore 인덱스를 만들고 원하는 압축 상태로 해당 인덱스를 다시 작성한다.

 

일반적으로 데이터 압축은 많은 공간과 많은 I/O (논리적  물리적) 절약할  있지만 데이터를 압축하고 해제하는데  많은 CPU 리소스가 필요하다특히 업데이트가 많은 데이터의 경우 성능 문제가 발생할 수있다이것은 columnstore에서 두배이상의 오버헤드가 있다이러한 인덱스는 주로 읽기 데이터 용으로 설계되었으며 columnstore 대한 빈번한  업데이트는 효율적이지 않은 구조이다. sp_estimate_data_compression_savings 작업 패턴을 면밀히 분석하면 데이터  응용프로그램에 적합한 인덱스  압축을 선택할  있다.

 

[참고자료]

https://blogs.msdn.microsoft.com/sql_server_team/sql-server-2019-ctp-2-0-new-features-columnstore-support-for-sp_estimate_data_compression_savings/

 

 

2018-10-24 / Sungwook Kang / http://sqlmvp.kr

 

SQL Server, MSSQL, SQL 2019, Rowstore, column sotre, data compression, sp_estimate_data_compression_saving



출처: https://sqlmvp.tistory.com/1273?category=618825 [Database Lab]




profile

강성욱 / jevida@naver.com
Microsoft SQL Server MVP
Blog : http://sqlmvp.kr
Facebook : http://facebook.com/sqlmvp

Kakao Talk : SQLMVP

Line : jevida


현재 LA에 거주하고 있으며 SQL에 관심있는 분이면 언제든 친추 환영합니다.