If you remember my last blog post about the VertiPaq engine inside SSAS Tabular, I discussed three algorithms that are used when processing the model. Processing means loading data from the relational source into the tabular structure. During processing, data compression takes place to reduce the memory footprint. This is important because, once the model is processed, all the data from the data warehouse or source relational database is loaded into memory. Compression therefore saves a huge amount of memory and makes optimal use of your hardware, and scans are faster because the compressed data model is smaller than the original.
These are the steps that take place when we process the tabular model from SSDT or via SQL Server Management Studio:
- Read the data from the source database and transform it into a columnar (VertiPaq) structure, encoding and compressing the data in the process.
- Create dictionaries and indexes for each column.
- Create relationships between the data structures.
- Compute and compress all the calculated columns.
Steps 3 and 4 are interchangeable, which means it is also possible to create relationships based on the calculated columns you created.
Let's focus on data encoding and compression in this post. If you remember, I mentioned there are three types of encoding happening behind the scenes: Hash Encoding, Value Encoding and Run-Length Encoding (RLE).
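To make the first and last of these concrete, here is a minimal sketch of hash (dictionary) encoding and run-length encoding. The real VertiPaq internals are not public, so this is only an illustration of the general techniques, with a made-up list of city names as the column data:

```python
def hash_encode(values):
    """Dictionary (hash) encoding: map each distinct value to an integer ID."""
    dictionary = {}
    ids = []
    for v in values:
        if v not in dictionary:
            dictionary[v] = len(dictionary)
        ids.append(dictionary[v])
    return dictionary, ids

def rle_encode(ids):
    """Run-length encoding: store [value, run_length] pairs for repeated runs."""
    runs = []
    for v in ids:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

# Hypothetical column values, just for illustration.
cities = ["London", "London", "Paris", "Paris", "Paris", "London"]
dictionary, ids = hash_encode(cities)
print(dictionary)       # {'London': 0, 'Paris': 1}
print(ids)              # [0, 0, 1, 1, 1, 0]
print(rle_encode(ids))  # [[0, 2], [1, 3], [0, 1]]
```

Note how the two techniques compose: hash encoding replaces wide string values with small integer IDs, and RLE then collapses runs of repeated IDs, which is why sort order matters so much for columnar compression.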
Value Encoding
We are playing with numbers. In data warehousing, we model dimensions and facts. While the relational entities are converted into dimensions, all the numeric measures related to business processes go into the fact table structure.
Fact tables mostly contain integer columns, which represent the surrogate keys and some of the measures, plus floating-point values for other numeric values. Value encoding applies to integer columns only.
Let's have a look at how value encoding works. As an example, look at the CityKey column in the Order fact table of the WideWorldImporters data warehouse database. The VertiPaq engine performs a simple mathematical operation to do the compression.
What happens here is that the minimum value in the column, 41165, is subtracted from each original value. If the original values needed a 32-bit integer to store, after the compression they fit in a 16-bit integer, reducing the memory footprint by almost 50%. Fact tables typically contain millions or even billions of rows, so you can imagine how much space value encoding saves.
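The subtraction above can be sketched in a few lines. The 41165 minimum comes from the post's CityKey example, but the individual key values below are hypothetical; this is an illustration of the idea, not the engine's actual implementation:

```python
def value_encode(column):
    """Value encoding sketch: store the column minimum once,
    plus a small offset per row."""
    base = min(column)
    return base, [v - base for v in column]

def value_decode(base, encoded):
    """Re-apply the inverse operation to recover the original values."""
    return [base + v for v in encoded]

# Hypothetical CityKey values; 41165 is the column minimum from the example.
city_keys = [41165, 41166, 41200, 98765]
base, encoded = value_encode(city_keys)
print(base, encoded)  # 41165 [0, 1, 35, 57600]

# The offsets fit in 16 bits (max 65535) even though
# the originals need a 32-bit integer.
assert max(encoded) < 2**16
assert value_decode(base, encoded) == city_keys
```

The decode function is why queries still return the right numbers: the engine only has to add the base back when the column is scanned.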
When this column is queried, VertiPaq re-applies the same function to get the original values back. In reality, the engine does even more advanced calculations to reduce the memory footprint.