
Value Encoding in Vertipaq Engine

If you remember my last blog post about the Vertipaq engine inside SSAS Tabular, I discussed three algorithms that are used when the model is processed. Processing here means reading data from the relational source and loading it into the tabular structure. Data compression takes place during this step to reduce the memory footprint. This matters because, once the model is processed, all the data from the data warehouse or source relational database is loaded into memory. Compression saves a large amount of memory, makes better use of your hardware, and speeds up scans because the data model is smaller than the original.

These are the steps that take place when we process the tabular model from SSDT or SQL Server Management Studio:
  1. Read the data from the source database and transform it into the columnar (Vertipaq) data structure, encoding and compressing the data along the way.
  2. Create dictionaries and indexes for each column.
  3. Create relationships between the data structures.
  4. Compute and compress all the calculated columns.
Steps 3 and 4 are interchangeable, which means it is also possible to create relationships based on the calculated columns you created.

Let's focus on data encoding and compression in this post. As I mentioned before, three types of encoding happen behind the scenes: hash encoding, value encoding, and RLE.

Value Encoding 

We are playing with numbers here. In data warehousing, we model dimensions and facts. Relational entities become dimensions, while the numeric measures related to business processes go into fact tables.
Fact tables mostly contain integers, which represent the surrogate keys and some measures, and floating-point values for the other numeric values. Value encoding applies to integer columns only.

Let's have a look at how value encoding works. As an example, take the CityKey column in the Order fact table of the WideWorldImporters data warehouse database. The Vertipaq engine performs a mathematical operation on the values to compress them.



In this example, the minimum value in the column, 41165, is subtracted from each original value. If the original values needed a 32-bit integer, the compressed values fit in a 16-bit integer, which cuts the memory footprint of the column almost in half. Fact tables typically contain millions, sometimes even billions, of rows. Can you imagine how much space value encoding saves?
When this column is queried, Vertipaq applies the reverse operation, adding the minimum back, to return the original value. In reality, the engine does even more advanced calculations to reduce the memory footprint.
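
To make the arithmetic concrete, here is a minimal sketch in Python of the idea, not the engine's actual implementation. The function names and the sample CityKey values are hypothetical; only the minimum 41165 comes from the example above. It stores the column minimum once, keeps each value as an offset from that minimum, and adds the minimum back when reading.

```python
from typing import List, Tuple


def bits_needed(max_value: int) -> int:
    """Smallest number of bits that can hold any value in 0..max_value."""
    return max(1, max_value.bit_length())


def value_encode(column: List[int]) -> Tuple[int, List[int]]:
    """Subtract the column minimum from every value; return (minimum, offsets)."""
    base = min(column)
    return base, [value - base for value in column]


def value_decode(base: int, offsets: List[int]) -> List[int]:
    """Add the stored minimum back to recover the original values."""
    return [base + offset for offset in offsets]


# Hypothetical sample values; only the minimum 41165 is taken from the post.
city_keys = [41165, 41170, 41201, 41568, 42000]

base, offsets = value_encode(city_keys)
original_bits = bits_needed(max(city_keys))  # bits for the largest raw value
encoded_bits = bits_needed(max(offsets))     # bits for the largest offset

print(base, offsets)
print(original_bits, encoded_bits)
print(value_decode(base, offsets) == city_keys)
```

The narrower the range between the minimum and maximum of a column, the fewer bits each stored offset needs, which is where the saving comes from.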

I will discuss dictionary encoding (hash encoding) in the next post!

Reference: SQL BI
