vrijdag 27 januari 2017

Slowing Changing Dimensions (SCD) explained

Introduction

Changes in data in a data warehouse (Slowing Changing Dimension principle) is a key concept of a data warehouse. If you track the changes in a data warehouse it is possible to go back in time and see what happened with the data. That is an advantage. The disadvantage is that complexity is added in the ETL process and in the reports.

In this blogpost I'll explain the fundamentals of Slowly Changing Dimensions with a small example in PowerPivot and Excel.

Data model

Below, the data model is shown with two dimensions and a fact. The dimensions are a product dimension, in which products are categorized and one date dimension (granularity month). The fact table is a simple fact table that has a number of keys to the dimensions DimCalendar and DimProduct. There is a measure NetRevenue present.



Example data

For this article, I have created a small example in Excel where a product with a ID 4 draws some attention, the hairdresser. This product is in the category of transportation. This is accidentally or it may have some other reason. But, most logical is that the product is in the wrong category.


If we handle this in a graph with the categories Personal Care and Transportation together, it looks like this.
  



We look to the category Personal Care only and it looks as follows.


Slicing the data with the category "Transportation" will show this graph.


Slowly Changing Dimension 1 (TYPE I)

Meanwhile, we found out (in this scenario) that the categories are not good. We see that the product Hairdryer falls into the wrong category. We change the category to Transportation for  Hairdryer. This looks like this:




If we change the categories (hairdryer is personal care) then this has effect on the charts. It changes the charts and not just at the moment of the change but also retroactively. Even if we correct data now, everything is changed in the past. This may be desirable (most likely in this particular example), but it may also have undesirable effects in other situations.

For example, it can also take place at the moment that a company changes their range of products. What happens then? New products and categories are created, products may have moved to another category, products are made inactive or deleted in the source systems. In short, it depends on what concern is the source of the kind of change. Is there an error or is a strategic choice the reason to change the products and categories. It is difficult to guess the true reason of a change in the data.

Let's see what if a product changes from category. Product4, the hairdryer has changed category Transportation to Personal Care. The following occurs. It seems that the change generated additional revenue in the year for the category Personal Care. Good news for the sales manager who is responsible for the category Personal Care.




However, there is also a change that takes place in the chart for the category Transportation. Unfortunately, the sales manager will have to do with a lesser bonus this year. He or she may enjoy less bonus. The turnover has moved from the category Transportation to Personal Care.



Slowly Changing Dimension 2 (TYPE II)

May be it is desirable what I have descibed in the former paragraph (a logical correction of the categories), but it can also have undesirable effects, for instance when another series of products is introduced. Then you want the figures as they were and these should be calculated differently in contrast with the new product series. How can we achieve this? Suppose we make the product dimension SCD2? What happens then? If there is a change in the source system then both the old record is preserved and a new record is created. See below on the right side There is a new record created for hairdryer and a new record ProductID 8. Suppose that a new record is loaded into the data warehouse? See this event on the left. At 201801 there will be a sales record with a revenue of 500.




And here you can see now that nothing has changed in 2017 for both categories? Below the composite chart with NetRevenue both personal care as well as transportation. In 2017 there is absolutely nothing that changed, but we had this in SCD1 too. The total of the categories remained the same.




Let's look at the NetRevenue of the underlying products in each category. We also see no change for 2017. 


And Personal Care chart looks as follows for 2017



Slowly Changing Dimension 3 (TYPE 3)

There is another variant that can be very useful, namely SCD3. This is a variant which can be used in the case of a comparison between an old and a new situation. Let us consider a change in the producthierarchy. Let's take the example above with products and categories. In the above case, the producthierarchy can change at a given time (rightly or wrongly). Suppose we now want to see the new data with the old producthierarchy. In other words, it would be like there is no new producthierarchy. What would have looked the numbers? And they also want to know how the figures in the past with the new product hierarchy would have seen if there had always been a new producthierarchy. The old product hierarchy has never existed. Two varieties :
  • With an old producthierachy you want to see both old and new data.
  • A new product hierarchy can see both the old and the new data.

In the data model, you can fix this. I've added an extra column and the old one is renamed to Category Before and the after producthierarchy we call CategoryAfter.


Now we can therefore make a distinction between 'for the change' and 'after the change'. In the example below, there is now a new slicer, namely CategoryAfter. With this we can now switch between the new and the old situation and compare them to see what has changed.



Conclusion

In this article, we discussed the impact of a change in dimension data . In addition, various options put forward to be here to deal with in order to achieve a desired result. The suggestions made are not applicable to every situation, "it depends." It is necessary to check each situation.

Greetz,

Hennie

Geen opmerkingen:

Een reactie posten