Window Functions on Intermediate Level is a concept that usually takes a while to grasp. In this article, we’ll use the same data and will show a few aggregation examples that introduce the terms “Window Function” and “Level” separately and then put them all together in the final sample.
The image below demonstrates the raw data that we will use in all examples that follow.
First Aggregation: From Raw Data to Intermediate Level
In this sample, we aggregate data to display cumulative sales per product, per year. In technical terms, you set Year and Product as dimensions, and Sum(Sales) as measure.
Groups of records on Raw Data level become summarized into single records on the Visualization Level. In the final example, data at this level will be further aggregated. That's why we refer to it as an "Intermediate Level".
Introducing a Window Function
Now suppose you need to display yearly trends - how much a product's sales increased or decreased from year to year. The aggregation level is the same, but you need to calculate values over a "window" of values/records. In this example, the window includes the current and previous values, and the different between them is displayed.
In technical terms, you need to switch the calculation type to Difference, where Product is a Partitioning Dimension (we want to group data by product) and Year is a Window Dimension (we calculate difference in yearly sales).
As you can see in the editor, this is the resulting expression:
Sum(Sales) - Lookup(Sum(Sales), - 1)
This is the output:
Putting It All Together: Window Function at an Intermediate Level
In this section, we add another aggregation layer on top. For each product, we display only one value - the maximum difference in yearly sales. We need the Window function results from the previous step. It calculates values on the level that's no longer the visualization level, but an intermediate aggregation level.
Here's the result we need to achieve:
To do that, you can use the following expression:
Max(aggr(w(Sum(Sales) - Lookup(Sum(Sales), - 1), partitionBy(Product), orderBy(Year)), Product, Year))
The newly introduced w() function allows us to get into a deeper aggregation level and calculate the window function - yearly sales trends. Let's take a closer look at how it works. Follow the expression from right to left (backwards):• The Product and Year parameters define that the aggregation - aggr() function - should be carried out per product, per year.
• The first parameter of the aggregation is the w() function expression. This is a window function, much like the one we defined in the previous example. It calculates the different between sum of sales in current and previous "record".
• The w() function partitions data by Product and sorts it by Year - so that we can obtain correct trend values.
• The Max() function is finally used to obtain the maximum value per product. These values are highlighted on the screenshot below.