The full script for this post can be found here.

This post uses the AdventureWorksDW database in SQL Server 2019 Express

Depending on your preference, you can either watch this video or read below.

The goal

Suppose we have a request to produce a report:

Create a report showing internet sales amount with these columns:

  • Order year
  • Country
  • Product category
  • Sales amount
Include a sub-total row showing total sales amount for each year. The value in the Country column on the sub-total row should be “All countries in YYYY”. The value in the Product category column on the sub-total row should be “All product categories in YYYY”. Sort the result by ascending year, ascending country and within each country, sort the product categories by descending sum of sales amount. The sub-total row for each year should appear at the bottom of the rows for that year.

The data

We have the following query:

SELECT YEAR(fis.OrderDate) AS "Order year",
       dg.CountryRegionCode AS "Country",
       dpc.EnglishProductCategoryName AS "Product category",
       SUM(fis.SalesAmount) AS "Sales amount"
FROM FactInternetSales fis
  INNER JOIN DimCustomer dc ON fis.CustomerKey = dc.CustomerKey
  INNER JOIN DimGeography dg ON dc.GeographyKey = dg.GeographyKey
  INNER JOIN DimProduct dp ON fis.ProductKey = dp.ProductKey
  INNER JOIN DimProductSubcategory dps ON dp.ProductSubcategoryKey = dps.ProductSubcategoryKey
  INNER JOIN DimProductCategory dpc ON dps.ProductCategoryKey = dpc.ProductCategoryKey
GROUP BY YEAR(fis.OrderDate),
         dg.CountryRegionCode,
         dpc.EnglishProductCategoryName
ORDER BY 1, 2, 4 DESC;

We’ve created a group by query and summed the sales amount by the requested columns. 

Since the requirements have requested specific column names, we have aliased each column exactly as requested and enclosed the aliases in double-quotes in order to be able to use spaces in the column headers.

The ORDER BY clause is using positional references to sort by columns 1 and 2 (Order year and Country) in ascending order, and column 4 (Sales amount) in descending order.

The query above returns results that look like this:

We can see that we have the sum of sales amount by each unique combination of year, country and product category. 

We need to add the sub-totals. 

There are three common ways to do this.

Method 1 – UNION ALL

The first way to add sub-totals to a query is to use UNION ALL to append a second query to the first.

WITH dat
AS
(
SELECT YEAR(fis.OrderDate) AS "Order year",
       dg.CountryRegionCode AS "Country",
       dpc.EnglishProductCategoryName AS "Product category",
       0 AS country_type,
       SUM(fis.SalesAmount) AS "Sales amount"
FROM FactInternetSales fis
  INNER JOIN DimCustomer dc ON fis.CustomerKey = dc.CustomerKey
  INNER JOIN DimGeography dg ON dc.GeographyKey = dg.GeographyKey
  INNER JOIN DimProduct dp ON fis.ProductKey = dp.ProductKey
  INNER JOIN DimProductSubcategory dps ON dp.ProductSubcategoryKey = dps.ProductSubcategoryKey
  INNER JOIN DimProductCategory dpc ON dps.ProductCategoryKey = dpc.ProductCategoryKey
GROUP BY YEAR(fis.OrderDate),
         dg.CountryRegionCode,
         dpc.EnglishProductCategoryName
UNION ALL
SELECT YEAR(fis.OrderDate) AS "Order year",
       'All countries in ' + CAST(YEAR(fis.OrderDate) AS nvarchar(4)) AS "Country",
       'All product categories in ' + CAST(YEAR(fis.OrderDate) AS nvarchar(4)) AS "Product category",
       1 AS country_type,
       SUM(fis.SalesAmount) AS "Sales amount"
FROM FactInternetSales fis
GROUP BY YEAR(fis.OrderDate)
)
SELECT "Order year", "Country", "Product category", "Sales amount"
FROM dat
ORDER BY "Order year", country_type, "Country", "Sales amount" DESC;

There are a few things to note about this query.

  1. We add a second query joined to the first by using UNION ALL. 
  2. The second query is only grouping by year, as we would expect.
  3. A UNION ALL query must have the same number of columns in each part of the query (above and below the UNION ALL operator), so we  need to provide default values for the Country and Product category columns. These are defined per the requirements.
  4. If we tried to put the ORDER BY clause directly under the UNION ALL and didn’t wrap the UNION ALL inside a CTE (Common-table expression), we would find that the sub-total rows are sorted to the top of each year (because the sub-totals have the word “All…” in their definition). It would look like this:
SELECT YEAR(fis.OrderDate) AS "Order year",
       dg.CountryRegionCode AS "Country",
       dpc.EnglishProductCategoryName AS "Product category",
       SUM(fis.SalesAmount) AS "Sales amount"
FROM FactInternetSales fis
  INNER JOIN DimCustomer dc ON fis.CustomerKey = dc.CustomerKey
  INNER JOIN DimGeography dg ON dc.GeographyKey = dg.GeographyKey
  INNER JOIN DimProduct dp ON fis.ProductKey = dp.ProductKey
  INNER JOIN DimProductSubcategory dps ON dp.ProductSubcategoryKey = dps.ProductSubcategoryKey
  INNER JOIN DimProductCategory dpc ON dps.ProductCategoryKey = dpc.ProductCategoryKey
GROUP BY YEAR(fis.OrderDate),
         dg.CountryRegionCode,
         dpc.EnglishProductCategoryName
UNION ALL
SELECT YEAR(fis.OrderDate) AS "Order year",
       'All countries in ' + CAST(YEAR(fis.OrderDate) AS nvarchar(4)) AS "Country",
       'All product categories in ' + CAST(YEAR(fis.OrderDate) AS nvarchar(4)) AS "Product category",
       SUM(fis.SalesAmount) AS "Sales amount"
FROM FactInternetSales fis
GROUP BY YEAR(fis.OrderDate)
ORDER BY 1,2,4 DESC;

  1. In order to sort the sub-totals at the bottom of each year, we would preferably put something like CASE WHEN LEFT(“Country”,3) = ‘All’ THEN 1 ELSE 0 END in second position of the ORDER BY clause, to ensure that the sub-total appears below each of the product categories and countries. However, when using UNION ALL, each column mentioned in ORDER BY must also be in SELECT. So, we must add an additional column – country_type – to put 0 (zero) next to each row in the top of the UNION ALL and 1 next to each row in the bottom of the UNION ALL, then use this new column in the ORDER BY clause. 
  2. This creates an additional problem in that we now have a column in the output that we don’t want – country_type. In order to get rid of that column, we must then wrap the entire UNION ALL query in a CTE, then select from and order by that CTE. 

The result of the UNION ALL method looks like this. You can see that now we have fulfilled all the requirements (specific column names, specific ordering, sub-totals at the bottom of each year):

This gets us what we want – but it’s unnecessarily complicated. We have to use UNION ALL, CTE and add columns to get what we want.

Method 2 – ROLLUP

The second way to add sub-totals to a query is to use ROLLUP in the GROUP BY clause to specify that we want to roll-up the values (i.e. aggregate them) in those columns.

SELECT YEAR(fis.OrderDate) AS "Order year",
       CASE 
        WHEN GROUPING(dg.CountryRegionCode) = 1 THEN 'All countries in ' + CAST(YEAR(fis.OrderDate) AS nvarchar(4))
        ELSE dg.CountryRegionCode 
       END AS "Country",
       CASE 
        WHEN GROUPING(dpc.EnglishProductCategoryName) = 1 THEN 'All product categories in ' + CAST(YEAR(fis.OrderDate) AS nvarchar(4))
        ELSE dpc.EnglishProductCategoryName 
       END AS "Product category",
       SUM(fis.SalesAmount) AS "Sales amount"
FROM FactInternetSales fis
  INNER JOIN DimCustomer dc ON fis.CustomerKey = dc.CustomerKey
  INNER JOIN DimGeography dg ON dc.GeographyKey = dg.GeographyKey
  INNER JOIN DimProduct dp ON fis.ProductKey = dp.ProductKey
  INNER JOIN DimProductSubcategory dps ON dp.ProductSubcategoryKey = dps.ProductSubcategoryKey
  INNER JOIN DimProductCategory dpc ON dps.ProductCategoryKey = dpc.ProductCategoryKey
GROUP BY YEAR(fis.OrderDate),
         ROLLUP(dg.CountryRegionCode,
         dpc.EnglishProductCategoryName)
HAVING GROUPING(dg.CountryRegionCode) + GROUPING(dpc.EnglishProductCategoryName) <> 1
ORDER BY 1, GROUPING(dg.CountryRegionCode), 2, 4 DESC;

Again, let’s note a few things about this method:

  1. We don’t need to use the UNION ALL operator and a separate query.
  2. We have wrapped the CountryRegionCode and the EnglishProductCategoryName in the GROUP BY clause inside the ROLLUP function. This has the effect of rolling up those columns to create sub-totals. However, as you can see here, if we only did that, we would get additional rows we don’t want – where country is not null and category is null (i.e. rolling up that category within that country):
SELECT YEAR(fis.OrderDate) AS "Order year",
	   dg.CountryRegionCode AS "Country",
       dpc.EnglishProductCategoryName AS "Product category",
       SUM(fis.SalesAmount) AS "Sales amount"
FROM FactInternetSales fis
  INNER JOIN DimCustomer dc ON fis.CustomerKey = dc.CustomerKey
  INNER JOIN DimGeography dg ON dc.GeographyKey = dg.GeographyKey
  INNER JOIN DimProduct dp ON fis.ProductKey = dp.ProductKey
  INNER JOIN DimProductSubcategory dps ON dp.ProductSubcategoryKey = dps.ProductSubcategoryKey
  INNER JOIN DimProductCategory dpc ON dps.ProductCategoryKey = dpc.ProductCategoryKey
GROUP BY YEAR(fis.OrderDate),
         ROLLUP(dg.CountryRegionCode,
         dpc.EnglishProductCategoryName)
ORDER BY 1, 2, 4 DESC;

  1. ROLLUP creates NULLs in the columns being rolled up. We have to supply default values for those NULLs.
  2. This ROLLUP has created two types of sub-total rows: (1) where both country and product category are null – which is the sub-total we need and (2) where only product category is null, which is a sub-total row we don’t need.
       CASE 
        WHEN GROUPING(dg.CountryRegionCode) = 1 THEN 'All countries in ' + CAST(YEAR(fis.OrderDate) AS nvarchar(4))
        ELSE dg.CountryRegionCode 
       END AS "Country",
       CASE 
        WHEN GROUPING(dpc.EnglishProductCategoryName) = 1 THEN 'All product categories in ' + CAST(YEAR(fis.OrderDate) AS nvarchar(4))
        ELSE dpc.EnglishProductCategoryName 
       END AS "Product category",
  1. The GROUPING function differentiates a NULL created by ROLLUP and a NULL present in the source data. If a NULL is created by ROLLUP, the GROUPING function called on that column returns 1, otherwise it returns 0. So, we can use the GROUPING function to help specify what text should be in the sub-total rows.
GROUP BY YEAR(fis.OrderDate),
         ROLLUP(dg.CountryRegionCode,
         dpc.EnglishProductCategoryName)
HAVING GROUPING(dg.CountryRegionCode) + GROUPING(dpc.EnglishProductCategoryName) <> 1
ORDER BY 1, GROUPING(dg.CountryRegionCode), 2, 4 DESC;
  1. We can also use the GROUPING function to filter-out those rows where the ROLLUP has created a NULL in the category column and not in the country columns. We do this by adding the HAVING clause and specifying that the sum of the GROUPING function on both of those columns should not be equal to 1. This is because if country is not null, then GROUPING(country) = 0 and if category is null, then GROUPING(category) = 1. We don’t want these rows and we use HAVING to remove them.
  2. Finally, since we are not using the UNION ALL operator, we can put columns or expressions in the ORDER BY clause which aren’t in the SELECT clause. In this case, instead of creating the column country_type like before, we simply put GROUPING(dg.CountryRegionCode) in the second position in the ORDER BY clause, which has the same effect – putting the sub-totals at the bottom of each country group.

So, ROLLUP is easier than UNION ALL. But there’s another way we can use to avoid the additional rows created by ROLLUP which then need to be removed by the HAVING clause.

Method 3 – GROUPING SETS

The third way to add sub-totals to a query is to use GROUPING SETS in the GROUP BY clause to specify exactly which columns we want to GROUP BY in each type of row.

SELECT YEAR(fis.OrderDate) AS "Order year",
       CASE 
        WHEN GROUPING(dg.CountryRegionCode) = 1 THEN 'All countries in ' + CAST(YEAR(fis.OrderDate) AS nvarchar(4))
        ELSE dg.CountryRegionCode 
       END AS "Country",
       CASE 
        WHEN GROUPING(dpc.EnglishProductCategoryName) = 1 THEN 'All product categories in ' + CAST(YEAR(fis.OrderDate) AS nvarchar(4))
        ELSE dpc.EnglishProductCategoryName 
       END AS "Product category",
       SUM(fis.SalesAmount) AS "Sales amount"
FROM FactInternetSales fis
  INNER JOIN DimCustomer dc ON fis.CustomerKey = dc.CustomerKey
  INNER JOIN DimGeography dg ON dc.GeographyKey = dg.GeographyKey
  INNER JOIN DimProduct dp ON fis.ProductKey = dp.ProductKey
  INNER JOIN DimProductSubcategory dps ON dp.ProductSubcategoryKey = dps.ProductSubcategoryKey
  INNER JOIN DimProductCategory dpc ON dps.ProductCategoryKey = dpc.ProductCategoryKey
GROUP BY GROUPING SETS (
  (YEAR(fis.OrderDate),dg.CountryRegionCode,dpc.EnglishProductCategoryName ),
  (YEAR(fis.OrderDate))
) 
ORDER BY 1, GROUPING(dg.CountryRegionCode), 2, 4 DESC;
  1. We add the GROUPING SETS keywords to the GROUP BY clause, follow by parentheses.
  2. Within those parentheses, we have a comma-separated list of groups of columns we want to group by.
  3. The first group is – year, country, category – this is just the original group by clause and creates the non-sub-total rows.
  4. The second group is just year – this creates the sub-total we want and nothing else.
  5. Note that GROUPING SETS is powerful precisely because we can be very specific about what groups we want. In my opinion it’s much easier to understand than ROLLUP for this reason. 
  6. Because we haven’t created any additional sub-totals, we don’t need to use the HAVING clause at all.
  7. The ORDER BY clause is the same as with the ROLLUP example, because GROUPING works in the same way for GROUPING SETS as it does for ROLLUP.

The results are identical:

In summary

We saw the difference between ROLLUP and GROUPING SETS in SQL.

We used each of UNION ALL, ROLLUP and GROUPING SETS to add sub-totals to a query.

We saw how GROUPING SETS is more precise than ROLLUP and so can help us avoid creating sub-totals we don’t want by only specifying the groups we need.

We saw how to use the GROUPING function to differentiate between NULL values in source data and NULL values created by a rollup operation.

Leave a Reply

Your email address will not be published.

You may use these HTML tags and attributes:

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>