MDX in Katmai: Dynamic named sets
The September CTP of Katmai (speculated to be released during PASS) is expected to be the first Katmai CTP to contain significant improvements in the OLAP engine. The most important one (which is, in my opinion, is also the most important AS Katmai feature) is changes to the query optimizer, which are nicknamed "Block computation mode" in the whitepapers and webcasts. I can't wait for it to be released to start blogging about new execution plans. But while we all wait for this CTP, perhaps it is a good time to cover other changes which were already released in the previous CTPs. June CTP featured four relatively small additions to MDX, and I will cover today the most interesting one - dynamic named sets. (The other three are - CREATE KPI statement, ability to specify additional properties such as display folder, associated measure group etc inside CREATE MEMBER statement, and ability to update calculated member "on the fly").
Semantic of dynamic named sets
Named sets have been part of MDX from the very first version. Named sets can be defined for the query (using the WITH SET clause), for the session (using CREATE SESSION SET in the session) or globally for the entire cube (using CREATE SET statement inside MDX Script of the cube). Named sets were always static, in a sense that they were calculated only once, in the context where they were defined. It wasn't a problem for query named sets, since their lifetime was lifetime of the query anyway, and it also usually wasn't a problem for session sets, since they were typically created by the client application, for example OWC creates session named sets to be used as axes in the future queries. However, the named sets defined in the cubes had a problem. The main reason to define named sets in the cube was to expose them to the end users. Most modern client tools such as Excel 2007 have UI to let the user use the cube named sets in the query. However, since these sets are computed only once, when the MDX Script is evaluated, their applicability was severely limited. Let's take an example to illustrate why. Suppose we want to find all the products which have negative gross profit margin. Normally, in MDX we could write the following statement:
SELECT
[Measures].[Gross Profit Margin] ON 0
, NON EMPTY [Product].[Subcategory].[Subcategory] HAVING [Measures].[Gross Profit Margin] < 0 ON 1
FROM [Adventure Works]
This brings back 3 products - Caps, Jerseys and Touring Frames. Now let's slice by year 2004 and see how profit margins look there:
SELECT
[Measures].[Gross Profit Margin] ON 0
, NON EMPTY [Product].[Subcategory].[Subcategory] HAVING [Measures].[Gross Profit Margin] < 0
ON 1
FROM [Adventure Works]
WHERE [Date].[Calendar Year].&[2004]
The new result is only Jerseys and Road Frames. Apparently, Caps and Touring Frames made recovery in 2004.
However, the end users are not expected to master writing MDX queries, even the simple ones above. The end users want to work in UI environment such as Excel. Therefore the cube designer can put the named set defining products with negative profit margin inside the MDX Script of the cube:
CREATE SET [Adventure Works].NegativeMarginProducts AS
Filter([Product].[Subcategory].[Subcategory], [Measures].[Gross Profit Margin] < 0)
Now, this set will appear in Excel 2007 under the Product dimension, and can be dropped into rows with single mouse click. Excel will generate the query like the following:
SELECT [Measures].[Gross Profit Margin] ON 0
, NON EMPTY NegativeMarginProducts ON 1
FROM [Adventure Works]
This will work fine, the results are good. However, when the user slices by year 2004, the following query is generated
SELECT [Measures].[Gross Profit Margin] ON 0
, NON EMPTY NegativeMarginProducts ON 1
FROM [Adventure Works]
WHERE [Date].[Calendar Year].&[2004]
And the result is no longer good. It still shows Caps, Jerseys and Touring Frames, even though Caps and Touring Frames have positive margins in 2004. The reason is simple - the named set NegativeMarginProducts was evaluated only once and in the context of [Date].[All Periods] - therefore the content of this named set remains the same regardless of how the current query is sliced.
And this is exactly the problem that Katmai's dynamic sets are designed to solve. Dynamic sets are not calculated once. They are calculated before each query, and, very important, in the context of that's query WHERE clause and subselects. In other words, using the MDX notation, we can say that using the following statement in the MDX Script
CREATE DYNAMIC SET .DynSet AS
and issuing queries
SELECT Function1(DynSet) FROM
SELECT Function2(DynSet) FROM
is functionally equivalent to issuing the following queries
WITH SET DynSet SELECT Functions1(DynSet) FROM
WITH SET DynSet SELECT Functions2(DynSet) FROM
However, dynamic sets are smarter than regular query named sets. Dynamic sets won't be evaluated at all if the query doesn't reference them directly or indirectly. Therefore, the performance of queries which don't need dynamic sets won't be compromised.
So in our scenario, we can write the following in the MDX Script.
CREATE DYNAMIC SET [Adventure Works].NegativeMarginProducts AS
Filter([Product].[Subcategory].[Subcategory], [Measures].[Gross Profit Margin] < 0)
And all the queries will now work as expected !
Using dynamic sets for performance optimizations
While the dynamic sets were designed to address the functionality limitations of the static sets, they also can be used as very efficient performance optimization technique. Let's use as an example the problem of doing efficient ranking by value in MDX. I have covered this subject in a dedicated blog post: http://sqljunkies.com/WebLog/mosha/archive/2006/03/14/mdx_ranking.aspx. The conclusion was, that the most efficient way was to define query named set and then use RANK over it. Below is the best MDX query:
WITH
SET OrderedEmployees AS ORDER([Employee].[Employee].[Employee].members, [Measures].[Reseller Sales Amount], BDESC)
MEMBER [Measures].[Employee Rank] AS RANK([Employee].[Employee].CurrentMember, OrderedEmployees)
SELECT [Measures].[Employee Rank] ON 0
,[Employee].[Employee].[Employee].MEMBERS ON 1
from [Adventure Works]
However, this technique does not always works. It can only be used when the user can write his own MDX query. One of the comments (from Ed) wonders exactly about this:
"Hi, Defining the set first definitely speeds this up. However, we are finding that doesn't work when delivering results via Excel and using filters. Because of the way Excel represents filters in MDX, the filter context is not applied to the set and thus the rank is always based on the unfiltered set. Any ideas on how to speed up RANK in this context?"
Well, I didn't have a good answer for this question back then, but thanks to Katmai's dynamic sets, there is a good answer now. Remember, that from the functionality point of view, defining dynamic set is like having a query named set for every query. Therefore, we could simply define OrderedEmployees as dynamic set
CREATE DYNAMIC SET [Adventure Works].OrderedEmployees AS
ORDER([Employee].[Employee].[Employee].members, [Measures].[Reseller Sales Amount], BDESC)
And we can use it inside calculated members
WITH
MEMBER [Measures].[Employee Rank] AS RANK([Employee].[Employee].CurrentMember, OrderedEmployees)
SELECT [Measures].[Employee Rank] ON 0
,[Employee].[Employee].[Employee].MEMBERS ON 1
FROM [Adventure Works]
If there is a slice on a query now, it will be respected - note how the ranks change.
WITH
MEMBER [Measures].[Employee Rank] AS RANK([Employee].[Employee].CurrentMember, OrderedEmployees)
SELECT [Measures].[Employee Rank] ON 0
,[Employee].[Employee].[Employee].MEMBERS ON 1
FROM [Adventure Works]
WHERE [Ship Date].[Calendar].[Month].&[2002]&[1]
Using dynamic sets to detect subselects
I am a bit reluctant to write this section, mostly because the functionality of subselects may change in later Katmai CTPs. However, it didn't change yet in the current CTPs. If it will change, this section may become obsolete. But the current functionality is that unlike WHERE clause, subselects do not change the current coordinate. Therefore calculated members which rely on the current coordinate but not at the cell value in the coordinate, fail to work correctly in the presence of subselects. This problem is well known to the ReportBuilder users, since it uses subselects. Consequently, unlike WHERE clause, subselects do not affect the result of EXISTING operator. Therefore calculations designed to work correctly with multiselect by means of using EXISTING operator, do not work when multiselect is implemented through subselects. This is a problem for Excel 2007 users. (Now I have to make myself clear, that I don't consider the solution below a clean solution to multiselect. A while ago I said that I was looking into clean and elegant solution for multiselect, and I had in mind something completely different - but I still cannot really talk about it.). As an example, I will use the simple problem of counting days. Again this problem has much better solution, but for simplicity I will use the MDX approach.
The classic approach in MDX for counting days is the following expression:
Count(Descendants([Ship Date].[Calendar].CurrentMember,[Ship Date].[Calendar].[Date],SELF))
It is simple enough, and it has few problems - it doesn't work with subselects and it doesn't work with multiselect in WHERE. The multiselect in WHERE can be solved by using
Count(EXISTING [Ship Date].[Calendar].[Date])
But subselects still won't work. The following query returns 1158 instead of 31:
WITH MEMBER Measures.NumberOfDays AS Count(Descendants([Ship Date].[Calendar].CurrentMember,[Ship Date].[Calendar].[Date],SELF))
SELECT NumberOfDays ON 0
FROM
(SELECT [Ship Date].[Calendar].[Month].&[2002]&[1] ON 0
FROM
[Adventure Works])
The problem is that subselects are almost stealth. They cannot be detected with CurrentMember or EXISTING. However, they are not completely stealth. Subselects perform autoexist with query axes and query named sets. Remember how dynamic named sets are like adding query named set to every query. Therefore, we can define the following dynamic named set in the MDX Script:
CREATE DYNAMIC SET [Adventure Works].Days AS [Ship Date].[Calendar].[Date]
Now we are guarantee, that this set will be autoexisted with whatever subselects and WHERE we might have for the query. The following query returns correct result of 31:
WITH MEMBER Measures.NumberOfDays AS Count(Days)
SELECT NumberOfDays ON 0
FROM
(SELECT [Ship Date].[Calendar].[Month].&[2002]&[1] ON 0
FROM
[Adventure Works])
And, of course, multiselect is no longer a problem even under subselects:
WITH MEMBER Measures.NumberOfDays AS Count(Days)
SELECT NumberOfDays ON 0
FROM
(SELECT {[Ship Date].[Calendar].[Month].&[2002]&[1],[Ship Date].[Calendar].[Month].&[2002]&[2]} ON 0
FROM
[Adventure Works])
Conclusion
While they can be considered to be a small addition to MDX, dynamic named sets are very useful feature, which can be applied beyond the things it was originally designed for. It can also be used for improving performance and for solving some of the MDX semantic issues.
References
The official documentation on dynamic sets is part of CTP BOL under MDX Language Reference -> MDX Statement Reference -> MDX Data Definition Statements -> CREATE SET Statement (MDX). If you have it installed follow this path: ms-help://MS.SQLCC.v10/MS.SQLSVR.v10.en/s10md_6mdxassl/html/eff51eeb-5e7e-4706-b861-c57b6f3f89f0.htm
For convenience, I copied the content of BOL inline below:
Syntax
| |
CREATE [SESSION] [ STATIC | DYNAMIC ] [HIDDEN] SET
CURRENTCUBE | Cube_Name
.Set_Name AS 'Set_Expression'
[,Property_Name = Property_Value, ...n] |
Arguments
Cube_Name
A valid string expression that provides the name of the cube.
Set_Name
A valid string expression that provides the name for the named set being created.
Set_Expression
A valid Multidimensional Expressions (MDX) expression that returns a set.
Property_Name
A valid string that provides the name of a set property.
Property_Value
A valid scalar expression that defines the set property's value.
Remarks
Scope
A user-defined set can occur within one of the scopes listed in the following table.
Query scope
The visibility and lifetime of the set is limited to the query. The set is defined in an individual query. Query scope overrides session scope. For more information, see Creating Query-Scoped Named Sets (MDX).
Session scope
The visibility and lifetime of the set is limited to the session in which it is created. (The lifetime is less than the session duration if a DROP SET statement is issued on the set.) The CREATE SET statement creates a set with session scope. Use the WITH clause to create a set with query scope.
Example
The following example creates a set called Core Products. The SELECT query then demonstrates calling the newly created set. The CREATE SET statement must be executed before the SELECT query can be executed - they cannot be executed in the same batch.
|
Copy Code |
CREATE SET [Adventure Works].[Core Products] AS '{[Product].[Category].[Bikes]}'
SELECT [Core Products] ON 0
FROM [Adventure Works] |
Set Evaluation
Set evaluation can be defined to occur differently; it can be defined to occur only once at set creation or can be defined to occur every time the set is used.
STATIC
Indicates that the set is evaluated only once at the time the CREATE SET statement is evaluated.
DYNAMIC
Indicates that the set is to be evaluated every time it is used in a query.
Set Visibility
The set can be either visible or not to other users who query the cube.
HIDDEN
Specifies that the set is not visible to users who query the cube.
Standard Properties
Each set has a set of default properties. When a client application is connected to Microsoft Analysis Services, the default properties are either supported, or available to be supported, as the administrator chooses.
| Property identifier |
Meaning |
|
CAPTION |
A string that the client application uses as the caption for the set. |
|
DISPLAY_FOLDER |
A string that identifies the path of the display folder that the client application uses to show the set. The folder level separator is defined by the client application. For the tools and clients supplied by Analysis Services, the backslash (\) is the level separator. To provide multiple display folders for a defined set, use a semicolon (;) to separate the folders. |