Subselects in SP2
The list of changes for SP2 contains the following line:
- The functionality of subselects has changed.
Few people have commented that subselects are very important piece of functionality in Analysis Services 2005, and the simply saying "functionality changed" is not enough - they need to understand exactly what changed and what the new behavior is. I will try to take a shot at it in this post. There were actually couple of changes, and we will go from simpler ones to the more interesting ones. Special thanks to Marius Dumitru for helping me with the below list.
Arbitrary shapes and Many to Many and Reference dimensions
Actually, this change applies not only to subselects but also to WHERE clause. Prior to SP2, using arbitrary shaped set which used either many to many or reference dimension inside subselect or WHERE clause caused an error "Arbitrary shape cannot be used in this context". The reason for the error was pure technical limitation - it wasn't easy to do the many to many join and compute distinct sum with complex filter implied by arbitrary shaped set (note that distinct count measures always worked with any shape of set). But SP2 eliminates this limitation, and the join works for many to many too. Actually, probably the "Many to Many dimension improvements" line in SP2 changes list refers to exactly same thing (but then it is not clear why referenced dimensions are not mentioned as well).
Subselect visual totals apply to implicit aggregation from a calculation on a lower grain
This is important change. To understand it better, let's illustrate it with an example from Adventure Works cube. I am going to rely on the fact that the cube contains [Prior Year Internet Sales Amount] calculated measure which was built in the blog about Time Calculations in UDM. Let's assume for a moment, that we still live in 2003 and Adventure Works cube doesn't have data for Internet Sales Amount in 2004. Let's suppose that we want to budget Internet Sales Amount for 2004, and for the purpose of the exercise, we will do equal allocation to months by taking the data from the previous year quarter, dividing it equally among the months, and allowing growth rate of 20%. The formula for such an expression is easy to write:
([Ship Date].[Calendar].Parent,[Measures].[Prior Year Internet Sales Amount])/[Ship Date].[Calendar].CurrentMember.Siblings.Count*1.2
Now, since we are doing allocation at the month level and only for 2004, the MDX Script fragment should look the following
([Measures].[Internet Sales Amount],[Ship Date].[Month Name].[Month Name].members, [Ship Date].[Calendar Year].&[2004])
= ([Ship Date].[Calendar].Parent,[Measures].[Prior Year Internet Sales Amount])/[Ship Date].[Calendar].CurrentMember.Siblings.Count*1.2;
format_string([Measures].[Internet Sales Amount])='currency';
The allocation is done at the Month granularity, and it will automatically aggregate up to Quarter, Semester, Year, All etc granularities. Now we are ready to do some queries and check the results that we are getting.
SELECT {[Measures].[Internet Sales Amount],[Measures].[Prior Year Internet Sales Amount]} ON 0
, Descendants([Ship Date].[Calendar].&[2004],[Ship Date].[Calendar].[Month],SELF_AND_BEFORE) ON 1
FROM [Adventure Works]
This query will show us both which values were allocated at the month level, as well as how they aggregate up to the higher levels. For example, in the Q2 of 2003, the Internet Sales Amount was $1,604,192.69. After we divide it by 3 months and increase by 20% - we get $641,677.08 which is allocated to April, May and June 2004. Together they roll up to $1,925,031.23 for Q2 of 2004. Before we start exploring how this calculation will behave with subselects, let's quickly review how normally subselects apply visual totals.
SELECT {[Measures].[Internet Sales Amount]} ON 0
, Descendants([Ship Date].[Calendar].&[2003],[Ship Date].[Calendar].[Month],SELF_AND_BEFORE) ON 1
FROM
(SELECT [Ship Date].[Month Name].&[2003]&[6] ON 0 FROM [Adventure Works])
Here, we will get 4 cells only in the result (Year 2003, Semester 1, Quarter 2 and June) because of autoexist, and values in all 4 cells will be the same - $563,419.45 because of visual totals. I.e. totals for quarter are aggregate of all visual months, and there is only one month - June - etc. Let's change it to 2004, and see the results
SELECT {[Measures].[Internet Sales Amount]} ON 0
, Descendants([Ship Date].[Calendar].&[2004],[Ship Date].[Calendar].[Month],SELF_AND_BEFORE) ON 1
FROM
(SELECT [Ship Date].[Month Name].&[2004]&[6] ON 0 FROM [Adventure Works])
Now on pre SP2 builds we are getting real totals instead of visual totals. I.e. even though there are still same 4 cells (only for 2004 this time), and June 2004 shows $641,677.08, the quarter shows real total of $1,925,031.23 since it still aggregates from all three months. This is inconsistent behavior - when the cell aggregates from fact data - visual totals apply, but when the cell aggregates from results of calculation - visual totals suddenly don't apply. So this was exactly what changed in SP2. Running with SP2 build will return same value of $641,677.08 for all 4 cells.
Use coordinate overwrite history instead of cell granularity to decide on what attributes to apply visual totals
This is perhaps the most important change of subselect semantics in SP2. It is somewhat complex to explain, but the good news is that even if the explanation may look confusing, the end results are in fact intuitive. So let's first start with reviewing how subselect visual totals normally work. Anybody who had any experience with subselects knows that they apply above granularity and don't apply at or below granularity. This can be illustrated by the following example:
SELECT [Measures].[Internet Sales Amount] on 0
,{[Product].[Product Categories].[Subcategory].[Helmets].Parent.Parent,[Product].[Product Categories].[Subcategory].[Helmets].Parent,[Product].[Product Categories].[Subcategory].[Helmets],[Product].[Product Categories].[Subcategory].[Helmets].LastChild} ON 1
FROM [Adventure Works]
GO
SELECT [Measures].[Internet Sales Amount] on 0
,{[Product].[Product Categories].[Subcategory].[Helmets].Parent.Parent,[Product].[Product Categories].[Subcategory].[Helmets].Parent,[Product].[Product Categories].[Subcategory].[Helmets],[Product].[Product Categories].[Subcategory].[Helmets].LastChild} ON 1
FROM
(SELECT [Product].[Product Categories].[Subcategory].[Helmets] ON 0 FROM [Adventure Works])
Here, we are doing subselect at the subcategory granularity (i.e. for Helmets), and we query above granularity (grand parent - All Products and parent - Accessories), at granularity (Helmets themselves) and below granularity (one of the children, Sport 100 Blue Helmet). What we discover at the results, is that below granularity - we get unmodified answer - Sport 100 Blue Helmet sold for $41,225.22. At granularity - we have Helmets staying the same number - $127,783.48. But above granularity, visual totals apply, so Accessories instead of real total of $408,192.98 we get visual total which is equal to Helmets - $127,783.48.
All this was easy, because it didn't involve any calculations which could overwrite coordinates. Now, let's introduce calculated members which look for Gloves instead of Helmets. We have to use calculated members, because if we just start putting Gloves and their parents, grandparents and children on axis, they will be eliminated by autoexist rules.
WITH
MEMBER [Product].[Product Categories].CalcGloves AS [Product].[Product Categories].[Subcategory].[Gloves]
MEMBER [Product].[Product Categories].CalcGlovesParent AS [Product].[Product Categories].[Subcategory].[Gloves].Parent
MEMBER [Product].[Product Categories].CalcGlovesGrandParent AS [Product].[Product Categories].[Subcategory].[Gloves].Parent.Parent
MEMBER [Product].[Product Categories].CalcGlovesChild AS [Product].[Product Categories].[Subcategory].[Gloves].Children(1)
SELECT [Measures].[Internet Sales Amount] ON 0
, {CalcGlovesGrandParent, CalcGlovesParent, CalcGloves, CalcGlovesChild,
[Product].[Product Categories].[Subcategory].[Helmets].Parent.Parent,[Product].[Product Categories].[Subcategory].[Helmets].Parent,[Product].[Product Categories].[Subcategory].[Helmets],[Product].[Product Categories].[Subcategory].[Helmets].LastChild} ON 1
FROM [Adventure Works]
go
WITH
MEMBER [Product].[Product Categories].CalcGloves AS [Product].[Product Categories].[Subcategory].[Gloves]
MEMBER [Product].[Product Categories].CalcGlovesParent AS [Product].[Product Categories].[Subcategory].[Gloves].Parent
MEMBER [Product].[Product Categories].CalcGlovesGrandParent AS [Product].[Product Categories].[Subcategory].[Gloves].Parent.Parent
MEMBER [Product].[Product Categories].CalcGlovesChild AS [Product].[Product Categories].[Subcategory].[Gloves].Children(1)
SELECT [Measures].[Internet Sales Amount] ON 0
, {CalcGlovesGrandParent, CalcGlovesParent, CalcGloves, CalcGlovesChild,
[Product].[Product Categories].[Subcategory].[Helmets].Parent.Parent,[Product].[Product Categories].[Subcategory].[Helmets].Parent,[Product].[Product Categories].[Subcategory].[Helmets],[Product].[Product Categories].[Subcategory].[Helmets].LastChild} ON 1
FROM (SELECT [Product].[Subcategory].[Subcategory].[Helmets] ON 0 FROM [Adventure Works])
The results that we are getting can be explained by the same rules we saw before. Below granularity for Gloves's child - Small Half-Finger Gloves - the result stays the same - $7,219.65. At granularity, for Gloves themselves, the result is also the same - $19,689.96. But above granularity bizarre things happen. The parent of Gloves is Clothing category, and we get NULL there. That's because the visual totals kick in and AS tries to compute visual totals of Clothing for Helmets. Since Helmets are not in the Clothing category - the visual totals is NULL.
Now, it is clear, that the goal of designer of the calculation was not to apply visual totals in this case. Since the calculation explicitly overwrites the subcategory coordinate, the intention is to disable filtering and visual totals by subcategory. So this is exactly the change that happened in SP2. Instead of simply look at cell granularities, SP2 now follows the coordinate overwrite history as it unwinds the expression tree of the calculation which apply to the cell. So in SP2, the second SELECT statement will return exactly same results for Gloves's family as the first SELECT statement. Of course, visual totals still apply to Helmets family above granularity, since no coordinate overwrites happen there.
Another important consequence of the very same change is that the coordinate overwrite history and not cell granularity is also considered when doing partial projection of arbitrary shape sets. But this is somewhat a more rare scenario, and I don't think I am capable of clearly explaining it without too much overcomplicating. Hopefully people who suffered from it before will notice this as another positive change in SP2.
Subselects and WHERE clause
Now, after reading this post as well as two previous posts about interaction between axes and slicer (here and here), one cannot help but notice how all these changes in SP2 make subselects and WHERE clause to behave much much more similar than ever before. In fact, the only remaining differences between the two are:
1. Subselects don't set current coordinate, while WHERE clause does (with the known exception of non-aggregatable attributes). This is the most important remaining difference. It causes different results for all expressions which look at CurrentMember as a member object, but the expressions which use CurrentMember simply to extract the cell value. The reason is simple - the CurrentMember will return member (or set) which is used in WHERE clause, but not the member (or set) which is used in subselect (unless it was non-aggregatable attribute, of course). But if the expression simply needs value at CurrentMember - then the changes to visual totals described above will make subselects and WHERE to produce same value. One important consequence of this difference is that EXISTING operator works with WHERE but not with subselects, since EXISTING looks at current coordinate.
2. Subselects allow to use same hierarchy as one used on axes, but using same hierarchy both on axis and in WHERE clause raises the error
3. Subselects can be persisted in session state through CREATE SUBCUBE statement, and WHERE clause can do it only for single member selection, but not for set through ALTER CUBE SET DEFAULT MEMBER statement
4. If set is used in WHERE clause, the relating attributes are still erased by axes, unlike in single tuple selection case and unlike subselects. (as documented here).
So what is the conclusion ? Subselects and WHERE used to be very different beasts - and it caused plenty of confusion. With SP2 they are closer than ever, but still not the same. I certainly hope that in the future versions they will become even closer, and perhaps even identical. In the meantime, for those of us who need to generate SELECT queries from his application, my advise is to use WHERE clause for single tuple selection, and consider both WHERE and subselects for multiple tuples (set) selection (aka multiselect). While this advise is not always ideal - I feel, that it gives as close to ideal results as possible. So far, Excel 2007 gets closest to this advice from all client tools (but still not exactly).