Microsoft OLAP by Mosha Pasumansky

Microsoft OLAP and Analysis Services

<July 2008>
SuMoTuWeThFrSa
293012345
6789101112
13141516171819
20212223242526
272829303112
3456789


Navigation

Microsoft OLAP

Popular posts

Subscriptions

Post Categories



Programmability (RSS)

XMLA, ADOMD.NET, OLEDB, ADOMD, AMO, DSO
How to get the today's date in MDX

The question in the title of this post is a popular one. There are many scenarios, where it is desirable to navigate to the member of the Time hierarchy which corresponds to the current day, month or year. Sometimes, there is a desire to set the default member to be aligned with the current date. It is especially relevant, when the Time dimension contains non aggregatable attribute, such as Year (i.e. there is no member 'All Years'). By default, Analysis Services sets the default member to one of the years, but which one is undefined. So rather then override it with static member such as [Time].[Year].[2005], one may want to point to the current year. Another common scenario is around KPICurrentTime property, which is often set to today's day. Or, perhaps, calculations in the cube need to refer to today's date etc.

Usually, when this question is asked, the typical answer involves calls to VBA functions Now() or Date() combined with some clever formatting (which usually requires additional VBA functions such as Format, CStr, CDate, Month, Year, Quarter, Day etc) to build a string which looks like either fully qualified or even unique member name and then feed it to StrToMember MDX function. While these solutions do usually work, I am not fond of them. Building manually an unique member name goes against the spirit of MDX, since unique names are provider specific and have no specific format. But even building fully qualified member name is dangerous, especially in solutions which freely use ampersand sign (&) as a prefix for member key - certainly undocumented behavior. Finally, I really dislike StrToMember function for many reasons. For the havoc it wrecks in the query optimizer, for the unpredictable caching guarantees, for the very dynamic binding by means of reparsing its input.

The alternative that I propose is instead of trying to build a member name in the specific format - scan the members for the match with current date. Let's demonstrate this with the examples from Adventure Works. We will perform the following steps:

1. Obtain today's date using VBA!Date function

2. Since this article is written in 2007 and Adventure Works's Time dimension goes only as far as 2004, we will go 4 years back using DateAdd function, to get into year 2003.

3. Go over all the days and look for one which has MemberValue the same as today's (four years ago) date. In properly designed Time dimension, the Date's member value will be of type DateTime.

4. There should be no more than one tuple in the result set if the Time dimension was properly design, so take the first tuple of the set, which will be the desired member, or NULL if the result set came empty.

The MDX expression which does it will look the following:

Filter([Date].[Calendar].[Date], [Date].[Calendar].MemberValue = vba!dateadd("yyyy", -4, vba![date]())).Item(0)

Or to use it in MDX query to see results

select {} on 0
,Filter([Date].[Calendar].[Date], [Date].[Calendar].MemberValue = vba!dateadd("yyyy", -4, vba![date]())) on 1
from [Adventure Works]

There aren't many days in the Time dimension. Even if we kept 10 years in the cube, there would be no more than 3660 days. Running Filter over such a small number of members is instantaneous. However, we do note, that for every single day we call VBA function, which seems redundant, since the the current date is a constant. Moreover, it is somewhat dangerous, since if we were to run this query in the evening, at 11:59pm, the result of VBA!Date function could change in the middle of execution ! To prevent that the query can be rewritten as

with member Measures.Today as vba!dateadd("yyyy", -4, vba![date]())
select {} on 0
,Filter([Date].[Calendar].[Date], [Date].[Calendar].MemberValue = ([Date].[Calendar].[All Periods],Today)) on 1
from [Adventure Works]

this is quite a common trick - shift coordinate to constant member (i.e. All Periods) in order to make Filter to request same coordinate - ([Date].[Calendar].[All Periods],Today) for every iteration. This way the hope is that it will be computed only first time, and cached afterwards. Even better way to do it is to write

with member Measures.Today as vba!dateadd("yyyy", -4, vba![date]())
select {} on 0
,Filter([Date].[Calendar].[Date], [Date].[Calendar].MemberValue = Root(Today)) on 1
from [Adventure Works]

Here, by using Root(Today) we shift coordinates in all dimensions and attributes to the constant, so even if we had more axes in the query, or other coordinate shifting calculations, they won't matter, and VBA!Date would be called only once.

Similar trick can be done also inside MDX Script. It relies on the fact that named sets are static and computed only once. Therefore, the MDX Script could contain the following line:

CREATE HIDDEN TodayDate = vba!dateadd("yyyy", -4, vba![date]());
CREATE SET Today AS Filter([Date].[Calendar].[Date], [Date].[Calendar].MemberValue = ([Date].[Calendar].[All Periods],TodayDate));

And afterwards, whenever we need to reference today's date - we would use Today.Item(0), or even shorter notation of Today(0).

The catch here is that evaluated MDX Script is cached, so unless there is some sort of refresh to the cube, the named set Today won't change from day to day and will become outdated. But as long as new data is loaded into cube daily - it will be OK, since any kind of processing will trigger reevaluation of MDX Script.

Yet another solution for the scenarios where application cannot depend on the specific MDX Script, is to use stored procedures. While it won't be as performant as previous one, it could be more universal.

Below is the code of stored procedures which returns today's day:

        public Member GetToday(Level lvl)
        {
            // Get today's date from the system
            System.DateTime today = System.DateTime.Today;
            System.DateTime fouryearsago = today.AddYears(-4);

            // The only way to get set out of the level. Direct cast won't work
            Expression lvlexp = new Expression(lvl.UniqueName);
            Set lvlset = (Set)lvlexp.CalculateMdxObject(null);

            // Build the string in the form
            // [Date].[Calendar].[Date].MemberValue = CDate("5/21/2007")
            Expression exp = new Expression(
                    lvl.ParentHierarchy.UniqueName 
                +   ".MemberValue = CDate(\"" 
                +   fouryearsago.GetDateTimeFormats('d')[0] 
                +   "\")");
            Set filterset = MDX.Filter(lvlset, exp);

            // Iterate only one step - this is better then checking 
            // the count and indexing the 0's item
            foreach (Tuple t in filterset.Tuples)
                return t.Members[0];

            // If today's date wasn't found - return NULL member
            // Since Member object doesn't have ctor - this is the only way
            Expression nullmbr = new Expression("NULL");
            return (Member)nullmbr.CalculateMdxObject(null);
        }

Due to several limitations of AdomdServer object model, there is an excessive use of Expression object in the code above. It could've been much simpler if the Member object exposed MemberValue property, because then none of the dynamically built expressions would've been needed, and CDate wouldn't have to be evaluated over and over again. Simple loop over lvl.GetMembers() comparing value of today variable with Member.MemberValue would've done the job. Alas, not in current version. The typical call to such sproc would look like

select {} on 0
,ASSP.ASStoredProcs.Util.GetToday([Date].[Calendar].[Date]) on 1
from [Adventure Works]

The sproc requires passing the level as its argument, but this can be improved too. The version below finds the Day level in the cube automatically:

        public Member GetToday()
        {
            CubeDef cb = Context.CurrentCube;
            Dimension timedim = null;
            foreach (Dimension dim in cb.Dimensions)
            {
                if (dim.DimensionType == DimensionTypeEnum.Time)
                {
                    timedim = dim;
                    break;
                }
            }

            if (null == timedim)
                throw new System.ArgumentException("No Time dimension in the cube");

            foreach (Hierarchy h in timedim.Hierarchies)
            {
                foreach (Level lvl in h.Levels)
                {
                    if (lvl.LevelType == LevelTypeEnum.TimeDays)
                        return GetToday(lvl);
                }
            }

            throw new System.ArgumentException("No Day level in the Time dimension");
        }

This sproc will work with the simpler call, like this one

select 
{} on 0
,ASSP.ASStoredProcs.Util.GetToday() on 1
from [Adventure Works]

However, in cube like Adventure Works which have multiple Time dimensions, sproc will return member from the random one. Also, it turns out that there is a slight mismatch between AMO attribute types and ADOMD.NET level types, so marking Date attribute with type 'Date' in AMO will translate into type 'Regular' in ADOMD.NET. (And Adventure Works cube has a small bug, the level that gets marked with Day type is the 'Day Name' which really should be marked as DayOfWeek type).

So we saw several different methods of determining the current date through MDX. But none of them is ideal. They all rely on the non-deterministic VBA functions such as Now and Date, which can have really bad caching implications. So the best solution, which is also the simplest one in terms of MDX, is to have a dedicated process, which will update MDX Script daily with the following line:

CREATE SET Today AS { [Date].[Calendar].[Date].[May 21, 2003] };

Where the name of today's date is hardcoded and changed every day. This will have the best performance, but it will also add a little management burden on the cube maintainer.

posted Tuesday, May 22, 2007 2:55 AM by mosha with 0 Comments

Best practices for server ADOMD.NET stored procedures

In this article we will discuss some of the best practices around writing efficient Analysis Services stored procedures using server ADOMD.NET. These days there is plenty of information about writing efficient MDX - in books, whitepapers, blogs, forums etc. Just recently Microsoft released two documents - Performance Guide and Design Best Practices (and I have seen another one, still in the works, which goes deeper into MDX internals than anything else previously published). But there is no similar information or resources about how to write stored procedures, which are the essential extensibility mechanism in Analysis Services. There is Analysis Services Stored Procedures open source project on CodePlex, which features plenty of source code for different stored procedures, but it is not easy for someone looking at it to extract best practices out of the code. This article's goal is to collect some of such best practices into single concise source.

Always traverse set with iterators and avoid accessing tuples by index or getting count of tuples

To demonstrate why this is important, let's pick a simple stored procedure which computes the sum of expression values over a set. It does exactly the same as built-in MDX function Sum, we will use it here because it is simple enough but illustrates the point well.

We will implement this stored procedure using two methods - IterSum will use tuple iterator and IndexSum one will access tuples by index.

        public decimal IterSum(Set InputSet, Expression Expr)
        {
            decimal sum = 0;
            foreach ( Tuple t in InputSet.Tuples )
            {
                sum += (decimal)Expr.Calculate(t);
            }

            return sum;
        }

        public decimal IndexSum(Set InputSet, Expression Expr)
        {
            decimal sum = 0;
            int c = InputSet.Tuples.Count;
            for (int i = 0; i < c; i++ )
            {
                Tuple t = InputSet.Tuples[i];
                sum += (decimal)Expr.Calculate(t);
            }

            return sum;
        }

Now, let's test the performance of these functions. We will start with the following simple queries:

with 
set QuerySet as [Customer].[Customer].[Customer].MEMBERS
member measures.SprocSum as ASSP.ASStoredProcs.SetOp.IndexSum(QuerySet, 1)
select SprocSum on 0
from [Adventure Works]

with 
set QuerySet as [Customer].[Customer].[Customer].MEMBERS
member measures.SprocSum as ASSP.ASStoredProcs.SetOp.IndexSum(QuerySet, 1)
select SprocSum on 0
from [Adventure Works]

We have chosen to use "1" as an expression in order not to spend time in retrieving data from the cube, but focusing purely on the performance of the stored procedure code (there is also another reason for this choice, explained later in the article). Both queries return immediately - just like everybody would expect. So far so good. Let's change queries a little bit and pass set directly instead of using named set:

with 
member measures.SprocSum as ASSP.ASStoredProcs.SetOp.IterSum([Customer].[Customer].[Customer].MEMBERS, 1)
select SprocSum on 0
from [Adventure Works]

with 
member measures.SprocSum as ASSP.ASStoredProcs.SetOp.IndexSum([Customer].[Customer].[Customer].MEMBERS, 1)
select SprocSum on 0
from [Adventure Works]

Now the first query still returns immediately, but the second one took 29 seconds ! Let's take it even further - we will add Filter(..., true) around the set, which pretty much does nothing semantically.

with 
member measures.SprocSum as ASSP.ASStoredProcs.SetOp.IterSum(Filter([Customer].[Customer].[Customer].MEMBERS, true), 1)
select SprocSum on 0
from [Adventure Works]

with 
member measures.SprocSum as ASSP.ASStoredProcs.SetOp.IndexSum(Filter([Customer].[Customer].[Customer].MEMBERS, true), 1)
select SprocSum on 0
from [Adventure Works]

The first query is still immediate, but the second one now raises to whooping 2 minutes 49 seconds !!!

What is going on here ? The short explanation is that not all set are equal as far as their internal implementation in Analysis Services goes. Some of them have efficient enumerators while other have less efficient enumerators and yet others have completely inefficient enumerators ! However, all sets have pretty reasonable iterators. To get more detailed explanation about set architecture in AS, I recommend reading Irina Gorbach's chapter 28 from the "Microsoft SQL Server 2005 Analysis Services" book, the explanation is at pages 513-514 in English edition, and in pages 577-579 in Russian edition (by the way, if you can get Russian edition - I recommend it over the English one - it is more complete, has deeper explanations and fixes most of the errors in the English edition).

But the lesson is clear - always iterate over set tuples using "foreach ( Tuple t in myset.Tuples )" construct, and avoid both Tuples.Count and Tuples[index] constructs. (There is also another reason - there is a bug with how tuples are created from access by index in tuples collection, and this bug can lead to internal exceptions in many scenarios). But what if you need to access the tuples in random order and/or more than once ? This leads us to the next best practice

Cache cell values if you need to access them more than once or if you need to get data for all tuples of the set in random order

The example that we will chose here will be implementation of Order(set, expr, BDESC) function. Sorting algorithms require in average O(n*log(n)) accesses to array values for the array of n elements, so clearly we will need to access tuple's data more than once. On top of it all sorting algorithms will touch data in pretty much random order related to their original positions. This is a classic scenario where caching makes sense. The code below demonstrates it.

01        private class TupleValue : System.IComparable
02        {
03            private Tuple _Tuple;
04            internal Tuple Tuple
05            {
06                get { return _Tuple; }
07                private set { _Tuple = value; }
08            }
09            private decimal _Value;
10
11            public TupleValue(Tuple t, decimal v)
12            {
13                _Tuple = t;
14                _Value = v;
15            }
16            
17            public int CompareTo(object obj)
18            {
19                if (obj is TupleValue)
20                {
21                    TupleValue tv = (TupleValue)obj;
22                    return tv._Value.CompareTo(_Value);
23                }
24                throw new System.ArgumentException("object is not a TupleValue");
25            }
26        }
27
28       public Set Order(Set InputSet, Expression SortExpression)
29       {
30           List TupleValues = new List();
31
32           int i = 0;
33           foreach (Tuple t in InputSet.Tuples)
34           {
35               TupleValues.Add(new TupleValue(t, (decimal)SortExpression.Calculate(t)));
36               i++;
37           }
38
39           int cTuples = i;
40
41           TupleValues.Sort();
42
43           SetBuilder sb = new SetBuilder();
44
45           for (i = 0; i < cTuples; i++)
46           {
47               sb.Add(TupleValues[i].Tuple);
48           }
49
50           return sb.ToSet();
51       }

It seems a bit long, but the interesting code is quite short. Line 33 iterates through the set using tuple iterator, and on line 35 populates the cache of TupleValues, where TupleValue is a pair of Tuple and value of expression evaluated at it. Then at line 41 we call .NET built-in Sort method which uses TupleValue's implementation of IComparable interface at lines 19 through 24. Let's check how well this code executes. We will compare our stored procedure with built-in MDX function Order. We will sort a reasonable big set - 166,356 tuples. In order to measure the time spent on the server and not Management Studio trying to render in UI all these tuples, we will wrap Order with Count to return a single value.

with 
member y as count(Order(
   [Customer].[Customer].[Customer].MEMBERS
  *[Product].[Category].[Category].MEMBERS
  *[Product].[Style].[Style].MEMBERS
, Measures.[Internet Sales Amount], BDESC))
select y on 0
from [Adventure Works]

with 
member y as count(ASSP.ASStoredProcs.SetOp.[Order](
   [Customer].[Customer].[Customer].MEMBERS
  *[Product].[Category].[Category].MEMBERS
  *[Product].[Style].[Style].MEMBERS
, Measures.[Internet Sales Amount]))
select y on 0
from [Adventure Works]

The results are somewhat surprising. The built-in MDX function Order finishes in 40 seconds. And our stored procedure which does the same thing finishes in only 14 seconds ! We were able to beat internals of AS engine with our stored procedure (the reason we were able to outperform built-in Order is because of performance bug with MDX's Order. Normally stored procedure should never perform better than corresponding MDX function).

Caching is definitely a widely used technique to improve performance, but there are certain caveats to it. The developer should watch the size of the cache not to overflow beyond system limits. In the above implementation we cache liberally both Tuple object and its value, and this can add up to quite a lot of memory over huge sets. There are, of course, techniques to deal with this issue, but we won't discuss them here, since these techniques are well described in the literature dedicated to .NET programming.

Now, this stored procedure, even though it was faster than built-in Order, still took a while to execute - 14 seconds. What if user wasn't willing to wait that long. How do we ensure that it can be canceled immediately. This leads us to the next best practice

Inject liberally "Context.CheckCancelled()" statement in the inner loops which do not call into server ADOMD.NET methods

Context.CheckCancelled() is a method which raises "Server: The operation has been canceled" exception if the user has issued command on the session. It is very cheap - so cheap, that it is practically free. It can be put inside every inner loop in the stored procedure. However, it doesn't buy anything, if that inner loop already calls some server ADOMD.NET method. For example, the loop at lines 33-37 calls into several ADOMD.NET methods - it calls Next on the tuple iterator and Calculate on the expression object. Since both of these methods check for cancellation anyway, additional call to CheckCancelled is redundant. However, inside the code of List.Sort method, there are no more calls into server ADOMD.NET, since the data is cached. We cannot modify the code of List.Sort, but we still get called back from there into TupleValue.CompareTo method at line 17 - so it is a great candidate to insert call to Context.CheckCancelled.

But how much time we spent inside Sorting vs. populating the cache with data ? The only way to answer it - is to profile our stored procedure, and this leads us to the next best practice

Integrate with AS Trace and SQL Profiler through the Context.TraceEvent method

The best way to understand where the time is spent inside execution of MDX query is, of course, by inspecting various events in AS Trace through tool such as SQL Profiler. The server object model provides nice integration with trace through the Context.TraceEvent method. This method allows to inject "User defined event" into the trace, and provides for custom event subclass, integer and string data. We can write to trace before and after our two main loops in order to find out how much time we spend in each. So the code with both CheckCancelled and TraceEvents added will look like below:

01        private class TupleValue : System.IComparable
02        {
03            private Tuple _Tuple;
04            internal Tuple Tuple
05            {
06                get { return _Tuple; }
07                private set { _Tuple = value; }
08            }
09            private decimal _Value;
10
11            public TupleValue(Tuple t, decimal v)
12            {
13                _Tuple = t;
14                _Value = v;
15            }
16            
17            public int CompareTo(object obj)
18            {
19                if (obj is TupleValue)
20                {
21                    Context.CheckCancelled();
22                    TupleValue tv = (TupleValue)obj;
23                    return tv._Value.CompareTo(_Value);
24                }
25                throw new System.ArgumentException("object is not a TupleValue");
26            }
27        }
28
29        public Set Order(Set InputSet, Expression SortExpression)
30        {
31            List<tuplevalue> TupleValues = new List<tuplevalue>();
32
33            Context.TraceEvent(100, 0, "Start getting data");
34
35            int i = 0;
36            foreach (Tuple t in InputSet.Tuples)
37            {
38                TupleValues.Add(new TupleValue(t, (decimal)SortExpression.Calculate(t)));
39                i++;
40            }
41
42            int cTuples = i;
43
44            Context.TraceEvent(100, cTuples, "Finish getting data for " + cTuples.ToString() + " tuples");
45
46            Context.TraceEvent(200, 0, "Start sorting");
47            TupleValues.Sort();
48            Context.TraceEvent(200, 0, "Finish sorting");
49
40            SetBuilder sb = new SetBuilder();
51
52            for (i = 0; i < cTuples; i++)
53            {
54                sb.Add(TupleValues[i].Tuple);
55            }
56
57            return sb.ToSet();