Since I started a series of blogs about AS specific changes in SP2, few people asked me to explain in more details some of the things covered in the SP2 list of changes. The most popular request so far was to explain performance optimizations for running sum calculations. I tried to delay that, because I am writing a article especially about running sums, but I decided not to wait until it is ready for two reasons. First, running sum can mean many different things - there are certainly many techniques how to do it. While I believe in my technique, other people believe in their techniques. Therefore the term "running sum calculations" is somewhat vague. The SP2 change helps some methods, but not the others. Secondly, the actual change in SP2 is in fact much broader then the narrow area of running sums. It's just that it is easiest to demonstrate it using some variation of running sum, but as we will see it helps in many other scenarios as well. Lastly, the article I am writing is getting kind of out of control, and I don't know when (or whether) I will finish it, so instead I can focus on something smaller.
For the purpose of the discussion, we will consider scenario where we need to sum up values from the very beginning of time up to the current moment. This scenario comes up a lot in inventory applications where we keep product movements (deltas) in the fact table, and need to sum up in order to get the inventory levels (although better design there is to take periodic snapshots and only sum up from the beginning of the snapshot - but this is subject for another discussion), or in the financial reports, where balance from one year rolls into the next year etc. In order for the reader to be able to replicate the results, we will use example off the Adventure Works cube. Normally, we would create an utility attribute (or utility dimension), but for simplicity, I will just use Week of Year attribute as if was utility attribute and put calculated members on it. The simplest and most straightforward expression for such running sum would be
Aggregate([Ship Date].[Date].[Date].MEMBERS.Item(0):[Ship Date].[Date].CurrentMember)
Note, that this expression only works correctly when the granularity in the [Ship Date] dimension is on the Date attribute. There are various ways how to ensure that this is always true, with the simplest and most natural one being setting the aggregation function of involved measures to be LastChild. But again, let's ignore these details, and just write our test queries to always have granularity on Date. The query that we are going to test this expression on should use a lot of cells to make measurements interesting. We will pick the following one:
WITH MEMBER [Ship Date].[Week of Year].RSum AS
Aggregate([Ship Date].[Date].[Date].MEMBERS.Item(0):[Ship Date].[Date].CurrentMember)
SELECT
[Ship Date].[Calendar].[Month].&[2003]&[8].children ON 0
,[Product].[Product Categories].[Subcategory].MEMBERS
*[Customer].[Country].[Country].MEMBERS
ON 1
FROM [Adventure Works]
WHERE ([Internet Sales Amount],RSum)
There are 6882 cells inside the cellset, but for each cell we aggregate anywhere from 762 to 792 cells, depending on the current member on Date. So there are total about 5.3 million cells being touched by this query. Running it on SP1 results in running time of 1 minute and 20 seconds. This is certainly slow. The throughput is only about 66000 cells per second. So, let's check out the promised improvements on SP2. Running the same query on SP2 now returns in 869 milliseconds or 6.6 million cells per second. This is about 100 times faster then SP1 and it is sure fast.
So let's talk about the nature of the change. In one of my earlier blogs, we discussed how query optimizer can use bulk evaluators if MDX function supports it. Up until SP2, the Aggregate function (also Sum etc) could produce the bulk evaluator, but only under condition that the set that they were aggregating was independent to the query space. I.e. Aggregate over constant set, like { January, February, March } would always go through fast code, because no matter what the user puts on axes, the set is always the same. In our example, though, the set is not constant. It is a range set, where the left side of range is constant (always the first date in the dimension), but the right side of the range depends on the current member in Date attribute. Query optimizer then would look into the query space, and if it sees that the Date attribute is constant in the query (i.e. not used at all, or used in the WHERE clause), then Date.CurrentMember is constant as well, and bulk evaluator could work. In such scenarios, Aggregate works extremely fast even in AS2005 RTM and SP1. However, in our example we use [2003].August.Children on COLUMNS, so the Date.CurrentMember is not constant anymore. In SP2, the bulk evaluator of Aggregate and related functions was significantly improved. It still cannot do magic in most generic case, but it became much more intelligent than before. It, in fact, realizes, that there is no need to recompute every cell from scratch. The next day needs exactly same values as the previous day, plus just one cell value more. By choosing smartly the order in which cells are computed, the Aggregate's bulk evaluator reduces the O(N^2) problem to O(N) problem, and this gives huge performance improvement.
P.S. If there are other items on the SP2 list which you are particularly interested in - let me know, and I will try to cover them in the following blogs as well, although I don't promise anything :)
As I was walking through the vendors expo during PASS conference, I saw many interesting products, but two of them caught most of my attention. I am not saying that these two were the best on the floor, it's just that I either already saw other products before, or that they were OLTP oriented, and my main interest is in BI products. By the way, the product which in my opinion stole the show in this PASS - are Data Mining Add-Ins for Excel 2007. They are truly something special - any Excel user can have power of Data Mining without being an expert in Data Mining. But I already saw them before, so they weren't as new to me as to the rest of the audience. The products that I want to discuss here are Q4Bis and RSinteract. I have heard about both of them, but never had a chance to see them in action. I want to extend my thanks to Arpad from Q4Bis and to project manager from RSinteract (whose name I regretfully forgot) for spending tons of time with me patiently answering all of my questions.
Q4Bis
There are actually several different products that this company has in its portfolio, including DataServer - and ETL tool which builds data warehouses and AS cubes; templates for doing it from JD Edwards, Axapta and Microsoft Dynamics AX; Mobile Dashboard - a hosted service to deliver BI to cell phones etc. But I was mostly interested to see Q4Bis Analysis - which is AS client tool in two versions - Desktop and zero footprint Web. Overall you would find all the functionality that you would expect from the OLAP browser, but there was one piece which I especially liked. We live in the days, when the modern search engines revolutionized the way we look for and interact with information. Instead of categorizing the data and using complex UIs to get to it - we simply type what we want to find into a simple text box, and let the search engine to find the most relevant answers. Sure nothing can be simpler than a text box ! So I am somewhat surprised why BI products - the very products whose mission is to help users to find relevant information, don't embrace the Search paradigm. People are already familiar with it, and people don't like learning new complex things (take MDX, for example, it is a beautiful language, yet most people are reluctant to dive into it :). At best, BI vendors do provide some search functionality, but it is buried inside menus and submenus multiple clicks away from where user can find it, instead of being prominently placed like a search text box in IE7. Now, I am not an expert in UI. In fact, when AS team members want to make a joke on me - they say "We will let Mosha design the UI". But when I saw how Q4Bis put a search box and a magnifying glass icon next to it (the same icon as everybody is used to see on www.live.com) - I knew that this was exactly the right thing:
We searched for the word "garden". Q4Bis intelligently understood the context of the search based on the current view. There were 3 different hierarchies on the screen, so it started to search only inside these hierarchies. As a result, there were 3 products which contained the word "garden" in their names
I think this is brilliant. Arpad showed me that it is possible to make more advanced searches, and search within results etc, but I think that the most powerful feature is not in the advanced staff, but in simple staff. They've made search to be simple to use, yet intelligent enough to be relevant.
After that we went over the generated MDX statements. Querying dimensions is one of my favorite subjects, I even started series of blogs about it, but so far had only time to write the first one in the series. Nevertheless, I made couple of suggestions how to improve the the MDX, hopefully they will find those useful and incorporate into the product at some stage. More information about the product is available from the www.q4bis.com website.
RSinteract
This is a relatively new product, which works with Reporting Services. It doesn't do its own rendering or formatting, but generates RDL for Reporting Services to render and format. So the purpose is pretty much the same as with ReportBuilder, but RSinteract is clearly easier to use. They added support for AS data sources just couple of months ago, and I obviously spent most of my time in that area. BTW, in the marketing materials, I found the best resolution of the UDM vs. cube terminology argument. They simply state, that RSinteract supports UDM cubes for both 2000 and 2005 versions :) The UI for "UDM cubes" takes traditional OLAP approach, as opposed to ReportBuilder. I.e. ReportBuilder presents different measure groups as different entities, and dimensions are entities being linked to them etc. Perhaps this is appealing approach for people familiar with relational schemas, but to me RSinteract approach was much more intuitive. If we do a report against cube - let it look like cube with measures and dimensions ! RSinteract doesn't aim to be general purpose OLAP browser, but some of the classic functionality of OLAP browsers is there - you can pick any number of hierarchies into filter, and select one or more members out of them. RSinteract automatically converts this into RS parameters. We reviewed the generated MDX statements, and they were very clean. RSinteract also has chosen an interesting way to support drillthrough. Since they allow to choose multiple members of same level in filter, the DRILLTHROUGH statement wouldn't work, since drillthrough only works on single cell (I hope this limitation is lifted in the next version). So RSinteract generates straight SELECT with subselects to incorporate filter and using STRTOSET(@p1, CONSTRAINED) to decode parameters. Everything is a good choice for this scenario (I hope in the next version AS supports rich data types on the parameters, so STRTOSET calls won't be necessary anymore). We discussed the important differences between DRILLTHROUGH and SELECT statements with respect to how they treat calculations. DRILLTHROUGH will ignore any calculations and go straight to the fact table of measure group. But since RSinteract works with a cube as a whole - it is only logical to send queries to the cube and not to measure group. I enjoyed talking with the RSinteract representative. He told me that he is just project manager, but it looked to me that he is from the engineering team, since he was able to answer all the technical questions I had. Unfortunately the expo was closing at that time, so I didn't have a chance to take screenshots. Some of them are available at www.rsinteract.com website, but they are all examples for relational data sources, and not for OLAP.