SQL Server 2005 (RSS)

What's new in SQL Server 2005

Wrong calculation for unary operator with many-to-many relationships

I just posted this bug on connect.microsoft.com - there is an issue when you use unary operator in a parent-child dimension combined with a many-to-many relationship.

The problem is that the '+' operator on non-leaf nodes seems to not consider the children value and bypass the intermediate measure group. Everything works well with the '-' operator. It sounds like an optimization that has a side-effect...

I found the bug thanks to a post on www.sqlbi.eu forum, that is starting to get feedback from people who are using the models described in the Many-to-Many Revolution paper.

The vardecimal type for the fact table

I still hadn't time to install the SP2 CTP but I just read this post and I think it's a very interesting thing! The VARDECIMAL could be an amazing feature!

I encountered several times huges fact tables that have many NUMERIC measures. If the VARDECIMAL is optimized enough, it would save a lot of space. I will try on some real-world scenario after SQL Pass, but please if someone do the same test before, let's contact me!

SQL Server 2005 SP2 - CTP available

Here is available the SP2 CTP for SQL Server 2005.

The what's new document explain that there is an improvement in many-to-many dimension relationships calculations. I will try to evaluate the performance improvement, but unfortunately I'm in the middle of a busy timeframe (next week I will be in Seattle for SQL Pass - if someone want to exchange some opinion face-to-face here, let's contact me directly).

If someone get hands-on experience on many-to-many performance changes, please let me know.

Unicode, VARCHAR, NVARCHAR and index usage in SQL Server

This is not the first time I found non-intuitive issues with collation, Unicode and 8-bit strings with SQL Server, but this time I discovered (more precisely: one of my customers discovered and asked me...) a strange behavior that affects the query plan and the overall query optimization strategy.

This was my original post to Microsoft Connect:

A SQL query with a WHERE condition does not use the existing index if the comparison is made against a Unicode constant string. It happens when the field has a collation setting different than the current one (i.e. field has SQL_Latin1_General_CP1_CI_AS where server default is now Latin1_General_CI_AS).
Query plan use Clustered Index Scan instead of Clustered Index Seek operation.
Performance become very slow when the query is made many times into a loop (not a good practice, but the program can't be changed).
Using ANSI string (8 bit) the problem disappear and query plan shows the use of a Clustered Index Seek Operation

The answer is detailed:

This is not about the collation but about the difference between 8-bit and 16-bit character set.
In fact if you use the default collation (remove "COLLATE SQL_Latin1_General_CP1_CI_AS" from the column definition) then the plans do not change.
There is no generalized valid way to down-convert a 16-bit string to an 8-bit string but an 8-bit string is required to perform a seek because that is the type of the index.
The scan plan works because the 8-bit strings in the table are upconveted to 16-bit and compared to the 16-bit constant.
If you use nvarchar in the table then both forms of the predicate will support index seek (since the 8-bit constant in the predicate can be up-converted to 16-bit).
Thanks,
SQL Server dev.

I well understand all of this, and apparently the issue doesn't affect many of us. However, think about it: when you build a Data Warehouse, for a lot of reasons you could have to manage a VARCHAR field instead of a NVARCHAR one. If they are on the staging database, you have to be careful in comparing them with constants (it appears to be simple). If you decide to build a dimension using VARCHAR types instead of NVARCHAR, the user who access the database could use a query tool which use Unicode constants for user input. Hum... very dangerous, angry users could blame on you for this.

Now, guess what? SQL Server Reporting Services 2005 build parametric queries using sp_executesql statement, putting all constants with a N (for Unicode) prefix. Can you hear the user screaming now?

If I wasn't clear, you can read the feedback here: a repro script is included.

Now, while I can find a workaround here (probably a stored procedure can convert the Unicode constant into the corresponding 8-bit version), I am asking if this only issue should encourage you to move dimension fields to NVARCHAR type. I know all implications and I know that dimensions size

SQL Refactor in public beta

I am trying the beta of SQL Refactor (available here). It is really helpful to refactor the layout of a query and I like to do this in at least three scenarios: when I read a query written by other people, when I need to modify a query in a Reporting Services report and edited in a the graphical query design mode, when I want to clean a query written too fast for a test that becomes a stable and reusable query.

I would like to get some more functionality in the layout options, I already posted the most important one and I hope they will implement it in the RTM or at least in a future release.

Using MSOLAP as a linked server

I just faced the problem to connect an Analysis Services (2000 or 2005) to a SQL Server 2005 database using a OPENROWSET statement. If you face an "access denied" error (precisely, "The OLE DB provider "MSOLAP" for linked server "SERVERNAME" reported an error. Access denied.") the solution is to enable the "Allow inprocess" flag in MSOLAP Provider Options.

You can modify provider option through SQL Server Management Studio, Server Objects / Linked Servers / Providers / MSOLAP, right-click and edit Properties.

When you get a result for an MDX query, the resulting column names are composed by dimension+levels+property name, so if you would like a cleaner naming you can use the WITH statement to define column names in a fast way. E.G.

WITH A (Customer, Product, Year, Sales )
SELECT * FROM OPENROWSET( 'MSOLAP', 'DATASOURCE=localhost; Initial Catalog=FoodMart 2000;',
'...[place here your MDX query]...' )

The many-to-many revolution: paper finally released!

After a month of editing, I finally released my paper titled "The many-to-many revolution".

I copied in this post the introduction of the 84-page paper that is published (for free!) on a dedicated dimensional modeling page of SQLBI.EU web site.

I would like to receive feedbacks about the paper. You can use mail, comments or the support forum on SQLBI.EU.

The many-to-many revolution - Introduction

Analysis Services 2005 (SSAS 2005) introduced the capability to handle many-to-many relationships between dimensions. At a first glance, you may tend to underscore the importance of this feature: after all, Analysis Services 2000 and many other OLAP engines do not offer many-to-many relationships. Yet, its lack did not limit their adoption and, apparently, only a few businesses really require it. However, as this paper shows, the UDM (Unified Dimensional Model) can leverage many-to-many relationships helping you to present data from different perspectives that are not feasible with a traditional star schema. This opens a brand new world of opportunities that transcends the limits of traditional OLAP.

We will explore many different uses of many-to-many relationships that give us more choices to model effectively business needs, including:

  • Classical many-to-many
  • Cascading many-to-many
  • Survey
  • Distinct Count
  • Multiple Groups
  • Cross-Time
  • Transition Matrix
  • Multiple Hierarchies

Although you do not have to do so, I recommend you to read the models in the order presented above, because often each one builds upon the previous models.

Each model has a brief introduction, followed by a business scenario that may benefit of its use and an explanation of its implementation. Each model uses only the minimal set of dimensions that are necessary to explain the concept behind it and a small dataset that demonstrates the underlying behavior.

Only the Distinct Count scenario contains a section discussing the impact on performance. Since the considerations presented there may be applied to other many-to-many relationship uses, I recommend you read this scenario if you are interested in performance evaluations.

An important warning has to be made if you are going to use VisualTotals MDX function (directly or through an OLAP browser): visual totals apply only to one level at a time with many-to-many dimensions. In the Links section, you will find a link to a document written by Richard Tkachuk that explains this limitation.

Download the complete paper and demo here.

Refresh of a linked dimension in SSAS

I started to evaluate the use of linked dimension across different database on the same server.

I encountered the first issue after a few minutes: when you link a dimension, VS editor create a dimension file that copies the structure of the dimension, without details on binding to data source (which are useless in this case). If I add an attribute to the original dimension, I am unable to use the linked dimension: the worst part is that I need to remove and to recreate the linked dimension on the project, losing all references to the dimension in all measure groups. At this point I need to recreate dimensions in cubes and dimension relationships for all measure groups.

It would really useful a "refresh" function that do automatically what I do to workaround the problem (I use the Customer dimension as an example):

  1. Preparation. Create the dimension Customer on database A (solution A)
  2. Deploy and process database A
  3. Create the linked dimension Customer on database B (that points to A.Customer) (solution B)
  4. Create a cube on B that use Customer
  5. Deploy and process database B
  6. Create a copy of solution B and name it database C (and solution C) - we will use it as a tool without deploying it to SSAS
  7. Change the original dimension. Add an attribute (for example Country) to the dimension Customer on database A (solution A)
  8. Deploy and process database A
  9. Make the change on linked dimension. Delete dimension Customer from database C
  10. Recreate linked dimension Customer on solution C that points to database A
  11. Open Customer.dim on solution C with the View Code function, Select All and Copy to Clipboard
  12. Open Customer.dim on solution B with the View Code function, Select All and Paste from Clipboard
  13. (as an alternative to 11 and 12, you can copy the Customer.dim file from solution C to solution B)
  14. Deploy and process database B

This workaround works if you only changed attributes that are not directly referred in the cube file: it happens when you use an attribute as a reference to another dimension or to a measure group. Most of the time dimension modifications does not touch those attributes and this workaround allows you to avoid the tedious and error-prone work of recreating dimension references to measure groups in the cube.

I would like to get feedback from people that already user linked dimension in a working (and changing!) environment.

Many-to-many dimensional modeling paper: looking for reviews

I'm approaching the review phase for a paper I wrote about advanced dimensional modeling with Analysis Services 2005 using many-to-many dimension relationships.

Currently the paper has around 80 page (with many figures!) and covers 2 base techniques (base and cascading many-to-many relationship) and 6 design patterns to model different business scenarios:

  • Survey
  • Distinct Count
  • Multiple Groups
  • Cross-Time
  • Transition Matrix
  • Multiple Hierarchies

I hope to publish the paper on SQLBI.EU in a few weeks. In the meantime I am looking for peer reviewers that already have some knowledge and experience of the use of many-to-many relationships in a non-traditional way. If you are interested, please contact me using the contact link in this page.

UPDATE: PDF and demo are now available here.

SQL Prompt 2.0

I have being a user of PromptSQL since version 1.0. This utility was acquired by Red-Gate some months ago and today they shipped the version 2.0, renamed ad SQL Prompt 2.0.

SQL Prompt is fundamentally the ability to add Intellisense when you are editing T-SQL code (queries, stored procedures and so on) inside SQL Server Management Studio (but also Query Analyzer if you still rely on SQL 2000 tools). It works also inside other editors, like my favorite one (UltraEdit).

I really appreciate the improvements made between versions. It's still not a perfect product, but until September 1st it's free (I was a registered user of version 1.0, but I don't regret it). I expect a 3.0 version in the future that could further add features and then become a commercial product. Anyway, it deserves a try, especially if think how many time you asked why the Ctrl-Space doesn't work with SQL code.

MDX quality improvement in Excel 2007

I just had to profile MDX queries sent to SSAS from some client. Office Web Components (that are used in the cube browser of Management Studio) in many circumstances send a lot of MDX queries, increasing roundtrip between client and server, resulting in bad performance in tables with hundreds of rows (crossjoin between 3 attributes on rows and one measure on column).

I was very worried, until I saw the clean and beautiful MDX query that Excel 2007 send to Analysis Services 2005 to get the same results: great performance, a hand-written MDX query has the same performance (WARNING: this is not a benchmark but a small test on a very limited set of queries).

This week the Beta 2 of Excel 2007 could help me to make some customer happy...

SSIS service does not start with SP1: KB filed it as a bug

Some days ago I posted my troubles with SSIS after SP1 install.

Today Microsoft has published KB918644 where this behavior is officially described as a bug. It seems that product feedback is working well...

SSAS Dimension process fails with unexpected error MDDiscretizer::GetBucketForValue

The use of DiscretizationMethod=Clusters can cause a process error that is not so clear:

Internal error: Unexpected error. File 'mddiscretizer.cpp', row 1532, function 'MDDiscretizer::GetBucketForValue'.

This happens because you have an attribute with few distinct values to do a discretization in required clusters. To workaround the problem you can disable discretization of the incriminated attribute (you could have only one in the dimension, otherwise some SELECT COUNT( DISTINCT fieldname ) FROM Dimension could help you in the search).

Even if it would be preferable to process the dimension with a smaller number of clusters without blocking errors, it would be necessary at least to have a better error message that indicate the problem in discretization.

The problem affects both SSAS RTM and SSAS SP1. I already filed a bug in MSDN Product Feedback Center, but blogging it could help other unlucky developers.

SqlBulkCopy bug with tablename containg dots

I'm trying to use SqlBulkCopy removing a C++ DLL I wrote two years ago to create staging tables in SQL Server from remote databases, dynamically creating queries to get data basing on table structure I defined into the staging database. This tool helped me to avoid repetitive work into DTS world and it used bulk insert feature of ODBC drivers (!) and of course I writed the tool in C# with a really little part of Managed C++ to wrap native API calls.

I said that I'm trying because SqlBulkCopy has two problems: the first one is that is a 25% slower than my hand-made tool, and in a 4-hour long run it would be a really dangerouse increase; the second one is a nasty bug: I have a lot of tables defined from other people that contains dots (.) into table name, and SqlBulkCopy has a nasty bug that prevent you to use similar names for destination table. I posted a bug into MSDN Feedback Center (Bug ID FDBK44111)and I hope the power of blog would help me to find people who will help me to raise the importance of this bug. I'd like to write less code and to rely on platform code :-)

UPDATE: bug fixed, now we have to wait the next SP of .NET Framework

Memory configuration (no pitfalls!) in SSAS

I recently had some problem with memory configuration of SQL Server Analysis Services 2005 and I'd like to share lessons I got. This post is a replacement for my previous post, that I striked to keep history of some bad assumptions I initially made.

SSAS have some setting that controls the way memory is managed by SSAS2005. Default values probably doesn't take care of concurrent applications and you could want to change them on a server running other services. Settings I'm talking about are defined as parameters of a single instance of SSAS: please note that MS suggest to not touch these settings...

Memory / LowMemoryLimit: it defines the value of physical memory that defines the point at which server is low on memory. Values greater than 100 are absolute values (scientific notation is supported in the config file, msmdsrv.ini). Values under 100 are calculated as a percentage of memory: reference value of memory (that corresponds to 100) is calculated as the minimum between the physical memory and the virtual memory available for a process. We have several cases (for semplicity I show the memory limit calculation for an hypotetic 100 value of the parameter): a 8Gb RAM 32bit Windows has a 2Gb limit (or 3Gb if you have /3Gb option enabled); a 8Gb RAM 64bit Windows has a 8Gb limit; a 1Gb RAM 32bit Windows has a 1Gb limit.

Default is 75 (75%) that is good most of the time, but if you have other processes on the same machine, you could prefer to lower this setting. If you want to know what the actual and running absolute limit is, you can get the exact running value reading the MSAS 2005 Memory / Memory Limit High KB performance counter.

Memory / TotalMemoryLimit: it's a percentage of physical memory that the server will use at maximum. Behaviour and setting considerations are analogous to LowMemoryLimi..

I had notice that in some cases PSS suggested to lower these settings to avoid an Out Of Memory error of SSAS 2005. After my previous wrong assumptions, Eric J of Microsoft helped me to understand what BOL still doesn't explain well. Eric wrote some other hints that we can share and index for future references.

Eric: You should assume they AS2000 and SSAS2005 are completely different products.  There are some similarities in settings but many differences.

Eric: Out of memory error could have been caused by many things, but basically memory could not be allocated from the OS.  Changing config values also impacts the point when the cleaner thread removes "shrinkable" memory.  Watch the perfmon counter "MSAS 2005:Memory\Cleaner Memory Shrinkable KB", and nonshrinkable.

Eric: To observe effects while you change LowMemoryLimit settings you can:

1. Open msmdsrv.ini in notepad, change value, and save file.
2. Open perfmon, report setting, view "MSAS 2005:Memory\Memory Limit High KB"
3. Observe the above listed value.

I already suggested to Eric that BOL could be improved on this topic (existing information here are not so clear).

Memory configuration pitfalls in SSAS

I recently discovered a few pitfalls on memory configuration of SQL Server Analysis Services 2005: the main problem is that the default configuration (written by setup) doesn't consider the actual physical resources (mainly RAM) of your server and this can cause some trouble. Settings I'm talking about are defined as parameters of a single instance of SSAS: please note that MS suggest to not touch these settings...

UPDATE: I completely changed this post due to incorrect information I wrote. Please read correct information here.

Memory / LowMemoryLimit: it's a percentage of physical memory that defines the point at which server is low on memory. Default is 75 (75%) that's is not good in two cases. First, if you have other processes on the same machine, you could prefer to lower this setting. Second, really important: if you have 8Gb of RAM and a 32-bit operating system, than the limit is defined by default at 6Gb, a measure that a 32-bit process can't reach.. so SSAS will continue to allocate memory until an out of memory is raised by the operating system. A big problem! So if you have a similar configuration, consider to lower this setting to 18 (around 1.4Gb) or a bit higher if you enable the /3Gb option. To find the right number try to calculate the percentage of RAM memory necessary to have 1.4Gb as result.

Memory / TotalMemoryLimit: it's a percentage of physical memory that the server will use at maximum. Problems are analogous to LowMemoryLimit: in a 32bit server with more than 2Gb of RAM you would definitely lower this number. In a 8Gb server I put the value of 20 (1.6Gb). To find the right number try to calculate the percentage of RAM memory necessary to have 1.4Gb as result.

I've been conservative in my calculations. A /3Gb option should allow higher numbers than mine. A 64bit server is immune of this kind of miscalculations...

UPDATE: I received this comment directly from Eric J. of Microsoft. Your recent post regarding AS2005 memory usage is incorrect.  Memory values are calculated as percentage of min( virtual, physical ), or if greater than 100.0 an absolute value is used (scientific notation is supported in the config file).  The defaults have been found to work well in a variety of scenarios, but there are cases where lower or higher values would work better.  AS2005 administrators may want to monitor the perfmon counters in MSAS2005 Memory.

I need to further investigate on it. More info coming soon, I hope.