October 2006 - Posts

Unicode, VARCHAR, NVARCHAR and index usage in SQL Server

This is not the first time I found non-intuitive issues with collation, Unicode and 8-bit strings with SQL Server, but this time I discovered (more precisely: one of my customers discovered and asked me...) a strange behavior that affects the query plan and the overall query optimization strategy.

This was my original post to Microsoft Connect:

A SQL query with a WHERE condition does not use the existing index if the comparison is made against a Unicode constant string. It happens when the field has a collation setting different than the current one (i.e. field has SQL_Latin1_General_CP1_CI_AS where server default is now Latin1_General_CI_AS).
Query plan use Clustered Index Scan instead of Clustered Index Seek operation.
Performance become very slow when the query is made many times into a loop (not a good practice, but the program can't be changed).
Using ANSI string (8 bit) the problem disappear and query plan shows the use of a Clustered Index Seek Operation

The answer is detailed:

This is not about the collation but about the difference between 8-bit and 16-bit character set.
In fact if you use the default collation (remove "COLLATE SQL_Latin1_General_CP1_CI_AS" from the column definition) then the plans do not change.
There is no generalized valid way to down-convert a 16-bit string to an 8-bit string but an 8-bit string is required to perform a seek because that is the type of the index.
The scan plan works because the 8-bit strings in the table are upconveted to 16-bit and compared to the 16-bit constant.
If you use nvarchar in the table then both forms of the predicate will support index seek (since the 8-bit constant in the predicate can be up-converted to 16-bit).
Thanks,
SQL Server dev.

I well understand all of this, and apparently the issue doesn't affect many of us. However, think about it: when you build a Data Warehouse, for a lot of reasons you could have to manage a VARCHAR field instead of a NVARCHAR one. If they are on the staging database, you have to be careful in comparing them with constants (it appears to be simple). If you decide to build a dimension using VARCHAR types instead of NVARCHAR, the user who access the database could use a query tool which use Unicode constants for user input. Hum... very dangerous, angry users could blame on you for this.

Now, guess what? SQL Server Reporting Services 2005 build parametric queries using sp_executesql statement, putting all constants with a N (for Unicode) prefix. Can you hear the user screaming now?

If I wasn't clear, you can read the feedback here: a repro script is included.

Now, while I can find a workaround here (probably a stored procedure can convert the Unicode constant into the corresponding 8-bit version), I am asking if this only issue should encourage you to move dimension fields to NVARCHAR type. I know all implications and I know that dimensions size

SSAS suggestions for Katmai: perspectives and drillthrough

I just added two suggestion items to my feedback for SSAS2005.

The first suggestion is about the perspectives: it would be useful to select member filters and default members on each single perspective. Another important thing could be to be able to rename a dimension/attribute/measure into a perspectve (then you need to add perspective to translation system, so it would be good to translate only the changed descriptions).

The second suggestion is about drillthrough: today you can drillthrough only the real measures. It would be better to drillthrough also the calculated measures: to do that, it would be useful an MDX expression that defines a subcube (of "real" cells and not calculated ones) to be used for drillthrough in place of the original selection/cell.

If you agree, vote and give feedback to these suggestions!

SQL Refactor in public beta

I am trying the beta of SQL Refactor (available here). It is really helpful to refactor the layout of a query and I like to do this in at least three scenarios: when I read a query written by other people, when I need to modify a query in a Reporting Services report and edited in a the graphical query design mode, when I want to clean a query written too fast for a test that becomes a stable and reusable query.

I would like to get some more functionality in the layout options, I already posted the most important one and I hope they will implement it in the RTM or at least in a future release.

Using MSOLAP as a linked server

I just faced the problem to connect an Analysis Services (2000 or 2005) to a SQL Server 2005 database using a OPENROWSET statement. If you face an "access denied" error (precisely, "The OLE DB provider "MSOLAP" for linked server "SERVERNAME" reported an error. Access denied.") the solution is to enable the "Allow inprocess" flag in MSOLAP Provider Options.

You can modify provider option through SQL Server Management Studio, Server Objects / Linked Servers / Providers / MSOLAP, right-click and edit Properties.

When you get a result for an MDX query, the resulting column names are composed by dimension+levels+property name, so if you would like a cleaner naming you can use the WITH statement to define column names in a fast way. E.G.

WITH A (Customer, Product, Year, Sales )
SELECT * FROM OPENROWSET( 'MSOLAP', 'DATASOURCE=localhost; Initial Catalog=FoodMart 2000;',
'...[place here your MDX query]...' )

DSV and Views: how to decouple the physical tables from UDM

I just answered to a newsgroup question about the use of views as fact tables to populate cubes.

I am a strong supporter of the need to build a star schema before to build a UDM, while I understand that in some circumstances (few data, fast prototypes) you can think to build a similar model through views on the source database, without an ETL in the middle.

That said, when you have a regular and well defined star schema, you still may have doubts about the use of views (in the database) and named queries (in the DSV) to decouple the UDM model from physical tables. The reason for doing that is to simplify the maintainance of the solution. After years of experience (AS2000 had similar scenario) and tests in many combinations of the factors, I have these guidelines.

  1. Unless you have only a few Mb of database, it is ALWAYS BETTER to create your data mart to feed dimensions and cubes
  2. When you present dimension and fact tables to SSAS, it is better to create a VIEW on the database that is easy to maintain if you have to implement some logic on the query to present data to SSAS (for example, a join to decode some fields - it should not be the standard case, but it happens in the lifetime of a project when you need to add some attribute and temporarily want to show data that are still not implemented well in the denormalized star schema) - the ideal case is that all these views are SELECT * FROM table, because you only decouple the physical fact table from the logical view for SSAS
  3. Use named query on DSV only to apply some cosmetic change to data just to correctly populate dimensions and cubes. For example, a lastname + ' ' + firstname expression is right to use here, while I can't see many reasons to use a WHERE or a JOIN, because it imply some transformation logic that is better to place into the VIEW (that could be used by  other SQL queries, for example for Reporting Services).

What is your opinion? Feedback and comments would be very appreciated.