April 2006 - Posts

SSAS Dimension process fails with unexpected error MDDiscretizer::GetBucketForValue

The use of DiscretizationMethod=Clusters can cause a process error that is not so clear:

Internal error: Unexpected error. File 'mddiscretizer.cpp', row 1532, function 'MDDiscretizer::GetBucketForValue'.

This happens because you have an attribute with few distinct values to do a discretization in required clusters. To workaround the problem you can disable discretization of the incriminated attribute (you could have only one in the dimension, otherwise some SELECT COUNT( DISTINCT fieldname ) FROM Dimension could help you in the search).

Even if it would be preferable to process the dimension with a smaller number of clusters without blocking errors, it would be necessary at least to have a better error message that indicate the problem in discretization.

The problem affects both SSAS RTM and SSAS SP1. I already filed a bug in MSDN Product Feedback Center, but blogging it could help other unlucky developers.

SSIS service does not start with SP1: workaround

SQL Server 2005 SP1 seems to stop many working installations of SSIS service. I lost some hour on this problem and only a few days after I discovered a good workaround here.

While you install SQL Server 2005, setup procedure brings you to use NETWORK SERVICE as account for SSIS service. SP1 verify signatures of the files getting a file from internet: it doesn't work if the server has no internet connectivity and it doesn't work if the user is NETWORK SERVICE.

Possible workarounds:

  • Change user account for SSIS Service to a regular (least privilege) user
  • Disable checking certificate revocation list for NETWORK SERVICE user

The registry hack is this:

Windows Registry Editor Version 5.00

[HKEY_USERS\S-1-5-20\Software\Microsoft\Windows\CurrentVersion\WinTrust\Trust Providers\Software Publishing]
"State"=dword:00023e00

Now, the big question is: what of those two techniques has to be considered safer? I would say that changing registry for NETWORK SERVICE account doesn't change much and should have less side effects than changing user account on a production machine, but sincerely I don't have a consolidate answer now, so comments are welcome.

Improve NTILE performance

Today I implemented a classification through a NTILE equivalent function. The T-SQL NTILE implementation has very bad perfomance when used against large datasets of rows, so I and Davide Mauri realized a better implementation that Davide posted some weeks ago on his blog. No reaction from Microsoft guys and no comments on Davide blog, so I retry here to catch more attention on this topic.

Instead to use NTILE function you can use this faster query:

SELECT 
   CustomerKey,
   CEILING(ROW_NUMBER() OVER (ORDER BY YearlyIncome ASC) / ((SELECT CAST(COUNT(*) + 1 AS FLOAT) FROM DimCustomer) / @n) ) AS NTileFast
FROM 
   
DimCustomer

Note that I prefer to use the RANK function instead than the ROW_NUMBER: even if I could get non-uniform distribution of data among clusters, I get all elements with the same value in the same cluster and in many scenario you can prefer this technique to avoid cases with random cluster assignment for items with value equal to a cluster limit. This is different from NTILE result, but I prefer it for my cluster assignment.

SELECT 
   CustomerKey,
   CEILING(RANK() OVER (ORDER BY YearlyIncome ASC) / ((SELECT CAST(COUNT(*) + 1 AS FLOAT) FROM DimCustomer) / @n) ) AS NTileFast
FROM 
   
DimCustomer

Please read this post of Davide Mauri for a complete explanation and let us know what do you think about it

 

SqlBulkCopy bug with tablename containg dots

I'm trying to use SqlBulkCopy removing a C++ DLL I wrote two years ago to create staging tables in SQL Server from remote databases, dynamically creating queries to get data basing on table structure I defined into the staging database. This tool helped me to avoid repetitive work into DTS world and it used bulk insert feature of ODBC drivers (!) and of course I writed the tool in C# with a really little part of Managed C++ to wrap native API calls.

I said that I'm trying because SqlBulkCopy has two problems: the first one is that is a 25% slower than my hand-made tool, and in a 4-hour long run it would be a really dangerouse increase; the second one is a nasty bug: I have a lot of tables defined from other people that contains dots (.) into table name, and SqlBulkCopy has a nasty bug that prevent you to use similar names for destination table. I posted a bug into MSDN Feedback Center (Bug ID FDBK44111)and I hope the power of blog would help me to find people who will help me to raise the importance of this bug. I'd like to write less code and to rely on platform code :-)

UPDATE: bug fixed, now we have to wait the next SP of .NET Framework

Microsoft acquires ProClarity

The news is that Microsoft announced it has agreed to acquire ProClarity.

Now a decent client for Analysis Services 2005 will be available from Microsoft, and I hope that a new commercial licensing scheme will allow to small companies to get a mixed combination of (many) Excel and (few) ProClarity desktop clients. I often have customers who have no more than 2/3 users that could really leverages on ProClarity features, while other can use Excel or predefined (Reporting Services) reports to see daily updated data.

But I think the real reason for the acquisition is to get a really strong and consistent portfolio of BI products to play very hard into the growing BI market.