November 2007 - Posts

Generate date dimension that support working days calculation

I just wrote a T-SQL query based on CTE that generates support information to calculate working days in a period. Simply look at the WorkingDaySequential measure.

In the real world:

  • The Holidays CTE could be replaced by a real tabl with user-inserted data.
  • The temporary [#Calendar] coule be a real table, providing necessary data access from DM

and probably much else.

The following code is provided "as is", without any warranty about its behavior.

 

SET STATISTICS IO ON
GO

SET DATEFIRST 7  -- Default is 7 (US / Sunday) 
GO

DECLARE @StartYear AS INT 
DECLARE @EndYear AS INT 

SET @StartYear = 2006 
SET @EndYear = 2015 

WITH    Holidays
          AS ( -- Italian Holidays (use year 1900 for recurrencies dates)
               SELECT   Date = CAST('19000101' AS DATETIME)
               UNION ALL
               SELECT   Date = CAST('19000106' AS DATETIME)
               UNION ALL
               SELECT   Date = CAST('19000425' AS DATETIME)
               UNION ALL
               SELECT   Date = CAST('19000501' AS DATETIME)
               UNION ALL
               SELECT   Date = CAST('19000602' AS DATETIME)
               UNION ALL
               SELECT   Date = CAST('19000815' AS DATETIME)
               UNION ALL
               SELECT   Date = CAST('19001101' AS DATETIME)
               UNION ALL
               SELECT   Date = CAST('19001208' AS DATETIME)
               UNION ALL
               SELECT   Date = CAST('19001225' AS DATETIME)
               UNION ALL
               SELECT   Date = CAST('19001226' AS DATETIME)
               UNION ALL
       -- Holidays changing date each year
               SELECT   Date = CAST('20060417' AS DATETIME)
               UNION ALL
               SELECT   Date = CAST('20070409' AS DATETIME)
             ) ,
        Years
          AS ( SELECT   YYYY = @StartYear
               UNION ALL
               SELECT   YYYY + 1
               FROM     Years
               Where    YYYY < @EndYear
             ) ,
        Months
          AS ( SELECT   MM = 1
               UNION ALL
               SELECT   MM + 1
               FROM     Months
               WHERE    MM < 12
             ) ,
        Days
          AS ( SELECT   DD = 1
               UNION ALL
               SELECT   DD + 1
               FROM     Days
               WHERE    DD < 31
             ) ,
        DatesRaw
          AS ( SELECT   YYYY = YYYY,
                        MM = MM,
                        DD = DD,
                        ID_Date = YYYY * 10000 + MM * 100 + DD,
                        DateString = CAST(YYYY * 10000 + MM * 100 + DD AS VARCHAR),
                        Date = CASE WHEN ISDATE(YYYY * 10000 + MM * 100 + DD) = 1
                                    THEN CAST(CAST(YYYY * 10000 + MM * 100 + DD AS VARCHAR) AS DATETIME)
                                    ELSE NULL
                               END
               FROM     Years
               CROSS JOIN Months
               CROSS JOIN Days
               WHERE    ISDATE(YYYY * 10000 + MM * 100 + DD) = 1
             )
    SELECT  DatesRaw.*,
            DayOfWeek = DATEPART(dw, DatesRaw.Date),
            CalendarDaySequential = CAST(DatesRaw.Date AS INT),
            WorkingDay = CAST(CASE DATEPART(dw, DatesRaw.Date)
                                WHEN 1 THEN 0 -- Sunday
                                WHEN 7 THEN 0 -- Saturday
                                ELSE 1
                              END AS BIT)
    INTO    #Calendar
    FROM    DatesRaw
    LEFT JOIN Holidays recurring
            ON recurring.Date = DATEADD(Year, 1900 - YEAR(DatesRaw.Date), DatesRaw.Date)
    LEFT JOIN Holidays fixed
            ON fixed.Date = DatesRaw.Date
GO

-----------------------------------------------------------------------------------------------------------

SELECT *, WorkingDaySequential = ( SELECT COUNT(WorkingDay) FROM #Calendar wd3 WHERE wd3.CalendarDaySequential <= wd1.CalendarDaySequential AND wd3.WorkingDay = 1 ) FROM #Calendar wd1 ORDER BY ID_Date GO
 



Cross-posted from SQLBlog! - http://www.sqlblog.com


SQL Server Virtualization

I just read the SQL Server Performance in a VMware Infrastructure 3 Environment performance study published by VMware (thanks to virtualization.info for the link).

Often I've been asked if it is a good move having a SQL Server in production in a virtual machine. As every consultant, my first answer is "it depends". And I really think this, it's not only a way to avoid the question!

If we look at an OLTP application, then my personal answer is that it could be a good move, especially if my hardware is so much larger than the one necessary to handle the daily workload. Numbers offered by VMware performance study are pretty good, considering that they are working on a reasonably hardware (not a monster one) that you can find in the real world. However, this paper simulates an OLTP scenario and not a Data Warehouse one. One of the signs for this assumption is the presence of a RAID 5 data storage, which could be a real bottleneck for ETL activities (and this is a good topic for another post in the future - I will not discuss about it here).

Until now, I always prefer working on a real hardware for both production AND especially development/test environments for a BI solution, at least for the back-end part (SQL/SSIS/SSAS). The reason is based on economics: a minimum VM for this kind of activities requires at least 4 cores, better if 8 cores (as a minimum), and at least 8 GB of RAM. Especially during execution of SSIS packages and process of SSAS cubes, all resources (I/O, memory and CPUs) are stressed at their limits. While a VM could be handled in a production environment just to balance resources between daily operations and night batches (but it would be like having a server that is turned off during the day - sometimes you can afford it, sometimes not), I'm not comfortable in using a VM like this for development. The reason is that the workload affects all concurrent VMs and if you virtualize a server, it is supposed that you keep other servers on the same machine.

At the end, until now I've found more affordable having dedicated servers for development instead of using virtualized ones. I know that this situation could change in the future. Virtualization technology is fast evolving and (more important) hardware cost for multiple cores and large-RAM-provided servers will be so much cheaper that the "entry-level" server in a data center will be too powerful to be used at the maximum of their capacity many data warehouse installations.

What is your experience here? Do you have a development environment for your data warehouse completely virtualized or not? What do you think about it? I'd like to get feedback on this.




Cross-posted from SQLBlog! - http://www.sqlblog.com