Rare is the OLAP cube that does not include a so-called Time dimension –
a dimension that allows data to be analyzed over time. Though often named simply
“Time”, the typical Time dimension is actually a date dimension,
with a hierarchy consisting of Year-Quarter-Month-Day or the like.
Despite the unique ever-presence of Time, this is the one dimension that developers
who are new to SQL Server Analysis Services (a.k.a Microsoft OLAP) often get
wrong. This article will explain the correct way to implement Time dimensions
in Analysis Services, and the benefits thereof.
Readers are expected to be familiar with Analysis Services, fact tables, dimension
tables, and star schemas.
The Typical Mistake
Consider the following star schema. This is the basis for a
cube that will contain three dimensions: Time, Product, and Customer.
Figure 1 – Star schema lacking a time dimension table
This is a typical example, albeit a simplified one, of a schema implemented
by a novice OLAP developer. Notice there is a dimension table for Product and
a dimension table for Customer, but not for Date; data is being derived from
the fact table. There are many drawbacks to this design, as explained below.
The Pitfalls
The lack of a Time dimension table leads to several pitfalls, some of which
may not be evident until after the database goes into production:
No Control of Dimension Membership
When the Time dimension is derived from the fact table, fact instances determine
the membership of the dimension. When facts determine dimension membership,
you can wind up with unexpected (and undesired) gaps in the dimension.
Suppose your database is used to analyze Web site usage. Further suppose your
Web site goes down for a day, hence there are no entries in the fact table for
this date. The missing day will not exist in the OLAP dimension. This day of
non-activity may be important to users of the cube, something you want the ability
to view or report. Only with a dimension table do you have control over dimension
membership.
Slower to Process
When the Time dimension is derived from the fact table there are extra steps
required as part of the regular cube processing cycle. It becomes necessary
to perform an incremental update on the Time dimension, as well as the cube.
This takes extra processing time, perhaps significant in the case of large fact
tables.
Limited Control of Member Names
When the Time dimension is derived as a date-time column in the fact table,
Analysis Services derives the member names, over which you have limited control.
Take quarters for instance. Here are some common names you may wish to assign
to a quarter:
- Q1
- Qtr1
- 2004-Q1|
- FY2004-Q1 (where FY is short for “fiscal year”)
By default, Analysis Services will assign the name Qtr1. While it is possible
to modify the expression that defines the name, it takes complex gyrations to
generate the last two examples.
Unable to Implement Non-Standard Rollups
When Analysis Services derives a time hierarchy (e.g., Year-Quarter-Month)
from a date-time column, the developer has no control of the rollups. For example,
January 1 will always rollup to January. At first glance, this may seem good,
but there are many situations where flexibility is a requirement. None of the
following examples of non-standard rollups are possible when the time hierarchy
is derived from a date-time column:
- Financial applications often divide a year into 13 four-week accounting
periods, as opposed to 12 months.
- There are businesses and even entire industries (e.g., the advertising industry)
where a month begins on a date other than the first for reporting and/or accounting
purposes. For example, when the month is defined as beginning on the 7th,
February 1-6 must rollup to January.
- Suppose the time hierarchy must include a Weeks
level. If derived from a date-time column, you have no control of what day
should be considered the start of week.
Unable to Assign Member Properties
Suppose you need to analyze Web site activity that occurs on holidays. Naturally,
this requires that the database be aware of which dates are holidays. This requires
a Holiday Y/N attribute assigned to dates, which is
not practical without a dimension table.
OLAP member properties are derived from attributes in the dimension table.
It is possible to assign attributes to dates only with a dimension table, which
in turn enables special time-based reporting and analysis. Even if this is not
an immediate requirement, it is critical that your database be designed to support
this capability in the future should the requirement emerge.
Dimension Can’t Be Shared
When a time dimension is derived directly from the fact table, the dimension
cannot be shared with cubes that are derived from different fact tables. As
a result, you will wind up with multiple time dimensions, each requiring care
and feeding. It is practical to implement a shared Time dimension only with
a dimension table.
To drive home the value of shared dimensions, be aware that a shared dimension
can have different granularity in different cubes. This permits, for example,
a shared Time dimension with a Year-Quarter-Month-Day hierarchy to have Day
granularity in Cube A, and Month granularity in Cube B.
Time Done Right
Here’s a revised star schema that includes a Time dimension table. Notice
that time is treated no different from the other dimensions. Like Product and
Customer, a foreign key in the fact table joins to the Time dimension table.
Figure 2 – Star schema with Time dimension table
Typically, as with other dimensions, an integer is used as the primary key/foreign
key for the dates. This is not required; you can use a date-time column instead.
An integer will result in a leaner fact table though. However, it is important
to note that in order to achieve flexibility of member names, and flexibility
of rollups, that the date-time column not be used as the basis for the
dimension. That’s the reason for the various Name and Key columns.
There is one important caveat: Contrary to common practice, it is my belief
that denormalized designs, like TimeDim in Figure 2, are to be avoided. (Common
practice does not necessarily equate to best practice after all.) In order to
really do time right, this table should be snowflaked into separate year, month,
quarter, and day tables.
What’s Missing?
Here are some additional columns that might be applicable in your time dimension
table:
- Fiscal period names and keys -- FYYearName, FYYearKey, FYQtrName, FYQtrKey,
etc.
- DayOfWeek (e.g., Monday)
- Attributes such as IsHoliday
Building the OLAP Dimension
To fully realize the benefits of a Time dimension table, you must build the
dimension correctly within Analysis Manager. Take note, because this requires
some counter-intuitive inputs.
When using the dimension wizard, once you have specified the dimension table,
you are prompted to specify if you are building a Time dimension. (This prompt
only occurs when there is a date-time column in the dimension table, hence you
may not get the prompt, in which case this is a non-issue.) Do not check
Time Dimension; do check Standard
Dimension. This in turn displays the same user interface that is used
to create all other dimensions, where the levels are specified one by one, vs.
the one-click interface used to define (less flexible) time dimensions. This
is where you gain the benefit of specifying the member names, and gain the benefit
of non-standard rollups, per the hierarchies defined in the table.
An “Official” Time Dimension
This begs the question “What is special about time?” The answer
is, "almost nothing." Analysis Services possesses no special awareness
of time.
But, you might ask, what about the myriad date-time functions contained in
the MDX language (the query/formula language used in Analysis Services), such
as LastPeriods, ParallelPeriod,
and the like? Guess what, these functions can be used with any dimension, not
just time, illogical though it may be to do so. This raises the one trivial
area where there is significance to an “official” Time dimension:
With an official Time dimension, certain arguments for certain MDX date-time
functions become optional arguments.
In addition, it is conceivable that a given Analysis Services front-end will
treat an official Time dimension differently than a non-Time dimension. Thus,
to anoint your hand-rolled Time dimension as an official Time dimension, perform
these inputs in the Analysis Services dimension editor, bearing in mind the
benefits are superficial:
- Select the dimension object and change the Type
property to Time.
- Select each level one by one, and set the Type property
accordingly (Year, Month, etc.).
Populating the Dimension
With a Time dimension table in place, it is incumbent on the OLAP developer
to populate it. If historical data is being added to the cube, you will backfill
the dimension with historical dates. Going forward, it is typical to add a year’s
worth of new dates in advance of the new year, rather than adding new dates
as part of a daily or weekly load routine.
In the case of planning applications (e.g., forecasting), a Time dimension
table makes it possible to create an OLAP planning model in advance of fact
instances. That is, you can populate the Time dimension with future dates. This
creates an empty cell space in the cube where formulas can be used to generate
a plan, similar conceptually to the way that spreadsheets are used.
What’s the Key?
It is common that the keys for the Time dimension table are sequential numbers
starting with 1. This is inane for two reasons:
- This precludes, or at least renders difficult, the task of backloading historical
data. Even if backloading is not part of the original requirement, it could
emerge as a future requirement.
- The benefits of logically formed keys are not obviated just because this
is a data warehouse. Assuming day granularity, it is recommended that the
key be a big integer in the form of YYYYMMDD.
Include the All Member?
By default, all dimensions have a system-generated level that contains a system
generated All member — a member that represents
the grand total for the dimension. You may want to exclude the All
level/member from your Time dimension. That’s because a grand total for
time may represent an arbitrary range of dates; it may be a number you do not
wish to report. To exclude All, select the dimension
object in the dimension editor and set the All Level
property to False.
It’s a good idea to make this decision up-front, prior to the rollout
of the database. When you take away the All member,
it is possible MDX formulas will have to be revised.
However, if you do exclude the All level/member, be
aware that any query that doesn’t explicitly reference the Time dimension
will default to the first member at the top level (e.g., the oldest year), whereas
it is probably desirable to default to the last member (e.g., the most recent
year). This behavior can be overridden in the dimension editor. With the dimension
object selected, change the Default Member property.
(It’s possible to use a dynamic MDX expression so that yearly maintenance
is not required, a topic outside the scope of this article.)
Conclusion
To summarize the main recommendations:
- Always use a dimension table for time; do not derive the dimension
from the fact table.
- Do not derive the dimension from a date/time column.
- Use a meaningful key for dates, e.g., YYYYMMDD. This makes it easier to
backload historical data.
- Decide in advance if the All member is meaningful;
remove it, if not.
Tom is a software developer, database consultant, author and educator. He has
been focused in the field that has come to be known as Business Intelligence
since 1990. Tom has created software products that were published by Microsoft
(Open EIS Builder) and Hyperion, formerly Arbor Software (Open Budget Pak, EssReporter).
He can be reached via his Web site: www.tomchester.net.