Saturday, 24 May 2014

Create and fill Age dimension

Case
Is there an easy way to create and populate an age dimension with age groups?

Solution
Creating an age dimension is usually done once and probably not in SSIS, but with a TSQL script.
For each new assignment I use a script similar to this and adjust it to the requirements for that particular assignment.

-- Drop dimension table if exists
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[dim_age]') AND TYPE IN (N'U'))
BEGIN
 DROP TABLE [dbo].[dim_age]
END

-- Create table dim_age
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[dim_age](
  [dim_age_id] [int] IDENTITY(-1,1) NOT NULL,
  [Age] [smallint] NULL,
  [AgeGroup1] [nvarchar](50) NULL,
  [AgeGroup1Sort] [int] NULL,
  [AgeGroup2] [nvarchar](50) NULL,
  [AgeGroup2Sort] [int] NULL,
 CONSTRAINT [PK_dim_age] PRIMARY KEY CLUSTERED 
(
 [dim_age_id] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

-- Enter unknown dimension value (in case a person's date of birth is unknown)
INSERT INTO [dbo].[dim_age]
           ([Age]
           ,[AgeGroup1]
     ,[AgeGroup1Sort]
           ,[AgeGroup2]
     ,[AgeGroup2Sort])
     VALUES
           (-1
           ,'Unknown'
     ,0
           ,'Unknown'
     ,0)
GO

-- Enter all ages
declare @age smallint;
set @age = 0;

-- Loop through ages 0 to 130
WHILE @age < 131
BEGIN
 INSERT INTO [dbo].[dim_age]
  ([Age]
  ,[AgeGroup1]
  ,[AgeGroup1Sort]
  ,[AgeGroup2]
  ,[AgeGroup2Sort])
 VALUES
  (@age
  -- Use the common age groups/categories of your region/branch/industry
  -- This is just an example
  , CASE
    WHEN @age < 15 THEN '0 till 15 year'
    WHEN @age < 25 THEN '15 till 25 year'
    WHEN @age < 35 THEN '25 till 35 year'
    WHEN @age < 45 THEN '35 till 45 year'
    WHEN @age < 55 THEN '45 till 55 year'
    WHEN @age < 65 THEN '55 till 65 year'
    ELSE '65 year and older'
   END
  -- Add value to sort on in SSAS
  , CASE
    WHEN @age < 15 THEN 1
    WHEN @age < 25 THEN 2
    WHEN @age < 35 THEN 3
    WHEN @age < 45 THEN 4
    WHEN @age < 55 THEN 5
    WHEN @age < 65 THEN 6
    ELSE 7
   END
  , CASE
    WHEN @age < 19 THEN 'Juvenile'
    ELSE 'Mature'
   END
  -- Add value to sort on in SSAS
  , CASE
    WHEN @age < 19 THEN 1
    ELSE 2
   END
  )

 -- Goto next age
 set @age = @age + 1
END

The result: filled age dimension
























How could you use this dimension?
A while ago I also posted an example to create and populate a date dimension. So now you can combine those in a datamart. I have an employee table and an absence table with a start- and enddate.
Employee table

Absence table



















I will use the date dimension to split the absence time periods in separate days and then calculate the employee's age of each day of absence. This will go in to a fact table and then I can use the age dimension to see absence per age group.
-- Split absence time periode in separate days, but go back 2 years max and 1 year forward if end date is unknown
SELECT  Absence.AbsenceId
,   Absence.EmployeeNumber
--   Date of absence
,   dim_date.Date as AbsenceDate
,   Absence.ReasonCode
--   Calculation of age at time of absence
,   DATEDIFF(YEAR, Employee.DateOfBirth, dim_date.Date)
   -
   (CASE
    WHEN DATEADD(YY, DATEDIFF(YEAR, Employee.DateOfBirth, dim_date.Date), Employee.DateOfBirth)
     >  dim_date.Date THEN 1
    ELSE 0
   END) as Age
FROM  EmployeeApplication.dbo.Absence
INNER JOIN EmployeeApplication.dbo.Employee
   on Absence.EmployeeNumber = Employee.EmployeeNumber
INNER JOIN  DM_Staff.dbo.dim_date
   on dim_date.Date
   -- change start date to lower bound if it's below it
            BETWEEN CASE WHEN YEAR(Absence.AbsenceStartDate) >= YEAR(GETDATE()) - 2 THEN Absence.AbsenceStartDate
            ELSE DATEADD(yy, DATEDIFF(yy, 0, getdate()) - 2, 0) END
   -- change end date to upper bound if it's null
            AND ISNULL(Absence.AbsenceEndDate, DATEADD(yy, DATEDIFF(yy, 0, getdate()) + 2, -1))
--   Filter absence record with an enddate below the lower bound (perhaps a bit superfluous with the inner join)
WHERE  YEAR(ISNULL(Absence.AbsenceEndDate, GETDATE())) >= YEAR(GETDATE()) - 2


Result of query that can be used in a fact package





















fact absence


























Note: this is a simplified situation to keep things easy to explain.

No comments:

Post a Comment

Please use the SSIS MSDN forum for general SSIS questions that are not about this post. I'm a regular reader of that forum and will gladly answer those questions over there.

All comments are moderated manually to prevent spam.