Is there an easy way to create and populate an age dimension with age groups?
Solution
Creating an age dimension is usually done once and probably not in SSIS, but with a TSQL script.
For each new assignment I use a script similar to this and adjust it to the requirements for that particular assignment.
-- Drop dimension table if exists
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[dim_age]') AND TYPE IN (N'U'))
BEGIN
 DROP TABLE [dbo].[dim_age]
END
-- Create table dim_age
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[dim_age](
  [dim_age_id] [int] IDENTITY(-1,1) NOT NULL,
  [Age] [smallint] NULL,
  [AgeGroup1] [nvarchar](50) NULL,
  [AgeGroup1Sort] [int] NULL,
  [AgeGroup2] [nvarchar](50) NULL,
  [AgeGroup2Sort] [int] NULL,
 CONSTRAINT [PK_dim_age] PRIMARY KEY CLUSTERED 
(
 [dim_age_id] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
-- Enter unknown dimension value (in case a person's date of birth is unknown)
INSERT INTO [dbo].[dim_age]
           ([Age]
           ,[AgeGroup1]
     ,[AgeGroup1Sort]
           ,[AgeGroup2]
     ,[AgeGroup2Sort])
     VALUES
           (-1
           ,'Unknown'
     ,0
           ,'Unknown'
     ,0)
GO
-- Enter all ages
declare @age smallint;
set @age = 0;
-- Loop through ages 0 to 130
WHILE @age < 131
BEGIN
 INSERT INTO [dbo].[dim_age]
  ([Age]
  ,[AgeGroup1]
  ,[AgeGroup1Sort]
  ,[AgeGroup2]
  ,[AgeGroup2Sort])
 VALUES
  (@age
  -- Use the common age groups/categories of your region/branch/industry
  -- This is just an example
  , CASE
    WHEN @age < 15 THEN '0 till 15 year'
    WHEN @age < 25 THEN '15 till 25 year'
    WHEN @age < 35 THEN '25 till 35 year'
    WHEN @age < 45 THEN '35 till 45 year'
    WHEN @age < 55 THEN '45 till 55 year'
    WHEN @age < 65 THEN '55 till 65 year'
    ELSE '65 year and older'
   END
  -- Add value to sort on in SSAS
  , CASE
    WHEN @age < 15 THEN 1
    WHEN @age < 25 THEN 2
    WHEN @age < 35 THEN 3
    WHEN @age < 45 THEN 4
    WHEN @age < 55 THEN 5
    WHEN @age < 65 THEN 6
    ELSE 7
   END
  , CASE
    WHEN @age < 19 THEN 'Juvenile'
    ELSE 'Mature'
   END
  -- Add value to sort on in SSAS
  , CASE
    WHEN @age < 19 THEN 1
    ELSE 2
   END
  )
 -- Goto next age
 set @age = @age + 1
END
![]()  | 
| The result: filled age dimension | 
How could you use this dimension?
A while ago I also posted an example to create and populate a date dimension. So now you can combine those in a datamart. I have an employee table and an absence table with a start- and enddate.
![]()  | 
| Employee table | 
![]()  | 
| Absence table | 
I will use the date dimension to split the absence time periods in separate days and then calculate the employee's age of each day of absence. This will go in to a fact table and then I can use the age dimension to see absence per age group.
-- Split absence time periode in separate days, but go back 2 years max and 1 year forward if end date is unknown
SELECT  Absence.AbsenceId
,   Absence.EmployeeNumber
--   Date of absence
,   dim_date.Date as AbsenceDate
,   Absence.ReasonCode
--   Calculation of age at time of absence
,   DATEDIFF(YEAR, Employee.DateOfBirth, dim_date.Date)
   -
   (CASE
    WHEN DATEADD(YY, DATEDIFF(YEAR, Employee.DateOfBirth, dim_date.Date), Employee.DateOfBirth)
     >  dim_date.Date THEN 1
    ELSE 0
   END) as Age
FROM  EmployeeApplication.dbo.Absence
INNER JOIN EmployeeApplication.dbo.Employee
   on Absence.EmployeeNumber = Employee.EmployeeNumber
INNER JOIN  DM_Staff.dbo.dim_date
   on dim_date.Date
   -- change start date to lower bound if it's below it
            BETWEEN CASE WHEN YEAR(Absence.AbsenceStartDate) >= YEAR(GETDATE()) - 2 THEN Absence.AbsenceStartDate
            ELSE DATEADD(yy, DATEDIFF(yy, 0, getdate()) - 2, 0) END
   -- change end date to upper bound if it's null
            AND ISNULL(Absence.AbsenceEndDate, DATEADD(yy, DATEDIFF(yy, 0, getdate()) + 2, -1))
--   Filter absence record with an enddate below the lower bound (perhaps a bit superfluous with the inner join)
WHERE  YEAR(ISNULL(Absence.AbsenceEndDate, GETDATE())) >= YEAR(GETDATE()) - 2
![]()  | 
| Result of query that can be used in a fact package | 
![]()  | 
| fact absence | 
Note: this is a simplified situation to keep things easy to explain.





No comments:
Post a Comment
Please use the SSIS MSDN forum for general SSIS questions that are not about this post. I'm a regular reader of that forum and will gladly answer those questions over there.
All comments are moderated manually to prevent spam.