Is there an easy way to create and populate an age dimension with age groups?
Solution
Creating an age dimension is usually done once and probably not in SSIS, but with a TSQL script.
For each new assignment I use a script similar to this and adjust it to the requirements for that particular assignment.
-- Drop dimension table if exists IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[dim_age]') AND TYPE IN (N'U')) BEGIN DROP TABLE [dbo].[dim_age] END -- Create table dim_age SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[dim_age]( [dim_age_id] [int] IDENTITY(-1,1) NOT NULL, [Age] [smallint] NULL, [AgeGroup1] [nvarchar](50) NULL, [AgeGroup1Sort] [int] NULL, [AgeGroup2] [nvarchar](50) NULL, [AgeGroup2Sort] [int] NULL, CONSTRAINT [PK_dim_age] PRIMARY KEY CLUSTERED ( [dim_age_id] ASC ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO -- Enter unknown dimension value (in case a person's date of birth is unknown) INSERT INTO [dbo].[dim_age] ([Age] ,[AgeGroup1] ,[AgeGroup1Sort] ,[AgeGroup2] ,[AgeGroup2Sort]) VALUES (-1 ,'Unknown' ,0 ,'Unknown' ,0) GO -- Enter all ages declare @age smallint; set @age = 0; -- Loop through ages 0 to 130 WHILE @age < 131 BEGIN INSERT INTO [dbo].[dim_age] ([Age] ,[AgeGroup1] ,[AgeGroup1Sort] ,[AgeGroup2] ,[AgeGroup2Sort]) VALUES (@age -- Use the common age groups/categories of your region/branch/industry -- This is just an example , CASE WHEN @age < 15 THEN '0 till 15 year' WHEN @age < 25 THEN '15 till 25 year' WHEN @age < 35 THEN '25 till 35 year' WHEN @age < 45 THEN '35 till 45 year' WHEN @age < 55 THEN '45 till 55 year' WHEN @age < 65 THEN '55 till 65 year' ELSE '65 year and older' END -- Add value to sort on in SSAS , CASE WHEN @age < 15 THEN 1 WHEN @age < 25 THEN 2 WHEN @age < 35 THEN 3 WHEN @age < 45 THEN 4 WHEN @age < 55 THEN 5 WHEN @age < 65 THEN 6 ELSE 7 END , CASE WHEN @age < 19 THEN 'Juvenile' ELSE 'Mature' END -- Add value to sort on in SSAS , CASE WHEN @age < 19 THEN 1 ELSE 2 END ) -- Goto next age set @age = @age + 1 END
The result: filled age dimension |
How could you use this dimension?
A while ago I also posted an example to create and populate a date dimension. So now you can combine those in a datamart. I have an employee table and an absence table with a start- and enddate.
Employee table |
Absence table |
I will use the date dimension to split the absence time periods in separate days and then calculate the employee's age of each day of absence. This will go in to a fact table and then I can use the age dimension to see absence per age group.
-- Split absence time periode in separate days, but go back 2 years max and 1 year forward if end date is unknown SELECT Absence.AbsenceId , Absence.EmployeeNumber -- Date of absence , dim_date.Date as AbsenceDate , Absence.ReasonCode -- Calculation of age at time of absence , DATEDIFF(YEAR, Employee.DateOfBirth, dim_date.Date) - (CASE WHEN DATEADD(YY, DATEDIFF(YEAR, Employee.DateOfBirth, dim_date.Date), Employee.DateOfBirth) > dim_date.Date THEN 1 ELSE 0 END) as Age FROM EmployeeApplication.dbo.Absence INNER JOIN EmployeeApplication.dbo.Employee on Absence.EmployeeNumber = Employee.EmployeeNumber INNER JOIN DM_Staff.dbo.dim_date on dim_date.Date -- change start date to lower bound if it's below it BETWEEN CASE WHEN YEAR(Absence.AbsenceStartDate) >= YEAR(GETDATE()) - 2 THEN Absence.AbsenceStartDate ELSE DATEADD(yy, DATEDIFF(yy, 0, getdate()) - 2, 0) END -- change end date to upper bound if it's null AND ISNULL(Absence.AbsenceEndDate, DATEADD(yy, DATEDIFF(yy, 0, getdate()) + 2, -1)) -- Filter absence record with an enddate below the lower bound (perhaps a bit superfluous with the inner join) WHERE YEAR(ISNULL(Absence.AbsenceEndDate, GETDATE())) >= YEAR(GETDATE()) - 2
Result of query that can be used in a fact package |
fact absence |
Note: this is a simplified situation to keep things easy to explain.