Sunday, 27 November 2011

Long Validation of SSIS Packages in Visual Studio

Case
Opening my package in BIDS lasts for an eternity because the package validation takes forever. Is there a way to prevent long validation on opening?

Solution
SSIS validates all connections and the related components to make sure the metadata is valid. When you have a slow connection or just a very big package it will take some time to validate (up to several minutes). A big package could be divided in a couple of smaller packages, but the speed of your connections is usually something you can’t change.

Here are a couple of solutions/workarounds:
- Work Offline;
- DelayValidation;
- ValidateExternalMetadata;
- Offline Connection Managers.

Work Offline
There is an option in the SSIS menu called Work Offline. This project level property prevents validation if your connection is unavailable (or very slow). You must set it before opening a package.
Work Offline


















There are two concerns/downsides for this solution:
1) You can’t debug/run your package when you are in offline mode (of course, can’t do anything about that).
2) This offline property is only available when you have a SSIS project opened and if you closed the project while there where still packages opened, then it will reopen these packages when the project is loaded in BIDS… That will prevent you to switch to Offline before opening the packages.
You can overcome that by locating the [project].dtproj.user file in the project folder. Open this file in notepad and change the <OfflineMode> property from false to true.
<OfflineMode>












Also see of vote for this change request at Microsoft Connect: Open package in offline mode.

DelayValidation
Setting the DelayValidation property to true will prevent validation at design time. This property is available for the package, containers, connections and tasks.
Delay Validation






















There are two concerns/downsides for this solution:
1) Setting it on package level won’t help. Best thing is to set it on all tasks that use the slow connection.
2) On runtime the package will normally validate twice. Always right before executing a task and if DelayValidation is set to false it will also be validated up front. This will catch problems before the package runs for a long time. If your package runs for example 10 hours it would be very disappointing to have a validation error at the end.

So you have to keep that in mind. You could set this property to true while developing the packages and change it back to true before deploying it on the server.

ValidateExternalMetadata
This solution is similar to the DelayValidation property solution. While the DelayValidation property can only be set for the complete data flow task, this property can be set for individual data flow components. Set it to false to gain a shorter package opening time.

ValidateExternalMetadata























There are two concerns/downsides for this solution:
1) If a lot of components in the data flow task use the same slow connection, it could be a bit annoying to set it for all these components. In that case the DelayValidation property is probably faster.
2) Components that have this property on false will not be aware of changes in the metadata of external data sources.


Offline Connection Managers
This is a new option in Denali CTP 3 (SQL Server Integration Services 2012). It will allow you set a single connection to work offline by right clicking the connection manager. See Matt Massons blog post for more info about this new feature.
Offline Connection Managers






















There are two concerns/downsides for this solution:
1) You have to open the package before you can set this option. A workaround for that could be to first set the project to work offline before opening the package.
2) If you close the project all connection managers will be reset to work online.


Let me know if you have an other solution/workaround. Also see: Troubleshooting Package Development.

Thursday, 3 November 2011

How to pivot in SSIS

Case
The Pivot Transformation in SSIS has a big drawback: It has no interface which makes it hard to work with, even for seasoned users.

I have a CSV source that looks like this:
Unpivoted data 















And I want to transfer it to this:
Pivoted data







How does the Pivot Transformation work?

Solution
First vote for this feedback on Microsoft Connect because even in Denali (ctp3) there is still no good gui and that's a bit of a shame!
* UPDATE 17 November: Pivot GUI for SQL Server 2012 RC0 *

1) Source
For this example I use a CSV file. Add a Flat File Source for the file.
CSV as Source
















2) Sort
The data needs to be sorted on the key column. In this case that is the day column (it will be the unique key after the pivot). If your source is a database table, then you should add an order by clause in the source query. If your source can't be sorted, then you have to use a Sort Component. (see Performance Best Practices step 5)
Sort on key column



















If you forget to sort you will get more rows and a lot of null values.
Forgot to sort













3) Pivot input columns
Select all the columns you need as (readonly) input columns. In this case select them all.
Input columns




















4) Pivot Usage
Now the hard part. You need to determine the so called pivot usage for each column. There are 4 values (0 to 3):
means that the column will pass through unaffected
1 means that the column becomes the key of the pivot (aka the Set Key)
2 means that the column values become the column names of the pivot (aka the Pivot Column)
3 means that the column values are pivoted in the pivot

In our case:
Day gets pivot usage 1 because it will be the new key
Period gets pivot usage 0 because it will be passed through unaffected
Product gets pivot usage 2, because its values will be the new column names
Volume gets pivot usage 3, because its values will be used for the new columns
Pivot Usage (notice the LineageID for the next step)





















5) Output columns
Now we have to create new output columns. For the first column we will keep the same name: Day.
Create a new column named Day and in the SourceColumn property you need to fill in the LineageID from the source column Day. See picture of previous step. In my case it's 100, but yours could be different! Name and SourceColumn are the only things you need to fill in.
New column





















Do the same for the Period column: Give it the name Period and lookup the LineageID for the SourceColumn.

For the next three columns there are more things to fill in. First create three new columns and give them suitable names. I used the same names as the values in the excel sheet, but you can change that. Now the tricky part. The PivotKeyValue has to be the exact value of the various row values in the product column (Mountainbike, Tricycle, Recumbentbike). The SourceColumn refers to the column where the new value comes from. So in our case it will be the LineageID from the Volume column.
New columns





















6) The result
Now you can add a destination and test the result. I used data viewers to show you the result.
The result
























Download example package (2008 R2)
Download example file (CSV)