Showing posts with label BEST PRACTICES. Show all posts

Thursday, 1 December 2016

SSIS Naming conventions

In 2006 Jamie Thomson came up with naming conventions for SSIS tasks and data flow components. These naming conventions make your packages and logs more readable. Five SQL Server versions and a decade later, a couple of tasks and components have been deprecated, but Microsoft has also introduced a lot of new ones.

Together with Koen Verbeeck (B|T) and André Kamman (B|T) we extended the existing list with almost 40 tasks/components and created a PowerShell script that should make it easier to check and enforce the naming conventions. This PowerShell script will soon be published on GitHub as a PowerShell module, but for now you can download and test the fully working proof-of-concept script. Download both ps1 files and the CSV file, then open "naming conventions v4.ps1" and change the parameters before executing it. The script works with local packages because you can't read individual packages from the catalog, but you can use a PowerShell script to download your packages from the catalog.
PowerShell Naming Conventions Checker

Task name | Prefix | Type | New
For Loop Container | FLC | Container |
Foreach Loop Container | FELC | Container |
Sequence Container | SEQC | Container |
ActiveX Script | AXS | Task |
Analysis Services Execute DDL Task | ASE | Task |
Analysis Services Processing Task | ASP | Task |
Azure Blob Download Task | ADT | Task | *
Azure Blob Upload Task | AUT | Task | *
Azure HDInsight Create Cluster Task | ACCT | Task | *
Azure HDInsight Delete Cluster Task | ACDT | Task | *
Azure HDInsight Hive Task | AHT | Task | *
Azure HDInsight Pig Task | APT | Task | *
Back Up Database Task | BACKUP | Task | *
Bulk Insert Task | BLK | Task |
CDC Control Task | CDC | Task | *
Check Database Integrity Task | CHECKDB | Task | *
Data Flow Task | DFT | Task |
Data Mining Query Task | DMQ | Task |
Data Profiling Task | DPT | Task | *
Execute Package Task | EPT | Task |
Execute Process Task | EPR | Task |
Execute SQL Server Agent Job Task | AGENT | Task | *
Execute SQL Task | SQL | Task |
Execute T-SQL Statement Task | TSQL | Task | *
Expression Task | EXPR | Task |
File System Task | FSYS | Task |
FTP Task | FTP | Task |
Hadoop File System Task | HFSYS | Task | *
Hadoop Hive Task | HIVE | Task | *
Hadoop Pig Task | PIG | Task | *
History Cleanup Task | HISTCT | Task | *
Maintenance Cleanup Task | MAINCT | Task | *
Message Queue Task | MSMQ | Task |
Notify Operator Task | NOT | Task | *
Rebuild Index Task | REBIT | Task | *
Reorganize Index Task | REOIT | Task | *
Script Task | SCR | Task |
Send Mail Task | SMT | Task |
Shrink Database Task | SHRINKDB | Task | *
Transfer Database Task | TDB | Task |
Transfer Error Messages Task | TEM | Task |
Transfer Jobs Task | TJT | Task |
Transfer Logins Task | TLT | Task |
Transfer Master Stored Procedures Task | TSP | Task |
Transfer SQL Server Objects Task | TSO | Task |
Update Statistics Task | STAT | Task | *
Web Service Task | WST | Task |
WMI Data Reader Task | WMID | Task |
WMI Event Watcher Task | WMIE | Task |
XML Task | XML | Task |
Transformation name | Prefix | Type | New
ADO NET Source | ADO_SRC | Source | *
Azure Blob Source | AB_SRC | Source | *
CDC Source | CDC_SRC | Source | *
DataReader Source | DR_SRC | Source |
Excel Source | EX_SRC | Source |
Flat File Source | FF_SRC | Source |
HDFS File Source | HDFS_SRC | Source | *
OData Source | ODATA_SRC | Source | *
ODBC Source | ODBC_SRC | Source | *
OLE DB Source | OLE_SRC | Source |
Raw File Source | RF_SRC | Source |
SharePoint List Source | SPL_SRC | Source |
XML Source | XML_SRC | Source |
Aggregate | AGG | Transformation |
Audit | AUD | Transformation |
Balanced Data Distributor | BDD | Transformation | *
Cache Transform | CCH | Transformation | *
CDC Splitter | CDCS | Transformation | *
Character Map | CHM | Transformation |
Conditional Split | CSPL | Transformation |
Copy Column | CPYC | Transformation |
Data Conversion | DCNV | Transformation |
Data Mining Query | DMQ | Transformation |
Derived Column | DER | Transformation |
DQS Cleansing | DQSC | Transformation | *
Export Column | EXPC | Transformation |
Fuzzy Grouping | FZG | Transformation |
Fuzzy Lookup | FZL | Transformation |
Import Column | IMPC | Transformation |
Lookup | LKP | Transformation |
Merge | MRG | Transformation |
Merge Join | MRGJ | Transformation |
Multicast | MLT | Transformation |
OLE DB Command | CMD | Transformation |
Percentage Sampling | PSMP | Transformation |
Pivot | PVT | Transformation |
Row Count | CNT | Transformation |
Row Sampling | RSMP | Transformation |
Script Component | SCR | Transformation |
Slowly Changing Dimension | SCD | Transformation |
Sort | SRT | Transformation |
Term Extraction | TEX | Transformation |
Term Lookup | TEL | Transformation |
Union All | ALL | Transformation |
Unpivot | UPVT | Transformation |
ADO NET Destination | ADO_DST | Destination | *
Azure Blob Destination | AB_DST | Destination | *
Data Mining Model Training | DMMT_DST | Destination |
Data Streaming Destination | DS_DST | Destination | *
DataReaderDest | DR_DST | Destination |
Dimension Processing | DP_DST | Destination |
Excel Destination | EX_DST | Destination |
Flat File Destination | FF_DST | Destination |
HDFS File Destination | HDFS_DST | Destination | *
ODBC Destination | ODBC_DST | Destination | *
OLE DB Destination | OLE_DST | Destination |
Partition Processing | PP_DST | Destination |
Raw File Destination | RF_DST | Destination |
Recordset Destination | RS_DST | Destination |
SharePoint List Destination | SPL_DST | Destination |
SQL Server Compact Destination | SSC_DST | Destination | *
SQL Server Destination | SS_DST | Destination |
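
To give an idea of what checking the prefixes comes down to, here is a minimal Python sketch. It is not the PowerShell script mentioned above. The PREFIXES dictionary is only a small excerpt of the list, the input format (pairs of component type and object name) is hypothetical, and the sketch assumes that the prefix is followed by a space.

```python
# Small excerpt of the prefix list above: component type -> expected prefix.
PREFIXES = {
    "Sequence Container": "SEQC",
    "Data Flow Task": "DFT",
    "Execute SQL Task": "SQL",
    "OLE DB Source": "OLE_SRC",
    "Lookup": "LKP",
}


def check_name(component_type, object_name):
    """Return True when object_name starts with '<prefix> ' for its type."""
    prefix = PREFIXES.get(component_type)
    if prefix is None:
        # Unknown type: report it so that somebody reviews it.
        return False
    # Assumption: prefix and description are separated by a single space.
    return object_name.startswith(prefix + " ")


def find_violations(components):
    """Yield (type, name, expected prefix) for every misnamed component."""
    for component_type, object_name in components:
        if not check_name(component_type, object_name):
            yield component_type, object_name, PREFIXES.get(component_type)


if __name__ == "__main__":
    sample = [
        ("Data Flow Task", "DFT Load customers"),
        ("Execute SQL Task", "Truncate staging"),
    ]
    for violation in find_violations(sample):
        print(violation)
```

A real checker would first have to read the type and name of each task and component from the package files; that part is left out here on purpose.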


Example of the prefixes

Sunday, 5 October 2014

SQL Saturday #336 Holland - PowerPoint slides


Had a nice day at SQL Saturday #336 in Utrecht! The PowerPoint slides of my SSIS Development Best Practices session are available for download. I added some screens, text and URLs for additional information (see the notes in the PowerPoint).

Saturday, 1 March 2014

SSIS 2012 with Team Foundation Server - Part II

Case
I have installed Team Explorer and set up Visual Studio to use it. What's next?

Solution
In Part I you read:
A) Install Team Explorer for Visual Studio 2010
B) Install Team Explorer for Visual Studio 2012
C) Set up Visual Studio to use TFS

This second part covers:
D) Adjusting development process



D) Adjusting development process
Because you can now work with multiple developers on the same project, you have to make some arrangements with your fellow developers, like:

1) Get latest version project
Get the latest version of the project on a regular basis. Otherwise you will miss new packages, project connection managers and project parameters. Do this for example each morning or before you start developing. There is also an option in Visual Studio to automatically get the latest version of the solution when opening it.
Get everything when a solution or project is opened.

2) Get latest version package
Get the latest version of a package before editing it. There is also an option in Visual Studio to automatically get the latest version of a package when checking it out.
Get latest version of item on check out.

3) Adding new package to project
When you add a new package to the project, the project itself will be checked out. First rename the new package, save it and then check in the project and the new (empty/clean) package. Otherwise your fellow developers cannot change project properties or add new packages.
Adding new package will check out the project

4) Disable multiple check out
Working together on the same file at the same time is nearly impossible, because it's hard to merge the XML of two versions of a package. Therefore you should disable multiple check out in TFS or check out your package exclusively (not the default in TFS).
In the Team menu, click Team Project Settings, Source Control

Uncheck the multiple checkout box

5) Don't check in faulty packages
Try not to check in a package that doesn't work, especially when you work with the project deployment model, with which you can only deploy the complete project.
Don't check in faulty packages

6) No large/complex packages
Don’t make packages too large or complex. Divide the functionality over multiple smaller packages, because you can’t work with multiple developers on the same large package at the same time.

7) Sensitive data
The default Package Protection Level is EncryptSensitiveWithUserKey. This will encrypt passwords and other sensitive data in the package with a key based on the user profile of the developer. Because your colleagues have different user profiles, they can't edit or execute packages that you made without re-entering all sensitive package data.
The easiest way to overcome this is to use DontSaveSensitive as Package Protection Level in combination with Package Configurations. Then all the sensitive data will be stored in the configuration table or file, and when you open the package all this data will be retrieved from the configuration table or file.
If you're using the Project Deployment Model in combination with sensitive parameters instead of Package Configurations, then the easiest workaround is to use EncryptAllWithPassword or EncryptSensitiveWithPassword with a password that is known within the development team.

8) Development standards
When you're developing with multiple people (or someone else is going to maintain your work) then it's good to have some Development Best Practices like using prefixes for tasks and transformations or using templates. This makes it easier to transfer work and to collaborate as a team.

9) Comments
When you check in a package, it's very useful to add a meaningful description of the change. This makes it easier to track history.
Check in comments

10) Branching, Labeling and building
Besides versioning and checking in/out packages there are more interesting functions in TFS that are probably more common in C# and VB.Net programming, but worth checking out. Here are some interesting links about TFS and SSIS:

Friday, 24 December 2010

Development Best Practices

Case
As an external employee I see a lot of SSIS packages at various companies, made by a whole bunch of different people. Unfortunately some of those people have made Quick & Dirty their motto in life, resulting in hard-to-read packages. And that's a waste of time for the companies.

Solution
Companies should require both well performing and well documented packages. Here is a list of some basic development Best Practices to achieve clear and manageable packages.


1) No default names and descriptions
Rename all default component names and give them explanatory descriptions. This will help other developers who edit your packages. It is also very useful when debugging.
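A rough Python sketch of how you could spot components that still have a default name or an empty description. It assumes that the designer names a new item after its type, optionally followed by a number (for example "Data Flow Task 1"); the function names and the input values are hypothetical.

```python
import re


def is_default_name(component_type, object_name):
    """True when the name is just the type name, optionally plus a number."""
    # Assumption: defaults look like "Data Flow Task" or "Data Flow Task 1".
    pattern = re.escape(component_type) + r"( \d+)?"
    return re.fullmatch(pattern, object_name) is not None


def needs_attention(component_type, object_name, description):
    """True when the name is still a default or the description is blank."""
    return is_default_name(component_type, object_name) or not description.strip()


if __name__ == "__main__":
    print(needs_attention("Data Flow Task", "Data Flow Task 1", "Data Flow Task"))
    print(needs_attention("Data Flow Task", "DFT Load customers", "Loads customers"))
```

Note that a description that merely repeats the default text is not detected by this sketch; only a blank one is.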
No default names and descriptions

2) Annotations
Use annotations. This is very useful if the Control Flow or Data Flow isn't self-explanatory (for others).
Use annotations

3) Group logical work
Use Sequence containers to organize package structures into logical units of work. This makes it easier to identify what the package does. It also helps to control transactions if they are being implemented. * Update: SSIS 2012 has a grouping feature *
Use Sequence Containers

4) Flow directions
Flows should basically go top-down. This will make your packages more readable.
Design your package Top down

You can use the Auto Layout option in SSIS to format your packages.
Auto Layout is a good start

5) Disabled Control Flow tasks
Do not use disabled Control Flow tasks in the Quality Assurance or Production environment. If you want to conditionally execute a task at runtime, use expressions on your precedence constraints. Do not use an expression on the “Disable” property of the task.
Disabled Control Flow Task

6) Spread a large number of packages over several Visual Studio projects
You can add more than one project to your Visual Studio solution to spread a large number of packages. Think about a proper layout, for example a data staging project and a data warehouse project.


7) Queries in source and Lookup components
Don't use overly complex queries. Use a readable layout and add comments to explain parts of the query. For example:
-- This query does something 
SELECT    a.field1
,         a.field2
,         b.field3
,         b.field4
FROM      table1 as a
LEFT JOIN table2 as b
          on a.field5 = b.field6
WHERE     a.field2 = 'x' -- Comment about x
ORDER BY  a.field1

8) Script Coding Conventions
Use coding conventions when scripting a Script Task or Script Component. C# and VB.Net both have their own conventions, which are widely available on the net.

9) Use naming conventions
Give tasks and transformations a prefix. This makes it easier to read the logging.

10) Use templates
You can create templates for SSIS. Things like logging, configurations and connection managers can be added to these templates.

Let me know if you have items that should be in the list of Development Best Practices!

Wednesday, 22 December 2010

Performance Best Practices

Case
A client of mine had some performance issues with a couple of SSIS packages, and because they lacked basic SSIS knowledge, they just upgraded their server with more memory. Finally, after 32GB of memory, they stopped upgrading and started reviewing their packages.

Solution
There are a lot of blogs about SSIS Best Practices (for instance: SSIS junkie). Here is a top 10 of easy-to-implement but very effective ones that I showed them to 'upgrade' their packages instead of the memory.

1) Unnecessary columns
Select only the columns that you need in the pipeline to reduce buffer size and reduce OnWarning events at execution time. SSIS even helps you by showing the unnecessary ones in the Progress/Execution Results Tab: [DTS.Pipeline] Warning: The output column "Address1" (16161) on output "Output0" (16155) and component "CRM clients" (16139) is not subsequently used in the Data Flow task. Removing this unused output column can increase Data Flow task performance.
Unnecessary columns from a flat file

2) Use queries instead of tables
Following on from the unnecessary columns, always use a SQL statement in an OLE DB Source component or (Fuzzy) Lookup component rather than just selecting a table. Selecting a table is akin to "SELECT *...", which is universally recognised as bad practice.
OLE DB Source, use SQL Command instead of Table

Lookup, use SQL Command instead of Table

3) Use caching in your LOOKUP
Make sure that the result of your lookup is unique, otherwise SSIS cannot cache the query and executes it for each record passing the lookup component. SSIS will warn you for this in the Progress/Execution Results Tab: [Lookup Time Dimension [605]] Warning: The component "Lookup Time Dimension" (605) encountered duplicate reference key values when caching reference data. This error occurs in Full Cache mode only. Either remove the duplicate key values, or change the cache mode to PARTIAL or NO_CACHE.
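To show what "unique" means here, a small Python sketch that finds duplicate key values in a set of reference rows before you use them in a lookup. It only illustrates the check and has nothing to do with how SSIS does it internally; the column names and rows are made up. In a database you would do the same with a GROUP BY and HAVING COUNT(*) > 1.

```python
from collections import Counter


def duplicate_keys(rows, key_columns):
    """Return the key values that occur more than once in the reference rows."""
    counts = Counter(tuple(row[column] for column in key_columns) for row in rows)
    return sorted(key for key, count in counts.items() if count > 1)


if __name__ == "__main__":
    # Hypothetical reference data for a time dimension lookup.
    reference = [
        {"DateKey": 20101222, "DayName": "Wednesday"},
        {"DateKey": 20101223, "DayName": "Thursday"},
        {"DateKey": 20101222, "DayName": "Wed"},
    ]
    print(duplicate_keys(reference, ["DateKey"]))
```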

Watch out that you are not grabbing too many resources in the lookup. A couple of million records is probably not a good idea. New in SSIS 2008 is that you can reuse your lookup cache in another lookup.
SSIS 2008: Cache

4) Filter in source
Where possible filter your data in the Source Adapter rather than filter the data using a Conditional Split transform component. This will make your data flow perform quicker because the unnecessary records don't go through the pipeline.
Filter in OLE DB Source, filter data in source

5) Sort in source
A sort with SQL Server is faster than the sort in SSIS, partly because SSIS does the sort in memory. So it pays to move the sort to a source component (where possible). Note that you have to set IsSorted=TRUE on the source adapter output, but setting this value does not perform a sort operation; it only indicates that the data is sorted. After that, change the SortKeyPosition of all output columns that are sorted.
Advanced Editor for Source, sort data in source

6) Join in source
Where possible, join data in the Source Adapter rather than using the Merge Join component. SQL Server does it faster than SSIS. But watch out that you are not making too complex queries, because that will worsen the readability.

Unnecessary Join and Sorts

7) Group in source
Where possible, aggregate your data in the Source Adapter rather than using the Aggregate component. SQL Server does it faster than SSIS.
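A small Python sketch to illustrate the idea. It uses an in-memory SQLite database as a stand-in for SQL Server and a plain loop as a stand-in for the Aggregate component, so the table, the data and the function names are all hypothetical. Both variants return the same totals, but when the database does the grouping far fewer rows have to travel through the pipeline.

```python
import sqlite3
from collections import defaultdict


def build_demo_db():
    """Create an in-memory table with a few hypothetical sales rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("North", 10), ("North", 5), ("South", 7), ("South", 1), ("West", 3)],
    )
    return conn


def aggregate_in_pipeline(conn):
    """Pull every row and sum afterwards (like an Aggregate component)."""
    totals = defaultdict(int)
    rows_pulled = 0
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        rows_pulled += 1
        totals[region] += amount
    return dict(totals), rows_pulled


def aggregate_in_source(conn):
    """Let the database group the rows, so only the totals are pulled."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()
    return dict(rows), len(rows)


if __name__ == "__main__":
    connection = build_demo_db()
    print(aggregate_in_pipeline(connection))
    print(aggregate_in_source(connection))
```

The same reasoning applies to filtering and joining in the source: the less data leaves the database, the less work is left for the data flow.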

Unnecessary Sorts, Join and Aggregate

8) Beware of Non-blocking, Semi-blocking and Fully-blocking components in general
The data flow consists of three types of transformations: non-blocking, semi-blocking and fully-blocking. As the names suggest, use semi-blocking and fully-blocking components sparingly to optimize your packages. Jorg Klein has written an interesting article about it, with a list of which components are non-, semi- or fully-blocking.

A summary of how to recognize these three types:

Property | Non-blocking | Semi-blocking | Fully-blocking
Synchronous/asynchronous | Synchronous | Asynchronous | Asynchronous
Number of rows in equals rows out | True | Usually False | Usually False
Collects all input before it can output | False | False | True
New buffer created? | False | True | True
New thread created? | False | Usually True | True
Find more information about (a)synchronous transformations at Microsoft.
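
The difference between non-blocking and fully-blocking can be illustrated with Python generators. This is only an analogy: SSIS works with buffers of rows rather than single rows, and all function names are made up. The sketch counts how many rows a transformation has to read before it can hand over its first output row.

```python
def source(rows, log):
    """Yield rows one by one and record every row that is read."""
    for row in rows:
        log.append(row)
        yield row


def derive(rows):
    """Non-blocking: each input row is passed on as soon as it arrives."""
    for row in rows:
        yield row * 10


def sort_rows(rows):
    """Fully-blocking: sorting needs all input before the first output row."""
    yield from sorted(rows)


def rows_read_before_first_output(transform, data):
    """Count how many source rows are read to produce one output row."""
    log = []
    pipeline = transform(source(data, log))
    next(pipeline)
    return len(log)


if __name__ == "__main__":
    data = [3, 1, 2]
    print(rows_read_before_first_output(derive, data))
    print(rows_read_before_first_output(sort_rows, data))
```

The derive step needs one row to produce one row, while the sort step has to read all three rows first. With millions of rows that means the whole set has to be held before anything moves downstream.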


9) High Volumes of Data and indexes
Loading high volumes of data on a table with clustered and non-clustered indexes could take a lot of time.
The most important thing to verify is whether all indexes are really used. SQL Server 2005 and 2008 provide information about index usage through two dynamic management objects: sys.dm_db_index_operational_stats and sys.dm_db_index_usage_stats. Drop all rarely used and unused indexes first. Experience teaches that there are often a lot of unnecessary indexes. If you are absolutely sure that all remaining indexes are necessary, you can drop all indexes before loading the data and recreate them afterwards. The performance gain depends on the number of records: the higher the number of records, the more you gain.
Drop and recreate indexes

10) SQL Server Destination Adapter vs OLE DB Destination Adapter
If your target database is a local SQL Server database, the SQL Server Destination Adapter will perform much better than the OLE DB Destination Adapter. However, the SQL Server Destination Adapter works only on a local machine and via Windows security. You have to be absolutely sure that your database stays local in the future, otherwise your mapping will not work when moving the database.


Note: this is not a complete list, but just a top 10 of easy to implement but very effective ones. Tell me if you have items that should be in the top 10 of Performance Best Practices!

Note: Besides the Performance Best Practices there is also a list of Development Best Practices.