Wednesday, 26 July 2017

Active Directory as Source

Case
I want to use the users from my Active Directory in my Data Warehouse for an Employee Dimension. How can I accomplish that in SSIS?














Solution
With a little .Net scripting in a Script Component you can accomplish an Active Directory source in your Data Flow Task. Before you start you need to figure out which fields are available in your Active Directory. If you don't know the fieldnames you could set a breakpoint on the foreach loop and add a watch on the result variable. Then you could browse through the properties of that variable to find all available fields.

1) Script Component Source
Create a Data Flow Task and add a Script Component and choose "Source" as the type. After that give you source a suitable name like "SRC_SCR - Active Directory Users".
Script Component - Source





















2) Output columns
Next edit the Script Component and go to the Inputs and Outputs pane. This is where we need to specify all the new output columns and datatypes. For my Active Directory example I have added 5 string (DT_STR) columns: Fullname, Firstname, Surename, Department and Manager. You have to figure out the required length. You could start with the default 50 or change it to a more save 255.
Add new output columns

























3) Edit script
Go back to the first page and choose C# as your scripting language and then hit the Edit Script buton to open the vsta environment.
Edit script

























4) Reference
To tell our script about Active Directory, we first need to add a new reference to the .NET assembly System.DirectoryServices. In the Solution Explorer right click References and
 choose Add Reference... Then scroll down, select System.DirectoryServices and click OK to confirm.
Add reference

















5) The Script - Namespaces
Under namespaces add a new using for our newly referenced assembly: using System.DirectoryServices; This allows you to shorten the code.
Add using



















6) The Script - CreateNewOutputRows
Now replace your CreateNewOutputRows method with the one below and add the GetPropertyValue method from below within your class tags under the existing methods. You could remove the PostExecute and PreExecute methods since we don't need them in this example.

The CreateNewOutputRows method adds new rows to the buffer by calling Output0Buffer.AddRow(); within a loop construction and then it fills all columns with data from the Active Directory. Before the foreach loop you find the code to connect to the Active Directory with some optional filter.
public override void CreateNewOutputRows()
{
    // Specify the connnectionstring of your domain
    // @mycompany.com => LDAP://DC=mycompany,dc=com
    // Consider using a variable or parameter instead
    // of this hardcoded value. On the other hand
    // how many times does your domain changes
    string domainConnectionString = "LDAP://DC=ilionx,dc=com";

    using (DirectorySearcher ds = new DirectorySearcher(new DirectoryEntry(domainConnectionString)))
    {
        ds.Filter = "(&" +
                    "(objectClass=user)" +  // Only users and not groups
                    "(department=*)" +      // All departments
                    "(givenname=j*)" +      // Firstname starts with j
                    ")";
        // See ds. for more options like PageSize.
        //ds.PageSize = 1000;

        // Find all persons matching your filter
        using (SearchResultCollection results = ds.FindAll())
        {
            // Loop through all rows of the search results
            foreach (SearchResult result in results)
            {
                // Add a new row to the buffer
                Output0Buffer.AddRow();
                // Fill all columns with the value from the Active Directory
                Output0Buffer.Fullname = GetPropertyValue(result, "cn");
                Output0Buffer.Surename = GetPropertyValue(result, "sn");
                Output0Buffer.Department = GetPropertyValue(result, "department");
                Output0Buffer.Manager = GetPropertyValue(result, "manager");
                Output0Buffer.Firstname = GetPropertyValue(result, "givenname");
            }
        }
    }
}

// Extra method to avoid having an if construction around each column
// It checks whether it can find a value. If not it returns an empty string.
private static string GetPropertyValue(SearchResult Results, string Property)
{
    // Null value results in count zero
    if (Results.Properties[Property].Count > 0)
    {
        return Results.Properties[Property][0].ToString();
    }
    else
    {
        return "";
    }
}


7) The result
To test the result add a dummy Derived Column and a Data Viewer to see the values. If it works you can add the rest of the components to the Data Flow Task to fill your dimension table.
The result of my script
















Summary
A very simple and short script to get data from your Active Directory. For more filter examples visit
MSDN. For very large Active Directories you have to play with the PageSize property.
Note that there are also scriptless options like with an ADO.Net source or with Third Party components. Each solution has its own pros and cons.