I have to download my source file from a website. Can SSIS do that for me?
Solution
There are a few options:
1) Third party tools such as CozyRoc or BlueSSIS.
2) Script Task with WebClient
3) Script Task with HTTP Connection Manager
I will show you the second and third option. First the WebClient solution.
The WebClient solution:
1) Start situation
I have a Data Flow that reads a CSV file (Products.csv), does some adjustments and inserts the records into a database table. The CSV file is on my harddrive, but I want a fresh download each time I run the package.
Start situation |
2) Variables
Create a SSIS string variable named DownloadURL and fill it with a correct value.
Variable with the download URL |
3) Script Task
Add a Script Task to your Control Flow and give it a suitable name. Connect it to your Data Flow Task.
The Script Task |
4) ReadOnly Variable
Add the DownloadURL variable as ReadOnly to the Script Task.
ReadOnly Variable |
5) The Script
Add the following C#/VB.net Script to your Script Task. It downloads the file and uses the Flat File Connectionstring to save the file.
// C# code using System; using System.Data; using Microsoft.SqlServer.Dts.Runtime; using System.Windows.Forms; using System.Net; // Added namespace ST_929b5207bd37455a882c35e500ab2950.csproj { [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")] public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase { #region VSTA generated code enum ScriptResults { Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success, Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure }; #endregion public void Main() { try { // Logging start of download bool fireAgain = true; Dts.Events.FireInformation(0, "Download File", "Start downloading " + Dts.Variables["DownloadURL"].Value.ToString(), string.Empty, 0, ref fireAgain); // Create a webclient to download a file WebClient mySSISWebClient = new WebClient(); // Download file and use the Flat File Connectionstring (D:\SourceFiles\Products.csv) // to save the file (and replace the existing file) mySSISWebClient.DownloadFile(Dts.Variables["DownloadURL"].Value.ToString(), Dts.Connections["myProductFile"].ConnectionString); // Logging end of download Dts.Events.FireInformation(0, "Download File", "Finished downloading " + Dts.Connections["myProductFile"].ConnectionString, string.Empty, 0, ref fireAgain); // Quit Script Task succesful Dts.TaskResult = (int)ScriptResults.Success; } catch (Exception ex) { // Logging why download failed Dts.Events.FireError(0, "Download File", "Download failed: " + ex.Message, string.Empty, 0); // Quit Script Task unsuccesful Dts.TaskResult = (int)ScriptResults.Failure; } } } }
' VB.net Code Imports System Imports System.Data Imports System.Math Imports Microsoft.SqlServer.Dts.Runtime Imports System.Net ' Added <System.AddIn.AddIn("ScriptMain", Version:="1.0", Publisher:="", Description:="")> _ <System.CLSCompliantAttribute(False)> _ Partial Public Class ScriptMain Inherits Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase Enum ScriptResults Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure End Enum Public Sub Main() Try ' Logging start of download Dim fireAgain As Boolean = True Dts.Events.FireInformation(0, "Download File", "Start downloading " + Dts.Variables("DownloadURL").Value.ToString(), String.Empty, 0, fireAgain) ' Create a webclient to download a file Dim mySSISWebClient As WebClient = New WebClient() ' Download file and use the Flat File Connectionstring (D:\SourceFiles\Products.csv) ' to save the file (and replace the existing file) mySSISWebClient.DownloadFile(Dts.Variables("DownloadURL").Value.ToString(), Dts.Connections("myProductFile").ConnectionString) ' Logging end of download Dts.Events.FireInformation(0, "Download File", "Finished downloading " + Dts.Connections("myProductFile").ConnectionString, String.Empty, 0, fireAgain) ' Quit Script Task succesful Dts.TaskResult = ScriptResults.Success Catch ex As Exception ' Logging why download failed Dts.Events.FireError(0, "Download File", "Download failed: " + ex.Message, String.Empty, 0) ' Quit Script Task unsuccesful Dts.TaskResult = ScriptResults.Failure End Try End Sub End ClassNote: This is a script with some basic eventlogging and error handling. See this article for more details.
6) Testing
Now run your package to test the result. You can use this CSV file to test it:
https://sites.google.com/site/ssisblogspot/products.csv
Note: See MSDN WebClient page for other properties of the WebClient Class such as Credentials and Proxy. If you don't want to add Credentials and Proxy settings in your Script Task the next solution might be better.
The HTTP Connection Manager solution:
This solution uses the HTTP Connection Manager that is standard available in SSIS. You can configure all Credentials and Proxy settings in this connection manager itself which makes is a little more easier to configure. You still have to do the download part with a Script Task.
1) Start situation
I have a Data Flow that reads a CSV file (Products.csv), does some adjustments and inserts the records into a database table. The CSV file is on my harddrive, but I want a fresh download each time I run the package.
Start situation |
2) HTTP Connection manager
You don't need the URL variable for this solution. Instead add a new connection of the type HTTP.
Right Click in the Connection Manager window and select.. |
3) Configure HTTP Connection manager
Add the download URL in the Server URL box and optional set all Credentials and Proxy settings if you need them for your internet connection. I'm directly connected to the internet so I don't have to change any of the settings here.
HTTP Connection Manager |
4) Script Task
Add a Script Task to your Control Flow and give it a suitable name. Connect it to your Data Flow Task.
The Script Task |
5) The Script
Add the following C#/VN.Net Script to your Script Task. It downloads the file and uses the Flat File Connectionstring to save the file.
// C# code using System; using System.Data; using Microsoft.SqlServer.Dts.Runtime; using System.Windows.Forms; namespace ST_6799d08685cb4ad78633d035fab12178.csproj { [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")] public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase { #region VSTA generated code enum ScriptResults { Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success, Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure }; #endregion public void Main() { try { // Logging start of download bool fireAgain = true; Dts.Events.FireInformation(0, "Download File", "Start downloading " + Dts.Connections["HTTP Connection Manager"].ConnectionString, string.Empty, 0, ref fireAgain); // Get your newly added HTTP Connection Manager Object mySSISConnection = Dts.Connections["HTTP Connection Manager"].AcquireConnection(null); // Create a new connection HttpClientConnection myConnection = new HttpClientConnection(mySSISConnection); // Download file and use the Flat File Connectionstring (D:\SourceFiles\Products.csv) // to save the file (and replace the existing file) myConnection.DownloadFile(Dts.Connections["myProductFile"].ConnectionString, true); // Logging end of download Dts.Events.FireInformation(0, "Download File", "Finished downloading " + Dts.Connections["myProductFile"].ConnectionString, string.Empty, 0, ref fireAgain); // Quit Script Task succesful Dts.TaskResult = (int)ScriptResults.Success; } catch (Exception ex) { // Logging why download failed Dts.Events.FireError(0, "Download File", "Download failed: " + ex.Message, string.Empty, 0); // Quit Script Task unsuccesful Dts.TaskResult = (int)ScriptResults.Failure; } } } }
' VB.Net code Imports System Imports System.Data Imports System.Math Imports Microsoft.SqlServer.Dts.Runtime <System.AddIn.AddIn("ScriptMain", Version:="1.0", Publisher:="", Description:="")> _ <System.CLSCompliantAttribute(False)> _ Partial Public Class ScriptMain Inherits Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase Enum ScriptResults Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure End Enum Public Sub Main() Try ' Logging start of download Dim fireAgain As Boolean = True Dts.Events.FireInformation(0, "Download File", "Start downloading " + Dts.Connections("HTTP Connection Manager").ConnectionString, String.Empty, 0, fireAgain) ' Get your newly added HTTP Connection Manager Dim mySSISConnection As Object = Dts.Connections("HTTP Connection Manager").AcquireConnection(Nothing) ' Create a new connection Dim myConnection As HttpClientConnection = New HttpClientConnection(mySSISConnection) ' Download file and use the Flat File Connectionstring (D:\SourceFiles\Products.csv) ' to save the file (and replace the existing file) myConnection.DownloadFile(Dts.Connections("myProductFile").ConnectionString, True) ' Logging end of download Dts.Events.FireInformation(0, "Download File", "Finished downloading " + Dts.Connections("myProductFile").ConnectionString, String.Empty, 0, fireAgain) ' Quit Script Task succesful Dts.TaskResult = ScriptResults.Success Catch ex As Exception ' Logging why download failed Dts.Events.FireError(0, "Download File", "Download failed: " + ex.Message, String.Empty, 0) ' Quit Script Task unsuccesful Dts.TaskResult = ScriptResults.Failure End Try End Sub End ClassNote: This is a script with some basic eventlogging and error handling. See this article for more details.
6) Testing
Now run your package to test the result. You can use this CSV file to test it:
https://sites.google.com/site/ssisblogspot/products.csv
Conclusion
Both methods (WebClient and HTTP Connection Manager) will have the same result, but the HTTP Connection Manager is probably a little easier to configure.
NOTE: This script example cannot do website logins or other manual actions!
Hello there,
ReplyDeletei found this article very interesting.
Qtn: using the "Http Connection Manager"
How would one change the "ServerUrl" dynamically in order to download several files in one go(with a loop)?
@Anonymous: You can use an expression on the connection manager. The URL can be found in the ConnectionString property. Let me know if there are any problems.
ReplyDeleteHi Joost, I am trying to update the URL using the expression manager and cycle through several files, but it keeps simply updating the same file... it seems it doesn't actually update the URL dynamically.... can you help
DeleteInstead of using Dts.Connections("myProductFile").ConnectionString
Deletetry using Dts.Connections("myProductFile").AcquireConnection(Dts.Transaction).ToString()
This will allow the use of expressions on the connection manager.
i am actually trying to change the URL like this: in a foreachloop, i want to do something like this:
Deletehttps://portal.whatever.com/XXX/account.csv,
then change to the next file download:
https://portal.whatever.com/XXX/bankAccount.csv
and then the next file
https://portal.whatever.com/XXX/CustomerAccount.csv
so in essence it is continuously changing this part:
Dim mySSISConnection As Object = Dts.Connections("HTTP Connection Manager").AcquireConnection(Nothing)
Do I make any sense?
Ok are you using the second script? If so then you need to add an expression on the HTTP connection manager to replace the hardcoded url with the variable from the foreach loop.
DeleteBut in your case I would use the first script that uses a variable instead of a connection manager (since you already need a variable in the foreach loop).
But you also need to do something with the destination, else it will always use the local file.
the expressions do not seem to overwrite the http url ... hence my frustration. I am now trying the first script, but now i need to pass the credentials. Will you give me some guidance / example, please?
DeleteHi, I did it! Thanks for your script! saved me doing 100 scripts for 100 files!
DeleteGreat article. Great script. Well documented. Just what I was looking for. You helped me a lot. Thanks!
ReplyDeleteGreat article! Two questions:
ReplyDelete1. Could this be made to work in 2005 (by using VB instead of C#)?
2. How could this be used to pass a parameter (really a POST value) to the URL?
Thanks!
@Anonymous: should be no problem. The HTTP Connection Manager already existed in 2005. I added a VB.Net version for SSIS 2008, but that should work in 2005 as well (maybe with some minor changes)
ReplyDeleteHi Jost,
ReplyDeletetop article, i m beginning to create soemthing similar to this article . just wondering is it possible to post value to the url ?
many thanks
@Anonymous: Could you contact me via the webform to explain yourself?
ReplyDeleteI am trying to Download .zip file from web using HTTP connection manager.Howevere when I try to open the donloaded file I get 'The Compressed (Zipped) Folder is invalid' error . Did anyone experience this?
ReplyDeleteIs it working when you manually download the file (without logging in or other actions)? Is the filesize as expected? What do you see when you open it in notepad? All weird chars or html/text?
DeleteYes when I manually download the file from web the zip file downloads fine and I can open the content of it and file size is as expected.
DeleteIt is only when I use script to download the zip file I am getting the error
So the size when you download manually is the same when you download it by Script Task? Let me know what you see when you open the zip (that's downloaded by the Script Task) what you see: All weird chars or html/text?
DeleteIs the zipfile on a public/internet server so I can test it myself?
Hi Joost,
DeleteAfter downloading a .ZIP file successfully, I'm getting a data conversion error on the second column with a return status value 4, 'Text was truncated or one or more characters had no match...'. when attempting to load it to a DB table. Does the .ZIP file need to be extracted before attempting to load it? If so, how would that be accomplished?
You need to unzip it first before you can load the data into a Data Flow Task. Check out the SSIS Addon menu item. There you will find an unzip task.
DeleteFirst your article is great, but I have one question. I am using the Http version of the above explanation. When the Package runs I get no errors, and the file is populated with "Virtual User is Logged in" the original file actually has data. The file is in a folder on the site called Output. When I set up the file connection I placed the file in a folder called Output locally and it has the same name as the file I wish to download. Any idea or suggestion as to how I can have it get the actual file. additionally, I am forced to run in SQL 2005 at the moment. Thanks
ReplyDeleteIs the document available without user login in and without doing any manual actions?
DeleteThanks for this very valuable article. How about including codes to access website with username and password? Would greatly appreciate it.
ReplyDeleteThere are too many different login methods to describe them all, but aunt Google can help you: https://www.google.com/search?q=c%23+login+website
DeleteThanks for the great article! I got this implemented and working for retrieving zip files, no problem.
ReplyDeleteHi, I assume my question is quite stupid but it's the first time I try using scripting in SSIS. When I'm trying to run the above script I get the following error
ReplyDeleteThe type or namespace name 'AddIn' does not exist in the namespace 'System' (are you missing an assembly reference?)
How do I add that namespace to system?
Thanks.
You're probably using SSIS 2005. This script was created with 2008. Just copy the extra imports-rows and copy the content of my main method to your main method, but keep the row where the task result is set and replace my rows (Dts.TaskResult = ScriptResults.Failure / Dts.TaskResult = ScriptResults.Success) with the one from 2005
DeleteI used to have the same "Addin" issue. It had to do with me just pasting in all the code into the Script editor. When I trimmed the code to not paste in the namespaces, and use what was generated when I opened the script editor, it worked fine.
DeleteI will add that my auto-generated namespace from SSIS /VS 2010 did not have "Addin" in the text. It had:
[Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
Hey Joost, this is great, but doesn't work for me as I'm trying to access a page that requires a log-in and gives back a cookie. Do you have any suggestions for C# code to log-in to this site? BTW, the site is Mint.com. Trying to download all my personal financial data, so my SSIS package can process all of it automatically.
DeleteCheers, John
Hi jpk,
DeleteAs noted in the bold note at the bottom, this script example cannot login for you. There are just to many ways for logging in to a website. Best way is to add a question in a C# forum.
Regards,
Joost
Hi, great article! I try to use it for excel files, but I can't get the myConnection.DownloadFile(Dts.Connections["myProductFile"].ConnectionString, true); right, because the connectionstring for a excel source is not only the file path, and I can't use ExcelFilePath. Any ide?
ReplyDeleteYou only want the part of the connectiontring after 'Source=' until the next semicolon (;). You could do that with a substring. Something like:
DeletemyConnectionString = Dts.Connections["myExcelFile"].ConnectionString.Substring(Dts.Connections["myExcelFile"].ConnectionString.IndexOf("Source=") + 6);
myConnectionString = myConnectionString.Substring(1, myConnectionString.IndexOf(";") - 1);
or you could try to get the property, something like:
Dts.Connections["myExcelFile"].Properties["ExcelFilePath"].GetValue(Dts.Connections["myExcelFile"]).ToString();
great and helpful thanks for this..
ReplyDeletePlease let me know what modification can i do on the Vb.net or c# script
in order to download multiple XLS files to the system hard drive
If you have a list of URL's in a database table. Then you could loop that table with a foreach loop. Let the foreach loop fill the DownloadURL variable and then but the script task within the foreach loop.
DeleteHi, thanks for the guide.
ReplyDeleteI've a question.
Can I use the above method for each file in a path, without knowing the name of the files?
For example all the files csv: *.csv or *.txt or....
Thank you so much
No. And in most cases the webserver won't allow folder browsing. So if you don't now the name and folder browsing isn't enabled you can't download all files.
DeleteHi Joost,
ReplyDeleteThanks for the guide. It's excellent.
i could follow your instruction to download the files via the path you provided,but, when i tried my own one, i found sth strange...
i setup the http connection with "Use credential" (user name & password inputted). test connect success...
but when i use the script to download file, it prompted error sth like "credential error/ incorrect"...
Do u hv any ideas??
Thanks so much..
Alvin
The script only gets the connectionstring (not the username and password). You have to provide the username and password to the HttpClientConnection via some .Net code. See this example on MSDN:
Deletehttp://technet.microsoft.com/en-us/library/microsoft.sqlserver.dts.runtime.httpclientconnection.serverpassword.aspx
You can get other properties from the connection manager with something like:
Dts.Connections["myHTTPconnection"].Properties["ServerUserName"].GetValue(Dts.Connections["myHTTPconnection"]).ToString();
But it would probably not work for sensitive properties like passwords. The workaround is to store the password in an SSIS variable. Here is an example for SSIS variables in a Script Task: http://microsoft-ssis.blogspot.com/2010/12/how-to-use-variables-in-script-task.html
Hi Joost,
Deletei hv already setup the username & password in connection manager, do i still need to provide the name and password to HttpClientConnection?
Dim mySSISConnection As Object = Dts.Connections("HTTP Connection Master").AcquireConnection(Nothing)
Dim myConnectionHttp As HttpClientConnection = New HttpClientConnection(mySSISConnection)
try to messagebox properties like the username to see if they are available in the Script Task.
DeleteMessageBox.Show(myConnectionHttp.ServerUserName)
I found your article very useful and was able to download the ZIP file programmatically. However, the download URL for the file is changed each month. The base path and filename are the same each month, but there is an identifier in the center of the path that changes each month. I have tried using wildcards in an expression. The download succeeds but the file seems to be corrupt. Any help would be appreciated
ReplyDeleteWildcards will not work on a website. If the identifier is an integer that is one higher each month then you could store the ID in a database. If it's a random integer of guid then you're probably out of luck.
DeleteI fear it is as you say "a random integer of guid". Below is this month's URL and the part after "files/" and the "?" is what is changing. The rest is the same each month:
Deletehttps://data.medicare.gov/views/bg9k-emty/files/CJP62BvKCE7mEG9ufmZCah9VMIm3bbgNVx_07wSgpbs?content_type=application%2Fzip%3B%20charset%3Dbinary&filename=DMG_CSV_DOWNLOAD20150801.zip
Sorry "random integer or guid"...
DeleteUrl seems to be random... Isn't there an index page where you can find that URL? Then you could read that page to find the correct download url...but probably not an easy solution.
This is the site that holds the button with the embedded URL in the prior post, "https://data.medicare.gov/data/nursing-home-compare". I do not know how to grab that dynamic path from the home page. I have searched google for vb script to do it, but no luck so far.
ReplyDeleteYou could just download this file (same way as above) then read it as a FLAT FILE with one big column. Add a Conditional Split to filter on that file name... after that you have one row left. With a Derived Column you could strip all the extras so that you only have an URL left.
DeleteNow you can either store this URL in a database and then get it out again with an Execute SQL Task to store the URL in a variable, or use a Script Component to store it in a variable. Does that sounds workable?
Yep! As soon as I read your comment, I got the first part knocked out. Now working on the find and substring as I only really need to grab the GUID portion of the string and store it in a variable. Thank You SO MUCH!
DeleteOk let me know if you got it working or if you need any help. In the PostExecute method of the Script Component you can fill a readwrite variable. Something like:
DeletePrivate URLPart As string = ""
' When all rows are processed, set variable
Public Overrides Sub PostExecute()
Me.Variables.URLPartVariable = URLPart
End Sub
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
URLPart = Row.URL
End Sub
@Joost Hello. I'm having the exact same problem as @goingtoseed.
ReplyDeleteBut my package crashes after the first iteration of the for each loop. I posted a question about it here. any help would be greatly appreciated.
http://stackoverflow.com/questions/33630358/dynamic-sourceurl-in-http-connection-manger-with-for-each-loop?noredirect=1#comment55037913_33630358
Have you tried debugging to find the actual line that is failing? I would start commenting out as much code as possible and slightly add code if it is succesful. Start by only messageboxing/showing the URL from the connection manager.
Delete@Joost : I am downloading files in iteration based on the 'LastWriteTimeUtc'. The problem now I am facing is the files which are downloaded are only 2057 KB, though they are bigger in size. But, All the files are getting downloaded. Something to deal with buffer? Any quick help will be more helpful.
ReplyDeleteNote: I am using the second method(HTTP). and the files are of .xlsx type
Hi everyone,
ReplyDeleteGreat Article. I am other side of the road. I am looking to UPLOAD the file. I am using HTTP Connection Manager for URL and Credential. Please any advise would be great appreciate.
Thank You
http connection manager throws error Error: 0xC0016003 at Package, Connection manager "HTTP Connection Manager": Connection to server
ReplyDeleteThe test connection successful. Flat file connection exist.
Hi Joost,
ReplyDeletethe downloaded zip file is invalid when i am downloading with script task.
What can be done in this scenarios to get the correct zip file which is not invalid??
Tried this for CSV file download from URL and working well
ReplyDeletehttps://docs.microsoft.com/en-us/archive/blogs/benjones/using-a-c-script-task-in-ssis-to-download-a-file-over-http
I have a requirement to post a CSV file to a URL how can this be handled in SSIS using the scripts. The size of csv is around 50MB.
ReplyDeleteI dump the data from DB to a flat file CSV format and need to post the csv file to URL.
Thanks in advance