Introduction

One week ago, I was asked by my current company to join a BI project. The whole project involved an ETL process, starting from an Extraction-and-Import stage up to the report creation. In the very first part of that project I needed to trim all columns of about 20 tables. Some of them actually had about 80 columns. You can imagine how frustrating it could be to do this job using only the “derived column” DataFlow component. I was supposed to add many columns and for each one to type the “TRIM” function name, specify the target columns and set “Replace column” by choosing the correct one. This process is long, boring and easy to fail because of typo, for instance). So, at first, I used a Script Task, which used reflection to navigate every string-property of the row passed as input and trim all the textual columns. The component was working very well, but I needed to edit the script task every time after copying-pasting it, because of SSIS validation mechanism. Another bigger problem was that I for each Script Component, every input column to trim must be set as “ReadWrite” mode. If I missed that, the component wouldn’t fire any error or warning, it would just do nothing. Basically, this is a potential point of failure for a SSIS designer.

I ended up to spend some of my (very rare) spare time to develop a custom DataFlow component which is able to trim all text values, leaving the other ones as they are.

In this article I’ll present how to develop a very basic custom SSIS DataFlow component and, more important, how to integrate it into Visual Studio. More precisely I’ll show how to design a PipelineComponent in a VisualStudio 2010 – MS SQL BISD environment. Since it took more time than I was expecting to understand everything about VS2010 and MSSQL 2012 integration, I will share what I’ve learned from this adventure.

Last edited Apr 25, 2014 at 2:11 PM by webking, version 2