Greetings all. My name is Jeff “The Prince” Holloway, and I’ve been with FNC for 10 years now, primarily as a DBA in the Systems Operations group. My area of focus has been the Extract/Transform/Load (ETL) process. Beginning with Data Transformation Services (DTS) back in the SQL 2000 days to SQL Server Integration Services (SSIS) today, I’ve helped develop and support packages to bulk create test data, populate data marts, perform administrative tasks, and other things. So today’s topic will be ETL related.
If you’re a SQL Server DBA or Developer, you’re likely familiar with SSIS, a powerful ETL tool that does many things right and a few things wrong. Think of it like that Swiss Army knife you carry, with the tiny little scissors that don’t work quite right. Want to populate your data warehouse with transactional data? Done. Want to schedule a data dump to output to a nightly flat file? Go for it. Need to perform file system operations to go along with that? Sure, it can do that too! But want to modify any of your referenced table schema after you’ve already created a package? Better start praying, doing a rain dance, and hoping it’s Tuesday, and even then you might be in for a bad time.
A few years ago, a coworker turned me on to the concept of dynamic SSIS design. The idea is, of course, to make the process as dynamic as possible while making updates to the process (such as configuration changes) as painless as possible. This is far from a new concept; after all, things like .INI configuration files for applications have been around a long time. There are multiple ways to externally pass inputs into SSIS, but our focus today will be on the “table driven” approach.
Let’s create a scenario: Pretend that you work for Magic Corp. as an ETL Developer, and you’ve been tasked with updating the data warehouse to include a few new columns. Management has demanded that we now track the height and weight over time of all unicorns in the paddock. After all, we have to make sure that they’re well fed to maximize their rainbow-producing capabilities. These columns have already been added to the source database and the warehouse, so you just have to add them to your SSIS package to get them populated. Then this happens:
Even after modifying the query in the data flow, sometimes that’s not enough because of the way SSIS stores metadata behind the scenes. Sometimes refreshing the metadata works, and sometimes not. Sometimes you can add the metadata manually in the Advanced Editor, and sometimes not. Sometimes you have to drop and recreate the entire data flow manually. Then it’s all well and good… until management demands that we track unicorn horn length in the database, and we get to do this all over again.
Luckily, I was introduced to a nice little third-party plugin that makes life a lot easier. The Cozyroc SSIS+ Library (www.cozyroc.com) contains lots of useful little nuggets, but the Data Flow Plus task is the real lifesaver. The Data Flow Task Plus component can take a query passed in through an SSIS variable and generate metadata AT RUNTIME. If you’ve used SSIS for any period of time, you understand how powerful this can be. Let’s say that we store the query for the data flow inside a configuration table in the database. After adding the schema, we can then modify the data flow WITHOUT ever opening the package again. Change the query to this, and away you go:
SELECT [UnicornID], [Name], [RainbowsProduced], [Height], [Weight]
Then when upper management needs that horn length info, one quick update statement to the value below will get you rolling:
SELECT [UnicornID], [Name], [RainbowsProduced], [Height], [Weight], [HornLength]
And that’s it. The Data Flow Plus task will automatically detect the data types and length and create your metadata for you. It’ll even create the mappings for you (caveat: the column names have to match from source to destination, so you may have to alias them in your query). Are there other ways to do this? Probably. No doubt there are many people out there with skillsets far above mine that have created their own dynamic solutions. But for me, the Cozyroc tools provided me with an easy to use solution that has saved countless man-hours of development time across the organization. Check it out if you’re so inclined.