Background
The customer utilized a vendor’s Commercial Off The Shelf (COTS) application, which physically partitions research data by creating a new set of 30 tables for each new study. In light of the fact that this organization currently maintains data for over 300 studies, the overall database has grown to more than 10,000 tables.
Problem
We needed to develop a generic Extraction, Transformation and Loading (ETL) solution that would migrate data from the production application to a newly designed data repository. The major challenge was that the number of source tables was dynamic – every time a new study was defined, we had an additional 30 tables to migrate.
Solution
Since both the source and target databases were Oracle, we selected PL/SQL as the primary language for developing the ETL solution. We conducted a high level analysis of ETL tools, and found that very few could support the dynamic source table requirement. Additionally, we determined that our project timeline was too aggressive to introduce an ETL technology without delaying the project.
Instead, we utilized dynamic SQL to populate a table which stored the list of source tables for each ETL module. Then, each respective ETL module could query this table to identify which source tables to retrieve data from. We then generated SQL to migrate the data, because SQL allows the movement of data in bulk. Testing proved that processing each row sequentially caused too much system overhead.
Additionally, we designed a comprehensive batch monitoring schema, where we could track ETL errors right down to the rows of data that caused an error. Doing so allows for the creation of ETL routines to post-process just the rows which caused errors, eliminating the otherwise inevitable delay for a complete data set until the next load interval.
Benefits
Business Users Could Have the Reports When They Needed Them
Standardized Information Exchange Architecture Ensured Timeliness of Data
Saved the nearly $200k cost of ETL Software and Related Implementation Costs
Improved Data Quality Controls

|