Home About Us Services Common Problems Solutions Information Compliance Tools FAQ Contact Us

 Case Studies - Overview
 Travel Insurance Provider
 Federal Regulatory Monitoring
 Pharmaceutical – Info Exchange
 Pharmaceutical – Data Repository
 e-Commerce
 Banking: Credit Card

 PETL
 DQA
 Pricing
Major Pharmaceutical Company – Data Repository

Background

The customer utilized a vendor’s Commercial Off The Shelf (COTS) application, which physically partitions research data by creating a new set of 30 tables for each new study.  In light of the fact that this organization currently maintains data for over 300 studies, the overall database has grown to more than 10,000 tables.

Problem

We needed to develop a generic Extraction, Transformation and Loading (ETL) solution that would migrate data from the production application to a newly designed data repository.  The major challenge was that the number of source tables was dynamic – every time a new study was defined, we had an additional 30 tables to migrate.

Solution

Since both the source and target databases were Oracle, we selected PL/SQL as the primary language for developing the ETL solution.  We conducted a high level analysis of ETL tools, and found that very few could support the dynamic source table requirement.  Additionally, we determined that our project timeline was too aggressive to introduce an ETL technology without delaying the project.

Instead, we utilized dynamic SQL to populate a table which stored the list of source tables for each ETL module.  Then, each respective ETL module could query this table to identify which source tables to retrieve data from.  We then generated SQL to migrate the data, because SQL allows the movement of data in bulk.  Testing proved that processing each row sequentially caused too much system overhead. 

Additionally, we designed a comprehensive batch monitoring schema, where we could track ETL errors right down to the rows of data that caused an error.  Doing so allows for the creation of ETL routines to post-process just the rows which caused errors, eliminating the otherwise inevitable delay for a complete data set until the next load interval.

Benefits

Business Users Could Have the Reports When They Needed Them

Standardized Information Exchange Architecture Ensured Timeliness of Data

Saved the nearly $200k cost of ETL Software and Related Implementation Costs

Improved Data Quality Controls