This client tracked Accounts Receivable and income data for over 100 hospitals and healthcare providers. Data was provided on a daily basis in various formats, and was used both for driving workflow automation and analytics. Accuracy and balancing with each hospital's accounting system was as important as loading speed. And even though each hospital had different a source system, the client wanted the ability to compare hospitals against each other using an "apples-to-apples" approach. A client-based IT team was in charge of loading data, with varied levels of programming experience.
We instantly recognized that the high volume and wide variety of data formats involved meant standardization and automation were key priorities in this project. So we looked for a solution that would enable non-programmers to tackle 80 percent of the data mapping needed when setting up a new engagement — yet offer the flexibility of custom programming. Because loading speed was important, and data could be provided at any time throughout the day, we knew that the loading process would need to be able to run at any time without interfering with users.
Data Hygiene (Separate Staging and Production Database Structures):
Cimetre created a staging environment that allows for management of failed data loads without impacting users. Once data is vetted, it automatically transferrs into the production environment.
Data Provenance Tracking:
We created a framework for tracking hospital-sourced data, the source of the data, and the status of the load. Every hospital sourced data element can be tracked back to the raw source, saving troubleshooting and reconciliation times.
SSIS Package Automation:
Using a combination of .Net WCF (C#) and SSIS, Cimetre built a tool that allowed non-programmers to customize data mapping for new clients. We created a dedicated SSIS package for each client and data source — enabling users to download a templated SSIS package from an ASP.Net web application that contained all the default mappings and custom error handlers (see below). Each package is customized as needed to fit the specific needs of the client data.
Enhanced Error Logging:
Because SSIS is somewhat limited in its native ability to report specific data causing an error, we developed a custom error reporting component that captures relevant information and inserts it into the data stream, where it can be recorded into the database. Having access to the specific error details allow users to pinpoint data quality issues in a fraction of the time.
Automated Loading Agent:
Our solution included a Windows Service (.Net/C#) that watches for new files dropped in a specific location and/or matching a specific naming convention, and automatically executes the customized SSIS packages. Users manage the process via a web interface, allowing them to control success/failure thresholds, outside data dependencies, and expected frequency of data delivery. Each record created through the process is tagged and traceable back to the raw data source.
Near-Real-time Data Loading:
Another SSIS solution moved data from a staging environment into a production database in small batches (reducing contention) as it became available, meeting quality rules. This package supports parallel execution against partitioned tables in a scale-out scenario.
Conformed Lookups:
Adding significant power to our client's analytic capabilities, we developed an automated system for duplication prevention and resolution of user-created duplicates using a merge tool. This Type 2, slowly changing dimension approach manages and simplifies clinical lookups for clients. The resulting data reconciliation toolset and alert system now allows users to judge the validity of data based on “staleness.”
Reporting & Analytics:
Now the client enjoys self-service-enabled BI solutions through SSRS reports (dynamic parameters, drill-through). Tenant-specific access levels allow the display of only what a data user is authorized to see.