As you know, when you apply ETL to your data transformation project, data is first “extracted” from the data sources using a data extraction tool. It is then transformed by a series of transformation routines, and this transformation is largely dictated by the data format of the output. Data quality and integrity checks are performed as part of the transformation, in the data staging area. Finally, once the data is in the target format, it is loaded into the data warehouse, ready for presentation and reporting.
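The three stages described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a production pipeline: the inline CSV source, the `orders` table, and the function names are assumptions made for the example, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real source would be a file, an API, or a database.
SOURCE_CSV = """order_id,customer,amount
1,alice,10.50
2,bob,not_a_number
3,alice,4.50
"""


def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: convert rows to the target format and set aside rows
    that fail the quality checks (here, values that do not parse)."""
    clean, rejected = [], []
    for row in rows:
        try:
            clean.append(
                (int(row["order_id"]), row["customer"].title(), float(row["amount"]))
            )
        except (KeyError, ValueError):
            rejected.append(row)
    return clean, rejected


def load(conn, rows):
    """Load: write the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline():
    """Run extract, transform, and load in order; return the connection
    and any rows rejected during the transform stage."""
    conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
    clean, rejected = transform(extract(SOURCE_CSV))
    load(conn, clean)
    return conn, rejected
```

Calling `run_pipeline()` loads the two valid orders and returns the one row whose amount could not be parsed, so the rejected data can be reviewed rather than silently dropped.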
The process is often designed from the end backwards: the required output is designed first, and that design determines exactly which data is required from the source. The routines that implement the process are written specifically to produce the desired output, and only the data required for that output is included in the extraction.
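One way to picture this output-first design is to write down the target layout and derive the extraction list from it. The sketch below assumes a hypothetical report with two output columns; `TARGET_SPEC` and all field names are invented for illustration.

```python
# Hypothetical target layout: each output column lists the source fields it needs.
TARGET_SPEC = {
    "customer_name": ["first_name", "last_name"],
    "order_total": ["quantity", "unit_price"],
}


def required_source_fields(spec):
    """Derive the set of source fields to extract from the output design."""
    return sorted({field for fields in spec.values() for field in fields})


def extract_required(record, spec):
    """Keep only the source fields the output needs; everything else is ignored."""
    return {field: record[field] for field in required_source_fields(spec)}
```

With this spec, a source record that also carries a field such as `internal_notes` would have that field left behind at extraction time, because no output column depends on it.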
The business rules that define how aggregations are calculated, along with the relationships between the various entities in both the source and the target, are designed up front and then coded directly into the routines that implement the ETL process. This approach leads to tight dependencies between the routines at each stage of the process.
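As a small illustration of a business rule coded into a transformation routine, the sketch below aggregates revenue by region. The rule itself (cancelled orders do not count toward revenue) and the field names are hypothetical; the point is that the routine depends on both the shape of the source records and the definition of the output.

```python
from collections import defaultdict


def revenue_by_region(orders):
    """Aggregate order revenue per region.

    Assumed business rule: cancelled orders are excluded from revenue.
    The routine is tied to the source field names (status, region,
    quantity, unit_price), so a change in the source or in the rule
    means changing this code.
    """
    totals = defaultdict(float)
    for order in orders:
        if order["status"] == "cancelled":
            continue
        totals[order["region"]] += order["quantity"] * order["unit_price"]
    return dict(totals)
```

If the extraction stage renamed `unit_price`, or the business redefined revenue, this routine would have to change with it, which is the tight dependency described above.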
ETL makes sense when the target is a high-end data engine, such as a data appliance, Hadoop cluster, or cloud installation. In certain scenarios, a properly designed ETL pipeline that allows parallel execution can deliver drastically better performance.
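Parallel execution usually means splitting the data into partitions and transforming them at the same time. The sketch below uses Python's standard `concurrent.futures` module to do that. The doubling transformation, the chunk size, and the worker count are placeholders, and a thread pool is assumed here for simplicity; CPU-heavy transformations would more likely call for a process pool or a distributed engine.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, size):
    """Split the rows into consecutive chunks of at most `size` items."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def transform_chunk(chunk):
    """Placeholder transformation applied independently to each chunk."""
    return [value * 2 for value in chunk]


def parallel_transform(rows, chunk_size=3, workers=4):
    """Transform the chunks concurrently and reassemble them in the original order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the order the chunks were submitted.
        results = pool.map(transform_chunk, partition(rows, chunk_size))
        return [row for chunk in results for row in chunk]
```

This only works because each chunk can be transformed without looking at the others; routines with dependencies across rows need a different partitioning scheme.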
The training and development costs of ETL also need to be weighed against the need for better performance. So do your research. Ask a lot of questions. Engage a successful, seasoned team of data transformation, integration, and planning analysts. We cost less and deliver more.
Our Austin, Texas location keeps us at the epicenter of planning analytics and data-driven healthcare innovation and keeps our global capabilities and services at the forefront of our industry.
Email us at kcerny@mia-consulting.com or call us at 512.478.3848 to start a friendly, productive conversation.