Tuesday, December 19, 2017

The Mysteries of Data Transformation

In the beginning, long ago and far away, I thought all data integration products had embedded Transformation Engines. After all, the biggest challenge, really, is making disparate data make sense together and align appropriately, so that the data from the sources is meaningful to the target consumer.

Today, data transformation and manipulation are even more critical for Data Virtualization than for ETL, since you have to get it right in one pass. ETL has the dubious “luxury” of adding however many steps and copies along the way are needed to ease the pain. With Data Virtualization, it’s imperative that all the cleaning, aligning, filtering, and federating is in good order before you present the virtual model for querying.

Chances are you are spending a lot of time and energy preparing and dealing with the dirty details of managing messy data with your current integration or data virtualization product. That's because I was wrong about every integration platform having a transformation engine. 

What is Data Virtualization without a Transformation Engine?
Think about it a bit. Without a legitimate transformation engine, Data Virtualization can work only in a perfect world, where data has been cleaned and naturally aligns without manipulation. Maybe you can get away with format differences.
OK, so if the data has already been cleaned, you are not actually getting the data from the source, right? And, isn’t it then carrying the latency of all that housekeeping? Isn’t that counter to what DV is all about?

Of course, there are times when the best overall solution is, in fact, to prepare a clean copy of the data set and query against it. Often an ODS (Operational Data Store) is the best source to use, exactly because proven cleansing algorithms are already in place. Enterprise Enabler is the only Agile ETL™ integration platform, and it can actually do the cleaning as well as the Data Virtualization, thanks to the robust embedded Transformation Engine!

Enterprise Enabler® (EE) Transformation Engine is the Great Orchestrator
Recently, I’ve been thinking a lot about our Transformation Engine, and I’ve come to believe that it may be the single most important asset of Enterprise Enabler. When we introduce the architecture and components of the EE platform, we tend to take it for granted, unwittingly doing the Transformation Engine a disservice with a simple one-liner. In fact, the Transformation Engine (TE) is the heart and brain of all the logic and run-time processing of data throughout Data Virtualization, Agile ETL™, and all other modalities of integration. We describe it as the conductor, orchestrating and issuing instructions as configured in the metadata.

T.E.: “Hey, SAP AppComm! Bring me the data from TemplateA. Merci! Now, Salesforce AppComm, get the data from Templates. Next, let’s apply the federation and validation rules on each data set and package it as a federated, queryable data model. Oh, and while you’re at it, send that data physically, directly to the Data Warehouse. Voilà!”

Obviously, this is a simplification, and I may not have gotten the accent quite right, but that EE Transformation Engine is one smart cookie that outperforms the alternative solutions.
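To make the conductor idea a little more concrete, here is a minimal Python sketch of the conversation above. It is not Enterprise Enabler's API or metadata format. The connector functions (fetch_sap, fetch_salesforce), the METADATA layout, and all field names are hypothetical, invented purely for illustration. The point is only that the engine reads its instructions from metadata, cleans each source's data, federates the results, and delivers them both as a queryable model and as a physical copy.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

# Hypothetical stand-ins for source connectors; real ones would query live systems.
def fetch_sap() -> List[Row]:
    return [
        {"CUST_ID": "001", "NAME": " Acme Corp "},
        {"CUST_ID": "002", "NAME": "Globex"},
    ]

def fetch_salesforce() -> List[Row]:
    return [
        {"AccountNumber": "1", "AnnualRevenue": "1200000"},
        {"AccountNumber": "3", "AnnualRevenue": "50000"},
    ]

# Hypothetical metadata: which source to call, how to rename its fields,
# and which cleansing rule to apply to each target field.
METADATA: Dict[str, Dict[str, Any]] = {
    "sap": {
        "fetch": fetch_sap,
        "fields": {"CUST_ID": "customer_id", "NAME": "name"},
        "clean": {"customer_id": int, "name": str.strip},
    },
    "salesforce": {
        "fetch": fetch_salesforce,
        "fields": {"AccountNumber": "customer_id", "AnnualRevenue": "annual_revenue"},
        "clean": {"customer_id": int, "annual_revenue": float},
    },
}

def transform(source_name: str) -> List[Row]:
    """Fetch one source and apply the renaming and cleansing rules from metadata."""
    spec = METADATA[source_name]
    rows: List[Row] = []
    for raw in spec["fetch"]():
        row: Row = {}
        for source_field, target_field in spec["fields"].items():
            rule: Callable[[Any], Any] = spec["clean"][target_field]
            row[target_field] = rule(raw[source_field])
        rows.append(row)
    return rows

def federate(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Inner-join two cleaned data sets on a shared key."""
    index = {r[key]: r for r in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]

def run(warehouse: List[Row]) -> List[Row]:
    """Build the federated model and also deliver a physical copy."""
    model = federate(transform("sap"), transform("salesforce"), "customer_id")
    warehouse.extend(model)  # the "send it to the Data Warehouse" step
    return model
```

Calling run([]) with these sample rows yields a single federated record for customer 1, because that is the only customer present in both hypothetical sources after the IDs have been cleaned into integers.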

Just forget the Legacy Transformation Engines
The old-fashioned “Rube Goldberg” process found in traditional ETL products:
  • Extract a data set from one source and put it in a data store.
  • Write custom code to clean and align the data, and post it to a database.
  • Repeat with each source…
  • Invoke many separate specialized utilities, mostly limited to format conversion.

You can see that this legacy approach certainly cannot adapt to Data Virtualization, which must reach live directly into the sources and federate them en route.
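In contrast to the staged steps listed above, "federating en route" can be pictured as a single streaming pass with no intermediate data store. The following Python sketch is an assumption-laden illustration only, not how any particular product implements it. The sources, field names, and cleansing rules are all hypothetical. Rows are cleaned and joined as they flow from the sources to the consumer.

```python
from typing import Any, Dict, Iterable, Iterator

Row = Dict[str, Any]

# Hypothetical live sources, modelled as generators so nothing is staged.
def orders_source() -> Iterator[Row]:
    yield {"order_id": 10, "cust": "a-1", "amount": "19.90"}
    yield {"order_id": 11, "cust": "b-2", "amount": "n/a"}
    yield {"order_id": 12, "cust": "A-1", "amount": "5.10"}

def customers_source() -> Iterator[Row]:
    yield {"cust": "A-1", "name": "Acme"}
    yield {"cust": "B-2", "name": "Globex"}

def clean_orders(rows: Iterable[Row]) -> Iterator[Row]:
    """Align key casing and drop rows whose amount is not numeric."""
    for r in rows:
        try:
            amount = float(r["amount"])
        except ValueError:
            continue  # validation rule: skip unparseable amounts
        yield {"order_id": r["order_id"], "cust": r["cust"].upper(), "amount": amount}

def federate(orders: Iterable[Row], customers: Iterable[Row]) -> Iterator[Row]:
    """Join orders to customers while streaming; only the lookup side is held in memory."""
    names = {c["cust"]: c["name"] for c in customers}
    for o in orders:
        if o["cust"] in names:
            yield {**o, "name": names[o["cust"]]}

def virtual_view() -> Iterator[Row]:
    """Each call reaches into the sources afresh and federates in one pass."""
    return federate(clean_orders(orders_source()), customers_source())
```

Nothing is computed until a consumer iterates over virtual_view(), and each query reads the sources again, so the result reflects whatever the sources contain at that moment rather than a previously staged copy.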

What’s different about Enterprise Enabler’s Transformation Engine?
First, a couple of relevant aspects of the Enterprise Enabler platform. EE is 100% metadata driven. You never need to leave the Integrated Development Environment, since it is fully extensible to incorporate business rules, cleansing rules, formulas, and processes. Being metadata driven also means that every object is reusable and that you can make modifications in a matter of minutes, or even seconds. EE’s single platform handles Data Virtualization, Agile ETL, EAI, ESB, and any hybrid or complex integration pattern. Data workflow orchestration and a composite application designer round out the platform. This framework means that there is a global awareness during execution that enables very complex logic and processing based upon the state of any aspect of the system.

Some of the capabilities of Enterprise Enabler Transformation Engine:

The Bottom Line
EE’s Transformation Engine streamlines the configuring and processing of all data integration patterns, including Agile ETL and virtual data models, and ensures end-to-end continuity across them, providing:
  • Shorter time to value
  • Improved data quality
  • Rapid configuration
  • Re-usability, eliminating hand coding

It truly is the heart and brain of Enterprise Enabler. To learn more, make sure to check out our Transformation Engine whitepaper (here).
