Tuesday, August 23, 2011

Query Optimization across Apples and Oranges

I recently realized that the federated query optimization problem my colleagues and I think about is completely different from the one that academics and big database vendors have addressed so well. Even the more contemporary players in the federation and virtualization world don’t extend this concept across disparate sources, and they focus only on run-time speed, not agility.

Those approaches do not address the reality we face now that integration solutions federate everything from web services, spreadsheets, medical instruments, and social media to relational databases in a single "query." The inherent constraints of the query optimization tools on the market violate the fundamental value of Agile Integration Software (AIS).

  • What good to us is a query optimizer that assumes all of the data sources are relational databases?
  • Adding XML to the mix just doesn't "cut the mustard."
  • What if, in order to use these tools, I have to construct a universal data model that includes all of the data that could possibly be in play? (The clunky antithesis of agility!)
  • Do I have to anticipate every data query I might want to run?
  • What if a lot of transformation needs to be performed along the way to make the data meaningful across the sources?

For "pull" integration, where a user's browser interaction or a calling program triggers and specifies the data to be accessed, a SQL query is a universally comfortable way to access information. In a live virtual-federation query, the federating software must translate that query into whatever each endpoint understands, and it must synchronize the data flowing in from multiple connections as the query is fulfilled from the disparate systems. A "push" integration is usually better defined: at least the sources are pinned down ahead of time, and often the exact data sent each time is known too.

In our world, performance is a different problem from typical query optimization on or across relational databases. In complex cross-application joins, the critical path often depends more on the I/O speed of one of the applications, the frequency at which data is disbursed, or some other macro factor. The join and access order logic, for example, can be tuned to accommodate the highest resource consumer.
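To make the last point concrete, here is a minimal Python sketch of one way to tune access order around the most expensive source. It is a swapped-in, semijoin-style heuristic, not the AIS optimizer: it assumes every source can be fetched by a set of join keys, and the `Source` class, the `cost_per_key` estimates, and the `crm`/`erp`/`svc` endpoints are all hypothetical. Cheaper sources are queried first to shrink the key set, so the costliest source is asked for as little as possible.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set

Row = Dict[str, object]

@dataclass
class Source:
    name: str
    cost_per_key: float  # hypothetical cost estimate (e.g. ms) to fetch data for one key
    fetch: Callable[[Set[str]], Dict[str, Row]]  # returns rows keyed by join key

def plan_order(sources: List[Source]) -> List[Source]:
    # Cheapest sources first, so the most expensive one is queried last,
    # when the set of surviving join keys is smallest.
    return sorted(sources, key=lambda s: s.cost_per_key)

def federated_join(sources: List[Source], candidate_keys: Iterable[str]) -> Dict[str, Row]:
    keys = set(candidate_keys)
    merged: Dict[str, Row] = {k: {} for k in keys}
    for src in plan_order(sources):
        rows = src.fetch(keys)
        # Inner join: keep only the keys this source could supply.
        keys = {k for k in keys if k in rows}
        merged = {k: {**merged[k], **rows[k]} for k in keys}
    return merged

# --- Usage with in-memory stand-ins for three hypothetical endpoints ---
log = []

def table_source(name: str, cost: float, table: Dict[str, Row]) -> Source:
    def fetch(keys: Set[str]) -> Dict[str, Row]:
        log.append((name, sorted(keys)))  # record which keys each source was asked for
        return {k: table[k] for k in keys if k in table}
    return Source(name, cost, fetch)

crm = table_source("crm", 1.0, {"a": {"region": "EU"}, "b": {"region": "US"}, "c": {"region": "EU"}})
erp = table_source("erp", 5.0, {"a": {"credit": 100}, "c": {"credit": 50}})
svc = table_source("svc", 200.0, {"a": {"score": 7}, "b": {"score": 3}, "c": {"score": 9}})

result = federated_join([svc, erp, crm], ["a", "b", "c", "d"])
print(log)  # the expensive "svc" source is asked only for ["a", "c"]
print(result)
```

A real federation engine would also weigh per-call latency, batching limits, and how often each source refreshes its data; a single per-key cost is just the simplest stand-in for "the highest resource consumer."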

So you can see that our problem is not the same one. When people ask us about query optimization, we are sometimes talking apples and oranges!




Friday, August 5, 2011

The Illusion of Pre-Built Adapters

Why do people continue to fall for the idea of "pre-built" adapters? I guess that's pretty obvious. Anything you really want to believe in, you can. Unfortunately, it doesn’t follow that believing in something makes it so.

Dick: OK, guys, have you figured out how we're going to get this Salesforce/SAP integration done in time for me to meet the VP's deadline?

Harry: I've been online all week studying the possibilities. I saw Adapters from three companies that look really good.

Dick: Come on, we've been down this road before.

Harry: Right, but things have changed! The latest Adapters work immediately off the shelf! Let me show you the videos on the one that looks like it has the most customers… (beep... "Hello - Welcome to Something SOA Great's web site. I am about to show you the latest thing since…")

Harry and Dick watch, enthralled. Tom stands behind them with a frown, rolling his eyes.

Dick: If I hadn't seen it, I wouldn't believe it.

Tom: Hmm. I've seen it and I don’t believe it.

Harry: Don’t be obstructionist. You just saw that SSG's Adapter automatically connected to both Salesforce and SAP. All the mapping is already built in, so we don't even have to know what the data fields are. You know what that means: we don’t have to deal with those know-it-all data analysts.

Dick: We could just download it and be off to the races, making the deadline with time to spare.

Tom: And what if we need to use a custom field in Salesforce?

Harry: Didn't you see that they have 10,000 Adapters in their library? And fifty different versions of this one, so we can look for the closest fit. Then we can tweak it just a little bit to fit what we need. They said they have tools for that.

Dick: Let's do it!

Tom: I need a vacation. Have fun.

So Harry downloaded the Adapter to his desktop.

Harry: Here we go! I'll install it here and get it up and running.

Adapter: /very faint chuckle/

Harry doesn’t hear. He’s reading the on-screen instructions.

Harry: OK, I'm connecting to SAP.

Two weeks later

Harry: Now I'm connecting to Salesforce.

Two weeks later

Harry: I think I'm going crazy. I keep hearing this noise that's getting louder every day. But I digress. Here we go - let me try running this beast.

Adapter: BANG! CRASH! HA! HA! HA! /hysterical laughter that can be heard all the way to the VP's office/

Tom is back from a month’s vacation overseas; he runs to Harry’s cube to see what's going on.

Tom: AARGH! What's going on here? ... Oh, no! The Adapter is squirting SAP data out the port all over the desk!

Dick: /loudly/ Not again! Everyone to their stations! Call 911! Call the auditors! Call OSHA!

Tom: Unplug something before someone drowns in this big pile of SAP Data.

As the VP arrives at the scene, a cloud forms near the ceiling, creeping out into the hallway. A final guffaw from the Adapter, and the light mist of Salesforce data turns into a terrible storm.

------------ End of Same Story, 23rd time around ---------------

What is it that we all want so very badly from Adapters?
  • Off-the-shelf solution
  • Effortless integration between two endpoints
  • No need to program complex mapping and business rules
  • No need to know the technical aspects of connecting with either endpoint
  • No need to have domain or business knowledge of either endpoint application
  • No need for data analysts to be involved
  • A perfect fit with both endpoints
What makes that impossible?
  • Almost every implementation of an endpoint is customized or changes over time
  • Your selection of source data is different from what is in the adapter
  • Your other endpoint also has been customized
  • Your business rules don’t match what's there already
What do you have to do to accommodate?
  • Write code to feed the data to the adapter the way it expects to see it (a full integration in itself!)
  • Write code to adjust the data manipulation to fit the customized endpoint
  • Open up the adapter, if possible, and add code to modify the business and mapping rules
What do pre-built adapters offer?
  • Working at most once off the shelf
  • Good experience in re-working code
  • Opportunity to practice emotion control
  • Incentive to find an alternative.

The alternative:

Connectivity must be designed so that the reusable parts are solid and can be reused for every instance of a source or destination. Decoupling the business rules from the technical rules and from the connectivity improves reusability. This is the model used by agile integration software.
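A minimal Python sketch of that separation, assuming records are plain dictionaries. The `Pipeline` class, its `read`/`mapping`/`rules`/`write` parts, the `normalize_country` rule, and the Salesforce-style field names (including the custom field `Region__c`) are hypothetical illustrations, not the design of any AIS product or any vendor API. The point is that a customized endpoint changes only the mapping, while the connectivity and the business rules stay reusable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]

@dataclass
class Pipeline:
    # Hypothetical decomposition; each part can be swapped independently.
    read: Callable[[], Iterable[Record]]     # connectivity: how to pull from the source
    mapping: Dict[str, str]                  # technical rules: source field -> target field
    rules: List[Callable[[Record], Record]]  # business rules, independent of either endpoint
    write: Callable[[Record], None]          # connectivity: how to push to the destination

    def run(self) -> int:
        count = 0
        for raw in self.read():
            # Apply the field mapping, skipping fields this source doesn't supply.
            rec = {dst: raw[src] for src, dst in self.mapping.items() if src in raw}
            for rule in self.rules:
                rec = rule(rec)
            self.write(rec)
            count += 1
        return count

def normalize_country(rec: Record) -> Record:
    # Example business rule (hypothetical): store country codes in upper case.
    return {**rec, "country": str(rec.get("country", "")).upper()}

# --- Usage: a customized source only needs a different mapping ---
source_rows = [{"Name": "Acme", "BillingCountry": "de", "Region__c": "EMEA"}]
out: List[Record] = []

pipeline = Pipeline(
    read=lambda: source_rows,
    mapping={"Name": "customer_name", "BillingCountry": "country", "Region__c": "sales_region"},
    rules=[normalize_country],
    write=out.append,
)
count = pipeline.run()
print(count, out)
```

Pointing the same pipeline at a differently customized org means editing only the `mapping` dictionary; `read`, `write`, and `normalize_country` are reused untouched.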