Friday, August 22, 2014

Cache as Cache Can

Caching is often an afterthought: you know you have a great solution, and then you start wondering about performance. Since caching is about moving data between stores that operate at different speeds, it is (or should be) an inherent feature and responsibility of any integration solution. Truly Agile Integration Software, such as Enterprise Enabler, makes it easy to configure a wide range of caching models and to adjust them as your requirements change.

Agile Integration covers everything from ETL through near-real-time, bi-directional Data Virtualization (DV), all with federation at the core, so caching can be implemented anywhere in the data flow, end to end.

The Continuum of Caching
According to Wikipedia, a cache is a “component that transparently stores data so that future requests for that data can be served faster.” I think of a cache as any data store, however static or ephemeral and however Big or small. The cached data may be in exactly its source form, perhaps to be federated on the way out; already federated into the form the endpoint needs, or into the Master Data form, ready to go on to its destination; or captured somewhere else in the flow of the data. The specific subset of data to be cached should be chosen for the greatest efficiency, minimal size, and highest reusability. The transparency lies in the fact that the destination, the consumer, or the workflow steps never need to know that the data is not all coming live from the original sources.
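
To make the transparency point concrete, here is a minimal read-through cache sketched in Python. It is a generic illustration, not Enterprise Enabler's API: `ReadThroughCache` and the `fetch_customer` source are hypothetical. The consumer calls `get()` the same way whether the value is already cached or has to be fetched from the source.

```python
from typing import Any, Callable, Dict, Hashable


class ReadThroughCache:
    """Wraps a source's fetch function behind a single get() call.

    Consumers call get(key) the same way every time, so they never need
    to know whether the value came from the cache or live from the source.
    """

    def __init__(self, fetch_from_source: Callable[[Hashable], Any]):
        self._fetch = fetch_from_source
        self._store: Dict[Hashable, Any] = {}

    def get(self, key: Hashable) -> Any:
        if key not in self._store:              # miss: go to the original source
            self._store[key] = self._fetch(key)
        return self._store[key]                 # hit: serve the stored copy


# Hypothetical source standing in for a backend application.
source_calls = []


def fetch_customer(customer_id):
    source_calls.append(customer_id)            # record every trip to the source
    return {"id": customer_id, "name": f"Customer {customer_id}"}


customers = ReadThroughCache(fetch_customer)
first = customers.get(42)
second = customers.get(42)
print(first == second, len(source_calls))  # True 1
```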

This is where data federation and Data Virtualization add flexibility to caching. Agile Data Virtualization supports a cache as one of its sources, so DV may be used to create the cache, whether in memory, on disk, or in a database, and that cache can then serve as one source in a federation that is delivered either on demand or when triggered by an event.
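
A minimal sketch of a cache acting as one source in a federation, assuming a hypothetical in-memory `customer_cache` of slowly changing reference data and a hypothetical `fetch_live_orders()` that stands in for a live query. This is plain Python rather than Enterprise Enabler configuration; the point is only that the federated result mixes cached and live data without the consumer seeing the difference.

```python
from typing import Dict, List

# Hypothetical cached source: customer reference data loaded earlier
# (in memory here; it could equally live on disk or in a database).
customer_cache: Dict[int, dict] = {
    1: {"name": "Acme", "region": "West"},
    2: {"name": "Globex", "region": "East"},
}


def fetch_live_orders() -> List[dict]:
    """Stands in for a live query against a transactional system."""
    return [
        {"order_id": 100, "customer_id": 1, "amount": 250.0},
        {"order_id": 101, "customer_id": 2, "amount": 75.5},
        {"order_id": 102, "customer_id": 1, "amount": 30.0},
    ]


def federated_orders() -> List[dict]:
    """Federate live orders with cached customer data on customer_id."""
    result = []
    for order in fetch_live_orders():
        customer = customer_cache.get(order["customer_id"], {})
        result.append({**order, **customer})
    return result


# The same function could be called on demand or from an event handler.
for row in federated_orders():
    print(row)
```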

Today, most people think of a cache as being refreshed rather than accumulating a history; however, with all the options that can be configured, accumulating history is a realistic and sometimes useful consideration. The possible combinations are clearly numerous enough that one must be careful not to get tangled up and lose sight of the original objectives of caching!
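
The two styles can be sketched as a single hypothetical `SnapshotCache` class with a `mode` switch: `refresh` keeps only the latest snapshot, while `accumulate` appends each load to build a history. It is an illustration under those assumptions, not a product feature.

```python
from datetime import datetime, timezone
from typing import List, Tuple


class SnapshotCache:
    """Holds snapshots of a data set in one of two modes.

    mode="refresh":    each load replaces the previous snapshot.
    mode="accumulate": each load is appended, building a history.
    """

    def __init__(self, mode: str = "refresh"):
        if mode not in ("refresh", "accumulate"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.snapshots: List[Tuple[datetime, list]] = []

    def load(self, rows: list) -> None:
        entry = (datetime.now(timezone.utc), list(rows))  # timestamped copy
        if self.mode == "refresh":
            self.snapshots = [entry]        # discard what was there before
        else:
            self.snapshots.append(entry)    # keep every load as history

    def latest(self) -> list:
        return self.snapshots[-1][1] if self.snapshots else []


refreshing = SnapshotCache("refresh")
history = SnapshotCache("accumulate")
for batch in (["a"], ["a", "b"], ["a", "b", "c"]):
    refreshing.load(batch)
    history.load(batch)

print(len(refreshing.snapshots), len(history.snapshots))  # 1 3
print(history.latest())  # ['a', 'b', 'c']
```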

One could easily argue that caching is more like ETL than like Data Virtualization; however, DV often requires caching more than other integration patterns do, since its consumers generally expect rapid, “live” data without latency. When the rubber meets the road, caching is in many situations the only way to ensure that a DV solution with many users does not bring the source applications “to their knees.” This is why Agile Integration Software, which combines all the integration patterns, solves Data Virtualization problems better than pure DV platforms.
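
One common technique for keeping many simultaneous users from flooding a source is request coalescing (sometimes called “single-flight”): when several requests miss the cache for the same key at once, only one of them queries the source and the rest wait for its result. The Python sketch below is a generic illustration of that technique, not how Enterprise Enabler implements caching; `CoalescingCache` and `slow_source` are hypothetical.

```python
import threading
import time
from typing import Any, Callable, Dict, Hashable


class CoalescingCache:
    """Read-through cache that lets only one thread fetch a missing key.

    Concurrent requests for the same key wait for that single fetch
    instead of each sending their own query to the source.
    """

    def __init__(self, fetch: Callable[[Hashable], Any]):
        self._fetch = fetch
        self._values: Dict[Hashable, Any] = {}
        self._key_locks: Dict[Hashable, threading.Lock] = {}
        self._guard = threading.Lock()          # protects _key_locks

    def _lock_for(self, key: Hashable) -> threading.Lock:
        with self._guard:
            return self._key_locks.setdefault(key, threading.Lock())

    def get(self, key: Hashable) -> Any:
        if key in self._values:                 # fast path: already cached
            return self._values[key]
        with self._lock_for(key):
            # Another thread may have filled the entry while we waited.
            if key not in self._values:
                self._values[key] = self._fetch(key)
            return self._values[key]


# Hypothetical slow source that counts how often it is queried.
source_hits = 0
hits_lock = threading.Lock()


def slow_source(key):
    global source_hits
    with hits_lock:
        source_hits += 1
    time.sleep(0.05)                            # simulate an expensive backend query
    return f"report for {key}"


cache = CoalescingCache(slow_source)
users = [threading.Thread(target=cache.get, args=("daily-sales",)) for _ in range(50)]
for t in users:
    t.start()
for t in users:
    t.join()
print(source_hits)  # 1
```

With 50 concurrent requests for the same report, the source is queried once; without the cache, each request would reach the source.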

What do you need to determine before you configure caching?
·         Which data to cache
·         Why you selected this particular data for caching
·         Where to cache – memory, disk, database, etc.
·         How often to refresh – schedule, event, as soon as available
·         Where in its path to cache – directly from source, partially processed, before or after federation, endpoint ready, as part of a Master Data definition
·         When to release from cache – as soon as read, or as soon as a particular set of consumers has read
·         Whether the cache is subject to bi-directional data flow
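
The checklist above can be captured as a single configuration record, so each decision is explicit and reviewable. The sketch below is hypothetical Python (`CachePlan` and its enums are illustrative names, not Enterprise Enabler settings); it also shows one way to express the release rule “as soon as a particular set of consumers has read.”

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional


class Location(Enum):
    MEMORY = "memory"
    DISK = "disk"
    DATABASE = "database"


class RefreshTrigger(Enum):
    SCHEDULE = "schedule"
    EVENT = "event"
    AS_AVAILABLE = "as_available"


class Stage(Enum):
    RAW_SOURCE = "raw_source"
    PARTIALLY_PROCESSED = "partially_processed"
    BEFORE_FEDERATION = "before_federation"
    AFTER_FEDERATION = "after_federation"
    ENDPOINT_READY = "endpoint_ready"
    MASTER_DATA = "master_data"


@dataclass
class CachePlan:
    data: str                       # which data to cache
    rationale: str                  # why this data was selected
    location: Location              # where to cache
    refresh: RefreshTrigger         # how often to refresh
    stage: Stage                    # where in the data path to cache
    release_after: FrozenSet[str] = frozenset()  # consumers that must read first
    release_on_first_read: bool = False
    bidirectional: bool = False     # do writes flow back to the source?
    refresh_seconds: Optional[int] = None        # used only with SCHEDULE

    def __post_init__(self):
        if self.refresh is RefreshTrigger.SCHEDULE and not self.refresh_seconds:
            raise ValueError("a scheduled refresh needs refresh_seconds")

    def can_release(self, consumers_who_read: set) -> bool:
        """True once the release rule is met; otherwise keep until refreshed."""
        if self.release_on_first_read:
            return bool(consumers_who_read)
        return bool(self.release_after) and self.release_after <= consumers_who_read


plan = CachePlan(
    data="product catalog",
    rationale="changes daily; read by every storefront request",
    location=Location.MEMORY,
    refresh=RefreshTrigger.SCHEDULE,
    refresh_seconds=24 * 3600,
    stage=Stage.ENDPOINT_READY,
    release_after=frozenset({"web", "mobile"}),
)
print(plan.can_release({"web"}))            # False
print(plan.can_release({"web", "mobile"}))  # True
```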

When should you plan to Cache?
First of all, keep in mind that if you don’t identify your caching needs up front, Agile Integration Software lets you easily add caching later, as your traffic grows and the parameters reach the point where it’s needed. Particularly when you are using Data Virtualization and hitting backend source systems live on each request, you should take a close look at your caching needs and the best approaches to meeting them. Consider caching in situations where:
·         You are concerned that too much traffic hitting mission-critical (or any) sources could degrade the performance of those systems.
·         You are concerned about the response times for end users.
·         You need the same value throughout a process in which you may access it multiple times.
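
Keeping the same value throughout a process can be sketched as a process-scoped reader: the first read of a key is kept, later reads in the same run return that same value even if the source changes, and the cache is cleared when the run ends. `live_rates`, `ProcessScopedReader`, and `process_cache` are hypothetical names used only for this illustration.

```python
from contextlib import contextmanager
from typing import Dict, Iterator

# Hypothetical live source that another system may update at any time.
live_rates: Dict[str, float] = {"EUR": 1.10, "GBP": 1.27}


class ProcessScopedReader:
    """Keeps the first value read for each key for one process run."""

    def __init__(self, source: Dict[str, float]):
        self._source = source
        self._seen: Dict[str, float] = {}

    def read(self, key: str) -> float:
        if key not in self._seen:
            self._seen[key] = self._source[key]   # first read: pin the value
        return self._seen[key]                    # later reads: same value

    def clear(self) -> None:
        self._seen.clear()


@contextmanager
def process_cache(source: Dict[str, float]) -> Iterator[ProcessScopedReader]:
    reader = ProcessScopedReader(source)
    try:
        yield reader
    finally:
        reader.clear()                            # release the cache when the process ends


with process_cache(live_rates) as rates:
    invoice_total = 100 * rates.read("EUR")       # step 1 prices the invoice
    live_rates["EUR"] = 1.25                      # the source changes mid-process
    audit_rate = rates.read("EUR")                # step 2 still sees 1.10
print(audit_rate, live_rates["EUR"])  # 1.1 1.25
```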

What to Cache?
·         Data that doesn’t need to be real-time
·         Data for which you want to ensure that the same snapshot is used for different purposes
·         Data that changes so slowly that having it in real time doesn’t matter. You could refresh the cache once an hour, once a day, or even once a month.
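
For slowly changing data, a time-to-live (TTL) on each cached entry is one simple way to implement “refresh once an hour or day.” The sketch below is a generic Python TTL cache, not a product API; `TTLCache` and `fetch_price_list` are hypothetical, and the clock is injectable so the example can simulate the passage of time.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Read-through cache whose entries expire after ttl_seconds.

    With a long TTL (an hour, a day, a month), the source is queried at
    most once per key per interval, however many reads arrive.
    """

    def __init__(self, fetch: Callable[[Hashable], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is None or now - entry[0] >= self._ttl:
            entry = (now, self._fetch(key))     # missing or stale: refresh
            self._entries[key] = entry
        return entry[1]


# Simulated clock and a hypothetical source that counts its fetches.
fake_now = [0.0]
fetches = []


def fetch_price_list(region):
    fetches.append(region)
    return f"price list for {region} (v{len(fetches)})"


HOUR = 3600
prices = TTLCache(fetch_price_list, ttl_seconds=HOUR, clock=lambda: fake_now[0])
prices.get("EU")            # fetched from the source
fake_now[0] = 30 * 60
prices.get("EU")            # 30 minutes later: served from the cache
fake_now[0] = HOUR + 1
latest = prices.get("EU")   # past the TTL: refreshed from the source
print(len(fetches), latest)  # 2 price list for EU (v2)
```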

Agile Caching
Agile Integration Software offers a wide range of caching options and makes it easy to configure even complex caching patterns without custom programming. With the ability to select full data sets or specific fields, to mix in-memory and on-disk caching, and to use any combination of these, including conditional, full workflow-driven caches, great architecture doesn’t have to be constrained by what is practical to implement.
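
Mixed in-memory and on-disk caching can be sketched as a two-tier cache: a small least-recently-used (LRU) tier in memory, with entries evicted from memory spilling to a disk tier (here Python's standard `shelve` module). `TwoTierCache` and its `memory_capacity` parameter are hypothetical names for illustration; a real deployment would also need refresh rules and size limits on the disk tier.

```python
import os
import shelve
import tempfile
from collections import OrderedDict
from typing import Any


class TwoTierCache:
    """Small in-memory LRU tier backed by an on-disk tier (shelve).

    Hot entries stay in memory; entries evicted from memory are kept on
    disk, so a later read is slower than memory but avoids the source.
    """

    def __init__(self, path: str, memory_capacity: int = 2):
        self._memory: "OrderedDict[str, Any]" = OrderedDict()
        self._capacity = memory_capacity
        self._disk = shelve.open(path)

    def put(self, key: str, value: Any) -> None:
        self._disk.pop(key, None)                # drop any stale disk copy
        self._memory[key] = value
        self._memory.move_to_end(key)            # mark as most recently used
        if len(self._memory) > self._capacity:
            old_key, old_value = self._memory.popitem(last=False)
            self._disk[old_key] = old_value      # spill the least recently used entry

    def get(self, key: str) -> Any:
        if key in self._memory:
            self._memory.move_to_end(key)
            return self._memory[key]
        if key in self._disk:
            value = self._disk[key]
            self.put(key, value)                 # promote back to memory
            return value
        raise KeyError(key)

    def where(self, key: str) -> str:
        if key in self._memory:
            return "memory"
        return "disk" if key in self._disk else "absent"

    def close(self) -> None:
        self._disk.close()


with tempfile.TemporaryDirectory() as tmp:
    cache = TwoTierCache(os.path.join(tmp, "cache"), memory_capacity=2)
    for k in ("a", "b", "c"):
        cache.put(k, k.upper())
    before = (cache.where("a"), cache.where("c"))   # 'a' was spilled to disk
    promoted = cache.get("a")                       # read back from disk
    after = (cache.where("a"), cache.where("b"))    # 'b' is now the one on disk
    cache.close()
print(before, promoted, after)
```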
