Caching is often an afterthought: you know you have a great solution, and
then you start wondering about performance. Since caching is about moving data
at varying speeds, it is (or should be) an inherent feature and responsibility
of any integration solution. You will find that truly Agile Integration
Software, such as Enterprise Enabler, makes it easy to configure a wide range
of caching models, and to adjust them as your requirements change.
Agile Integration covers everything from ETL through near-real-time,
bi-directional Data Virtualization (DV), all with federation at the core, so
caching can be implemented anywhere, end-to-end, in the data flow cycle.
The Continuum of Caching
According to Wikipedia, a cache is a “component that
transparently stores data so that future requests for that data can be served
faster.” I think of it as any data store, however static or ephemeral, however
Big or small. The cached data may be exactly in its source form, perhaps to be
federated on the way out; it may already be federated into the form the
endpoint needs or into the Master Data form, ready to go on to its destination;
or it may sit somewhere else in the flow of the data. The specific subset of
data to be cached should be chosen to ensure the greatest efficiency, minimal
size, and highest reusability. The transparency comes in because, in the big
scheme of things, the destination, the consumer, or the workflow steps never
need to know that the data is not all coming live from the original sources.
This is where data federation and Data Virtualization add to the flexibility
of caching. Agile Data Virtualization supports cache as one of the sources, so
DV can be used to create the cache, whether in memory, on disk, or in a
database, and that cache can then be used as one source in a federation that is
delivered either on demand or event-triggered.
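To make the idea of "cache as one source in a federation" concrete, here is a minimal sketch in Python. It is not how Enterprise Enabler or any particular product does it; the names `customer_cache`, `fetch_live_orders`, and `federate`, and the sample records, are all hypothetical. It only shows a cached reference data set being joined, on demand, with rows fetched live from another source.

```python
from typing import Any, Dict, List

# Hypothetical cached source: customer reference data loaded earlier
# (in memory here; it could equally live on disk or in a database).
customer_cache: Dict[int, Dict[str, str]] = {
    1: {"name": "Acme", "region": "West"},
    2: {"name": "Globex", "region": "East"},
}


def fetch_live_orders() -> List[Dict[str, Any]]:
    """Stand-in for a live call to a backend order system (assumed data)."""
    return [
        {"order_id": 100, "customer_id": 1, "total": 250.0},
        {"order_id": 101, "customer_id": 2, "total": 75.5},
        {"order_id": 102, "customer_id": 3, "total": 10.0},
    ]


def federate(
    orders: List[Dict[str, Any]],
    customers: Dict[int, Dict[str, str]],
) -> List[Dict[str, Any]]:
    """Join live orders with cached customer data on customer_id.

    Orders whose customer is missing from the cache are kept, with the
    customer fields set to None, so the consumer still sees every order.
    """
    result = []
    for order in orders:
        customer = customers.get(order["customer_id"], {})
        result.append(
            {
                **order,
                "customer_name": customer.get("name"),
                "region": customer.get("region"),
            }
        )
    return result


# On-demand delivery: the consumer cannot tell which fields were cached.
federated = federate(fetch_live_orders(), customer_cache)
```

The consumer of `federated` sees one combined result and never needs to know that the customer fields came from a cache while the order fields came live from the source.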
Today, most people talk about cache as being refreshed as
opposed to accumulating a history; however, with all the options that can be
configured, accumulating a history is actually a realistic and sometimes useful
consideration. You can see that the possible combinations are many, clearly
enough that one must be careful not to get tangled up, and not to lose sight of
the original objectives of caching!
One could easily argue that caching is more like ETL than
like Data Virtualization; however, DV often requires caching more than other
integration patterns do, since its users generally expect rapid, “live” data
without latency. When the rubber meets the road, in many situations, caching is
the only way to ensure that a DV solution with many users does not bring the
source applications “to their knees.” This is why Agile Integration Software,
which combines all the integration patterns, solves Data Virtualization
problems better than pure DV platforms.
What do you need to determine before you configure caching?
· Which data to cache
· Why you selected this particular data for caching
· Where to cache – memory, disk, database, etc.
· How often to refresh – on a schedule, on an event, or as soon as new data is available
· Where in its path to cache – directly from source, partially processed, before or after federation, endpoint ready, or as part of a Master Data definition
· When to release from cache – as soon as read, or as soon as a particular set of consumers have read
· Whether the cache is subject to bi-directional data flow
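The questions above can be written down as a small, explicit plan before any tool is configured. The following Python sketch is only an illustration of that checklist, not the configuration model of any real product; the class `CachePlan`, the enum names, and the example values are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Store(Enum):
    MEMORY = "memory"
    DISK = "disk"
    DATABASE = "database"


class Refresh(Enum):
    SCHEDULE = "schedule"
    EVENT = "event"
    AS_AVAILABLE = "as soon as available"


class Stage(Enum):
    SOURCE = "directly from source"
    PARTIAL = "partially processed"
    BEFORE_FEDERATION = "before federation"
    AFTER_FEDERATION = "after federation"
    ENDPOINT_READY = "endpoint ready"
    MASTER_DATA = "part of a Master Data definition"


class Release(Enum):
    AFTER_READ = "as soon as read"
    AFTER_CONSUMERS_READ = "after a particular set of consumers have read"


@dataclass(frozen=True)
class CachePlan:
    """One answer per question in the checklist (hypothetical structure)."""

    data: str            # which data to cache
    reason: str          # why this data was selected
    store: Store         # where to cache
    refresh: Refresh     # how often to refresh
    stage: Stage         # where in its path to cache
    release: Release     # when to release from cache
    bidirectional: bool = False  # is the cache subject to bi-directional flow

    def unanswered(self) -> List[str]:
        """Return the free-text questions that were left blank."""
        return [
            name for name in ("data", "reason")
            if not getattr(self, name).strip()
        ]


# Example plan with made-up values.
plan = CachePlan(
    data="customer reference data",
    reason="read on every request, changes rarely",
    store=Store.MEMORY,
    refresh=Refresh.SCHEDULE,
    stage=Stage.AFTER_FEDERATION,
    release=Release.AFTER_CONSUMERS_READ,
)
```

Writing the plan down this way forces each question to get an answer, and `plan.unanswered()` flags the ones that were skipped.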
When should you plan to Cache?
First of all, keep in mind that if you don’t identify your
caching needs up front, Agile Integration Software lets you easily add caching
as your traffic grows and the parameters get to the point where it’s needed.
Particularly when you are using Data Virtualization and are hitting backend
source systems live at each request, you should take a close look at the needs
for, and best approaches to, caching. You should consider caching in situations
where:
· You are concerned that too much traffic hitting mission-critical sources, or any sources, could adversely impact the performance of those systems.
· You are concerned about the response times for end users.
· You need to have the same value throughout a process where you might be accessing it multiple times.
What to Cache?
· Data that doesn’t need to be real-time
· Data for which you want to ensure that the same snapshot is used for different purposes
· Data that changes so slowly that having it real-time doesn’t matter. You could refresh the cache once an hour, once a day, or even once a month.
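For slowly changing data, a scheduled refresh can be as simple as remembering when the data was last loaded. The Python sketch below assumes a hypothetical `RefreshingCache` class and a caller-supplied `loader` function standing in for the real source; it is one simple way to do it, not a description of any product's refresh mechanism.

```python
import time
from typing import Any, Callable, Optional


class RefreshingCache:
    """Serve one snapshot until it is older than max_age_seconds.

    `loader` is a hypothetical function that reads from the real source.
    `clock` is injectable so the behavior can be checked without waiting.
    """

    def __init__(
        self,
        loader: Callable[[], Any],
        max_age_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._max_age = max_age_seconds
        self._clock = clock
        self._value: Any = None
        self._loaded_at: Optional[float] = None

    def get(self) -> Any:
        """Return the cached snapshot, reloading it first if it is stale."""
        now = self._clock()
        stale = (
            self._loaded_at is None
            or now - self._loaded_at >= self._max_age
        )
        if stale:
            # Only this branch touches the source system.
            self._value = self._loader()
            self._loaded_at = now
        return self._value
```

With `max_age_seconds=3600`, every consumer that calls `get()` within the same hour receives the same snapshot, and the source system is hit at most once per hour, which addresses both the "same snapshot" and the "slowly changing" cases in the list above.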
Agile Caching
Agile Integration Software offers a wide range of options
for caching, with ease of configuring even complex caching patterns without
custom programming. With the ability to select full data sets, specific fields,
mixed in-memory and on-disk caching, and all combinations, including
conditional, full workflow-driven caches, great architecting doesn’t have to be
constrained by what is practical to implement.