“Virtualization” is everywhere but nowhere. The term is virtually ubiquitous. The first use I remember, once computers got into the picture, was “virtual reality,” when we computationally rendered 3D worlds in 2D, complete with lighting models and all that. The good news is that all those complicated algorithms are now encapsulated and put to cool uses by beginner gamers. In those days, we had to calculate every pixel ourselves. But I digress.
First, let’s clarify that “virtualizing data” means putting it in the cloud or elsewhere in order to eliminate some of the hassles of its existence and maintenance. That has nothing to do with data virtualization, a term that I believe is still evolving.
Data Virtualization, according to Rick van der Lans, who literally wrote the book, is “the technology that offers data consumers a unified, abstracted, and encapsulated view for querying and manipulating data stored in a heterogeneous set of data stores.”** As the discipline matures, he is expanding his view, as in his new white paper, Creating an Agile Data Integration Platform Using Data Virtualization. Definitely recommended reading.
The “unified, abstracted, and encapsulated view” from his original definition is, in my opinion, the core concept of data virtualization. In other words, there is a mechanism to bring together, or “federate,” data from many sources virtually, in a way that is useful. This means that the data is federated without creating a physical or cached staging database; it is aligned, transformed, and made available for use. So, for example, you may have a SharePoint BCS application that needs data from SAP, Oracle, and Salesforce.com. Data virtualization provides a mechanism to merge all of that data into the form the end user needs for interaction in SharePoint. The data is federated “on the fly” and delivered virtually to a web page, on demand, upon each refresh of the screen. Think about the security of the backend data that has been accessed… it never actually moves from its original source!
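To make that concrete, here is a minimal sketch in Python of on-the-fly federation, assuming hypothetical adapter functions for each backend (a real product would use actual SAP, Oracle, and Salesforce connectors; every name here is illustrative):

```python
# Minimal sketch of on-the-fly federation. The three fetch_* functions
# are hypothetical stand-ins for real source adapters.

def fetch_sap_contact(customer_id):
    # Placeholder for a lookup through an SAP adapter.
    return {"customer_id": customer_id, "name": "Acme Corp"}

def fetch_oracle_orders(customer_id):
    # Placeholder for a query against an Oracle order table.
    return [{"order_id": 101, "total": 250.0}]

def fetch_salesforce_owner(customer_id):
    # Placeholder for a Salesforce account-owner lookup.
    return {"account_owner": "J. Smith"}

def federated_customer_view(customer_id):
    """Merge live results from all three sources into one record.

    Nothing is staged or cached: every refresh of the consuming page
    runs these calls again, and the source data never moves.
    """
    record = fetch_sap_contact(customer_id)
    record["orders"] = fetch_oracle_orders(customer_id)
    record.update(fetch_salesforce_owner(customer_id))
    return record

print(federated_customer_view(42))
```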
Data virtualization also includes writeback to the sources (with end-user security, but that’s for another blog) so that an end user can, for example, correct his phone number or address, sending it as an update directly to the backend source. (See more examples at http://tinyurl.com/a3wkffc.)
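The writeback half can be sketched the same way. The field-to-source mapping and update call below are, again, illustrative assumptions rather than any product’s API:

```python
# Minimal sketch of writeback routing, assuming a hypothetical mapping
# from each field to the backend system of record that owns it.

FIELD_OWNER = {"phone": "sap", "address": "sap", "account_owner": "salesforce"}

def write_back(customer_id, field, value):
    """Send an update straight to the backend that owns the field."""
    owner = FIELD_OWNER[field]
    # A real product would call the adapter's update API here, subject
    # to the end user's own credentials on that source system.
    print(f"UPDATE {owner}: set {field}={value!r} for customer {customer_id}")

write_back(42, "phone", "+1-555-0100")
```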
You can see that this description expands the definition to include any sources, not just data stores, although the focus of most data virtualization products is BI, in which case that limitation makes sense. The BI view of data virtualization is usually about federating relational databases for the sole purpose of querying. The tools that were designed assuming that constraint have some difficulty accommodating the expanding definition.
In addition to evolving from federating data stores to federating any kind of disparate source, data virtualization is shedding the concept of “on-demand” only. Now federated data is available not just through web services, ADO.NET, ODBC, JDBC, and the like, but for any type of data integration, such as ETL and EAI.
In fact, it is the “data virtualization” concept of federation
that becomes the kingpin for “Convergence,” as Gartner is wont to say, of all
integration modalities in a single toolset, sharing metadata and business rules
across all.
**Rick F. van der Lans, Data Virtualization for Business Intelligence Systems, Morgan Kaufmann, 2012