How many warehouses do you think have been eliminated (or
never built) because of Amazon.com? I have no idea, but I bet it’s a big
number. Maybe the warehouses they do need are smaller, too. This is worth
reflecting on, even though it’s pretty obvious.
Why were they able to skip the warehouse in the distribution/delivery
process? They figured out that it is much more efficient to deliver the goods
directly from the source. They needed agility, and they made it happen.
It seems to me that the time has come for IT departments to
start thinking the same way about Data Warehouses (DW). It ought to be easier
to deal with electronic data than physical objects, shouldn’t it? So, what’s
the problem with this picture? Why not
go straight to the source for data when it’s needed and deliver the freshest
data where it’s needed? Now that Data Virtualization (DV) has matured, a growing number of forward-looking companies are heading in that direction.
Your data warehouse diehards will tell you something similar
to what Vincent Rainardi says on his blog (http://dwbi1.wordpress.com/2012/12/03/why-do-we-need-a-data-warehouse/): that a data warehouse is worth it because it
is:
a) Integrated
b) Consistent
c) Contains historical data
d) Tested and verified
e) Performant
He goes on to say that the reason the DW meets these
characteristics is that so much time has been invested by business analysts,
data architects, ETL Architects, ETL Developers, and testers. (Is this good?)
I believe that Data Virtualization can bring all of these characteristics to the table, with the arguable exception of historical data. But notice that the first and foremost reason given above for a data warehouse is that it provides integrated data. Perhaps going forward, Data Warehouses should be designed primarily to maintain historical data that is not being captured or maintained anywhere else. Let’s say we reduce the Data Warehouse’s role to maintaining historical data, with all other data access and movement being accomplished by Data Virtualization. That thought raises lots of flags, doesn’t it? Security, validation, moving data physically when needed, writing back when the data is federated, performance, and so on. Actually, by combining DV with other patterns, companies are addressing these requirements now.
Data Warehouses and Data Virtualization are inextricably
tied together, with clearly overlapping objectives. Now that Data Federation
and Data Virtualization are coming of age, we need to begin thinking more in
terms of the best use for each, so that we can leverage Data Virtualization
wherever it makes sense. DV adds dramatically to the agility of a company’s
infrastructure and to its capacity for informed, rapid decisions. Data
Federation, which is at the heart of DV as we commonly speak of it, can also be
applied to ETL-type data movement, eliminating the staging. So, Data Federation
can be the best way to populate today’s and tomorrow’s DW.
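To make the idea concrete, here is a minimal sketch in Python. It is only an illustration, not a description of any particular DV product: SQLite’s ATTACH DATABASE stands in for a federation layer, and the crm and billing sources, the table names, and the customer_totals helper are all hypothetical. It shows one federated query being run against the live sources on demand, and then the same query populating a history table directly, with no staging table in between.

```python
import sqlite3

# Assumption: two attached in-memory databases stand in for two
# independent operational systems. All names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS crm")
conn.execute("ATTACH DATABASE ':memory:' AS billing")

conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE billing.invoices (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO crm.customers VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO billing.invoices VALUES (?, ?)",
                 [(1, 100.0), (1, 50.0), (2, 75.0)])

# One query that joins across both sources; nothing is copied beforehand.
FEDERATED_QUERY = """
    SELECT c.name AS name, SUM(i.amount) AS total
    FROM crm.customers AS c
    JOIN billing.invoices AS i ON i.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
"""


def customer_totals():
    """Run the federated join against the live sources on every call."""
    return conn.execute(FEDERATED_QUERY).fetchall()


before = customer_totals()

# A new invoice lands in the billing source...
conn.execute("INSERT INTO billing.invoices VALUES (2, 25.0)")

# ...and the next read sees it, because the data is fetched at query time.
after = customer_totals()

# The same federated query can feed a history table directly,
# skipping the intermediate staging step of a classic ETL job.
conn.execute(
    "CREATE TABLE main.totals_history (snapshot TEXT, name TEXT, total REAL)"
)
conn.execute(
    "INSERT INTO main.totals_history "
    "SELECT 'snapshot-1', name, total FROM (" + FEDERATED_QUERY + ")"
)
history = conn.execute(
    "SELECT snapshot, name, total FROM main.totals_history ORDER BY name"
).fetchall()

print(before)
print(after)
print(history)
```

In this sketch the warehouse (the totals_history table) does the one job proposed for it, keeping snapshots over time, while everyday reads go straight to the sources. A real deployment would also have to deal with the security, validation, and performance questions raised earlier.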
Within five years, the most competitive companies will be
using predominantly agile integration for BI, BA, and transactions, with the
data warehouses focused primarily on accessibly preserving historic data. Realistically
speaking, though, there will still be many companies relying on their workhorse
Data Warehouse, and they still will have trouble calling themselves “agile.”
Let me know what you think.