The whole concept of Big Data projects can be overwhelming, though the promise is compelling. Whether you are analyzing social media data or digging through corporate data, it's not just about processing huge amounts of data. Just like any other new technology project, it is easy to get caught up in the vortex of the hype and lose the bigger picture of what's involved. You don't want to find out after the swirling starts that you are swimming in unwelcome, growing tech debt. If you understand the types of functions your solution will need to handle, you will be better equipped to select the most appropriate tools and solve the problem in ways that incur the least tech debt.
Print this out. Cut out the nine Big Data game cards. Now spread them all on a flat surface, turn off your iPod, close the door, and consider each one carefully. Pick "blue" or "red" for each, whichever best describes the data you will be dealing with. Set aside any that you really want to answer "both" or "purple."
As you handle and shuffle the cards, you will see some interdependencies across them, and perhaps you will start lining them up in processing order. If you are inclined to throw one out completely, set it aside to think about again later.
"Blue!"
Most likely not. Hopefully you have identified lots of ancillary tasks that will be necessary, tasks that make this look as much like a data integration project as a Big Data project. You will also have to deal with issues like:
· Data security
· Data transformation
· Data federation
· Data cleansing
· Data capture
· Data migration
· Data updates
· Data latency
These are all known problems, with solutions of sorts. Each of these requirements adds steps, and they are often solved by staging the data. More than likely, this exercise has you contemplating multiple stagings of the Big Data (three copies of Big Data is Big, Big, Big Data). This is a huge driver for your company to adopt agile integration software (AIS), which is all but imperative for such projects. Complementing Hadoop, AIS handles federation, inline cleansing and analytics, transformation, and other processing without multiple steps along the way. Its transformation engine works directly across multiple sources, orchestrating and merging them in their native formats rather than requiring an intermediate conversion to XML, as XSLT engines do. Secure write-back to the sources offers even more degrees of freedom in how you can think about Big Data problems.
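To make the staging point concrete, here is a minimal sketch in Python (the function names and the toy CSV source are hypothetical, not any particular AIS product) contrasting a pipeline that materializes an intermediate copy at every step with one that cleanses and transforms records inline, in a single pass:

import csv
import io


def staged_pipeline(raw_records):
    """Multi-step staging: each requirement adds another full copy of the data."""
    captured = list(raw_records)                                  # stage 1: capture
    cleansed = [r for r in captured if r.get("id")]               # stage 2: cleanse
    transformed = [{**r, "id": int(r["id"])} for r in cleansed]   # stage 3: transform
    return transformed                                            # three copies in flight


def inline_pipeline(raw_records):
    """Single pass: cleanse and transform each record as it streams through."""
    for r in raw_records:
        if not r.get("id"):                  # inline cleansing: drop bad rows
            continue
        yield {**r, "id": int(r["id"])}      # inline transformation, no staging copies


# Usage with a toy in-memory CSV "source"; a real federated pipeline would
# merge several live sources rather than one file.
sample = io.StringIO("id,name\n1,alpha\n,bad-row\n2,beta\n")
for row in inline_pipeline(csv.DictReader(sample)):
    print(row)

The staged version is easier to reason about step by step, but every stage multiplies the data you have to store and move; the inline version is the shape of pipeline the paragraph above is arguing for.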