A very recent survey on Big Data from Capgemini titled The Deciding Factor: Big Data & Decision Making caught my attention. Its main focus is on Big Data management and how it influences decision-making in organizations. I find it interesting because it takes the “outside-in view that will generate the biggest opportunities for differentiation” in business using Big Data insights. In short, the major findings are:
“The majority of executives believe their organizations to be “data driven”, but doubts persist.”
- “On average, respondents believe that big data will improve organisational performance by 41% over the next three years.”
- “Overall, 55% of respondents state that they feel big data management is not viewed strategically at senior levels of their organization.”
“Organisations struggle to make effective use of unstructured data for decision-making.”
- “Most of the business people are not familiar with the tools used to query unstructured data, such as text analytics and sentiment analysis.”
- “Two thirds of executives believe that there is not enough of a “big data culture” in their organisation – this is particularly notable across the manufacturing sector.”
“Although unstructured data causes unease, social media are growing in importance.” (They are used to make decisions based on customer activity.)
- “Business activity data and point-of-sale data are considered most valuable across the consumer goods & retail sector.”
- “40% of respondents believe that they have too much unstructured data to support decision-making.”
“The job of automating decision-making is far from over.”
- “60% of respondents dispute the proposition that most operational/ tactical decisions that can be automated, have been automated.”
“Organisational silos and a dearth of data specialists are the main obstacles to putting big data to work effectively for decision-making.”
- “Across all sectors, “organisational silos” are the biggest impediment to using big data for effective decision-making.”
- “85% of respondents say the issue is not about volume but the ability to analyse and act on the data in real time.”
“Perceived benefits of harnessing big data for decision-making:”
- “More complete understanding of market conditions and evolving business trends”
- “Better business investment decisions”
- “More accurate and precise responses to customer needs”
- “Consistency of decision making and greater group participation in shared decisions”
- “Focusing resources more efficiently for optimal returns”
- “Faster growth of my business (+20% per year)”
- “Competitive advantage (new data-driven services)”
- “Common basis—one true starting point for evaluation”
- “Better risk management”
Of course, there is an almost endless list of similar white papers from nearly every big company, for example:
- DataStax: Big Data: Beyond the Hype
- Oracle: Big Data for the Enterprise
- AMD: Big Data — It’s not just for Google Any More
- Research Papers on Big Data @ Greenplum
Big Data is definitely a buzzword that is here to stay, and you will hear it more and more in the coming years. So just get used to it!
There is a widespread discussion about YARN (Yet Another Resource Negotiator), the next-generation MapReduce framework (also called MRv2) that ships with the latest Hadoop 2.0 release. Curt Monash tries to give a clear perspective on the multitude of Hadoop releases in his post Hadoop YARN – beyond MapReduce.
YARN promises significant improvements in reliability, availability, scalability, backward (and forward) compatibility, predictable latency and cluster utilization. Achieving these goals required architectural and design changes, as depicted in Arun C. Murthy's YARN Architecture:
The major difference is that the responsibilities of the old JobTracker are split into two components:
- ResourceManager, a global component that manages the assignment of compute resources to applications.
- ApplicationMaster, a per-application component that manages that application's scheduling and coordination.
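To make this division of labour concrete, here is a minimal Python sketch under heavy simplifying assumptions. The classes `ResourceManager`, `ApplicationMaster` and `Container` are hypothetical toy models named after the YARN components; they are not the Hadoop YARN API (which is Java). The model assumes memory is the only resource, containers are placed first-fit across nodes, and tasks finish instantly, so each container is released right after use.

```python
from dataclasses import dataclass


@dataclass
class Container:
    """A slice of one node's memory (in MB) granted by the ResourceManager (toy model)."""
    node: str
    memory_mb: int


class ResourceManager:
    """Hypothetical global component: only tracks free capacity per node and
    hands out containers. It knows nothing about what applications run in them."""

    def __init__(self, nodes):
        # nodes: mapping of node name -> free memory in MB
        self.free = dict(nodes)

    def allocate(self, memory_mb, count):
        """Grant up to `count` containers of `memory_mb` each, first-fit over nodes."""
        granted = []
        for node in self.free:
            while len(granted) < count and self.free[node] >= memory_mb:
                self.free[node] -= memory_mb
                granted.append(Container(node, memory_mb))
        return granted

    def release(self, container):
        """Return a container's memory to its node."""
        self.free[container.node] += container.memory_mb


class ApplicationMaster:
    """Hypothetical per-application component: owns its job's task list, asks
    the ResourceManager for containers and decides which task runs where."""

    def __init__(self, name, tasks, memory_per_task):
        self.name = name
        self.pending = list(tasks)
        self.memory_per_task = memory_per_task
        self.completed = []

    def run(self, rm):
        while self.pending:
            containers = rm.allocate(self.memory_per_task, len(self.pending))
            if not containers:
                raise RuntimeError(f"{self.name}: no node can fit a single task")
            for container in containers:
                task = self.pending.pop(0)
                # Assumption: the task "runs" instantly; we only record its placement.
                self.completed.append((task, container.node))
                rm.release(container)
        return self.completed


if __name__ == "__main__":
    rm = ResourceManager({"node1": 4096, "node2": 2048})
    am = ApplicationMaster("wordcount", ["map-0", "map-1", "map-2", "map-3"], 2048)
    print(am.run(rm))
    # → [('map-0', 'node1'), ('map-1', 'node1'), ('map-2', 'node2'), ('map-3', 'node1')]
```

In this sketch the ResourceManager does nothing but capacity bookkeeping, while all job-specific scheduling logic lives in each application's own ApplicationMaster. That separation is the idea behind keeping the central component small as the number of concurrent applications grows.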
Communication between the nodes is also simplified, which allows greater scalability. A prototype built on YARN that clearly demonstrates its advantages is described in detail in PaaS on Hadoop Yarn – Idea and Prototype. Despite the many issues and failures in the current implementation, the framework will open up application fields that were not possible with the old version.