The Clear Cloud
Pitfalls of a “Big Data” project and a simple way to overcome them
AUG 30, 2016


The world is in the age of ‘data-driven decisions’. The progression of technology for processing and storing data over the last decade has enabled us to look wide and deep, cutting across data sources and formats for hidden value. Industry and academia alike are looking to explore datasets that would probably have been ignored a few years back. The internet and social media revolution has also led to a new classification of data, known as unstructured data, which cannot be mapped or understood using existing relational data models. The evolution of sensors (IoT) and logging mechanisms has also led to data being generated faster than ever before. However, the inferential value of these datasets is still mainly extracted, once the data is prepped, using statistical and visualization techniques long known to the computing world. The only difference is that the volume, velocity and variety of data fed into the algorithms have touched ‘big data’ scale.


In 2012, Gartner defined Big Data as “high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.” The key takeaways from this definition are:

  • new forms of processing
  • enhanced decision making, insight discovery
  • process optimization

Taking a leaf out of the business books of the industry around the world, we see that around $114 billion was spent in 2015 on three transformational technologies: Mobility, Cloud and Big Data. According to IDC, the “big data technology and services market [is] growing at a compound annual growth rate (CAGR) of 23.1% over the 2014-2019 forecast period with annual spending reaching $48.6 billion in 2019”.

The areas of focus, according to IDC, are and will be:

  • Big data infrastructure – servers/compute, storage, networking, and other data-center infrastructure
  • High-performance data analysis (HPDA)
  • Big data software – information management, discovery and analytics, applications

So we see there is demand for Big Data work, and projects cutting across the above three areas will be relevant for the next couple of years. The key missing link in the actions correlated with this forecast is that business leaders, project/program managers and architects often overlook the points highlighted in Gartner’s definition and don’t ask the right questions. The business gets excited by the go-to-market strategy and jumps on the Big Data bandwagon. The end result is often orthogonal to the desired state of a Big Data/Analytics project, resulting in loss of revenue and market share. All information technology projects involve data, but not all projects qualify as Big Data.

This post explores these questions: first, it examines the pointers on which the current market makes its decisions and joins the missing links; at the end, it formulates a framework, or methodology, to evaluate whether a project is really a Big Data problem.

Missing Links:

Ultimately, the aim of any Big Data Analytics project is to create a ‘single source of truth’ by integrating disparate data sources. There is a need to focus on achieving a balance in the skills and aptitude required for a data-driven enterprise. Adding Data Scientists/Analysts/Architects is part of the puzzle, but the key is the adoption of the knowledge mined. The missing link is education and training: the project kick-off should simultaneously trigger a process to educate people and build an analytical culture within the business.


Computational brilliance and the skill to slice and dice data are very important for solving a Big Data problem, but the skill that is most often missing is domain expertise. The people who have the real insights are often lost in the crowd of data ninjas. The information technology industry has done well over the last couple of decades using traditional data technologies. As a result, there are often attempts to solve Big Data problems using them, which is a complete misfit. The acknowledgment that we need “new forms of processing” is lacking when a team starts the journey. Big Data problem solving cannot happen in silos; there has to be an organizational strategy behind it. The value mined from one business unit is not always useful to another if there is no single strategy guiding them.

Key Questions:

A Big Data application needs a multilayered architecture; similarly, it needs the right mix of multiple experts. There is a need to harness the right information assets for value, to make that value relevant to the customers, and to manage huge sets of information while innovating on new techniques to mine value.

As we all know, the industry has experts who need to play a role in each of these areas, so the first question is “Do we have the right mix of experts in the team?”, even though they may only be required in phases.


The next question is “What is the problem that is being solved?” Does it come under the category of “enhanced decision making, insight discovery or process optimization”? If not, the problem may be solvable by technologies outside the Big Data umbrella, even though the size of the data is huge. For example, if the business need is reporting on top of a month’s transactional data for a monthly review, it can easily be solved without any Big Data technology.

Data is the cornerstone of a Big Data project. The amount of data available, and estimates of its growth over time, are extremely important for the system, as are the type of data and its availability across datacenters. The key question is “How much, what and where?” Not all companies, or all business units within an enterprise, generate data of a scale and variety that can be classified as Big Data.

It is key to appraise, at the beginning, how the results of the Big Data process will be utilized. How the reports, predictions and insights are consumed directs the non-functional requirements of the system. For example, a visualization that must be built over 1 TB of data within a few minutes would require Big Data technologies. Similarly, whether the consumption happens in Real Time also plays a role. Hence the fourth question is “How are the results consumed?”

The key to any enterprise or business’s success is its profit margin and growth over time. At any point in time there are competitors, and technology plays a key role in overcoming them. “Can we see what X and Y cannot see?” is often the question business leaders ask. However, future vision always comes with a price. The question that needs to be asked is “What financial value would be generated?” Big Data projects need a lot of investment upfront, and there is a need to continuously innovate and invest to extract maximum value. The importance of Big Data in the technology roadmap of an enterprise has already been established, but the question is whether it is applicable to a particular business unit or process.

Big Data technology has, since its inception, been driven in different directions by industry and academia. This is the principal reason that there exist multiple different ways to solve the same problem. So, “Which technologies or techniques to choose?” This essentially depends not only on the overall technology strategy of the enterprise but also on the end users’ needs.



Architecture is important: when sketches are made on the board, problems are, more often than not, encountered or anticipated earlier.

Big Data system architecture is even more complex as it involves multiple layers and technologies. The non-functional requirements of the upcoming system also play an important part in determining the right architecture. A few key points that need to be judged are:

  • Storage Space
    • Cloud / In-House Datacenters: Host in a private/public/hybrid cloud vs. existing datacenters with new systems
  • Latency
    • Ingestion: Frequency (continuous/discrete) of ingestion
    • Processing: Batch vs. Real Time
    • Serving: Real Time vs. Traditional Queries
  • Security
    • Sensitive: How important is the data?
    • Proprietary: Is it owned by the company/subsidiaries?
    • Privacy
    • Control and Policies
  • Innovation
    • Avenues for adapting and inventing
  • Existing Systems
    • Are integration points defined?


Architects are the people who are supposed to plan for project continuity in case of disasters or system failures. However, given that most Big Data systems are resilient, this key aspect is often overlooked. The need is to look not only at the Big Data technologies but at the overall system as one, and plan accordingly.

Analysis of a Big Data project


According to a CBR report, 60% of Big Data projects through 2017 will fail to go beyond piloting and experimentation and will be abandoned. If a company is planning to work on a Big Data project, it is key that a process/framework is adopted to mitigate risks and identify pitfalls at the planning stage itself. The elements discussed in the previous sections play a key role in this framework, and these steps can ideally become good practices for the company and draw the path to a successful Big Data project. A series of questions has been formulated which needs to be answered, in sequence and by different groups of people, with yes/no/don’t know.

We use the simple old Excel tool to compute the points and present figures/charts on the overall and sectional data points collected. At the end, when all the data is computed, the end user can see the common areas where questions are still unanswered or unknown. The questions are divided into five distinct areas (Business, Business Users, Enterprise Architecture, Project Architecture and Technology) and are to be answered by the different people who support these functions respectively. The results help the stakeholders decide whether the project should be conceptualized at all, and whether it qualifies as a Big Data project in the first place.



Now let’s consider two distinct use cases. In the first case we find that Technology has about 80% “no” choices and Business has 70% “no” choices. This is surely a recipe for doom in the project landscape. We can safely say that this is probably not yet mature or eligible enough to be a successful Big Data project.
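The Excel-based scoring described above can also be sketched in a few lines of code. The sketch below is a minimal illustration, not the actual tool: the five area names come from this post, but the sample answers, the 50% “no” threshold and the function names (`score_area`, `assess`) are assumptions made for demonstration. It tallies yes/no/don’t know answers per area and flags areas dominated by “no”:

```python
from collections import Counter

AREAS = ["Business", "Business Users", "Enterprise Architecture",
         "Project Architecture", "Technology"]

def score_area(answers):
    """Return the share of yes/no/don't know answers for one area."""
    counts = Counter(a.lower() for a in answers)
    total = len(answers)
    return {choice: counts.get(choice, 0) / total
            for choice in ("yes", "no", "don't know")}

def assess(responses, no_threshold=0.5):
    """Flag areas where "no" answers reach the threshold; a project with
    several flagged areas is unlikely to qualify as Big Data yet."""
    flagged = []
    for area in AREAS:
        shares = score_area(responses[area])
        if shares["no"] >= no_threshold:
            flagged.append((area, shares["no"]))
    return flagged

# Hypothetical answers mirroring the first use case above:
responses = {
    "Business":                ["no"] * 7 + ["yes"] * 2 + ["don't know"],
    "Business Users":          ["yes"] * 6 + ["no"] * 4,
    "Enterprise Architecture": ["yes"] * 7 + ["don't know"] * 3,
    "Project Architecture":    ["yes"] * 8 + ["no"] * 2,
    "Technology":              ["no"] * 8 + ["yes"] * 2,
}

for area, no_share in assess(responses):
    print(f"{area}: {no_share:.0%} 'no' answers - high risk")
```

Run against these hypothetical answers, the sketch flags Business (70% “no”) and Technology (80% “no”), matching the conclusion that the project is not yet ready.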



Big Data is here to stay, and more and more businesses and industries will be using it in the future. However, most of the time, due to the peer/market pressure to jump on the bandwagon, stakeholders ignore key points which can make huge differences in terms of profit margin or market share. This post is an attempt to enlighten stakeholders about these pitfalls and to suggest a simple way to overcome, anticipate and analyze them.






Author Info

Kinnar Kumar Sen

Kinnar is a Senior Enterprise Architect at TFG, part of the Technology Office in the Engineering and R&D Services group of HCL Technologies, and has extensive experience in Big Data and Analytics-based solutions.

