There is a lot of smoke and mirrors around data science projects. Here's a systematic approach for keeping things simple.
Perhaps unfortunately for some, a good data science project begins with a lot of energy and imagination. No question, there are a lot of
technical demands in understanding the entire stack of tools, but that alone will not get you to a successful product. Instead, the real
fuel for the fire is a keen curiosity and enthusiasm for discovering something new with data.
Another thing to keep in mind here is that this step of the process is largely marketing. I don't mean that you must doll up all of your developers
for their "close-up" or wine and dine any potential customers. While this could be part of the process, I mean that you must focus on understanding
who your customers are and think about what they're most interested in. However much your technical staff loathes the business side of analytics,
embracing your customer and working to build an analytical solution they will use is critical to success.
Once you have identified your customer in detail, the long process of
bringing a project to fruition begins. This is really where the pipeline begins. While understanding your customer is central to a good project, I
like to think that it's not necessarily part of the pipeline. It's more that your pipeline is being bathed in customers. Each step in this
process presents new challenges for managing your customer and brand.
Getting the data. This is by far the longest part of most projects. It's an unfortunate reality for most teams as they struggle to gather data.
Data quality. Any analytics or modeling done on the data depends on the data being of good quality. The more important the decisions to be made with this data, the more important this step is.
Reporting. Business intelligence: creating PowerPoints, dashboards, and database views. The selection depends on the customer's interests.
Advanced analytics. This is what most people refer to as data science. It's about taking a mathematical approach to finding what you need in the data.
Each of these steps is iterative. I recommend prioritizing from the top down (i.e. Getting the data to Advanced Analytics).
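As a rough illustration of what checking data quality can look like before any reporting or modeling, here is a minimal sketch. It measures only completeness (the share of rows where a field is filled in), which is just one of many possible quality checks. The records, the field names, and the `completeness` and `quality_report` helpers are all hypothetical and exist only for this example.

```python
from typing import Any, Dict, List

# Hypothetical records; the field names are assumptions for illustration.
records = [
    {"customer_id": 1, "amount": 25.0},
    {"customer_id": 2, "amount": None},
    {"customer_id": None, "amount": 12.5},
    {"customer_id": 4, "amount": 40.0},
]


def completeness(rows: List[Dict[str, Any]], field: str) -> float:
    """Return the fraction of rows where `field` is present and not None."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(field) is not None)
    return filled / len(rows)


def quality_report(rows: List[Dict[str, Any]], fields: List[str]) -> Dict[str, float]:
    """Map each field name to its completeness score."""
    return {field: completeness(rows, field) for field in fields}


report = quality_report(records, ["customer_id", "amount"])
print(report)
```

How complete a field needs to be before the data is "good enough" is a judgment call that depends on how important the decisions resting on it are.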
A favorite model of mine is Maslow's hierarchy of needs. This model lays out a framework by which people can achieve 'self-actualization'.
At the base of this hierarchy are basic physiological and security needs (e.g. 'Is my heart beating? Is someone going to rob me in the night?')
where once you have these needs accounted for -- you then can effectively move onto other needs such as loving others and being accepted by
Similarly, I believe that the pipeline above fits into a hierarchy of needs for an effective data science product. Once you
have a steady stream of data, you can begin evaluating the quality of the information. Once you have ensured that the
data you are getting is of good quality, you can begin reporting on it. And once you have basic reporting in place, you
can begin performing advanced analytics.
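The hierarchy described here can be sketched as a simple ordering rule: work on the lowest stage that is not yet in place, even if a higher one has been started. This is only a toy model of the idea. The stage labels are shorthand for the steps discussed in this article, and the `next_stage` helper is hypothetical.

```python
from typing import Iterable, Optional

# Stage labels are shorthand for the pipeline steps, ordered bottom to top.
STAGES = ["getting the data", "data quality", "reporting", "advanced analytics"]


def next_stage(completed: Iterable[str]) -> Optional[str]:
    """Return the lowest stage not yet in place, or None if all are done."""
    done = set(completed)
    for stage in STAGES:
        if stage not in done:
            return stage
    return None


# Reporting exists, but data quality was skipped, so that gap comes first.
print(next_stage(["getting the data", "reporting"]))
```

Because each step is iterative, a stage can fall out of the completed set later on (for example, when a data source changes), and the same rule then points back down the hierarchy.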
DATA SCIENCE BRANDING
Right or wrong, people stereotype. When applied to a company, this is branding.
It is constantly happening, and how companies manage it makes the difference. At each step in the pipeline, there are opportunities for customers
to develop opinions about the final output. Taking control and delivering on all of the elements outlined above will help you gain the
This is simply a framework to consider in getting through analytical projects. In my experience, customers have pushed to speed
through some of these steps, but that often compromises the final product. Taking the time to do a quality job with each step is essential
to delivering a valuable final product and building a brand.