Data Science Pipeline

There is a lot of smoke and mirrors around data science projects. Here's a systematic approach for keeping things simple.


Perhaps unfortunately for some, a good data science project begins with a lot of energy and imagination. No question, there are heavy technical demands in understanding the entire stack of tools, but that alone will not get you to a successful product. Instead, the real fuel for the fire is a keen curiosity and enthusiasm for discovering something new with data.

Another thing to keep in mind here is that this step of the process is largely marketing. I don't mean that you must doll up all of your developers for their "close-up" or wine and dine every potential customer. While that could be part of the process, what I mean is that you must focus on understanding who your customers are and what they're most interested in. However much your technical staff may loathe the business side of analytics, embracing your customers and working to build an analytical solution they will actually use is critical to success.

Once you have identified and profiled your customer, the long process of bringing a project to fruition begins. This is really where the pipeline starts. While understanding your customer is central to a good project, I don't think of it as part of the pipeline itself. Rather, your pipeline is bathed in customers: each step in the process presents new challenges for managing your customers and your brand.


A favorite model of mine is Maslow's hierarchy of needs. This model lays out a framework by which people can achieve 'self-actualization'. At the base of the hierarchy are basic physiological and security needs (e.g. 'Is my heart beating? Is someone going to rob me in the night?'). Once these needs are accounted for, you can effectively move on to higher needs, such as loving others and being accepted by them.

Similarly, I believe that the pipeline above forms a hierarchy of needs for an effective data science product. Once you have a steady stream of data, you can begin evaluating the quality of that information. Once you have ensured that the data you are getting is of good quality, you can begin reporting on it. And once basic reporting is in place, you can move on to advanced analytics.
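To make the hierarchy concrete, here is a minimal Python sketch, assuming records arrive as dictionaries with a numeric `value` field. The function name `run_pipeline`, the field name, and every check (non-empty input for ingestion, numeric values for quality, a count and mean for reporting, a linear trend for advanced analytics) are hypothetical stand-ins, not a prescribed implementation. What it illustrates is the ordering: each level runs only after every level beneath it is satisfied.

```python
from statistics import mean


def run_pipeline(rows):
    """Climb the hierarchy one level at a time and stop at the first unmet need.

    Hypothetical sketch: the check at each level is an illustrative stand-in.
    """
    result = {"reached": None}

    # Level 1 (ingestion): did we receive any rows at all?
    if not rows:
        return result
    result["reached"] = "ingestion"

    # Level 2 (quality): every row must carry a numeric 'value'
    # (the 'value' field is an assumption for this example).
    if not all(isinstance(r.get("value"), (int, float)) for r in rows):
        return result
    result["reached"] = "quality"

    # Level 3 (reporting): simple descriptive statistics.
    values = [r["value"] for r in rows]
    result["report"] = {"count": len(values), "mean": mean(values)}
    result["reached"] = "reporting"

    # Level 4 (advanced analytics): a least-squares trend over the series,
    # attempted only with enough points (the threshold of 3 is arbitrary).
    if len(values) < 3:
        return result
    xs = list(range(len(values)))
    x_bar, y_bar = mean(xs), mean(values)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    den = sum((x - x_bar) ** 2 for x in xs)
    result["trend_slope"] = num / den
    result["reached"] = "advanced analytics"
    return result
```

For example, `run_pipeline([{"value": 1}, {"value": None}])` stops at `"ingestion"` and produces no report, because the quality need is unmet. Each level returns early rather than raising an error, mirroring the idea that higher needs simply wait until the ones below them are met.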


Right or wrong, people stereotype. When applied to a company, this is branding. It is constantly happening, and how companies manage it makes the difference. At each step in the pipeline, customers have opportunities to form opinions about the final output. Taking control and delivering on all of the elements outlined above will help you gain a competitive advantage.


This is simply a framework to consider when working through analytical projects. In my experience, customers have sometimes pushed to speed through some of these steps, but doing so often compromises the final product. Taking the time to do a quality job at each step is essential to delivering a valuable final product and building a brand.