Today, we will look at a basic overview of what a typical data science project life cycle looks like. In any analytics project, this cycle fits into, and adds value to, project planning and execution.
Based on my past experience, I have put together a small pictorial representation of how it actually works. The structure can certainly vary from project to project, but at a high level it remains the same.
The following is the step-by-step approach:
In most cases, the client comes up with the data for the experiment. Nevertheless, there are scenarios where we have to collect data or create data.
Data collection can be done in 3 different ways:
Taking data from open source forums or platforms
Collecting educational data sets that are published on various websites
Data augmentation, where we create data to suit the client's needs (for example, data augmentation is widely used in image-based projects, where the same sample image is tilted, rotated, flipped, etc. to create fresh data samples)
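To make the augmentation idea concrete, here is a minimal sketch that uses only the Python standard library and treats a grayscale image as a list of pixel rows. The function names are illustrative and do not come from any particular library. It covers flips and 90-degree rotations only; tilting by an arbitrary angle needs interpolation and is usually left to an image library.

```python
# Minimal augmentation sketch on a grayscale image stored as a list of rows.
# All function names are illustrative assumptions, not a library API.

def flip_horizontal(image):
    """Mirror the image left-to-right."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Mirror the image top-to-bottom."""
    return [row[:] for row in image[::-1]]


def rotate_90_clockwise(image):
    """Rotate the image by 90 degrees clockwise."""
    return [list(column) for column in zip(*image[::-1])]


def augment(image):
    """Return the original image plus a few transformed copies."""
    return [
        image,
        flip_horizontal(image),
        flip_vertical(image),
        rotate_90_clockwise(image),
    ]


# A tiny 2x3 "image" used purely as made-up sample data.
sample = [
    [1, 2, 3],
    [4, 5, 6],
]
augmented = augment(sample)
```

One sample image becomes four training samples here; the label of each copy stays the same as the original's.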
To understand the data better, we need to ask the client the right set of questions and get to know both the answers and the assumptions behind them. Some assumptions can prove costly down the line, so each one should be examined carefully before it is accepted. When we build data pipelines, understanding the core data matters even more. Sometimes the data has to be transformed into a different shape before it is usable.
Data processing is one of the key steps in an analytics experiment, as it forms the base of the pyramid. Almost all the other steps depend on it, because it directly affects the output.
Some of the processing strategies are as follows:
Handling of Skewness
Missing value treatment
Handling of characters & numerical values in a single variable
Binning of values
Statistical Transformation of variables
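Several of the strategies above can be sketched in a few lines. This example uses only the Python standard library. The sample column, the bin edges, and the choice of median imputation and a log transform are assumptions made for illustration; the right treatment depends on the data and the client's context.

```python
import math
import re
import statistics
from bisect import bisect_right


def to_number(raw):
    """Mixed characters and numbers: extract the numeric part of values like '12 kg'."""
    if raw is None:
        return None
    match = re.search(r"-?\d+(?:\.\d+)?", str(raw))
    return float(match.group()) if match else None


def fill_missing_with_median(values):
    """Missing value treatment: replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def log_transform(values):
    """Skewness handling: log1p compresses a long right tail (non-negative values only)."""
    return [math.log1p(v) for v in values]


def bin_values(values, edges, labels):
    """Binning: map each value to the label of the interval it falls into."""
    return [labels[bisect_right(edges, v)] for v in values]


# Hypothetical raw column mixing units, plain numbers, and missing entries.
raw_column = ["12 kg", "7", None, "150kg", "n/a"]

numeric = [to_number(v) for v in raw_column]
filled = fill_missing_with_median(numeric)
logged = log_transform(filled)
binned = bin_values(filled, edges=[10, 100], labels=["low", "medium", "high"])
```

The order matters: values are parsed first, then imputed, and only then transformed or binned, so that later steps never see text or missing entries.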
[Figure: Life cycle of a data science project]
Model Building & Evaluation
I strongly believe that it's not always a machine learning model that provides the solution.
One must first evaluate whether a computational model is needed at all, or whether basic analytics would do. Nothing mandates that only a machine learning model can solve the client's problem.
In one of my past experiences at a top Insurance company, the clients clearly mentioned that irrespective of technique/model/method, the objectives of the business remained the same. That is a powerful statement.
Once the idea is validated, model building involves a wide range of techniques. The basic mechanics of handling a model remain the same irrespective of the technique. The following have to be taken care of while running a machine learning model:
Model Parameters Selection
Over-fitting of the model
Accuracy/Precision/Recall/MAPE metrics have to be verified
All of the above points help in testing and refining the model.
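The metrics named above can be written out directly, which makes it easier to see what each one measures. The following is a minimal sketch in plain Python. The labels, forecasts, and the 0.05 overfitting threshold are made-up values for illustration, and comparing training and validation scores is only one simple way to spot over-fitting.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, and recall for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def mape(actual, forecast):
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100 * sum(errors) / len(errors)


def looks_overfitted(train_score, validation_score, max_gap=0.05):
    """Flag a model whose training score exceeds its validation score by more than max_gap.

    The default max_gap is an arbitrary assumption, not an established rule.
    """
    return (train_score - validation_score) > max_gap


# Made-up labels and forecasts, used only to exercise the functions.
metrics = classification_metrics(
    y_true=[1, 0, 1, 1, 0, 1],
    y_pred=[1, 0, 0, 1, 1, 1],
)
forecast_error = mape(actual=[100, 200, 400], forecast=[110, 180, 400])
overfit_flag = looks_overfitted(train_score=0.98, validation_score=0.81)
```

Accuracy, precision, and recall apply to classification problems, while MAPE applies to numeric forecasts, so a given project usually verifies one family or the other.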
The most important part of the entire project is bringing value to the customers. Customers are generally fine with any tech stack or model, as long as it has the fewest weak spots and provides the best insights. The business decisions taken on top of these models must improve the customer experience, not degrade it.
Some of the common strategies around bringing insights are:
Building a visualization that surfaces the KPIs the client requires
Regularly revising models when required and returning the revised insights
Using model coefficients as a comparative study between different business variables
Using percentage metrics, which always strengthens the study because it makes decisions easier to take
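As a small illustration of the last two points, the sketch below turns a KPI movement and a set of model coefficients into percentages. The variable names and numbers are hypothetical. Comparing coefficient magnitudes this way assumes the model was fitted on variables brought to a common scale; otherwise the shares are not comparable.

```python
def percent_change(before, after):
    """Express the movement of a KPI as a percentage of its starting value."""
    return 100 * (after - before) / before


def coefficient_shares(coefficients):
    """Express each coefficient's magnitude as a percentage of the total magnitude.

    Assumes the input variables were standardized before fitting.
    """
    total = sum(abs(value) for value in coefficients.values())
    return {name: 100 * abs(value) / total for name, value in coefficients.items()}


# Hypothetical KPI values and model coefficients, for illustration only.
kpi_change = percent_change(before=80, after=92)
shares = coefficient_shares({"price": -3.0, "discount": 1.5, "ad_spend": 0.5})
```

A statement such as "price accounts for about 60% of the modelled effect" is usually easier for a business audience to act on than a raw coefficient of -3.0.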