Data Science Project Lifecycle

Data Science is the art of combining data, science and technology to solve a business problem. A common misconception is that Data Science is all about applying cool statistical or machine learning algorithms. In practice, it involves several other critical steps before and after those algorithms are applied. In this post, we take a brief look at the lifecycle of a data science project.

Data Extraction and Processing:

Most Data Science projects start with data extraction. This can be simple or complex, depending on the complexity of the data sources and on the data maturity of the organization. Data arriving in various formats needs to be extracted, cleansed and stored in a format that can be used for further analysis.

Data extraction and processing is done with a combination of R or Python, databases and Big Data tools, depending on the type and size of the data.
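
As a rough illustration of the extract-and-cleanse idea, here is a minimal sketch in Python using only the standard library. The column names, the sample rows and the helper names (`to_number`, `extract_and_clean`) are invented for this example; a real project would typically read from files, databases or APIs instead of an inline string.

```python
import csv
import io

# Hypothetical raw export, invented for illustration.
RAW_CSV = """customer_id,age,monthly_spend
 101 ,34,250.50
102,,99.99
103,forty,120.00
104,51,
"""


def to_number(text, cast):
    """Return cast(text), or None when the value is missing or malformed."""
    text = text.strip()
    if not text:
        return None
    try:
        return cast(text)
    except ValueError:
        return None


def extract_and_clean(raw_text):
    """Parse CSV text into a list of dicts with trimmed, typed fields."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw_text)):
        rows.append({
            "customer_id": record["customer_id"].strip(),
            "age": to_number(record["age"], int),
            "monthly_spend": to_number(record["monthly_spend"], float),
        })
    return rows


clean_rows = extract_and_clean(RAW_CSV)
```

Missing or malformed values become `None` rather than being dropped, so the decision on how to treat them can be deferred to the exploration step.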

Exploratory Data Analysis:

Once the data is ready, it is time to explore it and understand its patterns and pitfalls. This is usually done through visualizations and basic statistics. Once the data is thoroughly understood, appropriate treatments, such as handling missing values or outliers, can be applied.

Visualizations are usually built with tools such as Tableau or with visualization packages in R or Python.
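
The "basic statistics" part of exploration could look something like the sketch below, which uses Python's built-in `statistics` module. The sample values and the function names (`summarize`, `iqr_outliers`) are made up for illustration, and the 1.5 × IQR rule is just one common convention for flagging outliers, not the only option.

```python
import statistics


def summarize(values):
    """Basic descriptive statistics, ignoring missing (None) entries."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "stdev": statistics.stdev(present),
        "min": min(present),
        "max": max(present),
    }


def iqr_outliers(values):
    """Return values lying more than 1.5 * IQR outside the quartiles."""
    present = sorted(v for v in values if v is not None)
    q1, _, q3 = statistics.quantiles(present, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in present if v < low or v > high]


# Invented sample column with one missing value and one suspicious value.
monthly_spend = [120.0, 135.0, None, 128.0, 131.0, 900.0, 125.0]
summary = summarize(monthly_spend)
outliers = iqr_outliers(monthly_spend)
```

Here the gap between the mean and the median in `summary` already hints at a skewed column, and `outliers` points at the value worth a closer look.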

Feature Engineering:

Feature engineering involves applying appropriate transformations to the data to enhance it and make it suitable for Statistical and Machine Learning algorithms.

“The first three steps of Data Extraction and Processing, Exploratory Data Analysis and Feature Engineering typically take about 60-70% of the time spent on a Data Science project”

Feature engineering usually involves some custom coding in Python or R, depending on the data. Packages are available in both languages to facilitate it.
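
Two transformations that often show up at this stage are scaling numeric columns and encoding categorical ones. The sketch below shows a plain-Python version of each; the function names and sample data are hypothetical, and in practice a dedicated package would normally do this work.

```python
import statistics


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / spread for v in values]


def one_hot(labels):
    """Turn a categorical column into one 0/1 column per distinct category."""
    categories = sorted(set(labels))
    return {c: [1 if label == c else 0 for label in labels] for c in categories}


# Invented sample columns.
ages = [20, 30, 40, 50]
plans = ["basic", "premium", "basic", "free"]

scaled_ages = standardize(ages)
plan_columns = one_hot(plans)
```

After this, `scaled_ages` is centered on zero and `plan_columns` holds one indicator list per plan, which is the shape many algorithms expect.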

Model Building:

Once the data is prepared, appropriate statistical and machine learning models are applied according to the problem at hand. These models are usually predictive or prescriptive. (We will have a separate post on the different kinds of data science and analytics models soon!) Model building requires a lot of experimentation to select the most appropriate model.

Python and R offer several packages that make model building fairly easy. However, fine-tuning a model requires knowledge of both the algorithms and the domain, and it is time-consuming.
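
The experimentation mentioned above often boils down to fitting several candidate models and comparing them on data they have not seen. The sketch below does this for two deliberately simple candidates, a mean baseline and a least-squares line. The data is a toy set invented for illustration, and all function names are hypothetical.

```python
def fit_mean(xs, ys):
    """Baseline candidate: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Candidate: ordinary least squares for y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def mean_squared_error(model, xs, ys):
    """Average squared difference between predictions and actual values."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data, invented for illustration; the last two points are held out.
train_x, train_y = [1, 2, 3, 4, 5, 6], [2.1, 3.9, 6.2, 8.1, 9.8, 12.1]
test_x, test_y = [7, 8], [14.2, 15.9]

candidates = {"mean_baseline": fit_mean, "linear": fit_line}
scores = {
    name: mean_squared_error(fit(train_x, train_y), test_x, test_y)
    for name, fit in candidates.items()
}
best_name = min(scores, key=scores.get)
```

Keeping a trivial baseline among the candidates is a cheap way to check that a more elaborate model is actually earning its complexity.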

Model Validation:

Once the model is selected, it is validated with both offline data (historical, held-out records) and online data (live traffic) to ensure it performs as expected on all segments of the data.
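
One way to check performance "on all segments" is to break a validation metric down by segment instead of looking only at the overall figure. The sketch below assumes classification results are available as (segment, actual, predicted) records; the segment names, the sample records and the 0.8 minimum are all invented for illustration.

```python
from collections import defaultdict


def accuracy_by_segment(records):
    """Return {segment: accuracy} from (segment, actual, predicted) records."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for segment, actual, predicted in records:
        totals[segment] += 1
        hits[segment] += int(actual == predicted)
    return {segment: hits[segment] / totals[segment] for segment in totals}


def weak_segments(segment_accuracy, minimum):
    """List the segments whose accuracy falls below the required minimum."""
    return sorted(s for s, acc in segment_accuracy.items() if acc < minimum)


# Hypothetical held-out results.
holdout = [
    ("new_customers", 1, 1), ("new_customers", 0, 1),
    ("new_customers", 1, 0), ("new_customers", 0, 0),
    ("returning", 1, 1), ("returning", 0, 0),
    ("returning", 1, 1), ("returning", 0, 0),
]

segment_accuracy = accuracy_by_segment(holdout)
needs_attention = weak_segments(segment_accuracy, minimum=0.8)
```

In this toy case the overall accuracy is 75%, which hides the fact that one segment is predicted perfectly while the other is no better than a coin flip.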

Model Deployment:

Once the model is developed, it needs to be deployed to production. This step is undervalued and is usually an afterthought, yet it can make or break a data science project. Considering deployment at an early stage of model development can save months of wasted effort.
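
Deployment approaches vary widely, so the following is only a minimal sketch of one idea: exporting a fitted model's parameters in a portable format that a separate production service can load and use to answer requests. The JSON layout, the feature name `tenure_months` and the function names are assumptions made for this example.

```python
import json


def export_model(slope, intercept, feature_name):
    """Serialize a fitted linear model's parameters as a JSON string."""
    return json.dumps({
        "model_type": "linear",
        "feature": feature_name,
        "slope": slope,
        "intercept": intercept,
    })


def load_predictor(payload):
    """Rebuild a prediction function from an exported JSON payload."""
    spec = json.loads(payload)
    if spec["model_type"] != "linear":
        raise ValueError("unsupported model type: " + spec["model_type"])

    def predict(request):
        # `request` is a dict of feature values, e.g. parsed from an API call.
        return spec["slope"] * request[spec["feature"]] + spec["intercept"]

    return predict


# Training side: export the artifact (parameter values are hypothetical).
artifact = export_model(slope=2.0, intercept=0.5, feature_name="tenure_months")

# Production side: load the artifact and serve a prediction.
predict = load_predictor(artifact)
prediction = predict({"tenure_months": 12})
```

Agreeing early on what the artifact looks like and which inputs the production system can actually supply is one concrete way to avoid the late surprises described above.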

Model Performance Monitoring:

Once developed, statistical and machine learning models remain stable only for a period of time and start to underperform as the data changes. Hence, a monitoring mechanism needs to be set up to track model performance over time. Once performance drops below a threshold, the model needs to be retrained with the most recent data.
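
A monitoring mechanism of this kind can be sketched as a rolling window of recent outcomes compared against a threshold. The class name, window size and threshold below are hypothetical, and the sketch assumes that actual outcomes eventually become available to compare with the predictions.

```python
from collections import deque


class PerformanceMonitor:
    """Track accuracy over the most recent predictions."""

    def __init__(self, window_size, threshold):
        # deque with maxlen drops the oldest outcome automatically.
        self.outcomes = deque(maxlen=window_size)
        self.threshold = threshold

    def record(self, actual, predicted):
        """Store whether the latest prediction was correct."""
        self.outcomes.append(actual == predicted)

    def accuracy(self):
        """Accuracy over the current window, or None if it is empty."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        """True once a full window of outcomes falls below the threshold."""
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.accuracy() < self.threshold


monitor = PerformanceMonitor(window_size=4, threshold=0.75)

# Early on, the model predicts correctly.
for actual, predicted in [(1, 1), (0, 0), (1, 1), (0, 0)]:
    monitor.record(actual, predicted)
healthy_flag = monitor.needs_retraining()

# Later, the data shifts and recent predictions start to miss.
for actual, predicted in [(1, 0), (0, 1)]:
    monitor.record(actual, predicted)
drifted_flag = monitor.needs_retraining()
```

Waiting for a full window before raising the flag avoids triggering a retrain on the first one or two unlucky predictions.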

Original article https://coursebricks.com/blog-data-science-project-lifecycle/

