28 Oct

AutoML Using DataRobot for Enhanced Machine Learning Applications

Automated Machine Learning, AutoML for short, is proving to be a radical shift in how enterprises of all sizes address challenges and deploy machine learning solutions.

The emergence of MLOps platforms and applications helps data scientists manage the machine learning lifecycle and automate training exercises. AutoML is changing how machine learning applications are built - by enhancing the productivity of data scientists, reducing costs and making the platform simple enough for even non-statisticians at the company to use.

DataRobot is an industry-leading AutoML platform for managing and simplifying complex enterprise workflows. The black-box models of traditional machine learning and deep learning make the workflow difficult to decipher: bias can become embedded in the whole process, and human operators find it hard to tell when that bias was introduced.

AutoML does not replace the data scientist; it is there to increase the efficiency of existing data scientists.

Developing and deploying machine learning and deep learning models manually involves multiple steps that require extensive domain knowledge as well as mathematical and computer science skills. These skills can be very difficult, costly and time-consuming to develop within the company.

Also, data science workflows in machine learning and deep learning leave endless room for human error and bias, which ultimately degrade the accuracy of the model and devalue the insights you might get from it.

Automated machine learning makes it convenient for companies in all sectors - health care, finance, fintech, public service, retail, e-commerce, sports, automotive and manufacturing - to harness machine learning and deep learning, technology that was previously open only to organizations with vast resources at their disposal.

AutoML helps data scientists enhance their efficiency and realize the true potential of their data science workflows by automating machine learning tasks such as pipeline development and hyperparameter tuning.

DataRobot can be used in any vertical and any function to make predictions or perform what-if analysis. The use case management built into the DataRobot platform helps executives with value management by giving real-time insight into how many models are running in production and what their ROI is.

The types of business problems it can handle include classification, regression, deep learning, anomaly detection and forecasting. With DataRobot MLOps, you can monitor all of your models - internal ones as well as external models built in Python, Scala or Java - adding them by simple drag and drop or through GitHub.

Utah Housing Dataset - Demo

This is what the starting page looks like - here you import the files and datasets of any size for AutoML.

image1

You can easily connect to popular databases and data sources to import datasets for your model. 

For our demo, the imported dataset has 20+ columns containing numeric data, geo-location data, images, text comments and descriptions.

Next, we create a project from here. Upon creating the project, we first analyze the distribution of the different variables and choose a target variable. In our demo, we have chosen price as the target variable.

Next, we click on Start. DataRobot runs different algorithms on the data set and shows the accuracy of each algorithm in predicting the target variable.

If you are a data scientist who wants to work hands-on with the dataset and try out different machine learning approaches, you can tune the setup with the advanced modelling options and run the data quality validations yourself.

Once you start modelling, you will see the different features ranked in order of their importance in explaining the output variable, i.e. how strongly each one is associated with the target variable. This makes it easier to judge which inputs matter before you deploy the machine learning application.

On the leaderboard, you will see the different algorithms ranked by gamma deviance (the default metric recommended by DataRobot), but you can change the metric depending on your requirements. You will also see different blender models, or ensemble models, that combine the best of different models.

DataRobot will recommend the best model for deployment, and that is the beauty of AutoML platforms.
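
To make the leaderboard idea concrete, here is a minimal sketch in plain Python. It is not DataRobot's implementation: the model names, the house prices and the simple averaging blender are all made up for illustration. It only shows how mean gamma deviance (lower is better) can rank a few sets of predictions, and how a blender's predictions can be scored alongside the models it combines.

```python
import math


def mean_gamma_deviance(y_true, y_pred):
    """Mean gamma deviance: 2 * mean(log(p / y) + y / p - 1). Lower is better."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(v <= 0 for v in y_true) or any(v <= 0 for v in y_pred):
        raise ValueError("gamma deviance needs strictly positive values")
    total = sum(math.log(p / y) + y / p - 1.0 for y, p in zip(y_true, y_pred))
    return 2.0 * total / len(y_true)


def average_blend(*prediction_lists):
    """Illustrative blender: the row-by-row mean of several models' predictions."""
    return [sum(row) / len(row) for row in zip(*prediction_lists)]


def leaderboard(y_true, predictions_by_model):
    """Return (model name, deviance) pairs sorted from best to worst."""
    scores = {
        name: mean_gamma_deviance(y_true, preds)
        for name, preds in predictions_by_model.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical prices (in thousands) and predictions from two made-up models.
actual = [200.0, 350.0, 500.0, 275.0]
predictions = {
    "model_a": [180.0, 385.0, 450.0, 300.0],
    "model_b": [230.0, 330.0, 560.0, 260.0],
}
predictions["average_blender"] = average_blend(
    predictions["model_a"], predictions["model_b"]
)

for name, score in leaderboard(actual, predictions):
    print(f"{name}: {score:.5f}")
```

In this toy data the two models err in opposite directions, so the averaged predictions score best. That is not guaranteed in general, which is why the blenders are ranked on the same leaderboard as every other model.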

image2

The model blueprint shows you the different preprocessing and feature engineering steps applied to the data, as well as how the result is fed into the final algorithm to give us the final prediction. You can also study the residual distribution and review the hyperparameter tuning.

Feature impact- Helps us understand which variables are most important for predicting the target variable under the given machine learning model, by evaluating how much each variable contributes to determining the target's value.
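
One widely used, model-agnostic way to measure this kind of impact is permutation importance: shuffle one feature's values, re-score the model, and see how much the error grows. The sketch below assumes that technique for illustration only, since this post does not spell out how DataRobot computes the number. The toy model, the feature names and the data are all hypothetical.

```python
import random


def mean_abs_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_impact(predict, rows, y_true, n_features, seed=0):
    """Increase in error when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mean_abs_error(y_true, [predict(r) for r in rows])
    impacts = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)  # break the link between feature j and the target
        shuffled_rows = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        error = mean_abs_error(y_true, [predict(r) for r in shuffled_rows])
        impacts.append(error - baseline)
    return impacts


def toy_model(row):
    """Hypothetical model: price depends on size (feature 0), ignores feature 1."""
    return 100.0 * row[0]


feature_names = ["size", "unused_code"]  # made-up names for this example
rows = [[float(i), float((i * 7) % 5)] for i in range(1, 21)]
prices = [toy_model(r) for r in rows]

impacts = permutation_impact(toy_model, rows, prices, n_features=2)
for name, impact in zip(feature_names, impacts):
    print(f"{name}: error increase {impact:.2f}")
```

Shuffling the feature the model relies on makes the error jump, while shuffling the ignored feature changes nothing, so the first feature ranks as the more impactful one.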

Explainable AI- The Explainable AI feature shows you the different steps applied to the model, so the model owner can understand how the AI reached its final prediction by making the model as transparent as possible. With visibility into the model's underlying decision making, organizations can make the necessary adjustments.

Deploy to Production- Deploying the model to run machine learning applications is pretty simple and takes only a few clicks. As soon as you instruct DataRobot to deploy, it sets up the REST APIs behind the scenes. MLOps then lets you check the accuracy and see how the model is behaving with the production data. You can enable data drift tracking, prediction row storage, challenger models for the deployment, and attribute tracking for segmented analyses.

You can go to the Deployment tab to track and monitor all your AI models within AutoML in one place, see the metadata related to each model, enable compliance and governance mechanisms, and use the service hub to see model health.

image3

Data drift tracking lets you see any change between your training data and prediction data.
You can monitor the accuracy and refresh the model whenever the accuracy dips. 
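
As a rough illustration of what a drift check does, the sketch below compares a training sample with a prediction-time sample using the Population Stability Index (PSI), a commonly used drift statistic. This post does not say which statistic DataRobot uses, so treat PSI, the five equal-width bins and the 0.2 alert threshold (a common rule of thumb) as assumptions. The data is made up.

```python
import math


def psi(expected, actual, n_bins=5, eps=1e-6):
    """Population Stability Index between a training and a scoring sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all values are equal

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # clamp values outside the range
            counts[idx] += 1
        # eps keeps empty bins from producing log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    e, a = bin_shares(expected), bin_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical feature values seen in training and later in production.
training = [float(v) for v in range(100, 200)]
production = [v + 50.0 for v in training]  # the same shape, shifted upward

DRIFT_THRESHOLD = 0.2  # rule-of-thumb cut-off, an assumption for this sketch
drift_score = psi(training, production)
needs_refresh = drift_score > DRIFT_THRESHOLD

print(f"PSI: {drift_score:.3f}, refresh model: {needs_refresh}")
```

A score of zero means the two samples fall into the bins in identical proportions. The larger the score, the more the prediction data has moved away from what the model was trained on.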
 
The prediction tab enables you to integrate external datasets for batch predictions and incorporate them into the learning algorithms. 

image4

We can see that DataRobot is a powerful platform for automating the complete ML/AI workflow - giving organizations the ability to build and deploy complex machine learning models with ease, get complete transparency into the whole workflow and monitor the model accuracy and workflow from a single place. 

You can access the complete demo webinar here: https://polestarllp.com/Automate-AI-and-ML-Efforts-for-Enhanced-Business-DNA-with-DataRobot

Copyright © 2020 Polestar Solutions and Services India Pvt. Ltd, All Rights Reserved.