Creating a Machine Learning Web App with Streamlit and Heroku
Introduction
What You'll Learn
In this tutorial, you will discover:
- What deployment is and why deploying machine learning models matters.
- How to develop a machine learning pipeline and train models using PyCaret.
- The process of creating a straightforward web app with the Streamlit open-source framework.
- Steps to deploy the web application on Heroku, allowing you to see your model in action.
This guide encompasses the entire process, beginning with training a machine learning model and developing a pipeline in Python, followed by crafting a web app with Streamlit, and concluding with deployment on the Heroku cloud platform.
Previously, we've explored containerization using Docker and deployment on cloud services such as Azure, GCP, and AWS. For those interested, you can check out related articles.
Tools Required for This Tutorial
PyCaret
PyCaret is an open-source, low-code library for machine learning in Python, designed for training and deploying machine learning pipelines and models into production. Installation is straightforward via pip:
```bash
pip install pycaret
```
Streamlit
Streamlit is an open-source library that simplifies the creation of beautiful web applications for machine learning and data science. It can also be installed through pip:
```bash
pip install streamlit
```
GitHub
GitHub is a cloud-based platform for hosting and managing code. If you're working in a large team with numerous collaborators, GitHub is essential. It's also home to many open-source projects, including PyCaret, which benefits from contributions by a vast community of developers. If you're new to GitHub, consider signing up for a free account.
Heroku
Heroku is a Platform as a Service (PaaS) that facilitates the deployment of web applications using a managed container system, complete with integrated data services and an extensive ecosystem. Essentially, it allows you to transfer your application from your local machine to the cloud, making it accessible via a web URL. We selected Heroku for this tutorial because it offers free resource hours for new accounts.
The Machine Learning Workflow: From Training to PaaS Deployment
#### Why Deploy Machine Learning Models?
Deployment refers to the process of placing a finalized machine learning model into a live environment, enabling its use for its intended purpose. Models can be deployed in various environments and are often linked to applications through an API, allowing end-users to access them.
There are two primary methods for making predictions with new data points:
Online Predictions
This method is used when you want to generate predictions for individual data points one at a time. For example, you might want to quickly decide whether a specific transaction is likely fraudulent.
Batch Predictions
Batch prediction is useful when you need predictions for many observations at once, which is far more efficient than scoring them one by one. For instance, you may wish to determine which customers to target for an advertising campaign: first generate prediction scores for all customers, sort them, and then focus on the top 5% most likely to make a purchase.
In this tutorial, we will build an application capable of both online and batch predictions by enabling users to upload a CSV file containing unseen data points. Both prediction modes will be facilitated through the same app.
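To make the distinction concrete, here is a minimal sketch of both modes, assuming a trained PyCaret pipeline (we build one in Task 1); the column values and the file name unseen_data.csv are hypothetical:

```python
import pandas as pd
from pycaret.regression import predict_model

# Online: score a single data point passed as a one-row DataFrame
single = pd.DataFrame([{'age': 35, 'sex': 'male', 'bmi': 27.5,
                        'children': 2, 'smoker': 'no', 'region': 'southwest'}])
print(predict_model(pipeline, data=single))

# Batch: score many observations at once from a CSV file
batch = pd.read_csv('unseen_data.csv')
print(predict_model(pipeline, data=batch))
```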
Setting the Business Context
An insurance company aims to enhance its cash flow forecasting by accurately predicting patient charges based on demographic information and basic health risk metrics at the time of hospitalization.
Objective
To develop a web application that facilitates both online (one-by-one) and batch predictions.
Tasks
- Train, validate, and create a machine learning pipeline using PyCaret.
- Build a front-end web application with two capabilities: (i) online prediction and (ii) batch prediction.
- Deploy the web app on Heroku, making it publicly accessible via a web URL.
Task 1 — Model Training and Validation
If you are unfamiliar with PyCaret, click here to explore more about it or view the Getting Started tutorials on the official website.
In this guide, we will conduct two experiments using PyCaret. The first experiment will utilize the default preprocessing settings, while the second will involve additional preprocessing steps, including scaling, normalization, automated feature engineering, and binning continuous data.
```python
# Load the data (the insurance dataset from pycaret.datasets matches
# the 'charges' target used here)
from pycaret.datasets import get_data
data = get_data('insurance')

# Initialize setup (Experiment 2: additional preprocessing steps)
from pycaret.regression import *
s = setup(data, target='charges', session_id=123,
          normalize=True,
          polynomial_features=True,
          trigonometry_features=True,
          feature_interaction=True,
          bin_numeric_features=['age', 'bmi'])
```
Comparison of Information Grid for Both Experiments
Just a few lines of code can achieve significant results. Note that the modified dataset in Experiment 2 contains 62 features for model training, compared to only 6 features in the original dataset.
Sample Code for Model Training in PyCaret:
```python
# Model Training and Validation
lr = create_model('lr')
```
10-Fold Cross-Validation of Linear Regression Model(s)
Consider the impact that transformations and automatic feature engineering have. The R2 score improved by 10% with minimal additional effort. We can analyze how these transformations and feature engineering affect the model's heteroskedasticity by comparing the residual plots of the linear regression model from both experiments.
```python
# Plot residuals of trained model
plot_model(lr, plot='residuals')
```
Residual Plot of Linear Regression Model(s)
Machine learning is an iterative process. The significance of the task and the potential repercussions of incorrect predictions influence the number of iterations and methods employed. The stakes involved in a real-time machine learning model used in an ICU are far greater than those of a model predicting customer churn.
In this tutorial, we will perform only two iterations, and the linear regression model from the second experiment will be deployed. However, at this stage, the model remains merely an object in a Notebook or IDE. You can export it as a file for use in other applications by running the following code:
```python
# Save pipeline on disk
save_model(lr, model_name='deployment_28042020')
```
Pipeline Created Using PyCaret
We have completed model training and selection. The final machine learning pipeline and linear regression model are now saved as a pickle file (deployment_28042020.pkl) and will be utilized in the web application to generate predictions on new data points.
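As a quick sanity check, the saved pipeline can be reloaded from disk with load_model (PyCaret appends the .pkl extension automatically):

```python
# Reload the saved pipeline (reads deployment_28042020.pkl)
from pycaret.regression import load_model
pipeline = load_model('deployment_28042020')
```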
Task 2 — Building the Web Application
With our machine learning pipeline and model operational, the next step is to create a front-end web application capable of making predictions based on newly collected data points. This application will support both prediction modes (Online and Batch) through a form and CSV file uploader. Let's break down the application code into three main components:
#### Header
This section imports libraries, loads the trained model, and establishes a basic layout featuring a logo, a JPG image, and a sidebar dropdown menu to toggle between 'Online' and 'Batch' prediction.
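A sketch of what this header section might look like (the logo file name, title, and sidebar text are assumptions, not taken from the original repository):

```python
# app.py — header section
import pandas as pd
import streamlit as st
from PIL import Image
from pycaret.regression import load_model, predict_model

# Load the trained pipeline saved in Task 1
model = load_model('deployment_28042020')

# Basic layout: logo image, title, and a sidebar dropdown
image = Image.open('logo.png')  # hypothetical logo file
st.image(image)
st.title('Insurance Charges Prediction App')
add_selectbox = st.sidebar.selectbox(
    'How would you like to predict?', ('Online', 'Batch'))
```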
#### Online Predictions
This section addresses the first functionality of the app: Online (one-by-one) prediction. We utilize Streamlit widgets such as number input, text input, dropdown menu, and checkbox to collect data points used for training the model, including Age, Sex, BMI, Children, Smoker, and Region.
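A hedged sketch of this section, continuing the header above (the widget ranges, defaults, and the 'Label' output column that PyCaret 2.x adds to predictions are assumptions):

```python
# Online prediction: collect one data point via Streamlit widgets
if add_selectbox == 'Online':
    age = st.number_input('Age', min_value=1, max_value=100, value=25)
    sex = st.selectbox('Sex', ['male', 'female'])
    bmi = st.number_input('BMI', min_value=10.0, max_value=50.0, value=25.0)
    children = st.selectbox('Children', [0, 1, 2, 3, 4, 5])
    smoker = 'yes' if st.checkbox('Smoker') else 'no'
    region = st.selectbox('Region', ['southwest', 'northwest',
                                     'northeast', 'southeast'])

    if st.button('Predict'):
        # Assemble the inputs into a one-row DataFrame and score it
        input_df = pd.DataFrame([{'age': age, 'sex': sex, 'bmi': bmi,
                                  'children': children, 'smoker': smoker,
                                  'region': region}])
        output = predict_model(model, data=input_df)
        st.success(f"Predicted charges: {output['Label'][0]:.2f}")
```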
#### Batch Predictions
This section covers the second functionality: batch prediction. Using the file uploader widget in Streamlit, users can upload a CSV file, and the native predict_model() function from PyCaret will generate predictions, which are then displayed using Streamlit's write() function.
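A corresponding sketch for the batch path:

```python
# Batch prediction: upload a CSV and score all rows at once
if add_selectbox == 'Batch':
    file_upload = st.file_uploader('Upload a CSV file for predictions',
                                   type=['csv'])
    if file_upload is not None:
        unseen = pd.read_csv(file_upload)
        predictions = predict_model(model, data=unseen)
        st.write(predictions)
```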
Testing the Application Locally
The final step before deploying the application on Heroku Cloud is to test it locally. Open Anaconda Prompt, navigate to your project folder, and run the following command:
```bash
streamlit run app.py
```
Task 3 — Deploy the Web App on Heroku
Once model training is finished, the machine learning pipeline is set up, and the application has been tested locally, we can begin the deployment process on Heroku. There are several methods to submit your application's source code to Heroku, but the quickest approach is to connect your GitHub repository to your Heroku account.
At this point, you're familiar with all the files in the repository except for three: 'requirements.txt', 'setup.sh', and 'Procfile'. Here’s a brief overview:
requirements.txt
This file lists the necessary Python packages required for the application to run. If any packages are missing from the environment where the program is executed, the application will not function correctly.
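For this app, a minimal requirements.txt might look like the following (package names only; you should pin the exact versions from your working environment):

```
pycaret
streamlit
pandas
Pillow
```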
setup.sh
A Bash script containing instructions written in the Bash language, used to establish the necessary environment for our Streamlit app to operate in the cloud.
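A common setup.sh pattern for running Streamlit on Heroku writes a Streamlit config that binds the server to the port Heroku assigns via the $PORT environment variable (a typical script, not necessarily the exact one from the original repository):

```bash
mkdir -p ~/.streamlit/

echo "[server]
headless = true
port = $PORT
enableCORS = false
" > ~/.streamlit/config.toml
```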
Procfile
This file contains a single line that gives startup instructions to the web server, specifying what to execute when the app starts. In this example, the first part of the Procfile runs setup.sh, which creates the environment for the Streamlit app, while the second part runs the application itself.
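A conventional Procfile for this setup is a single line:

```
web: sh setup.sh && streamlit run app.py
```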
Once all files are uploaded to the GitHub repository, we are prepared to commence deployment on Heroku. Follow these steps:
- Sign up at heroku.com and click on 'Create new app'.
- Enter your App name and select a region.
- Connect to your GitHub repository.
- Deploy your branch.
Thank you for reading! I write about data science, machine learning, and PyCaret. If you wish to receive automatic updates, feel free to follow me on Medium, LinkedIn, and Twitter.
(Video: deploying a machine learning web application with Streamlit on Heroku.)
(Video: building a machine learning web app with Streamlit.)