Deploying Machine Learning Models in GCP

Parthasarathy Subburaj
10 min read · Sep 17, 2021

Cloud-based technologies are becoming more and more popular, and the pandemic has given businesses all the more reason to shift towards them. The market currently stands at close to USD 371.4 billion and is expected to grow to USD 832.1 billion by 2025. Cloud-based technologies are also fueling the start-up wave that India is currently witnessing. There are two main reasons why they have taken centre stage. Firstly, the initial setup cost for getting started with cloud-based technologies is significantly lower than that of setting up on-premise infrastructure, which greatly works in favour of start-ups, where heavy upfront investments are difficult to execute. Secondly, investing in on-premise infrastructure is unattractive in this rapidly changing technology space, since what qualifies as state-of-the-art hardware today will be outdated soon, and upgrading it is a costly affair.

In this article, we will explore some of the key deployment frameworks in the Google Cloud Platform. When it comes to deployment, a plethora of options is presented to the end-user, and picking the right tool is essential for reducing cost and satisfying all the requirements of the deployment. At a high level, deployment can be categorized into server-based deployment and serverless deployment. We will discuss both in more detail below.

Server-based deployment

This is the traditional paradigm, where the developer is responsible for provisioning the servers required for running and scaling the code. The developer's attention is divided between writing the application logic and making sure that the right set of tools is selected for deploying it, ensuring scalability, reduced latency, minimal downtime, etc. In this article, we will explore Google Compute Engine (GCE) and Google Kubernetes Engine (GKE) as part of server-based deployment.

Serverless deployment

Serverless deployment, on the other hand, is a relatively new paradigm and is now offered by many cloud service providers. In this framework, the cloud service provider takes care of provisioning all the servers and infrastructure required for hosting and scaling the application, so that the developer can focus purely on the programming logic. It has gathered a lot of attention lately because it is very cost-effective: the user pays for exactly the service being used, unlike server-based deployment, where the end-user also pays for unused space or idle CPU. In this article, we will be exploring Google Cloud Run and Google App Engine as part of the serverless deployments in GCP.

For more information on server-based and serverless deployment, readers are referred here.

Also, a basic working knowledge of GCP and the Linux terminal is assumed of the readers.

CV Model to be deployed

In this article, we will deploy an SSD object detection model built using GluonCV. Since the main focus of this article is to explore the deployment options, and not to train an object detection model, we will not dwell on the training itself. Apart from training the model, we also need to expose it as an API to make it a web-based application, and we have used Flask for that. The source code for the web application is pretty straightforward, as shown below.
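A minimal version of main.py along the lines of the following would do; note that the exact model variant (ssd_512_resnet50_v1_voc here) and the name of the form field (image) are illustrative choices:

import mxnet as mx
from flask import Flask, jsonify, request
from gluoncv import data, model_zoo

app = Flask(__name__)

# Load a pretrained SSD model once at startup, so that each request
# only pays for inference and not for model loading
net = model_zoo.get_model("ssd_512_resnet50_v1_voc", pretrained=True)

@app.route("/", methods=["POST"])
def detect():
    # The image is expected as a multipart form field named "image"
    img_bytes = request.files["image"].read()
    img = mx.image.imdecode(img_bytes)
    # Resize and normalize the image the way the SSD preset expects
    x, _ = data.transforms.presets.ssd.transform_test(img, short=512)
    class_ids, scores, bboxes = net(x)
    return jsonify({
        "class_ids": class_ids[0].asnumpy().tolist(),
        "scores": scores[0].asnumpy().tolist(),
        "bboxes": bboxes[0].asnumpy().tolist(),
    })

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)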

Deployments

Google Compute Engine


Google Compute Engine or GCE is the most basic deployment path that one could adopt. It is the equivalent of EC2 in AWS and virtual machines in Azure. One could deploy the web application in a GCE by following the steps below:

1. Create a GCE instance with appropriate CPU and memory based on the application's requirements. For our application, we use an e2-medium instance, which has two vCPUs and 4 GB of memory, with an Ubuntu OS. Also, make sure that you have checked Allow HTTP traffic in the firewall settings while creating the instance. Deciding on the appropriate instance type is mainly experimental and driven by profiling analysis.

2. Once the instance is up and running, install Docker in the instance by following the instructions listed here.

3. Now we can build an image using the following Dockerfile. Before building the Docker image, make sure you have the main.py file in the same directory as the Dockerfile.

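A Dockerfile roughly like the one below would suffice; the base image and the exact package list are assumptions that follow from the main.py sketch above:

FROM python:3.8-slim

WORKDIR /app

# Runtime dependencies of main.py; pinning versions is advisable in practice
RUN pip install --no-cache-dir mxnet gluoncv flask gunicorn

COPY main.py .

EXPOSE 8080

Note that no CMD is specified here, since the command that starts the application is passed explicitly to docker run in step 4.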

The following command can be used for building the image (make sure that you execute it from the directory where you have the Dockerfile):

sudo docker build -t mxnet:v1 .

For readers who need more support on working with Docker, I recommend this article series for getting started.

4. Now we can start the docker container using the following command:

sudo docker run -d --rm -p 8080:8080 mxnet:v1 /bin/bash -c "gunicorn --bind :8080 --workers 1 --threads 8 main:app"

This will start the web application inside the virtual machine.

5. One last thing before we can start using the service is to expose port 8080 by creating a firewall rule, so that the instance can be reached via this port. This can be done by visiting VPC Networks -> Firewall from the cloud console.
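Alternatively, the same rule can be created from the cloud shell with a single command (the rule name allow-8080 is just illustrative):

gcloud compute firewall-rules create allow-8080 --allow tcp:8080 --source-ranges 0.0.0.0/0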

And now one can test the application by hitting the POST endpoint created at <ip-address-of-GCE>:8080. The IP address of the GCE instance can be found in the GCE console. Make sure that the request body contains the image parameter. And with that, you have successfully deployed an application in GCE 😃
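For instance, assuming the application reads the image from a multipart form field named image, as in the main.py sketch above, a quick test with curl would look like this:

curl -X POST -F "image=@test.jpg" http://<ip-address-of-GCE>:8080/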

Note: One could also deploy the application directly on the GCE instance without resorting to Docker.

Google Kubernetes Engine


Kubernetes (K8s), a well-known container orchestration technology that was originally developed by Google and open-sourced in 2014, has taken center stage ever since Docker-based deployments became popular. Google Kubernetes Engine, or GKE, is the way to access K8s on GCP. Since this article focuses on deployment in GKE, some knowledge of K8s is assumed. For more information on getting started with K8s, readers are directed to this article.

To deploy an application in K8S, one needs to have a docker image that runs the application. We can use the same Dockerfile that was used in the last section and build an image directly on Google Cloud Shell, which comes with Docker pre-installed. The following command can be used to build the image in the cloud shell directly:

docker build -t gcr.io/<PROJECT_ID>/object-detection-gke:v1 .

Make sure to replace PROJECT_ID with your project ID. Also, to enable access to the Google Container Registry, execute the following commands from the cloud shell:

gcloud config set project <PROJECT-ID>
gcloud auth configure-docker
gcloud services enable containerregistry.googleapis.com

And now we are good to push the image to GCR using the following command:

docker push gcr.io/<PROJECT_ID>/object-detection-gke:v1

Now that we have created the image, we need to create a Kubernetes cluster in GCP before deploying it. Navigate to Kubernetes Engine in the Google console and start creating a cluster by clicking the Create button at the top. To begin with, we can select the GKE Standard cluster type and configure it with the default options. By default, GKE creates three nodes in the cluster (if you are using the free tier of GCP, you might sometimes face issues creating a cluster with three nodes, in which case try reducing the number of nodes from the NODE POOL tab when creating the cluster). It will take a couple of minutes for the cluster to be ready. To check whether the cluster is ready, execute the command kubectl get nodes from the cloud shell and see if it prints the nodes as shown below:

partha@cloudshell:~ (medium-article-323805)$ kubectl get nodes
NAME                                       STATUS   ROLES    AGE   VERSION
gke-cluster-1-default-pool-122dcf3d-gk37   Ready    <none>   13d   v1.20.8-gke.900
gke-cluster-1-default-pool-122dcf3d-mv6x   Ready    <none>   13d   v1.20.8-gke.900
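Incidentally, the cluster can also be created from the cloud shell instead of the console; a command along these lines would do (the cluster name, zone, and node count here are illustrative):

gcloud container clusters create cluster-1 --num-nodes 2 --zone us-central1-a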

And now one can deploy the built image in the cluster using the below YAML file:
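Here is a sketch of such a manifest; the object names, labels, and replica count are illustrative, and the command mirrors the gunicorn invocation from the GCE section, since the image itself does not define a CMD:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: object-detection
spec:
  replicas: 2
  selector:
    matchLabels:
      app: object-detection
  template:
    metadata:
      labels:
        app: object-detection
    spec:
      containers:
      - name: object-detection
        image: gcr.io/<PROJECT_ID>/object-detection-gke:v1
        # Start the Flask app via gunicorn, as in the GCE deployment
        command: ["gunicorn", "--bind", ":8080", "--workers", "1", "--threads", "8", "main:app"]
        ports:
        - containerPort: 8080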

Applying the above YAML file using the below command from the Google cloud shell will create a Deployment object in the cluster:

kubectl create -f <path_to_deployment.yaml_file>

The Deployment object creates pods that run the Docker containers. Once the pods are up and running, in order to access the application we need to create a Service that exposes it to the outside world. This can be done using the EXPOSE button available in the Workloads section of the GKE console.

A snippet from GKE Console
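The same Service can also be created from the cloud shell; assuming the Deployment is named object-detection, as in the manifest above, a command like the following puts a load-balanced Service in front of the pods:

kubectl expose deployment object-detection --type=LoadBalancer --port=80 --target-port=8080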

Either way, this creates a Service object and gives an external endpoint that can be used for accessing the application. And that concludes the deployment in GKE 😃

Google Cloud Run


This is the first serverless deployment option that we will explore in this article. Google Cloud Run is a highly scalable platform for deploying containerized applications. It completely abstracts away what one needs to take care of when scaling an application up and down; even in GKE, we are responsible for the number of nodes in the cluster, which in a way controls the scaling factor. And since these are container-based workloads, there is no restriction on the language, library, or binary support that an application needs.

Deploying an application in Cloud Run starts with creating a Docker image that runs the application. The following Dockerfile can be used for building the image. It is more or less similar to the Dockerfile used in the earlier deployments, except that we add the line that starts the application at the end of the file:
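Concretely, a sketch of such a Dockerfile could look like this; the only new piece compared to the earlier one is the CMD line, which binds gunicorn to the port that Cloud Run injects via the PORT environment variable:

FROM python:3.8-slim

WORKDIR /app

# Runtime dependencies of main.py; pinning versions is advisable in practice
RUN pip install --no-cache-dir mxnet gluoncv flask gunicorn

COPY main.py .

# Cloud Run tells the container which port to listen on via $PORT
CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 main:app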

Once the image is built, it can be pushed to the Google Container Registry (GCR) by following the procedure outlined in the last section. Once the image is available in GCR, deploying it as a Cloud Run service is pretty straightforward from the Cloud Run console, using the following procedure:

1. Navigate to Cloud Run in the Google console and select Create Service
Google Cloud Run Console

2. Now select the container that you would like to deploy, and in the Autoscaling section, set the minimum number of instances to 1 to avoid cold starts.

3. Under the Advanced settings, make sure that you specify the port number, so that requests are forwarded to the container on that port.

4. Now, depending on the type of application, select the memory and CPU requirements accordingly in the Capacity section. For deploying this application, 512 MB of memory with 1 CPU is sufficient.

5. Finally, before we create the service, make sure that you select Allow all traffic in the Ingress settings. Since this is just a demo deployment, we allow all ingress traffic, but in general it is good practice to secure the endpoint.

Upon clicking Create, the service is created and an API endpoint is returned, which can be used to access the application. And now you have successfully deployed an application in Google Cloud Run 😃
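The endpoint can be tested the same way as before; assuming the same image form field, something like:

curl -X POST -F "image=@test.jpg" https://<cloud-run-service-url>/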

Google App Engine


Google App Engine is yet another serverless framework that one can leverage for application deployment. Unlike Google Cloud Run, App Engine is not a Docker-based deployment (or at least we don't have to create an image for deployment). Google App Engine comes in two flavors: the Standard environment and the Flexible environment. While the Standard environment has limitations with respect to the languages supported, it has the advantage of faster deployment and runs at a lower cost compared to the Flexible environment, which supports any language, since Flexible applications run in containerized environments on Compute Engine virtual machines.

Since we are deploying a Python-based application here, apart from the application source code, i.e. the main.py we already created, all we need is a requirements.txt file that includes all the package dependencies of the application and a YAML file that specifies the configuration of the App Engine. We use the following YAML file and requirements.txt file for deployment.
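An app.yaml sketch along the following lines would work; the runtime, instance class, and entrypoint here are assumptions (a larger instance class is picked so that the model fits in memory):

runtime: python38
instance_class: F4
entrypoint: gunicorn --bind :$PORT --workers 1 --threads 8 main:app

And a matching requirements.txt, unpinned for brevity:

flask
gunicorn
mxnet
gluoncv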

Once we have all these files, we are good to deploy the application. As a first step, we need to enable the application from the Google App Engine console.

Once we have the application enabled, we can deploy it with just one command:

gcloud app deploy

This will deploy the application and expose an endpoint for accessing it, which can be viewed from the App Engine console.

And that concludes our deployment in GAE as well 😃

Key Takeaway

The following table summarizes the deployment frameworks that were discussed above from the perspective of deploying a Machine Learning model in production.
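In brief:

Framework                  Type           What you deploy            Scaling responsibility
Google Compute Engine      Server-based   A VM running a container   Entirely on the developer
Google Kubernetes Engine   Server-based   Containers on a cluster    Pods scaled by K8s; node pool sized by the developer
Google Cloud Run           Serverless     A container image          Fully managed by GCP
Google App Engine          Serverless     Application source code    Fully managed by GCP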

And that's a wrap-up, folks! I hope this article is of some use to people wanting to get started with deployment in GCP 😃

