AutoScaling

Google Cloud Platform API Infrastructure Automation

Overview

In this lab, you build your own global IaaS Hadoop cluster deployment service based on open-source software. The service gives you a reliable and secure way to deploy a Hadoop cluster of any size in any GCP region in minutes.
To accomplish this, you create an IAM service account with the role of Project Editor. You authorize and initialize the Google Cloud SDK on a VM. Then you "bake" that authority into a reusable snapshot that can reconstitute the "clustermaker" VM in any GCP region.
The clustermaker VM is a micro type machine with just enough capacity to do the work needed to create and deploy Hadoop clusters using the Google Cloud SDK.
The clustermaker VM must be able to use the Google Cloud API. To do this, you create an IAM service account with a Project Editor role, download the private key, and copy it to the VM. Then you go through an authorization and initialization process on the VM. You modify the VM by installing Git, and then use Git to install an open-source tool called bdutil (Big Data Utility). You modify the configuration to select a machine type for the workers in the cluster. bdutil uses a GCS bucket to stage and install the Hadoop software, and Hadoop is configured to use GCS rather than HDFS for its file system.
After you verify that the Hadoop cluster created by clustermaker is working, you create a snapshot of the boot persistent disk using the snapshot service.
The snapshot can be used to recreate the persistent disk in any region, and the persistent disk can be used to launch a VM in that region. You test this by creating a new clustermaker VM in a different region and using it to launch another Hadoop cluster.

Objectives

In this lab, you learn how to perform the following tasks:
  • Create an IAM service account
  • Create a VM
  • Authorize the VM to use the Google Cloud API using the service account
  • Install open-source software on the VM, configure the software, and test it by deploying a Hadoop cluster
  • Create a snapshot of the boot disk with the Service Account authority "baked in"
  • Recreate the clustermaker VM in a different region and test it by deploying another Hadoop cluster in the new region

What you'll need

To complete this lab, you'll need:
  • Access to a standard internet browser (Chrome browser recommended).
  • Time. Note the lab's Completion time in Qwiklabs, which is an estimate of the time it should take to complete all steps. Plan your schedule so you have time to complete the lab. Once you start the lab, you will not be able to pause and return later (you begin at step 1 every time you start a lab).
  • You do NOT need a Google Cloud Platform account or project. An account, project and associated resources are provided to you as part of this lab.
  • If you already have your own GCP account, make sure you do not use it for this lab.
  • If your lab prompts you to log into the console, use only the student account provided to you by the lab. This prevents you from incurring charges for lab activities in your personal GCP account.

Start your lab

When you are ready, click Start Lab. You can track your lab's progress with the status bar at the top of your screen.

Find Your Lab's GCP Username and Password

To access the resources and console for this lab, locate the Connection Details panel in Qwiklabs. Here you will find the account ID and password for the account you will use to log in to the Google Cloud Platform.
If your lab provides other resource identifiers or connection-related information, it will appear on this panel as well.

Task 1: Create and authorize a VM to use the Cloud SDK

Create a service account

  1. In the GCP Console, on the Products & Services menu, click IAM & admin > Service accounts.
  2. Click Create service account.
  3. Specify the following:
     Property               Value (type value or select option as specified)
     Service account name   clustermaker
     Role                   Project > Editor
  4. Click Furnish a new private key.
  5. For Key type, click JSON.
  6. Click Create. A JSON key file will download. You will need to find this key file and upload it to the VM in a later step.
  7. Rename the JSON key file on your local machine to credentials.json.
  8. Click Close.
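The console steps above can also be approximated from a machine where the Cloud SDK is already authorized with sufficient rights. The following is a sketch rather than part of the lab: the helper names sa_email and create_clustermaker_sa are made up for illustration, and the project ID is a value you supply.

```shell
# Build the email address GCP assigns to a service account.
# Arguments: <account-name> <project-id>
sa_email() {
  printf '%s@%s.iam.gserviceaccount.com\n' "$1" "$2"
}

# Create the service account, grant it Project Editor, and download a JSON key.
# Assumes gcloud is installed and already authorized for the project.
# Argument: <project-id>
create_clustermaker_sa() {
  project_id=$1
  email=$(sa_email clustermaker "$project_id")
  gcloud iam service-accounts create clustermaker \
    --display-name clustermaker --project "$project_id" &&
  gcloud projects add-iam-policy-binding "$project_id" \
    --member "serviceAccount:$email" --role roles/editor &&
  gcloud iam service-accounts keys create credentials.json \
    --iam-account "$email"
}
```

Calling create_clustermaker_sa with your project ID would leave credentials.json in the current directory, which corresponds to the renamed key file from the console steps.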

Create a VM instance

You create the VM with a persistent disk that remains when the VM is deleted. Later, you take a snapshot of this disk and use it as the template for new cluster-provisioning VMs.
  1. On the Products & Services menu, click Compute Engine > VM instances.
  2. Click Create.
  3. Specify the following, and leave the remaining settings as their defaults:
     Property       Value (type value or select option as specified)
     Name           clustermaker
     Zone           us-east1-b
     Machine type   micro (1 shared vCPU)
  4. Click Management, disks, networking, SSH keys.
  5. Click Disks.
  6. Disable Delete boot disk when instance is deleted.
  7. Click Create.
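For reference, a command-line equivalent of these settings might look like the sketch below. It assumes that f1-micro is the machine type behind the console's "micro (1 shared vCPU)" option. The GCLOUD variable is a hypothetical override added here so that the command can be previewed without being run.

```shell
# Create the clustermaker VM and keep its boot disk when the VM is deleted.
# f1-micro is an assumption; GCLOUD is an illustrative override (defaults to gcloud).
create_clustermaker_vm() {
  "${GCLOUD:-gcloud}" compute instances create clustermaker \
    --zone us-east1-b \
    --machine-type f1-micro \
    --no-boot-disk-auto-delete
}
```

Setting GCLOUD=echo before calling create_clustermaker_vm prints the arguments instead of creating the instance.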

Authorize the VM to use the Cloud SDK

This VM will use API calls to create the Hadoop cluster, so it must have the Google Cloud SDK installed and it must be authorized to use the API.
The gcloud command-line tool is part of the Google Cloud SDK, so if gcloud is present, the SDK is installed. The -v (version) option lists the Google Cloud SDK version.
  1. For clustermaker, click SSH to launch a terminal and connect.
  2. To check whether the SDK is installed on this VM, run the following:
gcloud -v
Are the gcloud tool and the SDK installed on this standard GCP base image?
  3. To see whether the permissions are set to make calls to Google Cloud, run the following:
gcloud compute zones list
What was the result?
  4. To upload credentials.json through the SSH VM terminal, click on the gear icon in the upper-right corner, and then click Upload file.
  5. Select credentials.json and upload it.
  6. Click Close in the File Transfer window.
  7. To authorize the VM with the credentials you just uploaded, run the following:
gcloud auth activate-service-account --key-file credentials.json
  8. To verify that the VM is now authorized to use the Google Cloud Platform API, run the following:
gcloud compute zones list
This time the command should succeed.
  9. To delete the credentials file, run the following:
rm credentials.json
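The authorization steps can be collected into one guarded function, sketched below. The helper names is_service_account_key and authorize_vm are illustrative. The check relies on the "type": "service_account" field that service account JSON keys contain.

```shell
# Return success if the file looks like a service account JSON key.
is_service_account_key() {
  [ -f "$1" ] && grep -q '"type": *"service_account"' "$1"
}

# Activate the key, confirm API access, then remove the key file.
# Argument: [key-file] (defaults to credentials.json)
authorize_vm() {
  key_file=${1:-credentials.json}
  is_service_account_key "$key_file" || {
    echo "not a service account key: $key_file" >&2
    return 1
  }
  gcloud auth activate-service-account --key-file "$key_file" &&
  gcloud compute zones list >/dev/null &&
  rm -- "$key_file"
}
```

The key file is removed only after both gcloud commands succeed, so a failed activation leaves the file in place for another attempt.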

Task 2: Customize the VM

You customize the VM and install necessary software. The final step is to create the Hadoop cluster using the customized VM to verify that it is working.
You use an open-source cluster automation tool called bdutil. The code is open source and is hosted in a Git repository. To copy that code to the VM, you first need to install Git.

Install Git and clone the source code

  1. In the clustermaker SSH terminal, make sure the package index is up to date:
sudo apt-get -qq update
  2. To install Git, run the following:
sudo apt-get install -y -qq git
  3. To download the code to the VM, run the following:
git clone \
https://github.com/GoogleCloudPlatform/bdutil
  4. Navigate to the bdutil directory:
cd bdutil

Modify the configuration

  1. Open bdutil_env.sh with nano:
nano bdutil_env.sh
  2. Locate the line GCE_MACHINE_TYPE=. The machine type should be set to n1-standard-4, which creates worker nodes with 4 vCPUs each. That is a good size for a Hadoop cluster, but for demonstration purposes you will use a smaller machine.
  3. Edit the file by changing the value of GCE_MACHINE_TYPE to n1-standard-1.
  4. To save the file and exit nano, press Ctrl+O, ENTER, Ctrl+X.
  5. Exit the SSH terminal.
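If you would rather script the edit than use nano, a substitution along the following lines should have the same effect. This is a sketch: set_machine_type is an illustrative helper, and it assumes the setting appears on a line of its own that starts with GCE_MACHINE_TYPE=.

```shell
# Set GCE_MACHINE_TYPE in a bdutil_env.sh-style file.
# Arguments: <machine-type> [file] (file defaults to bdutil_env.sh)
set_machine_type() {
  machine_type=$1
  env_file=${2:-bdutil_env.sh}
  tmp_file=$(mktemp) || return 1
  # Rewrite only the GCE_MACHINE_TYPE line; all other lines pass through unchanged.
  sed "s|^GCE_MACHINE_TYPE=.*|GCE_MACHINE_TYPE='$machine_type'|" \
    "$env_file" > "$tmp_file" &&
  cat "$tmp_file" > "$env_file" &&
  rm -f "$tmp_file"
}
```

From the bdutil directory, set_machine_type n1-standard-1 would make the same change as steps 1 through 4.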

Task 3: Test the clustermaker

You need a Google Cloud Storage bucket. bdutil stages intermediate files in the bucket during cluster VM creation and configuration. The bucket also serves as the base for the file system used by Hadoop, because Hadoop can use Cloud Storage instead of the Hadoop Distributed File System (HDFS). (The design of HDFS was based on the Google File System, an earlier Google storage system.)

Create a Cloud Storage bucket

  1. In the GCP Console, on the Products & Services menu, click Storage > Browser.
  2. Click Create Bucket.
  3. Specify the following:
     Property                  Value (type value or select option as specified)
     Name                      Enter a globally unique name (common practice is to use part of the Project ID - {part of project id}-hadoop-storage). Do not exceed 64 characters.
     Default storage class     Multi-regional
     Multi-Regional location   United States
  4. Click Create.
  5. Make a note of the bucket name; it will be referred to as [YOUR_BUCKET_NAME].
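The bucket could also be created with gsutil, as in the sketch below. The helper names are illustrative, and the storage class and location values are assumed to correspond to the console choices above.

```shell
# Derive a bucket name from (part of) a project ID, following the naming hint above.
bucket_name_for() {
  printf '%s-hadoop-storage\n' "$1"
}

# Create a multi-regional bucket in the United States.
# Assumes gsutil is installed and authorized. Argument: <project-id-part>
create_bucket() {
  gsutil mb -c multi_regional -l us "gs://$(bucket_name_for "$1")"
}
```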

Create a Hadoop Cluster

  1. On the Products & Services menu, click Compute Engine > VM instances.
  2. For clustermaker, click SSH to launch a terminal and connect.
  3. Create an environment variable for the bucket you just created:
export BUCKET_NAME=<Enter YOUR_BUCKET_NAME here>
  4. Choose a unique ID for your Hadoop cluster. For example, use the local-part of your email address (the part before the @ symbol). Note: The unique ID cannot be longer than 64 characters and cannot contain any special characters. This is a JVM limitation, and the script will fail if the name is too long. This unique ID will be referred to as [YOUR_UNIQUE_ID].
  5. Store the unique ID for your Hadoop cluster in an environment variable:
export UNIQUE_ID=<Enter YOUR_UNIQUE_ID here>
  6. Navigate to the bdutil folder:
cd bdutil
  7. Create the first Hadoop cluster with your unique ID:
./bdutil -b $BUCKET_NAME \
-z us-east1-b \
-n 2 \
-P $UNIQUE_ID \
deploy
  8. When prompted, type y, then press ENTER.
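Because a bad unique ID only shows up as a failed deployment, it can help to check the value first. The sketch below is one possible check. It reads "no special characters" as "lowercase letters and digits only", which is an assumption of this example rather than a rule stated by bdutil, and valid_unique_id is an illustrative name.

```shell
# Return success if the ID is non-empty, at most 64 characters long,
# and made up only of lowercase letters and digits (assumed rule).
valid_unique_id() {
  id=$1
  [ -n "$id" ] || return 1
  [ "${#id}" -le 64 ] || return 1
  case $id in
    *[!a-z0-9]*) return 1 ;;
  esac
  return 0
}
```

For example, running valid_unique_id "$UNIQUE_ID" || echo "choose a different ID" before the deploy command catches an ID such as first.last, which contains a dot.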

Wait for completion and persist environment variables

  1. Return to the VM instances page in the GCP Console. Notice the VM instances being created by the clustermaker VM. Each instance is named after your unique ID. The master VM has the suffix -m appended, and the workers have the suffix -w appended.
  2. Open the ~/.bashrc file using nano:
nano ~/.bashrc
  3. To persist your environment variables, copy and then paste the following at the end of the file (these are the same commands you ran earlier to set the environment variables for the current session):
export BUCKET_NAME=<Enter YOUR_BUCKET_NAME here>
export UNIQUE_ID=<Enter YOUR_UNIQUE_ID here>
  4. To save the file and exit nano, press Ctrl+O, ENTER, Ctrl+X.
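The same result can be reached without an editor by appending the export lines from the shell. The sketch below uses an illustrative helper, persist_var, which skips the append when the file already exports that variable, so running it twice does not add a duplicate line.

```shell
# Append "export NAME='VALUE'" to a file unless NAME is already exported there.
# Arguments: <name> <value> [file] (file defaults to ~/.bashrc)
persist_var() {
  name=$1
  value=$2
  rc_file=${3:-$HOME/.bashrc}
  if grep -q "^export $name=" "$rc_file" 2>/dev/null; then
    return 0
  fi
  printf "export %s='%s'\n" "$name" "$value" >> "$rc_file"
}
```

With both variables still set in the current session, persist_var BUCKET_NAME "$BUCKET_NAME" and persist_var UNIQUE_ID "$UNIQUE_ID" write the two lines shown in step 3.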

Task 4: Verify the Hadoop cluster

You now verify that the cluster is working. Although you could run a MapReduce job, in the interest of time, just verify that the Hadoop file system is functioning correctly.
  1. In the clustermaker SSH terminal, SSH into the Hadoop Master VM by running the following:
gcloud compute ssh --zone=us-east1-b $UNIQUE_ID-m
  2. To create a Hadoop file system directory, run the following:
hadoop fs -mkdir testsetup
  3. To download a file to use for the test, run the following:
curl \
http://hadoop.apache.org/docs/current/\
hadoop-project-dist/hadoop-common/\
ClusterSetup.html > setup.html
  4. To copy the file into the directory you created, run the following:
hadoop fs -copyFromLocal setup.html testsetup
  5. To verify that the file is in Hadoop, run the following:
hadoop fs -ls testsetup
  6. To dump the contents of the file, run the following:
hadoop fs -cat testsetup/setup.html
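Reading the dumped file by eye confirms little, so you may prefer a byte-for-byte comparison against the local copy. The sketch below wraps that comparison in an illustrative helper, matches_local, which runs any command and compares its standard output with a local file.

```shell
# Run a command and compare its standard output with a local file.
# Arguments: <local-file> <command> [args...]
# Returns success only if the two are byte-for-byte identical.
matches_local() {
  local_file=$1
  shift
  "$@" | cmp -s - "$local_file"
}
```

On the Hadoop master, matches_local setup.html hadoop fs -cat testsetup/setup.html && echo "round trip OK" would report whether the copy stored through Hadoop matches the downloaded file.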

Task 5: Take the custom solution global

Exit the terminals

  1. In the SSH terminal, type exit to exit the Hadoop Master SSH terminal session, and then type exit again to exit the clustermaker SSH terminal session. This closes the terminal window.

Delete all the VMs

Recall that when you created the clustermaker VM, you specified that the disk should not be deleted when the VM is deleted.
  1. Return to the VM instances page in the GCP Console.
  2. Select all instances, and then click Delete.
  3. In the confirmation dialog, click Delete.

Create a persistent disk snapshot

  1. On the Products & Services menu, click Compute Engine > Disks. The clustermaker disk should still exist.
  2. To make a snapshot from the disk, click on the three vertical dots at the end of the clustermaker row, and click Create snapshot.
  3. For Name, type clustermaker.
  4. Click Create. Wait until the snapshot has been created before proceeding to the next task.
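A command-line counterpart to the snapshot step is sketched below. The function name snapshot_disk is illustrative, and the GCLOUD variable is a hypothetical override that lets you preview the command.

```shell
# Snapshot a persistent disk.
# Arguments: <disk-name> <zone> <snapshot-name>
# GCLOUD is an illustrative override (defaults to gcloud).
snapshot_disk() {
  "${GCLOUD:-gcloud}" compute disks snapshot "$1" \
    --zone "$2" \
    --snapshot-names "$3"
}
```

For this lab the call would be snapshot_disk clustermaker us-east1-b clustermaker.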

Task 6: Test your global deployment tool

Reconstitute the disk from the snapshot

  1. In the left pane, click Disks.
  2. Click Create disk.
  3. Specify the following:
     Property          Value (type value or select option as specified)
     Name              newdisk1
     Zone              us-central1-c
     Source type       Snapshot
     Source snapshot   clustermaker
Notice that the new disk is in a completely different region and zone than the original.
  4. Click Create. Wait until the disk has been created before proceeding to the next step.

Start a VM from the disk

Create a VM from the new disk.
  1. At the end of the row for newdisk1, click the three vertical dots, and then click Create instance.
  2. For Name, type new-clustermaker.
  3. Click Create.
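The two console procedures above (disk from snapshot, then VM from disk) could be combined into a single function that takes the target zone, as sketched here. The helper names and the GCLOUD override are illustrative. No machine type is passed, so the new VM gets the default machine type rather than the micro type of the original.

```shell
# Derive the region from a zone name, for example us-central1-c -> us-central1.
region_of_zone() {
  printf '%s\n' "${1%-*}"
}

# Recreate the clustermaker VM in another zone from the clustermaker snapshot.
# Argument: <zone>. GCLOUD is an illustrative override (defaults to gcloud).
recreate_clustermaker() {
  zone=$1
  "${GCLOUD:-gcloud}" compute disks create newdisk1 \
    --zone "$zone" --source-snapshot clustermaker &&
  "${GCLOUD:-gcloud}" compute instances create new-clustermaker \
    --zone "$zone" --disk name=newdisk1,boot=yes
}
```

Comparing region_of_zone us-east1-b with region_of_zone us-central1-c confirms that the new VM is in a different region from the original.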

Launch a new Hadoop Cluster in a new region

When the VM is operational, connect via SSH to it and launch a 2-worker cluster in the new region.
  1. For new-clustermaker, click SSH to launch a terminal and connect.
  2. To create the new Hadoop cluster, run the following:
cd bdutil

./bdutil -b $BUCKET_NAME \
-z us-central1-c \
-n 2 \
-P $UNIQUE_ID \
deploy
  3. When prompted, type y, then press ENTER.
In the GCP Console, view the VM instances to see them being created by the new-clustermaker VM.
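Since only the zone changes between the two deployments, the bdutil call can be wrapped in a small function. In the sketch below, deploy_cluster and the BDUTIL override are illustrative names. The function expects BUCKET_NAME and UNIQUE_ID to be set, as they are after the ~/.bashrc edit, and stops with an error message if either is missing.

```shell
# Deploy a Hadoop cluster with bdutil in the given zone.
# Arguments: <zone> [worker-count] (worker count defaults to 2)
# BDUTIL is an illustrative override (defaults to ./bdutil in the current directory).
deploy_cluster() {
  zone=$1
  workers=${2:-2}
  : "${BUCKET_NAME:?set BUCKET_NAME first}"
  : "${UNIQUE_ID:?set UNIQUE_ID first}"
  "${BDUTIL:-./bdutil}" -b "$BUCKET_NAME" \
    -z "$zone" \
    -n "$workers" \
    -P "$UNIQUE_ID" \
    deploy
}
```

From the bdutil directory, deploy_cluster us-central1-c runs the same command as step 2, and bdutil still asks for confirmation before it creates the instances.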
Congratulations! You created a custom snapshot from which to launch Hadoop clusters of any size in any region as needed.

Task 7: Review

What you learned:
  • How to create an IAM service account.
  • How to create a VM.
  • How to authorize a VM to use Google Cloud API using the service account, which is a useful skill for creating automation tools.
  • How to install open-source software on the VM.
  • How to configure and test the VM by deploying a Hadoop cluster.
  • How to create a global solution by generating a snapshot of the boot disk with the service account authority "baked in."
  • How to recreate the clustermaker VM in a different region and test it by deploying another Hadoop cluster in the new region.
There are many ways to provide big data processing in GCP, including BigQuery, Cloud Dataproc, and Cloud Dataflow. You can also use third-party installation tools to deploy Hadoop clusters. In this lab, you learned how to create your own IaaS solution where you have access to and control over all the source code.
During this lab, you learned many IaaS skills that can be leveraged to automate activities through the Google Cloud SDK. This is important for Site Reliability Engineering (SRE). You can build on what you learned here by studying the bdutil source code to see how it works with the Cloud API.

Cleanup

  1. In the Cloud Platform Console, sign out of the Google account.
  2. Close the browser tab.

Last Updated: 2018-04-30

End your lab

When you have completed your lab, click End Lab. Qwiklabs removes the resources you've used and cleans the account for you.
You will be given an opportunity to rate the lab experience. Select the applicable number of stars, type a comment, and then click Submit.
