How I got Cloudera Quickstart VM running on Google Compute Engine


This is a brief synopsis of how I got the Cloudera Quickstart VM running via Docker on Google Compute Engine, which is Google’s cloud equivalent to Amazon’s AWS Cloud Computing Service.

The Cloudera Quickstart VM is a basic “Hadoop-In-A-Box” virtual machine solution which provides a Hadoop ecosystem for developers who wish to quickly test out the basic features of Hadoop without having to deploy an entire cluster. Since it doesn’t entail setting up a cluster, certain features provided by a cluster are missing.

Detailed Steps

Step 0.

Install gcloud on your local workstation as per these instructions:

Step 1. Create a container optimized VM on GCE:

$ gcloud compute --project "myproject-1439" \
instances create "quickstart-instance-1" \
--image container-vm --zone "us-central1-a" \
 --machine-type "n1-standard-2"
quickstart-instance-1 us-central1-a n1-standard-2 RUNNING

I created an n1-standard-2 VM on GCE which has 2vCPUs and 7.GB RAM. It will already have docker pre-installed.

Step 3. Let’s check the image size:

$ docker images
cloudera/quickstart latest 2cda82941cb7 41 hours ago 6.336 GB

Given that our VM has total disk size of 10GB this is cutting it a bit close for the long term, if we wish to install other software.

So let’s create a persistent disk and make that available for storing our docker image:

I was able to create a 200GB persistent disk: bigdisk1

Step 4.  Switch docker image installation directory to use the big persistent disk.

There are a couple of ways to do this as per this article:

The least trouble free way IMO was to mount the bigdisk1 persistent disk to it would be available for use by my VM, and move the default docker image installation directory to it.

First, create a mountpoint for the bigdisk:

$ sudo mkdir /mnt/bigdisk1

Next, mount it:
On GCE, the raw disk can be found at /dev/disk/by-id/google-<diskid>
i.e. /dev/disk/by-id/google-bigdisk1

$ sudo mount -o discard,defaults /dev/disk/by-id/google-bigdisk1 \

Finally, symlink it back to the default image installation directory:

$ sudo ln -s /mnt/bigdisk1/docker/ /var/lib/docker

Presto, we’re now ready. If we run docker pull on any image, the image will be written to the large persistent disk:

$ ls -ltr /var/lib/docker
lrwxrwxrwx 1 root root 21 Apr 9 03:20 /var/lib/docker -> /mnt/bigdisk1/docker/

Step 5. Run the Cloudera Quickstart VM and get access to all the goodies:

$ sudo docker run --hostname=quickstart.cloudera --privileged=true \
-t -i cloudera/quickstart  /usr/bin/docker-quickstart 
Starting zookeeper ... STARTED
starting datanode, logging to /var/log/hadoop-hdfs/hadoop-hdfs-datanode-quickstart.cloudera.out

starting rest, logging to /var/log/hbase/hbase-hbase-rest-quickstart.cloudera.out
Started HBase rest daemon (hbase-rest): [ OK ]
starting thrift, logging to /var/log/hbase/hbase-hbase-thrift-quickstart.cloudera.out
Started HBase thrift daemon (hbase-thrift): [ OK ]
Starting Hive Metastore (hive-metastore): [ OK ]
Started Hive Server2 (hive-server2): [ OK ]
Starting Sqoop Server: [ OK ]
Sqoop home directory: /usr/lib/sqoop2
Setting SQOOP_HTTP_PORT: 12000
Started Impala Catalog Server (catalogd) : [ OK ]
Started Impala Server (impalad): [ OK ]
[root@quickstart /]#