Dockers 101 – Series 8 of N – Stateful containers and Importing and Exporting containers

  • Stateless vs Stateful containers
    • Stateless – they don’t need to maintain the state of an application
      • e.g. the TicTacToe game container we created is a simple game: we just want the game to run when the container image is downloaded. We are not maintaining any users, their scores or anything like that.
    • Stateful – they need the application state to be maintained on some storage volume, e.g. in a database we store the users, scores, history of the games etc.
  • Approaches for Stateful containers
    • -v <host-dir>:<container-dir> parameter option
      • The -v host-dir:container-dir option instructs Docker to map a host directory to a container directory. It can be a good option for some scenarios, but it is not an effective general solution: what if the container is run on another Docker host where that host directory does not exist?
    • Using Data Containers
      • they are responsible for storing data
      • but they don’t run like other containers
      • they hold the data/volume and are referenced by other containers who want to use this volume
  • Data containers in action
    • Let's use BusyBox (one of the smallest Linux distributions); we will use this container to hold our data and to be referenced by other containers
    • We will use the docker create command to create a new container, passing the -v parameter to create a container folder
    • We will then copy the configuration file from a host folder to the container folder
    • Now, with the new data container created, we will mount it on an Ubuntu container using the --volumes-from option
    • We will see that, since our data container is mounted as a volume, the config file is visible inside the Ubuntu container.
    • This data container can be exported and imported too.
    • # create a config file
      echo "test=true" >> config.conf

      # create a container with a specific name, with the -v option to create a folder in the container
      # (busybox is a very small image)
      docker create -v /config --name naeemsDataContainer busybox

      # copy data from the local folder into the container
      docker cp config.conf naeemsDataContainer:/config/

      # run an ubuntu container, referencing naeemsDataContainer via the --volumes-from option
      docker run --volumes-from naeemsDataContainer ubuntu ls /config

      # export the container
      docker export naeemsDataContainer > naeemsDataContainer.tar

      # import the container
      docker import naeemsDataContainer.tar

      # check the docker images and see the imported image
      docker images

      # check docker containers
      # (you will not see naeemsDataContainer running, as it does not actually run; it is just a mount volume for other containers)
      docker ps -a


Apache Spark – A Deep Dive – Series 9 of N – Analysis of most popular movies – using SparkSQL

Problem:

  • Analyse the Most Popular Movie in a more optimized way:
    • Spark Core has efficient mapper, reducer and event functions to analyse complex data, BUT
      • to get the output we had to write a lot of logic to create key-value pairs,
      • a lot of lambda operations to aggregate the data, etc.
      • we were not using data in a structured format; with structure, queries can be optimized, and exporting/importing to and from other databases gets a lot easier
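To make the contrast concrete, here is the kind of hand-rolled key-value aggregation Spark Core required, sketched in plain Python over a few made-up sample lines in the ml-100k u.data format (an illustration of the pattern, not the actual Spark job):

```python
from collections import Counter

# made-up sample lines in the ml-100k u.data format: userID movieID rating timestamp
lines = [
    "196 242 3 881250949",
    "186 302 3 891717742",
    "22 377 1 878887116",
    "244 242 2 880606923",
]

# step 1: split each line and emit (movieID, 1) key-value pairs
pairs = [(int(line.split()[1]), 1) for line in lines]

# step 2: reduce by key, i.e. what reduceByKey(lambda a, b: a + b) does in Spark Core
counts = Counter()
for movieID, one in pairs:
    counts[movieID] += one

# step 3: sort by count, descending, to get the most popular movies
topMost = counts.most_common()
print(topMost)  # movieID 242 comes first, with a count of 2
```

SparkSQL collapses all three steps into a single groupBy/count/orderBy call, as shown later in this post.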

Strategy:

  • In addition to Spark Core we will use SparkSQL
    • SparkSQL gives a structure to the data
    • We will use two terms a lot – DataFrames and Datasets
    • DataFrame
      • a schema view of an RDD.
      • in an RDD, each row is a key-value pair
      • in a DataFrame, each row is a Row object
    • DataSet
      • an object-oriented (OOP) view of an RDD.
      • in a Dataset, each row is a named, typed object
      • in other words, a Dataset is a DataFrame typed as a named object
  • Advantages of using Spark SQL
    • abstracts the internal intricacies of an RDD by exposing APIs to handle the data
    • can be extended by using user-defined functions
    • if each line is a Row object, you can use the power of SQL-like querying to process data across a cluster as if it were a single database
    • export/import data using JDBC, JSON, etc.
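To see the last two points in action without a cluster, here is the same top-movies query expressed as plain SQL, with Python's built-in sqlite3 standing in for the SparkSQL view (the table and column names mirror the example below; the rows are made up):

```python
import sqlite3

# in-memory table standing in for the tblRatings temp view
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblRatings (movieID INTEGER)")

# made-up ratings: movie 242 is rated twice, 302 and 377 once each
conn.executemany("INSERT INTO tblRatings VALUES (?)",
                 [(242,), (302,), (377,), (242,)])

# the same shape of query SparkSQL would distribute across the cluster
rows = conn.execute(
    "SELECT movieID, COUNT(movieID) AS cnt FROM tblRatings "
    "GROUP BY movieID ORDER BY cnt DESC LIMIT 5"
).fetchall()
print(rows)  # movie 242 comes first, with a count of 2
conn.close()
```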

Solution:

  • Explanation of the code
    • Row Object: See how, instead of returning a key-value pair, it returns a Row object whose column name is movieID. So this RDD will hold one column, storing movie IDs
      • # python function to return a Ratings Row Object
        def processRatings(line):
            fields = line.split()
            mvID = int(fields[1])
            return Row(movieID=mvID)

    • DataFrame: See how a Row based RDD is converted to a DataFrame
      • ratingsDataset = session.createDataFrame(ratings)

    • Processing the DataFrame: see how, in one line, we apply SQL-like logic to process the data using functions like groupBy, count, orderBy, etc.
      • topMostMovieIDs = ratingsDataset.groupBy("movieID").count().orderBy("count", ascending=False).cache()

    • Spark SQL statements:
      • ratings.createOrReplaceTempView("tblRatings")

      • spark.sql("SELECT movieID, count(movieID) AS cnt FROM tblRatings GROUP BY movieID ORDER BY cnt DESC LIMIT 5")

  • Please download the code from either of these locations:
    • wget https://testbucket786786.s3.amazonaws.com/spark/sparkTopMostMoviesUsingSparkSQL.py
    • wget https://testbucket786786.s3.amazonaws.com/spark/sparkTopMostMoviesUsingSparkSQLQuery.py
    • OR
    • git clone https://gist.github.com/naeemmohd/1d645ccdef3cbb0d564fe4cb483810af
    • OR
    • # import SparkSession, Row and functions from the pyspark.sql module
      from pyspark.sql import SparkSession
      from pyspark.sql import Row
      from pyspark.sql import functions

      # python function to return a movies dictionary
      def processMovies():
          movies = {}
          with open("/home/user/bigdata/datasets/ml-100k/u.item") as mfile:
              for line in mfile:
                  fields = line.split("|")
                  movieID = int(fields[0])
                  movieName = fields[1]
                  movies[movieID] = movieName
          return movies

      # python function to return a Ratings Row object
      def processRatings(line):
          fields = line.split()
          mvID = int(fields[1])
          return Row(movieID=mvID)

      # python function to print results
      def printResults(results):
          for result in results:
              print("\n%s:\t%d" % (moviesDictionary[result[0]], result[1]))

      # create a SparkSession
      session = SparkSession.builder.appName("MostPopularMovies").getOrCreate()

      # load the movies
      moviesDictionary = processMovies()

      # load the raw ratings data
      rawData = session.sparkContext.textFile("/home/user/bigdata/datasets/ml-100k/u.data")

      # convert the ratings to an RDD of Row objects
      ratings = rawData.map(processRatings)

      # convert the RDD of Row objects into a DataFrame
      ratingsDataset = session.createDataFrame(ratings)

      # process the DataFrame
      topMostMovieIDs = ratingsDataset.groupBy("movieID").count().orderBy("count", ascending=False).cache()

      # show all topMostMovieIDs
      topMostMovieIDs.show()

      # collect results for the top 5 movies
      topMost5MovieIDs = topMostMovieIDs.take(5)

      # print the movie names with their ratings counts
      printResults(topMost5MovieIDs)

      # close the spark session
      session.stop()

The Output:


Dockers 101 – Series 7 of N – Setting up a NodeJs Docker Application

  • Requirement:
    • Setting up a NodeJs Docker Application
  • Strategy:
    • Create the files needed to run the NodeJS application
    • Create a Dockerfile
    • Build, run, push and pull the image
    • How to use ONBUILD to delay a dependency till build time
  • Solution:
    • Login to your Host machine(in my case a CentOS 7 machine)
    • Make a directory “mynodejs” and go to the directory – mkdir mynodejs && cd mynodejs
    • Create a file package.json with the following content and save
      • {
          "name": "my_docker_nodejs_app",
          "version": "1.0.0",
          "description": "My Docker NodeJs App",
          "author": "Mohd Naeem <naeem.mohd@hotmail.com>",
          "main": "server.js",
          "scripts": {
            "start": "node server.js"
          },
          "dependencies": {
            "express": "^4.16.1"
          }
        }

    • Create a file server.js with the following content and save
      • 'use strict';
        const express = require('express');
        // Constants
        const PORT = 8080;
        const HOST = '0.0.0.0';
        // App
        const app = express();
        app.get('/', (req, res) => {
          res.send('Hello world\n');
        });
        app.listen(PORT, HOST);
        console.log(`Running on http://${HOST}:${PORT}`);

    • Create a file Dockerfile with the following content and save
      • # starting from base image node:7-alpine
        FROM node:7-alpine
        # create an app directory in the container
        RUN mkdir -p /src/app
        # set up the working directory
        WORKDIR /src/app
        # install app dependencies
        # a wildcard ensures both package.json and package-lock.json are copied
        # (package-lock.json exists for npm 5+)
        COPY package*.json /src/app
        # for a PROD env use the flag --only=production
        # e.g. RUN npm install --only=production
        # running npm install for a non-prod env
        RUN npm install
        # bundle the app source
        COPY . /src/app

        # expose port 8080 (the port server.js listens on)
        EXPOSE 8080

        # run command to start npm
        CMD [ "npm", "start" ]

    • Create a file .dockerignore with the following component and save
      • node_modules
        npm-debug.log

    • Now build the app-
      • docker build -t mynodejsapp-image:v1 .
    • Now run the container to run the website
      • docker run -d -p 49160:8080 mynodejsapp-image:v1
    • Check the content
      • curl -i localhost:49160
    • Now check for the image name for your app and tag it for pushing it to Docker Hub
      • docker images # to check for image name
      • docker tag image username/repository:tag # for tagging
        • docker tag 4ffd91cdc6a0 mnaeemsiddiqui/naeemsrepo:mynodejsapp-image-v1
      • docker login # to login to the Docker hub
    • Now push the image to Docker Hub
      • docker push mnaeemsiddiqui/naeemsrepo:mynodejsapp-image-v1
    • Now pull the image from Docker Hub
      • docker pull mnaeemsiddiqui/naeemsrepo:mynodejsapp-image-v1
    • Now run it on another server
      • docker run -d -p 49160:8080 mnaeemsiddiqui/naeemsrepo:mynodejsapp-image-v1
      • curl -i localhost:49160
    • Using ONBUILD to delay execution of dependencies
    • Let's update the Dockerfile with the content below
    • The big difference is that, by using the ONBUILD keyword, we delay copying package.json, running npm install, and copying the application source until a downstream image is built FROM this one
      • # starting from base image node:7-alpine
        FROM node:7-alpine
        # create an app directory in the container
        RUN mkdir -p /src/app
        # set up the working directory
        WORKDIR /src/app
        # install app dependencies
        # a wildcard ensures both package.json and package-lock.json are copied
        # (package-lock.json exists for npm 5+)
        ONBUILD COPY package*.json /src/app
        # for a PROD env use the flag --only=production
        # e.g. ONBUILD RUN npm install --only=production
        # running npm install for a non-prod env
        ONBUILD RUN npm install
        # bundle the app source
        ONBUILD COPY . /src/app

        # expose port 8080 (the port server.js listens on)
        EXPOSE 8080

        # run command to start npm
        CMD [ "npm", "start" ]
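The ONBUILD instructions above do not run when this image itself is built; they fire when a downstream image builds FROM it. Assuming the image above was built and tagged as mynodejsapp-onbuild:v1 (a hypothetical tag), a downstream app's Dockerfile could be as small as:

```dockerfile
# the base image's ONBUILD COPY package*.json, RUN npm install and COPY . trigger here
FROM mynodejsapp-onbuild:v1
```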

    • Now build and run the application once again.

Dockers 101 – Series 6 of N – Using a Dockerfile to run a static website on an nginx server

  • Requirement:
    • To run a static website using nginx server
  • Strategy:
    • Docker uses a Dockerfile to define everything that will go into a container
    • For above requirement we need the following:
      • nginx web server
      • a working directory with some static html content
      • copying the contents to nginx server
      • build the app
      • push the container image to Docker Hub (you will need to create a Docker Hub account and a repository under the account; please visit hub.docker.com)
      • pull the image 
      • run the container
  • Solution:
    • Login to your Host machine(in my case a CentOS 7 machine)
    • Make a directory “myweb” and go to the directory – mkdir myweb && cd myweb
    • Create an html file with some content
      • echo "<h1>Hi, this is a static web page</h1>" > index.html
    • Now create a Dockerfile – nano Dockerfile
    • Copy the following content into it and save; the Dockerfile is short and self-explanatory:
    • FROM nginx:alpine
      COPY . /usr/share/nginx/html
    • Now build the app-
      • docker build -t mywebserver-image:v1 .
    • Now run the container to run the website
      • docker run -d -p 80:80 mywebserver-image:v1
    • Check the content
      • curl localhost
    • Now check for the image name for your app and tag it for pushing it to Docker Hub
      • docker images # to check for image name
      • docker tag image username/repository:tag # for tagging
        • docker tag 4ffd91cdc6a0 mnaeemsiddiqui/naeemsrepo:mynginxwebserverv1
      • docker login # to login to the Docker hub
    • Now push the image to Docker Hub
      • docker push mnaeemsiddiqui/naeemsrepo:mynginxwebserverv1
    • Now that you have a docker image on docker hub, you can
      • pull the docker image – docker pull mnaeemsiddiqui/naeemsrepo:mynginxwebserverv1
      • to run your app – docker run -d -p 80:80  mnaeemsiddiqui/naeemsrepo:mynginxwebserverv1
    • Now update the Dockerfile to add EXPOSE and CMD commands
    • FROM nginx:1.11-alpine
      COPY index.html /usr/share/nginx/html/index.html
      EXPOSE 80
      CMD ["nginx", "-g", "daemon off;"]

    • Build, run, push, pull and run.
    • Now let's use a docker-compose.yml; copy the content below and save.
    • version: '3.3'
      services:
        web:
          image: nginx:alpine
          working_dir: /usr/share/nginx/html
          volumes:
            - ./:/usr/share/nginx/html
          expose:
            - "8080"
          ports:
            - "8080:80"
          environment:
            - NGINX_HOST=localhost
            - NGINX_PORT=80
          command: "nginx -g 'daemon off;'"

    • run – docker-compose up -d
    • Yay! You containerized your app, pushed it to Docker Hub, pulled that image, and ran the container to run your application.

Dockers 101 – Series 5 of N – Using docker-setting up a MySQL/MariaDB container

  • Requirement:
    • Let's imagine that as a DevOps engineer, you have been asked to create a container to run MySQL/MariaDB, and to try different docker commands to run MySQL/MariaDB in the foreground, in the background, with a specific port binding, with a dynamic port binding, and persisting data and logs from the container to a volume on the host
  • Strategy:
    • search the name of image on docker hub
    • run the MySQL/MariaDB container in background as its a database and will take time to setup
    • run MySQL/MariaDB in background
    • run MySQL/MariaDB with specific port
    • run MySQL/MariaDB with dynamic port
    • run MySQL/MariaDB with volume persistance
  • Solution:
    • Login to your Host machine(in my case a CentOS 7 machine)
    • Make a directory “mymariadb” and go to the directory – mkdir mymariadb && cd mymariadb
    • How to:
      • search for an image using filters and limits – docker search --filter "is-official=true" --limit 5 mariadb
      • run an image
        • in interactive mode – docker run --name mymariadb-fg -e MYSQL_ROOT_PASSWORD=mypasswordfg -it mariadb:latest
        • in background mode – docker run --name mymariadb-bg -e MYSQL_ROOT_PASSWORD=mypasswordbg -d mariadb:latest
      • Check logs –
        • docker logs -f 506290cb3cba # show continuous logs as it generates
      • Using docker-compose:

      • version: '3.1'
        
        services:
        
          db:
            image: mariadb
            restart: always
            environment:
              MYSQL_ROOT_PASSWORD: testpass
        
          adminer:
            image: adminer
            restart: always
            ports:
              - 8080:8080
      • Create a file ‘docker-compose.yml’ and copy the above content and save.
      • run 'docker-compose up' to bring up the containers
      • run 'docker exec -it <container-id> bash', e.g. docker exec -it 71b9352ecef5 bash, to get a shell inside the container and work with the database

Dockers 101 – Series 4 of N – Using docker-setting up a redis container

  • Requirement:
    • Let's imagine that as a DevOps engineer, you have been asked to create a container to run redis, and to try different docker commands to run redis in the foreground, in the background, with a specific port binding, with a dynamic port binding, and persisting data and logs from the container to a volume on the host
  • Strategy:
    • search the name of image on docker hub
    • run the redis container in the background, as it's a database and will take time to set up
    • run redis in background
    • run redis with specific port
    • run redis with dynamic port
    • run redis with volume persistance
  • Solution:
    • Login to your Host machine(in my case a CentOS 7 machine)
    • Make a directory "myredis" and go to the directory – mkdir myredis && cd myredis
    • How to
      • search for an image – docker search <image-name>
      • e.g. docker search redis
      • run an image
        • in interactive mode – docker run -it <image-name>
          • docker run -it redis
          • the switch "-it" is for interactive (foreground) mode
        • in background mode – docker run -d <image-name>
          • docker run -d redis
          • the switch "-d" is for background (detached) mode
      • check if container is running – docker container ls or docker ps -a
      • check logs of a container – docker logs <container-id>
        • docker logs a15268c9c0be # show full log
        • docker logs –tail 10 a15268c9c0be # show last 10 lines
        • docker logs -f a15268c9c0be # show continuous logs as it generates
      • To stop a container – docker container stop a15268c9c0be
      • To start a container – docker container start a15268c9c0be
      • To kill a container – docker container kill a15268c9c0be
      • to run redis on a specific port with host-container port forwarding (not a good practice, because we are hard-coding the port)
        • docker run -d --name redisHostStaticPort -p 6379:6379 redis:latest
      • to run redis with a dynamic host port (Docker picks a free host port for you)
        • docker run -d --name redisHostDynamicPort001 -p 6380 redis:latest
        • But now you don’t know which host port was assigned
        • so use command – docker port redisHostDynamicPort001 6380
          • the host port assigned was 32770
        • Now you can run another instance too
          • docker run -d --name redisHostDynamicPort002 -p 6381 redis:latest
          • docker port redisHostDynamicPort002 6381
            • now another port was assigned 32551
        • Now you can see that you are running two instances of redis. See snapshot below.
      • Now, to persist the data across redis container restarts, we have to define volumes
        • volumes help you persist the data from the container to the host
        • we can use the switch -v <host-dir>:<container-dir>
        • e.g. docker run -d --name redisHostDynamicPort003 -p 6389 -v "$PWD/data":/data redis:latest
        • $PWD gives the present working directory
        • $PWD/data means the data folder in the current directory of the host
        • /data means the /data folder of the redis container, where redis writes its data
        • -v "$PWD/data":/data defines and attaches a volume, mapping the container folder /data to the host folder data
      • check or inspect a container
        • it provides the metadata about the container
        • docker inspect <containerid-or-name>
        • e.g. docker inspect redis
        • docker inspect --format "{{.RepoTags}}" redis # extracts a specific field from the inspect JSON
    • Running redis in foreground:
      • docker run -it redis 
      • the switch "-it" runs a container in the foreground or interactive mode

Dockers 101 – Series 3 of N – Using docker-compose.yml and docker-compose to setup a WordPress site

  • Requirement:
    • To set up a WordPress website using docker-compose.yml and docker-compose
  • Strategy:
    • Docker uses a docker-compose.yml to define the orchestration.
    • docker-compose is an orchestration tool which uses the docker-compose.yml to setup multiple containers.
    • For above requirement we need the following:
      • WordPress is a multi-tier application with a front end, a business layer and a database back end.
      • One of Docker's guiding ideas is the single-responsibility principle, so these multiple layers will be represented by multiple images and containers
      • Install docker-compose and set up the file docker-compose.yml
  • Solution:
    • Login to your Host machine(in my case a CentOS 7 machine)
    • Make a directory "mywordpress" and go to the directory – mkdir mywordpress && cd mywordpress
    • Now setup and define images and configuration in the docker-compose.yml- nano docker-compose.yml
    • Copy following content into the file and save:
    • Explanation of the code:
      • Download the docker-compose.yml from here – https://testbucket786786.s3.amazonaws.com/docker/docker-compose.yml
      • docker-compose uses some keywords to define the configuration
      • services: uses the services keyword to define services
        • the file defines two services
          • db, for the database image settings
            • image: mysql:5.7
            • a volume for the database files: /var/lib/mysql
            • restart: always
            • and then environment settings like the database, username, password and root password
          • wordpress, for the WordPress settings
            • depends_on: db means that the wordpress service waits for the db service to be ready first
            • image: wordpress:latest
            • ports: "8000:80" means host port 8000 is "port forwarded" to port 80 of the container
            • and then environment settings like host, user and password
      • volumes: uses the volumes keyword to define the volumes to be used and attached in the configuration
      • networks: uses the networks keyword to define the network settings
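Based on the settings described above, the docker-compose.yml looks roughly like this (a sketch reconstructed from the bullets; the credential values and the volume name are placeholders):

```yaml
version: '3.3'

services:
  db:
    image: mysql:5.7
    volumes:
      - db_data:/var/lib/mysql
    restart: always
    environment:
      MYSQL_ROOT_PASSWORD: somerootpass
      MYSQL_DATABASE: wordpress
      MYSQL_USER: wordpress
      MYSQL_PASSWORD: wordpress

  wordpress:
    depends_on:
      - db
    image: wordpress:latest
    ports:
      - "8000:80"
    restart: always
    environment:
      WORDPRESS_DB_HOST: db:3306
      WORDPRESS_DB_USER: wordpress
      WORDPRESS_DB_PASSWORD: wordpress

volumes:
  db_data: {}
```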
    • Install docker-compose tool
    • Build the project – docker-compose up -d
    • Let's access the website: