Wednesday, April 25, 2018

Zeppelin as a Kubernetes Service connecting to secure MapR Cluster

Apache Zeppelin is a web-based notebook for interactive, data-driven analytics. It provides built-in integration for Apache Spark, with interpreters that execute Scala, Python, R and SQL code on Spark, plus shell. Since Kubernetes is becoming more and more popular and a de-facto standard for enterprise-level orchestration, it makes sense to run your Zeppelin notebooks on Kubernetes natively.

This blog assumes you have a secure MapR cluster and a separate Kubernetes (K8S) cluster on which the Zeppelin pods will be spun up. Once the pod has pulled the container image (with the desired configuration and software pre-installed along with its dependencies) and reaches the Running state, all software and services used by Zeppelin are started and ready for you to use.

PRE-REQ:

MapR 6.x secure cluster: a MapR user ticket exists.


bash-4.2$ maprlogin print
Opening keyfile /tmp/maprticket_2147483632
cluster4: user = mapr, created = 'Wed Apr 25 15:42:28 PDT 2018', expires = 'Wed May 09 15:42:28 PDT 2018', RenewalTill = 'Fri May 25 15:42:28 PDT 2018', uid = 2147483632, gids = 2147483632, 1000, CanImpersonate = false
bash-4.2$ cat /opt/mapr/MapRBuildVersion 
6.0.1.20180326182302.GA
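As the output above shows, maprlogin keeps the ticket in /tmp/maprticket_<uid> unless MAPR_TICKETFILE_LOCATION points elsewhere (the pod spec later uses that same variable). A minimal sketch of a helper that resolves which ticket file applies and confirms it is present; the function names are illustrative, not part of MapR:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path of the current user's MapR ticket.
# Honors MAPR_TICKETFILE_LOCATION if set, else the default /tmp/maprticket_<uid>.
mapr_ticket_path() {
  echo "${MAPR_TICKETFILE_LOCATION:-/tmp/maprticket_$(id -u)}"
}

# Hypothetical helper: fail (non-zero status) if the ticket file is missing or empty,
# otherwise print its path.
require_mapr_ticket() {
  local ticket
  ticket="$(mapr_ticket_path)"
  if [ ! -s "$ticket" ]; then
    echo "No MapR ticket at $ticket; run maprlogin first" >&2
    return 1
  fi
  echo "$ticket"
}
```

Usage: `require_mapr_ticket && maprlogin print`.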

K8S cluster is up and running:

[root@tssperf09 ~]# kubectl get nodes
NAME            STATUS    ROLES     AGE       VERSION
tssperf09.lab   Ready     master    33d       v1.9.6
tssperf10.lab   Ready     <none>    33d       v1.9.6
tssperf11.lab   Ready     <none>    33d       v1.9.6
[root@tssperf09 ~]# 
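The same readiness check can be scripted before deploying anything. A minimal sketch, assuming the default `kubectl get nodes` table layout shown above (header row, STATUS in the second column); all_nodes_ready is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl get nodes` output on stdin and succeed only
# if there is at least one node and every node row reports STATUS "Ready".
all_nodes_ready() {
  awk 'NR > 1 && NF > 0 { seen = 1; if ($2 != "Ready") bad = 1 }
       END { exit (bad || !seen) }'
}
```

Usage: `kubectl get nodes | all_nodes_ready && echo "all nodes Ready"`. A cordoned node (STATUS "Ready,SchedulingDisabled") is deliberately treated as not ready here.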


Steps to spin up the Zeppelin pod and access it via a web browser:

1) The following sample YAML file can be used to provision the pod:

[root@tssperf09 datarefinory]# cat ds-refinery-secure.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: dsr-secure-kube                       # POD name
  labels:
      app: dsr-svc
spec:
  containers:
  - name: dsr
    imagePullPolicy: Always
    image: maprtech/data-science-refinery
    ports:
        - containerPort: 9995                  # Zeppelin UI; hostPort maps the container port to the same port on the node
          hostPort: 9995
          protocol: TCP
        - containerPort: 10050
          hostPort: 10050
          protocol: TCP
        - containerPort: 10051
          hostPort: 10051
          protocol: TCP
        - containerPort: 11052
          hostPort: 11052
          protocol: TCP
    securityContext:
      capabilities:
        add: ["SYS_ADMIN" , "SYS_RESOURCE"]
      privileged: true                          # Fuse needs privileged permission to start
    resources:
      requests:
        memory: "2Gi"
        cpu: "500m"
    env:
    - name: MAPR_MOUNT_PATH
      value: /mapr
    - name: MAPR_CLUSTER
      value: cluster4 
    - name: MAPR_CLDB_HOSTS
      value: 10.10.102.95
    - name: MAPR_CONTAINER_USER
      value: mapr
    - name: MAPR_CONTAINER_GROUP
      value: mapr
    - name: MAPR_CONTAINER_PASSWORD
      value: mapr
    - name: HOST_IP
      valueFrom:
        fieldRef:
          fieldPath: status.hostIP
    - name: DEPLOY_MODE
      value: kubernetes
    - name: MAPR_TICKETFILE_LOCATION
      value: "/tmp/mapr_ticket/CONTAINER_TICKET" 
    volumeMounts:
    - mountPath: /dev/fuse
      name: fuse 
    - mountPath: /sys/fs/cgroup
      name: cgroups
      readOnly: true
    - mountPath: /tmp/mapr_ticket
      name: maprticket
      readOnly: true
  volumes:
  - name: cgroups
    hostPath:
      path: /sys/fs/cgroup
  - name: fuse
    hostPath:
      path: /dev/fuse
  - name: maprticket
    secret:
      secretName: dsr-ticket-secret
---
apiVersion: v1
kind: Secret
metadata:
  name: dsr-ticket-secret
type: Opaque
data:
  CONTAINER_TICKET: Y2x1c3RlcjQgRGZQSVZHNmxTeXREeGI0OG9SM0RPTTNLQ0tWdG0vS0FWejB5QzFncTVlR01uS2lTRFlZZ1k1b2cxVFlFMHZ4VmczUnVyYlNXazJ0RUVHYjBrWWNLVnQ3L0xlNzJnUGZ3dzYxUWtBYmNHR2xodmpMQXo2ZlFKN1lCN3gzVGJJbzJYeHR0akVuZm1XcFFXNlNwckxMenJKa3d0VlFFZC9CMXg0amN2SlpUTytsOGdnYkJjSHpNN3dHSEFlRFowRHl3akhnMHBtNlA2WTYwRG5HS3dCMVRPd05KWlNJV1hsTGZqblhKdk5jYXpUSGlPNVo2eVhvRldQbTVZMHJndjZNUG5OT0VyekFLcDIzcmY1NUI5eGpSR0IrRmVzUXQycVZMVG45WjVDekNkQT09                          

[root@tssperf09 datarefinory]# 

Note: in the above YAML, CONTAINER_TICKET is base64-encoded. Simply cat the user ticket file and base64-encode the output.
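A sketch of producing that value from the ticket shown under PRE-REQ (/tmp/maprticket_2147483632 on this cluster; substitute your own path). The data: field needs a single line, so base64's line wrapping is stripped. As an alternative, `kubectl create secret generic --from-file` can build the Secret and do the encoding itself, in which case the Secret document can be dropped from the YAML. encode_ticket is a hypothetical name:

```shell
#!/usr/bin/env bash
# Encode a MapR ticket file as a single-line base64 string suitable for the
# CONTAINER_TICKET field of the Secret (tr strips base64's line wrapping).
encode_ticket() {
  base64 < "$1" | tr -d '\n'
}

# Example: print the value to paste into ds-refinery-secure.yaml
# encode_ticket /tmp/maprticket_2147483632

# Alternative: let kubectl create the Secret and handle the encoding itself
# kubectl create secret generic dsr-ticket-secret \
#   --from-file=CONTAINER_TICKET=/tmp/maprticket_2147483632
```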


2) Use the kubectl create command with the -f option to provision the DSR pod (and its ticket secret) on the Kubernetes cluster:

[root@tssperf09 datarefinory]# kubectl create -f ds-refinery-secure.yaml 
pod "dsr-secure-kube" created
secret "dsr-ticket-secret" created
[root@tssperf09 datarefinory]#

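3) To check the status of the pod, and which step it is executing while coming up, run kubectl describe. The first output below is for an earlier, non-secure pod (dsr-kube, pointed at cluster ObjectPool); the second is for the dsr-secure-kube pod created above.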

[root@tssperf09 datarefinory]# kubectl describe pod dsr-kube  -n default
Name:         dsr-kube
Namespace:    default
Node:         tssperf10.lab/10.10.72.250
Start Time:   Wed, 25 Apr 2018 18:34:41 -0600
Labels:       app=dsr-svc
Annotations:  <none>
Status:       Running
IP:           192.168.61.80
Containers:
  dsr:
    Container ID:   docker://37bd33f344d5fb2cc46e27684b5bb92e24785be1a109d5493f912dea6ef8b079
    Image:          maprtech/data-science-refinery
    Image ID:       docker-pullable://docker.io/maprtech/data-science-refinery@sha256:cf9c82338cf068ec49acc1121861693c0636f170acaabf534ad60794f808835f
    Ports:          9995/TCP, 10050/TCP, 10051/TCP, 10052/TCP, 10053/TCP, 10054/TCP, 10055/TCP, 11050/TCP, 11051/TCP, 11052/TCP, 11053/TCP, 11054/TCP, 11055/TCP
    State:          Running
      Started:      Wed, 25 Apr 2018 18:34:43 -0600
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:     500m
      memory:  2Gi
    Environment:
      MAPR_MOUNT_PATH:          /mapr
      MAPR_CLUSTER:             ObjectPool
      MAPR_CLDB_HOSTS:          10.10.70.113
      MAPR_CONTAINER_USER:      mapr
      MAPR_CONTAINER_GROUP:     mapr
      MAPR_CONTAINER_PASSWORD:  mapr
      HOST_IP:                   (v1:status.hostIP)
      DEPLOY_MODE:              kubernetes
    Mounts:
      /dev/fuse from fuse (rw)
      /sys/fs/cgroup from cgroups (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-v8p85 (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          True 
  PodScheduled   True 
Volumes:
  cgroups:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/fs/cgroup
    HostPathType:  
  fuse:
    Type:          HostPath (bare host directory volume)
    Path:          /dev/fuse
    HostPathType:  
  default-token-v8p85:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-v8p85
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>
[root@tssperf09 datarefinory]# kubectl describe pod dsr-secure-kube  -n default
Name:         dsr-secure-kube
Namespace:    default
Node:         tssperf11.lab/10.10.72.251
Start Time:   Wed, 25 Apr 2018 20:08:42 -0600
Labels:       app=dsr-svc
Annotations:  <none>
Status:       Running
IP:           192.168.217.147
Containers:
  dsr:
    Container ID:   docker://6049bc3625fa3733c159a68760336b03d63c481dcec0966cd686eff6eea49676
    Image:          maprtech/data-science-refinery
    Image ID:       docker-pullable://docker.io/maprtech/data-science-refinery@sha256:cf9c82338cf068ec49acc1121861693c0636f170acaabf534ad60794f808835f
    Ports:          9995/TCP, 10050/TCP, 10051/TCP, 10052/TCP, 10053/TCP, 10054/TCP, 10055/TCP, 11050/TCP, 11051/TCP, 11052/TCP, 11053/TCP, 11054/TCP, 11055/TCP
    State:          Running
      Started:      Wed, 25 Apr 2018 20:08:44 -0600
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:     500m
      memory:  2Gi
    Environment:
      MAPR_MOUNT_PATH:           /mapr
      MAPR_CLUSTER:              cluster4
      MAPR_CLDB_HOSTS:           10.10.102.95
      MAPR_CONTAINER_USER:       mapr
      MAPR_CONTAINER_GROUP:      mapr
      MAPR_CONTAINER_PASSWORD:   mapr
      HOST_IP:                    (v1:status.hostIP)
      DEPLOY_MODE:               kubernetes
      MAPR_TICKETFILE_LOCATION:  /tmp/mapr_ticket/CONTAINER_TICKET
    Mounts:
      /dev/fuse from fuse (rw)
      /sys/fs/cgroup from cgroups (ro)
      /tmp/mapr_ticket from maprticket (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-v8p85 (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          True 
  PodScheduled   True 
Volumes:
  cgroups:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/fs/cgroup
    HostPathType:  
  fuse:
    Type:          HostPath (bare host directory volume)
    Path:          /dev/fuse
    HostPathType:  
  maprticket:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  dsr-ticket-secret
    Optional:    false
  default-token-v8p85:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-v8p85
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason                 Age   From                    Message
  ----    ------                 ----  ----                    -------
  Normal  Scheduled              57s   default-scheduler       Successfully assigned dsr-secure-kube to tssperf11.lab
  Normal  SuccessfulMountVolume  57s   kubelet, tssperf11.lab  MountVolume.SetUp succeeded for volume "fuse"
  Normal  SuccessfulMountVolume  57s   kubelet, tssperf11.lab  MountVolume.SetUp succeeded for volume "cgroups"
  Normal  SuccessfulMountVolume  57s   kubelet, tssperf11.lab  MountVolume.SetUp succeeded for volume "maprticket"
  Normal  SuccessfulMountVolume  57s   kubelet, tssperf11.lab  MountVolume.SetUp succeeded for volume "default-token-v8p85"
  Normal  Pulling                56s   kubelet, tssperf11.lab  pulling image "maprtech/data-science-refinery"
  Normal  Pulled                 55s   kubelet, tssperf11.lab  Successfully pulled image "maprtech/data-science-refinery"
  Normal  Created                55s   kubelet, tssperf11.lab  Created container
  Normal  Started                55s   kubelet, tssperf11.lab  Started container
[root@tssperf09 datarefinory]# 


4) Verify that the pod is in the Running state:

[root@tssperf09 ~]# kubectl get pods -o wide
NAME              READY     STATUS    RESTARTS   AGE       IP                NODE
dsr-kube          1/1       Running   0          1h        192.168.61.80     tssperf10.lab
dsr-secure-kube   1/1       Running   0          50m       192.168.217.146   tssperf11.lab
[root@tssperf09 ~]# 
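Rather than re-running `kubectl get pods` by hand, the wait can be scripted. Newer kubectl releases offer `kubectl wait`, which the v1.9 cluster here predates, so this minimal sketch polls the pod phase with `kubectl get -o jsonpath` instead; wait_for_running and its 5-second interval are illustrative choices:

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a pod's phase until it is Running or the timeout expires.
# Arguments: pod name, namespace (default "default"), timeout in seconds (default 300).
wait_for_running() {
  local pod="$1" ns="${2:-default}" timeout="${3:-300}" elapsed=0 phase=""
  while [ "$elapsed" -lt "$timeout" ]; do
    phase="$(kubectl get pod "$pod" -n "$ns" -o jsonpath='{.status.phase}' 2>/dev/null)"
    if [ "$phase" = "Running" ]; then
      echo "$pod is Running"
      return 0
    fi
    sleep 5
    elapsed=$((elapsed + 5))
  done
  echo "timed out waiting for $pod (last phase: ${phase:-unknown})" >&2
  return 1
}
```

Usage: `wait_for_running dsr-secure-kube default 300`.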

5) Once the pod is up, you can log in to it with the command below and verify that the POSIX client mount is in place:

[root@tssperf09 datarefinory]# kubectl exec -it  dsr-secure-kube  -n default -- bash
[root@dsr-secure-kube ~]# df -hP /mapr
Filesystem              Size  Used Avail Use% Mounted on
posix-client-container  4.4T  127G  4.3T   3% /mapr

[root@dsr-secure-kube ~]# ls /mapr/cluster4/
apps  hbase  installer  oozie  opt  test1  tmp  user  var
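The mount check can also be scripted. A minimal sketch that reads `df -hP` output and confirms the filesystem is the MapR POSIX client; the name posix-client-container is taken from the output above and may differ in other images, and is_posix_client_mount is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `df -hP <path>` output on stdin and succeed only if
# the filesystem column of the data row is the MapR POSIX client container mount.
is_posix_client_mount() {
  awk 'NR == 2 { found = ($1 == "posix-client-container") }
       END { exit (found ? 0 : 1) }'
}
```

Usage: `df -hP /mapr | is_posix_client_mount && ls /mapr/cluster4/`.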

Also run "hadoop fs" commands to verify connectivity from the pod to the cluster:

[root@dsr-secure-kube mapr]# hadoop fs -ls /
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Found 9 items
drwxr-xr-x   - 2147483632 2147483632             1 2018-03-27 16:30 /apps
drwxr-xr-x   - 2147483632 2147483632             0 2018-03-27 16:27 /hbase
drwxr-xr-x   - 2147483632 2147483632             3 2018-03-27 16:30 /installer
drwxr-xr-x   - 2147483632 2147483632             1 2018-03-27 16:34 /oozie
drwxr-xr-x   - 2147483632 2147483632             0 2018-03-27 16:27 /opt
-rwxr-xr-x   3 root       root       1048576000000 2018-04-24 22:11 /test1
drwxrwxrwx   - 2147483632 2147483632             0 2018-03-27 16:26 /tmp
drwxr-xr-x   - 2147483632 2147483632             2 2018-03-27 16:30 /user
drwxr-xr-x   - 2147483632 2147483632             1 2018-03-27 16:26 /var
[root@dsr-secure-kube mapr]# 
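For a scripted pass/fail check instead of eyeballing the listing, `hadoop fs -test -d` exits 0 only when the path exists and is a directory. A minimal sketch; check_cluster_dir and the default path /user are illustrative choices:

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm that a directory on the cluster is reachable from the pod.
# `hadoop fs -test -d` exits 0 only when the path exists and is a directory.
check_cluster_dir() {
  local dir="${1:-/user}"
  if hadoop fs -test -d "$dir"; then
    echo "OK: $dir is reachable"
  else
    echo "FAILED: cannot reach $dir (check the ticket and MAPR_CLDB_HOSTS)" >&2
    return 1
  fi
}
```

Usage: `check_cluster_dir /apps`.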

6) Check that ZeppelinServer (along with LivyServer) has also started and is running:

[root@dsr-secure-kube ~]# jps
3072 ZeppelinServer
2261 LivyServer
4330 Jps
[root@dsr-secure-kube ~]# 
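The same check can be scripted against `jps` output, which prints one "<pid> <main class>" line per JVM. A minimal sketch; require_jvms is a hypothetical name, and its default process list is taken from the output above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `jps` output on stdin and succeed only if every
# named JVM process (default: ZeppelinServer and LivyServer) is present.
require_jvms() {
  local listing name missing=0
  listing="$(cat)"
  [ "$#" -eq 0 ] && set -- ZeppelinServer LivyServer
  for name in "$@"; do
    # Column 2 of jps output is the main class name.
    if ! printf '%s\n' "$listing" | awk -v n="$name" '$2 == n { f = 1 } END { exit (f ? 0 : 1) }'; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `jps | require_jvms && echo "Zeppelin and Livy are up"`.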

To access the Zeppelin UI from our node, follow these steps:

i) Expose the ports of dsr-secure-kube on the nodes as a NodePort service:

[root@tssperf09 datarefinory]# kubectl expose pod dsr-secure-kube --type=NodePort
service "dsr-secure-kube" exposed
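Note that `kubectl expose pod` reuses the pod's labels as the service selector. Both pods in this walkthrough carry the label app=dsr-svc, so either generated service can route to either pod. A minimal sketch of an alternative that exposes only Zeppelin's port 9995 and selects on a label you choose; the service name, the generator function and the extra pod label are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical generator: print a NodePort Service manifest that exposes only
# Zeppelin's port 9995 and selects pods by a single key=value label.
zeppelin_service_yaml() {
  local name="$1" label_key="$2" label_value="$3"
  cat <<EOF
apiVersion: v1
kind: Service
metadata:
  name: ${name}
spec:
  type: NodePort
  selector:
    ${label_key}: ${label_value}
  ports:
  - name: zeppelin
    protocol: TCP
    port: 9995
    targetPort: 9995
EOF
}

# Example (hypothetical label), giving the secure pod a label of its own first:
# kubectl label pod dsr-secure-kube instance=dsr-secure
# zeppelin_service_yaml zeppelin-secure instance dsr-secure | kubectl create -f -
```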

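ii) Get the port mapping. Zeppelin listens on container port 9995; in the output below it is mapped to NodePort 31788 for dsr-secure-kube (31027 for the older dsr-kube service), which can be opened in a web browser on any node's IP.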

[root@tssperf09 datarefinory]# kubectl get services
NAME              TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                                                                                                                                                                                                          AGE
dsr-kube          NodePort    10.103.22.78     <none>        9995:31027/TCP,10050:32503/TCP,10051:32493/TCP,10052:30350/TCP,10053:30060/TCP,10054:32672/TCP,10055:30504/TCP,11050:31940/TCP,11051:31963/TCP,11052:30345/TCP,11053:30966/TCP,11054:30517/TCP,11055:30397/TCP   24m
dsr-secure-kube   NodePort    10.100.109.255   <none>        9995:31788/TCP,10050:32645/TCP,10051:32451/TCP,10052:32638/TCP,10053:32135/TCP,10054:30601/TCP,10055:30499/TCP,11050:31109/TCP,11051:30602/TCP,11052:30812/TCP,11053:30883/TCP,11054:32437/TCP,11055:32515/TCP   9s
kubernetes        ClusterIP   10.96.0.1        <none>        443/TCP                                                                                                                                                                                                          33d
[root@tssperf09 datarefinory]#
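Picking the right number out of the wide PORT(S) column is error-prone, since dsr-kube and dsr-secure-kube map 9995 to different NodePorts. A minimal sketch that extracts the NodePort for a given service port from a PORT(S) value; node_port_for is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: given a PORT(S) value such as "9995:31788/TCP,10050:32645/TCP",
# print the NodePort mapped to the requested service port (default 9995).
node_port_for() {
  local ports="$1" want="${2:-9995}"
  # One mapping per line, then split "port:nodePort/proto" on ':' and '/'.
  printf '%s\n' "$ports" | tr ',' '\n' | awk -F'[:/]' -v p="$want" '$1 == p { print $2; exit }'
}
```

Usage, reading PORT(S) (the fifth column) from the service row: `node_port_for "$(kubectl get service dsr-secure-kube --no-headers | awk '{print $5}')"`. A kubectl JSONPath filter such as `-o jsonpath='{.spec.ports[?(@.port==9995)].nodePort}'` avoids the text parsing altogether.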



Log in to Zeppelin with username/password mapr/mapr (the same values set in MAPR_CONTAINER_USER and MAPR_CONTAINER_PASSWORD in the pod spec), create a notebook, and access the cluster from it. Yay, it works!!