Zeppelin as a Kubernetes Service connecting to secure MapR Cluster
Apache Zeppelin is a web-based notebook that enables interactive, data-driven analytics. It ships with built-in Apache Spark integration and roughly five interpreters that can execute Scala, Python, R, and SQL code on Spark, in addition to a shell interpreter. Since Kubernetes is increasingly the de facto standard for enterprise-level orchestration, it makes sense to run your Zeppelin notebooks on Kubernetes the native way.
This blog assumes you have a secure MapR cluster and a separate K8S cluster on which the Zeppelin pods need to be spun up. Once the container image (with the desired configuration, software, and dependencies pre-installed) has been pulled and the pod is in the Running state, all software and services used by Zeppelin are started and ready for you to use.
PRE-REQ:
MapR 6.x secure cluster: a MapR user ticket exists.
bash-4.2$ maprlogin print
Opening keyfile /tmp/maprticket_2147483632
cluster4: user = mapr, created = 'Wed Apr 25 15:42:28 PDT 2018', expires = 'Wed May 09 15:42:28 PDT 2018', RenewalTill = 'Fri May 25 15:42:28 PDT 2018', uid = 2147483632, gids = 2147483632, 1000, CanImpersonate = false
bash-4.2$ cat /opt/mapr/MapRBuildVersion
6.0.1.20180326182302.GA
The K8S cluster is up and running.
[root@tssperf09 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
tssperf09.lab Ready master 33d v1.9.6
tssperf10.lab Ready <none> 33d v1.9.6
tssperf11.lab Ready <none> 33d v1.9.6
[root@tssperf09 ~]#
Steps to spin up the Zeppelin pod and access it via a web browser:
1) The following sample YAML file can be used to provision the pod.
[root@tssperf09 datarefinory]# cat ds-refinery-secure.yaml
apiVersion: v1
kind: Pod
metadata:
name: dsr-secure-kube # POD name
labels:
app: dsr-svc
spec:
containers:
- name: dsr
imagePullPolicy: Always
image: maprtech/data-science-refinery
ports:
- containerPort: 9995 # Done for port mapping "container to host"
hostPort: 9995
protocol: TCP
- containerPort: 10050
hostPort: 10050
protocol: TCP
- containerPort: 10051
hostPort: 10051
protocol: TCP
- containerPort: 11052
hostPort: 11052
protocol: TCP
securityContext:
capabilities:
add: ["SYS_ADMIN" , "SYS_RESOURCE"]
privileged: true # Fuse needs privileged permission to start
resources:
requests:
memory: "2Gi"
cpu: "500m"
env:
- name: MAPR_MOUNT_PATH
value: /mapr
- name: MAPR_CLUSTER
value: cluster4
- name: MAPR_CLDB_HOSTS
value: 10.10.102.95
- name: MAPR_CONTAINER_USER
value: mapr
- name: MAPR_CONTAINER_GROUP
value: mapr
- name: MAPR_CONTAINER_PASSWORD
value: mapr
- name: HOST_IP
valueFrom:
fieldRef:
fieldPath: status.hostIP
- name: DEPLOY_MODE
value: kubernetes
- name: MAPR_TICKETFILE_LOCATION
value: "/tmp/mapr_ticket/CONTAINER_TICKET"
volumeMounts:
- mountPath: /dev/fuse
name: fuse
- mountPath: /sys/fs/cgroup
name: cgroups
readOnly: true
- mountPath: /tmp/mapr_ticket
name: maprticket
readOnly: true
volumes:
- name: cgroups
hostPath:
path: /sys/fs/cgroup
- name: fuse
hostPath:
path: /dev/fuse
- name: maprticket
secret:
secretName: dsr-ticket-secret
---
apiVersion: v1
kind: Secret
metadata:
name: dsr-ticket-secret
type: Opaque
data:
CONTAINER_TICKET: Y2x1c3RlcjQgRGZQSVZHNmxTeXREeGI0OG9SM0RPTTNLQ0tWdG0vS0FWejB5QzFncTVlR01uS2lTRFlZZ1k1b2cxVFlFMHZ4VmczUnVyYlNXazJ0RUVHYjBrWWNLVnQ3L0xlNzJnUGZ3dzYxUWtBYmNHR2xodmpMQXo2ZlFKN1lCN3gzVGJJbzJYeHR0akVuZm1XcFFXNlNwckxMenJKa3d0VlFFZC9CMXg0amN2SlpUTytsOGdnYkJjSHpNN3dHSEFlRFowRHl3akhnMHBtNlA2WTYwRG5HS3dCMVRPd05KWlNJV1hsTGZqblhKdk5jYXpUSGlPNVo2eVhvRldQbTVZMHJndjZNUG5OT0VyekFLcDIzcmY1NUI5eGpSR0IrRmVzUXQycVZMVG45WjVDekNkQT09
[root@tssperf09 datarefinory]#
Note: in the above YAML, CONTAINER_TICKET is the user ticket in base64 format. Simply run the ticket file through base64 to get the encoded output.
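As a quick sketch of producing that value (a dummy ticket file stands in here for the real /tmp/maprticket_&lt;uid&gt; from the environment above):

```shell
# Base64-encode a MapR user ticket for the Secret's CONTAINER_TICKET field.
# The dummy file below is only for illustration; point this at your real
# /tmp/maprticket_<uid> instead.
printf 'cluster4 demoTicketContents' > /tmp/demo_maprticket
TICKET_B64=$(base64 -w 0 /tmp/demo_maprticket)   # -w 0 disables line wrapping
echo "$TICKET_B64"
# Verify the round-trip decodes back to the original ticket:
echo "$TICKET_B64" | base64 -d                   # prints: cluster4 demoTicketContents
```

Paste the single-line encoded string into the CONTAINER_TICKET field of the Secret.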
2) Use the "kubectl create" command with the -f option to provision the DSR pod on the Kubernetes cluster.
[root@tssperf09 datarefinory]# kubectl create -f ds-refinery-secure.yaml
pod "dsr-secure-kube" created
secret "dsr-ticket-secret" created
[root@tssperf09 datarefinory]#
3) To check the pod's status and the step it is currently executing while coming up, run the command below.
[root@tssperf09 datarefinory]# kubectl describe pod dsr-kube -n default
Name: dsr-kube
Namespace: default
Node: tssperf10.lab/10.10.72.250
Start Time: Wed, 25 Apr 2018 18:34:41 -0600
Labels: app=dsr-svc
Annotations: <none>
Status: Running
IP: 192.168.61.80
Containers:
dsr:
Container ID: docker://37bd33f344d5fb2cc46e27684b5bb92e24785be1a109d5493f912dea6ef8b079
Image: maprtech/data-science-refinery
Image ID: docker-pullable://docker.io/maprtech/data-science-refinery@sha256:cf9c82338cf068ec49acc1121861693c0636f170acaabf534ad60794f808835f
Ports: 9995/TCP, 10050/TCP, 10051/TCP, 10052/TCP, 10053/TCP, 10054/TCP, 10055/TCP, 11050/TCP, 11051/TCP, 11052/TCP, 11053/TCP, 11054/TCP, 11055/TCP
State: Running
Started: Wed, 25 Apr 2018 18:34:43 -0600
Ready: True
Restart Count: 0
Requests:
cpu: 500m
memory: 2Gi
Environment:
MAPR_MOUNT_PATH: /mapr
MAPR_CLUSTER: ObjectPool
MAPR_CLDB_HOSTS: 10.10.70.113
MAPR_CONTAINER_USER: mapr
MAPR_CONTAINER_GROUP: mapr
MAPR_CONTAINER_PASSWORD: mapr
HOST_IP: (v1:status.hostIP)
DEPLOY_MODE: kubernetes
Mounts:
/dev/fuse from fuse (rw)
/sys/fs/cgroup from cgroups (ro)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-v8p85 (ro)
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
cgroups:
Type: HostPath (bare host directory volume)
Path: /sys/fs/cgroup
HostPathType:
fuse:
Type: HostPath (bare host directory volume)
Path: /dev/fuse
HostPathType:
default-token-v8p85:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-v8p85
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events: <none>
[root@tssperf09 datarefinory]# kubectl describe pod dsr-secure-kube -n default
Name: dsr-secure-kube
Namespace: default
Node: tssperf11.lab/10.10.72.251
Start Time: Wed, 25 Apr 2018 20:08:42 -0600
Labels: app=dsr-svc
Annotations: <none>
Status: Running
IP: 192.168.217.147
Containers:
dsr:
Container ID: docker://6049bc3625fa3733c159a68760336b03d63c481dcec0966cd686eff6eea49676
Image: maprtech/data-science-refinery
Image ID: docker-pullable://docker.io/maprtech/data-science-refinery@sha256:cf9c82338cf068ec49acc1121861693c0636f170acaabf534ad60794f808835f
Ports: 9995/TCP, 10050/TCP, 10051/TCP, 10052/TCP, 10053/TCP, 10054/TCP, 10055/TCP, 11050/TCP, 11051/TCP, 11052/TCP, 11053/TCP, 11054/TCP, 11055/TCP
State: Running
Started: Wed, 25 Apr 2018 20:08:44 -0600
Ready: True
Restart Count: 0
Requests:
cpu: 500m
memory: 2Gi
Environment:
MAPR_MOUNT_PATH: /mapr
MAPR_CLUSTER: cluster4
MAPR_CLDB_HOSTS: 10.10.102.95
MAPR_CONTAINER_USER: mapr
MAPR_CONTAINER_GROUP: mapr
MAPR_CONTAINER_PASSWORD: mapr
HOST_IP: (v1:status.hostIP)
DEPLOY_MODE: kubernetes
MAPR_TICKETFILE_LOCATION: /tmp/mapr_ticket/CONTAINER_TICKET
Mounts:
/dev/fuse from fuse (rw)
/sys/fs/cgroup from cgroups (ro)
/tmp/mapr_ticket from maprticket (ro)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-v8p85 (ro)
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
cgroups:
Type: HostPath (bare host directory volume)
Path: /sys/fs/cgroup
HostPathType:
fuse:
Type: HostPath (bare host directory volume)
Path: /dev/fuse
HostPathType:
maprticket:
Type: Secret (a volume populated by a Secret)
SecretName: dsr-ticket-secret
Optional: false
default-token-v8p85:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-v8p85
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 57s default-scheduler Successfully assigned dsr-secure-kube to tssperf11.lab
Normal SuccessfulMountVolume 57s kubelet, tssperf11.lab MountVolume.SetUp succeeded for volume "fuse"
Normal SuccessfulMountVolume 57s kubelet, tssperf11.lab MountVolume.SetUp succeeded for volume "cgroups"
Normal SuccessfulMountVolume 57s kubelet, tssperf11.lab MountVolume.SetUp succeeded for volume "maprticket"
Normal SuccessfulMountVolume 57s kubelet, tssperf11.lab MountVolume.SetUp succeeded for volume "default-token-v8p85"
Normal Pulling 56s kubelet, tssperf11.lab pulling image "maprtech/data-science-refinery"
Normal Pulled 55s kubelet, tssperf11.lab Successfully pulled image "maprtech/data-science-refinery"
Normal Created 55s kubelet, tssperf11.lab Created container
Normal Started 55s kubelet, tssperf11.lab Started container
[root@tssperf09 datarefinory]#
4) Verify the pod is in the Running state.
[root@tssperf09 ~]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
dsr-kube 1/1 Running 0 1h 192.168.61.80 tssperf10.lab
dsr-secure-kube 1/1 Running 0 50m 192.168.217.146 tssperf11.lab
[root@tssperf09 ~]#
5) Once the pod is up, you can log in to it with the command below and verify that the POSIX (FUSE) mount is in place.
[root@tssperf09 datarefinory]# kubectl exec -it dsr-secure-kube -n default -- bash
[root@dsr-secure-kube ~]# df -hP /mapr
Filesystem Size Used Avail Use% Mounted on
posix-client-container 4.4T 127G 4.3T 3% /mapr
[root@dsr-secure-kube ~]# cd /mapr/cluster4/
[root@dsr-secure-kube cluster4]# ls
apps hbase installer oozie opt test1 tmp user var
Also run "hadoop fs" commands to verify connectivity from the pod to the cluster.
[root@dsr-secure-kube mapr]# hadoop fs -ls /
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Found 9 items
drwxr-xr-x - 2147483632 2147483632 1 2018-03-27 16:30 /apps
drwxr-xr-x - 2147483632 2147483632 0 2018-03-27 16:27 /hbase
drwxr-xr-x - 2147483632 2147483632 3 2018-03-27 16:30 /installer
drwxr-xr-x - 2147483632 2147483632 1 2018-03-27 16:34 /oozie
drwxr-xr-x - 2147483632 2147483632 0 2018-03-27 16:27 /opt
-rwxr-xr-x 3 root root 1048576000000 2018-04-24 22:11 /test1
drwxrwxrwx - 2147483632 2147483632 0 2018-03-27 16:26 /tmp
drwxr-xr-x - 2147483632 2147483632 2 2018-03-27 16:30 /user
drwxr-xr-x - 2147483632 2147483632 1 2018-03-27 16:26 /var
[root@dsr-secure-kube mapr]#
6) If you check now, ZeppelinServer is also started and running.
[root@dsr-secure-kube ~]# jps
3072 ZeppelinServer
2261 LivyServer
4330 Jps
[root@dsr-secure-kube ~]#
To access the Zeppelin UI from outside the pod, follow the steps below.
i) Expose the services running in dsr-secure-kube on the node.
[root@tssperf09 datarefinory]# kubectl expose pod dsr-secure-kube --type=NodePort
service "dsr-secure-kube" exposed
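The imperative "kubectl expose" above can also be written declaratively. A minimal Service manifest sketch (only the Zeppelin port is listed; the Service name is hypothetical, and the selector matches the app: dsr-svc label from the pod spec):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: dsr-secure-svc        # hypothetical name, to avoid clashing with the pod name
spec:
  type: NodePort
  selector:
    app: dsr-svc              # matches the label set in ds-refinery-secure.yaml
  ports:
  - name: zeppelin
    port: 9995
    targetPort: 9995          # Zeppelin UI port inside the container
```

Apply it with "kubectl create -f"; Kubernetes assigns the NodePort automatically, just as "kubectl expose" does. Note that both example pods carry the app: dsr-svc label, so this selector would route to either of them.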
ii) Get the port mapping. In our case, Zeppelin's port 9995 on dsr-secure-kube is mapped to NodePort "31788", which I can access via a web browser.
[root@tssperf09 datarefinory]# kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
dsr-kube NodePort 10.103.22.78 <none> 9995:31027/TCP,10050:32503/TCP,10051:32493/TCP,10052:30350/TCP,10053:30060/TCP,10054:32672/TCP,10055:30504/TCP,11050:31940/TCP,11051:31963/TCP,11052:30345/TCP,11053:30966/TCP,11054:30517/TCP,11055:30397/TCP 24m
dsr-secure-kube NodePort 10.100.109.255 <none> 9995:31788/TCP,10050:32645/TCP,10051:32451/TCP,10052:32638/TCP,10053:32135/TCP,10054:30601/TCP,10055:30499/TCP,11050:31109/TCP,11051:30602/TCP,11052:30812/TCP,11053:30883/TCP,11054:32437/TCP,11055:32515/TCP 9s
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 33d
[root@tssperf09 datarefinory]#
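To pick the Zeppelin NodePort out of that PORT(S) column programmatically, something like the following works (the PORTS string is pasted from the dsr-secure-kube row above and truncated for brevity; in practice you would feed in the live "kubectl get services" output):

```shell
# Extract the host NodePort mapped to Zeppelin's container port 9995.
# PORTS is copied (truncated) from the dsr-secure-kube service output above.
PORTS='9995:31788/TCP,10050:32645/TCP,10051:32451/TCP'
echo "$PORTS" | tr ',' '\n' | awk -F'[:/]' '$1 == 9995 {print $2}'   # prints: 31788
```

The same value can also be read straight from the API, e.g. kubectl get svc dsr-secure-kube -o jsonpath='{.spec.ports[0].nodePort}' (assuming port 9995 is the first ports entry, as in the pod spec above).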
Log in to Zeppelin with the username/password (mapr/mapr), create a notebook, and access the cluster. Yay, it works!