Thursday, May 10, 2018

Upgrade Python version before Databricks cluster launches

This post walks through creating an init script for a cluster named "testabi-python3.6" that installs Python 3.6 on that cluster during startup. If you hold the cluster name in a variable, clusterName, you can reuse the same steps for other clusters and for other custom commands or pre-installs.
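
As a rough sketch of that idea, the helpers below derive the init-script directory and the script body from a cluster name and a package list. The names INIT_ROOT, init_script_dir and install_script are hypothetical and are not part of any Databricks API; the block only builds strings, so it runs outside a notebook as well.

```python
# Hypothetical helpers for illustration; none of these names come from Databricks.
INIT_ROOT = "dbfs:/databricks/init"


def init_script_dir(cluster_name):
    """Return the DBFS directory that holds the init scripts for one cluster."""
    return "%s/%s/" % (INIT_ROOT, cluster_name)


def install_script(packages, ppa=None):
    """Build a bash script that optionally adds a PPA and installs apt packages."""
    lines = ["#!/bin/bash"]  # the shebang has to be the very first line
    if ppa:
        lines.append("sudo add-apt-repository %s -y" % ppa)
    lines.append("sudo apt update -y")
    for package in packages:
        lines.append("sudo apt install %s -y" % package)
    return "\n".join(lines) + "\n"


clusterName = "testabi-python3.6"
script_dir = init_script_dir(clusterName)
script_body = install_script(["python3.6"], ppa="ppa:jonathonf/python-3.6")
```

Inside a notebook, script_dir and script_body could then be passed to dbutils.fs.mkdirs and dbutils.fs.put, as in the steps that follow.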


  1. Display the list of existing global init scripts.

    display(dbutils.fs.ls("dbfs:/databricks/init/"))
  2. Create dbfs:/databricks/init/ if it doesn’t exist. This is the location where the init scripts for every cluster in the shard live.

    dbutils.fs.mkdirs("dbfs:/databricks/init/")
  3. Set a cluster name variable. clusterName must match the name of the cluster on which the script will be placed and run during startup.

    clusterName = "testabi-python3.6"

  4. Create a directory named testabi-python3.6 using the Databricks File System (DBFS).

    dbutils.fs.mkdirs("dbfs:/databricks/init/%s/" % clusterName)

     Now you can list the DBFS directory to confirm that it was created.
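  5. Create the script that installs the required version of Python. The exact commands depend on the OS version and flavor.

    # The shebang must be the first line of the file, so it follows the opening quotes directly.
    # The final True overwrites any existing script with the same name.
    dbutils.fs.put("/databricks/init/%s/python-install.sh" % clusterName, """#!/bin/bash
    sudo add-apt-repository ppa:jonathonf/python-3.6 -y
    sudo apt update -y
    sudo apt install python3.6 -y
    """, True)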

Note: In my case the OS version is "Ubuntu 16.04.4 LTS".
  6. Restart the cluster. Make sure there are no conflicting packages, or the cluster may run into issues while starting.

Bingo, the init logs show that Python 3.6 is installed.
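
Besides reading the init logs, one way to double-check from a notebook cell is to ask the new interpreter for its version. This is only a minimal sketch: it assumes a Python 3 notebook and that the package places a python3.6 binary on the PATH, python_version is a hypothetical helper, and it checks only the driver node where the notebook runs.

```python
import shutil
import subprocess


def python_version(binary="python3.6"):
    """Return the `--version` output of `binary`, or None if it is not on the PATH."""
    path = shutil.which(binary)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # older interpreters print the version to stderr
        universal_newlines=True,
    )
    return result.stdout.strip()


print(python_version())
```

If the call prints None, the binary was not found on the driver, and the init logs are the place to look for the reason.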


