In my last blog post (http://abizeradenwala.blogspot.com/2018/05/upgrade-python-version-before-cluster.html) I showed how to use init scripts to install custom packages by creating a bash script that resides in a sub-directory of the init scripts directory named the same as the cluster. For example, to specify init scripts for the cluster named testabizer-python3.6, create the directory dbfs:/databricks/init/testabizer-python3.6 and put all shell scripts that should run on cluster testabizer-python3.6 in that directory.
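For instance, that directory can be created with the Databricks CLI; a minimal sketch, assuming the CLI is already installed and configured against your workspace:

    # Create the per-cluster init script directory (name must match the cluster name)
    dbfs mkdirs dbfs:/databricks/init/testabizer-python3.6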
This works well for most cases, but sometimes a Databricks Notebook has to use the new version of a package/library, and since some paths are not set and the cluster/containers have already started, the Notebook might still use the older version.
In this blog I will detail how to upgrade to Python 3.6 and make sure the Databricks Notebook uses it as well.
Step 1
Download the Anaconda Python distribution from https://www.continuum.io/downloads.
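For example, the installer can be fetched with wget. The exact installer build below (Anaconda3-5.1.0, which ships Python 3.6) is an assumption; substitute whichever build you need:

    # Download the Anaconda3 installer (version/build is an assumption; pick the one you need)
    wget -q https://repo.continuum.io/archive/Anaconda3-5.1.0-Linux-x86_64.sh -O /tmp/anaconda.sh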
Step 2
We want to make sure all of the packages Databricks includes by default in Databricks Runtime and PySpark are replicated in Anaconda, so create a list of the current Python packages and save it to a DBFS location.
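One simple way to capture that list is a %sh cell in a notebook running on the existing cluster. The output path /dbfs/tmp/python_packages.txt is just an assumed location, and this assumes the pip on the PATH belongs to the cluster's default interpreter (otherwise use that interpreter's full pip path):

    %sh
    # Dump the currently installed packages of the default interpreter into DBFS
    # (output path is an assumption; any DBFS location works)
    pip freeze > /dbfs/tmp/python_packages.txt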
Step 3
Use an init script to change the default Python distribution.
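A sketch of such an init script is below, saved as, say, dbfs:/databricks/init/testabizer-python3.6/python-upgrade.sh. The installer build, the /anaconda install prefix, the package-list path from Step 2, and the spark-env.sh location are all assumptions; adjust them for your workspace:

    #!/bin/bash
    # Hypothetical init script: installs Anaconda and makes it the default Python.

    # Silent (-b) install into /anaconda (prefix is an assumption)
    wget -q https://repo.continuum.io/archive/Anaconda3-5.1.0-Linux-x86_64.sh -O /tmp/anaconda.sh
    bash /tmp/anaconda.sh -b -p /anaconda

    # Replicate the default Databricks/PySpark packages captured in Step 2
    /anaconda/bin/pip install -r /dbfs/tmp/python_packages.txt

    # Point the driver and executors at the new interpreter
    echo "export PYSPARK_PYTHON=/anaconda/bin/python" >> /databricks/spark/conf/spark-env.sh
    echo "export PYSPARK_DRIVER_PYTHON=/anaconda/bin/python" >> /databricks/spark/conf/spark-env.sh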
Now, after the cluster restarts, verify that Python was upgraded and that the new version is available from the Notebook.
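For example, a quick check from a %sh notebook cell; the environment variable and interpreter path shown are the assumed ones from the init script sketch above:

    %sh
    # Confirm which interpreter PySpark is configured to use and its version
    echo "PYSPARK_PYTHON is: $PYSPARK_PYTHON"
    /anaconda/bin/python --version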