Friday, November 2, 2018

Set up an external metastore for Azure Databricks



Set up an external metastore using the web UI


  1. Click the Clusters button on the sidebar.
  2. Click Create Cluster.
  3. Click Show advanced settings, and navigate to the Spark tab.
  4. Under Spark Config, enter the following Spark configuration options.
Note: <mssql-username> and <mssql-password> are placeholders for the username and password of an Azure SQL database account that has read/write access to the database.
```
javax.jdo.option.ConnectionURL jdbc:sqlserver://abizerdb.database.windows.net:1433;database=test_abizerDB
javax.jdo.option.ConnectionDriverName com.microsoft.sqlserver.jdbc.SQLServerDriver
javax.jdo.option.ConnectionUserName abizer@abizerdb
javax.jdo.option.ConnectionPassword <mssql-password>
spark.sql.hive.metastore.version 1.2.0
spark.sql.hive.metastore.jars maven
datanucleus.autoCreateSchema true
datanucleus.schema.autoCreateAll true
datanucleus.fixedDatastore false
spark.hadoop.hive.metastore.schema.verification false
```
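The same key/value pairs can be assembled programmatically, for example when creating clusters through the Databricks Clusters REST API (which accepts them under `spark_conf`). A minimal sketch; the helper name and default arguments are illustrative, not part of any Databricks API:

```python
def metastore_spark_conf(jdbc_url, user, password,
                         metastore_version="1.2.0", jars="maven"):
    """Assemble the external-metastore Spark conf as a dict.

    The keys match the Spark Config entries above; pass the result as the
    `spark_conf` field when creating a cluster via the REST API.
    """
    return {
        # JDBC connection to the Azure SQL database backing the metastore
        "javax.jdo.option.ConnectionURL": jdbc_url,
        "javax.jdo.option.ConnectionDriverName":
            "com.microsoft.sqlserver.jdbc.SQLServerDriver",
        "javax.jdo.option.ConnectionUserName": user,
        "javax.jdo.option.ConnectionPassword": password,
        # Hive metastore client version and where to fetch its jars from
        "spark.sql.hive.metastore.version": metastore_version,
        "spark.sql.hive.metastore.jars": jars,
        # Let DataNucleus create the metastore schema on first use
        "datanucleus.autoCreateSchema": "true",
        "datanucleus.schema.autoCreateAll": "true",
        "datanucleus.fixedDatastore": "false",
        "spark.hadoop.hive.metastore.schema.verification": "false",
    }
```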

5. Continue your cluster configuration, then click Create Cluster to create the cluster.


Once the cluster is up, you should see the following details logged in the driver logs:


```
18/11/02 20:34:29 INFO SparkConfUtils$: Set spark config: spark.sql.hive.metastore.version -> 1.2.0
18/11/02 20:34:29 INFO SparkConfUtils$: Set spark config: datanucleus.autoCreateSchema -> true
18/11/02 20:34:29 INFO SparkConfUtils$: Set spark config: spark.hadoop.hive.metastore.schema.verification -> false
18/11/02 20:34:29 INFO SparkConfUtils$: Set spark config: javax.jdo.option.ConnectionURL -> jdbc:sqlserver://abizerdb.database.windows.net:1433;database=test_abizerDB
18/11/02 20:34:29 INFO SparkConfUtils$: Set spark config: datanucleus.schema.autoCreateAll -> true
18/11/02 20:34:29 INFO SparkConfUtils$: Set spark config: javax.jdo.option.ConnectionUserName -> abizer@abizerdb
18/11/02 20:34:29 INFO SparkConfUtils$: Set spark config: spark.databricks.delta.preview.enabled -> true
18/11/02 20:34:29 INFO SparkConfUtils$: Set spark config: spark.driver.tempDirectory -> /local_disk0/tmp
18/11/02 20:34:29 INFO SparkConfUtils$: Set spark config: spark.sql.hive.metastore.jars -> maven
18/11/02 20:34:29 INFO SparkConfUtils$: Set spark config: datanucleus.fixedDatastore -> false
18/11/02 20:34:29 INFO SparkConfUtils$: Set spark config: javax.jdo.option.ConnectionDriverName -> com.microsoft.sqlserver.jdbc.SQLServerDriver
```


Once you confirm everything looks fine, attach a notebook and try to create a test database and tables.
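A small smoke test works well for this step. The sketch below assumes the `spark` session that Databricks notebooks provide; the database and table names are illustrative:

```python
def metastore_smoke_test(spark, db="metastore_smoke_db"):
    """Create a test database and table, then list what the metastore sees.

    `spark` is the SparkSession a Databricks notebook provides; the
    database name is a placeholder chosen for this example.
    """
    spark.sql(f"CREATE DATABASE IF NOT EXISTS {db}")
    spark.sql(
        f"CREATE TABLE IF NOT EXISTS {db}.smoke_table (id INT, name STRING)"
    )
    # If the external metastore is wired up, the new table shows up here
    # (and in the backing Azure SQL database).
    return [t.name for t in spark.catalog.listTables(db)]
```

In a notebook cell, `metastore_smoke_test(spark)` should return a list containing `smoke_table`.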


I cross-checked via SQLWorkbench and can see all the metastore tables as expected.
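That cross-check can also be scripted. The sketch below assumes a DB-API connection to the metastore database (e.g. one opened with `pyodbc.connect(...)` against the Azure SQL server, with the ODBC driver installed); `DBS` and `TBLS` are part of the standard Hive metastore schema:

```python
def list_metastore_tables(conn):
    """Return (database, table) pairs straight from the Hive metastore schema.

    `conn` is any DB-API connection to the metastore database, e.g. one
    opened with pyodbc against the Azure SQL server configured above.
    """
    cur = conn.cursor()
    # DBS holds databases and TBLS holds tables; both are created by
    # DataNucleus when it initializes the metastore schema.
    cur.execute(
        "SELECT d.NAME, t.TBL_NAME "
        "FROM TBLS t JOIN DBS d ON t.DB_ID = d.DB_ID "
        "ORDER BY d.NAME, t.TBL_NAME"
    )
    return cur.fetchall()
```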



The new Spark tables' metadata is also present, so the external metastore is set up correctly!