How to set up a connection for reading and manipulating files in Azure Data Lake using Databricks.

Before starting 

I assume that you already have an Azure Data Lake. Otherwise, it is easy to set one up using the Azure portal or the ARM template described here.

You also need a comma-delimited (CSV) file; in my example I use a tweet file downloaded from Kaggle. I also have Python code that converts this file to Parquet, which you will need in the mount point example.

The simplest way to read from the Data Lake is using a notebook and Python.
[Screenshot: DatabricksNotebook1.jpg – the full code in a Databricks notebook]

Cmd 1

Use spark.conf.set("spark.sql.name-of-property", value) to configure access to your Data Lake with one of its Access keys.

The result should look something like:

spark.conf.set("fs.azure.account.key.labstoac.dfs.core.windows.net", "QswwB4jNa/TbPweLzzUuBkwA2EAQoxCthLnIHn67RitYLd00F5kjffAtebtGFjDMVMEI2bNIcWPZE+AStPeQ9zA==")

In a Databricks notebook, add the command:

spark.conf.set("spark.sql.name-of-property", value)

Replace name-of-property with fs.azure.account.key.ReplaceMe.dfs.core.windows.net, substituting the name of your Storage Account (Lake) for ReplaceMe.

Replace value with the key (key1 or key2) found under Access Keys.
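The property name follows a fixed pattern, so a small helper (hypothetical, not part of Databricks) can build it from the storage account name:

```python
def adls_key_property(storage_account: str) -> str:
    """Build the Spark property name that holds the access key
    for an ADLS Gen2 storage account."""
    return f"fs.azure.account.key.{storage_account}.dfs.core.windows.net"

# In a notebook you would then call, with your key1 or key2 value:
# spark.conf.set(adls_key_property("labstoac"), "<access-key>")
```

Keeping the pattern in one place avoids typos in the long property string when you switch storage accounts.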

[Screenshot: GetAccessKey.jpg – the Access keys blade of the storage account in the Azure portal]

Cmd 2

The file location is the container where you store your file.

file_location = "abfss://raw@labstoac.dfs.core.windows.net/"
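Putting the two cells together, a sketch of reading the CSV (the container name raw and account name labstoac come from this example; the file name Tweets.csv is an assumption):

```python
container = "raw"
storage_account = "labstoac"

# abfss:// is the scheme for ADLS Gen2 over the DFS endpoint.
file_location = f"abfss://{container}@{storage_account}.dfs.core.windows.net/"

# Inside a Databricks notebook, `spark` is already defined, so you could then run:
# df = spark.read.format("csv").option("header", "true").load(file_location + "Tweets.csv")
# display(df)
```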
