
How to set up a connection for reading and manipulating files in Azure Data Lake using Databricks.

Before starting 

I assume that you already have an Azure Data Lake. Otherwise, it’s easy to set up using the Azure portal or the ARM template described here.

You also need a comma-delimited file (CSV); in my example I use a tweet file downloaded from Kaggle. I also have Python code that converts this file into Parquet, which you will need in the mount point example.
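If you want to do that conversion yourself, a minimal sketch is shown below, assuming pandas and pyarrow are installed; the file names tweets.csv and tweets.parquet are placeholders:

import pandas as pd

# Read the Kaggle tweet CSV and write it back out as Parquet.
# "tweets.csv" and "tweets.parquet" are placeholder names.
df = pd.read_csv("tweets.csv")
df.to_parquet("tweets.parquet")  # requires pyarrow or fastparquet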


The simplest way to read from the Data Lake is using a notebook and Python.
(Screenshot: DatabricksNotebook1.jpg)

Cmd 1

Use spark.conf.set("name-of-property", "value") to configure the access key for your Data Lake.


The result should look something like:

spark.conf.set("fs.azure.account.key.labstoac.dfs.core.windows.net", "QswwB4jNa/TbPweLzzUuBkwA2EAQoxCthLnIHn67RitYLd00F5kjffAtebtGFjDMVMEI2bNIcWPZE+AStPeQ9zA==")

In your Databricks notebook, add the command:

spark.conf.set("name-of-property", "value")


Replace name-of-property with fs.azure.account.key.ReplaceMe.dfs.core.windows.net, where ReplaceMe is the name of your Storage Account (Lake).

Replace value with the key (key1 or key2) found under Access Keys.


(Screenshot: GetAccessKey.jpg — finding the access key in the Azure portal)
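Hardcoding the key is fine for a lab, but a safer variant reads it from a Databricks secret scope instead of pasting it into the notebook. A minimal sketch, assuming a secret scope named lab-scope containing a secret named labstoac-key (both hypothetical names):

# Fetch the storage account key from a Databricks secret scope
# ("lab-scope" and "labstoac-key" are placeholder names).
storage_account_key = dbutils.secrets.get(scope="lab-scope", key="labstoac-key")
spark.conf.set("fs.azure.account.key.labstoac.dfs.core.windows.net", storage_account_key)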

Cmd 2

The file location is the container where you store your file.


file_location = "abfss://raw@labstoac.dfs.core.windows.net/"
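With the configuration from Cmd 1 in place, the CSV can be read straight into a DataFrame. A minimal sketch, assuming the file has a header row and sits at the root of the raw container (tweets.csv is a placeholder name):

# Read the CSV from the Data Lake into a Spark DataFrame.
df = (spark.read
      .format("csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .load(file_location + "tweets.csv"))

# Show the result in the notebook.
display(df)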
