Provision your Databricks Workspace infrastructure with BICEP
Discover the power of BICEP in provisioning a robust Databricks workspace infrastructure, including Azure Data Factory, Key Vault and a Storage account; I'll use some PowerShell to configure RBAC permissions between resources, and I'll show you how to use the logging command task.setvariable
Welcome! This is Part 1 of the series ...
Setting the stage
Buckle up, as this will be quite a ride, but I hope a useful one at the very least. Compared to my previous examples, this one's complexity is higher by at least an order of magnitude; nonetheless, that is what it takes to meet real-world solutions.
Solution
In the next part of the article, I'll explain just the most relevant parts of the code.
Deploying the Infrastructure
Of course, we are going to start with the star of the show... the Databricks workspace's BICEP! Perhaps you will find it shocking how simple this template looks, right? And that is exactly the point: it is that easy!
I want you to take note of the fact that each Databricks workspace creates a managed resource group along with it (1); in a nutshell, this is a resource group that holds related resources for an Azure solution. I also want you to observe the Outputs section (2): we are sending back the Id and URL of the workspace, and we will configure the ADF linked services with this information, hence streamlining the pipeline.
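To make this concrete, here is a minimal sketch of what such a workspace module could look like; the parameter names, SKU and API version are illustrative assumptions, not the exact template from the repository.

```bicep
// Minimal sketch of a Databricks workspace module (illustrative names and API version).
param location string = resourceGroup().location
param workspaceName string
param managedResourceGroupName string = '${workspaceName}-managed-rg'

resource databricksWorkspace 'Microsoft.Databricks/workspaces@2023-02-01' = {
  name: workspaceName
  location: location
  sku: {
    name: 'premium'
  }
  properties: {
    // (1) Databricks requires a managed resource group that the service creates and controls
    managedResourceGroupId: subscriptionResourceId('Microsoft.Resources/resourceGroups', managedResourceGroupName)
  }
}

// (2) Outputs consumed later by the ADF linked service and by the pipeline
output databricksWorkspaceId string = databricksWorkspace.id
output databricksWorkspaceUrl string = databricksWorkspace.properties.workspaceUrl
```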
In our example environment, we will create some additional resources. In several of my previous articles, I showed an example of Azure Data Factory, but now I'm going to show you something new: let's use BICEP to create a linked service and consume the output values from Databricks directly.
As shown, we are creating a linked service of type AzureDatabricks (1) and, using the outputs returned by the Databricks BICEP module, we configure it with the workspace domain (2) and workspace resource id (3).
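Here is a hedged sketch of that linked service, assuming the Data Factory already exists and the two Databricks outputs are passed in as parameters; the linked-service name and cluster settings are illustrative.

```bicep
// Sketch of an AzureDatabricks linked service inside an existing Data Factory.
param dataFactoryName string
param databricksWorkspaceUrl string   // from the Databricks module output
param databricksWorkspaceId string    // from the Databricks module output

resource dataFactory 'Microsoft.DataFactory/factories@2018-06-01' existing = {
  name: dataFactoryName
}

resource databricksLinkedService 'Microsoft.DataFactory/factories/linkedservices@2018-06-01' = {
  parent: dataFactory
  name: 'ls_azure_databricks'
  properties: {
    type: 'AzureDatabricks'                         // (1)
    typeProperties: {
      domain: 'https://${databricksWorkspaceUrl}'   // (2)
      workspaceResourceId: databricksWorkspaceId    // (3)
      authentication: 'MSI'                         // let ADF's managed identity authenticate
      newClusterNodeType: 'Standard_DS3_v2'
      newClusterNumOfWorker: '1'
      newClusterVersion: '14.3.x-scala2.12'
    }
  }
}
```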
Following the same logic, we will also deploy an Azure Key Vault and a Storage Account with its hierarchical namespace enabled (named Data Lake in the example).
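For completeness, a minimal sketch of those two supporting resources; names, SKUs and API versions are again illustrative assumptions.

```bicep
// Sketch of the supporting resources: Key Vault and Data Lake (ADLS Gen2) storage.
param location string = resourceGroup().location
param keyVaultName string
param dataLakeName string

resource keyVault 'Microsoft.KeyVault/vaults@2022-07-01' = {
  name: keyVaultName
  location: location
  properties: {
    tenantId: tenant().tenantId
    sku: {
      family: 'A'
      name: 'standard'
    }
    enableRbacAuthorization: true
    accessPolicies: []
  }
}

resource dataLake 'Microsoft.Storage/storageAccounts@2022-09-01' = {
  name: dataLakeName
  location: location
  kind: 'StorageV2'
  sku: {
    name: 'Standard_LRS'
  }
  properties: {
    isHnsEnabled: true   // hierarchical namespace = Data Lake Storage Gen2
  }
}

// Outputs used later when creating the Key Vault-backed secret scope
output keyVaultResourceId string = keyVault.id
output keyVaultUri string = keyVault.properties.vaultUri
```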
Retrieving values from Deployment
There are a few additional tasks that are not possible to achieve via BICEP, but in order to complete them we DO need its outputs; this is how I do it.
Once the Azure Resource Manager template deployment task (1) completes, we give the outputs a name using the deploymentOutputs option (2); here we use ARM_OUTPUTS (3) to capture the output values. We then convert the JSON into an object, iterate it with ForEach, and create variables that can be used by subsequent tasks in the pipeline through the logging command (5):
Write-Output "##vso[task.setvariable variable=NameOfVariable;]ValueOfVariable"
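Below is a minimal sketch of how those two steps could look in the pipeline YAML; the service connection, file paths and variable names are placeholders, not the exact values from the repository.

```yaml
# Sketch: deploy the BICEP templates and re-publish their outputs as pipeline variables.
- task: AzureResourceManagerTemplateDeployment@3           # (1)
  inputs:
    deploymentScope: 'Resource Group'
    azureResourceManagerConnection: '$(serviceConnection)'
    subscriptionId: '$(subscriptionId)'
    action: 'Create Or Update Resource Group'
    resourceGroupName: '$(resourceGroupName)'
    location: '$(location)'
    templateLocation: 'Linked artifact'
    csmFile: 'infra/main.bicep'
    deploymentMode: 'Incremental'
    deploymentOutputs: 'ARM_OUTPUTS'                       # (2) (3)

- task: PowerShell@2
  displayName: 'Expose deployment outputs as pipeline variables'
  inputs:
    targetType: inline
    script: |
      # Convert the JSON string into an object and create one pipeline variable per output
      $outputs = '$(ARM_OUTPUTS)' | ConvertFrom-Json
      foreach ($output in $outputs.PSObject.Properties) {
        Write-Output "##vso[task.setvariable variable=$($output.Name);]$($output.Value.value)"   # (5)
      }
```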
Setting RBAC permissions with PowerShell
We will use an Azure PowerShell task to set the RBAC permissions.
We take the Databricks workspace name from our variables (1) and the ADF's Object Id (2) extracted from the outputs, and use them with the New-AzRoleAssignment PowerShell cmdlet.
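A hedged sketch of that task follows; the role name, scope and variable names are assumptions for illustration, and here I use the workspace resource id exposed from the outputs as the assignment scope.

```yaml
# Sketch: grant ADF's managed identity access to the Databricks workspace,
# so the MSI-authenticated linked service can reach it.
- task: AzurePowerShell@5
  displayName: 'Assign RBAC role to ADF on the Databricks workspace'
  inputs:
    azureSubscription: '$(serviceConnection)'
    azurePowerShellVersion: 'LatestVersion'
    ScriptType: 'InlineScript'
    Inline: |
      # (2) the ADF Object Id comes from the deployment outputs,
      # (1) the workspace resource id was published as a pipeline variable earlier
      New-AzRoleAssignment -ObjectId '$(adfPrincipalId)' `
        -RoleDefinitionName 'Contributor' `
        -Scope '$(databricksWorkspaceId)'
```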
Preparing the agent and creating a KV Secret scope
Tasks 1 to 3 prepare the DevOps pipeline pool agent: first by indicating which version of Python to run (1), then by generating the Entra ID authentication token (2) for the service connection, and finally by creating the [DEFAULT] configuration profile (3) for the workspace.
We finally create the secret scope (4) backed by the previously created Azure Key Vault, for secure and centralized storage of application secrets.
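Here is a minimal sketch of those tasks, assuming the legacy pip-installed Databricks CLI and illustrative variable and scope names; note that steps (2) and (3) are combined into a single Azure CLI step in this sketch, and the 2ff814a6-… GUID is the well-known Azure Databricks resource id used when requesting the Entra ID token.

```yaml
# Sketch: prepare the agent and create a Key Vault-backed secret scope.
- task: UsePythonVersion@0                                 # (1)
  inputs:
    versionSpec: '3.11'

- task: AzureCLI@2
  displayName: 'Create the [DEFAULT] Databricks CLI profile'
  inputs:
    azureSubscription: '$(serviceConnection)'
    scriptType: 'bash'
    scriptLocation: 'inlineScript'
    inlineScript: |
      pip install --quiet databricks-cli
      # (2) Entra ID token for the well-known Azure Databricks resource id
      token=$(az account get-access-token --resource 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d --query accessToken -o tsv)
      # (3) write the [DEFAULT] configuration profile for the workspace
      cat > ~/.databrickscfg <<EOF
      [DEFAULT]
      host = https://$(databricksWorkspaceUrl)
      token = $token
      EOF

- task: Bash@3                                             # (4)
  displayName: 'Create the Key Vault-backed secret scope'
  inputs:
    targetType: inline
    script: |
      databricks secrets create-scope --scope kv-managed-secrets \
        --scope-backend-type AZURE_KEYVAULT \
        --resource-id "$(keyVaultResourceId)" \
        --dns-name "$(keyVaultUri)"
```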
Create the Workspace Cluster
Last but not least... creating the cluster! Remember that we are still under the context of the previously created [DEFAULT] configuration profile on the DevOps pipeline pool agent powering our job; therefore, we just need to call the Databricks CLI with a new cluster definition in a JSON file and capture its ID (1).
Also notice how we are now using isOutput=true along with task.setvariable; the reason is that we are shortly going to use another job, and this property allows the variable to be fetched as a dependency, as shown below.
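Here is a hedged sketch of that step and of how a later job could fetch the value through the dependency; the job, step, variable and file names are assumptions.

```yaml
# Sketch: create the cluster from a JSON definition and expose its id to a later job.
jobs:
- job: CreateCluster
  steps:
  - task: Bash@3                                           # (1)
    name: createCluster
    displayName: 'Create the Databricks cluster'
    inputs:
      targetType: inline
      script: |
        clusterId=$(databricks clusters create --json-file cluster.json | python -c "import sys, json; print(json.load(sys.stdin)['cluster_id'])")
        echo "##vso[task.setvariable variable=clusterId;isOutput=true]$clusterId"

# A later job fetches the value as a dependency (isOutput=true makes this possible).
- job: ManageCluster
  dependsOn: CreateCluster
  variables:
    clusterId: $[ dependencies.CreateCluster.outputs['createCluster.clusterId'] ]
  steps: []   # validation / termination steps go here
```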
If you have worked with Databricks before, you'll know that clusters take some time to start, and since this is an asynchronous operation, I just use a Manual Validation task to check the cluster status before continuing (2). There are other ways to "wait", such as Delays or loops that check the cluster status; use the one that fits your needs. I will leave two additional methods in the code (one of them is sketched below).
For our grand finale, we just terminate the cluster (3) to avoid any unnecessary costs.
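And here is a hedged sketch of one of those alternative "wait" approaches together with the final cleanup, assuming the clusterId variable mapped from the previous job and an illustrative timeout.

```yaml
# Sketch: poll the cluster state instead of using a Manual Validation, then terminate it.
- task: Bash@3
  displayName: 'Wait for the cluster, then terminate it'
  inputs:
    targetType: inline
    script: |
      # (2) poll until the cluster reports RUNNING (give up after ~20 minutes)
      for i in $(seq 1 40); do
        state=$(databricks clusters get --cluster-id "$(clusterId)" | python -c "import sys, json; print(json.load(sys.stdin)['state'])")
        echo "Cluster state: $state"
        if [ "$state" = "RUNNING" ]; then break; fi
        sleep 30
      done
      # (3) terminate the cluster to avoid unnecessary costs
      databricks clusters delete --cluster-id "$(clusterId)"
```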
Execute the deployment
I have explained the code; it is now your turn ...
Then, check the video at the end of the article below, where I show you how to create the Azure DevOps pipeline.
Run the pipeline and monitor the deployment by clicking on your Resource Group (1) > Deployments (2); there you can view each resource's deployment status (3).
Call to action
In my next article, I'm going to talk about Users & Groups management via Azure's Entra ID for your Databricks setup. If you would like to get it directly via email, below is the subscription button... I wish you all the best!