Manage Databricks Users with Azure's Entra ID
Azure Databricks SCIM lets you sync users and groups from Microsoft Entra ID (formerly Azure Active Directory) to Azure Databricks, ensuring consistent access control across platforms and simplifying user provisioning and deprovisioning.
Setting the stage
I was reviewing our Databricks deployment against the security best practices listed in the article Azure Databricks - Security Best Practices and Threat Model, in particular the one suggesting to "Separate accounts with admin privileges from day-to-day user accounts".
I was shocked when I realized that all developers were members of the admin group 😬... Of course, I immediately removed them from the group, only to find them all back the next day 😒. But why?
The same article gave me the answer...
"[...] as part of the Azure RBAC model, users that are given Contributor or above permissions to the Resource Group for a deployed Azure Databricks workspace automatically become administrators when they login to that workspace."
Indeed, we had granted the RBAC Contributor role to all developers at the Resource Group level, so we fixed this by granting the Reader role instead... but then a second issue arrived: new developers weren't able to log in unless an administrator created them manually, one by one, within the workspace, as Azure's Entra ID security groups weren't available there 😬!
The solution HAD to meet three criteria: 1) give the proper level of access, 2) allow admins vs. users separation, and 3) automate user access provisioning/de-provisioning.
Solution
Account provisioning/de-provisioning needs to happen regardless of whether these users are admins or day-to-day users (the separation will come naturally afterwards), so let's start with managing Databricks users with Azure's Entra ID. To do so, we need the System for Cross-domain Identity Management (SCIM).
Keep users and groups up-to-date using SCIM
From About SCIM provisioning in Azure Databricks:
SCIM lets you use an identity provider (IdP) to create users in Azure Databricks, give them the proper level of access, and remove access (deprovision them).
This comes in the form of a free Azure enterprise application named Azure Databricks SCIM Provisioning Connector that can be installed from Azure's Entra ID. Before that, though, we need to decide whether it should be configured for a Workspace or for the Account.
- Workspace. We create one SCIM enterprise application for each Databricks workspace; it uses the workspace's URL and a personal access token (PAT) generated in that workspace to configure the application and associate it with the workspace.
- Account. We create just one SCIM application for all Databricks workspaces within the account (that is, your Azure Entra ID tenant); it uses the account's URL and a SCIM token to configure the application, which then serves all your deployed workspaces.
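To make the difference a bit more concrete, here is a minimal sketch of how a client could talk to either flavor. It only builds a standard SCIM 2.0 "list users" request (it does not send it), and it assumes you copy the real SCIM base URL and token from your own workspace or account console. The URLs, tokens, and the helper name `build_list_users_request` below are placeholders I made up for illustration, not real Databricks endpoints.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request


def build_list_users_request(scim_base_url: str, token: str,
                             user_name: Optional[str] = None) -> Request:
    """Build (but do not send) a SCIM 2.0 'list users' request.

    Hypothetical helper: scim_base_url is whatever SCIM base URL you
    copied from the workspace or the account console.
    """
    url = scim_base_url.rstrip("/") + "/Users"
    if user_name is not None:
        # SCIM filter syntax (RFC 7644): attribute, operator, quoted value.
        url += "?" + urlencode({"filter": f'userName eq "{user_name}"'})
    return Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/scim+json",
        },
        method="GET",
    )


# Placeholder values only: replace them with the URL and token you copied.
ACCOUNT_SCIM_URL = "https://accounts.example.invalid/scim/v2"    # one for the whole account
WORKSPACE_SCIM_URL = "https://adb-123.example.invalid/scim/v2"   # one per workspace

# Account flavor: a single SCIM token covers every workspace.
account_req = build_list_users_request(ACCOUNT_SCIM_URL, "<scim-token>", "dev@example.com")

# Workspace flavor: a PAT generated in that specific workspace.
workspace_req = build_list_users_request(WORKSPACE_SCIM_URL, "<pat>")

print(account_req.full_url)
print(workspace_req.full_url)
```

The request shape is identical in both cases; what changes is how many base URL and token pairs you have to keep track of, which is one per workspace versus one for the whole account.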
According to the article Configure SCIM provisioning using Microsoft Entra ID:
Databricks recommends that you provision users, service principals, and groups to the account level and manage the assignment of users and groups to workspaces within Azure Databricks.
Configure the Account SCIM Application
The procedure is pretty straightforward... if you meet the requirements 😅! The articles do not have visual aids, so I decided that it would be much better to show you the process 😉
Manage identity & access in your workspace
Listed as a requirement in the article Provision identities to your Azure Databricks account using Microsoft Entra ID:
Your Microsoft Entra ID account must be a Premium edition account to provision groups. Provisioning users is available for any Microsoft Entra ID edition.
This is more a restriction than a requirement... if you are like me and using Entra ID's free plan, you'll see the following note and, as stated, work with individual users only.
If you work for a company, you'll most likely be working with a Premium plan; alternatively, you can subscribe to Microsoft Entra ID P1 for $6.00 per user/month 😉
I recorded the following video to show how to do it for your Databricks workspace.
Conclusion
There are three major benefits to this approach:
- When you remove a user from Entra ID, the user is automatically removed from Databricks.
- Users can also be disabled temporarily via SCIM.
- Users and groups are automatically synchronized.
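For the curious, temporarily disabling a user in SCIM terms usually means flipping the user's `active` attribute with a PATCH request. The sketch below builds the generic SCIM 2.0 PatchOp body defined in RFC 7644. I am assuming this generic shape here; the exact requests the Entra connector sends are not shown in this article, and the helper name `build_active_patch` is made up for illustration.

```python
import json

# Message schema URN defined by the SCIM 2.0 protocol (RFC 7644).
PATCH_OP_SCHEMA = "urn:ietf:params:scim:api:messages:2.0:PatchOp"


def build_active_patch(active: bool) -> dict:
    """Build a SCIM PatchOp body that sets a user's 'active' flag.

    Hypothetical helper: active=False disables the user without deleting
    it, active=True enables it again.
    """
    return {
        "schemas": [PATCH_OP_SCHEMA],
        "Operations": [
            {"op": "replace", "path": "active", "value": active},
        ],
    }


# JSON body a client would send in a PATCH to <scim-base-url>/Users/<user-id>.
body = json.dumps(build_active_patch(False), indent=2)
print(body)
```

Because the user record is only flagged as inactive and not deleted, re-enabling it later is the same request with `True`.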
In the next installment of this series I'm going to show you how to Deploy your Notebooks across environments with DevOps YAML Pipelines. Don't miss it! Subscribe if you'd like to get articles directly to your inbox. As always, best of luck.