Selective deployment of Azure Data Factory (ADF) components (Series)
Giving ADF the ability to selectively deploy components will significantly enhance the flexibility and efficiency of your data integration processes by allowing you to deploy specific components based on your business needs.
Introduction
I have been actively working with Azure Data Factory (ADF) since its initial release in 2015 when version 1 (script-based) became generally available. Over the years, I have closely followed its evolution through diverse scenarios, varying requirements, and companies of different sizes and industries.
As indicated by Gartner's Magic Quadrant for Data Integration Tools - Dec. 4, 2023, Microsoft maintains its position as a leader in this domain:
Microsoft [...] offers SQL Server Integration Services (SSIS) for on-premises data integration, Azure Data Factory (ADF) for hybrid and Azure-based data integration, and Power Query for data preparation tasks.
I'm not trying to convince you to use ADF. I assume that you already do or... you really have no choice 😬 because your project or company has already decided to use it! Nonetheless, it is your job to "[...] achieve consistent access and delivery of data across a wide spectrum of data sources and data types to meet the data consumption requirements of business applications and end users." (Magic Quadrant for Data Integration Tools - Dec. 4, 2023)
Adding the ability to selectively deploy components will significantly enhance the flexibility and efficiency of your data integration processes by allowing you to deploy specific components based on your business needs, rather than deploying the entire setup each time.
The advantages are manifold: increased efficiency in data operations, reduced downtime, and optimized resource usage. Furthermore, it can facilitate iterative development and testing processes, enabling you to incrementally improve and expand your data pipelines based on evolving requirements and feedback.
Setting the stage
To get the most out of this series you MUST know at least two things about ADF. The first is the function and usage of its core "building blocks" (linked services, datasets, pipelines and triggers). The second is that you have to click the "Publish" button for your triggers to "work", that is, to run as scheduled or in response to events.
Throughout this series I will show you how to configure your ADF in a continuous integration and delivery (CI/CD) cycle. In a nutshell, this means connecting it to a Git repository and publishing programmatically instead of manually clicking the "Publish" button. In addition, we will be able to control with precision which of the building blocks are included (selective deployment), all of this orchestrated by an Azure DevOps YAML pipeline.
The scenario I just described is the end result of an "evolution" across three generations:
- Automating ADF's CI/CD with Full Deployment (Generation 1)
- Automating ADF's CI/CD with Selective Deployment (Generation 2)
- Automating ADF's CI/CD with Selective Deployment on Shared instances (Generation 3)
Each of these generations will be thoroughly explored in separate articles, where I will not only explain the technical aspects but also provide insights into the associated business context, implementation complexity, and limitations in addressing more intricate requirements. This comprehensive understanding will empower you to choose the approach that best aligns with your specific needs.
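To make the idea of "selective deployment" a bit more concrete before we reach the real tooling, here is a minimal Python sketch of the underlying principle: start from the full list of building blocks and keep only those that match a set of include/exclude patterns. Everything in it is an assumption made for illustration: the component names, the "type.name" notation, and the `select_components` helper are hypothetical, and this is not how ADF or any specific extension implements the feature.

```python
from fnmatch import fnmatchcase

# Hypothetical inventory of ADF building blocks, written as "<type>.<name>".
# These names are invented for illustration only.
components = [
    "linkedService.LS_AzureSql",
    "dataset.DS_Sales",
    "pipeline.PL_LoadSales",
    "pipeline.PL_LoadFinance",
    "trigger.TR_Daily",
]


def select_components(all_components, includes, excludes=()):
    """Keep components that match any include pattern and no exclude pattern."""
    selected = []
    for component in all_components:
        included = any(fnmatchcase(component, pattern) for pattern in includes)
        excluded = any(fnmatchcase(component, pattern) for pattern in excludes)
        if included and not excluded:
            selected.append(component)
    return selected


# Deploy all pipelines and triggers, except one pipeline that is not ready yet.
to_deploy = select_components(
    components,
    includes=["pipeline.*", "trigger.*"],
    excludes=["pipeline.PL_LoadFinance"],
)
print(to_deploy)  # ['pipeline.PL_LoadSales', 'trigger.TR_Daily']
```

A full deployment (Generation 1) would correspond to including everything with a single `"*"` pattern, while the later generations narrow the selection down.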
Where to start?
I'm going to split this series into four articles 😅 as follows:
Part 1. Understanding ADF's authoring modes and Publishing Cycle
There are two ways to author changes in ADF: Live mode and Git-enabled mode. I will explain their differences, usage and configuration. The second goal is to answer the question "What does the Publish button do in Azure Data Factory?"
Part 2. Automating ADF's CI/CD with Full Deployment (Generation 1)
I'll show step by step how to configure the new CI/CD flow described by Microsoft and explain in which cases I recommend this simple yet powerful method. I'll also include some best practices and recommendations.
Part 3. Automating ADF's CI/CD with Selective Deployment (Generation 2)
This is a leap forward from our previous implementation (Generation 1): from publishing a full ARM template to a selective deployment of components. I'll introduce the Azure DevOps Marketplace, show you how to install the extension Deploy Azure Data Factory (#adftools) created by Kamil Nowinski, and explain step by step how to configure our ADF.
Part 4. Automating ADF's CI/CD with Selective Deployment on Shared instances (Generation 3)
This is our final leap forward, where I will divide ADF components into two groups: Core and Data-Product. I'll explain why I think this is a good approach for enterprise-scale setups where you might have one team dedicated to providing infrastructure services and development teams using that infrastructure.