In many organizations a common misconception is that DevOps is about the tools we use, so let's take a second to read the quote from Microsoft.
> DevOps brings together people, processes, and technology, automating software delivery to provide continuous value to your users. (Microsoft)
Of course, this post is not about what DevOps is and isn't, but I think it's important to revisit that quote once in a while. Instead, this post focuses on how the tooling can help you automate the deployment of Data Factory from a development to a test environment, in a way that easily extends to any other environments you may have.
For this purpose I have set up a GitHub repository, two resource groups (development and test) in Azure and a project in Azure DevOps.
Configuring our development environment
First we need to create a Data Factory resource for our development environment that will be connected to the GitHub repository, and then one for our test environment. This can be done with PowerShell, the Azure CLI, or manually from the Azure portal; pick whichever you prefer, but remember to create each factory in its respective resource group.
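If you go the Azure CLI route, the setup can be sketched roughly as below. All names and the location are placeholders I have made up for this example; substitute your own, and note that these commands require an authenticated Azure session.

```shell
# Install the Data Factory extension for the Azure CLI (one-time step).
az extension add --name datafactory

# Create one resource group per environment (placeholder names).
az group create --name rg-adf-dev --location westeurope
az group create --name rg-adf-test --location westeurope

# Create a data factory in each resource group.
az datafactory create --resource-group rg-adf-dev \
  --factory-name adf-demo-dev --location westeurope
az datafactory create --resource-group rg-adf-test \
  --factory-name adf-demo-test --location westeurope
```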
When the resources are successfully created, navigate to the Data Factory Author & Monitor tool for the development environment and click the Set up Code Repository icon.
This will prompt you to provide information about the repository that should be used for version control. Follow the configuration steps and click the save button when done.
Note! Be aware that if the repository is private within a GitHub organization, you might need to get the GitHub OAuth app for Data Factory approved.
Note! Another issue that can occur is that the code repository appears to be configured, but the data factory shows no sign of version control being in place. This is often related to how the Azure subscription is set up and which role you have been assigned, with the result that the wizard cannot complete the necessary configuration.
If the configuration was successful, the Data Factory is now connected to the GitHub repository. Let's create a sample pipeline that will be used during our deployment.
In the visual designer, click the name of the active branch, select the New branch option, and follow the steps. This creates a new branch in your GitHub repository, which is where our development will be done. Note that unpublished pipelines can only be run through the debug option.
In the image below I have created a feature branch named myfirstpipeline containing one pipeline with a wait activity. The pipeline is saved to version control using Save all and tested using the debug option.
When something is saved in the visual designer, Data Factory handles the interaction with the version control system and makes the necessary changes on the active branch.
Diving into the folder for our pipelines, you will find our newly created pipeline as a JSON file. This is how Data Factory stores the definitions of pipelines, triggers, connections, and so on.
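A minimal definition of this kind, for a pipeline with a single wait activity like the one above, looks roughly as follows (the activity name and wait time are assumptions for illustration):

```json
{
    "name": "myfirstpipeline",
    "properties": {
        "activities": [
            {
                "name": "Wait1",
                "type": "Wait",
                "typeProperties": {
                    "waitTimeInSeconds": 10
                }
            }
        ]
    }
}
```

Because these files are plain JSON in the repository, they can be reviewed in pull requests like any other source code.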
Let's skip a few steps and assume our team's processes have been followed and everything is merged back to the master branch. The changes are now presumably ready to be promoted to our test environment.
Clicking the Publish button back in the visual designer makes the data factory list the pending changes that will be applied to the adf_publish branch. If the changes are approved, the branch is updated with the latest version of the source code. In GitHub the publish branch is non-mergeable to master and contains the ARM deployment templates that Azure DevOps will use when deploying. Sadly, clicking publish at this stage will not trigger any changes in our test environment.
Configuring our Azure DevOps release pipeline
Having the development environment configured and all our changes stored in version control is crucial for enabling automated deployments.
Navigate to your Azure DevOps project and create a release pipeline. Below I have created the release pipeline ndev-adf within the project of the same name.
The first step is to define where our release pipeline will pick up its artifacts. Click the Add button and choose the appropriate service connection and repository. The default branch must be adf_publish, as this is where Data Factory generates the ARM templates.
With the artifact source defined, we want to enable continuous deployment every time we publish our changes. Click the lightning icon on the artifact source and enable the trigger. I also like to specify a branch filter in case something changes on the artifact source. Click the Save button to store the changes.
Let's create the deployment stage that will be responsible for publishing changes to our test environment. Click the Add button in the Stages window and choose to start with an Empty job. Name the stage Test and click the Tasks menu option to see the details.
We then need to add a task to our agent job that performs the deployment. On the agent job, click the plus sign and search for ARM; this gives you the option of adding the Azure Resource Group Deployment task. Add it.
Give the task a readable display name and the Azure details for where you are going to perform the deployment. As we can see in the image below, we are required to provide a reference to an ARM template. In our case we want to reference the template within the adf_publish branch.
Click on the open details button and select the file ARMTemplateForFactory.json before clicking the OK button.
Do the same to add the template parameters file to the task. This allows us to override template parameters such as the data factory name, which is required so the resource template knows which resource to update.
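For reference, the generated parameters file in adf_publish holds the factory name as an ordinary ARM parameter, roughly like this (the factory name below is a placeholder); this is the value the release task overrides per environment:

```json
{
    "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "factoryName": {
            "value": "adf-demo-dev"
        }
    }
}
```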
The last step is to set the task's deployment mode to Incremental. Click the Save button when done.
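Under the hood, the task performs roughly the equivalent of the following Azure CLI deployment. This is a sketch with placeholder resource and factory names, run from a checkout of the adf_publish branch; the factoryName override is what points the templates at the test factory instead of the development one.

```shell
# Deploy the generated ARM templates to the test resource group.
# Incremental mode updates the named factory without touching
# other resources in the group.
az deployment group create \
  --resource-group rg-adf-test \
  --mode Incremental \
  --template-file ARMTemplateForFactory.json \
  --parameters ARMTemplateParametersForFactory.json \
  --parameters factoryName=adf-demo-test
```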
To test our newly created pipeline, click the Release menu option and choose Create a release. You can then follow the deployment status from the release view. If everything deployed successfully, the stage will be marked green.
If we navigate to the data factory in the test environment, we can now see the published changes.
Note! As you probably remember, we enabled continuous deployment based on changes to the adf_publish branch, meaning that if you go back to the development environment, make a change, and publish it, it will automatically be deployed to the test environment.
This post touched briefly on deploying Data Factory from a development to a test environment using GitHub as the version control system.
As you develop data factory pipelines you will most likely face questions like:
- How do I update active triggers?
- Why isn’t the deployment deleting removed elements?
- How can I best manage connection strings, keys and secrets?
The documentation on the Microsoft website is a good place to start and answers some of the questions above.