The Power of DevOps in Business Intelligence
Common pain points experienced when building Business Intelligence (BI) data assets, and how Imaginet uses DevOps to mitigate them
If you’re a Business Intelligence developer, have you experienced some of these challenges?
- One team member overwrites the changes of another team member accidentally, such as when someone uses an outdated Power BI Desktop file and publishes to a workspace
- Someone didn’t deploy or publish all the necessary changes to databases, data loading (ETL) processes, or data models (like Power BI), and parts of your BI solution are broken.
- A change to existing code, like a table or a view in a data warehouse, breaks something else that is dependent on it. Or, because of this problem, there is resistance to change anything and it becomes hard to tell what code is clean and usable, and what is no longer in use. As a result, over time, there are many quirks that users need to know about to use the system correctly.
- Someone on the team, maybe a new team member, does something incorrectly, and no one catches the error until a user discovers it.
- If a release goes wrong, it’s tough to roll back to a previous codebase.
- It takes a long time for new functionality to be released to users because the team is reluctant to publish changes frequently (often stung by the above problems). When a release does happen, it contains many changes, with potentially more problems, and the users can be overwhelmed by the amount of change or lose confidence in the system due to the issues they experience.
- It’s hard to separate development from testing from production, so it all happens in production.
Note that most of these pain points are also applicable to a team of one person; this isn’t just about coordinating multiple developers.
Imaginet uses DevOps to provide pain relief for the above problems; this increases team velocity and improves quality significantly, and users have a far better experience and gain confidence in the BI assets they use.
What is DevOps?
According to Wikipedia, “DevOps is a set of practices that combines software development (Dev) and IT operations (Ops). It aims to shorten the systems development life cycle and provide continuous delivery with high software quality. DevOps is complementary with Agile software development; several DevOps aspects came from the Agile methodology.”
The Imaginet BI team uses Agile practices, and DevOps is a very important component of our customer projects and is a large part of why they are so successful. Primarily, DevOps consists of the automated building and deployment of the code assets of our project, such as data warehouses, data loading processes (often referred to as Extract, Transform, Load or ETL), data models, and reports.
The key part of these processes is that they are automated. When deployments are manual, the people who do the deployments can do them differently each time, and they may miss steps or use different configurations or settings. When deployment processes are automated, they are done consistently, in the same way and with the same configurations each time. Automated deployments can also be done more frequently without taking up the time of the team.
When Imaginet is working on a BI project, we use three features of Azure DevOps:
- work-item tracking
- source code repositories
- build and release pipelines.
Azure DevOps comes with five free Basic licenses, and unlimited Stakeholder licenses (that have limited read-only access), and if you have MSDN subscriptions, they have a full license to Azure DevOps. Additional Azure DevOps Basic licenses are available for purchase at about $6 per user per month.
Typically, a data engineering project has one or more databases (used for a data warehouse or data mart), an Azure Data Factory, and a data model, often a Power BI data model but we also use standalone analysis services tabular data models, and perhaps SQL Reporting Services (SSRS) or Power BI paginated reports. All of these pieces of the solution have source code (files) that define their structures, logic, and configuration. Imaginet uses Visual Studio Data Tools (which is available as a free download) for many of the source code assets in a data engineering project. We store all these source files in a source code repository, which is the first part of the DevOps value stream.
You may have heard the term CI/CD used – this stands for Continuous Integration/Continuous Deployment. Integration refers to integrating the code changes made by all team members into a single code base. When developer teams use a source code repository, they can work independently on their own copy of the source code, then check in their changes to the main repository and the system can identify the changes made by a specific team member and for what reason and when it occurred. This is even better than adding comments to source code.
When a team member checks in their changes, Imaginet implements a policy that requires the code to be reviewed by another team member and that the code successfully compiles prior to the changes getting committed to the main branch. This policy reinforces the practice that effective teams have that only production-ready code changes should be checked into the code repository.
Source code repositories keep a full history of the changes that have taken place, and detects when two or more team members have changed the same source code, and allows those change conflicts to be resolved correctly. Even for a team of one person, the change history of source code repositories is extremely valuable.
Once code has been checked in, reviewed, and successfully compiled, the output of the compiled code is available for deployment. This output includes things like a a SQL database DACPAC file that represents the structure of a SQL database without any data, or the JSON files that define an Azure Data Factory, or an .ASDatabase file which is the structure of a tabular analysis database (sometimes also called a BI semantic model). Or it could include the RDL files for SSRS or PowerBI paginated reports. These output files are generated by a Build pipeline automatically whenever the main branch of the source code repository is updated. This is the end of the Continuous Integration process and the beginning of the Continuous Deployment process.
If this was a football game, we’d be at half-time right now, so let’s recap. So far, we have implemented a source code repository into which we store all our code assets, such as database or tabular projects, Azure Data Factory, and paginated reports. This repository stores all the change history of the files and detects when team members make conflicting changes and helps to resolve those conflicts correctly. We have also put in place a policy that requires another team member to review code , and also that the code compiles without errors prior to it being committed. Once code changes are committed, a Continuous Integration process automatically builds the source code and stores the compiled output files for the next step, Continuous Deployment. The effort to get to this point is minimal; usually a new BI team just needs to adjust their habits to use Visual Studio and database projects, source code repositories, and follow the policies enforced by the system.
Now we turn to Continuous Deployment. This is the process that deploys compiled source code into specific environments for testing or production use. We use Azure Release pipelines to deploy the assets from the previous Continuous Integration phase. Typically a release pipeline will deploy a SQL database (usually a data warehouse or data mart), an Azure Data Factory, a tabular data model, and sometimes a set of paginated reports (from SSRS or Power BI Report Designer). The release pipeline has a set of tasks to deploy these objects into specific environments, and often our first environment is called Beta. This environment is used to ensure that the servers and locations and configuration settings used by the tasks are valid and can successfully deploy the objects. For example, there may be a database project that needs to be deployed, and in the Beta environment, it gets deployed to a database called MyDataWarehouse_Beta on a SQL Server called ServerA. The Azure Data Factory (ADF) code would get deployed to an ADF resource called ADF-MyETL-Beta. The tabular model would get deployed as MyTabularModel_Beta into a tabular server (SQL Server Analysis Server (SSAS) or Azure Analysis Service (AAS)) called TabularServerA. The deployment to the Beta environment would occur automatically, whenever one of the pieces generated by the CI process gets updated.