Today, DataOps is to data engineering teams what DevOps is to software development teams. Instead of improving the efficiency and robustness of shipping code, DataOps focuses on the efficiency and robustness of shipping trusted data. DataOps considers the data lifecycle from raw data, via curated data, analytics, and insights, to business benefits. It treats data and insights as the output of a production line, borrowing ideas from DevOps, agile development, and lean manufacturing.
Key features of DataOps include:
- Support for team collaboration: Reusable data flows and tools, documentation, breaking down data silos.
- Source control: Code and metadata are versioned and managed.
- Data lineage: Traceability of data through pipelines.
- Access control and data sharing: Limiting who can access data for different purposes.
- Pipeline orchestration: Using reusable components to design pipelines end-to-end.
- Monitoring pipeline health at runtime: Detecting and fixing anomalies.
With data automation you also get a set of additional features:
- Productivity boost for the team: Reusable patterns, deployment and change management.
- Test automation: Generating tests in bulk, such as comprehensive tests of semantic models.
- Unit testing: Low-level data quality tests, potentially for each pipeline and activity.
- Guided pipeline development: Built-in standards and best practices.
- Data lineage end-to-end: Complete traceability of data from your dashboards, through your entire data platform, down to each data source.
- Live documentation: Friendly to end users to simplify self-service and user adoption of trusted data, dashboards, and reports.
- Proactive health warnings at design time: Metadata dependencies are used to look for anomalies before code is run in production.
- Flexibility: Ability to adapt to new architectures with minimal or no re-coding.
Xpert BI enables efficient DataOps on many levels, and this blog explains how Xpert BI supports source control and data testing.
Xpert BI v4 is fully integrated with Git and Azure DevOps. Many of our customers already use source control for other IT initiatives and projects, or for other parts of the data platform, and Xpert BI can now be an integrated part of that process. This means that any configuration or metadata change made in Xpert BI can be tracked and pushed to Azure DevOps (in this example).
To get started with source control, just connect Xpert BI to your source control environment and you are ready to track all your code and configuration changes as you develop your solution.
This creates a local file repository of the entire solution. When you have made some changes and are ready to commit your work, simply click Commit. Xpert BI analyzes your work, identifies the changes you have made, and shows them in a list, where you can select which of them to commit to source control and push to Azure DevOps.
When you log on to Azure DevOps and browse the files and commits there, you will find everything automatically organized in an intuitive, standardized folder structure.
Integrating with source control ensures better quality and control in the development process and makes it easier for larger teams to collaborate. You can, of course, also update and sync changes from the shared repository to your local one, and merge changes and resolve code conflicts.
All changes are logged both in Xpert BI – for the local changes and commits – and in Azure DevOps for the changes which are pushed to the global repository.
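Conceptually, the change detection behind a Commit dialog like this can be thought of as comparing snapshots of the local file repository. The following is a minimal, illustrative sketch only, not Xpert BI's actual implementation; the function names `snapshot` and `detect_changes` are hypothetical:

```python
import hashlib
from pathlib import Path

def snapshot(repo_dir: Path) -> dict:
    """Hash every file in the local repository so later changes can be detected."""
    return {
        str(p.relative_to(repo_dir)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(repo_dir.rglob("*"))
        if p.is_file()
    }

def detect_changes(old: dict, new: dict) -> dict:
    """Classify files as added, modified, or deleted between two snapshots."""
    return {
        "added": sorted(set(new) - set(old)),
        "modified": sorted(f for f in new if f in old and new[f] != old[f]),
        "deleted": sorted(set(old) - set(new)),
    }
```

The resulting change list is the kind of thing a user would pick from before the selected files are committed and pushed to the remote repository.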
Xpert BI includes Test Automation and Unit Testing for data QA
Xpert BI has incorporated DataOps features such as Test Automation and Unit Testing for a few years now. This is an important part of ensuring the ongoing data quality of a data warehouse (or data platform) solution. New in Xpert BI v4 is an expanded list of supported database types in the Connection Manager for Unit Testing.
Xpert BI increases data quality out of the box (OOTB) by standardizing and automating table generation and data load processes. This gives a consistent design, includes built-in data quality checks for duplicates, naming conventions, and relationships, and ensures a solid, quality-assured foundation for the data platform.
In addition to the OOTB data quality functionality, Xpert BI incorporates testing as a natural part of data flows and data processing throughout the ELT/ETL process. Testing is not limited to a project's development phase; it also supports data quality and data logic testing in the production environment. The test module in Xpert BI has two parts: Test Automation and Unit Testing.
Xpert BI auto-generates tests in bulk using metadata stored in the metadata repository. This includes auto-generated tests for checking fact-dimension relations in a star schema or checking for duplicates in a set of tables or views.
This saves the developer time establishing tests and test code, but more importantly it enables and encourages tests to be written as a natural part of development. Instead of writing code to check relations for every combination of facts and dimensions, the developer can generate these checks automatically with a few clicks. These tests (which cover the entire data model/semantic model) can then run as part of the data load to ensure data quality in the end-user layer.
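To illustrate the idea, here is a small, hypothetical sketch of how fact-to-dimension relation tests could be generated from metadata and run against a database. It uses SQLite only to keep the example self-contained; Xpert BI's actual metadata format and generated SQL will differ:

```python
import sqlite3

def generate_relation_tests(relations):
    """Build one orphan-check query per (fact, fk, dim, pk) relation in the metadata."""
    return {
        f"{fact}->{dim}": (
            f"SELECT COUNT(*) FROM {fact} f "
            f"LEFT JOIN {dim} d ON f.{fk} = d.{pk} "
            f"WHERE d.{pk} IS NULL"
        )
        for fact, fk, dim, pk in relations
    }

def run_tests(conn, tests):
    """A relation test passes when no orphaned fact rows are found."""
    return {name: conn.execute(sql).fetchone()[0] == 0 for name, sql in tests.items()}
```

A fact row that references a dimension key with no matching dimension row makes the corresponding test fail, which is exactly the kind of fact-dimension check described above.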
Unit testing can involve almost any type of data quality test, including row count comparisons, logic checks, and metadata checks. Tests can be grouped into folders, and folder hierarchies, and can also include set-up and tear-down processes so that an entire test suite can be built and scheduled to run as a part of the data processing jobs for the data platform. The schedule can be on folder level or single test level.
The connection for the queries/tests can be set to any supported database connection, including MS Analysis Services. This means you can, for example, run source-to-destination row-count tests across all layers, including the data source itself.
You can also test against hard-coded values and use a range of other configurations.
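As a rough sketch of how such a suite could be organized, the following hypothetical example groups a row-count comparison with set-up and tear-down steps. It uses SQLite for self-containment; the real test module is configured through Xpert BI, not written in Python:

```python
import sqlite3

def row_count_test(conn, source_table, dest_table):
    """Pass when source and destination hold the same number of rows."""
    src = conn.execute(f"SELECT COUNT(*) FROM {source_table}").fetchone()[0]
    dst = conn.execute(f"SELECT COUNT(*) FROM {dest_table}").fetchone()[0]
    return src == dst

def run_suite(conn, setup, tests, teardown):
    """Run set-up statements, then each named test, then tear-down statements."""
    for sql in setup:
        conn.execute(sql)
    results = {name: fn(conn) for name, fn in tests.items()}
    for sql in teardown:
        conn.execute(sql)
    return results
```

Scheduling such a suite after each data load is what turns ad hoc design-time checks into continuous quality assurance for the production environment.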
With all these features easily available, we see that testing has become a natural part of the development life cycle, as it should be. However, many data and analytics solutions perform initial and ad hoc testing at design time but do not have an ongoing data quality test suite running continuously in the production environment. Continuous testing and regression testing enable operations teams to be more proactive in both error handling and change handling.
The latest release of Xpert BI extends support for DataOps. To learn more about the release make sure to watch our release webinar.