Why DataOps with Data Automation?

by Terje Vatle

22. Apr 2021, 10 minutes reading time

DataOps is an approach, inspired by DevOps, Agile Development and Lean Manufacturing, that looks at the whole data lifecycle to maximize the value of data to an organization. It aims to align people around a shared focus on increasing agility, reducing data delivery cycle times and minimizing errors, so that each data delivery gives the business user confidence. It is about reacting faster to changes while minimizing risks and safeguarding the integrity of the organization’s data estate.

Today, DevOps is a well-established approach to software development that accelerates the build lifecycle using automation. By leveraging on-demand IT resources and automating the integration, testing, and deployment of code, leading companies such as Google, Amazon, Facebook, and Apple have massively improved quality and shortened software release cycles. How can these concepts be applied to the data analytics value chain to adapt quickly to changes while at the same time ensuring quality and system integrity and reducing operational risks?

According to McKinsey, the insights value chain combines technical and business foundations. It starts with identifying, accessing, and storing the necessary data, continues with analyzing and visualizing that data and having the necessary infrastructure in place, and finally requires the people and processes to implement insights and decisions. Because the chain is multiplicative across data, analytics, IT, people, and processes, you are only as good as its weakest link. So, for DevOps to be effective for data analytics, the people aspects found in Agile Development, such as collaboration and innovation, must be added.

As opposed to sequential waterfall project management, with long development cycles and “big bang” deliverables at the end, agile development delivers new or updated analytics in short increments. These short increments are well suited to data analytics, where business requirements tend to change quickly and the time to business value must be shortened. However, value is the result of a value chain, as in a factory, so we must consider not only development and deployment but also operations.

The operational perspective of data analytics considers the data pipeline from beginning to end, where data passes through a series of steps before it becomes available to people and processes. For example, raw data and metadata from an ERP system are extracted, loaded into a data platform, transformed and quality assured, put into business context, and made consumable as a dashboard for a specific decision-making process. The operational process must manage quality, efficiency, constraints, and uptime. These principles are found in lean manufacturing, e.g., in statistical process control, which monitors the pipeline and warns about any deviations outside acceptable ranges.
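The statistical process control idea can be illustrated with a minimal control-chart check in Python. It is a sketch under stated assumptions: the monitored metric (a hypothetical daily row count for an ERP orders table), the sample numbers and the three-sigma threshold are all illustrative, and not taken from any specific product.

```python
from statistics import mean, stdev


def control_limits(history, k=3.0):
    """Return (lower, upper) control limits as mean +/- k standard deviations."""
    if len(history) < 2:
        raise ValueError("need at least two observations to estimate spread")
    mu = mean(history)
    sigma = stdev(history)
    return mu - k * sigma, mu + k * sigma


def check_metric(name, history, observed, k=3.0):
    """Flag a new observation as out of control if it falls outside the limits."""
    lower, upper = control_limits(history, k)
    return {
        "metric": name,
        "observed": observed,
        "lower": lower,
        "upper": upper,
        "in_control": lower <= observed <= upper,
    }


# Hypothetical daily row counts for an ERP orders table (illustrative data only)
history = [10_120, 10_340, 9_980, 10_210, 10_450, 10_090, 10_300]

print(check_metric("erp_orders.row_count", history, 10_260))  # a normal load
print(check_metric("erp_orders.row_count", history, 4_870))   # a suspiciously small load
```

In a real pipeline, a check like this would run after each load step and raise a warning instead of printing; the choice of k trades off false alarms against missed deviations.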

DataOps is therefore inspired by the following three areas:

  1. DevOps for accelerating build lifecycle using automation
  2. Agile Development for adding the people aspects to DevOps in the data value chain
  3. Lean Manufacturing with statistical process control to monitor the health of the data pipeline

 

Applying DataOps with Data Automation 

According to the 2021 Gartner paper “Assessing the Capabilities of Data Warehouse Automation (DWA)”, where our software Xpert BI is used as a case study, data warehouse automation can simplify the implementation of DataOps. Data warehouse automation, or simply Data Automation to indicate a scope wider than just a data warehouse, gives the organization a tool for aligning its focus on automating and accelerating the delivery of quality-assured data for decision-making processes, while maintaining the integrity of the data pipeline.

Looking at typical DataOps features, we can list the additional features that Data Automation brings.

Figure: DataOps and Automation

An example of increased flexibility is the opportunity to quickly move your data platform to the cloud without having to rewrite all the code. And if requirements change in the future, you can move your data platform to a different storage and processing technology, for example Azure SQL Database, Azure Data Lake Storage Gen2, Azure Synapse Analytics, or Snowflake, providing more business agility.
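One way such portability can work is to keep the data model as platform-neutral metadata and generate platform-specific code from it. The Python sketch below assumes a hypothetical logical type system and renders the same table definition for two of the targets mentioned above; the `TYPE_MAP`, the logical type names and the `render_create_table` helper are illustrative assumptions, not Xpert BI's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    logical_type: str  # hypothetical logical types: "string", "integer", "decimal", "timestamp"
    length: int = 0
    precision: int = 18
    scale: int = 2


# Hypothetical mapping from logical types to physical types per target platform
TYPE_MAP = {
    "azure_sql": {
        "string": lambda c: f"NVARCHAR({c.length})",
        "integer": lambda c: "INT",
        "decimal": lambda c: f"DECIMAL({c.precision},{c.scale})",
        "timestamp": lambda c: "DATETIME2",
    },
    "snowflake": {
        "string": lambda c: f"VARCHAR({c.length})",
        "integer": lambda c: "NUMBER(38,0)",
        "decimal": lambda c: f"NUMBER({c.precision},{c.scale})",
        "timestamp": lambda c: "TIMESTAMP_NTZ",
    },
}


def render_create_table(table, columns, target):
    """Generate CREATE TABLE DDL for one target from platform-neutral metadata."""
    mapping = TYPE_MAP[target]  # raises KeyError for an unsupported target
    col_sql = ",\n".join(
        f"    {c.name} {mapping[c.logical_type](c)}" for c in columns
    )
    return f"CREATE TABLE {table} (\n{col_sql}\n);"


columns = [
    Column("customer_id", "integer"),
    Column("customer_name", "string", length=50),
    Column("credit_limit", "decimal", precision=12, scale=2),
    Column("updated_at", "timestamp"),
]

print(render_create_table("dw.dim_customer", columns, "azure_sql"))
print(render_create_table("dw.dim_customer", columns, "snowflake"))
```

Because the model lives in metadata, switching platforms means adding or changing a type mapping rather than rewriting every table by hand.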

A more comprehensive look at how Data Automation enables and extends DataOps:

  1. Accelerating build lifecycle for data assets
    • Enabling reuse of code, best practices, methodologies, data structures, architectures etc. whenever possible
    • Enabling patterns for leveraging different types of scalable databases, data lakes and massively parallel processing architectures
    • Enforcing standardization of development practices to improve development speed and make change management more efficient
    • Significantly reducing the time needed for manual and repetitive tasks, for example by configuring ingestion from a complete data source as one operation rather than with detailed point-and-click for each table
    • Efficient impact analysis and change management end-to-end
    • Preserving data source metadata from complex sources such as ERP systems to quickly re-engineer a data model from the source with usable keys and understandable table and column names
  2. Adding the people aspects with alignment and shorter delivery increments
    • Supporting the work of data stewards and data owners with built-in data governance functionality such as data catalogs, data lineage, live documentation, tagging of data and compliance reports
    • Enforcing data quality with test automation; and because the automation is delivered as software rather than as a custom framework, the framework itself doesn’t have to be tested with each release
    • Automating deployment and migration for cloud and hybrid data platform environments to quickly test and move new deliveries into production
    • Additionally, in Xpert BI: Enabling data owners and data engineers with an interface to document the meaning and ownership of data to ensure correct use further down the data pipeline
    • Enabling data lineage from sources, across different storage technologies and all the way up to where analytics is delivered
  3. Monitoring the health of the data pipeline
    • Adding statistical measures along the data pipeline to quickly warn if any statistics are outside acceptable ranges
    • Automatic detection of changes in data sources
    • Additionally, in Xpert BI: Built-in test management enables organizations to monitor deviations in data content for each data asset, such as data growth, specific data values and data integrity
    • Having end-to-end metadata enables proactive error detection and comprehensive performance optimization at design time, before any code is executed
    • Enabling estimation of costs of a design decision before runtime to comply with budget constraints for a cloud data platform
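Two of the capabilities above, automatic detection of source changes and end-to-end impact analysis via lineage, can be combined in a small sketch. It assumes that column snapshots of a source table are stored as `{column: type}` dictionaries and that lineage is available as a simple adjacency mapping; the asset names, snapshots and helper functions are hypothetical and only illustrate the technique (a set difference for the schema diff and a breadth-first graph traversal for the impact analysis).

```python
from collections import deque


def diff_schema(previous, current):
    """Compare two {column: type} snapshots of the same source table."""
    added = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    changed = sorted(
        c for c in set(previous) & set(current) if previous[c] != current[c]
    )
    return {"added": added, "removed": removed, "changed": changed}


def downstream(lineage, start):
    """Breadth-first traversal of lineage edges to find every dependent asset."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:  # guards against revisiting shared or cyclic paths
                seen.add(child)
                queue.append(child)
    return sorted(seen)


# Hypothetical lineage: source -> staging -> warehouse -> dashboards
lineage = {
    "erp.orders": ["staging.orders"],
    "staging.orders": ["dw.fact_sales"],
    "dw.fact_sales": ["dashboard.sales_overview", "dw.agg_monthly_sales"],
    "dw.agg_monthly_sales": ["dashboard.finance_kpis"],
}

# Hypothetical schema snapshots taken before and after a source system upgrade
previous = {"order_id": "int", "amount": "decimal(18,2)", "region": "varchar(20)"}
current = {"order_id": "int", "amount": "decimal(19,4)", "customer_id": "int"}

changes = diff_schema(previous, current)
if any(changes.values()):
    print("Source change detected:", changes)
    print("Impacted assets:", downstream(lineage, "erp.orders"))
```

The point of the sketch is that once metadata and lineage are captured, detecting a change and listing everything it affects becomes a cheap, automatic step rather than a manual investigation.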

 

Data Automation aims to improve time to value across the complete data lifecycle. It covers key processes in both designing and maintaining a data platform, such as planning, analysis, design, development, orchestration, testing, deployment, management, operations, change management and documentation.

The concept is to identify tasks within those processes that are either repetitive or error-prone when coded by hand, and then to turn those tasks into manageable configuration. Automating repetitive or time-consuming tasks frees up critical resources to focus on high-impact initiatives in the organization.
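To make "a matter of manageable configuration" concrete, here is a minimal Python sketch in which one source-level configuration expands into load statements for every table, instead of each load being hand-written. The `SourceConfig` structure, the naming convention for target tables and the watermark-based incremental pattern are all hypothetical choices for illustration; the incremental statement also assumes the target table already holds an initial full load.

```python
from dataclasses import dataclass, field


@dataclass
class SourceConfig:
    source_name: str
    target_schema: str
    tables: list
    exclude: set = field(default_factory=set)
    # Hypothetical: table -> column used as a high-water mark for incremental loads
    incremental_column: dict = field(default_factory=dict)


def generate_load_statements(cfg):
    """Expand one source configuration into one SQL load statement per table."""
    statements = []
    for table in cfg.tables:
        if table in cfg.exclude:
            continue
        target = f"{cfg.target_schema}.{cfg.source_name}_{table}"
        source = f"{cfg.source_name}.{table}"
        watermark = cfg.incremental_column.get(table)
        if watermark:
            # Incremental load; assumes an initial full load already exists in the target
            sql = (
                f"INSERT INTO {target} SELECT * FROM {source} "
                f"WHERE {watermark} > (SELECT MAX({watermark}) FROM {target});"
            )
        else:
            # Full reload for tables without a watermark column
            sql = f"TRUNCATE TABLE {target}; INSERT INTO {target} SELECT * FROM {source};"
        statements.append(sql)
    return statements


cfg = SourceConfig(
    source_name="erp",
    target_schema="staging",
    tables=["orders", "customers", "audit_log"],
    exclude={"audit_log"},
    incremental_column={"orders": "modified_at"},
)

for statement in generate_load_statements(cfg):
    print(statement)
```

Adding a new table to the source then means changing one list entry in the configuration, and the generated code follows the same pattern everywhere.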

Gartner predicts that by 2023, 70% of organizations will use value stream management to improve flow in the DevOps [or rather DataOps] pipeline, leading to faster delivery of customer value.

 

Our experience

With today’s many digitalization initiatives, cloud migrations, new business use cases and growing focus on analytics, AI and ML, there is an increased need for governed and quality-assured data delivered quickly. However, the growing complexity of hybrid environments and the increasing volume, velocity and variety of data make this difficult for organizations to manage with a limited set of resources. DataOps combined with Data Automation is the next evolutionary step to handle these challenges.

In our experience, at least 80% of the business value of a modern data platform is delivered over time through changes, rather than from the original requirements. Adopting DataOps combined with Data Automation is key to rapidly detecting, collaborating on and managing changes efficiently while still ensuring integrity and minimizing operational risks.

Our software Xpert BI helps organizations adopt DataOps and Data Automation by accelerating the build lifecycle for data assets, adding the people dimension, enabling short delivery increments, and safeguarding the operations of your data pipeline. We enable data specialists to focus on solving business challenges together rather than being slowed down by manual and repetitive work. 

 

Selected sources for further reading:

Terje Vatle

Terje Vatle is Chief Technology Officer at BI Builders. With a background in data & analytics advisory and development, Terje focuses on how to make organizations more data-driven and on the journey towards a modern data platform in the cloud. Terje has a passion for skiing, traveling and international politics.
