Automating Data Warehouse Development

Bilde av Terje Vatle
by Terje Vatle

17. Nov 2020, 14 minutes reading time

Automating Data Warehouse Development

Leading organizations today make active use of data to power their decisions and drive digital transformation. However, data resides in silos across the organization and across the ecosystem in which the organization operates. And data cannot simply be used for any purpose without up-front work. This blog looks at how to make that work as efficient as possible reflecting on insights from the 2020 Gartner paper Automating Data Warehouse Development where our software Xpert BI is referenced as a case study.

A rule of thumb in analytics and business intelligence is that the collection and preparation of meaningful data accounts for 80% of the work involved. A large part of data preparation is typically done in a data warehouse. Correspondingly, the work on the consumption side i.e. with analytics, dashboarding, reporting, AI & ML etc accounts for 20%. For the same reason, the largest data driven organizations globally, such as Google, Amazon, Facebook and Apple, have far more data engineers than data scientists. It is precisely the manual tasks with the facilitation of data that the data warehouse automation aims to minimize, targeting a reduction where it matters the most, namely on the 80% of the work.

 

Why data warehouse?

Key activities in making data available to your organization include collecting the data you need from any source or system, organizing it and giving it meaningful context ready for use. Meaningful data can be used across multiple business areas. Some of the typical areas are supporting decisions on operational, tactical and strategic levels in an organization, i.e. with perfomance KPIs, customer insights, sales, marketing, investment analysis, logistics optimizations and much more. It is often done by providing reports and dashboards, data as a service into business systems and as a basis for AI & ML. The place you collect and organize data is typically your data platform. And data that follows structures that would fit into tables, rows and columns, are typically managed in a data warehouse. A data warehouse is a specialized and constantly evolving approach to handling structured data. It is not replacing or being replaced by data lakes, data hubs or other design patterns.

A key challenge with developing a data warehouse has not only remained, but grown in scope; how can you efficiently build and maintain something that is constantly changing and growing in complexity?

Most organizations have had ways of working with structured data, and large tech companies, banks, insurance providers, telecom providers and others have had different flavors of data warehouses for decades. And the use cases these data warehouses cover grow in importance as each organization become more and more data driven. The advent of big data and cloud computing some 10+ years ago has not reduced the need for meaningful data, rather exploded the complexity. A key challenge has not only remained, but grown in scope; how can you efficiently build and maintain something that is constantly changing and growing in complexity?

 

Need for a more agile data warehouse

Looking at the past, many data warehouse implementations have been extremely expensive and struggled with proving a positive ROI. Also, many data warehouses have proven to be difficult to maintain over time and often not provided the business agility organizations really need. Take current day as example. The global COVID-pandemic has massively accelerated the digital journey for many organizations and increased the need for meaningful data beign available for decisions, anywhere the employees may be.

Traditional approaches to building and maintaining data warehouses are not suited for today’s rapid changes and growing complexity.

On the flip side, as digitalization accelerates, organizations who miss out on the opportunities residing in their data risk being disrupted or not meeting customer expectations. Some examples of key insights provided with data can be understanding how your organization performs, how the market is changing, how each customer thinks, how each service you provide can be tailored and how each business process can be improved. Customer expectations tend to slide across industries where leaders set the bar for others to follow. In addition, purely data driven organizations tend not to be confined to traditional industry segment thinking and can be the next big disruptor and competitor. In summary, traditional approaches to building and maintaining data warehouses are not suited for today’s rapid changes and growing complexity.

 

The case for data warehouse automation

Today, technology and practices have matured, and automation is no longer a question. For data warehouses, data warehouse automation is intended to handle the growing complexity.

According to Gartner, data warehouse automation can make each developer four times as efficient compared to not using automation.

In 2020, Gartner, a global research and advisory firm, published the white paper Automating Data Warehouse Development. According to Gartner, data warehouse automation, can on average make each data warehouse developer four times as efficient. So not only making at the initial construction of your data warehouse much more efficient, but also significantly reducing time-to-market for each new data-driven initiative and change. In this blog I will demystify automation, explain how a data warehouse automation software works and investigate some of the benefits for your organization.

 

How data warehouse automation works

Compared to ETL, ELT and other data integration tools which are generic in nature, data warehouse automation sets a narrower scope. It focuses on building and maintaining a data warehouse as efficiently as possible considering typical tasks and best practices involved to see what can be automated to simplify the work of the developer. 

The productivity gain of 400% with data warehouse automation is made possible by each developer working at a higher level of abstraction with a narrower scope. Data warehouse automation typically offers a low code approach and lets most changes be a question of metadata configurations rather than coding.

The developers work largely with the configuration of metadata that generates code, rather than writing the code itself. Since the framework is metadata driven, it helps the developer keep control of end-to-end dependencies which simplifies impact analyses, runtime optimizations, error detections and reduces risks of human errors. Generic tasks can be done using configuration. Customer-specific tasks such as implementing business logic can be drag and drop or coding depending on one’s preferences, skill level and the flexibility of the data warehouse automation tool. In addition, there is a set of built-in complex operations that are designed according to best practice so that the solution becomes more robust and the need for testing is significantly reduced. 

 

Below are typical tasks supported by a data warehouse automation tool:

  • Creating definitions for connection to data sources
  • Generating code to extract data and metadata from these data sources
  • Creating unique data definitions in the data warehouse
  • Establishing structures and logical levels in the data warehouse
  • Creating and customizing loading methodologies and templates that write the code for the data warehouse
  • Facilitating self-service of data, reports and analyzes
  • Managing changes such as upgrades of business systems, new data sources, change of reports etc., without compromising integrity and consistency
  • Optimizing the runtime of data load and data processing, especially important when a data warehouse solution grows over time and you want to work cost-effectively in a cloud
  • Letting users get their data updated when they need it, without double-loading data, important to avoid triggering extra costs in a cloud solution
  • Generating documentation of business concepts and data fields with traceability end to end
  • Facilitating data governance, inspection and control
  • Enabling DataOps with robust development and testing frameworks

 

Benefits of data warehouse automation

Gartner divides the benefits into two main areas; productivity and adaptability. In addition, we also see fostering a data-driven culture as an important contribution from data warehouse automation. Our view on a data driven culture is the combination of knowledge and behavior our that is necessary for the organization to actively use data for decisions.

 

Key benefit 1: Productivity

Increased productivity allows the organization to deliver business value faster, do more of the work itself and meeting user expectations.

  • Deliver more business value: Gartner states as each developer gains 400% productivity compared to not having automation. High priority requirements can be delivered faster or more business requirements can be implemented within a given amount of time thus providing more business value.
  • Do more yourself and keep key competency in-house: Increased productivity enables the organization to do more of the work itself and reduce needs for external assistance. This also reduces the risk of competence leaving the organization after the projects have been completed.
  • Meeting user expectations: As more requirements can be implemented faster, IT can reduce the backlog at a higher pace and get more satisfied end-users across the organization.

 

Key benefit 2: Adaptability

Increased adaptability enables the organization to seize new opportunities faster with reduced risk, scale up more easily and handle migrations.

  • Seize new business opportunities faster: As data warehouse automation makes it faster and less risky to implement changes, the organization can respond more quickly to new critical business requirements and changes in the competitive landscape.
  • Scale up and migrate more easily: Automation has built-in functionality to handle complex changes such as moving to more scalable solutions for storage and processing. For example, within the Microsoft Azure cloud ecosystem you may start with Azure SQL, add a data lake with Azure Data Lake Storage gen2 and then move heavy processing to scalable solutions such as Snowflake or Azure Synapse Analytics. You may also test and combine any of these solutions in sequence or parallel.
  • Handle migration to new ERP, CRM etc: As it can handle versioning of data sources, it can support your organization migrating to a new ERP-system, CRM-system, billing system, logistics system or any other complex system you may have. This significantly reduces risk and speeds up the process while making the data warehouse much more adaptable to changes.

 

Key benefit 3: Fostering a data-driven culture

Unleashing the full potential of your data in terms of productivity and adaptability will in most organizations depend on a cultural shift. Data warehouse automation can enable the shift from a technology perspective. 

  • Transparency: Data warehouse automation enables full transparency where users in any part of the organization can see data definitions, applied business rules and where data is coming from. Adding tags to data simplifies end to end data governance since for example data ownership and regulatory compliance down to each data field level can be resolved. An example is knowing exactly where all GDPR-sensitive data can be found and exactly which reports uses these data. It helps business users to better trust the data and insights they are provided. 
  • Self-service: Self-service is often difficult to achieve. For example, it requires the users to be able to easily find the data, analyses and reports they need in order to solve a business problem. Data warehouse automation can enable self service by providing an easy to use search interface for all data, reports, analyses and dashboards end to end.  
  • Always updated documentation: Traditionally, documentation was disconnected to the data warehouse and tended to be quickly outdated or requiring massive manual work to keep it updated. Data warehouse automation lets you combine user documentation with active metadata from code running in production, so it's always up to date. 

 

Summary

Most organizations today aim at becoming more data-driven to stay competitive and meet customer expectations. A centerpiece in that expanding and increasingly complex data-driven agenda, is the concept of a data warehouse. It typically handles the structured data of the organization for a wide range of analytical purposes. Within this scope, data warehouse automation becomes a necessity to reduce growing complexity.

Gartner states that data warehouse automation greatly improves the productivity and adaptability of your organization. In addition, we see data warehouse automation also fostering a data driven culture by enabling transparency, self-service and always updated documentation.

In comparison, ETL, ELT and other data integration tools typically do not deliver the level of automation you would need to build your data warehouse. Thus the choice going forward will be between building your own automation framework or buying an automation framework off the shelf. Key considerations could be robustness, flexibility and scalability. We will look into discussions about build vs buy in follow up blog entries.

Download free e-book: Building a data-driven organization

Resources:

 

Definitions

  • Data warehouse: A data warehouse (DW) is a storage architecture designed to hold data extracted from transaction systems, operational data stores and external sources. The warehouse then combines that data in an aggregate, summary form suitable for enterprisewide data analysis and reporting for predefined business needs. The five components of a DW are production data sources, data extraction and conversion, the DW database management system, DW administration, and business intelligence (BI) tools. (Gartner)
  • Data warehouse automation: Modern DWA tools can dramatically reduce the amount of work required to build, operate and maintain a DW. Data warehouse automation helps with change management — and it can enable DevOps, DataOps and other forms of agile DBMS development. Data warehouse automation can also extend beyond central functions of the data warehouse. It can also be used for a wide variety of structural data automation and for the building of data pipelines. (Gartner)
Terje Vatle

Terje Vatle

Regional Director, MSc, experienced advisor and speaker on scalable data platforms and architectures.

Follow our blog


Our dedicated employees write professional blogs worth reading.
Follow the blog for a sneak peek at the future!

Others would also read


Benefits of moving from SQL Server platform 2016 to 2019

Benefits of moving from SQL Server platform 2016 to 2019

Picture of Alexander Khorunzhiy

by Alexander Khorunzhiy

27. Nov 2020, 6 min reading time

Automating Data Warehouse Development

Automating Data Warehouse Development

Picture of Terje Vatle

by Terje Vatle

17. Nov 2020, 14 min reading time

Innovation & Development Grant To BI Builders

Innovation & Development Grant To BI Builders

Picture of BI Builders

by BI Builders

05. Nov 2020, 3 min reading time