17. Nov 2020, 14 minutes reading time
Leading organizations today make active use of data to power their decisions and drive digital transformation. However, data resides in silos across the organization and across the ecosystem in which the organization operates. And data cannot simply be used for any purpose without up-front work. This blog looks at how to make that work as efficient as possible reflecting on insights from the 2020 Gartner paper Automating Data Warehouse Development where our software Xpert BI is referenced as a case study.
A rule of thumb in analytics and business intelligence is that the collection and preparation of meaningful data accounts for 80% of the work involved. A large part of data preparation is typically done in a data warehouse. Correspondingly, the work on the consumption side i.e. with analytics, dashboarding, reporting, AI & ML etc accounts for 20%. For the same reason, the largest data driven organizations globally, such as Google, Amazon, Facebook and Apple, have far more data engineers than data scientists. It is precisely the manual tasks with the facilitation of data that the data warehouse automation aims to minimize, targeting a reduction where it matters the most, namely on the 80% of the work.
Key activities in making data available to your organization include collecting the data you need from any source or system, organizing it and giving it meaningful context ready for use. Meaningful data can be used across multiple business areas. Some of the typical areas are supporting decisions on operational, tactical and strategic levels in an organization, i.e. with perfomance KPIs, customer insights, sales, marketing, investment analysis, logistics optimizations and much more. It is often done by providing reports and dashboards, data as a service into business systems and as a basis for AI & ML. The place you collect and organize data is typically your data platform. And data that follows structures that would fit into tables, rows and columns, are typically managed in a data warehouse. A data warehouse is a specialized and constantly evolving approach to handling structured data. It is not replacing or being replaced by data lakes, data hubs or other design patterns.
A key challenge with developing a data warehouse has not only remained, but grown in scope; how can you efficiently build and maintain something that is constantly changing and growing in complexity?
Most organizations have had ways of working with structured data, and large tech companies, banks, insurance providers, telecom providers and others have had different flavors of data warehouses for decades. And the use cases these data warehouses cover grow in importance as each organization become more and more data driven. The advent of big data and cloud computing some 10+ years ago has not reduced the need for meaningful data, rather exploded the complexity. A key challenge has not only remained, but grown in scope; how can you efficiently build and maintain something that is constantly changing and growing in complexity?
Looking at the past, many data warehouse implementations have been extremely expensive and struggled with proving a positive ROI. Also, many data warehouses have proven to be difficult to maintain over time and often not provided the business agility organizations really need. Take current day as example. The global COVID-pandemic has massively accelerated the digital journey for many organizations and increased the need for meaningful data beign available for decisions, anywhere the employees may be.
Traditional approaches to building and maintaining data warehouses are not suited for today’s rapid changes and growing complexity.
On the flip side, as digitalization accelerates, organizations who miss out on the opportunities residing in their data risk being disrupted or not meeting customer expectations. Some examples of key insights provided with data can be understanding how your organization performs, how the market is changing, how each customer thinks, how each service you provide can be tailored and how each business process can be improved. Customer expectations tend to slide across industries where leaders set the bar for others to follow. In addition, purely data driven organizations tend not to be confined to traditional industry segment thinking and can be the next big disruptor and competitor. In summary, traditional approaches to building and maintaining data warehouses are not suited for today’s rapid changes and growing complexity.
Today, technology and practices have matured, and automation is no longer a question. For data warehouses, data warehouse automation is intended to handle the growing complexity.
According to Gartner, data warehouse automation can make each developer four times as efficient compared to not using automation.
In 2020, Gartner, a global research and advisory firm, published the white paper Automating Data Warehouse Development. According to Gartner, data warehouse automation, can on average make each data warehouse developer four times as efficient. So not only making at the initial construction of your data warehouse much more efficient, but also significantly reducing time-to-market for each new data-driven initiative and change. In this blog I will demystify automation, explain how a data warehouse automation software works and investigate some of the benefits for your organization.
Compared to ETL, ELT and other data integration tools which are generic in nature, data warehouse automation sets a narrower scope. It focuses on building and maintaining a data warehouse as efficiently as possible considering typical tasks and best practices involved to see what can be automated to simplify the work of the developer.
The productivity gain of 400% with data warehouse automation is made possible by each developer working at a higher level of abstraction with a narrower scope. Data warehouse automation typically offers a low code approach and lets most changes be a question of metadata configurations rather than coding.
The developers work largely with the configuration of metadata that generates code, rather than writing the code itself. Since the framework is metadata driven, it helps the developer keep control of end-to-end dependencies which simplifies impact analyses, runtime optimizations, error detections and reduces risks of human errors. Generic tasks can be done using configuration. Customer-specific tasks such as implementing business logic can be drag and drop or coding depending on one’s preferences, skill level and the flexibility of the data warehouse automation tool. In addition, there is a set of built-in complex operations that are designed according to best practice so that the solution becomes more robust and the need for testing is significantly reduced.
Below are typical tasks supported by a data warehouse automation tool:
Gartner divides the benefits into two main areas; productivity and adaptability. In addition, we also see fostering a data-driven culture as an important contribution from data warehouse automation. Our view on a data driven culture is the combination of knowledge and behavior our that is necessary for the organization to actively use data for decisions.
Increased productivity allows the organization to deliver business value faster, do more of the work itself and meeting user expectations.
Increased adaptability enables the organization to seize new opportunities faster with reduced risk, scale up more easily and handle migrations.
Unleashing the full potential of your data in terms of productivity and adaptability will in most organizations depend on a cultural shift. Data warehouse automation can enable the shift from a technology perspective.
Most organizations today aim at becoming more data-driven to stay competitive and meet customer expectations. A centerpiece in that expanding and increasingly complex data-driven agenda, is the concept of a data warehouse. It typically handles the structured data of the organization for a wide range of analytical purposes. Within this scope, data warehouse automation becomes a necessity to reduce growing complexity.
Gartner states that data warehouse automation greatly improves the productivity and adaptability of your organization. In addition, we see data warehouse automation also fostering a data driven culture by enabling transparency, self-service and always updated documentation.
In comparison, ETL, ELT and other data integration tools typically do not deliver the level of automation you would need to build your data warehouse. Thus the choice going forward will be between building your own automation framework or buying an automation framework off the shelf. Key considerations could be robustness, flexibility and scalability. We will look into discussions about build vs buy in follow up blog entries.
Regional Director, MSc, experienced advisor and speaker on scalable data platforms and architectures.
Our dedicated employees write professional blogs worth reading.
Follow the blog for a sneak peek at the future!
by Alexander Khorunzhiy
27. Nov 2020, 6 min reading time