<img alt="" src="https://secure.dump4barn.com/213374.png" style="display:none;">

Chapter 1

What is data automation?

Data automation (a.k.a. Data Warehouse Automation) is about applying decades of experience to improve how we build and maintain modern data platforms.

It provides a toolset to boost productivity, agility, and robustness while minimizing risks. It can be delivered as software or built custom in-house.

It encompasses all the core processes required for data life cycle management, including planning, analysis, design, development, orchestration, testing, deployment, management, operations, change management and documentation. It is typically used to enable end-to-end DataOps and data governance, and it leverages technology to automate the entire data management process across data lakes, data warehouses, data vaults and data marts.

Read more in our blogs: The human aspect of data automation and Data Warehouse Automation.

Chapter 2

Why data automation?

Key reasons: 

  1. Be more productive with limited resources: According to the Gartner paper Automating Data Warehouse Development, data automation delivered as software gives a 4x productivity boost over conventional data integration tools.
  2. Change faster: Working at a higher level of abstraction, with built-in change management, source control, documentation, optimizations, the ability to swap storage technology without rewriting code, and end-to-end data lineage, enables your data team to implement changes much faster.
  3. Build robust solutions for the long term: Data automation is developed with robustness and uptime in mind, and is thoroughly tested and in production at a vast number of organizations.
  4. Minimize risks: Xpert BI is designed to mitigate the many risks that have historically affected data warehouse, data lake and data platform projects. For example, standardization and automatic documentation minimize key personnel risks, and test automation minimizes data quality risks.
  5. Enable data governance: Data governance is about how people and applications interact with your data, and describes data ownership, data quality, data lineage and more. Despite being mainly an organizational task, it is highly time-consuming and hard to do unless properly supported by an automation tool.   
  6. Optimize cloud price performance: In the cloud, performance is close to unlimited, but sometimes at a steep price. Automation lets you continuously optimize price performance by swapping, adding or removing storage technologies without rewriting code.
  7. More time for analytics, AI and ML: There are several repetitive, manual steps in data integration, documentation, optimization etc. that can be automated. This frees up massive amounts of time that can be spent on value creation with analytics, AI and ML.

Most data integration tools lack support for data automation and data governance. As a result, projects without automation take much longer, deliver less ROI, involve more risks and fail to meet user expectations.

Read more about how automation improves your day as a data engineer or business analyst.

Chapter 3

How does data automation work?

Where possible, you configure metadata that generates code, rather than having to write all the code yourself. Since the framework is metadata-driven, it helps you keep control of end-to-end dependencies, which simplifies impact analysis, runtime optimization and error detection, and reduces the risk of human error.
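To make this concrete, here is a minimal sketch of the idea in Python, assuming a simple, hypothetical metadata model (not the actual model of any specific tool): a table definition is stored as metadata, and the loading code is generated from it instead of being written by hand.

```python
# Minimal sketch of metadata-driven code generation.
# The metadata fields and the generated SQL are illustrative assumptions.

TABLE_METADATA = {
    "name": "customer",
    "source": "crm.customers",
    "columns": ["customer_id", "name", "country", "updated_at"],
    "load_type": "incremental",        # "full" or "incremental"
    "watermark_column": "updated_at",
}

def generate_load_sql(meta: dict) -> str:
    """Generate a staging-load statement from table metadata instead of hand-writing it."""
    cols = ", ".join(meta["columns"])
    sql = (f"INSERT INTO stg_{meta['name']} ({cols})\n"
           f"SELECT {cols} FROM {meta['source']}")
    if meta["load_type"] == "incremental":
        sql += (f"\nWHERE {meta['watermark_column']} > "
                f"(SELECT MAX({meta['watermark_column']}) FROM stg_{meta['name']})")
    return sql + ";"

print(generate_load_sql(TABLE_METADATA))
```

Because the dependency between the staging table and its source is captured in metadata rather than buried in hand-written scripts, the same metadata can be reused for lineage, impact analysis and documentation.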

No-code, low-code and full-code: Customer-specific tasks such as implementing business logic can be done with drag and drop or with code, depending on one’s preferences, skill level and the flexibility of the data warehouse automation tool.

In addition, there is a set of built-in complex operations that are designed according to best practice so that the solution becomes more robust and the need for testing is significantly reduced. 

Read more about how data automation works in our blog Data Warehouse Automation.

Note that there are different maturity levels of automation. Most tools for data integration, ETL and ELT only offer repetition, not high-level automation. True automation augments the data team and gives guidance according to established best practices.

 

Chapter 4

When to use data automation?

Typical use cases include: 

  • Reduce time to market: Automation helps each data engineer to be more productive so more trusted data can be delivered faster. 
  • Few data engineers available: Get 4x productivity out of each data engineer. 
  • Being able to deliver projects: The threshold for successfully implementing a project is lowered so more projects get completed.   
  • Cost control: Optimize price performance in the cloud by swapping storage and processing without rewriting code.
  • Documentation and self-service: Documentation reflects end-to-end code in production, including all assets inside analytics tools, making it easier for end users to trust, understand and use data and insights.
  • Future flexibility: Avoid having to make all architectural decisions on day one. 
  • Data Quality: Implement unit testing and run complete tests as part of your DataOps or CI/CD methodologies (see the sketch after this list).
  • Reduce the need for testing: Only test your business rules and data, not your automation framework. 
  • Data Governance: With end-to-end data lineage and the ability to tag data with ownership, domains, regulatory requirements etc., you can govern how data should be used, how it is accessed and by whom.
  • Reduce IT backlog: Help IT deliver data faster and be an accelerator and not a bottleneck.
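As an illustration of the data quality point above, here is a minimal sketch in Python of automated checks that could run in a DataOps or CI/CD pipeline. The rules, column names and sample rows are hypothetical.

```python
# Minimal sketch of automated data quality checks for a DataOps/CI-CD pipeline.
# Rules, column names and sample data are illustrative assumptions.

def check_not_null(rows: list[dict], column: str) -> list[dict]:
    """Return rows that violate a NOT NULL rule on the given column."""
    return [r for r in rows if r.get(column) is None]

def check_unique(rows: list[dict], column: str) -> list[dict]:
    """Return rows whose value in the given column occurs more than once."""
    seen, duplicates = set(), []
    for r in rows:
        if r[column] in seen:
            duplicates.append(r)
        seen.add(r[column])
    return duplicates

rows = [
    {"customer_id": 1, "name": "Acme"},
    {"customer_id": 2, "name": None},       # violates NOT NULL on name
    {"customer_id": 2, "name": "Globex"},   # violates uniqueness on customer_id
]

checks = {
    "not_null(name)": check_not_null(rows, "name"),
    "unique(customer_id)": check_unique(rows, "customer_id"),
}
for rule, violations in checks.items():
    print(f"{'FAIL' if violations else 'PASS'}: {rule} ({len(violations)} violations)")
```

In a real pipeline, any failing check would block the deployment; with automation, such checks can be generated and scheduled rather than hand-written per table.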

More technical use cases include: 

  • Providing data to a chatbot
  • Automating data ingest from external APIs
  • Providing data as a service
  • Getting data from SAP into your data platform

See more about specific use cases here: How to streamline APIs with Xpert BI, and Use cases for data warehouse automation functionality.

Chapter 5

Data automation maturity levels

Data platform development has evolved over the years from ad hoc data delivery to using automation software. There are 4 levels of data automation maturity:

4 data automation maturity levels

  1. Ad Hoc Data Delivery
    • Manual approach with data in silos rather than in a centralized data storage. 
    • Challenges include lack of data availability, low data quality, difficulties with combining data across different areas of the organization and limited performance.
    • Organizations may recognize the strategic value of their data but lack the ability to transform; they may be running large risks in terms of non-compliance and potential disruption.
       
  2. Structured Manual Process
    • Still a manual approach without using any automation but bridging the data silos with a central data warehouse or data storage.
    • Challenges include the inability to scale performance and productivity, lack of standardized structures, a solution that is hard to maintain, and unnecessary dependencies on key personnel.
    • Organizations appreciating the benefits of automation have typically experienced the challenges of this level. 

  3. Automation Framework
    • Developers accelerate their productivity by using an in-house developed automation framework. It can take the form of code templates, code generation scripts, best practice development methodologies and testing frameworks.
    • A challenge is limited maintainability: the solution and the automation framework become too tightly coupled, so a change in either one requires the other to be re-tested, significantly decreasing agility over time.
    • Many frameworks do not have dedicated developers who actively maintain them, so the framework can become outdated and eventually unsupported.
    • Another challenge is increased dependency on key personnel, as the framework is custom-built for each organization.
    • Developing and customizing a framework takes development time away from solving requirements, significantly increasing time to market.
    • Lastly, the overall solution complexity grows significantly without adding much business value, introducing unnecessary technical debt. 

  4. Automation Software
    • As with an automation framework, developers accelerate their productivity, but the automation is delivered as software, not a custom framework.
    • Having a solution as software means there are dedicated developers who fix bugs, and there are frequent releases of fixes and new features.
    • There is also a community where you get answers, there are certified developers who can implement it, there are other customers using, testing and improving the same software.
    • There is documentation and training, new storage and compute technologies are covered shortly after they arrive, there is a support channel and a roadmap ensuring the solution is never outdated. 

However, if you are pioneering new data platforms or technologies, or your business requirements are unique, you may find yourself without an automation software available. In those cases, building an automation framework may be the only viable approach to improve productivity.

As automation software eventually catches up and becomes available, you may want to migrate to it at a later point in time to gain added robustness and agility, as well as reduced maintenance costs.

Chapter 6

Data automation maturity level technology examples

Technology examples for each level

  1. Ad Hoc Data Delivery: Power BI, Qlik, Tableau, Looker, Excel etc. used directly on top of a data source or system. May work at small scale and for basic prototyping.

  2. Structured Manual Process: Matillion, Fivetran, Azure Data Factory, DBT, Jupyter notebooks, Visual Studio, Google Colab, and Apache Zeppelin, with code or pipelines designed for each business case. Requires separate automation to be built on top for optimizations, migration, productivity etc. Note that a GUI and low-code/no-code alone do not equal high productivity, agility, robustness or governance support.

  3. Automation Framework: Used in larger or more mature data environments, typically with high expectations on productivity but without an automation software available. A typical example is creating an automation framework using configuration tables and variables/parameters in the jobs for Azure Data Factory to avoid having to manually create all the data pipelines (see the sketch after this list). Developers may also combine this with their own toolkit, approach, or standards for how to improve productivity.

  4. Automation Software: Investing in a data warehouse automation tool to both increase productivity and stability, as well as simplify the data warehouse implementation process. Xpert BI from BI Builders is an example of automation software for Azure that lets developers use automation for Azure Data Factory without having to develop their own automation framework.
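To illustrate the configuration-table approach mentioned under Automation Framework, here is a minimal sketch in Python. The configuration entries and parameter names are illustrative assumptions; in practice the parameter sets would be passed to one generic, parameterized pipeline (for example in Azure Data Factory) rather than printed.

```python
# Minimal sketch of a configuration-table-driven approach:
# one generic, parameterized copy pipeline instead of one hand-built pipeline per table.
# Config rows and parameter names are illustrative assumptions.

PIPELINE_CONFIG = [
    {"source_system": "erp", "table": "orders",    "schedule": "hourly"},
    {"source_system": "erp", "table": "customers", "schedule": "daily"},
    {"source_system": "crm", "table": "contacts",  "schedule": "daily"},
]

def build_run_parameters(config: list[dict], schedule: str) -> list[dict]:
    """Turn config rows into parameter sets for the generic copy pipeline."""
    return [
        {
            "sourcePath": f"{c['source_system']}/{c['table']}",
            "targetTable": f"stg_{c['source_system']}_{c['table']}",
        }
        for c in config
        if c["schedule"] == schedule
    ]

for params in build_run_parameters(PIPELINE_CONFIG, "daily"):
    print("Trigger the generic copy pipeline with:", params)
```

Adding a new table then becomes a matter of adding a configuration row rather than building and testing a new pipeline; the trade-off, as described above, is that you now own and must maintain the framework yourself.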

Chapter 7

Common pitfalls

In our experience the success of a data platform project is not decided by technology or functionality alone. Below are some of the common pitfalls. 

Lack of wide user adoption

Very often a data warehouse solution is run by a few data enthusiasts within an organization, without appreciating the effort required to get user acceptance across the range of roles that exist. Though solution ‘champions’ and ‘sponsors’ are key to success, it is also a common pitfall to rely too much on these few employees and not research the wider needs and requirements of the entire organization.

Lack of budget for continuous improvement

Take into account that data sources, requirements and data quality change over time, and that time may come soon. Having technologies that can handle change is important, but the change still needs to be performed and tested to ensure the endurance of the data quality in the solution. The resource budget should therefore continue after the implementation is ‘finished’ to account for this. Over time, usually from 12 months onwards, there will also be a need for budget to remove technical debt or sunset ideas and solutions that were built earlier but later changed and improved.

Unclear Master data or data integration principles

As soon as the data warehouse solution combines data from more than one data source, integration rules must be in place. This might not require a master data management system or another big project, but it will require some standards and principles to maintain data quality. Often a term used in one department has a slightly different meaning in another, or the list of customers or projects is not completely synchronized across all systems. However big or small the solution, organizations should appoint a resource who can specify and verify that the integration principles are being followed.
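As a minimal illustration of one such integration principle, the sketch below (in Python, with hypothetical system names and keys) maps system-specific customer identifiers to a shared master key so that data from several sources can be combined consistently.

```python
# Minimal sketch of a cross-system key mapping, one simple integration principle.
# System names, local ids and master keys are illustrative assumptions.

CUSTOMER_KEY_MAP = {
    ("crm", "C-1001"): "CUST-1",
    ("erp", "40017"):  "CUST-1",   # same real-world customer as crm C-1001
    ("crm", "C-1002"): "CUST-2",
}

def to_master_key(system: str, local_id: str) -> str | None:
    """Resolve a system-specific id to the shared master key, if one has been defined."""
    return CUSTOMER_KEY_MAP.get((system, local_id))

print(to_master_key("erp", "40017"))   # CUST-1
print(to_master_key("crm", "C-9999"))  # None: unmapped, flag for data stewardship follow-up
```

The mapping itself is trivial; the organizational work is deciding who owns it and keeping it up to date, which is exactly why a responsible resource should be appointed.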

Lack of consistent technology choices

Another common pitfall is choosing technology without determining the actual business need first. Although IT is becoming a heavily integrated part of the business, it is most often not the actual business, but should be a supporting enabler. Keep in mind that the solution must be able to co-exist in the current environment and be compatible with current solutions. The skills and competencies available both inside the organization and externally can be another factor. The most common pitfall when it comes to technology is failing to evaluate the following:

  • Are people in IT and line-of-business aligned around common goals?
  • Will the technology choices enable the needed agility and productivity?
  • Are the technologies scalable, not only in terms of data volumes, but also supporting bigger teams collaborating across disciplines? 
  • Is there sufficient competency available in the market at an affordable cost?
  • Is automation with standardization and best-practices available?
  • Is this technology compatible and interoperable with other technologies we have or plan to purchase?
  • Does it support our current security requirements?
  • Does it support regulatory compliance requirements?
  • What is the TCO (Total Cost of Ownership) or the lifetime cost of the technology?

Read more about key risks with data and analytics and how to minimize them here.

Chapter 8

Data automation, AI & ML, ChatGPT and OpenAI

Today, trusted data delivered at speed is not only required for data warehousing, but also for analytics sandboxes, data lakes, data lakehouses, MPP-architectures and data science environments, machine learning and AI initiatives. 

Typical use cases: 

  • Data ingest: Working on a higher abstraction level to quickly add data and metadata into a data platform.
  • Training generative AI: Data governance and metadata will be important for training responsible generative AI relevant to your business context. Examples include OpenAI, OpenAI DALL-E, and ChatGPT. Data automation helps manage metadata and enables data governance.
  • APIs: Getting data from APIs or sharing data via APIs. Examples include creating pass-through APIs on top of any system or database to make data available, or quickly creating APIs for application prototyping.
  • DataOps: Manage data integration code with version control, test automation and more.
  • Manage data and metadata: Manage training data and test data, preserve metadata where possible. 
  • Data lake structures: Automate folder structures in a data lake (see the sketch below).
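As a minimal sketch of the data lake structures point, the example below (in Python) builds a consistent, date-partitioned folder path for every ingest. The zone names and path convention are illustrative assumptions.

```python
# Minimal sketch of automating a consistent, date-partitioned data lake folder layout.
# Zone names and the path convention are illustrative assumptions.

from datetime import date

def lake_path(zone: str, source: str, dataset: str, load_date: date) -> str:
    """Build the folder path for one dataset load, so every ingest follows the same convention."""
    return (f"{zone}/{source}/{dataset}/"
            f"year={load_date.year}/month={load_date.month:02d}/day={load_date.day:02d}/")

print(lake_path("raw", "erp", "orders", date(2023, 6, 28)))
# raw/erp/orders/year=2023/month=06/day=28/
```

Generating paths from one function (or, in an automation tool, from metadata) keeps the lake navigable and prevents each pipeline from inventing its own structure.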

Chapter 9

Isn't an ETL or ELT tool enough?

No. Data integration tools that enable ELT or ETL are typically designed to be generic to handle any form of data integration, thus not being specific enough. Being a Jack of all trades, ETL and ELT tools risk being Master of none when it comes to data warehouse development and operation.

Data integration tools leave it up to the data warehouse developers or data engineers to define the processes in the data life cycle and to determine precisely how to perform each process, including which practices to apply and how to apply them.

In the data automation maturity model, ETL and ELT tools are only level 2. They can be augmented with automation frameworks, or augmented or replaced by automation software.

Read more in our blog Why ETL tools are not enough for agile data warehouse development and operation.

Chapter 10

Data automation vs data warehouse automation

Data automation covers a wider range of storage architectures than only data warehouses. 

Looking back, automation was designed to aid the development and maintenance of data warehouses. It helped mitigate typical challenges such as long development cycles due to low productivity, inability to adapt quickly, high key personnel risks, and lack of governance and self-service documentation.

Today, data automation helps more than the data warehouse. It typically extends to automating data integration and documentation for data lakes, data lakehouses, data science environments, analytics sandboxes and ad hoc prototyping needs, and even enables a series of API-related use cases.
