
Automating Azure Synapse Analytics and Azure Purview

by Terje Vatle

10. Dec 2020, 10 minutes reading time

On December 3rd, Microsoft announced Azure Synapse Analytics as generally available and Azure Purview in public preview. BI Builders has designed a comprehensive automation platform tailored for Microsoft Azure technologies, and it now also accelerates work with Synapse Analytics and Azure Purview.

Azure Synapse Analytics makes the compelling business case of having one integrated service and user experience for both your cloud data warehouse and your big data analytics environments, greatly reducing the barriers between operational reporting and advanced analytics and AI.

Azure Synapse Analytics - Screenshot 2

Source: What is Azure Synapse Analytics? Generally Available Today. by Microsoft Mechanics (screenshot)

In Gartner's terms, the ambition for Azure Synapse Analytics resembles the Logical Data Warehouse, as described in the Gartner paper Benefit of AI and the Logical Data Warehouse, delivered as one integrated service. It also implements ideas from the Data Lakehouse concept, as described by the Databricks team, among others.

By adding Azure Purview, Microsoft aims to fill some of the gaps in data governance and self-service. It includes functionality such as scanning and classifying data across your data estate, both on premises and across multiple clouds. It handles security, data usage monitoring and compliance, and makes data discovery available to different user groups in your organization.

Azure Purview - screenshot

Source: Azure Purview | Map, Discover, and Find Insights Across Data Sources, by Microsoft Mechanics (screenshot)

With its built-in business glossary and data discovery, Azure Purview can replace some of the documentation that, for many data platforms, resides in documents and Excel sheets. It signals a promising and much sought-after focus on data governance.

With Azure Synapse Analytics and Azure Purview, the Azure cloud offering takes a great step towards an integrated and seamless experience with improved support for data governance. However, for the data engineer, the development process is still quite manual and time-consuming, and there is very little help for the developer who wants to work smarter with data integration and data quality. Given the wide range of functionality, we find there is an increased need for standardization and best practices to create a stable and robust data platform.

Modernization is an opportunity for working smarter

For data-driven organizations, modernizing their data environments requires a fundamental shift in how they work with data. The shift is not only about moving to the cloud and combining data warehousing with big data analytics; the competitive advantage lies in working smarter. As 80% of all data and analytics work tends to revolve around data integration and data quality, this work should be standardized and optimized to increase productivity and to improve the KPI “time to insights”.

Modernization is not just about scalable solutions in the cloud. It is equally an opportunity to work smarter. We apply the concept that Gartner describes as Data Warehouse Automation in our product Xpert BI. The concept is to simplify complex processes. By leveraging years of experience and the development of methodology and best practices within data warehousing, the architecture is auto-generated and standardized. Through easy-to-use, metadata-driven configuration, data engineers can quickly generate data pipelines, tables, and other objects. The time spent is greatly reduced, while solution quality and robustness improve significantly. Hence, automation helps organizations work smarter. This is key to leveraging the full potential of a modern Azure data platform.

A closer look at Azure Synapse Analytics

In every data platform project, including those built on Azure Synapse Analytics, data engineers spend considerable time on tasks that could and should be automated. For example, they need to define metadata structures, define development standards, define scalable data loading techniques, optimize the environment, define testing frameworks, develop data models, script migration and deployment frameworks, script consistency checks, add instrumentation for monitoring, enable data governance and much, much more.

What if all these repetitive, metadata-driven tasks were automated and became a matter of configuration rather than development from scratch? You would certainly save time, which in turn could be applied to solving more of the actual business challenges.

Consider a data engineer working in Azure Synapse Studio. When identifying a repeatable task, he or she would switch to our accelerator and quickly configure the functionality. All objects, metadata, JSON and SQL code would be generated, just as if the work had been done entirely inside Synapse Studio. Synapse Studio then displays all objects, visible data flows, code and traceable metadata. The difference is that the time spent is a fraction of what it takes to do everything manually in Synapse Studio.
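To make the idea of configuration instead of hand-written code more concrete, here is a minimal Python sketch of metadata-driven generation. It is an illustration only, not how Xpert BI works internally: the `Column` and `TableSpec` classes, the `create_table_sql` and `pipeline_json` functions, the example table and the JSON layout are all hypothetical, and the JSON is a simplified made-up format rather than the actual Synapse pipeline schema.

```python
import json
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    sql_type: str
    nullable: bool = True


@dataclass
class TableSpec:
    """Hypothetical metadata describing one target table."""
    schema: str
    name: str
    columns: list[Column]
    source_path: str  # assumed data lake folder that feeds the table


def create_table_sql(spec: TableSpec) -> str:
    """Render a CREATE TABLE statement from the metadata."""
    cols = ",\n".join(
        f"    [{c.name}] {c.sql_type} {'NULL' if c.nullable else 'NOT NULL'}"
        for c in spec.columns
    )
    return f"CREATE TABLE [{spec.schema}].[{spec.name}] (\n{cols}\n);"


def pipeline_json(spec: TableSpec) -> str:
    """Render a simplified, made-up load definition as JSON."""
    definition = {
        "name": f"load_{spec.schema}_{spec.name}",
        "source": spec.source_path,
        "sink": f"{spec.schema}.{spec.name}",
        "columns": [c.name for c in spec.columns],
    }
    return json.dumps(definition, indent=2)


# Example configuration: the only part an engineer would write by hand.
customer = TableSpec(
    schema="stage",
    name="Customer",
    columns=[
        Column("CustomerId", "INT", nullable=False),
        Column("Name", "NVARCHAR(200)"),
    ],
    source_path="raw/crm/customer/",
)

print(create_table_sql(customer))
print(pipeline_json(customer))
```

Adding a new table then means adding one more `TableSpec`, while the SQL and JSON stay consistent because they come from the same templates.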


A closer look at Azure Purview

Documentation of a data platform is often written down in Excel and Word documents that quickly become outdated. Also, many ETL and ELT tools have limitations in their data lineage capabilities. In addition, it’s often difficult for business users to find the reports, analyses and data that are best suited to solve their business problems. User friendliness for non-technical users is key to the organization’s willingness to adopt new data and analytics.

While Azure Purview is great at scanning, mapping, categorizing and tracking “physical” data across the data estate, it has some clear limitations when it comes to metadata and documentation. What is missing includes the ability to see data transformations (i.e. business rules) in complete end-to-end data lineage, the ability to document every step and object in the ETL/ELT process without getting into trouble with future scans, the ability to search all metadata levels, and the ability to support data life cycle activities. These gaps matter for a data warehouse that requires comprehensive data governance and deep compliance. Further details on these shortcomings can be found in our blog post Short Review of Azure Purview.

Our approach is to complement Azure Purview with functionality that we believe is missing. That functionality includes simplicity for business users and live, metadata-driven documentation end-to-end across your Azure data platform, and even your hybrid or on-premises SQL data warehouse. Our metadata is fully open through APIs.

Azure + Cost control

On premises, you are limited by your current infrastructure. In the cloud, you are only limited by your budget. As with many cloud providers, Azure has a complex pricing model. Different pricing mechanisms apply to different components and vary across compute, storage, number of executions, data movements, payment floors/minimum charges and more. As a result, it is very difficult to predict the complete costs of different deployment options up front. This is recognized as a common challenge and can, for some organizations, pose a risk when moving their data platform to the cloud.

Looking at Azure Synapse Analytics specifically, the different compute options drive different costs, and the total cost can be a combination of all of these:

  • Provisioned SQL Pools, previously known as SQL DWH, which are billed for the highest scale in any given hour
  • SQL On-Demand, which is billed per TB read
  • Mapping data flows, which are billed per execution and Spark cluster uptime
  • Spark code, which is billed by cluster uptime

From a data warehouse perspective, looking primarily at the first two options above, it is still a challenge to optimize costs when designing different pipelines in Azure Data Factory or Azure Synapse Studio. When additional Azure components such as Azure Data Lake Storage and Azure SQL are added to the architecture, the complexity of cost control can skyrocket.

Therefore, we are adding a cost calculator, where important cost optimizations can be done up front, as you design your data platform with pipelines, compute, storage, transformations and more. This will help your organization get the most out of your investments without overspending.
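As a rough illustration of what design-time cost estimation across the four compute options can look like, consider the Python sketch below. It is not our cost calculator and not an Azure pricing API. The `Rates` class, the function names and every rate value are made-up placeholders, and the linear per-DWU pricing of the SQL pool is a simplifying assumption. The only rules taken from the text are that the SQL pool is billed for the highest scale in each hour, SQL On-Demand per TB read, data flows per execution plus cluster uptime, and Spark by cluster uptime.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Placeholder unit prices. These are NOT actual Azure prices."""
    sql_pool_per_dwu_hour: float
    on_demand_per_tb: float
    data_flow_per_execution: float
    spark_per_cluster_hour: float


def sql_pool_cost(scale_by_hour: dict[int, list[int]], rates: Rates) -> float:
    """Bill each hour at the highest DWU level seen during that hour.

    scale_by_hour maps an hour index to the DWU levels the pool ran at
    within that hour. Hours with no samples are treated as paused.
    """
    return sum(
        max(levels) * rates.sql_pool_per_dwu_hour
        for levels in scale_by_hour.values()
        if levels
    )


def estimate_cost(
    rates: Rates,
    scale_by_hour: dict[int, list[int]],
    tb_read: float,
    data_flow_runs: int,
    data_flow_cluster_hours: float,
    spark_cluster_hours: float,
) -> dict[str, float]:
    """Return the estimated cost per compute option plus the total."""
    costs = {
        "sql_pool": sql_pool_cost(scale_by_hour, rates),
        "sql_on_demand": tb_read * rates.on_demand_per_tb,
        "data_flows": data_flow_runs * rates.data_flow_per_execution
        + data_flow_cluster_hours * rates.spark_per_cluster_hour,
        "spark": spark_cluster_hours * rates.spark_per_cluster_hour,
    }
    costs["total"] = sum(costs.values())
    return costs


# Example with invented numbers: the pool scales up briefly in hour 0,
# so that whole hour is billed at the higher level.
example_rates = Rates(
    sql_pool_per_dwu_hour=0.125,
    on_demand_per_tb=4.0,
    data_flow_per_execution=0.25,
    spark_per_cluster_hour=2.0,
)
estimate = estimate_cost(
    example_rates,
    scale_by_hour={0: [100, 400], 1: [100]},
    tb_read=3,
    data_flow_runs=10,
    data_flow_cluster_hours=2,
    spark_cluster_hours=5,
)
for item, cost in estimate.items():
    print(f"{item}: {cost:.2f}")
```

Even this toy model shows why the design matters: a short scale-up of the SQL pool is charged for the full hour, so scheduling heavy loads into the same hour can cost less than spreading them out.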


Summary

Azure Synapse Analytics is a big step forward. Nevertheless, it lacks automation functionality and cost control mechanisms for designing your data platform. Azure Purview is also very promising, but it cannot provide live documentation and is less usable for the majority of non-technical people in the organization.

In summary, with our automation platform:

  1. Data engineers have pre-built functionality for any repeatable task when building a data platform in Azure Synapse Analytics. It is more about configuration than developing from scratch, which saves considerable development time and increases solution robustness and business agility
  2. Data engineers have cost calculators that help them consider cost impacts at design time, getting the most out of the Azure investments without overspending
  3. Non-technical users have a data catalog that is easy to use and has documentation that is always up to date. They can also see business rules and data lineage for every data and analytics asset, from data sources to Power BI, which simplifies user adoption and fosters a data-driven culture

Our mission is to help organizations get more value out of their data. When organizations modernize their data platform and move to the Azure cloud, it is a great opportunity to also work smarter and increase user adoption. Automation makes this possible, so your organization can get the most out of your data and the most out of your investments in Azure.

Read more about Automating Azure Synapse and Azure Purview.

Terje Vatle

Terje Vatle is Chief Technology Officer at BI Builders. With a background in data & analytics advisory and development, Terje focuses on how to make organizations more data-driven and on the journey towards a modern data platform in the cloud. Terje has a passion for skiing, traveling and international politics.
