
Adding the missing piece to Microsoft Fabric

by Terje Vatle

21. Jun 2023, 13 minutes reading time


A few weeks ago, Microsoft announced Microsoft Fabric. According to Microsoft, Fabric is a unified and AI-powered data analytics platform covering data engineering, data science, semantic models, data visualization, and data governance all in one place with a unified user experience. 

For the AI part, Copilot will let users ask natural language questions and generate the corresponding code, visualizations, or models necessary, essentially assisting the data team in their creative process. 

On paper, Microsoft Fabric seems to be a much-anticipated answer to the growing complexity and disparity of most modern data stacks, hopefully making it easier, cheaper, and faster to build and maintain a data platform. However, there are a few key challenges.
   
Simplified architecture
Microsoft recognizes the need to simplify the overall architecture of its modern data stack. The messaging is strikingly similar to that of Azure Synapse some years back. Synapse also promised a unified experience and architecture, with a holistic and simple way of extracting valuable insights from your complete data estate and providing them to business users. By comparison, however, Microsoft Fabric sits at a higher conceptual level, with Synapse as one of several architectural components:
•    Synapse: For data engineering, data warehousing, data science, and real-time analytics
•    Power BI: For visualizations, simplified data hubs, connecting to Microsoft 365, and providing an interface to generative AI
•    Data Factory: For data integration, code-free ETL/ELT, and pipeline management
•    Data Activator: For data observability, alerts, and triggers

 

Data lakehouse
A major feature of Microsoft Fabric is its adoption of the data lakehouse concept, with only one copy of data stored in OneLake, which is built on top of Azure Data Lake Storage Gen2. So, whether data is managed in a data warehouse environment using T-SQL or in a data science environment using a variety of languages such as Python, Scala, and SQL (not T-SQL, though), it still resides in the same place in the same format.

This concept minimizes the number of data copies and reduces overall architectural complexity, making it easier to share data across environments. For example, trusted and quality-assured data from the data warehouse can be accessed directly by a data scientist and combined with ad hoc data without any data movement, which is brilliant. This approach seems like a counter to Snowflake, which, at least in the past, had to copy data between the data warehouse and data science environments.

Data fabric
As the name suggests, Microsoft Fabric may seem like Microsoft’s attempt to jump on the data fabric trend in strong competition with Databricks, Snowflake, Google, and others. 

Gartner defines data fabric as a design framework for attaining flexible, reusable, and automated data integration pipelines, services, and semantics. It is intended to make it easier for people to consume data across multiple use cases on-premises, multi-cloud or hybrid, and avoid a rip-and-replace experience. In essence, data fabric is the combination of active metadata and recommendations to make data more trusted and accessible.

A data fabric design consists of the following key elements: 
•    Holistic data plus metadata to bring context and semantic meaning to data across the organization.
•    Metadata-driven approach where all systems share metadata used for alerts, recommendations, and automation.
•    Composable design integrating any data management solution without ripping and replacing.

However, as Gartner points out, data fabric is a design framework with no current complete implementations available despite what we might see in product marketing from leading software vendors.

Looking at Microsoft Fabric, we see a few key challenges, some of which are related to data fabric:
Challenge 1: Lack of holistic metadata 
Challenge 2: Lack of data automation
Challenge 3: Not a complete data stack
Challenge 4: Is the price performance compelling?

Challenge 1: Lack of holistic metadata 
Holistic metadata is a complete end-to-end weave across your data estate that brings context, semantics/meaning, and integrity to your data. It serves as a backbone for knowledge discovery in your organization. 

There is currently no joint metadata repository across Synapse, Power BI, Data Factory, and Data Activator. Rather, each technology is its own metadata silo without any holistic view end-to-end. Without the end-to-end metadata, it can be challenging for business users to find, understand and trust data, essentially limiting the degree of self-service. It can be hard to trust data when you cannot easily understand the business context or determine who owns the data, or even where those data are coming from. 

Holistic metadata is also a prerequisite for enhancing the user experience such as providing alerts, recommendations, and automation of data management work processes. With the current metadata silos, all actions on productivity, governance, optimizations, recommendations, and cost savings are typically fragmented and would provide only a fraction of the potential value to the organization compared to having holistic metadata. 
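To make the idea of a shared metadata layer more concrete, the following is a minimal sketch in Python. It is not how Microsoft Fabric or any specific product stores metadata; the `Asset` and `MetadataRepository` classes, the asset names, and the owners are all hypothetical and serve only to show how a single repository could answer "where does this report's data come from, and who owns it?" across tools.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """One data asset known to the (hypothetical) shared repository."""
    name: str
    tool: str       # the technology that manages the asset
    owner: str      # the accountable business or technical owner
    upstream: list = field(default_factory=list)  # names of direct sources


class MetadataRepository:
    """Illustrative in-memory stand-in for a holistic metadata repository."""

    def __init__(self):
        self._assets = {}

    def register(self, asset):
        self._assets[asset.name] = asset

    def lineage(self, name):
        """Return every upstream asset name, nearest first, each only once."""
        seen = []
        queue = list(self._assets[name].upstream)
        while queue:
            current = queue.pop(0)
            if current in seen:
                continue
            seen.append(current)
            parent = self._assets.get(current)
            if parent is not None:  # unregistered sources are listed, not expanded
                queue.extend(parent.upstream)
        return seen

    def describe(self, name):
        """Summarize ownership and end-to-end sources for one asset."""
        asset = self._assets[name]
        sources = [self._assets[n] for n in self.lineage(name) if n in self._assets]
        return {
            "asset": asset.name,
            "owner": asset.owner,
            "sources": [(s.name, s.tool, s.owner) for s in sources],
        }


# Hypothetical assets spanning three tools
repo = MetadataRepository()
repo.register(Asset("crm.customers", "Data Factory", "Sales Ops"))
repo.register(Asset("dw.dim_customer", "Synapse", "Data Platform Team", ["crm.customers"]))
repo.register(Asset("report.churn", "Power BI", "Customer Analytics", ["dw.dim_customer"]))

print(repo.describe("report.churn"))
```

With metadata siloed per tool, the same question would require looking in three places and stitching the answers together by hand.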

Challenge 2: Lack of data automation
As a rule of thumb, 80% of traditional data platform projects fail. Two of the key reasons for failure include a lack of productivity and a technology-first mindset. Without automation from the start, focus will typically be on how to make a specific piece of technology work, rather than on how to achieve a specific business outcome, and how to quickly adapt to changes in the business environment.  

Data automation software was designed to address those challenges. By having such software from day one, your data team can perform at their best by delegating repetitive tasks to the machine and thus free up time for interacting with stakeholders and solving their business problems. Also, the built-in guidance, methodology, and code generation will create more robust data platforms that are easier to change. It is a similar way of thinking to the use of generative AI today. 

None of Data Factory, Synapse Studio, Spark notebooks, or Spark SQL provides data automation beyond the most basic level. Without data automation software, it is very much up to the data team to create its own automation framework. Such a framework has to be developed and managed as part of the project. To keep the team productive and agile, it can contain assets such as templates, code snippets, methodologies, data architectures, documentation generators, and more. Needless to say, there is a substantial operational risk if any of the people managing such a framework leave the project, or if the technology infrastructure changes.
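As an illustration of what one small piece of a home-grown framework might look like, here is a minimal sketch of metadata-driven code generation in Python. It is an assumption-laden toy, not the approach of Xpert BI or any other product: the `generate_create_table` function and the shape of the `table` dictionary are hypothetical, and a real framework would also need to handle keys, loads, history, and documentation.

```python
def generate_create_table(table: dict) -> str:
    """Render a T-SQL CREATE TABLE statement from a table description.

    `table` is a hypothetical metadata record with the keys
    'schema', 'name', and 'columns' (each column has 'name', 'type',
    and an optional 'nullable' flag that defaults to True).
    """
    column_lines = []
    for col in table["columns"]:
        nullability = "NULL" if col.get("nullable", True) else "NOT NULL"
        column_lines.append(f"    [{col['name']}] {col['type']} {nullability}")
    columns = ",\n".join(column_lines)
    return f"CREATE TABLE [{table['schema']}].[{table['name']}] (\n{columns}\n);"


# Hypothetical metadata for a staging table
customer = {
    "schema": "stg",
    "name": "customer",
    "columns": [
        {"name": "customer_id", "type": "INT", "nullable": False},
        {"name": "email", "type": "NVARCHAR(255)"},
    ],
}

print(generate_create_table(customer))
```

Even a generator this small has to be owned, tested, and kept in step with the platform, which is where the operational risk described above comes from.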

Challenge 3: Not a complete data stack
Microsoft Fabric does offer basic data management capabilities within data integration and data observability, for example through Data Factory and Synapse Studio. However, other key capabilities are missing, including:

•    Holistic metadata repository: Central updated repository of metadata and semantics.
•    Data quality: Ability to continuously monitor, detect, and remediate data quality issues. 
•    Master data management: Expanding on data quality by managing and distributing master records such as customer, product, and employee data.
•    Data catalog: Holistic view of data with semantic meaning, ownership, and data lineage end-to-end.
•    Data governance: Managing data ownership, performing data stewardship, setting data access rights, etc.

Consequently, the final price of your data platform may prove higher than the price of Microsoft Fabric alone. In addition, if you must choose separate technologies for each capability, the resulting data stack can become very complex, with potentially high maintenance costs. Adding a data automation tool could help keep costs and complexity down by providing these missing capabilities in a consistent and complete package.

Challenge 4: Is the price performance compelling? 
A few years back, one of the key challenges with Azure Synapse adoption was the price performance. At the time, Synapse shone when handling very large data volumes above 4 TB with complex analytics workloads, but at a hefty starting price. It struggled, however, to scale down to smaller data volumes.

Therefore, many small and medium-sized businesses chose Azure SQL Database as the backbone for their data platforms, essentially getting a high level of performance for smaller data volumes at a very affordable price point. As Synapse is still a centerpiece of the new Microsoft Fabric architecture, the question remains whether Synapse has improved its downward scalability and whether it has a more compelling starting price.

On the plus side, Microsoft Fabric does have a simplified pricing structure, letting customers buy a Fabric capacity. Currently, while in trial, Microsoft Fabric can be purchased through Power BI Premium capacity.

Our recommendation: Start small and scale up later 
Let us assume your organization wants to build a scalable data platform (data warehouse, data lakehouse, and/or data lake) and you want to be as productive and agile as possible. You want to have a holistic metadata layer for self-service, documentation, governance, AI recommendations, and automation. Still, you would like to start at a lower price point, only pay for what you need, and later scale up or down when needed. This is exactly what data automation software such as Xpert BI is designed for. 

We recommend:
1.    Start simple and cost-effective with an Azure SQL database, Power BI, and data automation software such as Xpert BI. This lets you deliver value quickly at a low subscription cost and with fewer resources. Ideally, your automation software should cover a holistic metadata repository, data quality, master data management, and a data catalog, so the overall architecture stays simple, robust, and consistent.
2.    Expand when you need to. For example, if your data and analytics needs outgrow the Azure SQL database and you have the competencies available, add Azure Data Lake Storage Gen2 and any analytics or AI tools. Use automation software to maintain holistic metadata and to ensure productivity and robustness.
3.    When you are ready and your data challenges require it, try out Microsoft Fabric, Snowflake, Databricks, and other more advanced infrastructures in parallel to find what gives you the best price performance. Ideally, your automation software should fill in the missing pieces while keeping your architecture consistent and robust.
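When comparing platforms in parallel as suggested in step 3, one simple way to express price performance is the cost of running the same representative workload on each candidate. The sketch below assumes a flat hourly price and a measured runtime; the function names are hypothetical, and the platform labels and numbers are placeholders, not measurements or list prices of any real product.

```python
def cost_per_run(hourly_price: float, runtime_seconds: float) -> float:
    """Cost of one workload run, assuming billing is a flat hourly rate."""
    return hourly_price * runtime_seconds / 3600


def rank_by_cost(candidates: dict) -> list:
    """Return candidate names ordered from cheapest to most expensive run.

    `candidates` maps a name to a (hourly_price, runtime_seconds) tuple.
    """
    return sorted(candidates, key=lambda name: cost_per_run(*candidates[name]))


# Placeholder figures for illustration only
candidates = {
    "platform_a": (4.0, 900),   # cheaper per hour, slower
    "platform_b": (10.0, 180),  # pricier per hour, faster
}

for name in rank_by_cost(candidates):
    print(name, cost_per_run(*candidates[name]))
```

In this made-up case the platform with the higher hourly price is cheaper per run, which is why measuring your own workloads matters more than comparing list prices. Real pricing models (minimum capacities, pause and resume, storage, egress) would need to be added to the calculation.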

Your data automation software lets you transition between architectures without having to rewrite code, and it keeps the overall complexity down. For your organization, this means you can continuously optimize price performance and avoid lock-in to a single infrastructure, database, or data lake.

If Microsoft Fabric is your starting point, consider data automation software as an accelerator adding the data fabric capabilities that are currently missing. 

As a side note: Microsoft Purview (formerly Azure Purview) could potentially serve as a holistic metadata repository in Azure. However, it is currently not included in the Microsoft Fabric offering, and it is based on scans rather than being updated live. Scans mean you always risk having outdated metadata, potentially limiting integrity and trust. Also, you would have to set up all the scans, then manually document every object, and finally hope your metadata isn't broken after your next scan. In addition, it has not yet brought automation to Data Factory, Synapse, or other technologies.

Terje Vatle

Terje Vatle is Chief Technology Officer at BI Builders, following global market trends within data & analytics. Terje has a technology and advisory background and focuses on how to help organizations achieve their goals by becoming more data-driven.
