Wednesday, May 24, 2023

What is OneLake in Microsoft Fabric?

Microsoft Build 2023 has unveiled one of its most anticipated announcements: Microsoft Fabric, a unified Data Intelligence platform that brings data and analytics workloads together in one place.



Fig 1: OneLake for all Data

One of the most exciting things I found in Fabric is OneLake. I was amazed to discover that OneLake is as simple to use as OneDrive! It's a single, unified, logical SaaS data lake for the whole organization (no data silos). Over the past couple of months, I've had the opportunity to engage with the product team and explore the private preview of Microsoft Fabric. This blog post shares what I learned during the private preview; it is not an exhaustive list of what OneLake encompasses.

 

I installed OneLake on my PC and can access the data in OneLake just as I would in OneDrive, as shown in Fig 2:


Fig 2: OneLake is like OneDrive on a PC


Single unified Managed and Governed SaaS Data Lake 

All Fabric items keep their data in OneLake, so there are no data silos. OneLake is fully compatible with Azure Data Lake Storage (ADLS) Gen2 at the API layer, which means it can be accessed as if it were ADLS Gen2.
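To make the ADLS Gen2 compatibility a bit more concrete, here is a minimal sketch of how an ADLS-style (abfss) address for OneLake data might be put together. The host name and the "workspace, then item.type, then path" layout are my assumptions based on the commonly documented OneLake URI pattern, and the workspace and lakehouse names are hypothetical, so please check the official documentation before relying on it.

```python
# Assumed OneLake DFS endpoint host (not stated in this post; verify in the Fabric docs).
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_abfss_uri(workspace: str, item: str, item_type: str, relative_path: str = "") -> str:
    """Build an ADLS Gen2-style abfss URI for data inside a Fabric item.

    The workspace plays the role an ADLS container would play, and the
    item folder is named "<item>.<item_type>" (assumed convention).
    """
    path = relative_path.strip("/")
    uri = f"abfss://{workspace}@{ONELAKE_HOST}/{item}.{item_type}"
    return f"{uri}/{path}" if path else uri


# Hypothetical names, for illustration only.
print(onelake_abfss_uri("SalesWorkspace", "SalesLakehouse", "Lakehouse", "Files/raw/orders.csv"))
```

The idea is that a tool which already understands ADLS Gen2 paths could be pointed at an address of this shape instead of a storage account.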

Let's investigate some of the benefits of OneLake:

Fig 3: Unified management and governance
  • OneLake comes automatically provisioned with every Microsoft Fabric tenant with no infrastructure to manage.

  • Any data in OneLake works with out-of-the-box governance such as data lineage, data protection, certification, catalog integration, etc. Please note that this feature is not part of the public preview.

  • OneLake enables distributed ownership. Different workspaces allow different parts of the organization to work independently while still contributing to the same data lake.
  • Each workspace can have its own administrator, access control, region, and capacity for billing.

Do you have requirements that data must reside in specific countries?

Fig 4: OneLake covers data residency

Yes, OneLake covers that requirement. If you're concerned about how to manage data across multiple countries while meeting local data residency requirements, OneLake has got you covered! OneLake spans the globe, so you can create different workspaces in different regions, and any data stored in a workspace resides in that workspace's region. Built on top of Azure Data Lake Storage Gen2, OneLake can use multiple storage accounts across different regions while virtualizing them into one seamless, logical lake.


Data Mesh as a Service


OneLake gives a true data mesh as a service. Business groups can now operate autonomously within a shared data lake, eliminating the need to manage separate storage resources. The implementation of the data mesh pattern has become more streamlined. OneLake enhances this further by introducing domains as a core concept. A business domain can have multiple workspaces, which typically align with specific projects or teams.



Open Data Format

Simply put, no matter which Fabric item you start with, it stores its data in OneLake, similar to how Word, Excel, and PowerPoint save documents in OneDrive.

 

You will see files and folders just like you would in a data lake today. Every workspace is a folder, and every data item inside it is a folder too. Any tabular data is stored in Delta Lake format. There are no new proprietary file formats for Microsoft Fabric, because proprietary formats create data silos. Even the data warehouse natively stores its data in Delta Lake (Parquet) format. While Microsoft Fabric data items standardize on Delta/Parquet for tabular data, OneLake is still a data lake built on top of ADLS Gen2, so it supports any file type, structured or unstructured.
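The folder layout described above can be sketched as a small path parser. This is only an illustration: the "Tables" and "Files" section names are my assumption about how a lakehouse item is laid out (they are not stated in this post), and the workspace and item names are hypothetical.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class LakePath:
    workspace: str  # top-level folder: one per workspace
    item: str       # folder for a single data item, e.g. "Sales.Lakehouse"
    section: str    # assumed to be "Tables" or "Files"
    name: str       # whatever remains below the section


def parse_lake_path(path: str) -> LakePath:
    """Split 'workspace/item/section/rest' into its parts."""
    parts = PurePosixPath(path.strip("/")).parts
    if len(parts) < 4:
        raise ValueError(f"expected workspace/item/section/name, got: {path!r}")
    workspace, item, section = parts[:3]
    return LakePath(workspace, item, section, "/".join(parts[3:]))


def is_delta_table(p: LakePath) -> bool:
    """Under the assumed layout, tabular (Delta) data sits below 'Tables'."""
    return p.section == "Tables"


# Hypothetical paths, for illustration only.
table = parse_lake_path("SalesWorkspace/Sales.Lakehouse/Tables/orders")
raw = parse_lake_path("SalesWorkspace/Sales.Lakehouse/Files/raw/2023/orders.json")
print(table, is_delta_table(table))
print(raw, is_delta_table(raw))
```

Tabular data lands under one section in Delta format, while any other file type can sit alongside it in the same item folder.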


Shortcuts/Data Virtualization


Shortcuts virtualize data across domains and clouds. A shortcut is nothing more than a symbolic link that points from one data location to another. Just like you can create shortcuts in Windows or Linux, the data will appear in the shortcut location as if it were physically there.
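The symbolic link analogy can be tried out on a local file system. The sketch below is not the Fabric shortcut API; it only uses Python's standard library to show that data read through a link is the original data, with nothing copied. The folder and file names are made up for the demo, and on Windows creating symbolic links may require Developer Mode or elevated privileges.

```python
import os
import tempfile
from pathlib import Path


def demo_shortcut() -> str:
    """Create a folder, link to it from another folder, and read through the link."""
    with tempfile.TemporaryDirectory() as root:
        # Stand-in for an existing external data lake.
        source = Path(root) / "external_lake"
        source.mkdir()
        (source / "orders.csv").write_text("id,amount\n1,9.99\n")

        # Stand-in for the location where the shortcut lives.
        lake = Path(root) / "onelake"
        lake.mkdir()
        shortcut = lake / "orders_shortcut"

        # The "shortcut": a symbolic link pointing at the source folder.
        os.symlink(source, shortcut, target_is_directory=True)

        # The data appears under the shortcut as if it were physically there.
        assert shortcut.is_symlink()
        return (shortcut / "orders.csv").read_text()


print(demo_shortcut())
```

The file is written once, in the source folder; the second location only holds a pointer to it.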


Fig 5: Shortcuts

As shown in Fig 5 above, if you have existing data lakes stored in ADLS Gen2 or in Amazon S3 buckets, these lakes can continue to exist and be managed externally while their data is surfaced in OneLake in Microsoft Fabric through shortcuts.

Shortcuts help to avoid data movement and duplication. It’s easy to create shortcuts from Microsoft Fabric, as shown in Fig 6:

 Fig 6: Shortcuts from Microsoft Fabric


OneLake Security


In the current preview, data in OneLake is secured at the item or workspace level. A user will either have access or not. Additional engine-specific security can be defined in the T-SQL engine. These security definitions will not apply to other engines. Direct access to the item in the lake can be restricted to only users who are allowed to see all the data for that warehouse. 

In addition, Power BI reports will continue to work against data in OneLake, as Analysis Services can still leverage the security defined in the T-SQL engine through DirectQuery mode and can sometimes still optimize to Direct Lake mode, depending on the security defined.


In summary, OneLake is a major step forward for data and analytics, and it surpassed my initial expectations. It turns data lakes into user-friendly, OneDrive-like folders. The Delta format is well suited to Data Engineering workloads. With OneLake, Data Engineers, Data Scientists, BI professionals, and business stakeholders can collaborate more effectively on the same data. To find out more about Microsoft Fabric and OneLake, please visit the Microsoft Fabric documentation.