Migrating from Azure Data and AI Stack to Microsoft Fabric: A practical overview



As the landscape of data platforms continues to evolve, Microsoft has introduced Microsoft Fabric as a comprehensive solution for data integration, analytics, and AI. As a primer, refer to our blog on MS Fabric capabilities for implementing Data Mesh.

 

The question of migrating to Fabric may arise for organizations already invested in Microsoft Azure’s Data and AI stack. In this blog, we’ll walk through considerations and essential steps for migrating from Azure’s existing services to Fabric, leveraging the latest capabilities of Fabric to streamline data operations.

Key Azure workloads and migration to Fabric equivalents

Which Azure workloads are amenable to migration to MS Fabric can be assessed using the criteria below:

 

Migration amenability score

Score 5:
  • Lift and shift of existing artifacts with minimal manual effort.
  • Decommission existing workloads post migration.

Score 4:
  • Lift and shift possible for some but not all artifacts.
  • Decommission existing workloads post migration.

Score 3:
  • Artifacts have to be recreated / recoded in Fabric.
  • Decommission existing workloads post migration.

Score 2:
  • No equivalent Fabric capability.
  • Integration is feasible at present.

Score 1:
  • No equivalent Fabric capability.
  • Integration is not feasible at present.

 

A score of 5 indicates a near-seamless migration, while a score of 3 indicates a more involved process, such as data restructuring or schema changes.

Migration amenability assessment

Capability

Data Lake Storage

Amenability Score





Native Azure Feature

  • ADLS Gen2

Fabric Feature

  • OneLake – on ADLS Gen2

Short term – Data movement from ADLS to Fabric is not necessary. Creating ‘Shortcuts’ can provide access to ADLS.

Long term – Phased data migration is required to decommission PaaS and get the full benefits of the SaaS model.
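As a rough illustration of the short-term option, the sketch below only builds (and does not send) a request for creating an ADLS Gen2 shortcut in a Fabric lakehouse. It assumes the OneLake shortcuts REST endpoint and its `adlsGen2` target shape as we understand them from the public Fabric REST API, so please verify both against the current documentation. The function name, IDs, storage account, and paths are hypothetical placeholders.

```python
import json

# Assumed base URL of the Fabric REST API.
FABRIC_API = "https://api.fabric.microsoft.com/v1"


def build_adls_shortcut_request(workspace_id, lakehouse_id, shortcut_name,
                                storage_account, container_path, connection_id,
                                parent_path="Files"):
    """Return (url, body) for a 'create shortcut' call; nothing is sent."""
    url = f"{FABRIC_API}/workspaces/{workspace_id}/items/{lakehouse_id}/shortcuts"
    body = {
        "path": parent_path,          # folder inside the lakehouse
        "name": shortcut_name,        # name the shortcut will appear under
        "target": {
            "adlsGen2": {
                "location": f"https://{storage_account}.dfs.core.windows.net",
                "subpath": container_path,      # container and folder in ADLS
                "connectionId": connection_id,  # existing Fabric connection
            }
        },
    }
    return url, body


# Hypothetical placeholder values, for illustration only.
url, body = build_adls_shortcut_request(
    workspace_id="<workspace-id>",
    lakehouse_id="<lakehouse-id>",
    shortcut_name="sales_raw",
    storage_account="contosolake",
    container_path="/raw/sales",
    connection_id="<connection-id>",
)
print(url)
print(json.dumps(body, indent=2))
```

Because the shortcut only references the ADLS location, the data itself stays where it is until the phased long-term migration begins.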

Data Warehouses / Data Marts – SQL Workloads

Amenability Score





Native Azure Feature

  • Serverless and Dedicated SQL Pools

Fabric Feature

  • Synapse Data Warehouse

Migrating from Synapse Analytics Dedicated SQL Pools requires robust planning and a sound migration methodology. For small data marts (GBs), the Fabric Data Factory Copy Wizard can be used for the migration. For large data warehouses (TBs), convert schemas to Fabric with the Copy Wizard, export the data to the Lakehouse using CETAS, and then ingest it into the Fabric Warehouse using COPY INTO or Fabric Data Factory activities.
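The large-warehouse path can be sketched as a pair of generated T-SQL statements per table: a CETAS export run on the dedicated SQL pool, followed by a COPY INTO run on the Fabric Warehouse. This is a minimal sketch, not a complete migration tool. The helper names, the `_export` suffix, the external data source and file format names, and the storage URL are all hypothetical, and the external data source and file format are assumed to exist already.

```python
def cetas_statement(schema, table, data_source, file_format, export_folder):
    """T-SQL for the Synapse dedicated SQL pool: export one table as files."""
    return (
        f"CREATE EXTERNAL TABLE [{schema}].[{table}_export]\n"
        f"WITH (\n"
        f"    LOCATION = '{export_folder}/{schema}/{table}/',\n"
        f"    DATA_SOURCE = [{data_source}],\n"
        f"    FILE_FORMAT = [{file_format}]\n"
        f")\n"
        f"AS SELECT * FROM [{schema}].[{table}];"
    )


def copy_into_statement(schema, table, files_url):
    """T-SQL for the Fabric Warehouse: load the exported Parquet files."""
    return (
        f"COPY INTO [{schema}].[{table}]\n"
        f"FROM '{files_url}'\n"
        f"WITH (FILE_TYPE = 'PARQUET');"
    )


# Hypothetical names, for illustration only.
export_sql = cetas_statement("dbo", "FactSales", "MigrationStore",
                             "ParquetFormat", "export")
load_sql = copy_into_statement(
    "dbo", "FactSales",
    "https://contosolake.dfs.core.windows.net/migration/export/dbo/FactSales/*.parquet",
)
print(export_sql)
print(load_sql)
```

Generating the statements from a table list keeps the export and load steps consistent across hundreds of tables; authentication options for COPY INTO are omitted here and depend on how the storage is secured.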

Lakehouse

Amenability Score





Native Azure Feature

  • Synapse Delta Lake

Fabric Feature

  • Synapse Data Lakehouse

Structured / Semi-structured data in any other format has to be converted to Delta format and stored in OneLake to make it queryable by SQL or Spark compute engines.
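A minimal sketch of that conversion, assuming it runs in a Fabric notebook where a `spark` session is already available and a lakehouse is attached. The function name, the list of accepted formats, and the example path and table name are our own illustrative choices; only standard Spark DataFrame reader and writer calls are used.

```python
# Source formats this sketch accepts; all are built into Spark.
SUPPORTED_SOURCE_FORMATS = {"csv", "json", "parquet", "orc"}


def convert_to_delta(spark, source_path, source_format, table_name):
    """Rewrite files in another format as a Delta table.

    Returns False when the data is already Delta (nothing to do),
    True after a conversion has been written.
    """
    fmt = source_format.lower()
    if fmt == "delta":
        return False
    if fmt not in SUPPORTED_SOURCE_FORMATS:
        raise ValueError(f"unsupported source format: {source_format}")

    reader = spark.read.format(fmt)
    if fmt == "csv":
        # Assumption: CSV files carry a header row.
        reader = reader.option("header", "true")

    (reader.load(source_path)
           .write.format("delta")
           .mode("overwrite")
           .saveAsTable(table_name))
    return True


# Example call inside a notebook (hypothetical path and table name):
# convert_to_delta(spark, "Files/raw/orders", "csv", "orders")
```

Once the table is written in Delta format, it becomes queryable from both the SQL and Spark engines, as described above.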

Big Data Engineering

Amenability Score





Native Azure Feature

  • Azure Synapse Spark

Fabric Feature

  • Fabric Spark

Migrating from Azure Synapse Spark requires robust planning and a sound migration methodology. Spark Pools, Spark Configurations, and Spark Libraries have to be manually re-created in Fabric Notebooks. Spark Job definitions and the Hive Metastore can be exported from Azure Synapse and imported into Fabric. The time to spin up Spark is significantly reduced in Fabric.

Data streaming

Amenability Score





Native Azure Feature

  • Stream Ingestion – Event Hubs, IoT Hubs, Event Grids
  • Stream Storage – Event / IoT Hubs
  • Stream Processing – Azure Stream Analytics
  • Destination – Azure Data Explorer / Azure CosmosDB / ADLS Gen2

Fabric Feature

  • Stream Ingestion – No native service. Use Azure Event Hubs / IoT Hub
  • Stream Storage – No native service. Use Azure Event Hubs / IoT Hub
  • Stream Processing – Eventstream
  • Destination – Eventhouse > KQL databases

No direct path for migration from Azure PaaS to Fabric. Use cases will have to be recreated using Fabric services. Additional services in Fabric include:

  • Connectors to Amazon Kinesis and Google Pub/Sub.
  • Native integration with CDC for Azure SQL DBs and Cosmos DB.
  • Data profiling, anomaly detection, and forecasting.
  • Integration with Power BI or third-party tools like Kibana, Grafana, etc.
  • Reflex items through Data Activator enable downstream actions in other services.
  • Real-time hub provides a catalog

Data Pipelines Orchestration

Amenability Score





Native Azure Feature

  • Azure Data Factory – Mapping Data flow (No/Low code)
  • ADF Pipelines (Custom code)
  • Power BI – Dataflow Gen1

Fabric Feature

  • Data Factory – Dataflows Gen2 (Low Code)
  • Data Pipelines (Custom code)
  • Missing – SSIS, CI/CD

Short term – Existing ADF pipelines can write to OneLake. Dataflow Gen1 queries can be exported as PQT files and imported into Dataflow Gen2. Copy-pasting queries is another option.

Long term – Mapping data flows have to be redone in Dataflow Gen2 or converted to Spark code for Fabric. An upgrade experience from existing ADF pipelines to Fabric is set to be released.

NoSQL Data Stores

Amenability Score





Native Azure Feature

  • Azure Cosmos DB

Fabric Feature

  • No Equivalent Data store

Mirroring is available for Azure Cosmos DB, Azure SQL Database, and Snowflake. It converts data to Delta format and incrementally ingests new data in near real-time. Mirroring itself incurs no compute charge and can be used to replicate your Cosmos DB data into Fabric OneLake. OneLake storage for mirrored data is free of cost subject to certain conditions. For more information, see OneLake pricing for mirroring. The compute used to query the data via SQL, Power BI, or Spark is still charged against the Fabric capacity.

ML Development

Amenability Score





Native Azure Feature

  • Azure Machine Learning Studio

Fabric Feature

  • Fabric Notebooks

  • Notebooks can be migrated without much refactoring.
  • AzureML pipelines are not available. Data Factory pipelines are the alternative and have to be created from scratch (unless they are notebook-based).
  • Data asset management available in the AzureML SDK is not available in Fabric.
  • Fabric provides an MLflow endpoint, eliminating the need to create an Azure Machine Learning instance to register ML models or log experiments.

MLOps

Amenability Score





Native Azure Feature

  • Azure Machine Learning MLOps

Fabric Feature

  • NA

The following governance features are not currently available:

  • Azure Machine Learning data assets, which help you track, profile, and version data.
  • Model interpretability, which allows you to explain your models, meet regulatory compliance, and understand how models arrive at a result for a given input.
  • Azure Machine Learning job history, which stores a snapshot of the code, data, and compute used to train a model.
  • Azure Machine Learning model registry, which captures all the metadata associated with your model.
  • Integration with Azure that allows you to act on events, such as model registration, deployment, data drift, and training (job) events, in the machine learning lifecycle.

Analytics App

Amenability Score





Native Azure Feature

  • Power BI

Fabric Feature

  • Power BI

The right capacity has to be selected based on current usage. All artifacts can move seamlessly to Fabric.

Data Governance

Amenability Score





Native Azure Feature

  • MS Purview

Fabric Feature

  • MS Purview Hub (preview)

Purview can connect to Fabric data estate and create catalogs automatically.

Data Sharing

Amenability Score





Native Azure Feature

  • Azure Data Share

Fabric Feature

  • External Data Sharing – only between Fabric tenants
  • Clean room – no capability

Since there is no equivalent capability in Fabric, users have to leverage Azure Data Share.

DevOps

Amenability Score





Native Azure Feature

  • Azure DevOps & GitHub

Fabric Feature

  • Azure DevOps & GitHub

GitHub or Azure DevOps deployed on Azure can be reused.

Configurability

Amenability Score





Native Azure Feature

  • Configure the storage and compute required for each workload

Fabric Feature

  • SKUs offer a choice of compute power in terms of Capacity Units

No direct mapping from Azure to Fabric configurations is available as of today. We suggest starting with ‘Trial’ SKU and adjusting upwards as you migrate more workloads to Fabric.

Pricing

Amenability Score





Native Azure Feature

  • Sizing – Different for each component
  • Charging – Usage based, different tiers for each component

Fabric Feature

  • Sizing – SKU (Capacity) based
  • Charging – monthly committed spend

No direct mapping from Azure sizing to Fabric capacity units is available as of today. Consider starting with a SKU whose monthly cost is in the ballpark of your current Azure spend.
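One way to apply this rule of thumb is a small helper that picks the F SKU whose monthly cost is closest to the current Azure spend. This is a sketch under two stated assumptions: that cost scales linearly with Capacity Units, and that you supply the per-CU monthly price for your region and commitment type yourself. The price used in the example is a made-up placeholder, not a published rate.

```python
# Capacity Units (CUs) of the Fabric F SKUs, e.g. F64 provides 64 CUs.
FABRIC_F_SKUS = [2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048]


def closest_sku(current_monthly_spend, price_per_cu_month):
    """Return (sku_name, monthly_cost) nearest to the current spend.

    price_per_cu_month is an input you must look up yourself;
    linear pricing per CU is an assumption of this sketch.
    """
    if price_per_cu_month <= 0:
        raise ValueError("price_per_cu_month must be positive")
    best_cu = min(
        FABRIC_F_SKUS,
        key=lambda cu: abs(cu * price_per_cu_month - current_monthly_spend),
    )
    return f"F{best_cu}", best_cu * price_per_cu_month


# Placeholder figures: 10,000 per month of Azure spend, 150 per CU per month.
sku, cost = closest_sku(10_000, 150)
print(sku, cost)
```

The result is only a starting point; actual capacity needs should be validated against observed utilization once workloads are running on Fabric.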

  


 

Fig. Migration amenability score

 

As Fabric continues to evolve and add more capabilities, we expect a more complete mapping of capabilities to the native Azure stack to allow complete migration to the SaaS model.

 

For workloads with scores of 1 or 2, we recommend a wait-and-watch approach as Fabric adds capabilities over the next few quarters.

 

For workloads with scores of 3, 4, or 5, enterprises can consider undertaking PoCs to test capabilities before evolving a concrete migration plan.

Capability mapping to other CSP data and analytics stacks

For enterprises already invested in other hyperscalers, a migration to Fabric is likely to involve a relook at their data, analytics, and AI strategy. The paradigm shift from the PaaS to the SaaS model offers a compelling reason to consider such a move. The table below provides a mapping of capabilities between Fabric and other stacks as of today.

Note: Migration amenability is not as relevant in this context and is hence omitted.

 

| Capability | MS Fabric | AWS | GCP |
|---|---|---|---|
| Data Lake Storage | ADLS Gen2 (no configuration access) | Amazon S3 | Google Cloud Storage |
| Data Warehouses / Data Marts – SQL Workloads | Synapse Data Warehouse | Amazon Redshift, Amazon Athena | BigQuery |
| Lakehouse | Synapse Data Lakehouse | Combination of Amazon S3, Amazon Redshift with Lake Formation | BigLake |
| Big Data Engineering | Fabric Spark | Amazon EMR (PaaS or Serverless), Amazon Athena | Dataproc |
| Data streaming | Stream Ingestion – No native service. Use Azure Event Hubs / IoT Hub; Stream Storage – No native service. Use Azure Event Hubs / IoT Hub; Stream Processing – Eventstream; Destination – Eventhouse > KQL databases | Stream Ingestion – AWS IoT, Kinesis Agent; Stream Storage – Kinesis Data Streams, Amazon MSK, Apache Kafka; Stream Processing – Amazon EMR, AWS Glue; Destination – Amazon DynamoDB, Amazon OpenSearch Service | Stream Ingestion – Pub/Sub; Stream Storage – Pub/Sub; Stream Processing – Dataflow, Cloud Functions; Destination – BigQuery, Cloud Datastore, Cloud Storage; Cloud IoT Core Device SDK |
| Data Pipelines Orchestration | Data Factory: Dataflows Gen2 (Low Code); Data Pipelines (Custom code); Missing – SSIS, CI/CD | Amazon Managed Workflows for Apache Airflow (Amazon MWAA), AWS Step Functions | Cloud Composer, Cloud Workflows |
| NoSQL Data Stores | No equivalent data store | Amazon DynamoDB (Key-value), Amazon DocumentDB (Document), Amazon Neptune (Graph), Amazon Keyspaces (Column) | Cloud Datastore (Key-value), Cloud Firestore (Document), Cloud Neo4J (Graph), Cloud Cassandra (Column) |
| ML Development | Fabric Notebooks | Amazon SageMaker | Vertex AI |
| MLOps | MLflow Integration NA | Amazon SageMaker MLOps | Vertex AI MLOps |
| Analytics App | Power BI | Amazon QuickSight | Looker |
| Data Governance | MS Purview Hub (preview) | Amazon DataZone, AWS Lake Formation, AWS Glue Catalog | Dataplex |
| DevOps | Azure DevOps & GitHub | AWS CodeCommit, AWS CodeBuild, AWS CodePipeline | Cloud Source Repositories, Cloud Build, Cloud Deploy |

Migration methodology

For Azure workloads with an amenability score of 3 or higher, and for workloads on other CSPs, robust planning is key to a successful migration. Sigmoid has helped several clients through migrations of their data, analytics, and AI workloads. We cover the critical elements of a migration plan below.

 

Step 1: Evaluate current workloads

Before starting the migration process, it’s crucial to evaluate your current workloads running on Azure / other CSP services. Determine which services are most critical to your operations and map them to the corresponding Fabric services using the table above.
 
A plot of the business criticality vs. migration amenability can help prioritize candidate workloads for migration. The workloads towards the top right are the most suitable for migration. A sample of such a prioritization matrix is below. However, this has to be moderated by the upstream and downstream dependencies for each workload. For example, Analytics Reports & Dashboards on Power BI are easy to migrate in isolation but would depend on the migration of the DE Pipelines, Storage, and Semantic data models.
 


 

Fig. Sample prioritization framework
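The prioritization logic described above can be sketched in a few lines: rank candidate workloads by business criticality and amenability, but never schedule a workload before the candidates it depends on. The `Workload` class, the 1 to 5 criticality scale, the product-based ranking, and all scores in the example are hypothetical illustrations, not values from our assessment.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    criticality: int        # 1 (low) to 5 (high); hypothetical scale
    amenability: int        # 1 to 5, the migration amenability score
    depends_on: tuple = ()  # names of upstream workloads


def migration_order(workloads, min_amenability=3):
    """Order candidates so that dependencies migrate first.

    Among workloads whose dependencies are satisfied, the one with the
    highest criticality * amenability goes next. Dependencies that are
    not candidates (score below the threshold) are assumed to stay on
    Azure and do not block the migration.
    """
    candidates = {w.name: w for w in workloads
                  if w.amenability >= min_amenability}
    done, order = set(), []
    while len(order) < len(candidates):
        ready = [
            w for w in candidates.values()
            if w.name not in done
            and all(d in done or d not in candidates for d in w.depends_on)
        ]
        if not ready:
            raise ValueError("circular dependency among workloads")
        best = max(ready, key=lambda w: (w.criticality * w.amenability, w.name))
        order.append(best.name)
        done.add(best.name)
    return order


# Made-up scores, for illustration only.
sample = [
    Workload("Power BI reports", 5, 5, ("Semantic models",)),
    Workload("Semantic models", 4, 4, ("DE pipelines",)),
    Workload("DE pipelines", 4, 3, ("Data lake storage",)),
    Workload("Data lake storage", 3, 5),
    Workload("MLOps", 2, 1),
]
print(migration_order(sample))
```

In this sample, the Power BI reports score highest in isolation but are scheduled last, because storage, pipelines, and semantic models have to move first.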

 

Step 2: Prepare for data migration

  • Provision MS Fabric Capacity
  • Prepare environments (Dev, Test, Production) and DevOps tooling such as Azure DevOps
  • Detailed analysis of existing workloads
  • Architecture
  • Schema
  • Security
  • Operational Dependencies
  • PoC / MVP Plan
  • Timelines
  • Fabric capacity
  • Tools & accelerators
  • Test Plan and quality assurance
  • Contingency plan

Step 3: Implement migration

For each service, there are specific migration strategies and considerations. For example,
 

  1. ADLS Gen2 / Delta Lake to Fabric: Leverage shortcuts to begin migrating data from Azure Data Lake Storage into Fabric. Ensure you have appropriate access control and configuration management.
     
    Provision bronze, silver, and gold layers on Fabric to prepare for lift and shift of data from Azure / other CSPs.
  2. Synapse Analytics to Fabric: Migration of analytics services like Serverless SQL or Dedicated SQL Pools requires thorough validation of queries, tables, and data models. Fabric’s Synapse Data Warehouse can support these workloads but may require additional configurations for performance optimization.

The typical activities to be executed include:

  • Reverse engineer DDL, PL/SQL, ETL
  • Build or copy reports, data models
  • Identify workarounds for artifacts that cannot be migrated to Fabric at this point
  • Configure security model & RBAC
  • Perform thorough unit and integration testing
  • Perform UAT

 

After completion of UAT,

  • Promote code to production
  • Execute historical data loads
  • Turn on incremental data load pipelines and perform data quality checks
  • Perform a parallel run with the existing environment and perform reconciliation checks
  • Support users in testing the new Fabric reports
  • Cut over to Fabric and provide hypercare support to business users
  • Decommission existing workloads where feasible by design
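The reconciliation checks during the parallel run can be as simple as comparing a few per-table metrics collected from both environments. The sketch below assumes you have already gathered these metrics (for example row counts and column sums) into dictionaries; the function name, metric names, and sample values are hypothetical.

```python
def reconcile(source_metrics, target_metrics, tolerance=0.0):
    """Compare per-table metrics and return a list of discrepancies.

    Both arguments map table name -> {metric name -> numeric value}.
    A metric is flagged when it is missing in the target or differs
    from the source by more than `tolerance`.
    """
    issues = []
    for table, src in sorted(source_metrics.items()):
        tgt = target_metrics.get(table)
        if tgt is None:
            issues.append((table, "missing in Fabric"))
            continue
        for metric, src_value in src.items():
            tgt_value = tgt.get(metric)
            if tgt_value is None or abs(tgt_value - src_value) > tolerance:
                issues.append(
                    (table, f"{metric}: source={src_value}, fabric={tgt_value}")
                )
    return issues


# Made-up metrics, for illustration only.
source = {
    "sales": {"row_count": 1000, "amount_sum": 52340.5},
    "customers": {"row_count": 200},
}
target = {
    "sales": {"row_count": 1000, "amount_sum": 52340.5},
    "customers": {"row_count": 198},
}
issues = reconcile(source, target)
print(issues)
```

An empty result for every run in the parallel period is a reasonable precondition for the cut-over step.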

Step 4: Post-migration optimization

Once migration is complete, ongoing optimization is necessary to ensure that you leverage the full capabilities of Microsoft Fabric. Fabric is designed to streamline operations across data lakes, data warehouses, and real-time analytics, so take time to:
 

  • Monitor data integration and synchronization processes.
  • Adjust access control policies to reflect the new environment.
  • Observe capacity unit utilization and adjust the SKU tier accordingly.
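As a sketch of the last point, the helper below turns a series of capacity utilization samples into a simple recommendation. The 80% and 30% thresholds, the use of a high percentile, and the function name are our own illustrative assumptions rather than Microsoft guidance; the samples would come from whatever capacity monitoring you have in place.

```python
def sku_recommendation(utilization_pct, high=80.0, low=30.0):
    """Suggest a SKU action from capacity utilization samples (in %).

    Uses a simple nearest-rank style approximation of the 95th
    percentile so that short spikes do not drive the decision.
    Thresholds are hypothetical defaults.
    """
    if not utilization_pct:
        raise ValueError("no utilization samples provided")
    ordered = sorted(utilization_pct)
    index = min(len(ordered) - 1, int(0.95 * len(ordered)))
    p95 = ordered[index]
    if p95 > high:
        return "scale up"
    if p95 < low:
        return "scale down"
    return "keep"


print(sku_recommendation([20, 25, 22, 28]))
print(sku_recommendation([50, 60, 95, 90]))
```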

Conclusion

Migrating from Azure Data and AI stack or other CSPs to Microsoft Fabric requires a thoughtful, staged approach. By evaluating the amenability of different Azure services and understanding how they map to Fabric features, organizations can ensure a methodical transition. With the right planning, this migration can enhance operational efficiency and unlock new capabilities within the Fabric ecosystem.

About the author

Vinay Prabhu is the Director, Data Engineering at Sigmoid. He has over 10 years of experience across Azure and AWS data and analytics stacks. With his extensive knowledge and experience in data engineering and analytics projects, he helps enterprises in CPG, Manufacturing, and BFSI extract meaningful insights from data to drive informed decision-making.

 
