The Extract Load Transform (ELT) framework is a metadata-driven orchestration framework designed for modern cloud data platforms. It simplifies ingestion and transformation pipelines, ensuring a consistent development experience and ease of maintenance. The framework supports batch ingestion and has been extensively tested with Microsoft Fabric and Azure managed services like Azure Databricks and Azure Synapse. It utilizes an ANSI-compatible control database as the metadata repository.
- Configurable and Extendable: Easily adapt the framework to meet specific needs.
- Data Source Agnostic: Ingest data from various sources such as databases, Delta Lake, REST API, flat files, JSON, XML, without storing connection strings as metadata.
- Delta and Full Loads: Support for both incremental and full data loads.
- Re-run and Retry Capability: Automatically handle failures without manual intervention.
- In-built Audit Tracking: Track data processing activities with built-in audit capabilities.
- Extended Audit Capability: Enhance audit tracking with Azure PaaS services like Diagnostic Logging.
- Eliminates Manual Data Patching: Streamline data processing by removing the need for manual interventions.
- Data Lineage Support: Maintain data lineage throughout the data lifecycle.
- Level1 and Level2 Transformations: Support for one-to-many and many-to-many transformations.
- On-demand Pipeline and Transformation Management: Enable or disable pipelines and transformations as needed.
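To make a few of the features above concrete, here is a minimal sketch of metadata-driven orchestration. It is not the framework's actual schema or code: `PipelineConfig`, `build_query`, `run_all`, and every field name are hypothetical, chosen only to illustrate enable/disable flags, delta versus full loads, and retry with a simple audit trail. See the Wiki for the real configuration metadata.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class PipelineConfig:
    """One hypothetical metadata row describing an ingestion pipeline."""
    name: str
    enabled: bool                    # on-demand enable/disable switch
    load_type: str                   # "full" or "delta"
    watermark: Optional[str] = None  # last loaded value, used by delta loads
    max_retries: int = 2             # extra attempts after the first failure


def build_query(cfg: PipelineConfig, table: str, column: str) -> str:
    """Return a full-load query, or a delta query filtered by the watermark.

    Illustration only: real code should use parameterized queries.
    """
    if cfg.load_type == "delta" and cfg.watermark is not None:
        return f"SELECT * FROM {table} WHERE {column} > '{cfg.watermark}'"
    return f"SELECT * FROM {table}"


def run_all(configs: List[PipelineConfig],
            runner: Callable[[PipelineConfig], None]) -> Dict[str, str]:
    """Run every enabled pipeline, retrying failures, and return an audit map."""
    audit: Dict[str, str] = {}
    for cfg in configs:
        if not cfg.enabled:
            audit[cfg.name] = "skipped"
            continue
        for attempt in range(1 + cfg.max_retries):
            try:
                runner(cfg)
                audit[cfg.name] = f"succeeded on attempt {attempt + 1}"
                break
            except Exception as exc:  # record the failure, then retry
                audit[cfg.name] = f"failed: {exc}"
    return audit
```

The point of the design is that behaviour changes by editing metadata rows rather than pipeline code: flipping `enabled` or `load_type` in the sketch changes what runs without touching `run_all`.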
Key concepts and configuration metadata are explained in detail in the Wiki.
To get started, follow these steps:
- Clone or Fork the Repository: Start by cloning or forking the repository from github.com/bennyaustin/elt-framework.
- Deploy ControlDB: The GitHub Actions workflow workflows/ControlDB-deployment.yml deploys the controlDB objects.
Pre-Requisites
- controlDB is already provisioned by an IaC process such as 07-IaC-Bicep or iac-synapse-dataplatform.
- Create a new Service Principal or reuse an existing one. Take note of the Application (client) ID and the secret; they will be required later.
- Grant db_owner permission to the Service Principal:
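-- Create a database user in controlDB for the Service Principal
CREATE USER [<service_principal>] FROM EXTERNAL PROVIDER
GO
-- Add that user to the db_owner role
EXEC sp_addrolemember 'db_owner', [<service_principal>]
GO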
This GitHub Action requires the following repository secrets:
- CLIENT_ID: Client/Application ID of the Service Principal.
- CLIENT_SECRET: Service Principal Secret.
- SUBSCRIPTION_ID: Azure Subscription ID of controlDB.
- TENANT_ID: Entra Tenant ID of controlDB.
- CONTROLDB_CONNECTIONSTRING: controlDB connection string in service principal authentication format.
Server=<SQL Server>;Authentication=Active Directory Service Principal;Encrypt=True;Database=controlDB;User Id=<Service Principal Client/Application ID>;Password=<Service Principal Secret>
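As a convenience, the connection string format and the list of required secrets can be sketched in Python. This is an illustrative helper, not part of the repository: the function names `build_controldb_connection_string` and `missing_secrets` are assumptions, and the server name in the usage note is a made-up example.

```python
from typing import List, Mapping

# Secret names exactly as listed for the GitHub Action.
REQUIRED_SECRETS = (
    "CLIENT_ID",
    "CLIENT_SECRET",
    "SUBSCRIPTION_ID",
    "TENANT_ID",
    "CONTROLDB_CONNECTIONSTRING",
)


def build_controldb_connection_string(server: str,
                                      client_id: str,
                                      client_secret: str,
                                      database: str = "controlDB") -> str:
    """Assemble the connection string in service principal authentication format."""
    parts = [
        f"Server={server}",
        "Authentication=Active Directory Service Principal",
        "Encrypt=True",
        f"Database={database}",
        f"User Id={client_id}",
        f"Password={client_secret}",
    ]
    return ";".join(parts)


def missing_secrets(values: Mapping[str, str]) -> List[str]:
    """Return the required secret names that are absent or empty in `values`."""
    return [name for name in REQUIRED_SECRETS if not values.get(name)]
```

For example, `build_controldb_connection_string("myserver.database.windows.net", client_id, client_secret)` yields a value suitable for the CONTROLDB_CONNECTIONSTRING secret, and `missing_secrets(os.environ)` can be used as a quick pre-flight check in a local script. Avoid printing or committing the resulting string, since it contains the Service Principal secret.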
Now select Run workflow on the GitHub Action ControlDB-deployment.yml to deploy the database objects.
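The same manual run can also be requested through GitHub's REST API "create a workflow dispatch event" endpoint. The sketch below only builds the request and does not send it. It assumes the workflow is triggered from a branch named `main` and that you hold a token with permission to run workflows on your clone or fork; `build_dispatch_request` and its parameters are hypothetical names.

```python
import json
import urllib.request


def build_dispatch_request(owner: str,
                           repo: str,
                           token: str,
                           workflow_file: str = "ControlDB-deployment.yml",
                           ref: str = "main") -> urllib.request.Request:
    """Build (but do not send) a workflow dispatch request for the given repo."""
    url = (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/actions/workflows/{workflow_file}/dispatches"
    )
    # `ref` is the branch or tag to run the workflow on (assumed "main" here).
    body = json.dumps({"ref": ref}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )
```

To actually trigger the run, pass the returned object to `urllib.request.urlopen`. Use your own account and repository name for `owner` and `repo` if you forked the project.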
- Microsoft Fabric data platform using ELT Framework
- Azure Databricks data platform using ELT Framework
- Azure Synapse data platform using ELT Framework
You can collaborate in various ways, including:
- Pull Requests
- Update/Enrich Wiki documentation
- Raise an issue when you spot one
- Answer questions in the discussion forum
Please contact me to be added as a contributor.
If you have any questions or need support, please contact the maintainer: