Data Engineer
Tavant
Job Description
Lakehouse Architecture & Engineering
- Design and implement scalable end-to-end Lakehouse solutions using Azure, Databricks, Python, and PySpark.
- Develop enterprise-grade data pipelines that source data from ERP, CRM, and field-device and on-site operational applications.
- Implement the Medallion Architecture (Bronze, Silver, Gold) with a focus on scalability, governance, and maintainability.
- Design robust Silver Layer semantic data models aligned with enterprise common data models.
Metadata & Semantic Layer Engineering
- Develop and maintain semantic data layers, including:
  - Data Catalogs
  - Data Dictionaries
  - Business Glossaries
- Implement metadata-driven transformation and engineering frameworks.
- Manage metadata lifecycle, lineage, and governance processes.
- Handle schema drift and schema evolution by automatically regenerating metadata and data dictionaries.
AI-Driven Automation & Productivity Engineering
- Design AI-assisted solutions that extract and generate metadata, data dictionaries, and semantic mappings from:
  - Source datasets
  - Application code
  - Defined business rules
- Develop Databricks-based automation frameworks that dynamically generate transformation notebooks from metadata definitions.
- Leverage AI capabilities to improve engineering productivity, automation, governance, and operational efficiency.
- Continuously explore and adopt emerging Azure and Databricks AI capabilities.
Data Processing, Modeling & Optimization
- Develop scalable data transformations, functions, procedures, and reusable business computation logic.
- Build curated aggregates, summaries, snapshots, and analytics-ready datasets for downstream consumption.
- Optimize distributed processing workloads for:
  - Performance
  - Scalability
  - Cost efficiency
- Implement data quality frameworks including validation, reconciliation, observability, and governance controls.
Self-Service BI & Consumption Enablement
- Enable governed Self-Service BI capabilities using Databricks Genie, Databricks APIs, and embedding tools.
- Securely expose semantic and Silver-layer datasets to business and functional stakeholders.
- Capture and analyze usage telemetry and logs to continuously improve data platform usability and adoption.
Required Skills & Expertise
Core Technical Skills
- Strong expertise in:
  - SQL
  - Data Modeling
  - Python
  - PySpark
  - Azure
  - Databricks
- Strong understanding of:
  - Medallion Architecture
  - Lakehouse Architecture
  - Metadata-driven engineering
  - Distributed data processing
  - Semantic data layers
- Experience integrating data from:
  - ERP systems
  - CRM platforms
  - Field-device and operational applications
- Experience with Databricks system tables and platform observability.
AI & Advanced Engineering Capabilities
- Exposure to AI/ML-enabled data engineering solutions.
- Experience leveraging AI capabilities for engineering productivity and automation.
- Understanding of metadata automation and AI-assisted semantic enrichment approaches is preferred.