Data Engineer III
Cencora
Job Description
Data Engineering & Pipeline Development
- Design, build, and maintain large-scale, fault-tolerant data pipelines using Python/PySpark, Databricks, Delta Lake, and orchestration tools (e.g., Airflow, Azure Data Factory).
- Develop and optimize ETL/ELT workflows to support ingestion, transformation, and modeling of large datasets into a Lakehouse using Delta Lake: batch ingestion from files, databases, and APIs; streaming using Structured Streaming; handling semi-structured data (JSON, Parquet, Avro); ELT patterns using Spark SQL / PySpark; incremental processing patterns; Databricks Jobs; and external orchestrators (ADF, Airflow, etc.).
- Apply hands-on experience with SAP ECC or SAP S/4HANA data extraction and processing.
- Implement CDC, incremental loads, and full refresh patterns; handle schema evolution and data reconciliation.
- Develop and maintain curated data models (bronze/silver/gold) and support BI/analytics consumption.
- Optimize performance and cost (partitioning, Z-ORDER, file sizing, caching, cluster policies, job tuning).
- Implement scalable data lake and analytical platform architectures on Azure, ensuring security, governance, and cost efficiency.
- Automate repeatable ingestion processes using infrastructure as code (IaC) and continuous integration/continuous delivery (CI/CD) deployment methodologies.
- Develop robust data models and semantic layers to facilitate analytical consumption by auditors and Data Analytics teammates.
Data Quality, Monitoring & Governance
- Create and manage data quality checks, anomaly detection routines, and automated alerting to ensure the accuracy and integrity of audit datasets and to support SLA-driven operations.
- Establish repeatable processes for documenting data lineage, validation, reconciliation, and test coverage.
- Implement scalable frameworks for metadata management, schema validation, and versioning of data pipelines.
Audit Collaboration & Analytics Support
- Support Internal Audit (IA) audit execution by enabling access to clean, reliable, and well-documented datasets.
- Provide SME-level guidance on data availability, data structures, pipeline behavior, and data limitations.
Standards, Innovation & Best Practices
- Establish consistency in design patterns, coding approaches, documentation, and engineering standards.
- Identify opportunities to modernize or optimize existing pipelines, architecture, or data processing patterns.
- Contribute to the continuous improvement of the Internal Audit analytics program through automation, performance tuning, and new capability development.
- Create and maintain technical documentation, runbooks, and onboarding guides.
- Participate in code reviews and promote engineering best practices (testing, CI/CD, version control).