The Infrastructure Reliability Engineer is responsible for ensuring the reliability, performance, availability, and quality of IT infrastructure through automation, monitoring, KPIs, and continuous process improvement.
This role bridges quality assurance, IT automation, and data reporting, contributing to compliance with ISO/IEC standards, operational efficiency, and ongoing process improvement.
Key Responsibilities:
1. Quality Management & Compliance
- Implement, maintain, and improve the IT Quality System.
- Ensure compliance with ISO standards (ISO/IEC 27001, ISO/IEC 20000, ISO 9001, or others depending on scope).
- Participate in internal and external audits; prepare required documentation and evidence in collaboration with the IT GOV team.
- Conduct gap analyses and define corrective and preventive action plans.
- Promote best practices, procedures, and governance across teams.
2. Reporting & Data Analytics
- Design and maintain Power BI dashboards and automated KPI reports.
- Manage data models and ETL flows.
- Produce actionable insights to support decision-making and operational performance tracking.
- Ensure data quality, consistency, and reliability across reporting sources.
- Work with stakeholders to define business metrics and reporting needs.
3. Scripting & Automation
- Develop automation scripts using PowerShell/Python for routine operations, configuration tasks, and system health checks.
- Use Terraform to deploy, manage, and maintain infrastructure as code (IaC) across cloud/hybrid environments.
- Create reusable automation modules and maintain version control practices (Git).
- Optimize processes to reduce manual workload and increase efficiency.
4. Architecture Documentation (C4 Model)
- Use the C4 Model to produce clear architectural views at the appropriate level of detail (system context, container, component, code).
- Document automation flows, reporting architecture, and infrastructure as code using C4 notation.
- Collaborate with architects and engineers to maintain accurate system documentation.
5. Operational Support & Continuous Improvement
- Collaborate with technical teams (infrastructure, cloud, security, service management).
- Identify opportunities for efficiency improvements and propose automation initiatives.
- Support incident, problem, and change management processes.
Required Skills & Experience:
✅ Technical Skills
- Power BI: dashboards, DAX, Power Query, data modeling.
- Scripting: PowerShell (advanced scripting, automation pipelines).
- Terraform: IaC modules, providers, deployment pipelines.
- Git and CI/CD tools (e.g., Azure DevOps, GitHub Actions).
- Good understanding of infrastructure, cloud (Azure/AWS), virtualization, or networking fundamentals.
✅ Quality & Compliance Skills
- Knowledge of ISO standards:
  - ISO/IEC 27001 (information security)
  - ISO/IEC 20000 (service management)
  - ISO 9001 (quality management)
- Understanding of service management and governance frameworks (ITIL; knowledge of COBIT is a plus).
- Experience participating in or leading audits, assessments, and documentation reviews.
✅ Soft Skills
- Strong analytical and problem‑solving skills.
- Ability to translate technical concepts into business‑friendly reports.
- Excellent communication and documentation writing abilities.
- Autonomous, structured, and proactive mindset.
- Ability to work cross‑functionally with technical and management teams.
✅ Education & Certifications (Nice to Have)
- Bachelor’s or Master’s degree in IT, Computer Science, Data Analytics or related field.
- Certifications such as:
  - Microsoft Certified: Power BI Data Analyst Associate
  - HashiCorp Certified: Terraform Associate
  - ISO Lead Auditor or Internal Auditor
  - ITIL Foundation