Job Description
Job Summary
We are seeking a highly experienced Senior / Lead Linux Engineering Support Engineer to lead and develop a small team supporting engineering systems within a fast-paced AI-focused environment.
This role combines deep Linux expertise with strong leadership, automation, and DevOps practices to ensure systems are reliable, scalable, and supportable as the organisation grows. A key aspect of the role is establishing and operating within a configuration-as-code environment, where system configuration and operational processes are managed through automation, pipelines, and source control rather than manual administration.
You will be responsible for leading incident response, driving operational improvements, and setting standards for how Linux systems are managed and supported across the organisation.
Although the role includes leadership responsibilities, it will initially be hands-on: you will be directly involved in troubleshooting, system support, and automation while building team capability and scaling processes.
Working closely with engineering teams, platform engineers, and infrastructure specialists, you will ensure systems remain stable, performant, and aligned with evolving business and product delivery needs.
The Team
You’ll be joining a multi-disciplinary team with strong technical skills and a very supportive culture. We work closely together, regularly share knowledge, and your skills will make a direct impact on our business. It’s an exciting and pivotal moment for us right now, with plenty of new projects ahead. If you’re looking to solve interesting problems and see your work deliver real-world results, this is the team for you.
Responsibilities and Duties
Lead, mentor, and develop a team of Linux Engineering Support Engineers, establishing clear roles, responsibilities, and ways of working
Own and oversee support for Linux-based systems and engineering environments, ensuring stability, performance, and availability
Act as an escalation point for complex technical issues and outages, providing hands-on support where required
Diagnose and resolve high-impact system and interoperability issues across mixed and distributed environments
Investigate and troubleshoot issues hands-on, identifying their underlying causes and driving them to resolution
Lead incident response activities, including triage, coordination, and resolution
Own and drive Root Cause Analysis (RCA) processes, ensuring preventative improvements are identified and implemented
Establish and improve incident management processes, driving operational maturity and reliability
Drive adoption of automation and configuration-as-code practices across Linux systems
Ensure system changes are delivered through controlled, auditable processes wherever possible
Oversee development and implementation of automation solutions for system management and operational tasks
Promote and enforce the use of Git-driven workflows and CI/CD pipelines for configuration and operational processes
Identify and prioritise opportunities to reduce manual effort through automation and improved tooling
Work closely with engineering teams to support development environments and system requirements
Act as a senior technical liaison between engineering teams and infrastructure/platform functions
Support onboarding of new systems, services, and environments using standardised and automated approaches
Ensure system configurations remain consistent and aligned with defined standards and governance
Oversee integration points (e.g. identity, CI/CD, tooling) and ensure issues are resolved effectively
Identify and drive improvements in system performance, scalability, and maintainability
Contribute to and enforce documentation, standards, and operational best practices
Ensure systems meet audit, compliance, and governance requirements, with full traceability of changes
Essential
Extensive experience administering and supporting Linux-based systems in complex technical or engineering environments
Strong troubleshooting skills across operating systems, networking, storage, and application layers
Proven experience diagnosing and resolving complex technical issues, including across mixed or distributed environments
Proven experience handling major incidents and outages, including leading resolution and contributing to Root Cause Analysis (RCA)
Strong experience with automation and scripting (e.g. Bash, Python, or similar)
Strong experience with configuration management or infrastructure-as-code tools (e.g. Ansible, Terraform, Puppet, or similar)
Experience working with configuration-as-code practices and Git-driven workflows
Experience designing, implementing, or supporting CI/CD pipelines for configuration and operational processes
Strong understanding of system interoperability across distributed environments
Experience working within defined standards, governance frameworks, and controlled processes
Strong communication skills and ability to work closely with engineering, platform, and infrastructure teams
Experience mentoring or supporting the development of other engineers
Ability to operate effectively across time zones in a distributed organisation
Proven ability to operate independently, set direction, and deliver outcomes
Desirable
Experience leading or coordinating incident response activities
Experience working alongside DevOps, platform, or infrastructure engineering teams
Experience with monitoring, observability, and logging systems
Experience supporting AI/ML or high-performance computing environments
Understanding of identity and access management concepts
Experience building or scaling operational processes or support functions