Senior SRE Engineer at Kronos Research
Job Description
📋 Description
- Manage large-scale Linux environments: troubleshooting and root-cause analysis
- Write maintainable Bash / Ansible / Python automation
- On-call for infrastructure, CI/CD, and production incidents
- Operate HPC clusters (Slurm) with analytics, auditing, and monitoring
- Maintain storage for compute environments (Lustre, NAS)
- Build internal AI platforms and services (LangChain, Bedrock, Elasticsearch RAG)
🎯 Requirements
- 5+ years of Linux systems administration and infrastructure operations
- Strong Linux internals: processes, memory, filesystems, networking, systemd
- Bash / Shell scripting; maintainable automation
- Python for data processing and API tools
- Storage: RAID, NAS, snapshots, backups
- Cloud + IaC: AWS / GCP / Alibaba Cloud; Terraform / CDK / Ansible
- Docker & Kubernetes familiarity
- CI/CD design and operations: GitLab CI / Jenkins / Airflow
