Infrastructure & Reliability Engineer at StratiFi Technologies
Job Description
1. Linux System Administration
- Navigates Linux environments confidently and efficiently
- Troubleshoots system-level issues (disk, memory, networking, processes)
- Manages services, cron jobs, and system configurations
- Understands file permissions, users, and security hardening
- Writes shell scripts to automate routine system tasks
- Can diagnose and resolve production server issues under pressure
2. AWS Cloud Infrastructure
- Configures and manages core AWS services (EC2, ECS, RDS, S3, Lambda, VPC)
- Understands networking concepts (VPCs, subnets, security groups, load balancers)
- Implements auto-scaling and high-availability patterns
- Uses CloudWatch for monitoring, logging, and alerting
- Manages IAM roles, policies, and access controls
- Can troubleshoot AWS service issues and performance problems
3. Docker & Containerization
- Writes and optimizes Dockerfiles for production services
- Manages container lifecycle (build, deploy, scale, debug)
- Understands container networking, volumes, and resource limits
- Works with container registries and image management
- Deploys containerized workloads on AWS ECS or similar platforms
- Troubleshoots container-level issues in production
4. CI/CD Pipelines & Deployment
- Builds and maintains CI/CD pipelines for automated testing and deployment
- Configures GitHub Actions or CircleCI workflows
- Implements deployment strategies (blue-green, rolling, canary)
- Manages environment variables and secrets in CI/CD systems
- Troubleshoots pipeline failures and optimizes build times
- Ensures pipelines enforce code quality and security checks
5. Scripting & Automation
- Writes clean, maintainable Bash and Python scripts
- Automates repetitive infrastructure and operational tasks
- Builds monitoring and alerting scripts for custom use cases
- Uses Python libraries for AWS (boto3), data processing, and API integrations
- Keeps all scripts and automation under version control
- Documents scripts and provides usage instructions
6. Monitoring, Alerting & Incident Response
- Sets up monitoring dashboards that surface real issues, not noise
- Configures alerting thresholds that catch problems before customers notice
- Responds to incidents methodically and escalates appropriately
- Participates in post-mortems and contributes to actionable improvements
- Uses logs, metrics, and profiling tools to diagnose issues
- Stays calm when systems are down and contributes to resolution efforts
7. Version Control & Infrastructure as Code
- Uses Git confidently for daily work (branches, merges, rebases)
- Follows team branching strategy (GitFlow, trunk-based, etc.)
- Writes clear commit messages and PR descriptions
- Reviews infrastructure and automation code from teammates
- Manages infrastructure-as-code repositories (CloudFormation, Terraform)
8. Team Collaboration & Communication
- Collaborates effectively with backend, frontend, AI, and product teams
- Communicates infrastructure changes and their impact clearly
- Participates in planning, task breakdown, and sprint ceremonies
- Takes ownership of tasks and delivers reliably without constant supervision
- Shares knowledge and documents processes for team benefit

