Site Reliability Engineer at Cognition


Cognition is Hiring

Job Info:
  • Company: Cognition
  • Position: Site Reliability Engineer
  • Location: San Francisco
  • Source: Empllo
  • Published: April 09, 2026
  • Category: DevOps
  • Type: Full-Time


📋 Description

  • Define and own SLOs, SLIs, and error budgets for Devin and Windsurf.
  • Build monitoring, alerting, and observability for service health.
  • Lead fast incident response and run blameless postmortems.
  • Create runbooks and tooling for sustainable on-call.
  • Own CI/CD pipelines and deployment infrastructure.
  • Reduce toil with automation and developer tooling.

🎯 Requirements

  • Deep experience running production systems at scale, including SLOs, on-call, and incident command.
  • Strong software fundamentals; SREs write real code rather than only configuring tools.
  • Experience with cloud infrastructure (AWS, GCP, or Azure), Kubernetes, and Terraform.
  • Experience building and owning CI/CD pipelines and deployment infrastructure.
  • Strong observability instincts: able to instrument systems and design useful alerts.
  • Proven track record of reducing toil through automation.

🎁 Benefits

  • Small, selective team shipping products used by thousands of developers.
  • High ownership and trust; you set the reliability bar.
  • An environment that rewards proactive, systematic reliability work as a craft.
