Senior Site Reliability Engineer K8s (remote) at Replicated



Replicated is Hiring

Job Info:
  • Company Replicated
  • Position Senior Site Reliability Engineer K8s (remote)
  • Location United States
  • Source Himalayas
  • Published November 09, 2023 (45+ days ago)
  • Category DevOps
  • Type Full-Time
  • Experience Senior
  • Salary $155k - $195k


Job Description

Replicated is the modern way to ship and monitor multi-prem software. Replicated helps software vendors quickly and securely deploy their applications to any customer using a single architecture. The Replicated platform provides all of the tools needed to operationalize and scale the distribution of Kubernetes applications into any enterprise environment.

Our customers include HashiCorp, Puppet, SmartBear, Jama, Swimlane, Tripwire, Acrolinx, Knime, and many other fast-growing enterprise software vendors. We're a Series C funded company (over $80m raised) with long-term investors and a long-term focus.

Replicated is committed to cultivating an efficient, respectful workplace. We know that innovation thrives on teams where diverse points of view come together to solve hard problems in ways that are just now possible. As such, we explicitly seek people who bring diverse life experiences, diverse educational backgrounds, diverse cultures, and diverse work experiences.

We are fully remote and plan to stay that way! We're open to any state in the US. In addition, for some roles, we're open to candidates in Canada, the UK, Israel, Australia, and New Zealand.

Replicated is hiring a Senior Site Reliability Engineer to join our growing team! The Site Reliability Engineering team works to ensure that the company's services maintain the high level of reliability needed by our engineers and vendors. You’ll monitor and troubleshoot systems (Containers, Kubernetes, Cloudflare), using tooling to increase efficiency (Terraform, GitHub, Flux, Datadog).

Replicated builds products for independent software vendors, which are primarily used by infrastructure and release engineering teams. You’ll need to understand our customers’ needs as they influence the way our SREs build, manage and maintain our own infrastructure. As such, you’ll need to embed yourself in our engineering pods and be part of the product development lifecycle. We’re proud to champion a blameless culture focused on continuous improvement, and highly value interpersonal skills as critical in handling incidents, effectively communicating technical issues, and managing conflicts in a productive way.

This role is based in the US.

The Vision for Success:

  • Manage the Replicated software infrastructure and the Kubernetes clusters it runs on.

  • Embed yourself with various product development pods to support their needs and act as a subject matter expert in the areas of infrastructure, security and systems reliability.

  • Troubleshoot failures in tooling and infrastructure and build sustainable fixes.

  • Collaborate with development teams to establish internal SLOs and SLIs and to improve the observability of our systems.

  • Write code and help support new services, products, and features.

  • Grow our SRE practice at Replicated to support an enhanced level of maturity.

  • Participate in on-call responsibility and lead incident response by remediating or triaging issues.

The Value you Will Bring:

  • 3+ years experience as a Site Reliability Engineer

  • Strong command of at least 1 programming language

  • Must have hands-on experience deploying Kubernetes in the cloud

  • Deep experience with at least one cloud provider (AWS, GCP, Azure)

  • Must have hands-on experience with EKS, Terraform and Datadog. Additional value if you’ve got experience with Cloudflare and Tailscale

  • Our systems are complex, and you bring your inherent passion to learn to the role

  • You have a growth mindset, the curiosity to learn, and commitment to a culture of psychological safety

  • Excellent collaboration and technical communication skills

Bonus Points:

  • Strong, well-reasoned opinions on technology

  • Development experience in Go, TypeScript, or Bash

  • Solid understanding of CI/CD and the modern SDLC

  • With the recent release of our Compatibility Matrix product, a preference is given to those with deep experience in AWS plus one other cloud provider

  • Emphasis on the human components that make for a great SRE practice

  • You’ve signed up for our free trial and used the product before your interviews.

In your first 30 Days

  • Learn as much as you can about the company, team and product

  • Complete hands-on training with the product, complete an onboarding checklist and meet with team members from across the company

  • Improve the onboarding process as you notice rough edges, bringing your fresh perspectives to identify areas where we can do better

  • Spend time with the team understanding our work and current initiatives

  • Pair with members of the team to grow your knowledge of our infrastructure and ask questions to understand how we got to where we are

  • Make at least one improvement to our GitOps systems, and review and approve numerous PRs from team members

In your first 60 days

  • Contribute individual work to existing initiatives to deepen your understanding of Replicated’s infrastructure

  • Shadow engineers in the on-call rotation, learning the process and making suggestions for improvements (both technical and procedural)

  • Examine the systems you are learning, seeking to understand the “why” and challenge the answers as appropriate

  • Continue to grow your understanding of Replicated (how our products are developed, how our services connect and interact) while building relationships with individuals

In your first 90 days

  • Take a leadership role for one of the team’s priorities, coordinating within and beyond the infrastructure team to accomplish the goal as you see fit

  • Use your expertise to improve the reliability of Replicated services in collaboration with development and security teams

  • Join the on-call rotation as a fully fledged member, leading incident command and learning reviews as needed

  • Continue to learn and grow. Replicated is committed to ongoing individual growth and there will always be opportunities that require it

Pay transparency

At Replicated, we value our teammates as individuals who are stronger together. We offer a robust pay and benefits package that rewards employees for their contributions to our success, supports their well-being, and helps all of us create a great remote work environment.

In the US, the salary range for this role is as follows:

Sr. Site Reliability Engineer I: $155,000 - $180,000

Sr. Site Reliability Engineer II: $175,000 - $195,000

This is dependent on several factors, including level, qualifications, and experience. We also offer stock options, a strong health insurance package, as well as a unique home office allowance & a professional development budget. An overview is on our careers page here: https://www.replicated.com/careers/

Not sure you meet 100% of our qualifications? Please apply anyway!

We invest in our team and love candidates who are eager to learn and grow. We have a fantastic team of highly collaborative individuals who enjoy learning, growing, and mentoring others.

OUR CORE VALUES

Care Deeply: Care deeply about the work that you do. Because of that, you are constantly learning and willing to go out on a limb, challenge assumptions, go back to first principles, etc.

Longterm: Treat every interaction as part of a 30-year relationship; you’ll see everyone down the road again as customers, partners, coworkers, etc.

Curious: We're always learning and we approach everyone and every problem with curiosity. When needed, we challenge assumptions and go back to first principles.

BENEFITS

We offer strong benefits to help you stay healthy and productive. For the US, our benefits are listed below:

  • Health/Dental/Vision

  • Life/AD&D

  • LTD/STD

  • FSA

  • 401K

  • Stock options

  • Partner perk programs

  • Generous time off; we expect you to take a minimum of 3 weeks off per year

  • Laptop and the accessories you need to get set up

  • Generous home office setup allowance or co-working space allowance - up to $10,000 per year!

  • Curiosity Budget to help you keep learning and growing!

Replicated is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. We encourage applicants of all backgrounds and we work to make sure that all team members have an equal opportunity to succeed.

We do not accept unsolicited assistance from any headhunters, recruitment firms or any other third party for any of our job openings. Any unsolicited resumes sent from anyone other than the candidate, in any format, to any person at Replicated, will be considered Replicated property. Replicated will NOT pay a fee for any placement resulting from the receipt of an unsolicited resume.

Originally posted on Himalayas

