SRE Manager/Program Lead - drive reliability across a 700-strong engineering team!


  • java
  • amazon-web-services
  • python
  • database

Quick Facts

Boston, MA, US
Senior, Lead, Manager
Remote & On-site
$150,000 - $200,000

Job benefits

  • Unlimited vacation
  • Competitive salary and company equity
  • Free fruit, yogurt, cereal, snacks, and coffee
  • Flexible hours and work environment
  • An engineering culture team dedicated to developer growth & happiness
  • Management and technology workshops
  • $5,000 tuition reimbursement
  • Healthy @ HubSpot wellness programs
  • Resource Groups: PeopleofColor@HubSpot, LGBTQ Alliance, Women@HubSpot
  • Parental Benefits, Programming, & Perks (Bring Your Kids to Work Day)

Job description

About the role

The HubSpot Product team is made up of over 700 engineers, designers, product managers, and researchers. We’re passionate about building tools that help small and medium-sized businesses market, sell, and serve their customers — and ultimately, grow better.

Those tools end up in the HubSpot application platform, which itself is made up of thousands of services, workers, and jobs spanning over 100 teams and thousands of repos. Our teams work autonomously to deploy these systems across a common infrastructure, up to 2,000 times a day. As we’ve grown to serve over 65,000 customers in 100 countries, reliability and stability have become just as important as speed and time to market. And as we’ve opened up our APIs, our product has moved to the core of many of our customers’ and partners’ businesses.

In 2019, we built an SRE team to help our product teams focus on delivering highly available and dependable products. This team is off to a great start: evangelizing, building tools, and embedding in product teams. However, the team needs a strong communicator and thoughtful process leader to help scale its impact faster and more broadly.

What you’ll do

  • Work directly with the SRE team to identify and catalog learnings and best practices

  • Constantly communicate and evangelize best practices internally and externally, through writing and speaking 

  • Help run and improve blameless operational incident reviews (postmortems)

  • Help organize, run, and improve game days and chaos testing

  • Work with engineering leaders to give and receive feedback about how to improve reliability and incident reviews

  • Strategize and communicate to customers about reliability and operational incidents

  • Work across the entire HubSpot product team to proactively identify risks, organize those risks, and follow up with teams to ensure mitigation

  • Help design, iterate on, and run processes that improve reliability and performance while aligning with our product values to maintain product team autonomy and momentum

What we’re looking for

  • Experience with SRE culture, improving reliability with automation, chaos testing, and process improvement

  • Technical experience designing and operating systems and cloud infrastructure at scale

  • Experience implementing and iterating on process to improve outcomes with minimal disruption to team culture

  • Experience working across multiple stakeholders to drive effective change
