Site Reliability Engineering Manager

Engineering Team | San Francisco, CA

Dropbox is the home for your most important stuff—now we're bringing it to life with a growing family of products. As we scale our global brand, there’s plenty of space for you to grow alongside us and simplify life for millions of people around the world.

Our engineering team is architecting a family of products that handle over a billion files a day. We take on the complexities of technology that affect everyday life, so that people can get back to living and doing their best work.

Dropbox's Site Reliability Engineering (SRE) team is a hybrid software/systems group that works with traditional software engineering, capacity engineering, and infrastructure teams to ensure that Dropbox runs smoothly. Managing an SRE team requires a high degree of technical mastery, the ability to prioritize ruthlessly and execute, and a focus on growing teams through both recruiting and mentorship.

Responsibilities

  • Manage engineers working with infrastructure and product engineering teams. Example services may include our metadata storage infrastructure run on MySQL, Go-based processes serving as RPC systems for front-end components, or front-end components themselves, such as the Photos tab
  • Understand technical architectures, failure domains, tooling/automation, product launch plans, disaster recovery/business continuity plans, and other issues
  • Create plans for prioritizing technical and resourcing challenges within the infrastructure organization
  • Partner with product management, network engineering, product engineering, and other related groups
  • Help engineers develop their careers, assigning them to projects tailored to their skill levels, long-term skill development, personalities, and work styles
  • Build teams and work closely with Dropbox recruiting team members, which involves sourcing/engaging candidates, interviewing, organizing Dropbox participation in conferences/events, and onboarding new employees
  • Balance the need to "keep things running" with allocating time to long-term, high-impact projects
  • Assess employee performance frequently by providing feedback on an ongoing basis, address under-performance, and recognize excellent performance

Requirements

  • BS/MS in Computer Science, Engineering, or a related technical discipline, or equivalent experience
  • At least three years of direct management and leadership experience at a technology company
  • Previous experience with hiring and performance management, including working with under-performers
  • Sound knowledge of Linux and TCP/IP networks
  • Ability to code well in at least one language
  • Above-average knowledge of basic large-scale internet service architectures (such as load balancing, LAMP, CDNs)
  • Good understanding of data durability (backups, maximum time to recovery, and, more generally, how to avoid losing data)