Hadoop Site Reliability Engineer

Engineering Team | San Francisco, CA

Dropbox is the home for your most important stuff, and now we’re bringing it to life with a growing family of products. As we scale our global brand, there’s plenty of space for you to grow alongside us and simplify life for millions of people around the world.

Our engineering team is architecting a family of products that handle over a billion files a day. We take on the complexities of technology that affect everyday life, so that people can get back to living and doing their best work.

Hadoop use at Dropbox is growing, and we’d like you to join us. With over 1,000 nodes in production, we’re looking for engineers to join the small team that’s shaping Hadoop at Dropbox into a cohesive, centrally managed data platform. Today Dropbox primarily uses Hadoop for HDFS, HBase, MRv1, Hive, and Presto.

Responsibilities

  • Get involved in every part of our Hadoop stack—from the earliest stage of system design and development to deployment, troubleshooting, and performance analysis
  • Design and build tools to manage a rapidly growing number of services
  • Work with various teams including Analytics, Data Infrastructure, System Engineering, and Capacity Planning
  • Help build tooling for testing, monitoring, capacity planning, and hardware acceptance
  • Have the freedom to open source your contributions to the Hadoop ecosystem
  • Participate in a periodic on-call rotation

Requirements

  • 4+ years of SRE experience, including 2+ years of Hadoop operations experience. Experience with HBase and other Hadoop components is a bonus.
  • Extensive experience managing large-scale systems.
  • Expert-level Linux system administration skills. Ubuntu Linux is a plus.
  • Shell scripting and high-level language expertise. We like Python a lot. We like Go, too. Experience in JVM performance tuning is a plus.
  • Fanaticism about automation—make the computers do the work for you.