Site Reliability Engineer

Job Description / Functions

  • As NetActuate’s site reliability engineer, you’ll tackle NetActuate’s complex challenges of scale, drawing on your expertise in coding, algorithms, complexity analysis, and large-scale system design.
  • This role combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems.
  • SRE ensures that NetActuate’s services, both internally critical and externally visible, have reliability and uptime appropriate to users’ needs, along with a fast rate of improvement.
  • Our site reliability engineer will also keep a watchful eye on system capacity and performance. Software development work will focus on optimizing existing systems, building infrastructure, and eliminating work through automation. This is a full-time, 100% remote position.

Position Responsibilities / Requirements

  • Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
  • We use Slack, Zoom, WhatsApp, and Skype for internal and external communication, and you are expected to be available during your scheduled work hours. You may be required to work additional hours to continue work on projects, or to work at times that do not affect customers’ use of infrastructure.
  • Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews.
  • Work remotely on an assigned schedule; you may be called upon for other projects or assignments, with reasonable notice of any schedule changes.
  • Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
  • Scale systems sustainably through mechanisms like automation; evolve systems by pushing for changes that improve reliability and velocity.
  • Practice sustainable incident response and blameless postmortems.

Ready to Join Us?

Submit your application and let’s explore the possibilities together.