As NetActuate’s site reliability engineer, you’ll have the opportunity to manage NetActuate’s complex challenges of scale, while using your expertise in coding, algorithms, complexity analysis and large-scale system design.
This role combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems.
SRE ensures that NetActuate’s services, both internally critical and externally visible, have reliability and uptime appropriate to users’ needs, along with a fast rate of improvement.
Our site reliability engineer will also keep a watchful eye on system capacity and performance. Software development work will focus on optimizing existing systems, building infrastructure, and eliminating manual work through automation. This is a full-time, 100% remote position.
Position Responsibilities / Requirements
Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
We use Slack, Zoom, WhatsApp, and Skype for internal and external communication, and you are expected to be available during your assigned work hours. You may be required to work additional hours to continue work on projects, or to work during windows that do not affect customers’ use of infrastructure.
Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews.
Work remotely on an assigned schedule; you may be called upon for other projects or assignments, with reasonable notice of any schedule changes.
Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
Scale systems sustainably through mechanisms like automation; evolve systems by pushing for changes that improve reliability and velocity.
Practice sustainable incident response and blameless postmortems.
Ready to Join Us?
Submit your application and let’s explore the possibilities together.