**Overview**
Join UKG's enterprise Site Operations team as a Problem Manager, where your expertise will directly impact millions of workers worldwide. At UKG, we're passionate about creating workforce solutions that matter - helping people get paid, advance their careers, and transform industries. You'll drive systematic improvements to our SaaS platform reliability while working with cutting-edge cloud technologies in a collaborative, learning-focused environment that celebrates both innovation and results.
**Responsibilities**
• Manage the full problem lifecycle, from identification through resolution and closure
• Facilitate blame-free post-incident reviews and structured root cause analysis sessions using methodologies like 5 Whys and Fishbone diagrams
• Create detailed postmortems with actionable remediation roadmaps; ensure timely implementation and validation of corrective measures
• Convert operational incidents into prioritized engineering initiatives with full tracking to completion
• Analyze incident patterns and failure trends; design and coordinate systematic prevention strategies
• Integrate problem management with service level objectives, error budgets, and uptime goals
• Champion cross-team accountability while escalating critical reliability concerns to executive leadership with comprehensive impact assessments
• Collaborate with Observability, Release Engineering, and Security organizations to address monitoring, testing, and dependency vulnerabilities
• Establish and monitor key problem management KPIs and executive reporting
• Ensure adherence to governance frameworks and change management protocols for enterprise SaaS operations
**Requirements**
• 5+ years in SaaS operations, Site Reliability Engineering, incident management, or problem resolution within enterprise settings
• Bachelor's degree in Computer Science, Information Systems, Engineering, or comparable hands-on experience
• Proven track record leading root cause investigations and driving cross-functional remediation in cloud-native environments (AWS, Azure, GCP)
• Solid understanding of distributed architecture, microservices, containerization (Kubernetes/Docker), CI/CD pipelines, and Infrastructure as Code
• Experience with monitoring and incident management platforms such as Datadog or Prometheus
Submit your application directly to UKG.