🏢

Cloud Operations Problem Manager - Enterprise SaaS | Plantation, FL

UKG

Senior

📍 Plantation, FL ⏳ Closes Jul 02, 2026

📋 Job Description

**Overview**

Join UKG's enterprise Site Operations team as a Problem Manager, where your expertise will directly impact millions of workers worldwide. At UKG, we're passionate about creating workforce solutions that matter - helping people get paid, advance their careers, and transform industries. You'll drive systematic improvements to our SaaS platform reliability while working with cutting-edge cloud technologies in a collaborative, learning-focused environment that celebrates both innovation and results.

**Responsibilities**

• Manage comprehensive problem lifecycle from identification through resolution closure
• Facilitate blame-free post-incident reviews and structured root cause analysis sessions using methodologies like 5 Whys and Fishbone diagrams
• Create detailed postmortems with actionable remediation roadmaps; ensure timely implementation and validation of corrective measures
• Convert operational incidents into prioritized engineering initiatives with full tracking to completion
• Analyze incident patterns and failure trends; design and coordinate systematic prevention strategies
• Integrate problem management with service level objectives, error budgets, and uptime goals
• Champion cross-team accountability while escalating critical reliability concerns to executive leadership with comprehensive impact assessments
• Collaborate with Observability, Release Engineering, and Security organizations to address monitoring, testing, and dependency vulnerabilities
• Establish and monitor key problem management KPIs and executive reporting
• Ensure adherence to governance frameworks and change management protocols for enterprise SaaS operations

**Requirements**

• 5+ years in SaaS operations, Site Reliability Engineering, incident management, or problem resolution within enterprise settings
• Bachelor's degree in Computer Science, Information Systems, Engineering, or comparable hands-on experience
• Proven track record leading root cause investigations and driving cross-functional remediation in cloud-native environments (AWS, Azure, GCP)
• Solid understanding of distributed architecture, microservices, containerization (Kubernetes/Docker), CI/CD pipelines, and Infrastructure as Code
• Experience with monitoring and incident management platforms such as Datadog or Prometheus

🔧 Skills & Technologies

Problem Management, Root Cause Analysis, Incident Management, Cross-functional Leadership, SRE/DevOps

Ready to Apply?

Submit your application directly to UKG.

