Platform Operations Engineer (Hybrid, Greenfield Opportunity)
Match Made Tech
Los Angeles, CA, United States
Full Time
Expires On: 03/04/2026
Platform Operations Engineer / Support Engineer (Production Stability & Incident Response)
Location: Irvine, CA (Hybrid: Onsite Mon–Thurs, Remote Fridays)
UNABLE TO OFFER SPONSORSHIP: US CITIZENS AND GREEN CARD HOLDERS ONLY
Type: Contract-to-Hire | High-Impact, Greenfield Organization
Level: Mid–Senior Engineer
Role Summary
We are seeking a Platform Operations Engineer / Application Support Engineer to play a critical role in maintaining production stability across a fast-growing, distributed engineering platform. This role is designed for a senior-minded engineer who thrives in high-signal, high-responsibility environments and excels at incident triage, coordination, and recovery, without owning product features or making unilateral production changes.
Your mission is simple but vital: make production quieter, more predictable, and less disruptive for customers and engineers alike.
This role sits at the intersection of engineering, operations, and communication. You’ll act as the primary daytime responder for production incidents, ensuring issues are triaged efficiently, escalations are controlled, and service owners are engaged through clear, approved processes.
What You’ll Do
- Serve as the first responder for production incidents during core business hours
- Triage issues across distributed systems and identify severity, scope, and impact
- Coordinate incident response with service owners, infrastructure, and engineering teams
- Drive service restoration through documented runbooks and approved escalation paths
- Communicate clearly and calmly during incidents — status, impact, mitigation, and next steps
- Document incidents, postmortems, and recurring failure patterns
- Surface systemic reliability risks and help turn recurring issues into planned work
- Build and maintain incident workflows, runbooks, and escalation standards
- Reduce ad-hoc interruptions to feature teams by creating a single, predictable entry point for production issues
What You Will Not Do
- Build or own product features
- Make product or architectural decisions
- Own services long-term
- Perform ad-hoc, emergency, or unauthorized production changes
- Act as a “fix everything” engineer
What Success Looks Like
- Fewer production escalations reaching feature teams
- Faster incident response and recovery times
- Clear ownership and calm coordination during outages
- Reduced engineer interruptions and burnout
- Improved visibility into reliability and operational risks
- Recurring issues become planned engineering work — not repeated firefighting
Coverage Expectations (Initial Phase)
- Primary focus on core business hours
- Acts as the daytime production responder
- After-hours coverage via a lightweight on-call rotation
- Escalations limited to high-severity incidents only
This phase prioritizes stability, predictability, and process, not full 24/7 coverage.
What You Bring
- Mid–senior engineering experience with real production incident exposure
- Strong debugging skills across distributed systems and backend services
- Comfort operating in production environments with incomplete context
- Proven judgment under pressure and ability to lead incident response
- Experience investigating unfamiliar systems safely and methodically
- Clear written and verbal communication skills — especially during outages
- Familiarity with incident management, runbooks, and escalation frameworks
Seniority is defined by judgment and autonomy — not years of experience.
Why This Role Is Compelling
- High ownership without feature churn
- Clear boundaries and expectations — no “everything engineer” trap
- Greenfield opportunity to define production support the right way
- Direct impact on reliability, engineering focus, and team health
- Startup-level influence within a stable, well-funded organization
- Hybrid work model with strong collaboration and visibility