Site Reliability Engineer (SRE)
high-performance iGaming systems with automation, observability, and 24/7 reliability in a fully remote, flexible, and rewarding environment
Position: Site Reliability Engineer (SRE)
Location: Fully Remote (Offices in Limassol, Kyiv, London, Tbilisi)
Working Hours: Availability for one of two shifts between 17:00 and 08:00 CET: 17:00–01:00 or 00:00–08:00.
Company Overview:
Our client is one of the fastest-growing B2B iGaming solutions providers in Europe, with over 100 remote team members across the continent. They specialize in delivering high-quality software platforms, payment solution integrations, marketing tools, and technical support to clients in the online casino and betting sectors. As they continue to expand, they are looking for a talented and growth-oriented individual to help enhance and streamline their infrastructure.
The company offers a dynamic and supportive environment where your input is valued and your professional growth is encouraged. Don’t miss the opportunity to join their exciting journey!
Role Overview:
As a Site Reliability Engineer (SRE), you will bridge the gap between development and operations to ensure that services and platforms remain reliable, scalable, and performant — even under high transaction volumes and regulatory requirements.
You will work closely with backend engineers, DevOps, InfoSec, and operational teams to build automation, improve observability, and respond to incidents.
Key Requirements:
Experience with AWS or hybrid data center setups
Ability to read logs and stack traces to determine the root cause of incidents
Infrastructure as Code: Experience with Terraform, Helm, and Ansible (Werf optional)
Linux administration and container orchestration (K8s) skills
Experience with monitoring/observability stacks: Prometheus, Grafana, ELK, Loki, etc.
Strong understanding of TCP/IP, DNS, and load balancers
Familiarity with incident response, postmortems, and blameless culture
Availability for one of two shifts between 17:00 and 08:00 CET: 17:00–01:00 or 00:00–08:00
Bonus Skills:
Background in high-throughput environments (e.g., financial, trading, iGaming)
Experience with CDNs and real-time log aggregation
Proficiency in one or more scripting languages (Python, Bash, Go)
Knowledge of Java or PHP and their respective web-development frameworks
Hands-on experience with MSSQL, PostgreSQL, MongoDB, etc.
Exposure to Kafka, Redis, or other event-driven systems
Key Responsibilities:
Maintain and improve SLA/SLO/SLI metrics for critical systems (e.g., live games, sports betting, KYC, payments)
Manage and support highly available, scalable infrastructure (K8s, cloud, and bare metal)
Implement and manage monitoring, logging, and alerting systems (e.g., Prometheus, Grafana, Loki, ELK)
Automate deployments and operations using CI/CD pipelines (e.g., Jenkins, ArgoCD, Helm)
Conduct post-incident reviews, define action items, and reduce mean time to recovery (MTTR)
Participate in on-call rotation to ensure 24/7 system reliability
Secure infrastructure in line with regulations (e.g., player data integrity, jurisdictional compliance)
Collaborate with Dev, QA, DevOps, and Ops to improve services' stability and uptime
Success Metrics:
Less than 1% downtime for any user- or partner-facing service
SLO 99.95%
95% of infrastructure managed via code and automation
Documented runbooks and alert playbooks per service group
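To make the availability targets above concrete, here is a minimal sketch that converts an availability percentage into a downtime budget. The function name and the 30-day measurement window are assumptions for illustration only; the posting does not say how downtime or the SLO is measured.

```python
# Hypothetical helper (not from the posting): converts an availability
# target into the downtime it allows over a measurement window.
def downtime_budget_minutes(availability_pct: float, period_days: int = 30) -> float:
    """Return the minutes of downtime allowed per period for a target."""
    total_minutes = period_days * 24 * 60  # 30 days = 43,200 minutes
    return total_minutes * (100.0 - availability_pct) / 100.0


if __name__ == "__main__":
    # Assumed 30-day window; the two targets are the ones listed above.
    for target in (99.0, 99.95):
        budget = downtime_budget_minutes(target)
        print(f"{target}% over 30 days -> {budget:.1f} min of downtime")
```

Under these assumptions, a 99.95% SLO leaves roughly 21.6 minutes of downtime per 30 days, while a 1% downtime ceiling corresponds to 432 minutes, so the SLO is the far stricter of the two targets.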
Why You'll Love Working Here:
International Team: Be part of a respectful, supportive, and goal-driven team.
Freedom & Responsibility: We trust you to take ownership of your work.
Competitive Salary: We offer competitive compensation based on your skills and experience.
Fully Remote: Work from anywhere, with optional access to our offices in Limassol, Kyiv, London, or Tbilisi.
Flexible Schedule: We measure performance, not time.
Unlimited Paid Time Off: Enjoy paid vacation and sick leave days for a great work-life balance.
Career Development: Opportunities for continuous learning and growth.
Team-Building & Fun: Enjoy awesome corporate parties and team-building events throughout the year.
Referral Bonuses: Earn rewards when you refer talented friends to join us.
Private Medical Insurance: Choose the right coverage for you, with full/partial compensation based on cost.
Flexible Benefits: Get compensated for activities and expenses like gym subscriptions, language courses, Netflix, spa days, etc.
Learning Foundation: Participate in our biannual raffle for the chance to learn something new outside of your role.
Department: DevOps / Cloud
Role: Site Reliability Engineer (SRE) / DevOps Engineer
Locations: Tbilisi, Cyprus, Remote, Canada, Mexico
Remote status: Fully Remote