Site Reliability Engineer (SRE)

Position: Site Reliability Engineer (SRE)
Location: Fully Remote (Offices in Limassol, Kyiv, London, Tbilisi)
Working Hours: Late-evening or night shifts between 17:00 and 08:00 CET (5 PM to 8 AM)

Company Overview:

Our client is one of the fastest-growing B2B iGaming solutions providers in Europe, with over 100 remote team members across the continent. They specialize in delivering high-quality software platforms, payment solution integrations, marketing tools, and technical support to clients in the online casino and betting sectors. As they continue to expand, they are looking for a talented, growth-oriented individual to help enhance and streamline their infrastructure.

The company offers a dynamic and supportive environment where your input is valued and your professional growth is encouraged. Don’t miss the opportunity to join their exciting journey!

Role Overview:

As a Site Reliability Engineer (SRE), you will bridge the gap between development and operations to ensure that services and platforms remain reliable, scalable, and performant, even under high transaction volumes and strict regulatory requirements.

You will work closely with backend engineers, DevOps, InfoSec, and operational teams to build automation, improve observability, and respond to incidents.

Key Requirements:

• 2–5 years in SRE / Infrastructure / Platform / Production DevOps
• Strong Linux experience in production
• Networking: TCP/IP, DNS, HTTP, load balancers, TLS
• Kubernetes in production (cluster ops, networking, ingress)
• AWS experience (EC2, ALB/NLB, RDS, S3, IAM, EKS or self-managed K8s)
• Infrastructure as code (IaC): Terraform and Ansible; Helm is optional
• Observability tools: Prometheus, Alertmanager, Grafana, ELK, Loki
• Containers and image lifecycle (Docker)
• Troubleshooting across application, network, and infrastructure layers
• CI/CD pipelines: Jenkins, GitLab CI, GitHub Actions, ArgoCD
• Incident response experience and participation in post-incident reviews
• Availability for late-evening or night shifts: 17:00–01:00 or 00:00–08:00 CET

Bonus Skills:

• Experience with high-load or real-time systems
• CDNs, log aggregation, real-time analytics
• Scripting: Python, Bash, Go
• Knowledge of Java/PHP ecosystems
• Databases: PostgreSQL, MySQL, MongoDB
• Message systems: Kafka, Redis, RabbitMQ
• External API integrations

Key Responsibilities:

• Ensure reliability, scalability, and performance of distributed services
• Operate and improve Kubernetes clusters
• Manage AWS-based infrastructure
• Build and maintain IaC with Terraform and Ansible
• Enhance monitoring, logging, and alerting stacks
• Handle production incidents end-to-end and reduce MTTR
• Maintain SLOs, SLIs, and error budgets for critical systems
• Automate operations and reduce manual toil
• Collaborate with engineering teams to embed SRE practices

Success Metrics:

• < 1% downtime for critical services
• SLO: 99.85–99.95% availability
• 90–95% of infrastructure managed as code
• Consistent reduction of MTTR
• Completed post-incident actions and improved system resilience
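To put the availability targets above in concrete terms, the short sketch below converts an availability percentage into a downtime budget. The 30-day window and the function name `allowed_downtime_minutes` are assumptions made for illustration; the posting does not state which measurement window the team uses.

```python
# Illustrative sketch: downtime budget implied by an availability target.
# The 30-day window is an assumption, not something stated in the posting.
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Return the downtime budget, in minutes, for an availability target."""
    if not 0 < slo_percent <= 100:
        raise ValueError("slo_percent must be in the range (0, 100]")
    total_minutes = window_days * MINUTES_PER_DAY
    # The error budget is the share of the window that may be unavailable.
    return total_minutes * (100 - slo_percent) / 100


if __name__ == "__main__":
    # 99.0% corresponds to the "< 1% downtime" metric; the other two
    # values are the ends of the stated SLO range.
    for target in (99.0, 99.85, 99.95):
        budget = allowed_downtime_minutes(target)
        print(f"{target}% availability -> {budget:.1f} min per 30 days")
```

Under these assumptions, the 99.85–99.95% range leaves roughly 65 to 22 minutes of downtime per 30 days, while 1% downtime would correspond to about 432 minutes over the same window.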

Why You'll Love Working Here:

International Team: Be part of a respectful, supportive, and goal-driven team.
Freedom & Responsibility: We trust you to take ownership of your work.
Competitive Salary: We offer competitive compensation based on your skills and experience.
Fully Remote: Work from anywhere, with optional access to our offices in Limassol, Kyiv, London, or Tbilisi.
Flexible Schedule: We measure performance, not time.
Unlimited Paid Time Off: Enjoy paid vacation and sick leave days for a great work-life balance.
Career Development: Opportunities for continuous learning and growth.
Team-Building & Fun: Enjoy awesome corporate parties and team-building events throughout the year.
Referral Bonuses: Earn rewards when you refer talented friends to join us.
Private Medical Insurance: Choose the right coverage for you, with full/partial compensation based on cost.
Flexible Benefits: Get compensated for activities and expenses like gym subscriptions, language courses, Netflix, spa days, etc.
Learning Foundation: Participate in our biannual raffle for the chance to learn something new outside of your role.
