• Own the efficiency and reliability of Anthropic's RL Science stack — the infrastructure, tooling, and systems that let researchers iterate quickly on training runs.
• Build and improve core RL training infrastructure, identifying and removing bottlenecks across the stack through debugging, profiling, and rearchitecting where needed.
• Partner closely with researchers and adjacent engineering teams (inference, sandboxing, and others) to understand pain points and ship tooling that makes them faster.
• Own the reliability and performance of research runs end to end, contributing to design decisions that shape how Anthropic does RL at scale.
• Work from Anthropic's London office at least 25% of the time.
📋 Job Requirements
• Have strong software engineering fundamentals with a track record of building performant, reliable systems.
• Have worked on ML infrastructure, distributed systems, or research tooling.
• Care about enabling other people's work and find leverage through platforms rather than individual experiments.
• Be comfortable operating across the stack — from low-level performance work to RL algorithms.
• Have a bias toward shipping and iterating quickly, with high agency and low ego.
🌟 Nice-to-have
• Have experience with large-scale distributed training, including RL, pre-training, or post-training.
• Have familiarity with JAX, PyTorch, or similar ML frameworks.
• Have a track record of operating at the edge of research and infrastructure in a fast-moving environment.
🎯 Responsibilities
• Build and improve the RL training infrastructure researchers depend on day to day.
• Identify and remove bottlenecks across the RL stack through debugging, profiling, and rearchitecting.
• Partner with researchers and adjacent engineering teams to ship tooling that accelerates their work.
• Own reliability and performance of research runs end to end.
• Contribute to design decisions shaping how Anthropic does RL at scale.
About Anthropic
😃 What Anthropic offers
• Earn £370,000–£630,000 per year.
• Receive visa sponsorship — Anthropic retains an immigration lawyer and makes every reasonable effort to support visa applications.
• Access optional equity donation matching, generous vacation and parental leave, and flexible working hours.
• Work on high-leverage infrastructure where small improvements compound across every researcher and every training run at one of the world's leading AI research organisations.
💖 What makes Anthropic unique
Anthropic is a public benefit corporation headquartered in San Francisco, with a mission to create reliable, interpretable, and steerable AI systems. The RL Velocity team owns the efficiency and reliability of Anthropic's RL Science stack, building the infrastructure and tooling that enable researchers to iterate quickly on training runs and ship better models faster. The London team is part of Anthropic's broader RL engineering effort.