Anthropic

Research Engineer, Machine Learning (Reinforcement Learning)

Engineering

£260,000 - £630,000

Hybrid · Full-time · Senior or above · Visa Sponsorship · London

Posted on 10 Jun 2026

About the role

💼 What you will do

• Collaborate with researchers and engineers to advance the capabilities and safety of large language models through reinforcement learning research and engineering.
• Architect and optimise core RL infrastructure, from training abstractions to distributed experiment management across GPU clusters.
• Design, implement, and test novel training environments, evaluations, and methodologies for RL agents that push the state of the art for the next generation of models.
• Work on creating agentic models via tool use for open-ended tasks such as computer use, autonomous software generation, and improved mathematical reasoning.
• Be based at least 25% of the time in Anthropic's London office.

📋 Job Requirements

• Be proficient in Python and async/concurrent programming with frameworks such as Trio.
• Have experience with machine learning frameworks including PyTorch, TensorFlow, or JAX.
• Have industry experience in machine learning research.
• Be able to balance research exploration with engineering implementation.
• Care about code quality, testing, and performance.
• Have strong systems design and communication skills.
• Be passionate about the potential impact of AI and committed to developing safe and beneficial systems.

🌟 Nice-to-have

• Have familiarity with LLM architectures and training methodologies.
• Have experience with reinforcement learning techniques and environments.
• Have experience with virtualisation and sandboxed code execution environments.
• Have experience with Kubernetes, distributed systems, or high-performance computing.
• Have experience with Rust and/or C++.

🎯 Responsibilities

• Architect and optimise core RL infrastructure, including training abstractions and distributed experiment management.
• Design and implement novel training environments, evaluations, and RL methodologies.
• Drive performance improvements through profiling, optimisation, and benchmarking.
• Implement efficient caching solutions and debug distributed systems to accelerate training and evaluation.
• Collaborate across research and engineering teams to develop automated testing frameworks, clean APIs, and scalable infrastructure.
• Contribute to both research direction and implementation of novel approaches.

About Anthropic

😃 What Anthropic offers

• Earn £260,000–£630,000 per year.
• Receive visa sponsorship: Anthropic retains an immigration lawyer and makes every reasonable effort to support visa applications.
• Access optional equity donation matching, generous vacation and parental leave, and flexible working hours.
• Work at one of the world's leading AI safety research organisations on cutting-edge RL research that ships into deployed Claude models.

💖 What makes Anthropic unique

Anthropic is a public benefit corporation headquartered in San Francisco, with a mission to create reliable, interpretable, and steerable AI systems. The Reinforcement Learning teams lead Anthropic's RL research and development and have contributed to all Claude models, with significant impact on the autonomy and coding capabilities of Claude Sonnet 4.5 and Opus 4.5. The London team sits at the intersection of cutting-edge research and engineering excellence.
