• Contribute to exploratory experimental research on AI safety with a focus on risks from powerful future AI systems, often in collaboration with the Interpretability, Fine-Tuning, and Frontier Red Team teams.
• Work across two research areas: AI Control (creating methods to ensure advanced AI systems remain safe in adversarial scenarios) and Alignment Stress-testing (creating model organisms of misalignment to understand how alignment failures might arise).
• Build and run elegant ML experiments to understand and steer the behaviour of powerful AI systems.
• Be based at least 25% of the time in Anthropic's London office, with occasional travel to San Francisco where the team's hub is located.
📋 Job Requirements
• Have significant software, ML, or research engineering experience.
• Have some experience contributing to empirical AI research projects.
• Have some familiarity with technical AI safety research.
• Prefer fast-moving collaborative projects to extensive solo efforts.
• Care deeply about the impacts of AI.
• Be proficient in Python, the language in which all interviews are conducted.
🌟 Nice-to-have
• Have experience authoring research papers in machine learning, NLP, or AI safety.
• Have experience with LLMs.
• Have experience with reinforcement learning.
• Have experience with Kubernetes clusters and complex shared codebases.
🎯 Responsibilities
• Test the robustness of safety techniques by training language models to subvert safety interventions.
• Run multi-agent reinforcement learning experiments to test techniques such as AI Debate.
• Build tooling to efficiently evaluate the effectiveness of novel LLM-generated jailbreaks.
• Write scripts and prompts to produce evaluation questions testing models' reasoning abilities in safety-relevant contexts.
• Contribute ideas, figures, and writing to research papers, blog posts, and talks.
• Run experiments that feed into key AI safety efforts including Anthropic's Responsible Scaling Policy.
About Anthropic
😃 What Anthropic offers
• Earn £260,000–£370,000 per year.
• Receive visa sponsorship — Anthropic retains an immigration lawyer and makes every reasonable effort to support visa applications.
• Access optional equity donation matching, generous vacation and parental leave, and flexible working hours.
• Work at one of the world's leading AI safety research organisations on some of the most important problems in AI.
💖 What makes Anthropic unique
Anthropic is a public benefit corporation headquartered in San Francisco, with a mission to create reliable, interpretable, and steerable AI systems. The Alignment Science team conducts exploratory experimental research on AI safety with a focus on risks from powerful future systems. The London team is an extension of the broader Alignment Science effort, working on AI Control and Alignment Stress-testing in close collaboration with teams in San Francisco.
Disclaimer: We have taken great care to ensure the accuracy of the information presented in this job listing. However, job details, requirements, and benefits can change at any time. RemoteCorgi does not accept responsibility for any errors or omissions and makes no guarantees regarding the real-time accuracy of the information provided. Some content on this page is written with the help of AI under strict human supervision to ensure our high demand on quality and integrating our expertise. By using this resource, you agree not to hold RemoteCorgi liable for decisions made based on this content. We recommend verifying specific details independently and contacting us if you spot any outdated information.