AI safety reading group
Weekly discussions of readings on technical and philosophical topics in AI safety.
AI Safety is the field trying to figure out how to stop AI systems from breaking the world, and in particular, trying to do so before they break the world. Readings will span from potential issues arising from future advanced AI systems, to technical topics in AI control, to present-day issues.
- Organisers: Matthew Farrugia-Roberts and Dan Murfet.
- Time: Thursday evenings, 9pm AEST, most weeks (see home page for most up-to-date schedule).
- Venue: The Rising Sea.
Directions for joining discussions:
- New to metauni? Follow these instructions to get started (including: join the Discord server, create a Roblox account, enable Roblox voice chat).
- Launch the Roblox experience The Rising Sea.
- Step into matomatical’s portal (bottom-right corner of stack), or use the menu: “Pockets” > “Go to pocket” > type address “Gemini Pulsar 1”.
Completing weekly readings is recommended, but ultimately optional. The discussion sessions begin with a summary of the reading, lead by Matt (unless otherwise noted).
Upcoming readings and discussions:
2022.06.30: Rachel Thomas and Louisa Bartolo, 2022, “AI harms are societal, not just individual”, fast.ai blog. Discussion lead by Dan.
There will be no discussion on 2022.07.07 or 2022.07.14; we will resume on 2022.07.21 with a reading to be announced.
Past readings and discussions:
2022.06.09: Norbert Wiener, 1960, “Some moral and technical consequences of automation”, Science.
2022.06.16: Stephen M. Omohundro, 2008, “The basic AI drives”, Proceedings of the 2008 conference on Artificial General Intelligence.
2022.06.23: Nick Bostrom, 2012, “The superintelligent will: Motivation and instrumental rationality in advanced artificial agents”, Minds and Machines.
The nature of superintelligence:
- Key chapters of Bostrom’s Superintelligence
- On embedded agency (Demski & Garrabrant, 2020)
- Eric Drexler’s report Reframing Superintelligence / Comprehensive AI Services (CAIS)
- Key chapters of Stuart Russell’s Human Compatible
- Papers on CIRL / assistance games
- Papers on corrigibility
- Papers on the off-switch game
- Papers on mesa optimisation / optimisation daemons
- Papers on the complexity of values thesis
- Key chapters of Brian Christian’s The Alignment Problem
- On algorithmic bias
- On interpretability
- On aligning recommender systems
- On future digital ethics
- On machine consciousness
Sources of readings (clearly with much mutual overlap):