This is the homepage of a seminar on Singular Learning Theory (SLT), a theory founded by Sumio Watanabe that applies algebraic geometry to statistical learning theory. The seminar takes place at metauni. For applications of SLT to alignment, see the SLT Alignment Plan and the 2023 workshop.

The canonical references are Watanabe’s two textbooks:

- **The gray book:** S. Watanabe, “Algebraic geometry and statistical learning theory”, 2009.
- **The green book:** S. Watanabe, “Mathematical theory of Bayesian statistics”, 2018.

Some other introductory references:

- Liam Carroll’s sequence: Distilling Singular Learning Theory.
- Jesse Hoogland’s blog post: general intro to SLT.
- Matt Farrugia-Roberts’ MSc thesis, October 2022, Structural Degeneracy in Neural Networks.
- Spencer Wong’s MSc thesis, May 2022, From Analytic to Algebraic: The Algebraic Geometry of Two Layer Neural Networks.
- Liam Carroll’s MSc thesis, October 2021, Phase transitions in neural networks.
- Tom Waring’s MSc thesis, October 2021, Geometric Perspectives on Program Synthesis and Semantics.
- S. Wei, D. Murfet, M. Gong, H. Li, J. Gell-Redman, T. Quella, “Deep learning is singular, and that’s good”, 2022.
- Edmund Lau’s blog Probably Singular.
- Shaowei Lin’s PhD thesis, 2011, Algebraic Methods for Evaluating Integrals in Bayesian Statistics.

## S2 2023

- **20-7-23** (*Dan Murfet*): The Research Agenda (video)
- **27-7-23** (*Dan Murfet*): Intro to Developmental Biology (video)
- **10-8-23** (*Ben Gerraty*): Hamiltonian Monte Carlo and the SLT Hamiltonian
- **17-8-23** (*Dan Murfet*): In-context learning and implicit Bayesian inference (paper, video)
- **24-8-23** (*Jesse Hoogland*): “Saddle-to-saddle dynamics in deep linear networks” A. Jacot et al 2021.
- **31-8-23** (*Edmund Lau*): Quantifying degeneracy in singular models via the learning coefficient
- **6-9-23** (*Dan Murfet*): Research updates
- **14-9-23** (*Arthur Conmy*): Automated circuit discovery (paper)
- **21-9-23** (*Nisch*): “A mathematical theory of semantic development in deep neural networks” A. Saxe, J. McClelland, S. Ganguli 2019.
- **28-9-23** (*Alok Singh*): “Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit” B. Barak et al 2022.
- **5-10-23** (*Matt Farrugia-Roberts*): “You and your TPU”

Unscheduled:

- Acquisition of chess knowledge in AlphaZero (paper)

## Previous seminars

### S1 2023

- **19-1-23** (*Russell Goyder*): Physical entropy vs information-theoretic entropy Pt 2 (video, pocket, transcript, notes)
- **26-1-23** (*Dan Murfet*): Towards in-context learning in SLT Pt 2 (video, pocket, transcript)
- **2-2-23** (*Dan Murfet*): Towards in-context learning in SLT Pt 3 (video, pocket, transcript)
- **9-2-23** (*Dan Murfet*): Solid state physics and SLT Pt 1 (video, notes, pocket, transcript)
- **16-2-23** (*Dan Murfet*): Solid state physics and SLT Pt 2 (video, notes, pocket, transcript)
- **23-2-23** (*Edmund Lau*): Variational Bayesian posterior and learning resolutions (video, pocket, transcript)
- **2-3-23** (*Samuel Jolly*): Toy models of superposition and SLT (video, pocket, transcript)
- **9-3-23** (*Dan Murfet*): SLT and Alignment Pt 1 (video, pocket, transcript)
- **16-3-23** (*Edmund Lau*): Occam’s razor following Balasubramanian (video, pocket, transcript)
- **23-3-23** (*Ben Gerraty*): Toy models of superposition Pt 2 (video, pocket, transcript)
- **30-3-23** (*Dan Murfet*): SLT and Alignment Pt 2 (video, notes, pocket, transcript)
- **6-4-23** (*Rohan Hitchcock*): Induction heads (video, pocket, transcript)
- **13-4-23** (*Eric Michaud*): The Quantization Model of Neural Scaling (paper)
- **20-4-23** (*Zhongtian Chen*): Jet schemes of monomial ideals (video)
- **27-4-23** (*Dan Murfet*): Primer planning session (video, pocket)
- **25-5-23** (*Neel Nanda*): Mechanistic Interpretability (video)

### 2022

Below you can find the seminars for 2022, with videos and pocket links (which take you to the virtual world where the talk took place, with the blackboards just as we left them at the end of the talk).

- **13-1-22** (*Dan Murfet*): What is learning? Singularities and pendulums (video, transcript).
- **13-1-22** (*Edmund Lau*): The Fisher information matrix (video, transcript).
- **20-1-22** (*Edmund Lau*): Fisher information, KL-divergence and singular models (video, transcript).
- **20-1-22** (*Liam Carroll*): Markov Chain Monte Carlo (video, transcript).
- **27-1-22** (*Liam Carroll*): Neural networks and the Bayesian posterior (video, transcript).
- **27-1-22** (*Spencer Wong*): Rings, ideals and the Hilbert basis theorem (video, transcript).
- **3-2-22** (*Spencer Wong*): From analytic to algebraic I (video, transcript).
- **3-2-22** (*Ken Chan*): Resolution of singularities (video, transcript).
- **10-2-22** (*Dan Murfet*): Introduction to density of states (video, notes, transcript).
- **10-2-22** (*Spencer Wong*): Polynomial division (video, transcript).
- **17-2-22** (*Spencer Wong*): From analytic to algebraic II (video, transcript).
- **17-2-22**: Working session 1 (video, transcript).
- **24-2-22** (*Edmund Lau*): Free energy asymptotics (video, transcript).
- **24-2-22**: Working session 2 (video, transcript).
- **3-3-22** (*Spencer Wong*): From analytic to algebraic III (video, transcript).
- **3-3-22**: Working session 3 (video, transcript).
- **10-3-22** (*Tom Waring*): Regularly parametrised models (video, transcript).
- **17-3-22** (*Edmund Lau*): Bounding the partition function (video, transcript).
- **24-3-22** (*Edmund Lau*): The influence of sampling (video, transcript).
- **7-4-22** (*Edmund Lau*): Main Theorem 1 (video, transcript).
- **14-4-22** (*Edmund Lau*): Main Theorem 2 (video, transcript).
- **8-9-22** (*Matt Farrugia-Roberts*): Complexity of rank estimation (video, pocket).
- **15-9-22** (*Matt Farrugia-Roberts*): Piecewise-linear paths in equivalent networks (video, pocket).
- **22-9-22** (*various*): A minimal introduction to the geometry of tanh networks (video, pocket, transcript).
- **29-9-22** (*Dan Murfet*): Information theory I - entropy and KL divergence (video, pocket, transcript).
- **6-10-22** (*Zhongtian Chen*): The Kraft-McMillan theorem (video, pocket, transcript).
- **13-10-22** (*Edmund Lau*): Asymptotic learning curve and renormalizable condition in statistical learning theory (video, pocket, transcript).
- **13-10-22** (*Dan Murfet*): Intro to blowing up (cross-posted from the Abstraction seminar; video, pocket).
- **20-10-22** (*Dan Murfet*): State of scaling laws 2022 (video, pocket, transcript).
- **27-10-22** (*Dan Murfet*): In-context learning (video, pocket, transcript).
- **3-11-22** (*Dan Murfet*): Open problems (video, pocket, transcript).
- **10-11-22** (*Edmund Lau*): Newton diagrams in singular learning theory (video, pocket, transcript).
- **17-11-22** (*Matt Farrugia-Roberts*): Overview of MSc thesis (video, pocket).
- **24-11-22** (*Dan Murfet*): Jet schemes I (video, pocket, transcript).
- **1-12-22** (*Matt Farrugia-Roberts*): Overview of MSc thesis Pt 2 (video, pocket).
- **8-12-22** (*Dan Murfet*): Jet schemes II (video, pocket, transcript).
- **15-12-22** (*Matt Farrugia-Roberts*): Overview of MSc thesis Pt 3 (video, pocket).
- **22-12-22** (*Russell Goyder*): Physical entropy vs information-theoretic entropy (video, pocket, transcript, notes).

## Background

- A. Karpathy on Transformers (on data distribution).

Some rough handwritten notes:

- Deep Learning Theory 1: Why deep learning theory?
- Deep Learning Theory 2: Thermodynamics of Singular Learning Theory
- Deep Learning Theory 3: Phase transitions
- Singular Learning Theory 4: Local RLCT
- Singular Learning Theory 5: Symmetry and RLCT
- Singular Learning Theory 6: Generalisation and Power Laws
- Singular Learning Theory 8: Calculations for feedforward networks
- Singular Learning Theory 12: Density of states
- Singular Learning Theory 13: Asymptotics of the free energy