This is the homepage of a seminar on Singular Learning Theory (SLT), a theory founded by Sumio Watanabe that applies algebraic geometry to statistical learning theory. The seminar takes place at metauni. For tentative beginnings of applications of SLT to alignment, see the metauni Alignment Plan.

The canonical references are Watanabe’s two textbooks:

- **The gray book:** S. Watanabe, “Algebraic geometry and statistical learning theory”, 2009.
- **The green book:** S. Watanabe, “Mathematical theory of Bayesian statistics”, 2018.

Some other introductory references:

- Matt Farrugia-Roberts’ MSc thesis, October 2022, Structural Degeneracy in Neural Networks.
- Spencer Wong’s MSc thesis, May 2022, From Analytic to Algebraic: The Algebraic Geometry of Two Layer Neural Networks.
- Liam Carroll’s MSc thesis, October 2021, Phase transitions in neural networks.
- Tom Waring’s MSc thesis, October 2021, Geometric Perspectives on Program Synthesis and Semantics.
- S. Wei, D. Murfet, M. Gong, H. Li, J. Gell-Redman, T. Quella, “Deep learning is singular, and that’s good”, 2022.
- Edmund Lau’s blog Probably Singular.
- Shaowei Lin’s PhD thesis, 2011, Algebraic Methods for Evaluating Integrals in Bayesian Statistics.
- Jesse Hoogland’s blog posts: general intro to SLT, and effects of singularities on dynamics.

## Upcoming seminars

- **30-3-23** (*Dan Murfet*): SLT and Alignment Pt 2
- **6-4-23** (*Rohan Hitchcock*): Induction heads
- **TBD** (*Russell Goyder*): Saxe et al.

Some topics and papers to be discussed:

- “Scaling laws for reward model overoptimization”, L. Gao, J. Schulman, J. Hilton (this work played a role in the creation of ChatGPT).
- “Statistical inference, Occam’s razor, and statistical mechanics on the space of probability distributions”, V. Balasubramanian.
- “Toy models of superposition” Anthropic.
- “A mathematical theory of semantic development in deep neural networks”, A. Saxe, J. McClelland, S. Ganguli, 2019.
- See this survey of scaling laws.

## Previous seminars

### 2023

- **19-1-23** (*Russell Goyder*): Physical entropy vs information-theoretic entropy Pt 2 (video, pocket, transcript)
- **26-1-23** (*Dan Murfet*): Towards in-context learning in SLT Pt 1 (video, pocket, transcript)
- **2-2-23** (*Dan Murfet*): Towards in-context learning in SLT Pt 2 (video, pocket, transcript)
- **9-2-23** (*Dan Murfet*): Solid state physics and SLT Pt 1 (video, notes, pocket, transcript)
- **16-2-23** (*Dan Murfet*): Solid state physics and SLT Pt 2 (video, notes, pocket, transcript)
- **23-2-23** (*Edmund Lau*): Variational Bayesian posterior and learning resolutions (video, pocket, transcript)
- **2-3-23** (*Samuel Jolly*): Toy models of superposition and SLT (video, pocket, transcript)
- **9-3-23** (*Dan Murfet*): SLT and Alignment Pt 1 (video, pocket, transcript)
- **16-3-23** (*Edmund Lau*): Occam’s razor following Balasubramanian (video, pocket, transcript)
- **23-3-23** (*Ben Gerraty*): Toy models of superposition Pt 2

### 2022

Below you can find the seminars for 2022, with videos and pocket links (which take you to the virtual world where the talk took place, with the blackboards just as we left them at the end of the talk).

- **13-1-22** (*Dan Murfet*): What is learning? Singularities and pendulums (video, transcript).
- **13-1-22** (*Edmund Lau*): The Fisher information matrix (video, transcript).
- **20-1-22** (*Edmund Lau*): Fisher information, KL-divergence and singular models (video, transcript).
- **20-1-22** (*Liam Carroll*): Markov Chain Monte Carlo (video, transcript).
- **27-1-22** (*Liam Carroll*): Neural networks and the Bayesian posterior (video, transcript).
- **27-1-22** (*Spencer Wong*): Rings, ideals and the Hilbert basis theorem (video, transcript).
- **3-2-22** (*Spencer Wong*): From analytic to algebraic I (video, transcript).
- **3-2-22** (*Ken Chan*): Resolution of singularities (video, transcript).
- **10-2-22** (*Dan Murfet*): Introduction to density of states (video, notes, transcript).
- **10-2-22** (*Spencer Wong*): Polynomial division (video, transcript).
- **17-2-22** (*Spencer Wong*): From analytic to algebraic II (video, transcript).
- **17-2-22**: Working session 1 (video, transcript).
- **24-2-22** (*Edmund Lau*): Free energy asymptotics (video, transcript).
- **24-2-22**: Working session 2 (video, transcript).
- **3-3-22** (*Spencer Wong*): From analytic to algebraic III (video, transcript).
- **3-3-22**: Working session 3 (video, transcript).
- **10-3-22** (*Tom Waring*): Regularly parametrised models (video, transcript).
- **17-3-22** (*Edmund Lau*): Bounding the partition function (video, transcript).
- **24-3-22** (*Edmund Lau*): The influence of sampling (video, transcript).
- **7-4-22** (*Edmund Lau*): Main Theorem 1 (video, transcript).
- **14-4-22** (*Edmund Lau*): Main Theorem 2 (video, transcript).
- **8-9-22** (*Matt Farrugia-Roberts*): Complexity of rank estimation (video, pocket).
- **15-9-22** (*Matt Farrugia-Roberts*): Piecewise-linear paths in equivalent networks (video, pocket).
- **22-9-22** (*various*): A minimal introduction to the geometry of tanh networks (video, pocket, transcript).
- **29-9-22** (*Dan Murfet*): Information theory I: entropy and KL divergence (video, pocket, transcript).
- **6-10-22** (*Zhongtian Chen*): The Kraft-McMillan theorem (video, pocket, transcript).
- **13-10-22** (*Edmund Lau*): Asymptotic learning curve and renormalizable condition in statistical learning theory (video, pocket, transcript).
- **13-10-22** (*Dan Murfet*): Intro to blowing up (cross-posted from the Abstraction seminar; video, pocket).
- **20-10-22** (*Dan Murfet*): State of scaling laws 2022 (video, pocket, transcript).
- **27-10-22** (*Dan Murfet*): In-context learning (video, pocket, transcript).
- **3-11-22** (*Dan Murfet*): Open problems (video, pocket, transcript).
- **10-11-22** (*Edmund Lau*): Newton diagrams in singular learning theory (video, pocket, transcript).
- **17-11-22** (*Matt Farrugia-Roberts*): Overview of MSc thesis (video, pocket).
- **24-11-22** (*Dan Murfet*): Jet schemes I (video, pocket, transcript).
- **1-12-22** (*Matt Farrugia-Roberts*): Overview of MSc thesis Pt 2 (video, pocket).
- **8-12-22** (*Dan Murfet*): Jet schemes II (video, pocket, transcript).
- **15-12-22** (*Matt Farrugia-Roberts*): Overview of MSc thesis Pt 3 (video, pocket).
- **22-12-22** (*Russell Goyder*): Physical entropy vs information-theoretic entropy (video, pocket, transcript).

## Background

- A. Karpathy on Transformers (on data distribution).

Some rough handwritten notes:

- Deep Learning Theory 1: Why deep learning theory?
- Deep Learning Theory 2: Thermodynamics of Singular Learning Theory
- Deep Learning Theory 3: Phase transitions
- Singular Learning Theory 4: Local RLCT
- Singular Learning Theory 5: Symmetry and RLCT
- Singular Learning Theory 6: Generalisation and Power Laws
- Singular Learning Theory 8: Calculations for feedforward networks
- Singular Learning Theory 12: Density of states
- Singular Learning Theory 13: Asymptotics of the free energy