Why Machines Learn

Anil Ananthaswamy

Read May 2025 · 7 highlights · Science, Technology

An exploration of machine learning technology, its underlying principles, and how algorithms learn from data.

There’s delicious irony in the uncertainty over Thomas Bayes’s year of birth. It’s been said that he was “born in 1701 with probability 0.8.”

· · ·

As Julie Delon of the Université Paris–Descartes says in her talks on the subject, “In high dimensional spaces, nobody can hear you scream.”
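Delon's quip alludes to the concentration of distances in high dimensions: as the number of dimensions grows, the distances between random points become nearly indistinguishable, so "nearness" loses its discriminating power. A minimal sketch (my illustration, not the book's) of the effect:

```python
import numpy as np

def distance_contrast(dim, n_points=200, seed=0):
    """Relative spread of distances from the origin to random Gaussian
    points, (max - min) / min. It shrinks as the dimension grows."""
    rng = np.random.default_rng(seed)
    points = rng.standard_normal((n_points, dim))
    dists = np.linalg.norm(points, axis=1)
    return (dists.max() - dists.min()) / dists.min()

# In 2 dimensions, distances vary wildly; in 1000, they concentrate
# tightly around a common value, so every point looks equally far away.
for dim in (2, 10, 1000):
    print(dim, distance_contrast(dim))
```

With all distances nearly equal, no neighbor stands out — hence, nobody can hear you scream.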

· · ·

With the math behind us, it’s rather simple to recap what SVMs do: They take datasets that are linearly inseparable in their original, relatively low-dimensional space and project these data into high enough dimensions to find an optimal linearly separating hyperplane, but the calculations for finding the hyperplane rely on kernel functions that keep the algorithm firmly anchored in the more computationally tractable lower-dimensional space.
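The trick in that recap can be made concrete with the degree-2 polynomial kernel (my illustrative example, not one from the book): the kernel computed on the original 2-D points equals the inner product of an explicit 3-D feature map, without ever projecting the data.

```python
import numpy as np

def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel:
    projects a 2-D point into 3-D, where the data may become
    linearly separable."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly_kernel(x, z):
    """The same inner product, computed entirely in the original
    2-D space: k(x, z) = (x . z)^2."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

print(np.dot(phi(x), phi(z)))  # inner product in the 3-D feature space
print(poly_kernel(x, z))       # same value, no projection needed
```

An SVM's optimization only ever needs inner products between training points, so swapping in `poly_kernel` gives it the benefits of the higher-dimensional space at the cost of the lower-dimensional one.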

· · ·

“It was incredible. Neural networks dominated machine learning in the eighties. And in the nineties, all of a sudden, everybody switched to kernel methods.”

· · ·

“A network could ‘solve a problem’ or have a function that was beyond the capability of a single molecule and a linear pathway,” Hopfield wrote. “Six years later I was generalizing this view in thinking about networks of neurons rather than the properties of a single neuron.”

· · ·

If a network requires more than one weight matrix (one for the output layer and one for each hidden layer), then it’s called a deep neural network: the greater the number of hidden layers, the deeper the network.
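That counting of weight matrices can be sketched in a few lines (a toy forward pass of my own, assuming ReLU activations, not code from the book): two hidden layers plus the output layer give three weight matrices, making the network deep in the sense above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Layer widths: input, hidden 1, hidden 2, output.
layer_sizes = [4, 8, 8, 3]

# One weight matrix per layer transition: three in total here,
# two for the hidden layers and one for the output layer.
weights = [rng.standard_normal((m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x, weights):
    """Apply each weight matrix in turn, with a ReLU nonlinearity
    after every hidden layer."""
    for w in weights[:-1]:
        x = np.maximum(0.0, x @ w)  # hidden layer: linear map + ReLU
    return x @ weights[-1]          # output layer: linear map only

x = rng.standard_normal(4)
print(len(weights))                 # number of weight matrices
print(forward(x, weights).shape)    # output vector shape
```

Adding another entry to `layer_sizes` adds another hidden layer and another weight matrix — a deeper network.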

· · ·

LLMs such as ChatGPT are transformers; GPT stands for “generative pre-trained transformer.”