Here is a list of the best textbooks I have read on the following subjects:
- mathematics
- statistics
- machine learning
These are essential. A proper mathematical education is one full of analysis and algebra, and then some applications of these, such as group theory, differential equations, numerical analysis, and so on. All the following books provide that education.
A First Course in Real Analysis by M.H. Protter and C.B. Morrey.
Absolutely essential. Explains all the basic concepts of real analysis. It avoids the subtle delicacies of the actual construction of the continuum: there is little talk of (say) Dedekind cuts or rigorous constructions of the real numbers. Naturally there is also little talk of deeper concepts such as Zorn’s lemma or posets. But a good understanding of this book is essential in nearly every area of science that needs to use an epsilon sign. A fun exercise for books on real analysis: if a book does not give a construction of the real numbers, or does not explain one, it is likely to end up cheating at some point, i.e. assuming ideas that it is trying to prove.
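To illustrate what the “epsilon sign” means in practice, here is the standard epsilon–delta definition of a limit, as found in any real analysis text of this kind:

```latex
\lim_{x \to a} f(x) = L
\iff
\forall \varepsilon > 0 \;\, \exists \delta > 0 :
\quad 0 < |x - a| < \delta \implies |f(x) - L| < \varepsilon
```

Making statements like this precise, and learning to manipulate them, is essentially what the first half of such a book trains you to do.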
A good alternative is Introduction to Real Analysis by M. Stoll.
An Introduction to Complex Analysis by R.P. Agarwal, K. Perera, S. Pinelas.
Most complex analysis books are ‘ruined’ by giving a good theoretical introduction and then wandering off into some very applied problems and the theory specific to those problems. A lot of people just want to understand complex numbers. This book does that well and is organised as a series of ‘lectures’. Bonus points for its ending: it explains the Julia and Mandelbrot sets. A lot of people regard the Mandelbrot set as the most complicated mathematical object. The theory here is relatively light and introductory.
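The Mandelbrot set has a strikingly short definition, which is part of its appeal. A minimal sketch of the standard escape-time test (the iteration count and the toy points below are my own choices, not from the book):

```python
# A point c belongs to the Mandelbrot set if the iteration z -> z^2 + c,
# started at z = 0, stays bounded forever. In practice we iterate a fixed
# number of times and use the fact that once |z| > 2 the orbit must escape.

def in_mandelbrot(c: complex, max_iter: int = 100) -> bool:
    z = 0
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2:  # guaranteed to diverge from here
            return False
    return True

print(in_mandelbrot(0))   # the orbit of c = 0 is constant, so it is in the set
print(in_mandelbrot(1))   # the orbit 0, 1, 2, 5, ... escapes
```

Plotting `in_mandelbrot` over a grid of complex numbers is all it takes to draw the famous picture.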
The Elements of Integration and Lebesgue Measure, R.G. Bartle.
Quite simply the best introduction to measure theory. If you have a good understanding of real analysis (a nice way of saying it is a prerequisite), you will find this book a very enjoyable read. Bartle provides a fantastic structure and concepts are not rushed.
A good alternative is Measure Theory by P. Halmos, although that is a more complicated book. Introduction to Real Analysis by M. Stoll provides a concise introduction to the Lebesgue measure and the Lebesgue integral, which can be studied in a short period of time. An extreme alternative is Measure Theory by J. Doob, a considerably more difficult read because it states several ‘rare’ results in measure theory.
A Course in Functional Analysis, J. Conway.
Simply fantastic. Takes a long time to read but it is worth it. Conway makes functional analysis exciting. At times it is like a story and Conway is full of quips. A good alternative is Real and Functional Analysis by S. Lang.
Probability: Theory and Examples, R. Durrett.
Provides a measure-theoretic treatment of probability. Nothing else to say – it treats probability very well, although it is possible to find books that go far deeper. A good alternative is Probability Theory: Independence, Interchangeability, Martingales by Y.S. Chow and H. Teicher, which is deeper but frustrating to read if you do not have a good measure-theoretic background. For an easier read, Introduction to Probability Models by S. Ross, which has very little measure theory.
Stochastic Differential Equations: An Introduction with Applications by B. Oksendal
The only book on stochastic differential equations that you will ever need. Simple and concise.
Linear Algebra, S. Lang.
Perhaps even more important than analysis. Linear algebra is used everywhere; statistics and computer science are essentially subsets of it. Lang gives a great introduction. There are many alternatives!
Groups and Symmetry by M.A. Armstrong.
A fun read and good preparation for Galois theory or abstract algebra. Sometimes it does not get to the action quickly enough, but that is only because a lot of explanation is needed to understand the concept of symmetry, especially when trying to visualise it. Something fun to do whilst you read this book: get jelly beans and short wooden sticks and you can build the three-dimensional shapes yourself to see the rotational symmetries and so on!
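If jelly beans are not to hand, the same experiment works on a screen. A small sketch (my own toy example, not from the book) representing the rotational symmetries of a square as permutations of its corner labels, showing that composing rotations always gives another rotation:

```python
# The rotational symmetries of a square, written as cyclic permutations of
# its corner labels (0, 1, 2, 3). Composing rotations never leaves the set
# of rotations -- the closure property that makes them a group.

def rotate(corners):
    """One quarter-turn: the last corner cycles around to the front."""
    return corners[-1:] + corners[:-1]

square = (0, 1, 2, 3)
r1 = rotate(square)        # quarter turn
r2 = rotate(r1)            # half turn
r4 = rotate(rotate(r2))    # four quarter turns

print(r1)            # (3, 0, 1, 2)
print(r2)            # (2, 3, 0, 1)
print(r4 == square)  # True: four quarter turns is the identity
```

The four tuples you can reach this way form the cyclic group of order 4, the first example Armstrong spends real time on.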
Galois Theory by D.A. Cox.
Galois theory is beautiful but not easy to get into! All the more remarkable that Galois worked out most of it before his death at the age of 20. The book has a great introduction and fantastic exercises – Cox helps you solve them with well-selected hints. Some books are essentially just lecture notes glued together; this is clearly a book, and a good read. There are many alternatives, but a lot of them either miss the point or make things too complicated (by doing everything over the complex numbers).
Nearly all statistics books are to be avoided. Any elementary statistics book will make at least a few (if not many) severe mistakes. This is fixed by providing a mathematical treatment. Ok, some people do not care for mathematics, but a lot of what elementary statistics books say is shaky – some ideas are dubious, and some are completely false. The best books are those that give concise, quick explanations and define what they use. Most waffle on.
Theory of Statistics by M. J. Schervish.
A theoretical treatment of statistics. Prerequisites are measure theory, some real and functional analysis and linear algebra. Explains everything extremely well. Good exercises too. Good point: avoids the mistake that nearly all elementary statistics books make. How? By not being elementary. Bad point: quite expensive. A good alternative is Theory of Statistics by J.E. Gentle, which has the same idea in mind, but is perhaps not as good.
Mathematical Statistics and Data Analysis by J. A. Rice.
I think this is the best elementary statistics textbook around. Explains everything reasonably well. I don’t think any measure theory is needed. A lot of graduate courses in non-mathematical (or non-theoretical) statistics are based on this.
Statistical Inference by G. Casella, R.L. Berger
This book explains concepts such as point estimation, Bayesian inference, regression, and maximum likelihood estimation in a fairly rigorous manner. Not very theoretical, but it covers everything on inference, even a touch of machine learning and mathematics. Fantastic exercises too.
Time Series: Theory and Methods by P.J. Brockwell, R.A. Davis.
This is the only book I have ever found that provides a rigorous treatment of time series. I try to avoid all elementary time series books as they waffle on, are misleading, and are sometimes incorrect. Some knowledge of real analysis and measure theory is needed. If anyone can find a good alternative to this (there are many books that treat continuous-time series – they are just books on stochastic differential equations), please let me know.
Combine statistics with mathematics and you get machine learning. Programming is the language spoken in machine learning: mathematics is the rules, and statistics is the arena in which the game is played.
Kernel Methods for Pattern Analysis by J. Shawe-Taylor, N. Cristianini.
The best book I have read on kernels. Gives a good introduction to kernels (some functional analysis is needed) and provides many applications of them. Shawe-Taylor also has a video series on kernel methods if you search around on Google.
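To make the central object concrete: a kernel is just a similarity function, and kernel methods work entirely through the matrix of pairwise kernel values (the Gram matrix). A minimal sketch of the Gaussian (RBF) kernel, a standard example in any such book; the points and bandwidth below are my own toy choices:

```python
import math

# Gaussian (RBF) kernel: k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)).
# The Gram matrix of pairwise kernel values is what every kernel
# method actually computes with -- the data never appears directly.

def rbf(x, y, sigma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * sigma ** 2))

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
gram = [[rbf(p, q) for q in points] for p in points]

for row in gram:
    print([round(v, 3) for v in row])
# The diagonal is all 1.0 (each point is maximally similar to itself);
# the matrix is symmetric and positive semi-definite.
```

Proving that such matrices are always positive semi-definite, and what that buys you, is where the functional analysis prerequisite comes in.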
Support Vector Machines by I. Steinwart, A. Christmann.
Provides a rigorous (meaning measure-theoretic) description of support vector machines and a lot of other supervised learning techniques. Again, completely fantastic. Combine this with the previous book and you will see why kernels are so essential in a lot of different applications. The problem is the prerequisites: (1) data experience, (2) good mathematical analysis, and (3) statistical experience. Without (1) the book will seem ‘undirected’ or irrelevant in a lot of situations, without (3) it will be hard to apply a lot of the techniques, and without (2) you will not understand anything!
I have omitted a lot of other books on machine learning as most of them are plain garbage.
Books need to avoid waffling on and get to the point. Books that attempt to explain complex mathematical ideas – such as mathematical finance (Ito’s lemma, the Wiener process, local martingales, etc.) – through elementary methods end up failing completely. It is not that they are ‘tricking’ you (though some of the ‘proofs’ are simply incorrect), but that they are not explaining what the subject is actually about. For example, when I read Mathematics of Financial Derivatives by P. Wilmott and then read Mathematics of Arbitrage by F. Delbaen, I had to erase nearly everything I learned from the former, as a lot of it was wrong: not rigorous or even heuristic, just wrong. The infamous ‘proof’ of Ito’s lemma by a Taylor series expansion that somehow stops after the second-order term was the best example – it is completely wrong. One is then led to believe that stochastic integrals can be treated like ordinary integrals of derivatives, and this goes back to a problem we have always had in mathematics: not being careful. Cauchy shocked many of his contemporaries with his definition of a function, Weierstrass shocked Poincare by exhibiting an everywhere continuous yet nowhere differentiable function, and there are countless other examples. Want to know if someone has read a good analysis textbook? Just ask them how the chain rule works, or what a ‘differential’ is. The point is, that is why the best textbooks make you suffer a little bit: because you learn carefully.
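For contrast, here is the standard statement (not a proof) of Ito’s formula for a twice continuously differentiable f applied to a Brownian motion W_t:

```latex
df(W_t) = f'(W_t)\, dW_t + \tfrac{1}{2} f''(W_t)\, dt
```

The second term has no analogue in the ordinary chain rule, and it is exactly the term the hand-wavy Taylor-series argument fails to justify: making it rigorous requires the quadratic variation of Brownian motion, which is the machinery the careful books actually build.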
Someone wise once said that worthy problems prove their worth by fighting back. Well, textbooks are like that: they give you the rules early on and make them explicit (definitions, not bolded words in the middle of nowhere), they give you instructions (theorems), they tell you what to do (algorithms), and ultimately you challenge yourself by doing the exercises.