Here is a list of the best textbooks I have read on the following subjects:
- machine learning
- computer science
These are essential.
A First Course in Real Analysis by M.H. Protter and C.B. Morrey.
Explains all the basic concepts of real analysis. Avoids the delicate details of the actual construction of the continuum: there is little discussion of (say) Dedekind cuts, as opposed to rigorous demonstrations of the properties of the real numbers. Naturally there is also little discussion of deeper concepts such as Zorn’s lemma or posets. But a good understanding of this book is essential in nearly all areas of science that need to use an epsilon sign. Caveat for books on real analysis: if they do not give a construction of the real numbers or do not explain one, it is likely that the book will end up cheating at some point, i.e. using ideas that it is trying to prove.
A good alternative is Introduction to Real Analysis by M. Stoll.
An Introduction to Complex Analysis by R.P. Agarwal, K. Perera, S. Pinelas.
Most complex analysis books are ‘ruined’ by giving a good theoretical introduction and then wandering off to some very applied problems and the theory specific to those problems. A lot of people just want to understand complex numbers. This book does that well and is organised as a series of ‘lectures’. Bonus points for its ending: it explains the Julia and Mandelbrot sets. A lot of people regard the Mandelbrot set as the most complicated mathematical object. The theory here is relatively light and introductory.
The Elements of Integration and Lebesgue Measure, R.G. Bartle.
Quite simply the best introduction to measure theory. If you have a good understanding of real analysis (a nice way of saying it is a prerequisite), you will find this book a very enjoyable read. Bartle provides a fantastic structure and concepts are not rushed.
A good alternative is Measure Theory by P. Halmos, although that is a more complicated book. The book Introduction to Real Analysis by M. Stoll provides a concise introduction to the Lebesgue measure and the Lebesgue integral, which can be studied in a short period of time. An extreme alternative is Measure Theory by J. Doob, which is considerably harder to read because it states several ‘rare’ results in measure theory.
A Course in Functional Analysis, J. Conway.
Simply fantastic. Takes a long time to read but it is worth it. Conway makes functional analysis exciting. At times it is like a story and Conway is full of quips. A good alternative is Real and Functional Analysis by S. Lang.
Probability: Theory and Examples, R. Durrett.
Provides a measure-theoretic explanation of probability. Nothing else to say – it treats probability very well but it is possible to find books that go far deeper than this. A good alternative is Probability Theory: Independence, Interchangeability, Martingales by Y.S. Chow, H. Teicher, which is deeper but frustrating to read if you do not have a good measure-theoretic background. For an easier read, try Probability Models by S. Ross, which has very little measure theory.
Stochastic Differential Equations: An Introduction with Applications by B. Øksendal.
The only book on stochastic differential equations that you will ever need. Simple and concise.
Linear Algebra, S. Lang.
Perhaps even more important than analysis. Linear algebra is used everywhere. Statistics and computer science are essentially subsets of linear algebra. Gives a great introduction. There are many alternatives!
Groups and Symmetry by M.A. Armstrong.
A fun read and good preparation for Galois theory or abstract algebra. Sometimes it does not get to the action quickly enough, but that is only because a lot of explanation is needed to understand the concept of symmetry, especially when trying to visualise it. Something fun to do whilst you read this book: get jelly beans and short wooden sticks and make the three-dimensional shapes yourself to see the rotational symmetries and so on!
Galois Theory by D.A. Cox.
Galois theory is beautiful but not easy to get into! All the more remarkable that Galois managed to develop most of this by the age of 20. Has a great introduction and fantastic exercises – Cox helps you to solve the exercises with well-selected hints. Some books are essentially just lecture notes glued together. This is clearly a book, and it is a good read. There are many alternatives but a lot of them either miss the point or make things too complex (by doing everything in complex numbers).
From a mathematician’s point of view (that is, one of rigor, clear explanation, a lack of ambiguity, …) a lot of statistics books are to be avoided. Elementary statistics books make many severe mistakes. Some people do not care for mathematics, but some ideas expressed in elementary statistics books are just not true, and are sometimes completely false in practice. The best books are the ones that give concise, quick explanations and define what they use. Most waffle on.
Theory of Statistics by M. J. Schervish.
A theoretical treatment of statistics. Prerequisites are measure theory, some real and functional analysis and linear algebra. Explains everything extremely well. Good exercises too. Good point: avoids the mistake that nearly all elementary statistics books make. How? By not being elementary. Bad point: quite expensive. A good alternative is Theory of Statistics by J.E. Gentle, which has the same idea in mind, but is perhaps not as good.
Mathematical Statistics and Data Analysis by J. A. Rice.
I think this is the best elementary statistics textbook around. Explains everything reasonably well. I don’t think any measure theory is needed. A lot of graduate courses in non-mathematical (or non-theoretical) statistics are based on this.
Statistical Inference by G. Casella, R.L. Berger
This book explains concepts such as point estimation, Bayesian inference, regression, maximum likelihood estimation, etc., in a fairly rigorous manner. Not very theoretical but explains everything on inference, with even a touch of machine learning and mathematics. Fantastic exercises.
Time Series: Theory and Methods by P.J. Brockwell, R.A. Davis.
This is the only book I have ever found that provides a rigorous treatment of time series. I try to avoid all elementary time series books as they waffle on, are misleading and are sometimes incorrect. Some knowledge of real analysis and measure theory is needed. If anyone can find a good alternative to this (there are many books that treat continuous time series, but they are really just books on stochastic differential equations) please let me know.
Combine statistics with mathematics and you get machine learning. Programming is the language that is spoken in machine learning: mathematics is the rules and statistics is the arena in which it is played.
Kernel Methods for Pattern Analysis by J. Shawe-Taylor, N. Cristianini.
Best book I have read on kernels. Gives a good introduction to kernels, although some functional analysis is needed. Provides many applications of kernels. Shawe-Taylor also has a video series on kernel methods if you search around on Google.
Support Vector Machines by I. Steinwart, A. Christmann.
Provides a rigorous (meaning measure-theoretic) description of support vector machines and a lot of other supervised learning techniques. Again, completely fantastic. Combine this with the previous book and you will see why kernels are so essential in a lot of different applications. The problem is the prerequisites needed: (1) data experience, (2) good mathematical analysis and (3) statistical experience. Without (1) the book will seem ‘undirected’ or not relevant in a lot of situations, without (3) it will be hard to apply a lot of techniques and without (2) you will not understand anything!
There are many more books on machine learning but the ones presented here are, in my view, the best.
Someone wise once said “worthy problems prove their worthiness by fighting back”. Textbooks are like that: they give you the rules early on and make them explicit (definitions, not bolded words in the middle of nowhere full of hand-waving clutter), they give you instructions (theorems) and tell you what to do (exercises), and ultimately you challenge yourself.