D. J. C. MacKay was born in Britain in 1967. He received the B.A.\ degree in Natural Sciences (Physics) from Trinity College, Cambridge in 1988. He studied as a Fulbright scholar at Caltech for a doctorate which was awarded in 1992. After four years as the Royal Society Smithson Research Fellow at Darwin College, Cambridge, he became a lecturer in the Department of Physics of the University of Cambridge, where he is now a Professor. His publications include papers on Bayesian methods, adaptive models, error-correcting codes, and human-computer interfaces; and a textbook on Information Theory, Inference, and Learning Algorithms (Cambridge, 2003).
\bio{D. J. C. MacKay}{was born in Britain in 1967. He received the B.A.\ degree in Natural Sciences (Physics) from Trinity College, Cambridge in 1988. He then studied as a Fulbright scholar at Caltech for a doctorate which was awarded in 1992. After spending four years as the Royal Society Smithson Research Fellow at Darwin College, Cambridge, he became a lecturer in the Department of Physics at the University of Cambridge. He was promoted to a readership in 1999. He has published papers on a wide variety of topics including Bayesian methods, adaptive models and error-correcting codes.}
\begin{biography}{David J.C. MacKay} was born in Britain in 1967. He received the B.A.\ degree in Natural Sciences (Physics) from Trinity College, Cambridge in 1988. He then studied as a Fulbright scholar at Caltech for a doctorate which was awarded in 1992. After spending four years as the Royal Society Smithson Research Fellow at Darwin College, Cambridge, he became a lecturer in the Department of Physics of the University of Cambridge in 1995, where he is now a Professor. He has published papers on Bayesian methods for adaptive models, on the application of neural network methods to industrial data modelling problems, on language modelling and protein sequence modelling, on cryptanalysis and coding theory, and on Hebbian learning.
He is currently writing a textbook on information theory, inference and learning algorithms. \end{biography}
\begin{biography}{David J.C. MacKay} was born in Britain in 1967. He received the B.A.\ degree in Natural Sciences (Physics) from Trinity College, Cambridge in 1988. He then studied as a Fulbright scholar at Caltech for a doctorate which was awarded in 1992. After spending four years as the Royal Society Smithson Research Fellow at Darwin College, Cambridge, he became a lecturer in the Department of Physics of the University of Cambridge in 1995, where he is now a Professor. He has published papers on Bayesian methods for adaptive models, on language modelling and protein sequence modelling, on cryptanalysis and coding theory, and on human-computer interfaces; and a textbook on {\em Information Theory, Inference, and Learning Algorithms}. \end{biography}
Name: David J. C. MacKay
Title: Prof.
Affiliations: Does this mean membership of learned societies? If so, none. If this means `whom do I work for', then it's the Department of Physics, University of Cambridge. I am also a Fellow of Darwin College, Cambridge.
Address: Cavendish Laboratory, Madingley Road, Cambridge CB3 0HE, U.K.
Email: mackay@mrao.cam.ac.uk
WWW: http://www.inference.org.uk/mackay/
Tel: +44 1223 339852
My first research work was in 1986 at RSRE Malvern, where I was given the task of testing high precision digitizers statistically, by putting in uniformly distributed random voltages and looking at the distribution of digital read-outs.
That summer I sent a contribution to an Institute of Physics magazine giving a solution to the problem of constructing a spiral mirror. Other topics that amused me at that time were
I presented this work as a poster at a meeting at RSRE late in 1987. I also showed a poster on the idea of exploring constant energy surfaces in weight space for neural networks with more parameters than data points.
In 1988 I went to Caltech. My first research project was to investigate what was going on in Linsker's models of Hebbian learning. After a couple of months of work I teamed up with Ken Miller and published a series of papers on this.
I also did some work with Seymour Benzer on the expression of beta-galactosidase in Drosophila that had had a P element inserted in the genome. I looked at polka-dot expression patterns in the brains of larvae and adults and found a few interesting patterns, including one strain in which the stained cells appeared to be associated with the optic chiasm of the adult.
Someone in Benzer's group showed me the paper by Luria and Delbrück on estimating a mutation rate in an exponentially growing bacterial population. Their estimator was clearly unreliable, but at the time I had not yet fully absorbed Bayesian ideas, and all I did was suggest a collection of alternative estimators, some of which had better sampling properties.
I thought about Bayesian methods and Maximum Entropy on and off. I was aware of a difficulty with setting `alpha' and I came up with a derivation of `alpha=1' which Skilling rightly shot down as nonsense.
I came up with a maximum entropy / Bayesian viewpoint for Hopfield networks and solved a special case (a cycle-free network); I erroneously believed I had solved a more general case for a time.
At this time I did know that Bayesian methods could help with complexity control and model comparison. In 1990, and maybe earlier, I decided to write a review paper explaining how minimum description length and Bayesian methods were equivalent (if MDL was done carefully), so I wrote a paper on Occam factors. I also wrote software to do mixture-of-Gaussians modelling, and density modelling with some other distributions. For each model I fitted a Gaussian approximation to the posterior and computed the evidence.
Attending Maxent 90 was a key moment for me; there I consolidated what I knew about Bayesian complexity control, and David Robinson told me a lot of details about the hierarchical modelling for image reconstruction.
At the end of 1990 I decided to implement Bayesian methods for neural networks (something I had been telling MLP users to do for some time). By April 1991 I had written two papers on regression problems (`Bayesian Interpolation' and `A Practical Bayesian Framework...') which I presented at Snowbird (my first talk).
I decided to do classifiers next, and a friend encouraged me to do something on active data selection. My data selection chapter, which I rattled out rather hastily, contained errors and failures to cite the literature which were fortunately caught (with the help of a referee) before publication of my thesis.
Radford Neal got in touch with me on April 17th 1990 and we started our long-lasting discussion of Bayesian methods for neural nets, and many other topics. I visited Toronto for the first time in November just before NIPS.
Maxent 91 in Seattle was an enjoyable meeting, with stimulating discussions with John Skilling, who was presenting his `Clouds' algorithm. I gave a talk at NIPS in 1991, and organized a workshop with Steve Nowlan.
In 1992, back in Cambridge, I worked on defending my thesis against the attacks of David Wolpert and others. I did this by writing review papers, writing a paper on optimization versus `integrating out' hyperparameters, and entering the ASHRAE prediction competition using my software. I also looked into the immense size of chromatic aberration in the human eye.
At Snowbird in 1992, Peter Brown told me how IBM's `smoothing' method works for language modelling, and I worked out a Bayesian hierarchical model alternative.
I also started working on latent-variable models for proteins (density networks).
In 1993, my paper on the ASHRAE prediction competition (which I won, using `Automatic Relevance Determination') and my paper with Radford on Automatic Relevance Determination were both rejected by NIPS. I don't recall now whether these were my first rejections. There were indeed other rejections at this time, for example of grant applications. I still find it irritating how good ideas can be hindered at the refereeing stage just because presentational hoops haven't been jumped through.
In 1993 I was asked to look at a cryptanalysis problem; this brought me into information theory and coding theory. I came up with a variational free energy minimization algorithm motivated by Cheeseman's description of solving the colouring problem at Maxent 90. This marked the start of my work on `ensemble adapting' methods (also known as variational methods), which I was taught about by Radford.
Radford Neal and I then discussed how to use this solution to solve new problems. We were interested in the challenge of getting closer to the Shannon limit for error-correcting codes. We invented some new codes, of which MN codes seemed especially interesting: they use a sparse source to introduce redundancy, instead of extra parity bits. In July we came up with the belief propagation decoding method. In November 1994 we came up with what turned out to be Gallager's codes, and in December 1994 we realised we had rediscovered Gallager's 1962 work. Along the way, I implemented and tested MN codes that in due course became known as `repeat-accumulate codes'. I didn't publish this construction because I detected that these codes had small numbers of low-weight codewords, and I was a perfectionist: I only wanted to publish codes that had no detectable low-weight codewords. This may have been a mistake.
In 1995, I wrote a paper on Ensemble Learning and Evidence Maximization.
In 1996, Sejnowski spoke about independent component analysis (ICA). Shortly thereafter, I wrote down a (much simpler?) maximum-likelihood derivation of ICA. In Erice in 1996, I had the idea of combining Jordan and Jaakkola's variational methods with Gaussian process classifiers, and got Mark Gibbs to implement it.
In 1997, I wrote a paper on Ensemble Learning for Hidden Markov Models.
Meanwhile, back in the error-correcting code business, my work with Radford led to a revival of interest in Gallager's codes, and in 1999, Gallager received a gold medal at the information theory symposium. Matthew Davey and I came up with some enhancements to Gallager codes that turned them into record-breakers. We demonstrated that Gallager codes could outperform Reed-Solomon codes, and IBM are now considering using Gallager codes in disc drives.
From 1995 to 2003 I wrote a 640-page textbook on Information Theory, Inference, and Learning Algorithms. It started out as a tiny, elegant(?) 8 chapters for an 8-lecture course on Information Theory; the book gradually expanded as I taught a 16-lecture course in the Physics department, which motivated the addition of chapters on neural networks, variational methods and Monte Carlo methods (borrowing heavily from expositions by David Spiegelhalter, Wally Gilks, and Radford Neal). The book's growth was partly driven by complaints that the brief exposition was too brief, so I felt obliged to fill in omitted steps and arguments. Guess what? I then received complaints about the filled-in steps and arguments, so those had to be expanded too. That was one form of tumour that hindered completion of the book. The book also grew because of my lack of self-control: I recklessly added new topics. For me, everything is connected, and it was great fun to include all the things I was interested in. For example, rather than put my paper on evolution, sex, and information theory through the inevitable tribulations of submission to a conventional journal, I thought, hey, just slip it in the book! The third reason the book grew and took 8 years was that, by chance, I wrote it at a time of great change in Information Theory. When I started the book, I wrote about the state of the art in error correction, which was JPL's hugely expensive Galileo code. By the time I finished, the codes Matthew Davey and I had developed had matched the Galileo code, as had Repeat-Accumulate codes and Turbo codes. Another revolution at the same time was the invention of Digital Fountain Codes, which I was able to squeeze in as a final chapter just before sending the book to the publisher. All these new developments belonged perfectly in the book, since its overarching theme was the connection between machine learning and information theory.
Since 1999, a major new research project has been the development in my group of Dasher, a keyboard alternative intended to be information-efficient, both by making more efficient use of human gestures (so that only one finger or one eye is needed to communicate rapidly) and by using integrated language models that exploit the predictability of one's language.
In December 2000, my research group won Hopfield and Brody's `mouse brain' competition.
In 2001, I started working with Graeme Mitchison on transferring successful ideas from the field of classical error-correcting codes to the next-door field of quantum error correction.
In 2002 I co-organized the Darwin College Lecture Series (`Power') with Alan Blackwell. I learned to use MetaPost and wished I had used it for all the figures in my book.
In 2002, the Gatsby Charitable Foundation gave me a Senior Research Fellowship to allow me to devote more time to research. Fingers crossed! New interests include computation using spike timing in networks of spiking neurons, and Go-playing algorithms.
In 2003 I finished my textbook on Information Theory, Inference, and Learning Algorithms.
I continued to spend lots of time managing the Dasher project and increasing the number of languages supported by Dasher.
In 2003, with my brother Robert, I developed an explicit theory of how biological systems such as actin/myosin convert chemical energy efficiently into work. I also became involved in AIMS, a new institute in South Africa providing a one-year course for African graduates in mathematical sciences. I spent roughly 8 weeks per year there in the academic years 2003-04, 2004-05, and 2005-06.
In 2003 I made a usable breath-mouse and demonstrated that writing at 12 words per minute was possible by breath alone. About this time, Chris Ball and I developed several versions of `Button Dasher', described in the paper `Efficient communication through one or two buttons'.
In 2004 I had an idea about how the brain works, which I call Distributed Phase Codes.
In 2005 Alan Blackwell and I started a joint project called talks.cam, aimed at reducing the labour of being a seminar-series organizer and enhancing cross-disciplinary interactions in the university. We were joined by Phil Cowans, Duncan Simpson, and Tom Counsell, and made a system which we hope will be used by all research groups in the university.
In 2005 I became interested in the global energy crisis and started writing a popular book tentatively titled `You Figure It Out!'
In 2005 we made a first prototype of a single-button version of Dasher. I can write at 10 words per minute with one button.
In 2005 I attended `Closing the Gap' and invented a way in which someone who is totally blind and has control of only one or two buttons could plausibly communicate using a Dasher-like system. I made a prototype called VIDasher.
In 2005, Tadashi Tokieda introduced me to a visual illusion called `anti-Glass' or the Trefethen effect. I spent some time working on this illusion.
As of 2006, my research activities include information-efficient sorting algorithms, reinforcement learning, and Monte Carlo methods, as well as an old project on the counting of crosswords.