# Occam’s Razor, Leprechauns and The Search for Truth

William of Occam (also spelled William of Ockham) was a 14^{th} century English philosopher. At the heart of Occam’s philosophy is the principle of simplicity, and Occam’s Razor has come to embody the method of eliminating unnecessary hypotheses. Essentially, Occam’s Razor holds that the theory which explains all (or the most) while assuming the least is the most likely to be correct. This is the principle of parsimony – explain more, assume less. Put more elegantly, it is the principle of ‘pluritas non est ponenda sine necessitate’ (plurality must never be posited beyond necessity).

Empirical support for the Razor can be drawn from the principle of ‘overfitting.’ In statistics, ‘overfitting’ occurs when a statistical model describes random error or noise instead of the underlying relationship. Overfitting generally occurs when a model is excessively complex, such as having too many parameters relative to the number of observations. Critically, a model that has been overfit will generally have poor predictive performance, as it can exaggerate minor fluctuations in the data. For example, a complex polynomial function might after the fact be used to pass through each data point, including those generated by noise, but a linear function might be a better fit to the signal in the data. By this we mean that the linear function would predict new and unseen data points better than the polynomial function, although the polynomial which has been devised to capture signal and noise would describe/fit the existing data better.

Turning now to ‘ad hoc’ hypotheses and the Razor. In science and philosophy, an ‘ad hoc hypothesis’ is a hypothesis added to a theory in order to save it from being falsified. Ad hoc hypothesising is compensating for anomalies not anticipated by the theory in its unmodified form. For example, you say that there is a leprechaun in your garden shed. A visitor to the shed sees no leprechaun. This is because he is invisible, you say. He spreads flour on the ground to see the footprints. He floats, you declare. He wants you to ask him to speak. He has no voice, you say. More generally, for each accepted explanation of a phenomenon, there is generally an infinite number of possible, more complex alternatives. Each true explanation may therefore have had many alternatives that were simpler and false, but also approaching an infinite number of alternatives that are more complex and false.

This leads us the idea of what I term ‘Occam’s Leprechaun.’ Any new and more complex theory can always be possibly true. For example, if an individual claims that leprechauns were responsible for breaking a vase that he is suspected of breaking, the simpler explanation is that he is not telling the truth, but ongoing ad hoc explanations (e.g. “That’s not me on the CCTV, it’s a leprechaun disguised as me) prevent outright falsification. An endless supply of elaborate competing explanations, called ‘saving hypotheses’, prevent ultimate falsification of the leprechaun hypothesis, but appeal to Occam’s Razor helps steer us toward the probable truth. Another way of looking at this is that simpler theories are more easily falsifiable, and hence possess more empirical content.

All assumptions introduce possibilities for error; if an assumption does not improve the accuracy of a theory, its only effect is to increase the probability that the overall theory is wrong.

It can also be looked at this way. The prior probability that a theory based on n+1 assumptions is true must be less than a theory based on n assumptions, unless the additional assumption is a consequence of the previous assumptions. For example, the prior probability that Jack is a train driver must be less than the prior probability that Jack is a train driver AND that he owns a Mini Cooper, unless all train drivers own Mini Coopers, in which case the prior probabilities are identical.

Again, the prior probability that Jack is a train driver and a Mini Cooper owner and a ballet dancer is less than the prior probability that he is just the first two, unless all train drivers are not only Mini Cooper owners but also ballet dancers. In the latter case, the prior probabilities of the n and n+1 assumptions are the same.

From Bayes’ Theorem, we know that reducing the prior probability will reduce the posterior probability, i.e. the probability that a proposition is true after new evidence arises.

Science prefers the simplest explanation that is consistent with the data available at a given time, but even so the simplest explanation may be ruled out as new data become available. This does not invalidate the Razor, which does not state that simpler theories are necessarily more true than more complex theories, but that when more than one theory explains the same data, the simpler should be accorded more probabilistic weight.

The theory which explains all (or the most) and assumes the least is most likely. So Occam’s Razor advises us to keep explanations simple. But it is also consistent with multiplying entities necessary to explain a phenomenon. A simpler explanation which fails to explain as much as another more complex explanation is not necessarily the better one. So if leprechauns don’t explain anything they cannot be used as proxies for something else which can explain something. This is the classic riposte to the materialist who holds that there is nothing beyond what we observe in the natural or material world. If a non-materialist explanation better explains the origin of the universe, for example, that explanation may be true and consistent with Occam’s Razor. I explore this issue separately in my blog – ‘Why is there Something Rather than Nothing? A Solution’.

More generally, we can now unify Epicurus and Occam. From Epicurus’ Principle we need to keep open all hypotheses consistent with the known evidence which are true with a probability of more than zero. From Occam’s Razor we prefer from among all hypotheses that are consistent with the known evidence, the simplest. In terms of a prior distribution over hypotheses, this is the same as giving simpler hypotheses higher a priori probability, and more complex ones lower probability.

From here we can move to the wider problem of induction about the unknown by extrapolating a pattern from the known. Specifically, the problem of induction is how we can justify inductive inference. According to Hume’s ‘Enquiry Concerning Human Understanding’ (1748), if we justify induction on the basis that it has worked in the past, then we have to use induction to justify why it will continue to work in the future. This is circular reasoning. This is faulty theory. “Induction is just a mental habit, and necessity is something in the mind and not in the events.” Yet in practice we cannot help but rely on induction. We are working from the idea that it works in practice if not in theory – so far. Induction is thus related to an assumption about the uniformity of nature. Of course, induction can be turned into deduction by adding principles about the world (such as ‘the future resembles the past’, or ‘space-time is homogeneous.’) We can also assign to inductive generalisations probabilities that increase as the generalisations are supported by more and more independent events. This is the Bayesian approach, and it is a response to the perspective pioneered by Karl Popper. From the Popperian perspective, a single observational event may prove hypotheses wrong, but no finite sequence of events can verify them correct. Induction is from this perspective theoretically unjustifiable and becomes in practice the choice of the simplest generalisation that resists falsification. The simpler a hypothesis, the easier it is to be falsified. Induction and falsifiability are in practice, from this viewpoint, is as good as it gets in science. Take an inductive inference problem where there is some observed data and a set of hypotheses, one of which may be the true hypothesis generating the data. The task then is to decide which hypothesis, or hypotheses, are the most likely to be responsible for the observations.

A better way of looking at this seems to be to abandon certainties and think probabilistically. Entropy is the tendency of isolated systems to move toward disorder and a quantification of that disorder, e.g. assembling a deck of cards in a defined order requires introducing some energy to the system. If you drop the deck, they become disorganised and won’t re-organise themselves automatically. This is the tendency in all systems to disorder. This is the Second Law of Thermodynamics, which implies that time is asymmetrical with respect to the amount of order: as the system, advances through time, it will statistically become more disordered. By ‘Order’ and ‘Disorder’ we mean how compressed the information is that is describing the system. So if all your papers are in one neat pile, then the description is “All paper in one neat pile.” If you drop them, the description becomes ‘One paper to the right, another to the left, one above, one below, etc. etc.” The longer the description, the higher the entropy. According to Occam’s Razor, we want a theory with low entropy, i.e. low disorder, high simplicity. The lower the entropy, the more likely it is that the theory is the true explanation of the data, and hence that theory should be assigned a higher probability.

More generally, whatever theory we develop, say to explain the origin of the universe, or consciousness, or non-material morality, must itself be based on some theory, which is based on some other theory, and so on. At some point we need to rely on some statement which is true but not provable, and so we think may be false, although it is actually true. We can never solve the ultimate problem of induction, but Occam’s Razor combined with Epicurus, Bayes and Popper is as good as it gets if we accept that. So Epicurus, Occam, Bayes and Popper help us pose the right questions, and help us to establish a good framework for thinking about the answers.

At least that applies to the realm of established scientific enquiry and the pursuit of scientific truth. How far it can properly be extended beyond that is a subject of intense and continuing debate.

Further Reading and Links

Bayes’ Theorem: The Most Powerful Equation in the World. https://leightonvw.com/2017/03/12/bayes-theorem-the-most-powerful-equation-in-the-world/

Why is there Something Rather than Nothing https://wordpress.com/post/leightonvw.com/639