09 December, 2020

Most Functions with Predictive Power in the Natural Sciences are Non-Differentiable

Epistemic status: highly uncertain.

Recently, Spencer Greenberg posted three useful facts about mathematics:

This generated a bit of discussion on facebook:

Here's the most useful mathematics I can fit in 3 short paragraphs (see image). -- Note: each week, I send out One...

Posted by Spencer Greenberg on Friday, December 4, 2020

In one of the comment threads, I put forward what I thought was an uncontroversial thought: although it is true that most of the useful mathematics in the natural sciences is differentiable, this is not because the useful math happens to also be differentiable, but because we can (mostly) only make sense of the differentiable stuff, so that's the stuff we find useful. This is a weak anthropic argument that merely makes the statement partially vacuous. (It's like saying that I, who read only English, find that most philosophy useful to me is written in English. It's true, but not because there is a deep relationship between useful philosophy and its being written in English.)

It turns out that this was not considered an uncontroversial thought:


However, I also received a number of replies that indicated that I did a poor job of explaining my position in facebook comments. (And I wanted to ensure that I wasn't making some critical mistake in my thinking after hearing so many capable others dismiss the idea outright.) To fix this, I decided to organize my thoughts here. Please keep in mind that I'm not certain about the second section on math in the natural sciences at all (although I think the first section on pure math is accurate), and in fact I think that, on balance, I'm probably wrong about this essay's ultimate argument. But whereas my confidence level is maybe around 20% for this line of thinking, I'm finding that others are dismissing it completely out of hand, and so I find myself arguing for its validity, even if I personally doubt its truth. (In the face of uncertainty, we need not take a final position (unless it's moral uncertainty), but we should at least understand other positions enough to steelman them.)


Mathematics Encompasses More Than We Can Know

Before we talk about the natural sciences, let's look at something simpler: really big numbers. When it comes to counting numbers, it's relatively easy to think of big ones. But unless you're a math person, you may not fully appreciate just how big they can get. It's easy to say that the counting numbers go on forever, and that they eventually become so large that it becomes impossible to write them down. Yet it's actually stranger than that: they eventually get to be so big that they can't be thought of at all (except by indirect descriptions like this one). As a simple example, consider that there must exist a smallest whole number that can't, even in principle, be thought of by a human being. Graham's number, for example, is big enough that if you were somehow able to hold its base-ten digits in your brain, the sheer amount of information would quite literally collapse your head into a black hole. Yet we can still talk about it; I just did earlier, when I called it Graham's number. The thing is: the counting numbers keep going, so eventually you reach a number so big that its informational content cannot be expressed without exceeding the maximum amount of entropy that the observable universe can hold.
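To get a feel for how fast this happens, here's a small Python sketch (my own illustration, not from the original discussion): even 3↑↑4, a microscopic stepping stone on the way to Graham's number, can be named in one line of code, yet its decimal expansion runs to trillions of digits. We can compute how many digits it has without ever being able to write them all out.

```python
import math

# 3↑↑4 means 3^(3^(3^3)): a power tower of four threes.
# Graham's number dwarfs this unimaginably, but even this
# number's decimal expansion is far too long to print.
tower3 = 3 ** 3 ** 3  # 3^(3^3) = 3^27 = 7,625,597,484,987

# Digit count of 3↑↑4 = 3^tower3, via logarithms -- we never
# touch the number itself, only a description of its size.
digits = math.floor(tower3 * math.log10(3)) + 1
print(digits)  # roughly 3.6 trillion decimal digits
```

The point mirrors the essay: the name "3↑↑4" is a compact, systematic description; the number itself is already beyond direct representation.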

Opening one's eyes to this helps with the following realization: not all numbers are nameable. Somehow, despite being an amateur interested in math for most of my life, and despite thinking I understood Cantor's diagonal argument after reading through it several times, teaching it to others, and discussing it often, I recently learned that I had missed something basic about it that wasn't made clear to me before:

Scott Aaronson's excellent explanation of this really hits home. The parts of the number line that we can name are but countable dust amid the vast majority of points that we have no way of writing down systematically. We can only vaguely gesture toward them when making mathematical arguments, and can only really make statements that apply to zero, one, or infinitely many of them at once. We can, for example, say that a random real number picked between 0 and 1 has certain properties, but if we try to say which number it is, we must use some kind of systematic method to point it out, like 1/3 = 0.3 repeating.
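Here is a toy sketch of the diagonal argument itself (my own illustration, not from Aaronson's post): given any claimed enumeration of reals in [0, 1] as digit strings, we can mechanically build a number that differs from the n-th entry in its n-th digit, so it cannot appear anywhere in the list.

```python
# Toy Cantor diagonalization: build a decimal that escapes any
# claimed list of reals in [0, 1] given as digit strings.
def diagonal_escape(decimals):
    """decimals: list of digit strings, e.g. '3333' for 0.3333..."""
    digits = []
    for n, d in enumerate(decimals):
        digit = int(d[n])
        # Shift by 5 mod 10: guaranteed to differ from digit n
        # of entry n, sidestepping 0.4999... = 0.5 edge cases.
        digits.append(str((digit + 5) % 10))
    return "0." + "".join(digits)

claimed = ["3333", "1415", "7182", "5000"]
escaped = diagonal_escape(claimed)
print(escaped)  # differs from entry n at its n-th digit
```

No matter how the list is chosen, the escaped number is missing from it, which is exactly why the nameable reals are only countable dust.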

Something similar is true when it comes to functions. Most functions, by far, are not nameable. They are relations between sets that follow no pattern that makes sense to humans. For a finite example, consider the domain X = {a, b, c} and the codomain Y = {d, e, f}. We can construct a function f that maps X ➝ Y in pretty much any way we please. Each function we create this way is nameable, but only because it is finite. Imagine instead doing this over an infinite domain, with each input going to an arbitrary output. Out of all possible functions mapping ℝ to itself, almost none are continuous, and thus almost none are differentiable. Almost all of them are not even constructible in any systematic way. They are, ultimately, not really understandable by us humans right now, which is why we don't really have people doing math work on them at all.
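The finite case is small enough to enumerate outright. A short sketch (my own illustration): every function from X to Y is just an independent choice of output for each input, giving |Y|^|X| = 3^3 = 27 functions in total.

```python
from itertools import product

# Enumerate every function from X = {a,b,c} to Y = {d,e,f}:
# one output choice per input, so |Y|^|X| = 3^3 = 27 in all.
X = ["a", "b", "c"]
Y = ["d", "e", "f"]
functions = [dict(zip(X, outputs)) for outputs in product(Y, repeat=len(X))]
print(len(functions))  # 27
```

The same counting idea is what breaks down in the infinite case: with an uncountable domain like ℝ, the space of all functions is vastly bigger than the countable handful we can ever systematically describe.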


Mathematics in the Natural Sciences

So far, we've established that, in pure mathematics at least, the vast majority of functions are not understandable by humans today. Thankfully, we understand a lot about differentiable functions (and some other easily constructible ones, like piecewise combinations of differentiable functions joined at kinks, step functions, &c.). As has been pointed out previously, the natural world uses differentiable functions all over the place. Modern physics is awash with these types of functions, and they all do an extraordinary job, giving us an amazing amount of predictive power across the spectrum from the very large to the very small. Nothing in what I'm about to say takes anything away from that in the least.

But it occurs to me that although it is uncontestably true that almost all of the useful-to-us functions governing the natural world around us are also differentiable functions, it may be that this is true for anthropic reasons, not because of some underlying feature of ultimately-useful-functions-in-the-natural-sciences themselves.

I'm not at all sure that this is actually true, but it doesn't seem to contradict anything I know to suppose that there may be a great many functions governing the natural world that aren't differentiable, and that the only reason we don't use them in the natural sciences is that we can't currently understand them. They are uncountable dust, opaque to us, even if, one day, our understanding of mathematics and natural science improves enough that we may eventually use these functions to make predictions just as we currently use differentiable ones. In short: the reason why almost all useful functions are differentiable may be that we can only usefully read differentiable functions. It is not (necessarily) that the useful functions in the natural world all happen to be differentiable.


Ockham's Razor

One counterargument given to me in the facebook thread involves Ockham's razor:


The objection is that even if nothing rules this supposition out, we shouldn't believe it: by Ockham's razor, we should prefer the hypothesis that doesn't include these extra, not-yet-discovered non-differentiable functions with predictive power over the natural world.

Before I respond to this, I feel that I have to first look more closely at what Ockham's razor actually does. I'll quote myself from Why Many Worlds is Correct:

The law of parsimony does not refer to complexity in the same way that we use the word in common usage. Most of the time, things are called "complex" if they have a bunch of stuff in them, and "simple" if they have relatively less stuff. But this cannot possibly be what Occam's razor is referring to, since we all gladly admit that Occam's Razor does not imply that the existence of multiple galaxies is less likely to be true than just the existence of the Milky Way alone.

Instead, the complexity referred to in Occam's razor has to do with the number of independent rules in the system. Once Hubble measured the distance to Cepheid variables in other galaxies, physicists had to choose between a model where the laws of physics continue as before and a model where they added a new law saying Hubble's Cepheid variable measurements don't apply. Obviously, the model with the smaller number of physical laws was preferable, given that both models fit the data.

Just because a theory introduces more objects says nothing about its complexity. All that matters is its ruleset. Occam's razor has two widely accepted formulations, neither of which care about how many objects a model posits.

Solomonoff inductive inference does this by defining a "possible program space" and giving preference to the shortest program that predicts the observed data. Minimum message length improves on the formalism by including both the data and the code in a message, and preferring the shortest message. Either way, what matters is the number of rules in the system, not the number of objects those rules imply.
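To make the message-length idea concrete, here's a toy sketch (my own illustration, with deliberately naive string encodings, not a real MML codec): a model is preferred when describing the rule plus the data-given-the-rule is shorter than transmitting the raw data.

```python
# Toy minimum-message-length comparison: the "model" message
# (a rule, plus any residuals) competes with sending raw data.
data = [7] * 100

# Message 1: no model at all, just the raw observations.
raw_message = ",".join(str(x) for x in data)

# Message 2: a one-rule model with no residuals left to send.
model_message = "repeat 7 x100"

print(len(raw_message), len(model_message))  # 199 13
```

Note what the comparison penalizes: the length of the description, i.e. the ruleset. A model positing vastly many objects (many galaxies, many functions) pays nothing so long as they follow from the same short rule.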

What's relevant here is that while this argument does introduce vast new entities, in the form of currently un-understandable functions that may have predictive power, it does not introduce a new rule in doing so. Those un-understandable functions certainly exist; they're just not studied, because studying them wouldn't be useful. So the question is: does saying that they might have predictive power introduce a new hypothesis? Or does it make more sense to say that of course some of them have predictive power, and we just can't use them to predict things because we don't understand them? If the former, then Ockham's razor acts against this supposition; if the latter, then Ockham's razor acts against those who would claim that these functions can't have predictive power.

It's unclear to me which of these is the case. I don't want to play reference class tennis about this, but the latter certainly feels true to me. An analogy with Borges's Library of Babel certainly suggests that an infinite number of these non-differentiable real-world functions will have predictive power (though maybe not explanatory power?), but this isn't sufficient to say that MOST functions with predictive power are non-differentiable. I think that probably most functions with predictive power are in fact differentiable -- but I'm not at all certain about this, and that's why I'm arguing for the other side. I think that others are wrong to so quickly dismiss the idea that most functions with predictive power might be non-differentiable. They're probably correct in thinking that it's wrong, but the certainty with which they think it is wrong seems very off to me. Hopefully, after reading this blog post, you might agree.


edit on 10 December 2020: Neural Nets

Originally I ended this blog post with the previous paragraph, but Ben West points out that neural nets contain black-box functions very like the ones I've described, which make actual real-world predictions:


My confidence in this idea has increased upon realizing that there already exist at least some functions, not known to be differentiable or not, that definitely have predictive power. It's important to point out that it's still possible that this thesis is wrong; it may be that the black-box functions neural nets find are all differentiable, and, in fact, that still seems likely to me, but I definitely now give more credence to the idea that some might not be.
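As a small concrete illustration (my own, not from Ben's comment): the ReLU activation used throughout modern neural nets is itself continuous but not differentiable at zero, as a one-sided slope check makes plain. This only shows non-differentiability at isolated points, so it doesn't settle the larger question, but it shows kinks are already everywhere in these models.

```python
# ReLU, the workhorse activation of modern neural nets, is
# continuous everywhere but not differentiable at 0: the
# one-sided slopes disagree there.
def relu(x):
    return max(0.0, x)

h = 1e-6
left_slope = (relu(0.0) - relu(-h)) / h   # slope from the left: 0
right_slope = (relu(h) - relu(0.0)) / h   # slope from the right: 1
print(left_slope, right_slope)
```

In practice, frameworks simply pick a convention for the gradient at the kink, which is part of why trained networks can be treated as differentiable "enough" even when they strictly aren't.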
