# I've Been Stretching My Mouth

June 12, 2013

My friend Brian posted on social media:

On a brief break from proposal-ing, I was pondering the statement in computer science that “a function is entirely defined by its inputs and outputs.” At first I thought, “yep, that sounds about right. In the sort of idea of function-as-mapping, two identical mappings should be the same function. I think. But then I thought, “but I can write two functions to compute the fibonacci sum up to a given integer, and they may have vastly different orders of computation time. And, at least I think in principle, I might be able to write two algorithms with the same order of computation time but different memory needs. Are those all really the same function? Is the input-output mapping sufficiently descriptive to stand for the entire DNA of a function?”

I’m not sure if all of the words I wrote were strictly necessary to convey my point, but I love computational complexity and I found that once I got going I was making connections that weren’t obvious to me when I simply stated what my opinion was.

Let’s set aside the issue of side effects and state variables and deal with pure functions. We know we can always rewrite a program that has state and functions with side effects into a program that is composed of pure functions that are stateless (though perhaps might involve closures). Finally, let’s also set aside uniqueness issues by demanding that the inputs to our functions belong to some set that is at most countably infinite, so that the pairs are enumerable and we don’t have to worry.

With these caveats aside, to me, the issue raised strikes at the core of distinguishing definability, computability, and complexity. Any mathematical mapping for which a computer scientist can write a program (lambda calculus, pseudo code, C++, whatever) is called computable.

I cannot in any way tell you a non-definable function. If I could tell you (or even come up with one myself), I’d have defined it. A definable function is just some big list of (input output) pairs. But definability is important because not all definable functions are computable. I’m not sure definability is the right technical term. But whatever.

It’s hard to think of specific mathematical functions that are definable but aren’t computable; the canonical one the Halting Problem: there is no function that can take a description of a Turing machine and an initial input tape and tell you if the machine halts or loops forever. So, even though there exists an (input, output) pairing for this function ({turing machine description, tape-input}, output-or-LOOPS_FOREVER) that entirely defines the function, nobody knows how to mechanically generate such a list of pairs. So while the list of (input, output) pairs is enough, in a mathematical sense, to entirely define a function, that function might not be computable. Another example of a well-defined but uncomputable function is the busy beaver function. It’s also merely a list of (input, output) pairs, but it is known that no algorithm can tell you how to generate such a list (because you could then solve the Halting Problem). See also Scott Aaronson’s essay Who Can Name the Bigger Number?

That doesn’t mean that no such list exists. Such a list of pairs certainly exists—we even know the first few pairs. If some oracle handed us a list of busy beaver numbers, that would be quite interesting. For two reasons: first, how did it get the list?An out-of-bounds question for both computational and mystical oracles Second, let’s run a theorem-proving program on all sorts of stuff–either we’ll find a proof or know the input string we fed in is not a theorem. Creating the list of the busy beaver numbers is at least as hard as doing all of mathematics.

The fibonacci sequence is obviously definable. In the sense of computability, Fib(n) doesn’t care about how long you’ve got or how much hard drive space you need or whatever, so your different implementations of Fib(n) all show that Fib is computable equally well.

The computational complexity of Fib(n) is not a property of the list of (input, output) pairs. Indeed, asking what the computational complexity of a function is sort of assumes you don’t have such a list, because otherwise there’s always an algorithm that’s very fast: look up the output in the list! Fastness is under the assumption that the list is ordered by the inputs, but the point remains. Let’s exclude this algorithm from consideration. One reason is that that algorithm doesn’t give a finite upper bound on Kolmogorov complexity, which means, in my opinion, that even though you wrote down an algorithm you didn’t grok the function.

Instead complexity is a property of the algorithms themselves. Assuming that the cost of the arithmetic is constant even for large numbers and we have enough floating point precision, the obvious recursive method is exponentially expensive, the iterative method linear in n, and the best method actually only costs $\log(n)$. People might say something like “Fib(n) takes $\log(n)$” which can be desugared into “The Best Possible Implementation® of Fib(n) takes $\log(n)$” where ® is a special form conveying the agreed-upon caveat “aside from already having the answer and just looking it up”. I always know how to make a function take longer

(define slowfunc (lambda (n) (begin (pause (anyPositiveFunctionOf n)) (func n))))


but unfortunately I don’t know how to develop a working implementation of (pause (anyNegativeFunctionOf n)); indeed, some levels of the complexity hierarchy collapse if you have access to a closed timelike curveClosed timelike curves can cause other concerns, because you can wind up with paradoxes such as the grandfather paradox. Interestingly, quantum mechanics cures these issues! . What I’m driving at is that if you want to define the complexity of a function itself and not just an algorithm, one sensible way is to ascribe it the minimum over all possible correct algorithms.

If you’re a mathematical constructivist, you might object to the idea that the busy beaver exists in any meaningful sense, since nobody can tell you how to make such a list. If you’re even more “practical” than a constructivist you might decide that only functions whose minimum computational complexity is small enough are useful/good/handy/etc. In particular functions whose min complexity are polynomial in the size of the input are considered tractable. These problems form the class FP. Under polynomial time reductions, these functions are all equivalent. For functions that are easier than FP, you’ve got to use reductions that take less time, but for functions that are harder than FP, continuing to use polynomial reductions seems to make sense since such reductions are fast to make.

SO FINALLY let me get to the question about tuples like ((input/output pairs) big-O space-requirements). First, the only thing I can meaningfully imagine adding to this list to characterize an algorithm is how many random bits are required as a function of n—which is not quite the same as allowing for state, though it kind of feels similarly disruptive to the notion of a pure function. Even pure functions, though, can have probabilistic algorithms that compute them. You might also want to characterize how the algorithm parallelizes.

I would say that the only really meaningful function-describing tuples are tuples like ((input/output pairs) complexity-class), while the only meaningful algorithm-describing tuples are tuples like ((i/o) big-O space randomness). Two algorithms that are the same in big-O are, in broad strokes, usually implementing the same idea, while two functions differing in big-O might be in the same complexity class but are nonetheless really different ways of thinking about the problem. A great example is sorting. Insertion and selection sort are the same $\mathcal{O}(n^2)$ and really embody the same way of thinking: go get the next one from the remaining ones. Contrast them with merge sort $\mathcal{O}(n \log n)$ and radix sort $\mathcal{O}(\text{numDigits} * n)$ which are qualitatively different thinking: divide-and-conquer and wait-what-oh-that’s-cool. So even though these algorithms are all poly time, I’d classify the first two as the same and the others as different.

June 12, 2013