V1

2022/01/07阅读：46主题：红绯

# Big-Oh Notation 大 oh 符号

What does it mean to be efficient? Cornell professors Bobby Kleinberg and Eva Tardos have a wonderful explanation in chapter 2 of their textbook, Algorithm Design (2006). This appendix is a summary and reinterpretation of that explanation from a functional programming perspective. The ultimate answer will be that an algorithm is efficient if its worst-case running time on input size is for some constant . But it will take us several steps to build up to that definition.

## Algorithms and Efficiency, Attempt 1 算法和效率，第一次尝试

A naive attempt at defining efficiency might proceed as follows:

Attempt 1: An algorithm is efficient if, when implemented, it runs in a small amount of time on particular input instances.

But there are many problems with that definition, such as:

• Inefficient algorithms can run quickly on small test cases.

低效的算法可以在小测试用例上快速运行。

• Fast processors and optimizing compilers can make inefficient algorithms run quickly.

快速处理器和优化编译器可以使低效算法快速运行。

• Efficient algorithms can run slowly when coded sloppily.

有效的算法可以运行缓慢时，编码马虎。

• Some input instances are harder than others.

有些输入实例比其他实例更难。

• Efficiency on small inputs doesn't imply efficiency on large inputs.

小投入的效率并不意味着大投入的效率。

• Some clients can afford to be more patient than others; quick for me might be slow for you.

有些客户可以比其他客户更有耐心; 对我来说快一点可能对你来说慢一点。

Lesson 1: One lesson learned from that attempt is: time measured by a clock is not the right metric for algorithm efficiency. We need a metric that is reasonably independent of the hardware, compiler, other software that is running, etc. Perhaps a good metric would be to give up on time and instead count the number of steps taken during evaluation.

But, now we have a new problem: how should we define a "step"? It needs to be something machine independent. It ought to somehow represent a primitive unit of computation. There's a lot of flexibility. Here are some common choices:

• In pseudocode, we might think of a step as being a single line.

在伪代码中，我们可能认为一个步骤是一行。

• In imperative languages, we might count assignments, array indexes, pointer dereferences, and arithmetic operations as steps.

在命令式语言中，我们可以将赋值、数组索引、指针取消推理和算术运算计算为步骤。

• In OCaml, we could count function or operator application, let bindings, and choosing a branch of an if or match as steps.

在 OCaml 中，我们可以将函数或运算符应用程序、 let 绑定和选择 if 或 match 的分支计算为步骤。

In reality, all of those "steps" could really take a different amount of time. But in practice, that doesn't really matter much.

Lesson 2: Another lesson we learned from attempt 1 was: running time on a particular input instance is not the right metric. We need a metric that can predict running time on any input instance. So instead of using the particular input (e.g., a number, or a matrix, or a text document), it might be better to use the size of the input (e.g., the number of bits it takes to represent the number, or the number of rows and columns in the matrix, or the number of bytes in the document) as the metric.

But again we have a new problem: how to define "size"? As in the examples we just gave, size should be some measure of how big an input is compared to other inputs. Perhaps the most common representation of size in the context of data structures is just the number of elements maintained by the data structure: the number of nodes in a list, or the number of nodes and edges in a graph, etc.

Could an algorithm run in a different amount of time on two inputs of the same "size"? Sure. For example, multiplying a matrix by all zeroes might be faster than multiplying by arbitrary numbers. But in practice, size matters more than exact inputs.

Lesson 3: A third lesson we learned from attempt 1 was that "small amount of time" is too relative a term. We want a metric that is reasonably objective, rather than relying on subjective notions of what constitutes "small".

One sort-of-okay idea would be that an efficient algorithm needs to beat brute-force search. That means enumerating all answers one-by-one, checking each to see whether it's right. For example, a brute-force sorting algorithm would enumerate every possible permutation of a list, checking to see whether it's a sorted version of the input list. That's a terrible sorting algorithm! Certainly quicksort beats it.

Brute-force search is the simple, dumb solution to nearly any algorithmic problem. But it requires enumeration of a huge space. In fact, an exponentially-sized space. So a better idea would be doing less than exponential work. What's less than exponential (e.g., )? One possibility is polynomial (e.g., ).

An immediate objection might be that polynomials come in all sizes. For example, is way bigger than . And some non-polynomials, such as , might do an adequate job of beating exponentials. But in practice, polynomials do seem to work fine.

## Algorithms and Efficiency, Attempt 2 算法和效率，第二次尝试

Combining lessons 1 through 3 from Attempt 1, we have a second attempt at defining efficiency:

Attempt 2: An algorithm is efficient if its maximum number of execution steps is polynomial in the size of its input.

Note how all three ideas come together there: steps, size, polynomial.

But if we try to put that definition to use, it still isn't perfect. Coming up with an exact formula for the maximum number of execution steps can be insanely tedious. For example, in one other algorithm textbook, the authors develop this following polynomial for the number of execution steps taken by a pseudo-code implementation of insertion sort:

No need for us to explain what all the variables mean. It's too complicated. Our hearts go out to the poor grad student who had to work out that one!

That formula for running time of insertion sort is from *Introduction to
Algorithms*, 3rd edition, 2009, by Cormen, Leiserson, Rivest, and Stein. We
aren't making fun of them. They would also tell you that such formulas are too
complicated.


Precise execution bounds like that are exhausting to find and somewhat meaningless. If it takes 25 steps in Java pseudocode, but compiled down to RISC-V would take 250 steps, is the precision useful?

In some cases, yes. If you're building code that flies an airplane or controls a nuclear reactor, you might actually care about precise, real-time guarantees.

But otherwise, it would be better for us to identify broad classes of algorithms with similar performance. Instead of saying that an algorithm runs in

steps, how about just saying it runs in steps? That is, we could ignore the low-order terms and the constant factor of the highest-order term.

We ignore low-order terms because we want to THINK BIG. Algorithm efficiency is all about explaining the performance of algorithms when inputs get really big. We don't care so much about small inputs. And low-order terms don't matter when we think big. The following table shows the number of steps as a function of input size N, assuming each step takes 1 microsecond. "Very long" means more than the estimated number of atoms in the universe.

N=10 10 < 1 sec < 1秒 < 1 sec < 1秒 < 1 sec < 1秒
N=100 100 < 1 sec < 1秒 < 1 sec < 1秒 1 sec 1秒
N=1,000 1,000 < 1 sec < 1秒 1 sec 1秒 18 min 18分钟
N=10,000 10,000 < 1 sec < 1秒 2 min 2分钟 12 days 12天
N=100,000 100,000 < 1 sec < 1秒 3 hours 3小时 32 years 32年
N=1,000,000 1000000 1 sec 1秒 12 days 12天 104 years 104年

As you can see, when inputs get big, there's a serious difference between and and . We might as well ignore low-order terms, because they are completely dominated by the highest-order term when we think big.

What about constant factors? My current laptop might be 2x faster (that is, a constant factor of 2) than the one I bought several years ago, but that's not an interesting property of the algorithm. Likewise, steps in pseduocode might be steps in assembly (that is, a constant factor of 1000), but it's again not an interesting property of the algorithm. So, should we really care if one algorithm takes 2x or 1000x longer than another, if it's just a constant factor?

The answer is: maybe. Performance tuning in real-world code is about getting the constants to be small. Your employer might be really happy if you make something run twice as fast! But that's not about the algorithm. When we're measuring algorithm efficiency, in practice the constant factors just don't matter much.

So all that argues for having an imprecise abstraction to measure running time. Instead of , we can just write . Imprecise abstractions are nothing new to you. You might write to imprecisely abstract a quantity within 1. In computer science, you already know that we use Big-Oh notation as an imprecise abstraction: is .

## Big-Ell Notation Big-Ell 符号

Before reviewing Big-Oh notation, let's start with something simpler that you might not have seen before: Big-Ell notation.

Big-Ell is an imprecise abstraction of natural numbers less than or equal to another number, hence the L. It's defined as follows:

Big-Ell 是对小于或等于另一个数的自然数的不精确抽象，因此定义为 l:

where and are natural numbers. That is, represents all the natural numbers less than or equal to . For example,

.

Could you do arithmetic with Big-Ell? For example, what would be? It's not a well-posed question, to be honest: addition is an operation we think of being defined on integers, not sets of integers. But a reasonable interpretation of could be doing the addition for each element in the set, yielding . Note that

is a proper subset of , and the latter is . So we could say that . We could even say that , but it's not tight: the former subset relation included the fewest possible extra elements, whereas the latter was loose by needlessly including extra.

For more about Big Ell, see Concrete Mathematics, chapter 9, 1989, by Graham, Knuth, and Patashnik.

## Big-Oh Notation 大 oh 符号

If you understand Big-Ell, and you understand functional programming, here's some good news: you can easily understand Big-Oh.

Let's build up the definition of Big-Oh in a few steps. We'll start with version 1, which we'll write as . It's based on :

• represents any natural number that is less than or equal to a natural number .

L (n)表示任何小于或等于自然数 n 的自然数。

• represents any natural function that is less than or equal to a natural function .

O1(g)表示任何小于或等于自然函数 g 的自然函数。

A natural function is just a function on natural numbers; that is, its type is .

All we do with is upgrade from natural numbers to natural functions. So Big-Oh version 1 is just the higher-order version of Big-Ell. How about that!

Of course, we need to work out what it means for a function to be less than another function. Here's a reasonable formalization:

Big-Oh Version 1:

Big-Oh 版本1: O1(g) = { f | something n.f (n)≤ g (n)}

For example, consider the function that doubles its input. In math textbooks, that function might be written as . In OCaml we would write let g n = 2 * n or let g = fun n -> 2 * n or just anonymously as fun n -> 2 * n. In math that same anonymous function would be written with lambda notation as . Proceeding with lambda notation, we have:

and therefore

• ,

(λn.n)∈ O1(λn. 2 n) ,

• , but

(λnn2)∈ O1(λn. 2 n) ，但(λnn2)∈ O1(λn. 2 n)

• .

(λn. 3 n) something O1(λn. 2 n).

Next, recall that in defining algorithmic efficiency, we wanted to ignore constant factors. does not help us with that. We'd really like for all these functions:

• (λn. 2 n)

• (λn. 3 n)

to be in .

Toward that end, let's define to ignore constant factors:

Big-Oh Version 2:

Big-Oh 版本2: O2(g) = { f | something c > 0 something n.f (n)≤ cg (n)}

That existentially-quantified positive constant lets us "bump up" the function to whatever constant factor we need. For example,

and therefore , because

if we take , or , or any larger .

Finally, recall that we don't care about small inputs: we want to THINK BIG when we analyze algorithmic efficiency. It doesn't matter whether the running time of an algorithm happens to be a little faster or a little slower for small inputs. In fact, we could just hardcode a lookup table for those small inputs if the algorithm is too slow on them! What matters really is the performance on big-sized inputs.

Toward that end, let's define to ignore small inputs:

Big-Oh Version 3:

That existentially quantified positive constant lets us "ignore" all inputs of that size or smaller. For example,

and therefore , because

if we take and . Note how we get to ignore the fact that is temporarily a little too big at by picking . That's the power of ignoring "small" inputs.

## Big-Oh, Finished 大-哦，完成了

Version 3 is the right definition of Big-Oh. We repeat it here, for real:

Big-Oh:

Big-Oh: o (g) = { f | something c > 0 something n0 > 0 something n ≥ n0.f (n)≤ cg (n)}

That's the final, important version you should know. But don't just memorize it. If you understand the derivation we gave here, you'll be able to recreate it from scratch anytime you need it.

Big-Oh is called an asymptotic upper bound. If , then is at least as efficient as , and might be more efficient.

## Big-Oh Notation Warnings 大 oh 标记警告

Warning 1. Because it's an upper bound, we can always inflate a Big-Oh statement: for example, if , then also , and

, etc. But our goal is always to give tight upper bounds, whether we explicitly say that or not. So when asked what the running time of an algorithm is, you must always give the tightest bound you can with Big-Oh.

Warning 2. Instead of , most authors instead write . They don't really mean applied to . They mean a function parameterized on input but not yet applied. This is badly misleading and generally a result of not understanding anonymous functions. Moral of that story: more people need to study functional programming.

Warning 3. Instead of nearly all authors write . This is a hideous and inexcusable abuse of notation that should never have been allowed and yet has permanently infected the computer science consciousness. The standard defense is that here should be read as "is" not as "equals". That is patently ridiculous, and even those who make that defense usually have the good grace to admit it's nonsense. Sometimes we become stuck with the mistakes of our ancestors. This is one of those times. Be careful of this "one-directional equality" and, if you ever have a chance, teach your (intellectual) children to do better.

## Algorithms and Efficiency, Attempt 3 算法和效率，第三次尝试

Let's review. Our first attempt at defining efficiency was:

Attempt 1: An algorithm is efficient if, when implemented, it runs in a small amount of time on particular input instances.

By replacing time with steps, particular instances with input size, and small with polynomial, we improved that to:

Attempt 2: An algorithm is efficient if its maximum number of execution steps is polynomial in the size of its input.

And that's really a pretty good definition. But using Big-Oh notation to make it a little more concrete, we can produce our third and final attempt:

Attempt 3: An algorithm is efficient if its worst-case running time on input size is for some constant .

By "worst-case running time" we mean the same thing as "maximum number of execution steps", just expressed in different and probably more common words. The worst-case is when execution takes the longest. "Time" is a common euphemism here for execution steps, and is used to emphasize we're thinking about how long a computation takes.

Space is the most common other feature of efficiency to consider. Algorithms can be more or less efficient at requiring constant or linear space, for example. You're already familiar with that from tail recursion and lists in OCaml.

V1