This is only tangentially related, but one thing that I am still kind of hopeful about is that analytic combinatorics will free us from the tyranny of asymptotic analysis.
I suppose this is kind of a hot take, and in some ways less relevant than it was (say) 10 years ago, but I believe the theoretical CS community has spent a lot of time over the years analyzing the performance of various algorithms, and has rather pointedly not been inventing algorithms that accomplish some result at all, even when that result is only known to be computable in a pathologically inefficient way. This means a lot of breakthroughs that could have been TCS breakthroughs have instead been made, usually in worse form, by other communities.
I am not the only one who thinks so, here[1] is an obscure talk where Michael Mitzenmacher (a well-known CS theory prof at Harvard) makes more or less this specific complaint.
One of the nice things about Flajolet/Sedgewick[2] (mentioned in this post) is that it gives a nice set of combinators for (at least it seems to me, in theory) precisely enumerating how many "steps" an algorithm would take given some input. I optimistically imagine a world where we replace the relatively non-specific asymptotic bounds of an algorithm with a very specific quantification of the work an algorithm will take. And that this in turn would free TCS to focus on things that are more germane to the field as a whole.
With all that said, I'm open to the argument that TCS has fundamentally changed and this is no longer such a big issue. But perusing the various TCS conferences, I get the sense that it hasn't.
[1]: https://www.youtube.com/watch?v=p3anjSAnKBo
[2]: https://algo.inria.fr/flajolet/Publications/book.pdf
> One of the nice things about Flajolet/Sedgewick[2] (mentioned in this post) is that it gives a nice set of combinators for (at least it seems to me, in theory) precisely enumerating how many "steps" an algorithm would take given some input. I optimistically imagine a world where we replace the relatively non-specific asymptotic bounds of an algorithm with a very specific quantification of the work an algorithm will take. And that this in turn would free TCS to focus on things that are more germane to the field as a whole.
Check out the Art of Computer Programming, by Knuth. He writes all of the algorithms in a hypothetical machine code called MIX (MMIX in later editions). He does go through algorithmic analysis to the degree you describe, showing exactly how many cycles a particular routine may take to complete, based on different variables. This is actually fairly complex even for simple programs. Consider the simple case of computing the maximum in a list:
let current_max := 0
for x in array_of_numbers {
    if x > current_max {
        current_max = x;
    }
}
return current_max;

Anyone can tell you this runs in O(N) time. But exactly how many cycles would this take? Ignoring the architectural details, you would need to know how often the condition "x > current_max" is true. If the list of numbers is in ascending order, this will be true every time; if it is in descending order it will only be true once. The difference in the number of cycles between those two cases is significant. I believe Knuth works through this example, and shows that, for a randomly-ordered array, this would be true around log_2(n) times.

I actually have read a chunk of the relevant TAOCP, but didn't mention it because the comment was already kind of long. I remarked to a friend the other day that it's kind of funny that we've a little bit come full circle on asymptotic analysis in that we are now counting FLOPs, and it is not a little bit like what Knuth does here. :)
I think this needs the use of a CAS (Computer Algebra System, like Mathematica) to calculate the runtime of some code relative to different choices of computer architecture. Then it looks practical. Could even be added to IDEs. But without the help of a CAS, few people would attempt this, because it would be too tedious to do by hand.
Natural log of n, actually. That's because the expected number of times a new max is found is H_n, the nth harmonic number, where n is the size of the array. And H_n = ln n + gamma + o(1).
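If anyone wants to check that empirically, here is a small hypothetical Python sketch (mine, not from the thread) that counts how often the update fires for ascending, descending, and shuffled inputs, and compares the shuffled case against H_n = ln n + gamma:

import math, random

def count_updates(xs):
    # mirrors the pseudocode above: how often is "x > current_max" true?
    current_max = 0
    updates = 0
    for x in xs:
        if x > current_max:
            current_max = x
            updates += 1
    return updates

n = 10_000
ascending = list(range(1, n + 1))
print(count_updates(ascending))                  # true every time: n
print(count_updates(list(reversed(ascending))))  # true exactly once
trials = 200
avg = sum(count_updates(random.sample(ascending, n)) for _ in range(trials)) / trials
print(avg, math.log(n) + 0.5772)                 # shuffled case hovers around H_n ~ ln n + gamma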
I don't think it's controversial to say that asymptotic analysis has flaws: the conclusions you draw from it only hold in the limit of larger inputs, and sometimes "larger" means "larger than anything you'd be able to run it on." Perhaps as Moore's law dies we'll be increasingly able to talk about specific problem sizes in a way that won't become obsolete immediately.
I suppose my question is why you think TCS people would do this analysis and development better than non-TCS people. Once you leave the warm cocoon of big-O, the actual practical value of an algorithm depends hugely on specific hardware details. Similarly, once you stop dealing with worst-case or naive average-case complexity, you have to try to define a data distribution relevant for specific real-world problems. My (relatively uninformed) sense is that the skill set required to, say, implement transformer attention tuned to the specific memory hierarchy of NVIDIA datacenter GPUs, or to evaluate evolutionary optimization algorithms on a specific real-world problem domain, isn't necessarily something you gain in TCS itself.
When you can connect theory to the real world, it's fantastic, but my sense is that such connections are often desired and rarely found. At the very least, I'd expect that to often be a response to applied CS and not coming first from TCS: it's observed empirically that the simplex algorithm works well in practice, and then that encourages people to revisit the asymptotic analysis and refine it. I'd worry that TCS work trying to project onto applications from the blackboard would lead to less rigorous presentations and a lot of work that's only good on paper.
Average-case complexity can be a fickle beast. As you say, the simplex algorithm for LP is great in practice, so it's rarely problematic to use. But meanwhile, people also say, "Modern SAT/SMT solvers are great, they can solve huge problems!" Yet when I try using one, I usually run into exponential runtimes very quickly, especially for unsatisfiable instances.
Overall, it's annoying to tell whether an NP-hard problem is always really hard, or if ~all practical instances can be solved with a clever heuristic. It doesn't help that most problems receive little attention (e.g., to find solvable special cases) after being labeled NP-hard.
> most problems receive little attention (e.g., to find solvable special cases) after being labeled NP-hard.
Since I know quite a few people who work on exactly this kind of problem, finding special classes that can be solved in polynomial time, my impression is of course different.
But I think it can be said that if some NP-hard problem is very important in practice and there is no easy way to get around it (i.e., it will still be practically relevant in, say, 15 years), the problem will surely get a lot more attention.
This is also the reason why some NP-hard problems are much more researched than others.
Yeah, perhaps I was a bit unfair; it's just that the problems that have gotten good results never seem to be the ones I need! C'est la vie, I suppose. (In particular, I've been working on recognizing small formal languages from samples, which is usually NP-hard but has a surprising number of solvable cases. My impression, though, is that most modern work has gone into various forms of probabilistic grammars, which aren't really what I'm looking for.)
Sometimes it's also helpful to look into approximation algorithms, e.g., a good LLL implementation can work wonders for certain problems. But heaven forbid someone obtains an inapproximability result for your problem, then you're really in trouble.
> But heaven forbid someone obtains an inapproximability result for your problem, then you're really in trouble.
This is not (necessarily) true:
For example, there exists a great approximation algorithm (Goemans-Williamson algorithm) for MAXCUT in graphs with non-negative edge weights.
On the other hand, when negative weights do occur, one can show (unless P=NP) that there exists no polynomial-time approximation algorithm for which the approximation guarantee is a constant factor times the optimal solution.
But since the Goemans-Williamson algorithm is a great algorithm (if the Unique Games Conjecture holds and P != NP, it has the best approximation guarantee that any polynomial-time approximation algorithm for MAXCUT with non-negative weights can achieve), nobody "forbids" you from using it in situations where negative edge weights can also occur. You will lose the approximation guarantee, but in practice the algorithm nevertheless gives good results in this situation, just not certified good results.
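For concreteness, here is a rough sketch of the Goemans-Williamson pipeline (SDP relaxation followed by random-hyperplane rounding). It assumes cvxpy and numpy and is meant as an illustration of the idea, not a tuned implementation:

import numpy as np
import cvxpy as cp

def gw_maxcut(W, seed=0):
    # W: symmetric weight matrix; returns a +/-1 assignment from one random hyperplane
    n = W.shape[0]
    X = cp.Variable((n, n), symmetric=True)
    constraints = [X >> 0, cp.diag(X) == 1]
    # SDP relaxation: maximize (1/4) * sum_ij W_ij * (1 - X_ij)
    cp.Problem(cp.Maximize(0.25 * cp.sum(cp.multiply(W, 1 - X))), constraints).solve()
    # Factor X ~ V V^T and round each row with a random hyperplane
    vals, vecs = np.linalg.eigh(X.value)
    V = vecs @ np.diag(np.sqrt(np.clip(vals, 0, None)))
    r = np.random.default_rng(seed).standard_normal(n)
    return np.sign(V @ r)

# Toy instance: a 4-cycle with unit weights (optimal cut value 4)
W = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
x = gw_maxcut(W)
print(x, 0.25 * np.sum(W * (1 - np.outer(x, x))))

In practice you would take the best of many random hyperplanes; the ~0.878 guarantee is in expectation over the rounding vector.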
I found SMT solvers were stymied by merely running into cubic runtimes.
I just meant that there are lots of problems where I think TCS could have contributed but didn't, because the relevant analysis was not acceptable at FOCS or SODA (or whatever). A good example is the original transformers paper, which is vastly over-complicated relative to (say) a later treatment[1] by Ruslan Salakhutdinov's lab, which shows that attention is "just" plain-old Nadaraya-Watson kernel smoothing—which if you don't know is a weighted average of points covered in elementary grad stats.
I'm not saying that TCS people would be "better" at discovering this or anything like that; I'm just saying that the opportunity for contribution from TCS people is very high, and because the fields are not well integrated you often have to wait years to uncover what ML concepts "really" are. I think working together would benefit both ML and TCS!
[1]: https://arxiv.org/pdf/1908.11775
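As an aside, the "attention is kernel smoothing" point fits in a few lines of NumPy. This is a hypothetical sketch (one query, one head, no learned projections), not the notation of either paper:

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attention(q, K, V):
    # scaled dot-product attention for a single query vector q
    w = softmax(K @ q / np.sqrt(q.shape[-1]))
    return w @ V  # weighted average of the rows of V

def nadaraya_watson(q, K, V, kernel):
    # classic kernel smoother: weights proportional to kernel(q, k_i)
    w = np.array([kernel(q, k) for k in K])
    return (w / w.sum()) @ V

rng = np.random.default_rng(0)
q, K, V = rng.standard_normal(4), rng.standard_normal((6, 4)), rng.standard_normal((6, 3))
exp_kernel = lambda q, k: np.exp(q @ k / np.sqrt(len(q)))
print(np.allclose(attention(q, K, V), nadaraya_watson(q, K, V, exp_kernel)))  # True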
> I optimistically imagine a world where we replace the relatively non-specific asymptotic bounds of an algorithm with a very specific quantification of the work an algorithm will take.
I think you're getting this backwards. The (good) point you and Mitzenmacher are making is that our analysis standards are _too strict_, and hence we avoid good algorithms that are just too hard to analyze.
If we instead require exact combinatorics, e.g., using Flajolet/Sedgewick, that will be _even harder_ than if we _just_ try to do asymptotic analysis. Remember that asymptotics are a way to _approximate_ the exact number of steps. It's supposed to make it _easier_ for you.
Oh I think there is no real doubt that this is true, it's why analytic combinatorics did not succeed in replacing asymptotic analysis. I just wish it had. If it was tractable in a similar set of areas I think a lot of things would be quite a bit better.
In fact in these slides from a 2011/2012 talk, Sedgewick talks about exactly this, even going as far as to draw a contrast between "TCS" and "analysis of algorithms" (the "scientific method"): https://sedgewick.io/talks/#algorithms-for-the-masses = https://sedgewick.io/wp-content/uploads/2022/03/2011-12AlgsM... onwards.
(Also touched upon briefly in https://sedgewick.io/talks/#lasting-legacy-of-flajolet = https://sedgewick.io/wp-content/uploads/2022/03/2013-14Flajo...)
His point is exactly that, often researchers care only about worst-case analysis and asymptotics, which is not necessarily a useful scientific analysis of how long an algorithm is expected to take on a machine as a function of the input size (which was the original motivation for Knuth etc to develop the field).
The same authors (but ordered as Sedgewick and Flajolet!) also have a book called An Introduction to the Analysis of Algorithms; there's a course website at https://aofa.cs.princeton.edu/home/ and videos on Coursera (https://www.coursera.org/learn/analysis-of-algorithms) — it is advanced mathematics but less advanced than the Analytic Combinatorics book.
Coincidentally, I just did an asymptotic complexity analysis yesterday (something I rarely do). It seemed right since it was by the book, but I wasn't sure. After running extensive benchmarks on the algorithm, I was so happy to find the results matched the asymptotic prediction. The lesson is verify, verify, and always verify.
[1] Algorithm, https://github.com/williamw520/toposort/blob/master/Algorith...
[2] Benchmarks, https://github.com/williamw520/toposort/blob/master/README.m...
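A generic sketch of that kind of check (nothing to do with the linked repo specifically): time the implementation at a few sizes and see whether the measurements, normalized by the predicted bound, settle to a roughly constant factor.

import math, random, time

def bench(n, trials=5):
    # best-of-trials timing of the algorithm under test (sorted() as a stand-in)
    best = float("inf")
    for _ in range(trials):
        xs = [random.random() for _ in range(n)]
        t0 = time.perf_counter()
        sorted(xs)
        best = min(best, time.perf_counter() - t0)
    return best

for n in (10_000, 20_000, 40_000, 80_000):
    t = bench(n)
    print(n, t, t / (n * math.log(n)))  # last column roughly constant if runtime ~ c * n log n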
I'm not a researcher, so start with the default assumption that I don't know what I'm talking about, but I had the same impression when studying those topics, asymptotic analysis is kind of broken.
For example, we assume that arbitrary arithmetic takes constant time. Good. Except that if you take that seriously, then P=PSPACE. So we might say that bit operations on fixed registers are constant time. But nobody does that, because then arithmetic is O(lg n) and it's too much trouble to keep track of everything (and if you do, then hashtables aren't O(1) any longer!). If we are more sophisticated we might take a transdichotomous model, but that's really weird, and we get results like sorting in less than n lg n time.

Not to mention that the mathematical foundations get a bit shaky with more than one variable.
I think the hard problem underneath is that it's hard to define a computational model that (a) can be reasoned about and isn't extremely weird, (b) has solid mathematical foundations, (c) is reasonably predictive of real world performance, and (d) is not linked to a specific architecture. Asymptotic analysis is kind of a compromise. It can be formulated in very elementary terms, it makes mathematical sense and if you plot the performance of a sorting algorithm on a chart it kind of looks like the graph of n log n if you squint a bit.
Knuth does something similar to what you're suggesting IIRC, albeit without the analytic combinatorics machinery. For some programs he derives a precise expression, as a function of the size of the input, for how many operations of each kind the underlying machine will perform. In numerical analysis there is a similar concept: e.g., evaluating a polynomial with Horner has the same asymptotic complexity as the naive method, but we say it's better because it takes exactly the optimal number of floating-point operations (and, in practice, because it uses fused multiply-adds, but that's another story...)
A research program along those lines would be certainly fascinating, but the combination of "needs to be reasonably predictive" and "needs to be portable" is a tough nut to crack.
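On the Horner point above, a minimal sketch that tallies the multiplications and additions for both evaluation orders (hypothetical code; the "unit of work" here is just an arithmetic operation):

def eval_naive(coeffs, x):
    # p(x) = c0 + c1*x + ... + cn*x^n, recomputing each power from scratch
    value = mults = adds = 0
    for i, c in enumerate(coeffs):
        term = c
        for _ in range(i):
            term *= x
            mults += 1
        value += term
        adds += 1
    return value, mults, adds

def eval_horner(coeffs, x):
    # p(x) = (...(cn*x + c_{n-1})*x + ...)*x + c0
    value = mults = adds = 0
    for c in reversed(coeffs):
        value = value * x + c
        mults += 1
        adds += 1
    return value, mults, adds

coeffs = [3, 0, 2, 5, 1]       # 3 + 2x^2 + 5x^3 + x^4
print(eval_naive(coeffs, 2))   # (67, 10, 5): quadratically many multiplications
print(eval_horner(coeffs, 2))  # (67, 5, 5): one multiply-add per coefficient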
> For example, we assume that arbitrary arithmetic takes constant time.
Arithmetic on fixed-width number types is constant. It's very much not constant on arbitrary precision integers, something that underpins much of cryptography. Indeed, if you try to e.g. use Gaussian elimination (which uses O(n^3) arithmetic operations) to compute the determinant of a large integer matrix, you'll run into trouble, which is why there is another algorithm for this use case: https://en.m.wikipedia.org/wiki/Bareiss_algorithm
(Disclaimer: Also not a researcher)
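To make the contrast concrete, here is a sketch of Bareiss-style fraction-free elimination in plain Python integers (my code, not from the linked article). The point is that every division below is exact, so intermediate entries stay the size of minors instead of blowing up:

def bareiss_det(A):
    # exact integer determinant via fraction-free Gaussian elimination
    A = [row[:] for row in A]
    n = len(A)
    sign, prev = 1, 1
    for k in range(n - 1):
        if A[k][k] == 0:
            # find a pivot row below and swap it in
            for i in range(k + 1, n):
                if A[i][k] != 0:
                    A[k], A[i] = A[i], A[k]
                    sign = -sign
                    break
            else:
                return 0
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                # the division by the previous pivot is exact (Sylvester's identity)
                A[i][j] = (A[i][j] * A[k][k] - A[i][k] * A[k][j]) // prev
        prev = A[k][k]
    return sign * A[n - 1][n - 1]

print(bareiss_det([[2, 3, 1], [4, 1, 3], [1, 5, 2]]))  # -22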
My sentence there is a bit unclear -- I meant it as an implication.
Saying that arithmetic on fixed-width registers is O(1) is equivalent to saying that bit operations are O(1), which intuitively makes the most sense; after all, adding two uint64s is not the same as adding two arbitrary-precision integers. But that's not always the model we pick; often we just assume that arithmetic is O(1) and sweep length considerations under the rug.
Also, what does "fixed" mean? Is it like the word size in an actual computer architecture, where registers are 64 bits and that's all you'll ever get, or does it depend on the size of the input? The latter is the transdichotomous model, yet another option. Unsurprisingly, you can get different results with different computational models.
This is not to say that asymptotic analysis is wrong, obviously, but it's a bit of a mess and it's important to remember what computational model the analysis was performed under.
> But that's not always the model we pick, often we just assume that arithmetic is O(1) and sweep length considerations under the rug.
So, I only have rudimentary undergraduate knowledge about complexity theory, but my materials (e.g. in theoretical CS, cryptography and computational algebra) were always incredibly clear about the fact that arbitrary integer arithmetic isn't constant - this is why complexity classes such as weakly vs. strongly NP-complete exist. Do you have any examples of where people would treat arbitrary length integer arithmetic as constant? (I think most people just never have to use arbitrary precision arithmetic, so that's why they'd not talk about it explicitly.)
You're right that computational complexity depends on the model and that this makes it nontrivial to use in practice. The theoreticians would sweep this under the rug with "these models are all polynomially equivalent to Turing machines" (which lets us formulate a conjecture such as P=NP precisely), but of course that doesn't in practice help you distinguish between a constant, linear or quadratic algorithm.
Yep, I mentioned this elsewhere in the thread: Knuth does do this quite a bit in TAOCP. I think in a lot of ways we've come completely full circle, in that we are now counting FLOPs. This time, though, I think we have very little hope for something as nice and pretty as asymptotics. At least ones that are useful.
> I optimistically imagine a world where we replace the relatively non-specific asymptotic bounds of an algorithm with a very specific quantification of the work an algorithm will take.
My understanding is that this is pretty much how it started, but it proved to be impractical. Meaning, we absolutely can quantify the work any algorithm will take, but for that we need to define the architecture first, i.e., specify what the unit of work is and which operations are available to us. And sure, we can do that for any given task (and we always could), but it's usually too messy to include in general discussions about algorithms that aren't specifically concerned with the architecture.
Yep, that's exactly why we moved to asymptotic analysis. I was just hoping analytic combinatorics would help us move back to something like this.
In cryptography it is common to get exact estimates for the number of bit operations required to break a cipher. Asymptotic analysis is great in some circumstances; it explains, for example, why quicksort is faster than bubble sort for large inputs. Beyond that you want some empirical results.
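A quick sketch of the kind of empirical comparison meant here, counting comparisons instead of wall-clock time (hypothetical code, random input):

import random

def bubble_comparisons(xs):
    xs = xs[:]
    comps = 0
    for i in range(len(xs)):
        for j in range(len(xs) - 1 - i):
            comps += 1
            if xs[j] > xs[j + 1]:
                xs[j], xs[j + 1] = xs[j + 1], xs[j]
    return comps

def quick_comparisons(xs):
    if len(xs) <= 1:
        return 0
    pivot = xs[len(xs) // 2]
    less, equal, greater = [], [], []
    for x in xs:  # count one comparison per element against the pivot
        if x < pivot:
            less.append(x)
        elif x > pivot:
            greater.append(x)
        else:
            equal.append(x)
    return len(xs) + quick_comparisons(less) + quick_comparisons(greater)

n = 2000
xs = random.sample(range(n), n)
print(bubble_comparisons(xs))  # ~ n^2 / 2: about two million here
print(quick_comparisons(xs))   # ~ n log2 n: a few tens of thousands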
Very cool!
What's meant by "it’s already too much to ask for a closed form for fibonacci numbers"? Binet's formula is usually called a closed form in my experience. Is "closed form" here supposed to mean "closed form we can evaluate without needing arbitrary-precision arithmetic"?
It is a closed form. The author makes several mistakes here. All linear recurrences have closed forms, obtained simply by finding the roots of the characteristic equation. This is separate from the generating function, which the author confuses with the characteristic equation. A generating function is used when it's not possible to find a closed-form expression.
> Is "closed form" here supposed to mean "closed form we can evaluate without needing arbitrary-precision arithmetic"?
You don't need arbitrary-precision arithmetic to evaluate Binet's formula - big integer arithmetic suffices (simply do your calculations in the ring Z[X]/(X^2-X-1) ).
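For concreteness, here is a sketch (mine, not the parent's) of what "do your calculations in the ring Z[X]/(X^2-X-1)" looks like: represent a + b*x as the pair (a, b), reduce x^2 to x + 1, and use fast exponentiation. Then x^n = F(n-1) + F(n)*x, all in exact integer arithmetic:

def mul(p, q):
    # (a1 + b1*x) * (a2 + b2*x) in Z[X]/(X^2 - X - 1), i.e. with x^2 = x + 1
    a1, b1 = p
    a2, b2 = q
    return (a1 * a2 + b1 * b2, a1 * b2 + b1 * a2 + b1 * b2)

def fib(n):
    # square-and-multiply: x^n = F(n-1) + F(n)*x
    result, base = (1, 0), (0, 1)
    while n:
        if n & 1:
            result = mul(result, base)
        base = mul(base, base)
        n >>= 1
    return result[1]

print([fib(n) for n in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
print(fib(100))                     # 354224848179261915075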
At that point, you'd be better off just using a recursive algorithm like the one in GMP. You're swapping out arbitrary-length for arbitrary-precision.
The author is probably just unaware of Binet's formula. But I'd agree with their assessment that there's unlikely to be a closed form for the (indeed, much more complicated!) quantity that they're interested in.
They then say there's an approximation for Fibonacci, which makes me think that's what they're calling Binet's formula. (I'd also expect an author with this mathematical sophistication to be aware of Binet's formula, but maybe I'm projecting.)
Yes, the Binet formula is a closed form. It arises from a 2nd-order linear recurrence, which is not the same as the generating function. Also, he confuses the characteristic equation with the generating function. I would recommend a discrete math textbook if you want to learn this.
In fact, for that 'warmup' problem, the leading term has a base and coefficient that are roots of cubic polynomials, given in the OEIS entry. But often the coefficient will be transcendental for these sorts of problems.
> But often the coefficient will be transcendental for these sorts of problems.
What makes you believe that the coefficient will be transcendental? I'd rather expect some non-trivial algebraic number (i.e. root of some polynomial with rational coefficients).
This article gets things wrong and overcomplicates other things.
He confuses the characteristic equation with the generating function. This: T(z) = z + zT + zT^2 + zT^3 is the characteristic equation.
There is no simple generating function, as it's not possible to easily invert a cubic in closed form.
Successive derivatives of the generating function are taken to find the coefficients of the infinite series that encodes the sought values. It's hard enough explaining this with a quadratic; a cubic is so much more involved.
This is far beyond the scope of a blog post. You need to spend at least a few weeks working on this or take some college courses.
The article is correct. T(z) (in the warmup problem) is the generating function that counts the number of rooted ordered ternary trees, i.e. if we write T(z) = t_0 + t_1z + t_2z^2 + …, then t_n is the number of such trees with n nodes.
Yes there is no "simple" generating function, i.e. there's no simple expression for T(z). Nevertheless, T(z) (as further explained in another comment here in this thread) satisfies the equation T(z) = z + T(z) + zT(z)^2 + zT(z)^3, and this fact can be used to find the coefficients of T (i.e. the numbers t_n) via Lagrange inversion, and the code in the post does so and gets the same numbers as https://oeis.org/A036765.
(You use the term "characteristic equation", but that is a term only used in the context of linear recurrence relations AFAIK, and not applicable here.)
(Also IMO the post does not overcomplicate anything; what it presents is in fact the easiest way of solving the problem — that of finding the asymptotics of the number of unordered rooted ternary trees, up to isomorphism. If you think you have a simpler method even for counting small cases, feel free to post an answer at https://math.stackexchange.com/questions/5050941/counting-tr....)
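If anyone wants to check those coefficients without going through Lagrange inversion, iterating the functional equation on truncated power series pins down one more coefficient per pass. A small sketch, using plain Python lists as coefficient vectors:

N = 12  # number of coefficients to compute

def mul(a, b):
    c = [0] * N
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                if bj and i + j < N:
                    c[i + j] += ai * bj
    return c

def add(a, b):
    return [x + y for x, y in zip(a, b)]

one = [1] + [0] * (N - 1)
z = [0, 1] + [0] * (N - 2)
T = [0] * N
for _ in range(N):
    # T = z * (1 + T + T^2 + T^3)
    T2 = mul(T, T)
    T = mul(z, add(add(one, T), add(T2, mul(T2, T))))

print(T)  # [0, 1, 1, 2, 5, 13, 36, 104, 309, 939, 2905, 9118], matching OEIS A036765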
thanks for clearing it up
There's an interesting extension to Analytic Combinatorics by Flajolet/Sedgewick called "Analytic Combinatorics in Several Variables" by Pemantle and Wilson: https://www2.math.upenn.edu/~pemantle/papers/ACSV.pdf and https://acsvproject.com/
It extends the original framework with a lot of useful new primitives, like Hadamard products. Super useful!
While this is interesting, what are its main applications?
I'm currently using it to analyze a version of the Kaczmarz algorithm, generalizing my answer here: https://mathoverflow.net/a/490506/5429
But more generally, why do we want to analyze combinatorics and algorithms? I suppose it gives us some confidence that we are actually making progress.
Thanks!
What math lets you most easily be convinced you’ve come up with the right answer when you actually haven’t?
Seems like combinatorics comes in second only to statistics.
What is the relevance of this comment to this post? Are you insinuating that the author did not come up with the right answer? That's odd here because the answers are actually verified in multiple ways (Sage code, looking up the numbers on OEIS).
Not at all, the post seems solid. Just fond memories of a combinatorics class in school. I like problems that are deceptively simple. Easy to get lured in to, then satisfying as layers of complexity are peeled back.
In Haskell you can read off the generating function's coefficients directly from the functional equation T(z) = z + zT + zT² + zT³
ghci> ts=0:(1+ts+ts^2+ts^3)
ghci> take 12 ts
[0,1,1,2,5,13,36,104,309,939,2905,9118]
Using the Num instance for power series (very cute application of laziness expounded by Karczmarczuk and then McIlroy https://dl.acm.org/doi/abs/10.1017/s0956796899003299)
This is cool! What does one need to do to get this to work? I get:
$ ghci
GHCi, version 9.12.2: https://www.haskell.org/ghc/ :? for help
ghci> ts=0:(1+ts+ts^2+ts^3)
ghci> take 12 ts
<interactive>:2:1: error: [GHC-39999]
• No instance for ‘Num [Integer]’ arising from a use of ‘it’
• In the first argument of ‘print’, namely ‘it’
In a stmt of an interactive GHCi command: print it

Edit: After reading Bentley's paper that you linked, figured it out:

$ cat > bentley.hs
ps0:: Num a => [a]
ps0 = 0 : ps0
-- (.*):: Num a => a->[a]->[a]
c .* (f:fs) = c*f : c.*fs
instance Num a => Num [a] where
  -- negate (f:fs) = (negate f) : (negate fs)
  (f:fs) + (g:gs) = f+g : fs+gs
  (f:fs) * (g:gs) = f*g : (f.*gs + fs*(g:gs))
  fromInteger c = fromInteger c : ps0
^D
$ ghci bentley.hs
GHCi, version 9.12.2: https://www.haskell.org/ghc/ :? for help
[1 of 2] Compiling Main ( bentley.hs, interpreted )
bentley.hs:7:10: warning: [GHC-06201] [-Wmissing-methods]
• No explicit implementation for
‘abs’, ‘signum’, and (either ‘negate’ or ‘-’)
• In the instance declaration for ‘Num [a]’
|
7 | instance Num a => Num [a] where
| ^^^^^^^^^^^^^^^^
Ok, one module loaded.
ghci> ts=0:(1+ts+ts^2+ts^3)
ghci> take 12 ts
[0,1,1,2,5,13,36,104,309,939,2905,9118]

I find this interesting but the nomenclature escapes me: how is T(z) = z + zT + zT^2 + ... ? I find the jump from the functional programming description of the tree to this recurrence relation not intuitive.
I'm afraid the author is writing for an audience that's familiar with generating functions. It's a common trick in analytic combinatorics; the notation is actually "shorthand".
Very informally, let T(z) be the generating function where the coefficient of the z^n term is the number of trees we are searching with exactly n nodes.
We know from the functional description that such a tree can be:
- a leaf;
- a node with 1 child;
- a node with 2 children;
- a node with 3 children.

A leaf is a tree with one node and that's it, so it "contributes" as a single tree of size 1.

How many trees of size n are there that fall in the second case? Exactly as many as the total trees of size (n-1), hence this case "contributes" as zT(z).
How many trees of size n fall in the third case? That's a bit harder to see, because now you have (n-1) nodes in total, split between the two branches. So you can have 1 in the left branch and n-2 in the right branch, 2 left and n-3 right, and so on. With a bit of basic combinatorics you can notice that this is exactly the convolution product of the series with itself, in analytic terms T^2(z). Hence the third case "contributes" zT^2(z).
The fourth case is similar.
At this point we have constructed the inverse of the generating function we are looking for, i.e. z as a function of T(z) rather than the reverse, but luckily there's a standard trick for this: Lagrange inversion.
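For the curious, the Lagrange inversion step here says: if T(z) = z*phi(T(z)) with phi(w) = 1 + w + w^2 + w^3, then [z^n] T(z) = (1/n) * [w^(n-1)] phi(w)^n. A quick sketch to check that against the numbers above:

def poly_mul(a, b, trunc):
    c = [0] * trunc
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if ai and bj and i + j < trunc:
                c[i + j] += ai * bj
    return c

def t(n):
    # [z^n] T(z) = (1/n) * [w^(n-1)] (1 + w + w^2 + w^3)^n; the division is exact
    phi = [1, 1, 1, 1]
    p = [1]
    for _ in range(n):
        p = poly_mul(p, phi, n)  # only powers up to w^(n-1) are needed
    return p[n - 1] // n

print([t(n) for n in range(1, 12)])  # [1, 1, 2, 5, 13, 36, 104, 309, 939, 2905, 9118]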
I had the same question as OP, passing knowledge of generating functions was not enough to see the link between the description and formula.
Your explanation was exactly what I needed, thanks.
Generating functions are definitely strange and unintuitive, but they can be really helpful for combinatorics.
The introduction to generating functions in this well known text on the subject (at the beginning of Chapter 1) might be more helpful than the Wikipedia article linked in the original post:
https://www2.math.upenn.edu/~wilf/gfology2.pdf
From that text:
> Although giving a simple formula for the members of the sequence may be out of the question, we might be able to give a simple formula for the sum of a power series, whose coefficients are the sequence that we’re looking for.
He doesn't explain any of the notation, like how the brackets [z] work. Also, the generating function he supplies is wrong.
This article is technically way beyond me, but can anybody explain to me what on earth a "worked example" is? That expression is completely incoherent.
It's an idiom in math meaning "a problem, together with its solution, that illustrates a general principle or technique".
A worked example of the quadratic formula would be solving a particular equation by plugging in values of a, b, c into the formula.
A worked example of analytic combinatorics is solving a combinatorics problem with techniques from real/complex analysis.
Thanks, I get it now: I'm even more ignorant than I thought.
The key is to always square the generating function—it doesn’t matter why, it just feels more correct. Learned that from a proof in an old cereal box puzzle.