The results of these benchmarks suggest that building this `bc` at `-O3` with
link-time optimization (`-flto`) gives the best performance. However, using
`-march=native` can result in **WORSE** performance.

*Note*: all benchmarks were run four times, and the fastest run is the one
shown. `[bc]` stands for whichever `bc` was being run, and the assumed working
directory is the root directory of this repository. This `bc` was at version
`3.0.0`, GNU `bc` was at version `1.07.1`, and all tests were conducted on an
`x86_64` machine running Gentoo Linux with `clang` `9.0.1` as the compiler.
## Typical Optimization Level

These benchmarks were run with both `bc`s compiled with the typical `-O2`
optimizations and no link-time optimization.

`tests/script.sh bc add.bc 1 0 1 1 [bc]`

`tests/script.sh bc subtract.bc 1 0 1 1 [bc]`

`tests/script.sh bc multiply.bc 1 0 1 1 [bc]`

`tests/script.sh bc divide.bc 1 0 1 1 [bc]`
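Every benchmark was run four times and only the fastest run is reported. As a minimal sketch (not the script actually used to produce these numbers), picking the fastest run out of several `time -p` reports can be automated with `awk`. The function name `fastest_real` is hypothetical, and the sketch assumes each report contains a `real <seconds>` line, which is the format POSIX specifies for `time -p`.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): read `time -p` reports on
# standard input and print the smallest "real" (wall-clock) time among them.
fastest_real() {
    awk '
        # Keep the first "real" value seen, then any value smaller than it.
        $1 == "real" && (best == "" || $2 + 0 < best + 0) { best = $2 }
        END { if (best != "") print best }
    '
}
```

In `bash`, where `time` is a shell keyword, four reports could be collected and reduced with something like `for run in 1 2 3 4; do { time -p [bc] ../test.bc > /dev/null; } 2>&1; done | fastest_real`, with `[bc]` replaced by the `bc` under test.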
The command used was:

`printf '1234567890^100000; halt\n' | time -p [bc] -q > /dev/null`

[This file][1] was downloaded, saved at `../timeconst.bc`, and the following

`--- ../timeconst.bc 2018-09-28 11:32:22.808669000 -0600`
`+++ ../timeconst.bc 2019-06-07 07:26:36.359913078 -0600`
`print "#endif /* KERNEL_TIMECONST_H */\n"`
`+for (i = 0; i <= 50000; ++i) {`

The command used was:

`time -p [bc] ../timeconst.bc > /dev/null`

Because this `bc` is faster when doing math, a fairer comparison might be a
script that does no math at all. As such, I put the following into
`../test.bc`:

`for (i = 0; i < 100000000; ++i) {`

The command used was:

`time -p [bc] ../test.bc > /dev/null`

I also put the following into `../test2.bc`:

`while (i < 100000000) {`

The command used was:

`time -p [bc] ../test2.bc > /dev/null`

It seems that the improvements to the interpreter helped a lot in certain cases.

Also, I have no idea why GNU `bc` did worse when it was technically doing less
work.
## Recommended Optimizations from `2.7.0`

Note that the optimizations used for the benchmarks above are not the ones I
recommended for version `2.7.0`, which are `-O3 -flto -march=native`.

This `bc` separates its code into modules, and optimizing across those modules
at link time removes much of the function-call overhead between them. This is
felt most keenly with one function, `bc_vec_item()`, which should reduce to a
single instruction (on `x86_64`) when it is inlined at link time. Other
functions matter as well.

I also recommended `-march=native` on the grounds that newer instructions would
increase performance on math-heavy code. We will see whether that assumption was
correct. (Spoiler: **NO**.)
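Because `-march=native` resolves to whatever CPU the compiler happens to run on, it can help to check which target it actually selects on a given benchmark machine. A minimal sketch, assuming a real GCC is installed as `gcc`: `gcc -Q --help=target` lists the target options in effect, and on recent GCC versions combining it with `-march=native` shows the resolved architecture. The function name `native_march` is hypothetical, and Clang does not answer this query the same way.

```shell
#!/bin/sh
# Hypothetical helper: print the architecture GCC selects for -march=native on
# this machine. Assumes a real GCC (not a clang alias) is on PATH as `gcc`.
native_march() {
    gcc -march=native -Q --help=target 2>/dev/null |
        # Keep only the "-march=" option line (not "Known valid arguments...").
        grep -E '^[[:space:]]*-march=' |
        # The second field is the value GCC resolved "native" to.
        awk '{ print $2 }'
}
```

Running `native_march` on each machine makes it easier to tell whether two sets of benchmark numbers were even built for the same instruction set.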
With both `bc`s compiled using the optimizations I recommended for this `bc` in
version `2.7.0`, the results are as follows.

The command used was:

`tests/script.sh bc add.bc 1 0 1 1 [bc]`

The command used was:

`tests/script.sh bc subtract.bc 1 0 1 1 [bc]`

The command used was:

`tests/script.sh bc multiply.bc 1 0 1 1 [bc]`

The command used was:

`tests/script.sh bc divide.bc 1 0 1 1 [bc]`

The command used was:

`printf '1234567890^100000; halt\n' | time -p [bc] -q > /dev/null`

The command for the `../timeconst.bc` script was:

`time -p [bc] ../timeconst.bc > /dev/null`

The command for the next script, the `for` loop script, was:

`time -p [bc] ../test.bc > /dev/null`

The command for the next script, the `while` loop script, was:

`time -p [bc] ../test2.bc > /dev/null`
## Link-Time Optimization Only

Just for kicks, let's see if `-march=native` is even useful.

The optimizations I used for both `bc`s were `-O3 -flto`.

The command used was:

`tests/script.sh bc add.bc 1 0 1 1 [bc]`

The command used was:

`tests/script.sh bc subtract.bc 1 0 1 1 [bc]`

The command used was:

`tests/script.sh bc multiply.bc 1 0 1 1 [bc]`

The command used was:

`tests/script.sh bc divide.bc 1 0 1 1 [bc]`

The command used was:

`printf '1234567890^100000; halt\n' | time -p [bc] -q > /dev/null`

The command for the `../timeconst.bc` script was:

`time -p [bc] ../timeconst.bc > /dev/null`

The command for the next script, the `for` loop script, was:

`time -p [bc] ../test.bc > /dev/null`

The command for the next script, the `while` loop script, was:

`time -p [bc] ../test2.bc > /dev/null`

It turns out that `-march=native` can hurt performance. As such, I have removed
the recommendation to build with `-march=native`.
## Recommended Compiler

When I ran these benchmarks with my `bc` compiled under `clang` vs. `gcc`, it
performed much better under `clang`. I recommend compiling this `bc` with
`clang`.

[1]: https://github.com/torvalds/linux/blob/master/kernel/time/timeconst.bc