# Benchmarks

The results of these benchmarks suggest that building this `bc` at `-O3` with
link-time optimization (`-flto`) gives the best performance. However, adding
`-march=native` can result in **WORSE** performance.

*Note*: all benchmarks were run four times, and the fastest run is the one
shown. In the commands below, `[bc]` means whichever `bc` was being run, and the
assumed working directory is the root directory of this repository. This `bc`
was at version `3.0.0` while GNU `bc` was at version `1.07.1`, and all tests
were conducted on an `x86_64` machine running Gentoo Linux with `clang` `9.0.1`
as the compiler.
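
This document does not say how the four runs were scripted. As a minimal sketch of one way to do it, the helpers below run a command several times under `time -p` and keep the smallest `real` value. The names `fastest_real` and `fastest_of` are invented here for illustration and are not part of this repository, and the sketch assumes a shell where `time -p` is available.

```shell
# fastest_real: read one or more `time -p` reports on stdin and
# print the smallest "real" value among them.
fastest_real() {
    awk '$1 == "real" { print $2 }' | sort -n | head -n 1
}

# fastest_of N CMD [ARGS...]: run CMD N times and print the fastest
# wall-clock time. The command's own output is discarded inside the
# inner shell, so only the timing report reaches the pipeline.
fastest_of() {
    runs=$1
    shift
    i=0
    while [ "$i" -lt "$runs" ]; do
        { time -p sh -c '"$@" > /dev/null 2>&1' sh "$@"; } 2>&1
        i=$((i + 1))
    done | fastest_real
}
```

For example, `fastest_of 4 [bc] ../test.bc` would report the best of four runs of one of the script benchmarks below, with `[bc]` replaced by the binary under test.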

## Typical Optimization Level

These benchmarks were run with both `bc`'s compiled with the typical `-O2`
optimizations and no link-time optimization.

### Addition

The command used was:

```
tests/script.sh bc add.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 2.54
user 1.21
sys 1.32
```

For this `bc`:

```
real 0.88
user 0.85
sys 0.02
```

### Subtraction

The command used was:

```
tests/script.sh bc subtract.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 2.51
user 1.05
sys 1.45
```

For this `bc`:

```
real 0.91
user 0.85
sys 0.05
```

### Multiplication

The command used was:

```
tests/script.sh bc multiply.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 7.15
user 4.69
sys 2.46
```

For this `bc`:

```
real 2.20
user 2.10
sys 0.09
```

### Division

The command used was:

```
tests/script.sh bc divide.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 3.36
user 1.87
sys 1.48
```

For this `bc`:

```
real 1.61
user 1.57
sys 0.03
```

### Power

The command used was:

```
printf '1234567890^100000; halt\n' | time -p [bc] -q > /dev/null
```

For GNU `bc`:

```
real 11.30
user 11.30
sys 0.00
```

For this `bc`:

```
real 0.73
user 0.72
sys 0.00
```
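
To make the math results above easier to compare, the `real` times can be turned into speedup ratios. The sketch below does the division with `awk`; the `speedup` helper is a name invented here for illustration, not something this repository provides.

```shell
# speedup SLOW FAST: print how many times faster FAST is than SLOW,
# rounded to two decimal places.
speedup() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.2f\n", slow / fast }'
}

# "real" times from the -O2 results above: GNU bc first, this bc second.
printf 'addition       %sx\n' "$(speedup 2.54 0.88)"
printf 'subtraction    %sx\n' "$(speedup 2.51 0.91)"
printf 'multiplication %sx\n' "$(speedup 7.15 2.20)"
printf 'division       %sx\n' "$(speedup 3.36 1.61)"
printf 'power          %sx\n' "$(speedup 11.30 0.73)"
```

With the figures above, the ratios range from roughly 2x for division to roughly 15x for power.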

### Scripts

[This file][1] was downloaded, saved at `../timeconst.bc` and the following
patch was applied:

```
--- ../timeconst.bc     2018-09-28 11:32:22.808669000 -0600
+++ ../timeconst.bc     2019-06-07 07:26:36.359913078 -0600
@@ -110,8 +110,10 @@

                print "#endif /* KERNEL_TIMECONST_H */\n"
        }
-       halt
 }

-hz = read();
-timeconst(hz)
+for (i = 0; i <= 50000; ++i) {
+       timeconst(i)
+}
+
+halt
```

The command used was:

```
time -p [bc] ../timeconst.bc > /dev/null
```

For GNU `bc`:

```
real 16.71
user 16.06
sys 0.65
```

For this `bc`:

```
real 13.16
user 13.15
sys 0.00
```

Because this `bc` is faster when doing math, it might be a better comparison to
run a script that does not do any math. As such, I put the following into
`../test.bc`:

```
for (i = 0; i < 100000000; ++i) {
        y = i
}

i
y

halt
```

The command used was:

```
time -p [bc] ../test.bc > /dev/null
```

For GNU `bc`:

```
real 16.60
user 16.59
sys 0.00
```

For this `bc`:

```
real 22.76
user 22.75
sys 0.00
```

I also put the following into `../test2.bc`:

```
i = 0

while (i < 100000000) {
        i += 1
}

i

halt
```

The command used was:

```
time -p [bc] ../test2.bc > /dev/null
```

For GNU `bc`:

```
real 17.32
user 17.30
sys 0.00
```

For this `bc`:

```
real 16.98
user 16.96
sys 0.01
```

It seems that the improvements to the interpreter helped a lot in certain cases.

Also, I have no idea why GNU `bc` did worse on the `while` loop script when it
is technically doing less work.

## Recommended Optimizations from `2.7.0`

Note that the benchmarks above were not built with the optimizations I
recommended for version `2.7.0`, which are `-O3 -flto -march=native`.

This `bc` separates its code into modules, and optimizing them together at link
time removes a lot of the inefficiency that comes from function call overhead.
This is most keenly felt with one function: `bc_vec_item()`, which should turn
into just one instruction (on `x86_64`) when optimized at link time and inlined.
There are other functions that matter as well.

I also recommended `-march=native` on the grounds that newer instructions would
increase performance on math-heavy code. We will see if that assumption was
correct. (Spoiler: **NO**.)

When compiling both `bc`'s with the optimizations I recommended for this `bc`
for version `2.7.0`, the results are as follows.

### Addition

The command used was:

```
tests/script.sh bc add.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 2.44
user 1.11
sys 1.32
```

For this `bc`:

```
real 0.59
user 0.54
sys 0.05
```

### Subtraction

The command used was:

```
tests/script.sh bc subtract.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 2.42
user 1.02
sys 1.40
```

For this `bc`:

```
real 0.64
user 0.57
sys 0.06
```

### Multiplication

The command used was:

```
tests/script.sh bc multiply.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 7.01
user 4.50
sys 2.50
```

For this `bc`:

```
real 1.59
user 1.53
sys 0.05
```

### Division

The command used was:

```
tests/script.sh bc divide.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 3.26
user 1.82
sys 1.44
```

For this `bc`:

```
real 1.24
user 1.20
sys 0.03
```

### Power

The command used was:

```
printf '1234567890^100000; halt\n' | time -p [bc] -q > /dev/null
```

For GNU `bc`:

```
real 11.08
user 11.07
sys 0.00
```

For this `bc`:

```
real 0.71
user 0.70
sys 0.00
```

### Scripts

The command for the `../timeconst.bc` script was:

```
time -p [bc] ../timeconst.bc > /dev/null
```

For GNU `bc`:

```
real 15.62
user 15.08
sys 0.53
```

For this `bc`:

```
real 10.09
user 10.08
sys 0.01
```

The command for the next script, the `for` loop script, was:

```
time -p [bc] ../test.bc > /dev/null
```

For GNU `bc`:

```
real 14.76
user 14.75
sys 0.00
```

For this `bc`:

```
real 17.95
user 17.94
sys 0.00
```

The command for the next script, the `while` loop script, was:

```
time -p [bc] ../test2.bc > /dev/null
```

For GNU `bc`:

```
real 14.84
user 14.83
sys 0.00
```

For this `bc`:

```
real 13.53
user 13.52
sys 0.00
```

## Link-Time Optimization Only

Just for kicks, let's see if `-march=native` is even useful.

The optimizations I used for both `bc`'s were `-O3 -flto`.
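
The three build configurations compared in this document differ only in compiler flags. As a small sketch, the helper below maps a configuration label to its flags. The labels `typical`, `recommended`, and `lto-only` and the name `cflags_for` are made up here for illustration. The commented build line is an assumption as well: it presumes a build that honors `CC` and `CFLAGS` through a `./configure.sh` script, which this document does not show.

```shell
# cflags_for CONFIG: print the compiler flags used for one of the
# benchmark configurations described in this document.
cflags_for() {
    case "$1" in
        typical)     printf '%s\n' '-O2' ;;
        recommended) printf '%s\n' '-O3 -flto -march=native' ;;
        lto-only)    printf '%s\n' '-O3 -flto' ;;
        *)
            printf 'unknown configuration: %s\n' "$1" >&2
            return 1
            ;;
    esac
}

# Hypothetical build step (assumes the build reads CC and CFLAGS):
#   CC=clang CFLAGS="$(cflags_for lto-only)" ./configure.sh && make
```

Keeping the flag sets in one place makes it harder to benchmark a binary that was built with a different configuration than intended.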

### Addition

The command used was:

```
tests/script.sh bc add.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 2.41
user 1.05
sys 1.35
```

For this `bc`:

```
real 0.58
user 0.52
sys 0.05
```

### Subtraction

The command used was:

```
tests/script.sh bc subtract.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 2.39
user 1.10
sys 1.28
```

For this `bc`:

```
real 0.65
user 0.57
sys 0.07
```

### Multiplication

The command used was:

```
tests/script.sh bc multiply.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 6.82
user 4.30
sys 2.51
```

For this `bc`:

```
real 1.57
user 1.49
sys 0.08
```

### Division

The command used was:

```
tests/script.sh bc divide.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 3.25
user 1.81
sys 1.43
```

For this `bc`:

```
real 1.27
user 1.23
sys 0.04
```

### Power

The command used was:

```
printf '1234567890^100000; halt\n' | time -p [bc] -q > /dev/null
```

For GNU `bc`:

```
real 10.50
user 10.49
sys 0.00
```

For this `bc`:

```
real 0.72
user 0.71
sys 0.00
```

### Scripts

The command for the `../timeconst.bc` script was:

```
time -p [bc] ../timeconst.bc > /dev/null
```

For GNU `bc`:

```
real 15.50
user 14.81
sys 0.68
```

For this `bc`:

```
real 10.17
user 10.15
sys 0.01
```

The command for the next script, the `for` loop script, was:

```
time -p [bc] ../test.bc > /dev/null
```

For GNU `bc`:

```
real 14.99
user 14.99
sys 0.00
```

For this `bc`:

```
real 16.85
user 16.84
sys 0.00
```

The command for the next script, the `while` loop script, was:

```
time -p [bc] ../test2.bc > /dev/null
```

For GNU `bc`:

```
real 14.92
user 14.91
sys 0.00
```

For this `bc`:

```
real 12.75
user 12.75
sys 0.00
```

It turns out that `-march=native` can be a problem. As such, I have removed the
recommendation to build with `-march=native`.
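
To see where `-march=native` hurts, the script timings for this `bc` from the two preceding sections can be compared directly. The sketch below computes the percentage change in `real` time when `-march=native` is added to `-O3 -flto`; the `pct_change` helper is a name invented here for illustration.

```shell
# pct_change BASE NEW: print the change from BASE to NEW in percent.
# A positive result means NEW is slower than BASE.
pct_change() {
    awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%+.1f\n", (new - base) / base * 100 }'
}

# This bc, "real" times: `-O3 -flto` first, then
# `-O3 -flto -march=native`.
printf 'timeconst.bc %s%%\n' "$(pct_change 10.17 10.09)"
printf 'for loop     %s%%\n' "$(pct_change 16.85 17.95)"
printf 'while loop   %s%%\n' "$(pct_change 12.75 13.53)"
```

With these figures, `-march=native` makes the two loop scripts about 6% slower for this `bc`, while `timeconst.bc` changes by less than 1%.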

## Recommended Compiler

When I ran these benchmarks with my `bc` compiled under `clang` vs. `gcc`, it
performed much better under `clang`. I recommend compiling this `bc` with
`clang`.

[1]: https://github.com/torvalds/linux/blob/master/kernel/time/timeconst.bc