Releases: utahplt/gtp-benchmarks

v9.3

03 May 19:20

In morsecode, change an import type in main.rkt to match the export type (from Index to Integer). No change in performance.

Plot (gtp-plot): 0 = old, 1 = new

morsecode-pr51.tar.gz

v9.2 minor take5 changes

22 May 21:01
  1. Replace the module+ main with a plain expression. Having the submodule is a problem for tools like the contract profiler (a minor problem, but it's easier to drop the submodule).
  2. Add an assert around the call to random because its type no longer guarantees nonnegative numbers. (The old type was unsound but fine to use here.)
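
As a rough illustration of the second change (this is not the actual take5 code), the sketch below wraps a call to random in assert so that the result gets a nonnegative type. The helper name random-index and the Positive-Fixnum argument type are assumptions made for this example.

```racket
#lang typed/racket

;; Hypothetical helper, not taken from take5: draw a random number below `n`.
;; `assert` runs the predicate at run time and narrows the static type of
;; the result to Natural, whatever integer type `random` itself returns.
(: random-index (-> Positive-Fixnum Natural))
(define (random-index n)
  (assert (random n) exact-nonnegative-integer?))
```

If the predicate ever fails, assert raises an exception instead of returning, so callers can rely on the Natural result type.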

Performance is the same afterward (plot: new).

v9.0

01 Dec 16:06

Substantially revise acquire and take5. Before, acquire ran a game with AI players that all raised exceptions and take5 ignored an input list of AI players. After, the acquire players make valid moves and take5 uses its input. These changes do not affect the typed/untyped overhead.

Thank you @LLazarek. #38 #39

gtp-plot
gtp-plot

acquire_8.1.tar.gz
take5_8.2.tar.gz

v8.0

22 Oct 01:03

Remove racket/sandbox dependency from acquire and remove the player AI that times out.

Performance is similar before and after in a first test.

But in general, this change should make acquire measurements more stable. We care about the cost of types, not of system calls.

master-vs-nosandbox

Data:
acquire-sandbox.tar.gz

v7.0

10 Jul 21:32

Fix a return value in lnm. Before, the function returned a port. After, it returns void.

Affects the module benchmarks/lnm/untyped/modulegraph.rkt and the function ensure-tikz.

There is no change in performance.
Plot: lnm 6.0 vs. 7.0

Original issue report: https://github.com/bennn/gtp-benchmarks/issues/25

v6.0

20 Jul 17:32

Major Changes

Edited all benchmarks so that typed and untyped code are very similar.

If you compare any two typed/A.rkt and untyped/A.rkt files, the only differences should be the requires and the type annotations.

Example: gregor

In at least one place, the untyped gregor code had an extra assert. It's gone now.

diff --git a/benchmarks/gregor/untyped/date.rkt b/benchmarks/gregor/untyped/date.rkt
index a3102a9..6ceccb7 100644
--- a/benchmarks/gregor/untyped/date.rkt
+++ b/benchmarks/gregor/untyped/date.rkt
@@ -63,7 +64,6 @@
 (define date->ymd Date-ymd)
 ;(: date->jdn (-> Any Integer))
 (define (date->jdn d)
-  (unless (Date? d) (error "date->jdn type error"))
   (Date-jdn d))

Example: lnm

Typed lnm now uses asserts instead of casts to validate input data. Untyped lnm uses the same asserts.

diff --git a/benchmarks/lnm/typed/spreadsheet.rkt b/benchmarks/lnm/typed/spreadsheet.rkt
index dd2dcbc..b869929 100644
--- a/benchmarks/lnm/typed/spreadsheet.rkt
+++ b/benchmarks/lnm/typed/spreadsheet.rkt
@@ -62,7 +62,7 @@
   (void)
   ;; For each row, print the config ID and all the values
   (for ([(row n) (in-indexed vec)])
-    (void (natural->bitstring (cast n Index) #:pad (log2 num-configs)))
+    (void (natural->bitstring (assert n index?) #:pad (log2 num-configs)))
     (for ([v row]) (void "~a~a" sep v))
     (void)))
 
@@ -71,8 +71,18 @@
 (define (rktd->spreadsheet input-filename
                              #:output [output #f]
                              #:format [format 'tab])
-  (define vec (cast (file->value input-filename) (Vectorof (Listof Index))))
+  (define vec
+    (for/vector : (Vectorof (Listof Index)) ((x (in-vector (assert (file->value input-filename) vector?))))
+      (listof-index x)))
   (define suffix (symbol->extension format))
   (define out (or output (path-replace-suffix input-filename suffix)))
   (define sep (symbol->separator format))
   (vector->spreadsheet vec out sep))
+
+(: listof-index (-> Any (Listof Index)))
+(define (listof-index x)
+  (if (and (list? x)
+           (andmap index? x))
+    x
+    (error 'listof-index)))
diff --git a/benchmarks/lnm/untyped/spreadsheet.rkt b/benchmarks/lnm/untyped/spreadsheet.rkt
index 18be330..6466fb0 100644
--- a/benchmarks/lnm/untyped/spreadsheet.rkt
+++ b/benchmarks/lnm/untyped/spreadsheet.rkt
@@ -14,6 +14,7 @@
 ;; ----------------------------------------------------------------------------
 
 (require
+  "../base/untyped.rkt"
   (only-in racket/file file->value)
   (only-in "bitstring.rkt" log2 natural->bitstring)
 )
@@ -55,7 +56,7 @@
   (void)
   ;; For each row, print the config ID and all the values
   (for ([(row n) (in-indexed vec)])
-    (void (natural->bitstring n #:pad (log2 num-configs)))
+    (void (natural->bitstring (assert n index?) #:pad (log2 num-configs)))
     (for ([v row]) (void "~a~a" sep v))
     (void)))
 
@@ -64,8 +65,16 @@
 (define (rktd->spreadsheet input-filename
                              #:output [output #f]
                              #:format [format 'tab])
-  (define vec (file->value input-filename))
+  (define vec
+    (for/vector ((x (in-vector (assert (file->value input-filename) vector?))))
+      (listof-index x)))
   (define suffix (symbol->extension format))
   (define out (or output (path-replace-suffix input-filename suffix)))
   (define sep (symbol->separator format))
   (vector->spreadsheet vec out sep))
+
+(define (listof-index x)
+  (if (and (list? x)
+           (andmap index? x))
+    x
+    (error 'listof-index)))

Results (on the Racket 7.7 BC release)

For most benchmarks, performance is the same before & after. But:

  • lnm has lower overhead
  • quadT has higher overhead
  • quadU has higher overhead

lnm

Typed lnm code is much faster now (down from ~4.5s to 0.7s) because it uses assert instead of cast. The vector casts in spreadsheet.rkt and summary.rkt cost a little: putting them back adds 1.5s and 0.5s, respectively. But the biggest saving comes from replacing (cast .... Index) with (assert .... index?) in bitstring.rkt: reverting that change adds almost 2.5s.
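
To make the cast/assert distinction concrete, here is a minimal sketch (not the lnm code). Both functions narrow an Any value to Index; the function names via-cast and via-assert are made up for this example.

```racket
#lang typed/racket

;; `cast` checks the value against a contract generated from the target type.
(: via-cast (-> Any Index))
(define (via-cast v)
  (cast v Index))

;; `assert` only calls the given predicate and narrows the type if it holds.
(: via-assert (-> Any Index))
(define (via-assert v)
  (assert v index?))
```

Both forms raise an exception on a value that is not an Index, so swapping one for the other keeps the validation and changes only how the check is carried out.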

quadT
quadU

Both the untyped and fully-typed quad configurations run faster now, which likely makes the mixed configurations look worse by comparison. One reason for the change is that quad? is now a simple function instead of a define-predicate, but the other causes are harder to tease apart. (There are few changes to the main files, so the difference must be related to the base/ context, and that's hard to swap out & test.)
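
For readers unfamiliar with the two styles of predicate, the sketch below contrasts them. The Quad type here is a hypothetical stand-in (the real quad definition is different), and the names quad-generated? and quad-simple? are invented for the example.

```racket
#lang typed/racket

;; Hypothetical stand-in for quad's data type; the real definition differs.
(define-type Quad (Pairof Symbol (Listof String)))

;; Style 1: let Typed Racket generate the predicate from the type.
(define-predicate quad-generated? Quad)

;; Style 2: a handwritten function that performs the same checks.
(: quad-simple? (-> Any Boolean))
(define (quad-simple? v)
  (and (pair? v)
       (symbol? (car v))
       (let ([tail (cdr v)])
         (and (list? tail) (andmap string? tail)))))
```

The two predicates accept the same values in this sketch; how much each costs at run time is something to measure rather than assume.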

Full data & plots here:
gtp-benchmarks-v5-vs-v6.tar.gz

Raw gtp-measure output:
manifest-v6.tar.gz

v5.0

20 Nov 00:01

Fix one bug in lnm and one bug in zordoz.

lnm

The typed lnm code performs an extra cast to satisfy the type checker, BUT the code doing the cast had a use-before-definition bug. That bug is fixed, and now the typed & untyped code compute the same plots.

Pull request, with more details on the issue:
https://github.com/bennn/gtp-benchmarks/pull/19

This change improves performance a little. I guess that throwing & handling an exception in plot is more expensive than computing the next point to draw.

lnm-4-vs-5

zordoz

The typed zordoz contained an unused call to format. This call is gone now, so (hopefully) the typed & untyped benchmarks are now running the same code.

Pull request:
https://github.com/bennn/gtp-benchmarks/pull/20

Unfortunately this change has BIG implications for performance. That format call must have been executed often and suffered from runtime checks / wrappers.

  • old typed/untyped ratio = 10.91x
  • new typed/untyped ratio = 1.36x

The new zordoz now has worst-case <4x overhead. Before, things went up to 14x. Many thanks to @camoy for finding this small-looking error that introduced large overhead in typed code.
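
For reference, the ratios above divide a configuration's running time by the running time of the fully untyped configuration. A small sketch of that arithmetic follows; the function names are hypothetical and this is not how gtp-measure or gtp-plot compute their output.

```racket
#lang racket

;; Overhead of one configuration: its running time divided by the
;; running time of the fully untyped configuration.
(define (overhead config-ms untyped-ms)
  (/ config-ms untyped-ms))

;; Worst-case overhead across a list of configuration running times.
(define (worst-case-overhead config-times untyped-ms)
  (apply max
         (for/list ([t (in-list config-times)])
           (overhead t untyped-ms))))
```

The typed/untyped ratio is the special case where the configuration is the fully typed one.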

zordoz-4-vs-5

data for plots

zordoz-lnm-v5.tar.gz

Thank you Cameron Moy

v4.0

03 Nov 01:33

Replace a cast in the typed version of zombie with a predicate test.

The untyped code now uses the same predicate.

zombie is now a better gradual typing benchmark because less of its typed/untyped performance difference is explained by a call to cast.

EDIT: here's some data collected with Racket 7.4

  • old typed/untyped ratio = 4.37x
  • new typed/untyped ratio = 1.83x

plot of old (zombie-3) vs new (zombie-4) showing that the new version has MORE configurations that suffer LESS overhead
z3

full data behind the plot:
zombie-v4.tar.gz

Thank you Sam Tobin-Hochstadt and Cameron Moy

v3.0

17 Oct 15:52

Fix an issue with the untyped zordoz code.

Before, two untyped modules imported from a typed library. After, the untyped code imports the untyped library.

This change removes an unnecessary boundary, making the untyped code a more realistic baseline for measuring Typed Racket's overhead.
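
The kind of boundary in question can be sketched in a single file. The typed submodule and the function increment-all below are hypothetical and unrelated to zordoz; they only show that untyped code calling into typed code goes through run-time checks.

```racket
#lang racket

;; Hypothetical typed library, written as a submodule to keep the
;; example in one file. It is not part of zordoz.
(module typed-lib typed/racket
  (provide increment-all)
  (: increment-all (-> (Listof Integer) (Listof Integer)))
  (define (increment-all xs)
    (for/list : (Listof Integer) ([x (in-list xs)])
      (+ x 1))))

;; Untyped client: requiring typed code creates a typed/untyped boundary,
;; so calls to increment-all are checked against its type at run time.
(require 'typed-lib)

(define result (increment-all '(1 2 3)))
```

Calling increment-all with a list of strings raises a contract error at the boundary. An untyped baseline that imports an untyped library avoids these checks entirely.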

The following plot compares the overhead in zordoz for version 2 (zordoz-v2) and version 3 (zordoz-v3) of the GTP benchmarks. Version 3 is significantly worse:

zordoz-2-vs-3

Full results:
zordoz-gtp-2-vs-3.tar.gz

Thank you Cameron Moy

v2.0

21 May 01:07

Fix a difference between the typed and untyped mbta code. Both are the same now.

The fix does not appear to affect performance.

Attached data:

  • mbta2-vs-orig.tar.gz : output from a gtp-measure run comparing 0-mbta (after the change) to 1-mbtaorig (before). Also includes a tab-separated file with 95% confidence intervals for each configuration

mbta2-vs-orig.tar.gz

  • picture of overhead after (0-mbta) and before (1-mbtaorig)

mbta2-vs-orig

Thank you Robby Findler and Sam Sundar