Tableaux 2000 benchmarks, logic K

January 2000


These benchmarks are described at the Tableaux 2000 competition homepage. They consist of modal formulas obtained by translating randomly generated quantified boolean formulas. For each test:
  • C are the of 4-literal clauses in the original QBF
  • D are the alternations of quantifiers
  • V is the maximum number of variables under the scope of each quantifier
  • mod denotes that propositional variables in the test are encoded into modal formulas in such a way that only one propositional variable survives.
Four different versions of *SAT+USB are under evaluation each one with a different window size, i.e., a different amount of tests that can be stored without any previous result being overwritten. In particular, w32 is the minimum size allowed, w1024 is the biggest size that we find convenient and w256, w512 are intermediate values that we use to understand the persistency/search time tradeoff.
For each algorithm we show the geometric mean of the Cpu time taken to solve some samples of each problem. A timelimit of 1200 seconds is enforced. >1200 in the table, means that all the samples exceded the timelimit. Also the geometric mean of the number Consistency checks is shown. For the samples that exceeed the timelimit we take into account the number of checks performed so far.

Test  w32 w256 w512 w1024
Cpu  Cons  Cpu  Cons  Cpu  Cons  Cpu  Cons 
C10-V4-D4 89.85 27860 93.96 26663 102.05 26663 120.88 26663
C20-V4-D4 352.54 98572 306.45 78459 321.15 75967 372.04 75081
C30-V4-D4 18.25 4554 17.26 3970 18.08 3953 19.94 3931
C10-V4-D6 1068.51 203512 1084.59 190330 1103.23 181517 1146.67 163568
C20-V4-D6 >1200 177875 >1200 163783 >1200 152006 >1200 133400
C10-V8-D4 >1200 122999 >1200 116173 >1200 109023 >1200 96257
C20-V8-D4 >1200 75479 >1200 70369 >1200 66377 >1200 56227
C30-V8-D4 >1200 62962 >1200 58525 >1200 55121 >1200 47566
C10-V4-D4-mod 323.72 295992 32.63 24461 30.20 20398 36.00 18982
C20-V4-D4-mod 738.35 526019 28.76 13836 28.99 12796 33.28 12321
C30-V4-D4-mod 630.14 598479 57.03 40008 48.49 28928 55.70 25995
C10-V4-D6-mod >1200 625239 125.30 38858 133.06 36251 146.53 33267
C20-V4-D6-mod >1200 497615 781.65 613351 787.94 487360 745.32 309982
C10-V8-D4-mod >1200 1042682 955.22 223980 795.03 157570 725.08 113884
C20-V8-D4-mod >1200 506046 >1200 240634 >1200 262289 >1200 237562
C30-V8-D4-mod >1200 539701 >1200 232579 1021.58 183388 1007.34 199595

Notice that for some problems a small cache is sufficient, and further increases in the window size lead to smaller and smaller improvements. On the other hand, the bigger the window size, the smaller the number of consistency checks (this is especially true for mod samples). We think that w512 (the default window size in *SAT) is the best overall (at least looking to these problems) but, in general, the optimal window size is a function of the formula under test.



Back to the state of the art page