Tableaux 2000 benchmarks, logic K

January 2000

These benchmarks are described at the Tableaux 2000 competition homepage. They consist of modal formulas obtained by translating randomly generated quantified boolean formulas. For each test:

C are the of 4-literal clauses in the original QBF
D are the alternations of quantifiers
V is the maximum number of variables under the scope of each quantifier
mod denotes that propositional variables in the test are encoded into modal formulas in such a way that only one propositional variable survives.

Four different versions of *SAT+USB are under evaluation each one with a different window size, i.e., a different amount of tests that can be stored without any previous result being overwritten. In particular, w32 is the minimum size allowed, w1024 is the biggest size that we find convenient and w256, w512 are intermediate values that we use to understand the persistency/search time tradeoff.
For each algorithm we show the geometric mean of the Cpu time taken to solve some samples of each problem. A timelimit of 1200 seconds is enforced. >1200 in the table, means that all the samples exceded the timelimit. Also the geometric mean of the number Consistency checks is shown. For the samples that exceeed the timelimit we take into account the number of checks performed so far.

Test	w32		w256		w512		w1024
Test	Cpu	Cons	Cpu	Cons	Cpu	Cons	Cpu	Cons
C10-V4-D4	89.85	27860	93.96	26663	102.05	26663	120.88	26663
C20-V4-D4	352.54	98572	306.45	78459	321.15	75967	372.04	75081
C30-V4-D4	18.25	4554	17.26	3970	18.08	3953	19.94	3931
C10-V4-D6	1068.51	203512	1084.59	190330	1103.23	181517	1146.67	163568
C20-V4-D6	>1200	177875	>1200	163783	>1200	152006	>1200	133400
C10-V8-D4	>1200	122999	>1200	116173	>1200	109023	>1200	96257
C20-V8-D4	>1200	75479	>1200	70369	>1200	66377	>1200	56227
C30-V8-D4	>1200	62962	>1200	58525	>1200	55121	>1200	47566
C10-V4-D4-mod	323.72	295992	32.63	24461	30.20	20398	36.00	18982
C20-V4-D4-mod	738.35	526019	28.76	13836	28.99	12796	33.28	12321
C30-V4-D4-mod	630.14	598479	57.03	40008	48.49	28928	55.70	25995
C10-V4-D6-mod	>1200	625239	125.30	38858	133.06	36251	146.53	33267
C20-V4-D6-mod	>1200	497615	781.65	613351	787.94	487360	745.32	309982
C10-V8-D4-mod	>1200	1042682	955.22	223980	795.03	157570	725.08	113884
C20-V8-D4-mod	>1200	506046	>1200	240634	>1200	262289	>1200	237562
C30-V8-D4-mod	>1200	539701	>1200	232579	1021.58	183388	1007.34	199595

Notice that for some problems a small cache is sufficient, and further increases in the window size lead to smaller and smaller improvements. On the other hand, the bigger the window size, the smaller the number of consistency checks (this is especially true for mod samples). We think that w512 (the default window size in *SAT) is the best overall (at least looking to these problems) but, in general, the optimal window size is a function of the formula under test.

Back to the state of the art page