Tableaux '98 benchmarks, logic K

June 1999

These benchmarks consist of nine provable parameterized formulas (ending with ``_p'') and nine unprovable parameterized formulas (ending with ``_n''). For each parameterized formula A(n), the test consists in determining the greatest natural number n <= 21 satisfying the following two conditions:

the prover returns the correct result for the formulas A(1), A(2), ..., A(n) in less than 100 seconds, and
the prover cannot do the formula A(n+1) in less than 100 seconds or n=21.

Even though it has been proved that most of these tests can be easily solved by current solvers, they are still interesting because:

they are not extended cnf formulas, and
some of these tests have not been solved yet.

In the table, we report the CPU time requested by the system to solve the last instance A(n). For TA, we do not take into account the time needed to compute the first order formula A*(n) corresponding to A(n) (which is negligible), but we do take into account the time requested by FLOTTER to convert A*(n) into a set Cl(A*(n)) of clauses (reported in the FLOTTER column), and the time requested by SPASS to determine the consistency or inconsistency of Cl(A*(n)) (reported in the SPASS column). Furthermore, we stopped TA on A(n) with n < 21, either because FLOTTER does not terminate gracefully when computing Cl(A*(n+1)), or because SPASS or FLOTTER exceed the 100 seconds time limit. In the table, these three cases correspond to the rows in which the value for n/SPASS/FLOTTER respectively is underlined.

Test	*SAT 1.2 (-k1 -e -m6)		DLP 3.1		TA 1.4
Test	Size	Time	Size	Time	Size	SPASS	FLOTTER
k_branch_n	12	94.49	12	52.29	6	84.21	12.23
k_branch_p	21	0.21	18	44.56	6	51.95	13.81
k_d4_n	21	2.87	21	3.43	14	1.14	42.92
k_d4_p	21	0.06	21	0.17	15	0.64	70.47
k_dum_n	21	0.12	21	0.13	16	1.75	64.07
k_dum_p	21	0.04	21	0.07	17	3.32	61.67
k_grz_n	21	0.01	21	0.20	21	0.16	0.17
k_grz_p	21	0.04	21	0.11	21	0.35	0.16
k_lin_n	21	47.80	21	0.70	21	16.07	63.94
k_lin_p	21	0.01	21	0.08	21	1.03	8.21
k_path_n	21	0.96	21	1.29	4	58.70	2.14
k_path_p	21	0.72	21	1.19	5	22.85	2.18
k_ph_n	12	0.60	9	40.16	9	45.21	9.92
k_ph_p	8	48.54	6	11.34	6	42.19	0.97
k_poly_n	21	2.25	21	0.65	4	1.23	7.86
k_poly_p	21	1.73	21	0.34	5	2.48	51.00
k_t4p_n	21	1.28	21	0.45	9	3.37	84.35
k_t4p_p	21	0.29	21	0.21	16	3.91	84.75

As can be observed from the table, the three systems are able to solve all the instances of a formula in four cases (the green colored entries in the table). *SAT and DLP are able to solve all the instances except for k_branch_n, k_branch_p, k_ph_n, k_ph_p (the red colored entries in the table). Except for the first of these four parameterized formulas, *SAT is able to solve more instances than DLP. For k_branch_n, both *SAT and DLP are able to solve the 12th instance, with DLP taking less time than *SAT to solve it.