Tableaux '98 benchmarks, logic K

June 1999


These benchmarks consist of nine provable parameterized formulas (ending with ``_p'') and nine unprovable parameterized formulas (ending with ``_n''). For each parameterized formula A(n), the test consists in determining the greatest natural number n <= 21 satisfying the following two conditions:  
  1. the prover returns the correct result for the formulas A(1), A(2), ..., A(n) in less than 100 seconds, and 
  2. the prover cannot do the formula A(n+1) in less than 100 seconds or n=21. 
Even though it has been proved that most of these tests can be easily solved by current solvers, they are still interesting because:  
  • they are not extended cnf formulas, and 
  • some of these tests have not been solved yet. 
In the table, we report the CPU time requested by the system to solve the last instance A(n). For TA, we do not take into account the time needed to compute the first order formula A*(n) corresponding to A(n) (which is negligible), but we do take into account the time requested by FLOTTER to convert A*(n) into a set Cl(A*(n)) of clauses (reported in the FLOTTER column), and the time requested by SPASS to determine the consistency or inconsistency of Cl(A*(n)) (reported in the SPASS column). Furthermore, we stopped TA on A(n) with n < 21, either because FLOTTER does not terminate gracefully when computing Cl(A*(n+1)), or because SPASS or FLOTTER exceed the 100 seconds time limit. In the table, these three cases correspond to the rows in which the value for n/SPASS/FLOTTER respectively is underlined. 
 
Test  *SAT 1.2 (-k1 -e -m6) DLP 3.1  TA 1.4 
Size  Time  Size  Time  Size  SPASS  FLOTTER 
k_branch_n  12  94.49  12  52.29  84.21  12.23 
k_branch_p  21  0.21  18  44.56  51.95  13.81 
k_d4_n  21  2.87  21  3.43  14  1.14  42.92 
k_d4_p  21  0.06  21  0.17  15  0.64  70.47 
k_dum_n  21  0.12  21  0.13  16  1.75  64.07 
k_dum_p  21  0.04  21  0.07  17  3.32  61.67 
k_grz_n  21  0.01  21  0.20  21  0.16  0.17 
k_grz_p  21  0.04  21  0.11  21  0.35  0.16 
k_lin_n  21  47.80  21  0.70  21  16.07  63.94 
k_lin_p  21  0.01  21  0.08  21  1.03  8.21 
k_path_n  21  0.96  21  1.29  58.70  2.14 
k_path_p  21  0.72  21  1.19  22.85  2.18 
k_ph_n  12  0.60  40.16  45.21  9.92 
k_ph_p  48.54  11.34  42.19  0.97 
k_poly_n  21  2.25  21  0.65  1.23  7.86 
k_poly_p  21  1.73  21  0.34  2.48  51.00 
k_t4p_n  21  1.28  21  0.45  3.37  84.35 
k_t4p_p  21  0.29  21  0.21  16  3.91  84.75 
 
As can be observed from the table, the three systems are able to solve all the instances of a formula in four cases (the green colored entries in the table). *SAT and DLP are able to solve all the instances except for k_branch_n, k_branch_p, k_ph_n, k_ph_p (the red colored entries in the table). Except for the first of these four parameterized formulas, *SAT is able to solve more instances than DLP. For k_branch_n, both *SAT and DLP are able to solve the 12th instance, with DLP taking less time than *SAT to solve it.