Computer Science Division of ECECS
University of Cincinnati
Cincinnati, OH 45221-0030

franco@gauss.cs.uc.edu

[ Propositional Satisfiability Algorithms ][ Groundwater Remediation Software ]

The Satisfiability Problem

This work is carried out in the Logic Complexity Theory Laboratory .

The Satisfiability problem is one of the most important in computer science. It appears in the design and testing of computer circuits, in theorem proving, in artificial intelligence, in determining a safe shutdown path for complicated, state-of-the-art robotic machine tools, and many other domains of computer science, operations research, and logic.

I am interested in propositional Satisfiability (SAT). An instance of SAT is a Boolean expression, typically in Conjunctive Normal Form (CNF). A CNF Boolean expression uses the logical connectives negation (!v), or(+), and and(*) to form a logical expression in terms of given Boolean variables. Variables may take one of two truth values: true or false. In a CNF expression, variables and their negations are called literals. If variable v has value true (false) then !v has value false (true). An or sub-expression, also known as a clause, has value true if and only if at least one of its literals has value true. An and sub-expression, called a conjunction, has value true if and only if all its clauses have value true. An example of a CNF expression is

(v[1] + !v[2]) * (!v[1] + !v[3] + v[4]).

The symbols v[1], v[2], v[3], and v[4] denote Boolean variables. The clause (v[1] + !v[2]) has value true if either variable v[1] has value true or v[2] has value false, otherwise it has value false. The clause (!v[1] + !v[3] + v[4]) has value true if either v[1] has value false or v[3] has value false or v[4] has value true. The conjunction of the two clauses has value true if both clauses have value true. This happens if either v[1] has value true and v[3] has value false; or v[1] has value true and v[4] has value true; or v[1] has value false and v[2] has value false; or v[2] has value false and v[3] has value false; or v[2] has value false and v[4] has value true.

The Satisfiability problem is stated as follows: given a CNF expression I, find a truth assignment to the variables of I which causes all clauses in I to have value true or determine that no such assignment is possible.

A naive method for solving an instance of SAT is to check all possible truth assignments on that instance. If the instance is true for one such assignment then we have a solution; otherwise we have determined that no solution exists. Unfortunately, this method is too slow to be useful. If there are 10 variables, the number of truth assignments to try is 1024; if there are 100 variables, the number of truth assignments is over a million; if there are 1000 variables, the number of truth assignments to try is over a billion. Testing each truth assignment on I can add a factor of 1000 or more to the number of computations needed to get the result. Since one can expect about a million operations per second on currently available computers, solving even moderately large problems is not practical with this method.

Surprisingly, even the best known methods for SAT are not much better than the naive method in the worst case. This is a property shared by each of a class of problems that are in some sense equivalent to SAT: such problems are NP-complete. Examples of NP-complete problems, other than SAT, are the Traveling Salesman Problem, scheduling problems, and circuit layout problems. NP-complete problems are equivalent in the sense that a fast method for solving one such problem may be used to solve all the others efficiently. However, there is no known fast method for any NP-complete problem and it is suspected that none exists. This is potentially a very serious situation since many important and common problems are NP-complete. However, in practice it has been noticed that certain methods for solving certain NP-complete problems usually are fast: that is, they are fast on almost all instances taken as input. But, why is this so? Is this true for SAT? These questions are addressed by several of my papers on SAT.

In 1979, Alan Goldberg, a student of the renowned logician Martin Davis at NYU, made what was regarded as a breakthrough at that time. He proved that the Davis-Putnam procedure, an example of regular resolution which was invented in the early 1960's, solved instances of SAT very quickly on the average. The result received so much attention that, to this day, some people still think that this was the last word on the subject. However, to some people there was something troubling about the result: in order to simplify the analysis, Goldberg had to change the Davis-Putnam procedure somewhat. The procedure he actually analysed was rather "dumb." Why was such a dumb algorithm able to arrive at a solution so quickly most of the time? The answer was provided in a 1983 Discrete Applied Mathematics paper and a 1986 Information Processing Letters paper.

In order to perform an average case analysis, some distribution on inputs must be assumed. The distribution provided by Goldberg seemed like a reasonable one: it did not favor one instance over any other. But, I showed that Goldberg's distribution was extremely dense in admitting satisfying truth assignments. So dense, in fact, that if one created a random truth assignment - WITHOUT EVEN LOOKING AT THE GIVEN INSTANCE - then, with probability nearly 1, that assignment would satisfy that instance. Moreover, I showed that, by changing the distribution slightly so as to reduce the density of satisfying assignments, the version of the Davis-Putnam procedure analysed by Goldberg was incredibly slow on almost every instance of SAT. Thus, the favorable result obtained by Goldberg was not due to any properties of the method he analysed but, rather, was due to (perhaps) unreasonable properties of the inputs he was generating for analysis.

After these results the question changed to whether there exists a fast algorithm, in the probabilistic sense, when inputs are generated according to the modified distribution. This question has been attacked by several researchers and remains only partially solved. In a 1986 paper in SIAM Journal on Computing I and my Ph.D. student showed that a "smarter" resolution procedure is able to solve random instances quickly with high probability when the random instances themselves have a high probability of being satisfiable. Thus, we showed that short "certificates" of satisfiability were usually possible when instances have solutions. This was followed by a 1989 paper in Information Sciences and a 1991 paper in SIAM Journal on Computing which improved the result somewhat. Further improvements, spanning several papers due to Frieze, Suen, Chvatal, Szemeredi, Motwani, Kamath and others appeared later.

But what about high-probability short "certificates" of unsatisfiability? One would hope that resolution, being easy to use and ubiquitous, has the power to succeed. There is no easy way to determine this since an infinitude of implementations of resolution are possible. But, a sequence of papers by Haken, Urquhart, and Chvatal and Szemeredi ending in a 1988 paper by the last two authors in Journal of the Association for Computing Machinery shows that no implementation of resolution has the needed power given the modified input distribution. The search for a good "unsatisfiability verifier" continues today. The challenge is great and has attracted new blood.

The interest in practical SAT algorithms has risen steadily over time. In recent years there have been at least three competitions among SAT solvers: one at the FAW Research Institute in Ulm, Germany, one at the the University of Paderborn in Paderborn Germany, and the third at DIMACS (Discrete Applied Mathematics and Computer Science), one of the world's most important centers for research in the areas overlapping discrete mathematics and computer science.

Groundwater Remediation Software

This work is carried out in the Innovative Software Applications Laboratory .

The problem of cleaning up (alternatively, remediating) a contaminated landfill has become a national priority only recently. This is partly because, until about ten years ago, it was not generally realized that groundwater, the water flowing beneath the surface of the earth in aquifers, can become polluted to the point of being unhealthy. Since about 50% of the drinking water in the United States comes from aquifers, remediation of contaminated sites drained by groundwater has received urgent attention.

When contaminated sites are drained by aquifers, remediation plans must include an effective means to intercept and extract a sufficient volume of contaminant from groundwater systems near the site to prevent unsatisfactory concentrations from accumulating at other sites served by these systems. The kinds of interception equipment available include injection wells, extractions wells, and trenches. Injection wells pump water into the ground and are used to control locally the direction of groundwater and therefore the flow of contaminant. Extraction wells are used to pump contaminated water out of the ground. Trenches provide a passive means for contaminant extraction and may be an attractive alternative to extraction wells in certain cases.

The optimal placement of these remediation objects is a complex function of many variables such as local knowledge, capital and operating costs of the remediation objects, availability of remediation object resources, what are considered safe levels of contaminant concentration, what can or must be done with the recovered contaminated water, and several others. This level of complexity requires some mechanical computational assistance. However, although the flow of contaminant through a groundwater system remediated by a particular extraction system can be simulated by a computer, there is currently no computer aid which finds an optimal or near optimal recovery plan: that is, a plan which recovers sufficient contaminant at least cost.

Some of the features of the program are as follows:

The project is timely: it will provide a level of computer-human interaction not seen before in remediation software for small machines. The interface will make the feedback of information so fast and efficient that near optimal solutions will be possible given sufficient site information. The software will be useful as an educational tool and as a research tool. In particular, it will be useful as a test-bed for optimization ideas and determining how much site information needs to be collected. Eventually, the program will be integrated with the ICASE package under development at the University of Cincinnati's Center Hill facility. This will make the program useful in domains other than groundwater remediation.

This work has established a connection between the Department of Computer Science and the Center Hill facility. All indications are that this tie will strengthen with time. Because the work is heavily graphics-oriented, students working on this project are developing expertise that may be useful for other projects. One possible additional project calls for matching the human-computer interface of different platforms, for example DOS and MacIntosh.