This tutorial introduces the audience to the key theoretical and
systemlevel concepts, and the delicate balance between them, that lie
at the heart of modern discrete samplers and integrators. It also
discusses opportunities and challenges for both theoreticians and
systems researchers to contribute to this area, which has diverse applications.
Discrete sampling and integration are fundamental problems in
Artificial Intelligence. The range of their applications span diverse
areas, including probabilistic inference, statistical machine
learning, network reliability analysis, functional verification,
quantitative information flow, logistics planning and the like. Early
algorithms and tools for these problems either gave strong theoretical
guarantees that performed poorly in practice, or scaled well in
practice with hardly any provable guarantees. More recently,
researchers have invested signficant effort in bridging the gap
between theory and practice in the context of discrete integration and
sampling. This has resulted in the development of algorithms for
sampling and integration that scale to constraints involving hundreds
of thousands of propositional variables, while also providing strong
theoretical guarantees. This has been made possible largely by combining
universal hashing (and its variants) with stateoftheart automated
reasoning tools like SAT/SMT solves.
This tutorial will give the audience an underthehood look at some of
the core concepts that have been key to the success of recent
hashingbased techniques. This includes theoretical underpinnings of
hashingbased sampling and integration, what impacts performance in
practice and why, and how a careful balance of theory and systemlevel
insights can be orchestrated to build stateoftheart discrete
samplers and integrators. The coverage of topics will include both
weighted and unweighted versions of sampling and integration. The
tutorial will also discuss opportunities and challenges for both
theoreticians and system researchers to contribute to scaling of
discrete integration and sampling.
Supratik Chakraborty is currently the Bajaj Chair Professor in the Department of Computer Science and Engineering at Indian Institute of Technology Bombay, India. Chakraborty's current research interests include constrained sampling and counting techniques and their applications, and also formal methods for analyzing hardware, software and biological systems.
Kuldeep Meel is an Assistant Professor of Computer Science in School of Computing at the National University of Singapore. He received his Ph.D. (2017) and M.S. (2014) degree in Computer Science (Artificial Intelligence and Formal Methods) from Rice University. He holds B. Tech. (with Honors) degree (2012) in Computer Science and Engineering from Indian Institute of Technology, Bombay. Meel's research interests lie at the intersection of Artificial Intelligence and Formal Methods. The broader goal of his research is to advance artificial intelligence techniques, which utilize ubiquity of data and formal methods, to enable computing to deal with increasingly uncertain realworld environments.
[12] 
CountingBased Reliability Estimation for PowerTransmission Grids
Proceedings of AAAI Conference on Artificial Intelligence (AAAI), 2017.
Modern society is increasingly reliant on the functionality of infrastructure facilities and utility services. Consequently, there has been surge of interest in the problem of quantification of system reliability, which is known to be #Pcomplete. Reliability also contributes to the resilience of systems, so as to effectively make them bounce back after contingencies. Despite diverse progress, most techniques to estimate system reliability and resilience remain computationally expensive. In this paper, we investigate how recent advances in hashingbased approaches to counting can be exploited to improve computational techniques for system reliability. The primary contribution of this paper is a novel framework, RelNet, that provides provably approximately correct (PAC) estimates for arbitrary networks. We then apply RelNet to ten real world power transmission grids across different cities in the U.S. and are able to obtain, to the best of our knowledge, the first theoretically sound a priori estimates of reliability between several pairs of nodes of interest. Such estimates will help managing uncertainty and support rational decision making for community resilience.

[11] 
Algorithmic Improvements in Approximate Counting for Probabilistic Inference:
From Linear to Logarithmic SAT Calls
Proceedings of International Joint Conference on Artificial Intelligence (IJCAI), 2016.
Probabilistic inference via model counting has emerged as a scalable technique with strong formal guarantees, thanks to recent advances in hashingbased approximate counting. Stateoftheart hashingbased counting algorithms use an {\NP} oracle, such that the number of oracle invocations grows linearly in the number of variables n in the input constraint. We present a new approach to hashingbased approximate model counting in which the number of oracle invocations grows logarithmically in $n$, while still providing strong theoretical guarantees. Our experiments show that the new approach outperforms stateoftheart techniques for approximate counting by 12 orders of magnitude in running time.

[10] 
Approximate Probabilistic Inference via WordLevel Counting
Proceedings of AAAI Conference on Artificial Intelligence (AAAI), 2016.
Hashingbased model counting has emerged as a promising approach for largescale probabilistic inference on graphical models. A key component of these techniques is the use of xorbased 2universal hash functions that operate over Boolean domains. Many counting problems arising in probabilistic inference are, however, naturally encoded over fi nite discrete domains. Techniques based on bitlevel (or Boolean) hash functions require these problems to be propositionalized, making it impossible to leverage the remarkable progress made in SMT (Satisfiability Modulo Theory) solvers that can reason directly over words (or bitvectors). In this work, we present the first approximate model counter that uses wordlevel hashing functions, and can directly leverage the power of sophisticated SMT solvers. Empirical evaluation over an extensive suite of benchmarks demonstrates the promise of the approach.

[9] 
On Computing Minimal Independent Support and Its Applications to Sampling and Counting
Proceedings of International Conference on Constraint Programming (CP), 2015.
Constrained sampling and counting are two fundamental problems arising in domains ranging from artificial intelligence and security, to hardware and software testing. Recent approaches to approximate solutions for these problems rely on employing SAT solvers and universal hash functions that are typically encoded as XOR constraints of length n/2 for an input formula with n variables. As the runtime performance of SAT solvers heavily depends on the length of XOR constraints, recent research effort has been focused on reduction of length of XOR constraints. Consequently, a notion of Independent Support was proposed, and it was shown that constructing XORs over independent support (if known) can lead to a significant reduction in the length of XOR constraints without losing the theoretical guarantees of sampling and counting algorithms. In this paper, we present the first algorithmic procedure (and a corresponding tool, called MIS) to determine minimal independent support for a given CNF formula by employing a reduction to group minimal unsatisfiable subsets (GMUS). By utilizing minimal independent supports computed by MIS, we provide new tighter bounds on the length of XOR constraints for constrained counting and sampling. Furthermore, the universal hash functions constructed from independent supports computed by MIS provide two to three orders of magnitude performance improvement in stateoftheart constrained sampling and counting tools, while still retaining theoretical guarantees.

[8] 
From Weighted to Unweighted Model Counting
Proceedings of International Joint Conference on Artificial Intelligence (IJCAI), 2015.
The recent surge of interest in reasoning about probabilistic graphical models has led to the development of various techniques for probabilistic reasoning. Of these, techniques based on weighted model counting are particularly interesting since they can potentially leverage recent advances in unweighted model counting and in propositional satisfiability solving. In this paper, we present a new approach to weighted model counting via reduction to unweighted model counting. Our reduction, which is polynomialtime and preserves the normal form (CNF/DNF) of the input formula, allows us to exploit advances in unweighted model counting to solve weighted model counting instances. Experiments with weighted model counters built using our reduction indicate that these counters performs much better than a stateoftheart weighted model counter.

[7] 
On Parallel Scalable Uniform SAT Witness Generator
Proceedings of Tools and Algorithms for the Construction and Analysis of Systems (TACAS), 2015.
Constrainedrandom verification (CRV) is widely used in industry for validating hardware designs. The effectiveness of CRV depends on the uniformity of test stimuli generated from a given set of constraints. Most existing techniques sacrifice either uniformity or scalability when generating stimuli. While recent work based on random hash functions has shown that it is possible to generate almost uniform stimuli from constraints with 100,000+ variables, the performance still falls short of today's industrial requirements. In this paper, we focus on pushing the performance frontier of uniform stimulus generation further. We present a random hashingbased, easily parallelizable algorithm, UniGen2, for sampling solutions of propositional constraints. UniGen2 provides strong and relevant theoretical guarantees in the context of CRV, while also offering significantly improved performance compared to existing almostuniform generators. Experiments on a diverse set of benchmarks show that UniGen2 achieves an average speedup of about 20X over a stateoftheart sampling algorithm, even when running on a single core. Moreover, experiments with multiple cores show that UniGen2 achieves a nearlinear speedup in the number of cores, thereby boosting performance even further.

[6] 
DistributionAware Sampling and Weighted Model Counting for SAT
Proceedings of AAAI Conference on Artificial Intelligence (AAAI), 2014.
Given a CNF formula and a weight for each assignment of values to variables, two natural problems are weighted model counting and distributionaware sampling of satisfying assignments. Both problems have a wide variety of important applications. Due to the inherent complexity of the exact versions of the problems, interest has focused on solving them approximately. Prior work in this area scaled only to small problems in practice, or failed to provide strong theoretical guarantees, or employed a computationallyexpensive maximum a posteriori probability (MAP) oracle that assumes prior knowledge of a factored representation of the weight distribution. We present a novel approach that works with a blackbox oracle for weights of assignments and requires only an {\NP}oracle (in practice, a SATsolver) to solve both the counting and sampling problems. Our approach works under mild assumptions on the distribution of weights of satisfying assignments, provides strong theoretical guarantees, and scales to problems involving several thousand variables. We also show that the assumptions can be significantly relaxed while improving computational efficiency if a factored representation of the weights is known.

[5] 
Balancing Scalability and Uniformity in SATWitness Generator
Proceedings of Design Automation Conference (DAC), 2014.
Constrainedrandom simulation is the predominant approach used in the industry for functional verification of complex digital designs. The effectiveness of this approach depends on two key factors: the quality of constraints used to generate test vectors, and the randomness of solutions generated from a given set of constraints. In this paper, we focus on the second problem, and present an algorithm that significantly improves the stateoftheart of (almost)uniform generation of solutions of large Boolean constraints. Our algorithm provides strong theoretical guarantees on the uniformity of generated solutions and scales to problems involving hundreds of thousands of variables.

[4] 
A Scalable Approximate Model Counter
Proceedings of International Conference on Constraint Programming (CP), 2013.
Propositional model counting (#SAT), i.e., counting the number of satisfying assignments of a propositional formula, is a problem of significant theoretical and practical interest. Due to the inherent complexity of the problem, approximate model counting, which counts the number of satisfying assignments to within given tolerance and confi dence level, was proposed as a practical alternative to exact model counting. Yet, approximate model counting has been studied essentially only theoretically. The only reported implementation of approximate model counting, due to Karp and Luby, worked only for DNF formulas. A few existing tools for CNF formulas are bounding model counters; they can handle realistic problem sizes, but fall short of providing counts within given tolerance and confidence, and, thus, are not approximate model counters. We present here a novel algorithm, as well as a reference implementation, that is the first scalable approximate model counter for CNF formulas. The algorithm works by issuing a polynomial number of calls to a SAT solver. Our tool, ApproxMC, scales to formulas with tens of thousands of variables. Careful experimental comparisons show that ApproxMC reports, with high confidence, bounds that are close to the exact count, and also succeeds in reporting bounds with small tolerance and high confidence in cases that are too large for computing exact model counts.

[3] 
A Scalable and Nearly Uniform Generator of SAT Witnesses
Proceedings of International Conference on ComputerAided Verification (CAV), 2013.
Functional verification constitutes one of the most challenging tasks in the development of modern hardware systems, and simulationbased verification techniques dominate the functional verification landscape. A dominant paradigm in simulationbased verification is directed random testing, where a model of the system is simulated with a set of random test stimuli that are uniformly or nearuniformly distributed over the space of all stimuli satisfying a given set of constraints. Uniform or nearuniform generation of solutions for large constraint sets is therefore a problem of theoretical and practical interest. For Boolean constraints, prior work offered heuristic approaches with no guarantee of performance, and theoretical approaches with proven guarantees, but poor performance in practice. We offer here a new approach with theoretical performance guarantees and demonstrate its practical utility on large constraint sets.

[2] 
On Computing Minimal Independent Support and Its Applications to Sampling and Counting
Constraints 21(1), 2016.
Constrained sampling and counting are two fundamental problems arising in domains ranging from artificial intelligence and security, to hardware and software testing. Recent approaches to approximate solutions for these problems rely on employing SAT solvers and universal hash functions that are typically encoded as XOR constraints of length n/2 for an input formula with n variables. As the runtime performance of SAT solvers heavily depends on the length of XOR constraints, recent research effort has been focused on reduction of length of XOR constraints. Consequently, a notion of Independent Support was proposed, and it was shown that constructing XORs over independent support (if known) can lead to a significant reduction in the length of XOR constraints without losing the theoretical guarantees of sampling and counting algorithms. In this paper, we present the first algorithmic procedure (and a corresponding tool, called MIS) to determine minimal independent support for a given CNF formula by employing a reduction to group minimal unsatisfiable subsets (GMUS). By utilizing minimal independent supports computed by MIS, we provide new tighter bounds on the length of XOR constraints for constrained counting and sampling. Furthermore, the universal hash functions constructed from independent supports computed by MIS provide two to three orders of magnitude performance improvement in stateoftheart constrained sampling and counting tools, while still retaining theoretical guarantees.

[1] 
Constrained Sampling and Counting: Universal Hashing meets SAT Solving
Proceedings of Workshop on Beyond NP(BNP), 2016.
Constrained sampling and counting are two fundamental problems in artificial intelligence with a diverse range of applications, spanning probabilistic reasoning and planning to constrainedrandom verification. While the theory of these problems was thoroughly investigated in the 1980s, prior work either did not scale to industrial size instances or gave up correctness guarantees to achieve scalability. Recently, we proposed a novel approach that combines universal hashing and SAT solving and scales to formulas with hundreds of thousands of variables without giving up correctness guarantees. This paper provides an overview of the key ingredients of the approach and discusses challenges that need to be overcome to handle larger realworld instances.
