## Publications

Except [c30], [c29], [c28], [c27], [c26], [c23],[c12], [c11], [c7] and [w1], the names of authors are sorted alphabetically and the order does not reflect contribution. Such practices are very common in theoretical areas of computer science. Here is the 2004 statement by AMS (American Mathematical Society) that explains the prevalance of this culture in mathematical areas.

### Tutorials

 2018 Scaling Discrete Integration and Sampling: Foundations and Challenges Co-presented with Supratik Chakraborty International Joint Conference on Artificial Intelligence (IJCAI 2018) 2017 Discrete Sampling and Integration for the AI Practitioner    Co-presented with Supratik Chakraborty and Moshe Y. Vardi AAAI Conference on Artificial Intelligence (AAAI 2017) 2016 Discrete Sampling and Integration in High Dimensional Spaces       Co-Presented with Supratik Chakraborty and Moshe Y. Vardi Conference on Uncertainity in Artificial Intelligence (UAI 2016)

### Theses

 2017 PhD Thesis, Rice University, September 2017 Winner of 2018 Ralph Budd Award for Best Engineering Thesis Honorable mention, 2018 ACP Doctoral Dissertation Award 2014 Masters Thesis, Rice University, April 2014 Winner of 2014 VCLA (Vienna Center of Logic and Algorithms) Outstanding Masters Thesis Announcement

### Conference Papers (Refereed and Archived)

 2019 [c30] A Weighted Model Counting Approach for Critical Infrastructure Reliability Roger Paredes, Leonardo Duenas-Osorio, Kuldeep S. Meel, and Moshe Y. Vardi Proceedings of International Conference on Applications of Statistics and Probability in Civil Engineering, (ICASP13), 2019. Reliability assessment of engineered systems such as telecommunication networks, power grids, and railroads is an important step towars supporting resilient communities. However, calculating the reliability of a network is computationally intensive. Thus, simulation methods are often preferred over exact methods in practice. Unfortunately, highly reliable and large scale systems can challenge common assumptions in simulation techniques, rendering reliability estimates as well as reported error and confidence unreliable themselves. A new generation of techniques, termed probably approximately correct (PAC) methods, delivers provable network reliability calculations with user-specified error and confidence. In this paper we focus on RelNet, a model counting-based method for network reliability estimation endowed with rigorous PAC guarantees. Despite previous success in power transmission network applications, small edge failure probabilities and dependent failures can challenge the current methodology. We put forward Weighted RelNet, a general importance sampling-based extension that treats the system?s joint probability distribution as a black box. Empirical evaluations suggest the new approach is competitive across challenging rare-event benchmarks. [c29] WAPS: Weighted and Projected Sampling Rahul Gupta, Shubham Sharma, Subhajit Roy, and Kuldeep S. Meel Proceedings of Tools and Algorithms for the Construction and Analysis of Systems (TACAS), 2019. Previous work on applying Knowledge compilation has focused on uniform sampling over all the variables. Since the constraints are written in high level languages such as Verilog, the popular CNF encoding schemes as Tseitin encoding introduces additional auxiliary variables. The resulting CNF formulas are not equivalent but equisatisfiable. In particular, for a formula $G$ specified in high level language we obtain a CNF formula F such that $G (X) = \exists Y F(X,Y)$. This makes one wonder if it is possible to extend Knowledge compilation based techniques to sample over a subset of variables. Furthermore, languages such as Verilog allow specification of weights to user-defined constraints, so there is a need to sample according to the posterior distribution. In this paper, we provide affirmative question to the above two questions: We propose KUS that samples over user defined subset of variables from posterior distribution for a given prior distribution defined over product spaces. [c28] BOSPHORUS: Bridging ANF and CNF Solvers Davin Choo, Mate Soos, Kian Ming A. Chai, and Kuldeep S. Meel Proceedings of Design, Automation, and Test in Europe(DATE), 2019. Algebraic Normal Form (ANF) and Conjunctive Normal Form (CNF) are commonly used to encode problems in Boolean algebra. ANFs are typically solved via Gr"obner basis algorithms, often using more memory than is feasible; while CNFs are solved using SAT solvers, which cannot exploit the algebra of polynomials naturally. We propose a paradigm that bridges between ANF and CNF solving techniques: the techniques are applied in an iterative manner to emph{learn facts} to augment the original problems. Experiments on over 1,100 benchmarks arising from four different applications domains demonstrate that learnt facts can significantly improve runtime and enable more benchmarks to be solved. [c27] BIRD: Engineering an Efficient CNF-XOR SAT Solver and its Applications to Approximate Model Counting Mate Soos and Kuldeep S. Meel Proceedings of AAAI Conference on Artificial Intelligence (AAAI), 2019. Given a Boolean formula F, the problem of model counting, also referred to as \#SAT is to compute the number of solutions of F. Model counting is a fundamental problem in artificial intelligence with a wide range of applications including probabilistic reasoning, decision making under uncertainty, quantified information flow, and the like. Motivated by the success of SAT solvers, there has been surge of interest in the design of hashing-based techniques for approximate model counting for the past decade. We profiled the state of the art approximate model counter ApproxMC3 and observed that over 99.99\% of time is consumed by the underlying SAT solver, CryptoMinisat. This observation motivated us to ask: Can we design an efficient underlying CNF-XOR SAT solver that can take advantage of the structure of hashing-based algorithms and would this lead to an efficient approximate model counter? The primary contribution of this paper is an affirmative answer to the above question. We present a novel architecture, called BIRD, to handle CNF-XOR formulas arising from hashing-based techniques. The resulting hashing-based approximate model counter, called ApproxMC3, employs the BIRD framework in its underlying SAT solver, CryptoMinisat. To the best of our knowledge, we conducted the most comprehensive study of evaluation performance of counting algorithms involving 1896 benchmarks with computational effort totaling 86400 computational hours. Our experimental evaluation demonstrates significant runtime performance improvement for ApproxMC3 over ApproMC2. In particular, we solve 648 benchmarks more than ApproMC2, the state of the art approximate model counter and for all the formulas where both ApproMC2 and ApproxMC3 did not timeout and took more than 1 seconds, the mean speedup is 284.40 -- more than two orders of magnitude. [c26] On testing of Uniform Samplers Sourav Chakraborty and Kuldeep S. Meel Proceedings of AAAI Conference on Artificial Intelligence (AAAI), 2019. Recent years have seen an unprecedented adoption of artificial intelligence in wide variety of applications ranging from medical diagnosis, automobile, security to aircraft collision avoidance. Probabilistic reasoning is a key component of such modern artificial intelligence systems. Sampling techniques form the core of the state of the art probabilistic reasoning systems. In contrast to testing for deterministic programs, where one trace is sufficient to prove the existence of a bug; such is not the case for samplers as one sample is typically not sufficient to prove non-conformity of the sampler to the desired distribution. This makes one wonder: whether it is possible to design testing methodology to test whether a sampler under test generates samples close to a given distribution. The primary contribution of this paper is a positive answer to the above question: We design, to the best of our knowledge, the first algorithmic framework, Barbarik, to test whether the distribution generated by $\varepsilon-$close or $\eta-$far from the uniform distribution. In contrast to the sampling techniques that require an exponential or sub-exponential number of samples for sampler whose support can be represented by $n$ bits, Barbarik requires only $\mathcal{O}(1/(\eta-\varepsilon)^2)$ samples. We present a prototype implementation of Barbarik and use it to test three state of the art uniform samplers over the support defined by combinatorial constraints. Barbarik is able to provide a certificate of uniformity to one sampler and demonstrate non-uniformity for the other two samplers. [c25] On the Hardness of Probabilistic Inference Relaxations Supratik Chakraborty, Kuldeep S. Meel, and Moshe Y. Vardi Proceedings of AAAI Conference on Artificial Intelligence (AAAI), 2019. Probabilistic inference is increasingly being used in applications that compute with uncertain data. A promising approach to inference that has attracted recent attention exploits its inter-reducibility with model counting. Since probabilistic inference and model counting are #P-complete, various relaxations are used in practice, with the hope that these relaxations lend themselves to efficient computation while also providing rigorous approximation guarantees. In this paper, we show that contrary to commonly held belief, several relaxations used in the probabilistic inference literature do not really lead to computational efficiency in a complexity theoretic sense. Our arguments proceed by showing the corresponding relaxed notions of counting to be computationally hard. We argue that approximate counting (and hence, inference) with multiplicative tolerance and probabilistic guarantees of correctness is the only class of relaxations that provably simplifies the problem, given access to an NP-oracle. Finally, we show that for applications that compare probability estimates with a threshold, a new notion of relaxation with gaps between low and high thresholds can be used. This new relaxation allows efficient decision making in practice, given access to an NP-oracle, while also bounding the approximation error. [c24] IMLI: An Incremental Framework for MaxSAT-Based Learning of Interpretable Classification Rules Bishwamittra Ghosh and KUldeep S. Meel Proceedings of AAAI/ACM Conference on AI, Ethics, and Society(AIES), 2019. The wide adoption of machine learning in the critical domains such as medical diagnosis, law, education had propelled the need for interpretable techniques due to the need for end user to understand the reasoning behind decisions due to learning systems. The computational intractability of interpretable learning led practitioners to design heuristic techniques, which fail to provide sound handles to tradeoff accuracy and interpretability. Motivated by the success of Max-SAT solvers over the past decade, recently MaxSAT-based approach, called MLIC, was proposed that seeks to reduce the problem of learning interpretable rules expressed in Conjunctive Normal Form (CNF) to a MaxSAT query. While MLIC was shown to achieve accuracy similar to that of other state of the art black-box classifiers while generating small interpretable CNF formulas, the runtime performance of MLIC is significantly lagging and renders approach unusable in practice. In this context, authors raised the question: Is it possible to achieve the best of both worlds, i.e. a sound framework for interpretable learning that can take advantage of MaxSAT solvers while scaling to real-world instances? In this paper, we take a step towards answering the above question in affirmation. We propose an incremental approach to Max-SAT based framework that achieves scalable runtime performance via partition-based training methodology. Extensive experiments on benchmarks arising from UCI repository demonstrate that IMLI achieves up to three orders of magnitude runtime improvement without loss of accuracy and interpretability. 2018 [c23] Knowledge Compilation meets Uniform Sampling Shubham Sharma, Rahul Gupta, Subhajit Roy, and Kuldeep S. Meel Proceedings of International Conference on Logic for Programming Artificial Intelligence and Reasoning (LPAR), 2018. Uniform sampling has drawn diverse applications in programming languages and software engineering, like in constrained-random verification (CRV), constrained-fuzzing and bug synthesis. The effectiveness of these applications depend on the uniformity of test stimuli generated from a given set of constraints. Despite significant progress over the past few years, the performance of the state of the art techniques still falls short of those of heuristic methods employed in the industry which sacrifice either uniformity or scalability when generating stimuli. In this paper, we propose a new approach to the uniform generation that builds on recent progress in knowledge compilation. The primary contribution of this paper is marrying knowledge compilation with uniform sampling: our algorithm, KUS, employs the state-of-the-art knowledge compilers to first compile constraints into d-DNNF form, and then, generates samples by making two passes over the compiled representation. We show that KUS is able to significantly outperform existing state-of-the-art algorithms, SPUR and UniGen2, by up to 3 orders of magnitude in terms of runtime while achieving a geometric speedup of 1.7 and 8.3 over SPUR and UniGen2 respectively. Also, KUS achieves a lower PAR-2 score, around 0.82x that of SPUR and 0.38x that of UniGen2. Furthermore, KUS achieves speedups of up to 3 orders of magnitude for incremental sampling. The distribution generated by KUS is statistically indistinguishable from that generated by an ideal uniform sampler. Moreover, KUS is almost oblivious to the number of samples requested. [c22] MLIC: A MaxSAT-Based framework for learning interpretable classification rules Dmitry Malioutov and Kuldeep S. Meel Proceedings of International Conference on Constraint Programming (CP), 2018. The wide adoption of machine learning approaches in the industry, government, medicine and science has renewed the interest in interpretable machine learning: many decisions are too important to be delegated to black-box techniques such as deep neural networks or kernel SVMs. Historically, problems of learning interpretable classifiers, including classification rules or decision trees, have been approached by greedy heuristic methods as essentially all the exact optimization formulations are NP-hard. Our primary contribution is a MaxSAT-based framework, called MLIC, which allows principled search for interpretable classification rules expressible in propositional logic. Our approach benefits from the revolutionary advances in the constraint satisfaction community to solve large-scale instances of such problems. In experimental evaluations over a collection of benchmarks arising from practical scenarios, we demonstrate its effectiveness: we show that the formulation can solve large classification problems with tens or hundreds of thousands of examples and thousands of features, and to provide a tunable balance of accuracy vs. interpretability. Furthermore, we show that in many problems interpretability can be obtained at only a minor cost in accuracy. The primary objective of the paper is to show that recent advances in the MaxSAT literature make it realistic to find optimal (or very high quality near-optimal) solutions to large-scale classification problems. The key goal of the paper is to excite researchers in both interpretable classification and in the CP community to take it further and propose richer formulations, and to develop bespoke solvers attuned to the problem of interpretable ML. [c21] Not All FPRASs are Equal: Demystifying FPRASs for DNF-Counting Kuldeep S. Meel, Aditya A. Shrotri, and Moshe Y. Vardi Proceedings of International Conference on Constraint Programming (CP), 2018. Invited to Constraints journal The problem of counting the number of solutions of a DNF formula, also called #DNF, is a fundamental problem in artificial intelligence with applications in diverse domains ranging from network reliability to probabilistic databases. Owing to the intractability of the exact variant, efforts have focused on the design of approximate techniques for #DNF. Consequently, several Fully Polynomial Randomized Approximation Schemes (FPRASs) based on Monte Carlo techniques have been proposed. Recently, it was discovered that hashing-based techniques too lend themselves to FPRASs for #DNF. Despite significant improvements, the complexity of the hashing-based FPRAS is still worse than that of the best Monte Carlo FPRAS by polylog factors. Two questions were left unanswered in previous works: Can the complexity of the hashing-based techniques be improved? How do the various approaches stack up against each other empirically? In this paper, we first propose a new search procedure for the hashing-based FPRAS that removes the polylog factors from its time complexity. We then present the first empirical study of runtime behavior of different FPRASs for #DNF. The result of our study produces a nuanced picture. First of all, we observe that there is no single best algorithm that outperforms all others for all classes of formulas and input parameters. Second, we observe that the algorithm with the worst time complexity, solves the largest number of benchmarks. [c20] Scalable Approximation of Quantitative Information Flow in Programs Fabrizio Biondi, Michael Enescu, Annelie Heuser, Axel Legay, Kuldeep S. Meel, and Jean Quilbeuf Proceedings of International Conference on Verification, Model Checking, and Abstract Interpretation (VMCAI), 2018. Quantitative information flow measurement techniques have been proven to be successful in detecting leakage of confidential information from programs. Modern approaches are based on formal methods, relying on program analysis to produce a SAT formula representing the program's behavior, and model counting to measure the possible information flow. However, while program analysis scales to large codebases like the OpenSSL project, the formulas produced are too complex for analysis with precise model counting. In this paper we use the approximate model counter ApproxMC2 to quantify information flow. We show that ApproxMC2 is able to provide a large performance increase for a very small loss of precision, allowing the analysis of SAT formulas produced from complex code. We call the resulting technique ApproxFlow and test it on a large set of benchmarks against the state of the art. Finally, we show that ApproxFlow can evaluate the leakage incurred by the Heartbleed OpenSSL bug, contrarily to the state of the art. 2017 [c19] On Hashing-Based Approaches to Approximate DNF-Counting Kuldeep S. Meel, Aditya A. Shrotri, and Moshe Y. Vardi Proceedings of IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS), 2017. Propositional model counting is a fundamental problem in artificial intelligence with a wide variety of applications, such as probabilistic inference, decision making under uncertainty, and probabilistic databases. Consequently, the problem is of theoretical as well as practical interest. When the constraints are expressed as DNF formulas, Monte Carlo-based techniques have been shown to provide a fully polynomial randomized approximation scheme (FPRAS). For CNF constraints, hashing-based approximation techniques have been demonstrated to be highly successful. Furthermore, it was shown that hashing-based techniques also yield an FPRAS for DNF counting without usage of Monte Carlo sampling. Our analysis, however, shows that the proposed hashing-based approach to DNF counting provides poor time complexity compared to the Monte Carlo-based DNF counting techniques. Given the success of hashing-based techniques for CNF constraints, it is natural to ask: Can hashing-based techniques provide an efficient FPRAS for DNF counting? In this paper, we provide a positive answer to this question. To this end, we introduce two novel algorithmic techniques: Symbolic Hashing and Stochastic Cell Counting, along with a new hash family of Row-Echelon hash functions. These innovations allow us to design a hashing-based FPRAS for DNF counting of similar complexity as that of prior works. Furthermore, we expect these techniques to have potential applications beyond DNF counting. [c18] The Hard Problems Are Almost Everywhere For Random CNF-XOR Formulas Jeffrey Dudek, Kuldeep S. Meel, and Moshe Y. Vardi Proceedings of International Joint Conference on Artificial Intelligence (IJCAI), 2017. Recent universal-hashing based approaches to sampling and counting crucially depend on the runtime performance of \SAT~solvers on formulas expressed as the conjunction of both CNF constraints and variable-width XOR constraintsints (known as CNF-XOR formulas). In this paper, we present the first study of the runtime behavior of \SAT~solvers equipped with XOR-reasoning techniques on random CNF-XOR formulas. We empirically demonstrate that a state-of-the-art \SAT~solver scales exponentially on random CNF-XOR formulas across a wide range of XOR-clause densities, peaking around the empirical phase-transition location. On the theoretical front, we prove that the solution space of a random CNF-XOR formula shatters' at \emph{all} nonzero XOR-clause densities into well-separated components, similar to the behavior seen in random CNF formulas known to be difficult for many \SAT-solving algorithms. [c17] Counting-Based Reliability Estimation for Power-Transmission Grids Leonardo Duenas-Osorio, Kuldeep S. Meel, Roger Paredes, and Moshe Y. Vardi Proceedings of AAAI Conference on Artificial Intelligence (AAAI), 2017. Modern society is increasingly reliant on the functionality of infrastructure facilities and utility services. Consequently, there has been surge of interest in the problem of quantification of system reliability, which is known to be #P-complete. Reliability also contributes to the resilience of systems, so as to effectively make them bounce back after contingencies. Despite diverse progress, most techniques to estimate system reliability and resilience remain computationally expensive. In this paper, we investigate how recent advances in hashing-based approaches to counting can be exploited to improve computational techniques for system reliability. The primary contribution of this paper is a novel framework, RelNet, that provides provably approximately correct (PAC) estimates for arbitrary networks. We then apply RelNet to ten real world power transmission grids across different cities in the U.S. and are able to obtain, to the best of our knowledge, the first theoretically sound a priori estimates of reliability between several pairs of nodes of interest. Such estimates will help managing uncertainty and support rational decision making for community resilience. 2016 [c16] Algorithmic Improvements in Approximate Counting for Probabilistic Inference: From Linear to Logarithmic SAT Calls Supratik Chakraborty, Kuldeep S. Meel, and Moshe Y. Vardi Proceedings of International Joint Conference on Artificial Intelligence (IJCAI), 2016. Probabilistic inference via model counting has emerged as a scalable technique with strong formal guarantees, thanks to recent advances in hashing-based approximate counting. State-of-the-art hashing-based counting algorithms use an {\NP} oracle, such that the number of oracle invocations grows linearly in the number of variables n in the input constraint. We present a new approach to hashing-based approximate model counting in which the number of oracle invocations grows logarithmically in $n$, while still providing strong theoretical guarantees. Our experiments show that the new approach outperforms state-of-the-art techniques for approximate counting by 1-2 orders of magnitude in running time. [c15] Combining the k-CNF and XOR Phase-Transitions Jeffrey Dudek, Kuldeep S. Meel, and Moshe Y. Vardi Proceedings of International Joint Conference on Artificial Intelligence (IJCAI), 2016. The runtime performance of modern SAT solvers on random k-CNF formulas is deeply connected with the phase-transition' phenomenon seen empirically in the satisfiability of random k-CNF formulas. Recent universal hashing-based approaches to sampling and counting crucially depend on the runtime performance of SAT solvers on formulas expressed as the conjunction of both k-CNF and XOR constraints (known as k-CNF-XOR formulas), but the behavior of random k-CNF-XOR formulas is unexplored in prior work. In this paper, we present the first study of the satisfiability of random k-CNF-XOR formulas. We show empirical evidence of a surprising phase-transition that follows a linear trade-off between k-CNF and XOR constraints. Furthermore, we prove that a phase-transition for k-CNF-XOR formulas exists for k = 2 and (when the number of k-CNF constraints is small) for k > 2. [c14] Approximate Probabilistic Inference via Word-Level Counting Supratik Chakraborty, Kuldeep S. Meel, Rakesh Mistry, and Moshe Y. Vardi Proceedings of AAAI Conference on Artificial Intelligence (AAAI), 2016. Hashing-based model counting has emerged as a promising approach for large-scale probabilistic inference on graphical models. A key component of these techniques is the use of xor-based 2-universal hash functions that operate over Boolean domains. Many counting problems arising in probabilistic inference are, however, naturally encoded over fi- nite discrete domains. Techniques based on bit-level (or Boolean) hash functions require these problems to be propositionalized, making it impossible to leverage the remarkable progress made in SMT (Satisfiability Modulo Theory) solvers that can reason directly over words (or bit-vectors). In this work, we present the first approximate model counter that uses word-level hashing functions, and can directly leverage the power of sophisticated SMT solvers. Empirical evaluation over an extensive suite of benchmarks demonstrates the promise of the approach. [c13] Automatic Data Layout Generation and Kernel Mapping for CPU+GPU Architectures Deepak Majeti, Kuldeep S. Meel, Raj Barik, and Vivek Sarkar Proceedings of International Conference on Compiler Construction (CC), 2016. The ubiquity of hybrid CPU+GPU architectures has led to renewed interest in automatic data layout generation owing to the fact that data layouts have a large impact on performance, and that different data layouts yield the best performance on CPUs vs. GPUs. Unfortunately, current programming models still fail to provide an effective solution to the problem of automatic data layout generation for CPU+GPU processors. Specifically, the interaction among whole-program data layout optimizations, data movement optimizations, and mapping of kernels across heterogeneous cores poses a major challenge to current programming systems. In this paper, we introduce a novel two-level hierarchical formulation of the data layout and kernel mapping problem for modern heterogeneous architectures. The top level formulation targets data layouts and kernel mapping for the entire program for which we provide a polynomial- time solution using a graph-based shortest path algorithm that uses the data layouts for the code regions (sections) for a given processor computed in the bottom level formulation. The bottom level formulation deals with the data layout problem for a parallel code region on a given processor, which is NP-Hard, and we provide a greedy algorithm that uses an affinity graph to obtain approximate solutions. We have implemented this data layout transformation in the new Heterogeneous Habanero-C (H2C) parallel programming framework and propose performance models to characterize the data layout impact on both the CPU and GPU. Our data layout framework shows significant performance improvements of up to 2.9 (geometric mean 1.5) on a multicore CPU+GPU compared to the manually specified layouts for a set of parallel programs running on a heterogeneous platform consisting of an Intel Xeon CPU and a NVIDIA GPU. Further, our framework also shows performance improvements of up to 2.7 (geometric mean 1.6) on just the multicore CPU, demonstrating the applicability of our approach to both heterogeneous and homogeneous hardware platforms. [c12] Design and Verification of Distributed Phasers Karthik, Murthy, Sri Raj, Paul, Kuldeep S., Meel, Tiago Cogumbreiro, and John Mellor-Crummey Proceedings of International European Conference on Parallel and Distributed Computing (Euro-Par), 2016. A phaser is an expressive synchronization construct that unifies collective and point-to-point coordination with dynamic registration of parallel tasks. Each task can participate in a phaser as a signaler, a waiter, or both. The participants in a phaser may change over time as tasks are added and deleted. In this paper, we present a highly concurrent and scalable design of phasers for a distributed memory environment. Our design for a distributed phaser employs a pair of skip lists augmented with the ability to collect and propagate synchronization signals. To enable a high degree of concurrency, addition and deletion of participant tasks are performed in two phases: a "fast single-link-modify" step followed by multiple hand-over-hand "lazy multi-link-modify" steps. Verifying highly-concurrent protocols is difficult. We analyze our design for a distributed phaser using the SPIN model checker. A straight-forward approach to model checking a distributed phaser operation requires an infeasibly large state space. To address this issue, we employ a novel "message-based" model checking scheme to enable a non- approximate complete model checking of our phaser design. We guarantee the semantic properties of phaser operations by ensuring that a set of linear temporal logic formulae are valid during model checking. We also present complexity analysis of the cost of synchronization and structural operations. 2015 [c11] On Computing Minimal Independent Support and Its Applications to Sampling and Counting Alexander Ivrii, Sharad Malik, Kuldeep S. Meel, and Moshe Y. Vardi Proceedings of International Conference on Constraint Programming (CP), 2015. Best Student Paper Award Constrained sampling and counting are two fundamental problems arising in domains ranging from artificial intelligence and security, to hardware and software testing. Recent approaches to approximate solutions for these problems rely on employing SAT solvers and universal hash functions that are typically encoded as XOR constraints of length n/2 for an input formula with n variables. As the runtime performance of SAT solvers heavily depends on the length of XOR constraints, recent research effort has been focused on reduction of length of XOR constraints. Consequently, a notion of Independent Support was proposed, and it was shown that constructing XORs over independent support (if known) can lead to a significant reduction in the length of XOR constraints without losing the theoretical guarantees of sampling and counting algorithms. In this paper, we present the first algorithmic procedure (and a corresponding tool, called MIS) to determine minimal independent support for a given CNF formula by employing a reduction to group minimal unsatisfiable subsets (GMUS). By utilizing minimal independent supports computed by MIS, we provide new tighter bounds on the length of XOR constraints for constrained counting and sampling. Furthermore, the universal hash functions constructed from independent supports computed by MIS provide two to three orders of magnitude performance improvement in state-of-the-art constrained sampling and counting tools, while still retaining theoretical guarantees. [c10] From Weighted to Unweighted Model Counting Supratik Chakraborty, Dror Fried, Kuldeep S. Meel, and Moshe Y. Vardi Proceedings of International Joint Conference on Artificial Intelligence (IJCAI), 2015. The recent surge of interest in reasoning about probabilistic graphical models has led to the development of various techniques for probabilistic reasoning. Of these, techniques based on weighted model counting are particularly interesting since they can potentially leverage recent advances in unweighted model counting and in propositional satisfiability solving. In this paper, we present a new approach to weighted model counting via reduction to unweighted model counting. Our reduction, which is polynomial-time and preserves the normal form (CNF/DNF) of the input formula, allows us to exploit advances in unweighted model counting to solve weighted model counting instances. Experiments with weighted model counters built using our reduction indicate that these counters performs much better than a state-of-the-art weighted model counter. [c9] On Parallel Scalable Uniform SAT Witness Generator Supratik Chakraborty, Daniel J. Fremont, Kuldeep S. Meel, Sanjit A. Seshia, and Moshe Y. Vardi Proceedings of Tools and Algorithms for the Construction and Analysis of Systems (TACAS), 2015. Constrained-random verification (CRV) is widely used in industry for validating hardware designs. The effectiveness of CRV depends on the uniformity of test stimuli generated from a given set of constraints. Most existing techniques sacrifice either uniformity or scalability when generating stimuli. While recent work based on random hash functions has shown that it is possible to generate almost uniform stimuli from constraints with 100,000+ variables, the performance still falls short of today's industrial requirements. In this paper, we focus on pushing the performance frontier of uniform stimulus generation further. We present a random hashing-based, easily parallelizable algorithm, UniGen2, for sampling solutions of propositional constraints. UniGen2 provides strong and relevant theoretical guarantees in the context of CRV, while also offering significantly improved performance compared to existing almost-uniform generators. Experiments on a diverse set of benchmarks show that UniGen2 achieves an average speedup of about 20X over a state-of-the-art sampling algorithm, even when running on a single core. Moreover, experiments with multiple cores show that UniGen2 achieves a near-linear speedup in the number of cores, thereby boosting performance even further. 2014 [c8] ADHA: Automatic Data layout framework for Heterogeneous Architectures Deepak Majeti, Kuldeep S. Meel, Raj Barik, and Vivek Sarkar Proceedings of Parallel Architecture and Compilation Techniques (PACT), 2014. Data layouts play a crucial role in determining the performance of a given application running on a given architecture. Existing parallel programming frameworks for both multicore and heterogeneous systems leave the onus of selecting a data layout to the programmer. Therefore, shifting the burden of data layout selection to optimizing compilers can greatly enhance programmer productivity and application performance. In this work, we introduce ADHA: a two-level hierarchal formulation of the data layout problem for modern heterogeneous architectures. We have created a reference implementation of ADHA in the Heterogeneous Habanero-C (H2C) parallel programming system. ADHA shows significant performance benefits of up to 6.92X compared to manually specified layouts for two benchmark programs running on a CPU+GPU heterogeneous platform. [c7] Distribution-Aware Sampling and Weighted Model Counting for SAT Supratik Chakraborty, Daniel J. Fremont, Kuldeep S. Meel, Sanjit A. Seshia, and Moshe Y. Vardi Proceedings of AAAI Conference on Artificial Intelligence (AAAI), 2014. Given a CNF formula and a weight for each assignment of values to variables, two natural problems are weighted model counting and distribution-aware sampling of satisfying assignments. Both problems have a wide variety of important applications. Due to the inherent complexity of the exact versions of the problems, interest has focused on solving them approximately. Prior work in this area scaled only to small problems in practice, or failed to provide strong theoretical guarantees, or employed a computationally-expensive maximum a posteriori probability (MAP) oracle that assumes prior knowledge of a factored representation of the weight distribution. We present a novel approach that works with a black-box oracle for weights of assignments and requires only an {\NP}-oracle (in practice, a SAT-solver) to solve both the counting and sampling problems. Our approach works under mild assumptions on the distribution of weights of satisfying assignments, provides strong theoretical guarantees, and scales to problems involving several thousand variables. We also show that the assumptions can be significantly relaxed while improving computational efficiency if a factored representation of the weights is known. [c6] Balancing Scalability and Uniformity in SAT-Witness Generator Supratik Chakraborty, Kuldeep S. Meel, and Moshe Y. Vardi Proceedings of Design Automation Conference (DAC), 2014. Constrained-random simulation is the predominant approach used in the industry for functional verification of complex digital designs. The effectiveness of this approach depends on two key factors: the quality of constraints used to generate test vectors, and the randomness of solutions generated from a given set of constraints. In this paper, we focus on the second problem, and present an algorithm that significantly improves the state-of-the-art of (almost-)uniform generation of solutions of large Boolean constraints. Our algorithm provides strong theoretical guarantees on the uniformity of generated solutions and scales to problems involving hundreds of thousands of variables. 2013 [c5] A Scalable Approximate Model Counter Supratik Chakraborty, Kuldeep S. Meel, and Moshe Y. Vardi Proceedings of International Conference on Constraint Programming (CP), 2013. Propositional model counting (#SAT), i.e., counting the number of satisfying assignments of a propositional formula, is a problem of significant theoretical and practical interest. Due to the inherent complexity of the problem, approximate model counting, which counts the number of satisfying assignments to within given tolerance and confi- dence level, was proposed as a practical alternative to exact model counting. Yet, approximate model counting has been studied essentially only theoretically. The only reported implementation of approximate model counting, due to Karp and Luby, worked only for DNF formulas. A few existing tools for CNF formulas are bounding model counters; they can handle realistic problem sizes, but fall short of providing counts within given tolerance and confidence, and, thus, are not approximate model counters. We present here a novel algorithm, as well as a reference implementation, that is the first scalable approximate model counter for CNF formulas. The algorithm works by issuing a polynomial number of calls to a SAT solver. Our tool, ApproxMC, scales to formulas with tens of thousands of variables. Careful experimental comparisons show that ApproxMC reports, with high confidence, bounds that are close to the exact count, and also succeeds in reporting bounds with small tolerance and high confidence in cases that are too large for computing exact model counts. [c4] A Scalable and Nearly Uniform Generator of SAT Witnesses Supratik Chakraborty, Kuldeep S. Meel, and Moshe Y. Vardi Proceedings of International Conference on Computer-Aided Verification (CAV), 2013. Functional verification constitutes one of the most challenging tasks in the development of modern hardware systems, and simulation-based verification techniques dominate the functional verification landscape. A dominant paradigm in simulation-based verification is directed random testing, where a model of the system is simulated with a set of random test stimuli that are uniformly or near-uniformly distributed over the space of all stimuli satisfying a given set of constraints. Uniform or near-uniform generation of solutions for large constraint sets is therefore a problem of theoretical and practical interest. For Boolean constraints, prior work offered heuristic approaches with no guarantee of performance, and theoretical approaches with proven guarantees, but poor performance in practice. We offer here a new approach with theoretical performance guarantees and demonstrate its practical utility on large constraint sets.

### Journal Papers (Refereed and Archived)

 2018 [j3] Network Reliability Estimation in Theory and Practice Roger Paredes, Leonardo Duenas-Osorio, Kuldeep S. Meel, and Moshe Y. Vardi Submitted to Reliability Engineering and System Safety, 2018. As engineered systems expand, become more interdependent, and operate in real-time, reliability assessment is indispensable to support investment and decision making. However, network reliability problems are known to be #P-complete, a computational complexity class largely believed to be intractable. The computational intractability of network reliability motivates our quest for reliable approximations. Based on their theoretical foundations, available methods can be grouped as follows: (i) exact or bounds, (ii) guarantee-less sampling, and (iii) probably approximately correct (PAC). Group (i) is well regarded due to its useful byproducts, but it does not scale in practice. Group (ii) scales well and verifies desirable properties, such as the bounded relative error, but it lacks error guarantees. Group (iii) is of great interest when precision and scalability are required, as it harbors computationally feasible approximation schemes with PAC-guarantees. We give a comprehensive review of classical methods before introducing modern techniques and our developments. We introduce K-RelNet, an extended counting-based estimation method that delivers PAC-guarantees for the K-terminal reliability problem. Then, we test methods' performance using various benchmark systems. We highlight the range of application of algorithms and provide the foundation for future resilience engineering as it increasingly necessitates methods for uncertainty quantification in complex systems. 2016 [j2] On Computing Minimal Independent Support and Its Applications to Sampling and Counting Alexander Ivrii, Sharad Malik, Kuldeep S. Meel, and Moshe Y. Vardi Constraints 21(1), 2016. Constrained sampling and counting are two fundamental problems arising in domains ranging from artificial intelligence and security, to hardware and software testing. Recent approaches to approximate solutions for these problems rely on employing SAT solvers and universal hash functions that are typically encoded as XOR constraints of length n/2 for an input formula with n variables. As the runtime performance of SAT solvers heavily depends on the length of XOR constraints, recent research effort has been focused on reduction of length of XOR constraints. Consequently, a notion of Independent Support was proposed, and it was shown that constructing XORs over independent support (if known) can lead to a significant reduction in the length of XOR constraints without losing the theoretical guarantees of sampling and counting algorithms. In this paper, we present the first algorithmic procedure (and a corresponding tool, called MIS) to determine minimal independent support for a given CNF formula by employing a reduction to group minimal unsatisfiable subsets (GMUS). By utilizing minimal independent supports computed by MIS, we provide new tighter bounds on the length of XOR constraints for constrained counting and sampling. Furthermore, the universal hash functions constructed from independent supports computed by MIS provide two to three orders of magnitude performance improvement in state-of-the-art constrained sampling and counting tools, while still retaining theoretical guarantees.

### Workshop Papers (Refereed and Archieved)

 2016 [w1] Constrained Sampling and Counting: Universal Hashing meets SAT Solving Kuldeep S. Meel, Moshe Y. Vardi, Supratik Chakraborty, Daniel J. Fremont, Sanjit A. Seshia, Dror Fried, Alexander Ivrii, and Sharad Malik Proceedings of Workshop on Beyond NP(BNP), 2016. Constrained sampling and counting are two fundamental problems in artificial intelligence with a diverse range of applications, spanning probabilistic reasoning and planning to constrained-random verification. While the theory of these problems was thoroughly investigated in the 1980s, prior work either did not scale to industrial size instances or gave up correctness guarantees to achieve scalability. Recently, we proposed a novel approach that combines universal hashing and SAT solving and scales to formulas with hundreds of thousands of variables without giving up correctness guarantees. This paper provides an overview of the key ingredients of the approach and discusses challenges that need to be overcome to handle larger real-world instances.