CBA (v2.1) (last modified June 25, 2001) is a data mining tool developed at the School of Computing, National
University of Singapore. Its main algorithm was presented at
KDD-98 in the paper entitled "Integrating Classification and
Association Rule Mining".
Improvements were made based on the ideas in our papers presented at
KDD-99 and KDD-00.
CBA originally stood for Classification Based on
Associations. However, it can not only produce an accurate
classifier for prediction, but also
mine various forms of association rules.
In summary, CBA (v2.1) has the following features:
1. Classification and prediction using association rules
- Build accurate classifiers from relational data, where
each record is described with a fixed number of attributes. This
type of data is what traditional classification techniques use,
e.g., decision trees, neural networks, and many others.
- Better classification accuracy (compared to CBA v1.0, C4.5, RIPPER, and Naive Bayes): on the 26
datasets from the UCI repository for Machine Learning that were used
in our KDD-98 paper, CBA v2.1 achieves an average error rate of 15.2%.
On the same datasets, C4.5 (release 8) obtains an error
rate of 16.7%, and CBA v1.0 obtains 15.8%. Click here to see the
detailed results. You can
also download these datasets from the CBA download section.
2. Mining association rules from relational data or transactional data
3. Mining with multiple minimum supports (KDD'99)
4. [...] analysis (curve) added to the Predicting module
5. Faster mining speed (compared to CBA v1.0)
6. An HTML viewer to help users understand the rules
7. Text categorization and classification (single class, at this time)
- Build accurate classifiers from transactional data, where each
data record has a variable number of items, e.g., items bought in
a supermarket by a customer, or the keywords in a text document.
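
To make the idea behind feature 1 more concrete, here is a minimal sketch of classification with class association rules on relational data. It is not the CBA implementation: the function names (mine_class_rules, predict), the thresholds, and the toy records are assumptions made for illustration, the rules are enumerated by brute force rather than with an Apriori-style miner, and the rule selection step of the KDD-98 algorithm is replaced by a simple "first matching rule wins" ordering.

```python
from collections import Counter
from itertools import combinations


def mine_class_rules(records, labels, min_sup=0.15, min_conf=0.6, max_len=2):
    """Enumerate rules of the form {attribute=value, ...} -> class label.

    Brute-force illustration only; assumes small data and short conditions.
    """
    n = len(records)
    cond_count = Counter()  # records that satisfy the condition
    rule_count = Counter()  # records that satisfy it and carry the label
    for rec, label in zip(records, labels):
        items = sorted(rec.items())
        for k in range(1, max_len + 1):
            for cond in combinations(items, k):
                cond_count[cond] += 1
                rule_count[(cond, label)] += 1

    rules = []
    for (cond, label), hits in rule_count.items():
        support = hits / n
        confidence = hits / cond_count[cond]
        if support >= min_sup and confidence >= min_conf:
            rules.append((cond, label, support, confidence))

    # Rank by confidence, then support, then shorter conditions first.
    rules.sort(key=lambda r: (-r[3], -r[2], len(r[0])))
    return rules


def predict(rules, record, default):
    """Return the label of the first ranked rule that the record satisfies."""
    for cond, label, _support, _confidence in rules:
        if all(record.get(attr) == value for attr, value in cond):
            return label
    return default


# Hypothetical toy data: each record has a fixed set of attributes.
records = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
    {"outlook": "sunny", "windy": "no"},
]
labels = ["play", "play", "play", "stay", "play"]

rules = mine_class_rules(records, labels)
default_label = Counter(labels).most_common(1)[0][0]  # majority class
prediction = predict(rules, {"outlook": "rain", "windy": "yes"}, default_label)
```

The same sketch would apply to transactional data (feature 7) if each record were represented as a set of items instead of a fixed attribute map, again as an assumption about representation rather than a description of CBA's input format.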
CBA also has many other features, e.g., cross-validation for evaluating
classifiers, and allows the user to view and to query the discovered rules.
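
As an illustration of how cross-validation yields an error rate like those quoted above, the following is a minimal k-fold sketch. It does not reflect CBA's internal procedure: the function cross_validated_error, the interleaved fold assignment, and the majority-class learner used in the demonstration are all assumptions chosen to keep the example self-contained.

```python
from collections import Counter


def cross_validated_error(records, labels, train, classify, k=10):
    """Estimate the error rate by k-fold cross-validation.

    `train(xs, ys)` returns a model; `classify(model, x)` returns a label.
    Folds are assigned by interleaving indices (an illustrative choice).
    """
    n = len(records)
    errors = 0
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_x = [records[i] for i in range(n) if i not in test_idx]
        train_y = [labels[i] for i in range(n) if i not in test_idx]
        model = train(train_x, train_y)
        errors += sum(classify(model, records[i]) != labels[i] for i in test_idx)
    return errors / n


# Hypothetical stand-in learner: always predicts the majority training class.
def train_majority(xs, ys):
    return Counter(ys).most_common(1)[0][0]


def classify_majority(model, x):
    return model


demo_records = list(range(10))
demo_labels = ["a"] * 8 + ["b"] * 2
error_rate = cross_validated_error(
    demo_records, demo_labels, train_majority, classify_majority, k=5
)
```

Any learner that fits the `train`/`classify` shape could be plugged in here, including a rule-based classifier, so the same loop can compare several methods on the same folds.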