Integers versus identifiers
===========================

Most Grids on the market use an integer scheme for getting and setting
the elements.  Many of these Grids however do not allow columns to be
reordered dynamically by the view, and it is this reordering of
columns that brings up the debate about how they should be referenced.

There are two main choices. The first, and more obvious, is to use an
identifier rather than an index for each column. That way, when the
columns are reordered in the UI, the identifier moves with the column
and the user of the JTable can continue to reference the columns
without needing to worry about what has moved where.
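
The effect can be sketched with the column classes in javax.swing.table.
This is only an illustration, not a statement of how the JTable must
work: the class name IdentifierReorderSketch and the three string
identifiers are made up for the example.

```java
import javax.swing.table.DefaultTableColumnModel;
import javax.swing.table.TableColumn;

class IdentifierReorderSketch {
    /** Builds a three-column model whose identifiers are plain strings. */
    static DefaultTableColumnModel build() {
        DefaultTableColumnModel columns = new DefaultTableColumnModel();
        String[] ids = {"name", "phone", "email"};   // hypothetical identifiers
        for (int i = 0; i < ids.length; i++) {
            TableColumn column = new TableColumn(i); // i = index in the data model
            column.setIdentifier(ids[i]);
            columns.addColumn(column);
        }
        return columns;
    }

    public static void main(String[] args) {
        DefaultTableColumnModel columns = build();
        // Position of the "email" column before the user moves it.
        System.out.println(columns.getColumnIndex("email"));
        // Simulate dragging the last column to the front.
        columns.moveColumn(2, 0);
        // The same identifier still finds the column at its new position.
        System.out.println(columns.getColumnIndex("email"));
    }
}
```

The caller never has to track the move: the lookup by identifier is
repeated and yields whatever position the column currently occupies.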

This solves the problem for columns but not for rows. If the rows are
sorted into a different order by an intermediary we have the same
problem when making queries about the JTable's state. The index of the
selected row, for example, will not be obviously related to the index
of the row in the (unsorted) model.
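
A small sketch of the row problem, assuming a sorting intermediary that
keeps an array mapping displayed rows to model rows. The class and
method names are hypothetical and the data is invented.

```java
import java.util.Arrays;
import java.util.Comparator;

class SortedRowMapSketch {
    /** Returns view-to-model row indices that order the given keys ascending. */
    static int[] sortedRowMap(String[] keys) {
        Integer[] order = new Integer[keys.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Sort the row indices by the key each index refers to.
        Arrays.sort(order, Comparator.comparing((Integer i) -> keys[i]));
        int[] viewToModel = new int[order.length];
        for (int i = 0; i < order.length; i++) {
            viewToModel[i] = order[i];
        }
        return viewToModel;
    }

    public static void main(String[] args) {
        String[] names = {"Mole", "Badger", "Toad"};  // rows in model order
        int[] viewToModel = sortedRowMap(names);
        int selectedViewRow = 0;                      // first row as displayed
        int modelRow = viewToModel[selectedViewRow];  // differs from the view index
        System.out.println(names[modelRow]);
    }
}
```

The selected index reported by the view is only meaningful to the model
after it has been passed back through the intermediary's map.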

Using identifiers for rows
==========================

Again, we could choose to go with identifiers but here the solution is
more involved.  It is very important that the design of the JTable is
not overly swayed in the direction of databases and it has been our
goal from the start to ensure that the JTable is flexible enough to
cover both the "database view" and "address book" ends of the spectrum
of situations where it might be used. At the same time,
database usage is likely to be a very common case for the JTable and
a row identification strategy that did not map well to the
relational world would not make a good solution.

Following the relational lead, the row identifier might just be one of
the columns in the table.  This, the "primary key" of a database
representation, sometimes needs to be composite before it is unique so we
would probably want a list of columns which constituted a "unique
primary key".  This would work perfectly as a row identifier except
that not all database tables have primary keys and even when they do
there is no constraint that they be unique.  Although editing a table
without a unique key is a worrying idea, this does not seem to be a
good enough reason to refuse to render it. Having a different API for
writable tables is out but other variations are possible and we could
easily have required all models to use identifiers for the rows 
in the model interface and left each model implementation to make 
its own choice.

Returning once again to a minimal case of a simple String[][]
structure for an address book with three rows and ten columns (we
don't get out much) this row identifying API seems to add considerable
complexity to what is a simple and well defined task.  Again, default
mapping methods could be included in an abstract model implementation
but this hiding of access fundamentals led to confusion in the previous
releases.

An all integer approach
=======================

Another approach is to abandon identifiers altogether and use integers
everywhere with the understanding that there are a number of different
co-ordinate systems at work.  Integers are fine for references into a
two dimensional structure as the table is only ever dependent on one
property of the identifiers: their uniqueness.  The model's
co-ordinate system however is clearly different to the co-ordinate
system in the view as columns may have been reordered or hidden. If we
distinguish these two co-ordinate systems we then have a general
solution for columns and the chaining technique above. Each element in
a chain has its own co-ordinate system and any reference to cells,
rows or columns must be made with a clear understanding of which
co-ordinate system is being used.
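
As an illustration of the two column co-ordinate systems, the sketch
below converts between view and model indices using the model index
stored in each javax.swing.table.TableColumn. The JTable methods
convertColumnIndexToModel() and convertColumnIndexToView() serve the
same purpose; the helper class and its method names here are
hypothetical.

```java
import javax.swing.table.DefaultTableColumnModel;
import javax.swing.table.TableColumn;
import javax.swing.table.TableColumnModel;

class ColumnCoordinateSketch {
    /** Converts a column index in view co-ordinates to model co-ordinates. */
    static int viewToModel(TableColumnModel columns, int viewIndex) {
        return columns.getColumn(viewIndex).getModelIndex();
    }

    /** Converts a model column index to view co-ordinates, or -1 if hidden. */
    static int modelToView(TableColumnModel columns, int modelIndex) {
        for (int view = 0; view < columns.getColumnCount(); view++) {
            if (columns.getColumn(view).getModelIndex() == modelIndex) {
                return view;
            }
        }
        return -1;
    }

    /** Four model columns, then one moved to the front and one hidden. */
    static TableColumnModel reorderedAndHidden() {
        DefaultTableColumnModel columns = new DefaultTableColumnModel();
        for (int i = 0; i < 4; i++) {
            columns.addColumn(new TableColumn(i));
        }
        columns.moveColumn(3, 0);                    // view now shows 3 0 1 2
        columns.removeColumn(columns.getColumn(2));  // hide model column 1
        return columns;
    }

    public static void main(String[] args) {
        TableColumnModel columns = reorderedAndHidden();
        for (int view = 0; view < columns.getColumnCount(); view++) {
            System.out.println("view " + view + " is model " + viewToModel(columns, view));
        }
    }
}
```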

Advantages of the identifier approach
-------------------------------------

The advantage of using identifiers for columns is mainly seen in the
case where a model uses a particular object to produce the data
values for each column.  One example is a Vector of similarly
typed objects and a model that exposes their values using reflection:

// Pseudo-code
Object getValueAt(int aRow, Object column) {
	// The column identifier is itself the accessor method.
	Method method = (Method)column;
	Object row = elementAt(aRow);
	return method.invoke(row, NO_ARGS);
}

In the integer based scheme there is the added annoyance of having to
maintain an array of Methods (or whatever the identifiers are) and
indirect through it at each lookup:

// Pseudo-code
Object getValueAt(int aRow, int column) {
	// Indirect through the array to find the accessor method.
	Method method = methods[column];
	Object row = elementAt(aRow);
	return method.invoke(row, NO_ARGS);
}
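
For readers who want something that compiles, here is one possible
runnable form of the integer based variant. The Person class, the
getter names and the exception handling are all assumptions made for
the example.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;

class ReflectiveModelSketch {
    /** Hypothetical row type; any class with public getters would do. */
    public static class Person {
        private final String name;
        private final int age;
        public Person(String name, int age) { this.name = name; this.age = age; }
        public String getName() { return name; }
        public int getAge() { return age; }
    }

    private final List<Person> rows;
    private final Method[] methods;   // one getter per column, in model order

    ReflectiveModelSketch(List<Person> rows, String... getterNames) {
        this.rows = rows;
        this.methods = new Method[getterNames.length];
        try {
            for (int i = 0; i < getterNames.length; i++) {
                methods[i] = Person.class.getMethod(getterNames[i]);
            }
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Integer based access: indirect through the methods array. */
    Object getValueAt(int aRow, int column) {
        try {
            return methods[column].invoke(rows.get(aRow));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ReflectiveModelSketch model = new ReflectiveModelSketch(
                Arrays.asList(new Person("Ratty", 7)), "getName", "getAge");
        System.out.println(model.getValueAt(0, 0) + " " + model.getValueAt(0, 1));
    }
}
```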

More generally, an all integer model interface forces the implementor
to write a "cover method" for any model which is fundamentally
identifier based.  Suppose the identifier based model has a method
getValueAt() which cannot be changed or circumvented:

// Can't change this. 
getValueAt(int row, Object column) { 
	...
}

The cover method we will need to implement the integer based table
model interface might look like this:

getValueAt(int row, int column) { 
	return getValueAt(row, ids[column]); 
}

We are also left with the task of maintaining the array of ids when
previously this was handled for the model by the JTable.

Disadvantages of the identifier approach
----------------------------------------

In the opposite case, where the model is fundamentally integer based
in both row and column, we need to map each identifier to an integer
in order to implement an identifier based interface.

Suppose the method we need to call is:

getValueAt(int row, int column) { 
	// This is the way the model is, can't be changed. 
}

Here are some solutions to the question of how to map from the
identifier based scheme to the integer based one:

// Use Integer objects as identifiers in this case.
Object getValueAt(int row, Object columnId) {
	return getValueAt(row, ((Integer)columnId).intValue());
}

// Use a hashtable to get the Integer value.
Object getValueAt(int row, Object columnId) {
	Integer columnInteger = (Integer)identifierToIntegerTable.get(columnId);
	return getValueAt(row, columnInteger.intValue());
}

// Do a linear search to find the identifier in an array of identifiers.
Object getValueAt(int row, Object columnId) {
	for (int column = 0; column < getColumnCount(); column++) {
		if (columnIdentifiers[column].equals(columnId))
			return getValueAt(row, column);
	}
	throw new IllegalArgumentException("Unknown column: " + columnId);
}

Performance considerations
--------------------------

Since this interface will be shared by the sorting algorithm we have
some extra constraints.  We might be writing a map to a Vector of
ResultSet objects provided by JDBC. In this case it is possible to
have, say, 20,000 rows of data to sort. If the sorting algorithm is
N log(N) that might require, say, 15 compares per row and the sort
might require an aggregate key of, on average say, two columns. In
an optimal implementation, this would require more than half a million
calls to getValueAt().  So performance of the access methods is now
far more important than it was when the interface was just drawing the
exposed cells in a view.
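
As a rough check on that estimate: 20,000 x log2(20,000) is about
20,000 x 14.3, or roughly 286,000 compares, and each compare reads at
least one value from each of the two rows involved, giving something
in the region of 570,000 calls. The sketch below counts the calls made
by the library sort on random data. The class, its fields and the seed
are hypothetical, and the exact count depends on the sorting algorithm
and the data.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;

class SortCallCountSketch {
    private final int[][] data;
    long calls;   // number of getValueAt() invocations so far

    SortCallCountSketch(int rows, int columns, long seed) {
        Random random = new Random(seed);
        data = new int[rows][columns];
        for (int[] row : data) {
            for (int c = 0; c < columns; c++) {
                row[c] = random.nextInt();
            }
        }
    }

    /** Integer based access method that also counts how often it is used. */
    Integer getValueAt(int row, int column) {
        calls++;
        return data[row][column];
    }

    /** Sorts row indices by column 0, consulting column 1 only on ties. */
    Integer[] sortedRows() {
        Integer[] order = new Integer[data.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Comparator<Integer> byFirst = Comparator.comparing((Integer r) -> getValueAt(r, 0));
        Arrays.sort(order, byFirst.thenComparing((Integer r) -> getValueAt(r, 1)));
        return order;
    }

    public static void main(String[] args) {
        SortCallCountSketch model = new SortCallCountSketch(20_000, 2, 42L);
        model.sortedRows();
        System.out.println("getValueAt calls: " + model.calls);
    }
}
```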

On performance grounds the linear search is probably out and the
hashtable lookup is a potential bottleneck for some uses. The first
solution is fast enough though and, depending on style, could be made
robust with a check for "instanceof". The upshot though is that many
people would be tempted to use integers in this large generic case and
at the point integers are used the convenience of being able to adjust
the JTable's column attributes by name would have been lost.

These are useful examples as most people's initial reaction to the
idea of using integers rather than identifiers is that it is simply
"narrowing" the domain of the abstraction. This is probably a
misunderstanding. Some implementation details change, and in some
cases it is clearly less convenient to use integers to identify the
columns, but the changes needed to work around either convention are
fairly limited in scope whichever way round things are.

Identifiers for columns, integers for rows
==========================================

Advantages
----------

In the most common case, when a fairly simple application needs to
display a simple table (with no sorting etc.) there is no need to
distinguish co-ordinate systems of the JTable and the model.

It is clear in the JTable API that all references to column indices
are in the co-ordinate space of the JTable; column identifiers are of a
different type and therefore easily distinguished.

Forcing the implementor to think up front about what uniquely
identifies a column makes writing the serialization
code for the column model easier. Serializing the column model
for the purpose of preserving the customizations a user makes to
the width and order of the columns is more difficult if a model
(e.g. a database table) changes in between application invocations
and all the application has to go on is a set of integer
references to where the columns used to be.
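
A minimal sketch of the point about preserved customizations, assuming
the application records preferred widths in a map keyed by identifier.
The helper class and the column names are hypothetical, and only widths
are handled; column order would need similar treatment.

```java
import java.util.HashMap;
import java.util.Map;
import javax.swing.table.DefaultTableColumnModel;
import javax.swing.table.TableColumn;
import javax.swing.table.TableColumnModel;

class ColumnWidthPersistenceSketch {
    /** Records each column's preferred width under its identifier. */
    static Map<Object, Integer> saveWidths(TableColumnModel columns) {
        Map<Object, Integer> widths = new HashMap<>();
        for (int i = 0; i < columns.getColumnCount(); i++) {
            TableColumn column = columns.getColumn(i);
            widths.put(column.getIdentifier(), column.getPreferredWidth());
        }
        return widths;
    }

    /** Reapplies saved widths to whichever columns still exist. */
    static void restoreWidths(TableColumnModel columns, Map<Object, Integer> widths) {
        for (int i = 0; i < columns.getColumnCount(); i++) {
            TableColumn column = columns.getColumn(i);
            Integer width = widths.get(column.getIdentifier());
            if (width != null) {
                column.setPreferredWidth(width);
            }
        }
    }

    /** Helper: a column model with one column per identifier, in order. */
    static TableColumnModel columnsFor(String... ids) {
        DefaultTableColumnModel columns = new DefaultTableColumnModel();
        for (int i = 0; i < ids.length; i++) {
            TableColumn column = new TableColumn(i);
            column.setIdentifier(ids[i]);
            columns.addColumn(column);
        }
        return columns;
    }

    public static void main(String[] args) {
        TableColumnModel before = columnsFor("name", "phone");
        before.getColumn(1).setPreferredWidth(200);      // user widens "phone"
        Map<Object, Integer> saved = saveWidths(before);

        // On the next run the table has gained a column and changed order.
        TableColumnModel after = columnsFor("phone", "fax", "name");
        restoreWidths(after, saved);
        System.out.println(after.getColumn(0).getPreferredWidth());
    }
}
```

Had the widths been saved against integer positions, the width chosen
for "phone" would have been applied to whichever column now sits at
index 1.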

Most developers see identifiers as a more meaningful way to refer 
to columns. 

Disadvantages
-------------

In JTable, the methods:

    public void addColumnSelectionInterval(int index0, int index1) 
    public void removeColumnSelectionInterval(int index0, int index1)
    public int getSelectedColumn() 
    public int[] getSelectedColumns() 
    public boolean isColumnSelected(int column) 

which cover for methods in the ColumnModel (and are analogous to their
row selection equivalents) are not meaningful in the referencing
scheme of the model. To find out which columns have been selected the
model must refer to the JTable to map the indices to the column
identifiers by which the model would be defined. If identifiers were
to be used, and used consistently, it would be better to redefine all
of these methods in terms of column identifiers. 
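
The mapping step described above might look like the following, using
the column model from javax.swing.table. The helper class and the
identifiers are hypothetical; the sketch only shows the translation
from selected indices to identifiers that the caller is left to do.

```java
import javax.swing.table.DefaultTableColumnModel;
import javax.swing.table.TableColumn;
import javax.swing.table.TableColumnModel;

class SelectedIdentifiersSketch {
    /** Translates the selected view indices into column identifiers. */
    static Object[] selectedIdentifiers(TableColumnModel columns) {
        int[] selected = columns.getSelectedColumns();
        Object[] ids = new Object[selected.length];
        for (int i = 0; i < selected.length; i++) {
            ids[i] = columns.getColumn(selected[i]).getIdentifier();
        }
        return ids;
    }

    /** Columns a, b, c with c moved to the front and the first two selected. */
    static TableColumnModel example() {
        DefaultTableColumnModel columns = new DefaultTableColumnModel();
        String[] ids = {"a", "b", "c"};
        for (int i = 0; i < ids.length; i++) {
            TableColumn column = new TableColumn(i);
            column.setIdentifier(ids[i]);
            columns.addColumn(column);
        }
        columns.moveColumn(2, 0);                              // view order: c a b
        columns.getSelectionModel().setSelectionInterval(0, 1);
        return columns;
    }

    public static void main(String[] args) {
        for (Object id : selectedIdentifiers(example())) {
            System.out.println(id);
        }
    }
}
```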

Since these are cover methods the same changes would probably have to
happen to the TableColumnModel and it would then have to fully cover the
ListSelectionModel it contained since this is also fundamentally based
on a stable contiguous range of integers. 

Previously it was tempting to think of the column identifiers
as names and to deal with the special case of repeated names by
expecting the table model to implement a class that was, on the one
hand as convenient to use as the name, yet possessed some other
quality for uniqueness. This seems to be a difficult idea to implement
in practice and our many attempts to recommend good generic solutions
all had serious downsides.

The most natural way to implement an (int row, Object id) interface is
to use a Vector of Hashtables where each Hashtable represents a row in
the table. This is a bad choice both in terms of performance and
memory usage. It is still possible to implement good TableModels with
this scheme but the general problems and performance constraints for
large data sets mentioned above would have to be dealt with and solved
every time a new TableModel was implemented.

Some symmetry and consistency would be lost in the way that the table
API handles row and column selection. There would now be two concepts
at work in any application that used model chaining: identifiers for
columns and different co-ordinate systems for rows.

The TableModel interface (which primarily describes data) would be
designed for the current UI features in the JTable, in particular its
ability to reorder columns. Whether or not our JTable ever supports
row headers itself, developers have already started to subclass the
JTable to provide these features. It would be a shame if this task were
made more difficult by an API that had made special arrangements for
identifying columns if there were no simple extensions that would 
handle rows.

The cover methods that allowed the JTable's integer based primitives to
be used with identifiers were numerous but not complete. Adding the full set
of methods to isolate the developer from ever needing to use integer
methods for columns would add a lot more complexity to an already
large API.  Providing another set of cover methods for row identifiers
would bloat the API even further without providing any new functionality.

Conclusion
==========

We have decided to go with the all integer solution, mostly because the
identifier solution is incomplete without an analogous procedure for
identifying rows. In places this will add complexity by introducing
the idea of multiple co-ordinate systems where previously there was
none. The advantage is that it removes many of the difficult design
decisions from the model implementor and therefore makes it much more
compelling to write significant parts of an application in the well
defined space of the model. This leaves these implementations
significantly less dependent on the specifics of the JTable API and
allows much of this work to be done without constant requirements for
enhancements to Swing.

Because, in all cases, the table's columns are unstable the identifier
mechanism will be left in the surface level JTable API (but not in the
table model). That way TableColumns in the JTable can be manipulated by
identifier, provided identifiers have been supplied with each column.
In addition, these identifiers default to the name of the column
so that column attributes can easily be modified by name. When these
defaulted names are not unique it will not be possible to reliably
access the column attributes of any duplicate elements using their
name.  So, for example, widening or moving a column programmatically
by name will have undefined behavior when several columns share that
name. The option still exists to set alternative identifiers
for these columns though and the table will always render correctly 
as identifiers are not used for data access into the model.
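
The duplicate name case can be sketched with javax.swing.table.TableColumn,
where an identifier that has not been set falls back to the column's
header value. The class name and the "second-name" identifier are
invented for the example.

```java
import javax.swing.table.DefaultTableColumnModel;
import javax.swing.table.TableColumn;

class DuplicateIdentifierSketch {
    /** Two columns that share the header "Name" and have no explicit identifier. */
    static DefaultTableColumnModel twoNameColumns() {
        DefaultTableColumnModel columns = new DefaultTableColumnModel();
        for (int i = 0; i < 2; i++) {
            TableColumn column = new TableColumn(i);
            column.setHeaderValue("Name");   // identifier defaults to this value
            columns.addColumn(column);
        }
        return columns;
    }

    public static void main(String[] args) {
        DefaultTableColumnModel columns = twoNameColumns();
        // Both columns currently answer to "Name", so a lookup by name
        // cannot be relied on to reach the second one.
        System.out.println(columns.getColumn(1).getIdentifier());

        // Supplying an alternative identifier removes the ambiguity
        // without changing what is displayed in the header.
        columns.getColumn(1).setIdentifier("second-name");
        System.out.println(columns.getColumnIndex("second-name"));
        System.out.println(columns.getColumn(1).getHeaderValue());
    }
}
```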