Integers versus identifiers
===========================

Most Grids on the market use an integer scheme for getting and setting the elements. Many of these Grids, however, do not allow columns to be reordered dynamically by the view, and it is this reordering of columns that brings up the debate about how they should be referenced.

There are two main choices. The first, and most obvious, is to use an identifier rather than an index for each column. That way, when the columns are reordered in the UI, the identifier moves with the column and the user of the JTable can continue to reference the columns without needing to worry about what has moved where.

This solves the problem for columns but not for rows. If the rows are sorted into a different order by an intermediary, we have the same problem when making queries about the JTable's state. The index of the selected row, for example, will not be obviously related to the index of the row in the (unsorted) model.

Using identifiers for rows
==========================

Again, we could choose to go with identifiers, but here the solution is more involved. It is very important that the design of the JTable is not overly swayed in the direction of databases, and it has been our goal from the start to ensure that the JTable is flexible enough to cover both the "database view" and "address book" ends of the spectrum of situations where it might be used. At the same time, database usage is likely to be a very common case for the JTable, and a row identification strategy that did not map well to the relational world would not make a good solution.

Following the relational lead, the row identifier might just be one of the columns in the table. This, the "primary key" of a database representation, sometimes needs to be composite before it is unique, so we would probably want a list of columns which constituted a "unique primary key".
This would work perfectly as a row identifier except that not all database tables have primary keys and, even when they do, there is no constraint that they be unique. Although editing a table without a unique key is a worrying idea, this does not seem to be a good enough reason to refuse to render it. Having a different API for writable tables is out, but other variations are possible, and we could easily have required all models to use identifiers for the rows in the model interface and left each model implementation to make its own choice.

Returning once again to a minimal case of a simple String[][] structure for an address book with three rows and ten columns (we don't get out much), this row identifying API seems to add considerable complexity to what is a simple and well defined task. Again, default mapping methods could be included in an abstract model implementation, but this hiding of access fundamentals led to confusion in the previous releases.

An all integer approach
=======================

Another approach is to abandon identifiers altogether and use integers everywhere, with the understanding that there are a number of different co-ordinate systems at work. Integers are fine for references into a two dimensional structure, as the table is only ever dependent on one property of the identifiers: their uniqueness. The model's co-ordinate system, however, is clearly different to the co-ordinate system in the view, as columns may have been reordered or hidden. If we distinguish these two co-ordinate systems we then have a general solution for columns and the chaining technique above. Each element in a chain has its own co-ordinate system, and any reference to cells, rows or columns must be made with a clear understanding of which co-ordinate system is being used.
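
The idea of one co-ordinate system per chain element can be sketched as follows. This is only an illustration, not the JTable implementation: the class RowMapSketch, its rowMap array and the helpers toUnderlyingRow() and addressBook() are hypothetical names invented for this example. Only TableModel and AbstractTableModel are real Swing types.

```java
import javax.swing.table.AbstractTableModel;
import javax.swing.table.TableModel;

/** Hypothetical chain element: presents the rows of another model in a different order. */
class RowMapSketch extends AbstractTableModel {
    private final TableModel model; // the next element in the chain
    private final int[] rowMap;     // our row index -> row index in `model`

    RowMapSketch(TableModel model, int[] rowMap) {
        this.model = model;
        this.rowMap = rowMap.clone();
    }

    @Override public int getRowCount() { return rowMap.length; }
    @Override public int getColumnCount() { return model.getColumnCount(); }

    /** `row` is in this element's co-ordinate system; it is translated before delegating. */
    @Override public Object getValueAt(int row, int column) {
        return model.getValueAt(rowMap[row], column);
    }

    /** Converts a row index of this element into the underlying model's co-ordinates. */
    int toUnderlyingRow(int row) { return rowMap[row]; }

    /** A tiny, made-up address book used only to exercise the sketch. */
    static TableModel addressBook() {
        final String[][] data = {
            {"Carol", "555-0103"}, {"Alice", "555-0101"}, {"Bob", "555-0102"}
        };
        return new AbstractTableModel() {
            @Override public int getRowCount() { return data.length; }
            @Override public int getColumnCount() { return data[0].length; }
            @Override public Object getValueAt(int row, int column) { return data[row][column]; }
        };
    }

    public static void main(String[] args) {
        // Rows presented in name order: Alice (1), Bob (2), Carol (0).
        RowMapSketch sorted = new RowMapSketch(addressBook(), new int[] {1, 2, 0});
        System.out.println(sorted.getValueAt(0, 0)
                + " is row " + sorted.toUnderlyingRow(0) + " in the underlying model");
    }
}
```

A selected row reported by the view would be an index into the sorted element, so it has to pass through toUnderlyingRow() before it means anything to the unsorted model.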
Advantages of the identifier approach
-------------------------------------

The advantage of using identifiers for columns is mainly seen in the case where a model uses some specific object to produce the data values for each column. One example is a Vector of similarly typed objects and a model that exposes their values using reflection:

    // Pseudo-code
    getValueAt(int aRow, Object column) {
        Method method = (Method)column;
        Object row = elementAt(aRow);
        return method.invoke(row, NO_ARGS);
    }

In the integer based scheme there is the added annoyance of having to maintain an array of Methods (or whatever the identifiers are) and indirect through it at each lookup:

    // Pseudo-code
    getValueAt(int aRow, int column) {
        Method method = methods[column];
        Object row = elementAt(aRow);
        return method.invoke(row, NO_ARGS);
    }

More generally, an all integer model interface forces the implementor to write a "cover method" for any model which is fundamentally identifier based. Suppose the identifier based model has a method getValueAt() which cannot be changed or circumvented:

    // Can't change this.
    getValueAt(int row, Object column) {
        ...
    }

The cover method we will need to implement the integer based table model interface might look like this:

    getValueAt(int row, int column) {
        return getValueAt(row, ids[column]);
    }

We are also left with the task of maintaining the array of ids when previously this was handled for the model by the JTable.

Disadvantages of the identifier approach
----------------------------------------

In the opposite case, where the model is fundamentally integer based in both row and column, we need to map the identifier to an integer in an identifier based interface. Suppose the method we need to call is:

    getValueAt(int row, int column) {
        // This is the way the model is, can't be changed.
    }

Here are some solutions to the question of how to map from the identifier based scheme to the integer based one:

    // Use Integer objects as identifiers in this case.
    getValueAt(int row, Object columnId) {
        return getValueAt(row, ((Integer)columnId).intValue());
    }

    // Use a hashtable to get the Integer value.
    getValueAt(int row, Object columnId) {
        Integer columnInteger = (Integer)identifierToIntegerTable.get(columnId);
        return getValueAt(row, columnInteger.intValue());
    }

    // Do a linear search to find the element in an array of identifiers.
    getValueAt(int row, Object columnId) {
        for (int column = 0; column < getColumnCount(); column++) {
            if (columnIdentifiers[column].equals(columnId))
                return getValueAt(row, column);
        }
        return null; // No column has this identifier.
    }

Performance considerations
--------------------------

Since this interface will be shared by the sorting algorithm we have some extra constraints. We might be writing a map to a Vector of ResultSet objects provided by JDBC. In this case it is possible to have, say, 20,000 rows of data to sort. If the sorting algorithm is N log(N), that might require, say, 15 compares per row, and the sort might require an aggregate key of, on average, say, two columns. In an optimal implementation, this would require more than half a million calls to getValueAt() (20,000 x 15 x 2 = 600,000). So performance of the access methods is now far more important than it was when the interface was just drawing the exposed cells in a view.

On performance grounds the linear search is probably out, and the hashtable lookup is a potential bottleneck for some uses. The first solution is fast enough, though, and, depending on style, could be made robust with a check for "instanceof". The upshot, though, is that many people would be tempted to use integers in this large generic case, and at the point integers are used the convenience of being able to adjust the JTable's column attributes by name would have been lost.

These are useful examples, as most people's initial reaction to the idea of using integers rather than identifiers is that it is simply "narrowing" the domain of the abstraction.
This is probably a misunderstanding. Some implementation details are changed, and in some cases it is clearly less convenient to use integers to identify the columns, but the changes that need to be made to work around either convention are fairly limited in scope whichever way round things are.

Identifiers for Columns, integers for rows
==========================================

Advantages
----------

In the most common case, when a fairly simple application needs to display a simple table (with no sorting etc.), there is no need to distinguish the co-ordinate systems of the JTable and the model.

It is clear in the JTable API that all references to column indices are in the co-ordinate space of the JTable; column identifiers are of a different type and therefore easily distinguished.

Forcing the implementor to think up front about what it is that uniquely identifies a column makes writing the serialization code for the column model easier. Serializing the column model for the purpose of preserving the customizations a user makes to the width and order of the columns is more difficult if a model (e.g. a database table) changes in between application invocations and all the application has to go on is a set of integer references to where the columns used to be.

Most developers see identifiers as a more meaningful way to refer to columns.

Disadvantages
-------------

In JTable, the methods:

    public void addColumnSelectionInterval(int index0, int index1)
    public void removeColumnSelectionInterval(int index0, int index1)
    public int getSelectedColumn()
    public int[] getSelectedColumns()
    public boolean isColumnSelected(int column)

which cover for methods in the ColumnModel (and are analogous to their row selection equivalents) are not meaningful in the referencing scheme of the model. To find out which columns have been selected, the model must refer to the JTable to map the indices to the column identifiers by which the model would be defined.
If identifiers were to be used, and used consistently, it would be better to redefine all of these methods in terms of column identifiers. Since these are cover methods, the same changes would probably have to happen to the TableColumnModel, and it would then have to fully cover the ListSelectionModel it contained, since this is also fundamentally based on a stable contiguous range of integers.

Previously it was tempting to think of the column identifiers as names and to deal with the special case of repeated names by expecting the table model to implement a class that was, on the one hand, as convenient to use as the name, yet possessed some other quality for uniqueness. This seems to be a difficult idea to implement in practice, and our many attempts to recommend good generic solutions all had serious downsides.

The most natural way to implement an (int row, Object id) interface is to use a Vector of Hashtables where each Hashtable represents a row in the table. This is a bad choice both in terms of performance and memory usage. It is still possible to implement good TableModels with this scheme, but the general problems and performance constraints for large data sets mentioned above would have to be dealt with and solved every time a new TableModel was implemented.

Some symmetry and consistency would be lost in the way that the table API handles row and column selection. There would now be two concepts at work in any application that used model chaining: identifiers for columns and different co-ordinate systems for rows.

The TableModel interface (which primarily describes data) would be designed for the current UI features in the JTable, in particular its ability to reorder columns. Whether or not our JTable ever supports row headers itself, developers have already started to subclass the JTable to provide these features.
It would be a shame if this task were made more difficult by an API that had made special arrangements for identifying columns if there were no simple extensions that would handle rows.

The cover methods that allowed its integer based primitives to be used with identifiers were numerous but not complete. Adding the full set of methods to isolate the developer from ever needing to use integer methods for columns would add a lot more complexity to an already large API. Providing another set of cover methods for row identifiers would bloat the API even further without providing any new functionality.

Conclusion
==========

We have decided to go with an all integer solution, mostly because the identifier solution is incomplete without an analogous procedure for identifying rows. In places this will add complexity by introducing the idea of multiple co-ordinate systems where previously there was none. The advantage is that it removes many of the difficult design decisions from the model implementor and therefore makes it much more compelling to write significant parts of an application in the well defined space of the model. This leaves these implementations significantly less dependent on the specifics of the JTable API and allows much of this work to be done without constant requirements for enhancements to Swing.

Because, in all cases, the table's columns are unstable, the identifier mechanism will be left in the surface level JTable API (but not in the table model). That way, TableColumns in the JTable can be manipulated by identifier, provided identifiers have been submitted with each column. In addition, these identifiers default to the name of the column, so that column attributes can easily be modified by name. When these defaulted names are not unique, it is not possible to reliably access the column attributes of any duplicate elements using their name.
So, for example, widening or moving a column programmatically by name will have undefined behavior when many columns have this name. The option still exists to set alternative identifiers for these columns, though, and the table will always render correctly, as identifiers are not used for data access into the model.
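
As a rough illustration of this split, the sketch below looks a column up by identifier in the column model and then uses its integer model index for data access. The class ColumnIdentifierSketch and its helpers buildColumns() and modelIndexOf() are hypothetical names made up for this example; the Swing calls used (TableColumn.setHeaderValue(), TableColumnModel.getColumnIndex(), moveColumn() and TableColumn.getModelIndex()) are existing API, and the sample headers are invented.

```java
import javax.swing.table.DefaultTableColumnModel;
import javax.swing.table.TableColumn;
import javax.swing.table.TableColumnModel;

/** Hypothetical helper class showing identifier lookup on top of integer model indices. */
class ColumnIdentifierSketch {

    /** Builds one TableColumn per header, in model order. */
    static TableColumnModel buildColumns(String[] headers) {
        TableColumnModel columns = new DefaultTableColumnModel();
        for (int modelIndex = 0; modelIndex < headers.length; modelIndex++) {
            TableColumn column = new TableColumn(modelIndex);
            // No identifier is set explicitly, so it defaults to the header value.
            column.setHeaderValue(headers[modelIndex]);
            columns.addColumn(column);
        }
        return columns;
    }

    /** Finds a column by identifier and returns the integer used for model access. */
    static int modelIndexOf(TableColumnModel columns, Object identifier) {
        // getColumnIndex throws IllegalArgumentException if no column matches.
        int viewIndex = columns.getColumnIndex(identifier);
        return columns.getColumn(viewIndex).getModelIndex();
    }

    public static void main(String[] args) {
        TableColumnModel columns = buildColumns(new String[] {"Name", "Phone", "City"});
        columns.moveColumn(0, 2); // simulate the user dragging "Name" to the end

        int viewIndex = columns.getColumnIndex("Name");
        columns.getColumn(viewIndex).setPreferredWidth(200); // adjust an attribute by name
        System.out.println("view index " + viewIndex
                + ", model index " + modelIndexOf(columns, "Name"));
    }
}
```

If two columns shared the header "Name", the lookup would simply return the first match in view order, which is why distinct identifiers need to be set on such columns before manipulating them by name.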