
The Swing Connection Swing Text

Understanding the ElementBuffer
A Closer Look at Modeling Text in Documents

By Scott Violet

This article is a continuation of a previous article titled "The Element Interface," which appeared in last month's issue of The Swing Connection and is republished in this issue. To understand the material presented in this article, it may help to read or review the previous article.

More articles in this series will appear in future issues. Installments now being prepared for publication include a discussion of how Swing's Undo mechanism works; the handling of DocumentEvents; and the creating of customized View implementations.


About this article

This article shows how to insert text into documents and how to remove text from documents. It also shows how changes of attributes can affect the data structures used by the Swing text package, with an emphasis on insertions into styled text. Finally, it touches on a few additions to the Swing text package that make it easier to work with HTML. (You may find that this last topic is somewhat advanced; to understand it, it will help to have some understanding of the contents of the javax.swing.text package.)

This article may be particularly helpful to people writing applications that manipulate the Elements that comprise a Document -- for example, HTML or XML editors. It may also come in handy in applications that need to implement styled copy and paste operations.

Finally, this article may help you understand upcoming articles in this series, including the articles that will discuss the Undo mechanism, the handling of DocumentEvents, and the writing of customized View implementations.

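These are the major topics covered in this article:

  • A Quick Review
  • Inserting Text into a Styled Document
  • ElementBuffer: Inserting Text
  • ElementBuffer: Removing Text
  • ElementBuffer: Changing Text
  • HTML Extensions
  • Problem Areas
  • Two Useful Tips
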
A Quick Review

Here's a brief review of the topics introduced in the first article in this series, "The Element Interface":

In Swing, a Document is made up of a set of root Elements that characterize the document's content. Each Element has a parent Element and can have any number of child Elements. Each Element also has a start offset and an end offset, as well as an AttributeSet that defines a set of key-value pairs.

The Document Interface

The Document interface defines methods for inserting and removing text. The AbstractDocument class, the default implementation of Document, is responsible for maintaining the document's Elements.

It is up to your particular implementation of Document to define what the document's Elements represent. For example, the PlainDocument class, which is used to represent plain text, is made up of a root Element containing one child Element per line of text. It is important to note that AbstractDocument -- the superclass of both PlainDocument and DefaultStyledDocument -- has two root Elements. The second root Element is used to represent runs of bidirectional text (which are not discussed in this article). Figure 1 shows how PlainDocument can represent two lines of text.

Diagram: Two paragraphs
Figure 1
How PlainDocument represents two lines of text
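
The structure in Figure 1 can be inspected from code. The following is a minimal sketch, not part of the original article: the class name PlainDocumentLines and the sample text are made up for illustration. It inserts two five-character lines into a PlainDocument and lists the offsets of each line Element under the default root.

```java
import javax.swing.text.BadLocationException;
import javax.swing.text.Element;
import javax.swing.text.PlainDocument;

// Hypothetical helper class, for illustration only.
final class PlainDocumentLines {
    /** Builds a PlainDocument holding two five-character lines. */
    static PlainDocument twoLines() {
        PlainDocument doc = new PlainDocument();
        try {
            doc.insertString(0, "abcde\nfghij", null);
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }
        return doc;
    }

    /** Number of root Elements the document reports (default root plus bidi root). */
    static int rootCount() {
        return twoLines().getRootElements().length;
    }

    /** Lists each line Element under the default root as "start-end". */
    static String lineOffsets() {
        Element root = twoLines().getDefaultRootElement();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < root.getElementCount(); i++) {
            Element line = root.getElement(i);
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(line.getStartOffset()).append('-').append(line.getEndOffset());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("root Elements: " + rootCount());
        System.out.println("line offsets:  " + lineOffsets());
    }
}
```

Each line's end offset includes its newline; the last line ends with the newline that the document always keeps at the end of its content.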

 

DefaultStyledDocument

DefaultStyledDocument, which is used to represent styled text, has a root Element containing one child Element per paragraph of text. In turn, each paragraph Element contains a child Element for each run of text. A run of text is a sequence of characters with a common set of attributes.

Figure 2 shows how two paragraphs of text can be represented using DefaultStyledDocument. The first paragraph has two styles, and the second paragraph has three.

Diagram: Two paragraphs (2)
Figure 2
A representation of two paragraphs of text
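
As a rough illustration of paragraphs and runs, the sketch below builds a small DefaultStyledDocument and queries its structure. The class name StyledRuns, the sample text, and the choice of bold as the second style are assumptions made for this example.

```java
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.SimpleAttributeSet;
import javax.swing.text.StyleConstants;

// Hypothetical helper class, for illustration only.
final class StyledRuns {
    /** Builds a document with a plain run, a bold run, and a second paragraph. */
    static DefaultStyledDocument build() {
        SimpleAttributeSet bold = new SimpleAttributeSet();
        StyleConstants.setBold(bold, true);
        DefaultStyledDocument doc = new DefaultStyledDocument();
        try {
            doc.insertString(0, "plain ", null);
            doc.insertString(doc.getLength(), "bold\n", bold);
            doc.insertString(doc.getLength(), "second paragraph", null);
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }
        return doc;
    }

    /** Number of paragraph Elements under the default root. */
    static int paragraphCount() {
        return build().getDefaultRootElement().getElementCount();
    }

    /** Number of runs (leaf Elements) in the first paragraph. */
    static int runsInFirstParagraph() {
        return build().getParagraphElement(0).getElementCount();
    }

    /** Whether the run covering the given offset carries the bold attribute. */
    static boolean isBoldAt(int offset) {
        return StyleConstants.isBold(build().getCharacterElement(offset).getAttributes());
    }

    public static void main(String[] args) {
        System.out.println("paragraphs: " + paragraphCount());
        System.out.println("runs in first paragraph: " + runsInFirstParagraph());
        System.out.println("bold at 2: " + isBoldAt(2) + ", bold at 7: " + isBoldAt(7));
    }
}
```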

As mentioned previously, each Element has a start offset and an end offset. Figure 3 shows how the Elements in Figure 1 would map to some text. Each line contains five characters.

Diagram: Mapping elements

Figure 3
Mapping Elements to text

AbstractElement

The default implementation of Element is AbstractElement, an inner class of AbstractDocument. This AbstractElement implementation does not store content, as you can see in Figure 3. Instead, AbstractDocument defines the interface Content, which is responsible for storing the content. This mechanism makes it particularly easy to plug in different implementations for storing content without having to rewrite all the Document-related classes.

Swing provides two implementations of AbstractDocument.Content: GapContent, which is the default, and StringContent. The Content implementation is not only responsible for storing the content; it is also responsible for returning instances of Position. (In Swing, Position is an interface that is used to define a location within a document.) The Document interface defines a method named createPosition(), but AbstractDocument implements this method by forwarding it to a createPosition() method defined by the current Content instance.
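
To see that the Content implementation can be swapped without touching the rest of the model, here is a small sketch. The class name ContentChoices and the sample text are made up; it relies on the PlainDocument constructor that accepts a Content instance. Both Content implementations should report the same tracked offset.

```java
import javax.swing.text.AbstractDocument;
import javax.swing.text.BadLocationException;
import javax.swing.text.GapContent;
import javax.swing.text.PlainDocument;
import javax.swing.text.Position;
import javax.swing.text.StringContent;

// Hypothetical helper class, for illustration only.
final class ContentChoices {
    /** Creates a Position at offset 4, inserts two characters at offset 1, and returns the new offset. */
    static int trackedOffset(AbstractDocument.Content content) {
        PlainDocument doc = new PlainDocument(content);
        try {
            doc.insertString(0, "0123456789", null);
            Position p = doc.createPosition(4); // forwarded to the Content instance
            doc.insertString(1, "xx", null);    // two characters inserted before p
            return p.getOffset();
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }
    }

    static int withGapContent() {
        return trackedOffset(new GapContent());
    }

    static int withStringContent() {
        return trackedOffset(new StringContent());
    }

    public static void main(String[] args) {
        System.out.println("GapContent:    " + withGapContent());
        System.out.println("StringContent: " + withStringContent());
    }
}
```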

The Position Interface

The Position interface defines one method: getOffset(). A Position changes its offset as the content of the Document is modified by the insertString() and remove() methods. A Position points between two characters, with a bias toward the end of the model (except for Positions representing offset 0). As the text around the character is altered, the Position's offset changes in such a way that it stays between the same characters.

As an example, let's say you set up two Positions: pLeft, which you initially create at an offset of 2, and pRight, which you initially create at an offset of 6. Figure 4 shows this arrangement.

Diagram: Offsets
Figure 4
A pair of positions represented by offsets

Now if you were to insert a character, X, at Offset 3, pRight would move to an offset of 7. At this point, if you invoked getOffset() on pRight, the return value would be 7. But the offset of pLeft would still be 2, as shown in Figure 5.

Inserting a character
Figure 5
The result of inserting a character

Now suppose you delete two characters starting at offset 3. When you have finished doing that, pRight winds up at offset 5, but pLeft is still at Offset 2 (Figure 6).

Deleting characters
Figure 6
Deleting characters

Summing Up

The operations we have just carried out can be summed up as follows: When text is inserted, all Positions with an offset greater than or equal to the insertion offset are incremented by the number of characters inserted, with the exception of Positions representing offset 0. Positions representing offset 0 are never altered when inserting. Similarly, when text is deleted, all Positions with an offset greater than or equal to the end offset (the end offset is the deletion offset plus the length of the text deleted) are decremented by the number of characters deleted. A Position whose offset falls inside the deleted range moves to the starting offset of the deletion.
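
The walk-through in Figures 4 to 6 can be reproduced with a few lines of code. This is a sketch under simple assumptions: a PlainDocument with the default GapContent, ten sample characters, and a made-up class name PositionTracking. The second method exercises the boundary rules from the summary above.

```java
import javax.swing.text.BadLocationException;
import javax.swing.text.PlainDocument;
import javax.swing.text.Position;

// Hypothetical helper class, for illustration only.
final class PositionTracking {
    /** Returns "pLeft,pRight" after the insert, then after the delete, joined by a space. */
    static String figureOffsets() {
        PlainDocument doc = new PlainDocument();
        try {
            doc.insertString(0, "0123456789", null);
            Position pLeft = doc.createPosition(2);
            Position pRight = doc.createPosition(6);
            doc.insertString(3, "X", null);  // Figure 5: insert one character at offset 3
            String afterInsert = pLeft.getOffset() + "," + pRight.getOffset();
            doc.remove(3, 2);                // Figure 6: delete two characters at offset 3
            String afterRemove = pLeft.getOffset() + "," + pRight.getOffset();
            return afterInsert + " " + afterRemove;
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns "atZero,atThree" after inserting at offset 3 and then at offset 0. */
    static String boundaryOffsets() {
        PlainDocument doc = new PlainDocument();
        try {
            doc.insertString(0, "abcdef", null);
            Position atZero = doc.createPosition(0);
            Position atThree = doc.createPosition(3);
            doc.insertString(3, "X", null);  // a Position at the insertion offset moves
            doc.insertString(0, "Y", null);  // a Position at offset 0 stays put
            return atZero.getOffset() + "," + atThree.getOffset();
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(figureOffsets());
        System.out.println(boundaryOffsets());
    }
}
```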

The default implementation of leaf Elements uses Positions to track offsets. Branch Elements determine start and end offsets from their children (this does in fact imply that the default implementation of branch Elements will not work correctly if a branch Element has no children).

Because the default implementation of leaf Elements uses Position, an Element's offsets change automatically as its content changes. The automatic updating is handled by the Content.


Inserting Text into a Styled Document

As we have seen, the default implementation of Element has start and end offset Positions that can be used to determine its offsets. When an insert occurs, the offsets of the Elements are automatically adjusted. And when text is inserted into an AbstractDocument, it is passed to the current Content instance. At this point, all the Positions created from the current Content instance are automatically updated.

Most of the time, from the model's perspective, this is all that is required. That is, text is usually inserted into the middle of an Element, and the appropriate offsets are adjusted automatically. The text being inserted is passed to Content, and a DocumentEvent is generated and passed to the current DocumentListeners.

Special Cases

Usually, this strategy works fine. But it is not the desired behavior in three special cases:

  • Inserting at the beginning of a paragraph.

  • Inserting characters with a different set of attributes than those of the leaf Element under the insertion offset.

  • Inserting text that contains newlines.

To illustrate why this is true, consider a Styled Document with two paragraphs, each paragraph having one style (Figure 7). In the illustration, the numbers in brackets show the start and end offsets.

Individually styled paragraphs

Figure 7
A document with two individually styled paragraphs

In Figure 7, if a character is inserted at offset 10 -- that is, after the newline character following the first paragraph -- the Element structure changes to the structure shown in Figure 8.

The wrong result
Figure 8
Inserting a character: The wrong outcome

As you can see, the first paragraph now includes the new character. This is not the outcome we wanted; the newline is supposed to be the last character of a paragraph, but in Figure 8 it is not. Instead, this is the pattern we wanted to end up with:

Correct result
Figure 9
Inserting a character: The right outcome

What went wrong? Because the end offset of the first paragraph corresponds to the insertion offset, it is automatically incremented. To fix this, we must re-create the leaf Elements so they will have the correct offsets. You will learn how to do this in the next section, ElementBuffer: Inserting Text.
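
For comparison, DefaultStyledDocument's own insertString() already produces the Figure 9 structure for this case. The sketch below uses sample text chosen so that the first paragraph occupies offsets 0 to 10, and a made-up class name InsertAtParagraphStart. It inserts one character at offset 10 and lists the paragraph offsets afterwards.

```java
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.Element;

// Hypothetical helper class, for illustration only.
final class InsertAtParagraphStart {
    /** Inserts "X" right after the first paragraph's newline and lists paragraph offsets. */
    static String offsetsAfterInsert() {
        DefaultStyledDocument doc = new DefaultStyledDocument();
        try {
            doc.insertString(0, "123456789\nabcdefghi", null); // paragraphs 0-10 and 10-20
            doc.insertString(10, "X", null);
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.getDefaultRootElement();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < root.getElementCount(); i++) {
            Element p = root.getElement(i);
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(p.getStartOffset()).append('-').append(p.getEndOffset());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(offsetsAfterInsert());
    }
}
```

The first paragraph still ends at offset 10, and the new character belongs to the second paragraph.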


ElementBuffer: Inserting Text

ElementBuffer, an inner class of DefaultStyledDocument, is used to modify the Element structure as a result of an insertion, a deletion, or a style change. ElementBuffer can delete and insert Elements, as required in the previous example, but it relies on the caller to pass in a description of what needs to be changed. This description takes the form of an array of instances of ElementSpec.

How ElementSpec Works

An instance of ElementSpec contains five parts:

  • A type.
  • A set of attributes.
  • The length of the Element (the start offset is determined from the insertion point).
  • A direction.
  • The characters to insert.

An instance of ElementSpec can be thought of as a tree operation: that is, either the creation of a new branch Element or the creation of a new leaf Element. ElementBuffer is able to modify the current Elements based on an array of ElementSpecs. However, even though ElementSpec defines an array of characters, ElementBuffer does not reference them. Instead, there are two DefaultStyledDocument methods -- create() and insert() -- which can extract the characters and add them as necessary to the current Content.
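
The five parts of an ElementSpec map directly onto its constructor arguments and accessors. Here is a minimal sketch; the class name ElementSpecParts and the sample text are made up for illustration.

```java
import javax.swing.text.DefaultStyledDocument.ElementSpec;
import javax.swing.text.SimpleAttributeSet;

// Hypothetical helper class, for illustration only.
final class ElementSpecParts {
    /** Creates a content spec carrying the given characters and no attributes. */
    static ElementSpec contentSpec(String text) {
        char[] chars = text.toCharArray();
        return new ElementSpec(SimpleAttributeSet.EMPTY, ElementSpec.ContentType,
                               chars, 0, chars.length);
    }

    /** True if a freshly created spec uses OriginateDirection. */
    static boolean defaultsToOriginate() {
        return contentSpec("abc").getDirection() == ElementSpec.OriginateDirection;
    }

    /** Reads the characters back out of a spec. */
    static String textOf(ElementSpec spec) {
        return new String(spec.getArray(), spec.getOffset(), spec.getLength());
    }

    public static void main(String[] args) {
        ElementSpec spec = contentSpec("abc");
        System.out.println("is content type: " + (spec.getType() == ElementSpec.ContentType));
        System.out.println("length: " + spec.getLength());
        System.out.println("originate by default: " + defaultsToOriginate());
        System.out.println("text: " + textOf(spec));
    }
}
```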

The ElementSpec type

The type part of an ElementSpec instance specifies the action that should be taken. There are three type designations:

  • ElementSpec.StartTagType, which indicates that a new branch Element should be created.
  • ElementSpec.ContentType, which indicates that a new leaf Element should be created.
  • ElementSpec.EndTagType, which indicates that the Element to which new Elements are added should become the current Element's parent Element.

When a new Element is created, there must be a way to specify the Element to which new Elements will be added. You can use the type ElementSpec.EndTagType for this purpose.

When ElementBuffer is asked to insert an array of ElementSpecs, it finds the deepest branch Element at the insertion point in the document. Any new Elements created are then added to that branch Element. When a new branch Element is created, any subsequently created Elements are added to it. When an EndTagType ElementSpec is encountered, the Element to which new Elements are added becomes the current Element's parent.
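
The idea of "the deepest branch Element at the insertion point" can be approximated with the public Element API. The following sketch is not ElementBuffer's actual code: the class and method names are made up, and it simply walks down from the default root until the next child would be a leaf.

```java
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.Document;
import javax.swing.text.Element;

// Hypothetical helper class, for illustration only.
final class DeepestBranch {
    /** Walks from the default root to the lowest branch Element containing the offset. */
    static Element deepestBranchAt(Document doc, int offset) {
        Element e = doc.getDefaultRootElement();
        while (true) {
            Element child = e.getElement(e.getElementIndex(offset));
            if (child == null || child.isLeaf()) {
                return e;
            }
            e = child;
        }
    }

    /** Describes the deepest branch at offset 12 of a two-paragraph sample document. */
    static String demo() {
        DefaultStyledDocument doc = new DefaultStyledDocument();
        try {
            doc.insertString(0, "123456789\nabcdefghi", null);
        } catch (BadLocationException ex) {
            throw new IllegalStateException(ex);
        }
        Element branch = deepestBranchAt(doc, 12);
        return branch.getName() + " " + branch.getStartOffset() + "-" + branch.getEndOffset();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```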

What Went Wrong in Figure 8

In the previous example (Figure 8), the Element to which new elements were added became Paragraph 1. You may wonder why it didn't become Paragraph 2. The reason is that ElementBuffer was asked to update the Elements after the text had been inserted into the Content and after the offsets of the Positions had changed. Consequently, as Figure 8 shows, the Element to which new Elements were added wound up being Paragraph 1, not Paragraph 2. Furthermore, in the setup shown in the illustrations, if there were any subsequent ElementSpecs indicating that a new Element should be created, they would also be added to Paragraph 1, because it is the deepest branch (at offset 10).

A Better Way: Figure 9

Now let's see how we can use ElementSpecs to perform the same operation but end up with the result we really want, as shown in Figure 9.

Once again, the Element to which we will add new Elements will be Paragraph 1. But the newly added character should belong to Paragraph 2's child -- that is, Style 1. This means that the first ElementSpec should be of type EndTagType.

When ElementBuffer encounters this situation, it changes the Element to which new Elements are added to be the root Element. (I use the word root because it is the parent of Paragraph 1, which is the deepest branch element at the insertion point). Because an EndTagType has been reached and no leaf Elements have been created, ElementBuffer recognizes that it needs to recreate Paragraph 1's child -- Style 1 -- to have a start offset of its original start offset (0) and an end offset of the insertion point (10).

So far, so good. Style 1 will be recreated with offsets 0 to 10 -- which is what we want -- and the Element to which new branch Elements are added is now the root Element.

This is perfect; we don't want the new content to end up in Paragraph 1, but in Paragraph 2. So now we create an ElementSpec of type StartTagType, with the AttributeSet matching that of Paragraph 2.

Fine Tuning

OK, we're almost there; now we just need the Element that will represent the content. For that we use an ElementSpec of type ElementSpec.ContentType, with an AttributeSet matching that of Paragraph 2's child, Style 1, and a length of 1. Doing this results in:

Almost there
Figure 10
Almost There

Well, we're almost there. We don't want a new Paragraph 2; we want to use the existing Paragraph 2. But we need a way to specify that we don't want a new branch Element, but just want to use the existing one.

This is what the direction property of ElementSpec is for. By default, this property is set to the static variable OriginateDirection, which means that a new Element should be created. In this case, we want to use the static variable JoinNextDirection. For branch Elements, JoinNextDirection indicates that instead of creating a new branch Element, we want to use a child of the current branch Element to which new Elements are added. The Element chosen is the child that can be found after the child at the insertion point and beyond any newly created Elements by way of ElementSpecs of type StartTagType. In this case, the branch Element is the root and the child at the insertion point is Paragraph 1, so the next child is Paragraph 2.

That is exactly what we want. Now, if we change the StartTagType ElementSpec to have a direction of JoinNextDirection, we end up with the arrangement shown in Figure 11.

Close, but not quite
Figure 11
Our progress so far; close, but not quite

The Finishing Touch

Close, but not quite. If you're paying close attention (and are still awake), you may recognize what's wrong. The ElementSpec created with type ContentType has a direction of OriginateDirection. We really want it to have a direction of JoinNextDirection as well. If we now change the direction to the one we want, we will end up with the Element structure shown in Figure 9 -- which is exactly what we wanted!
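
The three ElementSpecs arrived at above (an EndTagType, a StartTagType with JoinNextDirection, and a ContentType with JoinNextDirection) can be tried out through DefaultStyledDocument's protected insert() method. The sketch below assumes a subclass with the made-up name SpecInsertDemo and sample text laid out like Figure 7. It is an illustration of the technique rather than the code Swing runs internally.

```java
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.Element;

// Hypothetical subclass, for illustration only; it exposes the protected insert().
final class SpecInsertDemo extends DefaultStyledDocument {
    /** Inserts one character at the start of the paragraph that begins at the given offset. */
    void insertAtParagraphStart(int offset, char c) throws BadLocationException {
        Element next = getParagraphElement(offset);
        AttributeSet leafAttrs = next.getElement(0).getAttributes();

        // Step out of the preceding paragraph, up to the root.
        ElementSpec end = new ElementSpec(null, ElementSpec.EndTagType);
        // Reuse the existing next paragraph instead of creating a new branch.
        ElementSpec start = new ElementSpec(next.getAttributes(), ElementSpec.StartTagType);
        start.setDirection(ElementSpec.JoinNextDirection);
        // Merge the new character into that paragraph's first leaf.
        ElementSpec content = new ElementSpec(leafAttrs, ElementSpec.ContentType,
                                              new char[] {c}, 0, 1);
        content.setDirection(ElementSpec.JoinNextDirection);

        insert(offset, new ElementSpec[] {end, start, content});
    }

    /** Builds the Figure 7 layout, applies the specs at offset 10, and lists paragraph offsets. */
    static String run() {
        SpecInsertDemo doc = new SpecInsertDemo();
        try {
            doc.insertString(0, "123456789\nabcdefghi", null);
            doc.insertAtParagraphStart(10, 'X');
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.getDefaultRootElement();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < root.getElementCount(); i++) {
            Element p = root.getElement(i);
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(p.getStartOffset()).append('-').append(p.getEndOffset());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```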

Other Ways to Specify Direction

Now that I have described the static variables JoinNextDirection and OriginateDirection, it's time to point out that there are two other static variables used to specify direction. One is JoinPreviousDirection, and the other is JoinFractureDirection.

JoinPreviousDirection

JoinPreviousDirection is very similar to JoinNextDirection, but it is only applicable to the first ElementSpec, and only if that ElementSpec represents content. Consider the Element structure shown in Figure 12:

'JoinPreviousDirection' setup
Figure 12
A setup for 'JoinPreviousDirection'

Let's say you want to insert two characters at Offset 5, with the first character having the same attributes as Style 1, and the second character having the attributes of Style 3. In this case, the first ElementSpec would have direction JoinPreviousDirection, and the next ElementSpec would have direction OriginateDirection. A direction of JoinPreviousDirection is used for the first ElementSpec because the attributes of the Element at the insertion point match those of the first character. This results in the Element structure shown in Figure 13:

Using 'JoinPreviousDirection'

Figure 13
Using 'JoinPreviousDirection'
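
The effect of joining to the previous Element can be observed without building ElementSpecs by hand, because insertString() chooses the directions itself. The sketch below only counts leaf Elements. The class name, the sample text, and the use of bold, italic, and underline to stand in for Styles 1, 2, and 3 are assumptions, and it uses two separate inserts rather than the single two-character insert described above.

```java
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.SimpleAttributeSet;
import javax.swing.text.StyleConstants;

// Hypothetical helper class, for illustration only.
final class JoinPreviousObserved {
    /** Leaf counts of the paragraph: initially, after a same-style insert, after a new-style insert. */
    static int[] leafCounts() {
        SimpleAttributeSet style1 = new SimpleAttributeSet();
        StyleConstants.setBold(style1, true);
        SimpleAttributeSet style2 = new SimpleAttributeSet();
        StyleConstants.setItalic(style2, true);
        SimpleAttributeSet style3 = new SimpleAttributeSet();
        StyleConstants.setUnderline(style3, true);

        DefaultStyledDocument doc = new DefaultStyledDocument();
        try {
            doc.insertString(0, "abcde", style1);
            doc.insertString(5, "fghij", style2);
            int before = doc.getParagraphElement(0).getElementCount();

            doc.insertString(5, "X", style1); // same attributes as the leaf ending at offset 5
            int afterSame = doc.getParagraphElement(0).getElementCount();

            doc.insertString(6, "Y", style3); // attributes that match neither neighbor
            int afterNew = doc.getParagraphElement(0).getElementCount();

            return new int[] {before, afterSame, afterNew};
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] counts = leafCounts();
        System.out.println(counts[0] + " -> " + counts[1] + " -> " + counts[2]);
    }
}
```

The same-style insert should leave the leaf count unchanged, while the new-style insert should add exactly one leaf.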

JoinFractureDirection

The last possibility for specifying direction is JoinFractureDirection, which is very similar to JoinNextDirection. A fracture results when an insert occurs in the middle of a branch Element and a new branch needs to be created. The previous examples did not result in a fracture because the insert happened at the boundary of Elements, or because the insert did not result in a new branch Element.

In the case of a fracture, all the children Elements that lie along and after the insertion point must be recreated. A direction of type JoinFractureDirection indicates that instead of creating a new Element, the Element that will be used is the one that was created as the result of the fracture.

To understand what this means, consider our first example, Figure 7. Referring to the Element structure shown in that diagram, suppose that instead of inserting at Offset 10, we insert two characters at Offset 5. Let's also assume that the first character we insert is a newline, and that the second character is anything other than a newline (the first character must be a newline so that a new paragraph is needed; any other character will not result in a fracture). What we want to end up with is the structure shown in Figure 14.

Inserting a newline and a character
Figure 14
Inserting a newline and another character

In this diagram, the first Paragraph 1 has a length of 6 because the newline character we inserted has become part of it. The second Paragraph 1 also has a length of 6 because the second inserted character has become part of it. The ElementSpecs to generate this structure would consist of a ContentType with direction JoinPreviousDirection, an EndTagType, a StartTagType with direction JoinFractureDirection, and a ContentType with direction JoinNextDirection.
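
The outcome in Figure 14 can be checked with a plain insertString() call, which generates the ElementSpecs internally. This sketch uses sample text laid out like Figure 7 and a made-up class name FractureObserved; it inserts a newline followed by one character at offset 5.

```java
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.Element;

// Hypothetical helper class, for illustration only.
final class FractureObserved {
    /** Inserts "\nX" at offset 5 and lists the resulting paragraph offsets. */
    static String paragraphOffsets() {
        DefaultStyledDocument doc = new DefaultStyledDocument();
        try {
            doc.insertString(0, "123456789\nabcdefghi", null); // paragraphs 0-10 and 10-20
            doc.insertString(5, "\nX", null);                  // fractures the first paragraph
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.getDefaultRootElement();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < root.getElementCount(); i++) {
            Element p = root.getElement(i);
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(p.getStartOffset()).append('-').append(p.getEndOffset());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(paragraphOffsets());
    }
}
```

Both halves of the original first paragraph end up with a length of 6, as described above.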

As mentioned previously, a fracture results in copying all Elements along and after the insertion point. The previous example resulted in a fracture that necessitated copying the paragraph Element and part of its child. To illustrate fractures in more detail, let's take a look at another example. Consider an ordered list with two list items, where the first list item has two different styles. This might be represented as shown in Figure 15.

2 list items with individual styles
Figure 15
Two list items with individual styles

Referring to Figure 15, let's say the user presses the Return key at an offset of 2. We would like this to fracture the first list item into two list items. We want to end up with the structure shown in Figure 16.

Desired structure
Figure 16
The structure we want

The first ElementSpec is of type ContentType, with a direction of JoinPreviousDirection. This specification is for the newline character. It has a direction of JoinPreviousDirection because the newline shares the same attributes as the list item's paragraph child, Style 1. This spec is followed by two EndTagType ElementSpecs. These specs position further inserts into the Element representing the ordered list.

The next ElementSpec is of type StartTagType, with a direction of JoinFractureDirection. When ElementBuffer encounters the JoinFractureDirection ElementSpec, it realizes that the insertion was not at the end offset of li 1, and that it must recreate all Elements along the insertion point (li 1, p, and the remainder of Style 1) and to the right of the insertion point (in this case only Style 2 is to the right of the insertion point).

That's it. These ElementSpecs result in creating a structure like the one shown in Figure 16.


ElementBuffer: Removing Text

As previously noted, ElementBuffer can also handle the removal of Elements that occurs when text is removed. The Elements completely contained in the removal range are simply removed. The two Elements at the endpoints of the range are first checked to see whether they can be merged into a single Element.

ElementBuffer.canJoin is used to perform this comparison. It will allow a join if both Elements are branches or both are leaves, and their attributes are equal, or they both have the name AbstractDocument.ParagraphElementName. If the two Elements cannot be joined, this process is repeated for the children of the two Elements.

Let's say we want to delete text starting at Offset 5 and with a length of 10 from the structure shown in Figure 14. The two Elements Paragraph 1 (the first one) and Paragraph 2 are both branches and both have the name AbstractDocument.ParagraphElementName, so they can be joined. When the join occurs, the two children at the removal point are compared -- and, assuming they have the same attributes, they, too, are combined. The second paragraph is completely removed because it is contained in the removal range. This operation results in the structure shown in Figure 17.

Using ElementBuffer.canJoin
Figure 17
Using ElementBuffer.canJoin
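
The removal just described can be reproduced with the public remove() method. This sketch first rebuilds a three-paragraph structure like Figure 14 and then deletes 10 characters starting at offset 5; the class name and sample text are made up for illustration.

```java
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;

// Hypothetical helper class, for illustration only.
final class RemoveAcrossParagraphs {
    /** Builds the three-paragraph structure and removes 10 characters starting at offset 5. */
    static DefaultStyledDocument build() {
        DefaultStyledDocument doc = new DefaultStyledDocument();
        try {
            doc.insertString(0, "123456789\nabcdefghi", null);
            doc.insertString(5, "\nX", null); // paragraphs 0-6, 6-12, 12-22
            doc.remove(5, 10);                // spans the end of the first and the start of the last
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }
        return doc;
    }

    /** Number of paragraphs left after the removal. */
    static int paragraphCount() {
        return build().getDefaultRootElement().getElementCount();
    }

    /** Text left after the removal. */
    static String remainingText() {
        DefaultStyledDocument doc = build();
        try {
            return doc.getText(0, doc.getLength());
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("paragraphs: " + paragraphCount());
        System.out.println("text: " + remainingText());
    }
}
```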

This, like an insert, is a straightforward operation that you generally should not have to worry about. It should work just fine in most cases.


ElementBuffer: Changing Text

The last thing the ElementBuffer can be used for is setting character attributes. Because Element stores its own attributes, this is usually a trivial operation. But there is one complication that can develop when attributes are to be set on part of an existing Element. Again, let's refer to our first element structure: the one shown in Figure 7. If we wanted to apply a new set of attributes, Style 2, to the characters in the range 4 to 8, we would need to split Paragraph 1's child, Style 1, resulting in the structure shown in Figure 18.

Setting elements
Figure 18
Setting Elements on part of an existing Element

To make this operation work, ElementBuffer makes sure that the leaf Element at the first endpoint has a start offset matching that of the first endpoint, and that the leaf Element at the last endpoint has an end offset matching that of the last endpoint. If these conditions do not exist, the Elements need to be split.
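
The split shown in Figure 18 can be triggered with setCharacterAttributes(). In this sketch, bold stands in for Style 2; the class name and sample text are made up for illustration.

```java
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.Element;
import javax.swing.text.SimpleAttributeSet;
import javax.swing.text.StyleConstants;

// Hypothetical helper class, for illustration only.
final class AttributeSplit {
    /** Applies bold to offsets 4 through 8 of the first paragraph. */
    static DefaultStyledDocument build() {
        DefaultStyledDocument doc = new DefaultStyledDocument();
        SimpleAttributeSet style2 = new SimpleAttributeSet();
        StyleConstants.setBold(style2, true);
        try {
            doc.insertString(0, "123456789\nabcdefghi", null);
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }
        doc.setCharacterAttributes(4, 4, style2, false); // offset 4, length 4
        return doc;
    }

    /** Lists the leaf Elements of the first paragraph as "start-end". */
    static String firstParagraphLeaves() {
        Element p = build().getParagraphElement(0);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < p.getElementCount(); i++) {
            Element leaf = p.getElement(i);
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(leaf.getStartOffset()).append('-').append(leaf.getEndOffset());
        }
        return sb.toString();
    }

    /** Whether the leaf covering the given offset is bold. */
    static boolean isBoldAt(int offset) {
        return StyleConstants.isBold(build().getCharacterElement(offset).getAttributes());
    }

    public static void main(String[] args) {
        System.out.println(firstParagraphLeaves());
        System.out.println("bold at 5: " + isBoldAt(5));
    }
}
```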


HTML Extensions

In Swing 1.1 Beta 2, HTMLEditorKit contains a method named insertHTML(). You can use the insertHTML() method to insert additional HTML code into an existing HTML document. When you invoke insertHTML(), you can specify a String containing the HTML code that you want to insert, along with:

  • An offset specifying where you want to insert your HTML code.
  • The number of ElementSpecs of type ElementSpec.EndTagType to generate.
  • The number of ElementSpecs of type StartTagType with a direction of JoinNextDirection that you want to generate.
  • The first tag in your HTML String that is to be used to start creating Elements.

Example: Adding a List Item

As an example, let's say we want to add a new list item to the following Element structure. Let's also suppose that we have five characters to add, beginning at Offset 5, as shown in Figure 19.

The desired result
Figure 19
Adding a list item to an Element structure

Figure 20 shows the element structure we want to end up with.

Illustration 20
Figure 20
The desired result

The Specifics

To get more specific, assume that we are adding the following HTML string to a document:

<ol><li>foob</li></ol>

The fifth character will be a newline, which is generated for you.

In the current version of Swing, it is OK not to provide the necessary HTML and body tags, because Swing's current parser knows how to do that. The number of ElementSpecs of type EndTagType to generate should be two. This makes ol the current branch Element to which new Elements will be added, because that is where we want to add the new list item. Since we are creating a new Element, the number of StartTagTypes to generate with JoinNextDirection is zero, and the first tag to start adding from is HTML.Tag.LI.
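
When working out how many EndTagType specs are needed, it can help to list the Elements between the root and the insertion point. The sketch below does not call insertHTML(). It only parses a small list with HTMLEditorKit and counts the branch Elements that sit below ol on that path. The class and method names are made up, and the exact Element names depend on the parser in the Swing version you are using.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.BadLocationException;
import javax.swing.text.Element;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper class, for illustration only.
final class HtmlElementPath {
    /** Names of the Elements from the root down to the leaf that holds the given text. */
    static List<String> pathTo(String html, String needle) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        try {
            kit.read(new StringReader(html), doc, 0);
            int offset = doc.getText(0, doc.getLength()).indexOf(needle);
            if (offset < 0) {
                throw new IllegalArgumentException("text not found: " + needle);
            }
            List<String> names = new ArrayList<>();
            Element e = doc.getDefaultRootElement();
            while (e != null) {
                names.add(e.getName());
                e = e.isLeaf() ? null : e.getElement(e.getElementIndex(offset));
            }
            return names;
        } catch (IOException | BadLocationException ex) {
            throw new IllegalStateException(ex);
        }
    }

    /** Number of branch Elements on the path that lie below the named tag (the leaf is excluded). */
    static int branchesBelow(List<String> path, String tag) {
        int i = path.indexOf(tag);
        if (i < 0) {
            throw new IllegalArgumentException("tag not on path: " + tag);
        }
        return path.size() - 1 - (i + 1);
    }

    public static void main(String[] args) {
        List<String> path = pathTo("<html><body><ol><li>foob</li></ol></body></html>", "foob");
        System.out.println(path);
        System.out.println("branches below ol: " + branchesBelow(path, "ol"));
    }
}
```

The count of branches below ol corresponds to the number of EndTagType specs needed to make ol the Element to which new Elements are added.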

Another Example

As another example, let's say we want to insert a new list item into an existing ordered list. But this time, assume that we want to start our insert at Offset 10. So our initial Element structure looks like the one shown in Figure 21.

Initial setup
Figure 21
Another example: The initial setup

For this example, the same HTML string that we used in the previous example will work fine. But this time, we need to generate only one ElementSpec of type EndTagType. This means that root is the current branch Element to which new Elements will be added. Because we want the new list item to be part of ol, we must generate one ElementSpec of type StartTagType, with a direction of JoinNextDirection. This means that ol is the current branch Element to which new Elements will be added. And this is just what we want.

As in the previous example, the first tag to start adding from is HTML.Tag.LI. This will result in the Element structure shown in Figure 22.

The result

Figure 22
Another example: The result

InsertOrderedListItem

It's worth noting that Swing's HTMLEditorKit provides an action that does exactly what the previous examples did. This action is named InsertOrderedListItem.

InsertOrderedListItem has one very interesting feature: If actionPerformed() is invoked on it and the current offset is not inside an ordered list, an ordered list is created. HTMLEditorKit also defines actions for creating a new table, adding a table row, adding a table data cell, adding an unordered list, adding an unordered list item, adding an ordered list and lastly adding an ordered list item. All of these operations are handled by the class HTMLEditorKit.InsertHTMLTextAction. It determines the number of ElementSpecs to generate and invokes the insertHTML method to handle the creation of the new Elements. For more information on this class, refer to the InsertHTMLTextAction API.


Problem Areas

While DefaultStyledDocument can generate the appropriate array of ElementSpecs for an insert, it typically leaves a few areas that could benefit from some fine-tuning. Assume, for example, that we wanted to insert a newline at an offset of 11 into the Element structure shown earlier in Figure 21. Most editors would fulfill this requirement by creating a new list item. With Swing, however, the result is the structure shown in Figure 23.

Inserting a newline
Figure 23
Inserting a newline into a structure

In this example, as you can see, a new list item is not created. Instead, a new paragraph in the list item is created. It is possible to override this behavior by creating a new insert action for newline characters that will either conditionally insert newlines or will create new structures. Another option is to override the insertUpdate() method and message ElementBuffer directly.

Another potential problem area for Element structures with depths greater than three (which HTML can have) is in the removal of Elements. For reasons that another article will discuss, all paragraph Elements should end in a newline. Consider the result of deleting 8 characters at an offset of 5 from Figure 21. This operation results in the Element structure shown in Figure 24.

Deleting characters
Figure 24
Deleting characters from an element structure

As you can see, the first paragraph no longer ends in a newline. One possible way to get around this result is to perform the removal first, and then insert newlines as needed. A future article will discuss the ramifications of not having a newline at the end of a paragraph. Meanwhile, the first and most noticeable result of this omission is that positioning the caret after the last character of the first paragraph becomes problematic.


Two Useful Tips

If you encounter problems in experimenting with the exercises presented in this article, here are a couple of tips to help you out:

  • AbstractDocument offers a method named dump() that will write out a tree of the current Element structure to a PrintStream.

  • There is a useful class named ElementTreePanel in Notepad. This class manages a JTree that shows the Elements in an AbstractDocument and updates the JTree as the Elements change. The ElementTreePanel class can be particularly useful in giving you a better understanding of how the Element structure changes as you insert and remove text.
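
As a small illustration of the first tip, the sketch below captures the output of dump() in a String so that it can be logged or compared. The class name DumpToString and the sample text are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import javax.swing.text.AbstractDocument;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;

// Hypothetical helper class, for illustration only.
final class DumpToString {
    /** Captures the output of AbstractDocument.dump() as a String. */
    static String dump(AbstractDocument doc) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(bytes, true);
        doc.dump(out);
        out.flush();
        return bytes.toString();
    }

    /** Dumps a small two-paragraph styled document. */
    static String sampleDump() {
        DefaultStyledDocument doc = new DefaultStyledDocument();
        try {
            doc.insertString(0, "first\nsecond", null);
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }
        return dump(doc);
    }

    public static void main(String[] args) {
        System.out.print(sampleDump());
    }
}
```

Calling dump() before and after an insert or a removal is a quick way to compare the Element structure with the figures in this article.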


The Wrapup

I have covered quite a few concepts in this article, and if you have gotten this far, I applaud you. ElementBuffer is a very powerful class: given the appropriate array of ElementSpecs, it is able to recreate practically any Element structure. Taking the time to understand what ElementSpecs need to be generated for a particular class of inserts, as well as how to best use the direction property, will save you headaches in the long run.

I have outlined a few areas for improvement that will be addressed in future versions of Swing. But I have not touched on how DocumentEvents are generated (yes, ElementBuffer does this too). That is another topic that will be covered in a future article.
