1
Optimizing predictive text entry for
mobile phone short messages (SMS)
  • Yijue How and Min-Yen Kan

    kanmy@comp.nus.edu.sg
    School of Computing, National University of Singapore


2
Short Message Service
  • Over 24 billion messages sent in 2002
  • 100 million sent on 2005 Lunar New Year Eve in China alone


  • Problem: input is difficult
  • How to make input easier?
    • Make keystrokes more efficient
    • Ease cognitive load
3
Problem Statement
  • Write English messages using only 12 keys
    • 1-to-1 mapping of letters to keys is not possible (26 letters, only 12 keys)
      • Need more than one keystroke to type a letter

  • We review current approaches and propose improvements using corpus-based methods
    •  Key remapping
    •  Word prediction

  • Key point: how to measure performance?
    • Keystroke Level Model
    • (Better) Operation Level Model
    • On actual SMS text
4
Current approaches
  • Many approaches; among the most popular:


  • Multi-tap
    • Press a key multiple times to reach the desired letter
    • 3 × [2] “c” + wait + 1 × [2] “a” + 1 × [8] “t” = “cat”

  • Tegic T9
    • Use frequency of English words to place the most likely alternatives first
    • Use a “next” key to indicate the next alternative
    • 2 × [2] “ba” + 1 × [8] “act” + next = “cat”


  • Common feature: use one key for space (e.g., 0) and another for symbols (e.g., 1), leaving fewer than 12 keys for letters
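
  • To make the two schemes concrete, a minimal sketch of multi-tap counting and T9-style lookup on the standard phone keypad (the toy frequency dictionary is illustrative, not the reverse-engineered one used later in this work):

    # Standard letter layout; keys 0 and 1 are reserved for space and symbols.
    KEYPAD = {'2': 'abc', '3': 'def', '4': 'ghi', '5': 'jkl',
              '6': 'mno', '7': 'pqrs', '8': 'tuv', '9': 'wxyz'}
    KEY_OF = {c: k for k, letters in KEYPAD.items() for c in letters}

    def multitap_keystrokes(word):
        """Multi-tap cost: each letter costs its position on its key."""
        return sum(KEYPAD[KEY_OF[c]].index(c) + 1 for c in word)

    def t9_candidates(digits, freq):
        """T9-style lookup: dictionary words matching the key sequence,
        most frequent first (so "next" cycles through the rest)."""
        matches = [w for w in freq if ''.join(KEY_OF[c] for c in w) == digits]
        return sorted(matches, key=freq.get, reverse=True)

    FREQ = {'act': 90, 'cat': 40, 'bat': 30}      # illustrative counts
    assert multitap_keystrokes('cat') == 5        # 3 x "2", 1 x "2", 1 x "8"
    assert t9_candidates('228', FREQ) == ['act', 'cat', 'bat']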
5
Outline
  • Corpus Collection
  • Evaluation: KLM vs. OLM
  • Benchmark entry methods
    •  Key Remapping
    •  Word Prediction
6
SMS Corpus
  • Formal English is not SMS text
    • Closer to chatroom language
  • Most published research uses formal English text
    • Lack of publicly available SMS corpora
  • NUS SMS corpus
    • Medium scale: 10K messages
    • Collected from college students
    • Demonstrates breadth and depth


7
Evaluation Models
  • Keystroke Level Model (Card et al. 83)
    • Used previously in SMS (Dunlop and Crossan 00, Kieras 01)
    • Problem: keystrokes are weighted equally

  • We developed an Operation Level Model
    • Similar to (Pavlovych and Stuerzlinger 04)
    • Tie keystrokes to one of 13 operation types, e.g.:
      • enter a symbol = MPSymK
      • directional keypad move = MPDirK
      • press a different key to enter a letter = MPAlphaK
      • press the same key to enter a letter = RPAlphaK
8
Using OLM to derive times
  • Reach home @ ard 930


  • Reach_  5 MPAlphaK, 1 RPAlphaK
  • home_   4 MPAlphaK, 1 RPAlphaK, 1 MPNextK
  • @_      1 MPAlphaK, 1 MPSymK, 1 MPDirK, 1 MPSelectK
  • ard_    1 InsertWord, 4 MPAlphaK, 2 RPAlphaK
  • 930     3 MPHAlphaK

  • Derive timings for each operation by videotaping novice and expert users
  • Chose messages with a wide variety of operations
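
  • Once per-operation times are measured, OLM timing is a weighted sum over operation counts; a sketch (the numeric values below are placeholders, not the measured timings):

    # Hypothetical per-operation times in seconds; the real values come
    # from the videotaped novice and expert sessions.
    OP_TIME = {
        'expert': {'MPAlphaK': 0.4, 'RPAlphaK': 0.25, 'MPNextK': 0.5,
                   'MPSymK': 0.8, 'MPDirK': 0.5, 'MPSelectK': 0.4,
                   'InsertWord': 1.0, 'MPHAlphaK': 0.6},
        'novice': {'MPAlphaK': 1.0, 'RPAlphaK': 0.6, 'MPNextK': 1.2,
                   'MPSymK': 2.0, 'MPDirK': 1.1, 'MPSelectK': 0.9,
                   'InsertWord': 2.5, 'MPHAlphaK': 1.4},
    }

    def olm_time(op_counts, user='expert'):
        """OLM time for one message: sum of count x time per operation.
        KLM would instead weight every keystroke equally."""
        return sum(n * OP_TIME[user][op] for op, n in op_counts.items())

    # "Reach " from the example above: 5 MPAlphaK + 1 RPAlphaK.
    print(olm_time({'MPAlphaK': 5, 'RPAlphaK': 1}, user='novice'))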
9
Outline
  • Corpus Collection
  • Evaluation: KLM vs. OLM
  • Benchmark:
    • Baseline: Tegic T9
    • Improvement: Key Remapping
    • Improvement: Word Prediction
10
Methodology and Baseline
  • For each of the 10K messages:
    • Calculate KLM and OLM timing for message entry
  • Average the totals for both novices and experts



  • Baseline: Tegic T9 (based on a 2004 Nokia phone)
  • Need to know the order of alternative words
    • E.g., 4663 = “good”, next → “home”
    • Reverse-engineered the dictionary
  • Results:
    • 74 keystrokes (average KLM)
    • 74 seconds (average OLM)
      • 59.7 and 149.56 seconds for expert / novice OLM
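
  • The benchmark loop then reduces to counting operations per message and averaging; a sketch reusing olm_time from the earlier sketch (count_ops, a T9 entry simulator, is assumed here):

    def benchmark(messages, count_ops, user='expert'):
        """Average KLM (keystrokes) and OLM (seconds) over the corpus.
        count_ops(msg) simulates entry and returns {operation: count}."""
        klm_total = olm_total = 0.0
        for msg in messages:
            ops = count_ops(msg)
            klm_total += sum(ops.values())       # KLM: all keystrokes equal
            olm_total += olm_time(ops, user)     # OLM: weighted by op times
        n = len(messages)
        return klm_total / n, olm_total / n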
11
Key Remapping
  • Shuffle the keyboard (similar to Tulsidas 02)
  • Too many combinations: ~1.5 × 10^19
  • Use Genetic Algorithms to search the space; see the sketch below
    • Swap letter-to-key assignments each generation
    • Keep the “best” keyboards (i.e., those with the lowest average input time by OLM)
  • Result:
    • Average 15.7% reduction in time needed
    • Due to the reduction in next key presses
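
  • A compact sketch of the GA search (population size, survivor count, and mutation scheme here are illustrative choices, not necessarily those used in this work):

    import random

    LETTERS = 'abcdefghijklmnopqrstuvwxyz'
    KEYS = '23456789'

    def random_keyboard():
        """Assign each letter to one of the eight letter keys."""
        return {c: random.choice(KEYS) for c in LETTERS}

    def mutate(keyboard):
        """Swap the key assignments of two random letters."""
        kb = dict(keyboard)
        a, b = random.sample(LETTERS, 2)
        kb[a], kb[b] = kb[b], kb[a]
        return kb

    def evolve(avg_olm_time, pop_size=50, generations=200, survivors=10):
        """Keep the keyboards with the lowest average OLM input time on
        the corpus, then refill the population with mutated survivors."""
        pop = [random_keyboard() for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=avg_olm_time)            # lower time = fitter
            pop = pop[:survivors]
            pop += [mutate(random.choice(pop))
                    for _ in range(pop_size - survivors)]
        return min(pop, key=avg_olm_time)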

12
Predictive Word Completion
  • Allows completion of a partially-spelled word
  • Similar to ZiCorp’s eZiText


  • Our model:  w* = argmax_w P(w | k, w_prev)
  • Select the w with the highest conditional probability given evidence from:
    • the current word’s key sequence k
    • the previous word w_prev
  • Display a single prediction only when confident
    • Cycle through completions based on confidence
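
  • A sketch of the confident-prediction rule (the keymap, bigram probabilities, and threshold value are illustrative):

    def predict(prev_word, digits, bigram_prob, keymap, threshold=0.6):
        """Return the completion to display, or None when no candidate
        is confident enough. Candidates are words whose key sequence
        starts with the digits typed so far; each is scored by its
        conditional probability given the previous word."""
        candidates = [w for w, keys in keymap.items()
                      if keys.startswith(digits)]
        scored = sorted(((bigram_prob.get((prev_word, w), 0.0), w)
                         for w in candidates), reverse=True)
        if scored and scored[0][0] > threshold:
            return scored[0][1]
        return None

    # Toy data mirroring the next slide: after "at", keys 46 typed.
    keymap = {'in': '46', 'home': '4663', 'good': '4663'}
    bigram = {('at', 'home'): 0.7, ('at', 'in'): 0.2}
    print(predict('at', '46', bigram, keymap))    # -> 'home'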
13
Example and Result
  • Writing: “Meet at home later”
  • So far: “Meet at in” (keys 46 entered for the current word)



  • 46* = in, go, got, how, god, good, home, ink, hold, holiday …
  • P(home | at, 46) > threshold
  • P(in | at, 46) < threshold
  • …


  • Display: “Meet at in”, with “home” shown as the suggested completion


  • Result: 14.1% savings in time (OLM)
  • Compare with 60% in early work on PDAs (Masui 98)


14
Combining methods
  • Both methods complement each other
  • Allows up to 21.8% average time savings
  • Remapping improves slightly more than word completion
    • May be caused by conservative word completion strategy
15
Future Work
  • Doesn’t yet account for cognitive load
    • Remapping is hard to learn


  • Codec in development
    • Regular text to SMS / chat text


  • Speeding up Named Entity entry
    • i.e., people, places, times, and dates
16
Conclusions
  • Can save 20+% time in entering SMSes
  • Use corpus to drive and benchmark optimization
  • Evaluation using OLM (finer-grained than KLM)
  • Public SMS corpus available (ongoing work)


  • See Yijue How’s thesis for more details and additional experiments
  • Google: “SMS Corpus”
17
Backup Slides
18
Guidelines for talk
  • 15 minutes
  • 2 to 3 minutes for questions