Welcome to TiddlyWiki created by Jeremy Ruston, Copyright © 2007 UnaMesa Association
Kui-Lam Kwok
Web IR
* response time
* adversarial, commercially tied
* aol debacle 2006. ids removed, but easy to trace back user info.
* deep web queries as TREC style research
** brightplanet.com DQM/P and Deepwebtech.com
* Garfield's Algorithm Deviation Indexing for Emma
* Kwok 95 PIRCS Network
3243 midterm again
ZhaoJinEmail (5 Oct)
LRECemailReminder - done
citeseer copy - done
HYP/UROP
* parallel corpus collection / MorphoMT (done)
* lightweight nlp - (done)
* citation typing (urop) - done
* FirefoxExtension (impl) (done)
* scientific ir ** (postponed)
* keyphrase and tagging (hyp) - done ** (removed)
TCS
* visit / slides
* gautam/anantaram follow up
Bought new server
* cte down again!!?!
* ilo slidealign - done, yay
* tomm editing?
* more a0001 stuff
* Finished google queries of all of citeseer. yay!
* fixing mailman. seems to be a host (wing vs aye) config problem.
* done copying out of db1, now on papers.
* kenny lew meeting, ask to contact yves dassas. do ip reports.
* finished shiren editing. whew!
* basic version of cache crypt function
zz grp
hanoi pics
son rec
melvin verif x2
caslyn2: icadl hotel, ijcnlp request
sophia recs to ie
icadl day email followups
icadl presentation
send ed parcels toolkit
email kristine cfp for ijdl
send pictures to group
ask yf to read kazunari sugiyama's work, email yf reply to jcdl lf-sf
fix emma webpage
phys bill catchup
msra cfp
emailed ijdl reminders
crazy limsoon request
email kristine about paula proctor
* Sent Springer email.
* Renewed domain names.
* IM meeting for journal paper.
* Dongwon's visit application packet prep.
* lyrics text file results uploaded.
Back from Seoul, Korea
* Chimetext announcements
* battery exchange
* long arm stapler borrow
* seoul expenses
* print cover of program
* dAnth email
* edited program stuff
clearing emails
fax starhub giro
gm's presentation stuff
do header parsing - partial; need to install more libs -> push to admins
mail sembcorp parking giro
lrec paper edits from bonnie
sfr re-report x3
cm stuff
1101
* student cm problem - done
* post summ - done
* fraction representation - done.
* testing and debugging lecture notes
premia board meeting
3243 mt - prelim - finalized > print
ntu lectures > followup
resched dental
do yeefan prop
* key for EC for group meeting
* reboot cte (sigh)
* volunteer for AIRS booklet
* reply to drago
''todo''
* get speakers back
* write chee how
* moderation
* editing
** shiren's paper
** bang's paper
* reading
** yee fan's query document
* xs
* meetings
** franz och
** dongwon lee
* fo's invited talk at iscslp
* thang urop meeting
* huangzy lunch
* citeseer cleanup
** back online again
** all logs moved to rp and cte
* tyl ntu meeting with jin
* nyberg meeting and lunch
* hari letter
* acl reg figure out
* mail letter / do date change for oap
* practice sessions with hendra (weps & acl)
* submit oap simone
* slides from jesse for jcdl
* mileage claims
* slideseer conversion
* income tax interest payment / nol 2004/2005
* do slides
* meetings
* 5246 initial website created
* more pdfbox
* airs program finalization and prep
* dell monitor
* netflixprize download
* lib meeting
* clearlyunderstood stuff
* tomm meeting
* tomm editing
no coverage but optional to student
* assertions ch 8
* file i/o ch 12
* no maps in array chapter 10
* recursion but no data recursion, just procedural recursion. (for recursion ch15)
* testing and debugging
aaron's slides are in CS1101X. ~cs1101x
1st 4 wks, must cover up to iteration
* first lab wk 3.
* will set PE with leeml
have 18 UDLs + 3 TAs
6 UDL + 1 TA per group
Grading
15% each for PE, TT1 and TT2; 15% Labs; 40% Exam
* 5246 tut 2, lect 6, tab mt feedback, web interface revamp, update gcal
* pdfbox conversion restarted
** mapReduce.rb updated with new machines
** problem converting with new machines
* got 2xside tape
* chimetext graeme
* wangshuo rec
* editing
** bang's paper
** hendra cv
* lin rec
* reviews
** jcdl printouts
* submit www07textgraphs
* submit hiring forms
* got the disk sent down to helpdesk
* 5246
** tutorial
** lecture
** hw 1 grading, demo
* renew domain aam
* ask for fare quotes
* pt work claims for staff
* duc fax to nist; speaker proposal
* editing ecdl: yeefan and jin
* hiring pt forms
* buy light for bathroom
* ergin recs
file blog06 reimbursements, file mda grant hiring - jesse
dac2008 x1, figures
3243: nb and knn posting
1101: grade TT2, dl meeting, signing,
ijdl special issue tracking down.
malindo workshop
mit vidconf: position paper, vc, followup 1 para
su nam interview
dac discussion
3243: grade 2T
* 5244 lecture
* tomm editing
* email replies
* dAnth work for drago?
* indian buffet reading
* airs scheduling
first classes: late to 3243 first lect :-P
completed icadl reviews -yay
218 to cao
review yf's proposal
1101 site updates
ivle updates for 3243
3243 poll
3243 ta tanyeefa
request robots 3243
did tech report
zaw lin meeting
ask for switch
udl meeting sched
minh thang seats
scholarship for qiul and hendrase
489.44 ziheng
vldl reviewer assignments
updated sps system
police report to clementi branch
meeting with ramesh
request rack shelf / aye / ecp,pie movement
npic - connecting to boostexter.
creating boostexter class
* read wang dong's grp / wang dong's grp
* noi reading
* reminded alicia about bookings
* catering by gourmet (barry 6275-4058), emailed menu, pr blitz.
* xs
* noi meeting / problem prep / programming soln 1 / programming soln 2
* bbq 1,2 (thanks hugh) pit and func room booking
* googled linkage paper editing
* dad's gallery install / firstcycle update
* hw 2 grading
* intrusion problem fixed for now
* bug ppl about gcpps
* get cardboard from storage closet
* fix wing webserver/mailinglist
** survey ides for workshop
* graphreading
* special group meeting
* more airs stuff
* dAnth sample conversion
* 5244 lecture discussion
* dAnth mailing list set up, invitations sent
* IJDL proposal editing
* cacm editing done
* fix up joomla pages
* prager, hang li, emma slides on chimetext, meetings
* check liangzhu's june pay
* ivle 5244 site fixes
* ask unixsp to remount disks, done
* Kenny Lew IDF for npic
* initial try to integrate db to citeseer
* DavidChiangSynchronousGrammars seminar talk
must finish:
* alexia's article -- finished, yeah!
* npic cgi - rats, the multipart handling is not standard as per normal cgi.rb. -- ''done'' ok a really crappy version is up now. Got lots of bugs to go fix later.
* birthday migration to calendar -- ''done''
* start npic poster and handout
* eTochi stuff
* acl practice session 1 / snack prep / practice session 2 announcements / premia / qe marks meeting
* simone email / cv / citeseer sending / hr about sabbatical
* ask cuntai about hari
* omnipage scp hookup / try acl anth conversion
* book lab for RoR / RoR meeting / RoR mailing list set-up
* vldl follow ups / rebroadcasts / wyma vldl
* mohanan meeting / follow ups
* kokkoon email
* slideseer prelim slide view
* group meeting
* retrained xuan's model
* sent out wing news
* revise slides for plag / moss
* fix meetings page
* tomm edits
* prep for grading 5244 - imms2
* hyp for wangye
* MS apps finished.
* update WING
* cs fixed from isaac
* cs copying to lacie started
* reading thesis proposal
* cs 5244 class #1
* stevenha's tp
* student eval
Zhao Jin: 4:07 - 4:17
Qiu Long: 4:25 - 4:35
Yee Fan: 4:39 - 4:49
Jesse: 4:51 - 5:01
Ziheng: 5:10 - 5:20
Bang: 5:29 - 5:39
Emma: 5:43 - 5:53
Hendra: 6:00 - 6:10
* 5246
** wk2 lect notes
** wk2 lect
** figure out tutorial rooms
* car
** transfer monies
* editing
** linziheng
** yeefan
** ss
** jesse
** syan
* hari oap app
* fax duc thing under "Timestamped Graph: a Graph Model for Text Summarization"
* ss dev
** cvs'ed the whole darn thing
** lucene primitive support / highlighting mess
** fix metadata import
** fix screenshots
* paper review meeting
got lyrics done for ieee tmm - finally!
connecting text extractor and slide gif extractor -- still working, humph!
also printed out the alignment papers - finally!
qiul ijcnlp 2008 x2, done
vldl x2
blog corpus: faxed
visa reimbursement: submitted
cs1101:peq x2
ijit
widm07 pdf link
cs3243: loa links, grades, scoreboard
* CS 5244 grading
* airs brochure stuff
* write load balancer. fix MR bugs
* send preview of group meeting
* start us tax
* bang airtix
* finish final exam duties
* qe questions
* chimetext sem
* 5246 setup hw2 grading
* read hyp theses
* grade hyp theses 2/6
* op15 install on kpe
* op15 figure out
* book car
* hyp evaluations
* still copying to mnt/usb
* lots of 5244 posts
* ILO stuff
* updated tw with old blog entries. see tag oldBlog
* special prog stuff
* decompress acl anth mirror to sf3
* son's rec letter
* yee fan's cover letter
* partially complete - scanned picture edits
* sent isaac copy of bin and lib of cs
* sent danth url for good/bad conversions
Jing Jiang's talk
v2 of annot guide
packing for as6
forecite refactor
citeseer check
yeefan's / ergin's widm
isaac parsCit send
hari prep / dinner
jin's survey
danth hw email
galv4 restore
grad apps
velardi email
Editing IPM, Shiren's DUC notebook and learning more about CRFs from Sutton and McCallum's book chapter. After talking with Yee Whye, we said:
* read Semi Markov CRF by Sarawagi and Cohen
* read Maximizing Log Probability by Wainwright
* deconstruct CRF packages
* done with 3rd round on cacm.
* hsbc fax
* phd app review
* chuats meeting
* cs metadata checking
* answered sherman.
* send out papers for group review
* more old picture edits
6789 8188 11-13 Jan (MI 368 18:35-20:00 - MI 367 20:40-22:15) Langkawi - 15K miles, 135 taxes KDNNID
ticket bangalore flight
az pass x 3
atap eval
simone talk
simone cheque
ergin recs + email recs
son recs
siva interview
group meeting
book raffles buffet
mcomp reviews - done
3243 midterm administrator
vet jesse's ityouth
chimetext scheduling
ror xtremeapps team
tcs trip
* flight / accom done by tcs
* title and abstract - written
* biodata fetched
cm coord
* mapping
icadl
* flight - done
* copyright release - done
* sher - confirm 524008204
* registration - bank transfer - fax - done
partial premia
* newsletter - out part
* forum - installed, forget bridge
initial acl08 reviewer list
* converting acl anth via pdftohtml
** acl2004, coling2004, hlt-naacl2004, muc7
** X,T,A,I,M,N
** in progress: J, C, W, E, eacl2003
** need to do: P, H
* hang's thesis?
* read over patent stuff in acl anth
* more phd apps - now all done
* dongwon meeting
* sv, dv debugging
* xs
* serc
* wing
** portal fixing
** administrator's update
* 5246 prep
** updated syllabus page - links to chapters and slides from 2 textbooks
** added s52 forms to course pack
* chimetext errands
* editing
** bang's paper
** yeefan's joint paper
** jin's paper
* victor's recommendation done
* Working on getting the appropriate pdf files, connecting them into the pdf995edit utility via RDC.
* the alignment baseline
* replaced p2da-1.0 tgz with 1.1 tgz
* 5246
** homework grading, enter grades to xls
** lecture notes, disc q
** book tutorial room
* hlt trip prep, hk trip prep
* hyp prep for interviews, interviews
* wangye acmmm
* edit neoshiyo, zhaojin, nghongi
* citi payment
* ask off quote banff
3243: sent out grades, reported means, emails, immsnet
1101: immsnet
DAC 2008: done
do hyp grading - started
1101 queries and collections.
qiul/hendrase research fee waiver
seating for hanoi
do ijcnlp hotel booking - started
grader.pl fix by tyf
Reading/Editing hendrase: sect 2-4
"A component model for internet-scale applications"
* do start mailing to airs authors
* imms testing acceptance
* uist editing
* 5246 lecture notes wk 12
* file sg tax
* cjx defense
* jcdl final revision round 2
* acl flight book, hotel inq
* check opac is up
* print hang's papers
* emnlp printouts
* basic qe set
* basic exam set
* hyp disc and decs
** kalpana ch 1 2 and 6
** bang ch 1 and 6 - done
** emma all - done
** yue ch 1 2 3 4
** ziheng all - done
** jesse chap 6
* omnipage 15 auct
* added self to all wing sudo files
* bring spinelli cards in
* RoR
** ror 1.2.3 install on aye
** installed fcgi-2.4.0.tar.gz on aye
** installed httpd-devel via yum on aye
** installed mod_fcgid2.0
** updated forum with sysadm posts
** got cookbook example to work on /cookbook, with local path /var/www/cookbook
* Hang's thesis editing
* Dongwon IJDL first draft done
* HKUST emails done
* Hang's SMS collection set up
* editing cacm article with Yee Fan
* incampus
* mmies review prep
* tw canonicalization
* taslp prep
* opac / danth networking
* textbook comp
* zawlin meeting / annotators hi / kp email
* cec accredit work
* late late icadl sub
* hang visit
* law visit
* cancel newspaper
* chase airs student reg - done
* ask about grant
* grading done
* leave app
report for feedback - redo
payment zaw lin
linked anthology email draft
submit qiul paperwork
rpnlpir zone account appn
arrange kathy time
give / get monitor back
did 1101 lecture notes
resub reviewers ijdl
get breakthrough driver working on sunfire again.
redo rdale email
refactor gcal events to public private
dive options
sigir emails and sched
* Coded jing's basic HMM algorithm
* Coded bigram jaccard
* tomm to taslp conversion
* xuan practice
* grp report sub
* hs' scholarship renewal
* got check back from gh office
* ijdl invite to google
* cert prep and distribution
* hep shot
* xs
* franz och's invite
* dell replacement
* ts proposal for cu
* eepeng/dongwon special issue
* serc init prep
* airs
** post springer receipt to av consultants
** airs 2006 makes it into dblp
* 5244
** archive 5244 project presentation files
** prep 5244 project grading
** revision lecture
* poster session
** archive poster session work
* hlt-naacl 07 reviews
* AIRS booklet
* denny practice session
* student meetings
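A minimal sketch of the bigram Jaccard measure noted in the list above. The log does not say whether the bigrams were over words or characters, so word bigrams are an assumption here, and the names word_bigrams and bigram_jaccard are made up for illustration rather than taken from the original script.

```ruby
require "set"

# Return the set of adjacent word pairs in the text.
# Assumption: tokens are lowercased runs of word characters.
def word_bigrams(text)
  tokens = text.downcase.scan(/\w+/)
  tokens.each_cons(2).to_set
end

# Jaccard similarity over the two bigram sets: |A & B| / |A | B|.
# Returns 0.0 when neither text has any bigram, to avoid dividing by zero.
def bigram_jaccard(a, b)
  x = word_bigrams(a)
  y = word_bigrams(b)
  union = x | y
  return 0.0 if union.empty?
  (x & y).size.to_f / union.size
end
```

For example, "the cat sat" and "the cat ran" share one bigram out of three distinct ones, so bigram_jaccard gives one third for that pair.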
center title
2:53 start - 5 min
good example of diff btw voice and singing
histogram notes not clear b.2
too much nav
need very clear explanation of exper syl vs word - perhaps disclaim, good comeback on single syl
Don't need A and B subscripts (ok) prop to experimental results table.
* finished cs copying for hidetsugu
* emails out
* patent stuff for chuats
* ijcai reviews half done
* short week (CNY holidays)
* chimetext scheduling
* dAnth conversion (0-200, finished 700s, 600-650, posted on danth Wiki)
* dAnth garbled conversion investigation
* HLT-NAACL review
* 5246 tuts 3 and 4 released, emails
* ganglia installed by admins
* did cny mailout
* JCDL review
* updated my schedule. Finally!
* airs proceedings paper collection
* acl practice session 2 and reimbs
* RoR connect to ParsCit
* mit csail stuff / ppt prep
* jair review yay
* mohanan project comment cat, job desc
* annual interview kalpana
* skype ijdl meeting
* prep 5244 survey grading / printouts
* fixed merlion image
* check dblp+ spidering
* finish materials for prelim cd
* 5244 discussion
* cd design
* embassy stuff and personal record keeping done
* check on acl anth conversion:
** done: A, C, E, H, I, J, M, N, P, T, X, acl2004, coling2004, hlt-naacl2004, eacl2003
** in progress: W
* 5244 wk2 stuff
* 5244 class
* reading hang's tois edits r1-3 ok.
* hang's thesis re-edit. not yet there...
* 5246
** lecture 3 and 4
** prep hw1 - corpus, web interface, indexing, qrel judging, qrel assignment, qrel sample files
** tut 0 and 1
** room switch to SR 2
* car
** pick up
** stickers
* editing
** qiul (x3)
** hendra (x3)
** yeefan - and fax cprght form to acm
** ziheng (x2)
* syan paper
* bang's survey
* meetings
** cu
** donny
* hlt short review printout
* hsbc cc waive and redeem
* rg 7 for 245
* did flights booking
* aligner inspection done
* answer yves and begona's email
* answered yee fan's email
need to do today
* prep or do lyrics
* prep or do RG 5
* npic poster
setting exams: 1101 3243
3243: demo scheduling, ivle and mails
wingnews out
CS 1101: exam question setting, make up lecture, poll offline, plag
derry's grp: done comments
yeefan's prop
pick up pottery
* AIRS
* meetings
* grade remaining hw1
todo
* 5244 hw2 source finding
* ijdl form
* kp mohanan meeting
* ror install on macos
* 5246 grading
* svm training over crf++ for citation parsing
* ''hlt naacl / text graphs 2 / duc 2007''
* email sendouts / wingnews invites
* update my pub list
* ask hendra update wing pub list
* rec letters for ziheng, vu
* crp work
* trip expenses
* done with ACL anthology conversion by pdftohtml
* SERC HFE workshop
* hang's phd thesis edits (again)
* citeseer/singapore_copy/papers/
* emic versus etic (think phonemic vs phonetic): what are the differences in units for semantic
tuition waiver
pic continue backup
move to as6
read shanhengs prop
mmies
hari's 1st talk
grad apps
* dAnth stuff / links / new slice prep for umich
* setup new computer at home
* launchy install for office pc
* noi 2007 initial problem setting
* short paper IUI
* http://www.nus.edu.sg/comcen/acctman/
* chimetext chia tee kiah
* wk 8 discussion questions
* list for ijdl
* tomm revision / discussion to retarget to TASLP
* started new grant proposal
* dongwon visit stuff
* do ppt/pdf upload to AIRS 2006
* do metadata for AIRS 2006 for DBLP
* borrow drill
* TASLP email retarget
* fixup geoip usage data on citeseer
* w10 lecture revision and distribution
* final exam writing and origin report
* course pack artwork for 5246
* w11 lecture reading
* ijdl cover and cfp
* start printing cds
* stuck with dvd reading problems / udf problem with citeseer
* serc proposal time
do ijdl assignments
acl logins assigned
qiul thesis chapters: applications, sim step 1, sim step 2
az wrap up: context tags
install printer for psn518
print and read [PDF] Citation Analysis and Discourse Analysis Revisited -HD WHITE
Review/Non-review classification web snippet classification
ijclclp review
Returned from COLING/ACL/EMNLP. Whew.
* Hang's defense
* Claim forms
* Dongwon's visit application
* iras stuff
feedback x 2
phd review app
ieee taslp
premia newsletter
header reparse x2
bring fork
do jesse it to naomi to sign
yf proposal scheduling
qiul ijcnlp paper
acl08 reviewer list x2
final reports to students.
final data for archiving
3243 - midterm grading - tutorial updates - hw2 poll - qa - tut sol 8
for hw1 - midterm answers
sigir email warn
cl lab hyp/urop/admin students
* get back dvds and drive
* bang to copy sigmod anthology to cte,citeseer
* usage and crontab job on citeseer, http://citeseer.comp.nus.edu.sg/usage/ (don't forget trailing slash)
* a001 / cv
https://aces01.nus.edu.sg/sop/WebPageHandler
* graphreading
* airs
** tiddlywiki - done
** invited talks
* cs5244
** hw2 prep done
** makeup lecture logistics
** project page updating
* xs
* fix up ssSpider.rb class instances
* at jobs for ss
* cd printing
* altw reviews
* short week
* course pack submission
* dongwon
** final talks with Dongwon on jcdl sub
** atung discussion
** final report
* scan, pgped review
* special programme
** review wiki thang's email.
** thang's access
* wing
** email group meeting sched
** update website
* serc hfe
** set up phone meeting with tyl
** download template
* ijdl
** ia re-invites
* ss
** did print and full slide view
** fix css for hrefs
** fix alignment bugs
** redid url munging for data sources
* editing
** jin's short jcdl
** bit of shiren's paper
** jinxiu's thesis reporting
* got reimb from GH for bbq
* edit sigir poster
* resurrected (partially) parsCit
* 5246
** slides: qa, intro sum
** qrels for hw1
* jcdl reviews
* faxes
** acm tois page charges
** steves ip to cu
* chimetext
** ad
** updated schedule / room booking
* wing
** sending out reminder
** updated project descripts, finally
** work for google page
* editing
** ieee mm
** textgraphs2 final
* AIRS stuff
* added marshaling to aligner.rb
* coded mmStats.rb
Not yet done with align jump; gotta fix that tomorrow.
* 5246
** tut, lect notes
* jcdl cam ready
* hyp interviews
* editing
** emma hyp, ziheng hyp, qiul emnlp, jesse uist, bang qlw final, ziheng duc
* simone email
* 4/12 to amex plat us fidel
* reserve with airserve
* chimetext
* cv editing for rita
* sf3/rpnlpir mig details
* opac virthost
* omnipage disc, ebay bid
* 4247 mod
* emnlp review prefs
* get shot
* bring back hair spray
* send out new land's end to us
computer fixed
1101: final grading, immsnet
3243: final, grading, immsnet, make up mt and final grading
2305: prop grading
Anthology: bug fixes started, NLG waiting on busemann
car: paint bought
slides for chuats
taslp fix again
cancel rg1 extension
vldl review for hvds
do icadl slides
cm bachelors coord: questions and clarifications
http://www.textfrompdf.com/tfpspeed.htm
http://jabref.sourceforge.net/
Citeseer: think about parsehed alignment
do tenure list
sum blog sum to kathy
Shroff: SusanFeldman / Autonomy
* altw reviews
* sigmod anth reimbursements
* ask for support from sanjay - got it
* reading survey papers for 5244 / about half done.
do and driver to webpage
muime's card
do xuan's exercise
sigir edit
icadl jin problem resolved
did nutch exercises - 1/2
did 3243 tut #1
mohan ramesh mtg
annual review docs x2
paula procter mtg
batts
cliqa sub review
call raffles do 2x confirm
sub icadl final
do nutch 2/2
* 5244
** particip grades
** project pres grades
** grade projects
** grade final
* marks moderation
* noi 2007 trace debugging
* bbq
** CS: david & diane, huangzy (4), ooiwt, ben & waiping, abhik & tulika, haifeng, kok lim, mun choon, samarjit, chee yong,
** IS: calvin
** WING: bang, ziheng, yue & shuo, emma and hoang oanh, jin, long, hendra, jesse and ailin, yee fan, liangzhu and friend
** guests: dongwon
* xs
* check funds for hendra to go to iscslp
* avik sarkar training data parcels
* editing
** denny x3, now done
** taslp text results compilation
* meetings
* mcomp meeting
* mapreduce.rb debug(ging)
* graphreading
* airs
** booklet update
** schedule re-export
** cd printing finished
** final number pushed to springer
* did student claims
* a0001 forms turned in
* sigmod anthology copying done? got to ask acm for permission to host
* pics done and uploaded
* tois fourth round edits finished
* cache code
* got pdf995edit pipeline working
* airs 2006 publication stuff
* fixed schedule with google calendar embed
* reimbursements for software
* csail-mit workshop / demo prep
* fd ocbc
* annual reviews
* citeseer maintenance / log rotation
* danth email for project
* semeval sub email
* pubs updating
* emma email
* jcdl slides from csail pres
* us taxes, turbotax
* phys mail bounce followup: tatsuya, merry
* check collaborators tyltheng in wing
* ipm review
* talk to laizs about loose machines.
* talk to sanjay about sabbatical/leave
* airs
** blank cd and sleeves distributed
** student reg pushed to AVC
* figure out the thread bug in the mapreduce.rb (hopefully)
* reply to drago, dAnth
* mcomp apps
* citeseer df/mount to cte only/log gz
* 5244 class preparation
* finish ijcai reviews
* look over idm stuff (too high level for dl?)
* query cache revamp
* finished building caches for old pdfs, ppts, cache
* 5246
** got folder
** answer discussion questions
** tut 1
** new renamed corpus
* upali def and slides
* register cny lunch
* meetings
** ziheng on duc/textgraphs
* photos uploading
* chimetext
** upali
** xinyi announce
** booking
* wing-news
** prep, hendra update
** add subscribers
** sent update
* more phd apps
* gp review: linlin
* filming for research
* danth postings
* editing
** ss paper
* ngo rec
* phd app review
* jcdl bids
* found that pdf995edit is really just running standard pdftohtml with option -c turned on.
** pdftohtml with -c invokes gs, which sometimes creates a .ps file that keeps growing without end. solution: run it in a thread from a ruby call and terminate the pdftohtml process if it doesn't finish after 30 seconds.
** still have problems with pdftohtml -c creating files with garbage symbols. Not sure how to deal with this.
* pdf995edit pipeline working - but no longer needed with pdftohtml pipeline in cte
* PPTExtractor pipeline working but still dies on some files.
* work on iptables in citeseer.comp
** iptables -I INPUT 1 -i eth0 -p tcp -s 137.132.81.27 -j ACCEPT
** iptables -L INPUT
* don't forget to save iptables to file. probably service iptables save on this kind of box (check this first)
* fixed up citeseer iptables and sshd_config
* got the rp 5 to 0305 grant finished. yay!
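The 30 second cut-off for pdftohtml described in the list above could look roughly like this. It is a sketch under assumptions, not the code that runs on cte: the helper name run_with_timeout is invented, it relies on POSIX process groups (so that a gs child started by pdftohtml is killed as well), and the file names in the usage comment are placeholders.

```ruby
require "timeout"

# Run an external command and kill it if it has not finished within
# `limit` seconds. Returns true only when the command exits with
# status 0; returns false when it fails or had to be killed.
def run_with_timeout(cmd_args, limit = 30)
  # pgroup: true puts the child in its own process group, so helpers
  # it starts (e.g. gs) can be killed together with it.
  pid = Process.spawn(*cmd_args, pgroup: true)
  begin
    status = Timeout.timeout(limit) { Process.wait2(pid)[1] }
    status.success? == true
  rescue Timeout::Error
    begin
      Process.kill("-KILL", pid) # a leading minus signals the whole group
      Process.wait(pid)          # reap the child so no zombie is left
    rescue Errno::ESRCH, Errno::ECHILD
      # the process was already gone; nothing left to clean up
    end
    false
  end
end

# Hypothetical usage; paper.pdf and paper are placeholder names:
#   ok = run_with_timeout(["pdftohtml", "-c", "paper.pdf", "paper"], 30)
#   warn "conversion failed or timed out" unless ok
```

A false return does not distinguish a timeout from an ordinary failure; a caller that cares would need to return the reason as well.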
sans paper comments
lrecARC x1
lrecParsCit x1, subbed
1101: lecture notes, pe, pe done, emails
bank transfer blog 06
3243 sub final
hiranmay meeting
bartneck reimburse
dongxiang grp
booking hanoi
nanba: waiting on eepeng
vietnam form
lmthang urop: partial
dac08: partial
* citeseer
** mounted getRange.pl mod
** get range for 710-714, 720-729.
** finish slices 710-719 and put on danth@rp
** fix wiki pointers
* 5244
** uploaded lecture notes
** grading - continues
* mapreduce.rb
** mod to pick random free machine
** include other processing, ps2pdf, slow and fast pdfbox conversion
* tomm editing - di first edits, tl edit, di's second edits.
* url segmentation
** get - data from webbase
** get + data from citeseer-metadata060816
* wing people needs zhaojin - fixed
* cacm work.
* bing liu's talk
* continues web spider of www.comp.nus.edu.sg
interviews, shortlisting
1101: grading, exam moderations
simone prep
chime text prep
i2r seminars
icpc task
lr: l-bfgs integration
3243: grading, exam moderation
* wing
** group meeting room established
* reimbursements for cluster
* ss
** fixed fsv with tooltips, floating nav
* editing
** hendra's acl
** long's acl
** yee fan query probe
* writing
** hci proposal
** ss paper
* meetings
** yl ntu
** ben 3243
** vu exit
** tung hiring
* finish collecting all pdf forms for AIRS. Missing one CRF. Requested volume number from LNCS
* CHIME text seminar web page updating.
* 5244
** finished grading survey papers
** put up discussion questions
* airs cd and program booklet to burn
* url retraining for pub within domain
* ijdl
3243 tutorials - still missing trees in NLP tut 8, and questions 1 and 2.
installing missing software
mohan xiangyu meeting
do axs
do loudspeaker in lecture room
did 1st ver of lrec
1101
- post summ - done
- student cm problem - still not done.
- xtra problems
- fraction representation
- gcd problem
* give ht proceedings
* upload proceedings to wing and send admin request to update rpnlpir
* sent out recommendations
* import cds
* reimbursements: bang visa, bang www, hlt-naacl
* crp to nght, comments on it
* tyf renewal
* chuats slides for mit
* qe grading
* 5246 grading - final exam, hw2 regrades
* labor day holiday
* acm s045 copyright fax
* fedex question
* do q to ore about housing ownership
* jcdl reg / hotel book
* last meetings: ziheng, jesse
* bang practice and slide edits
* hendra acl edits round 5
* weps edits round 2
* do acl preview emails
* xuan's editing
* tois editing
* running diff between old and new cs metadata
* kw.rb script for emma
Hari seminar 2
grad apps finished
shanheng proposal
acl/jcdl reimbursements
3243 prelim lecture note upload
picked up potteries
10 digit iu
updated wing address, typos, pubs
wing group picture from phone
reimbursement claims
hari hmm
taslp sent, yay
* rewired tppt2pdfListing to be getCiteSeerPDFs.rb - fetches files directly from citeseer.cs now using citeseerMetadata.tsv as bridge.
* lunch boxes for coling acl
* fixed citation search URL in citeseer
* lan man's presubmission thesis
* grant writing
* w11 media lecture upload
* turned in grant proposal
* turned in moderators report
* updating cache with new cs entries
* creating acl metadata
* feeding diff cs metadata to ssSpider.rb
* feeding acl metadata to ssSpider.rb, done
letters jie yang, chris yang
jiang jing chimetext seminar
do ijdl reviews assigned to self, done!
claims in progress
write philip about registration
print van de sompel paper
correlations to minh
hendra's paper revision
lta/samsung runaround
jie yang invite
gordon mohr wapi, kristine ijdl, brent ho's shipping request emails
set up fin08
lta license renewal
jin annotation: started
sigir reg meetings
do final report for nlp web q
prep/pack ijcnlp
qiul related work chapter
write isaac councill
pick up monitor
build raw text preprocessor
restart emma project
ACL09 or AAAI-spring 09
check on acl anthology fixing request
do personal message
az, cfc: tf*idf other features approximated
http://www.informatics.sussex.ac.uk/research/groups/nlp/rasp/index.html
moves jien-chen wu et al. computational analysis of move structures in academic abstracts
NetDraw www.analytictech.com/download.htm
Ask Gordon about web services architecture.
cm bachelors coord: questions and clarifications
http://www.textfrompdf.com/tfpspeed.htm
http://jabref.sourceforge.net/
Citeseer: think about parsehed alignment
http://statgen.iop.kcl.ac.uk/bgim/mle/sslike_3.html
do tenure list
* finished with course pack stuff, more or less
* updated firstcycle
* emma related work chapter proofreading
* citeseer progress up to 637/730 = 87%
* filed student claims
* dell pie.ddns service to helpdesk
* denny paper proofreading done
* SERC HFE abstract done. Done with slides, too. submitted
* printed ijcai papers for review
* robot pics for cs3243
* wangxuan's lor
* anubhav's lor
Fredo Durand
easier to author
3d model -> line drawings
coded aperture to get better model on blurriness
digital photograph reprocessing
* PREMIA meeting at NTU
* HP grant stuff with tancl, dpoo
* finally done with CACM article draft (I think)
* reset cte server (again!! :-( :-( )
* more sms collection mods
* looked over hidetsugu's data. Looks fine and well formatted. Gotta go convert them now.
* Aug2006PremiaMeeting
* editing
** taslp (edit, send out)
** shiren
* xs
* moderation
* proposals
** cu
* bbq prep (see 27 nov)
** CS profs = 15, IS profs = 1, guests = 1, WING = 10, family = 3.
** sent emails
* dongwon visit prep
* franz visit prep
* yi sok goong and sam sok goong visit prep
* almost finished with airs publication stuff. whew
* slideseer stuff also working out
* did npic poster, finally
* emma icadl paper
* bpowley acl init
* hari invite
* server movement
* group meeting prep
* admin hrs claim
* qiul ext abs jair
* finish basic icadl emma sub
* transfer danth files to rpnlpir
* edit radev's wiki at umich.
* writing getRange.pl for slice extraction.
* canteen stall ~500
* acl anthology
* tomm editing at night
* xinyi defense and slides
* 3243 sent mt
* 5246 lect 5, tut2 send out
* group meeting
* premia newsletter
* editing
** ziheng poster
** aaai web paper (ergin)
** bang's paper
* reviewing
** karthik
** lan man
what am I supposed to do today? Hmm. NPIC has to be brought online, poster must be printed out.
would be really nice to get getSummaryStats working to generate some sample pages for inspection.
Wow it's a month already. Gotta get moving! Yikes!
The new monitors are now working just fine. Yay.
''Lucene updates''
Got lucene to work with webapps
Modified basic script to index titles and search on titles
Modified basic script to show date and path info
Got the highlighter module to work with the contents of the document. Note that the field being highlighted must be stored (Field.Store.YES or Field.Store.COMPRESS)
Got highlighter to work with webapp, sort of.
POI PowerPointExtractor, ridiculously easy to run. But can we do better? Need to check out the rest of it.
* ivor tsang seminar CVM and meeting
* send out updates
* luzheng rec
* 5246 prep, demos, lecture 8 and 9, discussion questions, more on homework #2
* update services
* update chimetext
* group meeting and archiving
* draft hyps
* update hyp.html
* spc ann
* dbs ann fee
* simone los, edited
* duc dung urop call
* ecdl initial editing meeting arrangement.
* parsCit
** parsCit back up: pdf partially working
** crf++ download
** installed locally to cte:~/crf++/example/parsCit
** cvs into kanmy/parsCit
** crf adaptation
** xsl viewing
** template and length output
Trying out a journal. What is it anyways? A blog of sorts?
lmthang x1, ch 5 and abst, title
dac2008 x2
1101 mu lects
3243 demos
book ijcnlp flight bangalore - started
simone book hostel - started
simone 12/13 schedule - started
pmp email
vet rogerz exam: photocopy solutions
* vu updated the rpnlpir page.
* trung fixed the lacie disk, got back disk.
* im meeting with Denny
* cl lab meeting with Wee Sun, Hwee Tou and students
* lecture notes for tomorrow's class done. yay!
* put thumbnail of coursepack.
* one run czppt2txt
* lecture notes should contain MORE EXAMPLES to illustrate for each technique or concept
* homework requirements are NOT VERY CLEAR
* math not explained well enough
* assignment load TOO HIGH
* references need to be prioritized
* should cover some cutting edge methods
* explain terminology better (WordNet, MiniPar)
* should ask questions more clearly
* lecture should finish on time
* need application to real search engine
* want tutorial answers before final
* want tutorial questions and notes earlier
* logistics for course pack can be better (in one spot, with correct materials)
* proofread slides and tutorials
* team assignments
* math a prerequisite compared to hypermedia
textbook adoption form
fix stalled cte: another hdd failure and iptables problem
turn in hari's keys
find graphics key
mtjoseph data slices 397 398 399
udl interviews
sub wangdong
turn in forms for trung and tung
sub hari report
wing group meeting and prep
kymn email
update WING project page
get robots from graphics lab
sub irj
dale seminar
icadl
ijit review
yee fan
lrec
tech report jeprab
tech report isaac
admin meeting
* slideseer buildCoordinatedMedia done
* OAP application for Dongwon to visit in Winter?
* finishing NPIC webpage / demo stuff
* back from hk
* www debrief / chimetext conf preview sched
* jair review printout
* feedback analysis
* vldl
* asiamiles claims
* away on personal leave for past week
* expenses done
* airs2006 archive
* exam moderation and submission
* email and physical mail handling
* ijdl invitations out
* correct proposal
* IP for NPIC
* bill brody meeting (x2)
* 5244 emails / bill brody email / graphreading manual mail / personal emails to alan, tatsuya
* bang proofreading / yee fan reply / xuan's thesis 1-6
* revised related work acm / update
* set up hp printer
* room reservation for graphreading / group meeting / dongwon seminar
* deal with dongwon lee visit wrt hosf
* check timing on the graphreading
* erik recommendation
* prep grad course joint poster session
* d/led hw2 to grade for later
* AIRS 2006 post conference archive ok and email out.
* brought check for hsbc
* some xs
* updating pubs page / new ppt files d/l and converted
* deal with graph reading new location in 15th S16 (04-33) / 22nd in MR 3 (SoC 1 05-28) send out mail
* ask cath about india interns
* emailed lecturers for poster session
* more airs 2006 stuff
* run pdfbox
* finally wrote back to Lee
* updated cvs copies.
* premia stuff
* copying data out of 300 GB transport disk (takes hours/ usb 1 too slow bleack !!)
* mirror acl.ldc.upenn.edu
* met with faezeh
* met with tancl, dpoo wrt hp grant
* first pass at tsv'ing the dblp xml data
* ordered sigmod silver anthology
* thioachie
* MOE oct large-scale / A*star
* 5246
** try making rbr collection
** slides for wk 1
* library visits - reservations, return reserves
* ng hong i def
* buy vga connectors
* simone conv
** friday
* yl conv
** monday, thursday
** updated stuff for pams
** proposal 1 uploaded to pams
** 1pg cv
* ss
** tomcat redev started
** lucene basic text index
* meetings
** hari meetings
** hyp meeting
** group meeting email
** group meeting latex tutorial
all UROP/HYPs out
3243: released Hw2
contact susan silva of ece
Citeseer: finish copying, back up
bleong baby 50
3243 grade corrections to gradebook
traffic fine erp
sent scanned version to simone
Anthology: EACL sftp setup
vldl late notification and reviewer reminders
Reading/Editing
* vldl reviews: x4
* zhaojin grp
* tanyeefa's prop slides
* qiul ijcnlp
send mailed version to simone
acl reviewers
did ora young researcher award
taslp again
dhl reimburse
SIGIR: set papers
tcs: send reports by week end
tsinghua: wingnews
CS 1101:
* tutorial post
* PE
* grade consol, plag
PREMIA: forum post
* finished runs of pdfbox on acl anthology
* danth errands
* citeseer 710 slice pdfbox conversion
* WING admin and group meeting
* 5246 last lecture, demo setup, exam answers, hw2 setup
* RoR book printing
* chime text sched
* claim forms for admins
* emnlp reviews
* ng hong i thesis rev
* cuihang best thesis nom
* mcomp apps: round 1 (28/28) round 2 (9/29)
* read kalpana's chapters (2/4)
* wing / chimetext / wingnews reminders announce
* tancl/chuats area report
* redo duc copyrights commits
* bang / zhiqiang letters
* group meeting
* hendra funding and tutorial and workshops
* hendra's cam ready
ror
udl r2
tata
dyhsu meeting
hang acm diss award
hsbc
basic hari sched
go see doctor
new annote guide for kzl et al x2
icadl printouts prep
wx hyp prop
hari accom round 2
taslp editing round 1
ijhcs review
firstcycle.org renew
chris gil wedding return
widm review
setup 3243 website basic
mmies review
irj review
Rashid M. Abdalla and Simone Teufel
* semi-fixed cue phrases: semi fixed : find syntactic variants
* don't use thesaurus as not strict synonyms
* toss out negation, and usage done by others, previous mention. ''Q'' but why?
do you learn these. ''Q'' how to handle multiple word expressions verbs "narrow down" Is this based on RASP?
* eval on precision at 1.
similar to relation finding. agichtein and gravano, ravichandran and hovy => IE
problem: cannot easily find negative examples
Bollegala et al.
- segment = defined a la text tiling,
- svm to combine data points
- similar to hac in binary bottom-up agglomerative but recovers an ordering as well.
- automatic evaluation is a version of bleu
Masaaki Nagata, Kuniko Saito, Kazuhide Yamamoto and Kazuteru Ohashi
prev work: tillmann-zhang 05
their new model: {monotone,reverse}{adjacent|gap}
gapping wrt the source language.
gapping appears often in p SMT in japanese/english pairs (verb final)
''hendra'' read this. especially in terms of training sentence coverage and reordering and gap histograms
Cheng-Zen Yang, Che-Min Chen and Ing-Xiang Chen
* sarawagi's 03 cross trained svm
* q:
Christoph Tillmann and Tong Zhang
Che Wanxiang (Min Zhang)
Has SRL Demo
dictionary from PropBank and VerbNet
like other SRLs doesn't handle/tag copula as no predicate
use CoNLL dataset (post processed from PropBank)
vector rep of parse tree <# of subtrees of config 1, # of subtrees of config 2... config n>
exponential number of features
use kernel function to solve dot product (after collins / duffy, moschitti 2005?)
the idea: split path info and constituent portion into two feature spaces then linearly combine
problem/observation noted: constituent too big
validate using only wsj sections 2-5
do soft margin classification by tuning C.
my observation: not all subtrees in constituent are useful. they use rule 1 in preprocessing to remove most of constituent tree.
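The kernel trick noted above (computing the dot product over the exponential subtree-count vector without enumerating it) can be sketched as follows. This is a minimal sketch of the plain Collins/Duffy-style subtree kernel only, not the hybrid path/constituent kernel from the talk; the tree encoding ({{{label}}}/{{{children}}}), the helper names and the decay parameter {{{lambda}}} are assumptions for illustration.

```javascript
// Build a node; string arguments become leaf (word) nodes.
function tree(label) {
  var kids = Array.prototype.slice.call(arguments, 1).map(function (c) {
    return typeof c === "string" ? { label: c, children: [] } : c;
  });
  return { label: label, children: kids };
}

// Grammar production at a node, e.g. "NP -> D N".
function production(node) {
  return node.label + " -> " + node.children.map(function (c) { return c.label; }).join(" ");
}

// A preterminal dominates exactly one word.
function isPreterminal(node) {
  return node.children.length === 1 && node.children[0].children.length === 0;
}

// Collect all internal (non-word) nodes of a tree.
function internalNodes(node, out) {
  out = out || [];
  if (node.children.length > 0) {
    out.push(node);
    node.children.forEach(function (c) { internalNodes(c, out); });
  }
  return out;
}

// Weighted count of common subtrees rooted at both n1 and n2.
function commonRooted(n1, n2, lambda) {
  if (n1.children.length === 0 || n2.children.length === 0) return 0;
  if (production(n1) !== production(n2)) return 0;
  if (isPreterminal(n1)) return lambda;
  var score = lambda;
  for (var i = 0; i < n1.children.length; i++) {
    score *= 1 + commonRooted(n1.children[i], n2.children[i], lambda);
  }
  return score;
}

// Kernel value = sum over all node pairs; equals the dot product of the
// (implicit) subtree-count vectors when lambda is 1.
function treeKernel(t1, t2, lambda) {
  var total = 0;
  internalNodes(t1).forEach(function (a) {
    internalNodes(t2).forEach(function (b) { total += commonRooted(a, b, lambda); });
  });
  return total;
}
```

The sketch recomputes node pairs recursively; a real implementation would memoize {{{commonRooted}}} per node pair. Pruning the constituent tree before calling the kernel (as in their rule 1) simply means passing in smaller trees.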
Hung-Ming Yu, Wei-Ho Tsai and Hsin-Min Wang
* background music reduction
* observation: query should match from line start not middle (using BIC)
Yupeng Fu, Rongjing Xiang, Min Zhang, Yiqun Liu and Shaoping Ma
PDD = person description document
Idea: build description of person and then do retrieval on these documents
describe person using keywords
* web listing pages of people form context for each description
* word pair is basically a density based metric?
* based on bm25
q: how about blogs? resume? they say gen web docs not clean.
''got best result in trec enterprise 2005'' on expert finding task.
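Since the PDD ranking is noted as based on bm25, here is a minimal sketch of plain BM25 scoring over tokenized description documents. It is not the authors' word-pair variant; the function name, the token-array document format, the {{{ln(1 + ...)}}} idf variant and the defaults k1 = 1.2, b = 0.75 are assumptions (commonly used values, not taken from the paper).

```javascript
// Score every document (an array of tokens) against the query terms.
// Returns an array of scores in the same order as docs.
function bm25Scores(queryTerms, docs, k1, b) {
  k1 = (k1 === undefined) ? 1.2 : k1;
  b = (b === undefined) ? 0.75 : b;
  var N = docs.length;
  var avgLen = docs.reduce(function (sum, d) { return sum + d.length; }, 0) / N;
  return docs.map(function (doc) {
    var score = 0;
    queryTerms.forEach(function (term) {
      // term frequency in this document
      var tf = doc.filter(function (t) { return t === term; }).length;
      if (tf === 0) return;
      // document frequency across the collection
      var df = docs.filter(function (d) { return d.indexOf(term) !== -1; }).length;
      var idf = Math.log(1 + (N - df + 0.5) / (df + 0.5));
      // length-normalized, saturating tf component
      score += idf * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * doc.length / avgLen));
    });
    return score;
  });
}
```

For expert finding, each {{{doc}}} would be one person description document and the top-scoring documents name the candidate experts.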
Chi-Ho Li, Minghui Li, Dongdong Zhang, Mu Li, Ming Zhou and Yi Guan
handling long distance reordering
use syntax to do this
key idea: generate n-best reordering to be used at decoding time, rather than 1-best
incorporate prob of reordering as another feature in the decoding log linear model
if they use only 1-best, shows negative effect. Needs to use multiple n-best in order to capture
q (dekai wu): trinary productions actually can be explained as binary productions
Toshiyuki Shimizu and Masatoshi Yoshikawa
Benefit: benefit is geq child elements
Effort: independent of query, less or equal to sum of reading effort of child
similar to set cover alg of shiren in summarization
Dmitry V. Khmelev and William J Teahan
- In SIGIR '03
- highlighted, printed and filed
- related to plagiarism detection, webpage similarity, corpus verification, PARCELS.
Simple repetition of text substrings for plagiarism and duplicate detection. The formula involves computing a concatenated suffix array for an entire set of documents. The idea is to use not only the single longest common substring but a sum of the longest common substrings across all prefixes of a target document.
The R measure is apparently good not just for duplicate detection but also for authorship detection in the test corpora demonstrated in their paper.
To think about: how to adapt this measure to have an effective (and speedy) tool for web page fragment classification.
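A naive sketch of the repetition measure described above, to make the sum concrete. Assumptions: for each starting position in the target, "longest common substring" is read as the longest prefix of the remaining text that occurs somewhere in the reference documents; the sum is normalized by its maximum l(l+1)/2; and plain {{{indexOf}}} scans stand in for the concatenated suffix array, so this is quadratic and only good for small inputs. Function names are made up.

```javascript
// Length of the longest prefix of `suffix` that occurs in any reference doc.
function longestRepeatedPrefix(suffix, docs) {
  var best = 0;
  // If a prefix of length k occurs, every shorter prefix occurs too,
  // so we can stop at the first miss.
  for (var len = 1; len <= suffix.length; len++) {
    var piece = suffix.slice(0, len);
    var found = docs.some(function (d) { return d.indexOf(piece) !== -1; });
    if (!found) break;
    best = len;
  }
  return best;
}

// Normalized repetition score in [0, 1]: 1 means the whole target is
// repeated in the references, 0 means no character is shared.
function rMeasure(target, docs) {
  var l = target.length;
  if (l === 0) return 0;
  var total = 0;
  for (var i = 0; i < l; i++) {
    total += longestRepeatedPrefix(target.slice(i), docs);
  }
  return (2 * total) / (l * (l + 1));
}
```

For web page fragments, {{{target}}} would be one fragment and {{{docs}}} the labeled fragments of one class; the class with the highest score wins. Speed would then depend on replacing the scans with a real suffix array.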
too many online teaching resources
* instructional architect (IA)
* lets people wrap text around nsdl resources
small grain resource can fit in, require teachers to use/wrap text around, improvise
large grain resources don't fit as well.
Guoping Hu, Jingjing Liu, Hang Li, Yunbo Cao, Jian-Yun Nie and Jianfeng Gao
different features across different entity search types.
4 features in their approach:
1) word features (tf*idf); 2) position features; 3) title; 4) structure (tree / section processing).
Jin-Kyu Park, Eenjun Hwang and Yunyoung Nam
Do CBIR for (tree) leaf images
Other ways:
* leaf contour (perimeter)
* center contour distance (distance from center to edge)
Instead, use leaf vein shape and contour to do CBIR. Extract vein contour shape.
do corner detection to detect intersection/branching and ending point.
li haizhou 1:50-2:15
min zhang 2:30-2:55
mstislav maslennikov 2:55-3:20
Min Zhao, Hang Li, Adwait Ratnaparkhi, Hsiao-Wuen Hon and Jue Wang
for metasearching ranking of different search results
* use standard bm25
* also use click through distribution to rank, learn by NB.
* probably not too helpful as-is on web queries where noise is a larger concern.
Zhaoqi Chen, Dmitri Kalashnikov, Sharad Mehrotra
-use entity relationship graph as well as intrinsic sim
-do consolidation
-handles robustness
www.ics.uci.edu/~dvk
S Kriewel and N Fuhr
good list of toolbox for doing acad search, need to think about these for auto methods
- what suggestions didn't work?
- explaining instead?
- q: sparseness, strategy for ff extension for google
- firefox extension
It's a reading day!! Hooray!
printed out most related work on alignment.
going to read them now -- wow mostly are from Japan on the PRESRI system, it seems.
It's still surprisingly hard to find relevant papers, even for this project. Gotta think about how to find appropriate venues for searching.
Let's see (I should really dump these to citeulike but I'm not going to bother):
* Automatic Slide Generation Based on Discourse Structure Analysis (IJCNLP 2005) - Shibata, Kurohashi: deep nlp on raw text (not necessarily scholarly texts) - discourse analysis (intra, inter sentence). It's really summarization since they just simply split the resulting text into multiple slides after every 12th line.
* Automatic Slide Presentation from Semantically Annotated Documents: Utiyama, Hasida (ACL Coref workshop) - uses Global Document Annotation (GDA) tags; GDA approximates today's task of semantic role labeling. Uses topic detection by non-stopword bigrams and a frequency threshold of 2. Then applies spreading activation over the network (syntactic stuff represents links). Slide generation is a bit more interesting. Namely, they use redundancy removal, coref pronominalization and other editing to make the slide more fluent.
Note that neither of these approaches explicitly uses corpus information, which we do.
Genre detection:
* Automatic Detection of Survey Articles (ECDL 05)- Nanba, Okumura: use pHITS plus text features such as title word, cue phrases, and their own citation types. Best non pHITS features are cue and title words.
Hmm, how about alignment itself? Let's start with slide/paper. Note that Hayama et al. reference quite a bit of other work but all of it is in Japanese. Help!
* ''gotta read this one again!'' Alignment between a Technical Paper and Presentation Sheets Using a Hidden Markov Model - Hayama, Nanba and Kunifuji (AMT 05): Improves Jing's model by using content analysis. They observe three problems with Jing's model: 1&2) deleted and added words cause problems for HMM (this was observed by Jing herself; see summary below), 3) Similar word sequences happen very often in slides, causing problems with prob estimation using Jing's heuristic rules. They improve the approach by:
** degree of alignment: using the slide as a whole bag of words rather than as a word sequence. this is a good idea, similar to doing simple vsm (only on the slide side though).
** considering match sequence length: a longer sentence match should match better. I think this is better encoded as an alignment gap penalty, something like affine gaps.
** alignment using position gaps: their position gap constraint is a bit like our diagonal constraint. I wonder whether it helped much.
** using heuristic rules for titles: title words get a bonus.
However in the end, they only get a minute improvement, over 49 presentation/document pairs.
* Detection and Resolution of References to Meeting Documents - Popescu-Belis and Lalanne (MLMI 05): using anaphora resolution to do alignment. not much said here that really links their technique to performance. Gotta go read their ICDAR 2005 paper and their CIKM 2004 paper.
* Using a Bi-modal Alignment and Clustering Techniques for Documents and Speech Thematic Segmentation: Mekhaldi, Lalanne and Ingold (CIKM 04): they are actually considering the opposite problem - ''improving'' theme segmentation (a la TextTiling) using bi modal alignment information. They do single modality thematic segmentation first, then do similarity calculation across //nm// units. Note that their segmentation is done simultaneously across both media. Then clustering via k means and reprojection to each single media.
And for abstract to paper?
* Using Hidden Markov Modeling to Decompose human-Written Summaries - Jing (CL 02): (according to Hayama et al): generates a HMM from the word sequence in the summary to predict the position / occurrence of corresponding word in the source document. Only considers lexically identical words. Considers 6 possible alignments, giving higher probability to heuristically more plausible alignment. 1) same sentence, adjacent word, 2&3) same sentence, but next or back words, 4) within window of next sentence, 5) within window of back sentence 6) otherwise. Problems with inserted words (corrected by postediting phase, section 3.4) by finding isolated words that have been misaligned.
* SpectralClustering - one problem here is that it assumes no direction, doesn't (naturally) model the fact that the paper is the source of the presentation.
* JumpAndRewrite - PBHMM phrase based HMM by Daume and Marcu; Jing
Implementations so far:
* text sim methods implemented: jaccard, bigram + unigram jaccard, bigram only jaccard, cosine with only TF, unigram
* align methods: jing with majority rounding, max.
* mmStats: records relative jump probabilities including na probabilities as in daume + marcu's work
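For reference, the text sim methods listed above can be sketched as below. This is a minimal sketch, not the actual project code: the tokenizer, function names and the choice to lowercase are assumptions.

```javascript
// Lowercase and split on non-alphanumerics.
function tokens(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(function (t) { return t.length > 0; });
}

// Word n-grams joined with a space, e.g. n = 2 gives bigrams.
function ngrams(toks, n) {
  var out = [];
  for (var i = 0; i + n <= toks.length; i++) out.push(toks.slice(i, i + n).join(" "));
  return out;
}

// Count occurrences of each item (prototype-free object avoids key clashes).
function termFreq(items) {
  var tf = Object.create(null);
  items.forEach(function (x) { tf[x] = (tf[x] || 0) + 1; });
  return tf;
}

// Jaccard over the sets of items: |A and B| / |A or B|.
function jaccard(itemsA, itemsB) {
  var a = termFreq(itemsA), b = termFreq(itemsB), inter = 0, union = 0, k;
  for (k in a) { union++; if (b[k]) inter++; }
  for (k in b) { if (!a[k]) union++; }
  return union === 0 ? 0 : inter / union;
}

// Cosine similarity with raw term frequency weights (no idf).
function cosineTf(itemsA, itemsB) {
  var a = termFreq(itemsA), b = termFreq(itemsB), dot = 0, na = 0, nb = 0, k;
  for (k in a) { na += a[k] * a[k]; if (b[k]) dot += a[k] * b[k]; }
  for (k in b) nb += b[k] * b[k];
  return (na === 0 || nb === 0) ? 0 : dot / Math.sqrt(na * nb);
}

// Bigram + unigram Jaccard: pool both n-gram orders before comparing.
function unigramBigramJaccard(textA, textB) {
  var ta = tokens(textA), tb = tokens(textB);
  return jaccard(ta.concat(ngrams(ta, 2)), tb.concat(ngrams(tb, 2)));
}
```

Each method maps a (slide, paper segment) pair to a score; the align methods then pick the segment per slide from these scores.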
Implementation must also handle non-aligned slides.
* Example
* Slides at the end of a presentation, backups
* Outline slides
* Conclusion and question slides.
Frederick G. Kilgour
JASIST 51(1):74-80, 2004
- Relevant to: Known Item queries, query rewriting
- Printed and Filed
- Available at: LINC
This paper goes into historical detail on past query retrieval studies on known items. Kilgour investigates known-item query studies from the era of card catalogs. Some notable results distilled from this survey of earlier work include facts useful for our current study of known item queries: Tagliacozzo et al. (1970) note that users had a higher likelihood of having correct title information than correct author information. Also, title searches are more common in today's OPAC than in the older card catalog systems, although I concur with Kilgour that this is largely an artifact of only having limited title entries in the card catalog system.
Torsten Zesch and Iryna Gurevych
wikipedia category graph wcg
conclude small world, scale free graphs
wikipedia categories "mostly" organized by hierarchy - how to distinguish?
Jovan Popovic
give higher-level primitives for creation of shapes
edit at higher-level
capture shapes and 3d at higher level
Penelope Sanderson, Queensland
regularization to prevent overfitting in language model for unseen data.
regularization by lasso methods = optimize versus a loss function
Boosting is a greedy algorithm that is prone to overfitting. Employ shrinkage to minimize this problem.
Incorporate regularization into Boosting by proposing Boosting Lasso (BLasso): forward and backward steps, where backward steps allow model simplification while continuing to minimize the Lasso loss.
Agenda:
Look up Koh Hian Chye
Techsource partnership?
copy premia website to laptop to bring to agm
wording for certificate for student ICPR awards
nominate people
jenny lim from STB
6:30 arrive
Elena Filatova, Vasileios Hatzivassiloglou and Kathleen McKeown
contributions
* create an evaluation measure for domain template extraction
1. verb centric: starting point is to identify most important verbs using frequency based methodologies. Verb instance frequency (VIF, something like tf.idf for verbs wrt classes). how about RF for text categorization or log odds?
2. IR find all sent with top k (= 50) sentences and parse for syntax
3. mine out trees, looking for syntactic agent/patient. ok, trying to recover semantic roles w/o semantic role labeler. a bit odd.
4. generalize tags (didn't do this for verbs so it's also a bit strange). But do it for subject, object and other roles in a two step procedure => 1. sub NE for general tags 2. merge frequent subtrees
5. union all tuples to form the scenario template. ax out any that are not specific to the domain.
the template extracted have ordering constraints (because they didn't use a role labeler) and seem to consist of 2 tuples only. they call these ''slot structures''.
* can we get their training and testing data sets for comparison?
Lee et al. (UCLA group)
* info vs nav only
* used cs queries originating from UCLA only (as group is most knowledgeable about own queries)
* used click distribution and anchor text distribution
** click distribution with thresh \tau = 1.5 gives 80% accuracy. (Note most misclassification occurs with info queries here.)
** anchor text with thresh \tau = 1.0 gives 75% accuracy. (Note most misclassification occurs with nav queries here.)
* also has summary of kang and kim's work (anchor usage rate, query term distribution and term dependence)
Simone Teufel, Advaith Siddharthan and Dan Tidhar
* citation function + ''short summary''
* annotation guidelines done w/o rt domain knowledge
* 12 cats
* sims not frequent, as contrast is then expected
''BUG'' - how about higher level support only ?
* anaphora? sentence discourse effects?
* bonnie: domain specific
* john prage: information digitalization/visualization
Ben Wellner and James Pustejovsky
prob: find head of two args of a discourse connective
uses penn discourse tree bank (PDTB)
heuristic pruning for cand selection to keep complexity down
log linear rerank model integrating simple model for both arguments independently then re-ranking
diff error types for arg1 arg2.
SN Koh
robust speech recognition, denoising
AURORA corpora
future work: bilingual speech recognition
seems to be work joint with li haizhou
This directory contains scripts to start and stop the cs system as well as invoke the various daemons that run facilities for cs.
* start-citeseer
* stop-citeseer: the following three services do not stop when cs is stopped through this script. I adjusted this script to change it to kill off queryd, rankd and dlLimiterd as well.
* queryd
* rankd
* dlLimiterd
The speedy cgi caches the perl scripts and does not (seem to) notice changes unless all of cs is shut down. You have to stop-citeseer and start-citeseer for changes in BinDirectory or LibDirectory to be reflected.
Wei Wang, Kevin Knight and Daniel Marcu
GHKM Galley et al. 2004
problem with ptb trees being n trees and arguments that don't exactly match
not rule binarization, but actually binarization of source lang trees
have to decide which way (l/r) to do binarization, or since syntax based, try head based binarization.
their sol'n: do all binarization, save in a forest
yuck: they have forest based alignment to adapt GHKM.
better idea: find best using adapted EM
seems a bit like compensation artifact of needing to use CKY binarization constraint
Class Association Rules (CAR) but with min support and min confidence pruning turned off.
reason: people want to see the context of the rules
helpful in motorola case study
showed engineers want to see actionable rules = short, two to three attribute rules with trend analysis
Makes me think of limits of human perception. Bing asked us about the expressibility of the hypothesis space given that his rules only give about 2-3 attribute rules. I think classic ML theory deals with hypothesis space quite well. Am I mistaken?
Naveen Sivadasan
large corpus, 15M abstracts, .5M citations added/yr
given corpus, extract interaction of prot, molecule, enzyme
has workflow
search for experimental conditions, methods
relevant patterns
search / extract / ask q / find essence / ''correlate''
over increasing vol of data
over diff source of data
under time pressure
domain knowledge presentations in 2nd day
Franz Och (Google)
* auto scoring helps community / using standard corpora
** best mt system achieve near- or beyond-human bleu scores (google arabic english translation 110% of human, but human translation actually nice).
** bleu score favors statistical systems - only large improvements are statistically significant - 25k word test corpora.
* standard model architecture - 1) stat word alignment 2) phrase based smt 3) log linear feature model 4) discriminative training to optimize against eval metric
* to date: nlp systems not used - hurts performance, really hard to integrate (wu&carpuat05 -> doesn't help to use WSD)
* current problems (!!)
** named entities
** dictionaries/data - OOV
** morphology
** syntax - wrong dependencies
** word alignment
* translation models: 100 m to 1 b, language model (monolingual) 5 b > 1 t words
* data scaling
** 2x monolingual + .5 BLEU, 2x parallel + 2.5 BLEU (% BLEU)
* google cluster infrastructure - barroso et al IEEE Micro volume 23, issue 2 march 2003.
* ''language model requires only few number of bits?? (down to 4 bits)''
* target for rare features
* sentence specific LMs - project LM or phrase table to target sentence.
MapReduce apps:
* lm training
* EM training for word alignment
* phrase extraction
Diversity versus Universality (Tsuji): sentence not as significant in Asian languages
wa, koso, sura, dake, mo (case and topic/theme marking together in one particle) => topic packing different?
Asian anaphoric ref and other discourse refs don't map well to English. semantics of "sentence" conditional on context (more naturally occurring in CJK)
Linguistic Differences for NLP (Tsou): higher entropy. genitive construction -> ambiguity.
Corpora (Bhattacharaya): morphological mining: syncretism (exhibiting same surface form for different cases). syncretism dealing with ambiguity / WSD. homology. Idea -> deal with less entropic languages first.
(Bird): digital divide in asian languages. interesting students in native language topics. publication quality for applications in different languages.
(Calzolari): Basic Language Resource Kit.
(Maxwell): minor languages deserve special session.
* children want entirely different way to read books -> scrollable interface
* survey 12 children and supporting adults
* have available 152 transcripts and tech report
* many languages make it difficult for an interface
* preferred reading physical books but use ICDL for searching (especially since device too expensive to use, operate and insure)
transition to use the book or to use the device?
Chao Wang, Michael Collins and Philipp Koehn
syntactic reordering. rule based system with weights?
local reordering not very effective in terms of improvements in BLEU, perhaps due to phrase table capturing it.
did a study of which rules are applied.
Stanford - Stat MT, NER, Speech Reco (Bayesian Inference Models)
New project: looking at collaboration networks and analyzing work for seeing whether interdisciplinary centres are really worth $$$
- coauthor ship
- textual analysis
Pascal RTE participation - slide 3
* not just textual overlap
s4:
~tilde operator (semantically similar)
* can we build such tools for sentences, paragraphs?
parse tree to dependency parse relations
Current work:
* predicate argument structure
* natural logic => very cool (broadening ok, narrowing ok in negative contexts)
* downward monotonicity comes from negation in implicit contexts too.
s45 give to qiul natural logic
--------------
evidence based retrieval
complexity of analysis
data structures for indexing
how to increase coverage
CiteSeer (hereafter, cs) is divided into several directories. The main ones are bin/, db1/, lib/, papers/. Several of the directories in our distribution are empty.
cs is written entirely in perl, and has a lot of legacy code. It has been refactored into a number of perl modules, which are contained in the lib/ subdirectory.
We can look at each of these directories separately:
- BinDirectory: scripts for starting, shutting down and running maintenance on the cs system.
- DB1Directory: databases for keeping the forward and backward links used by the system. About 50 gbs or so.
- LogDirectory: logs of the searches and other things that happen (index, spidering events) in citeseer.
- LibDirectory: common perl libraries for running things in bin/.
- PapersDirectory: the bulk of cs, cached copies of all the papers it indexes that it kept. About 1.8 tbs.
To get an idea of what happens when a query is issued you might also want to look at the tiddlers that are tagged as {{{walkthrough}}}. They walk through one screen and focus on the procedure calls that fetch the information that is output to the screen.
Rainer Lienhart and Alexander Hartmann
J Electronic Imaging 11 (4), 1-0 (Oct 2002)
Available: in paper only, LINC has some problems getting this through OpenURL
Relevant to: image classification, non-photographic image classification
Printed, highlighted, and filed.
The authors examine web image classification over a database of 300,000 images. They divide the non-image categories into presentation slides, comics/cartoons and other. Our classification for Fei's project is a bit more comprehensive but not motivated by corpus study.
The work is a machine learning feature oriented work, achieving high accuracy using simple image only (raster based) features. Colormap and proportion of the picture wrt the colormap seem to be some of the most salient features.
For the top level photo/non-photo classification, AdaBoost was used (similar to our work) and feature pruning is inherently done through decision stump feature selection. The highlighted features show that the four features for classification include: 1) total colors 2) what is the prevalent color 3) fraction of pixels with distance > 0 (f1) and 4) ratio of f1/f2, where f2 is similar to 3) but using a high threshold rather than zero. Surprisingly, edge detection (an expensive feature) doesn't appear to be too useful. All selected features were based on the colormap and not on the locality / placement of the pixels in the image. Dimension features were not used.
For the non-photo classification, text proves to be an important feature, and they capitalize on their group's previous research to detect text. Here, edge detection proves to be the second most useful feature after aspect ratio, which according to Table 3, accounts for over 95% accuracy. This leads me to believe that an optimized Hough transform for only vertical lines may be able to be used, to lower the complexity of the feature extraction. Also, presentation slides exported from powerpoint and others might be detectable by their embedded metadata rather than raster data properties.
Strengths:
- demonstrates that colormap features are a very strong key for non-photograph image classification.
- also does some error analysis that illustrates some borderline cases.
- uses only jpeg compressed images for their study.
Weaknesses:
- no information about the kappa or percentage agreement between assessors. It is presumed that the task is easy and 100% doable.
- 95% + accuracy in non-photograph classification only subdivides into two classes: comics vs. presentation slides.
Katsuro Inoue (Osaka Univ.)
clone can be motivated by efficiency (copy code rather than proc call which is more expensive)
* type 1 - identical
* type 2 - as given example (clones can have different identifiers but largely are structurally identical)
* type 3 - semantically sim, but syntax very diff
* Program Dependency Graph (PDG) vs. AST.
** sub-graphs are identified as code clones. R Komondoor and S Horwitz Using slicing to identify duplication in source code ISSA
* AST ref: ID Baxter A Yahin et al. Clone Detection Using Abstract Syntax Trees ICSM 98. - made commercial tool: CloneDR.
** Shortcoming: syntax of two progs need to be syntactically comparable
* Metrics
** by binning - J Mayrand, C Leblanc and EM Merlo Experiment on the automatic detection of function clones in a software system using metrics
* Token-based
** T Kamiya et al. see ToSE CCFinder - canonicalize identifiers then index using suffix array to find repeated. finds lcs subsequences. http://ccfinder.net/ccfinderx.html
** Libra - IR system for source code fragments - also SparsJ
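To make the token-based idea concrete (canonicalize identifiers, then index with a suffix array to find repeats), here is a toy sketch. It is not CCFinder's implementation: the tokenizer regex, the keyword list, the {{{$id}}}/{{{$num}}} placeholders and the naive sort-based suffix array are all assumptions for illustration, and it only compares neighbouring suffixes.

```javascript
var KEYWORDS = ["if", "else", "for", "while", "return", "var", "function"];

// Split source text into identifiers, numbers and single punctuation chars.
function tokenize(src) {
  return src.match(/[A-Za-z_]\w*|\d+|\S/g) || [];
}

// Replace identifiers and numbers with placeholders so renamed copies match.
function canonicalize(toks) {
  return toks.map(function (t) {
    if (/^[A-Za-z_]\w*$/.test(t)) return KEYWORDS.indexOf(t) === -1 ? "$id" : t;
    if (/^\d+$/.test(t)) return "$num";
    return t;
  });
}

// Lexicographic comparison of the token suffixes starting at i and j.
function compareSuffixes(toks, i, j) {
  while (i < toks.length && j < toks.length) {
    if (toks[i] !== toks[j]) return toks[i] < toks[j] ? -1 : 1;
    i++; j++;
  }
  // the shorter suffix sorts first
  return (i === toks.length ? 0 : 1) - (j === toks.length ? 0 : 1);
}

// Number of leading tokens shared by the suffixes at i and j.
function commonPrefixLength(toks, i, j) {
  var n = 0;
  while (i + n < toks.length && j + n < toks.length && toks[i + n] === toks[j + n]) n++;
  return n;
}

// Sort all suffix start positions, then report neighbouring suffixes that
// share at least minLen tokens as clone candidates.
function findClones(toks, minLen) {
  var sa = toks.map(function (_, i) { return i; });
  sa.sort(function (i, j) { return compareSuffixes(toks, i, j); });
  var clones = [];
  for (var k = 1; k < sa.length; k++) {
    var len = commonPrefixLength(toks, sa[k - 1], sa[k]);
    if (len >= minLen) {
      clones.push({
        first: Math.min(sa[k - 1], sa[k]),
        second: Math.max(sa[k - 1], sa[k]),
        length: len
      });
    }
  }
  return clones;
}
```

Because identifiers are canonicalized, this catches type 1 and type 2 clones from the list above; type 3 clones need the PDG or metric approaches. Reported matches may overlap, which a real tool would filter.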
Mike Goffley - Maintenance
* also see cp-miner ToSE in march 2006 and MOSS Aiken's UCBerkeley system
Background: #ffc
Foreground: #000
PrimaryPale: #fc8
PrimaryLight: #f81
PrimaryMid: #b40
PrimaryDark: #410
SecondaryPale: #ffc
SecondaryLight: #fe8
SecondaryMid: #db4
SecondaryDark: #841
TertiaryPale: #e88
TertiaryLight: #c66
TertiaryMid: #944
TertiaryDark: #633
Vasudeva Varma
With Demo by Prasad Pingale
Encoding ingest to Unicode8
(2nd byte preserves phonetic similarity) artifact of Unicode
Search in CLIR
Indian Lang have larger alphabet (all phonetically)
Lots of spelling variations
/***
|''Name:''|CryptoFunctionsPlugin|
|''Description:''|Support for cryptographic functions|
***/
//{{{
if(!version.extensions.CryptoFunctionsPlugin) {
version.extensions.CryptoFunctionsPlugin = {installed:true};
//--
//-- Crypto functions and associated conversion routines
//--
// Crypto "namespace"
function Crypto() {}
// Convert a string to an array of big-endian 32-bit words
Crypto.strToBe32s = function(str)
{
var be = Array();
var len = Math.floor(str.length/4);
var i, j;
for(i=0, j=0; i<len; i++, j+=4) {
be[i] = ((str.charCodeAt(j)&0xff) << 24)|((str.charCodeAt(j+1)&0xff) << 16)|((str.charCodeAt(j+2)&0xff) << 8)|(str.charCodeAt(j+3)&0xff);
}
while (j<str.length) {
be[j>>2] |= (str.charCodeAt(j)&0xff)<<(24-(j*8)%32);
j++;
}
return be;
};
// Convert an array of big-endian 32-bit words to a string
Crypto.be32sToStr = function(be)
{
var str = "";
for(var i=0;i<be.length*32;i+=8)
str += String.fromCharCode((be[i>>5]>>>(24-i%32)) & 0xff);
return str;
};
// Convert an array of big-endian 32-bit words to a hex string
Crypto.be32sToHex = function(be)
{
var hex = "0123456789ABCDEF";
var str = "";
for(var i=0;i<be.length*4;i++)
str += hex.charAt((be[i>>2]>>((3-i%4)*8+4))&0xF) + hex.charAt((be[i>>2]>>((3-i%4)*8))&0xF);
return str;
};
// Return, in hex, the SHA-1 hash of a string
Crypto.hexSha1Str = function(str)
{
return Crypto.be32sToHex(Crypto.sha1Str(str));
};
// Return the SHA-1 hash of a string
Crypto.sha1Str = function(str)
{
return Crypto.sha1(Crypto.strToBe32s(str),str.length);
};
// Calculate the SHA-1 hash of an array of blen bytes of big-endian 32-bit words
Crypto.sha1 = function(x,blen)
{
// Add 32-bit integers, wrapping at 32 bits
var add32 = function(a,b)
{
var lsw = (a&0xFFFF)+(b&0xFFFF);
var msw = (a>>16)+(b>>16)+(lsw>>16);
return (msw<<16)|(lsw&0xFFFF);
};
// Add five 32-bit integers, wrapping at 32 bits
var add32x5 = function(a,b,c,d,e)
{
var lsw = (a&0xFFFF)+(b&0xFFFF)+(c&0xFFFF)+(d&0xFFFF)+(e&0xFFFF);
var msw = (a>>16)+(b>>16)+(c>>16)+(d>>16)+(e>>16)+(lsw>>16);
return (msw<<16)|(lsw&0xFFFF);
};
// Bitwise rotate left a 32-bit integer by 1 bit
var rol32 = function(n)
{
return (n>>>31)|(n<<1);
};
var len = blen*8;
// Append padding so length in bits is 448 mod 512
x[len>>5] |= 0x80 << (24-len%32);
// Append length
x[((len+64>>9)<<4)+15] = len;
var w = Array(80);
var k1 = 0x5A827999;
var k2 = 0x6ED9EBA1;
var k3 = 0x8F1BBCDC;
var k4 = 0xCA62C1D6;
var h0 = 0x67452301;
var h1 = 0xEFCDAB89;
var h2 = 0x98BADCFE;
var h3 = 0x10325476;
var h4 = 0xC3D2E1F0;
for(var i=0;i<x.length;i+=16) {
var j,t;
var a = h0;
var b = h1;
var c = h2;
var d = h3;
var e = h4;
for(j = 0;j<16;j++) {
w[j] = x[i+j];
t = add32x5(e,(a>>>27)|(a<<5),d^(b&(c^d)),w[j],k1);
e=d; d=c; c=(b>>>2)|(b<<30); b=a; a = t;
}
for(j=16;j<20;j++) {
w[j] = rol32(w[j-3]^w[j-8]^w[j-14]^w[j-16]);
t = add32x5(e,(a>>>27)|(a<<5),d^(b&(c^d)),w[j],k1);
e=d; d=c; c=(b>>>2)|(b<<30); b=a; a = t;
}
for(j=20;j<40;j++) {
w[j] = rol32(w[j-3]^w[j-8]^w[j-14]^w[j-16]);
t = add32x5(e,(a>>>27)|(a<<5),b^c^d,w[j],k2);
e=d; d=c; c=(b>>>2)|(b<<30); b=a; a = t;
}
for(j=40;j<60;j++) {
w[j] = rol32(w[j-3]^w[j-8]^w[j-14]^w[j-16]);
t = add32x5(e,(a>>>27)|(a<<5),(b&c)|(d&(b|c)),w[j],k3);
e=d; d=c; c=(b>>>2)|(b<<30); b=a; a = t;
}
for(j=60;j<80;j++) {
w[j] = rol32(w[j-3]^w[j-8]^w[j-14]^w[j-16]);
t = add32x5(e,(a>>>27)|(a<<5),b^c^d,w[j],k4);
e=d; d=c; c=(b>>>2)|(b<<30); b=a; a = t;
}
h0 = add32(h0,a);
h1 = add32(h1,b);
h2 = add32(h2,c);
h3 = add32(h3,d);
h4 = add32(h4,e);
}
return Array(h0,h1,h2,h3,h4);
};
}
//}}}
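A quick way to sanity-check Crypto.hexSha1Str outside the browser is to compare it against a reference SHA-1. The sketch below assumes Node.js is available; `referenceHexSha1` is a hypothetical helper name, not part of the plugin. It uses the "latin1" input encoding because the plugin masks each character with `&0xff`, so the two should agree only when hashing one byte per character.

```javascript
// Assumption: running under Node.js; `crypto` is Node's built-in module, not TiddlyWiki's.
const crypto = require("crypto");

// Hypothetical helper: uppercase hex SHA-1, matching the alphabet used by be32sToHex.
function referenceHexSha1(str) {
  // "latin1" keeps the low byte of each character, mirroring `charCodeAt(j) & 0xff`.
  return crypto.createHash("sha1").update(str, "latin1").digest("hex").toUpperCase();
}

console.log(referenceHexSha1("abc"));
// → A9993E364706816ABA3E25717850C26C9CD0D89D
```

The value printed for "abc" is the standard SHA-1 test vector, so Crypto.hexSha1Str("abc") should produce the same string.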
update summary task focus
lm background model building by waterloo
umd
multiple alternative sentences compressions - like fergus
using grammar as restrictor
mmr version accounts for synonyms - use e-f-e bridge for paraphrase generation with frequency
eval
pyramids - what is included what is not good for error checking
still quite big gap between top pyramid and gaps?
some way to see which summaries really hard
d0739 - eg. really hard to id
2008: every November, co-located with and preceding TREC (monday and tuesday)
- submissions due in summer
- held at Gaithersburg
- collab with IR researchers
- send hoa email about any related duc work just citation not pdf
- biggest change - qa comes to duc from trec
- entailment / inferencing track?
- main task dropped
- update task to become primary task, to include noisy documents in 2009
- longer sequence is really difficult because event evolution is generally short
- each cluster would still have same amount (eg 10) articles per cluster
- nist assessors to do nugget based / pyramid based annotation and scoring
- majority cost is in making 4 manual summaries
pilot task
- blogs opinions = doc is a post plus comments on it
- univ of glasgow
- clustering opinions and summarizing them
- maybe a classification task? IE task rather than summarization generation task?
- assign wangxuan to do this task?
- how to create a summary of a blog from nist? lucy asked about this.
- paul jones ibiblio
john conroy did a linear fit of responsiveness against rouge (automatic). if we improve rouge, do we improve content responsiveness?
lucy: reflection on duc / post duc analysis
what differentiates work in duc vs trec?
- we do assessment without assessment pooling
- we do fluency evaluation
Transformation based dependency parsing
transform trees for the purpose of learning to better help replication.
* change prague style annotation to melcuk style annotations. melcuk style annotation gives more tree dependency encoding, which might help in learning trees.
* focus on coordination and verb groups
originated in prog language - aho 1969
2006 had some papers for nl to lf
- pruning n best list by entropy? nah - agreement, systemic syntax problems. David says introduce syntax.
- rank3 trick from both sides?
- get cites for synch cfg for pls.
Chaomei Chen et al. (Drexel)
information overload - find who's at the forefront of the research
accounting for the timeliness of information is the key aspect of this work
especially when dealing with large datasets as in astronomy (gray&szalay 2004) sloan digital sky survey (sdss)
ackerman study why key areas explode:
* increase in multi author paper
* presence of one or small group of seminal papers
Use '''H index''' to classify, but modify to include H_c H_t
* where S_t measures recent impact by weighting relatively recent citations more heavily than earlier ones
* where S_c is adjusted for publication age
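A minimal sketch of the plain H index and an age-adjusted variant in the spirit of S_c. The weighting `gamma * age^(-delta) * citations`, the parameter names, and the sample data are assumptions for illustration, not taken from the talk.

```javascript
// Plain h-index: largest h such that h items have a score of at least h.
function hIndex(scores) {
  const sorted = [...scores].sort((a, b) => b - a); // descending, numeric
  let h = 0;
  while (h < sorted.length && sorted[h] >= h + 1) h++;
  return h;
}

// Assumed age adjustment: older papers have their citation counts discounted.
// gamma and delta are hypothetical tuning parameters.
function ageAdjustedScores(papers, currentYear, gamma = 1, delta = 1) {
  return papers.map(
    (p) => gamma * Math.pow(currentYear - p.year + 1, -delta) * p.citations
  );
}

// Made-up data: three old, well-cited papers and one new paper.
const papers = [
  { year: 2000, citations: 40 },
  { year: 2000, citations: 30 },
  { year: 2000, citations: 20 },
  { year: 2007, citations: 2 },
];

console.log(hIndex(papers.map((p) => p.citations)));   // → 3
console.log(hIndex(ageAdjustedScores(papers, 2007)));  // → 2
```

With these made-up numbers the age discount lowers the index from 3 to 2, because the older papers' counts shrink while the recent paper keeps its full weight.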
use burst detection, only looking at surface text
/***
|''Name:''|DeprecatedFunctionsPlugin|
|''Description:''|Support for deprecated functions removed from core|
***/
//{{{
if(!version.extensions.DeprecatedFunctionsPlugin) {
version.extensions.DeprecatedFunctionsPlugin = {installed:true};
//--
//-- Deprecated code
//--
// @Deprecated: Use createElementAndWikify and this.termRegExp instead
config.formatterHelpers.charFormatHelper = function(w)
{
w.subWikify(createTiddlyElement(w.output,this.element),this.terminator);
};
// @Deprecated: Use enclosedTextHelper and this.lookaheadRegExp instead
config.formatterHelpers.monospacedByLineHelper = function(w)
{
var lookaheadRegExp = new RegExp(this.lookahead,"mg");
lookaheadRegExp.lastIndex = w.matchStart;
var lookaheadMatch = lookaheadRegExp.exec(w.source);
if(lookaheadMatch && lookaheadMatch.index == w.matchStart) {
var text = lookaheadMatch[1];
if(config.browser.isIE)
text = text.replace(/\n/g,"\r");
createTiddlyElement(w.output,"pre",null,null,text);
w.nextMatch = lookaheadRegExp.lastIndex;
}
};
// @Deprecated: Use <br> or <br /> instead of <<br>>
config.macros.br = {};
config.macros.br.handler = function(place)
{
createTiddlyElement(place,"br");
};
// Find an entry in an array. Returns the array index or null
// @Deprecated: Use indexOf instead
Array.prototype.find = function(item)
{
var i = this.indexOf(item);
return i == -1 ? null : i;
};
// Load a tiddler from an HTML DIV. The caller should make sure to later call Tiddler.changed()
// @Deprecated: Use store.getLoader().internalizeTiddler instead
Tiddler.prototype.loadFromDiv = function(divRef,title)
{
return store.getLoader().internalizeTiddler(store,this,title,divRef);
};
// Format the text for storage in an HTML DIV
// @Deprecated: Use store.getSaver().externalizeTiddler instead
Tiddler.prototype.saveToDiv = function()
{
return store.getSaver().externalizeTiddler(store,this);
};
// @Deprecated: Use store.allTiddlersAsHtml() instead
function allTiddlersAsHtml()
{
return store.allTiddlersAsHtml();
}
// @Deprecated: Use refreshPageTemplate instead
function applyPageTemplate(title)
{
refreshPageTemplate(title);
}
// @Deprecated: Use story.displayTiddlers instead
function displayTiddlers(srcElement,titles,template,unused1,unused2,animate,unused3)
{
story.displayTiddlers(srcElement,titles,template,animate);
}
// @Deprecated: Use story.displayTiddler instead
function displayTiddler(srcElement,title,template,unused1,unused2,animate,unused3)
{
story.displayTiddler(srcElement,title,template,animate);
}
// @Deprecated: Use functions on right hand side directly instead
var createTiddlerPopup = Popup.create;
var scrollToTiddlerPopup = Popup.show;
var hideTiddlerPopup = Popup.remove;
// @Deprecated: Use right hand side directly instead
var regexpBackSlashEn = new RegExp("\\\\n","mg");
var regexpBackSlash = new RegExp("\\\\","mg");
var regexpBackSlashEss = new RegExp("\\\\s","mg");
var regexpNewLine = new RegExp("\n","mg");
var regexpCarriageReturn = new RegExp("\r","mg");
}
//}}}
Alexander Yates, Stefan Schoenmackers and Oren Etzioni
* QA parsing does implausible parsing, phrase attachment
* ''BUG'': look at text runner paper 2006. is any of their tuple data available?
* chris manning: differentiate between fishy parsing and value of using the web data.
* have four filters to try to prove/correct parse
Jeffrey Pomerantz, Sanghee Oh, Barbara M. Wildemuth, Seungwon Yang, Edward A Fox
ask jeff for modules
http://curric.dlib.vt.edu/
Danushka Bollegala, Yutaka Matsuo, Mitsuru Ishizuka (ECAI 06)
find key terms for each namesake in a collection
assume one namesake per article (one sense per discourse)
c-value nc-value: need to read k. t. frantzi, s. ananiadou
used group averaged HAC to do clustering
compared against baseline TF*IDF. hard to say whether it actually does well; no analysis.
Saif Mohammad and Graeme Hirst
* only look at distributional similarity
* use coarse sense to limit complexity (1000^^2^^, instead of 100000^^2^^)
* first pass build base WCCM: category and a word -> just get the primary sense?
* is their human data available? papers on edge weights?
* eval: do word correction in context; doing hirst and st-onge correction ratio
* Q: what about antonyms?
Bryan Chee and Bruce Schatz
use lexical network
then use graph clustering algorithm to find config with best modularity
cluto clustering software C++
http://beespace.uiuc.edu/
PMI as an indicator of what? -> collocation?
use only mid frequency terms then do PMI calc
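A small sketch of the "filter to mid frequency terms, then compute PMI" step. The function names, the count thresholds, and the use of log base 2 are assumptions for illustration; the talk's actual settings are not recorded in these notes.

```javascript
// Keep only terms whose count lies in [minCount, maxCount] (hypothetical cutoffs).
function midFrequencyTerms(termCounts, minCount, maxCount) {
  return Object.keys(termCounts).filter(
    (t) => termCounts[t] >= minCount && termCounts[t] <= maxCount
  );
}

// Pointwise mutual information from raw counts:
// PMI(x, y) = log2( p(x,y) / (p(x) p(y)) ) = log2( c(x,y) * N / (c(x) * c(y)) )
function pmi(countXY, countX, countY, total) {
  return Math.log2((countXY * total) / (countX * countY));
}

// Made-up counts: "the" is too frequent and "zyx" too rare to be kept.
const counts = { the: 5000, gene: 40, honeybee: 25, zyx: 1 };
console.log(midFrequencyTerms(counts, 5, 100)); // → [ 'gene', 'honeybee' ]

// 8 co-occurrences, 16 and 32 individual occurrences, 1024 contexts in total.
console.log(pmi(8, 16, 32, 1024)); // → 4
```

A PMI of 0 means the two terms co-occur exactly as often as independence would predict; positive values indicate they appear together more often than chance, which is why PMI is commonly read as a collocation signal.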
Xiaojun Wan, Jianwu Yang and Jianguo Xiao
Not actually ranking, but re-ranking initially relevant documents
* texttile all documents first
* manifold ranking structure
* tiling done on query "doc" as well, queries are from LDC TDT-3
** compared doc + MR and doc + TextTiling
** only seems to work for few (<75) initial documents.
George Buchanan (Swansea) (with fernando)
- information literacy in ugrads changes all the time
- info needs (as serendipity may push new goals from focus goal)
judging first glance relevance
needed to give relevance ratings rather than 1/0, unlimited time
result list method different when present
how many documents? 20 docs
paper: 23
result: 17 (1/5th time use just result list)
electronic folder: 28
headers and captions more useful than title - "titles can be misleading"
conclusion works for paper but not for electronic mode, why?
figures and tables more looked at.
34% docs never scroll and looking only above the fold (1/3 of page)
64% only first page
search using ctrl-f not much
scrolling more painful with electronic mode
doc length - more negative effect on paper
"first glance <10 sec" - relevance made - other time to only confirm
q: unlimited time really correct exp setup?
q: interference in context to first glance relevance?
q: image navigation?
All cs applications first have to initialize and read the database, after which the $did variable can be altered (e.g., set to a value from 1 to MAXDOCS) to retrieve the hash of information for that document.
Go to 1.html of the server. It should show the document "36 problems for semantic interpretation" by Scheler. There are several areas of the results page:
The light blue title box, containing:
* a left hand panel:
** title
** author
** hyperlinks
* a right hand panel
** view or download
** cache links
** from (source) information
The body, containing:
* summary
* grey rating box
* abstract
* similar documents
* bibtex
* citations
* samesite
* hyperlinks about online articles
* cs credits: these are generated by the $footnote variable.
Aside from the cs credits, these are generated from the call to DocumentToHTML
DocumentToHTML calls DocumentDownloadBar as
&DocumentDownloadBar ($hit, $param, \%SourceHTML, $sArticleLink);
then the code below is from DDB:
my $sURL = $$hit{'url'};
if (defined $$hit{'homepages'} && $$hit{'homepages'} =~ m/\b(?:url|file)\s*=\s*(\S+)/si) {
my $url = $1; if ($url =~ /^(http|ftp):\/\// && $url !~ /\?/) { $sURL = $url; }
}
$sPSHREF holds the actual URL that is written out. It gets its value from $sURL, passed through RedirectHREF so that the link hits the logging facility of cs.
my $sPSHREF = &RedirectHREF ($sURL, $param, $profile::nDownloadValue, 'Download'); # my $sPSHREF = &RedirectHREF ($sURL, \%param, 0, 'Download');
if ($$hit{'locatedon'} !~ m/^0 /) {
print $sPSHREF . &URLShortLinkMax ($sURL, 36) . "</a><br>\n";
}
Yee Seng Chan and Hwee Tou Ng
1:37 - 1:55 - 2:02
EM predictor
sense priors - saerens et al 2002 em based method
count merging - active learning to assign diff weights for diff examples
- em part not even worth explanation too brief
- count merging need to be tuned, how tuned?
- results given by % but not with respect to training data or abs counts, how are they similar?
- - sense prior part quite confusing
- purple triangles not the same number
- nb based wsd - what about for other methods?
- count merge helps more for less data, why?
[[EditSf0|file:///M:/public_html/knmnynWiki.html]]
[[EditMacStick|file:///Volumes/IMATION%202G/knmnynWiki.html]]
Robert Capra, Gary Marchionini, Jung Sun Oh, Fred Stutzman, Yan Zhang
topic, genre, region, format facets for bureau labor stats data
three interface style: orig web site, relation browser, simple facet browser + breadcrumb
no breadcrumb trail
three task types: simple lookup (one facet w/ time + place), complex lookup, exploratory
bookmarklet to capture answer and page and highlighted span
lookup tasks are as easy using simple stuff as well as complex lookup
explore seems better on relation browser
two way anova
no significant difference on almost all facets of the evaluation
qualitative use was better
* users recognize / value good organization
* ''need'' keyword search (maybe first before browse?)
Steven Garcia and Andrew Turpin
d gap analogy for document retrieval
cluster related documents together in index
brought up time sensitivity messing with results - actually seems quite stable.
50% to 300% speedup by access-ordering
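To make the d gap idea concrete: a postings list of ascending document ids can be stored as the differences between consecutive ids, and those differences get smaller when related documents are given nearby ids. This is only an illustration of d-gaps with made-up ids and hypothetical function names, not the authors' access-ordering method.

```javascript
// Convert an ascending list of document ids into d-gaps (first id kept as-is).
function toDGaps(docIds) {
  return docIds.map((id, i) => (i === 0 ? id : id - docIds[i - 1]));
}

// Recover the original ids by taking a running sum of the gaps.
function fromDGaps(gaps) {
  let sum = 0;
  return gaps.map((g) => (sum += g));
}

// Same three documents, scattered vs. assigned adjacent ids.
console.log(toDGaps([5, 1000, 2000])); // → [ 5, 995, 1000 ]
console.log(toDGaps([5, 6, 7]));       // → [ 5, 1, 1 ]
console.log(fromDGaps([5, 1, 1]));     // → [ 5, 6, 7 ]
```

Smaller gaps need fewer bits under variable-length integer codes, which is the usual motivation for reordering document ids.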
Wisam Dakka and Luis Gravano
search results to generate multi-doc summarizations
desiderata
*informative snippets: highlight essence
*browsing ability: navigate, to related stories
*speed: fast, online
baseline
* offline summary and match (irrelevant)
* or online summaries and clustering (slow)
hybrid
* reuse old clusters, merge w/ newly generated clusters
* online clusters and offline clusters together