Web Based Linkage

Related Work

Abundant research on related problems

DB: approximate join, merge/purge, record linkage

DL: citation matching, author name disambiguation

AI: identity uncertainty

LIS: name authority control

In a nutshell, existing approaches typically proceed as follows:

For two entities, e1 and e2, capture their information in data structures D(e1) and D(e2)

Measure the distance or similarity between the data structures: dist(D(e1), D(e2)) = d

Decide whether the entities match:

If d < threshold, then e1 and e2 are matching entities
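The three steps above can be sketched as follows. This is only an illustration, not the method of any particular cited system: the token-set representation in describe(), the Jaccard distance in dist(), and the default threshold of 0.5 are all assumptions chosen for brevity.

```python
def describe(entity):
    """D(e): represent an entity by its set of lowercase tokens.

    The token-set representation is an assumption made for this sketch.
    """
    cleaned = entity.lower().replace(".", " ").replace(",", " ")
    return frozenset(cleaned.split())


def dist(d1, d2):
    """dist(D(e1), D(e2)): Jaccard distance between two token sets."""
    if not d1 and not d2:
        return 0.0  # two empty descriptions are treated as identical
    return 1.0 - len(d1 & d2) / len(d1 | d2)


def is_match(e1, e2, threshold=0.5):
    """e1 and e2 are matching entities if d < threshold (value is hypothetical)."""
    d = dist(describe(e1), describe(e2))
    return d < threshold


print(is_match("Jeffrey D. Ullman", "Ullman, Jeffrey D."))  # same tokens, reordered
print(is_match("J. Ullman", "Jeffrey D. Ullman"))           # abbreviated variant
```

In this sketch the reordered name matches because both strings yield the same token set, while the abbreviated variant shares only one token out of four and falls above the threshold.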

These approaches work well for common applications

Our approach performs better when

Entities lack useful information

WIDM 2007