To harness the wealth of information available on the Web today,
many organizations aggregate public (and private) data to build
profiles for real-world entities and understand how these entities
evolve over time. Since a real-world entity may be described by
different sources in various ways with overlapping information, and
possibly conflicting or even erroneous values, we need to collate data
records that refer to the entity, as well as correct any erroneous
values. We also need to understand how data records from different
sources are related to one another over time if they refer to the same
entity.
In this project, we develop a framework that interleaves record
linkage with error correction, taking into consideration the
reliability of data sources to lower the impact of erroneous values.
We also design a novel transition model that captures how attribute
values change over time, and a source-aware temporal matching
algorithm that jointly considers the value transitions and the
freshness of data sources to link temporal records to entities in the
right time period.
The goal is to obtain an increasingly complete and up-to-date entity
profile as more records are aggregated from different sources.
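
As a rough illustration of how a transition model and source freshness
might be combined when matching a temporal record to an entity, the
following minimal Python sketch estimates value-transition probabilities
from time-ordered histories and scores a candidate record. It is not the
project's actual algorithm: the function names, the freshness weight in
[0, 1], the mixture formula, and the smoothing constant `unseen` are all
assumptions made for this example.

```python
from collections import Counter, defaultdict


def fit_transition_model(histories):
    """Estimate P(next value | current value) from time-ordered value histories."""
    counts = defaultdict(Counter)
    for history in histories:
        # Count each consecutive (current, next) pair in the history.
        for current, nxt in zip(history, history[1:]):
            counts[current][nxt] += 1
    return {
        current: {nxt: n / sum(following.values()) for nxt, n in following.items()}
        for current, following in counts.items()
    }


def match_score(model, last_value, record_value, source_freshness, unseen=0.01):
    """Score a record's value against an entity's last known value.

    Assumption: with probability `source_freshness` the source reports the
    value after a transition; otherwise it repeats the stale last value.
    """
    p_transition = model.get(last_value, {}).get(record_value, unseen)
    p_stale = 1.0 if record_value == last_value else 0.0
    return source_freshness * p_transition + (1.0 - source_freshness) * p_stale


# Hypothetical job-title histories for three entities.
histories = [
    ["student", "engineer", "manager"],
    ["student", "engineer", "engineer"],
    ["student", "analyst"],
]
model = fit_transition_model(histories)
likely = match_score(model, "student", "engineer", source_freshness=0.9)
unlikely = match_score(model, "student", "manager", source_freshness=0.9)
```

Under these assumptions, a record from a fresh source whose value is a
commonly observed successor of the entity's last known value (`likely`)
scores higher than one whose value was never seen to follow it
(`unlikely`), while a stale source is rewarded mainly for repeating the
old value.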