Problems in multimedia systems
We formalize this situation in the following problem definition. Assume that we are given n synchronized data streams S1, S2, …, Sn that belong to the space of multimedia data streams. These streams contain K types of data, such as image sequences, audio streams, motion detector output, annotations, symbolic streams, and any other type that is relevant and available. Metadata MD1, MD2, …, MDn for each of the streams is also available in the context of the environment. This metadata may include the location and type of the sensor, viewpoint, angles, camera calibration parameters, or any similar parameters relevant to the data stream. Since a raw data stream is usually not directly useful, feature detectors must be applied to each stream to obtain the features that are relevant in the current environment. A feature computed over a time interval is considered to be detected at the end of that interval.
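The data environment defined above can be illustrated with a minimal sketch. All names here (`Sample`, `Feature`, `DataStream`, `interval_mean_detector`) are hypothetical and chosen only for illustration; the paper does not prescribe a concrete representation, and the mean-over-interval detector is a stand-in for whatever feature detectors are relevant in a given environment.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Sample:
    t: float    # timestamp on the clock shared by the synchronized streams
    value: Any  # raw payload: a frame, an audio chunk, a sensor reading, ...


@dataclass
class Feature:
    name: str
    t: float    # time at which the feature is considered detected
    value: Any


@dataclass
class DataStream:
    kind: str                 # one of the K data types, e.g. "audio"
    metadata: Dict[str, Any]  # MD_i: sensor location, type, calibration, ...
    samples: List[Sample] = field(default_factory=list)


def interval_mean_detector(stream: DataStream, start: float, end: float) -> Feature:
    """Hypothetical detector: mean of the numeric samples in [start, end]."""
    values = [s.value for s in stream.samples if start <= s.t <= end]
    mean: Optional[float] = sum(values) / len(values) if values else None
    # An interval-based feature is stamped with the END of its interval.
    return Feature(name="mean", t=end, value=mean)


# Assumed example stream: a motion detector with illustrative metadata.
motion = DataStream(
    kind="motion_detector",
    metadata={"location": "door", "sensor_type": "PIR"},
    samples=[Sample(0.0, 0.0), Sample(1.0, 1.0), Sample(2.0, 1.0), Sample(3.0, 0.0)],
)
feature = interval_mean_detector(motion, start=0.0, end=2.0)
```

In this sketch the feature computed over the interval from 0.0 to 2.0 carries the timestamp 2.0, which matches the convention that interval-based features are considered detected at the end of the interval.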
Given this data environment, one faces many interesting problems. The following are directly relevant to the main theme of this paper:
How to focus on the most relevant data in a particular data stream?
How to focus on the most relevant data in multiple correlated data streams?
For the given task, what is the minimum number of data streams required?
How does one sample the data streams? How can one minimize sampling to maximize efficiency?
Can one use alternate data streams to perform the same task with different costs?
Given that M streams are necessary for a given task, how does one combine the information from the data streams?
We believe that determining which data streams are relevant, and which of those streams provide the most relevant information at any given moment, is a very important problem that needs to be addressed and has been ignored in the current literature. Current multimedia systems usually start with the assumption that there is a given set of N data streams, from which one must deduce or extract all information to build the schema representing the environment. Unfortunately, in most cases N=1, which makes this a signal analysis problem rather than a multimedia problem. There are other issues related to semantics and indexing that we do not address here.