Representation and digitization of multimedia*
Week 2
Min-Yen KAN
*Heavily scaled down from original lecture outline :-(
Media types in the DL
Distribution of media
types in the library
|
|
|
LoC  NUS  U Toronto
Library type: Gov't  Acad  Acad
Books and manuscripts: 19 M  2.2 M  9.1 M
Maps: 4 M  278 K
Photographs: 12 M  22.1 K  622 K
Music: 2.7 M  186 K
Motion pictures: 0.9 M  21 K
CD-ROM databases: 1.4 K  2.1 K

Question: is the distribution of what we'd like in the digital library the same as in the automated library?
|
|
Outline
|
|
|
Representation / Digitization |
|
Textual images |
|
Images |
|
Audio |
|
Coordinated multimedia |
Textual images
Cost basis for archives
Digitization
|
|
|
|
Scanning |
|
Binding |
|
Planetary scanner |
|
|
|
Resolution of scan |
|
300 dpi* for access |
|
600 or higher for archival copy |
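To get a feel for what the two resolutions cost in storage, here is a rough sketch of the uncompressed size of one scanned page. The page size (8.5 × 11 inches) and the bi-level depth of 1 bit per pixel are assumptions made for illustration; they are not figures from the slide.

```python
def raw_scan_bytes(width_in, height_in, dpi, bits_per_pixel=1):
    """Uncompressed size in bytes of one scanned page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


# Assumed page size: 8.5 x 11 inches, bi-level (1 bit per pixel).
access_copy = raw_scan_bytes(8.5, 11, 300)
archival_copy = raw_scan_bytes(8.5, 11, 600)
print(access_copy, archival_copy, archival_copy / access_copy)
```

Doubling the resolution quadruples the pixel count, so the archival copy is about four times the size of the access copy before any compression.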
|
|
|
|
Digitization
|
|
|
|
|
Purpose: |
|
Archival |
|
Quality |
|
Stability in the
long term |
|
Accessibility |
|
Delivery |
|
Editing |
|
Annotation |
|
|
|
Initiate the digitization project
Establish start-up costs and secure funding
Prepare a detailed project plan, including milestones and deliverables
Assess and select materials for digitization
Digitize materials (prepare source materials, digitize, check quality)
Post-process digital materials: edit, OCR, store, catalog and index
Deliver and make materials accessible
Support and maintain materials

-- From Chowdhury and Chowdhury (03)
Document capture costs in
USD (ca. 1999)
Images of text
|
|
|
You’ve scanned in an image like this… |
|
|
|
What to do with it? |
|
|
|
How would we like to store and access
this information? |
Storing a textual image
|
|
|
|
Mostly bi-level (two-tone) until
recently |
|
|
|
CCITT Fax III and IV |
|
Bi-level transmission and storage
standard |
|
Optimized for Roman alphabet |
|
|
|
Textual image compression |
|
Codebook of marks |
|
A level for access and one for
preservation |
CCITT Fax IV
Textual image compression
|
|
|
Find and isolate marks (connected groups of black pixels)
Construct library of symbols
Identify the symbol closest to each mark and get coordinates
Store information
*Store additional information to reconstruct original image
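The steps above can be sketched on a toy bitmap. This is a minimal illustration, not the scheme used by any particular system: it assumes 8-connectivity for marks and matches marks to library symbols only when their shapes are identical, so there is no residue to store. A real compressor would match approximately and keep the difference. The names `find_marks` and `encode` are made up for this sketch.

```python
def find_marks(bitmap):
    """Return marks as (top, left, shape); shape is a set of (dy, dx) offsets."""
    rows, cols = len(bitmap), len(bitmap[0])
    seen = set()
    marks = []
    for y in range(rows):
        for x in range(cols):
            if bitmap[y][x] != "#" or (y, x) in seen:
                continue
            # Flood fill to collect one connected group of black pixels.
            stack, pixels = [(y, x)], []
            seen.add((y, x))
            while stack:
                cy, cx = stack.pop()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and bitmap[ny][nx] == "#"
                                and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            stack.append((ny, nx))
            top = min(p[0] for p in pixels)
            left = min(p[1] for p in pixels)
            shape = frozenset((py - top, px - left) for py, px in pixels)
            marks.append((top, left, shape))
    return marks


def encode(bitmap):
    """Build the symbol library and a list of (symbol index, x, y) placements."""
    library, placements = [], []
    for top, left, shape in find_marks(bitmap):
        if shape not in library:  # exact match only (an assumption)
            library.append(shape)
        placements.append((library.index(shape), left, top))
    return library, placements


page = [
    "#.#..#.#",
    "###..###",
    "#.#..#.#",
    "........",
    ".##.....",
    ".##.....",
]
library, placements = encode(page)
print(len(library), placements)
```

The two "H" marks share one library entry, so the page is stored as two symbols plus three placements rather than as raw pixels.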
Library
Residue
Text image outline
|
|
|
|
Storage √ |
|
CCITT Fax Group III and IV √ |
|
Textual image compression √ |
|
|
|
Access |
|
De-skew |
|
Segmentation |
|
Media detection |
De-Skew
|
|
|
|
Projection profile |
|
Accumulate Y-axis pixel histogram |
|
Rotate to find most crisp histogram |
|
|
|
One of three common algorithms |
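A minimal sketch of the projection-profile idea, under stated assumptions: black pixels are given as (x, y) coordinates, candidate angles are tried by brute force, and "crispness" is scored as the sum of squared histogram counts. That score is one plausible choice, not necessarily the one used in practice.

```python
import math


def profile_sharpness(points, angle_deg):
    """Rotate pixel coordinates by angle_deg and score the row histogram.

    The score is the sum of squared bin counts. It is largest when pixels
    pile into a few rows (crisp peaks) rather than spreading over many.
    """
    theta = math.radians(angle_deg)
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    histogram = {}
    for x, y in points:
        row = math.floor(x * sin_t + y * cos_t)  # y-coordinate after rotation
        histogram[row] = histogram.get(row, 0) + 1
    return sum(count * count for count in histogram.values())


def estimate_correction(points, candidate_angles):
    """Return the candidate rotation that gives the crispest histogram."""
    return max(candidate_angles, key=lambda a: profile_sharpness(points, a))


# Synthetic "page": three text lines, skewed by 3 degrees.
skew = math.radians(3.0)
skewed_points = [
    (x * math.cos(skew) - y * math.sin(skew),
     x * math.sin(skew) + y * math.cos(skew))
    for y in (10.5, 30.5, 50.5)
    for x in range(200)
]
candidates = [step / 2 for step in range(-10, 11)]  # -5.0 to 5.0, 0.5 steps
correction = estimate_correction(skewed_points, candidates)
print(correction)
```

The returned angle is the rotation to apply to the scan, so a page skewed by +3 degrees should yield a correction of -3 degrees.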
Segmentation
|
|
|
Top-down |
|
(e.g., X-Y cut) |
|
Bottom-up |
|
(e.g. smearing) |
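As an illustration of the top-down approach, here is a small recursive X-Y cut sketch. It assumes a bi-level page given as strings of "#" (ink) and "." (blank), always splits at the widest blank gap, and stops when no gap of at least `min_gap` remains; these choices and the function name `xy_cut` are assumptions for the example.

```python
def xy_cut(bitmap, top, left, bottom, right, min_gap=2):
    """Return leaf blocks as (top, left, bottom, right), ends exclusive."""
    rows = [y for y in range(top, bottom) if "#" in bitmap[y][left:right]]
    if not rows:
        return []
    cols = [x for x in range(left, right)
            if any(bitmap[y][x] == "#" for y in range(top, bottom))]
    # Shrink the region to the bounding box of its ink.
    top, bottom = rows[0], rows[-1] + 1
    left, right = cols[0], cols[-1] + 1

    def widest_gap(inked):
        """Widest blank run between inked positions: (size, start, end)."""
        best = None
        for a, b in zip(inked, inked[1:]):
            gap = b - a - 1
            if gap >= min_gap and (best is None or gap > best[0]):
                best = (gap, a + 1, b)
        return best

    row_gap, col_gap = widest_gap(rows), widest_gap(cols)
    if row_gap is None and col_gap is None:
        return [(top, left, bottom, right)]
    if col_gap is None or (row_gap is not None and row_gap[0] >= col_gap[0]):
        _, gap_start, gap_end = row_gap  # horizontal cut
        return (xy_cut(bitmap, top, left, gap_start, right, min_gap)
                + xy_cut(bitmap, gap_end, left, bottom, right, min_gap))
    _, gap_start, gap_end = col_gap  # vertical cut
    return (xy_cut(bitmap, top, left, bottom, gap_start, min_gap)
            + xy_cut(bitmap, top, gap_end, bottom, right, min_gap))


page = [
    "##########",  # a title spanning the page
    "..........",
    "..........",
    "####..####",  # two columns
    "####..####",
]
blocks = xy_cut(page, 0, 0, len(page), len(page[0]))
print(blocks)
```

The page is first cut horizontally below the title, then the remaining region is cut vertically into two columns.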
|
|
Classification
|
|
|
|
Separate: |
|
Images |
|
Text |
|
Line art |
|
Equations |
|
Tables |
|
|
|
One technique: |
|
Slope Histogram (Hough transform) |
Hough Transform
|
|
|
A line-to-point transform |
|
In practice, used to find lines in an
image (e.g., set of pixels on a line) |
Hough Transform
|
|
|
Create virtual lines for each point |
|
Accumulate counts for bin in Hough
space |
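The voting procedure can be sketched as follows. This assumes the common (theta, rho) parameterisation, where a line is x·cos(theta) + y·sin(theta) = rho, with one-degree and one-pixel bins; the slide does not say which parameterisation or bin sizes were intended.

```python
import math
from collections import Counter


def hough_accumulate(points, theta_step_deg=1):
    """Vote in (theta, rho) space for every line passing through each point."""
    votes = Counter()
    for x, y in points:
        for theta_deg in range(0, 180, theta_step_deg):
            theta = math.radians(theta_deg)
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(theta_deg, rho)] += 1
    return votes


# 100 pixels on the line y = x, plus a few stray pixels.
line_points = [(i, i) for i in range(100)]
noise = [(10, 50), (70, 5), (33, 80)]
votes = hough_accumulate(line_points + noise)
(peak_theta, peak_rho), peak_votes = votes.most_common(1)[0]
print(peak_theta, peak_rho, peak_votes)
```

Every pixel on the line votes for the same bin, so the line shows up as a single peak in Hough space while the stray pixels scatter their votes.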
Robust Document
Understanding
|
|
|
|
|
OCR and document understanding are
(currently) fragile technologies |
|
Full scan → OCR → store pipeline makes many assumptions |
|
What are some? |
|
________________ |
|
________________ |
|
________________ |
|
________________ |
|
________________ |
A solution (one of many)
|
|
|
|
Courtesy Henry Baird’s ICDAR 03 slides. |
|
|
|
http://www.cse.lehigh.edu/~baird/Talks/icdar03.ppt#21 |
|
|
Image data
|
|
|
|
Raster graphics |
|
As an array of pixels |
|
|
|
Vector graphics |
|
As a collection of vectors |
|
|
|
Which format appropriate for which
images? |
|
Maps |
|
Photographs |
|
Line art |
|
For which use? |
|
Fidelity? |
|
Re-scaling? |
|
Compression? |
GIF / PNG
|
|
|
|
|
GIF (‘jiff’, Graphics Interchange
Format) |
|
Stable, lossless color format |
|
Compression achieved by: |
|
8-bit format (256 colors) |
|
LZW encoding (Unisys patent) |
|
__________________________________. |
|
Interlacing options for low-bandwidth
accessibility |
|
|
|
PNG (‘ping’, Portable Network Graphics) |
|
Uses ____________________________ |
|
Up to 48 bits of color (compared to 8
in GIF) |
|
Support for alpha channels (transparency) and gamma correction (consistent brightness across displays)
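For the LZW encoding mentioned under GIF, here is a minimal textbook sketch of the dictionary-building idea. It works on characters and emits plain integer codes; GIF's actual variant adds details such as variable-width codes and clear codes that are not modelled here.

```python
def lzw_compress(text):
    """Replace repeated substrings with codes from a growing dictionary."""
    dictionary = {chr(i): i for i in range(256)}
    current = ""
    codes = []
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate
        else:
            codes.append(dictionary[current])
            dictionary[candidate] = len(dictionary)  # learn the new string
            current = ch
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decompress(codes):
    """Rebuild the text; the decoder re-learns the same dictionary."""
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    previous = dictionary[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == len(dictionary):
            entry = previous + previous[0]  # code defined by this very step
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        dictionary[len(dictionary)] = previous + entry[0]
        previous = entry
    return "".join(out)


sample = "TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(sample)
print(len(sample), len(codes))
```

No dictionary is transmitted: the decoder reconstructs it from the code stream, which is why the scheme is lossless and self-contained.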
Joint Photographic Experts Group (JPEG)
|
|
|
Breaks image into 8×8 pixel blocks; each pixel is 24 bits (three YUV channels × 8 bits each)
|
Compresses each block separately,
__________________ |
|
|
JPEG, continued
|
|
|
|
Transform yields coefficients |
|
Ordered from low frequency (gradual
change) to high frequency |
|
|
|
Gradual changes well represented |
|
Good for scenery, natural images |
|
|
|
JPEG 2000 incorporates wavelet
compression |
|
Better for sharp edges |
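To see why gradual changes are well represented, here is a small sketch of a block transform on one 8×8 block. It assumes the transform is the discrete cosine transform (DCT-II) used by baseline JPEG, written out directly in pure Python for clarity rather than speed; quantization and entropy coding are left out.

```python
import math

N = 8  # JPEG block size


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (list of lists)."""
    def alpha(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = alpha(u) * alpha(v) * total
    return coeffs


flat = [[100] * N for _ in range(N)]                     # uniform block
ramp = [[10 * y for y in range(N)] for _ in range(N)]    # smooth gradient

flat_coeffs = dct_2d(flat)
ramp_coeffs = dct_2d(ramp)
print(round(flat_coeffs[0][0], 3))
print([round(c, 2) for c in ramp_coeffs[0]])
```

A uniform block needs only the single lowest-frequency coefficient, and a smooth gradient puts most of its energy in the first few, which is what lets the higher-frequency coefficients be stored coarsely or dropped.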
|
|
|
|
Postscript
|
|
|
|
A programming language whose operators
draw graphics on the page. |
|
Text is deemed a type of graphic
To “draw” a page, you construct the paths used to create the image.
|
A stack based, usually interpreted
language |
|
Uses reverse polish notation |
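A tiny sketch of how a stack-based, reverse-Polish language evaluates an expression. Only four arithmetic operators are handled (`add`, `sub`, `mul`, `div`, which are real PostScript operator names); everything else about PostScript is left out, and the function `run` is made up for this example.

```python
OPERATORS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a / b,
}


def run(program):
    """Evaluate whitespace-separated tokens; operands go on the stack."""
    stack = []
    for token in program.split():
        if token in OPERATORS:
            right = stack.pop()  # operators consume the top two items
            left = stack.pop()
            stack.append(OPERATORS[token](left, right))
        else:
            stack.append(float(token))
    return stack


# Operands come first, the operator last: ".55 -72 mul" means .55 * -72.
stack = run("100 .55 -72 mul")
print(stack)
```

After `mul` runs, the stack holds two numbers, 100 and the product, which is the state an operator such as `moveto` would then consume as its x and y arguments.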
A simple Postscript
example
|
|
|
A method to place some text down the left margin of a page.
|
|
|
You can use this after the marker for
the beginning of a page. |
|
|
|
gsave                     % save graphics state on stack
90 rotate                 % rotate 90 degrees
100 .55 -72 mul moveto    % go to coords 100, (.55*-72)
/Times-Roman findfont     % get the font (set of operators) Times-Roman
10 scalefont              % set the font size
setfont                   % use the specified font
0.3 setgray               % change the color to gray
(PUT NOTE HERE) show      % call the individual operators P, U, T, ... to draw letters
grestore                  % restore the graphics state
|
|
Portable Document Format
|
|
|
|
An object database |
|
|
|
Subset of Postscript, makes it faster
to process |
|
Can use several different compression
techniques (e.g., LZW and Huffman) |
|
Proprietary |
|
Has capabilities for hyperlinks |
Geospatial Datasets
|
|
|
Which image format is best for maps? |
|
Hmm, let’s think about it. What goes into a map? |
|
|
|
___________________,
which provides the position and shapes of specific geographic features. |
|
____________________,
which provides additional non-graphic information about each feature. |
|
_________________,
which describes how the features will appear on the screen. |
Audio
|
|
|
|
Limit representation to what people can
hear |
|
Humans: ~ ____________ KHz |
|
|
|
Highest frequency (pitch) determines storage size.
Speech: limited range, up to 3 kHz
Music: full audible range, up to 20 kHz
This frequency range can be referred to as the signal's bandwidth
|
|
Sampling
|
|
|
|
Take continuous signal and discretize |
|
Higher sampling rate = better fidelity |
|
|
|
|
|
|
|
Nyquist and Shannon show minimum sampling rate = 2 × bandwidth
Music (full audible range): ~22 kHz × 2 = 44 kHz
Speech: 4 kHz × 2 = 8 kHz
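The sampling rule above, plus what goes wrong when it is violated, can be sketched in a few lines. The helper `apparent_frequency` is an illustrative name; it computes the frequency a pure tone seems to have once sampled (aliasing), which is standard signal-processing behaviour rather than something stated on the slide.

```python
def min_sampling_rate(bandwidth_hz):
    """Nyquist-Shannon: sample at least twice the highest frequency."""
    return 2 * bandwidth_hz


def apparent_frequency(tone_hz, sampling_rate_hz):
    """Frequency a pure tone appears to have after sampling (aliasing)."""
    nearest_multiple = sampling_rate_hz * round(tone_hz / sampling_rate_hz)
    return abs(tone_hz - nearest_multiple)


print(min_sampling_rate(22_000))  # music
print(min_sampling_rate(4_000))   # speech

# At 8 kHz sampling, a 3 kHz tone is captured faithfully,
# but a 5 kHz tone is indistinguishable from a 3 kHz one.
print(apparent_frequency(3_000, 8_000))
print(apparent_frequency(5_000, 8_000))
```

This is why the signal is normally low-pass filtered to the intended bandwidth before sampling: anything above half the sampling rate folds back into the audible band.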
Amplitude and Channels
|
|
|
|
Sampling at these time intervals to get
amplitude of signal |
|
a total of ~30-60 dB in loudness |
|
Human ear more sensitive to soft sounds |
|
Compand amplitude
(________________________________________________________) |
|
1 or 2 bytes |
|
|
|
For each time interval, may have to
sample one or more channels |
|
Differential coding (joint stereo) |
|
Dolby AC 3 = ____ channels |
|
Stereo = 2 channels |
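As one concrete example of companding, here is a sketch of the μ-law curve with μ = 255, quantized to one byte. The slide does not name a particular scheme, so μ-law is an assumption chosen for illustration; amplitudes are taken to be normalized to the range -1 to 1.

```python
import math

MU = 255  # assumed companding parameter


def compress(x):
    """Map amplitude x in [-1, 1] onto a logarithmic scale in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def expand(y):
    """Inverse of compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def encode_8bit(x):
    """Quantize the companded value to a signed code in [-127, 127]."""
    return round(compress(x) * 127)


def decode_8bit(code):
    return expand(code / 127)


quiet = 0.01
print(round(quiet * 127))               # code from plain linear quantization
print(encode_8bit(quiet))               # code after companding
print(decode_8bit(encode_8bit(quiet)))  # reconstructed amplitude
```

Companding spends more of the available codes on soft sounds, where the ear is more sensitive, so a quiet sample survives one-byte quantization far better than it would on a linear scale.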
Storage Requirements
(bitrate)
|
|
|
|
Digital Music:
44 K samples/sec × 16 bits/sample × 2 channels = ~1.4 M bits/sec
Digital Voice:
8 K samples/sec × 8 bits/sample × 1 channel = ~64 K bits/sec

Analog
FM stereo: 40 K samples/sec × 8 bits/sample × 3 channels = ~900 K bits/sec
Telephony: ~6 K samples/sec × 2 bits/sample × 1 channel = ~12 K bits/sec
|
|
|
Formats |
|
AAC: ______________________ |
|
MP3: ______________________ |
|
GSM: ______________________ |
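The bitrate figures on this slide are all the same product: sampling rate × bits per sample × channels. A short sketch that reproduces them, using the slide's own numbers:

```python
def bitrate(samples_per_sec, bits_per_sample, channels):
    """Uncompressed bitrate in bits per second."""
    return samples_per_sec * bits_per_sample * channels


rates = {
    "digital music": bitrate(44_000, 16, 2),
    "digital voice": bitrate(8_000, 8, 1),
    "FM stereo": bitrate(40_000, 8, 3),
    "telephony": bitrate(6_000, 2, 1),
}
for name, bits_per_sec in rates.items():
    print(f"{name}: {bits_per_sec:,} bits/sec")
```

The FM stereo product comes to 960,000 bits/sec, which the slide rounds to ~900 K; the other three match the slide's approximations directly.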
Putting media together
|
|
|
Have multimedia, will travel… |
|
|
XML
|
|
|
A basis for many other technologies |
|
No semantics (eXtensible, not rigid),
just allows for hierarchical containment |
|
A meta markup language |
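A small sketch of "hierarchical containment without semantics": the parser below accepts any well-formed element names and only recovers the nesting. The element names (`recording`, `track`, `format`) are invented for this example and mean nothing to XML itself.

```python
import xml.etree.ElementTree as ET

document = """<recording>
  <title>Week 2 lecture</title>
  <track n="1"><format>audio</format></track>
  <track n="2"><format>video</format></track>
</recording>"""


def outline(element, depth=0):
    """List element names, indented by how deeply they are nested."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(document)
tree_outline = outline(root)
print("\n".join(tree_outline))
```

What the tags mean is left to the applications built on top, which is why XML serves as a basis for other, more specific markup languages.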
|
|
XML, continued
|
|
|
|
|
Features: |
|
Separation of content from presentation |
|
Content: Document Type Definition
(DTD), optional |
|
Presentation: _____________________, |
|
__________________ |
|
|
|
Enhanced hyperlinking capabilities |
|
Bidirectional linking |
|
Finer grained linking (XPointer) |
Text Encoding
Initiative
|
|
|
To encode knowledge “of literary
and linguistic texts for online
research and teaching” |
|
|
|
better interchange and integration of
scholarly data |
|
support for all texts, in all
languages, from all periods |
|
guidance for the perplexed: ___ to
encode --- hence, a user-driven codification of existing best practice |
|
assistance for the specialist: ___ to
encode --- hence, a loose framework into which unpredictable extensions can
be fitted |
|
|
|
The “beef” in XML. All the semantics and none of the
filling. It’s quite filling, weighing
in at 600 K words! (Think 8 kg of books) |
Synchronized Multimedia
Integration Language :-)
|
|
|
|
A script for orchestrating a
presentation |
|
Think TV news |
|
|
|
Basics: |
|
Define a root window |
|
Layers |
|
Timing |
|
<par> parallel playback |
|
<seq> sequential playback |
|
Media clips have begin and end
attributes |
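To illustrate how `<par>` and `<seq>` combine, here is a rough sketch that computes how long a presentation runs. It makes simplifying assumptions: times are plain seconds with an "s" suffix, each clip finishes `end` seconds after its slot starts, a `<seq>` lasts the sum of its children, and a `<par>` lasts as long as its longest child. The file names are made up, and real SMIL timing has more rules than this.

```python
import xml.etree.ElementTree as ET

presentation = """<seq>
  <video src="intro.mpg" begin="0s" end="5s"/>
  <par>
    <video src="anchor.mpg" begin="0s" end="20s"/>
    <audio src="voice.wav" begin="2s" end="18s"/>
  </par>
</seq>"""


def seconds(value):
    """Parse a time such as '5s' into a float number of seconds."""
    return float(value.rstrip("s"))


def slot_length(node):
    """How long this element occupies its slot, in seconds (simplified)."""
    if node.tag == "seq":
        return sum(slot_length(child) for child in node)
    if node.tag == "par":
        return max((slot_length(child) for child in node), default=0.0)
    return seconds(node.get("end", "0s"))  # media clip


total = slot_length(ET.fromstring(presentation))
print(total)
```

Here the intro plays first, then the anchor video and voice-over play together, much like the "TV news" layout the slide mentions.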
|
|
|
To think about: what’s the alternative
format to SMIL? How does it enhance
presentation? |
Summary
|
|
|
|
Representation of knowledge |
|
The more you know about the media, the faster and more compactly you can transmit and store it
Different formats serve different purposes; the differences are not superficial
|
|
|
Multimedia representation |
|
Trend toward accessibility, not
compressibility |
|
Separation of compression from format |
|
|
References
|
|
|
More on SMIL: http://www.bu.edu/webcentral/learning/smil1/
SMIL demos: http://www.ludicrum.org/demos/SMILTimingForTheWeb-Demos.html
GIS information: http://www.geocomm.com/ and http://www.usgs.gov
Genomic DL indexing and retrieval: http://goanna.cs.rmit.edu.au/~jz/fulltext/ieeekade02.pdf
JPEG: Pennebaker and Mitchell (93), The JPEG Still Image Data Compression Standard
TEI Pizza talk: http://www.tei-c.org/Talks/