1
Representation and digitization of multimedia*
  • Week 2  Min-Yen KAN


  • *Heavily scaled down from
    original lecture outline :-(
2
Media types in the DL


3
Distribution of media types in the library
  • LoC NUS U Toronto
  • Library Type Gov’t Acad Acad
  • Books and manuscripts 19 M 2.2 M 9.1 M
  • Maps 4 M 278 K
  • Photographs 12 M 22.1 K 622 K
  • Music 2.7 M 186 K
  • Motion pictures 0.9 M 21 K
  • CD-ROM Databases 1.4 K 2.1 K


  • Question: is the distribution of what we’d like in the digital library the same as in the automated library?


4
Outline
  • Representation / Digitization
  • Textual images
  • Images
  • Audio
  • Coordinated multimedia
5
Textual images
6
Cost basis for archives
7
Digitization
  • Scanning
    • Binding
    • Planetary scanner


  • Resolution of scan
    • 300 dpi* for access
    • 600 or higher for archival copy
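To get a feel for what these resolutions cost in storage, here is a rough sketch of the uncompressed size of one scanned page. The page size (US Letter, 8.5 × 11 in) and the bi-level depth of 1 bit per pixel are assumptions for illustration, not figures from the lecture.

```python
def scan_size_bytes(width_in, height_in, dpi, bits_per_pixel=1):
    """Uncompressed size in bytes of a page scanned at the given resolution."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8

# Assumed page: US Letter, bi-level (1 bit per pixel)
access = scan_size_bytes(8.5, 11, 300)     # access copy
archival = scan_size_bytes(8.5, 11, 600)   # archival copy
print(access, archival, archival / access)
```

Doubling the resolution quadruples the pixel count, so the archival copy is about four times the size of the access copy before any compression.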


8
Digitization
  • Purpose:
    • Archival
      • Quality
      • Stability in the
        long term
    • Accessibility
      • Delivery
      • Editing
      • Annotation


  • Initiate the digitization project
  • Establish start-up costs and secure funding
  • Prepare a detailed project plan, including milestones and deliverables
  • Assess and select materials for digitization
  • Digitize materials (prepare source materials, digitize, check quality)
  • Post-process digital materials: edit, OCR, store, catalog and index
  • Deliver and make materials accessible
  • Support and maintenance of materials


  • -- From Chowdhury and Chowdhury (03)
9
Document capture costs in USD (ca. 1999)
10
Images of text
  • You’ve scanned in an image like this…


  • What to do with it?


  • How would we like to store and access this information?
11
Storing a textual image
  • Mostly bi-level (two-tone) until recently


  • CCITT Fax III and IV
    • Bi-level transmission and storage standard
    • Optimized for Roman alphabet

  • Textual image compression
    • Codebook of marks
    • A level for access and one for preservation
12
CCITT Fax IV
13
 
14
CCITT fax group IV
15
Textual image compression
  • Find and isolate marks (connected group of black pixels)
  • Construct library of symbols
  • Identify the symbol closest to each mark and get coordinates
  • Store information
  • *Store additional information to reconstruct original image
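The steps above can be sketched in a few lines. This is only a toy version under stated assumptions: the page is a list of strings with '#' for black pixels, marks are 8-connected components, and a mark matches a library symbol only if the bitmaps are identical. A real system would match the closest symbol and store the residue, which this sketch does not do. The names find_marks and compress are made up for the example.

```python
from collections import deque

def find_marks(image):
    """Return marks as (top, left, bitmap) for 8-connected groups of '#' pixels."""
    h, w = len(image), len(image[0])
    seen, marks = set(), []
    for y in range(h):
        for x in range(w):
            if image[y][x] != '#' or (y, x) in seen:
                continue
            queue, pixels = deque([(y, x)]), []
            seen.add((y, x))
            while queue:  # breadth-first flood fill of one mark
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and image[ny][nx] == '#'
                                and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            queue.append((ny, nx))
            top = min(p[0] for p in pixels)
            left = min(p[1] for p in pixels)
            # Bitmap is stored relative to the mark's bounding box
            bitmap = frozenset((py - top, px - left) for py, px in pixels)
            marks.append((top, left, bitmap))
    return marks

def compress(image):
    """Build a symbol library and a list of (symbol index, x, y) placements."""
    library, placements = [], []
    for top, left, bitmap in find_marks(image):
        if bitmap not in library:   # exact match only (assumption)
            library.append(bitmap)
        placements.append((library.index(bitmap), left, top))
    return library, placements

page = [
    "#.#....#.#",
    "###.##.###",
    "#.#.##.#.#",
]
library, placements = compress(page)
print(len(library), placements)
```

The two "H" shapes share one library entry, so the page is stored as two symbols plus three placements.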
16
Library
17
Residue


18
Text image outline
  • Storage √
    • CCITT Fax Group III and IV √
    • Textual image compression √


  • Access
    • De-skew
    • Segmentation
    • Media detection
19
De-Skew
  • Projection profile
    • Accumulate Y-axis pixel histogram
    • Rotate to find the crispest histogram (sharpest peaks and valleys)

  • One of three common algorithms
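A minimal sketch of the projection profile idea, assuming the page is given as a list of black-pixel coordinates and that "crisp" is measured by the sum of squared row counts (one common choice, not necessarily the one used in the lecture). The search range of ±5 degrees and the 0.5 degree step are arbitrary, and the test pixels are kept as floats so the example stays exact.

```python
import math
from collections import Counter

def profile_score(pixels, angle_deg):
    """Sum of squared row counts after rotating the pixels by angle_deg."""
    a = math.radians(angle_deg)
    sin_a, cos_a = math.sin(a), math.cos(a)
    rows = Counter(round(x * sin_a + y * cos_a) for x, y in pixels)
    return sum(count * count for count in rows.values())

def estimate_skew(pixels, max_angle=5.0, step=0.5):
    """Return the candidate rotation (degrees) that gives the peakiest profile."""
    n = int(round(2 * max_angle / step))
    candidates = [-max_angle + i * step for i in range(n + 1)]
    return max(candidates, key=lambda angle: profile_score(pixels, angle))

# Synthetic page: three horizontal text lines, skewed by 2 degrees
lines = [(x, y) for y in (0, 20, 40) for x in range(200)]
skew = math.radians(2.0)
skewed = [(x * math.cos(skew) - y * math.sin(skew),
           x * math.sin(skew) + y * math.cos(skew)) for x, y in lines]

correction = estimate_skew(skewed)
print(correction)  # rotation to apply to undo the skew
```

When the rotation undoes the skew, every text line collapses into a single histogram bin, which maximizes the score.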
20
Segmentation
  • Top-down (e.g., X-Y cut)
  • Bottom-up (e.g., smearing)
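As an illustration of the top-down approach, here is a small recursive X-Y cut. It is a sketch under assumptions: the page is a list of strings with '#' for ink, any fully blank row or column of at least min_gap pixels is a valid cut, and the helper names are invented for the example.

```python
def ink_runs(profile, min_gap):
    """Return (start, end) runs of inked positions, merging gaps < min_gap."""
    runs = []
    for i, inked in enumerate(profile):
        if not inked:
            continue
        if runs and i - runs[-1][1] < min_gap:
            runs[-1] = (runs[-1][0], i + 1)
        else:
            runs.append((i, i + 1))
    return runs

def xy_cut(image, top, bottom, left, right, min_gap=1):
    """Split a region at blank rows, then blank columns; return leaf blocks
    as (top, bottom, left, right) with bottom and right exclusive."""
    rows = [any(image[y][x] == '#' for x in range(left, right))
            for y in range(top, bottom)]
    cols = [any(image[y][x] == '#' for y in range(top, bottom))
            for x in range(left, right)]
    row_runs, col_runs = ink_runs(rows, min_gap), ink_runs(cols, min_gap)
    if not row_runs:
        return []                       # blank region
    if len(row_runs) > 1:               # horizontal cuts
        return [block for s, e in row_runs
                for block in xy_cut(image, top + s, top + e, left, right, min_gap)]
    if len(col_runs) > 1:               # vertical cuts
        return [block for s, e in col_runs
                for block in xy_cut(image, top, bottom, left + s, left + e, min_gap)]
    (rs, re_), (cs, ce) = row_runs[0], col_runs[0]
    return [(top + rs, top + re_, left + cs, left + ce)]

page = [
    "##########",   # a full-width heading
    "..........",
    "####..####",   # two columns below it
    "####..####",
    "......####",
]
blocks = xy_cut(page, 0, len(page), 0, len(page[0]))
print(blocks)
```

The heading is cut off first, then the remaining region is split into its two columns.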


21
Classification
  • Separate:
    • Images
    • Text
    • Line art
    • Equations
    • Tables

  • One technique:
    • Slope Histogram (Hough transform)
22
Hough Transform
  • A line-to-point transform
  • In practice, used to find lines in an image (e.g., set of pixels on a line)
23
Hough Transform
  • Create virtual lines for each point
  • Accumulate counts for each bin in Hough space
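The voting procedure can be sketched as follows. The sketch assumes the usual normal parametrization of a line, rho = x cos(theta) + y sin(theta), with theta in whole degrees and rho rounded to the nearest integer; the bin sizes and the test points are arbitrary choices for illustration.

```python
import math
from collections import Counter

def hough_lines(points, theta_step=1):
    """Vote in (theta, rho) space for every line through every point."""
    accumulator = Counter()
    for x, y in points:
        for theta in range(0, 180, theta_step):
            t = math.radians(theta)
            rho = round(x * math.cos(t) + y * math.sin(t))
            accumulator[(theta, rho)] += 1
    return accumulator

# 100 collinear pixels on the line y = 10, plus three noise pixels
points = [(x, 10) for x in range(100)] + [(3, 1), (15, 4), (8, 17)]
accumulator = hough_lines(points)
(theta, rho), votes = accumulator.most_common(1)[0]
print(theta, rho, votes)
```

The bin with the most votes corresponds to the horizontal line: its normal points straight up (theta = 90 degrees) and it lies 10 pixels from the origin.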








24
Robust Document Understanding
  • OCR and document understanding are (currently) fragile technologies
    • Full scan → OCR → store pipeline makes many assumptions
    • What are some?
      • ________________
      • ________________
      • ________________
      • ________________
      • ________________
25
A solution (one of many)
  • Courtesy Henry Baird’s ICDAR 03 slides.


    • http://www.cse.lehigh.edu/~baird/Talks/icdar03.ppt#21

26
Image data
  • Raster graphics
    • As an array of pixels


  • Vector graphics
    • As a collection of vectors


  • Which format appropriate for which images?
    • Maps
    • Photographs
    • Line art
  • For which use?
    • Fidelity?
    • Re-scaling?
    • Compression?
27
GIF / PNG
  • GIF (‘jiff’, Graphics Interchange Format)
    • Stable, lossless color format
    • Compression achieved by:
      • 8-bit format (256 colors)
      • LZW encoding (Unisys patent)
    • __________________________________.
    • Interlacing options for low-bandwidth accessibility


  • PNG (‘ping’, Portable Network Graphics)
    • Uses ____________________________
    • Up to 48 bits of color (compared to 8 in GIF)
    • Support for alpha channels (transparency) and gamma correction (consistent brightness across displays)
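To illustrate the LZW encoding mentioned for GIF, here is a textbook LZW coder and decoder. This is a sketch only: it emits a plain list of integer codes, whereas GIF packs variable-width codes and uses clear and end-of-information codes, none of which are modeled here.

```python
def lzw_encode(data):
    """Encode bytes as a list of integer codes using a growing phrase table."""
    table = {bytes([i]): i for i in range(256)}
    current, codes = b"", []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate            # keep extending the phrase
        else:
            codes.append(table[current])   # emit the longest known phrase
            table[candidate] = len(table)  # learn the new phrase
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes

def lzw_decode(codes):
    """Rebuild the bytes, reconstructing the same phrase table on the fly."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        # A code not yet in the table must be previous + its own first byte
        entry = table[code] if code in table else previous + previous[:1]
        out.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(out)

data = b"ABABABABABABABAB"
codes = lzw_encode(data)
print(len(data), len(codes), lzw_decode(codes) == data)
```

Repetitive input compresses well because ever longer repeated phrases are replaced by single codes, and decoding restores the input exactly (lossless).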
28
Joint Photographic Experts Group (JPEG)
  • Breaks image into 8×8 pixel blocks, each pixel 24 bits (3 YUV channels × 8 bits each)
  • Compresses each block separately, __________________


29
JPEG, continued
  • Transform yields coefficients
  • Ordered from low frequency (gradual change) to high frequency


  • Gradual changes well represented
    • Good for scenery, natural images


  • JPEG 2000 incorporates wavelet compression
    •   Better for sharp edges



30
PostScript
  • A programming language whose operators draw graphics on the page.
    • Text is deemed a type of graphic
    • To “draw” a page, you construct paths that are used to create the image.
  • A stack-based, usually interpreted language
  • Uses reverse Polish notation
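To show what "stack-based" and "reverse Polish notation" mean in practice, here is a tiny evaluator for postfix arithmetic. It is not a PostScript interpreter: it only understands numbers and four arithmetic operators (add, sub, mul, div are real PostScript operator names, but everything else about the sketch is simplified).

```python
def run_rpn(program):
    """Evaluate a whitespace-separated postfix program; return the final stack."""
    binary_ops = {
        "add": lambda a, b: a + b,
        "sub": lambda a, b: a - b,
        "mul": lambda a, b: a * b,
        "div": lambda a, b: a / b,
    }
    stack = []
    for token in program.split():
        if token in binary_ops:
            b = stack.pop()            # operands come off the stack...
            a = stack.pop()
            stack.append(binary_ops[token](a, b))  # ...the result goes back on
        else:
            stack.append(float(token))  # numbers are simply pushed
    return stack

print(run_rpn("3 4 add 2 mul"))
print(run_rpn("100 .55 -72 mul"))
```

The second program leaves two numbers on the stack, an x and a y coordinate, which is the state an operator such as moveto would then consume.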
31
A simple PostScript example
  • A method to place some text down the left margin of a page.


  • You can use this after the marker for the beginning of a page.


    gsave                     % save graphics state on stack
    90 rotate                 % rotate 90 degrees
    100 .55 -72 mul moveto    % go to coords 100, (.55 * -72)
    /Times-Roman findfont     % get the font (set of operators) Times-Roman
    10 scalefont              % set the font size
    setfont                   % use the specified font
    0.3 setgray               % change the color to gray
    (PUT NOTE HERE) show      % call the individual operators P, U, T, ...
                              % to draw the letters
    grestore                  % restore the graphics state


32
Portable Document Format
  • An object database


    • Subset of PostScript, which makes it faster to process
    • Can use several different compression techniques (e.g., LZW and Huffman)
    • Proprietary
    • Has capabilities for hyperlinks
33
Geospatial Datasets
  • Which image format is best for maps?
  • Hmm, let’s think about it.  What goes into a map?


  • ___________________,
    which provides the position and shapes of specific geographic features.
  • ____________________,
    which provides additional non-graphic information about each feature.
  • _________________,
    which describes how the features will appear on the screen.
34
Audio
  • Limit representation to what people can hear
    • Humans: ~ ____________ kHz


  • Highest frequency (pitch) determines storage size.
    • Speech: limited range, up to 3 kHz
    • Music: full audible range, up to 20 kHz
    • This frequency range can be referred to as the signal’s bandwidth


35
Sampling
  • Take continuous signal and discretize
  • Higher sampling rate = better fidelity




  • Nyquist and Shannon show that the minimum sampling rate = 2 × bandwidth
    • Music: full audible range: ~22 kHz × 2 = 44 kHz
    • Speech: 4 kHz × 2 = 8 kHz
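A small sketch of why the 2 × bandwidth rule matters. It samples two pure cosine tones at the speech rate of 8 kHz: one inside the 4 kHz bandwidth (3 kHz) and one above it (5 kHz). The tone frequencies and the helper names are chosen only for illustration.

```python
import math

def min_sampling_rate(bandwidth_hz):
    """Nyquist rate: twice the highest frequency to be represented."""
    return 2 * bandwidth_hz

def sample_tone(freq_hz, rate_hz, n_samples):
    """Sample a unit-amplitude cosine tone at the given rate."""
    return [math.cos(2 * math.pi * freq_hz * n / rate_hz)
            for n in range(n_samples)]

rate = min_sampling_rate(4000)           # speech: 4 kHz bandwidth
in_band = sample_tone(3000, rate, 16)    # below rate / 2
too_high = sample_tone(5000, rate, 16)   # above rate / 2

# The 5 kHz tone produces the same samples as the 3 kHz tone (aliasing)
largest_gap = max(abs(a - b) for a, b in zip(in_band, too_high))
print(rate, largest_gap)
```

Once sampled, the 5 kHz tone cannot be told apart from the 3 kHz tone, which is why frequencies above half the sampling rate must be filtered out before sampling.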
36
Amplitude and Channels
  • Sample at these time intervals to get the amplitude of the signal
    • a total of ~30-60 dB in loudness
    • Human ear more sensitive to soft sounds
    • Compand amplitude (________________________________________________________)
    • 1 or 2 bytes

  • For each time interval, may have to sample one or more channels
    • Differential coding (joint stereo)
    • Dolby AC 3 = ____ channels
    • Stereo = 2 channels
37
Storage Requirements (bitrate)
  • Digital Music:
    • 44 K samples/sec × 16 bits/sample × 2 channels = ~1.4 M bits/sec
  • Digital Voice:
    • 8 K samples/sec × 8 bits/sample × 1 channel = ~64 K bits/sec


  • Analog
    • FM stereo: 40 K samples/sec × 8 bits/sample × 3 channels = ~960 K bits/sec
    • Telephony: ~6 K samples/sec × 2 bits/sample × 1 channel = ~12 K bits/sec
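The digital bitrates above are simple products, sketched here. The music figure uses the CD sampling rate of 44,100 samples/sec, which the slide rounds to 44 K; the per-minute storage figure is added only as an illustration.

```python
def bitrate(samples_per_sec, bits_per_sample, channels):
    """Uncompressed bitrate in bits per second."""
    return samples_per_sec * bits_per_sample * channels

music = bitrate(44_100, 16, 2)   # CD-quality stereo
voice = bitrate(8_000, 8, 1)     # digital telephone speech

# Storage for one minute of uncompressed music, in bytes
music_bytes_per_minute = music // 8 * 60
print(music, voice, music_bytes_per_minute)
```

At roughly 10 MB per minute for uncompressed music, the case for compressed audio formats is easy to see.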


  • Formats
    • AAC: ______________________
    • MP3: ______________________
    • GSM: ______________________
38
Putting media together
  • Have multimedia, will travel…


39
XML
  • A basis for many other technologies
  • No semantics (eXtensible, not rigid), just allows for hierarchical containment
  • A meta markup language
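A short sketch of "no semantics, just hierarchical containment", using Python's standard xml.etree.ElementTree parser. The element names (lecture, slide, title, point) are invented for this example; the parser knows nothing about them and still recovers the nesting.

```python
import xml.etree.ElementTree as ET

# Made-up vocabulary: XML itself attaches no meaning to these tags
doc = """<lecture week="2">
  <slide><title>Audio</title><point>Sampling</point></slide>
  <slide><title>XML</title></slide>
</lecture>"""

def outline(element, depth=0):
    """Return the tag hierarchy as indented lines."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

tree_lines = outline(ET.fromstring(doc))
print("\n".join(tree_lines))
```

Any meaning for the tags has to come from an application or an agreed vocabulary layered on top of XML.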


40
XML, continued
  • Features:
    • Separation of content from presentation
      • Content: Document Type Definition (DTD), optional
      • Presentation: _____________________,
      • __________________


    • Enhanced hyperlinking capabilities
      • Bidirectional linking
      • Finer grained linking (XPointer)
41
Text Encoding Initiative
  • To encode knowledge “of literary and linguistic texts for online research and teaching”


  • better interchange and integration of scholarly data
  • support for all texts, in all languages, from all periods
  • guidance for the perplexed: ___ to encode; hence, a user-driven codification of existing best practice
  • assistance for the specialist: ___ to encode; hence, a loose framework into which unpredictable extensions can be fitted


  • The “beef” in XML.  All the semantics and none of the filling.  It’s quite filling, weighing in at 600 K words! (Think 8 kg of books)
42
Synchronized Multimedia Integration Language (SMIL) :-)
  • A script for orchestrating a presentation
    • Think TV news

  • Basics:
    • Define a root window
    • Layers
  • Timing
    • <par> parallel playback
    • <seq> sequential playback
    • Media clips have begin and end attributes

  • To think about: what’s the alternative format to SMIL?  How does it enhance presentation?
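As a rough illustration of the <par> and <seq> timing containers listed above, the sketch below computes the total running time of a SMIL-like fragment. It assumes every clip carries an explicit dur attribute in seconds, that a <seq> lasts as long as the sum of its children, and that a <par> ends when its longest child ends. The file names are hypothetical, and real SMIL has many more timing attributes than this handles.

```python
import xml.etree.ElementTree as ET

# Think TV news: an intro clip, then anchor video with a voice track
presentation = """<seq>
  <video src="intro.mpg" dur="5s"/>
  <par>
    <video src="anchor.mpg" dur="30s"/>
    <audio src="voice.wav" dur="28s"/>
  </par>
</seq>"""

def duration(element):
    """Total duration in seconds of a clip or timing container."""
    if element.tag == "seq":    # children play one after another
        return sum(duration(child) for child in element)
    if element.tag == "par":    # children play at the same time
        return max((duration(child) for child in element), default=0.0)
    return float(element.get("dur", "0s").rstrip("s"))

total = duration(ET.fromstring(presentation))
print(total)
```

The intro plays first, then the video and audio clips play together, so the presentation runs for the intro plus the longer of the two parallel clips.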
43
Summary
  • Representation of knowledge
    • The more you know about the media, the faster you can transmit it and the more compactly you can store it
    • Different formats serve different purposes; the differences aren’t superficial

  • Multimedia representation
    • Trend toward accessibility, not compressibility
    • Separation of compression from format

44
References
  • More on SMIL: http://www.bu.edu/webcentral/learning/smil1/
  • SMIL demos: http://www.ludicrum.org/demos/SMILTimingForTheWeb-Demos.html
  • http://www.geocomm.com/ and http://www.usgs.gov are good spots for GIS information.
  • Genomic DL indexing and retrieval: http://goanna.cs.rmit.edu.au/~jz/fulltext/ieeekade02.pdf
  • JPEG: Pennebaker and Mitchell (93), The JPEG Still Image Data Compression Standard
  • TEI Pizza talk: http://www.tei-c.org/Talks/