1	Representation and digitization of multimedia* Week 2 Min-Yen KAN *Heavily scaled down from original lecture outline :-(
2	Media types in the DL
3	Distribution of media types in the library
Holdings in column order: LoC (Gov’t library), NUS (Acad), U Toronto (Acad). Rows with fewer than three figures have a blank cell whose column is not recoverable here.
  Books and manuscripts: 19 M, 2.2 M, 9.1 M
  Maps: 4 M, 278 K
  Photographs: 12 M, 22.1 K, 622 K
  Music: 2.7 M, 186 K
  Motion pictures: .9 M, 21 K
  CD-ROM Databases: 1.4 K, 2.1 K
Question: is the distribution of what we’d like in the digital library the same as in the automated library?
4	Outline Representation / Digitization Textual images Images Audio Coordinated multimedia
5	Textual images
6	Cost basis for archives
7	Digitization Scanning Binding Planetary scanner Resolution of scan 300 dpi* for access 600 or higher for archival copy
8	Digitization
Purpose:
  Archival: quality, stability in the long term
  Accessibility: delivery, editing, annotation
Steps, from Chowdhury and Chowdhury (03):
  1. Initiate the digitization project
  2. Establish start-up costs and secure funding
  3. Prepare a detailed project plan, including milestones and deliverables
  4. Assess and select materials for digitization
  5. Digitize materials (prepare source materials, digitize, check quality)
  6. Post-process digital materials: edit, OCR, store, catalog and index
  7. Deliver and make materials accessible
  8. Support and maintain materials
9	Document capture costs in USD (ca. 1999)
10	Images of text You’ve scanned in an image like this… What to do with it? How would we like to store and access this information?
11	Storing a textual image Mostly bi-level (two-tone) until recently CCITT Fax III and IV Bi-level transmission and storage standard Optimized for Roman alphabet Textual image compression Codebook of marks A level for access and one for preservation
12	CCITT Fax IV
14	CCITT fax group IV
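The one-dimensional mode of Group III codes each scan line as alternating white and black run lengths, which are then assigned variable-length codes; Group IV instead codes each line relative to the line above it. The sketch below covers only the run-length stage, under the assumption that pixels are given as 0 (white) and 1 (black). The function names are made up for illustration and are not part of any fax library.

```python
def run_lengths(row):
    """Return alternating run lengths for a bi-level row, white run first.

    A row that begins with black yields a leading white run of length 0,
    following the convention that a fax scan line starts with a white run.
    """
    runs = []
    current = 0  # colour of the run being counted; white comes first
    count = 0
    for pixel in row:
        if pixel == current:
            count += 1
        else:
            runs.append(count)
            current = pixel
            count = 1
    runs.append(count)
    return runs


def expand_runs(runs):
    """Invert run_lengths: rebuild the pixel row from the run list."""
    row = []
    colour = 0
    for length in runs:
        row.extend([colour] * length)
        colour = 1 - colour
    return row


if __name__ == "__main__":
    line = [0, 0, 0, 1, 1, 0, 0, 0, 0, 1]
    print(run_lengths(line))
```

Long white runs dominate a typical text page, which is why coding run lengths rather than pixels pays off for bi-level images.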
15	Textual image compression
  1. Find and isolate marks (connected groups of black pixels)
  2. Construct a library of symbols
  3. Identify the symbol closest to each mark and record its coordinates
  4. Store the information
  * Store additional information to reconstruct the original image
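A minimal sketch of the mark and library steps, assuming the page is a list of equal-length strings with '#' for black pixels. It uses 8-connected flood fill to isolate marks and, as a simplification, treats two marks as the same symbol only when their bitmaps are identical; a real system would match each mark to the closest symbol and keep the difference as residue. All names here are hypothetical.

```python
def find_marks(image):
    """Return marks as (top, left, bitmap) tuples, in scan order."""
    height, width = len(image), len(image[0])
    seen = set()
    marks = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != '#' or (y, x) in seen:
                continue
            # Flood fill from this pixel to collect one connected mark.
            stack, pixels = [(y, x)], []
            seen.add((y, x))
            while stack:
                cy, cx = stack.pop()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == '#'
                                and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            stack.append((ny, nx))
            top = min(p[0] for p in pixels)
            bottom = max(p[0] for p in pixels)
            left = min(p[1] for p in pixels)
            right = max(p[1] for p in pixels)
            black = set(pixels)
            # Crop the mark to its bounding box.
            bitmap = tuple(
                ''.join('#' if (r, c) in black else '.'
                        for c in range(left, right + 1))
                for r in range(top, bottom + 1))
            marks.append((top, left, bitmap))
    return marks


def build_library(marks):
    """Return (library, placements): distinct bitmaps and where each occurs."""
    library = []     # symbol id -> bitmap
    placements = []  # (symbol id, top, left) for every mark
    for top, left, bitmap in marks:
        if bitmap not in library:
            library.append(bitmap)
        placements.append((library.index(bitmap), top, left))
    return library, placements
```

Storing one bitmap per symbol plus a short placement record per mark is what makes the scheme compact on pages where the same glyphs recur.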
16	Library
17	Residue
18	Text image outline Storage √ CCITT Fax Group III and IV √ Textual image compression √ Access De-skew Segmentation Media detection
19	De-Skew Projection profile Accumulate Y-axis pixel histogram Rotate to find most crisp histogram One of three common algorithms
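A rough sketch of the projection-profile idea: rotate the black pixels by each candidate angle, count pixels per row, and keep the angle whose histogram is most sharply peaked. The slide does not say how crispness is measured; the sum of squared row counts used here is an assumption, as are the function names and the choice to rotate coordinates rather than resample the image.

```python
import math


def profile_score(points, angle_deg):
    """Sum of squared row counts after rotating black pixels by angle_deg."""
    theta = math.radians(angle_deg)
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    counts = {}
    for x, y in points:
        row = round(-x * sin_t + y * cos_t)  # y-coordinate after rotation
        counts[row] = counts.get(row, 0) + 1
    # Text lines that fall into few rows give a few large counts,
    # which the squares reward.
    return sum(c * c for c in counts.values())


def estimate_skew(points, candidates):
    """Return the candidate angle (degrees) with the most peaked profile."""
    return max(candidates, key=lambda angle: profile_score(points, angle))


if __name__ == "__main__":
    # Four synthetic text baselines drawn with a 3 degree slope.
    slope = math.tan(math.radians(3))
    points = [(x, round(20 * k + x * slope))
              for k in range(1, 5) for x in range(200)]
    print(estimate_skew(points, range(-5, 6)))
```

A finer search would step in fractions of a degree around the best coarse angle.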
20	Segmentation Top-down (e.g., X-Y cut) Bottom-up (e.g. smearing)
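A minimal sketch of top-down segmentation in the X-Y cut style, assuming the page is a list of equal-length strings with '#' for black. It splits a block wherever a run of blank rows (or columns) is at least `min_gap` wide, alternating direction, and stops when neither direction can split. The `min_gap` parameter and the function names are illustrative assumptions, not taken from a specific published variant.

```python
def xy_cut(image, min_gap=2):
    """Return leaf blocks as (top, bottom, left, right), bounds inclusive."""

    def ink_runs(has_ink, lo, hi):
        # Spans of inked positions separated by at least min_gap blanks.
        runs, start, last = [], None, None
        for i in range(lo, hi + 1):
            if not has_ink(i):
                continue
            if start is None:
                start = i
            elif i - last > min_gap:
                runs.append((start, last))
                start = i
            last = i
        if start is not None:
            runs.append((start, last))
        return runs

    def cut(block, by_rows, other_axis_failed):
        top, bottom, left, right = block
        if by_rows:
            runs = ink_runs(lambda r: '#' in image[r][left:right + 1],
                            top, bottom)
            parts = [(a, b, left, right) for a, b in runs]
        else:
            runs = ink_runs(
                lambda c: any(image[r][c] == '#'
                              for r in range(top, bottom + 1)),
                left, right)
            parts = [(top, bottom, a, b) for a, b in runs]
        if len(parts) == 1:
            if other_axis_failed:
                return parts  # no cut possible either way: a leaf block
            return cut(parts[0], not by_rows, True)
        leaves = []
        for part in parts:
            leaves.extend(cut(part, not by_rows, False))
        return leaves

    return cut((0, len(image) - 1, 0, len(image[0]) - 1), True, False)
```

Bottom-up methods such as smearing work the other way round, merging nearby black pixels into larger regions instead of dividing the page.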
21	Classification Separate: Images Text Line art Equations Tables One technique: Slope Histogram (Hough transform)
22	Hough Transform A line-to-point transform In practice, used to find lines in an image (e.g., set of pixels on a line)
23	Hough Transform Create virtual lines for each point Accumulate counts for bin in Hough space
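The voting step can be sketched as follows, assuming the common (theta, rho) parameterisation in which a line satisfies rho = x cos(theta) + y sin(theta). Each black pixel votes for every line that could pass through it, and collinear pixels pile their votes into the same bin. The bin sizes (1 degree, 1 pixel) and the function name are arbitrary choices for illustration.

```python
import math
from collections import Counter


def hough_peak(points, theta_step_deg=1):
    """Vote in (theta, rho) space; return the strongest (theta_deg, rho, votes)."""
    votes = Counter()
    for x, y in points:
        # One "virtual line" through this point for each sampled angle.
        for theta_deg in range(0, 180, theta_step_deg):
            theta = math.radians(theta_deg)
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(theta_deg, rho)] += 1
    (theta_deg, rho), count = votes.most_common(1)[0]
    return theta_deg, rho, count


if __name__ == "__main__":
    # A horizontal line y = 5 plus two stray pixels.
    pixels = [(x, 5) for x in range(100)] + [(3, 40), (70, 12)]
    print(hough_peak(pixels))
```

For classification, the distribution of votes over theta (a slope histogram) matters more than any single peak: text lines concentrate votes at one angle, while photographs spread them out.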
24	Robust Document Understanding OCR and document understanding are (currently) fragile technologies Full scan → OCR → store pipeline makes many assumptions What are some? ________________ ________________ ________________ ________________ ________________
25	A solution (one of many) Courtesy Henry Baird’s ICDAR 03 slides. http://www.cse.lehigh.edu/~baird/Talks/icdar03.ppt#21
26	Image data Raster graphics As an array of pixels Vector graphics As a collection of vectors Which format is appropriate for which images? Maps Photographs Line art For which use? Fidelity? Re-scaling? Compression?
27	GIF / PNG GIF (‘jiff’, Graphics Interchange Format) Stable, lossless color format Compression achieved by: 8-bit format (256 colors) LZW encoding (Unisys patent) __________________________________. Interlacing options for low-bandwidth accessibility PNG (‘ping’, Portable Network Graphics) Uses ____________________________ Up to 48 bits of color (compared to 8 in GIF) Support for alpha channels (transparency) and gamma correction (white balancing)
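To make the LZW step concrete, here is a textbook sketch of the dictionary coder. It is not GIF's exact bitstream: GIF adds variable-width codes, clear and end-of-information codes, and a cap on table size, all of which are left out here. Function names are illustrative.

```python
def lzw_encode(data: bytes):
    """Encode bytes as a list of integer codes (0-255 are single bytes)."""
    table = {bytes([i]): i for i in range(256)}
    codes = []
    current = b""
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate           # keep extending the match
        else:
            codes.append(table[current])  # emit the longest known string
            table[candidate] = len(table)  # learn the new string
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decode(codes):
    """Rebuild the byte string; the table is reconstructed on the fly."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        else:
            # Code refers to the string being defined right now.
            entry = previous + previous[:1]
        out.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(out)
```

The decoder never receives the table; it rebuilds the same entries one step behind the encoder, which is why no dictionary needs to be stored in the file.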
28	Joint Photographic Experts Group (JPEG) Breaks image into 8×8 pixel blocks, each pixel 24 bits (YUV channels = 3×8 bits each) Compresses each block separately, __________________
29	JPEG, continued Transform yields coefficients Ordered from low frequency (gradual change) to high frequency Gradual changes well represented Good for scenery, natural images JPEG 2000 incorporates wavelet compression Better for sharp edges
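The transform in baseline JPEG is the discrete cosine transform (DCT). The sketch below applies a direct, unoptimised 2-D DCT to one block so the low-to-high frequency ordering can be seen; level shifting, quantization, and entropy coding are omitted, and the function name is an illustrative choice.

```python
import math


def dct_2d(block):
    """Type-II 2-D DCT of an N×N block (N = 8 in baseline JPEG).

    coeffs[0][0] is the average-like DC term; higher indices correspond
    to faster variation down (first index) and across (second index).
    """
    n = len(block)

    def scale(k):
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs


if __name__ == "__main__":
    # A smooth left-to-right ramp: energy lands in the lowest frequencies.
    ramp = [[10 * y for y in range(8)] for _ in range(8)]
    for row in dct_2d(ramp)[:2]:
        print([round(c, 1) for c in row])
```

For a gradual ramp like this one, nearly all the energy sits in the first few coefficients, so the high-frequency ones can be coarsely quantized or dropped with little visible change.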
30	Postscript A programming language whose operators draw graphics on the page. Text is deemed a type of graphic. To “draw” a page, you construct the paths used to create the image. A stack-based, usually interpreted language Uses Reverse Polish notation
31	A simple Postscript example
A method to place some text down the left margin of a page. You can use this after the marker for the beginning of a page.
  gsave                    % save graphics state on stack
  90 rotate                % rotate 90 degrees
  100 .55 -72 mul moveto   % go to coords 100, (.55*-72)
  /Times-Roman findfont    % Get the font (set of operators) Times-Roman
  10 scalefont             % set the font size
  setfont                  % Use the specified font
  0.3 setgray              % Change the color to gray
  (PUT NOTE HERE) show     % call the individual operators P,U,T …
                           % to draw letters
  grestore                 % restore the graphics state
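The fragment `100 .55 -72 mul` in the example shows the stack discipline: operands are pushed as they are read, and an operator pops its arguments and pushes the result. A toy evaluator, written in Python rather than Postscript and supporting only four arithmetic operators, illustrates the mechanism. It is a teaching sketch, not a Postscript interpreter.

```python
def eval_rpn(source):
    """Evaluate a whitespace-separated postfix expression; return the stack."""
    binary = {
        "add": lambda a, b: a + b,
        "sub": lambda a, b: a - b,
        "mul": lambda a, b: a * b,
        "div": lambda a, b: a / b,
    }
    stack = []
    for token in source.split():
        if token in binary:
            b = stack.pop()  # top of stack is the second operand
            a = stack.pop()
            stack.append(binary[token](a, b))
        else:
            stack.append(float(token))  # anything else is a number
    return stack


if __name__ == "__main__":
    # Leaves two values on the stack: the x and y that moveto would consume.
    print(eval_rpn("100 .55 -72 mul"))
```

After `mul` runs, two numbers remain on the stack, which is exactly what `moveto` expects to find there.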
32	Portable Document Format An object database Subset of Postscript, makes it faster to process Can use several different compression techniques (e.g., LZW and Huffman) Proprietary Has capabilities for hyperlinks
33	Geospatial Datasets Which image format is best for maps? Hmm, let’s think about it. What goes into a map? ___________________, which provides the position and shapes of specific geographic features. ____________________, which provides additional non-graphic information about each feature. _________________, which describes how the features will appear on the screen.
34	Audio Limit representation to what people can hear Humans: ~ ____________ kHz Highest frequency (pitch) determines storage size. Speech: limited range: up to 3 kHz Music: full frequency range, 20 kHz Can be referred to as its bandwidth
35	Sampling Take continuous signal and discretize Higher sampling rate = better fidelity Nyquist and Shannon show minimum sampling rate = 2 × bandwidth Music: full frequency range: ~ 22 kHz × 2 = 44 K samples/sec Speech: 4 kHz × 2 = 8 K samples/sec
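A short sketch of why the factor of two matters: a pure tone above half the sampling rate is indistinguishable, after sampling, from a lower tone folded back into range. The function names are illustrative, and the folding formula assumes an ideal sampler with no anti-aliasing filter.

```python
def min_sampling_rate(bandwidth_hz):
    """Nyquist rate: twice the highest frequency to be represented."""
    return 2 * bandwidth_hz


def apparent_frequency(tone_hz, sample_rate_hz):
    """Frequency a sampled pure tone appears to have (aliasing by folding)."""
    nearest_multiple = sample_rate_hz * round(tone_hz / sample_rate_hz)
    return abs(tone_hz - nearest_multiple)


if __name__ == "__main__":
    print(min_sampling_rate(4_000))           # speech band from the slide
    print(apparent_frequency(3_000, 8_000))   # below the limit: unchanged
    print(apparent_frequency(5_000, 8_000))   # above the limit: folds down
```

This is why content above half the sampling rate is filtered out before sampling rather than left to fold into the audible band.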
36	Amplitude and Channels Sample at these time intervals to get the amplitude of the signal, covering a total range of ~30-60 dB in loudness Human ear more sensitive to soft sounds Compand amplitude (________________________________________________________) 1 or 2 bytes For each time interval, may have to sample one or more channels Differential coding (joint stereo) Dolby AC-3 = ____ channels Stereo = 2 channels
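One widely used companding curve is mu-law, used in 8-bit digital telephony; the slide does not name a scheme, so treating mu-law as the example is an assumption. The sketch compresses a sample in [-1, 1] logarithmically before quantizing, so that quiet sounds get more of the available levels. Function names and the symmetric quantizer are illustrative.

```python
import math

MU = 255  # the usual mu-law constant for 8-bit telephony


def compress(x):
    """Map a sample in [-1, 1] to [-1, 1], stretching small amplitudes."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def expand(y):
    """Inverse of compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)


def quantize(value, bits=8):
    """Round a value in [-1, 1] to the nearest of the signed levels."""
    levels = 2 ** (bits - 1) - 1
    return round(value * levels) / levels


if __name__ == "__main__":
    quiet = 0.01
    linear_error = abs(quantize(quiet) - quiet)
    companded_error = abs(expand(quantize(compress(quiet))) - quiet)
    print(linear_error, companded_error)
```

For the quiet sample in the demo, quantizing the companded value leaves a much smaller error than quantizing the raw value with the same number of bits.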
37	Storage Requirements (bitrate)
  Digital Music: 44 K samples/sec × 16 bits/sample × 2 channels = ~1.4 M bits/sec
  Digital Voice: 8 K samples/sec × 8 bits/sample × 1 channel = ~64 K bits/sec
  Analog FM stereo: 40 K samples/sec × 8 bits/sample × 3 channels = ~960 K bits/sec
  Telephony: ~6 K samples/sec × 2 bits/sample × 1 channel = ~12 K bits/sec
Formats
  AAC: ______________________
  MP3: ______________________
  GSM: ______________________
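The bitrate figures follow from a single product, sketched here so the rows can be recomputed or extended. The function name is illustrative; the inputs are the ones given on the slide.

```python
def bitrate(samples_per_sec, bits_per_sample, channels):
    """Uncompressed bitrate in bits per second."""
    return samples_per_sec * bits_per_sample * channels


if __name__ == "__main__":
    rows = {
        "Digital music": (44_000, 16, 2),
        "Digital voice": (8_000, 8, 1),
        "Analog FM stereo": (40_000, 8, 3),
        "Telephony": (6_000, 2, 1),
    }
    for name, args in rows.items():
        print(f"{name}: {bitrate(*args):,} bits/sec")
```

Dividing by 8 gives bytes per second, so an hour of uncompressed digital music at these settings comes to roughly 630 MB.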
38	Putting media together Have multimedia, will travel…
39	XML A basis for many other technologies No semantics (eXtensible, not rigid), just allows for hierarchical containment A meta markup language
40	XML, continued Features: Separation of content from presentation Content: Document Type Definition (DTD), optional Presentation: _____________________, __________________ Enhanced hyperlinking capabilities Bidirectional linking Finer grained linking (XPointer)
41	Text Encoding Initiative To encode knowledge “of literary and linguistic texts for online research and teaching” better interchange and integration of scholarly data support for all texts, in all languages, from all periods guidance for the perplexed: ___ to encode --- hence, a user-driven codification of existing best practice assistance for the specialist: ___ to encode --- hence, a loose framework into which unpredictable extensions can be fitted The “beef” in XML. All the semantics and none of the filling. It’s quite filling, weighing in at 600 K words! (Think 8 kg of books)
42	Synchronized Multimedia Integration Language :-) A script for orchestrating a presentation Think TV news Basics: Define a root window Layers Timing <par> parallel playback <seq> sequential playback Media clips have begin and end attributes To think about: what’s the alternative format to SMIL? How does it enhance presentation?
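The two timing containers compose simply: a <par> block lasts as long as its longest child, and a <seq> block lasts as long as its children added together. The sketch below computes the length of a presentation under strong simplifying assumptions: every media clip carries a `dur` attribute in whole or fractional seconds, and `begin`/`end` offsets are ignored. The clip file names and the function name are made up.

```python
import xml.etree.ElementTree as ET

# A news-style presentation: video and voice-over together, then a still.
PRESENTATION = """
<seq>
  <par>
    <video src="anchor.mpg" dur="10s"/>
    <audio src="voice.wav" dur="12s"/>
  </par>
  <img src="map.png" dur="5s"/>
</seq>
"""


def duration(node):
    """Length in seconds of a <par>, a <seq>, or a single media clip."""
    if node.tag == "par":
        return max((duration(child) for child in node), default=0.0)
    if node.tag == "seq":
        return sum(duration(child) for child in node)
    return float(node.get("dur", "0").rstrip("s"))


if __name__ == "__main__":
    print(duration(ET.fromstring(PRESENTATION)))
```

Because the schedule is declared rather than baked into a single video file, individual clips can be swapped or re-timed without re-encoding the whole presentation.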
43	Summary Representation of knowledge The more you know about the media, the faster, smaller you can transmit and store it Different formats for different purposes, difference isn’t superficial Multimedia representation Trend toward accessibility, not compressibility Separation of compression from format
44	References
  More on SMIL: http://www.bu.edu/webcentral/learning/smil1/
  SMIL demos: http://www.ludicrum.org/demos/SMILTimingForTheWeb-Demos.html
  GIS information: http://www.geocomm.com/ and http://www.usgs.gov
  Genomic DL indexing and retrieval: http://goanna.cs.rmit.edu.au/~jz/fulltext/ieeekade02.pdf
  JPEG: Pennebaker and Mitchell (93), The JPEG Still Image Data Compression Standard
  TEI Pizza talk: http://www.tei-c.org/Talks/