1
|
- Week 2 Min-Yen KAN
- *Heavily scaled down from original lecture outline :-(
|
2
|
|
3
|
- LoC NUS U Toronto
- Library Type Gov’t Acad Acad
- Books and manuscripts 19 M 2.2M 9.1 M
- Maps 4 M 278 K
- Photographs 12 M 22.1 K 622 K
- Music 2.7M 186 K
- Motion pictures .9 M 21 K
- CD-ROM Databases 1.4K 2.1 K
- Question: is the distribution of what we’d like in the digital library
the same as in the automated library?
|
4
|
- Representation / Digitization
- Textual images
- Images
- Audio
- Coordinated multimedia
|
5
|
|
6
|
|
7
|
- Scanning
- Binding
- Planetary scanner
- Resolution of scan
- 300 dpi* for access
- 600 or higher for archival copy
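To make the resolution trade-off concrete, here is a rough sketch of how scan resolution drives image size. The 8.5 × 11 inch page and the 1-bit (bi-level) depth are assumptions chosen for illustration, not figures from the lecture.

```python
def scan_size(width_in, height_in, dpi, bits_per_pixel=1):
    """Return (pixels_wide, pixels_high, uncompressed_bytes) for one page."""
    px_w = round(width_in * dpi)
    px_h = round(height_in * dpi)
    total_bits = px_w * px_h * bits_per_pixel
    return px_w, px_h, (total_bits + 7) // 8  # round up to whole bytes

# Assumed page size: 8.5 x 11 inches, scanned bi-level (1 bit per pixel).
for dpi in (300, 600):
    w, h, size = scan_size(8.5, 11, dpi)
    print(f"{dpi} dpi: {w} x {h} pixels, {size / 1e6:.2f} MB uncompressed")
```

Doubling the resolution quadruples the pixel count, which is one reason to keep a separate, lighter access copy.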
|
8
|
- Purpose:
- Archival
- Quality
- Stability in the long term
- Accessibility
- Delivery
- Editing
- Annotation
- Initiate the digitization project
- Establish start-up costs and secure funding
- Prepare a detailed project plan, including milestones and deliverables
- Assess and select materials for digitization
- Digitize materials (prepare source materials, digitize, check quality)
- Post-process digital materials: edit, OCR, store, catalog and index
- Deliver and make materials accessible
- Support and maintenance of materials
- -- From Chowdhury and Chowdhury (03)
|
9
|
|
10
|
- You’ve scanned in an image like this…
- What to do with it?
- How would we like to store and access this information?
|
11
|
- Mostly bi-level (two-tone) until recently
- CCITT Fax III and IV
- Bi-level transmission and storage standard
- Optimized for Roman alphabet
- Textual image compression
- Codebook of marks
- A level for access and one for preservation
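Fax-style bi-level coding starts from the observation that a scan line is mostly long runs of white and black. The sketch below only extracts the alternating run lengths; the real CCITT schemes then map those runs to standardized variable-length codes, which are omitted here. The function name is hypothetical.

```python
def run_lengths(row):
    """Alternating run lengths for one scan line of 0 (white) / 1 (black).

    By convention the first run is white, so a line that starts with a
    black pixel yields a leading white run of length 0.
    """
    runs = []
    current = 0  # colour of the run being counted; start with white
    count = 0
    for pixel in row:
        if pixel == current:
            count += 1
        else:
            runs.append(count)
            current = pixel
            count = 1
    runs.append(count)
    return runs

print(run_lengths([0, 0, 0, 1, 1, 0, 0, 0, 0, 1]))
```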
|
12
|
|
13
|
|
14
|
|
15
|
- Find and isolate marks (connected group of black pixels)
- Construct library of symbols
- Identify the symbol closest to each mark and get coordinates
- Store information
- *Store additional information to reconstruct original image
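The steps above can be sketched on a tiny bi-level image. This is a simplified illustration, not the lecture's method: marks are found by flood fill with 8-connectivity, and a mark only reuses a library symbol on an exact shape match, whereas a real system would pick the closest symbol and store the residue. All names are hypothetical.

```python
def find_marks(image):
    """Return marks as (top, left, shape); shape is a frozenset of
    (row, col) offsets relative to the mark's bounding-box corner."""
    rows, cols = len(image), len(image[0])
    seen, marks = set(), []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or (r, c) in seen:
                continue
            stack, pixels = [(r, c)], []
            seen.add((r, c))
            while stack:  # flood fill one connected group of black pixels
                y, x = stack.pop()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1
                                and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            stack.append((ny, nx))
            top = min(y for y, _ in pixels)
            left = min(x for _, x in pixels)
            shape = frozenset((y - top, x - left) for y, x in pixels)
            marks.append((top, left, shape))
    return marks

def encode(image):
    """Build the symbol library and the list of (symbol, top, left)."""
    library, placements = [], []
    for top, left, shape in find_marks(image):
        if shape not in library:  # exact match only, for simplicity
            library.append(shape)
        placements.append((library.index(shape), top, left))
    return library, placements

image = [
    [1, 1, 0, 0, 1, 1, 0],
    [1, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0],
]
library, placements = encode(image)
print(len(library), placements)
```

Three marks are stored with only two library symbols, because the first two marks share a shape.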
|
16
|
|
17
|
|
18
|
- Storage √
- CCITT Fax Group III and IV √
- Textual image compression √
- Access
- De-skew
- Segmentation
- Media detection
|
19
|
- Projection profile
- Accumulate Y-axis pixel histogram
- Rotate to find most crisp histogram
- One of three common algorithms
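A minimal sketch of projection-profile de-skewing, under stated assumptions: black pixels are given as (x, y) points, "crispness" is scored as the sum of squared row counts (one common choice, not necessarily the lecture's), and candidate angles are searched in 0.5 degree steps. Function names and the synthetic test page are hypothetical.

```python
import math

def profile_score(points, angle_deg):
    """Crispness of the row histogram after rotating points by angle_deg."""
    a = math.radians(angle_deg)
    sin_a, cos_a = math.sin(a), math.cos(a)
    hist = {}
    for x, y in points:
        row = round(x * sin_a + y * cos_a)  # rotated y-coordinate, binned
        hist[row] = hist.get(row, 0) + 1
    # Sharp peaks separated by empty rows give a large sum of squares.
    return sum(n * n for n in hist.values())

def estimate_correction(points, max_angle=10.0, step=0.5):
    """Return the rotation (degrees) that gives the crispest profile."""
    steps = int(max_angle / step)
    candidates = [i * step for i in range(-steps, steps + 1)]
    return max(candidates, key=lambda ang: profile_score(points, ang))

# Synthetic page: three text lines, then skewed by +3 degrees.
skew = math.radians(3.0)
lines = [(x, y) for y in (0, 20, 40) for x in range(200)]
skewed = [(x * math.cos(skew) - y * math.sin(skew),
           x * math.sin(skew) + y * math.cos(skew)) for x, y in lines]
correction = estimate_correction(skewed)
print(correction)
```

The returned value is the rotation to apply, so a page skewed by +3 degrees should yield a correction of about -3 degrees.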
|
20
|
- Top-down
- (e.g., X-Y cut)
- Bottom-up
- (e.g., smearing)
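As an illustration of the top-down approach, here is a small recursive X-Y cut sketch. It alternates between cutting along blank rows and blank columns and stops when neither direction yields a cut. The `min_gap` threshold and all names are assumptions for this example; real implementations tune the gap per direction and handle noise.

```python
def segments(occupied, min_gap):
    """Split a list of booleans into (start, end) content spans, merging
    spans separated by fewer than min_gap blank positions."""
    spans = []
    for i, filled in enumerate(occupied):
        if not filled:
            continue
        if spans and i - spans[-1][1] < min_gap:
            spans[-1][1] = i + 1      # bridge a small gap
        else:
            spans.append([i, i + 1])  # start a new span
    return [tuple(s) for s in spans]

def xy_cut(image, top, bottom, left, right, min_gap=2,
           horizontal=True, tried_other=False):
    """Return leaf blocks as (top, bottom, left, right), end-exclusive."""
    if horizontal:   # cut along blank rows
        occ = [any(image[r][left:right]) for r in range(top, bottom)]
        spans = [(top + s, top + e, left, right)
                 for s, e in segments(occ, min_gap)]
    else:            # cut along blank columns
        occ = [any(image[r][c] for r in range(top, bottom))
               for c in range(left, right)]
        spans = [(top, bottom, left + s, left + e)
                 for s, e in segments(occ, min_gap)]
    if not spans:
        return []
    if len(spans) == 1:
        if tried_other:
            return [spans[0]]  # no cut in either direction: a leaf block
        t, b, l, r = spans[0]
        return xy_cut(image, t, b, l, r, min_gap, not horizontal, True)
    blocks = []
    for t, b, l, r in spans:
        blocks += xy_cut(image, t, b, l, r, min_gap, not horizontal, False)
    return blocks

# A full-width header above a two-column body.
page = [[1] * 8, [1] * 8, [0] * 8, [0] * 8] + [[1, 1, 1, 0, 0, 1, 1, 1]] * 3
blocks = xy_cut(page, 0, len(page), 0, 8)
print(blocks)
```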
|
21
|
- Separate:
- Images
- Text
- Line art
- Equations
- Tables
- One technique:
- Slope Histogram (Hough transform)
|
22
|
- A line-to-point transform
- In practice, used to find lines in an image (e.g., a set of pixels lying on a line)
|
23
|
- Create virtual lines for each point
- Accumulate counts for bin in Hough space
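The voting procedure can be sketched as follows, assuming the common (theta, rho) parameterization where a line satisfies rho = x·cos(theta) + y·sin(theta). Each point votes for every line that could pass through it, and the fullest bin identifies the dominant line. The bin sizes and function names are assumptions for this example.

```python
import math

def hough_accumulate(points, theta_step_deg=1, rho_step=1.0):
    """Vote in (theta_degrees, rho_bin) space for every input point."""
    acc = {}
    for x, y in points:
        for t in range(0, 180, theta_step_deg):
            theta = math.radians(t)
            rho = x * math.cos(theta) + y * math.sin(theta)
            key = (t, round(rho / rho_step))
            acc[key] = acc.get(key, 0) + 1
    return acc

def strongest_line(points):
    """Return (theta_degrees, rho_bin, votes) of the fullest bin."""
    acc = hough_accumulate(points)
    (t, rho_bin), votes = max(acc.items(), key=lambda kv: kv[1])
    return t, rho_bin, votes

# A horizontal line y = 5 plus two stray pixels.
points = [(x, 5) for x in range(100)] + [(3, 40), (70, 12)]
theta_deg, rho_bin, votes = strongest_line(points)
print(theta_deg, rho_bin, votes)
```

A horizontal line has its normal at 90 degrees, so the peak is expected in the theta = 90 column at a distance equal to the line's y value.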
|
24
|
- OCR and document understanding are (currently) fragile technologies
- Full scan ⇒ OCR ⇒ store pipeline makes many assumptions
- What are some?
- ________________
- ________________
- ________________
- ________________
- ________________
|
25
|
- Courtesy Henry Baird’s ICDAR 03 slides.
- http://www.cse.lehigh.edu/~baird/Talks/icdar03.ppt#21
|
26
|
- Raster graphics
- Vector graphics
- As a collection of vectors
- Which format appropriate for which images?
- Maps
- Photographs
- Line art
- For which use?
- Fidelity?
- Re-scaling?
- Compression?
|
27
|
- GIF (‘jiff’, Graphics Interchange Format)
- Stable, lossless color format
- Compression achieved by:
- 8-bit format (256 colors)
- LZW encoding (Unisys patent)
- __________________________________.
- Interlacing options for low-bandwidth accessibility
- PNG (‘ping’, Portable Network Graphics)
- Uses ____________________________
- Up to 48 bits of color (compared to 8 in GIF)
- Support for alpha channels (transparency) and gamma correction (consistent brightness across displays)
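Since LZW is named above as GIF's encoding, a textbook LZW compressor is sketched here to show the dictionary-building idea. This is the plain algorithm only: GIF's actual bitstream adds variable-width codes and clear/end codes, which are omitted.

```python
def lzw_compress(data: bytes):
    """Return a list of integer codes for the input bytes."""
    table = {bytes([i]): i for i in range(256)}  # single-byte entries
    next_code = 256
    current = b""
    out = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate           # keep extending the match
        else:
            out.append(table[current])    # emit the longest known prefix
            table[candidate] = next_code  # learn the new string
            next_code += 1
            current = bytes([byte])
    if current:
        out.append(table[current])
    return out

codes = lzw_compress(b"TOBEORNOTTOBEORTOBEORNOT")
print(len(codes), codes)
```

Repeated substrings are replaced by codes of 256 and above, so the 24 input bytes shrink to 16 codes in this example.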
|
28
|
- Breaks image into 8×8 pixel blocks, each pixel 24 bits (3 YUV channels × 8 bits each)
- Compresses each block separately, __________________
|
29
|
- Transform yields coefficients
- Ordered from low frequency (gradual change) to high frequency
- Gradual changes well represented
- Good for scenery, natural images
- JPEG 2000 incorporates wavelet compression
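To illustrate why gradual changes are well represented, the sketch below applies a direct (unoptimized) 8×8 two-dimensional DCT, the transform used by baseline JPEG, to a single channel block. The level shift, quantization, and entropy coding steps of JPEG are left out, and the function name is hypothetical.

```python
import math

N = 8

def dct_8x8(block):
    """2-D DCT-II of an 8x8 block, with the usual JPEG normalization."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0
    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = 0.25 * c(u) * c(v) * total
    return coeffs

# A perfectly flat block: all energy lands in the lowest-frequency term.
flat = [[100] * N for _ in range(N)]
flat_coeffs = dct_8x8(flat)
print(round(flat_coeffs[0][0], 3))
```

For a flat block, every coefficient except the first is (numerically) zero, so the block can be stored as a single number.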
|
30
|
- A programming language whose operators draw graphics on the page.
- Text is deemed a type of graphic
- To “draw” a page, you construct paths that are used to create the image.
- A stack-based, usually interpreted language
- Uses reverse Polish notation
|
31
|
- A method to place some text down the left margin of a page.
- You can use this after the marker for the beginning of a page.
- gsave % save graphics state on stack
- 90 rotate % rotate 90 degrees
- 100 .55 -72 mul moveto % go to coords 100, (.55*-72)
- /Times-Roman findfont % Get the font (set of operators) Times-Roman
- 10 scalefont % set the font size
- setfont % Use the specified font
- 0.3 setgray % Change the color to gray
- (PUT NOTE HERE) show % call the individual operators P,U,T …
- % to draw letters
- grestore % restore the graphics state
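To see how the stack evaluates a line such as `100 .55 -72 mul`, here is a toy evaluator for the arithmetic part only. It is a sketch of the stack discipline, not a PostScript interpreter: only the `add`, `sub`, `mul`, and `div` operators are handled, and every other token is treated as a number.

```python
def run_rpn(program):
    """Evaluate whitespace-separated tokens and return the final stack."""
    binary_ops = {
        "add": lambda a, b: a + b,
        "sub": lambda a, b: a - b,
        "mul": lambda a, b: a * b,
        "div": lambda a, b: a / b,
    }
    stack = []
    for token in program.split():
        if token in binary_ops:
            b = stack.pop()  # operands come off in reverse order
            a = stack.pop()
            stack.append(binary_ops[token](a, b))
        else:
            stack.append(float(token))
    return stack

# The two numbers left on the stack are the x and y that moveto consumes.
stack = run_rpn("100 .55 -72 mul")
print(stack)
```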
|
32
|
- An object database
- Subset of PostScript, which makes it faster to process
- Can use several different compression techniques (e.g., LZW and Huffman)
- Proprietary
- Has capabilities for hyperlinks
|
33
|
- Which image format is best for maps?
- Hmm, let’s think about it. What goes into a map?
- ___________________, which provides the position and shapes of specific geographic features.
- ____________________, which provides additional non-graphic information about each feature.
- _________________, which describes how the features will appear on the screen.
|
34
|
- Limit representation to what people can hear
- Humans: ~ ____________ kHz
- Highest frequency (pitch) determines storage size.
- Speech: limited range, up to 3 kHz
- Music: full frequency range, up to 20 kHz
- Can be referred to as its bandwidth
|
35
|
- Take continuous signal and discretize
- Higher sampling rate = better fidelity
- Nyquist and Shannon show that the minimum sampling rate = 2 × bandwidth
- Music (full frequency range): ~22 kHz × 2 = 44 kHz
- Speech: 4 kHz × 2 = 8 kHz
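The sampling rule can be checked with a few lines of arithmetic. The second function shows what goes wrong below the minimum rate: a pure tone above half the sampling rate folds back to a lower apparent frequency (aliasing). Both function names are hypothetical.

```python
def nyquist_rate(bandwidth_hz):
    """Minimum sampling rate for a signal of the given bandwidth."""
    return 2 * bandwidth_hz

def alias_frequency(signal_hz, sample_rate_hz):
    """Apparent frequency of a pure tone after sampling."""
    folded = signal_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

print(nyquist_rate(22_000))          # music
print(nyquist_rate(4_000))           # speech
print(alias_frequency(5_000, 8_000)) # a 5 kHz tone sampled at 8 kHz
```

A 5 kHz tone sampled at the speech rate of 8 kHz is indistinguishable from a 3 kHz tone, which is why the signal must be band-limited before sampling.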
|
36
|
- Sampling at these time intervals to get amplitude of signal
- a total of ~30-60 dB in loudness
- Human ear more sensitive to soft sounds
- Compand amplitude
(________________________________________________________)
- 1 or 2 bytes
- For each time interval, may have to sample one or more channels
- Differential coding (joint stereo)
- Dolby AC-3 = ____ channels
- Stereo = 2 channels
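As one concrete example of companding, the sketch below uses the μ-law curve with μ = 255, a scheme widely used in telephony. Choosing μ-law here is an assumption for illustration; the lecture does not say which scheme it has in mind. Samples are assumed to be normalized to the range -1 to 1.

```python
import math

MU = 255  # assumed: the usual mu-law parameter for 8-bit telephony

def compress(x):
    """Map a sample in [-1, 1] onto the companded scale in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def expand(y):
    """Inverse of compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

for sample in (0.01, 0.1, 1.0):
    print(sample, round(compress(sample), 3))
```

A sample at 1% of full scale already uses more than a fifth of the companded range, so quiet sounds keep more resolution when the result is quantized to 1 byte.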
|
37
|
- Digital Music:
- 44 K samples/sec × 16 bits/sample × 2 channels = ~1.4 M bits/sec
- Digital Voice:
- 8 K samples/sec × 8 bits/sample × 1 channel = 64 K bits/sec
- Analog
- FM stereo: 40 K samples/sec × 8 bits/sample × 3 channels = ~960 K bits/sec
- Telephony: ~6 K samples/sec × 2 bits/sample × 1 channel = ~12 K bits/sec
- Formats
- AAC: ______________________
- MP3: ______________________
- GSM: ______________________
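The uncompressed rates on this slide all follow from one product, sketched here with the slide's own figures. The function name is hypothetical.

```python
def bit_rate(samples_per_sec, bits_per_sample, channels):
    """Uncompressed bit rate in bits per second."""
    return samples_per_sec * bits_per_sample * channels

rates = {
    "digital music": bit_rate(44_000, 16, 2),
    "digital voice": bit_rate(8_000, 8, 1),
    "FM stereo": bit_rate(40_000, 8, 3),
    "telephony": bit_rate(6_000, 2, 1),
}
for name, bps in rates.items():
    print(f"{name}: {bps / 1000:.0f} K bits/sec")
```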
|
38
|
- Have multimedia, will travel…
|
39
|
- A basis for many other technologies
- No semantics (eXtensible, not rigid), just allows for hierarchical
containment
- A meta markup language
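Hierarchical containment is easy to see by parsing a small document and walking its tree. The sketch uses Python's standard `xml.etree.ElementTree`; the element names in the sample document are made up for illustration and carry no meaning to the parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: XML itself attaches no semantics to these tags.
doc = """<library>
  <book id="b1">
    <title>Example Title</title>
    <chapter n="1"><para>First paragraph.</para></chapter>
  </book>
</library>"""

def outline(element, depth=0):
    """Return the tag hierarchy as a list of indented lines."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines += outline(child, depth + 1)
    return lines

root = ET.fromstring(doc)
print("\n".join(outline(root)))
```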
|
40
|
- Features:
- Separation of content from presentation
- Content: Document Type Definition (DTD), optional
- Presentation: _____________________,
- __________________
- Enhanced hyperlinking capabilities
- Bidirectional linking
- Finer grained linking (XPointer)
|
41
|
- To encode knowledge “of literary and linguistic texts for online research and teaching”
- better interchange and integration of scholarly data
- support for all texts, in all languages, from all periods
- guidance for the perplexed: ___ to encode --- hence, a user-driven
codification of existing best practice
- assistance for the specialist: ___ to encode --- hence, a loose
framework into which unpredictable extensions can be fitted
- The “beef” in XML. All the semantics and none of the filling. It’s quite filling, weighing in at 600 K words! (Think 8 kg of books)
|
42
|
- A script for orchestrating a presentation
- Basics:
- Define a root window
- Layers
- Timing
- <par> parallel playback
- <seq> sequential playback
- Media clips have begin and end attributes
- To think about: what’s the alternative format to SMIL? How does it enhance presentation?
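The timing model can be sketched by computing how long a presentation runs: children of `<seq>` add up, while children of `<par>` overlap. This is a deliberate simplification, assuming times are plain seconds and that each clip's `end` is measured from the start of its slot; real SMIL clock values and synchronization rules are richer. The file names are hypothetical.

```python
import xml.etree.ElementTree as ET

presentation = """<seq>
  <par>
    <video src="talk.mpg" begin="0" end="30"/>
    <audio src="intro.wav" begin="5" end="20"/>
  </par>
  <img src="credits.png" begin="0" end="10"/>
</seq>"""

def total_time(node):
    """Running time in seconds under the simplified model above."""
    children = list(node)
    if node.tag == "seq":   # one after another
        return sum(total_time(child) for child in children)
    if node.tag == "par":   # all at once; the longest child decides
        return max((total_time(child) for child in children), default=0.0)
    return float(node.get("end", "0"))  # a media clip

length = total_time(ET.fromstring(presentation))
print(length)
```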
|
43
|
- Representation of knowledge
- The more you know about the media, the faster and more compactly you can transmit and store it
- Different formats for different purposes; the differences aren’t superficial
- Multimedia representation
- Trend toward accessibility, not compressibility
- Separation of compression from format
|
44
|
- More on SMIL: http://www.bu.edu/webcentral/learning/smil1/
- SMIL demos: http://www.ludicrum.org/demos/SMILTimingForTheWeb-Demos.html
- http://www.geocomm.com/ and http://www.usgs.gov are good spots for GIS
information.
- Genomic DL indexing and retrieval: http://goanna.cs.rmit.edu.au/~jz/fulltext/ieeekade02.pdf
- JPEG: Pennebaker and Mitchell (93), The JPEG Still Image Data Compression Standard
- TEI Pizza talk: http://www.tei-c.org/Talks/
|