|
1
|
- Module 4 Min-Yen KAN
- *Portions of this lecture are based on the Managing Gigabytes textbook
|
|
3
|
- Scanning
- Binding
- Planetary scanner
- Resolution of scan
- 300 dpi for access
- 600 or higher for archival copy
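To see why resolution matters for storage, the uncompressed size of a scan can be estimated from page size, resolution, and bit depth. The sketch below is only illustrative: the function name `raw_scan_bytes` and the US Letter page size (8.5 x 11 in) are assumptions, not part of the lecture.

```python
def raw_scan_bytes(width_in, height_in, dpi, bits_per_pixel=1):
    """Uncompressed size in bytes of a scanned page.

    bits_per_pixel=1 models a bi-level (black/white) scan.
    """
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


# Assumed page size: US Letter, 8.5 x 11 inches (illustration only).
access = raw_scan_bytes(8.5, 11, 300)    # access copy, bi-level
archival = raw_scan_bytes(8.5, 11, 600)  # archival copy, bi-level
print(access, archival, archival / access)
```

Doubling the resolution quadruples the pixel count, so the archival copy is about four times the size of the access copy before compression.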
|
|
4
|
- Purpose:
- ________
- Quality
- Stability in the long term
- _______
- Delivery
- Editing
- Annotation
- Initiate the digitization project
- Establish start-up costs and secure funding
- Prepare a detailed project plan, including milestones and deliverables
- Assess and select materials for digitization
- Digitize materials (prepare source materials, digitize, check quality)
- Post-process digital materials: edit, OCR, store, catalog and index
- Deliver and make materials accessible
- Support and maintenance of materials
- -- From Chowdhury and Chowdhury (03)
|
|
6
|
- You’ve scanned in an image like this…
- What to do with it?
- How would we like to store and access this information?
|
|
7
|
- Mostly bi-level (two-tone)
- CCITT Fax Group III and IV
- Bi-level transmission and storage standard
- Optimized for Roman alphabet
- Textual image compression
- Codebook of marks
- A level for access and one for preservation
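Bi-level fax coding builds on the observation that a scan line is a sequence of alternating white and black runs. The sketch below shows only that run-length step. The function names are hypothetical, and the variable-length code tables that the standards assign to each run length are not reproduced here.

```python
def run_lengths(row):
    """Return alternating white/black run lengths for one scan line.

    `row` is a sequence of 0 (white) and 1 (black). Following the fax
    convention, the first run is white, so a line that starts with a
    black pixel gets a leading white run of length 0.
    """
    runs = []
    current = 0  # colour of the run being counted; lines start with white
    count = 0
    for pixel in row:
        if pixel == current:
            count += 1
        else:
            runs.append(count)
            current = pixel
            count = 1
    runs.append(count)
    return runs


def expand_runs(runs):
    """Inverse of run_lengths: rebuild the scan line from its runs."""
    row = []
    colour = 0
    for length in runs:
        row.extend([colour] * length)
        colour = 1 - colour
    return row


line = [0, 0, 0, 1, 1, 0, 0, 0, 0, 1]
print(run_lengths(line))
```

Long white runs dominate typical text pages, which is why storing run lengths instead of pixels pays off.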
|
|
11
|
- Find and isolate marks (connected group of black pixels)
- Construct library of symbols
- Identify the symbol closest to each mark and get coordinates
- Store information
- *Store additional information to reconstruct original image
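The steps above can be sketched in a few lines. This is a simplified illustration, not the textbook's implementation: marks are found as 8-connected components, and the library uses exact bitmap matches rather than closest matches, so no extra residue information is needed to rebuild the image. All function names are hypothetical.

```python
from collections import deque


def find_marks(image):
    """Return (top, left, bitmap) for each 8-connected group of black pixels."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    marks = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search collects every pixel of this mark.
            queue = deque([(y, x)])
            seen[y][x] = True
            pixels = set()
            while queue:
                cy, cx = queue.popleft()
                pixels.add((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            top = min(p[0] for p in pixels)
            bottom = max(p[0] for p in pixels)
            left = min(p[1] for p in pixels)
            right = max(p[1] for p in pixels)
            # Crop the mark to its bounding box.
            bitmap = tuple(
                tuple(1 if (r, c) in pixels else 0 for c in range(left, right + 1))
                for r in range(top, bottom + 1)
            )
            marks.append((top, left, bitmap))
    return marks


def compress(image):
    """Build a symbol library and a list of (symbol index, top, left)."""
    library, placements = [], []
    for top, left, bitmap in find_marks(image):
        if bitmap not in library:  # exact match only (simplification)
            library.append(bitmap)
        placements.append((library.index(bitmap), top, left))
    return library, placements


def reconstruct(library, placements, height, width):
    """Paint each library symbol back at its stored coordinates."""
    image = [[0] * width for _ in range(height)]
    for index, top, left in placements:
        for r, row in enumerate(library[index]):
            for c, bit in enumerate(row):
                if bit:
                    image[top + r][left + c] = 1
    return image
```

With approximate (closest-symbol) matching the library gets much smaller on real scans, but the differences between each mark and its symbol must then be stored separately if the original image is to be recovered exactly.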
|
|
12
|
- Storage √
- CCITT Fax Group III and IV √
- Textual image compression √
- Access
- De-skew
- Segmentation
- Media detection
|
|
13
|
- Projection profile
- Accumulate a histogram of black pixels along the Y-axis (one count per row)
- Rotate the image and keep the angle that gives the crispest (most sharply peaked) histogram
- One of three common algorithms
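A minimal sketch of the projection-profile idea follows. It rotates the coordinates of the black pixels rather than the image itself, and it measures how "crisp" a histogram is by the sum of squared row counts. Both choices, along with the function names and the candidate angle grid, are assumptions made for illustration.

```python
import math


def profile_sharpness(points, angle_deg):
    """Score the row histogram obtained after rotating points by angle_deg.

    `points` is an iterable of (x, y) black-pixel coordinates. The score is
    the sum of squared row counts, which is largest when pixels pile up in
    few rows, i.e. when text lines are horizontal.
    """
    theta = math.radians(angle_deg)
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    histogram = {}
    for x, y in points:
        row = round(x * sin_t + y * cos_t)  # y-coordinate after rotation
        histogram[row] = histogram.get(row, 0) + 1
    return sum(count * count for count in histogram.values())


def estimate_deskew_angle(points, candidates):
    """Return the candidate rotation angle with the crispest profile."""
    return max(candidates, key=lambda angle: profile_sharpness(points, angle))


def rotate(points, angle_deg):
    """Rotate points about the origin (used here to fake a skewed scan)."""
    theta = math.radians(angle_deg)
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    return [(x * cos_t - y * sin_t, x * sin_t + y * cos_t) for x, y in points]


# Synthetic page: three horizontal "text lines", then skewed by 3 degrees.
text_lines = [(x, y) for y in (10, 20, 30) for x in range(100)]
skewed = rotate(text_lines, 3.0)
candidates = [i * 0.5 for i in range(-10, 11)]  # -5.0 to 5.0 in 0.5 steps
print(estimate_deskew_angle(skewed, candidates))
```

The returned value is the rotation to apply to the scan, so it has the opposite sign of the skew that was introduced.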
|
|
14
|
- Top-down
- (e.g., _______)
- Bottom-up
- (e.g. _________)
|
|
15
|
- Separate:
- Images
- Text
- Line art
- Equations
- Tables
- One technique:
- Slope Histogram (Hough transform)
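The slide names the technique without detail, so the following is only a sketch of its building block: Hough-transform voting in (angle, distance) space, reduced to one value per angle. Summarizing each angle by its strongest line, and the function names, are assumptions. How the resulting histogram is then used to tell text from images or line art is not specified in the lecture and is not shown.

```python
import math


def hough_accumulator(points, angles_deg):
    """Vote in (theta, rho) space using rho = x*cos(theta) + y*sin(theta).

    theta is the angle of the line's normal, so a horizontal line
    collects its votes at theta = 90 degrees.
    """
    accumulator = {}
    for angle in angles_deg:
        theta = math.radians(angle)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        for x, y in points:
            rho = round(x * cos_t + y * sin_t)
            key = (angle, rho)
            accumulator[key] = accumulator.get(key, 0) + 1
    return accumulator


def slope_histogram(points, angles_deg):
    """For each angle, report the vote count of the strongest line."""
    histogram = {angle: 0 for angle in angles_deg}
    for (angle, _rho), votes in hough_accumulator(points, angles_deg).items():
        histogram[angle] = max(histogram[angle], votes)
    return histogram


# A single horizontal stroke of 50 black pixels at y = 5.
stroke = [(x, 5) for x in range(50)]
histogram = slope_histogram(stroke, [0, 45, 90, 135])
print(max(histogram, key=histogram.get))
```

A region whose histogram has one or two strong peaks contains long straight structures, while a flat histogram suggests no dominant direction.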
|