1
Open Archives Initiative (OAI)
  • A low-barrier interoperable standard for the dissemination of content
    • In principle, not tied to a specific purpose
    • Note: open in terms of open architecture, not necessarily free


  • Protocol for Metadata Harvesting
    • Defines a standard for advertising metadata in a repository.
    • Standard packages for harvesting have been defined.
  • DP 9
    • A standard for exposing metadata to web crawlers as web pages.
2
Identifiers
  • Week 4      Min-Yen KAN
  • *Partially based on William Arms's presentation at Cornell University
  • Modified by permission
3
You see this every day…
4
Desirable Properties of Identifiers
  • Location independent name
  • Globally unique
  • Persistent across time
  • Choice of human generated or automatic generation
  • Fast resolution
  • Decentralized administration
  • Supported from standard user interfaces
5
Identifier systems
  • We’ll look at several different systems today


  • URN
  • PURL
  • DOI
  • OpenURL
6
Uniform Resource Names (URN)
  • Globally unique, persistent, and accessible over the network
    • Persistence: That is, the URN will be globally unique forever.
    • Scalability: URNs can be assigned to any resource
    • Legacy / Extensible: Backward and forward compatible
  • Some Examples:


  • urn:hdl:cnri.dlib/august95
  • urn:lifn:some.domain:anything-goes-here
  • urn:path:/A/B/C/doc.html
  • urn:inet:library.bigstate.edu:aj17-mcc
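The examples above all share the shape urn:<namespace>:<namespace-specific string>. A minimal sketch of splitting a URN into those two parts follows; the function name parse_urn is hypothetical, and the check is deliberately looser than the full URN syntax rules.

```python
def parse_urn(urn):
    """Split a URN into (namespace identifier, namespace-specific string)."""
    scheme, sep1, rest = urn.partition(":")
    nid, sep2, nss = rest.partition(":")
    if scheme.lower() != "urn" or not sep1 or not sep2 or not nid or not nss:
        raise ValueError(f"not a URN: {urn!r}")
    # The namespace identifier is case-insensitive; the rest is kept verbatim.
    return nid.lower(), nss


for example in [
    "urn:hdl:cnri.dlib/august95",
    "urn:lifn:some.domain:anything-goes-here",
    "urn:path:/A/B/C/doc.html",
    "urn:inet:library.bigstate.edu:aj17-mcc",
]:
    print(parse_urn(example))
```

Only the first two colons are structural; any later colon belongs to the namespace-specific string, which each namespace interprets in its own way.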



7
Persistent URLs
  • http://purl.org/
  • PURL is a normal URL
  • Implement a layer of indirection
  • Uses standard HTTP redirect
  • Simple model
8
More details on PURL
  • Partial redirection
    • http://purl.org/kanmy/pictures/nus.jpg
    • http://www.comp.nus.edu.sg/~kanmy/pictures/nus.jpg

  • A PURL with no associated indirection causes the PURL resolver to generate a history page


  • Private and universal indirection with access control
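Partial redirection can be sketched as a prefix rewrite: the resolver matches the registered part of the PURL path and appends whatever follows to the target. The registry dictionary and the function name resolve_purl below are hypothetical; this is not how purl.org is actually implemented, only an illustration of the idea.

```python
# Hypothetical registry: registered PURL path prefix -> target URL prefix.
REGISTRY = {
    "/kanmy/": "http://www.comp.nus.edu.sg/~kanmy/",
}


def resolve_purl(path):
    """Return the redirect target for a PURL path, or None if nothing matches."""
    # Try longer prefixes first so a more specific registration wins.
    for prefix in sorted(REGISTRY, key=len, reverse=True):
        if path.startswith(prefix):
            return REGISTRY[prefix] + path[len(prefix):]
    return None


# A real resolver would answer with an HTTP redirect whose Location header
# carries this value; here we only compute the target.
print(resolve_purl("/kanmy/pictures/nus.jpg"))
```

With the registration above, the path /kanmy/pictures/nus.jpg maps to the full target shown on the slide.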
9
PURL Issues
  • Places the burden of resolution on the manager of information
  • PURL resolvers don’t know about each other: federated, no centralized registry
  • If the target URL goes down, nothing forces an update or notifies the maintainer
  • Doesn’t guarantee that the document will be available; the indirection can still lead to a 404


10
Examples of DOIs
11
Hierarchy of Naming Authorities
12
Address Rules
  • The Global Handle Service stores:
  • a record for each naming authority
  • a record for each local handle service


  • The record for each naming authority includes:
  • the home handle service for that naming authority


  • For each handle, the home handle service stores:
  • the handle record
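The address rules amount to a two-step lookup: ask the Global Handle Service which home service owns the naming authority, then ask that home service for the handle record. A minimal sketch, assuming plain dictionaries in place of real services; the service name, the record contents, and the example.org URL are hypothetical.

```python
# Hypothetical stand-in for the Global Handle Service:
# naming authority -> name of its home handle service.
GLOBAL_HANDLE_SERVICE = {
    "cnri.dlib": "home-service-1",
}

# Hypothetical stand-in for the home handle services:
# service name -> {handle -> handle record}.
HOME_HANDLE_SERVICES = {
    "home-service-1": {
        "cnri.dlib/august95": {"URL": ["http://www.example.org/dlib/august95"]},
    },
}


def resolve_handle(handle):
    """Resolve a handle of the form <naming authority>/<local name>."""
    naming_authority, _, _local_name = handle.partition("/")
    # Step 1: the global service names the home service for this authority.
    home = GLOBAL_HANDLE_SERVICE[naming_authority]
    # Step 2: the home service holds the actual handle record.
    return HOME_HANDLE_SERVICES[home][handle]


print(resolve_handle("cnri.dlib/august95"))
```

Because only the home service stores the record, a publisher can change what a handle points to without touching the global service.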
13
 
Multiple Resolution
  • Leave the resolution up to the client
  • Return all DOI data to the client
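A minimal sketch of client-side selection under multiple resolution: the resolver hands back every value it holds and the client decides which one to use. The record layout, the example hosts, and the function name choose are all hypothetical.

```python
# Hypothetical record returned by a resolver: every value, of every type.
RECORD = [
    {"type": "URL", "value": "http://mirror-eu.example/article"},
    {"type": "URL", "value": "http://mirror-us.example/article"},
    {"type": "EMAIL", "value": "rights@publisher.example"},
]


def choose(record, wanted_type, prefer=""):
    """Pick one value of the wanted type, favouring those containing `prefer`."""
    candidates = [item["value"] for item in record if item["type"] == wanted_type]
    preferred = [value for value in candidates if prefer and prefer in value]
    # Fall back to the first candidate, or None if the type is absent.
    return (preferred or candidates or [None])[0]


print(choose(RECORD, "URL", prefer="mirror-us"))
```

The selection policy lives entirely in the client, so different clients can make different choices from the same record.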
14
DOIs in action
15
Flexibility
16
Reorganization by Publisher
17
Change of Publisher
18
Citation
19
Catalogs and Indices
20
Multiple Copies
21
The General Model
22
DOI Summary
  • Uses multiple levels of indirection
  • More robust than PURL
  • But also more complicated, relies on central authority
  • Supported by consortium of publishers (big and small)


23
OpenURL
  • An identifier system that takes user’s context into account
  • Created to solve the appropriate copy problem


24
"Different providers use different URL..."
  • Different providers use different URLs and points of access to the data
25
Indirection in OpenURL
  • Dissociate document from vendor- and library-specific provisions
  • OpenURL lists access metadata only
26
Input: OpenURL Example
  • Moll JR, Olive M, Vinson C. Attractive interhelical electrostatic
    interactions in the proline- and acidic-rich region (PAR) leucine zipper
    subfamily preclude heterodimerization with other basic leucine zipper
    subfamilies. J Biol Chem. 2000 Nov 3; 275(44):34826-32.
    doi:10.1074/jbc.M004545200


  • http://sfx1.exlibris-usa.com/demo?sid=ebsco:medline&aulast=Moll&auinit=JR&date=2000-11-03&stitle=J%20Biol%20Chem&volume=275&issue=44&spage=34826


  • http://sfxserv.rug.ac.be:8888/rug?id=doi:10.1074/jbc.M004545200


  • Legend:
    • red - BASE-URL of service component
    • blue - identifier of the resource where the user clicks the OpenURL, added by publisher’s rewrite
    • grey - metadata and identifiers


  • DOI can be used to resolve the actual content
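The three parts named in the legend can be pulled out of the example OpenURLs with Python's standard urllib.parse. This is only a sketch of how the pieces separate; the function name split_openurl is hypothetical, and it assumes each query key appears once.

```python
from urllib.parse import parse_qs, urlsplit


def split_openurl(url):
    """Return (base URL of the service component, source id, metadata dict)."""
    parts = urlsplit(url)
    base_url = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # parse_qs also decodes percent-escapes such as %20.
    fields = {key: values[0] for key, values in parse_qs(parts.query).items()}
    # "sid" identifies the resource where the user clicked; it may be absent.
    source_id = fields.pop("sid", None)
    return base_url, source_id, fields


metadata_link = (
    "http://sfx1.exlibris-usa.com/demo?sid=ebsco:medline&aulast=Moll"
    "&auinit=JR&date=2000-11-03&stitle=J%20Biol%20Chem"
    "&volume=275&issue=44&spage=34826"
)
doi_link = "http://sfxserv.rug.ac.be:8888/rug?id=doi:10.1074/jbc.M004545200"

print(split_openurl(metadata_link))
print(split_openurl(doi_link))
```

The first link describes the article by metadata alone, while the second carries only an identifier; in both cases the base URL decides which service component answers.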
27
OpenURL Issues
  • Service component gets metadata query information
    • Access and use information goes to library, not to publisher
  • Not just user-to-user, but for generalized dynamic linking
    • Web page to journal article full-text
    • Abstract to library catalog collection

  • Demo:
    http://www.ukoln.ac.uk/distributed-systems/openurl/


28
Summary
  • PURLs
    • Good for small, local solutions
    • Single level indirection


  • DOI
    • Multiple, hierarchical layers of indirection
    • Purpose:
      • Actionable identifiers to content
      • Resolution to multiple items of current state data
        • Notably including location(s) and metadata

  • OpenURL
    • Purpose: solves appropriate copy problem
    • Selects between multiple items returned by DOI
29
References
  • URN: http://www.w3.org/Addressing/
  • PURL: http://www.purl.org/
  • DOI: http://www.doi.org/
  • OpenURL: http://www.sfxit.com/open/index.html


30
Tea break!
  • See ya!
31
Digital Library Policy
  • Week 4                      Min-Yen KAN
  • Legal, Economic, and Social Aspects


32
Outline
  • Intellectual property rights


  • Economics of the (digital) library


  • Social Policy with respect to the DL
33
Jerome’s translation of the Bible
  • Perhaps the first copyright dispute
  • Around 560, the Irish missionary Columba secretly copied a very treasured translation of the Bible. When his master Finnian found out, he demanded that Columba turn over the copy. Columba refused and the matter went to the High King of Ireland, Diarmit.


  • What do you guess the ruling was?


34
Two worlds: digital and print media
35
Rights Management
  • In general,
  • “Rights” can mean many things:
    • Access rights – can I see/use/copy it?
    • Intellectual Property Rights (IPR) – who owns it?  Where do I go to get access rights?
36
Access Policy
  • We have been mostly concentrating on making the distribution of materials as easy and quick as possible.
  • But that’s not always the case.
37
Restricting Access in DLs
  • Integrated with the Warwick Framework
    • Cryptolope
    • Steganography / Document watermarking
    • Hardware solutions
  • No copy protection
    • Better than it may seem
38
Copyrights
  • Copyright
  • Public domain
  • Open source
39
Open Source Licensing
  • All open source licenses:
  • Allow free redistribution,
  • Make the source code available
  • Allow derived works (modify the code and offer a “new” program)
  • Must not discriminate against persons, groups, or fields of endeavor
  • Must not be product specific.
  • The MIT License grants unrestricted rights to copy, modify, and redistribute as long as the original copyright and license terms are retained.
  • The original (4-clause) BSD License requires acknowledgements to be made in advertisements and documentation.
  • The Artistic License allows unrestricted rights to copy, use, and locally modify. It allows the redistribution of modified binary programs, but restricts distribution of modified sources.
  • The GNU General Public License (GPL) requires that a program that uses portions of GPL'ed source code must also be licensed under the GPL.
40
Take a quick break: a survey
  • How much do you value your library?


  • Take a guess! →



  • Here are some ways to do it.
    • What’s the cost of buying the sources yourself?
    • What’s the opportunity cost if you didn’t have access to the information?
41
A cost model for libraries
  • Griffiths & King (93): corporate employees
    • Found that US companies spent about $400-$1,000 per capita on libraries.
    • Reported about 3:1 return on investment
  • With library:
  • $515 Library subscription cost
  • $95 Library


  • No library:
  • $3300 Cost to access individual materials


  • These costs include only buying the material, not the administrative time spent acquiring it.
  • So the actual savings are higher.
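As a worked check of the figures on this slide, the sketch below subtracts the two "with library" items from the "no library" cost. The constant and function names are hypothetical, and the second library item's label is cut off in the source, so it is treated here simply as an additional library cost.

```python
# Figures as listed on the slide (US dollars).
LIBRARY_SUBSCRIPTIONS = 515  # "Library subscription cost"
LIBRARY_OTHER = 95           # second "Library" item; label truncated in the source
NO_LIBRARY_COST = 3300       # "Cost to access individual materials"


def library_savings(subscriptions, other, no_library):
    """Difference between going without a library and paying for one."""
    with_library = subscriptions + other
    return no_library - with_library


savings = library_savings(LIBRARY_SUBSCRIPTIONS, LIBRARY_OTHER, NO_LIBRARY_COST)
print(savings)  # 3300 - (515 + 95) = 2690
```

This covers purchase costs only; as the slide notes, counting administrative time would widen the gap.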
42
A brief history of the economics of information
  • Ancient Era
    • Public – for religious conversion
    • Private – for knowledge and prestige

  • The copying of the Bible by monks in the dark ages
    • To educate them
    • To spread religion
43
Gutenberg printing press
  • Johann Gutenberg (c. 1397-1468):
    • Neither the inventor of moveable type nor printing
    • Paired a wine press with moveable type

  • Transformed Europe’s spread of information
    • First publication was the Bible
      • Speed allowed mass production and cheaper pricing
44
The dichotomy today
    • Public – once for religious conversion, now a government clearinghouse
      • Make sure the public:
        • Has access to the information
        • Gets authoritative information


    • Private – once for knowledge and prestige, now business and entertainment
45
Economics of scholarly media
  • Will the automated library as we know it survive?
47
Two worlds: digital and print media
48
Models for digital economies
  • Subscription fees
    • Per month, per year
  • Connection time fee
    • Per minute (e.g., Mead Data Central)
  • Advertising
    • By an interested party
    • Other economic models apply here
  • Access fee
    • Per download; the service may not keep a profile to remember that you accessed this resource before
  • Per-byte fee
    • Typical of connection services (e.g., broadband)
49
Access versus ownership
  • With DL materials we can’t really track ownership, just access


  • Trend towards microanalysis
    • Publisher: better targeted marketing
    • Library: better profile of user community

50
Crisis for publishers
  • Ease of publication allows more information to be free
    • And for people to break copyright (perhaps accidentally)

  • Ease of accessing (free) information deters users from accessing more cumbersome-to-use sources


  • Traditional functions of publishers are taken on by free services
    • Free e-journals do rigorous peer review
    • Search engines act as distributor
51
Self-archiving
  • To deposit a digital document on a publicly accessible website.
    • Preprint: before copyright restrictions have been signed


    • Not a true publication*: hasn’t been peer-reviewed, not in prestigious publication.


    • Detractors: accessibility will hurt future revenues of the journal
      • Perhaps 60-80% of a publisher’s budget doesn’t go towards the direct publication costs
52
E-prints
  • Differing acceptance from different fields
    • Physics: accept only if concurrently preprinted
    • Medicine, Business: accept only if not preprinted


  • E-journal model: who assumes the cost?
    • Authoring a text
    • Peer review
    • Marketing
    • Editor
    • Publication
53
Peer review limitations
  • Goal of peer review is to ensure:
    • Previous work adequately acknowledged
    • Experimental methodology realistic and reproducible
    • Analysis of data justifies conclusions

  • Peters and Ceci (82):
    • Resubmitted 12 already-published psychology articles under different author names; of the 9 that went undetected, 8 were rejected, citing “serious methodological flaws” rather than déjà vu.


  • Ingelfinger study of NEJM reviewers:
    • Concordance of reviews only slightly better than chance
    • Reviewers are not skilled in all areas of a study, may be unable to discern poor writing, and have their own biases

54
Cost structuring
  • Movie distribution as a possible model (Lesk, p. 206)



55
Legal Deposit
56
Internet Archive and Bookmobile
  • Internet Archive
  • http://www.archive.org


  • An archive of the WWW




  • “The goal of universal access to our cultural heritage is within our grasp.”


  • Are these examples of legal deposit?
  • Who funds this initiative?
  • Internet Bookmobile


  • Prints out-of-copyright books for reading
  • Over 1m books
  • $1 USD per book printed
57
Preservation
  • Y2K – two digits to mean four
    • If you knew COBOL, you could get a high-paying job.
    • Legacy systems and knowledge need to be preserved


  • Use standard formats!
  • Media lifetime
    • Tape 15 years
    • CD-R 10-50 years
    • HD 30 years


  • Software/Hardware lifetime
    • New hardware 3-7 years
    • Software cycles faster
    • How to access old files, applications?
58
The Digital Divide
  • A case of the rich getting richer?
59
Undoing the Divide
  • Can use access rights to impose an unequal payment scheme


    • Blackwell’s – all 600 journals made free to the Russian Federation.
    • JSTOR – cost to access its DL depends on the size of the organization.
    • Open source movement – make software available to anyone
60
Libraries of the Future
  • Immediate, random access to recent knowledge
  • May not understand foundation material
  • More effort in selection of materials
  • Publisher models changing, unifying
  • International policy becoming more prominent
  • Customized books as the future?
61
To think about…
  • How do the economics of libraries and the information explosion influence publication rates? What changes as we make the transition to the digital library?


  • Do you think self-archiving and e-journal venues pose a threat to the journal publisher?


  • As a single site, the Internet Archive cannot keep track of all pages on the web
    • Can you think of a better solution?
    • How would you go about designing a national web page archive for Singapore?
62
References
  • Copyright in Singapore
    http://www.ipos.gov.sg/newdesign/indexpage/inner_frame.html?section=aboutip&sub=4


  • Self-Archiving FAQ
    http://www.eprints.org/self-faq/


  • JSTOR
    http://www.jstor.org


  • The future of libraries?
    Stephenson, Neal (00) Diamond Age: A young lady’s illustrated primer, Doubleday