[Lecture Notes by Prof Yuen and possibly others 2000-2001]

Electronic Publishing
---------------------

Traditional publishing involves four steps. First, the authors produce their material for review by editors working on behalf of publishers. Material found suitable for a publisher is then sent to be typeset, which formats the content into individual pages in a chosen style. These pages are then reproduced in multiple sets by putting ink on paper, and are bound into individual books and journal issues. Finally, these are distributed to the audience by post to mailing lists of subscribers, book club members, libraries with standing orders and people who order books online, or through retail outlets like bookstores and newsagents.

Computers have played certain parts in this process for some time. Authors use word processors to build up, revise and print the source material, and more complex typesetting software is used to format pages, with humans putting in the required commands to specify type fonts and sizes; page sizes and partitions; insertion of figures, equations and footnotes; and the linking of parts from different sources, such as chapters by multiple authors or reproductions of existing material, into a complete volume. Most of you will be familiar with MS Word, and some of you may have used LaTeX, which formats an ASCII text file with embedded typesetting commands into a DVI file specifying the content of each page. The DVI file is then processed by a converter into a printing file (such as PDF or PostScript), which specifies in minute detail which symbol to put where on each page. Desktop publishing software basically combines word processing with easy-to-use typesetting software, so that an author or editor can produce camera-ready pages on his own PC to be sent to a printer, while HTML is a desktop publishing language that specifies the content and format of single pages for output on a computer screen, with provision to add colour, sound, video, etc.
The old equipment that was used to do the job (typewriters, molten-lead typesetting machines, even paper typesetting machines that produced masters for offset printers) is hardly used these days.

While computers have also played some part in controlling printing machines and in helping with the mailing and selling of printed material, their impact there has been much less fundamental, since they have not changed the basic process of putting ink on paper and distributing the paper piles. This is a highly inefficient process, since the paper contributes almost all the weight but the information is carried only by the ink. It takes organization to record, store and move around all that heavy bulk. In fact, one reason authors have to go through publishers to publish is that the latter control the physical distribution system: while authors can produce content and make copies on their own, they lack the means to put the copies into libraries and retail shops, or to use large mailing lists to send copies to readers.

The Internet has radically changed this. Content is now specified as modulations of electromagnetic waves, conducted almost instantaneously across the world via wires, satellite links and optical fibres, instead of ink on paper moved around on lorries and aeroplanes. With numerous search engines crawling the web, looking at every page and cataloguing the content, it does not take long before your pages turn up in the search list of someone looking for related material. In short, with the help of a PC on the Internet, anyone can write, format and distribute his writings in the most direct way.

However, while this solves one problem, it creates a new set of issues for authors and publishers. Content represented in this way is also easily reproduced, without any loss of graphical quality and without the work of copying, collating and binding. The previous exclusivity of control is now lost. For commercial publishing, the question is how to get paid when someone reads something you own.
For scholarly publishing, the issue is establishing who published a particular piece of work at what time.

It is sometimes said that encryption would protect intellectual property rights: the material is stored on the web in a coded form, which can be reversed only if you know the decoding algorithm. But this is not really effective, because one can always make copies of the decoded result, whether text, picture or music, and give them to others, unless the decoding is embedded in the final display unit so that the user never gets the decoded file itself. Even then, I can still make copies of the coded file for other users, provided they have the same embedded decoding system, which would work on the same undecoded files. To prevent this, you need several versions of the embedded decoding algorithm, so that the coded file that works with my system will not work with yours; but then the content provider has to know which coded file works for which user, making his system more complicated.

The idea of digital watermarking has also been suggested: a unique pattern is converted into a signal that looks like low-intensity background noise, which is added to a picture or music file; when the reverse conversion is applied to the combined signal, the original unique pattern is recovered, so that if someone has a copy of the file, you can prove that the copy came from you by revealing the watermark in it. Unfortunately, this too is not a simple situation: I can take anybody's file and run a program on it that adds a new watermark, then claim that he stole the file from me. He will then have to prove that my program is a fake and is adding rather than revealing the watermark. The authentication problem is merely transferred to another domain.
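
The watermarking idea, and the reason it does not settle ownership, can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not how any real watermarking product works: the "picture" is a list of random pixel values, the watermark is a pseudo-random +1/-1 pattern generated from a secret key, and "revealing" the watermark means correlating the file with that pattern. The function names, keys, strength and threshold are all made up for the example, and clipping of pixel values to the 0..255 range is ignored.

```python
import random

def pattern(key, n):
    """Pseudo-random +1/-1 sequence derived from a secret key (hypothetical scheme)."""
    rng = random.Random(key)
    return [rng.choice((-1, 1)) for _ in range(n)]

def embed(samples, key, strength=4):
    """Add the key's pattern to the samples as low-intensity noise."""
    p = pattern(key, len(samples))
    return [s + strength * w for s, w in zip(samples, p)]

def detect(samples, key):
    """Correlate the samples with the key's pattern.

    The score is close to the embedding strength if the mark is present
    and close to zero if it is not.
    """
    p = pattern(key, len(samples))
    mean = sum(samples) / len(samples)
    return sum((s - mean) * w for s, w in zip(samples, p)) / len(samples)

def is_marked(samples, key, strength=4):
    """Decide presence of the mark with a threshold at half the strength (an assumption)."""
    return detect(samples, key) > strength / 2

# Stand-in for an image: 50000 random pixel values in 0..255.
source = random.Random(0)
original = [source.randrange(256) for _ in range(50000)]

# The owner publishes a watermarked copy.
owner_copy = embed(original, "owner-secret")

# A forger takes the published copy and adds a second watermark on top.
forged_copy = embed(owner_copy, "forger-secret")

if __name__ == "__main__":
    print("owner mark in owner's copy: ", is_marked(owner_copy, "owner-secret"))
    print("owner mark in forged copy:  ", is_marked(forged_copy, "owner-secret"))
    print("forger mark in forged copy: ", is_marked(forged_copy, "forger-secret"))
    print("forger mark in owner's copy:", is_marked(owner_copy, "forger-secret"))
```

In this sketch the forged copy carries both marks, so each party can "reveal" his own watermark in it. Telling them apart requires extra evidence outside the file itself, such as a copy that carries only one of the marks, which is the sense in which the authentication problem has merely moved elsewhere.
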
Basically, the ownership issue of electronic publishing remains unresolved, but with most material on the web the issue does not really arise, as the authors/publishers are only too keen to give the content away, either as a kind of vanity publishing or in the hope of building up an advertising market so that income is generated in ways other than payments from the readers. A small number of publications, such as the Wall Street Journal, succeed in getting subscribers to pay annually to register, but expect that some readers will copy and email some of the articles to people who are willing to wait. The Microsoft online journal Slate tried paid subscriptions, but gave up after less than a year and reverted to free access. Some authors have tried putting their books online a bit at a time, asking for voluntary payment and promising to add the next bit after enough money comes in, but this too has not worked out very well for those trying it. (See www.nytimes.com/2000/12/5/opinion/05MANG.html for the Stephen King case.)

In any case, web publishing for a living remains a doubtful proposition, while printed material, being directly readable without equipment and hence easily portable, remains an important means of communication.