|
Over the past decades, millions of digital documents have been generated. The most widespread format for these documents is text, in which the characters are represented by machine-readable codes. At the same time, modern technology has made it possible to produce, process, store, and transmit document images (i.e. imaged documents) efficiently. Large quantities of the documents stored in digital libraries or on the Internet are simply scanned and archived in image form. PDF has become a popular file format for both text documents and imaged documents, especially on the web, so searching these PDF documents for a user-specified word has practical value. For this purpose, Adobe Acrobat provides a "Search" tool for finding user-specified words in text documents. However, it does not work on imaged documents. |
|
We have developed a plug-in tool, built with the Acrobat SDK and based on document image analysis techniques, for searching words in PDF files that contain imaged documents. When a PDF file is opened in Adobe Acrobat, the plug-in detects and locates the user-specified words in the imaged documents, much as the "Search" tool provided by Adobe Acrobat does for text-format documents.
Our tool can work on the PDF files |
|
(1) Create a subdirectory "AcrobatSDK" under C:\Program Files\Adobe\Acrobat X.X\Acrobat\Plug_ins\ |
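Step (1) can also be scripted. The following is a minimal sketch, not part of the plug-in distribution: the function name `create_plugin_subdir` and its parameters are hypothetical, and the Acrobat installation folder (the `Acrobat X.X` directory for whichever version is installed) must be supplied by the user.

```python
import sys
from pathlib import Path


def create_plugin_subdir(acrobat_root, name="AcrobatSDK"):
    """Create <acrobat_root>/Acrobat/Plug_ins/<name> and return its path.

    acrobat_root is assumed to be the Acrobat installation folder,
    e.g. C:\\Program Files\\Adobe\\Acrobat X.X with X.X replaced by
    the installed version.
    """
    plug_ins = Path(acrobat_root) / "Acrobat" / "Plug_ins"
    # Refuse to build the tree ourselves: a missing Plug_ins folder
    # most likely means the installation path is wrong.
    if not plug_ins.is_dir():
        raise FileNotFoundError(f"Plug_ins folder not found: {plug_ins}")
    target = plug_ins / name
    target.mkdir(exist_ok=True)  # no error if it already exists
    return target


if __name__ == "__main__":
    # Usage: python make_plugin_dir.py "<Acrobat installation folder>"
    print(create_plugin_subdir(sys.argv[1]))
```

Writing under C:\Program Files usually requires administrator rights, so the script may need to be run from an elevated prompt.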
|
All copyrights related to the plug-in tool are reserved. If you have any suggestions or find any problems, please contact us: |
| [1] Lu Y, Tan C L. A Nearest-neighbor-chain Based Approach to Skew Estimation in Document Images. Pattern Recognition Letters, 2003, 24(14): 2315-2323. |
| [2] Lu Y, Tan C L. Document Retrieval From Compressed Images. Pattern Recognition, 2003, 36(4): 987-996. |
| [3] Lu Y, Tan C L. Improved Nearest Neighbor Based Approach to Accurate Document Skew Estimation. The 7th International Conference on Document Analysis and Recognition (ICDAR'03), pp.503-507, Edinburgh, August 3-6, 2003. |
|
[4] Lu Y, Tan C L, Lin L. An Approach to Matching Partial Word Image and Its Application to Document Image Retrieval. Proc. of SPIE Vol. 4929 Optical Information Processing Technology, pp.379-387, Shanghai, China, October 14-18, 2002.
|
|
[5] Lu Y, Tan C L. Word Searching in Document Images using Word Portion Matching. Fifth IAPR International Workshop on Document Analysis Systems, Princeton, New Jersey, USA, August 19-21, 2002. D. Lopresti, J. Hu, and R. Kashi (Eds.) Lecture Notes in Computer Science, Vol. 2423, pp.319-328, Springer-Verlag.
|
|
[6] Lu Y, Tan C L, Huang W, Fan L. An Approach to Word Image Matching Based on Weighted Hausdorff Distance. Proc. of the 6th International Conf. on Document Analysis and Recognition, 2001, Seattle, USA, pp.921-925.
|
|
[7] Zhang L, Lu Y, Tan C L. A Web-based System for Retrieving Document Images from Digital Library. International Workshop on Document Image Analysis and Retrieval (in conjunction with IEEE CVPR 2003), Wisconsin, 2003. |
| [8] Lu Y, Zhang L, Tan C L. Retrieving Imaged Documents in Digital Libraries Based on Word Image Coding. International Workshop on Document Image Analysis for Libraries, CA, USA, 2004. |
| [9] Lu Y, Zhang L, Tan C L. A Search Engine for Imaged Documents in PDF Files. 27th Annual International ACM SIGIR Conference, Sheffield, UK, 2004. |
| [10] Zhang L, Lu Y, Tan C L. Italic Font Recognition Using Stroke Pattern Analysis on Wavelet Decomposed Word Images. International Conference of Pattern Recognition, Cambridge, UK, 2004. |