Zend FrameWork: [fw-mvc] Zend_Search

Tuesday, September 8, 2009

[fw-mvc] Zend_Search_Lucene and PDF files

Hello,

I am implementing Lucene and need to index my PDF files.

I have found several solutions, but they all require some non-PHP component such as XPDF. I need this to be cross-platform, so those are generally out.

I also started looking for ways to get inside Zend_Pdf to reach the elements of each page, with no success yet. I was hoping I could iterate over the pages in a PDF (done), get a list of the elements on each page (?), and then grab the text, perhaps from the Zend_Pdf_Element_String objects I was able to find in there. Since I am not going to display the context in my search results, the location of the text on the page does not matter much to me.
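For readers unfamiliar with PDF internals, the idea above can be sketched without any PDF library: page text lives in content streams (often Flate-compressed), where the `Tj` and `TJ` operators show literal strings. The following is only a rough illustration, written in Python for brevity; it is not Zend_Pdf's API. The function name `extract_text` and both regular expressions are hypothetical, and the sketch assumes simple fonts with Latin-1-like encodings. It ignores hex strings, custom font encodings, object streams, and filters other than Flate.

```python
import re
import zlib

# Literal string operand such as (Hello \(world\)); backslash escapes allowed.
LITERAL = rb"\((?:\\.|[^\\()])*\)"

# Text-showing operations: "(string) Tj" or "[ (a) -120 (b) ] TJ".
SHOW_TEXT = re.compile(
    rb"(" + LITERAL + rb")\s*Tj|\[((?:\s|" + LITERAL + rb"|[-+.\d])*)\]\s*TJ",
    re.S,
)

# Raw stream bodies; the lookbehind avoids matching inside "endstream".
STREAM = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.S)


def _unescape(literal: bytes) -> str:
    """Strip the outer parentheses and undo \\( \\) \\\\ escapes."""
    body = re.sub(rb"\\([()\\])", rb"\1", literal[1:-1])
    return body.decode("latin-1")  # assumption: single-byte, Latin-1-like text


def extract_text(pdf_bytes: bytes) -> str:
    """Collect the strings shown by Tj/TJ in every stream of the file."""
    chunks = []
    for match in STREAM.finditer(pdf_bytes):
        data = match.group(1)
        try:
            # decompressobj tolerates the trailing newline before "endstream"
            data = zlib.decompressobj().decompress(data)
        except zlib.error:
            pass  # not Flate-compressed; scan the bytes as they are
        for op in SHOW_TEXT.finditer(data):
            if op.group(1) is not None:  # (string) Tj
                chunks.append(_unescape(op.group(1)))
            else:  # [ ... ] TJ: join the string pieces, drop the kerning numbers
                parts = re.findall(LITERAL, op.group(2), re.S)
                chunks.append("".join(_unescape(p) for p in parts))
    return " ".join(chunks)
```

Since the position of the text does not matter for indexing, the joined string could be handed to the indexer as a single unstored field. The same logic should port to PHP with `preg_match_all` and `gzuncompress`, though that port is untested here.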

I am getting totally bogged down in the source code for the pages and the parsers, partially at least because I am not familiar with the nomenclature of PDF internals :(

Does anyone have any pointers on how to approach this? Ideally I'd like to keep it Zend, but I can use other PDF libraries if I need to.

Thanks

Bill
