2008年7月24日星期四

[fw-formats] Zend_Lucene + UTF8 search problem... Help!

Hi everybody,

I have a problem with searching russian strings, utf8 encoded, with
Zend_Search_Lucene. Here is my short sample code:

<?php
require_once 'ZendInit.php';
require_once 'Zend/Search/Lucene.php';
require_once 'Zend/Search/Lucene/Document.php';

// Create index
$index = Zend_Search_Lucene::create('data/index');
$doc = new Zend_Search_Lucene_Document();
$doc->addField(Zend_Search_Lucene_Field::Text('samplefield', 'русский
текст; english text', 'utf-8'));
$index->addDocument($doc);
$index->commit();

// Open index and search:
$index = Zend_Search_Lucene::open('data/index');
Zend_Search_Lucene_Search_QueryParser::setDefaultEncoding('utf-8');
Zend_Search_Lucene::setDefaultSearchField('samplefield');

// Query the index:
$queryStr = 'english';
$query = Zend_Search_Lucene_Search_QueryParser::parse($queryStr, 'utf-8');
$hits = $index->find($query);
foreach ($hits as $hit) {
/*@var $hit Zend_Search_Lucene*/
$doc = $hit->getDocument();
echo $doc->getField('samplefield')->value, PHP_EOL;
}

The 'samplefield' of the document contain string in too languages �
russian and english(see code). If we'll search 'english' it's all fine
- we successfully find the document, but if we'll try to find russian
part of field( set $queryStr to 'русский') then we don't find any
document.

What is a problem with my code? Help me find solution...

Thank you guys

Maxim Savenko
maxim.savenko@gmail.com

没有评论: