Thursday, July 24, 2008

[fw-formats] Zend_Lucene + UTF8 search problem... Help!

Hi everybody,

I have a problem searching UTF-8 encoded Russian strings with
Zend_Search_Lucene. Here is a short code sample:


<?php
require_once 'ZendInit.php';
require_once 'Zend/Search/Lucene.php';
require_once 'Zend/Search/Lucene/Document.php';

// Create the index and add one document with a mixed Russian/English field
$index = Zend_Search_Lucene::create('data/index');
$doc = new Zend_Search_Lucene_Document();
$doc->addField(Zend_Search_Lucene_Field::Text('samplefield', 'русский текст; english text', 'utf-8'));
$index->addDocument($doc);
$index->commit();

// Open the index and configure the search defaults
$index = Zend_Search_Lucene::open('data/index');
Zend_Search_Lucene_Search_QueryParser::setDefaultEncoding('utf-8');
Zend_Search_Lucene::setDefaultSearchField('samplefield');

// Query the index
$queryStr = 'english';
$query = Zend_Search_Lucene_Search_QueryParser::parse($queryStr, 'utf-8');
$hits = $index->find($query);

foreach ($hits as $hit) {
    /* @var $hit Zend_Search_Lucene_Search_QueryHit */
    $doc = $hit->getDocument();
    echo $doc->getField('samplefield')->value, PHP_EOL;
}


The 'samplefield' field of the document contains a string in two
languages, Russian and English (see the code). Searching for 'english'
works fine and finds the document, but searching for the Russian part of
the field (setting $queryStr to 'русский') returns no documents.

What is wrong with my code? Please help me find a solution.

Thank you, guys.

Maxim Savenko
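
One possible explanation, offered as an assumption rather than a confirmed diagnosis: the default Zend_Search_Lucene analyzer is oriented toward ASCII text, so the Cyrillic words may never be indexed as searchable terms. The standalone sketch below does not use Zend at all. asciiTokens() and unicodeTokens() are hypothetical helpers that only illustrate how an ASCII-only tokenizer drops 'русский' while a Unicode-aware one keeps it.

```php
<?php
// Hypothetical helpers for illustration; they are not part of Zend Framework.

// Keeps only runs of ASCII letters, lower-cased. The bytes of UTF-8 encoded
// Cyrillic letters never match [a-zA-Z], so those words are dropped.
function asciiTokens($text)
{
    preg_match_all('/[a-zA-Z]+/', $text, $matches);
    return array_map('strtolower', $matches[0]);
}

// Keeps runs of any Unicode letter, lower-cased.
// Needs PCRE built with UTF-8 support and the mbstring extension.
function unicodeTokens($text)
{
    preg_match_all('/\p{L}+/u', mb_strtolower($text, 'UTF-8'), $matches);
    return $matches[0];
}

$value = 'русский текст; english text';
echo implode(', ', asciiTokens($value)), PHP_EOL;   // ASCII-only view of the field
echo implode(', ', unicodeTokens($value)), PHP_EOL; // Unicode-aware view of the field
```

The first line printed contains only the English words, which matches the symptom described above. If this is indeed the cause, the usual suggestion is to set a UTF-8 analyzer before both indexing and searching, for example Zend_Search_Lucene_Analysis_Analyzer::setDefault(new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8_CaseInsensitive()). Whether that class is available depends on the Zend Framework version, so it should be checked against the installed release.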
