Thursday, July 24, 2008

Re: [fw-formats] Zend_Lucene + UTF8 search problem... Help!

What's up with the spam?

On Thu, Jul 24, 2008 at 3:21 PM, Maxim Savenko <maxix@orbita1.ru> wrote:
Hi everybody,

I have a problem searching UTF-8 encoded Russian strings with Zend_Search_Lucene. Here is a short code sample:


require_once 'ZendInit.php';
require_once 'Zend/Search/Lucene.php';
require_once 'Zend/Search/Lucene/Document.php';

// Create the index and add one document with a mixed Russian/English field
$index = Zend_Search_Lucene::create('data/index');
$doc = new Zend_Search_Lucene_Document();
$doc->addField(Zend_Search_Lucene_Field::Text('samplefield', 'русский текст; english text', 'utf-8'));
$index->addDocument($doc);
$index->commit();

// Open the index and set the search defaults
$index = Zend_Search_Lucene::open('data/index');
Zend_Search_Lucene_Search_QueryParser::setDefaultEncoding('utf-8');
Zend_Search_Lucene::setDefaultSearchField('samplefield');

// Query the index
$queryStr = 'english';
$query = Zend_Search_Lucene_Search_QueryParser::parse($queryStr, 'utf-8');
$hits = $index->find($query);

// Print the stored field value of every matching document
foreach ($hits as $hit) {
    /** @var Zend_Search_Lucene_Search_QueryHit $hit */
    $doc = $hit->getDocument();
    echo $doc->getField('samplefield')->value, PHP_EOL;
}


The 'samplefield' field of the document contains a string in two languages, Russian and English (see the code). If we search for 'english', everything works and the document is found. But if we search for the Russian part of the field (set $queryStr to 'русский'), no documents are found.

What is the problem with my code? Please help me find a solution.

Thank you, guys.

Maxim Savenko


