Friday, July 25, 2008

RE: [fw-formats] Zend_Lucene + UTF8 search problem... Help!(8EB-F5F)

Hi Maxim,

The problem is that the default analyzer works only with ASCII text - http://framework.zend.com/manual/en/zend.search.lucene.charset.html#zend.search.lucene.charset.default_analyzer

This is because the mbstring PHP extension is not included in PHP installations by default, and iconv() doesn't provide the necessary functionality.
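
As a quick way to see what a given PHP installation offers, the sketch below reports whether mbstring, iconv() and PCRE Unicode support are available. The function name checkUtf8Support() is hypothetical and not part of Zend Framework, and the PCRE check is included on the assumption that UTF-8-aware text processing relies on it.

```php
<?php
// Hypothetical helper (not part of Zend Framework): report which
// features relevant to UTF-8 text handling this PHP build provides.
function checkUtf8Support(): array
{
    return array(
        // mbstring is an optional extension, not enabled in every build
        'mbstring'     => extension_loaded('mbstring'),
        // iconv() is used for charset conversion and transliteration
        'iconv'        => function_exists('iconv'),
        // the /u modifier only works if PCRE was built with Unicode support
        'pcre_unicode' => @preg_match('/\pL/u', 'a') === 1,
    );
}

foreach (checkUtf8Support() as $feature => $available) {
    echo $feature, ': ', $available ? 'yes' : 'no', PHP_EOL;
}
```

If any of these reports "no", it is worth checking the analyzer documentation for its requirements before switching analyzers.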

You should use the special UTF-8 analyzers to work with non-ASCII text that can't be transliterated by iconv() - http://framework.zend.com/manual/en/zend.search.lucene.charset.html#zend.search.lucene.charset.utf_analyzer
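
To illustrate why an ASCII-only analyzer never produces a searchable term for a Russian word, here is a rough sketch in plain PHP. It is not the library's actual implementation: tokenizeAscii() and tokenizeUtf8() are hypothetical names, and the two regular expressions are assumptions standing in for an ASCII-only tokenizer and a UTF-8-aware one. The file must be saved as UTF-8.

```php
<?php
// Hypothetical stand-in for an ASCII-only tokenizer:
// only the letters a-z and A-Z are treated as part of a token.
function tokenizeAscii(string $text): array
{
    preg_match_all('/[a-zA-Z]+/', $text, $matches);
    return $matches[0];
}

// Hypothetical stand-in for a UTF-8-aware tokenizer:
// \p{L} matches any Unicode letter when the /u modifier is set
// (this needs PCRE built with Unicode support).
function tokenizeUtf8(string $text): array
{
    preg_match_all('/\p{L}+/u', $text, $matches);
    return $matches[0];
}

$text = 'русский текст; english text';

print_r(tokenizeAscii($text)); // Cyrillic words are silently dropped
print_r(tokenizeUtf8($text));  // all four words become tokens
```

With the ASCII-only pattern the field yields only "english" and "text", so a query for "русский" has nothing to match. The Unicode-aware pattern yields all four words.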


---------------------------------
<?php
require_once 'ZendInit.php';
require_once 'Zend/Search/Lucene.php';


// Use a UTF-8 aware, case-insensitive analyzer for indexing
Zend_Search_Lucene_Analysis_Analyzer::setDefault(
    new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8_CaseInsensitive());

// Create index
$index = Zend_Search_Lucene::create('data/index');
$doc = new Zend_Search_Lucene_Document();
$doc->addField(Zend_Search_Lucene_Field::Text('samplefield',
                                              'русский текст; english text',
                                              'utf-8'));
$index->addDocument($doc);
$index->commit();

...
-----------------

Don't forget to set the same analyzer as default before searching:
---------------------------------
<?php
require_once 'ZendInit.php';
require_once 'Zend/Search/Lucene.php';


Zend_Search_Lucene_Analysis_Analyzer::setDefault(
    new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8_CaseInsensitive());

// Open index
$index = Zend_Search_Lucene::open('data/index');
...

Zend_Search_Lucene_Search_QueryParser::setDefaultEncoding('utf-8');
foreach ($index->find($query) as $hit) {
    echo $hit->samplefield, PHP_EOL;
}
...
-----------------


With best regards,
Alexander Veremyev.


> -----Original Message-----
> From: Maxim Savenko [mailto:maxim.savenko@gmail.com]
> Sent: Thursday, July 24, 2008 3:58 PM
> To: fw-formats@lists.zend.com
> Subject: [fw-formats] Zend_Lucene + UTF8 search problem... Help!(8EB-F5F)
>
> Hi everybody,
>
> I have a problem with searching Russian strings, UTF-8 encoded, with
> Zend_Search_Lucene. Here is my short sample code:
>
> <?php
> require_once 'ZendInit.php';
> require_once 'Zend/Search/Lucene.php';
> require_once 'Zend/Search/Lucene/Document.php';
>
> // Create index
> $index = Zend_Search_Lucene::create('data/index');
> $doc = new Zend_Search_Lucene_Document();
> $doc->addField(Zend_Search_Lucene_Field::Text('samplefield',
>     'русский текст; english text', 'utf-8'));
> $index->addDocument($doc);
> $index->commit();
>
> // Open index and search:
> $index = Zend_Search_Lucene::open('data/index');
> Zend_Search_Lucene_Search_QueryParser::setDefaultEncoding('utf-8');
> Zend_Search_Lucene::setDefaultSearchField('samplefield');
>
> // Query the index:
> $queryStr = 'english';
> $query = Zend_Search_Lucene_Search_QueryParser::parse($queryStr, 'utf-8');
> $hits = $index->find($query);
> foreach ($hits as $hit) {
>     /* @var $hit Zend_Search_Lucene_Search_QueryHit */
>     $doc = $hit->getDocument();
>     echo $doc->getField('samplefield')->value, PHP_EOL;
> }
>
> The 'samplefield' of the document contains a string in two languages,
> Russian and English (see code). If we search for 'english', all is fine:
> we successfully find the document. But if we try to find the Russian
> part of the field (setting $queryStr to 'русский'), we don't find any
> document.
>
> What is the problem with my code? Help me find a solution...
>
> Thank you guys
>
> Maxim Savenko
> maxim.savenko@gmail.com
