Friday, July 25, 2008

Re: [fw-formats] Zend_Lucene + UTF8 search problem... Help!(8EB-F5F)

Hi

Thank you, Alexander.
I understand the problem now, and my script works fine.
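
For anyone who lands on this thread later, here is a rough sketch of why an ASCII-only analyzer never sees the Russian word. The two regular expressions are illustrative stand-ins for an ASCII-only tokenizer and a UTF-8 aware one; they are assumptions, not code taken from the Zend_Search_Lucene source, and the function names tokenizeAscii() and tokenizeUtf8() are hypothetical.

```php
<?php
// Illustration only: these patterns are assumptions standing in for an
// ASCII-only tokenizer and a UTF-8 aware one. They are not copied from
// Zend_Search_Lucene.

// Hypothetical helper: keeps only runs of ASCII letters.
function tokenizeAscii(string $text): array
{
    preg_match_all('/[a-zA-Z]+/', $text, $matches);
    return $matches[0];
}

// Hypothetical helper: keeps runs of any Unicode letter.
function tokenizeUtf8(string $text): array
{
    // \p{L} matches any Unicode letter; the /u modifier makes PCRE
    // treat both the pattern and the subject as UTF-8.
    preg_match_all('/\p{L}+/u', $text, $matches);
    return $matches[0];
}

$text = 'русский текст; english text';

$asciiTokens = tokenizeAscii($text);
$utf8Tokens  = tokenizeUtf8($text);

print_r($asciiTokens);
print_r($utf8Tokens);
```

With the ASCII pattern only 'english' and 'text' become tokens, so a query for 'русский' has nothing to match. The UTF-8 pattern keeps all four words.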


2008/7/25 Alexander Veremyev <alexander.v@zend.com>:
> Hi Maxim,
>
> The problem is that the default analyzer works only with ASCII text - http://framework.zend.com/manual/en/zend.search.lucene.charset.html#zend.search.lucene.charset.default_analyzer
>
> That is because the mbstring PHP extension is not included in a PHP installation by default, and iconv() doesn't provide the necessary functionality.
>
> You should use one of the special UTF-8 analyzers to work with non-ASCII text that can't be transliterated by iconv() - http://framework.zend.com/manual/en/zend.search.lucene.charset.html#zend.search.lucene.charset.utf_analyzer
>
>
> ---------------------------------
> <?php
> require_once 'ZendInit.php';
> require_once 'Zend/Search/Lucene.php';
>
>
> Zend_Search_Lucene_Analysis_Analyzer::setDefault(
>     new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8_CaseInsensitive());
>
> // Create index
> $index = Zend_Search_Lucene::create('data/index');
> $doc = new Zend_Search_Lucene_Document();
> $doc->addField(Zend_Search_Lucene_Field::Text('samplefield',
>     'русский текст; english text',
>     'utf-8'));
> $index->addDocument($doc);
> $index->commit();
>
> ...
> -----------------
>
> Don't forget to set the same analyzer as the default before searching:
> ---------------------------------
> <?php
> require_once 'ZendInit.php';
> require_once 'Zend/Search/Lucene.php';
>
>
> Zend_Search_Lucene_Analysis_Analyzer::setDefault(
>     new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8_CaseInsensitive());
>
> // Open index
> $index = Zend_Search_Lucene::open('data/index');
> ...
>
> Zend_Search_Lucene_Search_QueryParser::setDefaultEncoding('utf-8');
> foreach ($index->find($query) as $hit) {
>     echo $hit->samplefield, PHP_EOL;
> }
> ...
> -----------------
>
>
> With best regards,
> Alexander Veremyev.
>
>
>> -----Original Message-----
>> From: Maxim Savenko [mailto:maxim.savenko@gmail.com]
>> Sent: Thursday, July 24, 2008 3:58 PM
>> To: fw-formats@lists.zend.com
>> Subject: [fw-formats] Zend_Lucene + UTF8 search problem... Help!(8EB-F5F)
>>
>> Hi everybody,
>>
>> I have a problem searching for UTF-8 encoded Russian strings with
>> Zend_Search_Lucene. Here is a short code sample:
>>
>> <?php
>> require_once 'ZendInit.php';
>> require_once 'Zend/Search/Lucene.php';
>> require_once 'Zend/Search/Lucene/Document.php';
>>
>> // Create index
>> $index = Zend_Search_Lucene::create('data/index');
>> $doc = new Zend_Search_Lucene_Document();
>> $doc->addField(Zend_Search_Lucene_Field::Text('samplefield',
>>     'русский текст; english text', 'utf-8'));
>> $index->addDocument($doc);
>> $index->commit();
>>
>> // Open index and search:
>> $index = Zend_Search_Lucene::open('data/index');
>> Zend_Search_Lucene_Search_QueryParser::setDefaultEncoding('utf-8');
>> Zend_Search_Lucene::setDefaultSearchField('samplefield');
>>
>> // Query the index:
>> $queryStr = 'english';
>> $query = Zend_Search_Lucene_Search_QueryParser::parse($queryStr, 'utf-8');
>> $hits = $index->find($query);
>> foreach ($hits as $hit) {
>>     /* @var $hit Zend_Search_Lucene_Search_QueryHit */
>>     $doc = $hit->getDocument();
>>     echo $doc->getField('samplefield')->value, PHP_EOL;
>> }
>>
>> The 'samplefield' of the document contains a string in two languages,
>> Russian and English (see code). If we search for 'english', all is fine
>> and we find the document, but if we search for the Russian part of the
>> field (setting $queryStr to 'русский'), we don't find any document.
>>
>> What is the problem with my code? Please help me find a solution...
>>
>> Thank you guys
>>
>> Maxim Savenko
>> maxim.savenko@gmail.com
>
>

--
Good Luck.

Maxim Savenko
EMail: maxim.savenko@gmail.com
