
Solr Tips

Posted: 2015-03-25 12:21:39

Flexible schema for new fields:



Sorting:

Use separate fields for searching and sorting. A field used for searching needs a full analyzer that splits its value into multiple tokens, while a field used for sorting must keep its value as a single token, because sorting on a tokenized field does not give meaningful results.

E.g.:

<fieldType name="sortabletext" class="solr.TextField" sortMissingLast="true" omitNorms="true">
    <analyzer>
        <!-- KeywordTokenizer does no actual tokenizing, so the entire input string is preserved as a single token -->
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <!-- The LowerCase TokenFilter does what you expect, which is useful when you want your sorting to be case-insensitive -->
        <filter class="solr.LowerCaseFilterFactory" />
        <!-- The TrimFilter removes any leading or trailing whitespace -->
        <filter class="solr.TrimFilterFactory" />
    </analyzer>
</fieldType>


<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0" />
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    </analyzer>
</fieldType>


<dynamicField name="*_text" type="text" indexed="true" stored="true" />
<dynamicField name="*_sortabletext" type="sortabletext" indexed="true" stored="true"/>
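A rough way to see why the two field types behave differently is to imitate their analyzers in plain Java. This is only a sketch: sortKey approximates the sortabletext chain (KeywordTokenizer, LowerCaseFilter, TrimFilter) and searchTokens is a crude stand-in for the text analyzer. Neither calls Lucene, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class SortableTextDemo {

    // Imitates the "sortabletext" chain: KeywordTokenizer keeps the whole value as
    // one token, LowerCaseFilter lowercases it, TrimFilter strips outer whitespace.
    static String sortKey(String value) {
        return value.toLowerCase(Locale.ROOT).trim();
    }

    // Crude stand-in for the "text" chain: lowercase, then split on anything that
    // is not a letter or digit, so one value yields several tokens.
    static List<String> searchTokens(String value) {
        List<String> tokens = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Orders values by their single sort token (case-insensitive; String.compareTo
    // agrees with Solr's term order for plain ASCII values).
    static List<String> sortLikeSolr(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.comparing(SortableTextDemo::sortKey));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> titles = List.of("banana split", "  Apple pie", "Cherry tart");
        System.out.println(searchTokens("  Apple pie")); // [apple, pie]: several tokens per value
        System.out.println(sortLikeSolr(titles));        // [  Apple pie, banana split, Cherry tart]
    }
}
```

In a request you would then search on one field and sort on the other, e.g. q=name_text:apple&sort=name_sortabletext asc, where name is a hypothetical field prefix. Both fields can be filled from the same source field with a copyField directive in schema.xml.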




Multi-language search:

Use a separate field and field type for each language, so that each language can have its own tokenizer and filter configuration.

E.g.:

<fieldType name="text_zh" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <tokenizer class="de.hybris.search.analyze.IKTokenizerFactory" useSmart="true"/>
        <filter class="de.hybris.search.suggest.PinYinFilterFactory"/>
        <filter class="de.hybris.platform.solrfacetsearch.ysolr.synonyms.HybrisSynonymFilterFactory" ignoreCase="true" synonyms="zh" coreName="${solr.core.name}"/>
    </analyzer>
</fieldType>


<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory" />
        <filter class="solr.StandardFilterFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="de.hybris.platform.solrfacetsearch.ysolr.synonyms.HybrisSynonymFilterFactory" ignoreCase="true" synonyms="en" coreName="${solr.core.name}"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0" />
        <filter class="de.hybris.platform.solrfacetsearch.ysolr.stopwords.HybrisStopWordsFilterFactory" ignoreCase="true" coreName="${solr.core.name}"/>
        <filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true" />
        <filter class="solr.ASCIIFoldingFilterFactory" />
        <filter class="solr.SnowballPorterFilterFactory" language="English"/>
    </analyzer>
</fieldType>


<dynamicField name="*_text_en" type="text_en" indexed="true" stored="true" />
<dynamicField name="*_text_zh" type="text_zh" indexed="true" stored="true" />
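With one dynamic field per language, the application has to pick the field that matches the user's language when it builds the query. The sketch below assumes a hypothetical base field name, only the two languages defined above (en and zh), and English as the fallback; the class and method names are illustrative and not part of Solr or hybris.

```java
import java.util.Locale;
import java.util.Set;

public class LanguageFieldRouter {

    // Languages that have a *_text_<lang> dynamic field in the schema above.
    static final Set<String> SUPPORTED = Set.of("en", "zh");
    static final String FALLBACK = "en"; // assumption: English for every other language

    // Maps a hypothetical base field name and the user's locale to the
    // language-specific dynamic field, e.g. name -> name_text_zh.
    static String fieldFor(String baseField, Locale locale) {
        String lang = locale.getLanguage();
        return baseField + "_text_" + (SUPPORTED.contains(lang) ? lang : FALLBACK);
    }

    public static void main(String[] args) {
        System.out.println(fieldFor("name", Locale.SIMPLIFIED_CHINESE) + ":\u624b\u673a"); // query on name_text_zh
        System.out.println(fieldFor("name", Locale.GERMANY) + ":laptop");                 // falls back to name_text_en
    }
}
```

Because the analyzer is chosen by the field, the same keyword is tokenized by the IK tokenizer when it hits name_text_zh and by the English chain when it hits name_text_en.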


Multi-language suggestion:

Define several spell checkers, each with a different name, inside the SpellCheckComponent. Each spell checker serves one language and is built from a different field with its own analyzer configuration.

E.g.:

<requestHandler name="/suggest" class="solr.SearchHandler">
    <lst name="defaults">
        <str name="spellcheck">true</str>
        <str name="spellcheck.dictionary">default</str>
        <str name="spellcheck.onlyMorePopular">true</str>
        <str name="spellcheck.count">5</str>
        <str name="spellcheck.collate">true</str>
    </lst>
    <arr name="components">
        <str>suggest</str>
    </arr>
</requestHandler>

<searchComponent name="suggest" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">text_spell</str>
    <lst name="spellchecker">
        <str name="name">default</str>
        <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
        <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
        <str name="field">autosuggest_en</str>
        <str name="buildOnCommit">true</str>
        <str name="buildOnOptimize">true</str>
        <str name="accuracy">0.35</str>
    </lst>
    <lst name="spellchecker">
        <str name="name">en</str>
        <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
        <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
        <str name="field">autosuggest_en</str>
        <str name="buildOnCommit">true</str>
        <str name="buildOnOptimize">true</str>
        <str name="accuracy">0.35</str>
    </lst>
    <lst name="spellchecker">
        <str name="name">zh</str>
        <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
        <str name="lookupImpl">com.hybris.search.suggest.PinYinTSTLookupFactory</str>
        <str name="storeDir">spellcheckdata</str>
        <str name="field">autosuggest_zh</str>
        <str name="buildOnCommit">true</str>
        <str name="buildOnOptimize">true</str>
        <str name="accuracy">0.35</str>
    </lst>
</searchComponent>
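Conceptually, each named spellchecker is a separate dictionary built from its own field, and the request picks one by name. The sketch below models that with one sorted set per dictionary and a simple prefix scan. The TreeSet scan is a stand-in for Solr's TSTLookup (a ternary search tree), not its actual implementation; it ignores onlyMorePopular and accuracy, and the terms are made-up sample data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeSet;

public class PerLanguageSuggester {

    // One prefix dictionary per spellchecker name ("default", "en", "zh");
    // the terms stand in for the values indexed in autosuggest_en / autosuggest_zh.
    private final Map<String, NavigableSet<String>> dictionaries;

    PerLanguageSuggester(Map<String, NavigableSet<String>> dictionaries) {
        this.dictionaries = dictionaries;
    }

    // Returns up to count terms from the named dictionary that start with prefix
    // (compare spellcheck.dictionary, spellcheck.q and spellcheck.count).
    List<String> suggest(String dictionary, String prefix, int count) {
        List<String> result = new ArrayList<>();
        NavigableSet<String> terms = dictionaries.get(dictionary);
        if (terms == null) {
            return result; // unknown dictionary name
        }
        // Sorted order puts every term with this prefix right after the prefix itself.
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix) || result.size() == count) {
                break;
            }
            result.add(term);
        }
        return result;
    }

    public static void main(String[] args) {
        NavigableSet<String> en = new TreeSet<>(List.of("laptop", "lamp", "laser printer", "phone"));
        NavigableSet<String> zh = new TreeSet<>(List.of("\u624b\u673a", "\u624b\u8868", "\u7535\u8111"));
        PerLanguageSuggester s = new PerLanguageSuggester(Map.of("default", en, "en", en, "zh", zh));
        System.out.println(s.suggest("en", "la", 5)); // [lamp, laptop, laser printer]
        System.out.println(s.suggest("zh", "\u624b", 5));
    }
}
```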

 

At search time, build the query as follows (query being a SolrJ SolrQuery object):
   query.setQueryType("/suggest");              // matches the name of the request handler
   query.set("spellcheck.dictionary", "zh");    // matches the name of the spellchecker in the Solr configuration
   query.set("spellcheck.q", keyword);          // the autosuggest keyword typed by the user
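The same request can also be sent over plain HTTP. The sketch below builds the GET URL, assuming the core is reachable at a hypothetical base URL and that the /suggest handler is registered as configured above; SolrJ would normally do this parameter encoding for you.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class SuggestRequestDemo {

    // Builds a GET URL for the /suggest handler; solrCoreUrl is hypothetical.
    static String suggestUrl(String solrCoreUrl, String dictionary, String keyword) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("spellcheck.dictionary", dictionary); // spellchecker name: default, en or zh
        params.put("spellcheck.q", keyword);             // the text typed so far
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return solrCoreUrl + "/suggest?" + query;
    }

    public static void main(String[] args) {
        // prints http://localhost:8983/solr/mycore/suggest?spellcheck.dictionary=en&spellcheck.q=lapt
        System.out.println(suggestUrl("http://localhost:8983/solr/mycore", "en", "lapt"));
    }
}
```

Non-ASCII keywords are percent-encoded as UTF-8, so a Chinese prefix sent to the zh dictionary arrives intact.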


Multi-value facet search:

A field may appear both in a filter query (fq) and as a facet field. The filter then also restricts the facet counts, so every value except the one selected in the filter gets a count of 0.
E.g.: q=mainquery&fq=status:public&fq=doctype:pdf&facet=on&facet.field=doctype

Use tagging and exclusion to solve the problem: tag the filter, then exclude that tag when computing the facet, so that counts are still returned for the values not selected in the filter.
E.g.: q=mainquery&fq=status:public&fq={!tag=dt}doctype:pdf&facet=on&facet.field={!ex=dt}doctype
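To see what the exclusion changes, the toy model below counts doctype values in memory, once with both filters applied and once with the filter tagged dt left out, which is what {!ex=dt} asks Solr to do. It assumes every document matches q=mainquery and uses made-up sample documents; it illustrates the counting rule, not Solr's implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

public class FacetExclusionDemo {

    // A toy document with the two fields used in the example queries.
    static final class Doc {
        final String status;
        final String doctype;

        Doc(String status, String doctype) {
            this.status = status;
            this.doctype = doctype;
        }
    }

    static final List<Doc> DOCS = List.of(
            new Doc("public", "pdf"),
            new Doc("public", "pdf"),
            new Doc("public", "doc"),
            new Doc("public", "xls"),
            new Doc("private", "xls"));

    static final Predicate<Doc> STATUS_PUBLIC = d -> d.status.equals("public"); // fq=status:public
    static final Predicate<Doc> DOCTYPE_PDF = d -> d.doctype.equals("pdf");     // fq={!tag=dt}doctype:pdf

    // Counts doctype values over the documents that pass every filter. Every value
    // in the index is listed, including zero counts (Solr's default facet.mincount=0).
    static Map<String, Integer> facetDoctype(List<Doc> docs, List<Predicate<Doc>> filters) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Doc d : docs) {
            counts.putIfAbsent(d.doctype, 0);
            if (filters.stream().allMatch(f -> f.test(d))) {
                counts.merge(d.doctype, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // facet.field=doctype: the doctype filter also narrows the facet counts
        System.out.println(facetDoctype(DOCS, List.of(STATUS_PUBLIC, DOCTYPE_PDF)));
        // facet.field={!ex=dt}doctype: the filter tagged dt is ignored while counting
        System.out.println(facetDoctype(DOCS, List.of(STATUS_PUBLIC)));
    }
}
```

The first map is {doc=0, pdf=2, xls=0} and the second is {doc=1, pdf=2, xls=1}: with the exclusion, the user still sees how many public doc and xls files exist while pdf is selected, and status:public keeps applying because it is not tagged.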

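The same field can also be faceted more than once with different exclusions, giving each facet its own key.
E.g.: facet.field={!ex=dt key=mylabel}doctype
This returns the doctype facet, computed with the dt filter excluded, under the name mylabel, which is useful for display purposes.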
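When these parameters are generated in code, small helpers keep the tag and the exclusion in sync. The helper names below are hypothetical; the sketch builds the raw name=value pairs for faceting doctype twice, once narrowed by the selection and once with it excluded under the key mylabel.

```java
import java.util.List;

public class MultiSelectFacetParams {

    // Prefixes a filter query with {!tag=...} so that a facet can exclude it later.
    static String taggedFilter(String tag, String filter) {
        return "{!tag=" + tag + "}" + filter;
    }

    // Facets on field while ignoring filters tagged excludeTag, and reports
    // the counts under label instead of the field name.
    static String excludedFacet(String field, String excludeTag, String label) {
        return "{!ex=" + excludeTag + " key=" + label + "}" + field;
    }

    // Raw (not URL-encoded) parameters that select value in field and facet the
    // field twice: once narrowed by the selection, once with it excluded.
    static List<String> multiSelectParams(String field, String value, String tag, String label) {
        return List.of(
                "fq=" + taggedFilter(tag, field + ":" + value),
                "facet=on",
                "facet.field=" + field,
                "facet.field=" + excludedFacet(field, tag, label));
    }

    public static void main(String[] args) {
        for (String p : multiSelectParams("doctype", "pdf", "dt", "mylabel")) {
            System.out.println(p);
        }
    }
}
```

Values such as {!ex=dt key=mylabel}doctype contain braces, spaces and "=", so they must be URL-encoded when sent over HTTP; SolrJ does this automatically.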




Original: http://shadowisper.blog.51cto.com/3189863/1624112
