Problem description
Apologies for the trouble, but I'm hoping to get some help from people with Lucene experience.
Our application currently uses Lucene.Net 3.0.3 to index and search about 2,500,000 items. Each entity contains 27 searchable fields, which are added to the index like this: new Field(key, value, Field.Store.YES, Field.Index.ANALYZED)
We have two search options:
- Fuzzy search across only 4 fields
- Exact search across 4 to 27 fields
We have a search service that automatically searches for about 53,000 people every week, such as "Bob Huston", "Sara Conor", "Sujan Hong Uin Ho", etc.
Option 1 is slow: searcher.Search takes 4-8 seconds on average, and that is our main problem.
Sample search code:
var index = FSDirectory.Open(indexPath);
var searcher = new IndexSearcher(index, true);
this.analyzer = new StandardAnalyzer(Version.LUCENE_30, new HashSet<string>());
var queryParser = new MultiFieldQueryParser(Version.LUCENE_30, queryFields, this.analyzer);
queryParser.AllowLeadingWildcard = false;
Query query;
query = queryParser.Parse(token);
var results = searcher.Search(query, NumberOfResults);// NumberOfResults==500
Our fuzzy search query to find "bob cong hong" in 4 fields:
(((PersonFirstName:bob~0.6) OR (PersonLastName:bob~0.6) OR (PersonAliases:bob~0.6) OR (PersonAlternativeSpellings:bob~0.6)) AND ((PersonFirstName:cong~0.6) OR (PersonLastName:cong~0.6) OR (PersonAliases:cong~0.6) OR (PersonAlternativeSpellings:cong~0.6)) AND ((PersonFirstName:hong~0.6) OR (PersonLastName:hong~0.6) OR (PersonAliases:hong~0.6) OR (PersonAlternativeSpellings:hong~0.6)))
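Improvements so far:
- We combined these 4 fields into 1 search field
- We decided to use a single IndexSearcher in the service rather than opening one for every search request
- MergeFactor=2
Combined, these improvements gave a speed-up of roughly 30-40%.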
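For readers who want to picture the first two changes, here is a minimal sketch; it is not the poster's actual code. It assumes Lucene.Net 3.0.3, and the combined field name PersonAllNames and the helper classes PersonIndexing and SearcherHolder are hypothetical names invented for illustration.

```csharp
using System.IO;
using Lucene.Net.Documents;
using Lucene.Net.Search;
using Lucene.Net.Store;

public static class PersonIndexing
{
    // Hypothetical name for the combined field; not from the original post.
    public const string CombinedField = "PersonAllNames";

    // Adds one extra analyzed field holding the text of all four name fields,
    // so each fuzzy term is expanded against one field instead of four.
    public static void AddCombinedNameField(Document doc, string firstName,
        string lastName, string aliases, string alternativeSpellings)
    {
        var combined = string.Join(" ",
            new[] { firstName, lastName, aliases, alternativeSpellings });

        // Store.NO: the combined text is only searched, never read back.
        doc.Add(new Field(CombinedField, combined,
            Field.Store.NO, Field.Index.ANALYZED));
    }
}

public static class SearcherHolder
{
    private static readonly object Sync = new object();
    private static IndexSearcher searcher;

    // Returns one shared read-only searcher for the whole service.
    // Note: this instance will not see later index changes until it is replaced.
    public static IndexSearcher Get(string indexPath)
    {
        lock (Sync)
        {
            if (searcher == null)
            {
                var directory = FSDirectory.Open(new DirectoryInfo(indexPath));
                searcher = new IndexSearcher(directory, true);
            }
            return searcher;
        }
    }
}
```

Each request would then call SearcherHolder.Get(indexPath) instead of constructing a new IndexSearcher, which avoids reopening the index files on every search.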
Following this article, we have made most of the possible optimizations:
- The index is placed on a very fast SAS drive: http://accessories.euro.dell.com/sna/productdetail.aspx?c=ie&l=en&s=dhs&cs=iedhs1&sku=400-AHWT#Overview
- We have enough RAM
- Merge factor 2
- We tried moving the index to a RAMDirectory, but the test results were inconsistent; sometimes the speed was the same
Do you have any other suggestions for improving search speed in our situation?
Thanks.
Recommended answer
You can improve the speed of fuzzy queries by setting their prefix length to a non-zero value. This allows Lucene to narrow the set of possible results efficiently. Like this:
queryParser.FuzzyPrefixLength = 2;
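To show where that setting sits in a full search, here is a small sketch against an in-memory index. It assumes Lucene.Net 3.0.3; the field name PersonAllNames, the class name FuzzyPrefixDemo, and the sample names are made up for illustration. With a prefix length of 2, a term such as hong~0.6 is only compared against indexed terms that start with "ho", rather than against every term in the field.

```csharp
using System;
using System.Collections.Generic;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

public static class FuzzyPrefixDemo
{
    public static void Main()
    {
        // Same analyzer setup as in the question: no stop words.
        var analyzer = new StandardAnalyzer(Version.LUCENE_30, new HashSet<string>());
        var directory = new RAMDirectory();

        // Index two sample people into a single (hypothetical) name field.
        using (var writer = new IndexWriter(directory, analyzer,
                   IndexWriter.MaxFieldLength.UNLIMITED))
        {
            foreach (var name in new[] { "Bob Cong Hong", "Sara Conor" })
            {
                var doc = new Document();
                doc.Add(new Field("PersonAllNames", name,
                    Field.Store.YES, Field.Index.ANALYZED));
                writer.AddDocument(doc);
            }
        }

        var parser = new QueryParser(Version.LUCENE_30, "PersonAllNames", analyzer);
        parser.FuzzyPrefixLength = 2;                       // first 2 chars must match exactly
        parser.DefaultOperator = QueryParser.Operator.AND;  // every term is required

        var query = parser.Parse("bob~0.6 cong~0.6 hong~0.6");

        using (var searcher = new IndexSearcher(directory, true))
        {
            var hits = searcher.Search(query, 500);
            foreach (var hit in hits.ScoreDocs)
            {
                Console.WriteLine(searcher.Doc(hit.Doc).Get("PersonAllNames"));
            }
        }
    }
}
```

The trade-off is recall: a name whose misspelling falls within the first two characters (for example "Hong" typed as "Gong") will no longer match, so the prefix length is worth tuning against real search data.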
Also, this doesn't affect the query you've provided as an example, but if you care at all about performance, keep leading wildcards disabled, as the line queryParser.AllowLeadingWildcard = false; already does (false is also the parser's default). Leading wildcards will absolutely kill performance.