Problem description
I am using Lucene.Net 2.0 to index some fields from a database table. One of the fields is a 'Name' field which allows special characters. When I perform a search, it does not find my document that contains a term with special characters.
I index my field like this:
Directory DALDirectory = FSDirectory.GetDirectory(@"C:\Indexes\Name", false);
Analyzer analyzer = new StandardAnalyzer();
IndexWriter indexWriter = new IndexWriter(DALDirectory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);
Document doc = new Document();
doc.Add(new Field("Name", "Test (Test)", Field.Store.YES, Field.Index.TOKENIZED));
indexWriter.AddDocument(doc);
indexWriter.Optimize();
indexWriter.Close();
I search by doing the following:
value = value.Trim().ToLower();
value = QueryParser.Escape(value);
Query searchQuery = new TermQuery(new Term(field, value));
Searcher searcher = new IndexSearcher(DALDirectory);
TopDocCollector collector = new TopDocCollector(searcher.MaxDoc());
searcher.Search(searchQuery, collector);
ScoreDoc[] hits = collector.TopDocs().scoreDocs;
If I perform a search for field as 'Name' and value as 'Test', it finds the document. If I perform the same search as 'Name' and value as 'Test (Test)', then it does not find the document.
Even stranger, if I remove the QueryParser.Escape line and search for a GUID (which, of course, contains hyphens), it finds the documents where the GUID value matches, but the same search with the value 'Test (Test)' still yields no results.
I am unsure what I am doing wrong. I am using the QueryParser.Escape method to escape the special characters, and I am storing the field and searching as shown in the Lucene.Net examples.
Any ideas?
Recommended answer
StandardAnalyzer strips out the special characters during indexing, so 'Test (Test)' ends up in the index as the lowercased token 'test' rather than as a single term that still contains the parentheses. You can pass in an explicit list of stop words (excluding the ones you want to keep).
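One way to act on this, offered as a sketch rather than a definitive fix, is to run the search text through the same analyzer that was used at index time instead of building a TermQuery from the raw string. A TermQuery is not analyzed, so a value such as 'test \(test\)' is looked up literally and cannot match the token 'test'. The example below assumes a Lucene.Net 2.x API matching the calls already used in the question; the NameSearch class and its Search method are hypothetical names introduced for illustration. Note also that, as far as I know, a stop-word list only controls which whole words are dropped and does not change how punctuation is tokenized, so it may not be enough on its own.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;

// Hypothetical helper class, not part of Lucene.Net.
public static class NameSearch
{
    // Parses the user's text with the same analyzer used at index time,
    // so the query terms line up with the tokens stored in the index.
    public static ScoreDoc[] Search(Directory directory, string field, string value)
    {
        Analyzer analyzer = new StandardAnalyzer();
        QueryParser parser = new QueryParser(field, analyzer);

        // Escape only shields query-syntax characters such as ( and ) from
        // the parser; the analyzer then lowercases the text and drops the
        // punctuation, just as it did when the document was indexed.
        Query searchQuery = parser.Parse(QueryParser.Escape(value.Trim()));

        Searcher searcher = new IndexSearcher(directory);
        try
        {
            // Ask for at least one hit so an empty index does not request zero results.
            TopDocCollector collector = new TopDocCollector(Math.Max(1, searcher.MaxDoc()));
            searcher.Search(searchQuery, collector);
            return collector.TopDocs().scoreDocs;
        }
        finally
        {
            searcher.Close();
        }
    }
}
```

Calling NameSearch.Search(DALDirectory, "Name", "Test (Test)") should then produce a query on the token 'test', which is what the index holds. If the goal is instead an exact match on the whole value including the parentheses, an alternative is to index the field with Field.Index.UN_TOKENIZED and query it with a TermQuery on the unescaped, unmodified string; that choice trades away partial-word and case-insensitive matching.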