Full-text index returning no results from PDF FILESTREAM

Problem description

I have a FILESTREAM table on SQL Server 2012, running on a Windows 8.1 x64 machine, which already has a few PDF and TXT files stored in it. I decided to create a full-text index to search through these files, using the following commands:

CREATE FULLTEXT CATALOG FileStreamFTSCatalog AS DEFAULT;

CREATE FULLTEXT INDEX ON storage
(FileName Language 1046, File TYPE COLUMN FileExtension Language 1046)
KEY INDEX PK__storage__3214EC077DADCE3C
ON FileStreamFTSCatalog
WITH CHANGE_TRACKING AUTO;

After reading about other people with the same problem, I then ran these commands:

EXEC sp_fulltext_service @action='load_os_resources', @value=1;
EXEC sp_fulltext_service 'verify_signature', 0;
EXEC sp_fulltext_service 'update_languages';
EXEC sp_fulltext_service 'ft_timeout', 600000;
EXEC sp_fulltext_service 'ism_size', @value=16;
EXEC sp_fulltext_service 'restart_all_fdhosts';
EXEC sp_help_fulltext_system_components 'filter';
reconfigure with override

I can see that the PDF IFilter is configured:

filter  .pdf    E8978DA6-047F-4E3D-9C78-CDBE46041603    C:\Program Files\Adobe\Adobe PDF iFilter 11 for 64-bit platforms\bin\PDFFilter.dll  11.0.1.36   Adobe Systems, Inc.
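
The same registration can also be read from a catalog view, which is easier to filter than the stored procedure output. This is a minimal sketch, assuming only that full-text search is installed on the instance; `sys.fulltext_document_types` is a standard catalog view, and `.pdf` is simply the extension in question:

```sql
-- List the filter registered for the .pdf extension, if any.
-- An empty result would mean SQL Server has not loaded a PDF IFilter.
SELECT document_type, class_id, path, version, manufacturer
FROM sys.fulltext_document_types
WHERE document_type = N'.pdf';
```

A row here only shows that a filter is registered; it does not prove that the filter daemon host can actually load and run the DLL.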

I can even run a query such as:

select * from storage
where contains(*, 'data')

but it returns only the indexed TXT files. So I'm wondering: is there anything else I need to do to start indexing my PDFs? Or do I have to create another table and reinsert all the PDFs I had already stored, even though the TXT files are being indexed just fine?

Update 1:

Opening SQLFTXXX.LOG, I get this message (for the FileTable):

2014-08-20 06:32:09.48 spid29s     Warning: No appropriate filter was found during full-text index population for table or indexed view '[text_storage].[dbo].[storage_table]' (table or indexed view ID '355584405', database ID '7'), full-text key value '篰磧'. Some columns of the row were not indexed.

And this one (for the FILESTREAM table):

2014-08-19 22:14:50.58 spid20s     Warning: No appropriate filter was found during full-text index population for table or indexed view '[text_storage].[dbo].[storage]' (table or indexed view ID '674101442', database ID '7'), full-text key value '1797'. Some columns of the row were not indexed.
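
The warning names a specific row (full-text key 1797 in `[text_storage].[dbo].[storage]`), so that row can be inspected directly. The sketch below is one possible way to do so, not a definitive diagnosis. It assumes the key column is an integer (so that `document_id` matches the key value from the log), that `FileExtension` is a groupable character column, and it reuses the `File` and `FileExtension` column names from the `CREATE FULLTEXT INDEX` statement above:

```sql
USE text_storage;

-- Which extension values does the type column actually hold?
-- The filter lookup is driven by this column, so an unexpected value
-- (for example a missing or misspelled extension) is worth ruling out.
SELECT FileExtension, COUNT(*) AS row_count
FROM dbo.storage
GROUP BY FileExtension;

-- Did the row named in the warning (key 1797) produce any keywords
-- for the File column? No rows back suggests its content was not indexed.
SELECT document_id, COUNT(*) AS keyword_count
FROM sys.dm_fts_index_keywords_by_document(
         DB_ID(N'text_storage'), OBJECT_ID(N'dbo.storage'))
WHERE document_id = 1797
  AND column_id = COLUMNPROPERTY(OBJECT_ID(N'dbo.storage'), N'File', 'ColumnId')
  AND display_term <> N'END OF FILE'   -- skip the end-of-document marker
GROUP BY document_id;
```

Running the second query against a TXT row that is known to be searchable gives a useful baseline for comparison.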

Recommended answer

I finally found a solution. After trying both the Adobe and Foxit IFilters and getting the same error message with each, I found another IFilter called "PDFlib". I downloaded it, followed its instructions to make it available to SQL Server, and rebuilt the index. Now my PDFs are indexed and searchable.
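
The answer does not list the exact commands used, so the following is only a sketch of what "make the filter available and rebuild the index" might look like on the SQL Server side once an IFilter has been installed at the operating system level. It reuses the `storage` table name from the question; the PDFlib installation steps themselves are not shown because they are not given in the source:

```sql
-- Let the full-text engine pick up IFilters registered with the OS,
-- then restart the filter daemon hosts so the new filter is loaded.
EXEC sp_fulltext_service @action = 'load_os_resources', @value = 1;
EXEC sp_fulltext_service @action = 'restart_all_fdhosts';

-- Re-crawl every row so previously skipped PDFs go through the new filter.
ALTER FULLTEXT INDEX ON storage START FULL POPULATION;

-- Poll this until it returns 0 (idle); other values mean a population
-- is still running or has been paused.
SELECT OBJECTPROPERTYEX(OBJECT_ID(N'storage'), 'TableFulltextPopulateStatus')
       AS populate_status;
```

Once the population is idle, rerunning the `CONTAINS` query from the question should show whether the PDF rows are now returned.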

I believe the other IFilters would work as well if I followed these same instructions for them. I'll try that once I'm done with my tests and update this answer with the results.
