Indexing a MySQL database with Apache Lucene, and keeping them synchronized

Problem description

When a new item is added to MySQL, it must also be indexed by Lucene.
When an existing item is removed from MySQL, it must also be removed from Lucene's index.

The idea is to write a script that a scheduler (e.g. a cron job) calls every x minutes, as a way to keep MySQL and Lucene synchronized. What I have managed so far:

For each item newly added to MySQL, Lucene indexes it too.
For each item already present in MySQL, Lucene does not index it again (no duplicated items).

This is the point I need help with:

For each item that was previously added and has since been removed from MySQL, Lucene should also remove it from the index.

Here is the code I used, which tries to index a MySQL table tag (id [PK] | name):

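public static void main(String[] args) throws Exception {

    // Load the MySQL JDBC driver and connect to the database.
    Class.forName("com.mysql.jdbc.Driver").newInstance();
    Connection connection = DriverManager.getConnection("jdbc:mysql://localhost/mydb", "root", "");

    // Open (or create) the Lucene index stored in INDEX_DIR.
    StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
    IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_36, analyzer);
    IndexWriter writer = new IndexWriter(FSDirectory.open(INDEX_DIR), config);

    // Read every row of the tag table.
    String query = "SELECT id, name FROM tag";
    Statement statement = connection.createStatement();
    ResultSet result = statement.executeQuery(query);

    while (result.next()) {
        Document document = new Document();
        // id: stored and indexed as one exact token, so it can act as a unique key.
        document.add(new Field("id", result.getString("id"), Field.Store.YES, Field.Index.NOT_ANALYZED));
        // name: analyzed for full-text search, not stored.
        document.add(new Field("name", result.getString("name"), Field.Store.NO, Field.Index.ANALYZED));
        // Deletes any document with the same id, then adds this one, so re-runs create no duplicates.
        writer.updateDocument(new Term("id", result.getString("id")), document);
    }

    // Release the database resources, then commit the changes and close the index.
    result.close();
    statement.close();
    connection.close();
    writer.close();

}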

PS: this code is for testing purposes only, no need to tell me how awful it is :)

One solution could be to delete every previously added document and reindex the whole database:

writer.deleteAll();
while (result.next()) {
    Document document = new Document();
    document.add(new Field("id", result.getString("id"), Field.Store.YES, Field.Index.NOT_ANALYZED));
    document.add(new Field("name", result.getString("name"), Field.Store.NO, Field.Index.ANALYZED));
    writer.addDocument(document);
}

I'm not sure this is the most efficient solution. Is it?
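
One possible alternative to wiping the index on every run, offered only as a minimal sketch: compare the set of ids currently in the index with the set of ids currently in MySQL, and delete only the ids that appear in the first set but not in the second. The class IndexSync and the method staleIds are hypothetical names, not part of the code above. The sketch assumes both id sets have already been loaded: the indexed ids from the stored "id" field, the database ids from something like "SELECT id FROM tag". The Lucene call is left as a comment so the example runs with the JDK alone.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of the question's code.
final class IndexSync {

    // Returns the ids that are still in the index but no longer in the database.
    static Set<String> staleIds(Set<String> indexedIds, Set<String> databaseIds) {
        Set<String> stale = new TreeSet<>(indexedIds); // copy, so the inputs stay untouched
        stale.removeAll(databaseIds);                  // set difference: indexed minus database
        return stale;
    }

    public static void main(String[] args) {
        // Assumed sample data: id 3 was deleted from MySQL, id 4 was added.
        Set<String> indexedIds = new TreeSet<>(Arrays.asList("1", "2", "3"));
        Set<String> databaseIds = new TreeSet<>(Arrays.asList("1", "2", "4"));

        for (String id : staleIds(indexedIds, databaseIds)) {
            // With the IndexWriter from the question, this is where one would call:
            // writer.deleteDocuments(new Term("id", id));
            System.out.println("delete from index: " + id); // prints "delete from index: 3"
        }
    }
}
```

The existing updateDocument loop would still take care of new and changed rows; this step only adds the missing deletions. Whether it beats deleteAll() plus a full reindex depends on the table size and on how many rows change between runs, so it is worth measuring both.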
