Data Consistency Architecture for a Distributed Analytics System

Problem Description

I am refactoring an analytics system that performs a lot of calculation, and I need some ideas on possible architectural designs to address a data consistency issue I am facing.

Current Architecture

I have a queue-based system in which different requesting applications create messages that are eventually consumed by workers.

Each "Requesting App" breaks down a large calculation into smaller pieces that will be sent to the queue and processed by the workers.

When all the pieces are finished, the originating "Requesting App" consolidates the results.
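A minimal sketch of this scatter/gather flow, written in Python for brevity even though the system itself is C#. The in-process `queue.Queue` and threads stand in for the real message queue and for workers running on separate servers; the per-piece work (a sum of squares) and the names `worker` and `run_calculation` are hypothetical placeholders for the actual calculation.

```python
import queue
import threading


def worker(tasks: queue.Queue, results: queue.Queue) -> None:
    # Consume pieces until a None sentinel arrives.
    while True:
        item = tasks.get()
        if item is None:
            tasks.task_done()
            break
        calc_id, piece_id, numbers = item
        # Placeholder for the real per-piece calculation.
        results.put((calc_id, piece_id, sum(x * x for x in numbers)))
        tasks.task_done()


def run_calculation(calc_id: str, data: list, chunk: int, n_workers: int) -> int:
    tasks: queue.Queue = queue.Queue()
    results: queue.Queue = queue.Queue()
    workers = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    # Scatter: the Requesting App breaks the large calculation into pieces.
    pieces = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    for piece_id, piece in enumerate(pieces):
        tasks.put((calc_id, piece_id, piece))
    for _ in workers:
        tasks.put(None)  # one sentinel per worker
    for w in workers:
        w.join()
    # Gather: consolidate once every piece has finished.
    partials = [results.get() for _ in pieces]
    return sum(value for _, _, value in partials)


total = run_calculation("A", list(range(10)), chunk=3, n_workers=2)
print(total)  # 285
```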

Also, the workers read information from a centralized database (SQL Server) in order to process the requests (important: the workers do not change any data in the database; they only read it).

The Problem

OK. So far, so good. The problem arises when we add a web service that updates the information in the database. This can happen at any time, but it is critical that every piece of a given "large calculation" originating from a "Requesting App" sees the same data in the database.

For example:

App A generates messages A1 and A2 and sends them to the queue.
Worker W1 picks up message A1 for processing.
The web server updates the database, changing it from state S0 to S1.
Worker W2 picks up message A2 for processing.

I just can't have worker W2 using state S1 of the database. For the whole calculation to be consistent, it should use the previous state, S0.
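The interleaving in the example can be replayed deterministically with a toy in-memory "database" (a Python dict standing in for SQL Server; `process_piece` and `web_service_update` are hypothetical names). It illustrates why reading the database per message is not enough: the two pieces of calculation A observe different states.

```python
def process_piece(piece: str, database: dict) -> tuple:
    # A worker only reads from the database; it never writes.
    return piece, database["state"]


def web_service_update(database: dict) -> None:
    # The web service moves the database from state S0 to S1.
    database["state"] = "S1"


db = {"state": "S0"}
seen = []
seen.append(process_piece("A1", db))  # W1 picks up A1
web_service_update(db)                # S0 -> S1 between the two pieces
seen.append(process_piece("A2", db))  # W2 picks up A2
print(seen)  # [('A1', 'S0'), ('A2', 'S1')]
```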

Ideas

A lock pattern to prevent the web server from changing the database while there is a worker consuming information from it.

Cons: The lock might stay on for a long time, since calculations from different "Requesting Apps" might overlap (A1, B1, A2, B2, C1, B3, etc.).
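One way to realize the lock idea is a readers-writer lock in which each calculation (not each message) holds shared access from before its first piece is queued until its results are consolidated, while the web service needs exclusive access to update. This is a hypothetical Python sketch: `CalculationLock` and its methods are illustrative names, and workers on different servers would need a distributed lock rather than `threading.Condition`. The usage lines reproduce the drawback: while calculations overlap, the update cannot proceed.

```python
import threading


class CalculationLock:
    """Readers-writer lock: calculations share it, updates need it exclusively."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._active_calcs = 0   # calculations currently holding shared access
        self._writing = False    # True while the web service is updating

    def begin_calculation(self) -> None:
        # Taken before the first piece (e.g. A1) is queued.
        with self._cond:
            while self._writing:
                self._cond.wait()
            self._active_calcs += 1

    def end_calculation(self) -> None:
        # Released only after the Requesting App has consolidated the results.
        with self._cond:
            self._active_calcs -= 1
            if self._active_calcs == 0:
                self._cond.notify_all()

    def try_update(self, timeout: float) -> bool:
        # The web service waits until no calculation is running; False on timeout.
        with self._cond:
            ok = self._cond.wait_for(
                lambda: self._active_calcs == 0 and not self._writing, timeout)
            if ok:
                self._writing = True
            return ok

    def end_update(self) -> None:
        with self._cond:
            self._writing = False
            self._cond.notify_all()


lock = CalculationLock()
lock.begin_calculation()                    # calculation A starts (A1, A2, ...)
lock.begin_calculation()                    # calculation B overlaps with A
lock.end_calculation()                      # A is consolidated
blocked = not lock.try_update(timeout=0.1)  # B still running: update must wait
lock.end_calculation()                      # B is consolidated
updated = lock.try_update(timeout=0.1)      # no calculation left: update proceeds
lock.end_update()
print(blocked, updated)  # True True
```

Because `begin_calculation` only waits while an update is in progress, a steady stream of overlapping calculations can starve the web service indefinitely. Making new calculations wait behind a pending update avoids that starvation, but then the calculations are delayed instead.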

Create a new layer between the database and the workers (a server that controls database caching per requesting app).

Cons: Adding another layer might impose significant overhead (maybe?), and it is a lot of work, since I would have to rewrite the persistence code of the workers (a lot of code).
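A minimal sketch of the second idea, assuming the new layer can take a full copy of the data the workers need (`load_all`) and is told when the web service has changed the database (`invalidate`); both hooks, the `SnapshotCache` class, and its method names are hypothetical. Each calculation is pinned to an immutable snapshot when it starts, and calculations that start while the data is unchanged share the same snapshot, so the layer copies the data at most once per database update. Pinning has to happen per calculation up front: caching individual values lazily on first read would still let pieces of one calculation mix S0 and S1 data. In a real deployment `load_all` would also have to read the data consistently, for example inside a single transaction.

```python
import threading
from types import MappingProxyType
from typing import Any, Callable, Dict, Mapping, Optional


class SnapshotCache:
    """Hypothetical layer between the database and the workers."""

    def __init__(self, load_all: Callable[[], Mapping[str, Any]]) -> None:
        self._load_all = load_all  # returns the current database contents
        self._lock = threading.Lock()
        self._current: Optional[Mapping[str, Any]] = None  # latest shared snapshot
        self._pinned: Dict[str, Mapping[str, Any]] = {}    # calc_id -> its snapshot

    def invalidate(self) -> None:
        # Called after the web service commits an update; running
        # calculations keep their old snapshot, new ones get a fresh copy.
        with self._lock:
            self._current = None

    def begin(self, calc_id: str) -> None:
        # Pin the calculation to a snapshot before its first piece is queued.
        with self._lock:
            if self._current is None:
                self._current = MappingProxyType(dict(self._load_all()))
            self._pinned[calc_id] = self._current

    def read(self, calc_id: str, key: str) -> Any:
        # Workers read through the layer, always from the pinned snapshot.
        return self._pinned[calc_id][key]

    def end(self, calc_id: str) -> None:
        # Release the snapshot after the Requesting App consolidates results.
        with self._lock:
            del self._pinned[calc_id]


# Replaying the example: A stays on S0, B starts after the update.
db = {"state": "S0"}
cache = SnapshotCache(lambda: db)
cache.begin("A")
a1 = cache.read("A", "state")  # W1 processes A1
db["state"] = "S1"             # the web service updates the database
cache.invalidate()
cache.begin("B")
a2 = cache.read("A", "state")  # W2 processes A2
b1 = cache.read("B", "state")
cache.end("A")
cache.end("B")
print(a1, a2, b1)  # S0 S0 S1
```

The trade-off is memory: every database state still pinned by a running calculation stays in the layer until that calculation ends.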

I am leaning toward the second solution, but I am not very confident about it.

Any brilliant ideas? Am I designing it wrong, or missing something?

OBS:

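This is a huge 2-tier legacy system (in C#) that we are trying to evolve into a more scalable solution with the least effort possible.
Each worker may run on a different server.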
