How to store binary data when you only care about speed?

2022-07-14 · C/C++ Questions · 得得之家

Question

I have N points in D dimensions, with N around 1 million and D around 100. All my points have binary coordinates, i.e. they lie in {0, 1}^D, and I am only interested in speed.

Currently my implementation uses std::vector<int>. I am wondering whether changing my data structure would make execution faster. I only do insertions and searches (I never change the bits).

All the related questions I found mention std::vector<char>, std::vector<bool> and std::bitset, but they all focus on the space savings these structures give.

What is the appropriate data structure for binary data in C++ when speed is the main concern?

I intend to populate my data structure with the binary data and then do a lot of contiguous searches (that is, I don't really care about the i-th coordinate of a point; whenever I access a point, I read all of its coordinates in sequence). I will compute the Hamming distances between the points.

Recommended answer

Locality of reference will likely be the driving force. So it's fairly obvious that you should represent the D coordinates of a single point as a contiguous bit vector. std::bitset<D> would be a logical choice.
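As a minimal sketch of that representation, assuming D is fixed at compile time (100, as in the question): each point is a std::bitset<D>, and the Hamming distance is the population count of the XOR of two points. The names `Point`, `hamming` and `from_ints` are illustrative and do not come from the original answer.

```cpp
#include <bitset>
#include <cstddef>
#include <vector>

constexpr std::size_t D = 100;   // number of dimensions, as in the question
using Point = std::bitset<D>;    // one point = D bits stored contiguously

// Hamming distance: XOR marks the differing coordinates, count() tallies them.
inline std::size_t hamming(const Point& a, const Point& b) {
    return (a ^ b).count();
}

// Hypothetical helper: build a point from a 0/1 vector such as the
// std::vector<int> representation mentioned in the question.
// Coordinates beyond D are ignored; missing ones stay 0.
inline Point from_ints(const std::vector<int>& coords) {
    Point p;
    for (std::size_t i = 0; i < coords.size() && i < D; ++i) {
        p[i] = (coords[i] != 0);
    }
    return p;
}
```

With this layout, all points can live in one std::vector<Point>, so scanning the data set walks memory sequentially.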

However, the next important thing to realize is that you see locality benefits easily up to 4KB. This means that you should not pick a single point and compare it against all other N-1 points. Instead, group the points into sets of 4KB each and compare those groups against each other. Both ways are O(N*N), but the second will be much faster.
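One way to read this grouping advice is as loop tiling over a contiguous array of points. The sketch below assumes points stored as std::bitset<100> in a std::vector and, purely for illustration, sums the distances over all unordered pairs; the tile size constant and the function name are assumptions, not part of the original answer.

```cpp
#include <algorithm>
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t D = 100;
using Point = std::bitset<D>;

// Assumption: tiles of roughly 4 KB, following the figure in the answer.
constexpr std::size_t kTileBytes = 4096;
constexpr std::size_t kTilePoints = kTileBytes / sizeof(Point);
static_assert(kTilePoints > 0, "a tile must hold at least one point");

// Sums the Hamming distance over all unordered pairs, visiting the pairs
// tile by tile so that two small tiles are reused many times in a row.
std::uint64_t sum_pairwise_hamming_tiled(const std::vector<Point>& pts) {
    const std::size_t n = pts.size();
    std::uint64_t total = 0;
    for (std::size_t bi = 0; bi < n; bi += kTilePoints) {
        const std::size_t i_end = std::min(bi + kTilePoints, n);
        for (std::size_t bj = bi; bj < n; bj += kTilePoints) {
            const std::size_t j_end = std::min(bj + kTilePoints, n);
            for (std::size_t i = bi; i < i_end; ++i) {
                // On the diagonal tile, start at i + 1 to skip (i, i)
                // and avoid counting a pair twice.
                const std::size_t j_begin = (bi == bj) ? i + 1 : bj;
                for (std::size_t j = j_begin; j < j_end; ++j) {
                    total += (pts[i] ^ pts[j]).count();
                }
            }
        }
    }
    return total;
}
```

On common 64-bit implementations sizeof(std::bitset<100>) is typically 16 bytes, which would put about 256 points in a tile. The best tile size is hardware dependent and worth measuring rather than assuming.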

You may be able to beat O(N*N) by using the triangle inequality: Hamming(a,b) + Hamming(b,c) >= Hamming(a,c). I'm just not sure how. It probably depends on the output you want. The naive output would be an N*N set of distances, and producing that is unavoidably O(N*N).
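The answer leaves the "how" open. One possibility, shown here only as a hedged sketch, is pivot-based filtering for a radius query: the triangle inequality gives the lower bound |Hamming(q,p) - Hamming(p,x)| <= Hamming(q,x), so a point x can be skipped whenever that bound already exceeds the radius. This assumes the desired output is "all points within some radius of a query" rather than every pairwise distance; `PivotIndex`, `build_index` and `range_query` are hypothetical names.

```cpp
#include <bitset>
#include <cstddef>
#include <vector>

constexpr std::size_t D = 100;
using Point = std::bitset<D>;

inline std::size_t hamming(const Point& a, const Point& b) {
    return (a ^ b).count();
}

// Hypothetical index: distances from one chosen pivot to every point.
struct PivotIndex {
    Point pivot;
    std::vector<std::size_t> dist_to_pivot;  // [i] = hamming(pivot, pts[i])
};

PivotIndex build_index(const std::vector<Point>& pts, const Point& pivot) {
    PivotIndex idx{pivot, {}};
    idx.dist_to_pivot.reserve(pts.size());
    for (const Point& p : pts) idx.dist_to_pivot.push_back(hamming(pivot, p));
    return idx;
}

// Returns the indices of all points within `radius` of `query`.
// If `evaluated` is non-null, it receives the number of full distance
// computations that could not be pruned.
std::vector<std::size_t> range_query(const std::vector<Point>& pts,
                                     const PivotIndex& idx,
                                     const Point& query,
                                     std::size_t radius,
                                     std::size_t* evaluated = nullptr) {
    const std::size_t dq = hamming(query, idx.pivot);
    std::vector<std::size_t> hits;
    std::size_t computed = 0;
    for (std::size_t i = 0; i < pts.size(); ++i) {
        const std::size_t dp = idx.dist_to_pivot[i];
        // Lower bound on hamming(query, pts[i]) from the triangle inequality.
        const std::size_t lower = dq > dp ? dq - dp : dp - dq;
        if (lower > radius) continue;  // cannot be within the radius
        ++computed;
        if (hamming(query, pts[i]) <= radius) hits.push_back(i);
    }
    if (evaluated) *evaluated = computed;
    return hits;
}
```

For D = 100 a full distance is only a couple of XOR and popcount operations, so whether this pruning beats a plain scan would need to be measured on the actual data.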
