问题描述
将二进制字符串更改为十六进制时,我只能根据我找到的答案将其设置为特定大小.但我想以比这更有效的方式将 MASSIVE 二进制字符串更改为完整的十六进制字符串,这是我遇到的唯一完全做到这一点的方法:
for(size_t i = 0; i <(binarySubVec.size() - 1); i++){字符串 binToHex, tmp = "0000";for (size_t j = 0; j < binaryVecStr[i].size(); j += 4){tmp = binaryVecStr[i].substr(j, 4);if (!tmp.compare("0000")) binToHex += "0";否则 if (!tmp.compare("0001")) binToHex += "1";否则 if (!tmp.compare("0010")) binToHex += "2";否则 if (!tmp.compare("0011")) binToHex += "3";否则 if (!tmp.compare("0100")) binToHex += "4";否则 if (!tmp.compare("0101")) binToHex += "5";否则 if (!tmp.compare("0110")) binToHex += "6";否则 if (!tmp.compare("0111")) binToHex += "7";否则 if (!tmp.compare("1000")) binToHex += "8";否则 if (!tmp.compare("1001")) binToHex += "9";否则 if (!tmp.compare("1010")) binToHex += "A";否则 if (!tmp.compare("1011")) binToHex += "B";否则 if (!tmp.compare("1100")) binToHex += "C";否则 if (!tmp.compare("1101")) binToHex += "D";否则 if (!tmp.compare("1110")) binToHex += "E";否则 if (!tmp.compare("1111")) binToHex += "F";否则继续;}hexOStr <
它彻底而绝对,但速度很慢.
有没有更简单的方法?
更新最后添加比较和基准
这是另一个基于完美哈希的方法.完美的哈希是使用 gperf
生成的(如下所述:10 小时前一时兴起.如果您真的想要高吞吐量,完美的哈希是一个不错的开始,但可以考虑手动滚动基于 SIMD 的解决方案
When changing a Binary string to Hex, I could only do it to a certain size based off of the answers I found. But I want to change MASSIVE Binary strings into their complete Hex counterpart in a more efficient way than this which is the only way I've come across that does it completely:
for(size_t i = 0; i < (binarySubVec.size() - 1); i++){
string binToHex, tmp = "0000";
for (size_t j = 0; j < binaryVecStr[i].size(); j += 4){
tmp = binaryVecStr[i].substr(j, 4);
if (!tmp.compare("0000")) binToHex += "0";
else if (!tmp.compare("0001")) binToHex += "1";
else if (!tmp.compare("0010")) binToHex += "2";
else if (!tmp.compare("0011")) binToHex += "3";
else if (!tmp.compare("0100")) binToHex += "4";
else if (!tmp.compare("0101")) binToHex += "5";
else if (!tmp.compare("0110")) binToHex += "6";
else if (!tmp.compare("0111")) binToHex += "7";
else if (!tmp.compare("1000")) binToHex += "8";
else if (!tmp.compare("1001")) binToHex += "9";
else if (!tmp.compare("1010")) binToHex += "A";
else if (!tmp.compare("1011")) binToHex += "B";
else if (!tmp.compare("1100")) binToHex += "C";
else if (!tmp.compare("1101")) binToHex += "D";
else if (!tmp.compare("1110")) binToHex += "E";
else if (!tmp.compare("1111")) binToHex += "F";
else continue;
}
hexOStr << binToHex;
hexOStr << " ";
}
Its thorough and absolute, but slow.
Is there a simpler way of doing this?
UPDATE Added comparison and benchmarks at the end
Here's another take, based on a perfect hash. The perfect hash was generated using gperf
(like described here: Is it possible to map string to int faster than using hashmap?).
I've further optimized by moving function local statics out of the way and marking hexdigit()
and hash()
as constexpr
. This removes unnecessary any initialization overhead and gives the compiler full room for optimization/
Idon'texpectthingstogetmuchfasterthanthis.
You could try reading e.g. 1024 nibbles at a time if possible, and give the compiler a chance to vectorize the operations using AVX/SSE instruction sets. (I have not inspected the generated code to see whether this would happen.)
The full sample code to translate std::cin
to std::cout
in streaming mode is:
#include <iostream>
int main()
{
char buffer[4096];
while (std::cin.read(buffer, sizeof(buffer)), std::cin.gcount())
{
size_t got = std::cin.gcount();
char* out = buffer;
for (auto it = buffer; it < buffer+got; it += 4)
*out++ = Perfect_Hash::hexchar(it);
std::cout.write(buffer, got/4);
}
}
Here's the Perfect_Hash
class, slightly redacted and extended with the hexchar
lookup. Note that it does validate input in DEBUG
builds using the assert
:
Live On Coliru
#include <array>
#include <algorithm>
#include <cassert>
class Perfect_Hash {
/* C++ code produced by gperf version 3.0.4 */
/* Command-line: gperf -L C++ -7 -C -E -m 100 table */
/* Computed positions: -k'1-4' */
/* maximum key range = 16, duplicates = 0 */
private:
static constexpr unsigned char asso_values[] = {
27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 15, 7, 3, 1, 0, 27,
27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27};
template <typename It>
static constexpr unsigned int hash(It str)
{
return
asso_values[(unsigned char)str[3] + 2] + asso_values[(unsigned char)str[2] + 1] +
asso_values[(unsigned char)str[1] + 3] + asso_values[(unsigned char)str[0]];
}
static constexpr char hex_lut[] = "???????????fbead9c873625140";
public:
#ifdef DEBUG
template <typename It>
static char hexchar(It binary_nibble)
{
assert(Perfect_Hash::validate(binary_nibble)); // for DEBUG only
return hex_lut[hash(binary_nibble)]; // no validation!
}
#else
template <typename It>
static constexpr char hexchar(It binary_nibble)
{
return hex_lut[hash(binary_nibble)]; // no validation!
}
#endif
template <typename It>
static bool validate(It str)
{
static constexpr std::array<char, 4> vocab[] = {
{{'?', '?', '?', '?'}}, {{'?', '?', '?', '?'}}, {{'?', '?', '?', '?'}},
{{'?', '?', '?', '?'}}, {{'?', '?', '?', '?'}}, {{'?', '?', '?', '?'}},
{{'?', '?', '?', '?'}}, {{'?', '?', '?', '?'}}, {{'?', '?', '?', '?'}},
{{'?', '?', '?', '?'}}, {{'?', '?', '?', '?'}},
{{'1', '1', '1', '1'}}, {{'1', '0', '1', '1'}},
{{'1', '1', '1', '0'}}, {{'1', '0', '1', '0'}},
{{'1', '1', '0', '1'}}, {{'1', '0', '0', '1'}},
{{'1', '1', '0', '0'}}, {{'1', '0', '0', '0'}},
{{'0', '1', '1', '1'}}, {{'0', '0', '1', '1'}},
{{'0', '1', '1', '0'}}, {{'0', '0', '1', '0'}},
{{'0', '1', '0', '1'}}, {{'0', '0', '0', '1'}},
{{'0', '1', '0', '0'}}, {{'0', '0', '0', '0'}},
};
int key = hash(str);
if (key <= 26 && key >= 0)
return std::equal(str, str+4, vocab[key].begin());
else
return false;
}
};
constexpr unsigned char Perfect_Hash::asso_values[];
constexpr char Perfect_Hash::hex_lut[];
#include <iostream>
int main()
{
char buffer[4096];
while (std::cin.read(buffer, sizeof(buffer)), std::cin.gcount())
{
size_t got = std::cin.gcount();
char* out = buffer;
for (auto it = buffer; it < buffer+got; it += 4)
*out++ = Perfect_Hash::hexchar(it);
std::cout.write(buffer, got/4);
}
}
Demo output for e.g. od -A none -t o /dev/urandom | tr -cd '01' | dd bs=1 count=4096 | ./test
03bef5fb79c7da917e3ebffdd8c41488d2b841dac86572cf7672d22f1f727627a2c4a48b15ef27eb0854dd99756b24c678e3b50022d695cc5f5c8aefaced2a39241bfd5deedcfa0a89060598c6b056d934719eba9ccf29e430d2def5751640ff17860dcb287df8a94089ade0283ee3d76b9fefcce3f3006b8c71399119423e780cef81e9752657e97c7629a9644be1e7c96b5d0324ab16d20902b55bb142c0451e675973489ae4891ec170663823f9c1c9b2a11fcb1c39452aff76120b21421069af337d14e89e48ee802b1cecd8d0886a9a0e90dea5437198d8d0d7ef59c46f9a069a83835286a9a8292d2d7adb4e7fb0ef42ad4734467063d181745aaa6694215af7430f95e854b7cad813efbbae0d2eb099523f215cff6d9c45e3edcaf63f78a485af8f2bfc2e27d46d61561b155d619450623b7aa8ca085c6eedfcc19209066033180d8ce1715e8ec9086a7c28df6e4202ee29705802f0c2872fbf06323366cf64ecfc5ea6f15ba6467730a8856a1c9ebf8cc188e889e783c50b85824803ed7d7505152b891cb2ac2d6f4d1329e100a2e3b2bdd50809b48f0024af1b5092b35779c863cd9c6b0b8e278f5bec966dd0e5c4756064cca010130acf24071d02de39ef8ba8bd1b6e9681066be3804d36ca83e7032274e4c8e8cacf520e8078f8fa80eb8e70af40367f53e53a7d7f7afe8704c46f58339d660b8151c91bddf82b4096
BENCHMARKS
I came up with three different approaches:
- naive.cpp (no hacks, no libraries); Live disassembly on Godbolt
- spirit.cpp (Trie);
Livedisassembly on pastebin - and this answer: perfect.cpp hash based; Live disassembly on Godbolt
In order to do some comparisons, I've
- compiled them all with the same compiler (GCC 4.9) and flags (
-O3 -march=native -g0 -DNDEBUG
) - optimized input/output so it doesn't read by 4 chars/write single characters
- created a large input file (1 Gigabyte)
Here are the results:
- Surprisingly, the
naive
approach from the first answer does rather well - Spirit does really badly here; it nets 3.4MB/s so that the whole file would take at 294 seconds (!!!). We've left it off the charts
- The average throughputs are ~720MB/s for naive.cpp and ~1.14GB/s forperfect.cpp
- This makes the perfect hash approach roughly 50% faster than the naive approach.
*Summary I'd say the naive approach was plenty good as I posted it on whim 10 hours ago. If you really want high throughput, the perfect hash is a nice start, but consider hand-rolling a SIMD based solution
这篇关于二进制字符串到十六进制 c++的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!