Emulate "double" using 2 "float"s

Question

I am writing a program for embedded hardware that only supports 32-bit single-precision floating-point arithmetic. The algorithm I am implementing, however, requires 64-bit double-precision addition and comparison. I am trying to emulate the double datatype using a tuple of two floats, so a double d will be emulated as a struct containing the tuple (float d.hi, float d.low).

The comparison should be straightforward using a lexicographic ordering. The addition, however, is a bit tricky because I am not sure which base I should use. Should it be FLT_MAX? And how can I detect a carry?

How can this be done?

Edit (clarity): I need the extra significant digits rather than the extra range.

Recommended answer

Double-float is a technique that uses pairs of single-precision numbers to achieve almost twice the precision of single-precision arithmetic, at the cost of a slight reduction of the single-precision exponent range (due to intermediate underflow and overflow at the far ends of the range). The basic algorithms were developed by T.J. Dekker and William Kahan in the 1970s. Below I list two fairly recent papers that show how these techniques can be adapted to GPUs. Much of the material covered in these papers is applicable independent of platform, so it should be useful for the task at hand.

https://hal.archives-ouvertes.fr/hal-00021443 Guillaume Da Graça, David Defour, "Implementation of float-float operators on graphics hardware", 7th Conference on Real Numbers and Computers, RNC7.

http://andrewthall.org/papers/df64_qf128.pdf Andrew Thall, "Extended-Precision Floating-Point Numbers for GPU Computation".
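
To make the idea concrete, here is a minimal sketch of double-float addition and comparison in C. It is not copied from either paper; it is built from the standard error-free transformations (Knuth's TwoSum and Dekker's Fast2Sum). The type name df64 and all function names are hypothetical, and the field names follow the (hi, low) tuple from the question. Note that there is no "base" and no carry to detect: the low word simply holds the rounding error that the high word could not represent. The sketch assumes IEEE-754 round-to-nearest float arithmetic, a compiler that does not reassociate floating-point expressions (no -ffast-math or similar), and operands that are already normalized, meaning hi equals the float nearest to hi + low.

```c
#include <assert.h>

/* Hypothetical type: the represented value is hi + low, where low is
 * the part of the value that did not fit into hi. */
typedef struct { float hi; float low; } df64;

/* TwoSum (Knuth): returns s = fl(a + b) in hi and the exact rounding
 * error in low, so hi + low == a + b barring overflow. The volatile
 * temporaries are a precaution against excess intermediate precision. */
static df64 two_sum(float a, float b)
{
    volatile float s  = a + b;
    volatile float bv = s - a;   /* portion of b that landed in s */
    volatile float av = s - bv;  /* portion of a that landed in s */
    volatile float be = b - bv;  /* what was lost from b */
    volatile float ae = a - av;  /* what was lost from a */
    df64 r = { s, ae + be };
    return r;
}

/* Fast2Sum (Dekker): same result as two_sum, intended for |a| >= |b|. */
static df64 quick_two_sum(float a, float b)
{
    volatile float s  = a + b;
    volatile float bv = s - a;
    df64 r = { s, b - bv };
    return r;
}

/* Double-float addition: add the high and low words separately with
 * error tracking, then fold the pieces back into a normalized pair. */
static df64 df64_add(df64 a, df64 b)
{
    df64 s = two_sum(a.hi, b.hi);
    df64 t = two_sum(a.low, b.low);
    s.low += t.hi;
    s = quick_two_sum(s.hi, s.low);
    s.low += t.low;
    s = quick_two_sum(s.hi, s.low);
    return s;
}

/* Lexicographic comparison: returns -1, 0 or 1. Only meaningful for
 * normalized, non-NaN operands. */
static int df64_cmp(df64 a, df64 b)
{
    assert(a.hi == a.hi && b.hi == b.hi);  /* NaN has no ordering */
    if (a.hi < b.hi) return -1;
    if (a.hi > b.hi) return 1;
    if (a.low < b.low) return -1;
    if (a.low > b.low) return 1;
    return 0;
}
```

For example, adding {1.0f, 0.0f} and {0x1p-30f, 0.0f} with df64_add yields hi == 1.0f and low == 0x1p-30f, whereas a plain float addition rounds the sum to 1.0f and loses the small term entirely. df64_cmp then reports the sum as greater than {1.0f, 0.0f}, which is the extra significance the question asks for.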
