CUDA and Classes


Question

I've searched all over for some insight on how exactly to use classes with CUDA, and while there is a general consensus that it can be done and apparently is being done by people, I've had a hard time finding out how to actually do it.

I have a class which implements a basic bitset with operator overloading and the like. I need to be able to instantiate objects of this class on both the host and the device, copy between the two, etc. Do I define this class in a .cu file? If so, how do I use it in my host-side C++ code? The functions of the class do not need to access special CUDA variables like threadIdx; the class just needs to be usable on both the host and the device.

Thanks for any help, and if I'm approaching this in completely the wrong way, I'd love to hear alternatives.

Solution

Define the class in a header that you #include, just like in C++.

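Any method that must be called from device code should be declared with both the __device__ and __host__ execution space specifiers, so that it remains callable from the host as well. This includes user-defined constructors and destructors if objects are created or destroyed in device code, for example with new/delete on the device (note that device-side new/delete require CUDA 4.0 and a GPU of compute capability 2.0 or higher).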
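To illustrate the constructor/destructor point, here is a small sketch in which an object of a hypothetical `Counter` class is created with `new` and destroyed with `delete` inside a function that can run on either side. It uses the same `__CUDACC__`-guarded macro that the answer goes on to recommend, so it also compiles with a plain C++ compiler; the kernel is only seen by NVCC. `Counter`, `countOnHeap` and `countKernel` are illustrative names, not part of any library.

```cpp
#include <cassert>

// Expands to __host__ __device__ only when NVCC compiles this file as CUDA.
#ifdef __CUDACC__
#define CUDA_CALLABLE_MEMBER __host__ __device__
#else
#define CUDA_CALLABLE_MEMBER
#endif

// Hypothetical class: constructor and destructor are callable on both sides,
// so objects may be created with new and destroyed with delete in device code.
class Counter {
public:
    CUDA_CALLABLE_MEMBER explicit Counter(int start) : value_(start) {}
    CUDA_CALLABLE_MEMBER ~Counter() {}
    CUDA_CALLABLE_MEMBER void add(int delta) { value_ += delta; }
    CUDA_CALLABLE_MEMBER int value() const { return value_; }

private:
    int value_;
};

// The macro works on free functions too. Heap-allocates a Counter,
// increments it `steps` times, then deletes it. Device code cannot throw,
// so a failed device-side new yields nullptr; -1 signals that case.
CUDA_CALLABLE_MEMBER inline int countOnHeap(int start, int steps) {
    assert(steps >= 0);
    Counter* c = new Counter(start);
    if (c == nullptr) return -1;
    for (int i = 0; i < steps; ++i) c->add(1);
    const int result = c->value();
    delete c;
    return result;
}

#ifdef __CUDACC__
// Only compiled by NVCC: each thread runs the same host/device function.
__global__ void countKernel(int* out, int steps) {
    out[threadIdx.x] = countOnHeap(static_cast<int>(threadIdx.x), steps);
}
#endif
```

Without the annotation on the constructor and destructor, NVCC would reject the `new`/`delete` calls inside device code, because they would invoke host-only functions.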

You will probably want to define a macro like this:

#ifdef __CUDACC__
#define CUDA_CALLABLE_MEMBER __host__ __device__
#else
#define CUDA_CALLABLE_MEMBER
#endif 

Then use this macro on your member functions:

class Foo {
public:
    CUDA_CALLABLE_MEMBER Foo() {}
    CUDA_CALLABLE_MEMBER ~Foo() {}
    CUDA_CALLABLE_MEMBER void aMethod() {}
};
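Applied to the bitset from the question, the pattern might look like the following minimal sketch. `Bitset64` and its members are hypothetical (the asker's real class is not shown); the sketch assumes a fixed 64-bit size so that all state fits in one `std::uint64_t`, which keeps the class trivially copyable. The overloaded operators are ordinary member functions, so they get the same annotation as everything else.

```cpp
#include <cassert>
#include <cstdint>

// Expands to __host__ __device__ only when NVCC compiles this file as CUDA.
#ifdef __CUDACC__
#define CUDA_CALLABLE_MEMBER __host__ __device__
#else
#define CUDA_CALLABLE_MEMBER
#endif

// Hypothetical fixed-size bitset usable from host and device code.
// All state is a single std::uint64_t, so the class is trivially copyable.
class Bitset64 {
public:
    CUDA_CALLABLE_MEMBER Bitset64() : bits_(0) {}
    CUDA_CALLABLE_MEMBER explicit Bitset64(std::uint64_t bits) : bits_(bits) {}

    // Sets bit `pos`; valid positions are 0..63.
    CUDA_CALLABLE_MEMBER void set(unsigned pos) {
        assert(pos < 64);
        bits_ |= std::uint64_t(1) << pos;
    }

    // Returns whether bit `pos` is set; valid positions are 0..63.
    CUDA_CALLABLE_MEMBER bool test(unsigned pos) const {
        assert(pos < 64);
        return ((bits_ >> pos) & 1u) != 0;
    }

    CUDA_CALLABLE_MEMBER std::uint64_t raw() const { return bits_; }

    // Operator overloads carry the same host/device annotation.
    CUDA_CALLABLE_MEMBER Bitset64 operator|(const Bitset64& rhs) const {
        return Bitset64(bits_ | rhs.bits_);
    }
    CUDA_CALLABLE_MEMBER Bitset64 operator&(const Bitset64& rhs) const {
        return Bitset64(bits_ & rhs.bits_);
    }
    CUDA_CALLABLE_MEMBER bool operator==(const Bitset64& rhs) const {
        return bits_ == rhs.bits_;
    }

private:
    std::uint64_t bits_;
};
```

Following the answer's advice, this would live in a header (say, a hypothetical `bitset64.h`) that is included both by `.cu` files compiled with NVCC and by host-side `.cpp` files compiled with the regular C++ compiler.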

The macro is needed because only the CUDA compiler understands __device__ and __host__; your host C++ compiler would raise an error on them.

Note that __CUDACC__ is defined by NVCC when it is compiling CUDA source. That is the case either when compiling a .cu file with NVCC or when compiling any file with the command-line option -x cu.
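For the question's requirement of copying objects between host and device, here is a sketch of a single file that relies on `__CUDACC__` twice: once for the member annotations and once to select a GPU code path. The names `Bits`, `orKernel` and `orAll` are hypothetical. It assumes the class is trivially copyable (a plain struct holding a `std::uint64_t`), which is what makes a byte-for-byte `cudaMemcpy` a valid way to copy objects; a class that owns pointers would need a deep copy instead. Compiled by NVCC, `orAll` copies both inputs to the GPU, runs the kernel and copies the result back; compiled by an ordinary C++ compiler, the CUDA-only parts are never seen and it falls back to a host loop. CUDA error checking is omitted for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

#ifdef __CUDACC__
#define CUDA_CALLABLE_MEMBER __host__ __device__
#else
#define CUDA_CALLABLE_MEMBER
#endif

// Hypothetical cut-down bitset; trivially copyable, so memcpy/cudaMemcpy
// between host and device memory copies it correctly.
struct Bits {
    std::uint64_t word;
    CUDA_CALLABLE_MEMBER Bits operator|(const Bits& o) const { return Bits{word | o.word}; }
};

#ifdef __CUDACC__
// Device path: each thread ORs one pair of bitsets.
__global__ void orKernel(const Bits* a, const Bits* b, Bits* out, std::size_t n) {
    std::size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a[i] | b[i];
}
#endif

// Element-wise a[i] | b[i]. Runs on the GPU under NVCC, on the host otherwise.
inline std::vector<Bits> orAll(const std::vector<Bits>& a,
                               const std::vector<Bits>& b) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    std::vector<Bits> out(n);
    if (n == 0) return out;  // avoid a zero-sized allocation and launch
#ifdef __CUDACC__
    const std::size_t bytes = n * sizeof(Bits);
    Bits *dA = nullptr, *dB = nullptr, *dOut = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dOut, bytes);
    // Host -> device: plain byte copies of the objects.
    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);
    const unsigned threads = 256;
    const unsigned blocks = static_cast<unsigned>((n + threads - 1) / threads);
    orKernel<<<blocks, threads>>>(dA, dB, dOut, n);
    // Device -> host; this cudaMemcpy waits for the kernel to finish.
    cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dOut);
#else
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] | b[i];
#endif
    return out;
}
```

Because the same file is valid both as CUDA and as plain C++, the class and `orAll` can be shared between `.cu` and `.cpp` translation units without duplicating code.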
