Problem description
I've searched all over for some insight on how exactly to use classes with CUDA, and while there is a general consensus that it can be done and apparently is being done by people, I've had a hard time finding out how to actually do it.
I have a class which implements a basic bitset with operator overloading and the like. I need to be able to instantiate objects of this class on both the host and the device, copy between the two, etc. Do I define this class in a .cu? If so, how do I use it in my host-side C++ code? The functions of the class do not need to access special CUDA variables like threadIdx; the class just needs to be usable on both the host and the device side.
Thanks for any help, and if I'm approaching this in completely the wrong way, I'd love to hear alternatives.
Define the class in a header that you #include, just like in C++.
Any method that must be called from device code should be declared with both the __device__ and __host__ qualifiers, including the constructor and destructor if you plan to use new/delete on the device (note that device-side new/delete requires CUDA 4.0 and a GPU with compute capability 2.0 or higher).
You probably want to define a macro like this:
#ifdef __CUDACC__
#define CUDA_CALLABLE_MEMBER __host__ __device__
#else
#define CUDA_CALLABLE_MEMBER
#endif
Then use this macro on your member functions:
class Foo {
public:
    CUDA_CALLABLE_MEMBER Foo() {}
    CUDA_CALLABLE_MEMBER ~Foo() {}
    CUDA_CALLABLE_MEMBER void aMethod() {}
};
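To illustrate the earlier note about new/delete on the device, here is a minimal sketch. The class Counter and the functions count_on_host and count_on_device are hypothetical names made up for this example. The kernel is wrapped in #ifdef __CUDACC__ so that the same file still builds with a plain host compiler; it is only an outline of how device-side allocation might look, not a tuned implementation.

```cpp
#include <cassert>

#ifdef __CUDACC__
#define CUDA_CALLABLE_MEMBER __host__ __device__
#else
#define CUDA_CALLABLE_MEMBER
#endif

// Hypothetical class: counts how many times touch() was called.
class Counter {
public:
    CUDA_CALLABLE_MEMBER Counter() : n_(0) {}
    CUDA_CALLABLE_MEMBER ~Counter() {}
    CUDA_CALLABLE_MEMBER void touch() { ++n_; }
    CUDA_CALLABLE_MEMBER int count() const { return n_; }
private:
    int n_;
};

// Host path: ordinary new/delete, works with any C++ compiler.
inline int count_on_host(int times) {
    assert(times >= 0);
    Counter* c = new Counter();
    for (int i = 0; i < times; ++i) c->touch();
    int result = c->count();
    delete c;
    return result;
}

#ifdef __CUDACC__
// Device path (only seen by nvcc): each thread allocates its own Counter
// on the device heap. Assumes 'out' has one slot per launched thread.
__global__ void count_on_device(int times, int* out) {
    Counter* c = new Counter();
    if (c == nullptr) return;  // defensive check in case the device heap is exhausted
    for (int i = 0; i < times; ++i) c->touch();
    out[blockIdx.x * blockDim.x + threadIdx.x] = c->count();
    delete c;
}
#endif
```

Because the constructor and destructor carry the macro, the same new Counter() expression is valid in both count_on_host and the kernel.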
The reason for this is that only the CUDA compiler knows __device__ and __host__; your host C++ compiler will raise an error if it sees them.
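Applied to the bitset from the question, the macro approach might look like the sketch below. Bitset64, its single-word layout, and make_mask are assumptions made for illustration; the real class will differ. The point is only that one header can be included from both host C++ files and .cu files.

```cpp
// Header-style sketch: compiles with a plain host compiler and with nvcc.
#include <cassert>
#include <cstdint>

#ifdef __CUDACC__
#define CUDA_CALLABLE_MEMBER __host__ __device__
#else
#define CUDA_CALLABLE_MEMBER
#endif

// Hypothetical 64-bit bitset; name and layout are illustrative only.
class Bitset64 {
public:
    CUDA_CALLABLE_MEMBER Bitset64() : bits_(0) {}
    CUDA_CALLABLE_MEMBER explicit Bitset64(std::uint64_t bits) : bits_(bits) {}

    // Set bit i (caller must keep i below 64).
    CUDA_CALLABLE_MEMBER void set(unsigned i) { bits_ |= (std::uint64_t(1) << i); }
    // Return true if bit i is set.
    CUDA_CALLABLE_MEMBER bool test(unsigned i) const { return ((bits_ >> i) & 1u) != 0; }

    CUDA_CALLABLE_MEMBER Bitset64 operator&(const Bitset64& o) const { return Bitset64(bits_ & o.bits_); }
    CUDA_CALLABLE_MEMBER Bitset64 operator|(const Bitset64& o) const { return Bitset64(bits_ | o.bits_); }
    CUDA_CALLABLE_MEMBER bool operator==(const Bitset64& o) const { return bits_ == o.bits_; }

    CUDA_CALLABLE_MEMBER std::uint64_t raw() const { return bits_; }

private:
    std::uint64_t bits_;
};

// Host-only helper: builds a mask with bits a and b set.
inline Bitset64 make_mask(unsigned a, unsigned b) {
    assert(a < 64 && b < 64);
    Bitset64 m;
    m.set(a);
    m.set(b);
    return m;
}
```

A class like this holds only a plain integer, so it is trivially copyable. That is what makes it reasonable to pass it to a kernel by value or copy it between host and device as raw bytes.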
Edit:
Note that __CUDACC__ is defined by NVCC when it is compiling CUDA files. This can be either when compiling a .cu file with NVCC or when compiling any file with the command line option -x cu.
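As a rough sketch of how __CUDACC__ changes what each compiler sees, the file below builds unchanged with a host compiler and with NVCC. The names popcount32, compiled_as_cuda, self_check, and popcount_kernel are hypothetical. The kernel is hidden from the host compiler by the same guard that controls the macro.

```cpp
#include <cassert>

#ifdef __CUDACC__
#define CUDA_CALLABLE_MEMBER __host__ __device__
#else
#define CUDA_CALLABLE_MEMBER
#endif

// Hypothetical helper callable from host code and, under nvcc, device code.
// Counts set bits by repeatedly clearing the lowest one.
CUDA_CALLABLE_MEMBER inline int popcount32(unsigned v) {
    int n = 0;
    while (v) {
        v &= v - 1;
        ++n;
    }
    return n;
}

// Reports which branch of the #ifdef this translation unit took.
inline bool compiled_as_cuda() {
#ifdef __CUDACC__
    return true;
#else
    return false;
#endif
}

// Host-side sanity check of the shared helper.
inline void self_check() {
    assert(popcount32(0u) == 0);
    assert(popcount32(0xF0u) == 4);
}

#ifdef __CUDACC__
// Only nvcc sees this kernel; a host compiler skips it entirely.
__global__ void popcount_kernel(const unsigned* in, int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = popcount32(in[i]);
}
#endif
```

If this were saved as a .cpp file, a host compiler would take the #else branch, while nvcc with -x cu would treat the same file as CUDA source and take the __CUDACC__ branch.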