Question
This is a follow-up to my previous question: Does .NET interop copy array data back and forth, or does it pin the array?
My method is a COM interface method (rather than a DllImport method). The C# signature looks like this:
void Next(ref int pcch,
[In, Out, MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 0)]
char [] pchText);
MSDN says:
When a managed Char type, which has Unicode formatting by default, is passed to unmanaged code, the interop marshaler converts the character set to ANSI. You can apply the DllImportAttribute attribute to platform invoke declarations and the StructLayoutAttribute attribute to a COM interop declaration to control which character set a marshaled Char type uses.
Also, @HansPassant in his answer here says:
A char[] can't be marshaled as LPWStr, it has to be LPArray. Now the CharSet attribute plays a role, since you did not specify it, the char[] will be marshaled as an 8-bit char[], not a 16-bit wchar_t[]. The marshaled array element is not the same size (it is not "blittable") so the marshaller must copy the array.
Pretty undesirable, particularly given that your C++ code expects wchar_t. A very easy way to tell in this specific case is not getting anything back in the array. If the array is marshaled by copying then you have to tell the marshaller explicitly that the array needs to be copied back after the call. You'd have to apply the [In, Out] attribute on the argument. You'll get Chinese.
I couldn't find an analog of CharSet (normally used with DllImportAttribute and StructLayoutAttribute) that could be applied to a COM interface method.
Nevertheless, I don't get "Chinese" on the output. Everything seems to work fine, I do get correct Unicode characters back from COM.
Does it mean Char is always interpreted as WCHAR for COM method interop?
I couldn't find any documentation confirming or denying this.
Recommended answer
I think this is a good question, and the char (System.Char) interop behavior does deserve some attention.
In managed code, sizeof(char) is always equal to 2 (two bytes), because in .NET characters are always Unicode (UTF-16 code units).
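A small sketch like the one below makes this concrete. The class and method names are made up for illustration, and the byte order shown in the usage note assumes a little-endian machine (x86, x64 and ARM64 Windows all are). It shows that a managed char is one 16-bit UTF-16 code unit, so '€' (U+20AC) takes two bytes in memory.

```csharp
using System;

static class CharSizeDemo
{
    // sizeof(char) is a compile-time constant in C#, usable without unsafe code: always 2
    public static int ManagedCharSize() => sizeof(char);

    // Raw in-memory bytes of a single char, i.e. one UTF-16 code unit.
    // BitConverter uses the machine's byte order.
    public static byte[] RawBytes(char c) => BitConverter.GetBytes(c);
}
```

On a little-endian machine, CharSizeDemo.RawBytes('\u20AC') returns { 0xAC, 0x20 }.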
However, the marshaling rules differ depending on whether char is used for P/Invoke (calling an exported DLL API) or for COM (calling a COM interface method).
For P/Invoke, CharSet can be set explicitly on any [DllImport] attribute, or implicitly via [module: DefaultCharSet(CharSet.Auto|Ansi|Unicode)], which changes the default setting for all [DllImport] declarations in the module (for a typical single-module assembly, that is effectively the whole assembly).
The default value is CharSet.Ansi, which means there will be a Unicode-to-ANSI conversion. I usually change the default to Unicode with [module: DefaultCharSet(CharSet.Unicode)], and then selectively use [DllImport(CharSet = CharSet.Ansi)] in those rare cases where I need to call an ANSI API.
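A minimal sketch of that setup could look like the following. It borrows two real kernel32 exports, lstrlenW and lstrlenA, purely as stand-ins for "a Unicode API" and "an ANSI API"; the NativeMethods class name is arbitrary, and the calls themselves only work on Windows.

```csharp
using System.Runtime.InteropServices;

// Module-wide default for every [DllImport] in this module that doesn't set CharSet.
// Module-level attributes must come after the using directives and before any types.
[module: DefaultCharSet(CharSet.Unicode)]

static class NativeMethods
{
    // No CharSet here: inherits CharSet.Unicode from the module-level default,
    // so the string reaches native code as a 16-bit wchar_t buffer.
    [DllImport("kernel32.dll", ExactSpelling = true)]
    public static extern int lstrlenW(string s);

    // Explicit opt-out for the rare ANSI API: the string is converted
    // to the system ANSI code page before the call.
    [DllImport("kernel32.dll", ExactSpelling = true, CharSet = CharSet.Ansi)]
    public static extern int lstrlenA(string s);
}
```

On Windows, both NativeMethods.lstrlenW("abc") and NativeMethods.lstrlenA("abc") should return 3; giving either declaration the other CharSet would hand the native function a buffer of the wrong character width.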
It is also possible to alter the marshaling of any specific char-typed parameter with MarshalAs(UnmanagedType.U1|U2), or of a char[] parameter with MarshalAs(UnmanagedType.LPArray, ArraySubType = UnmanagedType.U1|U2). E.g., you may have something like this:
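using System.Runtime.InteropServices;

static class NativeMethods
{
    [DllImport("Test.dll", ExactSpelling = true, CharSet = CharSet.Unicode)]
    static extern bool TestApi(
        int length,
        // Passed as-is: an array of 16-bit wchar_t values
        [In, Out, MarshalAs(UnmanagedType.LPArray)] char[] buff1,
        // Converted to/from 8-bit ANSI chars in the OS code page
        [In, Out, MarshalAs(UnmanagedType.LPArray, ArraySubType = UnmanagedType.U1)] char[] buff2);
}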
In this case, buff1 will be passed as an array of double-byte values (as is), but buff2 will be converted to and from an array of single-byte values. Note that for buff2 this is still a smart Unicode-to-OS-current-code-page (and back) conversion. E.g., the Unicode character '\u20AC' (€) will become 0x80 in the unmanaged code (provided the OS code page is Windows-1252).
This is how marshaling [MarshalAs(UnmanagedType.LPArray, ArraySubType = UnmanagedType.U1)] char[] buff differs from marshaling [MarshalAs(UnmanagedType.LPArray, ArraySubType = UnmanagedType.U1)] ushort[] buff. For ushort, 0x20AC would simply be truncated to 0xAC.
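The difference can be reproduced in plain managed code with the sketch below. It does not invoke the interop marshaler; it only mimics the two behaviors described above, assuming the Windows-1252 code page from the example (the real marshaler uses whatever the system ANSI code page is). The class and method names are hypothetical, and the CodePagesEncodingProvider registration assumes modern .NET (.NET 5 or later).

```csharp
using System;
using System.Text;

static class U1ConversionSketch
{
    // Legacy code pages such as 1252 are not available on .NET Core / .NET 5+
    // until this provider is registered (not needed on .NET Framework).
    static U1ConversionSketch() =>
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

    // Mimics char[] + ArraySubType = U1: a code-page-aware
    // Unicode-to-ANSI conversion, hard-coded here to Windows-1252.
    public static byte[] CharArrayAsU1(char[] chars) =>
        Encoding.GetEncoding(1252).GetBytes(chars);

    // Mimics ushort[] + ArraySubType = U1 as described above:
    // each 16-bit value is simply narrowed to its low byte.
    public static byte[] UShortArrayAsU1(ushort[] values)
    {
        var result = new byte[values.Length];
        for (int i = 0; i < values.Length; i++)
            result[i] = unchecked((byte)values[i]);
        return result;
    }
}
```

Here CharArrayAsU1(new[] { '\u20AC' }) yields { 0x80 }, while UShortArrayAsU1(new ushort[] { 0x20AC }) yields { 0xAC }.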
For calling a COM interface method, the story is quite different. There, char is always treated as a double-byte value representing a Unicode character. Perhaps the reason for this design decision can be inferred from Don Box's "Essential COM" (quoting the footnote from this page):
The OLECHAR type was chosen in favor of the common TCHAR data type used by the Win32 API to alleviate the need to support two versions of each interface (CHAR and WCHAR). By supporting only one character type, object developers are decoupled from the state of the UNICODE preprocessor symbol used by their clients.
Apparently, the same concept made its way to .NET. I'm pretty confident this is true even for legacy ANSI platforms (like Windows 95, where Marshal.SystemDefaultCharSize == 1).
Note that DefaultCharSet has no effect on char when it's part of a COM interface method signature, and there is no way to apply CharSet explicitly either. However, you still have full control over the marshaling of each individual parameter with MarshalAs, in exactly the same way as for P/Invoke above. E.g., if the unmanaged COM code expects a buffer of ANSI characters, your Next method might look like this:
void Next(ref int pcch,
[In, Out, MarshalAs(UnmanagedType.LPArray,
ArraySubType = UnmanagedType.U1, SizeParamIndex = 0)] char [] pchText);
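For context, a complete declaration around such a method might look like the sketch below. The interface name ITextSource, the method names NextWide/NextAnsi and the GUID are hypothetical placeholders. The two methods appear side by side only to contrast the default with the ANSI override; a real [ComImport] interface must list exactly the methods of the native vtable, in order, so you would keep just the variant that matches the native signature.

```csharp
using System.Runtime.InteropServices;

// Hypothetical interface: name and GUID are placeholders, not a real COM interface.
[ComImport]
[Guid("00000000-0000-0000-0000-000000000001")]
[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
interface ITextSource
{
    // COM interop default: each char is marshaled as a 16-bit WCHAR/OLECHAR,
    // regardless of any module-level DefaultCharSet.
    void NextWide(ref int pcch,
        [In, Out, MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 0)] char[] pchText);

    // ANSI override: each char is converted to/from an 8-bit CHAR.
    void NextAnsi(ref int pcch,
        [In, Out, MarshalAs(UnmanagedType.LPArray,
            ArraySubType = UnmanagedType.U1, SizeParamIndex = 0)] char[] pchText);
}
```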