Problem description
We have an X-Files problem with our .NET application. Or, rather, our hybrid Win32 and .NET application.
When it attempts to communicate with Oracle, it just dies. Vanishes. Goes to the big black void in the sky. No event log message, no exception, no nothing.
If we simply ask the application to talk to an MS SQL Server instead, which has the effect of replacing the usage of OracleConnection and related classes with SqlConnection and related classes, it works as expected.
Today we had a breakthrough.
For some reason, a customer had figured out that by placing all the application files in a directory on his desktop, it worked as expected with Oracle as well. Moving the directory down to the root of the drive, or into C:\Temp, or, well, around a bit, made the crash reappear.
Basically it was 100% reproducible that the application worked if run from a directory on the desktop, and failed if run from a directory in the root.
Today we figured out that the difference that counted was whether there was a space in the directory name or not.
So, these directories would work:
C:\Program Files\AppDir\Executable.exe
C:\Temp Lemp\AppDir\Executable.exe
C:\Documents and Settings\someuser\Desktop\AppDir\Executable.exe
And these would not:
C:\CompanyName\AppDir\Executable.exe
C:\Programfiler\AppDir\Executable.exe <-- Program Files in Norwegian
C:\Temp\AppDir\Executable.exe
I'm hoping someone reading this has seen similar behavior and has an "aha, you need to twiddle the frob on the Oracle glitz driver configuration" or similar.
Anyone?
Followup #1: OK, I've processed the procmon output now, both files starting from when I hit the button that attempts to open the window that triggers the cascade failure. The two logs mostly keep track with each other: there are some smallish differences near the top of both files, and then they keep track a long way down.
However, when one run fails, the other keeps going, and the next few lines of its log output are these:
ReadFile C:\oracle\product\10.2.0\db_1\BIN\orageneric10.dll SUCCESS Offset: 274 432, Length: 32 768, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O
ReadFile C:\oracle\product\10.2.0\db_1\BIN\orageneric10.dll SUCCESS Offset: 233 472, Length: 32 768, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O
After this, the working run continues to execute, while the failing run touches the mscorwks.dll file a few times before its threads close down and the app exits. Thus, the failed run does not touch the file above.
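For anyone wanting to repeat this kind of comparison without eyeballing two long logs, here is a minimal Python sketch of the idea: walk both traces in step and report the first event where they stop matching. The `first_divergence` helper and the shortened sample traces are made up for illustration; real procmon output would need the differing application directory normalized before comparing.

```python
from itertools import zip_longest


def first_divergence(working, failing):
    """Return (index, working_event, failing_event) for the first mismatch.

    Returns None when both traces are identical. A trace that simply ends
    early shows up as a mismatch against None.
    """
    for index, (w, f) in enumerate(zip_longest(working, failing)):
        if w != f:
            return index, w, f
    return None


# Hypothetical, heavily shortened traces as (operation, file name) pairs.
working = [
    ("CreateFile", "app.config"),
    ("ReadFile", "orageneric10.dll"),
    ("ReadFile", "orageneric10.dll"),
]
failing = [
    ("CreateFile", "app.config"),
    ("ReadFile", "mscorwks.dll"),
]

print(first_divergence(working, failing))
```

The first mismatch only tells you where the runs part ways, not why, which matches what the logs showed here.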
Followup #2: Figured I'd try to upgrade the Oracle client drivers, but 10.2.0.1 is apparently the highest version available for Windows 2003 Server and XP clients.
Followup #3: Well, we've ended up with a black-box solution. Basically we found that the problem is somehow related to XPO and Oracle. XPO has a system table it manages, called XPObjectType, with three columns: Oid, TypeName and AssemblyName. Due to how Oracle is configured in the databases we talk to, the column names were OID, TYPENAME and ASSEMBLYNAME. This would ordinarily not be a problem, except that XPO talks to the schema information directly and checks whether the table is there with the right column names. XPO doesn't handle case differences, so it sees an XPObjectType table with three unknown columns and none of those it expects.
Exactly what XPO does at that point I don't really know, but if I drop this table and recreate it with the right case, using double quotes around all the column names to preserve the case, the problem doesn't crop up.
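To make the case mismatch concrete, here is a small Python sketch. It is not XPO's code; `missing_columns` and `quoted_create_table` are hypothetical helpers that only illustrate a case-sensitive name check and the double-quoted DDL described above. The column types in the usage example are placeholders, since the real XPObjectType definition isn't given here.

```python
EXPECTED = ["Oid", "TypeName", "AssemblyName"]


def missing_columns(expected, actual, case_sensitive=True):
    """List the expected column names that are not found among the actual ones."""
    if case_sensitive:
        present = set(actual)
        return [name for name in expected if name not in present]
    present = {name.upper() for name in actual}
    return [name for name in expected if name.upper() not in present]


def quoted_create_table(table, columns):
    """Build a CREATE TABLE statement with double-quoted identifiers.

    Oracle folds unquoted identifiers to upper case; quoting preserves case.
    `columns` is a list of (name, sql_type) pairs supplied by the caller.
    """
    cols = ", ".join(f'"{name}" {sql_type}' for name, sql_type in columns)
    return f'CREATE TABLE "{table}" ({cols})'


# What the schema reported in our databases.
actual = ["OID", "TYPENAME", "ASSEMBLYNAME"]
print(missing_columns(EXPECTED, actual))                        # case-sensitive check
print(missing_columns(EXPECTED, actual, case_sensitive=False))  # tolerant check

# Placeholder types only, NOT the real XPObjectType schema.
print(quoted_create_table("XPObjectType", [
    ("Oid", "NUMBER"),
    ("TypeName", "VARCHAR2(254)"),
    ("AssemblyName", "VARCHAR2(254)"),
]))
```

With the case-sensitive check all three expected columns come back as missing, which is the situation described above; the tolerant check finds nothing missing.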
Exactly where the space in the folder name comes into this, I still have no idea, but this problem had two tiers:
- Stop the application from crashing in front of our customers, short-term solution
- Fix the bug, long-term solution
Right now tier 1 is solved, and tier 2 will be put back into the queue for now and prioritized. We're facing some bigger changes to our data tier anyway, so this might not be a problem we need to solve, at least if all our Oracle customers verify that the table fix actually gets rid of the problem.
I'll accept the answer by Dave Markle: although Process Monitor (the big brother of File Monitor) didn't actually pinpoint the problem, I was able to use it to determine that after my breakpoint in user code, where XPO had built up the query for this table, no I/O happened until the entries for the application closing down were logged. That led me to believe this table was the culprit, or at least influenced the problem somehow.
If I manage to get to the real cause of this, I'll update the post.
Recommended answer
Here's what I would do. First, TRIPLE-check that you're seeing the behavior you think you're seeing. I can see this happening the other way around by not using System.IO.Path to concatenate paths, but not like you're seeing it. Triple-check that the file permissions make sense.
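As a rough illustration of the path-handling mistakes meant here, the Python sketch below uses `ntpath` (Python's Windows path rules, a loose analogue of System.IO.Path) to contrast string concatenation with a proper join, and shows how naively splitting a command line on whitespace breaks a path that contains a space. The file name `driver.dll` and the `/verbose` switch are made up; reading "the other way around" as "fails when the path does contain a space" is my assumption.

```python
import ntpath  # Windows path semantics, importable on any platform

app_dir = r"C:\Program Files\AppDir"

# Plain concatenation silently drops the separator.
naive = app_dir + "driver.dll"

# A path-aware join inserts the separator where needed.
joined = ntpath.join(app_dir, "driver.dll")

# Splitting an unquoted command line on whitespace cuts the path in two.
command_line = joined + " /verbose"
naive_args = command_line.split()

# Keeping arguments as a list avoids re-parsing the path at all.
safe_args = [joined, "/verbose"]

print(naive)
print(joined)
print(naive_args)
print(safe_args)
```

Bugs of this kind normally bite paths that contain spaces, which is the opposite of the behavior in the question.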
Next, download Filemon from MS and watch what's happening on the filesystem as your program hits these trouble spots. You can filter out specific file activity (removing your anti-virus file activity, for example) to make everything look a bit cleaner while you do this. Look for file access errors using FileMon for both the success case and the error case for your program. That should point you to which file is being accessed and causing the problem. For example, if you see a FILE_NOT_FOUND error accessing a nonsense filename, you can be assured that you or the vendor are doing something wrong, possibly leading to your problem...
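If you save the captured log as CSV, the same filtering can be scripted. The Python sketch below assumes a header row with "Process Name", "Operation", "Path" and "Result" columns and treats anything other than SUCCESS as an error; the process names, paths and result strings in the sample are invented for illustration, so adjust them to whatever your tool actually writes.

```python
import csv
import io


def failed_operations(csv_text, ignore_processes=()):
    """Return (operation, path, result) for every row that did not succeed.

    Rows from processes listed in `ignore_processes` (an anti-virus scanner,
    for example) are skipped to keep the output readable.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (row["Operation"], row["Path"], row["Result"])
        for row in reader
        if row["Result"] != "SUCCESS"
        and row["Process Name"] not in ignore_processes
    ]


# Hypothetical sample log; column names are an assumption about the export.
sample = "\n".join([
    "Process Name,Operation,Path,Result",
    r"Executable.exe,CreateFile,C:\Temp\AppDir\app.config,SUCCESS",
    r"avscan.exe,CreateFile,C:\Temp\AppDir\Executable.exe,ACCESS DENIED",
    r"Executable.exe,CreateFile,C:\Temp\AppDir\nonsense.dll,NAME NOT FOUND",
])

for entry in failed_operations(sample, ignore_processes={"avscan.exe"}):
    print(entry)
```

Run it once on the log from the working directory and once on the log from the failing one, then compare the two lists.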