Problem description
I have a JSON document that contains, among other things, a data field storing a Base64-encoded string. This JSON is serialized and sent to a client.
On the client side, the Newtonsoft Json.NET deserializer is used to get the JSON back. However, if the data field becomes large (~400 MB), the deserializer throws an out-of-memory exception: "Array dimensions exceeded supported range." I can also see in Task Manager that memory consumption grows very fast.
Any ideas why this is? Is there a maximum size for json fields or something?
Code sample (simplified):
HttpResponseMessage responseTemp = null;
responseTemp = client.PostAsJsonAsync(client.BaseAddress, message).Result;
string jsonContent = responseTemp.Content.ReadAsStringAsync().Result;
result = JsonConvert.DeserializeObject<Result>(jsonContent);
Result class:
public class Result
{
    public string Message { get; set; }
    public byte[] Data { get; set; }
}
Update:
I think my problem is not the serializer, but simply trying to handle such a huge string in memory. As soon as I read the string into memory, the memory consumption of the application explodes, and every operation on that string does the same. At the moment, I think I have to find a way to work with streams and stop reading the whole thing into memory at once.
Recommended answer
There are two problems here:
1. You have a single Base64 data field inside your JSON response that is larger than ~400 MB.
2. You are loading the entire response into an intermediate string, jsonContent, which is even larger, since it embeds the single data field.
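To see why memory grows so quickly, here is a rough back-of-the-envelope sketch (not part of the original answer). It assumes the ~400 MB figure refers to the decoded bytes in Data. Base64 needs 4 characters for every 3 bytes, and a .NET string stores each character as 2 bytes (UTF-16), so every in-memory string copy of the Base64 text is roughly 8/3 (about 2.7) times the size of the raw data. The class name Base64Sizing is hypothetical.

```csharp
using System;

// Hypothetical helper for estimating memory use; not part of Json.NET.
static class Base64Sizing
{
    // Padded Base64 length of `byteCount` bytes: 4 * ceil(byteCount / 3).
    public static long EncodedChars(long byteCount) => 4 * ((byteCount + 2) / 3);

    // Approximate payload size of a .NET string holding `charCount` characters
    // (UTF-16, 2 bytes per char), ignoring the small object header.
    public static long Utf16Bytes(long charCount) => 2 * charCount;
}
```

For example, EncodedChars(400_000_000) is 533,333,336 characters, and Utf16Bytes of that is 1,066,666,672 bytes, so the jsonContent string alone is already over 1 GB before any further copies made during deserialization and the decoded byte[] itself.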
Firstly, I assume you are running as a 64-bit process. If not, switch.
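One way to confirm this assumption at runtime is to check Environment.Is64BitProcess (available since .NET 4.0). The sketch below uses a hypothetical helper name. It fails early with a clear message instead of letting a 32-bit process run into an OutOfMemoryException later. In Visual Studio, a 32-bit process typically comes from the project's platform target being x86, or from AnyCPU with "Prefer 32-bit" checked.

```csharp
using System;

// Hypothetical guard; call it early, e.g. at the start of Main.
static class ProcessBitness
{
    // True when the process has a 64-bit address space.
    public static bool Is64Bit => Environment.Is64BitProcess;

    public static void Require64Bit()
    {
        if (!Environment.Is64BitProcess)
        {
            throw new PlatformNotSupportedException(
                "This client must run as a 64-bit process to handle multi-hundred-MB responses. " +
                "Check the project's platform target and the 'Prefer 32-bit' option.");
        }
    }
}
```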
Unfortunately, the first problem can only be ameliorated, not fixed, because Json.NET's JsonTextReader cannot read a single string value in "chunks" the way XmlReader.ReadValueChunk() can. It always fully materializes each atomic string value. However, .NET 4.5 adds the following settings that may help (the second one requires .NET 4.5.1):
<gcAllowVeryLargeObjects enabled="true" /> (inside the <runtime> element of the application's .config file)
This setting allows for arrays with up to int.MaxValue entries, even if that would cause the underlying memory buffer to be larger than 2 GB. You will still be unable to read a single JSON token of more than 2^31 characters in length, however, since JsonTextReader buffers the full contents of each single token in a private char[] _chars array, and, in .NET, an array can only hold up to int.MaxValue items.
GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce
This setting allows the large object heap to be compacted and may reduce out-of-memory errors due to address space fragmentation.
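Unlike gcAllowVeryLargeObjects, the compaction setting is applied in code. A minimal sketch, assuming you trigger the compaction yourself right after a large response has been processed and its buffers are no longer referenced (the helper name is hypothetical). The flag takes effect on the next full blocking garbage collection, after which the runtime resets it to GCLargeObjectHeapCompactionMode.Default, so it must be set again before each compaction you want.

```csharp
using System;
using System.Runtime;

// Hypothetical helper: compact the large object heap once, e.g. after a huge
// response has been deserialized and the intermediate buffers were released.
static class LohCompaction
{
    public static void CompactLargeObjectHeapNow()
    {
        // Request LOH compaction during the next full blocking collection...
        GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
        // ...and force that collection now: GC.Collect() with no arguments
        // performs a blocking collection of all generations.
        GC.Collect();
        // The runtime resets the mode to Default after the compacting collection.
    }
}
```

Forcing a collection has a cost of its own, so this is best done once per huge response rather than routinely.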
The second problem, however, can be addressed by streaming deserialization, as shown in this answer to this question by Dilip0165; Efficient api calls with HttpClient and JSON.NET by John Thiriet; Performance Tips: Optimize Memory Usage by Newtonsoft; and Streaming with New .NET HttpClient and HttpCompletionOption.ResponseHeadersRead by Tugberk Ugurlu. Pulling together the information from these sources, your code should look something like:
// Namespaces used: System.IO, System.Net.Http, System.Text, Newtonsoft.Json
Result result;
var requestJson = JsonConvert.SerializeObject(message); // Here we assume the request JSON is not too large
using (var requestContent = new StringContent(requestJson, Encoding.UTF8, "application/json"))
using (var request = new HttpRequestMessage(HttpMethod.Post, client.BaseAddress) { Content = requestContent })
// ResponseHeadersRead: return once the headers arrive instead of buffering the whole body.
using (var response = client.SendAsync(request, HttpCompletionOption.ResponseHeadersRead).Result)
using (var responseStream = response.Content.ReadAsStreamAsync().Result)
{
    if (response.IsSuccessStatusCode)
    {
        // Deserialize directly from the network stream; no intermediate string is created.
        using (var textReader = new StreamReader(responseStream))
        using (var jsonReader = new JsonTextReader(textReader))
        {
            result = JsonSerializer.CreateDefault().Deserialize<Result>(jsonReader);
        }
    }
    else
    {
        // TODO: handle an unsuccessful response somehow, e.g. by throwing an exception
    }
}
Or, using async/await:
Result result;
var requestJson = JsonConvert.SerializeObject(message); // Here we assume the request JSON is not too large
using (var requestContent = new StringContent(requestJson, Encoding.UTF8, "application/json"))
using (var request = new HttpRequestMessage(HttpMethod.Post, client.BaseAddress) { Content = requestContent })
using (var response = await client.SendAsync(request, HttpCompletionOption.ResponseHeadersRead))
using (var responseStream = await response.Content.ReadAsStreamAsync())
{
    if (response.IsSuccessStatusCode)
    {
        using (var textReader = new StreamReader(responseStream))
        using (var jsonReader = new JsonTextReader(textReader))
        {
            result = JsonSerializer.CreateDefault().Deserialize<Result>(jsonReader);
        }
    }
    else
    {
        // TODO: handle an unsuccessful response somehow, e.g. by throwing an exception
    }
}
My code above isn't fully tested, and error and cancellation handling still need to be implemented. You may also need to set the timeout as shown here and here. Json.NET's JsonSerializer does not support async deserialization, which makes it a slightly awkward fit with the asynchronous programming model of HttpClient.
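One way to start on the timeout and error handling is sketched below, under stated assumptions. The client is created with a longer HttpClient.Timeout than the 100-second default (the value is up to you), and a caller-supplied CancellationToken is passed to the SendAsync overload that also takes an HttpCompletionOption. The class and method names are hypothetical, and throwing HttpRequestException for a non-success status is just one possible policy. Because JsonSerializer.Deserialize reads synchronously, the token should not be expected to interrupt the body while it is being deserialized.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helpers; names and the error policy are illustrative only.
static class LargeResponseHttp
{
    // HttpClient.Timeout defaults to 100 seconds, which may be too short for a very large response.
    public static HttpClient CreateClient(TimeSpan timeout) => new HttpClient { Timeout = timeout };

    // Returns as soon as the response headers arrive (the body stays unread on the network)
    // and lets the caller abandon the request through `cancellationToken`.
    public static async Task<HttpResponseMessage> SendForStreamingAsync(
        HttpClient client, HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var response = await client.SendAsync(
            request, HttpCompletionOption.ResponseHeadersRead, cancellationToken).ConfigureAwait(false);
        if (!response.IsSuccessStatusCode)
        {
            // Dispose the response so the connection is released, then surface the status code.
            var status = response.StatusCode;
            response.Dispose();
            throw new HttpRequestException($"Request failed with status code {(int)status} ({status}).");
        }
        return response;
    }
}
```

The returned response would then be read with ReadAsStreamAsync and a JsonTextReader exactly as in the code above.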
Finally, as an alternative to using Json.NET to read a huge Base64 chunk from a JSON file, you could use the reader returned by JsonReaderWriterFactory, which does support reading Base64 data in manageable chunks. For details, see this answer to Parse huge OData JSON by streaming certain sections of the json to avoid LOH for an explanation of how to stream through a huge JSON file using this reader, and this answer to Read stream from XmlReader, base64 decode it and write result to file for how to decode Base64 data in chunks using XmlReader.ReadElementContentAsBase64.
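A minimal sketch of that alternative, assuming the response has the shape of the Result class above (a top-level JSON object with a Base64 string property named Data) and that the decoded bytes should go to another stream, such as a FileStream, rather than stay in memory. JsonReaderWriterFactory.CreateJsonReader exposes the JSON as XML: the object becomes a root element and a property with a plain name like Data becomes a child element of the same name, so ReadElementContentAsBase64 should be able to decode its value one buffer at a time. The class and method names are hypothetical, and other properties such as Message would need to be read separately.

```csharp
using System.IO;
using System.Runtime.Serialization.Json;
using System.Xml;

// Hypothetical helper built on the XML-style reader returned by JsonReaderWriterFactory.
static class JsonBase64Streaming
{
    // Finds the top-level property `propertyName` in the JSON object read from `json`,
    // Base64-decodes its string value `bufferSize` bytes at a time into `output`,
    // and returns the number of decoded bytes written.
    public static long CopyBase64Property(Stream json, string propertyName, Stream output, int bufferSize = 81920)
    {
        // XmlDictionaryReaderQuotas.Max lifts the default limits (e.g. on string content length).
        using (XmlDictionaryReader reader = JsonReaderWriterFactory.CreateJsonReader(json, XmlDictionaryReaderQuotas.Max))
        {
            while (reader.Read())
            {
                // Depth 0 is the root element for the JSON object; its properties are at depth 1.
                if (reader.NodeType == XmlNodeType.Element && reader.Depth == 1 && reader.LocalName == propertyName)
                {
                    var buffer = new byte[bufferSize];
                    long total = 0;
                    int count;
                    // Each call decodes at most buffer.Length bytes; 0 means the value is exhausted.
                    while ((count = reader.ReadElementContentAsBase64(buffer, 0, buffer.Length)) > 0)
                    {
                        output.Write(buffer, 0, count);
                        total += count;
                    }
                    return total;
                }
            }
        }
        throw new InvalidDataException($"Property \"{propertyName}\" was not found at the top level of the JSON.");
    }
}
```

With HttpClient, the json argument would be the stream returned by ReadAsStreamAsync for a request sent with HttpCompletionOption.ResponseHeadersRead.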