August 22nd, 2012, 04:19 AM
Read Extremely large file efficiently in C#. Currently using StreamReader
I have a JSON file that is 50 GB or larger. Below is what I have written to read a very small chunk of the JSON. I now need to modify it to read the large file.
Would it suffice if I specify the buffer size while constructing the StreamReader in the current code?
// Requires: System.Collections.Generic, System.IO, System.Runtime.Serialization.Json, System.Text, System.Xml
internal static IEnumerable<T> ReadJson<T>(string filePath)
{
    DataContractJsonSerializer ser = new DataContractJsonSerializer(typeof(T));
    using (StreamReader sr = new StreamReader(filePath))
    {
        string line;
        // Read one line at a time (one JSON object per line) until the end of the file is reached.
        while ((line = sr.ReadLine()) != null)
        {
            byte[] jsonBytes = Encoding.UTF8.GetBytes(line);
            XmlDictionaryReader jsonReader = JsonReaderWriterFactory.CreateJsonReader(jsonBytes, XmlDictionaryReaderQuotas.Max);
            var myPerson = ser.ReadObject(jsonReader);
            yield return (T)myPerson;
        }
    }
}
Please correct me if I am wrong here. As I understand it, the buffer size specifies how much data is read from disk into memory at a time. So if the file is 100 MB and the buffer size is 5 MB, it reads 5 MB at a time into memory until the entire file has been read.
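To make the question concrete, here is a minimal sketch of what specifying the buffer size would look like. It uses the StreamReader constructor overload that takes a path, an encoding, a byte-order-mark detection flag, and a buffer size. The class name BufferedLineReader and the method ReadLines are made up for illustration, and UTF-8 is an assumption about the file's encoding.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

internal static class BufferedLineReader
{
    // Hypothetical helper (not part of my current code): yields lines lazily,
    // so only the current line and the reader's internal buffer are held in
    // memory at any one time, regardless of the file size.
    internal static IEnumerable<string> ReadLines(string filePath, int bufferSize)
    {
        // Assumption: the file is UTF-8. The 'true' flag lets the reader
        // detect the encoding from a byte order mark if one is present.
        using (var sr = new StreamReader(filePath, Encoding.UTF8, true, bufferSize))
        {
            string line;
            while ((line = sr.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }
}
```

With this shape, a caller would try different values, for example BufferedLineReader.ReadLines(path, 64 * 1024), and time each run on the real file rather than guessing.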
Assuming that understanding is right, what would be the ideal buffer size for such a large text file? Would int.MaxValue be a bad idea? int.MaxValue is 2147483647 (int is a 32-bit value even on a 64-bit PC). I presume the buffer size is in bytes, which works out to about 2 GB. Filling a buffer that large could itself take time. I was looking at something like 100 MB to 300 MB as the buffer size.
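As a quick check of the arithmetic above, the following sketch converts a byte count to gibibytes. It assumes the buffer size is counted in bytes, which is my presumption and not something I have confirmed. BufferSizeMath and ToGiB are illustrative names only.

```csharp
using System;

internal static class BufferSizeMath
{
    internal const long OneMiB = 1024L * 1024L;
    internal const long OneGiB = 1024L * OneMiB;

    // Converts a byte count to GiB (2^30 bytes) as a floating-point value.
    internal static double ToGiB(long bytes)
    {
        return (double)bytes / OneGiB;
    }
}
```

ToGiB(int.MaxValue) comes out just under 2, since int.MaxValue is 2^31 - 1, while the 100 MB to 300 MB range corresponds to 100 * OneMiB through 300 * OneMiB.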
Last edited by BioPhysEngr; August 24th, 2012 at 11:38 PM.
Reason: change quote tags to code tags