Read Extremely large file efficiently in C#. Currently using StreamReader


  1. #1
    Join Date
    May 2012
    Location
    Earth!
    Posts
    9

    Read Extremely large file efficiently in C#. Currently using StreamReader

    I have a JSON file that is 50 GB or larger. Below is what I have written to read a very small sample of the JSON. I now need to modify it to read the large file.

    Code:
    // Requires: System.Collections.Generic, System.IO, System.Runtime.Serialization.Json, System.Text, System.Xml
    internal static IEnumerable<T> ReadJson<T>(string filePath)
    {
        DataContractJsonSerializer ser = new DataContractJsonSerializer(typeof(T));
        using (StreamReader sr = new StreamReader(filePath))
        {
            string line;
            // Each line is expected to hold one complete JSON object.
            // Deserialize the lines one at a time until the end of the file is reached.
            while ((line = sr.ReadLine()) != null)
            {
                byte[] jsonBytes = Encoding.UTF8.GetBytes(line);
                T item;
                // Dispose the JSON reader before handing the object to the caller.
                using (XmlDictionaryReader jsonReader = JsonReaderWriterFactory.CreateJsonReader(jsonBytes, XmlDictionaryReaderQuotas.Max))
                {
                    item = (T)ser.ReadObject(jsonReader);
                }

                yield return item;
            }
        }
    }
    Would it suffice to specify the buffer size when constructing the StreamReader in the current code?
    Please correct me if I am wrong here: the buffer size specifies how much data is read from disk into memory at a time. So if the file is 100 MB and the buffer size is 5 MB, it reads 5 MB at a time into memory until the entire file has been read.
    Assuming my understanding in the previous point is right, what would be the ideal buffer size for such a large text file? Would int.MaxValue be a bad idea? int.MaxValue is 2147483647. I presume the buffer size is in bytes, which comes to about 2 GB. Filling a buffer that large could itself take time. I was looking at something like 100 MB to 300 MB as the buffer size.
    Last edited by BioPhysEngr; August 24th, 2012 at 10:38 PM. Reason: change quote tags to code tags
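
For reference, the buffer-size idea in the question can be sketched as follows. This is only a minimal sketch under stated assumptions: the class name LargeFileReader, the method ReadLines, and the 1 MB default are hypothetical and do not come from the thread. The buffer size sets how much is requested per underlying read, not how much of the file is held in memory overall, so a modest value is usually a reasonable starting point; the best value for a given disk and file would have to be measured.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

internal static class LargeFileReader
{
    // Hypothetical default of 1 MB. This is an assumption to tune by measurement.
    private const int DefaultBufferSize = 1 << 20;

    // Lazily yields one line at a time, so memory use is bounded by the
    // buffers plus the longest single line, not by the size of the file.
    internal static IEnumerable<string> ReadLines(string filePath, int bufferSize = DefaultBufferSize)
    {
        // SequentialScan hints to the OS that the file is read front to back.
        using (var fs = new FileStream(filePath, FileMode.Open, FileAccess.Read,
                                       FileShare.Read, bufferSize, FileOptions.SequentialScan))
        using (var sr = new StreamReader(fs, Encoding.UTF8, true, bufferSize))
        {
            string line;
            while ((line = sr.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }
}
```

Each line returned by LargeFileReader.ReadLines(path) could then be passed to the same DataContractJsonSerializer logic as in the posted ReadJson<T> method. A buffer of int.MaxValue bytes (just under 2 GiB) would have to be allocated up front before the first line is returned, which is one reason to start small and increase only if profiling shows a benefit.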
