-
January 5th, 2012, 08:20 AM
#1
[RESOLVED] System.outofmemoryexception, how to handle
Hello,
Below is code that compares 122 text files (there are two copies of each file: 61 old copies and 61 new copies). These are data files from a database and can be extremely large. I run into a problem when I hit my largest file (471,483 KB, and it grows every day depending on what is added to it). In my last test run the two files were 471,483 KB and 485,359 KB. I compare these files to extract only the new data into a new text file that I write to another directory. Is there a way I can free up memory to avoid this exception? Thanks in advance.
Code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Comparing text files to find new data...");
            // These are the folders that have the data;
            // the outer loop runs twice to go through both of them
            string folderUsing;
            try
            {
                // Iterate over both folders
                for (int i = 1; i <= 2; i++)
                {
                    if (i == 1)
                    {
                        folderUsing = "dataset1";
                    }
                    else
                    {
                        folderUsing = "dataset2";
                    }
                    Console.WriteLine();
                    Console.WriteLine("Using data from {0}", folderUsing);
                    // New appended data location, this folder will always be there
                    string folderSending = "C:\\Reporting\\" + folderUsing + "\\Uploads\\";
                    // New data location, this folder will always be there
                    string folderNewData = "C:\\Reporting\\" + folderUsing + "\\NewData";
                    // Old files get moved here, need to create this folder if not one already;
                    // this acts as my backup
                    string folderArchive = "C:\\Reporting\\" + folderUsing + "\\" + DateTime.Now.ToString("yyyyMMdd") + "\\";
                    // Prior files location, this folder will always be there
                    string folderPrior = "C:\\Reporting\\" + folderUsing + "\\";
                    // Create the archive folder if it does not exist
                    if (!Directory.Exists(folderArchive))
                    {
                        Directory.CreateDirectory(folderArchive);
                    }
                    // Prelim test: make sure all files in the new dir are in the old dir;
                    // if there is a brand-new file, move it to the upload folder immediately
                    string[] fileEntriesNew = Directory.GetFiles(folderNewData);
                    foreach (string fileName in fileEntriesNew)
                    {
                        string newFileName = Path.GetFileName(fileName);
                        FileInfo newFileInfo = new FileInfo(fileName);
                        FileInfo oldFileInfo = new FileInfo(folderPrior + newFileName);
                        // Check the old dir for a file of the same name; if it does not exist, send the entire file
                        if (!File.Exists(folderPrior + newFileName))
                        {
                            Console.WriteLine("{0} does not exist.", folderPrior + newFileName);
                            Console.WriteLine("{0} will be copied over.", fileName);
                            Console.WriteLine("Press enter to continue");
                            Console.ReadLine();
                            // File.Move needs a destination file name, not just a folder
                            File.Move(fileName, folderSending + newFileName);
                        }
                        else
                        {
                            Console.WriteLine();
                            Console.WriteLine("Comparing files {0}, {1} KB", newFileName, newFileInfo.Length / 1024);
                            // The new file should never be smaller than the old file
                            if (newFileInfo.Length < oldFileInfo.Length)
                            {
                                Console.WriteLine("Possible error: The new file is smaller than the old file");
                                Console.WriteLine("Press enter to continue");
                                Console.ReadLine();
                            }
                            // If there are two files of the same name, open them both
                            if (File.Exists(folderPrior + newFileName))
                            {
                                Console.WriteLine("Reading files");
                                // IEnumerable data sources (arrays); this will get all lines in newText not in oldText
                                string[] newText = File.ReadAllLines(fileName);
                                string[] oldText = File.ReadAllLines(folderPrior + newFileName);
                                // The query based on the data sources
                                IEnumerable<string> differenceQuery = newText.Except(oldText);
                                // Any() checks for differences, and I do not want the first line to be blank
                                if (differenceQuery.Any() && differenceQuery.First() != "")
                                {
                                    Console.WriteLine("Outputting new lines");
                                    using (StreamWriter fsSending = new StreamWriter(folderSending + newFileName, true))
                                    {
                                        foreach (string newLine in differenceQuery)
                                        {
                                            fsSending.WriteLine(newLine);
                                        }
                                    }
                                }
                            }
                        }
                    }
                    // This will move new files into the old dir and old files into the archive dir
                    Console.WriteLine();
                    Console.WriteLine("Moving files to appropriate destination");
                    string[] fileEntriesOld = Directory.GetFiles(folderNewData);
                    foreach (string fileName in fileEntriesOld)
                    {
                        string fileMove = Path.GetFileName(fileName);
                        File.Move(folderPrior + fileMove, folderArchive + fileMove);
                        File.Move(fileName, folderPrior + fileMove);
                    }
                }
            }
            catch (Exception e)
            {
                Console.WriteLine("The process failed: {0}", e.ToString());
                throw;
            }
            finally
            {
                Console.WriteLine();
                Console.WriteLine("Program finished");
            }
        }
    }
}
-
January 5th, 2012, 09:40 AM
#2
Re: System.outofmemoryexception, how to handle
These 2 statements will eat up your memory if your files are very big:
string[] newText = File.ReadAllLines(fileName);
string[] oldText = File.ReadAllLines(folderPrior + newFileName);
You can use the ReadLine method of StreamReader to compare the text line by line.
-
January 5th, 2012, 10:06 AM
#3
Re: System.outofmemoryexception, how to handle
Would I still be able to use the IEnumerable? Initially I did use your suggested method: I would open the old file and read a line, then open the new file and go through the entire file to find a similar line. If it was not there, I would use StreamWriter to write it to a new file. If a match was found in the old file, I would go to the next line in the new file, close the old file, and continue the process. However, this process was taking way too long, as you can imagine.
-
January 6th, 2012, 09:39 AM
#4
Re: System.outofmemoryexception, how to handle
Then you need to figure out the maximum file size your system will accept and use a byte[] buffer to hold part of the text file at a time: read a chunk, process it, clear the buffer, and read in the next part of the file.
-
January 8th, 2012, 12:42 PM
#5
Re: System.outofmemoryexception, how to handle
Or use:
Code:
foreach (var line in File.ReadLines (path_to_file))
    Process (line);
or
Code:
using (var stream = new StreamReader (path_to_file)) {
    string line = null;
    while ((line = stream.ReadLine ()) != null)
        Process (line);
}
www.monotorrent.com For all your .NET bittorrent needs
NOTE: My code snippets are just snippets. They demonstrate an idea which can be adapted by you to solve your problem. They are not 100% complete and fully functional solutions equipped with error handling.
-
January 9th, 2012, 03:01 PM
#6
Re: System.outofmemoryexception, how to handle
I will revisit my code and let you know what I come up with. Thanks for the help!
-
January 9th, 2012, 05:01 PM
#7
Re: [RESOLVED] System.outofmemoryexception, how to handle
Will the code below find the differences between the two text files regardless of where the lines are? (For example, is this code just comparing line 1 with line 1, line 2 with line 2, etc.?)
Code:
var differenceQuery = File.ReadLines(fileName).Except(File.ReadLines(folderPrior + newFileName));
Console.WriteLine("Outputting new lines");
using (StreamWriter fsSending = new StreamWriter(folderSending + newFileName, true))
{
    foreach (string newLine in differenceQuery)
    {
        fsSending.WriteLine(newLine);
    }
}
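On the position question: Enumerable.Except is a set difference, so line order and line position play no part in the comparison. A side effect is that each distinct line is returned at most once, even if it is repeated in the new file. A minimal sketch to illustrate (the ExceptDemo class and NewLines method are hypothetical names, not part of the code in this thread):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ExceptDemo
{
    // Returns the lines of newLines that do not occur anywhere in oldLines,
    // using the same Enumerable.Except call as the query above.
    public static List<string> NewLines(IEnumerable<string> newLines, IEnumerable<string> oldLines)
    {
        return newLines.Except(oldLines).ToList();
    }
}
```

Calling ExceptDemo.NewLines(new[] { "b", "x", "a", "x" }, new[] { "a", "b" }) returns a list holding a single "x": "b" and "a" are matched against the old data even though they sit at different positions, and the second "x" is dropped as a duplicate.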
-
January 10th, 2012, 05:41 AM
#8
Re: [RESOLVED] System.outofmemoryexception, how to handle
This query will work, but it can hit memory issues similar to your original approach. Except has to build a set of the distinct lines of one whole file in memory (the file passed as its argument, here the old one) before it can return anything. If your files are massive (which they are), you are quite likely to need custom logic to do the checking and comparing. Is every line likely to be unique in your text file, or does it contain about 100 unique lines which are just repeated a lot?
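One shape such custom logic could take, as a rough sketch only: keep just the distinct lines of the old file in a HashSet and stream the new file against it. The StreamingDiff class and WriteNewLines method are made-up names, and skipping blank lines is an assumption based on the blank-line check in the original code. Memory use depends on how many distinct lines the old file has, so this only helps if many lines repeat.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class StreamingDiff
{
    // Appends every non-blank line of newPath that does not occur anywhere in
    // oldPath to outPath and returns how many lines were written.
    // Only the distinct lines of the old file are held in memory.
    public static int WriteNewLines(string oldPath, string newPath, string outPath)
    {
        var seen = new HashSet<string>(File.ReadLines(oldPath), StringComparer.Ordinal);
        int written = 0;
        using (var writer = new StreamWriter(outPath, true)) // true = append, as in the original code
        {
            foreach (string line in File.ReadLines(newPath)) // streamed one line at a time
            {
                if (line.Length != 0 && !seen.Contains(line))
                {
                    writer.WriteLine(line);
                    written++;
                }
            }
        }
        return written;
    }
}
```

Unlike Except, this version writes a new line every time it appears in the new file, not just the first time.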
-
January 10th, 2012, 09:37 AM
#9
Re: [RESOLVED] System.outofmemoryexception, how to handle
There are a lot of repeating lines, yes. The new data added to the second file depends on how much the user appends to the end of the file or changes lines within the file. You are right that the last bit of code eats my memory, but for some reason it did not give me the exception; it just froze my computer. For example, I had an old file of 64,141 KB and a new file of 66,885 KB, and the new text file I created was only 2,974 KB. Will hashing the files first help?
Last edited by dssrun; January 10th, 2012 at 09:54 AM.
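On the hashing question: hashing each line (rather than the whole file) might help, because the set then stores 8 bytes per distinct old line instead of the full text. The sketch below is only an illustration under assumptions: HashedDiff, Fnv1a64 and WriteNewLines are hypothetical names, FNV-1a is just one possible hash, and two different lines can in principle collide, in which case a genuinely new line would be skipped.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class HashedDiff
{
    // 64-bit FNV-1a over the UTF-16 code units of a line (low byte, then high byte).
    public static ulong Fnv1a64(string s)
    {
        const ulong offsetBasis = 14695981039346656037UL;
        const ulong prime = 1099511628211UL;
        ulong hash = offsetBasis;
        unchecked // wrap-around on overflow is intended
        {
            foreach (char c in s)
            {
                hash = (hash ^ (byte)(c & 0xFF)) * prime;
                hash = (hash ^ (byte)(c >> 8)) * prime;
            }
        }
        return hash;
    }

    // Appends to outPath every line of newPath whose hash was not seen in oldPath.
    public static int WriteNewLines(string oldPath, string newPath, string outPath)
    {
        var oldHashes = new HashSet<ulong>();
        foreach (string line in File.ReadLines(oldPath))
            oldHashes.Add(Fnv1a64(line));

        int written = 0;
        using (var writer = new StreamWriter(outPath, true))
        {
            foreach (string line in File.ReadLines(newPath))
            {
                if (!oldHashes.Contains(Fnv1a64(line)))
                {
                    writer.WriteLine(line);
                    written++;
                }
            }
        }
        return written;
    }
}
```

If a skipped line would be unacceptable, the hash set can be used as a first filter only, with a second pass confirming matches against the actual text.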
-
January 11th, 2012, 12:07 PM
#10
Re: [RESOLVED] System.outofmemoryexception, how to handle
Does it help that these two files are really data sets (comma-separated files stored in different formats)? Would creating two data sets and merging them to find the differences work? If so, how would I go about this?