large strings memory footprint


Marraco
November 14th, 2008, 06:59 AM
I have an application which reads a 70 MB text file (compressed into a 5 MB file).

The problem is that it initially takes 256 MB to read the file with this code:

Imports System.IO
Imports System.IO.Compression

Public Function CompressedFileToStringArray(ByVal FilePath As String) As String()
    Dim Answer() As String
    Try
        Dim fs As FileStream = File.OpenRead(FilePath)
        Dim GZfs As New GZipStream(fs, CompressionMode.Decompress, False)
        Dim sr As New StreamReader(GZfs)
        ' Reads the whole decompressed text into one string, then splits it into lines
        Answer = sr.ReadToEnd.Split(vbCrLf.ToCharArray, StringSplitOptions.RemoveEmptyEntries)
        fs.Close()

    Catch
        Throw New Exception _
            ("Error in Function LíneasDeArchivo" & vbCrLf & _
            "could not open:" & vbCrLf & _
            FilePath)
    End Try

    Return Answer
End Function
Later, it grows to 720 MB!

Any ideas on how to reduce the memory footprint?

TheCPUWizard
November 14th, 2008, 07:08 AM
This is a perfect example of the reasons to PRE-allocate memory...

Most of the collections/arrays use a "doubling" approach to memory allocation, as this has proven to be most useful and efficient in the general case.

This means that if you have a collection allocated for 50 items, and add a 51st, the allocated size becomes 100. When you add a 101st it becomes 200.

This is for each individual collection.
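
The growth pattern described above can be observed directly. The following is only a minimal sketch, assuming a List(Of String) as the collection (the module name CapacityDemo and the item count are made up for illustration, and the exact capacities reported may differ between framework versions). It prints the capacity each time the list reallocates, then shows that a list created with an explicit capacity does not regrow:

```vbnet
Imports System
Imports System.Collections.Generic

Module CapacityDemo
    Sub Main()
        ' Grow a list one item at a time and report every reallocation.
        Dim grown As New List(Of String)()
        Dim lastCapacity As Integer = grown.Capacity
        For i As Integer = 1 To 1000
            grown.Add("line " & i.ToString())
            If grown.Capacity <> lastCapacity Then
                Console.WriteLine("Count = {0}, Capacity = {1}", grown.Count, grown.Capacity)
                lastCapacity = grown.Capacity
            End If
        Next

        ' Pre-allocate: the capacity is requested up front, so adding
        ' up to 1000 items never forces the list to grow.
        Dim preallocated As New List(Of String)(1000)
        For i As Integer = 1 To 1000
            preallocated.Add("line " & i.ToString())
        Next
        Console.WriteLine("Pre-allocated: Count = {0}, Capacity = {1}", _
                          preallocated.Count, preallocated.Capacity)
    End Sub
End Module
```

Pre-allocating only helps when a reasonable estimate of the final item count is available; it does not change how much memory the strings themselves need.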

-----

Now if you are talking about the aggregate memory footprint you need to be very careful...

The framework allocates memory from the OS as required. Provided it is getting memory without issues, it will continue (within certain bounds) to keep growing.

This is NOT a problem. Generally speaking the most efficient use of resources is when the resources are being used. There is no reason to keep "Free" memory.

As memory pressure increases, the GC will begin to run, and make memory available WITHIN the process for future allocations. This does NOT mean that the memory will be returned to the operating system.

------

Bottom line. You need to look at HOW the memory is being used, and you need to look at the entire system environment.

Marraco
November 14th, 2008, 07:24 AM
> This is a perfect example of the reasons to PRE-allocate memory...
>
> Most of the collections/arrays use a "doubling" approach to memory allocation, as this has proven to be most useful and efficient in the general case.

That is probably why Notepad reserves 150 MB to read the 70 MB file.

> This means that if you have a collection allocated for 50 items, and add a 51st, the allocated size becomes 100. When you add a 101st it becomes 200.
>
> This is for each individual collection.

The problem is that I tried copying each string, one by one, into another array of strings. Then I set OriginalString = Nothing and even called GC.Collect, but it does nothing; the memory remains the same.

Probably each string allocates much more memory than it needs.

TheCPUWizard
November 14th, 2008, 07:34 AM
> That is probably why Notepad reserves 150 MB to read the 70 MB file.
>
> The problem is that I tried copying each string, one by one, into another array of strings. Then I set OriginalString = Nothing and even called GC.Collect, but it does nothing; the memory remains the same.
>
> Probably each string allocates much more memory than it needs.

Go back and REREAD my post. Go read the DOCUMENTATION.

Just because you are no longer referencing an object (setting something to Nothing if it is going out of scope is BAD practice) does NOT mean that the memory will be returned to the operating system, nor will you see a reduction in memory "usage".

This expectation is one of the clearest indicators that a person does not understand the very fundamentals of using .NET (this is not VB.NET specific).

Consider: (pseudo code)

for num = 1 to 1000000
    item = new Item()
    item.DoSomething()
next num

Clearly there will only be one "Item" in use at any time (a total of one million will be created).

IF the computer system has sufficient memory to allocate all one million (i.e. GC does not run for the duration of the loop), there is NO problem.

IF GC (explicitly or implicitly) runs after the loop, the process will have already allocated significant memory from the OS. But there is no prima facie reason for the process to return the memory to the OS.

Therefore a perfectly valid memory profile would be a rapid increase during the running of the loop, and the memory footprint of the process NEVER going down.
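
To see the difference between what the managed heap currently holds and what the OS reports for the process, something like the following can be run. It is only a sketch under stated assumptions: the module name FootprintDemo and the helper Report are hypothetical, StringBuilder merely stands in for the "Item" of the pseudo code, and the numbers will vary by machine, framework version, and whatever the GC decides to do during the loop.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.Text

Module FootprintDemo
    ' Prints the size of the managed heap next to the working set
    ' that tools such as Task Manager show for the process.
    Private Sub Report(ByVal label As String)
        Using proc As Process = Process.GetCurrentProcess()
            Console.WriteLine("{0}: managed heap = {1:N0} bytes, working set = {2:N0} bytes", _
                              label, GC.GetTotalMemory(False), proc.WorkingSet64)
        End Using
    End Sub

    Sub Main()
        Report("Before loop")

        ' Only one item is reachable at any time, but a million are created.
        For num As Integer = 1 To 1000000
            Dim item As New StringBuilder(64)
            item.Append(num)
        Next

        Report("After loop")
    End Sub
End Module
```

The two figures measure different things, so a working set that stays high after the loop does not by itself show that the objects are still alive.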

Marraco
November 14th, 2008, 09:59 AM
> This is a perfect example of the reasons to PRE-allocate memory...

As I understand it, you suggest declaring an array of fixed length (of fixed-length strings) instead of Dim Answer() As String. But I don't know the length of the strings beforehand.
Otherwise, I don't understand how I can preallocate memory. There is no malloc() in VB.NET (or maybe I am wrong?).
(Also, I suspect that declaring Answer another way would prevent the use of this line:
    Answer = sr.ReadToEnd.Split(vbCrLf.ToCharArray, StringSplitOptions.RemoveEmptyEntries))

> Most of the collections/arrays use a "doubling" approach to memory allocation, as this has proven to be most useful and efficient in the general case.
>
> This means that if you have a collection allocated for 50 items, and add a 51st, the allocated size becomes 100. When you add a 101st it becomes 200.
>
> This is for each individual collection.

I think that I understand it.

I think you mean that sr.ReadToEnd.Split probably reserves a power-of-2 size of memory (or something like that), and that an array is internally a collection, or maybe you assume that I copy the array elements into some collection.


-----

> Now if you are talking about the aggregate memory footprint you need to be very careful...

My English is not good. I don't know if you mean some specific technical term by the word "footprint". I mean the total size that Task Manager reports for the executable.
That is, before calling these lines:
    Dim fs As FileStream = File.OpenRead(FilePath)
    Dim GZfs As New GZipStream(fs, CompressionMode.Decompress, False)
    Dim sr As New StreamReader(GZfs)
    Answer = sr.ReadToEnd.Split(vbCrLf.ToCharArray, StringSplitOptions.RemoveEmptyEntries)
    fs.Close()
my application uses less than 30 MB of memory.
After calling that routine, it jumps to 250 MB!

So I tried cloning each string, one by one, and tried to free the original array's memory with
    myArray = Nothing
    GC.Collect
but it makes no difference.
> The framework allocates memory from the OS as required. Provided it is getting memory without issues, it will continue (within certain bounds) to keep growing.
>
> This is NOT a problem. Generally speaking the most efficient use of resources is when the resources are being used. There is no reason to keep "Free" memory.

Later my app reaches 900 MB. Then the free memory is 4 or 5 MB, and it starts heavily using the swap file.

> As memory pressure increases, the GC will begin to run, and make memory available WITHIN the process for future allocations. This does NOT mean that the memory will be returned to the operating system.

As I understand it, you mean the memory is free to be allocated again by my own program, but not by the OS.

> Bottom line. You need to look at HOW the memory is being used, and you need to look at the entire system environment.

I don't get it.

> Go back and REREAD my post. Go read the DOCUMENTATION.

The MS documentation is the worst I have ever seen. I never find any answer on MSDN. I get a lot of unrelated topics on Java or anything else (even when I specifically restrict the answers to VB). Sometimes I can't even find ONE of the words I searched for in the MSDN search results.

Can you be more specific? Do you suggest searching the documentation on strings or on the garbage collector?

> Just because you are no longer referencing an object (setting something to Nothing if it is going out of scope is BAD practice) does NOT mean that the memory will be returned to the operating system, nor will you see a reduction in memory "usage".

I did my search before posting here (obviously wrong). The array does not have a .Dispose sub, and I have not found any way to free the array memory.

> This expectation is one of the clearest indicators that a person does not understand the very fundamentals of using .NET (this is not VB.NET specific).
>
> Consider: (pseudo code)
>
> for num = 1 to 1000000
>     item = new Item()
>     item.DoSomething()
> next num
>
> Clearly there will only be one "Item" in use at any time (a total of one million will be created).
>
> IF the computer system has sufficient memory to allocate all one million (i.e. GC does not run for the duration of the loop), there is NO problem.
>
> IF GC (explicitly or implicitly) runs after the loop, the process will have already allocated significant memory from the OS. But there is no prima facie reason for the process to return the memory to the OS.
>
> Therefore a perfectly valid memory profile would be a rapid increase during the running of the loop, and the memory footprint of the process NEVER going down.

My best interpretation is that I need to free each string one by one?

...and how do I do it? I have found only a .Finalize method, but it is not accessible.

TheCPUWizard
November 14th, 2008, 10:15 AM
A couple of points:

1) "Total FootPrint" is what is being shown by TaskMgr (and some of the PerfMon counters).

2) When using MSDN, it is a good idea to utilize the filters feature. Many people find Google, with "msdn.microsoft.com" as part of the query, a better way to search MSDN than msdn.microsoft.com itself.

3) EXPLICIT calls to GC.Collect are a BAD idea. They can actually cause memory requirements to INCREASE.

4) Dispose is only applicable if the managed code [VB.NET] is using a resource which must be explicitly returned or which has limited availability [WIN32 API objects such as Pens, and DB objects such as connections, etc.].

5) The Finalizer will only be invoked IF you have a BUG in your program (you failed to call Dispose and/or Dispose failed to suppress the finalizer).

6) Consider the following:

a) I am at work and need to write something down, so I get a pen from the supply cabinet [new]
b) I write down what I need [method call]
c) I put the pen in my pocket [no more usage]
d) I go home and empty my pockets [no more reference - I can no longer "Reach" the pen]

Tomorrow I repeat this process. And again the next day.

Eventually I have many pens at home. This is not a problem provided:

a) I have a place to store all of them
b) The supply cabinet does not run out of pens for myself or others.

It is only when one of the above occurs, that I must [because I am honest and decent] return the pens to the supply cabinet.

The same is true for memory utilization. There is nothing "wrong" with a program using ALL of the available memory.... until the OS requests that the process return some.

Cimperiali
November 14th, 2008, 10:23 AM
I believe the question was:
"I am running out of memory. Is there any way to get a bit of it back sooner than usual?"
and I am afraid the answer is: "no".

Marraco
November 14th, 2008, 11:09 AM
Thanks for your help.
...
> 2) When using MSDN, it is a good idea to utilize the filters feature. Many people find Google, with "msdn.microsoft.com" as part of the query, a better way to search MSDN than msdn.microsoft.com itself.

The filters don't work. (And right NOW they are not available, so I cannot easily provide an example.)

I totally agree on Google, although Microsoft breaks many of the Google links to MSDN. Frequently, direct Google links to MSDN do not work, but once Google tells you what to look for, you can research it on MSDN.

> 3) EXPLICIT calls to GC.Collect are a BAD idea. They can actually cause memory requirements to INCREASE.

That is a good piece of advice. It tells me that I am even more lost than I thought :eek:
...
> It is only when one of the above occurs, that I must [because I am honest and decent] return the pens to the supply cabinet.

... hmm, I'm not decent, but at least honest ;)
Is there a way to return the string pens? (Or the entire array?)

> The same is true for memory utilization. There is nothing "wrong" with a program using ALL of the available memory.... until the OS requests that the process return some.

(I am being kicked out of the building now, so I will be back on Monday.)

Oblio
November 16th, 2008, 07:34 AM
If you are having memory problems, then don't use ReadToEnd.
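
One way to avoid ReadToEnd is to read the decompressed stream one line at a time, so the whole text never exists as a single huge string (which, being UTF-16 in memory, would take roughly twice the file size for mostly single-byte text) and Split never has to build its result from it. This is only a sketch, not a tested drop-in replacement: the module name CompressedFileReader is made up, and it assumes that skipping empty lines is the behaviour wanted, as in the original Split call.

```vbnet
Imports System.Collections.Generic
Imports System.IO
Imports System.IO.Compression

Public Module CompressedFileReader
    ' Reads a gzip-compressed text file line by line instead of with ReadToEnd.
    Public Function CompressedFileToStringArray(ByVal filePath As String) As String()
        Dim lines As New List(Of String)()

        ' Using closes the reader and both streams even if an exception is thrown.
        Using fs As FileStream = File.OpenRead(filePath)
            Using gz As New GZipStream(fs, CompressionMode.Decompress)
                Using sr As New StreamReader(gz)
                    Dim line As String = sr.ReadLine()
                    While line IsNot Nothing
                        ' Skip empty lines, like StringSplitOptions.RemoveEmptyEntries did.
                        If line.Length > 0 Then lines.Add(line)
                        line = sr.ReadLine()
                    End While
                End Using
            End Using
        End Using

        ' ToArray makes one final copy of the references (not of the strings).
        Return lines.ToArray()
    End Function
End Module
```

All lines are still held in memory at the end, so this lowers the peak rather than the final size. If the caller can process lines as they arrive, returning nothing and handling each line inside the loop would avoid holding them at all.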

Marraco
November 18th, 2008, 07:16 AM
Ok. I give up.

Maybe if I put the reading of the data in an independent DLL, then I can call the DLL to do the reading and send me the data in a memory-efficient structure. But that will only work if I can unload the DLL from RAM after using it.

Does it make sense, or is it not worth the work?

(My application needs to stay in memory all day, so I cannot mess with the available RAM.)

TheCPUWizard
November 18th, 2008, 07:35 AM
> Does it make sense, or is it not worth the work?
>
> (My application needs to stay in memory all day, so I cannot mess with the available RAM.)

1) No, it does not make sense... but thinking that it does is a very common mistake (even among professional developers).

2) If your program is running, but not accessing specific pages of memory, they will be swapped out to disk, and have NO impact on the running state of your machine.

Marraco
November 18th, 2008, 08:06 AM
> 1) No, it does not make sense... but thinking that it does is a very common mistake (even among professional developers).
>
> 2) If your program is running, but not accessing specific pages of memory, they will be swapped out to disk, and have NO impact on the running state of your machine.

The problem is that since the memory usage increases later, it easily reaches 1 GB, and Windows starts swapping memory to the hard disk. That makes the computer unusable.
Worse, the swapping causes the code to run for hours instead of minutes.

It would be solved if it were possible to free the memory of the unused strings.

Cimperiali
November 18th, 2008, 08:11 AM
If that is really an issue, then you should consider rewriting the code that reads the file and populates the array: do more of the work by hand, and you will be able to preallocate the exact number of bytes you need. The real question, however, is: why does it keep consuming RAM? Are you sure you need all those new instances? Could it be done with a single shared instance (see the "Shared" keyword)?

TheCPUWizard
November 18th, 2008, 08:16 AM
> If that is really an issue, then you should consider rewriting the code that reads the file and populates the array: do more of the work by hand, and you will be able to preallocate the exact number of bytes you need. The real question, however, is: why does it keep consuming RAM? Are you sure you need all those new instances? Could it be done with a single shared instance (see the "Shared" keyword)?

Just remember that:

1) EVERY modification to a string creates a new string, ALWAYS.
2) If you are creating objects larger than 84999 bytes [42499 (minus overhead) characters], they are going onto the LOH (Large Object Heap), and fragmentation will cause memory growth.

Ideally, a long running program should NEVER have a string that exceeds about 40K in length. Not one, not for an instant.
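
A quick way to check where a given string ends up is to ask the GC which generation it belongs to. The sketch below rests on the assumption that the desktop CLR reports Large Object Heap allocations as the highest generation; the module name LohDemo and the two string lengths are arbitrary choices for illustration, and the values printed may differ on other runtimes.

```vbnet
Imports System

Module LohDemo
    Sub Main()
        ' About 2 KB of character data: an ordinary small object.
        Dim small As New String("x"c, 1000)

        ' About 100 KB of character data: above the large-object threshold.
        Dim large As New String("x"c, 50000)

        Console.WriteLine("Generation of small string: {0}", GC.GetGeneration(small))
        Console.WriteLine("Generation of large string: {0}", GC.GetGeneration(large))
        Console.WriteLine("Highest generation: {0}", GC.MaxGeneration)
    End Sub
End Module
```

A freshly created object that is already reported in the highest generation is a hint that it was allocated as a large object. A 70 MB file read with ReadToEnd produces exactly such a string, and so do large intermediate buffers.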

Marraco
November 19th, 2008, 07:39 AM
> if ...(see about "shared" keyword)?

> Just remember that...Not one, not for an instant.

It looks like I have a lot of unexpected work... :sick:

Marraco
November 19th, 2008, 04:05 PM
I have moved the code to a new form, and recovered some memory after unloading the form. It solved some of the problem.

TheCPUWizard
November 19th, 2008, 04:30 PM
> I have moved the code to a new form, and recovered some memory after unloading the form. It solved some of the problem.

In general (and ignoring this may be part of your problem), Forms (and Controls) should only contain sufficient code to allow them to communicate with a NON-UI class.

Using C#, but the intent is 100% the same...

void SomeBody_Click(object sender, EventArgs e)
{
    m_Something.HandleClick();
}
Something m_Something;

class Something
{
    public void HandleClick()
    {
        // real work goes here
    }
}

This easily allows you to manage the lifetime of the different aspects independently. For example, once a ListControl is populated, do you need the original list anymore at all?

Marraco
November 20th, 2008, 09:02 AM
> In general (and ignoring this may be part of your problem), Forms (and Controls) should only contain sufficient code to allow them to communicate with a NON-UI class.
>
> Using C#, but the intent is 100% the same...
>
> void SomeBody_Click(object sender, EventArgs e)
> {
>     m_Something.HandleClick();
> }
> Something m_Something;
>
> class Something
> {
>     public void HandleClick()
>     {
>         // real work goes here
>     }
> }
>
> This easily allows you to manage the lifetime of the different aspects independently. For example, once a ListControl is populated, do you need the original list anymore at all?

I tried to make a class in a separate class file to hold the code, but the only way I know to free the memory is
    myClassInstance = Nothing
    GC.Collect
At GC.Collect, some 100 MB of data are freed. (What is the right way to write "made free"?)
But putting the code in a Form class frees 150 MB (after the GC.Collect call the total memory footprint is 50 MB less, and the form takes more memory for itself, which is also freed at GC.Collect).

It looks like a form has better memory isolation than a simple class.

PS: I know from your advice that I need to learn more about the garbage collector. It's on my to-do list to read about that, although from your messages I understand that it is not the solution.

TheCPUWizard
November 20th, 2008, 09:24 AM
1) Stop even thinking about EVER calling GC.Collect. Put all of your money and valuables in a can, and promise that you will set the can on fire the next time you consider it. In 7+ years developing (many) dozens of DotNet programs (a few million lines of code), there has been exactly 1 case I have seen where it was warranted.

2) Stop worrying about the memory footprint in general. Instead measure system performance during actual usage. If the system does not "need" that 100 MB of memory, then freeing it is meaningless. The system knows what is happening and will generally adjust properly. If there is a MEASURED need to influence this, you create artificial demand via "Memory Pressure".

3) Learn about object lifetime. Setting a reference variable to "nothing" has NO impact unless the variable itself is going to stay in scope for significantly longer than the actual object it is referencing. Most often this indicates a design issue with the scope of the variable:
Pseudo-Code:

class BadSample
{
    SomeClass m_SomeVariable; // ONLY used to carry state from Function1 to Function2

    public void Function1()
    {
        m_SomeVariable = new SomeClass();
        Function2();
    }

    private void Function2()
    {
        // Operations that use m_SomeVariable
    }
}

The variable (m_SomeVariable) has the lifetime of the instance of BadSample, but is only meaningful for the duration of Function2. This should (generally) be recoded by passing the variable as a parameter.

class GoodSample
{
    public void Function1()
    {
        SomeClass someVariable = new SomeClass();
        Function2(someVariable);
    }

    private void Function2(SomeClass someVariable)
    {
        // Operations that use someVariable
    }
}

Now the scope of the variable is Function1, and the instance of SomeClass will go out of scope as soon as Function1 returns and be eligible for GC, without ever setting anything to "Nothing"/"null".

Once each of your "non-UI" classes is properly implemented to minimize object lifetimes [and this should be measured with a good profiler], utilizing these classes in your UI (again following the principle of lifetime minimization) can be leveraged.

For example:
A click event needs to perform some calculations and update a textbox

1) Write a class for the calculations
2) In the click handler create and initialize an instance of this class as a local variable
3) Invoke the method(s) which perform the calculations
4) Update the textbox with the results
5) Allow the class instance to simply go out of scope.
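
In VB.NET, the five steps above could look roughly like this. It is a sketch only: AverageCalculator, MainForm, the control names, and the sample values are all invented for illustration, and a real form would normally be laid out by the designer.

```vbnet
Imports System
Imports System.Windows.Forms

' Non-UI class: owns the calculation and knows nothing about forms.
Public Class AverageCalculator
    Private ReadOnly m_Values As Double()

    Public Sub New(ByVal values As Double())
        m_Values = values
    End Sub

    Public Function Compute() As Double
        If m_Values.Length = 0 Then Return 0.0
        Dim total As Double = 0.0
        For Each value As Double In m_Values
            total += value
        Next
        Return total / m_Values.Length
    End Function
End Class

Public Class MainForm
    Inherits Form

    Private ReadOnly m_ResultBox As New TextBox()
    Private ReadOnly m_RunButton As New Button()

    Public Sub New()
        m_RunButton.Text = "Calculate"
        m_RunButton.Top = 30
        Controls.Add(m_ResultBox)
        Controls.Add(m_RunButton)
        AddHandler m_RunButton.Click, AddressOf RunButton_Click
    End Sub

    Private Sub RunButton_Click(ByVal sender As Object, ByVal e As EventArgs)
        ' Steps 2 and 3: create the calculator as a local and invoke it.
        Dim calculator As New AverageCalculator(New Double() {1.0, 2.0, 3.0})
        Dim result As Double = calculator.Compute()

        ' Step 4: update the textbox with the result.
        m_ResultBox.Text = result.ToString()

        ' Step 5: nothing to do. The calculator is unreachable once the
        ' handler returns, without setting anything to Nothing.
    End Sub
End Class

Module Program
    <STAThread()> _
    Sub Main()
        Application.Run(New MainForm())
    End Sub
End Module
```

The form keeps only the controls alive; the data used by the calculation lives no longer than a single click.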



Marraco
November 20th, 2008, 10:28 AM
> 1) Stop even thinking about EVER calling GC.Collect. Put all of your money and valuables in a can, and promise that you will set the can on fire the next time you consider it. In 7+ years developing (many) dozens of DotNet programs (a few million lines of code), there has been exactly 1 case I have seen where it was warranted.

I'm sure that you know 1,000,000.54 times more than me about it, although when I do not write the GC.Collect line, my app starts continuously swapping memory to disk. :confused:
(Maybe MS changed something in Framework 3.5, 3.5 SP1, or Visual Studio 2008 SP1.)
It starts swapping when the used memory reaches 900 MB (virtual memory is even higher). Anyway, later my app eats those 100 MB "recovered", because of bad design :sick:
But I need to solve the memory waste step by step. This first.

That said, and with Christmas near, I would hide all the cans from the pyromaniac wizards :)

TheCPUWizard
November 20th, 2008, 11:15 AM
Your approach is backwards, you are treating a symptom, not addressing the actual problem.

What happened when you put a profiler break right before where you are currently calling GC.Collect?

How big was each of the Heaps?

How many times had each generation of GC already run?

When was the LAST time the GC had run for each generation?

How many Finalizers have been executed?

What was the impact of creating Memory Pressure?

Are the objects in GEN2 expected to live for the duration of the process?

These are all relatively simple and quick measurements to make. They will quickly point out what is necessary to address the underlying CAUSE.
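
A couple of these numbers can be read from code as well as from a debugger. The following sketch covers only the managed heap size and the per-generation collection counts; the module name GcSnapshot, the Report helper, and the sample workload are hypothetical. Heap sizes per generation and finalizer statistics are not available this way and would have to come from a profiler or from the ".NET CLR Memory" performance counters.

```vbnet
Imports System

Module GcSnapshot
    ' Prints how much managed memory is in use and how often
    ' each GC generation has been collected so far.
    Public Sub Report(ByVal label As String)
        Console.WriteLine("--- {0} ---", label)
        Console.WriteLine("Managed heap in use: {0:N0} bytes", GC.GetTotalMemory(False))
        For generation As Integer = 0 To GC.MaxGeneration
            Console.WriteLine("Gen {0} collections so far: {1}", _
                              generation, GC.CollectionCount(generation))
        Next
    End Sub

    Sub Main()
        Report("Startup")

        ' Hypothetical workload standing in for the real file loading.
        Dim lines(99999) As String
        For i As Integer = 0 To lines.Length - 1
            lines(i) = "line " & i.ToString()
        Next

        Report("After building " & lines.Length.ToString() & " strings")
    End Sub
End Module
```

Calling Report right before the place where GC.Collect is currently called would show whether the GC had already been running on its own.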

If you make those measurements and post them [please start a new thread "Understanding Memory Utilization" as this is a more general topic] I will be happy to guide you through the analysis.

Marraco
November 20th, 2008, 12:36 PM
> Your approach is backwards, you are treating a symptom, not addressing the actual problem.
> ...
> If you make those measurements and post them [please start a new thread "Understanding Memory Utilization" as this is a more general topic] I will be happy to guide you through the analysis.

Ok. Thanks.
I need to install the CLR profiler first.

TheCPUWizard
November 20th, 2008, 12:40 PM
> Ok. Thanks.
> I need to install the CLR profiler first.

The profiler is definitely useful for the LAST of the above actions [you may find it easier to use Red-Gate's ANTS profiler].

All of the other items (excluding the last time of GC) can be done directly in the Immediate Window while sitting at a breakpoint in the IDE.

Determining the time of the last GC takes about 15 lines of code added to your program at startup...

AGAIN: Please start a NEW Thread for discussion of this topic.