StyleCop Memory Usage

Jul 27, 2011 at 2:25 PM
I use StyleCop to check a solution with some very big code files, and the memory balloons to over 1 GB.
I have now traced the cause to the function below, LoadBuffer(int count).

The memory balloons for the following reasons:
1) Big arrays are not garbage collected immediately, so when you allocate an array for a file of maybe 1 MB, it takes a while before it is collected.
2) The function always creates a new array that is 80 bytes bigger. If I have a big file with maybe 1 MB of data, then this function allocates
12,500 increasingly big arrays (I would call this a linear algorithm).

My idea is to improve this function by always doubling the allocation size:

1. 80
2. 160
3. 320
4. 640
5. 1280
...

with the following code:

CharacterBlockSize = CharacterBlockSize * 2;

This has the advantage that the number of allocations grows with log(n) instead of n.
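
To make the difference concrete, here is a minimal sketch that only counts how many buffers each growth strategy would allocate. The class and method names are made up for illustration and are not part of StyleCop; the 80 and the 1 MB figure are the numbers from this post.

```csharp
using System;

// Hypothetical helper, not part of StyleCop: counts buffer allocations only.
public static class GrowthComparison
{
    // Number of allocations until the capacity reaches totalChars
    // when every new buffer is a fixed step bigger than the last one.
    public static int CountFixedStep(int totalChars, int step)
    {
        int allocations = 0;
        long capacity = 0;
        while (capacity < totalChars)
        {
            capacity += step;
            allocations++;
        }

        return allocations;
    }

    // Number of allocations until the capacity reaches totalChars
    // when the first buffer has initialSize and every new one is twice as big.
    public static int CountDoubling(int totalChars, int initialSize)
    {
        int allocations = 1;
        long capacity = initialSize;
        while (capacity < totalChars)
        {
            capacity *= 2;
            allocations++;
        }

        return allocations;
    }
}
```

For 1,000,000 characters and a start size of 80, CountFixedStep gives 12500 allocations and CountDoubling gives 15.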

Now my question is how should I contribute it, as a patch, or a fork and then a pull request?

Thanks in advance.

        /// <summary>
        /// Loads the internal character buffer with the requested number of characters.
        /// </summary>
        /// <param name="count">The number of characters to load.</param>
        /// <returns>Returns true if the characters were loaded, or false if the end
        /// of the character source was reached before all the characters were loaded.</returns>
        private bool LoadBuffer(int count)
        {
            Param.AssertGreaterThanOrEqualToZero(count, "count");

            // Check whether there are already enough characters in the current buffer.
            if (this.bufferLength > this.position + count - 1)
            {
                Debug.Assert(this.charBuffer != null && this.charBuffer.Length > this.position + count - 1, "The buffer position is invalid.");
                return true;
            }

            // Create a new buffer large enough to contain the left over characters from the previous
            // buffer, as well as the new characters to read from the code.
            int leftOverCharacterCount = this.bufferLength - this.position;
            char[] newBuffer = new char[CharacterBlockSize + leftOverCharacterCount];

            // Fill in any characters left over from the previous buffer.
            for (int i = 0; i < leftOverCharacterCount; ++i)
            {
                newBuffer[i] = this.charBuffer[this.position + i];
            }

            // Read the new set of characters from the code buffer.
            int numberOfCharactersRead = this.code.ReadBlock(newBuffer, leftOverCharacterCount, CharacterBlockSize);

            // Set the correct number of characters in the new buffer.
            this.bufferLength = leftOverCharacterCount + numberOfCharactersRead;

            // Save the new buffer and reset the position.
            this.position = 0;
            this.charBuffer = newBuffer;

            // Return true if the requested number of characters are in the buffer.
            return this.bufferLength >= count;
        }
Coordinator
Jul 27, 2011 at 2:38 PM

Hi Thomas,

Great investigation, thanks. If you post your replacement function here I'll merge it in.

 

Andy.

Aug 23, 2011 at 1:12 PM
Edited Aug 24, 2011 at 2:52 PM
// The main change is that a new buffer is no longer always allocated; it is allocated only when necessary (if the current one is too small).
// If the data fits, it is simply copied within the existing buffer.
// This solution is massively faster, but I don't know whether the tests still pass.
// Could you please run the tests to check that it still works?
// Maybe some range checks are needed in peek and read to simulate the limited-size behavior of the buffers.

        /// <summary>
        /// Loads the internal character buffer with the requested number of characters.
        /// </summary>
        /// <param name="count">The number of characters to load.</param>
        /// <returns>Returns true if the characters were loaded, or false if the end
        /// of the character source was reached before all the characters were loaded.</returns>
        private bool LoadBuffer(int count)
        {
            Param.AssertGreaterThanOrEqualToZero(count, "count");

            // Check whether there are already enough characters in the current buffer.
            if (this.bufferLength > this.position + count - 1)
            {
                Debug.Assert(this.charBuffer != null && this.charBuffer.Length > this.position + count - 1, "The buffer position is invalid.");
                return true;
            }

            // Create a new buffer large enough to contain the left over characters from the previous
            // buffer, as well as the new characters to read from the code.
            char[] newBuffer;
            int leftOverCharacterCount = this.bufferLength - this.position;

            if (this.charBuffer == null || (CharacterBlockSize + leftOverCharacterCount) > this.charBuffer.Length)
            {
                // allocate a bigger buffer
                newBuffer = new char[CharacterBlockSize + leftOverCharacterCount];
            }
            else
            {
                // reuse the old buffer
                newBuffer = this.charBuffer;
            }            
 
            // Fill in any characters left over from the previous buffer.
            for (int i = 0; i < leftOverCharacterCount; ++i)
            {
                newBuffer[i] = this.charBuffer[this.position + i];
            }
 
            // Read the new set of characters from the code buffer.
            int numberOfCharactersRead = this.code.ReadBlock(newBuffer, leftOverCharacterCount, CharacterBlockSize);
 
            // Set the correct number of characters in the new buffer.
            this.bufferLength = leftOverCharacterCount + numberOfCharactersRead;
 
            // Save the new buffer and reset the position.
            this.position = 0;
            this.charBuffer = newBuffer;
 
            // Return true if the requested number of characters are in the buffer.
            return this.bufferLength >= count;
        }
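
The same reuse idea can be tried outside StyleCop. Below is a minimal, self-contained sketch, assuming a plain TextReader as the character source. SlidingCharBuffer, BlockSize and AllocationCount are hypothetical names chosen for this example and are not StyleCop types. It uses Array.Copy for the left-over characters, which is documented to handle overlapping ranges within the same array.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for the reader discussed above; not the StyleCop class.
public sealed class SlidingCharBuffer
{
    private const int BlockSize = 80;
    private readonly TextReader source;
    private char[] buffer;
    private int position;
    private int length;

    public SlidingCharBuffer(TextReader source)
    {
        this.source = source;
    }

    // Number of arrays allocated so far (only here to make the effect visible).
    public int AllocationCount { get; private set; }

    // Makes sure at least count characters are available from the current position.
    public bool Load(int count)
    {
        if (this.length - this.position >= count)
        {
            return true;
        }

        int leftOver = this.length - this.position;
        int needed = Math.Max(count, leftOver + BlockSize);

        // Allocate only when the existing array is missing or too small.
        char[] target = this.buffer;
        if (target == null || target.Length < needed)
        {
            target = new char[needed];
            this.AllocationCount++;
        }

        // Move the unread characters to the front of the target array.
        if (leftOver > 0)
        {
            Array.Copy(this.buffer, this.position, target, 0, leftOver);
        }

        int read = this.source.ReadBlock(target, leftOver, target.Length - leftOver);
        this.buffer = target;
        this.position = 0;
        this.length = leftOver + read;
        return this.length >= count;
    }

    // Returns the next character, or -1 at the end of the source.
    public int Read()
    {
        if (!this.Load(1))
        {
            return -1;
        }

        return this.buffer[this.position++];
    }
}
```

Reading a 1000-character StringReader one character at a time through this class allocates a single array, because every refill fits into the existing one.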
Aug 24, 2011 at 2:53 PM
Changed:

if ((CharacterBlockSize + leftOverCharacterCount) > this.charBuffer.Length)

to:

if (this.charBuffer == null || (CharacterBlockSize + leftOverCharacterCount) > this.charBuffer.Length)

so that it works with the standard code.
Coordinator
Aug 24, 2011 at 4:15 PM

Yep. I'd done that. I have it here working but it doesn't fix the other memory problem that someone else was having....so I'm still investigating. This change will be in 4.6 soon.

Aug 24, 2011 at 11:39 PM
andyr wrote:

Yep. I'd done that. I have it here working but it doesn't fix the other memory problem that someone else was having....so I'm still investigating. This change will be in 4.6 soon.

Hi Andy,

If there's any further investigation you need for that memory issue (ie running alphas or builds w/ logging, etc), let me know.

Regards,

Daniel B.

Aug 25, 2011 at 5:14 AM
Edited Aug 25, 2011 at 5:14 AM

I usually use Memory Profiler for profiling memory usage:

http://memprofiler.com/download.aspx

With it I also found memory leaks in ReSharper.

Sincerely Thomas

 

Aug 25, 2011 at 1:41 PM
Edited Aug 25, 2011 at 1:45 PM

I ran MemoryProfiler on StyleCop.

Maybe I have found an issue. In CodeParser there is a partial elements list to which partial classes are added. But as far as I can see, they are never removed again. So my interpretation is the following: if I have 100 partial classes, all of these partial classes are kept alive by the CodeParser, and so the memory grows (because a parsed class is quite big).

I don't know how to solve it because I don't know StyleCop very well.

Some ideas: maybe these partial classes could be cached on disk instead of in memory, and only fetched from the file cache when needed. (There is already a caching facility for that.)

Or a fast first pass could be made to find all partial classes that belong together.

Or maybe this list isn't needed at all.

P.S.: I have a lot of partial classes in the solution where the memory grows.

Sincerely Thomas

P.S.:

private Node<CsToken> ParseElementContainerBody(

It's the partialElements collection that is passed as a parameter.
Coordinator
Aug 25, 2011 at 8:25 PM

Thanks Thomas.

PartialElements is a Dictionary of strings and List<CsElements>. It's created in the PreParse method in CsParser.cs and set to null in the PostParse method, so it is cleared. StyleCopCore.cs calls PreParse and PostParse on each parser.

Aug 25, 2011 at 9:41 PM

Hi Thomas,

I did a similar thing - attached MemoryProfiler to the MSBUILD instance. I noticed significant growth in the number of CodePoint instances. That was from a quick run before I left for the day. If I get a chance, I'll run it again.

Like you, the projects where I have the most memory usage are also dominated by partial class definitions.

Regards,

Daniel B.

Aug 26, 2011 at 5:29 AM
Edited Aug 26, 2011 at 5:39 AM

Hi Andy

Yes, you are right: in PostParse this collection is set to null. How is this done in the ReSharper plugin and in MSBuild?

If in MSBuild PostParse is called only once per project, then the maximum memory usage is all partial classes in memory.

How is it done in the ReSharper plugin? If it is called per file analysis, then no memory leak occurs; if it is called once per project, it is the same as in MSBuild.

Sincerely Thomas

Aug 26, 2011 at 5:38 AM
Edited Aug 26, 2011 at 5:38 AM

Hi Daniel B.

I also saw the growth in the number of CodePoints. Maybe changing them to structs would remove some memory pressure.

Sincerely Thomas

Coordinator
Aug 28, 2011 at 2:58 PM

Sorry Thomas,

I'm not sure I understand your question.

Andy

Aug 29, 2011 at 6:47 AM

Hi Andy

I'll try to explain my question with pseudocode.

public class StyleCopCore
{
    private void Analyze(IList<CodeProject> projects, ....)
    {
        ...

        foreach (var parser in parsers)
        {
            parser.PreParse();
        }

        RunMultithreadedParseActionsOnEveryProject();

        foreach (var parser in parsers)
        {
            // Here the partial lists are cleared.
            parser.PostParse();
        }
    }
}

So my question is how many "projects" or files are put into the Analyze function. If all files of a project are put into the Analyze function, then all parsed partial classes remain in memory until PostParse.

MSBuild:

1. Is this function called from the MSBuild process?

2. If yes, are all files passed in as input?

ReSharper:

1. Is this function also used by ReSharper?

2. If yes, is the input only the current file or also all files of the project?

I'm asking because I'm no expert on the MSBuild and ReSharper integration.

I use StyleCop from a custom console application (StylecopCommand), and there the situation is as I have described it. So the memory starts growing and spikes for every project.

Sincerely Thomas

Coordinator
Aug 29, 2011 at 10:50 AM
Edited Aug 30, 2011 at 7:26 AM
Understood. I will investigate later. If that's OK, can I build you a special version to run some tests with, please? Are you using 4.6?
A.

Aug 29, 2011 at 10:56 AM
Edited Aug 30, 2011 at 10:00 PM
Hi Andy,
I'm currently afk, but can test some custom builds day after tomorrow.
Cheers,
Daniel B
Aug 29, 2011 at 11:29 AM

Hi Andy

Yes, I can use 4.6. You can send me custom versions at thomas dot stocker at gmail dot com.

Or send me an email telling me where I can download the custom version.

Sincerely Thomas

Coordinator
Aug 29, 2011 at 12:51 PM

Hi.

I've just had a look at this. It loads and keeps all the partial types found in all the .cs files being processed. However, it has to do this, as the partial types found could be needed to complete the analysis of any types being analysed, so it cannot free them until it has finished all the .cs files.
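
As a rough illustration of that lifetime, here is a simplified sketch. It is an assumption-laden model, not the actual CsParser code: PartialTypeTracker, AddPart and RetainedPartCount are made-up names, and a file name string stands in for a parsed element. Only the PreParse and PostParse method names come from this thread.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified model of the partial-elements lifetime; not the StyleCop class.
public sealed class PartialTypeTracker
{
    private Dictionary<string, List<string>> partialElements;

    // Called once before any file is parsed: the collection starts empty.
    public void PreParse()
    {
        this.partialElements = new Dictionary<string, List<string>>();
    }

    // Records one part of a partial type. Every part stays referenced
    // by the dictionary until PostParse runs.
    public void AddPart(string fullTypeName, string fileName)
    {
        List<string> parts;
        if (!this.partialElements.TryGetValue(fullTypeName, out parts))
        {
            parts = new List<string>();
            this.partialElements.Add(fullTypeName, parts);
        }

        parts.Add(fileName);
    }

    // Number of parts currently kept alive by the tracker.
    public int RetainedPartCount
    {
        get
        {
            if (this.partialElements == null)
            {
                return 0;
            }

            int total = 0;
            foreach (List<string> parts in this.partialElements.Values)
            {
                total += parts.Count;
            }

            return total;
        }
    }

    // Called once after all files are done: dropping the reference
    // makes every retained part eligible for garbage collection.
    public void PostParse()
    {
        this.partialElements = null;
    }
}
```

In this model the retained count only ever grows between PreParse and PostParse, which matches the peak-per-run memory behaviour described in this thread.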

Aug 30, 2011 at 11:35 AM

Hi Andy

So I have three ideas to reduce the memory usage of the loaded partial classes.

1. Use structs instead of classes for certain frequently used types

    A class instance has an overhead of 28 bytes; a struct does not have this overhead.

    http://stackoverflow.com/questions/26570/sizeof-equivalent-for-reference-types

   Example CodePoint:

   class: 28 bytes + 3 (integers) * 4 (size of integer) = 40 bytes

   struct: 3 (integers) * 4 (size of integer) = 12 bytes

   Size reduction: around 70 %.

   Problems: semantic change (immutable, value-type semantics).

2. A more fundamental change: cache the parsed classes on disk instead of in memory

   For performance, keep the most used classes in memory.

   Problem: code changes.

3. Pre-parse the files to find the partial classes that belong together and process them together.

   Problem: code changes.
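
The 12-byte figure from idea 1 can be checked with a small sketch. CodePointValue and its field names are hypothetical and only mirror the "three integers" described above; this is not the real CodePoint class. Marshal.SizeOf reports the marshalled size, which for a struct of three ints should equal the payload size. The 28-byte class overhead is the figure assumed in this post and is not measured here.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical immutable value-type version of a three-integer position record.
public struct CodePointValue
{
    public readonly int Index;
    public readonly int IndexOnLine;
    public readonly int LineNumber;

    public CodePointValue(int index, int indexOnLine, int lineNumber)
    {
        this.Index = index;
        this.IndexOnLine = indexOnLine;
        this.LineNumber = lineNumber;
    }
}

public static class CodePointSizes
{
    // Payload size of the struct in bytes (3 integers * 4 bytes).
    public static int StructBytes()
    {
        return Marshal.SizeOf(typeof(CodePointValue));
    }

    // Percentage saved relative to a class with the given per-object overhead.
    public static int ReductionPercent(int classOverheadBytes)
    {
        int structBytes = StructBytes();
        int classBytes = classOverheadBytes + structBytes;
        return (classBytes - structBytes) * 100 / classBytes;
    }
}
```

With the assumed overhead of 28 bytes, ReductionPercent(28) gives 70, which matches the estimate above. Note that a struct stored in a field of type object or in a non-generic collection gets boxed, which brings the per-object overhead back.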

Sincerely Thomas

Coordinator
Sep 2, 2011 at 8:14 AM

 

Hi all. For those helping with investigating memory and performance issues, can you please install the 4.6.2.0 debug build now uploaded? Follow the doc at http://stylecop.codeplex.com/wikipage?title=EnablingTracing

Sep 2, 2011 at 9:38 PM
andyr wrote:

 

Hi all. For those helping with investigating memory and performance issues, can you please install the 4.6.2.0 debug build now uploaded? Follow the doc at http://stylecop.codeplex.com/wikipage?title=EnablingTracing

Hi Andy,

I've run this on our two biggest projects (although I ran it through MSBUILD.exe, and not VS).

Where would you like me to send the output files?

Cheers,

Daniel B.

Coordinator
Sep 2, 2011 at 9:54 PM
Andy at stylecop dot com please




Sep 21, 2011 at 1:02 AM

Hi Andy,

Did the output files I emailed assist in any way? Do you need another test with a later build, perhaps?

Cheers,

Daniel B.

Nov 27, 2011 at 8:36 PM

Hi Andy,

Is any further information required from us to assist in the investigation of this?

Cheers,

Daniel B.