C# StreamReader causing +200ms overhead to read data from source

We are seeing periodic +200ms overhead when reading the input stream with a StreamReader while the system is under load. Has anyone else seen this, and if so, did you do anything to fix it?
The following is the code:
string requestBody;
var streamReaderTime = Stopwatch.StartNew();
using (var streamReader = new StreamReader(context.Request.InputStream, context.Request.ContentEncoding))
{
    var allLines = streamReader.ReadLines();
    var request = new StringBuilder();
    foreach (var line in allLines)
    {
        request.Append(line);
    }
    requestBody = request.ToString();
}
streamReaderTime.Stop();
ReadLines is just as follows:
public static IEnumerable<string> ReadLines(this StreamReader reader)
{
    while (!reader.EndOfStream)
    {
        yield return reader.ReadLine();
    }
}
Note: Using ReadLines() or ReadToEnd() makes very little difference if any.
We run performance tests overnight, and we are seeing the following behavior just from graphing streamReaderTime.
A single request takes between 45 ms and 70 ms to execute, but the graph (screenshot not reproduced here) shows a fixed value being added on top, and sometimes an even bigger spike. I have previously seen it reach around 1.5 seconds.
If anyone has any solutions/suggestions it would be greatly appreciated.
Edit: I did have ReadToEnd() instead of ReadLines(), which got rid of the StringBuilder, but the overhead was still the same. Is there an alternative to StreamReader, even just to test with? It does seem like a GC cost, since a request every ten seconds does not trigger it, but the exact same request once per second will cause this overhead. Also, I am not able to reproduce it locally; it only happens in the virtual environment.
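One way to take StreamReader out of the picture for that test would be to copy the raw InputStream into a MemoryStream and decode it in one go. A rough sketch, using the same context object as the code above:

// Sketch only: bypasses StreamReader entirely, to check whether the text decoding is involved at all.
string requestBody;
using (var buffer = new MemoryStream())
{
    context.Request.InputStream.CopyTo(buffer);
    requestBody = context.Request.ContentEncoding.GetString(buffer.ToArray());
}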

This issue is not with the above code at all. The issue is with the caller: the calling service is using a library that cuts connections too early, and the overhead is the connection being re-established again.
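For anyone hitting the same symptom from the caller side: keeping the connection alive between requests is usually what removes that reconnect cost. A minimal sketch, assuming the caller uses HttpWebRequest (the URL is only a placeholder):

// Caller-side sketch: reuse the underlying TCP connection between requests
// so each call does not pay the connection re-establishment cost again.
var request = (HttpWebRequest)WebRequest.Create("http://example.com/endpoint"); // placeholder URL
request.KeepAlive = true;
request.ServicePoint.ConnectionLeaseTimeout = -1; // do not forcibly recycle the connection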

Related

SSH.NET Real-Time Logging High CPU

Let me preface this question by saying I'm absolutely not a pro C# programmer and have pretty much brute forced my way through most of my small programs so far.
I'm working on a small WinForms application to SSH into a few devices, tail -f a log file on each, and display the real-time output in TextBoxes while also saving to log files. Right now, it works, but hogs nearly 30% of my CPU during logging and I'm sure I'm doing something wrong.
After creating the SshClient and connecting, I run the tail command like so (these variables are part of a logger class which exists for each connection):
command = client.CreateCommand("tail -f /tmp/messages");
result = command.BeginExecute();
stream = command.OutputStream;
I then have a log reading/writing function:
public async Task logOutput(IAsyncResult result, Stream stream, TextBox textBox, string logPath)
{
    // Clear textbox (thread-safe :) )
    textBox.Invoke((MethodInvoker)(() => textBox.Clear()));

    // Create reader for stream and writer for text file
    StreamReader reader = new StreamReader(stream, Encoding.UTF8, true, 1024, true);
    StreamWriter sw = File.AppendText(logPath);

    // Start reading from SSH stream
    while (!result.IsCompleted || !reader.EndOfStream)
    {
        string line = await reader.ReadLineAsync();
        if (line != null)
        {
            // Append to textbox
            textBox.Invoke((Action)(() => textBox.AppendText(line + Environment.NewLine)));
            // Append to file
            sw.WriteLine(line);
        }
    }
}
Which I call the following way, per device connection:
Task.Run(() => logOutput(logger.result, logger.stream, textBox, fileName), logger.token);
Everything works fine, it's just the CPU usage that's the issue. I'm guessing I'm creating way more than one thread per logging process, but I don't know why or how to fix that.
Does anything stand out as a simple fix to the above code? Or even better - is there a way to set up a callback that only prints the new data when the result object gets new text?
All help is greatly appreciated!
EDIT 3/4/2021
I tried a simple test using CopyToAsync by changing the code inside logOutput() to the following:
public async Task logOutput(IAsyncResult result, Stream stream, string logPath)
{
    using (Stream fileStream = File.Open(logPath, FileMode.OpenOrCreate))
    {
        // While the result is running, copy everything from the command stream to a file
        while (!result.IsCompleted)
        {
            await stream.CopyToAsync(fileStream);
        }
    }
}
However this results in the text files never getting data written to them, and CPU usage is actually slightly worse.
2ND EDIT 3/4/2021
Doing some more debugging, it appears the high CPU usage occurs only when there's no new data coming in. As far as I can tell, this is because the ReadLineAsync() method is constantly firing regardless of whether or not there's actually new data from the SSH command that's running, and it's running as fast as possible hogging all the CPU cycles it can. I'm not entirely sure why that is though, and could really use some help here. I would've assumed that ReadLineAsync() would simply wait until a new line was available from the SSH command to continue.
The solution ended up being much simpler than I would've thought.
There's a known bug in SSH.NET where the command's OutputStream will continually spit out null data when there's no actual new data received. This causes the while loop in my code to run as fast as possible, consuming a bunch of CPU in the process.
The solution is simply to add a short asynchronous delay in the loop. I included the delay only when the received data is null, so that reading isn't interrupted when there's actual valid data coming through.
while (!result.IsCompleted && !token.IsCancellationRequested)
{
    string line = await reader.ReadLineAsync();

    // Skip (and wait briefly) when there's no valid data
    if (string.IsNullOrEmpty(line))
    {
        await Task.Delay(10); // prevents high CPU usage
        continue;
    }

    // Append line to textbox
    textBox.Invoke((Action)(() => textBox.AppendText(line + Environment.NewLine)));
    // Append line to file
    writer.WriteLine(line);
}
On a Ryzen 5 3600, this brought my CPU usage from ~30-40% while the program was running to less than 1% even when data is flowing. Much better.
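One further point, separate from the CPU fix: in the original method the StreamReader and StreamWriter are never disposed, so the log file may not be flushed when the loop ends. A sketch of the same loop wrapped in using blocks, assuming the same variables as above (leaveOpen: true keeps the underlying SSH stream itself open):

// Sketch: reader/writer are disposed, and the file flushed, when the loop exits.
using (var reader = new StreamReader(stream, Encoding.UTF8, true, 1024, leaveOpen: true))
using (var writer = File.AppendText(logPath))
{
    while (!result.IsCompleted && !token.IsCancellationRequested)
    {
        string line = await reader.ReadLineAsync();
        if (string.IsNullOrEmpty(line))
        {
            await Task.Delay(10); // prevents high CPU usage while idle
            continue;
        }
        textBox.Invoke((Action)(() => textBox.AppendText(line + Environment.NewLine)));
        writer.WriteLine(line);
    }
}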

Execute code after reader has finished reading ENTIRE message

I would like to achieve the following result.
After the HTTP message is fully read by the StreamReader, I want to get the host of the request (which I don't think will be an issue) and start a TCP client to that host.
Code I currently have
Since the comment is inside the while (true) loop, it loops. But I thought ReadLine() was blocking, so it would only get executed once.
Does anyone have any suggestions on how I could solve this matter?
Consider using the following pattern, which avoids the redundancy of peeking the StreamReader and only then using the value. There are reports on this site of people having the same issue with Peek().
while ((line = reader.ReadLine()) != null)
{
    // Use the line
}
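Once that loop exits (ReadLine() has returned null), the entire message has been read, so the follow-up work can go straight after it. A rough sketch along the lines of the question, assuming the Host header is captured while reading; the parsing and the port are only illustrative:

string line;
string host = null;
while ((line = reader.ReadLine()) != null)
{
    // Remember the Host header as it goes by (illustrative parsing only)
    if (line.StartsWith("Host:", StringComparison.OrdinalIgnoreCase))
        host = line.Substring("Host:".Length).Trim();
}

// The loop has ended, so the whole message has been read; connect onwards.
if (host != null)
{
    using (var client = new TcpClient())
    {
        client.Connect(host, 80); // port 80 assumed for illustration
        // ... forward the request or do whatever comes next
    }
}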

C# - Memory management in an app that is periodically calling HttpWebRequest and WebBrowser

About:
I have a Windows Forms application which, every 60 seconds, captures information from two common web pages, does some simple string processing on the result, and does something (or not) based on that result.
One of those sites doesn't have any protection, so I can easily get its HTML code using HttpWebRequest and its HttpWebResponse.GetResponseStream().
The other one has some code protection and I can't use the same approach. The solution was to use the WebBrowser class to select all the text on the site and copy it to the clipboard, as Jake Drew posted here (method 1).
Extra information:
When the timer reaches 1 minute, each method is executed asynchronously using a Task. At the end of each Task, the main thread searches for some information in those texts and may or may not take a decision based on the result. After this process, not even the captured text is relevant anymore. Basically, everything can be wiped from memory, since I'll capture everything fresh and process it again in about 1 minute.
Problem:
Everything is working fine, but the problem is that memory usage gradually increases (about 20 MB on each tick), which is unnecessary: as I said before, I don't need to keep more data in memory than the app had at the beginning of its execution.
After comparing two memory snapshots, I found these 3 objects that are apparently responsible for the excess memory usage (profiler screenshots not reproduced here).
So, even after I put the main execution in Tasks and did everything I could to help the garbage collector, I still have this issue.
What else could I do to avoid this issue, or to dump the trash from memory?
Edit:
Here's the code that is capturing the HTML of the page using HttpWebRequest:
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(URL);
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
{
    if (response.StatusCode == HttpStatusCode.OK)
    {
        Stream receiveStream = response.GetResponseStream();
        StreamReader readStream = null;

        if (response.CharacterSet == null)
        {
            readStream = new StreamReader(receiveStream);
        }
        else
        {
            readStream = new StreamReader(receiveStream, Encoding.GetEncoding(response.CharacterSet));
        }

        PB_value = readStream.ReadToEnd();
        readStream.Close(); // Ensure
    }
    response.Close(); // Ensure
}
Solved:
After some research I've found a solution. I actually feel kind of ashamed, because it is quite a simple solution that I hadn't tried before; still, it's important to share.
The first thing I did was create an event to signal when my two Tasks were finished, then I assigned two functions to this event. The first function forced the garbage collector (GC.Collect()). The second function disposed of the two Tasks, since all the main processing was done inside them (T.Dispose()). Then I got the result I wanted.
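For reference, a rough sketch of what that could look like; the method and helper names here are made up for illustration, not taken from the original code:

// Illustrative sketch of the approach described above: await both captures,
// process the results, then dispose the tasks and force a collection.
private async void CaptureTimer_Tick(object sender, EventArgs e)
{
    Task<string> plainSiteTask = Task.Run(() => CapturePlainSite());          // HttpWebRequest capture (hypothetical helper)
    Task<string> protectedSiteTask = Task.Run(() => CaptureProtectedSite());  // WebBrowser capture (hypothetical helper)

    await Task.WhenAll(plainSiteTask, protectedSiteTask);

    ProcessResults(plainSiteTask.Result, protectedSiteTask.Result);           // hypothetical helper

    // Nothing captured this tick is needed any longer.
    plainSiteTask.Dispose();
    protectedSiteTask.Dispose();
    GC.Collect();
    GC.WaitForPendingFinalizers();
}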

Out of Memory Exception when using File Stream Write Byte to Output Progress Through the Console

I have the following code, which throws an out-of-memory exception when writing large files. Is there something I'm missing?
I am not sure why it is throwing an out-of-memory error, as I thought the FileStream would only use a maximum of 4096 bytes for its buffer. To be honest, I am not entirely sure what is meant by the buffer, and any advice would be appreciated.
public static async Task CreateRandomFile(string pathway, int size, IProgress<int> prog)
{
    byte[] fileSize = new byte[size];
    new Random().NextBytes(fileSize);

    await Task.Run(() =>
    {
        using (FileStream fs = File.Create(pathway, 4096))
        {
            for (int i = 0; i < size; i++)
            {
                fs.WriteByte(fileSize[i]);
                prog.Report(i);
            }
        }
    });
}
public static void p_ProgressChanged(object sender, int e)
{
    int pos = Console.CursorTop;
    Console.WriteLine("Progress Copied: " + e);
    Console.SetCursorPosition(0, pos);
}

public static void Main()
{
    Console.WriteLine("Testing CopyLearning");
    //CopyFile()
    Progress<int> p = new Progress<int>();
    p.ProgressChanged += p_ProgressChanged;
    Task ta = CreateRandomFile(@"D:\Programming\Testing\RandomFile.asd", 99999999, p);
    ta.Wait();
}
Edit: the 99,999,999 was just chosen to create a ~99 MB file.
Note: if I comment out prog.Report(i), it works fine.
For some reason, the error seems to occur at the line
Console.WriteLine("Progress Copied: " + e);
I am not entirely sure why this causes an error. So might the error have been caused by the progress event?
Edit 2: I have followed advice to change the code so that it reports progress every 4000 bytes, using the following:
if (i % 4000 == 0)
    prog.Report(i);
For some reason, I am now able to write files of up to 900 MB just fine.
I guess the question is: why does the "Edit 2" code allow it to write up to 900 MB just fine? Is it because it's reporting progress and writing to the console up to 4000x less often than before? I didn't realize the console would take up so much memory, especially since I'm assuming all it's doing is outputting "Progress Copied".
Edit 3:
For some reason, when I change the loop as follows:
for (int i = 0; i < size; i++)
{
    fs.WriteByte(fileSize[i]);
    Console.WriteLine(i);
    prog.Report(i);
}
where there is a Console.WriteLine() before prog.Report(i), it works fine and copies the file, albeit taking a very long time to do so. This leads me to believe this is a console-related issue for some reason, but I am not sure what.
fs.WriteByte(fileSize[i]);
prog.Report(i);
You created a fire-hose problem. After deadlocks and threading races, probably the 3rd most likely problem caused by threads. And just as hard to diagnose.
Easiest to see by using the debugger's Debug + Windows + Threads window and looking at the thread that is executing CreateRandomFile(). With some luck, you'll see it is completed and has written all 99MB bytes. But the progress reported on the console is far behind this, having only reported 125KB bytes written, give or take.
Core issue is the way Progress<>.Report() works. It uses SynchronizationContext.Post() to invoke the ProgressChanged event handler. In a console mode app that will call ThreadPool.QueueUserWorkItem(). That's quite fast, your CreateRandomFile() method won't be bogged down much by it.
But the event handler itself is quite a lot slower, console output is not very fast. So in effect, you are adding threadpool work requests at an enormous rate, 99 million of them in a handful of seconds. No way for the threadpool scheduler to keep up, you'll have roughly 4 of them executing at the same time. All competing to write to the console as well, only one of them can acquire the underlying lock.
So it is the threadpool scheduler that causes OOM, forced to store so many work requests.
And sure, when you call Report() less frequently, the fire-hose problem is a lot less severe. It is not actually that simple to ensure it never causes a problem, although directly calling Console.Write() is an obvious fix. Ultimately it is simple: create a usable UI that is useful to a human. Nobody likes a crazily scrolling window or a blur of text. Reporting progress no more frequently than 20 times per second is plenty good enough for the user's eyes, and the console has no trouble keeping up with that.
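Putting those two points together (write in chunks instead of byte-by-byte, and report progress sparingly), here is a hedged sketch of what the inner loop could look like; the chunk size and the 50 ms report interval are arbitrary choices, and the variables are the ones from the question's code:

// Sketch: write the buffer in chunks and report progress at most ~20 times per second.
using (FileStream fs = File.Create(pathway, 4096))
{
    const int chunkSize = 64 * 1024;
    var lastReport = Stopwatch.StartNew();

    for (int offset = 0; offset < size; offset += chunkSize)
    {
        int count = Math.Min(chunkSize, size - offset);
        fs.Write(fileSize, offset, count);

        if (lastReport.ElapsedMilliseconds >= 50)
        {
            prog.Report(offset + count);
            lastReport.Restart();
        }
    }

    prog.Report(size); // final report so the console shows completion
}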

StreamReader ReadLine returns null when not EOF

Having a strange problem that I've never encountered nor heard of happening. It seems that occasionally, the ReadLine() function of the StreamReader class will return NULL, as if it's at the end of the file, BUT it's not.
My log file indicates that everything is happening just as if it had actually reached the end of the file, but yet it's only processing part of it. There doesn't appear to be any consistency, because if I restart the process from scratch, the whole file is processed without incident.
Clearly, there is nothing funky in the file itself, or it would do this on the same line each time, plus it has happened with a few different files, and each time they are re-run, it works fine.
Anyone run across anything similar, or have any suggestions on what might be causing such a thing?
Thanks,
Andrew
Sample:
line = _readerStream.ReadLine();
if (null != line)
{
    eventRetVal = loadFileLineEvent(line);
}
else
{
    // do some housecleaning and log file completed
}
_readerStream is the stream which has been opened elsewhere.
loadFileLineEvent is a delegate that gets passed in which processes the line. This has its own error handling (with logging), so there's no issue in there.
The routine above (not shown in its entirety) has error handling around it also (with logging), which is not being triggered either.
It's getting to the "else" and logging that it reached the end of the file, but it's obvious from the number of records I got that it didn't.
Have you tried a more traditional approach to reading the stream? This way you're checking for the end of the stream before reading the next, potentially empty/null, line. It seems like your code should work, but a null exception could possibly be thrown for trying to read a line that doesn't exist (not sure if StreamReader throws for that, though).
using (StreamReader SR = new StreamReader(OFD.FileName))
{
    while (!SR.EndOfStream)
    {
        string CurrentLine = SR.ReadLine();
        var eventRetVal = loadFileLineEvent(CurrentLine);
    }
}
