How to properly open and read from a StorageFile multiple times? - c#

In Windows Phone 8.1 (WinRT) I'm grabbing a file from the user's documents folder and trying to read through it twice: once to count the total lines for progress-tracking purposes, and a second time to actually parse the data. However, on the second pass I get a "File is not readable" type error. So I have a small understanding of what's going on, but not entirely. Am I getting this error because the stream is already at the end of the file? Can't I just open a new stream from the same file, or do I have to close the first stream?
Here's my code:
public async Task UploadBerData(StorageFile file)
{
    _csvParser = new CsvParser();
    var stream = await file.OpenAsync(FileAccessMode.Read);
    using (var readStream = stream.AsStreamForRead())
    {
        dataCount = _csvParser.GetDataCount(stream.AsStreamForRead());
        // Set the progressBar total to 2x dataCount:
        // once for reading, twice for uploading data.
        TotalProgress = dataCount * 2;
        CurrentProgress = 0;
    }
    var csvData = _csvParser.GetFileData(stream.AsStreamForRead());
    ...
}

After reading the stream, its position is at the end of the stream.
You can set it back to the beginning to read the stream again.
Add the following line before your parse-data call:
stream.Position = 0;
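Applied to the question's code, the fix might look like the sketch below (the parser calls are the question's own; only the rewind is new):

public async Task UploadBerData(StorageFile file)
{
    _csvParser = new CsvParser();
    var stream = await file.OpenAsync(FileAccessMode.Read);
    using (var readStream = stream.AsStreamForRead())
    {
        dataCount = _csvParser.GetDataCount(readStream);
        TotalProgress = dataCount * 2;
        CurrentProgress = 0;

        // The first pass left the stream at the end of the file,
        // so rewind it before parsing.
        readStream.Position = 0;
        var csvData = _csvParser.GetFileData(readStream);
        // ...
    }
}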

Related

How can I save a stream of Json data to a text file in C# Windows Forms app?

I've got a stream of data incoming as JSON and I'm trying to save it to a text file. I've got it working below; however, when I check the file, it only has the last JSON message received saved. I'm trying to get it so that once it saves a line it goes onto a new line and prints the latest JSON message below. At the moment it will print, let's say, 1000 lines, but they are all the same and they all match the latest JSON received.
Any help would be much appreciated.
void ReceiveData() // This function is used to listen for messages from the flight simulator
{
    while (true)
    {
        NetworkStream stream = client.GetStream(); // sets the network stream to the client's stream
        byte[] buffer = new byte[256]; // defines the max amount of bytes that can be sent
        int bytesRead = stream.Read(buffer, 0, buffer.Length);
        if (bytesRead > 0)
        {
            string jsonreceived = Encoding.ASCII.GetString(buffer, 0, bytesRead); // converts the received data into ASCII for the json variable
            JavaScriptSerializer serializer = new JavaScriptSerializer();
            TelemetryUpdate telemetry = serializer.Deserialize<TelemetryUpdate>(jsonreceived);
            this.Invoke(new Action(() =>
            {
                TelemetryReceivedLabel.Text = jsonreceived;
            }));
            Updatelabels(telemetry); // runs the update labels function with the telemetry data as an argument
            File.Delete(@"c:\temp\BLACKBOX.txt"); // this deletes the original file
            string path = @"c:\temp\BLACKBOX.txt"; // this stores the path of the file in a string
            using (StreamWriter sw = File.CreateText(path)) // create a file to write to
            {
                for (int i = 0; i < 10000; i++)
                {
                    sw.Write(jsonreceived.ToString()); // writes the json data to the file
                }
            }
        }
    }
}
As per the .NET documentation for File.CreateText:
Creates or opens a file for writing UTF-8 encoded text. If the file already exists, its contents are overwritten.
So, every time you call File.CreateText you're creating a new StreamWriter that's going to overwrite the contents of your file. Try using File.AppendText instead to pick up where you left off.
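For example, a minimal sketch of the append-based approach (same path as in the question; WriteLine puts each message on its own line):

string path = @"c:\temp\BLACKBOX.txt";
using (StreamWriter sw = File.AppendText(path)) // opens the file, or creates it if missing
{
    sw.WriteLine(jsonreceived); // WriteLine adds the newline between messages
}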

Skip First Row (CSV Header Row) of HttpResponseMessage Content.ReadAsStream

Below is a simplified example of a larger piece of code. Basically I'm calling one or more API endpoints and downloading a CSV file that gets written to an Azure blob container. If there are multiple files, the blob is appended to for every new CSV file loaded.
The issue is that when I append the target blob I end up with multiple header rows scattered throughout the file, depending on how many CSVs I consumed. All the CSVs have the same header row, and I know the first row will always have a line feed. Is there a way to read the stream, skip the content until after the first line feed, and then copy the rest of the stream to the blob?
It seemed simple in my head, but I'm having trouble finding my way there code-wise. I don't want to wait for the whole file to download and then delete the header row in memory, since some of these files can be several gigabytes.
I am using .NET Core v6 if that helps.
using Stream blobStream = await blockBlobClient.OpenWriteAsync(true);
{
    for (int i = 0; i < 3; i++)
    {
        using HttpResponseMessage response = await client.GetAsync(downloadUrls[i], HttpCompletionOption.ResponseHeadersRead);
        Stream sourceStream = response.Content.ReadAsStream();
        sourceStream.CopyTo(blobStream);
    }
}
.CopyTo copies from the current position in the stream. So all you need to do is throw away the characters up to and including the first line feed.
using Stream blobStream = await blockBlobClient.OpenWriteAsync(true);
{
    for (int i = 0; i < 3; i++)
    {
        using HttpResponseMessage response = await client.GetAsync(downloadUrls[i], HttpCompletionOption.ResponseHeadersRead);
        Stream sourceStream = response.Content.ReadAsStream();
        if (i != 0)
        {
            // Skip everything up to and including the first line feed.
            // ReadByte returns -1 at end of stream, so guard against a
            // file that contains no newline at all.
            int b;
            do { b = sourceStream.ReadByte(); } while (b != -1 && b != '\n');
        }
        sourceStream.CopyTo(blobStream);
    }
}
If all the files always have the same size header row, you can come up with a constant for its length. That way you could just skip the stream to the exact correct location like this:
using Stream blobStream = await blockBlobClient.OpenWriteAsync(true);
{
    for (int i = 0; i < 3; i++)
    {
        using HttpResponseMessage response = await client.GetAsync(downloadUrls[i], HttpCompletionOption.ResponseHeadersRead);
        Stream sourceStream = response.Content.ReadAsStream();
        if (i != 0)
            sourceStream.Seek(HeaderSizeInBytes, SeekOrigin.Begin);
        sourceStream.CopyTo(blobStream);
    }
}
This will be slightly quicker but does have the downside that the files can't change format easily in the future.
P.S. You probably want to dispose of sourceStream, either directly or by wrapping its creation in a using statement.
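For example, a using declaration (the same pattern the snippets above already use for response) is enough:

using Stream sourceStream = response.Content.ReadAsStream(); // disposed when it goes out of scope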
If we can assume that the stream contains UTF-8 encoded text, then you can do the following:
Create a StreamReader over sourceStream:
var reader = new StreamReader(sourceStream);
Read the first line (assuming lines end with a newline):
var header = reader.ReadLine();
Convert the first line plus a newline to a byte array:
var headerInBytes = Encoding.UTF8.GetBytes(header + Environment.NewLine);
Set the position to just after the first line:
sourceStream.Position = headerInBytes.Length;
Copy the source stream from the desired position:
sourceStream.CopyTo(blobStream);
This proposed solution is just an example; depending on the actual stream content you might need to adjust it further and make it more robust.
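Put together, the steps above form this sketch (UTF-8 text assumed; note that setting Position requires a seekable stream, so if the raw response stream doesn't support seeking you would need to buffer it first):

var reader = new StreamReader(sourceStream);
var header = reader.ReadLine();
var headerInBytes = Encoding.UTF8.GetBytes(header + Environment.NewLine);
sourceStream.Position = headerInBytes.Length; // skip past the header row
sourceStream.CopyTo(blobStream);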

Filestream and datagridview memory issue with CsvHelper

TL;DR
Reading and modifying flat files within memory before passing to CsvHelper to process as normal (within stream)
Process works fine when tested on ~32k records, run multiple times
Process works only once when run on 5m+ records, then fails if you try to run it a second time
System.OutOfMemoryException error thrown
Linked to this post:
CsvHelper - Set the header row and data row
New question, since I've come up with a potential solution that deviates from the original post, but am now facing a different issue.
So I amended the test sample data as follows (I added a pipe in row 7):
This is a random line in the file
SOURCE_ID|NAME|START_DATE|END_DATE|VALUE_1|VALUE_2
Another random line in the file
|
GILBER|FRED|2019-JAN-01|2019-JAN-31|ABC|DEF
ALEF|ABC|2019-FEB-01|2019-AUG-31|FBC|DGF
GILBER|FRED|2019-JAN-01|2019-JAN-31|ABC|TEF
FLBER|RED|2019-JUN-01|2019-JUL-31|AJC|DEH
GI|JOE|2020-APR-01|2020-DEC-31|GBC|DER
I decided to try and manipulate the inbound file in memory and then pass that stream into CsvHelper to process.
I ended up with the below code:
// Using BufferedStream for speed:
// https://stackoverflow.com/questions/2161895/reading-large-text-files-with-streams-in-c-sharp
// Read from memory stream:
// https://stackoverflow.com/questions/1232443/writing-to-then-reading-from-a-memorystream
int header_row = 3; // row the header is on
int data_row = 10;  // row the data starts from
using (FileStream fs = File.Open(filepath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (BufferedStream bs = new BufferedStream(fs))
using (var stream = new MemoryStream())
using (StreamWriter sw = new StreamWriter(stream))
using (StreamReader sr = new StreamReader(bs))
{
    string line;
    int i = 0;
    while ((line = sr.ReadLine()) != null)
    {
        i++;
        if (i < header_row) // ignore lines before the header row
            continue;
        if (i > header_row && i < data_row) // ignore lines between the header row and the start of the data
            continue;

        // write to the memory stream if all conditions pass
        sw.WriteLine(line);
    }
    sw.Flush();
    stream.Position = 0; // reset position

    // continue using CsvHelper as before, feeding in the 'stream' from memory rather than a file
    using (var reader = new StreamReader(stream))
    using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
    {
        csv.Configuration.Delimiter = "|"; // set delimiter

        // Load csv to datatable and set dgv source
        using (var dr = new CsvDataReader(csv))
        {
            var dt = new DataTable();
            dt.Load(dr);
            dgvTst04_View.DataSource = dt; // EXCEPTION IS THROWN HERE
        }
    }
}
And I get the below result in the datagridview:
[screenshot: sample file test result]
So this works!!
But when I try to implement the same code on a CSV file with 5m+ records, it runs okay once (~24s, which is about the same as importing it directly into CsvHelper with no other pre-manipulation). But when I try to run it a second time it throws a System.OutOfMemoryException.
For context, I have 64GB of memory and the process seems to peak at 2GB usage (but it doesn't drop back down). So I feel like the 'using' is not disposing of the memory/variables correctly, as I had assumed usage would come back down after running. Before and after screenshots of diagnostics below:
Before running:
[screenshot: diagnostics before run]
After running:
[screenshot: diagnostics after run]
Am I not handling the variables correctly in my code, or not disposing of them? Although I thought that if I use 'using' I shouldn't have to dispose of them manually.
Additional info:
I ran the same code on a file with 32k+ rows of data multiple times within the same session (10+), with a similar header/data row structure, and it runs in 27 milliseconds on average with no System.OutOfMemoryException thrown.
Let me know if you would like the 5m-record sample file (it's a sample file that I found online on the NZ government's website, so it's public information).
thanks!

Stream.CopyToAsync is empty after first iteration

Background: I need to relay the content of the request to multiple other servers (via client.SendAsync(request)).
Problem: After the first iteration the content stream is empty
[HttpPost]
public async Task<IActionResult> PostAsync() {
    for (var n = 0; n <= 1; n++) {
        using (var stream = new MemoryStream()) {
            await Request.Body.CopyToAsync(stream);
            // why is stream.Length == 0 in the second iteration?
        }
    }
    return StatusCode((int)HttpStatusCode.OK);
}
Streams have a pointer indicating at which position the stream is; after copying it, the pointer is at the end. You need to rewind a stream by setting its position to 0.
This is however only supported in streams that support seeking. You can read the request stream only once. This is because it's read "from the wire", and therefore doesn't support seeking.
When you want to copy the request stream to multiple output streams, you have two options:
Forward while you read
Read once into memory, then forward at will
The first option means all forwards happen at the same speed; the entire transfer goes as slow as the input, or as slow as the slowest reader. You read a chunk from the caller, and forward that chunk to all forward addresses.
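A minimal sketch of that first option (here targets is a hypothetical collection of destination streams, e.g. the bodies of the outgoing forward requests):

var buffer = new byte[81920];
int bytesRead;
while ((bytesRead = await Request.Body.ReadAsync(buffer, 0, buffer.Length)) > 0)
{
    foreach (Stream target in targets)
        await target.WriteAsync(buffer, 0, bytesRead); // forward each chunk before reading the next
}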
For the second approach, you'll want to evaluate whether you can hold the entire request body plus the body for each forward address in memory. If that's not expected to be a problem and is properly configured with sensible limits, then simply copy the request stream to a single MemoryStream once, then rewind and copy that one for every call:
using (var bodyStream = new MemoryStream())
{
    await Request.Body.CopyToAsync(bodyStream);
    for (...)
    {
        using (var stream = new MemoryStream())
        {
            // Rewind before each copy; bodyStream is positioned at its
            // end after the previous CopyToAsync.
            bodyStream.Position = 0;
            await bodyStream.CopyToAsync(stream);
        }
    }
}
I found out that CopyToAsync leaves the source stream's position at the last read position. The next time I use CopyToAsync, the stream starts reading from that position and finds no more content. However, I could not use Request.Body.Position = 0 since it is not supported. I ended up copying the stream once more and resetting the position for each copy.
If someone knows a cleaner solution, you are welcome to point it out.
using (var contentStream = new MemoryStream()) {
    await Request.Body.CopyToAsync(contentStream);
    for (var n = 0; n <= 1; n++) {
        using (var stream = new MemoryStream()) {
            contentStream.Position = 0;
            await contentStream.CopyToAsync(stream);
            // works
        }
    }
}

How to read text file from memorystream without missing bytes

I am writing some code to learn new C# async design patterns, so I thought I'd write a small Windows Forms program that counts the lines and words of text files and displays the reading progress.
To avoid disk swapping, I read files into a MemoryStream and then build a StreamReader over it to read the text line by line and count.
The issue is I can't update the progress bar correctly.
I read a file, but there are always bytes missing, so the progress bar doesn't fill entirely.
I need a hand or an idea to achieve this. Thanks.
public async Task Processfile(string fname)
{
    MemoryStream m;
    fname.File2MemoryStream(out m); // custom extension which reads the file into a MemoryStream
    int flen = (int)m.Length;       // store file length
    string line = string.Empty;     // used later to read lines from the StreamReader
    int linelen = 0;                // bytes in the current line
    int readed = 0;                 // total bytes read
    progressBar1.Minimum = 0;       // progress bar bound to the WinForms UI
    progressBar1.Maximum = flen;
    using (StreamReader sr = new StreamReader(m)) // build StreamReader from the MemoryStream
    {
        while (!sr.EndOfStream) // tried ( line = await sr.ReadLineAsync() ) != null
        {
            line = await sr.ReadLineAsync();
            await Task.Run(() =>
            {
                linelen = Encoding.UTF8.GetBytes(line).Length; // get & update bytes read
                readed += linelen;
                Report(new Tuple<int, int>(flen, readed)); // custom function implementing IProgress
                                                           // to feed the progress bar
            });
        }
    }
    m.Close(); // releases the MemoryStream
    m = null;
}
The total length being assigned to flen includes the line terminators of each line, but the string returned by ReadLineAsync() does not include them. My guess is that the number of missing bytes in your progress bar is directly proportional to the number of line terminators in the file being read.
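Based on that, a minimal adjustment to the question's loop might be the following sketch; it assumes two-byte "\r\n" line endings (a file with "\n"-only endings would need + 1 instead):

linelen = Encoding.UTF8.GetByteCount(line) + 2; // line bytes + the stripped "\r\n"
readed += linelen;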
