Slow ReadLine by `\n` on BinaryReader - C#

I am using a BinaryReader to read a file and split it on newline (\n) into ReadOnlySpan<byte>. For context, I want bytes and not strings because I am using Utf8JsonReader and trying to avoid copying from string to byte array.
The large buffer is deliberate - 16 kB is fine for the application, and the file is processed one buffer at a time.
However, compared to File.ReadAllBytes(filename), which completes in 1 second, the code below takes 30+ seconds on the same machine.
I naively assumed BinaryReader would read forward and cache in advance - that seems not to be the case, or at least there are no flags to enable it (I can't seem to find any).
How can I improve my performance, or implement the line splitting via an alternative class?
static void Main(string[] args)
{
    using var fileStream = File.Open(args[0], FileMode.Open);
    using (var reader = new BinaryReader(fileStream))
    {
        var i = 0;
        ReadOnlySpan<byte> line = null;
        while ((line = reader.ReadLine()) != null)
        {
            // Process the line here, one at a time.
            i++;
        }
        Console.WriteLine("Read line " + i);
    }
}
public static class BinaryReaderExtensions
{
    public static ReadOnlySpan<byte> ReadLine(this BinaryReader reader)
    {
        if (reader.IsEndOfStream())
            return null;

        // Buffer size is deliberate, we process one line at a time.
        var buffer = new byte[16384];
        var i = 0;
        while (!reader.IsEndOfStream() && i < buffer.Length)
        {
            if ((buffer[i] = reader.ReadByte()) == '\n')
                return new ReadOnlySpan<byte>(buffer, 0, i + 1);
            i++;
        }
        return null;
    }

    public static bool IsEndOfStream(this BinaryReader reader)
    {
        return reader.BaseStream.Position == reader.BaseStream.Length;
    }
}
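The likely cost here is that every byte pays for three calls: ReadByte plus two IsEndOfStream checks, and FileStream.Length can query the OS each time it is read. A minimal sketch of one alternative, reading the file in 16 kB chunks and splitting with Span.IndexOf - this assumes, as the question states, that no line exceeds the buffer:

using System;
using System.IO;

static class FastLineSplit
{
    static void Main(string[] args)
    {
        using var fs = File.OpenRead(args[0]);

        var buffer = new byte[16384];   // same 16 kB assumption as the question
        int carried = 0;                // bytes of an incomplete line kept from the previous read
        var lines = 0;

        int read;
        while ((read = fs.Read(buffer, carried, buffer.Length - carried)) > 0)
        {
            var chunk = new ReadOnlySpan<byte>(buffer, 0, carried + read);

            int nl;
            while ((nl = chunk.IndexOf((byte)'\n')) >= 0)
            {
                ReadOnlySpan<byte> line = chunk.Slice(0, nl);
                // Process 'line' here, e.g. hand it to Utf8JsonReader.
                lines++;
                chunk = chunk.Slice(nl + 1);
            }

            // Keep the trailing partial line; Span.CopyTo handles the overlap.
            chunk.CopyTo(buffer);
            carried = chunk.Length;
        }

        if (carried > 0) lines++;       // final line with no trailing '\n'
        Console.WriteLine("Read lines: " + lines);
    }
}

The carry-over logic moves any partial line to the front of the buffer between reads, so each byte is touched once instead of three times.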

Related

How to remove lag during data fetching from large text files from my code in C#?

I have a text file consisting of 21,000 lines. I have an attribute that I need to search for in the .txt file, and I need to return a value from the same line too. All the code is done; I tried async plus a new thread, but there is a five-second lag on the button click. How can I remove the lag?
Tried on current Unity and C#.
public async void read()
{
    string[] lines = await ReadAllLinesAsync("Assets/Blockchain Module/" + File + ".csv");
    fields = null;
    for (int j = 0; j < lines.Length; j++)
    {
        fields = lines[j].Split(',');
        x[j] = System.Convert.ToDouble(fields[1]);
        y[j] = System.Convert.ToDouble(fields[2]);
        z[j] = System.Convert.ToDouble(fields[3]);
        temp[j] = System.Convert.ToDouble(fields[4]);
    }
}

public void Start()
{
    Thread thread = new Thread(read);
    thread.Start();
    //gradient.Evaluate()
    //var main = particleSystem.main;
    //main.maxParticles = 200;
}
private const int DefaultBufferSize = 4096; // assumed value; the original snippet did not define this constant
private const FileOptions DefaultOptions = FileOptions.Asynchronous | FileOptions.SequentialScan;

public static Task<string[]> ReadAllLinesAsync(string path) => ReadAllLinesAsync(path, Encoding.UTF8);

public static async Task<string[]> ReadAllLinesAsync(string path, Encoding encoding)
{
    var lines = new List<string>();

    // Open the FileStream with the same FileMode, FileAccess
    // and FileShare as a call to File.OpenText would've done.
    using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read, DefaultBufferSize, DefaultOptions))
    using (var reader = new StreamReader(stream, encoding))
    {
        string line;
        while ((line = await reader.ReadLineAsync()) != null)
        {
            lines.Add(line);
        }
    }

    return lines.ToArray();
}
Reading and parsing the file apparently costs 5 seconds on your system. I don't think reading it line by line is the fastest approach, but in any case, don't parse the file for each request.
Read it once on application startup, and cache it in an appropriate data type.
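As a minimal sketch of that caching idea (assuming the attribute being searched is the first CSV field and is unique per line - both assumptions, since the question doesn't show the file layout):

using System.Collections.Generic;
using System.IO;
using System.Linq;

public class CsvCache
{
    // Maps the first CSV field (the "attribute" being searched) to its parsed values.
    private readonly Dictionary<string, double[]> _rows;

    public CsvCache(string path)
    {
        // Parse once, at startup, instead of on every button click.
        _rows = File.ReadLines(path)
            .Select(line => line.Split(','))
            .ToDictionary(
                fields => fields[0],
                fields => fields.Skip(1).Select(double.Parse).ToArray());
    }

    // Lookups are now O(1) dictionary hits rather than a file scan.
    public bool TryGetValues(string attribute, out double[] values)
        => _rows.TryGetValue(attribute, out values);
}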
In general, if your search is line-based, it is better to read the lines one by one instead of reading the whole file:
using (StreamReader reader = new StreamReader("filename"))
{
    while (true)
    {
        string line = await reader.ReadLineAsync();
        if (line == null)
        {
            break;
        }
        //logic here...
    }
}

ReadingLine in SR, and accessing lastline with index -1

I am currently writing a StreamReader tool and have the following problem: I read a line with ReadLine(); the stream then continues with the next line. But I need information about the last character of the line before (specifically, whether it is a NewLine or a LineFeed).
This is my approach. I tried several variants, with ReadBlock and so on, but it seems that the stream itself does not allow me to go back to the position needed to parse the elements:
off = 0;
FileStream stream = new FileStream(filename, FileMode.Open);
using (StreamReader content = new StreamReader(stream, Encoding.UTF8))
{
    String s = "";
    while ((s = content.ReadLine()) != null)
    {
        content.BaseStream.Seek((a == 0) ? off : off - 1, SeekOrigin.Begin);
        //content.BaseStream.Seek(off, SeekOrigin.Current);
        var c = content.Peek();
        char b = (char)c;
        data = s;
        maxlist.Add(data.Length);
        if (data != null)
        {
            offset = offset + (data.Length) + 2;
            offsetindex.Add(offset);
        }
        a++;
        off = off + data.Length - 2;
    }
    content.Close();
}
The expected output is that, after ReadLine is called, I can still access the line above, so that I can read its last elements with ReadBlock for exact positioning in the stream.
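The underlying problem is that StreamReader buffers ahead, so seeking its BaseStream while reading through the reader desynchronizes the two, and ReadLine() discards the terminator anyway. A minimal sketch of one alternative - reading the bytes directly and reporting each line together with its terminator (this assumes an ASCII-compatible encoding such as UTF-8):

using System.Collections.Generic;
using System.IO;
using System.Text;

static IEnumerable<(string Line, string Terminator)> ReadLinesWithTerminators(Stream source)
{
    var bytes = new MemoryStream();
    int b;
    while ((b = source.ReadByte()) >= 0)
    {
        if (b == '\n')
        {
            // Peel a preceding CR off the line so CRLF is reported as one terminator.
            var buf = bytes.GetBuffer();
            int len = (int)bytes.Length;
            bool crlf = len > 0 && buf[len - 1] == '\r';
            yield return (
                Encoding.UTF8.GetString(buf, 0, crlf ? len - 1 : len),
                crlf ? "\r\n" : "\n");
            bytes.SetLength(0);
        }
        else
        {
            bytes.WriteByte((byte)b);
        }
    }
    // Final line with no terminator at all.
    if (bytes.Length > 0)
        yield return (Encoding.UTF8.GetString(bytes.GetBuffer(), 0, (int)bytes.Length), "");
}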

How do I replicate the functionality of tail -f in C# [duplicate]

I want to read a file continuously, like GNU tail with the "-f" param. I need it to live-read a log file.
What is the right way to do it?
A more natural approach, using FileSystemWatcher:
var wh = new AutoResetEvent(false);
var fsw = new FileSystemWatcher(".");
fsw.Filter = "file-to-read";
fsw.EnableRaisingEvents = true;
fsw.Changed += (s, e) => wh.Set();

var fs = new FileStream("file-to-read", FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
using (var sr = new StreamReader(fs))
{
    var s = "";
    while (true)
    {
        s = sr.ReadLine();
        if (s != null)
            Console.WriteLine(s);
        else
            wh.WaitOne(1000);
    }
}
wh.Close();
Here the main reading cycle stops to wait for incoming data, and FileSystemWatcher is used just to wake the main reading cycle up.
You want to open a FileStream in binary mode. Periodically, seek to the end of the file minus 1024 bytes (or whatever), then read to the end and output. That's how tail -f works.
Answers to your questions:
Binary because it's difficult to randomly access the file if you're reading it as text. You have to do the binary-to-text conversion yourself, but it's not difficult. (See below)
1024 bytes because it's a nice convenient number, and should handle 10 or 15 lines of text. Usually.
Here's an example of opening the file, reading the last 1024 bytes, and converting it to text:
static void ReadTail(string filename)
{
    using (FileStream fs = File.Open(filename, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
    {
        // Seek 1024 bytes from the end of the file
        fs.Seek(-1024, SeekOrigin.End);

        // read 1024 bytes
        byte[] bytes = new byte[1024];
        fs.Read(bytes, 0, 1024);

        // Convert bytes to string
        string s = Encoding.Default.GetString(bytes);
        // or string s = Encoding.UTF8.GetString(bytes);

        // and output to console
        Console.WriteLine(s);
    }
}
Note that you must open with FileShare.ReadWrite, since you're trying to read a file that's currently open for writing by another process.
Also note that I used Encoding.Default, which in US/English and for most Western European languages will be an 8-bit character encoding. If the file is written in some other encoding (like UTF-8 or another Unicode encoding), it's possible that the bytes won't convert correctly to characters. You'll have to handle that by determining the encoding if you think this will be a problem. Search Stack Overflow for info about determining a file's text encoding.
If you want to do this periodically (every 15 seconds, for example), you can set up a timer that calls the ReadTail method as often as you want. You could optimize things a bit by opening the file only once at the start of the program. That's up to you.
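As a minimal sketch of that periodic variant (the 15-second interval is just the figure mentioned above, and ReadTail is the method from this answer):

using System;
using System.Threading;

class TailPoller
{
    static void Main(string[] args)
    {
        // Call ReadTail every 15 seconds; the first call fires immediately.
        using (var timer = new Timer(_ => ReadTail(args[0]), null,
                                     dueTime: TimeSpan.Zero,
                                     period: TimeSpan.FromSeconds(15)))
        {
            Console.WriteLine("Press Enter to stop tailing.");
            Console.ReadLine();
        }
    }

    static void ReadTail(string filename)
    {
        // Body from the answer above (seek to end - 1024, read, print).
    }
}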
To continuously monitor the tail of the file, you just need to remember the length of the file before.
public static void MonitorTailOfFile(string filePath)
{
    var initialFileSize = new FileInfo(filePath).Length;
    var lastReadLength = initialFileSize - 1024;
    if (lastReadLength < 0) lastReadLength = 0;

    while (true)
    {
        try
        {
            var fileSize = new FileInfo(filePath).Length;
            if (fileSize > lastReadLength)
            {
                using (var fs = new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
                {
                    fs.Seek(lastReadLength, SeekOrigin.Begin);
                    var buffer = new byte[1024];
                    while (true)
                    {
                        var bytesRead = fs.Read(buffer, 0, buffer.Length);
                        lastReadLength += bytesRead;
                        if (bytesRead == 0)
                            break;
                        var text = ASCIIEncoding.ASCII.GetString(buffer, 0, bytesRead);
                        Console.Write(text);
                    }
                }
            }
        }
        catch { }
        Thread.Sleep(1000);
    }
}
I had to use ASCIIEncoding, because this code isn't smart enough to cater for the variable character lengths of UTF-8 on buffer boundaries.
Note: You can change the Thread.Sleep part to different timings, and you can also link it with a file watcher and a blocking pattern - Monitor.Enter/Wait/Pulse. For me the timer is enough; at most it only checks the file length every second, if the file hasn't changed.
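If UTF-8 output is needed, the standard fix is a stateful Decoder, which carries an incomplete multi-byte sequence from one buffer to the next instead of corrupting it. A minimal sketch of that replacement for the ASCII conversion above:

using System;
using System.IO;
using System.Text;

static void CopyToConsoleUtf8(Stream fs)
{
    // Create the decoder once, outside the read loop; it remembers any
    // incomplete multi-byte sequence left at the end of a buffer.
    Decoder decoder = Encoding.UTF8.GetDecoder();
    var buffer = new byte[1024];
    var chars = new char[Encoding.UTF8.GetMaxCharCount(buffer.Length)];

    int bytesRead;
    while ((bytesRead = fs.Read(buffer, 0, buffer.Length)) > 0)
    {
        int charCount = decoder.GetChars(buffer, 0, bytesRead, chars, 0);
        Console.Write(chars, 0, charCount);
    }
}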
This is my solution:
static IEnumerable<string> TailFrom(string file)
{
    using (var reader = File.OpenText(file))
    {
        while (true)
        {
            string line = reader.ReadLine();

            // If the file was truncated, start over from the beginning.
            if (reader.BaseStream.Length < reader.BaseStream.Position)
                reader.BaseStream.Seek(0, SeekOrigin.Begin);

            if (line != null) yield return line;
            else Thread.Sleep(500);
        }
    }
}
So, in your code you can do:
foreach (string line in TailFrom(file))
{
    Console.WriteLine($"line read= {line}");
}
You could use the FileSystemWatcher class, which can send notifications for different events happening on the file system, like a file being changed.
private void button1_Click(object sender, EventArgs e)
{
    if (folderBrowserDialog.ShowDialog() == DialogResult.OK)
    {
        path = folderBrowserDialog.SelectedPath;
        fileSystemWatcher.Path = path;
        string[] str = Directory.GetFiles(path);
        string line;
        fs = new FileStream(str[0], FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
        tr = new StreamReader(fs);
        while ((line = tr.ReadLine()) != null)
        {
            listBox.Items.Add(line);
        }
    }
}

private void fileSystemWatcher_Changed(object sender, FileSystemEventArgs e)
{
    string line;
    line = tr.ReadLine();
    listBox.Items.Add(line);
}
If you are just looking for a tool to do this, then check out the free version of BareTail.

Convert hex string to bytearray and write to file

I wrote an app in C# to read data from a serial port and show the data in a textbox in hex string format. Finally, I save all the data to a binary file. If the data is big (maybe > 20 MB), it throws an out-of-memory error. How can I solve this? Here is my code:
private void btn_Save_Click(object sender, EventArgs e)
{
    SaveFileDialog save_log = new SaveFileDialog();
    save_log.DefaultExt = ".bin";
    save_log.Filter = "Binary File (*.bin)|*.bin";

    // Determine if the user selected a file name from the saveFileDialog.
    if (save_log.ShowDialog() == System.Windows.Forms.DialogResult.OK &&
        save_log.FileName.Length > 0)
    {
        try
        {
            string hexString = Content.Text.ToString();
            FileStream stream = new FileStream(save_log.FileName, FileMode.Create, FileAccess.ReadWrite);
            stream.Write(Hex_to_ByteArray(hexString), 0, Hex_to_ByteArray(hexString).Length);
            stream.Close();
        }
        catch (Exception ex)
        {
            MessageBox.Show(ex.Message);
        }
    }
}

private byte[] Hex_to_ByteArray(string s)
{
    s = s.Replace(" ", "");
    byte[] buffer = new byte[s.Length / 2];
    for (int i = 0; i < s.Length; i += 2)
    {
        buffer[i / 2] = (byte)Convert.ToByte(s.Substring(i, 2), 16);
    }
    return buffer;
}
You're creating the byte array twice. Also, the .Replace over such a long string doesn't help. You can avoid all this:
try
{
    var stream = new FileStream(
        save_log.FileName,
        FileMode.Create,
        FileAccess.ReadWrite);
    WriteHexStringToFile(Content.Text, stream);
    stream.Close();
}
catch (Exception ex)
{
    MessageBox.Show(ex.Message);
}

private void WriteHexStringToFile(string hexString, FileStream stream)
{
    var twoCharacterBuffer = new StringBuilder();
    var oneByte = new byte[1];

    foreach (var character in hexString.Where(c => c != ' '))
    {
        twoCharacterBuffer.Append(character);
        if (twoCharacterBuffer.Length == 2)
        {
            oneByte[0] = (byte)Convert.ToByte(twoCharacterBuffer.ToString(), 16);
            stream.Write(oneByte, 0, 1);
            twoCharacterBuffer.Clear();
        }
    }
}
Also, take a look at Encoding and/or BinaryFormatter which might do all of this for you.
Update:
First of all, please note that the whole idea of storing megabytes of data in a string is a bad one, and you shouldn't do it. You should process your data in smaller parts. Because of this I'm unable to provide you with a working demo (on IDEONE, for example) due to the resource limits of online compilers. I've tested the code on my machine, and I can even process 50 MB strings - but it all depends on the amount of memory you have available. If you do such things, it is easy to reach the limit of available memory on any machine. The methods you ask about in this particular question are irrelevant - the problem is that you fill your memory with tons of data in your Content.Text string. When memory is almost full, the OutOfMemoryException can occur at almost any place in your code.
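As a minimal sketch of what "smaller parts" could mean here (the SerialPort wiring is an assumption, since the question doesn't show it; the point is that received bytes go straight to the file instead of accumulating in a textbox string):

using System;
using System.IO;
using System.IO.Ports;

// Assumed setup: append each chunk received on the serial port
// directly to the log file, so no giant in-memory string builds up.
var port = new SerialPort("COM1", 115200);
var log = new FileStream("capture.bin", FileMode.Append, FileAccess.Write);

port.DataReceived += (s, e) =>
{
    var buffer = new byte[port.BytesToRead];
    int read = port.Read(buffer, 0, buffer.Length);
    log.Write(buffer, 0, read);   // raw bytes, no hex round-trip

    // If a hex preview is still wanted in the UI, format only this
    // small chunk, e.g. BitConverter.ToString(buffer, 0, read).
};
port.Open();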
Use IEnumerable<byte>. That will avoid the large byte array.
I don't know what is in Content.Text. If it's a byte array, maybe you can change
static internal IEnumerable<byte> Hex_to_Byte(string s)
into
static internal IEnumerable<byte> Hex_to_Byte(byte[] bytes)
and modify the code a bit:
FileStream stream = new FileStream(save_log.FileName, FileMode.Create, FileAccess.ReadWrite);
foreach (byte b in Hex_to_Byte(hexString))
    stream.WriteByte(b);
stream.Close();
static internal IEnumerable<byte> Hex_to_Byte(string s)
{
    bool haveFirstByte = false;
    int firstByteValue = 0;

    foreach (char c in s)
    {
        if (char.IsWhiteSpace(c))
            continue;

        if (!haveFirstByte)
        {
            haveFirstByte = true;
            firstByteValue = GetHexValue(c) << 4;
        }
        else
        {
            haveFirstByte = false;
            yield return unchecked((byte)(firstByteValue + GetHexValue(c)));
        }
    }
}

static string hexChars = "0123456789ABCDEF";

static int GetHexValue(char c)
{
    int v = hexChars.IndexOf(char.ToUpper(c));
    if (v < 0)
        throw new ArgumentOutOfRangeException("c", string.Format("not a hex char: {0}", c));
    return v;
}

How can I use the DeflateStream class on one line in a file?

I have a file which contains plaintext mixed in with some compressed text, for example:
Version 01
Maker SomeCompany
l 73
mark
h�22V0P���w�/�+Q0���L)�66□ // This line was compressed using DeflateZLib
endmark
It seems that Microsoft has a solution, the DeflateStream class, but their examples show how to use it on an entire file, whereas I can't figure out how to use it on just one line of my file.
So far I have the following:
bool isDeflate = false;

using (var fs = new FileStream(@"C:\Temp\MyFile.dat", FileMode.Open))
using (var reader = new StreamReader(fs))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        if (isDeflate)
        {
            if (line == "endmark")
            {
                isDeflate = false;
            }
            else
            {
                line = DeflateSomehow(line);
            }
        }
        if (line == "mark")
        {
            isDeflate = true;
        }
        Console.WriteLine(line);
    }
}

public string DeflateSomehow(string line)
{
    // How do I deflate just that string?
}
Since the file is not created by me (we're only reading it in), we have no control over its structure... but I'm not tied to the code I have right now. If I need to change more of it than simply figuring out how to implement the DeflateSomehow method, then I'm fine with that as well.
A deflate stream works on binary data. An arbitrary binary chunk in the middle of a text file is also known as: a corrupt text file. There is no sane way of decoding this:
you can't read "lines", because there is no definition of a "line" when talking about binary data; any combination of CR/LF/CRLF/etc could occur completely by random in the binary data
you can't read a "string line", because that suggests you are running the data through an Encoding; but since this isn't text data, again: that will simply give you gibberish that cannot be processed (it will have lost data when reading)
Now, the second of these two problems is solvable by reading via the Stream API rather than the StreamReader API, so that you are only ever reading binary; you would then need to look for the line endings yourself, using an Encoding to probe what you can (noting that this isn't as simple as it sounds if you are using multi/variable-byte encodings such as UTF-8).
However, the first of these two problems is inherently not solvable by itself. To do this reliably, you would need some kind of binary framing protocol - which again, does not exist in a text file. It looks like the example is using "mark" and "endmark" - again, there is technically a chance that these would occur at random, but you'll probably get away with it for the 99.999% case. The trick, then, would be to read the entire file manually using Stream and Encoding, looking for "mark" and "endmark" - and stripping the bits that are encoded as text from the bits that are compressed data. Then run the encoded-as-text piece through the correct Encoding.
However! At the point when you are reading binary, then it is simple: you simply buffer the right amount (using whatever framing/sentinel protocol the data is written in), and use something like:
using (var ms = new MemoryStream(bytes))
using (var inflate = new GZipStream(ms, CompressionMode.Decompress))
{
    // now read from 'inflate'
}
With the addition of the l 73 marker, and the information that it is ASCII, it becomes a little more viable.
This won't work for me because the data here on SO is already corrupted (posting binary as text does that), but basically something like:
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        using (var file = File.OpenRead("my.txt"))
        using (var buffer = new MemoryStream())
        {
            List<string> lines = new List<string>();
            string line;
            while ((line = ReadToCRLF(file, buffer)) != null)
            {
                lines.Add(line);
                Console.WriteLine(line);
                if (line == "mark" && lines.Count >= 2)
                {
                    var match = Regex.Match(lines[lines.Count - 2], "^l ([0-9]+)$");
                    int bytes;
                    if (match.Success && int.TryParse(match.Groups[1].Value, out bytes))
                    {
                        ReadBytes(file, buffer, bytes);
                        string inflated = Inflate(buffer);
                        lines.Add(inflated); // or something similar
                        Console.WriteLine(inflated);
                    }
                }
            }
        }
    }

    static string Inflate(Stream source)
    {
        using (var deflate = new DeflateStream(source, CompressionMode.Decompress, true))
        using (var reader = new StreamReader(deflate, Encoding.ASCII))
        {
            return reader.ReadToEnd();
        }
    }

    static void ReadBytes(Stream source, MemoryStream buffer, int count)
    {
        buffer.SetLength(count);
        int read, offset = 0;
        while (count > 0 && (read = source.Read(buffer.GetBuffer(), offset, count)) > 0)
        {
            count -= read;
            offset += read;
        }
        if (count != 0) throw new EndOfStreamException();
        buffer.Position = 0;
    }

    static string ReadToCRLF(Stream source, MemoryStream buffer)
    {
        buffer.SetLength(0);
        int next;
        bool wasCr = false;
        while ((next = source.ReadByte()) >= 0)
        {
            if (next == 10 && wasCr) // CRLF
            {
                // end of line (minus the CR)
                return Encoding.ASCII.GetString(
                    buffer.GetBuffer(), 0, (int)buffer.Length - 1);
            }
            buffer.WriteByte((byte)next);
            wasCr = next == 13;
        }

        // end of file
        if (buffer.Length == 0) return null;
        return Encoding.ASCII.GetString(buffer.GetBuffer(), 0, (int)buffer.Length);
    }
}
