I am currently writing a StreamReader-based tool and have the following problem. I read a line with ReadLine(), and the stream then continues with the next line. But I need information about the last character of the previous line, especially whether it is a NewLine or a line feed.
I tried several approaches, including ReadBlock, but the stream does not seem to let me move back to the position where I could parse the characters I need.
This is my approach:
off = 0;
FileStream stream = new FileStream(filename, FileMode.Open);
using (StreamReader content = new StreamReader(stream, Encoding.UTF8))
{
String s = "";
while ((s = content.ReadLine()) != null)
{
content.BaseStream.Seek((a == 0)? off: off - 1, SeekOrigin.Begin);
//content.BaseStream.Seek(off, SeekOrigin.Current);
var c=content.Peek();
char b = (char)c;
data = s;
maxlist.Add(data.Length);
if (data != null)
{
offset = offset + (data.Length)+2;
offsetindex.Add(offset);
}
a++;
off = off + data.Length - 2;
}
content.Close();
}
The expected result is that, after ReadLine() has been called, I can still access the end of the line that was just read, so that I can use ReadBlock to read the trailing characters I need for exact positioning in the stream.
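One possible direction, sketched under the assumption that reading character by character is acceptable: ReadLine() discards the line ending, so a small helper can read the characters itself and report which terminator it saw. The class and method names (LineReader, ReadLineWithTerminator) are made up for this example and are not part of the framework. Note that StreamReader.Peek() can return -1 on streams that block or cannot be peeked, so this is only a sketch for ordinary files and in-memory text.

```csharp
using System;
using System.IO;
using System.Text;

public static class LineReader
{
    // Hypothetical helper: returns the line without its terminator and reports
    // the terminator that followed it ("\r\n", "\n", "\r", or "" at end of input).
    // Returns null when there is nothing left to read.
    public static string ReadLineWithTerminator(TextReader reader, out string terminator)
    {
        terminator = "";
        if (reader.Peek() < 0)
            return null;

        var sb = new StringBuilder();
        int ch;
        while ((ch = reader.Read()) >= 0)
        {
            if (ch == '\n') { terminator = "\n"; break; }
            if (ch == '\r')
            {
                // A '\r' directly followed by '\n' counts as one terminator.
                if (reader.Peek() == '\n') { reader.Read(); terminator = "\r\n"; }
                else terminator = "\r";
                break;
            }
            sb.Append((char)ch);
        }
        return sb.ToString();
    }
}
```

With the terminator known, the byte length of a line can be computed as encoding.GetByteCount(line) + encoding.GetByteCount(terminator), which avoids the hard-coded + 2 in the offset arithmetic above.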
I am using a BinaryReader to read a file and split by new line \n into ReadOnlySpan<byte> (to add context I want bytes and not strings as I am using Utf8JsonReader and trying to avoid copying from string to byte array).
The large buffer is deliberate: 16kB is OK for the application, and the data is processed one buffer at a time.
However, compared to File.ReadAllBytes(filename), which completes in 1 second, the code below takes 30+ seconds on the same machine.
I was naively assuming BinaryReader would read ahead and cache in advance. That seems not to be the case, or at least not without some flag that I can't seem to find.
How can i improve my performance, or implement the line splitting via an alternative class?
static void Main(string[] args)
{
using var fileStream = File.Open(args[0], FileMode.Open);
using (var reader = new BinaryReader(fileStream))
{
var i = 0;
ReadOnlySpan<byte> line = null;
while ((line = reader.ReadLine()) != null)
{
// Process the line here, one at a time.
i++;
}
Console.WriteLine("Read line " + i);
}
}
public static class BinaryReaderExtensions
{
public static ReadOnlySpan<byte> ReadLine(this BinaryReader reader)
{
if (reader.IsEndOfStream())
return null;
// Buffer size is deliberate, we process one line at a time.
var buffer = new byte[16384];
var i = 0;
while (!reader.IsEndOfStream() && i < buffer.Length)
{
if((buffer[i] = reader.ReadByte()) == '\n')
return new ReadOnlySpan<byte>(buffer, 0, i + 1);
i++;
}
return null;
}
public static bool IsEndOfStream(this BinaryReader reader)
{
return reader.BaseStream.Position == reader.BaseStream.Length;
}
}
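A likely cause of the slowdown is that IsEndOfStream() queries BaseStream.Position and BaseStream.Length for every single byte, and that a new 16 kB array is allocated for every line. The following is a minimal sketch of an alternative, not a drop-in replacement: it reads the stream in chunks and hands each '\n'-terminated slice to a callback. The names ByteLineSplitter, LineHandler and ForEachLine are invented for this example. A custom delegate is used so that the callback can take a ReadOnlySpan<byte> parameter.

```csharp
using System;
using System.IO;

public static class ByteLineSplitter
{
    // Callback type taking a span parameter.
    public delegate void LineHandler(ReadOnlySpan<byte> line);

    // Reads the stream chunk by chunk and calls onLine once per line.
    // Each span includes its trailing '\n', except possibly the last line.
    // The span is only valid during the callback. Returns the number of lines.
    public static int ForEachLine(Stream stream, LineHandler onLine, int bufferSize = 16384)
    {
        var buffer = new byte[bufferSize];
        int filled = 0, count = 0;
        while (true)
        {
            int read = stream.Read(buffer, filled, buffer.Length - filled);
            if (read == 0) break;
            filled += read;

            int start = 0;
            while (true)
            {
                int idx = Array.IndexOf(buffer, (byte)'\n', start, filled - start);
                if (idx < 0) break;
                onLine(new ReadOnlySpan<byte>(buffer, start, idx - start + 1));
                count++;
                start = idx + 1;
            }

            // Move the unfinished tail to the front so the next read appends to it.
            filled -= start;
            Array.Copy(buffer, start, buffer, 0, filled);
            if (filled == buffer.Length)
                throw new InvalidDataException("A line is longer than the buffer.");
        }

        // Emit a final line that has no trailing '\n'.
        if (filled > 0) { onLine(new ReadOnlySpan<byte>(buffer, 0, filled)); count++; }
        return count;
    }
}
```

Unlike the extension method in the question, this sketch also reports a last line that is not terminated by '\n', and it throws instead of returning null when a line does not fit into the buffer.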
In native C#, how can I read from the end of a file?
This is pertinent because I need to read a log file, and it doesn't make sense to read 10k, to read the last 3 lines.
To read the last 1024 bytes:
using (var reader = new StreamReader("foo.txt"))
{
if (reader.BaseStream.Length > 1024)
{
reader.BaseStream.Seek(-1024, SeekOrigin.End);
}
string line;
while ((line = reader.ReadLine()) != null)
{
Console.WriteLine(line);
}
}
Maybe something like this will work for you:
using (var fs = File.OpenRead(filePath))
{
fs.Seek(0, SeekOrigin.End);
int newLines = 0;
while (newLines < 3)
{
fs.Seek(-1, SeekOrigin.Current);
newLines += fs.ReadByte() == 13 ? 1 : 0; // look for \r
fs.Seek(-1, SeekOrigin.Current);
}
byte[] data = new byte[fs.Length - fs.Position];
fs.Read(data, 0, data.Length);
}
Take note that this assumes \r\n.
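A variant of the same idea that looks for '\n' instead of '\r' and stops at the start of the file when there are fewer lines than requested could look like the sketch below. The names Tail and ReadLastLines are hypothetical, and the code assumes an encoding in which the byte 0x0A can only mean a line feed (ASCII or UTF-8, for example). Seeking one byte at a time is kept for clarity, not for speed.

```csharp
using System;
using System.IO;
using System.Text;

public static class Tail
{
    // Returns the text of the last lineCount lines of a seekable stream.
    // A '\n' at the very end of the stream is not counted as a separator.
    public static string ReadLastLines(Stream stream, int lineCount, Encoding encoding)
    {
        long pos = stream.Length;
        long start = 0;   // falls back to the whole file if it has too few lines
        int found = 0;

        while (pos > 0)
        {
            pos--;
            stream.Seek(pos, SeekOrigin.Begin);
            if (stream.ReadByte() == '\n' && pos != stream.Length - 1)
            {
                found++;
                if (found == lineCount) { start = pos + 1; break; }
            }
        }

        stream.Seek(start, SeekOrigin.Begin);
        var data = new byte[stream.Length - start];
        int total = 0, read;
        // Stream.Read may return fewer bytes than requested, so loop until done.
        while (total < data.Length && (read = stream.Read(data, total, data.Length - total)) > 0)
            total += read;
        return encoding.GetString(data, 0, total);
    }
}
```

Because only the '\n' byte is used as a separator, files with "\r\n" line endings work as well; the '\r' simply stays at the end of each returned line.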
The code below uses a random-access FileStream to seed a StreamReader at an offset near the end of the file, discarding the first read line since it is most likely only partial.
FileStream stream = new FileStream(@"c:\temp\build.txt",
FileMode.Open, FileAccess.Read);
stream.Seek(-1024, SeekOrigin.End); // rewind enough for > 1 line
StreamReader reader = new StreamReader(stream);
reader.ReadLine(); // discard partial line
while (!reader.EndOfStream)
{
string nextLine = reader.ReadLine();
Console.WriteLine(nextLine);
}
Take a look at this related question's answer to read a text file in reverse. There is a lot of complexity to reading a file backward correctly because of stuff like encoding.
I am a bit new to files in C# and am having a problem. When reading from a file and copying to another, the last chunk of text is not being written. Below is my code:
StringBuilder sb = new StringBuilder(8192);
string fileName = "C:...rest of path...inputFile.txt";
string outputFile = "C:...rest of path...outputFile.txt";
using (StreamReader reader = File.OpenText(fileName))
{
char[] buffer = new char[8192];
while ((reader.ReadBlock(buffer, 0, buffer.Length)) != 0)
{
foreach (char c in buffer)
{
//do some function on char c...
sb.Append(c);
}
using (StreamWriter writer = File.CreateText(outputFile))
{
writer.Write(sb.ToString());
}
}
}
My aim was to read and write to a textfile in a buffered manner. Something that in Java I would achieve in the following manner:
public void encrypt(File inputFile, File outputFile) throws IOException
{
BufferedReader infromfile = null;
BufferedWriter outtofile = null;
try
{
String key = getKeyfromFile(keyFile);
if (key != null)
{
infromfile = new BufferedReader(new FileReader(inputFile));
outtofile = new BufferedWriter(new FileWriter(outputFile));
char[] buffer = new char[8192];
while ((infromfile.read(buffer, 0, buffer.length)) != -1)
{
String temptext = String.valueOf(buffer);
//some changes to temptext are done
outtofile.write(temptext);
}
}
}
catch (FileNotFoundException exc)
{
} // and all other possible exceptions
}
Could you help me identify the source of my problem?
If you think that there is possibly a better approach to achieve buffered i/o with text files, I would truly appreciate your suggestion.
There are a couple of "gotchas":
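c can't be changed (it is the foreach iteration variable); copy it to another variable if you need to modify it before writing.
You have to keep track of how much of the buffer was actually filled. ReadBlock returns the number of characters it read, and on the last block the rest of the buffer still holds old or null characters, which would make your output dirty.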
Changing your code like this looks like it works:
//extracted from your code
foreach (char c in buffer)
{
if (c == (char)0) break; //GOTCHA #2: maybe you don't want NULL (ascii 0) characters in your output
char d = c; //GOTCHA #1: you can't change 'c'
// d = SomeProcessingHere();
sb.Append(d);
}
Try this:
string fileName = @"";
string outputfile = @"";
StreamReader reader = File.OpenText(fileName);
string texto = reader.ReadToEnd();
StreamWriter writer = new StreamWriter(outputfile);
writer.Write(texto);
writer.Flush();
writer.Close();
Does this work for you?
using (StreamReader reader = File.OpenText(fileName))
using (StreamWriter writer = File.CreateText(outputFile)) // create the output file once, not once per block
{
    char[] buffer = new char[8192];
    int numChars;
    // ReadBlock returns the number of characters read; 0 means end of file
    while ((numChars = reader.ReadBlock(buffer, 0, buffer.Length)) > 0)
    {
        // write only the part of the buffer that was actually filled
        writer.Write(buffer, 0, numChars);
    }
}
You still have to take care of character encoding though!
If you don't care about carriage returns, you could use File.ReadAllText
This method opens a file, reads each line of the file, and then adds each line as an element of a string. It then closes the file. A line is defined as a sequence of characters followed by a carriage return ('\r'), a line feed ('\n'), or a carriage return immediately followed by a line feed. The resulting string does not contain the terminating carriage return and/or line feed.
StringBuilder sb = new StringBuilder(8192);
string fileName = "C:...rest of path...inputFile.txt";
string outputFile = "C:...rest of path...outputFile.txt";
// Open the file to read from.
string readText = File.ReadAllText(fileName );
foreach (char c in readText)
{
    char new_c = c; // do something to c here and store the result in new_c
    sb.Append(new_c);
}
// This text is added only once to the file, overwrite it if it exists
File.WriteAllText(outputFile, sb.ToString());
Unless I'm missing something, it appears that your issue is that you're overwriting the existing contents of your output file on each ReadBlock iteration.
You call:
using (StreamWriter writer = File.CreateText(outputFile))
{
writer.Write(sb.ToString());
}
for every ReadBlock iteration. The output of the file would only be the last chunk of data that was read.
From MSDN documentation on File.CreateText:
If the file specified by path does not exist, it is created. If the
file does exist, its contents are overwritten.
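Putting these points together, a buffered copy that opens the output only once and writes only the characters that were actually read might look like the following sketch. The class BufferedCopy and the transform parameter are assumptions made for this example; transform stands in for the "do some function on char c" step from the question.

```csharp
using System;
using System.IO;

public static class BufferedCopy
{
    // Copies text from reader to writer in 8192-character blocks,
    // applying transform to every character on the way.
    public static void Copy(TextReader reader, TextWriter writer, Func<char, char> transform)
    {
        char[] buffer = new char[8192];
        int count;
        // ReadBlock returns how many characters were read; 0 means end of input.
        while ((count = reader.ReadBlock(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < count; i++)
                buffer[i] = transform(buffer[i]);

            // Only the first count characters of the buffer are valid.
            writer.Write(buffer, 0, count);
        }
    }
}

// Possible usage, with the writer created once, outside the read loop:
//   using (StreamReader reader = File.OpenText(fileName))
//   using (StreamWriter writer = File.CreateText(outputFile))
//       BufferedCopy.Copy(reader, writer, c => c);
```

Because the writer is created before the loop, each block is appended to the output instead of replacing what was written before.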
Because I received a very bad data file, I have to come up with code that reads from a non-delimited text file at a specific starting position and for a specific length, in order to build up a workable dataset. The text file is not delimited in any way, but I do have the starting and ending position of each string that I need to read. I've come up with this code, but I'm getting an error and can't figure out why, because if I replace the 395 with a 0 it works.
e.g. Invoice number starting position = 395, ending position = 414, length = 20
using (StreamReader sr = new StreamReader(@"\\t.txt"))
{
char[] c = null;
while (sr.Peek() >= 0)
{
c = new char[20];//Invoice number string
sr.Read(c, 395, c.Length); //THIS IS GIVING ME AN ERROR
Debug.WriteLine(""+c[0] + c[1] + c[2] + c[3] + c[4]..c[20]);
}
}
Here is the error that I get:
System.ArgumentException: Offset and length were out of bounds for the array
or count is greater than the number of elements from
index to the end of the source collection. at
System.IO.StreamReader.Read(Char[] b
Please Note
Seek() is too low level for what the OP wants. See this answer instead for line-by-line parsing.
Also, as Jordan mentioned, Seek() has the issue of character encodings and varying character sizes (e.g. for non-ASCII and non-ANSI files, like UTF, which is probably not applicable to this question). Thanks for pointing that out.
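To illustrate the encoding issue: Seek() works in bytes, while string indexes count characters, and the two only coincide for single-byte encodings. The helper below is a small sketch (OffsetDemo and ByteOffsetOf are made-up names) that computes the byte offset at which a given character index starts.

```csharp
using System;
using System.Text;

public static class OffsetDemo
{
    // Returns the number of bytes that the first charIndex characters of text
    // occupy in the given encoding, i.e. the byte offset of character charIndex.
    public static int ByteOffsetOf(string text, int charIndex, Encoding encoding)
    {
        return encoding.GetByteCount(text.Substring(0, charIndex));
    }
}
```

For the text "na\u00EFve" (the word "naïve"), character index 3 is at byte offset 4 in UTF-8, because the "ï" takes two bytes. Seeking to byte 3 would land in the middle of that character.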
Original Answer
Seek() is only available on a stream, so try using sr.BaseStream.Seek(..), or use a different stream like such:
using (Stream s = new FileStream(path, FileMode.Open))
{
s.Seek(offset, SeekOrigin.Begin);
s.Read(buffer, 0, length);
}
Here is my suggestion for you:
using (StreamReader sr = new StreamReader(@"\\t.txt"))
{
char[] c = new char[20]; // Invoice number string
sr.BaseStream.Position = 395;
sr.Read(c, 0, c.Length);
}
(new answer based on comments)
You are parsing invoice data, with each entry on a new line, and the required data is at a fixed offset for every line. Stream.Seek() is too low level for what you want to do, because you will need several seeks, one for every line. Rather use the following:
int offset = 395;
int length = 20;
using (StreamReader sr = new StreamReader(@"\\t.txt"))
{
while (!sr.EndOfStream)
{
string line = sr.ReadLine();
string myData = line.Substring(offset, length);
}
}
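One caveat with this approach is that Substring throws an ArgumentOutOfRangeException when a line is shorter than offset + length, which can easily happen with a bad data file (a trailing empty line, for instance). A guarded variant could look like this sketch; FixedWidth and Field are hypothetical names, and returning null for short lines is just one possible policy.

```csharp
using System;

public static class FixedWidth
{
    // Returns the field at [offset, offset + length) of the line,
    // or null if the line is missing or too short to contain it.
    public static string Field(string line, int offset, int length)
    {
        if (line == null || line.Length < offset + length)
            return null;
        return line.Substring(offset, length);
    }
}
```

Inside the loop, string myData = FixedWidth.Field(line, offset, length); could then be followed by a null check that skips or logs malformed lines.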
Solved this ages ago, just wanted to post the solution that was suggested
using (StreamReader sr = new StreamReader(path2))
{
string line;
while ((line = sr.ReadLine()) != null)
{
dsnonhb.Tables[0].Columns.Add("InvoiceNum" );
dsnonhb.Tables[0].Columns.Add("Odo" );
dsnonhb.Tables[0].Columns.Add("PumpVal" );
dsnonhb.Tables[0].Columns.Add("Quantity" );
DataRow myrow;
myrow = dsnonhb.Tables[0].NewRow();
myrow["No"] = rowcounter.ToString();
myrow["InvoiceNum"] = line.Substring(741, 6);
myrow["Odo"] = line.Substring(499, 6);
myrow["PumpVal"] = line.Substring(609, 7);
myrow["Quantity"] = line.Substring(660, 6);
I've created a class called AdvancedStreamReader in my Helpers project on GitHub here:
https://github.com/jsmunroe/Helpers/blob/master/Helpers/IO/AdvancedStreamReader.cs
It is fairly robust. It is a subclass of StreamReader and keeps all of that functionality intact. There are a few caveats: a) it resets the position of the stream when it is constructed; b) you should not seek the BaseStream while you are using the reader; c) you need to specify the newline character type if it differs from the environment and the file can only use one type. Here are some unit tests to demonstrate how it is used.
[TestMethod]
public void ReadLineWithNewLineOnly()
{
// Setup
var text = $"ƒun ‼Æ¢ with åò☺ encoding!\nƒun ‼Æ¢ with åò☺ encoding!\nƒun ‼Æ¢ with åò☺ encoding!\nHa!";
var bytes = Encoding.UTF8.GetBytes(text);
var stream = new MemoryStream(bytes);
var reader = new AdvancedStreamReader(stream, NewLineType.Nl);
reader.ReadLine();
// Execute
var result = reader.ReadLine();
// Assert
Assert.AreEqual("ƒun ‼Æ¢ with åò☺ encoding!", result);
Assert.AreEqual(54, reader.CharacterPosition);
}
[TestMethod]
public void SeekCharacterWithUtf8()
{
// Setup
var text = $"ƒun ‼Æ¢ with åò☺ encoding!{NL}ƒun ‼Æ¢ with åò☺ encoding!{NL}ƒun ‼Æ¢ with åò☺ encoding!{NL}Ha!";
var bytes = Encoding.UTF8.GetBytes(text);
var stream = new MemoryStream(bytes);
var reader = new AdvancedStreamReader(stream);
// Pre-condition assert
Assert.IsTrue(bytes.Length > text.Length); // More bytes than characters in sample text.
// Execute
reader.SeekCharacter(84);
// Assert
Assert.AreEqual(84, reader.CharacterPosition);
Assert.AreEqual($"Ha!", reader.ReadToEnd());
}
I wrote this for my own use, but I hope it will help other people.
In sr.Read(c, 395, c.Length), 395 is the index in the c array at which writing starts, not a position in the file. The array has no index 395; its highest index is 19.
I would suggest something like this.
StreamReader r;
...
string allFile = r.ReadToEnd();
int offset = 395;
int length = 20;
And then use
allFile.Substring(offset, length)
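If loading the whole file into one string is not desirable, the same effect can be had by skipping characters first and then reading into the start of the target array. This is only a sketch with made-up names (ReadIndexDemo, ReadAt); note that it counts characters, not bytes.

```csharp
using System;
using System.IO;

public static class ReadIndexDemo
{
    // Skips offset characters of the reader, then reads up to length characters.
    public static string ReadAt(TextReader reader, int offset, int length)
    {
        // Discard the characters before the field.
        var skipped = new char[offset];
        reader.ReadBlock(skipped, 0, skipped.Length);

        // The second argument is the index in the destination array, so it is 0 here.
        var field = new char[length];
        int n = reader.ReadBlock(field, 0, field.Length);
        return new string(field, 0, n);
    }
}
```

Called as ReadIndexDemo.ReadAt(sr, 395, 20) on a freshly opened reader, this would return the 20 characters starting at character position 395, or fewer if the input ends early.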