Read textfile from specific position till specific length - c#

Because I received a very badly formatted data file, I have to write code that reads a non-delimited text file from a specific starting position for a specific length, to build up a workable dataset. The text file is not delimited in any way, but I do have the starting and ending position of each string that I need to read. I've come up with this code, but I'm getting an error and can't figure out why, because if I replace the 395 with a 0 it works.
For example: invoice number starting position = 395, ending position = 414, length = 20.
using (StreamReader sr = new StreamReader(@"\\t.txt"))
{
    char[] c = null;
    while (sr.Peek() >= 0)
    {
        c = new char[20]; // Invoice number string
        sr.Read(c, 395, c.Length); // THIS IS GIVING ME AN ERROR
        Debug.WriteLine(new string(c));
    }
}
Here is the error that I get:
System.ArgumentException: Offset and length were out of bounds for the array or count is greater than the number of elements from index to the end of the source collection.
at System.IO.StreamReader.Read(Char[] b

Please Note
Seek() is too low-level for what the OP wants. See the new answer further down for line-by-line parsing.
Also, as Jordan mentioned, Seek() has the problem of character encodings and varying character sizes (e.g. for non-ASCII and non-ANSI files such as UTF-8, which is probably not applicable to this question). Thanks for pointing that out.
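To illustrate the encoding caveat: Seek() and Position work in bytes, while Substring() and Read() work in characters, and in UTF-8 one character can take several bytes. The two offsets only coincide for pure-ASCII text. A minimal sketch; the helper name ByteOffsetOf is hypothetical and it assumes the text is already in memory:

```csharp
using System;
using System.Text;

public static class OffsetDemo
{
    // Returns the byte offset at which character 'charOffset' of 'text' starts
    // when the text is encoded with 'encoding'. Hypothetical helper for illustration.
    // Assumes charOffset does not fall between the two halves of a surrogate pair.
    public static int ByteOffsetOf(string text, int charOffset, Encoding encoding)
    {
        if (charOffset < 0 || charOffset > text.Length)
            throw new ArgumentOutOfRangeException(nameof(charOffset));
        // The byte offset is the encoded size of all preceding characters.
        return encoding.GetByteCount(text.Substring(0, charOffset));
    }
}
```

For ASCII text ByteOffsetOf("abc", 2, Encoding.UTF8) is 2, but for "åò☺x" the 'x' at character 3 starts at byte 7, because å and ò take two bytes each and ☺ takes three.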
Original Answer
Seek() is available on Stream, not on StreamReader, so try using sr.BaseStream.Seek(...), or use a different stream, like this:
using (Stream s = new FileStream(path, FileMode.Open))
{
    s.Seek(offset, SeekOrigin.Begin);
    s.Read(buffer, 0, length); // buffer is a byte[]; Read may return fewer than 'length' bytes
}
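Stream.Read is allowed to return fewer bytes than requested, so a single s.Read(buffer, 0, length) call is not guaranteed to fill the buffer. A minimal sketch that loops until the whole range has been read, assuming single-byte (ASCII) text so that byte and character offsets coincide; the class and method names are illustrative:

```csharp
using System;
using System.IO;
using System.Text;

public static class FixedOffsetReader
{
    // Reads exactly 'length' bytes starting at byte 'offset' of a seekable stream.
    public static byte[] ReadBytesAt(Stream stream, long offset, int length)
    {
        var buffer = new byte[length];
        stream.Seek(offset, SeekOrigin.Begin);
        int total = 0;
        while (total < length)
        {
            // Read may return fewer bytes than asked for, so keep going.
            int read = stream.Read(buffer, total, length - total);
            if (read == 0)
                throw new EndOfStreamException("Stream ended before the requested range.");
            total += read;
        }
        return buffer;
    }

    // Decodes the range as ASCII; only valid for single-byte data (an assumption).
    public static string ReadAsciiAt(Stream stream, long offset, int length)
    {
        return Encoding.ASCII.GetString(ReadBytesAt(stream, offset, length));
    }
}
```

With a FileStream opened on the data file, ReadAsciiAt(fs, 395, 20) would return the 20-character field, provided the file really is single-byte text.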

Here is my suggestion for you:
using (StreamReader sr = new StreamReader(@"\\t.txt"))
{
    char[] c = new char[20]; // Invoice number string
    sr.BaseStream.Position = 395; // a byte position; equals the character position only for single-byte encodings
    sr.Read(c, 0, c.Length);
}
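Setting BaseStream.Position works here because nothing has been read yet. StreamReader reads ahead into an internal buffer, though, so if you reposition it after it has read anything, you also need to call DiscardBufferedData, otherwise the next Read returns stale buffered characters. A minimal sketch for reading several ranges with one reader, again assuming single-byte text so byte and character positions match; ReadAt is a hypothetical helper:

```csharp
using System;
using System.IO;

public static class ReaderSeek
{
    // Repositions 'reader' to byte 'position' and reads up to 'count' characters.
    public static string ReadAt(StreamReader reader, long position, int count)
    {
        reader.BaseStream.Seek(position, SeekOrigin.Begin);
        // Drop whatever the reader had already buffered from the old position.
        reader.DiscardBufferedData();
        var chars = new char[count];
        // ReadBlock keeps reading until 'count' characters are read or the stream ends.
        int read = reader.ReadBlock(chars, 0, count);
        return new string(chars, 0, read);
    }
}
```

Because each call reseeks and discards the buffer, the ranges can be read in any order, including backwards.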

(new answer based on comments)
You are parsing invoice data, with each entry on a new line, and the required data is at a fixed offset on every line. Stream.Seek() is too low-level for this, because you would need one seek per line. Use the following instead:
int offset = 395;
int length = 20;
using (StreamReader sr = new StreamReader(@"\\t.txt"))
{
    while (!sr.EndOfStream)
    {
        string line = sr.ReadLine();
        // Throws ArgumentOutOfRangeException if the line is shorter than offset + length
        string myData = line.Substring(offset, length);
    }
}
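line.Substring(offset, length) throws ArgumentOutOfRangeException when a line is shorter than offset + length, for example a trailing blank line. A minimal sketch of a more tolerant field extractor, assuming that short fields should come back truncated or empty rather than throw, and that fields are space-padded (hence the Trim); the FixedWidth name is illustrative:

```csharp
using System;

public static class FixedWidth
{
    // Returns the field at [offset, offset + length), clipped to the line and
    // trimmed of padding spaces. Returns "" when the line ends before 'offset'.
    public static string Field(string line, int offset, int length)
    {
        if (offset < 0 || length < 0)
            throw new ArgumentOutOfRangeException(offset < 0 ? nameof(offset) : nameof(length));
        if (line == null || offset >= line.Length)
            return string.Empty;
        int available = Math.Min(length, line.Length - offset);
        return line.Substring(offset, available).Trim();
    }
}
```

With the question's positions, the length is end - start + 1 = 414 - 395 + 1 = 20, so the call is FixedWidth.Field(line, 395, 20).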

I solved this ages ago; I just wanted to post the solution that was suggested:
// Add the columns once, before reading; adding them inside the loop
// throws a DuplicateNameException on the second line.
dsnonhb.Tables[0].Columns.Add("InvoiceNum");
dsnonhb.Tables[0].Columns.Add("Odo");
dsnonhb.Tables[0].Columns.Add("PumpVal");
dsnonhb.Tables[0].Columns.Add("Quantity");
using (StreamReader sr = new StreamReader(path2))
{
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        DataRow myrow = dsnonhb.Tables[0].NewRow();
        myrow["No"] = rowcounter.ToString();
        myrow["InvoiceNum"] = line.Substring(741, 6);
        myrow["Odo"] = line.Substring(499, 6);
        myrow["PumpVal"] = line.Substring(609, 7);
        myrow["Quantity"] = line.Substring(660, 6);
        dsnonhb.Tables[0].Rows.Add(myrow);
        rowcounter++;
    }
}
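A variation on the same idea keeps the field layout in one place, adds the columns once, and adds one row per line. A minimal sketch; the offsets and lengths are the ones from the code above, while the FieldSpec type, the LoadFixedWidth method, and the choice to skip lines too short for the layout are illustrative assumptions:

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.IO;

public sealed class FieldSpec
{
    public string Name { get; }
    public int Offset { get; }
    public int Length { get; }
    public FieldSpec(string name, int offset, int length)
    {
        Name = name; Offset = offset; Length = length;
    }
}

public static class FixedWidthLoader
{
    // Layout taken from the answer above (zero-based offset, length).
    public static readonly FieldSpec[] InvoiceLayout =
    {
        new FieldSpec("InvoiceNum", 741, 6),
        new FieldSpec("Odo", 499, 6),
        new FieldSpec("PumpVal", 609, 7),
        new FieldSpec("Quantity", 660, 6),
    };

    // Builds a DataTable with one string column per field and one row per line.
    public static DataTable LoadFixedWidth(TextReader reader, IReadOnlyList<FieldSpec> layout)
    {
        var table = new DataTable();
        foreach (var field in layout)
            table.Columns.Add(field.Name, typeof(string)); // columns are added once

        string line;
        while ((line = reader.ReadLine()) != null)
        {
            DataRow row = table.NewRow();
            bool complete = true;
            foreach (var field in layout)
            {
                if (line.Length < field.Offset + field.Length) { complete = false; break; }
                row[field.Name] = line.Substring(field.Offset, field.Length);
            }
            if (complete)
                table.Rows.Add(row); // lines too short for the layout are skipped
        }
        return table;
    }
}
```

Changing the file format then only means editing InvoiceLayout, not the parsing loop.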

I've created a class called AdvancedStreamReader in my Helpers project on GitHub here:
https://github.com/jsmunroe/Helpers/blob/master/Helpers/IO/AdvancedStreamReader.cs
It is fairly robust. It is a subclass of StreamReader and keeps all of that functionality intact. There are a few caveats: a) it resets the position of the stream when it is constructed; b) you should not seek the BaseStream while you are using the reader; c) you need to specify the newline character type if it differs from the environment's, and the file can use only one type. Here are some unit tests that demonstrate how it is used.
[TestMethod]
public void ReadLineWithNewLineOnly()
{
// Setup
var text = $"ƒun ‼Æ¢ with åò☺ encoding!\nƒun ‼Æ¢ with åò☺ encoding!\nƒun ‼Æ¢ with åò☺ encoding!\nHa!";
var bytes = Encoding.UTF8.GetBytes(text);
var stream = new MemoryStream(bytes);
var reader = new AdvancedStreamReader(stream, NewLineType.Nl);
reader.ReadLine();
// Execute
var result = reader.ReadLine();
// Assert
Assert.AreEqual("ƒun ‼Æ¢ with åò☺ encoding!", result);
Assert.AreEqual(54, reader.CharacterPosition);
}
[TestMethod]
public void SeekCharacterWithUtf8()
{
// Setup
var text = $"ƒun ‼Æ¢ with åò☺ encoding!{NL}ƒun ‼Æ¢ with åò☺ encoding!{NL}ƒun ‼Æ¢ with åò☺ encoding!{NL}Ha!";
var bytes = Encoding.UTF8.GetBytes(text);
var stream = new MemoryStream(bytes);
var reader = new AdvancedStreamReader(stream);
// Pre-condition assert
Assert.IsTrue(bytes.Length > text.Length); // More bytes than characters in sample text.
// Execute
reader.SeekCharacter(84);
// Assert
Assert.AreEqual(84, reader.CharacterPosition);
Assert.AreEqual($"Ha!", reader.ReadToEnd());
}
I wrote this for my own use, but I hope it will help other people.

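The second argument of Read (395) is the index in the array c at which writing starts, not a position in the file. c has only 20 elements, so there is no index 395; the highest valid index is 19.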
I would suggest something like this.
StreamReader r;
...
string allFile = r.ReadToEnd();
int offset = 395;
int length = 20;
And then use
allFile.Substring(offset, length)
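If you would rather not load the whole file, you can get to a character position with any TextReader by reading and discarding the characters in front of it, then reading into your buffer starting at index 0. A minimal sketch with hypothetical names; because it counts decoded characters, it is not affected by how many bytes each character takes:

```csharp
using System;
using System.IO;

public static class CharRange
{
    // Reads up to 'count' characters starting at character position 'start'.
    public static string ReadCharsAt(TextReader reader, int start, int count)
    {
        var skip = new char[4096];
        int remaining = start;
        while (remaining > 0)
        {
            int read = reader.Read(skip, 0, Math.Min(skip.Length, remaining));
            if (read == 0) return string.Empty; // input ended before 'start'
            remaining -= read;
        }
        var field = new char[count];
        // Here the index argument (0) refers to 'field', not to the file.
        int got = reader.ReadBlock(field, 0, count);
        return new string(field, 0, got);
    }
}
```

For data laid out line by line, the per-line Substring approach is usually simpler; this is for a single long record.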

Related

ReadLine in StreamReader, and accessing the last line with index -1

I am currently writing a StreamReader tool and have the following problem. I read a line with ReadLine(); then the stream continues with the next line. But I need information about the last character of the previous line (especially whether it is a carriage return or a line feed).
This is my approach:
I tried several approaches, with ReadBlock and so on, but it seems that the stream itself does not let me go back to the position I need in order to parse the elements I need.
off = 0;
FileStream stream = new FileStream(filename, FileMode.Open);
using (StreamReader content = new StreamReader(stream, Encoding.UTF8))
{
String s = "";
while ((s = content.ReadLine()) != null)
{
content.BaseStream.Seek((a == 0)? off: off - 1, SeekOrigin.Begin);
//content.BaseStream.Seek(off, SeekOrigin.Current);
var c=content.Peek();
char b = (char)c;
data = s;
maxlist.Add(data.Length);
if (data != null)
{
offset = offset + (data.Length)+2;
offsetindex.Add(offset);
}
a++;
off = off + data.Length - 2;
}
content.Close();
}
The expected result is that I can still access the previous line after ReadLine is called, so that I can use ReadBlock to read the last elements I need for exact positioning in the stream.
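ReadLine strips the terminator, so that information is already gone by the time you get the string, and seeking the base stream does not help because StreamReader has read ahead. One alternative is to read character by character and record which terminator ended each line. A minimal sketch; LineWithTerminator and ReadLineWithTerminator are hypothetical names, and the terminator rules follow the ones ReadLine uses ("\r", "\n", or "\r\n"):

```csharp
using System;
using System.IO;
using System.Text;

public readonly struct LineWithTerminator
{
    public string Text { get; }
    public string Terminator { get; } // "\r\n", "\n", "\r", or "" for a final unterminated line
    public LineWithTerminator(string text, string terminator)
    {
        Text = text; Terminator = terminator;
    }
}

public static class LineReader
{
    // Like TextReader.ReadLine, but also reports the terminator; null at end of input.
    public static LineWithTerminator? ReadLineWithTerminator(TextReader reader)
    {
        var sb = new StringBuilder();
        int ch;
        while ((ch = reader.Read()) != -1)
        {
            if (ch == '\n')
                return new LineWithTerminator(sb.ToString(), "\n");
            if (ch == '\r')
            {
                // CR immediately followed by LF counts as one CRLF terminator.
                if (reader.Peek() == '\n')
                {
                    reader.Read();
                    return new LineWithTerminator(sb.ToString(), "\r\n");
                }
                return new LineWithTerminator(sb.ToString(), "\r");
            }
            sb.Append((char)ch);
        }
        return sb.Length > 0 ? new LineWithTerminator(sb.ToString(), "") : (LineWithTerminator?)null;
    }
}
```

Keeping the previous result in a variable then gives you both the previous line and how it ended, and Text.Length + Terminator.Length gives its length in characters.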

Unusual Character addition after writing back decoded file

I am using the ZXing.Net library to encode and decode my video file with the Reed-Solomon (RS) encoder. It works well: it adds parity when encoding and removes it when decoding. But when I write the decoded file back, "?" characters appear at various locations in the file that were not part of the original file. I don't understand why this happens when writing the file back.
Here is my code
using ZXing.Common.ReedSolomon;
namespace zxingtest
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
string inputFileName = @"D:\JM\bin\baseline_30.264";
string outputFileName = @"D:\JM\bin\baseline_encoded.264";
string Content = File.ReadAllText(inputFileName, ASCIIEncoding.Default);
//File.WriteAllText(outputFileName, Content, ASCIIEncoding.Default);
ReedSolomonEncoder enc = new ReedSolomonEncoder(GenericGF.AZTEC_DATA_12);
ReedSolomonDecoder dec = new ReedSolomonDecoder(GenericGF.AZTEC_DATA_12);
//string s = "1,2,4,6,1,7,4,0,0";
//int[] array = s.Split(',').Select(str => int.Parse(str)).ToArray();
int parity = 10;
List<byte> toBytes = ASCIIEncoding.Default.GetBytes(Content.Substring(0, 500)).ToList();
for (int index = 0; index < parity; index++)
{
toBytes.Add(0);
}
int[] bytesAsInts = Array.ConvertAll(toBytes.ToArray(), c => (int)c);
enc.encode(bytesAsInts, parity);
bytesAsInts[1] = 3;
dec.decode(bytesAsInts, parity);
string st = new string(Array.ConvertAll(bytesAsInts.ToArray(), z => (char)z));
File.WriteAllText(outputFileName, st, ASCIIEncoding.Default);
}
}
}
[Image: hex view of the H.264 bitstream]
The problem is that you're handling a binary format as if it were a text file with an encoding. But judging by what you are doing, you only seem to be interested in reading some bytes, processing them (encode, decode), and then writing the bytes back to a file.
If that is what you need, use the proper reader and writer for your files, in this case BinaryReader and BinaryWriter. Using your code as a starting point, here is my version using those readers/writers. With it, my input file and output file match for the bytes that are read and written.
string inputFileName = @"input.264";
string outputFileName = @"output.264";
ReedSolomonEncoder enc = new ReedSolomonEncoder(GenericGF.AZTEC_DATA_12);
ReedSolomonDecoder dec = new ReedSolomonDecoder(GenericGF.AZTEC_DATA_12);
const int parity = 10;
// open a file as stream for reading
using (var input = File.OpenRead(inputFileName))
{
const int max_ints = 256;
int[] bytesAsInts = new int[max_ints];
// use a binary reader
using (var binary = new BinaryReader(input))
{
for (int i = 0; i < max_ints - parity; i++)
{
//read a single byte, store them in the array of ints
bytesAsInts[i] = binary.ReadByte();
}
// parity
for (int i = max_ints - parity; i < max_ints; i++)
{
bytesAsInts[i] = 0;
}
enc.encode(bytesAsInts, parity);
bytesAsInts[1] = 3;
dec.decode(bytesAsInts, parity);
// create a stream for writing
using(var output = File.Create(outputFileName))
{
// write bytes back
using(var writer = new BinaryWriter(output))
{
foreach(var value in bytesAsInts)
{
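// we need to write back a byte
// not an int so cast it (data values are 0-255; with AZTEC_DATA_12,
// a GF(4096) field, parity symbols can exceed 255 and get truncated)
writer.Write((byte)value);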
}
}
}
}
}
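This also explains the "?" characters in the question. Decoding bytes as text maps them to characters, and anything the encoding cannot represent is replaced. With Encoding.ASCII, for example, every byte from 0x80 to 0xFF decodes to '?', which encodes back as 0x3F, so the original value is lost. (The question's ASCIIEncoding.Default is really Encoding.Default, which is the ANSI code page on .NET Framework and UTF-8 on .NET Core, so exactly which bytes get mangled differs, but the mechanism is the same.) A small sketch with illustrative names, contrasting a text round trip with a byte-for-byte copy:

```csharp
using System;
using System.IO;
using System.Text;

public static class RoundTrip
{
    // Decodes bytes as ASCII text and encodes them again, which is what
    // reading and writing a binary file "as text" with Encoding.ASCII amounts to.
    public static byte[] ThroughAsciiText(byte[] data)
    {
        string text = Encoding.ASCII.GetString(data); // bytes >= 0x80 become '?'
        return Encoding.ASCII.GetBytes(text);
    }

    // Copies a file byte for byte: the binary-safe alternative.
    public static void CopyBinary(string inputPath, string outputPath)
    {
        byte[] data = File.ReadAllBytes(inputPath);
        File.WriteAllBytes(outputPath, data);
    }
}
```

ThroughAsciiText(new byte[] { 0x00, 0x41, 0x9C, 0xFF }) gives { 0x00, 0x41, 0x3F, 0x3F }: the two high bytes both came back as '?'.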

How can I use the DeflateStream class on one line in a file?

I have a file which contains plaintext mixed in with some compressed text, for example:
Version 01
Maker SomeCompany
l 73
mark
h�22V0P���w�/�+Q0���L)�66□ // This line was compressed using DeflateZLib
endmark
It seems that Microsoft has a solution, the DeflateStream class, but their examples show how to use it on an entire file, whereas I can't figure out how to just use it on one line in my file.
So far I have the following:
bool isDeflate = false;
using (var fs = new FileStream(@"C:\Temp\MyFile.dat", FileMode.Open))
using (var reader = new StreamReader(fs))
{
string line;
while ((line = reader.ReadLine()) != null)
{
if (isDeflate)
{
if (line == "endmark")
{
isDeflate = false;
}
else
{
line = DeflateSomehow(line);
}
}
if (line == "mark")
{
isDeflate = true;
}
Console.WriteLine(line);
}
}
public string DeflateSomehow(string line)
{
// How do I deflate just that string?
}
Since the file is not created by me (we only read it in), we have no control over its structure. But I'm not tied to the code I have right now: if I need to change more of it than simply implementing the DeflateSomehow method, then I'm fine with that as well.
A deflate stream works on binary data. An arbitrary binary chunk in the middle of a text file is also known as: a corrupt text file. There is no sane way of decoding this:
you can't read "lines", because there is no definition of a "line" when talking about binary data; any combination of CR/LF/CRLF/etc. could occur completely at random in the binary data
you can't read a "string line", because that suggests you are running the data through an Encoding; but since this isn't text data, again: that will simply give you gibberish that cannot be processed (it will have lost data when reading)
Now, the second of these two problems is solvable by reading via the Stream API rather than the StreamReader API, so that you are only ever reading binary; you would then need to look for the line endings yourself, using an Encoding to probe what you can (noting that this isn't as simple as it sounds if you are using multi/variable-byte encodings such as UTF-8).
However, the first of these two problems is inherently not solvable by itself. To do this reliably, you would need some kind of binary framing protocol - which again, does not exist in a text file. It looks like the example is using "mark" and "endmark" - again, there is technically a chance that these would occur at random, but you'll probably get away with it for the 99.999% case. The trick, then, would be to read the entire file manually using Stream and Encoding, looking for "mark" and "endmark" - and stripping the bits that are encoded as text from the bits that are compressed data. Then run the encoded-as-text piece through the correct Encoding.
However! At the point when you are reading binary, then it is simple: you simply buffer the right amount (using whatever framing/sentinel protocol the data is written in), and use something like:
using (var ms = new MemoryStream(bytes))
using (var inflate = new DeflateStream(ms, CompressionMode.Decompress))
{
    // now read from 'inflate'
}
With the addition of the l 73 marker (which the code below treats as the byte length of the compressed block), and the information that the text is ASCII, this becomes a little more viable.
I can't test this, because the data posted here on SO is already corrupted (posting binary as text does that), but basically something like:
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Text.RegularExpressions;
class Program
{
static void Main()
{
using (var file = File.OpenRead("my.txt"))
using (var buffer = new MemoryStream())
{
List<string> lines = new List<string>();
string line;
while ((line = ReadToCRLF(file, buffer)) != null)
{
lines.Add(line);
Console.WriteLine(line);
if (line == "mark" && lines.Count >= 2)
{
var match = Regex.Match(lines[lines.Count - 2], "^l ([0-9]+)$");
int bytes;
if (match.Success && int.TryParse(match.Groups[1].Value, out bytes))
{
ReadBytes(file, buffer, bytes);
string inflated = Inflate(buffer);
lines.Add(inflated); // or something similar
Console.WriteLine(inflated);
}
}
}
}
}
static string Inflate(Stream source)
{
using (var deflate = new DeflateStream(source, CompressionMode.Decompress, true))
using (var reader = new StreamReader(deflate, Encoding.ASCII))
{
return reader.ReadToEnd();
}
}
static void ReadBytes(Stream source, MemoryStream buffer, int count)
{
buffer.SetLength(count);
int read, offset = 0;
while (count > 0 && (read = source.Read(buffer.GetBuffer(), offset, count)) > 0)
{
count -= read;
offset += read;
}
if (count != 0) throw new EndOfStreamException();
buffer.Position = 0;
}
static string ReadToCRLF(Stream source, MemoryStream buffer)
{
buffer.SetLength(0);
int next;
bool wasCr = false;
while ((next = source.ReadByte()) >= 0)
{
if(next == 10 && wasCr) { // CRLF
// end of line (minus the CR)
return Encoding.ASCII.GetString(
buffer.GetBuffer(), 0, (int)buffer.Length - 1);
}
buffer.WriteByte((byte)next);
wasCr = next == 13;
}
// end of file
if (buffer.Length == 0) return null;
return Encoding.ASCII.GetString(buffer.GetBuffer(), 0, (int)buffer.Length);
}
}
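Once the compressed bytes have been isolated, DeflateStream does not care where they came from: a MemoryStream over just that slice of the byte array is enough. A minimal, self-contained round-trip sketch; it builds its own sample data, so it assumes raw deflate data. If the block really has a zlib wrapper (the question mentions DeflateZLib), you would need to skip the 2-byte zlib header first or, on .NET 6 and later, use ZLibStream instead:

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class DeflateRegion
{
    // Compresses text to raw deflate bytes (used here only to build sample data).
    public static byte[] Deflate(string text)
    {
        using (var output = new MemoryStream())
        {
            using (var deflate = new DeflateStream(output, CompressionMode.Compress, leaveOpen: true))
            {
                byte[] bytes = Encoding.ASCII.GetBytes(text);
                deflate.Write(bytes, 0, bytes.Length);
            } // disposing the DeflateStream writes the final block
            return output.ToArray();
        }
    }

    // Inflates only data[offset .. offset + count) and decodes the result as ASCII.
    public static string InflateRegion(byte[] data, int offset, int count)
    {
        using (var slice = new MemoryStream(data, offset, count, writable: false))
        using (var inflate = new DeflateStream(slice, CompressionMode.Decompress))
        using (var reader = new StreamReader(inflate, Encoding.ASCII))
        {
            return reader.ReadToEnd();
        }
    }
}
```

In the program above, the offset and count would come from the position after the "mark" line and the value in the "l 73" line.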

Multi-level sorting on strings

Here is a sample of the raw data I have:
Sana Paden,1098,64228,46285,2/15/2011
Ardelle Mahr,1242,85663,33218,3/25/2011
Joel Fountain,1335,10951,50866,5/2/2011
Ashely Vierra,1349,5379,87475,6/9/2011
Amado Loiacono,1406,62789,38490,7/17/2011
Joycelyn Dolezal,1653,14720,13638,8/24/2011
Alyse Braunstein,1657,69455,52871,10/1/2011
Cheri Ravenscroft,1734,55431,58460,11/8/2011
I used a FileStream with a nested StreamReader, first to determine how many lines are in the file, and second to create an array of longs that gives me the start of every line in the file. Code and output follow:
using (FileStream fs = new FileStream(@"C:\SourceDatatoedit.csv", FileMode.Open, FileAccess.Read))
{
fs.Seek(offset, SeekOrigin.Begin);
StreamReader sr = new StreamReader(fs);
{
while (!sr.EndOfStream && fs.CanRead)
{
streamsample = sr.ReadLine();
numoflines++;
}// end while block
}//end stream sr block
long[] dataArray = new long[numoflines];
fs.Seek(offset, SeekOrigin.Begin);
StreamReader dr = new StreamReader(fs);
{
numoflines = 0;
streamsample = "";
while (!dr.EndOfStream && fs.CanRead)
{
streamsample = dr.ReadLine();
//pointers.Add(numoflines.ToString());
dataArray[numoflines] = offset;
offset += streamsample.Length - 1;
numoflines++;
}// end while
Each string contains a name, ID, loan amount, payment amount, and payment date.
I have a method in place that returns the remaining amount by subtracting the payment amount from the loan amount and dividing the result by 100 to get the dollars-and-cents value.
After doing this, I want to order my information by date, then name, and lastly with negative amounts first. I understand I could create a Loan class, build a list of Loan objects, and run a LINQ to Objects query against the set to obtain this, but I'm trying to do this without LINQ. Any suggestions?
Depending on the context of your code, you can gain many benefits by introducing a custom class / business object. It will give you a good separation of concerns and thus more manageable and testable code. You can implement the IComparable<T> interface so that you can call Sort on a List<T> and get your custom ordering.
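A minimal sketch of that idea without LINQ: a Loan class whose CompareTo orders by payment date, then name, then negative remaining balances first, so that List<Loan>.Sort() does the multi-level sort. The class and member names are hypothetical, the amounts are assumed to be in cents, and the date format is assumed to be M/d/yyyy as in the sample data:

```csharp
using System;
using System.Globalization;

// Hypothetical business object for one line such as
// "Sana Paden,1098,64228,46285,2/15/2011".
public sealed class Loan : IComparable<Loan>
{
    public string Name { get; }
    public int Id { get; }
    public long LoanAmount { get; }     // assumed to be in cents
    public long PaymentAmount { get; }  // assumed to be in cents
    public DateTime PaymentDate { get; }

    public Loan(string name, int id, long loanAmount, long paymentAmount, DateTime paymentDate)
    {
        Name = name;
        Id = id;
        LoanAmount = loanAmount;
        PaymentAmount = paymentAmount;
        PaymentDate = paymentDate;
    }

    // Remaining balance in dollars: (loan - payment) / 100, as described in the question.
    public decimal Remaining => (LoanAmount - PaymentAmount) / 100m;

    // Parses "name,id,loan,payment,M/d/yyyy" with no validation beyond Parse itself.
    public static Loan Parse(string line)
    {
        string[] f = line.Split(',');
        return new Loan(
            f[0],
            int.Parse(f[1], CultureInfo.InvariantCulture),
            long.Parse(f[2], CultureInfo.InvariantCulture),
            long.Parse(f[3], CultureInfo.InvariantCulture),
            DateTime.ParseExact(f[4], "M/d/yyyy", CultureInfo.InvariantCulture));
    }

    // Order by payment date, then name, then negative remaining balances first.
    public int CompareTo(Loan other)
    {
        if (other is null) return 1;

        int result = PaymentDate.CompareTo(other.PaymentDate);
        if (result != 0) return result;

        result = string.Compare(Name, other.Name, StringComparison.Ordinal);
        if (result != 0) return result;

        bool thisNegative = Remaining < 0;
        bool otherNegative = other.Remaining < 0;
        if (thisNegative == otherNegative) return 0;
        return thisNegative ? -1 : 1;
    }
}
```

Usage: read each line, add Loan.Parse(line) to a List<Loan>, then call list.Sort(). Sort picks up IComparable<Loan> through Comparer<Loan>.Default, so no LINQ is involved.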
I know you mentioned not wanting to use LINQ. However, you could replace these lines of code here:
using (FileStream fs = new FileStream(@"C:\SourceDatatoedit.csv", FileMode.Open, FileAccess.Read))
{
fs.Seek(offset, SeekOrigin.Begin);
StreamReader sr = new StreamReader(fs);
{
while (!sr.EndOfStream && fs.CanRead)
{
streamsample = sr.ReadLine();
numoflines++;
}// end while block
}//end stream sr block
}
It can become this one line of code:
int numoflines = File.ReadLines("SourceDatatoedit.csv").
Select(line => line.Split(',')).ToList().Count;
Or you could even just get the List like:
var lines = File.ReadLines("SourceDatatoedit.csv").
Select(line => line.Split(',')).ToList();
And get the number of lines afterward
numoflines = lines.Count;
And then continue with your code that you have like:
long[] dataArray = new long[numoflines];
fs.Seek(offset, SeekOrigin.Begin);
StreamReader dr = new StreamReader(fs);
{
numoflines = 0;
streamsample = "";
while (!dr.EndOfStream && fs.CanRead)
{
streamsample = dr.ReadLine();
//pointers.Add(numoflines.ToString());
dataArray[numoflines] = offset;
offset += streamsample.Length - 1;
numoflines++;
}// end while
Or just use the List obtained above and work with it, for example by creating an IComparable implementation as @sfuqua suggested above.

Unexpected output when reading and writing to a text file

I am a bit new to files in C# and am having a problem. When reading from a file and copying to another, the last chunk of text is not being written. Below is my code:
StringBuilder sb = new StringBuilder(8192);
string fileName = "C:...rest of path...inputFile.txt";
string outputFile = "C:...rest of path...outputFile.txt";
using (StreamReader reader = File.OpenText(fileName))
{
char[] buffer = new char[8192];
while ((reader.ReadBlock(buffer, 0, buffer.Length)) != 0)
{
foreach (char c in buffer)
{
//do some function on char c...
sb.Append(c);
}
using (StreamWriter writer = File.CreateText(outputFile))
{
writer.Write(sb.ToString());
}
}
}
My aim was to read and write a text file in a buffered manner, something that in Java I would achieve as follows:
public void encrypt(File inputFile, File outputFile) throws IOException
{
BufferedReader infromfile = null;
BufferedWriter outtofile = null;
try
{
String key = getKeyfromFile(keyFile);
if (key != null)
{
infromfile = new BufferedReader(new FileReader(inputFile));
outtofile = new BufferedWriter(new FileWriter(outputFile));
char[] buffer = new char[8192];
while ((infromfile.read(buffer, 0, buffer.length)) != -1)
{
String temptext = String.valueOf(buffer);
//some changes to temptext are done
outtofile.write(temptext);
}
}
}
catch (FileNotFoundException exc)
{
} // and all other possible exceptions
}
Could you help me identify the source of my problem?
If you think that there is possibly a better approach to achieve buffered i/o with text files, I would truly appreciate your suggestion.
There are a couple of "gotchas":
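c can't be changed (it's the foreach iteration variable); you'll need to copy it in order to process it before writing
you have to keep track of how much of your buffer ReadBlock actually filled: on the last read it fills only part of it, and the rest still holds characters from the previous read (or '\0' on the first pass), which would make your output dirty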
Changing your code like this looks like it works:
//extracted from your code
foreach (char c in buffer)
{
if (c == (char)0) break; //GOTCHA #2: maybe you don't want NULL (ascii 0) characters in your output
char d = c; //GOTCHA #1: you can't change 'c'
// d = SomeProcessingHere();
sb.Append(d);
}
Try this:
string fileName = @"";
string outputfile = @"";
StreamReader reader = File.OpenText(fileName);
string texto = reader.ReadToEnd();
StreamWriter writer = new StreamWriter(outputfile);
writer.Write(texto);
writer.Flush();
writer.Close();
Does this work for you?
using (StreamReader reader = File.OpenText(fileName))
using (StreamWriter writer = File.CreateText(outputFile)) // create once; recreating it per block overwrites the output
{
    char[] buffer = new char[8192];
    bool eof = false;
    while (!eof)
    {
        int numChars = reader.ReadBlock(buffer, 0, buffer.Length);
        if (numChars > 0)
        {
            writer.Write(buffer, 0, numChars); // write only the characters actually read
        }
        else
        {
            eof = true;
        }
    }
}
You still have to take care of character encoding though!
If you can hold the whole file in memory, you could use File.ReadAllText.
This method opens a file, reads all of its text into a single string, and then closes the file. Unlike File.ReadAllLines, it keeps the line terminators (carriage returns and line feeds) in the returned string.
StringBuilder sb = new StringBuilder(8192);
string fileName = "C:...rest of path...inputFile.txt";
string outputFile = "C:...rest of path...outputFile.txt";
// Open the file to read from.
string readText = File.ReadAllText(fileName );
foreach (char c in readText)
{
char newC = c; // do something to c here
sb.Append(newC);
}
// This text is added only once to the file, overwrite it if it exists
File.WriteAllText(outputFile, sb.ToString());
Unless I'm missing something, your issue appears to be that you're overwriting the existing contents of your output file on each ReadBlock iteration.
You call:
using (StreamWriter writer = File.CreateText(outputFile))
{
writer.Write(sb.ToString());
}
for every ReadBlock iteration. The output of the file would only be the last chunk of data that was read.
From MSDN documentation on File.CreateText:
If the file specified by path does not exist, it is created. If the file does exist, its contents are overwritten.
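Putting both fixes together (create the writer once, outside the loop, and only process the characters that the read actually returned), a buffered copy with a per-character transformation could look like this minimal sketch. The Func<char, char> parameter stands in for the question's "do some function on char c" and is an assumption, as are the class and method names:

```csharp
using System;
using System.IO;

public static class BufferedTransform
{
    // Copies reader to writer in 8 KB chunks, applying 'transform' to each character.
    public static void Copy(TextReader reader, TextWriter writer, Func<char, char> transform)
    {
        var buffer = new char[8192];
        int count;
        // Read returns 0 only at end of input; 'count' may be less than buffer.Length.
        while ((count = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < count; i++)
                buffer[i] = transform(buffer[i]);
            writer.Write(buffer, 0, count); // only the characters actually read
        }
    }

    public static void CopyFile(string inputPath, string outputPath, Func<char, char> transform)
    {
        using (StreamReader reader = File.OpenText(inputPath))
        using (StreamWriter writer = File.CreateText(outputPath)) // opened once, outside the loop
        {
            Copy(reader, writer, transform);
        }
    }
}
```

For example, BufferedTransform.CopyFile(fileName, outputFile, char.ToUpperInvariant) would write an upper-cased copy. Separating Copy from the file handling also makes it easy to test with a StringReader and StringWriter.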
