Multi-level sorting on strings - C#

Here is a sample of the raw data I have:
Sana Paden,1098,64228,46285,2/15/2011
Ardelle Mahr,1242,85663,33218,3/25/2011
Joel Fountain,1335,10951,50866,5/2/2011
Ashely Vierra,1349,5379,87475,6/9/2011
Amado Loiacono,1406,62789,38490,7/17/2011
Joycelyn Dolezal,1653,14720,13638,8/24/2011
Alyse Braunstein,1657,69455,52871,10/1/2011
Cheri Ravenscroft,1734,55431,58460,11/8/2011
I used a FileStream with a nested StreamReader to determine, first, how many lines are in the file and, second, to create an array of longs that gives me the start of every line in the file. Code and output follow:
using (FileStream fs = new FileStream(@"C:\SourceDatatoedit.csv", FileMode.Open, FileAccess.Read))
{
    fs.Seek(offset, SeekOrigin.Begin);
    StreamReader sr = new StreamReader(fs);
    {
        while (!sr.EndOfStream && fs.CanRead)
        {
            streamsample = sr.ReadLine();
            numoflines++;
        } // end while block
    } // end stream sr block

    long[] dataArray = new long[numoflines];
    fs.Seek(offset, SeekOrigin.Begin);
    StreamReader dr = new StreamReader(fs);
    {
        numoflines = 0;
        streamsample = "";
        while (!dr.EndOfStream && fs.CanRead)
        {
            streamsample = dr.ReadLine();
            //pointers.Add(numoflines.ToString());
            dataArray[numoflines] = offset;
            offset += streamsample.Length - 1;
            numoflines++;
        } // end while
    }
}
Each line contains a name, an ID, a loan amount, a payment amount, and the payment date.
I have a method in place that returns the remaining amount by subtracting the payment amount from the loan amount and then dividing by 100 to get the dollars-and-cents value.
After doing this I want to order my information by date, then by name, and lastly with negative amounts first. I understand I could create a loan class, build a list of loan objects, and run a LINQ to Objects query against the set to obtain this, but I'm trying to do this without the use of LINQ. Any suggestions?

Depending on the context of your code, you can gain many benefits by introducing a custom class / business object. It will help you achieve a good separation of concerns in your code, and thus move toward more manageable and testable code. You can implement the IComparable interface so that you can invoke a custom Sort on a List of that type.
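For illustration, here is a minimal sketch of what that might look like; the Loan type, its property names, and the treatment of amounts as cents are my assumptions, not code from the question:

using System;

// Hypothetical business object for one CSV record (names are assumptions).
public class Loan : IComparable<Loan>
{
    public string Name { get; set; }
    public int Id { get; set; }
    public int LoanAmount { get; set; }     // raw value from the file, in cents
    public int PaymentAmount { get; set; }  // raw value from the file, in cents
    public DateTime PaymentDate { get; set; }

    // Remaining balance in dollars, per the method described in the question.
    public decimal Remaining => (LoanAmount - PaymentAmount) / 100m;

    // Multi-level ordering: date, then name, then negative amounts first.
    public int CompareTo(Loan other)
    {
        int byDate = PaymentDate.CompareTo(other.PaymentDate);
        if (byDate != 0) return byDate;

        int byName = string.Compare(Name, other.Name, StringComparison.Ordinal);
        if (byName != 0) return byName;

        bool thisNegative = Remaining < 0;
        bool otherNegative = other.Remaining < 0;
        if (thisNegative != otherNegative) return thisNegative ? -1 : 1;
        return 0;
    }
}

With that in place, calling Sort() on a List<Loan> applies this ordering with no LINQ involved.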

I know you mentioned not wanting to use LINQ. However, you could replace these lines of code:
using (FileStream fs = new FileStream(@"C:\SourceDatatoedit.csv", FileMode.Open, FileAccess.Read))
{
    fs.Seek(offset, SeekOrigin.Begin);
    StreamReader sr = new StreamReader(fs);
    {
        while (!sr.EndOfStream && fs.CanRead)
        {
            streamsample = sr.ReadLine();
            numoflines++;
        } // end while block
    } // end stream sr block
}
with this one line:
int numoflines = File.ReadLines("SourceDatatoedit.csv")
    .Select(line => line.Split(',')).ToList().Count;
Or you could even just get the List like:
var lines = File.ReadLines("SourceDatatoedit.csv")
    .Select(line => line.Split(',')).ToList();
And get the number of lines afterward:
numoflines = lines.Count;
And then continue with the code you have:
long[] dataArray = new long[numoflines];
fs.Seek(offset, SeekOrigin.Begin);
StreamReader dr = new StreamReader(fs);
{
    numoflines = 0;
    streamsample = "";
    while (!dr.EndOfStream && fs.CanRead)
    {
        streamsample = dr.ReadLine();
        //pointers.Add(numoflines.ToString());
        dataArray[numoflines] = offset;
        offset += streamsample.Length - 1;
        numoflines++;
    } // end while
}
Or just use the List obtained above and work with it, for example by creating an IComparable implementation as @sfuqua suggested above.
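As a sketch of that last step, the split lines from above could be turned into Loan objects (the hypothetical class sketched earlier) and sorted; the field order matches the sample data, and the date parsing assumes a culture that reads M/d/yyyy:

// Needs System.Collections.Generic and the hypothetical Loan class above.
var loans = new List<Loan>();
foreach (string[] fields in lines) // 'lines' is the List<string[]> from above
{
    loans.Add(new Loan
    {
        Name = fields[0],
        Id = int.Parse(fields[1]),
        LoanAmount = int.Parse(fields[2]),
        PaymentAmount = int.Parse(fields[3]),
        PaymentDate = DateTime.Parse(fields[4]) // assumes M/d/yyyy culture
    });
}
loans.Sort(); // uses the IComparable<Loan> implementation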

Related

ReadLine in StreamReader, and accessing the last line with index -1

I am currently writing a StreamReader tool and have the following problem. I read a line with ReadLine(); the stream then continues with the next line. But I need information about the last character of the line before (especially whether it is a newline or a linefeed).
This is my approach:
I tried several approaches, with ReadBlock and the like, but it seems that the stream itself does not allow me to get back to the position I need to parse the elements.
off = 0;
FileStream stream = new FileStream(filename, FileMode.Open);
using (StreamReader content = new StreamReader(stream, Encoding.UTF8))
{
    String s = "";
    while ((s = content.ReadLine()) != null)
    {
        content.BaseStream.Seek((a == 0) ? off : off - 1, SeekOrigin.Begin);
        //content.BaseStream.Seek(off, SeekOrigin.Current);
        var c = content.Peek();
        char b = (char)c;
        data = s;
        maxlist.Add(data.Length);
        if (data != null)
        {
            offset = offset + (data.Length) + 2;
            offsetindex.Add(offset);
        }
        a++;
        off = off + data.Length - 2;
    }
    content.Close();
}
The expected output is that I can access the line above after ReadLine is called, so I can read its last elements with ReadBlock for exact positioning in the stream.
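One way to sidestep the seeking entirely, sketched below as a suggestion rather than a fix to the code above (it assumes only the terminator type of each line is needed), is to read character by character so each line's terminator is observed directly. Seeking the BaseStream under a live StreamReader is unreliable because the reader buffers ahead, so Peek() reflects the buffer rather than the new stream position.

using System.IO;
using System.Text;

using (var content = new StreamReader(filename, Encoding.UTF8))
{
    var sb = new StringBuilder();
    string lastTerminator = "";
    int ch;
    while ((ch = content.Read()) != -1)
    {
        if (ch == '\r' || ch == '\n')
        {
            lastTerminator = ((char)ch).ToString();
            if (ch == '\r' && content.Peek() == '\n')
                lastTerminator += (char)content.Read(); // consume the '\n' of "\r\n"
            string line = sb.ToString();
            sb.Clear();
            // 'line' is complete here, and lastTerminator records whether
            // it ended with "\n" or "\r\n".
        }
        else
        {
            sb.Append((char)ch);
        }
    }
}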

How to read only the non-empty lines of a file with StreamReader

I want to check whether a file has more than 20 lines (for example) and, if so, prevent the user from uploading the text file. I want to read the file only once. This is my code:
using (Stream stream = fileUploadBatch.FileContent)
{
    stream.Seek(0, SeekOrigin.Begin);
    using (StreamReader streamReader = new StreamReader(stream))
    {
        string line = null;
        List<string> fileLines = new List<string>();
        while (!streamReader.EndOfStream && fileLines.Count < 50)
        {
            line = streamReader.ReadLine();
            if (string.IsNullOrEmpty(line))
                return;
            fileLines.Add(line);
        }
        // do something with the list
    }
}
This seems like bad practice to me, because the result of streamReader.ReadLine() is assigned to the line variable, which I worry creates memory issues (string is immutable).
So I want to add each non-empty line to the list without storing it in the line variable.
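As an aside, reassigning the line variable only rebinds a reference; it does not copy string data, so the variable itself is not a memory problem. But if the goal is simply to reject uploads with too many non-empty lines without keeping them, here is a sketch (reusing the streamReader from above and the 20-line limit from the question) that reads once and stores nothing:

const int maxLines = 20; // limit taken from the question
int nonEmptyCount = 0;
string current;
while ((current = streamReader.ReadLine()) != null)
{
    if (!string.IsNullOrEmpty(current))
        nonEmptyCount++;
    if (nonEmptyCount > maxLines)
        break; // over the limit; no need to read further
}
bool rejectUpload = nonEmptyCount > maxLines;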

Is there a better way to read and modify text lines and write them into an output stream?

I'm currently trying to read a file, modify a few placeholders within it, and then write the file into an output stream. As it's the output stream for a page response in ASP.NET, I'm using the OutputStream.Write method there (the file is an attachment in the end).
Originally I had:
using (FileStream fs = new FileStream(filename, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
    while (readBytes < fs.Length)
    {
        tmpReadBytes = fs.Read(bytes, 0, bytes.Length);
        if (tmpReadBytes > 0)
        {
            readBytes += tmpReadBytes;
            page.Response.OutputStream.Write(bytes, 0, tmpReadBytes);
        }
    }
}
After thinking things over I came up with the following:
foreach (string line in File.ReadLines(filename))
{
    string modifiedLine = line.Replace("#PlaceHolder#", "NewValue");
    byte[] modifiedByteArray = System.Text.Encoding.UTF8.GetBytes(modifiedLine);
    page.Response.OutputStream.Write(modifiedByteArray, 0, modifiedByteArray.Length);
}
But it looks inefficient, especially with the conversions. So my question is: is there any better way of doing this?
Note that the file itself is not very big; it's a text file of about 3-4 KB.
You don't need to handle the bytes yourself.
If you know the file is, and always will be, small:
this.Response.Write(File.ReadAllText("path").Replace("old", "new"));
Otherwise:
using (var stream = new FileStream("path", FileMode.Open))
{
    using (var streamReader = new StreamReader(stream))
    {
        while (streamReader.Peek() != -1)
        {
            // ReadLine() strips the line terminator, so add it back when writing.
            this.Response.Write(streamReader.ReadLine().Replace("old", "new") + Environment.NewLine);
        }
    }
}
To get the lines in a string array:
string[] lines = File.ReadAllLines(file);
To alter the lines, use a loop.
for (int i = 0; i < lines.Length; i++)
{
    lines[i] = lines[i].Replace("#PlaceHolder#", "NewValue");
}
And to save the new text, first join all the lines into a single string. (Using string.Join avoids the stray leading newline that a concatenation loop would add.)
string output = string.Join("\n", lines);
And then save the string to the file.
File.WriteAllText(file, output);

Unable to re-construct a file using byte array retrieved from another file (chunk-by-chunk)

I am currently trying to construct file B by extracting a certain length of bytes from file A (chunk-by-chunk). The size of file B is 38052441 bytes, and its location in file A is from byte 34 onward. If I do it in one shot, I manage to extract file B from file A without any issue, as shown in the snippet below.
test = new byte[38052441];
// madefilePath: file A, madecabfilePath: file B
using (BinaryReader reader = new BinaryReader(new FileStream(madefilePath, FileMode.Open)))
using (BinaryWriter bw = new BinaryWriter(File.Open(madecabfilePath, FileMode.OpenOrCreate)))
{
    reader.BaseStream.Seek(34, SeekOrigin.Begin);
    reader.Read(test, 0, 38052441);
    bw.Write(test);
    bw.Close();
    reader.Close();
}
However, if I try to do it in multiple reads (I have to, because this feature will be ported to the Compact Framework in the future), I keep getting a corrupted file. Currently, I am testing by getting the first 20 MB, writing them into a file, then getting the remaining bytes and writing them into the file again.
int max = 38052474;
int offset = 34;
int weight = 20000000;
bool isComplete = false;
test = null;
test = new byte[weight];
using (BinaryWriter bw = new BinaryWriter(File.Open(madecabfilePath, FileMode.OpenOrCreate)))
using (BinaryReader reader = new BinaryReader(new FileStream(madefilePath, FileMode.Open)))
{
    while (!isComplete)
    {
        if (offset + weight < max)
        {
            reader.BaseStream.Seek(offset, SeekOrigin.Begin);
            reader.Read(test, 0, weight);
            bw.Write(test);
            offset = offset + weight;
        }
        else
        {
            weight = max - offset;
            test = null;
            test = new byte[weight];
            reader.BaseStream.Seek(offset, SeekOrigin.Begin);
            reader.Read(test, 0, weight);
            bw.Write(test);
            // Terminate everything
            reader.Close();
            bw.Close();
            isComplete = true;
        }
    }
}
I think the issue lies with my logic, but I can't figure out why. Any help is appreciated. Thank you.
BinaryReader.Read() returns the number of bytes that were actually read. So you can simplify your logic and probably fix some issues with something like:
using (BinaryWriter bw = new BinaryWriter(File.Open(madecabfilePath, FileMode.OpenOrCreate)))
using (BinaryReader reader = new BinaryReader(new FileStream(madefilePath, FileMode.Open)))
{
    reader.BaseStream.Seek(offset, SeekOrigin.Begin);
    while (!isComplete)
    {
        int bytesRead = reader.Read(test, 0, weight);
        if (bytesRead == 0)
        {
            isComplete = true;
        }
        else
        {
            bw.Write(test, 0, bytesRead);
        }
    }
}
Note that you don't need to explicitly close bw or reader, since the using statements do that for you. Also note that after the first Seek() call, the BinaryReader keeps track of its position automatically.
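For completeness, the setup that snippet assumes might look like the sketch below. Since the loop now runs until Read() returns 0, it copies from offset to the end of file A, which matches the question only if file B runs to the end of the source file, as the sizes given there suggest:

int offset = 34;        // start of file B inside file A, from the question
int weight = 20000000;  // chunk size from the question; any reasonable buffer size works
byte[] test = new byte[weight];
bool isComplete = false;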

Read a text file from a specific position for a specific length

Due to receiving a very bad data file, I have to come up with code that reads a non-delimited text file from a specific starting position and for a specific length, to build up a workable dataset. The text file is not delimited in any way, but I do have the starting and ending position of each string that I need to read. I've come up with this code, but I'm getting an error and can't figure out why, because if I replace the 395 with a 0 it works.
e.g. Invoice number starting position = 395, ending position = 414, length = 20
using (StreamReader sr = new StreamReader(@"\\t.txt"))
{
    char[] c = null;
    while (sr.Peek() >= 0)
    {
        c = new char[20]; // Invoice number string
        sr.Read(c, 395, c.Length); // THIS IS GIVING ME AN ERROR
        Debug.WriteLine("" + c[0] + c[1] + c[2] + c[3] + c[4]..c[20]);
    }
}
Here is the error that I get:
System.ArgumentException: Offset and length were out of bounds for the array or count is greater than the number of elements from index to the end of the source collection. at System.IO.StreamReader.Read(Char[] b
Please Note
Seek() is too low level for what the OP wants. See this answer instead for line-by-line parsing.
Also, as Jordan mentioned, Seek() has the issue of character encodings and varying character sizes (e.g. for non-ASCII and non-ANSI files, like UTF, which is probably not applicable to this question). Thanks for pointing that out.
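To make that encoding point concrete, here is a small sketch (the sample strings are made up) of why a byte offset is not a character offset in UTF-8:

using System;
using System.Text;

class EncodingDemo
{
    static void Main()
    {
        string plain = "Invoice";     // ASCII: 1 byte per character in UTF-8
        string accented = "Fäctura";  // 'ä' takes 2 bytes in UTF-8

        Console.WriteLine(Encoding.UTF8.GetByteCount(plain));    // 7 bytes for 7 chars
        Console.WriteLine(Encoding.UTF8.GetByteCount(accented)); // 8 bytes for 7 chars
        // Seeking N bytes into a UTF-8 stream therefore does not
        // land on the Nth character once non-ASCII text appears.
    }
}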
Original Answer
Seek() is only available on a stream, so try using sr.BaseStream.Seek(..), or use a different stream, like so:
using (Stream s = new FileStream(path, FileMode.Open))
{
    s.Seek(offset, SeekOrigin.Begin);
    s.Read(buffer, 0, length);
}
Here is my suggestion for you:
using (StreamReader sr = new StreamReader(@"\\t.txt"))
{
    char[] c = new char[20]; // Invoice number string
    sr.BaseStream.Position = 395;
    sr.Read(c, 0, c.Length);
}
(new answer based on comments)
You are parsing invoice data, with each entry on a new line, and the required data is at a fixed offset for every line. Stream.Seek() is too low level for what you want to do, because you will need several seeks, one for every line. Rather use the following:
int offset = 395;
int length = 20;
using (StreamReader sr = new StreamReader(@"\\t.txt"))
{
    while (!sr.EndOfStream)
    {
        string line = sr.ReadLine();
        string myData = line.Substring(offset, length);
    }
}
Solved this ages ago; just wanted to post the solution that was suggested:
using (StreamReader sr = new StreamReader(path2))
{
    // Create the columns once, before the loop.
    dsnonhb.Tables[0].Columns.Add("No");
    dsnonhb.Tables[0].Columns.Add("InvoiceNum");
    dsnonhb.Tables[0].Columns.Add("Odo");
    dsnonhb.Tables[0].Columns.Add("PumpVal");
    dsnonhb.Tables[0].Columns.Add("Quantity");
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        DataRow myrow = dsnonhb.Tables[0].NewRow();
        myrow["No"] = rowcounter.ToString();
        myrow["InvoiceNum"] = line.Substring(741, 6);
        myrow["Odo"] = line.Substring(499, 6);
        myrow["PumpVal"] = line.Substring(609, 7);
        myrow["Quantity"] = line.Substring(660, 6);
        dsnonhb.Tables[0].Rows.Add(myrow);
        rowcounter++;
    }
}
I've created a class called AdvancedStreamReader in my Helpers project on GitHub here:
https://github.com/jsmunroe/Helpers/blob/master/Helpers/IO/AdvancedStreamReader.cs
It is fairly robust. It is a subclass of StreamReader and keeps all of that functionality intact. There are a few caveats: a) it resets the position of the stream when it is constructed; b) you should not seek the BaseStream while you are using the reader; c) you need to specify the newline character type if it differs from the environment's, and the file can only use one type. Here are some unit tests to demonstrate how it is used.
[TestMethod]
public void ReadLineWithNewLineOnly()
{
    // Setup
    var text = "ƒun ‼Æ¢ with åò☺ encoding!\nƒun ‼Æ¢ with åò☺ encoding!\nƒun ‼Æ¢ with åò☺ encoding!\nHa!";
    var bytes = Encoding.UTF8.GetBytes(text);
    var stream = new MemoryStream(bytes);
    var reader = new AdvancedStreamReader(stream, NewLineType.Nl);
    reader.ReadLine();

    // Execute
    var result = reader.ReadLine();

    // Assert
    Assert.AreEqual("ƒun ‼Æ¢ with åò☺ encoding!", result);
    Assert.AreEqual(54, reader.CharacterPosition);
}

[TestMethod]
public void SeekCharacterWithUtf8()
{
    // Setup
    var text = $"ƒun ‼Æ¢ with åò☺ encoding!{NL}ƒun ‼Æ¢ with åò☺ encoding!{NL}ƒun ‼Æ¢ with åò☺ encoding!{NL}Ha!";
    var bytes = Encoding.UTF8.GetBytes(text);
    var stream = new MemoryStream(bytes);
    var reader = new AdvancedStreamReader(stream);

    // Pre-condition assert
    Assert.IsTrue(bytes.Length > text.Length); // More bytes than characters in the sample text.

    // Execute
    reader.SeekCharacter(84);

    // Assert
    Assert.AreEqual(84, reader.CharacterPosition);
    Assert.AreEqual("Ha!", reader.ReadToEnd());
}
I wrote this for my own use, but I hope it will help other people.
395 is the index in the c array at which you start writing. There is no index 395 there; the maximum is 19.
I would suggest something like this.
StreamReader r;
...
string allFile = r.ReadToEnd();
int offset = 395;
int length = 20;
And then use
allFile.Substring(offset, length)
