I'm trying to build a Huffman encoder that reads the bytes of a text file and converts them to Huffman codes, and I also need to be able to decode.
So far I've been able to read the bytes from a file and put them in a tree, based on a doubly linked list, that represents the Huffman encoding. I have a text file with this text: "abcabcabd". This is the tree I generate:
My goal is to read out the tree and put the whole output in an array. My expected array would look like this: "1, 01, 000, 1, 01, 000, 1, 01, 001"
The only problem is that I have no idea how to determine where each letter is positioned. I want to do it using the 1/0 method.
So my question is: how do I determine where the 'byte' codes are located? Would I do it in a while loop? I want it to be variable, so creating a fixed library isn't possible. I have tried this:
First I create my tree:
public static void mBitTree()
{
    W = H;
    if (W.N != null)
    {
        int Fn = W.A;
        int Sn = W.N.A;
        int Cn = Fn + Sn;
        cZip n = new cZip(0, Cn);
        // First set the new node to link to its subnodes.
        n.L = W;
        n.R = W.N;
        if (W.N.N != null)
        {
            n.N = H.N.N;
            H.N.N.P = n;
            H.N.N = n;
            // Save and set the new head and fix the links.
            W = H.N.N;
            H.N.N = null;
            H.N.P = null;
            H.N = null;
            H = W;
        }
        // This means there were 2 nodes left, so the newly created one
        // becomes Head and Tail and the tree is complete.
        else
        {
            H.N.P = null;
            H.N = null;
            H = n;
            T = n;
        }
    }
    else
    {
        return;
    }
}
The top node, from which I start, is H (and also T). It also has an L and an R.
The first thing to check is whether H.L is equal to the current byte. If not, go to H.R.L, then H.R.R.L, and so on. This needs to work for bigger text files as well. So I created something like this:
public static string[] mCodedchain(uint[] count, byte[] bytesInFile)
{
    int counter = 0;
    for (int i = 0; i < count.Length; i++) { if (count[i] != 0) counter = counter + (int)count[i]; }
    string[] codedArray = new string[counter];
    for (int i = 0; i < bytesInFile.Length; i++)
    {
        int number = 0;
        while ((int)bytesInFile[i] != number)
        {
            number = H.L.V; // and if not H.L, add R in between to go to H.R.L.V
        }
    }
    return codedArray;
}
But I have no clue how to build up the while loop. I've tried a lot of things, and this one seems the most valid approach, but I can't get it to work.
I'm not quite sure if this question is clear, but I hope it is.
Thanks in advance, and happy coding.
Decoding static Huffman data is actually quite simple.
What you need:
The node tree with symbols (letters/bytes in your case) attached, in the same configuration as was used during encoding
A way to read bits from the stream, one bit at a time
With that in place, here's pseudocode for doing the decoding:
<node> ← <root>
WHILE MORE
<bit> ← <ReadBit()>
IF <bit> IS 1 THEN
<node> ← <node.LEFT>
ELSE
<node> ← <node.RIGHT>
END IF
IF <node.SYMBOL> THEN
OUTPUT <node.SYMBOL> AS DECODED SYMBOL
<node> ← <root>
END IF
END WHILE
Or in plain English:
Start at the root node.
Read 1 bit from the stream. If the bit is 1, go to the left child of the current node, otherwise go to the right child.
Whenever you hit a node with a symbol in it, output the symbol and go back to the root node.
Go back to "read 1 bit" and keep going until you've decoded the entire stream.
NOTE! You need to know how many symbols to output separately. The reason for this is that the very last byte in the encoded stream might have extra bits, and if you decode these as well you might end up with extra symbols, compared to the original data before encoding.
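For illustration, here's a minimal C# sketch of that loop. The Node shape and the readBit delegate are stand-ins of mine, not the cZip structure from the question:
// Assumes: using System; using System.Collections.Generic;
class Node
{
    public Node L, R;    // children; bit 1 = L, bit 0 = R, as above
    public byte? Symbol; // set only on leaf nodes
}

static List<byte> Decode(Node root, Func<int> readBit, int symbolCount)
{
    var output = new List<byte>();
    Node node = root;
    while (output.Count < symbolCount) // the known count avoids decoding padding bits
    {
        node = (readBit() == 1) ? node.L : node.R;
        if (node.Symbol.HasValue)
        {
            output.Add(node.Symbol.Value);
            node = root; // back to the root for the next symbol
        }
    }
    return output;
}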
Related
I have a binary file I am reading and printing into a textbox while wrapping at a set point, but it is wrapping at places it shouldn't. I want to ignore all line-feed characters except those I have defined.
There isn't a single newline byte; rather, it seems to be a series of them. I think I found the series of hex values 00-01-01-0B that seems to correspond with where the line feeds should be.
How do I ignore existing line breaks, and use what I want instead?
This is where I am at:
shortFile = new FileStream(@"tempfile.dat", FileMode.Open, FileAccess.Read);
DisplayArea.Text = "";
byte[] block = new byte[1000];
shortFile.Position = 0;
while (shortFile.Read(block, 0, 1000) > 0)
{
    string trimmedText = System.Text.Encoding.Default.GetString(block);
    DisplayArea.Text += trimmedText + "\n";
}
I had just figured it out a couple of minutes before dlatikay posted, but I really appreciated seeing that he also had the right idea. I just replaced all control characters with spaces.
for (int i = 0; i < block.Length; i++)
{
    if (block[i] < 32)
    {
        block[i] = 0x20; // replace control characters with spaces
    }
}
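If you want to honor only the line breaks you defined, a rough variant (assuming the 00-01-01-0B sequence from the question really is the marker; untested) could replace that sequence with a newline before blanking the rest:
// Assumes: using System.Text;
static readonly byte[] marker = { 0x00, 0x01, 0x01, 0x0B };

static string ToDisplayText(byte[] block, int count) // count = bytes actually read
{
    var sb = new StringBuilder(count);
    for (int i = 0; i < count; i++)
    {
        bool atMarker = i + marker.Length <= count;
        for (int j = 0; atMarker && j < marker.Length; j++)
            if (block[i + j] != marker[j]) atMarker = false;

        if (atMarker)
        {
            sb.Append('\n');
            i += marker.Length - 1; // skip the rest of the marker
        }
        else
        {
            sb.Append(block[i] < 32 ? ' ' : (char)block[i]); // naive byte-to-char mapping
        }
    }
    return sb.ToString();
}
Passing the count returned by Read also avoids decoding the unused tail of the last block.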
I am working with the AODL library for C#. So far I have been able to wholesale import the text of the second document into the first. The issue is I can't quite figure out what I need to grab to make sure the styling is also moved over to the merged document. Below is the simple code I'm using to test.
The closest answer I can find is Merging two .odt files from code, which somewhat answers my question, but it still doesn't tell me where I need to put the styling or where to get it from. It at least lets me know that I need to go through the styles in the second document and make sure there are no matching names in the first, otherwise there will be conflicts. I'm not sure exactly what to do, and documentation has been very slim.
Before you suggest anything, I would like to let you know that, yes, .odt is the file type I need to work with, and doing any kind of interop like Microsoft does with Word is not what I'm after. If there is another library out there that works similarly to AODL, I'm all ears.
TextDocument mergeTemplateDoc = ReadContentsOfFile(mergeTemplateFileName);
TextDocument vehicleTemplateDoc = ReadContentsOfFile(vehicleTemplateFileName);

foreach (IContent piece in vehicleTemplateDoc.Content)
{
    XmlNode newNode = mergeTemplateDoc.XmlDoc.ImportNode(piece.Node, true);
    Paragraph p = ParagraphBuilder.CreateParagraphWithExistingNode(mergeTemplateDoc, newNode);
    mergeTemplateDoc.Content.Add(p);
}

mergeTemplateDoc.SaveTo("MergComplete.odt");
Here is what I ended up doing to solve my issue. Keep in mind I have since migrated to Java since this question was asked, as the library appears to work a little better in that language.
Essentially, the methods below grab the automatic styles that are generated in each document. The code iterates through the second document and finds each style node, checking for the name attribute. That name is then tagged with an extra identifier that is unique to that document, so the styles won't conflict by name when the documents are merged.
mergeFontTypesToPrimaryDoc grabs only the fonts that don't already exist in the primary doc; since fonts are referenced the same way in both documents, no editing is needed there.
updateNodeChildrenStyleNames is a recursive method that makes sure all the inline style nodes are updated, removing any conflicting names between the two documents.
This similar idea should work in C# as well.
private static void mergeStylesToPrimaryDoc(OdfTextDocument primaryDoc, OdfTextDocument secondaryDoc) throws Exception {
    OdfFileDom primaryContentDom = primaryDoc.getContentDom();
    OdfOfficeAutomaticStyles primaryDocAutomaticStyles = primaryDoc.getContentDom().getAutomaticStyles();
    OdfOfficeAutomaticStyles secondaryDocAutomaticStyles = secondaryDoc.getContentDom().getAutomaticStyles();

    // Adopt style nodes from the secondary doc
    for (int i = 0; i < secondaryDocAutomaticStyles.getLength(); i++) {
        Node style = secondaryDocAutomaticStyles.item(i).cloneNode(true);
        if (style.hasAttributes()) {
            NamedNodeMap attributes = style.getAttributes();
            for (int j = 0; j < attributes.getLength(); j++) {
                Node a = attributes.item(j);
                if (a.getLocalName().equals("name")) {
                    a.setNodeValue(a.getNodeValue() + _stringToAddToStyle);
                }
            }
        }
        if (style.hasChildNodes()) {
            updateNodeChildrenStyleNames(style, _stringToAddToStyle, "name");
        }
        primaryDocAutomaticStyles.appendChild(primaryContentDom.adoptNode(style));
    }
}

private static void mergeFontTypesToPrimaryDoc(OdfTextDocument primaryDoc, OdfTextDocument secondaryDoc) throws Exception {
    // Insert referenced font types that are not in the primary document you are merging into
    NodeList sdDomNodes = secondaryDoc.getContentDom().getChildNodes().item(0).getChildNodes();
    NodeList pdDomNodes = primaryDoc.getContentDom().getChildNodes().item(0).getChildNodes();
    OdfFileDom primaryContentDom = primaryDoc.getContentDom();
    Node sdFontNode = null;
    Node pdFontNode = null;

    for (int i = 0; i < sdDomNodes.getLength(); i++) {
        if (sdDomNodes.item(i).getNodeName().equals("office:font-face-decls")) {
            sdFontNode = sdDomNodes.item(i);
            break;
        }
    }
    for (int i = 0; i < pdDomNodes.getLength(); i++) {
        Node n = pdDomNodes.item(i);
        if (n.getNodeName().equals("office:font-face-decls")) {
            pdFontNode = pdDomNodes.item(i);
            break;
        }
    }

    if (sdFontNode != null && pdFontNode != null) {
        NodeList sdFontNodeChildList = sdFontNode.getChildNodes();
        NodeList pdFontNodeChildList = pdFontNode.getChildNodes();
        List<String> fontNames = new ArrayList<String>();

        // Get the list of existing fonts in the primary doc
        for (int i = 0; i < pdFontNodeChildList.getLength(); i++) {
            NamedNodeMap attributes = pdFontNodeChildList.item(i).getAttributes();
            for (int j = 0; j < attributes.getLength(); j++) {
                if (attributes.item(j).getLocalName().equals("name")) {
                    fontNames.add(attributes.item(j).getNodeValue());
                }
            }
        }

        // Check each font in the secondary doc and add it if the primary doesn't have it
        for (int i = 0; i < sdFontNodeChildList.getLength(); i++) {
            Node fontNode = sdFontNodeChildList.item(i).cloneNode(true);
            NamedNodeMap attributes = fontNode.getAttributes();
            String fontName = "";
            for (int j = 0; j < attributes.getLength(); j++) {
                if (attributes.item(j).getLocalName().equals("name")) {
                    fontName = attributes.item(j).getNodeValue();
                    break;
                }
            }
            if (!fontName.equals("") && !fontNames.contains(fontName)) {
                pdFontNode.appendChild(primaryContentDom.adoptNode(fontNode));
            }
        }
    }
}

private static void updateNodeChildrenStyleNames(Node n, String stringToAddToStyle, String nodeLocalName) {
    NodeList childNodes = n.getChildNodes();
    for (int i = 0; i < childNodes.getLength(); i++) {
        Node currentChild = childNodes.item(i);
        if (currentChild.hasAttributes()) {
            NamedNodeMap attributes = currentChild.getAttributes();
            for (int j = 0; j < attributes.getLength(); j++) {
                Node a = attributes.item(j);
                if (a.getLocalName().equals(nodeLocalName)) {
                    a.setNodeValue(a.getNodeValue() + stringToAddToStyle);
                }
            }
        }
        if (currentChild.hasChildNodes()) {
            updateNodeChildrenStyleNames(currentChild, stringToAddToStyle, nodeLocalName);
        }
    }
}
I do not know precisely how it should be coded, but using 7-Zip I have been able to just copy the whole styles.xml from one file to another. Programmatically it should be just as easy.
I always format my files with styles and never with direct formatting; note that wholesale replacement like this is prone to eliminating any local styles.
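For instance, since an .odt file is just a zip container, a sketch like this (using System.IO.Compression from .NET 4.5+; untested against real documents) could do the wholesale copy:
// Assumes: using System.IO; using System.IO.Compression;
// (reference System.IO.Compression and System.IO.Compression.FileSystem)
static void CopyStylesXml(string sourceOdt, string targetOdt)
{
    string stylesXml;
    using (ZipArchive src = ZipFile.OpenRead(sourceOdt))
    using (StreamReader reader = new StreamReader(src.GetEntry("styles.xml").Open()))
    {
        stylesXml = reader.ReadToEnd();
    }

    using (ZipArchive dst = ZipFile.Open(targetOdt, ZipArchiveMode.Update))
    {
        ZipArchiveEntry old = dst.GetEntry("styles.xml");
        if (old != null)
            old.Delete(); // drop the target's styles.xml before replacing it
        using (StreamWriter writer = new StreamWriter(dst.CreateEntry("styles.xml").Open()))
        {
            writer.Write(stylesXml);
        }
    }
}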
I found this answer (to the question "Cleaning a stylesheet of unused styles"): https://www.mobileread.com/forums/showpost.php?s=cbbee08a1204df71ec5cd88bcf222253&p=2100914&postcount=13
It iterates through all the styles in one document. It doesn't show how to incorporate one document into the other, but the backbone is clear.
'---------------------------------------------------------- 03/02/2012
' Remove unused custom styles
' from a text document or a spreadsheet
'---------------------------------------------------------------------
sub stylesPersoInutiles()
    dim coStylesFamilles as object, oStyleFamille as object
    dim oStyle as object, nomFamille as string
    dim f as long, x as long
    dim ts(), buf as string, iRet as integer
    const SEP = ", "

    coStylesFamilles = thisComponent.StyleFamilies
    for f = 0 to coStylesFamilles.count - 1
        ' For each family
        nomFamille = coStylesFamilles.elementNames(f)
        oStyleFamille = coStylesFamilles.getByName(nomFamille)
        buf = ""
        for x = 0 to oStyleFamille.Count - 1
            ' For each style
            oStyle = oStyleFamille(x)
            'xray oStyle
            if (oStyle.isUserDefined) and (not oStyle.isInUse) then
                buf = buf & oStyle.name & SEP
            end if
        next x
        if len(buf) > len(SEP) then
            buf = left(buf, len(buf) - len(SEP))
            iRet = msgBox("Unused custom styles: " _
                & chr(13) & buf & chr(13) & chr(13) _
                & "Delete them?", 4+32+256, nomFamille)
            if iRet = 6 then
                ts = split(buf, SEP)
                for x = 0 to uBound(ts)
                    oStyleFamille.removeByName(ts(x))
                next x
            end if
        end if
    next f
end sub
I have a flat file that is pipe delimited and looks something like this as an example:
ColA|ColB|3*|Note1|Note2|Note3|2**|A1|A2|A3|B1|B2|B3
The first two columns are set and will always be there.
* denotes a count of how many repeating fields follow it, so here Note1, Note2, Note3.
** denotes a count of how many times a block of fields is repeated; there are always 3 fields in a block.
This is per row, so each row may have a different number of fields.
Hope that makes sense so far.
I'm trying to find the best way to parse this file, any suggestions would be great.
The goal at the end is to map all these fields into a few different files (data transformation). I'm actually doing all this within SSIS, but I figured the default components won't be good enough, so I need to write my own code.
UPDATE: I'm essentially trying to read this like a source file, do some lookups and string manipulation on some of the fields in between, and spit out several different files, like in any normal file-to-file transformation SSIS package.
Using the above example, I may want to create a new file that ends up looking like this
"ColA","HardcodedString","Note1CRLFNote2CRLF","ColB"
And then another file
Row1: "ColA","A1","A2","A3"
Row2: "ColA","B1","B2","B3"
So I guess I'm after some ideas on how to parse this, as well as how to store the data (in stacks, lists, or something else) to play with and spit out later.
One possibility would be to use a stack. First you split the line by the pipes.
var stack = new Stack<string>(line.Split('|').Reverse()); // Reverse (System.Linq) so Pop() returns fields left to right
Then you pop the first two from the stack to get them out of the way.
stack.Pop();
stack.Pop();
Then you parse the next element: 3*. For that you pop the next 3 items from the stack. With 2** you pop the next 2 x 3 = 6 items from the stack, and so on. You can stop as soon as the stack is empty.
while (stack.Count > 0)
{
// Parse elements like 3*
}
Hope this is clear enough. I find this article very useful when it comes to String.Split().
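A rough sketch of that loop (untested, and assuming the .Reverse() trick above so Pop() walks left to right, plus the fixed block width of 3 from the question):
// Continues from the stack above, after the first two Pop() calls.
var notes = new List<string>();
var blocks = new List<string[]>();

while (stack.Count > 0)
{
    string marker = stack.Pop();
    int count = int.Parse(marker.TrimEnd('*'));

    if (marker.EndsWith("**"))   // e.g. "2**": count blocks of 3 fields each
    {
        for (int i = 0; i < count; i++)
            blocks.Add(new[] { stack.Pop(), stack.Pop(), stack.Pop() });
    }
    else                         // e.g. "3*": count single fields
    {
        for (int i = 0; i < count; i++)
            notes.Add(stack.Pop());
    }
}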
Something similar to below should work (this is untested)
// line = "ColA|ColB|3*|Note1|Note2|Note3|2**|A1|A2|A3|B1|B2|B3"
string[] columns = line.Split('|');
List<string> repeatingColumnNames = new List<string>();
List<List<string>> repeatingFieldValues = new List<List<string>>();

if (columns.Length > 2)
{
    // columns[2] looks like "3*": the count of repeating fields that follow.
    int repeatingFieldCount = int.Parse(columns[2].TrimEnd('*'));
    int repeatingFieldStartIndex = 3;
    for (int i = 0; i < repeatingFieldCount; i++)
    {
        repeatingColumnNames.Add(columns[repeatingFieldStartIndex + i]);
    }

    // The next marker looks like "2**": the number of 3-field blocks that follow.
    int repeatingFieldSetCountIndex = repeatingFieldStartIndex + repeatingFieldCount;
    int repeatingFieldSetCount = int.Parse(columns[repeatingFieldSetCountIndex].TrimEnd('*'));
    int repeatingFieldSetStartIndex = repeatingFieldSetCountIndex + 1;
    const int fieldsPerSet = 3;
    for (int i = 0; i < repeatingFieldSetCount; i++)
    {
        string[] fieldSet = new string[fieldsPerSet];
        for (int j = 0; j < fieldsPerSet; j++)
        {
            fieldSet[j] = columns[repeatingFieldSetStartIndex + j + (i * fieldsPerSet)];
        }
        repeatingFieldValues.Add(new List<string>(fieldSet));
    }
}
var rows = System.IO.File.ReadAllLines("File.txt").Select(line => line.Split(new[] { '|' }));
I would like to automatically parse a range of numbered sequences from an already sorted List<FileData> of filenames by checking which part of the filename changes.
Here is an example (file extension has already been removed):
First filename: IMG_0000
Last filename: IMG_1000
Numbered Range I need: 0000 and 1000
Except I need to deal with every possible type of file naming convention such as:
0000 ... 9999
20080312_0000 ... 20080312_9999
IMG_0000 - Copy ... IMG_9999 - Copy
8er_green3_00001 ... 8er_green3_09999
etc.
I would like the entire 0-padded range e.g. 0001 not just 1
The sequence number is 0-padded e.g. 0001
The sequence number can be located anywhere e.g. IMG_0000 - Copy
The range can start and end with anything i.e. doesn't have to start with 1 and end with 9999
Numbers may appear multiple times in the filename of the sequence e.g. 20080312_0000
Whenever I get something working for 8 random test cases, the 9th test breaks everything and I end up restarting from scratch.
I've currently been comparing only the first and last filenames (as opposed to iterating through all filenames):
void FindRange(List<FileData> files, out string startRange, out string endRange)
{
string firstFile = files.First().ShortName;
string lastFile = files.Last().ShortName;
...
}
Does anyone have any clever ideas? Perhaps something with Regex?
If you're guaranteed to know the files end with the number (e.g. _\d+), and are sorted, just grab the first and last elements and that's your range. If the filenames are otherwise all the same, you can sort the list to get them in numerical order. Unless I'm missing something obvious here -- where's the problem?
Use a regex to parse out the numbers from the filenames:
(\d+)\D*$
From these parsed strings, find the maximum length, and left-pad any that are less than the maximum length with zeros.
Sort these padded strings alphabetically. Take the first and last from this sorted list to give you your min and max numbers.
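A sketch of those steps (assuming files holds the sorted, extension-less names; the regex is the one above, capturing the last digit run in each name):
// Assumes: using System.Linq; using System.Text.RegularExpressions;
var regex = new Regex(@"(\d+)\D*$");
var numbers = files
    .Select(name => regex.Match(name))
    .Where(m => m.Success)
    .Select(m => m.Groups[1].Value)
    .ToList();

// Left-pad to a common width so alphabetical order matches numeric order.
int width = numbers.Max(n => n.Length);
var ordered = numbers.Select(n => n.PadLeft(width, '0')).OrderBy(n => n).ToList();

string startRange = ordered.First();
string endRange = ordered.Last();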
Firstly, I will assume that the numbers are always zero-padded so that they are the same length. If not then bigger headaches lie ahead.
Secondly, assume that the file names are exactly the same apart from the increment number component.
If these assumptions are true then the algorithm should be to look at each character in the first and last filenames to determine which same-positioned characters do not match.
// Assumes filenames, firstFile and lastFile are already defined,
// and that all names have the same length.
var start = String.Empty;
var end = String.Empty;

for (var index = 0; index < firstFile.Length; index++)
{
    char c = firstFile[index];
    if (filenames.Any(filename => filename[index] != c))
    {
        start += firstFile[index];
        end += lastFile[index];
    }
}
// convert to int if required
edit: Changed to check every filename until a difference is found. Not as efficient as it could be but very simple and straightforward.
Here is my solution. It works with all of the examples that you have provided and it assumes the input array to be sorted.
Note that it doesn't look exclusively for numbers; it looks for a consistent sequence of characters that might differ across all of the strings. So if you provide it with {"0000", "0001", "0002"} it will hand back "0" and "2" as the start and end strings, since that's the only part of the strings that differ. If you give it {"0000", "0010", "0100"}, it will give you back "00" and "10".
But if you give it {"0000", "0101"}, it will whine since the differing parts of the string are not contiguous. If you would like this behavior modified so it will return everything from the first differing character to the last, that's fine; I can make that change. But if you are feeding it a ton of filenames that will have sequential changes to the number region, this should not be a problem.
public static class RangeFinder
{
    public static void FindRange(IEnumerable<string> strings,
        out string startRange, out string endRange)
    {
        using (var e = strings.GetEnumerator()) {
            if (!e.MoveNext())
                throw new ArgumentException("No elements.", "strings");
            if (e.Current == null)
                throw new ArgumentException(
                    "Null element encountered at index 0.", "strings");

            var template = e.Current;
            // If an element in here is true, it means that index differs.
            var matchMatrix = new bool[template.Length];
            int index = 1;
            string last = null;
            while (e.MoveNext()) {
                if (e.Current == null)
                    throw new ArgumentException(
                        "Null element encountered at index " + index + ".", "strings");
                last = e.Current;
                if (last.Length != template.Length)
                    throw new ArgumentException(
                        "Element at index " + index + " has incorrect length.", "strings");
                for (int i = 0; i < template.Length; i++)
                    if (last[i] != template[i])
                        matchMatrix[i] = true;
                index++;
            }

            // Verify the matrix:
            // * There must be at least one true value.
            // * All true values must be consecutive.
            int start = -1;
            int end = -1;
            for (int i = 0; i < matchMatrix.Length; i++) {
                if (matchMatrix[i]) {
                    if (end != -1)
                        throw new ArgumentException(
                            "Inconsistent match matrix; no usable pattern discovered.",
                            "strings");
                    if (start == -1)
                        start = i;
                } else {
                    if (start != -1 && end == -1)
                        end = i;
                }
            }
            if (start == -1)
                throw new ArgumentException(
                    "Strings did not vary; no usable pattern discovered.", "strings");
            if (end == -1)
                end = matchMatrix.Length;

            startRange = template.Substring(start, end - start);
            endRange = last.Substring(start, end - start);
        }
    }
}
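For example, a hypothetical call site (the expected values follow from the contiguity rule above):
string start, end;
RangeFinder.FindRange(
    new[] { "IMG_0001", "IMG_0010", "IMG_0100", "IMG_1000" },
    out start, out end);
// start == "0001", end == "1000"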
I need a fast and efficient method to read a space separated file with numbers into an array. The files are formatted this way:
4 6
1 2 3 4 5 6
2 5 4 3 21111 101
3 5 6234 1 2 3
4 2 33434 4 5 6
The first row is the dimension of the array [rows columns]. The lines following contain the array data.
The data may also be formatted without any newlines like this:
4 6
1 2 3 4 5 6 2 5 4 3 21111 101 3 5 6234 1 2 3 4 2 33434 4 5 6
I can read the first line and initialize an array with the row and column values. Then I need to fill the array with the data values. My first idea was to read the file line by line and use the split function. But the second format listed gives me pause, because the entire array data would be loaded into memory at once. Some of these files are hundreds of MBs. The second method would be to read the file in chunks and then parse them piece by piece. Maybe somebody else has a better way of doing this?
What's your usage pattern for the data once it's loaded? Do you generally need to touch every array element or will you just make sparse/random access?
If you need to touch most array elements, loading it into memory will probably be the best way to go.
If you need to just access certain elements, you might want to lazy load the elements that you need into memory. One strategy would be to determine which of the two layouts the file uses (with/without newline) and create an algorithm to load a particular element directly from disk as needed (seek the given file offset, read and parse). To efficiently re-access the same element it could make sense to keep the element, once read, in a dictionary indexed by the offset. Check the dictionary first before going to the file for a particular value.
On general principle I would take the simple route unless your testing proves that you need a more complicated one (avoid premature optimization).
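If the lazy route does turn out to be necessary, here is a rough, untested sketch of the idea (all names here are made up): one cheap pass records the byte offset of every value, then elements are seeked, parsed, and cached on demand. The offset index itself costs memory, so a coarser index (say, every Nth value) would be the natural next step.
using System.Collections.Generic;
using System.IO;

class LazyArrayFile
{
    private readonly string path;
    private readonly List<long> offsets = new List<long>();           // byte offset of each value
    private readonly Dictionary<int, int> cache = new Dictionary<int, int>();

    public int Rows { get; private set; }
    public int Cols { get; private set; }

    public LazyArrayFile(string path)
    {
        this.path = path;
        using (Stream s = File.OpenRead(path))
        {
            bool inToken = false;
            long pos = 0;
            int b;
            while ((b = s.ReadByte()) != -1)
            {
                bool isDigit = b >= '0' && b <= '9';
                if (isDigit && !inToken) offsets.Add(pos); // start of a new value
                inToken = isDigit;
                pos++;
            }
        }
        Rows = ValueAt(0); // the first two tokens are the header
        Cols = ValueAt(1);
    }

    public int this[int row, int col]
    {
        get { return ValueAt(2 + row * Cols + col); }
    }

    private int ValueAt(int tokenIndex)
    {
        int value;
        if (cache.TryGetValue(tokenIndex, out value)) return value; // dictionary first
        using (Stream s = File.OpenRead(path))
        {
            s.Seek(offsets[tokenIndex], SeekOrigin.Begin);
            value = 0;
            int b;
            while ((b = s.ReadByte()) >= '0' && b <= '9')
                value = value * 10 + (b - '0');
        }
        cache[tokenIndex] = value;
        return value;
    }
}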
Read the file a character at a time. If it's whitespace, start a new number. If it's a digit, use it.
For numbers with multiple digits, keep a counter variable:
int counter = 0;
while (fileOpen) {
char ch = readChar(); // use your imagination to define this method.
if (isDigit(ch)) {
counter *= 10;
counter += asciiToDecimal(ch);
} else if (isWhitespace(ch)) {
appendToArray(counter);
counter = 0;
} else {
// Error?
}
}
Edited for clarification.
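In C#, a concrete version of that sketch might look like this (assuming non-negative integers separated by whitespace):
// Assumes: using System.Collections.Generic; using System.IO;
static List<int> ReadNumbers(string path)
{
    var numbers = new List<int>();
    using (var reader = new StreamReader(path))
    {
        int counter = 0;
        bool inNumber = false;
        int v;
        while ((v = reader.Read()) != -1)
        {
            char ch = (char)v;
            if (char.IsDigit(ch))
            {
                counter = counter * 10 + (ch - '0'); // accumulate multi-digit numbers
                inNumber = true;
            }
            else if (char.IsWhiteSpace(ch))
            {
                if (inNumber) numbers.Add(counter);
                counter = 0;
                inNumber = false;
            }
        }
        if (inNumber) numbers.Add(counter); // flush a trailing number
    }
    return numbers;
}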
How about:
static void Main()
{
// sample data
File.WriteAllText("my.data", @"4 6
1 2 3 4 5 6
2 5 4 3 21111 101
3 5 6234 1 2 3
4 2 33434 4 5 6");
using (Stream s = new BufferedStream(File.OpenRead("my.data")))
{
int rows = ReadInt32(s), cols = ReadInt32(s);
int[,] arr = new int[rows, cols];
for(int y = 0 ; y < rows ; y++)
for (int x = 0; x < cols; x++)
{
arr[y, x] = ReadInt32(s);
}
}
}
private static int ReadInt32(Stream s)
{ // edited to improve handling of multiple spaces etc
int b;
// skip any preceding non-digit characters
while ((b = s.ReadByte()) >= 0 && (b < '0' || b > '9')) { }
if (b < 0) throw new EndOfStreamException();
int result = b - '0';
while ((b = s.ReadByte()) >= '0' && b <= '9')
{
result = result * 10 + (b - '0');
}
return result;
}
Actually, this isn't very specific about the delimiters - it'll pretty much assume that anything that isn't an integer is a delimiter, and it only supports ASCII (you can use a reader if you need other encodings).
Unless the machine you're parsing these text files on is limited, files of a few hundred MB should still fit in memory. I'd suggest going with your first approach of reading by line and using split.
If memory becomes an issue, your second approach of reading in chunks should work fine.
Basically what I'm saying is just to implement it and measure if performance is a problem.
Let's assume we've read the entire file into a string.
You say the first two are rows and columns, so what we definitely need is to parse the numbers.
After that, we can take the first two, create our data structure, and fill it accordingly.
var fileData = File.ReadAllText(...).Split(' ');
var convertedToNumbers = fileData.Select(entry => int.Parse(entry));

int rows = convertedToNumbers.First();
int columns = convertedToNumbers.Skip(1).First();

// Now we have the number of rows, number of columns, and the data.
int[,] resultData = new int[rows, columns];

// Skipping over the rows and columns values.
var indexableData = convertedToNumbers.Skip(2).ToList();
for (int i = 0; i < rows; i++)
    for (int j = 0; j < columns; j++)
        resultData[i, j] = indexableData[i * columns + j];
An alternative would be to read the first two values from a stream, initialize the array, and then read n values at a time, though that would be more complicated. Also, it's best to keep files open for the shortest time possible.
You want to stream the file into memory and parse as you go.
// Note: extension methods must live in a static class.
private static IEnumerable<string> StreamAsSpaceDelimited(this StreamReader reader)
{
    StringBuilder builder = new StringBuilder();
    int v;
    while ((v = reader.Read()) != -1)
    {
        char c = (char)v;
        if (Char.IsWhiteSpace(c))
        {
            if (builder.Length > 0)
            {
                yield return builder.ToString();
                builder.Clear();
            }
        }
        else
        {
            builder.Append(c);
        }
    }
    if (builder.Length > 0) // flush a trailing token
        yield return builder.ToString();
}
This will parse the file into a collection of space-delimited strings (lazily), which you can then read as integers like this:
using (StreamReader sr = new StreamReader("filename"))
{
    var nums = sr.StreamAsSpaceDelimited().Select(s => int.Parse(s));
    var enumerator = nums.GetEnumerator();
    enumerator.MoveNext();
    int numRows = enumerator.Current;
    enumerator.MoveNext();
    int numColumns = enumerator.Current;

    int r = 0, c = 0;
    int[][] destArray = new int[numRows][];
    for (int i = 0; i < numRows; i++)
        destArray[i] = new int[numColumns];

    while (enumerator.MoveNext())
    {
        destArray[r][c] = enumerator.Current;
        c++;
        if (c == numColumns)
        {
            c = 0;
            r++;
            if (r == numRows)
                break; // we are done
        }
    }
}
Because we use iterators, this should never read more than a few chars at a time. This is a common approach for parsing large files (for example, it is how LINQ2CSV works).
Here are two methods:
IEnumerable<int[]> GetArrays(string filename, bool skipFirstLine)
{
    using (StreamReader reader = new StreamReader(filename))
    {
        if (skipFirstLine && !reader.EndOfStream)
            reader.ReadLine();
        while (!reader.EndOfStream)
        {
            string temp = reader.ReadLine();
            int[] array = temp.Trim().Split().Select(s => int.Parse(s)).ToArray();
            yield return array;
        }
    }
}
int[][] GetAllArrays(string filename, bool skipFirstLine)
{
    int skipNumber = 0;
    if (skipFirstLine)
        skipNumber = 1;
    int[][] array = File.ReadAllLines(filename)
        .Skip(skipNumber)
        .Select(line => line.Trim().Split().Select(s => int.Parse(s)).ToArray())
        .ToArray();
    return array;
}
If you're dealing with large files, the first would likely be preferable. If the files are small, the second can load the entire thing into a jagged array.
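For completeness, a hypothetical call site for the streaming variant ("data.txt" is assumed to use the header format from the question):
// Skip the "rows cols" header line and print each parsed row.
foreach (int[] row in GetArrays("data.txt", true))
    Console.WriteLine(string.Join(" ", row));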