How to write millions of double values into a txt file - c#

I've made a neural network and now I need to save the results of the training process into a local file. In total, there are 7,155,264 values. I've tried a loop like this:
string weightsString = "";
string biasesString = "";
for (int l = 1; l < layers.Length; l++)
{
for (int j = 0; j < layers[l].Length; j++)
{
for (int k = 0; k < layers[l - 1].Length; k++)
{
weightsString += weights[l][j, k] + "\n";
}
biasesString += biases[l][j] + "\n";
}
}
File.WriteAllText(@"path", weightsString + "\n" + biasesString);
But it literally takes forever to go through all of the values. Is there no way to write the contents directly without having to write them in a string first?
(Weights is a double[][,] while biases is a double[][])

First off, writing 7 million values will take some time no matter what.
I'd suggest splitting the weights and biases into two files and writing them on the fly; there is no need to keep everything in memory until you are done.
// Must run inside an async method; append: true adds to existing files.
using StreamWriter weightStream = new("weights.txt", append: true);
using StreamWriter biasStream = new("biases.txt", append: true);
for (int l = 1; l < layers.Length; l++)
{
for (int j = 0; j < layers[l].Length; j++)
{
for (int k = 0; k < layers[l - 1].Length; k++)
{
// WriteLineAsync has no double overload, so convert to a string first.
await weightStream.WriteLineAsync(weights[l][j, k].ToString());
}
await biasStream.WriteLineAsync(biases[l][j].ToString());
}
}

But it literally takes forever to go through all of the values. Is there no way to write the contents directly without having to write them in a string first?
One option would be to save it as binary data. This makes it much harder for humans to read, but for large amounts of data it is usually preferable, since it saves a lot of time both when reading and when writing. For example, using BinaryWriter and unsafe code:
myBinaryWriter.Write(myArray.GetLength(0));
myBinaryWriter.Write(myArray.GetLength(1));
fixed (double* ptr = myArray)
{
var span = new ReadOnlySpan<byte>(ptr, myArray.GetLength(0) *myArray.GetLength(1) * 8);
myBinaryWriter.Write(span);
}
You might also consider using a binary serialization library like protobuf-net that can take an object and serialize it to a stream. Note that some libraries may need attributes to be added to classes and properties, and some may have issues with multidimensional and/or jagged arrays. Because of this, it can sometimes be useful to define your own 2D array type that uses a 1D array as the backing storage; this can make things like serialization or passing data to other components much simpler.
Another somewhat common practice is to store metadata, like height, width, etc., in a simple human-readable text file using something like JSON or XML, while keeping the actual data in a separate raw binary file.
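If unsafe code is not an option, a similar round trip can be sketched with Buffer.BlockCopy, which copies the raw bytes of a primitive array (a double[,] included). The MatrixIO class and its method names are made up for illustration; the layout (two Int32 dimensions followed by the raw doubles) mirrors the snippet above. It allocates a temporary byte buffer, so it trades some memory for staying in safe code.

```csharp
using System;
using System.IO;

static class MatrixIO
{
    // Writes the two dimensions, then the raw bytes of every element.
    public static void Write(BinaryWriter writer, double[,] matrix)
    {
        writer.Write(matrix.GetLength(0));
        writer.Write(matrix.GetLength(1));
        var bytes = new byte[matrix.Length * sizeof(double)];
        Buffer.BlockCopy(matrix, 0, bytes, 0, bytes.Length);
        writer.Write(bytes);
    }

    // Reads back a matrix written by Write.
    public static double[,] Read(BinaryReader reader)
    {
        int rows = reader.ReadInt32();
        int cols = reader.ReadInt32();
        int byteCount = rows * cols * sizeof(double);
        byte[] bytes = reader.ReadBytes(byteCount);
        if (bytes.Length != byteCount)
            throw new EndOfStreamException("File ended before all values were read.");
        var matrix = new double[rows, cols];
        Buffer.BlockCopy(bytes, 0, matrix, 0, byteCount);
        return matrix;
    }
}
```

Each weights[l] matrix could be written in turn with the same BinaryWriter and read back in the same order.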

Bad variant - use JSON serialization
So-so variant - write to the file immediately, using File.AppendText
IMHO the best variant - use a DB
IMHO good variant - use BinaryFormatter (you will not be able to read the file yourself, but the application will)
Working variant - use StringBuilder

StringBuilder weightsSB = new StringBuilder();
StringBuilder biasesSB = new StringBuilder();
for (int l = 1; l < layers.Length; l++)
{
for (int j = 0; j < layers[l].Length; j++)
{
for (int k = 0; k < layers[l - 1].Length; k++)
{
weightsSB.Append(weights[l][j, k] + "\n");
}
biasesSB.Append(biases[l][j] + "\n");
}
}
As suggested in the comments, I used a StringBuilder instead. Works like a charm.
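For completeness, here is a rough sketch of the StringBuilder approach as a helper that returns the full text in the same layout as the original attempt (weights, a blank line, then biases). The NetworkText class is a made-up name, and the loop bounds are taken from the array dimensions instead of the layers array, which is an assumption about how the arrays are sized. The invariant culture is used so the decimal separator is always a dot.

```csharp
using System.Globalization;
using System.Text;

static class NetworkText
{
    // Builds "weights, blank line, biases" with one value per line.
    public static string Build(double[][,] weights, double[][] biases)
    {
        var weightsSB = new StringBuilder();
        var biasesSB = new StringBuilder();
        // Index 0 is skipped, matching the loops above that start at l = 1.
        for (int l = 1; l < weights.Length; l++)
        {
            for (int j = 0; j < weights[l].GetLength(0); j++)
            {
                for (int k = 0; k < weights[l].GetLength(1); k++)
                {
                    // "R" asks for a round-trippable representation of the double.
                    weightsSB.Append(weights[l][j, k].ToString("R", CultureInfo.InvariantCulture)).Append('\n');
                }
                biasesSB.Append(biases[l][j].ToString("R", CultureInfo.InvariantCulture)).Append('\n');
            }
        }
        return weightsSB.Append('\n').Append(biasesSB.ToString()).ToString();
    }
}
```

The result can then be written in one call, for example File.WriteAllText(@"path", NetworkText.Build(weights, biases)).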

Related

Printing square with non repetitive character

I want to print a rectangle like this :
&#*#
#*#&
*#&#
#&#*
But the problem is that I can't find the algorithm to print this.
I only know how to print a simple rectangle/square:
public static void Main(string[] args)
{
Console.Out.Write("Enter the size: ");
int taille = int.Parse(Console.In.ReadLine());
int i;
int j;
for(i = 0; i < taille; i++){
for(j = 0; j < taille; j++){
Console.Write("*");
}
Console.WriteLine("");
}
}
Thank you !
First things first: unless you need your iterators outside of your loop, just declare them in the for declaration:
public static void Main(string[] args)
{
Console.Out.Write("Enter the size: ");
int taille = int.Parse(Console.In.ReadLine());
for(int i = 0; i < taille; i++){
for(int j = 0; j < taille; j++){
Console.Write("*");
}
Console.WriteLine("");
}
}
Second, you'll need a list of the characters you want to use. Given your example:
char[] chars = { '&', '#', '*', '#' };
We'll also need a way to know which character to use at any given time, say an index we can call characterIndex for simplicity. We increment it on each iteration; if incrementing it puts it out of the range of our character array (here, characterIndex == 4), we set it back to zero.
int characterIndex;
To get the scrolling effect you have, before each line we must select a characterIndex that is offset by the row
characterIndex = i % chars.Length;
Tying it all together
public static void Main(string[] args)
{
char[] chars = { '&', '#', '*', '#' };
int characterIndex;
Console.Out.Write("Enter the size: ");
int taille = int.Parse(Console.In.ReadLine());
for(int i = 0; i < taille; i++){
characterIndex = i % chars.Length;
for(int j = 0; j < taille; j++){
Console.Write(chars[characterIndex]);
characterIndex++;
if(characterIndex == chars.Length)
characterIndex = 0;
}
Console.WriteLine("");
}
}
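The same scrolling effect can also be sketched without a separate counter: since row i starts i positions into the sequence, the character at column j is simply the one at index (i + j) modulo the array length. The Pattern class below is a hypothetical helper that returns the square as a string instead of printing it.

```csharp
using System.Text;

static class Pattern
{
    // Returns the size x size square as one string, each row ending in '\n'.
    public static string Build(int size, char[] chars)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < size; i++)
        {
            for (int j = 0; j < size; j++)
            {
                // Row i is shifted by i, so column j lands on index i + j, wrapped around.
                sb.Append(chars[(i + j) % chars.Length]);
            }
            sb.Append('\n');
        }
        return sb.ToString();
    }
}
```

Calling Console.Write(Pattern.Build(4, new[] { '&', '#', '*', '#' })) prints the four rows from the question.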
Generating permutations with nested for loops only works if you know exactly how many elements there will be, because you need one for loop per element after the first.
The proper way to deal with this is recursion. There are cases where recursion and nested for loops are interchangeable, and in those cases loops have a potential speed advantage. Normally such a difference is not worth worrying about, but with the sheer amount of data both approaches may have to handle here, it often matters, so prefer loops where possible.
Permutations are, AFAIK, not a case where loops and recursion are interchangeable; recursion seems to be mandatory. Some problems are simply inherently recursive. As the recursive version is fairly well known, I will not post any example code.
You should definitely use recursion. Given your example code, I assume that:
You are in a learning environment
You just learned recursion
An input size that recursion can solve effortlessly (like 6 or 20 elements) is the next assignment

Calling from a public class and from its public method, into 2d double array, my zeros before the fractional point change to number 9

I've been writing a program, to use it as a tool for quick calculations in an online game, and it also helps me a bit to revise C# for my final exam in IT.
Here's my code:
public class ConvertingToArrays
{
public static double[,] CountryVAT(double[,] vat)
{
//I have a table in a .txt file with 20 rows and 6 columns
//and I only need one of the cols.
vat = new double[20, 1];
string[,] convertTableToString = new string[20, 6];
//Here I'm just calling from ReadFromFile public class and its
//public static string[,] Input method and until this point
//everything works fine
convertTableToString = ReadFromFile.Input(convertTableToString);
for (int i = 0; i < 20; i++)
{
for (int j = 0; j < 1; j++)
{
vat[i, j] = double.Parse(convertTableToString[i, 1]);
}
}
return vat;
}
}
With the string to double converting I had no problem, I tested it and it should not be the cause.
class Program
{
public static void Main(string[] args)
{
double[,] vat = new double[20, 1];
vat = ConvertingToArrays.CountryVAT(vat);
Console.WriteLine("Testing ConvertVAT Method Call");
for (int i = 0; i < 20; i++)
{
for (int j = 0; j < 1; j++)
{
Console.Write(vat[i, j] + '\t');
}
Console.WriteLine('\n');
}
Console.ReadKey();
}
}
I'm reading from a .txt file a few numbers like: 0,03; 0,05; 0,4; 0
And for some reason the output for these numbers is: 9,03; 9,05; 9,4; 9
I've tried to look it up on Google but I found nothing. It might be just one subtle and easy thing that I accidentally overlooked (please keep in mind that I started learning to code by myself just 6 months ago and I've practised only 10-12 hours a week).
Can anyone help with a solution?
Look at this line:
Console.Write(vat[i, j] + '\t');
What is going on here? You add a double with a char value. You think you do some string operation, but that is not what your code is actually doing. Note that no strings are involved in the operation above. Both the double variable and the char literal are numeric data types, thus your code is executing an addition of the two numeric values.
What is the numeric value of the tabulator char '\t'? It is 9. So basically your code is doing Console.Write(vat[i, j] + 9);
There are different ways you can change the code. One is to make two Console.Write calls like this:
Console.Write(vat[i, j]);
Console.Write('\t');
Alternatively, you could also force a string concatenation by converting the double value or the tab char to a string before "adding" them:
Console.Write(vat[i, j] + "\t");
or, less elegantly:
Console.Write(vat[i, j].ToString() + '\t');
As a third option you could also use format strings:
Console.Write("{0}{1}", vat[i, j], '\t');
or, simplified:
Console.Write("{0}\t", vat[i, j]);
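The difference between the two kinds of + can be checked in isolation. This is only a small illustration; TabDemo and its method names are made up for the example.

```csharp
static class TabDemo
{
    // '\t' is implicitly converted to its numeric value (9) and added to the double.
    public static double AddChar(double value) => value + '\t';

    // With a string operand, + means concatenation, so the tab is appended as text.
    public static string AddString(double value) => value + "\t";
}
```

TabDemo.AddChar(0.03) evaluates to roughly 9.03, which is exactly the symptom in the question, while TabDemo.AddString(0.03) yields the formatted number followed by a tab character.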

Utilizing c++ from c#

I currently have some c++ code which processes 'char ***myArray' much faster than any other method for string comparison.
I'm also wrapping my c++ into a DLL and calling functions from a C# GUI which uses a 'DataTable'.
I'm curious how I go about passing my 'DataTable' data across to my 'char ***myArray'.
Interface.cs:
DataTable table
cppFunctions.cpp:
int CheckColumn(char ***myArray)
{
int k = 0;
double weight = 0;
for (int i = 1; i < RowCount; i++)
{
for (int j = i + 1; j < RowCount; j++)
{
weight = nGram(myArray[i][colNum], myArray[j][colNum], 3);
k++;
}
}
return k;
}
If I pass int, double, string, or any simple value across it works just fine.
DataTable is part of the .NET FCL, so you cannot pass it directly. The reason int, string, etc. work is that they are simple types the runtime knows how to marshal. You could serialize / de-serialize the DataTable.
Alternatively, you could use marshalling:
http://msdn.microsoft.com/en-us/library/ms235266.aspx

How to use StreamReader in XNA

I have some problems with StreamReader. Firstly, below, is my simple code:
using (StreamReader reader = new StreamReader("Content/Levels/" + mapName + ".txt"))
{
for (int i = 0; i < 20; i++)
for (int j = 0; j < 36; j++)
{
string[] objLoc = reader.ReadLine().Split(',');
map[i, j] = Convert.ToInt32(objLoc[j]);
}
}
So, I have a text file which has rows and columns, just like an array. Each position holds an integer. Those integers are delimited by , chars.
I want to read each character from the position within the text file, and then convert it to an actual integer and add it to a separate array. I'll read from that array to build the map after the code I've shown you.
Being new to C# and programming, I assume that my code actually reads every position from a line using that Split method, and then I use the read char to insert it in the map array.
Am I doing it right? At the moment, I'm getting an exception:
NullReferenceException was unhandled: Object reference not set to an instance of an object.
I've read the documentation from MS also. Stumbled upon numerous similar problems, but none fixed my issue.
Any help would be highly appreciated!
You are reading a whole new line in your inner loop, which means you run out of lines fast; ReadLine() then returns null, and calling Split on it throws the NullReferenceException. You need to read a new line in the outer loop, and loop through the result of the split (the individual elements) in the inner loop.
Try something like:
using (StreamReader reader = new StreamReader("Content/Levels/" + mapName + ".txt"))
{
for (int i = 0; i < 20; i++) {
string[] objLoc = reader.ReadLine().Split(',');
for (int j = 0; j < 36; j++) {
map[i, j] = Convert.ToInt32(objLoc[j]);
}
}
}
Note: you will need to check for errors in case a line does not contain enough elements or the file is too short. The conversion to int might fail as well.
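One possible shape for those checks is sketched below. MapLoader is a hypothetical helper, and throwing InvalidDataException is just one way to report the problems; it takes a TextReader so it works with a StreamReader as well as with a string in a test.

```csharp
using System.Globalization;
using System.IO;

static class MapLoader
{
    // Reads rows x cols comma-separated integers, failing with a clear message on bad input.
    public static int[,] Load(TextReader reader, int rows, int cols)
    {
        var map = new int[rows, cols];
        for (int i = 0; i < rows; i++)
        {
            var line = reader.ReadLine();
            if (line == null) // the file is shorter than expected
                throw new InvalidDataException($"Expected {rows} rows, found only {i}.");

            string[] cells = line.Split(',');
            if (cells.Length < cols) // the line has too few elements
                throw new InvalidDataException($"Row {i} has {cells.Length} values, expected {cols}.");

            for (int j = 0; j < cols; j++)
            {
                // TryParse reports failure instead of throwing a FormatException.
                if (!int.TryParse(cells[j], NumberStyles.Integer, CultureInfo.InvariantCulture, out map[i, j]))
                    throw new InvalidDataException($"Row {i}, column {j}: '{cells[j]}' is not an integer.");
            }
        }
        return map;
    }
}
```

With the file from the question this would be called as MapLoader.Load(reader, 20, 36) inside the using block.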

c# fixed arrays - which structure is fastest to read from?

I have some large arrays of 2D data elements. A and B aren't equally sized dimensions.
A) is between 5 and 20
B) is between 1000 and 100000
Initialization time is not a problem, as these will only be lookup tables for a real-time application, so the performance of indexing elements from known values of A and B is crucial. The data stored is currently a single byte value.
I was thinking around these solutions:
byte[A][B] datalist1a;
or
byte[B][A] datalist2a;
or
byte[A,B] datalist1b;
or
byte[B,A] datalist2b;
or perhaps losing the multiple dimensions, as I know the fixed size, and just multiplying the two values before looking it up.
byte[A*Bmax + B] datalist3;
or
byte[B*Amax + A] datalist4;
What I need is to know, what datatype/array structure to use for most efficient lookup in C# when I have this setup.
Edit 1
the first two solutions were supposed to be multidimensional, not multi arrays.
Edit 2
All data in the smallest dimension is read at each lookup, but the large one is only used for indexing once at a time.
So it's something like: grab all A's from sample B.
I'd bet on the jagged arrays, unless Amax or Bmax is a power of 2.
I'd say so because a jagged array needs two indexed accesses, which is very fast. The other forms imply a multiplication, either implicit or explicit. Unless that multiplication is a simple shift, I think it could be a bit heavier than a couple of indexed accesses.
EDIT: Here is the small program used for the test:
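// Requires unsafe code to be enabled in the project (AllowUnsafeBlocks).
using System;
using System.Diagnostics;
class Program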
{
private static int A = 10;
private static int B = 100;
private static byte[] _linear;
private static byte[,] _square;
private static byte[][] _jagged;
unsafe static void Main(string[] args)
{
//init arrays
_linear = new byte[A * B];
_square = new byte[A, B];
_jagged = new byte[A][];
for (int i = 0; i < A; i++)
_jagged[i] = new byte[B];
//set-up the params
var sw = new Stopwatch();
byte b;
const int N = 100000;
//one-dim array (buffer)
sw.Restart();
for (int i = 0; i < N; i++)
{
for (int r = 0; r < A; r++)
{
for (int c = 0; c < B; c++)
{
b = _linear[r * B + c];
}
}
}
sw.Stop();
Console.WriteLine("linear={0}", sw.ElapsedMilliseconds);
//two-dim array
sw.Restart();
for (int i = 0; i < N; i++)
{
for (int r = 0; r < A; r++)
{
for (int c = 0; c < B; c++)
{
b = _square[r, c];
}
}
}
sw.Stop();
Console.WriteLine("square={0}", sw.ElapsedMilliseconds);
//jagged array
sw.Restart();
for (int i = 0; i < N; i++)
{
for (int r = 0; r < A; r++)
{
for (int c = 0; c < B; c++)
{
b = _jagged[r][c];
}
}
}
sw.Stop();
Console.WriteLine("jagged={0}", sw.ElapsedMilliseconds);
//one-dim array within unsafe access (and context)
sw.Restart();
for (int i = 0; i < N; i++)
{
for (int r = 0; r < A; r++)
{
fixed (byte* offset = &_linear[r * B])
{
for (int c = 0; c < B; c++)
{
b = *(byte*)(offset + c);
}
}
}
}
sw.Stop();
Console.WriteLine("unsafe={0}", sw.ElapsedMilliseconds);
Console.Write("Press any key...");
Console.ReadKey();
Console.WriteLine();
}
}
Multidimensional ([,]) arrays are nearly always the slowest, unless under a heavy random access scenario. In theory they shouldn't be, but it's one of the CLR oddities.
Jagged arrays ([][]) are nearly always faster than multidimensional arrays; even under random access scenarios. These have a memory overhead.
Single-dimensional ([]) and algebraic arrays ([y * stride + x]) are the fastest for random access in safe code.
Unsafe code is, normally, fastest in all cases (provided you don't pin it repeatedly).
The only useful answer to "which X is faster" (for all X) is: you have to do performance tests that reflect your requirements.
And remember to consider, in general*:
Maintenance of the program. If this is not a quick one off, a slightly slower but maintainable program is a better option in most cases.
Micro benchmarks can be deceptive. For instance a tight loop just reading from a collection might be optimised away in ways not possible when real work is being done.
Additionally, consider that you need to look at the complete program to decide where to optimise. Speeding up a loop by 1% might be useful for that loop, but if the loop is only 1% of the complete runtime then it is not making much difference.
* But all rules have exceptions.
On most modern computers, arithmetic operations are far, far faster than memory lookups.
If you fetch a memory address that isn't in a cache, or where out-of-order execution pulls from the wrong place, you are looking at 10-100 clock cycles, whereas a pipelined multiply takes 1 clock cycle.
The other issue is cache locality.
byte[B*Amax + A] datalist4; seems like the best bet if you are accessing with A's varying sequentially.
When datalist4[b*Amax + a] is accessed, the computer will usually start pulling in datalist4[b*Amax + a + 64/sizeof(dataListType)], ... +128 ... etc., or, if it detects a reverse iteration, datalist4[b*Amax + a - 64/sizeof(dataListType)].
Hope that helps!
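As a rough sketch of that layout, the wrapper below stores everything in one byte[] indexed as b*Amax + a, so all A values for one sample B sit next to each other. LookupTable and its members are hypothetical names, and whether this actually beats the other layouts still has to be measured on the real workload.

```csharp
using System;

sealed class LookupTable
{
    private readonly byte[] _data;
    private readonly int _aMax;

    public LookupTable(int aMax, int bMax)
    {
        _aMax = aMax;
        _data = new byte[aMax * bMax];
    }

    // Element access: one multiply, one add, one bounds-checked load.
    public byte this[int a, int b]
    {
        get => _data[b * _aMax + a];
        set => _data[b * _aMax + a] = value;
    }

    // All A values for sample b are adjacent in memory, so they are returned as a slice without copying.
    public ReadOnlySpan<byte> Sample(int b) => new ReadOnlySpan<byte>(_data, b * _aMax, _aMax);
}
```

The "grab all A's from sample B" case then becomes a single call to Sample(b) followed by a loop over the returned span.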
Maybe the best way for you would be to use a HashMap.
Dictionary?
