WPF C#: Splitting a long string - c#

I have been thinking about this problem for a long time, and I am now asking those who know for help. I have code that is supposed to read text line by line from a big file (a couple of GB). Each line can be around 500 MB, because it holds a video converted to base64, followed by the video name. Here I read the current line and separate the video name from its content (the relevant part starts at the else).
string[] fileline = GetFileLine(resPath, currentRow).Split(); //Here split causes SystemOutOfMemory
try
{
string base64 = fileline[0].Replace(specSymbol, ' ');
try
{
if (!IsVideo(ref base64) && !IsGif(ref base64))
{
ShowPrimary();
imgFile.Source = BytesToBitmap(Convert.FromBase64String(base64));
}
else
btnLoadFile.Background = readyColor;
if (fileline.Length > 1)
return fileline[1].Replace(specSymbol, ' ');
}
catch (Exception ex3) { MessageBox.Show("Next(4):" + ex3.Message); }
}
catch (Exception ex2) { MessageBox.Show("Next(3):" + ex2.Message); }
So my question is: is there a way to split such long strings, or do I have to store the names in a separate file so that no splitting is needed?
UPD1: I have written a method based on advice @canton7 gave me. I tested it on really small files (around 100 characters), where it works well. I am now testing it on a 25 MB file, and the reading speed is awful (about 10 MB in an hour). On the other hand, reading really big files no longer crashes the program, so I think I am on the right track.
I still wonder if there is a better method. If you have advice on improving the method below, please share it here.
static string ReadFirstHalfAfter(string path, int skips = 0)
{
int skipsDone = 0;
int ri = 0;
char[] buffer = new char[1];
StreamReader reader = new StreamReader(path);
while (reader.Peek() >= 0)//while reader is not at the end of file
{
reader.Read(buffer, ri, 1);//reading one element from the current position
if (skipsDone < skips)//line skips not enough
{
if (buffer[buffer.Length - 1] == '\n')//current symbol is line end
{
skipsDone++;//line skip counted
continue;
}
}
else//enough line skips
{
if (buffer[buffer.Length - 1] == ' ') break; //if line separator - stop
ExpandArray(ref buffer); //adding one more free element
ri++; //switching element to read next
}
if (ri % 10000 == 0) Console.Write('.');
}
return new string(buffer).Trim(' ');
}

To separate the string into two pieces you can use Substring (with the index of the separator) instead of Split to save memory. If you need to save more memory than that, the only way is to write the two parts of each line on separate rows.
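The method in the question grows its buffer by one element per character, which is the likely cause of the slow reads. As a minimal sketch of an alternative, assuming the same layout (fields separated by a space, lines ended by \n), the hypothetical helper below reads the file in large chunks and collects only the first field of the requested line in a StringBuilder. The class name, method name, and chunk size are illustrative assumptions, not part of the original code.

```csharp
using System;
using System.IO;
using System.Text;

static class LargeLineReader
{
    // Hypothetical helper: returns the text before the first separator
    // on the given zero-based line, without loading the whole line at once.
    public static string ReadFirstField(string path, int lineIndex, char separator = ' ')
    {
        var field = new StringBuilder();
        var buffer = new char[64 * 1024]; // chunk size is an arbitrary assumption
        int linesSkipped = 0;

        using (var reader = new StreamReader(path))
        {
            int read;
            while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
            {
                int i = 0;
                // Skip whole lines until the requested one is reached.
                while (linesSkipped < lineIndex && i < read)
                {
                    if (buffer[i++] == '\n') linesSkipped++;
                }
                if (linesSkipped < lineIndex) continue; // target line starts in a later chunk

                // Copy characters up to the separator or the end of the line.
                int start = i;
                while (i < read && buffer[i] != separator && buffer[i] != '\n') i++;
                field.Append(buffer, start, i - start);
                if (i < read) return field.ToString().TrimEnd('\r'); // stopped inside this chunk
            }
        }
        // End of file reached: return whatever was collected (empty if the line does not exist).
        return field.ToString().TrimEnd('\r');
    }
}
```

Calling LargeLineReader.ReadFirstField(resPath, currentRow) would then stand in for the Split call. The returned field is still held in memory as one string, so this only avoids the extra copies, not the size of the field itself.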

Related

How can I have a .txt file that I read lines from and put them into an array, but every line is split after a certain character

I'm making a console app to navigate my PC.
I have a function called Askforcmd() which lets you write a command. It tests if you wrote a specific thing with ifs and else ifs (what you write is stored in the string "commands").
I'm trying to write all my games in a .txt, separated by newlines, with the location written after each name (separated by "^"),
(example:
portal 2^C:/PathOfGame
portal^C:/path
)
and have the code know that if you write the name of a game, it should open the file at the path after (I know how to open the file).
I know how to read from a txt and put that in an array, but how do I make it stop reading the lines after a certain character and store that in a different array?
What I have so far:
else if (lines.Any(commands.Contains))
{
/*Code to check what game to open and at
what path
*/
Askforcmd();
}
else if (commands == "games")
{
Console.Write("\n");
int count = lines.Length;
int numsss = 0;
int ds;
while (numsss != count)
{
ds = numsss + 1;
Console.WriteLine(ds + ": " + lines[numsss]);
numsss++;
}
Askforcmd();
}
When I run the code and write "games", it lists the games with a number before them.
1: Portal 2
2: Portal
etc
You can take each line you read and call Split on it.
string[] newArray = lines[numsss].Split('^');
You get a new array in which the first element is the game name and the second element is the path.
Edit: As per your comment, you have weird requirements. You could do something like this:
// assume your previous array is lines
List<string> temp = new List<string>();
foreach (string line in lines)
{
    // split once per line: parts[0] is the name, parts[1] is the path
    string[] parts = line.Split('^');
    temp.Add(parts[0]);
    temp.Add(parts[1]);
}
string[] outArray = temp.ToArray();
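Since the goal is to look up a path by game name, a dictionary may be easier to work with than two arrays. The following is only a sketch under that assumption: GameList and Parse are hypothetical names, and the case-insensitive comparison is a design choice that the question does not ask for.

```csharp
using System;
using System.Collections.Generic;

static class GameList
{
    // Parses lines of the form "name^path" into a name -> path lookup.
    public static Dictionary<string, string> Parse(IEnumerable<string> lines)
    {
        // Case-insensitive keys, so "Portal 2" and "portal 2" find the same entry (an assumption).
        var games = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (string line in lines)
        {
            // Split into at most two parts so a '^' inside the path is preserved.
            string[] parts = line.Split(new[] { '^' }, 2);
            if (parts.Length == 2)
                games[parts[0].Trim()] = parts[1].Trim();
            // Lines without a '^' are skipped.
        }
        return games;
    }
}
```

With var games = GameList.Parse(lines);, the check in Askforcmd() could become games.TryGetValue(commands, out string path), which both tests for the name and returns the path to open.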

How to split a string into efficient way c#

I have a string like this:
-82.9494547,36.2913021,0
-83.0784938,36.2347521,0
-82.9537782,36.079235,0
I need to have output like this:
-82.9494547 36.2913021, -83.0784938 36.2347521, -82.9537782,36.079235
I have tried the following code to achieve the desired output:
string[] coordinatesVal = coordinateTxt.Trim().Split(new string[] { ",0" }, StringSplitOptions.None);
for (int i = 0; i < coordinatesVal.Length - 1; i++)
{
coordinatesVal[i] = coordinatesVal[i].Trim();
coordinatesVal[i] = coordinatesVal[i].Replace(',', ' ');
numbers.Append(coordinatesVal[i]);
if (i != coordinatesVal.Length - 1)
{
coordinatesVal.Append(", ");
}
}
But this does not seem like a professional solution to me. Can anyone please suggest a more efficient way of doing this?
Your code is okay. You could drop the temporary results and chain the method calls:
var numbers = new StringBuilder();
string[] coordinatesVal = coordinateTxt
.Trim()
.Split(new string[] { ",0" }, StringSplitOptions.None);
for (int i = 0; i < coordinatesVal.Length - 1; i++) {
numbers
.Append(coordinatesVal[i].Trim().Replace(',', ' '))
.Append(", ");
}
numbers.Length -= 2;
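Note that the last statement assumes that at least one coordinate pair was appended. Split always returns at least one element and the loop skips the last one, so if the input can be empty you would have to enclose the loop and this last statement in if (coordinatesVal.Length > 1) { ... }; otherwise shortening the empty StringBuilder throws an ArgumentOutOfRangeException. This is still more efficient than having an if inside the loop.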
You ask about efficiency, but you don't specify whether you mean code efficiency (execution speed) or programmer efficiency (how much time you have to spend on it).
One key part of professional programming is to judge which one of these is more important in any given situation.
The other answers do a good job of covering programmer efficiency, so I'm taking a stab at code efficiency. I'm doing this at home for fun, but for professional work I would need a good reason before putting in the effort to even spend time comparing the speeds of the methods given in the other answers, let alone try to improve on them.
Having said that, waiting around for the program to finish doing the conversion of millions of coordinate pairs would give me such a reason.
One of the speed pitfalls of C# string handling is the way String.Replace() and String.Trim() return a whole new copy of the string. This involves allocating memory, copying the characters, and eventually cleaning up the garbage generated. Do that a few million times, and it starts to add up. With that in mind, I attempted to avoid as many allocations and copies as possible.
enum CurrentField
{
FirstNum,
SecondNum,
UnwantedZero
};
static string ConvertStateMachine(string input)
{
// Pre-allocate enough space in the string builder.
var numbers = new StringBuilder(input.Length);
var state = CurrentField.FirstNum;
int i = 0;
while (i < input.Length)
{
char c = input[i++];
switch (state)
{
// Copying the first number to the output, next will be another number
case CurrentField.FirstNum:
if (c == ',')
{
// Separate the two numbers by space instead of comma, then move on
numbers.Append(' ');
state = CurrentField.SecondNum;
}
else if (!(c == ' ' || c == '\n'))
{
// Ignore whitespace, output anything else
numbers.Append(c);
}
break;
// Copying the second number to the output, next will be the ,0\n that we don't need
case CurrentField.SecondNum:
if (c == ',')
{
numbers.Append(", ");
state = CurrentField.UnwantedZero;
}
else if (!(c == ' ' || c == '\n'))
{
// Ignore whitespace, output anything else
numbers.Append(c);
}
break;
case CurrentField.UnwantedZero:
// Output nothing, just track when the line is finished and we start all over again.
if (c == '\n')
{
state = CurrentField.FirstNum;
}
break;
}
}
return numbers.ToString();
}
This uses a state machine to treat incoming characters differently depending on whether they are part of the first number, second number, or the rest of the line, and output characters accordingly. Each character is only copied once into the output, then I believe once more when the output is converted to a string at the end. This second conversion could probably be avoided by using a char[] for the output.
The bottleneck in this code seems to be the number of calls to StringBuilder.Append(). If more speed were required, I would first attempt to keep track of how many characters were to be copied directly into the output, then use .Append(string value, int startIndex, int count) to send an entire number across in one call.
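The idea of sending a whole number across in one call could look roughly like the sketch below. It is not the benchmarked code: ConvertWithRuns and the integer field counter are illustrative assumptions, and it additionally skips \r. It tracks where the current run of wanted characters starts and flushes it with StringBuilder.Append(string, int, int) when the run ends.

```csharp
using System.Text;

static class CoordinateConverter
{
    // Sketch: same output as the state machine above, but copies each number in one Append call.
    public static string ConvertWithRuns(string input)
    {
        var numbers = new StringBuilder(input.Length);
        int field = 0;     // 0 = first number, 1 = second number, 2 = unwanted rest of line
        int runStart = -1; // start index of the current run to copy, or -1 if none

        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (field == 2)
            {
                // Output nothing; just wait for the end of the line.
                if (c == '\n') field = 0;
                continue;
            }

            bool wanted = c != ',' && c != ' ' && c != '\n' && c != '\r';
            if (wanted)
            {
                if (runStart < 0) runStart = i; // a new run begins here
                continue;
            }

            // The run (if any) ended: copy it across in a single call.
            if (runStart >= 0)
            {
                numbers.Append(input, runStart, i - runStart);
                runStart = -1;
            }
            if (c == ',')
            {
                numbers.Append(field == 0 ? " " : ", ");
                field++;
            }
        }

        // Flush a run that reaches the end of the input.
        if (runStart >= 0) numbers.Append(input, runStart, input.Length - runStart);
        return numbers.ToString();
    }
}
```

Like the character-by-character version, this leaves a trailing ", " after the last pair. Whether it is actually faster would have to be measured; it was not part of the timings reported here.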
I put a few example solutions into a test harness, and ran them on a string containing 300,000 coordinate-pair lines, averaged over 50 runs. The results on my PC were:
String Split, Replace each line (see Olivier's answer, though I pre-allocated the space in the StringBuilder):
6542 ms / 13493147 ticks, 130.84ms / 269862.9 ticks per conversion
Replace & Trim entire string (see Heriberto's second version):
3352 ms / 6914604 ticks, 67.04 ms / 138292.1 ticks per conversion
- Note: Original test was done with 900000 coord pairs, but this entire-string version suffered an out of memory exception so I had to rein it in a bit.
Split and Join (see Łukasz's answer):
8780 ms / 18110672 ticks, 175.6 ms / 362213.4 ticks per conversion
Character state machine (see above):
1685 ms / 3475506 ticks, 33.7 ms / 69510.12 ticks per conversion
So, the question of which version is most efficient comes down to: what are your requirements?
Your solution is fine. Maybe you could write it a bit more elegantly, like this:
string[] coordinatesVal = coordinateTxt.Trim().Split(new string[] { ",0" },
StringSplitOptions.RemoveEmptyEntries);
string result = string.Empty;
foreach (string line in coordinatesVal)
{
string[] numbers = line.Trim().Split(',');
result += numbers[0] + " " + numbers[1] + ", ";
}
result = result.Remove(result.Length - 2, 2);
Note the StringSplitOptions.RemoveEmptyEntries argument to Split, which means you don't have to deal with empty entries inside the foreach block.
Or you can do extremely short one-liner. Harder to debug, but in simple cases does the work.
string result =
string.Join(", ",
coordinateTxt.Trim().Split(new string[] { ",0" }, StringSplitOptions.RemoveEmptyEntries).
Select(i => i.Replace(",", " ")));
Here's another way, without defining your own loops and replace methods, or using LINQ.
string coordinateTxt = @" -82.9494547,36.2913021,0
-83.0784938,36.2347521,0
-82.9537782,36.079235,0";
string[] coordinatesVal = coordinateTxt.Replace(",", "*").Trim().Split(new string[] { "*0", Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries);
string result = string.Join(",", coordinatesVal).Replace("*", " ");
Console.WriteLine(result);
or even
string coordinateTxt = @" -82.9494540,36.2913021,0
-83.0784938,36.2347521,0
-82.9537782,36.079235,0";
string result = coordinateTxt.Replace(Environment.NewLine, "").Replace(",", " ").Replace(" 0", ", ").Trim(new char[]{ ',',' ' });
Console.WriteLine(result);

Index of the line in StreamWriter

I'm using StreamWriter to write into file, but I need the index of line I'm writing to.
int i;
using (StreamWriter s = new StreamWriter("myfilename", true)) {
i= s.Index(); //or something that works.
s.WriteLine("text");
}
My only idea is to read the whole file and count the lines. Any better solution?
The definition of a line
A line in a file is delimited by the \n character. On Windows this is typically preceded by the carriage return character \r, but that is not required and is not typically present on Linux or macOS.
Correct Solution
So asking for the line index at the current position means asking for the number of \n characters before the current position in the file you are writing to. Since you are appending, that position is the end of the file, so you can think of it as how many lines are in the file.
You can read the stream in chunks and count them, rather than reading the entire file into memory, so this is safe to use on very large files.
// File to read/write
var filePath = @"C:\Users\luke\Desktop\test.txt";
// Write a file with 3 lines
File.WriteAllLines(filePath,
new[] {
"line 1",
"line 2",
"line 3",
});
// Get newline character
char newLine = '\n';
// Create read buffer
var buffer = new char[1024];
// Keep track of amount of data read
var read = 0;
// Keep track of the number of lines
var numberOfLines = 0;
// Read the file
using (var streamReader = new StreamReader(filePath))
{
do
{
// Read the next chunk
read = streamReader.ReadBlock(buffer, 0, buffer.Length);
// If no data read...
if (read == 0)
// We are done
break;
// We read some data, so go through each character...
for (var i = 0; i < read; i++)
// If the character is \n
if (buffer[i] == newLine)
// We found a line
numberOfLines++;
}
while (read > 0);
}
The lazy solution
If your files are not that large (how large depends on the RAM of the intended machine or device and on your program as a whole) and you just want to read the entire file into memory (that is, into your program's RAM), you can use a one-liner:
var numberOfLines = File.ReadAllLines(filePath).Length;
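A middle ground between the two solutions, offered here as a sketch rather than as part of the original answer, is File.ReadLines. Unlike File.ReadAllLines it enumerates lines lazily, so counting them does not hold the whole file in memory. LineCounter and CountLines are hypothetical names.

```csharp
using System.IO;
using System.Linq;

static class LineCounter
{
    // Counts lines by streaming through the file one line at a time.
    public static int CountLines(string path)
    {
        return File.ReadLines(path).Count();
    }
}
```

When appending with StreamWriter, the zero-based index of the next line written would be LineCounter.CountLines("myfilename"), assuming the existing content ends with a newline.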

C# - Reading Text Files(System IO)

I would like to repeatedly read from a text file that is generated by my program. The problem is that after parsing the file for the first time, my program reads the last line of the file again before it can begin re-parsing, which causes it to accumulate unwanted data.
3 photos: first is creating tournament and showing points, second is showing text file and the third is showing that TeamA got more 3 points
StreamReader rd = new StreamReader("Torneios.txt");
torneios = 0;
while (!rd.EndOfStream)
{
string line = rd.ReadLine();
if (line == "Tournament")
{
torneios++;
}
else
{
string[] arr = line.Split('-');
equipaAA = arr[0];
equipaBB = arr[1];
res = Convert.ToChar(arr[2]);
}
}
rd.Close();
That is what I'm using at the moment.
To avoid mistakes like these, I highly recommend using File.ReadAllText or File.ReadAllLines, unless the files are large (in which case they are not good choices). Here is an example:
string result = File.ReadAllText("textfilename.txt");
Regarding your particular code, an example using File.ReadAllLines which achieves this is:
string[] lines = File.ReadAllLines("textfilename.txt");
for(int i = 0; i < lines.Length; i++)
{
string line = lines[i];
//Do whatever you want here
}
Just to make it clear, this is not a good idea if the files you intend to read from are large (such as binary files).

file handling in C# .net

There is a list of things I want to do. I have a forms application.
Go to a particular line. I know how to go in a serial manner, but is there any way by which I can jump to a particular line no.
To find out total no of line.
If the file is not too big, you can try File.ReadAllLines.
This reads the whole file into a string array, where every line is an element of the array.
Example:
var fileName = @"C:\MyFolder\MyFileName.txt";
var contents = System.IO.File.ReadAllLines(fileName);
Console.WriteLine("Line: 10: " + contents[9]);
Console.WriteLine("Number of lines:");
Console.WriteLine(contents.Length);
But be aware: This reads in the whole file into memory.
If the file is too big:
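Open the file (OpenRead), and create a Dictionary to store the offset of every line. Scan the file once and store the byte offset at which each line starts. Now you can go to any line, and you have the number of lines. The scan works on raw bytes because a StreamReader reads ahead in blocks, so its BaseStream.Position does not correspond to the line just returned by ReadLine.
var lineOffset = new Dictionary<int, long>();
using (var fs = System.IO.File.OpenRead(fileName)) {
    int lineNr = 0;
    lineOffset.Add(0, 0); // the first line starts at offset 0
    int b;
    while ((b = fs.ReadByte()) != -1) {
        if (b == '\n') {
            // the next line starts right after the newline
            lineNr++;
            lineOffset.Add(lineNr, fs.Position);
        }
    }
    // Go to line 10 (key 9, because the keys are zero-based)
    fs.Position = lineOffset[9];
    using (var rdr = new System.IO.StreamReader(fs)) {
        var line10 = rdr.ReadLine();
    }
}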
This would help for your first point: jump into file line c#
