Splitting up a text file is incredibly slow - c#

I have a database in a text file, split up by certain words (NUM/OPP/TUM/YUR/etc) to know where each piece of data should go. The issue is that running this is very slow: it would take about an hour to get through all three thousand entries in the .txt file, most likely due to the .Split() (I think).
Is there a faster way to do this?
wholepage = streamReader.ReadToEnd();
while (true)
{
tempword = wholepage.Split()[tempnum];
tempnum++;
if (lastword == "NUM")
{
things[number_of_things].num = tempword;
number_of_things++;
slist.Add(new string[] { tempword });
listBox5.Items.Add(tempword);
}
lastword = tempword;
}
Thanks in advance.
Edit: Thanks for the help guys...and yes it was infinite loop, which didn't matter at the time since it would never make it through the loop once (unless you waited an hour).

Yes: split your input only once, not once every time the loop is executed:
wholepage = streamReader.ReadToEnd();
var split = wholepage.Split();
while (true)
{
tempword = split[tempnum];
// (...)
}
And by the way, you never stop your loop, so it potentially never ends (well, actually it does, when the index is greater than the number of items in the array and an exception is thrown). You should probably use foreach instead of while:
wholepage = streamReader.ReadToEnd();
var split = wholepage.Split();
foreach(var tempword in split)
{
if(lastword == "NUM")
{
things[number_of_things].num = tempword;
number_of_things++;
slist.Add(new string[] { tempword });
listBox5.Items.Add(tempword);
}
lastword = tempword;
}

Don't repeatedly Split!
Take this outside the loop...
myPage = wholepage.Split();
Then in your loop:
tempword = myPage[tempnum];
Also, instead of while(true), which will just keep looping, use a for loop
that runs while tempnum < myPage.Length
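Putting the advice from these answers together, here is a minimal sketch: split exactly once, then walk the resulting array with a bounded for loop. The names `TokenScanner` and `ValuesAfter` are hypothetical and not from the question, and the question's `things`, `slist` and `listBox5` are left out so the example stands alone.

```csharp
using System;
using System.Collections.Generic;

public static class TokenScanner
{
    // Returns every token that directly follows the given marker word (for example "NUM").
    public static List<string> ValuesAfter(string wholePage, string marker)
    {
        // Split exactly once. A null separator array means "split on any whitespace";
        // RemoveEmptyEntries drops the empty tokens left by repeated spaces or newlines.
        string[] tokens = wholePage.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

        var values = new List<string>();
        // Start at 1 so tokens[i - 1] is always valid; the loop stops at the array bound.
        for (int i = 1; i < tokens.Length; i++)
        {
            if (tokens[i - 1] == marker)
            {
                values.Add(tokens[i]);
            }
        }
        return values;
    }
}
```

With the made-up input `"NUM 12 OPP x NUM 34"`, calling `TokenScanner.ValuesAfter(input, "NUM")` should collect `12` and `34`. The array is built once, so the cost is proportional to the file size rather than to file size times token count.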

Related

adding a string to an array

I'm just doing a little project in C# (I'm a beginner), my code is basically asking you "how many words are in this sentence?" and then asks you for every word, once it gets all of them it prints it out with "ba" attached to every word.
I know I'm a real beginner and my code's probably a joke but could you please help me out with this one?
Console.WriteLine("How many words are in this sentence?");
int WordAmount = Convert.ToInt32(Console.ReadLine());
int i = 1;
while (i <= WordAmount)
{
Console.WriteLine("Enter a word");
string[] word = new string[] { Console.ReadLine() };
i++;
}
Console.WriteLine(word + "ba");
You're close, you've just got one issue.
string[] word = new string[] { Console.ReadLine() };
You are creating a new array inside the scope of a while loop. Not only will this disappear on every iteration, meaning you never save the old words, but you also won't be able to use it outside of the loop, making it useless.
Create a string[] words = new string[WordAmount];. Then iterate through it to add your Console.ReadLine() to it, and finally, iterate through it once more and Console.WriteLine(words[i] + "ba");
string[] wordList = new string[WordAmount];
while (i <= WordAmount)
{
Console.WriteLine("Enter a word");
wordList[i-1] = Console.ReadLine() ;
i++;
}
foreach (var item in wordList)
Console.WriteLine(item + "ba");
Working Fiddle: https://dotnetfiddle.net/7UJKwN
Your code has multiple issues. First, you need to define your array outside of your while loop, and then fill it one by one.
In order to read or write an array of strings (string[]), you need to loop through (iterate over) it.
My code iterates over your wordList: the first while loop fills the wordList array, and the second loop prints it.
First of all, consider storing your words in some kind of collection, for example a list.
List<string> words = new List<string>();
while (i <= WordAmount)
{
Console.WriteLine("Enter a word");
string word = Console.ReadLine();
words.Add(word);
i++;
}
I don't think your code compiles - the reason is you are trying to use the word variable outside of the scope that it is defined in. In my solution I have declared and initialized a list of strings (so list of the words in this case) outside of the scope where user has to input words, it is possible to access it in the inner scope (the area between curly brackets where user enters the words).
To print all the words, you have to iterate over the list and add a "ba" part. Something like this:
foreach(var word in words)
{
Console.WriteLine(word + "ba");
}
Or more concisely:
words.ForEach(o => Console.WriteLine(o + "ba"));
If you want to print the sentence without using line breaks, you can use LINQ:
var wordsWithBa = words.Select(o => o + "ba ").Aggregate((a, b) => a + b);
Console.WriteLine(wordsWithBa);
Although I would recommend learning LINQ after you are a bit more familiarized with C# :)
You can look here and here to familiarize yourself with the concept of collections and scopes of variables.
You could also use the StringBuilder class for this task (my LINQ method is not very memory-efficient, but I believe it is good enough for your purpose).
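As a rough illustration of the StringBuilder suggestion, the sketch below builds the whole sentence in one buffer. `BaSentence` and `Build` are hypothetical names introduced only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class BaSentence
{
    // Appends "ba" to each word and joins the results with single spaces.
    public static string Build(IEnumerable<string> words)
    {
        var sb = new StringBuilder();
        foreach (string word in words)
        {
            if (sb.Length > 0)
            {
                sb.Append(' '); // separator only between words, so no trailing space
            }
            sb.Append(word).Append("ba");
        }
        return sb.ToString();
    }
}
```

Unlike the Aggregate version, this variant leaves no trailing space and does not throw on an empty list. For a handful of words the difference in memory use is unlikely to matter.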

Stop a loop before a predictable error happens

So I need to count lines in a textbox. To do this I use:
if (subject.Length <= 20)
{
bool moreLines = true;
int onLine = 2;
int count = 0;
while (moreLines)
{
textBody[count] = TextBox_Text.Text.Split('\n')[onLine];
Console.WriteLine("Line saved: " + textBody[count]);
onLine++;
count++;
try
{
if (TextBox_Text.Text.Split('\n')[onLine] == null)
{
}
}
catch (IndexOutOfRangeException)
{
moreLines = false;
}
}
return true;
}
I insert the split strings into textBody[] array but once I approach the last lines where there is no text I want the loop to stop. I tried to do an if statement which checks if the next line is null, and if yes stop the loop. However, I kept getting an IndexOutOfRangeException so I just put the whole thing in a try catch, but I feel like there would be an easier way to do this?
I think you might have overcomplicated things massively.
The String.Split method has multiple overloads, some of which take a member of the StringSplitOptions enum as an argument. One of its members is called None and the other is called RemoveEmptyEntries, so as far as I understand, all you need is this:
var textBody = TextBox_Text.Text.Split(
new char[] {'\n'},
StringSplitOptions.RemoveEmptyEntries);
An easy way to do this would just to use the following:
TextBox_Text.Text.Split('\n').Length
The Length property returns the length of the array.
So I just used the LineCount property instead and compared it to onLine:
if (onLine >= TextBox_Text.LineCount)
{
moreLines = false;
}
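To tie the answers above together, here is a hedged sketch that splits the text once and keeps everything from a given line onward, so no exception handling is needed to detect the end. `BodyLines` and `From` are hypothetical names. Starting at line index 2 mirrors the question's `onLine = 2`, and the text box is replaced by a plain string parameter.

```csharp
using System;
using System.Linq;

public static class BodyLines
{
    // Splits the text once and returns the non-empty lines from index `firstLine` onward.
    // Note: empty lines are removed before skipping, which shifts indexes
    // if any of the first lines are empty.
    public static string[] From(string text, int firstLine)
    {
        string[] lines = text.Split(
            new[] { "\r\n", "\n" },
            StringSplitOptions.RemoveEmptyEntries);

        // Skip simply yields nothing when there are fewer lines than firstLine.
        return lines.Skip(firstLine).ToArray();
    }
}
```

In the question's setting this would be called as something like `BodyLines.From(TextBox_Text.Text, 2)`, and the length of the returned array is the number of body lines.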

Replace character at specific index in List<string>, but indexer is read only [duplicate]

This is kind of a basic question, but I learned programming in C++ and am just transitioning to C#, so my ignorance of the C# methods are getting in my way.
A client has given me a few fixed length files and they want the 484th character of every odd numbered record, skipping the first one (3, 5, 7, etc...) changed from a space to a 0. In my mind, I should be able to do something like the below:
static void Main(string[] args)
{
List<string> allLines = System.IO.File.ReadAllLines(@"C:\...").ToList();
foreach(string line in allLines)
{
//odd numbered logic here
line[483] = '0';
}
...
//write to new file
}
However, the property or indexer cannot be assigned to because it is read only. All my reading says that I have not set a setter for the variable, and I have tried what was shown at this SO article, but I am doing something wrong every time. Should what is shown in that article work? Should I do something else?
You cannot modify C# strings directly, because they are immutable. You can convert a string to a char[], modify it, then make a string again, and write the result to a file:
File.WriteAllLines(
@"c:\newfile.txt"
, File.ReadAllLines(@"C:\...").Select((s, index) => {
// index is zero-based, so records 3, 5, 7, ... have index 2, 4, 6, ...
if (index == 0 || index % 2 != 0) {
return s; // The first record and even-numbered records do not change
}
var chars = s.ToCharArray();
chars[483] = '0';
return new string(chars);
})
);
Since strings are immutable, you can't modify a single character by treating the string as a char[] and assigning to a specific index. However, you can "modify" it by assigning a new string to the variable.
We can use the Substring() method to return any part of the original string. Combining this with some concatenation, we can take the first part of the string (up to the character you want to replace), add the new character, and then add the rest of the original string.
Also, since we can't directly modify the items in a collection being iterated over in a foreach loop, we can switch your loop to a for loop instead. Now we can access each line by index, and can modify them on the fly:
for (int i = 0; i < allLines.Count; i++) // allLines is a List<string>, so use Count
{
if (allLines[i].Length > 483)
{
allLines[i] = allLines[i].Substring(0, 483) + "0" + allLines[i].Substring(484);
}
}
It's possible that, depending on how many lines you're processing and how many in-line concatenations you end up doing, there is some chance that using a StringBuilder instead of concatenation will perform better. Here is an alternate way to do this using a StringBuilder. I'll leave the perf measuring to you...
var sb = new StringBuilder();
for (int i = 0; i < allLines.Count; i++) // allLines is a List<string>, so use Count
{
if (allLines[i].Length > 483)
{
sb.Clear();
sb.Append(allLines[i].Substring(0, 483));
sb.Append("0");
sb.Append(allLines[i].Substring(484));
allLines[i] = sb.ToString();
}
}
The first item after the foreach (string line in this case) is a local variable that has no scope outside the loop - that’s why you can’t assign a value to it. Try using a regular for loop instead.
The purpose of foreach is to iterate over a container, and the iteration variable is read-only. You should use a regular for loop instead. It will work.
static void Main(string[] args)
{
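List<string> allLines = System.IO.File.ReadAllLines(@"C:\...").ToList();
// A List<string> exposes Count, and the last valid index is Count - 1
for (int i = 0; i < allLines.Count; ++i)
{
if (allLines[i].Length > 483)
{
// Keep the rest of the record after the replaced character
allLines[i] = allLines[i].Substring(0, 483) + "0" + allLines[i].Substring(484);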
}
}
...
//write to new file
}
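None of the loops above restrict the change to records 3, 5, 7 and so on, which is what the question asks for. A minimal sketch of that selection, assuming the records are already loaded into an array, might look like this. `FixedWidthPatch`, `PatchOddRecords` and `ReplaceAt` are hypothetical names, and the replacement is applied unconditionally rather than only when the character is a space.

```csharp
using System;

public static class FixedWidthPatch
{
    // Zero-based position of the 484th character.
    private const int Column = 483;

    // Patches records 3, 5, 7, ... (zero-based indexes 2, 4, 6, ...) in place.
    public static void PatchOddRecords(string[] lines)
    {
        for (int i = 2; i < lines.Length; i += 2)
        {
            lines[i] = ReplaceAt(lines[i], Column, '0');
        }
    }

    // Returns a copy of `s` with the character at `index` replaced.
    // Strings too short to contain that index are returned unchanged.
    public static string ReplaceAt(string s, int index, char replacement)
    {
        if (index < 0 || index >= s.Length)
        {
            return s;
        }
        char[] chars = s.ToCharArray();
        chars[index] = replacement;
        return new string(chars);
    }
}
```

The array returned by `File.ReadAllLines` can be passed straight to `PatchOddRecords` and then written out with `File.WriteAllLines`.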

Can't find string in input file

I have a text file, which I am trying to insert a line of code into. Using my linked-lists I believe I can avoid having to take all the data out, sort it, and then make it into a new text file.
What I did was come up with the code below. I set my bools, but still it is not working. I went through debugger and what it seems to be going on is that it is going through the entire list (which is about 10,000 lines) and it is not finding anything to be true, so it does not insert my code.
Why or what is wrong with this code?
List<string> lines = new List<string>(File.ReadAllLines("Students.txt"));
using (StreamReader inFile = new StreamReader("Students.txt", true))
{
string newLastName = "'Constant";
string newRecord = "(LIST (LIST 'Constant 'Malachi 'D ) '1234567890 'mdcant#mail.usi.edu 4.000000 )";
string line;
string lastName;
bool insertionPointFound = false;
for (int i = 0; i < lines.Count && !insertionPointFound; i++)
{
line = lines[i];
if (line.StartsWith("(LIST (LIST "))
{
values = line.Split(" ".ToCharArray());
lastName = values[2];
if (newLastName.CompareTo(lastName) < 0)
{
lines.Insert(i, newRecord);
insertionPointFound = true;
}
}
}
if (!insertionPointFound)
{
lines.Add(newRecord);
}
You're just reading the file into memory and not committing it anywhere.
I'm afraid that you're going to have to load and completely re-write the entire file. Files support appending, but they don't support insertions.
You can write to a file the same way that you read from it:
string[] lines;
/// instantiate and build `lines`
File.WriteAllLines("path", lines);
WriteAllLines also takes an IEnumerable<string>, so you can pass a List of strings in there if you want.
One more issue: it appears as though you're reading your file twice, once with ReadAllLines and again with your StreamReader.
There are at least four possible errors.
The opening of the StreamReader is not required: you have already read all the lines. (Well, not really an error, but...)
The check for StartsWith can be fooled if your lines start with blank space, and you will miss the insertion point. (Adding a Trim will remove any problem here.)
In the CompareTo line you check for < 0 but you should check for == 0. CompareTo returns 0 if the strings are equivalent, however...
To check whether two strings are equal you should avoid using CompareTo, as explained in the MSDN link above, and use string.Equals instead.
List<string> lines = new List<string>(File.ReadAllLines("Students.txt"));
string newLastName = "'Constant";
string newRecord = "(LIST (LIST 'Constant 'Malachi 'D ) '1234567890 'mdcant#mail.usi.edu 4.000000 )";
string line;
string lastName;
bool insertionPointFound = false;
for (int i = 0; i < lines.Count && !insertionPointFound; i++)
{
line = lines[i].Trim();
if (line.StartsWith("(LIST (LIST "))
{
string[] values = line.Split(" ".ToCharArray());
lastName = values[2];
if (newLastName.Equals(lastName))
{
lines.Insert(i, newRecord);
insertionPointFound = true;
}
}
}
if (!insertionPointFound)
lines.Add(newRecord);
I don't list as an error the missing write back to the file. Hope that you have just omitted that part of the code. Otherwise it is a very simple problem.
(However I think that the way in which CompareTo is used is probably the main reason of your problem)
EDIT: Looking at your comment below, it seems that the answer from Sam I Am is the right one for you. Of course you need to write back the modified list of lines. All the changes are made to an in-memory list of lines, and nothing is written back to a file if you don't have code that writes a file. However, you don't need a new file; you can overwrite the original:
File.WriteAllLines("Students.txt", lines);
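If the intent is to keep the file sorted by last name rather than to find an exact match, the `< 0` comparison from the question is what locates the insertion slot. The sketch below shows that reading under stated assumptions: the file is already sorted, ordinal ordering is acceptable, and the last name is the third space-separated token. `RecordInserter` and `InsertSorted` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class RecordInserter
{
    private const string Prefix = "(LIST (LIST ";

    // Inserts newRecord before the first record whose last name sorts after newLastName,
    // or appends it when no such record exists. Returns the index that was used.
    public static int InsertSorted(List<string> lines, string newRecord, string newLastName)
    {
        for (int i = 0; i < lines.Count; i++)
        {
            string line = lines[i].Trim();
            if (!line.StartsWith(Prefix, StringComparison.Ordinal))
            {
                continue; // not a record line
            }

            string[] values = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            if (values.Length > 2 &&
                string.Compare(newLastName, values[2], StringComparison.Ordinal) < 0)
            {
                lines.Insert(i, newRecord);
                return i;
            }
        }

        lines.Add(newRecord);
        return lines.Count - 1;
    }
}
```

After the call, the list still has to be written back, for example with the `File.WriteAllLines` call shown in the answer above. Ordinal comparison is a choice made for this sketch; the question's culture-sensitive `CompareTo` may order some names differently.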

Replace string.Split with other constructs - Optimization

Here I am using Split function to get the parts of string.
string[] OrSets = SubLogic.Split('|');
foreach (string OrSet in OrSets)
{
bool OrSetFinalResult = false;
if (OrSet.Contains('&'))
{
OrSetFinalResult = true;
if (OrSet.Contains('0'))
{
OrSetFinalResult = false;
}
//string[] AndSets = OrSet.Split('&');
//foreach (string AndSet in AndSets)
//{
// if (AndSet == "0")
// {
// // A single "false" statement makes the entire And statement FALSE
// OrSetFinalResult = false;
// break;
// }
//}
}
else
{
if (OrSet == "1")
{
OrSetFinalResult = true;
}
}
if (OrSetFinalResult)
{
// A single "true" statement makes the entire OR statement TRUE
FinalResult = true;
break;
}
}
How can I replace the Split operation , along with replacement of foreach constructs.
Hypothesis #1
Depending on the kind of processing you do, you can parallelize the work:
var OrSets = SubLogic.Split('|').AsParallel();
foreach (string OrSet in OrSets)
{
...
....
}
However, this can often lead to the usual problems of multithreaded apps (resource locking, etc.).
You also have to measure the benefits. Switching from one thread to another can be costly. If the job is small, AsParallel will be slower than a simple sequential loop.
This is very efficient when you have latency with network resource, or any kind of I/O.
Hypothesis #2
Your SubLogic variable is very, very big.
In this case, you can walk through the string sequentially:
class Program
{
static void Main(string[] args)
{
var SubLogic = "darere|gfgfgg|gfgfg";
using (var sr = new StringReader(SubLogic))
{
var str = string.Empty;
int charValue;
do
{
charValue = sr.Read();
var c = (char)charValue;
if (c == '|' || (charValue == -1 && str.Length > 0))
{
Process(str);
str = string.Empty; // Reset the string
}
else
{
str += c;
}
} while (charValue >= 0);
}
Console.ReadLine();
}
private static void Process(string str)
{
// Your actual Job
Console.WriteLine(str);
}
}
Also, depending on the length of each chunk between | characters, you may want to use a StringBuilder rather than simple string concatenation.
Chances are that if you need to optimize to improve the performance of your application, that the code inside of the foreach loop is what needs to be optimized, not the string.Split method.
[EDIT:]
There are a number of good answers elsewhere on StackOverflow related to optimized string parsing:
Fastest Way to Parse Large Strings (multi threaded)
Fast string parsing in C#
String.Split() likely does more than you can do on your own to actually split the string up in a well-optimized manner. That assumes that you are interested in returning true or false for each split section of your input, of course. Otherwise, you can just focus on searching your string.
As others have mentioned, if you need to search through a huge string (many hundreds of megabytes) and, especially, do so repeatedly and continuously, then look at what .NET 4 gives you with the Task Parallel Library.
For searching through strings, you can look at this example on MSDN for how to use IndexOf, LastIndexOf, StartsWith, and EndsWith methods. Those should perform better than the Contains method.
Of course, the best solution is dependent upon the facts of your particular situation. You'll want to use the System.Diagnostics.Stopwatch class to see how long your implementations (both current and new) take to see what works best.
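A small helper along these lines can make the Stopwatch comparison repeatable. This is only a sketch: `Timing` and `Measure` are hypothetical names, and a single warm-up call is a simplification compared with a proper benchmarking harness.

```csharp
using System;
using System.Diagnostics;

public static class Timing
{
    // Runs `action` `iterations` times and returns the total elapsed time.
    public static TimeSpan Measure(Action action, int iterations)
    {
        action(); // warm-up run so JIT compilation is not part of the measurement

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();
        return sw.Elapsed;
    }
}
```

For example, `Timing.Measure(() => SubLogic.Split('|'), 100000)` could be compared against the same call wrapping an alternative implementation, using the same input for both.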
You could possibly deal with it by using StringBuilder.
Try reading char-by-char from your source string into a StringBuilder until you find '|', then process what the StringBuilder contains.
That way you avoid creating tons of String objects and save a lot of memory.
If you would have used Java, I'd recommend using StringTokenizer and StreamTokenizer classes. It's a pity there are no similar classes in .NET
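In the same spirit as the character-scanning suggestions above, the question's OR/AND evaluation can be sketched without Split, foreach, or any intermediate strings. This assumes every term is exactly "0" or "1", which the question implies but does not state. `LogicEvaluator`, `Evaluate` and `SegmentIsTrue` are hypothetical names.

```csharp
using System;

public static class LogicEvaluator
{
    // Evaluates a string such as "0&1|1&1", where '|' separates OR sets
    // and '&' separates the terms of an AND set.
    public static bool Evaluate(string subLogic)
    {
        int start = 0;
        while (start <= subLogic.Length)
        {
            // Find the end of the current OR set without allocating a substring.
            int end = subLogic.IndexOf('|', start);
            if (end < 0)
            {
                end = subLogic.Length;
            }

            if (SegmentIsTrue(subLogic, start, end))
            {
                return true; // a single true OR set makes the whole expression true
            }
            start = end + 1;
        }
        return false;
    }

    // An AND set is true when it contains at least one '1' and no '0'.
    private static bool SegmentIsTrue(string s, int start, int end)
    {
        bool sawOne = false;
        for (int i = start; i < end; i++)
        {
            if (s[i] == '0')
            {
                return false; // a single false term makes the AND set false
            }
            if (s[i] == '1')
            {
                sawOne = true;
            }
        }
        return sawOne;
    }
}
```

Whether this is actually faster than the Split-based loop depends on the input sizes, so it is worth timing both on real data before switching.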
