I want to provide the user with a selection of questions, but I want them to be random; a quiz game with the same questions every time isn't exactly fun.
My idea was to store a large collection of questions and their appropriate answers in a text file:
What colour is a Strawberry|Red
How many corners are there on a Triangle|Three
This means that I could simply select a line at random, read the question and answer from the line and store them in a collection to be used in the game.
I've come up with some pseudocode for an approach I think would be useful, and I'm looking for input on how it could be improved:
Random rand = new Random();
int line;
string question, answer;

for (int i = 0; i < 20; i++)
{
    line = rand.Next(totalLines); // totalLines = number of lines in the file, so the index stays in range
    // Read question at given line number to string
    // Read answer at given line number to string
    // Copy question and answer to collection
}
In terms of implementing the idea, I'm unsure how to specify a line number to read from, as well as how to split the entire line and read both parts separately. Unless there's a better way, my thought is to manually enter a line number in the text file followed by a "|", so each line looks like this:
1|What colour is a Strawberry|Red
2|How many corners are there on a Triangle|Three
Thanks for any help!
Why not read the entire file into an array or a list using ReadLine, and then refer to a random index within the bounds of the array to pull the question/answer string from, rather than reading from the text file each time you want a question?
As for parsing it, just use Split to split each line at the | delimiter (and make sure that no question contains a | for some reason). This would also let you store some wrong answers with the question (just say that the first one is always right; then, when you output it, you can randomize the order).
You don't want to display any questions twice, right?
Random random = new Random();
var q = File.ReadAllLines("questions.txt")
            .OrderBy(x => random.Next())
            .Take(20)
            .Select(x => x.Split('|'))
            .Select(x => new QuestionAndAnswer { Question = x[0], Answer = x[1] });
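For completeness, a minimal QuestionAndAnswer type that the snippet above assumes, plus example usage:
// Minimal holder type assumed by the query above.
class QuestionAndAnswer
{
    public string Question { get; set; }
    public string Answer { get; set; }
}

// Example usage of the query:
foreach (var qa in q)
    Console.WriteLine(qa.Question + " -> " + qa.Answer);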
I am creating a simple WPF program that will read and write .txt files for a simple character switcher/scrambler.
I want to be able to create, in a file, a list of definitions, one for every character, giving the character it should be replaced with when the program is run.
for example:
a = x
b = r
c = e
d = u
and so on
So essentially I want to be able to select a .txt file and, when my button is pressed, run C# code that replaces every character in the .txt file with its predefined definition from the list above and saves the changed version as a .txt file. I also want to be able to select the changed .txt file and revert it to the original characters, if possible, when another button is pressed.
I already have the WPF program set up with the buttons and the file/directory handling; I need help with the C# implementation of actually making this idea work, as I can't find any resources online that cover what I am describing. Any guidance on where to look and/or what to do next would be appreciated.
First you need to read your mapping file and create a map. I pondered a few ways of doing this, but the easiest is probably to use a dictionary:
var config = File.ReadAllLines(...); // one "a = x" style line per mapping

// Declare these at class level, probably; they're only here to show the definitions.
var scramble = new Dictionary<char, char>();
var unscramble = new Dictionary<char, char>();

foreach (var line in config)
{
    var a = line.First(); // the character to replace, e.g. 'a' (First/Last come from System.Linq)
    var b = line.Last();  // the character it becomes, e.g. 'x'
    scramble[a] = b;
    unscramble[b] = a;
}
Then you have to process your file. It would be more memory efficient to read char by char using a StreamReader than to read it all in as a string and convert it to a char array (which needs twice the memory), but I'll leave that as an exercise for the reader. Here I take the quick and dirty method of reading the file and turning it into a char array immediately, which briefly requires twice the memory if the file is big:
var chars = File.ReadAllText(...).ToCharArray();

for (int i = 0; i < chars.Length; i++)
    if (scramble.ContainsKey(chars[i]))
        chars[i] = scramble[chars[i]];
Writing the chars array back to disk as the newly scrambled file is an exercise for you. Unscrambling follows a similar pattern.
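If you want a starting point, the write-back and the reverse pass could look roughly like this (a sketch; the file names are placeholders):
// Sketch only: file names are placeholders.
File.WriteAllText("scrambled.txt", new string(chars));

// Unscrambling mirrors the scramble pass, using the reverse map.
var back = File.ReadAllText("scrambled.txt").ToCharArray();
for (int i = 0; i < back.Length; i++)
    if (unscramble.ContainsKey(back[i]))
        back[i] = unscramble[back[i]];
File.WriteAllText("restored.txt", new string(back));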
Step 1) Let's get the text into a format that we can read through.
var FileText = File.ReadAllText("filename.txt").ToCharArray();
Step 2) We need to decide how to find the characters in the text. A very basic way to achieve this is to loop through the characters we have, searching for the ones we care about.
for (int i = 0; i < FileText.Length; i++)
{
    switch (FileText[i])
    {
        case 'a':
            FileText[i] = 'x';
            break;
        ... // Fill out cases for each substitution
        default:
            // This character is something we don't care about, so keep looping.
            continue;
    }
}
There are very likely more efficient solutions, but this will at least get the job done while not being too complex.
Edit:
As per Caius's comment, assignment through a string indexer doesn't work; it needed to be a character array. I tested the change, and it seems to work now. The downside of this solution is that it is not configurable, though.
I have a program which counts lines of code (excluding comments, braces, whitespace, etc.) of two programs then compares them. It puts all the lines from one program in one List and the lines from the other program in another List. It then removes all lines that are identical between the two. One List is then all the lines added to program 1 to get program 2 and the other List is all the lines removed from program 1 to get program 2.
Now I need a way to detect how many lines of code from program 1 have been MODIFIED to get program 2. I found an algorithm for the Levenshtein Distance, and it seems like that will work. I just need to compare the distance with the length of the strings to get a percentage changed, and I'll need to come up with a good value for the threshold.
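For concreteness, here's roughly the check I have in mind (a sketch; Levenshtein is a placeholder for whichever edit-distance implementation I end up using):
// Sketch: Levenshtein(a, b) is assumed to return the edit distance between a and b.
static bool IsModified(string a, string b, double threshold)
{
    if (a.Length == 0 && b.Length == 0)
        return false; // two empty lines are identical, not modified
    int distance = Levenshtein(a, b);
    // Normalize by the longer string to get the fraction changed.
    double fractionChanged = (double)distance / Math.Max(a.Length, b.Length);
    return fractionChanged > 0 && fractionChanged <= threshold;
}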
However my problem is this: how do I know which two strings to compare for the Levenshtein Distance? My best guess is to have a nested for loop and loop through one program once for every line in the other program to compare every line with every other line looking for a Distance that meets my difference threshold. However, that seems very inefficient. Are there any other ways of doing this?
I should add this is for a software engineering class. It's technically homework, but we're allowed to use any resource we need. While I'm just looking for an algorithm, I'll let you know I'm using C#.
If you allow lines to be shuffled, how do you count the changes? Not all shuffled lines might result in identical functionality, even if you compare all lines and find exact matches.
If you compare
var random = new Random();
for (int i = 0; i < 9; i++) {
int randomNumber = random.Next(1, 50);
}
to
for (int i = 0; i < 9; i++) {
var random = new Random();
int randomNumber = random.Next(1, 50);
}
you have four unchanged lines of code, but the second version is likely to produce different results. There is definitely a change in the code, and yet line-by-line comparison will not detect it if you allow shuffling.
This is a good reason to disallow shuffling and actually mark line 1 in the first code as deleted, and line 2 in the second code as added, even though the deleted line and the added line are exactly the same.
Once you decide that lines cannot be shuffled, I think you can figure out quite easily how to match your lines for comparison.
To step through both sources and compare the lines, you might want to look up the balance line algorithm (e.g. http://www.isqa.unomaha.edu/haworth/isqa3300/fs006.htm).
If you assume that lines of code can be shuffled (their order can change), then you need to compare all lines from the first program to all lines from the second program, excluding unchanged lines.
You can simplify your task by assuming that lines cannot be shuffled; they can only be inserted, removed, or left unchanged. From my experience, most programs that compare text files work this way.
OK, so I need to know if anyone can see a way to reduce the number of iterations of these loops, because I can't. The first while loop goes through a file, reading one line at a time. The foreach loop then compares each element of compareSet with what was read in the while loop. The innermost while loop does bit counting.
As requested, an explanation of my algorithm:
There is a file that is too large to fit in memory. It contains a word followed by the pages of a very large document that the word appears on, e.g.:
sky 1 7 9 32 ... (it is not in this format, but you get the idea).
So parseLine reads in the line and converts it into a list of ints that act like a bit array, where 1 means the word is on the page and 0 means it isn't.
CompareSet is a bunch of other words. I can't fit my entire list of words into memory, so I can only fit a subset of them. This is a bunch of words just like the "sky" example. I then compare each word in compareSet with sky by seeing if they are on the same pages.
So if sky and some other word both have a 1 set at a certain index in the bit array (simulated as an int array for performance), they are on the same page. The algorithm therefore counts the occurrences of any two words on a particular page. So in the end I will have a list like:
(for each word in the list) is on the same page as (for each word in the list) x number of times,
e.g. sky and land are on the same page x number of times.
while ((line = parseLine(s)) != null) {
    getPageList(line.Item2, compareWord);
    foreach (Tuple<int, uint[], List<Tuple<int, int>>> word in compareSet) {
        unchecked {
            for (int i = 0; i < 327395; i++) {
                if (word.Item2[i] == 0 || compareWord[i] == 0)
                    continue;
                uint combinedNumber = word.Item2[i] & compareWord[i];
                // Kernighan's trick: clear the lowest set bit until none remain.
                while (combinedNumber != 0) {
                    actual++;
                    combinedNumber = combinedNumber & (combinedNumber - 1);
                }
            }
        }
    }
}
As my old professor Bud used to say: "When you see nested loops like this, your spidey senses should be goin' CRAZY!"
You have a while loop with a nested foreach, a nested for, and another while inside that. This nesting of loops multiplies the order of operations. Your one for loop alone has 327,395 iterations. Assuming the surrounding loops have the same or similar numbers of iterations, that means you have an order of operations of
327,395 * 327,395 * 327,395 = 35,092,646,987,154,875 (insane)
It's no wonder that things would be slowing down. You need to redefine your algorithm to remove these nested loops or combine work somewhere. Even if the numbers are smaller than my assumptions, the nesting of the loops is creating a LOT of operations that are probably unnecessary.
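That said, if the innermost bit-counting loop turns out to be the immediate hot spot, one standard trick (a sketch, independent of your exact data layout) is a precomputed 16-bit popcount table, so each uint costs two array lookups instead of one loop iteration per set bit:
// Build once: table[i] = number of set bits in i, for all 16-bit values.
static readonly byte[] PopCount16 = BuildPopCountTable();

static byte[] BuildPopCountTable()
{
    var table = new byte[1 << 16];
    for (int i = 1; i < table.Length; i++)
        table[i] = (byte)(table[i >> 1] + (i & 1));
    return table;
}

// Two lookups per uint instead of the inner while loop.
static int CountBits(uint v)
{
    return PopCount16[v & 0xFFFF] + PopCount16[v >> 16];
}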
As Joal already mentioned, nobody is able to optimize this looping algorithm blindly. But what you can do is try to better explain what you are trying to accomplish and what your hard requirements are. Maybe you can take a different approach, using something like HashSet<T>.IntersectWith() or a Bloom filter.
So if you really want help from here, you should not only post the code that doesn't work, but also the overall task you'd like to accomplish. Maybe someone has a completely different idea that solves your problem and makes your whole algorithm obsolete.
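To illustrate the HashSet<T>.IntersectWith() idea with made-up page numbers:
// Toy example: if each word's pages were kept as a HashSet<int>,
// the co-occurrence count is just the size of the intersection.
var skyPages = new HashSet<int> { 1, 7, 9, 32 };
var landPages = new HashSet<int> { 7, 32, 40 };

var common = new HashSet<int>(skyPages);
common.IntersectWith(landPages); // common is now { 7, 32 }
int timesOnSamePage = common.Count; // 2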
I need help on an algorithm. I have randomly generated numbers with 6 digits, like:
123654
109431
There are approximately 1 million of them saved in a file, line by line. I have to filter them according to the rule I'll try to describe below.
Take a number and compare it to all the others digit by digit. If another number matches it except that one digit is bigger by exactly one, delete that number. Let me show it with numbers.
Our number is: 123456
Increase the first digit by 1, so the number becomes 223456. Delete all the 223456s from the file.
Increase the second digit by 1, the number becomes: 133456. Delete all 133456s from the file, and so on...
I can do it just as I describe, but I need it to be "FAST".
So can anyone help me on this?
Thanks.
First of all, since there are around 1 million numbers, you had better perform the algorithm in RAM, not on disk; that is, first load the contents into an array, then modify the array, then write the results back into the file.
I would suggest the following straightforward algorithm. Precalculate all the target numbers, in this case 223456, 133456, 124456, 123556, 123466, 123457. Now pass over the array and, if a number is NOT any of these, write it to another array. Alternatively, if it is one of these numbers, delete it (recommended if your data structure has O(1) removal).
This algorithm will keep a lot of numbers around in memory, but it will process the file one number at a time so you don't actually need to read it all in at once. You only need to supply an IEnumerable<int> for it to operate on.
public static IEnumerable<int> FilterInts(IEnumerable<int> ints)
{
    var removed = new HashSet<int>();
    foreach (var i in ints)
    {
        // Render the number as six digits so each position can be bumped.
        var iStr = i.ToString("000000").ToCharArray();
        for (int j = 0; j < iStr.Length; j++)
        {
            var c = iStr[j];
            if (c == '9')
                iStr[j] = '0';
            else
                iStr[j] = (char)(c + 1);
            removed.Add(int.Parse(new string(iStr)));
            iStr[j] = c; // restore the digit before moving on
        }
        if (!removed.Contains(i))
            yield return i;
    }
}
You can use this method to create an IEnumerable<int> from the file:
public static IEnumerable<int> ReadIntsFrom(string path)
{
    using (var reader = File.OpenText(path))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
            yield return int.Parse(line);
    }
}
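Wiring the two together (the file name is a placeholder):
// Prints every number that survives the filter.
foreach (int n in FilterInts(ReadIntsFrom("numbers.txt")))
    Console.WriteLine(n);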
Take all the numbers from the file into an ArrayList, then:
1. take the number of threads to be the number of digits;
2. increment the first digit of the number in the first thread, the second digit in the second thread, and so on, then compare each result with the rest of the numbers.
It should be fast, as the work happens in parallel...
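A rough sketch of that idea using Parallel.For (assumptions: numbers is an int[] already loaded from the file, target is the number being processed, and the carry case where a digit is 9 is ignored):
// Sketch only: one parallel iteration per digit position.
var toDelete = new System.Collections.Concurrent.ConcurrentBag<int>();
System.Threading.Tasks.Parallel.For(0, 6, digit =>
{
    int increment = (int)Math.Pow(10, 5 - digit); // 100000, 10000, ..., 1
    int variant = target + increment;             // one digit bumped by one (no carry handling)
    foreach (int n in numbers)
        if (n == variant)
            toDelete.Add(n);
});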
All the suggestions (so far) require six comparisons per input line, which is not necessary. The numbers are coming in as strings, so use string comparisons.
Start with #Armen Tsirunyan's idea:
Precalculate all the target numbers,
in this case 223456, 133456, 124456,
123556, 123466, 123457.
But instead of making single comparisons, combine the targets into one string:
string arg = "223456 133456 124456 123556 123466 123457";
Then read through the input (either from file or in memory). Pseudocode:
foreach (string s in theBigListOfNumbers)
    if (arg.indexOf(s) == -1)
        print s;
This is just one comparison per input line, no dictionaries, maps, iterators, etc.
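In real C#, that pseudocode is essentially (theBigListOfNumbers stands in for however you enumerate the input lines):
foreach (string s in theBigListOfNumbers)
    if (!arg.Contains(s))
        Console.WriteLine(s);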
Edited to add:
On x86 instruction-set processors (not just the Intel brand), substring searches like this are very fast. Searching for a character within a string, for example, can be done in a single machine instruction.
I'll have to ask others to weigh in on alternate architectures.
For starters, I would just read all the numbers into an array.
When you are finally done, rewrite the file.
It seems like the rule you're describing is: for the target number abcdef, you want to find all numbers that contain a+1, b+1, c+1, d+1, e+1, or f+1 in the appropriate place. You can do this in O(n) by looping over the lines in the file and comparing each of the six digits to the corresponding digit in the target number; if no digits match, write the number to an output file.
This sounds like a potential case for a multidimensional array, and possibly also unsafe C# code, so that you can use pointer math to iterate through such a large quantity of numbers.
I would have to dig into it further, but I would also probably use a Dictionary for non-linear lookups if you are comparing numbers that aren't in sequence.
How about this: you process the numbers one by one, storing them in two hash tables, NumbersOK and NumbersNotOK.
1. Take one number.
2. If it's not in NumbersNotOK, place it in NumbersOK.
3. Put its variants (each single digit incremented by one) into the NumbersNotOK hash.
4. Remove any NumbersOK members that match one of the variants.
5. Repeat from 1 until the end of the file.
6. Save NumbersOK to the file.
This way you'll pass the list just once. Hash tables are made for just this kind of purpose and will be very fast (no expensive comparison methods).
This algorithm is not complete, as it doesn't handle repeated numbers, but that can be handled with some tweaking...
Read all your numbers from the file and store them in a map where the number is the key and a boolean is the value, signifying whether the number still exists (true means it exists, false means deleted).
Then iterate through your keys. For each key, set the map to false for the values you would be deleting from the list.
Iterate through your list one more time and get all the keys where the value is true. This is the list of remaining numbers.
public List<int> FilterNumbers(string fileName)
{
    Dictionary<int, bool> numbers = new Dictionary<int, bool>();
    using (StreamReader sr = File.OpenText(fileName))
    {
        string s;
        while ((s = sr.ReadLine()) != null)
        {
            int number = Int32.Parse(s);
            numbers.Add(number, true);
        }
    }
    // Snapshot the keys so we can flip values while iterating (ToList needs System.Linq).
    foreach (int number in numbers.Keys.ToList())
    {
        if (numbers[number])
        {
            if (numbers.ContainsKey(100000 + number))
                numbers[100000 + number] = false;
            if (numbers.ContainsKey(10000 + number))
                numbers[10000 + number] = false;
            if (numbers.ContainsKey(1000 + number))
                numbers[1000 + number] = false;
            if (numbers.ContainsKey(100 + number))
                numbers[100 + number] = false;
            if (numbers.ContainsKey(10 + number))
                numbers[10 + number] = false;
            if (numbers.ContainsKey(1 + number))
                numbers[1 + number] = false;
        }
    }
    List<int> validNumbers = new List<int>();
    foreach (int number in numbers.Keys)
    {
        if (numbers[number]) // keep only the numbers still marked as valid
            validNumbers.Add(number);
    }
    return validNumbers;
}
This may need to be tested, as I don't have a C# compiler on this computer and I'm a bit rusty. The algorithm takes a bit of memory, but it runs in linear time.
** EDIT **
This runs into problems whenever one of the digits is 9. I'll update the code later.
Still sounds like a homework question... The fastest sort of a million numbers is n log(n), that is 1,000,000 * log(1,000,000), that is 6 * 1,000,000, which is the same as comparing 6 numbers to each of the million numbers. So a direct comparison will be faster than sort-and-remove, because after sorting you still have to compare to remove. Unless, of course, my calculations have entirely missed the target.
Something else comes to mind: when you pick up the number, read it as hex and not base 10; then maybe some bitwise operators can help somehow.
Still thinking about what can be done with this. Will update if it works.
EDIT: currently thinking along the lines of Gray code. 123456 (our original number) and 223456 or 133456 differ in only one digit, and a Gray code converter will catch that fast. It's late at night here, so if someone else finds this useful and can give a solution...
I am trying to compare two large datasets from a SQL query. Right now the SQL query is done externally, and the results from each dataset are saved into their own CSV files. My little C# console application loads up the two text/CSV files, compares them for differences, and saves the differences to a text file.
It's a very simple application that just loads all the data from the first file into an ArrayList and does a .compare() on the ArrayList as each line is read from the second CSV file. Then it saves the records that don't match.
The application works, but I would like to improve the performance. I figure I can greatly improve performance if I can take advantage of the fact that both files are sorted, but I don't know a datatype in C# that keeps order and would allow me to select a specific position. There's a basic array, but I don't know how many items are going to be in each list; I could have over a million records. Is there a data type available that I should be looking at?
If the data in both of your CSV files is already sorted and the files have the same number of records, you can skip the data structure entirely and do the analysis in place.
StreamReader one = new StreamReader(@"C:\file1.csv");
StreamReader two = new StreamReader(@"C:\file2.csv");
String lineOne;
String lineTwo;
StreamWriter differences = new StreamWriter("Output.csv");
while (!one.EndOfStream)
{
    lineOne = one.ReadLine();
    lineTwo = two.ReadLine();
    // do your comparison.
    bool areDifferent = true;
    if (areDifferent)
        differences.WriteLine(lineOne + lineTwo);
}
one.Close();
two.Close();
differences.Close();
System.Collections.Specialized.StringCollection allows you to add a range of values and, using the .IndexOf(string) method, lets you retrieve the index of an item.
That being said, you could likely just load up a couple of byte[] from a FileStream and do byte comparison; don't even worry about loading that stuff into a formal data structure like StringCollection or string[]. If all you're doing is checking for differences, and you want speed, I reckon byte differences are where it's at.
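A rough sketch of that byte-level idea, assuming both files fit in memory (SequenceEqual needs System.Linq):
byte[] a = File.ReadAllBytes(@"C:\file1.csv");
byte[] b = File.ReadAllBytes(@"C:\file2.csv");
// Equal lengths and equal bytes means no differences at all.
bool identical = a.Length == b.Length && a.SequenceEqual(b);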
This is an adaptation of David Sokol's code to work with a varying number of lines, outputting the lines that are in one file but not the other:
StreamReader one = new StreamReader(@"C:\file1.csv");
StreamReader two = new StreamReader(@"C:\file2.csv");
String lineOne;
String lineTwo;
StreamWriter differences = new StreamWriter("Output.csv");
lineOne = one.ReadLine();
lineTwo = two.ReadLine();
while (!one.EndOfStream || !two.EndOfStream)
{
    if (lineOne == lineTwo)
    {
        // lines match, read next line from each and continue
        lineOne = one.ReadLine();
        lineTwo = two.ReadLine();
        continue;
    }
    if (two.EndOfStream || string.CompareOrdinal(lineOne, lineTwo) < 0)
    {
        differences.WriteLine(lineOne);
        lineOne = one.ReadLine();
    }
    if (one.EndOfStream || string.CompareOrdinal(lineTwo, lineOne) < 0)
    {
        differences.WriteLine(lineTwo);
        lineTwo = two.ReadLine();
    }
}
Standard caveat about code written off the top of my head applies -- you may need to special-case running out of lines in one while the other still has lines, but I think this basic approach should do what you're looking for.
Well, there are several approaches that would work. You could write your own data structure that did this. Or you can try and use SortedList. You can also return the DataSets in code, and then use .Select() on the table. Granted, you would have to do this on both tables.
You can easily use a SortedList to do fast lookups. If the data you are loading is already sorted, insertions into the SortedList should not be slow.
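A minimal sketch of that approach (file names are placeholders):
// Load file A into a SortedList keyed by line; since the input is already
// sorted, each insertion appends at the end. ContainsKey is then O(log n).
var lookup = new SortedList<string, bool>();
foreach (string line in File.ReadLines(@"C:\file1.csv"))
    lookup[line] = true; // indexer tolerates duplicate lines

foreach (string line in File.ReadLines(@"C:\file2.csv"))
    if (!lookup.ContainsKey(line))
        Console.WriteLine(line); // present in file 2 but not in file 1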
If you are looking simply to see whether all lines in FileA are included in FileB, you could read both in and just compare the streams inside a loop.
File 1
Entry1
Entry2
Entry3
File 2
Entry1
Entry3
You could loop through with two counters and find omissions, going line by line through each file and see if you get what you need.
Maybe I misunderstand, but the ArrayList will maintain its elements in the same order in which you added them. This means you can compare the two ArrayLists in just one pass: simply increment the two scanning indices according to the comparison results.
One question I have: have you considered "out-sourcing" your comparison? There are plenty of good diff tools that you could just call out to. I'd be surprised if there wasn't one that let you specify two files and get only the differences. Just a thought.
I think the reason everyone has so many different answers is that you haven't quite specified your problem well enough to be answered. First off, it depends what kind of differences you want to track. Do you want the differences output as in WinDiff, where the first file is the "original" and the second file is the "modified", so you can list changes as INSERT, UPDATE, or DELETE? Do you have a primary key that will allow you to match up two lines as different versions of the same record (when fields other than the primary key differ)? Or is this some sort of reconciliation where you just want your difference output to say something like "RECORD IN FILE 1 AND NOT FILE 2"?
I think the answers to these questions will help everyone give you a suitable answer to your problem.
If you have two files that are each a million lines, as mentioned in your post, you might be using up a lot of memory. Some of the performance problem might be swapping to disk. If you are simply comparing line 1 of file A to line 1 of file B, line 2 of file A to line 2 of file B, and so on, I would recommend a technique that does not store so much in memory. You could either read and write off of two file streams, as a previous commenter posted, and write out your results "in real time" as you find them; this would not explicitly store anything in memory. Or you could dump chunks of each file into memory, say a thousand lines at a time, into something like a List. This could be fine-tuned to meet your needs.
To resolve question #1, I'd recommend looking into creating a hash of each line. That way you can compare hashes quickly and easily using a dictionary.
To resolve question #2, one quick and dirty solution would be to use an IDictionary<string, string>: use the itemId as the key and the rest of the line as the value. You can then quickly find whether an itemId exists and compare the lines. This of course assumes .NET 2.0+.
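A hedged sketch of that second idea, assuming each line has the form itemId,rest-of-line (the field layout is an assumption, not from the question):
var byId = new Dictionary<string, string>();
foreach (string line in File.ReadLines(@"C:\file1.csv"))
{
    int comma = line.IndexOf(',');
    byId[line.Substring(0, comma)] = line.Substring(comma + 1);
}

foreach (string line in File.ReadLines(@"C:\file2.csv"))
{
    int comma = line.IndexOf(',');
    string id = line.Substring(0, comma);
    string rest = line.Substring(comma + 1);
    string original;
    if (byId.TryGetValue(id, out original) && original != rest)
        Console.WriteLine("Item " + id + " was modified");
}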