Confusing indexing error while going line by line through some text - c#

I'm reading in some text, line by line, and I'd like to tokenize the words and create 1-grams and 2-grams, but I think there's a problem with my indexing because I either get an index error or it'll say that the item I'm trying to modify in my dictionary doesn't exist, which is totally weird, since I wrote the code to first make the dictionary item and if it already exists, to increment a counter.
Basically, my dictionaries are of the form (n-gram string, frequency int)
System.IO.StreamReader lines = new System.IO.StreamReader("myfile");
while (true)
{
string line = lines.ReadLine().ToLower();
if (line == null) break;
if (line.Trim().Length == 0) continue;
string[] tokens = Regex.Split(line, "[^\\w']+");
for (int i = 0; i < tokens.Count()-1; i++)
{
try
{
one_gram.Add(tokens[i], 1);
two_gram.Add(tokens[i] + " " + tokens[i + 1], 1);
}
catch
{
one_gram[tokens[i]]++;
two_gram[tokens[i] + " "+tokens[i + 1]]++;
}
}
}
Can anyone look at my code and tell me where I went wrong? The problem seems to occur at the end of the for loop at the first line, but if I do
for(int i=0;i<tokens.Count()-3;i++)
then the error happens in the second line... but I'm not sure exactly what's causing it.
EDIT: As per suggestions, I tried using the ContainsKey method, but I still get an error near the end of the first line saying that I'm adding a Key that already exists, even though the if statements are supposed to catch that?!
for (int i = 0; i < tokens.Count()-1; i++)
{
if (one_gram.ContainsKey(tokens[i]))
{
one_gram[tokens[i]]++;
}
if (two_gram.ContainsKey(tokens[i] + " " + tokens[i + 1]))
{
two_gram[tokens[i] + " " + tokens[i + 1]]++;
}
one_gram.Add(tokens[i], 1);
two_gram.Add(tokens[i] + " " + tokens[i + 1], 1);
}

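You need to use an else. In your edited code the two Add calls run on every iteration, even when ContainsKey has just found the key, so Add throws an ArgumentException for the duplicate key. The try/catch version fails for a related reason: when one_gram.Add throws because the word is already known, the catch block also runs two_gram[...]++, and that throws a KeyNotFoundException whenever the word pair is new.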
for (int i = 0; i < tokens.Count() - 1; i++)
{
// Save yourself typing errors by creating variables to hold
// the key values and then you can just use the variable name
var oneGramKey = tokens[i];
var twoGramKey = string.Format("{0} {1}", tokens[i], tokens[i + 1]);
if (one_gram.ContainsKey(oneGramKey))
{
one_gram[oneGramKey]++;
}
else
{
one_gram.Add(oneGramKey, 1);
}
if (two_gram.ContainsKey(twoGramKey))
{
two_gram[twoGramKey]++;
}
else
{
two_gram.Add(twoGramKey, 1);
}
}
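As an alternative to ContainsKey plus else, the counting can be sketched with TryGetValue, which does a single lookup per key. This is only a sketch: the names NGramCounter, Increment and CountLine are made up for illustration, it assumes both dictionaries are Dictionary<string, int>, and unlike the loop above it also counts the last word of each line as a 1-gram and drops empty tokens.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class NGramCounter
{
    // Adds 1 to the count stored under key, creating the entry if it is missing.
    static void Increment(Dictionary<string, int> counts, string key)
    {
        counts.TryGetValue(key, out int current); // current stays 0 when the key is absent
        counts[key] = current + 1;                // the indexer setter adds or overwrites
    }

    // Tokenizes one line and updates the 1-gram and 2-gram counts.
    public static void CountLine(string line,
                                 Dictionary<string, int> oneGram,
                                 Dictionary<string, int> twoGram)
    {
        string[] tokens = Regex.Split(line.ToLower(), "[^\\w']+");
        // Regex.Split can yield empty strings at the ends of the line; drop them.
        tokens = Array.FindAll(tokens, t => t.Length > 0);

        for (int i = 0; i < tokens.Length; i++)
        {
            Increment(oneGram, tokens[i]);
            if (i + 1 < tokens.Length) // the last token has no successor
            {
                Increment(twoGram, tokens[i] + " " + tokens[i + 1]);
            }
        }
    }
}
```

Because the two dictionaries are updated independently, a failure on one key can no longer leave the other count in a wrong state.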

Related

How to efficiently edit data within a C# String Array

I'm working on a simple program to edit data within a string array, and have been scratching my head over this for the past few nights. I'm relatively new to C# and would really appreciate some help.
I want to edit a string array into something that looks like this (in theory):
[Section]
Key: Data
Key2: Data
Key3: Data
If the section isn't found, it should be created (along with another line containing the key & data passed to the method). If it is found, it should be checked until the next section (or the end of the file). If the key is not found within the section, it should be created at the end of the section. If it is found, the data of the key should be edited.
What's the best way of doing this? I've tried a few times with some super hacky code and always wind up with something like this:
[Section]
Key3: System.String[]
Sorry if this isn't the best question. I'm relatively new to C#, as I've said, and could really use the help. Thanks.
"edit data within a string array"
string[] myArray = { "one", "two", "three" };
myArray[1] = "nottwo";
Second value (myArray[1]) has changed from two to nottwo.
Now going deeper into the description of your problem...
You have mentioned keys & values, for this you will very likely want to look into Dictionary<TKey,TValue> Class. See reference: https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.dictionary-2?view=netframework-4.8
Example:
// Key and value types must match on both sides of the assignment.
Dictionary<string, string> myDictionary = new Dictionary<string, string>();
myDictionary.Add("one", "Hello");
myDictionary.Add("two", "World");
myDictionary.Add("three", "This is");
myDictionary.Add("sandwich", "a Dictionary.");
Console.WriteLine(myDictionary["one"]);
Console.WriteLine(myDictionary["two"]);
Console.WriteLine(myDictionary["three"]);
Console.WriteLine(myDictionary["sandwich"]);
I've found some code that works for my use case.
public static string[] SetData(string section, string key, string value, string[] data)
{
var updatedData = data;
int sectionIndex = Array.IndexOf(data, "[" + section + "]");
if(sectionIndex > -1)
{
//section found
for(int i = sectionIndex; i < data.Length; i++)
{
// Match "key:" rather than just the key, so that "Key" does not also match "Key2: ..."
if (data[i].StartsWith(key + ":"))
{
//key found
string newData = data[i];
string tempString = newData.Remove(newData.LastIndexOf(":"));
updatedData[i] = tempString + ": " + value;
break;
}
else if (data[i].StartsWith("[") && !data[i].Contains(section))
{
//key not found, end of section reached.
List<string> temp = data.ToList();
temp.Insert(i, key + ": " + value);
updatedData = temp.ToArray();
break;
}
else if (i == data.Length - 1)
{
//key not found, end of file reached: append after the last line
//(Insert(i, ...) would place the new key before the last line instead).
List<string> temp = data.ToList();
temp.Add(key + ": " + value);
updatedData = temp.ToArray();
break;
}
}
return updatedData;
}
else
{
//section not found
updatedData = new string[data.Length + 2];
for (int i = 0; i < data.Length; i++)
{
updatedData[i] = data[i];
}
updatedData[updatedData.Length - 2] = "[" + section + "]";
updatedData[updatedData.Length - 1] = key + ": " + value;
return updatedData;
}
}

ForEach In List isn't targetting all

so I currently have a program that adds an item to a list in a format such as
username,Index it then adds one to the index in this code below however. It is only adding one to the item that has been added most recently.
Console.WriteLine("There are currently: " + AntiSpam.Count);
int Index = 0;
foreach (string s in AntiSpam)
{
Console.WriteLine("Found User: " + s.Split(',')[0]);
AntiSpam[Index] = s.Split(',')[0] + "," + (int.Parse(s.Split(',')[1]) + 1).ToString();
Index++;
}
Basically this returns the data There are currently: 10
Found User: someone. It then goes again for another loop of this code and shows the same result again.
EDIT
I have managed to make my code work by using this code
for (var i = 0; i < AntiSpam.Count; i++)
{
AntiSpam[i] = AntiSpam[i].Split(',')[0] + "," + (int.Parse(AntiSpam[i].Split(',')[1]) + 1).ToString();
Console.WriteLine("Text is {0}", AntiSpam[i]);
}
However if possible I would like to know why this works and the first doesn't
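If you're going to index into a list, use for rather than foreach. This avoids the need for (and the possible confusion of) a separate variable that tracks AntiSpam.IndexOf(s), which is basically what you were trying to do with Index. As for why the first version fails: assuming AntiSpam is a List<string>, assigning to AntiSpam[Index] inside the foreach changes the list while it is being enumerated, and the enumerator then typically throws an InvalidOperationException when it moves to the next item.
Console.WriteLine("There are currently: " + AntiSpam.Count);
for (int i = 0; i < AntiSpam.Count; i++)
{
string[] parts = AntiSpam[i].Split(',');
string username = parts[0];
Console.WriteLine("Found User: " + username);
// Only entries of the form "username,Index" carry a counter to increment.
if (parts.Length > 1)
{
int index = int.Parse(parts[1]);
AntiSpam[i] = username + "," + (index + 1).ToString();
}
}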

Splitting String from Settings

I'm trying to read out the contents of a setting inside my application. Below is the code I'm having trouble with:
private bool checkGrid()
{
string playlists = Spotify_Extender.Properties.Settings.Default.Playlists;
MessageBox.Show(playlists);
string[] split1;
if (playlists.Contains(";"))
{
MessageBox.Show("Multiple Links");
split1 = playlists.Split(';');
}
else
{
MessageBox.Show("One Link");
split1 = new string[1];
split1[0] = playlists;
}
MessageBox.Show("Array Length: " + split1.Length);
int lines = 0;
for (int i = 0; i < split1.Length; i++)
{
MessageBox.Show("Check #" + i + " - " + split1[i] + " - Length: " + split1[i].Length);
if (split1[i].Length >= 22)
{
MessageBox.Show(i + " - " + split1[1]);
lines++;
}
}
int rows = this.playlistGrid.Rows.Count;
MessageBox.Show(lines + "");
if (rows == lines)
return true;
return false;
}
The code should be easy to understand and it should work as far as i am aware, but it doesn't. I entered this in my Setting:
If i run the program now, my first MessageBox prints out exactly what i entered, the second one prints out "One Link" and the third prints "Array Length: 1". Now we get to the part i'm having troubles with. The next Message is this:
So the length of the text is 22 as displayed in the MessageBox, but down below this statement isn't true:
if (split1[i].Length >= 22)
I'm really confused by this and it also does this when i check this:
if (split1[i] != "")
Any help is appreciated, because i don't know what to do, since my code should be fine. Thanks for your time!
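You should have split1[i] and not split1[1] inside the if block. With a single link the array has length 1, so split1[1] is out of range and throws an IndexOutOfRangeException. The condition itself is true; it is the MessageBox.Show call right after it that fails.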
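The counting part of checkGrid can also be written without the Contains(";") branch, because Split returns a one-element array when the separator is absent. A minimal sketch, assuming the 22-character threshold from the question; the names PlaylistSettings and CountLinks are made up for illustration:

```csharp
using System;
using System.Linq;

static class PlaylistSettings
{
    // Counts the ';'-separated entries that are at least minLength characters long.
    public static int CountLinks(string playlists, int minLength = 22)
    {
        // "abc".Split(';') yields { "abc" }, so a single link needs no special case.
        string[] links = playlists.Split(';');
        return links.Count(link => link.Length >= minLength);
    }
}
```

The result can then be compared with playlistGrid.Rows.Count in the same way as lines in the original method.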

How to parse below string in C#?

Could someone please help me parse the sample string below? I'm having difficulty splitting the data, and I also need to add a carriage return at the end of every event.
sample string:
L,030216,182748,00,FF,I,00,030216,182749,00,FF,I,00,030216,182750,00,FF,I,00
batch of events
expected output:
L,030216,182748,00,FF,I,00 - 1st Event
L,030216,182749,00,FF,I,00 - 2nd Event
L,030216,182750,00,FF,I,00 - 3rd Event
Seems like an easy problem. Something as easy as this should do it:
string line = "L,030216,182748,00,FF,I,00,030216,182749,00,FF,I,00,030216,182750,00,FF,I,00";
string[] array = line.Split(',');
StringBuilder sb = new StringBuilder();
for(int i=0; i<array.Length-1;i+=6)
{
sb.AppendLine(string.Format("{0},{1} - {2} event",array[0],string.Join(",",array.Skip(i+1).Take(6)), "number"));
}
output (sb.ToString()):
L,030216,182748,00,FF,I,00 - number event
L,030216,182749,00,FF,I,00 - number event
L,030216,182750,00,FF,I,00 - number event
All you have to do is work on the function that increments the ordinals (1st, 2nd, etc), but that's easy to get.
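One possible way to produce the ordinals is sketched below. The names EventFormatter, Ordinal and Format are made up for illustration, and the sketch assumes every event has exactly six fields after the leading L, as in the sample string.

```csharp
using System;
using System.Linq;
using System.Text;

static class EventFormatter
{
    // Returns n with its English ordinal suffix: 1st, 2nd, 3rd, 4th, 11th, 21st, ...
    public static string Ordinal(int n)
    {
        int lastTwo = n % 100;
        if (lastTwo >= 11 && lastTwo <= 13) return n + "th";
        switch (n % 10)
        {
            case 1: return n + "st";
            case 2: return n + "nd";
            case 3: return n + "rd";
            default: return n + "th";
        }
    }

    // Splits the batch into events of six fields each, one event per line.
    public static string Format(string line)
    {
        string[] fields = line.Split(',');
        var sb = new StringBuilder();
        int eventNumber = 0;
        // fields[0] is the leading "L"; the events start at index 1.
        for (int i = 1; i + 6 <= fields.Length; i += 6)
        {
            eventNumber++;
            sb.AppendLine(string.Format("{0},{1} - {2} Event",
                fields[0],
                string.Join(",", fields.Skip(i).Take(6)),
                Ordinal(eventNumber)));
        }
        return sb.ToString();
    }
}
```

An incomplete trailing group with fewer than six fields is ignored by this sketch rather than printed.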
This should do the trick, given there are no more L's inside your string, and the comma place is always the sixth starting from the beginning of the batch number.
class Program
{
static void Main(string[] args)
{
String batchOfevents = "L,030216,182748,00,FF,I,00,030216,182749,00,FF,I,00,030216,182750,00,FF,I,00,030216,182751,00,FF,I,00,030216,182752,00,FF,I,00,030216,182753,00,FF,I,00";
// take out the "L," to start processing by finding the index of the correct comma to slice.
batchOfevents = batchOfevents.Substring(2);
String output = "";
int index = 0;
int counter = 0;
while (batchOfevents.Length > 0)
{
counter++;
// Each event has six fields, so the sixth comma marks the end of an event.
index = GetNthIndex(batchOfevents, ',', 6);
// The last event has no trailing comma, so take the rest of the string.
string currentEvent = index == -1 ? batchOfevents : batchOfevents.Substring(0, index);
batchOfevents = index == -1 ? "" : batchOfevents.Substring(index + 1);
// Suffix rule: 1st, 2nd, 3rd, then "th" (this simple rule is wrong for 21, 22, 23, ...).
string suffix = counter == 1 ? "st" : counter == 2 ? "nd" : counter == 3 ? "rd" : "th";
// No space after "L," so the output matches the requested format.
output += "L," + currentEvent + " - " + counter + suffix + " event\n";
}
Console.WriteLine(output);
}
public static int GetNthIndex(string s, char t, int n)
{
int count = 0;
for (int i = 0; i < s.Length; i++)
{
if (s[i] == t)
{
count++;
if (count == n)
{
return i;
}
}
}
return -1;
}
}
Now the output will be in the format you asked for, and the original string has been decomposed.
NOTE: the GetNthIndex method was taken from this old post.
If you want to split the string into multiple strings, you need a set of rules that can be implemented. In your case I would start by splitting the complete string at the commas and then go through the elements in a loop. Each element in the loop is appended to a StringBuilder. When your rule set says you need a new line, add it via yourBuilder.Append("\r\n") or use AppendLine.
EDIT
Using this method, you can also easily add extra text, such as the leading L or the "3rd Event" label at the end of a line.
1. Look for the start index of 00,FF,I,00 in the entire string.
2. Extract a substring starting at 0 and ending at that index plus 10, which is the length of the characters in 1.
3. Loop through it again, each time with a new start index where you left off in 2.
4. Add a new line character each time.
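The index-based steps above can be sketched as follows. This is only an illustration under stated assumptions: the names EventSplitter and SplitByTerminator are made up, the batch is assumed to start with the two-character prefix "L,", and every event is assumed to end with the same terminator text (00,FF,I,00 in the sample).

```csharp
using System;
using System.Text;

static class EventSplitter
{
    // Cuts the batch after each occurrence of terminator and puts each event on its own line.
    public static string SplitByTerminator(string batch, string terminator)
    {
        var sb = new StringBuilder();
        string prefix = batch.Substring(0, 2); // the leading "L,"
        int start = 2;                         // first character after the prefix

        while (start < batch.Length)
        {
            int idx = batch.IndexOf(terminator, start, StringComparison.Ordinal);
            if (idx < 0) break;                // no further complete event

            int end = idx + terminator.Length; // one past the last character of the event
            sb.Append(prefix)
              .Append(batch, start, end - start)
              .Append(Environment.NewLine);

            start = end + 1;                   // skip the comma between two events
        }
        return sb.ToString();
    }
}
```

Called as EventSplitter.SplitByTerminator(sample, "00,FF,I,00"), it relies on the terminator text never appearing inside the date or time fields.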
Try the following:
string stream = "L,030216,182748,00,FF,I,00, 030216,182749,00,FF,I,00, 030216,182750,00,FF,I,00";
string[] lines = SplitLines(stream, "L", "I", ",");
Here the SplitLines function is implemented to detect variable-length events within the arbitrary-formatted stream:
string stream = "A;030216;182748 ;00;FF;AA;01; 030216;182749;AA;02";
string[] lines = SplitLines(stream, "A", "AA", ";");
The split rules are:
- all elements of the input stream are separated by the separator (, for example).
- each event is bounded by special markers (L and I, for example).
- the end marker is the second-to-last element of each event.
static string[] SplitLines(string stream, string startSeq, string endLine, string separator) {
string[] elements = stream.Split(new string[] { separator }, StringSplitOptions.RemoveEmptyEntries);
int pos = 0;
List<string> line = new List<string>();
List<string> lines = new List<string>();
State state = State.SeqStart;
while(pos < elements.Length) {
string current = elements[pos].Trim();
switch(state) {
case State.SeqStart:
if(current == startSeq)
state = State.LineStart;
else
pos++; // skip elements before the start marker; without this the loop never ends
continue;
case State.LineStart:
if(++pos < elements.Length) {
line.Add(startSeq);
state = State.Line;
}
continue;
case State.Line:
if(current == endLine)
state = State.LineEnd;
else
line.Add(current);
pos++;
continue;
case State.LineEnd:
line.Add(endLine);
line.Add(current);
lines.Add(string.Join(separator, line));
line.Clear();
state = State.LineStart;
continue;
}
}
return lines.ToArray();
}
enum State { SeqStart, LineStart, Line, LineEnd };

Data lost while adding string to listbox

I am cycling through the contents of a two-dimensional array containing the result of a Punnett Square calculation for gene crosses. I need to summarize the result so that the user can readily see the unique instances. I can accomplish this by putting the result into a text box, but when I try and use a ListBox to display the data, part of the information is getting lost, namely a translation of the AaBBCc type data to something that directly relates to the traits that the user initially selected.
This is the main block of code for the operation:
foreach (string strCombination in arrUniqueCombinations)
{
int intUniqueCount = 0;
decimal decPercentage;
foreach (string strCrossResult in arrPunnettSQ)
{
if (strCrossResult == strCombination)
{
intUniqueCount++;
}
}
decPercentage = Convert.ToDecimal((intUniqueCount*100)) / Convert.ToDecimal(intPossibleCombinations);
txtReport.AppendText(strCombination + " appears " + intUniqueCount.ToString() + " times or " + decPercentage.ToString() + "%."+ Environment.NewLine);
lstCrossResult.Items.Add(DecodeGenome(strCombination) + " appears " + intUniqueCount.ToString() + " times or " + decPercentage.ToString() + "%.");
}
For appending the data to the textbox I use this code and it works perfectly:
txtReport.AppendText(DecodeGenome(strCombination) + " appears " + intUniqueCount.ToString() + " times or " + decPercentage.ToString() + "%."+ Environment.NewLine);
Giving the result:
Trait 1 Het.,Trait 3 appears 16 times or 25%.
For adding the result to a list box, this works:
lstCrossResult.Items.Add(strCombination + " appears " + intUniqueCount.ToString() + " times or " + decPercentage.ToString() + "%.");
Giving the result:
AaBBCc appears 16 times or 25%.
But the contents of strCombination is AaBBCc and I need it translated to "Trait 1 Het.,Trait 3", which I accomplish with this bit of code:
private string DecodeGenome(string strGenome)
{
string strTranslation = "";
int intLength = strGenome.Length;
int intCounter = intLength / 2;
string[] arrPairs = new string[intLength / 2];
//Break out trait pairs and load into array
for (int i = 1; i <= intLength; i++)
{
arrPairs[i / 2] = strGenome.Substring((i-1),2);
i++;
}
foreach (string strPair in arrPairs)
{
char chFirstLetter = strPair[0];
char chSecondLetter = strPair[1];
intCounter = intCounter - 1;
if (Char.IsUpper(chFirstLetter))
{
if (!Char.IsUpper(chSecondLetter))
{
if (intCounter > 0)
{
txtReport.AppendText(GetDescription(strPair.Substring(0, 1)) + " Het.,");
}
else
{
txtReport.AppendText(GetDescription(strPair.Substring(0, 1)));
}
}
}
else
{
if (!Char.IsUpper(chSecondLetter))
{
if (intCounter > 0)
{
txtReport.AppendText(GetDescription(strPair.Substring(0, 1)) + ",");
}
else
{
txtReport.AppendText(GetDescription(strPair.Substring(0, 1)));
}
}
}
}
return strTranslation;
}
That has no problem displaying in a text box, but when I try and put it as an item into a list box it turns it into null. Instead of:
"Trait 1 Het.,Trait 3 appears 16 times or 25%."
I get:
" appears 16 times or 25%."
I have tried adding the results to an ArrayList, then populating the listbox after everything is processed, but the result is the same.
Any clues as to why the list box is not accepting the translated AaBBCc information would be greatly appreciated.
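strTranslation is never set, so DecodeGenome always returns an empty string. Everything is pushed to txtReport.AppendText instead, which is why the translation shows up in the text box but not in the list box. Append each piece to strTranslation (strTranslation += ...) in place of the txtReport.AppendText calls.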
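A minimal sketch of a DecodeGenome that returns the translation instead of writing it to the text box is shown below. It keeps the branching of the question's method as posted. The GetDescription method is not shown in the question, so it is passed in here as a hypothetical getDescription delegate, and the class name GenomeDecoder is made up as well.

```csharp
using System;
using System.Text;

static class GenomeDecoder
{
    // Builds the translated text and returns it; nothing is written to any control.
    public static string DecodeGenome(string genome, Func<string, string> getDescription)
    {
        var translation = new StringBuilder();
        int remaining = genome.Length / 2; // number of pairs still to process

        // Walk the genome two letters at a time: "AaBBCc" -> "Aa", "BB", "Cc".
        for (int i = 0; i + 1 < genome.Length; i += 2)
        {
            string pair = genome.Substring(i, 2);
            remaining--;

            // As in the question, pairs whose second letter is uppercase add nothing.
            if (char.IsUpper(pair[1])) continue;

            string description = getDescription(pair.Substring(0, 1));
            if (remaining > 0)
            {
                // An uppercase first letter marks a heterozygous pair.
                translation.Append(description)
                           .Append(char.IsUpper(pair[0]) ? " Het.," : ",");
            }
            else
            {
                translation.Append(description); // last pair: no suffix, as in the question
            }
        }
        return translation.ToString();
    }
}
```

Inside the form this could be called as DecodeGenome(strCombination, GetDescription), so the same string can be passed to both txtReport.AppendText and lstCrossResult.Items.Add.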
