Pig Latin Translator spitting out multiple lines? C#
So I have a Pig Latin translator that supports multiple words. Whenever I enter some words (for example, "banana apple shears chocolate Theodore train"), it outputs the translated words correctly, but it repeats them! Here is my code:
namespace Pig_Latin_Translator
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
List<string> vowels = new List<string>();
List<string> specials = new List<string>();
private void TranslateButton_Click(object sender, EventArgs e)
{
String[] parts = TranslateBox.Text.Split();
foreach (string s in specials)
{
if (TranslateBox.Text.Contains(s) || TranslateBox.Text == "\"")
{
TranslateOutput.Text = "";
MessageBox.Show("No Special Characters!", "Warning!", MessageBoxButtons.OK, MessageBoxIcon.Warning);
break;
}
else
{
foreach (String part in parts)
{
foreach (String v in vowels)
{
if (part.Substring(0, 1) == v)
{
TranslateOutput.Text = TranslateOutput.Text + " " + part + "ay";
break;
}
else
{
if (part.Substring(0, 2) == "sh" || part.Substring(0, 2) == "ch" || part.Substring(0, 2) == "th" || part.Substring(0, 2) == "tr")
{
string SwitchP = part.Substring(2) + part.Substring(0, 2);
TranslateOutput.Text = TranslateOutput.Text + " " + SwitchP + "ay";
break;
}
else
{
string Switch = part.Substring(1) + part.Substring(0, 1);
TranslateOutput.Text = TranslateOutput.Text + " " + Switch + "ay";
break;
}
}
}
}
}
}
}
private void Form1_Load(object sender, EventArgs e)
{
vowels.Add("a");
vowels.Add("e");
vowels.Add("i");
vowels.Add("o");
vowels.Add("u");
specials.Add("`");
specials.Add("1");
specials.Add("2");
specials.Add("3");
specials.Add("4");
specials.Add("5");
specials.Add("6");
specials.Add("7");
specials.Add("8");
specials.Add("9");
specials.Add("0");
specials.Add("-");
specials.Add("=");
specials.Add("[");
specials.Add("]");
specials.Add(@"\");
specials.Add(";");
specials.Add("'");
specials.Add(",");
specials.Add(".");
specials.Add("/");
specials.Add("~");
specials.Add("!");
specials.Add("@");
specials.Add("#");
specials.Add("$");
specials.Add("%");
specials.Add("^");
specials.Add("&");
specials.Add("*");
specials.Add("(");
specials.Add(")");
specials.Add("_");
specials.Add("+");
specials.Add("{");
specials.Add("}");
specials.Add("|");
specials.Add(":");
specials.Add("\"");
specials.Add("<");
specials.Add(">");
specials.Add("?");
}
private void AboutButton_Click(object sender, EventArgs e)
{
MessageBox.Show("Pig Latin is a fake language. It works by taking the first letter (Or two if it's a pair like 'th' or 'ch') and bringing it to the end, unless the first letter is a vowel. Then add 'ay' to the end. So 'bus' becomes 'usbay', 'thank' becomes 'ankthay' and 'apple' becomes 'appleay'.", "About:", MessageBoxButtons.OK, MessageBoxIcon.Information);
}
}
}
If you type in "banana apple shears chocolate Theodore train", this outputs:
"ananabay appleay earsshay ocolatechay heodoreTay aintray", repeated over 10 times.
BTW: Sorry if you can't answer, because I know there is a LOT of code. It doesn't matter much because the program is still useful; it's just that this shouldn't happen, and it gets on my nerves. I know there are still many glitches and MUCH more to do, but I want to get this resolved first.
You are nesting your code in two loops that it shouldn't be nested in:
foreach (string s in specials)
and
foreach (String v in vowels)
Your break statements are getting you out of trouble for one, but not the other.
You can avoid these loops entirely if you use LINQ's .Any(...) method (it needs using System.Linq;).
Here's what your code could look like:
private void TranslateButton_Click(object sender, EventArgs e)
{
TranslateOutput.Text = "";
if (specials.Any(s => TranslateBox.Text.Contains(s)))
{
MessageBox.Show("No Special Characters!", "Warning!", MessageBoxButtons.OK, MessageBoxIcon.Warning);
}
else
{
String[] parts = TranslateBox.Text.Split();
foreach (var part in parts)
{
var index = 1;
if (vowels.Any(v => part.Substring(0, 1).ToLower() == v))
{
index = 0;
}
else if (new [] { "sh", "ch", "th", "tr", }.Contains(part.Substring(0, 2).ToLower()))
{
index = 2;
}
TranslateOutput.Text += " " + part.Substring(index) + part.Substring(0, index) + "ay";
}
}
TranslateOutput.Text = TranslateOutput.Text.Trim();
}
This brings it down to the one foreach loop that you actually need.
You can also get the code that initializes vowels and specials down to this:
vowels.AddRange("aeiou".Select(x => x.ToString()));
specials.AddRange(@"`1234567890-=[]\;',./~!@#$%^&*()_+{}|:""<>?".Select(x => x.ToString()));
You are iterating through your words once for each special character. The foreach that goes through your words and translates them is inside the foreach that checks whether the textbox contains any special characters.
In other words, you are going to do your translation once per special character.
You'll want to move your foreach (String part in parts) out of your foreach (string s in specials).
You have a bit of a logic problem in your loops.
Your outer loop:
foreach( string s in specials ) {
...is looping through all 42 characters in your special characters list.
Your inner loop
foreach( String part in parts ) {
...is then executed 42 times. So for your six word example you're actually doing your pig latin conversion 252 times.
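The arithmetic above can be checked with a small sketch. The class and method names here are hypothetical and not part of the original program; the method only counts how often the per-word body runs when the word loop sits inside the special-character loop.

```csharp
using System;

public static class LoopCountDemo
{
    // Hypothetical helper: counts how many times the per-word body executes
    // when the word loop is nested inside the special-character loop.
    public static int NestedIterations(int specialCount, int wordCount)
    {
        int runs = 0;
        for (int s = 0; s < specialCount; s++)
        {
            for (int w = 0; w < wordCount; w++)
            {
                runs++; // one Pig Latin conversion would happen here
            }
        }
        return runs;
    }

    public static void Main()
    {
        // 42 special characters, 6 words in the example sentence
        Console.WriteLine(NestedIterations(42, 6)); // prints 252
    }
}
```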
If you extract the inner loop from the outer, your results are better. Like this:
foreach( string s in specials ) {
if( TranslateBox.Text.Contains( s ) || TranslateBox.Text == "\"" ) {
TranslateOutput.Text = "";
MessageBox.Show( "No Special Characters!", "Warning!", MessageBoxButtons.OK, MessageBoxIcon.Warning );
return;
}
}
String[] parts = TranslateBox.Text.Split();
foreach( String part in parts ) {
foreach( String v in vowels ) {
if( part.Substring( 0, 1 ) == v ) {
TranslateOutput.Text = TranslateOutput.Text + " " + part + "ay";
break;
}
else {
if( part.Substring( 0, 2 ) == "sh" || part.Substring( 0, 2 ) == "ch" || part.Substring( 0, 2 ) == "th" || part.Substring( 0, 2 ) == "tr" ) {
string SwitchP = part.Substring( 2 ) + part.Substring( 0, 2 );
TranslateOutput.Text = TranslateOutput.Text + " " + SwitchP + "ay";
break;
}
else {
string Switch = part.Substring( 1 ) + part.Substring( 0, 1 );
TranslateOutput.Text = TranslateOutput.Text + " " + Switch + "ay";
break;
}
}
}
}
A somewhat more concise implementation would be:
private void TranslateButton_Click( object sender, EventArgs e )
{
char[] specials = "`1234567890-=[]\";',./~!@#$%^&*()_+{}|:\\<>?".ToArray();
char[] vowels = "aeiou".ToArray();
TranslateOutput.Text = String.Empty;
if( TranslateBox.Text.IndexOfAny( specials ) > -1 ) {
MessageBox.Show( "No Special Characters!", "Warning!", MessageBoxButtons.OK, MessageBoxIcon.Warning );
return;
}
String[] parts = TranslateBox.Text.Split();
foreach( String part in parts ) {
int firstVowel = part.IndexOfAny( vowels );
if( firstVowel > 0 ) {
TranslateOutput.Text += part.Substring( firstVowel ) + part.Substring( 0, firstVowel ) + "ay ";
}
else {
TranslateOutput.Text += part + "ay ";
}
}
TranslateOutput.Text = TranslateOutput.Text.TrimEnd();
}
In this example, I create two character arrays for the specials and the vowels. I can then leverage the framework's IndexOfAny method to search for any of the characters in the array and return the index of the first occurrence. The call before the loop finds the first special character, if any, in the whole input; the call inside the loop finds the first vowel in each word. Once I have that index I can rearrange the word into Pig Latin. Note that I'm checking for firstVowel > 0: in Pig Latin a leading vowel (index 0) stays where it is and "ay" is just appended to the end of the word, and a word with no vowel at all (index -1) is handled the same way.
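To try the IndexOfAny idea outside of a form, the translation can be pulled into plain methods. This is only a sketch under a few assumptions: the names PigLatin, TranslateWord and TranslateSentence are made up for illustration, uppercase vowels are added to the vowel array so capitalised words are handled, and special-character validation is left out.

```csharp
using System;
using System.Linq;

public static class PigLatin
{
    // Assumption: uppercase vowels are included so "Apple" counts as vowel-initial.
    private static readonly char[] Vowels = "aeiouAEIOU".ToCharArray();

    // Moves everything before the first vowel to the end and appends "ay".
    // Words that start with a vowel, or contain no vowel, only get "ay" appended.
    public static string TranslateWord(string word)
    {
        int firstVowel = word.IndexOfAny(Vowels);
        return firstVowel > 0
            ? word.Substring(firstVowel) + word.Substring(0, firstVowel) + "ay"
            : word + "ay";
    }

    // Splits on spaces, translates each word once, and joins the results.
    public static string TranslateSentence(string text)
    {
        var words = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", words.Select(TranslateWord));
    }

    public static void Main()
    {
        Console.WriteLine(TranslateSentence("banana apple shears chocolate Theodore train"));
        // prints: ananabay appleay earsshay ocolatechay eodoreThay aintray
    }
}
```

Because each word is visited exactly once, the repeated output from the question cannot occur here. Note that "Theodore" becomes "eodoreThay" with this rule, since the whole leading consonant cluster is moved.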
Related
Split a string if delimiter is between single quotes [duplicate]
This question already has answers here: How to split csv whose columns may contain comma (9 answers) Closed 4 years ago. I have the following comma-separated string that I need to split. The problem is that some of the content is within quotes and contains commas that shouldn't be used in the split. String: 111,222,"33,44,55",666,"77,88","99" I want the output: 111 222 33,44,55 666 77,88 99 I have tried this: (?:,?)((?<=")[^"]+(?=")|[^",]+) But it reads the comma between "77,88","99" as a hit and I get the following output: 111 222 33,44,55 666 77,88 , 99
Depending on your needs you may not be able to use a CSV parser, and may in fact want to reinvent the wheel! You can do so with a simple regex:
(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)
This does the following:
(?:^|,) = match the beginning of the line or a comma
(\"(?:[^\"]+|\"\")*\"|[^,]*) = a numbered capture group that selects between two alternatives: stuff in quotes, or stuff between commas
This should give you the fields you are looking for (quoted fields keep their surrounding quotes, so trim them if you need the bare values).
Example code in C#:
static Regex csvSplit = new Regex("(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)", RegexOptions.Compiled);
public static string[] SplitCSV(string input)
{
    List<string> list = new List<string>();
    foreach (Match match in csvSplit.Matches(input))
    {
        // Each match keeps its leading comma; strip it. An empty match yields an empty field.
        list.Add(match.Value.TrimStart(','));
    }
    return list.ToArray();
}
private void button1_Click(object sender, RoutedEventArgs e)
{
    // Join the fields so the values are printed rather than the array's type name.
    Console.WriteLine(string.Join(" ", SplitCSV("111,222,\"33,44,55\",666,\"77,88\",\"99\"")));
}
Warning: As per @MrE's comment, if a rogue newline character appears in a badly formed CSV file and you end up with an unbalanced quote ("string), you'll get catastrophic backtracking (https://www.regular-expressions.info/catastrophic.html) in your regex and your system will likely crash (like our production system did). This can easily be replicated in Visual Studio and, as I've discovered, will crash it. A simple try/catch will not trap this issue either. You should use this instead:
(?:^|,)(\"(?:[^\"])*\"|[^,]*)
Fast and easy: public static string[] SplitCsv(string line) { List<string> result = new List<string>(); StringBuilder currentStr = new StringBuilder(""); bool inQuotes = false; for (int i = 0; i < line.Length; i++) // For each character { if (line[i] == '\"') // Quotes are closing or opening inQuotes = !inQuotes; else if (line[i] == ',') // Comma { if (!inQuotes) // If not in quotes, end of current string, add it to result { result.Add(currentStr.ToString()); currentStr.Clear(); } else currentStr.Append(line[i]); // If in quotes, just add it } else // Add any other character to current string currentStr.Append(line[i]); } result.Add(currentStr.ToString()); return result.ToArray(); // Return array of all strings } With this string as input : 111,222,"33,44,55",666,"77,88","99" It will return : 111 222 33,44,55 666 77,88 99
i really like jimplode's answer, but I think a version with yield return is a little bit more useful, so here it is: public IEnumerable<string> SplitCSV(string input) { Regex csvSplit = new Regex("(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)", RegexOptions.Compiled); foreach (Match match in csvSplit.Matches(input)) { yield return match.Value.TrimStart(','); } } Maybe it's even more useful to have it like an extension method: public static class StringHelper { public static IEnumerable<string> SplitCSV(this string input) { Regex csvSplit = new Regex("(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)", RegexOptions.Compiled); foreach (Match match in csvSplit.Matches(input)) { yield return match.Value.TrimStart(','); } } }
This regular expression works without the need to loop through values and TrimStart(','), like in the accepted answer: ((?<=\")[^\"]*(?=\"(,|$)+)|(?<=,|^)[^,\"]*(?=,|$)) Here is the implementation in C#: string values = "111,222,\"33,44,55\",666,\"77,88\",\"99\""; MatchCollection matches = new Regex("((?<=\")[^\"]*(?=\"(,|$)+)|(?<=,|^)[^,\"]*(?=,|$))").Matches(values); foreach (var match in matches) { Console.WriteLine(match); } Outputs 111 222 33,44,55 666 77,88 99
None of these answers work when the string has a comma inside quotes, as in "value, 1", or escaped double-quotes, as in "value ""1""", which are valid CSV that should be parsed as value, 1 and value "1", respectively. This will also work with the tab-delimited format if you pass in a tab instead of a comma as your delimiter. public static IEnumerable<string> SplitRow(string row, char delimiter = ',') { var currentString = new StringBuilder(); var inQuotes = false; var quoteIsEscaped = false; //Store when a quote has been escaped. row = string.Format("{0}{1}", row, delimiter); //We add new cells at the delimiter, so append one for the parser. foreach (var character in row.Select((val, index) => new {val, index})) { if (character.val == delimiter) //We hit a delimiter character... { if (!inQuotes) //Are we inside quotes? If not, we've hit the end of a cell value. { Console.WriteLine(currentString); yield return currentString.ToString(); currentString.Clear(); } else { currentString.Append(character.val); } } else { if (character.val != ' ') { if(character.val == '"') //If we've hit a quote character... { if(character.val == '\"' && inQuotes) //Does it appear to be a closing quote? { if (row[character.index + 1] == character.val) //If the character afterwards is also a quote, this is to escape that (not a closing quote). { quoteIsEscaped = true; //Flag that we are escaped for the next character. Don't add the escaping quote. } else if (quoteIsEscaped) { quoteIsEscaped = false; //This is an escaped quote. Add it and revert quoteIsEscaped to false. currentString.Append(character.val); } else { inQuotes = false; } } else { if (!inQuotes) { inQuotes = true; } else { currentString.Append(character.val); //...It's a quote inside a quote. } } } else { currentString.Append(character.val); } } else { if (!string.IsNullOrWhiteSpace(currentString.ToString())) //Append only if not new cell { currentString.Append(character.val); } } } } }
With minor updates to the function provided by "Chad Hedgcock". Updates are on: Line 26: character.val == '\"' - This can never be true due to the check made on Line 24. i.e. character.val == '"' Line 28: if (row[character.index + 1] == character.val) added !quoteIsEscaped to escape 3 consecutive quotes. public static IEnumerable<string> SplitRow(string row, char delimiter = ',') { var currentString = new StringBuilder(); var inQuotes = false; var quoteIsEscaped = false; //Store when a quote has been escaped. row = string.Format("{0}{1}", row, delimiter); //We add new cells at the delimiter, so append one for the parser. foreach (var character in row.Select((val, index) => new {val, index})) { if (character.val == delimiter) //We hit a delimiter character... { if (!inQuotes) //Are we inside quotes? If not, we've hit the end of a cell value. { //Console.WriteLine(currentString); yield return currentString.ToString(); currentString.Clear(); } else { currentString.Append(character.val); } } else { if (character.val != ' ') { if(character.val == '"') //If we've hit a quote character... { if(character.val == '"' && inQuotes) //Does it appear to be a closing quote? { if (row[character.index + 1] == character.val && !quoteIsEscaped) //If the character afterwards is also a quote, this is to escape that (not a closing quote). { quoteIsEscaped = true; //Flag that we are escaped for the next character. Don't add the escaping quote. } else if (quoteIsEscaped) { quoteIsEscaped = false; //This is an escaped quote. Add it and revert quoteIsEscaped to false. currentString.Append(character.val); } else { inQuotes = false; } } else { if (!inQuotes) { inQuotes = true; } else { currentString.Append(character.val); //...It's a quote inside a quote. } } } else { currentString.Append(character.val); } } else { if (!string.IsNullOrWhiteSpace(currentString.ToString())) //Append only if not new cell { currentString.Append(character.val); } } } } }
For Jay's answer, if you use a 2nd boolean then you can have nested double-quotes inside single-quotes and vice-versa. private string[] splitString(string stringToSplit) { char[] characters = stringToSplit.ToCharArray(); List<string> returnValueList = new List<string>(); string tempString = ""; bool blockUntilEndQuote = false; bool blockUntilEndQuote2 = false; int characterCount = 0; foreach (char character in characters) { characterCount = characterCount + 1; if (character == '"' && !blockUntilEndQuote2) { if (blockUntilEndQuote == false) { blockUntilEndQuote = true; } else if (blockUntilEndQuote == true) { blockUntilEndQuote = false; } } if (character == '\'' && !blockUntilEndQuote) { if (blockUntilEndQuote2 == false) { blockUntilEndQuote2 = true; } else if (blockUntilEndQuote2 == true) { blockUntilEndQuote2 = false; } } if (character != ',') { tempString = tempString + character; } else if (character == ',' && (blockUntilEndQuote == true || blockUntilEndQuote2 == true)) { tempString = tempString + character; } else { returnValueList.Add(tempString); tempString = ""; } if (characterCount == characters.Length) { returnValueList.Add(tempString); tempString = ""; } } string[] returnValue = returnValueList.ToArray(); return returnValue; }
The original version
Currently I use the following regex:
public static Regex regexCSVSplit = new Regex(@"(?x:( (?<FULL> (^|[,;\t\r\n])\s* ( (?<QUODAT> (?<QUO>[""'])(?<DAT>([^,;\t\r\n]|(?<!\k<QUO>\s*)[,;\t\r\n])*)\k<QUO>) | (?<QUODAT> (?<DAT> [^""',;\s\r\n]* )) ) (?=\s*([,;\t\r\n]|$)) ) | (?<FULL> (^|[\s\t\r\n]) ( (?<QUODAT> (?<QUO>[""'])(?<DAT> [^""',;\s\t\r\n]* )\k<QUO>) | (?<QUODAT> (?<DAT> [^""',;\s\t\r\n]* )) ) (?=[,;\s\t\r\n]|$) ) ))", RegexOptions.Compiled);
This solution can handle pretty chaotic cases too.
This is how to feed the result into an array:
var data = regexCSVSplit.Matches(line_to_process).Cast<Match>().Select(x => x.Groups["DAT"].Value).ToArray();
Note: The regular expression contains two <FULL> blocks, and each of them contains two <QUODAT> blocks separated by "or" (|). Depending on your task you may only need one of them.
Note: This regular expression gives us one string array, and works on a single line with or without <carriage return> and/or <line feed>.
Simplified version
The following regular expression will already cover many complex cases:
public static Regex regexCSVSplit = new Regex(@"(?x:( (?<FULL> (^|[,;\t\r\n])\s* (?<QUODAT> (?<QUO>[""'])(?<DAT>([^,;\t\r\n]|(?<!\k<QUO>\s*)[,;\t\r\n])*)\k<QUO>) (?=\s*([,;\t\r\n]|$)) ) ))", RegexOptions.Compiled);
It can process complex, easy and empty items too.
This is how to feed the result into an array:
var data = regexCSVSplit.Matches(line_to_process).Cast<Match>().Select(x => x.Groups["DAT"].Value).ToArray();
The main rule here is that every item may contain anything but the <quotation mark><separators><comma> sequence AND each item shall begin and end with the same <quotation mark>.
<quotation mark>: <">, <'>
<comma>: <,>, <;>, <tab>, <carriage return>, <line feed>
Edit notes: I added some more explanation to make it easier to understand and replaced the text "CO" with "QUO".
Try this:
string s = @"111,222,""33,44,55"",666,""77,88"",""99""";
List<string> result = new List<string>();
var splitted = s.Split('"').ToList<string>();
splitted.RemoveAll(x => x == ",");
foreach (var it in splitted)
{
    if (it.StartsWith(",") || it.EndsWith(","))
    {
        var tmp = it.TrimEnd(',').TrimStart(',');
        result.AddRange(tmp.Split(','));
    }
    else
    {
        if (!string.IsNullOrEmpty(it)) result.Add(it);
    }
}
//Results:
foreach (var it in result)
{
    Console.WriteLine(it);
}
I know I'm a bit late to this, but for searches, here is how I did what you are asking about in C sharp private string[] splitString(string stringToSplit) { char[] characters = stringToSplit.ToCharArray(); List<string> returnValueList = new List<string>(); string tempString = ""; bool blockUntilEndQuote = false; int characterCount = 0; foreach (char character in characters) { characterCount = characterCount + 1; if (character == '"') { if (blockUntilEndQuote == false) { blockUntilEndQuote = true; } else if (blockUntilEndQuote == true) { blockUntilEndQuote = false; } } if (character != ',') { tempString = tempString + character; } else if (character == ',' && blockUntilEndQuote == true) { tempString = tempString + character; } else { returnValueList.Add(tempString); tempString = ""; } if (characterCount == characters.Length) { returnValueList.Add(tempString); tempString = ""; } } string[] returnValue = returnValueList.ToArray(); return returnValue; }
Don't reinvent a CSV parser, try FileHelpers.
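As one illustration of the "use an existing parser" advice, here is a minimal sketch. It does not use FileHelpers; it swaps in TextFieldParser from the Microsoft.VisualBasic.FileIO namespace, which ships with .NET (on .NET Framework the project needs a reference to the Microsoft.VisualBasic assembly). The class and method names are hypothetical.

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class CsvLineDemo
{
    // Hypothetical helper: splits one CSV line, honouring quoted fields.
    public static string[] SplitCsvLine(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true; // commas inside quotes stay in the field
            // ReadFields returns null when there is nothing left to read.
            return parser.ReadFields() ?? new string[0];
        }
    }

    public static void Main()
    {
        foreach (var field in SplitCsvLine("111,222,\"33,44,55\",666,\"77,88\",\"99\""))
        {
            Console.WriteLine(field);
        }
    }
}
```

For the string from the question this should yield the six fields the asker wanted, with the quotes removed and the embedded commas kept.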
I needed something a little more robust, so I took from here and created this... This solution is a little less elegant and a little more verbose, but in my testing (with a 1,000,000 row sample), I found this to be 2 to 3 times faster. Plus it handles non-escaped, embedded quotes. I used string delimiter and qualifiers instead of chars because of the requirements of my solution. I found it more difficult than I expected to find a good, generic CSV parser so I hope this parsing algorithm can help someone. public static string[] SplitRow(string record, string delimiter, string qualifier, bool trimData) { // In-Line for example, but I implemented as string extender in production code Func <string, int, int> IndexOfNextNonWhiteSpaceChar = delegate (string source, int startIndex) { if (startIndex >= 0) { if (source != null) { for (int i = startIndex; i < source.Length; i++) { if (!char.IsWhiteSpace(source[i])) { return i; } } } } return -1; }; var results = new List<string>(); var result = new StringBuilder(); var inQualifier = false; var inField = false; // We add new columns at the delimiter, so append one for the parser. var row = $"{record}{delimiter}"; for (var idx = 0; idx < row.Length; idx++) { // A delimiter character... if (row[idx]== delimiter[0]) { // Are we inside qualifier? If not, we've hit the end of a column value. if (!inQualifier) { results.Add(trimData ? result.ToString().Trim() : result.ToString()); result.Clear(); inField = false; } else { result.Append(row[idx]); } } // NOT a delimiter character... else { // ...Not a space character if (row[idx] != ' ') { // A qualifier character... if (row[idx] == qualifier[0]) { // Qualifier is closing qualifier... if (inQualifier && row[IndexOfNextNonWhiteSpaceChar(row, idx + 1)] == delimiter[0]) { inQualifier = false; continue; } else { // ...Qualifier is opening qualifier if (!inQualifier) { inQualifier = true; } // ...It's a qualifier inside a qualifier. 
else { inField = true; result.Append(row[idx]); } } } // Not a qualifier character... else { result.Append(row[idx]); inField = true; } } // ...A space character else { if (inQualifier || inField) { result.Append(row[idx]); } } } } return results.ToArray<string>(); } Some test code: //var input = "111,222,\"33,44,55\",666,\"77,88\",\"99\""; var input = "111, 222, \"99\",\"33,44,55\" , \"666 \"mark of a man\"\", \" spaces \"77,88\" \""; Console.WriteLine("Split with trim"); Console.WriteLine("---------------"); var result = SplitRow(input, ",", "\"", true); foreach (var r in result) { Console.WriteLine(r); } Console.WriteLine(""); // Split 2 Console.WriteLine("Split with no trim"); Console.WriteLine("------------------"); var result2 = SplitRow(input, ",", "\"", false); foreach (var r in result2) { Console.WriteLine(r); } Console.WriteLine(""); // Time Trial 1 Console.WriteLine("Experimental Process (1,000,000) iterations"); Console.WriteLine("-------------------------------------------"); watch = Stopwatch.StartNew(); for (var i = 0; i < 1000000; i++) { var x1 = SplitRow(input, ",", "\"", false); } watch.Stop(); elapsedMs = watch.ElapsedMilliseconds; Console.WriteLine($"Total Process Time: {string.Format("{0:0.###}", elapsedMs / 1000.0)} Seconds"); Console.WriteLine(""); Results Split with trim --------------- 111 222 99 33,44,55 666 "mark of a man" spaces "77,88" Split with no trim ------------------ 111 222 99 33,44,55 666 "mark of a man" spaces "77,88" Original Process (1,000,000) iterations ------------------------------- Total Process Time: 7.538 Seconds Experimental Process (1,000,000) iterations -------------------------------------------- Total Process Time: 3.363 Seconds
I once had to do something similar and in the end I got stuck with Regular Expressions. The inability for Regex to have state makes it pretty tricky - I just ended up writing a simple little parser. If you're doing CSV parsing you should just stick to using a CSV parser - don't reinvent the wheel.
Here is my fastest implementation based upon string raw pointer manipulation: string[] FastSplit(string sText, char? cSeparator = null, char? cQuotes = null) { string[] oTokens; if (null == cSeparator) { cSeparator = DEFAULT_PARSEFIELDS_SEPARATOR; } if (null == cQuotes) { cQuotes = DEFAULT_PARSEFIELDS_QUOTE; } unsafe { fixed (char* lpText = sText) { #region Fast array estimatation char* lpCurrent = lpText; int nEstimatedSize = 0; while (0 != *lpCurrent) { if (cSeparator == *lpCurrent) { nEstimatedSize++; } lpCurrent++; } nEstimatedSize++; // Add EOL char(s) string[] oEstimatedTokens = new string[nEstimatedSize]; #endregion #region Parsing char[] oBuffer = new char[sText.Length]; int nIndex = 0; int nTokens = 0; lpCurrent = lpText; while (0 != *lpCurrent) { if (cQuotes == *lpCurrent) { // Quotes parsing lpCurrent++; // Skip quote nIndex = 0; // Reset buffer while ( (0 != *lpCurrent) && (cQuotes != *lpCurrent) ) { oBuffer[nIndex] = *lpCurrent; // Store char lpCurrent++; // Move source cursor nIndex++; // Move target cursor } } else if (cSeparator == *lpCurrent) { // Separator char parsing oEstimatedTokens[nTokens++] = new string(oBuffer, 0, nIndex); // Store token nIndex = 0; // Skip separator and Reset buffer } else { // Content parsing oBuffer[nIndex] = *lpCurrent; // Store char nIndex++; // Move target cursor } lpCurrent++; // Move source cursor } // Recover pending buffer if (nIndex > 0) { // Store token oEstimatedTokens[nTokens++] = new string(oBuffer, 0, nIndex); } // Build final tokens list if (nTokens == nEstimatedSize) { oTokens = oEstimatedTokens; } else { oTokens = new string[nTokens]; Array.Copy(oEstimatedTokens, 0, oTokens, 0, nTokens); } #endregion } } // Epilogue return oTokens; }
Try this private string[] GetCommaSeperatedWords(string sep, string line) { List<string> list = new List<string>(); StringBuilder word = new StringBuilder(); int doubleQuoteCount = 0; for (int i = 0; i < line.Length; i++) { string chr = line[i].ToString(); if (chr == "\"") { if (doubleQuoteCount == 0) doubleQuoteCount++; else doubleQuoteCount--; continue; } if (chr == sep && doubleQuoteCount == 0) { list.Add(word.ToString()); word = new StringBuilder(); continue; } word.Append(chr); } list.Add(word.ToString()); return list.ToArray(); }
This is Chad's answer rewritten with state based logic. His answered failed for me when it came across """BRAD""" as a field. That should return "BRAD" but it just ate up all the remaining fields. When I tried to debug it I just ended up rewriting it as state based logic: enum SplitState { s_begin, s_infield, s_inquotefield, s_foundquoteinfield }; public static IEnumerable<string> SplitRow(string row, char delimiter = ',') { var currentString = new StringBuilder(); SplitState state = SplitState.s_begin; row = string.Format("{0}{1}", row, delimiter); //We add new cells at the delimiter, so append one for the parser. foreach (var character in row.Select((val, index) => new { val, index })) { //Console.WriteLine("character = " + character.val + " state = " + state); switch (state) { case SplitState.s_begin: if (character.val == delimiter) { /* empty field */ yield return currentString.ToString(); currentString.Clear(); } else if (character.val == '"') { state = SplitState.s_inquotefield; } else { currentString.Append(character.val); state = SplitState.s_infield; } break; case SplitState.s_infield: if (character.val == delimiter) { /* field with data */ yield return currentString.ToString(); state = SplitState.s_begin; currentString.Clear(); } else { currentString.Append(character.val); } break; case SplitState.s_inquotefield: if (character.val == '"') { // could be end of field, or escaped quote. state = SplitState.s_foundquoteinfield; } else { currentString.Append(character.val); } break; case SplitState.s_foundquoteinfield: if (character.val == '"') { // found escaped quote. 
currentString.Append(character.val); state = SplitState.s_inquotefield; } else if (character.val == delimiter) { // must have been last quote so we must find delimiter yield return currentString.ToString(); state = SplitState.s_begin; currentString.Clear(); } else { throw new Exception("Quoted field not terminated."); } break; default: throw new Exception("unknown state:" + state); } } //Console.WriteLine("currentstring = " + currentString.ToString()); } This is a lot more lines of code than the other solutions, but it is easy to modify to add edge cases.
Make every other a-z letter Upper / Lower case, ignoring whitespace
Can somebody tell me what I am doing wrong, please? I can't seem to get the expected output, i.e. ignore whitespace and alternate upper/lower case only on the a-z characters, regardless of the number of whitespace characters. My code:
var sentence = "dancing sentence";
var charSentence = sentence.ToCharArray();
var rs = "";
for (var i = 0; i < charSentence.Length; i++)
{
    if (charSentence[i] != ' ')
    {
        if (i % 2 == 0 && charSentence[i] != ' ')
        {
            rs += charSentence[i].ToString().ToUpper();
        }
        else if (i % 2 == 1 && charSentence[i] != ' ')
        {
            rs += sentence[i].ToString().ToLower();
        }
    }
    else
    {
        rs += " ";
    }
}
Console.WriteLine(rs);
Expected output: DaNcInG sEnTeNcE
Actual output: DaNcInG SeNtEnCe
I use flag instead of i because (as you mentioned) white space made this algorithm work wrong: var sentence = "dancing sentence"; var charSentence = sentence.ToCharArray(); var rs = ""; var flag = true; for (var i = 0; i < charSentence.Length; i++) { if (charSentence[i] != ' ') { if (flag) { rs += charSentence[i].ToString().ToUpper(); } else { rs += sentence[i].ToString().ToLower(); } flag = !flag; } else { rs += " "; } } Console.WriteLine(rs);
Try a simple Finite State Automata with just two states (upper == true/false); another suggestion is to use StringBuilder: private static string ToDancing(string value) { if (string.IsNullOrEmpty(value)) return value; bool upper = false; StringBuilder sb = new StringBuilder(value.Length); foreach (var c in value) if (char.IsLetter(c)) sb.Append((upper = !upper) ? char.ToUpper(c) : char.ToLower(c)); else sb.Append(c); return sb.ToString(); } Test var sentence = "dancing sentence"; Console.Write(ToDancing(sentence)); Outcome DaNcInG sEnTeNcE
I think you should declare one more variable called isUpper. Now you have two variables: i indicates the index of the character that you are currently iterating over, and isUpper indicates whether a letter should be uppercase. You increment i as usual, but set isUpper to true at first:
// before the loop
bool isUpper = true;
Then, rather than checking whether i is divisible by 2, check isUpper:
if (isUpper) { rs += charSentence[i].ToString().ToUpper(); } else { rs += sentence[i].ToString().ToLower(); }
Immediately after the above if statement, "flip" isUpper:
isUpper = !isUpper;
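Put together, the isUpper idea could look like the sketch below. The class and method names are hypothetical, and a StringBuilder is used instead of string concatenation; the flag is only flipped for non-space characters, which is what keeps the alternation going across word boundaries.

```csharp
using System;
using System.Text;

public static class DancingSentence
{
    // Alternates upper/lower case on every non-space character.
    public static string Alternate(string sentence)
    {
        var sb = new StringBuilder(sentence.Length);
        bool isUpper = true; // the first letter is uppercase
        foreach (char c in sentence)
        {
            if (c == ' ')
            {
                sb.Append(c); // spaces are copied and do not flip the flag
                continue;
            }
            sb.Append(isUpper ? char.ToUpper(c) : char.ToLower(c));
            isUpper = !isUpper;
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Alternate("dancing sentence")); // prints DaNcInG sEnTeNcE
    }
}
```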
Linq version var sentence = "dancing sentence"; int i = 0; string result = string.Concat(sentence.Select(x => { i += x == ' ' ? 0 : 1; return i % 2 != 0 ? char.ToUpper(x) : char.ToLower(x); })); Sidenote: please replace charSentence[i].ToString().ToUpper() with char.ToUpper(charSentence[i])
Thanks @Dmitry Bychenko, that is the best approach. But I wondered what a solution closer to the OP's (possibly a beginner's) way of thinking could look like, so here is another solution. The code is lengthy, and I don't like it much myself, but I am posting it anyway:
class Program
{
    static void Main(string[] args)
    {
        var sentence = "dancing sentence large also";
        string newString = string.Empty;
        StringBuilder newStringdata = new StringBuilder();
        string[] arr = sentence.Split(' ');
        for (int i = 0; i < arr.Length; i++)
        {
            if (i == 0)
            {
                newString = ReturnEvenModifiedString(arr[i]);
                newStringdata.Append(newString);
            }
            else
            {
                // Continue the pattern based on the case of the previous word's last letter.
                if (char.IsUpper(newString[newString.Length - 1]))
                {
                    newString = ReturnOddModifiedString(arr[i]);
                    newStringdata.Append(" ");
                    newStringdata.Append(newString);
                }
                else
                {
                    newString = ReturnEvenModifiedString(arr[i]);
                    newStringdata.Append(" ");
                    newStringdata.Append(newString);
                }
            }
        }
        Console.WriteLine(newStringdata.ToString());
        Console.Read();
    }
    //For Even Test
    private static string ReturnEvenModifiedString(string initialString)
    {
        string newString = string.Empty;
        var temparr = initialString.ToCharArray();
        for (var i = 0; i < temparr.Length; i++)
        {
            if (temparr[i] != ' ')
            {
                if (i % 2 == 0 && temparr[i] != ' ') { newString += temparr[i].ToString().ToUpper(); }
                else { newString += temparr[i].ToString().ToLower(); }
            }
        }
        return newString;
    }
    //For Odd Test
    private static string ReturnOddModifiedString(string initialString)
    {
        string newString = string.Empty;
        var temparr = initialString.ToCharArray();
        for (var i = 0; i < temparr.Length; i++)
        {
            if (temparr[i] != ' ')
            {
                if (i % 2 != 0 && temparr[i] != ' ') { newString += temparr[i].ToString().ToUpper(); }
                else { newString += temparr[i].ToString().ToLower(); }
            }
        }
        return newString;
    }
}
PigLatin translate
I am writing a program that translates English words into Pig Latin. My problem with the code below is that only the word at the last index of the array shows up in the result. Does anyone see the error? Thanks in advance.
public partial class Form1 : Form
{
    public Form1()
    {
        InitializeComponent();
    }

    private void btnTrans_Click( object sender, EventArgs e )
    {
        string engWordText = engWord.Text.ToString();
        string let1;
        string restLet;
        int position;
        string pigLatin = "";
        string vokal = "AEIOUaeiou";

        // split the sentence into individual words
        //string[] words = engWordText.Split(' ');
        string[] transWord = engWordText.Split(' ');

        // translate each word into pig latin
        foreach (string word in transWord)
        {
            // check for empty TextBox
            try
            {
                let1 = word.Substring(0, 1);
                restLet = word.Substring(1, word.Length - 1);
                position = vokal.IndexOf(let1);
                if (position == -1)
                {
                    pigLatin = restLet + let1 + "ay";
                }
                else
                {
                    pigLatin = word + "way";
                }
                // display the translation
                latinInput.Text = pigLatin.ToString();
                engWord.Clear();
            }
            catch (System.ArgumentOutOfRangeException)
            {
                // message text (Swedish): "You must enter an English word"
                MessageBox.Show("Du måste skriva in ett engelskt ord", "PigLatin", MessageBoxButtons.OK, MessageBoxIcon.Error);
            }
        }
    } // end method translateButton_Click

    // pressing enter is the same as clicking the Translate Button
    private void engWordText_KeyDown( object sender, KeyEventArgs e )
    {
        // allow user to press enter in TextBox
        if ( e.KeyCode == Keys.Enter )
            btnTrans_Click( sender, e );
    } // end method inputTextBox_KeyDown
} // end class PigLatinForm
You are assigning pigLatin to the text box's Text property on every iteration of the loop, so each word overwrites the previous one and only the last value survives. Collect the words first and assign once after the loop. Try this:
List<string> plWords = new List<string>();
// translate each word into pig latin
foreach (string word in transWord)
{
    // check for empty TextBox
    try
    {
        // let1 is declared as a string, so take a one-character substring;
        // on an empty word this throws the ArgumentOutOfRangeException caught below
        let1 = word.Substring(0, 1);
        restLet = word.Substring(1, word.Length - 1);
        if (!vokal.Contains(let1))
        {
            pigLatin = restLet + let1 + "ay";
        }
        else
        {
            pigLatin = word + "way";
        }
        plWords.Add(pigLatin);
    }
    catch (System.ArgumentOutOfRangeException)
    {
        MessageBox.Show("Du måste skriva in ett engelskt ord", "PigLatin", MessageBoxButtons.OK, MessageBoxIcon.Error);
    }
}
engWord.Clear();
latinInput.Text = string.Join(" ", plWords.ToArray());
As a bit of a bonus, here's how you can make this operation quite a bit cleaner using LINQ:
private static string MakePigLatin(string word)
{
    const string vowels = "AEIOUaeiou";
    char let1 = word[0];
    string restLet = word.Substring(1, word.Length - 1);
    return vowels.Contains(let1) ? word + "way" : restLet + let1 + "ay";
}

private void btnTrans_Click( object sender, EventArgs e )
{
    // RemoveEmptyEntries guarantees MakePigLatin never sees an empty word
    var plWords = engWord.Text
        .Split(new[]{' '}, StringSplitOptions.RemoveEmptyEntries)
        .Select(MakePigLatin);
    latinInput.Text = string.Join(" ", plWords);
    engWord.Clear();
}
Read Text File from specific places
I have a question about reading a text file, because I don't know if I'm thinking about it the right way. I want to read from a specific string up to a specific character. My text looks like this:
...
...
CM_ "Hello, how are you? Rules: Don't smoke! - love others End";
...
CM_ "Why you?";
... // Many CM_
...
After splitting, it should look like this:
1. CM_
2. "Hello, how are you? Rules: Don't smoke! - love others End"
3. CM_
4. "Why you?"
... // many CM_
I want to read from "CM_" up to ";". The code I have tried so far:
StreamReader fin = new StreamReader("text.txt");
string tmp = "";
tmp = fin.ReadToEnd();
if (tmp.StartsWith("CM_ ") && tmp.EndsWith(";"))
{
    var result = tmp.Split(new[] { '"' }).SelectMany((s, i) =>
    {
        if (i % 2 == 1) return new[] { s };
        return s.Split(new[] { ' ', ';' }, StringSplitOptions.RemoveEmptyEntries);
    }).ToList();
}
foreach (string x in result)
{
    Console.WriteLine(x);
}
// Variant 1: read the whole file and match every CM_ ... ; block with a regular expression
static void PRegex()
{
    using (StreamReader fin = new StreamReader("text.txt"))
    {
        string tmp = fin.ReadToEnd();
        var matches = Regex.Matches(tmp, "(CM_) ([^;]*);", RegexOptions.Singleline);
        for (int i = 0; i < matches.Count; i++)
            if (matches[i].Groups.Count == 3)
                Console.WriteLine((2 * i + 1).ToString() + ". " + matches[i].Groups[1].Value + "\r\n" + (2 * (i + 1)).ToString() + ". " + matches[i].Groups[2].Value);
    }
    Console.ReadLine();
}

// Variant 2: read line by line and collect the text between "CM_ " and ";"
static void PLineByLine()
{
    using (StreamReader fin = new StreamReader("text.txt"))
    {
        int index = 0;
        string line = null;
        string currentCMBlock = null;
        bool endOfBlock = true;
        while ((line = fin.ReadLine()) != null)
        {
            bool endOfLine = false;
            while (!endOfLine)
            {
                if (endOfBlock)
                {
                    // look for the start of the next block
                    int startIndex = line.IndexOf("CM_ ");
                    if (startIndex == -1)
                    {
                        endOfLine = true;
                        continue;
                    }
                    line = line.Substring(startIndex + 4, line.Length - startIndex - 4);
                    endOfBlock = false;
                }
                if (!endOfBlock)
                {
                    // look for the terminating semicolon
                    int startIndex = line.IndexOf(";");
                    if (startIndex == -1)
                    {
                        // the block continues on the next line
                        currentCMBlock += line + "\r\n";
                        endOfLine = true;
                        continue;
                    }
                    currentCMBlock += line.Substring(0, startIndex);
                    if (!string.IsNullOrEmpty(currentCMBlock))
                        Console.WriteLine((++index) + ". CM_\r\n" + (++index) + ". " + currentCMBlock);
                    currentCMBlock = null;
                    line = line.Substring(startIndex + 1, line.Length - startIndex - 1);
                    endOfBlock = true;
                }
            }
        }
    }
    Console.ReadLine();
}
You are reading the whole file into tmp. So, if there is any text before "CM_" then your conditional statement won't be entered. Instead, try reading line by line with fin.ReadLine in a loop over all lines.
Read the whole file:
string FileToRead = File.ReadAllText("Path");

string GetContent(string StartAt, string EndAt, bool LastIndex)
{
    // Find the start marker: its last occurrence if LastIndex is set, otherwise its first
    int start = LastIndex ? FileToRead.LastIndexOf(StartAt) : FileToRead.IndexOf(StartAt);
    if (start < 0) return null;
    // Find the end marker that follows the start marker
    int end = FileToRead.IndexOf(EndAt, start + StartAt.Length);
    if (end < 0) return null;
    // Keep only the text from the start marker through the end marker
    return FileToRead.Substring(start, end - start + EndAt.Length);
}
I hope I didn't get anything wrong here (typed from memory). You read the file, then drop everything in front of the start marker and everything after the end marker. The flag decides whether the first match or the last match is returned. NOTE: I think it would be better to use a StringReader (if I remember correctly) if memory usage matters for your application.
I tried something else, but I don't know if it is any good. It still reads only the first line, and I don't know what I did wrong. Here is my code:
while ((tmp = fin.ReadLine()) != null)
{
    if (tmp.StartsWith("CM_ "))
    {
        //string[] tmpList = tmp.Split(new Char[] { ' ', ';' }, StringSplitOptions.RemoveEmptyEntries);
        var result = tmp.Split(new[] { '"' }).SelectMany((s, i) =>
        {
            if (i % 2 == 1) return new[] { s };
            return s.Split(new[] { ' ', ';' }, StringSplitOptions.RemoveEmptyEntries);
        }).ToList();
        if (tmp.EndsWith(";")) break;
        fin.ReadLine();
        if (tmp.EndsWith(";"))
        {
            result.ToList();
            break;
        }
        else
        {
            result.ToList();
            fin.ReadLine();
        }
        foreach (string x in result)
        {
            Console.WriteLine(x);
        }
    }
}
I suggest you look into using Regular Expressions. It may be just what you need and much more flexible than Split().
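To make the regular-expression suggestion concrete, here is a minimal sketch. The class and method names (CmParser, ExtractComments) are made up for illustration, and it assumes that every entry has the form CM_ "..."; and that the quoted text never contains a double quote itself.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CmParser
{
    // Hypothetical helper: returns the quoted text of every CM_ "..."; entry.
    public static List<string> ExtractComments(string text)
    {
        var results = new List<string>();
        // [^"]* also matches line breaks, so multi-line entries are captured,
        // and a ';' inside the quotes does not end the match early.
        foreach (Match m in Regex.Matches(text, "CM_\\s+\"([^\"]*)\"\\s*;"))
        {
            results.Add(m.Groups[1].Value);
        }
        return results;
    }
}
```

It could be called as CmParser.ExtractComments(File.ReadAllText("text.txt")). Matching on the quotes rather than on the first semicolon is a design choice: it keeps entries intact even when the comment text itself contains a semicolon.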
Create Space Between Capital Letters and Skip Space Between Consecutive
I found a way to turn "ThisCourse" into "This Course": the LINQ statement by EtienneT in "Add Space Before Capital Letter". But I cannot turn "ThisCourseID" into "This Course ID", with no space inside "ID". Is there a way to do this in LINQ?
Well, if it has to be a single LINQ statement...
var s = "ThisCourseIDMoreXYeahY";
s = string.Join(
    string.Empty,
    s.Select((x, i) =>
        (
            // a capital letter gets a leading space if it follows a lower-case letter
            // or is itself followed by a lower-case letter
            char.IsUpper(x) && i > 0 &&
            ( char.IsLower(s[i - 1]) || (i < s.Length - 1 && char.IsLower(s[i + 1])) )
        ) ? " " + x : x.ToString()));
Console.WriteLine(s);
Output: "This Course ID More X Yeah Y"
var s = "ThisCourseID";
for (var i = 1; i < s.Length; i++)
{
    if (char.IsLower(s[i - 1]) && char.IsUpper(s[i]))
    {
        s = s.Insert(i, " ");
    }
}
Console.WriteLine(s); // "This Course ID"
You can improve this using StringBuilder if you are going to use this on very long strings, but for your purpose, as you presented it, it should work just fine.
FIX (also splits before the last capital of a run when a lower-case letter follows):
var s = "ThisCourseIDSomething";
for (var i = 1; i < s.Length - 1; i++)
{
    if (char.IsLower(s[i - 1]) && char.IsUpper(s[i]) ||
        s[i - 1] != ' ' && char.IsUpper(s[i]) && char.IsLower(s[i + 1]))
    {
        s = s.Insert(i, " ");
    }
}
Console.WriteLine(s); // This Course ID Something
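The StringBuilder improvement mentioned above could look roughly like this. It is only a sketch: CaseSpacer and InsertSpaces are made-up names, and unlike the loop in the FIX it also checks the final character, so a trailing capital after a lower-case letter is split off as well.

```csharp
using System;
using System.Text;

public static class CaseSpacer
{
    // Hypothetical helper: builds the result in one pass instead of calling s.Insert repeatedly.
    public static string InsertSpaces(string s)
    {
        var sb = new StringBuilder(s.Length * 2);
        for (int i = 0; i < s.Length; i++)
        {
            bool isUpper = char.IsUpper(s[i]);
            // capital right after a lower-case letter: "sC" -> "s C"
            bool afterLower = i > 0 && char.IsLower(s[i - 1]);
            // last capital of a run, followed by lower case: "IDSo" -> "ID So"
            bool startsWord = i > 0 && s[i - 1] != ' '
                              && i + 1 < s.Length && char.IsLower(s[i + 1]);
            if (isUpper && (afterLower || startsWord))
            {
                sb.Append(' ');
            }
            sb.Append(s[i]);
        }
        return sb.ToString();
    }
}
```

Because the input string is never modified, the index i always refers to the original text, which avoids the index shifting that s.Insert causes inside the loop.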
You don't need LINQ, but you could 'enumerate' and use a lambda to make it more generic (though I'm not sure if any of this makes sense):
// these extension methods must be declared inside a static class
static IEnumerable<string> Split(this string text, Func<char?, char?, char, int?> shouldSplit)
{
    StringBuilder output = new StringBuilder();
    char? before = null;
    char? before2nd = null;
    foreach (var c in text)
    {
        var where = shouldSplit(before2nd, before, c);
        if (where != null)
        {
            var str = output.ToString();
            switch(where)
            {
                case -1:
                    // split one character back: keep the last character for the next piece
                    output.Remove(0, str.Length -1);
                    yield return str.Substring(0, str.Length - 1);
                    break;
                case 0:
                default:
                    output.Clear();
                    yield return str;
                    break;
            }
        }
        output.Append(c);
        before2nd = before;
        before = c;
    }
    yield return output.ToString();
}
...and call it like this, e.g. ...
static IEnumerable<string> SplitLines(this string text)
{
    return text.Split((before2nd, before, now) =>
    {
        if ((before2nd ?? 'A') == '\r' && (before ?? 'A') == '\n') return 0; // split on 'now'
        return null; // don't split
    });
}

static IEnumerable<string> SplitOnCase(this string text)
{
    return text.Split((before2nd, before, now) =>
    {
        if (char.IsLower(before ?? 'A') && char.IsUpper(now)) return 0; // split on 'now'
        if (char.IsUpper(before2nd ?? 'a') && char.IsUpper(before ?? 'a') && char.IsLower(now)) return -1; // split one char before
        return null; // don't split
    });
}
...and somewhere...
var text = "ToSplitOrNotToSplitTHEQuestionIsNow";
var words = text.SplitOnCase();
foreach (var word in words)
    Console.WriteLine(word);
text = "To\r\nSplit\r\nOr\r\nNot\r\nTo\r\nSplit\r\nTHE\r\nQuestion\r\nIs\r\nNow";
words = text.SplitLines();
foreach (var word in words)
    Console.WriteLine(word);
:)