Interpreting and formatting user input

I'm designing a command line interpreter for my software and need to be able to format user input. Currently I have a system that simply splits everything on spaces; the problem is that I need it not to split anything inside double quotes.
As you can probably tell, my current implementation won't handle quoted paths very well.
This is my current interpreting and formatting logic (contained in a non static method which gets called when the user presses enter, in case anyone was wondering):
var command = ConsoleInput.Text;
ConsoleInput.Text = String.Empty;
// Split once on spaces: the first part is the command name, the rest are its arguments.
string[] parts = command.Split(new char[] { ' ' });
string command_main = parts.First();
string[] syntax = parts.Skip(1).ToArray();
Action<string[]> commandfunction;
if (lCommands.TryGetValue(command_main, out commandfunction))
{
    commandfunction(syntax);
}
else
    ConsoleOut($"Invalid Command - {command_main} {string.Join(" ", syntax)}");
I need quoted paths to be taken in as a single argument instead of being split on spaces.
For example (disclaimer: this is just an illustration, not actual code):
This is what I don't want: with an input of "this is a test" and some more text, the result is something like syntax[0] = "this, syntax[1] = is, and so on.
The expected outcome (what I want to happen) would be syntax[0] = "this is a test", syntax[1] = and, syntax[2] = some, and so on.
I'm stuck here, anyone have a solution? Thank you.

Here's a solution. It's a hacked together state machine that handles quoted strings that may contain spaces. It throws away extraneous whitespace between arguments, and considers a doubled up double-quote as if it were a single double-quote (but without any special meaning; as if it were any other character).
public IEnumerable<string> ParseLine(string toParse)
{
var result = new List<string>();
bool inQuotedString = false;
bool parsingDoubleQuote = false;
bool inWhiteSpace = false;
int length = toParse.Length;
var argBuffer = new StringBuilder();
for (var index = 0; index < length; ++index)
{
//if looking ahead for a double quote succeeded, just add the quote to the current argument
if (parsingDoubleQuote)
{
parsingDoubleQuote = false;
argBuffer.Append('"');
//and we are done with this character, so...
continue; //done with this character, time to just loop again
}
if (toParse[index] == '"')
{
inWhiteSpace = false;
//look ahead one character to see if there's a double quote
if (index < length - 1 && toParse[index + 1] == '"')
{
parsingDoubleQuote = true;
continue; //done with this character, time to just loop again
}
if (!inQuotedString)
{
inQuotedString = true;
continue; //done with this character, time to just loop again
}
else
{
//it's not a double quote, and we are in quotes string, so
inQuotedString = false;
//we don't add the buffer to the output args until a space or the end, so
continue; //done with this character, time to just loop again
}
}
//if we are here, there's no quote, so...
if (toParse[index] == ' ' || toParse[index] == '\t')
{
if (inQuotedString)
{
argBuffer.Append(toParse[index]);
continue; //done with this character, time to just loop again
}
if (inWhiteSpace)
{
//already inside a run of whitespace, nothing to do
continue; //done with this character, time to just loop again
}
else
{
inWhiteSpace = true;
if (argBuffer.Length > 0)
{
result.Add(argBuffer.ToString());
argBuffer.Clear();
continue; //done with this character, time to just loop again
}
}
}
else
{
inWhiteSpace = false;
//no quote, no space, so...
argBuffer.Append(toParse[index]);
continue; //done with this character, time to just loop again
}
} //end of for loop
if (argBuffer.Length > 0)
{
result.Add(argBuffer.ToString());
}
return result;
}
I've only given it cursory testing, so you'll want to test it more thoroughly.
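To show how a quote-aware tokenizer could plug into the dictionary dispatch from the question, here is a minimal sketch. It is not the answer's ParseLine: the names CommandLineSketch, Tokenize and Dispatch are made up for illustration, the quotes are stripped from the resulting arguments, and doubled quotes are not treated specially.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class CommandLineSketch
{
    // Hypothetical compact tokenizer: splits on spaces/tabs, keeps quoted text together.
    public static List<string> Tokenize(string line)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;
        bool hasToken = false; // true once a token has started (so "" yields an empty argument)

        foreach (char c in line)
        {
            if (c == '"')
            {
                inQuotes = !inQuotes; // the quote itself is not copied into the token
                hasToken = true;
            }
            else if ((c == ' ' || c == '\t') && !inQuotes)
            {
                if (hasToken)
                {
                    tokens.Add(current.ToString());
                    current.Clear();
                    hasToken = false;
                }
            }
            else
            {
                current.Append(c);
                hasToken = true;
            }
        }
        if (hasToken) tokens.Add(current.ToString());
        return tokens;
    }

    // Uses the first token as the command name and passes the rest as arguments.
    // Returns false when the line is empty or the command is unknown.
    public static bool Dispatch(string line, IDictionary<string, Action<string[]>> commands)
    {
        var tokens = Tokenize(line);
        if (tokens.Count == 0) return false;
        if (!commands.TryGetValue(tokens[0], out var handler)) return false;
        handler(tokens.Skip(1).ToArray());
        return true;
    }
}
```

With a dictionary like lCommands from the question, Dispatch(ConsoleInput.Text, lCommands) would replace the two Split calls, and a false return value would map to the "Invalid Command" message.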

Related

How to replace multiple substrings in a string in C#?

I have to replace multiple substrings in a string (the input string has a maximum length of 32). I have a big dictionary that can hold millions of key-value pairs. For each word, I need to check whether it is present in the dictionary and, if so, replace it with the corresponding value. The input string can have multiple trailing spaces.
This method is called millions of times, so it is badly affecting performance.
Is there any scope for optimization in the code, or some better way to do this?
public static string RandomValueCompositeField(object objInput, Dictionary<string, string> g_rawValueRandomValueMapping) {
if (objInput == null)
return null;
string input = objInput.ToString();
if (input == "")
return input;
//List<string> ls = new List<string>();
int count = WhiteSpaceAtEnd(input);
foreach (string data in input.Substring(0, input.Length - count).Split(' ')) {
try {
string value;
gs_dictRawValueRandomValueMapping.TryGetValue(data, out value);
if (value != null) {
//ls.Add(value.TrimEnd());
input = input.Replace(data, value);
}
else {
//ls.Add(data);
}
}
catch(Exception ex) {
}
}
//if (count > 0)
// input = input + new string(' ', count);
//ls.Add(new string(' ', count));
return input;
}
EDIT:
I missed one important thing in the question: a substring can occur only once in the input string, and each dictionary key has the same number of characters as its value.

Here's a method that will take an input string and will build a new string by finding "words" (any consecutive non-whitespace) and then checking if that word is in a dictionary and replacing it with the corresponding value if found. This will fix the issues of Replace doing replacements on "sub-words" (if you have "hello hell" and you want to replace "hell" with "heaven" and you don't want it to give you "heaveno heaven"). It also fixes the issue of swapping. For example if you want to replace "yes" with "no" and "no" with "yes" in "yes no" you don't want it to first turn that into "no no" and then into "yes yes".
public string ReplaceWords(string input, Dictionary<string, string> replacements)
{
var builder = new StringBuilder();
int wordStart = -1;
int wordLength = 0;
for(int i = 0; i < input.Length; i++)
{
// If the current character is white space check if we have a word to replace
if(char.IsWhiteSpace(input[i]))
{
// If wordStart is not -1 then we have hit the end of a word
if(wordStart >= 0)
{
// get the word and look it up in the dictionary
// if found use the replacement, if not keep the word.
var word = input.Substring(wordStart, wordLength);
if(replacements.TryGetValue(word, out var replace))
{
builder.Append(replace);
}
else
{
builder.Append(word);
}
}
// Make sure to reset the start and length
wordStart = -1;
wordLength = 0;
// append whatever whitespace was found.
builder.Append(input[i]);
}
// If this isn't whitespace we set wordStart if it isn't already set
// and just increment the length.
else
{
if(wordStart == -1) wordStart = i;
wordLength++;
}
}
// If wordStart is not -1 then we have a trailing word we need to check.
if(wordStart >= 0)
{
var word = input.Substring(wordStart, wordLength);
if(replacements.TryGetValue(word, out var replace))
{
builder.Append(replace);
}
else
{
builder.Append(word);
}
}
return builder.ToString();
}
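For comparison, roughly the same word-by-word lookup can be written with Regex.Replace and a match evaluator. This is a sketch of an alternative, not the answer's method: the class name WordReplacer is made up, \S+ is assumed to be a close enough stand-in for "consecutive non-whitespace", and it is not claimed to be faster than the hand-written loop.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WordReplacer
{
    // \S+ matches each maximal run of non-whitespace characters, i.e. one "word".
    private static readonly Regex Word = new Regex(@"\S+", RegexOptions.Compiled);

    public static string ReplaceWords(string input, IReadOnlyDictionary<string, string> replacements)
    {
        // Every word is looked up against the original input exactly once,
        // so a replacement is never replaced again and whitespace is left untouched.
        return Word.Replace(input, m =>
            replacements.TryGetValue(m.Value, out var replacement) ? replacement : m.Value);
    }
}
```

Because each match is evaluated independently, swapping "yes" and "no" in "yes no" gives "no yes", and "hello hell" with "hell" mapped to "heaven" gives "hello heaven".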

Regular expression for pipe delimited and double quoted string

I have a string something like this:
"2014-01-23 09:13:45|\"10002112|TR0859657|25-DEC-2013>0000000000000001\"|10002112"
I would like to split by pipe apart from anything wrapped in double quotes so I have something like (similar to how csv is done):
[0] => 2014-01-23 09:13:45
[1] => 10002112|TR0859657|25-DEC-2013>0000000000000001
[2] => 10002112
I would like to know if there is a regular expression that can do this?

I think you may need to write your own parser.
You will need:
custom collection to keep results
boolean flag to decide whether pipe is inside quotation or outside quotation marks
string (or StringBuilder) to keep current word
The idea is that you read string char by char. Each char is appended to the word. If there is a pipe outside quotation marks you add the word to your result collection. If there is a quote you switch a flag so you don't treat the pipe as a divider anymore but you append it as a part of the word. Then if there is another quotation you switch the flag back again. So next pipe will result in adding the whole word (with pipes within quotation marks) to the collection. I tested the code below on your example and it worked.
private static List<string> ParseLine(string yourString)
{
bool ignorePipe = false;
string word = string.Empty;
List<string> divided = new List<string>();
foreach (char c in yourString)
{
if (c == '|' &&
!ignorePipe)
{
divided.Add(word);
word = string.Empty;
}
else if (c == '"')
{
ignorePipe = !ignorePipe;
}
else
{
word += c;
}
}
divided.Add(word);
return divided;
}

How about this Regular Expression:
/((["|]).*\2)/g
It looks like it could be used as valid split expression.
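If a regular expression is still preferred, one commonly used idea is to split on a pipe only when an even number of quotes follows it. The sketch below assumes balanced quotes and no escaped quotes inside a field; PipeSplitter is a made-up name, and the lookahead rescans the rest of the line for every pipe, so it is unlikely to beat a character-by-character parser on long inputs.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class PipeSplitter
{
    // A pipe is a separator only if an even number of quotes follows it,
    // which means the pipe itself is not inside a quoted section.
    private static readonly Regex Separator =
        new Regex("\\|(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    public static string[] Split(string line)
    {
        return Separator.Split(line)
                        .Select(field => field.Trim('"')) // drop the surrounding quotes
                        .ToArray();
    }
}
```

Applied to the sample line from the question, this yields the three fields listed there, with the quoted middle field kept in one piece.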

I'm going to blatantly ignore the fact that you want a RegEx, because I think that making your own IEnumerable will be easier. Plus, you get instant access to Linq.
var line = "2014-01-23 09:13:45|\"10002112|TR0859657|25-DEC-2013>0000000000000001\"|10002112";
var data = GetPartsFromLine(line).ToList();
private static IEnumerable<string> GetPartsFromLine(string line)
{
    int position = 0;
    while (position < line.Length)
    {
        if (line[position] == '"')
        {
            // go find the closing "
            int endQuote = line.IndexOf('"', position + 1);
            if (endQuote == -1)
            {
                // unterminated quote: take the rest of the line
                yield return line.Substring(position + 1);
                yield break;
            }
            yield return line.Substring(position + 1, endQuote - position - 1);
            // skip the closing quote and the pipe that follows it, if any
            position = endQuote + 1;
            if (position < line.Length && line[position] == '|')
            {
                position++;
            }
        }
        else
        {
            // go find the next |
            int pipe = line.IndexOf('|', position);
            if (pipe == -1)
            {
                // hit the end of the line
                yield return line.Substring(position);
                yield break;
            }
            yield return line.Substring(position, pipe - position);
            position = pipe + 1;
        }
    }
}
This hasn't been fully tested, but it works with your example.

Censoring words in string[] by replacing

I am making a censor program for a game .dll, and I cannot figure out how to do this. I have a string[] of words and sentences. I have found out how to filter the words and block the messages. Right now I am trying to replace each word with a run of * of the same length as the word. For example, if someone said "fuck that stupid ass" it would come out as **** that stupid ***. Below is the code I am using:
public void Actionfor(ServerChatEventArgs args)
{
var player = TShock.Players[args.Who];
if (!args.Text.ToLower().StartsWith("/") || args.Text.ToLower().StartsWith("/w") || args.Text.ToLower().StartsWith("/r") || args.Text.ToLower().StartsWith("/me") || args.Text.ToLower().StartsWith("/c") || args.Text.ToLower().StartsWith("/party"))
{
foreach (string Word in config.BanWords)
{
if (player.Group.HasPermission("caw.staff"))
{
args.Handled = false;
}
else if (args.Text.ToLower().Contains(Word))
{
switch (config.Action)
{
case "kick":
args.Handled = true;
TShock.Utils.Kick(player, config.KickMessage, true, false);
break;
case "ignore":
args.Handled = true;
player.SendErrorMessage("Your message has been ignored for saying: {0}", Word);
break;
case "censor":
args.Handled = false;
var wordlength = Word.Length;
break;
case "donothing":
args.Handled = false;
break;
}
}
}
}
else
{
args.Handled = false;
}
}
public string[] BanWords = { "fuck", "ass", "can i be staff", "can i be admin" };
Some places have code something like this under my case "censor"
Word = Word.Replace(Word, new string("*", Word.Length));
However, I always get a "cannot convert string to char" error and cannot figure out what else to do.

The compiler is telling you the problem; the overload of String you want takes a char and int, not a string and int.
It's trying to convert the * from a string to a char. Replace the double quotes " with a single quote '.

For chars, use single quotes ' instead of double quotes " like this:
new string('*', Word.Length)
And in your code, you don't need to replace. Simply do:
Word = new string('*', Word.Length);
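Note that Word is the foreach iteration variable in the question's code, and C# does not allow assigning to it, so the masking has to be applied to the message text instead. A minimal sketch of that, assuming a hypothetical helper named Censor.Mask and leaving open where the masked text is written back (the question does not show that part):

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class Censor
{
    // Replaces every occurrence of each banned word with asterisks of the same length.
    // Matching is case-insensitive and, like Contains in the question, also hits substrings.
    public static string Mask(string text, IEnumerable<string> bannedWords)
    {
        foreach (string word in bannedWords)
        {
            text = Regex.Replace(
                text,
                Regex.Escape(word),              // treat the banned word as literal text
                m => new string('*', m.Length),  // the (char, int) constructor overload
                RegexOptions.IgnoreCase);
        }
        return text;
    }
}
```

For example, Censor.Mask("bad Bad apple", new[] { "bad" }) returns "*** *** apple". Adding \b word boundaries around the escaped word would be one way to avoid masking inside longer words.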

Special characters Regex

Hello, I'm trying to remove special characters from user input.
public void fd()
{
string output = "";
string input = Console.ReadLine();
char[] charArray = input.ToCharArray();
foreach (var item in charArray)
{
if (!Char.IsLetterOrDigit(item))
{
// CODE HERE
}
}
output = new string(trimmedChars);
Console.WriteLine(output);
}
At the end I'm turning it back into a string. My code only removes one special character from the string. Does anyone have any suggestions for an easier way instead?

You have a nice implementation; just consider the following code, which is only a bit shorter but works at a slightly higher level of abstraction:
var input = " th#ere's! ";
Func<char, bool> isSpecialChar = ch => !char.IsLetter(ch) && !char.IsDigit(ch);
for (int i = 1; i < input.Length - 1; i++)
{
//if current character is a special symbol
if(isSpecialChar(input[i]))
{
//if previous or next character are special symbols
if(isSpecialChar(input[i-1]) || isSpecialChar(input[i+1]))
{
//remove that character
input = input.Remove(i, 1);
//decrease counter, since we removed one char
i--;
}
}
}
Console.WriteLine(input); //prints " th#ere's "
A new string would be created each time you would call Remove. Use a StringBuilder for a more memory-performant solution.
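A sketch of that StringBuilder variant, applying the same rule as the loop above (drop a special character when one of its neighbours is also special, leaving the first and last characters alone). The names SpecialCharCleaner and Clean are illustrative only.

```csharp
using System.Text;

public static class SpecialCharCleaner
{
    private static bool IsSpecial(char ch) => !char.IsLetterOrDigit(ch);

    // Edits a single StringBuilder in place instead of allocating
    // a new string for every removed character.
    public static string Clean(string input)
    {
        var buffer = new StringBuilder(input);
        for (int i = 1; i < buffer.Length - 1; i++)
        {
            if (IsSpecial(buffer[i]) &&
                (IsSpecial(buffer[i - 1]) || IsSpecial(buffer[i + 1])))
            {
                buffer.Remove(i, 1);
                i--; // re-check the same position after the removal
            }
        }
        return buffer.ToString();
    }
}
```

StringBuilder.Remove still shifts the remaining characters, so this saves allocations rather than work per removal.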

The problem with your code is that you are taking the data from charArray and putting the result in trimmedChars for each change that you make, so each change will ignore all previous changes and work with the original. At the end you only have the last change.
Another problem with the code is that you are using IndexOf to get the index of a character, but that returns the index of the first occurrence of that character, not the index where you actually found it. For example, when you are at the second ! in the string "foo!bar!" you will get the index of the first one.
You don't need to turn the string into an array to work with the characters in the string. You can just loop through the index of the characters in the string.
Note that you should also check the value of the index when you are looking at the characters before and after, so that you don't try to look at characters that are outside the string.
public void fd() {
string input = Console.ReadLine();
int index = 0;
while (index < input.Length) {
if (!Char.IsLetterOrDigit(input, index) && ((index == 0 || !Char.IsLetterOrDigit(input, index - 1)) || (index == input.Length - 1 || !Char.IsLetterOrDigit(input, index + 1)))) {
input = input.Remove(index, 1);
} else {
index++;
}
}
Console.WriteLine(input);
}

It's been a while since I've touched C#, but a regex might be helpful:
string input = string.Format("{0}! ", Console.ReadLine());
Regex rgx = new Regex("(?i:[^a-z]?)[.](?i:[^a-z]?)");
string output = rgx.Replace(input, "$1$2");
The regex looks for a character with a non-alpha character on left or right and replaces it with nothing.

What is the best algorithm for arbitrary delimiter/escape character processing?

I'm a little surprised that there isn't some information on this on the web, and I keep finding that the problem is a little stickier than I thought.
Here are the rules:
You are starting with delimited/escaped data to split into an array.
The delimiter is one arbitrary character
The escape character is one arbitrary character
Both the delimiter and the escape character could occur in data
Regex is fine, but a good-performance solution is best
Edit: Empty elements (including leading or ending delimiters) can be ignored
The code signature (in C#) would be, basically:
public static string[] smartSplit(
string delimitedData,
char delimiter,
char escape) {}
The stickiest part of the problem is the escaped consecutive escape character case, of course, since (calling / the escape character and , the delimiter): ////////, = ////,
Am I missing somewhere this is handled on the web or in another SO question? If not, put your big brains to work... I think this problem is something that would be nice to have on SO for the public good. I'm working on it myself, but don't have a good solution yet.

A simple state machine is usually the easiest and fastest way. Example in Python:
def extract(input, delim, escape):
    # states
    parsing = 0
    escaped = 1
    state = parsing
    found = []
    parsed = ""
    for c in input:
        if state == parsing:
            if c == delim:
                found.append(parsed)
                parsed = ""
            elif c == escape:
                state = escaped
            else:
                parsed += c
        else:  # state == escaped
            parsed += c
            state = parsing
    if parsed:
        found.append(parsed)
    return found

void smartSplit(string const& text, char delim, char esc, vector<string>& tokens)
{
enum State { NORMAL, IN_ESC };
State state = NORMAL;
string frag;
for (size_t i = 0; i<text.length(); ++i)
{
char c = text[i];
switch (state)
{
case NORMAL:
if (c == delim)
{
if (!frag.empty())
tokens.push_back(frag);
frag.clear();
}
else if (c == esc)
state = IN_ESC;
else
frag.append(1, c);
break;
case IN_ESC:
frag.append(1, c);
state = NORMAL;
break;
}
}
if (!frag.empty())
tokens.push_back(frag);
}

private static string[] Split(string input, char delimiter, char escapeChar, bool removeEmpty)
{
if (input == null)
{
return new string[0];
}
char[] specialChars = new char[]{delimiter, escapeChar};
var tokens = new List<string>();
var token = new StringBuilder();
for (int i = 0; i < input.Length; i++)
{
var c = input[i];
if (c.Equals(escapeChar))
{
if (i >= input.Length - 1)
{
throw new ArgumentException("Uncompleted escape sequence has been encountered at the end of the input");
}
var nextChar = input[i + 1];
if (nextChar != escapeChar && nextChar != delimiter)
{
throw new ArgumentException("Unknown escape sequence has been encountered: " + c + nextChar);
}
token.Append(nextChar);
i++;
}
else if (c.Equals(delimiter))
{
if (!removeEmpty || token.Length > 0)
{
tokens.Add(token.ToString());
token.Length = 0;
}
}
else
{
var index = input.IndexOfAny(specialChars, i);
if (index < 0)
{
token.Append(c);
}
else
{
token.Append(input.Substring(i, index - i));
i = index - 1;
}
}
}
if (!removeEmpty || token.Length > 0)
{
tokens.Add(token.ToString());
}
return tokens.ToArray();
}

Implementing this kind of tokenizer as an FSM is fairly straightforward.
You do have a few decisions to make (like, what do I do with leading delimiters? strip or emit NULL tokens).
Here is an abstract version which ignores leading and multiple delimiters, and doesn't allow escaping the newline:
state(input)        action
==============================================================
BEGIN(*):           token.clear(); state=START;
END(*):             return;
*(\n\0):            token.emit(); state=END;
START(DELIMITER):   ; // NB: the input is *not* added to the token!
START(ESCAPE):      state=ESC; // NB: the input is *not* added to the token!
START(*):           token.append(input); state=NORM;
NORM(DELIMITER):    token.emit(); token.clear(); state=START;
NORM(ESCAPE):       state=ESC; // NB: the input is *not* added to the token!
NORM(*):            token.append(input);
ESC(*):             token.append(input); state=NORM;
This kind of implementation has the advantage of dealing with consecutive escapes naturally, and it can easily be extended to give special meaning to more escape sequences (e.g. by adding a rule like ESC(t) token.append(TAB)).
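The table can be turned into C# fairly directly. The sketch below is one possible reading of it, under a few assumptions: the end of the string plays the role of the \n or \0 terminator, a dangling escape at the end is dropped, and the ESC(t) rule is included only to illustrate the extension point (it assumes neither the delimiter nor the escape character is 't'). FsmTokenizer is a made-up name.

```csharp
using System.Collections.Generic;
using System.Text;

public static class FsmTokenizer
{
    private enum State { Start, Norm, Esc }

    public static List<string> Split(string input, char delimiter, char escape)
    {
        var tokens = new List<string>();
        var token = new StringBuilder();
        var state = State.Start;

        foreach (char c in input)
        {
            switch (state)
            {
                case State.Esc:
                    // ESC(t): illustrative extra rule, escape + 't' yields a tab.
                    // ESC(*): any other escaped character is taken literally.
                    token.Append(c == 't' ? '\t' : c);
                    state = State.Norm;
                    break;

                case State.Start:
                    if (c == delimiter) { /* skip leading or repeated delimiters */ }
                    else if (c == escape) state = State.Esc;
                    else { token.Append(c); state = State.Norm; }
                    break;

                case State.Norm:
                    if (c == delimiter)
                    {
                        tokens.Add(token.ToString());
                        token.Clear();
                        state = State.Start;
                    }
                    else if (c == escape) state = State.Esc;
                    else token.Append(c);
                    break;
            }
        }

        // End of input: emit whatever is pending.
        if (token.Length > 0) tokens.Add(token.ToString());
        return tokens;
    }
}
```

With '/' as the escape and ',' as the delimiter, eight slashes followed by a comma produce the single token "////", which is the consecutive-escape case from the question.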

Here's my ported function in C#
public static void smartSplit(string text, char delim, char esc, ref List<string> listToBuild)
{
bool currentlyEscaped = false;
StringBuilder fragment = new StringBuilder();
for (int i = 0; i < text.Length; i++)
{
char c = text[i];
if (currentlyEscaped)
{
fragment.Append(c);
currentlyEscaped = false;
}
else
{
if (c == delim)
{
if (fragment.Length > 0)
{
listToBuild.Add(fragment.ToString());
fragment.Remove(0, fragment.Length);
}
}
else if (c == esc)
currentlyEscaped = true;
else
fragment.Append(c);
}
}
if (fragment.Length > 0)
{
listToBuild.Add(fragment.ToString());
}
}
Hope this helps someone in the future. Thanks to KenE for pointing me in the right direction.

Here's a more idiomatic and readable way to do it:
public IEnumerable<string> SplitAndUnescape(
    string encodedString,
    char separator,
    char escape)
{
    var inEscapeSequence = false;
    var currentToken = new StringBuilder();
    foreach (var currentCharacter in encodedString)
    {
        if (inEscapeSequence)
        {
            // the previous character was the escape: take this one literally
            currentToken.Append(currentCharacter);
            inEscapeSequence = false;
        }
        else if (currentCharacter == escape)
        {
            inEscapeSequence = true;
        }
        else if (currentCharacter == separator)
        {
            yield return currentToken.ToString();
            currentToken.Clear();
        }
        else
        {
            currentToken.Append(currentCharacter);
        }
    }
    yield return currentToken.ToString();
}
Note that this doesn't remove empty elements. I don't think that should be the responsibility of the parser. If you want to remove them, just call Where(item => item.Any()) on the result.
I think this is too much logic for a single method; it gets hard to follow. If someone has time, I think it would be better to break it up into multiple methods and maybe its own class.

You're looking for something like a "string tokenizer". There's a version I found quickly that's similar. Or look at getopt.
