Starting from this C# example:
string myStringFormat = "I want to surround string {0} with {{ and }}";
string myStringArgs = "StringToSurround";
string myFinalString = string.Format(myStringFormat, myStringArgs);
I'd like to know if there is a quick and simple way to distinguish between curly braces used as escape sequences and curly braces used as argument placeholders.
The reasons why I am asking this are:
- I want to provide some logging functionality, and I don't want to require users to be aware of the double-curly-brace escape rule.
- I need this distinction to be very fast, for performance reasons.
Currently the only solution I can think of is to scan the string for curly braces and run some check (number parsing) on the characters that follow. A regex might help, but I can't find a way to use one in this scenario.
Btw, what I'd ultimately like is to let users do this without getting exceptions:
string myStringFormat = "I want to surround string {0} with { and }";
string myStringArgs = "StringToSurround";
//string myFinalString = string.Format(myStringFormat, myStringArgs); throws FormatException
string myFinalString = MyCustomizedStringFormat(myStringFormat, myStringArgs);
EDIT:
Sorry, the word "surround" was tricky and misleading; please consider this example instead:
string myStringFormat = "I want to append to string {0} these characters {{ and }}";
string myStringArgs = "StringToAppendTo";
string myFinalString = string.Format(myStringFormat, myStringArgs);
which gives the output:
I want to append to string StringToAppendTo these characters { and }
Use this regex to find the argument substrings:
{\d+}
This regex skips {}, {1a}, etc. and only matches {1}, {11}, etc.
Now you need to handle either the arguments (replace them with their values) or the escaped curly braces (replace them with double braces). The choice is yours, and it depends on how often each case occurs. (I chose to replace the arguments in my code below.)
Next you need to actually replace the characters. Again, whether to use a StringBuilder is up to you; it depends on the size of your input and the number of replacements. In any case, I suggest a StringBuilder.
// requires: using System.Text; using System.Text.RegularExpressions;
var m = Regex.Matches(input, @"{\d+}");
string result = input; // no arguments found: nothing to replace
if (m.Count > 0)
{
    // before any arg
    var sb = new StringBuilder(input.Substring(0, m[0].Index));
    for (int i = 0; i < m.Count; i++)
    {
        // arg itself (assumes the i-th placeholder corresponds to args[i])
        sb.Append(args[i]);
        // from right after arg
        int start = m[i].Index + m[i].Value.Length;
        if (i < m.Count - 1)
        {
            // until next arg
            int length = m[i + 1].Index - start;
            sb.Append(input.Substring(start, length));
        }
        else
        {
            // until end of input
            sb.Append(input.Substring(start));
        }
    }
    result = sb.ToString();
}
I believe this is the most robust and cleanest way to do it, and it should not have any performance (memory or speed) issues.
Edit
If you don't have access to args[], you can first replace { with {{ and } with }}, and then make these modifications to the code:
- use this pattern: @"{{\d+}}"
- write m[i].Value.Substring(1, m[i].Value.Length - 2) instead of args[i]
The result is then a valid format string that you can pass to string.Format.
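For comparison, here is a minimal sketch of the MyCustomizedStringFormat helper from the question that uses a single Regex.Replace with a MatchEvaluator instead of a StringBuilder loop. It assumes that only plain numeric placeholders such as {0} or {12} are arguments; every other { or } is treated as literal text, doubled, and then handed to string.Format. Format items with alignment or format strings ({0,5}, {0:N2}) are not recognized by this sketch, and a placeholder index beyond the supplied arguments still throws. The class name BraceTolerantFormat is hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class BraceTolerantFormat
{
    // Matches either a simple numeric placeholder ({0}, {12}, ...) or a lone brace.
    // Assumption: {0,5} and {0:N2} style format items are NOT treated as placeholders.
    private static readonly Regex Token = new Regex(@"\{(\d+)\}|\{|\}");

    public static string MyCustomizedStringFormat(string format, params object[] args)
    {
        // Keep placeholders as they are and double every other brace, so that
        // string.Format prints it literally instead of throwing FormatException.
        string escaped = Token.Replace(format,
            m => m.Groups[1].Success ? m.Value : m.Value + m.Value);
        return string.Format(escaped, args);
    }
}
```

With the question's input, MyCustomizedStringFormat("I want to surround string {0} with { and }", "StringToSurround") should return "I want to surround string StringToSurround with { and }". Letting string.Format do the final substitution keeps placeholder-to-argument mapping by index, so {1} {0} and repeated placeholders work as usual.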
Related
I just want to replace a portion of a string only if it matches the given text exactly.
My use case is as follows:
var text = "<wd:response><wd:response-data></wd:response-data></wd:response >";
string result = text.Replace("wd:response", "response");
/*
* expecting the below text
<response><wd:response-data></wd:response-data></response>
*
*/
I tried the approaches from these answers:
Way to have String.Replace only hit "whole words"
Regular expression for exact match of a string
but I couldn't achieve what I want.
Please share your thoughts/solutions.
Sample on .NET Fiddle:
https://dotnetfiddle.net/pMkO8Q
In general, you should really be parsing and manipulating XML as XML, using functions that know how XML works and what's legal in the language. Regex and other naive text manipulation will often lead you into trouble.
That said, for a very simple solution to this specific problem, you can do this with two replaces:
var text = "<wd:response><wd:response-data></wd:response-data></wd:response >";
string result = text.Replace("wd:response>", "response>").Replace("wd:response ", "response ");
(Note the spaces at the end of the parameters to the second replace.)
Alternatively, use a regex similar to "wd:response\s*>".
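A minimal sketch of that regex alternative, assuming you want to drop any whitespace before the > (which yields exactly the output the question expects). The tag name is only replaced when it is immediately followed by optional whitespace and >, so wd:response-data is left alone. This is still plain text manipulation, not XML parsing; the class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class TagRenamer
{
    // Replaces "wd:response" only when it is followed by optional whitespace and ">",
    // so "<wd:response-data>" is not affected. The whitespace before ">" is removed.
    public static string RenameResponseTags(string xml) =>
        Regex.Replace(xml, @"wd:response\s*>", "response>");
}
```

For the sample input this should produce <response><wd:response-data></wd:response-data></response>.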
The easiest way to get the result shown in your .NET Fiddle is to use Replace as below:
string result = text.Replace("wd:response>", "response>");
But the proper way to do this is to parse the input as XML.
You can capture the string wd:response in a capturing group and replace it with Regex.Replace and a MatchEvaluator, like this:
Regex explanation - <[/]?(wd:response)[\s+]?>
Match < literally
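Match / optionally, hence the ?
Match the string wd:response and place it in a capturing group enclosed in ()
Optionally match a single character that is either whitespace or a literal + ([\s+]?); use \s* instead if any amount of whitespace should be allowed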
Match > literally
using System;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main(string[] args)
    {
        string text = "<wd:response><wd:response-data></wd:response-data></wd:response >";
        string replacePattern = "response";
        string pattern = @"<[/]?(wd:response)[\s+]?>";
        string replacedPattern = Regex.Replace(text, pattern, match =>
        {
            // Extract the first group
            Group group = match.Groups[1];
            // Replace the group value with replacePattern, keeping the text before and after it
            return string.Format("{0}{1}{2}", match.Value.Substring(0, group.Index - match.Index), replacePattern, match.Value.Substring(group.Index - match.Index + group.Length));
        });
        Console.WriteLine(replacedPattern);
    }
}
Outputting:
<response><wd:response-data></wd:response-data></response >
For my article titles I use CultureInfo.CurrentCulture.TextInfo.ToTitleCase(str.ToLower()), but it does not seem to work correctly after double quotes, at least for Turkish.
For example, an article's title like this:
KİRA PARASININ ÖDENMEMESİ NEDENİYLE YAPILAN "İLAMSIZ TAHLİYE"
TAKİPLERİNDE "TAKİP TALEBİ"NİN İÇERİĞİ.
After using the method like this:
private static string TitleCase(this string str)
{
return CultureInfo.CurrentCulture.TextInfo.ToTitleCase(str.ToLower());
}
var art_title = textbox1.Text.TitleCase();
it returns:
Kira Parasının Ödenmemesi Nedeniyle Yapılan "İlamsız Tahliye"
Takiplerinde "Takip Talebi"Nin İçeriği.
The problem is here, because it should be:
... "Takip Talebi"nin ...
but it is like this:
... "Takip Talebi"Nin ...
What's more, in MS Word, when I use "Capitalize Each Word", it transforms it like this:
... "Takip Talebi"Nin ...
But that is definitely wrong. How can I fix this problem?
EDIT: First I split the sentence on spaces to get the words. If a word contains a double quote, the text after the closing double quote, up to the next space, should be lowercased. Here is the idea:
private static string _TitleCase(this string str)
{
return CultureInfo.CurrentCulture.TextInfo.ToTitleCase(str.ToLower());
}
public static string TitleCase(this string str)
{
var words = str.Split(' ');
string sentence = null;
var i = 1;
foreach (var word in words)
{
var space = i < words.Length ? " " : null;
if (word.Contains("\""))
{
// After each closing double quote, lowercase the text up to the next space... But how?
}
else
sentence += word._TitleCase() + space;
i++;
}
return sentence?.Trim();
}
Edit 2, after 3 hours: After 9 hours, I found a way to solve the problem. I admit it is not at all scientific, so please don't judge me for it. Since the whole problem is the double quote, before calling ToTitleCase I replace it with a number that I think is unique, or with a letter that is not used in Turkish (like alpha, beta, omega, etc.). ToTitleCase then performs the title transformation without any problems, and afterwards I replace the number or unused letter with double quotes again. That achieves the goal. Please share it here if you have a programmatic or scientific solution.
Here is my non-programmatic solution:
public static string TitleCase(this string str)
{
str = str.Replace("\"", "9900099");
str = CultureInfo.CurrentCulture.TextInfo.ToTitleCase(str.ToLower());
return str.Replace("9900099", "\"").Trim();
}
var art_title = textbox1.Text.TitleCase();
And the result:
Kira Parasının Ödenmemesi Nedeniyle Yapılan "İlamsız Tahliye" Takiplerinde "Takip Talebi"nin İçeriği
Indeed, the Microsoft documentation for ToTitleCase states that it is (at least currently) not linguistically correct. In fact, it is REALLY hard to do this correctly (see these blog posts by the great Michael Kaplan: "Sometimes, uppercasing sucks" and "Michael, why does ToTitleCase suck so much?").
I'm not aware of any service or library providing a linguistically correct version.
So - unless you want to spend a lot of effort - you probably have to live with this inaccuracy.
You can find the apostrophe or quote character with a regex and lowercase the character after it.
For an apostrophe:
Regex.Replace(str, "’(?:.)", m => m.Value.ToLower());
or
Regex.Replace(str, "'(?:.)", m => m.Value.ToLower());
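Applied to a double quote, the patterns above would also lowercase the first letter after an opening quote ("İlamsız" would become "ilamsız"). A minimal sketch for the double-quote case that swaps in a lookbehind, so that only the letters directly after a closing quote (a quote that itself follows a letter) are lowercased. The method name and the culture parameter are assumptions; for Turkish titles you would pass new CultureInfo("tr-TR") so that İ and I are lowercased correctly.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class TitleCaseFixer
{
    // A run of letters preceded by <letter><double quote>, e.g. the "Nin" in Talebi"Nin.
    // An opening quote (preceded by a space or the start of the string) does not match.
    private static readonly Regex AfterClosingQuote = new Regex("(?<=\\p{L}\")\\p{L}+");

    public static string ToTitleCaseFixed(string str, CultureInfo culture)
    {
        // Title-case as in the question; ToTitleCase treats the quote as a word separator.
        string titled = culture.TextInfo.ToTitleCase(str.ToLower(culture));
        // Lowercase the suffix that ToTitleCase capitalized after the closing quote.
        return AfterClosingQuote.Replace(titled, m => m.Value.ToLower(culture));
    }
}
```

Unlike the placeholder-number workaround, this does not depend on a marker string never occurring in real titles, but it only covers the "word"suffix pattern and none of the other cases that make title casing hard.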
I am trying to validate a string based on the characters it contains. I want to be able to specify which characters are allowed in addition to letters and digits. Below is my extension method:
public static bool isAlphaNumeric(this string inputString, string allowedChars)
{
StringBuilder str = new StringBuilder(allowedChars);
str.Replace(" ", "\\s");
str.Replace("\n","\\\\");
str.Replace("/n", "////");
allowedChars = str.ToString();
Regex rg = new Regex(@"^[a-zA-Z0-9" + allowedChars + "]*$");
return rg.IsMatch(inputString);
}
The way I use this is:
string s = @" te\m#as 1963' yili.ışçöÖÇÜ/nda olbnrdu"; // just a test string with no meaning
if (s.isAlphaNumeric(@"ışŞö\Ö#üÜçÇ ğ'Ğ/.")) {...}
Of course it gives an error:
parsing "^[a-zA-Z0-9ışŞö\Ö#üÜçÇ\sğ'Ğ/.]*$" - Unrecognized escape sequence
I know the StringBuilder Replace calls are wrong. I want to accept every character given in the allowedChars parameter, which may also include slashes (are there other characters like slashes that I should be aware of?). Given this, how can I make my replace logic work, and is my approach correct at all? I am very new to regular expressions and have no clue how to work with them...
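You need to use Regex.Escape on your string. Apply it to the original allowedChars instead of the manual Replace calls (escaping the already-replaced "\s" would turn it into a literal backslash followed by s):
allowedChars = Regex.Escape(allowedChars);
ought to do it.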
You're looking for Regex.Escape.
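One caveat: Regex.Escape is designed for text outside a character class and does not escape ] or -, both of which are special inside [...]. A minimal sketch that escapes the class metacharacters by hand instead; the class name is hypothetical and the method is renamed to .NET's PascalCase convention. It also uses \z rather than $, because $ would accept a trailing newline that is not in the allowed set.

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class StringValidation
{
    // True if inputString contains only A-Z, a-z, 0-9 and the characters in allowedChars.
    public static bool IsAlphaNumeric(this string inputString, string allowedChars)
    {
        var cls = new StringBuilder("a-zA-Z0-9");
        foreach (char c in allowedChars)
        {
            // These characters have a special meaning inside [...] and need a backslash.
            if (c == '\\' || c == ']' || c == '[' || c == '^' || c == '-')
                cls.Append('\\');
            cls.Append(c);
        }
        // \z (not $) so that a trailing newline is only accepted if it is allowed.
        return Regex.IsMatch(inputString, "^[" + cls + @"]*\z");
    }
}
```

Whitespace needs no special treatment here: a literal space or newline inside the class simply matches itself, so the manual "\s" replacement from the question is unnecessary.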
This question is not a duplicate of
Best way to break long strings in C# source code
which is about source code; this one is about processing long output. If someone enters:
WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
As a comment, it breaks out of the container and makes the entire page really wide. Is there a clever regex that can, say, enforce a maximum word length of 20 characters and then force a whitespace character?
Thanks for any help!
There's probably no need to involve regexes in something this simple. Take this extension method:
public static string Abbreviate(this string text, int length) {
if (text.Length <= length) {
return text;
}
char[] delimiters = new char[] { ' ', '.', ',', ':', ';' };
int index = text.LastIndexOfAny(delimiters, length - 3);
if (index > (length / 2)) {
return text.Substring(0, index) + "...";
}
else {
return text.Substring(0, length - 3) + "...";
}
}
If the string is short enough, it's returned as-is. Otherwise, if a "word boundary" is found in the second half of the string, it's "gracefully" cut off at that point. If not, it's cut off the hard way at just under the desired length.
If the string is cut off at all, an ellipsis ("...") is appended to it.
If you expect the string to contain non-natural-language constructs (such as URLs), you'd need to tweak this to ensure nice behavior in all circumstances. In that case, working with a regex might be better.
You could try using a regular expression that uses a positive look-ahead like this:
string outputStr = Regex.Replace(inputStr, @"([\S]{20}(?=\S+))", "$1\n");
This should "insert" a line break into all words that are longer than 20 characters.
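The same look-ahead idea, as a minimal sketch with the maximum word length as a parameter (the class and method names are hypothetical). It inserts a plain space rather than \n, assuming the text is rendered as HTML, where a space is what lets the browser wrap. \S{n}(?=\S) matches n non-whitespace characters only when more non-whitespace follows, so words of n characters or fewer are left untouched.

```csharp
using System.Text.RegularExpressions;

public static class WordBreaker
{
    // Inserts a space after every run of maxWordLength non-whitespace characters
    // that is followed by yet more non-whitespace.
    public static string BreakLongWords(string input, int maxWordLength) =>
        Regex.Replace(input, @"\S{" + maxWordLength + @"}(?=\S)", "$0 ");
}
```

For example, BreakLongWords("aaaaabbbbbcc", 5) should return "aaaaa bbbbb cc", while "hi there" is returned unchanged.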
Yes, you can use this regex:
string pattern = @"^([\w]{1,20})$";
This regex allows between 1 and 20 word characters:
string strRegex = @"^([\w]{1,20})$";
string strTargetString = @"asdfasfasfasdffffff";
if(Regex.IsMatch(strTargetString, strRegex))
{
//do something
}
If you only need a length constraint, use this regex:
^(.{1,20})$
because \w only matches word characters (letters, digits, and the underscore).
I need to strip unknown characters from the end of a string returned from an SQL database. I also need to log when a special character occurs in the string.
What's the best way to do this?
You can use the TrimEnd() method to trim blanks or specific characters from the end of a string (Trim() trims both ends). If you need to trim a certain number of characters, you can use the Substring() method. You can use regular expressions (System.Text.RegularExpressions namespace) to match patterns in a string and detect when they occur. See MSDN for more info.
If you need more help you'll need to provide a bit more info on what exactly you're trying to do.
First define what the unknown characters are (characters other than 0-9, a-z and A-Z?) and put them in an array.
Loop through the characters of the string and check whether each one occurs in the array; if it does, remove it.
You can also call String.Replace with the unknown character as the first argument and an empty string ("") as the second.
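A minimal sketch of the loop approach, assuming (as suggested above) that "known" means 0-9, a-z and A-Z. Because the set of unknown characters is open-ended, it checks the known ranges instead of listing every unknown character in an array. The removed characters are returned through an out parameter so they can be logged. It removes unknown characters anywhere in the string, not only at the end; the class and method names are hypothetical.

```csharp
using System.Text;

public static class UnknownCharRemover
{
    // Keeps only 0-9, a-z and A-Z; everything else is collected into 'removed'.
    public static string Remove(string input, out string removed)
    {
        var kept = new StringBuilder(input.Length);
        var dropped = new StringBuilder();
        foreach (char c in input)
        {
            bool known = (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
            (known ? kept : dropped).Append(c);
        }
        removed = dropped.ToString();
        return kept.ToString();
    }
}
```

A single pass with a StringBuilder avoids creating a new string for every String.Replace call when there are many unknown characters.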
Since you've specified that the legal characters are only alphanumeric, you could do something like this:
Match m = Regex.Match(original, "^([0-9A-Za-z]*)(.*)$", RegexOptions.Singleline); // Singleline so that . also matches newlines
string good = m.Groups[1].Value;
string bad = m.Groups[2].Value;
if (bad.Length > 0)
{
// log bad characters
}
Console.WriteLine(good);
Your definition of the problem is not precise, but here is a quick trick:
string input;
...
var trimmed = input.TrimEnd(new[] {'#','$',...} /* array of unwanted characters */);
if (trimmed != input)
    myLogger.Log(input.Substring(trimmed.Length)); // the characters that were trimmed off
Check out the Regex.Replace methods; there are lots of overloads. You can use the Match or Matches methods to identify all matches for logging.
String badString = "HELLO WORLD!!!!";
Regex regex = new Regex("!{1,}$" );
String newString = regex.Replace(badString, String.Empty);
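Combining both suggestions, a minimal sketch that uses Matches to log every special character and Replace to strip the trailing run. It assumes "special" means anything outside 0-9, A-Z and a-z (spaces included), and it collects log lines in a List<string> as a stand-in for whatever logger you actually use; the class name and log message format are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DbStringCleaner
{
    // Assumption: "special" means anything outside 0-9, A-Z and a-z.
    private static readonly Regex Special = new Regex("[^0-9A-Za-z]");
    private static readonly Regex TrailingSpecial = new Regex(@"[^0-9A-Za-z]+\z");

    // Logs every special character (via Matches) and strips the trailing run (via Replace).
    public static string Clean(string input, List<string> log)
    {
        foreach (Match m in Special.Matches(input))
            log.Add($"special character U+{(int)m.Value[0]:X4} at index {m.Index}");
        return TrailingSpecial.Replace(input, string.Empty);
    }
}
```

For "HELLO WORLD!!!!" this should return "HELLO WORLD" and log five entries: the space at index 5 and the four exclamation marks.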