Is there a way to trim a string to the first numeric digit from the left AND from the right using standard .NET tools? Or do I need to write my own function (not difficult, but I'd rather use standard methods)? I need the following outputs for the provided inputs:
Input          Output
------------------------
abc123def      123
;'-2s;35(r     2s;35
abc12de3f4g    12de3f4
You can use a regular expression:
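using System.Text.RegularExpressions;

string TrimToDigits(string text)
{
    // Match from the first digit to the last one: '.*' is greedy, so the match
    // runs to the last digit. The optional group also handles a single digit,
    // and Singleline lets '.' cross line breaks.
    var pattern = @"\d(?:.*\d)?";
    var regex = new Regex(pattern, RegexOptions.Singleline);
    Match m = regex.Match(text); // m is the first (leftmost) match
    if (m.Success)
    {
        return m.Value;
    }
    return String.Empty;
}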
If you want to call this the way you would normally call String.Trim(), you can make it an extension method:
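public static class StringExtensions
{
    // Extension methods live in a non-nested static class and take 'this' on the
    // first parameter; 'public' makes the method callable from other classes.
    public static string TrimToDigits(this string text)
    {
        // ... (same body as TrimToDigits above)
    }
}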
And then you can call it like this:
var trimmedString = otherString.TrimToDigits();
No, there is no built-in method for this. You will have to write your own.
No, I don't think there is. Here's a method, though:
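// Find the first and the last digit, then keep everything between them.
int start = 0;
while (start < str.Length && !char.IsDigit(str[start]))
{
    start++;
}
int end = str.Length - 1;
while (end >= start && !char.IsDigit(str[end]))
{
    end--;
}
// With no digit at all, start == str.Length and the length below is 0,
// so the result is an empty string.
str = str.Substring(start, end - start + 1);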
I think this'll work.
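If you would rather lean on standard string methods than write the loops by hand, `String.IndexOfAny` and `String.LastIndexOfAny` can locate the first and last digit directly. This is a minimal sketch that assumes only the ASCII digits 0-9 count as digits; `DigitTrimmer` and `TrimToDigits` are illustrative names, not framework APIs.

```csharp
using System;

public static class DigitTrimmer
{
    // Characters treated as digits (ASCII only, by assumption).
    private static readonly char[] Digits = "0123456789".ToCharArray();

    // Returns the substring from the first digit to the last digit, inclusive,
    // or an empty string if the input contains no digit.
    public static string TrimToDigits(this string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));

        int first = text.IndexOfAny(Digits);
        if (first < 0)
        {
            return string.Empty;
        }

        int last = text.LastIndexOfAny(Digits);
        return text.Substring(first, last - first + 1);
    }
}
```

Usage: `"abc12de3f4g".TrimToDigits()` returns `"12de3f4"`. Both lookups are plain linear scans, so no regex is involved.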
I have a large XML file that contains tag names using the dash-separated naming convention. How can I use C# to convert the tag names to the camel case naming convention?
The rules are:
1. Convert all characters to lower case
2. Capitalize the first character after each dash
3. Remove all dashes
Example
Before Conversion
<foo-bar>
<a-b-c></a-b-c>
</foo-bar>
After Conversion
<fooBar>
<aBC></aBC>
</fooBar>
Here's a code example that works, but it's slow. I think there must be a better way to accomplish my goal.
string ConvertDashToCamelCase(string input)
{
input = input.ToLower();
char[] ca = input.ToCharArray();
StringBuilder sb = new StringBuilder();
for(int i = 0; i < ca.Length; i++)
{
if(ca[i] == '-')
{
string t = ca[i + 1].ToString().ToUpper();
sb.Append(t);
i++;
}
else
{
sb.Append(ca[i].ToString());
}
}
return sb.ToString();
}
Your original code is slow because you call ToString all over the place unnecessarily. There's no need for that, and there's no need for the intermediate char array either. The following should be much faster, and faster than the version that uses String.Split, too.
string ConvertDashToCamelCase(string input)
{
StringBuilder sb = new StringBuilder();
bool caseFlag = false;
for (int i = 0; i < input.Length; ++i)
{
char c = input[i];
if (c == '-')
{
caseFlag = true;
}
else if (caseFlag)
{
sb.Append(char.ToUpper(c));
caseFlag = false;
}
else
{
sb.Append(char.ToLower(c));
}
}
return sb.ToString();
}
I'm not going to claim that the above is the fastest possible. In fact, there are some obvious optimizations that could save time (for example, creating the StringBuilder with an initial capacity of input.Length). But the above is clean and clear: easy to understand.
The key is caseFlag, which indicates that the next character copied should be converted to upper case. Also note that I don't convert the entire string to lower case up front. There's no reason to, since you'll be looking at every character anyway and can do the appropriate conversion at that time.
The idea here is that the code doesn't do any more work than it absolutely has to.
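The answer above mentions obvious optimizations without naming them. One possibility, sketched here as an assumption rather than the answerer's own code: since dashes are only ever dropped, the output can never be longer than the input, so a `char[]` of `input.Length` can replace the StringBuilder entirely. `DashCase` is a hypothetical class name, and whether this is measurably faster on your data is something to benchmark rather than assume.

```csharp
using System;

public static class DashCase
{
    // Same caseFlag logic as above, but writing into a pre-sized char buffer.
    // Dashes are dropped, so the result is never longer than the input.
    public static string ConvertDashToCamelCase(string input)
    {
        char[] buffer = new char[input.Length];
        int length = 0;
        bool caseFlag = false;

        foreach (char c in input)
        {
            if (c == '-')
            {
                caseFlag = true;
            }
            else
            {
                buffer[length++] = caseFlag ? char.ToUpper(c) : char.ToLower(c);
                caseFlag = false;
            }
        }

        // Only the first 'length' characters of the buffer were filled.
        return new string(buffer, 0, length);
    }
}
```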
For completeness, here's also a regular expression one-liner (inspired by this JavaScript answer):
string ConvertDashToCamelCase(string input) =>
Regex.Replace(input, "-.", m => m.Value.ToUpper().Substring(1));
It replaces all occurrences of -x with x converted to upper case.
Special cases:
If you want to lower-case all other characters, replace input with input.ToLower() inside the expression:
string ConvertDashToCamelCase(string input) =>
Regex.Replace(input.ToLower(), "-.", m => m.Value.ToUpper().Substring(1));
If you want to support multiple dashes between words (dash--case) and have all of the dashes removed (dashCase), replace - with -+ in the regular expression (to greedily match all sequences of dashes) and keep only the final character:
string ConvertDashToCamelCase(string input) =>
Regex.Replace(input, "-+.", m => m.Value.ToUpper().Substring(m.Value.Length - 1));
If you want to support multiple dashes between words (dash--case) and remove only the final one (dash-Case), change the regular expression to match only a dash followed by a non-dash (rather than a dash followed by any character):
string ConvertDashToCamelCase(string input) =>
Regex.Replace(input, "-[^-]", m => m.Value.ToUpper().Substring(1));
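using System.Linq;

string ConvertDashToCamelCase(string input)
{
    string[] words = input.Split('-');
    // Keep the first word lower case; capitalize the first letter of each later word.
    var converted = words.Select((word, index) =>
        index == 0 ? word.ToLower() : WordToTitleCase(word));
    return string.Join("", converted);
}

string WordToTitleCase(string word)
{
    // Leading, trailing, or doubled dashes produce empty segments.
    if (word.Length == 0) return word;
    return char.ToUpper(word[0]) + word.Substring(1).ToLower();
}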
Here is an updated version of Jim Mischel's answer that ignores the content, i.e. it only camel-cases tag names.
string ConvertDashToCamelCase(string input)
{
StringBuilder sb = new StringBuilder();
bool caseFlag = false;
bool tagFlag = false;
for(int i = 0; i < input.Length; i++)
{
char c = input[i];
if(tagFlag)
{
if (c == '-')
{
caseFlag = true;
}
else if (caseFlag)
{
sb.Append(char.ToUpper(c));
caseFlag = false;
}
else
{
sb.Append(char.ToLower(c));
}
}
else
{
sb.Append(c);
}
// Reset tag flag if necessary
if(c == '>' || c == '<')
{
tagFlag = (c == '<');
}
}
return sb.ToString();
}
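A different technique, swapped in here rather than taken from any answer above: let an XML parser find the element names instead of scanning for `<` and `>`. This sketch uses LINQ to XML (`XDocument`, `XElement.Name`) so text content and attributes are never touched. `XmlTagRenamer` is a hypothetical name, and because `XDocument` loads the whole file into memory, a streaming `XmlReader`/`XmlWriter` pair may suit a very large file better.

```csharp
using System.Linq;
using System.Text;
using System.Xml.Linq;

public static class XmlTagRenamer
{
    // Jim Mischel's caseFlag conversion, applied to a single element name.
    public static string ConvertDashToCamelCase(string name)
    {
        var sb = new StringBuilder(name.Length);
        bool caseFlag = false;
        foreach (char c in name)
        {
            if (c == '-')
            {
                caseFlag = true;
            }
            else
            {
                sb.Append(caseFlag ? char.ToUpper(c) : char.ToLower(c));
                caseFlag = false;
            }
        }
        return sb.ToString();
    }

    // Renames every element (root included); text content, attributes,
    // and namespaces are left as they are.
    public static XDocument RenameElements(XDocument doc)
    {
        foreach (XElement element in doc.Root.DescendantsAndSelf().ToList())
        {
            string newLocalName = ConvertDashToCamelCase(element.Name.LocalName);
            element.Name = element.Name.Namespace + newLocalName;
        }
        return doc;
    }
}
```

Usage: `XmlTagRenamer.RenameElements(XDocument.Load(path)).Save(outputPath)`, where `path` and `outputPath` are your own file names.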
using System;
using System.Text;
public class MyString
{
public static string ToCamelCase(string str)
{
char[] s = str.ToCharArray();
StringBuilder sb = new StringBuilder();
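for(int i = 0; i < s.Length; i++)
{
if (s[i] == '-' || s[i] == '_')
{
// Drop the separator and upper-case the next character, if there is one
// (the bounds check avoids an exception on a trailing separator).
if (i + 1 < s.Length)
sb.Append(Char.ToUpper(s[++i]));
}
else
sb.Append(s[i]);
}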
return sb.ToString();
}
}
I have a string (entered by the user) containing an expression that must be checked against a regex pattern.
I want to loop through the string until its end. I was thinking of using input.Length, but then I don't know how to continue comparing. If the whole string matches the pattern, the method should return true; otherwise false. This is what I have so far:
private void checkInput (String input)
{
{
String acceptedInput = "(?=\\()|(?<=\\)\\d)";
// Need a loop until End of String
// (while ?)
{
foreach (Match match in Regex.Matches(input, acceptedInput))
{
outputDialog.AppendText("Correct");
}
return true;
}
return false;
}
}
Is there any way to do it please?
Thank you
To loop over each char in a string:
for (int i = 0; i < stringVariable.Length; i++)
{
char x = stringVariable[i]; //is the i'th character of the string
}
But that approach makes no sense if you're using Regex, which generally works on entire strings.
Maybe explain what you're trying to achieve?
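If the goal is simply "does the entire input match the pattern?", no loop is needed: anchor the pattern at both ends and call `Regex.IsMatch` once. `\A` and `\z` are used instead of `^` and `$` because `$` also matches before a trailing newline. The allowed grammar below (digits, parentheses, `+ - * /` and spaces) is a placeholder assumption, since the question doesn't spell out the real rules, and `InputValidator` is a hypothetical name.

```csharp
using System.Text.RegularExpressions;

public static class InputValidator
{
    // \A and \z pin the match to the very start and end of the string,
    // so IsMatch succeeds only if the whole input fits the pattern.
    // Placeholder grammar (assumption): digits, parentheses, + - * / and spaces.
    private static readonly Regex Accepted =
        new Regex(@"\A[0-9()+\-*/ ]+\z", RegexOptions.Compiled);

    public static bool CheckInput(string input)
    {
        return input != null && Accepted.IsMatch(input);
    }
}
```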
Use String.ToCharArray, for example:
char[] array = input.ToCharArray();
for (int i = 0; i < array.Length; i++)
{
var letter = array[i];//here is the individual character
}
You don't need to loop:
string toAvoid = "$%&#";
if (input.IndexOfAny(toAvoid.ToCharArray()) != -1)
{
// the input contains forbidden characters
}
What's the easiest and fastest way to find a substring (a template) in a string and replace it with something else that follows the template's letter case (if it is all lower case, replace with lower case; if all upper case, replace with upper case; if it begins with upper case, and so on)?
So if the substring is in curly braces:
"{template}" becomes "replaced content"
"{TEMPLATE}" becomes "REPLACED CONTENT" and
"{Template}" becomes "Replaced content" but
"{tEMPLATE}" becomes "rEPLACED CONTENT"
Well, you could use regular expressions and a match evaluator callback like this:
var regex = new Regex(@"\{(?<value>.*?)\}",
RegexOptions.CultureInvariant | RegexOptions.ExplicitCapture);
// 'text' stands for the input string that contains the placeholders
string replacedText = regex.Replace(text,
new MatchEvaluator(this.EvaluateMatchCallback));
And your evaluator callback would do something like this:
private string EvaluateMatchCallback(Match match) {
string templateInsert = match.Groups["value"].Value;
// or whatever
string replacedText = GetReplacementTextBasedOnTemplateValue(templateInsert);
return replacedText;
}
Once you get the regex match value you can just do a case-sensitive comparison and return the correct replacement value.
EDIT: I assumed you were mainly trying to find the placeholders in a block of text rather than worrying about the casing per se. If your pattern is always valid, you can just check the first two characters of the placeholder itself; that tells you the casing to use in the replacement expression:
string foo = "teMPLATE";
if (char.IsLower(foo[0])) {
if (char.IsLower(foo[1])) {
// first lower and second lower
}
else {
// first lower and second upper
}
}
else {
if (char.IsLower(foo[1])) {
// first upper and second lower
}
else {
// first upper and second upper
}
}
I would still use a regular expression to match the replacement placeholder, but that's just me.
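Putting the two pieces together, here is a sketch that finds `{placeholder}` tokens with `Regex.Replace` and a `MatchEvaluator` lambda, then picks one of the four casings from the placeholder's first two letters. The lookup dictionary, the `TemplateReplacer` name, and the choice to leave unknown placeholders unchanged are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TemplateReplacer
{
    private static readonly Regex Placeholder = new Regex(@"\{(?<value>\w+)\}");

    // Replaces each {name} with values[name in lower case], copying the casing
    // pattern of the placeholder's first two letters onto the replacement.
    public static string Replace(string text, IDictionary<string, string> values)
    {
        return Placeholder.Replace(text, match =>
        {
            string name = match.Groups["value"].Value;
            if (!values.TryGetValue(name.ToLowerInvariant(), out string replacement)
                || replacement.Length == 0)
            {
                return match.Value; // unknown placeholder: left as-is (assumption)
            }

            bool firstUpper = char.IsUpper(name[0]);
            // A one-letter placeholder has no second letter, so reuse the first.
            bool restUpper = name.Length > 1 ? char.IsUpper(name[1]) : firstUpper;

            string first = replacement.Substring(0, 1);
            string rest = replacement.Substring(1);
            return (firstUpper ? first.ToUpperInvariant() : first.ToLowerInvariant())
                 + (restUpper ? rest.ToUpperInvariant() : rest.ToLowerInvariant());
        });
    }
}
```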
You can check the case of the first two letters of the placeholder and choose one of the four case transforming strategies for the inserted text.
public static string Convert(string input, bool firstIsUpper, bool restIsUpper)
{
string firstLetter = input.Substring(0, 1);
firstLetter = firstIsUpper ? firstLetter.ToUpper() : firstLetter.ToLower();
string rest = input.Substring(1);
rest = restIsUpper ? rest.ToUpper() : rest.ToLower();
return firstLetter + rest;
}
public static string Replace(string input, Dictionary<string, string> valueMap)
{
var ms = Regex.Matches(input, "{(\\w+?)}");
int i = 0;
var sb = new StringBuilder();
for (int j = 0; j < ms.Count; j++)
{
string pattern = ms[j].Groups[1].Value;
string key = pattern.ToLower();
bool firstIsUpper = char.IsUpper(pattern[0]);
bool restIsUpper = char.IsUpper(pattern[1]);
sb.Append(input.Substring(i, ms[j].Index - i));
sb.Append(Convert(valueMap[key], firstIsUpper, restIsUpper));
i = ms[j].Index + ms[j].Length;
}
sb.Append(input.Substring(i)); // copy any text after the last placeholder
return sb.ToString();
}
public static void DoStuff()
{
Console.WriteLine(Replace("--- {aAA} --- {AAA} --- {Aaa}", new Dictionary<string,string> {{"aaa", "replacement"}}));
}
I ended up doing this:
public static string ReplaceWithTemplate(this string original, string pattern, string replacement)
{
var template = Regex.Match(original, pattern, RegexOptions.IgnoreCase).Value.Remove(0, 1);
template = template.Remove(template.Length - 1);
var chars = new List<char>();
var isUpper = false;
for (int i = 0; i < replacement.Length; i++)
{
// Copy the case of the template letter at the same position; past the end
// of the template, keep using the case of its last letter.
if (i < template.Length) isUpper = Char.IsUpper(template[i]);
chars.Add(isUpper ? Char.ToUpper(replacement[i]) : Char.ToLower(replacement[i]));
}
return new string(chars.ToArray());
}
Rather than describing what I want (it's difficult to explain), let me provide an example of what I need to accomplish in C# using a regular expression:
"HelloWorld" should be transformed to "Hello World"
"HelloWORld" should be transformed to "Hello WO Rld" //Two consecutive capital letters should be treated as one word
"helloworld" should be transformed to "helloworld"
EDIT:
"HellOWORLd" should be transformed to "Hell OW OR Ld"
Every 2-consecutive capital letters should be considered one word.
Is this possible?
This is fully working C# code, not just the regex:
Console.WriteLine(
Regex.Replace(
"HelloWORld",
"(?<!^)(?<wordstart>[A-Z]{1,2})",
" ${wordstart}", RegexOptions.Compiled));
And it prints:
Hello WO Rld
Update
To make this more Unicode/international aware, consider replacing [A-Z] with \p{Lu} (a Unicode code point in the "uppercase letter" category). The result for the current input would be the same. Here is a slightly more compelling example:
Console.WriteLine(Regex.Replace(
@"ÉclaireürfØÑJßå",
@"(?<!^)(?<wordstart>\p{Lu}{1,2})",
@" ${wordstart}",
RegexOptions.Compiled));
The regular expression engine is not a transformative thing by nature, but rather a pattern matching (and replacing) engine. People often mistake the replace part of Regex, thinking that it can do more than it's designed to.
Back to your question, though: Regex cannot do what you want; instead, you should write your own parser. With C#, if you're familiar with the language, this task is fairly trivial.
It's a case of "You're using the wrong tool for the job".
Here are regular expressions that detect what you are looking for:
([A-Z]\w*?)[A-Z]
this matches one uppercase letter from A to Z, followed lazily by word characters (\w) up to the next uppercase letter.
([A-Z]{2}\w*?)[A-Z]
this is the same, except that it starts with exactly two uppercase letters from A to Z.
Regex is a matching engine: you can scan the input string with Regex.IsMatch to find candidate matches and then insert spaces into the output string.
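In the same matching-engine spirit, a sketch that lets `Regex.Matches` pick out the words and then joins them with spaces, so no replacement logic is needed. It assumes the input contains letters only (any other character is silently dropped), and `WordSplitter` is a hypothetical name.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class WordSplitter
{
    // Alternatives are tried left to right at each position:
    //   \p{Lu}{2}       two consecutive capitals form one word ("WO")
    //   \p{Lu}\p{Ll}*   a capital followed by lower-case letters ("Hello", "Rld")
    //   \p{Ll}+         a run of lower-case letters with no capital in front
    private static readonly Regex Word =
        new Regex(@"\p{Lu}{2}|\p{Lu}\p{Ll}*|\p{Ll}+");

    // Assumes letters-only input; characters outside these classes are dropped.
    public static string Split(string input)
    {
        var words = Word.Matches(input).Cast<Match>().Select(m => m.Value);
        return string.Join(" ", words);
    }
}
```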
string f(string input)
{
//'lowerUPPER' -> 'lower UPPER'
var x = Regex.Replace(input, "([a-z])([A-Z])","$1 $2");
//'UPPER' -> 'UP PE R'
return Regex.Replace(x, "([A-Z]{2})","$1 ");
}
class Program
{
static void Main(string[] args)
{
Print(Parse("HelloWorld"));
Print(Parse("HelloWORld"));
Print(Parse("helloworld"));
Print(Parse("HellOWORLd"));
Console.ReadLine();
}
static void Print(IEnumerable<string> input)
{
foreach (var s in input)
{
Console.Write(s);
Console.Write(' ');
}
Console.WriteLine();
}
static IEnumerable<string> Parse(string input)
{
var sb = new StringBuilder();
for (int i = 0; i < input.Length; i++)
{
if (!char.IsUpper(input[i]))
{
sb.Append(input[i]);
continue;
}
if (sb.Length > 0)
{
yield return sb.ToString();
sb.Clear();
}
sb.Append(input[i]);
if (i + 1 < input.Length && char.IsUpper(input[i + 1])) // guard: the last character may be upper case
{
sb.Append(input[++i]);
yield return sb.ToString();
sb.Clear();
}
}
if (sb.Length > 0)
{
yield return sb.ToString();
}
}
}
I don't think this case needs a regular expression.
Try this:
static void Main(string[] args)
{
var input = "HellOWORLd";
var i = 0;
var x = 4;
var len = input.Length;
var output = new List<string>();
while (x <= len)
{
output.Add(SubStr(input, i, x));
i = x;
x += 2;
}
var ret = output.ToArray(); //["Hell","OW", "OR", "Ld"]
Console.ReadLine();
}
static string SubStr(string str, int start, int end)
{
var len = str.Length;
if (start >= 0 && end <= len)
{
var ret = new StringBuilder();
for (int i = 0; i < len; i++)
{
if (i == start)
{
do
{
ret.Append(str[i]);
i++;
} while (i != end);
}
}
return ret.ToString();
}
return null;
}
Quick add-on requirement in our project: a field in our DB that holds a phone number only allows 10 characters. So if I get passed "(913)-444-5555" or anything else, is there a quick way to run the string through some kind of special replace function to which I can pass the set of characters to allow?
Regex?
Definitely regex:
string CleanPhone(string phone)
{
Regex digitsOnly = new Regex(@"[^\d]");
return digitsOnly.Replace(phone, "");
}
or within a class to avoid re-creating the regex all the time:
private static Regex digitsOnly = new Regex(@"[^\d]");
public static string CleanPhone(string phone)
{
return digitsOnly.Replace(phone, "");
}
Depending on your real-world inputs, you may want some additional logic there to do things like strip out leading 1's (for long distance) or anything trailing an x or X (for extensions).
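A sketch of that additional logic, assuming North American numbers: everything from the first `x` or `X` onward is treated as an extension and dropped, only ASCII digits are kept, an 11-digit number starting with 1 loses the leading 1, and anything that is not then exactly 10 digits is rejected with `null`. These rules and the `PhoneCleaner`/`NormalizePhone` names are hypothetical; adjust them to your data.

```csharp
using System.Text;

public static class PhoneCleaner
{
    // Hypothetical rules (assumptions): strip an 'x'/'X' extension, keep ASCII
    // digits, drop a leading long-distance 1, and require exactly 10 digits.
    public static string NormalizePhone(string phone)
    {
        if (phone == null) return null;

        int extension = phone.IndexOfAny(new[] { 'x', 'X' });
        if (extension >= 0)
        {
            phone = phone.Substring(0, extension);
        }

        var digits = new StringBuilder(phone.Length);
        foreach (char c in phone)
        {
            if (c >= '0' && c <= '9')
            {
                digits.Append(c);
            }
        }

        if (digits.Length == 11 && digits[0] == '1')
        {
            digits.Remove(0, 1);
        }

        // null signals "not a 10-digit number" so it never reaches the DB column.
        return digits.Length == 10 ? digits.ToString() : null;
    }
}
```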
You can do it easily with regex:
string subject = "(913)-444-5555";
string result = Regex.Replace(subject, "[^0-9]", ""); // result = "9134445555"
You don't need to use Regex.
phone = new String(phone.Where(c => char.IsDigit(c)).ToArray()); // needs using System.Linq; char.IsDigit also accepts non-ASCII Unicode digits
Here's the extension method way of doing it.
public static class Extensions
{
public static string ToDigitsOnly(this string input)
{
Regex digitsOnly = new Regex(@"[^\d]");
return digitsOnly.Replace(input, "");
}
}
Using the Regex methods in .NET, you can match any non-digit character with \D, like so:
phoneNumber = Regex.Replace(phoneNumber, "\\D", String.Empty);
How about an extension method that doesn't use regex?
If you do stick with one of the Regex options, at least use RegexOptions.Compiled on the static variable.
public static string ToDigitsOnly(this string input)
{
return new String(input.Where(char.IsDigit).ToArray());
}
This builds on Usman Zafar's answer converted to a method group.
For better performance and lower memory consumption, try this:
using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;
public class Program
{
private static Regex digitsOnly = new Regex(@"[^\d]");
public static void Main()
{
Console.WriteLine("Init...");
string phone = "001-12-34-56-78-90";
var sw = new Stopwatch();
sw.Start();
for (int i = 0; i < 1000000; i++)
{
DigitsOnly(phone);
}
sw.Stop();
Console.WriteLine("Time: " + sw.ElapsedMilliseconds);
var sw2 = new Stopwatch();
sw2.Start();
for (int i = 0; i < 1000000; i++)
{
DigitsOnlyRegex(phone);
}
sw2.Stop();
Console.WriteLine("Time: " + sw2.ElapsedMilliseconds);
Console.ReadLine();
}
public static string DigitsOnly(string phone, string replace = null)
{
if (replace == null) replace = "";
if (phone == null) return null;
var result = new StringBuilder(phone.Length);
foreach (char c in phone)
if (c >= '0' && c <= '9')
result.Append(c);
else
{
result.Append(replace);
}
return result.ToString();
}
public static string DigitsOnlyRegex(string phone)
{
return digitsOnly.Replace(phone, "");
}
}
The result on my computer (elapsed milliseconds for 1,000,000 calls, first DigitsOnly, then DigitsOnlyRegex) is:
Init...
Time: 307
Time: 2178
I'm sure there's a more efficient way to do it, but I would probably do this:
string getTenDigitNumber(string input)
{
StringBuilder sb = new StringBuilder();
for(int i = 0; i < input.Length; i++)
{
// int.TryParse takes a string and an out parameter, so convert the char first
if(int.TryParse(input[i].ToString(), out int junk))
sb.Append(input[i]);
}
return sb.ToString();
}
Try this:
public static string cleanPhone(string inVal)
{
char[] newPhon = new char[inVal.Length];
int i = 0;
foreach (char c in inVal)
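if (c >= '0' && c <= '9') // inclusive, so '0' and '9' are kept
newPhon[i++] = c;
// char[].ToString() returns the type name, so build the string from the filled part
return new string(newPhon, 0, i);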
}