Validate and pass only valid characters against regex expression in c#


I am working on a solution where I need to validate a string in C# and pass on only its valid characters.
For example, my regular expression is: "^\\S(|(.|\\n)*\\S)\\Z"
and the text I want to validate is below:
127 Finchfield Lane
Now I know it's invalid. But how do I remove the characters that are invalid according to the regex, and pass the string on only if it validates successfully against the regex?

If I understand you correctly, you are looking for Regex.IsMatch:
if (Regex.IsMatch(str, "^\\S(|(.|\\n)*\\S)\\Z"))
{
    // do something with the valid string
}
else
{
    // strip invalid characters from the string
}
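One possible way to fill in the else branch above, as a sketch: this pattern only accepts a non-empty string whose first and last characters are not whitespace, so, assuming surrounding whitespace is the only thing that can make the input invalid, "stripping invalid characters" amounts to trimming. InputCleaner and TryClean are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class InputCleaner
{
    // The pattern from the question: non-empty, and the first and last characters are not whitespace.
    private const string Pattern = @"^\S(|(.|\n)*\S)\Z";

    // Returns true (with the possibly trimmed value) when the input can be made to match the pattern.
    public static bool TryClean(string input, out string cleaned)
    {
        cleaned = null;
        if (input == null)
        {
            return false;
        }

        if (Regex.IsMatch(input, Pattern))
        {
            cleaned = input;            // already valid: pass it through unchanged
            return true;
        }

        string trimmed = input.Trim();  // assumption: leading/trailing whitespace is the only problem
        if (Regex.IsMatch(trimmed, Pattern))
        {
            cleaned = trimmed;
            return true;
        }

        return false;                   // empty or whitespace-only input cannot be fixed by trimming
    }
}
```

Usage: string cleaned; if (InputCleaner.TryClean("  127 Finchfield Lane ", out cleaned)) { /* cleaned is "127 Finchfield Lane" */ }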

using System;
using System.Text.RegularExpressions;
namespace PatternMatching
{
    class Program
    {
        static void Main()
        {
            string pattern = @"(\d+) (\w+)";
            string[] strings = { "123 ABC", "ABC 123", "CD 45678", "57998 DAC" };
            foreach (var s in strings)
            {
                Match result = Regex.Match(s, pattern);
                if (result.Success)
                {
                    Console.WriteLine("Match: {0}", result.Value);
                }
            }
            Console.ReadKey();
        }
    }
}
This seems to do what you require. I hope I haven't misunderstood.

To validate the string against the regex, you can use Regex.IsMatch.
Regex.IsMatch(input, pattern) // returns true if the string is valid
If you only want the matched value, you can use Match:
Match match = new Regex(@"\d+").Match(str);
string value = match.Value; // holds only the (first) matched substring; the unmatched text is left out
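Note that Match.Value only holds the first match. If the goal is to keep every part of the string that matches and drop the rest, a minimal sketch (MatchFilter and KeepMatches are made-up names for illustration) concatenates all the results of Regex.Matches:

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class MatchFilter
{
    // Concatenates every substring of input that matches pattern; everything else is dropped.
    public static string KeepMatches(string input, string pattern)
    {
        var builder = new StringBuilder();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            builder.Append(m.Value);
        }
        return builder.ToString();
    }
}
```

For example, KeepMatches("a1b22c333", @"\d+") returns "122333", and a single-character class such as @"[A-Za-z0-9 ]" keeps only letters, digits and spaces.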

Related

How do I search and replace text containing placeholder tokens with values from an XML file using regular expression matching? VB.NET or C#

I have a problem that requires a VB.NET or C# solution using regular expression matching.
I am not very good with regular expressions, so I thought I would ask for some help.
I have some text with one or more tokens that I need to replace with values retrieved from an XML file. The tokens are similar but come in 2 different types. Matches of the first type will be replaced with a value from File1.xml, and matches of the 2nd type with a value from File2.xml.
The replaceable tokens are in this format:
Type 1 Tokens: &*T1& and &*T1001&
Type 2 Tokens: &*SomeValue& and &*A2ndValue&
The replacement values for the Type 1 tokens are in File1.xml and for Type 2 Tokens are in File2.xml
In the above example, when a match is found for Type 1 (T1000), I need to replace the entire token (&*T1000&) with the value of element T1000 in File1.xml: <T1000>ValueT1000</T1000>
For the 2nd type, when a match is found for Type 2 (SomeValue), I need to replace the entire token (&*SomeValue&) with the value of element SomeValue in File2.xml: <SomeValue>Value2</SomeValue>
Example input text:
This is some text with first token &*T1& and the second token &*T1001& and more tokens &*SomeValue& and still more &*A2ndValue&.
So far, with help from pirs's code, I have this in VB.NET:
Public Shared Sub Main()
    Dim pattern As String = "\&\*?([\w]+)\&"
    Dim input As String = "This is some text with first token &*T1& and the second token &*T1001& and more tokens &*SomeValue& and still more &*A2ndValue&."
    For Each m As Match In Regex.Matches(input, pattern)
        Console.WriteLine("'{0}' found at index {1}.", m.Groups(1).Value, m.Index)
    Next
End Sub
Which returns:
'T1' found at index 35.
'T1001' found at index 62.
'SomeValue' found at index 87.
'A2ndValue' found at index 115
I need to process this text and replace all the tokens with their values retrieved from the 2 xml files.
Any help is appreciated.
[EDIT]
With the answer from @pirs, maybe the way to do it is to first find the matches of type T1000 and then replace each one at its match index. When replacing by index, I think I have to start at the last index, since each replacement shifts the indexes of the matches that come after it.
After all T1000 matches are replaced, I think I can run another match on the resulting string and then replace all the matches of the 2nd type.
What is the regex that matches T1000 (T followed by any number of digits)?
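For the last question, T\d+ matches a T followed by one or more digits, and ^T\d+$ tests that a whole token name has that shape. A sketch of one way to put it together, assuming the two XML files have already been loaded into dictionaries (loading them is not shown, and TokenReplacer is a hypothetical name): a single Regex.Replace call with a MatchEvaluator sends each token to the right dictionary. Because Regex.Replace builds a new string instead of editing in place, there is no need to replace from the last index backwards.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TokenReplacer
{
    // Type 1 names: "T" followed by one or more digits, e.g. T1, T1001.
    private static readonly Regex Type1Name = new Regex(@"^T\d+$");

    // Any token of the form &*Name&; group 1 captures Name.
    private static readonly Regex Token = new Regex(@"&\*(\w+)&");

    // type1Values stands in for File1.xml, type2Values for File2.xml (assumption).
    public static string Replace(string input,
                                 IDictionary<string, string> type1Values,
                                 IDictionary<string, string> type2Values)
    {
        return Token.Replace(input, m =>
        {
            string name = m.Groups[1].Value;
            var source = Type1Name.IsMatch(name) ? type1Values : type2Values;
            string value;
            // Replace the whole token; leave tokens with no known value untouched.
            return source.TryGetValue(name, out value) ? value : m.Value;
        });
    }
}
```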

[EDIT] To replace by index:
public static class StringExtensions
{
    public static string ReplaceIndex(this string self, string oldString, string newString, int index)
    {
        return self.Remove(index, oldString.Length).Insert(index, newString);
    }
}
// ...
// m.Value is the whole token (e.g. &*T1&) and m.Index is where it starts
s = s.ReplaceIndex(m.Value, "newString", m.Index);
// ...
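If you do replace by index, the matches should be processed from the last one to the first, as the question suspects, so that each replacement only shifts text lying after the matches still waiting to be handled. A sketch using the ReplaceIndex helper, assuming the replacement values are already in a dictionary (ReverseReplacer and ReplaceTokens are hypothetical names):

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class StringIndexExtensions
{
    // Replaces oldString, which must start at index, with newString.
    public static string ReplaceIndex(this string self, string oldString, string newString, int index)
    {
        return self.Remove(index, oldString.Length).Insert(index, newString);
    }
}

public static class ReverseReplacer
{
    public static string ReplaceTokens(string input, IDictionary<string, string> values)
    {
        MatchCollection matches = Regex.Matches(input, @"&\*(\w+)&");
        string result = input;
        // Walk backwards: a replacement at m.Index never moves the earlier matches.
        for (int i = matches.Count - 1; i >= 0; i--)
        {
            Match m = matches[i];
            string value;
            if (values.TryGetValue(m.Groups[1].Value, out value))
            {
                // Replace the whole token (m.Value) starting at the whole match's index.
                result = result.ReplaceIndex(m.Value, value, m.Index);
            }
        }
        return result;
    }
}
```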
[EDIT] Or try replacing the value directly:
// ...
s = s.Replace(m.Value, "newValue");
// ...
[EDIT] Regex for &* and &: https://regex101.com/r/MVRS7U/1/
The generated C# code:
using System;
using System.Text.RegularExpressions;
public class Example
{
    public static void Main()
    {
        string pattern = @"&\*?([^&\*\d]+)";
        string input = @"&*cool&*it's&working&in&*all&case";
        foreach (Match m in Regex.Matches(input, pattern))
        {
            Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
        }
    }
}
It should be OK now :-)
__
I'm not sure exactly what you want, but here is the regex for your case: https://regex101.com/r/5i3RII/1/
And here is the generated C# code (you will need to turn it into a custom function that fits your needs):
using System;
using System.Text.RegularExpressions;
public class Example
{
    public static void Main()
    {
        string pattern = @"<[a-zA-Z-0-9]+\s?>([\w]+)<\/[a-zA-Z-0-9]+\s?>";
        // the example you gave
        string input = @"<T1>value1</T1>
<T1001>value2</T1001>
<T2000 />
<SomeValue>value1</SomeValue >
<A2ndValue>value2</A2ndValue >";
        foreach (Match m in Regex.Matches(input, pattern))
        {
            // the output
            Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
        }
    }
}
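Since the replacement values live in XML files, an alternative to matching the XML with a regex is to let an XML parser read it. A minimal sketch with LINQ to XML, assuming each file has a single root element whose direct children are the values (the example above has no root, so the test wraps it in a hypothetical <Values> element); XmlValueMap and FromXml are hypothetical names. For a file on disk, XDocument.Load(path) would take the place of XDocument.Parse.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlValueMap
{
    // Builds an element-name -> text-value map from the direct children of the root element.
    public static Dictionary<string, string> FromXml(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Root
                  .Elements()
                  .GroupBy(e => e.Name.LocalName)                   // tolerate duplicate element names
                  .ToDictionary(g => g.Key, g => g.First().Value);  // first occurrence wins
    }
}
```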

I understand what you want to do. The code below does everything:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Xml;
using System.Xml.Linq;
namespace ConsoleApplication1
{
    class Program
    {
        // this version reads all replacement values from a single XML file
        const string FILENAME = @"c:\temp\text.xml";
        static void Main(string[] args)
        {
            string input = "This is some text with first token &*T1& and the second token &*T1001& and more tokens &*SomeValue& and still more &*A2ndValue&.";
            XDocument doc = XDocument.Load(FILENAME);
            string patternToken = "&[^&]+&";
            string patternTag = @"&\*(?'tag'[^&]+)&";
            MatchCollection matches = Regex.Matches(input, patternToken);
            foreach (Match match in matches.Cast<Match>())
            {
                string token = match.Value;
                string tag = Regex.Match(token, patternTag).Groups["tag"].Value;
                // value of the first element named after the tag (null if there is none)
                string tagValue = doc.Descendants(tag).Select(x => (string)x).FirstOrDefault();
                input = input.Replace(token, tagValue);
            }
            Console.WriteLine(input);
        }
    }
}

Get removed characters from a string

I am using Regex to remove unwanted characters from a string, like below:
str = System.Text.RegularExpressions.Regex.Replace(str, @"[^\u0020-\u007E]", "");
How can I efficiently retrieve the distinct characters that will be removed?
EDIT:
Sample input : str = "This☺ contains Åüsome æspecialæ characters"
Sample output : str = "This contains some special characters"
removedchar = "☺,Å,ü,æ"

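// note the ^: match the characters outside the allowed range, i.e. the ones that get removed
string pattern = @"[^\u0020-\u007E]";
Regex rgx = new Regex(pattern);
List<string> matches = new List<string>();
foreach (Match match in rgx.Matches(str))
{
    // keep each removed character only once
    if (!matches.Contains(match.Value))
    {
        matches.Add(match.Value);
    }
}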

Here is an example of how you can do it with a callback method, using the Regex.Replace overload that takes an evaluator:
evaluator
Type: System.Text.RegularExpressions.MatchEvaluator
A custom method that examines each match and returns either the original matched string or a replacement string.
C# demo:
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
public class Test
{
    public static List<string> characters = new List<string>();
    public static void Main()
    {
        var str = Regex.Replace("§My string 123”˝", "[^\u0020-\u007E]", Repl);
        Console.WriteLine(str); // => My string 123
        Console.WriteLine(string.Join(", ", characters)); // => §, ”, ˝
    }
    public static string Repl(Match m)
    {
        characters.Add(m.Value);  // record the removed character
        return string.Empty;      // and replace it with nothing
    }
}
In short, declare a "global" variable (a list of strings, here characters) and initialize it. Add the Repl method to handle the replacement; each time Regex.Replace calls that method, it adds the matched value to the characters list.
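The same idea works without the static "global" list if the evaluator is a lambda that captures local variables. A sketch that also keeps each removed character only once, in first-seen order, as in the expected output in the question (CharacterStripper and Strip are hypothetical names). It works on UTF-16 code units, so a character outside the Basic Multilingual Plane would be reported as two surrogate halves.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CharacterStripper
{
    // Removes every character outside U+0020..U+007E and reports each removed
    // character once, in the order it was first seen.
    public static string Strip(string input, out List<string> removed)
    {
        var seen = new HashSet<string>();
        var distinct = new List<string>();
        string result = Regex.Replace(input, @"[^\u0020-\u007E]", m =>
        {
            if (seen.Add(m.Value))   // HashSet.Add returns false for a duplicate
            {
                distinct.Add(m.Value);
            }
            return string.Empty;     // drop the character from the output
        });
        removed = distinct;          // out parameters cannot be used inside the lambda
        return result;
    }
}
```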

Replace regular expression with regular expression

Consider two regular expressions:
var regex_A = @"Main\.(.+)\.Value";
var regex_B = @"M_(.+)_Sp";
I want to be able to replace a string using regex_A as input, and regex_B as the replacement string. But also the other way around. And without supplying additional information like a format string per regex.
Specifically I want to create a replaced_B string from an input_A string. So:
var input_A = "Main.Rotating.Value";
var replaced_B = input_A.RegEx_Awesome_Replace(regex_A, regex_B);
Assert.AreEqual("M_Rotating_Sp", replaced_B);
This should also work in reverse (that's why I can't use a simple string.Format for regex_B), and I don't want to supply a format string for every regular expression (I'm lazy).
var input_B = "M_Skew_Sp";
var replaced_A = input_B.RegEx_Awesome_Replace(regex_B, regex_A);
Assert.AreEqual("Main.Skew.Value", replaced_A);
I have no clue whether this exists, or what it is called. Google searches find me all kinds of other regex replaces, but not this one.
Update:
So basically I need a way to convert a regular expression to a format string.
var regex_A_format = Regex2Format(regex_A);
Assert.AreEqual("Main.$1.Value", regex_A_format);
and
var regex_B_format = Regex2Format(regex_B);
Assert.AreEqual("M_$1_Sp", regex_B_format);
So what should the RegEx_Awesome_Replace and/or Regex2Format function look like?
Update 2:
I guess RegEx_Awesome_Replace should look something like this (using some code from the answers below):
public static class StringExtensions
{
    public static string RegExAwesomeReplace(this string inputString, string searchPattern, string replacePattern)
    {
        return Regex.Replace(inputString, searchPattern, Regex2Format(replacePattern));
    }
}
Which would leave the Regex2Format as an open question.

There is no defined way for one regex to refer to a match found in another regex. Regexes are not format strings.
What you can do is use Tuples that pair a format string with its regex, e.g.:
var a = new Tuple<Regex, string>(new Regex(@"(?<=Main\.).+(?=\.Value)"), "Main.{0}.Value");
var b = new Tuple<Regex, string>(new Regex(@"(?<=M_).+(?=_Sp)"), "M_{0}_Sp");
Then you can pass these objects to a common replacement method in any order, like this:
private string RegEx_Awesome_Replace(string input, Tuple<Regex, string> toFind, Tuple<Regex, string> replaceWith)
{
    return string.Format(replaceWith.Item2, toFind.Item1.Match(input).Value);
}
You will notice that I have used zero-width positive lookahead and lookbehind assertions in my regexes, to ensure that the match's Value contains exactly the text I want to replace.
You may also want to add error handling for cases where no match is found; see the documentation for Regex.Match and check Match.Success.
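A sketch of the Tuple approach with that error handling added, assuming that throwing an ArgumentException is an acceptable way to report input that does not have the expected shape (TupleReplacer is a hypothetical name):

```csharp
using System;
using System.Text.RegularExpressions;

public static class TupleReplacer
{
    // Each tuple pairs a regex that isolates the variable part with a format string that rebuilds the whole value.
    public static readonly Tuple<Regex, string> A =
        Tuple.Create(new Regex(@"(?<=Main\.).+(?=\.Value)"), "Main.{0}.Value");
    public static readonly Tuple<Regex, string> B =
        Tuple.Create(new Regex(@"(?<=M_).+(?=_Sp)"), "M_{0}_Sp");

    public static string Replace(string input, Tuple<Regex, string> toFind, Tuple<Regex, string> replaceWith)
    {
        Match match = toFind.Item1.Match(input);
        if (!match.Success)
        {
            // The input does not have the shape described by the source pattern.
            throw new ArgumentException("Input does not match the source pattern.", nameof(input));
        }
        return string.Format(replaceWith.Item2, match.Value);
    }
}
```

Usage: TupleReplacer.Replace("Main.Rotating.Value", TupleReplacer.A, TupleReplacer.B) gives "M_Rotating_Sp", and swapping the last two arguments goes the other way.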

Since you have already reduced your problem to where you need to change a Regex into a string format (implementing Regex2Format) I will focus my answer just on that part. Note that my answer is incomplete because it doesn't address the full breadth of parsing regex capturing groups, however it works for simple cases.
First, we need a Regex that matches Regex capture groups. It uses a negative lookbehind so that escaped parentheses are not matched. There are other cases that break this regex, e.g. a non-capturing group, wildcard symbols, or things between square brackets.
private static readonly Regex CaptureGroupMatcher = new Regex(@"(?<!\\)\([^\)]+\)");
The implementation of Regex2Format here basically writes everything outside of capture groups into the output string, and replaces the capture group value by {x}.
static string Regex2Format(string pattern)
{
    var targetBuilder = new StringBuilder();
    int previousEndIndex = 0;
    int formatIndex = 0;
    foreach (Match match in CaptureGroupMatcher.Matches(pattern))
    {
        var group = match.Groups[0];
        int endIndex = group.Index;
        // copy the literal text before this capture group
        AppendPart(pattern, previousEndIndex, endIndex, targetBuilder);
        // replace the capture group itself with {0}, {1}, ...
        targetBuilder.Append('{');
        targetBuilder.Append(formatIndex++);
        targetBuilder.Append('}');
        previousEndIndex = group.Index + group.Length;
    }
    // copy the literal text after the last capture group
    AppendPart(pattern, previousEndIndex, pattern.Length, targetBuilder);
    return targetBuilder.ToString();
}
This helper function writes the pattern characters into the output; it currently writes everything except the \ characters used to escape something.
static void AppendPart(string pattern, int previousEndIndex, int endIndex, StringBuilder targetBuilder)
{
    for (int i = previousEndIndex; i < endIndex; i++)
    {
        char c = pattern[i];
        if (c == '\\' && i < pattern.Length - 1 && pattern[i + 1] != '\\')
        {
            // backslash not followed by another backslash - it's an escape char, so skip it
        }
        else
        {
            targetBuilder.Append(c);
        }
    }
}
Test cases:
static void Test()
{
    var cases = new Dictionary<string, string>
    {
        { @"Main\.(.+)\.Value", @"Main.{0}.Value" },
        { @"M_(.+)_Sp(.*)", "M_{0}_Sp{1}" },
        { @"M_\(.+)_Sp", @"M_(.+)_Sp" },
    };
    foreach (var kvp in cases)
    {
        string actual = Regex2Format(kvp.Key);
        if (actual != kvp.Value)
        {
            Console.WriteLine("Test failed for {0} - expected {1} but got {2}", kvp.Key, kvp.Value, actual);
        }
    }
}
To wrap up, here is the usage:
private static string AwesomeRegexReplace(string input, string sourcePattern, string targetPattern)
{
    var targetFormat = Regex2Format(targetPattern);
    return Regex.Replace(input, sourcePattern, match =>
    {
        // the capture group values (group 0, the whole match, is skipped)
        var args = match.Groups.OfType<Group>().Skip(1).Select(g => g.Value).ToArray<object>();
        return string.Format(targetFormat, args);
    });
}

Something like this might work:
var replaced_B = Regex.Replace(input_A, @"Main\.(.+)\.Value", @"M_$1_Sp");
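For completeness, the hand-written version in both directions looks like this; $1 in the replacement string refers to the first capture group of the search pattern. It works, but it needs one replacement string per direction, which is what the question wanted to avoid. DirectReplace, AToB and BToA are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class DirectReplace
{
    // "Main.X.Value" -> "M_X_Sp": $1 is the text captured by (.+).
    public static string AToB(string input)
    {
        return Regex.Replace(input, @"Main\.(.+)\.Value", "M_$1_Sp");
    }

    // "M_X_Sp" -> "Main.X.Value": the reverse direction needs its own replacement string.
    public static string BToA(string input)
    {
        return Regex.Replace(input, @"M_(.+)_Sp", "Main.$1.Value");
    }
}
```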

Are you looking for something like this?
public static class StringExtensions
{
    public static string RegExAwesomeReplace(this string inputString, string searchPattern, string replacePattern)
    {
        Match searchMatch = Regex.Match(inputString, searchPattern);
        Match replaceMatch = Regex.Match(inputString, replacePattern);
        if (!searchMatch.Success || !replaceMatch.Success)
        {
            return inputString;
        }
        return inputString.Replace(searchMatch.Value, replaceMatch.Value);
    }
}
The string extension method returns the string with the value matched by the search pattern replaced by the value matched by the replace pattern.
This is how you call it:
input_A.RegExAwesomeReplace(regex_A, regex_B);

Get title attribute from string with regular expression

string asd = "<area href='#' title='name' shape='poly' coords='38,23,242'/>";
How do I extract the text of the title attribute in C#, and then insert another attribute after title?

search : (?<=title=')[^']+
replace: something
demo here : http://regex101.com/r/nR3vQ8
Something like this, in your case:
using System;
using System.Text.RegularExpressions;
class Program
{
    static void Main()
    {
        // This is the input string we are replacing parts of.
        string input = "<area href='#' title='name' shape='poly' coords='38,23,242'/>";
        // Use Regex.Replace to replace the pattern in the input.
        // ... The pattern (?<=title=')[^']+ matches the text after title=' up to the next single quote.
        string output = Regex.Replace(input, "(?<=title=')[^']+", "something");
        // Write the output.
        Console.WriteLine(input);
        Console.WriteLine(output);
    }
}
Update:
To take out the title attribute value as a match, use this:
using System;
using System.Text.RegularExpressions;
class Program
{
    static void Main()
    {
        // First we see the input string.
        string input = "<area href='#' title='name' shape='poly' coords='38,23,242'/>";
        // Here we call Regex.Match.
        Match match = Regex.Match(input, @"title='(\w+)'",
            RegexOptions.IgnoreCase);
        // Here we check the Match instance.
        if (match.Success)
        {
            // Finally, we get the Group value and display it.
            string key = match.Groups[1].Value;
            Console.WriteLine(key);
        }
    }
}
output
name
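None of the answers covers the second part of the question (inserting another attribute after title). A sketch, assuming the title value is single-quoted as in the sample and that only the first title attribute should be touched. It uses the Regex.Replace overload that takes a MatchEvaluator and a count, so a $ in the new attribute is not treated as a substitution. AttributeInserter and the data-extra attribute are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class AttributeInserter
{
    // Inserts newAttribute (e.g. "data-extra='x'") right after the first title='...' attribute.
    public static string InsertAfterTitle(string tag, string newAttribute)
    {
        var titleAttribute = new Regex(@"title='[^']*'");
        // Keep the matched title attribute and append the new one after it; count = 1 stops after the first match.
        return titleAttribute.Replace(tag, m => m.Value + " " + newAttribute, 1);
    }
}
```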

Try this. In particular, you may be interested in the HtmlAgilityPack answer.
Regex reg = new Regex("<a[^>]*?title=\"([^\"]*?)\"[^>]*?>");
A couple of gotchas:
This match is case-sensitive; you may want to adjust that.
It expects the title attribute to exist and to be double-quoted (the string in the question uses single quotes, so the quotes would need adjusting).
Of course, if the title attribute doesn't exist, you probably don't want the match anyway?
To extract the value, use the Groups collection:
reg.Match("Howdy").Groups[1].Value

getting number from string

How can I get the number from the following string:
###E[shouldbesomenumber][space or endofline]
The square brackets are only for illustration; they are not present in the real string.
I am on .NET 2.0.
Thanks.

I would suggest that you use regular expressions or string operations to isolate just the numeric part, and then call int.Parse, int.TryParse, decimal.Parse, decimal.TryParse etc depending on the type of number you need to parse.
The regular expression might look something like:
@"###E(-?\d+) ?$"
You'll need to change it for non-integers, of course. Sample code:
using System;
using System.Text.RegularExpressions;
class Test
{
    static void Main(string[] arg)
    {
        Regex regex = new Regex(@"###E(-?\d+) ?$");
        string text = "###E123 ";
        Match match = regex.Match(text);
        if (match.Success)
        {
            string group = match.Groups[1].Value;
            int parsed = int.Parse(group);
            Console.WriteLine(parsed);
        }
    }
}
Note that this could still fail with a number which exceeds the range of int. (Another reason to use int.TryParse...)
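A sketch of the int.TryParse variant, which reports failure instead of throwing when the digits do not fit in an int. It avoids var and lambdas so that it stays within C# 2.0 for .NET 2.0; NumberExtractor and TryExtract are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class NumberExtractor
{
    private static readonly Regex Pattern = new Regex(@"###E(-?\d+) ?$");

    // Returns true and the parsed value when text looks like ###E<number>, optionally followed by a space.
    // int.TryParse returns false (instead of throwing) for numbers outside the int range.
    public static bool TryExtract(string text, out int value)
    {
        value = 0;
        Match match = Pattern.Match(text);
        return match.Success && int.TryParse(match.Groups[1].Value, out value);
    }
}
```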

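static string ExtractNumber(string text)
{
    const string prefix = "###E";
    // The number ends at the first space or line break after the prefix, or at the end of the string.
    int index = text.IndexOfAny(new char[] { ' ', '\r', '\n' }, prefix.Length);
    if (index < 0)
    {
        index = text.Length;
    }
    string number = text.Substring(prefix.Length, index - prefix.Length);
    return number;
}
Now that the number is extracted, you can parse it or use it as is.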
