C# Regex Help replacing patterns

I'm not really good at regex (I only get to use it a few times a year) and want to see if someone can help with a C# regex statement which finds all instances of
<####-##-##> or </####-##-##>
and replaces it with
<date-####-##-##> or </date-####-##-##>
so that
<2012-01-01>stuff</2012-01-01><2012-05-01>stuff2</2012-05-01>
becomes
<date-2012-01-01>stuff</date-2012-01-01><date-2012-05-01>stuff2</date-2012-05-01>

string test = "<2012-01-01>stuff</2012-01-01><2012-05-01>stuff2</2012-05-01>";
var regex = new Regex(@"<(/?)(\d\d\d\d)-(\d\d)-(\d\d)>");
var result = regex.Replace(test, @"<$1date-$2-$3-$4>");
Console.WriteLine(result);
//output:
//<date-2012-01-01>stuff</date-2012-01-01><date-2012-05-01>stuff2</date-2012-05-01>
Note that the pattern may need to be more specific depending on the other text in the strings you are processing. Are there lots of other tags? Numbers that aren't dates?
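If some bracketed numbers are not real dates, one way to tighten things is to replace through a match evaluator that only rewrites tags whose content parses as a calendar date. This is a minimal sketch, not the answerer's code: the names DateTagRewriter and Rewrite are made up for illustration, and the yyyy-MM-dd format is an assumption based on the sample input.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class DateTagRewriter
{
    // Group 1: optional slash of a closing tag. Group 2: the candidate date.
    private static readonly Regex DateTag = new Regex(@"<(/?)(\d{4}-\d{2}-\d{2})>");

    public static string Rewrite(string input)
    {
        return DateTag.Replace(input, m =>
        {
            DateTime parsed;
            // Assumption: tags hold ISO-style dates (yyyy-MM-dd).
            bool isDate = DateTime.TryParseExact(
                m.Groups[2].Value, "yyyy-MM-dd",
                CultureInfo.InvariantCulture, DateTimeStyles.None, out parsed);

            // Leave the tag untouched when the digits are not a valid date.
            return isDate
                ? "<" + m.Groups[1].Value + "date-" + m.Groups[2].Value + ">"
                : m.Value;
        });
    }
}
```

Calling DateTagRewriter.Rewrite(test) on the sample string gives the same result as the plain replacement, while something like <2012-13-45> is left as it was.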

If you examine the values inside the tags one at a time, this would be a solution.
if (Regex.IsMatch(input, @"^(19|20)\d\d[- /.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])$"))
{
    // Strings are immutable, so the result has to be assigned back.
    input = "date-" + input;
}

Related

Replacing strings with changing data

I've been looking at encryption and long story short, I need to remove an xml tag from a string.
Each string needs a global rule for replacing the string, and I've looked at regex for this, but it doesn't make sense.
Here are some examples
<BitStrength>384</BitStrength>
<BitStrength>1024</BitStrength>
<BitStrength>12300</BitStrength>
I need to replace the whole string and the number inside as well, with nothing.
I've tried things like:
string.replace("<BitStrength>12300</BitStrength>","");
But the issue is the length and characters of the number, and a match is never found.
Has anyone got a solution? Maybe regex is the way to go?
PS. Preferably a solution in C#.
EDIT: I'm looking for a solution that replaces the whole string in not only this kind of example but strings in general.
<BitStrength>4633</BitStrength>
<BitStrength>336</BitStrength>
!!SomeConstantData!!5437!!EndConstant!!
I would like 2 eggs today.
I would like 17 eggs today.
I would like 258367 eggs today.
Now if I use string.Replace("I would like ", "").Replace(" eggs today.", "") I would be left with the number 258367, because my statement doesn't cover it. I'm looking for a solution that deletes this data as well. It can be any value.
In my particular example I'm looking to replace <BitStrength>384</BitStrength> in <BitStrength>384</BitStrength><RSAKeyValue><Modulus>code</Modulus><Exponent>code</Exponent></RSAKeyValue>
The Issue I face is that the number between the bitstrength tags can be anything between 386 and 16384, and I need to remove the entire bitstrength string.

string input = "<BitStrength>384</BitStrength>";
string pattern = @"<BitStrength>\d*</BitStrength>";
string replacement = "";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(input, replacement);
Console.WriteLine("Original String: {0}", input);
Console.WriteLine("Replacement String: {0}", result);
returns
Original String: <BitStrength>384</BitStrength>
Replacement String:

You don't show what you have tried, and I wonder whether the failure is because you are not making the pattern a verbatim string literal by putting @ before it.
This example works:
Regex.Replace(@"Blah<BitStrength>12300</BitStrength>Blah",
    @"(\<BitStrength\>12300\</BitStrength\>)",
    string.Empty)
and returns
BlahBlah
If the actual number does not matter use this pattern:
(\<BitStrength\>\d+\</BitStrength\>)
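To show how that pattern generalises, here is a small sketch that removes any element whose content is a run of digits, given the tag name. The TagStripper class and RemoveNumericElement method are hypothetical names, and the sketch assumes the element has no attributes and no whitespace around the number.

```csharp
using System.Text.RegularExpressions;

public static class TagStripper
{
    // Removes every <tagName>digits</tagName> occurrence from the input.
    public static string RemoveNumericElement(string input, string tagName)
    {
        // Escape the name in case it contains regex metacharacters.
        string name = Regex.Escape(tagName);
        string pattern = "<" + name + @">\d+</" + name + ">";
        return Regex.Replace(input, pattern, string.Empty);
    }
}
```

For the key string in the question, TagStripper.RemoveNumericElement(xml, "BitStrength") would leave only the RSAKeyValue part, whatever number the BitStrength element holds.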

This works:
var source = string.Join(Environment.NewLine,
"<BitStrength>384</BitStrength>",
"<BitStrength>1024</BitStrength>",
"<BitStrength>12300</BitStrength>");
var result = source.Replace("<BitStrength>12300</BitStrength>", string.Empty);

Finding text between tags and replacing it along with the tags

I am using The following regex pattern to find text between [code] and [/code] tags:
(?<=\[code\]).*?(?=\[/code\])
It returns me anything which is enclosed between these 2 tags, e.g. this: [code]return Hi There;[/code] gives me return Hi There;.
I need help with regex to replace entire text along with the tags.

Use this:
var s = "My temp folder is: [code]Path.GetTempPath()[/code]";
var result = Regex.Replace(s, @"\[code](.*?)\[/code]",
    m =>
    {
        var codeString = m.Groups[1].Value;
        // then evaluate this string
        return EvaluateMyCode(codeString);
    });

I would use an HTML parser for this. I can see that what you are trying to do is simple; however, these things have a habit of getting much more complicated over time. The end result is much pain for the poor soul who has to maintain the code in the future.
Take a look at this question about HTML Parsers
What is the best way to parse html in C#?
[Edit]
Here is a much more relevant answer to the question asked.
@Milad Naseri's regex is correct; you just need to do something like
string matchCodeTag = @"\[code\](.*?)\[/code\]";
string textToReplace = "[code]The Ape Men are coming[/code]";
string replaceWith = "Keep Calm";
string output = Regex.Replace(textToReplace, matchCodeTag, replaceWith);
Check out these websites for more examples
http://www.dotnetperls.com/regex-replace
http://oreilly.com/windows/archive/csharp-regular-expressions.html
Hope this helps

You need to use back referencing, i.e. replace \[code\](.*?)\[/code\] with something like <code>$1</code> which will give you what's been enclosed by the [code][/code] tags enclosed in -- for this example -- <code></code> tags.
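The back-reference idea can be sketched as follows. CodeTagConverter and ToHtml are illustrative names only, and RegexOptions.Singleline is an assumption added so that code spanning several lines is still matched.

```csharp
using System.Text.RegularExpressions;

public static class CodeTagConverter
{
    // Replaces each [code]...[/code] pair with <code>...</code>,
    // keeping the enclosed text via the $1 back-reference.
    public static string ToHtml(string input)
    {
        return Regex.Replace(
            input,
            @"\[code\](.*?)\[/code\]",
            "<code>$1</code>",
            RegexOptions.Singleline); // let . match newlines too
    }
}
```

The lazy quantifier (.*?) matters here: with a greedy .* two separate [code] blocks in one string would be merged into a single match.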

How to Extract the Word Following a Symbol?

I have a string that could have any sentence in it but somewhere in that string will be the # symbol, followed by an attached word, sort of like #username you see on some sites.
so maybe the string is "hey how are you" or it's "#john hey how are you".
IF there's an "#" in the string i want to pull what comes immediately after it into its own new string.
in this instance how can i pull "john" into a different string so i could theoretically notify this person of his new message? i'm trying to play with string.contains or .replace but i'm pretty new and having a hard time.
this btw is in c# asp.net

You can use the Substring and IndexOf methods together to achieve this.
I hope this helps.
Thanks,
Damian

Here's how you do it without regex:
string s = "hi there #john how are you";
string getTag(string s)
{
    int atSign = s.IndexOf("#");
    if (atSign == -1) return "";
    // start just after the #, stop at sentence or phrase end
    // I'm assuming this is English, of course
    // so we leave in ' and -
    int start = atSign + 1;
    int wordEnd = s.IndexOfAny(" .,;:!?".ToCharArray(), start);
    if (wordEnd > -1)
        return s.Substring(start, wordEnd - start);
    else
        return s.Substring(start);
}

You should really learn regular expressions. This will work for you:
using System.Text.RegularExpressions;
var res = Regex.Match("hey #john how are you", @"#(\S+)");
if (res.Success)
{
//john
var name = res.Groups[1].Value;
}
This finds the first occurrence. If you want to find all of them, you can use Regex.Matches. \S means any character other than whitespace, so "hey #john, how are you" yields "john," (with the trailing comma) and "#john123" yields "john123", which may be wrong. Maybe [a-zA-Z] or similar would suit you better (it depends on which characters the usernames are made of). If you give more examples, I can tune it :)
I can recommend this page:
http://www.regular-expressions.info/
and this tool where you can test your statements:
http://regexlib.com/RESilverlight.aspx

The best way to solve this is using Regular Expressions. You can find a great resource here.
Using RegEx, you can search for the pattern you are after. I always have to refer to some documentation to write one...
Here is a pattern to start with - "#(\w+)" - the # will get matched, and then the parentheses indicate that you want to capture what comes after. The "\w" means you want only word characters to match (letters, digits, and the underscore), and the "+" indicates that there should be one or more word characters in a row.

You can try Regex...
I think it will be something like this
string userName = Regex.Match(yourString, @"#(\S+)").Groups[1].Value;

Regular expressions. I don't know C#, but the regex would be
/(#[\w]+) / - Everything in the parens is captured in a special variable, or attached to the regex object.

Use this:
var r = new Regex(@"#\w+");
foreach (Match m in r.Matches(stringToSearch))
    DoSomething(m.Value);
DoSomething(string foundName) is a function that handles name (found after #).
This will find all #names in stringToSearch
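If only the names are wanted, without the leading symbol, a capturing group can be read from each match. The following is a sketch under that assumption; MentionFinder and FindNames are hypothetical names, and \w is assumed to be an acceptable definition of a username character.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MentionFinder
{
    // Returns every word that directly follows a '#', without the '#'.
    public static List<string> FindNames(string text)
    {
        var names = new List<string>();
        foreach (Match m in Regex.Matches(text, @"#(\w+)"))
        {
            // Group 1 holds the word characters after the symbol.
            names.Add(m.Groups[1].Value);
        }
        return names;
    }
}
```

An empty list comes back when the string contains no such symbol, so the caller does not need a separate Contains check.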

RegEx for extracting number from a string

I have a bunch of files in a directory, mostly labeled something like...
PO1000000100.doc or .pdf or .txt
Some of them are PurchaseOrderPO1000000109.pdf
What i need to do is extract the PO1000000109 part of it. So basically PO with 10 numbers after it...
How can I do this with a regex?
(What i'll do is a foreach loop on the files in the directory, get the filename, and run it through the regex to get the PO number...)
I'm using C# - not sure if this is relevant.

Try this
String data =
    Regex.Match("PurchaseOrderPO1000000109.pdf", @"PO\d{10}",
        RegexOptions.IgnoreCase).Value;
Could add a Regex.IsMatch with same vars above ofc :)

If the PO part is always the same, you can just get the number without needing to use a regex:
new string(theString.Where(c => char.IsDigit(c)).ToArray());
Later you can prepend the PO part manually.
NOTE: I'm assuming that you have only one single run of numbers in your strings. If you have for example "abc12345def678" you will get "12345678", which may not be what you want.

Regex.Replace(fileName, @"^.*PO(\d{10}).*$", "$1");

string data="PurchaseOrderPO1000000109.pdf\nPO1000000100.doc";
MatchCollection matches = Regex.Matches(data, @"PO[0-9]{10}");
foreach (Match m in matches) {
    Console.WriteLine(m.Value);
}
Results
PO1000000109
PO1000000100
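For the per-file loop described in the question, the match could be wrapped in a small helper like the one below. This is only a sketch: PoNumberExtractor and FromFileName are made-up names, and returning null for files without a PO number is a design assumption.

```csharp
using System.IO;
using System.Text.RegularExpressions;

public static class PoNumberExtractor
{
    // "PO" followed by exactly ten digits, anywhere in the file name.
    private static readonly Regex PoNumber = new Regex(@"PO\d{10}");

    // Returns the PO number found in the file name, or null if there is none.
    public static string FromFileName(string path)
    {
        // Only look at the file name, not at the directory part of the path.
        Match m = PoNumber.Match(Path.GetFileName(path));
        return m.Success ? m.Value : null;
    }
}
```

A directory loop could then call PoNumberExtractor.FromFileName on each path returned by Directory.EnumerateFiles and skip the null results.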

This regex will pick up every run of digits in a string: \d+ (with \d* you would also get empty matches).
As described here.

A possible regexp could be:
^.*(\d{10})\.\D{3}$

var re = new System.Text.RegularExpressions.Regex("(?<=^PurchaseOrder)PO\\d{10}(?=\\.pdf$)");
Assert.IsTrue(re.IsMatch("PurchaseOrderPO1234567890.pdf"));
Assert.IsFalse(re.IsMatch("some PurchaseOrderPO1234567890.pdf"));
Assert.IsFalse(re.IsMatch("OrderPO1234567890.pdf"));
Assert.IsFalse(re.IsMatch("PurchaseOrderPO1234567890.pdf2"));

C# Extracting a name from a string

I want to extract 'James\, Brown' from the string below, but I don't always know what the name will be. The comma is causing me some difficulty, so what would you suggest to extract James\, Brown?
OU=James\, Brown,OU=Test,DC=Internal,DC=Net
Thanks

A regex is likely your best approach
static string ParseName(string arg) {
    var regex = new Regex(@"^OU=([a-zA-Z\\]+\,\s+[a-zA-Z\\]+)\,.*$");
var match = regex.Match(arg);
return match.Groups[1].Value;
}

You can use a regex:
string input = @"OU=James\, Brown,OU=Test,DC=Internal,DC=Net";
Match m = Regex.Match(input, "^OU=(.*?),OU=.*$");
Console.WriteLine(m.Groups[1].Value);

A quite brittle way to do this might be...
string name = @"OU=James\, Brown,OU=Test,DC=Internal,DC=Net";
string[] splitUp = name.Split("=".ToCharArray(),3);
string namePart = splitUp[1].Replace(",OU","");
Console.WriteLine(namePart);
I wouldn't necessarily advocate this method, but I've just come back from a departmental Christmas lunch and my brain is not fully engaged yet.

I'd start off with a regex to split up the groups:
Regex rx = new Regex(@"(?<!\\),");
String test = "OU=James\\, Brown,OU=Test,DC=Internal,DC=Net";
String[] segments = rx.Split(test);
But from there I would split each segment manually, so that you don't depend on a regex for anything more than the separator character. Since this looks like an LDAP query, it might not matter if you always look at segments[0], but there is a chance that the name might be set as "CN=". You can cover both cases by just reading the query like this:
String name = segments[0].Split(new[] { '=' }, 2)[1];
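Put together, the two steps above might look like the sketch below. DnParser and FirstRdnValue are hypothetical names; the final Replace that turns \, back into a plain comma is an added assumption about what the caller wants, and the sketch does not try to handle every escape defined for distinguished names.

```csharp
using System.Text.RegularExpressions;

public static class DnParser
{
    // Returns the value of the first component, e.g. "James, Brown".
    public static string FirstRdnValue(string dn)
    {
        // Split on commas that are not preceded by a backslash.
        string[] segments = Regex.Split(dn, @"(?<!\\),");

        // Split "OU=value" or "CN=value" into at most two parts.
        string[] pair = segments[0].Split(new[] { '=' }, 2);
        if (pair.Length < 2) return string.Empty;

        // Assumption: the caller wants the escaped comma unescaped.
        return pair[1].Replace(@"\,", ",");
    }
}
```

Because the attribute name is never inspected, the same call works whether the first component starts with OU= or CN=.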

That looks suspiciously like an LDAP or Active Directory distinguished name formatted according to RFC 2253/4514.
Unless you're working with well known names and/or are okay with a fragile hackaround (like the regex solutions) - then you should start by reading the spec.
If you, like me, generally hate implementing code according to RFCs - then hope this guy did a better job following the spec than you would. At least he claims to be 2253 compliant.

If the backslash is always there, I would look at potentially using a regex to do the match; you can use match groups for the last and first names.
^OU=([a-zA-Z]+)\\,\s([a-zA-Z]+)
That RegEx will match names that include characters only, you will need to refine it a bit for better matching for the non-standard names. Here is a RegEx tester to help you along the way if you go this route.

Replace \, with your own preferred magic string (perhaps &#44;), split on the remaining commas or search until the first comma, then replace your magic string with a single comma.
i.e. Something like:
string originalStr = @"OU=James\, Brown,OU=Test,DC=Internal,DC=Net";
string replacedStr = originalStr.Replace(@"\,", "&#44;");
string name = replacedStr.Substring(0, replacedStr.IndexOf(","));
Console.WriteLine(name.Replace("&#44;", ","));

Assuming you're running in Windows, use PInvoke with DsUnquoteRdnValueW. For code, see my answer to another question: https://stackoverflow.com/a/11091804/628981

If the format is always the same:
string line = GetStringFromWherever();
int start = line.IndexOf("=") + 1;//+1 to get start of name
int end = line.IndexOf("OU=",start) -1; //-1 to remove comma
string name = line.Substring(start, end - start);
Forgive if syntax is not quite right - from memory. Obviously this is not very robust and fails if the format ever changes.
