How to parse this kind of string with a regex - C#

I want to parse this string with a regex, but I really don't know how. I have figured out how to get the ID at the start, but not the other parts.
string text = "1cb07348-34a4-4741-b50f-c41e584370f7 Youtuber https://youtube.com/lol love youtube";
string regexstring = "[a-z0-9]+-[a-z0-9]+-[a-z0-9]+-[a-z0-9]+-[a-z0-9]*(?<id>)";
Code
Match m = Regex.Match(text, regexstring);
if (m.Success)
    Console.WriteLine(m.Groups[0]);
Output
1cb07348-34a4-4741-b50f-c41e584370f7
Now I want the output to be this:
1cb07348-34a4-4741-b50f-c41e584370f7
Youtuber
https://youtube.com/lol
love youtube
What I have so far produces the first line of the output, but I don't know how to match the other parts.

(\w+-){4}\w+ is a cleaner way to write what you already have (the ID has five groups separated by four hyphens).
\w is roughly [a-zA-Z0-9_]; in .NET it also matches other Unicode letters and digits unless you pass RegexOptions.ECMAScript.
Then, if your string always has a website preceded and followed by a number of words separated by spaces, you can do this:
string regexstring = @"((?:\w+-){4}\w+) (.+?) (https?://\S+) (.+)";
Code
Match m = Regex.Match(text, regexstring);
if (m.Success)
    Console.WriteLine(m.Groups[1] + "\n" + m.Groups[2] + "\n" + m.Groups[3] + "\n" + m.Groups[4]);
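For reference, here is a minimal sketch of the same idea using named groups, assuming every input has the shape "id name url free text" as in the sample. The class name RecordParser, the method Parse, and the group names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RecordParser
{
    // Assumed layout: GUID-like id, a name, an http(s) URL, then free text.
    private static readonly Regex Pattern = new Regex(
        @"^(?<id>[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})\s+(?<name>.+?)\s+(?<url>https?://\S+)\s+(?<rest>.+)$");

    // Returns the four parts, or null when the input does not have the assumed shape.
    public static string[] Parse(string text)
    {
        Match m = Pattern.Match(text);
        if (!m.Success)
            return null;
        return new[]
        {
            m.Groups["id"].Value,
            m.Groups["name"].Value,
            m.Groups["url"].Value,
            m.Groups["rest"].Value
        };
    }

    public static void Main()
    {
        string text = "1cb07348-34a4-4741-b50f-c41e584370f7 Youtuber https://youtube.com/lol love youtube";
        string[] parts = Parse(text);
        if (parts != null)
            Console.WriteLine(string.Join("\n", parts)); // one part per line
    }
}
```

Named groups avoid having to count parentheses when the pattern changes later.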

Assuming all your inputs look like the sample, this expression might be close to what you have in mind:
^(\b[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}\b)\s+(.*?)\s+(https?:\/\/\S+)\s+(.*)$
The expression is explained in the top right panel of regex101.com if you wish to explore, simplify, or modify it, and in this link you can watch how it matches against some sample inputs.
Reference
Searching for UUIDs in text with regex

Related

Remove special characters from string with unicode

The most popular answer I found to this question is:
Regex.Replace(value, "[^a-zA-Z0-9]+", " ", RegexOptions.Compiled);
However, if users type a non-English name when billing, this method treats the non-English letters as special characters and removes them.
Is there a way to make this work for most users, since my website is multi-language?
Make it Unicode-aware:
var res = Regex.Replace(value, @"[^\p{L}\p{M}\p{N}]+", " ");
If you plan to keep only regular digits, use 0-9 in place of \p{N}.
The regex matches one or more characters other than Unicode letters (\p{L}), diacritics (\p{M}) and digits (\p{N}).
You might consider var res = Regex.Replace(value, @"\W+", " "), but it will keep _ since the underscore is a "word" character.
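A small sketch of the Unicode-aware replacement on non-English names. The class NameCleaner, the method Clean, the sample inputs, and the final Trim() are assumptions added for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NameCleaner
{
    // Anything that is not a Unicode letter, combining mark, or number.
    private static readonly Regex NonWord = new Regex(@"[^\p{L}\p{M}\p{N}]+");

    // Replaces each run of unwanted characters with one space.
    // Trim() is an extra step (not in the answer) that drops a leading or trailing space.
    public static string Clean(string value)
    {
        return NonWord.Replace(value, " ").Trim();
    }

    public static void Main()
    {
        Console.WriteLine(Clean("José_Ñandú!!"));          // underscore and "!!" are removed
        Console.WriteLine(Clean("Łukasz (Kraków) #42"));   // letters and digits are kept
    }
}
```

Unlike \W+, this class also removes the underscore, because _ is not in \p{L}, \p{M}, or \p{N}.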
I found that the best way to achieve this and make it work with all languages is to build a pattern of all banned characters. Look at this code:
string input = @"heya's #FFFFF , CUL8R M8 how are you?'"; // This is the input string
string regex = @"[!""#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~]"; // Banned characters; add every character you don't want displayed here.
Match m;
// Regex.Match never returns null, so loop on m.Success to avoid an infinite loop
while ((m = Regex.Match(input, regex)).Success)
{
    input = input.Remove(m.Index, m.Length);
}
input = Regex.Replace(input, " {2,}", " "); // collapse runs of two or more spaces into one
MessageBox.Show(input);
Hope it works!
PS: As others posted here, there are other ways. I personally prefer this one even though it's more code. Choose whichever fits your needs best.

Why does this function for finding the n-th occurrence not work on text with line breaks?

I found the following code to find the n-th occurrence of a value in a text here.
This is the code:
public static int NthIndexOf(this string target, string value, int n)
{
    Match m = Regex.Match(target, "((" + value + ").*?){" + n + "}");
    if (m.Success)
        return m.Groups[2].Captures[n - 1].Index;
    else
        return -1;
}
I tried to find the index of the second occurrence of "< /form>" (the space does not appear in the original string) in a webpage, and it failed, although the string definitely exists in the text. I also cut off a prefix of the webpage so that the second occurrence became the first, and then I was able to find it as the first occurrence.
In one of the comments on this code, someone wrote: "This Regex does not work if the target string contains linebreaks."
My two questions are:
Why doesn't this code work if the target string contains line breaks?
How can I fix this code so that it also works for strings that contain line breaks (replacing or removing the line breaks is not a good solution for me)?
I am not looking for other techniques to do the same thing.
By default, . matches any character except a newline, so .*? only matches up to the end of the line.
For what you want, you need to use Singleline mode, so your code should look something like this:
Match m = Regex.Match(target, "((" + value + ").*?){" + n + "}", RegexOptions.Singleline);
By default, . in a regular expression stops at a newline. To fix that, specify the regex option RegexOptions.Singleline (not RegexOptions.Multiline, which only changes how ^ and $ behave):
Match m = Regex.Match(target, "((" + value + ").*?){" + n + "}", RegexOptions.Singleline);
You can find more information about RegexOptions here.
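Putting the fix together, a sketch of the extension method with RegexOptions.Singleline might look like the following. Regex.Escape is an addition not in the original code; it assumes value is meant as literal text, not as a pattern. The sample string is made up, and n is assumed to be at least 1.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StringExtensions
{
    // Returns the index of the n-th occurrence of value (n >= 1), or -1 if there is none.
    public static int NthIndexOf(this string target, string value, int n)
    {
        // Regex.Escape (an addition) makes characters like '(' or '.' in value literal.
        string pattern = "((" + Regex.Escape(value) + ").*?){" + n + "}";
        // Singleline lets '.' match '\n', so '.*?' can cross line breaks.
        Match m = Regex.Match(target, pattern, RegexOptions.Singleline);
        return m.Success ? m.Groups[2].Captures[n - 1].Index : -1;
    }

    public static void Main()
    {
        string html = "<form>\n</form>\n<form>\n</form>\n";
        // Without Singleline this lookup fails, because the two closing tags are on different lines.
        Console.WriteLine(html.NthIndexOf("</form>", 2));
    }
}
```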

Regular Expression without braces

I have the following sample cases:
1) "Sample"
2) "[10,25]"
I want to form a single regular expression pattern that returns "Sample" and "10,25" when the above examples are passed to it.
Note: the input strings do not include the quotes.
I came up with the expression (?<=\[)(.*?)(?=\]). It satisfies the second case and retrieves only "10,25", but for the first case it returns an empty result, whereas I want "Sample" to be returned. Can anyone help me?
I am using C#.
Here you go: a small regex using a positive lookbehind. These are sometimes very handy.
Regex
(?<=^|\[)([\w,]+)
Test string
Sample
[10,25]
Result
MATCH 1
[0-6] Sample
MATCH 2
[8-13] 10,25
try at regex101.com
If " is included in your original string, use the regex below, which looks for the " mark as well. You may remove ^| from the lookbehind if the " mark is always present, or leave it as is if your text has a mix of values with and without " marks.
Regex
(?<=^|\[|\")([\w,]+)
try at regex101.com
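Since the question is about C#, here is a short sketch that applies the lookbehind pattern (?<=^|\[)([\w,]+) to one input at a time. The class BracketContent and the method Extract are hypothetical names; an empty string is returned when nothing matches.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BracketContent
{
    // The match must start either at the beginning of the input or right after '['.
    private static readonly Regex Pattern = new Regex(@"(?<=^|\[)[\w,]+");

    // Returns the matched text, or "" when there is no match.
    public static string Extract(string input)
    {
        return Pattern.Match(input).Value;
    }

    public static void Main()
    {
        Console.WriteLine(Extract("Sample"));
        Console.WriteLine(Extract("[10,25]"));
    }
}
```

Because [\w,]+ only allows word characters and commas, an input such as "[hello&bye]" would be cut off at the "&".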
As far as I can tell, the regex below should help:
Regex regex = new Regex(@"^\w+|[[](\w)+\,(\w)+[]]$");
This matches either a single word, or two alphanumeric words separated by a comma inside square brackets. Note that in the second case the match itself still includes the brackets.
One Java example:
// String input = "Sample";
String input = "[10,25]";
String text = "[^,\\[\\]]+";
Pattern pMod = Pattern.compile("(" + text + ")|(?>\\[(" + text + "," + text + ")\\])");
Matcher mMod = pMod.matcher(input);
while (mMod.find()) {
    if (mMod.group(1) != null) {
        System.out.println(mMod.group(1));
    }
    if (mMod.group(2) != null) {
        System.out.println(mMod.group(2));
    }
}
if input is "[hello&bye,25|35]", then the output is hello&bye,25|35

How do I do this with one regular expression pattern instead of three?

I think I need to use an alternation construct but I can't get it to work. How can I get this logic into one regular expression pattern?
match = Regex.Match(message2.Body, @"\r\nFrom: .+\(.+\)\r\n");
if (match.Success)
    match = Regex.Match(message2.Body, @"\r\nFrom: (.+)\((.+)\)\r\n");
else
    match = Regex.Match(message2.Body, @"\r\nFrom: ()(.+)\r\n");
EDIT:
Some sample cases should help with your questions
From: email
and
From: name(email)
Those are the two possible cases. I'm looking to match them so I can do
string name = match.Groups[1].Value;
string email = match.Groups[2].Value;
Suggestions for a different approach are welcome!
Thanks!
This is literally what you're asking for: "(?=" + regex1 + ")" + regex2 + "|" + regex3
match = Regex.Match(message.Body, @"(?=\r\nFrom: (.+\(.+\))\r\n)\r\nFrom: (.+)\((.+)\)\r\n|\r\nFrom: ()(.+)\r\n");
But I don't think that's really what you want.
With .NET's Regex, you can name groups like this: (?<name>regex).
match = Regex.Match(message.Body, @"\r\nFrom: (?<one>.+)\((?<two>.+)\)\r\n|\r\nFrom: (?<one>)(?<two>.+)\r\n");
Console.WriteLine (match.Groups["one"].Value);
Console.WriteLine (match.Groups["two"].Value);
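Your \r\n is fine as written: inside a verbatim string the regex engine itself reads \r\n as CR LF. You can still simplify the pattern by anchoring on line boundaries with RegexOptions.Multiline. In .NET, $ only matches before \n and . matches \r, so exclude \r from the captures and allow an optional \r before $:
match = Regex.Match(message.Body, @"^From: (?:(?<one>[^\r\n]+)\((?<two>[^\r\n]+)\)|(?<one>)(?<two>[^\r\n]+))\r?$", RegexOptions.Multiline);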
Console.WriteLine (match.Groups["one"].Value);
Console.WriteLine (match.Groups["two"].Value);
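As a self-contained sketch of the single-pattern approach, the helper below handles both "From: email" and "From: name(email)". The class FromHeader, the method TryParse, the group names, and the sample addresses are assumptions for illustration; it assumes the header sits on its own line and that lines end in either \n or \r\n.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FromHeader
{
    // First alternative: name(email). Second alternative: email only.
    // .NET allows the same group name ("email") in both alternatives.
    private static readonly Regex Pattern = new Regex(
        @"^From: (?:(?<name>[^\r\n]+)\((?<email>[^\r\n]+)\)|(?<email>[^\r\n]+))\r?$",
        RegexOptions.Multiline);

    // name is "" when the header carries only an email address.
    public static bool TryParse(string body, out string name, out string email)
    {
        Match m = Pattern.Match(body);
        name = m.Groups["name"].Value;
        email = m.Groups["email"].Value;
        return m.Success;
    }

    public static void Main()
    {
        string body = "Subject: hi\r\nFrom: Alice(alice@example.com)\r\nTo: someone\r\n";
        if (TryParse(body, out string name, out string email))
            Console.WriteLine(name + " / " + email);
    }
}
```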

Replace multiple lines with .net Regex

I am new to Stack Overflow (this is my first post) and to regex.
Currently I am working on a quick and dirty app to replace base class properties with constructor-injected fields (because I need to edit about 400 files).
It should find this:
ClassName(WiredObjectRegistry registry) : base(registry)
{
and replace with:
ClassName(IDependency paramName, ISecondDependency secondParam, ... )
{
_fieldName = paramName;
...
So I need to replace the two old lines with three or more new lines.
Basically, I was thinking:
find this ->
className + ctorParams + zero or more whitespaces + newline + zero or more whitespaces + {
replace with ->
className + newCtorParams + newline + { + my field assignments
I tried this regex for .NET:
className + ctorParam + @"\w*" + "\r|\n" + @"\w*" + @"\{"
which does not replace the "{" and the whitespace correctly.
The replaced file content looks like this:
public CacheManager(ICallManager callManager, ITetraEventManager tetraEventManager, IConferenceManager conferenceManager, IAudioManager audioManager)
{
_callManager = callManager;
_tetraEventManager = tetraEventManager;
_conferenceManager = conferenceManager;
_audioManager = audioManager;
{
Can you please help me with this? :-|
david
If you're translating
className + ctorParams + zero or more whitespaces + newline + zero or more whitespaces + {
into regex as
className + ctorParam + @"\w*" + "\r|\n" + @"\w*" + @"\{"
then you're making several errors.
First, the character class for whitespace is \s. \w matches a "word" character (a letter, digit, or underscore).
Second, "\r|\n" puts the alternation operator | at the top level, which splits the entire regex into two alternatives (= "match either everything before the | or everything after the |"). In your case, you don't need this bit at all, since \s already matches spaces, tabs, and newlines. If you do want a regex that matches a Unix, old Mac, or DOS newline, use (?:\r\n?|\n); the shorter \r?\n? would also match the empty string.
But, as the comments show, unless you show us what you really want to do, we can't help you further.
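As a rough sketch of the corrected pattern, assuming the old constructor always reads "ClassName(WiredObjectRegistry registry) : base(registry)" followed by whitespace and "{", the replacement could look like this. The class CtorRewriter, the method Rewrite, and its parameters are hypothetical, and the four-space indentation of the generated assignments is arbitrary.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CtorRewriter
{
    public static string Rewrite(string source, string className, string newParams, string[] assignments)
    {
        // Regex.Escape keeps the parentheses in the literal part from being read as groups.
        // \s* covers spaces, tabs, and line breaks before the opening brace.
        string pattern =
            Regex.Escape(className + "(WiredObjectRegistry registry) : base(registry)") + @"\s*\{";

        string replacement =
            className + "(" + newParams + ")\n{\n    " + string.Join("\n    ", assignments);

        // A MatchEvaluator avoids '$' in the replacement being read as a group reference.
        return Regex.Replace(source, pattern, _ => replacement);
    }

    public static void Main()
    {
        string source = "public CacheManager(WiredObjectRegistry registry) : base(registry)\r\n    {\r\n    }";
        Console.WriteLine(Rewrite(source, "CacheManager", "ICallManager callManager",
            new[] { "_callManager = callManager;" }));
    }
}
```

Because the opening brace is part of the match, the output contains only the brace written by the replacement, which avoids the duplicated "{" shown in the question.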
