Regex to extract only domain from sub-domains [duplicate] - c#

This question already has answers here:
What is a regular expression which will match a valid domain name without a subdomain?
(23 answers)
Closed 7 years ago.
I will be using the expression with
Regex.Replace();
to replace the rest with "".
Input:
http://therealzenstar.blogspot.fr
Output:
blogspot.fr

Just to iterate on Jens' comment, we have to guess: what is your expected output when additional information appears, e.g. http://therealzenstar.blogspot.fr/somedata.html? Is it still blogspot.fr? Do such cases need to be addressed?
You said you want to replace "everything else" with "". Replace() replaces everything that is matched with the replacement you supply. So, to replace the rest with "", you'd need to match everything that you do not want. That's possible, but it's much easier to match the whole string, capture what you DO want, and replace the entire match with $1.
Assuming you always want only the domain.xx, even if more information appears, something like this will work: ^(?:https?:\/\/)?[^\/\s]*\.([^.\s\/]*\.[^.\s\/]*)(?:$|\/.*), as seen at https://regex101.com/r/hN8iQ7/1
A problem arises if your domains also include those with multiple extensions, e.g. domain.co.uk. You'd need to address them specifically (by naming them), as it is very hard to generalize a way to distinguish these items.
^(?:https?:\/\/)?[^\/\s]*?\.([^.\s\/]*\.(?:co\.uk|[^.\s\/]*))(?:$|\/.*) - with .co.uk option added. https://regex101.com/r/hN8iQ7/2 .
yourregex.Replace(yourstring, "$1") may do what you need.
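Put together, the approach above might look like the following minimal sketch. The class and method names (DomainExtractor, ExtractDomain) are made up for illustration; the pattern is the .co.uk variant quoted above, and any other multi-part suffix would still have to be listed by hand.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DomainExtractor
{
    // Pattern from the answer (variant with the .co.uk alternative).
    // The whole URL is matched; group 1 captures only the part to keep.
    private static readonly Regex DomainPattern = new Regex(
        @"^(?:https?:\/\/)?[^\/\s]*?\.([^.\s\/]*\.(?:co\.uk|[^.\s\/]*))(?:$|\/.*)");

    // Replaces the entire match with group 1.
    // Input that does not match the pattern is returned unchanged.
    public static string ExtractDomain(string url)
    {
        return DomainPattern.Replace(url, "$1");
    }
}
```

For example, DomainExtractor.ExtractDomain("http://therealzenstar.blogspot.fr/somedata.html") returns blogspot.fr.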

Related

C# Trim() is not working in one of my projects [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 2 years ago.
I have a silly problem with an ASP.NET project that I have been working on for more than 5 years.
Today the Trim() function suddenly stopped working.
Notes:
I updated the project framework from 4.7.2 to 4.8 and the problem still happens.
I tried TrimStart() and TrimEnd(); they have the same problem as Trim().
The Replace function still works fine and is not affected, but using it everywhere is not a good solution for me.
Trim works fine in new projects.
I also checked the char code of the space; it is 32.
Any idea?
This is the code:
string val = "abo    \u200e"; // "abo\x0020\x0020\x0020\x0020\x200e"
string userName = val.Trim();
You can run the problem in this link.
Update:
Thank you all for the comments. I also found a simple check for the end of the string: when you set the cursor at the end and press Backspace, nothing happens on the first press and deletion starts only on the second press, because of the \x200e char at the end.
Any idea how to trim hidden chars from the left and right and treat them just like spaces?
String.Trim works. If it didn't, hundreds of thousands of developers would have noticed 16 years ago.
The string ends with a formatting character, specifically \x200e, the Left-to-Right Mark. That's definitely not whitespace. Calling Char.GetUnicodeCategory('\u200e') returns Format. I suspect the input came from mixed Arabic and Latin text, perhaps something copied by a user from a longer string?
One way to handle this is to use String.Trim(char[]) specifying the LTR mark along with other characters. That's not quite the same as String.Trim() though, which removes any character that returns true with Char.IsWhiteSpace() :
var userName = val.Trim(' ', '\t', '\r', '\n', '\u200e');
Another option would be to use a regular expression that trims both whitespace \s and characters in the Format Unicode category Cf, only from the start (^[\p{Cf}\s]+) or end ([\p{Cf}\s]+$):
string userName = Regex.Replace(val, @"(^[\p{Cf}\s]+)|([\p{Cf}\s]+$)", "");
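The same idea can be written without a regular expression. The sketch below is only an illustration, not a drop-in replacement for String.Trim(): the names InvisibleTrim, IsTrimmable and TrimInvisible are invented here, and the assumption is that whitespace and Format (Cf) characters are the only ones that should be stripped from both ends.

```csharp
using System;
using System.Globalization;

public static class InvisibleTrim
{
    // Treats whitespace and Unicode Format (Cf) characters, such as U+200E, as trimmable.
    private static bool IsTrimmable(char c)
    {
        return char.IsWhiteSpace(c)
            || char.GetUnicodeCategory(c) == UnicodeCategory.Format;
    }

    // Walks inward from both ends and returns the part between the first
    // and last characters that are not trimmable.
    public static string TrimInvisible(string value)
    {
        int start = 0;
        int end = value.Length;
        while (start < end && IsTrimmable(value[start])) start++;
        while (end > start && IsTrimmable(value[end - 1])) end--;
        return value.Substring(start, end - start);
    }
}
```

With the string from the question, InvisibleTrim.TrimInvisible("abo    \u200e") returns "abo", whereas Trim() leaves the string untouched because its last character is not whitespace.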
Perhaps a better option would be to prevent unexpected characters using input validation, and require that the input TextBox contains only letter or letter and digit characters. After all, the user could paste some other unexpected non-printable character. It's better to warn the user than try to handle all possible bad data.
Usernames are typically letter and number combinations without whitespace. All ASP.NET stacks allow validation. Modern browsers allow regular expression validation in the input element directly, so we could come up with a regex that allows only valid characters, e.g.:
<input type="text" required pattern="[A-Za-z0-9]+" ..../>
The Unicode categories L (letters) and Nd (decimal digits) could be used to match letters and digits in any language, just like Cf is used to match format characters.

.Net Regex - trying to locate a one liner to convert the case of fully qualified server names [duplicate]

This question already has answers here:
Use C# regex to convert casing in a string
(3 answers)
Use RegEx to uppercase and lowercase the string
(2 answers)
Closed 4 years ago.
Trying to answer myself an academic exercise here.
Is there a method using regular expressions (.NET syntax, so see the caveat below) to convert a fully qualified server name to a mixed-case string (server name in UPPER case, domain name(s) in lower case)?
e.g.
db01.local => DB01.local
DB02.TEST.LOCAL => DB02.test.local
db03.LOCAL => DB03.local
I've been playing around with the RE and so far have ([A-Za-z0-9-]+)\.(.+) as the pattern, but I'm struggling how to do this in a simple one liner.
My initial tests had me fritzing with Matches and getting a returned list, but that feels fugly to me because I then need to check the number of matches, do casting and ToUpper() \ ToLower() operations etc. and, yeah, well...
Caveat: If I wasn't using .NET then I think I should be able to do something simple like use \U${1}.\L${2} as my replacement string, but it doesn't look like .NET supports that syntax.
Using the 'Possible duplicate of' link this is what I ended up with:
Regex re = new Regex(@"([A-Za-z0-9-]+)\.(.+)");
foreach (var i in _knownServers)
{
    clean.Add(re.Replace(i, m => $"{m.Groups[1].ToString().ToUpper()}.{m.Groups[2].ToString().ToLower()}"));
}
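For reference, a self-contained version of the same MatchEvaluator approach could look like the sketch below. The names ServerNameCasing and Normalize are hypothetical, and the ^ and $ anchors and the invariant-culture casing calls are additions that were not in the snippet above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ServerNameCasing
{
    // Group 1: host name (everything before the first dot).
    // Group 2: the remaining domain labels.
    private static readonly Regex HostAndDomain =
        new Regex(@"^([A-Za-z0-9-]+)\.(.+)$");

    // Upper-cases the host part and lower-cases the domain part.
    // Names without a dot do not match and are returned unchanged.
    public static string Normalize(string fqdn)
    {
        return HostAndDomain.Replace(fqdn, m =>
            m.Groups[1].Value.ToUpperInvariant() + "." +
            m.Groups[2].Value.ToLowerInvariant());
    }
}
```

ServerNameCasing.Normalize("DB02.TEST.LOCAL") returns DB02.test.local.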

How do I make this regex stop at the first match? [duplicate]

This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
Closed 6 years ago.
I'm converting a lot of code from legacy to maintainable and I'm creating a list of regexes we can use to convert all the pages quickly and consistently. My regex skills are those of a child running with a knife... they're not great. I've looked up a lot of different ways to match only the first set, but I can't seem to get it to work. Can anyone solve this specific problem for me?
Here is the regex search and replace I'm using.
regex: (rs.*)\.Fields\[\"(\w+)\"\].Value
replace: $1.GetValue<object>("$2")
Works
code to search: ...rsProducts.Fields["Price"].Value...
result: rsProducts.GetValue<object>("Price")
This, as I want it to, finds the rs (recordset) of something and changes the way that we extract the value to use an extension method.
Does Not Work
code to search: ...rsProducts.Fields["Price"].Value + rsProducts.Fields["Price2"].Value...
result: rsProducts.Fields["Price"].Value + rsProducts.Fields["Price2"].Value
should be: rsProducts.GetValue<object>("Price") + rsProducts.GetValue<object>("Price2")
In this case the search does not match 2 distinct instances; instead it matches the entire line (checked on regexr.com).
You're not handling the + between the two: the greedy .* runs on to the last .Fields on the line. Make the quantifier lazy:
(rs.*?)\.Fields\[\"(\w+)\"\].Value
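A quick way to check the lazy version is to run it over the failing line from the question. This is a minimal sketch: the names RecordsetRewriter and Rewrite are invented for illustration, and the dot before Value is escaped here, which the pattern above does not do.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RecordsetRewriter
{
    // Lazy .*? stops at the first ".Fields[" after each "rs",
    // so every field access on a line becomes its own match.
    private static readonly Regex FieldAccess =
        new Regex(@"(rs.*?)\.Fields\[""(\w+)""\]\.Value");

    // $1 is the recordset expression, $2 the field name.
    public static string Rewrite(string code)
    {
        return FieldAccess.Replace(code, "$1.GetValue<object>(\"$2\")");
    }
}
```

Rewrite applied to rsProducts.Fields["Price"].Value + rsProducts.Fields["Price2"].Value yields rsProducts.GetValue<object>("Price") + rsProducts.GetValue<object>("Price2").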

Best way to remove unknown characters and spaces using C#? [duplicate]

This question already has answers here:
How can I remove the spaces, tabs, new lines between characters using c#'s REGEX?
(2 answers)
Closed 6 years ago.
Unknown Characters:
|b9-12-2016,¢Xocoak¡LO2A35(2)(b)¡ÓocORe3ao-i|],¢Xa?u¡±o¡±i?¢X$3,597,669On 9-12-2016, the price adjusted to $3,597,669 dueto the reason allowed under section 35(2)(b) of theOrdinance
Good Result:
$3,597,669On 9-12-2016, the price adjusted to $3,597,669 due to the reason allowed under section 35 of the Ordinance
You should be able to use regular expressions to do this. The Regex.Replace method runs a regular expression against your text and replaces every match. Regular expressions are patterns that a regular expression engine tries to match in input text. I recommend that you take a look at the MSDN article on regular expressions and at the documentation for the Regex.Replace method. For example, to remove the letter c you could use this snippet of code:
output = Regex.Replace(input, "c", "", RegexOptions.IgnoreCase);
This would replace both lowercase and capital Cs because the ignore case option is turned on.
If the text always follows the pattern you've described, use the following code. It takes everything from the first $ sign onward.
string str = "|b9-12-2016,¢Xocoak¡LO2A35(2)(b)¡ÓocORe3ao-i|],¢Xa?u¡±o¡±i?¢X$3,597,669On 9-12-2016, the price adjusted to $3,597,669 dueto the reason allowed under section 35(2)(b) of theOrdinance";
var result = str.Substring(str.IndexOf('$'));
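A slightly more defensive sketch of the same idea follows. It starts at the first $ sign, which is where the desired output in the question begins, and returns the text unchanged when no $ is present. The names PriceNoteCleaner and FromFirstDollar are hypothetical, and it relies on the assumption that the garbled prefix never contains a $ of its own.

```csharp
using System;

public static class PriceNoteCleaner
{
    // Returns the text starting at the first '$'.
    // If there is no '$', the input is returned unchanged instead of throwing.
    public static string FromFirstDollar(string text)
    {
        int index = text.IndexOf('$');
        return index >= 0 ? text.Substring(index) : text;
    }
}
```

This only cuts off the leading garbage; it does not restore the missing spaces in "dueto" or "theOrdinance".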

Regular expression for characters after '.' [duplicate]

This question already has answers here:
How do I match an entire string with a regex?
(8 answers)
Closed 6 years ago.
I need to detect the following format when I enter a serial number like
CK123456.789
I used Regex with pattern of
^(CV[0-9]{6}\.[0-9]{3}
to match but if I enter
CK123456.7890
it is still able to proceed without flagging an error. Is there a better regular expression to check for exactly 3 trailing digits after the '.'?
Depending on how you use the regular expression matcher, you might need to enclose it in ^...$ which forces the pattern to be the whole string, i.e.
^CK[0-9]{6}\.[0-9]{3}$ (Note the CK prefix).
I've also removed your leading (mismatched) parenthesis.
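In C# the anchored pattern could be used as in the sketch below; SerialNumberValidator and IsValid are illustrative names, not part of any existing code base.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SerialNumberValidator
{
    // ^ and $ force the pattern to cover the whole input:
    // "CK", six digits, a literal dot, then exactly three digits.
    private static readonly Regex SerialPattern =
        new Regex(@"^CK[0-9]{6}\.[0-9]{3}$");

    public static bool IsValid(string serial)
    {
        return SerialPattern.IsMatch(serial);
    }
}
```

IsValid("CK123456.789") is true and IsValid("CK123456.7890") is false. Note that in .NET, $ also matches just before a final newline; \z is the stricter choice if that matters.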
