I have a file loaded into a stream reader. The file contains IP addresses scattered about. Before and after each IP address there is a "\", if this helps. Also, the first "\" on each line ALWAYS comes before the IP address; there are no other "\" before this first one.
I already know that I should use a while loop to cycle through each line, but I don't know the rest of the procedure :<
For example:
Powerd by Stormix.de\93.190.64.150\7777\False
Cupserver\85.236.100.100\8178\False
Euro Server\217.163.26.20\7778\False
In the first example I would need "93.190.64.150"
In the second example I would need "85.236.100.100"
In the third example I would need "217.163.26.20"
I really struggle with parsing/splicing/dicing :s
Thanks in advance
*** I need to keep the IP in a string; a bool return is not sufficient for what I want to do.
using System.Text.RegularExpressions;
…
var sourceString = "put your string here";
var match = Regex.Match(sourceString, @"\b(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\b");
if(match.Success) Console.WriteLine(match.Captures[0]);
This will match any IP address, but also 999.999.999.999. If you need more exactness, see details here: http://www.regular-expressions.info/examples.html
The site has lots of great info on regular expressions, which are a domain-specific language used within most popular programming languages for text pattern matching. Actually, I think the site was put together by the author of Mastering Regular Expressions.
update
I modified the code above to capture the IP address, as you requested (by adding parentheses around the IP address pattern). We first check that there was a match using the Success property, and then read the IP address from Captures[0]. On a Match, Captures[0] is the text of the whole match, which here is exactly the IP address because the \b anchors consume no characters; match.Groups[1].Value returns the same string through the capture group.
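To tie this back to the while loop mentioned in the question, here is a minimal sketch rather than a definitive implementation. It assumes the file is read line by line through a TextReader; a StringReader stands in for the StreamReader so the example runs on its own, and the names IpExtractor and ExtractIps are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

static class IpExtractor
{
    // Same loose pattern as in the answer: four groups of 1-3 digits separated by dots.
    private static readonly Regex IpPattern =
        new Regex(@"\b(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\b");

    // Reads the input line by line and collects the first IP-like string found on each line.
    public static List<string> ExtractIps(TextReader reader)
    {
        var result = new List<string>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            Match match = IpPattern.Match(line);
            if (match.Success)
            {
                result.Add(match.Groups[1].Value);
            }
        }
        return result;
    }
}

class Program
{
    static void Main()
    {
        // Assumption: in real code this would be a StreamReader over the file.
        string sample = "Powerd by Stormix.de\\93.190.64.150\\7777\\False\n" +
                        "Cupserver\\85.236.100.100\\8178\\False\n" +
                        "Euro Server\\217.163.26.20\\7778\\False";
        using (var reader = new StringReader(sample))
        {
            foreach (string ip in IpExtractor.ExtractIps(reader))
            {
                Console.WriteLine(ip);
            }
        }
    }
}
```

Each IP ends up in a plain string, so it can be stored or passed on rather than only tested for.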
EDIT: Edited to take account of the "slash at beginning and end" part.
Try to match each line against a regex of (all as one string; split for readability).
\\(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\
Full sample:
using System;
using System.Text.RegularExpressions;
class Program
{
    private static readonly Regex Pattern = new Regex
        (@"\\(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}" +
         @"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\");
    static void Main(string[] args)
    {
        Console.WriteLine(ContainsAddress("Bad IP \\400.100.100.100\\ xyz"));
        Console.WriteLine(ContainsAddress("Good IP \\200.255.123.100\\ xyz"));
        Console.WriteLine(ContainsAddress("No IP \\but slashes\\ xyz"));
        Console.WriteLine(ContainsAddress("Long IP \\123.100.100.100.100\\ x"));
    }
    static bool ContainsAddress(string line)
    {
        return Pattern.IsMatch(line);
    }
}
Looks like, for each line, you're looking for "^.*?\\(?<address>[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})\\.*$"
To break this down:
^ - matches the beginning of a line, helping ensure you'll start matching at the right point.
.*? - matches any character zero or more times, but as few times as possible.
\\ - matches the backslash character (escaped, because a single backslash is the regex escape character). Coupled with the two prior terms, this will get us to the first backslash of a line so we can capture the next term.
(?<address>...) - specifies a named group of characters that can be referred to from within matches. The text of the full match will be the entire line the way this is written, but this named group will be only what you're looking for out of the match.
[0-9]{1,3} - matches a sequence of between 1 and 3 digit characters. The [0-9] is equivalent to \d but I find that when a regex has fewer backslashes and more characters you'd normally see in the string, it's more understandable.
\. - matches a period.
.* - matches any character zero or more times. Used to skip to the end.
$ - matches the end of a line.
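As a rough sketch of how the named group could be read back in C# (the AddressParser class and GetAddress method are hypothetical names, not part of the answer):

```csharp
using System;
using System.Text.RegularExpressions;

static class AddressParser
{
    // The pattern from the answer, written as a verbatim string.
    private static readonly Regex LinePattern = new Regex(
        @"^.*?\\(?<address>[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})\\.*$");

    // Returns the text of the "address" group, or null when the line does not match.
    public static string GetAddress(string line)
    {
        Match match = LinePattern.Match(line);
        return match.Success ? match.Groups["address"].Value : null;
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(AddressParser.GetAddress(@"Euro Server\217.163.26.20\7778\False"));
        // prints: 217.163.26.20
    }
}
```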
Related
I have a problem with a regex command.
I have a file with tons of lines and a lot of sensitive characters;
this is an Example with all sensitive case 0123456789/*-+.&é"'(-è_çà)=~#{[|`\^#]}²$*ù^%µ£¨¤,;:!?./§<>AZERTYUIOPMLKJHGFDSQWXCVBNazertyuiopmlkjhgfdsqwxcvbn
I tried many regex commands but never got the expected result.
I have to select everything from Example to the end.
I tried this command on https://www.regextester.com/ :
\sExample(.*?)+
And when I tried it in C# the only result I get was : Example
I don't understand why --'
Here's a quick chat about greedy and pessimistic (more commonly called lazy) matching:
Here is test data:
Example word followed by another word and then more
Here are two regex:
Example.*word
Example.*?word
The first is greedy. Regex will match Example, then it will take .*, which consumes everything all the way to the END of the string, and then works backwards, spitting a character at a time back out, trying to make the match succeed. It will succeed when Example word followed by another word is matched, the .* having matched word followed by another (and the spaces at either end).
The second is pessimistic; it nibbles forwards along the string one character at a time, trying to match. Regex will match Example, then it'll take one more character into the .*? wildcard, then check whether word comes next, which it does. So pessimistic matching will only consume a single space, and the full match in pessimistic mode is Example word.
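The difference between the two patterns can be checked with a short C# sketch that runs both against the test data:

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string input = "Example word followed by another word and then more";

        // Greedy: .* grabs as much as it can, then backs off to the last "word".
        Console.WriteLine(Regex.Match(input, "Example.*word").Value);
        // prints: Example word followed by another word

        // Lazy: .*? grabs as little as it can, stopping at the first "word".
        Console.WriteLine(Regex.Match(input, "Example.*?word").Value);
        // prints: Example word
    }
}
```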
Because you say you want the whole string after Example I recommend use of a greedy quantifier so it just immediately takes the whole string that remains and declares a match, rather than nibbling forwards one at a time (slow)
This, then, will match (and capture) everything after Example:
\sExample(.*)
The brackets make a capture group. In C# we can name the group using ?<namehere> at the start of the brackets, and then everything that .* matches can be retrieved with:
Regex r = new Regex(@"\sExample(?<x>.*)");
Match m = r.Match("an Exampleblahblah");
Console.WriteLine(m.Groups["x"].Value); //prints: blahblah
Note that if your data contains newlines, . doesn't match a newline unless you enable RegexOptions.Singleline when you create the regex.
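A small sketch of the newline behaviour, using a made-up two-line input:

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Hypothetical input with a newline after the first line.
        string input = "an Example first line\nsecond line";

        // Default options: . stops at the newline, so the capture ends with the first line.
        Match plain = Regex.Match(input, @"\sExample(?<x>.*)");
        Console.WriteLine(plain.Groups["x"].Value.Contains("second"));   // False

        // Singleline: . also matches the newline, so the capture runs to the end of the input.
        Match single = Regex.Match(input, @"\sExample(?<x>.*)", RegexOptions.Singleline);
        Console.WriteLine(single.Groups["x"].Value.Contains("second"));  // True
    }
}
```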
I have the following:
https://www.example.com/my-suburl/sub-dept/xx-xxxx-xx-yyyyyy/
I'm trying to find the 'yyyyy' in the URL. So far I have:
(.*)\/sub-dept\/(.*[^\/])\/([^\/]*)$
Which matches on:
https://www.example.com/my-suburl
and
xx-xxxx-xx-yyyyyy
However, like I say, I need the 'yyyyy' specific match.
NON-C#-BASED SOLUTION
If xx are numbers in the actual strings, just use
\d+(?=\/$)
Or else, use
[^-\/]*(?=\/?$)
Note that in JS, there is no look-behind, thus, if you must check if /sub-dept/ is in front of the substring you need, you will have to rely on capturing group mechanism:
\/sub-dept\/[^\/]*-([^-\/]*)\/?
ORIGINAL ANSWER
Here is a regex you can use
(?<=/sub-dept/[^/]*-)[^/-]*(?=/$)
The regex matches a substring that contains 0 or more characters other than a / or - that is...
(?<=/sub-dept/[^/]*-) - preceded by /sub-dept/ followed by 0 or more characters other than / and then a hyphen
(?=/$) - followed by a / symbol right at the end of the string.
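A minimal C# sketch of the lookbehind pattern applied to the URL from the question:

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string url = "https://www.example.com/my-suburl/sub-dept/xx-xxxx-xx-yyyyyy/";

        // .NET allows variable-length lookbehind, so [^/]* inside (?<=...) is accepted.
        Match match = Regex.Match(url, @"(?<=/sub-dept/[^/]*-)[^/-]*(?=/$)");
        Console.WriteLine(match.Value); // prints: yyyyyy
    }
}
```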
Or, there is a non-regex way: split the string by /, get the last part and split by -. Here is an example (without error/null checking for demo sake):
var result = text.Trim('/').Split('/').LastOrDefault().Split('-').LastOrDefault();
I have the following string:
"483 432,96 (HM: 369 694,86; ZP: 32 143,48; NP: 4 507,19; SP: 40 800,62; SDS: 4 389,84; IP: 9 497,14; PvN: 3 157,25; ÚP: 3 102,14; GP: 808,28; PRFS: 15 332,16)"
What I am trying to do, is to retrieve all values (if they exist) for the following letters (I highlighted necessary values in bold below):
483 432,96 (HM: 369 694,86; ZP: 32 143,48; NP: 4 507,19; SP: 40 800,62; SDS: 4 389,84; IP: 9 497,14; PvN: 3 157,25; ÚP: 3 102,14; GP: 808,28; PRFS: 15 332,16)
I tried to retrieve values one by one with the following regex:
string regex = "NP: ^[0-9]^[\\s\\d]([.,\\s\\d][0-9]{1,4})?$";
But with no luck either (I am a newbie in Regex patterns).
Is it possible to retrieve all values in a one string (and then simply loop through the results), or do I have to go one key at the time?
Here is my full code:
string sTest = "483 432,96 (HM: 369 694,86; ZP: 32 143,48; NP: 4 507,19; SP: 40 800,62; SDS: 4 389,84; IP: 9 497,14; PvN: 3 157,25; ÚP: 3 102,14; GP: 808,28; PRFS: 15 332,16)";
string regex = "NP: ^[0-9]^[\\s\\d]([.,\\s\\d][0-9]{1,4})?$";
System.Text.RegularExpressions.MatchCollection coll = System.Text.RegularExpressions.Regex.Matches(sTest, regex);
String result = coll[0].Groups[1].Value;
You can't get them all with one regex, unless you are absolutely sure that they will all appear next to each other. Also, what would be the point of getting them all and having to split the result afterwards anyway? Here is a regex which would find the values you wanted:
(ZP|NP|SP|SDS|IP|PvN|ÚP|GP|PRFS): ([^;)]+)
Now the first group will be the key and the second group will be the value.
The idea is:
(x|y|z) matches either x or y or z
[^;)]+ matches something, which is not ; (because this is how they are currently delimited) or ) (for the last position) one or more times
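One way this could be used from C# is sketched below; the TagParser class and its Parse method are illustrative names, and Regex.Matches does the looping over every key/value pair:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class TagParser
{
    // Returns key/value pairs such as "NP" -> "4 507,19".
    public static Dictionary<string, string> Parse(string input)
    {
        var values = new Dictionary<string, string>();
        foreach (Match m in Regex.Matches(input, @"(ZP|NP|SP|SDS|IP|PvN|ÚP|GP|PRFS): ([^;)]+)"))
        {
            // Group 1 is the key, group 2 is the value.
            values[m.Groups[1].Value] = m.Groups[2].Value;
        }
        return values;
    }
}

class Program
{
    static void Main()
    {
        string sTest = "483 432,96 (HM: 369 694,86; ZP: 32 143,48; NP: 4 507,19; SP: 40 800,62; SDS: 4 389,84; IP: 9 497,14; PvN: 3 157,25; ÚP: 3 102,14; GP: 808,28; PRFS: 15 332,16)";
        Dictionary<string, string> values = TagParser.Parse(sTest);
        Console.WriteLine(values["NP"]);   // prints: 4 507,19
        Console.WriteLine(values["PRFS"]); // prints: 15 332,16
    }
}
```

Keys that are absent from the input are simply missing from the dictionary, so TryGetValue is the safer lookup when a value may not exist.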
I tried to retrieve values one by one with the following regex:
Let's fix your one-by-one regex:
Caret ^ outside the [] character class means "start of line", so your expression with two carets in different places will not match anything.
Use \d instead of [0-9] and \D instead of [^0-9]
Here is one expression that matches the NP: pattern:
NP: \d+\D\d+([.,]\d{1,4})?
Now convert it to an expression that matches other tags like this:
(NP|ZP|SP|...): \d+\D\d+([.,]\d{1,4})?
Applying this pattern in a loop repeatedly will let you extract the tags one by one.
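The loop could look roughly like the sketch below. It assumes only three tags in the alternation (the full list is elided above, so extend it as needed), and the TagLoop and FindTags names are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class TagLoop
{
    // Assumption: only three tags are listed here; add the others to the alternation as needed.
    private const string Pattern = @"(NP|ZP|SP): \d+\D\d+([.,]\d{1,4})?";

    // Walks through the input with Match/NextMatch and collects each "TAG: value" match.
    public static List<string> FindTags(string input)
    {
        var found = new List<string>();
        Match m = Regex.Match(input, Pattern);
        while (m.Success)
        {
            found.Add(m.Value);
            m = m.NextMatch();
        }
        return found;
    }
}

class Program
{
    static void Main()
    {
        string sTest = "483 432,96 (HM: 369 694,86; ZP: 32 143,48; NP: 4 507,19; SP: 40 800,62; SDS: 4 389,84)";
        foreach (string tag in TagLoop.FindTags(sTest))
        {
            Console.WriteLine(tag);
        }
        // prints:
        // ZP: 32 143,48
        // NP: 4 507,19
        // SP: 40 800,62
    }
}
```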
Here is an excerpt from my code:
string[] myStr =
{
    " Line1: active 56:09 - tst0063, tst0063",
    "Contacts accosiated with line 1 - tst0063, tst0063",
    "Line 1: 00:00:32 Wrap: 00:00:20 - tst0063, tst0063",
    "Line 1: 00:00:17 Active: 00:00:15 - tst0064, tst0064"
};
string sPattern = @"^Line(\s*\S*)*tst0063$";
RegexOptions options = RegexOptions.IgnoreCase;
foreach (string s in myStr)
{
    System.Console.Write(s);
    if (System.Text.RegularExpressions.Regex.IsMatch(s, sPattern, options))
    {
        System.Console.WriteLine(" - valid");
    }
    else
    {
        System.Console.WriteLine(" - invalid");
    }
    System.Console.ReadLine();
}
RegularExpressions.Regex.IsMatch hangs while working on the last line. I did some experiments, but still can't understand why it's hanging when there is no match in the end of the line. Please help!
The question is not why the fourth test hangs, but why the first three don't. The first string starts with a space, and the second starts with Contacts, neither of which matches the regex ^Line, so the first two match attempts fail immediately. The third string matches the regex; although it takes much longer than it should (for reasons I'm about to explain), it still seems instantaneous.
The fourth match fails because the string doesn't match the end part of the regex: tst0063$. When that fails, the regex engine backs up to the variable portion of the regex, (\s*\S*)*, and starts trying all the different ways to fit that onto the string. Unlike the third string, this time it has to try every possible combination of zero or more whitespace characters (\s*) followed by zero or more non-whitespace characters (\S*), zero or more times, before it can give up. The possibilities aren't infinite, but they might as well be.
You were probably thinking of [\s\S]*, which is a well-known idiom for matching any character including newlines. It's used in JavaScript, which doesn't have a way to make the dot (.) match line separator characters. Most other flavors let you specify a matching mode that changes the behavior of the dot; some call it DOTALL mode, but .NET uses the more common Singleline.
string sPattern = @"^Line.*tst0063$";
RegexOptions options = RegexOptions.IgnoreCase | RegexOptions.Singleline;
You can also use inline modifiers:
string sPattern = @"(?is)^Line.*tst0063$";
UPDATE: In response to your comment, yes, it does seem odd that the regex engine can't tell that any match must end with tst0063. But it's not always so easy to tell. How much effort should it put into looking for shortcuts like that? And how many shortcuts can you bolt onto the normal matching algorithm before all matches (successful as well as failed) become too slow?
.NET has one of the best regex implementations out there: fast, powerful, and with some truly amazing features. But you have to think about what you're telling it to do. For example, if you know there has to be at least one of something, use +, not *. If you had followed that rule, you wouldn't have had this problem. This regex:
@"^Line(\s+\S+)*tst0063$"
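...works just fine. (\s+\S+)* is a perfectly reasonable way to match zero or more words, where words are defined as one or more non-whitespace characters, separated from other words by one or more whitespace characters. (Is that what you were trying to do?) One caveat: each repetition of the group ends on a non-whitespace character, so to match your third sample line, where a space precedes the final tst0063, the pattern also needs a \s+ before that literal, as in ^Line(\s+\S+)*\s+tst0063$.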
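For reference, here is a small sketch that runs the simpler .* pattern with Singleline against the four sample strings. The LineChecker class is a made-up wrapper, and the ReadLine call is left out so the loop runs straight through:

```csharp
using System;
using System.Text.RegularExpressions;

static class LineChecker
{
    private static readonly Regex Pattern = new Regex(
        @"^Line.*tst0063$", RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // True when the string starts with "Line" and ends with "tst0063".
    public static bool IsValid(string s)
    {
        return Pattern.IsMatch(s);
    }
}

class Program
{
    static void Main()
    {
        string[] myStr =
        {
            " Line1: active 56:09 - tst0063, tst0063",
            "Contacts accosiated with line 1 - tst0063, tst0063",
            "Line 1: 00:00:32 Wrap: 00:00:20 - tst0063, tst0063",
            "Line 1: 00:00:17 Active: 00:00:15 - tst0064, tst0064"
        };

        foreach (string s in myStr)
        {
            Console.WriteLine(s + (LineChecker.IsValid(s) ? " - valid" : " - invalid"));
        }
        // Only the third string is reported as valid; the fourth fails quickly
        // because .* has just one way to back off per character.
    }
}
```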
Move System.Console.ReadLine(); outside the foreach loop.
You're blocking the thread at the end of the first iteration of the loop, waiting for user input.
I have an application where I need to parse a string to find all the e-mail addresses in that string. I am not a regular expression guru by any means and am not sure what the difference is between some expressions. I have found 2 expressions that, apparently, will match all of the e-mail addresses in a string. I cannot get either to work in my C# application. Here are the expressions:
/\b([A-Z0-9._%-]+)@([A-Z0-9.-]+\.[A-Z]{2,4})\b/i
^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$
Here is an example string:
Adam
<mailto:aedwards@domain.com?subject=Your%20prospect%20Search%20-%20ID:
%2011111> Edwards - Prospect ID: 11111, Ph: (555) 555-5555
Al
<mailto:Al@anotherdomain.com?subject=Your%20prospect%20Search%20-%20
ID:%20222222> Grayson - Prospect ID: 222222, Ph:
Angie
Here is the code in c#:
var mailReg = new Regex(EmailMatch, RegexOptions.IgnoreCase | RegexOptions.Multiline);
var matches = mailReg.Matches(theString);
The first regex is written as a Perl-style regex literal (delimited by slashes). Drop the slashes and the mode modifier (i), and it should work:
EmailMatch = @"\b([A-Z0-9._%-]+)@([A-Z0-9.-]+\.[A-Z]{2,6})\b";
Also, .museum is a valid domain, so {2,6} is a bit better.
The second regex only matches entire strings that consist of nothing but an email address.
I would leave the \b intact.
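Put together, the extraction might look like the following sketch. The EmailFinder class and FindAll method are illustrative names, and the sample text is shortened from the question:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class EmailFinder
{
    // The first pattern without the Perl delimiters; IgnoreCase replaces the /i modifier.
    private static readonly Regex EmailPattern = new Regex(
        @"\b([A-Z0-9._%-]+)@([A-Z0-9.-]+\.[A-Z]{2,6})\b", RegexOptions.IgnoreCase);

    // Returns every e-mail address found in the text, in order of appearance.
    public static List<string> FindAll(string text)
    {
        var found = new List<string>();
        foreach (Match m in EmailPattern.Matches(text))
        {
            found.Add(m.Value);
        }
        return found;
    }
}

class Program
{
    static void Main()
    {
        string theString =
            "Adam\n<mailto:aedwards@domain.com?subject=Your%20prospect%20Search%20-%20ID:\n" +
            "%2011111> Edwards - Prospect ID: 11111, Ph: (555) 555-5555\n" +
            "Al\n<mailto:Al@anotherdomain.com?subject=Your%20prospect%20Search%20-%20\n" +
            "ID:%20222222> Grayson - Prospect ID: 222222, Ph:";

        foreach (string address in EmailFinder.FindAll(theString))
        {
            Console.WriteLine(address);
        }
        // prints:
        // aedwards@domain.com
        // Al@anotherdomain.com
    }
}
```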
The first of your two examples should work if you remove the \b from both ends. The \b means that it expects a word boundary (a space, end of line, &c.) before and after the email address and this is not present in your case.
(Please do not use your new found powers for evil.)
This expression worked: ([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})
Thanks for looking!