How can I find a number in a web page?
Let me give an example:
If I want to find 1234 among the following numbers, it should match only the standalone 1234, not a longer number such as 1234124 that merely contains 1234.
1234124 -
113412 -
352523434653 -
1234
I wrote the following code. How can I change it to get that result?
foreach(DataRow row in dt.Rows)
{
string url = "http://play.dcc.fc.up.pt:2241/PTECH/recommenders/music/<userid>?groups=<userid>";
var test = url.Replace("<userid>", Convert.ToString(row["UserID"]));
System.Diagnostics.Process.Start(url);
string client = (new WebClient()).DownloadString("http://play.dcc.fc.up.pt:2241/PTECH/recommenders/music/UserID?groups=UserID");
if (client.Contains(Convert.ToString(TrackID)))
A regular expression that sets some sort of boundary before and after the value should work fine. For example, the word boundary \b lets you pick out the number when it has spaces or other separators around it (don't forget to check that a match was found before reading it):
var value = Regex.Matches("foo,1234 bar", @"\b1234\b")[0].Value;
Check the Regular Expression Language reference for more options.
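A minimal sketch of the word-boundary check with the "match found" test built in. The class and method names (WholeNumberFinder, ContainsWholeNumber) are made up for illustration, and the sample text is the list from the question rather than a downloaded page:

```csharp
using System;
using System.Text.RegularExpressions;

public static class WholeNumberFinder
{
    // Returns true only when `number` appears as a standalone token in `text`.
    public static bool ContainsWholeNumber(string text, string number)
    {
        // Regex.Escape keeps the search value literal; \b anchors both ends.
        string pattern = @"\b" + Regex.Escape(number) + @"\b";
        return Regex.IsMatch(text, pattern);
    }

    public static void Main()
    {
        string page = "1234124 - 113412 - 352523434653 - 1234";
        Console.WriteLine(ContainsWholeNumber(page, "1234"));               // True
        Console.WriteLine(ContainsWholeNumber("1234124 - 113412", "1234")); // False
    }
}
```

Regex.IsMatch avoids indexing into an empty MatchCollection, which is what would throw if [0] were used when nothing matched.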
I'm really struggling to get the following to work. I have some data in the format ===some text=== and I want to replace the === around the text with HTML tags.
I've tried using Match and Replace, but I get a bad compile constant value. I have also tried the approach from "Replace {tag} with a value or completely remove {any-tag}", but that just removes all the text, and the lookarounds described at http://www.rexegg.com/regex-lookarounds.html, but none of them work. I think the problem is that the markers around the text have no distinct closing form, so I'm unable to isolate the text.
So I have tried something like this:
string format = Regex.Replace(data.FirstOrDefault().countrylist, "=== This could be any text ===", " </p><p class=\"strong\">Need to keep text here<p>");
example of how the text looks:
====Rise and fall of the Roman empire====
====20th and 21st centuries====
So I would want it to look:
</p><p class=\"strong\">Rise and fall of the Roman empire<p>
</p><p class=\"strong\">20th and 21st centuries<p>
I'm not the greatest at regular expressions, and all my attempts have failed so any help would be much appreciated.
Try this one:
var yourstring = "===20th and 21st centuries===";
var regex = new Regex(Regex.Escape("==="));
// The last argument, 1, limits the replacement to the first occurrence of ===
yourstring = regex.Replace(yourstring, "</p><p class=\"strong\">", 1);
yourstring = regex.Replace(yourstring, "<p>", 1);
No extra error handling is needed for a missing match: Regex.Replace returns the input unchanged when the pattern is not found.
Edit: If you have multiple entries that should be replaced, repeat the two replacements in a loop until the string no longer changes.
The following worked in my environment:
string text = "===Rise and fall of the Roman empire===";
var pattern = @"===(.*)===";
var regex = new Regex(pattern);
var match = regex.Match(text);
var result = string.Concat("</p><p class=\"strong\">", match.Groups[1].Value, "<p>");
Regards,
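Another way to get the same result is a single Regex.Replace call with a capture group, which handles every heading in the input at once. This is only a sketch: HeadingFormatter and ToHtml are hypothetical names, the pattern assumes each heading sits on one line between runs of three or more = characters, and the replacement keeps the tag order exactly as the question wrote it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HeadingFormatter
{
    // Three or more '=' on each side; the lazy group captures the heading text.
    private static readonly Regex Heading = new Regex(@"={3,}\s*(.+?)\s*={3,}");

    public static string ToHtml(string input)
    {
        // $1 inserts the captured text; all headings in the input are replaced.
        return Heading.Replace(input, "</p><p class=\"strong\">$1<p>");
    }

    public static void Main()
    {
        Console.WriteLine(ToHtml("====Rise and fall of the Roman empire===="));
        Console.WriteLine(ToHtml("===20th and 21st centuries==="));
    }
}
```

Because . does not match a newline by default, a heading on one line cannot swallow text from the next line.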
Given the following text in an email body:
DO NOT MODIFY SUBJECT LINE ABOVE. Sending this email signifies my Request to Delay distribution of this Product Change Notification (PCN) to 9001 (Qwest). The rationale for this Request to Delay is provided below:
This is the reason I need to capture.
It can be many many lines long.
And go on for a long time
I'm trying to capture all the text that follows "... is provided below:".
The pattern being passed into BodyRegex is:
.*provided below:(?<1>.*)
The code being executed is:
Regex regex2 = new Regex(BodyRegex, RegexOptions.IgnoreCase | RegexOptions.Multiline);
string note = null;
Match m2 = regex2.Match(body);
if (m2.Success)
{
note = m2.Groups[1].Value;
}
The match is not being found.
What match pattern do I need to use to capture all lines of text following "is provided below:"?
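The (?<1>...) section is a numbered capturing group, not a lookahead, so the syntax itself is fine. The capture comes back empty because . does not match a newline unless RegexOptions.Singleline is set, so (?<1>.*) stops at the end of the first line.
One option is a lookbehind with a character class that also matches line breaks: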
(?<=provided below:)[.|\n|\W|\w]*
I've had issues with .NET not matching end-of-line characters the way you'd expect when using .*, hence the alternatives in the character class.
Use this regex with the Singleline option:
^.*?provided below:(.*?)$
works here
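As a rough sketch of the Singleline approach, the helper below captures everything after "provided below:" across line breaks. ReasonExtractor and TryExtractReason are hypothetical names, and the pattern is a simplified variant of the ones above, not the exact BodyRegex from the question:

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReasonExtractor
{
    // Singleline makes '.' match '\n', so the group runs to the end of the body.
    private static readonly Regex AfterMarker = new Regex(
        @"provided below:\s*(.*)$",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static bool TryExtractReason(string body, out string reason)
    {
        Match m = AfterMarker.Match(body);
        reason = m.Success ? m.Groups[1].Value.Trim() : string.Empty;
        return m.Success;
    }

    public static void Main()
    {
        string body = "The rationale for this Request to Delay is provided below:\r\n"
                    + "This is the reason I need to capture.\r\n"
                    + "It can be many many lines long.";
        if (TryExtractReason(body, out string reason))
        {
            Console.WriteLine(reason);
        }
    }
}
```

Note that RegexOptions.Multiline, used in the question, only changes what ^ and $ match; it has no effect on whether . crosses a line break.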
This question is, in a way, a continuation of my previously answered question: Getting "Unterminated [] set." Error in C#
I'm using a regular expression in C# to extract URLs:
Regex find = new Regex(@"(?<First>[,""]url=)(?<Url>[^\\]+)(?<Last>\\u00)");
Where the text contains URLs in the format:
,url=http://domain.com?itag=25\u0026,url=http://hello.com?itag=11\u0026
I'm getting the entire URL in 'Url' group, but I'd also like to have the itag value in a separate "iTag" group. I know this can be done using sub-groups and I've been trying but can't figure out exactly how to do this.
You already have named groups defined in the Regex. The syntax ?<First> names everything within those parentheses First.
When you match using Regex, use the Groups property to access the GroupCollection and extract a group value by name.
var first = regex.Match(line).Groups["First"].Value;
This will add an additional group for iTag, but retain the full Url. Move it outside the other parentheses to change this.
(?<First>[,""]url=)(?<Url>[^\?]+\?itag=(?<iTag>[0-9]*))(?<Last>\\u0026)
Here's the code.
Regex regex = new Regex("(?<First>[,\"]url=)(?<Url>[^\\?]*\\?itag=(?<iTag>[0-9]*))(?<Last>\\u0026)");
string input = ",url=http://domain.com?itag=25\u0026,url=http://hello.com?itag=11\u0026";
foreach(Match match in regex.Matches(input))
{
System.Console.WriteLine("1. "+match);
System.Console.WriteLine(" 1. "+match.Groups["First"]);
System.Console.WriteLine(" 2. "+match.Groups["Url"]);
System.Console.WriteLine(" 3. "+match.Groups["iTag"]);
System.Console.WriteLine(" 4. "+match.Groups["Last"]);
}
Results:
1. ,url=http://domain.com?itag=25&
1. ,url=
2. http://domain.com?itag=25
3. 25
4. &
1. ,url=http://hello.com?itag=11&
1. ,url=
2. http://hello.com?itag=11
3. 11
4. &
I want to create a Regex for URLs in order to get all links from an input string.
The Regex should recognize the following formats of the url address:
http(s)://www.webpage.com
http(s)://webpage.com
www.webpage.com
and also the more complicated urls like:
- http://www.google.pl/#sclient=psy&hl=pl&site=&source=hp&q=regex+url&pbx=1&oq=regex+url&aq=f&aqi=g1&aql=&gs_sm=e&gs_upl=1582l3020l0l3199l9l6l0l0l0l0l255l1104l0.2.3l5l0&bav=on.2,or.r_gc.r_pw.&fp=30a1604d4180f481&biw=1680&bih=935
I have the following one
((www\.|https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)
but it does not recognize the following pattern: www.webpage.com. Can someone please help me to create an appropriate Regex?
EDIT:
It should find each link and also place a hyperlink at the matching position, like this:
private readonly Regex RE_URL = new Regex(@"((https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)", RegexOptions.Multiline);
foreach (Match match in (RE_URL.Matches(new_text)))
{
// Copy raw string from the last position up to the match
if (match.Index != last_pos)
{
var raw_text = new_text.Substring(last_pos, match.Index - last_pos);
text_block.Inlines.Add(new Run(raw_text));
}
// Create a hyperlink for the match
var link = new Hyperlink(new Run(match.Value))
{
NavigateUri = new Uri(match.Value)
};
link.Click += OnUrlClick;
text_block.Inlines.Add(link);
// Update the last matched position
last_pos = match.Index + match.Length;
}
I don't know why your result in match is only http:// but I cleaned your regex a bit
((?:(?:https?|ftp|gopher|telnet|file|notes|ms-help):(?://|\\\\)(?:www\.)?|www\.)[\w\d:#@%/;$()~_?\+,\-=\\.&]+)
(?:) are non capturing groups, that means there is only one capturing group left and this contains the complete matched string.
(?:(?:https?|ftp|gopher|telnet|file|notes|ms-help):(?://|\\\\)(?:www\.)?|www\.) The link now has to start either with something from the first list followed by an optional www., or with a www.
[\w\d:#@%/;$()~_?\+,\-=\\.&] I added a comma to the list (otherwise your long example does not match), escaped the - (you were creating a character range), and unescaped the . (escaping is not needed in a character class).
See this here on Regexr, a useful tool to test regexes.
But URL matching is not a simple task, please see this question here
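Here is a small sketch of how the cleaned-up pattern could be used to collect links. LinkExtractor and ExtractLinks are made-up names. One assumption goes beyond the answer: the new Uri(match.Value) call in the question's loop needs an absolute URI, so a bare www. match would throw a UriFormatException; the sketch therefore prefixes such matches with http:// before building the Uri.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkExtractor
{
    // The pattern from the answer above, written as a C# verbatim string.
    private static readonly Regex UrlPattern = new Regex(
        @"((?:(?:https?|ftp|gopher|telnet|file|notes|ms-help):(?://|\\\\)(?:www\.)?|www\.)[\w\d:#@%/;$()~_?\+,\-=\\.&]+)");

    public static List<Uri> ExtractLinks(string text)
    {
        var links = new List<Uri>();
        foreach (Match match in UrlPattern.Matches(text))
        {
            string value = match.Value;
            // Assumption: scheme-less www. links should be treated as http.
            if (value.StartsWith("www.", StringComparison.OrdinalIgnoreCase))
            {
                value = "http://" + value;
            }
            // TryCreate skips anything that still is not a valid absolute URI.
            if (Uri.TryCreate(value, UriKind.Absolute, out var uri))
            {
                links.Add(uri);
            }
        }
        return links;
    }

    public static void Main()
    {
        foreach (Uri link in ExtractLinks("Visit www.webpage.com or https://webpage.com/path?q=1 today"))
        {
            Console.WriteLine(link.AbsoluteUri);
        }
    }
}
```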
I've just written up a blog post on recognising URLs in the most commonly used formats, such as:
www.google.com
http://www.google.com
mailto:somebody@google.com
somebody@google.com
www.url-with-querystring.com/?url=has-querystring
The regular expression used is /((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[\w]*))?)/ however I would recommend you go to http://blog.mattheworiordan.com/post/13174566389/url-regular-expression-for-links-with-or-without-the to see a complete working example along with an explanation of the regular expression in case you need to extend or tweak it.
The regex you give doesn't work for www. addresses because it is expecting a URI scheme (the bit before the URL, like http://). The 'www.' part in your regular expression doesn't work because it would only match www.:// (which is meaningless)
Try something like this instead:
((((https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\))+)|(www\.))[\w\d:#@%/;$()~_?\+-=\\\.&]*)
This will match something with a valid URI scheme, or something beginning with 'www.'
I have the following string:
i:0#.w|domain\x123456
I know about the possibility of naming a group with (?<mysearchterm>...) and retrieving it via RegEx.Match(myRegEx).Result("${mysearchterm}");.
I also know from MSDN that I can use lookbehind assertions like (?<= subexpression). Could someone help me get the following (ideally retrievable via named groups as shown above):
domain ("domain")
user account ("x123456")
I don't need anything from before the pipe character (nor the pipe character itself) - so basically I am interested in domain\x123456.
As others have noted, this can be done without regex, or without lookbehinds. That being said, I can think of reasons you might want them: to write a RegexValidator instead of having to roll up a CustomValidator, for example. In ASP.NET, CustomValidators can be a little longer to write, and sometimes a RegexValidator does the job just fine.
As far as lookbehinds, the main reason you'd want one for something like this is if the target string could contain irrelevant copies of the |domain\x123456 pattern:
foo#bar|domain\x999999 says: 'i:0#.w|domain\x888888i:0#.w|domain\x123456|domain\x000000'
If you only wanted to grab domain\x888888 and domain\x123456 out of that, a lookbehind could be useful. Or maybe you just want to learn about lookbehinds. Anyway, since we only have one sample input, I can only guess at the rules; so perhaps something like this:
@"(?<=[a-z]:\d#\.[a-z]\|)(?<domain>[^\\]*)\\(?<user>x\d+)"
Lookarounds are one of the most subtle and misunderstood features of regex, IMHO. I've gotten a lot of use out of them in preventing false positives, or in limiting the length of matches when I'm not trying to match the entire string (for example, if I want only the 3-digit numbers in blah 1234 123 1234567 123 foo, I can use (?<!\d)\d{3}(?!\d)). Here's a good reference if you want to learn more about named groups and lookarounds.
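To make the two lookaround patterns above concrete, here is a short sketch that runs both against the sample strings from this answer. LookaroundDemo and its method names are invented for the example; the patterns themselves are copied unchanged.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Digits that are exactly three long: no digit directly before or after.
    public static string[] ThreeDigitNumbers(string text)
    {
        return Regex.Matches(text, @"(?<!\d)\d{3}(?!\d)")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    // Accounts that directly follow a claims prefix such as "i:0#.w|".
    public static string[] ClaimAccounts(string text)
    {
        string pattern = @"(?<=[a-z]:\d#\.[a-z]\|)(?<domain>[^\\]*)\\(?<user>x\d+)";
        return Regex.Matches(text, pattern)
                    .Cast<Match>()
                    .Select(m => m.Groups["domain"].Value + "\\" + m.Groups["user"].Value)
                    .ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", ThreeDigitNumbers("blah 1234 123 1234567 123 foo")));
        Console.WriteLine(string.Join(", ", ClaimAccounts(
            @"foo#bar|domain\x999999 says: 'i:0#.w|domain\x888888i:0#.w|domain\x123456|domain\x000000'")));
    }
}
```

The lookbehind is what excludes domain\x999999 and domain\x000000: both follow a pipe, but neither pipe is preceded by the full claims prefix.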
You can just use the regex @"\|([^\\]+)\\(.+)".
The domain and user will be in groups 1 and 2, respectively.
You don't need regular expressions for that.
var myString = @"i:0#.w|domain\x123456";
var relevantParts = myString.Split('|')[1].Split('\\');
var domain = relevantParts[0];
var user = relevantParts[1];
Explanation: String.Split(separator) returns an array of substrings separated by separator.
If you insist on using regular expressions, this is how you do it with named groups and Match.Result, based on SLaks's answer (+1, by the way):
var myString = @"i:0#.w|domain\x123456";
var r = new Regex(@"\|(?<domain>[^\\]+)\\(?<user>.+)");
var match = r.Matches(myString)[0]; // get first match
var domain = match.Result("${domain}");
var user = match.Result("${user}");
Personally, however, I would prefer the following syntax, if you are just extracting the values:
var domain = match.Groups["domain"].Value;
var user = match.Groups["user"].Value;
And you really don't need lookbehind assertions here.