Regex from a html parsing, how do I grab a specific string?

I'm trying to get the string that comes after charactername= and before " >. How can I use a regex to capture only the player name?
This is what I have so far, but it doesn't print anything. client.DownloadString returns a string like this:
<a href="https://my.examplegame.com/charactername=Atro+Roter" >
So I know the download actually returns the string; I'm just stuck on the regex.
using (var client = new WebClient())
{
    // Example of what the string looks like on the console when I Console.WriteLine(html):
    // <a href="https://my.examplegame.com/charactername=Atro+Roter" >
    // I want the "Atro+Roter"
    string html = client.DownloadString(worldDest + world + inOrderName);
    string playerName = "https://my.examplegame.com/charactername=(.+?)\" >";
    MatchCollection m1 = Regex.Matches(html, playerName);
    foreach (Match m in m1)
    {
        Console.WriteLine(m.Groups[1].Value);
    }
}

I'm trying to specifically get the string after charactername= and before " >. 
So, you just need a lookbehind with lookahead and use LINQ to get all the match values into a list:
var input = "your input string";
var rx = new Regex(@"(?<=charactername=)[^""]+(?="")");
var res = rx.Matches(input).Cast<Match>().Select(p => p.Value).ToList();
The res variable should hold all your character names now.
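As a minimal sketch of the lookbehind/lookahead approach applied to the sample anchor tag from the question: the class and method names here (CharacterNames, Extract) are made up for illustration, and the input is hard-coded instead of downloaded.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class CharacterNames
{
    // Lookbehind anchors on "charactername="; lookahead stops before the closing quote.
    private static readonly Regex NamePattern =
        new Regex(@"(?<=charactername=)[^""]+(?="")");

    // Hypothetical helper: returns every character name found in the markup.
    public static List<string> Extract(string html)
    {
        return NamePattern.Matches(html).Cast<Match>().Select(m => m.Value).ToList();
    }

    public static void Main()
    {
        // Sample input copied from the question.
        string html = @"<a href=""https://my.examplegame.com/charactername=Atro+Roter"" >";
        foreach (string name in Extract(html))
        {
            Console.WriteLine(name);
        }
    }
}
```

Because the lookarounds are zero-width, the match value itself is the name, so no capture group is needed.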

I assume your issue is trying to parse the URL. Don't - use what .NET gives you:
var playerName = "https://my.examplegame.com/?charactername=NAME_HERE";
var uri = new Uri(playerName);
var queryString = HttpUtility.ParseQueryString(uri.Query);
Console.WriteLine("Name is: " + queryString["charactername"]);
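This is much easier to read than a hand-written regex. Note that it assumes charactername is a query-string parameter (after a ?), which the URL shown in the question does not have.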
Working sample here: https://dotnetfiddle.net/iJlBKW

Forward slashes can be escaped with backslashes like this: \/ (in .NET this is optional). The dots should be escaped too, so they match only a literal dot:
string input = @"<a href=""https://my.examplegame.com/charactername=Atro+Roter"" >";
string playerName = @"https:\/\/my\.examplegame\.com\/charactername=(.+?)""";
Match match = Regex.Match(input, playerName);
string result = match.Groups[1].Value;
Result = Atro+Roter

Related

How to get a loop of all tagged users

I am trying to get all tagged users from a String in ASP.NET
For example the string "Hello my name is #Naveh and my friend is named #Amit", I would like it to return me "Naveh" and "Amit" in a way I can send each of those user a notification method, like a loop on the code behind.
The only way I know to catch those Strings is by the 'Replace' method like that: (But that is only good for editing of course)
Regex.Replace(comment, @"#([\S]+)", @"<b>$1</b>")
You can't loop those strings like that. How can I loop all of the tagged users in the code behind?

You should probably use Regex.Match and step through the results with NextMatch. E.g.
string pat = @"#([a-z]+)";
string src = "Hello my name is #Naveh and my friend is named #Amit";
string output = "";
// Instantiate the regular expression object.
Regex r = new Regex(pat, RegexOptions.IgnoreCase);
// Match the regular expression pattern against a text string.
Match m = r.Match(src);
while (m.Success)
{
    string matchValue = m.Groups[1].Value; // m.Groups[0] = "#Name", m.Groups[1] = "Name"
    output += "Match: " + matchValue + "\r\n";
    m = m.NextMatch();
}
Console.WriteLine(output);
Console.ReadLine();

You can use Regex.Matches to get a MatchCollection object and loop through it with foreach.
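A rough sketch of that Regex.Matches approach, using the sample comment from the question. The TaggedUsers class, the Find method, and the Console.WriteLine standing in for a notification call are all placeholders, not part of any existing API.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TaggedUsers
{
    // Hypothetical helper: collects every tagged name in the comment.
    public static List<string> Find(string comment)
    {
        var names = new List<string>();
        // Group 1 captures the name without the leading '#'.
        foreach (Match m in Regex.Matches(comment, @"#(\S+)"))
        {
            names.Add(m.Groups[1].Value);
        }
        return names;
    }

    public static void Main()
    {
        string comment = "Hello my name is #Naveh and my friend is named #Amit";
        foreach (string user in Find(comment))
        {
            // Placeholder for the real per-user notification call.
            Console.WriteLine("Notify: " + user);
        }
    }
}
```

Returning a list keeps the matching separate from whatever the code-behind does with each user.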

Regex pattern BBCode to Wiki Notation, C#

I am tasked with converting BB code to wiki notation, and thanks to the many examples on SO I have cracked most of the tougher nuts. This is my first foray into regex and I'm trying to learn it as I go (I would prefer StringBuilder, but it doesn't seem to work with BB code). There are 4 items I need replaced for which I cannot seem to create the proper pattern: (original string on left, what I need on right after double dash)
The first item is a problem child because the wiki engine adds a new line where the spaces are. It is not a separate field but part of a larger string, so I can't Trim() it. I am currently using
result = result.Replace("[b]", "*").Replace("[/b]", "*");
the img issue is a need to somehow include the attributes if possible in the given format.
for the last 2 I am stumped. I have used
Regex r = new Regex(@"<a .*?href=['""](.+?)['""].*?>(.+?)</a>");
foreach (var match in r.Matches(multistring).Cast<Match>().OrderByDescending(m => m.Index))
{
string href = match.Groups[1].Value;
string txt = match.Groups[2].Value;
string wikilink = "[" + txt + "|" + href + "]";
sb.Remove(match.Groups[2].Index, match.Groups[2].Length);
sb.Insert(match.Groups[2].Index, wikilink);
}
in the past for HTML but can't seem to refactor it for my current needs. Suggestions, links to resources, all would be appreciated.
EDIT
solved the img issue, though it's not pretty and I still risk removing a closing [/img] tag that may not be caught earlier. The [img] code is fairly consistent, so I used:
Regex imgparser = new Regex(@"\[img[^\]]*\]([^\[]*)");
foreach (var itag in imgparser.Matches(multistring).Cast<Match>().OrderByDescending(m => m.Index))
{
string isrc = itag.Groups[1].Value;
string wikipic = itag.ToString().Replace("[img ", "!" + isrc).Replace("width=", "!width=").Replace("height=", ",height=").Replace("]" + isrc, string.Empty);
result = result.Replace(itag.ToString(), wikipic);
}
result = result.Replace("[/img]", "!");

I can give you a little example for the last case :
string str1 = "[url=http://aadqsdqsd]link[/url]";
var pattern = @"^\[url=(.*)\](.*)\[\/url\]$";
var match = Regex.Match(str1, pattern);
var result = string.Format("[{0}| {1}]", match.Groups[2].Value, match.Groups[1].Value);
//[link| http://aadqsdqsd]
Is it what you want ?
EDIT
if you want to match a larger string you can do :
var strTomatch = "[url=http://1]link1[/url][url=http://2]link2[/url]" + Environment.NewLine +
"[url = http://3]link3[/url]" + Environment.NewLine +
"[url=http://4]link4[/url]";
var match = Regex.Match(strTomatch, @"\[url\s*=\s*(.*?)\](.*?)\[\/url\]", RegexOptions.Multiline);
while (match.Success)
{
var result = string.Format("[{0}| {1}]", match.Groups[2].Value, match.Groups[1].Value);
Debug.WriteLine(result);
match = match.NextMatch();
}
Output
[link1| http://1]
[link2| http://2]
[link3| http://3]
[link4| http://4]
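If the goal is to rewrite the tags in place rather than print them, the same pattern can be handed to Regex.Replace. This is only a sketch: the BbUrlToWiki class and Convert method are invented names, and the [text|url] output form is taken from the wikilink string built in the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BbUrlToWiki
{
    // Hypothetical helper: rewrites every [url=...]...[/url] tag in the text.
    public static string Convert(string text)
    {
        // $1 is the URL and $2 is the link text; output is [text|url].
        return Regex.Replace(text, @"\[url\s*=\s*(.*?)\](.*?)\[/url\]", "[$2|$1]");
    }

    public static void Main()
    {
        Console.WriteLine(Convert("see [url=http://1]link1[/url] and [url = http://3]link3[/url]"));
    }
}
```

Text outside the tags is left untouched, which avoids the index bookkeeping needed when removing and inserting into a StringBuilder.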

Regex replace all matched tokens with lowercase

Given the following html text snippet
<th>Member name:</th>
<td>$$FULLNAME$$</td>
<th>Club:</th>
<td>$$ClubName$$</td>
<th>Business Category:</th>
<td>$$SubCategory$$</td>
I am trying to replace all the tokens e.g. $$FULLNAME$$ becomes $$fullname$$ using C#, the output should be
<th>Member name:</th>
<td>$$fullname$$</td>
<th>Club:</th>
<td>$$clubname$$</td>
<th>Business Category:</th>
<td>$$subcategory$$</td>
I have come up with this, which does not work correctly because the \L is not converting the matches to lowercase:
public static string TokenReplacer(string value)
{
    var pattern = Regex.Escape("$$") + "(.*?)" + Regex.Escape("$$");
    var regex = new Regex(pattern);
    return regex.Replace(value, Regex.Unescape("$$$$") + @"\L$1" + Regex.Unescape("$$$$"));
}

var output = Regex.Replace(input, @"\$\$.+?\$\$", m => m.Value.ToLower());
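.NET replacement patterns have no case-conversion construct such as \L, which is why the one-liner passes a MatchEvaluator delegate instead. A self-contained sketch follows; the TokenLowercaser and Lower names are illustrative, and ToLowerInvariant is used here on the assumption that token names should not depend on the current culture.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TokenLowercaser
{
    // Hypothetical helper: lowercases every $$TOKEN$$ in the input.
    public static string Lower(string value)
    {
        // \$\$ matches a literal "$$"; .+? lazily matches the token name.
        // The lambda is called once per match and returns its replacement.
        return Regex.Replace(value, @"\$\$.+?\$\$", m => m.Value.ToLowerInvariant());
    }

    public static void Main()
    {
        Console.WriteLine(Lower("<td>$$FULLNAME$$</td><td>$$ClubName$$</td>"));
    }
}
```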

String operation in C#

I have an input string which data is coming in the following format:
"http://testing/site/name/lists/tasks"
"http://testing/site/name1/lists/tasks"
"http://testing/site/name2/lists/tasks" etc.,
How can I extract only name, name1, name2, etc. from this string?
Here is what I have tried:
SiteName = (Url.Substring("http://testing/site/".Length)).Substring(Url.Length-12)
It is throwing an exception stating StartIndex cannot be greater than the number of characters in the string. What is wrong with my expression? How can I fix it? Thanks.

A better option would be regex matching or replacing, but the following also works, assuming all the URLs follow the same pattern:
var value = Url.Replace(@"http://testing/site/", "").Replace(@"/lists/tasks", "");
Another option is to use Uri:
var uriAddress = new Uri(@"http://testing/site/name/lists/tasks");
and then break the URI down into the parts you need.
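One way to break the Uri down is its Segments property. The sketch below assumes the site name is always the third path segment, as in the sample URLs; SiteNameFromUri and Extract are made-up names.

```csharp
using System;

public static class SiteNameFromUri
{
    // Hypothetical helper: picks the site name out of the URL path.
    public static string Extract(string url)
    {
        var uri = new Uri(url);
        // For ".../site/name/lists/tasks" the segments are
        // "/", "site/", "name/", "lists/", "tasks".
        // Index 2 is assumed to hold the site name; the trailing slash is trimmed.
        return uri.Segments[2].TrimEnd('/');
    }

    public static void Main()
    {
        Console.WriteLine(Extract("http://testing/site/name1/lists/tasks"));
    }
}
```

A URL with fewer than three segments would throw here, so real code should check Segments.Length first.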

This is a job for regexp:
string strRegex = @"http://testing/site/(.+)/lists/tasks";
RegexOptions myRegexOptions = RegexOptions.IgnoreCase;
Regex myRegex = new Regex(strRegex, myRegexOptions);
string strTargetString = @"http://testing/site/name/lists/tasks" + "\r\n" + @"http://testing/site/name1/lists/tasks" + "\r\n" + @"http://testing/site/name2/lists/tasks" + "\r\n" + @"http://testing/site/name3/lists/tasks";
foreach (Match myMatch in myRegex.Matches(strTargetString))
{
    if (myMatch.Success)
    {
        // The first capture group holds the site name.
        string siteName = myMatch.Groups[1].Value;
        // Add your code here.
    }
}

You could also use the Uri class to get the desired part:
string[] urlString = urlText.Split();
Uri uri = default(Uri);
List<string> names = urlString
.Where(u => Uri.TryCreate(u, UriKind.Absolute, out uri))
.Select(u => uri.Segments.FirstOrDefault(s => s.StartsWith("name", StringComparison.OrdinalIgnoreCase)))
.ToList();
This assumes that the part always starts with "name". Note that each segment keeps its trailing slash (e.g. "name/"), so trim it if needed.

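Because Substring with a single argument takes the index of the starting character and returns everything to the end of the string. The first call already removes the 20-character prefix, so Url.Length-12 is then larger than the length of the shortened string, which causes the exception. A naive fix is to start at character 20, Url.Substring(20), and then cut off the trailing /lists/tasks.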
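The two-argument overload, Substring(startIndex, length), avoids chaining calls altogether. This is a sketch under the assumption that every URL has exactly the prefix and suffix shown in the question; SiteNameBySubstring and Extract are illustrative names.

```csharp
using System;

public static class SiteNameBySubstring
{
    // Hypothetical helper: cuts the site name out by position.
    public static string Extract(string url)
    {
        const string prefix = "http://testing/site/"; // 20 characters
        const string suffix = "/lists/tasks";         // 12 characters
        int start = prefix.Length;
        // Whatever remains between the prefix and the suffix is the name.
        int length = url.Length - prefix.Length - suffix.Length;
        return url.Substring(start, length);
    }

    public static void Main()
    {
        Console.WriteLine(Extract("http://testing/site/name2/lists/tasks"));
    }
}
```

Both indices are computed against the original string, so the start index can never run past a shortened intermediate result.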

.NET Regex question

I'm trying to parse some data out of a website. The problem is that JavaScript generates the data, so I can't use an HTML parser for it. The string inside the source looks like:
<a href="http:www.domain.compid.php?id=123">
Everything is constant except the id that comes after the =. I don't know how many times the string will occur either. Would appreciate any help and an explanation on the regex example if possible.

Do you need to save any of it? A blanket regex href="[^"]+"> will match the entire string. If you need to save a specific part, let me know.
EDIT: To save the id, note the parentheses after id=, which tell the regex to capture it. To retrieve it, use the match object's Groups property.
string source = "a href=\"http:www.domain.compid.php?id=123\">";
Regex re = new Regex("href=\"[^\"]+id=([^\"]+)\">");
Match match = re.Match(source);
if(match.Success)
{
Console.WriteLine("It's a match!\nI found:{0}", match.Groups[0].Value);
Console.WriteLine("And the id is {0}", match.Groups[1].Value);
}
EDIT: example using MatchCollection
MatchCollection mc = re.Matches(source);
foreach(Match m in mc)
{
//do the same as above. except use "m" instead of "match"
//though you don't have to check for success in each m match object
//since it wouldn't have been added to the MatchCollection if it wasn't a match
}

This does the parsing in javascript and creates a csv-string:
var re = /<a href="http:www.domain.compid.php\?id=(\d+)">/;
var source = document.body.innerHTML;
var result = "result: ";
// A RegExp object is not callable; use exec to run the match.
var match = re.exec(source);
while (match != null) {
    result += match[1] + ",";
    source = source.substring(match.index + match[0].length);
    match = re.exec(source);
}
If the HTML content is not used for anything else on the server, it should be sufficient to send the ids.
EDIT: For performance and reliability it's probably better to use built-in JavaScript functions (or jQuery) to find the URLs instead of searching the entire content:
var re = /www.domain.compid.php\?id=(\d+)/;
var as = document.getElementsByTagName('a');
var result = "result: ";
for (var i = 0; i < as.length; i++) {
    // A RegExp object is not callable; use exec to run the match.
    var match = re.exec(as[i].getAttribute('href'));
    if (match != null) {
        result += match[1] + ",";
    }
}
