I need to extract $Value from the following string:
string text = "<h2 class=\"knownclass unknownclass1 unknownclass2\" title=\"Example title>$Value </h2>";
Using this code:
Match m2 = Regex.Match(text, @"<h2 class=""knownclass(.*)</h2>", RegexOptions.IgnoreCase);
It gets me the whole rest of the string: unknownclass1 unknownclass2" title="Example title>$Value . But I just need the $Value part.
How can I get it? Thanks in advance.
Assuming the string always follows this format, consider the following code:
var index = text.IndexOf(">");
var value = text.Substring(index + 1, text.IndexOf("<", index) - index - 1); // the second argument is a length, not an end index
As has been said many times, using a regex to parse HTML or XML is a bad idea. Ignoring that, you are capturing too much. Here is an alternative regex that should work:
@"<h2 class=""knownclass[^>]*>(.*)</h2>"
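A minimal sketch of applying a pattern like this with `Regex.Match`, using the sample string from the question. It uses `[^>]*>` to skip everything up to the end of the opening tag (including the `title` attribute in the sample), reads only capture group 1, and trims the trailing space. `ExtractH2Text` is a hypothetical name, not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the text between the opening <h2 class="knownclass ...">
// tag and </h2>, or an empty string when the pattern does not match.
static string ExtractH2Text(string html)
{
    Match m = Regex.Match(html,
        @"<h2 class=""knownclass[^>]*>(.*?)</h2>",
        RegexOptions.IgnoreCase);
    // Group 1 holds only the text inside the tag, not the attributes
    return m.Success ? m.Groups[1].Value.Trim() : string.Empty;
}

string text = "<h2 class=\"knownclass unknownclass1 unknownclass2\" title=\"Example title>$Value </h2>";
Console.WriteLine(ExtractH2Text(text)); // prints $Value
```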
If your string always follows the same pattern, you can consider this:
string text = "<h2 class=\"knownclass unknownclass1 unknownclass2\" title=\"Example title>$Value </h2>";
string result = "";
Regex test = new Regex(@"\<.*?\>(.*?)\</h2\>");
MatchCollection matchlist = test.Matches(text);
if (matchlist.Count > 0)
{
for (int i = 0; i < matchlist.Count; i++)
{
result = matchlist[i].Groups[1].ToString();
}
}
But if you are working with XML or HTML files, I recommend XmlTextReader for XML and HtmlAgilityPack for HTML:
http://msdn.microsoft.com/en-us/library/system.xml.xmltextreader.aspx
http://htmlagilitypack.codeplex.com/
Hope it helps!
Related
I have a problem: what pattern should I use to find all rich-text color tags nested inside other color tags?
So for example I have this input:
<color=yellow>Hello <color=cyan>World!</color></color>
I want to remove the matched tags by replacing them with an empty string, so that the result is:
<color=yellow>Hello World!</color>
There could be even more nested tags, for example:
<color=yellow>Hello my <color=cyan>name</color> is <color=gray>Kite <color=white>Watson!</color></color></color>
which should become:
<color=yellow>Hello my name is Kite Watson!</color>
I need this because I use regex to apply a code highlighter to text in a text box, and some keywords get colorized inside comments.
So I want to find and remove any color tags nested inside other color tags, as happens in those comments.
I'm pretty new to regex, so I'm a bit lost and not sure what to do. Can someone give me some advice on how I can accomplish this? :) Thank you!
Remove all tags except the first and the last, and you get what you want. Use the following regex:
(?<!^)<[^>]*>(?!$)
This matches all tags except the first and the last using negative lookarounds. Let me know if this works for your scenario; otherwise I can strengthen the regex further.
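A minimal sketch of applying that pattern with `Regex.Replace`. It assumes, as the answer does, that the whole input is a single outer color block (the string starts with the opening tag and ends with the closing tag), and note that it strips every inner tag, not only color tags. `RemoveInnerTags` is a hypothetical name.

```csharp
using System;
using System.Text.RegularExpressions;

// Removes every tag except the one at the very start and the one at the very end.
// (?<!^) rejects a tag at position 0; (?!$) rejects a tag that ends the string.
static string RemoveInnerTags(string input) =>
    Regex.Replace(input, @"(?<!^)<[^>]*>(?!$)", string.Empty);

Console.WriteLine(RemoveInnerTags("<color=yellow>Hello <color=cyan>World!</color></color>"));
// prints <color=yellow>Hello World!</color>
```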
I went with a slightly different approach.
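// needs: using System.Text; using System.Text.RegularExpressions;
Regex tags = new Regex(@"<color=#.*?>|<\/color>");
MatchCollection matches = tags.Matches(c);
bool hasStarted = false;
int innerTags = 0;
int offset = 0; // how far c has shifted relative to the indices stored in matches
const string tempStart = "¬¬¬¬¬¬¬Â";
const string tempEnd = "Â~Â~Â~Â~";
foreach (Match match in matches)
{
int index = match.Index + offset; // position of this tag in the edited string
if (match.Value.Contains("<color=#"))
{
if (hasStarted)
{
// nested opening tag: swap it for a placeholder
var cBuilder = new StringBuilder(c);
cBuilder.Remove(index, match.Length);
cBuilder.Insert(index, tempStart);
c = cBuilder.ToString();
offset += tempStart.Length - match.Length;
innerTags++;
}
else
{
// outermost opening tag: keep it
hasStarted = true;
}
}
else if (match.Value.Equals("</color>"))
{
if (innerTags > 0)
{
// closes a nested tag: swap it for a placeholder
var cBuilder = new StringBuilder(c);
cBuilder.Remove(index, match.Length);
cBuilder.Insert(index, tempEnd);
c = cBuilder.ToString();
offset += tempEnd.Length - match.Length;
innerTags--;
}
else
{
// closes the outermost tag
hasStarted = false;
}
}
}
// stripping the placeholders removes the nested tags
c = c.Replace(tempStart, string.Empty);
c = c.Replace(tempEnd, string.Empty);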
Not sure if it's the best way, but it works quite well.
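The same depth-counting idea can be sketched more compactly with `Regex.Replace` and a `MatchEvaluator`, which needs no placeholders or index bookkeeping because the replacement is built in a single left-to-right pass. This sketch assumes balanced tags and accepts any `<color=...>` value (named or hex); it also handles several separate top-level color blocks in one string. `FlattenColorTags` is a hypothetical name.

```csharp
using System;
using System.Text.RegularExpressions;

// Keeps only the outermost <color=...> / </color> pairs and removes every nested one.
// Assumes the tags are balanced.
static string FlattenColorTags(string input)
{
    int depth = 0; // current nesting level while the matches are visited left to right
    return Regex.Replace(input, @"<color=[^>]*>|</color>", m =>
    {
        if (m.Value.StartsWith("</"))
        {
            depth--;
            // keep the closing tag only when it closes an outermost block
            return depth == 0 ? m.Value : string.Empty;
        }
        depth++;
        // keep the opening tag only when it starts an outermost block
        return depth == 1 ? m.Value : string.Empty;
    });
}

Console.WriteLine(FlattenColorTags("<color=yellow>Hello <color=cyan>World!</color></color>"));
```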
I am working on a simple Facebook Messenger client (without needing a developer account), and so far I can get all my messages: name, preview and time. What I'd like to get next is the user's href link.
So far I have this:
MatchCollection name = Regex.Matches(
htmlText, "<div class=\"_l2\">(.*?)</div>");
MatchCollection preview = Regex.Matches(
htmlText, "<div class=\"_l3 fsm fwn fcg\">(.*?)</div>");
MatchCollection time = Regex.Matches(
htmlText, "<div class=\"_l4\">(.*?)</div>");
This works fine.
But I've tried a few things I found on this site and nothing seemed to work. The href starts with <a class="_k_ hoverZoomLink" rel="ignore" href="
and ends with a ". Could someone point me to an article that might help me get that href? A better way than regex is also welcome, although I would really prefer regex:
for (int i = 0; i < name.Count; i++)
{
String resultName = Regex.Replace(name[i].Value, @"<[^>]*>", String.Empty);
String newName = resultName.Substring(0, resultName.Length - 5);
String resultPreview = Regex.Replace(preview[i].Value, @"<[^>]*>", String.Empty);
String s = time[i].Value;
int start = s.IndexOf("data-utime=\"") + 28;
int end = s.IndexOf("</abbr>", start);
String newTime = s.Substring(start, (end - start));
threads.Add(new Thread(newName, resultPreview, newTime, ""));
}
Thanks in advance.
Use a real HTML parser such as HtmlAgilityPack:
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlstring);
var link = doc.DocumentNode.SelectSingleNode("//a[@class='_k_ hoverZoomLink']")
.Attributes["href"].Value;
Instead of XPath, you can also use LINQ:
var link = doc.DocumentNode.Descendants("a")
.Where(a => a.Attributes["class"] != null)
.First(a => a.Attributes["class"].Value == "_k_ hoverZoomLink")
.Attributes["href"].Value;
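Since the question prefers regex, here is a minimal regex sketch that relies on the exact attribute prefix quoted in the question (`class="_k_ hoverZoomLink" rel="ignore" href="`). It breaks if the markup changes attribute order, spacing or class names, which is exactly the fragility the parser approach avoids, and HTML entities such as `&amp;` inside the href are returned undecoded. `ExtractProfileLinks` and the sample URL are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns every href whose <a> tag starts with the exact
// attribute prefix from the question. Capture group 1 is the text between the href quotes.
static List<string> ExtractProfileLinks(string htmlText)
{
    var links = new List<string>();
    foreach (Match m in Regex.Matches(htmlText,
        @"<a class=""_k_ hoverZoomLink"" rel=""ignore"" href=""([^""]*)"""))
    {
        links.Add(m.Groups[1].Value);
    }
    return links;
}

// Hypothetical sample markup
string sample = "<a class=\"_k_ hoverZoomLink\" rel=\"ignore\" href=\"https://www.facebook.com/example.user\"><img/></a>";
foreach (string link in ExtractProfileLinks(sample))
    Console.WriteLine(link);
```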
I want to find all the Instagram URLs within a string and replace them with the embed URL.
But I care about performance, as there could be 5 to 20 posts, each up to 6000 characters, with an unknown number of Instagram URLs that need converting.
URL examples (any of these could appear in each string, so all of them need to match):
http://instagram.com/p/xPnQ1ZIY2W/?modal=true
http://instagram.com/p/xPnQ1ZIY2W/
http://instagr.am/p/xPnQ1ZIY2W/
And this is what I need to replace them with (an embedded version):
<img src="http://instagram.com/p/xPnQ1ZIY2W/media/?size=l" class="instagramimage" />
I was thinking of using regex, but is that the quickest and most performant way of doing this?
Any examples greatly appreciated.
Something like:
Regex reg = new Regex(@"http://instagr\.?am(?:\.com)?/\S*");
I would combine this with a StringReader and process the text line by line, then append each line (modified or not) to a StringBuilder:
string original = @"someotherText http://instagram.com/p/xPnQ1ZIY2W/?modal=true some other text
some other text http://instagram.com/p/xPnQ1ZIY2W/ some other text
some other text http://instagr.am/p/xPnQ1ZIY2W/ some other text";
StringBuilder result = new StringBuilder();
using (StringReader reader = new StringReader(original))
{
while (reader.Peek() > 0)
{
string line = reader.ReadLine();
if (reg.IsMatch(line))
{
string url = reg.Match(line).ToString();
result.AppendLine(reg.Replace(line,string.Format("<img src=\"{0}\" class=\"instagramimage\" />",url)));
}
else
{
result.AppendLine(line);
}
}
}
Console.WriteLine(result.ToString());
You mean like this?
class Program
{
private static Regex reg = new Regex(@"http://instagr\.?am(?:\.com)?/\S*", RegexOptions.Compiled);
private static Regex idRegex = new Regex(@"(?<=p/).*?(?=/)", RegexOptions.Compiled);
static void Main(string[] args)
{
string original = @"someotherText http://instagram.com/p/xPnQ1ZIY2W/?modal=true some other text
some other text http://instagram.com/p/xPnQ1ZIY2W/ some other text
some other text http://instagr.am/p/xPnQ1ZIY2W/ some other text";
StringBuilder result = new StringBuilder();
using (StringReader reader = new StringReader(original))
{
while (reader.Peek() > 0)
{
string line = reader.ReadLine();
if (reg.IsMatch(line))
{
string url = reg.Match(line).ToString();
result.AppendLine(reg.Replace(line, string.Format("<img src=\"http://instagram.com/p/{0}/media/?size=l\" class=\"instagramimage\" />", idRegex.Match(url).ToString())));
}
else
{
result.AppendLine(line);
}
}
}
Console.WriteLine(result.ToString());
}
}
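A single-pass alternative to the line-by-line loop, sketched under the assumption that the URLs sit in plain text separated by whitespace, as in the examples. One compiled regex captures the post ID in group 1, and a `MatchEvaluator` builds each embed tag, so two different URLs on the same line each get their own ID. Accepting `https` as well, and treating IDs as letters, digits, `_` and `-`, are assumptions; `EmbedInstagram` is a hypothetical name.

```csharp
using System;
using System.Text.RegularExpressions;

// One compiled pattern, reused for every post.
// instagr(?:\.am|am\.com) matches both instagr.am and instagram.com; group 1 is the post ID.
var instagramUrl = new Regex(
    @"https?://instagr(?:\.am|am\.com)/p/([\w-]+)\S*",
    RegexOptions.Compiled);

// Replaces every Instagram post URL in the text with the embed <img> tag from the question.
string EmbedInstagram(string post) =>
    instagramUrl.Replace(post, m =>
        "<img src=\"http://instagram.com/p/" + m.Groups[1].Value +
        "/media/?size=l\" class=\"instagramimage\" />");

Console.WriteLine(EmbedInstagram("see http://instagr.am/p/xPnQ1ZIY2W/ here"));
```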
A well-crafted and compiled regular expression is hard to beat, especially since you're doing replacements, not just searching, but you should test to be sure.
If the Instagram URLs are only within HTML attributes, here's my first stab at a pattern to look for:
(?<=")(https?://instagr[^">]+)
(I added a check for https as well, which you didn't mention but I believe is supported by Instagram.)
Some false positives are theoretically possible, but it will perform better than pedantically matching every legal variation of an Instagram URL. (The ">" check is just in case the HTML is missing the end quote for some reason.)
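A minimal sketch of running that pattern over HTML with `Regex.Matches`. The lookbehind `(?<=")` requires the URL to start right after a double quote, so a bare URL in the text is not picked up. The sample markup and `FindInstagramAttributeUrls` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Collects Instagram URLs that start immediately after a double quote (i.e. inside attributes).
// [^">]+ stops at the closing quote, or at '>' if the quote is missing.
static List<string> FindInstagramAttributeUrls(string html)
{
    var urls = new List<string>();
    foreach (Match m in Regex.Matches(html, @"(?<="")(https?://instagr[^"">]+)"))
        urls.Add(m.Groups[1].Value);
    return urls;
}

string html = "<a href=\"https://instagram.com/p/xPnQ1ZIY2W/\">post</a> http://instagr.am/p/other/";
foreach (string url in FindInstagramAttributeUrls(html))
    Console.WriteLine(url); // only the quoted URL is printed
```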
I need to get only the text in the middle, Rocky44, using C#:
Hi <span>Rocky44</span>
I tried a Split call, but I can't get it to work:
string[] result = temp.Split(new string[] { "<span>" , "</span>" }, StringSplitOptions.RemoveEmptyEntries);
Example:
Hi <span>Rocky44</span>
To:
Rocky44
Use an HTML parser. Here is an example using HtmlAgilityPack:
string html = @"Hi <span>Rocky44</span>";
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
var text = doc.DocumentNode.SelectSingleNode("//span").InnerText;
You're on the right track; you're just not escaping your quotes correctly:
string[] result = temp.Split(new string[] { "<span>" , "</span>" }, StringSplitOptions.RemoveEmptyEntries);
Of course, this is assuming that your input will always be in exactly the given format. As I4V mentions, an HTML parser may come in handy if you're trying to do anything more complicated.
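With the sample input, that `Split` call yields two entries, `"Hi "` and `"Rocky44"` (the empty string after `</span>` is dropped by `StringSplitOptions.RemoveEmptyEntries`), so the wanted text is the second element. A minimal sketch, assuming the input always has exactly this shape:

```csharp
using System;

string temp = "Hi <span>Rocky44</span>";
// Splitting on both tags leaves "Hi " and "Rocky44"; the trailing empty entry is removed
string[] result = temp.Split(new string[] { "<span>", "</span>" }, StringSplitOptions.RemoveEmptyEntries);
string name = result[1]; // the text that was between the tags
Console.WriteLine(name); // prints Rocky44
```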
If you're only ever going to get this kind of HTML, then I would use a regex. Otherwise, DO NOT USE IT.
string HTML = @"Hi <span>Rocky44</span>";
var result = Regex.Match(HTML, @"<span[^>]*>(.*?)</span>").Groups[1].Value;
Find the index of <span> and </span> using the IndexOf method.
Then (adjusting for the length of <span>) use the String.Substring method to get the desired text.
string FindLinkText(string linkHtml)
{
int startIndex = linkHtml.IndexOf("<span>") + "<span>".Length,
length = linkHtml.IndexOf("</span>") - startIndex;
return linkHtml.Substring(startIndex, length);
}
I have a string that contains the code of a web page.
This is an example:
<input type="text" name="x4B07" value="650"
onchange="this.form.x8000.value=this.name;this.form.submit();"/>
<input type="text" name="x4B08" value="250"
onchange="this.form.x8000.value=this.name;this.form.submit();"/>
From that string I want to get 650 and 250 (these are variables, so their values change).
How can I do so?
Example:
name    value
x4b08   254
x4b07   253
x4b06   252
x4b05   251
If you were confident that the markup would never change (and you have a simple snippet like your example), a regex could get you those values, for example:
Regex re = new Regex("name=\"(.*?)\" value=\"(.*?)\"");
Match match = re.Match(yourString);
if(match.Success && match.Groups.Count == 3){
String name = match.Groups[1].Value;
String value = match.Groups[2].Value;
}
Alternatively, you could parse the page content, query the resulting document for the elements, and then extract the values (see the question "Looking for C# HTML parser").
You can use a regular expression such as value="([0-9]*)".
Or you can look for the string "value" using string.IndexOf and then take the following few characters.
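A minimal sketch of the regex option: collect every match of `value="([0-9]*)"` with `Regex.Matches` and read capture group 1. It assumes the values are purely numeric, as in the example. The `onchange` text also contains `value=`, but it is not followed by a quote, so it is not matched. `ExtractNumericValues` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Returns the digits inside every value="..." attribute, in document order.
static List<string> ExtractNumericValues(string html)
{
    var values = new List<string>();
    foreach (Match m in Regex.Matches(html, "value=\"([0-9]*)\""))
        values.Add(m.Groups[1].Value);
    return values;
}

string page =
    "<input type=\"text\" name=\"x4B07\" value=\"650\" onchange=\"this.form.x8000.value=this.name;this.form.submit();\"/>\n" +
    "<input type=\"text\" name=\"x4B08\" value=\"250\" onchange=\"this.form.x8000.value=this.name;this.form.submit();\"/>";
Console.WriteLine(string.Join(", ", ExtractNumericValues(page))); // prints 650, 250
```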
This should work for you (assuming that s contains the string you want to parse):
string value = s.Substring(s.IndexOf("value=")+7);
value = value.Substring(0, value.IndexOf("\""));
How specific are your examples? Could you also want to extract varying length alphabetic strings? Will the strings you want to extract always be properties?
While the regex/substring approaches work for the given examples, I think they will scale quite badly.
I'd parse the HTML using a parser (see ndtreviv's answer) or possibly with an XML parser (if the HTML is valid XHTML). That way you get better control and don't have to bleed your eyes out fiddling with a bucketload of regex.
If you have multiple such controls in a string, you can load them into an XmlDocument and iterate through it.
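A minimal sketch of the XmlDocument approach. It assumes the snippet is well-formed XML apart from lacking a single root element, which holds for the self-closing `<input .../>` tags in the example, so it wraps the snippet in a hypothetical `<root>` element before calling `LoadXml`. `ReadInputs` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Returns (name, value) pairs for every <input> element in the snippet.
static List<KeyValuePair<string, string>> ReadInputs(string markup)
{
    var doc = new XmlDocument();
    doc.LoadXml("<root>" + markup + "</root>"); // XML needs exactly one root element
    var pairs = new List<KeyValuePair<string, string>>();
    foreach (XmlElement input in doc.SelectNodes("//input"))
        pairs.Add(new KeyValuePair<string, string>(input.GetAttribute("name"), input.GetAttribute("value")));
    return pairs;
}

string markup =
    "<input type=\"text\" name=\"x4B07\" value=\"650\" onchange=\"this.form.x8000.value=this.name;this.form.submit();\"/>" +
    "<input type=\"text\" name=\"x4B08\" value=\"250\" onchange=\"this.form.x8000.value=this.name;this.form.submit();\"/>";
foreach (var pair in ReadInputs(markup))
    Console.WriteLine(pair.Key + " = " + pair.Value); // prints x4B07 = 650, then x4B08 = 250
```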
I just solved it with this:
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(URL);
HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
Stream st = resp.GetResponseStream();
StreamReader sr = new StreamReader(st);
string buffer = sr.ReadToEnd();
ArrayList uniqueMatches = new ArrayList();
Match[] retArray = null;
Regex RE = new Regex("name=\"(.*?)\" value=\"(.*?)\"", RegexOptions.Multiline);
MatchCollection theMatches = RE.Matches(buffer);
for (int counter = 0; counter < theMatches.Count; counter++)
{
//string[] tempSplit = theMatches[counter].Value.Split('"');
Regex reName = new Regex("name=\"(.*?)\"");
Match matchName = reName.Match(theMatches[counter].Value);
Regex reValue = new Regex("value=\"(.*?)\"");
Match matchValue = reValue.Match(theMatches[counter].Value);
string[] dados = new string[2];
dados[0] = matchName.Groups[1].ToString();
dados[1] = matchValue.Groups[1].ToString();
uniqueMatches.Add(dados);
}
Thanks, everyone, for the help.
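In that solution, the loop runs two extra regexes per match to re-extract the name and value, but the first pattern already captures both in groups 1 and 2. A minimal sketch of the same idea that reads the groups directly and collects the pairs in a `List<string[]>` (the same shape as `dados`) instead of an `ArrayList`; downloading the page is left out, and `ExtractNameValuePairs` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Same pattern as the solution: group 1 = name, group 2 = value.
static List<string[]> ExtractNameValuePairs(string buffer)
{
    var pairs = new List<string[]>();
    foreach (Match m in Regex.Matches(buffer, "name=\"(.*?)\" value=\"(.*?)\""))
        pairs.Add(new[] { m.Groups[1].Value, m.Groups[2].Value });
    return pairs;
}

string buffer =
    "<input type=\"text\" name=\"x4B07\" value=\"650\" onchange=\"this.form.x8000.value=this.name;this.form.submit();\"/>\n" +
    "<input type=\"text\" name=\"x4B08\" value=\"250\" onchange=\"this.form.x8000.value=this.name;this.form.submit();\"/>";
foreach (string[] pair in ExtractNameValuePairs(buffer))
    Console.WriteLine(pair[0] + " " + pair[1]);
```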