Get title attribute from string with regular expression - c#

string asd = "<area href='#' title='name' shape='poly' coords='38,23,242'/>";
How can I extract the text of the title attribute in C#,
and then insert another attribute after title?

Search: (?<=title=')[^']+
Replace: something
Demo: http://regex101.com/r/nR3vQ8
In your case, something like this:
using System;
using System.Text.RegularExpressions;
class Program
{
static void Main()
{
// This is the input string we are replacing parts from.
string input = "<area href='#' title='name' shape='poly' coords='38,23,242'/>";
// Use Regex.Replace to replace the pattern in the input.
// ... The lookbehind (?<=title=') starts the match right after title=', and [^']+ matches the value up to the closing quote.
string output = Regex.Replace(input, "(?<=title=')[^']+", "something");
// Write the output.
Console.WriteLine(input);
Console.WriteLine(output);
}
}
Update
To extract the value of the title attribute as a match, use this:
using System;
using System.Text.RegularExpressions;
class Program
{
static void Main()
{
// First we see the input string.
string input = "<area href='#' title='name' shape='poly' coords='38,23,242'/>";
// Here we call Regex.Match.
Match match = Regex.Match(input, @"title='(\w+)'",
RegexOptions.IgnoreCase);
// Here we check the Match instance.
if (match.Success)
{
// Finally, we get the Group value and display it.
string key = match.Groups[1].Value;
Console.WriteLine(key);
}
}
}
output
name
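The question also asks how to insert another attribute after title, which neither snippet above shows. A minimal sketch of both steps follows, assuming the attribute is single-quoted as in the sample input. The class name, method names and the alt attribute are made up for illustration. Note that the pattern would also match inside a longer name such as data-title.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TitleAttributeSketch
{
    // Returns the value of the first single-quoted title attribute, or null if there is none.
    public static string ExtractTitle(string html)
    {
        Match match = Regex.Match(html, @"title='([^']*)'", RegexOptions.IgnoreCase);
        return match.Success ? match.Groups[1].Value : null;
    }

    // Inserts a new attribute immediately after the first title attribute.
    // $0 in the replacement stands for the whole match, i.e. title='...'.
    public static string InsertAfterTitle(string html, string attribute)
    {
        var regex = new Regex(@"title='[^']*'", RegexOptions.IgnoreCase);
        // Double any '$' so the attribute text is not read as a substitution.
        string replacement = "$0 " + attribute.Replace("$", "$$");
        return regex.Replace(html, replacement, 1);
    }

    public static void Main()
    {
        string input = "<area href='#' title='name' shape='poly' coords='38,23,242'/>";
        Console.WriteLine(ExtractTitle(input));
        Console.WriteLine(InsertAfterTitle(input, "alt='name'"));
    }
}
```

The count argument of 1 limits the replacement to the first title attribute, so a later occurrence in the same string is left alone.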

Try this. In particular, you may be interested in the HTMLAgilityPack answer.
Regex reg = new Regex("<a[^>]*?title=\"([^\"]*?)\"[^>]*?>");
A couple of gotchas:
This match is case-sensitive; you may want to adjust that.
This expects that the title attribute both exists and is double-quoted.
Of course, if the title attribute doesn't exist, you probably don't want the match anyway.
To extract the value, use the Groups collection:
reg.Match("Howdy").Groups[1].Value
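The two gotchas above can be relaxed in the pattern itself. The sketch below is one way to do it, not the answerer's code: RegexOptions.IgnoreCase handles case, and a backreference accepts either quote style. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotedTitleSketch
{
    // (['"]) captures the opening quote; \1 requires the same character to close the value.
    private static readonly Regex TitlePattern = new Regex(
        @"<a[^>]*?\stitle\s*=\s*(['""])(.*?)\1",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the title value, or null when the tag has no quoted title attribute.
    public static string GetTitle(string tag)
    {
        Match match = TitlePattern.Match(tag);
        return match.Success ? match.Groups[2].Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(GetTitle("<A HREF=\"#\" TITLE=\"Howdy\">"));
        Console.WriteLine(GetTitle("<area href='#' title='name' shape='poly'/>"));
    }
}
```

Like the original pattern, this matches any tag whose name starts with "a", so it covers both a and area. Unquoted attribute values are still not handled, which is one reason an HTML parser is usually the safer choice.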

Related

How do I search and replace text containing placeholder tokens with values from an XML file using regular expression matching? VB.NET or C#

I have a problem that requires vb.net or C# solution with regular expressions matching.
I am not very good with regular expressions and so I thought I would ask for some help.
I have some text that has one or more tokens that I need to replace with values retrieved from an xml file. Tokens are similar but are of 2 different types. For matches of the first type I will replace with a value from file1.xml and for matches of the 2nd type from file2.xml.
The replaceable tokens are in this format:
Type 1 Tokens: &*T1& and &*T1001&
Type 2 Tokens: &*SomeValue& and &*A2ndValue&
The replacement values for the Type 1 tokens are in File1.xml and for Type 2 Tokens are in File2.xml
In the above example, when a match is found for Type 1 (T1000), I need to replace the entire token (&*T1000&) with the value of Element T1000 in File1.xml. <T1000>ValueT1000</T1000>
In the 2nd Type: When a match is found for Type 2 (SomeValue), I need to replace the entire token (&*SomeValue&) with the value of Element SomeValue in File2.xml. <SomeValue>Value2</SomeValue>
Example input text:
This is some text with first token &*T1& and the second token &*T1001& and more tokens &*SomeValue& and still more &*A2ndValue&.
So far with help of the code from pirs, in vb.net, I have this:
Public Shared Sub Main()
Dim pattern As String = "\&\*?([\w]+)\&"
Dim input As String = "This is some text with first token &*T1& and the second token &*T1001& and more tokens &*SomeValue& and still more &*A2ndValue&."
For Each m As Match In Regex.Matches(input, pattern)
Console.WriteLine("'{0}' found at index {1}.", m.Groups(1).Value, m.Index)
Next
End Sub
Which returns:
'T1' found at index 35.
'T1001' found at index 62.
'SomeValue' found at index 87.
'A2ndValue' found at index 115
I need to process this text and replace all the tokens with their values retrieved from the 2 xml files.
Any help is appreciated.
[EDIT]
With the answer from @pirs: maybe the way to do it is to first find matches of type T1000 and then replace by the index of each regex match. When replacing by index, I think I have to start at the last index, since each replacement will change the indexes of the later matches.
After all T1000 matches are replaced, I think I can run another match on the output string from the step above and then replace all the matches of the 2nd type.
What is the regex to match T1000 (T followed by any number of digits)?
[EDIT] Replace with an index so..
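public static class StringExtensions
{
// Replaces the text at a known index instead of searching for it.
public static string ReplaceIndex(this string self, string oldString, string newString, int index)
{
return self.Remove(index, oldString.Length).Insert(index, newString);
}
}
// ...
// m.Index is the start of the whole token, so pass m.Value (the whole token) as the old string.
s = s.ReplaceIndex(m.Value, "newString", m.Index);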
// ...
[EDIT] Try to replace the value directly
// ...
s = s.Replace(m.Value, "newValue"); // m.Value is the whole token, e.g. &*T1&
// ...
[EDIT] regex for &* and & : https://regex101.com/r/MVRS7U/1/
the generated regex function for c#
using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
string pattern = @"&\*?([^&\*\d]+)";
string input = @"&*cool&*it's&working&in&*all&case";
foreach (Match m in Regex.Matches(input, pattern))
{
Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
}
}
}
It should be OK now :-)
__
I'm not sure exactly what you want, but here is the regex for your case: https://regex101.com/r/5i3RII/1/
And here is the generated regex function for C# (you should write a custom function to fit your needs):
using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
string pattern = @"<[a-zA-Z-0-9]+\s?>([\w]+)<\/[a-zA-Z-0-9]+\s?>";
// the example you gave
string input = @"<T1>value1</T1>
<T1001>value2</T1001>
<T2000 />
<SomeValue>value1</SomeValue >
<A2ndValue>value2</A2ndValue >";
foreach (Match m in Regex.Matches(input, pattern))
{
// the output
Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
}
}
}
I understand what you want to do. Code below does everything :
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Xml;
using System.Xml.Linq;
namespace ConsoleApplication1
{
class Program
{
const string FILENAME = @"c:\temp\text.xml";
static void Main(string[] args)
{
string input = "This is some text with first token &*T1& and the second token &*T1001& and more tokens &*SomeValue& and still more &*A2ndValue&.";
XDocument doc = XDocument.Load(FILENAME);
string patternToken = "&[^&]+&";
string patternTag = @"&\*(?'tag'[^&]+)&";
MatchCollection matches = Regex.Matches(input, patternToken);
foreach(Match match in matches.Cast<Match>())
{
string token = match.Value;
string tag = Regex.Match(token, patternTag).Groups["tag"].Value;
string tagValue = doc.Descendants(tag).Select(x => (string)x).FirstOrDefault();
input = input.Replace(token, tagValue);
}
}
}
}
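On the question of which regex matches "T followed by any number of digits": ^T\d+$ applied to the captured token name is one option. The sketch below combines that with a single Regex.Replace pass using a match evaluator, so there is no need to track indexes or replace from the end. It assumes the values from File1.xml and File2.xml have already been loaded into two dictionaries; the dictionaries, their contents and all names here are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TokenReplaceSketch
{
    // Type 1 names are "T" followed by one or more digits; anything else is treated as Type 2.
    private static readonly Regex Type1Name = new Regex(@"^T\d+$");

    public static string ReplaceTokens(
        string input,
        IDictionary<string, string> type1Values,
        IDictionary<string, string> type2Values)
    {
        // The evaluator runs once per token, so every token is replaced in one pass.
        return Regex.Replace(input, @"&\*(\w+)&", match =>
        {
            string name = match.Groups[1].Value;
            IDictionary<string, string> source =
                Type1Name.IsMatch(name) ? type1Values : type2Values;
            // Leave the token untouched when no replacement value is known.
            return source.TryGetValue(name, out string value) ? value : match.Value;
        });
    }

    public static void Main()
    {
        // Hypothetical values standing in for the contents of File1.xml and File2.xml.
        var file1 = new Dictionary<string, string> { { "T1", "ValueT1" }, { "T1001", "ValueT1001" } };
        var file2 = new Dictionary<string, string> { { "SomeValue", "Value2" } };
        string input = "first &*T1& second &*T1001& third &*SomeValue& fourth &*A2ndValue&.";
        Console.WriteLine(ReplaceTokens(input, file1, file2));
    }
}
```

Because the replacement string is returned by the evaluator rather than searched for again, a value that happens to look like a token is not replaced a second time.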

Regex matching characters after pattern

I'm trying to extract strings after a pattern in a long string, which is basically HTML output of a page.
For example, I need to extract the target of the href attribute from this string:
<h2 class=\"product-name\">...</h2>\r\n
What I need from this: erkek-ayakkabi-spor-gri-17sfd3007141340-p
But I also need to find strings similar to the one above, so I need to search for href attributes after class=\"product-name\" in the HTML string.
How can I achieve this?
Please check this.
Regex:
class=\"product-name\"(.*)<a\shref=\"(.*?)\"
Updated Regex:
class=\"product-name\".*<a\shref=\"(.*?)\"
Regex101 Example.
C# Code:
using System;
using System.Text.RegularExpressions;
public class Program
{
public static void Main()
{
string data = "<h2 class=\"product-name\">...</h2>\r\n<h2 class=\"test-name\">...</h2>\r\n<h2 class=\"product-name\">...</h2>\r\n";
//string regex = "class=\"product-name\"(.*)<a\\shref=\"(.*?)\"";
string regex = "class=\"product-name\".*<a\\shref=\"(.*?)\"";
var matches = Regex.Matches(data, regex, RegexOptions.Multiline);
foreach(Match item in matches)
{
//Console.WriteLine("Value: " + item.Groups[2]);
Console.WriteLine("Value: " + item.Groups[1]);
}
}
}
DotNetFiddle Example.
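The greedy .* in the pattern above can run past the intended link when several tags share a line. A tighter variant is sketched below, assuming the link is the first element inside the product-name heading. The sample markup and hrefs are invented, since the original HTML is elided, and the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ProductLinkSketch
{
    // [^>]* stays inside the opening tag, so the match cannot drift into a later element.
    private static readonly Regex LinkPattern = new Regex(
        @"class=""product-name""[^>]*>\s*<a\s[^>]*?href=""([^""]*)""");

    public static List<string> FindProductLinks(string html)
    {
        var links = new List<string>();
        foreach (Match match in LinkPattern.Matches(html))
        {
            links.Add(match.Groups[1].Value);
        }
        return links;
    }

    public static void Main()
    {
        // Made-up markup: two product headings and one unrelated heading.
        string html =
            "<h2 class=\"product-name\"><a href=\"/first-p\">A</a></h2>\r\n" +
            "<h2 class=\"test-name\"><a href=\"/skip-me\">B</a></h2>\r\n" +
            "<h2 class=\"product-name\"><a href=\"/second-p\">C</a></h2>\r\n";
        foreach (string link in FindProductLinks(html))
        {
            Console.WriteLine(link);
        }
    }
}
```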

Validate and pass only valid characters against regex expression in c#

I am working on a solution where I need to validate a string and pass on only its valid characters in C#.
E.g. my regular expression is: "^\\S(|(.|\\n)*\\S)\\Z"
and the text I want to validate is below:
127 Finchfield Lane
Now, I know it's invalid. But how do I remove the characters that are invalid against the regex, and pass the string on only if it validates successfully against the regex?
If I understand you correctly, you are looking for Regex.IsMatch:
if(Regex.IsMatch(str, "^\\S(|(.|\\n)*\\S)\\Z"))
{
// do something with the valid string
}
else
{
// strip invalid characters from the string
}
using System;
using System.Text.RegularExpressions;
namespace PatternMatching
{
class Program
{
static void Main()
{
string pattern = @"(\d+) (\w+)";
string[] strings = { "123 ABC", "ABC 123", "CD 45678", "57998 DAC" };
foreach (var s in strings)
{
Match result = Regex.Match(s, pattern);
if (result.Success)
{
Console.WriteLine("Match: {0}", result.Value);
}
}
Console.ReadKey();
}
}
}
This seems to do what you require. Hope I haven't misunderstood.
To validate the string against a regex, you can use Regex.IsMatch.
Regex.IsMatch(str, pattern); // returns true if the string is valid
If you only want the matched value, you can use Regex.Match:
Match match = new Regex(@"\d+").Match(str);
string value = match.Value; // only the matched text; the unmatched remainder is not included
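None of the answers fills in the "strip invalid characters" branch. The pattern in the question rejects strings that start or end with whitespace, so one plausible reading is that trimming is the fix. The sketch below assumes exactly that (that the invalid characters are leading or trailing whitespace). The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrimValidationSketch
{
    // The pattern from the question: a non-whitespace first and last character.
    private const string Pattern = "^\\S(|(.|\\n)*\\S)\\Z";

    // Returns true and the usable string when the input is valid as-is or after trimming.
    public static bool TryNormalize(string input, out string result)
    {
        if (Regex.IsMatch(input, Pattern))
        {
            result = input;
            return true;
        }

        // Assumption: the only invalid characters are surrounding whitespace.
        string trimmed = input.Trim();
        if (Regex.IsMatch(trimmed, Pattern))
        {
            result = trimmed;
            return true;
        }

        result = null;
        return false;
    }

    public static void Main()
    {
        if (TryNormalize("  127 Finchfield Lane ", out string address))
        {
            Console.WriteLine("\"" + address + "\"");
        }
    }
}
```

An empty or whitespace-only input still fails after trimming, so the caller can reject it instead of passing an empty string on.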

Search Strings for particular sequence

I'm fairly new to C#, and I'm looking for a way to search a string for a particular sequence:
string mytext = "I want to find t56b45 in a string";
In the above example I would like to search mytext for the position of "t", but only when it is followed by any two numeric chars and a "b" followed by any two numeric chars. If I find a "t" + any two numeric values + "b" + any two numeric values, then I would like to create a substring up to that position, i.e. the result string will read "I want to find".
Use a Regex:
using System;
using System.Text.RegularExpressions;
public class Program
{
public static void Main()
{
// \s : Matches a space
// t : Exact match t
// \d{2} : Any digit, 2 repetition
// b : Exact match b
// \d{2} : Any digit, 2 repetition
var match = Regex.Match("I want to find t56b45 in a string", @".*(?=\st\d{2}b\d{2})");
if(match.Success)
Console.WriteLine("\"" + match.Value + "\"");
else
Console.WriteLine("Nothing found.");
// Outputs: "I want to find"
}
}
Fiddle:
https://dotnetfiddle.net/kguwDW
So this is the code I ended up using; it seems to get the job done and ignores case:
var tempmatch = Regex.Match(TempCleaned, "(?i)y[0-9]+(?i)z[0-9]+");
if (tempmatch.Success)
{
//clean all text from YxxZxx
// Index - 1 also drops the space before the match; this assumes the match is not at index 0.
string NewName = TempCleaned.Substring(0, tempmatch.Index - 1);
}
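One caveat with Index - 1: if the sequence sits at the very start of the string, Substring(0, -1) throws an ArgumentOutOfRangeException. A small sketch that avoids this by cutting at Index and trimming trailing whitespace afterwards is shown below, using the t/b pattern from the question. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TruncateSketch
{
    // Returns the text before the first t##b## sequence, or the whole text if there is none.
    public static string TextBeforeMarker(string text)
    {
        Match match = Regex.Match(text, @"t\d{2}b\d{2}", RegexOptions.IgnoreCase);
        if (!match.Success)
        {
            return text;
        }
        // Cutting at match.Index is safe even when the match starts at index 0.
        return text.Substring(0, match.Index).TrimEnd();
    }

    public static void Main()
    {
        Console.WriteLine("\"" + TextBeforeMarker("I want to find t56b45 in a string") + "\"");
    }
}
```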

Get Removed characters from string

I am using Regex to remove unwanted characters from string like below:
str = System.Text.RegularExpressions.Regex.Replace(str, #"[^\u0020-\u007E]", "");
How can I retrieve the distinct characters that will be removed, in an efficient way?
EDIT:
Sample input : str = "This☺ contains Åüsome æspecialæ characters"
Sample output : str = "This contains some special characters"
removedchar = "☺,Å,ü,æ"
// Negated class: matches the characters that the Replace call removes.
string pattern = @"[^\u0020-\u007E]";
Regex rgx = new Regex(pattern);
List<string> matches = new List<string> ();
foreach (Match match in rgx.Matches(str))
{
if (!matches.Contains (match.Value))
{
matches.Add (match.Value);
}
}
Here is an example of how you can do it with a callback method, using the Regex.Replace overload that takes an evaluator:
evaluator
Type: System.Text.RegularExpressions.MatchEvaluator
A custom method that examines each match and returns either the original matched string or a replacement string.
C# demo:
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
public class Test
{
public static List<string> characters = new List<string>();
public static void Main()
{
var str = Regex.Replace("§My string 123”˝", "[^\u0020-\u007E]", Repl);//""
Console.WriteLine(str); // => My string 123
Console.WriteLine(string.Join(", ", characters)); // => §, ”, ˝
}
public static string Repl(Match m)
{
characters.Add(m.Value);
return string.Empty;
}
}
See IDEONE demo
In short, declare a "global" variable (a list of strings, here, characters), initialize it. Add the Repl method to handle the replacement, and when Regex.Replace calls that method, add each matched value to the characters list.
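If the shared list is a concern, an alternative is to collect the removed characters with Regex.Matches and LINQ, then do the replacement separately. This is a sketch of that variation, not the approach from the answer above. It scans the input twice, and the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RemovedCharactersSketch
{
    // Matches everything outside the printable ASCII range that the question keeps.
    private static readonly Regex NonPrintableAscii = new Regex(@"[^\u0020-\u007E]");

    // Returns each removed character once.
    public static List<string> FindRemoved(string input)
    {
        return NonPrintableAscii.Matches(input)
            .Cast<Match>()
            .Select(m => m.Value)
            .Distinct()
            .ToList();
    }

    public static string Clean(string input)
    {
        return NonPrintableAscii.Replace(input, "");
    }

    public static void Main()
    {
        string str = "This☺ contains Åüsome æspecialæ characters";
        Console.WriteLine(Clean(str));
        Console.WriteLine(string.Join(",", FindRemoved(str)));
    }
}
```

Both methods are free of shared state, so they can be called repeatedly without the list accumulating values from earlier inputs.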
