Regular expression to get url collection from string

I have a string. An example is given below.
[playlist]\r\npath1=url1\r\npath2=url2\r\npath=url3\r\npath4=url4\r\ncount=1
How can I extract the values of the path properties from the above string? There may be many properties other than the path properties.
The result I am expecting is
url1
url2
url3
url4
I think a regular expression is the best way to do this. Any ideas regarding the regular expression needed? How about using the string.Split method? Which one is more efficient?
Thanks in advance

Well, this regex works in your particular example:
path\d?=(.+?)\\r\\n
What isn't immediately obvious is whether the \r\n in your string is literally the characters \r\n, or a carriage return plus a newline. The regex above matches those characters literally. If your text is actually this:
[playlist]
path1=url1
path2=url2
path=url3
path4=url4
count=1
Then this regex will work (the optional \r keeps a carriage return out of the captured value):
path\d?=(.+?)\r?\n
And a quick example of how to use that in C#:
var str = @"[playlist]\r\npath1=url1\r\npath2=url2\r\npath=url3\r\npath4=url4\r\ncount=1";
var matches = Regex.Matches(str, @"path\d?=(.+?)\\r\\n");
foreach (Match match in matches)
{
    var path = match.Groups[1].Value;
    Console.WriteLine(path);
}
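The question also asks how string.Split compares. As a rough sketch for the case where the text contains real line breaks, both approaches might look like the following. The class and method names (PlaylistPaths, WithRegex, WithSplit) are made up for illustration, and the code assumes every wanted key is "path" followed by optional digits.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PlaylistPaths
{
    // Regex variant: Multiline makes ^ match at the start of each line;
    // [^\r\n]+ stops before the line break, so no trailing \r is captured.
    public static List<string> WithRegex(string text)
    {
        return Regex.Matches(text, @"^path\d*=([^\r\n]+)", RegexOptions.Multiline)
                    .Cast<Match>()
                    .Select(m => m.Groups[1].Value)
                    .ToList();
    }

    // Split variant: break the text into lines and keep the value of every
    // line whose key is "path" plus optional digits (assumed key format).
    public static List<string> WithSplit(string text)
    {
        var urls = new List<string>();
        var lines = text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var line in lines)
        {
            int eq = line.IndexOf('=');
            if (eq < 0) continue; // lines without a key, e.g. "[playlist]"
            string key = line.Substring(0, eq);
            if (key.StartsWith("path", StringComparison.Ordinal) && key.Substring(4).All(char.IsDigit))
                urls.Add(line.Substring(eq + 1)); // everything after the first '='
        }
        return urls;
    }

    public static void Main()
    {
        string text = "[playlist]\r\npath1=url1\r\npath2=url2\r\npath=url3\r\npath4=url4\r\ncount=1";
        Console.WriteLine(string.Join(", ", WithRegex(text)));
        Console.WriteLine(string.Join(", ", WithSplit(text)));
    }
}
```

Which one is faster depends on the input size and is best measured; for a handful of lines the difference is unlikely to matter, so readability is probably the better criterion.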

Related

Regex within a regex?

Truth is, I'm having a hard time writing a regex string to parse something in the form of
[[[tab name=dog content=cat|tab name=dog2 content=cat2]]]
This string would be parsed so that I can dynamically build tabs as demonstrated here. Initially I tried a regex pattern like \[\[\[tab name=(?'name'.*?) content=(?'content'.*?)\]\]\]
But I realized I couldn't get the tab as a whole and build upon a query without doing a regex.replace. Is it possible to take the entire tab leading up to the pipe symbol as a group and then parse that group down from the sub key/value pairs?
This is the current regex string I'm working with \[\[\[(?'tab'tab name=(?'name'.*?) content=(?'content'.*?))\]\]\]
And here is my code for performing the regex. Any guidance would be appreciated.
public override string BeforeParse(string markupText)
{
    if (CompiledRegex.IsMatch(markupText))
    {
        // Replaces the [[[code lang=sql|xxx]]]
        // with the HTML tags (surrounded with {{{roadkillinternal}}.
        // As the code is HTML encoded, it doesn't get butchered by the HTML cleaner.
        MatchCollection matches = CompiledRegex.Matches(markupText);
        foreach (Match match in matches)
        {
            string tabname = match.Groups["name"].Value;
            string tabcontent = HttpUtility.HtmlEncode(match.Groups["content"].Value);
            markupText = markupText.Replace(match.Groups["content"].Value, tabcontent);
            markupText = Regex.Replace(markupText, RegexString, ReplacementPattern, CompiledRegex.Options);
        }
    }
    return markupText;
}

Is this what you want?
string input = "[[[tab name=dog content=cat|tab name=dog2 content=cat2]]]";
Regex r = new Regex(@"tab name=([a-z0-9]+) content=([a-z0-9]+)(\||])");
foreach (Match m in r.Matches(input))
{
    Console.WriteLine("{0} : {1}", m.Groups[1].Value, m.Groups[2].Value);
}
http://regexr.com/3boot

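Maybe string.Split would be better in this case? For example, something like this:
string str = "[[[tab name=dog content=cat|tab name=dog2 content=cat2]]]";
// Strip the surrounding brackets, then split into one entry per tab.
foreach (var entry in str.Trim('[', ']').Split('|'))
{
    var eqBlocks = entry.Split('=');
    // eqBlocks[1] is e.g. "dog content"; drop the trailing " content" label.
    var tabName = eqBlocks[1].Replace(" content", "");
    var content = eqBlocks[2];
}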
Ugly code, but should work.

Try this:
Starts with a word boundary and followed only by allowed characters.
/\b[\w =]*/g
https://regex101.com/r/cI7jS7/1

Just distill the regex pattern down to the individual tab pattern, name=??? content=???, and match only that. The pattern then produces one Match per tab (two in your example), from which the data can be extracted.
string text = @"[[[tab name=dog content=cat|tab name=dog2 content=cat2]]]";
string pattern = @"name=(?<Name>[^\s]+)\scontent=(?<Content>[^\s|\]]+)";
var result = Regex.Matches(text, pattern)
    .OfType<Match>()
    .Select(mt => new
    {
        Name = mt.Groups["Name"].Value,
        Content = mt.Groups["Content"].Value,
    });
The result is an enumerable of anonymous objects, one per tab, which can be bound directly to the control.
Note that in the set notation [^\s|\]] the pipe | is treated as a literal in the set and not as an "or". The bracket ] does have to be escaped to be treated as a literal. The logic for that set reads: "match anything that is not (^) a space, a pipe, or a closing bracket".
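The question asks whether the whole block can be captured first and then parsed down into its key/value pairs. A minimal two-stage sketch of that idea follows. The names TabMarkup, Block, Tab and Parse are hypothetical, and the code assumes that tab content never contains a pipe and that a tab name never contains the text " content=".

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TabMarkup
{
    // Outer pattern: everything between [[[ and ]]].
    private static readonly Regex Block =
        new Regex(@"\[\[\[(?<body>.*?)\]\]\]", RegexOptions.Singleline);

    // Inner pattern: one "tab name=... content=..." entry, matched as a whole.
    private static readonly Regex Tab =
        new Regex(@"^tab name=(?<name>.+?) content=(?<content>.*)$", RegexOptions.Singleline);

    public static List<KeyValuePair<string, string>> Parse(string markup)
    {
        var tabs = new List<KeyValuePair<string, string>>();
        foreach (Match block in Block.Matches(markup))
        {
            // Stage two: split the captured body on the pipe, then parse each entry.
            foreach (string entry in block.Groups["body"].Value.Split('|'))
            {
                Match tab = Tab.Match(entry);
                if (tab.Success)
                {
                    tabs.Add(new KeyValuePair<string, string>(
                        tab.Groups["name"].Value, tab.Groups["content"].Value));
                }
            }
        }
        return tabs;
    }

    public static void Main()
    {
        foreach (var tab in Parse("[[[tab name=dog content=cat|tab name=dog2 content=cat2]]]"))
        {
            Console.WriteLine("{0} : {1}", tab.Key, tab.Value);
        }
    }
}
```

Splitting the work this way keeps each pattern simple, and entries that do not look like a tab are skipped rather than breaking the whole match.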

Regular expressions in C# for extracting parts

I have this text:
" </SYM field/NN name=/IN ""/"" object/NN ""/"" >/SYM Categories/NNS :/: Cars/NNS ,/, About/RB Model/NNP :/: "
I would like to extract values such as
Categories/NNS :/: Cars/NNS ,/, About/RB
where the pattern is
WORD + /NNS + :/: ANYTHING until you reach the same pattern
I tried:
Match match = Regex.Match(input, @"([A-Za-z0-9\-]+)/NNS :/: ([A-Za-z0-9\-/s]+)",
    RegexOptions.IgnoreCase);
if (match.Success)
{
    string key = match.Groups[1].Value;
    Console.WriteLine(key);
}
and the answer I got back was:
Categories
instead of
Categories/NNS :/: Cars/NNS ,/, About/RB
What am I doing wrong?

You need to enclose the bits of the regex you want as the result in parentheses.
To obtain what you're looking for, you need to replace your regexp by (not tested, moreover I don't know C# regex specifics but the below should be OK):
"((?:[A-Za-z0-9\-]+)/NNS :/: (?:[A-Za-z0-9\-/s]+))"
The outer parentheses mean that you get the entire string as the result.
An opening parenthesis followed by ?: means that you don't want that part captured in the result.
If you left out the ?:, the result would contain your entire string, then the string matching the first sub-regex, then the string matching the second sub-regex.

Why don't you use match.Value? Everything you put in parentheses represents a group, but it looks like you want the whole thing.
Match match = Regex.Match(input, @"([A-Za-z0-9\-]+)/NNS :/: ([A-Za-z0-9\-/s]+)",
    RegexOptions.IgnoreCase);
if (match.Success)
{
    string key = match.Value;
    Console.WriteLine(key);
}

Pattern Matching c#

Let's say I have a text file with the line below in it. I want to take both values inside the quotation marks by matching between (" and "), so that I retrieve ABC and DEF and put them in a string list or something similar. What's the best way of doing this? It's so annoying.
If EXAMPLEA("ABC") AND EXAMPLEB("DEF")

Assuming the value between the double quotes cannot contain escaped double quotes, it might work like this:
var text = "If EXAMPLEA(\"ABC\") AND EXAMPLEB(\"DEF\")";
Regex pattern = new Regex("\"[^\"]*\"");
foreach (Match match in pattern.Matches(text))
{
    Console.WriteLine(match.Value.Trim('"'));
}
But this is only one of the many ways you could do it and maybe not the smartest way out there. Try something yourself!
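If the quoted values may contain backslash-escaped quotes, the pattern needs one more alternative. The following is only a sketch under that assumption: the class name QuotedValues and the method Extract are invented for illustration, and the escape convention (a backslash before the quote) is assumed, since the question does not specify one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuotedValues
{
    // Between the quotes, accept either a character that is neither a quote
    // nor a backslash, or a backslash followed by any single character.
    private static readonly Regex Quoted = new Regex(@"""((?:[^""\\]|\\.)*)""");

    public static List<string> Extract(string text)
    {
        return Quoted.Matches(text)
                     .Cast<Match>()
                     .Select(m => m.Groups[1].Value) // group 1 excludes the quotes
                     .ToList();
    }

    public static void Main()
    {
        foreach (string value in Extract("If EXAMPLEA(\"ABC\") AND EXAMPLEB(\"DEF\")"))
        {
            Console.WriteLine(value);
        }
    }
}
```

The escaped characters are returned as written (backslash included); unescaping them would be a separate step.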

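Best way...
List<string> matches = Regex.Matches(File.ReadAllText(yourPath), @"(?<="")[^""]*(?="")")
    .Cast<Match>()
    .Select(x => x.Value)
    .ToList();
Note that the lookarounds do not consume the quotes, so the text between a closing quote and the next opening quote (here ") AND EXAMPLEB(") is matched as well; consuming the quotes and reading a capture group avoids that.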

This pattern should do the trick:
\"([^"]*)\"
string str = "If EXAMPLEA(\"ABC\") AND EXAMPLEB(\"DEF\")";
MatchCollection matched = Regex.Matches(str, @"\""([^\""]*)\""");
foreach (Match match in matched)
{
    Console.WriteLine(match.Groups[1].Value);
}
Note that the quotation marks are doubled in the actual code in order to escape them. And the code refers to group [1] to get just the part inside the parentheses.

IEnumerable<string> matches =
    from Match match
    in Regex.Matches(File.ReadAllText(filepath), @"\""([^\""]*)\""")
    select match.Groups[1].Value;
Others have already posted answers, but mine takes into account that you want just ABC and DEF from your example, without quotation marks, and stores them in an IEnumerable<string>.

string placeholder and regular expressions

I have created a method where I can search for string placeholders, this I do with Regular expressions.
At the moment I try to expand this method by adding grouping features.
For example if I have this string:
"Hallo {g:test1} asdasd {p:test1} sdfsdf{o:test1}"
I want to :
Search for the string test1, even if a letter prefix (like g:) stands before it.
Search for all strings that have a given prefix, for example g:, before them.
I can't really figure out how to do this in C#. Can someone help me?
At the moment I programmed this:
private string test() {
    string pattern = @"\{(.*?)\}";
    string query = "Hallo {g:test1} asdasd {p:test1} sdfsdf{o:test1}";
    var matches = Regex.Matches(query, pattern);
    foreach (Match m in matches) {
        Test = m.Groups[1].Value;
    }
    return Test;
}

Try this:
\{(?:.:)?(.*?)\}
It will match the text not including the letter and the colon which may be before it.
To limit this to strings with a particular letter before it:
\{(?:#:)(.*?)\} replacing # with the letter you are filtering on
e.g.
\{(?:g:)(.*?)\}

\{.:test1\}
\{g:.+?\}
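Both searches can also be served by one pattern with named groups, filtering on the group values afterwards. This is a rough sketch, not the asker's method: the names Placeholders, ByName and ByPrefix are hypothetical, and the prefix is assumed to be a single word character followed by a colon.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class Placeholders
{
    // {prefix:name}, where the one-character prefix and its colon are optional.
    private static readonly Regex Pattern =
        new Regex(@"\{(?:(?<prefix>\w):)?(?<name>[^{}]+?)\}");

    // All placeholders with the given name, whatever their prefix.
    public static List<string> ByName(string text, string name)
    {
        return Pattern.Matches(text).Cast<Match>()
                      .Where(m => m.Groups["name"].Value == name)
                      .Select(m => m.Value)
                      .ToList();
    }

    // All placeholders with the given prefix, whatever their name.
    public static List<string> ByPrefix(string text, string prefix)
    {
        return Pattern.Matches(text).Cast<Match>()
                      .Where(m => m.Groups["prefix"].Value == prefix)
                      .Select(m => m.Value)
                      .ToList();
    }

    public static void Main()
    {
        string query = "Hallo {g:test1} asdasd {p:test1} sdfsdf{o:test1}";
        Console.WriteLine(string.Join(" ", ByName(query, "test1")));
        Console.WriteLine(string.Join(" ", ByPrefix(query, "g")));
    }
}
```

Filtering in code rather than building a new pattern per search avoids having to escape the search term for use inside a regex.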

question about regex

I've got this code:
s = Regex.Match(item.Value, @"\/>.*?\*>", RegexOptions.IgnoreCase).Value;
It returns a string like '/>test*>'. I can replace the symbols '/>' and '*>', but how can I return the string without these symbols, only the string 'test' between them?

You can save parts of the match by putting parentheses around the part of the regex you want. So for your example:
// item.Value == "/>test*>"
Match m = Regex.Match(item.Value, @"\/>(.*?)\*>");
Console.WriteLine(m.Groups[0].Value); // prints the entire match, "/>test*>"
Console.WriteLine(m.Groups[1].Value); // prints the first saved group, "test"
I also removed RegexOptions.IgnoreCase because we aren't dealing with any letters specifically. What does an uppercase /> look like? :)

You can group patterns inside the regex and get those groups from the match:
var match = Regex.Match(item.Value, @"\/>(?<groupName>.*)?\*>", RegexOptions.IgnoreCase);
var data = match.Groups["groupName"].Value;

You can also use look-ahead and look-behind. For your example it would be:
var value = Regex.Match(item.Value, @"(?<=\/>).*?(?=\*>)").Value;
