RegEx to extract link - c#

I'm looking for a RegEx to extract links from a URL. The URL would be as below:
/redirecturl?u=http://www.abc.com/&tkn=Ue4uhv&ui=fWrQfyg46CADA&scr=SSTYQFjAA&mk=4D6GHGLfbQwETR
I need to extract the link http://www.abc.com from the above URL.
I tried the RegEx:
redirecturl\\?u=(?<link>[^\"]+)&
This works, but the problem is that it does not truncate all the characters after the first occurrence of &.
It would be great if you could modify the RegEx so that I just get the link.
Thanks in advance.

redirecturl\\?u=([^\"&]+)
That stops the capture at the first &, and it still works if there is no & at all.
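As a rough sketch of how that pattern could be used from C# (the class name RedirectLinkDemo and the helper ExtractLink are made up for illustration and are not part of any library):

```csharp
using System;
using System.Text.RegularExpressions;

public static class RedirectLinkDemo
{
    // Hypothetical helper: returns the value of the u= parameter,
    // or null when the input does not contain "redirecturl?u=".
    public static string ExtractLink(string input)
    {
        // [^"&]+ stops at the first & (or "), so later parameters are not captured.
        Match m = Regex.Match(input, @"redirecturl\?u=(?<link>[^""&]+)");
        return m.Success ? m.Groups["link"].Value : null;
    }

    public static void Main()
    {
        string url = "/redirecturl?u=http://www.abc.com/&tkn=Ue4uhv&ui=fWrQfyg46CADA&scr=SSTYQFjAA&mk=4D6GHGLfbQwETR";
        Console.WriteLine(ExtractLink(url)); // http://www.abc.com/
    }
}
```

The captured value keeps the trailing slash from the input; calling TrimEnd('/') on it is one way to get exactly http://www.abc.com.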

What about using the Uri class?
Example:
string toParse = "/redirecturl?u=http://www.abc.com/&tkn=Ue4uhv&ui=fWrQfyg46CADA&scr=SSTYQFjAA&mk=4D6GHGLfbQwETR";
// remove "/redirecturl?u=" (15 characters)
string urlString = toParse.Substring(15);
var url = new Uri(urlString);
// UriPartial is not a flags enum; Authority alone returns scheme + authority
var leftPart = url.GetLeftPart(UriPartial.Authority);
// leftPart = "http://www.abc.com"

Escape the special characters with \, i.e. to match / use [\/]
var matchedString = Regex.Match(s, @"[\/]redirecturl[\?]u[\=](?<link>.*)[\/]").Groups["link"];

using System.Text.RegularExpressions;
// A description of the regular expression:
//
// [Protocol]: A named capture group. [\w+]
// Alphanumeric, one or more repetitions
// :\/\/
// :
// Literal /
// Literal /
// [Domain]: A named capture group. [[\w#][\w.:#]+]
// [\w#][\w.:#]+
// Any character in this class: [\w#]
// Any character in this class: [\w.:#], one or more repetitions
// Literal /, zero or one repetitions
// Any character in this class: [\w\.?=%&=\-#/$,], any number of repetitions
public Regex MyRegex = new Regex(
"(?<Protocol>\\w+):\\/\\/(?<Domain>[\\w#][\\w.:#]+)\\/?[\\w\\."+
"?=%&=\\-#/$,]*",
RegexOptions.IgnoreCase
| RegexOptions.CultureInvariant
| RegexOptions.IgnorePatternWhitespace
| RegexOptions.Compiled
);
// Replace the matched text in the InputText using the replacement pattern
string result = MyRegex.Replace(InputText,MyRegexReplace);
// Split the InputText wherever the regex matches
string[] results = MyRegex.Split(InputText);
// Capture the first Match, if any, in the InputText
Match m = MyRegex.Match(InputText);
// Capture all Matches in the InputText
MatchCollection ms = MyRegex.Matches(InputText);
// Test to see if there is a match in the InputText
bool IsMatch = MyRegex.IsMatch(InputText);
// Get the names of all the named and numbered capture groups
string[] GroupNames = MyRegex.GetGroupNames();
// Get the numbers of all the named and numbered capture groups
int[] GroupNumbers = MyRegex.GetGroupNumbers();

Related

Regex get nth value separated with slash

I have universal regex code that uses the Groups[1] value to extract the result. It's easy to extract SN and Ref with a pattern such as sn=(.*?)\., but it's much harder to get, for example, PKXC or V928. I have to use Groups[1] because users of this application can choose which value to display; it could be NC339 or PKXC.
//var source = "SN=1395939213.#variable/OGT84/PKXC/Undetermined.Thank You#{customer}"
//sometimes like this
var source = "SN=8029758034.Ref=BFO7Y95B3KN5#resolved/NC339/V928/ClearenceBBF.Brief#{supervisor}/verified";
var value = Regex.Match(source, pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline).Groups[1].Value;
You can use
^(?:[^/]*/){2}([^/]+)
See the regex demo.
Details
^ - start of a string
(?:[^/]*/){2} - two occurrences of any chars other than / and then a /
([^/]+) - Group 1: one or more chars other than /.
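To make the repetition count configurable, the pattern above could be built from a segment index. The following is only a sketch: the class NthSegmentDemo and the helper NthSegment are hypothetical names, and n is assumed to be zero-based.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NthSegmentDemo
{
    // Hypothetical helper: returns the n-th (zero-based) slash-separated
    // segment via Groups[1], or null when there are not enough segments.
    public static string NthSegment(string source, int n)
    {
        // Skip n occurrences of "segment/", then capture the next segment.
        string pattern = @"^(?:[^/]*/){" + n + @"}([^/]+)";
        Match m = Regex.Match(source, pattern);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        string source = "SN=8029758034.Ref=BFO7Y95B3KN5#resolved/NC339/V928/ClearenceBBF.Brief#{supervisor}/verified";
        Console.WriteLine(NthSegment(source, 1)); // NC339
        Console.WriteLine(NthSegment(source, 2)); // V928
    }
}
```

With n = 2 the generated pattern is exactly the one shown above, so the value still arrives in Groups[1] as the question requires.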
Try the following:
string pattern = @"SN=(?'sn'\d+).*\{(?'supervisor'[^}]+)";
string input = "SN=1395939213.#variable/OGT84/PKXC/Undetermined.Thank You#{customer}";
Match match = Regex.Match(input, pattern);
string sn = match.Groups["sn"].Value;
string supervisor = match.Groups["supervisor"].Value;

Extract substring with Regex

I'm trying to extract a substring with regex but I'm having some trouble...
The string is built from columns of strings and I need the 4th column only.
string stringToExtractFrom = "289 120 00001110 ??
4Control#SimApi##QAEAAV01#ABV01##Z = ??4Control#SimApi##QAEAAV01#ABV01##Z
(public: class SimApi::Control & __thiscall SimApi::Control::operator=(class
SimApi::Control const &))"
string pattern = @"\s+\d+\s+\d+\s+\S+\s(.*)\=";
RegexOptions options = RegexOptions.Multiline;
Regex regX = new Regex(pattern, options);
Match m = regX.Match(stringToExtractFrom);
while (m.Success)
{
Group g = m.Groups[1];
defData += g+"\n";
m = m.NextMatch();
}
this is the wanted string: ??
4Control#SimApi##QAEAAV01#ABV01##Z
with the string below it worked when i got the substring i want as a group
1 0 00002E00 ??0ADOFactory#SimApiEx##QAE#ABV01##Z =
??0ADOFactory#SimApiEx##QAE#ABV01##Z (public: __thiscall
SimApiEx::ADOFactory::ADOFactory(class SimApiEx::ADOFactory const &))
If the second string works for you and the first one does not, you might first match 1+ digits and use \S+ for the third part. Then use a negated character class to capture everything that is not an equals sign:
\d+\s+\d+\s+\S+\s+([^=]+) =
See a .NET regex demo | C# Demo
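A minimal sketch of that pattern in C#, assuming the input row sits on a single line (the class ColumnDemo and helper FourthColumn are illustrative names only):

```csharp
using System;
using System.Text.RegularExpressions;

public static class ColumnDemo
{
    // Hypothetical helper: returns the 4th column (the text before " ="),
    // or null when the row does not have the expected shape.
    public static string FourthColumn(string row)
    {
        // Two numeric columns, one non-whitespace column, then capture up to " =".
        Match m = Regex.Match(row, @"\d+\s+\d+\s+\S+\s+([^=]+) =");
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        string row = "1 0 00002E00 ??0ADOFactory#SimApiEx##QAE#ABV01##Z = "
                   + "??0ADOFactory#SimApiEx##QAE#ABV01##Z (public: __thiscall "
                   + "SimApiEx::ADOFactory::ADOFactory(class SimApiEx::ADOFactory const &))";
        Console.WriteLine(FourthColumn(row)); // ??0ADOFactory#SimApiEx##QAE#ABV01##Z
    }
}
```

Because [^=] also matches line breaks, a column value that wraps onto the next line would be captured with the line break included.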

How to split string by another string

I have this string (it's from EDI data):
ISA*ESA?ISA*ESA?
The * indicates it could be any character and can be of any length.
? indicates any single character.
Only the ISA and ESA are guaranteed not to change.
I need this split into two strings which could look like this: "ISA~this is date~ESA|" and
"ISA~this is more data~ESA|"
How do I do this in c#?
I can't use string.Split, because it doesn't really have a delimiter.
You can use Regex.Split for accomplishing this
string splitStr = "|", inputStr = "ISA~this is date~ESA|ISA~this is more data~ESA|";
var regex = new Regex($@"(?<=ESA){Regex.Escape(splitStr)}(?=ISA)", RegexOptions.Compiled);
var items = regex.Split(inputStr);
foreach (var item in items) {
Console.WriteLine(item);
}
Output:
ISA~this is date~ESA
ISA~this is more data~ESA|
Note that if the text between ISA and ESA can itself contain the pattern we are splitting on (ESA, the separator, then ISA), you will have to find a way around that.
To explain the Regex a bit:
(?<=ESA) Look-behind assertion. ESA must appear immediately before the separator, but it is not consumed by the match
(?=ISA) Look-ahead assertion. ISA must appear immediately after the separator, but it is not consumed by the match
Using these look-around assertions you can find the correct | character for splitting
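The same look-around idea might also cover the case where the separator character is not known in advance. The sketch below splits at the empty position that follows "ESA" plus any one character and precedes "ISA", so each part keeps its trailing separator. SegmentSplitDemo and SplitSegments are hypothetical names, and the sketch assumes the data between ISA and ESA never contains that same sequence.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SegmentSplitDemo
{
    // Hypothetical helper: splits between "ESA<any char>" and "ISA"
    // without removing any characters from the input.
    public static string[] SplitSegments(string input)
    {
        // Both parts are zero-width assertions, so the match itself is empty.
        return Regex.Split(input, @"(?<=ESA.)(?=ISA)");
    }

    public static void Main()
    {
        string input = "ISA~this is date~ESA|ISA~this is more data~ESA|";
        foreach (string part in SplitSegments(input))
        {
            Console.WriteLine(part);
        }
        // Prints:
        // ISA~this is date~ESA|
        // ISA~this is more data~ESA|
    }
}
```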
Simply use
int x = whateverString.IndexOf("?ISA"); // replace ? with the actual character here
and then take the substring from 0 to that index, and the substring from that index to the end of the string.
Edit:
If ? is not known, we can just use the regex Pattern and Matcher:
Matcher matcher = Pattern.compile("ISA.*ESA").matcher(whateverString);
if (matcher.find()) {
int x = matcher.start(); // start index of the match
}
Here x would give the start index of that match.
Edit: I mistakenly read this as a Java question. For C#:
string pattern = @"ISA.*?ESA"; // lazy, so each ISA...ESA block is a separate match
Regex myRegex = new Regex(pattern, RegexOptions.IgnoreCase);
Match m = myRegex.Match(whateverString); // m is the first match
while (m.Success)
{
Console.WriteLine(m.Value);
m = m.NextMatch(); // more matches
}
RegEx will probably be the best for this. See this link
Mask would be
ISA(?<data1>.*?)ESA.ISA(?<data2>.*?)ESA.
This will give you 2 groups with data you need
Match match = Regex.Match(input, @"ISA(?<data1>.*?)ESA.ISA(?<data2>.*?)ESA.", RegexOptions.IgnoreCase);
if (match.Success)
{
var data1 = match.Groups["data1"].Value;
var data2 = match.Groups["data2"].Value;
}
Use Regex.Matches if you need to find multiple matches, and specify different RegexOptions if needed.
It's kinda hacky but you could do...
string x = "ISA*ESA?ISA*ESA?";
x = x.Replace("*","~"); // OR SOME OTHER DELIMITER
string[] y = x.Split('~');
Not perfect in all situations, but it could solve your problem simply.
You could split by "ISA" and "ESA" and then put the parts back together.
string input = "ISA~this is date~ESA|ISA~this is more data~ESA|";
string start = "ISA",
end = "ESA";
var splitedInput = input.Split(new[] { start, end }, StringSplitOptions.None);
var firstPart = $"{start}{splitedInput[1]}{end}{splitedInput[2]}";
var secondPart = $"{start}{splitedInput[3]}{end}{splitedInput[4]}";
// firstPart = "ISA~this is date~ESA|"
// secondPart = "ISA~this is more data~ESA|"
Use a Regex like ISA(.+?)ESA and select the first group
string input = "ISA~mycontent+ESA";
Match match = Regex.Match(input, @"ISA(.+?)ESA", RegexOptions.IgnoreCase);
if (match.Success)
{
string key = match.Groups[1].Value;
}
Instead of "splitting" by a string, I would instead describe your question as "grouping" by a string. This can easily be done using a regular expression:
Regular expression: ^(ISA.*?(?=ESA)ESA.)(ISA.*?(?=ESA)ESA.)$
Explanation:
^ - asserts position at start of the string
( - start capturing group
ISA - match string ISA exactly
.*?(?=ESA) - match any character 0 or more times, positive lookahead on the
string ESA (basically match any character until the string ESA is found)
ESA - match string ESA exactly
. - match any character
) - end capturing group
repeat one more time...
$ - asserts position at end of the string
Try it on Regex101
Example:
string input = "ISA~this is date~ESA|ISA~this is more data~ESA|";
Regex regex = new Regex(@"^(ISA.*?(?=ESA)ESA.)(ISA.*?(?=ESA)ESA.)$",
RegexOptions.Compiled);
Match match = regex.Match(input);
if (match.Success)
{
string firstValue = match.Groups[1].Value; // "ISA~this is date~ESA|"
string secondValue = match.Groups[2].Value; // "ISA~this is more data~ESA|"
}
There are two answers to the question "How to split a string by another string".
var matches = input.Split(new [] { "ISA" }, StringSplitOptions.RemoveEmptyEntries);
and
var matches = Regex.Split(input, "ISA").ToList();
However, the first removes empty entries, while the second does not.
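A small sketch of that difference, using the sample input from the question (the class SplitComparisonDemo and its two methods are illustrative names, not library APIs):

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitComparisonDemo
{
    // string.Split with RemoveEmptyEntries drops the empty piece before the leading "ISA".
    public static string[] ViaString(string input)
    {
        return input.Split(new[] { "ISA" }, StringSplitOptions.RemoveEmptyEntries);
    }

    // Regex.Split keeps that empty piece as the first element.
    public static string[] ViaRegex(string input)
    {
        return Regex.Split(input, "ISA");
    }

    public static void Main()
    {
        string input = "ISA~this is date~ESA|ISA~this is more data~ESA|";
        Console.WriteLine(ViaString(input).Length); // 2
        Console.WriteLine(ViaRegex(input).Length);  // 3
    }
}
```

Note that both approaches remove the "ISA" delimiter itself from the results, so it has to be prepended again if the original segments are needed.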

c# regex for number/number/string pattern

I am trying to find {a number} / { a number } / {a string} patterns. I can get number / number to work, but when I add / string it does not.
Example of what I'm trying to find:
15/00969/FUL
My regex:
Regex reg = new Regex(@"\d/\d/\w");
You should use + quantifier that means 1 or more times and it applies to the pattern preceding the quantifier, and I would add word boundaries \b to only match whole words:
\b\d+/\d+/\w+\b
C# code (using verbatim string literal so that we just could copy/paste regular expressions from testing tools or services without having to escape backslashes):
var rx = new Regex(@"\b\d+/\d+/\w+\b");
If you want to specify the exact number of characters matched by some part of the pattern, you can use {}s:
\b\d{2}/\d{5}/\w{3}\b
And, finally, if you have only letters in the string, you can use \p{L} (or \p{Lu} to only capture uppercase letters) shorthand class in C#:
\b\d{2}/\d{5}/\p{L}{3}\b
Sample code (also featuring capturing groups introduced with unescaped ( and )):
var rx = new Regex(@"\b(\d{2})/(\d{5})/(\p{L}{3})\b");
var res = rx.Matches("15/00969/FUL").OfType<Match>()
.Select(p => new
{
first_number = p.Groups[1].Value,
second_number = p.Groups[2].Value,
the_string = p.Groups[3].Value
}).ToList();
Output:
Regex reg = new Regex(@"\d+/\d+/\w+");
Complete example:
Regex r = new Regex(@"(\d+)/(\d+)/(\w+)");
string input = "15/00969/FUL";
var m = r.Match(input);
if (m.Success)
{
string a = m.Groups[1].Value; // 15
string b = m.Groups[2].Value; // 00969
string c = m.Groups[3].Value; // FUL
}
You are missing the quantifiers in your Regex.
If you want to match 1 or more items you should use +.
If you already know the number of items you need to match, you can specify it using {x}, or {x,y} for a range (x and y being two numbers).
So your regex would become:
Regex reg = new Regex(@"\d+/\d+/\w+");
For example, if all the elements you want to match have this format ({2 digits}/{5 digits}/{3 letters}), you could write:
Regex reg = new Regex(@"\d{2}/\d{5}/\w{3}");
And that would match 15/00969/FUL
More info on the Regular Expressions can be found here
bool match = new Regex(@"[\d]+[/][\d]+[/][\w]+").IsMatch("15/00969/FUL"); //true
Regular Expression:
[\d]+ //one or more digits
[\w]+ //one or more alphanumeric characters
[/] // '/'-character

Regular expression match substring

I tried to create a regular expression which pulls everything that matches:
[aA-zZ]{2}[0-9]{5}
The problem is that I want to exclude from matching when I have eg. ABCD12345678
Can anyone help me resolve this?
EDIT1:
I am looking two letters and five digits in the string, but I want to exclude from matching when I have string like ABCD12345678, because when I use above regular expression it will return CD12345.
EDIT2:
I didn't check everything but I think I found answer:
WHEN field is null then field
WHEN fnRegExMatch(field, '[a-zA-Z]{2}[0-9]{5}') = 'N/A' THEN field
WHEN field like '%[^a-z][a-z][a-z][0-9][0-9][0-9][0-9][0-9][^0-9]%' or field like '[a-z][a-z][0-9][0-9][0-9][0-9][0-9][^0-9]%' THEN fnRegExMatch(field, '[a-zA-Z]{2}[0-9]{5}')
ELSE field
First, [aA-zZ] doesn't make sense: the A-z range also covers the punctuation characters that sit between Z and a in ASCII. Second, use word boundaries:
\b[a-zA-Z]{2}[0-9]{5}\b
You could also use case insensitive modifier:
(?i)\b[a-z]{2}[0-9]{5}\b
According to your comment, it seems you may have an underscore after the five digits. In this case, the word boundary doesn't work, and you have to use this instead:
(?i)(?<![a-z])([a-z]{2}[0-9]{5})(?![0-9])
(?<![a-z]) is a negative lookbehind that asserts there is no letter before the two mandatory letters
(?![0-9]) is a negative lookahead that asserts there is no digit after the five mandatory digits
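To see how the look-around version behaves compared with the word-boundary version, here is a short C# sketch (the question itself appears to use a SQL-side fnRegExMatch function, so treat this purely as an illustration; CodeMatcher and FindCode are made-up names):

```csharp
using System;
using System.Text.RegularExpressions;

public static class CodeMatcher
{
    private static readonly Regex Pattern =
        new Regex(@"(?i)(?<![a-z])([a-z]{2}[0-9]{5})(?![0-9])");

    // Hypothetical helper: returns the first two-letters-plus-five-digits code
    // that is not glued to extra letters before it or extra digits after it.
    public static string FindCode(string input)
    {
        Match m = Pattern.Match(input);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(FindCode("ABCD12345678") ?? "(no match)"); // (no match)
        Console.WriteLine(FindCode("AB12345_x") ?? "(no match)");    // AB12345
        // The word-boundary version rejects the underscore case,
        // because _ counts as a word character:
        Console.WriteLine(Regex.IsMatch("AB12345_x", @"\b[a-zA-Z]{2}[0-9]{5}\b")); // False
    }
}
```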
This would be the code, along with usage samples.
public static Regex regex = new Regex(
"\\b[a-zA-Z]{2}\\d{5}\\b",
RegexOptions.CultureInvariant
| RegexOptions.Compiled
);
//// Replace the matched text in the InputText using the replacement pattern
// string result = regex.Replace(InputText,regexReplace);
//// Split the InputText wherever the regex matches
// string[] results = regex.Split(InputText);
//// Capture the first Match, if any, in the InputText
// Match m = regex.Match(InputText);
//// Capture all Matches in the InputText
// MatchCollection ms = regex.Matches(InputText);
//// Test to see if there is a match in the InputText
// bool IsMatch = regex.IsMatch(InputText);
//// Get the names of all the named and numbered capture groups
// string[] GroupNames = regex.GetGroupNames();
//// Get the numbers of all the named and numbered capture groups
// int[] GroupNumbers = regex.GetGroupNumbers();
