Finding a pattern in C# [duplicate] - c#

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 2 years ago.
I have a couple of strings that I want to transform as shown below.
In the first two examples, " is included in the input string.
But " does not always come with the input string, as the last two examples show.
Basically, I need the string between |" and "|, or the string between the first and last occurrence of |.
Can someone please let me know how to find the match for the output string that I need which will work for all of these strings? I am trying to code these in C#.
Thanks in advance for any help

I would propose an alternative to regex. Simply using Substring and Replace:
List<string> input = new List<string>
{
"501000061|\"B084PD449Q|2088|1\"|",
"504000585|\"B000NSIAG0|3115|0\"|",
"508000036|B084S1FVH5|42|1|",
"504000584|B000NSIAG0|3115|0|"
};
foreach (var element in input)
{
string transformed = element.Substring(10, element.Length - 11)
.Replace("\"", string.Empty);
Console.WriteLine(transformed);
}
Output:
B084PD449Q|2088|1
B000NSIAG0|3115|0
B084S1FVH5|42|1
B000NSIAG0|3115|0
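The fixed offsets above rely on the leading ID always being nine characters long. If that cannot be guaranteed, a minimal sketch that follows the question's own wording (everything between the first and last |, minus any quotes) might look like this. The class and method names are made up for illustration, and returning an empty string for malformed input is an assumption.

```csharp
using System;

public static class PipeExtractor
{
    // Returns the text between the first and last '|', without surrounding quotes.
    // Assumption: input with fewer than two pipes yields an empty string.
    public static string Between(string input)
    {
        int first = input.IndexOf('|');
        int last = input.LastIndexOf('|');
        if (first < 0 || last <= first)
        {
            return string.Empty;
        }
        return input.Substring(first + 1, last - first - 1).Trim('"');
    }

    public static void Main()
    {
        Console.WriteLine(Between("501000061|\"B084PD449Q|2088|1\"|"));
        Console.WriteLine(Between("508000036|B084S1FVH5|42|1|"));
    }
}
```

Because it searches for the delimiters instead of counting characters, the same method should also cope with IDs of a different length.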

^[0-9]{9}\|"?([A-Z0-9]{10})\|([0-9]{2,})\|([0-9])"?\|$
This regex is a bit more rigid than the ones already proposed, based on the input examples that you've provided. Each quantifier sits inside its capturing group, so the group captures the whole field rather than only its last character.
I suggest you look into the non-regex solutions that have already been pointed out. However, if you absolutely must use regex, here's how to do it in C# for this example:
var pattern = @"^[0-9]{9}\|""?([A-Z0-9]{10})\|([0-9]{2,})\|([0-9])""?\|$";
var replacement = "$1|$2|$3";
var input = "501000061|\"B084PD449Q|2088|1\"|";
var result = Regex.Replace(input, pattern, replacement); // "B084PD449Q|2088|1"
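If the field-by-field validation is not needed, a looser pattern that only cares about the first pipe, the last pipe, and the optional quotes may be enough. The following is a sketch under that assumption, not the answerer's pattern; the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PipeRegex
{
    // Skip everything up to the first pipe, allow an optional quote on each side,
    // and lazily capture whatever lies before the final pipe.
    private static readonly Regex Pattern = new Regex(@"^[^|]*\|""?(.*?)""?\|$");

    // Returns the captured middle part, or an empty string when nothing matches.
    public static string Extract(string input)
    {
        Match m = Pattern.Match(input);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        Console.WriteLine(Extract("504000585|\"B000NSIAG0|3115|0\"|"));
        Console.WriteLine(Extract("504000584|B000NSIAG0|3115|0|"));
    }
}
```

The trade-off is that this version accepts any content between the pipes, whereas the stricter pattern rejects rows whose fields do not look like the samples.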

Here's one without regex:
using System;
using System.Linq;
public class Program
{
public static void Main()
{
var inp = "504000585|\"B000NSIAG0|3115|0\"|";
var res = string
.Join("|", inp.Split(new []{'|'}, StringSplitOptions.RemoveEmptyEntries).Skip(1))
.Replace("\"", "");
Console.WriteLine(res);
}
}
https://dotnetfiddle.net/i39XUY
B000NSIAG0|3115|0

Related

C#: get rid of multiple invalid characters in string [duplicate]

This question already has answers here:
Replace multiple characters in a C# string
(15 answers)
Closed 3 years ago.
I am new to C#. Say that I have a string like this:
string test = "yes/, I~ know# there# are% invalid£ characters$ in& this* string^";
If I wanted to get rid of a single invalid symbol, I would do:
if (test.Contains('/'))
{
test = test.Replace("/","");
}
But is there a way I can use a list of symbols as argument of the Contains and Replace functions, instead of deleting symbols one by one?
I would go with the regular expression solution
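test = Regex.Replace(test, @"\/|~|#|%|£|\$|&|\*|\^", "");
Add one alternative, separated by |, for each character you want removed, then call Replace.
Bear in mind that $, * and ^ have a special meaning in a regex and must be escaped with a backslash. \/ simply matches /; .NET does not require the slash to be escaped, but escaping it is harmless.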
You'll likely be better off defining acceptable characters than trying to think of and code for everything you need to eliminate.
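As a rough sketch of that allow-list idea: a negated character class removes everything that is not explicitly permitted. The permitted set here (ASCII letters, digits, spaces and commas) is an assumption chosen to suit the sample string, and the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AllowList
{
    // Removes every character that is NOT an ASCII letter, digit, space or comma.
    // Assumption: this allowed set fits the data; widen it as needed.
    public static string KeepAllowed(string input)
    {
        return Regex.Replace(input, @"[^a-zA-Z0-9 ,]", "");
    }

    public static void Main()
    {
        string test = "yes/, I~ know# there# are% invalid£ characters$ in& this* string^";
        Console.WriteLine(KeepAllowed(test));
    }
}
```

New kinds of unwanted symbols are then rejected automatically, without having to extend a deny-list each time.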
Because you mention that you are learning, sounds like the perfect time to learn about Regular Expressions. Here are a couple of links to get you started:
Regular Expression Language - Quick Reference (MSDN)
C# Regex.Match Examples (DotNetPerls)
I don't think there is such a feature out of the box.
I think your idea is pretty much on point, although in my opinion you don't really need the if (test.Contains(..)) part. With that check, the string is scanned once to see whether the character is present, and then scanned again to replace it.
It would be faster just to replace the special characters right away. So...
List<string> specialChars = new List<string>() {"*", "/", "&"};
for (var i = 0; i < specialChars.Count; i++)
{
test = test.Replace(specialChars[i],"");
}
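If the aim is to strip characters that are not allowed in file names or paths, the framework already provides the lists:
Path.GetInvalidFileNameChars() and Path.GetInvalidPathChars()
Both sets depend on the platform and do not include symbols such as ~, # or %. The code would look something like this: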
string illegal = "yes/, I~ know# there# are% invalid£ characters$ in& this* string^";
string invalid = new string(Path.GetInvalidFileNameChars()) + new
string(Path.GetInvalidPathChars());
foreach (char c in invalid)
{
illegal = illegal.Replace(c.ToString(), "");
}
Another variant:
List<string> chars = new List<string> {"!", "#"};
string test = "My funny! string#";
foreach (var c in chars)
{
test = test.Replace(c,"");
}
No need to use Contains as Replace does that.
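Each Replace call in the loops above builds a new string, so a long symbol list means many intermediate copies. A single pass over the characters is one possible alternative; this is a sketch only, with a hypothetical class name and a symbol set taken from the sample string in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SymbolStripper
{
    // Symbols to drop; a string is an IEnumerable<char>, so it can seed the set.
    private static readonly HashSet<char> Unwanted = new HashSet<char>("/~#%£$&*^");

    // Walks the input once and keeps only characters that are not in the set.
    public static string Strip(string input)
    {
        return new string(input.Where(c => !Unwanted.Contains(c)).ToArray());
    }

    public static void Main()
    {
        Console.WriteLine(Strip("yes/, I~ know# there# are% invalid£ characters$ in& this* string^"));
    }
}
```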

Regex.Split Not working properly? [duplicate]

This question already has answers here:
Is this a bug in .NET's Regex.Split?
(2 answers)
Closed 9 years ago.
I have the following input:
void Main()
{
string inputData = "37.7879\r\n-122.3874\r\n40.7805\r\n-111.9288\r\n36.0667\r\n-115.0927\r\n37.7879\r\n-122.3874";
// string[] inputLines = Regex.Split(inputData, @"\r\n");
string[] inputLines = Regex.Split(inputData, @"(\r)?\n");
Console.WriteLine("The size of the list is: {0}", inputLines.Length);
bool results = inputLines.All(IsValidNumber);
foreach (string line in inputLines)
{
Console.WriteLine("{0} is: {1}", line, IsValidNumber(line));
}
}
// Define other methods and classes here
public bool IsValidNumber(string input)
{
Match match = Regex.Match(input, @"^-?\d+\.\d+$", RegexOptions.IgnoreCase);
return match.Success;
}
I am trying to do a Regex.Split on @"\r\n". If I use the commented line, I get the expected results. If I use the uncommented one, I do not. I'm almost 100% positive that my regex is correct if the "\r" doesn't exist (which may or may not be the case).
I'm expecting 8 values from inputData that I'm trying to validate if they're all valid numbers.
Is there a possibility that my "(\r)?" isn't working correctly? If so, what am I missing?
If your pattern contains a capturing group, Regex.Split includes the captured text in the resulting array alongside the split pieces. Here each of the 7 separators contributes its captured \r, which gives you 15 items instead of just 8.
If you're only trying to make a single character or character class optional, you don't need a group. Try getting rid of the group around \r:
string[] inputLines = Regex.Split(inputData, @"\r?\n");
Alternatively, yes, you can make it a non-capturing group:
string[] inputLines = Regex.Split(inputData, @"(?:\r)?\n");
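To see the effect of the capturing group on the result size, the three patterns can be compared side by side. This is a small sketch with a shortened input (three values, two separators); the helper name is made up.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Number of array elements Regex.Split produces for the given pattern.
    public static int CountParts(string input, string pattern)
    {
        return Regex.Split(input, pattern).Length;
    }

    public static void Main()
    {
        string data = "37.7879\r\n-122.3874\r\n40.7805";

        // Capturing group: each separator's captured "\r" is added to the result.
        Console.WriteLine(CountParts(data, @"(\r)?\n"));
        // No group, or a non-capturing group: only the values are returned.
        Console.WriteLine(CountParts(data, @"\r?\n"));
        Console.WriteLine(CountParts(data, @"(?:\r)?\n"));
    }
}
```

With three values and two separators, the capturing version yields five elements and the other two yield three, which mirrors the 15 versus 8 in the question.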

I need to get a string between two strings using regex in C#

I have a string for example: "GamerTag":"A Talented Boy","GamerTileUrl" and what I have been trying and failing to get is the value: A Talented Boy. I need help creating a regex string to get specifically just A Talented Boy. Can somebody please help me!
var str = "\"GamerTag\":\"A Talented Boy\",\"GamerTileUrl\"";
var colonParts = str.Split(':');
if (colonParts.Length >= 2) {
var commaParts = colonParts[1].Split(',');
var aTalentedBoy = commaParts[0].Trim('"'); // strip the surrounding quotes
var gamerTileUrl = commaParts[1];
}
This allows you to also get other parts of the comma-separated list.
Suppose s is your string (no check here):
s = s.Split(':')[1].Split(',')[0].Trim('"');
If you want to have a Regex solution, here it is:
s = "\"GamerTag\":\"A Talented Boy\",\"GamerTileUrl\"";
Regex reg = new Regex("(?<=:\").+?(?=\")");
s = reg.Match(s).Value;
You can use string methods:
string result = text.Split(':').Last().Split(',').First().Trim('"');
The First/Last extension methods prevent exceptions when the separators are missing.
I think it's safe to assume that your string is actually bigger than what you showed us and that it contains multiple key/value pairs? I think this will do what you are looking for:
str.Split(new[] { "\"GamerTag\":\"" }, StringSplitOptions.None)[1].Split('"')[0];
The first split targets "GamerTag":" and gets everything after it. The second split gets everything up to the next " in the chunk that follows.
How about this?
\:\"([^\"]+)\"
This matches the colon and the opening quote, and then captures any non-quote characters up to the next quote.
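A pattern like this only helps once the capture group is read back out. The sketch below ties the pattern to a specific key so that it still works when the string holds several key/value pairs; the helper name and the null return for a missing key are assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotedValue
{
    // Finds "key":"value" and returns value, or null when the key is absent.
    public static string ValueOf(string text, string key)
    {
        string pattern = "\"" + Regex.Escape(key) + "\":\"([^\"]+)\"";
        Match m = Regex.Match(text, pattern);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        string str = "\"GamerTag\":\"A Talented Boy\",\"GamerTileUrl\"";
        Console.WriteLine(ValueOf(str, "GamerTag"));
    }
}
```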

How to parse a string like '/aaa/bbb/ccc/ddd'?

Here is a sample string, like '/aaa/bbb/ccc/ddd' or '/aaa/zzz'.
I want a Regex that gets each block, like
aaa
bbb
ccc
ddd
So I wrote
Regex r = new Regex(@"(/[^/]*)*");
But it only gives me the last match,
/ddd
How can I get every block? Many thanks.
Update to my question:
I know about Split; I'm asking about regex just for fun.
The situation is that I have a string:
string s = @"ftp://127.0.0.1/TE ST.中文 空格CC/T # ES T.OK/# ##中文 测试##^##!aaa.txt";
I want to encode each block between /.../ (using Uri.EscapeDataString(each)).
I would prefer to use Regex.Replace. Is that possible?
You don't need (and shouldn't use) regex for something so simple.
string s = "/aaa/bbb/ccc/ddd";
var blocks = s.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
foreach(var block in blocks) {
Console.WriteLine(block);
}
Output:
aaa
bbb
ccc
ddd
Edit: Oh, I see what you're trying to do. In that case we don't want to remove empty entries (so that the leading / survives the round trip), and we want to say
string encoded = String.Join("/", s.Split('/').Select(b => Uri.EscapeDataString(b)));
In this case, why not just split on / ?
string[] split = "/aaa/bbb/ccc/ddd".Split('/');
(note, it has been a while since I wrote C#, so the above might contain one or two errors, however the idea should be clear.)
Your initial question, using a Regex:
Regex r = new Regex(@"(/[^/]*)");
var matches = r.Matches("/aaa/bbb/ccc/ddd");
foreach (Match match in matches)
{
// match.Value is "/aaa", then "/bbb", and so on
}
I simply removed the trailing * from your pattern.
Your second question:
Regex r = new Regex(@"/([^/]*)");
var result = r.Replace("1.1.1.1/aaa/bbb/ccc/test.ext", match => {
return string.Format("/{0}", Uri.EscapeDataString(match.Groups[1].Value));
});
Thanks, everyone. I wrote this this morning:
string s = @"ftp://127.0.0.1/# #/中 文.NET/###_%TRY 字符.txt";
s = ftpUrlPattern.Replace(s, new MatchEvaluator((match) => {
return "/" + Uri.EscapeDataString(match.Groups["tag"].Value);
}));
The pattern looks like this:
static Regex ftpUrlPattern = new Regex(@"(?<!/)/(?<tag>[^/]+)");
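The lookbehind (?<!/) together with [^/]+ appears to be what keeps the ftp:// prefix intact: neither slash of // is followed by a non-slash character while also not being preceded by a slash. A self-contained sketch of the same idea, with a made-up ASCII URL and hypothetical class and method names, could be:

```csharp
using System;
using System.Text.RegularExpressions;

public static class FtpUrlEncoder
{
    // A slash that is not preceded by another slash, followed by one path segment.
    private static readonly Regex SegmentPattern = new Regex(@"(?<!/)/(?<tag>[^/]+)");

    // Percent-encodes each path segment and leaves the scheme and host untouched.
    public static string EncodeSegments(string url)
    {
        return SegmentPattern.Replace(
            url,
            m => "/" + Uri.EscapeDataString(m.Groups["tag"].Value));
    }

    public static void Main()
    {
        Console.WriteLine(EncodeSegments("ftp://127.0.0.1/a b/c#d.txt"));
    }
}
```

For the sample URL, the space becomes %20 and the # becomes %23, while ftp://127.0.0.1 is passed through as is.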
You are probably calling the Match method when you want Matches. However, you could also use the Split method and just split on "/" (if I understand your use case correctly).
To replace:
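public string Replace(string input)
{
Regex r = new Regex(@"(/[^/]*)");
var matchEval = new MatchEvaluator(Encode);
// Regex.Replace returns the new string, so pass it back to the caller.
return r.Replace(input, matchEval);
}
public string Encode(Match m)
{
// The match includes the leading '/', so encode only what follows it.
return "/" + Uri.EscapeDataString(m.Value.Substring(1));
}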
Note: I have not run this but the methodology should be sound.

How to extract the useful data with regular expression in C#?

Sorry guys, it seems like I didn't explain my question clearly. Please allow me to rephrase my question again.
I use WebClient to download the whole webpage and I got the content as a string
"
.......
.....
var picArr ="/d/manhua/naruto/516/1.png|/d/manhua/naruto/516/2.png|/d/manhua/naruto/516/3.png|/d/manhua/naruto/516/4.png|/d/manhua/naruto/516/5.png|/d/manhua/naruto/516/6.png|/d/manhua/naruto/516/7.png|/d/manhua/naruto/516/8.png|/d/manhua/naruto/516/9.png|/d/manhua/naruto/516/10.png|/d/manhua/naruto/516/11.png|/d/manhua/naruto/516/12.png|/d/manhua/naruto/516/13.png|/d/manhua/naruto/516/14.png|/d/manhua/naruto/516/15.png|/d/manhua/naruto/516/16.png"
......
";
in this content, I want to get only one line which is
var picArr ="/d/manhua/naruto/516/1.png|/d/manhua/naruto/516/2.png|/d/manhua/naruto/516/3.png|/d/manhua/naruto/516/4.png|/d/manhua/naruto/516/5.png|/d/manhua/naruto/516/6.png|/d/manhua/naruto/516/7.png|/d/manhua/naruto/516/8.png|/d/manhua/naruto/516/9.png|/d/manhua/naruto/516/10.png|/d/manhua/naruto/516/11.png|/d/manhua/naruto/516/12.png|/d/manhua/naruto/516/13.png|/d/manhua/naruto/516/14.png|/d/manhua/naruto/516/15.png|/d/manhua/naruto/516/16.png"
Now I want to use a regular expression to find this string and get the value of picArr.
My regex is
var picArr ="([.]*)"
I think the dot means any character, but it doesn't work. :(
Any ideas?
Thanks a lot
/picArr =\"([^\"]+)\"/
If I got this right that's what you need.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace ExtractFileNames
{
class Program
{
static void Main(string[] args)
{
string pageData = #"blah blah
var picArr =""/d/manhua/naruto/516/1.png|/d/manhua/naruto/516/2.png|/d/manhua/naruto/516/3.png|/d/manhua/naruto/516/4.png|/d/manhua/naruto/516/5.png|/d/manhua/naruto/516/6.png|/d/manhua/naruto/516/7.png|/d/manhua/naruto/516/8.png|/d/manhua/naruto/516/9.png|/d/manhua/naruto/516/10.png|/d/manhua/naruto/516/11.png|/d/manhua/naruto/516/12.png|/d/manhua/naruto/516/13.png|/d/manhua/naruto/516/14.png|/d/manhua/naruto/516/15.png|/d/manhua/naruto/516/16.png""
more blah decimal blah";
var match = Regex.Match(pageData, #"var\s+picArr\s*=\s*""(.*?)""");
var str = match.Groups[1].Value;
var files = str.Split('|');
foreach(var f in files)
{
Console.WriteLine(f);
}
Console.ReadLine();
}
}
}
Output:
/d/manhua/naruto/516/1.png
/d/manhua/naruto/516/2.png
/d/manhua/naruto/516/3.png
/d/manhua/naruto/516/4.png
/d/manhua/naruto/516/5.png
/d/manhua/naruto/516/6.png
/d/manhua/naruto/516/7.png
/d/manhua/naruto/516/8.png
/d/manhua/naruto/516/9.png
/d/manhua/naruto/516/10.png
/d/manhua/naruto/516/11.png
/d/manhua/naruto/516/12.png
/d/manhua/naruto/516/13.png
/d/manhua/naruto/516/14.png
/d/manhua/naruto/516/15.png
/d/manhua/naruto/516/16.png
If you just want to get the filenames, you could just do a split on the pipe:
var picArr = "/d/manhua/naruto/516/1.png|/d/manhua/naruto/516/2.png|/d/manhua/naruto/516/3.png|/d/manhua/naruto/516/4.png|/d/manhua/naruto/516/5.png|/d/manhua/naruto/516/6.png|/d/manhua/naruto/516/7.png|/d/manhua/naruto/516/8.png|/d/manhua/naruto/516/9.png|/d/manhua/naruto/516/10.png|/d/manhua/naruto/516/11.png|/d/manhua/naruto/516/12.png|/d/manhua/naruto/516/13.png|/d/manhua/naruto/516/14.png|/d/manhua/naruto/516/15.png|/d/manhua/naruto/516/16.png";
var splitPics = picArr.Split('|');
foreach (var pic in splitPics)
{
Console.WriteLine(pic);
}
It looks like you want the value of the string literal in your snippet, "/d/manhua/naruto/516/1.png|..."
Get rid of the square brackets. "." matches any character just as it is, without brackets. Square brackets are for matching a limited set of characters: For example, you'd use "[abc]" to match any "a", "b", or "c".
It looks like the brackets have the effect of escaping the ".", a feature I hadn't known about (or forgot, sometime in the Ordovician). But I tested the regex as you have it with the string value replaced with a series of dots, and the regex matched. It's being treated as a literal "." character, which you would more likely try to match with a backslash escape: "\."
So just get rid of the brackets and it should work. It works in VS2008 for me.
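The difference between the bracketed and the plain dot can be checked with a short sketch. The page contents below are shortened stand-ins for the real download, and the helper name is made up.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DotDemo
{
    // Returns the first capture group, or null when the pattern does not match.
    public static string Capture(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        string page = "var picArr =\"/d/1.png|/d/2.png\"";
        string dots = "var picArr =\"...\"";

        // [.] is a literal dot, so this only matches a value made of dots.
        Console.WriteLine(Capture(page, "var picArr =\"([.]*)\"") ?? "(no match)");
        Console.WriteLine(Capture(dots, "var picArr =\"([.]*)\""));
        // An unbracketed dot matches any character except a newline.
        Console.WriteLine(Capture(page, "var picArr =\"(.*?)\""));
    }
}
```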
