I have atext file that contains a lot of records like this:
05/11/04+11:10PM+117+04+0218735793+0'00+00:01'51+TR+
or
05/11/04+11:10PM+117+04+0218735793+0'00+00:01'51+TR+
(without INCOMING)
I want to validate these lines and invalidate all other lines(empty lines or comment lines and corrupted lines.
Is it OK to regular expressions for this purpose?
If yes, what is the regular expression?
Thanks.
I wouldn't try to use a regex for all of it. For example, you have what looks like a date and a time in there, and a couple of other fields that could be times of some kind, which are tricky to do with regular expressions.
I'd handle this with
String.Split on +
Check you have the right number of fields
Check each field individually depending on what it represents, e.g.
DateTime.TryParseExact
DateTime.TryParseExact
regular expression for 3 digits
...
^\d\d\/\d\d\/\d\d\+\d\d:\d\d[AP]M\+[\d+':]+\+TR\+$
^^^^^^^^
I "cheated" in the marked section because I'm not sure exactly what's staying the same, but from the rest of the expression I think you should get the idea.
var regexPattern = #"^\d{2}/\d{2}/\d{2}\+\d{2}:\d{2}(?:AM|PM)\+\d{3}\+\d{2}" +
#"\+\d{10}\+\d'\d{2}\+\d{2}:\d{2}'\d{2}\+TR\+$"
Related
I need to do some very light parsing of C# (actually transpiled Razor code) to replace a list of function calls with textual replacements.
If given a set containing {"Foo.myFunc" : "\"def\"" } it should replace this code:
var res = "abc" + Foo.myFunc(foo, Bar.otherFunc( Baz.funk()));
with this:
var res = "abc" + "def"
I don't care about the nested expressions.
This seems fairly trivial and I think I should be able to avoid building an entire C# parser using something like this for every member of the mapping set:
find expression start (e.g. Foo.myFunc)
Push()/Pop() parentheses on a Stack until Count == 0.
Mark this as expression stop
replace everything from expression start until expression stop
But maybe I don't need to ... Is there a (possibly built-in) .NET library that can do this for me? Counting is not possible in the family of languages that RE is in, but maybe the extended regex syntax in C# can handle this somehow using back references?
edit:
As the comments to this answer demonstrates simply counting brackets will not be sufficient generally, as something like trollMe("(") will throw off those algorithms. Only true parsing would then suffice, I guess (?).
The trick for a normal string will be:
(?>"(\\"|[^"])*")
A verbatim string:
(?>#"(""|[^"])*")
Maybe this can help, but I'm not sure that this will work in all cases:
<func>(?=\()((?>/\*.*?\*/)|(?>#"(""|[^"])*")|(?>"(\\"|[^"])*")|\r?\n|[^()"]|(?<open>\()|(?<-open>\)))+?(?(open)(?!))
Replace <func> with your function name.
Useless to say that trollMe("\"(", "((", #"abc""de((f") works as expected.
DEMO
I have a collection of url's and i need to write regular expression to filter needed content.
/data/43492-someText/"
/data/221639-anotherText/"
/data/116345-differentText/"
/data/6630-boooring/"
/data/220742-foo/"
What i need is only strings without /" on the end, so
/data/220742-foo
My Regular Expression looks like this:
#"/data/[0-9]{1,10}-.*""\s"
Note: I dont want to do this with string replace, because of some limitations on my project.
If that (string not ending in /) is the only requirement, then use something like this:
var desiredUrls = urls.Where(url => !url.EndsWith("/\""))
I initially read the question as a desire to filter urls but I can see how it could be a mapping question.
var withoutSuffix = urls.Select(url => url.TrimEnd("/\"".ToCharArray());
I think Regular Expressions are kind of overkill for what you're trying to do.
Anyways you can use something like this:
#"/data/[0-9]{1,10}-[^/]+"
You could use TrimEnd to remove the characters from the end of a string:
s.TrimEnd('/', '"')
You could use something like:
(/data/[0-9]{1,10}-.+)/
And the string without the trailing / will be in the first capture group.
I have been using this regular expression to extract file names out of file path strings:
Regex r = new Regex(#"\w+[.]\w+$+");
This works, as long as there is no space in the file name. For example:
r.Match("c:\somestuff\myfile.doc").Value = "myfile.doc"
r.Match("c:\somestuff\my file.doc").Value = "file.doc"
I need my regular expression to give me "my file.doc", and not just "file.doc"
I tried messing around with the expression myself. In particular I tried adding \s+ after learning that that is for matching whitespaces. I didn't get the results I hoped for.
I did devise a solution just to get the job done: I started at the end of the string, went backwards until a backslash was reached. This gave me the file name in reverse order (i.e. cod.elifym) into an array of chars, then I used Array.Reverse() to turn it around. However I'd like to learn how to achieve this by simply modifying my original regular expression.
Does it have to be a regular expression? Use System.IO.Path.GetFileName() instead.
Regex r = new Regex(#"[\w ]+\.\w+$");
A working regex might simply look like:
[^\\]+$
Consider using:
System.IO.Path.GetFileName(path)
I am new to regular expressions and the one that i have written might be a very simple one but donot know where I am wrong.
#"^([a-zA-Z._]+)#([\d]+)"
This RE is for the following string:
somename#somenumber
Now i am trying to retrieve the somename and somenumber. This is what i did:
ac.name = m.Groups[0].Value;
ac.number = m.Groups[1].Value;
Here ac.name reads the complete string, and ac.number reads somenumber. Where am I wrong in ac.name?
i guess the regex is correct, the problem is, you get the ac.name not from group 1 but group(0), which is the whole string. try this:
ac.name = m.Groups[1].Value;
ac.number = m.Groups[2].Value;
This regex is correct. I think your mistake is in somewhere else. You seem to use C#. So, you should think about the regex usage in the language.
Looking to the code sample in MSDN, you need to use 1-based indexes while accessing Groups instead of zero-based (as also Kent suggested). So, use this:
String name = m.Groups[1].Value;
String number = m.Groups[2].Value;
use this regex (\w+)#(\d+([.,]\d+)?)
Groups[1] will be contain name
Groups[2] will be contain number
I think you should move the + into the capture group:
#"^([a-zA-Z._]+)#([\d]+)"
If this is C#, try without the ^
([a-zA-Z\._]+)#([\d]+)
I just tried it out and it groups properly
Update: escaped the .
If you want only one match (and hence the ^ in original expression), use .Match instead of .Matches method. See MSDN documentation on Regular Expression Classes.
It would be great if someone could provide me the Regular expression for the following string.
Sample 1: <div>abc</div><br>
Sample 2: <div>abc</div></div></div></div></div><br>
As you can see in the samples provided above, I need to match the string no matter how many number of </div> occurs. If there occurs any other string between </div> and <br>, say like this <div>abc</div></div></div>DEF</div></div><br> OR <div>abc</div></div></div></div></div>DEF<br>, then the Regex should not match.
Thanks in advance.
Try this:
<div>([^<]+)(?:<\/div>)*<br>
As seen on rubular
Notes:
This only works if there are not tags in the abc part (or anything that has a < symbol).
You might want to use start and end of string anchors (^<div>([^<]+)(?:<\/div>)*<br>$ if you want your string to match the pattern exactly.
If you want to allow the abc part to be empty, use * instead of +
That being said, you should be wary of using regex to parse HTML.
In this example, you can use regex because you are parsing a (hopefully) known, regular subset of HTML. But a more robust solution (ie: an [X]HTML parser like HtmlAgilityPack) is preferred when it comes to parsing HTML.
You need to use a real parser. Things like infinitely nested tags can't be handled via regex.
You could also include a named group in the the expression, e.g.:
<div>(?<text>[^<]*)(?:<\/div>)*<br>
Implemented in C#:
var regex = new Regex(#"<div>(?<text>[^<]*)(?:<\/div>)*<br>");
Func<Match, string> getGroupText = m => (m.Success && m.Groups["text"] != null) ? m.Groups["text"].Value : null;
Func<string, string> getText = s => getGroupText(regex.Match(s));
Console.WriteLine(getText("<div>abc</div><br>"));
Console.WriteLine(getText("<div>123</div></div></div></div></div><br>"));
NullUserException's answer is good. Here are a couple of questions, and variations, depending on what you want.
Do you want to prevent anything from occurring before the open div tag? If so, keep the ^ at the beginning of the regex. If not, drop it.
The rest of this post refers to the following section of the regex:
([^<]+?)
Do you want to capture the contents of the div, or just know that it matches your form? To capture, leave it as is. If you don't need to capture, drop the parentheses from the above.
Do you want to match if there is nothing inside the div? If so change the + in the above to *
Finally, although it will work fine, you don't need the ? in the above.
I think, this regex is more flexible:
<div\b[^><]*+>(?>.*?</div>)(?:\s*+</div>)*+\s*+<br(?:\s*+/)?>
I don't include the ^ and $ in the beginning and the end of my regex because we cannot assure that your sample will always in a single line.