Alright, Regex gurus, how can I change my logic to fix this one?
I've made a regex:
(,[,]+)
It's supposed to remove extra commas at the end of a line (the end of a line being \r\n when formatted as a string).
It works (sort of).
This is the string:
Date,1-Jul-18,1-Jul-19,1-Jul-20,1-Jul-21,1-Jul-22,1-Jul-23,1-Jul-24,\r\nDate,1-Jul-18,1-Jul-19,1-Jul-20,1-Jul-21,1-Jul-22,1-Jul-23,1-Jul-24,,,,,\r\nDate,1-Jul-18,1-Jul-19,1-Jul-20,1-Jul-21,1-Jul-22,1-Jul-23,1-Jul-24,,,,,\r\nDate,1-Jul-18,1-Jul-19,1-Jul-20,1-Jul-21,1-Jul-22,1-Jul-23,1-Jul-24,,\r\n
When I run that regex, it gives a result of:
Date,1-Jul-18,1-Jul-19,1-Jul-20,1-Jul-21,1-Jul-22,1-Jul-23,1-Jul-24,\r\nDate,1-Jul-18,1-Jul-19,1-Jul-20,1-Jul-21,1-Jul-22,1-Jul-23,1-Jul-24\r\nDate,1-Jul-18,1-Jul-19,1-Jul-20,1-Jul-21,1-Jul-22,1-Jul-23,1-Jul-24\r\nDate,1-Jul-18,1-Jul-19,1-Jul-20,1-Jul-21,1-Jul-22,1-Jul-23,1-Jul-24\r\n
I need to remove the comma at the end of the first line as well. I think I need to find \r\n and remove any commas before it, back to the first non-comma character.
Any thoughts about how to do this?
Thanks
(,+$) perhaps? (One or more commas followed immediately by the end of a line.)
If your language supports positive lookahead, try this -
([,]*)(?=\\r\\n)
I think you can match one or more commas followed by \r\n using ,+\\r\\n. I don't know how to do the replacement in C#, sorry. In Perl I would do:
perl -pi -e 's/,+\\r\\n/\\r\\n/g' c.txt
(assuming that c.txt is a file containing your input text).
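The lookahead idea from the answers above can be sketched in Python. This is only an illustration, assuming the string contains real CR+LF characters rather than the literal text backslash-r backslash-n; the helper name `strip_trailing_commas` and the shortened sample are made up for the example.

```python
import re

def strip_trailing_commas(text: str) -> str:
    """Remove every run of commas that sits directly before a CRLF."""
    # ,+        one or more commas (so a single trailing comma is caught too)
    # (?=\r\n)  lookahead: the line break must follow, but is not consumed
    return re.sub(r',+(?=\r\n)', '', text)

# Shortened version of the sample from the question (assumed real CRLFs).
sample = (
    "Date,1-Jul-18,1-Jul-24,\r\n"
    "Date,1-Jul-18,1-Jul-24,,,,,\r\n"
    "Date,1-Jul-18,1-Jul-24,,\r\n"
)
cleaned = strip_trailing_commas(sample)
```

The original (,[,]+) requires at least two commas, which is why the single comma on the first line survived; ,+ accepts one.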
I'm using regex in C# to get filtered lines from a file. First I read the whole .txt file into a string. The code inserts \r\n between the lines, but at that point these are not special characters, just normal strings. Within a line, the data is separated by the * character. My regex looks for a value in a particular "cell" of a line; if it matches, it must return the whole line without the \r\n at the beginning and at the end.
Can you take a look at my solution and give me a tip on how to get matches without the \r\n at the beginning?
Thanks!
Regex solution
Try this out:
(?<=\\r\\n|^(?!\\r\\n))[^*]*?[*](iLVL_DUMMY)[*](?:[^*]*?[*])+?(2020|2019)[^*]*?[*].*?(?=\\r\\n)
You can add [^\r\n] to the beginning and end of your regex, like:
[^\\r\\n*]*?[*](iLVL_DUMMY)[*][^*]*?[*][^*]*?[*][^*]*?[*][^*]*?[*](2020|2019)[^*]*?[*].*?(?=\\r\\n)
I want to match all comments in a text file and I use the following regex to match single line comment:
//(.*?)\r?\n
But it could not match the last line if the last line is a single comment line such as:
// test
So, how do I write a single regex in C# that matches a whole line with or without a trailing '\n'? Thanks!
You could write your regex as:
//(.*?)\r?$
The $ sign will match the end of the line.
You could consider this instead:
//(.*?)\s*$
The lazy group leaves any trailing whitespace (including newlines) to \s*, so it is excluded from your capture group. I assume you wouldn't care about capturing trailing spaces.
The limitations of a regex to match comments are important to keep in mind: it will match some non-comment items in code. For example, consider this line:
get_web_page('http://www.foo.com');
If you only want to match comments on their own line, you could do this:
^\s*//(.*?)\s*$
If you need to match comments that come after code, as well, the above problem can't be overcome easily with a regex.
Update: I am assuming that your code iterates through the file line by line, the most common case. However, if you have the entire file in a string and are matching the regex against that, you can enable multi-line mode for this to work.
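A minimal sketch of the multi-line variant, written in Python rather than C# (in .NET the corresponding switch is RegexOptions.Multiline). It assumes the whole file is already in one string and uses a lazy capture group so trailing whitespace stays out of the capture; the sample text is invented.

```python
import re

# ^ and $ match at every line boundary because of re.MULTILINE.
OWN_LINE_COMMENT = re.compile(r'^\s*//(.*?)\s*$', re.MULTILINE)

source = (
    "int x = 1; // trailing comment, deliberately not matched\n"
    "// first\r\n"
    "  // second  \n"
    "get_web_page('http://www.foo.com');\n"
    "// last"
)

# Only comments that sit on their own line are returned.
comments = OWN_LINE_COMMENT.findall(source)
```

The URL on the fourth line is skipped because the pattern insists that only whitespace precedes the //.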
^\s*?//(.*?)\s*?$
A whole line starts with '^' and ends with '$', with optional whitespace characters '\s' around the comment.
You don't need to specify the newline, because . matches any character except a newline.
//(.*)
The last line does not contain a newline at the end, that is why your //(.*?)\r?\n regex fails.
You need to use a non-capturing group with a $ anchor as an alternative:
//(.*?)(?:\r?\n|$)
The added part is the trailing (?:\r?\n|$), which accepts either a line break or the end of the string.
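To see the difference, here is a small Python sketch comparing the two patterns on a string whose last line is a comment with no newline after it. The sample text is made up; the patterns are the ones from the question and from this answer.

```python
import re

text = "// a\r\n// test"  # last line has no trailing newline

# Original pattern: demands a newline after every comment.
with_newline_only = re.findall(r'//(.*?)\r?\n', text)

# Alternative: a newline OR the end of the string may follow.
with_end_anchor = re.findall(r'//(.*?)(?:\r?\n|$)', text)
```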
I'm teaching myself some regex while trying to parse a datasheet, and I'm thinking there's no easy way to do this (in regex, I mean; in C#, sure!). Say I have a file with the lines:
0000AA One Token - Value
0000AA Another Token- Another Value
0000AA YA Token - Yet Another
0000AA Yes, Another - Even More
0000AA
0000AA ______________________________________________________________________
0000AA This line - while it will match the regex, shouldn't.
So I have an easy multi-line regex:
^\s*[A-Z]{2}[0-9]{4}\s\s*(?<token>.*?)\-(?<value>.*?)$
This loads all the tokens into the 'token' group and all the values into the 'value' group. Pretty simple! However, the regex ALSO matches the bottom line, putting 'This line' into the token and 'while it will [...]' into the value.
Essentially, I'd like the regex to only match the lines above the ____ separator line. Would this be possible with Regex alone, or will I need to modify my incoming string first to .Split() on the ____ separator line?
Cheers all --Mike.
Parsing such a text file with regex only would not be using the right tool for the job. Although possible, it would be both inefficient and unnecessarily complex.
I would actually not load all the text into a string and split on this line either, as it's not the most efficient way of doing this. I would rather read through the file in a loop, one line at a time, processing each line as needed. Then stop processing when you reach this particular line.
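A rough Python sketch of that line-by-line approach, under a few assumptions: the helper name `parse_until_separator` is hypothetical, the separator is recognised as six word characters followed by four or more underscores, and the line prefix is matched as four digits then two letters because that is what the sample lines show (the posted regex has letters first, so swap the two classes if the real data differs).

```python
import re
from typing import Iterable, List, Tuple

# Prefix order follows the sample data ("0000AA"), not the posted regex.
ENTRY = re.compile(r'^\s*[0-9]{4}[A-Z]{2}\s+(?P<token>.*?)-(?P<value>.*)$')
SEPARATOR = re.compile(r'^\s*\w{6}[ \t]+_{4,}')

def parse_until_separator(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Collect (token, value) pairs, stopping at the ____ separator line."""
    pairs = []
    for raw in lines:
        line = raw.rstrip('\r\n')
        if SEPARATOR.match(line):
            break  # everything after the separator is ignored
        m = ENTRY.match(line)
        if m:
            pairs.append((m.group('token').strip(), m.group('value').strip()))
    return pairs

sample_lines = [
    "0000AA One Token - Value\n",
    "0000AA Another Token- Another Value\n",
    "0000AA YA Token - Yet Another\n",
    "0000AA Yes, Another - Even More\n",
    "0000AA\n",
    "0000AA " + "_" * 70 + "\n",
    "0000AA This line - while it will match the regex, shouldn't.\n",
]
pairs = parse_until_separator(sample_lines)
```

Passing an open file object instead of the list works the same way, since iterating a file yields one line at a time.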
I'd like the regex to only match the lines above the ____ separator line. Would this be possible with Regex alone?
Sure it's possible. Add a lookahead to make sure such a line follows, something like:
(?=(?s).*^\w{6}[ \t]+_{4,})
Add this to the end of your expression to make sure that such a line follows. Eg:
(?m)^\s*[A-Z]{2}[0-9]{4}\s\s*(?<token>.*?)\-(?<value>.*)$(?=(?s).*^\w{6}[ \t]+_{4,})
(Also added m and s flags in the expression.)
This is not very efficient, though, as the regex engine will probably need to scan through most of the string for every match.
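For comparison, the lookahead version can be tried in Python with a few adjustments, all of them assumptions of this sketch rather than part of the answer: named groups are written (?P<name>...), the single-line flag is scoped as (?s:.*) because recent Python versions reject a global inline flag in the middle of a pattern, and the prefix is matched as four digits then two letters to fit the sample lines.

```python
import re

PATTERN = re.compile(
    r'(?m)^\s*[0-9]{4}[A-Z]{2}\s\s*(?P<token>.*?)-(?P<value>.*)$'
    r'(?=(?s:.*)^\w{6}[ \t]+_{4,})'  # a separator line must still follow
)

sample = "\n".join([
    "0000AA One Token - Value",
    "0000AA Another Token- Another Value",
    "0000AA YA Token - Yet Another",
    "0000AA Yes, Another - Even More",
    "0000AA",
    "0000AA " + "_" * 70,
    "0000AA This line - while it will match the regex, shouldn't.",
])

matches = [
    (m.group('token').strip(), m.group('value').strip())
    for m in PATTERN.finditer(sample)
]
```

The last line is rejected because no separator line comes after it, so the lookahead fails there.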
I am trying to use Regex to find out if a string matches *abc - in other words, it starts with anything but finishes with "abc"?
What is the regex expression for this?
I tried *abc but "Regex.Matches" returns true for xxabcd, which is not what I want.
abc$
You need the $ to match the end of the string.
.*abc$
should do.
So you have a few "fish" here, but here's how to fish.
An online expression library and .NET-based tester: RegEx Library
An online Ruby-based tester (faster than the .NET one): Rubular
A Windows app for testing expressions (the most fully featured, but no zero-width lookaheads or lookbehinds): RegEx Coach
Try this instead:
.*abc$
The $ matches the end of the line.
^.*abc$
Will capture any line ending in abc.
It depends on what exactly you're looking for. If you're trying to match whole lines, like:
a line with words and spacesabc
you could do:
^.*abc$
Where ^ matches the beginning of a line and $ the end.
But if you're matching words in a line, e.g.
trying to match thisabc and thisabc but not thisabcd
You will have to do something like:
\w*abc(?!\w)
This means: match any number of consecutive word characters, followed by abc, and then anything but a word character (e.g. whitespace or the end of the line).
If you want a string of exactly 4 characters ending in abc, use /^.abc$/.
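A quick Python check of both cases, using the example strings from this thread. It is only a sketch: the same patterns should behave the same way in .NET's Regex.IsMatch and Regex.Matches, but that is not tested here.

```python
import re

# Whole-string case: anything, then abc at the very end.
ends_with_abc = re.compile(r'^.*abc$')
ok = bool(ends_with_abc.search("a line with words and spacesabc"))
rejected = ends_with_abc.search("xxabcd") is None

# Word case: abc must not be followed by another word character.
sentence = "trying to match thisabc and thisabc but not thisabcd"
words = re.findall(r'\w*abc(?!\w)', sentence)
```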
How can I replace lone instances of \n with \r\n (LF alone with CRLF) using a regular expression in C#?
I know how to do it using plain String.Replace, like:
myStr = myStr.Replace("\n", "\r\n");
myStr = myStr.Replace("\r\r\n", "\r\n");
However, this is inelegant, and would destroy any "\r+\r\n" already in the text (although they are not likely to exist).
It might be faster if you use this.
(?<!\r)\n
It basically looks for any \n that is not preceded by a \r. This would most likely be faster because, in the other case, almost every character matches [^\r], so the engine would capture that and then look for the \n after it. In the example I gave, it only stops when it finds a \n, and then looks back to see whether a \r precedes it.
Will this do?
[^\r]\n
Basically it matches a '\n' that is preceded with a character that is not '\r'.
If you want it to detect lines that start with just a single '\n' as well, then try
([^\r]|$)\n
Which says that it should match a '\n', but only one that is the first character of a line or one that is not preceded by '\r'.
There might be special cases to check since you're messing with the definition of lines itself the '$' might not work too well. But I think you should get the idea.
EDIT: credit to @Kibbee. Using a lookbehind is clearly better, since it won't capture the preceding character and should help with any edge cases as well. So here's a better regex, and the code becomes:
myStr = Regex.Replace(myStr, "(?<!\r)\n", "\r\n");
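The same lookbehind replacement, sketched in Python for anyone who wants to try it outside .NET. The function name `lf_to_crlf` is made up for the example.

```python
import re

def lf_to_crlf(text: str) -> str:
    """Turn every lone LF into CRLF, leaving existing CRLF pairs alone."""
    # (?<!\r) is a negative lookbehind: the \n must not follow a \r.
    # Nothing before the \n is consumed, so nothing needs to be put back.
    return re.sub(r'(?<!\r)\n', '\r\n', text)

mixed = "a\nb\r\nc\n\nd"
fixed = lf_to_crlf(mixed)
```

Running it a second time changes nothing, since every \n is then preceded by \r.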
I was trying to do the code below to a string and it was not working.
myStr.Replace("(?<!\r)\n", "\r\n")
I used Regex.Replace and it worked
Regex.Replace( oldValue, "(?<!\r)\n", "\r\n")
I guess that "myStr" is an object of type String; in that case, Replace does a plain string replacement, not a regex one.
\r and \n are the equivalents for CR and LF.
My best guess is that if you know that you have a \n for EACH line, no matter what, then you should first strip out every \r, then replace all \n with \r\n.
The answer chakrit gives would also work, but then you need to use regex, and you don't say what "myStr" is...
Edit: looking at the other examples tells me one thing: why do things the difficult way when you can do them the easy way? Just because regex exists doesn't mean you must use it :D
Edit2: A tool is very valuable when fiddling with regex, xpath, and whatnot that gives you strange results, may I point you to: http://www.regexbuddy.com/
myStr = Regex.Replace(myStr, "([^\r])\n", "$1\r\n");
$ may need to be a \
Try this: Replace(Char.ConvertFromUtf32(13), Char.ConvertFromUtf32(10) + Char.ConvertFromUtf32(13))
If I know the line endings must be one of CRLF or LF, something that works for me is
myStr = Regex.Replace(myStr, "\r?\n", "\r\n");
This essentially does the same as neslekkiM's answer, except that it performs only one replace operation on the string rather than two. It is also compatible with regex engines that don't support negative lookbehinds or backreferences.
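A Python sketch comparing this single-pass replacement with the strip-then-replace idea from the earlier answer. Both function names are hypothetical, and the sketch assumes the input only ever contains LF or CRLF line endings.

```python
import re

def normalize_single_pass(text: str) -> str:
    # \r?\n matches both CRLF and a lone LF; each becomes exactly one CRLF.
    return re.sub(r'\r?\n', '\r\n', text)

def normalize_two_step(text: str) -> str:
    # Plain string operations: drop every CR, then expand every LF.
    return text.replace('\r', '').replace('\n', '\r\n')

mixed = "a\nb\r\nc\n\nd"
single = normalize_single_pass(mixed)
two_step = normalize_two_step(mixed)
```

The two agree on LF/CRLF input. They differ on a lone \r with no \n after it: the regex leaves it in place, while the two-step version deletes it.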