Math expression on Regex expression

I have many text rows, and I need to find some of them and change them.
I wrote this regex:
^(Position) ([0-9]+)$
For example, I need to find all rows like these:
Position 10
Position 11
Position 12
Now I need to increase the numbers by 5. How can I do that with Regex?
I tried this replacement pattern:
$1 {$2+ 5}
I need to get this result:
Position 15
Position 16
Position 17
But I got:
Position {10 +5}
Position {11+5}
Position {12+5}

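Regex.Replace takes either a replacement string or a function (a MatchEvaluator delegate). You used the string overload, so the replacement is inserted as plain text: $2 is substituted, but "+ 5" is never evaluated. If you want an integer operation, you need to use the overload that takes a MatchEvaluator.
http://msdn.microsoft.com/library/cft8645c(v=vs.80).aspx
A sketch of how it could be done (RegexOptions.Multiline makes ^ and $ match at every line when input contains many rows; this assumes \n line endings, because $ does not match before a \r):
using System.Text.RegularExpressions;
string output = Regex.Replace(input, @"^(Position) ([0-9]+)$", ReplaceFunction, RegexOptions.Multiline);
public static string ReplaceFunction(Match m) { return m.Groups[1].Value + " " + (int.Parse(m.Groups[2].Value) + 5); }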

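string input = @"Position 10";
// The parentheses matter: without them, "Position " + 10 + 5 is string concatenation and yields "Position 105".
string output = Regex.Replace(input, "^Position ([0-9]+)$", match => "Position " + (Int32.Parse(match.Groups[1].Value) + 5));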
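A self-contained sketch of the MatchEvaluator approach, applied to several rows at once. It uses C# 9 top-level statements; the helper name `Shift` and the sample rows are illustrative, not from the question. The `(?=\r?$)` lookahead is an addition so that lines ending in `\r\n` still match without the `\r` being consumed:

```csharp
using System;
using System.Text.RegularExpressions;

// Matches whole "Position <number>" lines. Multiline makes ^ and $ work per line;
// the lookahead allows an optional "\r" before the line end without consuming it.
var positionLine = new Regex(@"^(Position) ([0-9]+)(?=\r?$)", RegexOptions.Multiline);

// Adds delta to the number on every matching line; other lines are left unchanged.
string Shift(string text, int delta) =>
    positionLine.Replace(text, m =>
        m.Groups[1].Value + " " + (int.Parse(m.Groups[2].Value) + delta));

string rows = "Position 10\nPosition 11\nPosition 12\nSomething else 3";
Console.WriteLine(Shift(rows, 5));
```

Note that int.Parse throws if a number does not fit in an Int32.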

Related

How to remove datetime from a Logfile string

I have a logfile like this:
[2016 01 10 11:10:44] Operation3 \r\n
[2016 01 10 11:10:40] Operation2 \r\n
[2016 01 10 11:10:36] Operation1 \r\n
I perform a ReadAllLines operation on it, so that in a string I have:
[2016 01 10 11:10:44] Operation3 \r\n[2016 01 10 11:10:40] Operation2 \r\n[2016 01 10 11:10:36] Operation1 \r\n
Now I have to remove all those timestamps.
Being a newbie, and to be on the safe side, I'd split it, then search each item for start = IndexOf("[") and IndexOf("]"), remove the substring by cutting each item, and then join them all again.
I'd like to know a smarter way to do that.
--EDIT--
OK, fair enough about the downvotes: I hadn't considered everything.
Additional constraints:
I can't be sure that every line has the timestamp, so I have to check each line for a "[" at the start and a "]" in the middle.
I can't even be sure of the [XXXX] length, since I could have [2016 1 1 11:1:4] instead of [2016 01 01 11:01:04]. So it's important to check its length.
Thanks

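You don't need to cut and paste the lines; you can remove each timestamp directly from the whole string.
This takes into account the length of Environment.NewLine.
// lines holds the whole log as one string (e.g. from File.ReadAllText)
while (lines.Length > 0)
{
    int start;
    if (lines.StartsWith("[", StringComparison.Ordinal))
        start = 0;
    else
    {
        int newLine = lines.IndexOf(Environment.NewLine + "[", StringComparison.Ordinal);
        if (newLine == -1) // no more lines starting with "["
            break;
        start = newLine + Environment.NewLine.Length;
    }
    int end = lines.IndexOf("] ", start, StringComparison.Ordinal);
    if (end == -1)
        break;
    // Remove only this occurrence; string.Replace would also remove identical text elsewhere
    lines = lines.Remove(start, end + 2 - start);
}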

ReadAllLines returns an array of lines, so you don't need to look for the start of each item. If your timestamp format is consistent, you can just trim off the start of each string.
string[] lines = File.ReadAllLines("log.txt");
foreach (string line in lines)
{
string logContents = line.Substring("[XXXX XX XX XX:XX:XX] ".Length);
}
Or combine this with a LINQ Select (using System.Linq) to do it in one step:
var logContentsWithoutTimestamps = File.ReadAllLines("log.txt")
.Select(x => x.Substring("[XXXX XX XX XX:XX:XX] ".Length));
Without consistent format, you will need to identify what you are looking for. I would write a regular expression to remove what you are looking for, otherwise you may get caught by things you weren't expecting (for example, you mention that some lines may not have timestamps - they might have something else in square brackets instead which you don't want to remove).
Example:
Regex rxTimeStamp = new Regex(@"^\[\d{4} \d{1,2} \d{1,2} \d{1,2}:\d{1,2}:\d{1,2}\]\s*");
string[] lines = File.ReadAllLines("log.txt");
foreach (string line in lines)
{
string logContents = rxTimeStamp.Replace(line, String.Empty);
}
// or
var logContentsWithoutTimestamps = File.ReadAllLines("log.txt")
.Select(x => rxTimeStamp.Replace(x, String.Empty));
You'll need to tune the regular expression based on whether it misses anything, but that's beyond the scope of this question.
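A sketch of the same regex idea applied to the whole file in a single call, assuming the log is read with File.ReadAllText instead of ReadAllLines. It assumes every field after the year may have one or two digits, per the question's edit, and matches trailing blanks with `[ \t]*` instead of `\s*`, because in Multiline mode `\s*` could also swallow the line break after a line that contains only a timestamp:

```csharp
using System;
using System.Text.RegularExpressions;

// Assumption: timestamps look like "[yyyy M d H:m:s]" where every field after the
// year has 1 or 2 digits, e.g. "[2016 01 10 11:10:44]" or "[2016 1 1 11:1:4]".
// Multiline lets ^ match at the start of every line of the whole text.
// [ \t]* (rather than \s*) avoids swallowing the line break after a bare timestamp.
var timestamp = new Regex(
    @"^\[\d{4} \d{1,2} \d{1,2} \d{1,2}:\d{1,2}:\d{1,2}\][ \t]*",
    RegexOptions.Multiline);

string StripTimestamps(string text) => timestamp.Replace(text, string.Empty);

// In a real program this would come from File.ReadAllText("log.txt").
string log = "[2016 01 10 11:10:44] Operation3\r\n" +
             "[2016 1 1 11:1:4] Operation2\r\n" +
             "No timestamp on this line\r\n" +
             "[not a timestamp] Operation1\r\n";
Console.Write(StripTimestamps(log));
```

Lines without a timestamp, and bracketed text that does not have the timestamp shape, are left as they are.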

Since your code works and you're looking for a different way:
string result = string.Join(string.Empty, str.Skip(22));
Apply this to each item (each line), with using System.Linq in scope.
Explanation:
Since every timestamp has the same length (22 characters for "[2016 01 10 11:10:44] ", including the trailing space), you don't need to search for the beginning or the end. Normally you would have to do length checks (empty lines, etc.), but this works even for shorter strings: you just get an empty string back if the length is less than 22. This is only an alternative if every line in your file really starts with a fixed-width timestamp.
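A regex-free sketch that follows the question's constraints literally: check for a leading "[" and a "]", then confirm that the bracketed text really is a timestamp before removing it. Using DateTime.TryParseExact with the format "yyyy M d H:m:s" is an assumption about the timestamp layout; single-letter specifiers accept one or two digits when parsing, so both [2016 01 10 11:10:44] and [2016 1 1 11:1:4] are accepted. The helper name `StripTimestamp` is illustrative:

```csharp
using System;
using System.Globalization;
using System.Linq;

// Removes a leading "[...] " only when the bracketed text parses as a timestamp;
// any other line is returned unchanged.
string StripTimestamp(string line)
{
    if (line.Length == 0 || line[0] != '[') return line;
    int close = line.IndexOf(']');
    if (close < 0) return line;

    // Assumed layout "yyyy M d H:m:s"; single-letter specifiers accept 1 or 2 digits.
    string inside = line.Substring(1, close - 1);
    bool isTimestamp = DateTime.TryParseExact(
        inside, "yyyy M d H:m:s", CultureInfo.InvariantCulture,
        DateTimeStyles.None, out _);
    if (!isTimestamp) return line;

    // Drop the "]" and a single following space, if present.
    int rest = close + 1;
    if (rest < line.Length && line[rest] == ' ') rest++;
    return line.Substring(rest);
}

string[] lines =
{
    "[2016 01 10 11:10:44] Operation3",
    "[2016 1 1 11:1:4] Operation2",
    "[not a timestamp] Operation1",
    "plain line",
};
foreach (string result in lines.Select(x => StripTimestamp(x)))
    Console.WriteLine(result);
```

Validating the date this way also rejects bracketed text such as "[2016 13 40 11:1:4]" that has the right shape but is not a real date.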

Filtering on full string match but not on substrings

So I've got a long string of numbers and characters, and I'd like to filter out a substring. The thing I'm struggling with is that I need a full match on a certain value (starting with S), but it must not match inside another value.
Input:
S10 1+0000000297472+00EURS100 1+0000000297472+00EURS1023P 1+0000000816072+00EUR
The input is exactly like this.
Breakdown of input:
S10 1+0000000297472+00EUR
Every part starts with a tag S and ends with EUR
There are spaces in between because every part has a fixed length
=>
index 0 : tag 'S' with length 1
index 1 : code with length 7
index 8 : numbertype with length 1
index 9 : sign with length 1
index 10 : value with length 13
index 23 : sign with length 1
index 24 : exponent with length 2
index 26 : unit with length 3
I need to match, for example, S10, and I only want this substring up to EUR. I don't want it to match S100 or S1023P or any other combination, only exactly S10.
Output:
S10 1+0000000297472+00EUR
I'm trying to use Regex to find my match on 'S + code'. I'm doing a full match on my search query, and then as soon as anything follows I don't want it anymore. But this also discards the actual match, because after S10 the value follows, which also matches ([^\d|^\D])+\w
foreach (var field in fieldList)
{
var query = "S" + field.BallanceCode;
var index = Regex.Match(values, Regex.Escape(query) + @"([^\d|^\D])+\w").Index;
}
For example when looking for S10
needs to match:
S10 1+0000000297472+00EUR
may not match:
S10/15 1+0000001748447+00EUR
S1023P 1+0000000816072+00EUR
S10000001+0000000546546+00EUR
Update:
Using this code
var index = Regex.Match(values, Regex.Escape(query) + @"\p{Zs}.*?EUR").Index;
will find S10, S10/15, etc. when I search for them. However, searching for S1000000 in the string doesn't work, because there is no whitespace between the code and 1+:
S10000001+0000000546546+00EUR
For example when looking for S1000000
needs to match:
S10000001+0000000297472+00EUR
may not match:
S10 1+0000001748447+00EUR
S1023P 1+0000000816072+00EUR
S10/15 1+0000000546546+00EUR

You can use a regex that requires a space (or whitespace) to appear right after the field.BallanceCode:
var index = Regex.Match(values, Regex.Escape(query) + (field.BallanceCode.Length < 7 ? @"\p{Zs}" : "") + ".*?EUR").Index;
The regex will match S10, then one Unicode space separator (\p{Zs}, e.g. a regular space), then zero or more characters other than a newline (as few as possible, due to *?) up to the first EUR.
The (field.BallanceCode.Length < 7 ? @"\p{Zs}" : "") check is necessary to support a 7-character BallanceCode: if the code fills all 7 characters, we do not check for whitespace after it; if it is shorter than 7, we require a space.
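A runnable sketch of this approach against the sample input from the question (as it appears there, with single spaces). `FindRecord` is a hypothetical helper standing in for the loop body, and `code` plays the role of field.BallanceCode. It returns the matched text instead of `.Index`, because a failed Regex.Match also reports Index 0, which is indistinguishable from a match at the start of the string:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; `code` stands in for field.BallanceCode.
// Returns the matched record, or "" when there is no match.
string FindRecord(string values, string code)
{
    string pattern = Regex.Escape("S" + code)
        // Codes shorter than the 7-character field are followed by padding spaces;
        // a full-width code is followed directly by the numbertype digit.
        + (code.Length < 7 ? @"\p{Zs}" : "")
        + ".*?EUR";
    Match m = Regex.Match(values, pattern);
    // Check Success rather than Index: a failed match also has Index 0.
    return m.Success ? m.Value : "";
}

string values = "S10 1+0000000297472+00EURS100 1+0000000297472+00EURS1023P 1+0000000816072+00EUR";
Console.WriteLine(FindRecord(values, "10"));     // S10 1+0000000297472+00EUR
Console.WriteLine(FindRecord(values, "1023P"));  // S1023P 1+0000000816072+00EUR
```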

So you just want the start (S...) and end (...EUR) of each line and skip everything in between?
^([sS]\d+).*?([\d\+]+EUR)$
http://regexr.com/3c1ob
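Because the question says every part has a fixed layout, another option (a different technique from the regex answers here) is to cut the input into fixed-width records and compare the code field directly. This sketch assumes each record is exactly 29 characters (1 + 7 + 1 + 1 + 13 + 1 + 2 + 3) and that shorter codes are padded with spaces; the sample strings in the question appear to have lost that padding, so the demo builds padded records with a hypothetical `MakeRecord` helper:

```csharp
using System;
using System.Collections.Generic;

// Assumed layout (from the question): tag 1, code 7, numbertype 1, sign 1,
// value 13, sign 1, exponent 2, unit 3 = 29 characters per record,
// with codes shorter than 7 characters padded with spaces.
const int RecordLength = 29;

// Splits the input into consecutive 29-character records.
IEnumerable<string> Records(string values)
{
    for (int i = 0; i + RecordLength <= values.Length; i += RecordLength)
        yield return values.Substring(i, RecordLength);
}

// Returns the first "S" record whose code field (index 1, length 7) equals
// `code` once the padding is trimmed, or "" if there is none.
string FindByCode(string values, string code)
{
    foreach (string record in Records(values))
    {
        if (record[0] == 'S' && record.Substring(1, 7).TrimEnd() == code)
            return record;
    }
    return string.Empty;
}

// Hypothetical helper that builds a correctly padded record for the demo.
string MakeRecord(string code, string rest) => "S" + code.PadRight(7) + rest;

string values = MakeRecord("10", "1+0000000297472+00EUR")
              + MakeRecord("100", "1+0000000297472+00EUR")
              + MakeRecord("1000000", "1+0000000546546+00EUR");

Console.WriteLine(FindByCode(values, "10"));
Console.WriteLine(FindByCode(values, "1000000"));
```

Comparing the trimmed code field means S10 can never match S100 or S1000000, without any special case for 7-character codes.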

IP regex mask not working in WPF

I've read questions about IP masks but haven't found an answer.
I'm trying to write a TextBox in WPF that uses a regex to validate an IP address. This is my XAML code.
This code works:
<TextBox wpfApplication2:Masking.Mask="^([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$"/>
I can write 192 or 255 or 29, for example
After that I want to allow a dot character, and this breaks my code. I expect to be able to write
192. or 255. or 29.
I think the problem is in the brackets, but I can't work out how to resolve it. Here are my incorrect attempts:
<TextBox wpfApplication2:Masking.Mask="^([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])[.]$"/>
and
<TextBox wpfApplication2:Masking.Mask="^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])[.])$"/>
I'm sure the mistake is very silly, but I can't find it.
UPDATE
Thanks to @stribizhev, who gave an explanation and an answer for full IP addresses.
For my specific question: I should use {0,1} after the dot. So the correct answer to my question (how to create a mask for numbers like 192. or 255. or 29.) is
<TextBox wpfApplication2:Masking.Mask="^([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])(\.){0,1}$"/>

Here is the regex you can use for live validation (not for final one):
^(?:[0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])?){0,3}$
See demo
The main point when writing a regex for live validation is to make parts optional. It can be done with *, ? and {0,x} quantifiers. Here is a regex break-down:
^ - start of string
(?:[0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5]) - this is the first number, it is obligatory, but if you plan to let the value be empty, add a ? at the end
(?:\.(?:[0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])?){0,3} - a sequence of 0 to 3 occurrences of....
\. - a literal dot (written like this in a verbatim string literal, the one with @"...")
(?:[0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])? - one of the allowed numbers, 1 or 0 occurrences (as there is ? at the end)
$ - end of string
For final validation, use
^(?:[0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])){3}$
See another demo
This regex checks the whole, final IP string.
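The two patterns can be tried outside WPF with a short C# sketch (top-level statements; the names `Octet`, `live` and `complete` are illustrative). It builds both regexes from one octet alternation so they stay in sync:

```csharp
using System;
using System.Text.RegularExpressions;

// One octet (0-255, no leading zeros), same alternation as in the answer above.
const string Octet = "(?:[0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])";

// Live validation: accepts partial input such as "192", "192." or "192.168.1".
var live = new Regex("^" + Octet + @"(?:\." + Octet + "?){0,3}$");

// Final validation: exactly four octets separated by dots.
var complete = new Regex("^" + Octet + @"(?:\." + Octet + "){3}$");

foreach (string text in new[] { "192", "192.", "192.168.1", "192.168.1.1", "256" })
    Console.WriteLine($"{text}: live={live.IsMatch(text)}, complete={complete.IsMatch(text)}");
```

Because the octet after each dot is optional, the live pattern also accepts input such as "192.."; the final pattern rejects it, which is why both checks are needed.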

If you want to accept any IP address as a subnet mask:
var num = @"(25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})";
var rx = new Regex("^" + num + @"\." + num + @"\." + num + @"\." + num + "$");
I found it easier to move the "repeating" match for a single group of numbers into a separate variable.
As an exercise for the reader, here is another variant of the expression. This one captures all the numbers in the same group, as different captures:
var rx = new Regex("^(?:" + num + @"(?:\.(?!$)|$)){4}$");
But that is wrong for a real subnet mask, because each octet of a mask can only take a few values; you should use this instead:
var num = @"(255|254|252|248|240|224|192|128|0+)";
var rx = new Regex("^" + num + @"\." + num + @"\." + num + @"\." + num + "$");
or
var rx = new Regex("^(?:" + num + @"(?:\.(?!$)|$)){4}$");
http://www.freesoft.org/CIE/Course/Subnet/6.htm
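The per-octet alternation above checks each number independently, so it still accepts a value such as 255.0.255.0, whose 32 bits are not a contiguous run of ones followed by zeros. A sketch of a numeric check instead of a regex, assuming contiguous masks are what you want and treating 0.0.0.0 and 255.255.255.255 as valid; `IsValidSubnetMask` is an illustrative name:

```csharp
using System;
using System.Globalization;

// Returns true when text is a dotted quad whose 32 bits are a run of 1s followed by
// a run of 0s (e.g. 255.255.254.0). Each octet must be digits only, 0-255.
bool IsValidSubnetMask(string text)
{
    string[] parts = text.Split('.');
    if (parts.Length != 4) return false;

    uint mask = 0;
    foreach (string part in parts)
    {
        // NumberStyles.None rejects signs, whitespace and other non-digit characters.
        if (!byte.TryParse(part, NumberStyles.None, CultureInfo.InvariantCulture, out byte octet))
            return false;
        mask = (mask << 8) | octet;
    }

    // For a valid mask, ~mask looks like 0...01...1; adding 1 to that gives a single
    // bit that shares no bits with it. Any "hole" in the mask breaks this property.
    uint inverted = ~mask;
    return (inverted & unchecked(inverted + 1)) == 0;
}

foreach (string candidate in new[] { "255.255.255.0", "255.255.254.0", "255.0.255.0" })
    Console.WriteLine($"{candidate}: {IsValidSubnetMask(candidate)}");
```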

Regex masking of words that contain a digit

Trying to come up with a 'simple' regex to mask bits of text that look like they might contain account numbers.
In plain English:
any word containing a digit (or a train of such words) should be matched
leave the last 4 digits intact
replace all previous part of the matched string with four X's (xxxx)
So far
I'm using the following:
[\-0-9 ]+(?<m1>[\-0-9]{4})
replacing with
xxxx${m1}
But this misses the last few samples below.
sample data:
123456789
a123b456
a1234b5678
a1234 b5678
111 22 3333
this is a a1234 b5678 test string
Actual results
xxxx6789
a123b456
a1234b5678
a1234 b5678
xxxx3333
this is a a1234 b5678 test string
Expected results
xxxx6789
xxxxb456
xxxx5678
xxxx5678
xxxx3333
this is a xxxx5678 test string
Is such an arrangement possible with a regex replace?
I think I'm going to need some greediness and lookahead functionality, but I have zero experience in those areas.

This works for your example:
var result = Regex.Replace(
input,
@"(?<!\b\w*\d\w*)(?<m1>\s?\b\w*\d\w*)+",
m => "xxxx" + m.Value.Substring(Math.Max(0, m.Value.Length - 4)));
If you have a value like 111 2233 33, it will print xxxx3 33. If you want this to be free from spaces, you could turn the lambda into a multi-line statement that removes whitespace from the value.
To explain the regex pattern a bit, it's got a negative lookbehind, so it makes sure that the word behind it does not have a digit in it (with optional word characters around the digit). Then it's got the m1 portion, which looks for words with digits in them. The last four characters of this are grabbed via some C# code after the regex pattern resolves the rest.
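Tracing the lookbehind pattern above by hand suggests that, for the last sample, the `\s?` in the first repetition consumes the space before a1234, so the output would be "this is axxxx5678 test string". A variant sketch (not the answer's exact pattern) that starts each match at the first word containing a digit and only allows whitespace between such words; `Mask` is an illustrative name:

```csharp
using System;
using System.Text.RegularExpressions;

// One word containing a digit, optionally followed by more such words separated
// by whitespace. \b makes the match begin at the start of the first such word,
// so the space in front of it is left alone.
var accountLike = new Regex(@"\b\w*\d\w*(?:\s+\w*\d\w*)*");

// Replace each run with "xxxx" plus its last four characters (as in the answer above).
string Mask(string input) =>
    accountLike.Replace(input, m => "xxxx" + m.Value.Substring(Math.Max(0, m.Value.Length - 4)));

Console.WriteLine(Mask("this is a a1234 b5678 test string"));
```

Traced the same way, all six sample lines from the question produce the expected results with this pattern.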

I don't think a regex is the best way to solve this problem, and that's why I am posting this answer. For situations this complex, building the corresponding regex is difficult and, worse, its clarity and adaptability are much lower than those of a longer-code approach.
The code below delivers the exact functionality you are after; it is clear enough and can be easily extended.
string input = "this is a a1234 b5678 test string";
string output = "";
string[] temp = input.Trim().Split(' ');
bool previousNum = false;
string tempOutput = "";
foreach (string word in temp)
{
    if (word.Any(char.IsDigit)) // requires using System.Linq
    {
        previousNum = true;
        tempOutput = tempOutput + word;
    }
    else
    {
        if (previousNum)
        {
            if (tempOutput.Length >= 4) tempOutput = "xxxx" + tempOutput.Substring(tempOutput.Length - 4, 4);
            output = output + " " + tempOutput;
            tempOutput = ""; // start a new run of digit-containing words
            previousNum = false;
        }
        output = output + " " + word;
    }
}
if (previousNum)
{
    if (tempOutput.Length >= 4) tempOutput = "xxxx" + tempOutput.Substring(tempOutput.Length - 4, 4);
    output = output + " " + tempOutput;
}
output = output.TrimStart(); // every word was prefixed with a space

Have you tried this:
.*(?<m1>[\d]{4})(?<m2>.*)
with replacement
xxxx${m1}${m2}
This produces
xxxx6789
xxxx5678
xxxx5678
xxxx3333
xxxx5678 test string
You are not going to get 'a123b456' to match ... until 'b' becomes a number. ;-)

Here is my really quick attempt:
(\s|^)([a-z]*\d+[a-z,0-9]+\s)+
This will select all of those test cases. Now as for C# code, you'll need to check each match to see if there is a space at the beginning or end of the match sequence (e.g., the last example will have the space before and after selected)
Here is the C# code to do the replace:
var redacted = Regex.Replace(record, @"(\s|^)([a-z]*\d+[a-z,0-9]+\s)+",
match => "xxxx" /*new string('x', match.Value.Length - 4)*/ +
match.Value.Substring(Math.Max(0, match.Value.Length - 4)));

How can I get the IndexOf() method to return the correct values?

I have been working with Google Maps, and I am now looking to format coordinates.
I get the coordinates in the following format:
Address(coordinates)zoomlevel.
I use the IndexOf method to get the index of "(" + 1, so that I get the first digit of the coordinates, and store this value in a variable that I call "start".
I then do the same thing, but this time I get the index of ")" - 2 to get the last digit of the last coordinate, and store this value in a variable that I call "end".
I get the following error:
"Index and length must refer to a location within the string.Parameter name: length"
I get the following string as an imparameter:
"Loddvägen 155, 840 80 Lillhärdal, Sverige (61.9593214318303,14.0585965625)5"
By my calculations I should get the value 36 in the start variable and the value 65 in the end variable,
but for some reason I get the values 41 in start and 71 in end.
Why?
public string RemoveParantheses(string coord)
{
int start = coord.IndexOf("(")+1;
int end = coord.IndexOf(")")-2;
string formated = coord.Substring(start,end);
return formated;
}
I then tried hardcoding the correct values
string Test = coord.Substring(36,65);
I then get the following error:
startindex cannot be larger than length of string. parameter name startindex
I understand what both errors mean, but in this case they seem incorrect, since I'm not going beyond the string's length.
Thanks!

The second parameter of Substring is a length (MSDN source). Since you are passing in 65 for the second parameter, your call is trying to get the characters between 36 and 101 (36+65). Your string does not have 101 characters in it, so that error is thrown. To get the data between the parentheses, pass a length instead:
public string RemoveParantheses(string coord)
{
int start = coord.IndexOf("(")+1;
int end = coord.IndexOf(")");
string formated = coord.Substring(start, end - start);
return formated;
}
Edit: The reason it worked with only the coordinates is that the whole string was shorter: the coordinates started at the first position, so start + end stayed within the string. For example...
//using "Loddvägen 155, 840 80 Lillhärdal, Sverige (61.9593214318303,14.0585965625)5" (75 characters)
int start = coord.IndexOf("(") + 1; // 43
int end = coord.IndexOf(")")-2; // 71
coord.Substring(start, end); //asks for 71 characters from index 43, i.e. up to index 113
//using "(61.9593214318303,14.0585965625)5" (33 characters)
int start = coord.IndexOf("(") + 1; // 1
int end = coord.IndexOf(")")-2; // 29
coord.Substring(start, end); //takes characters 1 through 29
The second call was valid because index 29 actually exists in that string. Once you added the address to the beginning of the string, your code no longer worked.

Extracting parts of a string is a good use for regular expressions:
var match = Regex.Match(locationString, @"\((?<lat>[\d\.]+),(?<long>[\d\.]+)\)");
string latitude = match.Groups["lat"].Value;
string longitude = match.Groups["long"].Value;

You probably forgot to count newlines and other whitespace; a \r\n newline is two "invisible" characters. The other mistake is that you are calling Substring with (start, end), while its parameters are (start, count), so you need (start, end - start).

by my calculations i should get the value 36 in the start variable and the value 65 in the end variable
Then your calculations are wrong. With the string above I see (and LINQPad confirms) that the open paren is at index 42 and the close paren is at index 73.
The error you're getting when using Substring is because the parameters to Substring are a start position and a length, not an end position. If end is the index of the last character you want (coord.IndexOf(")") - 1), you should be using:
string formated = coord.Substring(start, end - start + 1);

That overload of Substring() takes two parameters: a start index and a length. You've provided the index of the occurrence of ) as the second value, when really you want the length of the string you wish to extract. In this case, you can subtract the start index (just after the "(") from the index of ). For example:
string foo = "Loddvägen 155, 840 80 Lillhärdal, Sverige (61.9593214318303,14.0585965625)5";
int start = foo.IndexOf("(") + 1;
int end = foo.IndexOf(")");
Console.Write(foo.Substring(start, end - start));
Console.Read();
Alternatively, you could parse the string using a regular expression, for example:
Match r = Regex.Match(foo, @"\(([^)]*)\)");
Console.Write(r.Groups[1].Value);
This is easier to adapt if the format changes, although it is unlikely to be faster than the IndexOf version above.

string input =
"Loddvägen 155, 840 80 Lillhärdal, Sverige (61.9593214318303,14.0585965625)5";
var groups = Regex.Match(input,
@"\(([\d\.]+),([\d\.]+)\)(\d{1,2})").Groups;
var lat = groups[1].Value;
var lon = groups[2].Value;
var zoom = groups[3].Value;
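A sketch that combines the Substring fix with parsing the values into numbers. Parsing with CultureInfo.InvariantCulture is an assumption added here: it keeps "." as the decimal separator even on a machine with Swedish regional settings, where the default separator is ",". The index comments assume the string exactly as written in the question:

```csharp
using System;
using System.Globalization;

string input = "Loddvägen 155, 840 80 Lillhärdal, Sverige (61.9593214318303,14.0585965625)5";

int open = input.IndexOf('(');   // 42 for this string
int close = input.IndexOf(')');  // 73 for this string
if (open < 0 || close < open)
    throw new FormatException("Expected \"Address(lat,lon)zoom\".");

// Substring takes (startIndex, length), not (startIndex, endIndex).
string inside = input.Substring(open + 1, close - open - 1);
string[] parts = inside.Split(',');

// InvariantCulture keeps "." as the decimal separator regardless of regional settings.
double lat = double.Parse(parts[0], CultureInfo.InvariantCulture);
double lon = double.Parse(parts[1], CultureInfo.InvariantCulture);
int zoom = int.Parse(input.Substring(close + 1), CultureInfo.InvariantCulture);

Console.WriteLine($"lat={lat}, lon={lon}, zoom={zoom}");
```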
