How to remove datetime from a Logfile string - c#

I have a logfile like this:
[2016 01 10 11:10:44] Operation3 \r\n
[2016 01 10 11:10:40] Operation2 \r\n
[2016 01 10 11:10:36] Operation1 \r\n
on that I perform a readAlllines operation so that in a string I have:
[2016 01 10 11:10:44] Operation3 \r\n[2016 01 10 11:10:40] Operation2 \r\n[2016 01 10 11:10:36] Operation1 \r\n
Now I have to remove all those timestamps.
Being a newbie and to be on the safe side, I'd split it, then search each item for start=indexOf("[") and indexOf("]"), remove that substring from each item, and then join them all again.
I'd like to know a smarter way to do that.
--EDIT--
OK, fair enough on the downvotes: I didn't consider everything.
Additional constraints:
I can't be sure that every line has a timestamp, so I have to check each line for a "[" at the start and a "]" in the middle.
I can't be sure of the [XXXX] length either, since I could have [2016 1 1 11:1:4] instead of [2016 01 01 11:01:04]. So it's important to check its length.
Thanks

You don't need to cut/paste the lines, you can use String.Replace.
This takes into account the length of Environment.NewLine.
while (true)
{
    int start;
    if (lines.StartsWith("[", StringComparison.Ordinal))
        start = 0;
    else
    {
        // Check for -1 before adding the newline length, otherwise "not found" is never detected.
        int newLine = lines.IndexOf(Environment.NewLine + "[", StringComparison.Ordinal);
        if (newLine == -1)
            break;
        start = newLine + Environment.NewLine.Length;
    }
    int end = lines.IndexOf("] ", start, StringComparison.Ordinal);
    if (end == -1)
        break;
    string subString = lines.Substring(start, end + 2 - start);
    lines = lines.Replace(subString, "");
}

ReadAllLines returns an array of lines, so you don't need to look for the start of each item. If your timestamp format will be consistent, you can just trim off the start of the string.
string[] lines = File.ReadAllLines("log.txt");
foreach (string line in lines)
{
    string logContents = line.Substring("[XXXX XX XX XX:XX:XX] ".Length);
}
Or combine this with a LINQ Select to do it in one step:
var logContentsWithoutTimestamps = File.ReadAllLines("log.txt")
    .Select(x => x.Substring("[XXXX XX XX XX:XX:XX] ".Length));
Without consistent format, you will need to identify what you are looking for. I would write a regular expression to remove what you are looking for, otherwise you may get caught by things you weren't expecting (for example, you mention that some lines may not have timestamps - they might have something else in square brackets instead which you don't want to remove).
Example:
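// Verbatim string (@) so the backslashes reach the regex engine; month and day accept one or two digits, as in [2016 1 1 11:1:4].
Regex rxTimeStamp = new Regex(@"^\[\d{4} \d{1,2} \d{1,2} \d{1,2}:\d{1,2}:\d{1,2}\]\s*");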
string[] lines = File.ReadAllLines("log.txt");
foreach (string line in lines)
{
string logContents = rxTimeStamp.Replace(line, String.Empty);
}
// or
var logContentsWithoutTimestamps = File.ReadAllLines("log.txt")
.Select(x => rxTimeStamp.Replace(x, String.Empty));
You'll need to tune the regular expression based on whether it misses anything, but that's beyond the scope of this question.
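As a rough sketch of the regular-expression approach, the snippet below strips a leading timestamp whose month, day, and time fields may be one or two digits wide, and leaves every other line alone. The helper name StripTimestamp and the inline sample lines are assumptions for illustration; in the real program the lines would come from File.ReadAllLines.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Assumed format: "[yyyy M d H:m:s] " with one- or two-digit fields after the year.
var timestamp = new Regex(@"^\[\d{4} \d{1,2} \d{1,2} \d{1,2}:\d{1,2}:\d{1,2}\]\s*");

// Removes a leading timestamp if present; other lines pass through unchanged.
string StripTimestamp(string line) => timestamp.Replace(line, string.Empty);

// Inline sample data stands in for File.ReadAllLines("log.txt").
string[] lines =
{
    "[2016 01 10 11:10:44] Operation3",
    "[2016 1 1 11:1:4] Operation2",
    "[WARN] no timestamp here",
};

foreach (string cleaned in lines.Select(StripTimestamp))
    Console.WriteLine(cleaned);
```

Because the pattern is anchored with ^ and requires the date shape, bracketed text such as "[WARN]" is not treated as a timestamp.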

Since your code works and you search for some different way:
string result = string.Join(string.Empty, str.Skip(22));
for each item
Explanation:
Since every timestamp is of equal length, you don't need to search for the beginning or end. Normally you would have to do length checks (empty lines etc.), but this works even for shorter strings: you just get an empty string back if the length is < 22. It is only an alternative if every line in your file really does start with a timestamp of that fixed width.

Related

Retain 0 in front of a number

I need help figuring out some logic.
I have working code that takes a string, finds every number in it, and adds 1 to each.
string str = "";
str = Console.ReadLine();
str = Regex.Replace(
    str,
    @"\d+",
    m => (Double.Parse(m.Groups[0].Value) + 1).ToString()
);
Example:
If I enter "User 000079 is making $1000 from Jan 02 to Feb 24".
The code will produce output: "User 80 is making $1001 from Jan 3 to Feb 25".
The problem is, I want to keep the 0 in front of the 80 and 3. (i.e. User 000080 is making $1001 from Jan 03 to Feb 25)
How do I do that?
Additional Info: Let me clarify, this is just an example. What I want is just a way to add 1 to every number appearing in the string. So if it means UserID, January 31 - Yes, I still want them to increase by 1
This will do what you need:
string str = Console.ReadLine();
str = Regex.Replace(
    str,
    @"\d+",
    m => (Double.Parse(m.Groups[0].Value) + 1).ToString().PadLeft(m.Groups[0].Value.Length, '0')
);
You can fix this with the ToString format like below:
String x = (50).ToString("D8"); // "00000050"
You can find more info about this here: msdn
Edit: About the length: if you do this, the length will be correct:
System.Text.RegularExpressions.Regex.Replace(
    str,
    @"\d+",
    m => (int.Parse(m.Groups[0].Value) + 1).ToString("D" + m.Groups[0].Value.Length)
);
It sounds like your user ID numbers aren't really 'numbers', but rather strings of numeric characters. Can the representation of the ID be changed to a string instead?
If you only care about the extra zeroes when rendering the ID, you can use String.Format to render the numeric value correctly using Custom Numeric Formats (specifically, see the 'Zero Placeholder' section).
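A minimal sketch of the padding idea from the answers above, written as one helper. The name IncrementNumbers is hypothetical, and using long.Parse is an assumption (digit runs longer than 18 characters would overflow it).

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Adds 1 to every run of digits and pads with zeros back to the original width.
// PadLeft never truncates, so 999 becomes 1000 rather than losing a digit.
string IncrementNumbers(string text) =>
    Regex.Replace(text, @"\d+", m =>
        (long.Parse(m.Value, CultureInfo.InvariantCulture) + 1)
            .ToString(CultureInfo.InvariantCulture)
            .PadLeft(m.Value.Length, '0'));

Console.WriteLine(IncrementNumbers("User 000079 is making $1000 from Jan 02 to Feb 24"));
// User 000080 is making $1001 from Jan 03 to Feb 25
Console.WriteLine(IncrementNumbers("Item 099 and item 999"));
// Item 100 and item 1000
```

Padding to the width of the matched text keeps leading zeros without having to know in advance how wide each number is.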

Regex masking of words that contain a digit

Trying to come up with a 'simple' regex to mask bits of text that look like they might contain account numbers.
In plain English:
any word containing a digit (or a train of such words) should be matched
leave the last 4 digits intact
replace all previous part of the matched string with four X's (xxxx)
So far
I'm using the following:
[\-0-9 ]+(?<m1>[\-0-9]{4})
replacing with
xxxx${m1}
But this misses on the last few samples below
sample data:
123456789
a123b456
a1234b5678
a1234 b5678
111 22 3333
this is a a1234 b5678 test string
Actual results
xxxx6789
a123b456
a1234b5678
a1234 b5678
xxxx3333
this is a a1234 b5678 test string
Expected results
xxxx6789
xxxxb456
xxxx5678
xxxx5678
xxxx3333
this is a xxxx5678 test string
Is such an arrangement possible with a regex replace?
I think I'm going to need some greediness and lookahead functionality, but I have zero experience in those areas.
This works for your example:
var result = Regex.Replace(
    input,
    @"(?<!\b\w*\d\w*)(?<m1>\s?\b\w*\d\w*)+",
    m => "xxxx" + m.Value.Substring(Math.Max(0, m.Value.Length - 4)));
If you have a value like 111 2233 33, it will print xxxx3 33. If you want this to be free from spaces, you could turn the lambda into a multi-line statement that removes whitespace from the value.
To explain the regex pattern a bit, it's got a negative lookbehind, so it makes sure that the word behind it does not have a digit in it (with optional word characters around the digit). Then it's got the m1 portion, which looks for words with digits in them. The last four characters of this are grabbed via some C# code after the regex pattern resolves the rest.
I don't think that regex is the best way to solve this problem, and that's why I am posting this answer. For situations this complex, building the corresponding regex is too difficult and, what is worse, its clarity and adaptability are much lower than those of a longer-code approach.
The code below these lines delivers the exact functionality you are after, it is clear enough and can be easily extended.
string input = "this is a a1234 b5678 test string";
string output = "";
string[] temp = input.Trim().Split(' ');
bool previousNum = false;
string tempOutput = "";
foreach (string word in temp)
{
    if (word.ToCharArray().Where(x => char.IsDigit(x)).Count() > 0)
    {
        // Word contains a digit: append it to the current run.
        previousNum = true;
        tempOutput = tempOutput + word;
    }
    else
    {
        if (previousNum)
        {
            if (tempOutput.Length >= 4) tempOutput = "xxxx" + tempOutput.Substring(tempOutput.Length - 4, 4);
            output = output + " " + tempOutput;
            previousNum = false;
            tempOutput = ""; // reset so the next run does not include this one
        }
        output = output + " " + word;
    }
}
// Flush a run that reaches the end of the input.
if (previousNum)
{
    if (tempOutput.Length >= 4) tempOutput = "xxxx" + tempOutput.Substring(tempOutput.Length - 4, 4);
    output = output + " " + tempOutput;
    previousNum = false;
}
output = output.TrimStart(); // drop the leading space added by the first append
Have you tried this:
.*(?<m1>[\d]{4})(?<m2>.*)
with replacement
xxxx${m1}${m2}
This produces
xxxx6789
xxxx5678
xxxx5678
xxxx3333
xxxx5678 test string
You are not going to get 'a123b456' to match ... until 'b' becomes a number. ;-)
Here is my really quick attempt:
(\s|^)([a-z]*\d+[a-z,0-9]+\s)+
This will select all of those test cases. Now as for C# code, you'll need to check each match to see if there is a space at the beginning or end of the match sequence (e.g., the last example will have the space before and after selected)
here is the C# code to do the replace:
var redacted = Regex.Replace(record, @"(\s|^)([a-z]*\d+[a-z,0-9]+\s)+",
match => "xxxx" /*new String("x",match.Value.Length - 4)*/ +
match.Value.Substring(Math.Max(0, match.Value.Length - 4)));

How Can I read From Line number() to line Starts with in C#

Let's say I have text file like this
<pre>----------------
hPa m C
---------------------
1004.0 28 13.6
1000.0 62 16.2
998.0 79 17.2
992.0 131 18.0
<pre>----------------
Sometext here
1000.0 10 10.6
1000.0 10 11.2
900.0 10 12.2
900.0 100 13.0
<aaa>----------------
How Can I Create Array in C# that reads text file from line number 5 (1004.0) to just before line that starts with string <pre>-
I used string[] lines = System.IO.File.ReadAllLines(Filepath);
To make each line in the array
The problem is I want only numbers of first section in the array in order to separate them later to another 3 arrays (hPa, m, C) .
Here's a possible solution. It's probably way more complicated than it should be, but that should give you an idea of possible mechanisms to further refine your data.
string[] lines = System.IO.File.ReadAllLines("test.txt");
List<double> results = new List<double>();
foreach (var line in lines.Skip(4))
{
if (line.StartsWith("<pre>"))
break;
Regex numberReg = new Regex(@"\d+(\.\d){0,1}"); //will find any number ending in ".X" - it's primitive, and won't work for something like 0.01, but no such data showed up in your example
var result = numberReg.Matches(line).Cast<Match>().FirstOrDefault(); //use only the first number from each line. You could use Cast<Match>().Skip(1).FirstOrDefault to get the second, and so on...
if (result != null)
results.Add(Convert.ToDouble(result.Value, System.Globalization.CultureInfo.InvariantCulture)); //Note the use of InvariantCulture, otherwise you may need to worry about , or . in your numbers
}
Do you mean this?
System.IO.StreamReader file = new System.IO.StreamReader(FILE_PATH);
int skipLines = 4; // the data starts on line 5, so skip the four lines before it
for (int i = 0; i < skipLines; i++)
{
file.ReadLine();
}
// Do what you want here.
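Putting the two answers together, a possible sketch: skip the header lines, stop at the next line that starts with "<pre>", and split each row into its three numbers. The helper name ReadFirstSection, its headerLines parameter, and the inline sample are assumptions for illustration; for the file described in the question the caller would pass File.ReadAllLines(Filepath) and 4.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Parses rows of three whitespace-separated numbers, starting after
// headerLines lines and stopping at the next line that begins with "<pre>".
// Rows that do not have exactly three fields are ignored.
List<double[]> ReadFirstSection(IEnumerable<string> lines, int headerLines) =>
    lines.Skip(headerLines)
         .TakeWhile(line => !line.StartsWith("<pre>", StringComparison.Ordinal))
         .Select(line => line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries))
         .Where(parts => parts.Length == 3)
         .Select(parts => parts.Select(p => double.Parse(p, CultureInfo.InvariantCulture)).ToArray())
         .ToList();

// Inline sample with three header lines, standing in for the real file.
string[] sample =
{
    "<pre>----------------",
    "hPa m C",
    "---------------------",
    "1004.0 28 13.6",
    "1000.0 62 16.2",
    "<pre>----------------",
    "Sometext here",
    "1000.0 10 10.6",
};

List<double[]> rows = ReadFirstSection(sample, 3);
double[] hPa = rows.Select(r => r[0]).ToArray();
double[] m = rows.Select(r => r[1]).ToArray();
double[] c = rows.Select(r => r[2]).ToArray();
Console.WriteLine("{0} rows read", rows.Count);
```

TakeWhile stops at the first terminator line, so the second section is never parsed. A three-field row containing non-numeric text would make double.Parse throw; double.TryParse would be the more defensive choice.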

How can I get the IndexOf() method to return the correct values?

I have been working with googlemaps and i am now looking to format coordinates.
I get the coordinates in the following format:
Address(coordinates)zoomlevel.
I use the indexof method to get the start of "(" +1 so that i get the first number of the coordinate and store this value in a variable that i call "start".
I then do them same thing but this time i get the index of ")" -2 to get the last number of the last coordinate and store this value in a variable that i call "end".
I get the following error:
"Index and length must refer to a location within the string.Parameter name: length"
I get the following string as an input parameter:
"Loddvägen 155, 840 80 Lillhärdal, Sverige (61.9593214318303,14.0585965625)5"
by my calculations i should get the value 36 in the start variable and the value 65 in the end variable
but for some reason i get the values 41 in start and 71 in end.
why?
public string RemoveParantheses(string coord)
{
int start = coord.IndexOf("(")+1;
int end = coord.IndexOf(")")-2;
string formated = coord.Substring(start,end);
return formated;
}
I then tried hardcoding the correct values
string Test = cord.Substring(36,65);
I then get the following error:
startindex cannot be larger than length of string. parameter name startindex
I understand what both of the errors mean but in this case they are incorrect since im not going beyond the strings length value.
Thanks!
The second parameter of Substring is a length (MSDN source). Since you are passing in 65 for the second parameter, your call is trying to get the characters between 36 and 101 (36+65). Your string does not have 101 characters in it, so that error is thrown. To get the data between the parentheses, use this:
public string RemoveParantheses(string coord)
{
    int start = coord.IndexOf("(") + 1;
    int end = coord.IndexOf(")"); // index of the closing parenthesis itself
    string formated = coord.Substring(start, end - start); // second argument is a length
    return formated;
}
Edit: The reason it worked with only the coordinates is that the total string was shorter and the coordinates started at the first position, so start plus end still fell inside the string. For example...
//using "Loddvägen 155, 840 80 Lillhärdal, Sverige (61.9593214318303,14.0585965625)5" (75 characters)
int start = coord.IndexOf("(") + 1; // 43
int end = coord.IndexOf(")")-2; // 71
coord.Substring(start, end); //asks for 71 characters starting at 43, which runs past the end of the string
//using (61.9593214318303,14.0585965625)5 (33 characters)
int start = coord.IndexOf("(") + 1; // 1
int end = coord.IndexOf(")")-2; // 29
coord.Substring(start, end); //asks for 29 characters starting at 1, which fits
The second instance was valid because the requested range still fit inside the string. Once you added the address to the beginning of the string, your code would no longer work.
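To make the (startIndex, length) contract concrete, here is a small sketch that extracts whatever sits between the first pair of parentheses. The helper name BetweenParentheses is hypothetical, and returning an empty string when a parenthesis is missing is an assumption about the desired behaviour.

```csharp
using System;

// Returns the text between the first "(" and the next ")", or an empty
// string when either one is missing. Substring takes (startIndex, length).
string BetweenParentheses(string text)
{
    int open = text.IndexOf('(');
    if (open < 0) return string.Empty;
    int close = text.IndexOf(')', open + 1);
    if (close < 0) return string.Empty;
    int start = open + 1;
    return text.Substring(start, close - start);
}

Console.WriteLine(BetweenParentheses(
    "Loddvägen 155, 840 80 Lillhärdal, Sverige (61.9593214318303,14.0585965625)5"));
// 61.9593214318303,14.0585965625
```

Computing the length as close - start means the result does not depend on how long the address in front of the coordinates is.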
Extracting parts of a string is a good use for regular expressions:
var match = Regex.Match(locationString, @"\((?<lat>[\d\.]+),(?<long>[\d\.]+)\)");
string latitude = match.Groups["lat"].Value;
string longitude = match.Groups["long"].Value;
You probably forgot to count newlines and other whitespace; a \r\n newline is 2 "invisible" characters. The other mistake is that you are calling Substring with (Start, End) while it's (Start, Count), i.e. (Start, End - Start).
by my calculations i should get the value 36 in the start variable and the value 65 in the end variable
Then your calculations are wrong. With the string above I also see (and LinqPad confirms) that the open paren is at position 42 and the close paren is at index 73.
The error you're getting when using Substring is because the parameters to Substring are a beginning position and the length, not the ending position, so you should be using:
string formated = coord.Substring(start,(end-start+1));
That overload of Substring() takes two parameters, a start index and a length. You've provided the second value as the index of the occurrence of ) when really you want the length of the string you wish to extract; in this case you can get it by subtracting the start index (just after the () from the index of ). For example: -
string foo = "Loddvägen 155, 840 80 Lillhärdal, Sverige (61.9593214318303,14.0585965625)5";
int start = foo.IndexOf("(") + 1;
int end = foo.IndexOf(")");
Console.Write(foo.Substring(start, end - start));
Console.Read();
Alternatively, you could parse the string using a regular expression, for example: -
Match r = Regex.Match(foo, @"\(([^)]*)\)");
Console.Write(r.Groups[1].Value);
Which will probably perform a little better than the previous example
string input =
"Loddvägen 155, 840 80 Lillhärdal, Sverige (61.9593214318303,14.0585965625)5";
var groups = Regex.Match(input,
    @"\(([\d\.]+),([\d\.]+)\)(\d{1,2})").Groups;
var lat = groups[1].Value;
var lon = groups[2].Value;
var zoom = groups[3].Value;

Need multiple regular expression matches using C#

So I have this list of flight data and I need to be able to parse through it using regular expressions (this isn't the entire list).
1 AA2401 F7 A4 Y7 B7 M7 H7 K7 /DFW A LAX 4 0715 0836 E0.M80 9 3:21
2 AA2421 F7 A1 Y7 B7 M7 H7 K7 DFWLAX 4 1106 1215 E0.777 7 3:09
3UA:US6352 B9 M9 H9 K0 /DFW 1 LAX 1200 1448 E0.733 1:48
For example, I might need from the first line 1, AA, 2401, and so on and so on. Now, I'm not asking for someone to come up with a regular expression for me because for the most part I'm getting to where I can pretty much handle that myself. My issue has more to do with being able to store the data some where and access it.
So I'm just trying to initially just "match" the first piece of data I need, which is the line number '1'. My "pattern" for just getting the first number is: ".?(\d{1,2}).*" . The reason it's {1,2} is because obviously once you get past 10 it needs to be able to take 2 numbers. The rest of the line is set up so that it will definitely be a space or a letter.
Here's the code:
var assembly = Assembly.GetExecutingAssembly();
var textStreamReader = new StreamReader(
assembly.GetManifestResourceStream("FlightParser.flightdata.txt"));
List<string> lines = new List<string>();
do
{
lines.Add(textStreamReader.ReadLine());
} while (!textStreamReader.EndOfStream);
Regex sPattern = new Regex(@".?(\d{1,2}).*");//whatever the pattern is
foreach (string line in lines)
{
System.Console.Write("{0,24}", line);
MatchCollection mc = sPattern.Matches(line);
if ( sPattern.IsMatch(line))
{
System.Console.WriteLine(" (match for '{0}' found)", sPattern);
}
else
{
System.Console.WriteLine();
}
System.Console.WriteLine(mc[0].Groups[0].Captures);
System.Console.WriteLine(line);
}//end foreach
System.Console.ReadLine();
With the code I'm writing, I'm basically just trying to get '1' into the match collection and somehow access it and write it to the console (for the sake of testing, that's not the ultimate goal).
Your regex pattern includes an asterisk which matches any number of characters - ie. the whole line. Remove the "*" and it will only match the "1". You may find an online RegEx tester such as this useful.
Assuming your file is not actually formatted as you posted and has each of the fields separated by something, you can match the first two-digit number of the line with this regex (ignoring 0 and leading zeros):
^\s*([1-9]\d?)
Since it is grouped, you can access the matched part through the Groups property of the Match object.
var line = "12 foobar blah 123 etc";
var re = new Regex(@"^\s*([1-9]\d?)");
var match = re.Match(line);
if (match.Success)
{
Console.WriteLine(match.Groups[1].Value); // "12"
}
else
{
Console.WriteLine("No match");
}
The following expression matches the first digit, that you wanted to capture, in the group "First".
^\s*(?<First>\d{1})
I find this regular expression tool highly useful when dealing with regex. Give it a try.
Also set RegexOptions.Multiline if you match against the whole text rather than line by line, so that ^ matches at the start of every line.
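As a sketch of how several fields can be captured and read back, the snippet below uses named groups and RegexOptions.Multiline over a two-line sample. The field layout (line number, two-letter airline code, flight number) is an assumption based on the first two sample rows; entries such as "3UA:US6352" are not covered by this pattern.

```csharp
using System;
using System.Text.RegularExpressions;

// Assumed layout: line number, two-letter airline code, flight number.
// With RegexOptions.Multiline, ^ matches at the start of every line.
var flight = new Regex(
    @"^\s*(?<line>\d{1,2})\s+(?<airline>[A-Z]{2})(?<number>\d+)",
    RegexOptions.Multiline);

string data =
    "1 AA2401 F7 A4 Y7 B7 M7 H7 K7 /DFW A LAX 4 0715 0836 E0.M80 9 3:21\n" +
    "2 AA2421 F7 A1 Y7 B7 M7 H7 K7 DFWLAX 4 1106 1215 E0.777 7 3:09";

foreach (Match m in flight.Matches(data))
{
    // Each named group is read back through Groups["name"].Value.
    Console.WriteLine("{0}: {1} {2}",
        m.Groups["line"].Value, m.Groups["airline"].Value, m.Groups["number"].Value);
}
```

Named groups keep the code readable as more fields are added, since each value is looked up by name rather than by position.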
