This question already has answers here: Getting required information from Log file using Split (4 answers). Closed 9 years ago.
I am reading a text file to upload it into a database. The text file has no headers and contains lines like this:
[10-10-2013 11:20:33.444 CDF] 1000020 Incident T This is the error message
[10-10-2013 11:20:33.445 CDF] 1000020 Incident T This is the second error message
How can I store "10-10-2013 11:20:33" in a Date column and the milliseconds (444) in an integer column of the database? If I split on spaces first, the date gets split into 3 parts. I want to get the date between the brackets and then split the rest on spaces.
Two points to mention here:
1. The date column contains spaces.
2. I also need to be able to get the other columns.
The simplest way to do this is to use regular expressions, not gobs of Split and IndexOf operations.
Regular expressions allow you to specify a pattern out of which pieces of a string can be extracted in a straightforward fashion. If the format changes, or there is some subtlety not initially accounted for, you can fix the problem by adjusting the expression, rather than rewriting a bunch of code.
Here's some documentation for regular expressions in .NET: http://msdn.microsoft.com/en-us/library/az24scfc.aspx
This is some sample code that'll probably do what you want. You may need to tweak it a little to get the desired results.
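// requires: using System; using System.Text.RegularExpressions;
var m = Regex.Match(currentLine, @"^\[(?<date>[^\]]*)\]\s+(?<int>[0-9]+)\s+(?<message>.*)\s*$");
if(m.Success) {
    // "date" captures everything inside the brackets, e.g. "10-10-2013 11:20:33.444 CDF".
    // DateTime.Parse won't accept the " CDF" suffix as-is, so you'll need to do something fancier
    // to parse the date (and split off the milliseconds), but that's an exercise for the reader
    var myDate = DateTime.Parse(m.Groups["date"].Value);
    var myInt = int.Parse(m.Groups["int"].Value);
    var myMessage = m.Groups["message"].Value;
}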
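Here is a slightly fuller sketch of the same regex idea: it splits the bracketed timestamp into the part for the Date column and the milliseconds, and captures each remaining space-separated column. It assumes day-month-year order (dd-MM-yyyy; the sample "10-10-2013" can't tell that apart from MM-dd-yyyy), and the names LogEntry, LogLineParser, Zone, Id, Kind and Flag are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical container for one parsed log line.
public class LogEntry
{
    public DateTime Timestamp { get; set; }   // "10-10-2013 11:20:33" -> Date column
    public int Milliseconds { get; set; }     // "444" -> integer column
    public string Zone { get; set; }          // "CDF"
    public int Id { get; set; }               // "1000020"
    public string Kind { get; set; }          // "Incident"
    public string Flag { get; set; }          // "T"
    public string Message { get; set; }       // rest of the line
}

public static class LogLineParser
{
    private static readonly Regex LinePattern = new Regex(
        @"^\[(?<date>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2})\.(?<ms>\d{1,3})\s+(?<zone>[^\]]*)\]" +
        @"\s+(?<id>\d+)\s+(?<kind>\S+)\s+(?<flag>\S+)\s+(?<message>.*?)\s*$");

    // Returns null when the line does not have the expected shape.
    public static LogEntry Parse(string line)
    {
        Match m = LinePattern.Match(line);
        if (!m.Success) return null;

        return new LogEntry
        {
            // ASSUMPTION: day-month-year order.
            Timestamp = DateTime.ParseExact(m.Groups["date"].Value, "dd-MM-yyyy HH:mm:ss",
                                            CultureInfo.InvariantCulture),
            Milliseconds = int.Parse(m.Groups["ms"].Value),
            Zone = m.Groups["zone"].Value,
            Id = int.Parse(m.Groups["id"].Value),
            Kind = m.Groups["kind"].Value,
            Flag = m.Groups["flag"].Value,
            Message = m.Groups["message"].Value
        };
    }
}
```

Returning null rather than throwing lets you skip or log malformed lines while loading the rest of the file.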
The simplest way to do this is to just use String.Split and String.Substring.
Generically, I would do this:
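//find the indices of the []
var leftIndex = currentLine.IndexOf("[");
var rightIndex = currentLine.IndexOf("]");
//this gets the date portion of the string without the brackets, e.g. "10-10-2013 11:20:33.444 CDF"
var dateSubstring = currentLine.Substring(leftIndex + 1, rightIndex - leftIndex - 1);
var dateParts = dateSubstring.Split(new char[] {'.'});
// get the datetime portion, e.g. "10-10-2013 11:20:33" (still a string at this point)
var dateTime = dateParts[0];
// dateParts[1] is "444 CDF", so keep only the digits before the space
var milliseconds = int.Parse(dateParts[1].Split(' ')[0]);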
EDIT
Since the date portion is fixed width you could just use Substring for everything.
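A minimal sketch of the fixed-width idea, assuming every line starts with a bracketed timestamp of exactly the same width as in the sample ("[dd-MM-yyyy HH:mm:ss.fff ZZZ] "), followed by space-separated columns with the message last. The offsets are counted from the sample line, and FixedWidthLogParser is a hypothetical name.

```csharp
using System;
using System.Globalization;

public static class FixedWidthLogParser
{
    // Offsets for "[10-10-2013 11:20:33.444 CDF] 1000020 Incident T message..."
    private const int DateStart = 1, DateLength = 19;   // "10-10-2013 11:20:33"
    private const int MsStart = 21, MsLength = 3;       // "444"
    private const int RestStart = 30;                   // "1000020 Incident T message..."

    public static void Parse(string line, out DateTime timestamp, out int milliseconds, out string[] columns)
    {
        // ASSUMPTION: day-month-year order, as the sample suggests.
        timestamp = DateTime.ParseExact(line.Substring(DateStart, DateLength),
                                        "dd-MM-yyyy HH:mm:ss", CultureInfo.InvariantCulture);
        milliseconds = int.Parse(line.Substring(MsStart, MsLength));

        // Split the rest into at most 4 pieces so the message keeps its internal spaces.
        columns = line.Substring(RestStart).Split(new[] { ' ' }, 4);
    }
}
```

This only works while the prefix really is fixed width; a single-digit day or a different zone length would shift every offset, which is where the regex version is more forgiving.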
I am reading a couple of csv files into variables as follows:
var myFullCsv = ReadFile(myFullCsvFilePath);
var masterCsv = ReadFile(csvFilePath);
Some of the lines in each csv appear in both files, and I am able to create a new var containing the lines that exist in myFullCsv but not in masterCsv as follows:
var extraFilesCsv = myFullCsv.Except(masterCsv);
This is great because it's very simple. However, I now wish to identify lines in myFullCsv where a specific string appears in the line. The string will correspond to one column of the csv data. I know that I can do this by reading each line of the var and splitting it up, then comparing the field I'm interested in to the string that I am searching for. However, this seems like a very long and inefficient approach compared to my code above using Except.
Is there some way that I can get the lines from myFullCsv with a very simple command, or will I have to do it the long way? Please don't ask me to show the long way; that's what I'm trying to avoid coding, although I can do it.
Sample csv data:
07801.jpg,67466,9452d316,\Folder1\FolderA\,
07802.jpg,78115,e50492d8,\Folder1\FolderB\,
07803.jpg,41486,37b6a100,\Folder1\FolderC\,
07804.jpg,93500,acdffc2b,\Folder2\FolderA\,
07805.jpg,67466,9452d316,\Folder2\FolderB\,
Sample desired output (I'm always looking for the entry in the 3rd column to match a string, in this case 9452d316):
07801.jpg,67466,9452d316,\Folder1\FolderA\,
07805.jpg,67466,9452d316,\Folder2\FolderB\,
Well, you could use:
var results = myFullCsv.Where(line => line.Split(',')[2] == targetValue)
.ToList();
That's just doing the "splitting and checking" you mention in the question but it's pretty simple code. It could be more efficient if you only consider as far as the third comma, but I wouldn't worry about that until it's proved to be a problem.
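If it ever does become a problem, a sketch of the "only look as far as the third comma" idea could look like this. It finds the second and third commas with IndexOf and compares the field in place with string.CompareOrdinal, so no array is allocated per line. It assumes unquoted data like the sample, and CsvFilter and ThirdColumnEquals are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CsvFilter
{
    // True when the third comma-separated field of 'line' equals 'target' exactly.
    // Assumes no quoted fields or escaped commas.
    public static bool ThirdColumnEquals(string line, string target)
    {
        int first = line.IndexOf(',');
        if (first < 0) return false;
        int second = line.IndexOf(',', first + 1);
        if (second < 0) return false;

        int start = second + 1;
        int end = line.IndexOf(',', start);
        if (end < 0) end = line.Length;          // third field is the last one

        return end - start == target.Length &&
               string.CompareOrdinal(line, start, target, 0, target.Length) == 0;
    }

    public static List<string> Filter(IEnumerable<string> lines, string target)
    {
        return lines.Where(line => ThirdColumnEquals(line, target)).ToList();
    }
}
```

Lines with fewer than three fields simply don't match instead of throwing.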
Personally, I'd probably parse each line to an object with meaningful properties rather than treating it as a string, but that's probably what you mean by "the long way".
Note that this doesn't perform any validation, or try to handle escaped commas, or lines with fewer columns etc. Depending on your data source, you may need to make it a lot more robust.
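For comparison, a sketch of the "parse each line to an object" approach. The question doesn't name the columns, so ImageEntry, ImageQueries and the FileName/Size/Checksum/Folder properties are hypothetical guesses from the sample data. Like the Split version, it doesn't handle quoting; it only skips lines with fewer than four fields.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical record for one CSV line; the column meanings are guesses from the sample.
public class ImageEntry
{
    public string FileName { get; set; }
    public long Size { get; set; }
    public string Checksum { get; set; }
    public string Folder { get; set; }

    // Returns null for lines with fewer than four fields.
    public static ImageEntry Parse(string line)
    {
        string[] fields = line.Split(',');
        if (fields.Length < 4) return null;
        return new ImageEntry
        {
            FileName = fields[0],
            Size = long.Parse(fields[1], CultureInfo.InvariantCulture),
            Checksum = fields[2],
            Folder = fields[3]
        };
    }
}

public static class ImageQueries
{
    public static List<ImageEntry> WithChecksum(IEnumerable<string> lines, string checksum)
    {
        return lines.Select(line => ImageEntry.Parse(line))
                    .Where(e => e != null && e.Checksum == checksum)
                    .ToList();
    }
}
```

The query itself stays a one-liner; the extra code lives in Parse, where later lookups by folder or size can reuse it.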
You could use a regex. It doesn't require every line to have at least 3 elements. It doesn't allocate a string array for each line. Therefore it may be faster, but you'd have to test it to prove it.
// [^,]* cannot cross a comma, so the target is only matched in the 3rd column
var regex = new Regex("^[^,]*,[^,]*," + Regex.Escape(targetValue) + ",");
var results = myFullCsv.Where(l => regex.IsMatch(l)).ToList();
I have a text file which contains an alphabetically sorted list of variables, with their variable numbers next to them, formatted something like this:
aabcdef 208
abcdefghijk 1191
bcdefga 7
cdefgab 12
defgab 100
efgabcd 999
fgabc 86
gabcdef 9
h 11
ijk 80
...
...
I would like to read each name as a string and keep its designated ID number, e.g. read "aabcdef" and store it in an array at index 208.
The 2 issues I'm running into are:
1. I've never read from a file in C#. Is there a way to read from the start of a line up to the whitespace as a string, and then the next token as an int until the end of the line?
2. Given the nature and size of these files, I do not know the highest ID value in each file (not all numbers are used, so a file could contain a number like 3000 but only actually list 200 variables). So how could I store these variables flexibly when I don't know how big the array/list/stack/etc. would need to be?
Basically, you need a Dictionary instead of an array or list. You can read all lines with the File.ReadLines method, then split each of them on spaces and tabs (\t), like this:
var values = File.ReadLines("path")
.Select(line => line.Split(new [] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries))
.ToDictionary(parts => int.Parse(parts[1]), parts => parts[0]);
Then values[208] will give you aabcdef. It looks like an array, doesn't it? :)
Also make sure you have no duplicate numbers, because Dictionary keys must be unique; otherwise ToDictionary will throw an exception.
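If the file might contain blank lines, malformed lines or repeated numbers, a sketch like the following builds the same Dictionary<int, string> but skips bad lines and keeps the first name seen for a repeated number instead of throwing. Those two policies, and the VariableTable/Load names, are assumptions for illustration. It reads from a TextReader, so you can pass File.OpenText(path) for a real file or a StringReader for testing.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class VariableTable
{
    private static readonly char[] Separators = { ' ', '\t' };

    public static Dictionary<int, string> Load(TextReader reader)
    {
        var values = new Dictionary<int, string>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string[] parts = line.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
            int id;
            // Skip blank lines and lines whose second token is not an integer.
            if (parts.Length < 2 || !int.TryParse(parts[1], out id)) continue;
            // Keep the first name for a repeated id instead of throwing.
            if (!values.ContainsKey(id)) values.Add(id, parts[0]);
        }
        return values;
    }
}
```

Typical use is `using (var reader = File.OpenText("path")) { var values = VariableTable.Load(reader); }`. If you later need a real array, values.Keys.Max() (from System.Linq) tells you how large to make it.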
I've been thinking about how to improve on the other answers, and I've found this alternative solution based on Regex, which makes searching the whole string (whether or not it comes from a file) safer.
Note that you can alter the regular expression to include other separators. The sample expression detects spaces and tabs.
At the end of the day, I found that MatchCollection returns a safer result, since you always know that group 2 is an integer and group 1 is text, because the regular expression does a lot of checking for you!
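// requires: using System.Collections.Generic; using System.Linq; using System.Text; using System.Text.RegularExpressions;
StringBuilder builder = new StringBuilder();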
builder.AppendLine("djdodjodo\t\t3893983");
builder.AppendLine("dddfddffd\t\t233");
builder.AppendLine("djdodjodo\t\t39838");
builder.AppendLine("djdodjodo\t\t12");
builder.AppendLine("djdodjodo\t\t444");
builder.AppendLine("djdodjodo\t\t5683");
builder.Append("djdodjodo\t\t33");
// Replace this line with calling File.ReadAllText to read a file!
string text = builder.ToString();
MatchCollection matches = Regex.Matches(text, @"([^\s^\t]+)(?:[\s\t])+([0-9]+)", RegexOptions.IgnoreCase | RegexOptions.Multiline);
// Here's the magic: we convert an IEnumerable<Match> into a dictionary!
// Note that int.Parse should never fail on group 2, because the regex
// matched digits only (barring numbers too large for an Int32)!
IDictionary<int, string> lines = matches.Cast<Match>()
.ToDictionary(match => int.Parse(match.Groups[2].Value), match => match.Groups[1].Value);
// Now you can access your lines as follows:
string value = lines[33]; // <-- look up by number (key); gives "djdodjodo"
Update:
As we discussed in chat, this solution wasn't working for an actual use case you showed me. It's not the approach that fails but your particular case, because your keys look like "[something].[something]" (for example: address.Name).
I've changed the regular expression to ([\w\.]+)[\s\t]+([0-9]+) so it covers keys containing a dot.
It's about improving the matching regular expression to fit your requirements! ;)
Update 2:
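Since you told me that you need keys containing any character, I've changed the regular expression to ([^\s^\t]+)(?:[\s\t])+([0-9]+).
Now the key can be anything except whitespace such as spaces and tabs. (Strictly speaking, the second ^ inside [^\s^\t] is a literal caret, so keys containing ^ are excluded as well.)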
Update 3:
Also, I see you're stuck on .NET 3.0, and ToDictionary (like Cast<Match>) was introduced in .NET 3.5. If you want the same approach in .NET 3.0, replace the Cast<Match>().ToDictionary(...) call with:
Dictionary<int, string> lines = new Dictionary<int, string>();
foreach(Match match in matches)
{
lines.Add(int.Parse(match.Groups[2].Value), match.Groups[1].Value);
}
I'm developing a Windows Phone app which consumes web API calls. Most of the calls return JSON strings, but one of them returns the following line of code:
var buyPrice=[[Date.UTC(2012,0,9),385.250000], [Date.UTC(2012,0,10),386.250000], [Date.UTC(2012,0,11),387.000000]];
It seems that the above line of code is a regular JavaScript array declaration. That means I can't parse it as JSON; moreover, it contains the text "var buyPrice=", which can't be parsed either.
So I need to convert the above collection to a corresponding C# array/collection, but I'm not sure how to do that.
Is it possible to do this with built-in C# features, or do I need a third-party library?
Without external libraries (and I don't know of any for this kind of task), you can use Regex:
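// requires: using System; using System.Globalization; using System.Linq; using System.Text.RegularExpressions;
var input = "var buyPrice=[[Date.UTC(2012,0,9),385.250000], [Date.UTC(2012,0,10),386.250000], [Date.UTC(2012,0,11),387.000000]];";
var regex = @"\[Date\.UTC\((?<year>\d{4}),(?<month>\d{1,2}),(?<day>\d{1,2})\),(?<price>\d+(\.\d+)?)\]";
var matches = Regex.Matches(input, regex)
    .OfType<Match>()
    .Select(m => new
    {
        Date = new DateTime(
            Int32.Parse(m.Groups["year"].Value),
            Int32.Parse(m.Groups["month"].Value) + 1, // Date.UTC months are 0-based
            Int32.Parse(m.Groups["day"].Value),
            0, 0, 0, DateTimeKind.Utc
        ),
        // invariant culture so "385.250000" parses regardless of the phone's locale
        Price = Decimal.Parse(m.Groups["price"].Value, CultureInfo.InvariantCulture)
    });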
Because Date.UTC takes a zero-based month, you have to add one. Based on your input, this will return three anonymous objects with a DateTime Date property and a decimal Price property.
Note that this regex does not try to validate the input (any month or day from 00 to 99 is accepted by the pattern, and new DateTime will throw for out-of-range values), but it's a good starting point.
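Anonymous types can't easily be returned from a method, so if the parsed values need to leave the method that reads the web response, a small named type helps. This sketch reuses the regex idea above with the dots escaped; PricePoint and BuyPriceParser are hypothetical names, and it assumes prices are non-negative numbers written with a '.' decimal separator, as in the sample.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public class PricePoint
{
    public DateTime Date { get; set; }
    public decimal Price { get; set; }
}

public static class BuyPriceParser
{
    private static readonly Regex Entry = new Regex(
        @"\[Date\.UTC\((?<year>\d{4}),(?<month>\d{1,2}),(?<day>\d{1,2})\),(?<price>\d+(\.\d+)?)\]");

    public static List<PricePoint> Parse(string script)
    {
        var result = new List<PricePoint>();
        foreach (Match m in Entry.Matches(script))
        {
            result.Add(new PricePoint
            {
                // JavaScript Date.UTC months are 0-based, DateTime months are 1-based.
                Date = new DateTime(int.Parse(m.Groups["year"].Value),
                                    int.Parse(m.Groups["month"].Value) + 1,
                                    int.Parse(m.Groups["day"].Value),
                                    0, 0, 0, DateTimeKind.Utc),
                Price = decimal.Parse(m.Groups["price"].Value, CultureInfo.InvariantCulture)
            });
        }
        return result;
    }
}
```

The "var buyPrice=" prefix needs no special handling, because the regex only looks for the [Date.UTC(...),price] pairs.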
I want to search for all possible dates in a string using Regex.
In my code I have this:
String dateSearchPattern = @"(?<Day>\d{2}).(?<Month>\d{2}).(?<Year>\d{4})|(?<Day>\d{2}).(?<Month>\d{2}).(?<Year>\d{2})";
// date format: dd.mm.yyyy or d.m.yyyy or dd.mm.yy or d.m.yy
String searchText = "20.03.2010.25.03.10";
Regex.Matches(searchText, dateSearchPattern); // the matching SHOULD give a count of 2
The above code gives only 1 match where it should give 2. Also, I need a pattern for when the date format is d.m.yyyy or d.m.yy.
The pattern seems perfectly OK. It gives two matches. By any chance, have you used the following lines to check the count?
var match = Regex.Matches(searchText, dateSearchPattern);
Console.WriteLine(match.Count);
I used SD 3 on .NET 3.5 (without SP1), and your code gives your desired result.
You can change your pattern to this:
@"(?<Day>\d{1,2})\.(?<Month>\d{1,2})\.(?:(?<Year>\d{4})|(?<Year>\d{2}))"
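A sketch that applies a pattern of this shape (with the dots escaped, so they only match a literal '.') and turns each match into a DateTime. It assumes two-digit years belong to 2000-2099 and silently skips matches that aren't real calendar dates; both are choices you may want to change, and DateFinder/FindDates are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DateFinder
{
    // d.m.yy, dd.mm.yy, d.m.yyyy or dd.mm.yyyy; the duplicate "Year" name is allowed in .NET.
    private static readonly Regex DatePattern = new Regex(
        @"(?<Day>\d{1,2})\.(?<Month>\d{1,2})\.(?:(?<Year>\d{4})|(?<Year>\d{2}))");

    public static List<DateTime> FindDates(string text)
    {
        var results = new List<DateTime>();
        foreach (Match m in DatePattern.Matches(text))
        {
            int day = int.Parse(m.Groups["Day"].Value);
            int month = int.Parse(m.Groups["Month"].Value);
            string yearText = m.Groups["Year"].Value;
            int year = int.Parse(yearText);
            // ASSUMPTION: two-digit years belong to the 2000s (10 -> 2010).
            if (yearText.Length == 2) year += 2000;

            // Skip matches that look like dates but are not valid calendar dates (e.g. 31.02.2010).
            if (month < 1 || month > 12) continue;
            if (day < 1 || day > DateTime.DaysInMonth(year, month)) continue;

            results.Add(new DateTime(year, month, day));
        }
        return results;
    }
}
```

The four-digit alternative is tried first, so "20.03.2010" is read with year 2010 rather than stopping at "20".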
This question already has answers here: Possible duplicate: Parsing formatted string. Closed 13 years ago.
How can I take a String.Format format string and transform its output back into its inputs?
For example:
string formatString = "My name is {0}. I have {1} cow(s).";
string s = String.Format(formatString, "strager", 2);
// Call the magic method...
ICollection<string> parts = String.ReverseFormat(formatString, s);
// parts now contains "strager" and "2".
I know I can use regular expressions to do this, but I would like to use the same format string so I only need to maintain one line of code instead of two.
Here is some code from someone attempting a Scanf equivalent in C#:
http://www.codeproject.com/KB/recipes/csscanf.aspx
You'll have to implement it yourself, as there's nothing built in to do it for you.
To that end, I suggest you get the actual source code for the .NET String.Format implementation (the relevant code is actually in StringBuilder.AppendFormat()). It's freely available, and it uses a state machine to walk the string very efficiently. You can mimic that code to walk your formatted string and extract the data.
Note that it won't always be possible to go backwards. Sometimes the formatted string can contain characters that match the format specifiers, making it difficult or impossible for the program to know what the original looked like. Thinking about it more, you might have better luck walking the format string to turn it into a regular expression, and then using that to do the match.
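A minimal sketch of turning the format string into a regular expression: literal text is escaped with Regex.Escape and each {N} placeholder becomes a lazy capture group. It assumes plain {N} placeholders only (no alignment or format specifiers such as {0,5} or {1:N2}, no {{ }} escapes, and each index used once). FormatInverter is a hypothetical class, and the method is named InvertFormat following the naming suggestion in this answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class FormatInverter
{
    // Returns parts[N] = the text that filled {N}, or null if 'formatted' doesn't fit 'format'.
    public static string[] InvertFormat(string format, string formatted)
    {
        var placeholder = new Regex(@"\{(\d+)\}");
        var pattern = new StringBuilder("^");
        var indices = new List<int>();
        int last = 0;

        foreach (Match m in placeholder.Matches(format))
        {
            // Literal text between placeholders must match exactly, so escape it.
            pattern.Append(Regex.Escape(format.Substring(last, m.Index - last)));
            // Each placeholder becomes a lazy capture group.
            pattern.Append("(.*?)");
            indices.Add(int.Parse(m.Groups[1].Value));
            last = m.Index + m.Length;
        }
        pattern.Append(Regex.Escape(format.Substring(last))).Append('$');

        Match result = Regex.Match(formatted, pattern.ToString(), RegexOptions.Singleline);
        if (!result.Success) return null;

        int size = 0;
        foreach (int i in indices) size = Math.Max(size, i + 1);
        var parts = new string[size];
        for (int g = 0; g < indices.Count; g++)
            parts[indices[g]] = result.Groups[g + 1].Value;
        return parts;
    }
}
```

The ambiguity mentioned above still applies: if "strager" were replaced by a name containing ". I have ", the lazy groups would split the string at the first possible place, which may not be what was originally passed in.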
I'd also recommend renaming your method to InvertFormat(), because ReverseFormat sounds like you'd expect this output:
.)s(woc 2 evah I .regarts si eman yM
I don't believe there's anything in-box to support this, but in C# you can pass an array of objects directly to any method that takes a params array parameter, such as String.Format(). Other than that, I don't believe there's any way for C# and the .NET Framework to know that string X was built from magic format string Y and undo the merge.
Therefore, the only thing I can think of is structuring your code like this:
object[] parts = {"strager", 2};
string s = String.Format(formatString, parts);
// Later on use parts, converting each member .ToString()
foreach (object p in parts)
{
Console.WriteLine(p.ToString());
}
Not ideal, and probably not quite what you're looking for, but I think it's the only way.