Extracting a complicated date / time format out of a string - c#

I am trying to find the best possible way to extract a Date and Time string that is stored in a very very strange format out of a file name (string) that was retrieved from an FTP file listing.
The string is as follows:
-rwxr-xr-x 1 ftp ftp 267662 Jun 06 09:13 VendorInventory_20130606_021303.txt\r
The specific data I am trying to extract is 20130606_021303. 021303 is formatted as hours, seconds and milliseconds. DateTime.Parse and DateTime.ParseExact are not willing to cooperate. Any idea on how to get this up and running?

Looks like you've got the entire row of the file listing, including permissions, user, owner, file size, timestamp and filename.
The data you're asking for appears to be just part of the filename. Use some basic string manipulation (Split, Substring, etc...) first. Then when you have just the datetime portion, you can then call DateTime.ParseExact.
Give it a try yourself first. If you run into problems, update your question to show the code you are attempting, and someone will help you further.
...
Oh, fine. What the heck. I'm feeling generous. Here's a one-liner:
string s = // your string as in the question
DateTime dt = DateTime.ParseExact(string.Join(" ", s.Split('_', '.'), 1, 2),
"yyyyMMdd HHmmss", null);
But please, next time, try something on your own first.

UPDATE I assume there is a fixed structure to the file display of the FTP listing, so you could simply use String.Substring to extract the datetime string, and then parse with DateTime.ParseExact:
var s = "-rwxr-xr-x 1 ftp ftp 267662 Jun 06 09:13 VendorInventory_20130606_021303.txt\r";
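// 57 is where the timestamp starts in the sample line above; adjust it if your server pads the columns differently
var datetime = DateTime.ParseExact(s.Substring(57, 15), "yyyyMMdd_HHmmss", null);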
Original Answer
Use a regular expression. Try the following:
var s = "-rwxr-xr-x 1 ftp ftp 267662 Jun 06 09:13 VendorInventory_20130606_021303.txt\r";
/*
The following pattern means:
(\d{8}) 8 digits (\d), captured in a group (the parentheses) for later reference
_ an underscore
(\d{6}) 6 digits in a group
\. a period. The backslash is needed because . has special meaning in regular expressions
.* any character (.), any number of times (*)
\r carriage return
$ the end of the string
*/
var pattern = @"(\d{8})_(\d{6})\..*\r$";
var match = Regex.Match(s, pattern);
string dateString = match.Groups[1].Value;
string timeString = match.Groups[2].Value;
and parse using ParseExact:
var datetime = DateTime.ParseExact(dateString + timeString,"yyyyMMddHHmmss",null);

This might work:
string s = "-rwxr-xr-x 1 ftp ftp 267662 Jun 06 09:13 VendorInventory_20130606_021303.txt\r";
// you might need to adjust the IndexOf method a bit - if the filename/string ever changes...
// or use a regex to check if there's a date in the given string
// however - the first thing to do is extract the dateTimeString:
string dateTimeString = s.Substring(s.IndexOf("_") + 1, 15);
// and now extract the DateTime (you could also use DateTime.TryParseExact)
// this should save you the trouble of substringing and parsing loads of ints manually :)
// HH is the 24-hour clock; hh would reject any hour above 12
DateTime dt = DateTime.ParseExact(dateTimeString, "yyyyMMdd_HHmmss", null);
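A more defensive variant of the approaches above, as a minimal sketch. It isolates the file name first, then looks for the yyyyMMdd_HHmmss segment and parses it with TryParseExact, so a malformed line yields null instead of an exception. The names FtpListingTimestamp and TryExtract are made up for illustration. The sketch assumes that the file name contains no spaces and that the six digits are hours, minutes and seconds, as the answers above assume.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class FtpListingTimestamp
{
    // Hypothetical helper: returns the timestamp embedded in the file name,
    // or null when no valid yyyyMMdd_HHmmss segment is found.
    public static DateTime? TryExtract(string listingLine)
    {
        // The file name is the last whitespace-separated token.
        // Assumption: the file name itself contains no spaces.
        // Trim() also removes the trailing "\r".
        string[] tokens = listingLine.Trim()
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (tokens.Length == 0) return null;
        string fileName = tokens[tokens.Length - 1];

        // 8 digits, an underscore, 6 digits, between "_" and "."
        Match m = Regex.Match(fileName, @"_(\d{8}_\d{6})\.");
        if (!m.Success) return null;

        DateTime parsed;
        bool ok = DateTime.TryParseExact(m.Groups[1].Value, "yyyyMMdd_HHmmss",
            CultureInfo.InvariantCulture, DateTimeStyles.None, out parsed);
        return ok ? parsed : (DateTime?)null;
    }
}
```

Calling FtpListingTimestamp.TryExtract(s) with the line from the question should give 6 June 2013, 02:13:03.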

Related

How to parse string into an array using set number of characters in C#

I have some c# code like this:
string myString = "20180426";
I know how to parse around specific characters (using the string.Split thing), but how do I get it to return 3 strings like this:
2018
04
26
I have several strings that are formatted this way ("YYYYMMDD"), so I don't want code that will only work for this specific string. I tried using
var finNum = myString[0] + myString[1] + myString[2] + myString[3];
Console.Write(finNum);
But I guess it's treating the characters as integers, rather than a text string because it's doing some mathematical operation with them instead of concatenating (it's not addition either because it's returning 203, which isn't the sum of 2, 0, 1 and 8).
I've tried changing var to string, but it won't let me implicitly convert int to string. Why does it think that string myString is an int, rather than a string, which is what I declared it as?
I could also use DateTime.Parse and DateTime.ParseExact, but apparently "20180426" isn't recognized as a valid DateTime:
DateTime myDate = DateTime.ParseExact(myString, "YYYYMMDD", null);
Console.WriteLine(myDate);
Thank you for your help. I know the answer is probably stupidly easy and I feel dumb for asking but I seriously checked all over various websites and can't find a solution that works for my issue here.
I could also use DateTime.Parse and DateTime.ParseExact, but
apparently "20180426" isn't recognized as a valid DateTime.
Yes, because the format string YYYYMMDD is incorrect, years and days are lowercase:
DateTime myDate = DateTime.ParseExact(myString, "yyyyMMdd", null);
If you want the year, month and day:
int year = myDate.Year;
int month = myDate.Month;
int day = myDate.Day;
If you want the year, month and day as separate string variables, you could try:
string mystring = "20180426";
mystring = mystring.Insert(4,"-");
mystring = mystring.Insert(7,"-");
string year = mystring.Split('-')[0];
string month = mystring.Split('-')[1];
string day = mystring.Split('-')[2];
First I insert a "-" to separate the year and month, then another to separate the month and day. You get something like "2018-04-26".
Then I split the string on "-" and store element 0, which holds the first 4 digits of your string, in a variable named year. Elements 1 and 2 give the month and the day.
Good luck!
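For completeness, a small sketch of the fixed-width split the question asks for, using Substring, together with the reason the original attempt printed 203. In C#, adding two char values with + promotes them to int and adds their character codes, so '2' + '0' + '1' + '8' is 50 + 48 + 49 + 56 = 203. The class and method names below are hypothetical.

```csharp
using System;

public static class DateParts
{
    // Splits a "yyyyMMdd" string into its year, month and day substrings.
    public static string[] Split(string yyyymmdd)
    {
        if (yyyymmdd == null || yyyymmdd.Length != 8)
            throw new ArgumentException("Expected exactly 8 characters (yyyyMMdd).", nameof(yyyymmdd));

        return new[]
        {
            yyyymmdd.Substring(0, 4), // year
            yyyymmdd.Substring(4, 2), // month
            yyyymmdd.Substring(6, 2)  // day
        };
    }

    // Reproduces the question's expression: char + char is int arithmetic
    // on character codes, not string concatenation.
    public static int SumOfFirstFourCharCodes(string s)
    {
        return s[0] + s[1] + s[2] + s[3];
    }
}
```

DateParts.Split("20180426") returns "2018", "04" and "26". To concatenate characters rather than add them, start from a string, for example myString.Substring(0, 4).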

Parse a String date like this: March 03/12/2016 to just 03/12/2016

I need to do a simple parse the strip out the actual word March (or whatever month it is) from a data saved to a string like this: "March 03/12/2016".
The ending results need to be a string such as: "03/12/2016".
I have been looking through date time formatters and I am not finding a simple method to strip out a month. I was thinking of just cutting the string down to count 11 characters from right to left and then just trimming out the rest but I feel like that is sloppy and there is probably a date format option out there that I'm just not finding.
Any Suggestions?
Just do this:
string input = "March 03/12/2016";
string output = input.Substring(input.IndexOf(' ') + 1);
Another approach:
string result = input.Split(' ')[1];
string input = "March 03/12/2016";
string output = input; // fall back to the whole input when there is no space
int index = input.IndexOf(' ');
if (index >= 0) // a space exists
{
output = input.Substring(index + 1);
}
You should first check whether the string actually contains a space. IndexOf returns -1 when it does not, and approaches such as Split(' ')[1] then throw an IndexOutOfRangeException.
Assuming that the month name is the proper English name for the month, you can use MMMM to parse it. Then you can format the date however you wish.
var date = "March 03/12/2016";
var parsedDate = DateTime.ParseExact(date, "MMMM MM/dd/yyyy", new CultureInfo("en-US"));
Console.WriteLine(parsedDate.ToString("MM/dd/yyyy"));
See in dotnetfiddle.net.
Bear in mind that if the month name and the numeric month disagree, e.g. October 03/12/2016, an exception will be thrown.
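If an exception is not what you want for mismatched or malformed input, the same format string can be used with DateTime.TryParseExact. The following is only a sketch: the names MonthPrefixedDate and TryStripMonthName are invented here, and it uses CultureInfo.InvariantCulture (whose month names are English) rather than "en-US" so that it does not depend on which cultures are installed.

```csharp
using System;
using System.Globalization;

public static class MonthPrefixedDate
{
    // Returns the date part formatted as MM/dd/yyyy, or null when the input
    // does not match "MMMM MM/dd/yyyy" or the two months disagree.
    public static string TryStripMonthName(string input)
    {
        DateTime parsed;
        bool ok = DateTime.TryParseExact(input, "MMMM MM/dd/yyyy",
            CultureInfo.InvariantCulture, DateTimeStyles.None, out parsed);

        return ok
            ? parsed.ToString("MM/dd/yyyy", CultureInfo.InvariantCulture)
            : null;
    }
}
```

TryStripMonthName("March 03/12/2016") gives "03/12/2016", while "October 03/12/2016" gives null instead of throwing.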

Extracting Date from File Where File Name Is Variable

I have a series of files that I am attempting to parse the date out of the file name. Here is an example of the files that I am currently trying to parse:
AC SCR063_6.8.15.xlsx
AC SCR064_6.22.15_REVISED.xlsx
AccentCare July 2015 Rent Report 06.26.15 Final.xlsx
AccentCare June 2015 Rent Report 05.26.15 Final.xlsx
In these files, the date will most likely always be in a format of dd.mm.yy or dd.mm.yyyy. I've tried to devise a regex expression to match these dates within the string and I've gotten as far as:
^(\d{1,2})\.(\d{1,2})\.(\d{2,4})$
But due to the variability in the file name and my limited knowledge of regex, I am not sure what else I need to do in order to get this regex to match all of these file name cases. Do I need to create an optional capture group before the date portion of the regex to match anything preceding it and an optional capture group after it as well to exclude the Final.xlsx or the _REVISED.xlsx etc?
EDIT: I should also note these filenames would also have the preceding path information within the string I would be evaluating, although I am sure I could just get the straight filename another way if it would be easier to evaluate the string that way.
EDIT 2: Desired output would be 6.8.15 or 06.26.15 etc, just the date portion that is in dd.mm.yy format. That way I could cast it to a date time within my application.
So the allowed formats are M.d.yyyy and M.d.yy (not dd.mm.yyyy as stated). I would use DateTime.TryParseExact, for example with this LINQ query:
var fileNames = new string[] { "AC SCR063_6.8.15.xlsx", "AC SCR064_6.22.15_REVISED.xlsx", "AccentCare July 2015 Rent Report 06.26.15 Final.xlsx", "AccentCare June 2015 Rent Report 05.26.15 Final.xlsx" };
string[] allowedFormats = { "M.d.yyyy", "M.d.yy" };
DateTime[] dates = fileNames
.Select(fn => Path.GetFileNameWithoutExtension(fn).Split(' ', '_'))
.Select(arr => arr.Select(s => s.TryGetDateTime(null, allowedFormats))
.FirstOrDefault(dt => dt.HasValue))
.Where(nullableDate => nullableDate.HasValue)
.Select(nullableDate => nullableDate.Value)
.ToArray();
which uses this handy extension method to parse strings to DateTime?:
public static DateTime? TryGetDateTime(this string item, DateTimeFormatInfo dfi, params string[] allowedFormats)
{
if (dfi == null) dfi = DateTimeFormatInfo.InvariantInfo;
DateTime dt;
bool success = DateTime.TryParseExact(item, allowedFormats, dfi, DateTimeStyles.None, out dt);
if (success) return dt;
return null;
}
Result is:
08.06.2015 00:00:00 System.DateTime
22.06.2015 00:00:00 System.DateTime
26.06.2015 00:00:00 System.DateTime
26.05.2015 00:00:00 System.DateTime
That roughly looks correct, but you have a start of line and end of line check in your regex (the ^ at the start and the $ at the end).
Try this: (\d{1,2})\.(\d{1,2})\.(\d{2,4})
This works with your examples :
[a-zA-Z\d\s]+(?:_|\s)(\d{1,2}\.\d{1,2}\.\d{2,4})
Demo here : https://regex101.com/r/hA6dQ3/1
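The regex and TryParseExact ideas from the answers above can also be combined. The sketch below is one possible way to do it, not the method of any of the answers: it finds date-like candidates with a regex and lets TryParseExact decide whether each candidate is a real date. FileNameDate and TryExtract are hypothetical names, and the lookarounds in the pattern are an addition made here.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text.RegularExpressions;

public static class FileNameDate
{
    private static readonly string[] Formats = { "M.d.yyyy", "M.d.yy" };

    // Returns the first valid M.d.yy / M.d.yyyy date in the file name, or null.
    public static DateTime? TryExtract(string path)
    {
        // Drop the directory part and the extension, e.g. ".xlsx".
        string name = Path.GetFileNameWithoutExtension(path);

        // The lookarounds stop a match from starting or ending
        // in the middle of a longer run of digits.
        const string pattern = @"(?<!\d)\d{1,2}\.\d{1,2}\.(?:\d{4}|\d{2})(?!\d)";

        foreach (Match m in Regex.Matches(name, pattern))
        {
            DateTime parsed;
            if (DateTime.TryParseExact(m.Value, Formats, CultureInfo.InvariantCulture,
                    DateTimeStyles.None, out parsed))
            {
                return parsed;
            }
        }
        return null;
    }
}
```

For "AC SCR063_6.8.15.xlsx" this returns 8 June 2015. A candidate such as 13.45.15 matches the regex but is rejected by TryParseExact.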

Regex.Replace - pattern for correcting date format

I am trying to format date entered by user.
Dates are provided in the following format:
d/M/yyyy (ie: 1/1/2012, but could be also 12/1/2012 or 1/12/2012)
however I need them converted to:
dd/MM/yyyy (ie: 01/01/2012)
I managed to do it in non-Regex way, like this:
string date = "1/1/2012";
if (date.IndexOf("/") == 1)
{
date = "0" + date;
}
if (date.Substring(4, 1) == "/")
{
date = date.Insert(3, "0");
}
I would really like to know how to do it with Regex.Replace, however, as it would probably be neater.
I tried different variations of the below:
string date = "1/1/2012";
date = Regex.Replace(date, @"\d{1}/", "0$&");
The above will work, but if the date is 12/1/2012, it will also make 102 out of 12. If I add ^ at the beginning of pattern I don't get the second number changed. I also tried combinations with [^|/] at the beginning, but also no luck. So at the moment it is either or.
Use word boundary \b which matches between a word character and a non-word character.
date = Regex.Replace(date, @"\b\d/", "0$&");
OR
date = Regex.Replace(date, @"\b(\d)/", "0$1/");
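If you're sure of the incoming format, I'd use DateTime.ParseExact instead, then use .ToString() to reformat the date:
// InvariantCulture keeps "/" as a literal slash; with a culture-specific provider it stands for that culture's date separator
DateTime dt = DateTime.ParseExact(input, "d/M/yyyy", CultureInfo.InvariantCulture);
string reformatted = dt.ToString("dd/MM/yyyy", CultureInfo.InvariantCulture);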
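To see why the word boundary fixes the "102" problem, here is the \b(\d)/ replacement wrapped in a small helper (the DatePadding and PadDayAndMonth names are made up for this sketch). \b only matches where a digit starts a number, so the 2 in 12/ is never treated as a lone digit.

```csharp
using System.Text.RegularExpressions;

public static class DatePadding
{
    // Prefixes "0" to every single digit that starts at a word boundary
    // and is immediately followed by "/".
    public static string PadDayAndMonth(string date)
    {
        return Regex.Replace(date, @"\b(\d)/", "0$1/");
    }
}
```

With this, "1/1/2012" becomes "01/01/2012", "12/1/2012" becomes "12/01/2012", "1/12/2012" becomes "01/12/2012", and "12/12/2012" is left unchanged.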

C# seconds in string format to TimeSpan

I'm having a bit of an issue with this.
What I want to do is take this string 27.0 and convert it to a timespan.
I tried every way I could think of in order to get it to work.
TimeSpan.Parse("27.0") I know it's a format issue but I'm not sure of the format to use.
I basically have 4 values
27.0
52.4
1:24.4
1:43.3
Is there an easy way to handle all these formats?
Thanks!
Sorry these are all seconds except the 1 is minute so 1 minute 24 seconds 4 milliseconds
There are two different approaches. The first is to use one of the TimeSpan.From...() methods, which convert numbers to a TimeSpan. For example, to convert the double 27 to a TimeSpan of 27 seconds, you use
var ts = TimeSpan.FromSeconds(27);
The only problem here is that these methods do not accept a string, so you first have to parse your string as a double. If you do that naively, you may or may not get what you wanted.
var ts = TimeSpan.FromSeconds(double.Parse("27.0"));
But if you run this on a system with a German locale, for example, you will get a TimeSpan of 4 minutes and 30 seconds. The reason is that in German the dot is not the decimal separator but the thousands separator, so the number is parsed as 270. To be safe you should also provide a NumberFormat. A better way would be:
var culture = new CultureInfo("en-US");
var tsc = TimeSpan.FromSeconds(double.Parse("27.0", culture.NumberFormat));
Now you get your 27 seconds. But this still only parses your first two strings correctly. The other two will not parse, because they cannot be converted to numbers. I mention this approach anyway so that you are aware of culture differences if you simply parse a number to a double and use TimeSpan.FromSeconds() and so on.
Now let's look at how you can parse every string. There are TimeSpan.Parse() and TimeSpan.ParseExact().
You must know that TimeSpan.Parse() uses culture-specific formatting. In a country where times are not separated with colons, TimeSpan.Parse() will fail. On top of that, TimeSpan.Parse() assumes a format of "hh:mm" at minimum, and the colon in this format is culture-sensitive. You could use the "en-US" culture once again, but that would not solve the problem because it does not accept the format "27.0".
That is why you must use the TimeSpan.ParseExact() method and provide the formats it should be able to parse. You should end up with something like this:
var culture = new CultureInfo("en-US");
var formats = new string[] {
@"s\.f",
@"ss\.f",
@"ss\.ff",
@"m\:ss\.f",
@"m\:ss\.ff",
@"mm\:ss\.ff"
};
foreach ( var str in new string[] { "27.0", "52.4", "1:24.4", "1:43.3" } ) {
var ts = TimeSpan.ParseExact(str, formats, culture.NumberFormat);
Console.WriteLine(ts.ToString());
}
Note that in this example I added a backslash to escape the dot and the colon. If you don't do this then the formatter itself treats this as a culture-sensitive separator. But what you want is exactly the colon or the dot.
The output of this code will be
00:00:27
00:00:52.4000000
00:01:24.4000000
00:01:43.3000000
Try something like this:
var timeString = "1:24.4";
var timeComponents = timeString.Split(':', '.').Reverse().ToList();
var milliseconds = timeComponents.Any() ? int.Parse(timeComponents[0]) : 0;
var seconds = timeComponents.Count() > 1 ? int.Parse(timeComponents[1]) : 0;
var minutes = timeComponents.Count() > 2 ? int.Parse(timeComponents[2]) : 0;
var timeSpan = new TimeSpan(0, 0, minutes, seconds, milliseconds);
This treats the digits after the dot literally as a number of milliseconds, so ".4" becomes 4 ms rather than 400 ms. You may want to pad that component with trailing '0's, as pointed out in the comments.
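One way around the padding issue is to read everything after the colon as decimal seconds instead of splitting on the dot. This is a swapped-in technique, not the code above, and it is only a sketch: the LapTime class and its Parse method are hypothetical, and it assumes the input is either "ss.f" or "m:ss.f" with a dot as the decimal separator.

```csharp
using System;
using System.Globalization;

public static class LapTime
{
    // Parses "27.0" or "1:24.4" into a TimeSpan.
    public static TimeSpan Parse(string text)
    {
        // Split off the optional minutes part: "1:24.4" -> "1" and "24.4".
        string[] parts = text.Split(':');
        if (parts.Length > 2)
            throw new FormatException("Expected \"ss.f\" or \"m:ss.f\".");

        int minutes = parts.Length == 2
            ? int.Parse(parts[0], CultureInfo.InvariantCulture)
            : 0;

        // decimal avoids binary floating-point rounding for values like 24.4;
        // the invariant culture makes "." the decimal separator everywhere.
        decimal seconds = decimal.Parse(parts[parts.Length - 1],
            NumberStyles.AllowDecimalPoint, CultureInfo.InvariantCulture);

        long ticks = minutes * TimeSpan.TicksPerMinute
                   + (long)(seconds * TimeSpan.TicksPerSecond);
        return TimeSpan.FromTicks(ticks);
    }
}
```

LapTime.Parse("1:24.4") then gives 1 minute, 24 seconds and 400 milliseconds, and LapTime.Parse("27.0") gives 27 seconds.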
