I'm trying to parse a date-formatted file name, e.g.
C:\Documents/<yyyy>\<MMM>\Example_CSV_<ddMM>.csv
and return today's filename.
So for the example above, I would return (for 9th August 2013),
C:\Documents\2013\Aug\Example_CSV_0908.csv
I wondered if Regex would work, but I'm just having a mental block as to how to approach it!
I can't just replace the xth to yth sections with the date, as the files I will be processing are stored in different folders all over the system (not my idea). All of the date codes will be contained in <> however, so as far as I'm aware, I couldn't do something like
Return DateTime.Today.ToString(RawFileName);
Plus I imagine it would have unintended consequences if a part of the ordinary filename could be interpreted as a date code!
If someone could give me a pointer in the right direction, that would be great. If you need a little bit more context, here is the class that will contain this method:
public class ImportSetting
{
public string ID { get; private set; }
public List<ImportMapping> Mappings { get; set; }
public string RawFileName { get; set; }
public string GetFileName()
{
string ToFormat = RawFileName; //e.g. C:\Documents/<yyyy>\<MMM>\Example_CSV_<ddMM>.csv
//Do some clever stuff.
return ToFormat; //C:\Documents\2013\Aug\Example_CSV_0908.csv
}
public int GetCSVColumn(string AttributeName) { return Mappings.First(x => x.Attribute == AttributeName).ColumnID; }
public ImportSetting(string Name)
{
ID = Name;
Mappings = new List<ImportMapping>();
}
}
Thank you very much for your help!
There is no need to replace anything in the text, as you can use the DateTime.ToString() method with a format string like this:
public string GetFileName(DateTime date)
{
string format = @"'C:\\Documents'\\yyyy\\MMM'\\Example_CSV_'ddMM'.csv'";
return date.ToString(format);
}
Call GetFileName with today's date:
Console.WriteLine(GetFileName(DateTime.Now));
Output:
C:\Documents\2013\Aug\Example_CSV_0908.csv
Anything that you don't want to be parsed as a date, put in single quotes ' to have it parsed as a string literal. A full list of the date format strings can be found here: http://msdn.microsoft.com/en-us/library/8kb3ddd4.aspx
var path = new Regex("<([dMy]+)>").Replace(pathFormat, o => DateTime.Now.ToString(o.Groups[1].Value));
Nb: Add all the possible letters/symbols that could occur within the square brackets.
Nb2: This will however not restrict weird DateTime strings. If you want to ensure a uniform format, you could make a more restrictive Regex like so:
var path = new Regex("<(ddMM|MMM|yyyy)>").Replace(pathFormat, o => DateTime.Now.ToString(o.Groups[1].Value));
Edit: Gotta love one-liners :)
What you could do (although I can't imagine this is a real scenario, but that might be my lack of imagination) is use the following regex:
<([fdDmMyYs]+?)>
This will give you any matches within the < and > symbols, as short as possible; in my testing on the example it returned <yyyy>, <MMM> and <ddMM>.
Then strip the first and last symbol, or use some fancier regex functions to do this for you.
Then just use the DateTime.Now.ToString(RegexMatchWithout<> here)
And replace the match with the output.
So a code example (untested, but I'm feeling confident ;-)) would be:
public string GetFileName(string fileName)
{
Regex regex = new Regex(@"<([fdDmMyYs]+?)>");
foreach(Match m in regex.Matches(fileName))
{
fileName = fileName.Replace(m.Value, DateTime.Now.ToString(m.Value.Substring(1, m.Value.Length - 2)));
}
return fileName;
}
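The regex approaches above can be combined into a single Regex.Replace call. The following is only a sketch under a few assumptions: the class name DateTokens and the method Expand are made up for illustration, the date is passed in rather than read from DateTime.Now so the result is reproducible, and the invariant culture is used so that MMM always yields English month abbreviations.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original ImportSetting class.
public static class DateTokens
{
    // Matches <...> groups made up only of common date/time format letters,
    // so ordinary parts of the file name are never touched.
    private static readonly Regex Token = new Regex(@"<([dMyHhmsf]+)>");

    public static string Expand(string template, DateTime date)
    {
        return Token.Replace(template, m =>
        {
            string format = m.Groups[1].Value;
            // A single letter such as "d" would be read as a standard format
            // string; the "%" prefix forces the custom-format meaning.
            if (format.Length == 1) format = "%" + format;
            return date.ToString(format, CultureInfo.InvariantCulture);
        });
    }
}
```

Calling DateTokens.Expand(RawFileName, DateTime.Today) inside GetFileName would then fill in every token in one pass, while text outside the angle brackets is left alone.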
I have the following string
string a = @"\\server\MainDirectory\SubDirectoryA\SubdirectoryB\SubdirectoryC\Test.jpg";
I'm trying to remove part of the string so in the end I want to be left with
string a = @"\\server\MainDirectory\SubDirectoryA\SubdirectoryB";
So currently I'm doing
string b = a.Remove(a.LastIndexOf('\\'));
string c = b.Remove(b.LastIndexOf('\\'));
Console.WriteLine(c);
which gives me the correct result. I was wondering if there is a better way of doing this, because I have to do it in a fair few places.
Note: the length of SubdirectoryC is unknown, as it is made up of the numbers/letters a user inputs.
There is Path.GetDirectoryName:
string a = @"\\server\MainDirectory\SubDirectoryA\SubdirectoryB\SubdirectoryC\Test.jpg";
string b = Path.GetDirectoryName(Path.GetDirectoryName(a));
As explained on MSDN, it also works if you pass a directory:
....passing the returned path back into the GetDirectoryName method will
result in the truncation of one folder level per subsequent call on
the result string
Of course, this is only safe if the path has at least two directory levels.
Heyho,
If you just want to get rid of the last part, you can use:
var parentDirectory = Directory.GetParent(Path.GetDirectoryName(path));
https://msdn.microsoft.com/de-de/library/system.io.directory.getparent(v=vs.110).aspx
An alternative answer using Linq:
var b = string.Join("\\", a.Split(new string[] { "\\" }, StringSplitOptions.None)
.Reverse().Skip(2).Reverse());
Some alternatives
string a = @"\\server\MainDirectory\SubDirectoryA\SubdirectoryB\SubdirectoryC\Test.jpg";
var b = Path.GetFullPath(a + @"\..\..");
var c = a.Remove(a.LastIndexOf('\\', a.LastIndexOf('\\') - 1));
but I do find this kind of string extension generally useful:
static string beforeLast(this string str, string delimiter)
{
int i = str.LastIndexOf(delimiter);
if (i < 0) return str;
return str.Remove(i);
}
For such repeated tasks, a good solution is often to write an extension method, e.g.
public static class Extensions
{
public static string ChopPath(this string path)
{
// chopping code here
}
}
Which you then can use anywhere you need it:
var chopped = a.ChopPath();
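One possible body for such an extension method is sketched below. It is an assumption-laden illustration rather than the answerer's code: the levels and separator parameters are invented here, and the method works purely on the string (it never touches the file system), so it behaves the same regardless of platform.

```csharp
using System;

public static class PathExtensions
{
    // Removes the last `levels` segments of a separator-delimited path.
    // Stops early if the path runs out of separators.
    public static string ChopPath(this string path, int levels = 1, char separator = '\\')
    {
        string result = path;
        for (int i = 0; i < levels; i++)
        {
            int cut = result.LastIndexOf(separator);
            if (cut < 0) break;          // nothing left to chop
            result = result.Remove(cut); // drop the separator and everything after it
        }
        return result;
    }
}
```

With the path from the question, a.ChopPath(2) drops both Test.jpg and SubdirectoryC in one call.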
Is there a format string for the C# string.Format method that picks a substring from the corresponding argument? Like so:
var lang1 = "EN";
var lang2 = "FR";
var shortFormat = "Foo-{0:0-0}.pdf";
var longFormat = "Foo-{0:0-1}.pdf";
string.Format(shortFormat, lang1) // Foo-E.pdf
string.Format(shortFormat, lang2) // Foo-F.pdf
string.Format(longFormat, lang1) // Foo-EN.pdf
string.Format(longFormat, lang2) // Foo-FR.pdf
To anticipate a few comments: Yes, I know the Substring method. I have also read that string.Format is slower than a simple Substring. The example above is heavily simplified. Imagine that the string.Format statement resides in one place, while the lang1/lang2 argument is an input from another place and the shortFormat/longFormat is defined in a resx file.
That is, in the place where the format is to be defined we don't know anything about the value being formatted (lang1/lang2 in the example) nor do we have any means to execute C# code. Hence we can't call any method such as Substring on the value. At the place where the formatting code runs, in turn, we take the format as a parameter, so we can't simply perform a Substring on the value because we don't know whether the format requires it or not (except if we inspect the format).
No, the string.Format does not have this feature, which is better explained here: Can maximum number of characters be defined in C# format strings like in C printf?
If you don't want to use Substring I would create an extension class for string like this: http://msdn.microsoft.com/en-us/library/bb311042.aspx
namespace CustomExtensions
{
public static class StringExtension
{
public static string ShortFormat(this string str)
{
// manipulate and return str here
}
public static string LongFormat(this string str)
{
// manipulate and return str here
}
}
}
XSLT formatting can be an option: the user can specify almost everything in a configuration file and even execute custom C# code in your domain if required.
Please also consider that format changes can be restricted to a relatively small set of actions: crop, pad, or insert one or two things at certain positions. Each one can be implemented as an individual function with its own parameters.
There are two ways to provide custom formatting. You can either implement IFormattable on a custom type to control how that type is always formatted, or implement IFormatProvider to override how other types are formatted in specific cases.
In your case I would suggest creating a new type to encapsulate how your software deals with language codes:
public struct LanguageCode : IFormattable {
public readonly string Code;
public LanguageCode(string code) {
Code = code;
}
public override string ToString()
=> this.ToString("L", CultureInfo.CurrentCulture);
public string ToString(string format)
=> this.ToString(format, CultureInfo.CurrentCulture);
public string ToString(string format, IFormatProvider provider){
if (String.IsNullOrEmpty(format))
format = "L";
if (provider == null)
provider = CultureInfo.CurrentCulture;
switch (format.ToUpperInvariant()){
case "L": // Long
return Code.ToString(provider);
case "S": // Short
return Code.Substring(0,1).ToString(provider);
default:
throw new FormatException($"The {format} format string is not supported.");
}
}
public static implicit operator LanguageCode(string code)
=> new LanguageCode(code);
public static implicit operator string(LanguageCode language)
=> language.Code;
}
Then from your example:
var lang1 = (LanguageCode)"EN";
LanguageCode lang2 = "FR";
var shortFormat = "Foo-{0:S}.pdf";
var longFormat = "Foo-{0:L}.pdf";
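The other route mentioned above, implementing IFormatProvider, keeps the argument a plain string and lets the format string alone decide what is cut. The sketch below is an illustration only: SubstringFormatter is a made-up name, and the "start-end" syntax (zero-based, inclusive) is an assumption modelled on the {0:0-1} notation from the question, not something string.Format understands by itself.

```csharp
using System;
using System.Globalization;

// Hypothetical formatter: understands "{0:start-end}" for string arguments.
public sealed class SubstringFormatter : IFormatProvider, ICustomFormatter
{
    public object GetFormat(Type formatType)
    {
        // string.Format asks for an ICustomFormatter; hand back this instance.
        return formatType == typeof(ICustomFormatter) ? this : null;
    }

    public string Format(string format, object arg, IFormatProvider formatProvider)
    {
        string text = arg as string;
        if (text != null && !string.IsNullOrEmpty(format))
        {
            string[] bounds = format.Split('-');
            int start, end;
            if (bounds.Length == 2
                && int.TryParse(bounds[0], out start)
                && int.TryParse(bounds[1], out end)
                && start >= 0 && start <= end && start < text.Length)
            {
                // Clamp the inclusive end index to the last character.
                int length = Math.Min(end, text.Length - 1) - start + 1;
                return text.Substring(start, length);
            }
        }
        // Anything else falls back to the argument's normal formatting.
        IFormattable formattable = arg as IFormattable;
        if (formattable != null) return formattable.ToString(format, CultureInfo.CurrentCulture);
        return arg == null ? string.Empty : arg.ToString();
    }
}
```

The formatter is passed as the first argument, for example string.Format(new SubstringFormatter(), shortFormat, lang1) with shortFormat set to "Foo-{0:0-0}.pdf". The trade-off compared with the LanguageCode type is that every call site has to remember to supply the provider.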
I am trying the following stemming class :
static class StemmerSteps
{
public static string stepSufixremover(this string str, string suffex)
{
if (str.EndsWith(suffex))
{
................
}
return str;
}
public static string stepPrefixemover(this string str, string prefix)
{
if (str.StartsWith(prefix))
{
.....................
}
return str;
}
}
This class works with a single prefix or suffix. Is there a way to pass a list of prefixes or suffixes and compare each of them against str? Any help is really appreciated.
Instead of creating your own class from scratch (unless this is homework), I would definitely use an existing library. This answer provides an example of code that implements the Porter Stemming Algorithm:
https://stackoverflow.com/questions/7611455/how-to-perform-stemming-in-c
Put your suffix/prefixes in a collection (like a List<>), and loop through and apply each possible one. This collection would need to be passed into the method.
List<string> suffixes = ...;
foreach (string suffix in suffixes)
if (str.EndsWith(suffix))
str = str.Remove(str.Length - suffix.Length, suffix.Length);
EDIT
Considering your comment:
"just want to look if the string starts-/endswith any of the passed strings"
may be something like this can fit your needs:
public static string stepSufixremover(this string str, IEnumerable<string> suffex)
{
string suf = suffex.FirstOrDefault(x => str.EndsWith(x)); // first match; SingleOrDefault would throw if several suffixes match
if(!string.IsNullOrEmpty(suf))
{
str = str.Remove(str.Length - suf.Length, suf.Length);
}
return str;
}
If you use this like:
"hello".stepSufixremover(new string[]{"lo","l"}).Dump();
it produces:
hel
The simplest code would involve regular expressions.
For example, this would identify some English suffixes:
'^(.*?)(ing|ly|ed|ious|ies|ive|es|s|ment)?$'
One problem is that stemming is not as accurate as lemmatization. Lemmatization would require POS tagging for accuracy. For example, you don't want to add an -ing suffix to dove if it's a noun.
Another problem is that some suffixes also require prefixes. For example, you must add en- to -rich- to add a -ment suffix in en-rich-ment -- unlike a root like -govern- where you can add the suffix without any prefix.
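For reference, the suffix pattern above can be applied in C# roughly as follows. This is a minimal sketch with made-up names (SuffixStripper, Strip); it only strips whatever the pattern recognises and inherits all the accuracy problems just described.

```csharp
using System.Text.RegularExpressions;

public static class SuffixStripper
{
    // Lazy stem in group 1, optional known suffix in group 2, anchored to the end.
    private static readonly Regex Suffix =
        new Regex(@"^(.*?)(ing|ly|ed|ious|ies|ive|es|s|ment)?$");

    public static string Strip(string word)
    {
        Match m = Suffix.Match(word);
        // Group 1 is the word without the recognised suffix (or the whole word).
        return m.Success ? m.Groups[1].Value : word;
    }
}
```

Because the stem group is lazy, the regex engine picks the earliest position from which one of the listed suffixes reaches the end of the word, so a word that ends in none of them is returned unchanged.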
We want to show some JSON to a user who is testing our application. So we call our REST service in the ASP.NET code behind file and return a string, which holds a lot of JSON.
We then put it in a PRE element in the page, call beautify to create nice readable JSON and all is good: sort of human readable content is shown.
Good but for one thing: all the dates are shown in the normal JSON format like this "/Date(1319266795390+0800)/"
What I want to do is replace those JSON dates with 'normal' dates, in the JSON (C#) string, so in the code behind that is, before I add the string to the PRE element.
I was thinking about some regex, but I couldn't figure out how...
I've been dealing with dates in JSON strings for some time now. There is no standard way to represent them, which is why there are so many different ways to do it. It might have been better if the JSON specification had defined a standard format for dates in the first place.
Microsoft does it its own way, counting the milliseconds since 1970 in UTC, which gives something like "/Date(1319266795390+0800)/".
We've been converting that string to ISO 8601 format ever since, using regular expressions on top of the ASP.NET JavaScriptSerializer output. It is a widely adopted standard, human readable, and the format browsers use when serializing a Date to JSON. Here's how:
static readonly long DATE1970_TICKS = new DateTime(1970, 1, 1, 0, 0, 0, 0, DateTimeKind.Utc).Ticks;
static readonly Regex DATE_SERIALIZATION_REGEX = new Regex(@"\\/Date\((?<ticks>-?\d+)\)\\/", RegexOptions.Compiled);
static string ISO8601Serialization(string input)
{
return DATE_SERIALIZATION_REGEX.Replace(input, match =>
{
var ticks = long.Parse(match.Groups["ticks"].Value) * 10000;
return new DateTime(ticks + DATE1970_TICKS).ToLocalTime().ToString("yyyy-MM-ddTHH:mm:ss.fff");
});
}
You can easily change the format to suit your needs; see the MSDN article on custom date and time format strings for the available specifiers.
Here's how it's used:
JavaScriptSerializer ser = new JavaScriptSerializer();
var JsonSrt = ISO8601Serialization(ser.Serialize(DateTime.Now)); // "\"2012-05-09T14:51:38.333\""
Update:
There's an alternative to tweak the JSON string returned from the server in JavaScript to more readable form using Regex:
var str = "/Date(1319266795390+0800)/";
str.replace(/\/Date\((\d+)\+\d+\)\//, function (str, date) {
return new Date(Number(date)).toString();
});
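The number you need is already in the string shown in the question, and the JavaScript Date object can turn it into a readable version. Note that it has to be constructed with new: Date(1319266795390+0800) called as a plain function ignores its argument and returns the current time as a string, whereas new Date(1319266795390) yields the date the server sent.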
To remove the forward slash from the string you could use the replace function with a regular expression: "/Date(1319266795390+0800)/".replace(/\//g, '').
You can use this:
string date = "/Date(1319266795390+0800)/";
string regex = @"/Date\((.*?)\+(.*?)\)/";
Match match = Regex.Match(date, regex);
DateTime d = new DateTime(1970, 01, 01).AddMilliseconds(long.Parse(match.Result("$1")));
Suppose the class you want to serialize looks like this:
public class Something
{
public int ID;
public string Name;
public DateTime Date;
}
change it to:
public class Something
{
public int ID;
public string Name;
public DateTime Date;
public string HumanReadableDate { get { return Date.ToLongDateString(); } }
}
or, if you want that extra property to be shown only in the test environment:
public class Something
{
public int ID;
public string Name;
public DateTime Date;
#if DEBUG
public string HumanReadableDate { get { return Date.ToLongDateString(); } }
#endif
}
also, instead of .ToLongDateString() you can use .ToString("yyyy-MM-dd HH:mm") or any other format
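Use a regex something like this (with RegexOptions.IgnorePatternWhitespace, since the pattern is spread over several lines and contains spaces for readability):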
(?<= /Date\( )
(?<ticks>[0-9]+)
((?<zonesign>[+-])
(?<zonehour>[0-9]{2})
(?<zoneminutes>[0-9]{2})
)?
(?= \)/ )
This will match the part inside the parentheses of /Date(1319266795390+0800)/. You can then call Regex.Replace on the whole JSON string to replace the numbers with a nicely formatted DateTime:
Use the Match object you get in the match evaluator delegate to extract the ticks, zonesign, zonehour and zoneminutes parts, and convert them to integers.
Then convert the JavaScript ticks (milliseconds) to .NET ticks (multiply by 10000), add the ticks of 1 January 1970, construct the .NET DateTime from the result, and add/subtract the hours and minutes for the time zone.
Convert the DateTime to a string and return it as the replacement.
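The steps above can be sketched as follows. Several things here are assumptions rather than the answerer's code: the names JsonDates and MakeReadable are invented, the pattern is a single-line variant that matches the whole /Date(...)/ token (with or without the escaping backslashes) instead of using lookarounds, and DateTimeOffset.FromUnixTimeMilliseconds (available from .NET Framework 4.6) stands in for the manual tick arithmetic.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class JsonDates
{
    // Optional backslashes cover both "\/Date(...)\/" and "/Date(...)/".
    private static readonly Regex MsDate = new Regex(
        @"\\?/Date\((?<ms>-?\d+)(?:(?<sign>[+-])(?<hh>\d{2})(?<mm>\d{2}))?\)\\?/");

    public static string MakeReadable(string json)
    {
        return MsDate.Replace(json, m =>
        {
            long ms = long.Parse(m.Groups["ms"].Value, CultureInfo.InvariantCulture);
            TimeSpan offset = TimeSpan.Zero;
            if (m.Groups["sign"].Success)
            {
                offset = new TimeSpan(
                    int.Parse(m.Groups["hh"].Value, CultureInfo.InvariantCulture),
                    int.Parse(m.Groups["mm"].Value, CultureInfo.InvariantCulture), 0);
                if (m.Groups["sign"].Value == "-") offset = offset.Negate();
            }
            // The number is milliseconds since 1970 in UTC; the offset only
            // shifts how it is displayed. Offsets beyond +/-14:00 would throw.
            return DateTimeOffset.FromUnixTimeMilliseconds(ms)
                .ToOffset(offset)
                .ToString("yyyy-MM-dd'T'HH:mm:ss.fffzzz", CultureInfo.InvariantCulture);
        });
    }
}
```

Running the whole JSON string through JsonDates.MakeReadable before writing it into the PRE element replaces every date token in place and leaves the rest of the text untouched.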
If your JSON is a serialised representation of a .NET class, maybe you could use the DataContractJsonSerializer to deserialise it on the server, or perhaps you could just define a stub class for your JSON object if you don't need a generic solution to handle multiple datasets:
string json = "{\"Test\": \"This is the content\"}";
DataContractJsonSerializer ds = new DataContractJsonSerializer(typeof(TestJson));
var deserialisedContent = ds.ReadObject(new MemoryStream(Encoding.ASCII.GetBytes(json)));
foreach (var field in typeof (TestJson).GetFields())
{
Console.WriteLine("{0}:{1}", field.Name, field.GetValue(deserialisedContent));
}
...
[DataContract]
private class TestJson
{
[DataMember]
public string Test;
}
Use Newtonsoft.JSON. You can provide your own serializers per type, and serialize dates however you want.
http://james.newtonking.com/projects/json-net.aspx
Make a string property (for example, for the DateOfBirth I am defining here) and return your DateTime variable as:
public string DateOfBirthString
{
get { return DateOfBirth.ToUniversalTime().ToString("yyyy-MM-dd HH:mm:ss"); }
set { DateOfBirth = string.IsNullOrEmpty(value) ? new DateTime(1900, 1, 1) : Convert.ToDateTime(value); }
}
Because this returns a string, the value will look the same on the client side; the setter also takes a date-time string from the user and converts it.
string input = [yourjsonstring];
MatchEvaluator me = new MatchEvaluator(MTListServicePage.MatchDate);
string json = Regex.Replace(input, "\\\\/Date[(](-?\\d+)[)]\\\\/", me, RegexOptions.None);
I am using C# 2.0 and I have got below type of strings:
string id = "tcm:481-191820"; or "tcm:481-191820-32"; or "tcm:481-191820-8"; or "tcm:481-191820-128";
The last part of the string (-32, -8, -128) doesn't matter; whatever the string is, it should produce the result below.
Now I need to write a function that takes one of the above strings as input and returns "tcm:0-481-1", something like this:
public static string GetPublicationID(string id)
{
//this function will return as below output
return "tcm:0-481-1";
}
Please suggest!!
If final "-1" is static you could use:
public static string GetPublicationID(string id)
{
int a = 1 + id.IndexOf(':');
string first = id.Substring(0, a);
string second = id.Substring(a, id.IndexOf('-') - a);
return String.Format("{0}0-{1}-1", first, second);
}
or if "-1" is first part of next token, try this
public static string GetPublicationID(string id)
{
int a = 1 + id.IndexOf(':');
string first = id.Substring(0, a);
string second = id.Substring(a, id.IndexOf('-') - a + 2);
return String.Format("{0}0-{1}", first, second);
}
This syntax works even for different length patterns, assuming that your string is
first_part:second_part-anything_else
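The same assumption about the input shape can be made explicit with a small check, so that malformed IDs fail loudly instead of producing an odd substring. This is only a sketch: the PublicationIds class name is invented, and the trailing "-1" is treated as a fixed suffix, which is the first of the two readings above.

```csharp
using System;

public static class PublicationIds
{
    public static string GetPublicationID(string id)
    {
        int colon = id.IndexOf(':');
        int dash = id.IndexOf('-', colon + 1);
        if (colon < 0 || dash < 0)
            throw new FormatException("Expected 'prefix:number-rest', got: " + id);

        string prefix = id.Substring(0, colon);                         // e.g. "tcm"
        string publication = id.Substring(colon + 1, dash - colon - 1); // e.g. "481"
        // Assumption: the final "-1" is constant for every input.
        return prefix + ":0-" + publication + "-1";
    }
}
```

It uses only IndexOf and Substring, so it should also compile under C# 2.0.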
All you need is:
string.Format("{0}0-{1}", id.Substring(0,4), id.Substring(4,5));
This just uses substring to get the first four characters and then the next five and put them into the format with the 0- in there.
This does assume that your format is a fixed number of characters in each position (which it is in your example). If the string might be abcd:4812... then you will have to modify it slightly to pick up the right length of strings. See Marco's answer for that technique. I'd advise using his if you need the variable length and mine if the lengths stay the same.
Also as an additional note your original function of returning a static string does work for all of those examples you provided. I have assumed there are other numbers visible but if it is only the suffix that changes then you could happily use a static string (at which point declaring a constant or something rather than using a method would probably work better).
Obligatory Regular Expression Answer:
using System.Text.RegularExpressions;
public static string GetPublicationID(string id)
{
Match m = Regex.Match(id, @"tcm:(\d+-\d)");
if(m.Success)
return string.Format("tcm:0-{0}", m.Groups[1].Value);
else
return string.Empty;
}
Regex regxMatch = new Regex("(?<prefix>tcm:)(?<id>\\d+-\\d)(?<suffix>.)*",RegexOptions.Singleline|RegexOptions.Compiled);
string regxReplace = "${prefix}0-${id}";
string GetPublicationID(string input) {
return regxMatch.Replace(input, regxReplace);
}
string test = "tcm:481-191820-128";
string result = GetPublicationID(test);
//result: tcm:0-481-1