What's a good way of doing string templating in .NET?

I need to send email notifications to users and I need to allow the admin to provide a template for the message body (and possibly headers, too).
I'd like something like string.Format that allows me to give named replacement strings, so the template can look like this:
Dear {User},
Your job finished at {FinishTime} and your file is available for download at {FileURL}.
Regards,
--
{Signature}
What's the simplest way for me to do that?

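Here is the version for those of you who can use C# 6 or later:
// add $ at start to mark string as template
var template = $"Your job finished at {FinishTime} and your file is available for download at {FileURL}.";
In a line - this is now a fully supported language feature (string interpolation). Note that the placeholders are resolved at compile time, so this only helps when the template lives in your source code rather than being supplied at runtime.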

You can use the "string.Format" method:
var user = GetUser();
var finishTime = GetFinishTime();
var fileUrl = GetFileUrl();
var signature = GetSignature();
string msg =
@"Dear {0},
Your job finished at {1} and your file is available for download at {2}.
Regards,
--
{3}";
msg = string.Format(msg, user, finishTime, fileUrl, signature);
It allows you to change the content in the future and is friendly for localization.

Use a templating engine. StringTemplate is one of those, and there are many.
Example:
using Antlr.StringTemplate;
using Antlr.StringTemplate.Language;
StringTemplate hello = new StringTemplate("Hello, $name$", typeof(DefaultTemplateLexer));
hello.SetAttribute("name", "World");
Console.Out.WriteLine(hello.ToString());

I wrote a pretty simple library, SmartFormat, which meets all your requirements. It is focused on composing "natural language" text, and is great for generating data from lists, or applying conditional logic.
The syntax is extremely similar to String.Format, and is very simple and easy to learn and use. Here's an example of the syntax from the documentation:
Smart.Format("{Name}'s friends: {Friends:{Name}|, |, and}", user)
// Result: "Scott's friends: Michael, Jim, Pam, and Dwight"
The library has great error-handling options (ignore errors, output errors, throw errors) and is open source and easily extensible, so you can also enhance it with additional features too.

Building on Benjamin Gruenbaum's answer, in C# version 6 you can add a @ with the $ and pretty much use your code as it is, e.g.:
var text = $@"Dear {User},
Your job finished at {FinishTime} and your file is available for download at {FileURL}.
Regards,
--
{Signature}
";
The $ is for string interpolation: https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/tokens/interpolated
The @ is the verbatim identifier: https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/tokens/verbatim
...and you can use these in conjunction.
:o)

A very simple regex-based solution. Supports \n-style single character escape sequences and {Name}-style named variables.
Source
class Template
{
/// <summary>Map of replacements for characters prefixed with a backward slash</summary>
private static readonly Dictionary<char, string> EscapeChars
= new Dictionary<char, string>
{
['r'] = "\r",
['n'] = "\n",
['\\'] = "\\",
['{'] = "{",
};
/// <summary>Pre-compiled regular expression used during the rendering process</summary>
private static readonly Regex RenderExpr = new Regex(@"\\.|{([a-z0-9_.\-]+)}",
RegexOptions.IgnoreCase | RegexOptions.Compiled);
/// <summary>Template string associated with the instance</summary>
public string TemplateString { get; }
/// <summary>Create a new instance with the specified template string</summary>
/// <param name="TemplateString">Template string associated with the instance</param>
public Template(string TemplateString)
{
if (TemplateString == null) {
throw new ArgumentNullException(nameof(TemplateString));
}
this.TemplateString = TemplateString;
}
/// <summary>Render the template using the supplied variable values</summary>
/// <param name="Variables">Variables that can be substituted in the template string</param>
/// <returns>The rendered template string</returns>
public string Render(Dictionary<string, object> Variables)
{
return Render(this.TemplateString, Variables);
}
/// <summary>Render the supplied template string using the supplied variable values</summary>
/// <param name="TemplateString">The template string to render</param>
/// <param name="Variables">Variables that can be substituted in the template string</param>
/// <returns>The rendered template string</returns>
public static string Render(string TemplateString, Dictionary<string, object> Variables)
{
if (TemplateString == null) {
throw new ArgumentNullException(nameof(TemplateString));
}
return RenderExpr.Replace(TemplateString, Match => {
switch (Match.Value[0]) {
case '\\':
if (EscapeChars.ContainsKey(Match.Value[1])) {
return EscapeChars[Match.Value[1]];
}
break;
case '{':
if (Variables.ContainsKey(Match.Groups[1].Value)) {
return Variables[Match.Groups[1].Value].ToString();
}
break;
}
return string.Empty;
});
}
}
Usage
var tplStr1 = @"Hello {Name},\nNice to meet you!";
var tplStr2 = @"This {Type} \{contains} \\ some things \\n that shouldn't be rendered";
var variableValues = new Dictionary<string, object>
{
["Name"] = "Bob",
["Type"] = "string",
};
Console.Write(Template.Render(tplStr1, variableValues));
// Hello Bob,
// Nice to meet you!
var template = new Template(tplStr2);
Console.Write(template.Render(variableValues));
// This string {contains} \ some things \n that shouldn't be rendered
Notes
I've only defined \n, \r, \\ and \{ escape sequences and hard-coded them. You could easily add more or make them definable by the consumer.
I've made the pattern match variable names case-insensitively, as things like this are often presented to end-users/non-programmers and I don't personally think that case-sensitivity makes sense in that use-case - it's just one more thing they can get wrong and phone you up to complain about (plus in general if you think you need case sensitive symbol names what you really need are better symbol names). Note that the dictionary lookup still uses the dictionary's own comparer, so create it with StringComparer.OrdinalIgnoreCase if {name} and {Name} should resolve to the same value. To restrict the pattern itself, remove the RegexOptions.IgnoreCase flag and adjust the character class.
I strip invalid variable names and escape sequences from the result string. To leave them intact, return Match.Value instead of the empty string at the end of the Regex.Replace callback. You could also throw an exception.
I've used {var} syntax, but this may interfere with the native interpolated string syntax. If you want to define templates in string literals in your code, it might be advisable to change the variable delimiters to e.g. %var% (regex \\.|%([a-z0-9_.\-]+)%) or some other syntax of your choosing which is more appropriate to the use case.

You could use string.Replace(...), possibly in a foreach loop over all the keywords. If there are only a few keywords you can have them on a line like this:
string myString = template.Replace("FirstName", "John").Replace("LastName", "Smith").Replace("FinishTime", DateTime.Now.ToShortDateString());
Or you could use Regex.Replace(...), if you need something a bit more powerful and with more options.
Read this article on CodeProject to see which string replacement option is fastest for you.
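The foreach variant mentioned above might look like the following minimal sketch. The TemplateFiller class, the Fill method and the {Name}-style delimiters are assumptions made for illustration; they are not part of any library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the framework.
public static class TemplateFiller
{
    // Replaces every "{Key}" token in the template with the matching value.
    public static string Fill(string template, IDictionary<string, string> values)
    {
        if (template == null) throw new ArgumentNullException(nameof(template));
        if (values == null) throw new ArgumentNullException(nameof(values));

        foreach (KeyValuePair<string, string> pair in values)
        {
            // Wrapping the key in braces avoids touching ordinary words such as "User".
            template = template.Replace("{" + pair.Key + "}", pair.Value);
        }
        // Tokens without a matching key are left in the text unchanged.
        return template;
    }
}
```

Because each Replace call scans the whole string, a value that itself contains a token such as {User} may be substituted again by a later iteration. A single-pass Regex.Replace avoids that if it matters for your templates.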

In case someone is searching for an alternative -- an actual .NET one:
https://github.com/crozone/FormatWith | https://www.nuget.org/packages/FormatWith
A nice simple extendable solution. Thank you crozone!
So using the string extension provided in FormatWith here are two examples:
static string emailTemplate = @"
Dear {User},
Your job finished at {FinishTime} and your file is available for download at {FileURL}.
--
{Signature}
";
//////////////////////////////////
/// 1. Use a dictionary that has the tokens as keys with values for the replacement
//////////////////////////////////
public void TestUsingDictionary()
{
var emailDictionary = new Dictionary<string, object>()
{
{ "User", "Simon" },
{ "FinishTime", DateTime.Now },
{ "FileURL", new Uri("http://example.com/dictionary") },
{ "Signature", $"Sincerely,{Environment.NewLine}Admin" }
};
var emailBody = emailTemplate.FormatWith(emailDictionary);
System.Console.WriteLine(emailBody);
}
//////////////////////////////////
/// 2. Use a poco with properties that match the replacement tokens
//////////////////////////////////
public class MessageValues
{
public string User { get; set; } = "Simon";
public DateTime FinishTime { get; set; } = DateTime.Now;
public Uri FileURL { get; set; } = new Uri("http://example.com");
public string Signature { get; set; } = $"Sincerely,{Environment.NewLine}Admin";
}
public void TestUsingPoco()
{
var emailBody = emailTemplate.FormatWith(new MessageValues());
System.Console.WriteLine(emailBody);
}
It allows formatting the replacement inline as well. For example, try changing {FinishTime} to {FinishTime:HH:mm:ss} in emailTemplate.

Actually, you can use XSLT.
You create a simple XML template:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl">
<xsl:template match="EMAIL">
<p>
Dear <xsl:value-of select="USERNAME" />,
Your job finished at <xsl:value-of select="FINISH_TIME" /> and your file is available for download at <xsl:value-of select="FILE_URL" />.
Regards,
--
<xsl:value-of select="SIGNATURE" />
</p>
</xsl:template>
</xsl:stylesheet>
Then create an XmlDocument to perform the transformation against:
XmlDocument xmlDoc = new XmlDocument();
XmlNode xmlNode = xmlDoc.CreateNode(XmlNodeType.Element, "EMAIL", null);
XmlElement xmlElement = xmlDoc.CreateElement("USERNAME");
xmlElement.InnerText = username; // InnerText escapes characters such as & and <
xmlNode.AppendChild(xmlElement); // repeat the same thing for all the required fields
xmlDoc.AppendChild(xmlNode);
After that, apply the transformation:
XPathNavigator xPathNavigator = xmlDoc.DocumentElement.CreateNavigator();
StringBuilder sb = new StringBuilder();
StringWriter sw = new StringWriter(sb);
XmlTextWriter xmlWriter = new XmlTextWriter(sw);
your_xslt_transformation.Transform(xPathNavigator, null, xmlWriter);
return sb.ToString();

Implementing your own custom formatter might be a good idea.
Here's how you do it. First, create a type that defines the stuff you want to inject into your message. Note: I'm only going to illustrate this with the User part of your template...
class JobDetails
{
public string User
{
get;
set;
}
}
Next, implement a simple custom formatter...
class ExampleFormatter : IFormatProvider, ICustomFormatter
{
public object GetFormat(Type formatType)
{
return this;
}
public string Format(string format, object arg, IFormatProvider formatProvider)
{
// make this more robust
JobDetails job = (JobDetails)arg;
switch (format)
{
case "User":
{
return job.User;
}
default:
{
// this should be replaced with logic to cover the other formats you need
return String.Empty;
}
}
}
}
Finally, use it like this...
string template = "Dear {0:User}. Your job finished...";
JobDetails job = new JobDetails()
{
User = "Martin Peck"
};
string message = string.Format(new ExampleFormatter(), template, job);
... which will generate the text "Dear Martin Peck. Your job finished...".

If you need something very powerful (but really not the simplest way) you can host ASP.NET and use it as your templating engine.
You'll have all the power of ASP.NET to format the body of your message.

If you are coding in VB.NET you can use XML literals. If you are coding in C# you can use SharpDevelop to have files in VB.NET in the same project as C# code.

Related

C# : How to parse EDIFACT message using Xml Serializer

I have this kind of EDIFACT message:
UNB+IATB:1+NGI+OOS+180918:2003+Export_Dump++TR2+X'
UNH+1+IFLIRR:15:2:1A'
FDR+OM+135+160918'
FDD++INT'
REF'
STX+ACT'
IFD+++C+USD++N'
APD+:::::::ULN:SVO'
DAT+708:160918:0915+707:160918:1055'
STX+FD'
EQP+J+76W::EIFGN+OM'
EQI+++++++:::FGN'
EQD++++++A01'
SSQ+AVIH:5:5::::0:SSR'
SSQ+BIKE:5:5::::0:SSR'
SSQ+BSCT:2:2::::0:SSR+J'
SSQ+BSCT:5:3::::2:SSR+Y'
SSQ+INFT:15:10::::5:SSR'
SSQ+PETC:1:1::::0:SSR+J'
SSQ+PETC:3:3::::0:SSR+Y'
SSQ+POXY:1:1::::0:SSR'
SSQ+SPEQ:5:5::::0:SSR'
SSQ+STCR:0:0::::0:SSR+J'
SSQ+STCR:1:1::::0:SSR+Y'
SSQ+SVAN:1:1::::0:SSR+J'
SSQ+SVAN:3:3::::0:SSR+Y'
SSQ+TVLG:5:5::::0:SSR'
SSQ+TVSM:10:10::::0:SSR'
SSQ+UMNR:5:5::::0:SSR'
SSQ+WCOB:0:0::::0:SSR'
LEG+A01+NXC'
EQI+J:24:S+J:21:A+J:24:O+J:21:E'
The message continues for more than a million lines.
I have used the C# XmlSerializer and successfully parsed this message into an XML file, but the structure is not correct.
Here's my code:
switch (keyword)
{
case "UNB":
parts = specificLine.Split(new char[] { '+', ':' }, StringSplitOptions.RemoveEmptyEntries);
serialization = new XmlSerializer(typeof(UNB));
UNB HeaderText = new UNB(parts[1], parts[2], parts[3], parts[4], parts[5], parts[6]);
writer = XmlWriter.Create(TxtWriter, settings);
serialization.Serialize(writer, HeaderText, EmptyNS);
break;
case "UNH":
parts = specificLine.Split(new char[] { '+', ':' }, StringSplitOptions.RemoveEmptyEntries);
serialization = new XmlSerializer(typeof(UNH));
UNH BodyText = new UNH(parts[1],parts[2],parts[3],parts[4],parts[5]);
writer = XmlWriter.Create(TxtWriter, settings);
serialization.Serialize(writer, BodyText, EmptyNS);
break;
case "FDR":
flightDateInformation Gr0 = new flightDateInformation();
parts = specificLine.Split(new char[] { '+'}, StringSplitOptions.RemoveEmptyEntries);
serialization = new XmlSerializer(typeof(flightDateInformation));
flightDateDesignator fdrbody = new flightDateDesignator(parts[1], parts[2], parts[3]);
Gr0.flightDateDesignator = fdrbody;
writer = XmlWriter.Create(TxtWriter, settings);
serialization.Serialize(writer, Gr0, EmptyNS);
break;
}
And this is an example of one of my structure classes:
[XmlRoot(ElementName = "UNB", IsNullable = false), Serializable]
public class UNB
{
[XmlAttribute]
public string identifier;
[XmlAttribute]
public string version;
[XmlAttribute]
public string sender;
[XmlAttribute]
public string recipient;
[XmlAttribute]
public string dateofpreparation;
[XmlAttribute]
public string timeofpreparation;
public UNB(string identifier, string version,string sender, string recipient, string dateofpreparation, string timeofpreparation)
{
this.identifier = identifier;
this.version = version;
this.sender = sender;
this.recipient = recipient;
this.dateofpreparation = dateofpreparation;
this.timeofpreparation = timeofpreparation;
}
public UNB()
{
}
}
And my output XML file looks like this:
<UNB identifier="IATB" version="1" sender="NGI" recipient="OOS" dateofpreparation="180918" timeofpreparation="2003" /><UNH identifier="1" type="IFLIRR" version="15" release="2" agency="1A" /><flightDateInformation>
<flightDateDesignator airlineCode="OM" flightNumber="135" departureDate="160918" />
</flightDateInformation><flightLevelInfo flightCharacteristics="INT" /><referenceInfomation /><flightFlags statusIndicator="ACT" /><inventoryParametersFD controlType="C" currencyCode="USD" isUnderActiveRevControl="N" /><additionalproductdetails>
<departureLocation>ULN</departureLocation>
<arrivalLocation>SVO</arrivalLocation>
</additionalproductdetails><scheduledTiming>
<qualifier>708</qualifier>
<date>160918</date>
<time>0915</time>
</scheduledTiming><scheduledTiming>
<qualifier>707</qualifier>
<date>160918</date>
<time>1055</time>
</scheduledTiming><dcsInformation statusIndicator="FD" /><aircraftInformation serviceType="J" aircraftType="76W">
<eqtRegistrationNumber>EIFGN</eqtRegistrationNumber>
<aircraftOwner>OM</aircraftOwner>
</aircraftInformation><acvInformation acvCode="FGN" /><saleableConfiguration configurationCode="A01" />
<newSSR quotaCounterName="AVIH">
<maxQuantity>5</maxQuantity>
<availability>5</availability>
<counter>0</counter>
<quotaType>SSR</quotaType>
</newSSR><newSSR quotaCounterName="BIKE">
<maxQuantity>5</maxQuantity>
<availability>5</availability>
<counter>0</counter>
<quotaType>SSR</quotaType>
</newSSR>
<newSSR quotaCounterName="BSCT" cabinCode="J">
<maxQuantity>2</maxQuantity>
<availability>2</availability>
<counter>0</counter>
<quotaType>SSR</quotaType>
</newSSR>
Now my problem is this: my code works and parses the message into an XML file, but not in the way I want. Each node corresponds to only one input line.
In the structure I want, each node is nested inside its parent node, and some nodes expand into child nodes. My output XML doesn't have any parent nodes.
Can I solve this by improving my code, or should I try a different approach?
If you need more details, please ask and I will provide them.
UPDATE: I have resolved this problem.

This question is very broad. Basically you have to understand the format, then write software to extract the data and convert it to your desired format. Luckily you are not the first one with this problem and there are open-source solutions available:
Is there any good open source EDIFACT parser in Java?

I would want to see a specification of the input format, not just an example, before tackling this task, especially as the quantity of data to be converted is too large to check the correctness of the result by visual inspection.
I think you are on the right lines, however: first do a crude parse of the input that produces some kind of XML representation. Then use XML tools (specifically, XSLT) to transform this crude XML into the target XML that you actually want.
I can't tell from your "actual output" and the diagram of your "desired output" what the detailed transformation rules are, but it's likely to be some kind of grouping transformation to create a hierarchic structure from a flat structure. That's a common task in XSLT and is best tackled by getting hold of an XSLT 2.0 (or 3.0) processor and using the <xsl:for-each-group> instruction. For example, if your task is to put wrapper elements around adjacent elements having the same name, you could do:
<xsl:for-each-group select="*" group-adjacent="name()">
<xsl:choose>
<xsl:when test="name()='SSR'">
<SSR-LIST><xsl:copy-of select="current-group()"/></SSR-LIST>
</xsl:when>
....
<xsl:otherwise>
<xsl:copy-of select="current-group()"/>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each-group>
If you want more specific advice on this transformation, I suggest posting a new question with a concrete (and short!) example of the input and output, expressed as XML documents, with a clear relationship between the two.

Replace Date-Code Sections of String

I'm trying to parse a date-formatted file name, e.g.
C:\Documents/<yyyy>\<MMM>\Example_CSV_<ddMM>.csv
and return "Todays" filename.
So for the example above, I would return (for 9th August 2013),
C:\Documents\2013\Aug\Example_CSV_0908.csv
I wondered if Regex would work, but I'm just having a mental block as to how to approach it!
I can't just replace the xth to yth sections with the date, as the files I will be processing are stored in different folders all over the system (not my idea). All of the date codes will be contained in <> however, so as far as I'm aware, I couldn't do something like
Return DateTime.Today.ToString(RawFileName);
Plus I imagine it would have unintended consequences if a part of the ordinary filename could be interpreted as a date code!
If someone could give me a pointer in the right direction, that would be great. If you need a little bit more context, here is the class that will contain this method:
public class ImportSetting
{
public string ID { get; private set; }
public List<ImportMapping> Mappings { get; set; }
public string RawFileName { get; set; }
public string GetFileName()
{
string ToFormat = RawFileName; //e.g. C:\Documents/<yyyy>\<MMM>\Example_CSV_<ddMM>.csv
//Do some clever stuff.
return ToFormat; //C:\Documents\2013\Aug\Example_CSV_0908.csv
}
public int GetCSVColumn(string AttributeName) { return Mappings.First(x => x.Attribute == AttributeName).ColumnID; }
public ImportSetting(string Name)
{
ID = Name;
Mappings = new List<ImportMapping>();
}
}
Thank you very much for your help!

There is no need to replace anything in the text as you can use the DateTime.ToString() method with a format string like this:
public string GetFileName(DateTime date)
{
string format = @"'C:\\Documents'\\yyyy\\MMM'\\Example_CSV_'ddMM'.csv'";
return date.ToString(format);
}
Call GetFileName with today's date:
Console.WriteLine(GetFileName(DateTime.Now));
Output:
C:\Documents\2013\Aug\Example_CSV_0908.csv
Anything that you don't want to be parsed as a date, put in single quotes ' to have it parsed as a string literal. A full list of the date format strings can be found here: http://msdn.microsoft.com/en-us/library/8kb3ddd4.aspx

var path = new Regex("<([dMy]+)>").Replace(pathFormat, o => DateTime.Now.ToString(o.Groups[1].Value));
Nb: Add all the possible letters/symbols that could occur within the square brackets.
Nb2: This will however not restrict weird DateTime strings. If you want to ensure a uniform format, you could make a more restrictive Regex like so:
var path = new Regex("<(ddMM|MMM|yyyy)>").Replace(pathFormat, o => DateTime.Now.ToString(o.Groups[1].Value));
Edit: Gotta love one-liners :)

What you could do (although I can't imagine this is a real scenario, but that might be my lack of imagination) is use the following regex:
<([fdDmMyYs]+?)>
This will give you any matches within the < and > symbols, as short as possible; for the example path it matches <yyyy>, <MMM> and <ddMM>.
Then strip the first and last symbol, or use some fancier regex functions to do this for you.
Then just use the DateTime.Now.ToString(RegexMatchWithout<> here)
And replace the match with the output.
So a code example (untested, but I'm feeling confident ;-)) would be:
public string GetFileName(string fileName)
{
Regex regex = new Regex(@"<([fdDmMyYs]+?)>");
foreach(Match m in regex.Matches(fileName))
{
fileName = fileName.Replace(m.Value, DateTime.Now.ToString(m.Value.Substring(1, m.Value.Length - 2)));
}
return fileName;
}
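Putting the regex answers together, a single-pass version could look like the sketch below. DatePath and Expand are hypothetical names, the date is passed in so the method can be tested with a fixed value, and CultureInfo.InvariantCulture is an assumption that keeps month names in English regardless of the machine's settings.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper illustrating the approach.
public static class DatePath
{
    // Matches date codes such as <yyyy>, <MMM> or <ddMM>.
    private static readonly Regex DateCode = new Regex("<([dMy]+)>");

    public static string Expand(string rawFileName, DateTime date)
    {
        return DateCode.Replace(rawFileName, m =>
        {
            string code = m.Groups[1].Value;
            // A one-letter string such as "d" would be read as a standard format,
            // so prefix it with % to force the custom-format meaning.
            string format = code.Length == 1 ? "%" + code : code;
            return date.ToString(format, CultureInfo.InvariantCulture);
        });
    }
}

// Usage:
// DatePath.Expand(@"C:\Documents\<yyyy>\<MMM>\Example_CSV_<ddMM>.csv", new DateTime(2013, 8, 9))
// returns C:\Documents\2013\Aug\Example_CSV_0908.csv
```

Only text inside < and > is handed to DateTime.ToString, so letters in the rest of the path cannot be misread as date codes.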

How to read a string containing XML elements without using the XML properties

I'm doing an XML reading process in my project, where I have to read the contents of an XML file. I have achieved it.
Just out of curiosity, I also tried keeping the same XML content inside a string and then reading only the values inside the element tags. I have achieved this too. Below is my code.
string xml = @"<Login-Form>
<User-Authentication>
<username>Vikneshwar</username>
<password>xxx</password>
</User-Authentication>
<User-Info>
<firstname>Vikneshwar</firstname>
<lastname>S</lastname>
<email>xxx@xxx.com</email>
</User-Info>
</Login-Form>";
XDocument document = XDocument.Parse(xml);
var block = from file in document.Descendants("User-Authentication")
select new
{
Username = file.Element("username").Value,
Password = file.Element("password").Value,
};
foreach (var file in block)
{
Console.WriteLine(file.Username);
Console.WriteLine(file.Password);
}
Similarly, I obtained my other set of elements (firstname, lastname, and email). Now my curiosity draws me again: I'm thinking of doing the same using only string functions.
The same string used in the above code is to be taken. I'm trying not to use any XML-related classes, that is, XDocument, XmlReader, etc. The same output should be achieved using only string functions. I'm not able to do that. Is it possible?

Don't do it. XML is more complex than can appear the case, with complex rules surrounding nesting, character-escaping, named-entities, namespaces, ordering (attributes vs elements), comments, unparsed character data, and whitespace. For example, just add
<!--
<username>evil</username>
-->
Or
<parent xmlns="this:is-not/the/data/you/expected">
<username>evil</username>
</parent>
Or maybe the same in a CDATA section - and see how well basic string-based approaches work. Hint: you'll get a different answer to what you get via a DOM.
Using a dedicated tool designed for reading XML is the correct approach. At the minimum, use XmlReader - but frankly, a DOM (such as your existing code) is much more convenient. Alternatively, use a serializer such as XmlSerializer to populate an object model, and query that.
Trying to properly parse xml and xml-like data does not end well.... RegEx match open tags except XHTML self-contained tags
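To make the comment case concrete, here is a small sketch comparing a naive IndexOf-based lookup with XDocument on the same input. The class and method names are made up for the illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical demo class; the names are illustrative only.
public static class CommentPitfall
{
    public const string Xml =
        "<root><!-- <username>evil</username> --><username>good</username></root>";

    // Naive approach: take whatever follows the first "<username>" in the raw text.
    public static string NaiveUsername()
    {
        const string open = "<username>";
        const string close = "</username>";
        int start = Xml.IndexOf(open, StringComparison.Ordinal) + open.Length;
        int end = Xml.IndexOf(close, start, StringComparison.Ordinal);
        return Xml.Substring(start, end - start);
    }

    // Parser-based approach: a comment is not an element, so its content is skipped.
    public static string ParsedUsername()
    {
        return XDocument.Parse(Xml).Descendants("username").First().Value;
    }
}
```

NaiveUsername picks up the commented-out value, while ParsedUsername returns the real element, which is the kind of divergence the answer above warns about.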

You could use methods like IndexOf, Equals, Substring etc. provided by the String class to fulfill your needs.
Using Regex is an option worth considering too.
But it's advisable to use the XmlDocument class for this purpose.

It can be done without regular expressions, like this:
string[] elementNames = new string[]{ "<username>", "<password>"};
foreach (string elementName in elementNames)
{
int startingIndex = xml.IndexOf(elementName);
string value = xml.Substring(startingIndex + elementName.Length,
xml.IndexOf(elementName.Insert(1, "/"))
- (startingIndex + elementName.Length));
Console.WriteLine(value);
}
With a regular expression:
string[] elementNames2 = new string[]{ "<username>", "<password>"};
foreach (string elementName in elementNames2)
{
string value = Regex.Match(xml, String.Concat(elementName, "(.*)",
elementName.Insert(1, "/"))).Groups[1].Value;
Console.WriteLine(value);
}
Of course, the only recommended thing is to use the XML parsing classes.

Build an extension method that will get the text between tags like this:
public static class StringExtension
{
public static string Between(this string content, string start, string end)
{
int startIndex = content.IndexOf(start) + start.Length;
int endIndex = content.IndexOf(end);
string result = content.Substring(startIndex, endIndex - startIndex);
return result;
}
}

Remove all HTML tags and format text with returns, spaces, etc. using .NET

I have a problem stripping HTML and showing the result as custom-formatted text.
For example:
asdas<br/>asdas
So the tag will be replaced by a margin. But I also need to replace margins by spaces and tabs and remove all tags. Are there any examples or ready-made solutions for getting reasonably formatted text after the HTML tags are removed?
Current solution (I'm looking for a better, ready-made one):
/// <summary>
/// Methods to remove HTML from strings.
/// </summary>
public static class HtmlRemoval
{
/// <summary>
/// Compiled regular expression for performance.
/// </summary>
static Regex _htmlRegex = new Regex("<.*?>", RegexOptions.Compiled);
/// <summary>
/// Remove HTML from string with compiled Regex.
/// </summary>
public static string StripAllTagsRegex(string source)
{
source = HttpUtility.HtmlEncode(source);
return _htmlRegex.Replace(source, string.Empty);
}
public static string ChangeTagsToTextFormat(string source)
{
if (string.IsNullOrEmpty(source))
return source;
source = HttpUtility.HtmlEncode(source);
return source.Replace("<br/>", Environment.NewLine)
.Replace("</div>", Environment.NewLine)
.Replace("</p>", Environment.NewLine);
}
}

I believe HTML Agility Pack is the simplest solution here, especially since you're removing (possibly malformed) HTML tags. The idea behind the following code is that you just take all the nodes and return their InnerText along with a line break ("\n", or whatever formatting you want to do, since you'll have a collection to work with after using SelectNodes):
private string stripTags(string html)
{
var output = new StringBuilder();
HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//*"))
{
output.AppendLine(node.InnerText + Environment.NewLine);
}
return output.ToString();
}
To get more specific formatting results, simply use different XPath expressions with the SelectNodes method. (The code presented here is not actually tested, and you'll probably want something a little more precise.)

Don't use regular expressions to parse HTML.
Use something like the HTML Agility Pack instead. Here's an introduction to how to use it.

If you are using Microsoft SharePoint, this can be achieved with SPHttpUtility.
Example:
using Microsoft.SharePoint.Utilities;
[Test]
public void RemoveHtml()
{
string textWithHtml = "<div class='ExternalCla48D45'>value</div>";
textWithHtml = SPHttpUtility.ConvertSimpleHtmlToText(textWithHtml, -1);
Assert.That(textWithHtml, Is.EqualTo("value"));
}
It is very useful with multi-line fields.

Can .NET load and parse a properties file equivalent to Java Properties class?

Is there an easy way in C# to read a properties file that has each property on a separate line followed by an equals sign and the value, such as the following:
ServerName=prod-srv1
Port=8888
CustomProperty=Any value
In Java, the Properties class handles this parsing easily:
Properties myProperties=new Properties();
FileInputStream fis = new FileInputStream (new File("CustomProps.properties"));
myProperties.load(fis);
System.out.println(myProperties.getProperty("ServerName"));
System.out.println(myProperties.getProperty("CustomProperty"));
I can easily load the file in C# and parse each line, but is there a built in way to easily get a property without having to parse out the key name and equals sign myself? The C# information I have found seems to always favor XML, but this is an existing file that I don't control and I would prefer to keep it in the existing format as it will require more time to get another team to change it to XML than parsing the existing file.

No, there is no built-in support for this.
You have to make your own "INIFileReader".
Maybe something like this?
var data = new Dictionary<string, string>();
foreach (var row in File.ReadAllLines(PATH_TO_FILE))
data.Add(row.Split('=')[0], string.Join("=",row.Split('=').Skip(1).ToArray()));
Console.WriteLine(data["ServerName"]);
Edit: Updated to reflect Paul's comment.
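A slightly more defensive version of the same idea is sketched below. It splits at the first '=' only, trims whitespace, and skips blank lines and lines starting with # or !, which are the comment markers of the Java format. SimpleProperties and Parse are hypothetical names, and the sketch deliberately ignores escapes, line continuations and the ':' separator.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; a simplification, not a full .properties parser.
public static class SimpleProperties
{
    public static Dictionary<string, string> Parse(IEnumerable<string> lines)
    {
        if (lines == null) throw new ArgumentNullException(nameof(lines));

        var data = new Dictionary<string, string>();
        foreach (string raw in lines)
        {
            string line = raw.Trim();
            // Skip blank lines and comment lines.
            if (line.Length == 0 || line[0] == '#' || line[0] == '!')
                continue;

            // Split at the first '=' so that values may contain '=' themselves.
            int index = line.IndexOf('=');
            if (index < 0)
                continue;

            string key = line.Substring(0, index).Trim();
            // Assumption: trailing whitespace in values is not significant here.
            // The indexer lets a later duplicate key overwrite an earlier one.
            data[key] = line.Substring(index + 1).Trim();
        }
        return data;
    }
}
```

It can be fed with File.ReadAllLines(PATH_TO_FILE) from System.IO, after which data["ServerName"] works as in the snippet above.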

Final class. Thanks @eXXL.
public class Properties
{
private Dictionary<String, String> list;
private String filename;
public Properties(String file)
{
reload(file);
}
public String get(String field, String defValue)
{
return (get(field) == null) ? (defValue) : (get(field));
}
public String get(String field)
{
return (list.ContainsKey(field))?(list[field]):(null);
}
public void set(String field, Object value)
{
if (!list.ContainsKey(field))
list.Add(field, value.ToString());
else
list[field] = value.ToString();
}
public void Save()
{
Save(this.filename);
}
public void Save(String filename)
{
this.filename = filename;
// StreamWriter creates the file if it does not exist; calling File.Create first would leave an open handle that locks the file
System.IO.StreamWriter file = new System.IO.StreamWriter(filename);
foreach(String prop in list.Keys.ToArray())
if (!String.IsNullOrWhiteSpace(list[prop]))
file.WriteLine(prop + "=" + list[prop]);
file.Close();
}
public void reload()
{
reload(this.filename);
}
public void reload(String filename)
{
this.filename = filename;
list = new Dictionary<String, String>();
if (System.IO.File.Exists(filename))
loadFromFile(filename);
else
System.IO.File.Create(filename).Dispose(); // release the handle right away so the file is not left locked
}
private void loadFromFile(String file)
{
foreach (String line in System.IO.File.ReadAllLines(file))
{
if ((!String.IsNullOrEmpty(line)) &&
(!line.StartsWith(";")) &&
(!line.StartsWith("#")) &&
(!line.StartsWith("'")) &&
(line.Contains('=')))
{
int index = line.IndexOf('=');
String key = line.Substring(0, index).Trim();
String value = line.Substring(index + 1).Trim();
if ((value.StartsWith("\"") && value.EndsWith("\"")) ||
(value.StartsWith("'") && value.EndsWith("'")))
{
value = value.Substring(1, value.Length - 2);
}
try
{
//ignore duplicates
list.Add(key, value);
}
catch { }
}
}
}
}
Sample use:
//load
Properties config = new Properties(fileConfig);
//get value with default value
com_port.Text = config.get("com_port", "1");
//set value
config.set("com_port", com_port.Text);
//save
config.Save();

Most Java ".properties" files can be split by assuming the "=" is the separator - but the format is significantly more complicated than that and allows for embedding spaces, equals, newlines and any Unicode characters in either the property name or value.
I needed to load some Java properties for a C# application so I have implemented JavaProperties.cs to correctly read and write ".properties" formatted files using the same approach as the Java version - you can find it at http://www.kajabity.com/index.php/2009/06/loading-java-properties-files-in-csharp/.
There, you will find a zip file containing the C# source for the class and some sample properties files I tested it with.
Enjoy!

Yet another answer (in January 2018) to the old question (in January 2009).
The specification of the Java properties file format is described in the JavaDoc of java.util.Properties.load(java.io.Reader). One problem is that the specification is somewhat more complicated than it first appears. Another problem is that some answers here arbitrarily add extra rules: for example, ; and ' are treated as comment starters, but they should not be, and double or single quotes around property values are stripped, but they should not be.
The following are points to be considered.
There are two kinds of line, natural lines and logical lines.
A natural line is terminated by \n, \r, \r\n or the end of the stream.
A logical line may be spread out across several adjacent natural lines by escaping the line terminator sequence with a backslash character \.
Any white space at the start of the second and following natural lines in a logical line is discarded.
White space characters are space (' ', \u0020), tab (\t, \u0009) and form feed (\f, \u000C).
As stated explicitly in the specification, "it is not sufficient to only examine the character preceding a line terminator sequence to decide if the line terminator is escaped; there must be an odd number of contiguous backslashes for the line terminator to be escaped. Since the input is processed from left to right, a non-zero even number of 2n contiguous backslashes before a line terminator (or elsewhere) encodes n backslashes after escape processing."
= is used as the separator between a key and a value.
: is used as the separator between a key and a value, too.
The separator between a key and a value can be omitted.
A comment line has # or ! as its first non-white space characters, meaning leading white spaces before # or ! are allowed.
A comment line cannot be extended to the next natural line even if its line terminator is preceded by \.
As stated explicitly in the specification, =, : and white spaces can be embedded in a key if they are escaped by backslashes.
Even line terminator characters can be included using \r and \n escape sequences.
If a value is omitted, an empty string is used as a value.
\uxxxx is used to represent a Unicode character.
A backslash character before a non-valid escape character is not treated as an error; it is silently dropped.
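The line-continuation rules above can be sketched in C# roughly as follows. This is only an illustration and is not taken from any of the libraries mentioned in these answers; the names LogicalLines, IsContinued and Join are hypothetical. It covers just the joining of natural lines into logical lines (the odd-backslash check, skipping comment and blank lines, and dropping leading white space), not key/value splitting or escape decoding.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: joins natural lines into logical lines.
public static class LogicalLines
{
    // A line terminator is escaped only if the line ends with an odd number of backslashes.
    public static bool IsContinued(string line)
    {
        int count = 0;
        for (int i = line.Length - 1; i >= 0 && line[i] == '\\'; i--)
            count++;
        return count % 2 == 1;
    }

    public static List<string> Join(IEnumerable<string> naturalLines)
    {
        var result = new List<string>();
        StringBuilder current = null;
        foreach (string raw in naturalLines)
        {
            // Leading white space (space, tab, form feed) is dropped on every natural line.
            string line = raw.TrimStart(' ', '\t', '\f');
            if (current == null)
            {
                // Comment and blank lines are skipped and can never be continued.
                if (line.Length == 0 || line[0] == '#' || line[0] == '!')
                    continue;
                current = new StringBuilder();
            }
            if (IsContinued(line))
            {
                // Drop the escaping backslash and keep reading the next natural line.
                current.Append(line, 0, line.Length - 1);
            }
            else
            {
                current.Append(line);
                result.Add(current.ToString());
                current = null;
            }
        }
        if (current != null)
            result.Add(current.ToString()); // input ended in the middle of a continuation
        return result;
    }
}
```

Fed the natural lines key\ and 4=value\ and 4, Join returns the single logical line key4=value4, while a line ending in two backslashes is not treated as continued.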
So, for example, if test.properties has the following content:
# A comment line that starts with '#'.
    # This is a comment line having leading white spaces.
! A comment line that starts with '!'.
key1=value1
key2 : value2
key3 value3
key\
4=value\
4
\u006B\u0065\u00795=\u0076\u0061\u006c\u0075\u00655
\k\e\y\6=\v\a\lu\e\6
\:\ \= = \\colon\\space\\equal
it should be interpreted as the following key-value pairs.
+------+--------------------+
| KEY | VALUE |
+------+--------------------+
| key1 | value1 |
| key2 | value2 |
| key3 | value3 |
| key4 | value4 |
| key5 | value5 |
| key6 | value6 |
| : = | \colon\space\equal |
+------+--------------------+
The PropertiesLoader class in the Authlete.Authlete NuGet package can interpret the format described in the specification. The example code below:
using System;
using System.IO;
using System.Collections.Generic;
using Authlete.Util;
namespace MyApp
{
class Program
{
public static void Main(string[] args)
{
string file = "test.properties";
IDictionary<string, string> properties;
using (TextReader reader = new StreamReader(file))
{
properties = PropertiesLoader.Load(reader);
}
foreach (var entry in properties)
{
Console.WriteLine($"{entry.Key} = {entry.Value}");
}
}
}
}
will generate this output:
key1 = value1
key2 = value2
key3 = value3
key4 = value4
key5 = value5
key6 = value6
: = = \colon\space\equal
An equivalent example in Java is as follows:
import java.util.*;
import java.io.*;
public class Program
{
public static void main(String[] args) throws IOException
{
String file = "test.properties";
Properties properties = new Properties();
try (Reader reader = new FileReader(file))
{
properties.load(reader);
}
for (Map.Entry<Object, Object> entry : properties.entrySet())
{
System.out.format("%s = %s\n", entry.getKey(), entry.getValue());
}
}
}
The source code, PropertiesLoader.cs, can be found in authlete-csharp. xUnit tests for PropertiesLoader are written in PropertiesLoaderTest.cs.

I've written a method that allows empty lines, commented-out lines and quoted values within the file.
Examples:
var1="value1"
var2='value2'
'var3=outcommented
;var4=outcommented, too
Here's the method:
public static IDictionary ReadDictionaryFile(string fileName)
{
Dictionary<string, string> dictionary = new Dictionary<string, string>();
foreach (string line in File.ReadAllLines(fileName))
{
if ((!string.IsNullOrEmpty(line)) &&
(!line.StartsWith(";")) &&
(!line.StartsWith("#")) &&
(!line.StartsWith("'")) &&
(line.Contains('=')))
{
int index = line.IndexOf('=');
string key = line.Substring(0, index).Trim();
string value = line.Substring(index + 1).Trim();
if ((value.StartsWith("\"") && value.EndsWith("\"")) ||
(value.StartsWith("'") && value.EndsWith("'")))
{
value = value.Substring(1, value.Length - 2);
}
dictionary.Add(key, value);
}
}
return dictionary;
}

Yeah, there are no built-in classes to do this that I'm aware of.
But that shouldn't really be an issue, should it? It looks easy enough to parse: store the result of StreamReader.ReadToEnd() in a string, split it on newlines, and then split each record on the = character. What you'd be left with is a bunch of key-value pairs which you can easily toss into a dictionary.
Here's an example that might work for you:
public static Dictionary<string, string> GetProperties(string path)
{
string fileData = "";
using (StreamReader sr = new StreamReader(path))
{
fileData = sr.ReadToEnd().Replace("\r", "");
}
Dictionary<string, string> Properties = new Dictionary<string, string>();
string[] kvp;
string[] records = fileData.Split("\n".ToCharArray());
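foreach (string record in records)
{
// Split only on the first '=' so values may contain '=' themselves,
// and skip blank lines or lines without a separator instead of throwing.
kvp = record.Split(new[] { '=' }, 2);
if (kvp.Length == 2)
Properties.Add(kvp[0], kvp[1]);
}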
return Properties;
}
Here's an example of how to use it:
Dictionary<string,string> Properties = GetProperties("data.txt");
Console.WriteLine("Hello: " + Properties["Hello"]);
Console.ReadKey();

The real answer is no (at least not by itself). You can still write your own code to do it.

C# generally uses XML-based config files rather than the *.ini-style file you describe, so there's nothing built in to handle this. However, Google returns a number of promising results.

I don't know of any built-in way to do this. However, it would seem easy enough to do, since the only delimiters you have to worry about are the newline character and the equals sign.
It would be very easy to write a routine that will return a NameValueCollection, or an IDictionary given the contents of the file.
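A minimal sketch of such a routine might look like the following. It assumes the simple case described in this answer (one key=value pair per line, split on the first = only) and does not implement the full Java properties format; the names SimpleProperties and Parse are made up for the illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for the simple "key=value per line" case only.
public static class SimpleProperties
{
    public static IDictionary<string, string> Parse(string contents)
    {
        var result = new Dictionary<string, string>();
        string[] lines = contents.Split(new[] { "\r\n", "\n", "\r" }, StringSplitOptions.None);
        foreach (string line in lines)
        {
            int index = line.IndexOf('=');
            if (index < 0)
                continue; // blank line or no separator
            string key = line.Substring(0, index).Trim();
            if (key.Length == 0)
                continue; // nothing usable before the separator
            string value = line.Substring(index + 1).Trim();
            result[key] = value; // a later duplicate overwrites an earlier one
        }
        return result;
    }
}
```

Because only the first = is treated as the separator, a line such as Name = a=b yields the key Name with the value a=b.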

You can also use C# automatic property syntax with default values and a restricted (private) setter. The advantage here is that you can then have any kind of data type in your properties "file" (now actually a class). The other advantage is that you can use C# property syntax to read the properties. The cost is a couple of lines for each property (one for the property declaration and one in the constructor) to make this work.
using System;
namespace ReportTester {
class TestProperties
{
internal String ReportServerUrl { get; private set; }
internal TestProperties()
{
ReportServerUrl = "http://myhost/ReportServer/ReportExecution2005.asmx?wsdl";
}
}
}

There are several NuGet packages for this, but all are currently in pre-release version.
Capgemini.Cauldron.Core.JavaProperties 2.0.39-beta
Kajabity.Tools.Java 0.2.6638.28124
[Update]
As of June 2018, Capgemini.Cauldron.Core.JavaProperties is now in a stable version (version 2.1.0 and 3.0.20).

I realize that this isn't exactly what you're asking, but just in case:
When you want to load an actual Java properties file, you'll need to accommodate its encoding. The Java docs indicate that the encoding is ISO 8859-1, and characters outside that range are written as \uXXXX escape sequences that you might not interpret correctly. For instance, look at this SO answer to see what's necessary to turn UTF-8 into ISO 8859-1 (and vice versa).
When we needed to do this, we found an open-source PropertyFile.cs and made a few changes to support the escape sequences. This class is a good one for read/write scenarios. You'll need the supporting PropertyFileIterator.cs class as well.
Even if you're not loading true Java properties, make sure that your prop file can express all the characters you need to save (UTF-8 at least).
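As a rough illustration of the encoding point, the sketch below reads a file as ISO 8859-1 and decodes \uXXXX escapes. It is not the PropertyFile.cs code mentioned above; the names LatinProperties, ReadAllText and DecodeUnicodeEscapes are hypothetical, and escapes other than \uXXXX are deliberately left untouched.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;

// Hypothetical helper: ISO 8859-1 reading plus \uXXXX decoding only.
public static class LatinProperties
{
    // Reads the raw text using ISO 8859-1, the encoding Java assumes for byte streams.
    public static string ReadAllText(string path)
    {
        return File.ReadAllText(path, Encoding.GetEncoding("ISO-8859-1"));
    }

    // Replaces \uXXXX escapes with the character they encode; everything else is copied as-is.
    public static string DecodeUnicodeEscapes(string text)
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == '\\' && i + 1 < text.Length)
            {
                ushort code;
                if (text[i + 1] == 'u' && i + 5 < text.Length &&
                    ushort.TryParse(text.Substring(i + 2, 4), NumberStyles.AllowHexSpecifier,
                                    CultureInfo.InvariantCulture, out code))
                {
                    sb.Append((char)code);
                    i += 5; // skip 'u' and the four hex digits
                    continue;
                }
                // Copy any other escape pair unchanged so an escaped backslash
                // followed by 'u' is not misread as a Unicode escape.
                sb.Append(text[i]).Append(text[i + 1]);
                i++;
                continue;
            }
            sb.Append(text[i]);
        }
        return sb.ToString();
    }
}
```

A full reader would also need to handle the remaining escapes (\t, \n, \r, \f, escaped separators and so on) after splitting keys from values.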

No, there is not. But I have created a simple class to help:
public class PropertiesUtility
{
private static Hashtable ht = new Hashtable();
public void loadProperties(string path)
{
string[] lines = System.IO.File.ReadAllLines(path);
bool readFlag = false;
foreach (string line in lines)
{
string text = Regex.Replace(line, @"\s+", "");
readFlag = checkSyntax(text);
if (readFlag)
{
string[] splitText = text.Split('=');
ht.Add(splitText[0].ToLower(), splitText[1]);
}
}
}
private bool checkSyntax(string line)
{
if (String.IsNullOrEmpty(line) || line[0].Equals('['))
{
return false;
}
if (line.Contains("=") && !String.IsNullOrEmpty(line.Split('=')[0]) && !String.IsNullOrEmpty(line.Split('=')[1]))
{
return true;
}
else
{
throw new Exception("Can not Parse Properties file please verify the syntax");
}
}
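public string getProperty(string key)
{
// Keys are stored lower-cased by loadProperties, so normalize before the lookup.
string normalizedKey = key.ToLower();
if (ht.Contains(normalizedKey))
{
return ht[normalizedKey].ToString();
}
else
{
throw new Exception("Property: " + key + " does not exist");
}
}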
}
Hope this helps.

What's a good way of doing string templating in .NET? - c#

If you need something very powerful (but really not the simplest way) you can host ASP.NET and use it as your templating engine. You'll have all the power of ASP.NET to format the body of your message.

If you are coding in VB.NET you can use XML literals. If you are coding in C# you can use SharpDevelop to have VB.NET files in the same project as C# code.
