Truncating Xhtmlstring in Episerver


I need an HTML-friendly version of a truncated XhtmlString, because closing tags can get clipped when the markup is truncated. Any ideas on how to achieve this? I've thought of stripping all tags first and then clipping, but is there a built-in solution for this in Episerver, or is this just basic string manipulation with regex?

There is a built-in helper function in the TextIndexer class called StripHtml which can be used to remove any tags to end up with plain text before truncating:
var plainText = TextIndexer.StripHtml(someHtml);
Note that this method can also be used to truncate the string like so:
// Truncate to 150 characters
var truncatedString = TextIndexer.StripHtml(someHtml, 150);
You'll also be able to have a string such as "..." appended to the string if it was truncated.
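For readers without Episerver at hand, the strip-then-truncate idea can be sketched in plain .NET. This is only an illustration and not how TextIndexer.StripHtml is implemented: the class name PlainTextTruncator, the method StripAndTruncate and the simple tag regex are assumptions made for this example, and a regex is not a full HTML parser.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Episerver.
public static class PlainTextTruncator
{
    // Removes anything that looks like a tag, decodes entities, collapses
    // whitespace, then cuts at maxLength and appends the suffix if text was cut.
    public static string StripAndTruncate(string html, int maxLength, string suffix = "...")
    {
        if (string.IsNullOrEmpty(html)) return string.Empty;
        string text = Regex.Replace(html, "<[^>]*>", " ");   // naive tag removal
        text = WebUtility.HtmlDecode(text);                   // &amp; becomes &
        text = Regex.Replace(text, @"\s+", " ").Trim();       // collapse whitespace
        if (maxLength < 0 || text.Length <= maxLength) return text;
        return text.Substring(0, maxLength).TrimEnd() + suffix;
    }

    public static void Main()
    {
        // prints "Hello brave..."
        Console.WriteLine(StripAndTruncate("<p>Hello <b>brave new</b> world</p>", 11));
    }
}
```

The suffix is only appended when something was actually cut off, which mirrors the behaviour described above.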

For valid XHTML you can use the XElement class to simplify things, provided you would rather avoid the occasional regular expression frenzy. The following example should work well for the trivial case where only one text node is present:
using System;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public class Truncator {
    private const String Ellipsis = "…";
    private const String EllipsisHtmlEntity = "&hellip;";

    public static String Truncate(XElement xElement, Int32 length, Boolean useHtmlEntity = false) {
        if (ReferenceEquals(xElement, null))
            throw new ArgumentNullException(nameof(xElement));
        // Only the first text node is truncated, so all tags stay balanced.
        var textNode =
            (XText)
            xElement.DescendantNodes()
                .FirstOrDefault(node => !ReferenceEquals(node, null) && node.NodeType == XmlNodeType.Text);
        if (!ReferenceEquals(textNode, null))
            textNode.Value = Truncate(textNode.Value, length);
        // The entity is swapped in after serialization; setting it on the text node would escape the ampersand.
        var truncatedResult = xElement.ToString(SaveOptions.DisableFormatting);
        return useHtmlEntity ? truncatedResult.Replace(Ellipsis, EllipsisHtmlEntity) : truncatedResult;
    }

    public static String Truncate(String str, Int32 length, Boolean useHtmlEntity = false) {
        // Check the length before calling Substring, which throws on strings that are too short.
        if (String.IsNullOrWhiteSpace(str) || length < 1 || str.Trim().Length < length)
            return str;
        var truncated = str.Trim().Substring(0, length - 1).Trim();
        return $"{truncated}{(useHtmlEntity ? EllipsisHtmlEntity : Ellipsis)}";
    }
}
If you have a String to begin with, just call XElement.Parse on it to get the XElement. Note that the markup must be well-formed and have a single root element.
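The single-text-node limitation can be lifted by spending a character budget across all text nodes in document order. The following is a rough sketch under stated assumptions: MultiNodeTruncator is a hypothetical name, the input must be well-formed XML with one root element, and elements that never contained text (an image after the cut, for instance) are left in place.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper illustrating truncation across several text nodes.
public static class MultiNodeTruncator
{
    // Keeps at most maxChars characters of text and removes the text after
    // the cut, so every remaining tag is still properly closed.
    public static string Truncate(string xhtml, int maxChars)
    {
        XElement root = XElement.Parse(xhtml, LoadOptions.PreserveWhitespace);
        int remaining = maxChars;

        // Snapshot the text nodes first because the tree is modified below.
        foreach (XText text in root.DescendantNodes().OfType<XText>().ToList())
        {
            if (remaining <= 0)
            {
                XElement parent = text.Parent;
                text.Remove();
                // Remove ancestors that the cut left without any content.
                while (parent != null && parent != root && !parent.Nodes().Any())
                {
                    XElement next = parent.Parent;
                    parent.Remove();
                    parent = next;
                }
            }
            else if (text.Value.Length > remaining)
            {
                text.Value = text.Value.Substring(0, remaining) + "…";
                remaining = 0;
            }
            else
            {
                remaining -= text.Value.Length;
            }
        }
        return root.ToString(SaveOptions.DisableFormatting);
    }

    public static void Main()
    {
        // prints <div><p>Hello <b>wo…</b></p></div>
        Console.WriteLine(Truncate("<div><p>Hello <b>world</b></p><p>More</p></div>", 8));
    }
}
```

Because only text node values are changed and whole nodes are removed, the serializer always emits matching closing tags.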

Related

Extract multiple values from string using C#

I'm creating my own forum and I have a problem with quoting messages. I know how to add a quoted message to the text box, but I cannot figure out how to extract values from the string after the post. In the text box I've got something like this:
[quote IdPost=8] Some quoting text [/quote]
[quote IdPost=15] Second quoting text [/quote]
Could you tell me the easiest way to extract all "IdPost" numbers from the string after the form is posted?

You can do this with a regex:
@"\[quote IdPost=(\d+)\]"
Something like:
Regex reg = new Regex(@"\[quote IdPost=(\d+)\]");
foreach (Match match in reg.Matches(text))
{
...
}
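To fill in the loop body, a complete version might look like the sketch below. The names QuoteParser and ExtractPostIds are made up for this example; the pattern is the one from the answer, and the captured digits are read from group 1.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper that collects every IdPost value from the posted text.
public static class QuoteParser
{
    private static readonly Regex QuoteId = new Regex(@"\[quote IdPost=(\d+)\]");

    public static List<int> ExtractPostIds(string text)
    {
        var ids = new List<int>();
        foreach (Match match in QuoteId.Matches(text))
        {
            int id;
            // Group 1 holds the digits captured by (\d+).
            if (int.TryParse(match.Groups[1].Value, out id))
                ids.Add(id);
        }
        return ids;
    }

    public static void Main()
    {
        string text = "[quote IdPost=8] Some quoting text [/quote]\n" +
                      "[quote IdPost=15] Second quoting text [/quote]";
        Console.WriteLine(string.Join(", ", ExtractPostIds(text))); // prints "8, 15"
    }
}
```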

var originalstring = "[quote IdPost=8] Some quoting text [/quote]";
//"[quote IdPost=" and "8] Some quoting text [/quote]"
var splits = originalstring.Split('=');
if(splits.Count() == 2)
{
//"8" and "] Some quoting text [/quote]"
var splits2 = splits[1].Split(']');
int id;
if(int.TryParse(splits2[0], out id))
{
return id;
}
}

I don't know exactly what your string looks like, but here is a regex-free solution with Substring:
using System;
public class Program
{
public static void Main()
{
string source = "[quote IdPost=8] Some quoting text [/quote]";
Console.WriteLine(ExtractNum(source, "=", "]"));
Console.WriteLine(ExtractNum2(source, "[quote IdPost="));
}
public static string ExtractNum(string source, string start, string end)
{
int index = source.IndexOf(start) + start.Length;
return source.Substring(index, source.IndexOf(end) - index);
}
// just another solution for fun
public static string ExtractNum2(string source, string junk)
{
source = source.Substring(junk.Length, source.Length - junk.Length); // erase start
return source.Remove(source.IndexOf(']')); // erase end
}
}

Converting [string].ToString([custom format])

How can I format a value with a custom format, as in:
int value = 5000;
String.Format("{0:## ###}", value);
value.ToString("##");
but with the value as a string, without converting it to a number?
Something like this:
String.Format("{0:## ###}", "5000");
** UPDATE:
I'm trying to create a generic function:
public string FormatString(string value, string format = "") {
if (value == null){
return "";
}
return String.Format("{0:" + format + "}", value);
}
public bool OtherFunction(id){
var data = dc.GetData(id);
ViewBag.DescriptionText = FormatString(data.Description).Replace("\n", "<br />");
ViewBag.Phone = FormatString(data.Phone, "(##) ####-#####");
ViewBag.City= FormatString(data.City);
[...]
}

I don't think anything like this exists. Like Jon said, this was designed for numbers.
If you just want to "format" with #, you could write a simple function, something like this:
public string FormatString(string value, string format = "")
{
if (String.IsNullOrEmpty(value) || String.IsNullOrEmpty(format))
return value;
var newValue = new StringBuilder(format);
for (int i = 0; i < newValue.Length; i++)
{
if (newValue[i] == '#')
if (value.Length > 0)
{
newValue[i] = value[0];
value = value.Substring(1);
}
else
{
newValue[i] = '0';
}
}
return newValue.ToString();
}
Of course, this is a very simple one. You will have to decide what to do if the format is too long (here: fill with '0') and what to do when the format is too short (here: the rest of the value is truncated).
But I think you get the idea.
Somewhere on my disk I have code for something like this: formatting numbers with special patterns for invoice numbers. If I find it, I'll write a blog post and paste the link.

"5000" is a string. The only overload available for string.ToString() is the one with an IFormatProvider [1]. While you could actually implement that, you'll probably end up with something similar to int.Parse(), which you want to avoid.
[1] http://msdn.microsoft.com/de-de/library/29dxe1x2(v=vs.110).aspx
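A different route from the string.ToString(IFormatProvider) overload mentioned above is to pass a custom formatter to String.Format, which hands every argument, strings included, to an ICustomFormatter. The sketch below assumes a '#' mask like the one in the question; MaskFormatProvider is a hypothetical name, and mask slots without a matching character are simply dropped.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical provider that lets String.Format apply a '#' mask to strings.
public sealed class MaskFormatProvider : IFormatProvider, ICustomFormatter
{
    public object GetFormat(Type formatType)
    {
        // String.Format asks for an ICustomFormatter; return this instance for it.
        return formatType == typeof(ICustomFormatter) ? this : null;
    }

    public string Format(string format, object arg, IFormatProvider formatProvider)
    {
        string value = arg as string;
        if (value == null || string.IsNullOrEmpty(format))
        {
            // Fall back to the regular formatting for everything that is not a masked string.
            var formattable = arg as IFormattable;
            if (formattable != null) return formattable.ToString(format, CultureInfo.CurrentCulture);
            return arg == null ? string.Empty : arg.ToString();
        }

        var result = new StringBuilder(format.Length);
        int next = 0;
        foreach (char c in format)
        {
            if (c != '#') result.Append(c);                              // literal mask character
            else if (next < value.Length) result.Append(value[next++]);  // next input character
        }
        return result.ToString();
    }

    public static void Main()
    {
        // prints "(12) 3456-7890"
        Console.WriteLine(string.Format(new MaskFormatProvider(), "{0:(##) ####-####}", "1234567890"));
    }
}
```

No conversion to a number takes place; the mask is filled character by character from the input string.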

How to replace only the first occurrence of www

In my C# code-behind I have the following code. How do I change the replace so that only
the first occurrence of www is replaced?
For example, if the user enters www.testwww.com, then I should be saving it as testwww.com.
Currently, the code below saves it as www.com (I guess due to the Substring code).
Please help. Thanks in advance.
private string FilterUrl(string url)
{
string lowerCaseUrl = url.ToLower();
lowerCaseUrl = lowerCaseUrl.Replace("http://", string.Empty).Replace("https://", string.Empty).Replace("ftp://", string.Empty);
lowerCaseUrl = lowerCaseUrl.Replace("www.", string.Empty);
string lCaseUrl = url.Substring(url.Length - lowerCaseUrl.Length, lowerCaseUrl.Length);
return lCaseUrl;
}

As Ally suggested, you are much better off using System.Uri. This also removes the leading www as you wish.
private string FilterUrl(string url)
{
Uri uri = new UriBuilder(url).Uri; // defaults to http:// if missing
// The dot is escaped so that only a literal "www." prefix matches.
return Regex.Replace(uri.Host, @"^www\.", "") + uri.PathAndQuery;
}
Edit: The trailing slash comes from the PathAndQuery property. If there is no path, you are left with the slash only. Just add another regex replace or string replace. Here's the regex way:
return Regex.Replace(uri.Host, @"^www\.", "") + Regex.Replace(uri.PathAndQuery, "/$", "");

I would suggest using IndexOf(string) to find the first occurrence.
Edit: okay someone beat me to it ;)
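A minimal sketch of the IndexOf idea, assuming a case-insensitive ordinal search is acceptable (FirstOccurrence and RemoveFirst are names invented for this example):

```csharp
using System;

// Hypothetical helper that removes only the first occurrence of a value.
public static class FirstOccurrence
{
    public static string RemoveFirst(string source, string value)
    {
        int index = source.IndexOf(value, StringComparison.OrdinalIgnoreCase);
        // Leave the string untouched when the value does not occur at all.
        return index < 0 ? source : source.Remove(index, value.Length);
    }

    public static void Main()
    {
        Console.WriteLine(RemoveFirst("www.testwww.com", "www.")); // prints "testwww.com"
    }
}
```

Note that this removes the first "www." wherever it appears, so an input without a leading "www." may need a StartsWith check first.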

You could use IndexOf like Felipe suggested, or do it the low-tech way. The longer prefixes have to be replaced first, otherwise the shorter ones strip the scheme before the "www." variants can match:
lowerCaseUrl = lowerCaseUrl.Replace("http://www.", string.Empty).Replace("https://www.", string.Empty).Replace("http://", string.Empty).Replace("https://", string.Empty).Replace("ftp://", string.Empty);
I would be interested to know what you're trying to achieve.

Came up with a cool pair of extension methods, which also work for replacing the first x occurrences:
public static class StringReplaceExtensions
{
    public static string ReplaceOnce(this string s, string replace, string with)
    {
        return s.ReplaceCount(replace, with);
    }
    public static string ReplaceCount(this string s, string replace, string with, int howManytimes = 1)
    {
        if (howManytimes < 0) throw new ArgumentOutOfRangeException(nameof(howManytimes), "can not replace a string less than zero times");
        int count = 0;
        int position = s.IndexOf(replace, StringComparison.Ordinal);
        while (position >= 0 && count < howManytimes)
        {
            s = s.Remove(position, replace.Length).Insert(position, with);
            // Continue after the inserted text so the replacement itself is never matched again.
            position = s.IndexOf(replace, position + with.Length, StringComparison.Ordinal);
            count++;
        }
        return s;
    }
}
ReplaceOnce isn't necessary, it is just a simplifier. Call it like this:
string url = "http://www.stackoverflow.com/questions/www/www";
var urlR1 = url.ReplaceOnce("www", "xxx");
// urlR1 = "http://xxx.stackoverflow.com/questions/www/www";
var urlR2 = url.ReplaceCount("www", "xxx", 2);
// urlR2 = "http://xxx.stackoverflow.com/questions/xxx/www";
NOTE: this is case-sensitive as it is written

The Replace method replaces every occurrence in the string. You have to locate the piece you want to remove with the IndexOf method and remove it with the Remove method of string. Try something like this:
//include the namespace
using System.Globalization;
private string FilterUrl(string url)
{
    // create a CompareInfo object.
    CompareInfo myCompare = CultureInfo.InvariantCulture.CompareInfo;
    // find the first 'www.' in the url parameter, ignoring the case.
    int position = myCompare.IndexOf(url, "www.", CompareOptions.IgnoreCase);
    // if 'www.' exists, remove only that first occurrence (4 characters).
    if (position > -1)
        url = url.Remove(position, 4);
    //if you want to remove http://, https://, ftp://.. keep this line
    url = url.Replace("http://", string.Empty).Replace("https://", string.Empty).Replace("ftp://", string.Empty);
    return url;
}
Edits
Part of your code was removing a piece of the string. If you just want to remove the 'www.' and 'http://', 'https://', 'ftp://', take a look at this code.
This code also ignores case when it searches the url parameter for 'www.'.

Why isn't this C# code working? It should return the string between two other strings, but always returns an empty string

The title explains it all. It seems simple enough, so I must be overlooking something stupid. Here's what I've got.
private string getBetween(string strSource, string strStart, string strEnd)
{
int start, end;
if (strSource.Contains(strStart) && strSource.Contains(strEnd))
{
start = strSource.IndexOf(strStart, 0) + strStart.Length;
end = strSource.IndexOf(strEnd, start);
return strSource.Substring(start, end - start);
}
else
{
return "";
}
}
Thanks, guys.

Your code doesn't make sure that start and end are in order.
static string SubString(string source, string prefix, string suffix)
{
int start = source.IndexOf(prefix); // get position of prefix
if (start == -1)
return String.Empty;
int subStart = start + prefix.Length; // get position of substring
int end = source.IndexOf(suffix, subStart); // make sure suffix also exists
if (end == -1)
return String.Empty;
int subLength = end - subStart; // calculate length of substring
if (subLength == 0)
return String.Empty;
return source.Substring(subStart, subLength); // return substring
}

As a couple of people have said, the problem is that your code only works on very specific input, all because of the start and end IndexOf magic =) But when you try to update your code to work correctly on more inputs, it becomes very long, with many indexes, comparisons, substrings, conditions and so on. To avoid this, I recommend using regular expressions; with their help you can express what you need in a dedicated language.
Here is a sample which solves your problem with regular expressions:
public static string getBetween(string source, string before, string after)
{
    // Escape both delimiters so regex metacharacters in them are matched literally.
    var regExp = new Regex(string.Format("{0}(?<needle>.+?){1}", Regex.Escape(before), Regex.Escape(after)), RegexOptions.Singleline);
    var matches = regExp.Matches(source).Cast<Match>(). //here we use LINQ to
        OrderBy(m => m.Groups["needle"].Value.Length). //find shortest string
        Select(m => m.Groups["needle"].Value); //you can use foreach loop instead
    return matches.FirstOrDefault();
}
The tricky part is {0}(?<needle>.+?){1}, where {0} is the escaped before string and {1} is the escaped after string. The expression finds the text that lies between the two, and the lazy quantifier +? stops each match at the first occurrence of the after string. The method returns null when nothing matches.
Hope this helps.

I get the correct answer if I try any of these:
var a = getBetween("ABC", "A", "C");
var b = getBetween("TACOBBURRITO", "TACO", "BURRITO");
var c = getBetween("TACOBACONBURRITO", "TACO", "BURRITO");
The problem is likely with your input argument validation, as these fail:
var d = getBetween("ABC", "C", "A");
var e = getBetween("ABC", "C", "C");
You can narrow down the issue by writing test cases like these as a separate fixture (xUnit, or the main method of a throwaway app).
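A throwaway console app along those lines might look like this. It is a sketch only: GetBetweenChecks, GetBetween and Check are names chosen for the example, and GetBetween here is a defensive variant that returns an empty string instead of throwing when a delimiter is missing or out of order.

```csharp
using System;

// Hypothetical throwaway app that exercises the inputs listed above.
public static class GetBetweenChecks
{
    public static string GetBetween(string source, string start, string end)
    {
        int startIndex = source.IndexOf(start, StringComparison.Ordinal);
        if (startIndex < 0) return "";
        startIndex += start.Length;
        // Search for the end marker only after the start marker.
        int endIndex = source.IndexOf(end, startIndex, StringComparison.Ordinal);
        return endIndex < 0 ? "" : source.Substring(startIndex, endIndex - startIndex);
    }

    private static void Check(string actual, string expected)
    {
        Console.WriteLine((actual == expected ? "PASS" : "FAIL") + ": \"" + actual + "\"");
    }

    public static void Main()
    {
        Check(GetBetween("ABC", "A", "C"), "B");
        Check(GetBetween("TACOBBURRITO", "TACO", "BURRITO"), "B");
        Check(GetBetween("TACOBACONBURRITO", "TACO", "BURRITO"), "BACON");
        Check(GetBetween("ABC", "C", "A"), "");   // end marker only before the start marker
        Check(GetBetween("ABC", "C", "C"), "");   // nothing after the start marker
    }
}
```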

Extracting first token from a delimited string

I have a string:
e.g. WORD1_WORD2_WORD3
How do I get just WORD1 from the string?
i.e. the text before the first underscore

It may be tempting to say Split, but that involves creating an array and lots of individual strings. IMO, the optimal way here is to find the first underscore and take a substring:
string b = s.Substring(0, s.IndexOf('_')); // assumes at least one _
(edit)
If you are doing this lots, you could add some extension methods:
public static string SubstringBefore(this string s, char value) {
if(string.IsNullOrEmpty(s)) return s;
int i = s.IndexOf(value);
return i >= 0 ? s.Substring(0,i) : s; // i == 0 yields "" when s starts with the delimiter
}
public static string SubstringAfter(this string s, char value) {
if (string.IsNullOrEmpty(s)) return s;
int i = s.IndexOf(value);
return i >= 0 ? s.Substring(i + 1) : s;
}
then:
string s = "a_b_c";
string b = s.SubstringBefore('_'), c = s.SubstringAfter('_');

YOUR_STRING.Split('_')[0]
In fact the Split method returns an array of strings resulting from splitting the original string at any occurrence of the specified character(s), not including the character at which the split was performed.
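If only the first token is needed, the Split overload that takes a maximum count avoids splitting the rest of the string. A small sketch (FirstToken and Get are names made up for this example):

```csharp
using System;

// Hypothetical helper that returns the text before the first delimiter.
public static class FirstToken
{
    public static string Get(string s, char delimiter)
    {
        // A count of 2 yields at most two pieces: the first token and the remainder.
        return s.Split(new[] { delimiter }, 2)[0];
    }

    public static void Main()
    {
        Console.WriteLine(Get("WORD1_WORD2_WORD3", '_')); // prints "WORD1"
    }
}
```

When the delimiter does not occur, the whole string comes back unchanged.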

if s is the string:
int idx = s.IndexOf('_');
if (idx >= 0)
firstPart = s.Substring(0,idx);

("WORD1_WORD2_WORD3").Split('_')[0]
should return "WORD1". If it doesn't work, try .Split() on a string variable with the content you specified:
string str="WORD1_WORD2_WORD3";
string result=str.Split('_')[0];
Split itself returns an array, and the [0] index picks its first element:
{"WORD1", "WORD2", "WORD3"}

There are several ways. You can use Split, Substring, etc. An example with Split:
String var = "WORD1_WORD2_WORD3";
String result = var.Split('_')[0];
