C# scandic letters to html format - c#

How to convert "ö" to "&ouml;" with C#?
I tried the WebUtility.HtmlEncode and
HttpUtility.HtmlEncode methods, but they return "&#246;".
Thanks!

Per this site (https://code.google.com/p/doctype-mirror/wiki/OumlCharacterEntity), the ö character maps to the Unicode code point U+00F6, which is decimal 246. So the &#246; that .NET emits and the &ouml; you are looking for represent the same character, and a browser renders both identically.
If you prefer &ouml; for semantic reasons, you would have to create an array of the replacements you want to make and then apply string.Replace to your HTML for each one. If memory or performance is an issue, you will probably need to look into using a StringBuilder. A LINQ version of the string.Replace approach looks something like:
var myHtml = "long string with ö";
var encodedString = HttpContext.Current.Server.HtmlEncode(myHtml);
// Map each numeric entity to the named entity you prefer.
var replaceValues = new[] { new KeyValuePair<string, string>("&#246;", "&ouml;") };
var result = replaceValues.Aggregate(encodedString, (current, value) =>
    current.Replace(value.Key, value.Value));
This is just pseudocode using LINQ and you may be able to optimize slightly but it gives you the basic idea. Best of luck!
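If the list of replacements grows, the StringBuilder approach mentioned above could look like the following sketch. The class name, the method name, and the lookup table are hypothetical; only the ö pair comes from the question, and the ä and å pairs are added for illustration. It assumes the input has already been HTML-encoded.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;

public static class NamedEntityEncoder
{
    // Hypothetical lookup table: numeric entity -> named entity.
    private static readonly Dictionary<string, string> NamedEntities =
        new Dictionary<string, string>
        {
            { "&#228;", "&auml;" },
            { "&#229;", "&aring;" },
            { "&#246;", "&ouml;" },
        };

    // Replaces every known numeric entity in already-encoded HTML.
    public static string ToNamedEntities(string encodedHtml)
    {
        var sb = new StringBuilder(encodedHtml);
        foreach (var pair in NamedEntities)
        {
            sb.Replace(pair.Key, pair.Value);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string encoded = WebUtility.HtmlEncode("long string with ö");
        Console.WriteLine(ToNamedEntities(encoded));
    }
}
```

A single StringBuilder avoids allocating a new intermediate string for every replacement, which is the main cost of chaining string.Replace calls.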

Related

Replace characters using Regex in c# [duplicate]

I'd like to know how to replace a regex match of an unknown number of equals signs (at least 2) with the same number of underscores.
So far I got this:
text = Regex.Replace(text, "(={2,})", "");
What should I use as the 3rd parameter ?
EDIT: Preferably a regex solution that's compatible in all languages
You can use Regex.Replace(String, MatchEvaluator) instead and analyze the match:
string result = new Regex("(={2,})")
.Replace(text, match => new string('_', match.ToString().Length));
A much less clear answer (in terms of code clarity):
text = Regex.Replace(text, "=(?==)|(?<==)=", "_");
If there are 2 or more = in a row, then every = in the run has a = immediately ahead of or behind it.
This only works if the language supports look-behind, which includes C#, Java, Python, PCRE... and excluded JavaScript until ES2018 added look-behind.
However, since you can pass a function to String.replace function in JavaScript, you can write code similar to Alexei Levenkov's answer. Actually, Alexei Levenkov's answer works in many languages (well, except Java).
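Both approaches from the answers above can be compared side by side. This is a minimal sketch; the class and method names are made up for illustration, and the sample input is an assumption.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EqualsToUnderscores
{
    // Callback approach: each run of 2+ '=' becomes a run of '_' of the same length.
    public static string WithEvaluator(string text)
    {
        return Regex.Replace(text, "={2,}", m => new string('_', m.Length));
    }

    // Lookaround approach: replace any '=' that has a '=' right after or right before it.
    public static string WithLookaround(string text)
    {
        return Regex.Replace(text, "=(?==)|(?<==)=", "_");
    }

    public static void Main()
    {
        string input = "a==b===c=d";
        Console.WriteLine(WithEvaluator(input));   // a__b___c=d
        Console.WriteLine(WithLookaround(input));  // a__b___c=d
    }
}
```

The lone = between c and d is left alone in both versions, because it has no neighbouring =.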

unicode to human readable string c# .net

This is probably a very basic question, but really appreciate if you could help me with this:
I want to convert a string that contains characters like \u000d\u000a\u000d\u000 to a human-readable string. However, I don't want to use the .Replace method, since there may be many more Unicode characters than the ones I tell the software to check for and replace.
string = "Test \u000d\u000a\u000d\u000aTesting with new line. \u000d\u000a\u000d\u000aone more new line"
I receive this string as a json Object from my server.
Do you even need that?
For example, the following code will print abc which is the actual decoded value:
var unicodeString = "\u0061\u0062\u0063";
Console.WriteLine(unicodeString);
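If the string really does contain literal backslash-u sequences at runtime (for example, raw JSON text that has not been passed through a JSON parser), one possible option is Regex.Unescape. This is only a sketch: the class and method names are hypothetical, and Regex.Unescape also interprets other regex-style backslash escapes, so it is not a general JSON decoder. A JSON parser would normally decode these sequences for you.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnicodeEscapeDemo
{
    // Converts literal \uXXXX sequences into the characters they stand for.
    public static string Decode(string raw)
    {
        return Regex.Unescape(raw);
    }

    public static void Main()
    {
        // Verbatim string: the backslashes are literal characters here,
        // which mimics text received over the wire before any decoding.
        string raw = @"Test \u000d\u000aTesting with new line.";
        Console.WriteLine(Decode(raw));
    }
}
```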

KeyNotFoundException with using HtmlEntity.DeEntitize() method

I am currently working on a scraper written in C# 4.0. I use a variety of tools, including the built-in WebClient and RegEx features of .NET. For part of my scraper I am parsing an HTML document using HtmlAgilityPack. I got everything to work as I desired and went through some cleanup of the code.
I am using the HtmlEntity.DeEntitize() method to clean up the HTML. I made a few tests and the method seemed to work great. But when I implemented the method in my code I kept getting KeyNotFoundException. There are no further details so I'm pretty lost. My code looks like this:
WebClient client = new WebClient();
string html = HtmlEntity.DeEntitize(client.DownloadString(path));
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
The HTML downloaded is UTF-8 encoded. How can I get around the KeyNotFound exception?
I understand that the problem is due to the occurrence of non-standard characters, for example Chinese or Japanese ones.
Once you find out which characters are causing the problem, you could search for a suitable patch to HtmlAgilityPack here
This may be of some help to you in case you want to modify the htmlagilitypack source yourself.
Four years later and I have the same problem with some encoded characters (version 1.4.9.5). In my case, there is a limited set of characters that might generate the problem, so I have just created a function to perform the replacements:
// to be called before HtmlEntity.DeEntitize
public static string ReplaceProblematicHtmlEntities(string str)
{
    var sb = new StringBuilder(str);
    // TODO: add other replacements, as needed
    return sb.Replace("&period;", ".")
        .Replace("&abreve;", "ă")
        .Replace("&acirc;", "â")
        .ToString();
}
In my case, the string contains both html-encoded characters and UTF-8 characters, but the problem is related to some encoded characters only.
This is not an elegant solution, but it is a quick fix for text with a limited (and known) set of problematic encoded characters.
My HTML had a block of text like so:
... found in sections: 233.9 & 517.3; ...
Despite the spacing and decimal point, it was interpreting & 517.3; as a unicode character.
Simply HTML Encoding the raw text fixed the problem for me.
string raw = "sections: 233.9 & 517.3;";
// turn '&' into '&amp;', etc., before DeEntitizing
string encoded = System.Web.HttpUtility.HtmlEncode(raw);
string deEntitized = HtmlEntity.DeEntitize(encoded);
In my case I have fixed this by updating HtmlAgilityPack to version 1.5.0

C# equivalent of OBJ-C's

I'm converting an OBJ-C application to C# and am having trouble with this one:
What is the C# way to do this:
NSArray *charts = [xmlString componentsSeparatedByString:@"</record>"];
string[] charts = xmlString.Split(new string[] { "</record>" }, StringSplitOptions.None);
I misread the original question (or rather, comment) but I would strongly recommend that if you have some XML, you don't just split it by tag names - you parse it as XML, and then work with the parsed document. That will be much more reliable than using plain string operations.
For example, if you want to get the text within each <record> element you might use:
XDocument doc = XDocument.Parse(text);
List<string> records = doc.Descendants("record")
.Select(x => x.Value)
.ToList();
Treating XML as a plain string is almost always a bad idea.
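To make the difference concrete, here is a small self-contained sketch that runs both approaches on the same input. The sample XML, the class name, and the method names are assumptions made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class RecordParser
{
    // Plain string split: the pieces still carry tag debris.
    public static string[] SplitRecords(string xml)
    {
        return xml.Split(new string[] { "</record>" }, StringSplitOptions.None);
    }

    // XML parsing: returns only the text inside each <record> element.
    public static List<string> ParseRecords(string xml)
    {
        return XDocument.Parse(xml)
            .Descendants("record")
            .Select(x => x.Value)
            .ToList();
    }

    public static void Main()
    {
        string xml = "<charts><record>A</record><record>B</record></charts>";

        foreach (string piece in SplitRecords(xml))
        {
            Console.WriteLine("split:  " + piece);
        }
        foreach (string record in ParseRecords(xml))
        {
            Console.WriteLine("parsed: " + record);
        }
    }
}
```

The split version yields three pieces, including the leftover closing root tag, while the parsed version yields just the two record values.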

how to get a String with String.Format to execute?

I have a little chunk of code (see below) that is returning the string:
string.Format("{0}----{1}",3,"test 2");
so how do I get this to actually "Execute"? To run and do the format/replacement of {0} and {1}?
My Code snippet:
StringBuilder sb = new StringBuilder();
sb.Append("{0}----{1}\",");
sb.AppendFormat(ReturnParamValue(siDTO, "siDTO.SuggestionItemID,siDTO.Title"));
string sbStr = "=string.Format(\""+sb.ToString()+");";
Yes, ReturnParamValue gives the actual value of the DTO.
Anyway, I've taken a look at the following (but it doesn't say how to execute it):
How to get String.Format not to parse {0}
Maybe, I just should put my code snippet in a method. But, what then?
Why are you including String.Format in the string itself?
If you're looking for a generic "let me evaluate this arbitrary expression I've built up in a string" then there isn't a simple answer.
If, instead, you're looking at how to provide the parameters to the string from a function call, then you've got yourself all twisted up and working too hard.
Try something like this, based on your original code:
string result
= string.Format(
"{0}----{1}",
ReturnParamValue(siDTO, "siDTO.SuggestionItemID,siDTO.Title"));
Though, this won't work as written: your original code seems to provide only a single value, while the format string has two placeholders. {0} gets the value from your function, but there is no argument for {1}, so string.Format throws a FormatException.
What output are you expecting?
Does your ReturnParamValue() function try to return both the label and the value in a single string? If it does, and if they're comma separated, then you could try this:
var value = ReturnParamValue(siDTO, "siDTO.SuggestionItemID,siDTO.Title");
var pieces = value.Split(',');
string result
    = string.Format("{0}----{1}", pieces[0], pieces[1]);
Though this is seriously working too hard if ReturnParamValue() is a method you control.
Update Fri 6 August
Check out the declaration for string.Format() as shown on MSDN:
public static string Format(
string format,
params Object[] args
)
Unlike the special casing you might have seen in C for printf(), there's nothing special or unusual about the way string.Format() handles multiple parameters. The key is the params keyword, which asks the compiler to provide a little "syntactic sugar" where it combines the parameters into an array for you.
Key here is that the wrapping doesn't happen if you're already passing a single object[] - so if you wanted to, you could do something like this:
object[] parameters
= ReturnParamValues(siDTO, "siDTO.SuggestionItemID,siDTO.Title");
string result
= string.Format("{0}----{1}----{2}", parameters);
Though, if I saw something like this in any codebase I maintained, I'd be treating it as a code-smell and looking for a better way to solve the problem.
Just because it's possible doesn't mean it's advisable. YMMV, of course.
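The params behaviour described above can be checked with a short sketch. GetValues is a hypothetical stand-in for ReturnParamValues, and the values 3 and "test 2" are taken from the question's expected output.

```csharp
using System;

public static class ParamsDemo
{
    // Hypothetical stand-in for ReturnParamValues: returns the arguments as an array.
    public static object[] GetValues()
    {
        return new object[] { 3, "test 2" };
    }

    // Arguments passed one by one.
    public static string FormatSeparate()
    {
        return string.Format("{0}----{1}", 3, "test 2");
    }

    // An existing object[] is passed through as the params array, not wrapped again.
    public static string FormatArray()
    {
        return string.Format("{0}----{1}", GetValues());
    }

    public static void Main()
    {
        Console.WriteLine(FormatSeparate()); // 3----test 2
        Console.WriteLine(FormatArray());    // 3----test 2
    }
}
```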
I don't think you can execute it. Java is not really an interpreted language.
You could make use of a scripting language such as Groovy for this purpose; as far as I know, these can even be embedded in a Java app, starting from JDK 6.
You could use RegEx to parse the three parameters out of the string, and then pass them to a real, actual string.Format method :-)
It looks like what you want is something like this:
string sbStr = string.Format("{0}----{1}", siDTO.SuggestionItemID, siDTO.Title);
Maybe I didn't understand your question completely, but it sounds like you need to format a format string. If that's true, you could try something like this:
int width = 5;
string format = String.Format("{{0,{0}}}----{{1,{0}}}", width);
string result = String.Format(format, "ab", "cd");
So the trick is simply to escape the { or } by using a double {{ or }}.
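A runnable version of the same idea, printing the intermediate format string so the effect of the doubled braces is visible. The class and method names are hypothetical, and the square brackets in the last line are only there to make the padding easy to see.

```csharp
using System;

public static class FormatTemplateDemo
{
    // Builds a format string whose column width is decided at runtime.
    // "{{" and "}}" produce literal braces; "{0}" is replaced by the width.
    public static string BuildFormat(int width)
    {
        return string.Format("{{0,{0}}}----{{1,{0}}}", width);
    }

    public static void Main()
    {
        string format = BuildFormat(5);
        Console.WriteLine(format); // {0,5}----{1,5}

        string result = string.Format(format, "ab", "cd");
        Console.WriteLine("[" + result + "]"); // [   ab----   cd]
    }
}
```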
