I want to replace ALL HTML special entities, such as &gt; and &lt;, with a custom string.
Let's say I have the following string:
string str = "<div>&gt;hello&lt;</div>";
and method:
Method(string str, string replaceStr)
After calling Method(str, ":)") result should be
<div>:)hello:)</div>
The problem is that there are too many special characters, and I'm wondering what the most efficient way to accomplish this is.
EDIT:
String.Replace will not do the job, and using Regex to parse HTML is not really a good approach.
Judging by the downvotes on this question, there probably isn't a clean solution, so I decided to go with the following algorithm:
create a txt file with the valid HTML special characters (like &para;)
parse the file into an array of strings
use HtmlAgilityPack to parse the HTML, get the raw text, and replace all entities.
I know that this is not really efficient for big HTML strings, but it should do the work for now.
You can try:
string str = "<div>&gt;hello&lt;</div>";
string output = Regex.Replace(str, "&gt;|&lt;", ":)");
You can also use HtmlDecode, which turns the entities back into the characters they stand for:
string str = "<div>&gt;hello&lt;</div>";
string output = WebUtility.HtmlDecode(str);
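To cover every entity without listing them one by one, a single pattern can match anything shaped like an entity reference. The following is only a sketch: the class and method names are hypothetical, it treats any `&name;`, `&#123;` or `&#x1F;` sequence as an entity without checking that the name is valid, and it does not distinguish text from attribute values or script content.

```csharp
using System.Text.RegularExpressions;

public static class EntityReplacer
{
    // Matches named (&gt;), decimal (&#62;) and hexadecimal (&#x3E;) entity references.
    private static readonly Regex EntityPattern =
        new Regex(@"&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);");

    // Hypothetical counterpart of Method(str, replaceStr) from the question.
    public static string ReplaceEntities(string html, string replacement)
    {
        // Using a MatchEvaluator inserts the replacement verbatim, so a '$'
        // in it is not interpreted as a substitution pattern.
        return EntityPattern.Replace(html, m => replacement);
    }
}
```

Calling EntityReplacer.ReplaceEntities("<div>&gt;hello&lt;</div>", ":)") leaves the tags alone and replaces only the two entities.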
Really???
I've searched through https://msdn.microsoft.com/en-us/library/system.string(v=vs.110).aspx and don't see any method that can directly push a character onto the end of a string. The best I can figure is
mystr.Insert(mystr.Length, newchar.ToString());
which seems inefficient because of the overhead involved in converting the character to a string and performing string concatenation. My particular use case looks like
while (eqtn[curidx] >= '0' && eqtn[curidx] <= '9') istr.Insert(istr.Length, eqtn[curidx++].ToString());
only because I can't think of a better way to do it. Is there a better way?
Strings in .NET are immutable, so your code doesn't do anything. Every method on a String creates a new instance, it doesn't modify the existing string.
The + operator on strings creates a new string with the character appended to the end:
istr = istr + eqtn[curidx++];
If you are doing a lot of such operations it will be more efficient to use a StringBuilder. It's basically a mutable String.
You can use the Append method to add a char to end. When you're ready, call ToString to get the constructed string.
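As a minimal sketch of the StringBuilder approach applied to the loop from the question: the variable names eqtn, curidx and istr come from the question, while the wrapper class, the method name and the bounds check are additions made here for illustration.

```csharp
using System.Text;

public static class DigitReader
{
    // Collects the run of ASCII digits that starts at startIndex.
    public static string ReadDigits(string eqtn, int startIndex)
    {
        var istr = new StringBuilder();
        int curidx = startIndex;

        // The length check stops the loop at the end of the input
        // instead of throwing IndexOutOfRangeException.
        while (curidx < eqtn.Length && eqtn[curidx] >= '0' && eqtn[curidx] <= '9')
        {
            istr.Append(eqtn[curidx++]); // Append(char) mutates the builder in place
        }

        return istr.ToString(); // produce the immutable string once, at the end
    }
}
```

For example, DigitReader.ReadDigits("123+45", 0) collects the leading digits and stops at the '+'.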
Yes, that is absolutely right: you cannot push a character onto the end of a string because C# strings are immutable. Once you have an object, you are stuck with its value until you create a new string object.
On the other hand, creating a new string with an extra character at the end is very simple: use + operator overload that performs concatenation:
string s = "abc";
s += '9'; // s becomes "abc9"
Note that this solution is not so good for use in a loop, because if your loop runs N times you create N throw-away objects in the process. A better solution is to use StringBuilder, which provides a mutable string in C#. StringBuilder class has a convenient Append method, which pushes characters to the end of the StringBuilder. Once you are done building the string, call ToString to harvest the result as an immutable string object.
I have a string which contains various tags such as <!--Include:filename-->. I need to replace the entire matched string with the contents of the indicated file, keeping in mind that the filename is not known in advance, so it cannot be searched for directly.
I get most of it, but what's hanging me up is how to use the subexpression outside a normal replace but instead as an argument to a method to return the passed files contents.
page= Regex.Replace(page,"<!--Include:(.*)-->",getFileContents($1));
The $1 is what's hanging me up because I cannot get the subexpression out like that.
Thanks in advance!
You can use the Regex.Replace overload that takes a MatchEvaluator:
page= Regex.Replace(page,"<!--Include:(.*)-->",
m => getFileContents(m.Groups[1].Value));
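A self-contained sketch of the same idea follows. The class name and the delegate parameter are hypothetical stand-ins so the example does not touch the file system, and the group is made lazy, `(.*?)`, which is a change from the pattern above: with the greedy `(.*)`, two include tags on the same line would be captured as a single match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IncludeExpander
{
    // getFileContents stands in for the question's method; it receives the
    // captured file name and returns the text to insert in place of the tag.
    public static string Expand(string page, Func<string, string> getFileContents)
    {
        return Regex.Replace(
            page,
            "<!--Include:(.*?)-->",
            m => getFileContents(m.Groups[1].Value)); // Groups[1] is the first capture
    }
}
```

In real use the delegate could be something like name => File.ReadAllText(name), assuming the captured value is a usable path.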
Let's say I have this string:
"param1,r:1234,p:myparameters=1,2,3"
...and I would like to split it into:
param1
r:1234
p:myparameters=1,2,3
I've used the split function and of course it splits it at every comma. Is there a way to do this using regex or will I have to write my own split function?
Personally, I would try something like this:
,(?=[^,]+:.*?)
Basically, use a positive look-ahead to find a comma that is followed by a "key-value" pair, defined as a key, a colon, and more information (the data, which may include other commas). This should disqualify the commas between the numbers, too.
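Here is a small sketch of that pattern used with Regex.Split. The wrapper class is hypothetical, and the approach assumes that a comma only starts a new item when the text up to the next comma contains a colon; data values that themselves contain a colon would be split incorrectly.

```csharp
using System.Text.RegularExpressions;

public static class KeyValueSplitter
{
    public static string[] Split(string input)
    {
        // The look-ahead is not consumed and contains no capturing group,
        // so only the matching commas are removed from the output.
        return Regex.Split(input, @",(?=[^,]+:.*?)");
    }
}
```

For the sample input, the commas before "r:" and "p:" qualify, while the commas inside "1,2,3" do not, because "2" and "3" are not followed by a colon before the next comma.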
You can use ; to separate the values, which makes them easy to work with.
Since you use , both as the separator and inside the values, it is difficult to split the string.
You have
string str = "param1,r:1234,p:myparameters=1,2,3";
I recommend using
string str = "param1;r:1234;p:myparameters=1,2,3";
which can be split as
var strArray = str.Split(';');
strArray[0]; // contains param1
strArray[1]; // r:1234
strArray[2]; // p:myparameters=1,2,3
I'm not sure how you would write a split that knew which commas to split on there, honestly.
Unless it's a fixed number each time, in which case just use the String.Split overload that takes an int specifying the maximum number of substrings to return.
If you're going to have comma-delimited data that's not always a fixed number of items and it could have literal commas in the data itself, they really should be quoted. If you can control the input in any way, you should encourage that, and use an actual CSV parser instead of String.Split.
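A short sketch of that overload, assuming the input always has exactly three items so that everything after the second comma belongs to the last one. The wrapper class and method name are hypothetical.

```csharp
public static class LimitedSplit
{
    public static string[] SplitFirst(string input, int maxParts)
    {
        // With a count argument, the final element receives the unsplit
        // remainder of the string, commas included.
        return input.Split(new[] { ',' }, maxParts);
    }
}
```

Called with the sample string and a count of 3, the third element keeps "p:myparameters=1,2,3" intact.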
That depends. You can't parse it with regex (or anything else) unless you can identify a consistent rule separating one group from another. Based on your sample, I can't clearly identify such a rule (though I have some guesses). How does the system know that p:myparameters=1,2,3 is a single item? For example, if there were another item after it, what would be the difference between that and the 1,2,3? Figure that out and you'll be pretty close to a solution.
If you're able to change the format of the input string, why not decide on a consistent delimiter between your groups? ; would be a good choice. Use an input like param1;r:1234;p:myparameters=1,2,3 and there will be no ambiguity where the groups are, plus you can just split on ; and you won't need regex.
The simplest approach would be changing your delimiter from "," to something like "|". Then you can split on "|" no problem. However if you can't change the delimiting character then maybe you could encode the sections in a fashion similar to CSV.
CSV files have the same issue... the standard there is to put double quotes "" around columns.
For example, your string would be "param1","r:1234","p:myparameters=1,2,3".
Then you could use Microsoft.VisualBasic.FileIO.TextFieldParser to split/parse it. You can use this from C# even though it's in the VisualBasic namespace.
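A minimal sketch of parsing one quoted line with TextFieldParser is shown here. The wrapper class is hypothetical, and it assumes the Microsoft.VisualBasic assembly is available to the project (on .NET Framework this requires adding a reference).

```csharp
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class QuotedCsv
{
    // Parses a single comma-delimited line whose fields may be wrapped in double quotes.
    public static string[] ParseLine(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true; // commas inside quotes stay in the field
            return parser.ReadFields();              // quotes are stripped from the result
        }
    }
}
```

Passing the quoted sample line yields three fields, with the commas in the last one preserved.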
Do you mean this: string[] str = System.Text.RegularExpressions.Regex.Split("param1,r:1234,p:myparameters=1,2,3", @"\,");
I get a string from another class that must be converted to a char. It usually contains only one char, and that's not a problem. But control chars arrive as '\\n' or '\\t'.
Are there standard methods to convert this to a newline or tab char, or do I need to parse it myself?
edit:
Sorry, the parser ate one slash. I receive '\\t'
I assume that you mean that the class that sends you the data is sending you a string like "\n". In that case you have to parse this yourself using:
Char.Parse(returnedChar)
Otherwise you can just cast it to a string like this
(string)returnedChar
New line:
string escapedNewline = @"\\n";
string cleanupNewLine = escapedNewline.Replace(@"\\n", Environment.NewLine);
OR
string cleanupNewLine = escapedNewline.Replace(@"\\n", "\n");
Tab:
string escapedTab = @"\\t";
string cleanupTab = escapedTab.Replace(@"\\t", "\t");
Note that the replacement value is not a verbatim string literal (i.e. I did not use @"\t", because that would not represent a tab).
Alternatively you could consider Regular Expressions if you need to replace a range of different string patterns.
You should probably write a utility function to encapsulate the common behaviour above for all the possible Escape Sequences
Then you'd write some Unit Tests to cover each of the cases you can think of.
As you encounter any bugs you add more unit tests to cover those cases.
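Such a utility function might look like the sketch below. The class and method names are hypothetical, and it assumes each escape arrives as a single backslash followed by a letter; only \n, \t, \r and \\ are handled, and anything else is passed through unchanged.

```csharp
using System.Text;

public static class EscapeDecoder
{
    // Converts textual escape sequences into the characters they name.
    public static string Unescape(string input)
    {
        var result = new StringBuilder(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (c == '\\' && i + 1 < input.Length)
            {
                switch (input[i + 1])
                {
                    // Each recognised case consumes the second character too.
                    case 'n': result.Append('\n'); i++; continue;
                    case 't': result.Append('\t'); i++; continue;
                    case 'r': result.Append('\r'); i++; continue;
                    case '\\': result.Append('\\'); i++; continue;
                }
            }
            result.Append(c); // ordinary character or unrecognised escape
        }
        return result.ToString();
    }
}
```

When the decoded string is known to hold exactly one character, it can then be read as a char with result[0]. Unit tests for this function would cover each supported sequence plus an unsupported one such as \q.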
UPDATE
You could represent a tab in the XML with a special character sequence:
see this article
This article applies to SQL Server but may well be relevant to C# also?
To be absolutely sure, you could try generating a string with a tab in it and putting it into some XML (programmatically) and using XmlSerializer to serialize that to a file to see what the output is, then you can be sure that this will faithfully 'round-trip' the string with the tab still in it.
How about using string.ToCharArray()?
You can then add the appropriate logic to process whatever was in the string.
Char.Parse(string) converts a single-character string to a char, and for the reverse you can use
Char.ToString();
100% solved