i am having a variable in c# holding some string like this
string myText="my text which contains <div>i am text inside div</div>";
now i want to replace all "\n" (new line character) with "<br>" for this variable's data except for text inside div.
How do i do this??
Others have suggested using libraries such as HTMLAgilityPack. The former is indeed a nice tool, but if you don't need HTML parsing functionality beyond what you have requested, a simple parser should suffice:
string ReplaceNewLinesWithBrIfNotInsideDiv(string input) {
int divNestingLevel = 0;
StringBuilder output = new StringBuilder();
StringComparison comp = StringComparison.InvariantCultureIgnoreCase;
for (int i = 0; i < input.Length; i++) {
if (input[i] == '<') {
if (i < (input.Length - 3) && input.Substring(i, 4).Equals("<div", comp)){
divNestingLevel++;
} else if (divNestingLevel != 0 && i < (input.Length - 5) && input.Substring(i, 6).Equals("</div>", comp)) {
divNestingLevel--;
}
}
if (input[i] == '\n' && divNestingLevel == 0) {
output.Append("<br/>");
} else {
output.Append(input[i]);
}
}
return output.ToString();
}
This should handle nested divs as well.
For something like this you will need to parse the HTML in order to distinguish the parts that you do want to make the replacement in from the ones you don't.
I suggest looking at the HTML agility pack - it can parse HTML fragments as well as malformed HTML. You can then query the resulting parse tree using XPath notation and do your replacement on the selected nodes.
That would require some fairly complicated RegEx, out of my league.
But you could try splitting the string:
string[] parts = myText.Split("<div>", "</div>");
for (int i = 0; i < parts.Length; i += 2) // only the even parts
parts[i] = string.Replace(...);
And then use a StringBuilder to re-assemble the parts.
I would split the string on div then look at the tokens if it starts with "div" then don't replace \n with BR if it does start with div then you need to find the closing div and split on that.. then take the 2nd token and do what you just did... of course as you are going to have to keep appending the tokens to a master string... I'll code up a sample here in a few minutes...
Use the string.Replace() method like this:
myText = myText.Replace("\n", "<br>")
You could consider using the Environment.NewLine property to find the newline chars. Are you sure they are not \n\r or \r\n etc...
You may have to pull the text inside the div out first if you dont want to parse that. Use a regex to find it and remove it then do the Replace() as above, then put the strings backtogether.
Related
hi i was trying to make a program that modified a word in a string to a uppercase word.
the uppercase word is in a tag like this :
the <upcase>weather</upcase> is very <upcase>hot</upcase>
the result :
the WEATHER is very HOT
my code is like this :
string upKey = "<upcase>";
string lowKey = "</upcase>";
string quote = "the lazy <upcase>fox jump over</upcase> the dog <upcase> something here </upcase>";
int index = quote.IndexOf(upKey);
int indexEnd = quote.IndexOf(lowKey);
while(index!=-1)
{
for (int a = 0; a < index; a++)
{
Console.Write(quote[a]);
}
string upperQuote = "";
for (int b = index + 8; b < indexEnd; b++)
{
upperQuote += quote[b];
}
upperQuote = upperQuote.ToUpper().ToString();
Console.Write(upperQuote);
for (int c = indexEnd+9;c<quote.Length;c++)
{
if (quote[c]=='<')
{
break;
}
Console.Write(quote[c]);
}
index = quote.IndexOf(upKey, index + 1);
indexEnd = quote.IndexOf(lowKey, index + 1);
}
Console.WriteLine();
}
i have been trying using this code,and a while(while (indexEnd != -1)) :
index = quote.IndexOf(upKey, index + 1);
indexEnd = quote.IndexOf(lowKey, index + 1);
but that not work, the program run into unlimited loop, btw i'm a noob so please give a answer that i can understand :)
You can use a regular expression for this:
string input = "the <upcase>weather</upcase> is very <upcase>hot</upcase>";
var regex = new Regex("<upcase>(?<theMatch>.*?)</upcase>");
var result = regex.Replace(input, match => match.Groups["theMatch"].Value.ToUpper());
// result will be: "the WEATHER is very HOT"
Here's an explanation taken from here for the regular expression used above:
<upcase> matches the characters <upcase> literally (case sensitive)
(?<theMatch>.\*?) Named capturing group theMatch
.*? matches any character (except newline)
Quantifier: *? Between zero and unlimited times, as few times as possible, expanding as needed [lazy]
< matches the characters < literally
/ matches the character / literally
upcase> matches the characters upcase> literally (case sensitive)
The following will work as long as there are only matching tags and none of them are nested.
public static string Upper(string str)
{
const string start = "<upcase>";
const string end = "</upcase>";
var builder = new StringBuilder();
// Find the first start tag
int startIndex = str.IndexOf(start);
// If no start tag found then return the original
if (startIndex == -1)
return str;
// Append the part before the first tag as is
builder.Append(str.Substring(0, startIndex));
// Continue as long as we find another start tag.
while (startIndex != -1)
{
// Find the end tag for the current start tag
var endIndex = str.IndexOf(end, startIndex);
// Append the text between the start and end as upper case.
builder.Append(
str.Substring(
startIndex + start.Length,
endIndex - startIndex - start.Length).ToUpper());
// Find the next start tag.
startIndex = str.IndexOf(start, endIndex);
// Append the part after the end tag, but before the next start as is
builder.Append(
str.Substring(
endIndex + end.Length,
(startIndex == -1 ? str.Length : startIndex) - endIndex - end.Length));
}
return builder.ToString();
}
I'm not rewriting your code. Just answering your (main) question:
You need to keep a variable of the index you're at, and check for IndexOf from there only (See MSDN). Something like this:
int index = 0;
while (quote.IndexOf(upKey, index) != -1)
{
//Your code, including updating the value of index.
}
(I didn't check this on Visual Studio. This is just to point you in the direction that I think you're looking for.)
The reason for the infinite loop is that you're always testing IndexOf of the same index. Perhaps you mean to have quote.IndexOf(upKey, index += 1); which would change the value of index?
The way to go here is to probably use Regex but these easy parsing excercises are always fun to do manually. This can be easily solved using a very simple state machine.
What states can we have when dealing with strings of this nature? I can think of 4:
We are either parsing normal text
Or we are parsing an opening format tag '<...>'
Or we are parsing a closing format tag '</...>'
Or we are parsing text to be formatted between tags
I can't think of any other states. Now we need to think about the normal flow / transition between states. What should happen when we a parse string with the correct format?
Parser starts up expecting normal text. That is easy to understand.
If expecting normal text we encounter a '<' then the parser should switch to parsing opening format tag state. There is no other valid state transition.
If in parsing opening format tag state we encounter a '>' then the parser should switch to parsing text to be formatted. There is no other valid state transition.
If in parsing text to be formatted we encounter a '<' then the parser should switch to parsing closing tag. Again, there is no other valid state transition.
If in parsing closing tag we encounter a '>' then the parser should switch to normal text. Once more, there is no other valid transition. Note that we are disallowing nested tags.
Ok, so that seems pretty easy to understand. What do we need to implement this?
First we'll need something to represent the parsing states. A good old enum will do:
private enum ParsingState
{
UnformattedText,
OpenTag,
CloseTag,
FormattedText,
}
Now we need some string buffers to keep track of the final formatted string, the current format tag we are parsing and finally the substring we need to format. We will use several StringBuilder's for these as we don't know how long these buffers are and how many concatenations will be performed:
var formattedStringBuffer = new StringBuilder();
var formatBuffer = new StringBuilder();
var tagBuffer = new StringBuilder();
We will also need to keep track of the parser's state and the current active tag if any (so we can make sure that the parsed closing tag matches the current active tag):
var state = ParsingState.UnformattedText;
var activeFormatTag = string.Empty;
And now we are good to go, but before we do, can we generalize this so it works with any format tag?
Yes we can, we just need to tell the parser what to do for each supported tag. We can do this easily just passing a along a Dictionary that ties each tag with the action it should perform. We do this the following way:
var formatter = new Dictionary<string, Func<string, string>>();
formatter.Add("upcase", s => s.ToUpperInvariant());
formatter.Add("lcase", s => s.ToLowerInvariant());
Great! Now our implementation could be the following:
public static string Parse(this string str, Dictionary<string, Func<string,string>> formatter)
{
var formattedStringBuffer = new StringBuilder();
var formatBuffer = new StringBuilder();
var tagBuffer = new StringBuilder();
var state = ParsingState.UnformattedText;
var activeFormatTag = string.Empty;
foreach (var c in str)
{
switch (state)
{
case ParsingState.UnformattedText:
{
if (c != '<')
{
formattedStringBuffer.Append(c);
}
else
{
state = ParsingState.OpenTag;
}
break;
}
case ParsingState.OpenTag:
{
if (c != '>')
{
tagBuffer.Append(c);
}
else
{
state = ParsingState.FormattedText;
activeFormatTag = tagBuffer.ToString();
tagBuffer.Clear();
}
break;
}
case ParsingState.FormattedText:
{
if (c != '<')
{
formatBuffer.Append(c);
}
else
{
state = ParsingState.CloseTag;
}
break;
}
case ParsingState.CloseTag:
{
if (c!='>')
{
tagBuffer.Append(c);
}
else
{
var expectedTag = $"/{activeFormatTag}";
var tag = tagBuffer.ToString();
if (tag != expectedTag)
throw new FormatException($"Expected closing tag not found: <{expectedTag}>.");
if (formatter.ContainsKey(activeFormatTag))
{
var formatted = formatter[activeFormatTag](formatBuffer.ToString());
formattedStringBuffer.Append(formatted);
tagBuffer.Clear();
formatBuffer.Clear();
state = ParsingState.UnformattedText;
}
else
throw new FormatException($"Format tag <{activeFormatTag}> not recognized.");
}
break;
}
}
}
if (state != ParsingState.UnformattedText)
throw new FormatException($"Bad format in specified string '{str}'");
return formattedStringBuffer.ToString();
}
Is it the most elegant solution? No, Regex will do a much better job, but being a beginner I would not recommend you start solving these kind of problems that way, you'll learn a whole lot more solving them manualy. You'll have plenty of time to learn Regex later on.
I'm trying to figure out a way to count the number of characters in a string, truncate the string, then returns it. However, I need this function to NOT count HTML tags. The problem is that if it counts HTML tags, then if the truncate point is in the middle of a tag, then the page will appear broken.
This is what I have so far...
public string Truncate(string input, int characterLimit, string currID) {
string output = input;
// Check if the string is longer than the allowed amount
// otherwise do nothing
if (output.Length > characterLimit && characterLimit > 0) {
// cut the string down to the maximum number of characters
output = output.Substring(0, characterLimit);
// Check if the character right after the truncate point was a space
// if not, we are in the middle of a word and need to remove the rest of it
if (input.Substring(output.Length, 1) != " ") {
int LastSpace = output.LastIndexOf(" ");
// if we found a space then, cut back to that space
if (LastSpace != -1)
{
output = output.Substring(0, LastSpace);
}
}
// end any anchors
if (output.Contains("<a href")) {
output += "</a>";
}
// Finally, add the "..." and end the paragraph
output += "<br /><br />...<a href='Announcements.aspx?ID=" + currID + "'>see more</a></p>";
}
return output;
}
But I'm not happy with this. Is there a better way to do this? If you could provide a new solution to this, or perhaps suggestions on what to add to what I have so far, that would be great.
Disclaimer: I've never worked with C#, so I'm not familiar with the concepts related to the language... I'm doing this because I have to, not by choice.
Thanks,
Hristo
Use the right tool for the problem.
HTML is not a simple format to parse. I would advise that you use a proven, existing parser rather than rolling your own. If you know that you will only ever parse XHTML - then you could use an XML parser instead.
These are the only reliable ways to perform operations on HTML that will preserve the semantic representation.
Don't try to use regular expressions. HTML is not a regular language and you can only cause yourself grief and misery going in that direction.
Suppose I have this CSV file :
NAME,ADDRESS,DATE
"Eko S. Wibowo", "Tamanan, Banguntapan, Bantul, DIY", "6/27/1979"
I would like like to store each token that enclosed using a double quotes to be in an array, is there a safe to do this instead of using the String split() function? Currently I load up the file in a RichTextBox, and then using its Lines[] property, I do a loop for each Lines[] element and doing this :
string[] line = s.Split(',');
s is a reference to RichTextBox.Lines[].
And as you can clearly see, the comma inside a token can easily messed up split() function. So, instead of ended with three token as I want it, I ended with 6 tokens
Any help will be appreciated!
You could use regex too:
string input = "\"Eko S. Wibowo\", \"Tamanan, Banguntapan, Bantul, DIY\", \"6/27/1979\"";
string pattern = #"""\s*,\s*""";
// input.Substring(1, input.Length - 2) removes the first and last " from the string
string[] tokens = System.Text.RegularExpressions.Regex.Split(
input.Substring(1, input.Length - 2), pattern);
This will give you:
Eko S. Wibowo
Tamanan, Banguntapan, Bantul, DIY
6/27/1979
I've done this with my own method. It simply counts the amout of " and ' characters.
Improve this to your needs.
public List<string> SplitCsvLine(string s) {
int i;
int a = 0;
int count = 0;
List<string> str = new List<string>();
for (i = 0; i < s.Length; i++) {
switch (s[i]) {
case ',':
if ((count & 1) == 0) {
str.Add(s.Substring(a, i - a));
a = i + 1;
}
break;
case '"':
case '\'': count++; break;
}
}
str.Add(s.Substring(a));
return str;
}
It's not an exact answer to your question, but why don't you use already written library to manipulate CSV file, good example would be LinqToCsv. CSV could be delimited with various punctuation signs. Moreover, there are gotchas, which are already addressed by library creators. Such as dealing with name row, dealing with different date formats and mapping rows to C# objects.
You can replace "," with ; then split by ;
var values= s.Replace("\",\"",";").Split(';');
If your CSV line is tightly packed it's easiest to use the end and tail removal mentioned earlier and then a simple split on a joining string
string[] tokens = input.Substring(1, input.Length - 2).Split("\",\"");
This will only work if ALL fields are double-quoted even if they don't (officially) need to be. It will be faster than RegEx but with given conditions as to its use.
Really useful if your data looks like
"Name","1","12/03/2018","Add1,Add2,Add3","other stuff"
Five years old but there is always somebody new who wants to split a CSV.
If your data is simple and predictable (i.e. never has any special characters like commas, quotes and newlines) then you can do it with split() or regex.
But to support all the nuances of the CSV format properly without code soup you should really use a library where all the magic has already been figured out. Don't re-invent the wheel (unless you are doing it for fun of course).
CsvHelper is simple enough to use:
https://joshclose.github.io/CsvHelper/2.x/
using (var parser = new CsvParser(textReader)
{
while(true)
{
string[] line = parser.Read();
if (line != null)
{
// do something
}
else
{
break;
}
}
}
More discussion / same question:
Dealing with commas in a CSV file
I have a very simple asp:textbox with the multiline attribute enabled. I then accept just text, with no markup, from the textbox. Is there a common method by which line breaks and returns can be converted to <p> and <br/> tags?
I'm not looking for anything earth shattering, but at the same time I don't just want to do something like:
html.Insert(0, "<p>");
html.Replace(Enviroment.NewLine + Enviroment.NewLine, "</p><p>");
html.Replace(Enviroment.NewLine, "<br/>");
html.Append("</p>");
The above code doesn't work right, as in generating correct html, if there are more than 2 line breaks in a row. Having html like <br/></p><p> is not good; the <br/> can be removed.
I know this is old, but I couldn't find anything better after some searching, so here is what I'm using:
public static string TextToHtml(string text)
{
text = HttpUtility.HtmlEncode(text);
text = text.Replace("\r\n", "\r");
text = text.Replace("\n", "\r");
text = text.Replace("\r", "<br>\r\n");
text = text.Replace(" ", " ");
return text;
}
If you can't use HttpUtility for some reason, then you'll have to do the HTML encoding some other way, and there are lots of minor details to worry about (not just <>&).
HtmlEncode only handles the special characters for you, so after that I convert any combo of carriage-return and/or line-feed to a BR tag, and any double-spaces to a single-space plus a NBSP.
Optionally you could use a PRE tag for the last part, like so:
public static string TextToHtml(string text)
{
text = "<pre>" + HttpUtility.HtmlEncode(text) + "</pre>";
return text;
}
Your other option is to take the text box contents and instead of trying for line a paragraph breaks just put the text between PRE tags. Like this:
<PRE>
Your text from the text box...
and a line after a break...
</PRE>
Depending on exactly what you are doing with the content, my typical recommendation is to ONLY use the <br /> syntax, and not to try and handle paragraphs.
How about throwing it in a <pre> tag. Isn't that what it's there for anyway?
I know this is an old post, but I've recently been in a similar problem using C# with MVC4, so thought I'd share my solution.
We had a description saved in a database. The text was a direct copy/paste from a website, and we wanted to convert it into semantic HTML, using <p> tags. Here is a simplified version of our solution:
string description = getSomeTextFromDatabase();
foreach(var line in description.Split('\n')
{
Console.Write("<p>" + line + "</p>");
}
In our case, to write out a variable, we needed to prefix # before any variable or identifiers, because of the Razor syntax in the ASP.NET MVC framework. However, I've shown this with a Console.Write, but you should be able to figure out how to implement this in your specific project based on this :)
Combining all previous plus considering titles and subtitles within the text comes up with this:
public static string ToHtml(this string text)
{
var sb = new StringBuilder();
var sr = new StringReader(text);
var str = sr.ReadLine();
while (str != null)
{
str = str.TrimEnd();
str.Replace(" ", " ");
if (str.Length > 80)
{
sb.AppendLine($"<p>{str}</p>");
}
else if (str.Length > 0)
{
sb.AppendLine($"{str}</br>");
}
str = sr.ReadLine();
}
return sb.ToString();
}
the snippet could be enhanced by defining rules for short strings
I understand that I was late with the answer for 13 years)
but maybe someone else needs it
sample line 1 \r\n
sample line 2 (last at paragraph) \r\n\r\n [\r\n]+
sample line 3 \r\n
Example code
private static Regex _breakRegex = new("(\r?\n)+");
private static Regex _paragrahBreakRegex = new("(?:\r?\n){2,}");
public static string ConvertTextToHtml(string description) {
string[] descrptionParagraphs = _paragrahBreakRegex.Split(description.Trim());
if (descrptionParagraphs.Length > 0)
{
description = string.Empty;
foreach (string line in descrptionParagraphs)
{
description += $"<p>{line}</p>";
}
}
return _breakRegex.Replace(description, "<br/>");
}
Ok I feel really stupid asking this. I see plenty of other questions that resemble my question, but none seem to be able to answer it.
I am creating an xml file for a program that is very picky about syntax. Sadly I am making the XML file from scratch. Meaning, I am placing each line in individually (lots of file.WriteLine(String)).
I know this is ugly, but its the only way I can get the logic to work out.
ANYWAY. I have a few strings that are coming through with '&' in them.
if (value.Contains("&"))
{
value.Replace("&", "&");
}
Does not seem to work. The value.Contains() seems to see it, but the replace does not work. I am using C# .Net 2.0 sp2. VS 2005.
Please help me out here.. Its been a long week..
If you really want to go that route, you have to assign the result of Replace (the method returns a new string because strings are immutable) back to the variable:
value = value.Replace("&", "&");
I would suggest rethinking the way you're writing your XML though. If you switch to using the XmlTextWriter, it will handle all of the encoding for you (not only the ampersand, but all of the other characters that need encoded as well):
using(var writer = new XmlTextWriter(#"C:\MyXmlFile.xml", null))
{
writer.WriteStartElement("someString");
writer.WriteText("This is < a > string & everything will get encoded");
writer.WriteEndElement();
}
Should produce:
<someString>This is < a > string &
everything will get encoded</someString>
You should really use something like Linq to XML (XDocument etc.) to solve it. I'm 100% sure you can do it without all your WriteLine´s ;) Show us your logic?
Otherwise you could use this which will be bullet proof (as opposed to .Replace("&")):
var value = "hej&hej<some>";
value = new System.Xml.Linq.XText(value).ToString(); //hej&hej<some>
This will also take care of < which you also HAVE TO escape :)
Update: I have looked at the code for XText.ToString() and internally it creates a XmlWriter + StringWriter and uses XNode.WriteTo. This may be overkill for a given application so if many strings should be converted, XText.WriteTo would be better. An alternative which should be fast and reliant is System.Web.HttpUtility.HtmlEncode.
Update 2: I found this System.Security.SecurityElement.Escape(xml) which may be the fastest and ensures max compatibility (supported since .Net 1.0 and does not require the System.Web reference).
you can also use HttpUtility.HtmlEncode class under System.Web namespace instead of doing the replacement yourself.
here you go: http://msdn.microsoft.com/en-us/library/73z22y6h.aspx
You can use Regex for replace char "&" only in node values:
input data example (string)
<select>
<option id="11">Gigamaster&Minimaster</option>
<option id="12">Black & White</option>
<option id="13">Other</option>
</select>
Replace with Regex
Regex rgx = new Regex(">(?<prefix>.*)&(?<sufix>.*)<");
data = rgx.Replace(data, ">${prefix}&${sufix}<");
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(data);
result data
<select>
<option id="11">Gigamaster&MiniMaster</option>
<option id="12">Black & White</option>
<option id="13">Other</option>
</select>
I'm Obviously very late to this, but the right answer is:
System.Text.RegularExpressions.Regex.Replace(input, "&(?!amp;)", "&");
Hope this helps somebody!
You can try:
value = value.Replace("&", "&");
Strings are immutable. You need to write:
value = value.Replace("&", "&");
Note that if you do this and your string contains "&", it's going to get changed to "&".
I've created the following function to encode & and ' without messing up with already encoded & or ' or "
public static string encodeSelectXMLCharacters(string xmlString)
{
string returnValue = Regex.Replace(xmlString, "&(?!quot;|apos;|amp;|lt;|gt;#x?.*?;)|'",
delegate(Match m)
{
string encodedValue;
switch (m.Value)
{
case "&":
encodedValue = "&";
break;
case "'":
encodedValue = "'";
break;
default:
encodedValue = m.Value;
break;
}
return encodedValue;
});
return returnValue;
}
not sure if this is useful to anyone... I was fighting this for a while... here is a glorious regex you can use to fix all your links, javascript, content. I had to deal with a ton of legacy content that nobody wanted to correct.
Add this to your Render override in your master page, control or recode to run a string through it. Please don't flame me for putting this in the wrong place:
// remove the & from href="blaw?a=b&b=c" and replace with &
//in urls - this corrects any unencoded & not just those in URL's
// this match will also ignore any matches it finds within <script> blocks AND
// it will also ignore the matches where the link includes a javascript command like
// blaw
html = Regex.Replace(html, "&(?!(?<=(?<outerquote>[\"'])javascript:(?>(?!\\k<outerquote>|[>]).)*)\\k<outerquote>?)(?!(?:[a-zA-Z][a-zA-Z0-9]*|#\\d+);)(?!(?>(?:(?!<script|\\/script>).)*)\\/script>)", "&", RegexOptions.Singleline | RegexOptions.IgnoreCase);
Its a broad stroke for a rendered page but this can be adapted to many uses without blowing up your page.
What about
Value = Server.HtmlEncode(Value);
I am quite sure it will work if you "embrace" your value with CDATA, so the result is something like
<ampersandData><![CDATA[value with ampersands like …]]></ampersandData>
Hope it helps.
Michael
Very late here, but I want to share my solution which handles the cases where you have both & (incorrect xml) and & (valid xml) in the document in addition to other xml character entities.
This solution is only meant for cases where you cannot control generation of the xml, usually because it comes from some external source. If you control the xml generation please use XmlTextWriter as suggested by #Justin Niessner
It is also quite fast and handles all the different xml character entities/references
Predefined character entities:
& quot;
& amp;
& apos;
& lt;
& gt;
Numeric character entities/references:
& #nnnn;
& #xhhhh;
PS! The space after & should not be included in the entities/references, I just added it here to avoid it being encoded in the page rendering
Code
public static string CleanXml(string text)
{
int length = text.Length;
StringBuilder stringBuilder = new StringBuilder(length);
for (int i = 0; i < length; ++i)
{
if (text[i] == '&')
{
var remaining = Math.Abs(length - i + 1);
var subStrLength = Math.Min(remaining, 12);
var subStr = text.Substring(i, subStrLength);
var firstIndexOfSemiColon = subStr.IndexOf(';');
if (firstIndexOfSemiColon > -1)
subStr = subStr.Substring(0, firstIndexOfSemiColon + 1);
var matches = Regex.Matches(subStr, "&(?!quot;|apos;|amp;|lt;|gt;|#x?.*?;)|'");
if (matches.Count > 0)
stringBuilder.Append("&");
else
stringBuilder.Append("&");
}
else if (XmlConvert.IsXmlChar(text[i]))
{
stringBuilder.Append(text[i]);
}
else if (i + 1 < length && XmlConvert.IsXmlSurrogatePair(text[i + 1], text[i]))
{
stringBuilder.Append(text[i]);
stringBuilder.Append(text[i + 1]);
++i;
}
}
return stringBuilder.ToString();
}