I want to decode the value after &amp;amp. The &amp; is converted to &, but the remaining amp is not converted.
string MyString = "some text &amp;amp";
Console.WriteLine(System.Net.WebUtility.HtmlDecode(MyString));
So my result is "some text &amp".
But I want to remove the other amp as well. HtmlDecode does not do that. Please suggest a better solution.
Your example text cannot be decoded twice. A correctly double-encoded text would look like
some text &amp;amp;
With this text you can simply decode in two rounds:
var someText = "some text &amp;amp;";
var firstRound = System.Net.WebUtility.HtmlDecode(someText);
Console.WriteLine(firstRound);
var secondRound = System.Net.WebUtility.HtmlDecode(firstRound);
Console.WriteLine(secondRound);
If you don't know how many rounds to take, you can keep decoding until the text no longer changes:
var someText = "some text &amp;amp;";
string currentResult;
string lastResult = someText;
do
{
    currentResult = System.Net.WebUtility.HtmlDecode(lastResult);
    if (currentResult != lastResult)
    {
        lastResult = currentResult;
    }
    else
    {
        break;
    }
} while (true);
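The same idea can be wrapped in a small helper. This is only a sketch: the name DecodeFully and the maxRounds cap are assumptions added for illustration, not part of the answer above. Keep in mind that decoding until the text is stable will also decode entities that were meant to stay literal, so it is only appropriate when you know the input was encoded more than once.

```csharp
using System;
using System.Net;

// Hypothetical helper: decode repeatedly until the text stops changing,
// giving up after maxRounds passes (the cap is an assumption, not a requirement).
static string DecodeFully(string text, int maxRounds = 10)
{
    for (int round = 0; round < maxRounds; round++)
    {
        string decoded = WebUtility.HtmlDecode(text);
        if (decoded == text)
        {
            return decoded; // nothing left to decode
        }
        text = decoded;
    }
    return text;
}

Console.WriteLine(DecodeFully("some text &amp;amp;")); // some text &
```

With the question's original input, which lacks the final semicolon, the helper stops at "some text &amp" because the remaining "&amp" is not a complete entity.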
Related
Can anyone tell me why this is not working:
string txt = "+0°1,0'";
string degree = txt.TrimEnd('°');
I am trying to separate the degrees on this string, but after this, what remains on degree is the same content of txt.
I am using C# in Visual Studio.
string.TrimEnd only removes the given characters from the end of the string. In your example, '°' isn't at the end.
For example:
string txt = "+0°°°°";
string degree = txt.TrimEnd('°');
// degree => "+0"
If you want to remove '°' and everything after it, you can use Remove:
string txt = "+0°1,0'";
string degree = txt.Remove(txt.IndexOf('°'));
// degree => "+0"
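To remove only the '°' character itself, check that it exists first:
string txt = "+0°1,0'";
int index = txt.IndexOf('°');
if (index >= 0) // IndexOf returns -1 when '°' is missing; 0 is a valid position
{
    string withoutdegree = txt.Remove(index, 1); // "+01,0'"
}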
Another safe way of handling the same is using the String.Split method. You will not have to bother to verify the presence of the character in this case.
string txt = "+0°1,0'";
var str = txt.Split('°')[0]; // "+0"
string txt = "+01,0'";
var str = txt.Split('°')[0]; // "+01,0'"
You can remove all the '°' symbols present in your string using String.Replace:
string txt = "+0°1,0'°°";
var text = txt.Replace("°", ""); // +01,0'
Edit: Added a safe way to handle the OP's exact query.
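The IndexOf-based answers above can be combined into one helper that also covers the case where the character is missing. This is a minimal sketch; TextBefore is a hypothetical name. It matters because txt.Remove(txt.IndexOf('°')) throws an ArgumentOutOfRangeException when IndexOf returns -1.

```csharp
using System;

// Hypothetical helper: return everything before the first occurrence of marker,
// or the whole string when marker does not occur.
static string TextBefore(string text, char marker)
{
    int index = text.IndexOf(marker);
    return index >= 0 ? text.Substring(0, index) : text;
}

Console.WriteLine(TextBefore("+0°1,0'", '°')); // +0
Console.WriteLine(TextBefore("+01,0'", '°'));  // +01,0'
```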
I have default JSON file with some text.
Here is one line of the json file
"description": "this will add X dollars to your account"
After I read the whole content of the JSON file and convert it to an object (I know how to do this), I want to replace X with some int value in my game, so that my new string is
int doll = 50;
string desc = "this will add 50 dollars to your account";
It should be very simple, but I am new to C#.
Thanks
You can use the Replace method if you want to replace all instances of one string with another:
string desc = description.Replace("X", doll.ToString());
Easiest way is to use numbered placeholders that string.Format() uses instead of X.
var myString = "this will add {0} dollars to your account";
myString = string.Format(myString, 50);
Or multiple:
var myString = "This will add {0} dollars to your account number {1}.";
myString = string.Format(myString, 50, "N012345");
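To see why the placeholder approach is usually the safer of the two, compare them on a text that contains another capital X. This is just an illustrative sketch; the "Xbox" sentence is made up for the example. Note also that string.Format throws a FormatException if the text contains other unescaped braces.

```csharp
using System;

int doll = 50;

// Replace touches every "X", including the one inside "Xbox".
string withX = "this will add X dollars to your Xbox account";
string replaced = withX.Replace("X", doll.ToString());
Console.WriteLine(replaced); // this will add 50 dollars to your 50box account

// A numbered placeholder only substitutes where {0} appears.
string withPlaceholder = "this will add {0} dollars to your Xbox account";
string formatted = string.Format(withPlaceholder, doll);
Console.WriteLine(formatted); // this will add 50 dollars to your Xbox account
```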
Is there a smart and neat way to escape the characters in a string so that it is compatible with the specific format SendKeys uses?
At first I thought this would work:
line = Regex.Replace(line, @"\{{0}", "{{}");
line = Regex.Replace(line, @"\}{0}", "{}}");
But this won't work, because it does two passes and the second pass messes up the braces added by the first.
How can I handle this?
You can wrap each special character in placeholders instead of { and }, and only at the end replace those placeholders with { and }. For example:
string PrepareForSendKeys(string input)
{
    var specialChars = "+^%~(){}";
    var c1 = "[BRACEOPEN]";
    var c2 = "[BRACECLOSE]";
    specialChars.ToList().ForEach(x =>
    {
        input = input.Replace(x.ToString(),
            string.Format("{0}{1}{2}", c1, x.ToString(), c2));
    });
    input = input.Replace(c1, "{");
    input = input.Replace(c2, "}");
    return input;
}
And you can use it this way:
var input = "some string containing + ^ % ~ ( ) { }";
MessageBox.Show(PrepareForSendKeys(input));
And the result would be:
some string containing {+} {^} {%} {~} {(} {)} {{} {}}
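An alternative that avoids placeholders is to do everything in a single regex pass, so braces added for one character are never examined again. The following is a sketch, not the answer's method; EscapeForSendKeys is a hypothetical name. It also wraps [ and ], which the SendKeys documentation says must be enclosed in braces even though they have no special meaning.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: wrap every SendKeys special character in braces in one pass.
static string EscapeForSendKeys(string input)
{
    // The character class lists + ^ % ~ ( ) { } [ ] as literal characters.
    return Regex.Replace(input, @"[+^%~(){}\[\]]", match => "{" + match.Value + "}");
}

Console.WriteLine(EscapeForSendKeys("some string containing + ^ % ~ ( ) { }"));
// some string containing {+} {^} {%} {~} {(} {)} {{} {}}
```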
I am making a small web analysis tool and need to somehow extract all the text blocks on a given url that contain more than X amount of words.
The method I currently use is this:
public string getAllText(string _html)
{
    string _allText = "";
    try
    {
        HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();
        document.LoadHtml(_html);
        var root = document.DocumentNode;
        var sb = new StringBuilder();
        foreach (var node in root.DescendantNodesAndSelf())
        {
            if (!node.HasChildNodes)
            {
                string text = node.InnerText;
                if (!string.IsNullOrEmpty(text))
                    sb.AppendLine(text.Trim());
            }
        }
        _allText = sb.ToString();
    }
    catch (Exception)
    {
    }
    _allText = System.Web.HttpUtility.HtmlDecode(_allText);
    return _allText;
}
The problem here is that I get all text returned, even if it is menu text, a footer text with 3 words, etc.
I want to analyse the actual content on a page, so my idea is to only keep the text that could be content (i.e. text blocks with more than X words).
Any ideas how this could be achieved?
Well, a first approach can be a simple word count analysis on each node.InnerText value using the string.Split function:
string[] words;
words = text.Split((string[]) null, StringSplitOptions.RemoveEmptyEntries);
and append only the text where words.Length is larger than 3.
Also see this question answer for some more tricks in raw text gathering.
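The word count filter can be sketched independently of the HTML parsing. The helper names CountWords and KeepLongBlocks, the sample blocks, and the threshold of 5 are assumptions for illustration; in the question's method the blocks would be the trimmed node.InnerText values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Count whitespace-separated words; a null separator array splits on any whitespace.
static int CountWords(string text)
{
    return text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).Length;
}

// Hypothetical helper: keep only the blocks with at least minWords words.
static List<string> KeepLongBlocks(IEnumerable<string> blocks, int minWords)
{
    return blocks.Where(block => CountWords(block) >= minWords).ToList();
}

var blocks = new[]
{
    "Home",
    "Contact us",
    "This paragraph is long enough to count as real page content."
};

foreach (string block in KeepLongBlocks(blocks, 5))
{
    Console.WriteLine(block); // only the last block is printed
}
```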
I'm trying to figure out a way to parse out a base64 string from within a larger string.
I have the string "Hello <base64 content> World" and I want to be able to parse out the base64 content and convert it back to a string. "Hello Awesome World"
Answers in C# preferred.
Edit: Updated with a more real example.
--abcdef
\n
Content-Type: Text/Plain;
Content-Transfer-Encoding: base64
\n
<base64 content>
\n
--abcdef--
This is taken from 1 sample. The problem is that the Content.... vary quite a bit from one record to the next.
There is no reliable way to do it. How would you know that, for instance, "Hello" is not a base64 string? OK, it's a bad example because base64 is supposed to be padded so that the length is a multiple of 4, but what about "overflow"? It's 8 characters long and it is a valid base64 string (it would decode to "¢÷«~Z0"), even though it's obviously a normal word to a human reader. There's just no way you can tell for sure whether a word is a normal word or base64 encoded text.
The fact that you have base64 encoded text embedded in normal text is clearly a design mistake. I suggest you do something about that rather than trying to do something impossible...
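The ambiguity is easy to reproduce. In this small sketch (TryDecodeBase64 is a hypothetical helper name), "overflow" is accepted by Convert.FromBase64String because all 8 characters belong to the base64 alphabet, while "Hello" is rejected only because its length is not a multiple of 4.

```csharp
using System;

// Hypothetical helper: report whether a token is accepted as base64.
static bool TryDecodeBase64(string token, out byte[] bytes)
{
    try
    {
        bytes = Convert.FromBase64String(token);
        return true;
    }
    catch (FormatException)
    {
        bytes = Array.Empty<byte>();
        return false;
    }
}

Console.WriteLine(TryDecodeBase64("overflow", out byte[] decoded)); // True
Console.WriteLine(decoded.Length);                                  // 6
Console.WriteLine(TryDecodeBase64("Hello", out _));                 // False
```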
In short form you could:
split the string on any chars that are not valid base64 data or padding
try to convert each token
if the conversion succeeds, call replace on the original string to switch the token with the converted value
In code:
var delimiters = new char[] { /* non-base64 ASCII chars */ };
var possibles = value.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);
//need to tweak to include padding chars in matches, but still split on padding?
//maybe better off creating a regex to match base64 + padding
//and using Regex.Split?
foreach (var match in possibles)
{
    try
    {
        var converted = Convert.FromBase64String(match);
        var text = System.Text.Encoding.UTF8.GetString(converted);
        if (!string.IsNullOrEmpty(text))
        {
            value = value.Replace(match, text);
        }
    }
    catch (System.ArgumentNullException)
    {
        //handle it
    }
    catch (System.FormatException)
    {
        //handle it
    }
}
Without a delimiter, though, you can end up converting non-base64 text that also happens to be valid as base64 encoded text.
Looking at your example of trying to convert "Hello QXdlc29tZQ== World" to "Hello Awesome World" the above algorithm could easily generate something like "ée¡Ý•Í½µ”¢¹]" by trying to convert the whole string from base64 since there is no delimiter between plain and encoded text.
Update (based on comments):
If there are no '\n's in the base64 content and it is always preceded by "Content-Transfer-Encoding: base64\n", then there is a way:
split the string on '\n'
iterate over all the tokens until a token ends in "Content-Transfer-Encoding: base64"
the next token (if there are any) should be decoded (if possible) and then the replacement should be made in the original string
return to iterating until out of tokens
In code:
private string ConvertMixedUpTextAndBase64(string value)
{
    var delimiters = new char[] { '\n' };
    var possibles = value.Split(delimiters,
        StringSplitOptions.RemoveEmptyEntries);
    for (int i = 0; i < possibles.Length - 1; i++)
    {
        if (possibles[i].EndsWith("Content-Transfer-Encoding: base64"))
        {
            var nextTokenPlain = DecodeBase64(possibles[i + 1]);
            if (!string.IsNullOrEmpty(nextTokenPlain))
            {
                value = value.Replace(possibles[i + 1], nextTokenPlain);
                i++;
            }
        }
    }
    return value;
}
private string DecodeBase64(string text)
{
    string result = null;
    try
    {
        var converted = Convert.FromBase64String(text);
        result = System.Text.Encoding.UTF8.GetString(converted);
    }
    catch (System.ArgumentNullException)
    {
        //handle it
    }
    catch (System.FormatException)
    {
        //handle it
    }
    return result;
}
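A variant of the same idea can rewrite the message line by line, so that only the payload line itself is changed rather than every occurrence of its text. This is a sketch under the same assumptions as above (the payload is a single line that follows the "Content-Transfer-Encoding: base64" header); the name DecodeBase64Parts and the sample message are made up for illustration.

```csharp
using System;
using System.Text;

// Hypothetical helper: decode the first non-empty line after each base64 header.
static string DecodeBase64Parts(string message)
{
    string[] lines = message.Split('\n');
    bool expectPayload = false;

    for (int i = 0; i < lines.Length; i++)
    {
        string line = lines[i].Trim();
        if (line.EndsWith("Content-Transfer-Encoding: base64", StringComparison.OrdinalIgnoreCase))
        {
            expectPayload = true;
        }
        else if (expectPayload && line.Length > 0)
        {
            try
            {
                lines[i] = Encoding.UTF8.GetString(Convert.FromBase64String(line));
            }
            catch (FormatException)
            {
                // Not valid base64 after all: leave the line untouched.
            }
            expectPayload = false;
        }
    }
    return string.Join("\n", lines);
}

string sample = "--abcdef\nContent-Type: Text/Plain;\nContent-Transfer-Encoding: base64\n\nQXdlc29tZQ==\n\n--abcdef--";
Console.WriteLine(DecodeBase64Parts(sample)); // the payload line becomes "Awesome"
```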