Removing a Selective Section from a String - c#

I need help figuring out how to remove only the very last "</span>" tag from a string. Here is an example of what one of the strings might look like, but sometimes there are a few:
<DIV style="TEXT-ALIGN: center"><span style="text-decoration:underline;"> some text </span> </span></DIV>

var originalString = @"<DIV style='TEXT-ALIGN: center'><span style='text-decoration:underline;'> some text </span> </span></DIV>";
var lastIndex = originalString.LastIndexOf("</span>");
// Keep everything before the last tag and everything after it
var newString = originalString.Substring(0, lastIndex) + originalString.Substring(lastIndex + "</span>".Length);

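Alternatively, use this regex (note that [(</span>)] is a character class, so the lookahead only checks the single character that follows the tag): </span>(?=[(</span>)])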
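
If the tag is missing, LastIndexOf returns -1 and slicing around that index would throw. The following is a minimal sketch of a guarded helper, not the original poster's code; the names TagTrimmer and RemoveLast are hypothetical, and the case-insensitive comparison is an assumption made because the sample mixes <DIV> and <span>.

```csharp
using System;

public static class TagTrimmer
{
    // Removes the last occurrence of `tag` from `input`.
    // Returns `input` unchanged when the tag does not occur at all.
    public static string RemoveLast(string input, string tag)
    {
        // Assumption: tag case should not matter (e.g. </SPAN> vs </span>).
        int lastIndex = input.LastIndexOf(tag, StringComparison.OrdinalIgnoreCase);
        if (lastIndex < 0)
        {
            return input;
        }
        return input.Remove(lastIndex, tag.Length);
    }
}
```

Calling TagTrimmer.RemoveLast(originalString, "</span>") on the sample string would drop only the final closing tag and leave the earlier one in place.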

Related

C# String Remove characters between two characters

For example, I have a string like:
"//RemoveFromhere
<div>
<p>my name is blawal i want to remove this div </p>
</div>
//RemoveTohere"
I want to use //RemoveFromhere as the starting point and //RemoveTohere as the ending point, and remove all the characters in between.
var sin = "BEFORE//RemoveFromhere" +
    "<div>" +
    "<p>my name is blawal i want to remove this div </p>" +
    "</div>" +
    "//RemoveTohereAFTER";
const string fromId = "//RemoveFromhere";
const string toId = "//RemoveTohere";
var fromIndex = sin.IndexOf(fromId);
var toIndex = sin.IndexOf(toId);
var from = fromIndex + fromId.Length;
// Only proceed when both markers exist and appear in the right order
if (fromIndex > -1 && toIndex >= from)
{
    // Keep the markers and remove only the text between them
    Console.WriteLine(sin.Remove(from, toIndex - from));
    // OR remove the from/to markers as well
    Console.WriteLine(sin.Remove(fromIndex, toIndex + toId.Length - fromIndex));
}
This gives results BEFORE//RemoveFromhere//RemoveTohereAFTER and BEFOREAFTER
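
When the input may contain several marked sections, the same IndexOf idea can be repeated in a loop. This is a minimal sketch under that assumption; SectionRemover and RemoveSections are hypothetical names, and an unclosed start marker is deliberately left in place rather than treated as an error.

```csharp
using System;
using System.Text;

public static class SectionRemover
{
    // Removes every span that begins with `startMarker` and ends with the next
    // `endMarker`, markers included. Text after an unclosed start marker is kept.
    public static string RemoveSections(string input, string startMarker, string endMarker)
    {
        if (string.IsNullOrEmpty(startMarker) || string.IsNullOrEmpty(endMarker))
        {
            throw new ArgumentException("Markers must be non-empty.");
        }

        var result = new StringBuilder();
        int position = 0;
        while (true)
        {
            int start = input.IndexOf(startMarker, position, StringComparison.Ordinal);
            if (start < 0) break;
            int end = input.IndexOf(endMarker, start + startMarker.Length, StringComparison.Ordinal);
            if (end < 0) break;

            // Copy the text before the section, then skip past the end marker.
            result.Append(input, position, start - position);
            position = end + endMarker.Length;
        }

        // Copy whatever remains after the last removed section.
        result.Append(input, position, input.Length - position);
        return result.ToString();
    }
}
```

With the sample string, SectionRemover.RemoveSections(sin, fromId, toId) would behave like the second Remove call, but it would also handle inputs where the markers occur more than once.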
See also a more general (better) option using regular expressions from Cetin Basoz added after this answer was accepted.
void Main()
{
    string pattern = @"\n{0,1}//RemoveFromhere(.|\n)*?//RemoveTohere\n{0,1}";
    var result = Regex.Replace(sample, pattern, "");
    Console.WriteLine(result);
}
static string sample = @"Here
//RemoveFromhere
<div>
<p>my name is blawal i want to remove this div </p>
</div>
//RemoveTohere
Keep this.
//RemoveFromhere
<div>
<p>my name is blawal i want to remove this div </p>
</div>
//RemoveTohere
Keep this too.
";

How would I Strip Html from a string and set a character limit?

I'm getting a string from a list of items. The string is currently displayed as "item.ItemDescription" (the 9th row below).
I want to strip out all HTML from this string and set a character limit of 250 after the HTML is stripped.
Is there a simple way of doing this?
I saw a post saying to install HTML Agility Pack, but I was looking for something simpler.
EDIT:
It does not always contain HTML. If the client wanted to add a bold or italic tag to an item's name in the description, it would show up as <"strong">Item Name<"/strong">, for instance. I want to strip out all HTML no matter what is entered.
<tbody>
@foreach (var item in Model.itemList)
{
<tr id="@("__filterItem_" + item.EntityId + "_" + item.EntityTypeId)">
<td>
@Html.ActionLink(item.ItemName, "Details", "Item", new { id = item.EntityId }, null)
</td>
<td>
@item.ItemDescription
</td>
<td>
@if (Model.IsOwner)
{
<a class="btnDelete" title="Delete" itemid="@(item.EntityId)" entitytype="@item.EntityTypeId" filterid="@Model.Id">Delete</a>
}
</td>
</tr>
}
</tbody>
Your best option, IMO, is to not get into a parsing nightmare with all the possible values. Why not simply inject class=someCssClassName into the <td> as an attribute? Then control the length, color, or whatever with CSS.
An even better idea is to assign a class to the containing <tr class=trClass> and then have the CSS apply lengths to the child <td> elements.
You could do something like this to remove all tags (opening, closing, and self-closing) from the string, but it may have the unintended consequence of removing things the user entered that weren't meant to be html tags:
text = Regex.Replace(text, @"</?[^>]*/?>", String.Empty);
Instead, I would recommend something like this and letting the user know html isn't supported:
text = text.Replace("<", "&lt;");
text = text.Replace(">", "&gt;");
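Just remember to apply your 250 character limit before the conversion (Substring throws if the string is shorter than the requested length, so check first):
if (text.Length > 250) text = text.Substring(0, 250);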
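
The two steps above (limit first, then escape) can be combined in one helper. This is a rough sketch, not the answerer's code: DescriptionEncoder and TruncateAndEncode are hypothetical names, and it swaps the manual Replace calls for System.Net.WebUtility.HtmlEncode, which also escapes & and quotes.

```csharp
using System.Net;

public static class DescriptionEncoder
{
    // Cuts `text` to at most `maxLength` characters, then HTML-encodes it so
    // that any markup the user typed is shown literally instead of rendered.
    public static string TruncateAndEncode(string text, int maxLength)
    {
        if (string.IsNullOrEmpty(text))
        {
            return string.Empty;
        }

        // Truncate before encoding so the limit counts the user's characters,
        // not the longer entity sequences such as &lt;.
        string limited = text.Length > maxLength ? text.Substring(0, maxLength) : text;
        return WebUtility.HtmlEncode(limited);
    }
}
```

In a Razor view, @ expressions are already HTML-encoded on output, so a helper like this matters mainly when markup is assembled by hand.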
This regex will select any HTML tags (including the ones with double quotes, such as <"strong">):
<[^>]*>
Look here: http://regexr.com/3cge4
Using C# regular expressions to remove HTML tags
From there, you can simply check the string size and display appropriately.
var itemDescriptionStripped = Regex.Replace(item.ItemDescription, @"<[^>]*>", String.Empty);
// Enforce the 250 character limit on the stripped text
if (itemDescriptionStripped.Length > 250)
    itemDescriptionStripped = itemDescriptionStripped.Substring(0, 250);
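
The strip-then-limit logic can be wrapped in a small reusable method. The sketch below is illustrative only; DescriptionFormatter and StripTagsAndTruncate are hypothetical names, and treating null input as an empty string is an assumption.

```csharp
using System.Text.RegularExpressions;

public static class DescriptionFormatter
{
    // Removes anything that looks like a tag (<...>) and then cuts the
    // remaining text to at most `maxLength` characters.
    public static string StripTagsAndTruncate(string html, int maxLength)
    {
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }

        string stripped = Regex.Replace(html, "<[^>]*>", string.Empty);
        return stripped.Length > maxLength
            ? stripped.Substring(0, maxLength)
            : stripped;
    }
}
```

In the view, the cell could then output DescriptionFormatter.StripTagsAndTruncate(item.ItemDescription, 250). As noted in the other answer, this pattern also removes any user text that merely looks like a tag.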

Extract a text from a file c#

I have a .mail file that contains:
`
FromFild=xxx@gmail.com
ToFild=yyy@gmai.com
SubjectFild=Test
Message=
<b><font size="3" color="blue">testing</font> </b>
<table>
<tr>
<th>Question</th>
<th>Answer</th>
<th>Correct?</th>
</tr>
<tr>
<td>What is the capital of Burundi?</td>
<td>Bujumburra</td>
<td>Yes</td>
</tr>
<tr>
<td>What is the capital of France?</td>
<td>F</td>
<td>Erm... sort of</td>
</tr>
</table>
Message=END
#at least one empty line needed at the end!
`
I need to extract and save only the text that is between Message= and Message=END.
I tried split('=').Last()/First(), which did not work. I cannot use Substring, as it accepts only integer indexes. I am new to this and cannot think of a solution. Can you give me a hint, please?
You can use this regular expression (in .NET the pattern is written without the /.../ delimiters, and the s flag becomes RegexOptions.Singleline):
Message=(?<messagebody>.*?)Message=END
Then the code to get the message:
string fileContent; // The content of your .mail file
Match match = Regex.Match(fileContent, @"Message=(?<messagebody>.*?)Message=END", RegexOptions.Singleline);
string message = match.Groups["messagebody"].Value;
I will assume that neither the text file nor the message you're looking for has a constant number of lines that I can rely on.
string prefix = "Message=";
string postfix = "Message=END";
var text = File.ReadAllText("a.txt");
var messageStart = text.IndexOf(prefix) + prefix.Length;
var messageStop = text.IndexOf(postfix);
var result = text.Substring(messageStart, messageStop - messageStart);
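
If either marker is missing, IndexOf returns -1 and the Substring call above would throw or return the wrong slice. A minimal sketch of a guarded variant follows; MessageExtractor and TryExtractBetween are hypothetical names, and searching for the end marker only after the start marker is a design choice of this sketch.

```csharp
using System;

public static class MessageExtractor
{
    // Tries to get the text between the first `prefix` and the first `postfix`
    // that follows it. Returns false (and an empty result) if either is missing.
    public static bool TryExtractBetween(string text, string prefix, string postfix, out string result)
    {
        result = string.Empty;

        int prefixIndex = text.IndexOf(prefix, StringComparison.Ordinal);
        if (prefixIndex < 0)
        {
            return false;
        }

        int start = prefixIndex + prefix.Length;
        // Start the search after the prefix so "Message=" itself is not
        // mistaken for the beginning of "Message=END".
        int stop = text.IndexOf(postfix, start, StringComparison.Ordinal);
        if (stop < 0)
        {
            return false;
        }

        result = text.Substring(start, stop - start);
        return true;
    }
}
```

For the .mail file, the call would be MessageExtractor.TryExtractBetween(text, "Message=", "Message=END", out var body); the extracted body keeps the line breaks around the HTML, so Trim() may be wanted afterwards.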

how to split the string between two strings in c#?

I have a string variable that contains HTML data. I want to split that HTML string into multiple strings and then merge those strings into a single one.
This is the HTML string:
<p><span style="text-decoration: underline; color: #ff0000;"><strong>para1</strong></span></p>
<p style="text-align: center;"><strong><span style="color: #008000;">para2</span> स्द्स्द्सद्स्द para2 again<br /></strong></p>
<p style="text-align: left;"><strong><span style="color: #0000ff;">para3</span><br /></strong></p>
And this is my expected output:
<p><span style="text-decoration: underline; color: #ff0000;"><strong>para1</strong></span><strong><span style="color: #008000;">para2</span>para2 again<br /></strong><strong><span style="color: #0000ff;">para3</span><br /></strong></p>
My split logic is given below:
1. Split the HTML string into tokens based on the </p> tag.
2. Take the first token and store it in a separate string variable (firstPara).
3. Take each remaining token, remove any tag starting with <p as well as the closing </p>, and store each value in a separate variable.
4. Take the first token, firstPara, remove its </p> tag, and append every token obtained in step 3.
5. The variable firstPara now holds the whole value.
6. Finally, append </p> at the end of firstPara.
This is my problem.
Could you please help me solve this issue?
Here is a regex example of how to do it:
String pattern = @"(?<=<p.*>).*(?=</p>)";
var matches = Regex.Matches(text, pattern);
StringBuilder result = new StringBuilder();
result.Append("<p>");
foreach (Match match in matches)
{
result.Append(match.Value);
}
result.Append("</p>");
And this is how you should do it with Html Agility Pack:
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(text);
var nodes = doc.DocumentNode.SelectNodes("//p");
StringBuilder result = new StringBuilder();
result.Append("<p>");
foreach (HtmlNode node in nodes)
{
result.Append(node.InnerHtml);
}
result.Append("</p>");
If you would like to split a string by another string, you may use string.Split(string[] separator, StringSplitOptions options), where separator is a string array containing at least one string that will be used to split the string.
Example
//Initialize a string of name HTML as our HTML code
string HTML = "<p><span style=\"text-decoration: underline; color: #ff0000;\"><strong>para1</strong></span></p> <p style=\"text-align: center;\"><strong><span style=\"color: #008000;\">para2</span> स्द्स्द्सद्स्द para2 again<br /></strong></p> <p style=\"text-align: left;\"><strong><span style=\"color: #0000ff;\">para3</span><br /></strong></p>";
//Initialize a string array of name strSplit to split HTML with </p>
string[] strSplit = HTML.Split(new string[] { "</p>" }, StringSplitOptions.None);
//Initialize a string of name expectedOutput
string expectedOutput = "";
string stringToAppend = "";
//Initialize i as an int. Continue if i is less than strSplit.Length. Increment i by 1 each time you continue
for (int i = 0; i < strSplit.Length; i++)
{
if (i >= 1) //Continue if the index is greater or equal to 1; from the second item to the last item
{
stringToAppend = strSplit[i].Replace("<p", "<"); //Replace <p by <
}
else //Otherwise
{
stringToAppend = strSplit[i]; //Don't change anything in the string
}
//Append strSplit[i] to expectedOutput
expectedOutput += stringToAppend;
}
//Append </p> at the end of the string
expectedOutput += "</p>";
//Write the output to the Console
Console.WriteLine(expectedOutput);
Console.Read();
Output
<p><span style="text-decoration: underline; color: #ff0000;"><strong>para1</stro
ng></span> < style="text-align: center;"><strong><span style="color: #008000;">p
ara2</span> ?????????????? para2 again<br /></strong> < style="text-align: left;
"><strong><span style="color: #0000ff;">para3</span><br /></strong></p>
NOTICE: Because my program does not support Unicode characters, it could not read स्द्स्द्सद्स्द. Thus, it was translated as ??????????????.
Thanks,
I hope you find this helpful :)
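
The split-based output above still contains fragments such as < style="text-align: center;"> because only the "<p" prefix is replaced, not the whole opening tag. A rough sketch that drops each opening <p ...> tag entirely and keeps only the inner HTML is shown below; ParagraphMerger and MergeParagraphs are hypothetical names, and the approach assumes the paragraphs are not nested. For arbitrary HTML, a parser such as Html Agility Pack remains the safer route.

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class ParagraphMerger
{
    // Collects the inner HTML of every <p ...>...</p> pair and wraps the
    // concatenation in a single <p>...</p>.
    public static string MergeParagraphs(string html)
    {
        // \b stops tags such as <pre> from matching; Singleline lets a
        // paragraph span several lines.
        MatchCollection matches = Regex.Matches(
            html,
            @"<p\b[^>]*>(.*?)</p>",
            RegexOptions.Singleline | RegexOptions.IgnoreCase);

        var result = new StringBuilder("<p>");
        foreach (Match match in matches)
        {
            result.Append(match.Groups[1].Value);
        }
        result.Append("</p>");
        return result.ToString();
    }
}
```

Calling ParagraphMerger.MergeParagraphs(HTML) on the string from the example would produce one <p> element whose content is the three paragraph bodies in order, with their attributes on the original <p> tags discarded.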

Html Agility Pack + Get specific node

Hello, I have a problem with my application.
I need to pick out a specific text between two nodes.
The HTML page looks like this:
<td align="right" width="186">Text1</td>
<td align="center" width="51">? - ?</td>
<td width="186">Text2</td>
I can pick out Text1 and Text2 with:
HtmlNodeCollection cols = doc.DocumentNode.SelectNodes("//td[@width='186']");
foreach (HtmlNode col in cols)
{
    if (col.InnerText == "Text1")
    {
        Label1.Text = col.InnerText;
    }
}
The reason I have the if-condition is that there are more <td> elements in the page, and I need to pick out specifically the one that contains "Text1".
The problem is how to parse out the text "? - ?". Other places in the document also contain the text "? - ?", but I need specifically the one between my two other nodes.
The result should be Text1 ? - ? Text2, and so on.
I guess it has something to do with the next child or sibling?
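You can check col.NextSibling.InnerText. If the <td> elements are separated by whitespace, NextSibling may be a whitespace text node; in that case, col.SelectSingleNode("following-sibling::td[1]") selects the next <td> element directly.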
