Get Text Between Two Strings (HTML) in C#

I am trying to parse a website's HTML and then get text between two strings.
I wrote a small function to get text between two strings.
public string getBetween(string strSource, string strStart, string strEnd)
{
    int Start, End;
    if (strSource.Contains(strStart) && strSource.Contains(strEnd))
    {
        Start = strSource.IndexOf(strStart, 0) + strStart.Length;
        End = strSource.IndexOf(strEnd, Start);
        return strSource.Substring(Start, End - Start);
    }
    else
    {
        return string.Empty;
    }
}
I have the HTML stored in a string called 'html'. Here is a part of the HTML that I am trying to parse:
<div class="info">
<div class="content">
<div class="address">
<h3>Andrew V. Kenny</h3>
<div class="adr">
67 Romines Mill Road<br/>Dallas, TX 75204 </div>
</div>
<p>Curious what <strong>Andrew</strong> means? Click here to find out!</p>
So, I use my function like this.
string m2 = getBetween(html, "<div class=\"address\">", "<p>Curious what");
string fullName = getBetween(m2, "<h3>", "</h3>");
string fullAddress = getBetween(m2, "<div class=\"adr\">", "<br/>");
string city = getBetween(m2, "<br/>", "</div>");
The output of the full name works fine, but the others have additional spaces in them for some reason. I tried various ways to avoid them (such as completely copying the spaces from the source and adding them in my function) but it didn't work.
I get an output like this:
fullName = "Andrew V. Kenny"
fullAddress = " 67 Romines Mill Road"
city = "Dallas, TX 75204 "
There are spaces in the city and address which I don't know how to avoid.

The extra whitespace comes from the line breaks and indentation in the HTML itself. Trim each string and the unnecessary whitespace will be gone:
fullName = fullName.Trim ();
fullAddress = fullAddress.Trim ();
city = city.Trim ();
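As a minimal sketch, the extraction and the trimming could be combined in one helper. The names TextExtractor and GetBetweenTrimmed are hypothetical and not part of the original code. The sketch also returns an empty string when the end marker does not occur after the start marker, a case in which the original getBetween would throw.

```csharp
using System;

public static class TextExtractor
{
    // Hypothetical helper: returns the trimmed text between the first
    // occurrence of start and the next occurrence of end, or an empty
    // string when either marker cannot be found.
    public static string GetBetweenTrimmed(string source, string start, string end)
    {
        int startIndex = source.IndexOf(start, StringComparison.Ordinal);
        if (startIndex < 0) return string.Empty;
        startIndex += start.Length;

        // Search for the end marker only after the start marker.
        int endIndex = source.IndexOf(end, startIndex, StringComparison.Ordinal);
        if (endIndex < 0) return string.Empty;

        // Trim removes leading and trailing spaces, tabs, and line breaks.
        return source.Substring(startIndex, endIndex - startIndex).Trim();
    }
}
```

With this, a call such as TextExtractor.GetBetweenTrimmed(m2, "<br/>", "</div>") would yield the city without surrounding whitespace.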

Related

C# String Remove characters between two characters

For example, I have a string like this:
"//RemoveFromhere
<div>
<p>my name is blawal i want to remove this div </p>
</div>
//RemoveTohere"
I want to use //RemoveFromhere as the starting point and //RemoveTohere as the ending point, and remove all the characters in between.

var sin = "BEFORE//RemoveFromhere" +
    "<div>" +
    "<p>my name is blawal i want to remove this div </p>" +
    "</div>" +
    "//RemoveTohereAFTER";
const string fromId = "//RemoveFromhere";
const string toId = "//RemoveTohere";
// Locate both markers first; IndexOf returns -1 when a marker is missing.
var from = sin.IndexOf(fromId);
var to = sin.IndexOf(toId);
if (from > -1 && to >= from + fromId.Length)
{
    // Remove only the text between the markers (the markers stay).
    Console.WriteLine(sin.Remove(from + fromId.Length, to - from - fromId.Length));
    // OR remove the markers as well.
    Console.WriteLine(sin.Remove(from, to + toId.Length - from));
}
This gives results BEFORE//RemoveFromhere//RemoveTohereAFTER and BEFOREAFTER
See also a more general (better) option using regular expressions from Cetin Basoz added after this answer was accepted.
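One way the regular-expression route could look is sketched here. It is not taken from any of the answers: MarkerRemover and RemoveBetween are hypothetical names, the markers are escaped with Regex.Escape so they are matched literally, and RegexOptions.Singleline lets "." match line breaks so a block spanning several lines is removed.

```csharp
using System.Text.RegularExpressions;

public static class MarkerRemover
{
    // Hypothetical helper: removes every block that starts with fromMarker
    // and ends with the nearest following toMarker, markers included.
    public static string RemoveBetween(string input, string fromMarker, string toMarker)
    {
        // ".*?" is lazy, so each match stops at the first closing marker.
        string pattern = Regex.Escape(fromMarker) + ".*?" + Regex.Escape(toMarker);
        return Regex.Replace(input, pattern, "", RegexOptions.Singleline);
    }
}
```

Unlike the index-based version, this removes every marked block in the input, not only the first one.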

void Main()
{
    string pattern = @"\n{0,1}//RemoveFromhere(.|\n)*?//RemoveTohere\n{0,1}";
    var result = Regex.Replace(sample, pattern, "");
    Console.WriteLine(result);
}
static string sample = @"Here
//RemoveFromhere
<div>
<p>my name is blawal i want to remove this div </p>
</div>
//RemoveTohere
Keep this.
//RemoveFromhere
<div>
<p>my name is blawal i want to remove this div </p>
</div>
//RemoveTohere
Keep this too.
";

Save data from a textbox to a specific line in an existing text file

I have a text file with these inside,
Apple 5[KG]
Orange 7[KG]
Water 10[L]
I have written this, to get a specific location from the text file
//a function to get a value between two different strings
public static string getBetween(string strSource, string strStart, string strEnd)
{
    int Start, End;
    if (strSource.Contains(strStart) && strSource.Contains(strEnd))
    {
        Start = strSource.IndexOf(strStart, 0) + strStart.Length;
        End = strSource.IndexOf(strEnd, Start);
        return strSource.Substring(Start, End - Start);
    }
    else
    {
        return "";
    }
}
And I'm using this to display the value in the textBox (after selecting and loading the file):
textBox1.Text = getBetween(filetext, "Apple ", "[");
//filetext is a string that contains the selected file's name
If I wanted to use a similar (or better) method to replace the apple's weight by editing the value in the same textbox, what changes would I need to make? In other words, what would the code for a "putBetween" function look like?
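A "putBetween" counterpart could be sketched as follows. This is only an illustration under stated assumptions: TextEditor and PutBetween are hypothetical names, and the string passed in is assumed to hold the file's contents rather than its name.

```csharp
using System;

public static class TextEditor
{
    // Hypothetical counterpart to getBetween: replaces the text between the
    // first occurrence of start and the next occurrence of end with newValue.
    // The source is returned unchanged when either marker is missing.
    public static string PutBetween(string source, string start, string end, string newValue)
    {
        int startIndex = source.IndexOf(start, StringComparison.Ordinal);
        if (startIndex < 0) return source;
        startIndex += start.Length;

        int endIndex = source.IndexOf(end, startIndex, StringComparison.Ordinal);
        if (endIndex < 0) return source;

        // Keep everything up to and including the start marker, insert the
        // new value, then keep everything from the end marker onwards.
        return source.Substring(0, startIndex) + newValue + source.Substring(endIndex);
    }
}
```

Usage might look like filetext = TextEditor.PutBetween(filetext, "Apple ", "[", textBox1.Text); after which the updated string could be written back with File.WriteAllText.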

OpenDicom Library C#, problems with Real Value of a tag

I need help displaying a tag value (int, string, or other) in Unity using the OpenDicom library (C#).
The problem is that I don't know how to get the exact value of a tag, e.g. the patient's sex as a string or his/her age as an int.
public void ReadData(AcrNemaFile file)
{
    Sequence sq = file.GetJointDataSets().GetJointSubsequences();
    string tag = string.Empty;
    string description = string.Empty;
    string value = string.Empty;
    string op = string.Empty;
    string val_rep = string.Empty;
    string war = string.Empty;
    foreach (DataElement el in sq)
    {
        tag = el.Tag.ToString(); // tag group and element
        op = el.VR.Tag.GetDictionaryEntry().Description; // tag description
        val_rep = el.VR.ToString(); // value representation (VR)
        war = el.Value.ToString(); // value converted to a string
        Debug.Log(tag + " : " + op + " \n " + val_rep);
    }
}
This is the code that displays the tags and their related information.

The library lets you get the array of values with .ToArray().
This gives you an array of objects. You then only have to cast or convert each object to the type you want. You can use the VR to find out which type the object has: Decimal String (DS), DateTime (DT), etc.

Acting on the indentation of a c# multiline string

I want to write some HTML from C# (HTML is just an example; it could be another language).
For example:
        string div = @"<div class=""className"">
                          <span>Mon text</span>
                       </div>";
will produce:
<div class="className">
                          <span>Mon text</span>
                       </div>
which is not very nice from the HTML point of view.
The only way to get correctly indented HTML is to indent the C# code like this:
        string div = @"<div class=""className"">
    <span>Mon text</span>
</div>";
This gives correctly indented HTML:
<div class="className">
    <span>Mon text</span>
</div>
But indenting the C# like this really breaks the readability of the code.
Is there a way to control the indentation in the C# language?
If not, does someone have a better tip than:
string div = "<div class=\"className\">" + Environment.NewLine +
             "    <span>Mon text</span>" + Environment.NewLine +
             "</div>";
and better than:
var sbDiv = new StringBuilder();
sbDiv.AppendLine("<div class=\"className\">");
sbDiv.AppendLine("    <span>Mon text</span>");
sbDiv.AppendLine("</div>");
What I use as a solution:
Many thanks to @Yotam for his answer.
I wrote a small extension method to make the alignment "dynamic":
/// <summary>
/// Align a multiline string from the indentation of its first line
/// </summary>
/// <remarks>The </remarks>
/// <param name="source">The string to align</param>
/// <returns></returns>
public static string AlignFromFirstLine(this string source)
{
    if (String.IsNullOrEmpty(source)) {
        return source;
    }
    if (!source.StartsWith(Environment.NewLine)) {
        throw new FormatException("String must start with a NewLine character.");
    }
    int indentationSize = source.Skip(Environment.NewLine.Length)
                                .TakeWhile(Char.IsWhiteSpace)
                                .Count();
    string indentationStr = new string(' ', indentationSize);
    return source.TrimStart().Replace($"\n{indentationStr}", "\n");
}
Then I can use it like this:
private string GetHtml(string className)
{
    return $@"
            <div class=""{className}"">
                <span>Texte</span>
            </div>".AlignFromFirstLine();
}
This returns the correct HTML:
<div class="myClassName">
    <span>Texte</span>
</div>
One limitation is that it only works with space indentation.
Any improvement is welcome!

You could wrap the string to the next line to get the desired indentation:
    string div =
@"
<div class=""className"">
    <span>Mon text</span>
</div>"
    .TrimStart(); // to remove the additional new-line at the beginning
Another nice solution (disadvantage: it depends on the indentation level!):
        string div = @"
        <div class=""className"">
            <span>Mon text</span>
        </div>".TrimStart().Replace("\n        ", "\n");
It just removes the indentation from the string. Make sure the number of spaces in the first argument of Replace matches the number of spaces in your indentation.

I like this solution more, but how about:
string div = "<div class='className'>\n"
           + "    <span>Mon text</span>\n"
           + "</div>";
This gets rid of some clutter:
Replace " inside strings with ' so that you don't need to escape the quote. (Single quotes in HTML appear to be legal.)
You can then also use regular "" string literals instead of @"".
Use \n instead of Environment.NewLine.
Note that the string concatenation is performed during compilation, by the compiler. (See also this and this blog post on the subject by Eric Lippert, who previously worked on the C# compiler.) There is no runtime performance penalty.

Inspired by trimIndent() in Kotlin.
This code:
var x = @"
    anything
     you
    want
    ".TrimIndent();
will produce a string:
anything
 you
want
or "\nanything\n you\nwant\n"
Implementation:
public static string TrimIndent(this string s)
{
    string[] lines = s.Split('\n');
    IEnumerable<int> firstNonWhitespaceIndices = lines
        .Skip(1)
        .Where(it => it.Trim().Length > 0)
        .Select(IndexOfFirstNonWhitespace);
    int firstNonWhitespaceIndex;
    if (firstNonWhitespaceIndices.Any()) firstNonWhitespaceIndex = firstNonWhitespaceIndices.Min();
    else firstNonWhitespaceIndex = -1;
    if (firstNonWhitespaceIndex == -1) return s;
    IEnumerable<string> unindentedLines = lines.Select(it => UnindentLine(it, firstNonWhitespaceIndex));
    return String.Join("\n", unindentedLines);
}
private static string UnindentLine(string line, int firstNonWhitespaceIndex)
{
    if (firstNonWhitespaceIndex < line.Length)
    {
        if (line.Substring(0, firstNonWhitespaceIndex).Trim().Length != 0)
        {
            return line;
        }
        return line.Substring(firstNonWhitespaceIndex, line.Length - firstNonWhitespaceIndex);
    }
    return line.Trim().Length == 0 ? "" : line;
}
private static int IndexOfFirstNonWhitespace(string s)
{
    char[] chars = s.ToCharArray();
    for (int i = 0; i < chars.Length; i++)
    {
        if (chars[i] != ' ' && chars[i] != '\t') return i;
    }
    return -1;
}

If it is one long string then you can always keep the string in a text file and read it into your variable, e.g.
string text = File.ReadAllText(@"c:\file.txt", Encoding.UTF8);
This way you can format it any way you want using a text editor, and it won't negatively affect the look of your code.
If you're changing parts of the string on the fly, then StringBuilder is your best option. Alternatively, if you do decide to read the string in from a text file, you could include {0}-style placeholders in your string and then use string.Format(text, "text1", "text2", etc.) to fill in the required parts.
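The placeholder idea could be sketched like this. TemplateDemo and Fill are hypothetical names, and the template would normally come from File.ReadAllText rather than from a literal. One caveat worth knowing: string.Format treats braces as placeholder delimiters, so any literal brace in the template (for example in inline CSS) has to be doubled.

```csharp
using System;

public static class TemplateDemo
{
    // Hypothetical helper: fills numbered placeholders such as {0} and {1}
    // in a template string. Literal braces in the template must be written
    // as {{ and }} so that string.Format does not treat them as placeholders.
    public static string Fill(string template, params object[] values)
    {
        return string.Format(template, values);
    }
}
```

For instance, TemplateDemo.Fill("<div class=\"{0}\">{1}</div>", "className", "Mon text") fills both placeholders in one call.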

find the index of all matching search string in a given string and then insert a new word

I have a string, for example:
In an out of the box way to reduce number of cigarette consumption per day, Vineet, a Delhi University student has started smoking longer cigarettes. Cigarettes have a bad influence on health
I need to find every occurrence of "cigarette" and insert a <span style='color: red;'> before it to highlight it.
string search = "cigaret";
string lbltxt = Label1.Text;
int startIndex = Label1.Text.IndexOf(search, StringComparison.CurrentCultureIgnoreCase);
int endIndex = Label1.Text.IndexOf(search, StringComparison.CurrentCultureIgnoreCase) + (search.Length);
Label1.Text = Label1.Text.Insert(endIndex, "</span>");
Label1.Text = Label1.Text.Insert(startIndex, "<span style='color: red;'>");
This bit of code finds only the first occurrence and highlights it only once in the original string instead of thrice.

Use this jQuery:
$("p").html(function(index, value) {
    return value.replace(/\b(cigarette)\b/g, '<strong class="test">$1</strong>');
});
and this CSS:
.test {
    color: #ff0000;
}
See this fiddle:
http://jsfiddle.net/EGtBy/218/
I think this is an easy and safe way of doing it.
You can use the same approach for case-sensitive words. If you need any help, leave a comment.

Use String.Replace if you really need to do that "ugly replacement" (I think it's bad practice to add styles that way):
string search = "cigarette";
string replacement = "<span style='color: red;'>cigarette</span>";
string sentence = "In an out of the box way to reduce number of cigarette consumption per day, Vineet, a Delhi University student has started smoking longer cigarettes. Cigarettes have a bad influence on health";
string newStr = sentence.Replace(search,replacement);

You can replace the word like this:
string search = "cigaret";
Label1.Text = Label1.Text.Replace(search, "<span style='color: red;'>" + search + "</span>");

Hi, finally here is working code, but I am sure there is a much better way to do it. If anyone can improve it, that would be much appreciated.
string msg = Label1.Text;
string search = "cigarette";
int searchStringLength = search.Length;
string startTag = "<span style='color: red;'>";
string endTag = "</span>";
int skip = startTag.Length + endTag.Length;
int msgindex = 0;
do
{
    msgindex = msg.IndexOf(search, msgindex, StringComparison.OrdinalIgnoreCase);
    if (msgindex == -1) break;
    msg = msg.Insert((msgindex + searchStringLength), endTag);
    msg = msg.Insert(msgindex, startTag);
    msgindex += searchStringLength + skip;
} while (msgindex != -1);
Label1.Text = msg;

I would do something like this. Not sure how it will affect the formatting (unless the formatting's HTML).
StringBuilder final = new StringBuilder();
foreach (string split in Label1.Text.Split(' '))
{
    if (Regex.Match(split, "[a-zA-Z]+").Value.Equals("cigarettes", StringComparison.CurrentCultureIgnoreCase))
    {
        final.Append(@"<span style='color: red;'>");
        final.Append(split);
        final.Append(@"</span>");
    }
    else
    {
        final.Append(split);
    }
    final.Append(' ');
}
Label1.Text = final.ToString().Trim();
result:
In an out of the box way to reduce number of cigarette consumption per day, Vineet, a Delhi University student has started smoking longer <span style='color: red;'>cigarettes.</span> <span style='color: red;'>Cigarettes</span> have a bad influence on health
Update:
Sorry, you don't have to use Regex. I found out that CultureInfo has a CompareInfo property that can act like a case-insensitive Contains() method:
static void Main(string[] args)
{
    string yourstring = @"In an out of the box way to reduce number of cigarette consumption per day, Vineet, a Delhi University student has started smoking longer cigarettes. Cigarettes have a bad influence on health";
    StringBuilder final = new StringBuilder();
    CultureInfo culture = CultureInfo.CurrentCulture;
    foreach (string split in yourstring.Split(' '))
    {
        if (culture.CompareInfo.IndexOf(split, "cigarette", CompareOptions.IgnoreCase) != -1)
        {
            final.Append(@"<span style='color: red;'>");
            final.Append(split);
            final.Append(@"</span>");
        }
        else
        {
            final.Append(split);
        }
        final.Append(' ');
    }
    Label1.Text = final.ToString().Trim();
}
Result:
In an out of the box way to reduce number of <span style='color: red;'>cigarette</span> consumption per day, Vineet, a Delhi University student has started smoking longer <span style='color: red;'>cigarettes.</span> <span style='color: red;'>Cigarettes</span> have a bad influence on health
Thanks for trying. The only problem with this solution is that it also highlights letters that are not in the search string. Since I have a solution using the index, I will go with that for the time being.
So apparently this is what you want:
Label1.Text = Regex.Replace(Label1.Text, @"(cigarette)", @"<span style='color: red;'>$1</span>", RegexOptions.IgnoreCase);
output is:
In an out of the box way to reduce number of <span style='color: red;'>cigarette</span> consumption per day, Vineet, a Delhi University student has started smoking longer <span style='color: red;'>cigarette</span>s. <span style='color: red;'>Cigarette</span>s have a bad influence on health
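If the search term comes from user input rather than a fixed word, the same Regex.Replace idea could be wrapped as sketched here. Highlighter and Highlight are hypothetical names. Regex.Escape keeps characters such as "." or "(" in the search term from being interpreted as regex syntax, and a match evaluator is used so the original casing of each match is preserved.

```csharp
using System.Text.RegularExpressions;

public static class Highlighter
{
    // Hypothetical helper: wraps every case-insensitive occurrence of
    // search in a red <span>, keeping the casing found in the text.
    public static string Highlight(string text, string search)
    {
        if (string.IsNullOrEmpty(search)) return text;

        // Escape the term so it is matched literally.
        string pattern = Regex.Escape(search);
        return Regex.Replace(
            text,
            pattern,
            m => "<span style='color: red;'>" + m.Value + "</span>",
            RegexOptions.IgnoreCase);
    }
}
```

A call such as Label1.Text = Highlighter.Highlight(Label1.Text, "cigarette"); would then highlight only the matched letters, as in the output above.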
