Remove characters after specific character for dynamic titles - c#

Hey, I would like to cut off a title from an RSS feed after a specific character, in this case the character ";". I looked up plenty of questions and they all seem to do this with a predefined string. I need my code to pull the title of an RSS feed (which is dynamic, but always in a similar format, with a ";" before which I want to delete everything). Here's my code:
ASP.NET - P.S. I'm using a fancybox iframe to pull the link up. It's irrelevant to my issue.
<%# FormatTitle( XPath("title") ) %>
C# - I made this code after searching similar questions on StackOverflow
public static string FormatTitle(object TitleIn)
{
string input = "Bid - Contract.: 13-C-00038; Howard F. Curren AWTP New Primary Sludge Pump Station Rehabilitation &#8211; Sheltered Market";
int index = input.IndexOf(";") + 1;
if (index > 0)
input= input.Substring(index);
return input;
}
The problem now is that all of my feeds have the same title, "Howard F. Curren AWTP New Primary Sludge Pump Station Rehabilitation – Sheltered Market". I need the "input" string to accept the "title" field of the XML that's being pulled. Sorry if this has already been answered. I looked up a bunch on StackOverflow and I can't find any that deal with dynamic titles.

Your code ignores the input parameter TitleIn and instead uses the local variable input, which is set to a string literal. Hence, your method will always return the same value. Assign input from TitleIn (for example, TitleIn.ToString()) instead of the literal.
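A minimal sketch of the corrected method, assuming the value bound from XPath("title") converts cleanly to a string. The class name TitleFormatter and the Trim() call (which drops the space that follows the ";") are illustrative additions, not part of the original code:

```csharp
using System;

public static class TitleFormatter
{
    // Returns the part of the title after the first ';'.
    // If there is no ';', the title is returned unchanged.
    public static string FormatTitle(object titleIn)
    {
        // Use the value that was passed in, not a hard-coded literal.
        string input = Convert.ToString(titleIn) ?? string.Empty;

        int index = input.IndexOf(';');
        if (index < 0)
        {
            return input;
        }

        // Skip the ';' itself and trim the whitespace around the remainder.
        return input.Substring(index + 1).Trim();
    }
}
```

The markup from the question would then call it as FormatTitle(XPath("title")), provided the method is reachable from the page (for example by placing it in the code-behind class).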

Related

C# .NET: Problems with query strings with character references

I'm having problems creating a query string and sending it to another webpage.
The text I'm trying to send is long and has special characters. Here is an example:
Represent a fraction 1/𝘣 on a number line diagram by defining the interval from 0 to 1 as the whole and partitioning it into 𝘣 equal parts. Recognize that each part has size 1/𝘣 and that the endpoint of the part based at 0 locates the number 1/𝘣 on the number line.
I can send this just fine if I hand code it:
<a href="Default.cshtml?standardText=Represent a fraction 1/𝘣 on a number line diagram by defining the interval from 0 to 1 as the whole and partitioning it into 𝘣 equal parts. Recognize that each part has size 1/𝘣 and that the endpoint of the part based at 0 locates the number 1/𝘣 on the number line.">
Link Text
</a>
This goes through without any problems, and I can read the entire Query String on the other side.
But if I am creating the link programmatically, my query string gets cut off right before the first character reference. I am using the following setup in a helper function:
string url = "Default.cshtml";
url += "?standardText=" + standard.text;
Link Text
When I use this, I only get "Understand a Fraction as 1/" and then it stops.
When I look at the page source, the only difference in the links is that one has actual ampersands and in the second they are turned into &amp;:
<a href="Default.cshtml?standardText=Understand a fraction 1/&#120355; as the quantity formed by 1 part when a whole is partitioned into &#120355; equal parts; understand a fraction &#120354;/&#119887; as the quantity formed by &#120354; parts of size 1/&#120355;."
So the problem is not really the spaces, but the fact that the & is being interpreted as starting a new query string parameter.
I have tried various things [using HttpUtility.UrlEncode, HttpUtility.UrlEncodeUnicode, Html.Raw, trying to replace spaces with "+"], but the problem isn't with the spaces, it's with how the character references are being handled. When I tried HttpUtility.UrlEncode I got a double-encoding security error.
On the advice of OmG I tried replacing all the &s, #s, and /s using:
url = url.Replace("&","%26");
url = url.Replace("#","%23");
url = url.Replace("/","%2F");
This led to the following link:
All Items
And now when I click on the link I get a different security warning/error:
A potentially dangerous Request.QueryString value was detected from the client (standardText="...raction 1/𝘣 as the qua...").
I don't see why it is so hard to send character references through a QueryString. Is there a way to prevent Razor from converting all my &s to the &amp ; ? The address works fine when it is just plain "&"s.
Update: using URLDecode() on the string does not affect its character entity references, so when I try to decode the string then re-encode it, I still get the double-escape security warning.
Update: on the suggestion of @MikeMcCaughan, I tried using JS, but I am not very knowledgeable about mixing JS and Razor. I tried creating a link by dropping a script into the body like so:
<script type="text/javascript">
var a = document.createElement('a');
var linkText = document.createTextNode("my title text");
a.appendChild(linkText);
a.title = "my title text";
a.href = encodeURIComponent(@url);
document.body.appendChild(a);
</script>
But no link showed up, so I'm obviously doing it wrong.
For reference, when I try to use @Html.Raw(url),
Link Text
The &s are still turned into &amp ;s. The link renders as:
Link text
One simple solution is to replace the special characters with their percent-encoded forms.
Replace & with %26 in the string using the string Replace method. Also replace / with %2F, # with %23, ; with %3B, and space with %20.
You can also do this in C# with the following function:
Server.UrlEncode("<The Url>")
and in JavaScript with the following function:
encodeURI("<The Url>")
In both cases it is the parameter value that needs encoding. Note that encodeURI leaves &, #, / and ; untouched, so for a single value encodeURIComponent is the JavaScript function to use.
To prevent double-encoding, make sure no part of the string has already been encoded before you pass it to Server.UrlEncode.
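As a hedged sketch of the advice above: one way to avoid both the truncation and the double-encoding is to turn the character references back into real characters first and then percent-encode only the parameter value. The class and method names here are hypothetical, and the sketch assumes the text is stored with HTML character references such as &#120355; as shown in the question. It does not address the request validation warning on the receiving page:

```csharp
using System;
using System.Net;

public static class QueryStringBuilder
{
    // Builds "page?name=value" with the value safely percent-encoded.
    public static string BuildUrl(string page, string name, string value)
    {
        // Decode character references (e.g. &#120355;) into the characters
        // they stand for, so no stray '&' or '#' survives from a reference.
        string decoded = WebUtility.HtmlDecode(value);

        // Percent-encode the name and the value only, never the whole URL,
        // so that '?' and '=' keep their meaning as separators.
        return page + "?" + Uri.EscapeDataString(name) + "=" + Uri.EscapeDataString(decoded);
    }
}
```

A call such as QueryStringBuilder.BuildUrl("Default.cshtml", "standardText", standard.text) would then produce a URL whose value contains no literal "&" or "#", so Razor has nothing left to turn into &amp;.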

Programmatically get amount of facebook likes for a specific page

I'm building a website in ASP.net/C# and currently I want to get the amount of Facebook likes of a specific page (think of a video/article). I need this value programmatically, because I want to sort on it later, but that's a different story.
I already know the link Facebook itself provides to get this amount, which is posted below.
http://api.facebook.com/method/fql.query?query=select%20like_count%20from%20link_stat%20where%20url=%27http://www.google.com%27
With www.google.com being the website, whose links are being counted and can of course be changed to whichever page one needs.
Does anybody know how I can access the xml file, of the URL/XML file posted above? I've done some research, but I can't seem to find an answer that works for me.
EDIT: I found the answer. I had to navigate through the XML a bit and modify the actual URL used. Working code is posted below.
string result;
string urlToXMLfile, currentURL;
currentURL = Globals.NavigateURL(TabId, "", "CategoryId=" + catId, "MovieId=" + Request.QueryString["MovieId"]);
urlToXMLfile = "https://api.facebook.com/method/fql.query?query=select%20%20like_count%20from%20link_stat%20where%20url=%22";
urlToXMLfile += currentURL;
urlToXMLfile += "%22";
//XDocument xdoc = XDocument.Load(urlToXMLfile);
//string test = xdoc.Descendants(XName.Get("like_count")).First().Value;
XmlDocument doc = new XmlDocument();
doc.Load(urlToXMLfile);
result = doc.FirstChild.NextSibling.InnerText;
return result;
I had the same issue once when I was working with Selenium. For me it was enough to get the text representation of that XML and keep it as a simple string, storing the page body in a variable. That allowed me to extract the count I needed later via regex or another algorithm.
I added my own answer below the question. That line of code works and returns a simple String, with the amount of FB likes that page got.
I found a Selenium solution for you, try this:
// In the C# bindings the page source is exposed as a property
string pageSource = driver.PageSource;
and after you get the data, you can do something like:
// Extract the text between the two like_count elements
string pattern = "(?i)(<like_count.*?>)(.+?)(</like_count>)";
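A small sketch of how such a pattern might be applied in C#. The helper name and the sample XML in the usage note are made up for illustration; the real response may have a different layout:

```csharp
using System.Text.RegularExpressions;

public static class LikeCountParser
{
    // Returns the text between <like_count> and </like_count>,
    // or null when the element is not present in the source.
    public static string ExtractLikeCount(string pageSource)
    {
        Match match = Regex.Match(
            pageSource,
            "<like_count.*?>(.+?)</like_count>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        return match.Success ? match.Groups[1].Value.Trim() : null;
    }
}
```

For example, LikeCountParser.ExtractLikeCount("<link_stat><like_count>42</like_count></link_stat>") returns "42". Parsing the XML with XmlDocument or XDocument, as in the question's own answer, is usually the more robust route when the response is well-formed.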

How to produce a soft return using C#.net

I know this is kind of an easy question, but I can't seem to find the answer anywhere. Does anyone know how to create a soft return inside a block of text using C#.NET?
I need to print a soft return to a text/XML file. This file will be generated using C#.NET. You can verify whether the answer is correct by opening the file in Notepad++ and enabling the option "View > Show Symbol > Show End of Line"; you will then see a symbol like this:
Thanks in advance :)
Not sure what you mean by a soft return. A quick Google search says it's a non-stored line break typically due to word wrapping in which case you wouldn't actually put this in a string, it would only be relevant when the string was rendered for display.
To put a carriage return and/or line feed in the string you would use:
string s = "line one\r\nline two";
And for further reference, here are the other escape codes that you can use.
Link (MSDN Blogs)
In response to your edit
The LF that you see can be represented with \n in a string. Obviously you have a specific line ending sequence that you need to represent. If you were to use Environment.NewLine that is going to give you different results on different platforms.
var message = $"Tom{Convert.ToChar(10)}Harry";
Results in:
Tom
Harry
With just a line feed between.
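A minimal sketch of the same idea for several lines, joining them with a bare line feed regardless of platform. The class name and the output file name in the usage note are placeholders:

```csharp
public static class LineFeedText
{
    // Joins the given lines with a single LF ("\n"), never CR LF,
    // independent of what Environment.NewLine is on the current platform.
    public static string JoinWithLf(params string[] lines)
    {
        return string.Join("\n", lines);
    }
}
```

Writing the result with System.IO.File.WriteAllText("output.txt", LineFeedText.JoinWithLf("Tom", "Harry")) should produce a file that Notepad++ shows with LF line endings only.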
As already mentioned, you can use Environment.NewLine, but I am not sure if that is what you want or if you are actually trying to append an ASCII 141 to your string, as mentioned in the comments.
You can add characters by their code to your string like this:
var myString = new StringBuilder("Foo");
myString.Append((char)141);

multiline textbox to string

I have a multiline textbox that I wish to convert to a string,
I found this
string textBoxValue = textBox1.Text.Replace(Environment.NewLine,"TOKEN");
But I don't understand TOKEN. What is TOKEN? Whitespace or a \n newline?
If this is the incorrect approach, then please let me know the correct way of doing this.
Thanks
In the code snippet you gave, "TOKEN" is any value you wish to insert, such as an HTML <br /> tag, more Environment.NewLines for formatting, or just some random delimiter that will later allow you to split the text on it.
A very simple example:
string text = textBox1.Text.Replace(Environment.NewLine, "^"); // a random token
string[] lines = text.Split( '^' );
If you are handling input from a textbox available on the web, you also need to take into account XSS (http://en.wikipedia.org/wiki/Cross-site_scripting). Also, in a real scenario I would split on a more complex token and make sure to handle multiple carriage returns in the input value.
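One way to handle mixed or repeated line endings without inventing a token at all is to split directly on the newline sequences. This is only a sketch; the class name is hypothetical and it assumes empty lines can be discarded:

```csharp
using System;

public static class TextBoxLines
{
    // Splits text on Windows (CR LF), Unix (LF) and old Mac (CR) line endings
    // and drops the empty entries produced by consecutive line breaks.
    public static string[] SplitLines(string text)
    {
        return text.Split(
            new[] { "\r\n", "\n", "\r" },
            StringSplitOptions.RemoveEmptyEntries);
    }
}
```

With a textbox this would be called as TextBoxLines.SplitLines(textBox1.Text).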
EDIT: now that I see your actual requirements, this code may do what you need:
// replace newlines with a single whitespace
string text = textBox1.Text.Replace(Environment.NewLine, " ");
EDIT #2:
Further, I need to enter this data into SQLite and rewrite his whole application. The company does not wish to have information from the previous application inputted into the new database; there are hyperlinks etc. embedded in the content, so if there is a way I can make the text box only accept RAW data, this would be the best.
Regular Expressions are the way to go for something like this, unless the data is structured enough to load into an XML or HTML DOM and process. You can build regular expressions in a variety of tools (do a Google search for a free online tester and you will find many). Once you have determined the expressions you need, you can use the Regex object in C# to match, replace, etc.
http://msdn.microsoft.com/en-us/library/ms228595(VS.80).aspx
http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.replace(v=VS.100).aspx
As it stands, "TOKEN" is just a meaningless string, unless it is used elsewhere in your code. You can replace "TOKEN" with any text you like.
Edit:
Okay, so you say you're removing NewLine's from your client's text. So you would do it like this. Paste their text into a textBox called textBox2, then use the following:
textBox2.Text = textBox2.Text.Replace(Environment.NewLine, string.Empty);

Removing <div>'s from text file?

I've made a small program in C#.NET which doesn't really serve much of a purpose: it tells you the chance of your DOOM based on today's news, lol. It takes an RSS feed on load from the BBC website and then looks for keywords which either increment or decrease the percentage chance of DOOM.
Crazy little project; maybe one day the classes will come in handy again for something more important.
I receive the RSS in XML format, but it contains a lot of div tags and formatting characters which I don't really want in the database of keywords.
What is the best way of removing these unwanted characters and div's?
Thanks,
Ash
If you want to remove the DIV tags WITH content as well:
string start = "<div>";
string end = "</div>";
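// Lazy match from each opening tag to the nearest closing tag; a negated character class built from "</div>" would wrongly stop at any d, i or v in the content
string txt = Regex.Replace(htmlString, Regex.Escape(start) + "(?<data>.*?)" + Regex.Escape(end), string.Empty, RegexOptions.Singleline);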
Input: <xml><div>junk</div>XXX<div>junk2</div></xml>
Output: <xml>XXX</xml>
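The same idea as a self-contained sketch that also tolerates attributes on the opening tag. The class name is made up, and the pattern does not handle nested div elements; for those an HTML parser is a safer choice:

```csharp
using System.Text.RegularExpressions;

public static class DivRemover
{
    // Removes every <div ...>...</div> block together with its content.
    // The lazy ".*?" stops at the first closing tag, so nested divs
    // are not handled correctly by this sketch.
    public static string RemoveDivs(string html)
    {
        return Regex.Replace(
            html,
            @"<div\b[^>]*>.*?</div>",
            string.Empty,
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }
}
```

Calling DivRemover.RemoveDivs on the input shown above yields the output shown above.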
IMHO the easiest way is to use regular expressions. Something like:
string txt = Regex.Replace(htmlString, @"<(.|\n)*?>", string.Empty);
Depending on which tags and characters you want to remove you will modify the regex, of course. You will find a lot of material on this and other methods if you do a web search for 'strip html C#'.
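A short sketch of the tag-stripping variant, which keeps the text content and drops only the markup. The class name is hypothetical, and a regex like this is only a rough approximation for arbitrary HTML (it ignores comments, scripts and ">" inside attribute values):

```csharp
using System.Text.RegularExpressions;

public static class TagStripper
{
    // Removes anything that looks like a tag ("<" up to the next ">")
    // and leaves the text between the tags in place.
    public static string StripTags(string html)
    {
        return Regex.Replace(html, @"<[^>]+>", string.Empty);
    }
}
```

For instance, TagStripper.StripTags("<div>Markets <b>fall</b></div>") returns "Markets fall".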
SO question Render or convert Html to ‘formatted’ Text (.NET) might help you, too.
Stripping HTML tags from a given string is a common requirement and you can probably find many resources online that do it for you.
The accepted method, however, is to use a Regular expression based Search and Replace. This article provides a good sample along with benchmarks. Another point worth mentioning is that you would require separate Regex based lookups for the different kinds of unwanted characters you are seeing. (Perhaps showing us an example of the HTML you receive would help)
Note that your requirements may vary based on which tags you want to remove. In your question, you only mention DIV tags. If that is the only tag you need to replace, a simple string search and replace should suffice.
A regular expression such as this:
<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\1>
would match each paired HTML element (opening tag, content, and closing tag) when applied case-insensitively.
Use this to remove them from your data.
