c# Parse URL in text

I have a sentence that may contain URLs. I need to take any URL in uppercase that starts with WWW. and prepend HTTP:// to it. I have tried the following:
private string ParseUrlInText(string text)
{
string currentText = text;
foreach (string word in currentText.Split(new[] { "\r\n", "\n", " ", "</br>" }, StringSplitOptions.RemoveEmptyEntries))
{
string thing;
if (word.ToLower().StartsWith("www."))
{
if (IsAllUpper(word))
{
thing = "HTTP://" + word;
currentText = ReplaceFirst(currentText, word, thing);
}
}
}
return currentText;
}
public string ReplaceFirst(string text, string search, string replace)
{
int pos = text.IndexOf(search);
if (pos < 0)
{
return text;
}
return text.Substring(0, pos) + replace + text.Substring(pos + search.Length);
}
private static bool IsAllUpper(string input)
{
return input.All(t => !Char.IsLetter(t) || Char.IsUpper(t));
}
However, it's only appending multiple HTTP:// to the first URL when I use the following:
WWW.GOOGLE.CO.ZA
WWW.GOOGLE.CO.ZA WWW.GOOGLE.CO.ZA
HTTP:// WWW.GOOGLE.CO.ZA
there are a lot of domains (This shouldn't be parsed)
to
HTTP:// WWW.GOOGLE.CO.ZA
HTTP:// WWW.GOOGLE.CO.ZA HTTP:// WWW.GOOGLE.CO.ZA
HTTP:// WWW.GOOGLE.CO.ZA
there are a lot of domains (This shouldn't be parsed)
Please could someone show me the proper way to do this?
Edit: I need to keep the format of the string (spaces, newlines, etc.).
Edit2: A URL might already have HTTP:// prepended. I've updated the demo.

The issue with your code: you're using a ReplaceFirst method, which does exactly what it's meant to do: it replaces the first occurrence, which is not always the one you want to replace. This is why only your first WWW.GOOGLE.CO.ZA gets all the HTTP:// prefixes.
One approach would be to use a StreamReader or similar and, each time you reach a new word, check whether its first four characters are "WWW." and insert "HTTP://" at that position. But that is pretty long-winded for something that can be much shorter...
So let's go Regex!
How to insert characters before a word with Regex (the parentheses create the group that $1 refers to):
Regex.Replace(input, @"([abc])", "adding_text_before_match$1");
How to match words not starting with another word:
(?<!wont_start_with_that)word_to_match
Which leads us to:
private string ParseUrlInText(string text)
{
return Regex.Replace(text, @"(?<!HTTP://)(WWW\.[A-Za-z0-9_\.]+)",
@"HTTP://$1");
}
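
As a hedged sketch of the same lookbehind idea, the variant below also restricts the match to upper-case hosts, which is what the question asked for. The names UrlPrefixDemo and PrefixUpperCaseUrls and the exact character class are assumptions for illustration, not part of the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlPrefixDemo
{
    // Assumed character set for a host: upper-case letters, digits, '_', '.', '-'.
    private const string Pattern =
        @"(?<!HTTP://)\bWWW\.[A-Z0-9_.\-]+(?![A-Za-z0-9_.\-])";

    // Prefixes HTTP:// to every upper-case WWW. host that does not already have it.
    // The trailing negative lookahead rejects mixed-case hosts such as WWW.Google.com.
    // Everything outside the matches (spaces, newlines, </br>) is left untouched.
    public static string PrefixUpperCaseUrls(string text)
    {
        return Regex.Replace(text, Pattern, "HTTP://$0");
    }
}
```

For example, PrefixUpperCaseUrls("WWW.GOOGLE.CO.ZA HTTP://WWW.GOOGLE.CO.ZA www.google.co.za") returns "HTTP://WWW.GOOGLE.CO.ZA HTTP://WWW.GOOGLE.CO.ZA www.google.co.za". Because the regex is case-sensitive by default, lower-case URLs are skipped without a separate IsAllUpper check.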

I'd go for the following:
1) You don't handle same elements twice,
2) You replace all instances once
private string ParseUrlInText(string text)
{
string currentText = text;
var workingText = currentText.Split(new[] { "\r\n", "\n", " ", "</br>" },
StringSplitOptions.RemoveEmptyEntries).Distinct(); // .Distinct() gives us just unique entries!
foreach (string word in workingText)
{
string thing;
if (word.ToLower().StartsWith("www."))
{
if (IsAllUpper(word))
{
thing = "HTTP://" + word;
currentText = currentText.Replace("\r\n" + word, "\r\n" + thing)
.Replace("\n" + word, "\n" + thing)
.Replace(" " + word, " " + thing)
.Replace("</br>" + word, "</br>" + thing);
}
}
}
return currentText;
}

Related

Regex catch string between strings

I created a small function to catch a string between strings.
public static string[] _StringBetween(string sString, string sStart, string sEnd)
{
if (sStart == "" && sEnd == "")
{
return null;
}
string sPattern = sStart + "(.*?)" + sEnd;
MatchCollection rgx = Regex.Matches(sString, sPattern);
if (rgx.Count < 1)
{
return null;
}
string[] matches = new string[rgx.Count];
for (int i = 0; i < matches.Length; i++)
{
matches[i] = rgx[i].ToString();
//MessageBox.Show(matches[i]);
}
return matches;
}
However, if I call my function like this: _StringBetween("[18][20][3][5][500][60]", "[", "]");
It fails. One fix would be to change this line to string sPattern = "\\" + sStart + "(.*?)" + "\\" + sEnd;
However, I cannot, because I don't know whether the delimiter is going to be a bracket or a word.
Sorry if this is a stupid question, but I couldn't find anything similar by searching.

A way would be if i changed this line string sPattern = "\\" + sStart + "(.*?)" + "\\" + sEnd; However i can not because i don't know if the character is going to be a bracket or a word.
You can escape all meta-characters by calling Regex.Escape:
string sPattern = Regex.Escape(sStart) + "(.*?)" + Regex.Escape(sEnd);
This would cause the content of sStart and sEnd to be interpreted literally.
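
A minimal sketch of the escaped version follows, assuming the caller wants only the text between the delimiters rather than the whole match. The names BetweenDemo and StringsBetween are hypothetical, and unlike the original function this one returns an empty array instead of null when nothing matches.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BetweenDemo
{
    // Returns the text found between each start/end delimiter pair.
    // Regex.Escape makes "[" and "]" (or any other metacharacter) match literally.
    public static string[] StringsBetween(string input, string start, string end)
    {
        string pattern = Regex.Escape(start) + "(.*?)" + Regex.Escape(end);
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Groups[1].Value) // group 1 is the lazy (.*?) capture
                    .ToArray();
    }
}
```

With this sketch, StringsBetween("[18][20][3][5][500][60]", "[", "]") yields 18, 20, 3, 5, 500 and 60, and word delimiters such as "start" and "end" work through the same code path.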

Insert at index of search term substring

I am trying to highlight search terms in display results. Generally it works OK based on code found here on SO. My issue with it is that it replaces the substring with the search term, i.e. in this example it will replace "LOVE" with "love" (unacceptable). So I was thinking I probably want to find the index of the start of the substring, do an INSERT of the opening <span> tag, and do similar at the end of the substring. As yafs may be quite long I'm also thinking I need to integrate stringbuilder into this. Is this do-able, or is there a better way? As always, thank you in advance for your suggestions.
string yafs = "Looking for LOVE in all the wrong places...";
string searchTerm = "love";
yafs = yafs.ReplaceInsensitive(searchTerm, "<span style='background-color: #FFFF00'>"
+ searchTerm + "</span>");

how about this:
public static string ReplaceInsensitive(string yafs, string searchTerm) {
return Regex.Replace(yafs, "(" + searchTerm + ")", "<span style='background-color: #FFFF00'>$1</span>", RegexOptions.IgnoreCase);
}
update:
public static string ReplaceInsensitive(string yafs, string searchTerm) {
return Regex.Replace(yafs,
"(" + Regex.Escape(searchTerm) + ")",
"<span style='background-color: #FFFF00'>$1</span>",
RegexOptions.IgnoreCase);
}
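
The same idea can be sketched with a MatchEvaluator, which keeps the original casing of every hit and avoids any surprises from $ sequences in the replacement string. HighlightDemo and Highlight are hypothetical names, and the empty-term guard is an added assumption rather than part of the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HighlightDemo
{
    // Wraps every case-insensitive occurrence of searchTerm in a highlight span,
    // re-emitting the matched text (m.Value) so "LOVE" stays "LOVE".
    public static string Highlight(string text, string searchTerm)
    {
        if (string.IsNullOrEmpty(searchTerm))
        {
            return text; // an empty pattern would otherwise match at every position
        }

        return Regex.Replace(
            text,
            Regex.Escape(searchTerm),
            m => "<span style='background-color: #FFFF00'>" + m.Value + "</span>",
            RegexOptions.IgnoreCase);
    }
}
```

Calling Highlight("Looking for LOVE in all the wrong places...", "love") returns "Looking for <span style='background-color: #FFFF00'>LOVE</span> in all the wrong places...".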

Check this code
private static string ReplaceInsensitive(string text, string oldtext,string newtext)
{
int indexof = text.IndexOf(oldtext,0,StringComparison.InvariantCultureIgnoreCase);
while (indexof != -1)
{
text = text.Remove(indexof, oldtext.Length);
text = text.Insert(indexof, newtext);
indexof = text.IndexOf(oldtext, indexof + newtext.Length ,StringComparison.InvariantCultureIgnoreCase);
}
return text;
}

Does what you need:
static void Main(string[] args)
{
string yafs = "Looking for LOVE in all the wrong love places...";
string searchTerm = "LOVE";
Console.Write(ReplaceInsensitive(yafs, searchTerm));
Console.Read();
}
private static string ReplaceInsensitive(string yafs, string searchTerm)
{
StringBuilder sb = new StringBuilder();
foreach (string word in yafs.Split(' '))
{
string tempStr = word;
if (word.ToUpper() == searchTerm.ToUpper())
{
tempStr = word.Insert(0, "<span style='background-color: #FFFF00'>");
int len = tempStr.Length;
tempStr = tempStr.Insert(len, "</span>");
}
sb.AppendFormat("{0} ", tempStr);
}
return sb.ToString();
}
Gives:
Looking for <span style='background-color: #FFFF00'>LOVE</span> in all the wrong <span style='background-color: #FFFF00'>love</span> places...

Replace closest instance of a word

I have a string like this:
“I’m a member of the Imperial Senate on a diplomatic mission to Alderaan.”
I want to insert <strong> around the "a" in "a diplomatic", but nowhere else.
What I have as input is diplomatic from a previous function, and I want to add <strong> to the closest instance of "a".
Right now, of course, when I use .Replace("a", "<strong>a</strong>"), every single instance of "a" receives the <strong> treatment, but is there any way to apply this to just the one I want?
Edit
The string and word/char ("a" in the case above) could be anything, as I'm looping through a lot of these, so the solution has to be dynamic.

var stringyourusing = "";
var letter = "";
var regex = new Regex(Regex.Escape(letter));
var newText = regex.Replace(stringyourusing, "<strong>" + letter + "</strong>", 1); // the final argument limits it to the first match only

Would this suffice?
string MakeStrongBefore(string strong, string before, string s)
{
return s.Replace(strong + " " + before, "<strong>" + strong + "</strong> " + before);
}
Used like this:
string s = "I’m a member of the Imperial Senate on a diplomatic mission to Alderaan.";
string bolded = MakeStrongBefore("a", "diplomatic", s);

Try this:
public string BoldBeforeString(string source, string bolded,
int boldBeforePosition)
{
string beforeSelected = source.Substring(0, boldBeforePosition).TrimEnd();
int testedWordStartIndex = beforeSelected.LastIndexOf(' ') + 1;
string boldedString;
if (beforeSelected.Substring(testedWordStartIndex).Equals(bolded))
{
boldedString = source.Substring(0, testedWordStartIndex) +
"<strong>" + bolded + "</strong>" +
source.Substring(testedWordStartIndex + bolded.Length);
}
else
{
boldedString = source;
}
return boldedString;
}
string phrase = "I’m a member of the Imperial Senate on a diplomatic mission to Alderaan.";
string boldedPhrase = BoldBeforeString(phrase, "a", 41);

Hei!
I've tested this and it works:
String replaced = Regex.Replace(
"I’m a member of the Imperial Senate on a diplomatic mission to Alderaan.",
// The lookahead keeps " diplomatic" out of the match, so only "a" is replaced.
@"(a)(?= diplomatic)",
match => "<strong>" + match.Result("$1") + "</strong>");
So to make it a general function:
public static String StrongReplace(String sentence, String toStrong, String wordAfterStrong)
{
return Regex.Replace(
sentence,
@"(" + Regex.Escape(toStrong) + @")(?= " + Regex.Escape(wordAfterStrong) + ")",
match => "<strong>" + match.Result("$1") + "</strong>");
}
Usage:
String sentence = "I’m a member of the Imperial Senate on a diplomatic mission to Alderaan.";
String replaced = StrongReplace(sentence, "a", "diplomatic");
edit:
considering your other comments, this is a function for placing strong tags around each word surrounding the search word:
public static String StrongReplace(String sentence, String word)
{
return Regex.Replace(
sentence,
@"(\w+) " + Regex.Escape(word) + @" (\w+)",
match => "<strong>" + match.Result("$1") + "</strong> " + word + " <strong>" + match.Result("$2") + "</strong>");
}
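
If "closest" is taken to mean the last whole-word occurrence before the anchor word, the search can be sketched as below. NearestDemo and StrongNearestBefore are hypothetical names, and the sketch assumes the word to emphasize starts and ends with word characters, so that \b behaves as expected.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NearestDemo
{
    // Wraps the last whole-word occurrence of `word` that appears before
    // the first occurrence of `anchor` in <strong> tags.
    public static string StrongNearestBefore(string sentence, string word, string anchor)
    {
        int anchorPos = sentence.IndexOf(anchor, StringComparison.Ordinal);
        if (anchorPos < 0)
        {
            return sentence; // anchor not present: nothing to do
        }

        // Only search the text that precedes the anchor.
        MatchCollection hits = Regex.Matches(
            sentence.Substring(0, anchorPos),
            @"\b" + Regex.Escape(word) + @"\b");
        if (hits.Count == 0)
        {
            return sentence;
        }

        Match last = hits[hits.Count - 1]; // the hit closest to the anchor
        return sentence.Substring(0, last.Index)
             + "<strong>" + last.Value + "</strong>"
             + sentence.Substring(last.Index + last.Length);
    }
}
```

Unlike the fixed "a diplomatic" pattern, this also works when other words sit between the two, at the cost of only looking to the left of the anchor.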

String: replace last ".something" in a string?

I have some string and I would like to replace the last .something with a new string. As example:
string replace = ".new";
blabla.test.bla.text.jpeg => blabla.test.bla.text.new
testfile_this.00001...csv => testfile_this.00001...new
So no matter how many dots there are, I'd like to change only the last one and whatever comes after it.
I saw that C# has Path.ChangeExtension, but it seems to work only in combination with a file. Is there no way to use it with a plain string? Do I really need regex?

string replace = ".new";
string p = "blabla.test.bla.text.jpeg";
Console.WriteLine(Path.GetFileNameWithoutExtension(p) + replace);
Output:
blabla.test.bla.text.new

ChangeExtension should work as advertised;
string replace = ".new";
string file = "testfile_this.00001...csv";
file = Path.ChangeExtension(file, replace);
>> testfile_this.00001...new

You can use string.LastIndexOf('.'):
string replace = ".new";
string test = "blabla.test.bla.text.jpeg";
string newString = test;
int pos = test.LastIndexOf('.');
if (pos >= 0)
newString = test.Substring(0, pos) + replace; // replace already contains the dot, so cut at pos, not pos - 1
Of course, some checking is required to be sure that LastIndexOf actually finds a dot.
However, seeing the other answers, let me say that, while Path.ChangeExtension works, it doesn't feel right to me to use a method from an operating-system-dependent file handling class to manipulate a string. (Of course, if this string really is a filename, then my objection is invalid.)

string s = "blabla.test.bla.text.jpeg";
s = s.Substring(0, s.LastIndexOf(".")) + replace;

No you don't need regular expressions for this. Just .LastIndexOf and .Substring will suffice.
string replace = ".new";
string input = "blabla.bla.test.jpg";
string output = input.Substring(0, input.LastIndexOf('.')) + replace;
// output = "blabla.bla.test.new"
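
Wrapped into a helper with a guard for input that contains no dot, the approach might look like the sketch below. LastSegmentDemo and ReplaceLastSegment are hypothetical names, and returning the input unchanged when there is no dot is an assumption (Path.ChangeExtension appends the new extension in that case).

```csharp
using System;

public static class LastSegmentDemo
{
    // Replaces the last '.' and everything after it with `replacement`.
    // Input without any '.' is returned unchanged (assumed behavior).
    public static string ReplaceLastSegment(string input, string replacement)
    {
        int pos = input.LastIndexOf('.');
        if (pos < 0)
        {
            return input;
        }
        return input.Substring(0, pos) + replacement;
    }
}
```

ReplaceLastSegment("testfile_this.00001...csv", ".new") gives "testfile_this.00001...new", matching the second example in the question.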

Please use this function.
public string ReplaceStirng(string originalSting, string replacedString)
{
try
{
string[] subString = originalSting.Split('.');
// Rejoin everything except the last segment, keeping the dots between the parts.
string stem = string.Join(".", subString, 0, subString.Length - 1);
return stem + replacedString;
}
catch (Exception ex)
{
if (log.IsErrorEnabled)
log.Error("[" + System.DateTime.Now.ToString() + "] " + System.Reflection.MethodBase.GetCurrentMethod().DeclaringType.FullName + " :: " + System.Reflection.MethodBase.GetCurrentMethod().Name + " :: ", ex);
throw;
}
}

How to trim whitespace between characters

How to remove whitespaces between characters in c#?
Trim() can be used to remove the empty spaces at the beginning of the string as well as at the end. For example, " C Sharp ".Trim() results in "C Sharp".
But how do I turn the string into CSharp? We can remove the space using a for or a foreach loop along with a temporary variable. But is there any built-in method in C# (.NET Framework 3.5) to do this, like Trim()?

You could use String.Replace method
string str = "C Sharp";
str = str.Replace(" ", "");
or if you want to remove all whitespace characters (space, tabs, line breaks...)
string str = "C Sharp";
str = Regex.Replace(str, @"\s", "");
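
For comparison, here is a small sketch of both variants that should also compile against .NET 3.5: one strips all whitespace with LINQ, the other collapses runs of whitespace to a single space. WhitespaceDemo, RemoveWhitespace and CollapseWhitespace are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WhitespaceDemo
{
    // Drops every whitespace character (spaces, tabs, line breaks).
    public static string RemoveWhitespace(string s)
    {
        return new string(s.Where(c => !char.IsWhiteSpace(c)).ToArray());
    }

    // Trims the ends and reduces each inner run of whitespace to one space.
    public static string CollapseWhitespace(string s)
    {
        return Regex.Replace(s.Trim(), @"\s+", " ");
    }
}
```

RemoveWhitespace(" C Sharp ") returns "CSharp", while CollapseWhitespace on the same text with several inner spaces returns "C Sharp".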

If you want to keep one space between every word. You can do it this way as well:
string.Join(" ", inputText.Split(new char[0], StringSplitOptions.RemoveEmptyEntries).ToList().Select(x => x.Trim()));

Use String.Replace to replace all white space with nothing.
eg
string newString = myString.Replace(" ", "");

If you want to remove all spaces in one word:
input.Trim().Replace(" ","")
And if you want to remove extra spaces in a sentence, use a regular expression instead, because String.Replace only matches literal text:
Regex.Replace(input.Trim(), " +", " ")
The regex " +" matches one or more consecutive space characters, and each run is replaced with a single space.

If you want to keep one space between every word. this should do it..
public static string TrimSpacesBetweenString(string s)
{
var mystring =s.RemoveTandNs().Split(new string[] {" "}, StringSplitOptions.None);
string result = string.Empty;
foreach (var mstr in mystring)
{
var ss = mstr.Trim();
if (!string.IsNullOrEmpty(ss))
{
result = result + ss+" ";
}
}
return result.Trim();
}
It removes the extra spaces between the words (RemoveTandNs is a helper extension method that is not shown here),
so if the input is
var s ="c      sharp";
the result will be "c sharp".

//Remove spaces from a string just using substring method and a for loop
static void Main(string[] args)
{
string businessName;
string newBusinessName = "";
int i;
Write("Enter a business name >>> ");
businessName = ReadLine();
for(i = 0; i < businessName.Length; i++)
{
if (businessName.Substring(i, 1) != " ")
{
newBusinessName += businessName.Substring(i, 1);
}
}
WriteLine("A cool web site name could be www.{0}.com", newBusinessName);
}

var str=" c sharp "; str = str.Trim();
str = Regex.Replace(str, @"\s+", " "); // "c sharp"

string myString = "C Sharp".Replace(" ", "");

I found this method great for things like building a class with a calculated property that takes, let's say, a "productName" and strips the whitespace out to create a URL pointing at an image named after the product, with no spaces. For instance:
namespace XXX.Models
{
public class Product
{
public int ProductID { get; set; }
public string ProductName { get; set; }
public string ProductDescription { get; set; }
public string ProductImage
{
get { return ProductName.Replace(" ", string.Empty) + ".jpg"; }
}
}
}
So in this answer I have used a method very similar to w69rdy's, but shown it in an example, and I used string.Empty instead of "". Although there has been no difference since .NET 2.0, I find string.Empty much easier to read and understand for others who might need to read my code. I also prefer it because I sometimes get lost in all the quotes in a code block.
