Extract multiple values from string using C#


I'm creating my own forum and I've got a problem with quoting messages. I know how to add a quoted message to the text box, but I cannot figure out how to extract values from the string after the post. In the text box I've got something like this:
[quote IdPost=8] Some quoting text [/quote]
[quote IdPost=15] Second quoting text [/quote]
Could you tell me the easiest way to extract all the "IdPost" numbers from the string after the form is posted?

By using a regex:
@"\[quote IdPost=(\d+)\]"
Something like:
Regex reg = new Regex(@"\[quote IdPost=(\d+)\]");
foreach (Match match in reg.Matches(text))
{
...
}
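To make the loop body concrete, here is a minimal sketch that collects every captured IdPost value into a list. The class name QuoteIdExtractor and the method name ExtractPostIds are made up for illustration; the pattern is the one from the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuoteIdExtractor
{
    // Matches "[quote IdPost=<digits>]" and captures the digits in group 1.
    private static readonly Regex QuoteTag = new Regex(@"\[quote IdPost=(\d+)\]");

    // Hypothetical helper: returns every IdPost value found in the text, in order.
    public static List<int> ExtractPostIds(string text)
    {
        var ids = new List<int>();
        foreach (Match match in QuoteTag.Matches(text))
        {
            int id;
            // TryParse skips digit runs that do not fit in an int.
            if (int.TryParse(match.Groups[1].Value, out id))
            {
                ids.Add(id);
            }
        }
        return ids;
    }

    public static void Main()
    {
        string text = "[quote IdPost=8] Some quoting text [/quote]\n" +
                      "[quote IdPost=15] Second quoting text [/quote]";
        Console.WriteLine(string.Join(", ", ExtractPostIds(text))); // prints "8, 15"
    }
}
```

Because Matches scans the whole input, the same call works however many quote blocks the post contains.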

var originalstring = "[quote IdPost=8] Some quoting text [/quote]";
//"[quote IdPost=" and "8] Some quoting text [/quote]"
var splits = originalstring.Split('=');
if(splits.Count() == 2)
{
//"8" and "] Some quoting text [/quote]"
var splits2 = splits[1].Split(']');
int id;
if(int.TryParse(splits2[0], out id))
{
return id;
}
}

I do not know exactly what your string looks like, but here is a regex-free solution using Substring:
using System;
public class Program
{
public static void Main()
{
string source = "[quote IdPost=8] Some quoting text [/quote]";
Console.WriteLine(ExtractNum(source, "=", "]"));
Console.WriteLine(ExtractNum2(source, "[quote IdPost="));
}
public static string ExtractNum(string source, string start, string end)
{
int index = source.IndexOf(start) + start.Length;
return source.Substring(index, source.IndexOf(end) - index);
}
// just another solution for fun
public static string ExtractNum2(string source, string junk)
{
source = source.Substring(junk.Length, source.Length - junk.Length); // erase start
return source.Remove(source.IndexOf(']')); // erase end
}
}

Related

How to extract email from html link

Hi, I have a CSV file in which I need to format the email column. The values appear in the CSV as follows:
john@domain.com"
dave.h@domain22.co.uk"
etc...
So I want to remove the " and just use john@domain.com
I have the following:
foreach (var clientI in clientImportList)
{
newClient = new DomainObjects.Client();
//Remove unwanted email text??
newClient.Email = clientI.Email
}

I would suggest using HtmlAgilityPack rather than parsing it yourself:
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
// SelectNodes returns null when nothing matches, so guard against that if the input may contain no links.
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
{
string href = link.Attributes["href"].Value;
// href is "mailto:john@domain.com" here..
}

You can test regular expressions here:
https://regex101.com/
Using your example, this seems to work:
mailto:(.*?)\\">
The namespace needed for regex is:
using System.Text.RegularExpressions;
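As a rough sketch of how a pattern like this could be applied in C#: the original HTML cell was not preserved in the question, so the sample input below is an assumption, and the class name MailtoExtractor and method name ExtractEmail are hypothetical. The pattern used here is a slightly stricter variant of the one above: it stops at the first quote, backslash, '>' or whitespace after "mailto:".

```csharp
using System;
using System.Text.RegularExpressions;

public static class MailtoExtractor
{
    // Captures the characters after "mailto:" up to the first quote,
    // backslash, '>' or whitespace character.
    private static readonly Regex Mailto =
        new Regex(@"mailto:([^""\\>\s]+)", RegexOptions.IgnoreCase);

    // Hypothetical helper: returns the address, or null when no mailto link is found.
    public static string ExtractEmail(string html)
    {
        Match match = Mailto.Match(html);
        return match.Success ? match.Groups[1].Value : null;
    }

    public static void Main()
    {
        // Assumed sample cell; the real CSV content may differ.
        string cell = "<a href=\"mailto:john@domain.com\">john@domain.com</a>";
        Console.WriteLine(ExtractEmail(cell)); // prints "john@domain.com"
    }
}
```

Returning null for a missing link lets the import loop decide whether to skip the row or keep the raw value.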

I usually write myself little utility classes and extensions to handle things like this. Since this probably won't be the last time you have to do something like this you could do this:
Create an Extension of the string class:
public static class StringExtensions
{
public static string ExtractMiddle(this string text, string front, string back)
{
// Skip the whole front marker, not just its first character.
text = text.Substring(text.IndexOf(front) + front.Length);
return text.Remove(text.IndexOf(back));
}
}
And then do this (could use better naming, but you get the point):
string emailAddress = text.ExtractMiddle(">", "<");

If you want to do it the index way, something like:
const string start = "<a href=\\mailto:";
const string end = "\\\">";
string asd1 = "john@domain.com\"";
int index1 = asd1.IndexOf(start);
int startPosition = index1 + start.Length;
int endPosition = asd1.IndexOf(end);
string email = asd1.Substring(startPosition, endPosition - startPosition);

Ignore special characters in Examine

In Umbraco, I use Examine to search the website, but the content is in French. Everything works fine except that searching for "Français" does not give the same results as "Francais". Is there a way to ignore those French characters? I tried to find a French analyzer for Lucene/Examine but did not find anything. I use Fuzzy so that it returns results even if the words are not identical.
Here's the code of my search :
public static ISearchResults Search(string searchTerm)
{
var provider = ExamineManager.Instance.SearchProviderCollection["ExternalSearcher"];
var criteria = provider.CreateSearchCriteria(BooleanOperation.Or);
var crawl = criteria.GroupedOr(BoostedSearchableFields, searchTerm.Boost(15))
.Or().GroupedOr(BoostedSearchableFields, searchTerm.Fuzzy(Fuzziness))
.Or().GroupedOr(SearchableFields, searchTerm.Fuzzy(Fuzziness))
.Not().Field("umbracoNavHide", "1");
return provider.Search(crawl.Compile());
}

We ended up using a custom analyzer based on the SnowballAnalyzer:
public class CustomAnalyzer : SnowballAnalyzer
{
public CustomAnalyzer() : base("French") { }
public override TokenStream TokenStream(string fieldName, TextReader reader)
{
TokenStream result = base.TokenStream(fieldName, reader);
result = new ISOLatin1AccentFilter(result);
return result;
}
}

Try using Regex like this below:
var strInput ="Français";
var strToReplace = string.Empty;
var sNewString = Regex.Replace(strInput, "[^A-Za-z0-9]", strToReplace);
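I've used the pattern "[^A-Za-z0-9]" to replace every non-alphanumeric character with an empty string. Note that this removes accented letters outright, so "Français" becomes "Franais" rather than "Francais".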
Hope it helps.

You can actually convert Unicode characters with diacritics to their unaccented English equivalents using the following method. That will enable you to search for "Français" with the search term "Francais".
public static string RemoveDiacritics(this string text)
{
if (string.IsNullOrWhiteSpace(text))
return text;
text = text.Normalize(NormalizationForm.FormD);
var chars = text.Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark).ToArray();
return new string(chars).Normalize(NormalizationForm.FormC);
}
Use it on any string like this:
var converted = unicodeString.RemoveDiacritics();
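For completeness, here is a self-contained sketch of the same method with the using directives and the static class an extension method needs. The class name AccentInsensitive and the helper SameIgnoringAccents are assumptions added for illustration; how the normalized text is fed into Examine is not shown.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

public static class AccentInsensitive
{
    public static string RemoveDiacritics(this string text)
    {
        if (string.IsNullOrWhiteSpace(text))
            return text;
        // FormD splits each accented letter into a base letter plus combining marks.
        string decomposed = text.Normalize(NormalizationForm.FormD);
        char[] kept = decomposed
            .Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            .ToArray();
        return new string(kept).Normalize(NormalizationForm.FormC);
    }

    // Hypothetical helper: compares two terms ignoring accents and case.
    public static bool SameIgnoringAccents(string a, string b)
    {
        return string.Equals(a.RemoveDiacritics(), b.RemoveDiacritics(),
                             StringComparison.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        // "\u00e7" is the c with cedilla in "Français".
        Console.WriteLine("Fran\u00e7ais".RemoveDiacritics());                 // prints "Francais"
        Console.WriteLine(SameIgnoringAccents("Fran\u00e7ais", "francais"));   // prints "True"
    }
}
```

For search, the same normalization would have to be applied both to the indexed text and to the search term, otherwise the two sides still differ.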

How to replace only the first occurrence of www

In my code-behind in C# I have the following code. How do I change the replace so that only
the first occurrence of www is replaced?
For example, if the user enters www.testwww.com, I should be saving it as testwww.com.
Currently the code below saves it as www.com (I guess because of the Substring code).
Please help. Thanks in advance.
private string FilterUrl(string url)
{
string lowerCaseUrl = url.ToLower();
lowerCaseUrl = lowerCaseUrl.Replace("http://", string.Empty).Replace("https://", string.Empty).Replace("ftp://", string.Empty);
lowerCaseUrl = lowerCaseUrl.Replace("www.", string.Empty);
string lCaseUrl = url.Substring(url.Length - lowerCaseUrl.Length, lowerCaseUrl.Length);
return lCaseUrl;
}

As Ally suggested, you are much better off using System.Uri. This also removes the leading www as you wish.
private string FilterUrl(string url)
{
Uri uri = new UriBuilder(url).Uri; // defaults to http:// if missing
return Regex.Replace(uri.Host, @"^www\.", "") + uri.PathAndQuery; // escape the dot so it only matches a literal "."
}
Edit: The trailing slash comes from the PathAndQuery property. If there is no path, you are left with only the slash. Just add another regex replace or string replace to remove it. Here's the regex way:
return Regex.Replace(uri.Host, @"^www\.", "") + Regex.Replace(uri.PathAndQuery, "/$", "");

I would suggest using IndexOf(string) to find the first occurrence.
Edit: okay someone beat me to it ;)
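The IndexOf suggestion could look roughly like the sketch below. The class name FirstOccurrence and the method name ReplaceFirst are invented for the example, and the case-insensitive comparison is an assumption about what the asker wants.

```csharp
using System;

public static class FirstOccurrence
{
    // Hypothetical helper: replaces only the first match of 'search', ignoring case.
    public static string ReplaceFirst(string source, string search, string replacement)
    {
        int position = source.IndexOf(search, StringComparison.OrdinalIgnoreCase);
        if (position < 0)
        {
            return source; // nothing to replace
        }
        // Keep the text before the match, insert the replacement, keep the rest untouched.
        return source.Substring(0, position)
             + replacement
             + source.Substring(position + search.Length);
    }

    public static void Main()
    {
        Console.WriteLine(ReplaceFirst("www.testwww.com", "www.", "")); // prints "testwww.com"
    }
}
```

Searching for "www." rather than "www" keeps the second "www" in "testwww.com" from ever being a candidate.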

You could use IndexOf like Felipe suggested, or do it the low-tech way. Strip the longer "www." prefixes first, otherwise the plain scheme replacements remove "http://" before the longer ones can match:
lowerCaseUrl = lowerCaseUrl.Replace("http://www.", string.Empty).Replace("https://www.", string.Empty).Replace("http://", string.Empty).Replace("https://", string.Empty).Replace("ftp://", string.Empty);
Would be interested to know what you're trying to achieve.

Came up with a cool static method, also works for replacing the first x occurrences:
public static string ReplaceOnce(this string s, string replace, string with)
{
return s.ReplaceCount(replace, with);
}
public static string ReplaceCount(this string s, string replace, string with, int howManytimes = 1)
{
if (howManytimes < 0) throw new InvalidOperationException("can not replace a string less than zero times");
int count = 0;
while (s.Contains(replace) && count < howManytimes)
{
int position = s.IndexOf(replace);
s = s.Remove(position, replace.Length);
s = s.Insert(position, with);
count++;
}
return s;
}
The ReplaceOnce isn't necessary, just a simplifier. Call it like this:
string url = "http://www.stackoverflow.com/questions/www/www";
var urlR1 = url.ReplaceOnce("www", "xxx");
// urlR1 = "http://xxx.stackoverflow.com/questions/www/www";
var urlR2 = url.ReplaceCount("www", "xxx", 2);
// urlR2 = "http://xxx.stackoverflow.com/questions/xxx/www";
NOTE: this is case-sensitive as it is written

The Replace method replaces every occurrence in the string. You have to locate the piece you want to remove using the IndexOf method, and remove it using the Remove method of string. Try something like this:
//include the namespace
using System.Globalization;
private string FilterUrl(string url)
{
// create a CompareInfo object.
CompareInfo myCompare = CultureInfo.InvariantCulture.CompareInfo;
// find the 'www.' on the url parameter ignoring the case.
int position = myCompare.IndexOf(url, "www.", CompareOptions.IgnoreCase);
// check if exists 'www.' on the string.
if (position > -1)
{
// 'www.' is four characters long, so remove exactly those four.
url = url.Remove(position, 4);
}
//if you want to remove http://, https://, ftp://.. keep this line
url = url.Replace("http://", string.Empty).Replace("https://", string.Empty).Replace("ftp://", string.Empty);
return url;
}
Edits
There was a part of your code that removed a piece of the string. If you just want to remove 'www.' and 'http://', 'https://', 'ftp://', take a look at this code.
This code also ignores case when it searches the url parameter for 'www.'.

Beginner logic development

I am a beginner programmer in C# who just got started. I have a task at hand where a program needs to read a string and perform some string manipulation. The UI provides a TextBox and all the options below as CheckBoxes. User can select any or all.
Remove any spaces.
Remove any special chars like ',' etc.
Remove any numbers.
Convert to camelCase.
There can be more options as part of the string cleanup. I have written the string processing in a method that has a long chain of if ... else ifs ...
I am sure there is a way around.
Appreciate any help.
Thanks for all the solutions, but I think my point was not put across correctly.
The string processing will be done in a particular order depending on the checkbox value.
User might select just one or every option provided. In case there is more than one selected, it should be like
if (RemoveSpaces.Checked)
{
inputString = RemoveSpaces(inputString);
// After removing spaces do the other operations
}
else if (RemoveSpecialChars.Checked)
{
inputString = RemoveSpecialChars(inputString);
// Do other processing
}

For easy string manipulation, use String.Replace.
This code example might also help:
string start = "a b 3 4 5.7";
string noSpace = start.Replace(" ", "");
string noDot = noSpace.Replace(".", "");
string noNumbers = Regex.Replace(noDot, "[0-9]", "");
Console.WriteLine(start);
Console.WriteLine(noSpace);
Console.WriteLine(noDot);
Console.WriteLine(noNumbers);
The output will then be as follows
"a b 3 4 5.7" // start
"ab345.7" // noSpace
"ab3457" // noDot
"ab" // noNumbers

You can create a class with four functions inside, for example:
public static class StringOperations
{
public static string RemoveSpaces(string sourceString)
{
string convertedString = "";
//some operations
return convertedString;
}
public static string RemoveCharacters(string sourceString, params char[] charactersToRemove)
{
string convertedString = "";
//some operations
return convertedString;
}
public static string RemoveAnyNumbers(string sourceString)
{
string convertedString = "";
//some operations
return convertedString;
}
public static string ConvertToCamelCase(string sourceString)
{
string convertedString = "";
//some operations
return convertedString;
}
}
In your UI you just call whichever functions the user selected.
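One way to avoid the chain of if ... else ifs is to keep the operations in an ordered list and run only the enabled ones. The sketch below is one possible shape, not the only one: the class name StringCleanup, the Apply method, and the definition of "special characters" are all assumptions, the bool flags stand in for the CheckBox values, and the camelCase step is left out for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class StringCleanup
{
    public static string RemoveSpaces(string s) { return s.Replace(" ", ""); }

    // "Special" is assumed here to mean anything that is not an ASCII letter, digit, or space.
    public static string RemoveSpecialChars(string s) { return Regex.Replace(s, "[^A-Za-z0-9 ]", ""); }

    public static string RemoveNumbers(string s) { return Regex.Replace(s, "[0-9]", ""); }

    // Runs every enabled step in list order, feeding each result into the next step.
    public static string Apply(string input, IEnumerable<KeyValuePair<bool, Func<string, string>>> steps)
    {
        string result = input;
        foreach (KeyValuePair<bool, Func<string, string>> step in steps)
        {
            if (step.Key) result = step.Value(result);
        }
        return result;
    }

    public static void Main()
    {
        // These flags stand in for the CheckBox.Checked values read from the UI.
        bool removeSpaces = true, removeSpecial = true, removeNumbers = false;

        var steps = new List<KeyValuePair<bool, Func<string, string>>>
        {
            new KeyValuePair<bool, Func<string, string>>(removeSpaces, RemoveSpaces),
            new KeyValuePair<bool, Func<string, string>>(removeSpecial, RemoveSpecialChars),
            new KeyValuePair<bool, Func<string, string>>(removeNumbers, RemoveNumbers)
        };

        Console.WriteLine(Apply("a b, 3 4 5.7", steps)); // prints "ab3457"
    }
}
```

Adding a new cleanup option then means adding one method and one list entry, and the processing order is simply the order of the list.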

C# 2.0 function which will return the formatted string

I am using C# 2.0 and I have got below type of strings:
string id = "tcm:481-191820"; or "tcm:481-191820-32"; or "tcm:481-191820-8"; or "tcm:481-191820-128";
The last part of the string (-32, -8, -128) doesn't matter; whatever the string is, it should produce the result below.
Now I need to write a function that takes one of the above strings as input, something like the one below, and outputs "tcm:0-481-1":
public static string GetPublicationID(string id)
{
//this function will return as below output
return "tcm:0-481-1";
}
Please suggest!!

If final "-1" is static you could use:
public static string GetPublicationID(string id)
{
int a = 1 + id.IndexOf(':');
string first = id.Substring(0, a);
string second = id.Substring(a, id.IndexOf('-') - a);
return String.Format("{0}0-{1}-1", first, second);
}
Or, if "-1" is the first part of the next token, try this:
public static string GetPublicationID(string id)
{
int a = 1 + id.IndexOf(':');
string first = id.Substring(0, a);
string second = id.Substring(a, id.IndexOf('-') - a + 2);
return String.Format("{0}0-{1}", first, second);
}
This syntax works even for different length patterns, assuming that your string is
first_part:second_part-anything_else

All you need is:
string.Format("{0}0-{1}", id.Substring(0,4), id.Substring(4,5));
This just uses substring to get the first four characters and then the next five and put them into the format with the 0- in there.
This does assume that your format is a fixed number of characters in each position (which it is in your example). If the string might be abcd:4812... then you will have to modify it slightly to pick up the right length of strings. See Marco's answer for that technique. I'd advise using his if you need the variable length and mine if the lengths stay the same.
Also as an additional note your original function of returning a static string does work for all of those examples you provided. I have assumed there are other numbers visible but if it is only the suffix that changes then you could happily use a static string (at which point declaring a constant or something rather than using a method would probably work better).

Obligatory Regular Expression Answer:
using System.Text.RegularExpressions;
public static string GetPublicationID(string id)
{
// Regex.Match takes the input first, then the pattern.
Match m = Regex.Match(id, @"tcm:(\d+-\d)");
if(m.Success)
return string.Format("tcm:0-{0}", m.Groups[1].Value);
else
return string.Empty;
}

Regex regxMatch = new Regex("(?<prefix>tcm:)(?<id>\\d+-\\d)(?<suffix>.)*",RegexOptions.Singleline|RegexOptions.Compiled);
string regxReplace = "${prefix}0-${id}";
string GetPublicationID(string input) {
return regxMatch.Replace(input, regxReplace);
}
string test = "tcm:481-191820-128";
string result = GetPublicationID(test);
//result: tcm:0-481-1
