Cutting text from string after matching pattern - c#

I would like to cut out all the text after each <br>, up to the next <br> (or up to the end of the string after the last <br>), for example:
string example1 = "some example<br>text1<br>text2";
//do the magic
int match_count = 2;
string match1 = "text1";
string match2 = "text2";
It's hard to explain this without showing an actual example ;)
Is there an easy way to accomplish this with regex?
P.S. A few more examples of usage:
string example1 = "some example<br>text1";
int match_count = 1;
string match1 = "text1";
and
string example2 = "some example";
int match_count = 0;

One possibility that does not require regular expressions would be to use one of the String.Split overloads:
using System;
using System.Linq; // for Skip()
var input = @"some example<br>text1<br>text2";
// split on every <br>
var chunks = input.Split(new[] { "<br>" }, StringSplitOptions.RemoveEmptyEntries);
// remove the first entry, everything else is wanted result
foreach (var chunk in chunks.Skip(1))
{
Console.WriteLine(chunk);
}
The output is:
text1
text2
You could then easily check how many matches you got: it is chunks.Length - 1, because the first entry is the text before the first <br>.
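Since the question asks whether regex can do this, here is a minimal regex-only sketch (BrSegments and Extract are made-up names). The lookbehind (?<=<br>) only lets a match start right after a <br>, and the tempered token (?:(?!<br>).)* stops it just before the next <br>, so each match is one segment and the array length is match_count.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class BrSegments
{
    // A match must start right after "<br>" (lookbehind) and runs until the next
    // "<br>" or the end of the string (the tempered token refuses to cross "<br>").
    private static readonly Regex AfterBreak = new Regex(@"(?<=<br>)(?:(?!<br>).)*");

    // Hypothetical helper: returns every segment that follows a <br>.
    public static string[] Extract(string input) =>
        AfterBreak.Matches(input)
                  .Cast<Match>()
                  .Select(m => m.Value)
                  .ToArray();
}
```

For "some example" it returns an empty array, so match_count is 0. This assumes the segments contain no line breaks, because . does not match \n unless you pass RegexOptions.Singleline.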

For match_count, you can use the String.Split method alone, like this:
string example1 = "some example<br>text1<br>text2";
int match_count = example1.Split(new[] { "<br>" },
    StringSplitOptions.RemoveEmptyEntries).Length - 1;
For getting the text between tags, take a look at this question:
Get innertext between two tags - VB.NET - HtmlAgilityPack
It is in VB.NET, but you can easily convert it to C#.

Related

How do I extract text that lies between two indicators?

I have a long string with a number of "merge fields", all of the merge fields will be in the following format: <<FieldName>>.
The string will have multiple merge fields of different type i.e. <<FirstName>>, <<LastName>>
How can I loop through the string and find all the merge fields so that I can replace the field with the text?
I will not know all the different Merge fields in the string, the user may enter anything between the two indicators i.e. <<Anything>>
Ideally I would like to stay away from regex, but I'm happy to explore all options.
A regular expression makes the most sense here:
string text = "foo <<FieldName>> foo foo <<FieldName>> foo";
string result = Regex.Replace(text, @"[<]{2}\w*[>]{2}", "bar", RegexOptions.None);
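If you first want to know which merge fields a text contains, a capture group can pull out the names without the angle brackets. A minimal sketch (MergeFields and Names are hypothetical names), assuming field names consist only of word characters; since the user may type anything between the indicators, <<(.+?)>> would be the looser choice.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class MergeFields
{
    // <<(\w+)>> : the parentheses capture the field name without the angle brackets.
    private static readonly Regex Field = new Regex(@"<<(\w+)>>");

    // Returns each distinct field name found in the text.
    public static List<string> Names(string text) =>
        Field.Matches(text)
             .Cast<Match>()
             .Select(m => m.Groups[1].Value)
             .Distinct()
             .ToList();
}
```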
Update, without regex (after the question was updated):
Dictionary<string, string> knownFields = new Dictionary<string, string> { {"<<FirstName>>", "Jon"}, {"<<LastName>>", "Doe"}, {"<<Job>>", "Programmer"}};
string text = "Hello my name is <<FirstName>> <<LastName>> and i work as a <<Job>>";
knownFields.ToList().ForEach(x => text = text.Replace(x.Key, x.Value));
I know you said you want to avoid regular expressions, but it's the right tool for the job.
Dictionary<string,string> fieldToReplacement = new Dictionary<string,string> {
{"<<FirstName>>", "Frank"},
{"<<LastName>>", "Jones"},
{"<<Salutation>>", "Mr."}
};
string text = "Dear <<Salutation>> <<FirstName>> <<LastName>>, thanks for using RegExes when applicable. You're the best <<FirstName>>!!";
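string newText = Regex.Replace(text, "<<.+?>>", match =>
    // fields with no entry are left as-is instead of throwing KeyNotFoundException
    fieldToReplacement.TryGetValue(match.Value, out var replacement) ? replacement : match.Value);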
Console.WriteLine(newText);
https://dotnetfiddle.net/HPfHph
As @Alex K. wrote, you need to search for the indices of the start and end tags, like so:
using System;
class Program {
    static void Main(string[] args) {
        string text = "<<FieldName>>";
        const string startTag = "<<";
        const string endTag = ">>";
        int offset = 0;
        int startIndex = text.IndexOf(startTag, offset);
        if (startIndex >= 0) {
            int endIndex = text.IndexOf(endTag, startIndex + startTag.Length);
            if (endIndex >= 0) {
                // Substring takes a length, not an end index
                int nameStart = startIndex + startTag.Length;
                Console.WriteLine(text.Substring(nameStart, endIndex - nameStart));
                //prints "FieldName"
            }
        }
        Console.ReadKey();
    }
}
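The answer above stops after the first field. Because the question asks to loop over all fields and prefers to avoid regex, here is a sketch that keeps walking the string with the same IndexOf calls and swaps each field for its value. MergeFieldReplacer is a made-up name, and the dictionary is assumed to be keyed by the bare field name (FirstName, not <<FirstName>>); fields without an entry are copied through unchanged.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class MergeFieldReplacer
{
    // Walks the text with IndexOf, copying literal text and swapping each <<Name>>
    // for its value. Unknown fields (and an unmatched "<<") are copied unchanged.
    public static string Replace(string text, IDictionary<string, string> values)
    {
        const string startTag = "<<";
        const string endTag = ">>";
        var result = new StringBuilder();
        int offset = 0;
        while (true)
        {
            int startIndex = text.IndexOf(startTag, offset, StringComparison.Ordinal);
            if (startIndex < 0) break;
            int endIndex = text.IndexOf(endTag, startIndex + startTag.Length, StringComparison.Ordinal);
            if (endIndex < 0) break;

            // literal text between the previous field and this one
            result.Append(text, offset, startIndex - offset);

            int nameStart = startIndex + startTag.Length;
            string name = text.Substring(nameStart, endIndex - nameStart);
            result.Append(values.TryGetValue(name, out var value)
                ? value
                : text.Substring(startIndex, endIndex + endTag.Length - startIndex));
            offset = endIndex + endTag.Length;
        }
        result.Append(text, offset, text.Length - offset); // whatever is left
        return result.ToString();
    }
}
```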

Trim a string in c# after special character

I want to trim a string after a special character.
Let's say the string is str = "arjunmenon.uking". I want to get the characters after the . and ignore the rest, i.e. the resulting string must be restr = "uking".
How about:
string foo = str.EverythingAfter('.');
using:
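// extension methods must live in a non-nested static class
public static class StringExtensions
{
    public static string EverythingAfter(this string value, char c)
    {
        if (string.IsNullOrEmpty(value)) return value;
        int idx = value.IndexOf(c);
        return idx < 0 ? "" : value.Substring(idx + 1);
    }
}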
You can do it like this:
string input = "arjunmenon.uking";
int index = input.LastIndexOf('.');
input = input.Substring(index + 1); // everything after the last '.'
Use the Split function.
Try this:
string[] restr = str.Split('.');
//restr[0] contains arjunmenon
//restr[1] contains uking
char special = '.';
var restr = str.Substring(str.IndexOf(special) + 1).Trim();
Try the regular expression language:
using System;
using System.Text.RegularExpressions;
class Program
{
    static void Main()
    {
        string input = "arjunmenon.uking";
        // the greedy .* runs up to the last '.', and group 1 captures what follows it
        string pattern = @"[a-zA-Z0-9].*\.([a-zA-Z0-9].*)";
        foreach (Match match in Regex.Matches(input, pattern))
        {
            Console.WriteLine(match.Value);
            if (match.Groups.Count > 1)
                for (int ctr = 1; ctr < match.Groups.Count; ctr++)
                    Console.WriteLine(" Group {0}: {1}", ctr, match.Groups[ctr].Value);
        }
    }
}
Result:
arjunmenon.uking
Group 1: uking
Personally, I wouldn't split and then take index [1] of the resulting array: if you already know that the part you want is always at index [1] of the split string, why not just declare a constant with the value you want to "extract"?
After you Split, just take the last item in the array.
char separator = '.';
string text = "my.string.is.evil";
string[] parts = text.Split(separator);
string restr = parts[parts.Length - 1];
The variable restr will be "evil".
string str = "arjunmenon.uking";
string[] splitStr = str.Split('.');
string restr = splitStr[1];
Unlike the methods that use indexes, this one needs no empty-string check and no check for the presence of your special character, and it will not raise exceptions on empty strings or on strings that don't contain the special character (in that case it simply returns the whole string):
string str = "arjunmenon.uking";
string restr = str.Split('.').Last(); // Last() needs using System.Linq;
You may find all the info you need here : http://msdn.microsoft.com/fr-fr/library/b873y76a(v=vs.110).aspx
cheers
I think the simplest way is this:
string restr, str = "arjunmenon.uking";
restr = str.Substring(str.LastIndexOf('.') + 1);
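Note that the answers above do not all agree: IndexOf picks the first '.', while LastIndexOf and Split('.').Last() pick the last one, so they only give the same result when there is exactly one '.' (for "my.string.is.evil" it is "string.is.evil" versus "evil"). They also differ when there is no '.' at all. A small sketch that makes both choices explicit (SeparatorSlicing, AfterFirst and AfterLast are hypothetical names; returning "" when the separator is missing is an assumption):

```csharp
public static class SeparatorSlicing
{
    // Text after the FIRST occurrence of the separator ("" if it is absent).
    public static string AfterFirst(string value, char separator)
    {
        int idx = value.IndexOf(separator);
        return idx < 0 ? "" : value.Substring(idx + 1);
    }

    // Text after the LAST occurrence of the separator ("" if it is absent).
    public static string AfterLast(string value, char separator)
    {
        int idx = value.LastIndexOf(separator);
        return idx < 0 ? "" : value.Substring(idx + 1);
    }
}
```

By contrast, str.Substring(str.LastIndexOf('.') + 1) returns the whole string when there is no '.', because LastIndexOf returns -1.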

How to split a string so that the delimiters remain at the end of each result?

I have several delimiters, for example {del1, del2, del3}.
Suppose I have the text: Text1 del1 text2 del2 text3 del3
I want to split the string like this:
Text1 del1
text2 del2
text3 del3
I need to get an array of strings where every element is a text followed by its delimiter (text_i del_i).
How can I do this in C# ?
String.Split allows multiple delimiters. I don't know if that fits your question, though.
Example:
String text = "Test;Test1:Test2#Test3";
var split = text.Split(';', ':', '#');
//split contains an array of "Test", "Test1", "Test2", "Test3"
Edit: you can use a regex to keep the delimiters.
String text = "Test;Test1:Test2#Test3";
var split = Regex.Split(text, @"(?<=[;:#])");
// contains "Test;", "Test1:", "Test2#","Test3"
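This should do the trick:
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
const string input = "text1-text2;text3-text4-text5;text6--";
const string matcher = "(-|;)";
// the capturing group makes Regex.Split keep each delimiter as its own element: text, delimiter, text, ...
string[] substrings = Regex.Split(input, matcher);
var parts = new List<string>();
for (int i = 0; i < substrings.Length; i += 2)
{
    // glue each text chunk to the delimiter that follows it (the last chunk may have none)
    string delimiter = i + 1 < substrings.Length ? substrings[i + 1] : "";
    parts.Add(substrings[i] + delimiter);
}
foreach (string part in parts) Console.WriteLine(part);
Note that the two '-'s at the end produce a "-" entry with no text before it, plus a trailing empty string; you can choose to ignore them or do what you like with those values.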
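You could use a regex. For a string like this "text1;text2|text3^" you could match each piece with:
([^;|\^]*[;|\^])
The first character class matches a run of non-delimiter characters and the second matches the delimiter that ends the piece; just add each extra delimiter to both classes. (An alternation of greedy branches such as .*; would swallow several pieces at once as soon as a delimiter repeats.)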
If you want to keep the delimiter when splitting the string you can use the following:
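string[] delimiters = { "del1", "del2", "del3" };
string input = "text1del1text2del2text3del3";
string[] parts = input.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);
int searchFrom = 0;
for (int index = 0; index < parts.Length; index++)
{
    string part = parts[index];
    // search from where the previous piece ended, so repeated text is matched in order
    int end = input.IndexOf(part, searchFrom, StringComparison.Ordinal) + part.Length;
    string temp = input.Substring(end);
    foreach (string delimiter in delimiters)
    {
        if (temp.StartsWith(delimiter, StringComparison.Ordinal))
        {
            parts[index] += delimiter;
            end += delimiter.Length;
            break;
        }
    }
    searchFrom = end;
}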
parts will then be:
[0] "text1del1"
[1] "text2del2"
[2] "text3del3"
As @Matt Burland suggested, use a regex:
List<string> values = new List<string>();
string s = "abc123;def456-hijk,";
// each match is a run of non-delimiters followed by one delimiter (; - or ,)
Regex r = new Regex(@"([^;,-]*[;,-])");
foreach (Match m in r.Matches(s))
    values.Add(m.Value);
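Combining the lookbehind idea with Regex.Escape gives a version that works for arbitrary string delimiters such as del1, del2, del3. A minimal sketch (DelimiterSplitter and SplitKeeping are made-up names), assuming no delimiter occurs inside another one (with "del" and "del1" it would also split inside "del1") and that whitespace around each piece should be trimmed:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class DelimiterSplitter
{
    // Splits right AFTER each delimiter, so the delimiter stays at the end of its piece.
    // Regex.Escape makes each delimiter literal; .NET allows lookbehind alternatives
    // of different lengths. Pieces are trimmed and empty ones dropped.
    public static string[] SplitKeeping(string input, params string[] delimiters)
    {
        string pattern = "(?<=" + string.Join("|", delimiters.Select(Regex.Escape)) + ")";
        return Regex.Split(input, pattern)
                    .Select(piece => piece.Trim())
                    .Where(piece => piece.Length > 0)
                    .ToArray();
    }
}
```

The Where filter also removes the empty string that Regex.Split returns when the input ends with a delimiter.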

Replace placeholders in order

I have a part of a URL like this:
/home/{value1}/something/{anotherValue}
Now I want to replace everything between the braces with values from a string array.
I tried this RegEx pattern: \{[a-zA-Z_]\} but it doesn't work.
Later (in C#) I want to replace the first match with the first value of the array, the second with the second, and so on.
Update: The /'s can't be used as separators. Only the placeholders {...} should be replaced.
Example: /home/before{value1}/and/{anotherValue}
String array: {"Tag", "1"}
Result: /home/beforeTag/and/1
I hoped it could work like this:
string input = @"/home/before{value1}/and/{anotherValue}";
string pattern = @"\{[a-zA-Z_]\}";
string[] values = {"Tag", "1"};
MatchCollection mc = Regex.Match(input, pattern);
for(int i, ...)
{
mc.Replace(values[i];
}
string result = mc.GetResult;
Edit:
Thank you Devendra D. Chavan and ipr101,
both solutions are great!
You can try this code fragment:
// Begin with '{' followed by any number of word like characters and then end with '}'
var pattern = @"{\w*}";
var regex = new Regex(pattern);
var replacementArray = new [] {"abc", "cde", "def"};
var sourceString = @"/home/{value1}/something/{anotherValue}";
var matchCollection = regex.Matches(sourceString);
for (int i = 0; i < matchCollection.Count && i < replacementArray.Length; i++)
{
sourceString = sourceString.Replace(matchCollection[i].Value, replacementArray[i]);
}
[a-zA-Z_] describes a character class that matches a single character. To match words, you'll have to add * after it (any number of characters within a-zA-Z_).
Then, to have 'value1' matched, you'll need to add digit support: [a-zA-Z0-9_]*, which can be abbreviated as \w* (in .NET, \w also matches non-ASCII letters and digits).
So try this one: {\w*}
But for replacing in C#, string.Split('/') might be easier, as Fredrik proposed. Have a look at this too.
You could use a delegate, something like this:
string[] strings = {"dog", "cat"};
int counter = -1;
string input = @"/home/{value1}/something/{anotherValue}";
Regex reg = new Regex(@"\{([a-zA-Z0-9]*)\}");
string result = reg.Replace(input, delegate(Match m) {
    counter++;
    // return only the value: the braces are part of the placeholder being replaced
    return strings[counter];
});
My two cents:
// input string
string txt = "/home/{value1}/something/{anotherValue}";
// template replacements
string[] str_array = { "one", "two" };
// regex to match a template
Regex regex = new Regex("{[^}]*}");
// replace the first template occurrence for each element in array
foreach (string s in str_array)
{
txt = regex.Replace(txt, s, 1);
}
Console.Write(txt);
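A variation on the delegate answer that does everything in one Regex.Replace pass and cannot run past the end of the array. It is a sketch under assumptions: PlaceholderFiller and Fill are made-up names, placeholders are {word characters}, and placeholders beyond the end of the array are left untouched.

```csharp
using System.Text.RegularExpressions;

public static class PlaceholderFiller
{
    // {name} where name is zero or more word characters, e.g. {value1} or {anotherValue}
    private static readonly Regex Placeholder = new Regex(@"\{\w*\}");

    // Replaces the i-th placeholder with values[i] in one left-to-right pass;
    // placeholders past the end of the array keep their original text.
    public static string Fill(string template, string[] values)
    {
        int next = 0;
        return Placeholder.Replace(template, match =>
            next < values.Length ? values[next++] : match.Value);
    }
}
```

The string returned by the evaluator is inserted literally, so values containing $ are not treated as substitution patterns, unlike the replacement string passed to regex.Replace(txt, s, 1) in the last answer.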

How can I parse a string into different strings in C#?

I just started work on a C# application. I need to break down a string into different parts. Is there an easy way to do this using C# patterns? I think I can do it with substrings but it might get messy and I want to do something that's easy to understand. Here's an example of the input:
AB-CDE-GHI-123-45-67-7777
variable1 = "AB-CDE-GHI"
variable2 = "123"
variable3 = "45"
variable4 = "67"
variable5 = "67-7777"
AB-CDE-GHIJKLM-123-45-67-7777
variable1 = "AB-CDE-GHIJKLM"
variable2 = "123"
variable3 = "45"
variable4 = "67"
variable5 = "67-7777"
AB-123-45-67-7777
variable1 = "AB"
variable2 = "123"
variable3 = "45"
variable4 = "67"
variable5 = "67-7777"
The first part of the string up until "123-45-67-7777" can be any length. Luckily for me, the last part 123-45-67-7777 is always the same length and contains zero-padded numbers.
I hope someone can come up with some suggestions for an easy method that uses regular expressions or something.
Input lines look like this:
aa-123-45-67-7777
HJHJH-123-45-67-7777
H-H-H--123-45-67-7777
222-123-45-67-7777
You do not need RegEx for parsing this kind of input.
You can use string.Split, in particular if the input is highly structured.
If you first split by - you will get a string[] with each part in a different index of the array.
The length property of the array will tell you how many parts you got and you can use that to reconstruct the parts you need.
You can rejoin any of the bits you need.
string[] parts = "AB-CDE-GHI-123-45-67-7777".Split('-');
// joining together the first 3 items:
string letters = string.Format("{0}-{1}-{2}", parts[0], parts[1], parts[2]);
// letters = "AB-CDE-GHI"
If the number of sections is variable (apart from the last 4), you can use the length in a loop to rebuild the wanted parts:
using System.Text; // for StringBuilder
StringBuilder sb = new StringBuilder();
for(int i = 0; i < parts.Length - 4; i++)
{
    sb.AppendFormat("{0}-", parts[i]);
}
sb.Length = sb.Length - 1; // remove trailing -
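The StringBuilder loop above can also be written with the string.Join overload that takes a start index and a count, which avoids removing the trailing "-" afterwards. A sketch (PrefixJoin and Prefix are hypothetical names), assuming the input always has at least the four numeric sections; with fewer, the count would be negative and string.Join throws ArgumentOutOfRangeException.

```csharp
public static class PrefixJoin
{
    // Rejoins everything except the last four '-'-separated sections.
    // string.Join(separator, value, startIndex, count) joins only that slice of the array.
    public static string Prefix(string input)
    {
        string[] parts = input.Split('-');
        return string.Join("-", parts, 0, parts.Length - 4);
    }
}
```

Empty sections survive the round trip, so "H-H-H--123-45-67-7777" gives "H-H-H-".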
If the last part is always a known length (14 characters), you could just do something like this:
var lastPart = inputLine.Substring(inputLine.Length - 14); // e.g. "123-45-67-7777"
var firstPart = inputLine.Substring(0, inputLine.Length - 15); // 15 to exclude the separating -
Then you can just do your string splitting and job done :)
Although it is possible to use String.Split here, a better solution, in my opinion, would be to tokenize the input and then parse it.
You can use tools such as ANTLR for this purpose.
string[] str = "AB-CDE-GHI-123-45-67-7777".Split('-');
int a = str.Length;
string variable1 = "";
for (int i = 0; i <= a - 5; i++)
{
    variable1 = variable1 + str[i] + "-";
}
// remove the last -
variable1 = variable1.Remove(variable1.Length - 1, 1);
string variable2 = str[a - 4];
string variable3 = str[a - 3];
string variable4 = str[a - 2];
string variable5 = str[a - 2] + "-" + str[a - 1];
Like Oded said, you can use string.Split.
I edited my answer as you wanted:
string[] tab = textBox1.Text.Split('-');
int length = tab.Length;
string var1 = string.Empty;
for(int i=0; i <= length-5 ; i++)
{
var1 = var1 + tab[i] + '-';
}
var1 = var1.Remove(var1.Length-1,1);
string var2 = tab[length-4];
string var3 = tab[length-3];
string var4 = tab[length-2];
string var5 = tab[length-2] + '-' + tab[length-1];
It's the same approach as @Govind KamalaPrakash Malviya's answer; the key is building var1 as var1 + tab[i], so the parts keep their original order.
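Since the question asks about regular expressions: because the numeric tail has a fixed shape, an anchored pattern with named groups can split all of the example inputs in one step. A sketch (CodeParser and Parse are made-up names) that returns the five values in the order variable1 to variable5, building variable5 as "67-7777" exactly as the question lists it:

```csharp
using System.Text.RegularExpressions;

public static class CodeParser
{
    // The greedy (?<prefix>.*) takes as much as it can, then backtracks until the
    // fixed numeric tail -ddd-dd-dd-dddd matches at the very end ($).
    private static readonly Regex Layout = new Regex(
        @"^(?<prefix>.*)-(?<first>\d{3})-(?<second>\d{2})-(?<third>\d{2})-(?<fourth>\d{4})$");

    // Returns null when the input does not end with the expected numeric tail.
    public static string[] Parse(string input)
    {
        Match m = Layout.Match(input);
        if (!m.Success) return null;
        return new[]
        {
            m.Groups["prefix"].Value,
            m.Groups["first"].Value,
            m.Groups["second"].Value,
            m.Groups["third"].Value,
            m.Groups["third"].Value + "-" + m.Groups["fourth"].Value // variable5 as listed in the question
        };
    }
}
```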
