How to keep the delimiters of Regex.Split? - c#

I'd like to split a string using the Split function in the Regex class. The problem is that it removes the delimiters and I'd like to keep them, preferably as separate elements in the resulting array.
According to other discussions that I've found, there are only inconvenient ways to achieve that.
Any suggestions?

Just put the pattern into a capture-group, and the matches will also be included in the result.
string[] result = Regex.Split("123.456.789", @"(\.)");
Result:
{ "123", ".", "456", ".", "789" }
This also works for many other languages:
JavaScript: "123.456.789".split(/(\.)/g)
Python: re.split(r"(\.)", "123.456.789")
Perl: split(/(\.)/g, "123.456.789")
(Not Java though)
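The capture-group trick can be wrapped in a small helper. This is only a sketch: SplitKeepingDelimiters is a hypothetical name, not a framework method, and it assumes the delimiter pattern you pass in has no capture groups of its own (if it does, Regex.Split returns those captures as well).

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Hypothetical helper: wraps the delimiter pattern in a capture group
    // so that Regex.Split also returns the matched delimiters.
    public static string[] SplitKeepingDelimiters(string input, string delimiterPattern)
    {
        return Regex.Split(input, "(" + delimiterPattern + ")");
    }

    public static void Main()
    {
        string[] parts = SplitKeepingDelimiters("asdf,asdf;asdf.asdf", "[,.;]");
        // Values and delimiters alternate in the result.
        Console.WriteLine(string.Join(" | ", parts));
    }
}
```

Each delimiter appears as its own element between the values it separated.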

Use Matches to find the separators in the string, then get the values and the separators.
Example:
string input = "asdf,asdf;asdf.asdf,asdf,asdf";
var values = new List<string>();
int pos = 0;
foreach (Match m in Regex.Matches(input, "[,.;]")) {
    values.Add(input.Substring(pos, m.Index - pos));
    values.Add(m.Value);
    pos = m.Index + m.Length;
}
values.Add(input.Substring(pos));

Say the input is "abc1defg2hi3jkl" and the regex should pick out the digits. Matching alternating runs of digits and non-digits returns both the values and the separators:
string input = "abc1defg2hi3jkl";
var parts = Regex.Matches(input, @"\d+|\D+")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();
Parts would be: abc 1 defg 2 hi 3 jkl

For Java:
Arrays.stream("123.456.789".split("(?<=\\.)|(?=\\.)+"))
.forEach((p) -> {
System.out.println(p);
});
outputs:
123
.
456
.
789
Inspired by this post (How to split string but keep delimiters in java?)

Add them back:
string[] Parts = "A,B,C,D,E".Split(',');
string[] Parts2 = new string[Parts.Length * 2 - 1];
for (int i = 0; i < Parts.Length; i++)
{
    Parts2[i * 2] = Parts[i];
    if (i < Parts.Length - 1)
        Parts2[i * 2 + 1] = ",";
}

For C#:
Split a paragraph into sentences while keeping the delimiters.
A sentence ends at . or ? or ! followed by one space (without the space requirement, an email address inside a sentence would also be split).
string data="first. second! third? ";
Regex delimiter = new Regex("(?<=[.?!] )"); //there is a space between ] and )
string[] afterRegex=delimiter.Split(data);
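Result (each element keeps its trailing space, and a final empty string follows because the input ends with a delimiter):
first.
second!
third?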
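A variation on the lookbehind idea, as a sketch rather than a drop-in replacement: here the whitespace after the punctuation is consumed by the split, so the sentences come back without trailing spaces, and empty entries are filtered out. SplitSentences is a hypothetical helper name, and the pattern assumes sentences are separated by at least one whitespace character.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SentenceSplitter
{
    // Splits on whitespace that directly follows '.', '?' or '!'.
    // The punctuation stays attached to its sentence because the
    // lookbehind does not consume it.
    public static string[] SplitSentences(string text)
    {
        return Regex.Split(text, @"(?<=[.?!])\s+")
            .Where(s => s.Length > 0) // drop the empty entry after a trailing delimiter
            .ToArray();
    }

    public static void Main()
    {
        foreach (string sentence in SplitSentences("first. second! third? "))
        {
            Console.WriteLine("[" + sentence + "]");
        }
    }
}
```

A dot inside an address such as a.b@example.com is not followed by whitespace, so it does not trigger a split.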

Related

How to count 2 or 3 letter words in a string using asp c#

How do I count the 2- or 3-letter words of a string using ASP.NET C#? For example:
string value="This is my string value";
and the output should look like this:
2 letter words = 2
3 letter words = 0
4 letter words = 1
Please help, Thanks in advance.
You can try something like this:
Split the sentence by spaces to get an array of words.
Group the words by length (and order the groups by that length).
Iterate through every group and write the letter count and the number of words with that letter count.
Code:
using System.Linq;
using System.Diagnostics;
...
var words = value.Split(' ');
var groupedByLength = words.GroupBy(w => w.Length).OrderBy(x => x.Key);
foreach (var grp in groupedByLength)
{
    Debug.WriteLine(string.Format("{0} letter words: {1}", grp.Key, grp.Count()));
}
First of all you need to decide what counts as a word. A naive approach is to split the string on spaces, but then punctuation attached to a word, such as a comma, is counted as part of that word. Another approach is to use the following regex
\b\w+?\b
and collect all the matches.
Now that you have all the words in a words array, we can write a LINQ query:
var query = words.Where(x => x.Length >= 2 && x.Length <= 4)
.GroupBy(x => x.Length)
.Select(x => new { CharCount = x.Key, WordCount = x.Count() });
Then you can print the query out like this:
query.ToList().ForEach(Console.WriteLine);
This prints:
{ CharCount = 4, WordCount = 1 }
{ CharCount = 2, WordCount = 2 }
You can write some code yourself to produce a more formatted output.
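The regex-based approach could be put together roughly as follows. This is a sketch under the assumption that a "word" is any run of \w characters; CountByLength is a hypothetical helper name, and the sample sentence has a comma added to show that punctuation is ignored.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WordLengthCounter
{
    // Maps word length -> number of words with that length,
    // ordered by length because SortedDictionary sorts its keys.
    public static SortedDictionary<int, int> CountByLength(string text)
    {
        var counts = new SortedDictionary<int, int>();
        foreach (Match m in Regex.Matches(text, @"\b\w+?\b"))
        {
            int current;
            counts.TryGetValue(m.Value.Length, out current); // current is 0 when the key is missing
            counts[m.Value.Length] = current + 1;
        }
        return counts;
    }

    public static void Main()
    {
        foreach (var pair in CountByLength("This is my string, value"))
        {
            Console.WriteLine("{0} letter words = {1}", pair.Key, pair.Value);
        }
    }
}
```

Lengths with no words (3 in the sample) simply do not appear; if zero counts are wanted, loop over the lengths of interest and look each one up instead.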
If I understood your question correctly, you can do it using a dictionary.
First split the string by spaces:
string value = "This is my string value";
string[] words = value.Split(' ');
Then loop through the array of words and use the length of each word as a dictionary key. Note that I've used a string as the key, but you can modify this to your needs.
Dictionary<string, int> letterWords = new Dictionary<string, int>();
for (int i = 0; i < words.Length; i++)
{
    string key = words[i].Length + " letter word";
    if (letterWords.ContainsKey(key))
        letterWords[key] += 1;
    else
        letterWords.Add(key, 1);
}
Then print the output:
foreach (var ind in letterWords)
{
    Console.WriteLine(ind.Key + " = " + ind.Value);
}
Adapt this as needed.

Substring Specific Word Containing Special Character between them

I have following String
string test = "viv-ek is a good boy.Mah - esh is Cra - zy.";
I want to get {"Vivek","Mahesh","Crazy"} words from that string
Some having only "-" and some having " - " in between words.
You can find your words with following regex :
\b\w+(?:\s-\s|-)\w+\b
Then, within each matched string, replace whatever (?:\s-\s|-) matches with the empty string ''.
\b\w+\s*-\s*\w+\b
You can try this. See the demo:
https://regex101.com/r/cZ0sD2/14
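The two regex answers above describe a match step followed by a replace step. A minimal sketch of that combination, assuming the second pattern (\b\w+\s*-\s*\w+\b) and a hypothetical helper named ExtractHyphenatedWords:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HyphenJoiner
{
    // Finds words split by "-" or " - " and returns them with the hyphen
    // (and any whitespace around it) removed.
    public static List<string> ExtractHyphenatedWords(string text)
    {
        var words = new List<string>();
        foreach (Match m in Regex.Matches(text, @"\b\w+\s*-\s*\w+\b"))
        {
            words.Add(Regex.Replace(m.Value, @"\s*-\s*", string.Empty));
        }
        return words;
    }

    public static void Main()
    {
        string test = "viv-ek is a good boy.Mah - esh is Cra - zy.";
        Console.WriteLine(string.Join(",", ExtractHyphenatedWords(test)));
        // prints: vivek,Mahesh,Crazy
    }
}
```

The original casing is preserved, so the first word comes out as "vivek", not "Vivek"; capitalizing it would need an extra step.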
This might do the trick for you
string test = "viv-ek is a good boy.Mah - esh is Cra - zy.";
test = test.Replace(" -", "-").Replace("- ", "-").Replace(".", ". ");
//Or
//test = test.Replace(" - ", "-").Replace(".", ". ");
string[] allwords = test.Split(' ');
List<string> extractedWords=new List<string>();
foreach (string wrd in allwords)
{
    if (wrd.Contains("-"))
    {
        extractedWords.Add(wrd.Replace("-", ""));
    }
}
If you only want to select those words use this:
string test = "viv-ek is a good boy.Mah - esh is Cra - zy.";
var words =
    Regex
        .Matches(test, @"(?<part>\w+)(\s*-\s*(?<part>\w+))+\b")
        .Cast<Match>()
        .Select(
            x => string.Join(
                string.Empty,
                x.Groups["part"].Captures.Cast<Capture>().SelectMany(capture => capture.Value)))
        .ToList();
words is a list containing "vivek","Mahesh","Crazy".
Replacing words will work the same way:
var replacingValues = new Dictionary<string, string> { { "Crazy", "XXX" } };
var test = "viv-ek is a good boy.Mah - esh is Cra - zy.";
var replacedTest =
    Regex.Replace(
        test,
        @"\b(?<part>\w+)(\s*-\s*(?<part>\w+))+\b",
        match =>
        {
            var word = string.Join(string.Empty, match.Groups["part"].Captures.Cast<Capture>().SelectMany(capture => capture.Value));
            string replacingValue;
            return replacingValues.TryGetValue(word, out replacingValue) ? replacingValue : match.Value;
        });
replacedTest contains: viv-ek is a good boy.Mah - esh is XXX.

Extract table name from schema and table name

I'm trying to get the table name from a string that is in the format:
[schemaname].[tablename]
I think this can be done with split but not sure how to handle the trailing ] character.
A simple approach is using String.Split and String.Trim in this little LINQ query:
string input = "[schemaname].[tablename]";
string[] schemaAndTable = input.Split('.')
.Select(t => t.Trim('[', ']'))
.ToArray();
string schema = schemaAndTable[0];
string table = schemaAndTable[1];
Another one using IndexOf and Substring:
int pointIndex = input.IndexOf('.');
if(pointIndex >= 0)
{
string schema = input.Substring(0, pointIndex).Trim('[', ']');
string table = input.Substring(pointIndex + 1).Trim('[', ']');
}
// find the separator
var pos = str.IndexOf("].[");
if (pos == -1)
    return null; // sorry, can't be found.
// copy everything after the separator, skipping the three characters of ].[
// and also ignoring the last ]
var tableName = str.Substring(pos + 3, str.Length - pos - 4);
Just to be different, here is another version with regex:
var result = Regex.Match(s, @"(?<=\.\[)\w+").Value;
Split on the three characters [, . and ] with the RemoveEmptyEntries option, which drops the empty strings produced between adjacent delimiters.
var result = input.Split(new [] {'[','.',']'}, StringSplitOptions.RemoveEmptyEntries);
Try this:
var tableAndSchema = "[schemaname].[tablename]";
var tableName = tableAndSchema
.Split('.')[1]
.TrimStart('[')
.TrimEnd(']');
Split will split the string on the . character and turn it into an array of two strings:
[0] = "[schemaname]"
[1] = "[tablename]"
The second (index 1) element is the one you want. TrimStart and TrimEnd will remove the starting and ending brackets.
Another way to do this is with Regular Expressions:
var tableAndSchema = "[schemaname].[tablename]";
var regex = new Regex(@"\[.*\].\[(.*)\]");
var tableName = regex.Match(tableAndSchema).Groups[1].Value;
The regex pattern \[.*\].\[(.*)\] creates a capture group for the characters within the second pair of brackets and lets you easily pull them out.
var res = input.Split('.')[1].Trim('[', ']');
Another LINQ solution:
var tableName = String.Join("", input.SkipWhile(c => c != '.').Skip(1)
.Where(c => Char.IsLetter(c)));

Using Indexof to check if string contains a character

What I'm trying to do is type in random words into box1, click a button and then print all the words that start with "D" in box2. So if I was to type in something like "Carrots Doors Apples Desks Dogs Carpet" and click the button "Doors Desks Dogs" would print in box2.
string s = box1.Text;
int i = s.IndexOf("D");
string e = s.Substring(i);
box2.Text = (e);
When I use the code above, it prints "Doors Apples Desks Dogs Carpet" instead of just the D words.
NOTE: These words are an example, I could type anything into box1.
Any help?
You could simplify this by using LINQ
var allDWords = box1.Text.Split(' ').Where(w => w.StartsWith("D"));
box2.Text = String.Join(" ", allDWords);
Try this
box2.Text = String.Join(" ",
box1.Text.Split(' ')
.Where(p => p.StartsWith("D")));
You can match the D words with a regular expression and iterate over the results.
Try this regex (the leading \b keeps it from matching a D in the middle of a word):
\bD\w*
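Iterating over the regex matches might look like the sketch below. It assumes the pattern \bD\w* (a word boundary, a capital D, then any word characters), and WordsStartingWithD is a hypothetical helper name.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class DWordFilter
{
    // Returns all words that start with a capital D, joined by single spaces.
    public static string WordsStartingWithD(string text)
    {
        var dWords = Regex.Matches(text, @"\bD\w*")
            .Cast<Match>()
            .Select(m => m.Value);
        return string.Join(" ", dWords);
    }

    public static void Main()
    {
        Console.WriteLine(WordsStartingWithD("Carrots Doors Apples Desks Dogs Carpet"));
        // prints: Doors Desks Dogs
    }
}
```

In the form from the question this would be used as box2.Text = WordsStartingWithD(box1.Text);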
First you need to split up the text into words and then check to see if each word starts with D. When looking for the first character it's easier to just check it directly.
string s = box1.Text;
StringBuilder builder = new StringBuilder();
foreach (var cur in s.Split(new char[] { ' ' })) {
    if (cur.Length > 0 && cur[0] == 'D') {
        builder.Append(cur);
        builder.Append(' ');
    }
}
box2.Text = builder.ToString();
One thing you could do is the following.
Let's suppose:
string str = "Dog Cat Man etc";
string[] words = str.Split(' ');
List<string> wordStartWithD = new List<string>();
foreach (string strTemp in words)
    if (strTemp.StartsWith("D"))
        wordStartWithD.Add(strTemp);
Hope this helps.

How can i split the string only once using C#

Example : a - b - c must be split as
a and b - c, instead of 3 substrings
Specify the maximum number of items that you want:
string[] splitted = text.Split(new string[]{" - "}, 2, StringSplitOptions.None);
string s = "a - b - c";
string[] parts = s.Split(new char[] { '-' }, 2);
// note, you'll still need to trim off any whitespace
"a-b-c".Split( new char[] { '-' }, 2 );
You could use IndexOf() to find the first instance of the character you want to split on, then Substring() to get the two parts. For example...
int pos = myString.IndexOf('-');
string first = myString.Substring(0, pos);
string second = myString.Substring(pos);
This is a rough example - you'll need to play with it if you don't want the separator character in there - but you should get the idea from this.
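A slightly fuller sketch of the IndexOf/Substring idea, assuming a string separator that should be dropped from the result and an input that may not contain the separator at all. SplitOnce is a hypothetical helper name.

```csharp
using System;

public static class SplitOnceDemo
{
    // Splits text at the first occurrence of separator.
    // Returns a single-element array when the separator is absent.
    public static string[] SplitOnce(string text, string separator)
    {
        int pos = text.IndexOf(separator, StringComparison.Ordinal);
        if (pos < 0)
        {
            return new[] { text };
        }
        // Skip the separator itself when taking the second part.
        return new[] { text.Substring(0, pos), text.Substring(pos + separator.Length) };
    }

    public static void Main()
    {
        string[] parts = SplitOnce("a - b - c", " - ");
        Console.WriteLine("First:{0}, Second:{1}", parts[0], parts[1]);
        // prints: First:a, Second:b - c
    }
}
```

Using " - " as the separator avoids the trimming step that splitting on '-' alone would require.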
string[] splitted = "a - b - c".Split(new char[]{' ', '-'}, 2, StringSplitOptions.RemoveEmptyEntries);
var str = "a-b-c";
int splitPos = str.IndexOf('-');
string[] split = { str.Remove(splitPos), str.Substring(splitPos + 1) };
I'm late to this, and many of the answers above already make the same point: string has its own Split method.
You can use it to solve your problem. Here is an example for your case:
using System;
public class Program
{
    public static void Main()
    {
        var PrimaryString = "a - b - c";
        var strPrimary = PrimaryString.Split( new char[] { '-' }, 2 );
        Console.WriteLine("First:{0}, Second:{1}",strPrimary[0],strPrimary[1]);
    }
}
Output:
First:a , Second: b - c
