C# Split text into Substrings

What I'm actually trying to do is split a line read with StreamReader.ReadLine(), such as "1 A & B 2 C & D", into the substrings "1", "A & B", "2" and "C & D". Does anybody have an idea for a simple algorithm to implement this splitting?

Something like this (using a tiny bit of LINQ)?
static private List<string> Parse(string s)
{
    var result = new List<string>();
    // Splitting on digits leaves only the text parts between the numbers.
    string[] rawTextParts = s.Split(new char[] { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' });
    var textParts = rawTextParts.Where(t => !string.IsNullOrWhiteSpace(t)).Select(t => t.Trim());
    foreach (string textPart in textParts)
    {
        // Everything before the text part is the number that precedes it.
        string numberstring = s.Substring(0, s.IndexOf(textPart)).Trim();
        s = s.Substring(s.IndexOf(textPart) + textPart.Length);
        result.Add(numberstring);
        result.Add(textPart);
    }
    return result;
}

Regex is made for pattern matching. There are two patterns here: an alphabetic character, a non-word character and another alphabetic character, or a run of digits. Here is a regex that matches either:
var input = "1 A & B 2 C & D";
var pattern = @"[a-zA-Z]\s+\W\s+[a-zA-Z]|\d+";
var resultItems =
    Regex.Matches(input, pattern)
         .OfType<Match>()
         .Select(m => m.Value)
         .ToList();
The result is "1", "A & B", "2", "C & D".
Not mentioned above: \s+ handles the spaces. It matches one or more whitespace characters, as in A & B. If you believe there may be no spaces, as in A&B, use \s* instead, which matches zero or more.
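The \s* variant can be sketched as follows. This is only an illustration of the same pattern with optional whitespace; the class and method names (PairSplitter, Split) are made up for the example and are not part of the answer above.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PairSplitter
{
    // Hypothetical helper: same pattern as above, but with \s* so that
    // "A&B" (no spaces) is matched as well as "A & B".
    public static List<string> Split(string input)
    {
        const string pattern = @"[a-zA-Z]\s*\W\s*[a-zA-Z]|\d+";
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToList();
    }
}
```

Calling PairSplitter.Split("1 A&B 2 C & D") should give "1", "A&B", "2", "C & D". Note that \W also matches a space, so with \s* two letters separated only by a space would match as well.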

It's hard to infer precise requirements from your question. But according to your example I'd come up with something like:
void Main()
{
    var input = "1 A & B 2 C & D";
    var result = Parse(input);
    Console.WriteLine(String.Join("\n", result));
}
static IEnumerable<string> Parse(string input)
{
    var words = input.Split();
    var builder = new StringBuilder();
    foreach (var word in words)
    {
        if (int.TryParse(word, out var value))
        {
            // A number ends the text collected so far.
            if (builder.Length > 0)
            {
                yield return builder.ToString();
                builder.Clear();
            }
            yield return word;
        }
        else
        {
            if (builder.Length > 0)
            {
                builder.Append(' ');
            }
            builder.Append(word);
        }
    }
    if (builder.Length > 0) // leftovers
    {
        yield return builder.ToString();
    }
}
The output of the above code will be:
1
A & B
2
C & D

Related

How to find largest word that starts with a capital and add a separator and space

I have code that finds the longest word that starts with a capital letter. But I also need to append a separator and a space to that word. Any ideas how I should do it properly?
char[] skyrikliai = { ' ', '.', ',', '!', '?', ':', ';', '(', ')', '\t' };
string eilute = "Arvydas (g. 1964 m. gruodzio 19 d. Kaune)– Lietuvos, krepsininkas, olimpinis ir pasaulio cempionas, nuo 2011 m. spalio 24 d.";
static string Ilgiausias(string eilute, char[] skyrikliai)
{
    string[] parts = eilute.Split(skyrikliai,
        StringSplitOptions.RemoveEmptyEntries);
    string ilgiaus = "";
    foreach (string zodis in parts)
        if ((zodis.Length > ilgiaus.Length) && (zodis[0].ToString() == zodis[0].ToString().ToUpper()))
            ilgiaus = zodis;
    return ilgiaus;
}
It should find the word Lietuvos and append a comma and a space.
Result should be "Lietuvos, "

I would use LINQ for that:
var ilgiaus = parts.Where(s => char.IsUpper(s[0]))
                   .OrderByDescending(s => s.Length)
                   .FirstOrDefault();
if (ilgiaus != null) {
    return ilgiaus + ", ";
}

You can also use a regex with LINQ, so you don't need to split on many characters.
Regex regex = new Regex(@"[A-Z]\w*");
string str = "Arvydas (g. 1964 m. gruodzio 19 d. Kaune)– Lietuvos, krepsininkas, olimpinis ir pasaulio cempionas, nuo 2011 m. spalio 24 d.";
string longest = regex.Matches(str).Cast<Match>().Select(match => match.Value).MaxBy(val => val.Length);
If you don't want to use MoreLinq, you can replace MaxBy(val => val.Length) with OrderByDescending(x => x.Length).First().

There are probably more ingenious and elegant ways, but the following sketch should work:
List<string> listOfStrings = new List<string>();
// add some strings to the generic list
listOfStrings.Add("bla");
listOfStrings.Add("foo");
listOfStrings.Add("bar");
listOfStrings.Add("Rompecabeza");
listOfStrings.Add("Rumpelstiltskin");
. . .
string longestWord = GetLongestCapitalizedWord(listOfStrings);
. . .
private string GetLongestCapitalizedWord(List<string> listOfStrings)
{
    string longestWord = string.Empty;
    foreach (string s in listOfStrings)
    {
        if (IsCapitalized(s) && s.Length > longestWord.Length)
        {
            longestWord = s;
        }
    }
    return longestWord;
}
private bool IsCapitalized(string s)
{
    // true if the first character is an uppercase letter
    return s.Length > 0 && char.IsUpper(s[0]);
}

How to ignore the punctuation c#

I want to ignore the punctuation. I'm trying to write a program that counts the occurrences of every word in my text without taking the punctuation marks into consideration.
So my program is:
static void Main(string[] args)
{
    string text = "This my world. World, world,THIS WORLD ! Is this - the world .";
    IDictionary<string, int> wordsCount =
        new SortedDictionary<string, int>();
    text=text.ToLower();
    text = text.replaceAll("[^0-9a-zA-Z\text]", "X");
    string[] words = text.Split(' ',',','-','!','.');
    foreach (string word in words)
    {
        int count = 1;
        if (wordsCount.ContainsKey(word))
            count = wordsCount[word] + 1;
        wordsCount[word] = count;
    }
    var items = from pair in wordsCount
                orderby pair.Value ascending
                select pair;
    foreach (var p in items)
    {
        Console.WriteLine("{0} -> {1}", p.Key, p.Value);
    }
}
The output is:
is->1
my->1
the->1
this->3
world->5
(here is nothing) -> 8
How can I remove the punctuation here?

You should try specifying StringSplitOptions.RemoveEmptyEntries:
string[] words = text.Split(" ,-!.".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
Note that instead of manually creating a char[] with all the punctuation characters, you may create a string and call ToCharArray() to get the array of characters.
I find it easier to read and to modify later on.

string[] words = text.Split(new char[]{' ',',','-','!','.'}, StringSplitOptions.RemoveEmptyEntries);

It is simple: the first step is to remove the undesired punctuation with the Replace function, and then continue with the splitting as you already have it.
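A minimal sketch of that Replace-then-split idea, assuming the punctuation set from the question (',', '-', '!', '.'). The class and method names (PunctuationFreeCounter, CountWords) are invented for illustration. One detail is an assumption on my part: each mark is replaced with a space rather than deleted, so that "world,THIS" does not fuse into a single word, and RemoveEmptyEntries then drops the empty strings that runs of spaces produce.

```csharp
using System;
using System.Collections.Generic;

public static class PunctuationFreeCounter
{
    // Hypothetical helper: counts words after replacing punctuation with spaces.
    public static SortedDictionary<string, int> CountWords(string text)
    {
        text = text.ToLower();

        // Step 1: replace every unwanted punctuation mark with a space.
        foreach (char mark in new[] { ',', '-', '!', '.' })
        {
            text = text.Replace(mark, ' ');
        }

        // Step 2: split on spaces; RemoveEmptyEntries drops the empty strings
        // left behind by consecutive spaces.
        string[] words = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        var counts = new SortedDictionary<string, int>();
        foreach (string word in words)
        {
            counts[word] = counts.TryGetValue(word, out int n) ? n + 1 : 1;
        }
        return counts;
    }
}
```

For the text from the question, this should yield five keys (is, my, the, this, world) and no empty-string entry.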

... you can go with the making people cry version ...
"This my world. World, world,THIS WORLD ! Is this - the world ."
    .ToLower()
    .Split(" ,-!.".ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
    .GroupBy(i => i)
    .Select(i=>new{Word=i.Key, Count = i.Count()})
    .OrderBy(k => k.Count)
    .ToList()
    .ForEach(Console.WriteLine);
.. output
{ Word = my, Count = 1 }
{ Word = is, Count = 1 }
{ Word = the, Count = 1 }
{ Word = this, Count = 3 }
{ Word = world, Count = 5 }

Remove Alphabetic characters from a string, leaving numbers and symbols

I have a bunch of strings that I'm trying to parse the date out of. I have a script that will parse the date, but it's having trouble with all the extra letters in the string. I need to remove all the letters but leave characters such as - / _
I'm not particularly good with Regex, so all attempts to do this so far have ended with too many characters getting removed.
Here's a few sample strings to help:
Littleton, CO - Go-Live 5/8
Brunswick - Go-Live 5/14
CutSheeet_Go Live-5-14-14
Go Live - 5-19-2014

You could do this:
Regex.Replace(input, "([a-zA-Z,_ ]+|(?<=[a-zA-Z ])[/-])", "");
Working regex example:
http://regex101.com/r/kD2jF4
From your example data, output would be:
5/8
5/14
5-14-14
5-19-2014

You can use a function like this:
public static string Parse(string source)
{
    var numbers = new [] {'0','1','2','3','4','5','6','7','8','9' };
    var chars = new [] { '-', '/', '_' };
    return new string(source
        .Where(x => numbers.Contains(x) || chars.Contains(x))
        .ToArray()).Trim(chars);
}

Try this:
public static string StripCrap(string input)
{
    return input.Where(c => char.IsNumber(c) || c == '_' || c == '/' ||
        c == '-').Aggregate("", (current, c) => current + c);
}
Or, if you want a maintainable list:
public static string StripCrap(string input)
{
    char[] nonCrapChars = {'/', '-', '_'};
    return input.Where(c => char.IsNumber(c) || nonCrapChars.Contains(c)).Aggregate("", (current, c) => current + c);
}
Or... you could also create an extension method:
public static string ToNonCrappyString(this string input)
{
    char[] nonCrapChars = {'/', '-', '_'};
    return input.Where(c => char.IsNumber(c) || nonCrapChars.Contains(c)).Aggregate("", (current, c) => current + c);
}
and you can call it like this:
string myString = "Hello 1234!";
string nonCrappyString = myString.ToNonCrappyString();

Use the pattern .*?(\d+[\d-\/]*\d+)|.* and replace with $1.

C# string operation. get file name substring

myfinename_slice_1.tif
myfilename_slice_2.tif
...
...
myfilename_slice_15.tif
...
...
myfilename_slice_210.tif
In C#, how can I get the file index, like "1", "2", "15", "210", using string operations?

You have some options:
Regular expressions with the Regex class;
String.Split.
The most important question is which assumptions you can make about the format of the file name.
For example, if the index is always at the end of the file name (not counting the extension) and follows an underscore, you can do:
var id = Path.GetFileNameWithoutExtension("myfinename_slice_1.tif")
    .Split('_')
    .Last();
Console.WriteLine(id);
If, for example, you can assume that the identifier is guaranteed to appear in the file name and that the characters [0-9] only appear in the file name as part of the identifier, you can just do:
var id = Regex.Match("myfinename_slice_1.tif", @"\d+").Value;
Console.WriteLine(id);
There are probably more ways to do this, but the most important thing is to establish which assumptions you can make and then code an implementation based on them.

This looks like a job for regular expressions. First define the pattern as a regular expression:
.*?_(?<index>\d+)\.tif
Then get a match against your string. The group named index will contain the digits:
var idx = Regex.Match(filename, @".*?_(?<index>\d+)\.tif").Groups["index"].Value;

You can use the regex "(?<digits>\d+)\.[^\.]+$", and if it's a match the string you're looking for is in the group named "digits"
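A short sketch of how that pattern might be used. The wrapper class and method (FileIndexExtractor, GetIndex) are hypothetical names for illustration; only the regex itself comes from the answer above.

```csharp
using System.Text.RegularExpressions;

public static class FileIndexExtractor
{
    // Hypothetical helper: returns the digits that sit directly before the
    // file extension, or null when the name does not match the pattern.
    public static string GetIndex(string fileName)
    {
        Match match = Regex.Match(fileName, @"(?<digits>\d+)\.[^\.]+$");
        return match.Success ? match.Groups["digits"].Value : null;
    }
}
```

With this sketch, FileIndexExtractor.GetIndex("myfilename_slice_210.tif") should return "210", while a name without an extension, such as "tif999", returns null.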

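Here is a method that will handle that:
public int GetFileIndex(string argFilename)
{
    // Substring takes (startIndex, length), so the length is computed from the two positions.
    int start = argFilename.LastIndexOf("_") + 1;
    int length = argFilename.LastIndexOf(".") - start;
    return Int32.Parse(argFilename.Substring(start, length));
}
Enjoy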

filename.Split('_')[2].Split('.')[0]

public class UnitTest1
{
    [TestMethod]
    public void TestMethod1()
    {
        var s1 = "myfinename_slice_1.tif";
        var s2 = "myfilename_slice_2.tif";
        var s3 = "myfilename_slice_15.tif";
        var s4 = "myfilename_slice_210.tif";
        var s5 = "myfilena44me_slice_210.tif";
        var s6 = "7myfilena44me_slice_210.tif";
        var s7 = "tif999";
        Assert.AreEqual(1, EnumerateNumbers(s1).First());
        Assert.AreEqual(2, EnumerateNumbers(s2).First());
        Assert.AreEqual(15, EnumerateNumbers(s3).First());
        Assert.AreEqual(210, EnumerateNumbers(s4).First());
        Assert.AreEqual(210, EnumerateNumbers(s5).Skip(1).First());
        Assert.AreEqual(210, EnumerateNumbers(s6).Skip(2).First());
        Assert.AreEqual(44, EnumerateNumbers(s6).Skip(1).First());
        Assert.AreEqual(999, EnumerateNumbers(s7).First());
    }
    // Yields every run of consecutive digits in the input as an int.
    static IEnumerable<int> EnumerateNumbers(string input)
    {
        var digits = new char[] { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' };
        string result = string.Empty;
        foreach (var c in input.ToCharArray())
        {
            if (!digits.Contains(c))
            {
                if (!string.IsNullOrEmpty(result))
                {
                    yield return int.Parse(result);
                    result = string.Empty;
                }
            }
            else
            {
                result += c;
            }
        }
        if (result.Length > 0)
            yield return int.Parse(result);
    }
}

Separating numbers from other signs in a string

I got a string that contains:
"(" ")" "&&" "||"
and numbers (0 to 99999).
I want to get a string and return a list like this:
get:
"(54&&1)||15"
return new List<string>(){
"(",
"54",
"&&",
"1",
")",
"||",
"15"}

I suspect a regex would do the trick here. Something like:
string text = "(54&&1)||15";
Regex pattern = new Regex(@"\(|\)|&&|\|\||\d+");
Match match = pattern.Match(text);
while (match.Success)
{
    Console.WriteLine(match.Value);
    match = match.NextMatch();
}
The tricky bit in the above is that a lot of stuff needs escaping. The | is the alternation operator, so this is "open bracket or close bracket or && or || or at least one digit".
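Since the question asks for a List<string> rather than console output, the same pattern could be wrapped as in the sketch below. The names Tokenizer and Tokenize are assumptions made for illustration; the pattern is the one from this answer.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class Tokenizer
{
    // Hypothetical helper: collects every match of the pattern into a list.
    // Characters that match none of the alternatives are silently skipped.
    public static List<string> Tokenize(string text)
    {
        return Regex.Matches(text, @"\(|\)|&&|\|\||\d+")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToList();
    }
}
```

Tokenizer.Tokenize("(54&&1)||15") should return the seven items "(", "54", "&&", "1", ")", "||", "15". If unexpected characters must be reported rather than skipped, the input would need an extra validation step.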

If you only want to extract the numbers from your string, you can use a regex.
But if you want to parse the string as a formula and calculate the result, you should look at a math expression parser.

Here's the LINQ/Lambda way to do it:
var operators = new [] { "(", ")", "&&", "||", };
Func<string, IEnumerable<string>> operatorSplit = t =>
{
    Func<string, string, IEnumerable<string>> inner = null;
    // p is the text collected so far, x is the rest of the input.
    inner = (p, x) =>
    {
        if (x.Length == 0)
        {
            return new [] { p, };
        }
        else
        {
            var op = operators.FirstOrDefault(o => x.StartsWith(o));
            if (op != null)
            {
                return (new [] { p, op }).Concat(inner("", x.Substring(op.Length)));
            }
            else
            {
                return inner(p + x.Substring(0, 1), x.Substring(1));
            }
        }
    };
    return inner("", t).Where(x => !String.IsNullOrEmpty(x));
};
Now you just call this:
var list = operatorSplit("(54&&1)||15").ToList();
Enjoy!
