How do I extract substrings from a string? - c#

I have an expression that contains numbers and plus symbols:
string expression = "235+356+345+24+5+2+4355+456+365+356.....+34+5542";
List<string> numbersList = new List<string>();
How should I extract every number substring (235, 356, 345, 24....) from that expression and collect them into a string list?

You can do something like
List<string> parts = expression.Split('+').ToList();
http://msdn.microsoft.com/en-us/library/system.string.split.aspx
If there is any potential for white space around the + signs, you could do something a little fancier:
List<string> parts = (from t in expression.Split('+') select t.Trim()).ToList();
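The trimming variant above can be sketched as a small helper that also drops empty entries, for example when the expression ends with a trailing +. The names ExpressionSplitter and SplitTerms are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExpressionSplitter
{
    // Hypothetical helper: turns "1 + 2+3" into ["1", "2", "3"].
    public static List<string> SplitTerms(string expression)
    {
        return expression
            .Split(new[] { '+' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(term => term.Trim())
            .Where(term => term.Length > 0) // drop entries that were only white space
            .ToList();
    }
}
```

Calling ExpressionSplitter.SplitTerms("235 + 356+ 24 +") should give the three strings 235, 356 and 24.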

Something like:
string expression = "235+356+345+24+5+2+4355+456+365+356";
List<string> list = new List<string>(expression.Split('+'));

Try this piece of code
string expression = "235+356+345+24+5+2+4355+456+365+356";
string[] numbers = expression.Split('+');
List<string> numbersList = numbers.ToList();

Or this, a positive check for numeric sequences:
private static readonly Regex rxNumber = new Regex(@"\d+");
public IEnumerable<string> ParseIntegersFromString(string s)
{
    // Walk the matches one by one; NextMatch() resumes after the previous match.
    for (Match m = rxNumber.Match(s); m.Success; m = m.NextMatch())
    {
        yield return m.Value;
    }
}

Related

Why Regex in a while loop will match only the first occurrence length (is not dynamic in a while loop)

I have a regex that I expected to capture each group of zeros dynamically. Instead, I get a list full of identical entries, e.g. [00, 00, 00, 00, 00], from a string like "001111110000001100110011111".
I've tried putting my var regex = new Regex() inside the while loop in the hope that this might solve my problem. Whatever I try, the regex returns only the length of the first occurrence of zeros instead of filling my collection with runs of varying length.
List<string> ZerosMatch(string input)
{
var newInput = input;
var list = new List<string>();
var regex = new Regex(@"[0]{1,}");
var matches = regex.Match(newInput);
while (matches.Success)
{
list.Add(matches.Value);
try
{
newInput = newInput.Remove(0, matches.Index);
}
catch
{
break;
}
}
return list;
}
vs
List<string> ZerosMatch(string input)
{
var newInput = input;
var list = new List<string>();
bool hasMatch = true;
while (hasMatch)
{
try
{
var regex = new Regex(@"[0]{1,}");
var matches = regex.Match(newInput);
newInput = newInput.Remove(0, matches.Index);
list.Add(matches.Value);
hasMatch = matches.Success;
}
catch
{
break;
}
}
return list;
}
My question is: why is this happening?

var newInput = input; //The newInput variable is not needed and you can proceed with input
var list = new List<string>();
var regex = new Regex(@"[0]{1,}");
var matches = regex.Matches(newInput);
for(int i=0; i<matches.Count; i++)
{
list.Add(matches[i].Value);
}
return list;

In your first approach, you execute regex.Match only once, so you are always looking at the very same match until your code throws an exception. Depending on whether your first match is at index 0 or later, it's either an OutOfMemoryException (because you remove nothing from your string but keep adding to your result list indefinitely) or an ArgumentOutOfRangeException (because you eventually try to remove more characters than the string still contains).
Your second approach will suffer from the same OutOfMemoryException if your input starts with a 0 or you arrive at some intermediate string that starts with 0.
See below for a working approach:
List<string> ZerosMatch(string input)
{
var newInput = input;
var list = new List<string>();
var regex = new Regex(@"[0]{1,}");
var match = regex.Match(newInput);
while (match.Success)
{
newInput = newInput.Remove(match.Index, match.Value.Length);
list.Add(match.Value);
match = regex.Match(newInput);
}
return list;
}
Still, using Regex.Matches is the recommended approach, if you want to extract multiple instances of a match from a string ...
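If you want to keep the while-loop shape from the question, one option is to advance with Match.NextMatch() instead of cutting up the input string. This is only a sketch; the class name ZeroRuns is made up for illustration.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ZeroRuns
{
    private static readonly Regex Zeros = new Regex("0+");

    public static List<string> ZerosMatch(string input)
    {
        var list = new List<string>();
        Match match = Zeros.Match(input);
        while (match.Success)
        {
            list.Add(match.Value);
            // NextMatch() continues the search after the end of the current match,
            // so the input string never has to be modified.
            match = match.NextMatch();
        }
        return list;
    }
}
```

For the input "001111110000001100110011111" this should collect 00, 000000, 00 and 00.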

I suggest using Matches instead of Match and querying with the help of LINQ (why loop and search again when we can get all the matches in one go?):
using System.Linq;
using System.Text.RegularExpressions;
...
static List<string> ZeroesMatch(string input) => Regex
.Matches(input ?? "", "0+")
.Cast<Match>()
.Select(match => match.Value)
.ToList();
Here I've simplified the pattern to 0+ (one or more 0 characters) and added ?? "" to avoid an exception on a null string.

How to separate string after whitespace in c#

I'm using C# and have a string like x="12 $Math A Level$" that could also be x="12 Math A Level".
How can I separate this string in order to have a variable year=12 and subject=Math A Level?
I was using something like:
char[] whitespace = new char[] { ' ', '\t' };
var x = item.Split(whitespace);
but then I didn't know what to do after or if there's a better way to do this.

You could use the overload of Split that takes a count (note that StringSplitOptions.TrimEntries requires .NET 5 or later):
var examples = new []{"2 $Math A Level$", "<some_num> <some text>"} ;
foreach(var s in examples)
{
var parts = s.Split(' ', count: 2, StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);
Console.WriteLine($"'{parts[0]}', '{parts[1]}'");
}
This prints:
'2', '$Math A Level$'
'<some_num>', '<some text>'

You could do
var item = "12 Math A Level";
var index = item.IndexOf(' ');
var year = item.Substring(0, index);
var subject = item.Substring(index + 1, item.Length - index-1).Trim('$');
This assumes that the year is the first word, and the subject is everything else. It also assumes you are not interested in any '$' signs. You might also want to add a check that the index was actually found, in case there are no spaces in the string.
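The check mentioned above could look roughly like this. It is a sketch under the same assumptions (year first, subject is the rest, '$' signs are not wanted); the names YearSubjectParser and TryParse are made up for illustration.

```csharp
using System;

public static class YearSubjectParser
{
    // Hypothetical helper: returns false when there is no space
    // or when nothing is left for the subject.
    public static bool TryParse(string item, out string year, out string subject)
    {
        year = string.Empty;
        subject = string.Empty;
        if (string.IsNullOrWhiteSpace(item)) return false;

        string trimmed = item.Trim();
        int index = trimmed.IndexOf(' ');
        if (index < 0) return false; // no separator found

        year = trimmed.Substring(0, index);
        // Everything after the first space, without surrounding white space or '$'.
        subject = trimmed.Substring(index + 1).Trim().Trim('$');
        return subject.Length > 0;
    }
}
```

With "12 $Math A Level$" this should yield year "12" and subject "Math A Level"; with "12" alone it returns false instead of throwing.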

To add a Regex-based answer:
using System;
using System.Text.RegularExpressions;
public class Program
{
public static readonly Regex regex = new Regex(@"(?<ID>[0-9]+)\s+[$]?(?<Text>[^$]*)[$]?", RegexOptions.Compiled);
public static void Main()
{
MatchCollection matches = regex.Matches("12 $Math A Level$");
foreach( Match m in matches )
{
Console.WriteLine($"{(m.Groups["ID"].Value)} | {(m.Groups["Text"].Value)}");
}
matches = regex.Matches("13 Math B Level");
foreach( Match m in matches )
{
Console.WriteLine($"{(m.Groups["ID"].Value)} | {(m.Groups["Text"].Value)}");
}
}
}
In action: https://dotnetfiddle.net/6XEQw8
Output:
12 | Math A Level
13 | Math B Level
To explain the expression:
(?<ID>[0-9]+)\s+[$]?(?<Text>[^$]*)[$]?
(?<ID>[0-9]+) - Named Capture-Group "ID"
[0-9] - Match literal chars '0' to '9'
+ - ^^ One or more times
\s+ - Match whitespace one or more times
[$]? - Match literal '$' one or zero times
(?<Text>[^$]*) - Named Capture-Group "Text"
[^$] - Match anything that is _not_ literal '$'
* - ^^ Zero or more times
[$]? - Match literal '$' one or zero times
See also https://regex101.com/r/WV366l/1
Mind: I personally would benchmark this solution against a (or several) non-regex solutions and then make a choice.

var x = "12 $Math A Level$".Split('$', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);
string year = x[0];
string subject = x[1];
Console.WriteLine(year);
Console.WriteLine(subject);

If you can rely on the string format specified ("12 $Math A Level$"), you could split at $ like this:
using System;
public class Program
{
public static void Main()
{
var sample = "12 $Math A Level$";
var rec = Parse(sample);
Console.WriteLine($"Year={rec.Year}\nSubject={rec.Subject}");
}
private static Record Parse(string value)
{
var delimiter = new char[] { '$' };
var parts = value.Split(delimiter, StringSplitOptions.RemoveEmptyEntries);
return new Record { Year = Convert.ToInt32(parts[0]), Subject = parts[1] };
}
public class Record
{
public int Year { get; set; }
public string Subject { get; set; }
}
}
Output:
Year=12
Subject=Math A Level
▶️ Try it out here: https://dotnetfiddle.net/DAFLjA

How to find 1 in my string but ignore -1 C#

I have a string
string test1 = "255\r\n\r\n0\r\n\r\n-1\r\n\r\n255\r\n\r\n1\r";
I want to find all the 1's in my string but not the -1's. So in my string there is only one 1. I use string.Contains("1"), but this also matches the 1 in -1. So how do I do this?

You can use regular expression:
string test1 = "255\r\n\r\n0\r\n\r\n-1\r\n\r\n255\r\n\r\n1\r";
// if at least one "1", but not "-1"
if (Regex.IsMatch(test1, "(?<!-)1")) {
...
}
The pattern matches a 1 that is not preceded by -. To find all the 1s:
var matches = Regex
.Matches(test1, "(?<!-)1")
.OfType<Match>()
.ToArray(); // if you want an array
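One thing to keep in mind: (?<!-)1 also matches the 1 inside longer numbers such as 21 or 11. That does not matter for the sample string, but if it could matter for your data, a stricter variant might look like the sketch below. The pattern and the names StandaloneOnes and Count are my own illustration, not part of the answer above.

```csharp
using System.Text.RegularExpressions;

public static class StandaloneOnes
{
    // A 1 that is not preceded by '-' or a digit and not followed by a digit.
    private static readonly Regex StandaloneOne = new Regex(@"(?<![-\d])1(?!\d)");

    // Counts the standalone 1s in the text.
    public static int Count(string text)
    {
        return StandaloneOne.Matches(text).Count;
    }
}
```

For the string from the question this should count a single 1, and for "21 1 -1 11" it should also count only the lone 1.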

Try this simple solution:
Note: you can easily convert this to an extension method.
static List<int> FindIndexSpecial(string search, char find, char ignoreIfPreceededBy)
{
// Map each character to its index in the string
var characterIndexMapping = search.Select((x, y) => new { character = x, index = y }).ToList();
// Collect the indexes of the excluded character
var excludeIndexes = characterIndexMapping.Where(x => x.character == ignoreIfPreceededBy).Select(x => x.index).ToList();
// Return only the indexes that match 'find' and are not preceded by the excluded character
return (from t in characterIndexMapping
where t.character == find && !excludeIndexes.Contains(t.index - 1)
select t.index).ToList();
}
Usage :
static void Main(string[] args)
{
string test1 = "255\r\n\r\n0\r\n\r\n-1\r\n\r\n255\r\n\r\n1\r";
var matches = FindIndexSpecial(test1, '1', '-');
foreach (int index in matches)
{
Console.WriteLine(index);
}
Console.ReadKey();
}

You could use String.Split and Enumerable.Contains or Enumerable.Where:
string[] lines = test1.Split(new[] {Environment.NewLine, "\r"}, StringSplitOptions.RemoveEmptyEntries);
bool contains1 = lines.Contains("1");
string[] allOnes = lines.Where(l => l == "1").ToArray();
String.Contains searches for substrings in a given string instance. Enumerable.Contains checks whether at least one string in the string[] equals the given value.

split strings and assign them to dictionary

I have a text file containing many lines which look like this:
Flowers{Tulip|Sun Flower|Rose}
Gender{Female|Male}
Pets{Cat|Dog|Rabbit}
I know how to read lines from a file, but what's the best way to split and store the categories and their subitems in a dictionary afterwards? Let's say from a string array which contains all the above lines?

The idea to use a regexp is right, but I prefer using named captures for readability
var regexp = new Regex(@"(?<category>\w+?)\{(?<entities>.*?)\}");
var d = new Dictionary<string, List<string>>();
// you would replace this list with the lines read from the file
var list = new string[] {"Flowers{Tulip|Sun Flower|Rose}"
, " Gender{Female|Male}"
, "Pets{Cat|Dog|Rabbit}"};
foreach (var entry in list)
{
var mc = regexp.Matches(entry);
foreach (Match m in mc)
{
d.Add(m.Groups["category"].Value
, m.Groups["entities"].Value.Split('|').ToList());
}
}
You get a dictionary with the category as the key and the values as a list of strings.
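For comparison, here is a rough sketch of the same parsing without a regex, using IndexOf to find the braces. It assumes every line has the shape Name{a|b|c}; the names CategoryParser and Parse are made up for illustration. Note that it uses the dictionary indexer, so a repeated category overwrites the earlier one, whereas Add would throw.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CategoryParser
{
    public static Dictionary<string, List<string>> Parse(IEnumerable<string> lines)
    {
        var result = new Dictionary<string, List<string>>();
        foreach (var raw in lines)
        {
            string line = raw.Trim();
            int open = line.IndexOf('{');
            int close = line.LastIndexOf('}');
            // Skip lines that do not look like Name{a|b}.
            if (open <= 0 || close < open) continue;

            string category = line.Substring(0, open).Trim();
            List<string> items = line.Substring(open + 1, close - open - 1)
                .Split('|')
                .Select(item => item.Trim())
                .Where(item => item.Length > 0)
                .ToList();

            // The indexer overwrites on a repeated category; Add would throw instead.
            result[category] = items;
        }
        return result;
    }
}
```

Feeding it the three sample lines should give the keys Flowers, Gender and Pets, with "Sun Flower" kept as a single item.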

You can get the key and the values with this code:
string T = @"Flowers{Tulip|Sun Flower|Rose}
Gender{Female|Male}
Pets{Cat|Dog|Rabbit}";
foreach (var line in T.Split('\n'))//or while(!file.EndOfFile)
{
var S = line.Split(new char[] { '{', '|','}' }, StringSplitOptions.RemoveEmptyEntries);
string Key = S[0];
MessageBox.Show(Key);//sth like this
for (int i = 1 ; i < S.Length; i++)
{
string value = S[i];
MessageBox.Show(value);//sth like this
}
}

You can use this:
string line = reader.ReadLine();
Regex r = new Regex(@"(\w+)\{([^}]*)\}");
Now loop over the results of this regex:
foreach (Match m in r.Matches(line)) {
    // yourDict is assumed to be a Dictionary<string, string[]>
    yourDict.Add(m.Groups[1].Value, m.Groups[2].Value.Split('|'));
}

How to find out if regexp parsed string part contains another string?

Say we have a list of strings L and a given string S. We have a regexp like (\w+)\-(\w+), and we want to get all elements of L for which S equals $1 of the regexp match. How can this be done?

You can do this:
// sample data
string[] L = new string[] { "bar foo", "foo bar-zoo", "bar-", "zoo bar-foo" };
string S = "bar";
Regex regex = new Regex(@"(\w+)\-(\w+)");
string[] res = L.Where(l => {
Match m = regex.Match(l);
if (m.Success) return m.Groups[1].Value == S;
else return false;
}).ToArray();
and get
foo bar-zoo
zoo bar-foo
An easier way that probably works out for you too is to include S in the regex:
Regex regex = new Regex(S + @"\-(\w+)");
string[] res = L.Where(l => regex.Match(l).Success).ToArray();
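If S can contain regex metacharacters, or if a line like "foobar-zoo" must not count as a hit for S = "bar", the shortcut above could be tightened roughly as follows. This is a sketch, not an exact equivalent of the first version: it accepts a line if S matches anywhere as a whole word before the hyphen, not only in the first match. The names PrefixFilter and Filter are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class PrefixFilter
{
    public static string[] Filter(string[] candidates, string s)
    {
        // Regex.Escape treats s as literal text; the lookbehind (?<!\w) makes sure
        // s is not just the tail of a longer word.
        var regex = new Regex(@"(?<!\w)" + Regex.Escape(s) + @"-\w+");
        return candidates.Where(c => regex.IsMatch(c)).ToArray();
    }
}
```

With the sample data plus "foobar-zoo" and S = "bar", this should still return only "foo bar-zoo" and "zoo bar-foo".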
