Regex get single, connected number - c#

I am trying to extract the address number from an input string, but my current code produces unwanted results when the string contains more than one number.
Is there a way to have the Regex return the matches as an array (or something similar), so that I can tell whether the original string contained more than one number?
String theNumbers = String.Join(String.Empty, Regex.Matches(inputString, @"\d+").OfType<Match>().Select(m => m.Value));
I have also tried a different approach, but Regex.Split produces empty strings in the array, and simply filtering them out seems a bit hacky to me.
String[] extractedNumbersArray = Regex.Split(inputString, @"\D+");

Hope this helps:
using System;
using System.Text.RegularExpressions;
using System.Linq;
public class Program
{
public static void Main()
{
var inputString = "1 2 3";
var values = Regex
.Matches(inputString, @"(?<nr>\d+)")
.OfType<Match>()
.Select(m => m.Groups["nr"].Value)
.ToArray();
Console.WriteLine("Multiple numbers: " + (values.Length > 1 ? "yep" : "nope"));
foreach (var v in values)
{
Console.WriteLine(v);
}
}
}

Related

How to check if there is a character between two spaces C#

I have a string and want to check whether it contains a single letter that is surrounded by spaces. I tried using Regex, but something is not right.
Console.Write("Write a string: ");
string s = Console.ReadLine();
string[] results = Regex.Matches(s, @" (a-zA-Z) ")
.Cast<Match>()
.Select(m => m.Groups[1].Value)
.ToArray();
I am not sure if I am doing this right; I am new to C#.
A full-blown Regex seems heavy for such a simple operation.
Here is a sample of how to do it without one. It makes several assumptions that might not hold for you: the start and end of the string do not count as whitespace, and I check for any whitespace character rather than only a blank. You will have to check those assumptions.
namespace ConsoleApplication4
{
using System;
using System.Collections.Generic;
using System.Linq;
public static class StringExtensions
{
public static IEnumerable<int> IndexOfSingleLetterBetweenWhiteSpace(this string text)
{
return Enumerable.Range(1, text.Length-2)
.Where(index => char.IsLetter(text[index])
&& char.IsWhiteSpace(text[index + 1])
&& char.IsWhiteSpace(text[index - 1]));
}
}
class Program
{
static void Main()
{
var text = "This is a test";
var index = text.IndexOfSingleLetterBetweenWhiteSpace().Single();
Console.WriteLine("There is a single letter '{0}' at index {1}", text[index], index);
Console.ReadLine();
}
}
}
This should print
There is a single letter 'a' at index 8
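For comparison, a regex can do the same job once the pattern is corrected. In the question's pattern, (a-zA-Z) is a group that matches the literal text a-zA-Z; a character class needs square brackets. The sketch below keeps the same assumptions as the answer above (any whitespace counts, start and end of the string do not). SingleLetterFinder and Find are hypothetical names chosen for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SingleLetterFinder
{
    // Returns every ASCII letter that has a whitespace character directly
    // before it and directly after it. The lookarounds check the
    // neighbours without consuming them, so "a b c" style input where
    // letters share a space is still handled.
    public static string[] Find(string s)
    {
        return Regex.Matches(s, @"(?<=\s)[a-zA-Z](?=\s)")
            .Cast<Match>()
            .Select(m => m.Value)
            .ToArray();
    }

    public static void Main()
    {
        foreach (var letter in Find("This is a test"))
        {
            Console.WriteLine(letter); // prints: a
        }
    }
}
```

If only the space character should count, replacing \s with a literal space in both lookarounds should be enough.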

Substitute only one group when dealing with an unknown number of capturing groups

Assuming I have this input:
/green/blah/agriculture/apple/blah/
I'm only trying to capture and replace the occurrence of apple (need to replace it with orange), so I have this regex
var regex = new Regex("^/(?:green|red){1}(?:/.*)+(apple){1}(?:/.*)");
So I'm grouping sections of the input, but as non-capturing, and only capturing the one I'm concerned with. According to this $` will retrieve everything before the match in the input string, and $' will get everything after, so theoretically the following should work:
"$`Orange$'"
But it only retrieves the match ("apple").
Is it possible to do this with just substitutions and NOT match evaluators and looping through groups?
The issue is that apple can occur anywhere in that url scheme, hence an unknown number of capture groups.
Thanks.
To achieve what you want, I slightly changed your regex.
The updated regex is shown at the end of this answer.
Instead of leaving everything else uncaptured, I capture the text before and after apple in groups so that I can use them in the replacement:
String replacement = "$1Orange$2";
string result = Regex.Replace(text, regex.ToString(), replacement);
Group 1 holds everything before apple and group 2 everything after it; Orange goes in between them.
A complete example looks like this:
using System;
using System.Text.RegularExpressions;
public class Test
{
public static void Main()
{
String text = "/green/blah/agriculture/apple/blah/hallo/apple";
var regex = new Regex("^(/(?:green|red)/(?:[^/]+/)*?)apple(/.*)");
String replacement = "$1Orange$2"; // the regex defines only two capturing groups
string result = Regex.Replace(text, regex.ToString(), replacement);
Console.WriteLine(result);
}
}
I had to change the regex again to handle input like this:
/green/blah/agriculture/apple/blah/hallo/apple/green/blah/agriculture/apple/blah/hallo/apple
With the earlier version of the regex, the last apple was matched instead of the first, which was not what was intended. I changed the regex to this:
var regex = new Regex("^(/(?:green|red)/(?:[^/]+/)*?)apple(/.*)");
The complete example above already uses this updated regex.
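As a side note on the original question (substitution only, no match evaluator): one option is to make the regex match nothing but apple and to check the path prefix in a lookbehind, so the replacement string can simply be the new word. This is only a sketch. It relies on .NET allowing variable-length lookbehind, which many other regex engines do not, and AppleReplacer and ReplaceFirst are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AppleReplacer
{
    // The lookbehind verifies the "/green/.../" or "/red/.../" prefix
    // without consuming it, and the lookahead requires a "/" or the end
    // of the string, so the match itself is only the segment "apple".
    private static readonly Regex FirstApple = new Regex(
        @"(?<=^/(?:green|red)/(?:[^/]+/)*?)apple(?=/|$)");

    public static string ReplaceFirst(string path)
    {
        // The third argument limits the replacement to the first match.
        return FirstApple.Replace(path, "Orange", 1);
    }

    public static void Main()
    {
        Console.WriteLine(ReplaceFirst("/green/blah/agriculture/apple/blah/"));
    }
}
```

Because the match is a whole path segment, a segment such as pineapple or apples is left alone.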
If you only want to replace the first occurrence of apple and do not care about the URL structure, you can use one of the following methods.
The first is to use apple itself as the regex and call the Replace overload that takes a maximum number of replacements.
using System;
using System.Text.RegularExpressions;
public class Test
{
public static void Main()
{
String text = "/green/blah/agriculture/apple/blah/hallo/apple/green/blah/agriculture/apple/blah/hallo/apple";
var regex = new Regex(Regex.Escape("apple"));
String replacement = "Orange";
string result = regex.Replace(text, replacement.ToString(), 1);
Console.WriteLine(result);
}
}
The second is to use IndexOf and Substring, which can be much faster than the regex classes.
See the following example:
using System;
class Program
{
static void Main(string[] args)
{
string search = "apple";
string text = "/green/blah/agriculture/apple/blah/hallo/apple/green/blah/agriculture/apple/blah/hallo/apple";
int idx = text.IndexOf(search);
int endIdx = idx + search.Length;
int secondStrLen = text.Length - endIdx;
if (idx != -1) // found; Substring also copes with a match at the very end of the string
{
string first = text.Substring(0, idx);
string second = text.Substring(endIdx, secondStrLen);
string result = first + "Orange" + second;
Console.WriteLine(result);
}
}
}

Get Removed characters from string

I am using Regex to remove unwanted characters from string like below:
str = System.Text.RegularExpressions.Regex.Replace(str, @"[^\u0020-\u007E]", "");
How can I efficiently retrieve the distinct characters that will be removed?
EDIT:
Sample input : str = "This☺ contains Åüsome æspecialæ characters"
Sample output : str = "This contains some special characters"
removedchar = "☺,Å,ü,æ"
string pattern = @"[^\u0020-\u007E]"; // negated class: matches the characters that get removed
Regex rgx = new Regex(pattern);
List<string> matches = new List<string> ();
foreach (Match match in rgx.Matches(str))
{
if (!matches.Contains (match.Value))
{
matches.Add (match.Value);
}
}
Here is an example of how you can do it with a callback method, using the Regex.Replace overload that takes an evaluator:
evaluator
Type: System.Text.RegularExpressions.MatchEvaluator
A custom method that examines each match and returns either the original matched string or a replacement string.
C# demo:
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
public class Test
{
public static List<string> characters = new List<string>();
public static void Main()
{
var str = Regex.Replace("§My string 123”˝", "[^\u0020-\u007E]", Repl);//""
Console.WriteLine(str); // => My string 123
Console.WriteLine(string.Join(", ", characters)); // => §, ”, ˝
}
public static string Repl(Match m)
{
characters.Add(m.Value);
return string.Empty;
}
}
In short, declare a "global" variable (a list of strings, here, characters), initialize it. Add the Repl method to handle the replacement, and when Regex.Replace calls that method, add each matched value to the characters list.
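Since the question asks for the distinct removed characters, here is a small variation that avoids the shared list. It is only a sketch: it scans the input twice (once to collect, once to replace), which should be fine for short strings but is an assumption about your workload. RemovedCharacters, FindDistinct and Clean are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RemovedCharacters
{
    // Everything outside the printable ASCII range U+0020..U+007E.
    private const string Unwanted = @"[^\u0020-\u007E]";

    // Collects each character that would be removed, once.
    public static string[] FindDistinct(string input)
    {
        return Regex.Matches(input, Unwanted)
            .Cast<Match>()
            .Select(m => m.Value)
            .Distinct()
            .ToArray();
    }

    // Returns the input with the unwanted characters stripped.
    public static string Clean(string input)
    {
        return Regex.Replace(input, Unwanted, "");
    }

    public static void Main()
    {
        string str = "This☺ contains Åüsome æspecialæ characters";
        Console.WriteLine(Clean(str));
        Console.WriteLine(string.Join(",", FindDistinct(str)));
    }
}
```

With the sample input from the question, æ occurs twice in the string but is reported only once.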

Split string that contains array to get array of values C#

I have a string that contains an array
string str = "array[0]=[1,a,3,4,asdf54,6];array[1]=[1aaa,2,4,k=6,2,8];array[2]=[...]";
I'd like to split it to get an array like this:
str[0] = "[1,a,3,4,asdf54,6]";
str[1] = "[1aaa,2,4,k=6,2,8]";
str[2] = ....
I've tried to use Regex.Split(str, @"\[\D+\]") but it didn't work.
Any suggestions?
Thanks
SOLUTION:
After seeing your answers I used
var arr = Regex.Split(str, @"\];array\[[\d, -]+\]=\[");
This works just fine, thanks all!
var t = str.Split(';').Select(s => s.Split(new char[]{'='}, 2)).Select(s => s.Last()).ToArray();
In regex, \d matches any digit, whilst \D matches anything that is not a digit. I assume your use of the latter is a mistake. Additionally, you should allow your regex to match minus signs, commas, and spaces as well, using the character class [\d\-, ]. You can also include a lookbehind for the = character, written as (?<=\=), in order to avoid getting the [0], [1], [2], ...
string str = "array[0]=[1,2,3,4,5,6];array[1]=[1,2,4,6,2,8];array[2]=[...]";
string[] results = Regex.Matches(str, @"(?<=\=)\[[\d\-, ]+\]")
.Cast<Match>()
.Select(m => m.Value)
.ToArray();
Try this - using regular expression look behind to grab the relevant parts of your string.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Text.RegularExpressions;
namespace RegexSplit
{
class Program
{
static void Main(string[] args)
{
string str = "array[0]=[1,2,3,4,5,6];array[1]=[1,2,4,6,2,8];array[2]=[...]";
Regex r = new Regex(@"(?<=\]=)(\[.+?\])");
string[] results = r.Matches(str).Cast<Match>().Select(p => p.Groups[1].Value).ToArray();
}
}
}
BONUS - convert to int[][] if you are fancy.
int[][] ints = results.Select(p => p.Split(new [] {'[', ',', ']'}, StringSplitOptions.RemoveEmptyEntries)
.Where(s => { int temp; return int.TryParse(s, out temp);}) //omits the '...' match from your sample. potentially you could change the regex pattern to only catch digits but that isnt what you wanted (i dont think)
.Select(s => int.Parse(s)).ToArray()).ToArray();
Regex would be an option, but it would be a bit complicated. Assuming you don't have a parser for your input, you can try the following:
-Split the string by ; characters, and you'd get a string array (e.g. string[]):
"array[0]=[1,2,3,4,5,6]", "array[1]=[1,2,4,6,2,8]", "array[2]=[...]". Let's call it list.
Then for each of the elements in that array (assuming the input is in order), do this:
-Find the index of ]=, let that be x.
-Take the substring of your whole string from the starting index x + 2. Let's call it sub.
-Assign the result string as the current string in your array, e.g. if you are iterating with a regular for loop, and your indexing variable is i such as for(int i = 0; i < len; i++){...}:
list[i] = sub.
I know it is a dirty and error-prone solution, e.g. if the input is array[0] =[1,2... instead of array[0]=[1,2,... it won't work because of the extra space, but if your input matches that exact pattern (no extra spaces, no newlines etc.), it will do the job.
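The steps above could be sketched roughly as follows. This is just one reading of the description: the substring is taken from each element of the split array, and elements without ]= are left unchanged, which is an assumption on my part. ArraySplitter and ExtractValues are hypothetical names.

```csharp
using System;

public static class ArraySplitter
{
    public static string[] ExtractValues(string input)
    {
        // Step 1: split by ';' to get "array[0]=[...]", "array[1]=[...]", ...
        string[] list = input.Split(';');

        for (int i = 0; i < list.Length; i++)
        {
            // Step 2: find the index of "]=" in the current element.
            int x = list[i].IndexOf("]=", StringComparison.Ordinal);
            if (x >= 0)
            {
                // Step 3: keep everything after "]=".
                list[i] = list[i].Substring(x + 2);
            }
        }
        return list;
    }

    public static void Main()
    {
        string str = "array[0]=[1,2,3,4,5,6];array[1]=[1,2,4,6,2,8];array[2]=[...]";
        foreach (string value in ExtractValues(str))
        {
            Console.WriteLine(value);
        }
    }
}
```

Because only the first ]= in each element is used, a value such as k=6 inside the brackets does not cut the result short.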
UPDATE: cosset's answer seems to be the most practical and easiest way to achieve your result, especially if you are familiar with LINQ.
string[] output=Regex.Matches(input,@"(?<!array)\[.*?\]")
.Cast<Match>()
.Select(x=>x.Value)
.ToArray();
OR
string[] output=Regex.Split(input,@";?array\[\d+\]=");
string str = "array[0]=[1,a,3,4,asdf54,6];array[1]=[1aaa,2,4,k=6,2,8];array[2]=[2,3,2,3,2,3=3k3k]";
string[] m1 = str.Split(';');
List<string> m3 = new List<string>();
foreach (string ms in m1)
{
string[] m2 = ms.Split(new[] { '=' }, 2); // split only at the first '=', since values such as k=6 contain one too
m3.Add(m2[1]);
}

How do I extract a string of text in c#

I am having trouble splitting a string in C#.
I have a string (the text in textbox0):
start and dffdfdddddddfd<m>one</m><m>two</m><m>three</m><m>four</m>dbfjnbjvbnvbnjvbnv and end
I want to extract the text between <m> and </m> when button1 is clicked, and I need three outputs:
output 1 :
one two three four (output to textbox1)
output 2 :
four (output to textbox2)
output 3 :
one (output to textbox3)
How would I do this?
Please give me the full code for button1_Click.
Thanks and regards.
You can try a regular expression to capture the four values in a list, either using LINQ:
List<string> results = Regex.Matches(s, "<m>(.*?)</m>")
.Cast<Match>()
.Select(m => m.Groups[1].Value)
.ToList();
Or for C# 2.0:
List<string> results = new List<string>();
foreach (Match match in Regex.Matches(s, "<m>(.*?)</m>"))
{
results.Add(match.Groups[1].Value);
}
You can then use string.Join, Enumerable.First (or results[0]) and Enumerable.Last (or results[results.Count - 1]) to get the outputs you need.
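Putting those pieces together might look like the sketch below. It writes to the console instead of the text boxes, because the form and its controls exist only in the asker's project; inside button1_Click the three values would be assigned to the Text properties instead. TagExtractor and Extract are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TagExtractor
{
    // Returns the text between each <m> and </m> pair, in order.
    public static List<string> Extract(string s)
    {
        return Regex.Matches(s, "<m>(.*?)</m>")
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToList();
    }

    public static void Main()
    {
        string s = "start and dffdfdddddddfd<m>one</m><m>two</m><m>three</m><m>four</m>dbfjnbjvbnvbnjvbnv and end";
        List<string> results = Extract(s);

        // First() and Last() throw on an empty list, so check first.
        if (results.Count == 0)
        {
            return;
        }

        Console.WriteLine(string.Join(" ", results)); // output 1: one two three four
        Console.WriteLine(results.Last());            // output 2: four
        Console.WriteLine(results.First());           // output 3: one
    }
}
```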
If this is XML you should use an XML parser instead.
With the customary warning against using Regex for XML and HTML:
You can extract text between <m> and </m> like so:
string input =
"start and dffdfdddddddfd<m>one</m><m>two</m><m>three</m><m>four</m>dbfjnbjvbnvbnjvbnv and end";
var matches = Regex.Matches(input, "<m>(.*?)</m>");
foreach (Match match in matches)
{
Console.WriteLine(match.Groups[1]);
}
using System;
using System.Linq;
using System.Xml.Linq;
class Program{
static void Main(string[] args){
string data = "start and dffdfdddddddfd<m>one</m><m>two</m><m>three</m><m>four</m>dbfjnbjvbnvbnjvbnv and end";
string xmlString = "<root>" + data + "</root>";
var doc = XDocument.Parse(xmlString);
var ie = doc.Descendants("m");
Console.Write("output1:");
foreach(var el in ie){
Console.Write(el.Value + " ");
}
Console.WriteLine("\noutput2:{0}",ie.Last().Value);
Console.WriteLine("output3:{0}",ie.First().Value);
}
}
