I wrote a console program that reads a file, splits every line into two texts separated by ":", and checks whether they meet the requirements. Every line in the file has the syntax "xxxxx:xxxxx".
namespace ConsoleApp1
{
class Program
{
static void Main(string[] args)
{
string textone, texttwo, filename;
Regex reg = new Regex(@"\W"); // non-word characters (anything other than letters, digits and _)
Regex numb = new Regex("[a-zA-Z]");
Regex numbek = new Regex("[0-9]");
Regex donjacrta = new Regex("_");
filename = Console.ReadLine();//should load richtextbox instead
string[] linije = System.IO.File.ReadAllLines(filename);
for (int i = 0; i < linije.Length; i++)
{
string trenutni = linije[i];
int indexx = trenutni.IndexOf(':');
textone = trenutni.Substring(0, indexx);
texttwo = trenutni.Substring((indexx + 1), (trenutni.Length) - (indexx + 1));
if (textone.Length < 3 || textone.Length > 25 || reg.IsMatch(textone) || donjacrta.IsMatch(textone) || !numb.IsMatch(textone) || !numbek.IsMatch(textone))
{
continue;
}
else if (texttwo.Length < 3 || texttwo.Length > 30 )
{
continue;
}
else
{
Console.WriteLine(textone + ":" + texttwo);
}
}
}
}
}
(when i try to format the code here it deletes/hides some of the code, dont know why)
In my Windows Forms application, I first load the file into a RichTextBox. From there I need to hook this logic up so that it either:
clears the whole RichTextBox and writes back only the valid lines
deletes the invalid lines.
Brief
It took me some time to understand what you're trying to do and what you want, but, I think I found your solution. You can pretty much remove all the code you've written and replace it with a single regular expression.
What I believe you are trying to do:
Match strings that are split by : (i.e. xxxxx:xxxxx - where x is defined below) and each string consumes a single row in a text file
Ensure both sections (before and after the colon :) match a-zA-Z0-9 ONLY (no other character)
Ensure the first section is between 3 and 25 characters
Ensure the second section is between 3 and 30 characters
Code
See this code in use here
^([a-zA-Z\d]{3,25}):([a-zA-Z\d]{3,30})$
For a sample C# program, you can use the following. Obviously, you would replace the logic to pull from a text file rather than a string.
using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
string pattern = @"^([a-zA-Z\d]{3,25}):([a-zA-Z\d]{3,30})$";
string input = @"xxxxx:xxxxx
1adfasfdfasdfsdfsfsfssfsd:1asfdsfsdfsafsdfsadfdfsdfadf2s
sfsd12321:12sfs3123
#342fdfasd:1dsadafdsfs";
RegexOptions options = RegexOptions.Multiline;
foreach (Match m in Regex.Matches(input, pattern, options))
{
Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
}
}
}
Results
Input
xxxxx:xxxxx
1adfasfdfasdfsdfsfsfssfsd:1asfdsfsdfsafsdfsadfdfsdfadf2s
sfsd12321:12sfs3123
#342fdfasd:1dsadafdsfs
Output
The matched strings are listed below.
Full match: xxxxx:xxxxx
Group 1: xxxxx
Group 2: xxxxx
Full match: 1adfasfdfasdfsdfsfsfssfsd:1asfdsfsdfsafsdfsadfdfsdfadf2s
Group 1: 1adfasfdfasdfsdfsfsfssfsd
Group 2: 1asfdsfsdfsafsdfsadfdfsdfadf2s
Full match: sfsd12321:12sfs3123
Group 1: sfsd12321
Group 2: 12sfs3123
Explanation
Assert position at the beginning of the line
Match and capture into group 1: Any character in the set a-zA-Z\d between 3 and 25 times
Match a colon :
Match and capture into group 2: Any character in the set a-zA-Z\d between 3 and 30 times
Assert position at the end of the line
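As a rough sketch of how the pattern could be applied to the original task (keeping only the valid lines), the snippet below filters a sequence of lines with Regex.IsMatch. The names LineFilter and KeepValidLines are made up for illustration, and the sketch assumes the lines have already been read, for example with File.ReadAllLines or from the Lines property of a RichTextBox.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LineFilter
{
    // Same pattern as above. Each line is tested on its own,
    // so RegexOptions.Multiline is not needed here.
    private static readonly Regex ValidLine =
        new Regex(@"^([a-zA-Z\d]{3,25}):([a-zA-Z\d]{3,30})$");

    // Hypothetical helper: returns only the lines that match the pattern.
    public static List<string> KeepValidLines(IEnumerable<string> lines)
    {
        return lines.Where(line => ValidLine.IsMatch(line)).ToList();
    }

    public static void Main()
    {
        string[] lines =
        {
            "xxxxx:xxxxx",
            "1adfasfdfasdfsdfsfsfssfsd:1asfdsfsdfsafsdfsadfdfsdfadf2s",
            "sfsd12321:12sfs3123",
            "#342fdfasd:1dsadafdsfs"
        };

        // Prints the first three lines; the last one contains '#'.
        foreach (string line in KeepValidLines(lines))
        {
            Console.WriteLine(line);
        }
    }
}
```

In a Windows Forms application the result could then be written back with something like richTextBox1.Lines = LineFilter.KeepValidLines(richTextBox1.Lines).ToArray(); where richTextBox1 is an assumed control name.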
Related
I have a problem to find the pattern that solves the problem in onestep.
The string looks like this:
Text1
Text1$Text2$Text3
Text1$Text2$Text3$Text4$Text5$Text6 etc.
What I want to get is: take up to 4x Text. If there are more than "4xText", take only the last character of each additional one.
Example:
Text1$Text2$Text3$Text4$Text5$Text6 -> Text1$Text2$Text3$Text4&56
My current solution is:
First pattern:
^([^\$]*)\$?([^\$]*)\$?([^\$]*)\$?([^\$]*)\$?
After this i will do a substitution with the first pattern
New string: Text5$Text6
second pattern is:
([^\$])\b
result: 56
combine both and get the result:
Text1$Text2$Text3$Text4$56
It is not clear to me why I can't simply append the second pattern to the first to form one pattern. Is there something like an anchor that tells the engine to start matching from here, as it would if it were the only pattern?
You might use an alternation with a positive lookbehind and then concatenate the matches.
(?<=^(?:[^$]+\$){0,3})[^$]+\$?|[^$](?=\$|$)
Explanation
(?<= Positive lookbehind, assert what is on the left is
^(?:[^$]+\$){0,3} From the start of the string, repeat 0-3 times: match 1+ chars other than $ followed by a $
) Close lookbehind
[^$]+\$? Match 1+ times any char except $, then match an optional $
| Or
[^$] Match any char except $
(?=\$|$) Positive lookahead, assert what is directly to the right is either $ or the end of the string
.NET regex demo | C# demo
Example
string pattern = @"(?<=^(?:[^$]*\$){0,3})[^$]*\$?|[^$](?=\$|$)";
string[] strings = {
"Text1",
"Text1$Text2$Text3",
"Text1$Text2$Text3$Text4$Text5$Text6"
};
Regex regex = new Regex(pattern);
foreach (String s in strings) {
Console.WriteLine(string.Join("", from Match match in regex.Matches(s) select match.Value));
}
Output
Text1
Text1$Text2$Text3
Text1$Text2$Text3$Text4$56
I strongly believe regular expression isn't the way to do that. Mostly because of the readability.
You may consider using simple algorithm like this one to reach your goal:
using System;
public class Program
{
public static void Main()
{
var input = "Text1$Text2$Text3$Text4$Text5$Text6";
var parts = input.Split('$');
var result = "";
for (var i = 0; i < parts.Length; i++) {
// Keep the first four parts whole; for later parts drop the 4-character "Text" prefix
result += (i < 4 ? parts[i] + "$" : parts[i].Substring(4));
}
Console.WriteLine(result);
}
}
There are also LINQ alternatives:
using System;
using System.Linq;
public class Program
{
public static void Main()
{
var input = "Text1$Text2$Text3$Text4$Text5$Text6";
var parts = input.Split('$');
var first4 = parts.Take(4);
var remainings = parts.Skip(4);
var result2 = string.Join("$", first4) + "$" + string.Join("", remainings.Select( r=>r.Substring(4)));
Console.WriteLine(result2);
}
}
It has to be adjusted to the actual needs but the idea is there
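One possible adjustment is sketched below, under a few assumptions: the separator is '$' throughout (as in the outputs shown in the other answers), "the last sign" means the last character of each extra part, and no trailing separator is wanted when there are four parts or fewer. The names PartShortener and Shorten are hypothetical.

```csharp
using System;
using System.Linq;

public static class PartShortener
{
    // Hypothetical helper: keeps the first `keep` parts whole and reduces
    // every later part to its last character.
    public static string Shorten(string input, int keep = 4, char separator = '$')
    {
        string[] parts = input.Split(separator);
        string head = string.Join(separator.ToString(), parts.Take(keep));

        // Last character of each remaining part; empty parts are skipped.
        string tail = string.Concat(parts.Skip(keep)
                                         .Where(p => p.Length > 0)
                                         .Select(p => p[p.Length - 1]));

        // Only add the extra separator when there is something to append.
        return tail.Length == 0 ? head : head + separator + tail;
    }

    public static void Main()
    {
        Console.WriteLine(Shorten("Text1"));                               // Text1
        Console.WriteLine(Shorten("Text1$Text2$Text3"));                   // Text1$Text2$Text3
        Console.WriteLine(Shorten("Text1$Text2$Text3$Text4$Text5$Text6")); // Text1$Text2$Text3$Text4$56
    }
}
```

Taking the last character instead of calling Substring(4) avoids depending on the parts being exactly "Text" plus one digit.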
Try this code:
var texts = new string[] {"Text1", "Text1$Text2$Text3", "Text1$Text2$Text3$Text4$Text5$Text6" };
var parsed = texts
.Select(s => Regex.Replace(s,
@"(Text\d{1,3}(?:\$Text\d{1,3}){0,3})((?:\$Text\d{1,3})*)",
(match) => match.Groups[1].Value +"$"+ match.Groups[2].Value.Replace("Text", "").Replace("$", "")
)).ToArray();
// parsed is now: string[3] { "Text1$", "Text1$Text2$Text3$", "Text1$Text2$Text3$Text4$56" }
Explanation:
solution uses regex pattern: (Text\d{1,3}(?:\$Text\d{1,3}){0,3})((?:\$Text\d{1,3})*)
(...) - first capturing group
(?:...) - non-capturing group
Text\d{1,3}(?:\$Text\d{1,3} - match Text literally, then match \d{1,3}, which is 1 up to three digits, \$ matches $ literally
Rest is just repetition of it. Basically, first group captures first four pieces, second group captures the rest, if any.
We also use MatchEvaluator here which is delegate type defined as:
public delegate string MatchEvaluator(Match match);
We define such method:
(match) => match.Groups[1].Value +"$"+ match.Groups[2].Value.Replace("Text", "").Replace("$", "")
We use it to evaluate each match: take the first capturing group and concatenate it with the second, removing the unnecessary text.
It's not clear to me whether your goal can be achieved using exclusively regex. If nothing else, the fact that you want to introduce a new character '&' into the output adds to the challenge, since just plain matching would never be able to accomplish that. Possibly using the Replace() method? I'm not sure that would work though...using only a replacement pattern and not a MatchEvaluator, I don't see a way to recognize but still exclude the "$Text" portion from the fifth instance and later.
But, if you are willing to mix regex with a small amount of post-processing, you can definitely do it:
static readonly Regex regex1 = new Regex(@"(Text\d(?:\$Text\d){0,3})(?:\$Text(\d))*", RegexOptions.Compiled);
static void Main(string[] args)
{
for (int i = 1; i <= 6; i++)
{
string text = string.Join("$", Enumerable.Range(1, i).Select(j => $"Text{j}"));
Console.WriteLine(KeepFour(text));
}
}
private static string KeepFour(string text)
{
Match match = regex1.Match(text);
if (!match.Success)
{
return "[NO MATCH]";
}
StringBuilder result = new StringBuilder();
result.Append(match.Groups[1].Value);
if (match.Groups[2].Captures.Count > 0)
{
result.Append("&");
// Have to iterate (join), because we don't want the whole match,
// just the captured text.
result.Append(JoinCaptures(match.Groups[2]));
}
return result.ToString();
}
private static string JoinCaptures(Group group)
{
return string.Join("", group.Captures.Cast<Capture>().Select(c => c.Value));
}
The above breaks your requirement into two capture groups in a regex. Then it extracts the captured text and composes the output from those captures.
Let's say I have a sentence like this:
Regex for taking out words out of a string from a specific position
I need to write a regex that would, combined with a for loop, take out first 3 words from the beginning of the sentence at first (0) loop.
As the loop goes on, the regex would move onto the next part of the sentence, the regex skips the first word and takes the next 3 words in string.
So for example:
1st loop I'd get: "Regex for taking";
2nd loop I'd get: "for taking out";
3rd loop I'd get: "taking out words";
and so on till the end of the string.
I've figured out how to take a first word out of the string, but that's pretty much it, I'm very new to Regex, and I've done it like this:
^([\w\-]+)
But this isn't what I need.
Here is a non regex solution.
public static IEnumerable<List<string>> StrangeLoop(string source)
{
// If word separators are anything other than whitespaces
// then change parameters for Split
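// A null separator splits on whitespace; the explicit cast keeps the call
// unambiguous on .NET versions that also have a Split(string, ...) overload
var words = source.Split((char[])null);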
for (int i = 0; i < words.Length - 2; i++)
{
yield return new List<string>() { words[i], words[i + 1], words[i + 2] };
}
}
var sentence = "Regex for taking out words out of a string from a specific position";
foreach (var triad in StrangeLoop(sentence))
{
//use triad
}
I suggest separating data generation (regular expression or even just a Split(' ')) and data representation (sliding window):
public static IEnumerable<T[]> SlidingWindow<T>(this IEnumerable<T> source,
int windowSize) {
if (null == source)
throw new ArgumentNullException("source");
else if (windowSize <= 0)
throw new ArgumentOutOfRangeException("windowSize",
"Window size must be positive value");
List<T> window = new List<T>(windowSize);
foreach (var item in source) {
if (window.Count >= windowSize) {
yield return window.ToArray();
window.RemoveAt(0);
}
window.Add(item);
}
// Or (window.Count >= windowSize) if you don't want partial windows
if (window.Count > 0)
yield return window.ToArray();
}
Using SlidingWindow all you have to do is to generate the matches as usual, and then represent them in a different manner (just an additional line).
var sentence = "Regex for taking out words out of a string from a specific position";
// Regex solution: get all matches as usual...
var result = Regex
.Matches(sentence, @"[\w\-]+") // you don't want ^ anchor
.OfType<Match>()
.Select(match => match.Value)
.SlidingWindow(3); // and represent them as sliding windows..
var test = String.Join(Environment.NewLine, result
.Select(line => $"[{string.Join(" ", line)}]"));
Console.Write(test);
The output is
[Regex for taking]
[for taking out]
[taking out words]
[out words out]
[words out of]
[out of a]
[of a string]
[a string from]
[string from a]
[from a specific]
[a specific position]
If you happen to shift from regular expressions, to say, a simple Split you'll do it easily:
// Split solution: as usual + final representation as sliding window
var result = sentence
.Split(' ') // just split...
.SlidingWindow(3); // ... and represent as sliding windows
I'm trying to find best solution to verify input document. I need to check every line of the document. Basically in each line can exist invalid character or characters. The result of searching (validating) is: 'get me the index of line with invalid char and index of each invalid character in this line'.
I know how to do this the standard way (open the file -> read all lines -> check the characters one by one), but that method isn't the most optimized. In my opinion, a better solution would be to use "MatchCollection".
But how to do this correctly in C# ?
Link:
http://www.dotnetperls.com/regex-matches
Example:
"Some Înput text here,\nÎs another lÎne of thÎs text."
In the first line [0] an invalid character is found at index [5]; in line [1]
invalid characters are found at indexes [0, 12, 21].
using System;
using System.Text.RegularExpressions;
namespace RegularExpresion
{
class Program
{
private static Regex regex = null;
static void Main(string[] args)
{
string input_text = "Some Înput text here, Îs another lÎne of thÎs text.";
string line_pattern = "\n";
string invalid_character = "Î";
regex = new Regex(line_pattern);
/// Check is multiple or single line document
if (IsMultipleLine(input_text))
{
/// ---> How to do this correctly for each line ? <---
}
else
{
Console.WriteLine("Is a single line file");
regex = new Regex(invalid_character);
MatchCollection mc = regex.Matches(input_text);
Console.WriteLine($"How many matches: {mc.Count}");
foreach (Match match in mc)
Console.WriteLine($"Index: {match.Index}");
}
Console.ReadKey();
}
public static bool IsMultipleLine(string input) => regex.IsMatch(input);
}
}
Output:
Is a single line file
How many matches: 4
Index: 5
Index: 22
Index: 34
Index: 43
Link:
http://www.dotnetperls.com/regexoptions-multiline
SOLUTION
using System;
using System.Text.RegularExpressions;
namespace RegularExpresion
{
class Program
{
private static Regex regex = null;
static void Main(string[] args)
{
string input_text = @"Some Înput text here,
Îs another lÎne of thÎs text.";
string line_pattern = "\n";
string invalid_character = "Î";
regex = new Regex(line_pattern);
/// Check is multiple or single line document
if (IsMultipleLine(input_text))
{
Console.WriteLine("Is a multiple line file");
MatchCollection matches = Regex.Matches(input_text, "^(.+)$", RegexOptions.Multiline);
int line = 0;
foreach (Match match in matches)
{
foreach (Capture capture in match.Captures)
{
line++;
Console.WriteLine($"Line: {line}");
RegexpLine(capture.Value, invalid_character);
}
}
}
else
{
Console.WriteLine("Is a single line file");
RegexpLine(input_text, invalid_character);
}
Pause();
}
public static bool IsMultipleLine(string input) => regex.IsMatch(input);
public static void RegexpLine(string line, string characters)
{
regex = new Regex(characters);
MatchCollection mc = regex.Matches(line);
Console.WriteLine($"How many matches: {mc.Count}");
foreach (Match match in mc)
Console.WriteLine($"Index: {match.Index}");
}
public static ConsoleKeyInfo Pause(string message = "please press ANY key to continue...")
{
Console.WriteLine(message);
return Console.ReadKey();
}
}
}
Thanks for the help, guys. It would be nice if someone more experienced than me could review this code in terms of performance.
Regards,
Nerus.
My approach would be split the string into array of string, each contains a line. If the length of the array is just 1, that means you have only 1 line. Then from there you use the Regex to match each line to find the invalid character that you are looking for.
string input_text = "Some Înput text here,\nÎs another lÎne of thÎs text.";
string line_pattern = "\n";
// split the string into string arrays
string[] input_texts = input_text.Split(new string[] { line_pattern }, StringSplitOptions.RemoveEmptyEntries);
string invalid_character = "Î";
if (input_texts != null && input_texts.Length > 0)
{
if (input_texts.Length == 1)
{
Console.WriteLine("Is a single line file");
}
// loop every line
foreach (string oneline in input_texts)
{
Regex regex = new Regex(invalid_character);
MatchCollection mc = regex.Matches(oneline);
Console.WriteLine("How many matches: {0}", mc.Count);
foreach (Match match in mc)
{
Console.WriteLine("Index: {0}", match.Index);
}
}
}
--- EDIT ---
Things to consider:
If you get your input from a file, I would recommend you to read line by line, not the whole text.
Usually, when you search for invalid character, you don't specify it. Instead you look for a pattern. For ex: Not a char from a-z, A-Z, 0-9. Then your regex is going to be a little bit different.
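The last point can be sketched as follows. The whitelist used here (ASCII letters, digits, space, comma and period) is only an assumption for the sample text, and the names InvalidCharFinder and Find are hypothetical; the idea is simply that a negated character class reports every character outside the allowed set, line by line.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class InvalidCharFinder
{
    // Assumed whitelist: a-z, A-Z, 0-9, space, comma and period.
    // Anything else counts as invalid; adjust the class to the real rules.
    private static readonly Regex Invalid = new Regex(@"[^a-zA-Z0-9 ,.]");

    // Hypothetical helper: maps each line index to the indexes of its
    // invalid characters. Lines without invalid characters are left out.
    public static SortedDictionary<int, List<int>> Find(IEnumerable<string> lines)
    {
        var result = new SortedDictionary<int, List<int>>();
        int lineIndex = 0;
        foreach (string line in lines)
        {
            List<int> hits = Invalid.Matches(line)
                                    .Cast<Match>()
                                    .Select(m => m.Index)
                                    .ToList();
            if (hits.Count > 0)
            {
                result[lineIndex] = hits;
            }
            lineIndex++;
        }
        return result;
    }

    public static void Main()
    {
        string[] lines = { "Some Înput text here,", "Îs another lÎne of thÎs text." };
        foreach (var entry in Find(lines))
        {
            // Prints "Line 0: 5" and "Line 1: 0, 12, 21"
            Console.WriteLine($"Line {entry.Key}: {string.Join(", ", entry.Value)}");
        }
    }
}
```

For a file, File.ReadLines(path) could be passed in so that the lines are consumed lazily instead of loading the whole text first.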
Assuming I have this input:
/green/blah/agriculture/apple/blah/
I'm only trying to capture and replace the occurrence of apple (need to replace it with orange), so I have this regex
var regex = new Regex("^/(?:green|red){1}(?:/.*)+(apple){1}(?:/.*)");
So I'm grouping sections of the input, but as non-capturing, and only capturing the one I'm concerned with. According to this $` will retrieve everything before the match in the input string, and $' will get everything after, so theoretically the following should work:
"$`Orange$'"
But it only retrieves the match ("apple").
Is it possible to do this with just substitutions and NOT match evaluators and looping through groups?
The issue is that apple can occur anywhere in that url scheme, hence an unknown number of capture groups.
Thanks.
To achieve what you want, I slightly changed your regex.
The new regex is shown in the updated version at the end of the answer.
What I am doing here is turning the other parts of the pattern into capturing groups. That way I can use them as follows:
String replacement = "$1Orange$2";
string result = Regex.Replace(text, regex.ToString(), replacement);
I am using groups 1 and 2, and in between them (where I expect 'apple') I insert Orange.
A complete example looks like this:
using System;
using System.Text.RegularExpressions;
public class Test
{
public static void Main()
{
String text = "/green/blah/agriculture/apple/blah/hallo/apple";
var regex = new Regex("^(/(?:green|red)/(?:[^/]+/)*?)apple(/.*)");
String replacement = "$1Orange$2";
string result = Regex.Replace(text, regex.ToString(), replacement);
Console.WriteLine(result);
}
}
And as well a running example is here
See the updated regex, I needed to change it again to capture things like this:
/green/blah/agriculture/apple/blah/hallo/apple/green/blah/agriculture/apple/blah/hallo/apple
With the previous regex, the last apple was matched rather than the first, as originally intended. I changed the regex to this:
var regex = new Regex("^(/(?:green|red)/(?:[^/]+/)*?)apple(/.*)");
I updated the code as well as the running example.
If you really want to replace only the first occurrence of apple and don't care about the URL structure, you can use one of the following methods:
First simply use apple as regex and use the overloaded Replace method.
using System;
using System.Text.RegularExpressions;
public class Test
{
public static void Main()
{
String text = "/green/blah/agriculture/apple/blah/hallo/apple/green/blah/agriculture/apple/blah/hallo/apple";
var regex = new Regex(Regex.Escape("apple"));
String replacement = "Orange";
string result = regex.Replace(text, replacement.ToString(), 1);
Console.WriteLine(result);
}
}
See working Example
The second is to use IndexOf and Substring, which could be much quicker than using the regex classes.
See the following Example:
class Program
{
static void Main(string[] args)
{
string search = "apple";
string text = "/green/blah/agriculture/apple/blah/hallo/apple/green/blah/agriculture/apple/blah/hallo/apple";
int idx = text.IndexOf(search);
int endIdx = idx + search.Length;
int secondStrLen = text.Length - endIdx;
if (idx != -1 && idx < text.Length && endIdx < text.Length && secondStrLen > -1)
{
string first = text.Substring(0, idx);
string second = text.Substring(endIdx, secondStrLen);
string result = first + "Orange" + second;
Console.WriteLine(result);
}
}
}
Working Example
How can I find middle character with regex only
For example,this shows the expected output
Hello -> l
world -> r
merged -> rg (see this for an even number of characters)
hi -> hi
I -> I
I tried
(?<=\w+).(?=\w+)
Regular expressions cannot count in the way that you are looking for. This looks like something regular expressions cannot accomplish. I suggest writing code to solve this.
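String str = "Hello";
String mid;
int len = str.length();
if (len % 2 == 1) {
    // Odd length: the single middle character
    mid = Character.toString(str.charAt(len / 2));
} else {
    // Even length: the two middle characters, in their original order
    mid = Character.toString(str.charAt(len / 2 - 1)) + Character.toString(str.charAt(len / 2));
}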
This should probably work.
public static void main(String[] args) {
String s = "jogijogi";
int size = s.length() / 2;
String temp = "";
if (s.length() % 2 == 0) {
temp = s.substring(size - 1, (s.length() - size) + 1);
} else if (s.length() % 2 != 0) {
temp = s.substring(size, (s.length() - size));
} else {
temp = s.substring(1);
}
System.out.println(temp);
}
Related: How to match the middle character in a string with regex?
The following regex is based on @jaytea's approach and works well with e.g. PCRE, Java or C#.
^(?:.(?=.+?(.\1?$)))*?(^..?$|..?(?=\1$))
Here is the demo at regex101 and a .NET demo at RegexPlanet (click the green ".NET" button)
Middle character(s) will be found in the second capturing group. The goal is to capture two middle characters if there is an even number of characters, else one. It works by a growing capture towards the end (first group) while lazily going through the string until it ends with the captured substring that grows with each repetition. ^..?$ is used to match strings that are one or two characters long.
This "growing" works with capturing inside a repeated lookahead by placing an optional reference to the same group together with a freshly captured character into that group (further reading here).
A PCRE-variant with \K to reset and full matches: ^(?:.(?=.+?(.\1?$)))+?\K..?(?=\1$)|^..?
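For C#, a minimal sketch of using the first pattern might look like this. It assumes the default regex options (not RegexOptions.ECMAScript) and single-line input; the names MiddleChars and Of are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MiddleChars
{
    // Pattern from the answer above; the middle character(s) end up in group 2.
    private static readonly Regex Middle =
        new Regex(@"^(?:.(?=.+?(.\1?$)))*?(^..?$|..?(?=\1$))");

    // Hypothetical helper: returns the middle character(s),
    // or an empty string when the pattern does not match.
    public static string Of(string input)
    {
        Match m = Middle.Match(input);
        return m.Success ? m.Groups[2].Value : "";
    }

    public static void Main()
    {
        foreach (string word in new[] { "Hello", "world", "merged", "hi", "I" })
        {
            Console.WriteLine($"{word} -> {Of(word)}");
        }
    }
}
```

Reading group 2 rather than the whole match matters here, because the full match also includes the characters consumed before the middle.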
Curious about the "easy solution using balancing groups" that @Qtax mentions in his question.