C# - complicated regular expression

So I have quite a big problem...
I get a string like:
'x,y',2,4,'y,z'
And I need to separate it into
'x,y'
2
4
'y,z'
Nothing I tried came anywhere near the expected result...
Thanks in advance!

If you're looking for a quick solution, try this (simple loop and no regular expressions):
private static IEnumerable<string> CsvSplitter(string source) {
    if (string.IsNullOrEmpty(source))
        yield break; //TODO: you may want to throw exception in case source == null
    int lastIndex = 0;
    bool inQuot = false;
    for (int i = 0; i < source.Length; ++i) {
        char c = source[i];
        if (inQuot)
            inQuot = c != '\'';
        else if (c == '\'')
            inQuot = true;
        else if (c == ',') {
            yield return source.Substring(lastIndex, i - lastIndex);
            lastIndex = i + 1;
        }
    }
    //TODO: you can well have invalid csv (unterminated quotation):
    // if (inQuot)
    //     throw new FormatException("Incorrect CSV");
    yield return source.Substring(lastIndex);
}
Sample:
string source = @"'x,y',2,4,'y,z',";
string[] result = CsvSplitter(source).ToArray();
Console.Write(string.Join(Environment.NewLine, result));
Output:
'x,y'
2
4
'y,z'
In the general case, however, look for a dedicated CSV parser.

If you wanna go the regex way, you can use
('.*?'|[^,]+)
and browse the capture groups, but I strongly recommend you to use a CSV parser.

If nested quotes are not allowed, we can retrieve the required parts with a simple regex '.*?'|[^,]+:
var input = "'x,y',2,4,'y,z'";
var parts = Regex
    .Matches(input, "'.*?'|[^,]+")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();
Console.WriteLine(string.Join(Environment.NewLine, parts));
Demo: https://dotnetfiddle.net/qo5aHz
Although the .NET regex flavour makes it possible to write a regex that handles nested quotes, such a pattern is rather hard to get right, so it is best to use a ready-made CSV parser, for example the TextFieldParser class provided with .NET.
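As a rough illustration of the TextFieldParser suggestion, here is a minimal sketch. The class name CsvLineDemo and the method ParseLine are made up for this example. It also assumes the fields are wrapped in double quotes, because TextFieldParser recognises double quotes only; the single-quoted input from the question would have to be converted first, or handled by the loop or regex shown above.

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class CsvLineDemo
{
    // Hypothetical helper: parses a single comma-delimited line.
    public static string[] ParseLine(string line)
    {
        using (var reader = new StringReader(line))
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            // Assumption: fields are quoted with double quotes, not single quotes.
            parser.HasFieldsEnclosedInQuotes = true;
            // ReadFields returns null when there is no data to read.
            return parser.ReadFields() ?? Array.Empty<string>();
        }
    }
}
```

Calling CsvLineDemo.ParseLine("\"x,y\",2,4,\"y,z\"") should give four fields, with the surrounding quotes removed from the first and last one.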

Related

Split a non delimited, and variable length string

given the following string
38051211)JLRx(0x04>0x01):JL_PAGE_INFO(0x63,0x00,0x01,0x03,0x00,0x73,0x00,0x00,0x0A,0x01,0x01,0xF2,0x01)
How can I split it so I can use each split in a listview column?
I can use split at the : for example but then I need to split at the next ( and each value after split using the ,.
Any advice would be greatly appreciated; it is adding the 2nd, 3rd and subsequent parts that I am struggling with.
{
    if (line.Contains("JLTx"))
    {
        string[] JLTx = line.Split(new[] { ':' }, StringSplitOptions.RemoveEmptyEntries);
        listView1.Items.Add(JLTx[0]);
        listView1.Items[listView1.Items.Count - 1].SubItems.Add(JLTx[1]);
    }
}
So using the following regex
Regex regex = new Regex(@"(.*)JLTx\((.*)\):(JL_[(A-Z)_]*)\((.*)\)");
I can't seem to split at the : because it is not in any of the matches. Where am I going wrong?
Thanks all

As others have pointed out, you have a lot of options for parsing this into some header-friendly format. @John's answer above would work if your JL_PAGE_INFO is stable for all input. You could also use a regex. A lot depends on how stable your input data is. Here is an example using string functions to create the list of headers you described.
static IEnumerable<string> Tokenize(string input)
{
    if (string.IsNullOrEmpty(input))
        yield break;
    if (')' != input[input.Length - 1])
        yield break;
    int colon = input.IndexOf(':');
    string pageData = input.Substring(colon + 1);
    if (string.IsNullOrEmpty(pageData))
        yield break;
    int open = pageData.IndexOf('(');
    if (colon != -1 && open != -1)
    {
        yield return input.Substring(0, colon+1);
        foreach (var token in pageData.Substring(open+1, pageData.Length - (open + 1) - 1).Split(','))
            yield return token;
    }
}

If the second item of your split string - JLTx[1] - is always going to be JL_PAGE_INFO(...) I would try this:
string[] mystring = JLTx[1].Replace("JL_PAGE_INFO(","").Replace(")","").Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
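For the regex route mentioned in the question, a possible sketch is shown below. It assumes every line has the shape prefix:NAME(value,value,...) with exactly one colon before the name, which is only a guess based on the single sample line. The class PageInfoParser, the method ToColumns and the group names are invented for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PageInfoParser
{
    // Assumed shape: <prefix>:<NAME>(<comma-separated values>)
    private static readonly Regex LinePattern =
        new Regex(@"^(?<prefix>[^:]*):(?<name>\w+)\((?<args>[^)]*)\)$");

    // Returns the prefix, the name and each value as separate columns,
    // or an empty array when the line does not have the assumed shape.
    public static string[] ToColumns(string line)
    {
        Match m = LinePattern.Match(line ?? string.Empty);
        if (!m.Success)
            return Array.Empty<string>();

        string[] args = m.Groups["args"].Value.Split(',');
        var columns = new string[args.Length + 2];
        columns[0] = m.Groups["prefix"].Value;
        columns[1] = m.Groups["name"].Value;
        Array.Copy(args, 0, columns, 2, args.Length);
        return columns;
    }
}
```

The first element can go into listView1.Items.Add(...) and the remaining ones into SubItems.Add(...), in the same way as the loop in the question.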

Using regex on a specific setup

I know a bit about regular expressions, but far from enough to figure out this one.
I have tried to find something that could help me, but I have a hard time understanding how to construct the regex in C#.
Here is what I need. Say I have a string like the following:
string s = "this is (a (string))";
What I need is to focus on the parentheses.
I want to be able to split this string up into the following List/Array "parts".
1) "this", "is", "a (string)"
or
2) "this", "is", "(a (string))".
I would like to know how to do both 1) and 2). Does anyone have an idea of how to solve this problem?
Can this be solved using regex? Does anyone know a good guide to learn about it?
Hope someone can help.
Greetings.

If you want to split with some kind of escape (do not split on a space if it is within parentheses), you can easily implement something like this, a simple loop without regular expressions:
private static IEnumerable<String> SplitWithEscape(String source) {
    if (String.IsNullOrEmpty(source))
        yield break;
    int escapeCount = 0; // current nesting depth of parentheses
    int start = 0;       // index where the current token begins
    for (int i = 0; i < source.Length; ++i) {
        char ch = source[i];
        if (escapeCount > 0) {
            if (ch == '(')
                escapeCount += 1;
            else if (ch == ')')
                escapeCount -= 1;
        }
        else {
            if (ch == ' ') {
                yield return source.Substring(start, i - start);
                start = i + 1; // skip the space itself
            }
            else if (ch == '(')
                escapeCount += 1;
        }
    }
    // return the last token, unless the parentheses are unbalanced
    if ((start < source.Length) && (escapeCount == 0))
        yield return source.Substring(start);
}
....
String source = "this is (a (string))";
String[] split = SplitWithEscape(source).ToArray();
Console.Write(String.Join("; ", split));

You can try something like this:
([^\(\s]+)\s+([^\(\s]+)\s+\((.*)\)
Regex Demo
But this only matches a fixed number of words in your input string, in this case two words before the parentheses. The final regex will depend on your exact requirements.

.NET regex supports balanced constructs. Thus, you can always safely use .NET regex to match substrings between a balanced number of delimiters that may have something inside them.
So, you can use
\(((?>[^()]+|\((?<o>)|\)(?<-o>))*(?(o)(?!)))\)|\S+
to match parenthesized substrings (while capturing the contents in-between parentheses into Group 1) or match all non-whitespace chunks (\S+ matches 1+ non-whitespace symbols).
See Grouping Constructs in Regular Expressions, Matching Nested Constructs with Balancing Groups or What are regular expression Balancing Groups? for more details on how balancing groups work.
Here is a regex demo
If you need to extract all the match values and captured values, you need to get all matched groups that are not empty or whitespace. So, use this C# code:
var line = "this is (a (string))";
var pattern = @"\(((?>[^()]+|\((?<o>)|\)(?<-o>))*(?(o)(?!)))\)|\S+";
var result = Regex.Matches(line, pattern)
    .Cast<Match>()
    .SelectMany(x => x.Groups.Cast<Group>()
        .Where(m => !string.IsNullOrWhiteSpace(m.Value))
        .Select(t => t.Value))
    .ToList();
foreach (var s in result) // DEMO
    Console.WriteLine(s);
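If only one value per match is wanted, that is either variant 1) or variant 2) from the question, one option is to pick Group 1 when it took part in the match and fall back to the whole match otherwise. The sketch below reuses the balanced pattern from this answer; the class BalancedSplitter and the keepOuterParentheses flag are names invented for illustration.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class BalancedSplitter
{
    // Same balanced-parentheses pattern as above; Group 1 holds the inner text.
    private const string Pattern =
        @"\(((?>[^()]+|\((?<o>)|\)(?<-o>))*(?(o)(?!)))\)|\S+";

    public static string[] Split(string line, bool keepOuterParentheses)
    {
        return Regex.Matches(line, Pattern)
            .Cast<Match>()
            // Group 1 succeeds only for the parenthesized alternative.
            .Select(m => !keepOuterParentheses && m.Groups[1].Success
                ? m.Groups[1].Value
                : m.Value)
            .ToArray();
    }
}
```

With the sample input, Split(line, false) should give "this", "is", "a (string)" and Split(line, true) should give "this", "is", "(a (string))".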

Maybe you can use ((?<=\()[^}]*(?=\)))|\W+ to split in words and then get the content in the group 1...
See this Regex

Data is changing from Upper case to lower case while doing Find and Replace

I'm working on FIND and REPLACE functionality in datagridview.
Below is the result after working on it.
S.No
-----
CODE0001
CODE0002
CODE0003
CODE0004
Where S.No is the column name.
When I FIND 0001 and ask to replace that with 1000, the result is,
S.No
-----
code1000
CODE0002
CODE0003
CODE0004
Find and Replace functionality is working but the text from UPPERCASE is changing to LOWERCASE.
Below is the code for Find and Replace:
for (int i = 0; i <= dataGridView1.Rows.Count - 1; i++)
{
    if (dataGridView1.Rows[i].Cells[f.cmbColumnCombo.Text].Value.ToString().ToLower().Contains(f.txtfind.Text.ToLower()))
    {
        dataGridView1.Rows[i].Cells[f.cmbColumnCombo.Text].Value = dataGridView1.Rows[i].Cells[f.cmbColumnCombo.Text].Value.ToString().ToLower().Replace(f.txtfind.Text.ToLower(), f.txtreplace.Text);
        bulidDataRow(i);
    }
}

Add .ToUpper(); at the end, after replace:
Value.ToString().ToLower().Replace(f.txtfind.Text.ToLower(), f.txtreplace.Text).ToUpper();

If you want your resulting string to be completely uppercase then add .ToUpper() to the result.
If you want to keep the original case of the rest of the string, then ToLower followed by Replace cannot do it. You will need to do something like this:
string x = Value.ToString();
string o = f.txtfind.Text.ToLower();
string n = f.txtreplace.Text;
// Note: this loops forever if the search text is empty or the replacement itself contains the search text.
while (x.ToLower().Contains(o))
{
    x = x.Substring(0, x.ToLower().IndexOf(o)) + n + x.Substring(x.ToLower().IndexOf(o) + o.Length);
}

The problem is the use of ToLower to do a case-insensitive replace. You could use Regex.Replace instead, which allows you to specify RegexOptions.IgnoreCase. Perhaps like this:
var cell = dataGridView1.Rows[i].Cells[f.cmbColumnCombo.Text];
var oldValue = cell.Value.ToString();
// The pattern is the search text (escaped so it is treated literally), not the column name.
cell.Value = Regex.Replace(oldValue, Regex.Escape(f.txtfind.Text), f.txtreplace.Text, RegexOptions.IgnoreCase);
if ((string)cell.Value != oldValue)
    bulidDataRow(i);

If you want a case-insensitive search and replace, I think the easiest way is to use regular expressions.
for (int i = 0; i <= dataGridView1.Rows.Count - 1; i++)
{
    if (dataGridView1.Rows[i].Cells[f.cmbColumnCombo.Text].Value.ToString()
        .ToLower()
        .Contains(f.txtfind.Text.ToLower()))
    {
        string input = dataGridView1.Rows[i].Cells[f.cmbColumnCombo.Text].Value.ToString();
        string pattern = f.txtfind.Text.ToLower();
        string replacement = f.txtreplace.Text;
        string output = Regex.Replace(input, pattern, replacement, RegexOptions.IgnoreCase);
        dataGridView1.Rows[i].Cells[f.cmbColumnCombo.Text].Value = output;
        bulidDataRow(i);
    }
}
If you want everything in upper case, you can use:
dataGridView1.Rows[i].Cells[f.cmbColumnCombo.Text].Value = dataGridView1.Rows[i].Cells[f.cmbColumnCombo.Text].Value.ToString().ToUpper().Replace(f.txtfind.Text.ToUpper(), f.txtreplace.Text);
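The regex-based answers above can be condensed into a small standalone helper. This is only a sketch: TextReplace and ReplaceIgnoreCase are invented names, and the escaping steps assume that both the search text and the replacement text typed by the user should be taken literally rather than as regex syntax.

```csharp
using System.Text.RegularExpressions;

public static class TextReplace
{
    // Replaces every case-insensitive occurrence of 'find' and leaves
    // the case of the rest of the input untouched.
    public static string ReplaceIgnoreCase(string input, string find, string replacement)
    {
        if (string.IsNullOrEmpty(input) || string.IsNullOrEmpty(find))
            return input;

        // Escape regex metacharacters in the search text.
        string pattern = Regex.Escape(find);
        // "$" is special in replacement strings, so double it to keep it literal.
        string literal = (replacement ?? string.Empty).Replace("$", "$$");

        return Regex.Replace(input, pattern, literal, RegexOptions.IgnoreCase);
    }
}
```

With the data from the question, ReplaceIgnoreCase("CODE0001", "0001", "1000") returns "CODE1000", so the cell keeps its upper-case prefix.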

Regex patterns C#

I want to validate a string so that, if a "-" is present, it has a letter before and after it.
But I am unable to form the regex pattern.
Can anyone please help me with this?

Rather than using a regex to check this I think I would write an extension method using Char.IsLetter(). You can handle multiple dashes then, and use languages other than English.
public static bool IsValidDashedString(this String text)
{
    bool temp = true;
    //retrieve the location of all the dashes
    var indexes = Enumerable.Range(0, text.Length)
        .Where(i => text[i] == '-')
        .ToList();
    //fail if there are no dashes, or if a dash is the first or the last character
    if (indexes.Count() == 0 ||
        indexes.Any(i => i == 0) ||
        indexes.Any(i => i == text.Length-1))
    {
        temp = false;
    }
    else //check if each dash is preceded and followed by a letter
    {
        foreach (int i in indexes)
        {
            if (!Char.IsLetter(text[i - 1]) || !Char.IsLetter(text[i + 1]))
            {
                temp = false;
                break;
            }
        }
    }
    return temp;
}

The following will match a string with one alphabetic character before the "-" and one after:
[A-Za-z]-[A-Za-z]
You may need to first test whether a "-" is present, if that is not always the case. It would help to have more information about the possible string contents and exactly why you need to perform the test.

(^.+)(\D+)(-)(\D+)(.+)
I have tested this for some examples here http://regexr.com/39vfq
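One possible way to express the rule as a single regex is with lookarounds. The sketch below reads the requirement as "every dash must sit between two letters, and a string without any dash is also valid"; that interpretation, and the names DashValidator and IsValid, are assumptions made for this example.

```csharp
using System.Text.RegularExpressions;

public static class DashValidator
{
    // Each character is either not a dash, or a dash that has a letter
    // directly before it (lookbehind) and directly after it (lookahead).
    private static readonly Regex Pattern =
        new Regex(@"^(?:[^-]|(?<=\p{L})-(?=\p{L}))*$");

    public static bool IsValid(string text)
    {
        return text != null && Pattern.IsMatch(text);
    }
}
```

\p{L} matches any Unicode letter, in line with the Char.IsLetter approach above; replace it with [A-Za-z] if only the Latin alphabet should count.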

How do I verify that a string is in English?

I read a string from the console. How do I make sure it only contains English characters and digits?

Assuming that by "English characters" you are simply referring to the 26-character Latin alphabet, this would be an area where I would use regular expressions: ^[a-zA-Z0-9 ]*$
For example:
if( Regex.IsMatch(Console.ReadLine(), "^[a-zA-Z0-9]*$") )
{ /* your code */ }
The benefit of regular expressions in this case is that all you really care about is whether or not a string matches a pattern, and that is where regular expressions work wonderfully. The pattern clearly captures your intent, and it's easy to extend if your definition of "English characters" expands beyond just the 26 alphabetic ones.
There's a decent series of articles here that teach more about regular expressions.
Jørn Schou-Rode's answer provides a great explanation of how the regular expression presented here works to match your input.

You could match it against this regular expression: ^[a-zA-Z0-9]*$
^ matches the start of the string (ie no characters are allowed before this point)
[a-zA-Z0-9] matches any letter from a-z in lower or upper case, as well as digits 0-9
* lets the previous match repeat zero or more times
$ matches the end of the string (ie no characters are allowed after this point)
To use the expression in a C# program, you will need to import System.Text.RegularExpressions and do something like this in your code:
bool match = Regex.IsMatch(input, "^[a-zA-Z0-9]*$");
If you are going to test a lot of lines against the pattern, you might want to compile the expression:
Regex pattern = new Regex("^[a-zA-Z0-9]*$", RegexOptions.Compiled);
for (int i = 0; i < 1000; i++)
{
    string input = Console.ReadLine();
    pattern.IsMatch(input);
}

The accepted answer does not work for white space or punctuation. The code below was tested with this input:
Hello: 1. - a; b/c \ _(5)??
(Is English)
Regex regex = new Regex("^[a-zA-Z0-9. -_?]*$");
string text1 = "سلام";
bool fls = regex.IsMatch(text1); //false
string text2 = "123 abc! ?? -_)(/\\;:";
bool tru = regex.IsMatch(text2); //true

Another way is to check that each character is a lower-case letter, an upper-case letter, a digit or white space.
Something like:
private bool IsAllCharEnglish(string Input)
{
    foreach (var item in Input.ToCharArray())
    {
        if (!char.IsLower(item) && !char.IsUpper(item) && !char.IsDigit(item) && !char.IsWhiteSpace(item))
        {
            return false;
        }
    }
    return true;
}
Usage:
string str = "فارسی abc";
IsAllCharEnglish(str); // return false
str = "These are english 123";
IsAllCharEnglish(str); // return true

Do not use RegEx or LINQ; they are slower than a plain loop over the characters of the string.
Performance test
My solution:
private static bool is_only_eng_letters_and_digits(string str)
{
    foreach (char ch in str)
    {
        if (!(ch >= 'A' && ch <= 'Z') && !(ch >= 'a' && ch <= 'z') && !(ch >= '0' && ch <= '9'))
        {
            return false;
        }
    }
    return true;
}

Do you have web access? I assume that cannot be guaranteed, but Google has a language API that will detect the language of the text you pass to it.
google language api

bool onlyEnglishCharacters = !EnglishText.Any(a => a > '~');
Seems cheap, but it worked for me, legit easy answer.
Hope it helps anyone.

bool AllAscii(string str)
{
    return !str.Any(c => !Char.IsLetterOrDigit(c));
}

Something like this (if you want to control input):
static string ReadLettersAndDigits() {
    StringBuilder sb = new StringBuilder();
    ConsoleKeyInfo keyInfo;
    while ((keyInfo = Console.ReadKey(true)).Key != ConsoleKey.Enter) {
        char c = char.ToLower(keyInfo.KeyChar);
        if (('a' <= c && c <= 'z') || char.IsDigit(c)) {
            sb.Append(keyInfo.KeyChar);
            Console.Write(c);
        }
    }
    return sb.ToString();
}

If you don't want to use RegEx, here is an alternative: check the ASCII code of each character. If it lies within the ranges below, it is either an English letter or a digit (this might not be the best solution):
foreach (char ch in str)
{
    int x = (int)ch;
    if ((x >= 65 && x <= 90) || (x >= 97 && x <= 122))
    {
        //this is an English letter, i.e. A, B, C, a, b, c...
    }
    else if (x >= 48 && x <= 57)
    {
        //this is a digit
    }
    else
    {
        //this is something different
    }
}
http://en.wikipedia.org/wiki/ASCII for full ASCII table.
But I still think, RegEx is the best solution.

I agree with the Regular Expression answers. However, you could simplify it to just "^[\w]+$". \w is any "word character", which translates to [a-zA-Z_0-9] when RegexOptions.ECMAScript is specified; by default, \w in .NET also matches Unicode letters and digits. I don't know if you want underscores as well.
More on regexes in .net here: http://msdn.microsoft.com/en-us/library/ms972966.aspx#regexnet_topic8
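The difference between the two \w modes can be checked with a short sketch. WordCharDemo and its two methods are names made up for this illustration; the only library features used are Regex.IsMatch and RegexOptions.ECMAScript.

```csharp
using System.Text.RegularExpressions;

public static class WordCharDemo
{
    // Default .NET behaviour: \w is Unicode-aware.
    public static bool IsWordCharsUnicode(string s)
    {
        return Regex.IsMatch(s, @"^\w+$");
    }

    // ECMAScript mode: \w is restricted to [a-zA-Z_0-9].
    public static bool IsWordCharsAscii(string s)
    {
        return Regex.IsMatch(s, @"^\w+$", RegexOptions.ECMAScript);
    }
}
```

For a non-Latin string such as "سلام", the first method should return true and the second false, so only the ECMAScript variant is suitable for an "English letters and digits" check, and it still lets underscores through.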

As many pointed out, the accepted answer works only if there is a single word in the string. Since no answer covers the case of multiple words or even sentences in the string, here is code that returns true when the string contains a letter outside the checked range:
stringToCheck.Any(x=> char.IsLetter(x) && !((int)x >= 63 && (int)x <= 126));

<?php
$string="हिन्दी";
$string="Manvendra Rajpurohit";
echo strlen($string); echo '<br>';
echo mb_strlen($string, 'utf-8');
echo '<br>';
if(strlen($string) != mb_strlen($string, 'utf-8'))
{
    echo "Please enter English words only:(";
}
else {
    echo "OK, English Detected!";
}
?>
