How to select only the status using a regular expression in C#

Hi coders, I have a statement string from which I need to fetch only the status values:
TKT-2297433475972 RCI- 1A LOC-696L9U
OD-NYCNYC SI- FCMI-K POI-DCA DOI-02JUN14 IOI-49881134
1.SINGH/BALJINDER ADT ST
1 OJFK KU 102 C 03JUN2145 OK CLRTUS5 F 2PC // Want this F
2 XKWI KU 381 C 04JUN2230 OK CLRTUS5 F 2PC // Want this F
3 ODEL KU 382 C 20JUN0555 OK CLRTUS5 O 2PC // Want this O
4 XKWI KU 117 C 20JUN0905 OK CLRTUS5 I 08JUN 2PC // Want this I
JFK
FARE I IT // Don't want this I
TOTALTAX USD 462.04
TOTAL IT
/FC NYC KU X/KWI KU DEL M/BT CLRTUS5 KU X/KWI KU NYC M/BT CLRTUS
5 END XT5.50YC17.50US17.50US5.00XA7.00XY2.50AY1.60YX25.08IN11.51
YM20.70IN3.65WO4.50XF JFK4.5
TKT-0017434068283-284 RCI- 1A LOC-69DQPQ
OD-AUSAUS SI- FCMI-I POI-DCA DOI-03JUN14
1.MIMBELA/JAIME JOSE ADT ST
1 OAUS AA 290 M 02JUL1005 OK MHXE1NA/CONS O 02JUL02JUL 1PC // Want this O
2 XJFK AA 66 M 02JUL1730 OK MHXE1NA/CONS O 02JUL02JUL 1PC // Want this O
3 BCN ARNK
4 OFCO AA 111 L 29JUL1115 OK LHXETNI4/CONS O 29JUL29JUL 1PC // Want this O
5 XORD AA1398 L 29JUL2040 OK LHXETNI4/CONS O 29JUL29JUL 1PC // Want this O
AUS
FARE I IT // Don't want this I
TOTALTAX USD 639.21
TOTAL IT
From the string above, I need to fetch only the ST (status) values.
For the first ticket, the expected output is:
F
F
O
I
At present I am using this regex: (?:\s[A-Z]{1}\s\s)
But this regex also returns the I that follows FARE, which I don't want. How can I get only the ST values and not the I after FARE?
Any help will be appreciated.

We can only guess at the exact format, but how about this:
^\d .+?\W{2}([A-Z]) \W{2}
The status letter is captured in Groups[1] (Groups[0] is the whole match).
Here is a visual explanation: https://www.debuggex.com/r/kAjngGwRrsHQARYE
To be precise: this works with IgnoreCase, but more importantly, the Multiline option must be set so that ^ matches the beginning of each line.
Here is the C# code:
using System;
using System.Text.RegularExpressions;

var foo = "1 OJFK KU 102 C 03JUN2145 OK CLRTUS5 F 2PC";
var r = new Regex(@"^\d .+?\W{2}([A-Z]) \W{2}", RegexOptions.IgnoreCase | RegexOptions.Multiline);
var m = r.Matches(foo);
Console.WriteLine(m[0].Groups[1].Value);
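Anchoring the pattern with ^\d is what excludes lines such as "FARE I IT", and the Multiline switch is what lets ^ match at the start of every line rather than only at the start of the whole input. A small sketch illustrating this (`MultilineDemo` and `CountDigitLines` are hypothetical names):

```csharp
using System.Text.RegularExpressions;

public static class MultilineDemo
{
    // Counts the lines that start with a digit.
    // Without RegexOptions.Multiline, ^ only matches at the very start of the input.
    public static int CountDigitLines(string text, bool multiline)
    {
        RegexOptions options = multiline ? RegexOptions.Multiline : RegexOptions.None;
        return Regex.Matches(text, @"^\d", options).Count;
    }
}
```

For the input "1 OJFK\n2 XKWI\nFARE I IT", it returns 1 without Multiline and 2 with it; the FARE line never counts because it does not start with a digit.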

Hi Samuel and Anurag, thanks for your help. Using Anurag's regex and Samuel's idea, I made some changes and got the output I wanted. This is my code:
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

string foo = @"
TKT-2207434010779-780 RCI- 1A LOC-68XMG5
OD-HOUHOU SI- FCMI-F POI-NYC DOI-02JUN14 IOI-10729143
1.WAHBA/REDA ADT ST
1 OIAH LH7620UA L 07JUN2050 OK LHHNC5N/CN05 F 07JUN07JUN 1PC
2 XLHR LH 911 L 08JUN1645 OK LHHNC5N/CN05 F 08JUN08JUN 1PC
3 XFRA LH 584 L 08JUN2130 OK LHHNC5N/CN05 F 08JUN08JUN 1PC
4 OCAI LH 585 L 06JUL0340 OK LHHNC5N/CN05 O 06JUL06JUL 1PC
5 XFRA LH 440 L 06JUL1000 OK LHHNC5N/CN05 O 06JUL06JUL 1PC
IAH
FARE F USD 522.00
TOTALTAX USD 656.80
TOTAL USD 1178.8
";
StringReader reader = new StringReader(foo);
string line;
List<string> list = new List<string>();
// Build the regex once instead of on every iteration.
Regex r = new Regex(@"^\s\d.+?\W{1}([A-Z]) \W{2}", RegexOptions.Multiline);
while (null != (line = reader.ReadLine()))
{
    // Only segment lines match; group 1 holds the status letter.
    Match m = r.Match(line);
    if (m.Success)
    {
        list.Add(m.Groups[1].Value);
    }
}

^\d+\s+\w+\s+\w+\s+\w+\s+\w \w+ \w+ \w+\s+(?<YourVar>\w+)
This should work, but it is hard to create a regular expression from just one sample. With the regex above, every match contains a named group "YourVar" holding the desired value.
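A sketch of that named-group regex applied to a whole statement with RegexOptions.Multiline, so that ^ matches at each line start; `StatusExtractor` and `Extract` are hypothetical names. As the answer warns, the pattern is tuned to one sample and assumes single spaces between some columns, as in the text shown above: it returns F, F, O, I for the first ticket, but it does not match second-ticket lines such as "1 OAUS AA 290 M 02JUL1005 OK MHXE1NA/CONS O 02JUL02JUL 1PC", because that fare basis contains a slash.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class StatusExtractor
{
    // The answer's pattern; Multiline makes ^ match at the start of every line,
    // so lines such as "FARE I IT" (no leading digit) are never candidates.
    private static readonly Regex Segment = new Regex(
        @"^\d+\s+\w+\s+\w+\s+\w+\s+\w \w+ \w+ \w+\s+(?<YourVar>\w+)",
        RegexOptions.Multiline);

    // Returns the value of the named group "YourVar" for every matching line.
    public static List<string> Extract(string statement)
    {
        var result = new List<string>();
        foreach (Match m in Segment.Matches(statement))
        {
            result.Add(m.Groups["YourVar"].Value);
        }
        return result;
    }
}
```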

I checked the following in my code and it works:
Regex.Match(s, @"\W{2}([A-Z])\W{2}").Value.Trim()

Related

Count occurrences of all substrings in a string in C#

Question: I have a long string and need to count the occurrences of every substring in it, then print each substring whose count is greater than 1, together with its count, in decreasing order of count.
Example:
String = "abcdabcd"
Result:
Substrings Count
abcd 2
abc 2
bcd 2
ab 2
bc 2
cd 2
a 2
b 2
c 2
d 2
Problem: My string can be 5000 characters long, and I cannot find an efficient way to do this (efficiency is very important for the application).
Is there a suitable algorithm, or is this possible with multithreading? Please help.
Example using the approach from "Find a common string within a list of strings":
void Main()
{
    // LINQPad script: Dump() is a LINQPad extension method that prints the results.
    "abcdabcd".getAllSubstrings()
        .AsParallel()
        .GroupBy(x => x)
        .Select(g => new { g.Key, count = g.Count() })
        .Dump();
}
// Define other methods and classes here
public static class Ext
{
public static IEnumerable<string> getAllSubstrings(this string word)
{
return from charIndex1 in Enumerable.Range(0, word.Length)
from charIndex2 in Enumerable.Range(0, word.Length - charIndex1 + 1)
where charIndex2 > 0
select word.Substring(charIndex1, charIndex2);
}
}
Produces:
a 2
dabc 1
abcdabc 1
b 2
abc 2
dabcd 1
bc 2
bcda 1
abcd 2
ab 2
bcdab 1
cdabc 1
abcda 1
d 2
bcdabc 1
dab 1
bcd 2
abcdab 1
c 2
bcdabcd 1
abcdabcd 1
cd 2
da 1
cdab 1
cda 1
cdabcd 1
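The query above lists every substring, including those that occur only once, and does not sort them. A minimal sketch that also applies the question's two requirements (count greater than 1, decreasing count) could look like this; `SubstringCounter` is a hypothetical name, and breaking ties by longer substring first is an arbitrary choice. It is still brute force: a string of length n has n(n+1)/2 substrings, which for 5000 characters is about 12.5 million, so memory use is the main concern.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SubstringCounter
{
    // Counts every substring, keeps those seen more than once,
    // and sorts by count (descending), then by length (descending).
    public static List<KeyValuePair<string, int>> RepeatedSubstrings(string word)
    {
        var counts = new Dictionary<string, int>();
        for (int start = 0; start < word.Length; start++)
        {
            for (int length = 1; start + length <= word.Length; length++)
            {
                string sub = word.Substring(start, length);
                counts.TryGetValue(sub, out int c); // c is 0 when sub is new
                counts[sub] = c + 1;
            }
        }
        return counts
            .Where(kv => kv.Value > 1)
            .OrderByDescending(kv => kv.Value)
            .ThenByDescending(kv => kv.Key.Length)
            .ToList();
    }
}
```

For "abcdabcd" this returns the ten substrings from the question's expected result, each with count 2, starting with "abcd".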

What is the best way to split a string in C#?

I have a string that looks like this: "a,b,c,d,e,1,4,3,5,8,7,5,1,2,6..." and so on.
I am looking for the best way to split it and make it look like this:
a b c d e
1 4 3 5 8
7 5 1 2 6
Assuming that you have a fixed number of columns (5):
string Input = "a,b,c,d,e,11,45,34,33,79,65,75,12,2,6";
int i = 0;
string[][] Result = Input.Split(',').GroupBy(s => i++/5).Select(g => g.ToArray()).ToArray();
First I split the string on the ',' character, then I group the result into chunks of 5 items and turn each chunk into an array.
Result:
a b c d e
11 45 34 33 79
65 75 12 2 6
To write that result to a file, one tab-separated row per line:
using (System.IO.StreamWriter writer = new System.IO.StreamWriter(path, false))
{
    foreach (string[] line in Result)
    {
        // Join the items of each row with tabs
        writer.WriteLine(string.Join("\t", line));
    }
}
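On .NET 6 or later, `Enumerable.Chunk` performs the same grouping into fixed-size arrays without relying on the `i++` side effect inside `GroupBy`. A sketch, assuming .NET 6+; `ColumnSplitter` is a hypothetical name:

```csharp
using System.Linq;

public static class ColumnSplitter
{
    // Splits a comma-separated string into rows of `columns` items each.
    // Enumerable.Chunk requires .NET 6 or later; the last row may be shorter.
    public static string[][] ToRows(string input, int columns)
    {
        return input.Split(',').Chunk(columns).ToArray();
    }
}
```

With the answer's input and 5 columns, this yields the same three rows as the GroupBy version.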

Prepend a string and append a suffix string to a record using CsvHelper

I need to export entities to a CSV file using CsvHelper. I got a trial version working, but it requires writing every field manually. What I want is to write each record prefixed with either an 'H' or a 'D', and to end every line with a single space. My demo models:
PersonId FirstName LastName DateOfBirth
1 Randy Smith 1968-08-31
2 Zachary Smith 2002-01-10
3 Angie Smith 1969-11-20
4 Khelzie Smith 1996-07-27
AutoId Year Make Model OwnerId
1 2000 Toyota 4Runner 1
2 1995 Ford Mustang 1
3 2014 Chevrolet Corvette Stingray Coupe 2
4 2014 Volkswagen Beetle Coupe 4
5 1980 Ford F-150 2
6 1968 Chevrolet Camaro 3
7 2000 Tonka Truck 3
8 1993 Honda Accord 4
Into a CSV file like this:
H 1 Randy Smith 8/31/1968
D 1 2000 Toyota 4Runner
D 2 1995 Ford Mustang
H 2 Zachary Smith 1/10/2002
D 3 2014 Chevy Corevett
D 5 1980 Ford F-150
H 3 Angie Smith 11/20/1969
D 6 1968 Chevrolet Camaro
D 7 2000 Tonka Truck
H 4 Khelzie Smith 7/27/1996
D 4 2014 Volkswagen Beetle Coupe
This is the code I finally got working:
StreamWriter textWriter = File.CreateText(fileName);
var csv = new CsvWriter(textWriter);
csv.Configuration.Delimiter = delimiter;
csv.Configuration.QuoteNoFields = true;
// This will skip those people who don't own a vehicle
foreach (Person person in people.Where(p => p.Vehicles.Count > 0))
{
    // The letter 'H' must prefix every header line
    csv.WriteField("H " + person.PersonId);
    csv.WriteField(person.FirstName);
    csv.WriteField(person.LastName);
    // Header lines must end with a single space.
    csv.WriteField(person.DateOfBirth.ToShortDateString() + " ");
    csv.NextRecord();
    foreach (Automobile auto in person.Vehicles)
    {
        // The letter 'D' must prefix every detail line
        csv.WriteField("D " + auto.AutoId);
        csv.WriteField(auto.Year);
        csv.WriteField(auto.Make);
        // Detail lines must end with a single space.
        csv.WriteField(auto.Model + " ");
        csv.NextRecord();
    }
}
The real tables have ~70 fields apiece.
Just for those that have as thick a skull as mine, here is a solution:
foreach (TransactionHeader header in headers)
{
csv.WriteField("H");
csv.WriteRecord(header);
csv.WriteField(" ");
csv.NextRecord();
foreach (TransactionDetail detail in header.TransactionDetail)
{
csv.WriteField("D");
csv.WriteRecord(detail);
csv.WriteField(" ");
csv.NextRecord();
}
}
Thanks to everyone who saw this as pretty obvious and patiently waited for me to bash my head down on my desk enough times and then figure this out myself.

Remove single letters from a string

I need help removing single letters, but not whole words, from an incoming data string, like the following:
String A = "1 2 3A 4 5C 6 ABCD EFGH 7 8D 9";
to
String A = "1 2 3 4 5 6 ABCD EFGH 7 8 9";
You need to match a letter and ensure that there is no letter before or after it. So match
(?<!\p{L})\p{L}(?!\p{L})
and replace with an empty string.
Lookaround assertions on regular-expressions.info
Unicode properties on regular-expressions.info
In C#:
string s = "1 2 3A 4 5C 6 ABCD EFGH 7 8D 9";
string result = Regex.Replace(s, @"(?<!\p{L}) # Negative lookbehind assertion to ensure not a letter before
\p{L} # Unicode property, matches a letter in any language
(?!\p{L}) # Negative lookahead assertion to ensure not a letter following
", String.Empty, RegexOptions.IgnorePatternWhitespace);
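A compact sketch of the same replacement with the pattern on one line; `SingleLetterRemover` is a hypothetical name. Because \p{L} is the Unicode "Letter" category, a lone non-ASCII letter such as é is removed too, while a run of two or more letters survives.

```csharp
using System.Text.RegularExpressions;

public static class SingleLetterRemover
{
    // Matches a letter that has no letter immediately before or after it.
    private static readonly Regex LoneLetter = new Regex(@"(?<!\p{L})\p{L}(?!\p{L})");

    public static string Remove(string input) => LoneLetter.Replace(input, string.Empty);
}
```

For example, Remove("1 2 3A 4 5C 6 ABCD EFGH 7 8D 9") gives "1 2 3 4 5 6 ABCD EFGH 7 8 9".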
The "obligatory" LINQ approach:
string[] words = A.Split();
string result = string.Join(" ",
words.Select(w => w.Any(c => Char.IsDigit(c)) ?
new string(w.Where(c => Char.IsDigit(c)).ToArray()) : w));
This approach checks whether each word contains a digit. If it does, it filters out the non-digit characters and creates a new string from the result; otherwise it keeps the word as is.
And here is the old-school approach (in VB.NET):
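Dim A As String = "1 2 3A 4 5C 6 ABCD EFGH 7 8D 9"
Dim B As String = "1 2 3 4 5 6 ABCD EFGH 7 8 9"
Dim sb As New StringBuilder
Dim letterCount As Integer = 0
For i = 0 To A.Length - 1
    Dim ch As Char = Char.ToLower(A(i))
    If ch >= "a"c AndAlso ch <= "z"c Then
        ' Count consecutive letters instead of copying them right away
        letterCount += 1
    Else
        ' Keep a run of letters only if it is a word (more than one letter)
        If letterCount > 1 Then sb.Append(A.Substring(i - letterCount, letterCount))
        letterCount = 0
        sb.Append(A(i))
    End If
Next
' Flush a word that ends the string
If letterCount > 1 Then sb.Append(A.Substring(A.Length - letterCount, letterCount))
Debug.WriteLine(B = sb.ToString) 'prints True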

How to parse a fixed-width string with complex rules into component fields with a regex

I need to parse fixed-width records using C# and regular expressions.
Each record contains a number of fixed-width fields, with each field potentially having non-trivial validation rules. The problem I'm having is with a match being applied across the fixed-width field boundaries.
Without the rules, it is easy to break a fixed-width string of length 13 into 4 parts like this:
(?=^.{13}$).{1}.{5}.{6}.{1}
Here is a sample field rule:
Field can be all spaces OR start with [A-Z] and be right-padded with spaces. Spaces cannot occur between letters.
If the field was the only thing I have to validate I could use this:
(?=^[A-Z ]{5}$)([ ]{5}|[A-Z]+[ ]*)
When I add this validation as part of a longer list I have to remove the ^ and $ from the lookahead and I start to get matches that are not of length 5.
Here is the full regex along with some sample text that should match and not match the expression.
(?=^[A-Z ]{13}$)A(?=[A-Z ]{5})([ ]{5}|(?>[A-Z]{1,5})[ ]{0,4})(?=[A-Z ]{6})([ ]{6}|(?>[A-Z]{1,6})[ ]{0,5})Z
How do I implement the rules so that, for each field, the immediate next XX characters are used for the match and ensure that matches do not overlap?
Lines that should match:
ABCDEFGHIJKLZ
A Z
AB Z
A G Z
AB G Z
ABCDEF Z
ABCDEFG Z
A GHIJKLZ
AB GHIJKLZ
Lines that should not match:
AB D Z
AB D F Z
AB F Z
A G I Z
A G I LZ
A G LZ
AB FG LZ
AB D FG Z
AB FG I Z
AB D FG i Z
The following 3 should not match but do.
AB FG Z
AB FGH Z
AB EFGH Z
EDIT:
General solution (based on Ωmega's answer) with named captures for clarity:
(?<F1>F1Regex)(?<=^.{Len(F1)})
(?<F2>F2Regex)(?<=^.{Len(F1+F2)})
(?<F3>F3Regex)(?<=^.{Len(F1+F2+F3)})
...
(?<Fn>FnRegex)
Another example (the spaces between each field's regex and its zero-width positive lookbehind (?<= are only for clarity):
(?<F1>\d{2}) (?<=^.{2})
(?<F2>[A-Z]{5}) (?<=^.{7})
(?<F3>\d{4}) (?<=^.{11})
(?<F4>[A-Z]{6}) (?<=^.{17})
(?<F5>\d{4})
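Applied in C#, the example above could be used as below. This is a sketch; `NamedFieldParser` is a hypothetical name and the sample input is made up to fit the 2 + 5 + 4 + 6 + 4 layout. Because each field here has a fixed length, the lookbehinds mainly pin the first field to position 0 and confirm the others; they do real work when a field's regex can match a variable number of characters.

```csharp
using System.Text.RegularExpressions;

public static class NamedFieldParser
{
    // Whitespace in the pattern is ignored thanks to IgnorePatternWhitespace.
    // Each (?<=^.{n}) checks that the fields so far end exactly at position n.
    private static readonly Regex Record = new Regex(@"
        (?<F1>\d{2})     (?<=^.{2})
        (?<F2>[A-Z]{5})  (?<=^.{7})
        (?<F3>\d{4})     (?<=^.{11})
        (?<F4>[A-Z]{6})  (?<=^.{17})
        (?<F5>\d{4})",
        RegexOptions.IgnorePatternWhitespace);

    public static Match Parse(string input) => Record.Match(input);
}
```

For "12ABCDE3456ABCDEF7890", the groups F1 to F5 hold "12", "ABCDE", "3456", "ABCDEF" and "7890".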
If the input string is fixed in size, then you can match a specific position using look-aheads and look-behinds, like this:
(?<=^.{s})(?<fieldName>.*)(?=.{e}$)
where:
s = start position
e = string length - field length - s
If you concatenate multiple regexes, like this one, then you will get all the fields with specific positioning.
Example
Fixed length: 10
Field 1: start 0, length 3
Field 2: start 3, length 5
Field 3: start 8, length 2
Use this regex with RegexOptions.IgnorePatternWhitespace, so that the whitespace in the pattern is ignored:
var match = Regex.Match("0123456789", @"
(?<=^.{0})(?<name1>.*)(?=.{7}$)
(?<=^.{3})(?<name2>.*)(?=.{2}$)
(?<=^.{8})(?<name3>.*)(?=.{0}$)",
RegexOptions.IgnorePatternWhitespace);
var field1 = match.Groups["name1"].Value; // "012"
var field2 = match.Groups["name2"].Value; // "34567"
var field3 = match.Groups["name3"].Value; // "89"
You can place whatever rule you want to match the fields.
I used .* for all of them, but you can place anything there.
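The s/e arithmetic above can be generated instead of written by hand. A minimal sketch, assuming fields are given as (name, start, length) tuples; `FixedWidthPattern.Build` is a hypothetical helper, and every field uses .* as in the first example:

```csharp
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class FixedWidthPattern
{
    // Builds one (?<=^.{s})(?<name>.*)(?=.{e}$) block per field,
    // where s = field start and e = total length - field length - s.
    public static Regex Build(int totalLength, IEnumerable<(string Name, int Start, int Length)> fields)
    {
        var sb = new StringBuilder();
        foreach (var f in fields)
        {
            int e = totalLength - f.Length - f.Start;
            sb.Append("(?<=^.{" + f.Start + "})");
            sb.Append("(?<" + f.Name + ">.*)");
            sb.Append("(?=.{" + e + "}$)");
        }
        return new Regex(sb.ToString());
    }
}
```

For the first example, Build(10, ...) with fields (name1, 0, 3), (name2, 3, 5) and (name3, 8, 2) produces the hand-written pattern without its whitespace and yields "012", "34567" and "89" for "0123456789".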
Example 2
var match = Regex.Match(" 1a any-8888", @"
(?<=^.{0})(?<name1>\s*\d*[a-zA-Z])(?=.{9}$)
(?<=^.{3})(?<name2>.*)(?=.{4}$)
(?<=^.{8})(?<name3>(?<D>\d)\k<D>*)(?=.{0}$)
",
RegexOptions.IgnorePatternWhitespace);
var field1 = match.Groups["name1"].Value; // " 1a"
var field2 = match.Groups["name2"].Value; // " any-"
var field3 = match.Groups["name3"].Value; // "8888"
Here is your regex, rewritten this way.
I tested all of your cases; this sample uses one that you said should not match but did. This time it does not match:
var match = Regex.Match("AB FG Z", @"
^A
(?<=^.{1}) (?<name1>([ ]{5}|(?>[A-Z]{1,5})[ ]{0,4})) (?=.{7}$)
(?<=^.{6}) (?<name2>([ ]{6}|(?>[A-Z]{1,6})[ ]{0,5})) (?=.{1}$)
Z$
",
RegexOptions.IgnorePatternWhitespace);
// no match with this input string
// Insert commas at the field boundaries, then validate each field separately.
Match match = Regex.Match(
Regex.Replace(text, @"^(.)(.{5})(.{6})(.)$", "$1,$2,$3,$4"),
@"^[A-Z ],[A-Z]*[ ]*,[A-Z]*[ ]*,[A-Z ]$");
I think it is possible to validate it with a single regex pattern:
^[A-Z ][A-Z]*[ ]*(?<=^.{6})[A-Z]*[ ]*(?<=^.{12})[A-Z ]$
If you also need to capture each field, use:
^([A-Z ])([A-Z]*[ ]*)(?<=^.{6})([A-Z]*[ ]*)(?<=^.{12})([A-Z ])$
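A sketch of how the capturing variant could be called from C#; `FixedWidthRecord.Parse` is a hypothetical wrapper that returns the four fields, or null when the record breaks the rules. Each (?<=^.{n}) lookbehind pins the end of a field to an absolute position, so a field's letters-then-spaces run cannot spill into the next field.

```csharp
using System.Text.RegularExpressions;

public static class FixedWidthRecord
{
    // Field widths 1 + 5 + 6 + 1 = 13 characters.
    private static readonly Regex Pattern = new Regex(
        @"^([A-Z ])([A-Z]*[ ]*)(?<=^.{6})([A-Z]*[ ]*)(?<=^.{12})([A-Z ])$");

    // Returns the four fields, or null when the record violates the rules.
    public static string[] Parse(string record)
    {
        Match m = Pattern.Match(record);
        if (!m.Success)
        {
            return null;
        }
        return new[] { m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value, m.Groups[4].Value };
    }
}
```

For example, Parse("A" + "B".PadRight(5) + "G".PadRight(6) + "Z") returns "A", "B    ", "G     " and "Z", while a record whose second field is "B   F" returns null, because that field would have to end with a letter after its padding.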
I have already posted a generalized answer; this one is specific to your question.
It handles all the cases presented in your question the way you wanted.
Program to test all the cases in your question:
using System;
using System.Text.RegularExpressions;

class Program
{
static void Main()
{
var strMatch = new string[]
{
// Lines that should match:
"ABCDEFGHIJKLZ",
"A Z",
"AB Z",
"A G Z",
"AB G Z",
"ABCDEF Z",
"ABCDEFG Z",
"A GHIJKLZ",
"AB GHIJKLZ",
};
var strNotMatch = new string[]
{
// Lines that should not match:
"AB D Z",
"AB D F Z",
"AB F Z",
"A G I Z",
"A G I LZ",
"A G LZ",
"AB FG LZ",
"AB D FG Z",
"AB FG I Z",
"AB D FG i Z",
// The following 3 should not match but do.
"AB FG Z",
"AB FGH Z",
"AB EFGH Z",
};
var pattern = @"
^A
(?<=^.{1}) (?<name1>([ ]{5}|(?>[A-Z]{1,5})[ ]{0,4})) (?=.{7}$)
(?<=^.{6}) (?<name2>([ ]{6}|(?>[A-Z]{1,6})[ ]{0,5})) (?=.{1}$)
Z$
";
foreach (var eachStrThatMustMatch in strMatch)
{
var match = Regex.Match(eachStrThatMustMatch,
pattern, RegexOptions.IgnorePatternWhitespace);
if (!match.Success)
throw new Exception("Should match.");
}
foreach (var eachStrThatMustNotMatch in strNotMatch)
{
var match = Regex.Match(eachStrThatMustNotMatch,
pattern, RegexOptions.IgnorePatternWhitespace);
if (match.Success)
throw new Exception("Should not match.");
}
}
}
