Splitting a string using regex in C#

I have a program that compares text files: it takes in two files and produces one. The input files contain lines of data similar to this:
tv_rocscores_DeDeP005M3TSub.csv FMR: 0.0009 FNMR: 0.023809524 SCORE: -4 Conformity: True
tv_..............P006............................................................
tv_..............P007............................................................
etc etc.
For my initial purposes, I was splitting the lines on spaces to get the respective values. However, for the first field, tv_rocscores_DeDeP005M3TSub.csv, I only need P005 and not the rest. I can't use a fixed position either, because the position of P005 in the file name is not the same for every file.
Any advice on how to split this so that the first field contains only P005?

Your question is a bit unclear. If you're looking for a pattern, say "P followed by three digits" (e.g. "P005"), you can use a regular expression:
string str = "tv_rocscores_DeDeP005M3TSub.csv FMR: 0.0009 FNMR: 0.023809524 SCORE: -4 Conformity: True";
string[] parts = str.Split(' ');
parts[0] = Regex.Match(parts[0], @"P\d\d\d").Value; // <- "P005"; needs using System.Text.RegularExpressions;

To extract the desired part I would try something like this:
var parts = str.Split(' ');
var number = Regex.Match(parts[0], @".*?(?<num>P\d+).*?").Groups["num"].Value; // verbatim string, so \d reaches the regex engine
Or, if you know it's always exactly three digits, you could change the regular expression to .*?(?<num>P\d{3}).*?
Hope that solves your problem :)

How about just checking if the first field contains P005?
bool hasP005 = field1.Contains("P005");

Your question isn't clear. Can't you just replace the first field with your string?
string[] parts = str.Split(' ');
parts[0] = "P005";

Are you trying to find the field that contains that string? If so, you can use some LINQ:
var field = s.Split(' ').First(x => x.Contains("P005")); // throws if no field contains P005
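Putting the pieces together, here is a minimal sketch that reads a whole line: the P-code from the first field plus the remaining "Key: value" pairs. It assumes, as in the sample line, that fields are separated by single spaces and that every key ends with a colon; ParseLine is a hypothetical helper name, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the original answers): returns the "P" + three
// digits code from the first field and the remaining "Key: value" pairs.
// Assumes single spaces between fields and that every key ends with ':'.
(string Code, Dictionary<string, string> Values) ParseLine(string line)
{
    string[] fields = line.Split(' ');

    // "tv_rocscores_DeDeP005M3TSub.csv" -> "P005"; "" if there is no match.
    string pCode = Regex.Match(fields[0], @"P\d{3}").Value;

    // The rest alternates key/value: "FMR:" "0.0009" "FNMR:" "0.023809524" ...
    var pairs = new Dictionary<string, string>();
    for (int i = 1; i + 1 < fields.Length; i += 2)
        pairs[fields[i].TrimEnd(':')] = fields[i + 1];

    return (pCode, pairs);
}

var (code, values) = ParseLine(
    "tv_rocscores_DeDeP005M3TSub.csv FMR: 0.0009 FNMR: 0.023809524 SCORE: -4 Conformity: True");

// InvariantCulture so "0.0009" parses the same regardless of the machine's locale.
double fmr = double.Parse(values["FMR"], CultureInfo.InvariantCulture);
bool conformity = bool.Parse(values["Conformity"]);

Console.WriteLine($"{code}: FMR={fmr}, Conformity={conformity}");
```

Parsing the numbers with CultureInfo.InvariantCulture avoids surprises on machines whose decimal separator is a comma.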

Related

How to take the first two characters, and the first two characters after the period, of an email address (test.mail@test.com -> tema)

I have an email ID like this:
string email = "test.mail@test.com";
string[] parts = email.Split('.');
I am not sure how to take the first two characters, followed by the first two characters after the period (dot).
// desired output: myText == "tema"
Use LINQ ;)
string myText = string.Join("", email.Remove(email.IndexOf('@')).Split('.')
    .Select(r => new String(r.Take(2).ToArray())));
First, remove everything from the @ onward (including the @).
Then split on '.'.
From the resulting array, take the first two characters of each element and convert them to a char array.
Pass each char array to the String constructor to create a string.
Use String.Join to combine the resulting strings.
Another Linq solution:
string first = new string(email.Take(2).ToArray());
string second = new string(email.SkipWhile(c => c != '.').Skip(1).Take(2).ToArray());
string res = first + second;
string myText = string.Join(string.Empty, email.Substring(0, email.IndexOf('@')).Split('.').Select(x => x.Substring(0, 2)));
Lots of creative answers here, but the most important point is that Split() is the wrong tool for this job. It's much easier to use Replace():
myText = Regex.Replace(email, @"^(\w{2})[^.]*\.(\w{2})[^.]*@.+$", "$1$2");
Note that I'm making a lot of simplifying assumptions here. Most importantly, I'm assuming that the original string contains the email address and nothing else (you're not searching for it), that the string is well formed (you're not trying to validate it), and that both of the substrings you're interested in start with at least two word characters.
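To make those assumptions concrete, here is a sketch comparing a LINQ version with the Regex.Replace version on a well-formed address and on one whose first segment has only one character. Abbreviate and AbbreviateRegex are hypothetical names, and treating a string without '@' as all local part is my own assumption.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: first two chars of each dot-separated part of the local
// part (before '@'). Take(2) does not throw on parts shorter than two chars,
// unlike Substring(0, 2).
string Abbreviate(string address)
{
    int at = address.IndexOf('@');
    // Assumption: no '@' means the whole string is the local part.
    string localPart = at >= 0 ? address.Substring(0, at) : address;
    return string.Concat(localPart.Split('.').Select(p => new string(p.Take(2).ToArray())));
}

// The Regex.Replace answer, wrapped for comparison.
string AbbreviateRegex(string address) =>
    Regex.Replace(address, @"^(\w{2})[^.]*\.(\w{2})[^.]*@.+$", "$1$2");

Console.WriteLine(Abbreviate("test.mail@test.com"));      // tema
Console.WriteLine(AbbreviateRegex("test.mail@test.com")); // tema
Console.WriteLine(Abbreviate("j.doe@test.com"));          // jdo
Console.WriteLine(AbbreviateRegex("j.doe@test.com"));     // j.doe@test.com (no match, input returned unchanged)
```

When the pattern does not match, Regex.Replace returns the input unchanged rather than throwing, so a malformed address passes through silently; which behaviour you want depends on how the result is used.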

Omit unnecessary parts in string array

In C#, I have a string that comes from a file in this format:
Type="Data"><Path.Style><Style
or maybe
Type="Program"><Rectangle.Style><Style
and so on. Now I want to extract only the Data or Program value of Type. To do that, I used the following code:
string output;
var pair = inputKeyValue.Split('=');
if (pair[0] == "Type")
{
output = pair[1].Trim('"');
}
But it gives me this result:
output=Data><Path.Style><Style
What I want is:
output=Data
How to do that?
This code example takes an input string, splits by double quotes, and takes only the first 2 items, then joins them together to create your final string.
string input = "Type=\"Data\"><Path.Style><Style";
var parts = input
.Split('"')
.Take(2);
string output = string.Join("", parts); //note: .net 4 or higher
This will make output have the value:
Type=Data
If you only want output to be "Data", then do
var parts = input
.Split('"')
.Skip(1)
.Take(1);
or
var output = input
.Split('"')[1];
What you can do is use a very simple regular expression to parse out the bits you want. In your case, you want something like the following, and then you grab the two groups that interest you:
(Type)="(\w+)"
This returns the value Type in group 1 and, in group 2, the word characters (letters, digits and underscore) between the double quotes.
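Reading those two groups in C# looks roughly like this; a minimal sketch in which ReadType is a hypothetical helper and returning "" when there is no Type="..." is my own choice.

```csharp
using System;
using System.Text.RegularExpressions;

// In a verbatim string "" stands for one quote, so this is the pattern (Type)="(\w+)"
const string TypePattern = @"(Type)=""(\w+)""";

// Hypothetical helper: the value of Type, or "" when the input has no Type="...".
// Group 0 is the whole match; groups 1 and 2 are the parenthesised parts.
string ReadType(string input)
{
    Match m = Regex.Match(input, TypePattern);
    return m.Success ? m.Groups[2].Value : "";
}

Match demo = Regex.Match("Type=\"Data\"><Path.Style><Style", TypePattern);
Console.WriteLine(demo.Groups[1].Value); // Type
Console.WriteLine(demo.Groups[2].Value); // Data
Console.WriteLine(ReadType("Type=\"Program\"><Rectangle.Style><Style")); // Program
```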
Instead of splitting repeatedly, why not just use a Regex:
output = Regex.Match(pair[1], "\"(\\w*)\"").Groups[1].Value; // group 1 is the text between the quotes, e.g. Data
Maybe I missed something, but what about this:
var str = "Type=\"Program\"><Rectangle.Style><Style";
var splitted = str.Split('"');
var type = splitted[1]; // i.e. Data or Program
But you will need some error handling as well.
How about a regex?
var regex = new Regex("(?<=^Type=\").*?(?=\")");
var output = regex.Match(input).Value;
Explanation of the regex:
(?<=^Type=\") This is a lookbehind (prefix match). It is not included in the result, but the match only succeeds if the string starts with Type="
.*? Lazy (non-greedy) match: match as few characters as possible until
(?=\") This is a lookahead (suffix match). It is not included in the result, but the match only succeeds if the next character is "
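The lazy quantifier matters as soon as a later quote appears in the input. A small sketch comparing .*? with .* on a made-up input that contains a second pair of quotes:

```csharp
using System;
using System.Text.RegularExpressions;

// Made-up input: a second quoted value appears later in the string.
string input = "Type=\"Data\"><Path.Style x=\"y\"><Style";

// Lazy: stops at the first position followed by a quote.
string lazy = Regex.Match(input, "(?<=^Type=\").*?(?=\")").Value;
// Greedy: runs to the end, then backs up to the LAST position followed by a quote.
string greedy = Regex.Match(input, "(?<=^Type=\").*(?=\")").Value;

Console.WriteLine(lazy);   // Data
Console.WriteLine(greedy); // Data"><Path.Style x="y
```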
Given your specified format:
Type="Program"><Rectangle.Style><Style
It seems logical to me to include the quote mark (") in the separator when splitting the string; then you just have to find the closing quote mark and extract the contents. You can use LINQ to do this:
string code = "Type=\"Program\"><Rectangle.Style><Style";
string[] parts = code.Split(new string[] { "=\"" }, StringSplitOptions.None);
string[] wantedParts = parts.Where(p => p.Contains("\"")).
Select(p => p.Substring(0, p.IndexOf("\""))).ToArray();

How to replace one or more spaces with a delimiter using C#

I'm parsing some text, and I want to split it and add the pieces one by one.
But first things first: I think the best way is to replace runs of multiple spaces with one unique delimiter.
Below is the sample target text:
Total fare 619,999.0d-
12 11 82139 09/13/2013 D 103,500.00 2/025189 PARK LA000137
09/13/2013 D 50.00 File Ticket - PS1309121018882/
Does anybody know how to handle this in C#?
"the best way is to replace multiple spaces with one unique delimiter"
Not really sure if it's the best way, but the following works without a regex:
string newStr = string.Join(":",
str.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
Try:
var strings = text.Split(' ').Where(str => str.Length > 0);
You can use a regular expression:
string delimiter = ":";
var whiteSpaceNormalised = Regex.Replace(input, @"\s+", delimiter);
Use a regular expression instead, replacing each run of one or more spaces with a single space:
string parsedText = System.Text.RegularExpressions.Regex.Replace(inputString,"[ ]+"," ");
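A sketch of the two main approaches on one line. The spacing in the input is illustrative (the exact spacing of the original sample is an assumption), and ':' is assumed never to occur inside a field.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Assumption: fields separated by runs of spaces of varying length.
string line = "12   11 82139     09/13/2013  D   103,500.00";

// Option 1: normalise each whitespace run to one delimiter, then split on it.
// Trim first so there is no leading or trailing delimiter.
string normalised = Regex.Replace(line.Trim(), @"\s+", ":");
string[] viaRegex = normalised.Split(':');

// Option 2: skip the intermediate string and let Split drop the empty entries.
string[] viaSplit = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

Console.WriteLine(normalised);                 // 12:11:82139:09/13/2013:D:103,500.00
Console.WriteLine(string.Join("|", viaSplit)); // 12|11|82139|09/13/2013|D|103,500.00
Console.WriteLine(viaRegex.SequenceEqual(viaSplit)); // True
```

Either way, a field that itself contains a space (such as "File Ticket" in the sample) gets split in two, so lines like that may need column positions or a more specific pattern.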

Quick way of splitting a mixed alphanum string into text and numeric parts?

Say I have a string such as
abc123def456
What's the best way to split the string into an array such as
["abc", "123", "def", "456"]
string input = "abc123def456";
Regex re = new Regex(@"\D+|\d+");
string[] result = re.Matches(input).OfType<Match>()
.Select(m => m.Value).ToArray();
string[] result = Regex.Split("abc123def456", "([0-9]+)");
The above uses any sequence of digits as the delimiter; wrapping it in parentheses (a capturing group) means the delimiters are kept in the returned array.
Note: In the example snippet we will get an empty element as the last entry of our array.
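If the empty entry is unwanted, filter it out after splitting; a minimal sketch:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The capturing group keeps the digit runs in the output, but an empty string
// appears wherever the input starts or ends with a digit run.
string[] raw = Regex.Split("abc123def456", "([0-9]+)");
string[] cleaned = raw.Where(piece => piece.Length > 0).ToArray();

Console.WriteLine(raw.Length);                // 5
Console.WriteLine(string.Join(",", cleaned)); // abc,123,def,456
```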
The boundary you look for can be described as "A position where a digit follows a non-digit, or where a non-digit follows a digit."
So:
string[] result = Regex.Split("abc123def456", @"(?<=\D)(?=\d)|(?<=\d)(?=\D)");
Use [0-9] and [^0-9], respectively, if \d and \D are not specific enough.
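Add spaces around each run of digits, then split on the spaces:
Regex.Replace("abc123def456", @"(\d+)", " $1 ").Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
.NET replacement strings refer to the first group as $1, not \1, and RemoveEmptyEntries drops the empty strings left by the leading or trailing spaces.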
You could convert the string to a char array and loop through the characters. As long as consecutive characters are of the same type (letter or digit), keep appending them to a temporary string. When the next character is of a different type (or you've reached the end of the string), add the temporary string to the array and reset it to an empty string.
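A sketch of that loop, assuming "type" simply means digit versus non-digit (char.IsDigit) and using a StringBuilder as the temporary string. SplitAlphaNum is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: walk the characters once and start a new chunk whenever
// the character switches between digit and non-digit.
List<string> SplitAlphaNum(string text)
{
    var chunks = new List<string>();
    var current = new StringBuilder();
    for (int i = 0; i < text.Length; i++)
    {
        // current is non-empty only when i > 0, so text[i - 1] is safe here.
        if (current.Length > 0 && char.IsDigit(text[i]) != char.IsDigit(text[i - 1]))
        {
            chunks.Add(current.ToString());
            current.Clear();
        }
        current.Append(text[i]);
    }
    if (current.Length > 0)
        chunks.Add(current.ToString()); // flush the last chunk
    return chunks;
}

Console.WriteLine(string.Join(",", SplitAlphaNum("abc123def456"))); // abc,123,def,456
```

Unlike the Regex.Split version, this never produces empty entries, because a chunk is only added once it has at least one character.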

Regex not working in .NET

I'm trying to get a regex working, and I'm fairly new at this. It works when I paste it into an online validator, but not when it's placed in the code-behind of a .NET 2.0 C# page.
The code is supposed to split on a single semicolon but not on a double semicolon. However, when I use the string
"entry;entry2;entry3;entry4;"
I get a nonsense array that contains empty values, the last letter of the previous entry, and the semicolons themselves. The online JavaScript validator splits it correctly. Please help!
My regex:
((;;|[^;])+)
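Regex.Split treats the pattern as the separator, and it also inserts the text captured by any groups into the result. Your pattern describes the entries rather than the separators, which is why you get empty strings, captured fragments and the semicolons back. Split on the following regular expression instead: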
(?<!;);(?!;)
It means match semicolons that are neither preceded nor succeeded by another semicolon.
For example, this code
var input = "entry;entry2;entry3;entry4;";
foreach (var s in Regex.Split(input, @"(?<!;);(?!;)"))
Console.WriteLine("[{0}]", s);
produces the following output:
[entry]
[entry2]
[entry3]
[entry4]
[]
The final empty field is a result of the semicolon on the end of the input.
If the semicolon is a terminator at the end of each field rather than a separator between consecutive fields, use Regex.Matches instead:
foreach (Match m in Regex.Matches(input, @"(.+?)(?<!;);(?!;)"))
Console.WriteLine("[{0}]", m.Groups[1].Value);
to get
[entry]
[entry2]
[entry3]
[entry4]
Why not use String.Split on the semicolon?
string sInput = "Entry1;entry2;entry3;entry4";
string[] sEntries = sInput.Split(';');
// Do what you have to do with the entries in the array...
Hope this helps,
Best regards,
Tom.
As tommieb75 wrote, you can use String.Split with the StringSplitOptions enumeration to control what ends up in the resulting array:
string input = "entry1;;entry2;;;entry3;entry4;;";
char[] charSeparators = new char[] {';'};
// Split a string delimited by characters and return all non-empty elements.
string[] result = input.Split(charSeparators, StringSplitOptions.RemoveEmptyEntries);
The result would contain only 4 elements like this:
<entry1><entry2><entry3><entry4>
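Note that RemoveEmptyEntries on its own does not keep a double semicolon together, which is what the question asks for. A sketch contrasting the two approaches on a made-up input containing ";;":

```csharp
using System;
using System.Text.RegularExpressions;

string input = "entry1;;entry2;entry3";

// Every ';' is a separator; RemoveEmptyEntries then drops the empty string
// produced by ";;", so entry1 and entry2 come apart.
string[] bySplit = input.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);

// The lookaround pattern only splits on a ';' with no ';' on either side,
// so "entry1;;entry2" stays together as one field.
string[] byRegex = Regex.Split(input, @"(?<!;);(?!;)");

Console.WriteLine(string.Join(" | ", bySplit)); // entry1 | entry2 | entry3
Console.WriteLine(string.Join(" | ", byRegex)); // entry1;;entry2 | entry3
```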
