My problem is that I have a string in a format like this:
dsadadsadas
dasdasda
dasda
4TOT651.43|0.00|651.43|98933|607.75|0.00|607.75|607.75|7621|14|0|0|799.42
dsda
dasad
das
I need to find the line that contains 4TOT and extract the value between the second and third '|'. Any ideas how I can get that with a regex or Substring?
So far I only have this:
var test = Regex.Match(fileContent, "4TOT.*").Value;
This finds the entire line.
When the input is simple and follows a strict format like this, I usually prefer to use plain old string handling over regex. In this case it's spiced up with some LINQ for simpler code:
// filter out lines to use
var linesToUse = input
.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
.Where(s => s.StartsWith("4TOT"));
foreach (string line in linesToUse)
{
// pick out the value
string valueToUse = line.Split('|')[2];
// more code here, I guess
}
If you know that the input contains only one line that you are interested in, you can remove the loop:
string line = input
.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
.Where(s => s.StartsWith("4TOT"))
.FirstOrDefault();
string value = string.IsNullOrEmpty(line) ? string.Empty : line.Split('|')[2];
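If you want to reuse this, here is a minimal sketch that wraps the single-line version in a method and also guards against a 4TOT line with too few fields, where Split('|')[2] would throw an IndexOutOfRangeException. The names TotLine and ExtractThirdField are made up for illustration, and the sketch returns null (rather than string.Empty) when nothing is found.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of the original answer.
public static class TotLine
{
    // Returns the third '|'-separated field of the first line that starts
    // with "4TOT", or null if there is no such line or it has too few fields.
    public static string ExtractThirdField(string input)
    {
        string line = input
            .Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .FirstOrDefault(s => s.StartsWith("4TOT", StringComparison.Ordinal));
        if (line == null)
            return null;

        string[] fields = line.Split('|');
        // fields[0] is "4TOT" plus the first value, so index 2 holds the
        // value between the second and third '|'.
        return fields.Length > 2 ? fields[2] : null;
    }
}
```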
Update
Here is an approach that will work well when loading the input from a file instead:
foreach (var line in File.ReadLines(@"c:\temp\input.txt")
.Where(s => s.StartsWith("4TOT")))
{
string value = string.IsNullOrEmpty(line) ? string.Empty : line.Split('|')[2];
Console.WriteLine(value);
}
File.ReadLines is new in .NET 4. It enumerates the lines of the file one at a time instead of loading the whole file into memory. If you are using an earlier version of .NET, you can fairly easily write your own method that provides this behavior.
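For pre-.NET 4 frameworks, a hand-rolled equivalent could look roughly like this sketch. ReadLinesLazily is a hypothetical name, not a framework method; it uses a StreamReader and yield return, so only one line is held in memory at a time.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical stand-in for File.ReadLines on .NET versions before 4.0.
public static class LineReader
{
    public static IEnumerable<string> ReadLinesLazily(string path)
    {
        // The file is opened on the first MoveNext and closed when the
        // enumeration finishes or the enumerator is disposed.
        using (var reader = new StreamReader(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }
}
```

You could then use LineReader.ReadLinesLazily(...) in place of File.ReadLines(...) in the loop above.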
What about this regex?
Seems to be working for me.
4TOT.*?\|.*?\|(.*?)\|
Captures the value you're looking for into a group.
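In C#, the captured value is read from Groups[1], since Groups[0] is the whole match. A minimal sketch, assuming the whole file has been read into one string (TotRegex and Extract are illustrative names):

```csharp
using System.Text.RegularExpressions;

public static class TotRegex
{
    // The lazy pattern from the answer. '.' does not match '\n', so the
    // match cannot run past the end of the 4TOT line.
    private static readonly Regex Pattern = new Regex(@"4TOT.*?\|.*?\|(.*?)\|");

    public static string Extract(string fileContent)
    {
        Match m = Pattern.Match(fileContent);
        // Groups[1] holds the text between the second and third '|'.
        return m.Success ? m.Groups[1].Value : null;
    }
}
```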
Why not split your string twice, first on newlines and then, once the target line is found, on the '|' symbol, without using a regex at all?
var tot = source.Split(Environment.NewLine.ToCharArray())
.FirstOrDefault(s => s.StartsWith("4TOT"));
if (tot != null)
{
// gets 651.43
var result = tot.Split('|')
.Skip(2)
.FirstOrDefault();
}
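Use the regex: ^4TOT(?:[0-9]*(?:\.[0-9]*)?\|){2}([0-9]*(?:\.[0-9]*)?).*
This regex matches 4TOT at the start of a line, followed twice by "a number (with an optional decimal part) and then |", and then captures the next number. The rest of the line is ignored. Since the 4TOT line is not the first line of the input, use RegexOptions.Multiline so that ^ matches at the start of every line.
If you then use:
Match match = Regex.Match(input, pattern, RegexOptions.Multiline);
you will find the answer in match.Groups[1].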
Memo:
Numbers are [0-9]*\.[0-9]*
(?: ... ) is a non-capturing group
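A sketch of the corrected pattern in use, assuming the whole file is in one string. Note the RegexOptions.Multiline flag; without it, ^ only matches at the very start of the input (AnchoredTotRegex is a hypothetical name):

```csharp
using System.Text.RegularExpressions;

public static class AnchoredTotRegex
{
    // Decimal point escaped as \. so it only matches a literal '.'.
    private const string Pattern =
        @"^4TOT(?:[0-9]*(?:\.[0-9]*)?\|){2}([0-9]*(?:\.[0-9]*)?)";

    public static string Extract(string input)
    {
        // Multiline lets '^' match right after every '\n'.
        Match match = Regex.Match(input, Pattern, RegexOptions.Multiline);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```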
Related
I'm trying to split a string at every parenthesis into an array in C#, keeping all of the text, including the part inside the parentheses.
Example: "hmmmmmmmm (asdfhqwe)asasd"
Should become: "hmmmmmmmm", "(asdfhqwe)" and "asasd".
My current setup is only able to take everything inside the parentheses and discards the rest.
var output = input.Split('(', ')').Where((item, index) => index % 2 != 0).ToList();
How would I go about doing this (disregarding my current code)?
Use Regex.Split with a positive lookbehind and lookahead plus optional surrounding whitespace, then filter out empty strings.
var tokens = Regex
.Split(str, @"(?<=[)])\s*|\s*(?=[(])")
.Where(s => s != string.Empty)
.ToList();
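A related option: when the Regex.Split pattern contains a capturing group, the captured text is included in the result array, so capturing the parenthesized part keeps it as its own element. This is a sketch that assumes the parentheses are never nested; ParenSplitter is an illustrative name.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class ParenSplitter
{
    // The capturing group (\([^)]*\)) puts each "(...)" into the output;
    // the \s* on either side swallows neighbouring spaces.
    public static string[] Split(string input)
    {
        return Regex.Split(input, @"\s*(\([^)]*\))\s*")
                    .Where(s => s.Length > 0) // e.g. input starting with "(" yields a leading ""
                    .ToArray();
    }
}
```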
OK, so I don't know what the real string will look like in your application, but based on the provided string, this is my hack of a solution:
string sample = "hmmmmmmmm (asdfhqwe)asasd";
var result = sample.Replace("(", ",(").Replace(")", "),").Split(',');
So I replaced each split point with a comma and then split on it. You can use any other character that never occurs in your string; '~', for example, would also work. Note that the first element keeps its trailing space ("hmmmmmmmm ").
Without knowing all the required functionality, I can only say this works for the scenario above.
Try this (note that it drops the parentheses themselves):
string[] subString = myString.Split(new char[] { '(', ')' });
I'd like to turn a string such as abbbbcc into an array like this: [a,bbbb,cc] in C#. I have tried the regex from this Java question like so:
var test = "aabbbbcc";
var split = new Regex("(?<=(.))(?!\\1)").Split(test);
but this results in the sequence [a,a,bbbb,b,cc,c] for me. How can I achieve the same result in C#?
Here is a LINQ solution that uses Aggregate:
var input = "aabbaaabbcc";
var result = input
.Aggregate(" ", (seed, next) => seed + (seed.Last() == next ? "" : " ") + next)
.Trim()
.Split(' ');
It builds the result one character at a time: whenever the next character differs from the last one appended, it appends a space first. At the end, I just trim the leading space and split everything using the normal String.Split. (This relies on the input not containing spaces itself.)
Result:
["aa", "bb", "aaa", "bb", "cc"]
I don't know how to do it with Split, but this may be a good alternative:
//using System.Linq;
var test = "aabbbbcc";
var matches = Regex.Matches(test, "(.)\\1*");
var split = matches.Cast<Match>().Select(match => match.Value).ToList();
There are several things going on here that produce the output you're seeing:
1. The regex combines a positive lookbehind and a negative lookahead: the lookbehind captures the preceding character, and the lookahead asserts that the next character is not that same one. Together they match the empty position at the end of each run of identical characters.
2. The lookbehind creates a capture group for every match. The group is required by the negative lookahead, specifically the \1 backreference, which means "the value of the first capture group in the pattern", so it cannot be omitted.
3. When the pattern given to Regex.Split contains capture groups, the captured text is included in the resulting array alongside the split pieces.
Number 3 is why your array looks odd: Split splits right after the last a of the first run, so the text before that point becomes split[0] and the captured a becomes split[1], and so on.
There is no way to override this behaviour on calling Split.
Either compensation as per Gusman's answer or projecting the results of a Matches call as per Ruard's answer will get you what you want.
To be honest I don't exactly understand how that regex works, but you can "repair" the output very easily:
Regex reg = new Regex("(?<=(.))(?!\\1)", RegexOptions.Singleline);
var res = reg.Split("aaabbcddeee").Where((value, index) => index % 2 == 0 && value != "").ToArray();
You could do this easily with LINQ, but I don't think its runtime will be as good as the regex.
It is a whole lot easier to read, though.
var myString = "aaabbccccdeee";
var splits = myString.ToCharArray()
.GroupBy(chr => chr)
.Select(grp => new string(grp.Key, grp.Count()));
returns the values ['aaa', 'bb', 'cccc', 'd', 'eee']
However, this won't work for a string like "aabbaa": you'll get ["aaaa","bb"] instead of ["aa","bb","aa"], because GroupBy groups equal characters regardless of where they appear.
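If you need runs of consecutive characters (so "aabbaa" gives "aa", "bb", "aa"), a plain loop handles it in a single pass without regex. This is a minimal sketch rather than anything from the answers above; Runs and SplitRuns are made-up names.

```csharp
using System.Collections.Generic;
using System.Text;

public static class Runs
{
    // Splits a string into runs of identical consecutive characters.
    // A character that reappears later starts a new run.
    public static List<string> SplitRuns(string input)
    {
        var runs = new List<string>();
        var current = new StringBuilder();
        foreach (char c in input)
        {
            // A different character ends the current run.
            if (current.Length > 0 && current[current.Length - 1] != c)
            {
                runs.Add(current.ToString());
                current.Clear();
            }
            current.Append(c);
        }
        if (current.Length > 0)
            runs.Add(current.ToString());
        return runs;
    }
}
```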
I read a *.txt file in C# and displayed it in the console.
My text file looks like a table.
diwas hey
ivonne how
pokhara d kd
lekhanath when
dipisha dalli hos
dfsa sasf
Now I want to search for the string "pokhara": if it is found, it should display "d kd", and if not, it should display "Not found".
What I tried:
string[] lines = System.IO.File.ReadAllLines(@"C:\readme.txt");
foreach(string line in lines)
{
string [] words = line.Split();
foreach(string word in words)
{
if (word=="pokhara")
{
Console.WriteLine("Match Found");
}
}
}
My problem:
The match is found, but how do I display the next word on the line? Also, sometimes the value in the second column is two words separated by a space, and I need to show both words.
I guess your delimiter is the tab-character, then you can use String.Split and LINQ:
var lineFields = System.IO.File.ReadLines(@"C:\readme.txt")
.Select(l => l.Split('\t'));
var matches = lineFields
.Where(arr => arr.First().Trim() == "pokhara")
.Select(arr => arr.Last().Trim());
// if you just want the first match:
string result = matches.FirstOrDefault(); // is null if not found
If you don't know the delimiter, as your comment suggests, you have a problem: if you don't even know the rules for how the fields are separated, your code is very likely to be incorrect. So first determine the business logic; ask the people who created the text file. Then use the correct delimiter in String.Split.
If it's a space, you can either use string.Split() (without arguments), which splits on any whitespace including spaces, tabs and newline characters, or use string.Split(' '), which splits only on the space. But note that a space is a bad delimiter if the fields can contain spaces as well. In that case either use a different delimiter or wrap the fields in quote characters, like "text with spaces". Then I suggest a real text parser such as Microsoft.VisualBasic.FileIO.TextFieldParser, which can also be used from C# and has a HasFieldsEnclosedInQuotes property.
This works:
string[] lines = System.IO.File.ReadAllLines(@"C:\readme.txt");
foreach (string line in lines)
{
// split on whitespace, dropping empty entries caused by repeated spaces
string[] words = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
// I assume that the first word in every line is the key word to be found
if (words.Length > 0 && words[0] == "pokhara")
{
Console.WriteLine("Match Found");
// join the remaining words back together with spaces, e.g. "d kd"
string stringTobeDisplayed = string.Join(" ", words, 1, words.Length - 1);
Console.WriteLine(stringTobeDisplayed);
}
}
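A sketch that also covers the "Not found" case and values containing spaces. It assumes, as the answer above does, that the first whitespace-separated word on each line is the key and that everything after it is the value; TableLookup and FindValue are hypothetical names. It returns null when the key is missing so the caller decides what to print.

```csharp
using System.Collections.Generic;

public static class TableLookup
{
    // Returns the rest of the line after the key (spaces inside the value
    // are kept, e.g. "d kd"), or null if no line starts with the key.
    public static string FindValue(IEnumerable<string> lines, string key)
    {
        foreach (string line in lines)
        {
            string trimmed = line.Trim();
            // first space or tab separates the key from the value
            int sep = trimmed.IndexOfAny(new[] { ' ', '\t' });
            string first = sep < 0 ? trimmed : trimmed.Substring(0, sep);
            if (first == key)
                return sep < 0 ? string.Empty : trimmed.Substring(sep + 1).Trim();
        }
        return null;
    }
}
```

Usage would be something like Console.WriteLine(TableLookup.FindValue(File.ReadLines(@"C:\readme.txt"), "pokhara") ?? "Not found");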
In C#, I have a string that comes from a file in this format:
Type="Data"><Path.Style><Style
or maybe
Type="Program"><Rectangle.Style><Style
etc. Now I want to extract only the Data or Program part of the Type element. For that, I used the following code:
string output;
var pair = inputKeyValue.Split('=');
if (pair[0] == "Type")
{
output = pair[1].Trim('"');
}
But it gives me this result:
output=Data><Path.Style><Style
What I want is:
output=Data
How to do that?
This code example takes an input string, splits by double quotes, and takes only the first 2 items, then joins them together to create your final string.
string input = "Type=\"Data\"><Path.Style><Style";
var parts = input
.Split('"')
.Take(2);
string output = string.Join("", parts); //note: .net 4 or higher
This will make output have the value:
Type=Data
If you only want output to be "Data", then do
var parts = input
.Split('"')
.Skip(1)
.Take(1);
or
var output = input
.Split('"')[1];
What you can do is use a very simple regular expression to parse out the bits that you want. In your case you want something like this, and then grab the two groups that interest you:
(Type)="(\w+)"
This returns, in groups 1 and 2, the value Type and the word characters contained between the double quotes.
Instead of doing many splits, why don't you just use a Regex:
output = Regex.Match(inputKeyValue, "Type=\"(\\w*)\"").Groups[1].Value;
Maybe I missed something, but what about this:
var str = "Type=\"Program\"><Rectangle.Style><Style";
var splitted = str.Split('"');
var type = splitted[1]; // i.e. Data or Program
But you will need some error handling as well.
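As for the error handling, here is one sketch that avoids both Split and Regex: it looks for the prefix followed by =" and returns null instead of throwing when the expected quotes are missing (Split('"')[1] throws an IndexOutOfRangeException on a string without quotes). TypeAttribute and GetQuotedValue are illustrative names, not an existing API.

```csharp
using System;

public static class TypeAttribute
{
    // Returns the text between the quotes after prefix=" (e.g. Type="Data"
    // gives Data), or null when the prefix or the closing quote is missing.
    public static string GetQuotedValue(string input, string prefix)
    {
        string marker = prefix + "=\"";
        int start = input.IndexOf(marker, StringComparison.Ordinal);
        if (start < 0)
            return null;
        start += marker.Length;
        int end = input.IndexOf('"', start);
        return end < 0 ? null : input.Substring(start, end - start);
    }
}
```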
How about a regex?
var regex = new Regex("(?<=^Type=\").*?(?=\")");
var output = regex.Match(input).Value;
Explanation of the regex:
(?<=^Type=\") This is a lookbehind (prefix) match. It is not included in the result, but the regex will only match
if the string starts with Type="
.*? Lazy (non-greedy) match. It matches as few characters as possible until
(?=\") This is a lookahead (suffix) match. It is not included in the result, but will only match if the next character is "
Given your specified format:
Type="Program"><Rectangle.Style><Style
It seems logical to me to include the quote mark (") when splitting the strings; then you just have to detect the closing quote mark and extract the contents. You can use LINQ to do this:
string code = "Type=\"Program\"><Rectangle.Style><Style";
string[] parts = code.Split(new string[] { "=\"" }, StringSplitOptions.None);
string[] wantedParts = parts
.Where(p => p.Contains("\""))
.Select(p => p.Substring(0, p.IndexOf("\"")))
.ToArray();
So I have a whole string (about 10k characters) and I search it for a word (or several words) with regex(word).Matches(scrappedstring).
But how can I extract the whole sentence that contains that word? I was thinking of taking a substring from the searched word up to the first dot/exclamation mark/question mark/etc. But how do I get the part of the sentence before the searched word?
Or is there a better approach?
If your boundaries are e.g. ., !, ? and ;, match the sentences with the expression [^.!?;]*(wordmatch)[^.!?;]*.
It will return every sentence that contains the desired wordmatch.
Example:
var s = "First sentence. Second with wordmatch ? Third one; The last wordmatch, EOM!";
var r = new Regex("[^.!?;]*(wordmatch)[^.!?;]*");
var m = r.Matches(s);
var result = Enumerable.Range(0, m.Count).Select(index => m[index].Value).ToList();
You can take the substrings between sentence terminators (dot/exclamation mark/question mark/etc.) and search for the word in each sentence inside a loop.
Then return the substring when you find the matching word.
Once you have a position, read forward up to the next . or the end of the file, and also read backwards from the beginning of the word to the previous . or the beginning of the file. Those two positions let you extract the sentence.
Note that it's not foolproof: in the simplest form outlined above, an abbreviation such as "e.g." would make it look as if a sentence started right after the "g.", which is probably not the case.
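The two-position approach can be sketched with IndexOf, LastIndexOfAny and IndexOfAny. This assumes '.', '!' and '?' are the only sentence terminators, only looks at the first occurrence of the word, and matches case-insensitively; SentenceFinder and SentenceContaining are hypothetical names.

```csharp
using System;

public static class SentenceFinder
{
    private static readonly char[] Terminators = { '.', '!', '?' };

    // Finds the first occurrence of word, then scans backwards to the previous
    // terminator (or the start of the text) and forwards to the next one (or
    // the end). Abbreviations such as "e.g." will fool it.
    public static string SentenceContaining(string text, string word)
    {
        int pos = text.IndexOf(word, StringComparison.OrdinalIgnoreCase);
        if (pos < 0)
            return null;

        // LastIndexOfAny returns -1 when there is no earlier terminator,
        // so start becomes 0; pos == 0 is handled separately because a
        // start index of -1 is not allowed.
        int start = pos == 0 ? 0 : text.LastIndexOfAny(Terminators, pos - 1) + 1;
        int end = text.IndexOfAny(Terminators, pos + word.Length);
        end = end < 0 ? text.Length : end + 1; // include the terminator itself
        return text.Substring(start, end - start).Trim();
    }
}
```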
Extract the sentences from the input, then search for the specified word(s) within each sentence.
Return the sentences where the word(s) are present.
public List<string> GetMatchedString(string match, string input)
{
var sentenceList = input.Split(new char[] { '.', '?', '!' });
var regex = new Regex(match);
return sentenceList.Where(sentence => regex.IsMatch(sentence)).ToList();
}
You can do this in two steps:
first split the text into sentences, then keep only the ones that contain the word.
Something like this:
var input = "A large text with many sentences. Many chars in a string!. A sentence without the pattern word.";
//Step 1: fragment phrase.
var patternPhrase = @"(?<=(^|[.!?]\s*))[^ .!?][^.!?]+[.!?]";
//Step 2: filter out only the phrases containing the word.
var patternWord = @"many";
var result = Regex
.Matches(input, patternPhrase) // step 1
.Cast<Match>()
.Select(s => s.Value)
.Where(w => Regex.IsMatch(w, patternWord, RegexOptions.IgnoreCase)); // step 2
foreach (var item in result)
{
//do something with any phrase.
}