Splitting a string by another string - c#

I have a string that I need to split by another string, which is a substring of the original. Say I have the following text:
string s = "<DOC>something here <TEXT> and some stuff here </TEXT></DOC>";
And I want to retrieve:
"and some stuff here"
I need to get the string between "<TEXT>" and its closing tag "</TEXT>".
I can't manage this with the usual string Split method, even though one of its overloads takes a string[]. This is what I am trying:
Console.Write(s.Split("<TEXT>")); // Which doesn't compile
Thanks in advance for your kind help.

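const string open = "<TEXT>";
var start = s.IndexOf(open);
// Only look for the closing tag after the opening tag
var end = start >= 0 ? s.IndexOf("</TEXT>", start + open.Length) : -1;
string res;
if (start >= 0 && end >= 0) {
    // Skip past the opening tag so it is not part of the result
    res = s.Substring(start + open.Length, end - start - open.Length).Trim();
} else {
    res = "NOT FOUND";
}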

Splitting on "<TEXT>" isn't going to help you in this case anyway, since the close tag is "</TEXT>".
The most robust solution would be to parse it properly as XML. C# provides functionality for doing that. The second example at http://msdn.microsoft.com/en-us/library/cc189056%28v=vs.95%29.aspx should put you on the right track.
However, if you're just looking for a quick-and-dirty one-time solution your best bet is going to be to hand-code something, such as dasblinkenlight's solution above.

var output = new List<String>();
foreach (Match match in Regex.Matches(source, "<TEXT>(.*?)</TEXT>")) {
output.Add(match.Groups[1].Value);
}

string s = "<DOC>something here <TEXT> and some stuff here </TEXT></DOC>";
string result = Regex.Match(s, "(?<=<TEXT>).*?(?=</TEXT>)").Value;
EDIT: I am using the regex pattern (?<=prefix)find(?=suffix), which matches the text find only where it is preceded by prefix and followed by suffix; the prefix and suffix themselves are not part of the match.
EDIT 2:
Find several results:
MatchCollection matches = Regex.Matches(s, "(?<=<TEXT>).*?(?=</TEXT>)");
foreach (Match match in matches) {
Console.WriteLine(match.Value);
}

If the last tag is </DOC>, then you could use XElement.Load to load the XML and then walk through it to find the wanted element (you could also use LINQ to XML).
If this is not necessarily a well-formed XML string, you could always go with regular expressions to find the desired part of the text. In this case the expression should not be too hard to write yourself.
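As a rough sketch of the XML route, assuming the input really is well-formed XML with <TEXT> as a direct child of the root: XElement.Parse reads XML from a string (XElement.Load expects a file, stream or reader). The class and method names below are made up for illustration.

```csharp
using System;
using System.Xml.Linq;

static class XmlTextDemo
{
    // Hypothetical helper: returns the trimmed content of the first <TEXT>
    // child of the root element, or null if there is no such child.
    public static string ExtractText(string xml)
    {
        XElement root = XElement.Parse(xml);   // throws XmlException on malformed input
        XElement text = root.Element("TEXT");  // direct children only
        return text?.Value.Trim();
    }

    static void Main()
    {
        string s = "<DOC>something here <TEXT> and some stuff here </TEXT></DOC>";
        Console.WriteLine(ExtractText(s));
    }
}
```

If the element can sit deeper in the tree, root.Descendants("TEXT") would be the thing to look at instead of Element.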

Related

Check array for string that starts with given one (ignoring case)

I am trying to see if my string starts with a string in an array of strings I've created. Here is my code:
string x = "Table a";
string y = "a table";
string[] arr = { "table", "chair", "plate" };
if (arr.Contains(x.ToLower())){
// this should be true
}
if (arr.Contains(y.ToLower())){
// this should be false
}
How can I make my if statement come out true? I'd like to just match the beginning of string x against the contents of the array, ignoring case and any following characters. I thought I needed regex to do this, but I could be mistaken. I'm a bit of a newbie with regex.
It seems you want to check if your string contains an element from your list, so this should be what you are looking for:
if (arr.Any(c => x.ToLower().Contains(c)))
Or simpler:
if (arr.Any(x.ToLower().Contains))
Or based on your comments you may use this:
if (arr.Any(x.ToLower().Split(' ')[0].Contains))
Because you said you want regex...
you can build one with var regex = new Regex("(table|plate|fork)");
and check it with if (regex.IsMatch(myString)) { ... }
But for the issue at hand you don't have to use Regex, since you are searching for an exact substring. You can use
(as @S.Akbari mentioned): if (arr.Any(c => x.ToLower().Contains(c))) { ... }
Enumerable.Contains matches exact values (and there is no built-in comparer that checks for "starts with"), so you need Any, which takes a predicate that receives each array element and performs the check. The first step is to turn "contains" the other way around, so that the given string is checked for containing an element from the array:
var myString = "some string";
if (arr.Any(arrayItem => myString.Contains(arrayItem)))...
Now, you are actually asking for "string starts with given word" and not just "contains", so you need StartsWith (which, unlike Contains, conveniently lets you specify case sensitivity; see Case insensitive 'Contains(string)'):
if (arr.Any(arrayItem => myString.StartsWith(
arrayItem, StringComparison.CurrentCultureIgnoreCase))) ...
Note that this code will accept "tableAAA bob" - if you really need to break on word boundary regular expression may be better choice. Building regular expressions dynamically is trivial as long as you properly escape all the values.
Regex should be
beginning of string - ^
properly escaped word you are searching for - Escape Special Character in Regex
word break - \b
if (arr.Any(arrayItem => Regex.IsMatch(myString,
    String.Format(@"^{0}\b", Regex.Escape(arrayItem)),
    RegexOptions.IgnoreCase))) ...
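The same three ingredients (start anchor, escaped word, word break) can also be folded into a single pattern built once from the whole array, instead of one regex per element. This is only a sketch; the class and method names are invented for the example, and it assumes every array entry ends in a word character so that \b behaves as expected.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class StartsWithWordDemo
{
    // Hypothetical helper: builds ^(?:table|chair|plate)\b from the array,
    // escaping each entry so regex metacharacters are taken literally.
    public static Regex BuildStartsWithWord(string[] words)
    {
        string alternatives = string.Join("|", words.Select(Regex.Escape));
        return new Regex("^(?:" + alternatives + @")\b", RegexOptions.IgnoreCase);
    }

    static void Main()
    {
        string[] arr = { "table", "chair", "plate" };
        Regex regex = BuildStartsWithWord(arr);
        Console.WriteLine(regex.IsMatch("Table a"));       // starts with "table" as a whole word
        Console.WriteLine(regex.IsMatch("a table"));       // "table" is not at the start
        Console.WriteLine(regex.IsMatch("tableAAA bob"));  // no word break after "table"
    }
}
```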
You can do something like the following using TypeScript. Instead of startsWith you can also use includes, an equality check, etc.
public namesList: Array<string> = ['name1','name2','name3','name4','name5'];
// SomeString = 'name1, Hello there';
private isNamePresent(SomeString : string):boolean{
if (this.namesList.find(name => SomeString.startsWith(name)))
return true;
return false;
}
I think I understand what you are trying to say here, although there is still some ambiguity. Are you trying to see if one word in your string (which is a sentence) exists in your array?
@Amy is correct, this might not have to do with regex at all.
I think this segment of code will do what you want in Java (which can easily be translated to C#):
Java:
x = x.toLowerCase();
String[] words = x.split("\\s+");
for (String word : words) {
    for (String element : arr) {
        if (element.equals(word)) {
            return true;
        }
    }
}
return false;
You can also use a Set to store the elements in your array, which can make look up more efficient.
Java:
x = x.toLowerCase();
String[] words = x.split("\\s+");
Set<String> set = new HashSet<>(Arrays.asList(arr));
for (String word : words) {
    if (set.contains(word)) {
        return true;
    }
}
return false;
Edit: (12/22, 11:05am)
I rewrote my solution in C#, thanks to reminders by @Amy and @JohnyL. Since the author only wants to match the first word of the string, this edited code should work :)
C#:
static bool Contains(string x, string[] arr)
{
    x = x.ToLower();
    string[] words = x.Split(' ');
    var set = new HashSet<string>(arr);
    // Only the first word of the string is compared against the array
    return set.Contains(words[0]);
}
Sorry my question was so vague but here is the solution thanks to some help from a few people that answered.
var regex = new Regex("^(table|chair|plate) *.*");
if (regex.IsMatch(x.ToLower())){}

c# regularexpression how to find two records in the string

OK, I am building my email templating engine and trying to extract the pieces of text between {{{ and }}} from this text:
var matches = Regex.Matches("sdfsdfsdf{{{GetServices:Pilotage}}}sdfsdfsdf dfsdf{{{GetServices:Berth Fee}}}sdfdsf{{sss", "{{{(.*)}}}");
How can I parse this string so that I get the following as a result array? I have been trying different things, but to no avail.
1)GetServices:Pilotage
2)GetServices:Berth Fee
Your attempt (although maybe not optimal) should work if you force it to be non-greedy:
{{{(.*?)}}}
Just add a ? after the *.
Use Grouping to retrieve the matches.
var input = "sdfsdfsdf{{{GetServices:Pilotage}}}sdfsdfsdf dfsdf{{{GetServices:Berth Fee}}}sdfdsf{{sss";
var matches = Regex.Matches(input, "\\{\\{\\{(GetServices:[^{]*)\\}\\}\\}");
var result = new List<string>();
foreach (Match match in matches)
{
if (match.Groups.Count == 2)
{
result.Add(match.Groups[1].ToString());
}
}
You could try this regex:
(\{{3}\w*\:[\w\ ]*\}{3})+
Edit:
For obtaining the value, see answer:
Find each RegEx match in string
The pattern would be @"\{{3}(.+?)\}{3}" or @"\{{3}(.*?)\}{3}". Which one you need depends on whether the string in between is required or optional.
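Putting the lazy pattern and the capture group together, a minimal sketch could look like the following. The helper name ExtractTokens is made up; it uses the .+? variant, so an empty {{{}}} would not be reported.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class TemplateTokenDemo
{
    // Hypothetical helper: returns the text inside every {{{ ... }}} pair.
    public static List<string> ExtractTokens(string input)
    {
        var result = new List<string>();
        // \{{3} is three literal opening braces; (.+?) is the lazy capture.
        foreach (Match m in Regex.Matches(input, @"\{{3}(.+?)\}{3}"))
        {
            result.Add(m.Groups[1].Value);
        }
        return result;
    }

    static void Main()
    {
        string input = "sdfsdfsdf{{{GetServices:Pilotage}}}sdfsdfsdf dfsdf{{{GetServices:Berth Fee}}}sdfdsf{{sss";
        foreach (string token in ExtractTokens(input))
        {
            Console.WriteLine(token);
        }
    }
}
```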

I need to get a string between two strings using regex in C#

I have a string for example: "GamerTag":"A Talented Boy","GamerTileUrl" and what I have been trying and failing to get is the value: A Talented Boy. I need help creating a regex string to get specifically just A Talented Boy. Can somebody please help me!
var str = "\"GamerTag\":\"A Talented Boy\",\"GamerTileUrl\"";
var colonParts = str.Split(':');
if (colonParts.Length >= 2) {
var commaParts = colonParts[1].Split(',');
var aTalentedBoy = commaParts[0];
var gamerTileUrl = commaParts[1];
}
This allows you to also get other parts of the comma-separated list.
Suppose s is your string (no check here):
s = s.Split(':')[1].Split(',')[0].Trim('"');
If you want to have a Regex solution, here it is:
s = "\"GamerTag\":\"A Talented Boy\",\"GamerTileUrl\"";
Regex reg = new Regex("(?<=:\").+?(?=\")");
s = reg.Match(s).Value;
You can use string methods:
string result = text.Split(':').Last().Split(',').First().Trim('"');
The First/Last extension methods prevent exceptions when the separators are missing.
I think it's safe to assume that your string is actually bigger than what you showed us and that it contains multiple key/value pairs. I think this will do what you are looking for:
str.Split(new[] { "\"GamerTag\":\"" }, StringSplitOptions.None)[1].Split('"')[0];
The first split targets "GamerTag":" and gets everything after it. The second split takes everything up to the next " in that chunk, which is the value.
How about this?
\:\"([^\"]+)\"
This matches the colon and the opening quote, and then captures any non-quote characters until the next quote.
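To tie the capture-group idea to a specific key rather than the first colon, something along these lines might work. It is a sketch only: GetValue is an invented helper, and it assumes values never contain escaped quotes. If the full input is real JSON, a JSON parser is the safer route.

```csharp
using System;
using System.Text.RegularExpressions;

static class GamerTagDemo
{
    // Hypothetical helper: returns the quoted value that follows "key": ,
    // or null when the key is not present.
    public static string GetValue(string text, string key)
    {
        // Builds a pattern such as "GamerTag":"([^"]*)"
        string pattern = "\"" + Regex.Escape(key) + "\":\"([^\"]*)\"";
        Match m = Regex.Match(text, pattern);
        return m.Success ? m.Groups[1].Value : null;
    }

    static void Main()
    {
        string str = "\"GamerTag\":\"A Talented Boy\",\"GamerTileUrl\"";
        Console.WriteLine(GetValue(str, "GamerTag"));
    }
}
```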

Regex: only replace non-nested matches

Given text such as:
This is my [position].
Here are some items:
[items]
[item]
Position within the item: [position]
[/item]
[/items]
Once again, my [position].
I need to match the first and last [position], but not the [position] within [items]...[/items]. Is this doable with a regular expression? So far, all I have is:
Regex.Replace(input, @"\[position\]", "replacement value")
But that is replacing more than I want.
As Wug mentioned, regular expressions aren't great at counting. An easier option would be to just find the locations of all of the tokens you're looking for, and then iterate over them and construct your output accordingly. Perhaps something like this:
public string Replace(string input, string replacement)
{
    // find all the tags
    var regex = new Regex(@"\[(?:position|/?item)\]");
    var matches = regex.Matches(input);
    // loop through the tags and build up the output string
    var builder = new StringBuilder();
    int lastIndex = 0;
    int nestingLevel = 0;
    foreach (Match match in matches)
    {
        // append everything since the last tag
        builder.Append(input.Substring(lastIndex, match.Index - lastIndex));
        switch (match.Value)
        {
            case "[item]":
                nestingLevel++;
                builder.Append(match.Value);
                break;
            case "[/item]":
                nestingLevel--;
                builder.Append(match.Value);
                break;
            case "[position]":
                // Append the replacement text if we're outside of any [item]/[/item] pairs
                // Otherwise append the tag
                builder.Append(nestingLevel == 0 ? replacement : match.Value);
                break;
        }
        lastIndex = match.Index + match.Length;
    }
    builder.Append(input.Substring(lastIndex));
    return builder.ToString();
}
(Disclaimer: Have not tested. Or even attempted to compile. Apologies in advance for inevitable bugs.)
You could maaaaaybe get away with:
Regex.Replace(input,@"(?=\[position\])(!(\[item\].+\[position\].+\[/item\]))","replacement value");
I dunno, I hate ones like this. But this is a job for XML parsing, not regex. If your brackets are really square brackets, just search and replace them with angle brackets, then parse the result as XML.
What if you do it in two passes? Like,
s1 = Regex.Replace(input, @"(\[items\])(\w|\W)*(\[\/items\])", "");
This will give you:
This is my [position].
Here are some items:
Once again, my [position].
As you can see, the items section is removed. Then on s1 you can replace your desired positions, like:
s2 = Regex.Replace(s1, @"\[position\]", "replacement_value");
This might not be the best solution. I tried very hard to solve it with a single regex but was not successful.
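One more way to keep the [items] section in the output is to let the regex match either a whole [items]...[/items] block or a bare [position], and decide in a match evaluator which of the two gets replaced. A minimal sketch, assuming [items] blocks are never nested inside each other (the method name is made up):

```csharp
using System;
using System.Text.RegularExpressions;

static class NonNestedReplaceDemo
{
    // Hypothetical helper: replaces [position] only outside [items]...[/items].
    public static string ReplaceOutsideItems(string input, string replacement)
    {
        // First alternative swallows a whole items block (Singleline lets . cross
        // line breaks); second alternative is a [position] outside any block.
        const string pattern = @"\[items\].*?\[/items\]|\[position\]";
        return Regex.Replace(
            input,
            pattern,
            m => m.Value == "[position]" ? replacement : m.Value,
            RegexOptions.Singleline);
    }

    static void Main()
    {
        string input = "This is my [position].\n[items]\n[item]\nPosition within the item: [position]\n[/item]\n[/items]\nOnce again, my [position].";
        Console.WriteLine(ReplaceOutsideItems(input, "replacement value"));
    }
}
```

Because the items block is consumed as one match and written back unchanged, the [position] inside it is never seen as a separate match.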

How can I find a string after a specific string/character using regex

I am hopeless with regex (C#), so I would appreciate some help:
Basically, I need to parse a text and find the following information inside it:
Sample text:
KeywordB: TextToFind the rest is not relevant but KeywordB: Text ToFindB and then some more text.
I need to find the word(s) after a certain keyword which may end with a “:”.
[UPDATE]
Thanks Andrew and Alan. Sorry for reopening the question, but there is quite an important thing missing in that regex. As I wrote in my last comment: is it possible to have a variable (how many words to look for, depending on the keyword) as part of the regex?
Or I could have a different regex for each keyword (there will only be a handful). But I still don't know how to put a constant "number of words to look for" inside the regex.
The basic regex is this:
var pattern = @"KeywordB:\s*(\w*)";
\s* = any number of whitespace characters
\w* = 0 or more word characters (letters, digits and underscore; it stops at a space)
() = make a group, so you can extract the part that matched
var pattern = @"KeywordB:\s*(\w*)";
var test = @"KeywordB: TextToFind";
var match = Regex.Match(test, pattern);
if (match.Success) {
Console.Write("Value found = {0}", match.Groups[1]);
}
If you have more than one of these on a line, you can use this:
var test = @"KeywordB: TextToFind KeyWordF: MoreText";
var matches = Regex.Matches(test, @"(?:\s*(?<key>\w*):\s?(?<value>\w*))");
foreach (Match f in matches ) {
Console.WriteLine("Keyword '{0}' = '{1}'", f.Groups["key"], f.Groups["value"]);
}
Also, check out the regex designer here: http://www.radsoftware.com.au/. It is free, and I use it constantly. It works great to prototype expressions. You need to rearrange the UI for basic work, but after that it's easy.
(fyi) The "@" before a string makes it a verbatim string, where \ no longer means something special, so you can type @"c:\fun.txt" instead of "c:\\fun.txt"
Let me know if I should delete the old post, but perhaps someone wants to read it.
The way to do a "words to look for" inside the regex is like this:
regex = @"(Key1|Key2|Key3|LastName|FirstName|Etc):"
What you are doing probably isn't worth the effort in a regex, though it can probably be done the way you want (still not 100% clear on requirements, though). It involves looking ahead to the next match, and stopping at that point.
Here is a re-write as a regex + regular functional code that should do the trick. It doesn't care about spaces, so if you ask for "Key2" like below, it will separate it from the value.
string[] keys = {"Key1", "Key2", "Key3"};
string source = "Key1:Value1Key2: ValueAnd A: To Test Key3: Something";
FindKeys(keys, source);
private void FindKeys(IEnumerable<string> keywords, string source) {
var found = new Dictionary<string, string>(10);
var keys = string.Join("|", keywords.ToArray());
var matches = Regex.Matches(source, @"(?<key>" + keys + "):",
RegexOptions.IgnoreCase);
foreach (Match m in matches) {
var key = m.Groups["key"].ToString();
var start = m.Index + m.Length;
var nx = m.NextMatch();
var end = (nx.Success ? nx.Index : source.Length);
found.Add(key, source.Substring(start, end - start));
}
foreach (var n in found) {
Console.WriteLine("Key={0}, Value={1}", n.Key, n.Value);
}
}
And the output from this is:
Key=Key1, Value=Value1
Key=Key2, Value= ValueAnd A: To Test
Key=Key3, Value= Something
/KeywordB\: (\w+)/
This matches the word that comes right after your keyword. As you didn't mention any terminator, I assumed that you wanted only the word next to the keyword.
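On the question of a variable number of words per keyword: one option is to build the pattern at run time and put the count into a {1,n} quantifier. The sketch below is only an illustration; WordsAfter is an invented name, the sample text with KeywordA is made up, and it assumes wordCount is at least 1 and that "word" means a run of \w characters.

```csharp
using System;
using System.Text.RegularExpressions;

static class WordsAfterKeywordDemo
{
    // Hypothetical helper: returns up to wordCount words that follow the
    // keyword (with an optional ":"), or null if the keyword is not found.
    public static string WordsAfter(string text, string keyword, int wordCount)
    {
        // keyword, optional colon, optional whitespace, then 1..wordCount words
        string pattern = Regex.Escape(keyword) + @":?\s*((?:\w+\s*){1," + wordCount + "})";
        Match m = Regex.Match(text, pattern);
        return m.Success ? m.Groups[1].Value.Trim() : null;
    }

    static void Main()
    {
        string text = "KeywordA: TextToFind the rest is not relevant but KeywordB: Text ToFindB and then some more text.";
        Console.WriteLine(WordsAfter(text, "KeywordA", 1));
        Console.WriteLine(WordsAfter(text, "KeywordB", 2));
    }
}
```

The per-keyword word counts could then live in a small dictionary next to the keyword list, so only the number changes between calls.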
