Find exact substring in string array using LINQ in C#

I'm trying to check whether an exact substring exists in a string array. My code returns true when the substring exists in the string, but it also returns true when the string contains a misspelled version of it.
EDIT:
For example, if I am checking whether 'Connecticut' exists in the string array but it is spelled 'Connecticute', it still returns true, and I do not want it to. I want it to return false for 'Connecticute' and true for 'Connecticut' only.
Is there a way to do this using LINQ?
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.IO;
namespace ConsoleApplication2
{
class Program
{
static void Main(string[] args)
{
string[] sample = File.ReadAllLines(@"C:\samplefile.txt");
/* Sample file containing data organised like
Niall Gleeson 123 Fake Street UNIT 63 Connecticute 00703 USA
*/
string[] states = File.ReadAllLines(@"C:\states.txt"); //Text file containing list of all US states
foreach (string s in sample)
{
if (states.Any(s.Contains))
{
Console.WriteLine("Found State");
Console.WriteLine(s);
Console.ReadLine();
}
else
{
Console.WriteLine("Could not find State");
Console.WriteLine(s);
Console.ReadLine();
}
}
}
}
}

String.Contains returns true if one part of the string is anywhere within the string being matched.
Hence "Conneticute".Contains("Conneticut") will be true.
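If you want exact matches, what you're looking for is String.Equals. Note that it compares the entire string, so it only matches when the whole element is exactly the state name: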
...
if (states.Any(s.Equals))
...

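You could use \b to match word boundaries (i.e. the positions next to white space, periods, the start or end of the string, etc.). The pattern needs a verbatim string, otherwise "\b" is read as a backspace character:
var r = new Regex(@"\bConneticut\b", RegexOptions.IgnoreCase);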
var m = r.Match("Conneticute");
Console.WriteLine(m.Success); // false

Rather than using string.Contains, which only checks whether the string contains the sequence of letters, use a regular expression match with whatever pattern you consider appropriate. For example, this will match on word boundaries:
var x = new [] { "Connect", "Connecticute is a cute place", "Connecticut", "Connecticut is a nice place" };
x.Dump();
var p = new Regex(@"\bConnecticut\b", RegexOptions.Compiled);
x.Where(s=>p.IsMatch(s)).Dump();
This will match "Connecticut" and "Connecticut is a nice place" but not the other strings. Change the regex to suit your exact requirements.
(.Dump() is a LINQPad extension method; LINQPad is handy for experimenting with this sort of thing.)
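To tie the word-boundary idea back to the original LINQ question, here is a minimal sketch. It assumes the state names are plain words or phrases and that blank entries in the list should be ignored; the class and method names (StateMatcher, FindState) are made up for illustration and are not from the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class StateMatcher
{
    // Returns the first state that occurs in the line as a whole word or phrase,
    // or null when no state matches. Hypothetical helper for illustration.
    public static string FindState(string line, string[] states)
    {
        return states
            .Select(state => state.Trim())
            .Where(state => state.Length > 0) // skip blank lines from the states file
            .FirstOrDefault(state => Regex.IsMatch(
                line,
                @"\b" + Regex.Escape(state) + @"\b", // escape the name, then anchor it on word boundaries
                RegexOptions.IgnoreCase));
    }
}
```

In the loop from the question, `if (StateMatcher.FindState(s, states) != null)` would take the place of `if (states.Any(s.Contains))`.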

Related

How do I search and replace text containing placeholder tokens with a values from an xml file using regular expression matching. VB.net or C#

I have a problem that requires vb.net or C# solution with regular expressions matching.
I am not very good with regular expressions and so I thought I would ask for some help.
I have some text that has one or more tokens that I need to replace with values retrieved from an xml file. Tokens are similar but are of 2 different types. For matches of the first type I will replace with a value from file1.xml and for matches of the 2nd type from file2.xml.
The replaceable tokens are in this format:
Type 1 Tokens: &*T1& and &*T1001&
Type 2 Tokens: &*SomeValue& and &*A2ndValue&
The replacement values for the Type 1 tokens are in File1.xml and for Type 2 Tokens are in File2.xml
In the above example, when a match is found for Type 1 (T1000), I need to replace the entire token (&*T1000&) with the value of Element T1000 in File1.xml. <T1000>ValueT1000</T1000>
In the 2nd Type: When a match is found for Type 2 (SomeValue), I need to replace the entire token (&*SomeValue&) with the value of Element SomeValue in File2.xml. <SomeValue>Value2</SomeValue>
Example input text:
This is some text with first token &*T1& and the second token &*T1001& and more tokens &*SomeValue& and still more &*A2ndValue&.
So far with help of the code from pirs, in vb.net, I have this:
Public Shared Sub Main()
Dim pattern As String = "\&\*?([\w]+)\&"
Dim input As String = "This is some text with first token &*T1& and the second token &*T1001& and more tokens &*SomeValue& and still more &*A2ndValue&."
For Each m As Match In Regex.Matches(input, pattern)
Console.WriteLine("'{0}' found at index {1}.", m.Groups(1).Value, m.Index)
Next
End Sub
Which returns:
'T1' found at index 35.
'T1001' found at index 62.
'SomeValue' found at index 87.
'A2ndValue' found at index 115
I need to process this text and replace all the tokens with their values retrieved from the 2 xml files.
Any help is appreciated.
[EDIT]
With the answer from @pirs, maybe the way to do it is to first find matches of type T1000 and then replace by the regex index of each match. When replacing by index, I think I have to start at the last index, since each replacement will shift the indexes of the later matches.
After all T1000 matches are replaced, I think I can do another match on the output string from the above and then replace all the matches of the 2nd type.
What is the regex to match T1000 (T followed by any number of digits)?
[EDIT] Replace with an index so..
public static string ReplaceIndex(this string self, string OldString, string newString, int index)
{
return self.Remove(index, OldString.Length).Insert(index, newString);
}
// ...
// m.Index is where the whole token starts, so remove the whole match (m.Value)
s = s.ReplaceIndex(m.Value, "newString", m.Index);
// ...
[EDIT] Try to replace the value directly
// ...
s = s.Replace(m.Groups[1].Value, "newValue");
// ...
[EDIT] regex for &* and & : https://regex101.com/r/MVRS7U/1/
the generated regex function for c#
using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
string pattern = @"&\*?([^&\*\d]+)";
string input = @"&*cool&*it's&working&in&*all&case";
foreach (Match m in Regex.Matches(input, pattern))
{
Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
}
}
}
It should be OK now :-)
__
I'm not sure exactly what you want, but here is the regex for your case: https://regex101.com/r/5i3RII/1/
And here is the generated regex function for C# (you should write a custom function to fit your needs):
using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
string pattern = @"<[a-zA-Z-0-9]+\s?>([\w]+)<\/[a-zA-Z-0-9]+\s?>";
// the example you gave
string input = @"<T1>value1</T1>
<T1001>value2</T1001>
<T2000 />
<SomeValue>value1</SomeValue >
<A2ndValue>value2</A2ndValue >";
foreach (Match m in Regex.Matches(input, pattern))
{
// the output
Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
}
}
}
I understand what you want to do. The code below does everything:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Xml;
using System.Xml.Linq;
namespace ConsoleApplication1
{
class Program
{
const string FILENAME = #"c:\temp\text.xml";
static void Main(string[] args)
{
string input = "This is some text with first token &*T1& and the second token &*T1001& and more tokens &*SomeValue& and still more &*A2ndValue&.";
XDocument doc = XDocument.Load(FILENAME);
string patternToken = "&[^&]+&";
string patternTag = #"&\*(?'tag'[^&]+)&";
MatchCollection matches = Regex.Matches(input, patternToken);
foreach(Match match in matches.Cast<Match>())
{
string token = match.Value;
string tag = Regex.Match(token, patternTag).Groups["tag"].Value;
string tagValue = doc.Descendants(tag).Select(x => (string)x).FirstOrDefault();
input = input.Replace(token, tagValue);
}
}
}
}
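As an alternative to replacing by index, the two-type lookup can be sketched in a single pass with Regex.Replace and a match evaluator. This is only a sketch under assumptions: the two dictionaries stand in for values already loaded from File1.xml and File2.xml, the names TokenReplacer and ReplaceTokens are hypothetical, and a token with no matching entry is left unchanged. The pattern ^T\d+$ is one way to recognize a Type 1 name (T followed only by digits).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TokenReplacer
{
    // Matches a whole token such as &*T1001& and captures the name between &* and &.
    private static readonly Regex Token = new Regex(@"&\*(\w+)&");

    // type1Values and type2Values are assumed to hold the element values
    // read from File1.xml and File2.xml respectively.
    public static string ReplaceTokens(
        string input,
        IDictionary<string, string> type1Values,
        IDictionary<string, string> type2Values)
    {
        return Token.Replace(input, m =>
        {
            string name = m.Groups[1].Value;
            // Type 1 names are "T" followed only by digits; everything else is Type 2.
            bool isType1 = Regex.IsMatch(name, @"^T\d+$");
            IDictionary<string, string> source = isType1 ? type1Values : type2Values;
            string value;
            // Leave the token untouched when no replacement value is known.
            return source.TryGetValue(name, out value) ? value : m.Value;
        });
    }
}
```

Because Regex.Replace builds the result in one pass, there is no need to track how earlier replacements shift the indexes of later matches.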

Regex - How to capture an arbitrary string appearing anywhere in a known string?

I need help making a regular expression. I have a string that is known at compile time, let's call it SpecificString. I also have another string whose value is not known. Let's call it ArbitraryString for example purposes. The input string is made up of one SpecificString that contains ArbitraryString in it at any position or is adjacent to ArbitraryString. I want a regex pattern that captures ArbitraryString from the input string for me to use later.
Examples:
example format: input string => captured group's value
SpecificArbitraryStringString => ArbitraryString // inside
SpecHAHAHALOLificString => HAHAHALOL
SpecificStringYOLO => YOLO // adjacent
SpecificStrisadng => sad
itsABea8tifulDaySpecificString => itsABea8tifulDay // also adjacent
Show to be a heartbreakerpecificString => how to be a heartbreaker
SpecificSt this is the last example ring => this is the last example (in the output of the last example stackoverflow.com omitted the spaces at both ends for some reason, just ignore that and assume they are there)
I was only able to come up with a regex whose length grows linearly with the length of SpecificString making it very difficult to maintain. Any ideas?
Pseudocode (not necessarily valid C#):
static string GetArbitraryString(string input)
{
const string specificString = "SpecificString";
var regex = // regex pattern to find
var match = regex.Match(input);
string arbitraryString = match.CapturedGroups[0].Value;
return arbitraryString;
}
Only regex answers will be accepted.
edit: the new question: Does an elegant regex solution to this even exist?
Well, here's the best I've got in terms of a regex answer, using chained conditionals to ensure you only get the string you want (though it's still pretty damn inelegant in my opinion):
^(.*)?S(?(1)|(.*))?p(?(2)|(.*))?e(?(3)|(.*))?c(?(4)|(.*))?i(?(5)|(.*))?f(?(6)|(.*))?i(?(7)|(.*))?(?(8)|(.*))?c(?(9)|(.*))?S(?(10)|(.*))?t(?(11)|(.*))?r(?(12)|(.*))?i(?(13)|(.*))?n(?(14)|(.*))?g(?(15)|(.*))?$
Then, all you have to do is iterate over the capture groups and pick up the one that isn't empty. Simple as that.
And, since you're in C#, you can even use named capture groups with the same name for all of them. Whichever one gets picked up will be the value of the named capture.
Demo on Regex101
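If the real concern is maintaining a pattern whose length grows with SpecificString, one option is to generate it rather than write it by hand. The sketch below is not the conditional regex above: it builds one alternative per possible split position and gives every alternative the same named group, which .NET allows. The names ArbitraryStringFinder, BuildRegex and the group name arb are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ArbitraryStringFinder
{
    // Builds ^(?:(?<arb>.*)SpecificString|S(?<arb>.*)pecificString|...|SpecificString(?<arb>.*))$
    public static Regex BuildRegex(string specific)
    {
        var alternatives = Enumerable.Range(0, specific.Length + 1)
            .Select(i => Regex.Escape(specific.Substring(0, i))
                         + "(?<arb>.*)"
                         + Regex.Escape(specific.Substring(i)));
        return new Regex("^(?:" + string.Join("|", alternatives) + ")$",
                         RegexOptions.Singleline);
    }

    // Returns the inserted text, or null when the input does not have the expected shape.
    public static string GetArbitraryString(string input, string specific)
    {
        Match m = BuildRegex(specific).Match(input);
        return m.Success ? m.Groups["arb"].Value : null;
    }
}
```

The generated pattern is still linear in the length of SpecificString, but only the one-line constant has to be maintained.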
I would use a dictionary
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string[] inputs = {
"input string => captured group's value",
"SpecificArbitraryStringString => ArbitraryString // inside",
"SpecHAHAHALOLificString => HAHAHALOL",
"SpecificStringYOLO => YOLO // adjacent",
"SpecificStrisadng => sad",
"itsABea8tifulDaySpecificString => itsABea8tifulDay // also adjacent",
"Show to be a heartbreakerpecificString => how to be a heartbreaker",
"SpecificSt this is the last example ring => this is the last example"
};
Dictionary<string, string> dict = new Dictionary<string, string>();
string pattern = "^(?'name'[^=]+)=>(?'value'.*)";
foreach (string input in inputs)
{
Match match = Regex.Match(input, pattern);
dict.Add(match.Groups["name"].Value.Trim(), match.Groups["value"].Value.Trim());
}
}
}
}

Detecting whitespace in textbox

In a WinForms textbox with multiple whitespaces (e.g. 1 1 A), where, between the 1s, there is whitespace, how could I detect this via the string methods or regex?
Use IndexOf:
if( "1 1a".IndexOf(' ') >= 0 ) {
// there is a space.
}
This function should do the trick for you.
bool DoesContainsWhitespace()
{
return textbox1.Text.Contains(" ");
}
int NumberOfWhiteSpaceOccurrences(string textFromTextBox){
// C# uses ToCharArray() and Length (capitalized), unlike Java
char[] textHolder = textFromTextBox.ToCharArray();
int numberOfWhiteSpaceOccurrences = 0;
for(int index = 0; index < textHolder.Length; index++){
if(textHolder[index] == ' ')numberOfWhiteSpaceOccurrences++;
}
return numberOfWhiteSpaceOccurrences;
}
It's not entirely clear what the problem is, but if you just want a way to tell whether there is white space anywhere in a given string, a different solution from the ones proposed by the other users (which also work) is:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading.Tasks;
namespace Testing
{
class Program
{
static void Main(string[] args)
{
Console.WriteLine(PatternFound("1 1 a"));
Console.WriteLine(PatternFound("1 1 a"));
Console.WriteLine(PatternFound(" 1 1 a"));
}
static bool PatternFound(string str)
{
Regex regEx = new Regex("\\s");
Match match = regEx.Match(str);
return match.Success;
}
}
}
If what you want is to determine whether a given run of consecutive white spaces appears, you will need to add more to the regex pattern string.
Refer to http://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx for the options.
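For comparison, the same checks can be sketched with LINQ and char.IsWhiteSpace, which treats tabs and newlines as white space as well as the space character. The class and method names here are hypothetical, chosen only for this example.

```csharp
using System;
using System.Linq;

public static class WhitespaceCheck
{
    // True when the text contains at least one white-space character (space, tab, newline, ...).
    public static bool ContainsWhitespace(string text)
    {
        return text.Any(c => char.IsWhiteSpace(c));
    }

    // Number of white-space characters in the text.
    public static int CountWhitespace(string text)
    {
        return text.Count(c => char.IsWhiteSpace(c));
    }
}
```

With a WinForms text box this would be called as `WhitespaceCheck.ContainsWhitespace(textbox1.Text)`.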

How to extract the useful data with regular expression in C#?

Sorry guys, it seems like I didn't explain my question clearly. Please allow me to rephrase my question again.
I use WebClient to download the whole webpage and I got the content as a string
"
.......
.....
var picArr ="/d/manhua/naruto/516/1.png|/d/manhua/naruto/516/2.png|/d/manhua/naruto/516/3.png|/d/manhua/naruto/516/4.png|/d/manhua/naruto/516/5.png|/d/manhua/naruto/516/6.png|/d/manhua/naruto/516/7.png|/d/manhua/naruto/516/8.png|/d/manhua/naruto/516/9.png|/d/manhua/naruto/516/10.png|/d/manhua/naruto/516/11.png|/d/manhua/naruto/516/12.png|/d/manhua/naruto/516/13.png|/d/manhua/naruto/516/14.png|/d/manhua/naruto/516/15.png|/d/manhua/naruto/516/16.png"
......
";
In this content, I want to get only one line, which is
var picArr ="/d/manhua/naruto/516/1.png|/d/manhua/naruto/516/2.png|/d/manhua/naruto/516/3.png|/d/manhua/naruto/516/4.png|/d/manhua/naruto/516/5.png|/d/manhua/naruto/516/6.png|/d/manhua/naruto/516/7.png|/d/manhua/naruto/516/8.png|/d/manhua/naruto/516/9.png|/d/manhua/naruto/516/10.png|/d/manhua/naruto/516/11.png|/d/manhua/naruto/516/12.png|/d/manhua/naruto/516/13.png|/d/manhua/naruto/516/14.png|/d/manhua/naruto/516/15.png|/d/manhua/naruto/516/16.png"
Now I want to use a regular expression to find this line and get the value of picArr.
My regular expression is
var picArr ="([.]*)"
I think the dot means any characters. But it doesn't work. :(
Any idea?
Thanks a lot
/picArr =\"([^\"]+)\"/
If I got this right that's what you need.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace ExtractFileNames
{
class Program
{
static void Main(string[] args)
{
string pageData = @"blah blah
var picArr =""/d/manhua/naruto/516/1.png|/d/manhua/naruto/516/2.png|/d/manhua/naruto/516/3.png|/d/manhua/naruto/516/4.png|/d/manhua/naruto/516/5.png|/d/manhua/naruto/516/6.png|/d/manhua/naruto/516/7.png|/d/manhua/naruto/516/8.png|/d/manhua/naruto/516/9.png|/d/manhua/naruto/516/10.png|/d/manhua/naruto/516/11.png|/d/manhua/naruto/516/12.png|/d/manhua/naruto/516/13.png|/d/manhua/naruto/516/14.png|/d/manhua/naruto/516/15.png|/d/manhua/naruto/516/16.png""
more blah decimal blah";
var match = Regex.Match(pageData, @"var\s+picArr\s*=\s*""(.*?)""");
var str = match.Groups[1].Value;
var files = str.Split('|');
foreach(var f in files)
{
Console.WriteLine(f);
}
Console.ReadLine();
}
}
}
Output:
/d/manhua/naruto/516/1.png
/d/manhua/naruto/516/2.png
/d/manhua/naruto/516/3.png
/d/manhua/naruto/516/4.png
/d/manhua/naruto/516/5.png
/d/manhua/naruto/516/6.png
/d/manhua/naruto/516/7.png
/d/manhua/naruto/516/8.png
/d/manhua/naruto/516/9.png
/d/manhua/naruto/516/10.png
/d/manhua/naruto/516/11.png
/d/manhua/naruto/516/12.png
/d/manhua/naruto/516/13.png
/d/manhua/naruto/516/14.png
/d/manhua/naruto/516/15.png
/d/manhua/naruto/516/16.png
If you just want to get the filenames, you could just do a split on the pipe:
var picArr = "/d/manhua/naruto/516/1.png|/d/manhua/naruto/516/2.png|/d/manhua/naruto/516/3.png|/d/manhua/naruto/516/4.png|/d/manhua/naruto/516/5.png|/d/manhua/naruto/516/6.png|/d/manhua/naruto/516/7.png|/d/manhua/naruto/516/8.png|/d/manhua/naruto/516/9.png|/d/manhua/naruto/516/10.png|/d/manhua/naruto/516/11.png|/d/manhua/naruto/516/12.png|/d/manhua/naruto/516/13.png|/d/manhua/naruto/516/14.png|/d/manhua/naruto/516/15.png|/d/manhua/naruto/516/16.png";
var splitPics = picArr.Split('|');
foreach (var pic in splitPics)
{
Console.WriteLine(pic);
}
It looks like you want the value of the string literal in your snippet, "/d/manhua/naruto/516/1.png|..."
Get rid of the square brackets. "." matches any character just as it is, without brackets. Square brackets are for matching a limited set of characters: For example, you'd use "[abc]" to match any "a", "b", or "c".
It looks like the brackets have the effect of escaping the ".", a feature I hadn't known about (or forgot, sometime in the Ordovician). But I tested the regex as you have it with the string value replaced with a series of dots, and the regex matched. It's being treated as a literal "." character, which you would more likely try to match with a backslash escape: "\."
So just get rid of the brackets and it should work. It works in VS2008 for me.
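The difference between "[.]" and "." can be checked with a small sketch. The helper name Capture and the shortened sample input are assumptions made for this example, not part of the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DotDemo
{
    // Returns the first capture group, or null when the pattern does not match.
    public static string Capture(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

With the input `var picArr ="/a/1.png|/a/2.png"`, the pattern `var picArr ="([.]*)"` gives null because "[.]" only matches literal dots, while `var picArr ="(.*)"` captures `/a/1.png|/a/2.png`. If the quoted value consisted only of dots, the bracketed version would match it.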

C# string splitting

If I have the string str1|str2|str3|str4 and parse it with | as a delimiter, my output is str1 str2 str3 str4.
But if I have the string str1||str3|str4, the output is str1 str3 str4. What I want the output to be is str1 null/blank str3 str4.
I hope this makes sense.
string createText = "str1||str3|str4";
string[] txt = createText.Split(new[] { '|', ',' },
StringSplitOptions.RemoveEmptyEntries);
if (File.Exists(path))
{
//Console.WriteLine("{0} already exists.", path);
File.Delete(path);
// write to file.
using (StreamWriter sw = new StreamWriter(path, true, Encoding.Unicode))
{
sw.WriteLine("str1:{0}",txt[0]);
sw.WriteLine("str2:{0}",txt[1]);
sw.WriteLine("str3:{0}",txt[2]);
sw.WriteLine("str4:{0}",txt[3]);
}
}
Output
str1:str1
str2:str3
str3:str4
str4:"blank"
That's not what I'm looking for. This is what I would like the output to be:
str1:str1
str2:"blank"
str3:str3
str4:str4
Try this one:
str.Split('|')
Without StringSplitOptions.RemoveEmptyEntries passed, it'll work as you want.
This should do the trick:
string s = "str1||str3|str4";
string[] parts = s.Split('|');
The simplest way is to use Quantification:
using System.Text.RegularExpressions;
...
String[] parts = new Regex("[|]+").Split("str1|str2|str3|str4");
The "+" treats a run of consecutive | characters as a single delimiter, so empty entries are dropped.
From Wikipedia :
"+" The plus sign indicates that there is one or more of the preceding element. For example, ab+c matches "abc", "abbc", "abbbc", and so on, but not "ac".
From MSDN: The Regex.Split methods are similar to the String.Split method, except Split splits the string at a delimiter determined by a regular expression instead of a set of characters. The input string is split as many times as possible. If pattern is not found in the input string, the return value contains one element whose value is the original input string.
The "blank" placeholder you asked for can be produced with:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace ConsoleApplication1 {
class Program{
static void Main(string[] args){
String[] parts = "str1||str2|str3".Replace(@"||", "|\"blank\"|").Split('|');
foreach (string s in parts)
Console.WriteLine(s);
}
}
}
Try something like this:
string result = "str1||str3|str4";
List<string> parsedResult = result.Split('|').Select(x => string.IsNullOrEmpty(x) ? "null" : x).ToList();
When using Split(), the string for a missing entry will be empty (not null). In this example I test for that and replace it with the actual word null, so you can see how to substitute another value.
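The effect of StringSplitOptions can be seen side by side in a short sketch. The helper name SplitPipes and its keepEmpty parameter are made up for this example.

```csharp
using System;

public static class PipeSplitter
{
    // Splits on '|' and either keeps or drops the empty entries between adjacent delimiters.
    public static string[] SplitPipes(string input, bool keepEmpty)
    {
        StringSplitOptions options = keepEmpty
            ? StringSplitOptions.None                // "str1||str3|str4" gives 4 parts, the second one empty
            : StringSplitOptions.RemoveEmptyEntries; // the same input gives 3 parts
        return input.Split(new[] { '|' }, options);
    }
}
```

With keepEmpty set to true, txt[1] in the question's code would be an empty string and txt[3] would hold str4, so the four WriteLine calls line up with the four fields.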
