C# string splitting

If I have the string str1|str2|str3|str4 and parse it with | as the delimiter, my output is str1 str2 str3 str4.
But if I have the string str1||str3|str4, the output is str1 str3 str4. What I want instead is str1, null/blank, str3, str4.
I hope this makes sense.
string createText = "str1||str3|str4";
string[] txt = createText.Split(new[] { '|', ',' },
StringSplitOptions.RemoveEmptyEntries);
if (File.Exists(path))
{
//Console.WriteLine("{0} already exists.", path);
File.Delete(path);
// write to file.
using (StreamWriter sw = new StreamWriter(path, true, Encoding.Unicode))
{
sw.WriteLine("str1:{0}",txt[0]);
sw.WriteLine("str2:{0}",txt[1]);
sw.WriteLine("str3:{0}",txt[2]);
sw.WriteLine("str4:{0}",txt[3]);
}
}
Output
str1:str1
str2:str3
str3:str4
str4:"blank"
That's not what I'm looking for. This is the output I would like:
str1:str1
str2:"blank"
str3:str3
str4:str4

Try this one:
str.Split('|')
Without StringSplitOptions.RemoveEmptyEntries, Split keeps the empty string between the two adjacent pipes, so it works the way you want.

This should do the trick:
string s = "str1||str3|str4";
string[] parts = s.Split('|');
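To produce exactly the output asked for in the question, a minimal sketch: split without RemoveEmptyEntries, then substitute a placeholder for each empty field. The class name FieldFormatter, the "blank" placeholder and the strN: labels are assumptions taken from the question's desired output, not an existing API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: formats each '|'-separated field as "strN:value",
// using a placeholder when the field is empty.
public static class FieldFormatter
{
    public static List<string> Format(string input, string placeholder = "\"blank\"")
    {
        // No StringSplitOptions.RemoveEmptyEntries, so "a||b" yields "a", "", "b".
        string[] fields = input.Split('|');
        var lines = new List<string>(fields.Length);
        for (int i = 0; i < fields.Length; i++)
        {
            string value = fields[i].Length == 0 ? placeholder : fields[i];
            lines.Add(string.Format("str{0}:{1}", i + 1, value));
        }
        return lines;
    }

    public static void Main()
    {
        // Prints str1:str1, str2:"blank", str3:str3, str4:str4 on separate lines.
        foreach (string line in Format("str1||str3|str4"))
        {
            Console.WriteLine(line);
        }
    }
}
```

The resulting list could then be written in one call, for example File.WriteAllLines(path, lines, Encoding.Unicode), instead of indexing txt[0] to txt[3] by hand.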

The simplest way is to use a quantifier:
using System.Text.RegularExpressions;
...
string[] parts = new Regex("[|]+").Split("str1|str2|str3|str4");
The "+" treats a run of consecutive pipes as a single delimiter, so it gets rid of the empty entries.
From Wikipedia:
"+" The plus sign indicates that there is one or more of the preceding element. For example, ab+c matches "abc", "abbc", "abbbc", and so on, but not "ac".
From MSDN: The Regex.Split methods are similar to the String.Split method, except Split splits the string at a delimiter determined by a regular expression instead of a set of characters. The input string is split as many times as possible. If pattern is not found in the input string, the return value contains one element whose value is the original input string.
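Note that collapsing runs of pipes is the opposite of what the question asks for, since it removes the empty field. A small sketch comparing the quantified pattern with a single-pipe pattern on the question's input; the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class comparing the two patterns.
public static class QuantifierDemo
{
    // "[|]+": a run of consecutive pipes is ONE delimiter, so the empty field vanishes.
    public static string[] SplitCollapsing(string input)
    {
        return Regex.Split(input, "[|]+");
    }

    // "[|]": every single pipe is a delimiter, so the empty field is kept as "".
    public static string[] SplitKeepingEmpty(string input)
    {
        return Regex.Split(input, "[|]");
    }

    public static void Main()
    {
        const string input = "str1||str3|str4";
        Console.WriteLine(string.Join(",", SplitCollapsing(input)));   // str1,str3,str4
        Console.WriteLine(string.Join(",", SplitKeepingEmpty(input))); // str1,,str3,str4
    }
}
```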
The additional requirement (showing "blank" for an empty field) can be handled with:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace ConsoleApplication1 {
class Program{
static void Main(string[] args){
// Replace is non-overlapping, so "|||" would yield one "blank" plus one empty entry.
String[] parts = "str1||str2|str3".Replace(@"||", "|\"blank\"|").Split('|');
foreach (string s in parts)
Console.WriteLine(s);
}
}
}

Try something like this:
string result = "str1||str3|str4";
// Select and ToList need "using System.Linq;"
List<string> parsedResult = result.Split('|').Select(x => string.IsNullOrEmpty(x) ? "null" : x).ToList();
When you use Split(), an empty field becomes an empty string in the array (not null). In this example I test for it and replace it with the literal word "null", so you can see how to substitute another value.

Related

Find exact substring in string array using LINQ in C#

I'm trying to see if an exact substring exists in a string array. It returns true whenever the substring appears anywhere in the string, even when the word in the string is misspelled.
EDIT:
For example, if I check whether 'Connecticut' exists in the string array but it is spelled 'Connecticute', it still returns true, which I don't want. I want it to return false for 'Connecticute' and true for 'Connecticut' only.
Is there a way to do this using LINQ?
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.IO;
namespace ConsoleApplication2
{
class Program
{
static void Main(string[] args)
{
string[] sample = File.ReadAllLines(@"C:\samplefile.txt");
/* Sample file containing data organised like
Niall Gleeson 123 Fake Street UNIT 63 Connecticute 00703 USA
*/
string[] states = File.ReadAllLines(@"C:\states.txt"); //Text file containing list of all US states
foreach (string s in sample)
{
if (states.Any(s.Contains))
{
Console.WriteLine("Found State");
Console.WriteLine(s);
Console.ReadLine();
}
else
{
Console.WriteLine("Could not find State");
Console.WriteLine(s);
Console.ReadLine();
}
}
}
}
}
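String.Contains returns true if the substring appears anywhere within the string being searched.
Hence "Connecticute".Contains("Connecticut") is true.
If you want exact matches, what you're looking for is String.Equals:
...
if (states.Any(s.Equals))
...
Note that this only works when each line contains nothing but the state name; for full address lines like the sample, you need a whole-word match instead.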
You could use \b to match a word boundary (the position between a word character and a non-word character such as a space or period, or the start or end of the string):
var r = new Regex(@"\bConnecticut\b", RegexOptions.IgnoreCase);
var m = r.Match("Connecticute");
Console.WriteLine(m.Success); // false
Rather than using string.Contains, which matches whether the string contains the sequence of letters, use a regular expression match with whatever pattern you consider appropriate. For example, this will match on word boundaries:
var x = new [] { "Connect", "Connecticute is a cute place", "Connecticut", "Connecticut is a nice place" };
x.Dump();
var p = new Regex(@"\bConnecticut\b", RegexOptions.Compiled);
x.Where(s=>p.IsMatch(s)).Dump();
This will match "Connecticut" and "Connecticut is a nice place" but not the other strings. Change the regex to suit your exact requirements.
(.Dump() is a LINQPad extension method; LINQPad is handy for experimenting with this sort of thing.)
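Applied to the original program, a minimal sketch of the word-boundary approach: check every state name against each address line with \b on both sides. The states array is hardcoded here as an assumption instead of being read from C:\states.txt, and StateMatcher/ContainsState are hypothetical names. Regex.Escape is used so a state name is always treated literally; the match is case-sensitive unless RegexOptions.IgnoreCase is added.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: true if the line contains any state name as a whole word,
// so "Connecticute" does not count as "Connecticut".
public static class StateMatcher
{
    public static bool ContainsState(string line, string[] states)
    {
        return states.Any(state =>
            // \b on both sides requires a word boundary around the whole name;
            // Regex.Escape keeps any special characters in the name literal.
            Regex.IsMatch(line, @"\b" + Regex.Escape(state) + @"\b"));
    }

    public static void Main()
    {
        string[] states = { "Connecticut", "New York" };
        Console.WriteLine(ContainsState("Niall Gleeson 123 Fake Street UNIT 63 Connecticute 00703 USA", states)); // False
        Console.WriteLine(ContainsState("Niall Gleeson 123 Fake Street UNIT 63 Connecticut 00703 USA", states));  // True
    }
}
```

Unlike comparing individual words with Equals, the regex also handles multi-word names such as "New York".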

Get a part of a string using regex

I need to get //table[@data-account='test'] from the string //table[@data-account='test']//span[contains(.,'FB')] using regex.
I am new to regex and not able to use the existing samples for my purpose.
Thanks
You don't need regex for that. You can use the String.Split method, which:
Returns a string array that contains the substrings in this string
that are delimited by elements of a specified string array.
string s = @"//table[@data-account='test']//span[contains(.,'FB')]";
string[] stringarray = s.Split(new string[1] {@"//"}, StringSplitOptions.RemoveEmptyEntries);
Console.WriteLine("//" + stringarray[0]);
Output:
//table[@data-account='test']
Alternatively, with a regex that captures everything up to and including the first ]:
using System;
using System.Text.RegularExpressions;
class P
{
static void Main()
{
Console.WriteLine(
Regex.Match("//table[@data-account='test']//span[contains(.,'FB')]", "^([^]]+])").Groups[1].Value);
}
}
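A third option, not taken from either answer: find the second "//" with IndexOf and take everything before it, which avoids splitting the whole string. This is a sketch under the assumption that the first step itself contains no "//" (for example inside a quoted predicate value); the Split approach has the same limitation. XPathSteps and FirstStep are hypothetical names.

```csharp
using System;

// Hypothetical helper: returns the text before the second "//".
public static class XPathSteps
{
    public static string FirstStep(string xpath)
    {
        if (xpath.Length < 2)
        {
            return xpath;
        }
        // Start searching at index 2 so the leading "//" itself is skipped.
        int next = xpath.IndexOf("//", 2, StringComparison.Ordinal);
        return next < 0 ? xpath : xpath.Substring(0, next);
    }

    public static void Main()
    {
        // Prints //table[@data-account='test']
        Console.WriteLine(FirstStep("//table[@data-account='test']//span[contains(.,'FB')]"));
    }
}
```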

matching and replacing text in a string while keeping non replaced text

I know how to use Regex.Split() and Regex.Replace(); but not how to keep certain data when replacing.
If I had the following lines of text in a String[] (split after every ;):
"
using system;
using system.blab;
using system.blab.blabity;
"
how would I loop through them and replace every 'using' with '' while matching the whole line, e.g. 'using (.+;)',
and end up with the following (but not just with Regex.Replace("using", ""))?
"
<using> system;
<using> system.blab;
<using> system.blab.blabity;
"
If str is your current string, then:
string str = @"using system;
using system.blab;
using system.blab.blabity;";
str = str.Replace("using ", "<using> ");
Using parentheses in a regex instructs the engine to store the matched value as a group. Then, when you call Replace, you can reference groups with $n, where n is the number of the group. I haven't tested this, but something like this:
Regex.Replace(input, @"^using( .+;)$", "$1");
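The same numbered-group idea can be applied to the whole multi-line string at once, producing the <using> output from the question. A sketch, assuming the text is one string rather than an array: RegexOptions.Multiline makes ^ and $ match at every line, and a lookahead (?=\r?$) lets the pattern work with both \n and \r\n line endings without consuming the \r. UsingTagger is a hypothetical name.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UsingTagger
{
    // Wraps the leading "using" of each using-directive line in angle brackets,
    // keeping the rest of the line via group 1 ($1).
    public static string Tag(string source)
    {
        // Multiline: ^ and $ match at each line start/end, not just the whole string.
        // The lookahead checks for an optional \r before the line end without removing it.
        return Regex.Replace(source, @"^using( .+;)(?=\r?$)", "<using>$1", RegexOptions.Multiline);
    }

    public static void Main()
    {
        Console.WriteLine(Tag("using system;\nusing system.blab;\nusing system.blab.blabity;"));
    }
}
```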
This should get you pretty close. You should use a named group for every logical item you're trying to match. In this instance, you're trying to match everything that is not the string "using". You can then use the notation ${yourGroupName} to reference the match in the replacement string. I wrote a tool called RegexPixie that will show you live matching of your content as you type so you can see what works and what doesn't work.
//the named group has the name "everythingElse"
var regex = new Regex(@"using(?<everythingElse>[^\r\n]+)");
var content = new string [] { /* ... */ };
// loop over the array length (the original compared i with content[i])
for(int i = 0; i < content.Length; i++)
{
content[i] = regex.Replace(content[i], "${everythingElse}");
}
This combines two of the answers. It wraps the word-boundary anchor \b around using to perform a whole-word search, and captures the word in group 1, which the replacement references as $1:
string str = @"using system;
using system.blab;
using system.blab.blabity;";
str = Regex.Replace(str, @"\b(using)\b", "<$1>");

Regex.Split adding empty strings to result array

I have a regex to split out words, operators and brackets in simple logic statements (e.g. "WORD1 & WORD2 | (WORd_3 & !word_4 )"). The regex I've come up with is "(?[A-Za-z0-9_]+)|(?[&!\|()]{1})". Here is a quick test program.
using System;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
Console.WriteLine("* Test Project *");
string testExpression = "!(LIONV6 | NOT_superCHARGED) &RHD";
string removedSpaces = testExpression.Replace(" ", "");
string[] expectedResults = new string[] { "!", "(", "LIONV6", "|", "NOT_superCHARGED", ")", "&", "RHD" };
string[] splits = Regex.Split(removedSpaces, @"(?[A-Za-z0-9_]+)|(?[&!\|()]{1})");
Console.WriteLine("Expected\n{0}\nActual\n{1}", expectedResults.AllElements(), splits.AllElements());
Console.WriteLine("*** Any Key to finish ***");
Console.ReadKey();
}
}
public static class Extensions
{
public static string AllElements(this string[] str)
{
string output = "";
if (str != null)
{
foreach (string item in str)
{
output += "'" + item + "',";
}
}
return output;
}
}
The regex does the required job of splitting out words and operators into an array in the right sequence, but the result array contains many empty elements, and I can't work out why. It's not a serious problem, as I just ignore empty elements when consuming the array, but I'd like the regex to do all the work if possible, including ignoring spaces.
Try this (Where and ToArray need using System.Linq):
string[] splits = Regex.Split(removedSpaces, @"(?[A-Za-z0-9_]+)|(?[&!\|()]{1})").Where(x => x != String.Empty).ToArray();
The empty strings are just because of the way Split works. From the documentation:
If multiple matches are adjacent to one another, an empty string is inserted into the array.
By default, Split treats your matches as delimiters, so what it returns is the text between them, and between two adjacent matches that text is an empty string. (As a comparison, imagine splitting ",,,," on ",": you'd expect all the empty gaps.)
Also from that help page though is:
If capturing parentheses are used in a Regex.Split expression, any
captured text is included in the resulting string array.
This is the reason you get what you actually want in the array at all: it now contains the text between the delimiters (all the empty strings) together with the captured delimiters.
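Both documented behaviours can be seen on tiny inputs. In this sketch (SplitDemo is a hypothetical name), ",,,," split on ',' gives five empty strings, and a Regex.Split whose pattern has a capturing group returns each gap followed by the captured delimiter, so two adjacent one-character matches on "!(" produce alternating empty strings and operators.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // String.Split keeps every gap between adjacent delimiters.
    public static string[] PlainGaps()
    {
        return ",,,,".Split(',');
    }

    // With a capturing group, Regex.Split returns the gaps AND the captured delimiters.
    public static string[] CapturedGaps()
    {
        return Regex.Split("!(", "([!(])");
    }

    public static void Main()
    {
        Console.WriteLine(PlainGaps().Length); // 5
        foreach (string part in CapturedGaps())
        {
            Console.Write("'{0}',", part); // '','!','','(','',
        }
        Console.WriteLine();
    }
}
```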
What you are doing may well be better done by matching the regular expression (with Regex.Matches), since what your regular expression describes is actually what you want to match.
Something like this (using some linq to convert to a string array):
Regex.Matches(testExpression, #"([A-Za-z0-9_]+)|([&!\|()]{1})")
.Cast<Match>()
.Select(x=>x.Value)
.ToArray();
Note that because this is taking positive matches it doesn't need the spaces to be removed first.
var matches = Regex.Matches(removedSpaces, #"(\w+|[&!|()])");
foreach (var match in matches)
Console.Write("'{0}', ", match); // '!', '(', 'LIONV6', '|', 'NOT_superCHARGED', ')', '&', 'RHD',
Actually, you don't need to delete spaces before extracting your identifiers and operators; the regex I proposed ignores them anyway.
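Putting the matching approach together, a self-contained tokenizer sketch (LogicTokenizer and Tokenize are hypothetical names): one alternative for identifiers (\w covers letters, digits and underscore) and one for a single operator or bracket. Whitespace matches neither alternative, so it is skipped without a separate Replace. A design caveat: any other character, such as '#', is silently skipped too, so a stricter version might check that the matches cover all non-space input.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LogicTokenizer
{
    // Identifiers or a single operator/bracket; '|' is literal inside [...].
    private static readonly Regex Token = new Regex(@"\w+|[&!|()]");

    public static string[] Tokenize(string expression)
    {
        return Token.Matches(expression)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        // Prints: ! ( LIONV6 | NOT_superCHARGED ) & RHD
        Console.WriteLine(string.Join(" ", Tokenize("!(LIONV6 | NOT_superCHARGED) &RHD")));
    }
}
```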

How to extract the useful data with regular expression in C#?

Sorry guys, it seems like I didn't explain my question clearly. Please allow me to rephrase my question again.
I use WebClient to download the whole webpage, and I get the content as a string:
"
.......
.....
var picArr ="/d/manhua/naruto/516/1.png|/d/manhua/naruto/516/2.png|/d/manhua/naruto/516/3.png|/d/manhua/naruto/516/4.png|/d/manhua/naruto/516/5.png|/d/manhua/naruto/516/6.png|/d/manhua/naruto/516/7.png|/d/manhua/naruto/516/8.png|/d/manhua/naruto/516/9.png|/d/manhua/naruto/516/10.png|/d/manhua/naruto/516/11.png|/d/manhua/naruto/516/12.png|/d/manhua/naruto/516/13.png|/d/manhua/naruto/516/14.png|/d/manhua/naruto/516/15.png|/d/manhua/naruto/516/16.png"
......
";
In this content, I want to get only one line, which is:
var picArr ="/d/manhua/naruto/516/1.png|/d/manhua/naruto/516/2.png|/d/manhua/naruto/516/3.png|/d/manhua/naruto/516/4.png|/d/manhua/naruto/516/5.png|/d/manhua/naruto/516/6.png|/d/manhua/naruto/516/7.png|/d/manhua/naruto/516/8.png|/d/manhua/naruto/516/9.png|/d/manhua/naruto/516/10.png|/d/manhua/naruto/516/11.png|/d/manhua/naruto/516/12.png|/d/manhua/naruto/516/13.png|/d/manhua/naruto/516/14.png|/d/manhua/naruto/516/15.png|/d/manhua/naruto/516/16.png"
Now I want to use a regular expression to find this string and get the value of picArr.
My regex is
var picArr ="([.]*)"
I think the dot means any character, but it doesn't work. :(
Any idea?
Thanks a lot
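/picArr =\"([^\"]+)\"/
If I understood correctly, that's what you need. The surrounding slashes are regex-literal delimiters from other languages; in C# drop them and write the pattern as a string, e.g. "picArr =\"([^\"]+)\"".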
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace ExtractFileNames
{
class Program
{
static void Main(string[] args)
{
string pageData = @"blah blah
var picArr =""/d/manhua/naruto/516/1.png|/d/manhua/naruto/516/2.png|/d/manhua/naruto/516/3.png|/d/manhua/naruto/516/4.png|/d/manhua/naruto/516/5.png|/d/manhua/naruto/516/6.png|/d/manhua/naruto/516/7.png|/d/manhua/naruto/516/8.png|/d/manhua/naruto/516/9.png|/d/manhua/naruto/516/10.png|/d/manhua/naruto/516/11.png|/d/manhua/naruto/516/12.png|/d/manhua/naruto/516/13.png|/d/manhua/naruto/516/14.png|/d/manhua/naruto/516/15.png|/d/manhua/naruto/516/16.png""
more blah decimal blah";
var match = Regex.Match(pageData, @"var\s+picArr\s*=\s*""(.*?)""");
var str = match.Groups[1].Value;
var files = str.Split('|');
foreach(var f in files)
{
Console.WriteLine(f);
}
Console.ReadLine();
}
}
}
Output:
/d/manhua/naruto/516/1.png
/d/manhua/naruto/516/2.png
/d/manhua/naruto/516/3.png
/d/manhua/naruto/516/4.png
/d/manhua/naruto/516/5.png
/d/manhua/naruto/516/6.png
/d/manhua/naruto/516/7.png
/d/manhua/naruto/516/8.png
/d/manhua/naruto/516/9.png
/d/manhua/naruto/516/10.png
/d/manhua/naruto/516/11.png
/d/manhua/naruto/516/12.png
/d/manhua/naruto/516/13.png
/d/manhua/naruto/516/14.png
/d/manhua/naruto/516/15.png
/d/manhua/naruto/516/16.png
If you just want to get the filenames, you could just do a split on the pipe:
var picArr = "/d/manhua/naruto/516/1.png|/d/manhua/naruto/516/2.png|/d/manhua/naruto/516/3.png|/d/manhua/naruto/516/4.png|/d/manhua/naruto/516/5.png|/d/manhua/naruto/516/6.png|/d/manhua/naruto/516/7.png|/d/manhua/naruto/516/8.png|/d/manhua/naruto/516/9.png|/d/manhua/naruto/516/10.png|/d/manhua/naruto/516/11.png|/d/manhua/naruto/516/12.png|/d/manhua/naruto/516/13.png|/d/manhua/naruto/516/14.png|/d/manhua/naruto/516/15.png|/d/manhua/naruto/516/16.png";
var splitPics = picArr.Split('|');
foreach (var pic in splitPics)
{
Console.WriteLine(pic);
}
It looks like you want the value of the string literal in your snippet, "/d/manhua/naruto/516/1.png|..."
Get rid of the square brackets. "." matches any character just as it is, without brackets. Square brackets are for matching a limited set of characters: For example, you'd use "[abc]" to match any "a", "b", or "c".
It looks like the brackets have the effect of escaping the ".", a feature I hadn't known about (or forgot, sometime in the Ordovician). I tested the regex as you have it with the string value replaced with a series of dots, and it matched: inside a character class, "." is treated as a literal period, which you would more commonly match with a backslash escape: "\."
So just get rid of the brackets and it should work. It works in VS2008 for me.
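A small sketch of the difference, using a shortened, made-up picArr line in place of the downloaded page (PicArrDemo and its members are hypothetical names). "[.]*" can only match periods, so the question's pattern fails at the first '/'; a bare lazy ".*?" matches any characters up to the first closing quote and leaves the value in group 1.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PicArrDemo
{
    // Made-up, shortened stand-in for the line in the downloaded page.
    private const string Line = "var picArr =\"/d/1.png|/d/2.png\"";

    // "[.]" is a character class containing only '.', so "[.]*" matches periods only.
    public static bool BracketPatternMatches()
    {
        return Regex.IsMatch(Line, "var picArr =\"([.]*)\"");
    }

    // A bare '.' matches any character except '\n'; the lazy "*?" stops at the first quote.
    public static string ExtractValue()
    {
        return Regex.Match(Line, "var picArr =\"(.*?)\"").Groups[1].Value;
    }

    public static void Main()
    {
        Console.WriteLine(BracketPatternMatches()); // False
        Console.WriteLine(ExtractValue());          // /d/1.png|/d/2.png
    }
}
```

The lazy quantifier matters when more quoted strings follow on the same line; a greedy ".*" would run to the last quote instead.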
