Need to split string with regex - c#

I need to split a string in C# .NET.
The output I am getting is: i:0#.f|membership|sdp950452@abctechnologies.com or i:0#.f|membership|tss954652@abctechnologies.com
I need to remove i:0#.f|membership| and @abctechnologies.com from the string. The output I need is sdp950452 or tss954652.
I am also getting another string, "Pawar, Jaywardhan", and I need it to be "jaywardhan pawar".
Thanks,
Jay

Here is a code example showing how you can do the first part with Regex and the second with Split and Replace:
using System;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
    public class Program
    {
        public static void Main()
        {
            // First part: match "|<id>@" and strip the delimiters
            string first = "i:0#.f|membership|sdp950452@abctechnologies.com";
            string second = "i:0#.f|membership|tss954652@abctechnologies.com";
            string pattern = @"\|[A-Za-z0-9]+@";
            Regex reg = new Regex(pattern);
            Match m1 = reg.Match(first);
            Match m2 = reg.Match(second);
            string result1 = m1.Value.Replace("|", string.Empty).Replace("@", string.Empty);
            string result2 = m2.Value.Replace("|", string.Empty).Replace("@", string.Empty);
            Console.WriteLine(result1);
            Console.WriteLine(result2);
            // Second part: lower-case, split on the space, swap the two halves
            string inputString = "Pawar, Jaywardhan";
            string a = inputString.ToLower();
            var b = a.Split(' ');
            var result3 = b[1] + " " + b[0].Replace(",", string.Empty);
            Console.WriteLine(result3);
        }
    }
}
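As a variation on the regex idea, a capture group can return the ID directly, which avoids the two Replace calls. This is only a minimal sketch, assuming the login has the form i:0#.f|membership|user@domain; the names ClaimParser and ExtractUserId are made up for illustration and are not part of the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ClaimParser
{
    // Hypothetical helper: returns the text between '|' and '@', or null if there is no match.
    public static string ExtractUserId(string claim)
    {
        // The parentheses capture only the ID; '|' and '@' are matched but not captured.
        Match m = Regex.Match(claim, @"\|([A-Za-z0-9]+)@");
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(ExtractUserId("i:0#.f|membership|sdp950452@abctechnologies.com"));
        Console.WriteLine(ExtractUserId("i:0#.f|membership|tss954652@abctechnologies.com"));
    }
}
```

Returning null on a failed match lets the caller decide how to handle input that does not follow the assumed format.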

Using LINQ to reduce the number of lines:
using System;
using System.Linq;
public class Program
{
    public static void Main()
    {
        // Extract the user ID: take the '|' segment that holds the '@', then keep what precedes it
        string a = "i:0#.f|membership|sdp950452@abctechnologies.com";
        string s = a.Split('|').First(part => part.Contains("@")).Split('@').First();
        Console.WriteLine(s);
        // Format the name
        string name = "Pawar, Jaywardhan";
        string formatted = String.Join(" ", name.Split(',').Reverse()).ToLower().Trim();
        Console.WriteLine(formatted);
    }
}

Related

String between two strings that exist more than once C#

I have an XML file as a string.
I want to extract the value from this string that is located between two given strings (between two tags).
These two strings (tags) can occur more than once.
My string is:
public string text = "<?xml version=\"1.0\" encoding=\"utf-8\"?> <Userlist> <User1 userid=\"123\" agreement=\"true\"> <firstname>Daniel</firstname> <lastname>Brown</lastname> </User1> <User2 userid=\"124\" agreement=\"false\"> <firstname>Charlie</firstname> <lastname>Walsh</lastname> </User2> </Userlist>";
For example, I would like to get all strings in the string above that are between <firstname> and </firstname>.
Thanks a lot.
You should use an XML library to parse XML, not string methods. To get unique items, use GroupBy and then take the first item of each group. GroupBy produces a two-dimensional structure (effectively a List<List<User>>) with one group per unique key, and First then takes one item from each key. See the code below, which uses LINQ to XML:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Xml;
using System.Xml.Linq;
namespace ConsoleApplication1
{
    class Program
    {
        const string FILENAME = @"c:\temp\test.xml";
        static void Main(string[] args)
        {
            string text = File.ReadAllText(FILENAME);
            XDocument doc = XDocument.Parse(text);
            // One User per child element of the root (<User1>, <User2>, ...)
            List<User> users = doc.Root.Elements().Select(x => new User
            {
                id = (string)x.Attribute("userid"),
                agreement = (Boolean)x.Attribute("agreement"),
                firstname = (string)x.Element("firstname"),
                lastname = (string)x.Element("lastname")
            }).ToList();
            // Keep the first user for each distinct id
            List<User> distinct = users.GroupBy(x => x.id)
                .Select(x => x.First())
                .ToList();
        }
    }
    public class User
    {
        public string id { get; set; }
        public Boolean agreement { get; set; }
        public string firstname { get; set; }
        public string lastname { get; set; }
    }
}
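If only the first names are needed, as the question asks, LINQ to XML can select the <firstname> elements directly. The following is a minimal sketch rather than a definitive implementation; FirstNameReader and GetFirstNames are hypothetical names, and the XML is a shortened copy of the string from the question with the declaration left out for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class FirstNameReader
{
    // Hypothetical helper: returns the text of every <firstname> element, in document order.
    public static List<string> GetFirstNames(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Descendants("firstname").Select(e => e.Value).ToList();
    }

    public static void Main()
    {
        string xml =
            "<Userlist>" +
            "<User1 userid=\"123\" agreement=\"true\"><firstname>Daniel</firstname><lastname>Brown</lastname></User1>" +
            "<User2 userid=\"124\" agreement=\"false\"><firstname>Charlie</firstname><lastname>Walsh</lastname></User2>" +
            "</Userlist>";
        foreach (string name in GetFirstNames(xml))
        {
            Console.WriteLine(name);
        }
    }
}
```

Descendants searches at any depth, so it finds the elements regardless of whether the parent is called User1 or User2.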
If you are looking for repeated words, replace characters like < / " > with a space character. That way you end up with words separated by spaces. Then split the text into an array and use a Dictionary to count each occurrence, with something like the code below:
text = text.Replace('<', ' ');
text = text.Replace('>', ' ');
text = text.Replace('\"', ' ');
text = text.Replace('?', ' ');
text = text.Replace('=', ' ');
text = text.Replace('/', ' ');
var textAr = text.Split(' ');
var textDict = new Dictionary<string, int>();
foreach (var word in textAr)
{
    if (textDict.ContainsKey(word))
    {
        textDict[word]++;
    }
    else
    {
        textDict.Add(word, 1);
    }
}
Console.WriteLine("string: Repetition");
foreach (var key in textDict.Keys)
{
    if (!String.IsNullOrWhiteSpace(key) && textDict[key] > 1)
    {
        Console.WriteLine(key + ": " + textDict[key]);
    }
}
The output I get is:
string: Repetition
Userlist: 2
User1: 2
userid: 2
agreement: 2
firstname: 4
lastname: 4
User2: 2

Comparing Multiple Strings using .Contains

I am trying to compare a string to see if it contains a curse word. I assumed that I could do this using str.Contains("" || "") although I quickly realized I cannot use || with two strings. What would I use in place of this?
str.Contains("123" || "abc");
I expected it to see if it contains 123 or abc but the code segment does not work as it cannot compare two strings.
var str = "testabc123";
var str2 = "helloworld";
var bannedWords = new List<string>
{
    "test",
    "ok",
    "123"
};
var res = bannedWords.Any(x => str.Contains(x)); //true
var res2 = bannedWords.Any(x => str2.Contains(x)); //false
You can do something like this. Create a list with the swear words, then you can check if the string contains any word in the list.
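Note that string.Contains is case-sensitive, so "TEST" would slip past a list that holds "test". If that matters, one possible variation is sketched below. It assumes an ordinal, case-insensitive comparison is what you want; WordFilter, BannedWords and ContainsBannedWord are made-up names for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordFilter
{
    // Hypothetical banned-word list, for illustration only.
    private static readonly List<string> BannedWords = new List<string> { "test", "ok", "123" };

    public static bool ContainsBannedWord(string text)
    {
        // IndexOf returns -1 when the word is not found; OrdinalIgnoreCase ignores letter case.
        return BannedWords.Any(w => text.IndexOf(w, StringComparison.OrdinalIgnoreCase) >= 0);
    }

    public static void Main()
    {
        Console.WriteLine(ContainsBannedWord("TESTabc"));    // True
        Console.WriteLine(ContainsBannedWord("helloworld")); // False
    }
}
```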
Try the following approach:
using System;
using System.Collections.Generic;
public class Program
{
    private static readonly List<string> curseWords = new List<string>() { "123", "abc" };
    public static void Main()
    {
        string input = "text to be checked with word abc";
        if (isContainCurseWord(input))
        {
            Console.WriteLine("Input contains at least one curse word");
        }
        else
        {
            Console.WriteLine("Input does not contain any curse words");
        }
    }
    // Returns true as soon as any word from the list is found in the text
    public static bool isContainCurseWord(string text)
    {
        foreach (string curse in curseWords)
        {
            if (text.Contains(curse))
            {
                return true;
            }
        }
        return false;
    }
}
Try:
using System;
using System.Linq;
using System.Collections.Generic;
public class Program
{
    public static void Main()
    {
        var input = "some random string with abc and 123";
        var words = new List<String>() { "123", "abc" };
        // Any is true when at least one of the words occurs in the input
        var foundAny = words.Any(word => input.Contains(word));
        Console.WriteLine("Contains: {0}", foundAny);
    }
}
Try -
var array = new List<String>() {"123", "abc"};
var found = array.Contains("abc");
Console.WriteLine("Contains: {0}", found);

How to extract a substring from one delimiter to another in C#?

My input is going to be as follows:
abc@gmail.com,def@yahoo.com;xyz@gmail.com;ghi@hotmail.com and so on
Now I want my output to be:
abc
def
xyz
ghi
The following is my code:
using System;
using System.Text.RegularExpressions;
public class Program
{
    public static void Main(string[] args)
    {
        string str;
        string[] newstr,newstr2;
        Console.WriteLine("Enter the email addresses: ");
        str=Console.ReadLine();
        newstr=Regex.Split(str,",|;|@");
        foreach (string s in newstr)
        {
            Console.WriteLine(s);
        }
    }
}
My output right now is:
abc
gmail.com
def
yahoo.com
xyz
gmail.com
ghi
hotmail.com
Any kind of help would be greatly appreciated. Thanks.
You shouldn't use regex for the split, and you should not split by @. Instead, use the following code:
using System;
public class Program
{
    public static void Main(string[] args)
    {
        string str;
        string[] newstr;
        Console.WriteLine("Enter the email addresses: ");
        str = Console.ReadLine();
        newstr = str.Split(new char[] { ',', ';' }); // Split to get a temporary array of addresses
        foreach (string s in newstr)
        {
            Console.WriteLine(s.Substring(0, s.IndexOf('@'))); // Extract the sender from the email address
        }
    }
}
Edit:
Or, with LINQ:
using System;
using System.Linq;
public class Program
{
    public static void Main(string[] args)
    {
        string str;
        string[] newstr;
        Console.WriteLine("Enter the email addresses: ");
        str = Console.ReadLine();
        newstr = str.Split(new char[] { ',', ';' }) // Split to get an array of addresses to work with
            .Select(s => s.Substring(0, s.IndexOf('@'))).ToArray(); // Extract the sender from each email address
        foreach (string s in newstr)
        {
            Console.WriteLine(s);
        }
    }
}
Another approach without Regex:
string input = "abc@gmail.com,def@yahoo.com;xy@gmail.com; ghi@hotmail.com";
var result = input.Split(',', ';').Select(x => x.Split('@').First());
First split the addresses by , and ;, then select the part before the @ by splitting again.
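The two-step split described above can be made a little more forgiving of stray whitespace and of entries that contain no @ at all. This is a minimal sketch under those assumptions; AddressSplitter and GetLocalParts are hypothetical names, and the "notme" entry is added to the sample input purely to show the filtering.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AddressSplitter
{
    // Hypothetical helper: returns the part before '@' for each entry that has one.
    public static List<string> GetLocalParts(string input)
    {
        return input
            .Split(new[] { ',', ';' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(entry => entry.Trim())
            .Where(entry => entry.IndexOf('@') > 0) // skip entries with no '@' or nothing before it
            .Select(entry => entry.Substring(0, entry.IndexOf('@')))
            .ToList();
    }

    public static void Main()
    {
        string input = "abc@gmail.com,def@yahoo.com;xyz@gmail.com; ghi@hotmail.com;notme";
        foreach (string name in GetLocalParts(input))
        {
            Console.WriteLine(name);
        }
    }
}
```

Filtering before calling Substring avoids the ArgumentOutOfRangeException that Substring(0, -1) would throw for an entry without an @.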
You can use this email regex:
var regex = new Regex(@"(?<name>\w+([-+.']\w+)*)@\w+([-.]\w+)*\.\w+([-.]\w+)*");
var results =
    regex.Matches("abc@gmail.com,def@yahoo.com;xyz@gmail.com;ghi@hotmail.com")
        .Cast<Match>()
        .Select(m => m.Groups["name"].Value)
        .ToList();
Perhaps using this might help
str.Substring(0, str.LastIndexOf(" ")<0?0:str.LastIndexOf(" "));
Since email is a weird thing with a complex definition, I would never assume that something containing an @ is an email address.
My best attempt would be to convert the string to a MailAddress, in case it looks like an address but is not one because of an invalid character or similar.
// Requires: using System.Linq; using System.Net.Mail;
string input = "abc@gmail.com,ghi@hotmail.com;notme; @op this is not a mail!";
var result = input
    .Split(',', ';') // Split
    .Select(x =>
    {
        string adr = "";
        try
        { // Create a MailAddress; MailAddress has no TryParse, so invalid input throws
            adr = new MailAddress(x).User;
        }
        catch
        {
            return new { isValid = false, mail = adr };
        }
        return new { isValid = true, mail = adr };
    })
    .Where(x => x.isValid)
    .Select(x => x.mail);
In a regular expression, to capture a substring you need to wrap the expected content in ( and ).
The code below should work:
string str22 = "abc@gmail.com;def@yahoo.com,xyz@gmail.com;fah@yao.com,h347.2162@yahoo.com.hk";// ghi@hotmail.com";
List<string> ret = new List<string>();
string regExp = @"(.*?)@.*?[,;]{1}|(.*)@";
MatchCollection matches = Regex.Matches(str22, regExp, RegexOptions.IgnoreCase);
foreach (Match match in matches)
{
    if (match.Success)
    {
        // Find the first capture group that is not empty
        int pvt = 1;
        while (string.IsNullOrEmpty(match.Groups[pvt].Value))
        {
            pvt++;
        }
        MessageBox.Show(match.Groups[pvt].Value);
    }
}
return;
The regular expression is:
(.*?)@.*?[,;]{1}|(.*)@
(.*?)@.*?[,;]{1} captures the substring before the @; the ? makes the match lazy, so it stops at the first match.
The last email is not followed by , or ;, so an OR alternative is added that captures the last name as the substring before the @.

Regex extract encoding type from given strings

Following a recent thread on Stack Overflow, I'm posting a new question:
I have several strings from which I want to extract the encoding type.
I would like to do it using regex.
Examples:
utf-8 quoted printable
string str = "=?utf-8?Q?=48=69=67=68=2d=45=6e=64=2d=44=65=73=69=67=6e=65=72=2d=57=61=74=63=68=2d=52=65=70=6c=69=63=61=73=2d=53=61=76=65=2d=54=48=4f=55=53=41=4e=44=53=2d=32=30=31=32=2d=4d=6f=64=65=6c=73?=";
utf-8 Base 64
string fld4 = "=?utf-8?B?VmFsw6lyaWUgTWVqc25lcm93c2tp?= <Valerie.renamed@company.com>";
Windows 1258 Base 64
string msg2= "=?windows-1258?B?UkU6IFRyIDogUGxhbiBkZSBjb250aW51aXTpIGQnYWN0aXZpdOkgZGVz?= =?windows-1258?B?IHNlcnZldXJzIFdlYiBHb1ZveWFnZXN=?=";
iso-8859-1 Quoted printable
string fld2 = "=?iso-8859-1?Q?Fr=E9d=E9ric_Germain?= <Frederic.Germain@company.com>";
etc...
In order to write a generic decoding function, we need to extract:
the charset (utf-8, windows-1258, etc.)
the transfer encoding type (quoted-printable or Base64)
the encoded string
Any idea how to extract the pattern between ?xxx?Q? or ?xxx?B?
Note: this can be uppercase or lowercase.
Thanks.
Here is a Rubular that will do it for you. In short, the regex =\?(.*?)\?[QBqb] will grab the encoding. One thing to note: the third example you gave has two matches in it, so make sure you decide what you want to do with the second match.
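All three pieces the question asks for can also be pulled out in one pass with named groups. The following is a sketch under the assumption that the payload never contains a literal ?; EncodedWordParser, Parse and the group names charset, enc and data are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EncodedWordParser
{
    // Charset, encoding letter and payload are each captured by a named group.
    private static readonly Regex Pattern =
        new Regex(@"=\?(?<charset>[^?]+)\?(?<enc>[BbQq])\?(?<data>[^?]*)\?=");

    // Hypothetical helper: one (charset, encoding, data) tuple per encoded word found.
    public static List<Tuple<string, string, string>> Parse(string input)
    {
        var result = new List<Tuple<string, string, string>>();
        foreach (Match m in Pattern.Matches(input))
        {
            result.Add(Tuple.Create(
                m.Groups["charset"].Value,
                m.Groups["enc"].Value.ToUpperInvariant(), // normalise q/b to Q/B
                m.Groups["data"].Value));
        }
        return result;
    }

    public static void Main()
    {
        string fld4 = "=?utf-8?B?VmFsw6lyaWUgTWVqc25lcm93c2tp?= <Valerie.renamed@company.com>";
        foreach (var part in Parse(fld4))
        {
            Console.WriteLine(part.Item1 + " / " + part.Item2 + " / " + part.Item3);
        }
    }
}
```

For an input with two encoded words, such as the windows-1258 example, Parse returns two tuples, so the caller can decode and concatenate them.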
Here is a full working solution:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace ConsoleApplication2
{
    public class Encoded
    {
        public string Charset;
        public string ContentTransfertEncoding;
        public string Data;
    }
    public class Decoding
    {
        public Decoding()
        {
        }
        public List<Encoded> Process(string data)
        {
            List<Encoded> list = new List<Encoded>();
            // The payload class also allows '_', '+' and '/', which occur in Q and Base64 text
            var occurences = new Regex(@"=\?[a-zA-Z0-9?=-]*\?[BbQq]\?[a-zA-Z0-9?=_+/-]*\?=", RegexOptions.IgnoreCase);
            var matches = occurences.Matches(data);
            foreach (Match match in matches)
            {
                Encoded cls = new Encoded();
                cls.Data = match.Groups[0].Value;
                cls.Charset = GetCharset(cls.Data);
                cls.ContentTransfertEncoding = GetContentTransfertEncoding(cls.Data);
                // cleanup data: drop the "=?charset?X?" prefix and the "?=" suffix
                int pos = cls.Data.IndexOf("=?");
                pos = cls.Data.IndexOf("?", pos + 2);
                cls.Data = cls.Data.Substring(pos + 3);
                cls.Data = cls.Data.Replace("?=", "");
                list.Add(cls);
            }
            return list;
        }
        private string GetContentTransfertEncoding(string data)
        {
            var occurences = new Regex(@"=\?(.*?)\?[QBqb]", RegexOptions.IgnoreCase);
            var matches = occurences.Matches(data);
            foreach (Match match in matches)
            {
                // The encoding letter is whatever follows the last '?' of the match
                int pos = match.Groups[0].Value.LastIndexOf('?');
                return match.Groups[0].Value.Substring(pos + 1);
            }
            return data;
        }
        public string GetCharset(string data)
        {
            var occurences = new Regex(@"=\?(.*?)\?[QBqb]", RegexOptions.IgnoreCase);
            var matches = occurences.Matches(data);
            foreach (Match match in matches)
            {
                string str1 = match.Groups[0].Value.Replace("=?", "");
                int pos = str1.IndexOf('?');
                str1 = str1.Substring(0, pos);
                return str1; // there should be only 1 match
            }
            return data;
        }
        // public string Decode... etc. (the remaining methods are elided in the original answer)
    }
}

C#: How can I cut a String based on a Value?

I have this string:
value1*value2*value3*value4
How would I cut the string into multiple strings?
string1 = value1;
string2 = value2;
etc...
My way (and probably not a very good way):
I take an array with all the indexes of the "*" character and after that, I call the subString method to get what I need.
string valueString = "value1*value2*value3*value4";
var strings = valueString.Split('*');
string string1 = strings[0];
string string2 = strings[1];
...
More info here.
Try this
string string1 = "value1*value2*value3*value4";
var myStrings = string1.Split('*');
string s = "value1*value2*value3*value4";
string[] array = s.Split('*');
Simply:
string[] parts = myString.Split('*');
parts will be an array of strings (string[]).
You can just use the Split() method of the String object, like so:
String temp = "value1*value2*value3*value4";
var result = temp.Split(new char[] {'*'});
The result variable is a string[] with the four values.
If you want to shine in society, you can also use dynamic code:
using System;
using System.Dynamic;
namespace ConsoleApplication1
{
    class DynamicParts : System.Dynamic.DynamicObject
    {
        private string[] m_Values;
        public DynamicParts(string values)
        {
            this.m_Values = values.Split('*');
        }
        // Maps a member name such as "Value2" to the second element of the split array
        public override bool TryGetMember(GetMemberBinder binder, out object result)
        {
            var index = Convert.ToInt32(binder.Name.Replace("Value", ""));
            result = m_Values[index - 1];
            return true;
        }
        public static void Main()
        {
            dynamic d = new DynamicParts("value1*value2*value3*value4");
            Console.WriteLine(d.Value1);
            Console.WriteLine(d.Value2);
            Console.WriteLine(d.Value3);
            Console.WriteLine(d.Value4);
            Console.ReadLine();
        }
    }
}
