C# How to split text, but without removing delimiter?

I want to split text by mathematical symbols [(),-,+,/,*,^].
For example, "(3*21)+4/2" should produce the array {"(","3","*","21",")","+","4","/","2"}.
I tried to do that with Regex.Split, but the brackets are problematic.

You can run through the source string, appending to the current array cell while the current character is a digit, and moving to the next array cell when it is not ((, *, -, etc.).
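A minimal sketch of that loop, assuming the input contains only digits, single-character symbols and optional whitespace. The class and method names (ManualTokenizer, Tokenize) are made up for illustration and do not come from the original answer:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ManualTokenizer
{
    // Splits an expression into numbers and single-character symbols,
    // keeping every symbol as its own token.
    public static List<string> Tokenize(string source)
    {
        var tokens = new List<string>();
        var number = new StringBuilder();

        foreach (char c in source)
        {
            if (char.IsDigit(c))
            {
                // Still inside a number: keep filling the current cell.
                number.Append(c);
                continue;
            }

            // A non-digit ends the number collected so far.
            if (number.Length > 0)
            {
                tokens.Add(number.ToString());
                number.Clear();
            }

            // Assumption: whitespace is skipped, anything else is a symbol.
            if (!char.IsWhiteSpace(c))
            {
                tokens.Add(c.ToString());
            }
        }

        // Flush a number that runs to the end of the string.
        if (number.Length > 0)
        {
            tokens.Add(number.ToString());
        }

        return tokens;
    }
}
```

Calling ManualTokenizer.Tokenize("(3*21)+4/2") should return the nine tokens from the question: ( 3 * 21 ) + 4 / 2.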

Not sure what problem you encountered with Regex.Split, but it seems quite simple. All you have to do is escape the characters that have a special meaning in regex, like so:
string input = "(3*21+[3-5])+4/2";
string pattern = @"(\()|(\))|(\d+)|(\*)|(\+)|(-)|(/)|(\[)|(\])";
var result = Regex.Matches(input, pattern);
var result2 = Regex.Split(input, pattern);
Edit: updated pattern, '-' and '/' don't have to be escaped.
After that you have two options. The first is Split: it produces a string array, but with an empty string between every pair of adjacent matches. That is why I think you should use Matches; converting its result to a string array is simple:
string[] stringResult = (from Match match in result select match.Value).ToArray();
stringResult
{string[15]}
[0]: "("
[1]: "3"
[2]: "*"
[3]: "21"
[4]: "+"
[5]: "["
[6]: "3"
[7]: "-"
[8]: "5"
[9]: "]"
[10]: ")"
[11]: "+"
[12]: "4"
[13]: "/"
[14]: "2"
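The same Matches approach can be written with a character class instead of one alternative per symbol. This is only a sketch: the class name RegexTokenizer and the pattern are my own, and the pattern also accepts ^, which the question lists but the pattern above does not.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexTokenizer
{
    // A run of digits, or exactly one of the symbols - + * / ^ ( ) [ ].
    // Inside the character class '-' comes first so it is read literally,
    // and the square brackets are escaped.
    private const string Pattern = @"\d+|[-+*/^()\[\]]";

    public static string[] Tokenize(string input)
    {
        return Regex.Matches(input, Pattern)
                    .Cast<Match>()          // MatchCollection is non-generic on older frameworks
                    .Select(m => m.Value)
                    .ToArray();
    }
}
```

For the input "(3*21+[3-5])+4/2" this should give the same 15 elements as the listing above. Characters that match neither alternative (letters, spaces) are silently skipped, which may or may not be what you want.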

I think something like this will come in handy.
First, read the whole input line or, if you already have a string, store it.
string input = Console.ReadLine();
Then create an array whose length equals the length of the string.
string[] arr = new string[input.Length];
// Make sure your input doesn't contain spaces
Then store each character of the string in the corresponding array element, like this:
arr[0] = input[0].ToString();
Do this for every character, or use a for loop:
for (int i = 0; i < input.Length; i++) {
    arr[i] = input[i].ToString();
}
That's it. Note that this stores one character per element, so a multi-digit number such as 21 ends up as "2" and "1".

Related

sorting on List<string> with middle 2 character

I would like to sort a list by the number in the middle of each string. For example, the list contains the following:
body1text
body2text
body11text
body3text
body12text
body13text
If I apply list.OrderBy(r => r.body), it sorts as follows:
body1text
body11text
body12text
body13text
body2text
body3text
But I need the following result:
body1text
body2text
body3text
body11text
body12text
body13text
Is there an easy way to sort by the digits in the middle?
Regards
Shuvra

The issue here is that your numbers are compared as strings, so string.Compare("11", "2") will return -1 meaning that "11" is less than "2". Assuming that your string is always in format "body" + n numbers + "text" you can match numbers with regex and parse an integer from result:
new[]
{
"body1text"
,"body2text"
,"body3text"
,"body11text"
,"body12text"
,"body13text"
}
.OrderBy(s => int.Parse(Regex.Match(s, @"\d+").Value))

The specificity of sorting

Code of the character '-' is 45, code of the character 'a' is 97. It's clear that '-' < 'a' is true.
Console.WriteLine((int)'-' + " " + (int)'a');
Console.WriteLine('-' < 'a');
45 97
True
Hence the result of the following sort is correct
var a1 = new string[] { "a", "-" };
Console.WriteLine(string.Join(" ", a1));
Array.Sort(a1);
Console.WriteLine(string.Join(" ", a1));
a -
- a
But why is the result of the following sort wrong?
var a2 = new string[] { "ab", "-b" };
Console.WriteLine(string.Join(" ", a2));
Array.Sort(a2);
Console.WriteLine(string.Join(" ", a2));
ab -b
ab -b

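The hyphen is given almost no weight, so it is effectively ignored:
"-" is treated like "", which sorts before "a",
and "-b" is treated like "b", which sorts after "ab".
This happens because the default sort is culture-sensitive (a word sort):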
https://msdn.microsoft.com/en-us/library/system.globalization.compareoptions(v=vs.110).aspx
The .NET Framework uses three distinct ways of sorting: word sort, string
sort, and ordinal sort. Word sort performs a culture-sensitive
comparison of strings. Certain nonalphanumeric characters might have
special weights assigned to them. For example, the hyphen ("-") might
have a very small weight assigned to it so that "coop" and "co-op"
appear next to each other in a sorted list. String sort is similar to
word sort, except that there are no special cases. Therefore, all
nonalphanumeric symbols come before all alphanumeric characters.
Ordinal sort compares strings based on the Unicode values of each
element of the string.
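If you want the order by character code that the question expects, you can pass an ordinal comparer to Array.Sort. A small sketch; the wrapper class OrdinalSortDemo is just for illustration:

```csharp
using System;

public static class OrdinalSortDemo
{
    // Returns a sorted copy using ordinal (character code) comparison,
    // so '-' (45) sorts before 'a' (97) in every position.
    public static string[] SortOrdinal(string[] items)
    {
        var copy = (string[])items.Clone();
        Array.Sort(copy, StringComparer.Ordinal);
        return copy;
    }
}
```

With this, SortOrdinal(new[] { "ab", "-b" }) returns "-b" followed by "ab".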

split field/value from xml STRING not formatted

I have a string in XML format (it is not well-formed XML!) and I would like to extract each field and its value.
<MYXML
address="rua sao carlos, 128" telefone= "1000-222" service="xxxxxx" source="xxxxxxx" username="aaaaaaa" password="122222" nome="asasas" sobrenome="aass" email="sao.aaaaa#aaaaa.com.br" pais="SS" telefone="4002-" />
I would like to split it into separate parameter and value pairs.
I tried this:
xml.ToString().Replace(" =" , "=").Replace("= " , "=").Replace(" = " , "=").Split(new char[]{' '});
But it does not work perfectly because, for example, the address attribute is split into several items:
{string[29]}
[0]: "<signature"
[1]: "aaa=\"xxxx\""
[2]: "sss=\"xxxx\""
[3]: "ssss=\"xxx\""
[4]: "username=\"xxx\""
[5]: "password=\"xxxx\""
[6]: "nome=\"xxxx\""
[7]: "sobrenome=\"xxx\""
[8]: "email=\"xxx.xxx#xxx.com.br\""
[9]: "pais=\"BR\""
[10]: "endereco=\"Rua"
[11]: "Sao"
[12]: "Carlos,"
[13]: "128\""
[14]: "cidade=\"Sao"
[15]: "Paulo\""
The problem is here:
[10]: "endereco=\"Rua"
[11]: "Sao"
[12]: "Carlos,"
What I would like instead is
[10]: "endereco=\"Rua Sao Carlos , 128"

A regular expression will work for this as you are working with badly formed xml.
Regex regex = new Regex("\\s\\w+=\"(\\w|\\s|,|=|#|-|\\.)+\"");
MatchCollection matches = regex.Matches(searchText);
foreach (Match match in matches)
{
//your code here
}
Tested with your example string and matches were as expected.
Hope this Helps!

I would suggest using XPath or LINQ to parse this XML. Splitting on spaces is not a good approach, and that is why you end up with the error: "Rua Sao Carlos" contains three words separated by single spaces, so splitting on a single space also splits the address.

Try the overload of Split that accepts string separators. It lets you use a string as the splitter token, namely '" ' (a quote followed by a space). This splits the input into name and value pairs. Then take the resulting array and split each item again on = (equals) to get the pairs you need, and do with them as you will. Hope this gets you headed in the right direction.
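A rough sketch of that two-step split. It assumes the caller has already stripped the leading <MYXML and the trailing />, and that no attribute value contains a quote followed by a space; the names AttributeSplitter and SplitAttributes are hypothetical. A list is returned rather than a dictionary because the sample repeats the telefone attribute.

```csharp
using System;
using System.Collections.Generic;

public static class AttributeSplitter
{
    public static List<KeyValuePair<string, string>> SplitAttributes(string attributes)
    {
        var result = new List<KeyValuePair<string, string>>();

        // A closing quote followed by a space ends one name="value" pair.
        string[] pairs = attributes.Split(new[] { "\" " }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string pair in pairs)
        {
            // Split on the first '=' only, so an '=' inside a value survives.
            string[] parts = pair.Split(new[] { '=' }, 2);
            if (parts.Length != 2) continue; // not a name=value pair

            string name = parts[0].Trim();
            string value = parts[1].Trim().Trim('"'); // drop the surrounding quotes
            result.Add(new KeyValuePair<string, string>(name, value));
        }

        return result;
    }
}
```

Because the value is trimmed after the split, stray spaces around = (as in telefone= "1000-222") do not need to be replaced beforehand.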

As already noted, you have badly formed XML. If you fix it by renaming or removing one of the telefone attributes, you can break down your XML as shown below. This is the correct way to handle XML; if, however, you have no control over getting proper XML and must work with junk, I'd suggest the regex answer by @AFrieze.
var xmlString = @"<MYXML address=""rua sao carlos, 128"" service=""xxxxxx"" source=""xxxxxxx"" username=""aaaaaaa"" password=""122222"" nome=""asasas"" sobrenome=""aass"" email=""sao.aaaaa#aaaaa.com.br"" pais=""SS"" telefone=""4002-"" />";
var xml = XDocument.Parse(xmlString);
var values = xml.Descendants("MYXML").SelectMany(x => x.Attributes()).ToArray();
foreach (var value in values)
{
Console.WriteLine(value);
}
Console.Read();
This returns:
address="rua sao carlos, 128"
service="xxxxxx"
source="xxxxxxx"
username="aaaaaaa"
password="122222"
nome="asasas"
sobrenome="aass"
email="sao.aaaaa#aaaaa.com.br"
pais="SS"
telefone="4002-"

Regex masking of words that contain a digit

Trying to come up with a 'simple' regex to mask bits of text that look like they might contain account numbers.
In plain English:
any word containing a digit (or a train of such words) should be matched
leave the last 4 digits intact
replace all previous part of the matched string with four X's (xxxx)
So far
I'm using the following:
[\-0-9 ]+(?<m1>[\-0-9]{4})
replacing with
xxxx${m1}
But this misses on the last few samples below
sample data:
123456789
a123b456
a1234b5678
a1234 b5678
111 22 3333
this is a a1234 b5678 test string
Actual results
xxxx6789
a123b456
a1234b5678
a1234 b5678
xxxx3333
this is a a1234 b5678 test string
Expected results
xxxx6789
xxxxb456
xxxx5678
xxxx5678
xxxx3333
this is a xxxx5678 test string
Is such an arrangement possible with a regex replace?
I think I'm going to need some greediness and lookahead functionality, but I have zero experience in those areas.

This works for your example:
var result = Regex.Replace(
input,
@"(?<!\b\w*\d\w*)(?<m1>\s?\b\w*\d\w*)+",
m => "xxxx" + m.Value.Substring(Math.Max(0, m.Value.Length - 4)));
If you have a value like 111 2233 33, it will print xxxx3 33. If you want this to be free from spaces, you could turn the lambda into a multi-line statement that removes whitespace from the value.
To explain the regex pattern a bit, it's got a negative lookbehind, so it makes sure that the word behind it does not have a digit in it (with optional word characters around the digit). Then it's got the m1 portion, which looks for words with digits in them. The last four characters of this are grabbed via some C# code after the regex pattern resolves the rest.

I don't think regex is the best way to solve this problem, which is why I am posting this answer. For a situation this complex, building the corresponding regex is difficult and, what is worse, the result is less clear and less adaptable than a longer code-based approach.
The code below delivers the functionality you are after, is clear enough, and can easily be extended.
string input = "this is a a1234 b5678 test string";
string output = "";
string[] temp = input.Trim().Split(' ');
bool previousNum = false;
string tempOutput = "";
foreach (string word in temp)
{
    if (word.Any(char.IsDigit))
    {
        // Accumulate consecutive words that contain a digit
        previousNum = true;
        tempOutput = tempOutput + word;
    }
    else
    {
        if (previousNum)
        {
            // The run of digit words has ended: mask it, keeping the last 4 characters
            if (tempOutput.Length >= 4) tempOutput = "xxxx" + tempOutput.Substring(tempOutput.Length - 4, 4);
            output = output + " " + tempOutput;
            previousNum = false;
            tempOutput = ""; // reset so the next run starts empty
        }
        output = output + " " + word;
    }
}
if (previousNum)
{
    // The input ended while still inside a run of digit words
    if (tempOutput.Length >= 4) tempOutput = "xxxx" + tempOutput.Substring(tempOutput.Length - 4, 4);
    output = output + " " + tempOutput;
}
output = output.Trim(); // drop the leading space added by the concatenation

Have you tried this:
.*(?<m1>[\d]{4})(?<m2>.*)
with replacement
xxxx${m1}${m2}
This produces
xxxx6789
xxxx5678
xxxx5678
xxxx3333
xxxx5678 test string
You are not going to get 'a123b456' to match ... until 'b' becomes a number. ;-)

Here is my really quick attempt:
(\s|^)([a-z]*\d+[a-z,0-9]+\s)+
This will select all of those test cases. As for the C# code, you'll need to check each match to see whether there is a space at the beginning or end of the match sequence (e.g., the last example will have the spaces before and after it selected).
Here is the C# code to do the replace:
var redacted = Regex.Replace(record, @"(\s|^)([a-z]*\d+[a-z,0-9]+\s)+",
match => "xxxx" /*new String('x', match.Value.Length - 4)*/ +
match.Value.Substring(Math.Max(0, match.Value.Length - 4)));

checking input for morse code converter

I want to check the input from the user to make sure that they only enter dots and dashes; any other letters or numbers should give back an error message. I also want to allow the user to enter spaces, so how can I remove or ignore the white space when I am converting?
string permutations;
string entered = "";
do
{
Console.WriteLine("Enter Morse Code: \n");
permutations = Console.ReadLine();
.
.
} while(entered.Length != 0);
Thanks!

string permutations = string.Empty;
Console.WriteLine("Enter Morse Code: \n");
permutations = Console.ReadLine(); // read the console
bool isValid = Regex.IsMatch(permutations, @"^[-. ]+$"); // true if it contains only spaces, dots or dashes
if (isValid) //if input is proper
{
permutations = permutations.Replace(" ",""); //remove whitespace from string
}
else //input is not proper
{
Console.WriteLine("Error: Only dots, dashes and spaces are allowed. \n"); //display error
}

Let's assume that you separate letters by a single space and words by two spaces. Then you can test if your string is well formatted by using a regular expression like this
bool ok = Regex.IsMatch(entered, @"^(\.|-)+(\ {1,2}(\.|-)+)*$");
Regular expression explained:
^ is the beginning of the string.
\.|- is a dot (escaped with \ as the dot has a special meaning within Regex) or (|) a minus sign.
+ means one or more repetitions of what's left to it (dot or minus).
\ {1,2} one or two spaces (they are followed by dots or minuses again (\.|-)+).
* repeats the space(s) followed by dots or minuses zero or more times.
$ is the end of the line.
You can split the string at the spaces with
string[] parts = input.Split();
Two spaces will create an empty entry. This allows you to detect word boundaries. E.g.
"–– ––– .–. ... .  –.–. ––– –.. .".Split();
produces the following string array
{string[10]}
[0]: "––"
[1]: "–––"
[2]: ".–."
[3]: "..."
[4]: "."
[5]: ""
[6]: "–.–."
[7]: "–––"
[8]: "–.."
[9]: "."
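Putting the two steps together, here is a sketch that validates the input with the pattern above and then groups the letter codes by word. It splits on the double space first instead of looking for empty entries, which is a slight variation on the Split() call shown above; the names MorseInput and Parse are made up for illustration. It expects plain ASCII '.' and '-' characters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MorseInput
{
    // Same pattern as above: letters separated by one space, words by two.
    private static readonly Regex WellFormed =
        new Regex(@"^(\.|-)+(\ {1,2}(\.|-)+)*$");

    // Returns null when the input is not well formed; otherwise one
    // array of letter codes per word.
    public static string[][] Parse(string entered)
    {
        if (!WellFormed.IsMatch(entered)) return null;

        // Two spaces separate words...
        string[] words = entered.Split(new[] { "  " }, StringSplitOptions.None);
        var result = new string[words.Length][];
        for (int i = 0; i < words.Length; i++)
        {
            // ...and a single space separates the letters inside a word.
            result[i] = words[i].Split(' ');
        }
        return result;
    }
}
```

Each inner array can then be looked up letter by letter in whatever Morse table the converter uses.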
