Detect Junk Characters in string - c#

I want to allow the user to enter letters, numbers and special characters, but not junk characters (e.g. ♠, ♣) whose character code is greater than 127.
I have a function like this:
for (int i = 0; i < value.Length; i++) // value is input string
{
if ((int)value[i] < 32 || (int)value[i] > 126)
{
// show error
}
}
This makes the code a bit slower, as I have to check every character of every string.
Can anyone suggest a better approach?

Well, for one thing you can make the code simpler:
foreach (char c in value)
{
if (c < 32 || c > 126)
{
...
}
}
Or using LINQ, if you just need to know if any characters are non-ASCII:
bool bad = value.Any(c => c < 32 || c > 126);
... but fundamentally you're not going to be able to detect non-ASCII characters without iterating over every character in the string...
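Since the scan itself is unavoidable, one modest improvement is to stop at the first offending character and report where it is. A minimal sketch, assuming a hypothetical helper named IndexOfFirstJunk (the name and class are illustrative, not from the question):

```csharp
using System;

public static class AsciiCheck
{
    // Returns the index of the first character outside the printable
    // ASCII range 32..126, or -1 if every character is acceptable.
    // (Hypothetical helper; the range matches the check in the question.)
    public static int IndexOfFirstJunk(string value)
    {
        for (int i = 0; i < value.Length; i++)
        {
            char c = value[i];
            if (c < 32 || c > 126)
            {
                return i; // stop at the first offending character
            }
        }
        return -1;
    }
}
```

A caller could show the error only when the result is not -1, and use the index to point the user at the bad character.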

You can write a regular expression that allows only the desired characters and apply it to each string. I think this will improve the performance. All you have to do is create the proper regular expression.
Update: using a regex will not actually improve the speed; it only reduces the number of lines of code.
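For reference, the regular expression approach might look like the sketch below. The class and method names are hypothetical, and the character range is assumed to be the same 32..126 range used in the question:

```csharp
using System.Text.RegularExpressions;

public static class AsciiRegexCheck
{
    // Matches strings made up only of characters U+0020..U+007E.
    // \A and \z anchor at the very start and end of the input, so a
    // trailing newline is rejected as well.
    private static readonly Regex PrintableAscii =
        new Regex(@"\A[\x20-\x7E]*\z", RegexOptions.CultureInvariant);

    public static bool IsClean(string value)
    {
        return PrintableAscii.IsMatch(value);
    }
}
```

The regex engine still has to look at every character, so this is about brevity rather than speed.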

Related

C# locating where the * is in a string separated by pipes

I have to find which position a * is in, if it is present at all: 1st position | 2nd position | 3rd position.
The positions are separated by pipes (|).
With no * wildcard, the string would be
`ABC|DEF|GHI`
That is one scenario; the other three are
string testPosition1 = "*|DEF|GHI";
string testPosition2 = "ABC|*|GHI";
string testPosition3 = "ABC|DEF|*";
I gather that I should use IndexOf, but it seems I need to take the | (pipe) into account to know the position, not just the character offset, since the values in each of the 3 places can be long or short. I just want to end up knowing whether the * is in the first, second or third position (or not there at all).
This is what I was doing, but it does not tell me whether the * is before the 1st or 2nd pipe:
if(testPosition1.IndexOf("*") > 0)
{
// Look for pipes?
}

There are lots of ways you could approach this. The most readable might actually just be to do it the hard way (i.e. scan the string to find the first '*' character, keeping track of how many '|' characters you see along the way).
That said, this could be similarly readable and more concise:
int wildcardPosition = Array.IndexOf(testPosition1.Split('|'), "*");
This returns -1 if the wildcard is not found, otherwise the 0-based index of the segment of the '|'-delimited string that contains it.
This only works if the wildcard is exactly the one-character string "*". If you need to support other variations on that, you will still want to split the string, but then you can loop over the array looking for whatever criteria you need.
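As an illustration of the split-then-loop variant, the sketch below accepts a segment that contains a '*' anywhere (for example "G*H"). The criterion and the names are assumptions chosen for the example; substitute whatever test you actually need:

```csharp
using System;

public static class WildcardFinder
{
    // Returns the 0-based index of the first segment that contains '*'
    // anywhere in it, or -1 if no segment does.
    public static int FindWildcardSegment(string input)
    {
        string[] segments = input.Split('|');
        for (int i = 0; i < segments.Length; i++)
        {
            if (segments[i].Contains("*"))
            {
                return i;
            }
        }
        return -1;
    }
}
```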

You can try LINQ: split the string at the pipe character and then get the index of the element that is just a *.
var x = testPosition2.Split('|').Select((k, i) => new { text = k, index = i}).FirstOrDefault(p => p.text == "*" );
if(x != null) Console.WriteLine(x.index);
The first line splits the string at the pipes, creating an array of strings. This sequence is passed to the Select extension method, which enumerates it, passing each string (k) and its index (i). From these two parameters we build a sequence of anonymous objects with two properties (text and index). FirstOrDefault then extracts from this sequence the first object whose text equals *, and we can print the index property of that object.

The other answers are fine (and likely better); however, here is another approach: the good old-fashioned for loop and the try-get pattern.
public bool TryGetStar(string input, out int index)
{
var split = input.Split('|');
for (index = 0; index < split.Length; index++)
if (split[index] == "*")
return true;
return false;
}
Or, if you were dealing with large strings and trying to save allocations, you could remove the Split entirely and use a single O(n) pass:
public bool TryGetStar(string input, out int index)
{
index = 0;
for (var i = 0; i < input.Length; i++)
if (input[i] == '|') index++;
else if (input[i] == '*') return true;
return false;
}
Note: if performance is a consideration, you could also use unsafe code and pointers, or Span<char>, which would give a small efficiency gain.
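A span-based variant could look roughly like this. It is only a sketch: it assumes a runtime where ReadOnlySpan<char> is available (.NET Core 2.1 or later, or the System.Memory package), and unlike the loop above it leaves index at 0 when no '*' is found:

```csharp
using System;

public static class SpanWildcard
{
    // Allocation-free variant: finds the '*' with a span search, then
    // counts the pipes that precede it to get the 0-based segment index.
    public static bool TryGetStar(ReadOnlySpan<char> input, out int index)
    {
        index = 0;
        int star = input.IndexOf('*');
        if (star < 0)
        {
            return false;
        }
        foreach (char c in input.Slice(0, star))
        {
            if (c == '|') index++;
        }
        return true;
    }
}
```

A string can be passed directly, since string converts implicitly to ReadOnlySpan<char>.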

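Try this expression:
testPosition.IndexOf("*") - testPosition.Replace("|","").IndexOf("*")
It finds the index of the wildcard ("*") and sees how far it moves when the pipe ("|") characters are removed. The result is a zero-based position. Note that it also evaluates to 0 when the string contains no "*" at all (both IndexOf calls return -1), so check for the wildcard first if that case can occur.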

From the question you have the following code segment:
if(testPosition1.IndexOf("*") > 0)
{
}
Note that IndexOf returns 0 when the asterisk is the first character, so the test should be >= 0 rather than > 0. With that change, once you're inside the if statement you're sure the asterisk exists.
From that point, an efficient solution could be to check the first two chars and the last two chars.
if (testPosition1.IndexOf("*") >= 0)
{
if (testPosition1[0] == '*' && testPosition1[1] == '|')
{
// First position.
}
else if (testPosition1[testPosition1.Length - 1] == '*' && testPosition1[testPosition1.Length - 2] == '|')
{
// Third (last) position.
}
else
{
// Second position.
}
}
This assumes that no more than one * can exist, and also that if a * exists, it is surrounded only by pipes. For example, I assume an input like ABC|DEF|G*H is invalid.
If you want to remove these assumptions, you could do a one-pass loop over the string, keeping track of the necessary information.

how to add a sign between each letter in a string in C#?

I have a task in which I have to write a function called accum, which transforms a given string into something like this:
Accumul.Accum("abcd"); // "A-Bb-Ccc-Dddd"
Accumul.Accum("RqaEzty"); // "R-Qq-Aaa-Eeee-Zzzzz-Tttttt-Yyyyyyy"
Accumul.Accum("cwAt"); // "C-Ww-Aaa-Tttt"
So far I have only converted each letter to uppercase and... Now that I am writing about it, I think it could be easier to first repeat each letter the required number of times and then add the dashes. Okay, let's say I have already repeated the letters (I will deal with that later) and now I need to add the dashes. I tried several ways to solve this, including for and foreach (and now that I think of it, I can't use foreach if I want to add a dash after repeating the letters) with String.Join, String.Insert, or something called StringBuilder with Append (which I don't exactly understand), and it does nothing to the string.
One of those loops that I tried was:
for (int letter = 0; letter < s.Length-1; letter += 2) {
if (letter % 2 == 0) s.Replace("", "-");
}
and
for (int letter = 0; letter < s.Length; letter++) {
return String.Join(s, "-");
}
The second one returns an "unreachable code" error. What am I doing wrong here, that it does nothing to the string (after the uppercase conversion)? Also, is there any method to copy each letter, in order to increase the number of them?

As you say, string.Join can be used as long as an enumerable is created instead of using a foreach. Since the string itself is enumerable, you can use the LINQ Select overload that includes an index:
var input = "abcd";
var res = string.Join("-", input.Select((c,i) => Char.ToUpper(c) + new string(Char.ToLower(c),i)));
(This assumes every char is handled on its own, including repeats; e.g. "aab" would become "A-Aa-Bbb".)
Explanation:
The Select extension method takes a lambda function as its parameter, with c being a char and i its index. The lambda returns an uppercase version of the char (c) followed by a string of the lowercase char repeated i times (new string(char, count)), which is an empty string for the first index. Finally, string.Join concatenates the resulting sequence with a - between each element.

Use this code.
string result = String.Empty;
for (int i = 0; i < s.Length; i++)
{
char c = s[i];
result += char.ToUpper(c);
result += new String(char.ToLower(c), i);
if (i < s.Length - 1)
{
result += "-";
}
}
It would be better to use StringBuilder instead of string concatenation, but this code may be a bit clearer.
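For comparison, a StringBuilder version of the same loop might look like the sketch below. It reuses the Accumul.Accum name from the question and relies on the Append(char, int) overload, which appends a character a given number of times:

```csharp
using System.Text;

public static class Accumul
{
    public static string Accum(string s)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < s.Length; i++)
        {
            if (i > 0) sb.Append('-');        // dash between groups only
            sb.Append(char.ToUpper(s[i]));    // first letter of the group
            sb.Append(char.ToLower(s[i]), i); // i lowercase repeats
        }
        return sb.ToString();
    }
}
```

StringBuilder mutates one internal buffer, so the loop does not allocate a new string on every concatenation.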

Strings are immutable, which means that you cannot modify them once they are created. The Replace method therefore returns a new string, which you need to capture:
s = s.Replace("x", "-");
You are currently not assigning the result of Replace to anything, which is why you don't see any change.

For the future, the best way to approach problems like this one is not to search for a code snippet, but to write down, step by step, an algorithm for achieving the expected result in plain English or pseudocode, e.g.
Given I have input string 'abcd' which should turn into output string 'A-Bb-Ccc-Dddd'.
Copy first character 'a' from the input to Buffer.
Store the index of the character to Index.
If Buffer has only one character make it Upper Case.
If Index is greater than 1, pad Buffer with Index-1 trailing lower-case characters.
Append dash '-' to the Buffer.
Copy Buffer content to Output and clear Buffer.
Copy second character 'b' from the input to Buffer.
...
etc.
The aha moment often happens on the third iteration. Hope it helps! :)

Bitwise OR on strings for large strings in c#

I have two strings (of 1's and 0's) of equal length (<= 500) and would like to apply a logical OR to them.
How should I approach this? I'm working with C#.
The obvious solution is to read each char and apply OR (|) to them, but I have to deal with approx. 250,000 strings, each of length 500, and this would kill my performance.
Performance is my main concern.
Thanks in advance!

This is the fastest way:
string x="";
string y="";
StringBuilder sb = new StringBuilder(x.Length);
for (int i = 0; i < x.Length;i++ )
{
sb.Append(x[i] == '1' || y[i] == '1' ? '1' : '0');
}
string result = sb.ToString();

Since it was mentioned that speed is a big factor, it would be best to use bit-wise operations.
Take a look at an ASCII table:
The character '0' is 0x30, or 00110000 in binary.
The character '1' is 0x31, or 00110001 in binary.
Only the last bit of the character is different. As such - we can safely say that performing a bitwise OR on the characters themselves will produce the correct character.
Another important thing we can do to optimize speed is to use a StringBuilder initialized with the capacity of our string. Even better, we can reuse one StringBuilder across multiple operations, although we have to ensure it has enough capacity.
With those optimizations considered, we can make this method:
string BinaryStringBitwiseOR(string a, string b, StringBuilder stringBuilder = null)
{
if (a.Length != b.Length)
{
throw new ArgumentException("The length of given string parameters didn't match");
}
if (stringBuilder == null)
{
stringBuilder = new StringBuilder(a.Length);
}
else
{
stringBuilder.Clear().EnsureCapacity(a.Length);
}
for (int i = 0; i < a.Length; i++)
{
stringBuilder.Append((char)(a[i] | b[i]));
}
return stringBuilder.ToString();
}
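Note that the same approach works for a bitwise AND by swapping | for &. It does not carry over directly to XOR: '0' and '1' share the upper bits 0x30, which cancel under ^, so the result must be mapped back to a digit, e.g. (char)('0' + (a[i] ^ b[i])).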
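To make the operator swap concrete, here is a small sketch with OR, AND and XOR side by side. The class and method names are hypothetical. It works on the low bit of each character and maps the result back to '0' or '1', which keeps XOR correct as well; the delegate call per character makes it slower than a hand-written loop, so treat it as an illustration rather than a tuned implementation:

```csharp
using System;

public static class BinaryStrings
{
    public static string Or(string a, string b)  => Combine(a, b, (x, y) => x | y);
    public static string And(string a, string b) => Combine(a, b, (x, y) => x & y);
    public static string Xor(string a, string b) => Combine(a, b, (x, y) => x ^ y);

    private static string Combine(string a, string b, Func<int, int, int> op)
    {
        if (a.Length != b.Length)
        {
            throw new ArgumentException("The string lengths differ.");
        }
        var result = new char[a.Length];
        for (int i = 0; i < a.Length; i++)
        {
            // '0' is 0x30 and '1' is 0x31, so the low bit carries the value.
            int bit = op(a[i] & 1, b[i] & 1) & 1;
            result[i] = (char)('0' + bit);
        }
        return new string(result);
    }
}
```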

I've found this to be faster than all proposed solutions. It combines elements from @Gediminas's and @Sakura's answers, but uses a pre-initialized char[] rather than a StringBuilder.
While StringBuilder is efficient at memory management, each Append call has to do some bookkeeping (tracking the current position and capacity), which is more work than simply writing to an array index.
string x = ...
string y = ...
char[] c = new char[x.Length];
for (int i = 0; i < x.Length; i++)
{
c[i] = (char)(x[i] | y[i]);
}
string result = new string(c);

I have two strings(with 1's and 0's) of equal lengths(<=500) and would
like to apply Logical OR on these strings.
You can write a custom logical OR function that takes two characters as input and produces the result (e.g. if at least one of the input characters is '1', return '1'; otherwise return '0'). Apply this function to each pair of characters in your strings.
You can also look at it this way: first convert each character to a boolean (e.g. '1' corresponds to true), perform the OR operation on the two boolean values, and convert the result back to the character '0' or '1', depending on whether the logical OR was false or true. Then just append the results to each other.
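The description above could be sketched as follows. The names OrChars and OrStrings are made up for the example, and the code assumes the two strings have equal length, as stated in the question:

```csharp
using System.Text;

public static class LogicalOr
{
    // '1' is treated as true, any other character as false.
    private static char OrChars(char left, char right)
    {
        bool result = (left == '1') || (right == '1');
        return result ? '1' : '0';
    }

    public static string OrStrings(string a, string b)
    {
        var sb = new StringBuilder(a.Length);
        for (int i = 0; i < a.Length; i++)
        {
            sb.Append(OrChars(a[i], b[i]));
        }
        return sb.ToString();
    }
}
```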

You can use a Linq query to zip and then aggregate the results:
var a = "110010";
var b = "001110";
var result = a.Zip(b, (i, j) => i == '1' || j == '1' ? '1' : '0')
.Select(i => i + "").Aggregate((i, j) => i + j);
Basically, the Zip extension method takes two sequences and applies a function to each pair of corresponding elements. Then I use Select to convert each char to a string, and finally I aggregate the resulting sequence of strings ("0" and "1") into a single string.
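One caveat with Aggregate is that each step builds a new, longer string. A variation that keeps the Zip but collects the characters into an array first might look like this (a sketch; the ZipOr class name is hypothetical):

```csharp
using System.Linq;

public static class ZipOr
{
    public static string Or(string a, string b)
    {
        // Zip pairs up the characters; ToArray materializes them once,
        // and the string constructor copies the array into the result.
        char[] chars = a.Zip(b, (i, j) => i == '1' || j == '1' ? '1' : '0').ToArray();
        return new string(chars);
    }
}
```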

Substrings and Char.Is/Number Confusion. [closed]

I'm a beginner at C#. How could I write code that checks whether:
the first 3 characters are letters,
the next 3 are numbers,
the next two are letters,
and the last character is a number,
and, if not, writes an error message?
I've tried using Substring(0,3) and passing it to Char.IsLetter just as an attempt, but it failed.

Here's a correct way to do it using char.IsLetter and char.IsNumber.
if(myString.Length == 9
&& char.IsLetter(myString[0])
&& char.IsLetter(myString[1])
&& char.IsLetter(myString[2])
&& char.IsNumber(myString[3])
&& char.IsNumber(myString[4])
&& char.IsNumber(myString[5])
&& char.IsLetter(myString[6])
&& char.IsLetter(myString[7])
&& char.IsNumber(myString[8]))
{
// match.
}
Basically, you have to validate the length of the string and then validate each character.
You could also use char.IsDigit to limit the match to radix-10 digits, whereas char.IsNumber will match any Unicode character that is deemed a number (fractions, subscripts, superscripts, Roman numerals, currency numerators, encircled numbers, and script-specific digits). Note that char.IsDigit still accepts decimal digits from other scripts; for strictly 0-9, compare against '0' and '9' directly. Likewise, char.IsLetter will match any Unicode character that is deemed a letter, which strays outside the basic A-Z. To restrict letters to A-Z you could do this instead:
public static bool IsAtoZ(char c)
{
return ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z');
}
if(myString.Length == 9
&& IsAtoZ(myString[0])
&& IsAtoZ(myString[1])
&& IsAtoZ(myString[2])
&& char.IsDigit(myString[3])
&& char.IsDigit(myString[4])
&& char.IsDigit(myString[5])
&& IsAtoZ(myString[6])
&& IsAtoZ(myString[7])
&& char.IsDigit(myString[8]))
{
// match.
}
But honestly at this point a regular expression will be more terse. But note that you'll still have to consider if you want to match Unicode characters and use the correct regular expression based on that.
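To illustrate the ASCII versus Unicode choice in regex form, here is a sketch with both variants. The class and method names are hypothetical. In .NET, \p{L} matches any Unicode letter and \d matches any Unicode decimal digit unless RegexOptions.ECMAScript is used, whereas explicit ranges such as [A-Za-z] and [0-9] stay within ASCII:

```csharp
using System.Text.RegularExpressions;

public static class CodeFormat
{
    // 3 letters, 3 digits, 2 letters, 1 digit; ASCII only.
    private static readonly Regex AsciiOnly =
        new Regex(@"\A[A-Za-z]{3}[0-9]{3}[A-Za-z]{2}[0-9]\z");

    // Same shape, but accepts letters and decimal digits from any script.
    private static readonly Regex UnicodeAware =
        new Regex(@"\A\p{L}{3}\d{3}\p{L}{2}\d\z");

    public static bool IsAsciiMatch(string s) => AsciiOnly.IsMatch(s);
    public static bool IsUnicodeMatch(string s) => UnicodeAware.IsMatch(s);
}
```

An input that starts with a non-ASCII letter such as "ä" would pass the second pattern but not the first.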

You can use a regex (anchored with ^ and $ so that the whole string must match):
bool isMatch= Regex.IsMatch("abc123de4", @"^\p{L}{3}\d{3}\p{L}{2}\d$");

You could just use a regular expression:
var regex = new Regex("^[a-z]{3}[0-9]{3}[a-z]{2}[0-9]{1}$", RegexOptions.IgnoreCase);
var matches = regex.Matches(input);
where input is the string you want to check.
When we pass the option RegexOptions.IgnoreCase to the Regex constructor, we say that it doesn't matter whether the letters are capital or not.
You could omit this parameter and state explicitly that you want both capital and small letters, as Rahul has correctly pointed out in his comment. This is done as shown below:
var regex = new Regex("^[a-zA-Z]{3}[0-9]{3}[a-zA-Z]{2}[0-9]{1}$");
var matches = regex.Matches(input);

You can access the individual characters of a string in C# like this:
string s = "1test";
char c = s[0];
c will then be '1'.
In the next step you can use the Char.IsNumber method, which returns a bool, like this:
if (char.IsNumber(c)) {}
Then you do the same thing for the following chars, except that you use the Char.IsLetter method where a letter is expected.

I think there are several elegant ways to do this. Since you said that you're a beginner to C#, I would suggest just finding the easiest (most pseudo-code-like, IMHO) way to just express the problem/solution:
private bool MatchesPattern(string test)
{
// the pattern is exactly 9 characters long, so any other length can't match
if (test.Length != 9) return false;
int idx = 0;
// test are letters
for (int steps = 1; steps <= 3; steps++)
{
if (!char.IsLetter(test[idx++])) return false;
}
// test are numbers
for (int steps = 1; steps <= 3; steps++)
{
if (!char.IsNumber(test[idx++])) return false;
}
// test are letters
for (int steps = 1; steps <= 2; steps++)
{
if (!char.IsLetter(test[idx++])) return false;
}
// test last char is number
if (!char.IsNumber(test.Last())) return false;
return true;
}
You can test the results:
private void Test(string testValue)
{
if (!MatchesPattern(testValue))
{
Console.WriteLine("Error!");
}
}

How to escape variables in C#?

I want to know what is wrong with this piece of code. I mean, how do I escape variables into strings (in C#)? (If I have correctly understood what escaping is.)
int i;
string s;
for(i=0;i<10;i++)
{
s="\u00A"+i;
Console.WriteLine(s);
}
(Actually, I want the program to write a series of Unicode characters to make a Unicode table, for example 00A9=© and 00A5=¥.
Or, in general, I want to use a variable after a backslash.)
To make it simple:
string a,b;
a="t";
b="n";
Console.WriteLine("Hi\{0}All\{1}",a,b);
Here I want to output a tab and a new line. (I know it's possible to write \n and \t in WriteLine directly, but assume we want to get the special character from the user.)
Thanks in advance. ☺

int i;
string x;
for(i=0;i<10;i++)
{
x = @"\u00A" + i; // if you want the backslash
// without @ it's x = "\\u00A" + i; // so the backslash is escaped
Console.WriteLine(x);
}
Edit
for (int i = 0; i < 10; i++)
{
Console.WriteLine(
char
.ConvertFromUtf32(
int.Parse("00A"+ i, System.Globalization.NumberStyles.HexNumber)));
}
The only thing you can do about the other problem:
string a,b;
a="\t";
b="\n";
Console.WriteLine("Hi{0}All{1}",a,b);
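Escape sequences such as \t are resolved by the compiler inside string literals, so they cannot be assembled at run time from a backslash plus a variable. If the letter really comes from the user, one option is an explicit lookup. A minimal sketch, assuming a hypothetical helper named FromLetter and a small, hand-picked set of supported letters:

```csharp
using System;

public static class EscapeLookup
{
    // Maps the letter a user typed (e.g. "t") to the character that the
    // escape sequence "\t" would produce in source code.
    public static char FromLetter(string letter)
    {
        switch (letter)
        {
            case "t": return '\t';
            case "n": return '\n';
            case "r": return '\r';
            case "0": return '\0';
            default:
                throw new ArgumentException("Unsupported escape letter: " + letter);
        }
    }
}
```

With a = "t" and b = "n" as in the question, Console.WriteLine("Hi{0}All{1}", EscapeLookup.FromLetter(a), EscapeLookup.FromLetter(b)) would then write the tab and the new line.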

A \u Unicode escape sequence requires exactly 4 hex digits:
for(var i = 0; i < 10; i++)
{
var x="\u000A"+i; // notice 3 zeros
Console.WriteLine(x);
}
Notes
Usually you'd use "\n" for a new line (or even Environment.NewLine), like Console.Write("\n" + i);
Adding an integer (or any other value) to a string implicitly calls .ToString(), so you can append any object to a string. It is often better to use String.Format("\n {0}", i) instead of + for more flexible and readable formatting (there are also better methods for joining many strings, such as StringBuilder).
If you want to write out the character with a particular integer code (e.g. 32 is a space), cast the value to char: x = "\u000A" + (char)i. You should also pick a printable range (such as 32-42).
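Putting the cast idea to work for the table the question asks about, a sketch could look like this. The UnicodeTable class and its method names are hypothetical; char.ConvertFromUtf32 turns a code point into its string form:

```csharp
using System;

public static class UnicodeTable
{
    // Formats one table entry, e.g. code point 0x00A5 becomes "00A5=¥".
    public static string FormatEntry(int codePoint)
    {
        return codePoint.ToString("X4") + "=" + char.ConvertFromUtf32(codePoint);
    }

    // Writes one entry per line for every code point in [first, last].
    public static void Print(int first, int last)
    {
        for (int code = first; code <= last; code++)
        {
            Console.WriteLine(FormatEntry(code));
        }
    }
}
```

Calling UnicodeTable.Print(0xA0, 0xAF) would list sixteen entries. Depending on the console, Console.OutputEncoding may need to be set to a Unicode encoding for the symbols to display correctly.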
