Finding symbol in text c#

I am trying to extract one of the following symbols from a text: "<", "=", "<=", ">", ">=", "<>", "¬=".
The text might look like "> 10 dalk tasd ... " or
" >= 10 asdasdasd ..". There may be a lot of whitespace characters.
I am trying to do something like the code below, but it doesn't work:
string sign = new string(textCh.SkipWhile(c => !Char.IsSymbol('>') || !Char.IsSymbol('=') || !Char.IsSymbol('<') || !Char.IsSymbol('¬'))
.TakeWhile(c => Char.IsSymbol('=') || Char.IsSymbol('>')).ToArray());
How can I retrieve it?

You don't want SkipWhile(criterion OR criterion OR criterion) with negated criteria: any character, including the one you want to take, differs from at least three of the four symbols, so at least one "is not this symbol" test is always true and the character is skipped. Note also that each test has to look at c (for example c != '<'); Char.IsSymbol('>') checks a constant and ignores c.
You could combine the SkipWhile criteria with && instead, or you could use a Regex (text is the input string):
var sign = System.Text.RegularExpressions.Regex.Match(text, "<[>=]?|=|>=?|¬=").Value;
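Both suggestions can be tried on the sample inputs from the question. The following is only a minimal sketch: the class and method names (SignFinder, FirstSign, FirstSignLinq) are made up for illustration, and the alternation lists the two-character operators first so that "<=" is not cut short to "<".

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SignFinder
{
    // Two-character operators come before their one-character prefixes.
    private static readonly Regex SignPattern = new Regex("<=|<>|>=|¬=|<|>|=");

    // Regex approach: returns the first operator found, or "" if there is none.
    public static string FirstSign(string text) => SignPattern.Match(text).Value;

    // LINQ approach: skip everything that is not an operator character,
    // then take the following run of operator characters.
    public static string FirstSignLinq(string text)
    {
        const string signChars = "<>=¬";
        return new string(text
            .SkipWhile(c => signChars.IndexOf(c) < 0)
            .TakeWhile(c => signChars.IndexOf(c) >= 0)
            .ToArray());
    }

    public static void Main()
    {
        Console.WriteLine(FirstSign("> 10 dalk tasd"));      // >
        Console.WriteLine(FirstSign("   >= 10 asdasdasd"));  // >=
        Console.WriteLine(FirstSignLinq("   <> 10"));        // <>
    }
}
```

The LINQ variant takes the whole run of operator characters, so an input such as "=>=" would come back unchanged; the regex variant only ever returns one of the listed operators.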

To extract the first such symbol I would use
Regex.Match(myString, "<[>=]?|=|>=?|!=").Value

string example = "avasvasv>asfascvd<hrthtjh";
int firstIndex = example.IndexOfAny(new char[] { '>', '<', '-', '=' });
int lastIndex = example.Substring(firstIndex + 1).IndexOfAny(new char[] { '=', '>'});
string outPutExample = example.Substring(firstIndex + 1).Substring(0, lastIndex); // OutPut -> asfascvd

Related

Is there a way to get a symbol of a non-printable character?

I want to find a way to get a symbol of a non-printable character in C# (e.g. "SOH" for start of heading and "BS" for backspace). Any ideas?
Edit: I don't need to visualize the byte value of a non-printable character, but its code name as shown here: https://web.itu.edu.tr/sgunduz/courses/mikroisl/ascii.html
Example would be "NUL" for 0x00, "SOH" for 0x01 etc.

You are probably looking for a kind of string dump that makes control characters visible. You can do it with the help of regular expressions, where \p{Cc} matches a control character:
using System.Text.RegularExpressions;
...
string source = "BEL \u0007 then CR + LF \r\n SOH \u0001 \0\0";
// To get control characters visible, we match them and
// replace with their codes
string result = Regex.Replace(
source, @"\p{Cc}",
m => $"[Control: 0x{(int)m.Value[0]:x4}]");
// Let's have a look:
// Initial string
Console.WriteLine(source);
Console.WriteLine();
// Control symbols visualized
Console.WriteLine(result);
Outcome:
BEL then CR + LF
SOH
BEL [Control: 0x0007] then CR + LF [Control: 0x000d][Control: 0x000a] SOH [Control: 0x0001] [Control: 0x0000][Control: 0x0000]
Edit: If you want to visualize them in a different way, you should edit the lambda
m => $"[Control: 0x{(int)m.Value[0]:x4}]"
For instance:
static string[] knownCodes = new string[] {
"NULL", "SOH", "STX", "ETX", "EOT", "ENQ",
"ACK", "BEL", "BS", "HT", "LF", "VT",
"FF", "CR", "SO", "SI", "DLE", "DC1", "DC2",
"DC3", "DC4", "NAK", "SYN", "ETB", "CAN",
"EM", "SUB", "ESC", "FS", "GS", "RS", "US",
};
private static string StringDump(string source) {
if (null == source)
return source;
return Regex.Replace(
source,
@"\p{Cc}",
m => {
int code = (int)(m.Value[0]);
return code < knownCodes.Length
? $"[{knownCodes[code]}]"
: $"[Control 0x{code:x4}]";
});
}
Demo:
Console.WriteLine(StringDump(source));
Outcome:
BEL [BEL] then CR + LF [CR][LF] SOH [SOH] [NULL][NULL]

In Visual Studio, just display the SOH character (U+0001), for example, and then encode it like this:
var bytes = Encoding.UTF8.GetBytes("☺");
And now you can do whatever you like with it. For Backspace, use U+232B.
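Instead of picking a glyph by hand for each character, Unicode also has a "Control Pictures" block (U+2400 to U+2421) with one symbol per C0 control character. The sketch below maps control characters onto that block; the class and method names are hypothetical, and whether the glyphs render depends on the console font.

```csharp
using System;
using System.Text;

public static class ControlPictures
{
    // Replaces C0 control characters (U+0000..U+001F) and DEL (U+007F)
    // with the matching glyph from the Unicode "Control Pictures" block.
    public static string Visualize(string source)
    {
        var sb = new StringBuilder(source.Length);
        foreach (char c in source)
        {
            if (c < 0x20)
                sb.Append((char)(0x2400 + c));   // U+2400 is the picture for NUL
            else if (c == 0x7F)
                sb.Append('\u2421');             // picture for DEL
            else
                sb.Append(c);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine(Visualize("SOH \u0001 BS \b"));
    }
}
```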

Find in the List of words with letters in certain positions

I'm doing a crossword puzzle maker. The user selects cells for words, and the program compiles a crossword puzzle from the dictionary (all words which can be used in the crossword) - List<string>.
I need to find a word (words) in a dictionary which matches given mask (pattern).
For example, I need to find all words which match
#a###g
pattern, i.e. all words of length 6 in the dictionary with "a" at index 1 and "g" at index 5.
The number of letters and their positions are not known in advance.
How can I implement this?

You can convert word description (mask)
#a###g
into corresponding regular expression pattern:
^\p{L}a\p{L}{3}g$
Pattern explained:
^ - anchor, word beginning
\p{L} - arbitrary letter
a - letter 'a'
\p{L}{3} - exactly 3 arbitrary letters
g - letter 'g'
$ - anchor, word ending
and then get all words from dictionary which match this pattern:
Code:
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
...
private static string[] Variants(string mask, IEnumerable<string> availableWords) {
    // Turn each run of '#' into "\p{L}{n}" (n arbitrary letters); other characters stay literal
    Regex regex = new Regex("^" + Regex.Replace(mask, "#+", m => $@"\p{{L}}{{{m.Length}}}") + "$");
    return availableWords
        .Where(word => regex.IsMatch(word))
        .OrderBy(word => word)
        .ToArray();
}
Demo:
string[] allWords = new [] {
"quick",
"brown",
"fox",
"jump",
"rating",
"coding",
"lazy",
"paring",
"fang",
"dog",
};
string[] variants = Variants("#a###g", allWords);
Console.Write(string.Join(Environment.NewLine, variants));
Outcome:
paring
rating

I need to find a word in a list with "a" at index 1 and "g" at index 5, like the following
wordList.Where(word => word.Length == 6 && word[1] == 'a' && word[5] == 'g')
Checking the length first is critical to prevent a crash, unless your words are already arranged into separate lists by length.
If you mean that you literally will pass "#a###g" as the parameter that conveys the search term:
var term = "#a###g";
var search = term.Select((c,i) => (Chr:c,Idx:i)).Where(t => t.Chr != '#').ToArray();
var words = wordList.Where(word => word.Length == term.Length && search.All(t => word[t.Idx] == t.Chr));
How it works:
Take "#a###g" and project it to a sequence of the index of the char and the char itself, so ('#', 0),('a', 1),('#', 2),('#', 3),('#', 4),('g', 5)
Discard the '#', leaving only ('a', 1),('g', 5)
This means "'a' at position 1 and 'g' at 5"
Search the word list, demanding that the word length is the same as that of "#a###g", and also that All the search terms match when we "get the char out of the word at Idx and check it matches the Chr in the search term".
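The steps above can be put together into a small runnable program. This is a sketch only: the wrapper class MaskSearch and its Find method are illustrative names, and the word list is borrowed from the regex answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MaskSearch
{
    // '#' stands for "any character"; every other mask character must match exactly.
    public static List<string> Find(string mask, IEnumerable<string> words)
    {
        // Keep only the fixed positions, e.g. ('a', 1) and ('g', 5) for "#a###g".
        var fixedChars = mask
            .Select((chr, idx) => (Chr: chr, Idx: idx))
            .Where(t => t.Chr != '#')
            .ToArray();

        // The length check runs first, so the indexer below never goes out of range.
        return words
            .Where(w => w.Length == mask.Length
                     && fixedChars.All(t => w[t.Idx] == t.Chr))
            .ToList();
    }

    public static void Main()
    {
        var words = new[] { "quick", "rating", "coding", "paring", "fang" };
        Console.WriteLine(string.Join(", ", Find("#a###g", words)));  // rating, paring
    }
}
```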

Regular expression replace (C#)

How to make Regex.Replace for the following texts:
1) "Name's", "Sex", "Age", "Height_(in)", "Weight (lbs)"
2) " LatD", "LatM ", 'LatS', "NS", "LonD", "LonM", "LonS", "EW", "City", "State"
Result:
1) Name's, Sex, Age, Height (in), Weight (lbs)
2) LatD, LatM, LatS, NS, LonD, LonM, LonS, EW, City, State
Spaces between brackets can be any size (Example 1). There may also be incorrect spaces in brackets (Example 2). Also, instead of spaces, the "_" sign can be used (Example 1). And instead of double quotes, single quotes can be used (Example 2).
As a result, words must be separated with a comma and a space.
Snippet of my code
StreamReader fileReader = new StreamReader(...);
var fileRow = fileReader.ReadLine();
fileRow = Regex.Replace(fileRow, "_", " ");
fileRow = Regex.Replace(fileRow, "\"", "");
var fileDataField = fileRow.Split(',');

I don't know C# syntax well, but this regex does the job (it is written in PCRE syntax; .NET does not support \h, so use [ \t] there):
Find: (?:_|^["']\h*|\h*["']$|\h*["']\h*,\h*["']\h*)
Replace: A space
Explanation:
(?: # non capture group
_ # undersscore
| # OR
^["']\h* # beginning of line, quote or apostrophe, 0 or more horizontal spaces
| # OR
\h*["']$ # 0 or more horizontal spaces, quote or apostrophe, end of line
| # OR
\h*["']\h* # 0 or more horizontal spaces, quote or apostrophe, 0 or more horizontal spaces
, # comma
\h*["']\h* # 0 or more horizontal spaces, quote or apostrophe, 0 or more horizontal spaces
) # end group
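For use from C#, the pattern needs a small adaptation, because the .NET regex engine does not recognize \h. The sketch below swaps in [ \t] and, as an assumption about the intended result, splits the work into three steps so that the separators become ", " as the question requires. HeaderCleaner and Clean are made-up names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HeaderCleaner
{
    // A quote at the very start or very end of the row, with surrounding blanks.
    private static readonly Regex Edges =
        new Regex(@"^[ \t]*[""'][ \t]*|[ \t]*[""'][ \t]*$");

    // closing quote, comma, opening quote, with optional blanks in between.
    private static readonly Regex Separator =
        new Regex(@"[ \t]*[""'][ \t]*,[ \t]*[""'][ \t]*");

    public static string Clean(string row)
    {
        string result = Edges.Replace(row, "");       // drop the outer quotes
        result = Separator.Replace(result, ", ");     // normalize the separators
        return result.Replace('_', ' ');              // underscores become spaces
    }

    public static void Main()
    {
        Console.WriteLine(Clean("\"Name's\", \"Sex\", \"Age\", \"Height_(in)\", \"Weight (lbs)\""));
        Console.WriteLine(Clean("\" LatD\", \"LatM \", 'LatS', \"NS\""));
    }
}
```

The apostrophe in Name's is left alone because it is neither at the edge of the row nor next to a comma.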

How about a simple straight string manipulation way?
using System;
using System.Linq;
static void Main(string[] args)
{
string dirty1 = "\"Name's\", \"Sex\", \"Age\", \"Height_(in)\", \"Weight (lbs)\"";
string dirty2 = "\" LatD\", \"LatM \", 'LatS', \"NS\", \"LonD\", \"LonM\", \"LonS\", \"EW\", \"City\", \"State\"";
Console.WriteLine(Clean(dirty1));
Console.WriteLine(Clean(dirty2));
Console.ReadKey();
}
private static string Clean(string dirty)
{
return dirty.Split(',').Select(item => item.Trim(' ', '"', '\'')).Aggregate((a, b) => string.Join(", ", a, b));
}
private static string CleanNoLinQ(string dirty)
{
string[] items = dirty.Split(',');
for(int i = 0; i < items.Length; i++)
{
items[i] = items[i].Trim(' ', '"', '\'');
}
return String.Join(", ", items);
}
You can even replace the LINQ with a foreach loop followed by string.Join().
Easier to understand - easier to maintain.

C# How to split text, but without removing delimiter?

I want to split text by mathematical symbols: (, ), -, +, /, *, ^.
For example, "(3*21)+4/2" should produce the array {"(","3","*","21",")","+","4","/","2"}.
I tried to do that with Regex.Split, but the brackets are problematic.

You can run through the source string, appending to the current array cell while the current character is a digit, and moving on to the next cell when it is not (for "(", "*", "-", etc.).
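A minimal sketch of that scanning idea, using a List<string> instead of a fixed-size array and assuming that numbers consist of digits only (no decimals or signs). Tokenizer and Tokenize are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class Tokenizer
{
    public static List<string> Tokenize(string expression)
    {
        var tokens = new List<string>();
        var number = new StringBuilder();

        foreach (char c in expression)
        {
            if (char.IsDigit(c))
            {
                number.Append(c);              // still inside a number
                continue;
            }
            if (number.Length > 0)             // a non-digit ends the current number
            {
                tokens.Add(number.ToString());
                number.Clear();
            }
            if (!char.IsWhiteSpace(c))
                tokens.Add(c.ToString());      // every symbol is its own token
        }

        if (number.Length > 0)
            tokens.Add(number.ToString());     // flush a trailing number

        return tokens;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", Tokenize("(3*21)+4/2")));  // ( 3 * 21 ) + 4 / 2
    }
}
```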

Not sure what problem you encountered with Regex.Split, but it seems quite simple. All you have to do is escape the characters that have a special meaning in regex. Like so:
string input = "(3*21+[3-5])+4/2";
string pattern = @"(\()|(\))|(\d+)|(\*)|(\+)|(-)|(/)|(\[)|(\])";
var result = Regex.Matches(input, pattern);
var result2 = Regex.Split(input, pattern);
Edit: updated the pattern; '-' and '/' don't have to be escaped.
Afterwards you have two options. The first is Split, which gives you a string array, but with an empty string between every pair of adjacent matches. That's why I think you should go for Matches; transforming the result into an array of strings is simple afterwards.
string[] stringResult = (from Match match in result select match.Value).ToArray();
stringResult
{string[15]}
[0]: "("
[1]: "3"
[2]: "*"
[3]: "21"
[4]: "+"
[5]: "["
[6]: "3"
[7]: "-"
[8]: "5"
[9]: "]"
[10]: ")"
[11]: "+"
[12]: "4"
[13]: "/"
[14]: "2"
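If Regex.Split is preferred after all, the empty strings can simply be filtered out. The sketch below relies on the documented behavior that Regex.Split includes the text of capturing groups in its result; the single character class is my own condensed variant of the pattern above (with '^' added from the question), not the answer's original.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitKeepingDelimiters
{
    public static string[] Split(string input)
    {
        // The capturing group makes Regex.Split return the delimiters as well.
        // Inside the character class, '^' is literal because it is not first,
        // and '-' is literal because it is last.
        return Regex.Split(input, @"([()\[\]*+/^-])")
                    .Where(s => s.Length > 0)   // drop the empty pieces
                    .ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", Split("(3*21)+4/2")));  // ( 3 * 21 ) + 4 / 2
    }
}
```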

Something like this may come in handy.
First, read the whole input with Console.ReadLine, or, if you already have a string, store it:
string input = Console.ReadLine();
Then create an array whose length is input.Length:
string[] arr = new string[input.Length];
// Make sure your input doesn't contain spaces
Then store each character of the string in the corresponding array cell, like this:
arr[0] = input[0].ToString();
Do this for all the characters, for example with a for loop:
for (int i = 0; i < input.Length; i++) {
    arr[i] = input[i].ToString();
}
Note that this stores one character per cell, so a multi-digit number such as 21 ends up in two cells.

C# string.IsNullOrWhiteSpace("\t") == true

I have a line of code
var delimiter = string.IsNullOrWhiteSpace(foundDelimiter) ? "," : foundDelimiter;
when foundDelimiter is "\t", string.IsNullOrWhiteSpace returns true.
Why? And what is the appropriate way to work around this?

\t is the tab character, which is whitespace. In C# you can do either of these to get a tab:
var tab1 = "\t";
var tab2 = " ";
var areEqual = tab1 == tab2; //returns true
Edit: As noted by Magus, SO is converting my tab character into spaces when the answer gets rendered. If you're in your IDE you'd just hit quote, tab, quote.
As far as a workaround goes, I'd suggest you just add a check for tabs in your conditional.
var delimiter = string.IsNullOrWhiteSpace(foundDelimiter) && foundDelimiter != "\t" ? "," : foundDelimiter;
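Another possible workaround, sketched here under the assumption that any non-empty string should count as a valid delimiter, is string.IsNullOrEmpty, which checks only for null and zero length. DelimiterDefault and Resolve are hypothetical names.

```csharp
using System;

public static class DelimiterDefault
{
    // IsNullOrEmpty does not inspect the content, so "\t" is kept as it is.
    public static string Resolve(string foundDelimiter) =>
        string.IsNullOrEmpty(foundDelimiter) ? "," : foundDelimiter;

    public static void Main()
    {
        Console.WriteLine(Resolve(null) == ",");     // True
        Console.WriteLine(Resolve("") == ",");       // True
        Console.WriteLine(Resolve("\t") == "\t");    // True
    }
}
```

Unlike the tab-only check above, this also accepts a single space or any other whitespace string as a delimiter.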

Welcome to Unicode.
What did you expect would happen? HT (horizontal tab) has been a whitespace character for decades. The "classic" C-language definition of white-space characters consists of the US-ASCII characters:
SP: space (0x20,' ')
HT: horizontal tab (0x09,'\t')
LF: line feed (0x0A, '\n')
VT: vertical tab (0x0B, '\v')
FF: form feed (0x0C, '\f')
CR: carriage return (0x0D, '\r')
Unicode is a little more...ecumenical in its approach: its definition of white-space characters is this set:
Members of the Unicode category SpaceSeparator:
SPACE (U+0020)
OGHAM SPACE MARK (U+1680)
MONGOLIAN VOWEL SEPARATOR (U+180E)
EN QUAD (U+2000)
EM QUAD (U+2001)
EN SPACE (U+2002)
EM SPACE (U+2003)
THREE-PER-EM SPACE (U+2004)
FOUR-PER-EM SPACE (U+2005)
SIX-PER-EM SPACE (U+2006)
FIGURE SPACE (U+2007)
PUNCTUATION SPACE (U+2008)
THIN SPACE (U+2009)
HAIR SPACE (U+200A)
NARROW NO-BREAK SPACE (U+202F)
MEDIUM MATHEMATICAL SPACE (U+205F)
IDEOGRAPHIC SPACE (U+3000)
Members of the Unicode category LineSeparator, which consists solely of
LINE SEPARATOR (U+2028)
Member of the Unicode category ParagraphSeparator, which consists solely of
PARAGRAPH SEPARATOR (U+2029)
These Basic Latin/C0 Controls/US-ASCII characters:
CHARACTER TABULATION (U+0009)
LINE FEED (U+000A)
LINE TABULATION (U+000B)
FORM FEED (U+000C)
CARRIAGE RETURN (U+000D)
These C1 Controls and Latin-1 Supplement characters
NEXT LINE (U+0085)
NO-BREAK SPACE (U+00A0)
If you don't like the definition, roll your own along these lines (plug in your own character set):
public static bool IsNullOrCLanguageWhitespace( this string s )
{
bool value = ( s == null || rxWS.IsMatch(s) ) ;
return value ;
}
private static Regex rxWS = new Regex( @"^[ \t\n\v\f\r]*$") ;
You might want to add a char analog as well:
public static bool IsCLanguageWhitespace( this char c )
{
bool value ;
switch ( c )
{
case ' ' : value = true ; break ;
case '\t' : value = true ; break ;
case '\n' : value = true ; break ;
case '\v' : value = true ; break ;
case '\f' : value = true ; break ;
case '\r' : value = true ; break ;
default : value = false ; break ;
}
return value ;
}
