Well, I know this can be very basic functionality of C#.
But I didn't used since years so Asking this....
I have a string like MyName-1_1#1233
I want to pick only the numbers/characters from between of - , _ and # ...
I can use split function, but it take quite big code...is there anything else ?
for picking the numbers from the string, I supposed to write like below code
string[] words = s.Split('-');
foreach (string word in words)
{
//getting two separate string and have to pick the number using index...
}
string[] words = s.Split('_');
foreach (string word in words)
{
//getting two separate string and have to pick the number using index...
}
string[] words = s.Split('#');
foreach (string word in words)
{
//getting two separate string and have to pick the number using index...
}
You can use regular expressions for this:
string S = "-1-2#123#3";
foreach (Match m in Regex.Matches(S, "(?<=[_#-])(\\d+)(?=[_#-])?"))
{
Console.WriteLine(m.Groups[1]);
}
A bit shorter:
List<char> badChars = new List<char>{'-','_','#'};
string str = "MyName-1_1#1233";
string output = new string(str.Where(ch => !badChars.Contains(ch)).ToArray());
output will be MyName111233
If you want only the numbers then:
string str = "MyName-1_1#1233";
string output = new string(str.Where(ch => char.IsDigit(ch)).ToArray());
output will be 111233
Related
I would like to split a string into a list or array by a specific tag.
<START><A>message<B>UnknownLengthOfText<BEOF><AEOF><A>message<B>UnknownLengthOfText<BEOF><AEOF><END>
I want to split the above example into two items, the items being the strings between the <A> and <AEOF> tags
Any help is appreciated.
I would suggest simple regex for this.
Take a look at this example:
using System.Diagnostics;
using System.Text.RegularExpressions;
...
Regex regex = new Regex("<A>(.*?)<B><BEOF>(.*?)<AEOF>");
string myString = #"<START><A>message<B><BEOF>UnknownLengthOfText<AEOF><A>message<B><BEOF>some other line of text<AEOF><END>";
MatchCollection matches = regex.Matches(myString);
foreach (Match m in matches)
{
Debug.WriteLine(m.Groups[1].ToString(), m.Groups[2].ToString());
}
EDIT:
Since string is in one line, regex should be "lazy", marked with lazy quantifier ?. Also, I changed regex so that it uses sTrenat's suggestion to automatically parse message and title also.
So, instead of
Regex regex = new Regex("<A>(.*)<AEOF>");
I used
Regex regex = new Regex("<A>(.*?)<B><BEOF>(.*?)<AEOF>");
Notice additional ? which marks lazy quantifier, to stop when it finds first match between tags (without ? whole strign will be captured and not n messages between tags)
Try it with something like this:
string test = #"<START>
<A>message<B><BEOF>UnknownLengthOfText<AEOF>
<A>message<B><BEOF>UnknownLengthOfText<AEOF>
<END>";
//for this test this will give u an array containing 3 items...
string[] tmp1 = test.Split("<AEOF>");
//here u will store your results in
List<string> results = new List<string>();
//for every single one of those 3 items:
foreach(string item in tmp1)
{
//this will only be true for the first and second item
if(item.Contains("<A>"))
{
string[] tmp2 = item.Split("<A>");
//As the string you are looking for is always BEHIND the <A> you
//store the item[1], (the item[0] would be in front)
results.Add(tmp2[1]);
}
}
Rather than using the String.Split you can use the Regex.Split as below
var stringToSplit = #"<START>
<A>message<B>UnknownLengthOfText<BEOF><AEOF>
<A>message<B>UnknownLengthOfText<BEOF><AEOF>
<END>";
var regex = "<A>(.*)<AEOF>";
var splitStrings = Regex.Split(stringToSplit, regex);
splitStrings will contain 4 elements
splitString[0] = "<START>"
splitString[1] = "message<B>UnknownLengthOfText<BEOF>"
splitString[2] = "message<B>UnknownLengthOfText<BEOF>"
splitString[3] = "<END>"
Playing with the regex could give you only the strings between and
All answer so far are regex based. Here is an alternative without:
Try it Online!
var input = #"
<START>
<A>message<B>UnknownLengthOfText<BEOF><AEOF>
<A>message<B>UnknownLengthOfText<BEOF><AEOF>
<END>";
var start = "<A>";
var end = "<AEOF>";
foreach (var item in ExtractEach(input, start, end))
{
Console.WriteLine(item);
}
}
public static IEnumerable<string> ExtractEach(string input, string start, string end)
{
foreach (var line in input
.Split(Environment.NewLine.ToCharArray())
.Where(x=> x.IndexOf(start) > 0 && x.IndexOf(start) < x.IndexOf(end)))
{
yield return Extract(line, start, end);
}
}
public static string Extract(string input, string start, string end)
{
int startPosition = input.LastIndexOf(start) + start.Length;
int length = input.IndexOf(end) - startPosition;
var substring = input.Substring(startPosition, length);
return substring;
}
I'm trying to remove all conjunctions and pronouns from any array of strings(let call that array A), The words to be removed are read from a text file and converted into an array of strings(lets call that array B).
What I need is to Get the first element of array A and compare it to every word in array B, if the word matches I want to delete the word of array A.
For example:
array A = [0]I [1]want [2]to [3]go [4]home [5]and [6]sleep
array B = [0]I [1]and [2]go [3]to
Result= array A = [0]want [1]home [2]sleep
//remove any duplicates,conjunctions and Pronouns
public IQueryable<All_Articles> removeConjunctionsProNouns(IQueryable<All_Articles> myArticles)
{
//get words to be removed
string text = System.IO.File.ReadAllText("A:\\EnterpriceAssigment\\EnterpriceAssigment\\TextFiles\\conjunctions&ProNouns.txt").ToLower();
//split word into array of strings
string[] wordsToBeRemoved = text.Split(',');
//all articles
foreach (var article in myArticles)
{
//split articles into words
string[] articleSplit = article.ArticleContent.ToLower().Split(' ');
//loop through array of articles words
foreach (var y in articleSplit)
{
//loop through words to be removed from articleSplit
foreach (var x in wordsToBeRemoved)
{
//if word of articles matches word to be removed, remove word from article
if (y == x)
{
//get index of element in array to be removed
int g = Array.IndexOf(articleSplit,y);
//assign elemnt to ""
articleSplit[g] = "";
}
}
}
//re-assign splitted article to string
article.ArticleContent = articleSplit.ToString();
}
return myArticles;
}
If it is possible as well, I need array A to have no duplicates/distinct values.
You are looking for IEnumerable.Except, where the passed parameter is applied to the input sequence and every element of the input sequence not present in the parameter list is returned only once
For example
string inputText = "I want this string to be returned without some words , but words should have only one occurence";
string[] excludedWords = new string[] {"I","to","be", "some", "but", "should", "have", "one", ","};
var splitted = inputText.Split(' ');
var result = splitted.Except(excludedWords);
foreach(string s in result)
Console.WriteLine(s);
// ---- Output ----
want
this
string
returned
without
words <<-- This appears only once
only
occurence
And applied to your code is:
string text = File.ReadAllText(......).ToLower();
string[] wordsToBeRemoved = text.Split(',');
foreach (var article in myArticles)
{
string[] articleSplit = article.ArticleContent.ToLower().Split(' ');
var result = articleSplit.Except(wordsToBeRemoved);
article.ArticleContent = string.Join(" ", result);
}
You may have your answer already in your code. I am sure your code could be cleaned up a bit, as all our code could be. You loop through articleSplit and pull out each word. Then compare that word to the words in the wordsToBeRemoved array in a loop one by one. You use your conditional to compare and when true you remove the items from your original array, or at least try.
I would create another array of the results and then display, use or what ever you'd like with the array minus the words to exclude.
Loop through articleSplit
foreach x in arcticle split
foreach y in wordsToBeRemoved
if x != y newArray.Add(x)
However this is quite a bit of work. You may want to use array.filter and then add that way. There is a hundred ways to achieve this.
Here are some helpful articles:
filter an array in C#
https://msdn.microsoft.com/en-us/library/d9hy2xwa(v=vs.110).aspx
These will save you from all that looping.
You want to remove stop words. You can do it with a help of Linq:
...
string filePath = #"A:\EnterpriceAssigment\EnterpriceAssigment\TextFiles\conjunctions & ProNouns.txt";
// Hashset is much more efficient than array in the context
HashSet<string> stopWords = new HashSet<string>(File
.ReadLines(filePath), StringComparer.OrdinalIgnoreCase);
foreach (var article in myArticles) {
// read article, split into words, filter out stop words...
var cleared = article
.ArticleContent
.Split(' ')
.Where(word => !stopWords.Contains(word));
// ...and join words back into article
article.ArticleContent = string.Join(" ", cleared);
}
...
Please, notice, that I've preserved Split() which you've used in your code and so you have a toy implementation. In real life you have at least to take punctuation into consideration, and that's why a better code uses regular expressions:
foreach (var article in myArticles) {
// read article, extract words, filter out stop words...
var cleared = Regex
.Matches(article.ArticleContent, #"\w+") // <- extract words
.OfType<Match>()
.Select(match => match.Value)
.Where(word => !stopWords.Contains(word));
// ...and join words back into article
article.ArticleContent = string.Join(" ", cleared);
}
I have the following text in an Excel spreadsheet cell:
"Calories (kcal) "
(minus quotes).
I can get the value of the cell into my code:
string nutrientLabel = dataRow[0].ToString().Trim();
I'm new to C# and need help in separating the "Calories" and "(kcal)" to to different variables that I can upload into my online system. I need the result to be two strings:
nutrientLabel = Calories
nutrientUOM = kcal
I've googled the hell out of this and found out how to make it work to separate them and display into Console.WriteLine but I need the values actually out to 2 variables.
foreach (DataRow dataRow in nutrientsdataTable.Rows)
{
string nutrientLabel = dataRow[0].ToString().Trim();
}
char[] paraSeparator = new char[] { '(', ')' };
string[] result;
Console.WriteLine("=======================================");
Console.WriteLine("Para separated strings :\n");
result = nutrientLabel.Split(paraSeparator,
StringSplitOptions.RemoveEmptyEntries);
foreach (string str in result)
{
Console.WriteLine(str);
}
You can use a simple regex for this:
var reg = new Regex(#"(?<calories>\d+)\s\((?<kcal>\d+)\)");
Which essentially says:
Match at least one number and store it in the group 'calories'
Match a space and an opening parenthesis
Match at least one number and store it in the group 'kcal'
Match a closing parenthesis
Then we can extract the results using the named groups:
var sampleInput = "15 (35)";
var match = reg.Match(sampleInput);
var calories = match.Groups["calories"];
var kcal = match.Groups["kcal"];
Note that calories and kcal are still strings here, you'll need to parse them into an integer (or decimal)
string [] s = dataRow[0].ToString().Split(' ');
nutrientLabel = s[0];
nutrientUOM = s[1].Replace(")","").Replace("(","");
I'm trying to work on this string
abc
def
--------------
efg
hij
------
xyz
pqr
--------------
Now I have to split the string with the - character.
So far I'm first spliting the string in lines and then finding the occurrence of - and the replacing the line with a single *, then combining the whole string and splitting them again.
I'm trying to get the data as
string[] set =
{
"abc
def",
"efg
hij",
"xyz
pqr"
}
Is there a better way to do this?
var spitStrings = yourString.Split(new char[] { '-' }, StringSplitOptions.RemoveEmptyEntries);
If i understand your question, this above code solves it.
Use of string split function using the specific char or string of chars (here -) can be used.
The output will be array of strings. Then choose whichever strings you want.
Example:
http://www.dotnetperls.com/split
I'm confused with exactly what you're asking, but won't this work?
string[] seta =
{
"abc\ndef",
"efg\nhij",
"xyz\npqr"
}
\n = CR (Carriage Return) // Used as a new line character in Unix
\r = LF (Line Feed) // Used as a new line character in Mac OS
\n\r = CR + LF // Used as a new line character in Windows
(char)13 = \n = CR // Same as \n
If I'm understanding your question about splitting -'s then the following should work.
string s = "abc-def-efg-hij-xyz-pqr"; // example?
string[] letters = s.Split(new char[] { '-' }, StringSplitOptions.RemoveEmptyEntries);
If this is what your array looks like at the moment, then you can loop through it as follows:
string[] seta = {
"abc-def",
"efg-hij",
"xyz-pqr"
};
foreach (var letter in seta)
{
string[] letters = letter.Split(new char[] { '-' }, StringSplitOptions.RemoveEmptyEntries);
// do something with letters?
}
I'm sure this below code will help you...
string m = "adasd------asdasd---asdasdsad-------asdasd------adsadasd---asdasd---asdadadad-asdadsa-asdada-s---adadasd-adsd";
var array = m.Split('-');
List<string> myCollection = new List<string>();
if (array.Length > 0)
{
foreach (string item in array)
{
if (item != "")
{
myCollection.Add(item);
}
}
}
string[] str = myCollection.ToArray();
if it does then don't forget to mark my answer thanks....;)
string set = "abc----def----------------efg----hij--xyz-------pqr" ;
var spitStrings = set.Split(new char[]{'-'},StringSplitOptions.RemoveEmptyEntries);
EDIT -
He wants to split the strings no matter how many '-' are there.
var spitStrings = set.Split(new char[]{'-'},StringSplitOptions.RemoveEmptyEntries);
This will do the work.
I'm wondering how I can replace (remove) multiple words (like 500+) from a string. I know I can use the replace function to do this for a single word, but what if I want to replace 500+ words? I'm interested in removing all generic keywords from an article (such as "and", "I", "you" etc).
Here is the code for 1 replacement.. I'm looking to do 500+..
string a = "why and you it";
string b = a.Replace("why", "");
MessageBox.Show(b);
Thanks
# Sergey Kucher Text size will vary between a few hundred words to a few thousand. I am replacing these words from random articles.
I would normally do something like:
// If you want the search/replace to be case sensitive, remove the
// StringComparer.OrdinalIgnoreCase
Dictionary<string, string> replaces = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase) {
// The format is word to be searched, word that should replace it
// or String.Empty to simply remove the offending word
{ "why", "xxx" },
{ "you", "yyy" },
};
void Main()
{
string a = "why and you it and You it";
// This will search for blocks of letters and numbers (abc/abcd/ab1234)
// and pass it to the replacer
string b = Regex.Replace(a, #"\w+", Replacer);
}
string Replacer(Match m)
{
string found = m.ToString();
string replace;
// If the word found is in the dictionary then it's placed in the
// replace variable by the TryGetValue
if (!replaces.TryGetValue(found, out replace))
{
// otherwise replace the word with the same word (so do nothing)
replace = found;
}
else
{
// The word is in the dictionary. replace now contains the
// word that will substitute it.
// At this point you could add some code to maintain upper/lower
// case between the words (so that if you -> xxx then You becomes Xxx
// and YOU becomes XXX)
}
return replace;
}
As someone else wrote, but without problems with substrings (the ass principle... You don't want to remove asses from classes :-) ), and working only if you only need to remove words:
var escapedStrings = yourReplaces.Select(Regex.Escape);
string result = Regex.Replace(yourInput, #"\b(" + string.Join("|", escapedStrings) + #")\b", string.Empty);
I use the \b word boundary... It's a little complex to explain what it's, but it's useful to find word boundaries :-)
Create a list of all text you want and load it into a list, you do this fairly simple or get very complex. A trivial example would be:
var sentence = "mysentence hi";
var words = File.ReadAllText("pathtowordlist.txt").Split(Enviornment.NewLine);
foreach(word in words)
sentence.replace("word", "x");
You could create two lists if you wanted a dual mapping scheme.
Try this:
string text = "word1 word2 you it";
List<string> words = new System.Collections.Generic.List<string>();
words.Add("word1");
words.Add("word2");
words.ForEach(w => text = text.Replace(w, ""));
Edit
If you want to replace text with another text, you can create class Word:
public class Word
{
public string SearchWord { get; set; }
public string ReplaceWord { get; set; }
}
And change above code to this:
string text = "word1 word2 you it";
List<Word> words = new System.Collections.Generic.List<Word>();
words.Add(new Word() { SearchWord = "word1", ReplaceWord = "replaced" });
words.Add(new Word() { SearchWord = "word2", ReplaceWord = "replaced" });
words.ForEach(w => text = text.Replace(w.SearchWord, w.ReplaceWord));
if you are talking about a single string the solution is to remove them all by a simple replace method. as you can read there:
"Returns a new string in which all occurrences of a specified string in the current instance are replaced with another specified string".
you may be needing to replace several words, and you can make a list of these words:
List<string> wordsToRemove = new List<string>();
wordsToRemove.Add("why");
wordsToRemove.Add("how);
and so on
and then remove them from the string
foreach(string curr in wordsToRemove)
a = a.ToLower().Replace(curr, "");
Importent
if you want to keep your string as it was, without lowering words and without struggling with lower and upper case use
foreach(string curr in wordsToRemove)
// You can reuse this object
Regex regex = new Regex(curr, RegexOptions.IgnoreCase);
myString = regex.Replace(myString, "");
depends on the situation ofcourse,
but if your text is long and you have many words,
and you want optimize performance.
you should build a trie from the words, and search the Trie for a match.
it won't lower the Order of complexity, still O(nm), but for large groups of words, it will be able to check multiple words against each char instead of one by one.
i can assume couple of houndred words should be enough to get this faster.
This is the fastest method in my opinion and
i written a function for you to start with:
public struct FindRecord
{
public int WordIndex;
public int PositionInString;
}
public static FindRecord[] FindAll(string input, string[] words)
{
LinkedList<FindRecord> result = new LinkedList<FindRecord>();
int[] matchs = new int[words.Length];
for (int i = 0; i < input.Length; i++)
{
for (int j = 0; j < words.Length; j++)
{
if (input[i] == words[j][matchs[j]])
{
matchs[j]++;
if(matchs[j] == words[j].Length)
{
FindRecord findRecord = new FindRecord {WordIndex = j, PositionInString = i - matchs[j] + 1};
result.AddLast(findRecord);
matchs[j] = 0;
}
}
else
matchs[j] = 0;
}
}
return result.ToArray();
}
Another option:
it might be the rare case where regex will be faster then building the code.
Try using
public static string ReplaceAll(string input, string[] words)
{
string wordlist = string.Join("|", words);
Regex rx = new Regex(wordlist, RegexOptions.Compiled);
return rx.Replace(input, m => "");
}
Regex can do this better, you just need all the replace words in a list, and then:
var escapedStrings = yourReplaces.Select(PadAndEscape);
string result = Regex.Replace(yourInput, string.Join("|", escapedStrings);
This requires a function that space-pads the strings before escaping them:
public string PadAndEscape(string s)
{
return Regex.Escape(" " + s + " ");
}