String split by every 3 words - c#

I need to split each of my strings into overlapping groups of three words. For example, given:
"Economic drive without restrictions"
I need an array of substrings like this:
"Economic drive without"
"drive without restrictions"
For now I have this:
List<string> myStrings = new List<string>();
foreach(var text in INPUT_TEXT) //here is Economic drive without restrictions
{
myStrings.DefaultIfEmpty();
var textSplitted = text.Split(new char[] { ' ' });
int j = 0;
foreach(var textSplit in textSplitted)
{
int i = 0 + j;
string threeWords = "";
while(i != 3 + j)
{
if (i >= textSplitted.Count()) break;
threeWords = threeWords + " " + textSplitted[i];
i++;
}
myStrings.Add(threeWords);
j++;
}
}

You could use this LINQ query:
string text = "Economic drive without restrictions";
string[] words = text.Split();
List<string> myStrings = words
.Where((word, index) => index + 3 <= words.Length)
.Select((word, index) => String.Join(" ", words.Skip(index).Take(3)))
.ToList();
Because others commented that it would be better to show a loop version since OP is learning this language, here is a version that uses no LINQ at all:
List<string> myStrings = new List<string>();
for (int index = 0; index + 3 <= words.Length; index++)
{
string[] slice = new string[3];
Array.Copy(words, index, slice, 0, 3);
myStrings.Add(String.Join(" ", slice));
}

Here is a simple solution that I hope is easier to understand.
List<string> myStrings = new List<string>();
string input = "Economic drive without restrictions";
var allWords = input.Split(new char[] {' '});
for (int i = 0; i < allWords.Length - 2; i++)
{
var textSplitted = allWords.Skip(i).Take(3);
string threeString = string.Join(" ", textSplitted);
myStrings.Add(threeString);
}
foreach (var myString in myStrings)
{
Console.WriteLine(myString);
}
Take(n) is a LINQ method that returns the first n elements of a sequence. For example, for the array a,b,c,d,e, Take(3) yields a,b,c.
Skip(n) is also a LINQ method. It returns the sequence without its first n elements: for a,b,c,d,e, Skip(1) yields b,c,d,e.
Combining the two lets you slide a three-word window along the array, one word at a time.
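If the window size might change later, the same loop can be wrapped in a small helper. This is only a sketch: WordWindows and Slide are names made up for illustration, and it assumes words are separated by spaces (repeated spaces are ignored).

```csharp
using System;
using System.Collections.Generic;

public static class WordWindows
{
    // Returns every run of `size` consecutive words, joined by single spaces.
    // Returns an empty list when the text has fewer than `size` words.
    public static List<string> Slide(string text, int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));

        // RemoveEmptyEntries keeps repeated spaces from producing empty "words".
        string[] words = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        var result = new List<string>();
        for (int start = 0; start + size <= words.Length; start++)
        {
            // This Join overload takes a start index and a count, so no copy of the slice is needed.
            result.Add(string.Join(" ", words, start, size));
        }
        return result;
    }
}
```

Calling WordWindows.Slide("Economic drive without restrictions", 3) gives the two strings from the question.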

Just for comparative purposes, here's another solution that doesn't use Linq:
string[] words = INPUT_TEXT.Split();
List<string> myStrings = new List<string>();
for (int i = 0; i < words.Length - 2; ++i)
myStrings.Add(string.Join(" ", words[i], words[i+1], words[i+2]));
Or using ArraySegment<string>:
string[] words = INPUT_TEXT.Split();
List<string> myStrings = new List<string>();
for (int i = 0; i < words.Length - 2; ++i)
myStrings.Add(string.Join(" ", new ArraySegment<string>(words, i, 3)));

I would use one of the methods described here; for instance, the following, which takes the elements three at a time.
var groups = myStrings.Select((p, index) => new {p,index})
.GroupBy(a =>a.index/3);
Warning: this is not the most memory-efficient approach. If you start parsing big strings it might blow up on you, so try it and observe.
Then you only need to handle the last group. If it has fewer than 3 strings, fill it up from the left.
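A rough sketch of that idea, assuming the input is the array of words rather than the finished strings (WordChunks and ChunkByThree are made-up names). Note that grouping by index / 3 produces consecutive, non-overlapping groups, so it only matches the sliding-window answers for short inputs such as the four-word example.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class WordChunks
{
    // Splits `words` into consecutive groups of 3. A short final group is
    // replaced by the last 3 words of the input ("filled up from the left").
    public static List<string> ChunkByThree(string[] words)
    {
        List<string[]> groups = words
            .Select((word, index) => new { word, index })
            .GroupBy(x => x.index / 3)
            .Select(g => g.Select(x => x.word).ToArray())
            .ToList();

        bool lastIsShort = groups.Count > 0 && groups[groups.Count - 1].Length < 3;
        if (lastIsShort && words.Length >= 3)
        {
            groups[groups.Count - 1] = words.Skip(words.Length - 3).ToArray();
        }

        return groups.Select(g => string.Join(" ", g)).ToList();
    }
}
```

For seven words a to g this returns "a b c", "d e f" and "e f g", whereas a sliding window would return five strings.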

Related

Remove Identical Words from a string array

The goal is to remove a common prefix from every string in a string array, for example ["Market1", "Market2", "Market3"]. The prefix Market is shared by all the strings, so removing it should give ["1", "2", "3"]. Please note that the prefix could be anything, not just Market.
Look for the first character that is not identical among all strings and select a substring starting at that position to remove the prefix.
string[] words = new string[] { "Market1", "Market2", "Market3" };
int i = 0;
while (words.All(word => word.Length > i && word[i] == words[0][i])) ++i;
var wordsWithoutPrefixes = words.Select(word => word.Substring(i)).ToArray();
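A minimal sketch of the same idea wrapped in helpers (PrefixTools, CommonPrefixLength and StripCommonPrefix are names made up for illustration). It assumes an ordinal, case-sensitive comparison and adds a guard for an empty array, where the bare while loop would never terminate because All returns true for an empty sequence.

```csharp
using System.Linq;

public static class PrefixTools
{
    // Length of the longest prefix shared by every string (character-by-character comparison).
    public static int CommonPrefixLength(string[] words)
    {
        if (words.Length == 0) return 0;

        int length = 0;
        // Stops as soon as any word is too short or differs from the first word at this position.
        while (words.All(w => w.Length > length && w[length] == words[0][length]))
        {
            length++;
        }
        return length;
    }

    // Returns a new array with the shared prefix removed from every entry.
    public static string[] StripCommonPrefix(string[] words)
    {
        int length = CommonPrefixLength(words);
        return words.Select(w => w.Substring(length)).ToArray();
    }
}
```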
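Join the array into a delimited string, replace every occurrence of Market with an empty string, then split the string back into an array. Note that this removes Market wherever it occurs, not only at the start, and assumes no entry contains the delimiter.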
string[] arr = new string[] { "Market1", "Market2", "Market3" };
string[] result = string.Join(".", arr).Replace("Market", "").Split('.');
Loop through each item in the array and, if it starts with the prefix, chop the prefix off.
var commonPrefix = "Market";
for (int i = 0; i < arr.Length; i++) {
    if (arr[i].StartsWith(commonPrefix, StringComparison.Ordinal)) {
        arr[i] = arr[i].Substring(commonPrefix.Length);
    }
}
You can use LINQ:
string[] myArray = { "Market1", "Market2", "Market3" };
string prefix = myArray[0];
foreach (var s in myArray)
{
while (!s.StartsWith(prefix))
prefix = prefix.Substring(0, prefix.Length - 1);
}
string[] result = myArray
.Select(s => s.Substring(prefix.Length))
.ToArray();
Loop through the array and replace the prefix with an empty string. A foreach iteration variable cannot be assigned to, so use an indexed for loop:
string[] s = new string[] { "Market1", "Market2", "Market3" };
string prefix = "Market";
for (int i = 0; i < s.Length; i++)
{
    if (s[i].Contains(prefix))
    {
        s[i] = s[i].Replace(prefix, "");
    }
}

Detecting and modifying ListBox entries that contain digits

My program has about 25 entries, most of them string only. However, some of them are supposed to have digits in them, and I don't need those digits in the output (output should be string only). So, how can I "filter out" integers from strings?
Also, if I have integers, strings AND chars, how could I do it (for example, one ListBox entry is E#2, and should be renamed to E# and then printed as output)?
Assuming that your entries are in a List<string>, you can loop through the list and then through each character of each entry, then check if it is a number and remove it. Something like this:
List<string> list = new List<string>{ "abc123", "xxx111", "yyy222" };
for (int i = 0; i < list.Count; i++) {
var no_numbers = "";
foreach (char c in list[i]) {
if (!Char.IsDigit(c))
no_numbers += c;
}
list[i] = no_numbers;
}
This only removes digits as it seems you wanted from your question. If you want to remove all other characters except letters, you can change the logic a bit and use Char.IsLetter() instead of Char.IsDigit().
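For comparison, here is a sketch of the same filtering done with a regular expression. DigitFilter, StripDigits and StripDigitsFromAll are hypothetical names. The pattern \d removes decimal digits only, so an entry like E#2 keeps its letter and its # character.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class DigitFilter
{
    // Removes every decimal digit; letters and characters such as '#' are kept.
    public static string StripDigits(string entry)
    {
        return Regex.Replace(entry, @"\d", "");
    }

    // Applies StripDigits to each entry and returns the results as a new list.
    public static List<string> StripDigitsFromAll(IEnumerable<string> entries)
    {
        return entries.Select(e => StripDigits(e)).ToList();
    }
}
```

If the entries live in a ListBox, you would copy its items into a list of strings first and write the cleaned values back afterwards.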
You can remove all digits from a string with this LINQ solution:
string numbers = "Ho5w ar7e y9ou3?";
string noNumbers = new string(numbers.Where(c => !char.IsDigit(c)).ToArray());
// noNumbers is now "How are you?"
You can also remove all digits from a string with a foreach loop:
string numbers = "Ho5w ar7e y9ou3?";
List<char> noNumList = new List<char>();
foreach (var c in numbers)
{
if (!char.IsDigit(c))
noNumList.Add(c);
}
string noNumbers = string.Join("", noNumList);
If you want to remove all digits from the strings inside a collection:
List<string> myList = new List<string>() {
"Ho5w ar7e y9ou3?",
"W9he7re a3re y4ou go6ing?",
"He2ll4o!"
};
List<char> noNumList = new List<char>();
for (int i = 0; i < myList.Count; i++)
{
foreach (var c in myList[i])
{
if(!char.IsDigit(c))
noNumList.Add(c);
}
myList[i] = string.Join("", noNumList);
noNumList.Clear();
}
Resulting myList:
"How are you?"
"Where are you going?"
"Hello!"
I don't know exactly what your scenario is, but given a string, you can loop through its characters and leave any digit out of the output.
Maybe this is what you're looking for:
string entry = "E#2";
char[] output = new char[entry.Length];
int j = 0; // number of characters kept so far
for (int i = 0; i < entry.Length; i++)
{
    if (!Char.IsDigit(entry[i]))
    {
        output[j] = entry[i];
        j++;
    }
}
// Print only the filled part of the buffer, not the unused trailing '\0' characters.
Console.WriteLine(new string(output, 0, j));
This is a simple solution with one loop and two index variables; it avoids repeated string concatenation, which can hurt performance.
See this example working at C# Online Compiler
If I'm not mistaken, your list looks like this:
ABCD123
EFGH456
And your expected output is:
ABCD
EFGH
Is that correct? If so, and assuming it's a List<string>, you can use the code below:
List<string> mylist = new List<string>();
foreach (string item in mylist)
{
    // Letters only
    var letters = new String(item.Where(Char.IsLetter).ToArray());
    // Symbols only. Char.IsSymbol covers characters such as '+' or '$'; '#' counts as punctuation (Char.IsPunctuation).
    var symbols = new String(item.Where(Char.IsSymbol).ToArray());
}
Now you can easily combine the two. :)

Replace string with multiple different options

Hi there, wonderful people of Stack Overflow.
I'm currently totally stuck. What I want to do is take a word out of a text and replace it with a synonym. I thought about it for a while and figured out how to do it when there is ONLY one possible synonym, with this code:
string pathToDesk = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
string text = System.IO.File.ReadAllText(pathToDesk + "/Text.txt");
string replacementsText = System.IO.File.ReadAllText(pathToDesk + "/Replacements.txt");
string wordsToReplace = System.IO.File.ReadAllText(pathToDesk + "/WordsToReplace.txt");
string[] words = text.Split(' ');
string[] reWords = wordsToReplace.Split(' ');
string[] replacements = replacementsText.Split(' ');
for(int i = 0; i < words.Length; i++) {//for each word
for(int j = 0; j < replacements.Length; j++) {//compare with the possible synonyms
if (words[i].Equals(reWords[j], StringComparison.InvariantCultureIgnoreCase)) {
words[i] = replacements[j];
}
}
}
string newText = "";
for(int i = 0; i < words.Length; i++) {
newText += words[i] + " ";
}
txfInput.Text = newText;
But let's say we get the word hi. Then I want to be able to replace it with any of {"Hello","Yo","Hola"} (for example).
Then my code is no good, since the words and their synonyms will no longer be at the same positions in the arrays.
Is there a smart solution to this? I would really like to know.
You need to store your synonyms differently.
In your file you need something like:
hello yo hola hi
awesome fantastic great
Then, for each line, split the words and put them in an array, giving you an array of arrays.
Now use that to find replacement words.
This won't be super optimized, but you can easily index each word to its group of synonyms as well.
Something like:
public class SynonymReplacer
{
private Dictionary<string, List<string>> _synonyms;
public void Load(string s)
{
_synonyms = new Dictionary<string, List<string>>();
var lines = s.Split(new[] {'\r', '\n'}, StringSplitOptions.RemoveEmptyEntries);
foreach (var line in lines)
{
var words = line.Split(new[] {' '}, StringSplitOptions.RemoveEmptyEntries).ToList();
words.ForEach(word => _synonyms.Add(word, words));
}
}
public string Replace(string word)
{
if (_synonyms.ContainsKey(word))
{
return _synonyms[word].OrderBy(a => Guid.NewGuid())
.FirstOrDefault(w => w != word) ?? word;
}
return word;
}
}
Ordering by Guid.NewGuid() shuffles the group, so FirstOrDefault picks a random synonym other than the word itself.
Then:
var s = new SynonymReplacer();
s.Load("hi hello yo hola\r\nawesome fantastic great\r\n");
Console.WriteLine(s.Replace("hi"));
Console.WriteLine(s.Replace("ok"));
Console.WriteLine(s.Replace("awesome"));
var words = new string[] {"hi", "you", "look", "awesome"};
Console.WriteLine(string.Join(" ", words.Select(s.Replace)));
and you get something like this (the picks are random):
hello
ok
fantastic
hello you look fantastic
Your first task will be to build a list of words and synonyms. A Dictionary will be perfect for this. The text file containing this list might look like this:
word1|synonym11,synonym12,synonym13
word2|synonym21,synonym22,synonym23
word3|synonym31,synonym32,synonym33
Then you can construct the dictionary like this:
public Dictionary<string, string[]> GetSynonymSet(string synonymSetTextFileFullPath)
{
var dict = new Dictionary<string, string[]>();
string line;
// Read the file line by line.
using (var file = new StreamReader(synonymSetTextFileFullPath))
{
while((line = file.ReadLine()) != null)
{
var split = line.Split('|');
if (!dict.ContainsKey(split[0]))
{
dict.Add(split[0], split[1].Split(','));
}
}
}
return dict;
}
The eventual code will look like this:
public string ReplaceWordsInText(Dictionary<string, string[]> synonymSet, string text)
{
var newText = new StringBuilder();
string[] words = text.Split(' ');
for (int i = 0; i < words.Length; i++) //for each word
{
string[] synonyms;
if (synonymSet.TryGetValue(words[i], out synonyms))
{
// The exact synonym you wish to use is up to you.
// I will just use the first one
words[i] = synonyms[0];
}
newText.AppendFormat("{0} ", words[i]);
}
return newText.ToString();
}
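Since the question asks for one of several synonyms, here is a sketch of a variant that picks one at random instead of always taking the first. SynonymPicker and ReplaceWithRandomSynonyms are hypothetical names, and the Random instance is passed in as a parameter (my assumption) so that a seeded generator can be used for repeatable results.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SynonymPicker
{
    // Replaces each space-separated word that has synonyms with a randomly chosen one.
    // Words without an entry (or with an empty synonym array) are left unchanged.
    public static string ReplaceWithRandomSynonyms(
        IDictionary<string, string[]> synonymSet, string text, Random random)
    {
        IEnumerable<string> replaced = text.Split(' ').Select(word =>
        {
            string[] synonyms;
            if (synonymSet.TryGetValue(word, out synonyms) && synonyms.Length > 0)
            {
                // Next(n) returns an index in the range 0..n-1.
                return synonyms[random.Next(synonyms.Length)];
            }
            return word;
        });
        return string.Join(" ", replaced);
    }
}
```

The lookup is case-sensitive unless the dictionary is created with a comparer such as StringComparer.OrdinalIgnoreCase.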

Insert blank strings into array based on index

Let's say I have an array for example
string[] A = {"1","2","3","4","5"};
I want the array to be of size 10, and want to insert blank strings after a certain index.
For example, I could make it size 10 and insert strings after index 3 which would result in
A = {"1","2","3","4","","","","","","5"}
Basically the elements after the given index will be pushed to the end and blank strings will take the empty space in between.
This is what I tried, but it only adds one string and doesn't set a size for the array:
var foos = new List<string>(A);
foos.Insert(33, "");
foos[32] = "";
A = foos.ToArray();
You can use InsertRange
var l = new List<string>{"1","2","3","4","5"};
l.InsertRange(3, new string[10 - l.Count]);
foreach(var i in l)
Console.WriteLine(i);
Note: this inserts at position 3, that is, before "4"; pass 4 to insert after index 3 as in the question. It also fills the gap with null values rather than empty strings, but you can easily populate the new string[] with your desired default.
For example, see How to populate/instantiate a C# array with a single value?
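As a sketch of that modification, Enumerable.Repeat can supply the empty strings directly, since InsertRange accepts any sequence. ArrayPadding and PadAfter are made-up names, and the helper inserts after the given index to match the question.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ArrayPadding
{
    // Returns a copy of `source` grown to `totalLength`, with empty strings
    // inserted immediately after position `index`.
    public static string[] PadAfter(string[] source, int index, int totalLength)
    {
        var list = new List<string>(source);
        int blanks = totalLength - list.Count;
        if (blanks > 0)
        {
            list.InsertRange(index + 1, Enumerable.Repeat(string.Empty, blanks));
        }
        return list.ToArray();
    }
}
```

With the array from the question, PadAfter(A, 3, 10) returns ten elements: "1" to "4", five empty strings, then "5".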
Here is LINQ based approach:
public string[] InsertToArray(string[] array, int index, int newArrayCapacity)
{
var firstPart = array.Take(index + 1);
var secondPart = array.Skip(index + 1);
var newPart = Enumerable.Repeat(String.Empty, newArrayCapacity - array.Length);
return firstPart.Concat(newPart).Concat(secondPart).ToArray();
}
Here is the usage of the method:
string[] A = {"1","2","3","4","5"};
// Insert 5 elements (so that A.Length will be 10) in A after 3rd element
var result = InsertToArray(A, 3, 10);
Added: see Sayse's answer, really the way to go
Since arrays are fixed-size collections, you can't resize one. Instead, take the elements before and after the specified index using the Take and Skip methods, generate a sequence of empty strings, and concatenate the three parts:
string[] A = {"1","2","3","4","5"};
int index = 3;
var result = A.Take(index + 1)
.Concat(Enumerable.Repeat("", 10 - A.Length))
.Concat(A.Skip(index+1))
.ToArray();
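If you don't mind using a list instead of an array, you can do it this way:
int index = 3;
int numberOfBlanksToInsert = 5;
// Start from the existing elements; inserting at index 3 of an empty list would throw.
List<string> strings = new List<string> { "1", "2", "3", "4", "5" };
for (int i = 0; i < numberOfBlanksToInsert; i++)
{
    // index + 1 places the blanks after the element at index.
    strings.Insert(index + 1, "");
}
You can also output this to an array when you're done:
string[] A = strings.ToArray();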
static string[] InsertRange(string[] initialValues, int startIndex, int count, string toInsert)
{
string[] result = new string[initialValues.Length + count];
for (int i = 0; i < initialValues.Length + count; i++)
result[i] = i < startIndex ? initialValues[i] : i >= startIndex + count ? initialValues[i - count] : toInsert;
return result;
}
Usage : InsertRange(A, 4, 5, "hello");
Output : "1, 2, 3, 4, hello, hello, hello, hello, hello, 5"

Replacing words in a body of text from array

I am trying to get my code to read all the text from a text box (called txtBody for now) and check its words against listA. If any of the words from listA appear, I would like to replace them with the corresponding ones from listB. How can I do this?
For reference, lists A and B come from a CSV: listA is column one and listB is column two, so listA[1] is the counterpart of listB[1].
This is the code I have for the lists:
string body = txtBody.Text;
var reader = new StreamReader(File.OpenRead("textwords.csv"));
List<string> listA = new List<string>();
List<string> listB = new List<string>();
while (!reader.EndOfStream)
{
var line = reader.ReadLine();
var values = line.Split(',');
listA.Add(values[0]);
listB.Add(values[1]);
}
thanks for any help
In its simplest form, you can do:
for(int i = 0; i < listA.Count; i++)
body = body.Replace(listA[i], listB[i]);
However, if listA contains a short word such as "is", then a longer word such as "this" would be partially replaced.
UPDATE
If you want to match whole words only, you can add word boundaries:
for (int i = 0; i < listA.Count; i++)
{
    var word1 = string.Format(@"(\b){0}(\b)", listA[i]);
    var word2 = string.Format(@"$1{0}$2", listB[i]);
    body = Regex.Replace(body, word1, word2, RegexOptions.IgnoreCase);
}
The regex matches the word only where it has a word boundary (\b) on either side, and replaces it with the new word, leaving the surrounding spacing and punctuation unchanged.
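One thing to watch with CSV input is that a search term may contain regex metacharacters such as '.' or '+'. Here is a sketch of a whole-word replacement that escapes the search term first; WordReplacer and ReplaceWholeWords are hypothetical names, and using a MatchEvaluator for the replacement is my choice, not part of the answer above.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WordReplacer
{
    // Replaces whole-word, case-insensitive occurrences of listA[i] with listB[i].
    public static string ReplaceWholeWords(string body, IList<string> listA, IList<string> listB)
    {
        for (int i = 0; i < listA.Count; i++)
        {
            // Escape the search term so characters like '.' or '+' are matched literally.
            string pattern = @"\b" + Regex.Escape(listA[i]) + @"\b";

            // A MatchEvaluator returns the replacement verbatim, so text such as "$1"
            // in listB is not interpreted as a group reference.
            string replacement = listB[i];
            body = Regex.Replace(body, pattern, m => replacement, RegexOptions.IgnoreCase);
        }
        return body;
    }
}
```

With listA containing "is" and listB containing "was", the body "this is it" becomes "this was it".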
I would suggest using a regex.
for (int i = 0; i < listA.Count; i++)
{
    Regex myRegex = new Regex(listA[i]);
    body = myRegex.Replace(body, listB[i]);
}
If an entire word match is desired:
for (int i = 0; i < listA.Count; i++)
{
    // Match the word only when it has a space on both sides, and keep those spaces in the result.
    Regex myRegex = new Regex(" " + listA[i] + " ");
    body = myRegex.Replace(body, " " + listB[i] + " ");
}
Using LINQ you can have something like this:
string[] bodyWords = body.Split(' ');
// For each word in body (s in bodyWords), if there is a match
// in listA (i != -1), then get the counterpart from listB (listB[i]); otherwise
// leave it as it is. Use Join to reconstruct the result string.
string result = string.Join(" ", (from s in bodyWords
let i = listA.FindIndex(x => x == s)
select i == -1 ? s : listB[i]).ToArray());
With this input:
string body = "John Smith";
List<string> listA = new List<string>() { "John", "Two", "Smith", "Four", "Five" };
List<string> listB = new List<string>() { "Martin", "TwoB", "Jones", "FourB", "FiveB" };
the above code yields the result "Martin Jones".
Performance-wise this might not be the best possible solution, but it is always fun doing things in LINQ! :=)
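If performance does matter, one option is to build a dictionary once so each word is a single lookup instead of a scan of listA. This is a sketch under the same assumptions as the query above (words separated by single spaces, exact case-sensitive matches); LookupReplacer and ReplaceWords are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LookupReplacer
{
    // Replaces each space-separated word found in listA with its counterpart in listB.
    public static string ReplaceWords(string body, IList<string> listA, IList<string> listB)
    {
        var map = new Dictionary<string, string>(StringComparer.Ordinal);
        for (int i = 0; i < listA.Count; i++)
        {
            // Keep the first pairing if a word appears twice, as FindIndex would.
            if (!map.ContainsKey(listA[i]))
            {
                map.Add(listA[i], listB[i]);
            }
        }

        return string.Join(" ", body.Split(' ').Select(word =>
        {
            string replacement;
            return map.TryGetValue(word, out replacement) ? replacement : word;
        }));
    }
}
```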
