Replacing words in a body of text from array

I am trying to get my code to read all the text from a text box called txtBody and check its words against listA. If any word from listA appears, I would like to replace it with the corresponding word from listB. How can I do this?
For reference, listA and listB come from a CSV file: listA is column one and listB is column two, so listA[1] is the counterpart of listB[1].
This is the code I have for the lists:
string body = txtBody.Text;
var reader = new StreamReader(File.OpenRead("textwords.csv"));
List<string> listA = new List<string>();
List<string> listB = new List<string>();
while (!reader.EndOfStream)
{
var line = reader.ReadLine();
var values = line.Split(',');
listA.Add(values[0]);
listB.Add(values[1]);
}
thanks for any help

In its simplest form, you can do:
for(int i = 0; i < listA.Count; i++)
body = body.Replace(listA[i], listB[i]);
However, if listA contains a short word such as "is", it would also be replaced inside longer words such as "this".
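UPDATE
If you want to match whole words only, you can use a regex with word boundaries:
for (int i = 0; i < listA.Count; i++)
{
// Regex.Escape keeps special characters in the word from being read as regex syntax.
var word1 = string.Format(@"(\b){0}(\b)", Regex.Escape(listA[i]));
var word2 = string.Format(@"$1{0}$2", listB[i]);
body = Regex.Replace(body, word1, word2, RegexOptions.IgnoreCase);
}
The regex matches the word only when it has a word boundary (\b) on either side, for example a space, punctuation, or the start or end of the text, and replaces it with the new word, leaving the surrounding text unchanged.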
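As a variation on the loop above, here is a minimal sketch that does the whole replacement in one pass with a dictionary lookup. The names WordReplacer and ReplaceWords are made up for this example, and it assumes every entry in listA is a single word made of letters, digits or underscores (anything else would never match \w+).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WordReplacer
{
    // Hypothetical helper: replaces whole words only, in a single pass over the text.
    public static string ReplaceWords(string body, IList<string> listA, IList<string> listB)
    {
        // Case-insensitive lookup from each word in listA to its counterpart in listB.
        var map = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        for (int i = 0; i < listA.Count; i++)
            map[listA[i]] = listB[i];

        // \w+ matches each run of word characters; words not in the map are kept as they are.
        return Regex.Replace(body, @"\w+", m =>
        {
            string replacement;
            return map.TryGetValue(m.Value, out replacement) ? replacement : m.Value;
        });
    }

    static void Main()
    {
        var listA = new List<string> { "is", "cat" };
        var listB = new List<string> { "was", "dog" };
        Console.WriteLine(ReplaceWords("This cat is here.", listA, listB));
        // prints "This dog was here."
    }
}
```

Because each word is looked up exactly once, a replacement that happens to appear in listA as well is not replaced a second time.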

I would suggest using a regex:
for(int i = 0; i < listA.Count; i++)
{
Regex myRegex = new Regex(listA[i]);
body = myRegex.Replace(body, listB[i]);
}
If an entire word match is desired:
for(int i = 0; i < listA.Count; i++)
{
Regex myRegex = new Regex(" " + listA[i] + " ");
// Put the surrounding spaces back, because the pattern consumes them.
body = myRegex.Replace(body, " " + listB[i] + " ");
}

Using LINQ you can have something like this:
string[] bodyWords = body.Split(' ');
// For each word in body (s in bodyWords), look up its index i in listA.
// If there is a match (i != -1), take the counterpart from listB (listB[i]);
// otherwise leave the word as it is. Use Join to rebuild the result string.
string result = string.Join(" ", (from s in bodyWords
let i = listA.FindIndex(x => x == s)
select i == -1 ? s : listB[i]).ToArray());
With this as input:
string body = "John Smith";
List<string> listA = new List<string>() { "John", "Two", "Smith", "Four", "Five" };
List<string> listB = new List<string>() { "Martin", "TwoB", "Jones", "FourB", "FiveB" };
the above code yields the result "Martin Jones".
Performance-wise this might not be the best possible solution, but it is always fun doing things in LINQ! :=)

Related

Detecting and modifying ListBox entries that contain digits

My program has about 25 entries, most of them string only. However, some of them are supposed to have digits in them, and I don't need those digits in the output (output should be string only). So, how can I "filter out" integers from strings?
Also, if I have integers, strings AND chars, how could I do it (for example, one ListBox entry is E#2, and should be renamed to E# and then printed as output)?

Assuming that your entries are in a List<string>, you can loop through the list and then through each character of each entry, then check if it is a number and remove it. Something like this:
List<string> list = new List<string>{ "abc123", "xxx111", "yyy222" };
for (int i = 0; i < list.Count; i++) {
var no_numbers = "";
foreach (char c in list[i]) {
if (!Char.IsDigit(c))
no_numbers += c;
}
list[i] = no_numbers;
}
This only removes digits, which seems to be what your question asks for. If you want to keep only letters instead, change the condition to Char.IsLetter(c) in place of !Char.IsDigit(c).

You can remove all digits from a string with this LINQ solution:
string numbers = "Ho5w ar7e y9ou3?";
string noNumbers = new string(numbers.Where(c => !char.IsDigit(c)).ToArray());
// noNumbers is now "How are you?"
You can also remove all digits from a string by using a foreach loop:
string numbers = "Ho5w ar7e y9ou3?";
List<char> noNumList = new List<char>();
foreach (var c in numbers)
{
if (!char.IsDigit(c))
noNumList.Add(c);
}
string noNumbers = string.Join("", noNumList);
If you want to remove all digits from the strings inside a collection:
List<string> myList = new List<string>() {
"Ho5w ar7e y9ou3?",
"W9he7re a3re y4ou go6ing?",
"He2ll4o!"
};
List<char> noNumList = new List<char>();
for (int i = 0; i < myList.Count; i++)
{
foreach (var c in myList[i])
{
if(!char.IsDigit(c))
noNumList.Add(c);
}
myList[i] = string.Join("", noNumList);
noNumList.Clear();
}
Contents of myList afterwards:
"How are you?"
"Where are you going?"
"Hello!"

I don't know exactly what your scenario is, but given a string, you can loop through its characters and leave out every character that is a digit.
Maybe this is what you're looking for:
string entry = "E#2";
char[] output = new char[entry.Length];
int j = 0; // number of characters kept so far
for (int i = 0; i < entry.Length; i++)
{
if (!Char.IsDigit(entry[i]))
{
output[j] = entry[i];
j++;
}
}
// Print only the j characters that were kept, not the unused tail of the buffer.
Console.WriteLine(new string(output, 0, j));
I've tried to give you a simple solution with one loop and two index variables, avoiding repeated string concatenation, which can hurt performance.
See this example working at C# Online Compiler
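Another way to avoid repeated string concatenation is a StringBuilder. This is only a sketch; the names DigitStripper and RemoveDigits are made up for the example.

```csharp
using System;
using System.Text;

public static class DigitStripper
{
    // Hypothetical helper: copies every non-digit character into a StringBuilder.
    public static string RemoveDigits(string entry)
    {
        var sb = new StringBuilder(entry.Length);
        foreach (char c in entry)
        {
            if (!char.IsDigit(c))
                sb.Append(c);
        }
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(RemoveDigits("E#2"));    // prints "E#"
        Console.WriteLine(RemoveDigits("abc123")); // prints "abc"
    }
}
```

The builder grows as needed, so there is no second index variable to track and no unused tail to trim off.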

If I am not mistaken, your list looks something like this:
ABCD123
EFGH456
And your expected output is:
ABCD
EFGH
Is that correct? If so, assuming that it's a List<string>, you can use the code below:
List<string> mylist = new List<string>();
foreach (string item in mylist)
{
// To get the letters
var letters = new String(item.Where(Char.IsLetter).ToArray());
// To get the symbol characters (such as + or $).
// Note: '#' counts as punctuation, so use Char.IsPunctuation for it.
var symbols = new String(item.Where(Char.IsSymbol).ToArray());
}
Now you can easily combine the two :)

Replace string with multiple different options

Hi there, wonderful people of Stack Overflow.
I am currently totally stuck. What I want to do is take a word out of a text and replace it with a synonym. I thought about it for a while and figured out how to do it if I ONLY have one possible synonym, with this code:
string pathToDesk = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
string text = System.IO.File.ReadAllText(pathToDesk + "/Text.txt");
string replacementsText = System.IO.File.ReadAllText(pathToDesk + "/Replacements.txt");
string wordsToReplace = System.IO.File.ReadAllText(pathToDesk + "/WordsToReplace.txt");
string[] words = text.Split(' ');
string[] reWords = wordsToReplace.Split(' ');
string[] replacements = replacementsText.Split(' ');
for(int i = 0; i < words.Length; i++) {//for each word
for(int j = 0; j < replacements.Length; j++) {//compare with the possible synonyms
if (words[i].Equals(reWords[j], StringComparison.InvariantCultureIgnoreCase)) {
words[i] = replacements[j];
}
}
}
string newText = "";
for(int i = 0; i < words.Length; i++) {
newText += words[i] + " ";
}
txfInput.Text = newText;
But let's say we get the word hi. Then I want to be able to replace it with any of {"Hello","Yo","Hola"} (for example).
Then my code is no longer any good, since the words and their replacements will not have the same positions in the arrays.
Is there a smart solution to this? I would really like to know.

You need to store your synonyms differently.
In your file you need something like
hello yo hola hi
awesome fantastic great
Then, for each line, split the words and put them in an array, giving you an array of arrays.
Now use that to find replacement words.
This won't be super optimized, but you can easily index each word to its group of synonyms as well.
Something like:
public class SynonymReplacer
{
private Dictionary<string, List<string>> _synonyms;
public void Load(string s)
{
_synonyms = new Dictionary<string, List<string>>();
var lines = s.Split(new[] {'\r', '\n'}, StringSplitOptions.RemoveEmptyEntries);
foreach (var line in lines)
{
var words = line.Split(new[] {' '}, StringSplitOptions.RemoveEmptyEntries).ToList();
words.ForEach(word => _synonyms.Add(word, words));
}
}
public string Replace(string word)
{
if (_synonyms.ContainsKey(word))
{
return _synonyms[word].OrderBy(a => Guid.NewGuid())
.FirstOrDefault(w => w != word) ?? word;
}
return word;
}
}
The OrderBy(a => Guid.NewGuid()) shuffles the group, so you get a random synonym.
Then:
var s = new SynonymReplacer();
s.Load("hi hello yo hola\r\nawesome fantastic great\r\n");
Console.WriteLine(s.Replace("hi"));
Console.WriteLine(s.Replace("ok"));
Console.WriteLine(s.Replace("awesome"));
var words = new string[] {"hi", "you", "look", "awesome"};
Console.WriteLine(string.Join(" ", words.Select(s.Replace)));
and you get, for example (the synonyms are picked at random, so your output may differ):
hello
ok
fantastic
hello you look fantastic

Your first task will be to build a list of words and synonyms. A Dictionary will be perfect for this. The text file containing this list might look like this:
word1|synonym11,synonym12,synonym13
word2|synonym21,synonym22,synonym23
word3|synonym31,synonym32,synonym33
Then you can construct the dictionary like this:
public Dictionary<string, string[]> GetSynonymSet(string synonymSetTextFileFullPath)
{
var dict = new Dictionary<string, string[]>();
string line;
// Read the file line by line and split each line into a word and its synonyms.
using (var file = new StreamReader(synonymSetTextFileFullPath))
{
while((line = file.ReadLine()) != null)
{
var split = line.Split('|');
if (!dict.ContainsKey(split[0]))
{
dict.Add(split[0], split[1].Split(','));
}
}
}
return dict;
}
The eventual code will look like this
public string ReplaceWordsInText(Dictionary<string, string[]> synonymSet, string text)
{
var newText = new StringBuilder();
string[] words = text.Split(' ');
for (int i = 0; i < words.Length; i++) //for each word
{
string[] synonyms;
if (synonymSet.TryGetValue(words[i], out synonyms))
{
// The exact synonym you wish to use is up to you.
// I will just use the first one
words[i] = synonyms[0];
}
newText.AppendFormat("{0} ", words[i]);
}
return newText.ToString();
}
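If you would rather pick a random synonym than always the first one, something along these lines might work. It is a sketch only: SynonymPicker and Pick are made-up names, and the case-insensitive dictionary is my own assumption so that "Hi" and "hi" find the same entry.

```csharp
using System;
using System.Collections.Generic;

public static class SynonymPicker
{
    // Hypothetical helper: returns a randomly chosen synonym,
    // or the word itself when no synonyms are known for it.
    public static string Pick(IDictionary<string, string[]> synonymSet, string word, Random rng)
    {
        string[] synonyms;
        if (synonymSet.TryGetValue(word, out synonyms) && synonyms.Length > 0)
            return synonyms[rng.Next(synonyms.Length)];
        return word;
    }

    static void Main()
    {
        // Assumption: keys are compared case-insensitively.
        var synonymSet = new Dictionary<string, string[]>(StringComparer.OrdinalIgnoreCase)
        {
            { "hi", new[] { "Hello", "Yo", "Hola" } }
        };
        var rng = new Random();
        string[] words = "Hi you look awesome".Split(' ');
        for (int i = 0; i < words.Length; i++)
            words[i] = Pick(synonymSet, words[i], rng);
        // The first word varies from run to run; the other words are unchanged.
        Console.WriteLine(string.Join(" ", words));
    }
}
```

Passing the Random instance in, rather than creating one per call, avoids getting the same value repeatedly when calls happen in quick succession on older runtimes.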

String split by every 3 words

I've got a problem. I need to split every string like this.
For example:
"Economic drive without restrictions"
I need an array with substrings like these:
"Economic drive without"
"drive without restrictions"
For now I have this:
List<string> myStrings = new List<string>();
foreach(var text in INPUT_TEXT) //here is Economic drive without restrictions
{
myStrings.DefaultIfEmpty();
var textSplitted = text.Split(new char[] { ' ' });
int j = 0;
foreach(var textSplit in textSplitted)
{
int i = 0 + j;
string threeWords = "";
while(i != 3 + j)
{
if (i >= textSplitted.Count()) break;
threeWords = threeWords + " " + textSplitted[i];
i++;
}
myStrings.Add(threeWords);
j++;
}
}

You could use this LINQ query:
string text = "Economic drive without restrictions";
string[] words = text.Split();
List<string> myStrings = words
.Where((word, index) => index + 3 <= words.Length)
.Select((word, index) => String.Join(" ", words.Skip(index).Take(3)))
.ToList();
Because others commented that it would be better to show a loop version since OP is learning this language, here is a version that uses no LINQ at all:
List<string> myStrings = new List<string>();
for (int index = 0; index + 3 <= words.Length; index++)
{
string[] slice = new string[3];
Array.Copy(words, index, slice, 0, 3);
myStrings.Add(String.Join(" ", slice));
}

I'll try to give a simple solution, so I hope it is easier to understand.
List<string> myStrings = new List<string>();
string input = "Economic drive without restrictions";
var allWords = input.Split(new char[] {' '});
for (int i = 0; i < allWords.Length - 2; i++)
{
var textSplitted = allWords.Skip(i).Take(3);
string threeString = string.Join(" ", textSplitted);
myStrings.Add(threeString);
}
foreach (var myString in myStrings)
{
Console.WriteLine(myString);
}
The method Take(n) comes from LINQ. It returns the first n elements of the given sequence. For example, given the array a,b,c,d,e, Take(3) gives you a,b,c.
The method Skip(n) also comes from LINQ. It returns what remains after skipping the first n elements. Given the array a,b,c,d,e, Skip(1) returns b,c,d,e; as you can see, it skipped the first element.
With these two methods you can slide over the array one word at a time and take three words at each position.
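The same sliding idea can be wrapped in a small helper that works for any window size. This is just a sketch; WordWindows and Slide are made-up names, and it assumes size is at least 1.

```csharp
using System;
using System.Collections.Generic;

public static class WordWindows
{
    // Hypothetical helper: returns every run of `size` consecutive words,
    // each joined by single spaces.
    public static List<string> Slide(string text, int size)
    {
        string[] words = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        var result = new List<string>();
        // Stop as soon as fewer than `size` words remain.
        for (int i = 0; i + size <= words.Length; i++)
            result.Add(string.Join(" ", words, i, size));
        return result;
    }

    static void Main()
    {
        foreach (string s in Slide("Economic drive without restrictions", 3))
            Console.WriteLine(s);
        // prints:
        // Economic drive without
        // drive without restrictions
    }
}
```

A text with fewer words than the window size simply gives an empty list.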

Just for comparative purposes, here's another solution that doesn't use Linq:
string[] words = INPUT_TEXT.Split();
List<string> myStrings = new List<string>();
for (int i = 0; i < words.Length - 2; ++i)
myStrings.Add(string.Join(" ", words[i], words[i+1], words[i+2]));
Or using ArraySegment<string>:
string[] words = INPUT_TEXT.Split();
List<string> myStrings = new List<string>();
for (int i = 0; i < words.Length - 2; ++i)
myStrings.Add(string.Join(" ", new ArraySegment<string>(words, i, 3)));

I would use one of the methods described here; for instance the following, which takes the elements 3 by 3.
var groups = myStrings.Select((p, index) => new {p,index})
.GroupBy(a =>a.index/3);
Warning: it is not the most memory-efficient approach, so if you start parsing big strings it might blow up on you. Try it and observe.
Then you only need to handle the last group. If it has fewer than 3 strings, fill it up from the left.

How to replace exclusively?

Okay, I have a pretty obvious but apparently nontrivial problem to solve.
Suppose I have a simple string ab.
Now I want to replace a with b and b with a, so I end up with ba.
The obvious solution would be to do the two replacements one after the other. But the result of that is either aa or bb, depending on the order.
Obviously, the production situation will have to deal with much more complex strings and more replacements than two, but the problem still applies.
One idea I had was to save the positions where I replaced something. But that threw me off as soon as a replacement had a different length than the original needle.
This is a general problem, but I'm working with C#. Here's some code I came up with:
string original = "abc";
Regex[] expressions = new Regex[]
{
new Regex("a"), //replaced by ab
new Regex("b") //replaced by c
};
string[] replacements = new string[]
{
"ab",
"c"
};
for (int i = 0; i < expressions.Length; i++)
original = expressions[i].Replace(original, replacements[i]);
//Expected result: abcc
//Actual result: accc <- the b is replaced by c in the second pass.
So is there a simple way to solve this?

If you are talking about simple one-to-one conversions, converting to a char array and doing a switch is probably ideal, however you seem to be looking for more complex replacements.
Basically the trick is to create an intermediate character to mark your temporaries. Rather than showing the actual code, here is what the string would look like as it was transformed:
ab
%1b
%1%2
b%2
ba
So basically, replace % with %%, then the first match with %1, and so on. Once they are all done, replace %1 with its output, and so on, finally replacing %% with %.
Be careful though: if you can guarantee that your intermediate syntax doesn't clash with your input you are okay; if you cannot, you will need a trick to make sure a marker isn't preceded by an odd number of % characters. (So %%a would match, but %%%a would not, since that would mean the special value %a.)
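For what it's worth, the marker idea could be sketched roughly like this. It is an illustration under assumptions, not a hardened implementation: PlaceholderSwap and ReplaceAll are made-up names, markers are written as %0;, %1; and so on (with a closing ; so that %1 and %10 cannot be confused), and no needle may contain %, ; or a digit. The final step decodes everything in one left-to-right scan, which takes care of the odd-number-of-% problem.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PlaceholderSwap
{
    // Assumption: no needle contains '%', ';' or a digit, so a needle can
    // never match inside a marker such as "%0;".
    public static string ReplaceAll(string text, string[] needles, string[] replacements)
    {
        // Step 1: escape literal '%' so it cannot be confused with a marker.
        string work = text.Replace("%", "%%");

        // Step 2: swap every needle for a numbered marker.
        for (int i = 0; i < needles.Length; i++)
            work = work.Replace(needles[i], "%" + i + ";");

        // Step 3: one left-to-right scan turns "%%" back into "%" and
        // each marker "%i;" into its replacement text.
        return Regex.Replace(work, @"%(%|(\d+);)", m =>
            m.Groups[2].Success ? replacements[int.Parse(m.Groups[2].Value)] : "%");
    }

    static void Main()
    {
        Console.WriteLine(ReplaceAll("ab", new[] { "a", "b" }, new[] { "b", "a" }));   // prints "ba"
        Console.WriteLine(ReplaceAll("abc", new[] { "a", "b" }, new[] { "ab", "c" })); // prints "abcc"
    }
}
```

Since the replacement text is only written in the last step, it is never scanned by a later needle.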

Here’s one solution. Try all the regular expressions against the string, do the replacement on the earliest match, then recurse on the remaining part of the string. If you need this to be faster but more complicated, you could ask for all Matches() right at the start and process them from left to right, adjusting their Indexes as you replace expressions with longer and shorter strings, and throwing away any overlaps.
using System;
using System.IO;
using System.Text.RegularExpressions;
class MultiRegex {
static String Replace(String text, Regex[] expressions,
String[] replacements, int start=0)
{
// Try matching each regex; save the first match
Match firstMatch = null;
int firstMatchingExpressionIndex = -1;
for (int i = 0; i < expressions.Length; i++) {
Regex r = expressions[i];
Match m = r.Match(text, start);
if (m.Success
&& (firstMatch == null || m.Index < firstMatch.Index))
{
firstMatch = m;
firstMatchingExpressionIndex = i;
}
}
if (firstMatch == null) {
/* No matches anywhere */
return text;
}
// Replace text, then recurse
String newText = text.Substring(0, firstMatch.Index)
+ replacements[firstMatchingExpressionIndex]
+ text.Substring(firstMatch.Index + firstMatch.Length);
return Replace(newText, expressions, replacements,
start + replacements[firstMatchingExpressionIndex].Length);
}
public static void Main() {
Regex[] expressions = new Regex[]
{
new Regex("a"), //replaced by ab
new Regex("b") //replaced by c
};
string[] replacements = new string[]
{
"ab",
"c"
};
string original = "a b c";
Console.WriteLine(
Replace(original, expressions, replacements));
// Should be "baz foo bar"
Console.WriteLine(Replace("foo bar baz",
new Regex[] { new Regex("bar"), new Regex("baz"),
new Regex("foo") },
new String[] { "foo", "bar", "baz" }));
}
}
This prints:
ab c c
baz foo bar
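If the needles are plain strings rather than real regular expressions, a shorter alternative (a different technique from the recursion above) is to join them into one alternation and let a single Regex.Replace call do the work. This is a sketch under that assumption; SinglePassReplace and ReplaceAll are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class SinglePassReplace
{
    // Assumption: the keys of `map` are non-empty literal strings, not regex patterns.
    public static string ReplaceAll(string text, IDictionary<string, string> map)
    {
        if (map.Count == 0)
            return text;

        // Longer needles first, so that "ab" wins over "a" at the same position.
        string pattern = string.Join("|", map.Keys
            .OrderByDescending(k => k.Length)
            .Select(k => Regex.Escape(k)));

        // Each match is replaced by looking up the matched text; the inserted
        // replacement is not scanned again.
        return Regex.Replace(text, pattern, m => map[m.Value]);
    }

    static void Main()
    {
        var map = new Dictionary<string, string> { { "a", "ab" }, { "b", "c" } };
        Console.WriteLine(ReplaceAll("abc", map)); // prints "abcc"
    }
}
```

The regex engine walks the text once from left to right, which is what keeps the b produced by the first rule from being turned into c.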

You can use (\ba\b) to match the letter a only when it stands alone as a word, so the a in ab won't be matched. Similarly, for b it would be (\bb\b).
string original = "a b c";
Regex[] expressions = new Regex[] {
// @ sign used to signify a verbatim string literal
new Regex(@"(\ba\b)"), // \b represents a word boundary, e.g. between a word and a space
new Regex(@"(\bb\b)"),
};
string[] replacements = new string[] {
"ab",
"c"
};
for(int i = 0; i < expressions.Length; i++)
original = expressions[i].Replace(original, replacements[i]);
Edit 1:
The question was changed so that there are no spaces between the letters to match, and abc should still become abcc. For that, I just reversed the order in which the regular expressions are applied.
Regex[] expressions = new Regex[] {
new Regex(@"b"), //replaced by c
new Regex(@"a"), //replaced by ab
};
string[] replacements = new string[] {
"c",
"ab",
};
Edit 2:
The answer was changed to handle matches of variable length. It scans the string from left to right: at each position it tries the patterns in order, appends the replacement for the first one that matches to a new string, and then moves on past the matched text.
string original = "a bc";
Regex[] expressions = new Regex[] {
new Regex(@"a"), //replaced by ab
new Regex(@"b"), //replaced by c
};
string[] replacements = new string[] {
"ab",
"c",
};
string newString = string.Empty;
string workingString = string.Empty;
// Position of start point in string
int index = 0;
// Length to retrieve
int length = 1;
while(index < original.Length) {
// Retrieve a piece of the string
workingString = original.Substring(index, length);
// Whether the expression has been matched
bool found = false;
for(int i = 0; i < expressions.Length && !found; i++) {
if(expressions[i].Match(workingString).Success) {
// If expression matched, add the replacement value to the new string
newString += expressions[i].Replace(workingString, replacements[i]);
// Mark expression as found
found = true;
}
}
if(!found) {
// If not found, increase length (check for more than one character patterns)
length++;
// If the rest of the entire string doesn't match anything, move the character at **index** into the new string
if(length >= (original.Length - index)) {
newString += original.Substring(index, 1);
index++;
length = 1;
}
}
// If a match was found, start over at next position in string
else {
index += length;
length = 1;
}
}

String Formatting: remove the last underscore and the following characters

I have a list of strings like
A_1
A_2
A_B_1
X_a_Z_14
I need to remove the last underscore and the characters that follow it,
so the resulting list will be:
A
A
A_B
X_a_Z

var data = new List<string> {"A_1", "A_2", "A_B_1", "X_a_Z_14"};
int trimPosition;
for (var i = 0; i < data.Count; i++)
if ((trimPosition = data[i].LastIndexOf('_')) > -1)
data[i] = data[i].Substring(0, trimPosition);

string[] names = {"A_1","A_2","A_B_1","X_a_Z_14" };
for (int i = 0; i < names.Length;i++ )
names[i]= names[i].Substring(0, names[i].LastIndexOf('_'));

var s = "X_a_Z_14";
var result = s.Substring(0, s.LastIndexOf('_') ); // X_a_Z

string s = "X_a_Z_14";
s = s.Substring(0, s.LastIndexOf("_"));

input.Substring(0,input.LastIndexOf("_"));

There is also the possibility to use regular expressions, if you are so inclined.
Regex regex = new Regex("_[^_]*$");
string[] strings = new string[] {"A_1", "A_2", "A_B_1", "X_a_Z_14"};
foreach (string s in strings)
{
Console.WriteLine(regex.Replace(s, ""));
}
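One thing to watch with the Substring answers: LastIndexOf returns -1 when there is no underscore, and Substring(0, -1) then throws an ArgumentOutOfRangeException. A guarded version might look like the sketch below; SuffixTrimmer and TrimAfterLastUnderscore are made-up names.

```csharp
using System;

public static class SuffixTrimmer
{
    // Hypothetical helper: drops the last underscore and everything after it,
    // and returns the input unchanged when it contains no underscore.
    public static string TrimAfterLastUnderscore(string s)
    {
        int pos = s.LastIndexOf('_');
        return pos < 0 ? s : s.Substring(0, pos);
    }

    static void Main()
    {
        string[] data = { "A_1", "A_B_1", "X_a_Z_14", "NoUnderscore" };
        foreach (string s in data)
            Console.WriteLine(TrimAfterLastUnderscore(s));
        // prints:
        // A
        // A_B
        // X_a_Z
        // NoUnderscore
    }
}
```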
