I'm having some issues with replacing words in a string with values from a dictionary. Here's a small sample of my current code:
Dictionary<string, string> replacements = new Dictionary<string, string>()
{
{"ACFT", "AIRCRAFT"},
{"FT", "FEET"},
};
foreach(string s in replacements.Keys)
{
inputBox.Text = inputBox.Text.Replace(s, replacements[s]);
}
When I execute the code, if I have ACFT in the textbox, it is replaced with AIRCRAFEET because it sees the FT part in the string. I need to somehow differentiate this and only replace the whole word.
So for example, if I have ACFT in the box, it should replace it with AIRCRAFT. And, if I have FT in the box, replace it with FEET.
So my question is, how can I match whole words only when replacing words?
EDIT: I want to be able to use and replace multiple words.
Use an if condition (this only replaces when the whole text box content equals a key):
foreach(string s in replacements.Keys) {
if(inputBox.Text==s){
inputBox.Text = inputBox.Text.Replace(s, replacements[s]);
}
}
Update, after you modified your question:
string str = "ACFT FTT";
Dictionary<string, string> replacements = new Dictionary<string, string>()
{
    {"ACFT", "AIRCRAFT"},
    {"FT", "FEET"},
};
string[] temp = str.Split(' ');
for (int i = 0; i < temp.Length; i++)
{
    // Replace the token only if the whole token is a key
    string replacement;
    if (replacements.TryGetValue(temp[i], out replacement))
    {
        temp[i] = replacement;
    }
}
string newStr = string.Join(" ", temp);
Console.WriteLine(newStr); // prints "AIRCRAFT FTT"
how can I match whole words only when replacing words?
Use regular expressions (as was suggested by David Pilkington)
Dictionary<string, string> replacements = new Dictionary<string, string>()
{
{"ACFT", "AIRCRAFT"},
{"FT", "FEET"},
};
foreach(string s in replacements.Keys)
{
var pattern = @"\b" + Regex.Escape(s) + @"\b"; // match on word boundaries; @ keeps \b from becoming a backspace character
inputBox.Text = Regex.Replace(inputBox.Text, pattern, replacements[s]);
}
However, if you have control over the design, I would much rather use keys like "{ACFT}","{FT}" (which have explicit boundaries), so you could just use them with String.Replace.
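Looping over the keys still runs one replacement per key, so a later key can rewrite text that an earlier key just produced. A minimal sketch of a single-pass alternative, assuming the keys are plain words and matching is case-sensitive: build one alternation of the escaped keys between \b boundaries and look each match up in the dictionary through a MatchEvaluator. The class and method names are illustrative.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WholeWordReplacer
{
    // Replaces every whole-word occurrence of a dictionary key with its value
    // in a single pass, so a replacement value is never re-scanned by another key.
    public static string ReplaceWholeWords(string input, IDictionary<string, string> replacements)
    {
        if (replacements.Count == 0)
            return input;

        // Longest keys first: alternation tries branches left to right,
        // so the longer key wins if two keys could match at the same spot.
        string alternation = string.Join("|",
            replacements.Keys.OrderByDescending(k => k.Length).Select(Regex.Escape));

        // \b on both sides restricts matches to whole words.
        var regex = new Regex(@"\b(?:" + alternation + @")\b");

        // The MatchEvaluator looks the matched text up in the dictionary.
        return regex.Replace(input, m => replacements[m.Value]);
    }
}
```

For example, ReplaceWholeWords("ACFT FT RACFT", replacements) gives "AIRCRAFT FEET RACFT".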
I think you may want to replace only the longest key that occurs in the input text.
int maxLength = 0;
string reStr = "";
foreach (string s in replacements.Keys)
{
if (textBox2.Text.Contains(s))
{
if (maxLength < s.Length)
{
maxLength = s.Length;
reStr = s;
}
}
}
if (reStr != "")
textBox2.Text = textBox2.Text.Replace(reStr, replacements[reStr]);
The problem with this is that you are replacing every instance of the substring in the entire string. If what you want is to replace only whole, space-delimited instances of "ACFT" or "FT", you would want to use String.Split() to create a set of tokens.
For example:
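string tempString = textBox1.Text;
StringBuilder finalString = new StringBuilder();
bool first = true;
foreach (string word in tempString.Split(new char[] { ' ' }))
{
    // Put back the space that Split removed (except before the first token)
    if (!first) finalString.Append(' ');
    first = false;
    // Replace the token only if the whole token is a key
    string replacement;
    finalString.Append(replacements.TryGetValue(word, out replacement) ? replacement : word);
}
textBox1.Text = finalString.ToString();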
I've used a StringBuilder here because string concatenation creates a new string every single time, which gets very inefficient over many iterations. If you expect only a small number of concatenations, you can probably get away with plain string concatenation.
Note that there's a slight wrinkle in your original design: if a dictionary value is identical to a key that comes later in the dictionary iteration, the already-replaced text will be replaced again.
Here's a very funky way of doing this.
First up, you need to use regular expressions (Regex) as this has good built-in features for matching word boundaries.
So the key line of code would be to define a Regex instance:
var regex = new Regex(String.Format(@"\b{0}\b", Regex.Escape("FT")));
The \b marker looks for word boundaries. Regex.Escape ensures that any special regex characters in your keys are escaped.
You could then replace the text like this:
var replacedtext = regex.Replace("A FT AFT", "FEET");
You would get replacedtext == "A FEET AFT".
Now, here's the funky part. If you start with your current dictionary then you can define a single function that will do all of the replacements in one go.
Do it this way:
Func<string, string> funcreplaceall =
replacements
.ToDictionary(
kvp => new Regex(String.Format(@"\b{0}\b", Regex.Escape(kvp.Key))),
kvp => kvp.Value)
.Select(kvp =>
(Func<string, string>)(x => kvp.Key.Replace(x, kvp.Value)))
.Aggregate((f0, f1) => x => f1(f0(x)));
Now you can just call it like so:
inputBox.Text = funcreplaceall(inputBox.Text);
No looping required!
Just as a sanity check I got this:
funcreplaceall("A ACFT FT RACFT B") == "A AIRCRAFT FEET RACFT B"
Related
I have a text file from which I want to store Keys and Values in a String array.
In this case, the key is something like "Input File" and the value is "'D:\myfile.wav'". I'm splitting the lines of the text file on the ':' character, but I want the split to happen only at the first occurrence of ':'.
The file contains lines like these:
Input File : 'D:\myfile.wav'
Duration : 00:00:18.57
This is my code:
if (Regex.IsMatch(line, @"[^0-9\p{L}:_ ]+", RegexOptions.IgnoreCase))
{
string[] dataArray = line.Split(':');
}
Using regular expression captures
private static Regex _regex = new Regex(@"^([\p{L}_ ]+):?(.+)$");
....
Match match = _regex.Match(line);
if (match.Success)
{
string key = match.Groups[1].Captures[0].Value;
string value = match.Groups[2].Captures[0].Value;
}
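The regex is a static member so it is constructed only once instead of on every use. The ? after the : makes the colon optional; it does not make the match lazy. The key stops at the first : because the key group [\p{L}_ ]+ cannot contain a colon (or a digit), and everything after it goes into the value.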
Edit
I've updated the code and fiddle after your comment. I think this is what you mean:
Key: Any letter, underscore and whitespace combination (no digits)
Value: anything
Separator between key and value: :
Basically, you do not want to split your entire string, but to skip everything up to and including the first ':' character.
var data = line.Substring(line.IndexOf(':') + 1);
Or if you really want solution with Split:
var data = string.Join(":", line.Split(':').Skip(1));
Here, we first split the string into array, then skip one element (the one we are trying to get rid of), and finally construct a new string with ':' between elements in the array.
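A further option is the String.Split overload that takes a maximum number of substrings: with a count of 2 it splits at the first ':' and leaves the rest of the line intact. A sketch, where the KeyValueLine class and its TryParse method are hypothetical names:

```csharp
public static class KeyValueLine
{
    // Splits "Key : Value" at the first ':' only, using Split(char[], int count).
    // Returns false when the line contains no ':' at all.
    public static bool TryParse(string line, out string key, out string value)
    {
        string[] parts = line.Split(new[] { ':' }, 2);
        if (parts.Length < 2)
        {
            key = value = null;
            return false;
        }
        // Trim the spaces around the separator
        key = parts[0].Trim();
        value = parts[1].Trim();
        return true;
    }
}
```

For "Duration : 00:00:18.57" this yields the key "Duration" and the value "00:00:18.57"; the colons inside the value are untouched.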
Here's one way to do it with regex (comments in code):
string[] lines = {@"Input File : 'D:\myfile.wav'", @"Duration: 00:00:18.57"};
Regex regex = new Regex("^[^:]+");
Dictionary<string, string> dict = new Dictionary<string, string>();
for (int i = 0; i < lines.Length; i++)
{
// match in the string will be everything before first :,
// then we replace match with empty string and remove first
// character which will be :, and that will be the value
string key = regex.Match(lines[i]).Value.Trim();
string value = regex.Replace(lines[i], "").Remove(0, 1).Trim();
dict.Add(key, value);
}
It uses the pattern ^[^:]+, a negated character class that matches everything up to (but not including) the first ':'.
You need to read the line into a string Line; after that, do this:
String Key = Line.Split(':')[0];
String Value = Line.Substring(Key.Length + 1);
This way you can read each line of the text file and fill a dictionary, with the key being everything before the first ":" and the value everything after it:
Dictionary<string, string> yourDictionary = new Dictionary<string, string>();
string pathF = "C:\\fich.txt";
// using closes the file when the block ends
using (StreamReader file = new StreamReader(pathF, Encoding.Default))
{
    string step;
    while ((step = file.ReadLine()) != null)
    {
        int colon = step.IndexOf(':');
        // Skip empty lines and lines without a ':' separator
        if (colon < 0)
            continue;
        yourDictionary.Add(step.Substring(0, colon), step.Substring(colon + 1));
    }
}
I know this is probably very basic C# functionality, but I haven't used it in years, so I'm asking.
I have a string like MyName-1_1#1233
I want to pick out only the numbers/characters between -, _ and #.
I can use the Split function, but it takes quite a lot of code. Is there anything else?
To pick the numbers from the string, I would have to write something like the code below:
string[] words = s.Split('-');
foreach (string word in words)
{
//getting two separate string and have to pick the number using index...
}
string[] words = s.Split('_');
foreach (string word in words)
{
//getting two separate string and have to pick the number using index...
}
string[] words = s.Split('#');
foreach (string word in words)
{
//getting two separate string and have to pick the number using index...
}
You can use regular expressions for this:
string S = "-1-2#123#3";
foreach (Match m in Regex.Matches(S, "(?<=[_#-])(\\d+)(?=[_#-])?"))
{
Console.WriteLine(m.Groups[1]);
}
A bit shorter:
List<char> badChars = new List<char>{'-','_','#'};
string str = "MyName-1_1#1233";
string output = new string(str.Where(ch => !badChars.Contains(ch)).ToArray());
output will be MyName111233
If you want only the numbers then:
string str = "MyName-1_1#1233";
string output = new string(str.Where(ch => char.IsDigit(ch)).ToArray());
output will be 111233
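If you do want Split, it accepts several separator characters in one call, which removes most of the repetition in the question's code. A minimal sketch (SeparatorSplit and Parts are illustrative names):

```csharp
public static class SeparatorSplit
{
    // One Split call breaks the string at every '-', '_' and '#'.
    public static string[] Parts(string s)
    {
        return s.Split(new[] { '-', '_', '#' });
    }
}
```

For "MyName-1_1#1233" this returns "MyName", "1", "1" and "1233", so the numbers are the elements after the first one.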
What is the best way to enumerate a regex replacement in C#?
For example, I want every "<intent-filter" match to be replaced by "<intent-filter android:label=label#", where # is an incrementing number. What would be the best way to code it?
You can use an incremented counter in the anonymous method passed as the MatchEvaluator callback. The (?<=…) is a positive lookbehind: it must match, but it does not consume any text, so "<intent-filter" itself is not replaced.
string input = "a <intent-filter data=a /> <intent-filter data=b />";
int count = 0;
string result = Regex.Replace(input, @"(?<=\<intent-filter)",
_ => " android:label=label" + count++);
Don't bother with Regexes for this one. Do something along the lines of:
var pieces = text.Split(new string[] { "<intent-filter" }, StringSplitOptions.None);
var sb = new StringBuilder(pieces[0]);
for (var idx = 1; idx < pieces.Length; idx++)
{
    // Split removed the separator, so put it back followed by the numbered label
    sb.Append("<intent-filter android:label=label");
    sb.Append(idx - 1);
    sb.Append(pieces[idx]);
}
var result = sb.ToString();
I'm wondering how I can replace (remove) multiple words (like 500+) from a string. I know I can use the replace function to do this for a single word, but what if I want to replace 500+ words? I'm interested in removing all generic keywords from an article (such as "and", "I", "you" etc).
Here is the code for one replacement; I'm looking to do 500+:
string a = "why and you it";
string b = a.Replace("why", "");
MessageBox.Show(b);
Thanks
@Sergey Kucher: The text size will vary between a few hundred and a few thousand words. I am removing these words from random articles.
I would normally do something like:
// If you want the search/replace to be case sensitive, remove the
// StringComparer.OrdinalIgnoreCase
Dictionary<string, string> replaces = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase) {
// The format is word to be searched, word that should replace it
// or String.Empty to simply remove the offending word
{ "why", "xxx" },
{ "you", "yyy" },
};
void Main()
{
string a = "why and you it and You it";
// This will search for blocks of letters and numbers (abc/abcd/ab1234)
// and pass it to the replacer
string b = Regex.Replace(a, @"\w+", Replacer);
}
string Replacer(Match m)
{
string found = m.ToString();
string replace;
// If the word found is in the dictionary then it's placed in the
// replace variable by the TryGetValue
if (!replaces.TryGetValue(found, out replace))
{
// otherwise replace the word with the same word (so do nothing)
replace = found;
}
else
{
// The word is in the dictionary. replace now contains the
// word that will substitute it.
// At this point you could add some code to maintain upper/lower
// case between the words (so that if you -> xxx then You becomes Xxx
// and YOU becomes XXX)
}
return replace;
}
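The comment about keeping upper/lower case could be filled in with a small helper. This is a hypothetical sketch (CaseHelper.MatchCase is not part of the answer above) that handles only the three patterns named in the comment: all caps, capitalized, and as written.

```csharp
// Hypothetical helper: copies the casing pattern of the matched word onto its replacement.
public static class CaseHelper
{
    public static string MatchCase(string found, string replacement)
    {
        if (found.Length == 0 || replacement.Length == 0)
            return replacement;

        // "YOU" -> "XXX" (a lone capital such as "I" falls through to the next rule)
        if (found.Length > 1 && IsAllUpper(found))
            return replacement.ToUpperInvariant();

        // "You" -> "Xxx": capitalize the first letter, keep the rest as written
        if (char.IsUpper(found[0]))
            return char.ToUpperInvariant(replacement[0]) + replacement.Substring(1);

        // "you" -> "xxx": use the dictionary value unchanged
        return replacement;
    }

    // True when the string has at least one letter and every letter is upper case.
    private static bool IsAllUpper(string s)
    {
        bool hasLetter = false;
        foreach (char c in s)
        {
            if (char.IsLetter(c))
            {
                if (!char.IsUpper(c))
                    return false;
                hasLetter = true;
            }
        }
        return hasLetter;
    }
}
```

Inside Replacer it would go in the else branch, for example replace = CaseHelper.MatchCase(found, replace);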
As someone else wrote, but without the substring problem (you don't want to remove "ass" from "classes" :-) ), and only for the case where you just need to remove words:
var escapedStrings = yourReplaces.Select(Regex.Escape);
string result = Regex.Replace(yourInput, @"\b(" + string.Join("|", escapedStrings) + @")\b", string.Empty);
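I use the \b word boundary. It is a zero-width assertion that matches between a word character (letter, digit or underscore) and a non-word character or the start/end of the string, so "ass" inside "classes" is not matched :-)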
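Removing words this way leaves their surrounding spaces behind ("why and you it" becomes " and  it"). A sketch that wraps the same \b alternation in a method and then collapses the leftover runs of spaces; making it case-insensitive is an assumption, so drop RegexOptions.IgnoreCase if you need exact matches. WordRemover is an illustrative name.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordRemover
{
    // Removes every whole-word occurrence of the given words, then tidies spacing.
    public static string RemoveWords(string input, IEnumerable<string> words)
    {
        // One alternation of escaped words, wrapped in \b word boundaries
        string pattern = @"\b(?:" + string.Join("|", words.Select(Regex.Escape)) + @")\b";
        string removed = Regex.Replace(input, pattern, string.Empty, RegexOptions.IgnoreCase);

        // Collapse runs of spaces left by the removals and trim the ends
        return Regex.Replace(removed, " {2,}", " ").Trim();
    }
}
```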
Create a list of all the words you want to replace (for example in a text file) and load it into a list; you can keep this very simple or make it quite complex. A trivial example would be:
var sentence = "mysentence hi";
var words = File.ReadAllLines("pathtowordlist.txt"); // one word per line
foreach (var word in words)
    sentence = sentence.Replace(word, "x");
You could create two lists if you wanted a dual mapping scheme.
Try this:
string text = "word1 word2 you it";
List<string> words = new System.Collections.Generic.List<string>();
words.Add("word1");
words.Add("word2");
words.ForEach(w => text = text.Replace(w, ""));
Edit
If you want to replace text with other text, you can create a class Word:
public class Word
{
public string SearchWord { get; set; }
public string ReplaceWord { get; set; }
}
And change above code to this:
string text = "word1 word2 you it";
List<Word> words = new System.Collections.Generic.List<Word>();
words.Add(new Word() { SearchWord = "word1", ReplaceWord = "replaced" });
words.Add(new Word() { SearchWord = "word2", ReplaceWord = "replaced" });
words.ForEach(w => text = text.Replace(w.SearchWord, w.ReplaceWord));
If you are talking about a single string, the solution is to remove them all with the simple String.Replace method. As its documentation says, it:
"Returns a new string in which all occurrences of a specified string in the current instance are replaced with another specified string".
Since you need to replace several words, you can make a list of them:
List<string> wordsToRemove = new List<string>();
wordsToRemove.Add("why");
wordsToRemove.Add("how");
and so on, and then remove them from the string:
foreach(string curr in wordsToRemove)
a = a.ToLower().Replace(curr, "");
Important:
If you want to keep your string as it was, without lowercasing it, and still match regardless of case, use:
foreach (string curr in wordsToRemove)
{
    // Escape the word so characters such as '.' are matched literally
    Regex regex = new Regex(Regex.Escape(curr), RegexOptions.IgnoreCase);
    myString = regex.Replace(myString, "");
}
It depends on the situation, of course, but if your text is long, you have many words, and you want to optimize performance, you should build a trie from the words and search the trie for matches.
It won't lower the order of complexity (still O(nm)), but for large groups of words it can check multiple words against each character at once instead of one by one. I would assume a couple of hundred words is enough for this to be faster.
This is the fastest method in my opinion, and I have written a function for you to start with:
public struct FindRecord
{
public int WordIndex;
public int PositionInString;
}
public static FindRecord[] FindAll(string input, string[] words)
{
    List<FindRecord> result = new List<FindRecord>();
    for (int i = 0; i < input.Length; i++)
    {
        for (int j = 0; j < words.Length; j++)
        {
            // Does words[j] occur in full starting at position i? (ordinal comparison)
            if (words[j].Length > 0 && i + words[j].Length <= input.Length &&
                string.CompareOrdinal(input, i, words[j], 0, words[j].Length) == 0)
            {
                result.Add(new FindRecord { WordIndex = j, PositionInString = i });
            }
        }
    }
    return result.ToArray();
}
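The function above does not actually build a trie. For comparison, here is a minimal sketch of the trie search the answer describes, assuming exact, case-sensitive matching; WordTrie and FindAll are illustrative names. For each start position it walks down the trie once, so the work per position is bounded by the length of the longest word rather than by the number of words.

```csharp
using System.Collections.Generic;

// Minimal trie for finding many words in one text; names are illustrative.
public sealed class WordTrie
{
    private sealed class Node
    {
        // Next character -> child node
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        // Index into the original word list when a word ends at this node, else -1
        public int WordIndex = -1;
    }

    private readonly Node _root = new Node();

    public WordTrie(string[] words)
    {
        for (int w = 0; w < words.Length; w++)
        {
            Node node = _root;
            foreach (char c in words[w])
            {
                Node child;
                if (!node.Children.TryGetValue(c, out child))
                {
                    child = new Node();
                    node.Children.Add(c, child);
                }
                node = child;
            }
            node.WordIndex = w; // a duplicate word keeps its last index
        }
    }

    // Returns (word index, start position) for every occurrence, including overlapping ones.
    public List<(int WordIndex, int Position)> FindAll(string input)
    {
        var result = new List<(int WordIndex, int Position)>();
        for (int start = 0; start < input.Length; start++)
        {
            // Walk the trie along the input; stop as soon as no word continues this way
            Node node = _root;
            for (int i = start; i < input.Length; i++)
            {
                if (!node.Children.TryGetValue(input[i], out node))
                    break;
                if (node.WordIndex >= 0)
                    result.Add((node.WordIndex, start));
            }
        }
        return result;
    }
}
```

Because "an" and "and" share the path a-n in the trie, both are reported from a single walk starting at the same character.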
Another option:
It might be the rare case where a regex is faster than writing the code yourself.
Try using
public static string ReplaceAll(string input, string[] words)
{
// Escape each word so regex metacharacters in it are matched literally
string wordlist = string.Join("|", words.Select(Regex.Escape));
Regex rx = new Regex(wordlist, RegexOptions.Compiled);
return rx.Replace(input, m => "");
}
Regex can do this better, you just need all the replace words in a list, and then:
var escapedStrings = yourReplaces.Select(PadAndEscape);
string result = Regex.Replace(yourInput, string.Join("|", escapedStrings), " "); // leave one space where each padded word was
This requires a function that space-pads the strings before escaping them:
public string PadAndEscape(string s)
{
return Regex.Escape(" " + s + " ");
}
I have a list of words I want to ignore, like this one:
public List<String> ignoreList = new List<String>()
{
"North",
"South",
"East",
"West"
};
For a given string, say "14th Avenue North" I want to be able to remove the "North" part, so basically a function that would return "14th Avenue " when called.
I feel like there is something I should be able to do with a mix of LINQ, regex and replace, but I just can't figure it out.
The bigger picture is, I'm trying to write an address matching algorithm. I want to filter out words like "Street", "North", "Boulevard", etc. before I use the Levenshtein algorithm to evaluate the similarity.
How about this:
string.Join(" ", text.Split().Where(w => !ignoreList.Contains(w)));
or, for .NET 3.5 (where string.Join needs an array):
string.Join(" ", text.Split().Where(w => !ignoreList.Contains(w)).ToArray());
Note that this method splits the string up into individual words so it only removes whole words. That way it will work properly with addresses like Northampton Way #123 that string.Replace can't handle.
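With hundreds of ignore words, a HashSet keeps each lookup constant-time, and a case-insensitive comparer treats "north" and "North" alike. A sketch of the same split-and-filter idea under those assumptions (AddressFilter is an illustrative name; it also splits on any whitespace and drops empty entries, and it needs .NET 4 or later for this string.Join overload):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AddressFilter
{
    // Whole-word filter: split on whitespace, drop words in the ignore set,
    // and rejoin the rest with single spaces.
    public static string RemoveIgnored(string text, HashSet<string> ignore)
    {
        // A null separator array means "split on any whitespace"
        var kept = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
                       .Where(w => !ignore.Contains(w));
        return string.Join(" ", kept);
    }
}
```

Build the set once, for example new HashSet<string>(ignoreList, StringComparer.OrdinalIgnoreCase), and reuse it for every address.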
Regex r = new Regex(string.Join("|", ignoreList.Select(w => Regex.Escape(w)).ToArray()));
string s = "14th Avenue North";
s = r.Replace(s, string.Empty);
Something like this should work:
string FilterAllValuesFromIgnoreList(string someStringToFilter)
{
return ignoreList.Aggregate(someStringToFilter, (str, filter)=>str.Replace(filter, ""));
}
What's wrong with a simple for loop?
string street = "14th Avenue North";
foreach (string word in ignoreList)
{
street = street.Replace(word, string.Empty);
}
If you know that the list of words contains only characters that do not need escaping inside a regular expression, then you can do this:
string s = "14th Avenue North";
Regex regex = new Regex(string.Format(@"\b({0})\b",
string.Join("|", ignoreList.ToArray())));
s = regex.Replace(s, "");
Result:
14th Avenue
If there are special characters you will need to fix two things:
Use Regex.Escape on each element of ignore list.
The word-boundary \b will not match a whitespace followed by a symbol or vice versa. You may need to check for whitespace (or other separating characters such as punctuation) using lookaround assertions instead.
Here's how to fix these two problems:
Regex regex = new Regex(string.Format(@"(?<= |^)({0})(?= |$)",
string.Join("|", ignoreList.Select(x => Regex.Escape(x)).ToArray())));
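Packaged as a method, the lookaround version handles an ignore entry such as "St." (a hypothetical example), which the \b version misses at the end of a string because there is no word boundary between '.' and the end. A sketch, with LookaroundFilter as an illustrative name:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookaroundFilter
{
    // Requires a space or the string start before each escaped ignore word and a
    // space or the string end after it, instead of relying on \b.
    public static string Remove(string s, IEnumerable<string> ignoreList)
    {
        var regex = new Regex(string.Format(@"(?<= |^)({0})(?= |$)",
            string.Join("|", ignoreList.Select(x => Regex.Escape(x)))));
        return regex.Replace(s, "");
    }
}
```

Like the \b version, it leaves the space before a removed word in place, so "14th Avenue North" becomes "14th Avenue ".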
If it's a short string as in your example, you can just loop through the strings and replace one at a time. If you want to get fancy, you can use the LINQ Aggregate method to do it:
address = ignoreList.Aggregate(address, (a, s) => a.Replace(s, String.Empty));
If it's a large string, that would be slow. Instead you can replace all strings in a single run through the string, which is much faster. I made a method for that in this answer.
LINQ makes this easy and readable. This requires normalized data though, particularly in that it is case-sensitive.
List<string> ignoreList = new List<string>()
{
"North",
"South",
"East",
"West"
};
string s = "123 West 5th St"
.Split(' ') // Separate the words to an array
.ToList() // Convert array to List<>
.Except(ignoreList) // Remove ignored keywords (Except also drops duplicate words)
.Aggregate((s1, s2) => s1 + " " + s2); // Reconstruct the string
Why not just keep it simple?
public static string Trim(string text)
{
    var rv = text.Trim();
    foreach (var ignore in ignoreList)
    {
        // Only strip an ignore word that ends the string
        if (rv.EndsWith(ignore))
        {
            rv = rv.Substring(0, rv.Length - ignore.Length);
        }
    }
    return rv;
}
You can do this with an expression if you like, but it's easier to turn it around than to use Aggregate. I would do something like this:
string s = "14th Avenue North";
ignoreList.ForEach(i => s = s.Replace(i, ""));
//result is "14th Avenue "
public static string Trim(string text)
{
var rv = text;
foreach (var ignore in ignoreList)
rv = rv.Replace(ignore, "");
return rv;
}
Updated For Gabe
public static string Trim(string text)
{
    var kept = new List<string>();
    foreach (var word in text.Split(' '))
    {
        var present = false;
        foreach (var ignore in ignoreList)
            if (word == ignore)
                present = true;
        if (!present)
            kept.Add(word);
    }
    // Rejoin the remaining words with the spaces that Split removed
    return string.Join(" ", kept.ToArray());
}
If you have a list, I think you're going to have to touch all the items. You could create one big regex with all your ignore keywords and replace the matches with String.Empty.
Here's a start:
(^|\s+)(North|South|East|West){1,2}(ern)?(\s+|$)
If you have a single RegEx for ignore words, you can do a single replace for each phrase you want to pass to the algorithm.