I have a text file from which I want to store Keys and Values in a String array.
In this case, Key is something like "Input File" and the Value is "'D:\myfile.wav'". I'm splitting the text file lines by **:** character. However, I just want to restrict the split to only the first occurrence of **:**.
This is my code:
Input File : 'D:\myfile.wav'
Duration : 00:00:18.57
if (Regex.IsMatch(line, @"[^0-9\p{L}:_ ]+", RegexOptions.IgnoreCase))
{
string[] dataArray = line.Split(':');
}
Using regular expression captures
private static Regex _regex = new Regex(@"^([\p{L}_ ]+):?(.+)$");
....
Match match = _regex.Match(line);
if (match.Success)
{
string key = match.Groups[1].Captures[0].Value;
string value = match.Groups[2].Captures[0].Value;
}
The regex is a static member to avoid compiling it for every usage. The key group [\p{L}_ ]+ cannot match a colon, so it stops at the first :. The ? after the : makes the separator optional, and everything that follows goes into the value group.
Link to Fiddle.
Edit
I've updated the code and fiddle after your comment. I think this is what you mean:
Key: Any letter, underscore and whitespace combination (no digits)
Value: anything
Separator between key and value: :
Basically, you do not want to split your entire string, but to skip all the content before encountering first ':' char plus one symbol (':' itself).
var data = line.Substring(line.IndexOf(':') + 1);
Or if you really want solution with Split:
var data = string.Join(":", line.Split(':').Skip(1));
Here, we first split the string into array, then skip one element (the one we are trying to get rid of), and finally construct a new string with ':' between elements in the array.
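Another option that was not mentioned above is the Split overload that takes a maximum number of substrings. A minimal sketch, assuming each line looks like the samples in the question; the KeyValueLine class and TryParse method are hypothetical names used only for illustration:

```csharp
using System;

public static class KeyValueLine
{
    // Splits "Key : Value" at the first ':' only; later colons stay in the value.
    // Returns false when the line contains no ':' at all.
    public static bool TryParse(string line, out string key, out string value)
    {
        // The second argument caps the result at two substrings.
        string[] parts = line.Split(new[] { ':' }, 2);
        if (parts.Length < 2)
        {
            key = null;
            value = null;
            return false;
        }
        key = parts[0].Trim();
        value = parts[1].Trim();
        return true;
    }
}
```

Calling KeyValueLine.TryParse on "Duration : 00:00:18.57" should give the key "Duration" and the value "00:00:18.57", because only the first colon is treated as a separator.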
Here's one way to do it with regex (comments in code):
string[] lines = {@"Input File : 'D:\myfile.wav'", @"Duration: 00:00:18.57"};
Regex regex = new Regex("^[^:]+");
Dictionary<string, string> dict = new Dictionary<string, string>();
for (int i = 0; i < lines.Length; i++)
{
// match in the string will be everything before first :,
// then we replace match with empty string and remove first
// character which will be :, and that will be the value
string key = regex.Match(lines[i]).Value.Trim();
string value = regex.Replace(lines[i], "").Remove(0, 1).Trim();
dict.Add(key, value);
}
It uses the pattern ^[^:]+, a negated character class that matches everything from the start of the line up to, but not including, the first :.
You need to read each line of the file into a string Line.
After that, do this:
String Key = Line.Split(':')[0];
String Value = Line.Substring(Key.Length + 1); // everything after the first ':'
This way you can read each line of the text file and fill the dictionary with Key = everything up to the first ":" and Value = everything after it.
Dictionary<string, string> yourDictionary = new Dictionary<string, string>();
string pathF = "C:\\fich.txt";
StreamReader file = new StreamReader(pathF, Encoding.Default);
string step = "";
List<string> stream = new List<string>();
while ((step = file.ReadLine()) != null)
{
if (!string.IsNullOrEmpty(step))
{
yourDictionary.Add(step.Substring(0, step.IndexOf(':')), step.Substring(step.IndexOf(':') + 1));
}
}
Related
How would I capture both the filenames inside the quotes and the numbers following them as named captures (Regex / C#)?
Files("fileone.txt", 5969784, "file2.txt", 45345333)
Out of every occurrence in the string, the ability to capture "fileone.txt" and the integer following (a loop cycles each pair)
I am trying to use this https://regex101.com/r/MwMzBo/1 but having issues matching without the '[' and ']'.
Required to be able to loop each filename+size as a pair and moving next.
Any help is appreciated!
UPDATE
string file = "Files(\"fileone.txt\", 5969784, \"file2.txt\", 45345333, \"file2.txt\", 45345333)";
var regex = new Regex(@"(?:\G(?!\A)\s*,\s*|\w+\()(?:""(?<file>.*?)""|'(?<file>.*?)')\s*,\s*(?<number>\d+)");
var match = regex.Match(file);
var names = match.Groups["file"].Captures.Cast<Capture>();
var lengths = match.Groups["number"].Captures.Cast<Capture>();
var filelist = names.Zip(lengths, (f, n) => new { file = f.Value, length = long.Parse(n.Value) }).ToArray();
foreach (var item in filelist)
{
// Only returning 1 pair result, ignoring the rest
}
Reading match.Value to confirm what is being read. Only first pair is being picked up.
while (match.Success)
{
MessageBox.Show(match.Value);
match = match.NextMatch();
}
Now we are getting all results properly. I read, that Regex.Match only returns the first matched result. This explains a lot.
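Since Regex.Match returns only the first match, an alternative to the NextMatch loop is to enumerate Regex.Matches. A minimal sketch using the same pattern as in the question; the FilePairs class and Parse method are hypothetical names, and the input is assumed to follow the Files("name", size, ...) layout shown above:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FilePairs
{
    // Same pattern as in the question; \G ties each later match
    // to the position where the previous match ended.
    private static readonly Regex PairRegex = new Regex(
        @"(?:\G(?!\A)\s*,\s*|\w+\()(?:""(?<file>.*?)""|'(?<file>.*?)')\s*,\s*(?<number>\d+)");

    // Returns one (file name, size) pair per match, in input order.
    public static List<KeyValuePair<string, long>> Parse(string input)
    {
        var result = new List<KeyValuePair<string, long>>();
        foreach (Match m in PairRegex.Matches(input))
        {
            result.Add(new KeyValuePair<string, long>(
                m.Groups["file"].Value,
                long.Parse(m.Groups["number"].Value)));
        }
        return result;
    }
}
```

Each Match here holds exactly one filename and one number, so there is no need to zip the Captures collections of a single match.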
You can use
(?:\G(?!\A)\s*,\s*|\w+\()(?:""(?<file>.*?)""|'(?<file>.*?)')\s*,\s*(?<number>\d+)
See the regex demo
Details:
(?:\G(?!\A)\s*,\s*|\w+\() - end of the previous successful match and a comma enclosed with zero or more whitespaces, or a word and an opening ( char
(?:""(?<file>.*?)""|'(?<file>.*?)') - ", Group "file" capturing any zero or more chars other than a newline char as few as possible and then a ", or a ', Group "file" capturing any zero or more chars other than a newline char as few as possible and then a '
\s*,\s* - a comma enclosed with zero or more whitespaces
(?<number>\d+) - Group "number": one or more digits.
I like doing it in smaller pieces :
string input = "cov('Age', ['5','7','9'])";
string pattern1 = @"\((?'key'[^,]+),\s+\[(?'values'[^\]]+)";
Match match = Regex.Match(input, pattern1);
string key = match.Groups["key"].Value.Trim(new char[] {'\''});
string pattern2 = @"'(?'value'[^']+)'";
string values = match.Groups["values"].Value;
MatchCollection matches = Regex.Matches(values, pattern2);
int[] number = matches.Cast<Match>().Select(x => int.Parse(x.Value.Replace("'",string.Empty))).ToArray();
Duration : 00:05:48.73
File Size 61.5M
As you can see the two lines. One of them has a : separating the word and the number, the other one has a blankspace separating the word and number.
I need to separate the word from the number for both the cases (for : as well as for blankspace).
I used String.Split(':') and String.Split(null). While the String.Split(':') worked, and there were only two items in the array, String.Split(null) resulted in the following items in the array: File, Size, 61.5M. So three items. I want to make that into two.
this is the code I'm using:
private static Regex _regex = new Regex(@"^([\p{L}_ ]+):?(.+)$");
Match match = _regex.Match(line);
if (match.Success)
{
string key = match.Groups[1].Captures[0].Value;
string value = match.Groups[2].Captures[0].Value;
}
your split may not work, but if you use the regex you mentioned that should work fine.
Please find the attached fiddle. https://dotnetfiddle.net/g0apnE
var line1 = "Duration : 00:05:48.73";
var line2 = "File Size 61.5M";
Regex _regex = new Regex(@"^([\p{L}_ ]+):?(.+)$");
Match match = _regex.Match(line1);
if (match.Success)
{
string key = match.Groups[1].Captures[0].Value;
string value = match.Groups[2].Captures[0].Value;
//call trim to remove extra space around.
Console.WriteLine(key.Trim()); //Duration
Console.WriteLine(value.Trim()); //00:05:48.73
}
match = _regex.Match(line2);
if (match.Success)
{
string key = match.Groups[1].Captures[0].Value;
string value = match.Groups[2].Captures[0].Value;
//call trim to remove extra space around.
Console.WriteLine(key.Trim()); //File Size
Console.WriteLine(value.Trim()); //61.5M
}
I'm trying to work on this string
abc
def
--------------
efg
hij
------
xyz
pqr
--------------
Now I have to split the string with the - character.
So far I'm first splitting the string into lines, finding each line of - characters and replacing it with a single *, then combining the whole string and splitting it again.
I'm trying to get the data as
string[] set =
{
"abc
def",
"efg
hij",
"xyz
pqr"
}
Is there a better way to do this?
var spitStrings = yourString.Split(new char[] { '-' }, StringSplitOptions.RemoveEmptyEntries);
If I understand your question, the code above solves it.
You can use the string Split function with a specific char or set of chars (here -).
The output will be an array of strings; then choose whichever strings you want.
Example:
http://www.dotnetperls.com/split
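If the goal is the three blocks from the question with the line break inside each block preserved, one option is to split on whole lines of dashes with Regex.Split instead of on single '-' characters. A minimal sketch, assuming every separator line follows a block as in the question's sample; the DashBlocks class is a hypothetical name:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class DashBlocks
{
    public static string[] Split(string text)
    {
        // A separator is a line break, a run of dashes,
        // and an optional trailing line break.
        return Regex.Split(text, @"\r?\n-+(?:\r?\n)?")
            .Where(block => block.Length > 0) // drop the empty piece after the last separator
            .ToArray();
    }
}
```

Because the line breaks around each dash line are consumed by the separator pattern, the resulting strings do not need to be trimmed afterwards.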
I'm confused with exactly what you're asking, but won't this work?
string[] seta =
{
"abc\ndef",
"efg\nhij",
"xyz\npqr"
}
\n = LF (Line Feed) // Used as a new line character in Unix
\r = CR (Carriage Return) // Used as a new line character in classic Mac OS
\r\n = CR + LF // Used as a new line character in Windows
(char)13 = \r = CR // Same as \r
If I'm understanding your question about splitting -'s then the following should work.
string s = "abc-def-efg-hij-xyz-pqr"; // example?
string[] letters = s.Split(new char[] { '-' }, StringSplitOptions.RemoveEmptyEntries);
If this is what your array looks like at the moment, then you can loop through it as follows:
string[] seta = {
"abc-def",
"efg-hij",
"xyz-pqr"
};
foreach (var letter in seta)
{
string[] letters = letter.Split(new char[] { '-' }, StringSplitOptions.RemoveEmptyEntries);
// do something with letters?
}
I'm sure this below code will help you...
string m = "adasd------asdasd---asdasdsad-------asdasd------adsadasd---asdasd---asdadadad-asdadsa-asdada-s---adadasd-adsd";
var array = m.Split('-');
List<string> myCollection = new List<string>();
if (array.Length > 0)
{
foreach (string item in array)
{
if (item != "")
{
myCollection.Add(item);
}
}
}
string[] str = myCollection.ToArray();
if it does then don't forget to mark my answer thanks....;)
string set = "abc----def----------------efg----hij--xyz-------pqr" ;
var spitStrings = set.Split(new char[]{'-'},StringSplitOptions.RemoveEmptyEntries);
EDIT -
He wants to split the strings no matter how many '-' are there.
var spitStrings = set.Split(new char[]{'-'},StringSplitOptions.RemoveEmptyEntries);
This will do the work.
I'm having some issues with replacing words in a string with values from a dictionary. Here's a small sample of my current code:
Dictionary<string, string> replacements = new Dictionary<string, string>()
{
{"ACFT", "AIRCRAFT"},
{"FT", "FEET"},
};
foreach(string s in replacements.Keys)
{
inputBox.Text = inputBox.Text.Replace(s, replacements[s]);
}
When I execute the code, if I have ACFT in the textbox, it is replaced with AIRCRAFEET because it sees the FT part in the string. I need to somehow differentiate this and only replace the whole word.
So for example, if I have ACFT in the box, it should replace it with AIRCRAFT. And, if I have FT in the box, replace it with FEET.
So my question is, how can I match whole words only when replacing words?
EDIT: I want to be able to use and replace multiple words.
use the if condition..
foreach(string s in replacements.Keys) {
if(inputBox.Text==s){
inputBox.Text = inputBox.Text.Replace(s, replacements[s]);
}
}
UPDATE after you modified your question..
string str = "ACFT FTT";
Dictionary<string, string> replacements = new Dictionary<string, string>()
{
{"ACFT", "AIRCRAFT"},
{"FT", "FEET"},
};
string[] temp = str.Split(' ');
string newStr = "";
for (int i = 0; i < temp.Length; i++)
{
try
{
temp[i] = temp[i].Replace(temp[i], replacements[temp[i]]);
}
catch (KeyNotFoundException e)
{
// not found..
}
newStr+=temp[i]+" ";
}
Console.WriteLine( newStr);
how can I match whole words only when replacing words?
Use regular expressions (as was suggested by David Pilkington)
Dictionary<string, string> replacements = new Dictionary<string, string>()
{
{"ACFT", "AIRCRAFT"},
{"FT", "FEET"},
};
foreach(string s in replacements.Keys)
{
var pattern = @"\b" + Regex.Escape(s) + @"\b"; // match on word boundaries (verbatim strings, so \b is not a backspace)
inputBox.Text = Regex.Replace(inputBox.Text, pattern, replacements[s]);
}
However, if you have control over the design, I would much rather use keys like "{ACFT}","{FT}" (which have explicit boundaries), so you could just use them with String.Replace.
I think you may want to replace the max length subStr in inputText.
int maxLength = 0;
string reStr = "";
foreach (string s in replacements.Keys)
{
if (textBox2.Text.Contains(s))
{
if (maxLength < s.Length)
{
maxLength = s.Length;
reStr = s;
}
}
}
if (reStr != "")
textBox2.Text = textBox2.Text.Replace(reStr, replacements[reStr]);
The problem with this is that you are replacing every instance of the substring in the entire string. If what you want is to replace only whole, space-delimited instances of "ACFT" or "FT", you would want to use String.Split() to create a set of tokens.
For example:
string tempString = textBox1.Text;
StringBuilder finalString = new StringBuilder();
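foreach (string word in tempString.Split(new char[] { ' ' }))
{
    // Replace a token only when the whole token is a key
    string replacement;
    if (replacements.TryGetValue(word, out replacement))
        finalString.Append(replacement);
    else
        finalString.Append(word);
    finalString.Append(' '); // restore the separator removed by Split
}
textBox1.Text = finalString.ToString().TrimEnd();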
I've used a StringBuilder here because concatenation requires the creation of a new string every single time, and this gets extremely inefficient over long periods. If you expect to have a small number of concatenations to make, you can probably get away with using string.
Note that there's a slight wrinkle in your design - if you have a KeyValuePair with a value that's identical to a key that occurs later in the dictionary iteration, the replacement will be overwritten.
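One way to avoid that wrinkle is to do all replacements in a single pass, so text produced by one replacement is never scanned again. A minimal sketch, assuming case-sensitive keys and whole-word matching; the WholeWordReplacer class is a hypothetical name, not something from the question:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WholeWordReplacer
{
    public static string ReplaceAll(string input, IDictionary<string, string> replacements)
    {
        if (replacements.Count == 0)
            return input;

        // Build \b(?:KEY1|KEY2|...)\b with escaped keys, longest first,
        // so a longer key is tried before any shorter key it starts with.
        string pattern = @"\b(?:" + string.Join("|",
            replacements.Keys
                .OrderByDescending(k => k.Length)
                .Select(Regex.Escape)) + @")\b";

        // The matched text is always one of the keys, so the lookup cannot fail.
        return Regex.Replace(input, pattern, m => replacements[m.Value]);
    }
}
```

With the dictionary from the question, "ACFT FT RACFT" should come back as "AIRCRAFT FEET RACFT", and a pair of entries such as A to B and B to C turns "A B" into "B C" rather than "C C".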
Here's a very funky way of doing this.
First up, you need to use regular expressions (Regex) as this has good built-in features for matching word boundaries.
So the key line of code would be to define a Regex instance:
var regex = new Regex(String.Format(@"\b{0}\b", Regex.Escape("FT")));
The \b marker looks for word boundaries. Regex.Escape ensures that if any of your keys contain special Regex characters, they are escaped.
You could then replace the text like this:
var replacedtext = regex.Replace("A FT AFT", "FEET");
You would get replacedtext == "A FEET AFT".
Now, here's the funky part. If you start with your current dictionary then you can define a single function that will do all of the replacements in one go.
Do it this way:
Func<string, string> funcreplaceall =
replacements
.ToDictionary(
kvp => new Regex(String.Format(@"\b{0}\b", Regex.Escape(kvp.Key))),
kvp => kvp.Value)
.Select(kvp =>
(Func<string, string>)(x => kvp.Key.Replace(x, kvp.Value)))
.Aggregate((f0, f1) => x => f1(f0(x)));
Now you can just call it like so:
inputBox.Text = funcreplaceall(inputBox.Text);
No looping required!
Just as a sanity check I got this:
funcreplaceall("A ACFT FT RACFT B") == "A AIRCRAFT FEET RACFT B"
I'm wondering how I can replace (remove) multiple words (like 500+) from a string. I know I can use the replace function to do this for a single word, but what if I want to replace 500+ words? I'm interested in removing all generic keywords from an article (such as "and", "I", "you" etc).
Here is the code for 1 replacement.. I'm looking to do 500+..
string a = "why and you it";
string b = a.Replace("why", "");
MessageBox.Show(b);
Thanks
@Sergey Kucher Text size will vary between a few hundred words to a few thousand. I am replacing these words from random articles.
I would normally do something like:
// If you want the search/replace to be case sensitive, remove the
// StringComparer.OrdinalIgnoreCase
Dictionary<string, string> replaces = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase) {
// The format is word to be searched, word that should replace it
// or String.Empty to simply remove the offending word
{ "why", "xxx" },
{ "you", "yyy" },
};
void Main()
{
string a = "why and you it and You it";
// This will search for blocks of letters and numbers (abc/abcd/ab1234)
// and pass it to the replacer
string b = Regex.Replace(a, @"\w+", Replacer);
}
string Replacer(Match m)
{
string found = m.ToString();
string replace;
// If the word found is in the dictionary then it's placed in the
// replace variable by the TryGetValue
if (!replaces.TryGetValue(found, out replace))
{
// otherwise replace the word with the same word (so do nothing)
replace = found;
}
else
{
// The word is in the dictionary. replace now contains the
// word that will substitute it.
// At this point you could add some code to maintain upper/lower
// case between the words (so that if you -> xxx then You becomes Xxx
// and YOU becomes XXX)
}
return replace;
}
As someone else wrote, but without problems with substrings (the ass principle... You don't want to remove asses from classes :-) ), and working only if you only need to remove words:
var escapedStrings = yourReplaces.Select(Regex.Escape);
string result = Regex.Replace(yourInput, @"\b(" + string.Join("|", escapedStrings) + @")\b", string.Empty);
I use the \b word boundary... It's a little complex to explain what it is, but it's useful for finding word boundaries :-)
Load all the words you want to replace into a list; you can do this fairly simply or make it very complex. A trivial example would be:
var sentence = "mysentence hi";
var words = File.ReadAllLines("pathtowordlist.txt"); // one word per line
foreach (var word in words)
    sentence = sentence.Replace(word, "x"); // strings are immutable, so keep the result
You could create two lists if you wanted a dual mapping scheme.
Try this:
string text = "word1 word2 you it";
List<string> words = new System.Collections.Generic.List<string>();
words.Add("word1");
words.Add("word2");
words.ForEach(w => text = text.Replace(w, ""));
Edit
If you want to replace text with another text, you can create class Word:
public class Word
{
public string SearchWord { get; set; }
public string ReplaceWord { get; set; }
}
And change above code to this:
string text = "word1 word2 you it";
List<Word> words = new System.Collections.Generic.List<Word>();
words.Add(new Word() { SearchWord = "word1", ReplaceWord = "replaced" });
words.Add(new Word() { SearchWord = "word2", ReplaceWord = "replaced" });
words.ForEach(w => text = text.Replace(w.SearchWord, w.ReplaceWord));
If you are talking about a single string, the solution is to remove them all with the simple Replace method. As its documentation says:
"Returns a new string in which all occurrences of a specified string in the current instance are replaced with another specified string".
You may need to replace several words, so you can make a list of them:
List<string> wordsToRemove = new List<string>();
wordsToRemove.Add("why");
wordsToRemove.Add("how");
and so on
and then remove them from the string
foreach(string curr in wordsToRemove)
a = a.ToLower().Replace(curr, "");
Important
If you want to keep your string as it was, without lowercasing it and without struggling with upper and lower case, use:
foreach (string curr in wordsToRemove)
{
    // Escape the word so any regex metacharacters in it are matched literally.
    // You can cache and reuse this object.
    Regex regex = new Regex(Regex.Escape(curr), RegexOptions.IgnoreCase);
    myString = regex.Replace(myString, "");
}
It depends on the situation, of course, but if your text is long, you have many words, and you want to optimize performance, you should build a trie from the words and search the trie for a match.
It won't lower the order of complexity, which is still O(nm), but for large groups of words it can check multiple words against each character instead of one by one.
I would guess a couple of hundred words should be enough for this to be faster.
This is the fastest method in my opinion, and I have written a function for you to start with:
public struct FindRecord
{
public int WordIndex;
public int PositionInString;
}
public static FindRecord[] FindAll(string input, string[] words)
{
LinkedList<FindRecord> result = new LinkedList<FindRecord>();
int[] matchs = new int[words.Length];
for (int i = 0; i < input.Length; i++)
{
for (int j = 0; j < words.Length; j++)
{
if (input[i] == words[j][matchs[j]])
{
matchs[j]++;
if(matchs[j] == words[j].Length)
{
FindRecord findRecord = new FindRecord {WordIndex = j, PositionInString = i - matchs[j] + 1};
result.AddLast(findRecord);
matchs[j] = 0;
}
}
else
matchs[j] = 0;
}
}
return result.ToArray();
}
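The function above keeps one counter per word rather than building an actual trie. For comparison, here is a minimal sketch of a trie-based remover, assuming case-sensitive matching and that a word counts only when it is not surrounded by letters or digits; the WordTrie class and its members are hypothetical names used only for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public sealed class WordTrie
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public bool IsWord;
    }

    private readonly Node _root = new Node();

    public WordTrie(IEnumerable<string> words)
    {
        foreach (string word in words)
        {
            Node node = _root;
            foreach (char c in word)
            {
                Node next;
                if (!node.Children.TryGetValue(c, out next))
                {
                    next = new Node();
                    node.Children.Add(c, next);
                }
                node = next;
            }
            node.IsWord = true;
        }
    }

    // Length of the longest stored word that starts at input[start]
    // and is followed by a non-word character or the end of the input; 0 if none.
    private int LongestMatch(string input, int start)
    {
        Node node = _root;
        int best = 0;
        for (int i = start; i < input.Length; i++)
        {
            if (!node.Children.TryGetValue(input[i], out node))
                break;
            bool atWordEnd = i + 1 == input.Length || !char.IsLetterOrDigit(input[i + 1]);
            if (node.IsWord && atWordEnd)
                best = i - start + 1;
        }
        return best;
    }

    // Removes every stored word that appears as a whole word in the input.
    public string RemoveAll(string input)
    {
        var sb = new StringBuilder(input.Length);
        int i = 0;
        while (i < input.Length)
        {
            bool atWordStart = i == 0 || !char.IsLetterOrDigit(input[i - 1]);
            int length = atWordStart ? LongestMatch(input, i) : 0;
            if (length > 0)
            {
                i += length; // skip the matched word
            }
            else
            {
                sb.Append(input[i]);
                i++;
            }
        }
        return sb.ToString();
    }
}
```

All stored words are checked in one walk per starting position, which is the property the trie suggestion is after. Note that the surrounding spaces are left in place, so removing "you", "and" and "I" from "you and I saw candy" leaves three leading spaces before "saw candy".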
Another option:
It might be the rare case where regex is faster than hand-written code.
Try using
public static string ReplaceAll(string input, string[] words)
{
string wordlist = string.Join("|", words);
Regex rx = new Regex(wordlist, RegexOptions.Compiled);
return rx.Replace(input, m => "");
}
Regex can do this better. You just need all the replacement words in a list, and then:
var escapedStrings = yourReplaces.Select(PadAndEscape);
string result = Regex.Replace(yourInput, string.Join("|", escapedStrings), " "); // assumption: each padded word is replaced by a single space
This requires a function that space-pads the strings before escaping them:
public string PadAndEscape(string s)
{
return Regex.Escape(" " + s + " ");
}