I have a text file containing many lines which look like this:
Flowers{Tulip|Sun Flower|Rose}
Gender{Female|Male}
Pets{Cat|Dog|Rabbit}
I know how to read lines from a file, but what's the best way to split and store the categories and their subitems in a dictionary afterwards? Let's say from a string array which contains all the above lines?
The idea of using a regex is right, but I prefer named captures for readability:
var regexp = new Regex(@"(?<category>\w+?)\{(?<entities>.*?)\}");
var d = new Dictionary<string, List<string>>();
// you would replace this list with the lines read from the file
var list = new string[] {"Flowers{Tulip|Sun Flower|Rose}"
, " Gender{Female|Male}"
, "Pets{Cat|Dog|Rabbit}"};
foreach (var entry in list)
{
var mc = regexp.Matches(entry);
foreach (Match m in mc)
{
d.Add(m.Groups["category"].Value
, m.Groups["entities"].Value.Split('|').ToList());
}
}
You get a dictionary with the category as the key and the values in a list of strings.
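One thing to watch for: `Dictionary.Add` throws if the same category shows up twice in the file. Below is a minimal sketch, assuming you would rather merge repeated categories and trim stray whitespace around the items. `CategoryParser` and `Parse` are names made up for this illustration, not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class CategoryParser
{
    // Same idea as the pattern above: a word, then the items between braces.
    private static readonly Regex Pattern =
        new Regex(@"(?<category>\w+)\{(?<entities>.*?)\}");

    // Hypothetical helper: merges repeated categories instead of throwing.
    public static Dictionary<string, List<string>> Parse(IEnumerable<string> lines)
    {
        var result = new Dictionary<string, List<string>>();
        foreach (var line in lines)
        {
            foreach (Match m in Pattern.Matches(line))
            {
                var key = m.Groups["category"].Value;
                var items = m.Groups["entities"].Value
                    .Split('|')
                    .Select(s => s.Trim())
                    .Where(s => s.Length > 0);

                // Create the list on first sight of a category, then append.
                if (!result.TryGetValue(key, out var list))
                {
                    list = new List<string>();
                    result[key] = list;
                }
                list.AddRange(items);
            }
        }
        return result;
    }

    public static void Main()
    {
        var parsed = Parse(new[]
        {
            "Flowers{Tulip|Sun Flower|Rose}",
            " Gender{Female|Male}",
            "Flowers{Daisy}"
        });

        foreach (var pair in parsed)
        {
            Console.WriteLine("{0}: {1}", pair.Key, string.Join(", ", pair.Value));
        }
    }
}
```

Whether merging is the right behavior depends on your data; if a duplicate category should be treated as an error, keep `Add` and let it throw.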
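You can also split each line on the delimiters and take the key and the values from the result:
string T = @"Flowers{Tulip|Sun Flower|Rose}
Gender{Female|Male}
Pets{Cat|Dog|Rabbit}";
// split on both '\r' and '\n' so Windows line endings leave no stray "\r" entries
foreach (var line in T.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)) // or read the file line by line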
{
var S = line.Split(new char[] { '{', '|','}' }, StringSplitOptions.RemoveEmptyEntries);
string Key = S[0];
MessageBox.Show(Key);//sth like this
for (int i = 1 ; i < S.Length; i++)
{
string value = S[i];
MessageBox.Show(value);//sth like this
}
}
You can use this:
string line = reader.ReadLine();
Regex r = new Regex(@"(\w+)\{(.*?)\}");
Now loop over the matches of this regex:
foreach (Match m in r.Matches(line)) {
// Groups[n] is a Group, so take .Value; the items are separated by '|'
yourDict.Add(m.Groups[1].Value, m.Groups[2].Value.Split('|'));
}
I have a list of strings of the form "word number", e.g. "burger 5$". I need to extract only the numbers from every string in the list and build a new list of ints.
There are several ways to do that, for example Regex and Linq.
For a short string you can use Linq, for example:
public static void Main()
{
var myStringValue = "burger 5$";
var numbersArray = myStringValue.ToArray().Where(x => char.IsDigit(x));
foreach (var number in numbersArray)
{
Console.WriteLine(number); // print the digit itself, not the whole sequence
}
}
Take a look at the article on Regex.Split and numbers; the answer is in there. The modified code might look like this:
var source = new List<string> {
"burger 5$",
"pizza 6$",
"roll 1$ and salami 2$"
};
var result = new List<int>();
foreach (var input in source)
{
var numbers = Regex.Split(input, @"\D+");
foreach (string number in numbers)
{
if (Int32.TryParse(number, out int value))
{
result.Add(value);
}
}
}
Hope it helps.
Petr
Using Linq and Regex:
List<string> list = new List<string>(){"burger 5$","ab12c","12sc34","sd3d5"};
Regex nonDigits = new Regex(@"[^\d]");
// strips every non-digit, so "12sc34" becomes "1234"; the results are still strings
List<string> list2 = list.Select(l => nonDigits.Replace(l, "")).ToList();
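Since the question asks for a list of ints, here is a small sketch that matches each run of digits with `Regex.Matches` and parses it, so separate numbers in one string stay separate. `NumberExtractor` and `Extract` are hypothetical names used only for this example, and it assumes every digit run fits in an `int` (otherwise `int.Parse` throws).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class NumberExtractor
{
    // Hypothetical helper: every run of digits becomes one int.
    public static List<int> Extract(IEnumerable<string> source)
    {
        return source
            .SelectMany(s => Regex.Matches(s, @"\d+").Cast<Match>())
            .Select(m => int.Parse(m.Value))
            .ToList();
    }

    public static void Main()
    {
        var numbers = Extract(new[] { "burger 5$", "ab12c", "12sc34", "sd3d5" });
        Console.WriteLine(string.Join(", ", numbers)); // prints 5, 12, 12, 34, 3, 5
    }
}
```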
You can also solve the problem by splitting each string, assuming every item has the form "word number$":
List<string> word_number = new List<string>();
List<int> number = new List<int>();
word_number.Add("burger 5$");
word_number.Add("hamburger 6$");
word_number.Add("burger 12$");
foreach (var item in word_number)
{
string[] parts = item.Split(' ');
string[] string_number = parts[1].Split('$');
number.Add(Convert.ToInt32(string_number[0])); // ToInt32 matches the List<int> element type
Console.WriteLine(string_number[0]);
}
I want to ignore the punctuation. I'm trying to write a program that counts the occurrences of every word in my text, without taking the punctuation marks into consideration.
My program is:
static void Main(string[] args)
{
string text = "This my world. World, world,THIS WORLD ! Is this - the world .";
IDictionary<string, int> wordsCount =
new SortedDictionary<string, int>();
text=text.ToLower();
text = text.replaceAll("[^0-9a-zA-Z\text]", "X");
string[] words = text.Split(' ',',','-','!','.');
foreach (string word in words)
{
int count = 1;
if (wordsCount.ContainsKey(word))
count = wordsCount[word] + 1;
wordsCount[word] = count;
}
var items = from pair in wordsCount
orderby pair.Value ascending
select pair;
foreach (var p in items)
{
Console.WriteLine("{0} -> {1}", p.Key, p.Value);
}
}
The output is:
is->1
my->1
the->1
this->3
world->5
(here is nothing) -> 8
How can I remove the punctuation here?
You should try specifying StringSplitOptions.RemoveEmptyEntries:
string[] words = text.Split(" ,-!.".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
Note that instead of manually creating a char[] with all the punctuation characters, you may create a string and call ToCharArray() to get the array of characters.
I find it easier to read and to modify later on.
string[] words = text.Split(new char[]{' ',',','-','!','.'}, StringSplitOptions.RemoveEmptyEntries);
It is simple: the first step is to remove the undesired punctuation with Replace, and then continue splitting as you already do.
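The answer above gives no code, so here is one way it could look. This sketch swaps in `Regex.Replace` for a chain of `string.Replace` calls and assumes that anything other than a-z and 0-9 counts as punctuation or whitespace. `WordCounter` and `Count` are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WordCounter
{
    public static SortedDictionary<string, int> Count(string text)
    {
        // Turn every run of non-alphanumeric characters into a single space.
        string cleaned = Regex.Replace(text.ToLowerInvariant(), "[^a-z0-9]+", " ");

        // RemoveEmptyEntries drops the empty strings at the start or end.
        string[] words = cleaned.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        var counts = new SortedDictionary<string, int>();
        foreach (string word in words)
        {
            counts.TryGetValue(word, out int current); // current is 0 for a new word
            counts[word] = current + 1;
        }
        return counts;
    }

    public static void Main()
    {
        var counts = Count("This my world. World, world,THIS WORLD ! Is this - the world .");
        foreach (var pair in counts)
        {
            Console.WriteLine("{0} -> {1}", pair.Key, pair.Value);
        }
    }
}
```

Because the cleanup works on a character class rather than a fixed list, punctuation you did not anticipate (for example `?` or `;`) is handled too. Non-ASCII letters would also be stripped, which may or may not be what you want.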
... you can go with the making people cry version ...
"This my world. World, world,THIS WORLD ! Is this - the world ."
.ToLower()
.Split(" ,-!.".ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
.GroupBy(i => i)
.Select(i=>new{Word=i.Key, Count = i.Count()})
.OrderBy(k => k.Count)
.ToList()
.ForEach(Console.WriteLine);
.. output
{ Word = my, Count = 1 }
{ Word = is, Count = 1 }
{ Word = the, Count = 1 }
{ Word = this, Count = 3 }
{ Word = world, Count = 5 }
Here is the code and it is working fine for a single input string
string[] stop_word = new string[]
{
"please",
"try",
"something",
"asking",
"-",
"(", ")",
"/",
".",
"was",
"the"
};
string str = "Please try something (by) yourself. -befor/e asking";
foreach (string word in stop_word)
{
str = str.ToLower().Replace(word, "").Trim();
}
and the output is: by yourself before
Now I want to do the same for an array:
string[] str = new string[]
{
"Please try something-by yourself. before (CAD) asking/",
"cover, was adopted. The accuracy (of) the- change map was"
};
The number of strings may also be greater than 2. How can I alter the code above to process the whole str array and display the results, or store them in a text file or database?
Please help. Thanks.
The code for a single string needs to be put inside a loop over the string array:
List<string> result = new List<string>();
for (int i = 0; i < str.Length; i++)
{
foreach (string word in stop_word)
{
str[i] = str[i].ToLower().Replace(word, "").Trim();
}
// collapse runs of whitespace once, after all stop words are removed
str[i] = Regex.Replace(str[i], @"\s+", " ");
result.Add(str[i]);
}
foreach(string r in result)
{
//this is to printout the result
Console.WriteLine(r);
}
You can try it here: https://dotnetfiddle.net/wg83gM
EDIT:
Use regex to replace multiple spaces with one single space
Here is an easy to understand way to do it:
List<string> list = new List<string>();
foreach (string text in str)//loops through your str array
{
string newText =text;
foreach (string word in stop_word) //loops through your word array
{
newText = newText.ToLower().Replace(word, "").Trim();
}
list.Add(newText); //store the results in a list
}
Does this work as you expect?
var results =
str
.Select(x => stop_word.Aggregate(x, (a, y) => a.ToLower().Replace(y, "").Trim()))
.ToArray();
I used this input:
string[] str = new string[]
{
"Please try something-by yourself. before (CAD) asking/",
"cover, was adopted. The accuracy (of) the- change map was"
};
string[] stop_word = new string[]
{
"please", "try", "something", "asking", "-", "(", ")", "/", ".", "was", "the"
};
I got this output:
by yourself before cad
cover, adopted accuracy of change map
You can use Select() for this.
var results = str.Select(x => {
foreach (string word in stop_word)
{
x = x.ToLower().Replace(word, "").Trim();
}
return x;
}).ToList(); // You can use ToArray() if you wish too.
...
foreach(string result in results)
{
Console.WriteLine(result);
}
Result:
by yourself before cad
cover, adopted accuracy of change map
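A caveat that applies to all of the answers above: `string.Replace` removes substrings, so the stop word "the" would also be cut out of "other". Here is a hedged sketch of a whole-word variant, assuming entries made only of letters and digits should match as whole words while punctuation entries are removed wherever they occur. `StopWordFilter` and `Clean` are invented names for this example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class StopWordFilter
{
    public static string[] Clean(string[] inputs, string[] stopWords)
    {
        // Words get \b anchors so "the" does not eat into "other";
        // punctuation such as "-" or "(" is escaped and matched anywhere.
        var parts = stopWords.Select(w =>
            w.All(c => char.IsLetterOrDigit(c))
                ? @"\b" + Regex.Escape(w) + @"\b"
                : Regex.Escape(w));
        var pattern = new Regex(string.Join("|", parts), RegexOptions.IgnoreCase);

        return inputs
            .Select(s => pattern.Replace(s.ToLower(), ""))
            // Collapse the gaps left behind and trim the ends.
            .Select(s => Regex.Replace(s, @"\s+", " ").Trim())
            .ToArray();
    }

    public static void Main()
    {
        string[] stopWords =
        {
            "please", "try", "something", "asking", "-", "(", ")", "/", ".", "was", "the"
        };
        string[] inputs =
        {
            "Please try something-by yourself. before (CAD) asking/",
            "cover, was adopted. The accuracy (of) the- change map was"
        };

        foreach (string line in Clean(inputs, stopWords))
        {
            Console.WriteLine(line);
        }
    }
}
```

For the two sample strings this gives the same results as the answers above; the difference only shows up when a stop word is embedded in a longer word.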
Related to this question: Using Linq to filter out certain Keys from a Dictionary and return a new dictionary
I have an auto-complete control that uses a dictionary. Every word in my RichTextBox (which serves as a code editor) should automatically be added to my autocomplete list. For example, if I type the word "asdasdasd" in the RichTextBox, "asdasdasd" is automatically added to my auto-complete.
I am using this code:
private IEnumerable<AutocompleteItem> BuildList()
{
//find all words of the text
var words = new Dictionary<string, string>();
var keysToBeFiltered = new HashSet<string> { "Do", "Not" };
var filter = words.Where(p => !keysToBeFiltered.Contains(p.Key))
.ToDictionary(p => p.Key, p => p.Value);
foreach (Match m in Regex.Matches(rtb_JS.Text, @"\b\w+\b"))
filter[m.Value] = m.Value;
//foreach (Match m in Regex.Matches(rtb_JS.Text, @"^(\w+)([=<>!:]+)(\w+)$"))
//filter[m.Value] = m.Value;
foreach (var word in filter.Keys)
{
yield return new AutocompleteItem(word);
}
}
With the code above, the words "Do" and "Not" are still included in the auto-complete. Also, when my form loads, a specific default script appears that must always be there, so I can't change it.
I see two possible solutions:
1. Don't allow the words used in the default script to be added to my autocomplete when the form loads (make a list of words that are prevented from being added).
2. Detect lines that are commented with "//" or "/*" and prevent their words from being added to my dictionary.
Hope you can help me. Please tell me if I need to revise my question and I'll update it ASAP.
Main question:
How do I prevent words in comments in the RichTextBox (lines that start with // or /*) from being added to the autocomplete?
The problem is in the following lines:
foreach (Match m in Regex.Matches(rtb_JS.Text, @"\b\w+\b"))
filter[m.Value] = m.Value;
With the "\b\w+\b" regex, you add every word in your RichTextBox control to your filter variable, because the keysToBeFiltered check ran earlier, on the still-empty words dictionary.
So you must change that loop to keep your unwanted keywords from being added. Please check the following:
private IEnumerable<AutocompleteItem> BuildList()
{
//find all words of the text
bool bolFindMatch = false;
var words = new Dictionary<string, string>();
var keysToBeFiltered = new HashSet<string> { "Do", "Not" };
var filter = words.Where(p => !keysToBeFiltered.Contains(p.Key))
.ToDictionary(p => p.Key, p => p.Value);
foreach (Match m in Regex.Matches(rtb_JS.Text, @"\b\w+\b"))
{
foreach (string hs in keysToBeFiltered)
{
if (Regex.Matches(m.Value, @"\b" + hs + @"\b").Count > 0)
{
bolFindMatch = true;
break;
}
}
if (!bolFindMatch)
{
filter[m.Value] = m.Value;
}
else
{
bolFindMatch = false;
}
}
//foreach (Match m in Regex.Matches(rtb_JS.Text, @"^(\w+)([=<>!:]+)(\w+)$"))
//filter[m.Value] = m.Value;
foreach (var word in filter.Keys)
{
yield return new AutocompleteItem(word);
}
}
Why don't you just check whether a line starts with "//" or "/*" before you do your processing?
string notAllowed1 = "//";
string notAllowed2 = "/*";
var contains = false;
foreach (string line in rtb_JS.Lines)
{
// TrimStart so indented comments are detected too
if (line.TrimStart().StartsWith(notAllowed2) || line.TrimStart().StartsWith(notAllowed1))
{
contains = true;
break;
}
}
//else do nothing
Update 2, with Linq:
var contains = rtb_JS.Lines
.Any(line => line.TrimStart().StartsWith(notAllowed2) ||
line.TrimStart().StartsWith(notAllowed1));
if(!contains)
{
//do your logic
}
I think this is what you want:
var filter = Regex.Matches(rtb_JS.Text, @"\b\w+\b")
.OfType<Match>()
.Select(m => m.Value)
.Where(w => !keysToBeFiltered.Contains(w))
.Distinct() // ToDictionary throws on duplicate keys, and words repeat in real text
.ToDictionary(w => w, w => w);
It's strange that each entry in your Dictionary has the same string as both Key and Value.
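For the main question (skipping commented text), the word collection can be done per line instead of on the whole text. The sketch below works on a plain string so it runs without WinForms; in the real control you would pass `rtb_JS.Text`. It is line-based and deliberately simple: it assumes comments start at the beginning of a line (after optional indentation), does not detect a trailing `// ...` after code, and drops any code that follows `*/` on the same line. `WordCollector` and `Collect` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WordCollector
{
    public static HashSet<string> Collect(string text, ISet<string> excluded)
    {
        var words = new HashSet<string>();
        bool inBlockComment = false;

        foreach (string rawLine in text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None))
        {
            string line = rawLine.Trim();

            if (inBlockComment)
            {
                // Stay in the comment until a line contains the closing marker.
                if (line.Contains("*/")) inBlockComment = false;
                continue;
            }
            if (line.StartsWith("//", StringComparison.Ordinal)) continue;
            if (line.StartsWith("/*", StringComparison.Ordinal))
            {
                // The comment continues on later lines unless it closes here.
                inBlockComment = line.IndexOf("*/", 2, StringComparison.Ordinal) < 0;
                continue;
            }

            foreach (Match m in Regex.Matches(line, @"\b\w+\b"))
            {
                if (!excluded.Contains(m.Value)) words.Add(m.Value);
            }
        }
        return words;
    }

    public static void Main()
    {
        string script = "var x = 1;\n// Do Not touch\n/* block\nhidden */\nDo something";
        var excluded = new HashSet<string> { "Do", "Not" };

        foreach (string word in Collect(script, excluded))
        {
            Console.WriteLine(word);
        }
    }
}
```

Returning a `HashSet<string>` also avoids the duplicate-key problem, since adding the same word twice is simply ignored.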
I am using a ListBox to store different strings that the user gives as input.
I want to split each ListBox item so that the first word of the item is one string and the rest is another string.
I am iterating over the ListBox items like this:
foreach (ListItem item in lstboxColumnList.Items)
{
column_name = temp + "\" "+item+"\"";
temp = column_name + "," + Environment.NewLine;
}
How can I get the split strings?
Assuming the first word ends with a space, you can use something like this:
string firstWord = sentence.Substring(0, sentence.IndexOf(' '));
// skip the space itself; the one-argument overload runs to the end of the string
string remainingSentence = sentence.Substring(sentence.IndexOf(' ') + 1);
I don't know the format of your ListBox items, but I assume each item has at least two words separated by a space.
So you can do the splitting with Substring and IndexOf:
string first = sentence.Substring(0, sentence.IndexOf(" "));
string second = sentence.Substring(sentence.IndexOf(" ") + 1);
public void Test()
{
List<string> source = new List<string> {
"key1 some data",
"key2 some more data",
"key3 yada..."};
Dictionary<string, string> resultDictionary = source.ToDictionary(n => n.Split(' ').First(), n => n.Substring(n.IndexOf(' ')));
List<string> resultStrings = source.Select(n => string.Format("\"{0}\",{1}", n.Split(' ').First(), n.Substring(n.IndexOf(' ')))).ToList();
}
resultDictionary is a dictionary with the key set to the first word of each string in the source list.
The second more closely matches the requirements in your question, in that it outputs a list of strings in the format you specified.
EDIT: Apologies, posted in VB first time round.
Check this out:
var parts = lstboxColumnList.Items.OfType<ListItem>().Select(i => new {
Part1 = i.Text.Split(' ').FirstOrDefault(),
Part2 = i.Text.Substring(i.Text.IndexOf(' '))
});
foreach (var part in parts)
{
var p1 = part.Part1;
var p2 = part.Part2;
// TODO: use p1, p2 in magic code!!
}
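The `Substring`/`IndexOf` approaches above throw when an item contains no space, because `IndexOf` returns -1. A small sketch of a more forgiving variant, using the `Split` overload that takes a maximum count; `FirstWordSplitter` is an invented name, and treating a one-word item as having an empty remainder is an assumption about what you want.

```csharp
using System;

public static class FirstWordSplitter
{
    // Splits on the first space only; the remainder is empty for one-word input.
    public static void Split(string sentence, out string first, out string rest)
    {
        string[] parts = sentence.Trim().Split(new[] { ' ' }, 2);
        first = parts[0];
        rest = parts.Length > 1 ? parts[1].TrimStart() : string.Empty;
    }

    public static void Main()
    {
        Split("key2 some more data", out string first, out string rest);
        Console.WriteLine(first); // prints key2
        Console.WriteLine(rest);  // prints some more data
    }
}
```

Inside the loop over `lstboxColumnList.Items` you would call it with `item.Text` for each item.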