Quickest algorithm for identifying pairs in a collection of strings - C#

I am looking for the quickest algorithm:
GOAL: output the total number of pair occurrences found on a line. The individual elements may be in any order on any given line.
INPUT:
a;b;c;d
a;e;f;g
a;b;f;h
OUTPUT
a;b = 2
a;c = 1
a;d = 1
a;e = 1
a;f = 2
a;g = 1
b;c = 1
b;d = 1
I am programming in C#. I've got a nested for loop adding to a common dictionary of type Dictionary<string, int>, where the string key looks like a;b. When an occurrence is found, it adds to the existing int tally, or adds a new entry with tally = 0.
Note this:
a;b = 1
b;a = 1
Should be reduced to this:
a;b = 1
I am open to using other languages; the output is a plain text file which I feed into the Gephi visualization tool.
Bonus: I'm very interested to know the name of this particular algorithm, if there is one. I'm pretty sure there is.
String[] data = File.ReadAllLines(@"C:\input.txt");
Dictionary<string, int> ress = new Dictionary<string, int>();
foreach (var line in data)
{
    string[] outStrings = line.Split(';');
    for (int i = 0; i < outStrings.Count(); i++)
    {
        for (int y = 0; y < outStrings.Count(); y++)
        {
            if (outStrings[i] != outStrings[y])
            {
                try
                {
                    if (ress.Any(x => x.Key == outStrings[i] + ";" + outStrings[y]))
                    {
                        ress[outStrings[i] + ";" + outStrings[y]] += 1;
                    }
                    else
                    {
                        ress.Add(outStrings[i] + ";" + outStrings[y], 0);
                    }
                }
                catch (Exception)
                {
                }
            }
        }
    }
}
foreach (var val in ress)
{
    Console.WriteLine(val.Key + "----" + val.Value);
}

I think your inner loop should start with i + 1 instead of starting back at 0 again, and the outer loop should only run until Length - 1, since the last item will be compared on the inner loop. Also, when you add a new item, you should add the value 1, not 0 (since the whole reason we're adding it is because we found one).
You can also just store the key into a string once instead of doing multiple concatenations during your comparison and assignment, and you can use the ContainsKey method to determine if a key exists already.
Also, you might want to consider avoiding empty catch blocks unless you're really certain that you don't care if or what went wrong. If I'm expecting an exception and know how to handle it, then I catch that exception, otherwise I'll just let it bubble up the stack.
Here's one way you could modify your code to find all pairs and their counts:
Update
I added a check to ensure that the "pair" key is always sorted, so that "b;a" becomes "a;b". This wasn't an issue in your sample data, but I extended the data to include lines like b;a;a;b;a;b;a;. I also added StringSplitOptions.RemoveEmptyEntries to the Split method to handle cases where a line begins or ends with a ; (otherwise the resulting empty entry produced a pair like ";a").
private static void Main()
{
    var data = File.ReadAllLines(@"f:\public\temp\temp.txt");
    var pairCount = new Dictionary<string, int>();
    foreach (var line in data)
    {
        var lineItems = line.Split(new[] {';'}, StringSplitOptions.RemoveEmptyEntries);
        for (var outer = 0; outer < lineItems.Length - 1; outer++)
        {
            for (var inner = outer + 1; inner < lineItems.Length; inner++)
            {
                var outerComparedToInner = string.Compare(lineItems[outer],
                    lineItems[inner], StringComparison.Ordinal);
                // If both items are the same string, ignore them and keep looping
                if (outerComparedToInner == 0) continue;
                // Create the pair such that the lower of the two
                // values is first, so that "b;a" becomes "a;b"
                var thisPair = outerComparedToInner < 0
                    ? $"{lineItems[outer]};{lineItems[inner]}"
                    : $"{lineItems[inner]};{lineItems[outer]}";
                if (pairCount.ContainsKey(thisPair))
                {
                    pairCount[thisPair]++;
                }
                else
                {
                    pairCount.Add(thisPair, 1);
                }
            }
        }
    }
    Console.WriteLine("Pair\tCount\n----\t-----");
    foreach (var val in pairCount.OrderBy(i => i.Key))
    {
        Console.WriteLine($"{val.Key}\t{val.Value}");
    }
    Console.Write("\nDone!\nPress any key to exit...");
    Console.ReadKey();
}
Output
Given a file containing your sample data, the output is: (output screenshot not preserved)
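For comparison, here is a compact sketch of the same pairwise counting with canonical a;b keys. It makes one assumption that differs from the answer: items repeated on a line are de-duplicated first, so a line such as b;a;a;b counts a;b once rather than several times. PairCounter and CountPairs are illustrative names; the expected counts in the test come from the question's sample input.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PairCounter
{
    // Counts, for every unordered pair of distinct items, how many lines it appears on.
    // Assumption: duplicates within a line are removed, so each pair counts once per line.
    public static SortedDictionary<string, int> CountPairs(IEnumerable<string> lines)
    {
        var counts = new SortedDictionary<string, int>(StringComparer.Ordinal);
        foreach (var line in lines)
        {
            // Distinct + ordinal sort guarantees items[i] < items[j] for i < j,
            // so every key is built in canonical order ("a;b", never "b;a").
            var items = line.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries)
                            .Distinct()
                            .OrderBy(s => s, StringComparer.Ordinal)
                            .ToArray();
            for (var i = 0; i < items.Length - 1; i++)
            {
                for (var j = i + 1; j < items.Length; j++)
                {
                    var key = items[i] + ";" + items[j];
                    counts.TryGetValue(key, out var current); // current is 0 for a new key
                    counts[key] = current + 1;
                }
            }
        }
        return counts;
    }
}
```

A line with k distinct items contains k(k-1)/2 pairs, so the nested loop itself is unavoidable for exact counts; the speed-up over the question's code comes from the dictionary lookup replacing the linear ress.Any(...) scan.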

@mrmcgreg, after finally changing the implementation to the ECLAT algorithm, everything runs in seconds instead of hours.
Basically, for each unique tag, keep track of the LINE NUMBERS where that tag is found; then, for each pair of tags, intersect their two lists of line numbers, and the size of the intersection is the pair's count.
// Assumes data holds the input lines and uniquetags the distinct tags (both defined elsewhere).
Dictionary<string, List<int>> uniqueTagList = new Dictionary<string, List<int>>();
foreach (var uniqueTag in uniquetags)
{
    List<int> lineNumbers = new List<int>();
    foreach (var item in data.Select((value, i) => new { i, value }))
    {
        var value = item.value;
        var index = item.i;
        // Split this line's text into tags
        var tags = value.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var tag in tags)
        {
            if (uniqueTag == tag)
            {
                lineNumbers.Add(index);
            }
        }
    }
    // Keep only tags that meet the support threshold (found on more than 5 lines).
    if (lineNumbers.Count > 5)
    {
        uniqueTagList.Add(uniqueTag, lineNumbers);
    }
}
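The snippet above builds only the per-tag line lists; the answer describes, but does not show, the pairing step. Here is a minimal sketch of ECLAT-style (vertical, line-number-set) counting restricted to pairs, assuming the input is a list of ;-separated lines. It builds every tag's line-number set in a single pass instead of re-reading the data once per tag, and minSupport (inclusive) stands in for the answer's hard-coded > 5 threshold. TidListPairs and CountPairs are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TidListPairs
{
    // Maps each tag to the set of line numbers it occurs on, then counts a pair
    // by intersecting the two sets (the "vertical" layout used by ECLAT).
    public static Dictionary<string, int> CountPairs(IList<string> lines, int minSupport)
    {
        // One pass over the data builds every tag's line-number set.
        var tidSets = new Dictionary<string, HashSet<int>>();
        for (var lineNo = 0; lineNo < lines.Count; lineNo++)
        {
            foreach (var tag in lines[lineNo].Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries))
            {
                if (!tidSets.TryGetValue(tag, out var set))
                {
                    set = new HashSet<int>();
                    tidSets[tag] = set;
                }
                set.Add(lineNo);
            }
        }

        // Prune tags below the support threshold before pairing them up.
        var frequent = tidSets.Where(kv => kv.Value.Count >= minSupport)
                              .OrderBy(kv => kv.Key, StringComparer.Ordinal)
                              .ToList();

        var result = new Dictionary<string, int>();
        for (var i = 0; i < frequent.Count - 1; i++)
        {
            for (var j = i + 1; j < frequent.Count; j++)
            {
                // Support of the pair = number of lines both tags share.
                var shared = frequent[i].Value.Count(id => frequent[j].Value.Contains(id));
                // shared > 0 skips pairs that never co-occur, even when minSupport is 0.
                if (shared > 0 && shared >= minSupport)
                    result[frequent[i].Key + ";" + frequent[j].Key] = shared;
            }
        }
        return result;
    }
}
```

Pruning infrequent single tags first is what keeps the pair loop small on real data, since a pair can never be more frequent than either of its tags.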

Related

Teaching myself C#. Don't know where/what I need to fix (Object reference not set to instance of an object. line 25) [duplicate]

This question already has answers here:
What is a NullReferenceException, and how do I fix it?
(27 answers)
What does "Object reference not set to an instance of an object" mean? [duplicate]
(8 answers)
Closed 2 years ago.
I keep getting the error in the title. I am not very experienced in coding and am not yet good at reading and understanding code.
I also know this will be a simple fix, but I still don't know what I need to fix or where.
using System;
using System.Linq;
namespace Day_6
{
    class Program
    {
        static void Main(string[] args)
        {
            int numStrings = Convert.ToInt32(Console.ReadLine());
            var str = "";
            string[] words = new string[1000];
            var even = new string[500];
            var odd = new string[500];
            for (int i = 0; i < numStrings; i++)
            {
                str = Console.ReadLine();
                words.Append(str);
            }
            foreach (var word in words)
            {
                foreach (var letter in word)
                {
                    if (word.IndexOf(letter)%2 != 0)
                    {
                        odd.Append(letter.ToString());
                    }
                    else
                    {
                        even.Append(letter.ToString());
                    }
                }
                Console.WriteLine(odd + " " + even);
            }
        }
    }
}
Any help, even if it is just material to read so I can understand why/what/where I am getting this error would be great.
After all I am trying to learn!
Many thanks
If you have to use arrays as you are doing, change the loop that fills the array to the following:
for (int i = 0; i < numStrings; i++)
{
    str = Console.ReadLine();
    words[i] = str; // Assign to elements of Array (not Append).
}
Then iterate only over the words that are not null (the array starts out full of nulls). You will also need a separate index for each of the even and odd arrays. You cannot use IndexOf to check whether a character is at an even or odd position either: IndexOf returns the position of the first occurrence, so any repeated letter ends up in the wrong group. Use an iteration index for that as well.
int evenIndex = 0;
int oddIndex = 0;
foreach (var word in words.Where(x => x != null))
{
    int iterationIndex = 0; // restart at 0 so positions are counted from the start of each word
    foreach (var letter in word)
    {
        if (iterationIndex++ % 2 != 0)
        {
            odd[oddIndex++] = letter.ToString();
        }
        else
        {
            even[evenIndex++] = letter.ToString();
        }
    }
}
// Print this "outside" of your for loops
Console.WriteLine(string.Concat(odd.Where(x => x != null)) + " " + string.Concat(even.Where(x => x != null)));
You will also need to change your Console.WriteLine statement at the end to print the elements of the arrays; concatenating an array directly (odd + " " + even) only prints its type name, System.String[].
Console.WriteLine(string.Concat(odd.Where(x => x != null)) + " --- " + string.Concat(even.Where(x => x != null)));
Test Inputs and output
2
abababababababababababababab
abababababababababababababab
bbbbbbbbbbbbbbbbbbbbbbbbbbbb aaaaaaaaaaaaaaaaaaaaaaaaaaaa
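Alternate suggestion that doesn't need separate counters:
foreach (string word in words.Where(x => x != null))
{
    for (int i = 0; i < word.Length; i++)
    {
        // Position 0 counts as even. Each word writes from index 0, overwriting what an
        // earlier word left there, so this version only suits one word at a time.
        if (i % 2 == 0)
            even[i] = word[i].ToString();
        else
            odd[i] = word[i].ToString();
    }
}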
The null-reference error starts at line 17, where you have words.Append(str);. Replace it with words[i] = str;.
Read up on the behaviour of the Enumerable.Append() extension method: it does not modify the array words in place; it returns a new IEnumerable<string>, which must then be converted back to an array. Because words is never filled, it stays full of nulls, and foreach (var letter in word) throws when it reaches a null word.
So the statement words.Append(str); actually does nothing useful; if you wish to use the Append() method, you must save its return value and convert the result to an array, like so:
words = words.Append(str).ToArray();
Also, there's a second source of the null-reference error in the 3rd statement of your Main() method (line 9), where you have
string[] words = new string[1000];
If you replace it with:
string[] words = new string[numStrings];
You will remove the possibility of your foreach loop looping over null elements.
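If you don't need fixed-size arrays, a List<string> grows as you add to it and never holds unused null slots, and a loop index avoids the IndexOf problem (for "abba", IndexOf reports positions 0, 1, 1, 0). A minimal sketch, assuming the goal is to split each word into its even- and odd-position characters as the question's code attempts; EvenOddSplitter, ReadWords and SplitEvenOdd are illustrative names:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class EvenOddSplitter
{
    // Reads a count followed by that many lines; the list holds exactly what was read.
    public static List<string> ReadWords(TextReader input)
    {
        var count = int.Parse(input.ReadLine());
        var words = new List<string>(count);
        for (var i = 0; i < count; i++)
        {
            words.Add(input.ReadLine() ?? string.Empty); // guard against input that ends early
        }
        return words;
    }

    // Indexes 0, 2, 4... go to Even and 1, 3, 5... go to Odd.
    // Using the loop index (not IndexOf) keeps repeated letters in the right group.
    public static (string Even, string Odd) SplitEvenOdd(string word)
    {
        var even = new StringBuilder();
        var odd = new StringBuilder();
        for (var i = 0; i < word.Length; i++)
        {
            (i % 2 == 0 ? even : odd).Append(word[i]);
        }
        return (even.ToString(), odd.ToString());
    }
}
```

In Main you could then loop with foreach (var word in EvenOddSplitter.ReadWords(Console.In)) and print the two parts returned by SplitEvenOdd(word) for each word.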

C# Variable not getting all values outside for loop

I have two values in the dictionary, but when I try to get them outside the loop I only get one value, because the locationDesc variable is overwritten on each iteration. Is there a better way to create unique variables to handle this issue?
There are two keys, location-1 and location-2. I am trying to figure out how to get both values outside the loop. Am I doing it wrong?
string locationDesc = "";
string locationAddress = "";
int count = dictionary.Count(D => D.Key.StartsWith("location-"));
for (int i = 1; i <= count; i++)
{
if (dictionary.ContainsKey("location-"+i))
{
string locationData = dictionary["location-"+i];
string[] locationDataRow = locationData.Split(':');
locationDesc = locationDataRow[0];
locationAddress = locationDataRow[1];
}
}
// Only getting location-2 value outside this loop since locationDesc is not unique.
Debug.WriteLine("Location Desc from dictionary is : " + locationDesc);
Debug.WriteLine("Location Add from dictionary is : " + locationAddress);
What I would like is to get both values, e.g. locationDesc1 and locationDesc2, instead of a single locationDesc.
In other words, I want locationDesc and locationAddress to be unique per location, so I can access all the values outside the for loop.
More explanation, as I was not very clear:
I have a dynamic table that is created in the front end. Every time a location is created, I create a cookie, e.g. location-1, location-2 ... location-n, with the location description and location value as the cookie's value. I am trying to access these values in the back end by building a dictionary, so I can assign all the values to unique variables, which will make it easier to pass them to an API call. I think I am overcomplicating a simple issue and might be doing it wrong.
My api call will be something like this:
<field="" path="" value=locationDesc1>
<field="" path="" value=locationDesc2>
The problem with your loop is that you are relying on the keys being numbered exactly 1 to count, matching the index within your loop. Your first line of code pretty much has it, though:
int count = dictionary.Count(D => D.Key.StartsWith("location-"));
What this tells me is that you are looking for all entries in your dictionary where the key starts with "location-". So why not do that directly:
var values = dictionary.Where(d => d.Key.StartsWith("location-"));
And to do the extraction/string splitting at the same time:
var values = dictionary
    .Where(d => d.Key.StartsWith("location-"))
    .Select(d => d.Value.Split(':'))
    .Select(s => new
    {
        LocationDesc = s[0],
        LocationAddress = s[1]
    });
This will give you an IEnumerable of LocationDesc/LocationAddress pairs which you can loop over:
foreach (var pair in values)
{
    Debug.WriteLine(pair.LocationDesc);
    Debug.WriteLine(pair.LocationAddress);
}
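Building on the LINQ approach above, here is a minimal sketch that collects every location-N entry, in numeric order, into a list of (Desc, Address) tuples; element 0 then plays the role of locationDesc1, element 1 of locationDesc2, and so on. Assumptions: the values have the form description:address, and only the first : separates them (so an address may contain further colons). LocationReader and ReadLocations are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LocationReader
{
    // Collects every "location-N" entry, ordered by N, into one list
    // instead of separate variables (locationDesc1, locationDesc2, ...).
    public static List<(string Desc, string Address)> ReadLocations(IDictionary<string, string> dictionary)
    {
        const string prefix = "location-";
        return dictionary
            .Where(kv => kv.Key.StartsWith(prefix, StringComparison.Ordinal))
            // Order numerically, so "location-10" comes after "location-2".
            .OrderBy(kv => int.TryParse(kv.Key.Substring(prefix.Length), out var n) ? n : int.MaxValue)
            // Split on the first ':' only, in case the address itself contains one.
            .Select(kv => kv.Value.Split(new[] { ':' }, 2))
            .Select(parts => (Desc: parts[0], Address: parts.Length > 1 ? parts[1] : string.Empty))
            .ToList();
    }
}
```

A loop such as foreach (var (desc, address) in LocationReader.ReadLocations(dictionary)) can then issue one API call per location, however many locations the front end created.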
Try this:
int count = dictionary.Count(D => D.Key.StartsWith("location-"));
Dictionary<string, string> values = new Dictionary<string, string>();
for (int i = 1; i <= count; i++)
{
    if (dictionary.ContainsKey("location-"+i))
    {
        string locationData = dictionary["location-"+i];
        string[] locationDataRow = locationData.Split(':');
        values.Add(locationDataRow[0], locationDataRow[1]);
    }
}
foreach (var item in values)
{
    Debug.WriteLine(item.Key + " : " + item.Value);
}
As you are dealing with multiple values, you should use a container that can store all of them.
If you are dealing with exactly two values, use the code below:
int count = dictionary.Count(D => D.Key.StartsWith("location-"));
string[] locationDesc = new string[2];
string[] locationAddress = new string[2];
for (int i = 1; i <= count; i++)
{
    if (dictionary.ContainsKey("location-"+i))
    {
        string locationData = dictionary["location-"+i];
        string[] locationDataRow = locationData.Split(':');
        locationDesc[i-1] = locationDataRow[0];
        locationAddress[i-1] = locationDataRow[1];
    }
}
for (int i = 0; i < locationDesc.Length; i++)
{
    Debug.WriteLine("Location Desc from dictionary is : " + locationDesc[i]);
    Debug.WriteLine("Location Add from dictionary is : " + locationAddress[i]);
}
If the number of values is not fixed, use a List<string> (the generic replacement for the older ArrayList):
int count = dictionary.Count(D => D.Key.StartsWith("location-"));
List<string> locationDesc = new List<string>();
List<string> locationAddress = new List<string>();
for (int i = 1; i <= count; i++)
{
    if (dictionary.ContainsKey("location-"+i))
    {
        string locationData = dictionary["location-"+i];
        string[] locationDataRow = locationData.Split(':');
        locationDesc.Add(locationDataRow[0]);
        locationAddress.Add(locationDataRow[1]);
    }
}
for (int i = 0; i < locationDesc.Count; i++)
{
    Debug.WriteLine("Location Desc from dictionary is : " + locationDesc[i]);
    Debug.WriteLine("Location Add from dictionary is : " + locationAddress[i]);
}
Simplest option: if you only want to show the results using Debug.WriteLine, use the code below:
int count = dictionary.Count(D => D.Key.StartsWith("location-"));
for (int i = 1; i <= count; i++)
{
    if (dictionary.ContainsKey("location-"+i))
    {
        string locationData = dictionary["location-"+i];
        string[] locationDataRow = locationData.Split(':');
        Debug.WriteLine("Location Desc from dictionary is : " + locationDataRow[0]);
        Debug.WriteLine("Location Add from dictionary is : " + locationDataRow[1]);
    }
}
I can't test this in Visual Studio at the moment, so there may be some syntax errors.
It is hard to judge what you are even trying to do. I would not just dump objects you already have into other objects for the sake of it. If you are just trying to expose values in a loop for use with another function, you can use LINQ to iterate over the dictionary. If you want a specific value, just add a Where LINQ expression. LINQ has been part of the .NET Framework since version 3.5.
public static void ApiMock(string s)
{
    Console.WriteLine($"I worked on {s}!");
}
static void Main(string[] args)
{
    var d = new Dictionary<int, string> {
        { 1, "location-1" },
        { 2, "location-2" },
        { 3, "location-3" }
    };
    d.ToList().ForEach(x => ApiMock(x.Value));
    //I just want the second one
    d.Where(x => x.Value.Contains("-2")).ToList().ForEach(x => ApiMock(x.Value));
    //Do you want a concatenated string
    var holder = string.Empty;
    d.ToList().ForEach(x => holder += x.Value + ", ");
    holder = holder.Substring(0, holder.Length - 2);
    Console.WriteLine(holder);
}
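For the concatenated-string case, string.Join does the same job without building a trailing ", " and trimming it off. The Substring call above throws an ArgumentOutOfRangeException when the dictionary is empty; string.Join simply returns an empty string. A small sketch; LocationJoin and JoinValues are illustrative names, and ordering by key is an added assumption to make the output order deterministic:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LocationJoin
{
    // string.Join puts the separator only between items, so there is no trailing
    // ", " to remove, and an empty dictionary yields "".
    public static string JoinValues(IDictionary<int, string> d) =>
        string.Join(", ", d.OrderBy(x => x.Key).Select(x => x.Value));
}
```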

How to use nested for loops and recursion to create a list of combinations of strings

I have a SortedList of key/value pairs that so far stores 3 entries like this:
Key: "Shapes" and Value: ["Cube", "Sphere"]
Key: "Colors" and Value: ["Red", "Green"]
Key: "Sizes" and Value: ["Big", "Small"]
My goal is to generate all the combinations of strings and store them in another list, like this:
"Shape:Cube/Colors:Red/Size:Big"
"Shape:Cube/Colors:Red/Size:Small"
"Shape:Cube/Colors:Green/Size:Big"
"Shape:Cube/Colors:Green/Size:Small"
"Shape:Sphere/Colors:Red/Size:Big"
"Shape:Sphere/Colors:Red/Size:Small"
"Shape:Sphere/Colors:Green/Size:Big"
"Shape:Sphere/Colors:Green/Size:Small"
The caveat here is that there can be any number N of entries in the SortedList, so I can't write the for-loops in my source code beforehand. I know I should use recursion to handle the dynamic value of N.
So far I've only come up with a hard-coded solution for N=2 entries, and I'm having trouble translating it into a recursion that can handle any value of N:
for (int ns = 0; ns < listFeaturesSuperblock.Values[0].Count; ns++) {
    for (int nc = 0; nc < listFeaturesSuperblock.Values[1].Count; nc++) {
        //prefab to load
        string str = "PreFabs/Objects/" + listFeaturesSuperblock.Keys[0] + ":" + listFeaturesSuperblock.Values[0][ns] + "/" + listFeaturesSuperblock.Keys[1] + ":" + listFeaturesSuperblock.Values[1][nc];
    }
}
Can somebody kindly point me towards the right direction? How should I approach this and what do I need to study to get better at coding recursion?
Thank you.
In your current method:
List<string> result = new List<string>();
ProcessItems(listFeaturesSuperblock, result);
And this is the recursive method:
void ProcessItems(SortedList<string, List<string>> data, List<string> result, int level = 0, string prefix = "PreFabs/Objects/")
{
    for (int i = 0; i < data.Values[level].Count; i++)
    {
        string item = prefix + data.Keys[level] + ":" + data.Values[level][i] + "/";
        if (level == data.Values.Count - 1)
            result.Add(item);
        else
            ProcessItems(data, result, level + 1, item);
    }
}
The 'result' variable will then contain all combinations (one entry per element of the Cartesian product of the value lists), each ending with a trailing "/".
Using recursion is quite simple; here's how.
Let's say we have a Dictionary like the one in your example:
public static Dictionary<string, List<string>> props = new Dictionary<string, List<string>>(){
    { "Shapes", new List<string>{"Cube", "Sphere"} },
    { "Colors", new List<string>{"Red", "Green"} },
    { "Sizes", new List<string>{"Big", "Small"} }
};
Now we take all values of the first key and go through them, appending each one to the source string. For the first value we get
/Shapes:Cube
Then we do the same for the next key, Colors, resulting in
/Shapes:Cube/Colors:Red
We continue while there are unprocessed keys. When there are no more keys, we have the first result string:
/Shapes:Cube/Colors:Red/Sizes:Big
Now we go back and use the next value, which results in
/Shapes:Cube/Colors:Red/Sizes:Small
The code for this looks like the following:
public static List<string> GetObjectPropertiesPermutations(string src, string[] keys, int index) {
    if (index >= keys.Length) {
        return new List<string>() { src };
    }
    var list = new List<string>();
    var key = keys[index];
    foreach (var val in props[key]) {
        var other = GetObjectPropertiesPermutations(src + "/" + key + ":" + val, keys, index + 1);
        list.AddRange(other);
    }
    return list;
}
public static void Main(string[] args)
{
    var perms = GetObjectPropertiesPermutations("", props.Keys.ToArray(), 0);
    foreach (var s in perms) {
        Console.WriteLine(s);
    }
}
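Both answers walk the categories in their collection's enumeration order. A SortedList orders its keys alphabetically, so the question's list would be walked as Colors, Shapes, Sizes, and a Dictionary does not guarantee any enumeration order. If the order matters, or if you'd rather avoid recursion, the Cartesian product can also be built iteratively. A minimal sketch, assuming the categories are passed in the desired order as (Key, Values) tuples; Combinations and Build are illustrative names:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class Combinations
{
    // Iterative Cartesian product: start from one empty path and, for each
    // category in the given order, extend every existing path with every value.
    public static List<string> Build(params (string Key, string[] Values)[] categories)
    {
        var paths = new List<string> { string.Empty };
        foreach (var (key, values) in categories)
        {
            paths = paths
                .SelectMany(path => values.Select(value =>
                    (path.Length == 0 ? "" : path + "/") + key + ":" + value))
                .ToList();
        }
        return paths;
    }
}
```

To feed it from the question's SortedList you could pass something like listFeaturesSuperblock.Select(kv => (kv.Key, kv.Value.ToArray())).ToArray(), keeping the alphabetical key order in mind. The number of results is the product of the list sizes (2 × 2 × 2 = 8 here), so it grows quickly as N increases.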

Remove All Indexes in String

I have a dictionary of ints (Dictionary<int, int>) that holds the indexes of all parentheses in a string (the key is the opening parenthesis index and the value is the matching closing parenthesis index),
e.g. in the text
stringX.stringY(())() -> stringX.stringY$($()^)^$()^
$ = openParenthesisStartIndex
^ = closeParenthesisEndIndex
Dictionary items:
        key (openParenthesisStartIndex)    value (closeParenthesisEndIndex)
item1   15                                 19
item2   16                                 18
item3   19                                 21
My problem is that when I loop over my dictionary and remove each index from the string, on the next iteration the indexes are no longer valid, because the earlier removal has already shifted the string.
string myText = "stringX.stringY(())()";
Dictionary<int, int> myIndexs = new Dictionary<int, int>();
foreach (var item in myIndexs)
{
    // Value - 1 because removing the opening parenthesis shifted later characters left by one
    myText = myText.Remove(item.Key, 1).Remove(item.Value - 1, 1);
}
Question: how can I remove all of these indexes from a string (from startIndex[key] to endIndex[value])?
To prevent the indexes from changing, one trick is to remove the occurrences starting from the end:
string myText = "stringX.stringY(())()";
Dictionary<int, int> myIndexs = new Dictionary<int, int>();
var allIndexes = myIndexs.Keys.Concat(myIndexs.Values);
foreach (var index in allIndexes.OrderByDescending(i => i))
{
myText = myText.Remove(index, 1);
}
Note that you probably don't need a dictionary at all. Consider replacing it by a list.
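Another way to avoid shifting indexes is to never modify the string in place: build a new one in a single pass and skip the positions you want removed, so every index keeps referring to the original string. A minimal sketch, assuming you want to remove the individual characters at the key and value positions (as the descending-order answer does); IndexRemover and RemoveAt are illustrative names. Indexes past the end of the string are ignored.

```csharp
using System.Collections.Generic;
using System.Text;

public static class IndexRemover
{
    // Removes the characters at the given positions in one pass.
    // All indexes refer to the ORIGINAL string, so nothing shifts while we work.
    public static string RemoveAt(string text, IEnumerable<int> indexes)
    {
        var toRemove = new HashSet<int>(indexes);
        var sb = new StringBuilder(text.Length);
        for (var i = 0; i < text.Length; i++)
        {
            if (!toRemove.Contains(i))
                sb.Append(text[i]);
        }
        return sb.ToString();
    }
}
```

With a dictionary like the question's, you would pass myIndexs.Keys.Concat(myIndexs.Values) as the indexes; the order in which they are listed no longer matters.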
StringBuilder is better suited to your case, since you are continuously changing the data (see the StringBuilder documentation on MSDN).
Ordering the keys by descending order will work as well for removing all indexes.
Another workaround could be to place an intermediary character at required index and replace all instances of that character in the end.
StringBuilder ab = new StringBuilder("ab(cd)");
ab.Remove(2, 1);
ab.Insert(2, "`");
ab.Remove(5, 1);
ab.Insert(5, "`");
ab.Replace("`", "");
System.Console.Write(ab);
Strings are immutable: whenever you make a change, a new string is created, so what you want is to build a new string without the removed parts. The code below removes every character that falls inside any (start, end) range, inclusive; overlapping ranges need no special handling because each character is simply checked against all the ranges. With many ranges, it would be cheaper to first clean up the indexes into a sorted list of non-overlapping ranges.
public static string removeAtIndexes(string source)
{
    var indexes = new Tuple<int, int>[]
    {
        new Tuple<int, int>(15, 19),
        new Tuple<int, int>(16, 18),
        new Tuple<int, int>(19, 21)
    };
    var sb = new StringBuilder();
    for (var i = 0; i < source.Length; i++)
    {
        // Skip position i if any (start, end) range covers it
        var removed = false;
        foreach (var index in indexes)
        {
            if (index.Item1 <= i && i <= index.Item2)
            {
                removed = true;
                break;
            }
        }
        if (!removed)
        {
            sb.Append(source[i]);
        }
    }
    return sb.ToString();
}

How do I make the foreach instruction iterate in 2 places?

How do I make the foreach statement iterate over both the "files" variable and the "names" array?
var files = Directory.GetFiles(@".\GalleryImages");
string[] names = new string[8] { "Matt", "Joanne", "Robert","Andrei","Mihai","Radu","Ionica","Vasile"};
I've tried 2 options: the first one gives me lots of errors, and the second one displays each image 8 times.
foreach(var file in files,var i in names)
{
//Do stuff
}
and
foreach (var file in files)
{
    foreach (var i in names)
    {
        //Do stuff
    }
}
You can try using LINQ's Zip extension method:
int[] numbers = { 1, 2, 3, 4 };
string[] words = { "one", "two", "three" };
var numbersAndWords = numbers.Zip(words, (first, second) => first + " " + second);
foreach (var item in numbersAndWords)
    Console.WriteLine(item);
Would look something like this:
var files = Directory.GetFiles(@".\GalleryImages");
string[] names = new string[] { "Matt", "Joanne", "Robert", "Andrei", "Mihai", "Radu", "Ionica", "Vasile" };
var zipped = files.Zip(names, (f, n) => new { File = f, Name = n });
foreach (var fn in zipped)
    Console.WriteLine(fn.File + " " + fn.Name);
But I haven't tested this one.
It's not clear what you're asking, but you can't iterate over two sequences with a single foreach; you can, however, increment another variable in the foreach body:
int i = 0;
foreach (var file in files)
{
    var name = names[i++];
    // TODO: do something with name and file
}
This, of course, assumes that files and names are of the same length.
You can't. Use a for loop instead.
for (int i = 0; i < files.Length; i++)
{
    var file = files[i];
    var name = names[i];
}
If both arrays have the same length, this works.
You have two options here; the first works if you are iterating over something that has an indexer, like an array or List, in which case use a simple for loop and access things by index:
for (int i = 0; i < files.Length && i < names.Length; i++)
{
    var file = files[i];
    var name = names[i];
    // Do stuff with names.
}
If you have a collection that doesn't have an indexer, e.g. you just have an IEnumerable and you don't know what it is, you can use its enumerator directly. Behind the scenes, that's all foreach is doing; it just hides the slightly messier syntax. That would look like:
// The casts select the generic IEnumerator<string>; an array's own
// GetEnumerator() is non-generic, so Current would be typed as object.
using (var filesEnum = ((IEnumerable<string>)files).GetEnumerator())
using (var namesEnum = ((IEnumerable<string>)names).GetEnumerator())
{
    while (filesEnum.MoveNext() && namesEnum.MoveNext())
    {
        var file = filesEnum.Current;
        var name = namesEnum.Current;
        // Do stuff with files and names.
    }
}
Both of these assume that both collections have the same number of items. The for loop will only iterate as many times as the smaller one, and the smaller enumerator will return false from MoveNext when it runs out of items. If one collection is bigger than the other, the 'extra' items won't get processed, and you'll need to figure out what to do with them.
I assume the files array and the names array line up index by index.
When that is the case, AND you always want the items at the same index together, do this:
for (int key = 0; key < files.Length; ++key)
{
    // access names[key] and files[key] here
}
You can try something like this:
var pairs = files.Zip(names, (f, n) => new { File = f, Name = n });
foreach (var item in pairs)
{
    Console.Write(item.File);
    Console.Write(item.Name);
}
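On .NET Core 3.0 and later, there is also a Zip overload without a result selector that returns tuples, which you can deconstruct directly in the foreach. Like the other Zip examples, it stops at the end of the shorter sequence. A minimal sketch; PairUp and Describe are illustrative names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PairUp
{
    // Zip stops at the end of the shorter sequence, so extra files or names are dropped.
    // The tuple-returning Zip overload requires .NET Core 3.0 / .NET 5 or later.
    public static List<string> Describe(IEnumerable<string> files, IEnumerable<string> names)
    {
        var lines = new List<string>();
        foreach (var (file, name) in files.Zip(names))
        {
            lines.Add(name + ": " + file);
        }
        return lines;
    }
}
```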
