I have a List<> of a "Region" class with two variables, "startLocation" and "endLocation".
I'd like to combine those two into a new sorted two-dimensional array where each entry is just a location and an integer representing whether it's a start or an end.
For example, if the list has three region objects with
[Region 1] : startLocation = 5,
endLocation = 7
[Region 2] : startLocation = 3,
endLocation = 5
[Region 3] : startLocation = 8,
endLocation = 9
I'd like to get a sorted two dimensional array (or list or similar) looking like:
[3] [1]
[5] [1]
[5] [-1]
[7] [-1]
[8] [1]
[9] [-1]
(Preferably I'd like overlapping values to add their second values together, so the two separate 5s in the array would be combined into [5] [0]... but that's not too important.)
I'm currently using a regular for loop, going through each region one by one and adding its entries to a list. This implementation is quite slow because I'm working with large datasets, and I'm guessing there's a more elegant / faster way to accomplish this through LINQ.
Any suggestions would be much appreciated.
You'll need to define a helper method that splits a region into two parts, and it's much easier to represent this with a small struct than with a 2D array.
struct Data {
public int Value;
public bool IsStart;
}
// Note: as an extension method, Split must be declared inside a static class.
public static IEnumerable<Data> Split(this Region region) {
yield return new Data() { Value = region.StartLocation, IsStart=true};
yield return new Data() { Value = region.EndLocation, IsStart=false};
}
Then you can use the following LINQ query to break them up and sort them.
List<Region> list = GetTheList();
var query = list
.SelectMany(x => x.Split())
.OrderBy(x => x.Value);
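If you also want the overlap behaviour mentioned in the question (the two 5 entries collapsing into [5] [0]), a follow-up GroupBy can sum +1/-1 markers per location. This is just a sketch built on the Data struct above; the anonymous-type property names are my own:
var merged = list
    .SelectMany(x => x.Split())
    .GroupBy(d => d.Value)
    .OrderBy(g => g.Key)
    .Select(g => new
    {
        Location = g.Key,
        // +1 for every start, -1 for every end; a start and an end at the same location cancel out
        Marker = g.Sum(d => d.IsStart ? 1 : -1)
    })
    .ToList();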
This isn't a problem for which LINQ is suitable, other than as an intellectual exercise. A foreach loop will be just as fast as (and likely faster than) any cobbled-together LINQ implementation.
As a side note, I'm assuming that you're using foreach rather than for. If not, then you could significantly speed up your process by switching to the foreach loop.
foreach(Region r in regionList)
{
// add your entries using r
}
will be much faster than...
for(int i = 0; i < regionList.Count; i++)
{
// add your entries using the indexer
}
Related
I have a sequence of ListNode objects
list -> [1] -> [2] -> [3] -> [4] /
and I need to convert it into 2 separate lists.
list -> [4] -> [2] /
list2 -> [3] -> [1] /
I'm not even sure where I'd begin with this one. I've been playing around with my_list.AddLast() and my_list.Remove(), but I'm not sure how I'd split that one list into two and then move the numbers around as indicated.
Here's an answer based on my comment above. I take your input, convert it to an array, and then iterate over the array backwards (from the end to the beginning). I put the odd-indexed items into one list and the even-indexed ones into another. Then I return the two lists as a tuple:
private (List<T>, List<T>) SplitList<T>(IEnumerable<T> input)
{
var asArray = input.ToArray();
var evens = new List<T>();
var odds = new List<T>();
for (var i = asArray.Length - 1; i >= 0; --i)
{
if (i % 2 == 0) //if even
{
evens.Add(asArray[i]);
}
else
{
odds.Add(asArray[i]);
}
}
return (evens, odds);
}
Once that's done, it's easily callable. This will work with your 4-valued example, but it will also work with any number of things of any type. I'm using a range of integers to make my point clear, but it should work with just about anything.
var oneToFour = Enumerable.Range(1, 4);
var result = SplitList(oneToFour);
var oneToFifteen = Enumerable.Range(1, 15);
var other = SplitList(oneToFifteen);
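For the question's four-node example, deconstructing the returned tuple shows the split lines up with the requested [4] -> [2] and [3] -> [1] lists (a quick check, assuming the SplitList method above):
var (evens, odds) = SplitList(Enumerable.Range(1, 4));
// odds  contains 4, 2  (items at odd indexes, walking backwards)
// evens contains 3, 1  (items at even indexes, walking backwards)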
Some people may say, "Why not just index over the list backwards?" Yes, List<T> is implemented internally with an array and is indexable. But the method accepts any IEnumerable<T>, where indexing isn't guaranteed at all; that this particular input happens to be indexable is simply an implementation detail.
I am doing an image comparer for learning purposes.
I have done almost everything already and I am now improving it. To check for similarity, I have two jagged multidimensional arrays (byte[][,]), where I access each element of each array using a triple for loop and store the absolute difference, like this:
for (int dimension = 0; dimension < 8; dimension++)
{
Parallel.For(0, 16, mycolumn =>
{
Parallel.For(0, 16, myrow =>
{
Diffs[dimension][mycolumn, myrow] =
(byte)Math.Abs(Image1Bytes[dimension][mycolumn, myrow]
- Image2Bytes[dimension][mycolumn, myrow]);
});
});
}
Now I would like to check how similar each dimension is to the corresponding dimension in the other collection.
How could I compare the entire arrays inside each jagged array (something like array1[i][,] == array2[j][,])?
I think there are better ways to do these operations, but I have managed to do them pretty quickly.
Here is an older thread on comparing two images that would be simple for you to adapt to your needs.
Compare Bitmaps
Since Array supports the IStructuralEquatable interface, you can use structural comparison:
using System.Collections;
. . .
var areEqual = StructuralComparisons.StructuralEqualityComparer.Equals(array1[i], array2[j]);
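A related element-by-element check, if you want to compare whole byte[,] blocks yourself, is to flatten each rectangular block with Cast<byte>() and use SequenceEqual. This is only a sketch, and it assumes the Image1Bytes/Image2Bytes arrays from the question:
using System.Linq;
// Compares two equally-sized rectangular blocks element by element.
// Cast<byte>() enumerates a byte[,] in row-major order.
static bool BlocksEqual(byte[,] a, byte[,] b) =>
    a.GetLength(0) == b.GetLength(0) &&
    a.GetLength(1) == b.GetLength(1) &&
    a.Cast<byte>().SequenceEqual(b.Cast<byte>());
// e.g. count how many of the 8 dimensions are identical
int matching = Enumerable.Range(0, 8)
    .Count(d => BlocksEqual(Image1Bytes[d], Image2Bytes[d]));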
IMPORTANT NOTE
To the people who flagged this as a duplicate, please understand we do NOT want a LINQ-based solution. Our real-world example has several original lists in the tens-of-thousands range and LINQ-based solutions are not performant enough for our needs since they have to walk the lists several times to perform their function, expanding with each new source list.
That is why we are specifically looking for a non-LINQ algorithm, such as the one suggested in this answer below where they walk all lists simultaneously, and only once, via enumerators. That seems to be the best so far, but I am wondering if there are others.
Now back to the question...
For the sake of explaining our issue, consider this hypothetical problem:
I have multiple lists, but to keep this example simple, let's limit it to two, ListA and ListB, both of which are of type List<int>. Their data is as follows:
List A List B
1 2
2 3
4 4
5 6
6 8
8 9
9 10
...however the real lists can have tens of thousands of rows.
We next have a class called ListPairing that's simply defined as follows:
public class ListPairing
{
public int? ASide{ get; set; }
public int? BSide{ get; set; }
}
where each 'side' parameter really represents one of the lists. (i.e. if there were four lists, it would also have a CSide and a DSide.)
What we are trying to do is construct a List<ListPairing> with the data initialized as follows:
A Side B Side
1 -
2 2
- 3
4 4
5 -
6 6
8 8
9 9
- 10
Again, note there is no row with '7'
As you can see, the results look like a full outer join. However, please see the update below.
Now to get things started, we can simply do this...
var finalList = ListA.Select(valA => new ListPairing(){ ASide = valA} );
Which yields...
A Side B Side
1 -
2 -
4 -
5 -
6 -
8 -
9 -
and now we want to go back and fill in the values from List B. This requires first checking whether there is an existing ListPairing whose ASide matches the BSide value and, if so, setting its BSide.
If there is no existing ListPairing with a matching ASide, a new ListPairing is instantiated with only the BSide set (ASide is blank).
However, I get the feeling that's not the most efficient way to do this, considering all of the required 'FindFirst' calls it would take. (These lists can be tens of thousands of items long.)
Taking a union of those lists once up front, however, yields the following values...
1, 2, 3, 4, 5, 6, 8, 9, 10 (Note there is no #7)
My thinking was to somehow use that ordered union of the values and then 'walk' both lists simultaneously, building up ListPairings as needed. That eliminates the repeated FindFirst calls, but I'm wondering if that's the most efficient way to do this.
Thoughts?
Update
People have suggested this is a duplicate of getting a full outer join using LINQ because the results are the same...
I am not after a LINQ full outer join. I'm after a performant algorithm.
As such, I have updated the question.
The reason I bring this up is that the LINQ needed to perform that functionality is much too slow for our needs. In our model there are actually four lists, and each can have tens of thousands of rows. That's why I suggested the 'Union' approach on the IDs at the very end, to get the list of unique 'keys' to walk through, but I think the posted answer that does the same thing with enumerators is an even better approach, as you don't need the list of IDs up front. It makes a single pass through all items in the lists simultaneously, which would easily outperform the LINQ-based approach.
This didn't turn out as neat as I'd hoped, but if both input lists are sorted then you can just walk through them together, comparing the head elements of each one: if they're equal then you have a pair, otherwise emit the smaller one on its own and advance that list.
public static IEnumerable<ListPairing> PairUpLists(IEnumerable<int> sortedAList,
IEnumerable<int> sortedBList)
{
// Should wrap these two in using() per Servy's comment with braces around
// the rest of the method.
var aEnum = sortedAList.GetEnumerator();
var bEnum = sortedBList.GetEnumerator();
bool haveA = aEnum.MoveNext();
bool haveB = bEnum.MoveNext();
while (haveA && haveB)
{
// We still have values left on both lists.
int comparison = aEnum.Current.CompareTo(bEnum.Current);
if (comparison < 0)
{
// The heads of the two remaining sequences do not match and A's is
// lower. Generate a partial pair with the head of A and advance the
// enumerator.
yield return new ListPairing() {ASide = aEnum.Current};
haveA = aEnum.MoveNext();
}
else if (comparison == 0)
{
// The heads of the two sequences match. Generate a pair.
yield return new ListPairing() {
ASide = aEnum.Current,
BSide = bEnum.Current
};
// Advance both enumerators
haveA = aEnum.MoveNext();
haveB = bEnum.MoveNext();
}
else
{
// No match and B is the lowest. Generate a partial pair with B.
yield return new ListPairing() {BSide = bEnum.Current};
// and advance the enumerator
haveB = bEnum.MoveNext();
}
}
if (haveA)
{
// We still have elements on list A but list B is exhausted.
do
{
// Generate a partial pair for all remaining A elements.
yield return new ListPairing() { ASide = aEnum.Current };
} while (aEnum.MoveNext());
}
else if (haveB)
{
// List A is exhausted but we still have elements on list B.
do
{
// Generate a partial pair for all remaining B elements.
yield return new ListPairing() { BSide = bEnum.Current };
} while (bEnum.MoveNext());
}
}
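Called with the question's two lists (already sorted), this produces the full-outer-join-style pairing in a single pass. A quick usage sketch, assuming the ListPairing class from the question:
var listA = new List<int> { 1, 2, 4, 5, 6, 8, 9 };
var listB = new List<int> { 2, 3, 4, 6, 8, 9, 10 };
foreach (var pair in PairUpLists(listA, listB))
{
    // Prints "1 / -", "2 / 2", "- / 3", ... matching the table in the question.
    Console.WriteLine($"{pair.ASide?.ToString() ?? "-"} / {pair.BSide?.ToString() ?? "-"}");
}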
var list1 = new List<int?>(){1,2,4,5,6,8,9};
var list2 = new List<int?>(){2,3,4,6,8,9,10};
var left = from i in list1
join k in list2 on i equals k
into temp
from k in temp.DefaultIfEmpty()
select new {a = i, b = (i == k) ? k : (int?)null};
var right = from k in list2
join i in list1 on k equals i
into temp
from i in temp.DefaultIfEmpty()
select new {a = (i == k) ? i : (int?)null, b = k};
var result = left.Union(right);
If you need the ordering to be the same as in your example, then you will need to provide an index and order by that (then remove duplicates):
var result = left.Select((o,i) => new {o.a, o.b, i}).Union(right.Select((o, i) => new {o.a, o.b, i})).OrderBy( o => o.i);
result.Select( o => new {o.a, o.b}).Distinct();
Trying to find a solution to my ranking problem.
Basically I have two multi-dimensional double[,] arrays. Both containing rankings for certain scenarios, so [rank number, scenario number]. More than one scenario can have the same rank.
I want to generate a third multi-dimensional array, taking the intersections of the previous two multi-dimensional arrays to provide a joint ranking.
Does anyone have an idea how I can do this in C#?
Many thanks for any advice or help you can provide!
Edit:
Thank you for all the responses, sorry I should have included an example.
Here it is:
Array One:
[{0,4},{1,0},{1,2},{2,1},{3,5},{4,3}]
Array Two:
[{0,1},{0,4},{1,0},{1,2},{3,5},{4,3}]
Required Result:
[{0,4},{1,0},{1,2},{1,1},{2,5},{3,3}]
Here's some sample code that makes a bunch of assumptions but might be something like what you are looking for. I've added a few comments as well:
static double[,] Intersect(double[,] a1, double[,] a2)
{
// Assumptions:
// a1 and a2 are two-dimensional arrays of the same size
// An element in the array matches if and only if its value is found in the same location in both arrays
// result will contain not-a-number (NaN) for non-matches
double[,] result = new double[a1.GetLength(0), a1.GetLength(1)];
for (int i = 0; i < a1.GetLength(0); i++)
{
for (int j = 0; j < a1.GetLength(1); j++)
{
if (a1[i, j] == a2[i, j])
{
result[i, j] = a1[i, j];
}
else
{
result[i, j] = double.NaN;
}
}
}
return result;
}
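A quick usage sketch with two small arrays (hypothetical values, just to show the NaN behaviour for non-matching positions):
double[,] a1 = { { 0, 4 }, { 1, 0 }, { 1, 2 } };
double[,] a2 = { { 0, 1 }, { 1, 0 }, { 1, 2 } };
double[,] joint = Intersect(a1, a2);
// joint[0, 0] == 0, joint[0, 1] is NaN (4 vs. 1 does not match),
// and every other position keeps its common value.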
For the most part, finding the intersection of multidimensional arrays will involve iterating over the elements in each of the dimensions of the arrays. If the indices of the array are not part of the match criteria (i.e. if my second assumption above is dropped), you would have to walk each dimension of each array, which increases the run-time of the algorithm (in this case, from O(n^2) to O(n^4)).
If you care enough about run-time, I believe array matching is one of the typical examples of dynamic programming (DP) optimization, which you can read up on at your leisure.
I'm not sure how you wanted your results... you could probably return a flat collection of results that can be indexed by a pair, which would potentially save a lot of space if the expected result set is typically small. I went with a third fixed-size array because it was the easiest thing to do.
Lastly, I'll mention that I don't see a keen C# way of doing this using IEnumerable, LINQ, or something like that. Someone more C# knowledgeable than I can chime in anytime now....
Given the additional information, I'd argue that you aren't actually working with multidimensional arrays, but instead are working with a collection of pairs. The pair is a pair of doubles. I think the following should work nicely:
public class Pair : IEquatable<Pair>
{
public double Rank;
public double Scenario;
public bool Equals(Pair p)
{
return Rank == p.Rank && Scenario == p.Scenario;
}
public override int GetHashCode()
{
int hashRank= Rank.GetHashCode();
int hashScenario = Scenario.GetHashCode();
return hashRank ^ hashScenario;
}
}
You can then use the Intersect operator on IEnumerable:
List<Pair> one = new List<Pair>();
List<Pair> two = new List<Pair>();
// ... populate the lists
List<Pair> result = one.Intersect(two).ToList();
Check out the following msdn article on Enumerable.Intersect() for more information:
http://msdn.microsoft.com/en-us/library/bb910215%28v=vs.90%29.aspx
Please, now that I've re-written the question, and before it suffers from further fast-gun answers or premature closure by eager editors, let me point out that this is not a duplicate of this question. I know how to remove duplicates from an array.
This question is about removing sequences from an array, not duplicates in the strict sense.
Consider this sequence of elements in an array;
[0] a
[1] a
[2] b
[3] c
[4] c
[5] a
[6] c
[7] d
[8] c
[9] d
In this example I want to obtain the following...
[0] a
[1] b
[2] c
[3] a
[4] c
[5] d
Notice that duplicate elements are retained but that sequences of the same element have been reduced to a single instance of that element.
Further, notice that when two lines repeat they should be reduced to one set (of two lines).
[0] c
[1] d
[2] c
[3] d
...reduces to...
[0] c
[1] d
I'm coding in C# but algorithms in any language appreciated.
EDIT: made some changes and new suggestions
What about a sliding window...
REMOVE LENGTH 2: (no other length has other matches)
// the lower-case letters are the matches
ABCBAbabaBBCbcbcbVbvBCbcbcAB
__ABCBABABABBCBCBCBVBVBCBCBCAB
REMOVE LENGTH 1 (duplicate characters):
// * denotes that a string was removed, to prevent continual contraction
// of the string, unless this is what you want.
ABCBA*BbC*V*BC*AB
_ABCBA*BBC*V*BC*AB
RESULT:
ABCBA*B*C*V*BC*AB == ABCBABCVBCAB
This example of course starts with length = 2; in general, increase the window to L/2 and iterate the length down from there.
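As a rough illustration of the sliding-window idea (a sketch only: it works on a plain string, removes immediately repeated blocks of one fixed length per call, and leaves out the '*' markers described above):
// Remove immediately repeated blocks of the given length,
// scanning left to right with a window of size `length`.
static string RemoveAdjacentRepeats(string input, int length)
{
    var sb = new System.Text.StringBuilder(input);
    int i = 0;
    while (i + 2 * length <= sb.Length)
    {
        bool repeated = true;
        for (int j = 0; j < length; j++)
        {
            if (sb[i + j] != sb[i + length + j]) { repeated = false; break; }
        }
        if (repeated)
            sb.Remove(i + length, length); // drop the duplicated block
        else
            i++;
    }
    return sb.ToString();
}
Applying it with length 2 and then length 1 turns "aabccacdcd" into "abcacd".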
I'm also thinking of two other approaches:
digraph - set up a stateful digraph with the data and iterate over it with the string; if a cycle is found, you'll have a duplication. I'm not sure how easy it is to check for these cycles... possibly some dynamic programming, so it could be equivalent to method 2 below. I'm going to have to think about this one a while longer.
distance matrix - using a Levenshtein distance matrix you might be able to detect duplication from diagonal movement (off the diagonal) with cost 0. This could indicate duplication of data. I will have to think about this more.
Here's a C# app I wrote that solves this problem.
takes
aabccacdcd
outputs
abcacd
It probably looks pretty messy; it took me a bit to get my head around the dynamic pattern length bit.
class Program
{
private static List<string> values;
private const int MAX_PATTERN_LENGTH = 4;
static void Main(string[] args)
{
values = new List<string>();
values.AddRange(new string[] { "a", "a", "b", "c", "c", "a", "c", "d", "c", "d" });
for (int i = MAX_PATTERN_LENGTH; i > 0; i--)
{
RemoveDuplicatesOfLength(i);
}
foreach (string s in values)
{
Console.WriteLine(s);
}
}
private static void RemoveDuplicatesOfLength(int dupeLength)
{
for (int i = 0; i < values.Count; i++)
{
if (i + dupeLength > values.Count)
break;
if (i + dupeLength + dupeLength > values.Count)
break;
var patternA = values.GetRange(i, dupeLength);
var patternB = values.GetRange(i + dupeLength, dupeLength);
bool isPattern = ComparePatterns(patternA, patternB);
if (isPattern)
{
values.RemoveRange(i, dupeLength);
}
}
}
private static bool ComparePatterns(List<string> pattern, List<string> candidate)
{
for (int i = 0; i < pattern.Count; i++)
{
if (pattern[i] != candidate[i])
return false;
}
return true;
}
}
Fixed the initial values to match the question's values.
I would dump them all into your favorite Set implementation.
EDIT: Now that I understand the question, your original solution looks like the best way to do this. Just loop through the array once, keeping an array of flags to mark which elements to keep, plus a counter to keep track of the size of the new array. Then loop through again to copy all the keepers to a new array.
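A minimal sketch of that flag-and-copy pass, assuming string elements (note it only collapses runs of a single repeated element, not repeated multi-line blocks):
static string[] CollapseRuns(string[] input)
{
    var keep = new bool[input.Length];
    int count = 0;
    for (int i = 0; i < input.Length; i++)
    {
        // Keep an element only if it differs from the one immediately before it.
        keep[i] = i == 0 || input[i] != input[i - 1];
        if (keep[i]) count++;
    }
    var result = new string[count];
    for (int i = 0, j = 0; i < input.Length; i++)
    {
        if (keep[i]) result[j++] = input[i];
    }
    return result;
}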
I agree that if you can just dump the strings into a Set, then that might be the easiest solution.
If you don't have access to a Set implementation for some reason, I would just sort the strings alphabetically and then go through once and remove the duplicates. How to sort them and remove duplicates from the list will depend on what language and environment you are running your code in.
EDIT: Oh, ick... I see based on your clarification that you expect that patterns might occur even across separate lines. My approach won't solve your problem. Sorry. Here is a question for you. If I had the following file:
a
a
b
c
c
a
a
b
c
c
Would you expect it to simplify to
a
b
c