I am trying to 'zip' an arbitrary number of streams in Rx, where elements correspond but may be processed out of order. Each stream's elements have an identifier that can be used to match them together. E.g. elements look like:
public class Element
{
public string Key {get; set;}
}
Normally, zip will just combine elements by their index of occurrence:
|-A-----------A
|--B---------B-
|-----C------C-
|-----ABC-----ABC <- zip
But what if we want to only match elements that share the same Key? I'm looking for a sequence that works more like this:
(In this example, the key is 1 or 2)
|--2A-------1A----------
|----1B----------2B-----
|------1C-----------2C--
|-----------1ABC----2ABC <- zipped by key 1 & 2 respectively
I feel that GroupJoin suits this scenario, but it only serves two Observables and chaining them got out of hand pretty quickly.
I also looked at And/Then/When but didn't really understand how to structure it for this scenario.
Ideally, I'd want an extension method I can call and provide a result selector for, where the inputs of the result selector are guaranteed to have the same Key.
How would you approach this problem?
Here is something I knocked together in LINQPad. It meets the requirements of your marble diagram. It is, however, messier than I would like.
NuGet dependency: Rx-Testing
void Main()
{
TestScheduler scheduler = new TestScheduler();
/*
|--2A-------1A----------
|----1B----------2B-----
|------1C-----------2C--
|-----------1ABC----2ABC <- zipped by key 1 & 2 respectively
*/
var sourceA = scheduler.CreateColdObservable(
ReactiveTest.OnNext(3, "2A"),
ReactiveTest.OnNext(12, "1A"));
var sourceB = scheduler.CreateColdObservable(
ReactiveTest.OnNext(5, "1B"),
ReactiveTest.OnNext(17, "2B"));
var sourceC = scheduler.CreateColdObservable(
ReactiveTest.OnNext(7, "1C"),
ReactiveTest.OnNext(20, "2C"));
var observer = scheduler.CreateObserver<string>();
var query = Observable.Merge(sourceA, sourceB, sourceC)
.GroupBy(x => GetKey(x))
.SelectMany(grp => grp.Select(x => GetValue(x))
.Take(3)
.Aggregate(new List<string>(),
(accumulator, current) => {
accumulator.Add(current);
return accumulator;
})
.Select(acc => CreateGroupResult(grp.Key, acc)));
query.Subscribe(observer);
scheduler.Start();
ReactiveAssert.AreElementsEqual(
new[]{
ReactiveTest.OnNext(12, "1ABC"),
ReactiveTest.OnNext(20, "2ABC")
},
observer.Messages
);
}
// Define other methods and classes here
private static string CreateGroupResult(string key, IEnumerable<string> values)
{
var combinedOrderedValues = string.Join(string.Empty, values.OrderBy(v => v));
return string.Format("{0}{1}", key, combinedOrderedValues);
}
private static string GetKey(string message)
{
return message.Substring(0, 1);
}
private static string GetValue(string message)
{
return message.Substring(1);
}
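For reuse, the same idea generalizes into the kind of extension method the question asked for. This is an untested sketch of my own (the name ZipByKey and its shape are not from the original answer); it assumes each source emits exactly one element per key, so a group is complete once it has one element per source:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reactive.Linq;

public static class ObservableExtensions
{
    // Sketch: merge all sources, group by key, and emit one result per key
    // once every source has contributed an element for that key.
    public static IObservable<TResult> ZipByKey<TSource, TKey, TResult>(
        this IEnumerable<IObservable<TSource>> sources,
        Func<TSource, TKey> keySelector,
        Func<TKey, IList<TSource>, TResult> resultSelector)
    {
        var sourceList = sources.ToList();
        return sourceList
            .Merge()
            .GroupBy(keySelector)
            .SelectMany(grp => grp
                .Take(sourceList.Count)  // one element per source, per key
                .ToList()                // emits IList<TSource> when the group completes
                .Select(items => resultSelector(grp.Key, items)));
    }
}
```

With this in place, the query above would reduce to something like `new[] { sourceA, sourceB, sourceC }.ZipByKey(GetKey, CreateGroupResult)`.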
Related
I have a collection with around 80 million documents.
Via an API the user can validate if a set of ids (input-set) are still present in the database.
The input set can be rather large, but I would split the validation into chunks of approx. 10000 ids.
Basically I would like to get the intersection between the database ids and the input-set.
I'd like to do this with Linq but other suggestions are welcome.
Below is some sample code that shows my scenario and what I have tried so far.
The first approach is what I would do, but it throws a NotSupportedException: "The method Intersect is not supported in the expression tree".
The second approach works but is really slow on large sets.
The third approach is faster than the second but then I have to load 80 million ids in memory.
We have tried to stick with the LINQ interface provided by the C# wrapper, but it is a struggle sometimes. Any pointers are appreciated. I guess there's a way forward using different builders and a pipeline definition with $setIntersection, but I cannot get my head around the C# documentation on that.
private string[] FilterOnExistInDatabase1(string[] candidates)
{
// Query<T> is just a wrapper to the collection and returns a IQueryable<T>
return mongoRepository.Query<TestModel>().Select(x => x.Id).Intersect(candidates).ToArray();
}
private string[] FilterOnExistInDatabase2(string[] candidates)
{
// Query<T> is just a wrapper to the collection and returns a IQueryable<T>
return mongoRepository.Query<TestModel>().Select(x => x.Id).Where(x => candidates.Contains(x)).ToArray();
}
private string[] FilterOnExistInDatabase3(string[] candidates)
{
// Query<T> is just a wrapper to the collection and returns a IQueryable<T>
var allExistingIds = mongoRepository.Query<TestModel>().Select(x => x.Id).ToArray();
var existingCandidates = allExistingIds.Intersect(candidates).ToArray();
return existingCandidates;
}
[Test]
public void SampleQuery()
{
var models = Enumerable.Range(0, 10).Select(x => new TestModel()).ToArray();
mongoRepository.InsertMany(models, CancellationToken.None);
var deletedId = "I no longer exist";
var candidates = models.Select(x => x.Id).Concat(new []{deletedId}).ToArray();
var existingCandidates = FilterOnExistInDatabase3(candidates);
Assert.That(existingCandidates.Length, Is.EqualTo(models.Length));
Assert.False(existingCandidates.Contains(deletedId));
Assert.That(existingCandidates.Length, Is.EqualTo(candidates.Length - 1));
}
How about the following approach? Basically, you get back an array of IDs that exist in the database and produce the intersection on the client side using LINQ to get the invalid/deleted IDs.
using MongoDB.Bson;
using MongoDB.Entities;
using MongoDB.Entities.Core;
using System;
using System.Linq;
namespace StackOverflow
{
public class Item : Entity
{
public string Name { get; set; }
}
public class Program
{
private static void Main(string[] args)
{
new DB("test", "localhost");
var one = new Item { Name = "one" }; one.Save();
var two = new Item { Name = "two" }; two.Save();
var thr = new Item { Name = "three" }; thr.Save();
var inputIDs = new[] { one.ID, two.ID, ObjectId.GenerateNewId().ToString() };
var validIDs = DB.Queryable<Item>() // for official driver use: collection.AsQueryable()
.Where(i => inputIDs.Contains(i.ID))
.Select(i => i.ID)
.ToArray();
var deletedIDs = inputIDs.Except(validIDs).ToArray();
}
}
}
This should theoretically be faster than your second method above because it doesn't cause a projection of every ID in the collection. If this approach is acceptable to you, I'd be interested in knowing how many milliseconds it takes MongoDB to finish this task for 80 million docs.
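Since the question mentions splitting the input into chunks of approximately 10,000 ids, the same idea can be combined with client-side chunking so each chunk becomes one $in query. This is a rough, unbenchmarked sketch against the official driver (the method name and default chunk size are illustrative):

```csharp
using System.Linq;
using MongoDB.Driver;

public static class IdValidation
{
    // Validate candidates in chunks; each chunk becomes a single $in query,
    // and only the matching ids are projected back from the server.
    public static string[] FilterExisting(
        IMongoCollection<TestModel> collection,
        string[] candidates,
        int chunkSize = 10000)
    {
        return candidates
            .Select((id, index) => new { id, index })
            .GroupBy(x => x.index / chunkSize, x => x.id)
            .SelectMany(chunk => collection
                .Find(Builders<TestModel>.Filter.In(m => m.Id, chunk))
                .Project(m => m.Id)
                .ToList())
            .ToArray();
    }
}
```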
I'm interested in the following overloads:
public static IObservable<IList<TSource>> CombineLatest<TSource>(this params IObservable<TSource>[] sources);
public static IObservable<IList<TSource>> CombineLatest<TSource>(this IEnumerable<IObservable<TSource>> sources);
Is the order of the elements in the resulting list guaranteed to be the same as the order in the input?
For example, in the following code will list[0] always contain an element from a and list[1] an element from b?
IObservable<int> a = ...;
IObservable<int> b = ...;
var list = await Observable.CombineLatest(a, b).FirstAsync();
The documentation states:
a list with the latest source elements
and:
observable sequence containing lists of the latest elements of the sources
but does not really mention anything about order.
The order is preserved.
When you look at the source code of Rx, it all boils down to the System.Reactive.Linq.CombineLatest<TSource, TResult> class.
You can find there that an indexed observer is created for each input observable (where the index is the order in the input):
for (int i = 0; i < N; i++)
{
var j = i;
var d = new SingleAssignmentDisposable();
_subscriptions[j] = d;
var o = new O(this, j);
d.Disposable = srcs[j].SubscribeSafe(o);
}
And the resulting element is produced as follows:
private void OnNext(int index, TSource value)
{
lock (_gate)
{
_values[index] = value;
_hasValue[index] = true;
if (_hasValueAll || (_hasValueAll = _hasValue.All(Stubs<bool>.I)))
{
/* snip */
res = _parent._resultSelector(new ReadOnlyCollection<TSource>(_values));
/* snip */
_observer.OnNext(res);
}
/* snip */
}
}
The _resultSelector for these overloads is just a call to Enumerable.ToList(). So the order in the output list will be the same as the order in the input.
CombineLatest first fires once an element has been pushed to every stream.
After that, it fires whenever a new element is pushed to any of the streams.
So if you "combine" two streams, the resulting list always has two elements in it, and as far as I know it's guaranteed that the order is the same as the order in which you passed the streams to CombineLatest.
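That firing behavior, and the ordering, can be seen with two subjects (a quick sketch; requires the System.Reactive package):

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Linq;
using System.Reactive.Subjects;

var a = new Subject<int>();
var b = new Subject<int>();
var results = new List<IList<int>>();

Observable.CombineLatest(a, b).Subscribe(results.Add);

a.OnNext(1);   // nothing emitted yet: b has no value
b.OnNext(10);  // emits [1, 10] -- index 0 is from a, index 1 from b
a.OnNext(2);   // emits [2, 10]
```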
You can visualize Rx operators using marble diagrams; the diagram for CombineLatest makes this behavior easy to see.
I have a list of part numbers:
var parts = new List<string> {"part1", "part2", "part3"};
I also have a dictionary of quantities for these part numbers:
var quantities = new Dictionary<string, int> {{"part1", 45}, {"part3", 25}};
Given a delimiter of |, I need to arrange these values in a flat file like so:
SalesRep|part1|part2|part3
Mr. Foo|45||25
What I'd like to do is build that string so that, no matter what values are in parts and quantities, I can tack it on to the sales rep name to resemble the example above.
It seems like I should be able to do this with a string.Join() on an enumerable LINQ operation, but I can't figure out what statement will get me the IEnumerable<string> result from joining parts and quantities. I thought that would be a .Join(), but the signature doesn't seem right. Can someone enlighten me?
Something like this perhaps?
var partValues = parts.Select(x => quantities.ContainsKey(x) ? quantities[x] : 0);
Basically, for each item in the parts list you either pick the value from your dictionary or, if it doesn't exist, 0.
To make this a little more interesting you could define a generic extension method on IDictionary<T,U> that makes this a little more readable:
public static class DictionaryExtensions
{
public static U GetValueOrDefault<T,U>(this IDictionary<T, U> dict, T key)
{
if(dict.ContainsKey(key))
{
return dict[key];
}
return default(U);
}
}
Then you can simply write:
var partValues = parts.Select(quantities.GetValueOrDefault);
var parts = new List<string> { "part1", "part2", "part3" };
var quantities = new Dictionary<string, int> { { "part1", 45 }, { "part3", 25 } };
var result = string.Join("|",
from p in parts select quantities.ContainsKey(p)
? quantities[p].ToString() : "");
I'm working with the Reactive framework for Silverlight and would like to achieve the following.
I am trying to create a typical data provider for a Silverlight client that also takes advantage of the caching framework available in the MS Enterprise Library. The scenario requires that I check the cache for the key-value pair before hitting the WCF data client.
By using the Rx extension Amb, I am able to pull the data from the cache or the WCF data client, whichever returns first, but how can I stop the WCF client from executing the call if the value is in the cache?
I would also like to handle race conditions, e.g. if the first subscriber requests some data and the provider is fetching it from the WCF data client (async), how do I prevent subsequent async requests from doing the same thing (at this stage, the cache has yet to be populated)?
I had exactly the same problem. I solved it with an extension method with the following signature:
IObservable<R> FromCacheOrFetch<T, R>(
this IObservable<T> source,
Func<T, R> cache,
Func<IObservable<T>, IObservable<R>> fetch,
IScheduler scheduler) where R : class
Effectively what this did was take in the source observable and return an observable that would match each input value with its output value.
To get each output value it would check the cache first. If the value exists in the cache it used that. If not it would spin up the fetch function only on values that weren't in the cache. If all of the values were in the cache then the fetch function would never be spun up - so no service connection set up penalty, etc.
I'll give you the code, but it's based on a slightly different version of the extension method that uses a Maybe<T> monad - so you might find you need to fiddle with the implementation.
Here it is:
public static IObservable<R> FromCacheOrFetch<T, R>(this IObservable<T> source, Func<T, R> cache, Func<IObservable<T>, IObservable<R>> fetch, IScheduler scheduler)
where R : class
{
return source.FromCacheOrFetch<T, R>(t => cache(t).ToMaybe(null), fetch, scheduler);
}
public static IObservable<R> FromCacheOrFetch<T, R>(this IObservable<T> source, Func<T, Maybe<R>> cache, Func<IObservable<T>, IObservable<R>> fetch, IScheduler scheduler)
{
var results = new Subject<R>();
var disposables = new CompositeDisposable();
var loop = new EventLoopScheduler();
disposables.Add(loop);
var sourceDone = false;
var pairsDone = true;
var exception = (Exception)null;
var fetchIn = new Subject<T>();
var fetchOut = (IObservable<R>)null;
var pairs = (IObservable<KeyValuePair<int, R>>)null;
var lookup = new Dictionary<T, int>();
var list = new List<Maybe<R>>();
var cursor = 0;
Action checkCleanup = () =>
{
if (sourceDone && pairsDone)
{
if (exception == null)
{
results.OnCompleted();
}
else
{
results.OnError(exception);
}
loop.Schedule(() => disposables.Dispose());
}
};
Action dequeue = () =>
{
while (cursor != list.Count)
{
var mr = list[cursor];
if (mr.HasValue)
{
results.OnNext(mr.Value);
cursor++;
}
else
{
break;
}
}
};
Action<KeyValuePair<int, R>> nextPairs = kvp =>
{
list[kvp.Key] = Maybe<R>.Something(kvp.Value);
dequeue();
};
Action<Exception> errorPairs = ex =>
{
fetchIn.OnCompleted();
pairsDone = true;
exception = ex;
checkCleanup();
};
Action completedPairs = () =>
{
pairsDone = true;
checkCleanup();
};
Action<T> sourceNext = t =>
{
var mr = cache(t);
list.Add(mr);
if (mr.IsNothing)
{
lookup[t] = list.Count - 1;
if (fetchOut == null)
{
pairsDone = false;
fetchOut = fetch(fetchIn.ObserveOn(Scheduler.ThreadPool));
pairs = fetchIn.Select(x => lookup[x]).Zip(fetchOut, (i, r2) => new KeyValuePair<int, R>(i, r2));
disposables.Add(pairs.ObserveOn(loop).Subscribe(nextPairs, errorPairs, completedPairs));
}
fetchIn.OnNext(t);
}
else
{
dequeue();
}
};
Action<Exception> errorSource = ex =>
{
sourceDone = true;
exception = ex;
fetchIn.OnCompleted();
checkCleanup();
};
Action completedSource = () =>
{
sourceDone = true;
fetchIn.OnCompleted();
checkCleanup();
};
disposables.Add(source.ObserveOn(loop).Subscribe(sourceNext, errorSource, completedSource));
return results.ObserveOn(scheduler);
}
Example usage would look like this:
You would have a source of the indices that you want to fetch:
IObservable<X> source = ...
You would have a function that can get values from the cache and an action that can put them in (and both should be thread-safe):
Func<X, Y> getFromCache = x => ...;
Action<X, Y> addToCache = (x, y) => ...;
Then you would have the actual call to go get the data from your database or service:
Func<X, Y> getFromService = x => ...;
Then you could define fetch like so:
Func<IObservable<X>, IObservable<Y>> fetch =
xs => xs.Select(x =>
{
var y = getFromService(x);
addToCache(x, y);
return y;
});
And finally you can make your query by calling the following:
IObservable<Y> results =
source.FromCacheOrFetch(
getFromCache,
fetch,
Scheduler.ThreadPool);
Of course you would need to subscribe to the result to make the computation take place.
Clearly Amb is not the right way to go, since that will hit both the cache and the service every time. What does EntLib return you if the cache misses?
Note that Observable.Timeout is a reasonable alternative:
cache(<parameters>).Timeout(TimeSpan.FromSeconds(1), service(<parameters>));
But clearly it's not a great idea to time out if you instead want to process the return from EntLib and act appropriately.
I'm not seeing why this is necessarily a Reactive Extensions problem.
A simple approach, which is probably less fully featured than @Enigmativity's solution, could be something along the lines of:
public IObservable<TResult> GetCachedValue<TKey, TResult>(TKey key, Func<TKey, IObservable<TResult>> getFromCache, Func<TKey, IObservable<TResult>> getFromSource)
{
// Emit the cached value if present; otherwise fall through to the source.
return getFromCache(key).Concat(getFromSource(key)).Take(1);
}
This is just a loosely formed idea, you'd need to add:
A mechanism to add the item to the cache, or assume getFromSource caches the result
Some kind of thread safety to prevent multiple hits on the source for the same uncached key (if required)
getFromCache would need to return Observable.Empty<TResult>() if the item wasn't in the cache.
But if you want something simple, it's not a bad place to start.
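For the second bullet (preventing duplicate in-flight fetches for the same key), one hedged sketch of my own is to memoize the in-flight observable per key; Replay(1).RefCount() lets concurrent subscribers share a single underlying fetch. The class and member names are illustrative, not from any of the answers above:

```csharp
using System;
using System.Collections.Concurrent;
using System.Reactive.Linq;

public class DedupingFetcher<TKey, TValue>
{
    private readonly ConcurrentDictionary<TKey, IObservable<TValue>> _inFlight =
        new ConcurrentDictionary<TKey, IObservable<TValue>>();
    private readonly Func<TKey, IObservable<TValue>> _fetch;

    public DedupingFetcher(Func<TKey, IObservable<TValue>> fetch)
    {
        _fetch = fetch;
    }

    // All concurrent callers for the same key share one underlying fetch;
    // the entry is removed on termination so a later call can refresh.
    public IObservable<TValue> Get(TKey key)
    {
        return _inFlight.GetOrAdd(key, k =>
            _fetch(k)
                .Finally(() => { IObservable<TValue> _; _inFlight.TryRemove(k, out _); })
                .Replay(1)
                .RefCount());
    }
}
```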
Is there any easy LINQ expression to concatenate my entire List<string> collection items to a single string with a delimiter character?
What if the collection is of custom objects instead of string? Imagine I need to concatenate on object.Name.
string result = String.Join(delimiter, list);
is sufficient.
Warning - Serious Performance Issues
Though this answer does produce the desired result, it suffers from poor performance compared to other answers here. Be very careful about deciding to use it.
Using LINQ, this should work:
string delimiter = ",";
List<string> items = new List<string>() { "foo", "boo", "john", "doe" };
Console.WriteLine(items.Aggregate((i, j) => i + delimiter + j));
Class description:
public class Foo
{
public string Boo { get; set; }
}
Usage:
class Program
{
static void Main(string[] args)
{
string delimiter = ",";
List<Foo> items = new List<Foo>() { new Foo { Boo = "ABC" }, new Foo { Boo = "DEF" },
new Foo { Boo = "GHI" }, new Foo { Boo = "JKL" } };
Console.WriteLine(items.Aggregate((i, j) => new Foo{Boo = (i.Boo + delimiter + j.Boo)}).Boo);
Console.ReadKey();
}
}
And here is my best :)
items.Select(i => i.Boo).Aggregate((i, j) => i + delimiter + j)
Note: This answer does not use LINQ to generate the concatenated string. Using LINQ to turn enumerables into delimited strings can cause serious performance problems
Modern .NET (since .NET 4)
This is for an array, list or any type that implements IEnumerable:
string.Join(delimiter, enumerable);
And this is for an enumerable of custom objects:
string.Join(delimiter, enumerable.Select(i => i.Boo));
Old .NET (before .NET 4)
This is for a string array:
string.Join(delimiter, array);
This is for a List<string>:
string.Join(delimiter, list.ToArray());
And this is for a list of custom objects:
string.Join(delimiter, list.Select(i => i.Boo).ToArray());
using System.Linq;
public class Person
{
string FirstName { get; set; }
string LastName { get; set; }
}
List<Person> persons = new List<Person>();
string listOfPersons = string.Join(",", persons.Select(p => p.FirstName));
Good question. I've been using
List<string> myStrings = new List<string>{ "ours", "mine", "yours"};
string joinedString = string.Join(", ", myStrings.ToArray());
It's not LINQ, but it works.
You can simply use:
List<string> items = new List<string>() { "foo", "boo", "john", "doe" };
Console.WriteLine(string.Join(",", items));
Happy coding!
I think that if you define the logic in an extension method the code will be much more readable:
public static class EnumerableExtensions {
public static string Join<T>(this IEnumerable<T> self, string separator) {
return String.Join(separator, self.Select(e => e.ToString()).ToArray());
}
}
public class Person {
public string FirstName { get; set; }
public string LastName { get; set; }
public override string ToString() {
return string.Format("{0} {1}", FirstName, LastName);
}
}
// ...
List<Person> people = new List<Person>();
// ...
string fullNames = people.Join(", ");
string lastNames = people.Select(p => p.LastName).Join(", ");
List<string> strings = new List<string>() { "ABC", "DEF", "GHI" };
string s = strings.Aggregate((a, b) => a + ',' + b);
I have done this using LINQ:
var oCSP = (from P in db.Products select new { P.ProductName });
string joinedString = string.Join(",", oCSP.Select(p => p.ProductName));
Put String.Join into an extension method. Here is the version I use, which is less verbose than Jordao's version. It:
returns an empty string ("") when the list is empty, where Aggregate would throw an exception instead
probably has better performance than Aggregate
is easier to read when combined with other LINQ methods than a bare String.Join()
Usage
var myStrings = new List<string>() { "a", "b", "c" };
var joinedStrings = myStrings.Join(","); // "a,b,c"
Extension methods class
public static class ExtensionMethods
{
public static string Join(this IEnumerable<string> texts, string separator)
{
return String.Join(separator, texts);
}
}
This answer aims to extend and improve some mentions of LINQ-based solutions. It is not an example of a "good" way to solve this per se. Just use string.Join as suggested when it fits your needs.
Context
This answer is prompted by the second part of the question (a generic approach) and some comments expressing a deep affinity for LINQ.
The currently accepted answer does not seem to work with empty or singleton sequences. It also suffers from a performance issue.
The currently most upvoted answer does not explicitly address the generic string conversion requirement, when ToString does not yield the desired result. (This can be remedied by adding a call to Select.)
Another answer includes a note that may lead some to believe that the performance issue is inherent to LINQ. ("Using LINQ to turn enumerables into delimited strings can cause serious performance problems.")
I noticed this comment about sending the query to the database.
Given that there is no answer matching all these requirements, I propose an implementation that is based on LINQ, running in linear time, works with enumerations of arbitrary length, and supports generic conversions to string for the elements.
So, LINQ or bust? Okay.
static string Serialize<T>(IEnumerable<T> enumerable, char delim, Func<T, string> toString)
{
return enumerable.Aggregate(
new StringBuilder(),
(sb, t) => sb.Append(toString(t)).Append(delim),
sb =>
{
if (sb.Length > 0)
{
sb.Length--;
}
return sb.ToString();
});
}
This implementation is more involved than many alternatives, predominantly because we need to manage the boundary conditions for the delimiter (separator) in our own code.
It should run in linear time, traversing the elements at most twice.
Once for generating all the strings to be appended in the first place, and zero to one time while generating the final result during the final ToString call. This is because the latter may be able to just return the buffer that happened to be large enough to contain all the appended strings from the get go, or it has to regenerate the full thing (unlikely), or something in between. See e.g. What is the Complexity of the StringBuilder.ToString() on SO for more information.
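For example, a quick usage sketch of the Serialize method above, showing the boundary behavior around the trailing delimiter:

```csharp
var csv = Serialize(new[] { 1, 2, 3 }, ',', i => i.ToString());
// csv == "1,2,3" -- the trailing delimiter is trimmed by sb.Length--

var empty = Serialize(Enumerable.Empty<int>(), ',', i => i.ToString());
// empty == "" -- sb.Length is 0, so nothing is trimmed and no exception is thrown
```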
Final Words
Just use string.Join as suggested if it fits your needs, adding a Select when you need to massage the sequence first.
This answer's main intent is to illustrate that it is possible to keep the performance in check using LINQ. The result is (probably) too verbose to recommend, but it exists.
You can use Aggregate to concatenate the strings into a single character-separated string, but it will throw an InvalidOperationException if the collection is empty.
You can use the Aggregate function with a seed string:
var seed = string.Empty;
var separator = ",";
var cars = new List<string>() { "Ford", "McLaren Senna", "Aston Martin Vanquish" };
var carAggregate = cars.Aggregate(seed,
(partialPhrase, word) => $"{partialPhrase}{separator}{word}").TrimStart(',');
Alternatively, you can use string.Join, which doesn't care if you pass it an empty collection:
var separator = ",";
var cars = new List<string>() { "Ford", "McLaren Senna", "Aston Martin Vanquish" };
var carJoin = string.Join(separator, cars);