select distinct mongodb C# - c#

I need to select distinct records from a simple MongoDB database.
I have many simple records that look like this:
{"word":"some text"}
My code is very simple:
const string connectionString = "mongodb://localhost";
var client = new MongoClient(connectionString);
MongoServer server = client.GetServer();
MongoDatabase database = server.GetDatabase("text8");
MongoCollection<Element> collection = database.GetCollection<Element>("text8");
MongoCursor<Element> words = (MongoCursor<Element>)collection.FindAll();
But I have no idea how to select the distinct words from the database.
Could someone give me some advice?

The MongoDB API has a distinct command, which returns the distinct values found for a specified key in a collection. You can also use it from the C# driver:
var distinctWords = collection.Distinct("word");
Here collection is the instance from your example. This query will return all distinct values of the word field in the collection.
Also, as @JohnnyHK mentioned in a comment, you can use the LINQ approach, since it is supported by the C# driver:
var distinctWords = collection.AsQueryable<Element>().Select(e => e.Word).Distinct();

This works for me:
Collection.Distinct<string>("ColumnNameForDistinct", FilterDefinition<T>.Empty).ToListAsync()

My guess would be to make "word" an index on this db.
Then using some linq to query it in a simple expression:
var res = col.Query().Select(e => e.word).Distinct();
This would result in reading all words from the index.

The MongoCollection.Distinct(String) method from the 2.0 API is legacy.
For a newer API version such as 2.4, use:
FieldDefinition<yueyun.land,string> field = "FirstName";
var bx = _yueyunlands.Distinct<string>(field, Builders<yueyun.land>.Filter.Empty).ToList();

If you want to filter first and get distinct afterwards and also do all of these at MongoDB side, you can use the following example.
In this example I applied a filter, got distinct values and finally got count:
var filter = Builders<Logs>.Filter.Ne(x => x.Id, null);
var count = collection.Distinct(x => x.Id, filter).ToList().Count();

MongoDB doesn't have a built in operator to split a string of words from a query as there's not a way to split a string, then run a "distinct" operation on it.
One option would be to create a MapReduce and do the split in the MapReduce code and count each word. You can't do this with just C# code.
A second, and possibly simpler option would be to pre-split the field into words so that you could use one of the distinct operators:
{ "word": [ "some", "text"] }
Then:
dbCollection.Distinct("word");
This would of course work if you just want to treat the entire string as a "word" rather than each word separately.
MapReduce's aren't real-time ... the pseudo-code would be:
map = function() {
    var splits = this.word.split(' ');
    for(var i = 0, l = splits.length; i < l; i++) {
        emit(splits[i], 1);
    }
}
reduce = function(word, vals) {
    var count = 0;
    for(var i=0, l=vals.length; i < l; i++) {
        count += vals[i];
    }
    return count;
}
When you run the MapReduce, the result would be a collection holding the number of occurrences of each word.
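For comparison, here is a minimal in-memory sketch of what that map/reduce pair computes, assuming the "word" values have already been loaded into the application. The class and method names (WordCountSketch, CountWords) are made up for illustration and are not part of any MongoDB API; nothing here runs on the server.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordCountSketch
{
    // Mirrors the map step (split on spaces, emit each word) and the
    // reduce step (sum the emitted 1s per word), entirely in memory.
    // Unlike the JavaScript split(' '), empty entries are dropped here.
    public static Dictionary<string, int> CountWords(IEnumerable<string> texts)
    {
        return texts
            .SelectMany(t => t.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
            .GroupBy(w => w)
            .ToDictionary(g => g.Key, g => g.Count());
    }
}
```

The keys of the returned dictionary are the distinct words, so the same call covers both "distinct" and "count per word" for data that fits in memory.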

Related

Linq Select Where IN

Cannot find the lambda linq equivalent to SELECT * FROM [Source] WHERE [Field] IN [String Array]. I need to select all data from a data table that contains zip codes from a string array. I would like a faster way than just iterating through every row comparing them as I believe this would be fairly inefficient (or I believe it would anyway). I can't seem to find an adequate answer on Google of how to perform a lambda LINQ IN query on a data table. Any assistance would be great! Here is what I have currently:
List<string> lst = dtEtechZipCodeEmailRecipients.AsEnumerable()
.Select(o => o.Field<string>("Email")).Distinct().ToList();
for (int i = 0; i < lst.Count - 1; ++i)
{
string email = lst[i].ToString().ToUpper();
string[] zipCodes = dtEtechZipCodeEmailRecipients.AsEnumerable()
.Where(zip => (zip.Field<string>("Email") ?? (object)String.Empty).ToString().ToUpper() == email)
.Select(zip => zip.Field<string>("ZipCode")).ToArray();
Console.WriteLine(" - " + email);
dtEtechModelRequests.AsEnumerable().Where(mod => mod.Field<string>("ZipCode").Contains(zipCodes)).Select(mod => mod);
}
That does not work, everything but the .Contains does do exactly what I need though. I left the .Contains to try and demonstrate my point.
You should do the opposite: check whether the array of zip codes contains the current zip code:
.Where(mod => zipCodes.Contains(mod.Field<string>("ZipCode")))
That is the same as checking whether the current zip code is IN the array.
The simple answer to your question is:
[Source].Where(el => [String Array].Contains(el.Field<string>("Field")))
If you need NOT IN, follow this pattern instead, adding ! in front of your [String Array]:
[Source].Where(el => ![String Array].Contains(el.Field<string>("Field")))
An alternative representation:
[Source].Where(el => [String Array].Contains(el.Field<string>("Field")) == false)
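As a rough, self-contained sketch of the same IN / NOT IN pattern over in-memory data: the helper names below (InFilterSketch, KeepIn, KeepNotIn) are hypothetical, and the HashSet is my own addition, on the assumption that the list of allowed values is large enough for array scans to matter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class InFilterSketch
{
    // IN: keep the items whose key is one of the allowed values.
    public static List<T> KeepIn<T, TKey>(IEnumerable<T> source, Func<T, TKey> key, IEnumerable<TKey> allowed)
    {
        // A HashSet makes each Contains check a hash lookup instead of a linear scan.
        var set = new HashSet<TKey>(allowed);
        return source.Where(item => set.Contains(key(item))).ToList();
    }

    // NOT IN: keep the items whose key is not one of the excluded values.
    public static List<T> KeepNotIn<T, TKey>(IEnumerable<T> source, Func<T, TKey> key, IEnumerable<TKey> excluded)
    {
        var set = new HashSet<TKey>(excluded);
        return source.Where(item => !set.Contains(key(item))).ToList();
    }
}
```

With the data table from the question, the call would look something like InFilterSketch.KeepIn(dtEtechModelRequests.AsEnumerable(), r => r.Field<string>("ZipCode"), zipCodes).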

Multiple property Full-Text Search with MongoDB

I've this C# code to query my MongoDB collection:
var query = myCollection.FindAll().AsQueryable();
if (!string.IsNullOrWhiteSpace(username))
query = query.Where(
x => x.User.FullName.IndexOf(username, StringComparison.OrdinalIgnoreCase) >= 0);
if (!string.IsNullOrWhiteSpace(productName))
query = query.Where(
x => x.Product.ProductName.IndexOf(productName, StringComparison.OrdinalIgnoreCase) >= 0);
query = query.Take(pageSize).Skip(pageSize*(pageNumber-1));
var itemCount=query.Count();
var result = query.ToList();
Due to low performance now I want to use a full-text search. I created text index for User.FullName and Product.ProductName and I started to write code like this:
var textSearchCommand = new CommandDocument
{
{ "text", myCollection.Name },
{ "search", username }
};
var commandResult = _database.RunCommand(textSearchCommand);
var result = commandResult.Response;
Now I'm stuck. How do I specify the property name in the syntax above? Is this the right way to do it?
A text index points to the document as a whole, not to the individual field where the match occurs. That means a text-search is always performed on all fields which are part of the text-index. You can not selectively only search for matches in one field.
But what you can do is further filter the result-set of the $text-operator with additional operators. You could, for example, use an additional $regex-operator to check if the string you searched for occurs in the field where you want it to be.

Join large list of Integers into LINQ Query

I have a LINQ query that returns the following error:
"The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Too many parameters were provided in this RPC request. The maximum is 2100".
All I need is to count the clients that have a BirthDate and whose IDs are in my list.
My list of client IDs could be huge (millions of records).
Here is the query:
List<int> allClients = GetClientIDs();
int total = context.Clients.Where(x => allClients.Contains(x.ClientID) && x.BirthDate != null).Count();
When the query is rewritten this way,
int total = context
.Clients
.Count(x => allClients.Contains(x.ClientID) && x.BirthDate != null);
it causes the same error.
I also tried doing it a different way, and it eats all the memory:
List<int> allClients = GetClientIDs();
total = (from x in allClients.AsQueryable()
join y in context.Clients
on x equals y.ClientID
where y.BirthDate != null
select x).Count();
We ran into this same issue at work. The problem is that list.Contains() creates a WHERE column IN (val1, val2, ... valN) statement, so you're limited to how many values you can put in there. What we ended up doing was in fact do it in batches much like you did.
However, I think I can offer you a cleaner and more elegant piece of code to do this with. Here is an extension method that will be added to the other Linq methods you normally use:
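public static IEnumerable<IEnumerable<T>> BulkForEach<T>(this IEnumerable<T> list, int size = 1000)
{
    // Materialize once so the source is not re-enumerated for every batch.
    List<T> items = list.ToList();
    // Stepping by size avoids an empty trailing batch when the count is a multiple of size.
    for (int index = 0; index < items.Count; index += size)
    {
        yield return items.GetRange(index, Math.Min(size, items.Count - index));
    }
}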
Then you use it like this:
foreach (var item in list.BulkForEach())
{
// Do logic here. item is an IEnumerable<T> (in your case, int)
}
EDIT
Or, if you prefer, you can make it act like the normal List.ForEach() like this:
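public static void BulkForEach<T>(this IEnumerable<T> list, Action<IEnumerable<T>> action, int size = 1000)
{
    // Materialize once so the source is not re-enumerated for every batch.
    List<T> items = list.ToList();
    // Stepping by size avoids an empty trailing batch when the count is a multiple of size.
    for (int index = 0; index < items.Count; index += size)
    {
        action.Invoke(items.GetRange(index, Math.Min(size, items.Count - index)));
    }
}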
Used like this:
list.BulkForEach(p => { /* Do logic */ });
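If the ID list is too large to want Count() or Skip() over it at all, a streaming variant is possible. This is only a sketch under that assumption; the names BatchSketch and InBatches are invented here, and the batch size is whatever keeps you under the 2100-parameter limit.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BatchSketch
{
    // Yields consecutive batches of at most `size` items while
    // enumerating the source exactly once.
    public static IEnumerable<List<T>> InBatches<T>(this IEnumerable<T> source, int size)
    {
        // Because this is an iterator, the check runs when enumeration starts.
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));

        var batch = new List<T>(size);
        foreach (T item in source)
        {
            batch.Add(item);
            if (batch.Count == size)
            {
                yield return batch;
                batch = new List<T>(size);
            }
        }

        // Emit the final, partially filled batch if there is one.
        if (batch.Count > 0) yield return batch;
    }
}
```

Each batch can then be passed to a Contains query and the per-batch counts summed, for example total += context.Clients.Count(x => batch.Contains(x.ClientID) && x.BirthDate != null).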
Well, as Gert Arnold mentioned before, running the query in chunks solves the problem, but it looks nasty:
List<int> allClients = GetClientIDs();
int total = 0;
const int sqlLimit = 2000;
int iterations = allClients.Count() / sqlLimit;
for (int i = 0; i <= iterations; i++)
{
List<int> tempList = allClients.Skip(i * sqlLimit).Take(sqlLimit).ToList();
int thisTotal = context.Clients.Count(x => tempList.Contains(x.ClientID) && x.BirthDate != null);
total = total + thisTotal;
}
As has been said above, your query is probably being translated to:
select count(1)
from Clients
where ClientID = @id1 or ClientID = @id2 -- and so on up to the number of ids returned by GetClientIDs.
You will need to change your query such that you aren't passing so many parameters to it.
To see the generated SQL with LINQ to SQL, you can set the data context's Log property (context.Log = Console.Out), which writes each query to the console when it is executed.
EDIT:
A possible alternative to chunking would be to send the IDs to the server as a delimited string, and create a UDF in your database which can convert that string back to a list.
var clientIds = string.Join(",", allClients);
var total = (from client in context.Clients
join split in context.udf_SplitString(clientIds)
on client.ClientID equals split.Id
select client).Count();
There are lots of examples on Google for UDFs that split strings.
Another alternative and probably the fastest at query time is to add your numbers from the CSV file into a temporary table in your database and then do a join query.
Doing a query in chunks means a lot of round-trips between your client and database. If the list of IDs you are interested in is static or changes rarely, I recommend the approach of a temporary table.
If you don't mind moving the work from the database to the application server and have the memory, try this.
int total = context.Clients.AsEnumerable().Where(x => allClients.Contains(x.ClientID) && x.BirthDate != null).Count();

Where clause of LINQ statement to find instances of a string within a List<string> collection?

I'm trying to construct a Where clause for a Linq statement which needs to determine whether the AccountNumber values retrieved as below exist in a List<string> collection.
I've thus far tried this:
private void FindAccountNumbers(List<string> AccountNumbers)
{
var query = from abc
select new
{
AccountNumber = abc.AccountNumber
};
query = query.Where(AccountNumbers.Contains(x => x.AccountNumber));
}
However I get the following build error:
The type arguments for method
'System.Linq.Queryable.Where(System.Linq.IQueryable,
System.Linq.Expressions.Expression>)' cannot
be inferred from the usage. Try specifying the type arguments
explicitly.
At runtime, query contains AccountNumber values, and I'm trying to pare this down based on matches found in the AccountNumbers collection (similar to an IN statement in TSQL). Should I be using Intersect instead of Contains? What am I doing wrong??
I think you want to have this:
query = query.Where(x => AccountNumbers.Contains(x.AccountNumber));
This doesn't work?
var query = from x in abc
where AccountNumbers.Contains(x.AccountNumber)
select new { x.AccountNumber };
That would give you back any AccountNumber in that list, unless AccountNumber isn't actually a string. That could be your problem.
It's because your syntax for from is wrong. I'm guessing that the collection of items to match against is abc.
The correct syntax would be (Version 1)
var query = from x in abc
select new { AccountNumber = x.AccountNumber };
query = query.Where(x=>AccountNumbers.Contains(x.AccountNumber));
You don't need an anonymous type either; since you just want the one field, you could do (Version 2)
var query = from x in abc select x.AccountNumber;
query = query.Where(x=>AccountNumbers.Contains(x));
However, you could just slap the Where straight onto your original collection. (Version 3)
var query = abc.Where(x=>AccountNumbers.Contains(x.AccountNumber));
Or, if you are just trying to find whether any exist in the collection (Version 4)
var query = abc.Any(x=>AccountNumbers.Contains(x.AccountNumber));
Version 1 will return IEnumerable<anonymous type with an AccountNumber property>
Version 2 will return IEnumerable<string>
Version 3 will return IEnumerable<type of the items in abc>
Version 4 will return bool
Let me verify what you're trying to do.
You have a collection of objects abc. You want to pull out the AccountNumber from each member of that collection, compare it to the list of account numbers passed in, and determine... what? If there IS any overlap, or WHAT the overlap is?
If the AccountNumber field is a string, you could do this:
private IEnumerable<string> OverlappingAccountNumbers(IEnumerable<string> accountNumbers)
{
return abc.Select(x => x.AccountNumber)
.Intersect(accountNumbers);
}
Or for the boolean case:
private bool AnyOverlappingAccountNumbers(IEnumerable<string> accountNumbers)
{
return abc.Select(x => x.AccountNumber)
.Intersect(accountNumbers)
.Any();
}
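To make the difference between the two cases concrete, here is a small self-contained sketch over plain string lists. The names OverlapSketch, Overlap and AnyOverlap are hypothetical, and abc from the question is replaced by an ordinary sequence of stored account numbers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OverlapSketch
{
    // WHAT the overlap is: Intersect yields each matching value once,
    // in the order it first appears in `stored`.
    public static List<string> Overlap(IEnumerable<string> stored, IEnumerable<string> requested)
    {
        return stored.Intersect(requested).ToList();
    }

    // WHETHER there is any overlap: Any stops at the first match.
    public static bool AnyOverlap(IEnumerable<string> stored, IEnumerable<string> requested)
    {
        var set = new HashSet<string>(requested);
        return stored.Any(set.Contains);
    }
}
```

Note that Intersect removes duplicates, so it answers "which account numbers match", not "which rows match". For the rows themselves, a Where with Contains is the better fit.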
I'd go with this:
private void FindAccountNumbers(List<string> AccountNumbers)
{
// Get a strongly-typed list, instead of an anonymous typed one...
var query = (from a in abc select a.AccountNumber).AsEnumerable();
// Grab a quick intersect
var matched = query.Intersect(AccountNumbers);
}
One liner?
var query = (from a in abc select a.AccountNumber).AsEnumerable().Intersect(AccountNumbers);
The previous answers are wrong because they miss one important point and obviously weren't tested. The first issue is that you can't mix a SQL query that has not been executed yet with an in-memory list of strings. YOU CAN'T MIX!!! The tested solution for this problem is:
var AccountNumbers = select accountNumber from table where.... // is a entitie
private void FindAccountNumbers(IQueryable<acounts> AccountNumbers) //entitie object not string
{
var query = from abc
select new
{
AccountNumber = abc.AccountNumber
};
query = query.Join(AccountNumbers, abc => abc.AccountNumber, aco=> aco, (ac, coNum) => ac);
}
It really works! It is necessary to mention that this solution applies when you are working with LINQ and Entity Framework!

Select entities where ID in int array - WCF Data Services, LINQ

I would like to return a set of entities whose ID is contained in a list or array of IDs, using LINQ and Data Services. I know how to do this using LinqToEF, but I am at a loss as to how to do this with Data Services, or with OData query conventions for that matter.
My thought is that I would do something like:
int[] intArray = {321456, 321355, 218994, 189232};
var query = (from data in context.Entity
where intArray.Contains(data.ID)
select data);
Is there any way to accomplish this using Data Services / OData? I know I could probably hack it with a Service Operation but I would prefer not to do that.
Cheers.
Currently OData (the underlying protocol) doesn't support the Contains operation. So that's why the client library does not translate the above query.
People are basically using two ways to overcome this limitation:
1) Use service operations as you noted.
2) Construct a where clause dynamically which uses simple comparisons to compare the value to each item from the array. So if the array contains 1, 2, 3, the where would be data.ID == 1 || data.ID == 2 || data.ID == 3
The #2 solution is nice because it's a client side only change. The downside is, that it only works for small arrays. If the array contains too many items the expression gets too long and that leads to all kinds of troubles.
The #1 solution doesn't have the size problem, but you need to provide the operation on the server.
Here is my implementation of a WhereIn() method, which filters an IQueryable collection by a set of selected values:
public static IQueryable<T> WhereIn<T,TProp>(this IQueryable<T> source, Expression<Func<T,TProp>> memberExpr, IEnumerable<TProp> values) where T : class
{
    Expression predicate = null;
    // Reuse the selector's own parameter so the member access and the
    // final lambda refer to the same variable.
    ParameterExpression param = memberExpr.Parameters[0];
    // Create a comparison for each value, e.g.:
    // IN: t => t.Id == 1 || t.Id == 2
    MemberExpression me = (MemberExpression) memberExpr.Body;
    foreach (TProp val in values)
    {
        ConstantExpression ce = Expression.Constant(val, typeof(TProp));
        Expression comparison = Expression.Equal(me, ce);
        // OrElse builds a short-circuiting || rather than a bitwise |.
        predicate = predicate == null
            ? comparison
            : Expression.OrElse(predicate, comparison);
    }
    // With no values there is nothing to compare against, so the source is returned unfiltered.
    return predicate != null
        ? source.Where(Expression.Lambda<Func<T, bool>>(predicate, param))
        : source;
}
And calling of this method looks like:
IQueryable<Product> q = context.Products;
var SelectedProducts = new List<Product>
{
new Product{Id=23},
new Product{Id=56}
};
...
// Collecting set of product id's
var selectedProductsIds = SelectedProducts.Select(p => p.Id).ToList();
// Filtering products
q = q.WhereIn(p => p.Id, selectedProductsIds);
Thank you men you really helped me :) :)
I did it like Vitek Karas said.
1) Download the Dynamic query library
Check this link
No need to read it, just download the Dynamic query library
2) Check the project named DynamicQuery. In it you will find a class named Dynamic.cs. Copy it to your project
3) Build your project. (If you are using Silverlight, an error saying ReaderWriterLock is not found will appear. Don't be afraid. Just comment out or delete the lines that cause errors; there are just 6 or 7 of them.)
4) All done, you now just need to write your query
Example: ordersContext.CLIENTS.Where(" NUMCLI > 200 || NUMCLI < 20");
All done. If you have to use the 'Contains' method, you just need to write a method that iterates over your array and returns the string that your request will use.
private string MyFilter()
{
    // Join the comparisons with "||" so no dangling operator is left at the end.
    return string.Join(" || ", myTab.Select(element => "ThePropertyInTheTable = " + element).ToArray());
}
I hope you understand me and that I helped someone :)
