Being relatively new to the .NET game, I was wondering: has anyone had any experience of the pros and cons of using LINQ versus what could be considered more traditional methods of working with lists and collections?
For a specific example from a project I'm working on: a list of unique id/name pairs is being retrieved from a remote web service. This list:
will change infrequently (once per day),
will be read-only from the point of view of the application where it is used,
will be stored at the application level for all requests to access.
Given those points, I plan to store the returned values at the application level in a singleton class.
My initial approach was to iterate through the list returned from the remote service and store it in a NameValueCollection in a singleton class, with methods to retrieve from the collection based on an id:
sugarsoap soapService = new sugarsoap();
branch_summary[] branchList = soapService.getBranches();

// cache each id/name pair for later lookup by id
foreach (branch_summary aBranch in branchList)
{
    branchNameList.Add(aBranch.id, aBranch.name);
}
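The lookup method on the singleton is then trivial; a sketch (branchNameList being the NameValueCollection above):

// NameValueCollection's indexer returns null when the key is absent.
public string BranchName(string branchId)
{
    return branchNameList[branchId];
}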
The alternative using LINQ is to simply add a method that works on the list directly once it has been retrieved:
public string branchName (string branchId)
{
//branchList populated in the constructor
branch_summary bs = from b in branchList where b.id == branchId select b;
return branch_summary.name;
}
Is either better than the other, or is there a third way? I'm open to all answers for both approaches, both in terms of solutions that offer elegance and those that benefit performance.
I don't think the LINQ you wrote would compile; it'd have to be:
public string branchName (string branchId)
{
//branchList populated in the constructor
branch_summary bs = (from b in branchList where b.id == branchId select b).FirstOrDefault();
return bs == null ? null : bs.name;
}
Note the .FirstOrDefault() - without it the query expression is an IEnumerable&lt;branch_summary&gt;, not a single branch_summary.
I'd rather use LINQ, for the reason that it can be used in other places for writing more complex filters on your data. I also think it's easier to read than the NameValueCollection alternative.
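For example (a sketch reusing the question's types):

// a richer filter over the same list later on:
var sortedMatches = branchList
    .Where(b => b.name.StartsWith("S"))
    .OrderBy(b => b.name)
    .ToList();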
that's my $0.02
In general, your simple one-line for/foreach loop will be faster than using Linq. Also, Linq doesn't [always] offer significant readability improvements in this case. Here is the general rule I code by:
If the algorithm is simple enough to write and maintain without Linq, and you don't need delayed evaluation, and Linq doesn't offer sufficient maintainability improvements, then don't use it. However, there are times where Linq immensely improves the readability and correctness of your code, as shown in two examples I posted here and here.
I'm not sure a singleton class is absolutely necessary. Do you really need global access at all times? Is the list large?
I assume you will have a refresh method on the singleton class for when the properties need to change and also that you have some way of notifying the singleton to update when the list changes.
Both solutions are viable. I think LINQ will populate the collection faster in the constructor (but not noticeably so). Traditional collection-based approaches are fine. Personally, I would choose the LINQ version, if only because it is new tech and I like to use it. Assuming your deployment environment has .NET 3.5...
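For instance, the constructor population can be a one-liner (a sketch reusing the question's service types, and assuming id and name are strings):

// build the id-to-name map in one statement
Dictionary<string, string> branchNames =
    soapService.getBranches().ToDictionary(b => b.id, b => b.name);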
Do you have a method on your webservice for getting branches by Id? That would be the third option if the branch info is needed infrequently.
Shortened and workified:
public string BranchName(string branchId)
{
var bs = branchList.FirstOrDefault(b => b.Id == branchId);
return bs == null ? null : bs.Name;
}
Related
How do I do this the proper way? If I run a LINQ query inside a using block, and I want to use its result inside a different using block with another LINQ query, how do I store it?
If I try to store it, I can't declare an uninitialized var, nor can I use a class constructor for it.
What is the proper way to do this so that I can use the result in different modules or different functions/methods without repeating the same call in every method?
Hope I made it clear what I want.
// 'var' can't be declared without an initializer, so a concrete type
// is needed here (assuming SaleItemID is an int).
List<int> globalstorage;

using (Si360DbContext _si360DbContext = new Si360DbContext())
{
    var selectedItems = _si360DbContext.SaleDumpDetails
        .Where(sd => sd.SaleID == saleid)
        .Select(i => i.SaleItemID)
        .ToList();

    // Store data..
    globalstorage = selectedItems;
}

// globalstorage can now be reused elsewhere, with the same data stored in it.
It depends... as always...
If you're working on a web project, some mechanism that allows you to cache would be appropriate. A MemoryCache, for example, could be a good and fast solution. However, if the amount of data grows, or you run multiple instances of the same website, you may need a distributed cache system like Redis.
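A minimal MemoryCache sketch (assuming the Si360DbContext and types from the question, that SaleItemID is an int, and an arbitrary key and expiry):

using System.Runtime.Caching; // reference System.Runtime.Caching.dll

public List<int> GetSaleItemIds(int saleId)
{
    string key = "saleItems-" + saleId;
    var cached = MemoryCache.Default.Get(key) as List<int>;
    if (cached != null)
        return cached; // served from the cache; no db hit

    using (var ctx = new Si360DbContext())
    {
        var items = ctx.SaleDumpDetails
            .Where(sd => sd.SaleID == saleId)
            .Select(i => i.SaleItemID)
            .ToList();

        // Keep it for 10 minutes; any request in that window reuses it.
        MemoryCache.Default.Set(key, items, DateTimeOffset.Now.AddMinutes(10));
        return items;
    }
}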
For different types of apps, console or maybe Windows desktop, you could just create a public static variable and you're good to go. So again, it totally depends on the needs of your solution.
I'm not good at databases and T-SQL queries, so I got a little stumped over how to do something similar in C# using LINQ.
Thing is, I have this structure that's pretty much the same as a relational database table, with which I have to do some kind of join selection.
In effect I get a list of composite-key addresses. These are actually classes that hold a few int values (perhaps byte or short, but that's not relevant). Now I have to search through my structure for matches from this list and call a method on each match.
This is probably a simple join (I can't remember which join does what), but I need some help, because I want this to be as cheap as I can easily get away with, so that I don't have to search through every row for every address.
public class TheLocationThing
{
int ColumnID;
int ColumnGroupId;
int RowID;
}
public class TheCellThing
{
TheLocationThing thing;
public void MethodINeedToCallIfInList()
{
//here something happens
}
}
public class TheRowThing
{
int RowId;
List<TheCellThing> CellsInThisRow;
}
public class TableThing
{
List<TheRowThing> RowsInThisTable;
}
So I have this table-like class that has rows, and the rows have cells. Notice the ColumnGroupId: it forms a composite key with ColumnId, so the same ColumnId can occur again, but only once per ColumnGroup.
The thing to keep in mind, though, is that in TheTable there will only ever be one ColumnGroupId, but the list given could have multiple, so we can filter those away.
public void DoThisThing()
{
List<TheLocationThing> TheAddressesINeedToFind = GetTheseAddresses(); //actually a TheLocationThing[], if that matters
var filterList = TheAddressesINeedToFind.Where(a => a.ColumnGroupId == this.CurrentActiveGroup);
//Here I have to do the join with this.TableInstance
}
Now, I should of course only loop through the addresses with a matching row id within each row, and so on.
Also, would managing this as IQueryable help me out here, especially with the initial filtering? Should I get it as an IQueryable?
I'm going to give a different example, because I'm not quite following yours, and use it to explain the basics of joining, hopefully hitting what you need to learn.
Let's imagine two slightly more meaningfully named classes than LocationThing etc. (which has me lost).
public class Language
{
    public string Code{get;set;}
    public string EnglishName{get;set;}
    public string NativeName{get;set;}
}
public class Document
{
public int ID{get; private set;}//no public set as it corresponds to an automatically-set column
public string LanguageCode{get;set;}
public string Title{get;set;}
public string Text{get;set;}
}
Now, let's also imagine we have methods GetLanguages() and GetDocuments() that return all languages and documents respectively. There's a few different ways that could work, and I'll get to that later.
An example of a join being useful, is if we e.g. wanted all the titles and all the English names of the languages they were in. For that in SQL we would use:
SELECT documents.title, languages.englishName
FROM languages JOIN documents
ON languages.code = documents.languageCode
Or leaving out table names where doing so doesn't make column-names ambiguous:
SELECT title, englishName
FROM languages JOIN documents
ON code = languageCode
Each of these will, for each row in documents, match it up with the corresponding row in languages, and return the title and English name of the combined row. (If there's a document with no matching language, it doesn't get returned; if there are two languages with the same code, which should be prevented by the db in this case, corresponding documents get mentioned once for each.)
The LINQ equivalent is:
from l in GetLanguages()
join d in GetDocuments()
on l.Code equals d.LanguageCode //note l must come before d
select new{d.Title, l.EnglishName}
This will similarly match each document with its corresponding language and return an IQueryable&lt;T&gt; or IEnumerable&lt;T&gt; (depending on the source enumerations/queryables) where T is an anonymous type with Title and EnglishName properties.
Now, as to the expense of this. This depends primarily on the nature of GetLanguages() and GetDocuments().
No matter what the source, this is inherently a matter of searching through every one of the results of those two methods; that's just the nature of the operation. However, the most efficient way of doing this still varies according to what we know about the source data. Let's consider a Linq2Objects form first. There are lots of ways this could be done, but let's imagine they're returning Lists that were pre-computed:
public List<Document> GetDocuments()
{
return _precomputedDocs;
}
public List<Language> GetLanguages()
{
return _precomputedLangs;
}
Let's pretend Linq's join doesn't exist for a moment, and imagine how we'd write something functionally equivalent to the code above. We might arrive at something like:
var langLookup = GetLanguages().ToLookup(l => l.Code);
foreach(var doc in GetDocuments())
foreach(var lang in langLookup[doc.LanguageCode])
yield return new{doc.Title, lang.EnglishName};
This is a reasonable general case. We can go one step further, and reduce storage, since we know that all we finally care about with each language is the English name:
var langLookup = GetLanguages().ToLookup(l => l.Code, l => l.EnglishName);
foreach(var doc in GetDocuments())
foreach(var englishName in langLookup[doc.LanguageCode])
yield return new{doc.Title, EnglishName = englishName};
That's about as much as we can do without special knowledge of the set of data.
If we did have special knowledge, we could go further. For example, if we knew there was only one language per code, then the following would be faster:
var langLookup = GetLanguages().ToDictionary(l => l.Code, l => l.EnglishName);
string englishName;
foreach(var doc in GetDocuments())
if(langLookup.TryGetValue(doc.LanguageCode, out englishName))
yield return new{doc.Title, EnglishName = englishName};
If we knew the two sources were both sorted by language code, we could go further still and spin through them both at the same time, yielding matches, and throwing away languages once we've dealt with them, as we're never going to need it again for the rest of the enumeration.
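Sketched out, that might look like this (assuming one language per code and both lists pre-sorted ascending by code):

// Merge-join over two pre-sorted sequences; each language is skipped
// for good once the documents have moved past its code.
static IEnumerable<KeyValuePair<string, string>> TitlesWithEnglishNames(
    List<Language> langs, List<Document> docs)
{
    int li = 0;
    foreach (var doc in docs)
    {
        // Discard languages whose code sorts before this document's code;
        // they can never match any later document either.
        while (li < langs.Count
               && string.CompareOrdinal(langs[li].Code, doc.LanguageCode) < 0)
            li++;

        if (li < langs.Count && langs[li].Code == doc.LanguageCode)
            yield return new KeyValuePair<string, string>(doc.Title, langs[li].EnglishName);
    }
}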
But, Linq does not have that special knowledge when just looking at two lists. For all it knows, every single language and every single document could have the same code. It really has to examine the lot to find out. For that, it's pretty efficient in how it does it (a bit better than my example above suggests, due to some optimisation).
Let's consider a Linq2SQL case, and note that Entity Framework and other ways of using Linq directly on databases would be comparable. Let's say all of this is happening in the context of a class that has a _ctx member that's a DataContext. Then our source methods could be:
public Table<Document> GetDocuments()
{
return _ctx.GetTable<Document>();
}
public Table<Language> GetLanguages()
{
return _ctx.GetTable<Language>();
}
Table<T> implements IQueryable<T> along with some other methods. Here, instead of joining things in memory, it'll execute the following (bar some aliases) SQL:
SELECT documents.title, languages.englishName
FROM languages JOIN documents
ON languages.code = documents.languageCode
Look familiar? It's the same SQL we mentioned at the beginning.
First great thing about this, is that it's not bringing back anything from the database that we won't use.
Second great thing, is that the database's query engine (what turns this into executable code that it then runs) does have knowledge of the nature of the data. If, for example, we've set up the Languages table to have a unique key or constraint on the code column, the engine knows there can't be two languages with the same code, so it can perform the equivalent of the optimisation we mentioned above where we used a Dictionary instead of an ILookup.
Third great thing, is that if we have indices on languages.code and documents.languageCode then the query engine will use these for even faster retrieval and matching, perhaps getting all it needs from the index without hitting the table, making a call as to which table to hit first to avoid testing irrelevant rows in the second, and so on.
Fourth great thing, is that RDBMSs have benefited from several decades of research into how to make this sort of retrieval as fast as possible, so we've stuff going on that I don't know about and don't need to know about to benefit from.
In all then, we want to run our queries against the datasource directly, not against sources in memory. There are exceptions, particularly some forms of grouping (hitting the DB directly with some group-by operations can mean hitting it repeatedly) and if we reuse the same results over and over in quick succession (in which case we're better off hitting it once for those results, and then storing them).
Each Car has a property:
string Code
and 10 others.
commonCodes is a list of strings (string[]), cars is a list of cars (Car[]), and filteredListOfCars is a List<Car>.
for (int index = 0; index < cars.Length; index++)
{
Car car = cars[index];
if (commonCodes.Contains(car.Code))
{
filteredListOfCars.Add(car);
}
}
Unfortunately, this piece of the method takes too long to execute.
I have about 50k records.
How can I lower the execution time?
The easiest optimization is to convert commonCodes from a string[] to a faster lookup structure such as a Dictionary&lt;string,object&gt; or a HashSet&lt;string&gt; if you are using .NET 3.5 or above. This will reduce the big-O complexity of the loop and, depending on the size of commonCodes, should make it execute faster.
Jared has correctly pointed out that you can optimize this with a HashSet, but I would also like to point out that the entire method is unnecessary, wasting memory for the output list and making the code less clear.
You could write the entire method as:
var commonCodesLookup = new HashSet<string>(commonCodes);
var filteredCars = cars.Where(c => commonCodesLookup.Contains(c.Code));
Execution of the filteredCars filtering operation will be deferred, so that if its consumer only wants the first 10 elements, e.g. by using filteredCars.Take(10), then it doesn't need to build the entire list (or any list at all).
To do what you want, I would use the Linq ToLookup method to create an ILookup instead of using a dictionary. ToLookup was made especially for this type of scenario. It is basically an indexed look up on groups. You want to group your cars by Code.
var carCodeLookup = cars.ToLookup(car => car.Code);
The creation of the carCodeLookup would be slow but then you can use it for fast lookup of cars based on Code. To get your list of cars that are in your list of common codes you can do a fast lookup.
var filteredCarsQuery = commonCodes.SelectMany(code => carCodeLookup[code]);
This assumes that your list of cars does not change very often and it is your commonCodes that are dynamic between queries.
You could use the LINQ Join method, like:
var filteredListOfCars = cars.Join(commonCodes, c => c.Code, cC => cC, (car, code) => car).ToArray();
Here's an alternative to the LINQ options (which are also good ideas): if you're trying to filter quickly, I would suggest taking advantage of built-in types. You could create a DataTable that has two fields, the index of the car in your array and the code (you can add the other 10 properties as well, if they matter). Then you can create a DataView around it and use its RowFilter property. It uses some really fast indexing internally (B-trees, I believe), so you probably won't be able to beat its performance manually unless you're an algorithms whiz. It depends what you're doing and how much performance matters.
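A rough sketch of that idea (the column names here are hypothetical, and it assumes the codes contain no quote characters; needs System.Data):

// Build a DataTable of array-index/code pairs, then filter with a DataView.
var table = new DataTable();
table.Columns.Add("CarIndex", typeof(int));
table.Columns.Add("Code", typeof(string));
for (int i = 0; i < cars.Length; i++)
    table.Rows.Add(i, cars[i].Code);

var view = new DataView(table);
// DataView filter expressions support IN lists:
view.RowFilter = "Code IN ('" + string.Join("','", commonCodes) + "')";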
It looks like what you're really checking is whether the "code" is common, not the car. You could consider a flyweight pattern, where cars share common instances of Code objects. The Code object can then have an IsCommon property and a Value property.
You can then do something to the effect of updating the shared Code objects whenever the commonCodes list changes.
Now, when you do your filtering, you only need to check each car's Code.IsCommon property.
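A hypothetical sketch (it assumes Car.Code is changed to reference a shared CarCode object rather than a string, and that sharedCodes holds one CarCode per distinct code):

// Shared, mutable code objects: every car with code "ABC" points at
// the same CarCode instance.
public class CarCode
{
    public string Value { get; private set; }
    public bool IsCommon { get; set; }
    public CarCode(string value) { Value = value; }
}

// When the commonCodes list changes, flip the flags once:
foreach (CarCode code in sharedCodes.Values) // sharedCodes: Dictionary<string, CarCode>
    code.IsCommon = commonCodes.Contains(code.Value);

// Filtering is then just a flag check per car:
var filtered = cars.Where(c => c.Code.IsCommon).ToList();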
OK, understand that I come from ColdFusion, so I tend to think of things in a CF sort of way; C# and CF are about as different as can be in general approach.
So the problem is: I want to pull a "table" (that's how I think of it) of data from a SQL database via LINQ, and then I want to do some computations on it in memory. This "table" contains 6 or 7 values of a couple of different types.
Right now, my solution is to do the LINQ query using a generic List of a custom type. My example is the RelevanceTable. I pull some data out and want to do some evaluation of it, starting with .Contains. It appears that .Contains wants to act on the whole object or nothing. So I can use it if I have a List<string>, but if I have a List<ReferenceTableEntry>, where ReferenceTableEntry is my custom type, I would need to implement IEquatable and tell the compiler what exactly "Equals" means.
While this doesn't seem unreasonable, it does seem like a long way to go for a simple problem, so I have a sneaking suspicion that my approach is flawed from the get-go.
If I want to use LINQ and .Contains, is implementing the interface the only way? It seems like there should just be a way to say which field to operate on. Is there another collection type besides List that has this ability? I have started using List a lot for this, and while I have looked and looked, I see some other, but not necessarily superior, approaches.
I'm not looking for some fine point of performance or compactness or readability, just wondering if I am using a Phillips-head screwdriver on a hex screw. If my approach is a "decent" one, but not the best, of course I'd like to know a better one, but just knowing that it's in the ballpark would give me a little "Yeah! I'm not stupid!" boost, and I would at least finish what I'm doing before switching to another method.
Hope I explained that well enough. Thanks for your help.
What exactly is it you want to do with the table? It isn't clear. However, the standard LINQ (-to-Objects) methods will be available on any typed collection (including List<T>), allowing any range of Where, First, Any, All, etc.
So: what is it you are trying to do? If you had the table, what value(s) would you want?
As a guess (based on the Contains stuff) - do you just want:
bool any = table.Any(x => x.Foo == foo); // or someObj.Foo
?
There are overloads for some of the methods in the List class that takes a delegate (optionally in the form of a lambda expression), that you can use to specify what field to look for.
For example, to look for the item where the Id property is 42:
ReferenceTableEntry found = theList.Find(r => r.Id == 42);
The found variable will have a reference to the first item that matches, or null if no item matched.
There are also some LINQ extensions that take a delegate or an expression. This will do the same as the Find method:
ReferenceTableEntry found = theList.FirstOrDefault(r => r.Id == 42);
OK, so if I'm reading this correctly, you want to use the Contains method. When using it with collections of objects (such as ReferenceTableEntry) you need to be careful, because what you're really asking is whether the collection contains an object that IS the same object as the one you're comparing against.
If you use the .Find() or .FindAll() method you can specify the criteria that you want to match on using an anonymous method.
So, for example, if you want to find all ReferenceTableEntry records in your list that have an Id greater than 1, you could do something like this:
List<ReferenceTableEntry> listToSearch = //populate list here
var matches = listToSearch.FindAll(x => x.Id > 1);
matches will be a list of ReferenceTableEntry records that have an Id greater than 1.
Having said all that, it's not completely clear that this is what you're trying to do.
Here is the LINQ query involved that creates the object I am talking about; the problem line is:
.Where (searchWord => queryTerms.Contains(searchWord.Word))
List<queryTerm> queryTerms = MakeQueryTermList();
public static List<RelevanceTableEntry> CreateRelevanceTable(List<queryTerm> queryTerms)
{
SearchDataContext myContext = new SearchDataContext();
var productRelevance = (from pwords in myContext.SearchWordOccuranceProducts
where (myContext.SearchUniqueWords
.Where (searchWord => queryTerms.Contains(searchWord.Word))
.Select (searchWord => searchWord.Id)).Contains(pwords.WordId)
orderby pwords.WordId
select new {pwords.WordId, pwords.Weight, pwords.Position, pwords.ProductId});
}
This query returns a list of WordIds that match the submitted search string (when it was a List<string> and it was just the word, that worked fine because, as an answerer mentioned before, they were the same type of objects). My custom type here is queryTerm, in a List that contains WordId, ProductId, Position, and Weight. From there I go about calculating the relevance by doing various operations on the created object: sum Weight by product, use position matches to bump up weights, etc. My point in keeping this separate was that the rules for those operations will change, but the basic factors involved will not. I would rather it were even MORE separate (I'm still learning; I don't want to get fancy), but the rules for local versus interpreted LINQ queries seem to trip me up when I do.
Since CF has supported queries of queries forever, that's how I tend to lean. Pull the data you need from the db, then do your operations (which includes queries with Aggregate functions) on the in-memory table.
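For example, the "sum Weight by product" step becomes an in-memory query over the pulled rows (a sketch; relevanceRows stands in for the materialized table):

// query-of-query in memory: aggregate Weight per product
var weightByProduct =
    from r in relevanceRows
    group r by r.ProductId into g
    select new { ProductId = g.Key, TotalWeight = g.Sum(x => x.Weight) };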
I hope that makes it more clear.
What would be the best approach to allow users to define WHERE-like constraints on objects which are defined like this:
Collection<object[]> data
Collection<string> columnNames
where object[] is a single row.
I was thinking about dynamically creating a strongly-typed wrapper and just using Dynamic LINQ, but maybe there is a simpler solution?
DataSets are not really an option, since the collections are rather large (40,000+ records) and I don't want to create a DataTable and populate it every time I run a query.
What kind of queries do you need to run? If it's just equality, that's relatively easy:
public static IEnumerable<object[]> WhereEqual(
this IEnumerable<object[]> source,
Collection<string> columnNames,
string column,
object value)
{
int columnIndex = columnNames.IndexOf(column);
if (columnIndex == -1)
{
throw new ArgumentException();
}
return source.Where(row => Object.Equals(row[columnIndex], value));
}
If you need something more complicated, please give us an example of what you'd like to be able to write.
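Hypothetical usage against the question's structures:

// rows where the "col1" column equals "abc"
var matches = data.WhereEqual(columnNames, "col1", "abc");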
If I get your point: you'd like to support users writing the where clause externally. I mean the users are real users, not developers, so you're seeking a solution for the bridge between a UI control and the coded where condition. I guessed this because you mentioned Dynamic LINQ.
So if I'm correct, what you really want to do is:
give the user the ability to use column names
give the user the ability to describe a bool function (which will serve as the where criteria)
compose the query dynamically and run it
For this task let me propose: Rules, from the System.Workflow.Activities.Rules namespace. There are several designers available for rules, not to mention the ones shipped with Visual Studio (for the web that's another question, but there are several for that too). I'd start with rules without workflow, then examine the examples on MSDN. It's a very flexible and customizable engine.
One other thing: LINQ is relevant to this problem because a function returning IQueryable can defer query execution; you can define a query in advance, and in another part of the code extend the returned queryable based on the user's condition (which can then be attached with extension methods).
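A sketch of that composition (Row and GetRows are hypothetical placeholders):

// define the base query somewhere central; nothing executes yet
IQueryable<Row> query = GetRows();

// elsewhere, extend it with the user's condition before enumeration
if (!string.IsNullOrEmpty(userValue))
    query = query.Where(r => r.Col1 == userValue);

// execution is deferred until the query is enumerated
var results = query.ToList();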
When just using object, LINQ isn't really going to help you very much... is it worth the pain? And Dynamic LINQ is certainly overkill. What is the expected way of using this? I can think of a few ways of adding basic Where operations.... but I'm not sure how helpful it would be.
How about embedding something like IronPython in your project? We use that to allow users to define their own expressions (filters and otherwise) inside a sandbox.
I'm thinking about something like this:
((col1 = "abc") or (col2 = "xyz")) and (col3 = "123")
Ultimately it would be nice to have support for the LIKE operator with the % wildcard.
Thank you all, guys - I've finally found it. It's called NQuery and it's available from CodePlex. Its documentation even includes an example with a binding to my very structure: a list of column names plus a list of object[]. Plus a fully functional SQL query engine.
Just perfect.