Where does the LINQ query syntax come from?

I'm new to C# and have just started to delve into using classes that essentially mirror a database. What I'm confused about is how I'm able to use lines like
var queryLondonCustomers = from cust in customers
where cust.City == "London"
select cust;
in my program. From what I understand, that syntax isn't "normal" C#, so the above line wouldn't have any meaning if I hadn't included System.Linq. What is happening is that we've added to the C# language in the context of this file.
Maybe I'm completely wrong. Could someone clear this up for me? I come from a C++ background, so maybe I'd understand if someone could show me the C++ equivalent of this concept.
And if I'm right, why is this way of doing things preferable to having a C# class that talks to the database by using strings that are database queries, like with PHP and MySQL? I thought this MVC way of talking to the database was supposed to provide an abstraction for me to use a C# class for database operations, but really this is just taking database language and adding it to the C# language in the context of a particular C# file. I can't see any point of that. I'm not trolling, just trying to understand the spirit of this whole ASP.NET MVC thing that is the most confusing thing I've learned so far.

From what I understand, that syntax isn't "normal" C#
Yes it is, as of C# 3.
so the above line wouldn't have any meaning if I hadn't included System.Linq
Yes it would. It would still have effectively been transformed by the compiler into:
var queryLondonCustomers = customers.Where(cust => cust.City == "London");
(The lack of a Select call is because you're selecting the range variable directly, rather than some projection of it.)
If that code would have compiled (e.g. because of a Where member in customers, or due to another extension method on its type) then so would the query expression.
Query expressions are specified in section 7.16 of the C# language specification.
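To make the translation concrete, here is a minimal sketch that runs the query expression and the method-call form side by side. The Customer class, the sample data and the QueryEquivalenceDemo name are assumptions made up for illustration; the question does not show them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Customer type in the question.
public class Customer
{
    public string Name { get; set; }
    public string City { get; set; }
}

public static class QueryEquivalenceDemo
{
    public static List<Customer> SampleCustomers()
    {
        return new List<Customer>
        {
            new Customer { Name = "Ann", City = "London" },
            new Customer { Name = "Bob", City = "Paris" },
            new Customer { Name = "Cat", City = "London" },
        };
    }

    // Query-expression form, as written in the question.
    public static List<string> WithQuerySyntax(IEnumerable<Customer> customers)
    {
        var query = from cust in customers
                    where cust.City == "London"
                    select cust;
        return query.Select(c => c.Name).ToList();
    }

    // Method-call form that the compiler translates the query into.
    public static List<string> WithMethodSyntax(IEnumerable<Customer> customers)
    {
        var query = customers.Where(cust => cust.City == "London");
        return query.Select(c => c.Name).ToList();
    }

    public static void Main()
    {
        var customers = SampleCustomers();
        Console.WriteLine(string.Join(", ", WithQuerySyntax(customers)));  // Ann, Cat
        Console.WriteLine(string.Join(", ", WithMethodSyntax(customers))); // Ann, Cat
    }
}
```

Both methods bind to the same Enumerable.Where extension method here, because customers is an IEnumerable<Customer> and System.Linq is imported.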
As for the question of why you'd want to do this, well:
Using an ORM instead of just manual SQL is hardly new - but LINQ integrates it into the language, with a somewhat leaky abstraction
LINQ doesn't just work for databases; I primarily use it in "regular" collections such as lists etc.

in my program. From what I understand, that syntax isn't "normal" C#, so the above line wouldn't have any meaning if I hadn't included System.Linq
Yes and no at the same time :-)
The LINQ syntax is standard C# syntax (from C# 3), but it is resolved at compile time as a semi-textual substitution...
Your code is changed to:
var queryLondonCustomers = customers.Where(cust => cust.City == "London");
and then the various .Where and .Select methods are resolved (it is called Duck Typing... see for example Does LINQ "Query Syntax" Support Duck Typing?)
So at this point you need the using System.Linq directive, which gives you access to System.Linq.Enumerable and System.Linq.Queryable, the two classes that implement the various .Where and .Select methods as extension methods.
Note that you could create and implement a static class of yours, public static class MyLinqMethods, and by creating methods with the "right" signature in that class, you could use the LINQ syntax against your MyLinqMethods class.
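As a rough sketch of that idea, the following compiles a query expression without importing System.Linq at all. The Bag<T> container and the CustomQueryDemo class are hypothetical names invented for this example; only MyLinqMethods comes from the text above.

```csharp
using System;
using System.Collections.Generic;
// Note: no "using System.Linq;" here.

// Hypothetical container that does not implement IEnumerable<T>,
// so the standard LINQ extension methods would not apply to it anyway.
public class Bag<T>
{
    public List<T> Items { get; } = new List<T>();
}

public static class MyLinqMethods
{
    // Signature the compiler looks for when it sees a "where" clause.
    public static Bag<T> Where<T>(this Bag<T> source, Func<T, bool> predicate)
    {
        var result = new Bag<T>();
        foreach (var item in source.Items)
            if (predicate(item)) result.Items.Add(item);
        return result;
    }

    // Signature the compiler looks for when "select" projects to a new value.
    public static Bag<TResult> Select<T, TResult>(this Bag<T> source, Func<T, TResult> selector)
    {
        var result = new Bag<TResult>();
        foreach (var item in source.Items)
            result.Items.Add(selector(item));
        return result;
    }
}

public static class CustomQueryDemo
{
    public static Bag<string> LongWordsUpper(Bag<string> words)
    {
        // Translated to: words.Where(w => w.Length > 3).Select(w => w.ToUpperInvariant())
        return from w in words
               where w.Length > 3
               select w.ToUpperInvariant();
    }

    public static void Main()
    {
        var words = new Bag<string>();
        words.Items.AddRange(new[] { "linq", "is", "duck", "typed" });
        Console.WriteLine(string.Join(" ", LongWordsUpper(words).Items)); // LINQ DUCK TYPED
    }
}
```

The compiler only requires that calls named Where and Select with compatible signatures can be resolved; it does not care which class supplies them.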
And if I'm right, why is this way of doing things preferable to having a C# class that talks to the database by using strings that are database queries
There is some safety in using LINQ... If you created somewhere some classes that mapped the database tables, then the C# compiler could check against these classes that you are using the right names for the fields. If you wrote
var queryLondonCustomers = from cust in customers
where cust.CityERROR == "London"
select cust;
the compiler would give you an error, because CityERROR isn't a field/property of Customer. Clearly you could have an error in the "mapping" files, but at least you have a single place that can have these errors.

From what I understand, that syntax isn't "normal" C#,
Yes it is.
so the above line wouldn't have any meaning if I hadn't included System.Linq
It would. It would always mean the same thing as:
var queryLondonCustomers = customers.Where(cust => cust.City == "London");
C# doesn't care how customers.Where(Func<Customer, bool>) is defined, just as long as it is. System.Linq has extension methods that define Where for IEnumerable and IQueryable which covers 99.9% of the time that you want this, but it doesn't have to come from there.
In particular if customers was an instance of a class that had its own Where(Func<Customer, bool>) method then it would be the overload used (instance methods always beat extension methods in overload resolution). Likewise if another static class defined an extension method for ... Where(this CustomerCollection, Func<Customer, bool>) or similar it would be called.
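A small sketch of that precedence rule, assuming a made-up CountingList class that both implements IEnumerable<int> (so Enumerable.Where is a candidate) and declares its own Where:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical collection: has an instance Where and is also an IEnumerable<int>.
public class CountingList : IEnumerable<int>
{
    private readonly List<int> _items = new List<int>();

    // Counts how often the instance method was chosen.
    public int InstanceWhereCalls { get; private set; }

    public void Add(int item) { _items.Add(item); }

    public CountingList Where(Func<int, bool> predicate)
    {
        InstanceWhereCalls++;
        var result = new CountingList();
        foreach (var item in _items)
            if (predicate(item)) result.Add(item);
        return result;
    }

    public IEnumerator<int> GetEnumerator() { return _items.GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class InstanceWinsDemo
{
    public static CountingList Evens(CountingList numbers)
    {
        // The declared type proves at compile time that the instance Where was bound:
        // Enumerable.Where would return IEnumerable<int>, not CountingList.
        CountingList evens = from n in numbers
                             where n % 2 == 0
                             select n;
        return evens;
    }

    public static void Main()
    {
        var numbers = new CountingList { 1, 2, 3, 4 };
        var evens = Evens(numbers);
        Console.WriteLine(evens.Count());             // 2
        Console.WriteLine(numbers.InstanceWhereCalls); // 1
    }
}
```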
And if I'm right, why is this way of doing things preferable to having a C# class that talks to the database by using strings that are database queries
Querying collection-like objects is a very common use case, of which database access is only one. Providing a common interface to a common use case is a classic reason for any interface-based programming.

ASP.NET MVC is just a way of creating web applications, much as you would use Windows Forms or WPF projects to create desktop applications. It doesn't have any special capabilities regarding database interaction.
LINQ, on the other hand, is something quite unique. It provides a convenient way of working with collections. The data in these collections can come from databases, but it doesn't have to. How you write your queries depends on your preference. I like the lambda syntax; it is short and easy to read.
The advantage of LINQ is that you CAN use its syntax to interact with the database, but for that you'll need to use APIs that are designed to do this, such as Entity Framework. This way, you can tell Entity Framework to do certain things with your LINQ commands, such as retrieving records with a certain where clause.

Related

C++ CLI equivalent of Intersect and Except [duplicate]

Just wondering if there is a way to use LINQ in C++/CLI. I found one post that was focused on VS 2008 and required a bunch of workarounds for the System::String class. I have seen some framework replacements on CodeProject, but I was wondering if there is a way to use it directly in C++/CLI. If you can, anyone have a good example?

You can use the Linq methods that are defined in the System::Linq namespace, but you'll have to jump through a couple extra hoops.
First, C++/CLI doesn't support extension methods. However, the extension methods are regular methods defined on various classes in System::Linq, so you can call them directly.
List<int>^ list = gcnew List<int>();
int i = Enumerable::FirstOrDefault(list);
Second, C++/CLI doesn't support lambda expressions. The only workaround is to declare an actual method, and pass that as a delegate.
ref class Foo
{
public:
static bool GreaterThanZero(int i) { return i > 0; }
void Bar()
{
List<int>^ list = gcnew List<int>();
int i = Enumerable::FirstOrDefault(list, gcnew Func<int, bool>(&Foo::GreaterThanZero));
}
};

Are you talking about "Language Integrated Query" or the System::Linq namespace? Every programmer I know prefers the function call syntax instead of LINQ syntax.
C++/CLI does not support LINQ syntax. Databases have supported a form of language integrated query in the past, called Embedded SQL, which is pretty much dead these days. Embedded SQL (and later LINQ-to-SQL) was a dumb idea to begin with, people have since figured out that database query logic should be in the database and not mixed into the business logic.
LINQ-to-objects is a more useful idea, but SQL syntax just feels out of place. So C# programmers tend to call the LINQ library functions directly.
C++ doesn't really need LINQ, because we have templates. The standard library algorithms made possible by templates are a superset of the advantages of LINQ: They can be specialized for particular containers, but you get a good default implementation without any help from the container class. And they compile to much more efficient code, because overload resolution happens after specialization (unlike generics). Ok, templates aren't as good for runtime reflection as generics, but C# extension methods don't play well with runtime reflection either. The biggest drawback of the C++ standard algorithms has been the verbosity of writing predicate functors, but C++0x introduces lambdas which take care of that.
Really what C++/CLI needs is a version of the standard algorithms that works on .NET containers. And here it is. For example, LINQ's Where method corresponds pretty closely to find_if. Now we just need Microsoft to hurry up and implement the final C++0x spec.

Porting a very Pythonesque library over to .NET

I'm investigating the possibility of porting the Python library Beautiful Soup over to .NET, mainly because I really love the parser and there are simply no good HTML parsers on the .NET framework (Html Agility Pack is outdated, buggy, undocumented and doesn't work well unless the exact schema is known).
One of my primary goals is to get the basic DOM selection functionality to really parallel the beauty and simplicity of BeautifulSoup, allowing developers to easily craft expressions to find elements they're looking for.
BeautifulSoup takes advantage of loose-binding and named parameters to make this happen. For example, to find all a tags with an id of test and a title that contains the word foo, I could do:
soup.find_all('a', id='test', title=re.compile('foo'))
However, C# doesn't have a concept of an arbitrary number of named elements. The .NET4 Runtime has named parameters, however they have to match an existing method prototype.
My Question: What is the C# design pattern that most parallels this Pythonic construct?
Some Ideas:
I'd like to go after this based on how I, as a developer, would like to code. Implementing this is out of the scope of this post. One idea I had would be to use anonymous types. Something like:
soup.FindAll("a", new { Id = "Test", Title = new Regex("foo") });
Though this syntax loosely matches the Python implementation, it still has some disadvantages.
The FindAll implementation would have to use reflection to parse the anonymous type, and handle any arbitrary metadata in a reasonable manner.
The FindAll prototype would need to take an Object, which makes it fairly unclear how to use the method unless you're well familiar with the documented behavior. I don't believe there's a way to declare a method that must take an anonymous type.
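To give a feel for what the reflection step might involve, here is a minimal sketch that matches an element's attributes against an anonymous-type filter. AttributeFilter, Matches and the lower-case attribute dictionary are all hypothetical; this is not an existing library API, and a real FindAll would need far more care.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Text.RegularExpressions;

public static class AttributeFilter
{
    // Returns true when every public property of 'filter' matches the attribute
    // of the same name. A Regex value is matched as a pattern; anything else is
    // compared for equality.
    public static bool Matches(IDictionary<string, string> attributes, object filter)
    {
        foreach (PropertyInfo prop in filter.GetType().GetProperties())
        {
            // Assumption: attribute names are stored in lower case, so "Id" looks up "id".
            string actual;
            if (!attributes.TryGetValue(prop.Name.ToLowerInvariant(), out actual))
                return false;

            object expected = prop.GetValue(filter, null);
            var pattern = expected as Regex;
            bool ok = pattern != null
                ? pattern.IsMatch(actual)
                : object.Equals(actual, expected);
            if (!ok) return false;
        }
        return true;
    }
}

public static class AttributeFilterDemo
{
    public static void Main()
    {
        var attributes = new Dictionary<string, string>
        {
            { "id", "Test" },
            { "title", "some foo text" },
        };
        bool hit = AttributeFilter.Matches(attributes, new { Id = "Test", Title = new Regex("foo") });
        Console.WriteLine(hit); // True
    }
}
```

Because the parameter is typed as object, nothing stops a caller from passing a value that makes no sense as a filter, which is exactly the second disadvantage listed above.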
Another idea I had is perhaps a more .NET way of handling this but strays further away from the library's Python roots. That would be to use a fluent pattern. Something like:
soup.FindAll("a")
.Attr("id", "Test")
.Attr("title", new Regex("foo"));
This would require building an expression tree and locating the appropriate nodes in the DOM.
The third and last idea I have would be to use LINQ. Something like:
var nodes = (from n in soup
where n.Tag == "a" &&
n["id"] == "Test" &&
Regex.Match(n["title"], "foo").Success
select n);
I'd appreciate any insight from anyone with experience porting Python code to C#, or just overall recommendations on the best way to handle this situation.

Have you tried running your code inside the IronPython engine? As far as I know it performs really well, and you don't have to touch your Python code.

DSL: from DSL rules into C# expressions

The question is perhaps a composite one, so let me expand it:
Does a designer (stub/framework/meta-designer) exist for creating AND/OR-based rules from the public bool properties of .NET objects, saved as any DSL/Boo/... output?
Is it possible to compile the DSL output into C# expressions?
Our main problem is the gap between the documentation and the code. Our product is based on hundreds of user-defined rules, and we want to speed up change requests.
If we can give the users a simple designer and grab its output, then after translating/compiling it into C#/IL code we have a fast change request cycle.
I know that our problem is too specific, but any "bricks in the wall" are welcome!
Example:
A C# class, subject of :
public class TestA
{
public bool B {...}
public bool C {...}
}
In the designer, we should be able to create
any type of graphical designer (e.g. a dropdown to select public properties)
Output in DSL:
If TestA.B AND TestA.C Then Return True;
Output in C#:
if (testA.B && testA.C) { return true; }
Update #1
I would be happy with a DSL that supports statically typed .NET classes. I mean, if the user can check the code ("Output in DSL" in the example), we don't need the designer.
Update #2
Based on the tip, I started with expression trees. After a few days I ran into DLinq. I was never a big fan of DLinq, but in this case it fits the problem domain very well.
Easy to parse (A > 2 AND B < 4) OR C = 5 into expression trees
Easy to create expressions like that
Very easy to serialize/deserialize
GUI based on FlowLayoutPanel works fine as "expression builder"

You could build something like this yourself.
You can get a list of all the public properties for a class using Type.GetMembers()
However, instead of generating C# code, I would use expression trees.
That way you don't need to involve the C# compiler when the users change rules. Instead, you can store the rules in a database, load them at runtime, and then use the Expression.Compile() method to create a delegate you can invoke to run the code.
Update:
In the comments someone asked "What is the difference between expression trees and domain specific languages?"
Here's the answer:
Expression trees and domain specific languages are orthogonal things.
Expression trees are just an API for representing C# expressions, which can conveniently be converted into a delegate dynamically at runtime.
A DSL, or domain specific language, is a programming language designed to solve a narrow class of problems.
They are, essentially, completely different things.
You can use expression trees as part of a DSL implementation if you like. LINQ uses them for that purpose.
In your case, however, you don't need a DSL. What you need is a user interface that generates rules (similar to the way Outlook works), and then a way of executing those rules.
Creating the UI is just normal UI development.
Expression trees are what you can use to implement the rules.
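As an illustration of the expression-tree approach, here is a minimal sketch that turns a list of property names (which could have been loaded from a database) into a compiled delegate equivalent to testA.B && testA.C. The RuleBuilder class and its AllOf method are hypothetical names; TestA mirrors the class in the question.

```csharp
using System;
using System.Linq.Expressions;

public class TestA
{
    public bool B { get; set; }
    public bool C { get; set; }
}

public static class RuleBuilder
{
    // Builds and compiles: subject => true && subject.<name1> && subject.<name2> ...
    // Each name must be a public bool property on T.
    public static Func<T, bool> AllOf<T>(params string[] propertyNames)
    {
        ParameterExpression subject = Expression.Parameter(typeof(T), "subject");
        Expression body = Expression.Constant(true);
        foreach (string name in propertyNames)
            body = Expression.AndAlso(body, Expression.Property(subject, name));

        return Expression.Lambda<Func<T, bool>>(body, subject).Compile();
    }
}

public static class RuleDemo
{
    public static void Main()
    {
        // The property names stand in for a rule stored outside the code.
        Func<TestA, bool> rule = RuleBuilder.AllOf<TestA>("B", "C");

        Console.WriteLine(rule(new TestA { B = true, C = true }));  // True
        Console.WriteLine(rule(new TestA { B = true, C = false })); // False
    }
}
```

An OR rule would use Expression.OrElse in the same way, and the compiled delegate can be cached so that the cost of Compile is paid once per rule.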

It's a little-known fact that the designer for Windows Workflow Foundation and its Rules Engine in particular can be hosted in a Windows Forms application separate from Visual Studio. The rules authored in this way can similarly be evaluated independent of an actual workflow.
See WF Scenarios Guidance: Workflow Designer Re-Hosting and Tutorial: Hosting the WF Designer.

How do you write a search function that is easy to comprehend (/maintainable)? Maybe in a modular manner?

Every time I see a search function, the code behind it is a mess. Several hundreds of lines, spaghetti code, and almost ALWAYS as one huge method. A programming language (Java/C#/PHP/etc) is used to construct one big fat SQL query. Many, many if else's.
There must be more elegant ways to do this. Or is this what you get when you use an RDBMS instead of a flat data structure?
I'd be willing to learn more about this topic, perhaps even buy a book. /Adam

Use the query object pattern. If you can, also use an ORM, it will make things easier.
The implementation details depend on your platform and architecture, but here are some samples:
http://www.theserverside.com/patterns/thread.tss?thread_id=29319
http://www.lostechies.com/blogs/chad_myers/archive/2008/08/01/query-objects-with-the-repository-pattern.aspx

In my current project we use a simplified version of the query object pattern that mausch mentions. In our case we have a search criteria object that consists of a field and a value, and several such objects can be added to a list. We did have an operator property from the start, but it was never used so we removed it. Whether the criteria are treated as AND or OR depends on the search method that is used (I would say that it is AND in 95% of the cases in that project).
The find methods themselves do not do a lot with this information; they will invoke stored procs in the DB, passing the criteria as parameters. Most of those procs are fairly straightforward, even though we have a couple that do involve some string handling to unpack lists of criteria for certain fields.
The code from a caller's perspective might look something like this (the Controller classes wrap repetitive work such as instantiating a search object with a configurable implementation*, populating it with search criteria and so on):
CustomerCollection customers = CustomerController.Find(new SearchCriterion("Name", "<the customer name>"));
If more than one search criterion is needed, a collection can be passed instead. Inside the finder function the code will loop over the collection and map the values present to the appropriate parameters in an SqlCommand object.
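A rough sketch of that mapping loop follows. It is not the project's actual code: the SearchCriterion shape, the CriteriaMapper class, the whitelist of allowed fields and the "@Field" naming convention are all assumptions, and the result is a plain dictionary rather than a real SqlCommand so that the example stays self-contained.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical criterion: just a field name and a value, as described above.
public class SearchCriterion
{
    public SearchCriterion(string field, object value)
    {
        Field = field;
        Value = value;
    }

    public string Field { get; }
    public object Value { get; }
}

public static class CriteriaMapper
{
    // Maps each criterion to a stored-procedure parameter named "@<Field>".
    // Fields outside the whitelist are rejected instead of being passed through.
    public static Dictionary<string, object> ToParameters(
        IEnumerable<SearchCriterion> criteria, ISet<string> allowedFields)
    {
        var parameters = new Dictionary<string, object>();
        foreach (SearchCriterion criterion in criteria)
        {
            if (!allowedFields.Contains(criterion.Field))
                throw new ArgumentException("Unknown search field: " + criterion.Field);
            parameters["@" + criterion.Field] = criterion.Value;
        }
        return parameters;
    }
}

public static class CriteriaDemo
{
    public static void Main()
    {
        var allowed = new HashSet<string> { "Name", "City" };
        var criteria = new List<SearchCriterion>
        {
            new SearchCriterion("Name", "Smith"),
            new SearchCriterion("City", "London"),
        };

        foreach (var pair in CriteriaMapper.ToParameters(criteria, allowed))
            Console.WriteLine(pair.Key + " = " + pair.Value);
    }
}
```

In a real finder, each dictionary entry would be copied into the command's parameter collection before the stored procedure is executed.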
This approach has worked out rather well for us.
*) The "configurable implementation" means that we have created an architecture where the search objects are defined as abstract classes that merely define the interface and contain some generic pre- and post-validation. The actual search code is implemented in separate descendant classes, which among other things allowed us to quickly create a "fake data layer" that could be used for mocking away the database in some of the unit tests.

Have you looked at the Lucene project (http://lucene.apache.org)? It's designed exactly for this purpose. The idea is that you build and then maintain a set of indexes that are then easily searchable. The lifecycle works like this:
Write a bunch of sql statements that index all of the searchable areas of your database
Run them against the full database to create an initial index of your data
Every time the data changes, update these indexes.
The query language is then much simpler, and your queries become much more targeted.
There is a great project in the Hibernate tool suite called Hibernate Search (http://search.hibernate.org) that does the maintenance of your indexes for you if you are using Hibernate as your ORM.

I've been tinkering with this thought a bit (since I actually had to implement something like this some time ago) and I've come to the conclusion that there's two ways I'd do it to make it both work and especially maintainable. But before going into those, here's some history first.
1. Why does the problem even exist
Most search functions are based on algorithms and technologies derived from the ones in databases. SQL was originally developed in the early 1970s (Wikipedia says 1974), and back then programming was a whole other kind of beast than it is today, because every byte counted, every extra function call could make the difference between excellent performance and bankruptcy, code was written by people who thought in Assembly... well, you get the point.
The problem is that those technologies have mostly been carried over to the modern world unchanged (and why should they be changed, don't fix something which isn't broken), which means the old paradigms creep in too. And then there are cases when the original algorithm is misinterpreted for some reason and you end up with what you now have, like slow regular expressions. A bit of underlining is required here though: the technologies themselves aren't bad, it's usually just the legacy paradigms which are!
2. Solutions to the problem
The solution I ended up using was a system which was a mix of builder pattern and query object pattern (linked by mausch already). As an example if I were to make a pragmatic system to build SQL queries, it would look something like this:
SQL.select("column1", "column2")
.from("relation")
.where().valueEquals("column1", "hello")
.and().valueIsLargerThan("column2", 3)
.toSQL();
The obvious downside of this is that the builder pattern tends to be a bit too verbose. The upsides are that each of the build steps (= methods) is quite small by nature; for example, .valueIsLargerThan("a", x) may merely be return columnName + ">" + x;. This means they're easily unit-testable, and one of the biggest upsides is that they can easily be generated from external sources like XML/whatnot. Most notably, it's rather easy to create a converter from, say, an SQL query to a Lucene query (Lucene has automation for this already afaik, this is just an example).
The second approach is one I'd rather use, but I avoid it because it's not order-safe (unless you spend a good amount of time creating metadata helper classes), while builders are. It's easier to write an example than to go into more detail about what I mean, so:
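For comparison, here is a minimal C# sketch of a builder in the same spirit. It is not the answer's implementation: the SqlBuilder class and its method names are hypothetical, the where()/and() steps are folded into the condition methods, and values are emitted as numbered parameters (an assumption made here to avoid concatenating user input into the SQL text).

```csharp
using System;
using System.Collections.Generic;

public class SqlBuilder
{
    private readonly List<string> _columns = new List<string>();
    private readonly List<string> _conditions = new List<string>();
    private string _table = "";

    // Values collected for the @p0, @p1, ... placeholders, in order.
    public List<object> Parameters { get; } = new List<object>();

    public static SqlBuilder Select(params string[] columns)
    {
        var builder = new SqlBuilder();
        builder._columns.AddRange(columns);
        return builder;
    }

    public SqlBuilder From(string table)
    {
        _table = table;
        return this;
    }

    public SqlBuilder ValueEquals(string column, object value)
    {
        return AddCondition(column, "=", value);
    }

    public SqlBuilder ValueIsLargerThan(string column, object value)
    {
        return AddCondition(column, ">", value);
    }

    // Each step is tiny, which is what makes it easy to unit test.
    // Column and table names are assumed to come from trusted code, not user input.
    private SqlBuilder AddCondition(string column, string op, object value)
    {
        _conditions.Add(column + " " + op + " @p" + Parameters.Count);
        Parameters.Add(value);
        return this;
    }

    public string ToSql()
    {
        string sql = "SELECT " + string.Join(", ", _columns) + " FROM " + _table;
        if (_conditions.Count > 0)
            sql += " WHERE " + string.Join(" AND ", _conditions);
        return sql;
    }
}

public static class SqlBuilderDemo
{
    public static void Main()
    {
        SqlBuilder query = SqlBuilder.Select("column1", "column2")
            .From("relation")
            .ValueEquals("column1", "hello")
            .ValueIsLargerThan("column2", 3);

        // SELECT column1, column2 FROM relation WHERE column1 = @p0 AND column2 > @p1
        Console.WriteLine(query.ToSql());
    }
}
```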
import static com.org.whatever.SQL.*;
query(select("column1", "column2"),
from("relation"),
where(valueEquals("column1", "hello"),
valueIsLargerThan("column2", 3)));
I do count static imports as a downside but other than that, that looks like something I'd really want to use.

What is the design motive behind extension methods in C#

I was wondering what is the design motive behind extension methods in C#

It allows you to create new functionality to an existing code base without editing the original code.
http://weblogs.asp.net/scottgu/archive/2007/03/13/new-orcas-language-feature-extension-methods.aspx
"Extension methods allow developers to add new methods to the public contract of an existing CLR type, without having to sub-class it or recompile the original type. Extension Methods help blend the flexibility of "duck typing" support popular within dynamic languages today with the performance and compile-time validation of strongly-typed languages.
Extension Methods enable a variety of useful scenarios, and help make possible the really powerful LINQ query framework that is being introduced with .NET as part of the "Orcas" release."

The primary reason for their existence is being able to somehow add features to a type without inheriting from it.
This was required to provide Where, Select, ... methods for use in LINQ for collections that didn't have one.

It provides multiple inheritance via the back door.
In languages such as C++, which support inheriting from many classes, extension methods aren't required. However, multiple inheritance has lots of problems with it, and so modern languages have dropped it. Extension methods, though, are a use case where multiple inheritance is useful. Rather than re-introduce multiple inheritance, the C# designers created extension methods (which is a very neat solution to the problem).

Quite often we end up writing for ourselves useful little static utility classes that perform a common function on a type. I know there have been a number of times I wished I could simply inherit from a class to add a feature, but that class was sealed. I'm glad, though, that they were sealed; unwarranted inheriting is a bad thing.
Extension methods make code look more intuitive by allowing those static methods to appear to be new instance methods of the type.
They also have the advantage of not polluting the actual member namespace of the type and allowing you to opt into them by means of the using keyword.
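A small sketch of both points, using a made-up WordCount helper on string (which is sealed). The TextUtilities namespace and the class names are hypothetical.

```csharp
using System;

namespace TextUtilities
{
    public static class StringExtensions
    {
        // The "this" modifier lets the static method be called like an instance method.
        public static int WordCount(this string text)
        {
            return text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Length;
        }
    }
}

namespace Demo
{
    using TextUtilities; // Opt in: without this directive, s.WordCount() would not compile.

    public static class ExtensionDemo
    {
        public static void Main()
        {
            string s = "extension methods are static";

            Console.WriteLine(s.WordCount());                 // 4
            Console.WriteLine(StringExtensions.WordCount(s)); // 4, the same call written out
        }
    }
}
```

The two calls compile to the same thing; the extension syntax only changes how the call reads at the call site.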

The existence of Extension Methods is very likely due to the need for Microsoft to add functionality to IEnumerable without changing the interface. If they added methods to the interface, then every existing implementation of IEnumerable (including Microsoft's) would no longer compile.
The alternative to changing IEnumerable is to create a utility class (called Enumerable) which has methods that perform transformations on instances of IEnumerable. This works, except the user experience is not the same as calling a method from an existing IEnumerable instance. For instance, compare the following equivalent statements.
IEnumerable<string> strings = myIntList.Select(num => num.ToString())
.Where(num => num.StartsWith("T"));
IEnumerable<string> strings =
Enumerable.Where(
Enumerable.Select(myIntList, num => num.ToString()),
num => num.StartsWith("T"));
To achieve the best of both worlds, the C# compiler team added support for extension methods, allowing the creation of the Enumerable class with special syntax and maintaining the same user experience of having added the methods to IEnumerable to begin with.
