Linq to Entity comparing strings ignores white spaces - c#

When using LINQ to Entities, string comparisons ignore trailing white space.
In my table I have an nchar(10) column, so any value shorter than 10 characters is padded with trailing spaces. Below I am comparing the "ncharText" column with the string "Four". Even though the stored value is "Four" followed by six spaces, the comparison matches and the "result" variable contains 1 record.
TestEntities1 entity = new TestEntities1();
var result = entity.Table_1.Where(e => e.ncharText == "Four");
Is there an explanation for this and a way to work around it, or do I have to call ToList on my query before any comparison, like so?
var newList = result.ToList().Where(e => e.ncharText == "Four");
This code now correctly returns 0 records because it takes the white space into account. However, calling ToList before the comparison can load a large collection into memory, most of which will never be used.

This answer explains why.
SQL Server follows the ANSI/ISO SQL-92 specification (Section 8.2, <Comparison Predicate>,
General rules #3) on how to compare strings with spaces. The ANSI
standard requires padding for the character strings used in
comparisons so that their lengths match before comparing them. The
padding directly affects the semantics of WHERE and HAVING clause
predicates and other Transact-SQL string comparisons. For example,
Transact-SQL considers the strings 'abc' and 'abc ' to be equivalent
for most comparison operations.
The only exception to this rule is the LIKE predicate. When the right
side of a LIKE predicate expression features a value with a trailing
space, SQL Server does not pad the two values to the same length
before the comparison occurs. Because the purpose of the LIKE
predicate, by definition, is to facilitate pattern searches rather
than simple string equality tests, this does not violate the section
of the ANSI SQL-92 specification mentioned earlier.
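Internally, LINQ to Entities just translates your query into SQL and runs it against your database, so the == comparison is evaluated by SQL Server under the rules above. After ToList, the comparison runs in .NET, where trailing spaces are significant.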
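To see the padding rule in isolation, here is a minimal C# sketch that imitates it in memory. The names PaddedComparison and SqlStyleEquals are made up for illustration, and the ordinal comparison is a simplification: a real SQL Server comparison also depends on the column's collation (case sensitivity, for instance).

```csharp
using System;

public static class PaddedComparison
{
    // Imitates the quoted rule: the shorter operand is padded with
    // trailing spaces to the length of the longer one before comparing.
    // Hypothetical helper, not part of Entity Framework or SQL Server.
    public static bool SqlStyleEquals(string left, string right)
    {
        int width = Math.Max(left.Length, right.Length);
        return string.Equals(
            left.PadRight(width),
            right.PadRight(width),
            StringComparison.Ordinal);
    }
}
```

With this helper, SqlStyleEquals("Four", "Four      ") is true, which mirrors what the database query reports, while the plain C# == on the same two strings is false.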

Related

Is there a syntactically legal expression that has 2 consecutive identifiers separated only by white space in C#?

That might not be the best way to phrase it, but I'm considering writing a tool that converts identifiers separated by spaces in my code to camel case. A quick example:
var zoo animals = GetZooAnimals(); // i can't help but type this
var zooAnimals = GetZooAnimals(); // i want it to rewrite it like this
I was wondering if writing a tool like this would run into any ambiguities assuming it ignores all keywords. The only reason I can think of is if there is a syntactically valid expression with 2 identifiers only separated by white space.
Looking through the grammar I could not immediately find a place that allows it, but perhaps someone else would know better.
On a side note, I realize this is not a practical solution to a real problem a lot of people have, but just something I do all the time and wanted to take a stab at fixing with tools instead of forcing myself to always write camel case.
It is hard to tell whether a space-separated sequence of identifiers represents a single variable without doing full semantic analysis. For example,
Myclass myVariable;
is a pair of space-separated identifiers that is perfectly valid. This would cause an ambiguity if you want to camel-case both type names and variable names.
If one enters:
csharp> var i j = 3;
(1,7): error CS1525: Unexpected symbol `j', expecting `,', `;', or `='
in the csharp interactive shell, one gets an error generated by the parser (an (LA)LR parser keeps track of which tokens it can expect next). Such a parser works left to right, so it does not know which characters will come next; it only knows that the next token must be one of those listed above.
So there is probably no way to write an identifier with spaces, at least not when declaring a variable.
Furthermore, based on this context-free grammar for C#, there does not seem to be a case where two identifiers can be placed next to each other. For instance, a primary expression can be an identifier, but there is no situation where a primary expression is placed directly next to an identifier (or with only an optional part in between).
As @dasblinkenlight says, you can indeed see the rule "local-variable-declaration":
type variable-declarator
where type can reduce to an identifier and variable-declarator starts with an identifier. You do know, however, that the first identifier is the type (or the var keyword). A possible rewrite rule is thus:
(\w+)(\s+\w+)+ -> \1 concat(\2)
where you need to combine (concat) the identifiers of the second group. In case of an assignment.
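A rough sketch of that rewrite rule in C#, for illustration only. It assumes the simplest case: a single-word type (or var), then two or more words, then an = sign. The class name IdentifierJoiner and the regex are my own; it does not skip keywords, so a line with modifiers such as "public static int x = 1;" would be mangled, and a real tool would need the keyword handling mentioned in the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class IdentifierJoiner
{
    // indent, one word for the type, then at least two words,
    // followed (lookahead only) by optional white space and '='.
    private static readonly Regex Declaration = new Regex(
        @"^(?<indent>\s*)(?<type>\w+)\s+(?<names>\w+(?:\s+\w+)+)(?=\s*=)");

    // Joins the words after the type into one camel-case identifier.
    // Lines that do not match the pattern are returned unchanged.
    public static string Rewrite(string line)
    {
        return Declaration.Replace(line, m =>
        {
            string[] parts = Regex.Split(m.Groups["names"].Value, @"\s+");
            string joined = parts[0] + string.Concat(
                parts.Skip(1).Select(p => char.ToUpperInvariant(p[0]) + p.Substring(1)));
            return m.Groups["indent"].Value + m.Groups["type"].Value + " " + joined;
        });
    }
}
```

Rewrite("var zoo animals = GetZooAnimals();") gives "var zooAnimals = GetZooAnimals();", and a line that is already valid, such as "Myclass myVariable = x;", is left alone because only one word follows the type.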

Linq with dynamics "where parameter"

I have this case:
I create an array from a search string like this:
String[] parameters = stringParametersToSearch.Split(' ');
The number of parameters can vary from 1 to n, and I have to find the objects whose description field contains all of the parameters.
List<LookUpObject> result =
components.Where(o => o.LongDescription.Contains(parameters[0])).ToList<LookUpObject>();
This works when there is one parameter, but what if there are two or more?
Currently I work around this with an if statement in which I build the LINQ expression for each case up to five parameters (the maximum in real cases).
Can I handle this dynamically using LINQ?
You either want to use Any or All, depending on whether you want to find objects where all of the parameters match or any of them. So something like:
var result = components
.Where(o => parameters.Any(p => o.LongDescription.Contains(p)))
.ToList();
... but change Any to All if you need to.
It's always worth trying to describe a query in words, and then look at the words you've used. If you use the word "any" or "all" that's a good hint that you might want to use it in the query.
Having said that, given the example you posted (in a now-deleted comment), it's not clear that you really want to use string operations for this. If the long description is:
KW=50 CO2=69 KG=100
... then you'd end up matching on "G=100" or "KG=1" neither of which is what you really want, I suspect. You should probably parse the long description and parameters into name/value pairs, and look for those in the query.
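A minimal sketch of the name/value idea, assuming the description really is a space-separated list of NAME=VALUE tokens as in the example. DescriptionMatcher, ParsePairs and MatchesAll are hypothetical names, and this runs in memory (LINQ to Objects), not as a translated database query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DescriptionMatcher
{
    // Splits "KW=50 CO2=69" into { KW: 50, CO2: 69 }.
    // Tokens without '=' are ignored; duplicate names would throw.
    public static Dictionary<string, string> ParsePairs(string text)
    {
        return text
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(token => token.Split(new[] { '=' }, 2))
            .Where(kv => kv.Length == 2)
            .ToDictionary(kv => kv[0], kv => kv[1]);
    }

    // True only if every pair in the search text appears, with the
    // same name and the same value, in the description.
    public static bool MatchesAll(string longDescription, string searchText)
    {
        var actual = ParsePairs(longDescription);
        return ParsePairs(searchText).All(wanted =>
            actual.TryGetValue(wanted.Key, out var value) && value == wanted.Value);
    }
}
```

For the description "KW=50 CO2=69 KG=100", a search for "KG=100 KW=50" matches, while "G=100" and "KG=1" do not, because whole names and values are compared instead of substrings.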

Is sorting in LINQ by Ascii code?

In my LINQ to Entities query I have an orderby f.Description.Trim() clause.
The reason for the Trim() is that some of the data coming from the DB has a lot of white space at the beginning, so I wanted to trim it so that it doesn't affect sorting.
Now it sorts correctly but I see something like this in the result:
[Queries - Blah]
Action
Adhere
Azalia
Then I looked up the ASCII code of "[": it is 91, and "A" is 65, so how come the bracketed entry showed up first? Maybe something else in the code is causing this and the sort is fine?
OrderBy uses the default comparer for strings, which does not do an ordinal comparison of ASCII (actually, Unicode) code values. It depends on the current culture you are using.
And, if you think about it... if you were sorting entries for an appendix or index, symbols come before letters (at least in English).
If you want to sort by "raw ASCII value", use
...OrderBy(s => s, StringComparer.Ordinal)
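Here is a small in-memory sketch of the difference, using the strings from the question. SortDemo and its method names are made up for illustration. The ordinal result is fully determined by the code values; the culture-sensitive result depends on the runtime's globalization settings, so treat that half as something to try rather than a guaranteed output.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SortDemo
{
    // Compares raw UTF-16 code values: '[' (91) sorts after 'A' (65).
    public static List<string> SortOrdinal(IEnumerable<string> items)
    {
        return items.OrderBy(s => s, StringComparer.Ordinal).ToList();
    }

    // Uses linguistic rules; punctuation usually sorts before letters,
    // but the exact order depends on the platform's culture data.
    public static List<string> SortInvariantCulture(IEnumerable<string> items)
    {
        return items.OrderBy(s => s, StringComparer.InvariantCulture).ToList();
    }
}
```

SortOrdinal on { "[Queries - Blah]", "Action", "Adhere", "Azalia" } returns Action, Adhere, Azalia, [Queries - Blah]. Note that this applies to LINQ to Objects; a comparer argument like this is generally not translatable in a LINQ to Entities query.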
If the whole expression can be translated to a store expression, the ordering is done by your data store.
So the result will depend on the collation of the database, table, and column.

Using ASCII equivalent in LINQ

I have a list that I want to order by the ASCII code of one field. What is the equivalent of an ASCII function in LINQ?
revisions = revisions.OrderBy(x => Ascii(x.SubRevision) % 90).ToList();
A method named Ascii doesn't exist in LINQ. How can I do this?
It has nothing to do with LINQ really. If x.SubRevision is a char property you can simply cast it into an integer to get the ASCII value:
revisions = revisions.OrderBy(x => ((int)x.SubRevision) % 90).ToList();
There is, though. The function you need is System.Data.Objects.SqlClient.SqlFunctions.Ascii(x.SubRevision). It takes the leftmost character of x.SubRevision and returns its ASCII code.
However, this method can only be used in a LINQ to Entities query, not LINQ to SQL.
As the others have answered, characters in C# can be cast to int for the same effect. So if x.SubRevision is a char you can simply cast it to int; if it is a string, you can cast its leftmost character to int, like this:
(int)x.SubRevision[0]
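Putting the cast into the original query, as an in-memory sketch. The Revision class here is a stand-in I made up with a string SubRevision property; the real type in the question is not shown. An empty SubRevision would throw on the [0] indexer, so guard for that if it can occur.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the asker's entity type.
public class Revision
{
    public string SubRevision { get; set; }
}

public static class RevisionSorter
{
    // Orders by the code of the first character, modulo 90,
    // as in the question's OrderBy expression.
    public static List<Revision> OrderBySubRevisionCode(IEnumerable<Revision> revisions)
    {
        return revisions.OrderBy(x => (int)x.SubRevision[0] % 90).ToList();
    }
}
```

For example, "Z" has code 90 and therefore key 0, "a" has code 97 and key 7, and "A" has code 65 and key 65, so the three sort as Z, a, A.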

Identify problematic characters in a string

I want to be able to identify problematic characters in a string saved in my SQL Server database, using LINQ to Entities.
Problematic characters are characters that were damaged in the encoding process.
This is an example of a problematic string : "testing�stringáאç".
In the above example only the � character is considered as problematic.
So for example the following string isn't considered problematic:"testingstringáאç".
How can I check this varchar value and detect that it contains problematic characters?
Note that my preferred solution is to identify it via a LINQ to Entities query, but other solutions are also welcome, for example a stored procedure.
I tried to play with Regex and with "LIKE" statement but with no success...
Check out the Encoding class.
It has a DecoderFallback property and an EncoderFallback property that let you detect and substitute bad characters found during decoding and encoding.
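A minimal sketch of the fallback idea, assuming the string has already been loaded into .NET (this is not something a LINQ to Entities query can translate). EncodingCheck and its method names are hypothetical, and "us-ascii" in the usage note is only an example target encoding.

```csharp
using System;
using System.Text;

public static class EncodingCheck
{
    // True if every character of text can be represented in the
    // named encoding. The exception fallback makes GetBytes throw
    // instead of silently substituting '?'.
    public static bool CanEncode(string text, string encodingName)
    {
        Encoding strict = Encoding.GetEncoding(
            encodingName,
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        try
        {
            strict.GetBytes(text);
            return true;
        }
        catch (EncoderFallbackException)
        {
            return false;
        }
    }

    // True if the text already contains U+FFFD, the replacement
    // character that a lossy decode typically leaves behind.
    public static bool ContainsReplacementChar(string text)
    {
        return text.IndexOf('\uFFFD') >= 0;
    }
}
```

CanEncode("testing", "us-ascii") is true and CanEncode("testingá", "us-ascii") is false. For the string in the question, where the damage has already happened, ContainsReplacementChar is probably the more direct check.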
.NET strings and NVARCHAR both use Unicode, so there is nothing inherently "problematic" (at least not for BMP characters).
So you first have to define what "problematic" is meant to mean.
First possibility: characters are not mapped in the target code page.
Simply convert between encodings and check whether data is lost:
CONVERT(NVARCHAR, CONVERT(VARCHAR, @originalNVarchar)) = @originalNVarchar
Note that you can apply a specific SQL Server collation with the COLLATE clause rather than using the default database collation.
Second possibility: characters cannot be displayed with the fonts in use.
This cannot easily be detected in .NET.
You can do something like this:
DECLARE @StringWithProblem NVARCHAR(20) = N'This is '+NCHAR(8)+N'roblematic';
DECLARE @ProblemChars NVARCHAR(4000) = N'%['+NCHAR(0)+NCHAR(1)+NCHAR(8)+']%'; --list all problematic characters here, wrapped in %[]%
SELECT PATINDEX(@ProblemChars, @StringWithProblem), @StringWithProblem;
That gives you the index of the first problematic character or 0 if none is found.
