I am comparing some CHAR data in a WHERE clause in my SQL, like this:
where PRI_CODE < PriCode
The problem I am having is when the CHAR values are of different lengths.
So if PRI_CODE = '0800' and PriCode = '20' it is returning true instead of false.
It looks like it is comparing it like this
'08' < '20'
instead of like
'0800' < '20'
Does a CHAR comparison start from the left and stop when one or the other value ends?
If so, how do I fix this?
My values can contain letters, so converting to numeric is not an option.
It's not comparing '08' with '20', it is, as you expect, comparing '0800' with '20'.
What you don't seem to expect, however, is that '0800' (the string) is indeed less than '20' (the string).
If converting it to numerics for a numeric comparison is out of the question, you could use the following DB2 function:
right ('0000000000'||val,10)
which will give you val padded on the left with zeroes to a size of 10 (ideal for a CHAR(10), for example). That will at least guarantee that the fields are the same size and the comparison will work for your particular case. But I urge you to rethink how you're doing things: per-row functions rarely scale well, performance-wise.
If you're using z/OS, you should have a few DBAs just lying around on the computer room floor waiting for work - you can probably ask one of them for advice more tailored to your specific application :-)
One thing that comes to mind is the use of an insert/update trigger and a secondary column PRI_CODE_PADDED to hold the PRI_CODE column fully padded out (using the same method as above). Then make sure your PriCode variable is similarly formatted before executing the select ... where PRI_CODE_PADDED < PriCode.
Incurring that cost at insert/update time will amortise it over all the selects you're likely to do (which, because they're no longer using per-row functions, will be blindingly fast), giving you better overall performance (assuming your database isn't one of those incredibly rare beasts that are written more than read, of course).
I'm trying to implement the Hungarian algorithm. Everything is fine except for when the matrix isn't square. All the methods I've found say that I should make it square by adding dummy rows/columns and filling the dummy row/column with the maximum number in the matrix. My question is: won't this affect the final result? Shouldn't the dummy row/column be filled with at least max+1?
The dummy values should all be zero. The point is that it doesn't matter which one you choose, you're going to ignore those choices in the end because they weren't in the original data. By making them zero (at the start), your algorithm won't have to work as hard to find a value you're not going to use.
The main idea of the Hungarian algorithm is built upon the fact that the "optimal assignment of jobs remains the same if a number is added/subtracted from all entries of any row or column of the matrix". Therefore, it does not matter whether you use max, max+1, or 0 as the dummy value. It can be set to any number, and 0 is best (as Yay295 said, the algorithm has less work to do if entries are already 0).
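For illustration, here is a minimal C# sketch (the names are hypothetical, not from the original posts) of padding a rectangular cost matrix to a square one with zero-filled dummy entries; assignments that land in a dummy row or column are simply discarded after the algorithm runs:

using System;

static class HungarianHelpers
{
    // Pad a rows x cols cost matrix to n x n (n = max(rows, cols)),
    // filling the dummy entries with zero.
    public static int[,] PadToSquare(int[,] cost)
    {
        int rows = cost.GetLength(0), cols = cost.GetLength(1);
        int n = Math.Max(rows, cols);
        var padded = new int[n, n]; // new entries default to 0 (the dummy value)
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                padded[r, c] = cost[r, c];
        return padded;
    }
}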
I am writing a templating engine and I am searching for a good way to detect if a template has changed.
For this I have the following requirements (in order of importance):
non-equal strings must be detected as different
as fast as possible
as little memory as possible (=> do not store the whole string for comparison)
high probability of detecting equal strings as equal
It is not a big problem if equal strings are sometimes not detected as equal, as this would just trigger a re-rendering that was not needed; but because re-rendering is heavy work, it should happen as rarely as possible.
I first thought of using String.GetHashCode(), but the probability of getting the same hash-code for two non-equal strings is pretty high.
Are there any good combinations, like checking hash-code and Length, that push the probability of two non-equal strings being wrongly detected as equal down to an unrealistically low number?
Or is using some hashing algorithm, like MD5 or SHA, a good alternative (checked after the hash-codes are equal)?
My rendering looks something like the following:
public string RenderTemplate(string name, string template)
{
    var cachedTemplate = Cache.Get(name);
    if (cachedTemplate == null || !cachedTemplate.Equals(template)) // <= Equals
    {
        cachedTemplate = new Template(name, template);
        cachedTemplate.Render();
        Cache.Set(name, cachedTemplate);
    }
    return cachedTemplate.Result;
}
The Equals is the point I am asking about.
I am also open for other suggestions how this could be solved.
UPDATE:
To add some numbers to get more context:
I expect to have >1000 individual templates, and each template is at least a few thousand characters long.
This is why I would like to avoid storing the whole template-string "in memory" only for the comparison.
Most of the templates are stored in the DB.
UPDATE 2:
What do you think about extending my RenderTemplate method with a timestamp as suggested by Nikola:
public string RenderTemplate(string name, string template, DateTime timestamp)
Then I could compare name, GetHashCode, and timestamp, which does not need much memory, should be pretty fast, and makes the probability of a wrongly detected equality practically 0. The timestamp I can read from the DB (it is already there) or, for a file-based template, from the file system's last-changed date.
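A sketch of what that check could look like (the Timestamp, TemplateHash, and TemplateLength members are hypothetical, added for illustration):

public string RenderTemplate(string name, string template, DateTime timestamp)
{
    var cached = Cache.Get(name);
    // Compare only the stored timestamp, length, and 32-bit hash;
    // the full template string is never kept in memory.
    bool unchanged = cached != null
        && cached.Timestamp == timestamp
        && cached.TemplateLength == template.Length
        && cached.TemplateHash == template.GetHashCode();
    if (!unchanged)
    {
        cached = new Template(name, template) { Timestamp = timestamp };
        cached.Render();
        Cache.Set(name, cached);
    }
    return cached.Result;
}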
You don't have much choice. If you don't compare strings by comparing their content, use a hash algorithm to determine whether they are equal. Personally, I would probably use a hash algorithm. If you are a bit paranoid and afraid of a collision, choose an algorithm with a wider output space (e.g. SHA512).
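For example, a minimal sketch of that idea (illustrative names, not from the question): store only the length plus a SHA-256 digest per template and compare those instead of the full string:

using System.Linq;
using System.Security.Cryptography;
using System.Text;

static class TemplateFingerprint
{
    // Length + SHA-256 digest; a collision between two real-world
    // templates is astronomically unlikely.
    public static byte[] Compute(string template)
    {
        using (var sha = SHA256.Create())
            return sha.ComputeHash(Encoding.UTF8.GetBytes(template));
    }

    public static bool Matches(byte[] cachedHash, int cachedLength, string template) =>
        cachedLength == template.Length &&
        Compute(template).SequenceEqual(cachedHash);
}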
Why do you need to compare strings to determine that a template has changed? Why not use a different approach?
If the file is stored on disk, why not use a file watcher? (See the sketch after this list.)
If stored in database, why not use a timestamp to detect when it was saved?
If the application is restarted, reload the templates anyway
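For the file-based case, a minimal sketch (templateDirectory, the "*.tpl" pattern, and Cache.Invalidate are assumptions for illustration; FileSystemWatcher itself is standard .NET):

using System.IO;

var watcher = new FileSystemWatcher(templateDirectory, "*.tpl");
watcher.Changed += (sender, e) =>
    Cache.Invalidate(Path.GetFileNameWithoutExtension(e.FullPath)); // drop the stale entry
watcher.EnableRaisingEvents = true;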
Also, it's worrying that a UI template changes so often that you must make checks like this. I think you have bigger design problems than comparing strings.
I found it very confusing when sorting a text file that different algorithms/applications produce different results, for example, when comparing the two strings str1=";P" and str2="-_-".
Just for reference, here are the ASCII codes for each char in those strings:
char(';') = 59; char('P') = 80;
char('-') = 45; char('_') = 95;
So I've tried different methods to determine which string is bigger; here are my results:
In Microsoft Office Excel Sorting command:
";P" < "-_-"
C++ std::string::compare(string &str2), i.e. str1.compare(str2)
";P" > "-_-"
C# string.CompareTo(), i.e. str1.CompareTo(str2)
";P" < "-_-"
C# string.CompareOrdinal(), i.e. CompareOrdinal(w1, w2)
";P" > "-_-"
As shown, the results varied! My intuitive expectation matches methods 2 and 4, since ASCII(';') = 59, which is larger than ASCII('-') = 45.
So I have no idea why Excel and C# string.CompareTo() give the opposite answer. Note that in C# the second comparison function is named string.CompareOrdinal(). Does this imply that the default C# string.CompareTo() is not "ordinal"?
Could anyone explain this inconsistency?
And could anyone explain why, in CultureInfo = {en-US}, it says ";P" < "-_-"? What's the underlying motivation or principle? I have also heard of doubles being handled differently under different CultureInfo settings. It's rather a culture shock!
std::string::compare: "the result of a character comparison depends only on its character code". It's simply ordinal.
String.CompareTo: "performs a word (case-sensitive and culture-sensitive) comparison using the current culture". So this is not ordinal, since typical users don't expect things to be sorted like that.
String.CompareOrdinal: Per the name, "performs a case-sensitive comparison using ordinal sort rules".
EDIT: CompareOptions has a hint: "For example, the hyphen ("-") might have a very small weight assigned to it so that "coop" and "co-op" appear next to each other in a sorted list."
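You can see both behaviours side by side with a small snippet (illustrative, not from the question):

using System;

class CompareDemo
{
    static void Main()
    {
        // Culture-sensitive: the hyphen gets a very small weight,
        // so on an en-US system ";P" sorts before "-_-" (negative result).
        Console.WriteLine(String.Compare(";P", "-_-", StringComparison.CurrentCulture));
        // Ordinal: raw char codes, ';' (59) > '-' (45), so the result is positive.
        Console.WriteLine(String.Compare(";P", "-_-", StringComparison.Ordinal));
    }
}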
Excel 2003 (and earlier) does a sort ignoring hyphens and apostrophes, so your sort really compares ; to _, which gives the result that you have. Here's a Microsoft Support link about it. Pretty sparse, but enough to get the point across.
I need to do a text search based on user input in a relatively large list (about 37K lines with 50 to 100 chars each). The search is done after entering each character and the result is shown in a UITableView. This is my current code:
if (input.Any(x => Char.IsUpper(x)))
    return _list.Where(x => x.Desc.Contains(input));
else
    return _list.Where(x => x.Desc.ToLower().Contains(input));
It performs okay in the simulator on a MacBook, but it is too slow on an iPad.
One interesting thing I observed is that it takes longer and longer as the input grows. For example, take "examin" as input: it takes about 1 second after entering e, 2 seconds after x, 5 seconds after a, but 28 seconds after m, and so on. Why is that?
I hope there is a simple way to improve it.
Always take care to avoid memory allocations in time sensitive code.
For example, we often write code that allocates strings without realizing it, e.g.
x => x.Desc.ToLower().Contains(input)
That will allocate a string to return from ToLower. From your description this will occur many times. You can easily avoid it by using:
x => x.Desc.IndexOf (input, StringComparison.OrdinalIgnoreCase) != -1
note: just select the StringComparison.*IgnoreCase that matches your needs.
Also, LINQ is nice, but it hides allocations in many cases - maybe not in yours, but measuring is key to making things faster. Using another algorithm (like the one suggested in another answer) could give you much better results (but keep the allocations in mind ;-)
UPDATE:
Mono's Contains(string) will call, after a few checks, the following:
CultureInfo.CurrentCulture.CompareInfo.IndexOf (this, value, 0, length, CompareOptions.Ordinal);
which means that, given your ToLower requirement, StringComparison.OrdinalIgnoreCase is a perfect (i.e. identical) match for your existing code (which did not do any culture-specific comparison anyway).
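Putting that together, the search from the question could become (a sketch using the question's _list and Desc):

// One allocation-free, case-insensitive ordinal test per row.
return _list.Where(x => x.Desc.IndexOf (input, StringComparison.OrdinalIgnoreCase) != -1);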
Generally I've found that contains operations are not preferable for search, so I'd recommend you take a look at the Mastering Core Data Session (login required) video on the WWDC 2010 page (around the 10 min mark). Apple knows that 'contains' is terrible with SQLite on mobile devices; you can essentially do what Apple does to sort of "hack" FTS on the version of SQLite they ship.
Essentially they do prefix matching by creating a table like:
[[ pk_id || input || normalized_input ]]
Where input and normalized_input are both explicitly indexed. Then they prefix match against the normalized value. So for instance, if a user is searching for 'snuggles' and so far they've typed in 'snu', the prefix matching query would look like:
normalized_input >= 'snu' and normalized_input < 'snv'
Not sure if this translates given your use case, but I thought it was worth mentioning. Hope it's helpful!
You need to use a trie. See http://en.wikipedia.org/wiki/Trie
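A minimal C# sketch of the idea (illustrative, not from the answer). Note that a plain trie accelerates prefix matching; for full substring search you would need to insert every suffix of each line (a suffix trie), at a considerable memory cost, and storing the line at every prefix node already trades memory for lookup speed:

using System.Collections.Generic;

class Trie
{
    private class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public readonly List<string> Entries = new List<string>();
    }

    private readonly Node _root = new Node();

    public void Add(string line)
    {
        var node = _root;
        foreach (var ch in line.ToLowerInvariant())
        {
            Node next;
            if (!node.Children.TryGetValue(ch, out next))
                node.Children[ch] = next = new Node();
            node = next;
            node.Entries.Add(line); // every node on the path knows this line
        }
    }

    // All lines starting with the typed prefix: one dictionary hop per character,
    // independent of how many lines are stored.
    public IList<string> Search(string prefix)
    {
        var node = _root;
        foreach (var ch in prefix.ToLowerInvariant())
            if (!node.Children.TryGetValue(ch, out node))
                return new string[0];
        return node.Entries;
    }
}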
I have a LINQ expression that's slowing down my application.
I'm drawing a control, but to do this, I need to know the max width of the text that will appear in my column.
The way I'm doing that is this:
return Items.Max(w => TextRenderer.MeasureText(
    (w.RenatlUnit == null) ? "" : w.RenatlUnit.UnitNumber, this.Font).Width) + 2;
However, this iterates over ~1000 Items, and takes around 20% of the CPU time that is used in my drawing method. To make it worse, there are two other columns that this must be done with, so this LINQ statement on all the items/columns takes ~75-85% of the CPU time.
TextRenderer is from the System.Windows.Forms namespace, and because I'm not using a monospaced font, MeasureText is needed to figure out the pixel width of a string.
How might I make this faster?
I don't believe that your problem lies in the speed of LINQ, it lies in the fact that you're calling MeasureText over 1000 times. I would imagine that taking your logic out of a LINQ query and putting it into an ordinary foreach loop would yield similar run times.
A better idea is probably to employ a little bit of sanity checking around what you're doing. If you go with reasonable inputs (and disregard the possibility of linebreaks), then you really only need to measure the text of strings that are, say, within 10% or so of the absolute longest (in terms of number of characters) string, then use the maximum value. In other words, there's no point in measuring the string "foo" if the largest value is "paleontology". There's no font that has widths THAT variable.
It's the MeasureText method that takes time, so the only way to increase the speed is to do less work.
You can cache the results of the calls to MeasureText in a dictionary; that way you don't have to re-measure strings that have already been measured before.
You can calculate the values once and keep along with the data to display. Whenever you change the data, you recalculate the values. That way you don't have to measure the strings every time the control is drawn.
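A minimal sketch of such a cache (illustrative; it assumes a single Font is in play):

private readonly Dictionary<string, int> _widthCache = new Dictionary<string, int>();

private int MeasureCached(string text)
{
    int width;
    if (!_widthCache.TryGetValue(text, out width))
    {
        width = TextRenderer.MeasureText(text, this.Font).Width;
        _widthCache[text] = width; // measured once, reused on every redraw
    }
    return width;
}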
Step 0: Profile. Assuming you find that most of the execution time is indeed in MeasureText, then you can try the following to reduce the number of calls:
Compute the lengths of all individual characters. Since it sounds like you're rendering a number, this should be a small set.
Estimate each string's length via numstr.Select(digitChar => digitLengthDict[digitChar]).Sum()
Take the strings with the top N lengths, and measure only those.
To avoid even most of the cost of the lookup+sum, also filter to include only those strings within 90% of the maximum string-length, as suggested.
e.g. Something like...
// somewhere else, during initialization - do only once.
var digitLengthDict = possibleChars.ToDictionary(c => c, c => TextRenderer.MeasureText(c.ToString(), this.Font).Width);
//...
var relevantStringArray = Items.Where(w => w.RenatlUnit != null).Select(w => w.RenatlUnit.UnitNumber).ToArray();
double minStrLen = 0.9 * relevantStringArray.Max(str => str.Length);
return (
    from numstr in relevantStringArray
    where numstr.Length >= minStrLen
    orderby numstr.Select(digitChar => digitLengthDict[digitChar]).Sum() descending
    select TextRenderer.MeasureText(numstr, this.Font).Width
).Take(10).Max() + 2;
If we knew more about the distribution of the strings, that would help.
Also, MeasureText isn't magic; it's quite possible you can duplicate its functionality entirely, quite easily, for a limited set of inputs. For instance, it would not surprise me to learn that the measured length of a string is precisely equal to the sum of the lengths of all characters in the string, minus the kerning overhang of all character bigrams in the string. If your string then consists of, say, 0-9, +, -, ,, ., and a terminator symbol, then a lookup table of 14 character widths and 15*15-1 kerning corrections might be enough to precisely emulate MeasureText at far greater speed, and without much complexity.
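If you went down that road, the emulation might look like this sketch (charWidth and kerning are hypothetical precomputed tables, and whether this exactly reproduces MeasureText is an assumption you would have to verify against real measurements):

using System.Collections.Generic;

// width(s) ~= sum of per-char widths minus a kerning correction per adjacent pair
static int EstimateWidth(string s, Dictionary<char, int> charWidth, Dictionary<string, int> kerning)
{
    int width = 0;
    for (int i = 0; i < s.Length; i++)
    {
        width += charWidth[s[i]];
        if (i > 0)
        {
            int correction;
            if (kerning.TryGetValue(s.Substring(i - 1, 2), out correction))
                width -= correction; // overhang of this character bigram
        }
    }
    return width;
}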
Finally, the best solution is to not solve the problem at all - perhaps you can rearchitect the application to not require such a precise number - if a simpler estimate were to suffice, you could avoid MeasureText almost completely.
Unfortunately, it doesn't look like LINQ is your problem. If you ran a for loop and did this same calculation, the amount of time would be the same order of magnitude.
Have you considered running this calculation on multiple threads? It would work nicely with Parallel LINQ.
Edit: It seems Parallel LINQ won't work because MeasureText is a GDI function and will simply be marshalled back to the UI thread (thanks @Adam Robinson for correcting me.)
My guess is the issue is not the LINQ expression but calling MeasureText several thousand times.
I think you could work around the non-monospaced font issue by breaking the problem into 4 parts.
Find the widest digit in terms of render size
Find the apartment unit with the most digits
Create a string where every character is the digit from #1 and whose length is the count from #2
Pass the string created in #3 to MeasureText and use that as your basis
This won't yield a perfect solution but it will ensure that you reserve at least enough space for your item and avoids the pitfall of calling MeasureText far too many times.
If you can't figure out how to make MeasureText faster, you could precalculate the width of every character in your font size and style and estimate the width of a string from those, although the kerning of character pairs means the result would only be an estimate, not precise.
As an approximation, you might consider taking the length of the longest string and then measuring a string of that length made entirely of '0's (or whatever the widest digit is, I can't remember). That should be a much faster method, but it would only be an approximation, and probably wider than necessary.
var longest = Items.Max(w => (w.RenatlUnit == null || w.RenatlUnit.UnitNumber == null)
    ? 0
    : w.RenatlUnit.UnitNumber.Length);
if (longest == 0)
{
    return 2;
}
return TextRenderer.MeasureText(new String('0', longest), this.Font).Width + 2;