DataRow values comparison with C#

I have a SQL database with a table that contains my grading scales and comments, e.g.
debut   end   comment
5       10    x
0       4     y
I have managed to iterate through the rows of my table with a foreach loop.
I want to supply a value, maybe with a text box control; the program should then find the range in my gradingScale table that the value falls into and output the corresponding comment.
For example:
int number;
number = 4;
comment = "y";

Not sure what you're looking for, and you didn't mention which database you're using, so I'm just guessing that you might be looking for something like this:
DECLARE @Number INT
SET @Number = 4
SELECT comment
FROM dbo.gradingScale
WHERE @Number BETWEEN debut AND [end]
Of course, you could also wrap this inside a stored procedure (if your database supports that):
CREATE PROCEDURE dbo.GetComment (@Number INT)
AS
SELECT comment
FROM dbo.gradingScale
WHERE @Number BETWEEN debut AND [end]
These code samples are for Microsoft SQL Server 2005 and up (T-SQL). Note that end is a reserved word in T-SQL, so the column name is written in square brackets.

If I understand correctly, your database lists ranges, each associated with a comment. In your example, 0 to 4 maps to y, while 5 to 10 maps to x.
In that case, a very simple approach, assuming that your ranges do not overlap, is to sort your table by ascending debut and then iterate over the rows until you find the first one whose end is >= your value; if that row's debut is also <= your value, it is the matching range.
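The sorted scan can be sketched in C# roughly as follows. This is only an illustration under the assumption that the rows have already been loaded into memory and that the ranges do not overlap; the GradeRange class and the FindComment helper are hypothetical names, not something from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GradingScale
{
    // Hypothetical in-memory copy of one gradingScale row.
    public sealed class GradeRange
    {
        public int Debut { get; set; }
        public int End { get; set; }
        public string Comment { get; set; }
    }

    // Returns the comment of the range containing value, or null if no range does.
    public static string FindComment(IEnumerable<GradeRange> ranges, int value)
    {
        foreach (var range in ranges.OrderBy(r => r.Debut))
        {
            if (value < range.Debut)
                break; // sorted ascending: no later range can contain value

            if (value <= range.End)
                return range.Comment;
        }
        return null;
    }
}
```

With the two rows from the question, FindComment(rows, 4) returns "y" and FindComment(rows, 7) returns "x"; a value outside every range returns null.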

It's hard to make out exactly what you want, but here's an example implementation. (You don't say how or in what form your SQL results are returned, so I've provided a DTO/List implementation.)
static void SO6648999()
{
List<test> sample = new List<test>
{
new test { debut = 0,
end = 4,
comment = "y"},
new test { debut = 5,
end = 10,
comment = "x"}
};
int number = 4;
string comment = sample.Single(x => number >= x.debut && number <= x.end).comment;
}
class test
{
public int debut;
public int end;
public string comment;
}

I believe you are referring to a DataTable.
You can use Select on the DataTable and filter the records by providing an expression. It works much like a WHERE clause in SQL.
dt1.Select("end = 4") // assuming the column holds int values
Here end is the name of the column you are searching, and the call returns the array of DataRow objects that satisfy the condition.
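For the range check in the question, the same Select call can take a filter on both columns. The following is a minimal sketch, assuming int columns named debut and end as in the question; GradingScaleTable, BuildSample and FindComment are hypothetical helper names used only for illustration.

```csharp
using System;
using System.Data;
using System.Globalization;

public static class GradingScaleTable
{
    // Builds a small table with the columns from the question (debut, end, comment).
    public static DataTable BuildSample()
    {
        var table = new DataTable("gradingScale");
        table.Columns.Add("debut", typeof(int));
        table.Columns.Add("end", typeof(int));
        table.Columns.Add("comment", typeof(string));
        table.Rows.Add(5, 10, "x");
        table.Rows.Add(0, 4, "y");
        return table;
    }

    // Returns the comment of the first row whose range contains number, or null.
    public static string FindComment(DataTable table, int number)
    {
        // number is an int, so formatting it into the filter cannot inject expression syntax.
        string n = number.ToString(CultureInfo.InvariantCulture);
        string filter = "[debut] <= " + n + " AND [end] >= " + n;

        DataRow[] matches = table.Select(filter);
        return matches.Length > 0 ? (string)matches[0]["comment"] : null;
    }
}
```

The column names are bracketed so that they cannot be mistaken for expression keywords. For user-supplied text (for example from a text box), parsing it to an int first, as assumed here, keeps the filter string safe.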


split integer multiple values in one field into rows in ssis

Please help me split a column's field values into multiple rows.
Table
ID  Name  Location  DeptNo
1   Jack  Florida   101,102,103
I'm looking for output like this:
ID  Name  Location  DeptNo
1   Jack  Florida   101
1   Jack  Florida   102
1   Jack  Florida   103
I've figured out the configuration in SSIS using a Script Component, but I'm not sure about my code.
Please check:
public class ScriptMain : UserComponent
{
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
int[] Edpt = Row.DeptNo.ToInt().Split(new int[] { ',' }, IntSplitOptions.None);
int i = 0;
while (i < DeptNo.Length)
{
Output0Buffer.AddRow();
Output0Buffer.ID = Row.ID;
Output0Buffer.Name = Row.Name;
Output0Buffer.Location = Row.Location;
Output0Buffer.DeptNo = DeptNo[i];
i++;
}
}
}
99% of the way there.
Given a source like
SELECT
1 AS ID
, 'Jack' AS Name
, 'Florida' AS Location
, '101,102,103' AS DeptNo;
Your Script Component becomes asynchronous, because input rows no longer map 1:1 to output rows. I made 3 changes to your script.
The first was in the creation of the Edpt array. There might be a way to split the string and convert the result directly to an integer array, but it didn't come to mind.
string[] Edpt = Row.DeptNo.Split(new char[] { ',' });
The second change was to your while loop. while (i < DeptNo.Length) is going to look at each character in the source DeptNo string, so you'd have something like 11 output rows created (which would then fail when it attempts to put the comma into an integer, unless it treats it as a char data type and then uses the ASCII value). At any rate, to heck with while loops unless you need them. The foreach helps eliminate the dreaded off-by-one mistakes. So, I enumerate through my collection (Edpt) and, for each value I find, assign it to a loop-scoped variable called item.
foreach (var item in Edpt)
The final change is to the assignment in my output buffer. Output0Buffer.DeptNo = DeptNo[i]; again would only access a specific character in the original string (1, 0, 1, ,, 1, 0, 2, ,, etc.). Instead, you want to operate on the split array, like Output0Buffer.DeptNo = Edpt[i]; But since we don't need any of that ordinal access, we just reference item.
Output0Buffer.DeptNo = Int32.Parse(item);
The final code looks like
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
// Create an array of the department numbers as strings
string[] Edpt = Row.DeptNo.Split(new char[] { ',' });
// no longer needed
int i = 0;
// foreach avoids off by one errors
foreach (var item in Edpt)
{
Output0Buffer.AddRow();
Output0Buffer.ID = Row.ID;
Output0Buffer.Name = Row.Name;
Output0Buffer.Location = Row.Location;
// use the iterator directly
Output0Buffer.DeptNo = Int32.Parse(item);
}
}
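Int32.Parse throws if an element is empty or not numeric (for example a trailing comma or stray text in DeptNo). A more defensive variant of the split-and-parse step, sketched here outside of SSIS so it can be tried on its own, might look like the following; DeptNoParser is a hypothetical helper name, and silently skipping bad tokens is an assumption about what the package should do.

```csharp
using System;
using System.Collections.Generic;

public static class DeptNoParser
{
    // Splits a comma-separated list such as "101,102,103" into integers,
    // skipping empty or non-numeric entries instead of throwing.
    public static List<int> Parse(string deptNo)
    {
        var result = new List<int>();
        if (string.IsNullOrWhiteSpace(deptNo))
            return result;

        string[] items = deptNo.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string item in items)
        {
            int value;
            if (int.TryParse(item.Trim(), out value))
                result.Add(value);
        }
        return result;
    }
}
```

Inside the Script Component, the foreach over Edpt could iterate over DeptNoParser.Parse(Row.DeptNo) instead and assign each int directly to Output0Buffer.DeptNo.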

regex performance degrades

I'm writing a C# application that runs a number of regular expressions (~10) on a lot (~25 million) of strings. I did try to google this, but any searches for regex with "slows down" are full of tutorials about how backreferencing etc. slows down regexes. I am assuming that this is not my problem because my regexes start out fast and slow down.
For the first million or so strings it takes about 60ms per 1000 strings to run the regular expressions. By the end, it's slowed down to the point where it's taking about 600ms. Does anyone know why?
It was worse, but I improved it by using instances of RegEx instead of the cached version and compiling the expressions that I could.
Some of my regexes need to vary e.g. depending on the user's name it might be
mike said (\w*) or john said (\w*)
My understanding is that it is not possible to compile those regexes and pass in parameters (e.g saidRegex.Match(inputString, userName)).
Does anyone have any suggestions?
[Edited to accurately reflect speed - was per 1000 strings, not per string]
This may not be a direct answer to your question about RegEx performance degradation - which is somewhat fascinating. However - after reading all of the commentary and discussion above - I'd suggest the following:
Parse the data once, splitting out the matched data into a database table. It looks like you're trying to capture the following fields:
Player_Name | Monetary_Value
If you were to create a database table containing these values per-row, and then catch each new row as it is being created - parse it - and append to the data table - you could easily do any kind of analysis / calculation against the data - without having to parse 25M rows again and again (which is a waste).
Additionally - on the first run, if you were to break the 25M records down into 100,000 record blocks, then run the algorithm 250 times (100,000 x 250 = 25,000,000) - you could enjoy all the performance you're describing with no slow-down, because you're chunking up the job.
In other words - consider the following:
Create a database table as follows:
CREATE TABLE PlayerActions (
RowID INT PRIMARY KEY IDENTITY,
Player_Name VARCHAR(50) NOT NULL,
Monetary_Value MONEY NOT NULL
)
Create an algorithm that breaks your 25m rows down into 100k chunks. Example using LINQ / EF5 as an assumption.
public void ParseFullDataSet(IEnumerable<String> dataSource) {
var rowCount = dataSource.Count();
// Integer division rounded up, so a final, partially filled chunk is still processed.
var setCount = (rowCount + 99999) / 100000;
for (int i = 0; i < setCount; i++) {
var set = dataSource.Skip(i * 100000).Take(100000);
ParseSet(set);
}
}
public void ParseSet(IEnumerable<String> dataSource) {
String playerName = String.Empty;
decimal monetaryValue = 0.0m;
// Assume here that the method reflects your RegEx generator.
String regex = RegexFactory.Generate();
foreach (String data in dataSource) {
Match match = Regex.Match(data, regex);
if (match.Success) {
playerName = match.Groups[1].Value;
// Might want to add error handling here.
monetaryValue = Convert.ToDecimal(match.Groups[2].Value);
db.PlayerActions.Add(new PlayerAction() {
// ID = ..., // Set at DB layer using Auto_Increment
Player_Name = playerName,
Monetary_Value = monetaryValue
});
db.SaveChanges();
// If not using Entity Framework, use another method to insert
// a row to your database table.
}
}
}
Run the above one time to get all of your pre-existing data loaded up.
Create a hook someplace which allows you to detect the addition of a new row. Every time a new row is created, call:
ParseSet(new List<String>() { newValue });
or if multiples are created at once, call:
ParseSet(newValues); // Where newValues is an IEnumerable<String>
Now you can do whatever computational analysis or data mining you want from the data, without having to worry about performance over 25m rows on-the-fly.
Regex does take time to compute. However, you can make it more compact using some tricks.
You can also use the string functions in C# to avoid regex altogether.
The code would be lengthier but might improve performance.
String has several functions to cut and extract characters and do the pattern matching you need,
e.g. IndexOfAny, LastIndexOf, Contains....
string str = "mon";
string[] str2 = new string[] { "mon", "tue", "wed" };
// Arrays have no IndexOfAny; Array.IndexOf returns -1 when the value is not found.
if (Array.IndexOf(str2, str) >= 0)
{
// success code
}
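On the question's point about patterns that vary per user (mike said (\w*), john said (\w*)): a compiled Regex cannot take a parameter at match time, but one compiled instance per user name can be built once and reused. The sketch below shows that idea only; it is not claimed to explain or fix the slowdown, and SaidRegexCache and its members are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Text.RegularExpressions;

public static class SaidRegexCache
{
    // One compiled Regex per user name, created on first use and then reused.
    private static readonly ConcurrentDictionary<string, Regex> Cache =
        new ConcurrentDictionary<string, Regex>();

    public static Regex For(string userName)
    {
        // Regex.Escape keeps characters such as '.' or '(' in a name from being treated as pattern syntax.
        return Cache.GetOrAdd(userName, name =>
            new Regex(Regex.Escape(name) + @" said (\w*)", RegexOptions.Compiled));
    }

    // Returns the captured word after "<userName> said ", or null when there is no match.
    public static string WhatWasSaid(string input, string userName)
    {
        Match match = For(userName).Match(input);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

If the set of user names is very large, compiling one expression per name costs memory; in that case a single pattern that captures the name as a group, compared against the wanted name afterwards, may be the better trade-off.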

Best Way to Check for Used Key with Nhibernate?

On my site I allow people to buy subscriptions in bulk (I call them vouchers). Once they have these vouchers, they give them to whomever they like, and those people enter the code into their account to upgrade it.
Right now I am thinking of using a 4-character alphanumeric code (upper case, lower case and digits) and will have something like this:
var chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
var stringChars = new char[4];
var random = new Random();
for (int i = 0; i < stringChars.Length; i++)
{
stringChars[i] = chars[random.Next(chars.Length)];
}
var finalString = new String(stringChars);
For now I think that will give me more than enough combinations, and if I ever do run out I can always increase the length of the code. I want to keep it short because I don't want the user to have to type in huge codes.
I also don't have the time to make a more elegant solution, maybe one where they click a link in their email and it activates their account; of course, this would also cut down on someone trying to randomly guess a voucher number.
These are things I would deal with if the site ever becomes more popular.
I am wondering, though, how I can handle the possible duplicate generation of the same voucher. My first thought was to check the database each time a voucher is created and, if it exists, make a new one.
However, that seems like it could be slow. So I also thought of getting all the keys first, storing them in memory and then checking there, but if the list keeps growing I might run into out-of-memory exceptions and all that great stuff.
So does anyone have any ideas? Or am I stuck doing one of the 2 methods I listed above?
I am using nhibernate, asp.net mvc and C#.
Edit
static void Main(string[] args)
{
List<string> hold = new List<string>();
for (int i = 0; i < 10000; i++)
{
HashAlgorithm sha = new SHA1CryptoServiceProvider();
byte[] result = sha.ComputeHash(BitConverter.GetBytes(i));
string hex = null;
foreach (byte x in result)
{
hex += String.Format("{0:x2}", x);
}
hold.Add(hex.Substring(0,3));
Console.WriteLine(hex.Substring(0, 4));
}
Console.WriteLine("Number of Distinct values {0}", hold.Distinct().Count());
}
Above is my attempt to use hashing. However, I think I am missing something, as it seems to produce quite a few more duplicates than expected.
Edit 2
I think I added what I was missing, but I'm not sure if this is exactly what he meant. I am also not sure what to do when I have moved the starting position as far as I can (my hash seems to give me a length of 40 places I can move it).
static void Main(string[] args)
{
int subStringLength = 4;
List<string> hold = new List<string>();
for (int i = 0; i < 10000; i++)
{
SHA1CryptoServiceProvider sha = new SHA1CryptoServiceProvider();
byte[] result = sha.ComputeHash(BitConverter.GetBytes(i));
string hex = null;
foreach (byte x in result)
{
hex += String.Format("{0:x2}", x);
}
int startingPositon = 0;
string possibleVoucherCode = hex.Substring(startingPositon,subStringLength);
string voucherCode = Move(subStringLength, hold, hex, startingPositon, possibleVoucherCode);
hold.Add(voucherCode);
}
Console.WriteLine("Number of Distinct values {0}", hold.Distinct().Count());
}
private static string Move(int subStringLength, List<string> hold, string hex, int startingPositon, string possibleVoucherCode)
{
if (hold.Contains(possibleVoucherCode))
{
int newPosition = startingPositon + 1;
if (newPosition <= hex.Length)
{
if ((newPosition + subStringLength) > hex.Length)
{
possibleVoucherCode = hex.Substring(newPosition, subStringLength);
return Move(subStringLength, hold, hex, newPosition, possibleVoucherCode);
}
// return something
return "0";
}
else
{
// return something
return "0";
}
}
else
{
return possibleVoucherCode;
}
}
}
It is going to be slow because you want to generate the vouchers randomly and then check the database for every generated code.
I would create a table vouchers with an id, the code and an is_used column. I would fill that table once with enough random codes. Since this can be done in a separate process, the performance won't be such a big problem. Let it run in the evening and the next day you get a fully filled vouchers-table.
If you want to prevent generating duplicate vouchers, that won't be a problem. You can generate them anyway and put them either in a System.Collections.Generic.HashSet (which prevents adding duplicates without throwing an exception) or call the Linq-method Distinct(), before adding them to that vouchers table.
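A minimal sketch of the HashSet approach for pre-generating a batch, assuming the 62-character alphabet from the question; VoucherGenerator and Generate are hypothetical names. System.Random is used only because the question uses it; it is not a cryptographically secure source, so guessable codes remain a concern.

```csharp
using System;
using System.Collections.Generic;

public static class VoucherGenerator
{
    private const string Chars =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

    // Generates `count` distinct codes of the given length.
    public static HashSet<string> Generate(int count, int length, Random random)
    {
        // Guard against asking for more distinct codes than can exist (would loop forever).
        if (count > Math.Pow(Chars.Length, length))
            throw new ArgumentOutOfRangeException("count");

        var codes = new HashSet<string>(StringComparer.Ordinal);
        var buffer = new char[length];

        while (codes.Count < count)
        {
            for (int i = 0; i < buffer.Length; i++)
                buffer[i] = Chars[random.Next(Chars.Length)];

            // Add returns false and leaves the set unchanged when the code already exists.
            codes.Add(new string(buffer));
        }
        return codes;
    }
}
```

The resulting set only guarantees uniqueness within the batch; codes already stored in the vouchers table still need a unique constraint or a check at insert time.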
If you insist on short codes:
Use a GUID as a primary key, generate one random number. How you might want to translate this in to alpha-num is up to you.
Use the last byte or two of the guid and the random number. 1234-684687 This should make it slightly less easy to bruteforce coupons. And handle any (rare) collisions with an exception.
An easy way to shorten an int is to change its base (from 10 to 62). (This is in VB, and it is old code.)
This yields "2lkCB1" when given Int32.MaxValue
''//given intValue as your random integer
Dim result As String = String.Empty
Dim digits as String = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
Dim x As Integer
While (intValue > 0)
x = intValue Mod digits.Length
result = digits(x) & result
intValue = intValue - x
intValue = intValue \ digits.Length
End While
Return result
But now we're already answering more than one question.
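Since the rest of the thread is in C#, here is a rough C# equivalent of the VB base-62 conversion above, using the same digit order. Base62.Encode is a hypothetical name; unlike the VB loop, which returns an empty string for 0, this sketch returns "0", and it rejects negative input.

```csharp
using System;
using System.Text;

public static class Base62
{
    private const string Digits =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Converts a non-negative integer to its base-62 representation.
    public static string Encode(int value)
    {
        if (value < 0)
            throw new ArgumentOutOfRangeException("value");
        if (value == 0)
            return "0";

        var sb = new StringBuilder();
        while (value > 0)
        {
            // Prepend the least significant base-62 digit, then drop it from value.
            sb.Insert(0, Digits[value % Digits.Length]);
            value /= Digits.Length;
        }
        return sb.ToString();
    }
}
```

Base62.Encode(int.MaxValue) gives "2lkCB1", matching the value quoted for the VB version.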
For a bulk data operation like this, I would recommend not using NHibernate and just doing straight ADO.NET.
Batch Check
Since you anticipate generating big batches of codes at once, you should batch multiple code checks into a single round-trip to the database. If you're using SQL Server 2008 or higher, you could do this using table-valued parameters, checking a whole list of codes at once.
SELECT DISTINCT b.Code
FROM @batch b
WHERE NOT EXISTS (
SELECT v.Code
FROM dbo.Voucher v
WHERE v.Code = b.Code
);
Concurrency
Now, what about concurrency issues? What if two users generate the same code at roughly the same time? Or simply in-between the time when we check the code for uniqueness and when we insert it into the Voucher table?
We can take care of that by modifying the query as follows:
DECLARE @batchid uniqueidentifier;
SET @batchid = NEWID();
INSERT INTO dbo.Voucher (Code, BatchId)
SELECT DISTINCT b.Code, @batchid
FROM @batch b
WHERE NOT EXISTS (
SELECT Code
FROM dbo.Voucher v
WHERE b.Code = v.Code
);
SELECT Code
FROM dbo.Voucher
WHERE BatchId = @batchid;
Executing via .NET
Assuming that you have defined the following table-valued user type...
CREATE TYPE dbo.VoucherCodeList AS TABLE (
Code nvarchar(8) COLLATE SQL_Latin1_General_CP1_CS_AS NOT NULL
/* !!! Remember to specify the collation on your Voucher.Code column too, since you want upper and lower-case codes. */
);
... you could execute this query via .NET code like this:
public ICollection<string> GenerateCodes(int numberOfCodes)
{
var result = new List<string>(numberOfCodes);
while (result.Count < numberOfCodes)
{
var batchSize = Math.Min(_batchSize, numberOfCodes - result.Count);
var batch = Enumerable.Range(0, batchSize)
.Select(x => GenerateRandomCode());
var oldResultCount = result.Count;
result.AddRange(FilterAndSecureBatch(batch));
var filteredBatchSize = result.Count - oldResultCount;
var collisionRatio = ((double)batchSize - filteredBatchSize) / batchSize;
// Automatically increment length of random codes if collisions begin happening too frequently
if (collisionRatio > _collisionThreshold)
CodeLength++;
}
return result;
}
private IEnumerable<string> FilterAndSecureBatch(IEnumerable<string> batch)
{
using (var command = _connection.CreateCommand())
{
command.CommandText = _sqlQuery; // the concurrency-safe query listed above
var metaData = new[] { new SqlMetaData("Code", SqlDbType.NVarChar, 8) };
var param = command.Parameters.Add("@batch", SqlDbType.Structured);
param.TypeName = "dbo.VoucherCodeList";
param.Value = batch.Select(x =>
{
var record = new SqlDataRecord(metaData);
record.SetString(0, x);
return record;
});
using (var reader = command.ExecuteReader())
while (reader.Read())
yield return reader.GetString(0);
}
}
Performance
After implementing all of this (and moving the command and parameter creation out of the loop so it would be re-used between batches), I was able to insert 10,000 codes using a batch size of 500 consistently in approx. 0.5 to 2 seconds, or 5 to 20 codes per millisecond.
Code Density / Collisions / Guessability
The _collisionThreshold field limits the density of your codes. It's a value between 0 and 1. Actually, it must be less than 1 or else you would wind up in an infinite loop when the 4 digit codes were exhausted (probably should add an assertion for this in code). I would recommend never turning it above 0.5 for performance reasons. More than 50% collisions would mean it's spending more time testing already-used codes than actually generating new ones.
Keeping the collision threshold low is how you would control how hard-to-guess your codes are. Setting _collisionThreshold to 0.01 would generate codes such that there's approximately a 1% chance of someone guessing a code.
If collisions occur too frequently, CodeLength (which is used by the GenerateRandomCode() method) will be incremented. This value needs to be persisted somewhere. After executing GenerateCodes(), check CodeLength to see if it has changed and then save the new value.
Source Code
The full code is available here: https://gist.github.com/3217856. I am the author of this code, and am releasing it under the MIT license. I had fun with this little challenge, and also got to learn how to pass a table-valued parameter to an inline parametrized query. I hadn't ever done that before. I've only ever passed them to full-fledged stored procedures.
A possible solution for you is like this:
Find the maximum ID of a voucher (an integer). Then, run any hash function on it, take the first 32 bits and convert to the string you want to show the user (or use a 32bit hash function such as Jenkins hash function). This will probably work, hash collisions are pretty rare. But this solution is very similar to yours, in the point of randomness.
You could run a test which finds the first 10 or 100 collisions (this should be enough for you) and forces the algorithm to "skip" them and use a different starting value. Then, you don't need to check the database at all (well, at least until you reach about 4294967296 vouchers...)
how about utilizing nHibernate's HiLo algorithm?
Here is an example on how you can get the next value (without DB access).

Get closest/next match in .NET Hashtable (or other structure)

I have a scenario at work where we have several different tables of data in a format similar to the following:
Table Name: HingeArms
Hght Part #1 Part #2
33 S-HG-088-00 S-HG-089-00
41 S-HG-084-00 S-HG-085-00
49 S-HG-033-00 S-HG-036-00
57 S-HG-034-00 S-HG-037-00
Where the first column (and possibly more) contains numeric data sorted ascending and represents a range to determine the proper record of data to get (e.g. height <= 33 then Part 1 = S-HG-088-00, height <= 41 then Part 1 = S-HG-084-00, etc.)
I need to lookup and select the nearest match given a specified value. For example, given a height = 34.25, I need to get second record in the set above:
41 S-HG-084-00 S-HG-085-00
These tables are currently stored in a VB.NET Hashtable "cache" of data loaded from a CSV file, where the key for the Hashtable is a composite of the table name and one or more columns from the table that represent the "key" for the record. For example, for the above table, the Hashtable Add for the first record would be:
ht.Add("HingeArms,33","S-HG-088-00,S-HG-089-00")
This seems less than optimal and I have some flexibility to change the structure if necessary (the cache contains data from other tables where direct lookup is possible... these "range" tables just got dumped in because it was "easy"). I was looking for a "Next" method on a Hashtable/Dictionary to give me the closest matching record in the range, but that's obviously not available on the stock classes in VB.NET.
Any ideas on a way to do what I'm looking for with a Hashtable or in a different structure? It needs to be performant as the lookup will get called often in different sections of code. Any thoughts would be greatly appreciated. Thanks.
A hashtable is not a good data structure for this, because items are scattered around the internal array according to their hash code, not their values.
Use a sorted array or List<T> and perform a binary search, e.g.
Setup:
var values = new List<HingeArm>
{
new HingeArm(33, "S-HG-088-00", "S-HG-089-00"),
new HingeArm(41, "S-HG-084-00", "S-HG-085-00"),
new HingeArm(49, "S-HG-033-00", "S-HG-036-00"),
new HingeArm(57, "S-HG-034-00", "S-HG-037-00"),
};
values.Sort((x, y) => x.Height.CompareTo(y.Height));
var keys = values.Select(x => x.Height).ToList();
Lookup:
var index = keys.BinarySearch(34.25);
if (index < 0)
{
index = ~index;
}
var result = values[index];
// result == { Height = 41, Part1 = "S-HG-084-00", Part2 = "S-HG-085-00" }
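One edge case in the lookup above: when the value is larger than every key, ~index equals the list's Count and values[index] would throw. A small sketch of the same binary search with that case handled, assuming the heights are stored as double; RangeLookup and CeilingIndex are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class RangeLookup
{
    // Returns the index of the first key that is >= value,
    // or -1 when value is larger than every key.
    // sortedKeys must be sorted ascending.
    public static int CeilingIndex(List<double> sortedKeys, double value)
    {
        int index = sortedKeys.BinarySearch(value);
        if (index < 0)
            index = ~index; // bitwise complement = index of the next larger element

        return index < sortedKeys.Count ? index : -1;
    }
}
```

With keys 33, 41, 49 and 57, CeilingIndex(keys, 34.25) returns 1 (the row for 41), an exact 33 returns 0, and 58 returns -1 so the caller can decide how to treat an out-of-range height.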
You can use a sorted .NET array in combination with Array.BinarySearch().
If you get a non negative value this is the index of exact match.
Otherwise, if result is negative use formula
int index = ~Array.BinarySearch(sortedArray, value) - 1
to get index of previous "nearest" match.
The meaning of nearest is defined by a comparer you use. It must be the same you used when sorting the array. See:
http://gmamaladze.wordpress.com/2011/07/22/back-to-the-roots-net-binary-search-and-the-meaning-of-the-negative-number-of-the-array-binarysearch-return-value/
How about LINQ to Objects? (This is by no means meant to be a performant solution, by the way.)
var ht = new Dictionary<string, string>();
ht.Add("HingeArms,33", "S-HG-088-00,S-HG-089-00");
decimal wantedHeight = 34.25m;
// Take the entry with the smallest height that is still >= the wanted height.
var foundIt =
ht.Select(x => new { Height = decimal.Parse(x.Key.Split(',')[1]), x.Key, x.Value }).Where(
x => x.Height >= wantedHeight).OrderBy(x => x.Height).FirstOrDefault();
if (foundIt != null)
{
// Do Something with your item in foundIt
}

Test for gaps in range

I need to test if some objects inside a database fill a specific range, i.e 0-999.
I'm using C# and I've created a generic class using IComparable to test for the intersection. This works fine but I need to invert and find all the gaps that I have in this interval.
My database objects have start and end properties, that are integers. I can find where are the gaps, but I need to cluster them to create the missing pieces.
foreach (var interval in intervals)
{
for (int i = 0; i <= 999; i++)
{
if (Range<int>.Intersects(interval,new Range<int>(i,i)))
continue;
else
doesNotIntersect.Add(i);
}
}
With this code I have a pretty list of "holes". What I'm trying to do now is to group these values, but I find that my solution is not optimal and certainly not elegant.
I've read about BitArrays, but how can they help me? I wish that from a list of ranges I can find the gaps in a fixed range. If we are talking about a line, I need basically the result of fixed - intervals.
I can only use .NET to solve this. I have a large piece of middleware and this validation process will occur several times a day, so I prefer not having to go through the middleware and then the database to solve it.
Let me try to create a picture
Fixed range that needs to be filled
111111111
Ranges that objects provided
101100001
Ranges that need to be filled
010011110
This is my range object:
public class Range<T> where T : IComparable
{
public T Start { get; set; }
public T End { get; set; }
public Range(T start, T end)
{
Start = start;
End = end;
}
public static bool Intersects(Range<T> left,Range<T> right)
{
if (left.Start.CompareTo(right.Start) == 0)
return true;
if (left.Start.CompareTo(right.Start) > 0)
{
return left.Start.CompareTo(right.End) <= 0;
}
return right.Start.CompareTo(left.End) <= 0;
}
}
I need to find gaps in start/end points, instead of continuous intervals.
Help?
00000000000000000000000000000
| |
8:00 9:00
Suppose every '0' in the bitarray represents a time unit(second, hour etc.)
Start looping the intervals and set bits according to start & end values.
Now you will have something like this
11110001111110001111000111000
The '0' are your grouped gaps
You could use SQL for that, if the integer value can be represented by an entity. Create a table with a single column seq that holds all values from 0 to 999, then left outer join that table with the entity and select only the seq values for which the entity is null.
An example query would look like this:
SELECT ts.seq
FROM sequenceTable ts LEFT OUTER JOIN sourceTable st ON ts.seq = st.entity
WHERE st.entity is null;
You could use the row number to populate the seq column of sequenceTable.
--EDIT
As the solution should be in CLR, you can use collections: create a List with the values from 0 to 999, then remove every value covered by the intervals.
Another solution uses a boolean array. Create an array of the proper length (1000 in this case, for 0 to 999), then iterate through the intervals and set the elements at the covered indexes to true. Iterate once more over the array, and the missing intervals are the runs of indexes whose value is false.
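The boolean-array idea, including the grouping of consecutive missing values into start/end pairs that the question asks for, can be sketched like this. It assumes inclusive integer intervals; GapFinder and FindGaps are hypothetical names, and Tuple<int, int> stands in for the question's Range<int>.

```csharp
using System;
using System.Collections.Generic;

public static class GapFinder
{
    // Returns the uncovered sub-ranges of [min, max] as inclusive (start, end) pairs.
    public static List<Tuple<int, int>> FindGaps(
        IEnumerable<Tuple<int, int>> intervals, int min, int max)
    {
        // covered[i - min] is true when value i lies inside at least one interval.
        var covered = new bool[max - min + 1];
        foreach (var interval in intervals)
        {
            int from = Math.Max(interval.Item1, min);
            int to = Math.Min(interval.Item2, max);
            for (int i = from; i <= to; i++)
                covered[i - min] = true;
        }

        var gaps = new List<Tuple<int, int>>();
        int? gapStart = null;
        for (int i = min; i <= max; i++)
        {
            if (!covered[i - min])
            {
                if (gapStart == null)
                    gapStart = i; // a new run of uncovered values begins here
            }
            else if (gapStart != null)
            {
                gaps.Add(Tuple.Create(gapStart.Value, i - 1));
                gapStart = null;
            }
        }
        if (gapStart != null)
            gaps.Add(Tuple.Create(gapStart.Value, max)); // gap runs to the end of the range

        return gaps;
    }
}
```

For the range 0 to 999 with intervals 0-99, 200-299 and 300-998, this returns the gaps 100-199 and 999-999. The cost grows with the size of the fixed range, which is fine for 1000 values; for very large ranges, sorting the intervals and walking their end points would avoid the array.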
