I have a ListView control that is filled with records returned from a SQL statement. The fields may be something like:
SSN------|NAME|DATE----|TIME--|SYS
111222333|Bell|20140130|121507|P
123456789|John|20140225|135000|P
123456789|John|20140225|135002|N
The "duplicates" are generated from a ChangeLog, such as a change of address. Due to bad database design I have no control over however, an address change will create 2 records if a member happens to be a member of both SYS.
What would be the best way to go through each record in my listview, find duplicate values of SSN & DATE (There can be a record generated for both SYS if person is a member of both), and remove the duplicate value with the lower TIME value?
I'm trying to do a code-based solution instead of SQL because the true SQL statement is already highly complex and this application needs to only be maintained until October.
For this, I've assumed you have some class that exposes these records' properties with easy access, like SSN, Date and Time, and that these are all strings. In the code below I refer to this object as Record.
HINT: You might instead want to decide which duplicate to remove based on the SYS flag rather than on time (it probably doesn't make a difference).
I did not use any lambda functions, on purpose, to keep this simple and easy to read.
Call this code every time you load items into the ListView. It would actually be a better idea to sanitize the list before you load it into the ListView (a LINQ alternative is sketched after the code below), but this is a solution to your question based on the available info.
// Turn the ListView's ItemCollection into an easy-to-use List<Record>
List<Record> records = myListView.Items.OfType<Record>().ToList();

// Collect the records that share an SSN and Date with another record
// but carry the lower Time value
List<Record> recordsToRemove = new List<Record>();
foreach (var record in records)
{
    foreach (var r in records)
    {
        if (record.SSN == r.SSN && record.Date == r.Date && record != r)
        {
            // Mark whichever record of the pair has the lower Time
            var duplicate = int.Parse(r.Time) > int.Parse(record.Time) ? record : r;
            if (!recordsToRemove.Contains(duplicate))
                recordsToRemove.Add(duplicate);
        }
    }
}

// Now actually remove the items from the ListView
foreach (var record in recordsToRemove)
{
    myListView.Items.Remove(record);
}
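If you do opt to sanitize the list before loading it (despite the no-lambda goal above), LINQ can express the same rule more compactly. A minimal sketch, assuming the same hypothetical Record class with string SSN, Date and Time properties, and System.Linq in scope:

// Keep only the record with the highest Time within each (SSN, Date) group
List<Record> deduped = records
    .GroupBy(r => new { r.SSN, r.Date })
    .Select(g => g.OrderByDescending(r => int.Parse(r.Time)).First())
    .ToList();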
The saga of trying to chop flat files up into usable bits continues!
You may see from my other questions that I am trying to wrangle some flat file data into various bits using a C# transformer in SSIS. The current challenge is turning a selection of rows with one column into one row with many columns.
A friend has very kindly tipped me off to use a List and then to somehow loop through it in PostExecute().
The main problem is that I do not know how to loop through the list and create a row to add to the output buffer programmatically - there might be a variable number of fields listed in the flat file; there is no consistency. For now, I have allowed for 100 outputs and called these pos1, pos2, etc.
What I would really like to do is count everything in my list, and loop through that many times, incrementing the numbers accordingly - i.e. fieldlist[0] goes to OutputBuffer.pos1, fieldlist[1] goes to OutputBuffer.pos2, and if there is nothing after this then nothing is put in pos3 to pos100.
The secondary problem is that I can't even test that my list and the writing to an output table are working by explicitly using OutputBuffer in PostExecute, never mind working out a loop.
The file has all sorts in it, but the list of fields is handily contained between START-OF-FIELDS and END-OF-FIELDS, so I have used the same logic as before to only process the rows in the middle of those.
bool passedSOF;
bool passedEOF;
List<string> fieldlist = new List<string>();

public override void PostExecute()
{
    base.PostExecute();
    OutputBuffer.AddRow();
    OutputBuffer.field1 = fieldlist[0];
    OutputBuffer.field2 = fieldlist[1];
}

public override void Input_ProcessInputRow(InputBuffer Row)
{
    if (Row.RawData.Contains("END-OF-FIELDS"))
    {
        passedEOF = true;
        OutputBuffer.SetEndOfRowset();
    }
    if (passedSOF && !passedEOF)
    {
        fieldlist.Add(Row.RawData);
    }
    if (Row.RawData.Contains("START-OF-FIELDS"))
    {
        passedSOF = true;
    }
}
I have nothing underlined in red, but when I try to run this I get an error about PostExecute() and "object reference not set to an instance of an object", which I thought meant something contained a null where it shouldn't - yet in my test file I have more than two fields between the START and END markers.
So first of all, what am I doing wrong in the example above, and secondly, how do I do this in a proper loop? There are only 100 possible outputs right now, but this could increase over time.
"Post execute" It's named that for a reason.
The execution of your data flow has ended and this method is for cleanup or anything that needs to happen after execution - like modification of SSIS variables. The buffers have gone away, there's no way to do interact with the contents of the buffers at this point.
As for the rest of your problem statement... it needs focus
So once again I have misunderstood a basic concept - PostExecute cannot be used to write out in the way I was trying. As people have pointed out, there is no way to do anything with the buffer contents here.
I cannot take credit for this answer, as again someone smarter than me came to the rescue, but I have got permission from them to post the code in case it is useful to anyone. I hope I have explained this OK, as I only just understand it myself and am very much learning as I go along.
First of all, make sure you have the following using directives at the top of your script:
using System.Reflection;
using System.Linq;
using System.Collections.Generic;
These are going to be used to get properties for the Output Buffer and to allow me to output the first item in the list to pos_1, the second to pos_2, etc.
As usual I have two boolean variables to determine if I have passed the row which indicates the rows of data I want have started or ended, and I have my List.
bool passedSOF;
bool passedEOF;
List<string> fieldList = new List<string>();
Here is where it differs: the row containing END-OF-FIELDS indicates I am done collecting rows, so when I hit that point I should write out my collected List to the output buffer. The aim is to take all of the multiple rows containing field names and turn them into a single row with multiple columns, with the field names populated across those columns in the row order they appeared.
if (Row.RawData.Contains("END-OF-FIELDS"))
{
passedEOF = true;
//IF WE HAVE GOT TO THIS POINT, WE HAVE ALL THE DATA IN OUR LIST NOW
OutputBuffer.AddRow();
var fields = typeof(OutputBuffer).GetProperties();
//SET UP AND INITIALISE A VARIABLE TO HOLD THE ROW NUMBER COUNT
int rowNumber = 0;
foreach (var fieldName in fieldList)
{
//ADD ONE TO THE CURRENT VALUE OF rowNumber
rowNumber++;
//MATCH THE ROW NUMBER TO THE OUTPUT FIELD NAME
PropertyInfo field = fields.FirstOrDefault(x = > x.Name == string.Format("pos{0}", rowNumber));
if (field != null)
{
field.SetValue(OutputBuffer, fieldName);
}
}
OutputBuffer.SetEndOfRowset();
}
if (passedSOF && !passedEOF)
{
this.fieldList.Add(Row.RawData);
}
if (Row.RawData.Contains("START-OF-FIELDS"))
{
passedSOF = true;
}
So where the file contains something like this:
START-OF-FIELDS
FRUIT
DAIRY
STARCHES
END-OF-FIELDS
I have the output:
pos_1 | pos_2 | pos_3
FRUIT | DAIRY | STARCHES
So I can build a position key table to show which field will appear in which order in the current monthly file, and now I am looking forward into getting myself into more trouble splitting the actual data rows out into another table :)
I'm stuck on a problem and am wondering if I've just coded something incorrectly. The application polls every few seconds and grabs every record from a table whose sole purpose is to signify which records to act upon.
Please note I've left out the error handling code for space and readability
// Producing thread; this is triggered every 5 seconds... UGH, I hate timers
foreach (var Record in GetRecordsFromDataBase()) // returns a dictionary
{
    if (!ConcurrentDictionary.Contains(Record.Key))
        ConcurrentDictionary.TryAdd(Record.Key, Record.Value);
}
This code works great, with the irritating fact that it may (and will) select the same record multiple times until said record(s) are processed. By processed, I mean each selected record is written into its own newly created, uniquely named file; then a stored procedure is called for that record's key to remove it from the database, at which point that key is also removed from the ConcurrentDictionary.
// Consuming thread, located within another loop to allow
// the below code to continue to cycle until instructed
// to terminate
while (!ConcurrentDictionary.IsEmpty)
{
    var Record = ConcurrentDictionary.Take(1).First();
    WriteToNewFile(Record.Value);
    RemoveFromDatabase(Record.Key);
    ConcurrentDictionary.TryRemove(Record.Key);
}
For a throughput test I added 20k+ records into the table and then turned the application loose. I was quite surprised when I noticed 22k+ files that continued to increase well into 100k+ territory.
What am I doing wrong??? Have I completely misunderstood what the concurrent dictionary is used for? Did I forget a semi-colon somewhere?
First, eliminate the call to Contains. TryAdd already checks for duplicates, and returns false if the item is already present.
foreach (var Record in GetRecordsFromDataBase()) // returns a dictionary
{
    ConcurrentDictionary.TryAdd(Record.Key, Record.Value);
}
The next problem I see is that ConcurrentDictionary.Take(1).First() is not a good way to get an item from the dictionary, since the take-then-remove sequence isn't atomic. I think you want to use a BlockingCollection<T> instead; it is specifically designed for implementing a producer-consumer pattern.
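For illustration, here is a minimal producer-consumer sketch with BlockingCollection<T> (from System.Collections.Concurrent). The int/string key and value types are assumptions, GetRecordsFromDataBase, WriteToNewFile and RemoveFromDatabase are the question's hypothetical methods, and on its own this does not fix the duplicate-pickup problem described below.

// Bounded queue: Add blocks when the queue is full, applying back-pressure to the producer
var queue = new BlockingCollection<KeyValuePair<int, string>>(boundedCapacity: 1000);

// Producer thread
foreach (var record in GetRecordsFromDataBase())
{
    queue.Add(record);
}
queue.CompleteAdding(); // signal that no more items will arrive

// Consumer thread: blocks while the queue is empty and
// ends cleanly once CompleteAdding() has been called
foreach (var record in queue.GetConsumingEnumerable())
{
    WriteToNewFile(record.Value);
    RemoveFromDatabase(record.Key);
}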
Lastly, I think your problems don't really have to do with the dictionary, but with the database. The dictionary itself is thread-safe, but your dictionary is not atomic with the database. So suppose record A is in the database. GetRecordsFromDataBase() pulls it and adds it to the dictionary. Then it begins processing record A (I assume this is in another thread). Then, that first loop again calls GetRecordsFromDataBase() and gets record A again. Simultaneously, record A is processed and removed from the database. But it's too late! GetRecordsFromDataBase() already grabbed it! So that initial loop adds it to the dictionary again, after it has been removed.
I think you may need to take records that are to be processed and move them into another table entirely. That way, they won't get picked up a second time. Doing this at the C# level, rather than the database level, is going to be a problem. Either that, or you don't want to be adding records to the queue while processing records.
What am I doing wrong???
The foreach (add) loop is trying to add to the dictionary any record from the database that isn't already there.
The while (remove) loop is removing items from the database and then the dictionary, also writing them to file.
This logic looks correct. But there is a race:
GetRecordsFromDataBase(); // returns records 1 through 10.
switch context to remove loop.
WriteToNewFile(Record.Value); // write record 5
RemoveFromDatabase(Record.Key); // remove record 5 from db
ConcurrentDictionary.TryRemove(Record.Key); // remove record 5 from dictionary
switch back to add loop
ConcurrentDictionary.TryAdd(Record.Key, Record.Value); // adds record 5 even though it is no longer in the DB, because it was part of the batch already returned by GetRecordsFromDataBase()
After the item is removed the foreach loop adds it again. This is why your file count is multiplying.
foreach (var Record in GetRecordsFromDataBase()) // returns a dictionary
{
    if (!ConcurrentDictionary.Contains(Record.Key)) // this if is not required; TryAdd will do
        ConcurrentDictionary.TryAdd(Record.Key, Record.Value);
}
Try something like this:
Add loop:
foreach (var Record in GetRecordsFromDataBase()) // returns a dictionary
{
    if (ConcurrentDictionary.TryAdd(Record.Key, false)) // only adds the record if it has not been seen before
    {
        ConcurrentQueue.Enqueue(Record); // enqueue the record
    }
}
Remove loop:
KeyValuePair<int, string> record; // substitute whatever your record type actually is
if (ConcurrentQueue.TryDequeue(out record))
{
    if (ConcurrentDictionary.TryUpdate(record.Key, true, false)) // flip the value from false to true
    {
        WriteToNewFile(record.Value);   // write the record to its own file
        RemoveFromDatabase(record.Key); // remove the record from the db
    }
}
This will leave items in the dictionary for each record processed. You can remove them from the dictionary eventually but multithreading involving a db can be tricky.
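Pulling those two loops together, a self-contained sketch of the dictionary-plus-queue pattern might look like the following. The int/string key and value types and the two stubs are placeholders for whatever the real application uses.

using System.Collections.Concurrent;
using System.Collections.Generic;

class RecordPump
{
    // Key -> "has this record been processed yet?" (false = queued, true = done)
    readonly ConcurrentDictionary<int, bool> seen = new ConcurrentDictionary<int, bool>();
    readonly ConcurrentQueue<KeyValuePair<int, string>> work = new ConcurrentQueue<KeyValuePair<int, string>>();

    // Producer side: call this with each polled batch
    public void Produce(IEnumerable<KeyValuePair<int, string>> batch)
    {
        foreach (var record in batch)
        {
            // TryAdd fails for keys already queued or processed,
            // so each record is enqueued at most once
            if (seen.TryAdd(record.Key, false))
                work.Enqueue(record);
        }
    }

    // Consumer side: drain whatever has been queued so far
    public void ConsumeAll()
    {
        KeyValuePair<int, string> record;
        while (work.TryDequeue(out record))
        {
            // Flip false -> true; succeeds exactly once per key
            if (seen.TryUpdate(record.Key, true, false))
            {
                WriteToNewFile(record.Value);
                RemoveFromDatabase(record.Key);
            }
        }
    }

    // Stubs standing in for the question's methods
    void WriteToNewFile(string value) { }
    void RemoveFromDatabase(int key) { }
}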
I have a table in a SQL database which has a ID column (auto incrementing) and is set to be the primary key. The table consists of this ID and an account name.
I then have a bit of code which reads this table and populates a ListView with the data. The problem is that if I order by the account name, I get duplicates listed in the ListView; if I order by the ID, I don't see any duplicates.
The original data in the SQL database contains no duplicate account names, so obviously that's what I'd like to see in the ListView.
This is the LINQ I'm using to grab the data:
public static IEnumerable<Client2> GetClientList()
{
    return (IEnumerable<Client2>)from c in entity.Client2s
                                 orderby c.AccountName
                                 select c;
}
And this is the code which is being used to create the listview...
// Clear the listview
listViewClient.Items.Clear();

// Get imported client list from database
foreach (Client2 c in SQLHandler.GetClientList())
{
    ListViewItemClient lvi = new ListViewItemClient(c.AccountName, c);
    listViewClient.Items.Add(lvi);
}
As I say, if I change this to orderby c.ID then it returns data as expected. I've also tried adding an index to AccountName. I do use a custom listview item subclass, but all that does is store a reference to the Client object.
Any idea how I can resolve this?
Thanks,
Just to clarify for anyone else reading this: it was programmer error. My data did indeed contain duplicates, but because of the sort order they weren't listed together, so I didn't see them when manually checking the data. It was only when I started displaying the ID that I realised they weren't sequential.
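For anyone wanting to double-check their own data the same way, a quick LINQ sketch for finding duplicate account names, assuming the question's entity context:

// List the account names that occur more than once
var duplicates = entity.Client2s
    .GroupBy(c => c.AccountName)
    .Where(g => g.Count() > 1)
    .Select(g => g.Key)
    .ToList();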
OK, so I'm basically trying to determine which way is more efficient, performance-wise, when checking whether items exist in a database.
I'm using LINQ to SQL on WP7 with SQL Server CE.
I'm going to be importing multiple objects into the database. There is a pretty good possibility that some of those objects already exist in the database, so I need to check each item as it comes in and, if it's already there, skip it; otherwise add it.
There were two approaches that came to mind. The first was using a foreach and checking if an object exists in the db with the same name:
foreach (var item in items)
{
    // Make an individual call to the db for every item
    var possibleItem = /* SQL SERVER STATEMENT WITH WHERE CONDITION */;
}
Making individual calls to the db sounds pretty resource-intensive, though. So the other idea was to do a full select of all the objects in the db, store them in a list, and then do pretty much the same thing with the foreach, except now I don't have to connect to the db - I have direct access to the list. What are your thoughts on these approaches? Is there a better way?
If you can easily sort your items, you could do a single select from the database and step through your in-memory list while reading through the database results to determine which items are new (essentially a merge join). That should be much faster than multiple selects while still conserving memory.
using (var reader = cmd.ExecuteReader())
{
    while (reader.HasRows && reader.Read())
    {
        var id = reader.GetInt32(0);
        // test how id compares to the memory list
    }
}
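To flesh out that comment, a sketch of the merge step, assuming items is a List<Item> sorted ascending by an integer Id and that the query orders by the same key column (all names here are hypothetical):

int i = 0;
var newItems = new List<Item>();

using (var reader = cmd.ExecuteReader()) // e.g. "SELECT Id FROM Items ORDER BY Id"
{
    while (reader.Read())
    {
        int dbId = reader.GetInt32(0);

        // Everything in the list with a smaller key than the current db row is new
        while (i < items.Count && items[i].Id < dbId)
            newItems.Add(items[i++]);

        // An equal key already exists in the db; skip it
        if (i < items.Count && items[i].Id == dbId)
            i++;
    }
}

// Anything left after the db rows run out is new as well
while (i < items.Count)
    newItems.Add(items[i++]);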
But if memory is of no concern, I'd probably just read all the keys from the database into memory for simplicity.
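A sketch of that simpler approach: read every key once into a HashSet, then test membership in memory (again, names are hypothetical):

// Load every existing key once
var existingIds = new HashSet<int>();
using (var reader = cmd.ExecuteReader()) // e.g. "SELECT Id FROM Items"
{
    while (reader.Read())
        existingIds.Add(reader.GetInt32(0));
}

// Insert only the items whose keys aren't already present
foreach (var item in items)
{
    if (!existingIds.Contains(item.Id))
        InsertItem(item); // hypothetical insert helper
}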
I'm writing a Database Editor / Bill of Materials Maker (2 separate .exe) for work, and I have a crazy issue. Here's how the flow works in the apps: Open Database, Search Database, Check Items Needed, Send to BOM Maker, Save as .xls.
So far, I can send checked items to the BOM Maker, but only if I open the search window and check the items without actually searching the list. Currently, in the Search Form of the Database Editor, I have this loop:
for (int i = 0; i < rowCount; i++)
{
    if (ResultBox1.Items[i].Checked == true)
    {
        // Code that creates the .txt file to be loaded by the BOM Maker...
    }
}
The loop works flawlessly, but only if I avoid using the search function. The search function does clear the ListView and repopulate it with results, but why would that matter?
The error I get is:
InvalidArgument=Value of '22' is not valid for 'index'. Parameter name: index
'22' being the index of the item I checked, relative to the array I originally used to populate the ListView.
Unless I need to look into my search method, is there another way to perform this action? I'm useless at "foreach" loops; could anyone give me an opinion?
Thank you!
Does this work?
foreach (ListViewItem item in ResultBox1.Items)
{
    if (item.Checked)
    {
        // Do something with it
    }
}
It looks like the major problem is that you're getting your index range from your database results, but after a search your ListView no longer accurately reflects the database results you're using for that index range. You've forgotten to update something somewhere when you do your search.
Probably the easiest way to handle the issue is to remove the dependency on the database results and depend only on the ListView's Items collection. For example:
// The non-generic Items collection needs an explicit element type in the query
var qry = from ListViewItem item in ResultBox1.Items
          where item.Checked
          select item;

foreach (var item in qry)
{
    // handle checked items individually
}
Assuming it is a System.Windows.Forms.ListView, the control already exposes the checked items directly:
foreach (ListViewItem item in ResultBox1.CheckedItems)
{
    // Do stuff
}