I have no control over how the data is saved in this table. However, I have to query the table and combine the rows that share the same pn_id value into one row/record.
For instance, the current data structure is as follows,
Here we have the same pn_id repeated with different question ids. In my opinion, this should really have been saved as one pn_id with each question as a separate column. However, I have to retrieve the data below as one record, like this..
Any idea how this can be done?
Thanks
Here's some pseudocode for the transform algorithm. Note that it requires scanning the entire data set twice; there are a few other opportunities to improve the efficiency, for example, if the input data can be sorted. Also, since it's pseudocode, I haven't added handling for null values.
var columnNames = new HashSet<string> { "pn_id" };
foreach (var record in data)
    columnNames.Add(record.question_id.ToString());

var table = new DataTable();
foreach (var name in columnNames)
    table.Columns.Add(new DataColumn(name, typeof(string)));

// Rows.Find in the helper below requires a primary key on the table
table.PrimaryKey = new[] { table.Columns["pn_id"] };

foreach (var record in data)
{
    var targetRecord = CreateNewOrGetExistingRecord(table, record.pn_id);
    targetRecord[record.question_id.ToString()] = record.char_value ?? record.date_value.ToString();
}
And here's a sketch of the helper method:
DataRow CreateNewOrGetExistingRecord(DataTable table, object primaryKeyValue)
{
    // Rows.Find uses the primary key ("pn_id") configured above
    var result = table.Rows.Find(primaryKeyValue);
    if (result != null)
        return result;

    // create a new row, add it to the table, and return it to the caller
    var row = table.NewRow();
    row["pn_id"] = primaryKeyValue;
    table.Rows.Add(row);
    return row;
}
The structure is fine. It wouldn't make sense to have one column per question, because you would have to add a new column every time a new question was added.
Your problem can easily be solved with PIVOT. Take a look at this link for an explanation.
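Since the linked explanation isn't reproduced here, here is a rough in-memory LINQ equivalent of the pivot, reusing the data collection from the answer above (a sketch, not the linked SQL PIVOT itself; the property names are assumptions based on the question):
// Group the flat rows by pn_id and turn each group's questions into
// a question-id -> answer lookup (one logical "row" per pn_id).
var pivoted = data
    .GroupBy(r => r.pn_id)
    .Select(g => new
    {
        pn_id = g.Key,
        Answers = g.ToDictionary(
            q => q.question_id.ToString(),
            q => q.char_value ?? q.date_value.ToString())
    })
    .ToList();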
I have two large Excel files. I am able to get the rows of these Excel files into lists using linqtoexcel. The issue is that I need to use a string from one object in the first list to find whether it is part of, or contained inside, another string in an object of the second list. I was trying the following, but the process is taking too long, as each list has over 70,000 items.
I have tried using an Any statement but have not been able to pull results. If you have any ideas, please share.
List<ExcelOne> exOne = new List<ExcelOne>();
List<ExcelTwo> exTwo = new List<ExcelTwo>();
I am able to build the first and second lists and can verify there are objects in them. Here is how I thought I would work through the lists to find matches. Note that once I have found a match, I want to create a new class and add it to a new list.
List<NewFormRow> rows = new List<NewFormRow>();
foreach (var item in exOne)
{
    //I am going through each item in list one
    foreach (var thing in exTwo)
    {
        //I now want to check if exTwo.importantRow has or
        //contains any part of the string from item.id
        if (thing.importantRow.Contains(item.id))
        {
            NewFormRow adding = new NewFormRow()
            {
                Idfound = item.id,
                ImportantRow = thing.importantRow
            };
            rows.Add(adding);
            Console.WriteLine("added one");
        }
    }
}
If you know a quicker way around this please share. Thank you.
It's hard to improve on this substring approach. The question is whether you have to do it here. Can't you do it where you fill the lists? Then you wouldn't need this additional step.
However, maybe you find this LINQ query more readable:
List<NewFormRow> rows = exOne
.SelectMany(x => exTwo
.Where(x2 => x2.importantRow.Contains(x.id))
.Select(x2 => new NewFormRow
{
Idfound = x.id,
ImportantRow = x2.importantRow
}))
.ToList();
I have a function which inserts a record into the database. I want to make sure that there are no duplicate entries in the database. The function first checks if there is a query string parameter: if there is, it acts in edit mode, otherwise in insert mode. There is a function which returns the currently added records in the database. I need to check for duplication based on two columns before inserting into the database.
myService = new myService();
myFlow mf = new myFlow();
if (!string.IsNullOrEmpty(Request["myflowid"]))
{
mf = myService.Getmyflow(Convert.ToInt32(Request["myflowid"]));
}
int workcount = 0;
int.TryParse(txtWorkCount.Text, out workcount);
mf.Name = txtName.Text.Trim();
mf.Description = txtDescription.Text.Trim();
mf.FunctionCode = txtFunctioneCode.Text.Trim();
mf.FunctionType = txtFunctioneType.Text.Trim();
mf.WorkCount = workcount;
if (mf.WorkFlowId == 0)
{
mf.SortOrder = 0;
mf.Active = true;
mf.RecordDateTime = DateTime.Now;
message = "Saved Successfully";
}
else
{
_editMode = true;
message = "Update Successfully";
}
int myflowId = mfService.AddEditmyflow(mf);
I want to check for duplication based on functiontype and functioncode. Another function, mfService.Getmyflows(), returns the currently added records in the database.
How can I check for duplication using LINQ?
First of all, what database do you use? Many databases support upsert behavior (update or insert, depending on whether the data was found or not): for example, MERGE in MS SQL Server, MERGE in Oracle, INSERT ... ON DUPLICATE KEY UPDATE in MySQL, and so on. This could be the preferred solution, since an upsert is usually an atomic operation.
In your particular case, do you use transactions? Are you sure no one will insert data after you have checked for duplicates but before you have inserted your record? Example:
#1 thread                 #2 thread
look for duplicates
...                       look for duplicates
no duplicates found       ...
...                       no duplicates found
insert data_1             ...
...                       insert data_1
This will end up with the duplicates you are trying to avoid.
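If an upsert isn't available to you, here is a minimal sketch of guarding the check-then-insert with a serializable transaction (this assumes System.Transactions is referenced and that mfService talks to a single database; it is an illustration, not your service's actual API):
// The check and the insert run under one serializable transaction,
// which closes the race shown above.
using (var scope = new TransactionScope(TransactionScopeOption.Required,
       new TransactionOptions { IsolationLevel = IsolationLevel.Serializable }))
{
    bool exists = mfService.Getmyflows()
        .Any(x => x.FunctionCode == mf.FunctionCode
               && x.FunctionType == mf.FunctionType);
    if (!exists)
        mfService.AddEditmyflow(mf);
    scope.Complete();
}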
According to your code, you are populating data from the GUI and adding only one item.
If you have access to the myService code, you could add a method that queries an item by your two columns, instead of querying all items via mfService.Getmyflows() and searching through that collection in your code (see the sketch after the snippet below). It would be more performant (especially if you have indexes on those columns) and more memory efficient.
And finally, checking whether a single element exists inside the collection can easily be done:
var alreadyExist = mfService.Getmyflows()
.Any(x => x.Column1 == value1 && x.Column2 == value2);
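And a sketch of the targeted service method suggested above; the context field and set names are assumptions, not part of your code:
// Hypothetical service method: filters on the two columns in the database
// instead of materializing every row first.
public bool MyflowExists(string functionCode, string functionType)
{
    return _context.myflows.Any(x => x.FunctionCode == functionCode
                                  && x.FunctionType == functionType);
}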
I have a DataTable "users" with a column "is_published" in it. It has about 100k rows.
What is the fastest way to update the values in that column so that every row has the same value, 1?
I tried the classic foreach loop and it's slow. I also tried LINQ:
dsData.Tables["users"].Select().ToList().ForEach(x => x["is_published"] = 1);
and it still isn't fast enough.
Also, the variant with an Expression doesn't work for me, because after that the fields are ReadOnly and I can't change the values again.
This is C#.
When you create your table you can simply give your column a default value:
DataTable dt = new DataTable();
dt.Columns.Add("is_published", typeof(System.Int32));
dt.Columns["is_published"].DefaultValue = 1;
then when you need to change a row back to the default value (or will you need to?)
// Say your user selects the row whose index is 2,
// and say your column's index no is 5.
// Index the row directly; ItemArray returns a copy, so writes to it are lost.
dt.Rows[2][5] = dt.Columns["is_published"].DefaultValue;
or
dt.Rows[2]["is_published"] = dt.Columns["is_published"].DefaultValue;
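Note that DefaultValue only applies to rows added after it is set; it does not touch rows that already exist. A quick check:
// New rows pick up the default automatically.
DataRow row = dt.NewRow();
dt.Rows.Add(row);
Console.WriteLine(row["is_published"]);   // prints 1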
Separate the select and the update into two operations. Skip the ToList() operation and instead iterate afterwards over the IEnumerable collection using foreach, updating the values:
var rows = dsData.Tables["users"].Select();
foreach (var row in rows)
{
row["is_published"] = 1;
}
The ToList call forces an immediate query evaluation, which in this case acts as a copy of all items from the IEnumerable collection, so you can gain some speed here. I ran some tests, and the result in this case (using your code and the modification) is: ToList is 3 times slower than iterating over the IEnumerable and updating directly!
IMO 40 seconds is an awful lot for 100K items. If your DataTable is bound to a DataGridView or some other UI control, I believe that the update of the GUI is what takes so long, not the update of the values itself. In my tests the update using ToList took fractions of a second (on my simple Lenovo netbook with an AMD E-450 processor, and I assume you are not using a 386 machine). Try suspending the UI before updating and refreshing the values, then enable it again - there is an example in this SO post.
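A minimal sketch of that suspend/re-enable idea, assuming the table is bound to a DataGridView named grid (the control name is an assumption):
// Detach the binding so the grid doesn't repaint on every cell change.
var source = grid.DataSource;
grid.DataSource = null;

foreach (DataRow row in dsData.Tables["users"].Rows)
    row["is_published"] = 1;

// Reattach once, causing a single refresh.
grid.DataSource = source;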
My original post (as I can see you gained some speed using the code - interesting):
More an experiment on my part, but it is possible to:
convert the table to XML
fetch all elements that should be changed
change them
write the changed XML back to the table
The code:
// temp table
var dataTable = new DataTable("Table 1");
dataTable.Columns.Add("title", typeof(string));
dataTable.Columns.Add("number", typeof(int));
dataTable.Columns.Add("subnum1", typeof(int));
dataTable.Columns.Add("subnum2", typeof(int));
// add temp data
Enumerable.Range(1, 100000).ToList().ForEach(e =>
{
dataTable.Rows.Add(new object[] { "A", 1, 2, 3 });
});
// "bulk update"!
var sb = new StringBuilder();
using (var xmlWriter = XmlWriter.Create(sb))
{
    dataTable.WriteXml(xmlWriter);   // disposing the writer flushes the XML into sb
}
var xml = XDocument.Parse(sb.ToString());
// take column to change
var elementsToChange = xml.Descendants("title").ToList();
// the list is referenced to the XML, so the XML is changed too!
elementsToChange.ForEach(e => e.Value = "Z");
// clear current table
dataTable.Clear();
// write changed data back to table
dataTable.ReadXml(xml.CreateReader());
The table is updated. IMO the parts that make this solution slow are the conversion from and to XML and the filling of the StringBuilder. On the other hand, the pure update of the list is probably faster than the table update.
Finally! I sped up the update so it takes 2-3 seconds. I added BeginLoadData() and EndLoadData():
DataTable dt = ToDataSet().Tables["users"];
var sb = new StringBuilder();
using (var xmlWriter = XmlWriter.Create(sb))
{
    dt.WriteXml(xmlWriter);   // disposing the writer flushes the XML into sb
}
var xml = XDocument.Parse(sb.ToString());
xml.Descendants("is_published").ToList().ForEach(e => e.Value = "1");
dt.Clear();
dt.BeginLoadData();
dt.ReadXml(xml.CreateReader());
dt.EndLoadData();
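For what it's worth, the same BeginLoadData()/EndLoadData() pairing might speed up the plain loop as well, without the XML round trip (an untested sketch against the same table):
DataTable dt = ToDataSet().Tables["users"];
dt.BeginLoadData();   // suspends notifications, index maintenance and constraints
foreach (DataRow row in dt.Rows)
    row["is_published"] = 1;
dt.EndLoadData();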
I'm new to C# and my current code seems a bit of a hack - how can I keep combining referenced DataTables more efficiently (fewer lines of code), or at least more readably?
(I have no say in the framework, BTW, I'm playing it as it lies ;))
I lose the last datatable once I reference it again
I can't import rows into a datatable that doesn't share the same schema.
This code addresses both issues, but man, it seems fugly.
DataTable dt_ref = new DataTable();
DataTable dt_final = new DataTable();
bool datatable_is_cloned = false;
String error_msg = String.Empty;
String[] customer_ids = { "cust_a", "cust_b", "cust_c" };
foreach (String id in customer_ids)
{
    if (!obj_cust.Select_by_customer_id(id, ref dt_ref, ref error_msg))
    {
        //do error handling
    }
    if (!datatable_is_cloned)
    {
        dt_final = dt_ref.Clone();   // copy the schema once
        datatable_is_cloned = true;
    }
    foreach (DataRow r in dt_ref.Rows)
        dt_final.ImportRow(r);
}
Edited for clarity.
The problem is that every time I loop and hit the database with another customer id, my dt_ref loses all of its previous results and gains only the new ones (yes, this is expected behavior).
I want to keep a running total of all the results from the select method.
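For comparison, a possible tightening using DataTable.Merge, which copies both schema and rows in one call (a sketch; obj_cust and the error handling are as in the original):
DataTable dt_final = new DataTable();
foreach (String id in customer_ids)
{
    DataTable dt_ref = new DataTable();
    if (!obj_cust.Select_by_customer_id(id, ref dt_ref, ref error_msg))
    {
        //do error handling
    }
    dt_final.Merge(dt_ref);   // adds the missing columns on the first pass, then appends rows
}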
I want to delete rows from a table in my database.
I have the member
private static WeightScaleEntities Weight = new WeightScaleEntities();
This member holds my database context. In the database I have the table User_Activity.
I want to delete rows from User_Activity by a given i_UserActivityId, as follows:
//Get the rows to delete
var deleteUserActivities = from details in Weight.User_Activity
where details.Id == i_UserActivityId
select details;
Now I want to delete these rows, so I tried:
foreach (var item in deleteUserActivities)
{
m_WeightScaleEntities.User_Activity.*
}
and I don't get the method DeleteOnSubmit! Why?
Is there another option?
User_Activity.*: is that a typo?
What I think you want is:
foreach (var item in deleteUserActivities)
{
Weight.DeleteObject(item);
}
And then SaveChanges() on the object context.
BTW, a static object context is not a good idea. You should carefully control the life cycle of object contexts.
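For illustration, the same delete with a short-lived context instead of the static field (a sketch; it assumes WeightScaleEntities is an ObjectContext, as the DeleteObject method suggests):
using (var context = new WeightScaleEntities())
{
    var deleteUserActivities = (from details in context.User_Activity
                                where details.Id == i_UserActivityId
                                select details).ToList();

    foreach (var item in deleteUserActivities)
        context.DeleteObject(item);   // mark each row for deletion

    context.SaveChanges();            // issue the DELETEs
}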
There is more than one way to execute a deletion in Entity Framework. You must take into account what you want to delete: one row or more.
When you need to delete one row from a table, you can use these ways:
// first way
using (WeightScaleEntities db = new WeightScaleEntities())
{
var deleteUserActivities = from details in db.User_Activity
where details.Id == i_UserActivityId
select details;
if (deleteUserActivities.Count() > 0)
{
db.User_Activity.Remove(deleteUserActivities.First());
db.SaveChanges();
}
}
The line deleteUserActivities.Count() > 0 checks whether the query returned any results, and deleteUserActivities.First() deletes only the first row if the query returns a set of rows, to make the process more secure if you don't know about the data in the table.
// second way
using (WeightScaleEntities db = new WeightScaleEntities())
{
var deleteUserActivities = (from details in db.User_Activity
where details.Id == i_UserActivityId
select details).SingleOrDefault();
if (deleteUserActivities != null)
{
db.User_Activity.Remove(deleteUserActivities);
// or use this line
//db.Entry(deleteUserActivities).State = System.Data.Entity.EntityState.Deleted;
db.SaveChanges();
}
}
You can also use Single or SingleOrDefault to get a single object. Single or SingleOrDefault will throw an exception if the result contains more than one element. Use them where you are sure that the result contains only one element; if the result has multiple elements, then there must be some problem.
Also, if you need to remove one or more rows, use this way:
using (WeightScaleEntities db = new WeightScaleEntities())
{
var deleteUserActivities = (from details in db.User_Activity
where details.Id == i_UserActivityId
select details).ToList<User_Activity>(); // User_Activity here is your entity type
foreach (var deleteObject in deleteUserActivities)
{
db.Entry(deleteObject).State = System.Data.Entity.EntityState.Deleted;
}
db.SaveChanges();
}
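If you are on EF 6 or later, the marking loop can also be replaced with RemoveRange (a small sketch reusing the same query):
// RemoveRange marks every entity in the sequence as Deleted in one call.
db.User_Activity.RemoveRange(deleteUserActivities);
db.SaveChanges();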
Best regards, and sorry about my English.