Linq to SQL / filter duplicates

I have a view in SQL Server 2012 that contains duplicates. I want to keep only the newest row for each number and filter out all the others. Can anyone help me?
My view in SQL Server 2012 has these columns:
guid (as primary key), number, datetime and name
+--------+---------+----------------------------------+-------+
| guid   | number  | datetime                         | name  |
+--------+---------+----------------------------------+-------+
| b105.. | 1234567 | 2014-07-07T16:32:20.854+02:00:00 | Name1 |
| s1b5.. | 1111222 | 2014-07-06T16:30:21.854+02:00:00 | Name2 |
| b17a.. | 1234567 | 2014-07-06T15:22:17.854+02:00:00 | Name1 |
| f205.. | 1233333 | 2014-07-07T17:40:20.854+02:00:00 | Name3 |
| b11t.. | 1233333 | 2014-07-04T11:12:15.854+02:00:00 | Name3 |
| rt85.. | 1111222 | 2014-07-07T21:55:52.854+02:00:00 | Name2 |
+--------+---------+----------------------------------+-------+
The name is always the same when the number is the same, e.g. number 1234567 is always Name1.
I want to filter the table so that I get only the newest row for each number, without duplicates.
So the result should be:
+--------+---------+----------------------------------+-------+
| guid   | number  | datetime                         | name  |
+--------+---------+----------------------------------+-------+
| b105.. | 1234567 | 2014-07-07T16:32:20.854+02:00:00 | Name1 |
| f205.. | 1233333 | 2014-07-07T17:40:20.854+02:00:00 | Name3 |
| rt85.. | 1111222 | 2014-07-07T21:55:52.854+02:00:00 | Name2 |
+--------+---------+----------------------------------+-------+
How can I do this in LINQ? "Distinct" does not work because the guid and the datetime differ in every row.

You can do it by grouping the rows by two columns (number and name) and then taking the newest row from each group. Something like this:
var query =
    from col in viewData
    group col by new
    {
        col.name,
        col.number,
    } into groupedCol
    // Order descending so that First() is the newest row of the group;
    // selecting the row itself keeps its guid and datetime.
    select groupedCol.OrderByDescending(dateCol => dateCol.datetime).First();

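var res = list.GroupBy(c => c.number).Select(group => group.OrderByDescending(c1 => c1.datetime).First()).ToList();
This should work as long as datetime is stored as an instance of DateTime instead of a string. The ordering has to be descending so that First() returns the newest row of each group.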
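
The grouping approach can be tried out in memory with LINQ to Objects. The following is only a sketch: the ViewRow record, its property names, and the NewestPerNumber class are hypothetical stand-ins for the view, and the time zone offsets from the sample data are left out for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one row of the view.
public record ViewRow(string Id, int Number, DateTime Datetime, string Name);

public static class NewestPerNumber
{
    // Keeps only the most recent row for each distinct number.
    public static List<ViewRow> Filter(IEnumerable<ViewRow> rows) =>
        rows.GroupBy(r => r.Number)
            .Select(g => g.OrderByDescending(r => r.Datetime).First())
            .ToList();

    public static void Main()
    {
        // Sample rows taken from the question.
        var rows = new List<ViewRow>
        {
            new("b105", 1234567, new DateTime(2014, 7, 7, 16, 32, 20), "Name1"),
            new("s1b5", 1111222, new DateTime(2014, 7, 6, 16, 30, 21), "Name2"),
            new("b17a", 1234567, new DateTime(2014, 7, 6, 15, 22, 17), "Name1"),
            new("f205", 1233333, new DateTime(2014, 7, 7, 17, 40, 20), "Name3"),
            new("b11t", 1233333, new DateTime(2014, 7, 4, 11, 12, 15), "Name3"),
            new("rt85", 1111222, new DateTime(2014, 7, 7, 21, 55, 52), "Name2"),
        };

        foreach (var row in Filter(rows))
            Console.WriteLine($"{row.Id} {row.Number} {row.Name}");
    }
}
```

With the sample rows, this keeps b105, f205 and rt85, which are the three rows the question asks for.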

Related

Select row based on dynamically created value

I have the SQL table below, in which the values of the Parameter and Parameter Value columns are created dynamically. The design caters for additional parameters being added at a later stage, so I don't think turning each parameter into its own column is ideal for this design.
|---------------------|------------------|------------------|
| Parameter | Parameter Value | Computers |
|---------------------|------------------|------------------|
| Phase | New | PC1 |
|---------------------|------------------|------------------|
| Phase | New | PC2 |
|---------------------|------------------|------------------|
| Phase | Redevelopment | PC3 |
|---------------------|------------------|------------------|
| Cost | High | PC1 |
|---------------------|------------------|------------------|
| Cost | High | PC2 |
|---------------------|------------------|------------------|
| Cost | Cost | PC3 |
|---------------------|------------------|------------------|
Given a scenario where a user searches by Phase = "New" AND Cost = "High", the result should be PC1.
At the moment, all I can think of is this:
SELECT *
FROM projectParameter
WHERE Parameter = 'Phase' AND Value = 'New' AND Parameter = 'Cost' AND Value = 'High'
Thanks in advance!

First, select all rows that match any part of your filtering.
Then aggregate all those rows to get one result per computer.
Then check each result to see whether it contains all the required filtering constraints.
SELECT
Computers
FROM
yourTable
WHERE
(Parameter = 'Phase' AND ParameterValue = 'New')
OR
(Parameter = 'Cost' AND ParameterValue = 'High')
GROUP BY
Computers
HAVING
COUNT(*) = 2
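
The same filter, group, and count idea can be sketched in C# with LINQ to Objects. The ParamRow record and the ParameterSearch class are hypothetical names, not part of the question. The sketch counts distinct parameters rather than rows (the equivalent of COUNT(DISTINCT Parameter)), which is an added safeguard in case the table can hold the same parameter twice for one computer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one row of the parameter table.
public record ParamRow(string Parameter, string ParameterValue, string Computer);

public static class ParameterSearch
{
    // Returns the computers that have a matching row for every wanted parameter.
    public static List<string> ComputersMatchingAll(
        IEnumerable<ParamRow> rows,
        IReadOnlyDictionary<string, string> wanted) =>
        rows
            // 1. Keep rows that satisfy any one of the conditions.
            .Where(r => wanted.TryGetValue(r.Parameter, out var v) && v == r.ParameterValue)
            // 2. One group per computer.
            .GroupBy(r => r.Computer)
            // 3. Keep groups that cover all conditions.
            .Where(g => g.Select(r => r.Parameter).Distinct().Count() == wanted.Count)
            .Select(g => g.Key)
            .ToList();

    public static void Main()
    {
        var rows = new List<ParamRow>
        {
            new("Phase", "New", "PC1"),
            new("Phase", "New", "PC2"),
            new("Phase", "Redevelopment", "PC3"),
            new("Cost", "High", "PC1"),
            new("Cost", "High", "PC2"),
            new("Cost", "Cost", "PC3"),
        };
        var wanted = new Dictionary<string, string> { ["Phase"] = "New", ["Cost"] = "High" };

        Console.WriteLine(string.Join(", ", ComputersMatchingAll(rows, wanted)));
    }
}
```

Note that with the sample rows as listed in the question, both PC1 and PC2 satisfy Phase = New and Cost = High, so both are returned.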

From what I understand, you want the list of all computers that have an entry for both of the conditions below:
Parameter = 'Cost' AND Parameter Value = 'High'
Parameter = 'Phase' AND Parameter Value = 'New'
You can try the SQL query below to see whether it gives the result you need:
SELECT t.Computers
FROM yourTable t
WHERE t.Parameter = 'Cost'
AND t.ParameterValue = 'High'
AND EXISTS (
SELECT 1 FROM yourTable p WHERE p.Computers = t.Computers AND p.Parameter = 'Phase' AND p.ParameterValue = 'New');

Insert into database from an Excel file only rows that don't exist, using C#, EF, and SQL Server

I read an Excel file and insert its data into a database table, but every time I do this it adds the existing rows again along with the new data. I only want to insert the rows that aren't already in the table. My unique ID is the current timestamp.
For example, this is what currently happens when I do the first insert:
ExcelFile               Database Table
a | b | date            a | b | date
------------------      ------------------
1 | 1 | 2018/02/12      1 | 1 | 2018/02/12
2 | 2 | 2018/03/12      2 | 2 | 2018/03/12
This happens when I do the second insert:
ExcelFile               Database Table
a | b | date            a | b | date
------------------      ------------------
1 | 1 | 2018/02/12      1 | 1 | 2018/02/12
2 | 2 | 2018/03/12      2 | 2 | 2018/03/12
3 | 3 | 2018/04/12      1 | 1 | 2018/02/12
                        2 | 2 | 2018/03/12
                        3 | 3 | 2018/04/12
I use Entity Framework to perform this and the ExcelDataReader package:
var result = reader.AsDataSet();
DataTable dt = result.Tables[0]; // here I store the data from the Excel file
foreach (DataRow row in dt.Rows)
{
    using (AppContext context = new AppContext())
    {
        string date = row.ItemArray[4].ToString();
        DateTime parseDate = DateTime.Parse(date);
        Data datos = new Data
        {
            a = row.ItemArray[0].ToString(),
            b = row.ItemArray[1].ToString(),
            c = row.ItemArray[2].ToString(),
            d = row.ItemArray[3].ToString(),
            e = parseDate
        };
        context.Data.Add(datos);
        context.SaveChanges();
    }
}
Is there a way to filter the Excel file, or to compare it with the table?
I'm all ears.

Check for an existing row before adding it. Insert the following right after the line where you calculate parseDate:
var existingRow = context.Data.FirstOrDefault(d => d.e == parseDate); // Note that ".e" should refer to your "date" field
if (existingRow != null)
{
    // This row already exists
}
else
{
    // It doesn't exist, go ahead and add it
}

If "a" is the PK on the table and unique across rows, then I would check for an existing row by ID before inserting. This is similar to Mike's response, with one consideration: if the table has a number of columns, I would avoid returning the entity and just do an existence check using .Any():
var a = row.ItemArray[0].ToString();
if (!context.Data.Any(x => x.a == a))
{
    // insert the row as a new entity
}
The caveat here is that if the Excel file contains edits (existing rows whose data has changed), this will not accommodate them.
For bulk import processes, I would typically load the Excel data into a staging table first, purging the staging table before each import. I would then have one set of entities mapped to the staging tables and another mapped to the "real" tables. If a "modified date" can be extracted from the file for each record, I would also store the import date/time as part of the application, so that when selecting rows to import from the staging table I only get records whose modified date/time is later than the last import date/time. From there you can query the staging table in batches and distinguish new records from modifications of existing ones. I find that querying entities on both sides of the migration is more flexible than dealing with an in-memory block for the import. With small imports it may not be worthwhile, but for larger files, where you will want to work with smaller subsets and filtering, it can make things easier.
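
For small files, the existence check can also be done in memory instead of with one query per row. The sketch below is an illustration under assumptions: ImportRow and ImportFilter are hypothetical names, and the set of existing dates is assumed to have been loaded beforehand (with Entity Framework, for instance, by a query along the lines of context.Data.Select(d => d.e).ToList()).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one row read from the Excel file.
public record ImportRow(string A, string B, DateTime Date);

public static class ImportFilter
{
    // Returns only the rows whose date is not yet known.
    // Repeated dates inside the file itself are skipped as well.
    public static List<ImportRow> NewRows(
        IEnumerable<ImportRow> fileRows,
        IEnumerable<DateTime> existingDates)
    {
        var seen = new HashSet<DateTime>(existingDates);
        // HashSet.Add returns false when the value is already present.
        return fileRows.Where(r => seen.Add(r.Date)).ToList();
    }

    public static void Main()
    {
        var existing = new[] { new DateTime(2018, 2, 12), new DateTime(2018, 3, 12) };
        var fileRows = new List<ImportRow>
        {
            new("1", "1", new DateTime(2018, 2, 12)),
            new("2", "2", new DateTime(2018, 3, 12)),
            new("3", "3", new DateTime(2018, 4, 12)),
        };

        foreach (var row in NewRows(fileRows, existing))
            Console.WriteLine($"{row.A} | {row.B} | {row.Date:yyyy/MM/dd}");
    }
}
```

Only the third row survives the filter here, so a single SaveChanges() call after adding the returned rows would be enough. Like the .Any() check, this does not detect edits to rows that already exist.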

I was able to do exactly what I needed with the help of @MikeH.
With this, only the rows with a different DateTime were added (the DateTime is always an ascending value in my case).
foreach (DataRow row in dt.Rows) // dt = my DataTable loaded with ExcelDataReader
{
    using (AppContext context = new AppContext())
    {
        string date = row.ItemArray[4].ToString();
        DateTime parseDate = DateTime.Parse(date); // I did a parse because the column "e" only accepted DateTime and not String types.
        var existingRow = context.Data.FirstOrDefault(d => d.e == parseDate);
        if (existingRow != null)
        {
            Console.WriteLine("Do Nothing");
        }
        else
        {
            Data datos = new Data
            {
                a = row.ItemArray[0].ToString(),
                b = row.ItemArray[1].ToString(),
                c = row.ItemArray[2].ToString(),
                d = row.ItemArray[3].ToString(),
                e = parseDate
            };
            context.Data.Add(datos);
            context.SaveChanges();
        }
    }
}

Is it possible to fetch a link table without fetching all links?

OK, so first off I would like to say that I'm using NHibernate for my project, and in this project we have (among other things) a sync function that syncs from a central MSSQL database to a local SQLite one. I know that NHibernate was not made to sync databases, but I would like to do this anyway.
My database model is medium-sized, so I can't include all of it here, but the problem is that I have two data tables and one link table that links them.
Database model:
| Product | | ProductLinkProducer | | Producer |
|--------------------| |---------------------| |---------------------|
| Id | | LinkId | | Id |
| Name | | Product | | Name |
| ProductLinkProducer| | Producer | | ProductLinkProducer |
Database:
| Product | | ProductLinkProducer | | Producer |
|---------| |---------------------| |----------|
| Id | | LinkId | | Id |
| Name | | ProductId | | Name |
| | | ProducerId | | |
So during the sync, I first copy all data from the Product table, and then from the Producer table (basically var products = session.Query<Products>().ToList()). This is done by NHibernate in a single statement each:
select
product0_.id as id2_,
product0_.name as name2_
from
Product product0_
Now I have to evict all items from the first session (products.ForEach(x => session.Evict(x));).
The save (products.ForEach(x => syncSession.Save(x));) then issues one insert per row, as expected.
When saving the data in the link table, I had hoped there would also be just a single select. However, that is not the case. It first runs a select as above, but then, before every row it inserts, it runs additional selects for the Product and for the Producer.
So it will look something like:
Products:
    select
    insert (id 1)
    insert (id 2)
Producer:
    select
    insert (id 101)
    insert (id 102)
ProdLinkProducer:
    select
    select id 1 from Products
    select id 1 from Products
    select id 101 from Producer
    select id 2 from Products
    select id 2 from Products
    select id 102 from Producer
    select id 102 from Producer
    insert
    insert
So is there any way to avoid this behavior?
EDIT
To better explain what I have done, I have created a small test project. It can be found here: https://github.com/tb2johm/NHibernateSync
(I would have preferred to add only a gist, but I think that would have left out too much, sorry...)
EDIT2
I have found one way to make it work, but I don't like it.
The solution is to create a ProductLinkProducerSync table in the database model that contains no links, only the values, and to synchronize only these "sync" tables instead of the ordinary link tables. As I said, I don't like this idea: if I change anything in the database, I have more or less the same data in two places that I need to update.

I was unable to find an out-of-the-box NHibernate way of doing what you are asking.
However, I was able to get the desired behavior (I guess something is better than nothing :) by manually rebinding the FK references (proxy classes) to the new session:
var links = session.Query<ProductLinkProducer>().ToList();
links.ForEach(x => session.Evict(x));
foreach (var link in links)
{
    link.Product = syncSession.Get<Product>(link.Product.Id);
    link.Producer = syncSession.Get<Producer>(link.Producer.Id);
    syncSession.Save(link);
}
syncSession.Flush();
Here is the generalized version using NHibernate metadata services:
static IEnumerable<Action<ISession, T>> GetRefBindActions<T>(ISessionFactory sessionFactory)
{
    var classMeta = sessionFactory.GetClassMetadata(typeof(T));
    var propertyNames = classMeta.PropertyNames;
    var propertyTypes = classMeta.PropertyTypes;
    for (int i = 0; i < propertyTypes.Length; i++)
    {
        var propertyType = propertyTypes[i];
        if (propertyType.IsAssociationType && !propertyType.IsCollectionType)
        {
            var propertyName = propertyNames[i];
            var propertyClass = propertyType.ReturnedClass;
            var propertyClassMeta = sessionFactory.GetClassMetadata(propertyClass);
            yield return (session, target) =>
            {
                var oldValue = classMeta.GetPropertyValue(target, propertyName, EntityMode.Poco);
                var id = propertyClassMeta.GetIdentifier(oldValue, EntityMode.Poco);
                var newValue = session.Get(propertyClass, id);
                classMeta.SetPropertyValue(target, propertyName, newValue, EntityMode.Poco);
            };
        }
    }
}
and applying it to your Sync method:
private static void Sync<T>(string tableName, ISession session, ISession syncSession)
{
    Console.WriteLine("Fetching data for ####{0}####...", tableName);
    var sqlLinks = session.Query<T>();
    var links = sqlLinks.ToList();
    Console.WriteLine("...Done");
    Console.WriteLine("Evicting data...");
    links.ForEach(x => session.Evict(x));
    Console.WriteLine("...Done");
    Console.WriteLine("Saving data...");
    var bindRefs = GetRefBindActions<T>(syncSession.SessionFactory).ToList();
    foreach (var link in links)
    {
        foreach (var action in bindRefs) action(syncSession, link);
        syncSession.Save(link);
    }
    Console.WriteLine("...Flushing data...");
    syncSession.Flush();
    Console.WriteLine("...Done");
    Console.WriteLine("\n\n\n");
}

Converting Stored Procedure PIVOT table to LINQ query

My idea is to convert a stored procedure that I defined earlier into a LINQ query, because I can't get the stored procedure's data back from the database; the reason lies in the query itself. I need to convert an existing table to a pivot table and then return the data via ASP.NET WebAPI. The pivot table is dynamic: when the user adds a new article, it is added to the pivot table as a column.
The normal table looks as follows:
datum | rate | article
--------------------------------
2013-01-03 | 97,766..| DE011
2013-01-05 | 90.214..| DE090
2013-01-10 | 97,890..| DE011
2013-01-13 | 65,023..| DE220
2013-01-13 | 97,012..| DE300
2013-01-15 | 97,344..| DE300
....
The pivot table should look as follows:
rate | DE011 | ... | DE090 | ... | DE220 | ... | DE300
-------------------------------------------------------
100 | 0 | ... | 1 | ... | 0 | ... | 0
98 | 2 | ... | 0 | ... | 1 | ... | 0
97 | 0 | ... | 0 | ... | 0 | ... | 2
90 | 0 | ... | 1 | ... | 0 | ... | 4
...
The datum column is important for the pivot table because the user has to enter some input in the Angular view; in this example the user chooses the dateFrom and dateTo inputs. The rates are rounded, as you can see in the pivot column rate. The article descriptions become the column titles of the new table, and each cell counts how often that rate occurs for that article.
My stored procedure works fine in SQL Server. But after the SP was imported into the EDM model, Entity Framework gave it a return type of int, which is useless for my purposes.
Here is the code of EF:
public virtual int getMonthIsin(Nullable<System.DateTime> fromDate, Nullable<System.DateTime> toDate)
{
    var fromDateParameter = fromDate.HasValue ?
        new ObjectParameter("fromDate", fromDate) :
        new ObjectParameter("fromDate", typeof(System.DateTime));
    var toDateParameter = toDate.HasValue ?
        new ObjectParameter("toDate", toDate) :
        new ObjectParameter("toDate", typeof(System.DateTime));
    return ((IObjectContextAdapter)this).ObjectContext.ExecuteFunction("getMonthIsin", fromDateParameter, toDateParameter);
}
I have also tried .SqlQuery() in my WebAPI controller as follows:
return db.Database.SqlQuery<IQueryable>("EXEC getMonthIsin @fromDate, @toDate", fromDate, toDate).AsQueryable();
But it doesn't work.
So now the idea is to try a LINQ query and return its values, but I have no idea how to implement this :(
So far I have tried roughly this LINQ query:
public IQueryable getDatas(DateTime fromDate, DateTime toDate)
{
    var query = from t in db.table1
                where t.datum >= fromDate && t.datum <= toDate
                group t by t.article
                into grp
                select new
                {
                    articles = grp.Key,
                    rate = grp.Select(g => g.rate),
                    total = grp.Select(g => g.rate).AsQueryable()
                };
    return query;
}
But this doesn't really return what I need. It would be very helpful if anyone could help me! I will upvote each good answer!

Entity Framework is not suitable for fetching dynamic data structures. Dapper is the tool to use here. It basically is a collection of extension methods on IDbConnection, one of which is Query that returns an IEnumerable<dynamic>, where dynamic is an object that implements IDictionary<string, object>. Getting the data is really simple:
IEnumerable<IDictionary<string, object>> result;
using (var cnn = new SqlConnection(connectionString))
{
    cnn.Open();
    var p = new DynamicParameters();
    p.Add("@fromDate", fromDate, DbType.DateTime);
    p.Add("@toDate", toDate, DbType.DateTime);
    result = (IEnumerable<IDictionary<string, object>>)
        cnn.Query(sql: "getMonthIsin",
                  param: p,
                  commandType: CommandType.StoredProcedure);
}
Now you have an IEnumerable<IDictionary<string, object>> in which one item (IDictionary<string, object>) represents one row of key/value pairs from the stored procedure's result set:
Key Value
----- ----
rate 100
DE011 0
... ...
DE090 1
... ...
DE220 0
... ...
DE300 0
It's up to you how to go from here. You could, for instance, convert the result to a DataTable as shown here: Dictionary<string, object> to DataTable.
By the way, Dapper isn't only simple, it's blazingly fast too.
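
If the pivot has to be built in C# rather than by the stored procedure, the same dictionary-per-row shape can be produced from the raw rows with LINQ to Objects. This is a rough sketch, not the stored procedure's logic: RateRow and RatePivot are hypothetical names, and rounding with Math.Round (which rounds midpoints to the nearest even number by default) is an assumption about how the rates are meant to be rounded.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one row of the source table.
public record RateRow(DateTime Datum, double Rate, string Article);

public static class RatePivot
{
    // One dictionary per rounded rate: "rate" plus one count column per article.
    public static List<Dictionary<string, object>> Build(
        IEnumerable<RateRow> rows, DateTime fromDate, DateTime toDate)
    {
        var inRange = rows.Where(r => r.Datum >= fromDate && r.Datum <= toDate).ToList();

        // The column set is dynamic: every article present in the date range.
        var articles = inRange.Select(r => r.Article)
                              .Distinct()
                              .OrderBy(a => a, StringComparer.Ordinal)
                              .ToList();

        return inRange
            .GroupBy(r => (int)Math.Round(r.Rate))
            .OrderByDescending(g => g.Key)
            .Select(g =>
            {
                var line = new Dictionary<string, object> { ["rate"] = g.Key };
                foreach (var article in articles)
                    line[article] = g.Count(r => r.Article == article);
                return line;
            })
            .ToList();
    }

    public static void Main()
    {
        var rows = new List<RateRow>
        {
            new(new DateTime(2013, 1, 3), 97.766, "DE011"),
            new(new DateTime(2013, 1, 5), 90.214, "DE090"),
            new(new DateTime(2013, 1, 10), 97.890, "DE011"),
            new(new DateTime(2013, 1, 13), 65.023, "DE220"),
            new(new DateTime(2013, 1, 13), 97.012, "DE300"),
            new(new DateTime(2013, 1, 15), 97.344, "DE300"),
        };

        foreach (var line in Build(rows, new DateTime(2013, 1, 1), new DateTime(2013, 1, 31)))
            Console.WriteLine(string.Join(" | ", line.Select(kv => $"{kv.Key}={kv.Value}")));
    }
}
```

A list of dictionaries like this serializes to JSON objects with one property per article, which suits a Web API response whose columns are not known at compile time.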

Selecting records with max version

I have a table as follows:
ConfigID | VersionNo | ObjectType
ConfigID and VersionNo constitute the unique key.
I want to be able to select the record with the highest VersionNo for each ConfigID, filtered by object type.
I have tried
configs = (from config in configRepository.FindBy(x => x.ObjectType.Equals(typeof(Node).ToString(), StringComparison.InvariantCultureIgnoreCase))
group config by config.ConfigID into orderedConfigs
select orderedConfigs.OrderBy(x => x.ConfigID).ThenByDescending(x => x.VersionNo).First());
EDIT: I must add that the FindBy is basically just a where clause.
But I am getting no results. Please help me with this.
EDIT:
The data in the table could look like:
3fa1e32a-e341-46fd-885d-8f06ad0caf2e | 1 | Sybrin10.Common.DTO.Node
3fa1e32a-e341-46fd-885d-8f06ad0caf2e | 2 | Sybrin10.Common.DTO.Node
51d2a6c7-292d-42fc-ae64-acd238d26ccf | 3 | Sybrin10.Common.DTO.Node
51d2a6c7-292d-42fc-ae64-acd238d26ccf | 4 | Sybrin10.Common.DTO.Node
8dbf7a33-441f-40bc-b594-e34c5a2c3f51 | 1 | Some Other Type
91413e73-4997-4643-b7d2-e4c208163c0d | 1 | Some Other Type
From this I would only want to retrieve the second and fourth records as they have the highest version numbers for the configID and are of the required type.

I'm not sure this works 100% because I wrote it outside VS :) but the idea should be sound:
var configs = configRepository.Where(x => x.ObjectType == typeof(Node).ToString());
var grouped = configs.GroupBy(x => x.ConfigID);
var a = grouped.Select(x => x.OrderByDescending(y => y.VersionNo).First());
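
The idea can be checked in memory with the sample data from the question. In this sketch the Config record and the LatestConfigs class are hypothetical stand-ins for the entity and repository, and the case-insensitive comparison mirrors the one in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one row of the config table.
public record Config(Guid ConfigID, int VersionNo, string ObjectType);

public static class LatestConfigs
{
    // For the given object type, returns the highest version of each ConfigID.
    public static List<Config> ForType(IEnumerable<Config> configs, string objectType) =>
        configs.Where(c => string.Equals(c.ObjectType, objectType, StringComparison.OrdinalIgnoreCase))
               .GroupBy(c => c.ConfigID)
               .Select(g => g.OrderByDescending(c => c.VersionNo).First())
               .ToList();

    public static void Main()
    {
        var a = Guid.Parse("3fa1e32a-e341-46fd-885d-8f06ad0caf2e");
        var b = Guid.Parse("51d2a6c7-292d-42fc-ae64-acd238d26ccf");
        var configs = new List<Config>
        {
            new(a, 1, "Sybrin10.Common.DTO.Node"),
            new(a, 2, "Sybrin10.Common.DTO.Node"),
            new(b, 3, "Sybrin10.Common.DTO.Node"),
            new(b, 4, "Sybrin10.Common.DTO.Node"),
            new(Guid.Parse("8dbf7a33-441f-40bc-b594-e34c5a2c3f51"), 1, "Some Other Type"),
            new(Guid.Parse("91413e73-4997-4643-b7d2-e4c208163c0d"), 1, "Some Other Type"),
        };

        foreach (var c in ForType(configs, "Sybrin10.Common.DTO.Node"))
            Console.WriteLine($"{c.ConfigID} | {c.VersionNo}");
    }
}
```

This keeps version 2 of the first ConfigID and version 4 of the second, i.e. the second and fourth records from the question.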

This looks like LINQ to SQL, but in plain SQL I can write the query like this:
SELECT ConfigID, MAX(VersionNo) FROM CUSTOMIZE_COLUMNS_DETAIL WHERE
ObjectType = 'objectType' GROUP BY ConfigID
I have tried to replicate the scenario in SQL; it might be useful to you. Thanks.
