Need help for a complex linq query - c#

OK, so I've got a DataTable; here's the schema:
DataTable dt = new DataTable();
dt.Columns.Add("word", typeof(string));
dt.Columns.Add("pronunciation", typeof(string));
The table is already filled, and I'm trying to write a LINQ query so that I can output something like this to the console (or anywhere else):
Pronunciation : akses9~R => (list of words)
I want to output the most common pronunciations and all the words that use each of them.

Something like this should give you what you want:
// AsEnumerable() and Field<T>() need using System.Data; and using System.Linq;
var results = dt.AsEnumerable().GroupBy(dr => dr.Field<string>("pronunciation"));
foreach (var result in results)
{
    Console.Write("Pronunciation : {0} => ", result.Key);
    foreach (var row in result)
    {
        Console.Write("{0} ", row.Field<string>("word"));
    }
    Console.WriteLine();
}
GroupBy gives you a sequence of IGrouping objects. The Key property of each grouping contains the pronunciation, and the grouping itself contains all the rows (and therefore the words) that share it.

Sounds like you want a group by:
var q =
from row in dt.Rows.Cast<DataRow>()
let val = new { Word = (string)row["word"], Pronunciation = (string)row["pronunciation"] }
group val by val.Pronunciation into g
select g;
foreach (var group in q)
{
Console.WriteLine(
"Pronunciation : {0} => ({1})",
group.Key,
String.Join(", ", group.Select(x => x.Word).ToArray()));
}

// Lists the words for a single pronunciation.
var words = from row in dt.AsEnumerable()
            where row.Field<string>("pronunciation") == "akses9~R"
            select row.Field<string>("word");
foreach (string word in words)
{
    Console.WriteLine(word);
}

Related

Order list by date and sequence number

I have been learning C# (I am relatively new) and I have a list of input files whose names follow the format "inputFile_dateSequence_sequenceNumber.xml". The code I am using to sort the file list in ascending order is below:
using System;
using System.Collections.Generic;
using System.Linq;
public class Program
{
public static void Main()
{
string[] inputfiles = { "inputFile_2020-04-10_1.xml",
"inputFile_2020-04-10_2.xml",
"inputFile_2020-04-10_4.xml",
"inputFile_2020-04-10_3.xml",
"inputFile_2020-04-10_10.xml",
"inputFile_2020-05-10_1.xml",
"inputFile_2020-05-10_2.xml",
"inputFile_2020-05-10_10.xml",
"inputFile_2020-05-10_11.xml" };
List<string> stringList = new List<string>();
foreach (string s in inputfiles)
{
string bz = s.Split('.')[0];
stringList.Add(bz);
}
string[] Separator = new string[] { "_" };
var sortedList = stringList.OrderBy(i => i).ThenBy(s => int.Parse(s.Split(Separator, StringSplitOptions.None)[2])).ToList();
foreach (string i in sortedList)
{
Console.WriteLine(i);
}
}
}
But in ascending order, I am getting output as below:
inputFile_2020-04-10_1
inputFile_2020-04-10_10
inputFile_2020-04-10_2
inputFile_2020-04-10_3
inputFile_2020-04-10_4
inputFile_2020-05-10_1
inputFile_2020-05-10_10
inputFile_2020-05-10_11
inputFile_2020-05-10_2
but my desired output is like below:
inputFile_2020-04-10_1.xml
inputFile_2020-04-10_2.xml
inputFile_2020-04-10_3.xml
inputFile_2020-04-10_4.xml
inputFile_2020-04-10_10.xml
inputFile_2020-05-10_1.xml
inputFile_2020-05-10_2.xml
inputFile_2020-05-10_10.xml
inputFile_2020-05-10_11.xml
What modification does the code need in order to produce this output?
You could achieve this by using a regex (requires using System.Text.RegularExpressions;):
var sortedList = stringList.OrderBy(x => Regex.Replace(x, @"\d+", m => m.Value.PadLeft(10, '0')));
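To see why the padding trick works, it helps to look at the key it produces: every run of digits is left-padded with zeros to the same width, so plain text comparison agrees with numeric comparison. A small sketch (the class and method names are illustrative, and ordinal comparison is my choice here to keep the result culture-independent):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class PaddedSort
{
    // Left-pads every run of digits to 10 characters.
    // Numbers longer than 10 digits are left as they are, so they would not sort correctly.
    public static string PadNumbers(string s)
    {
        return Regex.Replace(s, @"\d+", m => m.Value.PadLeft(10, '0'));
    }

    // Sorts the names by their padded form without changing the names themselves.
    public static string[] Sort(string[] names)
    {
        return names.OrderBy(n => PadNumbers(n), StringComparer.Ordinal).ToArray();
    }
}
```

For example, PadNumbers("inputFile_2020-04-10_2") yields "inputFile_0000002020-0000000004-0000000010_0000000002", which sorts before the padded form of "inputFile_2020-04-10_10".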
Loads of ways to solve this, as you can see...
You can order first by just the date part of the name, then by the length of the name string (so that shorter numbers like 1 and 7 sort before longer ones like 10 and 17), and then by the name itself:
.OrderBy(x => x.Remove(20))
.ThenBy(x=>x.Length)
.ThenBy(x=>x)
Perhaps though you'd parse the entire thing:
class MyFile
{
    public string FullName { get; set; }
    public string Name { get; set; }
    public DateTime Date { get; set; }
    public int Num { get; set; }

    public MyFile(string fullname)
    {
        // Path lives in System.IO
        var bits = Path.GetFileNameWithoutExtension(fullname).Split('_');
        FullName = fullname;
        Name = bits[0];
        Date = DateTime.Parse(bits[1]);
        Num = int.Parse(bits[2]);
    }
}
Then
var parsed = inputfiles.Select(x => new MyFile(x));
Now you can OrderBy that:
parsed.OrderBy(m => m.Date).ThenBy(m => m.Num);
Try to avoid doing everything at some base level of string/int primitive; this is OO programming! 😀
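For context, the code in the question fails because OrderBy(i => i) already compares the whole string, so the ThenBy on the number only breaks ties between identical names. A compact sketch of the parsing idea, assuming every name follows the "inputFile_yyyy-MM-dd_N.xml" pattern from the question (the class and method names are hypothetical):

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Linq;

public static class FileNameSort
{
    // Splits "inputFile_2020-04-10_3.xml" into its date and sequence number.
    // Throws if the name does not follow the assumed pattern.
    public static (DateTime Date, int Sequence) ParseKey(string fileName)
    {
        string[] parts = Path.GetFileNameWithoutExtension(fileName).Split('_');
        DateTime date = DateTime.ParseExact(parts[1], "yyyy-MM-dd", CultureInfo.InvariantCulture);
        int sequence = int.Parse(parts[2], CultureInfo.InvariantCulture);
        return (date, sequence);
    }

    // Sorts by date, then by sequence number, and returns the original names.
    public static string[] Sort(string[] fileNames)
    {
        return fileNames
            .Select(name => new { Name = name, Key = ParseKey(name) })
            .OrderBy(x => x.Key.Date)
            .ThenBy(x => x.Key.Sequence)
            .Select(x => x.Name)
            .ToArray();
    }
}
```

Because the original name is carried through the query, the ".xml" extension is preserved in the output without any extra work.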
Use the following code:
var sortedList = stringList
.OrderBy(s => s.Substring(0, s.LastIndexOf('_'))) // sort by inputFile_dateSequence
.ThenBy(s => int.Parse(s.Substring(s.LastIndexOf('_') + 1))) // sort by sequenceNumber as integer
.ToList();
Update. If you want to preserve file extension, you can use the following:
List<string> sortedList = inputfiles
.Select(s =>
{
int nameSeparator = s.LastIndexOf('_');
int extSeparator = s.LastIndexOf('.');
return new
{
FullName = s,
BaseName = s.Substring(0, nameSeparator),
Sequence = int.Parse(s.Substring(nameSeparator + 1, extSeparator - nameSeparator - 1)),
Extension = s.Substring(extSeparator + 1)
};
})
.OrderBy(f => f.BaseName) // sort by inputFile_dateSequence
.ThenBy(f => f.Sequence) // sort by sequenceNumber
.ThenBy(f => f.Extension) // sort by file extension
.Select(f => f.FullName)
.ToList();
Using a DataTable and LINQ may make the task easier.
Below is code that uses a DataTable to generate your desired output:
public static void Main()
{
string[] inputfiles = { "inputFile_2020-04-10_1.xml",
"inputFile_2020-04-10_2.xml",
"inputFile_2020-04-10_4.xml",
"inputFile_2020-04-10_3.xml",
"inputFile_2020-04-10_10.xml",
"inputFile_2020-05-10_1.xml",
"inputFile_2020-05-10_2.xml",
"inputFile_2020-05-10_10.xml",
"inputFile_2020-05-10_11.xml" };
DataTable dt = new DataTable();
dt.Columns.Add("filename", typeof(string));
dt.Columns.Add("date", typeof(DateTime));
dt.Columns.Add("sequence", typeof(int));
foreach (string s in inputfiles)
{
DataRow dr = dt.NewRow();
dr[0] = s;
dr[1] = Convert.ToDateTime(s.Split('_')[1]);
dr[2] = Convert.ToInt32(s.Split('_')[2].Split('.')[0]);
dt.Rows.Add(dr);
}
DataTable sortedDT = dt.AsEnumerable()
.OrderBy(r => r.Field<DateTime>("date"))
.ThenBy(r => r.Field<int>("sequence"))
.CopyToDataTable();
foreach (DataRow dr in sortedDT.Rows)
{
Console.WriteLine(dr[0]);
}
}
Output:

C# deserialize JSON without custom class into Dictionary or DataTable

From an API I get JSON like this:
66 results, each a player with about 31 attributes that contain either a single value or an array of values.
{"api":
  {"results":66,
   "players":
   [{
      "player_id":10,
      "player_name":"Gustavo Ferrareis",
      ... (some 31 stats)
      "shots":{
        "total":13,
        "on":2
      },
      ...
    },
    {
      "player_id":21,
      ...
    }]
  }
}
I wanted to know if there's a way to deserialize the collection of players into a Dictionary, or better a DataTable, with all 31 attributes, without a custom player class and without accessing every attribute individually.
So far I tried accessing the players list by:
var data = JObject.Parse(json);
foreach (var field in data)
{
var data2 = JObject.Parse(field.Value.ToString());
foreach (var field2 in data2)
{
if (field2.Key.ToString() == "players")
{
dynamic array2 = JsonConvert.DeserializeObject(field2.Value.ToString());
foreach (var field3 in array2)
Console.WriteLine("Player_id: " + field3.player_id.ToString() + " - Player_name: " + field3.player_name.ToString());
}
}
}
which returns
Player_id: 10 - Player_name: Gustavo Ferrareis
Player_id: 22 - Player_name: Getúlio
Player_id: 22 - Player_name: Getúlio
I imagine something like:
Dictionary<string, object> dict = new Dictionary<string, object>();
foreach (var player in array2)
dict.Add(player.Key(), player.Value());
Surely the answer can't be that I have to write a custom player class and then use that?
I'm open to any advice.
Thank you.
You can use Newtonsoft.Json.Linq and get the required result as shown below (note that the property names are case-sensitive, so "players" must match the JSON):
var jObject = JObject.Parse(jsonFromAPI)["api"];
var formattedPlayers = jObject["players"].Children()
    .Select(p => $"Player_id: {p["player_id"]} - Player_name: {p["player_name"]}");
Or, if you want a dictionary, use the following:
var playersDictionary = jObject["players"].Children().Select(p => new {player_id = p["player_id"], player_name = p["player_name"]}).ToDictionary(x => x.player_id, v => v.player_name);
If you want to display all properties of the players, you need a loop like the one below:
var allPlayerDetails = new List<Dictionary<string, object>>();
foreach (JObject player in jObject["players"].Children())
{
var playerDictionary = player.Properties()
.ToDictionary<JProperty, string, object>(property => property.Name, property => property.Value);
allPlayerDetails.Add(playerDictionary);
}
for (var index = 0; index < allPlayerDetails.Count; index++)
{
var playerDictionary = allPlayerDetails[index];
Console.WriteLine(Environment.NewLine);
Console.WriteLine(string.Format("Printing Player# {0}", index));
foreach (var d in playerDictionary)
{
Console.WriteLine(d.Key + " - " + d.Value);
}
}
If you want to convert to DataTable from list of players, then you can do something like below:
DataTable dt = new DataTable();
foreach (var column in allPlayerDetails.SelectMany(p => p.Keys).Select(k => k.Trim()).Distinct())
{
dt.Columns.Add(new DataColumn(column));
}
foreach (var details in allPlayerDetails)
{
var dr = dt.NewRow();
foreach (DataColumn dc in dt.Columns)
{
dr[dc.ColumnName] = details.ContainsKey(dc.ColumnName) ? details[dc.ColumnName] : null;
}
dt.Rows.Add(dr);
}
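One thing the DataTable conversion above leaves open is nested objects such as "shots", which end up as a single cell holding a JSON fragment. The sketch below flattens them into columns like "shots_total". It swaps in System.Text.Json (built into .NET Core 3.0 and later) instead of Newtonsoft.Json so that it has no package dependency; the class name, the "_" separator, and the choice to store every value as a string are my own assumptions, not part of the answer above.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;
using System.Text.Json;

public static class PlayerTable
{
    // Copies one JSON object into 'row'; nested objects get a "parent_" prefix.
    // Arrays are stored as their raw JSON text.
    static void Flatten(JsonElement obj, string prefix, IDictionary<string, string> row)
    {
        foreach (JsonProperty p in obj.EnumerateObject())
        {
            string name = prefix + p.Name;
            if (p.Value.ValueKind == JsonValueKind.Object)
                Flatten(p.Value, name + "_", row);
            else
                row[name] = p.Value.ToString();
        }
    }

    // Expects the {"api": {"players": [ ... ]}} shape shown in the question.
    public static DataTable FromJson(string json)
    {
        var rows = new List<Dictionary<string, string>>();
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            JsonElement players = doc.RootElement.GetProperty("api").GetProperty("players");
            foreach (JsonElement player in players.EnumerateArray())
            {
                var row = new Dictionary<string, string>();
                Flatten(player, "", row);
                rows.Add(row);
            }
        }

        // One string column per distinct attribute name across all players.
        var table = new DataTable();
        foreach (string column in rows.SelectMany(r => r.Keys).Distinct())
            table.Columns.Add(column, typeof(string));

        // Attributes a player lacks stay DBNull in that row.
        foreach (var row in rows)
        {
            DataRow dr = table.NewRow();
            foreach (var pair in row)
                dr[pair.Key] = pair.Value;
            table.Rows.Add(dr);
        }
        return table;
    }
}
```

Calling PlayerTable.FromJson(json) returns one row per player. Typed columns (int, bool) would need a mapping from JsonValueKind to a .NET type, which is left out here.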
You could parse into IEnumerable<string> like this:
IEnumerable<string> players = JObject.Parse(json)["api"]["players"]
    .Children()
    .Select(jo => $"Player_id: {jo["player_id"]} - Player_name: {jo["player_name"]}");
A similar approach would work for Dictionary using ToDictionary instead of Select, but it depends on what you consider key and value.
Here is a single statement that gets the player_id and player_name into a List or a Dictionary:
//To List
var resultToList = JObject.Parse(jsonstring)["api"]["players"]
.Select(p => (p["player_id"].ToString(), p["player_name"].ToString()))
.ToList();
//To Dictionary
var resultToDict = JObject.Parse(jsonstring)["api"]["players"]
.Select(p => (p["player_id"].ToString(), p["player_name"].ToString()))
.ToDictionary(x=>x.Item1, y=>y.Item2);

Comparing several Datatables with same structure

I'm creating Datatables from .csv files. This part actually works. My current issue is the following one:
I have to compare two or more DataTables with the same structure. So
Datatable1:
KeyColumn, ValueColumn
KeyA, ValueA
KeyB, ValueB
KeyC, ValueC
Datatable2:
KeyColumn, ValueColumn
KeyB, ValueB
KeyC, ValueC
KeyD, ValueD
And this should end up like this:
ResultDatatable:
KeyColumn, ValueColumn (of DT1), ValueColumn (of DT2)
KeyA, ValueA
KeyB, ValueB (of DT1), ValueB (of DT2)
KeyC, ValueC (of DT1), ValueC (of DT2)
KeyD, ValueD
I can't even manage to insert the data of the first DataTable because of the different column names. Another problem is that the DataTables have the same column names, so I can't add those columns to the ResultDatatable as they are.
I have tried many approaches without finding a solution. Any ideas how to address this problem?
Edit:
The solution with dictionaries was too complicated for my case, so I kept trying to solve it with the DataTables. The source of the problem was very unexpected.
Renaming a column to a name that contains a dot ('.') results in losing all data in that column.
E.g., if you have the DataTable dt:
PrimaryColumn, ValueColumn
KeyA1, KeyB1
KeyA2, KeyB2
After dt.Columns["ValueColumn"].ColumnName = "Value.Column"; you will lose all data in that column. I will ask MS whether this is intended or a bug in the .NET Framework. Here is my final code (C#). I have a List<string> keys, whose columns remain in the resultTable, and a List<string> values, whose columns are added once for every table that should be compared.
private DataTable CompareTables(List<AnalyseFile> files, Query query, List<string> keys, List<string> values) {
// Add first table completely to resultTable
DataTable resultTable =
files[0].GetDataTable(false, query.Header, query.Startstring, query.Endstring, query.Key).Copy();
foreach (string value in values) {
resultTable.Columns[value].ColumnName = "(" + files[0].getFileNameWithoutExtension() + ") " + value;
}
// Set primary keys
resultTable.PrimaryKey = keys.Select(key => resultTable.Columns[key]).ToArray();
// process remaining tables
for (int i = 1; i < files.Count; i++) {
DataTable currentTable = files[i].GetDataTable(false, query.Header, query.Startstring, query.Endstring, query.Key);
// Add value-columns to the resultTable
foreach (string value in values) {
resultTable.Columns.Add("(" + files[i].getFileNameWithoutExtension() + ") " + value);
}
// Set again primary keys
currentTable.PrimaryKey = keys.Select(key => currentTable.Columns[key]).ToArray();
// populate common Rows
foreach (DataRow dataRow in resultTable.Rows) {
foreach (DataRow row in currentTable.Rows) {
foreach (string key in keys) {
if (dataRow[key].ToString().Equals(row[key].ToString())) {
foreach (string value in values) {
string colname = "(" + files[i].getFileNameWithoutExtension() + ") " + value;
dataRow[colname] = row[value];
}
}
}
}
}
// Get all Rows, which do not exist in resultTable yet
IEnumerable<string> isNotinDT =
currentTable.AsEnumerable()
.Select(row => row.Field<string>(keys[0]))
.Except(resultTable.AsEnumerable().Select(row => row.Field<string>(keys[0])));
// Add all the non-existing rows to resultTable
foreach (string row in isNotinDT) {
DataRow currentRow = currentTable.Rows.Find(row);
DataRow dRow = resultTable.NewRow();
foreach (string key in keys) {
dRow[key] = currentRow[key];
}
foreach (string value in values) {
dRow["(" + files[i].getFileNameWithoutExtension() + ") " + value] = currentRow[value];
}
resultTable.Rows.Add(dRow);
}
}
return resultTable;
}
Any improvements are Welcome!
OK, here is an example of my version using dictionaries.
Fiddle: http://dotnetfiddle.net/AljK9J
//Setup Sample Data
var data1 = new Dictionary<string, string>();
data1.Add("KeyA", "ValueA");
data1.Add("KeyB", "ValueB");
data1.Add("KeyC", "ValueC");
var data2 = new Dictionary<string, string>();
data2.Add("KeyB", "ValueB");
data2.Add("KeyC", "ValueC");
data2.Add("KeyD", "ValueD");
//Second DataType in the Dictionary could be something other than a Tuple
var result = new Dictionary<string, Tuple<string, string>>();
//Fill in for items existing only in data1 and in both data1 and data2
foreach(var item in data1)
{
result.Add(item.Key, new Tuple<string, string>(item.Value, data2.FirstOrDefault(x => x.Key == item.Key).Value));
}
//Fill in remaining items that exist only in data2
foreach(var item in data2.Where(d2 => !result.Any(x => x.Key == d2.Key )))
{
result.Add(item.Key, new Tuple<string, string>(null, item.Value));
}
//Demonstrating how to access the data
var formattedOutput = result.Select(x => string.Format("{0}, {1} (of D1), {2} (of D2)", x.Key, x.Value.Item1 ?? "NoValue", x.Value.Item2 ?? "NoValue"));
foreach(var line in formattedOutput)
{
Console.WriteLine(line);
}
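The same full outer join can also be built directly as a DataTable by giving the result a primary key and using Rows.Find, which avoids the nested loops over both tables. This is only a sketch under stated assumptions: exactly two input tables, the column names "KeyColumn" and "ValueColumn" from the question, non-null keys, and result column names ("ValueColumn_DT1", "ValueColumn_DT2") that I made up.

```csharp
using System.Data;

public static class TableMerge
{
    // Full outer join of two (KeyColumn, ValueColumn) tables on KeyColumn.
    public static DataTable Merge(DataTable first, DataTable second)
    {
        var result = new DataTable();
        DataColumn key = result.Columns.Add("KeyColumn", typeof(string));
        result.Columns.Add("ValueColumn_DT1", typeof(string));
        result.Columns.Add("ValueColumn_DT2", typeof(string));
        // The primary key is what makes Rows.Find usable below.
        result.PrimaryKey = new[] { key };

        Fill(result, first, "ValueColumn_DT1");
        Fill(result, second, "ValueColumn_DT2");
        return result;
    }

    // Writes each source value into the row with the same key, creating the row if needed.
    static void Fill(DataTable result, DataTable source, string targetColumn)
    {
        foreach (DataRow row in source.Rows)
        {
            object k = row["KeyColumn"];
            DataRow target = result.Rows.Find(k);
            if (target == null)
            {
                target = result.NewRow();
                target["KeyColumn"] = k;
                result.Rows.Add(target);
            }
            target[targetColumn] = row["ValueColumn"];
        }
    }
}
```

Keys that exist in only one table end up with DBNull in the other table's value column, which matches the empty cells in the desired ResultDatatable.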

DataTable Select Expression

I want to select the gesellschaft_id values that have duplicates. I used the code below, but it selects distinct gesellschaft_id values. How do I write the select expression so that it returns the rows whose gesellschaft_id occurs more than once in the DataTable?
foreach (DataRow dr1 in table1.Rows)
{
DataRow[] drDup = table2.Select("('" + dr1[0].ToString() + "' = gesellschaft_id ) AND Count(gesellschaft_id)>1");
}
This will give you the DataRows whose gesellschaft_id exists in more than one row:
var rowsWithADuplicateGesellschaftId = table1.Rows
    .Cast<DataRow>()
    .GroupBy(row => row["gesellschaft_id"])
    .Where(group => group.Count() > 1)
    .SelectMany(group => group) // flatten the groups back into rows
    .ToArray();
public ArrayList FindDuplicateRows(DataTable dTable, string colName)
{
Hashtable hTable = new Hashtable();
ArrayList duplicateList = new ArrayList();
//add duplicate item value in arraylist.
foreach (DataRow drow in dTable.Rows)
{
if (hTable.Contains(drow[colName]))
duplicateList.Add(drow);
else
hTable.Add(drow[colName], string.Empty);
}
return duplicateList;
}
Also useful duplicate find and add/remove links:
http://www.dotnetspider.com/resources/4535-Remove-duplicate-records-from-table.aspx
http://www.dotnetspark.com/kb/94-remove-duplicate-rows-value-from-datatable.aspx
You could do something like this, assuming that the first column is the one you want to check for duplicates. This of course assumes that you have LINQ available.
var duplicateIds = table2.AsEnumerable()
    .GroupBy(row => row[0])
    .Where(x => x.Count() > 1);
If you add a bit more detail to the question, that would be helpful.
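It is worth being explicit about which rows count as "duplicates". The Hashtable-based method returns only the second and later occurrences of each value, while a grouping query can return every row that shares a value with another row. A minimal sketch of both, using generic collections; the class and method names are made up for illustration:

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class DuplicateRows
{
    // Every row whose value in 'column' occurs more than once, first occurrence included.
    public static List<DataRow> AllInDuplicatedGroups(DataTable table, string column)
    {
        return table.AsEnumerable()
            .GroupBy(row => row[column])
            .Where(g => g.Count() > 1)
            .SelectMany(g => g)
            .ToList();
    }

    // Only the repeats: rows whose value was already seen in an earlier row.
    public static List<DataRow> RepeatsOnly(DataTable table, string column)
    {
        var seen = new HashSet<object>();
        // HashSet.Add returns false when the value is already present.
        return table.AsEnumerable()
            .Where(row => !seen.Add(row[column]))
            .ToList();
    }
}
```

The first form suits reporting ("show me all conflicting rows"); the second suits cleanup, because removing the returned rows leaves exactly one row per value.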

Join collection of objects into comma-separated string

In many places in our code we have collections of objects, from which we need to create a comma-separated list. The type of collection varies: it may be a DataTable from which we need a certain column, or a List<Customer>, etc.
Now we loop through the collection and use string concatenation, for example:
string text = "";
string separator = "";
foreach (DataRow row in table.Rows)
{
text += separator + row["title"];
separator = ", ";
}
Is there a better pattern for this? Ideally I would like an approach we could reuse by just sending in a function to get the right field/property/column from each object.
string.Join(", ", Array.ConvertAll(somelist.ToArray(), i => i.ToString()))
static string ToCsv<T>(IEnumerable<T> things, Func<T, string> toStringMethod)
{
    StringBuilder sb = new StringBuilder();
    foreach (T thing in things)
        sb.Append(toStringMethod(thing)).Append(',');
    if (sb.Length == 0)
        return string.Empty; // empty input: there is no trailing comma to remove
    return sb.ToString(0, sb.Length - 1); // remove trailing ','
}
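Use like this (DataRowCollection is not generic, so the rows need a Cast<DataRow>() first):
DataTable dt = ...; //datatable with some data
Console.WriteLine(ToCsv(dt.Rows.Cast<DataRow>(), row => row["ColName"].ToString()));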
or:
List<Customer> customers = ...; //assume Customer has a Name property
Console.WriteLine(ToCsv(customers, c => c.Name));
I don't have a compiler to hand but in theory it should work. And as everyone knows, in theory, practice and theory are the same. In practice, they're not.
I found that string.Join combined with a lambda passed to Select helps keep the code to a minimum.
List<string> fruits = new List<string>();
fruits.Add("Mango");
fruits.Add("Banana");
fruits.Add("Papaya");
string commaSepFruits = string.Join(",", fruits.Select(f => "'" + f + "'"));
Console.WriteLine(commaSepFruits);
List<int> ids = new List<int>();
ids.Add(1001);
ids.Add(1002);
ids.Add(1003);
string commaSepIds = string.Join(",", ids);
Console.WriteLine(commaSepIds);
List<Customer> customers = new List<Customer>();
customers.Add(new Customer { Id = 10001, Name = "John" });
customers.Add(new Customer { Id = 10002, Name = "Robert" });
customers.Add(new Customer { Id = 10002, Name = "Ryan" });
string commaSepCustIds = string.Join(", ", customers.Select(cust => cust.Id));
string commaSepCustNames = string.Join(", ", customers.Select(cust => "'" + cust.Name + "'"));
Console.WriteLine(commaSepCustIds);
Console.WriteLine(commaSepCustNames);
Console.ReadLine();
// using System.Collections;
// using System.Collections.Generic;
// using System.Data;
// using System.Linq;
// using System.Text;
public delegate string Indexer<T>(T obj);
public static string concatenate<T>(IEnumerable<T> collection, Indexer<T> indexer, char separator)
{
StringBuilder sb = new StringBuilder();
foreach (T t in collection) sb.Append(indexer(t)).Append(separator);
return sb.Remove(sb.Length - 1, 1).ToString();
}
// version for non-generic collections
public static string concatenate<T>(IEnumerable collection, Indexer<T> indexer, char separator)
{
StringBuilder sb = new StringBuilder();
foreach (object t in collection) sb.Append(indexer((T)t)).Append(separator);
return sb.Remove(sb.Length - 1, 1).ToString();
}
// example 1: simple int list
string getAllInts(IEnumerable<int> listOfInts)
{
return concatenate<int>(listOfInts, Convert.ToString, ',');
}
// example 2: DataTable.Rows
string getTitle(DataRow row) { return row["title"].ToString(); }
string getAllTitles(DataTable table)
{
return concatenate<DataRow>(table.Rows, getTitle, '\n');
}
// example 3: DataTable.Rows without Indexer function
string getAllTitles(DataTable table)
{
return concatenate<DataRow>(table.Rows, r => r["title"].ToString(), '\n');
}
In .NET 4 you can just do string.Join(", ", table.Rows.Cast<DataRow>().Select(r => r["title"]))
You could write a function that transforms a IEnumerable<string> into a comma-separated string:
public string Concat(IEnumerable<string> stringList)
{
StringBuilder textBuilder = new StringBuilder();
string separator = String.Empty;
foreach(string item in stringList)
{
textBuilder.Append(separator);
textBuilder.Append(item);
separator = ", ";
}
return textBuilder.ToString();
}
You can then use LINQ to query your collection/dataset/etc to provide the stringList.
As an aside: The first modification I would make is to use the StringBuilder Class instead of just a String - it'll save resources for you.
I love Matt Howells' answer in this post.
I just had to turn it into an extension method:
public static string ToCsv<T>(this IEnumerable<T> things, Func<T, string> toStringMethod)
Usage (I am getting all the emails and turning them into a CSV string for emails):
var list = Session.Find("from User u where u.IsActive = true").Cast<User>();
return list.ToCsv(i => i.Email);
For collections you can use this method as well, for example:
string.Join(", ", contactsCollection.Select(i => i.FirstName));
You can select any property that you want to separate.
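Putting the string.Join and selector ideas together gives the reusable helper the question asks for. The sketch below is one possible shape, not anyone's posted answer: the name JoinWith and the default separator are my own choices. Unlike the StringBuilder versions that trim a trailing separator, it returns an empty string for an empty collection instead of throwing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class JoinExtensions
{
    // Joins the string selected from each item; an empty sequence yields "".
    public static string JoinWith<T>(this IEnumerable<T> source,
                                     Func<T, string> selector,
                                     string separator = ", ")
    {
        return string.Join(separator, source.Select(selector));
    }
}
```

For a DataTable this would be called as table.Rows.Cast<DataRow>().JoinWith(row => row["title"].ToString()), and for a list of customers as customers.JoinWith(c => c.Name).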
string strTest = "1,2,4,6";
string[] Nums = strTest.Split(',');
Console.Write(Nums.Aggregate<string>((first, second) => first + "," + second));
//OUTPUT:
//1,2,4,6
Here's my favorite answer adapted to the question,
with Convert corrected to ConvertAll:
string text = string.Join(", ", Array.ConvertAll(table.Rows.Cast<DataRow>().ToArray(), row => row["title"].ToString()));
