LINQ multiple group by, how to fill in dictionary? - c#

I'm trying to retrieve a dataset from RavenDB, group it by several categories and then store it per group in a dictionary. The retrieval and groupby have been solved. However, I'm stuck on how to put the combined group (group on 4 variables) into a dictionary. So in other words: the dictionary needs to be filled with each distinct name/year/month/day combination. I need this to later on display it in a graph - that part is already covered.
Dictionary<string, int> chartInformation = new Dictionary<string, int>();
List<string> xAxisCategories = new List<string>();
if (model.Period.Value == Timespan.Day)
{
var groupedRecords = transformedRecords.GroupBy(x => new
{
x.Name,
x.DateTo.Value.Year,
x.DateTo.Value.Month,
x.DateTo.Value.Day
});
foreach (var recordGroup in groupedRecords)
{
if (!chartInformation.ContainsKey(recordGroup.Key.Name,
recordGroup.Key.Year, recordGroup.Key.Month, recordGroup.Key.Day))
// how to do this properly
{
chartInformation.Add(recordGroup.Key.?, 0);
}
if (!xAxisCategories.Contains(recordGroup.Key.?))
{
xAxisCategories.Add(recordGroup.Key.?);
}
foreach (var record in recordGroup)
{
//filling stuff here
}
}
}

You need to project your group key to a string like this:
var groupedRecords = transformedRecords.GroupBy(x => String.Format("{0}-{1}-{2}-{3}",
x.Name,
x.DateTo.Value.Year,
x.DateTo.Value.Month,
x.DateTo.Value.Day));
Then you can iterate over the groups and check for existence of a certain key in your dictionary:
foreach(var group in groupedRecords)
{
if(!chartInformation.ContainsKey(group.Key))
{
chartInformation.Add(group.Key, 0);
}
}
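However, keep in mind that projecting your key to a string as shown above can lead to collisions whenever the joined parts can be read in more than one way (for example, if a value may itself contain the separator), i.e. rows that belong to different groups may end up in the same group.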
Hope this helps.
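One way to sidestep string collisions is to keep the key structured. The following is a minimal sketch, not the answerer's code: it assumes C# 7.1 or later (value tuples with inferred element names) and uses made-up sample rows in place of transformedRecords, with a non-nullable DateTo for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample rows standing in for transformedRecords.
var transformedRecords = new[]
{
    (Name: "cpu", DateTo: new DateTime(2020, 1, 11)),
    (Name: "cpu", DateTo: new DateTime(2020, 1, 11)),
    (Name: "cpu", DateTo: new DateTime(2020, 11, 1)),
    (Name: "ram", DateTo: new DateTime(2020, 1, 11)),
};

// Value tuples compare field by field, so they can be used directly as dictionary keys.
var chartInformation = new Dictionary<(string Name, int Year, int Month, int Day), int>();

foreach (var recordGroup in transformedRecords.GroupBy(
    x => (x.Name, x.DateTo.Year, x.DateTo.Month, x.DateTo.Day)))
{
    // Here the value is simply the number of records in the group.
    chartInformation[recordGroup.Key] = recordGroup.Count();
}

foreach (var entry in chartInformation)
{
    Console.WriteLine($"{entry.Key.Name} {entry.Key.Year}-{entry.Key.Month}-{entry.Key.Day}: {entry.Value}");
}
```

The x-axis label can still be built with String.Format from the key parts when it is displayed, without using that string as the dictionary key.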

Related

Merge data from two arrays or something else

How to combine Id from the list I get from file /test.json and id from list ourOrders[i].id?
Or if there is another way?
private RegionModel FilterByOurOrders(RegionModel region, List<OurOrderModel> ourOrders, MarketSettings market, bool byOurOrders)
{
var result = new RegionModel
{
updatedTs = region.updatedTs,
orders = new List<OrderModel>(region.orders.Count)
};
var json = File.ReadAllText("/test.json");
var otherBotOrders = JsonSerializer.Deserialize<OrdersTimesModel>(json);
OtherBotOrders = new Dictionary<string, OrderTimesInfoModel>();
foreach (var otherBotOrder in otherBotOrders.OrdersTimesInfo)
{
//OtherBotOrders.Add(otherBotOrder.Id, otherBotOrder);
BotController.WriteLine($"{otherBotOrder.Id}"); //Output ID orders to the console works
}
foreach (var order in region.orders)
{
if (ConvertToDecimal(order.price) < 1 || !byOurOrders)
{
int i = 0;
var isOurOrder = false;
while (i < ourOrders.Count && !isOurOrder)
{
if (ourOrders[i].id.Equals(order.id, StringComparison.InvariantCultureIgnoreCase))
{
isOurOrder = true;
}
++i;
}
if (!isOurOrder)
{
result.orders.Add(order);
}
}
}
return result;
}
OrdersTimesModel looks like this:
public class OrdersTimesModel
{
public List<OrderTimesInfoModel> OrdersTimesInfo { get; set; }
}
test.json:
{"OrdersTimesInfo":[{"Id":"1"},{"Id":"2"}]}
Added:
I'll try to clarify the question:
There are three lists with ID:
First (all orders): region.orders, as order.id
Second (our orders): ourOrders, as ourOrders[i].id in a while loop
Third (our orders 2): from the /test.json file, as an array {"Orders":[{"Id":"12345..."...},{"Id":"12345..." ...}...]}
There is a foreach in which there is a while, where the First (all orders) list and the Second (our orders) list are compared. If the id's match, then these are our orders: isOurOrder = true;
Accordingly, those orders that isOurOrder = false; will be added to the result: result.orders.Add(order)
I need:
So that if (ourOrders[i].id.Equals(order.id, StringComparison.InvariantCultureIgnoreCase)) would include more Id's from the Third (our orders 2) list.
Or any other way to do it?
You should be able to avoid writing the loops yourself if you use LINQ (there will still be loops running in the background, but the code is much easier to read).
You can access some documentation here: https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/linq/introduction-to-linq-queries
and there are some pretty cool extension methods for arrays and other sequences: https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable?view=net-6.0 (these are great for keeping your code easy to read)
Solution
using System.Linq;
private RegionModel FilterByOurOrders(RegionModel region, List<OurOrderModel> ourOrders, MarketSettings market, bool byOurOrders)
{
var result = new RegionModel
{
updatedTs = region.updatedTs,
orders = new List<OrderModel>(region.orders.Count)
};
var json = File.ReadAllText("/test.json");
var otherBotOrders = JsonSerializer.Deserialize<OrdersTimesModel>(json);
// This line should get you a sequence containing
// JUST the ids in the JSON file
var idsFromJsonFile = otherBotOrders.OrdersTimesInfo.Select(x => x.Id);
// Here you'll get a sequence with the ids for your orders
var idsFromOurOrders = ourOrders.Select(x => x.id);
// Union will only take unique values,
// so you avoid repetition.
// (Note: the default comparison is case-sensitive, unlike the Equals call in the question.)
var mergedArrays = idsFromJsonFile.Union(idsFromOurOrders).ToList();
// Now we just need to query the region orders
// We'll keep every element whose id is NOT contained in the ids we collected earlier
var filteredRegionOrders = region.orders.Where(x => !mergedArrays.Contains(x.id));
result.orders.AddRange(filteredRegionOrders);
return result;
}
You can add conditions to any of those actions (like checking for order price or the boolean flag you get as a parameter), and of course you can do it without assigning so many variables, I did it that way just to make it easier to explain.
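If the case-insensitive comparison from the question matters, one option is to collect both id lists into a single HashSet built with a case-insensitive comparer. This is only a sketch under assumptions: the three lists below are invented stand-ins for region.orders, ourOrders and the ids read from /test.json.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the three id sources in the question.
var regionOrderIds = new List<string> { "1", "2", "3", "abc", "XYZ" };
var ourOrderIds = new List<string> { "ABC" };
var idsFromJsonFile = new List<string> { "1", "2" };

// One set holding every id that counts as "ours"; the comparer ignores case,
// mirroring StringComparison.InvariantCultureIgnoreCase in the question.
var ourIds = new HashSet<string>(
    ourOrderIds.Concat(idsFromJsonFile),
    StringComparer.InvariantCultureIgnoreCase);

// Keep only the orders whose id is not in the set.
var foreignOrderIds = regionOrderIds.Where(id => !ourIds.Contains(id)).ToList();

Console.WriteLine(string.Join(", ", foreignOrderIds)); // prints "3, XYZ"
```

A HashSet lookup also avoids scanning the merged id list once per region order.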

c# linq list with varying where conditions

private void getOrders()
{
try
{
//headerFileReader is assigned with a CSV file (not shown here).
while (!headerFileReader.EndOfStream)
{
headerRow = headerFileReader.ReadLine();
getOrderItems(headerRow.Substring(0,8));
}
}
}
private void getOrderItems(string ordNum)
{
// lines is an array assigned with a CSV file...not shown here.
var sorted = lines.Skip(1).Select(line =>
new
{
SortKey = (line.Split(delimiter)[1]),
Line = line
})
.OrderBy(x => x.SortKey)
.Where(x => x.SortKey == ordNum);
//Note ordNum is different every time when it is passed.
foreach (var orderItems in sorted) {
//Process each line here.
}
}
Above is my code. For every order number from the header file, I process the detail lines, and I only want to look at the lines for that specific order number. The above logic works, but it rebuilds and filters the whole list with the where clause for every order number, which is unnecessary and slows the process down.
I basically want getOrderItems to look something like the code below, but I can't get it to work because sorted can't be passed in. I think it should be possible, though?
private void getOrderItems(string ordNum)
{
// I would like to have sorted uploaded with data elsewhere and I pass it this function and reference it by other means but I am not able to get it.
var newSorted = sorted.Where(x => x.SortKey == docNum);
foreach (var orderItems in newSorted) {
//Process each line here.
}
}
Please suggest.
UPDATE : Thanks for the responses & improvements but my main question is I don't want to create the list every time (like I have shown in my code). What I want is to create the list first time and then only search within the list for a particular value (here docNum as shown). Please suggest.
It might be a good idea to preprocess your input lines and build a dictionary, where each distinct sort key maps to a list of lines. Building the dictionary is O(n), and after that you get constant time O(1) lookups:
// these are your unprocessed file lines
private string[] lines;
// this dictionary will map each `string` key to a `List<string>`
private Dictionary<string, List<string>> groupedLines;
// this is the method where you are loading your files (you didn't include it)
void PreprocessInputData()
{
// you already have this part somewhere
lines = LoadLinesFromCsv();
// after loading, group the lines by `line.Split(delimiter)[1]`
groupedLines = lines
.Skip(1)
.GroupBy(line => line.Split(delimiter)[1])
.ToDictionary(x => x.Key, x => x.ToList());
}
private void ProcessOrders()
{
while (!headerFileReader.EndOfStream)
{
var headerRow = headerFileReader.ReadLine();
// the order number is the first 8 characters of the header row (as in the question)
var ordNum = headerRow.Substring(0, 8);
List<string> itemsToProcess = null;
if (groupedLines.TryGetValue(ordNum, out itemsToProcess))
{
// if you are here, then
// itemsToProcess contains all lines where
// (line.Split(delimiter)[1]) == ordNum
}
else
{
// no such key in the dictionary
}
}
}
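A closely related option is ToLookup, which builds the same kind of index in one call and returns an empty sequence for unknown keys. The sketch below is illustrative only; the CSV lines, the comma delimiter and the order numbers are made-up sample data.

```csharp
using System;
using System.Linq;

const char delimiter = ',';

// Hypothetical detail lines; the first line is the CSV header.
string[] lines =
{
    "Item,OrderNo,Qty",
    "bolt,00000001,10",
    "nut,00000002,5",
    "washer,00000001,7",
};

// Built once: order number -> all detail lines for that order.
var linesByOrder = lines.Skip(1).ToLookup(line => line.Split(delimiter)[1]);

foreach (var ordNum in new[] { "00000001", "00000003" })
{
    // An unknown key yields an empty sequence rather than throwing.
    var items = linesByOrder[ordNum].ToList();
    Console.WriteLine($"{ordNum}: {items.Count} line(s)");
}
```

Either way, the file is split and grouped once, and each order number then costs a single lookup.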
The following does what you want and is also more efficient, because it filters the lines before projecting and sorting them:
var sorted = lines.Skip(1)
.Where(line => (line.Split(delimiter)[1] == ordNum))
.Select(
line =>
new
{
SortKey = (line.Split(delimiter)[1]),
Line = line
}
)
.OrderBy(x => x.SortKey);

Why is RemoveAll(x => x.Condition) removing all my records?

I'm working on creating a filter for a collection of employees. In order to do this I initially fetch a raw collection of all employees. I clone this list so I can iterate over the original list but remove items from the second list.
For each filter I have, I build a collection of employee ids that pass the filter. Having gone through all filters I then attempt to remove everything that isn't contained in any of these lists from the cloned list.
However for some reason, whenever I attempt to do this using .RemoveAll(), all records seemed to be removed and I can't figure out why.
Here is a stripped down version of the method I'm using, with only 1 filter applied:
public List<int> GetFilteredEmployeeIds(int? brandId)
{
List<int> employeeIds = GetFilteredEmployeeIdsBySearchTerm();
List<int> filteredEmployeeIds = employeeIds.Clone();
// Now filter the results based on which checkboxes are ticked
foreach (var employeeId in employeeIds)
{
// 3rd party API used to get values - please ignore for this example
Member m = new Member(employeeId);
if (m.IsInGroup("Employees"))
{
int memberBrandId = Convert.ToInt32(m.getProperty("brandID").Value);
// Filter by brand
List<int> filteredEmployeeIdsByBrand = new List<int>();
if (brandId != null)
{
if (brandId == memberBrandId)
filteredEmployeeIdsByBrand.Add(m.Id);
var setToRemove = new HashSet<int>(filteredEmployeeIdsByBrand);
filteredEmployeeIds.RemoveAll(x => !setToRemove.Contains(x));
}
}
}
return filteredEmployeeIds;
}
As you can see, I'm basically attempting to remove all records from the cloned record set, wherever the id doesn't match in the second collection. However for some reason every record seems to be getting removed.
Anybody know why?
P.S: Just to clarify, I have put in logging to check the values throughout the process and there are records appearing in the second list, however for whatever reason they're not getting matched in the RemoveAll()
Thanks
Ok only minutes after posting this I realised what I did wrong: The scoping is incorrect. What it should've been was like so:
public List<int> GetFilteredEmployeeIds(int? brandId)
{
List<int> employeeIds = GetFilteredEmployeeIdsBySearchTerm();
List<int> filteredEmployeeIds = employeeIds.Clone();
List<int> filteredEmployeeIdsByBrand = new List<int>();
// Now filter the results based on which checkboxes are ticked
foreach (var employeeId in employeeIds)
{
Member m = new Member(employeeId);
if (m.IsInGroup("Employees"))
{
int memberBrandId = Convert.ToInt32(m.getProperty("brandID").Value);
// Filter by brand
if (brandId != null)
{
if (brandId == memberBrandId)
filteredEmployeeIdsByBrand.Add(m.Id);
}
}
}
var setToRemove = new HashSet<int>(filteredEmployeeIdsByBrand);
filteredEmployeeIds.RemoveAll(x => !setToRemove.Contains(x));
return filteredEmployeeIds;
}
Essentially the removal of entries needed to be done outside the loop of the employee ids :-)
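To see why the placement matters, here is a small sketch with plain integers instead of the Member API (the ids are invented). Inside the loop the "keep" set never holds more than one id, so each RemoveAll call throws away everything else.

```csharp
using System;
using System.Collections.Generic;

var matchingIds = new List<int> { 1, 2, 3 };

// Mistake: the "keep" set is rebuilt on every iteration, so it holds a single id.
var insideLoop = new List<int>(matchingIds);
foreach (var id in matchingIds)
{
    var keep = new HashSet<int> { id };
    insideLoop.RemoveAll(x => !keep.Contains(x));
}

// Fix: collect every id to keep first, then remove once after the loop.
var outsideLoop = new List<int>(matchingIds);
var keepAll = new HashSet<int>();
foreach (var id in matchingIds)
{
    keepAll.Add(id);
}
outsideLoop.RemoveAll(x => !keepAll.Contains(x));

Console.WriteLine(insideLoop.Count);  // prints 0
Console.WriteLine(outsideLoop.Count); // prints 3
```

After the first iteration only id 1 survives, and the second iteration removes it because the set then contains only id 2.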
I know that you said your example was stripped down, so maybe this wouldn't suit, but could you do something like the following:
public List<int> GetFilteredEmployeeIds(int? brandId)
{
List<int> employeeIds = GetFilteredEmployeeIdsBySearchTerm();
return employeeIds.Where(e => MemberIsEmployeeWithBrand(e, brandId)).ToList();
}
private bool MemberIsEmployeeWithBrand(int employeeId, int? brandId)
{
Member m = new Member(employeeId);
if (!m.IsInGroup("Employees"))
{
return false;
}
int memberBrandId = Convert.ToInt32(m.getProperty("brandID").Value);
return brandId == memberBrandId;
}
I've just done that off the top of my head and haven't tested it, but if all you need to do is filter the employee ids, then maybe you don't need to clone the original list and can just use the Where function to do the filtering on it directly?
Please someone let me know if I've done something blindingly stupid!

Is this way of using dictionary correct to store product info and apply calculations

I have a dictionary object containing product names and corresponding prices:
var products = new List<Dictionary<string, decimal>>()
{
new Dictionary<string, decimal> {{"product1", 10}},
new Dictionary<string, decimal> {{"product2", 20}},
new Dictionary<string, decimal> {{"product3", 30}}
};
I then loop through this and apply some calculations per product:
foreach (var product in products)
{
foreach (KeyValuePair<string, decimal> kvp in product)
{
var result = GetRatePerProduct(kvp.Key) * kvp.Value;
}
}
GetRatePerProduct simply takes the productname and returns a decimal rate.
Based on this loop the calculated results will be inserted into orders table which contains a column per product, Product1, Product2, Product3...
So after the loop the table should look like this:
Product1 Product2 Product3 ...
12 24 36
I thought about creating another dictionary like this:
var results = new List<Dictionary<string, decimal>>();
and populating it within the for loop and then outside the loop use this dictionary to insert the data into SQL database. I have shortened the number of products in this example, but we will have 52 products, this number will never change.
Is this approach right? Is there a better way of doing this, possibly using Linq? Can you point to me into the right direction in terms of doing this with Linq?
Thanks
There is nothing particularly wrong with the way you have done it, though there are lots of other options.
In c# you can easily use Linq for that sort of operation, and you could also very easily parallelize the operation.
You have the question tagged with 'sql-server-2008' which makes me wonder whether the data for this operation is coming from a database. If so it may be more efficient to do the work in SQL rather than in C#.
So in answer to your two questions:
Is this approach right? - it is certainly not wrong.
Is there a better way of doing this? - there are other ways, but whether they are better depends on factors not mentioned in the question.
... more:
I've come up with two linq-ish approaches. The first preserves your data structures:
var products = new List<Dictionary<string, decimal>>()
{
new Dictionary<string, decimal> {{"product1", 10}},
new Dictionary<string, decimal> {{"product2", 20}},
new Dictionary<string, decimal> {{"product3", 30}}
};
var results = new List<Dictionary<string, decimal>>();
results.AddRange(products.Select(product =>
{
var resultDictionary = new Dictionary<string, decimal>();
foreach (string key in product.Keys)
{
resultDictionary.Add(key, GetRatePerProduct(key) * product[key]);
}
return resultDictionary;
}));
The second makes a new Product class to cover the three properties you are working with, but does the same work with them.
public class Product
{
public string ProductName { get; set; }
public decimal ProductValue { get; set; }
public decimal ProductResult { get; set; }
}
var products = new List<Product>()
{
new Product () {ProductName = "product1", ProductValue = 10},
new Product () {ProductName = "product2", ProductValue = 20},
new Product () {ProductName = "product3", ProductValue = 30},
};
foreach (var product in products)
{
product.ProductResult = GetRatePerProduct(product.ProductName) * product.ProductValue;
}
// the next two lines do exactly the same thing as the loop above, just one of them is explicitly parallelized
// (Parallel.ForEach needs System.Threading.Tasks and a thread-safe GetRatePerProduct)
products.ForEach(product => { product.ProductResult = GetRatePerProduct(product.ProductName) * product.ProductValue; });
Parallel.ForEach(products, product => { product.ProductResult = GetRatePerProduct(product.ProductName) * product.ProductValue; });
There is no reason to use dictionary objects if each one only stores a single key-value pair.
A much better solution is to create a Product class that has Name and Price properties and then add those objects to the list.
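If the name-to-price mapping is all that is needed, a single dictionary plus ToDictionary is another option. This is a rough sketch: GetRatePerProduct below is a hypothetical stand-in that returns a flat rate of 1.2, chosen only because it reproduces the 12 / 24 / 36 row from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the real rate lookup.
decimal GetRatePerProduct(string productName) => 1.2m;

// One dictionary for all products instead of a list of one-entry dictionaries.
var products = new Dictionary<string, decimal>
{
    { "product1", 10 },
    { "product2", 20 },
    { "product3", 30 },
};

// product name -> rate * price
var results = products.ToDictionary(
    kvp => kvp.Key,
    kvp => GetRatePerProduct(kvp.Key) * kvp.Value);

foreach (var entry in results)
{
    Console.WriteLine($"{entry.Key}: {entry.Value}");
}
```

The results dictionary can then be used outside the calculation to fill the per-product columns of the insert statement.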

get unique values from query to build Dictionary

I want to build a combobox with key->postal and value->city to use as a filter for my accommodations.
To limit the number of items in the list, I only use the postals that were used when filling the table tblAccomodations.
For now I do not use a relational table with postals and cities, although I'm thinking about adding one later on.
Here I build my dictionary:
public static Dictionary<int, string> getPostals()
{
Dictionary<int, string> oPostals = new Dictionary<int, string>();
using (DBReservationDataContext oReservation = new DBReservationDataContext())
{
var oAllPostals = (from oAccomodation in oReservation.tblAccomodations
orderby oAccomodation.Name ascending
select oAccomodation);
foreach (tblAccomodation item in oAllPostals)
{
oPostals.Add(int.Parse(item.Postal), item.City);
}
}
return oPostals;
}
As expected I got an error: some accommodations are located in the same city, so there are duplicate values for the key. So how can I get a list of unique cities and postals (as key)?
I tried to use
select oAccomodation.Postal.Distinct()
but that didn't work either.
UPDATE: I have found the main problem. There are multiple cities with the same postal ("Subcity"). So I'm gonna filter on "City" and not on "Postal".
I think you're looking for 'Distinct'. Gather your list of all postals and then return myPostals.Distinct().
Hope that helps.
change
foreach (tblAccomodation item in oAllPostals)
{
oPostals.Add(int.Parse(item.Postal), item.City);
}
to
foreach (tblAccomodation item in oAllPostals)
{
if(!oPostals.ContainsKey(int.Parse(item.Postal)))
oPostals.Add(int.Parse(item.Postal), item.City);
}
BTW, if you have multiple cities in one postal (I am not sure if that is possible in your domain), which one do you want to see?
If any of the cities will do, then it is easy to just take the first one per postal:
var oAllPostals = oReservation.tblAccomodations
.OrderBy(x=>x.Name)
.ToLookup(x=>x.Postal, x=>x.City)
.ToDictionary(x=>x.Key, x=>x.First());
In the same example, if you use .ToList() or even .Distinct().ToList() instead of .First(), the dictionary will hold all of the cities per postal, i.e. a Dictionary<Postal, List<City>>.
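Assuming the combination of postal + city is unique (and keeping in mind that GetHashCode is not guaranteed to give different values for different strings), you could do the following: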
public static Dictionary<int, string> getPostals()
{
Dictionary<int, string> oPostals = new Dictionary<int, string>();
using (DBReservationDataContext oReservation = new DBReservationDataContext())
{
var oAllPostals = (from oAccomodation in oReservation.tblAccomodations
orderby oAccomodation.Name ascending
select oAccomodation);
foreach (tblAccomodation item in oAllPostals)
{
oPostals.Add((item.Postal + item.City).GetHashCode(), item.Postal + " " + item.City);
}
}
return oPostals;
}
Edit:
If you want to use the selected value from the drop box then you can use the following:
public static Dictionary<int, Tuple<string, string>> getPostals()
{
Dictionary<int, Tuple<string, string>> oPostals = new Dictionary<int, Tuple<string, string>>();
using (DBReservationDataContext oReservation = new DBReservationDataContext())
{
var oAllPostals = (from oAccomodation in oReservation.tblAccomodations
orderby oAccomodation.Name ascending
select oAccomodation);
foreach (tblAccomodation item in oAllPostals)
{
oPostals.Add((item.Postal + item.City).GetHashCode(), new Tuple<string, string>(item.Postal, item.City));
}
}
return oPostals;
}
The way you bind the following depends on whether you're using asp.net, winforms etc. Here's an example for winforms.
Using .ContainsKey will drop one-to-many relations (1 postal key to n cities), i.e. since the key already exists, the next city with the same postal key will not get into your dictionary.
However, if you want to map each postal to a list of cities, you can use a dictionary whose values are collections, like the following:
Dictionary<string, List<string>> // postal -> cities
This way you'll have a dictionary that can hold multiple values per key.
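Such a dictionary could be built with GroupBy and ToDictionary. The sketch below is only an illustration: the tuples are made-up rows standing in for tblAccomodations, and the postal codes and city names are placeholders.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows standing in for tblAccomodations.
var accommodations = new[]
{
    (Postal: "1000", City: "CityA"),
    (Postal: "1000", City: "SubcityA"),
    (Postal: "1000", City: "CityA"),
    (Postal: "2000", City: "CityB"),
};

// postal -> distinct cities that share this postal code
Dictionary<string, List<string>> citiesByPostal = accommodations
    .GroupBy(a => a.Postal, a => a.City)
    .ToDictionary(g => g.Key, g => g.Distinct().ToList());

foreach (var entry in citiesByPostal)
{
    Console.WriteLine($"{entry.Key}: {string.Join(", ", entry.Value)}");
}
```

Because every postal appears as a key exactly once, no ContainsKey check is needed and no city is silently dropped.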
