C# LINQ list with varying where conditions

private void getOrders()
{
    try
    {
        // headerFileReader is assigned with a CSV file (not shown here).
        while (!headerFileReader.EndOfStream)
        {
            headerRow = headerFileReader.ReadLine();
            getOrderItems(headerRow.Substring(0, 8));
        }
    }
    catch
    {
        // error handling omitted in the original
    }
}
private void getOrderItems(string ordNum)
{
    // lines is an array assigned with a CSV file...not shown here.
    var sorted = lines.Skip(1)
        .Select(line => new
        {
            SortKey = line.Split(delimiter)[1],
            Line = line
        })
        .OrderBy(x => x.SortKey)
        .Where(x => x.SortKey == ordNum);
    // Note: ordNum is different every time it is passed in.
    foreach (var orderItems in sorted)
    {
        // Process each line here.
    }
}
Above is my code. For every order number from the header file, I process the detail lines, and I want to search only the lines specific to that order number. The logic above works, but it runs the Where clause over the whole list for every order number, which simply is not required and delays the process.
I basically want getOrderItems to look something like the version below, but I can't get there because sorted can't be passed in. I think it should be possible?
private void getOrderItems(string ordNum)
{
    // I would like to have sorted populated with data elsewhere, pass it to
    // this function, and reference it by other means, but I am not able to get it.
    var newSorted = sorted.Where(x => x.SortKey == ordNum);
    foreach (var orderItems in newSorted)
    {
        // Process each line here.
    }
}
Please suggest.
UPDATE: Thanks for the responses & improvements, but my main question remains: I don't want to create the list every time (like I have shown in my code). What I want is to create the list once, and then only search within it for a particular value (ordNum above). Please suggest.

It might be a good idea to preprocess your input lines and build a dictionary, where each distinct sort key maps to a list of lines. Building the dictionary is O(n), and after that you get constant time O(1) lookups:
// these are your unprocessed file lines
private string[] lines;

// this dictionary will map each `string` key to a `List<string>`
private Dictionary<string, List<string>> groupedLines;

// this is the method where you are loading your files (you didn't include it)
void PreprocessInputData()
{
    // you already have this part somewhere
    lines = LoadLinesFromCsv();

    // after loading, group the lines by `line.Split(delimiter)[1]`
    groupedLines = lines
        .Skip(1)
        .GroupBy(line => line.Split(delimiter)[1])
        .ToDictionary(x => x.Key, x => x.ToList());
}

private void ProcessOrders()
{
    while (!headerFileReader.EndOfStream)
    {
        var headerRow = headerFileReader.ReadLine();
        var ordNum = headerRow.Substring(0, 8); // the order number, as in your code
        List<string> itemsToProcess = null;
        if (groupedLines.TryGetValue(ordNum, out itemsToProcess))
        {
            // if you are here, then itemsToProcess contains all lines where
            // (line.Split(delimiter)[1]) == ordNum
        }
        else
        {
            // no such key in the dictionary
        }
    }
}
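As a variation on the same idea (a sketch of mine, not part of the original answer): LINQ's ToLookup builds the grouped structure in one call, and indexing a lookup with a missing key returns an empty sequence instead of throwing, which removes the TryGetValue branch. This assumes the same lines and delimiter fields as above.
private ILookup<string, string> linesByOrder;

void PreprocessWithLookup()
{
    // Build once, after loading the file; maps each order number to its lines.
    linesByOrder = lines
        .Skip(1)
        .ToLookup(line => line.Split(delimiter)[1]);
}

private void getOrderItems(string ordNum)
{
    // Indexing an ILookup with an unknown key yields an empty sequence.
    foreach (var line in linesByOrder[ordNum])
    {
        // Process each line here.
    }
}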

The following gets you what you want and is also more efficient, since it filters the lines before projecting and sorting them:
var sorted = lines.Skip(1)
    .Where(line => line.Split(delimiter)[1] == ordNum)
    .Select(line => new
    {
        SortKey = line.Split(delimiter)[1],
        Line = line
    })
    .OrderBy(x => x.SortKey);

Related

Merge data from two arrays or something else

How can I combine the Id values from the list I read from the file /test.json with the id values from the list ourOrders (ourOrders[i].id)? Or is there another way to do this?
private RegionModel FilterByOurOrders(RegionModel region, List<OurOrderModel> ourOrders, MarketSettings market, bool byOurOrders)
{
    var result = new RegionModel
    {
        updatedTs = region.updatedTs,
        orders = new List<OrderModel>(region.orders.Count)
    };
    var json = File.ReadAllText("/test.json");
    var otherBotOrders = JsonSerializer.Deserialize<OrdersTimesModel>(json);
    OtherBotOrders = new Dictionary<string, OrderTimesInfoModel>();
    foreach (var otherBotOrder in otherBotOrders.OrdersTimesInfo)
    {
        //OtherBotOrders.Add(otherBotOrder.Id, otherBotOrder);
        BotController.WriteLine($"{otherBotOrder.Id}"); // Outputting order IDs to the console works
    }
    foreach (var order in region.orders)
    {
        if (ConvertToDecimal(order.price) < 1 || !byOurOrders)
        {
            int i = 0;
            var isOurOrder = false;
            while (i < ourOrders.Count && !isOurOrder)
            {
                if (ourOrders[i].id.Equals(order.id, StringComparison.InvariantCultureIgnoreCase))
                {
                    isOurOrder = true;
                }
                ++i;
            }
            if (!isOurOrder)
            {
                result.orders.Add(order);
            }
        }
    }
    return result;
}
OrdersTimesModel looks like this:
public class OrdersTimesModel
{
    public List<OrderTimesInfoModel> OrdersTimesInfo { get; set; }
}
test.json:
{"OrdersTimesInfo":[{"Id":"1"},{"Id":"2"}]}
Added:
I'll try to clarify the question. There are three lists of IDs:
First (all orders): region.orders, as order.id
Second (our orders): ourOrders, as ourOrders[i].id in the while loop
Third (our orders 2): from the /test.json file, as an array {"Orders":[{"Id":"12345..."...},{"Id":"12345..." ...}...]}
There is a foreach containing a while, in which the First (all orders) list and the Second (our orders) list are compared. If the ids match, the order is one of ours: isOurOrder = true;
Accordingly, the orders for which isOurOrder == false are added to the result: result.orders.Add(order)
I need the check if (ourOrders[i].id.Equals(order.id, StringComparison.InvariantCultureIgnoreCase)) to also cover the Ids from the Third (our orders 2) list.
Or is there another way to do it?
You should be able to avoid writing loops entirely if you use LINQ (loops still run behind the scenes, but the code is far easier to read).
You can find some documentation here: https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/linq/introduction-to-linq-queries
and there are some pretty useful extension methods for sequences here: https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable?view=net-6.0 (these are great for keeping your code easy to read)
Solution
using System.Linq;
private RegionModel FilterByOurOrders(RegionModel region, List<OurOrderModel> ourOrders, MarketSettings market, bool byOurOrders)
{
    var result = new RegionModel
    {
        updatedTs = region.updatedTs,
        orders = new List<OrderModel>(region.orders.Count)
    };
    var json = File.ReadAllText("/test.json");
    var otherBotOrders = JsonSerializer.Deserialize<OrdersTimesModel>(json);

    // This should get you a sequence containing
    // JUST the ids in the JSON file
    var idsFromJsonFile = otherBotOrders.OrdersTimesInfo.Select(x => x.Id);

    // Here you'll get the ids of your own orders
    var idsFromOurOrders = ourOrders.Select(x => x.id);

    // Union only takes unique values, so you avoid repetition.
    var mergedIds = idsFromJsonFile.Union(idsFromOurOrders);

    // Now we just need to query the region orders:
    // we keep every element whose id is NOT contained in the merged ids.
    var filteredRegionOrders = region.orders.Where(x => !mergedIds.Contains(x.id));
    result.orders.AddRange(filteredRegionOrders);
    return result;
}
You can add conditions to any of those steps (like checking the order price or the boolean flag you get as a parameter), and of course you can do it without assigning so many variables; I did it that way just to make it easier to explain.
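One performance note (my addition, not part of the original answer): Contains on a plain sequence is a linear scan, so the Where above re-scans the merged ids for every order. If the lists are large, materializing the ids into a HashSet first makes each lookup O(1) on average; a minimal sketch, reusing the variables above:
// Build the merged id set once; the comparer matches the
// case-insensitive comparison used in the question.
var mergedIdSet = new HashSet<string>(
    idsFromJsonFile.Union(idsFromOurOrders),
    StringComparer.InvariantCultureIgnoreCase);

var filteredRegionOrders = region.orders.Where(x => !mergedIdSet.Contains(x.id));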

Fastest equivalent of comparing all elements of array

What is the fastest equivalent in C#/LINQ of comparing all combinations of elements of an array, like the code below, and adding them to a bucket if they are not already in a bucket? In other words: how could I optimize this piece of code in C#?
// pseudocode
List<T> elements = { .... }
HashSet<T> bucket = {}
foreach (T element in elements)
{
    foreach (var myElement in elements.Where(e => e.id != element.id))
    {
        if (!element.notInTheList)
        {
            _elementIsTheSame = element.Equals(myElement);
            if (_elementIsTheSame)
            {
                // append element to a bucket
                if (!elementIsInTheBucket(bucket, element))
                {
                    element.notInTheList = true;
                    addToBucket(bucket, element);
                }
            }
        }
    }
}
// takes about 150 ms on a fast workstation with only 300 elements in the LIST!
The final order of the elements in the bucket is important
elements.GroupBy(x => x).SelectMany(x => x);
https://dotnetfiddle.net/yZ9JDp
This works because GroupBy preserves order.
Note that this puts the first element of each equivalence class first. Your code puts the first element last, and skips classes with just a single element.
Skipping the classes with just a single element can be done with a where before the SelectMany.
elements.GroupBy(x => x).Where(x => x.Skip(1).Any()).SelectMany(x => x);
Getting the first element last is a bit more tricky, but I suspect it's a bug in your code so I will not try to write it out.
Depending on how you use the result you might want to throw a ToList() at the end.
It sounds like you're effectively after DistinctBy, which can be simulated with something like:
var list = new List<MainType>();
var known = new HashSet<PropertyType>();
foreach (var item in source)
{
    if (known.Add(item.TheProperty))
        list.Add(item);
}
You now have a list of the items taking the first only when there are duplicates via the selected property, preserving order.
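As a side note (my addition): on .NET 6 or later, Enumerable.DistinctBy is built into System.Linq, so the loop above collapses to a one-liner; MainType and TheProperty are the placeholder names from the sketch above.
// Requires .NET 6+; keeps the first item per key, preserving order.
var list = source.DistinctBy(item => item.TheProperty).ToList();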
If the intent is to find the fastest solution, as the Stack Overflow title suggests, then I would consider using C#'s Parallel.ForEach to perform a map and reduce.
For example:
var resultsCache = new IRecord[_allRecords.Length];
var resultsCount = 0;
var parallelOptions = new ParallelOptions
{
    MaxDegreeOfParallelism = 1 // use an appropriate value
};
Parallel.ForEach(
    _allRecords,
    parallelOptions,
    // Part 1: initialize thread-local storage
    () => { return new FilterMapReduceState(); },
    // Part 2: define the task to perform
    (record, parallelLoopState, index, results) =>
    {
        if (_abortFilterOperation)
        {
            parallelLoopState.Break();
        }
        if (strategy.CanKeep(record))
        {
            resultsCache[index] = record;
            results.Count++;
        }
        else
        {
            resultsCache[index] = null;
        }
        return results;
    },
    // Part 3: merge the results
    (results) =>
    {
        Interlocked.Add(ref resultsCount, results.Count);
    }
);
where
class FilterMapReduceState
{
    public FilterMapReduceState()
    {
        this.Count = 0;
    }

    /// <summary>
    /// Represents the number of records that meet the search criteria.
    /// </summary>
    internal int Count { get; set; }
}
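A follow-up note (my addition, under the same assumptions as the answer): after the loop, resultsCache still holds null slots for the records the strategy rejected, so a final sequential pass is needed to compact it while preserving the original order.
// Compact the sparse cache into a dense, order-preserving array.
var keptRecords = new IRecord[resultsCount];
var writeIndex = 0;
foreach (var record in resultsCache)
{
    if (record != null)
    {
        keptRecords[writeIndex++] = record;
    }
}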
As I understand it, these are the pieces you need:
var multipleIds = elements.GroupBy(x => x.id)
                          .Where(g => g.Count() > 1)
                          .Select(y => y.Key);

var distinctIds = elements.Select(x => x.id).Distinct();

var distinctElements = elements.Where(x => distinctIds.Contains(x.id));

List function, how to get an average of scores for each name - C# console application

I have a list in a C# console application. The list holds items that look something like 'matt,5', 'matt,7', 'jack,4', 'jack,8', etc...
I want to combine the entries so each name is written only once and the numbers after it are averaged out, so 'jack,4' and 'jack,8' become 'jack,(4+8)/2', which displays as 'jack,6'.
So far I have this...
currentFileReader = new StreamReader(file);
List<string> AverageList = new List<string>();
while (!currentFileReader.EndOfStream)
{
    string text = currentFileReader.ReadLine();
    AverageList.Add(text.ToString());
}
AverageList.GroupBy(n => n).Any(c => c.Count() > 1);
Not really sure where to go from here.
What you need is to Split each string item on ',', then group by the first element of the returned array and average the second element (after parsing it to int), something like:
List<string> AverageList = new List<string> { "matt,5", "matt,7", "jack,4", "jack,8" };
var query = AverageList.Select(s => s.Split(','))
    .GroupBy(sp => sp[0])
    .Select(grp => new
    {
        Name = grp.Key,
        Avg = grp.Average(t => int.Parse(t[1]))
    });

foreach (var item in query)
{
    Console.WriteLine("Name: {0}, Avg: {1}", item.Name, item.Avg);
}
and it will give you:
Name: matt, Avg: 6
Name: jack, Avg: 6
But, a better option would be to use a class with Name and Score properties instead of comma separated string values.
(The code above doesn't check for invalid input values).
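A minimal sketch of that suggestion (the Score type and property names are illustrative, not from the original):
// Illustrative type replacing the "name,score" strings.
public class Score
{
    public string Name { get; set; }
    public int Value { get; set; }
}

// The grouping then works on typed properties instead of string splits:
// scores.GroupBy(s => s.Name)
//       .Select(g => new { Name = g.Key, Avg = g.Average(s => s.Value) });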
Firstly you will want to populate your unformatted data into a List; as you can see, I called it rawScores. You can then Split each line on the comma delimiting it, check whether the person already exists in your Dictionary, and either add the score to their entry or create a new one.
After that you would simply have to compute the Average of each List (a sketch of that step follows the code below).
Hope this helps!
var scores = new Dictionary<string, List<int>>();
var rawScores = new List<string>(); // populate this from your file, as in the question
rawScores.ForEach(raw =>
{
    var split = raw.Split(',');
    if (!scores.ContainsKey(split[0]))
    {
        scores.Add(split[0], new List<int> { Convert.ToInt32(split[1]) });
    }
    else
    {
        scores[split[0]].Add(Convert.ToInt32(split[1]));
    }
});
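And the averaging step, as a minimal sketch over the scores dictionary built above (Average needs using System.Linq;):
// Print each name once with the average of all their scores.
foreach (var entry in scores)
{
    Console.WriteLine("{0},{1}", entry.Key, entry.Value.Average());
}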

C# dedupe List based on split

I'm having a hard time deduping a list based on a specific delimiter.
For example I have 4 strings like below:
apple|pear|fruit|basket
orange|mango|fruit|turtle
purple|red|black|green
hero|thor|ironman|hulk
In this example I want my list to contain only unique values in column 3, so it would result in a List that looks like this:
apple|pear|fruit|basket
purple|red|black|green
hero|thor|ironman|hulk
In the above example I would get rid of line 2 because line 1 has the same value in column 3. Any help would be awesome; deduping is tough in C#.
How I'm testing this:
static void Main(string[] args)
{
    BeginListSet = new List<string>();
    startHashSet();
}

public static List<string> BeginListSet { get; set; }

public static void startHashSet()
{
    string[] BeginFileLine = File.ReadAllLines(@"C:\testit.txt");
    foreach (string begLine in BeginFileLine)
    {
        BeginListSet.Add(begLine);
    }
}

public static IEnumerable<string> Dedupe(IEnumerable<string> list, char separator, int keyIndex)
{
    var hashset = new HashSet<string>();
    foreach (string item in list)
    {
        var array = item.Split(separator);
        if (hashset.Add(array[keyIndex]))
            yield return item;
    }
}
Something like this should work for you
static IEnumerable<string> Dedupe(this IEnumerable<string> input, char separator, int keyIndex)
{
    var hashset = new HashSet<string>();
    foreach (string item in input)
    {
        var array = item.Split(separator);
        if (hashset.Add(array[keyIndex]))
            yield return item;
    }
}

...

var list = new string[]
{
    "apple|pear|fruit|basket",
    "orange|mango|fruit|turtle",
    "purple|red|black|green",
    "hero|thor|ironman|hulk"
};

foreach (string item in list.Dedupe('|', 2))
    Console.WriteLine(item);
Edit: In the linked question Distinct() with Lambda, Jon Skeet presents the idea in a much better fashion, in the form of a DistinctBy custom method. While similar, his is far more reusable than the idea presented here.
Using his method, you could write
var deduped = list.DistinctBy(item => item.Split('|')[2]);
And you could later reuse the same method to "dedupe" another list of objects of a different type by a key of possibly yet another type.
Try this:
var list = new string[]
{
    "apple|pear|fruit|basket",
    "orange|mango|fruit|turtle",
    "purple|red|black|green",
    "hero|thor|ironman|hulk"
};

var dedup = new List<string>();
var filtered = new List<string>();
foreach (var s in list)
{
    var filter = s.Split('|')[2];
    if (dedup.Contains(filter)) continue;
    filtered.Add(s);
    dedup.Add(filter);
}
// Console.WriteLine(filtered);
Can you use a HashSet instead? That will eliminate dupes automatically for you as they are added.
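A minimal sketch of that idea (my addition), using a custom IEqualityComparer keyed on the third column. Note that HashSet does not guarantee enumeration order, so the yield-based Dedupe above is safer when order matters.
// Treats two lines as equal when their third '|'-separated column matches.
// Assumes every line has at least three columns.
class ThirdColumnComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y) => x.Split('|')[2] == y.Split('|')[2];
    public int GetHashCode(string obj) => obj.Split('|')[2].GetHashCode();
}

// Usage: lines whose third column was already seen are rejected as they are added.
var set = new HashSet<string>(list, new ThirdColumnComparer());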
Maybe you could sort the words (delimited by |) into alphabetical order and store them in a grid of columns. Then, when you try to insert, just check whether a column already holds a word starting with that value.
If LINQ is an option, you can do something like this:
// assume strings is a collection of strings
List<string> list = strings.Select(a => a.Split('|'))        // split each line by '|'
                           .GroupBy(a => a[2])               // group by third column
                           .Select(a => a.First())           // select first line from each group
                           .Select(a => string.Join("|", a)) // re-join the split parts
                           .ToList();                        // convert to list of strings
Edit (per Jeff Mercado's comment), this can be simplified further:
List<string> list =
    strings.GroupBy(a => a.Split('|')[2]) // group by third column
           .Select(a => a.First())        // select first line from each group
           .ToList();                     // convert to list of strings

Compare adjacent list items

I'm writing a duplicate file detector. To determine if two files are duplicates I calculate a CRC32 checksum. Since this can be an expensive operation, I only want to calculate checksums for files that have another file with matching size. I have sorted my list of files by size, and am looping through to compare each element to the ones above and below it. Unfortunately, there is an issue at the beginning and end since there will be no previous or next file, respectively. I can fix this using if statements, but it feels clunky. Here is my code:
public void GetCRCs(List<DupInfo> dupInfos)
{
    var crc = new Crc32();
    for (int i = 0; i < dupInfos.Count(); i++)
    {
        if (dupInfos[i].Size == dupInfos[i - 1].Size || dupInfos[i].Size == dupInfos[i + 1].Size)
        {
            dupInfos[i].CheckSum = crc.ComputeChecksum(File.ReadAllBytes(dupInfos[i].FullName));
        }
    }
}
My question is:
How can I compare each entry to its neighbors without the out of bounds error?
Should I be using a loop for this, or is there a better LINQ or other function?
Note: I did not include the rest of my code to avoid clutter. If you want to see it, I can include it.
Compute the Crcs first:
// It is assumed that DupInfo.CheckSum is nullable
public void GetCRCs(List<DupInfo> dupInfos)
{
dupInfos[0].CheckSum = null ;
for (int i = 1; i < dupInfos.Count(); i++)
{
dupInfos[i].CheckSum = null ;
if (dupInfos[i].Size == dupInfos[i - 1].Size)
{
if (dupInfos[i-1].Checksum==null) dupInfos[i-1].CheckSum = crc.ComputeChecksum(File.ReadAllBytes(dupInfos[i-1].FullName));
dupInfos[i].CheckSum = crc.ComputeChecksum(File.ReadAllBytes(dupInfos[i].FullName));
}
}
}
After sorting your files by size and CRC, identify the duplicates:
public void GetDuplicates(List<DupInfo> dupInfos)
{
    // loop runs backwards to allow list item deletion
    for (int i = dupInfos.Count() - 1; i > 0; i--)
    {
        if (dupInfos[i].Size == dupInfos[i - 1].Size &&
            dupInfos[i].CheckSum != null &&
            dupInfos[i].CheckSum == dupInfos[i - 1].CheckSum)
        {
            // i is a duplicate of i - 1
            ... // your code here
            ... // eventually, dupInfos.RemoveAt(i);
        }
    }
}
I have sorted my list of files by size, and am looping through to
compare each element to the ones above and below it.
The next logical step is to actually group your files by size. Comparing consecutive files will not always be sufficient if you have more than two files of the same size. Instead, you will need to compare every file to every other same-sized file.
I suggest taking this approach
Use LINQ's .GroupBy to group the files by size, then .Where to keep only the groups with more than one file.
Within those groups, calculate the CRC32 checksum and add it to a collection of known checksums, comparing with previously calculated ones. If you need to know which files specifically are duplicates, you could use a dictionary keyed by this checksum (you can achieve this with another GroupBy); otherwise a simple list will suffice to detect any duplicates.
The code might look something like this:
var filesSetsWithPossibleDupes = files.GroupBy(f => f.Length)
                                      .Where(group => group.Count() > 1);

foreach (var grp in filesSetsWithPossibleDupes)
{
    var checksums = new List<CRC32CheckSum>(); // or whatever type
    foreach (var file in grp)
    {
        var currentCheckSum = crc.ComputeChecksum(file);
        if (checksums.Contains(currentCheckSum))
        {
            // Found a duplicate
        }
        else
        {
            checksums.Add(currentCheckSum);
        }
    }
}
Or if you need the specific objects that could be duplicates, the inner foreach loop might look like
var filesSetsWithPossibleDupes = files.GroupBy(f => f.FileSize)
                                      .Where(grp => grp.Count() > 1);

// A dictionary keyed by the basic duplicate stats,
// whose value is a collection of the possible duplicates
var masterDuplicateDict = new Dictionary<DupStats, IEnumerable<DupInfo>>();
foreach (var grp in filesSetsWithPossibleDupes)
{
    // Same GroupBy logic, but applied to the checksum (instead of file size)
    var likelyDuplicates = grp.GroupBy(dup => dup.Checksum)
                              .Where(g => g.Count() > 1);
    foreach (var dupGrp in likelyDuplicates)
    {
        // Create the key for the dictionary (your code is likely different)
        var sample = dupGrp.First();
        var key = new DupStats() { FileSize = sample.FileSize, Checksum = sample.Checksum };
        masterDuplicateDict.Add(key, dupGrp);
    }
}
A demo of this idea.
I think the for loop should be: for (int i = 1; i < dupInfos.Count() - 1; i++)
var grps = dupInfos.GroupBy(d => d.Size);
grps.Where(g => g.Count() > 1).ToList().ForEach(g =>
{
    ...
});
Can you do a union between your two lists? If you have a list of file names and do a union, it should result in only the overlapping files. I can write out an example if you want, but this link should give you the general idea:
https://stackoverflow.com/a/13505715/1856992
Edit: Sorry, for some reason I thought you were comparing file names, not sizes. So here is an actual answer for you.
using System;
using System.Collections.Generic;
using System.Linq;

public class ObjectWithSize
{
    public int Size { get; set; }

    public ObjectWithSize(int size)
    {
        Size = size;
    }
}

public class Program
{
    public static void Main()
    {
        Console.WriteLine("start");
        var list = new List<ObjectWithSize>();
        list.Add(new ObjectWithSize(12));
        list.Add(new ObjectWithSize(13));
        list.Add(new ObjectWithSize(14));
        list.Add(new ObjectWithSize(14));
        list.Add(new ObjectWithSize(18));
        list.Add(new ObjectWithSize(15));
        list.Add(new ObjectWithSize(15));

        var duplicates = list.GroupBy(x => x.Size)
                             .Where(g => g.Count() > 1);

        foreach (var dup in duplicates)
            foreach (var objWithSize in dup)
                Console.WriteLine(objWithSize.Size);
    }
}
This will print out
14
14
15
15
Here is a .NET Fiddle for that: https://dotnetfiddle.net/0ub6Bs
Final note: I actually think your answer looks better and will run faster. This was just an implementation in LINQ.
