I have two classes.
class Vehicle {
    public string VehicleId { get; set; }
    public string VinNo { get; set; }
    public int ModelYear { get; set; }
    public string Make { get; set; }
    public bool WaterDamaged { get; set; }
}

class Vins {
    public string VinNo { get; set; }
}
I have populated a list of vehicles from the database using this class structure.
So the code looks somewhat like this.
List<Vehicle> vehicles = GetAllVehicles();
Another list I have is sourced from a file. This list contains all the VINs that are water damaged.
I was able to use the same structure for this class as above.
List<Vins> damaged = ReadFile();
List<Vehicle> damagedGoods = new List<Vehicle>();
List<Vehicle> goodGoods = new List<Vehicle>();
I need to create two separate XML files using this info. The first will be called DamagedVehicles_{date} and the next will be GoodVehicles_{date}.
So what I did was write a loop like this.
foreach (var v in damaged)
{
    foreach (var v2 in vehicles)
    {
        if (v.VinNo == v2.VinNo)
        {
            damagedGoods.Add(v2);
        }
        else
        {
            goodGoods.Add(v2);
        }
    }
}
This is causing a couple of issues. Firstly, the goodGoods list is getting duplicates, which I later weed out.
Secondly if I receive a list of 80,000 vehicles, it takes a long time for this to process.
Is there some way I can speed the processing and avoid the duplicates?
Your nested foreach is performing a cross product of the two lists. That's... not what you want. Not only is it an inherently expensive operation, but its results simply aren't in line with what you want.
What you want to do is something like this:
foreach (var vehicle in vehicles)
{
    if (damaged.Any(d => d.VinNo == vehicle.VinNo))
    {
        damagedGoods.Add(vehicle);
    }
    else
    {
        goodGoods.Add(vehicle);
    }
}
(Note the outer loop is removed entirely.)
This can be further improved, because List is not particularly efficient at searching: every Any call above scans the whole damaged list. If we put the damaged VINs into a HashSet<string>, the lookup becomes a constant-time Contains instead of a linear scan. (A HashSet<Vins> would not help here, since Vins does not override Equals/GetHashCode; hashing the VIN strings themselves does.) This is easy enough to do:
HashSet<string> damagedVins = new HashSet<string>(ReadFile().Select(v => v.VinNo));
The check in the loop then becomes damagedVins.Contains(vehicle.VinNo).
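Putting both pieces together, a minimal end-to-end sketch (GetAllVehicles and ReadFile are the methods from the question; System.Linq is assumed to be in scope):
List<Vehicle> vehicles = GetAllVehicles();

// One pass over the file data to collect the damaged VINs into a set.
HashSet<string> damagedVins = new HashSet<string>(ReadFile().Select(v => v.VinNo));

List<Vehicle> damagedGoods = new List<Vehicle>();
List<Vehicle> goodGoods = new List<Vehicle>();

// Single pass over the vehicles: each vehicle lands in exactly one list,
// so there are no duplicates to weed out afterwards.
foreach (var vehicle in vehicles)
{
    if (damagedVins.Contains(vehicle.VinNo))
    {
        damagedGoods.Add(vehicle);
    }
    else
    {
        goodGoods.Add(vehicle);
    }
}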
So, I am writing a program, and getting my data using EF 6.
Basically, I thought the simplest approach would be to have one class like this:
public class DataRetriever
{
    public List<TEntity> GetAll<TEntity>() where TEntity : class
    {
        using (var Db = new Entity())
        {
            return Db.Set<TEntity>().ToList();
        }
    }
}
So then, you can start creating other classes on the basis of specifying which data you want collected. So say I have a list of carnival rides and one method is to get a single ride or something. So then I would have the following:
public class SelectedRide
{
    int RideId { get; set; }
    string RideName { get; set; }
    string DomValue { get; set; }

    public SelectedRide(DataRetriever Retriever)
    {
        var Records = Retriever.GetAll<vwRideList>();
        var Record = from x in Records
                     where x.RideId == RideId
                     select x;
        RideName = Record.Single().RideName;
        DomValue = Record.Single().DomValue;
    }
}
Ride ID being an identity.
So then one could say that if we had another class, say public class SelectedPark (we have multiple parks where rides are), it would have the same logic, but with Retriever.GetAll<vwParkList>(); the ride list is now the park list. And so on, as sketched below.
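To make the duplication concrete, here is a sketch of what such a parallel class might look like (ParkId and ParkName are hypothetical property names; only vwParkList is named above):
public class SelectedPark
{
    int ParkId { get; set; }      // hypothetical, mirrors RideId
    string ParkName { get; set; } // hypothetical, mirrors RideName

    public SelectedPark(DataRetriever Retriever)
    {
        // Same shape as SelectedRide: only the view and the properties change.
        var Records = Retriever.GetAll<vwParkList>();
        var Record = from x in Records
                     where x.ParkId == ParkId
                     select x;
        ParkName = Record.Single().ParkName;
    }
}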
I can't tell if this is quickly going to get out of hand if I, say, had 50 separate types of retrieving that needed to be done. Granted, I won't, as I know the total scope of this program, but WHAT IF.
I've seen stuff like the repo pattern as well, but I can't tell if that's somewhat of a waste of time or not. I can't tell what I am getting out of it. This seemed to keep it generic enough that I am not writing the context in a million different places.
I have a class set up like this:
public class Summary
{
    public Geometry geometry { get; set; }
    public SummaryAttributes attributes { get; set; }
}

public class SummaryAttributes
{
    public int SERIAL_NO { get; set; }
    public string District { get; set; }
}

public class Geometry
{
    public List<List<List<double>>> paths { get; set; }
}
and I take a JSON string of records for that object and cram them in there like this:
List<Summary> oFeatures = reportObject.layers[0].features.ToObject<List<Summary>>();
My end goal is to create a CSV file, so I need one flat List of records to send to the CSV writer I have.
I can do this:
List<SummaryAttributes> oAtts = oFeatures.Select(x => x.attributes).ToList();
and I get a nice List of the attributes and send that off to CSV. Easy peasy.
What I want, though, is to also pluck a field off of the Geometry object and include that in my final List to go to CSV.
So the final List going to the CSV writer would contain objects with all of the fields from SummaryAttributes, plus the first and last double values from the paths field on the Geometry object (paths[0][0][first] and paths[0][0][last]).
It's hard to explain. I want to graft two extra attributes onto the original SummaryAttributes object.
I would be OK with creating a new SummaryAttributesXY class with the two extra fields if that's what it takes.
But I'm trying to avoid creating a new anonymous object and having to delimit every field in the SummaryAttributes class, as there are many more than I have listed in this sample.
Any suggestions?
You can select a new anonymous object with the required fields, but you should be completely sure that paths has at least one item at each level of the lists:
var query = oFeatures.Select(s => new {
    s.attributes.SERIAL_NO,
    s.attributes.District,
    First = s.geometry.paths[0][0].First(), // or [0][0][0]
    Last = s.geometry.paths[0][0].Last()
}).ToList();
Got it figured out. I include the X and Y fields in the original class definition. When the JSON gets deserialized they will be null. Then I loop back and fill them in.
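For reference, the updated class presumably looks something like this (XY1 and XY2 are the names used in the code below):
public class SummaryAttributes
{
    public int SERIAL_NO { get; set; }
    public string District { get; set; }
    public string XY1 { get; set; } // null after deserialization, filled in below
    public string XY2 { get; set; } // null after deserialization, filled in below
}
Then the fill-in pass over the deserialized records: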
List<Summary> oFeatures = reportObject.layers[0].features.ToObject<List<Summary>>();
List<Summary> summary = oFeatures.Select(s =>
{
    var t = new Summary
    {
        attributes = s.attributes
    };
    t.attributes.XY1 = string.Format("{0} , {1}", s.geometry.paths[0][0].First(), s.geometry.paths[0][1].First());
    t.attributes.XY2 = string.Format("{0} , {1}", s.geometry.paths[0][0].Last(), s.geometry.paths[0][1].First());
    return t;
}).ToList();
List<SummaryAttributes> oAtts = summary.Select(x => x.attributes).ToList();
I have a class:
public class DataMember {
    public string ID { get; set; }
    public List<string> Versions { get; set; }
}
And another class:
public class MasterDataMember {
    public string ID { get; set; }
    public List<string> FoundVersions { get; set; }
}
I store both sets of data in a Cache as:
List<DataMember> datamembers
List<MasterDataMember> masterdatamembers
When originally built, each MasterDataMember holds a list of partial "versions". These versions need to be confirmed and found in the list of DataMembers.
How can I update masterdatamembers with the confirmed versions found in datamembers?
(this code block is untested but it illustrates what I'm trying to do)
foreach (MasterDataMember item in masterdatamembers) {
    List<string> confirmedvers = new List<string>();
    foreach (string rawver in item.FoundVersions) {
        foreach (DataMember checkitem in datamembers) {
            foreach (string confirmedver in checkitem.Versions) {
                if (rawver.Contains(confirmedver)) {
                    confirmedvers.Add(confirmedver);
                }
            }
        }
    }
    item.FoundVersions = confirmedvers;
}
Is there a LINQ approach that can accomplish this more easily and faster? (I've already tried lots of ideas and iterations.)
Speed is the key here since both lists can be hundreds to thousands long.
Thank you in advance!
foreach (MasterDataMember item in masterdatamembers) {
    item.FoundVersions = item.FoundVersions
        .Where(rawver => datamembers.Any(checkitem =>
            checkitem.Versions.Any(confirmedver => rawver.Contains(confirmedver))))
        .ToList();
}
HOLY crap bro that was confusing as hell for me!
Awesome mind experiment though!
If speed really is your primary concern because of large lists, then you'll want to use hash table constructs. Using LINQ is slick, but won't necessarily make things faster (or clearer) for you. What you really need is to use the proper collection type.
Assumptions made for the code that follows:
datamembers cache cannot have duplicate DataMember entries (where more than one entry has the same ID).
masterdatamembers cache cannot have duplicate MasterDataMember entries (where more than one entry has the same ID).
In both DataMember and MasterDataMember, the Versions and FoundVersions lists cannot have duplicate version entries.
Algorithm Description
I still feel that your code block doesn't quite reflect your intent. And unfortunately, as a result, I think you got wrong answers.
This is the algorithm I followed, based on trying to interpret your intended result:
For each master data member, update its FoundVersions set (or list) by only keeping the versions that can also be found in the matching data member's Versions set (or list). If no matching data member is found, then I assume you want the master data member's FoundVersions set (or list) to be emptied, as none of the versions can be confirmed.
Implementation
Notice that I replaced a few uses of List<T> with Dictionary<K, V> or HashSet<T> where it would benefit performance. Of course, I am assuming that your lists can become large, as you said. Otherwise, the performance will be similar to that of simple lists.
Your 2 classes (notice the change in types):
public class DataMember
{
    public string ID { get; set; }
    public HashSet<string> Versions { get; set; } // using a HashSet is faster here.
}

public class MasterDataMember
{
    public string ID { get; set; }
    public HashSet<string> FoundVersions { get; set; } // HashSet used for consistency; for the purposes of the algorithm, a List can still be used here if you want.
}
Your cached data (notice the change to a Dictionary):
Dictionary<string, DataMember> datamembers; // using a Dictionary here, where your key is the DataMember's ID, is your fastest option.
List<MasterDataMember> masterdatamembers; // this can stay as a list if you want.
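If the cache currently holds a List<DataMember>, it can be converted once up front. A one-line sketch (dataMemberList stands in for however the cache is populated today; IDs are assumed unique, per the assumptions above):
// Build the lookup once; every TryGetValue below is then an O(1) operation.
Dictionary<string, DataMember> datamembers = dataMemberList.ToDictionary(d => d.ID);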
And finally, the work is done here:
foreach (var masterDataMember in masterdatamembers)
{
    DataMember dataMember;
    if (datamembers.TryGetValue(masterDataMember.ID, out dataMember))
    {
        HashSet<string> newSet = new HashSet<string>();
        foreach (var version in masterDataMember.FoundVersions)
        {
            if (dataMember.Versions.Contains(version))
            {
                newSet.Add(version);
            }
        }
        masterDataMember.FoundVersions = newSet;
    }
    else
    {
        masterDataMember.FoundVersions.Clear();
    }
}
Your code will look something like this in LINQ:
masterDataMembers.ForEach(q => q.FoundVersions =
    (from rawver in q.FoundVersions
     from checkitem in dataMembers
     from confirmedver in checkitem.Versions
     where rawver.Contains(confirmedver)
     select confirmedver).ToList());
Let's say I have a WriteItem class that looks like this:
public class WriteItem
{
    public string Name { get; set; }
    public object Value { get; set; }
    public int ResultCode { get; set; }
    public string ErrorMessage { get; set; }
}
I need to process each item and set its ResultCode and ErrorMessage properties, and I thought about defining a method similar to this:
public void ProcessItems(WriteItemCollection items)
{
    foreach (var item in items)
    {
        // Process each item and set its result.
    }
}
The processing of each item is done by another class.
Is this the best way to do it?
Or is it better to have the method return a collection of a custom Result class?
Both options have their advantages and disadvantages. Both are "fine" in the sense that there is nothing wrong with them and they are commonly used in C#.
Option 1 has the big advantage of being simple and easy. You can even keep a reference to a WriteItem instance and check its status after processing.
Option 2 has a clearer separation of concerns: In Option 1, you need to add comments to your WriteItem class to define which are "input" and which are "output" properties. Option 2 does not need that. In addition, Option 2 allows you to make WriteItem and ProcessingResult immutable, which is a nice property.
Option 2 is also more extensible: If you want to process something else than WriteItems (with the same return options), you can define a class
class ProcessingResult<T>
{
    public T Item { get; set; }
    public int ResultCode { get; set; }
    public string ErrorMessage { get; set; }
}
and use it as ProcessingResult<WriteItem> as well as ProcessingResult<SomeOtherItem>.
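For illustration, a sketch of what Option 2 might look like with the generic result type (ProcessOne and the 0/1 result codes are assumptions, not from the question):
public List<ProcessingResult<WriteItem>> ProcessItems(IEnumerable<WriteItem> items)
{
    var results = new List<ProcessingResult<WriteItem>>();
    foreach (var item in items)
    {
        var result = new ProcessingResult<WriteItem> { Item = item };
        try
        {
            ProcessOne(item);      // hypothetical per-item work
            result.ResultCode = 0; // 0 = success (assumed convention)
        }
        catch (Exception ex)
        {
            result.ResultCode = 1; // 1 = failure (assumed convention)
            result.ErrorMessage = ex.Message;
        }
        results.Add(result);
    }
    return results;
}
Note that the WriteItem itself is never touched, which is what makes the immutability mentioned above possible.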
What you wrote will work. You can modify the objects' properties while iterating over the collection without problems (mutating an item is fine; only adding or removing items during iteration is not allowed).
I wouldn't return a new collection unless you need to keep a copy of the original collection untouched.
I think it all comes down to readability.
When you call ProcessItems, is it obvious that the collection has changed? If you call the method like this:
var items = GetItemsFromSomewhere();
ProcessItems(items);
versus calling it like this:
var items = GetItemsFromSomewhere();
items = ProcessItems(items);
or simply changing your methodname:
var items = GetItemsFromSomewhere();
items = UpdateItemStatuses(items);
In the end there is no right answer to this question in my book. You should do what feels right for your application. And consider: what if another developer were looking at this piece of code? Could he surmise what is happening here, or would he have to dive into the ProcessItems function to get the gist of the application?
It is better to return a new results class.
Why?
As others have said, you are modifying the collection and it's not really clear. But for me this is not the main reason. You can have processes which modify objects.
For me it's because you have had to add extra properties to your WriteItem object in order to support the processor. This in effect creates a strong coupling between the model and a processor, which should not exist.
Consider: if you had another method ProcessItems_ForSomeOtherPurpose(List<WriteItem> items), do you expand your ResultCode int to have more meaningful values? Do you add another property ResultCode_ForSomeOtherPurpose? What if you need to process the same item multiple times with multiple processors?
I would give your Model an Id. Then you can log multiple processes against it, e.g.:
item 1 - loaded
item 1 - picking failed!
item 1 - picked
item 1 - delivered
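A rough sketch of that logging idea (all names here are illustrative):
public class ProcessLogEntry
{
    public int ItemId { get; set; }  // the Id added to the model
    public string Step { get; set; } // e.g. "loaded", "picked", "delivered"
    public bool Succeeded { get; set; }
    public string Message { get; set; }
}

var log = new List<ProcessLogEntry>();
// Each processor appends entries instead of mutating the item:
log.Add(new ProcessLogEntry { ItemId = 1, Step = "picking", Succeeded = false, Message = "picking failed!" });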
Following requirements:
Text file with approx. 250 MB (around 2.5 million lines)
parse each line of the text file (the structure is: one line is a primary record, then there are x lines of continuation records, then one primary record line again and x lines of continuation records, and so on)
if possible (and if it's avoidable), I don't want to use any databases
Why do I use all these lines in memory?
Hm, good question. To be honest, I don't want to use a database. LINQ is very fast and I can do everything there that I could do in the database, too. Further, due to the number of different exports (after parsing the data file), it has a positive performance effect, too. Not sure I'd get the same performance with a database.
But now the issue: it's hitting an OOM (out of memory).
OK, here is a short code snippet of an example class:
abstract class BaseClass {
    public string Version { get; set; }
}

class PrimaryRecord : BaseClass {
    public string PName { get; set; }
    public PrimaryRecord PRecord;
    public ContRecord CRecord;

    public PrimaryRecord() {
        CRecord = new ContRecord();
    }
}

class ContRecord : BaseClass {
    public string CName { get; set; }
    public List<ContRecord> ContRecords { get; set; }

    public ContRecord() {
        ContRecords = new List<ContRecord>();
    }
}
Now, the process of parsing the text file is as follows:
Read the file line by line and figure out if it's a new "package" of one primary + x continuation records. If yes, store the primary line in a List. All following lines (1..*) go into the List<ContRecord>, which (as you can see) hangs off the PrimaryRecord. So far so good, theoretically ...
The result is that this construct runs into an OOM, and I guess it is due to the List<> and the large number of instances of the PrimaryRecord class (and sub-instances of the ContRecord class). The memory grows exorbitantly and I have no clue why. One word about the properties: the classes have more than one property, 5-10 per class.
Any idea what I'm doing wrong? Or in other words: does anyone have a better idea of how I can parse the file and handle the structure in memory in a more efficient way?
Two suggestions (both sketched below):
Call the List constructor overload that takes a capacity if you know how many items will be in the list (see MSDN for how memory is allocated for List)
Try to use a LinkedList instead of a List
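A minimal sketch of both (the 2.5 million count comes from the requirements above; use the real counts where you know them):
// Suggestion 1: pre-size the list. Without a capacity, List<T> doubles its
// backing array as it grows and briefly holds both the old and the new array,
// which hurts badly at millions of items.
var primaries = new List<PrimaryRecord>(2500000);

// Suggestion 2: LinkedList<T> allocates per node and never re-allocates a
// backing array, trading indexed access for steadier memory behavior.
var continuations = new LinkedList<ContRecord>();
continuations.AddLast(new ContRecord());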