How can I sort a text file with 5000000 lines? - C#

I've got an unordered file with 500000 lines whose data looks like the following:
For instance    Desired result
------------    --------------
723,80          1,4
14,50           1,5
723,2           10,8
1,5             14,50
10,8            723,2
1,4             723,80
Now how can I implement such a thing?
I've tried SortedList and SortedDictionary, but I can't add every value to them because the data contains repeated values and both collections require unique keys.
I'd appreciate it if you could suggest the best possible method.
One more thing: I've seen this question, but it uses a class while I am working with a file!
C# List<> Sort by x then y

var result = File.ReadAllLines("...filepath...")
    .Select(line => line.Split(','))
    .Select(parts => new
    {
        V1 = int.Parse(parts[0]),
        V2 = int.Parse(parts[1])
    })
    .OrderBy(v => v.V1)
    .ThenBy(v => v.V2)
    .ToList();
Duplicates will be handled properly by default. If you want to remove them, add .Distinct() somewhere, for example after ReadAllLines.
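If the sorted lines also need to go back into a file, a sketch along these lines might do. It assumes every non-blank line holds exactly two integers separated by a comma. The names PairFileSorter, SortLines and SortFile are made up for illustration, and the paths are whatever you pass in.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class PairFileSorter
{
    // Sorts "a,b" lines numerically by a, then by b. Duplicates are kept.
    // Assumes each non-blank line contains exactly two integers separated by a comma.
    public static List<string> SortLines(IEnumerable<string> lines)
    {
        return lines
            .Where(line => !string.IsNullOrWhiteSpace(line)) // skip blank lines
            .Select(line => line.Split(','))
            .Select(parts => (First: int.Parse(parts[0]), Second: int.Parse(parts[1])))
            .OrderBy(pair => pair.First)
            .ThenBy(pair => pair.Second)
            .Select(pair => $"{pair.First},{pair.Second}")
            .ToList();
    }

    // Hypothetical helper: reads inputPath line by line and writes the sorted lines to outputPath.
    public static void SortFile(string inputPath, string outputPath)
    {
        File.WriteAllLines(outputPath, SortLines(File.ReadLines(inputPath)));
    }
}
```

The sort itself still holds all parsed pairs in memory, which should be fine for a few hundred thousand lines of two integers each.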

You need to parse the file into an object defined by a class. Once it's in the object, you can start to sort it.
public class myObject
{
public int x { get; set; }
public int y { get; set; }
}
Now once you get the file parsed into a list of objects, you should be able to do something like the following:
var myList = new List<myObject>(); //obviously, you should have parsed the file into the list.
var sortedList = myList.OrderBy(l => l.x).ThenBy(l => l.y).ToList();

First, sort each row so that its values are in the correct order (e.g. [723,80] -> [80,723]).
Then sort all rows using a comparison something like this:
int Compare(Tuple<int,int> lhs, Tuple<int,int> rhs)
{
    int res = lhs.Item1.CompareTo(rhs.Item1);
    if (res == 0) res = lhs.Item2.CompareTo(rhs.Item2);
    return res;
}
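A comparison like this can be handed straight to List<T>.Sort. The following is only a sketch of how that might be wired up. PairComparison and SortPairs are illustrative names, and it sorts the pairs as they are (first value, then second) without swapping the values inside a row.

```csharp
using System;
using System.Collections.Generic;

public static class PairComparison
{
    // Orders by Item1 and falls back to Item2 when the first items are equal.
    public static int Compare(Tuple<int, int> lhs, Tuple<int, int> rhs)
    {
        int res = lhs.Item1.CompareTo(rhs.Item1);
        if (res == 0) res = lhs.Item2.CompareTo(rhs.Item2);
        return res;
    }

    // Sorts the list in place; the method group converts to Comparison<Tuple<int, int>>.
    public static void SortPairs(List<Tuple<int, int>> pairs)
    {
        pairs.Sort(Compare);
    }
}
```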

Related

Using LINQ and lambdas to search Dictionary>Class>List>Struct data

Given the following data source:
public struct Strc
{
public decimal A;
public decimal B;
// more stuff
}
public class CLASS
{
public List<Strc> listStrc = new List<Strc>();
// other stuff
}
Dictionary<string, CLASS> dict = new Dictionary<string, CLASS>();
I need to collect all the Strc.B values in the dictionary for which Strc.A is, e.g., > 3.
I get the result by doing the following:
List<decimal> results = (
    from v in dict.Values
    from ls in v.listStrc
    where ls.A > 3
    select ls.B
).ToList();
I was also trying to write it using lambdas, but I failed miserably...
var res = dict.Values.Where(x => x.listStrc.Any(z => z.A > 3))
This is as far as I could get, but I can't work out how to then select the .B data...
What am I doing wrong? (Assuming I did anything right in the first place :D)
Thanks for your time.
You're essentially flattening to a sequence of the struct values - and that flattening is represented with SelectMany. So you want:
var res = dict.Values
    .SelectMany(x => x.listStrc)
    .Where(ls => ls.A > 3)
    .Select(ls => ls.B);
This is basically equivalent to your query expression, but your attempted method calls suggest trying to get a different result, where if any of the entries of listStrc have an A value greater than 3, you'd want all of the B values from that listStrc. Hopefully the former is what you really want, but it's worth thinking about that carefully.
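To make the difference between the two readings concrete, here is a small sketch. The FlattenDemo class and its method names are invented for the example; Strc and CLASS are trimmed copies of the types from the question.

```csharp
using System.Collections.Generic;
using System.Linq;

public struct Strc
{
    public decimal A;
    public decimal B;
}

public class CLASS
{
    public List<Strc> listStrc = new List<Strc>();
}

public static class FlattenDemo
{
    // B values of only those entries whose own A is greater than 3.
    public static List<decimal> MatchingEntries(Dictionary<string, CLASS> dict)
    {
        return dict.Values
            .SelectMany(x => x.listStrc)
            .Where(ls => ls.A > 3)
            .Select(ls => ls.B)
            .ToList();
    }

    // All B values from every list that contains at least one entry with A > 3.
    public static List<decimal> EntriesOfMatchingLists(Dictionary<string, CLASS> dict)
    {
        return dict.Values
            .Where(x => x.listStrc.Any(z => z.A > 3))
            .SelectMany(x => x.listStrc)
            .Select(ls => ls.B)
            .ToList();
    }
}
```

For a single list holding (A=1, B=10) and (A=5, B=20), the first method gives only 20, while the second gives both 10 and 20.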
You can try this
var result = dict.Values.SelectMany(x=>x.listStrc.Where(l=>l.A>3)).Select(x=>x.B);

C# List.OrderBy with multiple lists

I have 5 lists. One contains the release dates and the others hold the attributes that belong to those dates, separated into multiple lists.
List<string> sortedDateList = x1.OrderBy(x => x).ToList();
This code sorts the list with the oldest date first, as it should. But I also want to sort (sync) the other attribute lists, because they need the same index as the date.
How can I achieve that? I'm new to LINQ methods.
You could use the .Zip() method to combine the lists as described here. You could combine them into a class or an anonymous type and then sort them.
int[] numbers = { 1, 2, 3, 4 };
string[] words = { "one", "two", "three" };
var numbersAndWords = numbers.Zip(words, (first, second) => new { Num = first, Word = second });
var sorted = numbersAndWords.OrderBy(x => x.Num).ToList();
Alternately, if you can guarantee that all the lists are of the same length (or just grab the shortest list) you could use the following instead of the .Zip() extension.
var numbersAndWords = numbers.Select((number, i) => new { Num = number, Word = words[i], Foo = myFoos[i] }); // Where myFoos is another collection.
In the lambda, combine all the items from the separate lists into one object at the same time by accessing each collection by index. (This avoids multiple uses of .Zip().) Of course, if you try to access an index beyond the end of a collection you will get an exception: IndexOutOfRangeException for an array, ArgumentOutOfRangeException for a List<T>.
As far as I understand your question, you have different lists containing properties of certain objects. You should definitely look into storing all data into one list of a class of your making, where you consolidate all separate information into one object:
var list = new List<YourClass>
{
new YourClass
{
Date = ...,
OtherProperty = ...,
},
new YourClass
{
Date = ...,
OtherProperty = ...,
},
};
var ordered = list.OrderBy(o => o.Date);
But if you insist on storing different properties each in their own list, then you could select the dates with their index, then sort that, as explained in C# Sorting list by another list:
var orderedDates = list.Select((n, index) => new { Date = n, Index = index })
.OrderBy(x => x.Date)
.ToList();
Then you can use the indexes of the sorted objects to look up the properties in the other lists, by index, or sort them on index as explained in C# Sort list while also returning the original index positions?, Sorting a list and figuring out the index, and so on.
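That last step could look roughly like this. It is only a sketch: ParallelListSorter, SortedIndexes and Reorder are names invented here, and it assumes all lists have the same length and are aligned by index.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ParallelListSorter
{
    // Returns the original positions of the keys, listed in ascending key order.
    // OrderBy is a stable sort, so equal keys keep their original relative order.
    public static List<int> SortedIndexes<TKey>(IReadOnlyList<TKey> keys)
    {
        return keys
            .Select((key, index) => new { key, index })
            .OrderBy(x => x.key)
            .Select(x => x.index)
            .ToList();
    }

    // Rearranges any list that is index-aligned with the keys.
    public static List<T> Reorder<T>(IReadOnlyList<T> items, IEnumerable<int> order)
    {
        return order.Select(i => items[i]).ToList();
    }
}
```

You would call SortedIndexes once on the date list and then Reorder on each of the other lists with the same index list, so they all stay in sync.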
It almost sounds like you want 1 list of a class.
public class MyClass{
public string Date{get; set;} //DateTime is a better type to use for dates by the way
public string Value2{get; set;}
public string Value3{get; set;}
public string Value4{get; set;}
public string Value5{get; set;}
}
...
var sortedDateList = x1.OrderBy(x => x.Date).ToList();
Create an Object containing the date and attributes:
public class DateWithAttributes
{
public string Date {get;set;}
public Attribute Attribute1 {get;set;}
public Attribute Attribute2 {get;set;}
...
}
List<DateWithAttributes> DateWithAttributesList = new List<DateWithAttributes>()
{
    DateWithAttribute1,
    DateWithAttribute2
};
List<DateWithAttributes> sortedDateList = DateWithAttributesList.OrderBy(x => x.Date).ToList();
If you want to keep the lists separate, and/or create the ordered versions as separate lists, then you can pair each date with its index, sort the pairs by date, and then use the sorted indexes:
var orderedIndexedDateOfReleases = dateOfReleases.Select((d, i) => new { d, i }).OrderBy(di => di.d);
var orderedDateOfReleases = orderedIndexedDateOfReleases.Select(di => di.d).ToList();
var orderedMovieNames = orderedIndexedDateOfReleases.Select(di => movieNames[di.i]).ToList();
If you don't mind the result being combined, you can create a class or use an anonymous class, and again sort by the dates:
var orderedTogether = dateOfReleases.Select((d, i) => new { dateOfRelease = d, movieName = movieNames[i] }).OrderBy(g => g.dateOfRelease).ToList();

Sorting a list of objects based on another

public class Product
{
public string Code { get; private set; }
public Product(string code)
{
Code = code;
}
}
List<Product> sourceProductsOrder =
new List<Product>() { new Product("BBB"), new Product("QQQ"),
new Product("FFF"), new Product("HHH"),
new Product("PPP"), new Product("ZZZ")};
List<Product> products =
new List<Product>() { new Product("ZZZ"), new Product("BBB"),
new Product("HHH")};
I have two product lists and I want to reorder the second one with the same order as the first.
How can I reorder the products list so that the result would be : "BBB", "HHH", "ZZZ"?
EDIT: Changed Code property to public as @juharr mentioned
You would use IndexOf:
var sourceCodes = sourceProductsOrder.Select(s => s.Code).ToList();
products = products.OrderBy(p => sourceCodes.IndexOf(p.Code)).ToList();
The only catch is that if the second list has something not in the first list, IndexOf returns -1 for it, so those items will go to the beginning of the second list.
MSDN post on IndexOf can be found here.
You could try something like this
products.OrderBy(p => sourceProductsOrder.IndexOf(p))
if it is the same Product object. Otherwise, you could try something like:
products.OrderBy(p => GetIndex(sourceProductsOrder, p))
and write a small GetIndex helper method. Or create an Index() extension method for List<>, which would yield
products.OrderBy(p => sourceProductsOrder.Index(p))
The GetIndex method is rather simple so I omit it here.
(I have no PC to run the code so please excuse small errors)
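For what it's worth, the omitted helper might look something like this. It is a guess at what was meant rather than the answerer's actual code: it matches products by Code, and the ProductOrdering class and OrderLike method are names made up for the sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Product
{
    public string Code { get; private set; }
    public Product(string code) { Code = code; }
}

public static class ProductOrdering
{
    // Position of the first product in source with the same Code, or -1 if none matches.
    public static int GetIndex(List<Product> source, Product product)
    {
        return source.FindIndex(s => s.Code == product.Code);
    }

    // Returns products ordered by where their Code appears in source.
    // Products missing from source get index -1 and therefore come first.
    public static List<Product> OrderLike(List<Product> products, List<Product> source)
    {
        return products.OrderBy(p => GetIndex(source, p)).ToList();
    }
}
```

Each GetIndex call scans the source list, so this is fine for short lists but gets slow as both lists grow.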
Here is an efficient way to do this:
var lookup = sourceProductsOrder.Select((p, i) => new { p.Code, i })
.ToDictionary(x => x.Code, x => x.i);
products = products.OrderBy(p => lookup[p.Code]).ToList();
This should have a running time complexity of O(N log N), whereas an approach using IndexOf() would be O(N²).
This assumes the following:
- there are no duplicate product codes in sourceProductsOrder
- sourceProductsOrder contains all of the product codes in products
- you make the Code field/property non-private
If needed, you can create a safeguard against the first bullet by replacing the first statement with this:
var lookup = sourceProductsOrder.GroupBy(p => p.Code)
.Select((g, i) => new { g.Key, i })
.ToDictionary(x => x.Key, x => x.i);
You can account for the second bullet by replacing the second statement with this:
products = products.OrderBy(p =>
lookup.ContainsKey(p.Code) ? lookup[p.Code] : Int32.MaxValue).ToList();
And you can use both if you need to. These will slow down the algorithm a bit, but it should continue to have an O(N log N) running time even with these alterations.
I would implement a compare function that does a lookup of the order from sourceProductsOrder using a hash table. The lookup table would look like
(key) : (value)
"BBB" : 1
"QQQ" : 2
"FFF" : 3
"HHH" : 4
"PPP" : 5
"ZZZ" : 6
Your compare could then look up the order of the two elements and compare the two positions:
int compareFunction(Product a, Product b)
{
    // lookupTable maps a product code to its position in sourceProductsOrder
    return lookupTable[a.Code].CompareTo(lookupTable[b.Code]);
}
Building the hash table would be linear and doing the sort would generally be O(n log n).
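A runnable version of that idea might look as follows. To keep the sketch short it sorts plain code strings instead of Product objects. LookupSort and SortByReference are illustrative names, and it assumes the reference list has no duplicates and contains every code being sorted.

```csharp
using System;
using System.Collections.Generic;

public static class LookupSort
{
    // Sorts codes in place so they follow the order given by orderedCodes.
    // Assumes orderedCodes has no duplicates and contains every code being sorted;
    // a code missing from orderedCodes makes the dictionary lookup throw.
    public static void SortByReference(List<string> codes, IReadOnlyList<string> orderedCodes)
    {
        // Building the table is linear in the size of the reference list.
        var rank = new Dictionary<string, int>();
        for (int i = 0; i < orderedCodes.Count; i++)
        {
            rank[orderedCodes[i]] = i;
        }

        // Each comparison is two hash lookups.
        codes.Sort((a, b) => rank[a].CompareTo(rank[b]));
    }
}
```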
Easy come easy go:
IEnumerable<Product> result =
products.OrderBy(p => sourceProductsOrder.IndexOf(sourceProductsOrder.FirstOrDefault(p2 => p2.Code == p.Code)));
This will provide the desired result. Objects with product codes not available in the source list will be placed at the beginning of the result set. This will perform just fine for a couple of hundred items, I suppose.
If you have to deal with thousands of objects, then an answer like @Jon's will likely perform better. There you first create a kind of lookup value / score for each item and then use that for sorting / ordering.
The approach I described is O(n²).

Linq, select List Item where column is the max value

I basically have a List that has a few columns in it. All I want to do is select whichever List item has the highest int in a column called Counted.
List<PinUp> pinned= new List<PinUp>();
class PinUp
{
internal string pn { get; set; }
internal int pi{ get; set; }
internal int Counted { get; set; }
internal int pp{ get; set; }
}
So basically I just want pinned[whichever index has the highest Counted].
Hope this makes sense.
The problem is that I want to remove this [whichever index has the highest Counted] item from the current list, so I have to know which index it has in the list.
One way, order by it:
PinUp highest = pinned
.OrderByDescending(p => p.Counted)
.First();
This returns only one even if there are multiple with the highest Counted. So another way is to use Enumerable.GroupBy:
IEnumerable<PinUp> highestGroup = pinned
.GroupBy(p => p.Counted)
.OrderByDescending(g => g.Key)
.First();
If you instead just want to get the highest Counted (I doubt that), you just have to use Enumerable.Max:
int maxCounted = pinned.Max(p => p.Counted);
Update:
The problem is that I want to remove this [whichever index has the highest Counted] item from the current list.
Then you can use List<T>.RemoveAll:
int maxCounted = pinned.Max(p => p.Counted);
pinned.RemoveAll(p => p.Counted == maxCounted);
var excludingHighest = pinned.OrderByDescending(x => x.Counted)
.Skip(1);
If you need to keep a copy of the one being removed and still need to remove it, you can do something like
var highestPinned = pinned.OrderByDescending(x => x.Counted).Take(1);
var excludingHighest = pinned.Except(highestPinned);
You can order it:
var pin = pinned.OrderByDescending(p => p.Counted).FirstOrDefault();
// if pin == null then no elements found - probably empty.
If you want to remove, you don't need an index:
pinned.Remove(pin);
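If the index itself is what you are after, a plain loop finds it in one pass without sorting. This is just a sketch: PinUpSearch and IndexOfHighest are names made up here, and PinUp is copied from the question.

```csharp
using System;
using System.Collections.Generic;

public class PinUp
{
    internal string pn { get; set; }
    internal int pi { get; set; }
    internal int Counted { get; set; }
    internal int pp { get; set; }
}

public static class PinUpSearch
{
    // Index of the first item with the highest Counted, or -1 for an empty list.
    public static int IndexOfHighest(List<PinUp> pinned)
    {
        int best = -1;
        for (int i = 0; i < pinned.Count; i++)
        {
            if (best < 0 || pinned[i].Counted > pinned[best].Counted)
            {
                best = i;
            }
        }
        return best;
    }
}
```

With the index in hand, pinned[index] gives you the item and pinned.RemoveAt(index) removes it. Check for -1 first if the list may be empty.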
It is a sorting problem.
Sort your list by Counted in descending order and pick the first item.
LINQ has a way to do it:
var highest = pinned.OrderByDescending(p => p.Counted).FirstOrDefault();
Try the following:
PinUp pin = pinned.OrderByDescending(x => x.Counted).First();

Average extension method in Linq for default value

Anyone know how I can set a default value for an average? I have a line like this...
dbPlugins = (from p in dbPlugins
select new { Plugin = p, AvgScore = p.DbVersions.Average(x => x.DbRatings.Average(y => y.Score)) })
.OrderByDescending(x => x.AvgScore)
.Select(x => x.Plugin).ToList();
which throws an error because I have no ratings yet. If I have none, I want the average to default to 0. I was thinking this should be an extension method where I could specify what the default value should be.
There is: DefaultIfEmpty.
I'm not sure what your DbVersions and DbRatings are and which collection exactly has zero items, but this is the idea:
var emptyCollection = new List<int>();
var average = emptyCollection.DefaultIfEmpty(0).Average();
Update: (repeating what's said in the comments below to increase visibility)
If you find yourself needing to use DefaultIfEmpty on a collection of class type, remember that you can change the LINQ query to project before aggregating. For example:
class Item
{
public int Value { get; set; }
}
var list = new List<Item>();
var avg = list.Average(item => item.Value);
If you don't want to/can not construct a default Item with Value equal to 0, you can project to a collection of ints first and then supply a default:
var avg = list.Select(item => item.Value).DefaultIfEmpty(0).Average();
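Applied to the nested average from the question, the same idea might look like the sketch below. The DbPlugin, DbVersion and DbRating classes are in-memory stand-ins guessed from the property names in the question, and PluginScores is an invented helper. This works on in-memory collections; whether a particular LINQ provider can translate it to SQL is not something the sketch checks.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for the entities in the question.
public class DbRating { public int Score { get; set; } }
public class DbVersion { public List<DbRating> DbRatings { get; set; } = new List<DbRating>(); }
public class DbPlugin { public List<DbVersion> DbVersions { get; set; } = new List<DbVersion>(); }

public static class PluginScores
{
    // Average of the per-version averages.
    // A version without ratings, or a plugin without versions, counts as 0.
    public static double AverageScore(DbPlugin plugin)
    {
        return plugin.DbVersions
            .Select(v => v.DbRatings.Select(r => r.Score).DefaultIfEmpty(0).Average())
            .DefaultIfEmpty(0.0)
            .Average();
    }
}
```

Note the design choice: a version with no ratings pulls the plugin's average down towards 0. If unrated versions should be ignored instead, filter them out before averaging.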
My advice would be to create a reusable solution instead of a solution for this problem only.
Make an extension method AverageOrDefault, similar to FirstOrDefault. See extension methods demystified
public static class MyEnumerableExtensions
{
    public static double AverageOrDefault(this IEnumerable<int> source)
    {
        // TODO: decide what to do if source equals null: exception or return default?
        if (source.Any())
            return source.Average();
        else
            return default(double);
    }
}
Enumerable.Average has 10 overloads that take no selector (int, long, float, double, decimal and their nullable versions), so you'll need to create an AverageOrDefault for double, int?, decimal, etc. They all look similar.
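One way to cut down on the number of overloads is a single generic method that takes a selector and a caller-supplied default. The following is a sketch rather than a drop-in replacement for the method above: the AverageExtensions class, the selector parameter and the defaultValue parameter are all assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AverageExtensions
{
    // Averages selector(item) over the sequence in a single pass,
    // returning defaultValue when the sequence has no elements.
    public static double AverageOrDefault<T>(
        this IEnumerable<T> source,
        Func<T, double> selector,
        double defaultValue = 0)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (selector == null) throw new ArgumentNullException(nameof(selector));

        double sum = 0;
        long count = 0;
        foreach (var item in source)
        {
            sum += selector(item);
            count++;
        }
        return count == 0 ? defaultValue : sum / count;
    }
}
```

Because it walks the sequence itself, this version only runs against in-memory data and enumerates the source once, unlike the Any() followed by Average() pattern.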
Usage:
// Get the average order total or default per customer
var averageOrderTotalPerCustomer = myDbContext.Customers
.GroupJoin(myDbContext.Orders,
customer => customer.Id,
order => order.CustomerId,
(customer, ordersOfThisCustomer) => new
{
Id = customer.Id,
Name = customer.Name,
AverageOrder = ordersOfThisCustomer.AverageOrDefault(),
});
I don't think there's a way to select default, but how about this query
dbPlugins = (from p in dbPlugins
select new {
Plugin = p, AvgScore =
p.DbVersions.Any(x => x.DbRatings.Any()) ?
p.DbVersions.Average(x => x.DbRatings.Average(y => y.Score)) : 0 })
.OrderByDescending(x => x.AvgScore)
.Select(x => x.Plugin).ToList();
Essentially the same as yours, but we first ask if there are any ratings before averaging them. If not, we return 0.
