Tricky LINQ query for text file format transformation - c#

I have a shopping list in a text file, like this:
BuyerId Item;
1; Item1;
1; Item2;
1; ItemN;
2; Item1;
2; ItemN;
3; ItemN;
I need to transform this list to a format like this:
Item1; Item2; Item3; ...; ItemN <--- For buyer 1
Item1; ...; ItemN <--- For buyer 2
Item1; ...; ItemN <--- For buyer 3
First I parse the CSV file like this:
IList<string[]> parsedcsv = (from line in lines.Skip(1)
let parsedLine = line.TrimEnd(';').Split(';')
select parsedLine).ToList();
Then I group the items with LINQ and aggregate them to the final format:
IEnumerable<string> buyers = from entry in parsedcsv
group entry by entry[0] into cart
select cart.SelectMany(c => c.Skip(1))
.Aggregate((item1, item2) =>
item1 + ";" + item2).Trim();
HOWEVER, as it happens, the BuyerId is not unique: the same ids come around again after a while (for example, the sequence can look like this: 1,2,3,4,5,1,2,3,4,5,1,2,3 or like this: 1,2,3,1,2,3,1,2).
No big deal, I could quite easily fix this by grouping the items in a loop that checks that I only deal with one buyer at a time:
int lastBatchId = 0;
string currentId = parsedcsv[0][0];
for (int i = 0; i < parsedcsv.Count; i++)
{
bool last = parsedcsv.Count - 1 == i;
if (parsedcsv[i][0] != currentId || last)
{
IEnumerable<string> buyers = from entry in parsedcsv.Skip(lastBatchId)
.Take(i - lastBatchId + (last ? 1 : 0))
...
lastBatchId = i;
currentId = parsedcsv[i][0];
...
... however, this is not the most elegant solution. I'm almost certain this can be done only with LINQ.
Can anyone help me out here, please?
Thanks!

You should have a look at GroupAdjacent.
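To illustrate the idea: MoreLINQ ships a GroupAdjacent operator that groups only consecutive elements sharing a key. The sketch below is a hand-rolled stand-in, not MoreLINQ's implementation; the helper name, the sample rows, and the "; " output separator are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample rows in the question's format; buyer ids repeat (1, 2, 1).
var lines = new[]
{
    "BuyerId Item;",
    "1; Item1;",
    "1; Item2;",
    "2; Item1;",
    "1; Item3;",
};

// Parse each data row into (Id, Item), skipping the header line.
var rows = lines.Skip(1)
    .Select(l => l.Split(new[] { ';', ' ' }, StringSplitOptions.RemoveEmptyEntries))
    .Select(p => (Id: p[0], Item: p[1]));

// One output line per run of consecutive rows with the same buyer id.
List<string> carts = GroupAdjacent(rows, r => r.Id)
    .Select(run => string.Join("; ", run.Select(r => r.Item)))
    .ToList();

foreach (var cart in carts)
    Console.WriteLine(cart);

// Hypothetical helper: yields runs of consecutive elements that share a key.
static IEnumerable<List<T>> GroupAdjacent<T, TKey>(
    IEnumerable<T> source, Func<T, TKey> keySelector)
{
    var comparer = EqualityComparer<TKey>.Default;
    var run = new List<T>();
    foreach (var item in source)
    {
        // Close the current run when the key changes.
        if (run.Count > 0 && !comparer.Equals(keySelector(run[0]), keySelector(item)))
        {
            yield return run;
            run = new List<T>();
        }
        run.Add(item);
    }
    if (run.Count > 0)
        yield return run;
}
```

Unlike a plain group by, the second block of buyer 1 ends up in its own group instead of being merged with the first.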

I'm not sure this is the best solution, but you said you want a pure Linq answer, so here you have it:
var result = from r in (
from l in lines.Skip(1)
let data = l.Split(new string[]{";"," "},
StringSplitOptions.RemoveEmptyEntries)
select new { Id = data.First(), Item = data.Skip(1).First() })
.Aggregate(new
{
Rows = Enumerable.Repeat(new
{
Id = string.Empty,
Items = new List<string>()
}, 1).ToList(),
LastID = new List<string>() { "" }
},
(acc, x) =>
{
if (acc.Rows[0].Id == string.Empty)
acc.Rows.Clear();
if (acc.LastID[0] != x.Id)
acc.Rows.Add(new
{
Id = x.Id,
Items = new List<string>()
});
acc.Rows.Last().Items.Add(x.Item);
acc.LastID[0] = x.Id;
return acc;
}
).Rows
select new
{
r.Id,
Items = string.Join(";", from x in r.Items
select x)
};
I wrote it pretty fast and it could be improved. I don't particularly like it because it resorts to a couple of tricks, but it's pure LINQ and could be a starting point.


In a LINQ with a select can I compare forward to the next row and decide what to select?

What I implemented with a for loop is this:
phraseSources2 = new List<PhraseSource2>();
for (int i = 0; i < phraseSources.Count; i++)
{
var ps = phraseSources[i];
if (i != phraseSources.Count - 1)
{
var psNext = phraseSources[i + 1];
if (psNext != null &&
ps.Kanji == psNext.Kanji &&
ps.Kana == psNext.Kana &&
ps.English.Length <= psNext.English.Length)
{
i++;
ps = phraseSources[i];
}
} else
{
ps = phraseSources[i];
}
phraseSources2.Add(new PhraseSource2()
{
Kanji = ps.Kanji,
Kana = ps.Kana,
Furigana = ps.Furigana,
English = ps.English,
});
}
Previously I had been using LINQ
phraseSources2 = (List<Data1.Model.PhraseSource2>)phraseSources
.Select(x => new PhraseSource2()
{
Kanji = x.Kanji,
Kana = x.Kana,
Furigana = x.Furigana,
English = x.English,
}).ToList();
I know LINQ can do a lot but can it look forward at the next row when doing a select?
If I understand your problem correctly, I wouldn't "look forward" but use GroupBy instead: group by Kanji and Kana, then select the longest English as the value in the PhraseSource2 object.
Something like this:
var phraseSource2 = phraseSources
.GroupBy(x => new {Kanji = x.Kanji, Kana = x.Kana})
.Select(g => new PhraseSource2 {
Kanji = g.Key.Kanji,
Kana = g.Key.Kana,
Furigana = g.First().Furigana,
English = g.OrderByDescending(x => x.English.Length).First().English
});
If the source collection can be accessed by index, then you can use the overload of Select that gives you the current index.
var source = new[] { 'a', 'b', 'c' };
var result = source.Select((x, i) => new { Current = x, Next = source.Length > i+1 ? source[i+1] : ' '});
All you have to do is declare a variable inside the lambda, where you can easily retrieve the next or previous value like this:
phraseSources2 = phraseSources
.Select((x, y) =>
{
// Kanji of the next row; null when x is the last row
var nextKanji = phraseSources.Skip(y + 1).FirstOrDefault()?.Kanji;
return new PhraseSource2()
{
Kanji = nextKanji,
Kana = x.Kana,
Furigana = x.Furigana,
English = x.English,
};
}).ToList();
If you want to check some conditions first, you can do it like this:
phraseSources2 = phraseSources
.Where((x, y) =>
{
// English of the next row; null when x is the last row
var nextEnglish = phraseSources.Skip(y + 1).FirstOrDefault()?.English;
return nextEnglish != null && x.English.Length < nextEnglish.Length;
})
.Select(x =>
new PhraseSource2()
{
Kanji = x.Kanji,
Kana = x.Kana,
Furigana = x.Furigana,
English = x.English,
}).ToList();
There is no built-in method, but there are third-party libraries that offer this functionality. MoreLinq is a respected, free .NET library that offers a WindowLeft extension method, which processes a sequence into a series of subsequences, each representing a windowed subset of the original. So you could use it to process your phraseSources in pairs and discard the pairs whose first phrase is superseded by the second (same Kanji and Kana, and an English text that is not longer). Finally, select the first phrase of each pair that survived.
using static MoreLinq.Extensions.WindowLeftExtension;
var phraseSources2 = phraseSources
.WindowLeft(size: 2)
.Where(phrases => // phrases is of type IList<PhraseSource2>
{
if (phrases.Count == 2) // All have size 2 except from the last
{
var ps = phrases[0];
var psNext = phrases[1];
return ps.Kanji != psNext.Kanji || ps.Kana != psNext.Kana ||
ps.English.Length > psNext.English.Length;
}
else // The last is a single phrase
{
return true;
}
})
.Select(window => window[0]) // Select the first phrase
.ToList();

Rename List item when there is the same string multiple time

I have List of names like:
var list = new List<string> {"Allan", "Michael", "Jhon", "Smith", "George", "Jhon"};
and a combobox whose ItemsSource is my list. As you can see, Jhon appears in the list twice. What I want is that, when I put those names into the combobox, "2" is appended to the second Jhon. I mean, when I open the combobox, the names in it should look like:
Allan
Michael
Jhon
Smith
George
Jhon2
I have tried LINQ to do that, but I'm quite new to C#/LINQ. Could someone show me a simple way to do that?
I would do this:
var result = list.Take(1).ToList();
for (var i = 1; i < list.Count; i++)
{
var name = list[i];
var count = list.Take(i).Where(n => n == name).Count() + 1; // earlier occurrences of name, plus this one
result.Add(count < 2 ? name : name + count.ToString());
}
Here is what I would do:
First off, separate the list into two smaller ones, one that contains all the unique names, and one that contains only duplicates:
var duplicates = myList.GroupBy(s => s)
.SelectMany(grp => grp.Skip(1));
var unique = new HashSet<string>(myList).ToList();
Then process:
var result = new List<string>();
foreach (string uniqueName in unique)
{
int index=2;
foreach (string duplicateName in duplicates.Where(dupe => dupe == uniqueName))
{
result.Add(string.Format("{0}{1}", duplicateName, index.ToString()));
index++;
}
}
What we are doing here is the following:
Iterate through unique names.
Initialize a variable index with value 2. This will be the number we add at the end of each name.
Iterate through matching duplicate names.
Modify the name string by adding the number stored at index to the end.
Add this new value to the results list.
Increment index.
Finally, add the unique names back in:
result.AddRange(unique);
The result list should now contain all the same values as the original myList, only difference being that all names that appear more than once have a number appended to their end. Per your specification, there is no name name1. Instead, counting starts from 2.
Another possibility:
var groups = list.Select((name, index) => new { name, index }).GroupBy(s => s.name).ToList();
foreach (var group in groups.Where(g => g.Count() > 1))
{
foreach (var entry in group.Skip(1).Select((g, i) => new { g, i }))
{
list[entry.g.index] = list[entry.g.index] + (entry.i + 2); // entry.i is 0 for the second occurrence, so suffixes start at 2
}
}
Someone might be able to give a more efficient answer, but this does the job.
The dictionary keeps track of how many times a name has been repeated in the list. Each time a new name in the list is encountered, it is added to the dictionary and is added as is to the new list. If the name already exists in the dictionary (with the key check), instead, the count is increased by one in the dictionary and this name is added to the new list with the count (from the dictionary value corresponding to the name as the key) appended to the end of the name.
var list = new List<string> {"Allan", "Michael", "Jhon", "Smith", "George", "Jhon", "George", "George"};
Dictionary<string, int> dictionary = new Dictionary<string,int>();
var newList = new List<string>();
for(int i=0; i<list.Count();i++){
if(!dictionary.ContainsKey(list[i])){
dictionary.Add(list[i], 1);
newList.Add(list[i]);
}
else{
dictionary[list[i]] += 1;
newList.Add(list[i] + dictionary[list[i]]);
}
}
for(int i=0; i<newList.Count(); i++){
Console.WriteLine(newList[i]);
}
Output:
Allan
Michael
Jhon
Smith
George
Jhon2
George2
George3
Check this solution:
public List<string> AddName(IEnumerable<string> list, string name)
{
var suffixSelector = new Regex("^(?<name>[A-Za-z]+)(?<suffix>\\d?)$",
RegexOptions.Singleline);
var namesMap = list.Select(n => suffixSelector.Match(n))
.Select(x => new {name = x.Groups["name"].Value, suffix = x.Groups["suffix"].Value})
.GroupBy(x => x.name)
.ToDictionary(x => x.Key, x => x.Count());
if (namesMap.ContainsKey(name))
namesMap[name] = namesMap[name] + 1;
return namesMap.Select(x => x.Key).Concat(
namesMap.Where(x => x.Value > 1)
.SelectMany(x => Enumerable.Range(2, x.Value - 1)
.Select(i => $"{x.Key}{i}"))).ToList();
}
It handles the case where you already have 'Jhon2' in the list.
I would do
class Program
{
private static void Main(string[] args)
{
var list = new List<string> { "Allan", "Michael", "Jhon", "Smith", "George", "Jhon" };
var duplicates = list.GroupBy(x => x).Select(r => GetTuple(r.Key, r.Count()))
.Where(x => x.Count > 1)
.Select(c => { c.Count = 1; return c; }).ToList();
var result = list.Select(v =>
{
var val = duplicates.FirstOrDefault(x => x.Name == v);
if (val != null)
{
if (val.Count != 1)
{
v = v + " " + val.Count;
}
val.Count += 1;
}
return v;
}).ToList();
Console.ReadLine();
}
private static FooBar GetTuple(string key, int count)
{
return new FooBar(key, count);
}
}
public class FooBar
{
public int Count { get; set; }
public string Name { get; set; }
public FooBar(string name, int count)
{
Count = count;
Name = name;
}
}

how to arrange the item of a list into series arrangement

I have a list of data which contains of random data with combination of string and number:
List<String> Data1 = new List<String>()
{
"1001A",
"1002A",
"1003A",
"1004A",
"1015A",
"1016A",
"1007A",
"1008A",
"1009A",
};
I want this data to arrange into series like this:
1001A - 1004A, 1007A - 1009A, 1015A, 1016A
Whenever a series contains more than 2 consecutive items, the output should show the first and the last item of the series joined by "-". The remaining non-series items are appended at the end, and everything is separated by ",".
I have already written some code that arranges the data by its last character:
string get_REVISIONMARK = "A";
var raw_serries = arrange_REVISIONSERIES.Where(p => p[p.Length - 1].ToString() == get_REVISIONMARK) .OrderBy(p => p[p.Length - 1]) .ThenBy(p => p.Substring(0, p.Length - 1)).ToList();
Just ignore the last char; I already have a function for that. My problem is only about the arrangement of the numbers, and the length of the data is not fixed. Another example of output: "1001A - 1005A, 301A, 32A"
I have another sample of my code that works fine for me, but it feels like lazy code.
for (int c1 = 0; c1 < list_num.Count; c1++)
{
if (list_num[c1] != 0)
{
check1 = list_num[c1];
for (int c2 = 0; c2 < list_num.Count; c2++)
{
if (check1 == list_num[c2])
{
list_num[c2] = 0;
check1 += 1;
list_series.Add(arrange_REVISIONSERIES[c2]);
}
}
check1 = 0;
if (list_series.Count > 2)
{
res_series.Add(list_series[0] + " to " +list_series[list_series.Count - 1]);
list_series.Clear();
}
else
{
if (list_series.Count == 1)
{
res_series.Add(list_series[0]);
list_series.Clear();
}
else
{
res_series.Add(list_series[0] + "," + list_series[1]);
list_series.Clear();
}
}
}
}
var combine_res = String.Join(",", res_series);
MessageBox.Show(combine_res);
This code works fine for the series numbers ...
A possible solution (working with current set of values), Please follow the steps below
Declare a class level string list as
public List<String> data_result = new List<string>();
Create a function to iterate through input string list (input string declared inside, named 'data')
public void ArrangeList()
{
List<String> data = new List<string>() { "1001A", "1002A", "1003A",
"1004A", "1015A", "1016A", "1007A", "1008A", "1009A", "1017A" };
List<int> data_int = data.Select(a => Convert.ToInt32(a.Substring(0,
a.Length - 1))).OrderBy(b => b).ToList();
int initializer = 0, counter = 0;
int finalizer = 0;
foreach (var item in data_int)
{
if (initializer == 0)
{ initializer = item; finalizer = item; continue; } // first item: 0 means "not set yet"; start a run of length one
else
{
counter++;
if (item == initializer + counter)
finalizer = item;
else
{
LogListing(initializer, finalizer);
initializer = item;
finalizer = item;
counter = 0;
}
}
}
LogListing(initializer, finalizer);
}
Create a function which just logs the result into data_result string list.
public void LogListing(int initializer, int finalizer)
{
if (initializer != finalizer)
{
if (finalizer == initializer + 1)
{
data_result.Add(initializer + "A");
data_result.Add(finalizer + "A");
}
else
data_result.Add(initializer + "A - " + finalizer + "A");
}
else
data_result.Add(initializer + "A");
}
For the input above, it generates the result list as: 1001A - 1004A, 1007A - 1009A, 1015A - 1017A
A linqy solution:
char get_REVISIONMARK = 'A';
var res = arrange_REVISIONSERIES.Select(s => new { Rev = s[s.Length - 1], Value = int.Parse(s.Substring(0, s.Length - 1)), Org = s })
.Where(d => d.Rev == get_REVISIONMARK).OrderBy(d => d.Value)
.Select((val, ind) => new { Index = ind, Org = val.Org, Value = val.Value }).GroupBy(a => a.Value - a.Index)
.Select(gr=>gr.ToList()).OrderBy(l=>l.Count > 2 ? 0 : 1 ).Aggregate(new List<string>(), (list, sublist) =>
{
if (sublist.Count > 2)
list.Add(sublist[0].Org + " - " + sublist[sublist.Count - 1].Org);
else
list.AddRange(sublist.Select(a => a.Org));
return list;
});
The first lines are basically the same as the code you already have (filter on revision and sort), but with the difference that the subvalues are stored in an anonymous type. You could do the same on the pre ordered list, but since splitting the string would be done twice I've included it in the total.
Then a select with index (.Select((val, ind) =>) is made to get value/index pairs. This is done to be able to get the sequences based on an old t-sql row_number trick: for each 'group' the difference between value and index is the same .GroupBy(a => a.Value - a.Index)
After that, normally you'd be as good as done, but since you only want ranges for sequences longer than 2, we make sublists out of the GroupBy values and do the ordering beforehand, to make sure the ranges come before the remaining single elements .Select(gr=>gr.ToList()).OrderBy(l=>l.Count > 2 ? 0 : 1 )
Finally, the list is created from the groups. There are several options, but I like to use Aggregate for that. The seed is the resulting list, and the aggregate simply adds to it (sublists with more than 2 elements are collapsed into a "first - last" range; for single elements and pairs, the individual elements are added).
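The row_number trick described above can be seen in isolation in the following minimal sketch. It assumes the numbers are already sorted and contain no duplicates; the sample values are taken from the question, and the variable names are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sorted numeric parts of the question's data.
var values = new[] { 1001, 1002, 1003, 1004, 1007, 1008, 1009, 1015, 1016 };

// Within a run of consecutive numbers, value and index grow in lockstep,
// so (value - index) stays constant and can serve as the grouping key.
List<List<int>> runs = values
    .Select((value, index) => new { value, index })
    .GroupBy(p => p.value - p.index)
    .Select(g => g.Select(p => p.value).ToList())
    .ToList();

foreach (var run in runs)
    Console.WriteLine(string.Join(", ", run));
```

Here the keys are 1001 for the first four values, 1003 for the next three, and 1008 for the last two, which yields three runs.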
I'm making two assumptions:
The list is already ordered
The non-numeric characters can be ignored
You will get the results in the results variable:
void Main()
{
List<String> Data1 = new List<String>()
{
"1001A",
"1002A",
"1003A",
"1004A",
"1015A",
"1016A",
"1007A",
"1008A",
"1009A",
};
var accu = new List<List<Tuple<int, string>>>();
foreach (var data in Data1)
{
if (accu.Any(t => t.Any(d => d.Item1 == (ToInt(data) - 1))))
{
var item = accu.First(t => t.Any(d => d.Item1 == (ToInt(data) - 1)));
item.Add(new Tuple<int, string>(ToInt(data), data));
}
else
{
accu.Add(new List<Tuple<int, string>>{ new Tuple <int, string>(ToInt(data), data)});
}
}
var results = new List<string>();
results.AddRange(accu.Where(g => g.Count > 2).Select(g => string.Format("{0} - {1}", g.First().Item2, g.Last().Item2)));
results.AddRange(accu.Where(g => g.Count <= 2).Aggregate(new List<string>(), (total, current) => { total.AddRange(current.Select(i => i.Item2)); return total; } ));
}
private static Regex digitsOnly = new Regex(@"[^\d]");
public static int ToInt(string literal)
{
int i;
int.TryParse(digitsOnly.Replace(literal, ""), out i);
return i;
}
So given your starting data:
List<String> arrange_REVISIONSERIES = new List<String>()
{
"1001A",
"1002A",
"1003A",
"1004A",
"1015A",
"1016A",
"1007A",
"1008A",
"1009A",
};
I do this first:
var splits =
arrange_REVISIONSERIES
.Select(datum => new
{
value = int.Parse(datum.Substring(0, datum.Length - 1)),
suffix = datum.Substring(datum.Length - 1, 1),
})
.OrderBy(split => split.suffix)
.ThenBy(split => split.value)
.ToArray();
That's basically the same as your raw_serries, but orders the number part as a number. It seems to me that you need it as a number to make the range part work.
I then do this to compute the groupings:
var results =
splits
.Skip(1)
.Aggregate(
new[]
{
new
{
start = splits[0].value,
end = splits[0].value,
suffix = splits[0].suffix
}
}.ToList(),
(a, s) =>
{
if (a.Last().suffix == s.suffix && a.Last().end + 1 == s.value)
{
a[a.Count - 1] = new
{
start = a.Last().start,
end = s.value,
suffix = s.suffix
};
}
else
{
a.Add(new
{
start = s.value,
end = s.value,
suffix = s.suffix
});
}
return a;
})
.Select(r => r.start == r.end
? String.Format("{0}{1}", r.end, r.suffix)
: (r.start + 1 == r.end
? String.Format("{0}{2}, {1}{2}", r.start, r.end, r.suffix)
: String.Format("{0}{2} - {1}{2}", r.start, r.end, r.suffix)))
.ToArray();
And finally, this to create a single string:
var result = String.Join(", ", results);
That gives me:
1001A - 1004A, 1007A - 1009A, 1015A, 1016A
This code also works nicely with data containing different suffixes.

Replace a part of an element in a list of strings

I have a list which contains some strings like below:
List<String> l = new List<String>(){
"item1 1",
"item2 2",
"item3 3",
"item1 4",
"item1 5",
"item3 6"};
I would like to sum the items which are the same. Example:
l = {"item1 10", "item2 2", "item3 9"}
I've tried this:
List<String> result = new List<String>();
for (int i = 0; i < total.Count; i++)
{
for (int j = 0; j < i; j++)
{
int diferenta = 0;
if (total[i].Substring(0, total[i].IndexOf(" ")).Equals(total[j].Substring(0, total[j].IndexOf(" "))))
{
diferenta = int.Parse(ExtractNumber(total[i].Substring(total[i].IndexOf(" ")))) + int.Parse(ExtractNumber(total[j].Substring(total[j].IndexOf(" "))));
total[i] = total[i].Replace(ExtractNumber(total[i].Substring(total[i].IndexOf(" "))), diferenta.ToString());
result.Add(total[i]);
}
}
}
And to get the distinct elements:
List<String> final = result.Distinct().ToList();
My way is not correct at all, so I want to ask you for help.
You can split each element, group by the first component, then sum up the second components:
var groupQuery = l.Select(x => x.Split(new[] { ' ' })).GroupBy(x => x[0]);
var sumQuery = groupQuery.Select(x => new { x.Key, Total = x.Select(elem => int.Parse(elem[1])).Sum() });
foreach (var total in sumQuery)
{
Console.WriteLine("{0}: {1}", total.Key, total.Total);
}
This code obviously omits a bunch of error checking (what happens if a string doesn't split, or doesn't have a second component that can be parsed?), but that can be added in without too much difficulty.
List<string> outputList =
inputList.GroupBy(s => s.Split(' ')[0])
.Select(g => g.Key + " " + g.Sum(s => int.Parse(s.Split(' ')[1])))
.ToList();
Hooray for LINQ! :)
Note: There is no error trapping, and I am assuming the data is always correct. I have not tested the code for performance or errors.
Without handling bad data (for instance, entries that are not separated by whitespace):
int sumTotal = (from i in l
let parts = i.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
let id = parts[0]
let count = int.Parse(parts[1])
group count by id into Numbers
where Numbers.Count() != 1
select Numbers.Sum()).Sum();
Edit: I hadn't seen that you want to count every item even if it has no duplicate. That's even easier: you just need to remove where Numbers.Count() != 1 from the query :)
So the complete LINQ query including handling data in wrong format:
int number=0;
int sum = (from i in l
let parts = i.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
where parts.Length==2
let id = parts[0]
let isInt = int.TryParse(parts[1], out number)
where isInt
group number by id into Numbers
select Numbers.Sum()).Sum();
Iterate over the list. For each item, split on " ". Add the prefix (splits[0]) to a map of (String:Integer). If string exists in map, then retrieve its value first, then add current item value (split[1]), then add the sum back to map.
At the end your map will have sum for each string. Just iterate through the map items to output it.
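The map-based approach described above could look roughly like the following sketch. It uses a Dictionary<string, int> as the map and the sample list from the question; silently skipping malformed entries is an assumption, not something the answer specifies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var l = new List<string>
{
    "item1 1", "item2 2", "item3 3", "item1 4", "item1 5", "item3 6"
};

var totals = new Dictionary<string, int>();
foreach (var entry in l)
{
    var splits = entry.Split(' ');
    // Assumption: entries that do not look like "<name> <integer>" are skipped.
    if (splits.Length != 2 || !int.TryParse(splits[1], out var value))
        continue;

    // TryGetValue leaves current at 0 when the prefix is not in the map yet.
    totals.TryGetValue(splits[0], out var current);
    totals[splits[0]] = current + value;
}

// Turn the map back into "<name> <sum>" strings.
List<string> result = totals.Select(kv => kv.Key + " " + kv.Value).ToList();

foreach (var line in result)
    Console.WriteLine(line);
```

This visits each entry once, so it avoids the nested loops from the question.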

If I'm projecting with linq and not using a range variable what is the proper syntax?

I have a query that sums and aggregates a lot of data, something like this:
var anonType = from x in collection
let Cars = collection.Where(c=>c.Code == "Cars")
let Trucks = collection.Where(c=>c.Code == "Trucks")
select new {
Total = collection.Sum(v=>v.Amount),
CarValue = Cars.Sum(v=>v.Amount),
TruckValue = Trucks.Sum(v=>v.Amount),
CarCount = Cars.Count(),
TruckCount = Trucks.Count()
};
I find it really weird that I have to declare the range variable x, especially if I'm not using it. So, am I doing something wrong or is there a different format I should be following?
I could be wrong, but from your usage, I don't think you want to do a traditional query expression syntax query with your collection anyway, as it appears you are only looking for aggregates. The way you have it written, you would be pulling multiple copies of the aggregated data because you're doing it for each of the items in the collection. If you wished, you could split your query like this (sample properties thrown in)
var values = collection.Where(c => c.Code == "A");
var anonType = new
{
Sum = values.Sum(v => v.Amount),
MinimumStartDate = values.Min(v => v.StartDate),
Count = values.Count()
};
You declare a range variable no matter the looping construct:
foreach(var x in collection)
or
for(var index = 0; index < collection.Count; index++)
or
var index = 0;
while(index < collection.Count)
{
//...
index++;
}
Queries are no different. Just don't use the variable, it doesn't hurt anything.
So, am I doing something wrong?
Your query is not good. For each element in the collection, you are enumerating the collection 5 times (cost = 5*n^2).
Is there a different format I should be following?
You could get away with enumerating the collection 5 times (cost = 5n).
IEnumerable<X> cars = collection.Where(c => c.Code == "Cars");
IEnumerable<X> trucks = collection.Where(c => c.Code == "Trucks");
var myTotals = new
{
Total = collection.Sum(v => v.Amount),
CarValue = cars.Sum(v => v.Amount),
TruckValue = trucks.Sum(v => v.Amount),
CarCount = cars.Count(),
TruckCount = trucks.Count()
};
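If even five enumerations are too many, the same totals can be accumulated in a single pass. The following is only a sketch of that variant, not part of the answer above: the element type is replaced by a made-up (Code, Amount) tuple, and Amount is assumed to be a decimal.

```csharp
using System;
using System.Linq;

// Stand-in data; the real element type from the question is unknown.
var collection = new[]
{
    (Code: "Cars", Amount: 10m),
    (Code: "Trucks", Amount: 25m),
    (Code: "Cars", Amount: 5m),
    (Code: "Bikes", Amount: 1m),
};

// Aggregate walks the collection once, carrying all five totals in a tuple.
var myTotals = collection.Aggregate(
    (Total: 0m, CarValue: 0m, TruckValue: 0m, CarCount: 0, TruckCount: 0),
    (acc, v) => (
        Total: acc.Total + v.Amount,
        CarValue: acc.CarValue + (v.Code == "Cars" ? v.Amount : 0m),
        TruckValue: acc.TruckValue + (v.Code == "Trucks" ? v.Amount : 0m),
        CarCount: acc.CarCount + (v.Code == "Cars" ? 1 : 0),
        TruckCount: acc.TruckCount + (v.Code == "Trucks" ? 1 : 0)));

Console.WriteLine($"{myTotals.Total} {myTotals.CarValue} {myTotals.TruckValue} " +
                  $"{myTotals.CarCount} {myTotals.TruckCount}");
```

The trade-off is readability: the five separate queries are easier to scan, while the single pass matters mostly when the source is expensive to enumerate.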
