Removing duplicate collection strings in memory - c#

I am working on a hypothetical question: if there are duplicate string collections in memory, how would I go about removing the duplicates while maintaining the original order of the collections?

Try something like this:
List<String> stringlistone = new List<string>() { "Hello", "Hi" };
List<String> stringlisttwo = new List<string>() { "Hi", "Bye" };
IEnumerable<String> distinctList = stringlistone.Concat(stringlisttwo).Distinct(StringComparer.OrdinalIgnoreCase);
List<List<String>> listofstringlist = new List<List<String>>() { stringlistone, stringlisttwo };
IEnumerable<String> distinctlistofstringlist = listofstringlist.SelectMany(x => x).Distinct(StringComparer.OrdinalIgnoreCase);
How you join the lists is up to you, but this should give you an idea. I added StringComparer.OrdinalIgnoreCase in case you want the distinct list to treat "hi" and "Hi" as the same string.
You can also call Distinct on a single list:
List<String> stringlistone = new List<string>() { "Hi", "Hello", "Hi" };
stringlistone = stringlistone.Distinct(StringComparer.OrdinalIgnoreCase).ToList();
stringlistone would then be a list with stringlistone[0] == "Hi" and stringlistone[1] == "Hello".

Don't worry about it for string literals. The runtime interns literals, so every reference to the same literal value points to the same location in memory. Strings built at run time (for example, read from a file) are not interned automatically.
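A quick way to check when two strings really share memory is object.ReferenceEquals. The sketch below is only an illustration (InternDemo and SameInstance are made-up names): identical literals share one instance, a string built at run time does not, and string.Intern hands back the pooled instance.

```csharp
using System;

public static class InternDemo
{
    // True when both variables refer to the very same string object,
    // not merely to strings with equal contents.
    public static bool SameInstance(string a, string b)
    {
        return object.ReferenceEquals(a, b);
    }

    // Builds "Hi" at run time, so the result is a separate object
    // even though its contents equal the literal "Hi".
    public static string BuildAtRunTime()
    {
        return new string(new[] { 'H', 'i' });
    }

    // Returns the pooled instance with the same contents.
    public static string Pooled(string s)
    {
        return string.Intern(s);
    }
}
```

Comparing the literal "Hi" with BuildAtRunTime() should report different instances, while comparing it with Pooled(BuildAtRunTime()) should report the same one.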

Say you have a List<List<string>> that you read from a file or database (so the strings are not already interned) and you want no duplicate string instances. You can use this code:
public void FoldStrings(List<List<string>> stringCollections)
{
var interned = new Dictionary<string,string> ();
foreach (var stringCollection in stringCollections)
{
for (int i = 0; i < stringCollection.Count; i++)
{
string str = stringCollection[i];
string s;
if (interned.TryGetValue (str, out s))
{
// We already have an instance of this string.
stringCollection[i] = s;
}
else
{
// First time we've seen this string... add to hashtable.
interned[str]=str;
}
}
}
}

Related

How to search with LINQ entites having string[] array properties and find these containing any of string from string[] array?

With EF Core 2.2 I have entities with string[] array properties, which are converted in ApplicationDbContext with:
modelBuilder.Entity<FruitBasket>()
.Property(e => e.FruitTypes)
.HasConversion(
v => string.Join(',', v),
v => v.Split(',', StringSplitOptions.RemoveEmptyEntries));
For example, an entity may contain in its FruitType column a string array {"Apple", "Banana", "Orange"}, saved in the database as: Apple,Banana,Orange
I am trying to find all objects in my DB containing any of the strings from my input array, let's say any of:
string[] BasketSearchedFruitTypes = new string[] { "Apple", "Grapefruit", "Pineaple" };
My IQueryable:
IQueryable<BasketModel> baskets = GetBasketsQueryable(); //BasketModel contains FruitType string[] prop
To search for entities, I currently have this LINQ:
if (search.BasketSearchedFruitTypes != null && search.BasketSearchedFruitTypes.Length != 0)
baskets = baskets
.Where(data => search.BasketSearchedFruitTypes
.Any(x => data.FruitType
.Contains(x)));
Unfortunately, it returns nothing and I have run out of ideas.
EDIT 1:
After using the expression:
baskets = baskets
.Where(data => search.BasketSearchedFruitTypes
.Any(x => data.FruitType
.Contains(x)));
when I try to convert it to a List<>, I get an ArgumentNullException. I also cannot use foreach or .Count() on it. I get the same with:
var result = baskets.Where(data => search.BasketSearchedFruitTypes.Intersect(data.FruitType).Any());
EDIT 2:
I just noticed that the foreach loop goes through the returned IQueryable, but at some point it breaks with an ArgumentNullException. Even a try/catch inside the loop does not help...
EDIT 3:
Actually, when I wrap the foreach over the returned IQueryable in a try/catch, it works fine as a temporary solution. But I still do not understand why it crashes while enumerating (in the looping itself, not in the code inside the loop).
If I build an in-memory list shaped like your data, this code works for me:
using System.Collections.Generic;
using System.Linq;
namespace ConsoleApp2
{
class BuilderClass
{
List<BasketModel> baskets;
public BuilderClass()
{
baskets = new List<BasketModel>()
{ new BasketModel { FruitType = new string[] { "Apple", "Grapefruit", "Pineaple", "Bing Cherry", "Cantaloupe" } },
new BasketModel { FruitType = new string[] { "Grapefruit", "Cantaloupe", "Pineaple", "Boysenberries", "Apple" } },
new BasketModel { FruitType = new string[] { "Clementine", "Bing Cherry", "Boysenberries", "Cantaloupe", "Entawak" } },
new BasketModel { FruitType = new string[] { "Entawak", "Grapefruit", "Apple", "Pineaple", "Cantaloupe" } },
new BasketModel { FruitType = new string[] { "Apple", "Pineaple", "Bing Cherry", "Entawak", "Grapefruit" } }
};
}
string[] BasketSearchedFruitTypes = new string[]
{ "Apple", "Grapefruit", "Pineaple" };
public void check()
{
var qbaskets = baskets.AsQueryable();
if (BasketSearchedFruitTypes != null && BasketSearchedFruitTypes.Length != 0)
{
var result = qbaskets.Where(data => BasketSearchedFruitTypes.Any(x => data.FruitType.Contains(x))).ToList();
// result is a list with a count of 4
}
}
}
class BasketModel
{
public string[] FruitType { get; set; }
}
}
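One possible cause of the ArgumentNullException in the question is a row whose FruitType is null. That is an assumption, since the question does not confirm it. When the filter runs in memory, calling Contains or Intersect on a null array throws while the sequence is being enumerated. A minimal sketch of a null-safe in-memory filter follows; Basket and BasketSearch are hypothetical stand-ins, not the asker's types.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for BasketModel.
public class Basket
{
    public string[] FruitType { get; set; }
}

public static class BasketSearch
{
    // Returns the baskets containing at least one of the searched fruit types.
    // Baskets whose FruitType is null are skipped instead of throwing.
    public static List<Basket> WithAnyFruit(IEnumerable<Basket> baskets, string[] searched)
    {
        // No search terms means no filtering, as in the question's if-guard.
        if (searched == null || searched.Length == 0)
            return baskets.ToList();

        var wanted = new HashSet<string>(searched);
        return baskets
            .Where(b => b.FruitType != null && b.FruitType.Any(f => wanted.Contains(f)))
            .ToList();
    }
}
```

With a database-backed IQueryable you would materialize the rows first (for example with ToList()) before passing them in, which loads everything into memory, so this only suits small tables.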

How do I extract a specific substring from a string in C# using a loop?

I am trying to extract just the route number from a response I get from a web server. The response looks like this:
[{"Description":"METRO Blue Line","ProviderID":"8","Route":"901"},{"Description":"METRO Green Line","ProviderID":"8","Route":"902"},
All I need is to get the route numbers so I can populate a combobox with them. I am trying to use a loop as there are quite a few. My current solution gets the first route number, but then for some reason I only get the provider number after that. This is what I have so far:
//get bus routes and populate the busRoutecmb
restClient.endPoint = routeUrl + formatLine;
string response = restClient.request();
//show me what was returned for debug purposes
System.Diagnostics.Debug.Write(response);
//sort through data and put relevant items in a list
List<string> responseItems = new List<string>();
//splitting into routes
string[] splitByRoute = response.Split('}');
//extracting route number from elements in splitByRoute
List<string> extractedRouteNums = new List<string>();
foreach (string thing in splitByRoute)
{
//splitting each bus route up by piece of information
string[] splitByWord = thing.Split(',');
//getting rid of everything but the route number
int length = splitByWord.Length;
int count = 2;
while (count <= length)
{
string[] word = splitByWord[count].Split(':');
string routeNum = word[1].Trim('"');
count += 3;
extractedRouteNums.Add(routeNum);
System.Diagnostics.Debug.WriteLine(count);
System.Diagnostics.Debug.WriteLine(routeNum);
}
}
//add response to busRoutecmb
busRoutecmb.DataSource = extractedRouteNums;
}
Gratzy is right about this being JSON, but I would propose you use JSON.NET to deserialize it.
var items = JsonConvert.DeserializeObject<JArray>(response);
var routeNumbers = items.Select(i => i.Value<string>("Route")).ToList();
You could also use http://json2csharp.com/ to produce a strongly-typed model, and deserialize to that model type if you prefer.
public class RootObject
{
public string Description { get; set; }
public string ProviderID { get; set; }
public string Route { get; set; }
}
var items = JsonConvert.DeserializeObject<RootObject[]>(response);
var routeNumbers = items.Select(i => i.Route).ToList();
What you're getting back is basically a JSON string, which is just an array of name/value pairs (a list of dictionaries):
[{
"Description": "METRO Blue Line",
"ProviderID": "8",
"Route": "901"
},
{
"Description": "METRO Green Line",
"ProviderID": "8",
"Route": "902"
}]
You can deserialize that string into a List in several ways; two options are System.Web.Script.Serialization.JavaScriptSerializer and JSON.NET. Once it is in a List, you can query that list and return just the Route key/value pairs.
var data = "[{ Description:\"METROBlueLine\",ProviderID:\"8\",Route:\"901\"},{Description:\"METRO Green Line\",ProviderID:\"8\",Route:\"902\"}]";
var ser = new System.Web.Script.Serialization.JavaScriptSerializer();
var mylist = ser.Deserialize<List<Dictionary<string,string>>>(data);
// or, with JSON.NET (use one or the other; both produce a list of dictionaries):
// var mylist = Newtonsoft.Json.JsonConvert.DeserializeObject<List<Dictionary<string, string>>>(data);
var routes = mylist.SelectMany(a => a).Where(c => c.Key == "Route").ToList();
foreach (var route in routes)
Console.Write(route);
output
[Route, 901][Route, 902]
If you really only want the values then
var routesonly = routes.Select(r => r.Value).ToList();
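If you would rather stay with an explicit loop, as the question asks, and avoid an extra package, here is a rough sketch using System.Text.Json (built into .NET Core 3.0 and later). RouteParser and ExtractRoutes are names made up for illustration, and the input is assumed to be complete, valid JSON, unlike the truncated sample in the question.

```csharp
using System.Collections.Generic;
using System.Text.Json;

public static class RouteParser
{
    // Walks the top-level JSON array and collects each object's "Route" value.
    public static List<string> ExtractRoutes(string json)
    {
        var routes = new List<string>();
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            foreach (JsonElement item in doc.RootElement.EnumerateArray())
            {
                // Objects without a "Route" property are skipped.
                if (item.TryGetProperty("Route", out JsonElement route))
                    routes.Add(route.GetString());
            }
        }
        return routes;
    }
}
```

The returned list can be assigned to busRoutecmb.DataSource in the same way as extractedRouteNums in the question.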

Object data being overwritten after creation

I am teaching myself C# and was hoping somebody could point out what I am doing wrong. I am attempting to iterate through some XML data and create objects when I get a match.
My sequence of events is:
1. Using the foreach, iterate through until I see a specific data match
2. When I see a pattern1 match, clear down my lists in readiness to populate them
3. From now on, every time we see a certain pattern match, update the lists
4. When I see a pattern5 match, create the object with the populated lists
5. The foreach continues
6. Keep iterating through until we see a pattern1 match again
7. Repeat from step 2
My object gets created with the populated lists in step 4 but is subsequently overwritten when we repeat from step 2.
class XMLData {
public static List<Device> Search(XElement XE)
{
//local variables
bool DeviceCreated = false;
List<Device> Devices = new List<Device>();
string Out = "";
List<int> List1 = new List<int>();
List<int> List2 = new List<int>();
List<int> List3 = new List<int>();
IEnumerable<XElement> Logic =
from LL in XE.Descendants("Text")
select LL;
foreach (XElement XML in Logic)
{
//Regex Patterns
string pattern1 = @"(?=O\()[^\)]+(?<=S)";
string pattern2 = @"(?=O\()[^\)]+(?<=I)";
string pattern3 = @"(?=O\()[^\)]+(?<=FA)";
string pattern4 = @"(?=X\()[^\)]+(?<=F)";
string pattern5 = @"(?=O\()[^\)]+(?<=L)";
string pattern6 = @"(?=O\()[^\)]+(?<=FT).+?(?<=)";
MatchCollection All = Common.Find(XML);
if (Regex.Match(XML.Value, pattern1).Success)
{
//Clear down data ready to create a new device
DeviceCreated = false;
List1.Clear();
List2.Clear();
List3.Clear();
List1 = Common.Find(All);
}
else if (Regex.Match(XML.Value, pattern2).Success)
{
List2 = Common.Find(All);
}
else if (Regex.Match(XML.Value, pattern3).Success)
{
List3 = Common.Find(All);
}
else if (Regex.Match(XML.Value, pattern5).Success)
{
// create a device when we see this pattern as we should now have all of the data in the lists
if (!DeviceCreated)
{
Devices.Add(new Device(List1, List2, List3));
DeviceCreated = true;
}
}
else
{
//nothing
}
}
return Devices;
}
}
When you do List1 = Common.Find(All), List2 = Common.Find(All), etc., it simply replaces the existing list.
To append instead, use AddRange():
List1.AddRange(Common.Find(All));
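Something else worth checking: List<T> is a reference type. The Device class isn't shown, so this is an assumption, but if its constructor stores the lists it is given rather than copying them, then List1.Clear() in the pattern1 branch also empties the list held by the Device that was created earlier. A small sketch of the difference, where Holder is a hypothetical stand-in for Device:

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for Device, showing both ways of keeping a list.
public class Holder
{
    public List<int> Shared { get; }
    public List<int> Copied { get; }

    public Holder(List<int> values)
    {
        // Same object as the caller's list: later Clear()/AddRange() calls
        // made by the caller are visible through this property.
        Shared = values;

        // Independent snapshot: later changes to the caller's list
        // do not affect it.
        Copied = new List<int>(values);
    }
}
```

If that is what is happening, either copy the lists in the constructor as Copied does, or assign fresh lists (List1 = new List<int>();) instead of calling Clear().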

C# compare fields from different lines in csv

I am trying to compare the value in the 0 index of an array on one line and the 0 index on the following line. Imagine a CSV where I have a unique identifier in the first column, a corresponding value in the second column.
USER1, 1P
USER1, 3G
USER2, 1P
USER3, 1V
I would like to check the value of [0] on the next line (or the previous one, if that's easier) and, if they are the same (as they are in the example), concatenate the second value onto the first line. That is, the data should read as
USER1, 1P, 3G
USER2, 1P
USER3, 1V
before it gets passed onto the next function. So far I have
private void csvParse(string path)
{
using (TextFieldParser parser = new TextFieldParser(path))
{
parser.Delimiters = new string[] { "," };
while (!parser.EndOfData)
{
string[] parts = parser.ReadFields();
if (parts == null)
{
break;
}
contact.ContactId = parts[0];
long nextLine;
nextLine = parser.LineNumber+1;
//if line1 parts[0] == line2 parts[0] etc.
}
}
}
Does anyone have any suggestions? Thank you.
How about saving the array into a variable:
private void csvParse(string path)
{
using (TextFieldParser parser = new TextFieldParser(path))
{
parser.Delimiters = new string[] { "," };
string[] oldParts = new string[] { string.Empty };
while (!parser.EndOfData)
{
string[] parts = parser.ReadFields();
if (parts == null || parts.Length < 1)
{
break;
}
if (oldParts[0] == parts[0])
{
// concat logic goes here
}
else
{
contact.ContactId = parts[0];
}
long nextLine;
nextLine = parser.LineNumber+1;
oldParts = parts;
//if line1 parts[0] == line2 parts[0] etc.
}
}
}
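The "concat logic" placeholder above could be filled in along these lines. This is only a sketch under the assumption that rows with the same id are always adjacent, as in the sample data; CsvMerge and MergeConsecutive are illustrative names, and the rows are passed in as string arrays so it works with whatever parser produced them.

```csharp
using System.Collections.Generic;

public static class CsvMerge
{
    // Merges consecutive rows that share the same first field.
    // Each result row is the key followed by all values collected for it.
    public static List<List<string>> MergeConsecutive(IEnumerable<string[]> rows)
    {
        var merged = new List<List<string>>();
        List<string> current = null;

        foreach (string[] parts in rows)
        {
            if (parts == null || parts.Length == 0)
                continue; // skip empty rows

            if (current != null && current[0] == parts[0])
            {
                // Same key as the previous row: append its remaining fields.
                for (int i = 1; i < parts.Length; i++)
                    current.Add(parts[i]);
            }
            else
            {
                // New key: start a new merged row.
                current = new List<string>(parts);
                merged.Add(current);
            }
        }
        return merged;
    }
}
```

Inside the while loop you would collect each parser.ReadFields() result into a list and pass that list to MergeConsecutive once the file has been read.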
If I understand you correctly, what you are asking is essentially "how do I group the values in the second column based on the values in the first column?".
A quick and quite succinct way of doing this would be to Group By using LINQ:
var linesGroupedByUser =
    from line in File.ReadAllLines(path)
    let elements = line.Split(',')
    // Trim removes the space that follows each comma in the sample data
    let user = new { Name = elements[0].Trim(), Value = elements[1].Trim() }
    group user by user.Name into users
    select users;
foreach (var user in linesGroupedByUser)
{
    string valuesAsString = String.Join(", ", user.Select(x => x.Value));
    Console.WriteLine(user.Key + ", " + valuesAsString);
}
I have left out the use of your TextFieldParser class, but you can easily use that instead. This approach does, however, require that you can afford to load all of the data into memory. You don't mention whether this is viable.
The easiest way to do something like this is to convert each line to an object. You can use CsvHelper, https://www.nuget.org/packages/CsvHelper/, to do the work for you or you can iterate each line and parse to an object. It is a great tool and it knows how to properly parse CSV files into a collection of objects. Then, whether you create the collection yourself or use CsvHelper, you can use Linq to GroupBy, https://msdn.microsoft.com/en-us/library/bb534304(v=vs.100).aspx, your "key" (in this case UserId) and Aggregate, https://msdn.microsoft.com/en-us/library/bb549218(v=vs.110).aspx, the other property into a string. Then, you can use the new, grouped by, collection for your end goal (write it to file or use it for whatever you need).
You're basically finding all the unique entries, so put them into a dictionary with the contact id as the key, as follows:
private void csvParse(string path)
{
using (TextFieldParser parser = new TextFieldParser(path))
{
parser.Delimiters = new string[] { "," };
Dictionary<string, List<string>> uniqueContacts = new Dictionary<string, List<string>>();
while (!parser.EndOfData)
{
string[] parts = parser.ReadFields();
if (parts == null || parts.Count() != 2)
{
break;
}
//if contact id not present in dictionary add
if (!uniqueContacts.ContainsKey(parts[0]))
uniqueContacts.Add(parts[0],new List<string>());
//now there's definitely an existing contact in dic (the one
//we've just added or a previously added one) so add to the
//list of strings for that contact
uniqueContacts[parts[0]].Add(parts[1]);
}
//now do something with that dictionary of unique user names and
// lists of strings, for example dump them to console in the
//format you specify:
foreach (var contactId in uniqueContacts.Keys)
{
// string.Join puts ", " only between values, so repeated values cannot break the separators
Console.WriteLine(contactId + ", " + string.Join(", ", uniqueContacts[contactId]));
}
}
}

How to remove duplicates from List<string> without LINQ? [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Remove duplicates from a List<T> in C#
I have a List like the one below (a very big email list):
source list:
item 0 : jumper@yahoo.com|32432
item 1 : goodzila@yahoo.com|32432|test23
item 2 : alibaba@yahoo.com|32432|test65
item 3 : blabla@yahoo.com|32432|test32
The important part of each item is the email address. The other parts (separated with pipes) are not important, but I want to keep them in the final list.
As I said, my list is too big, and I think it's not recommended to use another list.
How can I remove duplicate emails (the entire item) from that list without using LINQ?
My code is below:
private void WorkOnFile(UploadedFile file, string filePath)
{
File.SetAttributes(filePath, FileAttributes.Archive);
FileSecurity fSecurity = File.GetAccessControl(filePath);
fSecurity.AddAccessRule(new FileSystemAccessRule(@"Everyone",
FileSystemRights.FullControl,
AccessControlType.Allow));
File.SetAccessControl(filePath, fSecurity);
string[] lines = File.ReadAllLines(filePath);
List<string> list_lines = new List<string>(lines);
var new_lines = list_lines.Select(line => string.Join("|", line.Split(new string[] { " " }, StringSplitOptions.RemoveEmptyEntries)));
List<string> new_list_lines = new List<string>(new_lines);
int Duplicate_Count = 0;
RemoveDuplicates(ref new_list_lines, ref Duplicate_Count);
File.WriteAllLines(filePath, new_list_lines.ToArray());
}
private void RemoveDuplicates(ref List<string> list_lines, ref int Duplicate_Count)
{
char[] splitter = { '|' };
list_lines.ForEach(delegate(string line)
{
// ??
});
}
EDIT:
Some duplicate email addresses in that list have different parts. What can I do about them? For example:
goodzila@yahoo.com|32432|test23
and
goodzila@yahoo.com|asdsa|324234
Thanks in advance.
Say you have a list of possible duplicates:
List<string> emailList ....
Then the unique list is the set of that list:
HashSet<string> unique = new HashSet<string>( emailList );
private void RemoveDuplicates(ref List<string> list_lines, ref int Duplicate_Count)
{
Duplicate_Count = 0;
List<string> list_lines2 = new List<string>();
HashSet<string> hash = new HashSet<string>();
foreach (string line in list_lines)
{
string[] split = line.Split('|');
string firstPart = split.Length > 0 ? split[0] : string.Empty;
if (hash.Add(firstPart))
{
list_lines2.Add(line);
}
else
{
Duplicate_Count++;
}
}
list_lines = list_lines2;
}
The easiest thing to do is to iterate through the lines in the file and add them to a HashSet. A HashSet won't insert duplicate entries, and it won't throw an exception for them either, so at the end you'll have a unique set of items.
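Since the question worries about building a second list, here is a sketch that removes the duplicates in place with List<T>.RemoveAll and a HashSet of the emails already seen. It keeps the first line for each email and preserves the original order. EmailDedupe is a made-up name, and comparing emails case-insensitively is my assumption, not something the question states.

```csharp
using System;
using System.Collections.Generic;

public static class EmailDedupe
{
    // Removes every line whose email (the text before the first '|')
    // was already seen earlier in the list. Returns the number removed.
    public static int RemoveDuplicateEmails(List<string> lines)
    {
        // Assumption: "A@x.com" and "a@x.com" count as the same address.
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

        return lines.RemoveAll(line =>
        {
            int pipe = line.IndexOf('|');
            string email = pipe < 0 ? line : line.Substring(0, pipe);
            // Add returns false when the email is already in the set,
            // which marks this line as a duplicate to remove.
            return !seen.Add(email);
        });
    }
}
```

The return value can be assigned straight to Duplicate_Count. The set only stores the email keys, not the full lines, so it is smaller than a second copy of the list.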
1 - Get rid of your pipe-separated strings (create a DTO class corresponding to the data they represent).
2 - Which rule do you want to apply to choose between two objects with the same id?
Or maybe this code can be useful for you :)
It uses the same method as the one in @xanatos's answer:
string[] lines = File.ReadAllLines(filePath);
// The dictionary must be initialized before use; the first line seen for each key wins.
Dictionary<string, string> items = new Dictionary<string, string>();
foreach (var line in lines)
{
    var key = line.Split('|').ElementAt(0);
    if (!items.ContainsKey(key))
        items.Add(key, line);
}
List<string> list_lines = items.Values.ToList();
First, I suggest you load the file via a stream.
Then, create a type that represents your rows and load them into a HashSet (for performance reasons).
Look (I've removed some of your code to keep it simple):
public struct LineType
{
    public string Email { get; set; }
    public string Others { get; set; }
    public override bool Equals(object obj)
    {
        return obj is LineType && this.Email.Equals(((LineType)obj).Email);
    }
    // HashSet relies on GetHashCode, so it has to agree with Equals.
    public override int GetHashCode()
    {
        return this.Email.GetHashCode();
    }
}
private static void WorkOnFile(string filePath)
{
    HashSet<LineType> hashSet = new HashSet<LineType>();
    // using disposes the reader even if an exception is thrown.
    using (StreamReader stream = File.OpenText(filePath))
    {
        string line;
        while ((line = stream.ReadLine()) != null)
        {
            string new_line = string.Join("|", line.Split(new string[] { " " }, StringSplitOptions.RemoveEmptyEntries));
            LineType lineType = new LineType()
            {
                // The email is the first pipe-separated field.
                Email = new_line.Split('|')[0],
                Others = new_line
            };
            // Add returns false and leaves the set unchanged for a duplicate.
            hashSet.Add(lineType);
        }
    }
}
