How to remove duplicates from List<string> without LINQ? [duplicate] - c#

I have a List like the one below (a very large email list):
Source list:
item 0 : jumper@yahoo.com|32432
item 1 : goodzila@yahoo.com|32432|test23
item 2 : alibaba@yahoo.com|32432|test65
item 3 : blabla@yahoo.com|32432|test32
The important part of each item is the email address. The other parts (separated by pipes) are not important, but I want to keep them in the final list.
As I said, my list is too big, and I think it's not recommended to use another list.
How can I remove duplicate emails (the entire item) from that list without using LINQ?
My code is below:
private void WorkOnFile(UploadedFile file, string filePath)
{
File.SetAttributes(filePath, FileAttributes.Archive);
FileSecurity fSecurity = File.GetAccessControl(filePath);
fSecurity.AddAccessRule(new FileSystemAccessRule(@"Everyone",
FileSystemRights.FullControl,
AccessControlType.Allow));
File.SetAccessControl(filePath, fSecurity);
string[] lines = File.ReadAllLines(filePath);
List<string> list_lines = new List<string>(lines);
var new_lines = list_lines.Select(line => string.Join("|", line.Split(new string[] { " " }, StringSplitOptions.RemoveEmptyEntries)));
List<string> new_list_lines = new List<string>(new_lines);
int Duplicate_Count = 0;
RemoveDuplicates(ref new_list_lines, ref Duplicate_Count);
File.WriteAllLines(filePath, new_list_lines.ToArray());
}
private void RemoveDuplicates(ref List<string> list_lines, ref int Duplicate_Count)
{
char[] splitter = { '|' };
list_lines.ForEach(delegate(string line)
{
// ??
});
}
EDIT:
Some duplicate email addresses in that list have different trailing parts. What can I do about them? For example:
goodzila@yahoo.com|32432|test23
and
goodzila@yahoo.com|asdsa|324234
Thanks in advance.

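Say you have a list of possible duplicates:
List<string> emailList ....
Then the unique list is the set built from that list:
HashSet<string> unique = new HashSet<string>(emailList);
Note that this compares whole lines, so two items with the same email but different trailing parts both survive.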

private void RemoveDuplicates(ref List<string> list_lines, ref int Duplicate_Count)
{
Duplicate_Count = 0;
List<string> list_lines2 = new List<string>();
HashSet<string> hash = new HashSet<string>();
foreach (string line in list_lines)
{
string[] split = line.Split('|');
string firstPart = split.Length > 0 ? split[0] : string.Empty;
if (hash.Add(firstPart))
{
list_lines2.Add(line);
}
else
{
Duplicate_Count++;
}
}
list_lines = list_lines2;
}

The easiest thing to do is to iterate through the lines in the file and add them to a HashSet. A HashSet does not insert duplicate entries, and Add does not throw an exception for them either (it simply returns false). At the end you'll have a unique set of items.
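Since the question asks to avoid a second list, here is a minimal sketch of the HashSet idea applied in place with List<T>.RemoveAll. The names EmailDeduper, EmailOf and RemoveDuplicatesInPlace are made up for illustration. It assumes the first occurrence of an email should win, that emails compare case-insensitively, and that RemoveAll evaluates the predicate once per element in list order (which is how current .NET implementations behave).

```csharp
using System;
using System.Collections.Generic;

public static class EmailDeduper
{
    // Returns the text before the first '|', i.e. the email part of a line.
    private static string EmailOf(string line)
    {
        int pipe = line.IndexOf('|');
        return pipe < 0 ? line : line.Substring(0, pipe);
    }

    // Removes every line whose email was already seen earlier in the list.
    // Returns the number of removed lines (the duplicate count).
    public static int RemoveDuplicatesInPlace(List<string> lines)
    {
        // Assumption: emails are compared case-insensitively.
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

        // HashSet.Add returns false when the key is already present,
        // so the predicate is true exactly for the duplicates.
        return lines.RemoveAll(line => !seen.Add(EmailOf(line)));
    }
}
```

Calling EmailDeduper.RemoveDuplicatesInPlace(new_list_lines) would then replace the ref-based RemoveDuplicates call, with the return value serving as Duplicate_Count. The only extra memory is the set of distinct emails.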

1 - Get rid of your pipe-separated string (create a DTO class corresponding to the data it represents).
2 - Which rule do you want to apply to choose between two objects with the same id?

Or maybe this code can be useful for you :)
It uses the same method as the one in @xanatos's answer:
string[] lines = File.ReadAllLines(filePath);
// Maps each email (the text before the first '|') to the first line that contains it.
Dictionary<string, string> items = new Dictionary<string, string>();
foreach (var line in lines)
{
    var key = line.Split('|')[0];
    if (!items.ContainsKey(key))
        items.Add(key, line);
}
// Copy the surviving lines without LINQ.
List<string> list_lines = new List<string>(items.Values);

First, I suggest loading the file through a stream.
Then create a type that represents your rows and load them into a HashSet (for performance reasons).
Look (I've removed some of your code to keep it simple):
public struct LineType
{
    public string Email { get; set; }
    public string Others { get; set; }
    // Two rows are equal when their emails match.
    public override bool Equals(object obj)
    {
        return obj is LineType && string.Equals(Email, ((LineType)obj).Email);
    }
    // Must agree with Equals, otherwise the HashSet cannot detect duplicates.
    public override int GetHashCode()
    {
        return Email == null ? 0 : Email.GetHashCode();
    }
}
private static void WorkOnFile(string filePath)
{
    HashSet<LineType> hashSet = new HashSet<LineType>();
    using (StreamReader stream = File.OpenText(filePath))
    {
        string line;
        while ((line = stream.ReadLine()) != null)
        {
            string new_line = string.Join("|", line.Split(new string[] { " " }, StringSplitOptions.RemoveEmptyEntries));
            LineType lineType = new LineType()
            {
                // The email is the first pipe-separated field.
                Email = new_line.Split('|')[0],
                Others = new_line
            };
            // Add ignores the row if an equal one is already present.
            hashSet.Add(lineType);
        }
    }
}

Related

If statement with multiple variables ending with a number [duplicate]

Variables:
private string filePath1 = null;
private string filePath2 = null;
private string filePath3 = null;
private string filePath4 = null;
private string filePath5 = null;
private string filePath6 = null;
private string filePath7 = null;
private string filePath8 = null;
private string filePath9 = null;
private string filePath10 = null;
Current If statement
if (string.IsNullOrEmpty(filePath1))
{
errors.Add("File Not Attached");
}
if (string.IsNullOrEmpty(filePath2))
{
errors.Add("File Not Attached");
}
....
Question:
Instead of having a separate if statement for each variable, how can I create one if statement that goes through all these variables?
Something like this:
if (string.IsNullOrEmpty(filePath + range(1 to 10))
{
errors.Add("File Not Attached");
}
You can achieve this using reflection. This is discouraged for this scenario, as the other answers provide better solutions; I just wanted to show that it's doable the way you intended (which doesn't mean it's the correct way). The usage code needs using System.Linq and using System.Reflection.
public class Test
{
private string filePath1 = null;
private string filePath2 = null;
private string filePath3 = null;
}
Usage:
Test obj = new Test();
//loop through the private fields of our class
foreach (var fld in obj.GetType().GetFields(BindingFlags.NonPublic | BindingFlags.Instance)
.Where(x => x.Name.StartsWith("filePath"))) // filter
{
if (string.IsNullOrEmpty(fld.GetValue(obj) as string))
{
errors.Add("File Not Attached in variable: " + fld.Name);
}
}
In nearly all cases where you're using variables with a differently numbered suffix, you should really be using a collection (array, list, ...). This is one of those cases. I'll be using a list for this answer but any collection will suffice.
private List<string> filePaths = new List<string>()
{
"path1",
"path2",
"path3",
"path4"
};
You can then use a loop to iterate over your list:
foreach (string path in filePaths)
{
if(String.IsNullOrEmpty(path))
errors.Add("File not attached");
}
Create a new ArrayList, add all file paths to it (or initialise it with all file paths), and then loop over its elements with a foreach loop. For each element, check whether it is null or empty and, if so, add an entry to your errors list.
// Requires: using System.Collections;
ArrayList arrList = new ArrayList();
arrList.Add(filePath1);
arrList.Add(filePath2);
arrList.Add(filePath3);
arrList.Add(filePath4);
arrList.Add(filePath5);
arrList.Add(filePath6);
arrList.Add(filePath7);
arrList.Add(filePath8);
arrList.Add(filePath9);
arrList.Add(filePath10);
foreach (string element in arrList)
{
    if (string.IsNullOrEmpty(element))
    {
        errors.Add("File Not Attached");
    }
}
ps. You might want to print a new line after each error:
errors.Add("File Not Attached\n");
// Create the list
List<string> filePaths = new List<string>();
// Add each path to the list, for example:
filePaths.Add(filePath1);
// Check for null or empty paths here
foreach (string filepath in filePaths)
{
    if (string.IsNullOrEmpty(filepath))
    {
        errors.Add("File Not Attached");
    }
}
In order to treat all strings the same way they have to be in some collection.
using System.Linq;
...
string[] allPaths = new string[10];
// Do something with these ten paths...
if (allPaths.Any(x => string.IsNullOrEmpty(x)))
errors.Add("File Not Attached");
As stated in every other answer, you should use a collection.
If you really want to stick with field names, you can use reflection, but I strongly recommend collections over reflection:
// using System.Reflection;
// The code below is meant to be used in a method of the class that holds the fields.
for (int i = 1; i <= 10; i++)
{
    // Look up the private instance field named filePath1 ... filePath10.
    FieldInfo field = this.GetType().GetField($"filePath{i}",
        BindingFlags.NonPublic | BindingFlags.Instance);
    if (string.IsNullOrEmpty(field?.GetValue(this) as string))
    {
        errors.Add("File Not Attached");
    }
}
If you can make those variables class fields, I would vote for Innat3's answer.
But if that is not possible, I suggest doing something like the following:
class Program
{
static void Main(string[] args)
{
Dictionary<string, int> names = new Dictionary<string,int>();
for (int i = 0; i < 10; i++)
{
names.Add(String.Format("name{0}", i.ToString()), i);
}
var xx1 = names["name1"];
var xx2 = names["name2"];
var xx3 = names["name3"];
}
}
This is because in C# we can't compute variable names dynamically.
Hope this helps.

C# How to implement CSV file into this code

Hi, I am fairly new to coding. I have a piece of code that searches for a string and replaces it with another string, like so:
var replacements = new[]{
new{Find="123",Replace="Word one"},
new{Find="ABC",Replace="Word two"},
new{Find="999",Replace="Word two"},
};
var myLongString = "123 is a long 999 string yeah";
foreach(var set in replacements)
{
myLongString = myLongString.Replace(set.Find, set.Replace);
}
If I want to use a CSV file that contains a lot of words and their replacements (for example, LOL,Laugh Out Loud and ROFL,Roll Around Floor Laughing), how would I implement that?
Create a text file that looks like this (you could use commas, but I like pipes (|)):
123|Word One
ABC|Word Two
999|Word Three
LOL|Laugh Out Loud
ROFL|Roll Around Floor Laughing
Then create a tiny helper class:
public class WordReplace
{
public string Find { get; set; }
public string Replace { get; set; }
}
And finally, call this code:
private static string DoWordReplace()
{
//first read in the data
var fileData = File.ReadAllLines("WordReplace.txt");
var wordReplacePairs = new List<WordReplace>();
var lineNo = 1;
foreach (var item in fileData)
{
var pair = item.Split(new[] {'|'}, StringSplitOptions.RemoveEmptyEntries);
if (pair.Length != 2)
{
throw new ApplicationException($"Malformed file, line {lineNo}, data = [{item}] ");
}
wordReplacePairs.Add(new WordReplace{Find = pair[0], Replace = pair[1]});
++lineNo;
}
var longString = "LOL, 123 is a long 999 string yeah, ROFL";
//now do the replacements
var buffer = new StringBuilder(longString);
foreach (var pair in wordReplacePairs)
{
buffer.Replace(pair.Find, pair.Replace);
}
return buffer.ToString();
}
The result is:
Laugh Out Loud, Word One is a long Word Three string yeah, Roll Around Floor Laughing
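If you would rather keep commas, as in the question, a rough sketch could look like the following. CsvReplacer, ParsePairs and Apply are hypothetical names, and the sketch assumes a simple file with one find,replacement pair per line and no quoted fields. It splits on the first comma only, so the replacement text may itself contain commas.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CsvReplacer
{
    // Parses "find,replacement" lines. Only the first comma separates the
    // two fields, so the replacement may contain further commas.
    public static List<KeyValuePair<string, string>> ParsePairs(IEnumerable<string> csvLines)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        foreach (string line in csvLines)
        {
            if (string.IsNullOrWhiteSpace(line)) continue;   // skip blank lines

            string[] parts = line.Split(new[] { ',' }, 2);
            if (parts.Length != 2 || parts[0].Trim().Length == 0)
                throw new FormatException("Malformed line: [" + line + "]");

            pairs.Add(new KeyValuePair<string, string>(parts[0].Trim(), parts[1].Trim()));
        }
        return pairs;
    }

    // Applies every replacement, in file order, to the given text.
    public static string Apply(string text, IEnumerable<KeyValuePair<string, string>> pairs)
    {
        var buffer = new StringBuilder(text);
        foreach (var pair in pairs)
            buffer.Replace(pair.Key, pair.Value);
        return buffer.ToString();
    }
}
```

Usage would be something like CsvReplacer.Apply(myLongString, CsvReplacer.ParsePairs(File.ReadLines("WordReplace.csv"))), where the file name is only an example. For CSV files with quoted fields, a real CSV parser is the safer choice.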

C# compare fields from different lines in csv

I am trying to compare the value at index 0 of an array on one line with the value at index 0 on the following line. Imagine a CSV where I have a unique identifier in the first column and a corresponding value in the second column.
USER1, 1P
USER1, 3G
USER2, 1P
USER3, 1V
I would like to check the value of [0] on the next line (or the previous one if that's easier) and, if they are the same (as they are in the example), concatenate the second value onto index 1. That is, the data should read as
USER1, 1P, 3G
USER2, 1P
USER3, 1V
before it gets passed onto the next function. So far I have
private void csvParse(string path)
{
using (TextFieldParser parser = new TextFieldParser(path))
{
parser.Delimiters = new string[] { "," };
while (!parser.EndOfData)
{
string[] parts = parser.ReadFields();
if (parts == null)
{
break;
}
contact.ContactId = parts[0];
long nextLine;
nextLine = parser.LineNumber+1;
//if line1 parts[0] == line2 parts[0] etc.
}
}
}
Does anyone have any suggestions? Thank you.
How about saving the array into a variable:
private void csvParse(string path)
{
using (TextFieldParser parser = new TextFieldParser(path))
{
parser.Delimiters = new string[] { "," };
string[] oldParts = new string[] { string.Empty };
while (!parser.EndOfData)
{
string[] parts = parser.ReadFields();
if (parts == null || parts.Length < 1)
{
break;
}
if (oldParts[0] == parts[0])
{
// concat logic goes here
}
else
{
contact.ContactId = parts[0];
}
long nextLine;
nextLine = parser.LineNumber+1;
oldParts = parts;
//if line1 parts[0] == line2 parts[0] etc.
}
}
}
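To fill in the "concat logic" of the remember-the-previous-row idea, here is one possible sketch. ConsecutiveMerger and Merge are invented names, and it assumes that rows sharing an identifier are adjacent in the file (as in the sample data). It takes the string[] rows that ReadFields would return and emits one merged line per identifier.

```csharp
using System;
using System.Collections.Generic;

public static class ConsecutiveMerger
{
    // Merges adjacent rows that share the same first field.
    // Assumption: rows with the same key are next to each other in the input.
    public static List<string> Merge(IEnumerable<string[]> rows)
    {
        var result = new List<string>();
        string currentKey = null;
        var values = new List<string>();

        foreach (string[] parts in rows)
        {
            if (parts == null || parts.Length < 2) continue;  // skip incomplete rows

            string key = parts[0].Trim();
            if (currentKey != null && key != currentKey)
            {
                // The key changed: emit the finished group and start a new one.
                result.Add(currentKey + ", " + string.Join(", ", values));
                values.Clear();
            }
            currentKey = key;
            values.Add(parts[1].Trim());
        }

        // Emit the last group, if there was any input at all.
        if (currentKey != null)
            result.Add(currentKey + ", " + string.Join(", ", values));

        return result;
    }
}
```

Only one group is held in memory at a time, so this also works for files that are too large to load completely. If equal identifiers can be scattered through the file, a dictionary keyed by identifier is needed instead.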
If I understand you correctly, what you are asking is essentially "how do I group the values in the second column based on the values in the first column?".
A quick and quite succinct way of doing this would be to Group By using LINQ:
var linesGroupedByUser =
    from line in File.ReadAllLines(path)
    let elements = line.Split(',')
    // Trim the space that follows each comma in the sample data.
    let user = new {Name = elements[0].Trim(), Value = elements[1].Trim()}
    group user by user.Name into users
    select users;
foreach (var user in linesGroupedByUser)
{
    string valuesAsString = String.Join(", ", user.Select(x => x.Value));
    Console.WriteLine(user.Key + ", " + valuesAsString);
}
I have left out the use of your TextFieldParser class, but you can easily use that instead. This approach does, however, require that you can afford to load all of the data into memory. You don't mention whether this is viable.
The easiest way to do something like this is to convert each line to an object. You can use CsvHelper, https://www.nuget.org/packages/CsvHelper/, to do the work for you or you can iterate each line and parse to an object. It is a great tool and it knows how to properly parse CSV files into a collection of objects. Then, whether you create the collection yourself or use CsvHelper, you can use Linq to GroupBy, https://msdn.microsoft.com/en-us/library/bb534304(v=vs.100).aspx, your "key" (in this case UserId) and Aggregate, https://msdn.microsoft.com/en-us/library/bb549218(v=vs.110).aspx, the other property into a string. Then, you can use the new, grouped by, collection for your end goal (write it to file or use it for whatever you need).
You're basically finding all the unique entries, so put them into a dictionary with the contact id as the key, as follows:
private void csvParse(string path)
{
using (TextFieldParser parser = new TextFieldParser(path))
{
parser.Delimiters = new string[] { "," };
Dictionary<string, List<string>> uniqueContacts = new Dictionary<string, List<string>>();
while (!parser.EndOfData)
{
string[] parts = parser.ReadFields();
if (parts == null || parts.Length != 2)
{
break;
}
//if contact id not present in dictionary add
if (!uniqueContacts.ContainsKey(parts[0]))
uniqueContacts.Add(parts[0],new List<string>());
//now there's definitely an existing contact in dic (the one
//we've just added or a previously added one) so add to the
//list of strings for that contact
uniqueContacts[parts[0]].Add(parts[1]);
}
//now do something with that dictionary of unique user names and
// lists of strings, for example dump them to console in the
//format you specify:
foreach (var contactId in uniqueContacts.Keys)
{
// Write the contact id followed by its comma-separated values.
Console.WriteLine(contactId + ", " + string.Join(", ", uniqueContacts[contactId]));
}
}
}

multiple foreach loops inside while loop

Is it possible to include multiple "foreach" statements inside one of the looping constructs like while or for? I want to open the .wav files from two different directories simultaneously so that I can compare files from both.
Here is what I am trying to do, but it is certainly wrong. Any help in this regard is appreciated.
string[] fileEntries1 = Directory.GetFiles(folder1, "*.wav");
string[] fileEntries2 = Directory.GetFiles(folder11, "*.wav");
while ( foreach(string fileName1 in fileEntries1) && foreach(string fileName2 in fileEntries2))
Grammatically speaking, no. A foreach construct is a statement, whereas the test in a while statement must be an expression.
Your best bet is to nest the foreach blocks:
foreach(string fileName1 in fileEntries1)
{
    foreach(string fileName2 in fileEntries2)
    {
        // compare fileName1 and fileName2 here
    }
}
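I like to write this kind of statement in one line. So even though most of the answers here are correct, I'll offer this:
// Requires: using System.IO; using System.Linq;
string[] fileEntries1 = Directory.GetFiles(folder1, "*.wav");
string[] fileEntries2 = Directory.GetFiles(folder11, "*.wav");
// The two folders differ, so compare file names rather than full paths.
var fileNames2 = fileEntries2.Select(x => Path.GetFileName(x)).ToList();
foreach (var fileExistsInBoth in fileEntries1.Where(fe1 => fileNames2.Contains(Path.GetFileName(fe1))))
{
    // here you will have the entries whose file name exists in both lists
}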
Something like this, since you only need to match file names:
// Materialize the lists so that each directory is scanned only once.
List<string> fileEntries1 = Directory.GetFiles(folder1, "*.wav").Select(x => Path.GetFileName(x)).ToList();
List<string> fileEntries2 = Directory.GetFiles(folder2, "*.wav").Select(x => Path.GetFileName(x)).ToList();
// Iterate the bigger list and validate against the other one (also correct when the counts are equal).
bool firstIsBigger = fileEntries1.Count > fileEntries2.Count;
List<string> filesToIterate = firstIsBigger ? fileEntries1 : fileEntries2;
List<string> filesToValidate = firstIsBigger ? fileEntries2 : fileEntries1;
// Iterate the bigger collection
foreach (string fileName in filesToIterate)
{
// Find the files in smaller collection
if (filesToValidate.Contains(fileName))
{
// Get actual file and compare
}
else
{
// File does not exist in another list. Handle appropriately
}
}
.NET 2.0 based solution:
List<string> fileEntries1 = new List<string>(Directory.GetFiles(folder1, "*.wav"));
List<string> fileEntries2 = new List<string>(Directory.GetFiles(folder2, "*.wav"));
// Iterate the bigger list and validate against the other one (also correct when the counts are equal).
bool firstIsBigger = fileEntries1.Count > fileEntries2.Count;
List<string> filesToIterate = firstIsBigger ? fileEntries1 : fileEntries2;
filesToValidate = firstIsBigger ? fileEntries2 : fileEntries1;
string iteratorFileName;
string validatorFilePath;
// Iterate the bigger collection
foreach (string fileName in filesToIterate)
{
iteratorFileName = Path.GetFileName(fileName);
// Find the files in smaller collection
if ((validatorFilePath = FindFile(iteratorFileName)) != null)
{
// Compare fileName and validatorFilePath files here
}
else
{
// File does not exist in another list. Handle appropriately
}
}
FindFile method:
static List<string> filesToValidate;
private static string FindFile(string fileToFind)
{
string returnValue = null;
foreach (string filePath in filesToValidate)
{
if (string.Compare(Path.GetFileName(filePath), fileToFind, true) == 0)
{
// Found the file
returnValue = filePath;
break;
}
}
if (returnValue != null)
{
// File was found in smaller list. Remove this file from the list since we do not need to look for it again
filesToValidate.Remove(returnValue);
}
return returnValue;
}
You may or may not choose to make fields and methods static based on your needs.
If you want to iterate all pairs of files in both paths respectively, you can do it as follows.
string[] fileEntries1 = Directory.GetFiles(folder1, "*.wav");
string[] fileEntries2 = Directory.GetFiles(folder11, "*.wav");
foreach(string fileName1 in fileEntries1)
{
foreach(string fileName2 in fileEntries2)
{
// do the actual comparison
}
}
This is what I would suggest, using LINQ:
using System.Linq;
var fileEntries1 = Directory.GetFiles(folder1, "*.wav");
var fileEntries2 = Directory.GetFiles(folder11, "*.wav");
foreach (var entry1 in fileEntries1)
{
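// Compare file names: full paths from two different folders never match.
var entries = fileEntries2.Where(x => Path.GetFileName(x) == Path.GetFileName(entry1));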
if (entries.Any())
{
//We have matches
//entries is a list of matches in fileentries2 for entry1
}
}
If you want to enumerate both collections "in parallel", then use their enumerators like this:
var fileEntriesIterator1 = Directory.EnumerateFiles(folder1, "*.wav").GetEnumerator();
var fileEntriesIterator2 = Directory.EnumerateFiles(folder11, "*.wav").GetEnumerator();
while(fileEntriesIterator1.MoveNext() && fileEntriesIterator2.MoveNext())
{
var file1 = fileEntriesIterator1.Current;
var file2 = fileEntriesIterator2.Current;
}
If one collection is shorter than the other, this loop will end when the shorter collection has no more elements.
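If the goal is to compare files that have the same name in both folders, pairing them through a dictionary avoids both the nested loops and any dependence on directory ordering. The sketch below is only one way to do it: WavPairing and PairByFileName are invented names, and it assumes file names should match case-insensitively.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class WavPairing
{
    // Pairs paths from two lists whose file names match (ignoring case).
    // Each result holds the path from the first list as Key and the
    // matching path from the second list as Value.
    public static List<KeyValuePair<string, string>> PairByFileName(
        IEnumerable<string> paths1, IEnumerable<string> paths2)
    {
        // Index the second list by file name for constant-time lookups.
        var byName = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (string path in paths2)
            byName[Path.GetFileName(path)] = path;

        var pairs = new List<KeyValuePair<string, string>>();
        foreach (string path in paths1)
        {
            string match;
            if (byName.TryGetValue(Path.GetFileName(path), out match))
                pairs.Add(new KeyValuePair<string, string>(path, match));
        }
        return pairs;
    }
}
```

It could be called as WavPairing.PairByFileName(Directory.GetFiles(folder1, "*.wav"), Directory.GetFiles(folder11, "*.wav")), after which each pair can be opened and compared. Files that exist in only one folder are simply left out of the result.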

Removing duplicate collection strings in memory

I am working on a hypothetical question: if there are duplicate string collections in memory, how would I go about removing the duplicates while maintaining the original order of the collections?
Try something like this:
List<String> stringlistone = new List<string>() { "Hello", "Hi" };
List<String> stringlisttwo = new List<string>() { "Hi", "Bye" };
IEnumerable<String> distinctList = stringlistone.Concat(stringlisttwo).Distinct(StringComparer.OrdinalIgnoreCase);
List<List<String>> listofstringlist = new List<List<String>>() { stringlistone, stringlisttwo };
IEnumerable<String> distinctlistofstringlist = listofstringlist.SelectMany(x => x).Distinct(StringComparer.OrdinalIgnoreCase);
It depends on how you join the lists, but it should give you an idea. I added StringComparer.OrdinalIgnoreCase in case you want the distinct list to treat "hi" and "Hi" as the same.
You can also just call Distinct on a single list, so if you did
List<String> stringlistone = new List<string>() { "Hi", "Hello", "Hi" };
stringlistone = stringlistone.Distinct(StringComparer.OrdinalIgnoreCase).ToList();
stringlistone would be a list with stringlistone[0] == "Hi" and stringlistone[1] == "Hello"
Don't worry about it for string literals. The runtime interns literals, so all references to the same literal point to the same location in memory. Strings created at run time (for example, read from a file) are not interned automatically.
Say you have a List<List<string>> that you read from a file or database (so they're not already interned) and you want no duplicate strings, you can use this code:
public void FoldStrings(List<List<string>> stringCollections)
{
var interned = new Dictionary<string,string> ();
foreach (var stringCollection in stringCollections)
{
for (int i = 0; i < stringCollection.Count; i++)
{
string str = stringCollection[i];
string s;
if (interned.TryGetValue (str, out s))
{
// We already have an instance of this string.
stringCollection[i] = s;
}
else
{
// First time we've seen this string... add to hashtable.
interned[str]=str;
}
}
}
}
