How to remove rows from IEnumerable - c#

I'm loading CSV files into an IEnumerable.
string[] fileNames = Directory.GetFiles(@"read\", "*.csv");
for (int i = 0; i < fileNames.Length; i++)
{
    string file = @"read\" + Path.GetFileName(fileNames[i]);
    var lines = from rawLine in File.ReadLines(file, Encoding.Default)
                where !string.IsNullOrEmpty(rawLine)
                select rawLine;
}
After that I work with the data, but now there are a couple of files that are pretty much empty and contain only ";;;;;;" (the number of semicolons varies).
How can I delete those rows before working with them and without changing anything in the csv files?

If the amount of ; characters per line is variable, this is what your "where" condition should look like:
where !string.IsNullOrEmpty(rawLine) && !string.IsNullOrEmpty(rawLine.Trim(';'))
rawLine.Trim(';') returns a copy of the string with all leading and trailing ; characters removed. If this new string is empty, the line can be ignored, since it contained only ; characters.
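To see how that condition treats the kinds of lines described in the question, here is a minimal sketch. The sample lines and the helper name IsDataLine are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample input: data lines, an empty line, and separator-only lines.
var rawLines = new List<string> { "a;b;c", "", ";;;;;;", ";;", "x;;z" };

// A line counts as data if something is left after trimming ';' from both ends.
static bool IsDataLine(string line) =>
    !string.IsNullOrEmpty(line) && !string.IsNullOrEmpty(line.Trim(';'));

var dataLines = rawLines.Where(l => IsDataLine(l)).ToList();

Console.WriteLine(string.Join(" | ", dataLines)); // a;b;c | x;;z
```

Note that "x;;z" survives: Trim only touches the ends of the string, so semicolons between fields are left alone.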

You can't remove anything from an IEnumerable (as you can from a List<T>), but you can add a filter:
lines = lines.Where(l => !l.Trim().All(c => c == ';'));
This won't delete anything, but you won't process these lines anymore.

You can't remove rows from an enumerable - https://msdn.microsoft.com/en-us/library/system.collections.ienumerable.aspx.
Instead, try creating a new array with the filtered data, or filter in the where clause you already have, like this:
string[] fileNames = Directory.GetFiles(@"read\", "*.csv");
for (int i = 0; i < fileNames.Length; i++)
{
    string file = @"read\" + Path.GetFileName(fileNames[i]);
    var lines = from rawLine in File.ReadLines(file, Encoding.Default)
                where !string.IsNullOrEmpty(rawLine) && rawLine != ";;;;;;" // matches exactly six semicolons
                select rawLine;
}

There are multiple solutions.
Convert the enumerable to a List, then delete from the List. This is a bit expensive.
Create a function that filters. // You can apply multiple filters if required.
public IEnumerable<string> GetData(IEnumerable<string> data)
{
    return data.Where(c => !String.Equals(c, "<<data that you want to filter>>"));
}
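The two options can be compared side by side in a small sketch. The sample data and variable names here are assumptions, not part of the original question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

IEnumerable<string> source = new[] { "a;b", ";;;;;;", "c;d" };

// Option 1: copy into a List<string>, then remove in place (costs one full copy).
List<string> copy = source.ToList();
int removed = copy.RemoveAll(line => line.Trim(';').Length == 0);

// Option 2: keep the sequence lazy and chain as many filters as needed.
IEnumerable<string> filtered = source
    .Where(line => !string.IsNullOrEmpty(line))
    .Where(line => line.Trim(';').Length > 0);

Console.WriteLine(removed);                     // 1
Console.WriteLine(string.Join(",", copy));      // a;b,c;d
Console.WriteLine(string.Join(",", filtered));  // a;b,c;d
```

The list copy is only worth it if you need to mutate or index the result afterwards; for a single pass over the lines, the lazy filter avoids the extra allocation.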

Another option for reading a CSV file is the TextFieldParser class. It has CommentTokens and Delimiters properties which may help you with this.
Specifying ; as one of the CommentTokens makes the parser skip lines that start with ;.
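A rough sketch of the TextFieldParser idea follows, assuming the Microsoft.VisualBasic assembly is available to the project. The CSV content is invented and read from a string for brevity; a real program would pass a file path to the constructor. One caveat with this approach: a genuine data line whose first field is empty also starts with ; and would be skipped as a comment.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Hypothetical CSV content; the middle line holds only separators.
string csv = "a;b;c\n;;;;;;\nd;e;f\n";

var rows = new List<string[]>();
using (var parser = new TextFieldParser(new StringReader(csv)))
{
    parser.TextFieldType = FieldType.Delimited;
    parser.SetDelimiters(";");
    // Lines that START with a comment token are skipped entirely.
    parser.CommentTokens = new[] { ";" };

    while (!parser.EndOfData)
    {
        rows.Add(parser.ReadFields());
    }
}

foreach (var row in rows)
    Console.WriteLine(string.Join(",", row));
```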

Related

Searching String List for FileName Match and Replace with new FilePath

I have a List<string> called Filelist that contains the full paths to files. I have another List<string> called optimizelist that also contains file paths. I'm checking whether Filelist contains the same file as optimizelist; if so, I want to replace the corresponding element in Filelist with the element from optimizelist.
int x = 0;
foreach (string file in optimizelist)
{
for (int i = 0; i < Filelist.Count; i++)
{
if (Path.GetFileName(file) == Path.GetFileName(Filelist[i]))
{
Filelist.RemoveAt(x);
Filelist.Add(file);
break;
}
}
x++;
}
But the replacement does not work properly: there are duplicated and missing entries.
What am I doing wrong? Please advise.
You are changing a list that you're iterating, which may cause unexpected behavior. You can avoid surprises by iterating it backwards (because changes are done at the current index and the end of the list, so never at lower indexes).
for (int i = Filelist.Count - 1; i >= 0; i--)
Edit: you probably also want to remove the "break" statement. It prevents your code from iterating the full list.
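A tiny sketch shows why RemoveAt followed by Add disturbs positions while an indexed assignment does not. The list contents are placeholders.

```csharp
using System;
using System.Collections.Generic;

var shifted = new List<string> { "a", "b", "c" };
shifted.RemoveAt(0);   // "b" and "c" each move down one index
shifted.Add("x");      // the new item lands at the end
Console.WriteLine(string.Join(",", shifted));  // b,c,x

var inPlace = new List<string> { "a", "b", "c" };
inPlace[0] = "x";      // overwrite by index: nothing else moves
Console.WriteLine(string.Join(",", inPlace));  // x,b,c
```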
You are removing (in the middle) and adding (to the end) while you are enumerating the list. That's not a good idea, and it causes this issue. I'd suggest this approach to replace files:
var fileNameLookup = optimizelist.ToLookup(f => Path.GetFileName(f));
for (int i = 0; i < Filelist.Count; i++)
{
string fileName = Path.GetFileName(Filelist[i]);
var optimizedFile = fileNameLookup[fileName].FirstOrDefault();
if(optimizedFile != null)
Filelist[i] = optimizedFile;
}
In addition to Peter M.'s answer: if you don't have duplicates in optimizelist, you can try using a dictionary:
// key - what to find (file name without directory, e.g. "abc.txt")
// value - what to substitute (full path name, e.g. "c:\test\abc.txt")
// StringComparer.OrdinalIgnoreCase - case insensitive keys, i.e. "abc.txt" == "ABC.txt"
Dictionary<string, string> substitutes = optimizelist
.ToDictionary(item => Path.GetFileName(item),
item => item,
StringComparer.OrdinalIgnoreCase);
for (int i = 0; i < Filelist.Count; i++)
    // if we have a substitution (i.e. a better file path)...
    if (substitutes.TryGetValue(Path.GetFileName(Filelist[i]), out var optimalFile))
        Filelist[i] = optimalFile; // <- substitute with optimalFile

Re-order elements in a string array if they match another string?

I am trying to re-order strings in an array, based on if they match a string value.
My program is getting a list of files in a directory, and then renaming and moving the files. But I need certain files with a specific name to be renamed after other files. (the files are being renamed with time stamps and are processed in that order).
Example File names:
File-302.txt
File-302-IAT.txt
File-303.txt
File-303-IAT.txt
File-304.txt
File-304-IAT.txt
File-305.txt
File-305-IAT.txt
Basically, what I am trying to accomplish is to move all the files containing "-IAT" to the end of the array, so that when I loop through, the IAT files are processed after their non-"IAT" partner files.
Here is my code, but there's not much to it:
string[] outFiles = Directory.GetFiles(workingDir);
for (int i = 0; i <= outFiles.Length - 1; i++)
{
//code to rename the file in the array
}
You can use a custom IComparer<string> implementing your sorting rule:
class IatEqualityComparer : IComparer<string>
{
public int Compare(string a, string b)
{
if (a == b)
return 0;
var aWithoutExtension = Path.GetFileNameWithoutExtension(a);
var bWithoutExtension = Path.GetFileNameWithoutExtension(b);
var aIsIat = aWithoutExtension.EndsWith("IAT", StringComparison.InvariantCultureIgnoreCase);
var bIsIat = bWithoutExtension.EndsWith("IAT", StringComparison.InvariantCultureIgnoreCase);
if (aIsIat && !bIsIat)
return 1;
if (!aIsIat && bIsIat)
return -1;
return a.CompareTo(b);
}
}
(In Windows file names are not case sensitive so you have to be very careful when you look for a specific pattern like IAT in a file name. It will almost always work as expected except for that one time in production where the file ended with iat and not IAT...)
You can then sort an array using Array.Sort:
Array.Sort(outFiles, new IatEqualityComparer());
This will sort the array in place. The result is:
File-302.txt
File-303.txt
File-304.txt
File-305.txt
File-302-IAT.txt
File-303-IAT.txt
File-304-IAT.txt
File-305-IAT.txt
The IComparer<string> can also be used when sorting lists in place and with LINQ OrderBy.
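As a sketch of those two other uses, the snippet below builds a stand-in comparer with Comparer<string>.Create instead of a dedicated class. The rule is the same (IAT files last), but the helper IsIat, the ordinal tie-break, and the sample names are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Stand-in rule: names whose stem ends in "IAT" sort after all others.
static bool IsIat(string path) =>
    Path.GetFileNameWithoutExtension(path).EndsWith("IAT", StringComparison.OrdinalIgnoreCase);

IComparer<string> iatLast = Comparer<string>.Create((a, b) =>
{
    bool aIsIat = IsIat(a), bIsIat = IsIat(b);
    if (aIsIat != bIsIat) return aIsIat ? 1 : -1;
    return string.CompareOrdinal(a, b);
});

var names = new List<string> { "File-302-IAT.txt", "File-303.txt", "File-302.txt", "File-303-IAT.txt" };

// LINQ: returns a new ordered sequence, the source list is untouched.
List<string> ordered = names.OrderBy(n => n, iatLast).ToList();

// List<T>.Sort: reorders the list itself.
names.Sort(iatLast);

Console.WriteLine(string.Join(", ", names));
// File-302.txt, File-303.txt, File-302-IAT.txt, File-303-IAT.txt
```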
If you project your items into a new sequence with two different ordering fields, you can then use LINQ to sort the projection accordingly, then extract the file name from the resulting sequence:
outFiles
.Select(fn => new{
order = Path.GetFileNameWithoutExtension(fn).EndsWith("-IAT") ? 1 : 0,
fn
})
.OrderBy(x => x.order)
.ThenBy(x => x.fn)
.Select(x => x.fn)
Actually, what I ended up doing was just a simple bubble sort, as the number of files I am dealing with is very small. I changed from storing the files in an array to a list.
List<string> outFiles = new List<string>(Directory.GetFiles(workingDir));
bool noSort;
do
{
noSort = true;
for (int i = 0; i <= outFiles.Count - 2; i++)
{
if (outFiles[i].EndsWith("IAT.TXT", StringComparison.OrdinalIgnoreCase))
{
if (!outFiles[i + 1].EndsWith("IAT.TXT", StringComparison.OrdinalIgnoreCase))
{
string temp = outFiles[i + 1];
outFiles[i + 1] = outFiles[i];
outFiles[i] = temp;
noSort = false;
}
}
}
}
while (noSort == false);
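For comparison, the same "IAT files last, everything else in its original order" result can be sketched with a single stable OrderBy on a boolean key. This is an alternative, not what the poster used, and the sample names are placeholders.

```csharp
using System;
using System.Linq;

string[] outFiles = { "File-302.txt", "File-302-IAT.txt", "File-303.txt", "File-303-IAT.txt" };

// false sorts before true; OrderBy is stable, so each group keeps its original order.
string[] reordered = outFiles
    .OrderBy(f => f.EndsWith("IAT.TXT", StringComparison.OrdinalIgnoreCase))
    .ToArray();

Console.WriteLine(string.Join(", ", reordered));
// File-302.txt, File-303.txt, File-302-IAT.txt, File-303-IAT.txt
```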

Remove part of a string from List<string>

I have a List<string>, the string part representing filenames that I need to filter out: anything that comes before the character '&' included must be erased.
List<string> zippedTransactions = new List<string>();
zippedTransactions.Add("33396&20151007112154000549659S03333396SUMMARIES.PDF");
zippedTransactions.Add("33395&20151007112400000549659S03333395SUMMARIES.PDF");
zippedTransactions.Add("33397&20151007112555000549659S03333397SUMMARIES.PDF");
// desired output:
// "20151007112154000549659S03333396SUMMARIES.PDF";
// "20151007112400000549659S03333395SUMMARIES.PDF";
// "20151007112555000549659S03333397SUMMARIES.PDF"
NOTE: I don't want to give it the classic iterative-style look. C# provides plenty of functional interfaces for working with this sort of data structure, and I want to start using them.
Here is one LINQ approach with Regex:
zippedTransactions = zippedTransactions.Select(x => Regex.Replace(x, ".*&", string.Empty)).ToList();
That's more fault tolerant than Split('&')[1] in case there is no & in your filename.
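The behavior of the ".*&" pattern on a few edge cases can be checked with a short sketch. The input strings are shortened, made-up file names.

```csharp
using System;
using System.Text.RegularExpressions;

// ".*&" is greedy: it matches everything up to and including the LAST '&'.
string withPrefix = Regex.Replace("33396&2015SUMMARIES.PDF", ".*&", string.Empty);
string twoAmpersands = Regex.Replace("a&b&c.PDF", ".*&", string.Empty);
// No '&' means no match, so the input comes back unchanged.
string noAmpersand = Regex.Replace("SUMMARIES.PDF", ".*&", string.Empty);

Console.WriteLine(withPrefix);     // 2015SUMMARIES.PDF
Console.WriteLine(twoAmpersands);  // c.PDF
Console.WriteLine(noAmpersand);    // SUMMARIES.PDF
```

If only the part before the first & should go, a lazy pattern such as "^.*?&" would be the variation to try.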
Try this
for (int i = 0; i < zippedTransactions.Count; i++)
{
zippedTransactions[i] = zippedTransactions[i].Split('&')[1];
}
If you happen to have Visual Studio, in a version that supports C# Interactive, I suggest you try this.
> var zippedTransactions = new List<string>() {
"33396&20151007112154000549659S03333396SUMMARIES.PDF",
"33395&20151007112400000549659S03333395SUMMARIES.PDF",
"33397&20151007112555000549659S03333397SUMMARIES.PDF"
};
>
> zippedTransactions.Select(dirname => dirname.Split('&')[1])
Enumerable.WhereSelectListIterator<string, string> { "20151007112154000549659S03333396SUMMARIES.PDF", "20151007112400000549659S03333395SUMMARIES.PDF", "20151007112555000549659S03333397SUMMARIES.PDF" }
>
And even if you don't, you can get an idea of what's happening just by looking at the code.
The WhereSelectListIterator is a data structure holding the logic you intend to execute on the data structure. It is evaluated (read: the loop actually happens) only when you consume it (for example, calling .ToList() at the end).
This code will only take the second element coming after splitting the string on '&', so you might wanna generalize it or tune it for your requirements.
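The deferred evaluation described above can be made visible with a counter. This is only a sketch with invented data; the counter calls exists purely to show when the lambda runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var zipped = new List<string> { "1&a.pdf", "2&b.pdf" };
int calls = 0;

// Building the query runs nothing yet.
IEnumerable<string> query = zipped.Select(s => { calls++; return s.Split('&')[1]; });
int callsBefore = calls;
Console.WriteLine(callsBefore);  // 0

// Consuming the query (here with ToList) is what runs the lambda.
List<string> names = query.ToList();
Console.WriteLine(calls);                    // 2
Console.WriteLine(string.Join(",", names));  // a.pdf,b.pdf
```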
Use string.Split to split the string at the desired character and retrieve the portion that you want:
foreach (var item in zippedTransactions)
{
string[] result = item.Split('&');
Console.WriteLine(result[1]);
}
You can use the string.IndexOf function to find the location of a character in the string and then use string.Remove to remove the characters up to that point:
for(int i =0; i < zippedTransactions.Count; i++)
{
int count = zippedTransactions[i].IndexOf("&") + 1;
zippedTransactions[i] = zippedTransactions[i].Remove(0, count);
}
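One property of the IndexOf and Remove combination is that it degrades gracefully when the separator is missing. The helper name StripPrefix below is hypothetical, and the inputs are shortened examples.

```csharp
using System;

// Hypothetical helper: drop everything up to and including the first '&'.
static string StripPrefix(string value)
{
    int cut = value.IndexOf('&') + 1;  // IndexOf returns -1 when '&' is absent, so cut is 0
    return value.Remove(0, cut);       // removing 0 characters leaves the string as it was
}

Console.WriteLine(StripPrefix("33396&2015SUMMARIES.PDF"));  // 2015SUMMARIES.PDF
Console.WriteLine(StripPrefix("SUMMARIES.PDF"));            // SUMMARIES.PDF
Console.WriteLine(StripPrefix("a&b&c"));                    // b&c
```

By contrast, Split('&')[1] throws an IndexOutOfRangeException for input that contains no & at all.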
The following code will help you:
for (int i = 0; i < zippedTransactions.Count; i++)
{
string[] result = zippedTransactions[i].Split('&');
zippedTransactions[i] = result[result.Length-1];
}

Can't find string in input file

I have a text file, which I am trying to insert a line of code into. Using my linked-lists I believe I can avoid having to take all the data out, sort it, and then make it into a new text file.
What I did was come up with the code below. I set my bools, but still it is not working. I went through debugger and what it seems to be going on is that it is going through the entire list (which is about 10,000 lines) and it is not finding anything to be true, so it does not insert my code.
Why or what is wrong with this code?
List<string> lines = new List<string>(File.ReadAllLines("Students.txt"));
using (StreamReader inFile = new StreamReader("Students.txt", true))
{
string newLastName = "'Constant";
string newRecord = "(LIST (LIST 'Constant 'Malachi 'D ) '1234567890 'mdcant@mail.usi.edu 4.000000 )";
string line;
string lastName;
bool insertionPointFound = false;
for (int i = 0; i < lines.Count && !insertionPointFound; i++)
{
line = lines[i];
if (line.StartsWith("(LIST (LIST "))
{
string[] values = line.Split(" ".ToCharArray());
lastName = values[2];
if (newLastName.CompareTo(lastName) < 0)
{
lines.Insert(i, newRecord);
insertionPointFound = true;
}
}
}
if (!insertionPointFound)
{
lines.Add(newRecord);
}
You're just reading the file into memory and not committing it anywhere.
I'm afraid that you're going to have to load and completely rewrite the entire file. Files support appending, but they don't support insertions.
You can write to a file the same way that you read from it:
string[] lines;
// instantiate and build `lines`
File.WriteAllLines("path", lines);
WriteAllLines also takes an IEnumerable<string>, so you can pass a List<string> in there if you want.
One more issue: it appears that you're reading your file twice, once with ReadAllLines and again with your StreamReader.
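The read, modify, write-back cycle can be sketched end to end as follows. A temporary file stands in for Students.txt, and the three names are invented.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// A temporary file stands in for Students.txt in this sketch.
string path = Path.GetTempFileName();
File.WriteAllLines(path, new[] { "Adams", "Davis" });

// Read everything, change the in-memory list, then write the whole list back.
var lines = new List<string>(File.ReadAllLines(path));
lines.Insert(1, "Constant");
File.WriteAllLines(path, lines);  // List<string> works because it is an IEnumerable<string>

string[] saved = File.ReadAllLines(path);
Console.WriteLine(string.Join(",", saved));  // Adams,Constant,Davis
File.Delete(path);
```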
There are at least four possible errors.
Opening the StreamReader is not required; you have already read all the lines. (Well, not really an error, but...)
The check for StartsWith can be fooled if your lines start with blank space, and you will miss the insertion point. (Adding a Trim removes the problem here.)
In the CompareTo line you check for < 0 but you should check for == 0. CompareTo returns 0 if the strings are equivalent, however...
To check whether two strings are equal you should avoid CompareTo, as the MSDN documentation explains, and use string.Equals instead.
List<string> lines = new List<string>(File.ReadAllLines("Students.txt"));
string newLastName = "'Constant";
string newRecord = "(LIST (LIST 'Constant 'Malachi 'D ) '1234567890 'mdcant@mail.usi.edu 4.000000 )";
string line;
string lastName;
bool insertionPointFound = false;
for (int i = 0; i < lines.Count && !insertionPointFound; i++)
{
line = lines[i].Trim();
if (line.StartsWith("(LIST (LIST "))
{
string[] values = line.Split(" ".ToCharArray());
lastName = values[2];
if (newLastName.Equals(lastName))
{
lines.Insert(i, newRecord);
insertionPointFound = true;
}
}
}
if (!insertionPointFound)
lines.Add(newRecord);
I don't list the missing write back to the file as an error. I hope you have just omitted that part of the code; otherwise it is a very simple problem.
(However, I think that the way in which CompareTo is used is probably the main reason for your problem.)
EDIT: Looking at your comment below, it seems that the answer from Sam I Am is the right one for you. Of course you need to write back the modified list of lines. All the changes are made to an in-memory list of lines, and nothing is written back to a file unless you have code that writes it. However, you don't need a new file:
File.WriteAllLines("Students.txt", lines);
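Which comparison fits depends on the goal: an equality test finds a record that already exists, while an ordering test finds a position in a sorted list. The sketch below illustrates the difference under the assumption that the names are already sorted; it uses ordinal comparison to stay culture-independent, and the names are made up.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, already sorted last names in the file's 'Name format.
var lastNames = new List<string> { "'Adams", "'Baker", "'Davis" };
string newLastName = "'Constant";

// Equality test: true only when an identical name is already present.
bool alreadyPresent = lastNames.Exists(n => string.Equals(n, newLastName, StringComparison.Ordinal));

// Ordering test: index of the first name that sorts after the new one.
int insertAt = lastNames.FindIndex(n => string.CompareOrdinal(newLastName, n) < 0);
if (insertAt < 0) insertAt = lastNames.Count;  // nothing sorts after it: append
lastNames.Insert(insertAt, newLastName);

Console.WriteLine(alreadyPresent);               // False
Console.WriteLine(string.Join(" ", lastNames));  // 'Adams 'Baker 'Constant 'Davis
```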

C# Remove file extension from string list

Basically, my program lists file names (including their extensions) from a directory in a listbox. It then has a sorting function which sorts the list of strings into alphabetical order.
Lastly it has a binary search function that allows the users to input any string which the program will then compare and display the matched results into a listbox.
Now, all these functions work perfectly however I can't seem to remove the extension off of a file name after a search.
For example in the scanning and sorting part it lists the file names as: filename.mp3
Now, what I want it to do when the search button is clicked is to remove the file extension and display just the filename.
private void buttonSearch_Click(object sender, RoutedEventArgs e)
{
listBox1.Items.Clear();
string searchString = textBoxSearchPath.Text;
int index = BinarySearch(list1, 0, list1.Count, searchString);
for (int n = index; n < list1.Count; n++)
{
//Removes file extension from last decimal point ''not working''
int i = list1[n].LastIndexOf(".");
if (i > 0)
list1[n].Substring(0, i);
// Adds items to list
if (list1[n].IndexOf(searchString, StringComparison.OrdinalIgnoreCase) != 0) break;
listBox1.Items.Add(list1[n]);
}
MessageBox.Show("Done");
}
C# is so easy that if something takes more than 2 minutes, there probably is a method for it in the Framework.
The Substring method returns a new fresh copy of the string, copied from the source one. If you want to "cut the extension off", then you must fetch what Substring returns and store it somewhere, i.e.:
int i = list1[n].LastIndexOf(".");
if (i > 0)
list1[n] = list1[n].Substring(0, i);
However, this is quite an odd way to remove an extension.
Firstly, the use of Substring(0, idx) is odd, as there's a Remove(idx) method which does exactly that:
int i = list1[n].LastIndexOf(".");
if (i > 0)
list1[n] = list1[n].Remove(i);
But, secondly, there's an even better way of doing it: the System.IO.Path class provides a set of well-written static methods that, for example, remove the extension (edit: this is what L-Three suggested in comments), with full handling of dots and so on:
var str = System.IO.Path.GetFileNameWithoutExtension("myfile.txt"); // == "myfile"
It still returns a copy and you still have to store the result somewhere!
list1[n] = Path.GetFileNameWithoutExtension( list1[n] );
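A short sketch makes both points concrete: the original string is never modified, and Path.GetFileNameWithoutExtension handles a few cases that manual index arithmetic can get wrong. The file names are illustrative only.

```csharp
using System;
using System.IO;

string original = "filename.mp3";

// Strings are immutable: the result must be captured, the original never changes.
string stem = Path.GetFileNameWithoutExtension(original);

Console.WriteLine(original);  // filename.mp3
Console.WriteLine(stem);      // filename

// Only the last extension is removed, and any directory part is dropped too.
string doubleExt = Path.GetFileNameWithoutExtension("archive.tar.gz");
string withDir = Path.GetFileNameWithoutExtension(Path.Combine("music", "song.mp3"));
string noExt = Path.GetFileNameWithoutExtension("README");

Console.WriteLine(doubleExt);  // archive.tar
Console.WriteLine(withDir);    // song
Console.WriteLine(noExt);      // README
```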
Try the following; it will help you.
Description : Filename without Extension
listBox1.Items.Add(Path.GetFileNameWithoutExtension(list1[n]));
Use Path.GetFileNameWithoutExtension
Use Path.GetFileNameWithoutExtension method. Quite easy I guess.
http://msdn.microsoft.com/en-us/library/system.io.path.getfilenamewithoutextension.aspx
Not sure how you've implemented your directory searching, but you can leverage LINQ to your advantage in these situations for clean, easy to read code:
var files = Directory.EnumerateFiles(@"\\PathToFiles")
    .Select(f => Path.GetFileNameWithoutExtension(f));
If you're using .NET 4.0, EnumerateFiles seems to be a better choice than GetFiles. However, it also sounds like you want both the full file path and the file name without extension. Here's how you could create a Dictionary so you don't have to loop through the collection twice:
var files = Directory.EnumerateFiles(@"\\PathToFiles")
    .ToDictionary(f => f, n => Path.GetFileNameWithoutExtension(n));
A way to do this if you don't have a file path, just a file name:
string filePath = (@"D:/" + fileName);
string withoutExtension = Path.GetFileNameWithoutExtension(filePath);
