Looking for a linq solution to replace a for-loop

I run into examples like this all the time. In this case I want to populate a StringBuilder with a new line for each FileInfo object in a previously loaded variable called files, which of course contains a bunch of FileInfo objects. For the first object, I want to add FIRST after the text; for everything else I want to add NOTFIRST. To do this with a foreach loop, I have to set up a counter, do an if statement and increment the counter.
I've learned just enough LINQ that it's on the tip of my fingers, and I know there has to be an elegant LINQ solution.
var mysb = new StringBuilder();
var count = 0;
string extra;
foreach (System.IO.FileInfo fi in files)
{
    var newLine = fi.Name;
    if (count == 0)
        extra = "FIRST";
    else
        extra = "NOTFIRST";
    // count = count++ would leave count at 0, so increment directly
    count++;
    mysb.AppendLine(string.Format("({0} {1})", newLine, extra));
}

Personally, I would forego the LINQ and stick with what you have, just simpler:
var mysb = new StringBuilder();
foreach (FileInfo fi in files)
{
    string extra = mysb.Length == 0 ? "FIRST" : "NOTFIRST";
    mysb.Append(fi.Name);
    mysb.AppendLine(extra);
}
(It's not clear to me why you are treating the file name as a valid format string...of course, if it really is a valid format string, you can change my two calls to Append() and AppendLine() back to the single call with the string.Format())

You may use the overload of Select that gives you the current index: http://msdn.microsoft.com/pl-pl/library/bb534869(v=vs.110).aspx
I also don't like mutating state when using LINQ, so I would use String.Join instead.
mysb.AppendLine(String.Join(Environment.NewLine,
    files.Select((fi, i) => String.Format("({0} {1})", fi.Name, i == 0 ? "FIRST" : "NOTFIRST"))));
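For anyone who wants to try the indexed Select overload without a folder of files to hand, here is a minimal self-contained sketch. It assumes plain strings stand in for the fi.Name values; the class name, method name and sample names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FirstLabelDemo
{
    // Labels the first name "FIRST" and every later name "NOTFIRST",
    // using the Select overload that also passes the element's index.
    public static string Describe(IEnumerable<string> names)
    {
        return string.Join(
            Environment.NewLine,
            names.Select((name, index) =>
                string.Format("({0} {1})", name, index == 0 ? "FIRST" : "NOTFIRST")));
    }

    public static void Main()
    {
        // Hypothetical file names standing in for fi.Name.
        var names = new[] { "a.txt", "b.txt", "c.txt" };
        Console.WriteLine(Describe(names));
    }
}
```

Because the index comes from Select itself, there is no counter to declare or increment, and nothing outside the query is mutated.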

As often happens when I ask questions here, I went with a hybrid of the suggestions:
foreach (var fi in files)
{
    var extra = (fi == files.First() ? "FIRST" : "NOTFIRST");
    sb.AppendLine(fi.Name + extra);
}
I was unwilling to check the length of the StringBuilder, because I have other scenarios where extra pretty much requires using a LINQ function.
I suppose I could have just as easily done the following (for my stated example):
sb.AppendLine(files.First().Name + " FIRST");
sb.AppendLine(String.Join(Environment.NewLine,
    files.Skip(1).Select(fi => fi.Name + " NOTFIRST")));
But to be honest, it's half as readable.

I'm not suggesting that this is the best way to do this, but it was fun to write:
var fi = new[] { new { Name = "A" },
                 new { Name = "B" },
                 new { Name = "C" } };
// Dump() is a LINQPad extension method that prints the result
String.Join(Environment.NewLine,
    fi.Take(1).Select(f => Tuple.Create(f.Name, "FIRST"))
      .Concat(fi.Skip(1).Select(f => Tuple.Create(f.Name, "NOTFIRST")))
      .Select(t => String.Format("({0} {1})", t.Item1, t.Item2)))
    .Dump();

Related

How to check if a string is in an array

I am trying to check whether a string is in an array, but execution continues past the if even though fileInfo.Name contains a string that is in files:
// FILES LIKE DATABASE.MDB IS IN C:PROJECTS\HOLON\DATABASE.MDB
if (files.Any((fileInfo.Name.Contains)))
// DO SOMETHING
Console.WriteLine(
fileInfo.Name, fileInfo.Length,

If you already have the file names collected in an array, then you can do it this way:
if (files.Any() && files.Contains(fileInfo.Name))
{
    // Do something
}
If you just want to check if a file exists then you can use File.Exists:
if (System.IO.File.Exists(fileInfo.Name))
{
    // Do something
}

So you have a collection of full file paths? And you want to check if one or more of those list entries match with a specific file name?
Perhaps this would work for you:
string fileToSearch = "DATABASE.MDB";
bool found = files.Any(fileName => new FileInfo(fileName).Name.ToUpper() == fileToSearch.ToUpper());
Edit:
An alternative to constructing new FileInfo objects would be to use System.IO.Path:
bool found = files.Any(fileName => Path.GetFileName(fileName).ToUpper() == fileToSearch.ToUpper());
Edit 2:
On the other hand, if you want to search for a specific file name, and you want to use the result, you could do something like this:
var fileToSearch = "DATABASE.MDB";
var fileInfo =
    (from f in files
     let fi = new FileInfo(f)
     where fi.Name.ToUpper() == fileToSearch.ToUpper()
     select fi).FirstOrDefault();
if (fileInfo != null)
{
    if (fileInfo.Exists)
    {
        Console.WriteLine($"{fileInfo.Name} ({fileInfo.Length} bytes).");
    }
    else
    {
        Console.WriteLine($"{fileInfo.Name} (does not exist).");
    }
}
I used a LINQ query here for readability. You could use the extension methods (files.Select(f => new FileInfo(f)).Where(fi => fi.Name.ToUpper() == fileToSearch.ToUpper()).FirstOrDefault()) as well, but that's up to you.
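As a side note, the ToUpper() calls on both sides can be replaced by a case-insensitive comparison. The following is only a sketch under the assumption that files holds path strings; the class name, method name and sample paths are invented for the example.

```csharp
using System;
using System.IO;
using System.Linq;

public static class FileNameSearchDemo
{
    // True if any path in the array ends in the given file name, ignoring case.
    public static bool ContainsFileName(string[] paths, string fileName)
    {
        return paths.Any(path =>
            string.Equals(Path.GetFileName(path), fileName, StringComparison.OrdinalIgnoreCase));
    }

    public static void Main()
    {
        // Hypothetical paths; forward slashes are used so the sample
        // behaves the same on Windows and on Unix-like systems.
        var files = new[] { "projects/holon/Database.mdb", "projects/holon/readme.txt" };
        Console.WriteLine(ContainsFileName(files, "DATABASE.MDB"));
        Console.WriteLine(ContainsFileName(files, "missing.mdb"));
    }
}
```

StringComparison.OrdinalIgnoreCase avoids allocating two upper-cased copies per comparison and sidesteps culture-specific casing rules.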

if (Array.Exists(files, element => element.Contains(fileInfo.Name)))

Linq query for building a dictionary from a reg file

I'm building a simple dictionary from a reg file (export from Windows Regedit). The .reg file contains a key in square brackets, followed by zero or more lines of text, followed by a blank line. This code will create the dictionary that I need:
var a = File.ReadLines("test.reg");
var dict = new Dictionary<String, List<String>>();
foreach (var key in a) {
    if (key.StartsWith("[HKEY")) {
        var iter = a.GetEnumerator();
        var value = new List<String>();
        do {
            iter.MoveNext();
            value.Add(iter.Current);
        } while (String.IsNullOrWhiteSpace(iter.Current) == false);
        dict.Add(key, value);
    }
}
I feel like there is a cleaner (prettier?) way to do this in a single Linq statement (using a group by), but it's unclear to me how to implement the iteration of the value items into a list. I suspect I could do the same GetEnumerator in a let statement but it seems like there should be a way to implement this without resorting to an explicit iterator.
Sample data:
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\.msu]
@="Microsoft.System.Update.1"
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\.MTS]
@="WMP11.AssocFile.M2TS"
"Content Type"="video/vnd.dlna.mpeg-tts"
"PerceivedType"="video"
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\.MTS\OpenWithProgIds]
"WMP11.AssocFile.M2TS"=hex(0):
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\.MTS\ShellEx]
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\.MTS\ShellEx\{BB2E617C-0920-11D1-9A0B-00C04FC2D6C1}]
@="{9DBD2C50-62AD-11D0-B806-00C04FD706EC}"
Update
I'm sorry, I need to be more specific. The files I am looking at are around 300 MB, so I took the approach I did to keep the memory footprint down. I'd prefer an approach that doesn't require pulling the entire file into memory.

You can always use Regex:
var dict = new Dictionary<String, List<String>>();
var a = File.ReadAllText(@"test.reg");
var results = Regex.Matches(a, "(\\[[^\\]]+\\])([^\\[]+)\r\n\r\n", RegexOptions.Singleline);
foreach (Match item in results)
{
    dict.Add(
        item.Groups[1].Value,
        item.Groups[2].Value.Split(new[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries).ToList()
    );
}
I whipped this out real quick. You might be able to improve the regex pattern.

Instead of using GetEnumerator you can take advantage of the TakeWhile and Skip methods to break your list into smaller lists (each sublist represents one key and its values):
var registryLines = File.ReadLines("test.reg");
Dictionary<string, List<string>> resultKeys = new Dictionary<string, List<string>>();
while (registryLines.Any())
{
    // Take the key and values into a single list
    var keyValues = registryLines.TakeWhile(x => !String.IsNullOrWhiteSpace(x)).ToList();
    // Adds a new entry to the dictionary using the first value as key and the rest of the list as value
    if (keyValues.Count > 0)
        resultKeys.Add(keyValues[0], keyValues.Skip(1).ToList());
    // Jumps to the next registry key (+1 to skip the blank line)
    registryLines = registryLines.Skip(keyValues.Count + 1);
}
EDIT based on your update:
"I'm sorry, I need to be more specific. The files I am looking at are around 300 MB, so I took the approach I did to keep the memory footprint down. I'd prefer an approach that doesn't require pulling the entire file into memory."
Well, if you can't read the whole file into memory, it makes no sense to me asking for a LINQ solution. Here is a sample of how you can do it reading line by line (still no need for GetEnumerator)
Dictionary<string, List<string>> resultKeys = new Dictionary<string, List<string>>();
using (StreamReader reader = File.OpenText("test.reg"))
{
    List<string> keyAndValues = new List<string>();
    while (!reader.EndOfStream)
    {
        string line = reader.ReadLine();
        // Adds key and values to a list until it finds a blank line
        if (!string.IsNullOrWhiteSpace(line))
            keyAndValues.Add(line);
        else
        {
            // Adds a new entry to the dictionary using the first value as key and the rest of the list as value
            if (keyAndValues.Count > 0)
                resultKeys.Add(keyAndValues[0], keyAndValues.Skip(1).ToList());
            // Starts a new Key collection
            keyAndValues = new List<string>();
        }
    }
    // Adds the last entry when the file does not end with a blank line
    if (keyAndValues.Count > 0)
        resultKeys.Add(keyAndValues[0], keyAndValues.Skip(1).ToList());
}
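If a LINQ-shaped pipeline is still wanted without loading the file up front, the same line-by-line grouping can be wrapped in an iterator method and then queried. This is a rough sketch, not a tested parser for real .reg exports: the names RegBlocks, SplitOnBlankLines and Parse are made up, and the sample lines are an in-memory stand-in for File.ReadLines("test.reg").

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RegBlocks
{
    // Lazily yields one list per run of non-blank lines; only the current
    // block is buffered while the source is being read.
    public static IEnumerable<List<string>> SplitOnBlankLines(IEnumerable<string> lines)
    {
        var block = new List<string>();
        foreach (var line in lines)
        {
            if (string.IsNullOrWhiteSpace(line))
            {
                if (block.Count > 0)
                {
                    yield return block;
                    block = new List<string>();
                }
            }
            else
            {
                block.Add(line);
            }
        }
        // Emit the trailing block when the input does not end with a blank line.
        if (block.Count > 0)
        {
            yield return block;
        }
    }

    // First line of each block is the key, the remaining lines are its values.
    // ToDictionary throws on a repeated key, as Dictionary.Add would.
    public static Dictionary<string, List<string>> Parse(IEnumerable<string> lines)
    {
        return SplitOnBlankLines(lines)
            .Where(block => block[0].StartsWith("[HKEY", StringComparison.Ordinal))
            .ToDictionary(block => block[0], block => block.Skip(1).ToList());
    }

    public static void Main()
    {
        // In-memory sample; a real file could be passed as File.ReadLines(path).
        var sample = new[]
        {
            @"[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\.msu]",
            "@=\"Microsoft.System.Update.1\"",
            "",
            @"[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\.MTS\ShellEx]",
            ""
        };
        foreach (var entry in Parse(sample))
        {
            Console.WriteLine("{0}: {1} value line(s)", entry.Key, entry.Value.Count);
        }
    }
}
```

The source is read once and never held as a whole, although the resulting dictionary itself still ends up containing every key and value line.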

I think you can use code like this, if you can afford the memory:
var lines = File.ReadAllText(fileName);
var result =
    Regex.Matches(lines, @"\[(?<key>HKEY[^]]+)\]\s+(?<value>[^[]+)")
        .OfType<Match>()
        .ToDictionary(k => k.Groups["key"].Value, v => v.Groups["value"].ToString().Trim('\n', '\r', ' '));
This took 24.173 seconds for a file with more than 4 million lines (about 550 MB) and used 1.2 GB of memory.
Edit:
A better way is to use File.ReadLines, which is lazy (File.ReadAllLines loads the whole file into an array first):
var lines = File.ReadLines(fileName);
var keyRegex = new Regex(@"\[(?<key>HKEY[^]]+)\]");
var currentKey = string.Empty;
var currentValue = string.Empty;
var result = new Dictionary<string, string>();
foreach (var line in lines)
{
    var match = keyRegex.Match(line);
    if (match.Success)
    {
        if (!string.IsNullOrEmpty(currentKey))
        {
            result.Add(currentKey, currentValue);
            currentValue = string.Empty;
        }
        currentKey = match.Groups["key"].ToString();
    }
    else
    {
        currentValue += line;
    }
}
// Add the final key, which has no following key line to trigger the add above
if (!string.IsNullOrEmpty(currentKey))
{
    result.Add(currentKey, currentValue);
}
This will take 17093 milliseconds for a file with 795180 lines.

Remove names that contain another in a list

I have a file with "Name|Number" on each line and I wish to remove the lines with names that contain another name in the list.
For example, if there is "PEDRO|3", "PEDROFILHO|5", "PEDROPHELIS|1" in the file, I wish to remove the lines "PEDROFILHO|5" and "PEDROPHELIS|1".
The list has 1.8 million lines. I made it like this, but it's too slow:
List<string> names = File.ReadAllLines("firstNames.txt").ToList();
List<string> result = File.ReadAllLines("firstNames.txt").ToList();
foreach (string name in names)
{
    string tempName = name.Split('|')[0];
    List<string> temp = names.Where(t => t.Contains(tempName)).ToList();
    foreach (string str in temp)
    {
        if (str.Equals(name))
        {
            continue;
        }
        result.Remove(str);
    }
}
File.WriteAllLines("result.txt", result);
Does anyone know a faster way? Or how to improve the speed?

Since you are looking for matches everywhere in the word, you will end up with an O(n^2) algorithm. You can improve the implementation a bit to avoid deleting strings from a list, which is an O(n) operation in itself:
var toDelete = new HashSet<string>();
var names = File.ReadAllLines("firstNames.txt");
foreach (string name in names) {
    var tempName = name.Split('|')[0];
    toDelete.UnionWith(
        // Length constraint removes self-matches
        names.Where(t => t.Length > name.Length && t.Contains(tempName))
    );
}
File.WriteAllLines("result.txt", names.Where(name => !toDelete.Contains(name)));
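If the names are short, a different technique from the one above may avoid the quadratic scan: put every name in a HashSet and, for each name, look up its own proper substrings in that set. The cost is then roughly the number of lines times the square of the name length instead of the square of the number of lines. This is only a sketch under assumptions: names are compared case-sensitively, only the part before the '|' is matched, and two lines with an identical name do not remove each other. The class and method names are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NameFilter
{
    // True if some proper substring of name is itself a name in the set.
    private static bool ContainsAnotherName(string name, HashSet<string> allNames)
    {
        for (int start = 0; start < name.Length; start++)
        {
            for (int length = 1; start + length <= name.Length; length++)
            {
                if (length == name.Length) continue; // skip the full name itself
                if (allNames.Contains(name.Substring(start, length))) return true;
            }
        }
        return false;
    }

    // Keeps only the lines whose name does not contain another listed name.
    public static List<string> Filter(IEnumerable<string> lines)
    {
        var list = lines.ToList();
        var allNames = new HashSet<string>(list.Select(l => l.Split('|')[0]));
        return list.Where(l => !ContainsAnotherName(l.Split('|')[0], allNames)).ToList();
    }

    public static void Main()
    {
        var lines = new[] { "PEDRO|3", "PEDROFILHO|5", "PEDROPHELIS|1", "ANA|2" };
        foreach (var line in Filter(lines))
        {
            Console.WriteLine(line);
        }
    }
}
```

With the sample above, only PEDRO|3 and ANA|2 survive. For a real file, Filter(File.ReadLines("firstNames.txt")) would be the entry point.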

This works, but I don't know if it's quicker. I haven't tested on millions of lines. Remove the ToLower calls if the names are in the same case.
List<string> names = File.ReadAllLines(@"C:\Users\Rob\Desktop\File.txt").ToList();
var result = names.Where(w => !names.Any(a =>
    w.Split('|')[0].Length > a.Split('|')[0].Length &&
    w.Split('|')[0].ToLower().Contains(a.Split('|')[0].ToLower())));
File.WriteAllLines(@"C:\Users\Rob\Desktop\result.txt", result);
test file had
Rob|1
Robbie|2
Bert|3
Robert|4
Jan|5
John|6
Janice|7
Carol|8
Carolyne|9
Geoff|10
Geoffrey|11
Result had
Rob|1
Bert|3
Jan|5
John|6
Carol|8
Geoff|10

How do i create a c# dictionary that will read in key and values from .csv file

I have a .csv file with a list of abbreviations and their actual meaning e.g.
Laughing Out Loud, LOL
I need to be able to search for an abbreviation in a text box and replace the abbreviation with the actual words. This is what I have attempted so far to understand dictionaries but cannot work out how to read in values from the file.
Dictionary<string, string> Abbreviations = new Dictionary<string, string>();
Abbreviations.Add("Laughing Out Loud", "lol");
foreach (KeyValuePair<string, string> abbrev in Abbreviations)
{
    txtinput.Text = txtinput + "<<" + abbrev.Key + ">>";
}

You can try this LINQ solution. The GroupBy is there to handle the case where a key appears in the file multiple times.
Dictionary<string, string[]> result =
    File.ReadLines("test.csv")
        .Select(line => line.Split(','))
        .GroupBy(arr => arr[0].Trim())
        .ToDictionary(gr => gr.Key,
                      gr => gr.Select(s => s[1].Trim()).ToArray());
To check if the abbreviation in the TextBox exists in the Dictionary:
foreach (KeyValuePair<string, string[]> abbrev in result)
{
    // Value is an array of abbreviations, so test membership rather than equality
    if (abbrev.Value.Contains(txtinput.Text))
    {
        txtinput.Text = txtinput.Text + "<<" + abbrev.Key + ">>";
    }
}

You can start by creating a StreamReader for your file, then loop over all the lines in the CSV and add them to the dictionary.
static void Main(string[] args)
{
    //declare your dictionary somewhere outside the loop.
    using (var csv_reader = new StreamReader(File.OpenRead(@"your_file_path")))
    {
        while (!csv_reader.EndOfStream)
        {
            //read the line and split it if you need to with .Split(',')
            var line = csv_reader.ReadLine();
            //Add to the dictionary here
        }
    }
    //Call another method for your search and replace.
    SearchAndReplace(your_input);
}
Then write the implementation of that method: search whether the input exists in the dictionary and, if it does, replace it.
You could use LINQ to put the values of the CSV into your dictionary, if that's easier for you.
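The search-and-replace step could look roughly like the sketch below. It is only an illustration: the name Expand is made up, the dictionary is assumed to be keyed by the abbreviation (the reverse of the question's snippet), and words are split on single spaces, so an abbreviation glued to punctuation such as "LOL!" would not be matched.

```csharp
using System;
using System.Collections.Generic;

public static class AbbreviationExpander
{
    // Replaces each space-separated word that is a known abbreviation
    // with its full meaning and leaves every other word untouched.
    public static string Expand(string input, IDictionary<string, string> abbreviations)
    {
        string[] words = input.Split(' ');
        for (int i = 0; i < words.Length; i++)
        {
            string meaning;
            if (abbreviations.TryGetValue(words[i], out meaning))
            {
                words[i] = meaning;
            }
        }
        return string.Join(" ", words);
    }

    public static void Main()
    {
        // Case-insensitive keys so that "lol" and "LOL" both match.
        var abbreviations = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "LOL", "Laughing Out Loud" }
        };
        Console.WriteLine(Expand("that was funny lol", abbreviations));
    }
}
```

TryGetValue does the existence check and the lookup in one call, which avoids a ContainsKey followed by an indexer access.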

I'm going to assume that your input file may have commas in the actual text, and not just separating the two fields.
Now, if that were the case, then the standard CSV format would have the file look like this:
Laughing Out Loud,LOL
"I Came, I Saw, I Conquered",ICISIC
However, from your example you have a space before the "LOL", so it doesn't appear that you're using standard CSV.
So I'll work on this input:
Laughing Out Loud, LOL
"I Came, I Saw, I Conquered",ICISIC
"to, too, or two", 2
because,B/C
For this input then this code produces a dictionary:
var dictionary =
(
    from line in File.ReadAllLines("FILE.CSV")
    let lastComma = line.LastIndexOf(',')
    let abbreviation = line.Substring(lastComma + 1).Trim()
    let actualRaw = line.Substring(0, lastComma).Trim()
    let actual = actualRaw.StartsWith("\"") && actualRaw.EndsWith("\"")
        ? actualRaw.Substring(1, actualRaw.Length - 2)
        : actualRaw
    select new { abbreviation, actual }
).ToDictionary(x => x.abbreviation, x => x.actual);
You can go one better than this though. It's quite possible to create a "super function" that will do all of the replaces in one go for you.
Try this:
var translate =
(
    from line in File.ReadAllLines("FILE.CSV")
    let lastComma = line.LastIndexOf(',')
    let abbreviation = line.Substring(lastComma + 1).Trim()
    let actualRaw = line.Substring(0, lastComma).Trim()
    let actual = actualRaw.StartsWith("\"") && actualRaw.EndsWith("\"")
        ? actualRaw.Substring(1, actualRaw.Length - 2)
        : actualRaw
    select (Func<string, string>)(x => x.Replace(abbreviation, actual))
).Aggregate((f1, f2) => x => f2(f1(x)));
Then I can do this:
Console.WriteLine(translate("It was me 2 B/C ICISIC, LOL!"));
I get this result:
It was me to, too, or two because I Came, I Saw, I Conquered, Laughing Out Loud!

C# Exception Handling continue on error

I have a basic C# console application that reads a text file (CSV format) line by line and puts the data into a Hashtable. The first CSV item in the line is the key (id num) and the rest of the line is the value. However, I've discovered that my import file has a few duplicate keys that it shouldn't have. When I try to import the file, the application errors out because you can't have duplicate keys in a Hashtable. I want my program to be able to handle this error, though. When I run into a duplicate key, I would like to put that key into an ArrayList and continue importing the rest of the data into the Hashtable. How can I do this in C#?
Here is my code:
private static Hashtable importFile(Hashtable myHashtable, String myFileName)
{
    StreamReader sr = new StreamReader(myFileName);
    CSVReader csvReader = new CSVReader();
    ArrayList tempArray = new ArrayList();
    int count = 0;
    while (!sr.EndOfStream)
    {
        String temp = sr.ReadLine();
        if (temp.StartsWith(" "))
        {
            ServMissing.Add(temp);
        }
        else
        {
            tempArray = csvReader.CSVParser(temp);
            Boolean first = true;
            String key = "";
            String value = "";
            foreach (String x in tempArray)
            {
                if (first)
                {
                    key = x;
                    first = false;
                }
                else
                {
                    value += x + ",";
                }
            }
            myHashtable.Add(key, value);
        }
        count++;
    }
    Console.WriteLine("Import Count: " + count);
    return myHashtable;
}

if (myHashtable.ContainsKey(key))
    duplicates.Add(key);
else
    myHashtable.Add(key, value);

A better solution is to call ContainsKey to check whether the key exists before adding it to the hash table. Throwing an exception on this kind of error is a performance hit and doesn't improve the program flow.

ContainsKey has a constant O(1) overhead for every item, while catching an exception incurs a performance hit on just the duplicate items.
In most situations I'd say check for the key, but in this case it's better to catch the exception.
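For comparison, on runtimes where the generic Dictionary has TryAdd (.NET Core 2.0 and later; it is not available on the older .NET Framework), the check and the insert can be done in a single lookup with no exception involved. The sketch below swaps the question's Hashtable for a Dictionary<string, string> and splits on the first comma instead of using the asker's CSVReader, so treat it as an illustration rather than a drop-in replacement; the class and method names are made up.

```csharp
using System;
using System.Collections.Generic;

public static class DuplicateKeyDemo
{
    // Adds each line to the table keyed by its first CSV field and
    // returns the keys that were already present.
    public static List<string> Load(IEnumerable<string> lines, Dictionary<string, string> table)
    {
        var duplicates = new List<string>();
        foreach (var line in lines)
        {
            int comma = line.IndexOf(',');
            string key = comma < 0 ? line : line.Substring(0, comma);
            string value = comma < 0 ? "" : line.Substring(comma + 1);
            // TryAdd returns false instead of throwing when the key exists.
            if (!table.TryAdd(key, value))
            {
                duplicates.Add(key);
            }
        }
        return duplicates;
    }

    public static void Main()
    {
        var table = new Dictionary<string, string>();
        var duplicates = Load(new[] { "1,a,b", "2,c", "1,d" }, table);
        Console.WriteLine("Imported: " + table.Count);
        Console.WriteLine("Duplicates: " + string.Join(",", duplicates));
    }
}
```

The first value seen for a key is kept and later ones are only recorded as duplicates, which matches the behavior the question asks for.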

Here is a solution which avoids multiple hits in the secondary list with a small overhead to all insertions:
Dictionary<string, List<string>> dict = new Dictionary<string, List<string>>();
//Insert item
if (!dict.ContainsKey(key))
    dict[key] = new List<string>();
dict[key].Add(value);
You can wrap the dictionary in a type that hides this or put it in a method or even extension method on dictionary.

If you have more than 4 (for example) CSV values, it might be worth setting the value variable to use a StringBuilder as well since the string concatenation is a slow function.
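One way to avoid the repeated concatenation without managing a StringBuilder by hand is string.Join over the fields after the first. This is a small sketch that assumes the parsed fields are available as a list of strings; SplitKeyAndRest is a made-up helper name. Note one difference from the question's loop: Join does not leave a trailing comma after the last field.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class JoinDemo
{
    // Uses the first field as the key and joins the remaining fields
    // with commas in a single pass.
    public static KeyValuePair<string, string> SplitKeyAndRest(IList<string> fields)
    {
        string key = fields[0];
        string rest = string.Join(",", fields.Skip(1));
        return new KeyValuePair<string, string>(key, rest);
    }

    public static void Main()
    {
        var pair = SplitKeyAndRest(new List<string> { "42", "alpha", "beta", "gamma" });
        Console.WriteLine(pair.Key + " => " + pair.Value);
    }
}
```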

Hmm, 1.7 Million lines? I hesitate to offer this for that kind of load.
Here's one way to do this using LINQ.
CSVReader csvReader = new CSVReader();
List<string> source = new List<string>();
using (StreamReader sr = new StreamReader(myFileName))
{
    while (!sr.EndOfStream)
    {
        source.Add(sr.ReadLine());
    }
}
List<string> ServMissing =
    source
        .Where(s => s.StartsWith(" "))
        .ToList();
//--------------------------------------------------
List<IGrouping<string, string>> groupedSource =
(
    from s in source
    where !s.StartsWith(" ")
    // CSVParser returns an ArrayList in the question, so cast it for LINQ
    let parsed = csvReader.CSVParser(s).Cast<string>().ToList()
    where parsed.Any()
    let first = parsed.First()
    let rest = String.Join(",", parsed.Skip(1).ToArray())
    select new { first, rest }
)
.GroupBy(x => x.first, x => x.rest) //GroupBy(keySelector, elementSelector)
.ToList();
//--------------------------------------------------
List<string> myExtras = new List<string>();
foreach (IGrouping<string, string> g in groupedSource)
{
    myHashtable.Add(g.Key, g.First());
    if (g.Skip(1).Any())
    {
        myExtras.Add(g.Key);
    }
}

Thank you all.
I ended up using the ContainsKey() method. It takes maybe 30 secs longer, which is fine for my purposes. I'm loading about 1.7 million lines and the program takes about 7 mins total to load up two files, compare them, and write out a few files. It only takes about 2 secs to do the compare and write out the files.
