I have some version folders named like Vx_x_x.
I want to retrieve the highest folder version.
For example:
The folder contains
V8_2_1
V9_3_2
V10_4_1
I want to compare the number next to V first, then the numbers after it, to find the latest folder version.
I am able to get a list of the folders, but I am not sure how to find the maximum. Any suggestions would be a great help. Thank you.
private static void GetFolderVersion()
{
string startFolder = @"C:\Version\";
System.IO.DirectoryInfo dir = new System.IO.DirectoryInfo(startFolder);
IEnumerable<System.IO.DirectoryInfo> directoryList = dir.GetDirectories("*.*", System.IO.SearchOption.AllDirectories);
}
I would consider using the built-in System.Version type. Assuming all directory names are in the same format of "VX_Y_Z" (where X, Y and Z represent one or more digits, and V represents a literal "V"), the following code will do what you want:
public string GetMaxVersion(IEnumerable<string> directoryNames)
{
var vDict = directoryNames.ToDictionary(
s => new Version(s.Substring(1).Replace("_", ".")),
s => s);
var maxKey = vDict.Keys.Max();
return vDict[maxKey];
}
Here we build a dictionary that maps each version to its directory name (note that we change the string format from "VX_Y_Z" to "X.Y.Z" to be able to create a System.Version object). All that remains is to retrieve the max value of all dictionary keys, and return the value assigned to that key, which will be the directory name you're looking for.
UPDATE: For completeness, here's a piece of code that uses the method above and takes care of everything:
public string GetMaxVersionDirectory(string rootDirectory)
{
var dirNames = Directory.GetDirectories(rootDirectory, "V*_*_*")
.Select(dir => Path.GetFileName(dir));
return GetMaxVersion(dirNames);
}
In your case, you need to pass @"C:\Version" as the rootDirectory parameter.
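If some folder names might not follow the "VX_Y_Z" pattern, the Version constructor above will throw. A minimal sketch of a more tolerant variant, assuming you simply want to skip names that do not parse (the class and method names here are made up for illustration):

```csharp
using System;
using System.Collections.Generic;

public static class VersionFolders
{
    // Hypothetical helper: returns the highest "VX_Y_Z" name, or null if no name parses.
    public static string GetMaxVersionSafe(IEnumerable<string> directoryNames)
    {
        string best = null;
        Version bestVersion = null;
        foreach (var name in directoryNames)
        {
            // Skip anything that does not start with the literal 'V'
            if (string.IsNullOrEmpty(name) || name[0] != 'V') continue;

            // "V10_4_1" -> "10.4.1"; TryParse returns false instead of throwing
            if (!Version.TryParse(name.Substring(1).Replace('_', '.'), out var parsed)) continue;

            if (bestVersion == null || parsed > bestVersion)
            {
                best = name;
                bestVersion = parsed;
            }
        }
        return best;
    }
}
```

Calling VersionFolders.GetMaxVersionSafe with the three folder names from the question should return "V10_4_1", because Version compares the parts numerically rather than as text.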
If the folders all match the pattern you gave, I'd be tempted to extract the version information using a regular expression and then select the highest value, starting with the major version and working down.
Update: replace the V in the regular expression with the correct prefix for your case.
var regex = new Regex(@"^V(\d+)_(\d+)_(\d+)$", RegexOptions.Compiled);
var versions = directoryList
.Select(f => regex.Match(f.Name)) // directoryList holds DirectoryInfo objects, so match on the name
.Where(m => m.Success)
.Select(m => new
{
Major = Int32.Parse(m.Groups[1].Value),
Minor = Int32.Parse(m.Groups[2].Value),
Patch = Int32.Parse(m.Groups[3].Value)
}).ToList();
var major = versions.Max(a => a.Major);
versions = versions
.Where(a => a.Major == major)
.ToList();
var minor = versions.Max(a => a.Minor);
versions = versions
.Where(a => a.Minor == minor)
.ToList();
var patch = versions.Max(a => a.Patch);
versions = versions
.Where(a => a.Patch == patch)
.ToList();
var newest = versions.First();
var filename = String.Format("V{0}_{1}_{2}", newest.Major, newest.Minor, newest.Patch);
Although this probably does work, it's advisable to use the solution by @jbartuszek that uses the Version class.
Something along these lines could do it, but be aware that this is off the top of my head and written in Notepad, so it might not build or might contain a logic error. It just demonstrates the gist of it:
string[] version = folderName.Split('_');
string[] otherVersion = otherFolderName.Split('_');
then you write a method that checks the individual parts of the version:
private int CompareVersions(string[] version, string[] otherVersion)
{
//we won't need the first character (the 'V')
int vmajor = int.Parse(version[0].Substring(1));
int vminor = int.Parse(version[1]);
int vrevision = int.Parse(version[2]);
int ovmajor = int.Parse(otherVersion[0].Substring(1));
int ovminor = int.Parse(otherVersion[1]);
int ovrevision = int.Parse(otherVersion[2]);
int majorCompare = vmajor.CompareTo(ovmajor);
//check if major already decides outcome
if(majorCompare != 0)
{
return majorCompare;
}
int minorCompare = vminor.CompareTo(ovminor);
//then if major equal, check if minor decides outcome
if(minorCompare != 0)
{
return minorCompare;
}
//lastly, return outcome of revision compare
return vrevision.CompareTo(ovrevision);
}
This is how you'd compare two folder names. If you wanted to get the max folder version, you could loop over the folder names with foreach:
//we'll start out by assigning the first folder name as a preliminary max
string maxFolder = folderNames[0];
string[] maxFolderVersion = maxFolder.Split('_');
foreach(string folderName in folderNames)
{
    string[] folderVersion = folderName.Split('_');
    if(CompareVersions(folderVersion, maxFolderVersion) > 0)
    {
        maxFolder = folderName;
        //keep the split parts in sync with the new max
        maxFolderVersion = folderVersion;
    }
}
You can use a mathematical approach. Let's say every single part of your version is at most 999, so the base is 1000. The version parts are your coefficients and the exponents are built up in the loop. By summing coefficient*base^exp you get a single value which you can compare to find the highest version:
private static string GetHighestFolderVersion()
{
string startFolder = @"C:\Version\";
System.IO.DirectoryInfo dir = new System.IO.DirectoryInfo(startFolder);
IEnumerable<System.IO.DirectoryInfo> directoryList = dir.GetDirectories("*.*", System.IO.SearchOption.AllDirectories);
KeyValuePair<string, long> highestVersion = new KeyValuePair<string, long>("", 0);
foreach (System.IO.DirectoryInfo dirInfo in directoryList)
{
string versionOrigName = dirInfo.Name;
string versionStr = versionOrigName.Substring(1);
List<string> versionParts = versionStr.Split('_').ToList<string>();
long versionVal = 0;
int exp = 0;
for (int i = versionParts.Count - 1; i > -1; i--)
{
versionVal += (long.Parse(versionParts[i]) * (long)(Math.Pow(1000, exp)));
exp++;
}
if (versionVal > highestVersion.Value)
{
highestVersion = new KeyValuePair<string, long>(versionOrigName, versionVal);
}
}
return highestVersion.Key;
}
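To make the arithmetic concrete: with base 1000, V10_4_1 becomes 10*1000^2 + 4*1000 + 1 = 10004001, while V9_3_2 becomes 9003002, so the first one wins. Here is a small sketch of the same idea using only integer arithmetic (Horner's scheme) instead of Math.Pow. It assumes every part is between 0 and 999 and that all names have the same number of parts; the class and method names are made up for illustration.

```csharp
using System;

public static class VersionWeight
{
    // Hypothetical helper: turns "V10_4_1" into 10004001.
    // Assumes each part is in the range 0..999, as the answer above does.
    public static long ToWeight(string folderName)
    {
        // Drop the leading 'V' and split the rest on '_'
        string[] parts = folderName.Substring(1).Split('_');
        long weight = 0;
        foreach (string part in parts)
        {
            // Shift what we have so far up by one base-1000 "digit", then add the next part
            weight = weight * 1000 + long.Parse(part);
        }
        return weight;
    }
}
```

If a part can ever reach 1000 or more, two different versions can map to the same weight, so pick a larger base or fall back to comparing the parts one by one.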
Related
I have a file with "Name|Number" on each line and I wish to remove the lines with names that contain another name in the list.
For example, if there are "PEDRO|3", "PEDROFILHO|5" and "PEDROPHELIS|1" in the file, I wish to remove the lines "PEDROFILHO|5" and "PEDROPHELIS|1".
The list has 1.8 million lines. I did it like this, but it's too slow:
List<string> names = File.ReadAllLines("firstNames.txt").ToList();
List<string> result = File.ReadAllLines("firstNames.txt").ToList();
foreach (string name in names)
{
string tempName = name.Split('|')[0];
List<string> temp = names.Where(t => t.Contains(tempName)).ToList();
foreach (string str in temp)
{
if (str.Equals(name))
{
continue;
}
result.Remove(str);
}
}
File.WriteAllLines("result.txt",result);
Does anyone know a faster way? Or how to improve the speed?
Since you are looking for matches anywhere in the word, you will end up with an O(n^2) algorithm. You can improve the implementation a bit by avoiding string removal from a list, which is an O(n) operation in itself:
var toDelete = new HashSet<string>();
var names = File.ReadAllLines("firstNames.txt");
foreach (string name in names) {
var tempName = name.Split('|')[0];
toDelete.UnionWith(
// Length constraint removes self-matches
names.Where(t => t.Length > name.Length && t.Contains(tempName))
);
}
File.WriteAllLines("result.txt", names.Where(name => !toDelete.Contains(name)));
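If the names are short (as first names usually are), another option is to turn the problem around: put all names in a HashSet and, for each name, look up every proper substring of it. That costs roughly the number of names times the square of the name length in hash lookups, instead of comparing every pair of lines. This is only a sketch under that assumption, and it compares the name parts only (ignoring the number after the pipe); NameFilter and its methods are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NameFilter
{
    // Keeps only the lines whose name does not contain another, shorter name from the list.
    public static List<string> RemoveContainingNames(IEnumerable<string> lines)
    {
        var all = lines.ToList();
        var names = new HashSet<string>(all.Select(l => l.Split('|')[0]), StringComparer.Ordinal);
        return all.Where(line => !ContainsOtherName(line.Split('|')[0], names)).ToList();
    }

    private static bool ContainsOtherName(string name, HashSet<string> names)
    {
        // Try every proper substring (shorter than the name itself, so no self-match)
        for (int length = 1; length < name.Length; length++)
        {
            for (int start = 0; start + length <= name.Length; start++)
            {
                if (names.Contains(name.Substring(start, length)))
                    return true;
            }
        }
        return false;
    }
}
```

With the three lines from the question, only "PEDRO|3" should survive. I have not measured this on 1.8 million lines, so treat the speed-up as something to verify.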
This works, but I don't know if it's quicker; I haven't tested it on millions of lines. Remove the ToLower calls if the names are all in the same case.
List<string> names = File.ReadAllLines(@"C:\Users\Rob\Desktop\File.txt").ToList();
var result = names.Where(w => !names.Any(a=> w.Split('|')[0].Length> a.Split('|')[0].Length && w.Split('|')[0].ToLower().Contains(a.Split('|')[0].ToLower())));
File.WriteAllLines(@"C:\Users\Rob\Desktop\result.txt", result);
test file had
Rob|1
Robbie|2
Bert|3
Robert|4
Jan|5
John|6
Janice|7
Carol|8
Carolyne|9
Geoff|10
Geoffrey|11
Result had
Rob|1
Bert|3
Jan|5
John|6
Carol|8
Geoff|10
I have a list of database items and a list of files. I am trying to find out what files are missing from the database. I read the database into a list DBItems. I read the files into another list:
List<DBFiles> DBItems = new List<DBFiles>();
ArrayList FileArray = Directory.GetFiles(@"C:\reports\", "*.rpt", SearchOption.AllDirectories);
public class DBFiles
{
public DBFiles(string fileName, string flag)
{
this.FileName = fileName;
this.Flag = flag;
}
public string FileName { set; get; }
public string Flag { set; get; }
}
My question is how do I look up if each item in FileArray is in DBFiles with a specific Flag. This is what I have so far:
private void ListCompare()
{
for (int i = 0; i < FileArray.Count; i++)
{
if (DBItems.FileName.Contains(FileArray[i]) && DBItems.Flag.Contains("A") )
{
}
}
}
Obviously it's not working. Any help would be sincerely appreciated.
Try using Linq: you want all the files in the directory Except the files in the DBItems:
var result = Directory
.EnumerateFiles(@"C:\reports\", "*.rpt", SearchOption.AllDirectories)
.Except(DBItems
.Where(item => item.Flag.Contains("A")) // add a condition via Where if required
.Select(item => item.FileName)
,StringComparer.OrdinalIgnoreCase)
.ToArray(); // let's materialize into array
P.S. Try avoiding obsolete ArrayList class; put List<T> (List<string> in your case) instead.
Edit: according to the refined question (see comments below), the condition is more complicated.
I'm looking for files that start with h9347 (the database defines only
the first part of the file name for client specific needs)
In this case I suggest creating a HashSet<string> of what to exclude, and again use Linq:
// File names in name.extension format to exclude
HashSet<string> toExclude = new HashSet<string>(DBItems
    .Where(item => item.Flag.Contains("A"))
    .Select(item => item.FileName),
    StringComparer.OrdinalIgnoreCase);
var result = Directory
    .EnumerateFiles(@"C:\reports\", "*.rpt", SearchOption.AllDirectories)
    .Where(file => Path.GetFileName(file).StartsWith("h9347") &&
                   !toExclude.Contains(Path.GetFileName(file)))
    .ToArray();
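If the database really stores only the first part of each file name, one way to read the requirement is "a file is missing from the database when its name starts with none of the stored prefixes". Here is a rough sketch of that reading. It is an assumption about what the comment means, and ReportFilter, FilesNotInDatabase and dbPrefixes are made-up names:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class ReportFilter
{
    // Returns the paths whose file name does not start with any of the database prefixes.
    public static List<string> FilesNotInDatabase(IEnumerable<string> filePaths, IEnumerable<string> dbPrefixes)
    {
        // Materialize once so the prefixes are not re-enumerated for every file
        var prefixes = dbPrefixes.ToList();
        return filePaths
            .Where(path => !prefixes.Any(prefix =>
                Path.GetFileName(path).StartsWith(prefix, StringComparison.OrdinalIgnoreCase)))
            .ToList();
    }
}
```

You would pass it the result of Directory.EnumerateFiles and the FileName values of the DBItems that carry the flag you care about.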
I have a dictionary with a list of strings that each look something like:
"beginning|middle|middle2|end"
Now what I wanted was to do this:
List<string> stringsWithPipes = new List<string>();
stringsWithPipes.Add("beginning|middle|middle2|end");
...
if (stringsWithPipes.Contains("beginning|middle|middle2|end"))
{
    return true;
}
The problem is that the string I'm comparing against is built slightly differently, so it ends up being more like:
if (stringsWithPipes.Contains("beginning|middle2|middle||end"))
{
    return true;
}
And obviously this ends up being false. However, I want to consider it true, since it's only the order that is different.
What can I do?
You can split your string on |, split the string to be compared the same way, and then use Enumerable.Except along with Enumerable.Any, like this:
List<string> stringsWithPipes = new List<string>();
stringsWithPipes.Add("beginning|middle|middle2|end");
stringsWithPipes.Add("beginning|middle|middle3|end");
stringsWithPipes.Add("beginning|middle2|middle|end");
var array = stringsWithPipes.Select(r => r.Split('|')).ToArray();
string str = "beginning|middle2|middle|end";
var compareArray = str.Split('|');
foreach (var subArray in array)
{
if (!subArray.Except(compareArray).Any())
{
//Exists
Console.WriteLine("Item exists");
break;
}
}
This can surely be optimized, but the above is one way to do it.
Try this instead:
if (stringsWithPipes.Any(p => p.Split('|')
    .All(k => "beginning|middle2|middle|end".Split('|')
        .Contains(k))))
Hope this helps!
You need to split on a delimiter:
var searchString = "beginning|middle|middle2|end";
var searchList = searchString.Split('|');
var stringsWithPipes = new List<string>();
stringsWithPipes.Add("beginning|middle|middle2|end");
...
return stringsWithPipes.Select(x => x.Split('|')).Any(x => Match(searchList,x));
Then you can implement Match in multiple ways.
First, it must contain all the search phrases but may include others:
bool Match(string[] search, string[] match) {
return search.All(x => match.Contains(x));
}
Or it must contain all the search phrases and cannot include others:
bool Match(string[] search, string[] match) {
return search.All(x => match.Contains(x)) && search.Length == match.Length;
}
That should work.
List<string> stringsWithPipes = new List<string>();
stringsWithPipes.Add("beginning|middle|middle2|end");
string[] stringToVerifyWith = "beginning|middle2|middle||end".Split(new[] { '|' },
StringSplitOptions.RemoveEmptyEntries);
if (stringsWithPipes.Any(s => !s.Split('|').Except(stringToVerifyWith).Any()))
{
return true;
}
The Split will remove any empty entries created by the double |. You then check what's left once every common element has been removed with the Except method. If there's nothing left (the ! [...] .Any() check; .Count() == 0 would be valid too), every element of the stored string also appears in the string you are verifying against.
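One thing to keep in mind with the Except checks above: they only test one direction, so a stored string whose segments are a strict subset of the other string's segments also passes. If you want "same segments, any order" in both directions, HashSet<T>.SetEquals does that directly. A minimal sketch (PipeStrings and SameSegments are made-up names; duplicates and empty segments are ignored):

```csharp
using System;
using System.Collections.Generic;

public static class PipeStrings
{
    private static readonly char[] Pipe = { '|' };

    // True when both strings contain the same set of segments, in any order.
    public static bool SameSegments(string left, string right)
    {
        // RemoveEmptyEntries drops the empty segment produced by a double "||"
        var leftSet = new HashSet<string>(left.Split(Pipe, StringSplitOptions.RemoveEmptyEntries));
        return leftSet.SetEquals(right.Split(Pipe, StringSplitOptions.RemoveEmptyEntries));
    }
}
```

You could then write stringsWithPipes.Any(s => PipeStrings.SameSegments(s, candidate)) for the lookup.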
So I want to use one of these LINQ functions with this List<string> I have.
Here's the setup:
List<string> all = FillList();
string temp = "something";
string found;
int index;
I want to find the string in all that matches temp when both are lower cased with ToLower(). Then I'll use the found string to find its index and remove it from the list.
How can I do this with LINQ?
I get the feeling that you don't care so much about comparing the lowercase versions as you do about just performing a case-insensitive match. If so:
var listEntry = all.Where(entry =>
string.Equals(entry, temp, StringComparison.CurrentCultureIgnoreCase))
.FirstOrDefault();
if (listEntry != null) all.Remove(listEntry);
OK, I see my imperative solution is not getting any love, so here is a LINQ solution that is probably less efficient, but still avoids searching through the list two times (which is a problem in the accepted answer):
var all = new List<string>(new [] { "aaa", "BBB", "Something", "ccc" });
const string temp = "something";
var found = all
.Select((element, index) => new {element, index})
.FirstOrDefault(pair => StringComparer.InvariantCultureIgnoreCase.Equals(temp, pair.element));
if (found != null)
all.RemoveAt(found.index);
You could also do this (which is probably more performant than the above, since it does not create new object for each element):
var index = all
.TakeWhile(element => !StringComparer.InvariantCultureIgnoreCase.Equals(temp, element))
.Count();
if (index < all.Count)
all.RemoveAt(index);
I want to add to previous answers... why don't you just do it like this :
string temp = "something";
List<string> all = FillList().Where(x => x.ToLower() != temp.ToLower()).ToList();
Then you have the list without those items in the first place.
all.Remove(all.FirstOrDefault(
s => s.Equals(temp,StringComparison.InvariantCultureIgnoreCase)));
Use the tool best suited for the job. In this case a simple piece of procedural code seems more appropriate than LINQ:
var all = new List<string>(new [] { "aaa", "BBB", "Something", "ccc" });
const string temp = "something";
var cmp = StringComparer.InvariantCultureIgnoreCase; // Or another comparer of your choosing.
for (int index = 0; index < all.Count; ++index) {
string found = all[index];
if (cmp.Equals(temp, found)) {
all.RemoveAt(index);
// Do whatever it is you want to do with 'found'.
break;
}
}
This is probably as fast as you can get, because:
Comparison is done in place - there is no creation of temporary uppercase (or lowercase) strings just for comparison purposes.
Element is searched only once (O(index)).
Element is removed in place without constructing a new list (O(all.Count-index)).
No delegates are used.
A plain for loop tends to be faster than foreach.
It can also be adapted fairly easily should you want to handle duplicates.
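For what it's worth, List<T> also has a built-in FindIndex method that wraps the same single pass. It is not LINQ and it does use a delegate, so it gives up the "no delegates" point above in exchange for brevity. A small sketch, with a made-up helper name and OrdinalIgnoreCase chosen only as an example comparison:

```csharp
using System;
using System.Collections.Generic;

public static class ListSketch
{
    // Removes the first case-insensitive match and returns it, or null if there is none.
    public static string RemoveFirstIgnoreCase(List<string> all, string temp)
    {
        // FindIndex returns -1 when no element satisfies the predicate
        int index = all.FindIndex(s => string.Equals(s, temp, StringComparison.OrdinalIgnoreCase));
        if (index < 0) return null;

        string found = all[index];
        all.RemoveAt(index);
        return found;
    }
}
```

This gives you both the found string and the removal in one search of the list.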
I'm writing a duplicate file detector. To determine if two files are duplicates I calculate a CRC32 checksum. Since this can be an expensive operation, I only want to calculate checksums for files that have another file with matching size. I have sorted my list of files by size, and am looping through to compare each element to the ones above and below it. Unfortunately, there is an issue at the beginning and end since there will be no previous or next file, respectively. I can fix this using if statements, but it feels clunky. Here is my code:
public void GetCRCs(List<DupInfo> dupInfos)
{
var crc = new Crc32();
for (int i = 0; i < dupInfos.Count(); i++)
{
if (dupInfos[i].Size == dupInfos[i - 1].Size || dupInfos[i].Size == dupInfos[i + 1].Size)
{
dupInfos[i].CheckSum = crc.ComputeChecksum(File.ReadAllBytes(dupInfos[i].FullName));
}
}
}
My question is:
How can I compare each entry to its neighbors without the out of bounds error?
Should I be using a loop for this, or is there a better LINQ or other function?
Note: I did not include the rest of my code to avoid clutter. If you want to see it, I can include it.
Compute the CRCs first:
// It is assumed that DupInfo.CheckSum is nullable and that dupInfos is sorted by size
public void GetCRCs(List<DupInfo> dupInfos)
{
    if (dupInfos.Count == 0) return;
    var crc = new Crc32();
    dupInfos[0].CheckSum = null;
    for (int i = 1; i < dupInfos.Count; i++)
    {
        dupInfos[i].CheckSum = null;
        if (dupInfos[i].Size == dupInfos[i - 1].Size)
        {
            // the previous file only needs a checksum if it does not have one yet
            if (dupInfos[i - 1].CheckSum == null)
                dupInfos[i - 1].CheckSum = crc.ComputeChecksum(File.ReadAllBytes(dupInfos[i - 1].FullName));
            dupInfos[i].CheckSum = crc.ComputeChecksum(File.ReadAllBytes(dupInfos[i].FullName));
        }
    }
}
After sorting your files by size and CRC, identify duplicates:
public void GetDuplicates(List<DupInfo> dupInfos)
{
    for (int i = dupInfos.Count - 1; i > 0; i--)
    { // loop runs backwards to allow deleting list items
        if (dupInfos[i].Size == dupInfos[i - 1].Size &&
            dupInfos[i].CheckSum != null &&
            dupInfos[i].CheckSum == dupInfos[i - 1].CheckSum)
        { // i is a duplicate of i-1
            ... // your code here
            ... // optionally, dupInfos.RemoveAt(i);
        }
    }
}
I have sorted my list of files by size, and am looping through to
compare each element to the ones above and below it.
The next logical step is to actually group your files by size. Comparing consecutive files will not always be sufficient if you have more than two files of the same size. Instead, you will need to compare every file to every other same-sized file.
I suggest taking this approach
Use LINQ's .GroupBy to group the files by size. Then use .Where to keep only the groups with more than one file.
Within those groups, calculate the CRC32 checksum and add it to a collection of known checksums. Compare with previously calculated checksums. If you need to know which files specifically are duplicates, you could use a dictionary keyed by this checksum (you can achieve this with another GroupBy). Otherwise a simple list will suffice to detect any duplicates.
The code might look something like this:
var filesSetsWithPossibleDupes = files.GroupBy(f => f.Length)
.Where(group => group.Count() > 1);
foreach (var grp in filesSetsWithPossibleDupes)
{
var checksums = new List<CRC32CheckSum>(); //or whatever type
foreach (var file in grp)
{
var currentCheckSum = crc.ComputeChecksum(file);
if (checksums.Contains(currentCheckSum))
{
//Found a duplicate
}
else
{
checksums.Add(currentCheckSum);
}
}
}
Or, if you need the specific objects that could be duplicates, the code might look like this:
var filesSetsWithPossibleDupes = files.GroupBy(f => f.FileSize)
.Where(grp => grp.Count() > 1);
var masterDuplicateDict = new Dictionary<DupStats, IEnumerable<DupInfo>>();
//A dictionary keyed by the basic duplicate stats
//, and whose value is a collection of the possible duplicates
foreach (var grp in filesSetsWithPossibleDupes)
{
var likelyDuplicates = grp.GroupBy(dup => dup.Checksum)
.Where(g => g.Count() > 1);
//Same GroupBy logic, but applied to the checksum (instead of file size)
foreach(var dupGrp in likelyDuplicates)
{
//Create the key for the dictionary (your code is likely different)
var sample = dupGrp.First();
var key = new DupStats() {FileSize = sample.FileSize, Checksum = sample.Checksum};
masterDuplicateDict.Add(key, dupGrp);
}
}
A demo of this idea.
I think the for loop should be : for (int i = 1; i < dupInfos.Count()-1; i++)
var grps= dupInfos.GroupBy(d=>d.Size);
grps.Where(g=>g.Count()>1).ToList().ForEach(g=>
{
...
});
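Note that starting the loop at 1 and stopping at Count()-1 avoids the exception but also never looks at the first and last entries, so a duplicate pair at either end of the sorted list would be missed. If you want to keep the neighbor comparison, you can guard each side instead. A minimal sketch working on plain sizes (NeighborCheck and its method are made-up names, and the input is assumed to be sorted by size already):

```csharp
using System;
using System.Collections.Generic;

public static class NeighborCheck
{
    // For a list sorted by size, returns the indexes whose size equals a neighbor's size.
    public static List<int> IndexesWithSameSizedNeighbor(IReadOnlyList<long> sortedSizes)
    {
        var result = new List<int>();
        for (int i = 0; i < sortedSizes.Count; i++)
        {
            // Short-circuiting && means the index is only used when it is in range
            bool matchesPrevious = i > 0 && sortedSizes[i] == sortedSizes[i - 1];
            bool matchesNext = i < sortedSizes.Count - 1 && sortedSizes[i] == sortedSizes[i + 1];
            if (matchesPrevious || matchesNext) result.Add(i);
        }
        return result;
    }
}
```

The indexes it returns are the files worth computing a checksum for; the GroupBy version does the same job with less bookkeeping.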
Can you do an intersection of your two lists? If you have two lists of file names, LINQ's Intersect should give you only the overlapping files (a union would give you every distinct file from both lists). I can write out an example if you want, but this link should give you the general idea.
https://stackoverflow.com/a/13505715/1856992
Edit: Sorry for some reason I thought you were comparing file name not size.
So here is an actual answer for you.
using System;
using System.Collections.Generic;
using System.Linq;
public class ObjectWithSize
{
public int Size {get; set;}
public ObjectWithSize(int size)
{
Size = size;
}
}
public class Program
{
public static void Main()
{
Console.WriteLine("start");
var list = new List<ObjectWithSize>();
list.Add(new ObjectWithSize(12));
list.Add(new ObjectWithSize(13));
list.Add(new ObjectWithSize(14));
list.Add(new ObjectWithSize(14));
list.Add(new ObjectWithSize(18));
list.Add(new ObjectWithSize(15));
list.Add(new ObjectWithSize(15));
var duplicates = list.GroupBy(x=>x.Size)
.Where(g=>g.Count()>1);
foreach (var dup in duplicates)
foreach (var objWithSize in dup)
Console.WriteLine(objWithSize.Size);
}
}
This will print out
14
14
15
15
Here is a .NET Fiddle for that.
https://dotnetfiddle.net/0ub6Bs
Final note. I actually think your answer looks better and will run faster. This was just an implementation in Linq.