File sequence > Find Name Pattern - c#

I'm trying to figure out a solid way to solve multiple types of file sequences.
Consider these sequences
file_0000.jpg
file_0001.jpg
file_0002.jpg etc
&
new1File001.jpg
new1File002.jpg
new1File003.jpg
So it needs to find out where the first digit of the sequence number starts.
FileInfo[] files = new DirectoryInfo(@"\\fileserver\").GetFiles("*.*", SearchOption.AllDirectories);
var grouped = files.OrderBy(f => f.Name).GroupBy(f => f.Name.Substring(0, f.Name.LastIndexOf("_")));
Obviously this only finds file sequences where the sequence number is separated by "_". I want to group by the position of the first digit of the last run of digits instead. My regex skills are not good, and even then I don't know how to use a regex in the lambda expression.
The main question is: how can I find out where the number string starts in the cases above?
Any pointers would be great!
Thanks,
-Johan

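Yes, regex to the rescue:
// Capture everything before the last run of two or more digits that precedes the extension.
var r = new Regex(@"^(.*?)\d{2,}\.[^.]*$");
var grouped = files
    .OrderBy(f => f.Name)
    // Groups[0] is the whole match; Groups[1] is the captured prefix shared by a sequence.
    .GroupBy(f => r.Match(f.Name).Groups[1].Value);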
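
The question's main point, where the number string starts, can be read off a regex match through Group.Index. What follows is a minimal sketch, assuming the sequence number is the last run of digits in the file name before its extension and that a bare file name (no directory) is passed in. The names SequenceName, NumberStart and Prefix are hypothetical and do not come from the original answer.

```csharp
using System.IO;
using System.Text.RegularExpressions;

public static class SequenceName
{
    // Lazy prefix, then the run of digits that ends the name (extension already removed).
    private static readonly Regex Trailing =
        new Regex(@"^(?<prefix>.*?)(?<number>\d+)$");

    // Index of the first digit of the trailing number, or -1 if the name has none.
    public static int NumberStart(string fileName)
    {
        Match m = Trailing.Match(Path.GetFileNameWithoutExtension(fileName));
        return m.Success ? m.Groups["number"].Index : -1;
    }

    // Text before the trailing number; the whole stem if there is no number.
    public static string Prefix(string fileName)
    {
        string stem = Path.GetFileNameWithoutExtension(fileName);
        Match m = Trailing.Match(stem);
        return m.Success ? m.Groups["prefix"].Value : stem;
    }
}
```

With these helpers the grouping could be written as files.GroupBy(f => SequenceName.Prefix(f.Name)). Stripping the extension first keeps digits inside an extension (for example ".mp4") from being mistaken for a sequence number.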

Related

How to extract preceding strings in a given directory

The folder names are variable but I have this constant value in the directory - the "distributions" folder.
How can I extract all the strings before the "distributions" folder?
> /<root>/win/<usr>/distributions/<dbms>/<repository>/<port type>/<remote system>/<port>
Currently I'm doing it the lengthy way (e.g. getting the length of the whole directory, finding the location of the word "distributions" in the string, etc.).
I'm looking for a more elegant way. Could this be done using Regex, or a shorter version of my current implementation?

string.Split followed by TakeWhile can help you:
var resultArray = str.Split(new[] { @"/" }, StringSplitOptions.RemoveEmptyEntries)
    .TakeWhile(x => !x.Equals("distributions"));
Output
<root>
win
<usr>
Update based on comments
If you need the entire path before "distributions", you can use:
var result = str.Split(new[] { @"distributions" }, StringSplitOptions.RemoveEmptyEntries)
    .First();
Output
/<root>/win/<usr>/

string.Split('/') will put each component of the path (or any string) into an array, splitting on the delimiter (/ here). You could then loop through it.

Assuming you want the path up to that point, I would recommend a regex. Here is how I would do it:
Regex regex = new Regex(@".+?(?=distributions)");
Debug.WriteLine(regex.Match("/<root>/win/<usr>/distributions/<dbms>/<repository>").Value);
This outputs
/<root>/win/<usr>/

What is the problem with the good old way?
var s = "/<root>/win/<usr>/distributions/<dbms>/<repository>/<port.....";
var result = s.Substring(0, s.IndexOf("distributions"));
or s.Substring(0, s.IndexOf("/distributions/") + 1) if that text might appear in another form too...
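
One thing to watch with the IndexOf approach: IndexOf returns -1 when the marker is missing, and Substring(0, -1) then throws an ArgumentOutOfRangeException. A small sketch that guards against that case; the PathPrefix class and its Before method are hypothetical names used only for illustration:

```csharp
using System;

public static class PathPrefix
{
    // Returns the part of path that precedes marker, or null if marker does not occur.
    public static string Before(string path, string marker)
    {
        int index = path.IndexOf(marker, StringComparison.Ordinal);
        return index < 0 ? null : path.Substring(0, index);
    }
}
```

For example, PathPrefix.Before(s, "distributions") gives the same result as the Substring call above when the folder is present, and null instead of an exception when it is not.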

Reading in a text file more 'intelligently'

I have a text file which contains a list of alphabetically organized variables with their variable numbers next to them formatted something like follows:
aabcdef 208
abcdefghijk 1191
bcdefga 7
cdefgab 12
defgab 100
efgabcd 999
fgabc 86
gabcdef 9
h 11
ijk 80
...
...
I would like to read each text as a string and keep its designated ID number, e.g. read "aabcdef" and store it in an array at index 208.
The 2 issues I'm running into are:
1. I've never read from a file in C#. Is there a way to read, say, from the start of a line to whitespace as a string, and then the next string as an int until the end of the line?
2. Given the nature and size of these files, I do not know the highest ID value in each file (not all numbers are used, so some files could contain a number like 3000 but only list 200 variables). So how could I store these variables flexibly when I don't know how big the array/list/stack/etc. would need to be?

Basically you need a Dictionary instead of an array or list. You can read all lines with the File.ReadLines method, then split each of them on space and \t (tab), like this:
var values = File.ReadLines("path")
    .Select(line => line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries))
    .ToDictionary(parts => int.Parse(parts[1]), parts => parts[0]);
Then values[208] will give you aabcdef. It looks like an array, doesn't it? :)
Also make sure you have no duplicate numbers: Dictionary keys must be unique, and ToDictionary throws an ArgumentException on a duplicate key.
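
If the file may contain blank lines, filler lines such as "..." or repeated IDs, a more tolerant variant could look like the sketch below. It assumes that malformed lines should be skipped and that a later duplicate should overwrite an earlier one; both are policy choices, not something the question specifies. VariableTable and Parse are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class VariableTable
{
    // Parses lines of the form "name <spaces or tabs> id" into an id -> name map.
    public static Dictionary<int, string> Parse(IEnumerable<string> lines)
    {
        var table = new Dictionary<int, string>();
        foreach (string line in lines)
        {
            string[] parts = line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
            int id;
            if (parts.Length != 2 || !int.TryParse(parts[1], out id))
                continue; // skip blank or malformed lines such as "..."
            table[id] = parts[0]; // the indexer overwrites instead of throwing on a duplicate id
        }
        return table;
    }
}
```

It can be fed directly from the file, e.g. VariableTable.Parse(File.ReadLines("path")).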

I've been thinking about how to improve the other answers, and I've found this alternative solution based on Regex, which makes searching the whole string (whether or not it comes from a file) safer.
Note that you can alter the regular expression to include other separators. The sample expression detects spaces and tabs.
At the end of the day, I found that a MatchCollection gives a safer result: you always know that Groups[2] is an integer and Groups[1] is text, because the regular expression does a lot of checking for you!
StringBuilder builder = new StringBuilder();
builder.AppendLine("djdodjodo\t\t3893983");
builder.AppendLine("dddfddffd\t\t233");
builder.AppendLine("djdodjodo\t\t39838");
builder.AppendLine("djdodjodo\t\t12");
builder.AppendLine("djdodjodo\t\t444");
builder.AppendLine("djdodjodo\t\t5683");
builder.Append("djdodjodo\t\t33");
// Replace this line with calling File.ReadAllText to read a file!
string text = builder.ToString();
MatchCollection matches = Regex.Matches(text, @"([^\s^\t]+)(?:[\s\t])+([0-9]+)", RegexOptions.IgnoreCase | RegexOptions.Multiline);
// Here's the magic: we convert an IEnumerable<Match> into a dictionary!
// Because the regex matched digits only, int.Parse will not fail on
// non-numeric input.
IDictionary<int, string> lines = matches.Cast<Match>()
    .ToDictionary(match => int.Parse(match.Groups[2].Value), match => match.Groups[1].Value);
// Now you can access your lines as follows:
string value = lines[33]; // <-- By key
Update:
As we discussed in chat, this solution wasn't working in an actual use case you showed me. The approach itself is fine; the problem is your particular case, where keys look like "[something].[something]" (for example: address.Name).
I've changed the regular expression to ([\w\.]+)[\s\t]+([0-9]+) so it covers keys containing a dot.
It's about improving the matching regular expression to fit your requirements! ;)
Update 2:
Since you told me that keys may contain any character, I've changed the regular expression to ([^\s^\t]+)(?:[\s\t])+([0-9]+).
Now the key is anything except spaces and tabs.
Update 3:
Also, I see you're stuck on .NET 3.0, and ToDictionary was introduced in .NET 3.5. To get the same approach in .NET 3.0, replace ToDictionary(...) with:
Dictionary<int, string> lines = new Dictionary<int, string>();
foreach (Match match in matches)
{
    lines.Add(int.Parse(match.Groups[2].Value), match.Groups[1].Value);
}
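
Two details of the pattern above are worth knowing: \s already includes the tab character, and a ^ that is not the first character inside [...] is a literal caret, so [^\s^\t] also excludes "^" from keys. A possible simplification, sketched under the assumption that each pair sits on its own line, anchors the pattern per line and uses \S for the key. KeyValueLines and Parse are hypothetical names. Note that int.Parse can still throw an OverflowException if a number does not fit in an int.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class KeyValueLines
{
    // One "key <spaces/tabs> number" pair per line; \S+ means "anything except whitespace".
    // The optional \r tolerates Windows line endings, since $ only matches before \n.
    private static readonly Regex Line =
        new Regex(@"^(\S+)[ \t]+([0-9]+)[ \t]*\r?$", RegexOptions.Multiline);

    public static Dictionary<int, string> Parse(string text)
    {
        var result = new Dictionary<int, string>();
        foreach (Match m in Line.Matches(text))
        {
            // A later duplicate number overwrites an earlier one instead of throwing.
            result[int.Parse(m.Groups[2].Value)] = m.Groups[1].Value;
        }
        return result;
    }
}
```

Lines that do not have exactly this shape are simply not matched, so they are ignored rather than partially parsed.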

What is the most efficient collection class in C# for string search

string[] words = System.IO.File.ReadAllLines("word.txt");
var query = from word in words
            where word.Length > "abe".Length && word.StartsWith("abe")
            select word;
foreach (var w in query.AsParallel())
{
    Console.WriteLine(w);
}
Basically the word.txt contains 170000 English words. Is there a collection class in C# that is faster than array of string for the above query? There will be no insert or delete, just search if a string starts with "abe" or "abdi".
Each word in the file is unique.
EDIT 1 This search will be performed potentially millions of times in my application. Also I want to stick with LINQ for collection query because I might need to use aggregate function.
EDIT 2 The words from the file are sorted already, the file will not change

Myself, I'd create a Dictionary<char, List<string>>, where I'd group words by their first letter. This substantially reduces the number of words each lookup has to scan.

If you need to search only once, there is nothing better than a linear search; an array is perfectly fine for that.
If you need to perform repeated searches, you can consider sorting the array (n log n), after which a search by any prefix is fast (log n). Depending on the type of search, a dictionary of string lists indexed by prefix may be another good option.

If you search much more often than you change the word file, you can sort the words each time the list changes. After that you can use binary search. You will need at most about 18 comparisons (log2 of 170,000) to find a word that matches your key, plus a few additional comparisons of its neighbors.
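
The binary search idea can be sketched as follows, assuming the array is sorted in ordinal order (if the file's order is culture-based, Array.Sort(words, StringComparer.Ordinal) once at startup would establish that). PrefixSearch and StartingWith are hypothetical names. The sketch finds the first word that is not less than the prefix and then walks forward while words still start with it.

```csharp
using System;
using System.Collections.Generic;

public static class PrefixSearch
{
    // Index of the first element that is >= key under ordinal comparison.
    private static int LowerBound(string[] sorted, string key)
    {
        int lo = 0, hi = sorted.Length;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (string.CompareOrdinal(sorted[mid], key) < 0) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    // Lazily yields every word that starts with prefix (including the prefix itself, if present).
    public static IEnumerable<string> StartingWith(string[] sorted, string prefix)
    {
        for (int i = LowerBound(sorted, prefix);
             i < sorted.Length && sorted[i].StartsWith(prefix, StringComparison.Ordinal);
             i++)
        {
            yield return sorted[i];
        }
    }
}
```

Because the result is an IEnumerable<string>, it still composes with LINQ, e.g. PrefixSearch.StartingWith(words, "abe").Where(w => w.Length > 3) reproduces the length condition from the question.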

Search file by filter with regex

I need to find files whose names begin with the characters "prft"; the files are named "prft0000.140", "prft2100.140", "prft1258.140", etc. I need to verify whether such files exist in a specific directory. I have this Regex to find them, but I don't know how to write the filter so that it matches.
List<string> prftFiles = (new DirectoryInfo(filePath))
    .GetFiles("*.*", SearchOption.AllDirectories)
    .Where(a => Regex.IsMatch(a.Name, "prft[^*]$"))
    .Select(fi => fi.Name)
    .ToList();
The pattern "prft[^*]$" does not work. What should it be?

Why not just do var prftFiles = new DirectoryInfo(filePath).GetFiles("prft*", SearchOption.AllDirectories);

This is a regex you could use:
string pattern = @"^(prft\d{4}\.\d{3})$";
You can also find the files with the * wildcard, as the others said. But if you want an exact match for the pattern "prft", 4 digits, ".", 3 digits, you should use the regex, because prft* will find any file whose name starts with prft.

You actually don't need a Regex here, as the Directory class can filter by a search pattern you supply.
Directory.GetFiles(@"C:\SomeDirectory", "prft*");
The * wildcard matches anything.
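
The two suggestions can also be combined: let the wildcard narrow the candidates on disk, then apply the stricter regex to the file names. This is only a sketch under the assumption that valid names are exactly "prft", four digits, a dot and three digits, as in the examples from the question. PrftFiles, IsPrftName and Find are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class PrftFiles
{
    // "prft" + 4 digits + "." + 3 digits, anchored at both ends.
    private static readonly Regex NamePattern = new Regex(@"^prft\d{4}\.\d{3}$");

    public static bool IsPrftName(string fileName)
    {
        return NamePattern.IsMatch(fileName);
    }

    // Names of all matching files under root, searched recursively.
    public static List<string> Find(string root)
    {
        return Directory.EnumerateFiles(root, "prft*", SearchOption.AllDirectories)
                        .Select(path => Path.GetFileName(path))
                        .Where(name => IsPrftName(name))
                        .ToList();
    }
}
```

To check only whether any such file exists, PrftFiles.Find(filePath).Any() would be enough.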

C# Regex: Get group names?

myRegex.GetGroupNames()
Seems to return the numbered groups as well... how do I get only the named ones?
A solution using the actual Match object would be fine as well.

Does using the RegexOptions.ExplicitCapture option when creating the regex do what you want? E.g.
Regex theRegex = new Regex(@"\b(?<word>\w+)\b", RegexOptions.ExplicitCapture);
From MSDN:
Specifies that the only valid captures are explicitly named or numbered groups of the form (?<name>...). This allows unnamed parentheses to act as noncapturing groups without the syntactic clumsiness of the expression (?:...).
So you won't need to worry about whether users of your API may or may not use non-capturing groups if this option is set.

See the other comments/answers about using (?:) and/or sticking with "one style". Here is my best approach that tries to directly solve the question:
var named = regex.GetGroupNames().Where(x => !Regex.IsMatch(x, "^\\d+$"));
However, this will fail for regular expressions like (?<42>...).
Happy coding.

public string[] GetGroupNames(Regex re)
{
    var groupNames = re.GetGroupNames();
    var groupNumbers = re.GetGroupNumbers();
    Contract.Assert(groupNames.Length == groupNumbers.Length);
    return Enumerable.Range(0, groupNames.Length)
        .Where(i => groupNames[i] != groupNumbers[i].ToString())
        .Select(i => groupNames[i])
        .ToArray();
}
Actually, this will still fail when the group name and number happen to be the same :\ But it will succeed even when the group name is a number, as long as the number is not the same as its index.
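
Group 0 (the whole match) is always reported by GetGroupNames, even with RegexOptions.ExplicitCapture, so some filtering is needed either way. A compact sketch that drops every all-digit name is shown below; it shares the limitation already mentioned above for explicitly numbered groups such as (?<42>...). GroupNameHelper and NamedGroups are hypothetical names.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class GroupNameHelper
{
    // Returns the group names that are not purely numeric ("0", "1", ...).
    public static string[] NamedGroups(Regex re)
    {
        return re.GetGroupNames()
                 .Where(name => !name.All(c => c >= '0' && c <= '9'))
                 .ToArray();
    }
}
```

For a regex such as (\d+)-(?<word>\w+), this returns only "word", with or without ExplicitCapture.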
