I need to get all files with prefix 009 from a server path.
But my code retrieves all files with the 0000 prefix, not only the ones that start with 009.
For example, I have files "000028447_ ghf.doc","0000316647 abcf.doc","009028447_ test2.doc","abcd.doc".
string [] files =Directory.GetFiles(filePath,"009*.doc)
returns all the files except "abcd.doc", but I only want "009028447_ test2.doc".
If I use Directory.GetFiles(filePath,"ab*.doc"), it retrieves "abcd.doc" and works fine. But when I try a pattern like "009" or "00002", it doesn't work as expected.
Your code snippet is missing a closing quote-character in the pattern. The code should be:
string[] files = Directory.GetFiles(filePath, "009*.doc");
Other than that, it seems to work as intended. I tested this by creating a folder containing the files you mention in the question.
Next I created a console application, which uses your code to find the files, and prints all the results to the console. The output is the expected result:
C:\testfolder\009028447_ test2.doc
Here is the entire code for the console application:
using System;
using System.IO;
class Program
{
static void Main(string[] args)
{
string filePath = @"C:\testfolder";
string[] files = Directory.GetFiles(filePath, "009*.doc");
// Creates a string with all the elements of the array, separated by ", "
string matchingFiles = string.Join(", ", files);
Console.WriteLine(matchingFiles);
// Since there is only one matching file, the above line only prints:
// C:\testfolder\009028447_ test2.doc
}
}
In conclusion, the code works. If you are getting other results, there must be other differences in your setup or code that you haven't mentioned.
If it really is true that you are getting the wrong files (I didn't check this), you could use a foreach loop or LINQ to check whether each file matches your criteria:
Foreach:
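List<string> arrPaths = new List<string>();
// The pattern needs the * wildcard; ".doc" alone only matches a file literally named ".doc"
foreach (string strPath in Directory.GetFiles(filePath, "*.doc"))
{
    // GetFiles returns full paths, so test the file name, not the whole path
    if (strPath.EndsWith(".doc") && Path.GetFileName(strPath).StartsWith("009"))
        arrPaths.Add(strPath);
}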
Linq:
List<string> arrPaths = Directory.GetFiles(filePath, "*.doc").Where(p => Path.GetFileName(p).StartsWith("009") && p.EndsWith(".doc")).ToList();
Both approaches are more of a workaround than a real solution, but I hope they help. :)
EDIT
If you only want the file names, I would remove the filePath prefix from strPath like this:
Foreach:
arrPaths.Add(strPath.Replace(filePath + "\\", ""));
Linq:
List<string> arrPaths = Directory.GetFiles(filePath, "*.doc").Where(p => Path.GetFileName(p).StartsWith("009") && p.EndsWith(".doc")).Select(p => p.Replace(filePath + "\\", "")).ToList();
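Directory.GetFiles returns full paths, so any StartsWith check has to be applied to the file-name part only. A minimal sketch of the same filter as a reusable helper, assuming the input comes from Directory.GetFiles; the class and method names (PrefixFilter, MatchingFileNames) are hypothetical, and the case-insensitive comparison is an assumption:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class PrefixFilter
{
    // Returns only the file names (not full paths) that start with 'prefix'
    // and end with 'extension'. Case-insensitive comparison is an assumption
    // that mirrors how Windows treats file names.
    public static List<string> MatchingFileNames(
        IEnumerable<string> paths, string prefix, string extension)
    {
        return paths
            .Select(p => Path.GetFileName(p))   // drop the directory part first
            .Where(name => name.StartsWith(prefix, StringComparison.OrdinalIgnoreCase)
                        && name.EndsWith(extension, StringComparison.OrdinalIgnoreCase))
            .ToList();
    }
}
```

For example: PrefixFilter.MatchingFileNames(Directory.GetFiles(filePath, "*.doc"), "009", ".doc").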
Related
I have a huge directory from which I need to retrieve files, including files in subdirectories.
The folders contain various files, but I am only interested in specific proprietary files whose extension is exactly 7 digits long.
For example, I have a folder that contains the following files:
abc.txt
def.txt
GIWFJ1XA.0201000
GIWFJ1UC.0501000
NOOBO0XA.0100100
summary.pdf
someinfo.zip
T7F4JUXA.0300600
vxy98796.csv
YJHLPLBO.0302300
YJHLPLUC.0302800
I have tried the following:
var fileList = Directory.GetFiles(someDir, "*.???????", SearchOption.AllDirectories);
and also
string searchSting = string.Empty;
for (int j = 0; j < 9999999; j++)
{
searchSting += string.Format(", *.{0} ", j.ToString("0000000"));
}
var fileList2 = Directory.GetFiles(someDir, searchSting, SearchOption.AllDirectories);
which fails because the string is obviously too long.
I only want to return the files whose extension has the specified length, in this case 7 digits, to avoid having to loop over the thousands of files I would otherwise have to process.
I have considered creating a variable string for the search criteria that would contain all 99,999,999 possible digits but d
How can I accomplish this?
I don't believe there's a way to do this without looping through the files in the directory and its subfolders. The search pattern for GetFiles doesn't support regular expressions, so we can't use something like [\d]{7} as a filter. I would suggest using Directory.EnumerateFiles and then returning the files that match your criteria.
You can use this to enumerate the files:
using System.Collections.Generic;
using System.IO;
using System.Linq;
...
private static IEnumerable<string> GetProprietaryFiles(string topDirectory)
{
Func<string, bool> filter = f =>
{
string extension = Path.GetExtension(f);
// is 8 characters long including the .
// all remaining characters are digits
return extension.Length == 8 && extension.Skip(1).All(char.IsDigit);
};
// EnumerateFiles allows us to step through the files without
// loading all of the filenames into memory at once.
IEnumerable<string> matchingFiles =
Directory.EnumerateFiles(topDirectory, "*", SearchOption.AllDirectories)
.Where(filter);
// Return each file as the enumerable is iterated
foreach (var file in matchingFiles)
{
yield return file;
}
}
Path.GetExtension includes the . so we check that the number of characters including the . is 8, and that all remaining characters are digits.
Usage:
List<string> fileList = GetProprietaryFiles(someDir).ToList();
I would just get the list of files in the directory and then check whether the part after the last '.' is 7 characters long. (This assumes that no other files have an extension of that length.)
EDITED to use Path instead:
Directory.GetFiles(@"C:\temp").Where(
fileName => Path.GetExtension(fileName).Length == 8
).ToList();
OLD:
Directory.GetFiles(someDir).Where(
fileName => fileName.Substring(fileName.LastIndexOf('.') + 1).Length == 7
).ToList();
Here, files stands in for the result of Directory.GetFiles().
using System;
using System.Collections.Generic;
using System.Linq;
using System.IO;
using System.Text.RegularExpressions;
public class Program
{
public static void Main()
{
List<string> files = new List<string>()
{"abc.txt", "def.txt", "GIWFJ1XA.0201000", "GIWFJ1UC.0501000", "NOOBO0XA.0100100", "summary.pdf", "someinfo.zip", "T7F4JUXA.0300600", "vxy98796.csv", "YJHLPLBO.0302300", "YJHLPLUC.0302800"};
Regex r = new Regex("^\\.\\d{7}$");
foreach (string file in files.Where(o => r.IsMatch(Path.GetExtension(o))))
{
Console.WriteLine(file);
}
}
}
Output:
GIWFJ1XA.0201000
GIWFJ1UC.0501000
NOOBO0XA.0100100
T7F4JUXA.0300600
YJHLPLBO.0302300
YJHLPLUC.0302800
Edit: I tried passing r.IsMatch directly instead of the lambda with o, but the dotnetfiddle compiler gives me this error:
Compilation error (line 14, col 27): The call is ambiguous between the following methods or properties: 'System.Linq.Enumerable.Where<string>(System.Collections.Generic.IEnumerable<string>, System.Func<string,bool>)' and 'System.Linq.Enumerable.Where<string>(System.Collections.Generic.IEnumerable<string>, System.Func<string,int,bool>)'
I can't debug it right now because I'm busy; I'd be happy if anyone passing by could suggest a fix. The code above works as it is, though.
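The error is most likely caused by the method group itself: Regex has both an IsMatch(string) and an IsMatch(string, int) overload, so r.IsMatch fits both the Func<string, bool> and the Func<string, int, bool> overloads of Where. Wrapping the call in a lambda, as the code above does, removes the ambiguity, and so does assigning the method group to an explicitly typed delegate first. Note that passing r.IsMatch directly tests the whole file name rather than its extension, so the pattern must then be anchored only at the end. A sketch under those assumptions (the class name is hypothetical):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ExtensionRegexFilter
{
    // Anchored only at the end, so it can be applied to the full file name:
    // a dot followed by exactly seven digits, with nothing after them.
    private static readonly Regex SevenDigitExtension = new Regex(@"\.\d{7}$");

    public static List<string> Filter(IEnumerable<string> files)
    {
        // An explicitly typed Func<string, bool> selects the IsMatch(string)
        // overload, so Where no longer has two equally good candidates.
        Func<string, bool> isMatch = SevenDigitExtension.IsMatch;
        return files.Where(isMatch).ToList();
    }
}
```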
I have a text file with several lines and a list of approved characters that can be used. If there are any characters in a line that are not on the approved list, the entire line needs to be deleted.
How can I do this? C# would be ideal, but Python, PowerShell or JS would be helpful as well.
Example approved characters: abcdefg
Valid: abc
Invalid: abc1
For my program I want the following list of approved characters:
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890#^,;.
After filtering the contents, I want to write them back to the file (without the invalid lines).
Here's a program that filters out all lines containing invalid characters, where args[0] is the input file and args[1] is the output file.
using System.IO;
using System.Linq;
using System.Threading.Tasks;
class Program
{
    public static async Task Main(string[] args)
    {
        const string AllowedChars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890#^,;.";
        // ReadAllLines returns one string per line; ReadAllText would return a single string
        var lines = File.ReadAllLines(args[0]);
        using StreamWriter outfile = new(args[1]);
        foreach (string line in lines)
            if (line.All(x => AllowedChars.Contains(x)))
                await outfile.WriteLineAsync(line);
    }
}
You can try using Linq in order to query the file:
using System.IO;
using System.Linq;
...
// HashSet<T> is more efficient than List<T> for Contains: O(1) vs. O(N)
HashSet<char> allowed = new HashSet<char>(
"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890#^,;."
);
string fileName = @"c:\MyFile.txt";
var clearedLines = File
.ReadLines(fileName)
.Where(line => line.All(letter => allowed.Contains(letter)))
.ToArray(); // Since we have to write back, we have to materialize the data
File.WriteAllLines(fileName, clearedLines);
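Materializing the lines with ToArray() is fine for small files. For a file too large to hold in memory, one alternative is to stream the valid lines into a temporary file and then replace the original with it. A minimal sketch, assuming .NET Core 3.0 or later for the File.Move overload with an overwrite flag; the class name, method name and the ".tmp" naming are hypothetical:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class LineFilter
{
    // Streams the file line by line, keeps only lines made entirely of
    // allowed characters, then swaps the filtered copy in place of the original.
    public static void RemoveInvalidLines(string fileName, string allowedChars)
    {
        var allowed = new HashSet<char>(allowedChars);
        string tempFile = fileName + ".tmp";   // hypothetical temp-file naming

        // ReadLines is lazy, so only one line is held in memory at a time.
        File.WriteAllLines(
            tempFile,
            File.ReadLines(fileName).Where(line => line.All(c => allowed.Contains(c))));

        // Overwrites the original file with the filtered copy.
        File.Move(tempFile, fileName, overwrite: true);
    }
}
```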
I have a folder location corresponding to the variable "path".
In this folder I have a lot of files, but only one called "common.build.9897ytyt4541.js". I want to read the content of this file, and the following works:
string text = File.ReadAllText(Path.Combine(path, "common.build.9897ytyt4541.js"));
The problem is that the part between "build" and "js" changes at each compilation of the source code, because a new hash is generated. I would like to replace the code above with something that works after every build, whatever the hash is. I thought of using a regex, but this doesn't work:
string text = File.ReadAllText(Path.Combine(path, @"common.build.*.js"));
Thanks in advance for your help
If you know you'll only find one file you can write something like this (plus error handling):
using System.Linq;
...
var filePath = Directory.GetFiles(path, "common.build.*.js").FirstOrDefault();
string text = File.ReadAllText(filePath);
No, File.ReadAllText needs the exact file name. Instead, you need to search for the file first, for example with Directory.GetFiles:
var matches = Directory.GetFiles(path, "common.build.*.js");
if (matches.Length == 0)
{
//File not found
}
else if (matches.Length > 1)
{
//Multiple matches found
}
else
{
string text = File.ReadAllText(matches[0]);
}
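Both answers boil down to the same steps: search with a wildcard pattern, check how many files matched, then read the single match. A small sketch of that as a reusable helper, assuming that zero or multiple matches should be treated as errors; the class name, method name and exception choices are hypothetical:

```csharp
using System.IO;

public static class SingleFileReader
{
    // Reads the only file in 'directory' matching 'searchPattern',
    // e.g. "common.build.*.js". Throws if there is not exactly one match.
    public static string ReadSingleMatch(string directory, string searchPattern)
    {
        string[] matches = Directory.GetFiles(directory, searchPattern);

        if (matches.Length == 0)
            throw new FileNotFoundException(
                $"No file matching '{searchPattern}' in '{directory}'.");
        if (matches.Length > 1)
            throw new IOException(
                $"{matches.Length} files match '{searchPattern}' in '{directory}'.");

        return File.ReadAllText(matches[0]);
    }
}
```

Usage: string text = SingleFileReader.ReadSingleMatch(path, "common.build.*.js");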
I want to add data to a text file based on a specific output: the program reads an XML file and writes certain lines to a text file. If the data is already in the text file, I don't want to write it again.
Code:
public void output(string folder)
{
string S = "Data" + DateTime.Now.ToString("yyyyMMddHHmm") + ".xml";
//Trades.Save(S);
string path = Path.Combine(folder, S);
Console.WriteLine(path);
XDocument f = new XDocument(Trades);
f.Save(path);
string[] lines = File.ReadAllLines(path);
File.WriteAllLines(path, lines);
using (System.IO.StreamWriter file = new System.IO.StreamWriter(@"H:\Test" + DateTime.Now.ToString("yyMMdd") + ".txt", true))
{
foreach (string line in lines)
{
if (line.Contains("CertainData"))
{
file.WriteLine(line);
if (File.ReadAllLines(path).Any(x => x.Equals(line)))
{
}
else
{
string[] tradeRefLines = File.ReadAllLines(path);
File.WriteAllLines(path, tradeRefLines); ;
}
}
}
}
}
The problem is that it still writes the line even if exactly the same data already exists elsewhere. I don't want duplicate lines.
Any advice?
CLARIFICATION UPDATE
The "CertainData" is a reference number
I have a bunch of files containing data, and the piece I want to separate out and put into a text file is the "CertainData" field, which holds a reference number.
Sometimes the files I receive contain the same formatted information, with "CertainData" appearing in them for reference.
When I run this program, if my text file already contains the "CertainData" reference number, I don't want it to be written again.
If you need any more clarification, let me know and I will update the post.
I think you want this: read all lines, filter out those containing a keyword and write it to a new file.
var lines = File.ReadAllLines(path).ToList();
var filteredLines = lines.Where(line => !line.Contains("CertainData"));
File.WriteAllLines(path, filteredLines);
If you also want to remove duplicate lines, you can add a distinct like this:
filteredLines = filteredLines.Distinct();
Why not call Distinct before the foreach loop? That filters your lines before they are written to the file.
Try something like this:
string[] lines = new string[] { "a", "b", "c", "a" };
string[] filterLines = lines.Distinct().ToArray<string>();
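Distinct only removes duplicates within the lines of the current input. Since the question also wants to skip reference numbers that are already in the daily text file, one option is to load the existing lines into a HashSet first and append only the lines it doesn't contain yet. A minimal sketch, assuming a whole-line comparison is enough to recognise the same "CertainData" reference; the class and method names are hypothetical:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class ReferenceAppender
{
    // Appends lines from 'sourceLines' that contain 'marker' to 'outputFile',
    // skipping any line already present in the file or earlier in this batch.
    // Returns the number of lines actually appended.
    public static int AppendNewLines(IEnumerable<string> sourceLines, string marker, string outputFile)
    {
        // Lines written on previous runs.
        var seen = File.Exists(outputFile)
            ? new HashSet<string>(File.ReadLines(outputFile))
            : new HashSet<string>();

        // HashSet.Add returns false for a line it already holds,
        // so this also removes duplicates within the current batch.
        List<string> newLines = sourceLines
            .Where(line => line.Contains(marker))
            .Where(line => seen.Add(line))
            .ToList();

        File.AppendAllLines(outputFile, newLines);
        return newLines.Count;
    }
}
```

In the question's code this would replace the StreamWriter loop, e.g. ReferenceAppender.AppendNewLines(lines, "CertainData", outputPath).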
I'm trying to rename files that my program lists as having "illegal characters" for a SharePoint file importation. The illegal characters I am referring to are: ~ # % & * {} / \ | : <> ? - ""
What I'm trying to do is recurse through the drive, gather a list of file names, and then use regular expressions to pick out file names from the list and replace the invalid characters in the actual file names.
Does anybody have an idea how to do this? So far I have this (please remember, I'm a complete n00b at this stuff):
class Program
{
static void Main(string[] args)
{
string[] files = Directory.GetFiles(@"C:\Documents and Settings\bob.smith\Desktop\~Test Folder for [SharePoint] %testing", "*.*", SearchOption.AllDirectories);
foreach (string file in files)
{
Console.Write(file + "\r\n");
}
Console.WriteLine("Press any key to continue...");
Console.ReadKey(true);
string pattern = " *[\\~#%&*{}/:<>?|\"-]+ *";
string replacement = " ";
Regex regEx = new Regex(pattern);
string[] fileDrive = Directory.GetFiles(@"C:\Documents and Settings\bob.smith\Desktop\~Test Folder for [SharePoint] %testing", "*.*", SearchOption.AllDirectories);
StreamWriter sw = new StreamWriter(@"C:\Documents and Settings\bob.smith\Desktop\~Test Folder for [SharePoint] %testing\File_Renames.txt");
foreach(string fileNames in fileDrive)
{
string sanitized = regEx.Replace(fileNames, replacement);
sw.Write(sanitized + "\r\n");
}
sw.Close();
}
}
So what I need to figure out is how to recursively search for these invalid characters and replace them in the actual file names. Does anybody have any ideas?
When you are working recursively with files and directories, it's often easier to use the DirectoryInfo class and its members instead of the static methods. There is a pre-built tree structure for you, so you don't have to manage that yourself.
GetDirectories returns more DirectoryInfo instances so you can walk the tree, while GetFiles returns FileInfo objects.
This guy created a custom iterator to recursively yield file info, which when you combine it with your existing regex work, would complete your solution.
File.Move() effectively renames files. Basically, you'll just need to
File.Move(fileNames, sanitized);
inside the latter loop.
ALERT: there may be duplicate file names, so you'll need a policy to avoid collisions, such as appending a counter to the end of the sanitized name. Also, add proper exception handling.
PS: You certainly don't need to search for characters like :, \ and *, since Windows doesn't allow them in file names anyway.
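Because the regex in the question runs over the full path, calling File.Move(fileNames, sanitized) would also rewrite the directory part. A sketch that sanitizes only the file name and appends a counter when the target already exists, following the duplicate-name alert above; the character class, the "_" replacement, the " (n)" counter format and the helper names are all assumptions:

```csharp
using System.IO;
using System.Text.RegularExpressions;

public static class SharePointRenamer
{
    // Characters from the question's list. ':' '\' '/' '*' '?' '"' '<' '>' '|'
    // cannot appear in Windows file names, so matching them is redundant but harmless.
    private static readonly Regex Illegal = new Regex(@"[~#%&*{}\\/|:<>?""-]+");

    // Replaces each run of illegal characters in a bare file name with "_".
    public static string SanitizeFileName(string fileName) =>
        Illegal.Replace(fileName, "_");

    // Renames one file in place, keeping its directory, and returns the new path.
    public static string RenameFile(string path)
    {
        string directory = Path.GetDirectoryName(path) ?? "";
        string original = Path.GetFileName(path);
        string name = SanitizeFileName(original);
        if (name == original)
            return path;   // nothing to change

        string stem = Path.GetFileNameWithoutExtension(name);
        string extension = Path.GetExtension(name);
        string target = Path.Combine(directory, name);

        // Duplicate-name policy: append " (1)", " (2)", ... until the name is free.
        for (int i = 1; File.Exists(target); i++)
            target = Path.Combine(directory, $"{stem} ({i}){extension}");

        File.Move(path, target);
        return target;
    }
}
```

Calling RenameFile for each path in fileDrive would take the place of the sw.Write loop. Folder names such as "~Test Folder for [SharePoint] %testing" are not renamed by this sketch; they would need the same treatment, deepest directories first.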