I have thousands of .log files and I need to find a particular string in all of them.
To explain with an example: every .log file contains a string called "AAA", and after that string there is a number that can differ from one log file to the next. I know how to search for the AAA string. What I don't know is how to extract only the number that comes after the AAA string.
Update:
Each .log file contains a lot of lines.
Only one line in each .log file contains the string "A12A".
After that string there is a number (for example: 5465).
What I need is to extract the number after the A12A.
Note: there is a space between A12A and the number 5465.
Example:
.log file: "assddsf dfdfsd dfd A12A 5465 dffdsfsdf dfdf dfdf "
What I need to extract: 5465.
What I have so far is:
// Modify this path as necessary.
string startFolder = @"c:\program files\Microsoft Visual Studio 9.0\";
// Take a snapshot of the file system.
System.IO.DirectoryInfo dir = new System.IO.DirectoryInfo(startFolder);
// This method assumes that the application has discovery permissions
// for all folders under the specified path.
IEnumerable<System.IO.FileInfo> fileList = dir.GetFiles("*.*", System.IO.SearchOption.AllDirectories);
string searchTerm = @"Visual Studio";
// Search the contents of each file.
// A regular expression created with the RegEx class
// could be used instead of the Contains method.
// queryMatchingFiles is an IEnumerable<string>.
var queryMatchingFiles =
from file in fileList
where file.Extension == ".htm"
let fileText = GetFileText(file.FullName)
where fileText.Contains(searchTerm)
select file.FullName;
// Execute the query.
Console.WriteLine("The term \"{0}\" was found in:", searchTerm);
foreach (string filename in queryMatchingFiles)
{
Console.WriteLine(filename);
}
// Keep the console window open in debug mode.
Console.WriteLine("Press any key to exit");
Console.ReadKey();
}
// Read the contents of the file.
static string GetFileText(string name)
{
string fileContents = String.Empty;
// If the file has been deleted since we took
// the snapshot, ignore it and return the empty string.
if (System.IO.File.Exists(name))
{
fileContents = System.IO.File.ReadAllText(name);
}
return fileContents;
}
I would recommend the following code for doing the search itself:
private static readonly string _SearchPattern = "A12A";
private static readonly Regex _NumberExtractor = new Regex(@"\d+");
private static IEnumerable<Tuple<String, int>> FindMatches()
{
var startFolder = @"D:\";
var filePattern = @"*.htm";
var matchingFiles = Directory.EnumerateFiles(startFolder, filePattern, SearchOption.AllDirectories);
foreach (var file in matchingFiles)
{
// What encoding do your files use?
var lines = File.ReadLines(file, Encoding.UTF8);
foreach (var line in lines)
{
int number;
if (TryGetNumber(line, out number))
{
yield return Tuple.Create(file, number);
// Stop searching that file and continue with the next one.
break;
}
}
}
}
private static bool TryGetNumber(string line, out int number)
{
number = 0;
// Should casing be ignored??
var index = line.IndexOf(_SearchPattern, StringComparison.InvariantCultureIgnoreCase);
if (index >= 0)
{
var numberRaw = line.Substring(index + _SearchPattern.Length);
var match = _NumberExtractor.Match(numberRaw);
return Int32.TryParse(match.Value, out number);
}
return false;
}
The reason is that with I/O operations the drive itself is normally the bottleneck, so it doesn't make sense to do anything in parallel or to read a lot of data from a file into memory without using it.
With the Directory.EnumerateFiles method the drive is searched lazily, so you can start examining the first file right after it is found. The same holds for the File.ReadLines method: it lazily iterates through the file while you search for your pattern.
With this approach you should get close to the maximum speed (depending on your hard-drive performance), because it makes only the I/O calls needed to get the files and their content to your code.
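As an alternative to the IndexOf/Substring combination, the marker and the number can be matched with a single regular expression. The following is only a sketch: the class name MarkerNumber and the method name TryExtract are made up for illustration, and it assumes the marker is matched case-sensitively and is followed by at least one whitespace character and then digits, as in the example line from the question.

```csharp
using System;
using System.Text.RegularExpressions;

static class MarkerNumber
{
    // "A12A", at least one whitespace character, then the digits to capture.
    // Assumption: the marker is case-sensitive and the number is unsigned.
    private static readonly Regex Pattern = new Regex(@"A12A\s+(\d+)");

    // Returns true and sets 'number' when the line contains the marker
    // followed by a number that fits into an int; otherwise returns false.
    public static bool TryExtract(string line, out int number)
    {
        number = 0;
        Match match = Pattern.Match(line);
        return match.Success && int.TryParse(match.Groups[1].Value, out number);
    }
}
```

It could replace TryGetNumber inside the loop over File.ReadLines, so the lazy reading and the early break stay the same.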
Related
I was wondering if someone could assist or point me in the right direction to move files where part of the filename needs to be matched to part of the foldername for example:
Moving filename Cust-10598.txt to a folder named John-Doe-10598 Is this possible?
I was able to create all the folders inside the root directory where all the files are contained, now I would like to sort them and put each of them inside the matching folder.
Any help or ideas are highly appreciated
Assuming you already have a list of candidate folders from Directory.GetDirectories(),
var listOfFolders = Directory.GetDirectories(basePath);
You can find the associated folder for a given file name using the following method.
string GetAssociatedDirectory(string fileName,IEnumerable<string> folderNames)
{
Regex regEx = new Regex(@"Cust-(?<Id>[\d]*)",RegexOptions.Compiled);
Match match = regEx.Match(fileName);
if (match.Success)
{
var customerId = match.Groups["Id"].Value;
// Use the same "-<id>" suffix test for both the check and the lookup.
var folder = folderNames.FirstOrDefault(f => f.EndsWith($"-{customerId}"));
if (folder == null)
{
throw new Exception("Folder not found");
}
return folder;
}
throw new Exception("Invalid File Name");
}
You can then use File.Move to move the file to the destination directory.
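Putting the lookup and the move together might look like the sketch below. The names CustomerFileMover, BuildDestination and MoveAll are hypothetical, and it assumes the files are named Cust-<id>.txt, sit directly in basePath, and that each target folder name ends in -<id>.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

static class CustomerFileMover
{
    // Hypothetical helper: returns the destination path for a file,
    // or null when the name has no id or no folder ends in "-<id>".
    public static string BuildDestination(string filePath, string[] folders)
    {
        string fileName = Path.GetFileName(filePath);
        Match match = Regex.Match(fileName, @"^Cust-(\d+)");
        if (!match.Success)
        {
            return null;
        }
        string suffix = "-" + match.Groups[1].Value;
        string folder = folders.FirstOrDefault(
            f => f.EndsWith(suffix, StringComparison.Ordinal));
        return folder == null ? null : Path.Combine(folder, fileName);
    }

    // Moves every Cust-*.txt file in basePath into its matching subfolder.
    // File.Move throws if the destination file already exists.
    public static void MoveAll(string basePath)
    {
        string[] folders = Directory.GetDirectories(basePath);
        foreach (string file in Directory.GetFiles(basePath, "Cust-*.txt"))
        {
            string destination = BuildDestination(file, folders);
            if (destination != null)
            {
                File.Move(file, destination);
            }
        }
    }
}
```

Returning null instead of throwing is a design choice for this sketch: files without a matching folder are simply skipped and stay where they are.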
You could simply Split() on '-' if the naming convention is that simple.
class Program
{
static void Main(string[] args)
{
var file = "Cust-10598.txt";
var fileSplit = file.Split('-');
var sourceDir = @"C:\";
var destFolder = "{name of destination folder}-" + Path.GetFileNameWithoutExtension(fileSplit[1]);
var destPath = @"C:\newpath";
// Use sourceDir here; the original referred to an undefined variable "source".
File.Move(Path.Combine(sourceDir, file), Path.Combine(destPath, destFolder, file));
}
}
I'm using this to find and replace exe files
public static void Replace()
{
string origFile = Environment.GetFolderPath(Environment.SpecialFolder.DesktopDirectory) + "/Virus.exe"; //original file
IEnumerable<string> toOverwrite = EnumerateFiles(); //newly enumarated files
string backupFile = Environment.GetFolderPath(Environment.SpecialFolder.DesktopDirectory) + "/temp"; //backup file (temporary)
File.Replace (origFile, toOverwrite, backupFile); //replaces files
}
public static IEnumerable<string> EnumerateFiles()
{
string pathToSearch = Environment.GetFolderPath (Environment.SpecialFolder.DesktopDirectory); //Sets search directory to the desktop
IEnumerable<string> exeFiles = Directory.EnumerateFiles (pathToSearch); //Searches desktop for all exe files
return exeFiles; //Returns enumerated list of files
}
But when I call the File.Replace method, it tells me that it can't convert an IEnumerable<string> to a string. How can I change the type without changing the value?
That's because an IEnumerable<string> is a sequence of strings, while File.Replace wants a single string, so passing the whole collection doesn't make sense. I suspect you need to loop over the files in the toOverwrite collection:
public static void Replace()
{
string origFile = Environment.GetFolderPath(Environment.SpecialFolder.DesktopDirectory) + "/Virus.exe"; //original file
IEnumerable<string> toOverwrite = EnumerateFiles(); //newly enumarated files
string backupFile = Environment.GetFolderPath(Environment.SpecialFolder.DesktopDirectory) + "/temp"; //backup file (temporary)
foreach(var file in toOverwrite)
{
File.Replace (origFile, file, backupFile); //replaces files
}
}
Using this article from MSDN, I'm trying to search through files in a directory. The problem is, every time I execute the program, I get:
"An unhandled exception of type 'System.OutOfMemoryException' occurred in mscorlib.dll".
I have tried some other options like StreamReader, but I can't get it to work. These files are HUGE. Some of them are 1.5-2 GB each, and there could be 5 or more files per day.
This code fails:
private static string GetFileText(string name)
{
var fileContents = string.Empty;
// If the file has been deleted since we took
// the snapshot, ignore it and return the empty string.
if (File.Exists(name))
{
fileContents = File.ReadAllText(name);
}
return fileContents;
}
Any ideas what could be happening or how to make it read without memory errors?
Entire code (in case you don't want to open the MSDN article)
class QueryContents {
public static void Main()
{
// Modify this path as necessary.
string startFolder = @"c:\program files\Microsoft Visual Studio 9.0\";
// Take a snapshot of the file system.
System.IO.DirectoryInfo dir = new System.IO.DirectoryInfo(startFolder);
// This method assumes that the application has discovery permissions
// for all folders under the specified path.
IEnumerable<System.IO.FileInfo> fileList = dir.GetFiles("*.*", System.IO.SearchOption.AllDirectories);
string searchTerm = @"Visual Studio";
// Search the contents of each file.
// A regular expression created with the RegEx class
// could be used instead of the Contains method.
// queryMatchingFiles is an IEnumerable<string>.
var queryMatchingFiles =
from file in fileList
where file.Extension == ".htm"
let fileText = GetFileText(file.FullName)
where fileText.Contains(searchTerm)
select file.FullName;
// Execute the query.
Console.WriteLine("The term \"{0}\" was found in:", searchTerm);
foreach (string filename in queryMatchingFiles)
{
Console.WriteLine(filename);
}
// Keep the console window open in debug mode.
Console.WriteLine("Press any key to exit");
Console.ReadKey();
}
// Read the contents of the file.
static string GetFileText(string name)
{
string fileContents = String.Empty;
// If the file has been deleted since we took
// the snapshot, ignore it and return the empty string.
if (System.IO.File.Exists(name))
{
fileContents = System.IO.File.ReadAllText(name);
}
return fileContents;
}
}
The problem you're having is based on trying to load multiple gigabytes of text at the same time. If they're text files, you can stream them and just compare one line at a time.
var queryMatchingFiles =
from file in fileList
where file.Extension == ".htm"
let fileLines = File.ReadLines(file.FullName) // lazy IEnumerable<string>
where fileLines.Any(line => line.Contains(searchTerm))
select file.FullName;
I suspect you are getting the out-of-memory error because, the way the query is written, the entire text of each file has to be loaded into memory, and I believe none of those strings can be released until the whole file set has been processed. Could you check for the search term inside the GetFileText function and just return true or false?
If you did that, the file text would at least fall out of scope at the end of the function and the GC could reclaim the memory. It would be better still to rewrite it as a streaming function when you are dealing with large files or large numbers of files: you could then stop reading as soon as you come across the search term, and you would never need the entire file in memory.
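A streaming check that returns true or false could be sketched as follows. The class name FileSearch and the method name FileContainsTerm are made up for illustration. It assumes the files are line-oriented text: a term that spans a line break is not found, and a file consisting of one enormous line would still be read into memory as a single string.

```csharp
using System;
using System.IO;
using System.Linq;

static class FileSearch
{
    // Returns true as soon as one line contains the search term.
    // File.ReadLines reads lazily, so reading stops at the first hit
    // and only one line at a time is held in memory.
    public static bool FileContainsTerm(string path, string searchTerm)
    {
        // If the file has been deleted in the meantime, treat it as "not found".
        if (!File.Exists(path))
        {
            return false;
        }
        return File.ReadLines(path).Any(line => line.Contains(searchTerm));
    }
}
```

In the query, the let/where pair would then collapse into a single clause such as where FileSearch.FileContainsTerm(file.FullName, searchTerm).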
Previous question on finding a term in an HTML file using a stream
So basically I am making an app that will sync file types in different ways. I want to search a whole logical drive, for example C:\, for all text files. However, once I have found all the text files I want to apply an action, for example move all text files to one location or email all text files to the user's email address.
I found this code in a past Stack Overflow post:
public List<string> Search()
{
var files = new List<string>();
foreach (DriveInfo d in DriveInfo.GetDrives().Where(x => x.IsReady))
{
try
{
files.AddRange(Directory.GetFiles(d.RootDirectory.FullName, "*.txt", SearchOption.AllDirectories));
}
catch(Exception e)
{
Logger.Log(e.Message); // Log it and move on
}
}
return files;
}
But what I want to know is: how do I do something once I have found the files?
The code you posted looks like it should fill List<string> files with strings representing names of files that have a .txt extension.
It should be as simple as iterating over the value returned from the function and doing as you please with them.
This code should (untested) check for a target directory, create it if it doesn't exist, and then copy each file returned from Search() to the target path.
List<string> results = Search();
String targetPath = "C:/TargetDirectory/";
if (!System.IO.Directory.Exists(targetPath))
System.IO.Directory.CreateDirectory(targetPath);
foreach (string aFileStr in results)
{
String sourceFile = aFileStr;
String destFile = Path.Combine(targetPath, Path.GetFileName(aFileStr));
System.IO.File.Copy(sourceFile, destFile, true);
}
You would do a foreach over the list of strings that the function returns.
I'm not quite sure if I understand you correctly. If you just want to know how to process your filelist, you could for instance do the following:
var filelist = Search();
foreach (var s in filelist) {
string fn = System.IO.Path.GetFileName(s);
string dest = System.IO.Path.Combine("c:\\tmp", fn);
System.IO.File.Copy(s, dest, true);
}
which will copy all files in filelist to c:\tmp and overwrite any existing files with the same file name.
I am having a problem reading and writing the files in folders and subfolders.
For example, test is the main folder:
1) C:\test\
and I want to read and write the files in the subfolders:
2)C:\test\12-05-2011\12-05-2011.txt
3)C:\test\13-05-2011\13-05-2011.txt
4)C:\test\14-05-2011\14-05-2011.txt
My code is:
private void button1_Click(object sender, EventArgs e)
{
const string Path1 = @"C:\test";
DoOnSubfolders(Path1);
try
{
StreamReader reader1 = File.OpenText(Path1);
string str = reader1.ReadToEnd();
reader1.Close();
reader1.Dispose();
File.Delete(Path1);
string[] Strarray = str.Split(new char[] { Strings.ChrW(10) });
int abc = Strarray.Length - 2;
int xyz = 0;
while (xyz <= abc)
}
I am getting an error. The error is:
Access to the path 'C:\test' is denied.
Can anyone tell me what I need to change in this code?
First, you could flatten your recursive calls by calling DirectoryInfo.GetFiles(string, SearchOption) with the SearchOption set to AllDirectories.
Another common mistake (though it isn't clear from your question whether it applies) is forgetting that a directory has to be created before you can create a file in it. Simply call Directory.CreateDirectory() and pass it the complete path (without the file name). It does nothing if the directory already exists, and it creates the whole folder structure if needed. So no checks or recursive calls are needed (maybe a try-catch in case you don't have write access).
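The create-the-directory-first step could be sketched like this. The names SafeWriter and WriteTextEnsuringFolder are hypothetical, and the sketch simply lets any access exception propagate to the caller.

```csharp
using System.IO;

static class SafeWriter
{
    // Creates the target folder (and any missing parent folders) before writing.
    // Directory.CreateDirectory does nothing if the folder already exists.
    public static void WriteTextEnsuringFolder(string filePath, string contents)
    {
        string folder = Path.GetDirectoryName(filePath);
        // GetDirectoryName returns null or "" for a bare file name or a root path.
        if (!string.IsNullOrEmpty(folder))
        {
            Directory.CreateDirectory(folder);
        }
        File.WriteAllText(filePath, contents);
    }
}
```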
Update
So here is an example that reads in a file, does some conversion on each line and writes the result into a new file. If this works properly the original file will be replaced by the converted one.
private static void ConvertFiles(string pathToSearchRecursive, string searchPattern)
{
var dir = new DirectoryInfo(pathToSearchRecursive);
if (!dir.Exists)
{
throw new ArgumentException("Directory doesn't exist: " + dir.ToString());
}
if (String.IsNullOrEmpty(searchPattern))
{
throw new ArgumentNullException("searchPattern");
}
foreach (var file in dir.GetFiles(searchPattern, SearchOption.AllDirectories))
{
var tempFile = Path.GetTempFileName();
// Use the using statement to make sure file is closed at the end or on error.
using (var reader = file.OpenText())
using (var writer = new StreamWriter(tempFile))
{
string line;
while (null != (line = reader.ReadLine()))
{
var split = line.Split((char)10);
foreach (var item in split)
{
writer.WriteLine(item);
}
}
}
// Replace the original file by the converted one (if needed)
////File.Copy(tempFile, file.FullName, true);
}
}
In your case you could call this function
ConvertFiles(@"D:\test", "*.*")
To walk the sub-folders recursively, you need a recursive function, i.e. one that calls itself. Here is an example that should be enough for you to work with:
static void Main(string[] args)
{
const string path = @"C:\temp\";
DoOnSubfolders(path);
}
private static void DoOnSubfolders(string rootPath)
{
DirectoryInfo d = new DirectoryInfo(rootPath);
FileInfo[] fis = d.GetFiles();
foreach (var fi in fis)
{
string str = File.ReadAllText(fi.FullName);
//do your stuff
}
DirectoryInfo[] ds = d.GetDirectories();
foreach (var info in ds)
{
DoOnSubfolders(info.FullName);
}
}
You need to use the DirectoryInfo and FileInfo classes.
DirectoryInfo d = new DirectoryInfo("c:\\test");
FileInfo [] fis = d.GetFiles();
DirectoryInfo [] ds = d.GetDirectories();
Here's a quick one liner to write the contents of all text files in a given directory (and all subdirectories) to the console:
Directory.GetFiles(myDirectory,"*.txt*",SearchOption.AllDirectories)
.ToList()
.ForEach(a => Console.WriteLine(File.ReadAllText(a)));
This code:
const string Path1 = @"C:\test";
StreamReader reader1 = File.OpenText(Path1);
Says open "c:\test" as a text file... The error you're getting is:
Access to the path 'C:\test' is denied
You're getting the error because as you stated above, 'c:\test' is a folder. You can't open folders like they are text files, hence the error...
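One way to guard against this mistake is to check whether the path is a folder or a file before opening anything. The sketch below uses the made-up names PathOpener and ResolveTextFiles and assumes that only .txt files are of interest when a folder is given.

```csharp
using System.Collections.Generic;
using System.IO;

static class PathOpener
{
    // Returns the text files to open for a path that may be a file or a folder.
    public static IEnumerable<string> ResolveTextFiles(string path)
    {
        if (Directory.Exists(path))
        {
            // A folder: enumerate the .txt files below it instead of opening it.
            return Directory.EnumerateFiles(path, "*.txt", SearchOption.AllDirectories);
        }
        if (File.Exists(path))
        {
            // A single file: it can be opened directly.
            return new[] { path };
        }
        // Neither a file nor a folder: nothing to open.
        return new string[0];
    }
}
```

Each returned path can then be passed to File.OpenText without hitting the access-denied error caused by opening a folder.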
A basic (full depth search) for files with a .txt extension looks like this:
static void Main(string[] args) {
ProcessDir(@"c:\test");
}
static void ProcessDir(string currentPath) {
foreach (var file in Directory.GetFiles(currentPath, "*.txt")) {
// Process each file (replace this with your code / function call /
// change signature to allow a delegate to be passed in... etc
// StreamReader reader1 = File.OpenText(file); // etc
Console.WriteLine("File: {0}", file);
}
// recurse (may not be necessary), call each subfolder to see
// if there's more hiding below
foreach (var subFolder in Directory.GetDirectories(currentPath)) {
ProcessDir(subFolder);
}
}
Have a look at http://support.microsoft.com/kb/303974 for a start. The secret is Directory.GetDirectories in System.IO.
You have to configure (NTFS) security on the c:\Test folder.
Normally you would have the application run under a non-administrator account, so the account that is running the program needs to have access.
If you are running on Vista or Windows 7 with UAC, you might be an administrator but you will not be using the administrative (elevated) permissions by default.
EDIT
Look at these lines:
const string Path1 = @"C:\test";
DoOnSubfolders(Path1);
try
{
StreamReader reader1 = File.OpenText(Path1);
That last line is trying to read the FOLDER 'c:\test' as if it was a text file.
You can't do that. What are you trying to accomplish there?