Looking for a way to read and search a file fast in C#

I have a 100 MB text file and need to check every line for a particular word.
I am looking for a fast way to do it,
so I split the file into 10 chunks:
public void ParseTheFile(BackgroundWorker bg)
{
Lines = File.ReadAllLines(FilePath);
this.size = Lines.Length;
chankSise=size/10;
reports reportInst = new reports(bg,size);
ParserThread [] ParserthreadArray = new ParserThread[10];
for (int i = 0; i <ParserthreadArray.Length; i++)
{
ParserthreadArray[i] = new ParserThread((reportInst));
ParserthreadArray[i].Init(SubArray(Lines,i * chankSise, chankSise), OutputPath);
}
Thread oThread0 = new Thread(ParserthreadArray[0].run);
oThread0.IsBackground = true;
Thread oThread1 = new Thread(ParserthreadArray[1].run);
oThread1.IsBackground = true;
Thread oThread2 = new Thread(ParserthreadArray[2].run);
oThread2.IsBackground = true;
Thread oThread3 = new Thread(ParserthreadArray[3].run);
oThread3.IsBackground = true;
Thread oThread4 = new Thread(ParserthreadArray[4].run);
oThread4.IsBackground = true;
Thread oThread5 = new Thread(ParserthreadArray[5].run);
oThread5.IsBackground = true;
Thread oThread6 = new Thread(ParserthreadArray[6].run);
oThread6.IsBackground = true;
Thread oThread7 = new Thread(ParserthreadArray[7].run);
oThread7.IsBackground = true;
Thread oThread8 = new Thread(ParserthreadArray[8].run);
oThread8.IsBackground = true;
Thread oThread9 = new Thread(ParserthreadArray[9].run);
oThread9.IsBackground = true;
oThread0.Start();
oThread1.Start();
oThread2.Start();
oThread3.Start();
oThread4.Start();
oThread5.Start();
oThread6.Start();
oThread7.Start();
oThread8.Start();
oThread9.Start();
oThread0.Join();
oThread1.Join();
oThread2.Join();
oThread3.Join();
oThread4.Join();
oThread5.Join();
oThread6.Join();
oThread7.Join();
oThread8.Join();
oThread9.Join();
this is the Init method:
public void Init(string [] olines,string outputPath)
{
Lines = olines;
OutputPath = outputPath+"/"+"ThreadTemp"+threadID;
}
this is the SubArray method:
public string [] SubArray(string [] data, int index, int length)
{
string [] result = new string[length];
Array.Copy(data, index, result, 0, length);
return result;
}
and each thread do this:
public void run()
{
if (!System.IO.Directory.Exists(OutputPath))
{
System.IO.Directory.CreateDirectory(OutputPath);
DirectoryInfo dir = new DirectoryInfo(OutputPath);
dir.Attributes |= FileAttributes.Hidden;
}
this.size = Lines.Length;
foreach (string line in Lines)
{
bgReports.sendreport(allreadychecked);
allreadychecked++;
hadHandlerOrEngine = false;
words = line.Split(' ');
if (words.Length>4)
{
for (int i = 5; i < words.Length; i++)
{
if (words[i] == "Handler" | words[i] == "Engine")
{
hadHandlerOrEngine = true;
string num = words[1 + i];
int realnum = int.Parse(num[0].ToString());
cuurentEngine = (realnum);
if (engineArry[realnum] == false)
{
File.Create(OutputPath + "/" + realnum + ".txt").Close();
engineArry[realnum] = true;
}
TextWriter tw = new StreamWriter(OutputPath + "/" + realnum + ".txt", true);
tw.WriteLine(line);
tw.Close();
break;
}
}
}
if (hadHandlerOrEngine == false)
{
if (engineArry[cuurentEngine] == true)
{
TextWriter tw = new StreamWriter(OutputPath + "/" + cuurentEngine + ".txt", true);
tw.WriteLine(line);
tw.Close();
}
}
}
My question: is there any way to make this run faster?

You haven't shown your Init method, but at the moment it looks like each of your threads will actually be checking all of the lines. Additionally, it looks like all of those may be trying to write to the same files - and not doing so in an exception-safe way (using using statements) either.
EDIT: Okay, so now we can see Init but we can't see SubArray. Presumably it just copies a chunk of the array.
How slow is this if you avoid using threads to start with? Is it definitely too slow? What is your performance target? It seems unlikely that using 10 threads is going to help though, given that at that point it's entirely memory/CPU-bound. (You should also try to avoid repeating so much code for starting all the threads - why aren't you using a collection for that?)
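As a baseline for the single-threaded measurement suggested above, here is a minimal sketch rather than the asker's actual code. It streams the input with File.ReadLines instead of loading the whole file, and keeps one StreamWriter per engine open for the entire run instead of reopening the output file for every line. The class name, the method name, and the simplified matching rule (first digit of the word that follows "Handler" or "Engine") are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class EngineLogSplitter
{
    // Hypothetical helper: copies each line of inputPath into
    // outputDir/<engine>.txt and returns the number of lines written.
    public static int Split(string inputPath, string outputDir)
    {
        Directory.CreateDirectory(outputDir);
        var writers = new Dictionary<int, StreamWriter>();
        StreamWriter current = null; // writer for the most recently seen engine
        int written = 0;
        try
        {
            // File.ReadLines streams the file; it never holds all lines at once.
            foreach (string line in File.ReadLines(inputPath))
            {
                string[] words = line.Split(' ');
                // Stop one short of the end so words[i + 1] always exists.
                for (int i = 5; i < words.Length - 1; i++)
                {
                    if (words[i] == "Handler" || words[i] == "Engine")
                    {
                        string next = words[i + 1];
                        if (next.Length > 0 && char.IsDigit(next[0]))
                        {
                            int engine = next[0] - '0';
                            if (!writers.TryGetValue(engine, out current))
                            {
                                // Open each output file once and keep it open.
                                current = new StreamWriter(Path.Combine(outputDir, engine + ".txt"));
                                writers[engine] = current;
                            }
                        }
                        break;
                    }
                }
                // Lines without a keyword go to the last engine seen, if any.
                if (current != null)
                {
                    current.WriteLine(line);
                    written++;
                }
            }
        }
        finally
        {
            foreach (StreamWriter w in writers.Values) w.Dispose();
        }
        return written;
    }
}
```

Timing this with a Stopwatch gives a number to compare the threaded version against before deciding whether threads are worth the complexity.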

You are probably IO bound, so I'd guess that multiple threads aren't going to help much. (Odds are your program spends most of its time here: Lines = File.ReadAllLines(FilePath); and not that much time actually parsing. You should measure though.) In fact, your SubArray splitting is possibly slower than if you just passed the whole thing to a single parser thread.
I would be looking at MemoryMappedFile (if this is .NET 4) which should help some with IO by not having to make copies of all the source data.

I would like to recommend something that may be useful. As someone said, there is no point in assigning multiple threads to read your file, since this is mostly I/O activity and the requests simply get queued up by the OS. You can, however, issue an async I/O request and let any available I/O completion thread handle it.
When it comes to processing the file, I would recommend memory-mapped files. They are ideal for scenarios where an arbitrary chunk (view) of a considerably larger file needs to be accessed repeatedly or separately. In your scenario, memory-mapped files can help you split and reassemble the file if the chunks arrive or are processed out of order.
I have no handy examples at the moment. Have a look at the article Memory Mapped Files.
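For readers who want a starting point, the following is a minimal sketch of the memory-mapped approach described above, using the System.IO.MemoryMappedFiles API available since .NET 4. The class and method names are hypothetical, and the search is a plain substring test. The view is created with the exact file length because a view of default size is rounded up to a page boundary and would expose trailing zero bytes. Note that mapping a zero-length file throws, so callers would need to handle that case.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

public static class MappedSearch
{
    // Hypothetical helper: counts the lines of a non-empty file that contain `word`.
    public static int CountLinesContaining(string path, string word)
    {
        long length = new FileInfo(path).Length;
        int count = 0;
        // Map the file read-only; no map name, capacity 0 means "size of the file".
        using (var mmf = MemoryMappedFile.CreateFromFile(
                   path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read))
        // Limit the view to the real file length to avoid page-padding bytes.
        using (var view = mmf.CreateViewStream(0, length, MemoryMappedFileAccess.Read))
        using (var reader = new StreamReader(view))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Contains(word)) count++;
            }
        }
        return count;
    }
}
```

Whether this beats a plain StreamReader for a single sequential pass depends on the machine and the file, so it is worth measuring both.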

Related

Slow String Formatting in C# when using more than a few lines

I have created a process which reads a "template" text file and then based on the String.Format requirements uses the tokens to place my custom text in.
So, everything works, but the process is slow.
The template file can have about 500-1000 lines; I am looking for a way to speed this process up.
Any ideas?
Here is my code below:
templateFilePath = System.IO.Path.GetDirectoryName(System.Reflection.Assembly.GetExecutingAssembly().GetName().CodeBase).Replace("file:\\", "");
templateFilePath += "\\Templates\\TemplateFile.txt";
tempRequestFilePath = System.IO.Path.GetTempPath();
tempRequestFilePath += Guid.NewGuid();
Directory.CreateDirectory(tempRequestFilePath);
responseFileToWrite = tempRequestFilePath + "\\" + Path.GetFileNameWithoutExtension(zipMergeFilePath) + ".RSP";
if (!File.Exists(templateFilePath))
{
return false;
}
templateString = System.IO.File.ReadAllText(templateFilePath);
currentRecordNumber = 1;
for (int i = 0; i < createToProcess.rtfText.Lines.Length; i++)
{
if (createToProcess.rtfText.Lines[i].Contains("TAG ID:"))
{
string currentTagID = createToProcess.rtfText.Lines[i].Substring(9, 11).Trim();
string currentCustomerNumber = createToProcess.rtfText.Lines[i].Substring(25, 12).Trim();
string currentTaxPeriod = createToProcess.rtfText.Lines[i].Substring(42, 8).Trim();
string currentCustomerPhoneNumber = createToProcess.rtfText.Lines[i].Substring(55, 9).Trim();
DateTime datePurchases = (DateTime.Now).AddDays(-7);
DateTime dateReceived = (DateTime.Now).AddYears(10);
DateTime dateModified = (DateTime.Now).AddYears(-1);
string currentResearchCreateRecord = String.Format(templateString,
currentTagID.PadRight(6),
currentCustomerNumber.PadRight(12),
currentTaxPeriod.PadRight(6),
currentCustomerPhoneNumber.PadRight(8),
datePurchases.Month.ToString("00") + datePurchases.Day.ToString("00") + datePurchases.Year.ToString("0000"),
"RecordNo: " + currentRecordNumber.ToString(),
dateReceived.Month.ToString("00") + dateReceived.Day.ToString("00") + dateReceived.Year.ToString("0000"),
dateModified.Month.ToString("00") + dateModified.Day.ToString("00") + dateModified.Year.ToString("0000")
);
System.Windows.Forms.Application.DoEvents();
File.AppendAllText(responseFileToWrite, currentResearchCreateRecord);
currentRecordNumber += 1;
}
}
using (ZipFile currentZipFile = new ZipFile())
{
currentZipFile.AddFile(responseFileToWrite, "");
currentZipFile.Save(zipMergeFilePath);
}
return true;

You're re-opening the file handle for each line. That's an expensive operation, and slows you down.
Instead, create (in a using block) a StreamWriter for the file, and call WriteLine() to write a single line without closing the file.
Also, reading the Lines property is quite slow. Change that to a foreach loop (or just cache the array) instead of rerunning all that code for each line.
Finally, don't call DoEvents().
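The advice above can be sketched roughly as follows. This is a simplified illustration, not the asker's full routine: the class and method names are hypothetical, the template is assumed to take only two placeholders ({0} for the trimmed line, {1} for the record number), and the caller is assumed to have read the Lines property once into an array before calling.

```csharp
using System;
using System.IO;

public static class ResponseFileWriter
{
    // Hypothetical helper: formats every "TAG ID:" line with the template and
    // writes the results to outputPath. Returns the number of records written.
    public static int WriteRecords(string[] lines, string template, string outputPath)
    {
        int recordNumber = 0;
        // One writer for the whole loop: the file is opened and closed once.
        using (var writer = new StreamWriter(outputPath, false))
        {
            // `lines` was cached by the caller, so the expensive Lines
            // property is not re-evaluated on every iteration.
            foreach (string line in lines)
            {
                if (!line.Contains("TAG ID:")) continue;
                recordNumber++;
                writer.Write(string.Format(template, line.Trim(), recordNumber));
            }
        }
        return recordNumber;
    }
}
```

A caller would cache the lines first, for example with string[] lines = createToProcess.rtfText.Lines; and then pass that array in.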

Be careful with the "+" operator on strings; repeated concatenation is very slow.
You should use the StringBuilder class instead:
System.Text.StringBuilder sb = new System.Text.StringBuilder((int)(sLen * Loops * 1.1));
for(i=0;i<Loops;i++) sb.Append(sSource);
sDest = sb.ToString();
https://support.microsoft.com/en-us/kb/306822

How to present(not open) a very large file into an UI component(TextBox) quickly

Before talking about my question, I'd like to clarify that this is not a question about how to OPEN a large text file.
I have done that. It's a 150 MB .txt file, and I load it into a dictionary object in around 1 second.
After this, I'd like to display it in a UI component.
I have tried to use a TextBox, but the application window still hasn't shown up (it has been 5 minutes since I pressed F5).
So the question is: what is a better UI component for displaying a large number of characters? (I have 393300 elements in the dictionary object.)
Thanks
Update:
private void LoadTermCodes(TextBox tb)
{
Stopwatch sw = new Stopwatch();
sw.Start();
StreamReader sr = new StreamReader(@"xxx.txt");
string line;
while ((line = sr.ReadLine()) != null)
{
string[] colums = line.Split('\t');
var id = colums[4];
var diagnosisName = colums[7];
if (dic.Keys.Contains(id))
{
var temp = dic[id];
temp += "," + diagnosisName;
dic[id] = temp;
}
else
{
dic.Add(id, diagnosisName);
}
//tb.Text += line + Environment.NewLine;
}
sw.Stop();
long spentTime = sw.ElapsedMilliseconds;
foreach (var element in dic)
{
tb.Text += element.Key + "\t" + element.Value + Environment.NewLine;
}
//tb.Text = "Eplased time (ms) = " + spentTime;
MessageBox.Show("Jie shu le haha~~~ " + spentTime);
}

The long running time you're seeing is probably due to how strings are handled by the C# runtime. Since strings are immutable, every time you call + on them, the runtime copies the string so far plus the next small part into a new memory location and returns that.
There are a couple of good articles by Eric Lippert, Part 1 and Part 2, that explain what happens under the hood.
Instead, to stop all of this copying, you should use a StringBuilder. Applied to your code, that looks like this:
private void LoadTermCodes(TextBox tb)
{
Stopwatch sw = new Stopwatch();
sw.Start();
StreamReader sr = new StreamReader(@"xxx.txt");
string line;
// initialise the StringBuilder
System.Text.StringBuilder outputBuilder = new System.Text.StringBuilder(String.Empty);
while ((line = sr.ReadLine()) != null)
{
string[] colums = line.Split('\t');
var id = colums[4];
var diagnosisName = colums[7];
if (dic.Keys.Contains(id))
{
var temp = dic[id];
temp += "," + diagnosisName;
dic[id] = temp;
}
else
{
dic.Add(id, diagnosisName);
}
}
sw.Stop();
long spentTime = sw.ElapsedMilliseconds;
foreach (var element in dic)
{
// append a line to it, this will stop a lot of the copying
outputBuilder.AppendLine(String.Format("{0}\t{1}", element.Key, element.Value));
}
// emit the text
tb.Text += outputBuilder.ToString();
MessageBox.Show("Jie shu le haha~~~ " + spentTime);
}

"The process cannot access the file because it is being used by another process"

The full error I am receiving is:
"The process cannot access the file 'e:\Batch\NW\data_Test\IM_0232\input\RN318301.WM' because it is being used by another process.>>> at IM_0232.BatchModules.BundleSort(String bundleFileName)
at IM_0232.BatchModules.ExecuteBatchProcess()"
The code involved can be seen below. The RN318301.WM file being processed is a text file that contains information which will eventually be placed in PDF documents. Many documents are referenced in the RN318301.WM text file, each represented by a collection of rows. As can be seen in the code, the RN318301.WM text file is first parsed to determine the number of documents represented in it as well as the maximum number of lines in a document. This information is then used to create a two-dimensional array that will contain all of the document information. The RN318301.WM text file is parsed again to populate the two-dimensional array, and at the same time information is collected into a dictionary that will be sorted later in the routine.
The failure occurs at the last line below:
File.Delete(_bundlePath + Path.GetFileName(bundleFileName));
This is a sporadic problem that occurs only rarely. It has even been seen to occur with a particular text file with which it had not previously occurred. That is, a particular text file will process fine but then on reprocessing the error will be triggered.
Can anyone help us to diagnose the cause of this error? Thank you very much...
public void BundleSort(string bundleFileName)
{
Dictionary<int, string> memberDict = new Dictionary<int, string>();
Dictionary<int, string> sortedMemberDict = new Dictionary<int, string>();
//int EOBPosition = 0;
int EOBPosition = -1;
int lineInEOB = 0;
int eobCount = 0;
int lineCount = 0;
int maxLineCount = 0;
string compareString;
string EOBLine;
//#string[][] EOBLineArray;
string[,] EOBLineArray;
try
{
_batch.TranLog_Write("\tBeginning sort of bundle " + _bundleInfo.BundleName + " to facilitate householding");
//Read the bundle and create a dictionary of comparison strings with EOB position in the bundle being the key
StreamReader file = new StreamReader(@_bundlePath + _bundleInfo.BundleName);
//The next section of code counts CH records as well as the maximum number of CD records in an EOB. This information is needed for initialization of the 2-dimensional EOBLineArray array.
while ((EOBLine = file.ReadLine()) != null)
{
if (EOBLine.Substring(0, 2) == "CH" || EOBLine.Substring(0, 2) == "CT")
{
if (lineCount == 0)
lineCount++;
if (lineCount > maxLineCount)
{
maxLineCount = lineCount;
}
eobCount++;
if (lineCount != 1)
lineCount = 0;
}
if (EOBLine.Substring(0, 2) == "CD")
{
lineCount++;
}
}
EOBLineArray = new string[eobCount, maxLineCount + 2];
file = new StreamReader(@_bundlePath + _bundleInfo.BundleName);
try
{
while ((EOBLine = file.ReadLine()) != null)
{
if (EOBLine.Substring(0, 2) == "CH")
{
EOBPosition++;
lineInEOB = 0;
compareString = EOBLine.Substring(8, 40).Trim() + EOBLine.Substring(49, 49).TrimEnd().TrimStart() + EOBLine.Substring(120, 5).TrimEnd().TrimStart();
memberDict.Add(EOBPosition, compareString);
EOBLineArray[EOBPosition, lineInEOB] = EOBLine;
}
else
{
if (EOBLine.Substring(0, 2) == "CT")
{
EOBPosition++;
EOBLineArray[EOBPosition, lineInEOB] = EOBLine;
}
else
{
lineInEOB++;
EOBLineArray[EOBPosition, lineInEOB] = EOBLine;
}
}
}
}
catch (Exception ex)
{
throw ex;
}
_batch.TranLog_Write("\tSending original unsorted bundle to archive");
if(!(File.Exists(_archiveDir + "\\" +DateTime.Now.ToString("yyyyMMdd")+ Path.GetFileName(bundleFileName) + "_original")))
{
File.Copy(_bundlePath + Path.GetFileName(bundleFileName), _archiveDir + "\\" +DateTime.Now.ToString("yyyyMMdd")+ Path.GetFileName(bundleFileName) + "_original");
}
file.Close();
file.Dispose();
GC.Collect();
File.Delete(_bundlePath + Path.GetFileName(bundleFileName));

You didn't close/dispose your StreamReader the first time round, so the file handle is still open.
Consider using the using construct, which automatically disposes of the object when it goes out of scope:
using(var file = new StreamReader(args))
{
// Do stuff
}
// file has now been disposed/closed etc

You need to close your StreamReaders for one thing.
StreamReader file = new StreamReader(@_bundlePath + _bundleInfo.BundleName);
You need to close the StreamReader object, and you could do this in a finally block:
finally {
file.Close();
}
A better way is to use a using block:
using (StreamReader file = new StreamReader(@_bundlePath + _bundleInfo.BundleName)) {
...
}
It looks to me like you are calling GC.Collect to try to force the closing of these StreamReaders, but that doesn't guarantee that they will be closed immediately as per the MSDN doc:
http://msdn.microsoft.com/en-us/library/xe0c2357.aspx
From that doc:
"All objects, regardless of how long they have been in memory, are considered for collection;"
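Applied to a two-pass routine like the one in the question, the using pattern might look like the sketch below. It is a simplified illustration under assumptions: the class and method names are hypothetical, and only lines starting with "CH" are collected. The point is that each pass gets its own reader, each reader is disposed when its block ends, and no handle is left open when File.Delete runs.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class BundleFile
{
    // Hypothetical helper: reads the file twice, then deletes it.
    public static List<string> ReadHeadersAndDelete(string path)
    {
        int headerCount = 0;
        // Pass 1: count the header records. Disposed at the end of the block.
        using (var reader = new StreamReader(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.StartsWith("CH", StringComparison.Ordinal)) headerCount++;
            }
        }

        var headers = new List<string>(headerCount);
        // Pass 2: a separate reader, also disposed when the block ends.
        using (var reader = new StreamReader(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.StartsWith("CH", StringComparison.Ordinal)) headers.Add(line);
            }
        }

        // Both readers are closed here, so this process holds no handle.
        File.Delete(path);
        return headers;
    }
}
```

With this structure there is no need for GC.Collect(), since nothing depends on a finalizer to release the file.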

Using StreamWriter to implement a rolling log, and deleting from top

My C# winforms 4.0 application has been using a thread-safe streamwriter to do internal, debug logging information. When my app opens, it deletes the file, and recreates it. When the app closes, it saves the file.
What I'd like to do is modify my application so that it does appending instead of replacing. This is a simple fix.
However, here's my question:
I'd like to keep my log file AROUND 10 megabytes maximum. My constraint would be simple. When you go to close the file, if the file is greater than 10 megabytes, trim out the first 10%.
Is there a 'better' way than doing the following:
Close the file
Check if the file is > 10 meg
If so, open the file
Parse the entire thing
Cull the first 10%
Write the file back out
Close
Edit: well, I ended up rolling my own (shown below). The suggestion to move over to log4net is a good one, but the time it would take to learn the new library and move all my log statements (thousands) over isn't worth it for the small enhancement I was trying to make.
private static void PerformFileTrim(string filename)
{
var FileSize = Convert.ToDecimal((new System.IO.FileInfo(filename)).Length);
if (FileSize > 5000000)
{
var file = File.ReadAllLines(filename).ToList();
var AmountToCull = (int)(file.Count * 0.33);
var trimmed = file.Skip(AmountToCull).ToList();
File.WriteAllLines(filename, trimmed);
}
}

I researched this once and never came up with anything, but I can offer you plan B here:
I use the selection below to keep a maximum of 3 log files. At first, log file 1 is created and appended to. When it exceeds maxsize, log 2 and later log 3 are created. When log 3 is too large, log 1 is deleted and the remaining logs get pushed down the stack.
string[] logFileList = Directory.GetFiles(Path.GetTempPath(), "add_all_*.log", SearchOption.TopDirectoryOnly);
if (logFileList.Count() > 1)
{
Array.Sort(logFileList, 0, logFileList.Count());
}
if (logFileList.Any())
{
string currFilePath = logFileList.Last();
string[] dotSplit = currFilePath.Split('.');
string lastChars = dotSplit[0].Substring(dotSplit[0].Length - 3);
ctr = Int32.Parse(lastChars);
FileInfo f = new FileInfo(currFilePath);
if (f.Length > MaxLogSize)
{
if (logFileList.Count() > MaxLogCount)
{
File.Delete(logFileList[0]);
for (int i = 1; i < MaxLogCount + 1; i++)
{
Debug.WriteLine(string.Format("moving: {0} {1}", logFileList[i], logFileList[i - 1]));
File.Move(logFileList[i], logFileList[i - 1]); // push older log files back, in order to pop new log on top
}
}
else
{
ctr++;
}
}
}

The solutions here did not really work for me. I took user3902302's answer, which in turn was based on bigtech's answer, and wrote a complete class. Also, I am NOT using StreamWriter; you can change the one line (replace AppendAllText with the StreamWriter equivalent).
There is little error handling (e.g. retrying access when it fails, though the lock should catch all internal concurrent access).
This might be enough for some people who previously had to use a big solution like log4net or nlog. (And log4net's RollingAppender is not even thread-safe; this one is. :) )
public class RollingLogger
{
readonly static string LOG_FILE = @"c:\temp\logfile.log";
readonly static int MaxRolledLogCount = 3;
readonly static int MaxLogSize = 1024; // 1 * 1024 * 1024; <- small value for testing that it works, you can try yourself, and then use a reasonable size, like 1M-10M
public static void LogMessage(string msg)
{
lock (LOG_FILE) // lock is optional, but.. should this ever be called by multiple threads, it is safer
{
RollLogFile(LOG_FILE);
File.AppendAllText(LOG_FILE, msg + Environment.NewLine, Encoding.UTF8);
}
}
private static void RollLogFile(string logFilePath)
{
try
{
var length = new FileInfo(logFilePath).Length;
if (length > MaxLogSize)
{
var path = Path.GetDirectoryName(logFilePath);
var wildLogName = Path.GetFileNameWithoutExtension(logFilePath) + "*" + Path.GetExtension(logFilePath);
var bareLogFilePath = Path.Combine(path, Path.GetFileNameWithoutExtension(logFilePath));
string[] logFileList = Directory.GetFiles(path, wildLogName, SearchOption.TopDirectoryOnly);
if (logFileList.Length > 0)
{
// only take files like logfilename.log and logfilename.0.log, so there also can be a maximum of 10 additional rolled files (0..9)
var rolledLogFileList = logFileList.Where(fileName => fileName.Length == (logFilePath.Length + 2)).ToArray();
Array.Sort(rolledLogFileList, 0, rolledLogFileList.Length);
if (rolledLogFileList.Length >= MaxRolledLogCount)
{
File.Delete(rolledLogFileList[MaxRolledLogCount - 1]);
var list = rolledLogFileList.ToList();
list.RemoveAt(MaxRolledLogCount - 1);
rolledLogFileList = list.ToArray();
}
// move remaining rolled files
for (int i = rolledLogFileList.Length; i > 0; --i)
File.Move(rolledLogFileList[i - 1], bareLogFilePath + "." + i + Path.GetExtension(logFilePath));
var targetPath = bareLogFilePath + ".0" + Path.GetExtension(logFilePath);
// move original file
File.Move(logFilePath, targetPath);
}
}
}
catch (Exception ex)
{
System.Diagnostics.Debug.WriteLine(ex.ToString());
}
}
}
edit:
Since I just noticed that you asked a slightly different question: if your lines vary greatly in size, this would be a variation. In 90% of cases it does not improve on yours, though it might be very slightly faster, and it introduces a new unhandled case (no \n being present):
private static void PerformFileTrim(string filename)
{
var fileSize = (new System.IO.FileInfo(filename)).Length;
if (fileSize > 5000000)
{
var text = File.ReadAllText(filename);
var amountToCull = (int)(text.Length * 0.33);
amountToCull = text.IndexOf('\n', amountToCull);
var trimmedText = text.Substring(amountToCull + 1);
File.WriteAllText(filename, trimmedText);
}
}

This is derived from bigtech's answer:
private static string RollLogFile()
{
string path = Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments);
string appName = Path.GetFileNameWithoutExtension(Environment.GetCommandLineArgs()[0]);
string wildLogName = string.Format("{0}*.log",appName);
int fileCounter = 0;
string[] logFileList = Directory.GetFiles(path, wildLogName, SearchOption.TopDirectoryOnly);
if (logFileList.Length > 0)
{
Array.Sort(logFileList, 0, logFileList.Length);
fileCounter = logFileList.Length - 1;
//Make sure we apply the MaxLogCount (but only once to reduce the delay)
if (logFileList.Length > MaxLogCount)
{
//Too many files - remove one and rename the others
File.Delete(logFileList[0]);
for (int i = 1; i < logFileList.Length; i++)
{
File.Move(logFileList[i], logFileList[i - 1]);
}
--fileCounter;
}
string currFilePath = logFileList[fileCounter];
FileInfo f = new FileInfo(currFilePath);
if (f.Length < MaxLogSize)
{
//still room in the current file
return currFilePath;
}
else
{
//need another filename
++fileCounter;
}
}
return string.Format("{0}{1}{2}{3:00}.log", path, Path.DirectorySeparatorChar, appName, fileCounter);
}
Usage:
string logFileName = RollLogFile();
using (StreamWriter sw = new StreamWriter(logFileName, true))
{
sw.AutoFlush = true;
sw.WriteLine(string.Format("{0:u} {1}", DateTime.Now, message));
}

This function lets you rotate your log by weekday. The first time your application launches on a Monday, it checks the Monday file for an entry with today's date; if there is none, it discards the old entries and starts a new file. For the rest of that day, it keeps appending text to the same log file.
So a total of 7 log files will be created:
debug-Mon.txt, debug-Tue.txt...
It also records the name of the method that logged the message, along with the date and time, which is very useful for general-purpose use.
private void log(string text)
{
string dd = DateTime.Now.ToString("yyyy-MM-dd");
string mm = DateTime.Now.ToString("ddd");
if (File.Exists("debug-" + mm + ".txt"))
{
String contents = File.ReadAllText("debug-" + mm + ".txt");
if (!contents.Contains("Date: " + dd))
{
File.Delete("debug-" + mm + ".txt");
}
}
File.AppendAllText("debug-" + mm + ".txt", "\r\nDate: " + DateTime.Now.ToString("yyyy-MM-dd HH:mm:ss") + " =>\t" + new System.Diagnostics.StackFrame(1, true).GetMethod().Name + "\t" + text);
}

I liked greggorob64's solution but also wanted to zip the old file. This has everything you need other than the part of compressing the old file to a zip, which you can find here: Create zip file in memory from bytes (text with arbitrary encoding)
static int iMaxLogLength = 2000; // Probably should be bigger, say 200,000
static int KeepLines = 5; // minimum of how much of the old log to leave
public static void ManageLogs(string strFileName)
{
try
{
FileInfo fi = new FileInfo(strFileName);
if (fi.Length > iMaxLogLength) // if the log file length is already too long
{
int TotalLines = 0;
var file = File.ReadAllLines(strFileName);
var LineArray = file.ToList();
var AmountToCull = (int)(LineArray.Count - KeepLines);
var trimmed = LineArray.Skip(AmountToCull).ToList();
File.WriteAllLines(strFileName, trimmed);
string archiveName = strFileName + "-" + DateTime.Now.ToString("MM-dd-yyyy") + ".zip";
File.WriteAllBytes(archiveName, Compression.Zip(string.Join("\n", file)));
}
}
catch (Exception ex)
{
Console.WriteLine("Failed to write to logfile : " + ex.Message);
}
}
I have this as part of the initialization / reinitialization section of my application, so it gets run a few times a day.
ErrorLogging.ManageLogs("Application.log");

I was looking through the win32 api, and I'm not even sure it's possible to do this with native win32 vfs calls, nevermind through .Net.
About the only solution I would have would be to use memory-mapped files and move the data manually, which .Net seems to support as of .Net 4.0.
Memory Mapped Files

how to get the oldest file in a directory fast using .NET?

I have a directory with around 15-30 thousand files. I need to just pull the oldest one. In other words the one that was created first. Is there a quick way to do this using C#, other than loading them into a collection then sorting?

You will have to load the FileInfo objects into a collection & sort, but it's a one-liner:
FileSystemInfo fileInfo = new DirectoryInfo(directoryPath).GetFileSystemInfos()
.OrderBy(fi => fi.CreationTime).First();
Ok, two lines because it's a long statement.

The short answer is no. Windows file systems don't index files by date so there is no native way to do this, let alone a .net way without enumerating all of them.

You can't do it without sorting but what you can do is make it fast.
Sorting by CreationTime can be slow because first accessing this property for each file involves interrogation of the file system.
Use A Faster Directory Enumerator, which preserves more information about files while enumerating and so allows faster sorting.
Code to compare performance:
static void Main(string[] args)
{
var timer = Stopwatch.StartNew();
var oldestFile = FastDirectoryEnumerator.EnumerateFiles(@"c:\windows\system32")
.OrderBy(f => f.CreationTime).First();
timer.Stop();
Console.WriteLine(oldestFile);
Console.WriteLine("FastDirectoryEnumerator - {0}ms", timer.ElapsedMilliseconds);
Console.WriteLine();
timer.Reset();
timer.Start();
var oldestFile2 = new DirectoryInfo(@"c:\windows\system32").GetFiles()
.OrderBy(f => f.CreationTime).First();
timer.Stop();
Console.WriteLine(oldestFile2);
Console.WriteLine("DirectoryInfo - {0}ms", timer.ElapsedMilliseconds);
Console.WriteLine("Press ENTER to finish");
Console.ReadLine();
}
For me it gives this:
VEN2232.OLB
FastDirectoryEnumerator - 27ms
VEN2232.OLB
DirectoryInfo - 559ms

Edit: Removed the sort and made it a function.
public static FileInfo GetOldestFile(string directory)
{
if (!Directory.Exists(directory))
throw new ArgumentException();
DirectoryInfo parent = new DirectoryInfo(directory);
FileInfo[] children = parent.GetFiles();
if (children.Length == 0)
return null;
FileInfo oldest = children[0];
foreach (var child in children.Skip(1))
{
if (child.CreationTime < oldest.CreationTime)
oldest = child;
}
return oldest;
}

Sorting is O(n log n). Instead, why don't you just enumerate the directory? I'm not sure what the C# equivalent of FindFirstFile()/FindNextFile() is, but what you want to do is:
Keep the current lowest date and filename in a local variable.
Enumerate the directory.
If the date on a given file is less than the local variable, set the local variable to the new date and filename.
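The steps above can be sketched in C# with DirectoryInfo.EnumerateFiles (available since .NET 4), which yields entries lazily instead of building the whole array first. This is a minimal illustration; the class and method names are hypothetical, and it compares CreationTimeUtc to stay independent of time zone changes.

```csharp
using System;
using System.IO;

public static class OldestFileFinder
{
    // Hypothetical helper: returns the file with the earliest creation time,
    // or null if the directory contains no files. Single O(n) pass, no sort.
    public static FileInfo Find(string directory)
    {
        FileInfo oldest = null;
        foreach (FileInfo file in new DirectoryInfo(directory).EnumerateFiles())
        {
            // Keep the current minimum in a local variable.
            if (oldest == null || file.CreationTimeUtc < oldest.CreationTimeUtc)
            {
                oldest = file;
            }
        }
        return oldest;
    }
}
```

This avoids the O(n log n) sort, although for 15-30 thousand files the time spent enumerating the directory is likely to matter more than the sort itself.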

Oddly enough, this worked perfectly on a directory of mine with 3000+ jpg files:
DirectoryInfo di = new DirectoryInfo(dpath);
FileInfo[] rgFiles = di.GetFiles("*.jpg");
FileInfo firstfile = rgFiles[0];
FileInfo lastfile = rgFiles[rgFiles.Length - 1];
DateTime oldestfiletime = firstfile.CreationTime;
DateTime newestfiletime = lastfile.CreationTime;

Here's a C# routine that may do what you want by spawning a cmd shell to execute dir /o:D on the specified directory and returning the name of the first file found.
static string GetOldestFile(string dirName)
{
ProcessStartInfo si = new ProcessStartInfo("cmd.exe");
si.RedirectStandardInput = true;
si.RedirectStandardOutput = true;
si.UseShellExecute = false;
Process p = Process.Start(si);
p.StandardInput.WriteLine(@"dir " + dirName + " /o:D");
p.StandardInput.WriteLine(@"exit");
string output = p.StandardOutput.ReadToEnd();
string[] splitters = { Environment.NewLine };
string[] lines = output.Split(splitters, StringSplitOptions.RemoveEmptyEntries);
// find first line with a valid date that does not have a <DIR> in it
DateTime result;
int i = 0;
while (i < lines.Length)
{
string[] tokens = lines[i].Split(' ');
if (DateTime.TryParse(tokens[0], out result))
{
if (!lines[i].Contains("<DIR>"))
{
return tokens[tokens.Length - 1];
}
}
i++;
}
return "";
}

Look, would it not be easier to shell out to a hidden process and redirect the output stream to the input and use the dir /o-d which sorts by the date/time, using the dash reverses the operation....
Edit: here's a sample code to do this...quick and dirty...
public class TestDir
{
private StringBuilder sbRedirectedOutput = new StringBuilder();
public string OutputData
{
get { return this.sbRedirectedOutput.ToString(); }
}
public void Run()
{
System.Diagnostics.ProcessStartInfo ps = new System.Diagnostics.ProcessStartInfo();
ps.FileName = "cmd";
ps.ErrorDialog = false;
ps.Arguments = string.Format("/c dir {0} /o-d", path_name); // /c makes cmd run the command and then exit
ps.CreateNoWindow = true;
ps.UseShellExecute = false;
ps.RedirectStandardOutput = true;
ps.WindowStyle = System.Diagnostics.ProcessWindowStyle.Hidden;
using (System.Diagnostics.Process proc = new System.Diagnostics.Process())
{
proc.StartInfo = ps;
proc.Exited += new EventHandler(proc_Exited);
proc.OutputDataReceived += new System.Diagnostics.DataReceivedEventHandler(proc_OutputDataReceived);
proc.Start();
// Begin reading before waiting; otherwise no output is captured.
proc.BeginOutputReadLine();
proc.WaitForExit();
}
}
void proc_Exited(object sender, EventArgs e)
{
System.Diagnostics.Debug.WriteLine("proc_Exited: Process Ended");
}
void proc_OutputDataReceived(object sender, System.Diagnostics.DataReceivedEventArgs e)
{
if (e.Data != null) this.sbRedirectedOutput.Append(e.Data + Environment.NewLine);
//System.Diagnostics.Debug.WriteLine("proc_OutputDataReceived: Data: " + e.Data);
}
}
The first 4 or 5 lines of the StringBuilder object sbRedirectedOutput can be chopped off; the line after that would contain the oldest filename and would be quite easy to parse out.

I also have thousands of files. To retrieve them in sorted order by the date modified, use either of these C# statements.
files = di.GetFiles("*.*").OrderByDescending(f => f.LastWriteTime).ToArray();
files = di.GetFiles("*.*").OrderBy(f => f.LastWriteTime).ToArray();
To make it easier for the user to access the appropriate file, I display one of the following two lines. I have two windows open. One lists the files in descending order; the other lists the files in ascending order. The descending-order list is updated by Windows. The ascending-order list is not updated, so the Hm key must be used to put the oldest file at the top of the list.
Console.WriteLine( "DateMod v (latest)");
Console.WriteLine( "DateMod ^ (oldest) Sel Hm");
