Using StreamWriter to implement a rolling log, and deleting from top - c#

My C# WinForms 4.0 application has been using a thread-safe StreamWriter to log internal debug information. When my app opens, it deletes the file and recreates it. When the app closes, it saves the file.
What I'd like to do is modify my application so that it appends instead of replacing. This is a simple fix.
However, here's my question:
I'd like to keep my log file around 10 megabytes maximum. My constraint is simple: when closing the file, if it is greater than 10 megabytes, trim out the first 10%.
Is there a 'better' way than doing the following:
Close the file
Check if the file is > 10 meg
If so, open the file
Parse the entire thing
Cull the first 10%
Write the file back out
Close
Edit: Well, I ended up rolling my own (shown below). The suggestion to move over to log4net is a good one, but the time it would take to learn the new library and migrate all my log statements (thousands of them) isn't time-effective for the small enhancement I was trying to make.
private static void PerformFileTrim(string filename)
{
    var FileSize = Convert.ToDecimal((new System.IO.FileInfo(filename)).Length);
    if (FileSize > 5000000)
    {
        var file = File.ReadAllLines(filename).ToList();
        var AmountToCull = (int)(file.Count * 0.33);
        var trimmed = file.Skip(AmountToCull).ToList();
        File.WriteAllLines(filename, trimmed);
    }
}

I researched this once and never came up with anything, but I can offer you plan B here:
I use the routine below to keep a maximum of 3 log files. At first, log file 1 is created and appended to. When it exceeds the maximum size, logs 2 and later 3 are created. When log 3 is too large, log 1 is deleted and the remaining logs are pushed down the stack.
// ctr, MaxLogSize and MaxLogCount are class-level fields in the original code
string[] logFileList = Directory.GetFiles(Path.GetTempPath(), "add_all_*.log", SearchOption.TopDirectoryOnly);
if (logFileList.Length > 1)
{
    Array.Sort(logFileList, 0, logFileList.Length);
}
if (logFileList.Any())
{
    string currFilePath = logFileList.Last();
    string[] dotSplit = currFilePath.Split('.');
    string lastChars = dotSplit[0].Substring(dotSplit[0].Length - 3);
    ctr = Int32.Parse(lastChars);
    FileInfo f = new FileInfo(currFilePath);
    if (f.Length > MaxLogSize)
    {
        if (logFileList.Length > MaxLogCount)
        {
            File.Delete(logFileList[0]);
            for (int i = 1; i < MaxLogCount + 1; i++)
            {
                Debug.WriteLine(string.Format("moving: {0} {1}", logFileList[i], logFileList[i - 1]));
                // push older log files back, in order to pop a new log on top
                File.Move(logFileList[i], logFileList[i - 1]);
            }
        }
        else
        {
            ctr++;
        }
    }
}

The solutions here did not really work for me. I took user3902302's answer, which in turn was based on bigtech's answer, and wrote a complete class. Also, I am NOT using StreamWriter; you can change the one line (File.AppendAllText against the StreamWriter equivalent).
There is little error handling (e.g. retrying access when it fails), though the lock should catch all internal concurrent access.
This might be enough for some people who previously had to pull in a big solution like log4net or NLog. (And log4net's RollingFileAppender is not even thread-safe; this one is. :) )
public class RollingLogger
{
    readonly static string LOG_FILE = @"c:\temp\logfile.log";
    readonly static int MaxRolledLogCount = 3;
    readonly static int MaxLogSize = 1024; // 1 * 1024 * 1024; small value for testing that it works, then use a reasonable size, like 1M-10M

    public static void LogMessage(string msg)
    {
        lock (LOG_FILE) // lock is optional, but.. should this ever be called by multiple threads, it is safer
        {
            RollLogFile(LOG_FILE);
            File.AppendAllText(LOG_FILE, msg + Environment.NewLine, Encoding.UTF8);
        }
    }

    private static void RollLogFile(string logFilePath)
    {
        try
        {
            var length = new FileInfo(logFilePath).Length;
            if (length > MaxLogSize)
            {
                var path = Path.GetDirectoryName(logFilePath);
                var wildLogName = Path.GetFileNameWithoutExtension(logFilePath) + "*" + Path.GetExtension(logFilePath);
                var bareLogFilePath = Path.Combine(path, Path.GetFileNameWithoutExtension(logFilePath));
                string[] logFileList = Directory.GetFiles(path, wildLogName, SearchOption.TopDirectoryOnly);
                if (logFileList.Length > 0)
                {
                    // only take files like logfilename.log and logfilename.0.log, so there also can be a maximum of 10 additional rolled files (0..9)
                    var rolledLogFileList = logFileList.Where(fileName => fileName.Length == (logFilePath.Length + 2)).ToArray();
                    Array.Sort(rolledLogFileList, 0, rolledLogFileList.Length);
                    if (rolledLogFileList.Length >= MaxRolledLogCount)
                    {
                        File.Delete(rolledLogFileList[MaxRolledLogCount - 1]);
                        var list = rolledLogFileList.ToList();
                        list.RemoveAt(MaxRolledLogCount - 1);
                        rolledLogFileList = list.ToArray();
                    }
                    // move remaining rolled files
                    for (int i = rolledLogFileList.Length; i > 0; --i)
                        File.Move(rolledLogFileList[i - 1], bareLogFilePath + "." + i + Path.GetExtension(logFilePath));
                    var targetPath = bareLogFilePath + ".0" + Path.GetExtension(logFilePath);
                    // move original file
                    File.Move(logFilePath, targetPath);
                }
            }
        }
        catch (Exception ex)
        {
            System.Diagnostics.Debug.WriteLine(ex.ToString());
        }
    }
}
Edit:
Since I just noticed that you asked a slightly different question: should your lines vary greatly in size, this would be a variation (one that in 90% of cases does not improve over yours, though it might be very slightly faster; it also introduces a new unhandled error when no '\n' is present):
private static void PerformFileTrim(string filename)
{
    var fileSize = (new System.IO.FileInfo(filename)).Length;
    if (fileSize > 5000000)
    {
        var text = File.ReadAllText(filename);
        var amountToCull = (int)(text.Length * 0.33);
        amountToCull = text.IndexOf('\n', amountToCull);
        var trimmedText = text.Substring(amountToCull + 1);
        File.WriteAllText(filename, trimmedText);
    }
}

This is derived from bigtech's answer:
private static string RollLogFile()
{
    string path = Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments);
    string appName = Path.GetFileNameWithoutExtension(Environment.GetCommandLineArgs()[0]);
    string wildLogName = string.Format("{0}*.log", appName);
    int fileCounter = 0;
    string[] logFileList = Directory.GetFiles(path, wildLogName, SearchOption.TopDirectoryOnly);
    if (logFileList.Length > 0)
    {
        Array.Sort(logFileList, 0, logFileList.Length);
        fileCounter = logFileList.Length - 1;
        // Make sure we apply the MaxLogCount (but only once to reduce the delay)
        if (logFileList.Length > MaxLogCount)
        {
            // Too many files - remove one and rename the others
            File.Delete(logFileList[0]);
            for (int i = 1; i < logFileList.Length; i++)
            {
                File.Move(logFileList[i], logFileList[i - 1]);
            }
            --fileCounter;
        }
        string currFilePath = logFileList[fileCounter];
        FileInfo f = new FileInfo(currFilePath);
        if (f.Length < MaxLogSize)
        {
            // still room in the current file
            return currFilePath;
        }
        else
        {
            // need another filename
            ++fileCounter;
        }
    }
    return string.Format("{0}{1}{2}{3:00}.log", path, Path.DirectorySeparatorChar, appName, fileCounter);
}
Usage:
string logFileName = RollLogFile();
using (StreamWriter sw = new StreamWriter(logFileName, true))
{
    sw.AutoFlush = true;
    sw.WriteLine(string.Format("{0:u} {1}", DateTime.Now, message));
}

This function will allow you to rotate your log based on weekdays. The first time your application launches on a Monday, it checks for an existing entry with Monday's date; if the file has not already been initialized for today, it discards the old entries and reinitializes the file. From then on, for the whole of that day, text keeps being appended to the same log file.
So, a total of 7 log files will be created:
debug-Mon.txt, debug-Tue.txt, ...
It also adds the name of the method that actually logged the message, along with the date and time. Very useful for general-purpose use.
private void log(string text)
{
    string dd = DateTime.Now.ToString("yyyy-MM-dd");
    string mm = DateTime.Now.ToString("ddd");
    if (File.Exists("debug-" + mm + ".txt"))
    {
        string contents = File.ReadAllText("debug-" + mm + ".txt");
        // if the existing file for this weekday is from a previous week, start over
        if (!contents.Contains("Date: " + dd))
        {
            File.Delete("debug-" + mm + ".txt");
        }
    }
    File.AppendAllText("debug-" + mm + ".txt",
        "\r\nDate: " + DateTime.Now.ToString("yyyy-MM-dd HH:mm:ss") + " =>\t" +
        new System.Diagnostics.StackFrame(1, true).GetMethod().Name + "\t" + text);
}

I liked greggorob64's solution but also wanted to zip the old file. This has everything you need except the part that compresses the old file to a zip, which you can find here: Create zip file in memory from bytes (text with arbitrary encoding)
static int iMaxLogLength = 2000; // Probably should be bigger, say 200,000
static int KeepLines = 5; // minimum of how much of the old log to leave

public static void ManageLogs(string strFileName)
{
    try
    {
        FileInfo fi = new FileInfo(strFileName);
        if (fi.Length > iMaxLogLength) // if the log file length is already too long
        {
            var file = File.ReadAllLines(strFileName);
            var LineArray = file.ToList();
            var AmountToCull = LineArray.Count - KeepLines;
            var trimmed = LineArray.Skip(AmountToCull).ToList();
            File.WriteAllLines(strFileName, trimmed);
            string archiveName = strFileName + "-" + DateTime.Now.ToString("MM-dd-yyyy") + ".zip";
            File.WriteAllBytes(archiveName, Compression.Zip(string.Join("\n", file)));
        }
    }
    catch (Exception ex)
    {
        Console.WriteLine("Failed to write to logfile : " + ex.Message);
    }
}
I have this as part of the initialization / reinitialization section of my application, so it gets run a few times a day.
ErrorLogging.ManageLogs("Application.log");
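For reference, here is a minimal sketch of what a Compression.Zip helper along the lines of that linked answer might look like, using ZipArchive from System.IO.Compression (requires .NET 4.5 and a reference to the System.IO.Compression assembly; the class name, method signature, and entry name are assumptions, not the linked answer's exact code):
public static class Compression
{
    // Hypothetical helper: compresses the given text into a single-entry
    // zip archive held in memory and returns the raw zip bytes.
    public static byte[] Zip(string text)
    {
        using (var ms = new System.IO.MemoryStream())
        {
            using (var archive = new System.IO.Compression.ZipArchive(ms, System.IO.Compression.ZipArchiveMode.Create, true))
            using (var writer = new System.IO.StreamWriter(archive.CreateEntry("log.txt").Open()))
            {
                writer.Write(text);
            }
            return ms.ToArray();
        }
    }
}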

I was looking through the Win32 API, and I'm not even sure it's possible to do this with native Win32 VFS calls, never mind through .NET.
About the only solution I would have would be to use memory-mapped files and move the data manually, which .NET supports as of .NET 4.0:
Memory Mapped Files


Asp.Net Mvc Delete file issue

I have an issue with files.
I am building an image importer: clients put their files on an FTP server and then they can import them into the application.
During the import process I copy the file in the FTP folder to another folder with File.Copy:
public List<Visuel> ImportVisuel(int galerieId, string[] images)
{
    Galerie targetGalerie = MemoryCache.GetGaleriById(galerieId);
    List<FormatImage> listeFormats = MemoryCache.FormatImageToList();
    int i = 0;
    List<Visuel> visuelAddList = new List<Visuel>();
    List<Visuel> visuelUpdateList = new List<Visuel>();
    List<Visuel> returnList = new List<Visuel>();
    foreach (string item in images)
    {
        i++;
        Progress.ImportProgress[Progress.Guid] = "Image " + i + " sur " + images.Count() + " importées";
        string extension = Path.GetExtension(item);
        string fileName = Path.GetFileName(item);
        string originalPath = HttpContext.Current.Request.PhysicalApplicationPath + "Uploads\\";
        string destinationPath = HttpContext.Current.Server.MapPath("~/Images/Catalogue") + "\\";
        Visuel importImage = MemoryCache.GetVisuelByFilName(fileName);
        bool update = true;
        if (importImage == null) { importImage = new Visuel(); update = false; }
        Size imageSize = importImage.GetJpegImageSize(originalPath + fileName);
        FormatImage format = listeFormats.Where(f => f.width == imageSize.Width && f.height == imageSize.Height).FirstOrDefault();
        string saveFileName = Guid.NewGuid() + extension;
        File.Copy(originalPath + fileName, destinationPath + saveFileName);
        if (format != null)
        {
            importImage.format = format;
            switch (format.key)
            {
                case "Catalogue":
                    importImage.fileName = saveFileName;
                    importImage.originalFileName = fileName;
                    importImage.dossier = targetGalerie;
                    importImage.dossier_id = targetGalerie.id;
                    importImage.filePath = "Images/Catalogue/";
                    importImage.largeur = imageSize.Width;
                    importImage.hauteur = imageSize.Height;
                    importImage.isRoot = true;
                    if (update == false) { MemoryCache.Add(ref importImage); returnList.Add(importImage); }
                    if (update == true) visuelUpdateList.Add(importImage);
                    foreach (FormatImage f in listeFormats)
                    {
                        if (f.key.StartsWith("Catalogue_"))
                        {
                            string[] keys = f.key.Split('_');
                            string destinationFileName = saveFileName.Insert(saveFileName.IndexOf('.'), "-" + keys[1].ToString());
                            string destinationFileNameDeclinaison = destinationPath + destinationFileName;
                            VisuelResizer declinaison = new VisuelResizer();
                            declinaison.Save(originalPath + fileName, f.width, f.height, 1000, destinationFileNameDeclinaison);
                            Visuel visuel = MemoryCache.GetVisuelByFilName(fileName.Insert(fileName.IndexOf('.'), "-" + keys[1].ToString()));
                            update = true;
                            if (visuel == null) { visuel = new Visuel(); update = false; }
                            visuel.parent = importImage;
                            visuel.filePath = "Images/Catalogue/";
                            visuel.fileName = destinationFileName;
                            visuel.originalFileName = string.Empty;
                            visuel.format = f;
                            //visuel.dossier = targetGalerie; // not needed for the derived sizes
                            visuel.largeur = f.width;
                            visuel.hauteur = f.height;
                            if (update == false)
                            {
                                visuelAddList.Add(visuel);
                            }
                            else
                            {
                                visuelUpdateList.Add(visuel);
                            }
                            //importImage.declinaisons.Add(visuel);
                        }
                    }
                    break;
            }
        }
    }
    MemoryCache.Add(ref visuelAddList);
    // FUNCTION to be implemented
    MemoryCache.Update(ref visuelUpdateList);
    return returnList;
}
After some processing on the copy (the original file is no longer used),
the client gets a pop-up asking whether he wants to delete the original files in the FTP folder.
If he clicks OK, another method is called on the same controller,
and this method uses:
public void DeleteImageFile(string[] files)
{
    for (int i = 0; i < files.Length; i++)
    {
        File.Delete(HttpContext.Current.Request.PhysicalApplicationPath + files[i].Replace(@"/", @"\"));
    }
}
This method works fine and really deletes the right files when I use it in other contexts.
But here I get an error message:
The process can't access the file ... because it's being used by another process.
Does anyone have an idea?
Thank you.
Here's the screenshot of Process Explorer
There are a couple of things you can do here.
1) If you can repro it, use Process Explorer at that moment to see which process is locking the file; if it is your own process, make sure you close the file handle after your work is done.
2) Use try/catch around the delete statement and retry after a few seconds to see if the file handle was released (see the sketch after this list).
3) If you can do it offline, you can put the path in some queue and do the deletion later on.
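For option 2, a minimal retry sketch (the retry count and delay are arbitrary assumptions):
private static void DeleteWithRetry(string path, int maxAttempts)
{
    for (int attempt = 1; attempt <= maxAttempts; attempt++)
    {
        try
        {
            File.Delete(path);
            return; // success
        }
        catch (IOException)
        {
            // the handle may not have been released yet; wait and try again
            if (attempt == maxAttempts)
                throw;
            System.Threading.Thread.Sleep(2000);
        }
    }
}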
You could solve this by using C# locks. Just embed your code inside a lock statement and your threads will be safe and wait for each other to complete processing.
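A minimal sketch of that idea, assuming a shared lock object (note this only helps when the competing access comes from threads inside your own application):
private static readonly object _fileLock = new object();

public void DeleteImageFile(string[] files)
{
    // only one thread at a time may enter this block
    lock (_fileLock)
    {
        for (int i = 0; i < files.Length; i++)
            File.Delete(HttpContext.Current.Request.PhysicalApplicationPath + files[i].Replace(@"/", @"\"));
    }
}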
I found the solution:
in my import method, there is a call to this method:
public void Save(string originalFile, int maxWidth, int maxHeight, int quality, string filePath)
{
    Bitmap image = new Bitmap(originalFile);
    Save(ref image, maxWidth, maxHeight, quality, filePath);
}
The Bitmap keeps the file open, which blocks the delete.
I just added
image.Dispose();
in the method and it works fine.
Thank you for your help, and thank you for Process Explorer. Very useful tool.
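Since the Bitmap is passed on by ref here, a using block will not compile; a try/finally sketch gives the same guarantee that the file handle is released even if the inner Save throws:
public void Save(string originalFile, int maxWidth, int maxHeight, int quality, string filePath)
{
    Bitmap image = new Bitmap(originalFile);
    try
    {
        Save(ref image, maxWidth, maxHeight, quality, filePath);
    }
    finally
    {
        // release the file handle even if the resize/save fails
        image.Dispose();
    }
}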

How to read and update multiple files

I have 10 txt files in Debug\Tests\Text\. I need to write a program that opens all 10 files and updates every single one. I'm not sure how to do it. Right now I'm reading the folder, getting the file names, and storing them in an array. Below is my code:
private void getFilesName()
{
    string[] fileArray = Directory.GetFiles(@"Tests\Text");
    // looping through the folder and getting the file names
    for (int i = 0; i < fileArray.Length; i++)
    {
        MessageBox.Show(fileArray[i]); // just to double-check that I get the file names
    }
}
This does read all the text file names, but the challenge now is to access each filename and update the file behind it. I have also created another method just for updating the values in the txt files; below is the code:
private bool modifySQLFile()
{
    string destFileName = @"Tests\Text\"; // I need the fileName?
    string[] fileTexts = File.ReadAllLines(destFileName);
    int counter = 0;
    // processing the file
    foreach (string line in fileTexts)
    {
        // only read those non-comment lines
        if (line.StartsWith("--") == false)
        {
            // start to replace instances of Access ID
            if (line.Contains(Variable) == true)
            {
                fileTexts[counter] = fileTexts[counter].Replace(Variable, textBox2.Text);
            }
        }
        counter++;
    }
    // check if the file exists in the backup folder
    if (File.Exists("Tests\\Text\\file name " + textBox1.Text + ".sql") == true)
    {
        MessageBox.Show("This file already exist in the backup folder");
        return false;
    }
    else
    {
        // update the file
        File.WriteAllLines(destFileName, fileTexts);
        File.Move(destFileName, "Tests\\Text\\file name" + textBox1.Text + ".sql");
        MessageBox.Show("Completed");
        return true;
    }
}
Your problem seems to be passing the filename variable from the loop to the method.
In order to do what you want, add a parameter to the method:
private bool ModifySQLFile(string filename)
{
    string[] fileTexts = File.ReadAllLines(filename);
    // ...
}
Then call the method with this parameter:
for (int i = 0; i < fileArray.Length; i++)
{
    ModifySQLFile(fileArray[i]);
}
But in general you really don't want to treat a formal language as plain text like this. It's very easy to break the SQL that way. What if the user wants to replace the text "insert", or replaces something with "foo'bar"?
First, implement one (file) modification:
private bool modifySQLFile(String file) {
    // given the source file, let's elaborate the target file name
    String targetFile = Path.Combine(
        Path.GetDirectoryName(file),
        String.Format("{0}{1}.sql",
            Path.GetFileNameWithoutExtension(file),
            textBox1.Text));

    // In case you want a back up
    //TODO: given source file name, elaborate back up file name
    //String backUpFile = Path.Combine(...);

    // Check (validate) before processing: do not override existing files
    if (File.Exists(targetFile))
        return false;

    //TODO: what if the back up file exists? Should we override it? Skip?

    // if a line doesn't start with SQL commentary --
    // and contains a variable, substitute the variable with its value
    var target = File
        .ReadLines(file)
        .Select(line => (!line.StartsWith("--") && line.Contains(Variable))
            ? line.Replace(Variable, textBox2.Text)
            : line);

    // write the modified lines into the target file
    File.WriteAllLines(targetFile, target);

    // In case you want a back up
    // Move file to backup
    //File.Move(file, backUpFile);

    return true;
}
Then call it in the loop:
// enumerate all the text files in the directory
var files = Directory
    .EnumerateFiles(@"Tests\Text", "*.txt");
//TODO: you may want to filter out some files with .Where
//.Where(file => ...);

// update all the files found above
foreach (var file in files) {
    if (!modifySQLFile(file))
        MessageBox.Show(String.Format("{0} already exists in the backup folder", file));
}
Please, do not:
Use magic values: what is @"Tests\Text\" within your modifySQLFile?
Mix UI (MessageBox.Show(...)) and logic: modifySQLFile returns true or false, and it's the caller who can display a message box.
Materialize when it's not required (Directory.GetFiles, File.ReadAllLines).
If you would like to edit the files in parallel, you can use threads to parallelize the work:
for (int i = 0; i < fileArray.Length; i++)
    new Thread(UpdateFileThread).Start(fileArray[i]);

private void UpdateFileThread(object path)
{
    string filePath = (string)path;
    //ToDo: Edit file
}
In your case you would create 10 threads. That solution works, but it is a bad pattern if you have to deal with more than 10 files.
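On .NET 4.0 you can let the thread pool do the scheduling instead of creating one thread per file, e.g. with Parallel.ForEach (a sketch; the degree of parallelism is an arbitrary assumption):
// requires: using System.Threading.Tasks;
Parallel.ForEach(
    fileArray,
    new ParallelOptions { MaxDegreeOfParallelism = 4 },
    path =>
    {
        //ToDo: Edit the file at 'path'
    });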
Below I have posted real code which I have used in a project:
protected void btnSqlfinder_Click(object sender, EventArgs e)
{
    // define the path of the directory where all files are saved
    string filepath = @"D:\TPMS\App_Code\";
    // get all the file names inside the directory
    string[] files = Directory.GetFiles(filepath);
    // loop through the files to search them one by one
    for (int i = 0; i < files.Length; i++)
    {
        string sourcefilename = files[i];
        StreamReader sr = File.OpenText(sourcefilename);
        string sourceline = "";
        int lineno = 0;
        while ((sourceline = sr.ReadLine()) != null)
        {
            lineno++;
            // check the line for each search keyword
            if (sourceline.Contains("from"))
            {
                // append the result to the multiline text box
                TxtResult.Text += sourcefilename + lineno.ToString() + sourceline + System.Environment.NewLine;
            }
            if (sourceline.Contains("into"))
            {
                TxtResult.Text += sourcefilename + lineno.ToString() + sourceline + System.Environment.NewLine;
            }
            if (sourceline.Contains("set"))
            {
                TxtResult.Text += sourcefilename + lineno.ToString() + sourceline + System.Environment.NewLine;
            }
            if (sourceline.Contains("delete"))
            {
                TxtResult.Text += sourcefilename + lineno.ToString() + sourceline + System.Environment.NewLine;
            }
        }
    }
}
This code will fetch the files in the given directory and show the lines matching each keyword in a separate text box.
You can easily change it as per your requirements. Kindly let me know your thoughts.
Thanks

Slow String Formatting in C# when using more than a few lines

I have created a process which reads a "template" text file and then, based on the String.Format requirements, uses the tokens to place my custom text in it.
So everything works, but the process is slow.
The template file can have about 500-1000 lines; I am looking for a way to speed this process up.
Any ideas?
Here is my code below:
templateFilePath = System.IO.Path.GetDirectoryName(System.Reflection.Assembly.GetExecutingAssembly().GetName().CodeBase).Replace("file:\\", "");
templateFilePath += "\\Templates\\TemplateFile.txt";
tempRequestFilePath = System.IO.Path.GetTempPath();
tempRequestFilePath += Guid.NewGuid();
Directory.CreateDirectory(tempRequestFilePath);
responseFileToWrite = tempRequestFilePath + "\\" + Path.GetFileNameWithoutExtension(zipMergeFilePath) + ".RSP";
if (!File.Exists(templateFilePath))
{
    return false;
}
templateString = System.IO.File.ReadAllText(templateFilePath);
currentRecordNumber = 1;
for (int i = 0; i < createToProcess.rtfText.Lines.Length; i++)
{
    if (createToProcess.rtfText.Lines[i].Contains("TAG ID:"))
    {
        string currentTagID = createToProcess.rtfText.Lines[i].Substring(9, 11).Trim();
        string currentCustomerNumber = createToProcess.rtfText.Lines[i].Substring(25, 12).Trim();
        string currentTaxPeriod = createToProcess.rtfText.Lines[i].Substring(42, 8).Trim();
        string currentCustomerPhoneNumber = createToProcess.rtfText.Lines[i].Substring(55, 9).Trim();
        DateTime datePurchases = DateTime.Now.AddDays(-7);
        DateTime dateReceived = DateTime.Now.AddYears(10);
        DateTime dateModified = DateTime.Now.AddYears(-1);
        string currentResearchCreateRecord = String.Format(templateString,
            currentTagID.PadRight(6),
            currentCustomerNumber.PadRight(12),
            currentTaxPeriod.PadRight(6),
            currentCustomerPhoneNumber.PadRight(8),
            datePurchases.Month.ToString("00") + datePurchases.Day.ToString("00") + datePurchases.Year.ToString("0000"),
            "RecordNo: " + currentRecordNumber.ToString(),
            dateReceived.Month.ToString("00") + dateReceived.Day.ToString("00") + dateReceived.Year.ToString("0000"),
            dateModified.Month.ToString("00") + dateModified.Day.ToString("00") + dateModified.Year.ToString("0000")
        );
        System.Windows.Forms.Application.DoEvents();
        File.AppendAllText(responseFileToWrite, currentResearchCreateRecord);
        currentRecordNumber += 1;
    }
}
using (ZipFile currentZipFile = new ZipFile())
{
    currentZipFile.AddFile(responseFileToWrite, "");
    currentZipFile.Save(zipMergeFilePath);
}
return true;
You're re-opening the file handle for each line. That's an expensive operation, and slows you down.
Instead, create (in a using block) a StreamWriter for the file, and call WriteLine() to write a single line without closing the file.
Also, reading the Lines property is quite slow. Change that to a foreach loop (or just cache the array) instead of rerunning all that code for each line.
Finally, don't call DoEvents().
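Putting those suggestions together, a sketch of the loop (assuming the same variables as the code above):
// open the response file once, not once per record
using (var writer = new StreamWriter(responseFileToWrite, true))
{
    // cache the Lines array instead of re-reading the property every iteration
    string[] lines = createToProcess.rtfText.Lines;
    foreach (string line in lines)
    {
        if (!line.Contains("TAG ID:"))
            continue;
        // ... build currentResearchCreateRecord with String.Format as before ...
        // writer.Write(currentResearchCreateRecord);
    }
}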
Be careful with the "+" operator on strings; repeated concatenation is very slow.
You should use the StringBuilder class:
System.Text.StringBuilder sb = new System.Text.StringBuilder((int)(sLen * Loops * 1.1));
for (i = 0; i < Loops; i++)
    sb.Append(sSource);
sDest = sb.ToString();
https://support.microsoft.com/en-us/kb/306822

"The process cannot access the file because it is being used by another process"

The full error I am receiving is:
"The process cannot access the file 'e:\Batch\NW\data_Test\IM_0232\input\RN318301.WM' because it is being used by another process.>>> at IM_0232.BatchModules.BundleSort(String bundleFileName)
at IM_0232.BatchModules.ExecuteBatchProcess()"
The involved code can be seen below. The RN318301.WM file being processed is a text file that contains information which will eventually be placed in PDF documents. There are many documents referenced in the RN318301.WM text file, each one represented by a collection of rows.
As can be seen in the code, the RN318301.WM text file is first parsed to determine the number of documents represented in it as well as the maximum number of lines in a document. This information is then used to create a two-dimensional array that will contain all of the document information. The RN318301.WM text file is parsed again to populate the two-dimensional array, and at the same time information is collected into a dictionary that will be sorted later in the routine.
The failure occurs at the last line below:
File.Delete(_bundlePath + Path.GetFileName(bundleFileName));
This is a sporadic problem that occurs only rarely. It has even been seen to occur with a particular text file with which it had not previously occurred. That is, a particular text file will process fine but then on reprocessing the error will be triggered.
Can anyone help us to diagnose the cause of this error? Thank you very much...
public void BundleSort(string bundleFileName)
{
    Dictionary<int, string> memberDict = new Dictionary<int, string>();
    Dictionary<int, string> sortedMemberDict = new Dictionary<int, string>();
    //int EOBPosition = 0;
    int EOBPosition = -1;
    int lineInEOB = 0;
    int eobCount = 0;
    int lineCount = 0;
    int maxLineCount = 0;
    string compareString;
    string EOBLine;
    //string[][] EOBLineArray;
    string[,] EOBLineArray;
    try
    {
        _batch.TranLog_Write("\tBeginning sort of bundle " + _bundleInfo.BundleName + " to facilitate householding");
        // Read the bundle and create a dictionary of comparison strings with EOB position in the bundle being the key
        StreamReader file = new StreamReader(@_bundlePath + _bundleInfo.BundleName);
        // The next section of code counts CH records as well as the maximum number of CD records in an EOB.
        // This information is needed for initialization of the 2-dimensional EOBLineArray array.
        while ((EOBLine = file.ReadLine()) != null)
        {
            if (EOBLine.Substring(0, 2) == "CH" || EOBLine.Substring(0, 2) == "CT")
            {
                if (lineCount == 0)
                    lineCount++;
                if (lineCount > maxLineCount)
                {
                    maxLineCount = lineCount;
                }
                eobCount++;
                if (lineCount != 1)
                    lineCount = 0;
            }
            if (EOBLine.Substring(0, 2) == "CD")
            {
                lineCount++;
            }
        }
        EOBLineArray = new string[eobCount, maxLineCount + 2];
        // note: the first StreamReader is overwritten here without being closed
        file = new StreamReader(@_bundlePath + _bundleInfo.BundleName);
        try
        {
            while ((EOBLine = file.ReadLine()) != null)
            {
                if (EOBLine.Substring(0, 2) == "CH")
                {
                    EOBPosition++;
                    lineInEOB = 0;
                    compareString = EOBLine.Substring(8, 40).Trim() + EOBLine.Substring(49, 49).TrimEnd().TrimStart() + EOBLine.Substring(120, 5).TrimEnd().TrimStart();
                    memberDict.Add(EOBPosition, compareString);
                    EOBLineArray[EOBPosition, lineInEOB] = EOBLine;
                }
                else
                {
                    if (EOBLine.Substring(0, 2) == "CT")
                    {
                        EOBPosition++;
                        EOBLineArray[EOBPosition, lineInEOB] = EOBLine;
                    }
                    else
                    {
                        lineInEOB++;
                        EOBLineArray[EOBPosition, lineInEOB] = EOBLine;
                    }
                }
            }
        }
        catch (Exception ex)
        {
            throw ex;
        }
        _batch.TranLog_Write("\tSending original unsorted bundle to archive");
        if (!(File.Exists(_archiveDir + "\\" + DateTime.Now.ToString("yyyyMMdd") + Path.GetFileName(bundleFileName) + "_original")))
        {
            File.Copy(_bundlePath + Path.GetFileName(bundleFileName), _archiveDir + "\\" + DateTime.Now.ToString("yyyyMMdd") + Path.GetFileName(bundleFileName) + "_original");
        }
        file.Close();
        file.Dispose();
        GC.Collect();
        File.Delete(_bundlePath + Path.GetFileName(bundleFileName));
You didn't close/dispose your StreamReader the first time round, so the file handle is still open.
Consider using the using construct - this will automatically dispose of the object when it goes out of scope:
using (var file = new StreamReader(args))
{
    // Do stuff
}
// file has now been disposed/closed etc
You need to close your StreamReaders for one thing.
StreamReader file = new StreamReader(@_bundlePath + _bundleInfo.BundleName);
You need to close the StreamReader object, and you could do this in a finally block:
finally
{
    file.Close();
}
A better way is to use a using block:
using (StreamReader file = new StreamReader(@_bundlePath + _bundleInfo.BundleName))
{
    ...
}
It looks to me like you are calling GC.Collect to try to force the closing of these StreamReaders, but that doesn't guarantee that they will be closed immediately as per the MSDN doc:
http://msdn.microsoft.com/en-us/library/xe0c2357.aspx
From that doc:
"All objects, regardless of how long they have been in memory, are considered for collection;"

looking for way to read and search file fast in c#

I have a 100 MB text file and I need to check every line for a special word.
I am looking for a fast way to do it,
so I divided the file into 10 chunks:
public void ParseTheFile(BackgroundWorker bg)
{
    Lines = File.ReadAllLines(FilePath);
    this.size = Lines.Length;
    chankSise = size / 10;
    reports reportInst = new reports(bg, size);
    ParserThread[] ParserthreadArray = new ParserThread[10];
    for (int i = 0; i < ParserthreadArray.Length; i++)
    {
        ParserthreadArray[i] = new ParserThread(reportInst);
        ParserthreadArray[i].Init(SubArray(Lines, i * chankSise, chankSise), OutputPath);
    }
    Thread oThread0 = new Thread(ParserthreadArray[0].run);
    oThread0.IsBackground = true;
    Thread oThread1 = new Thread(ParserthreadArray[1].run);
    oThread1.IsBackground = true;
    Thread oThread2 = new Thread(ParserthreadArray[2].run);
    oThread2.IsBackground = true;
    Thread oThread3 = new Thread(ParserthreadArray[3].run);
    oThread3.IsBackground = true;
    Thread oThread4 = new Thread(ParserthreadArray[4].run);
    oThread4.IsBackground = true;
    Thread oThread5 = new Thread(ParserthreadArray[5].run);
    oThread5.IsBackground = true;
    Thread oThread6 = new Thread(ParserthreadArray[6].run);
    oThread6.IsBackground = true;
    Thread oThread7 = new Thread(ParserthreadArray[7].run);
    oThread7.IsBackground = true;
    Thread oThread8 = new Thread(ParserthreadArray[8].run);
    oThread8.IsBackground = true;
    Thread oThread9 = new Thread(ParserthreadArray[9].run);
    oThread9.IsBackground = true;
    oThread0.Start();
    oThread1.Start();
    oThread2.Start();
    oThread3.Start();
    oThread4.Start();
    oThread5.Start();
    oThread6.Start();
    oThread7.Start();
    oThread8.Start();
    oThread9.Start();
    oThread0.Join();
    oThread1.Join();
    oThread2.Join();
    oThread3.Join();
    oThread4.Join();
    oThread5.Join();
    oThread6.Join();
    oThread7.Join();
    oThread8.Join();
    oThread9.Join();
}
this is the Init method:
public void Init(string[] olines, string outputPath)
{
    Lines = olines;
    OutputPath = outputPath + "/" + "ThreadTemp" + threadID;
}
this is the SubArray method:
public string[] SubArray(string[] data, int index, int length)
{
    string[] result = new string[length];
    Array.Copy(data, index, result, 0, length);
    return result;
}
and each thread does this:
public void run()
{
    if (!System.IO.Directory.Exists(OutputPath))
    {
        System.IO.Directory.CreateDirectory(OutputPath);
        DirectoryInfo dir = new DirectoryInfo(OutputPath);
        dir.Attributes |= FileAttributes.Hidden;
    }
    this.size = Lines.Length;
    foreach (string line in Lines)
    {
        bgReports.sendreport(allreadychecked);
        allreadychecked++;
        hadHandlerOrEngine = false;
        words = line.Split(' ');
        if (words.Length > 4)
        {
            for (int i = 5; i < words.Length; i++)
            {
                if (words[i] == "Handler" || words[i] == "Engine")
                {
                    hadHandlerOrEngine = true;
                    string num = words[1 + i];
                    int realnum = int.Parse(num[0].ToString());
                    cuurentEngine = realnum;
                    if (engineArry[realnum] == false)
                    {
                        File.Create(OutputPath + "/" + realnum + ".txt").Close();
                        engineArry[realnum] = true;
                    }
                    TextWriter tw = new StreamWriter(OutputPath + "/" + realnum + ".txt", true);
                    tw.WriteLine(line);
                    tw.Close();
                    break;
                }
            }
        }
        if (hadHandlerOrEngine == false)
        {
            if (engineArry[cuurentEngine] == true)
            {
                TextWriter tw = new StreamWriter(OutputPath + "/" + cuurentEngine + ".txt", true);
                tw.WriteLine(line);
                tw.Close();
            }
        }
    }
}
My question: is there any way to make this run faster?
You haven't shown your Init method, but at the moment it looks like each of your threads will actually be checking all of the lines. Additionally, it looks like all of those may be trying to write to the same files - and not doing so in an exception-safe way (using using statements) either.
EDIT: Okay, so now we can see Init but we can't see SubArray. Presumably it just copies a chunk of the array.
How slow is this if you avoid using threads to start with? Is it definitely too slow? What is your performance target? It seems unlikely that using 10 threads is going to help though, given that at that point it's entirely memory/CPU-bound. (You should also try to avoid repeating so much code for starting all the threads - why aren't you using a collection for that?)
You are probably IO bound, so I'd guess that multiple threads aren't going to help much. (Odds are your program spends most of its time here: Lines = File.ReadAllLines(FilePath); and not that much time actually parsing. You should measure though.) In fact, your SubArray splitting is possibly slower than if you just passed the whole thing to a single parser thread.
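One cheap thing to measure: drop the threads and stream the file with File.ReadLines (new in .NET 4), which reads lazily instead of materializing the whole 100 MB array up front. A single-threaded sketch:
foreach (string line in File.ReadLines(FilePath))
{
    // only a small read-ahead buffer is held in memory at any time;
    // scan 'line' for the keyword here
}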
I would be looking at MemoryMappedFile (if this is .NET 4) which should help some with IO by not having to make copies of all the source data.
I would like to recommend something which may be useful. As someone said, there is no point in assigning multiple threads to read your file, since this is mostly I/O activity, which in this case gets queued up in the OS file manager. But you can definitely place an async I/O request for any available I/O completion thread to look after.
Now when it comes to processing the file, I would recommend you use memory-mapped files. Memory-mapped files are ideal for scenarios where an arbitrary chunk (view) of a considerably larger file needs to be accessed repeatedly/separately. In your scenario, memory-mapped files can also help you split/assemble the file if the chunks arrive/process out of order.
I have no handy examples at the moment. Have a look at the following article: Memory Mapped Files.
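For what it's worth, a minimal sketch of scanning a file through a memory-mapped view on .NET 4 (the keyword "Engine" is an assumed placeholder):
// requires: using System.IO; using System.IO.MemoryMappedFiles;
using (var mmf = MemoryMappedFile.CreateFromFile(FilePath, FileMode.Open))
using (var stream = mmf.CreateViewStream())
using (var reader = new StreamReader(stream))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        if (line.Contains("Engine")) // assumed keyword
        {
            // handle the matching line
        }
    }
}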
