Writing a file needs to be optimised for heavy traffic - C#

I am very new to C#, and this is my first question, so please be gentle with me.
I am trying to write an application to capture some tick data from the data provider; below is the main part of the program:
void zf_TickEvent(object sender, ZenFire.TickEventArgs e)
{
    output myoutput = new output();
    myoutput.time = e.TimeStamp;
    myoutput.product = e.Product.ToString();
    myoutput.type = Enum.GetName(typeof(ZenFire.TickType), e.Type);
    myoutput.price = e.Price;
    myoutput.volume = e.Volume;
    using (StreamWriter writer = File.AppendText("c:\\log222.txt"))
    {
        writer.Write(myoutput.time.ToString(timeFmt) + ",");
        writer.Write(myoutput.product + ",");
        writer.Write(myoutput.type + ",");
        writer.Write(myoutput.price + ",");
        writer.Write(myoutput.volume + ",");
    }
}
I have successfully written the data into the text file. However, I know this method will be called around 10,000 times a second during peak time, and opening and appending to a file that many times a second is very inefficient. I was pointed towards using a buffer of some sort, but I have no idea how to do that; I tried reading the documentation but I still don't understand, which is why I am turning here for help.
Please give me some (working) snippet code so I can be pointed in the right direction. Thanks.
EDIT: I have simplified the code as much as possible:
using (StreamWriter streamWriter = File.AppendText("c:\\output.txt"))
{
    streamWriter.WriteLine(string.Format("{0},{1},{2},{3},{4}",
        e.TimeStamp.ToString(timeFmt),
        e.Product.ToString(),
        Enum.GetName(typeof(ZenFire.TickType), e.Type),
        e.Price,
        e.Volume));
}
ED has told me to make my stream a field; what does the syntax look like? Can anyone post some code to help me? Thanks a lot.

You need to create a field for the stream instead of a local variable. Initialize it in the constructor once, and don't forget to close it somewhere. It's better to implement the IDisposable interface and close the stream in the Dispose() method.
class MyClass : IDisposable
{
    private StreamWriter _writer;

    public MyClass()
    {
        _writer = File.AppendText("c:\\log222.txt"); // opened once, reused for every tick
    }

    void zf_TickEvent(object sender, ZenFire.TickEventArgs e)
    {
        output myoutput = new output();
        myoutput.time = e.TimeStamp;
        myoutput.product = e.Product.ToString();
        myoutput.type = Enum.GetName(typeof(ZenFire.TickType), e.Type);
        myoutput.price = e.Price;
        myoutput.volume = e.Volume;
        _writer.Write(myoutput.time.ToString(timeFmt) + ",");
        _writer.Write(myoutput.product + ",");
        _writer.Write(myoutput.type + ",");
        _writer.Write(myoutput.price + ",");
        _writer.Write(myoutput.volume + ",");
    }

    public void Dispose()
    {
        _writer.Dispose(); // flushes buffered data and closes the file
    }
}
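For completeness, wiring it up might look roughly like this (a sketch; the zenFire connection object is an assumption based on the handler signature, and the handler would need to be public to subscribe from outside the class):
var logger = new MyClass();
zenFire.TickEvent += logger.zf_TickEvent; // subscribe the handler
// ... receive ticks ...
logger.Dispose(); // on shutdown: flushes and closes the file exactly once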

There are many things you can do.
Step 1. Make sure you don't make many I/O calls and string concatenations.
Output myOutput = new Output(e); // Maybe construct from event args?
// Single write call, single string.Format
writer.Write(string.Format("{0},{1},{2},{3},{4},{5}",
    myOutput.Time.ToString(),
    myOutput.Product,
    ...));
This I recommend regardless of what your current performance is. I also made some cosmetic changes (variable/property/class name casing; you should look up the difference between variables and properties and their recommended casing, etc.).
Step 2. Analyse your performance to see if it does what you want. If it does, no need to do anything further. If performance is still too bad, you can:
Keep the file open and close it when your handler shuts down.
Write to a buffer and flush it at regular intervals (a sketch combining this with the previous point follows below).
Use a logging framework like log4net that internally handles the above for you and takes care of hairy issues like access to the log file from multiple threads.
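For illustration, a minimal sketch of the first two options combined (the TickLogger name is invented; error handling omitted). The tick handler only enqueues a string, and a single background thread owns the file:
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

class TickLogger : IDisposable
{
    private readonly BlockingCollection<string> _queue = new BlockingCollection<string>();
    private readonly StreamWriter _writer;
    private readonly Thread _worker;

    public TickLogger(string path)
    {
        _writer = new StreamWriter(path, true);      // opened once, kept open
        _worker = new Thread(() =>
        {
            foreach (string line in _queue.GetConsumingEnumerable())
                _writer.WriteLine(line);             // StreamWriter buffers internally
        });
        _worker.IsBackground = true;
        _worker.Start();
    }

    public void Enqueue(string line)
    {
        _queue.Add(line);                            // cheap and thread-safe; call this from the tick handler
    }

    public void Dispose()
    {
        _queue.CompleteAdding();                     // let the worker drain what is left
        _worker.Join();
        _writer.Dispose();                           // flushes remaining buffered data
    }
}
The handler then becomes a single Enqueue call, and the disk sees one writer no matter how many threads raise ticks.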

I would use String.Format:
using (StreamWriter writer = new StreamWriter(@"c:\log222.txt", true))
{
    writer.AutoFlush = true;
    writer.Write(String.Format("{0},{1},{2},{3},{4},", myoutput.time.ToString(timeFmt),
        myoutput.product, myoutput.type, myoutput.price, myoutput.volume));
}
If you use @ before the string you don't have to use double backslashes.
This is much faster: you write only once to the file instead of five times. Additionally, you don't use the + operator on strings, which is not the fastest operation ;)
Also, if this is a multithreaded application, you should consider using a lock. It would prevent the application from trying to write to the file from, e.g., two threads at the same time.
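For example, a sketch (assuming _writer is a shared StreamWriter field and _sync a private lock object):
private static readonly object _sync = new object();

void WriteTick(string line)
{
    lock (_sync)               // threads queue here instead of colliding on the file
    {
        _writer.WriteLine(line);
    }
}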


Is there a memory efficient way to use 'using' within a recursive function when e.g. writing lines to a file?
I read C# 'using' block inside a loop, and it mentioned that you don't want to put a using statement inside a for loop unless you have to (which makes sense: one doesn't want multiple instances of 'using' if one doesn't need them). So in the case of a for loop, if you can put it outside, you do.
But here I have a recursive function, so the 'using' statement is going to run multiple times even if I put it outside of a for loop.
So what is a good or proper way of placing the 'using' statement?
I don't know if I should avoid 'using' and declare the StreamWriter object, StreamWriter writetext, before the method call and dispose of it afterwards with writetext.Dispose(). Or maybe there is a more conventional way with 'using'. Maybe wrapping the 'main' call DirSearch_basic_writetofile("c:\\aaa"); with a 'try' and putting the Dispose line in a finally, avoiding 'using' altogether. That's just a thought.
// requires directory c:\texts
File.Delete(@"c:\texts\filelist.txt");
// list files and subdirectories of c:\aaa and write them to file "c:\texts\filelist.txt"
DirSearch_basic_writetofile("c:\\aaa");
// recursive function that lists files and directories and subdirectories in the given directory
static void DirSearch_basic_writetofile(string sDir)
{
    Console.WriteLine("DirSearch(" + sDir + ")");
    Console.WriteLine(sDir + @"\");
    try
    {
        using (StreamWriter writetext = new StreamWriter("c:\\texts\\filelist.txt", true))
        {
            writetext.WriteLine("DirSearch(" + sDir + ")");
            writetext.WriteLine(sDir);
            foreach (string f in Directory.GetFiles(sDir))
            {
                Console.WriteLine(f);
                writetext.WriteLine(f);
            }
        }
        foreach (string d in Directory.GetDirectories(sDir))
        {
            DirSearch_basic_writetofile(d);
        }
    }
    catch (System.Exception excpt)
    {
        Console.WriteLine(excpt.Message);
    }
}
The linked question covers the case where you use the same resource for all iterations of the loop. In that case, opening and closing it every iteration serves no purpose; as long as it is closed by the end of all the loops, it is safe enough.
The opposite case is when you use a different resource every iteration, say, when going over a list of filenames or full paths to open each in turn. In that case you have no choice but to have a new file-related instance each iteration.
A recursion is not really different from a loop. You can always replace a loop with a recursion, but the opposite is not always the case. The same rules apply:
If it is the same resource, you just have to move the creation of the resource outside the recursive function. Rather than taking a path (or using a hardcoded one), let it take a Stream. That keeps the function nicely generic.
If you have a different resource, you have no choice but to create a new instance with a new using every recursion. However, I cannot think of any 'recursive using' case.
If you have to iterate over all files in a directory including all subdirectories, you would have the recursive function recurse over the directories (no unmanaged resource needed), and then a loop inside the recursive function to iterate over the files in the current directory (which does require an unmanaged resource).
Edit:
static void DirSearch_basic_writetofile(string currentDirectory, StreamWriter output)
{
    // do your thing, using output as the stream you write to
    // do recursive calls as normal
    DirSearch_basic_writetofile(subDir, output);
}
calling it:
using (StreamWriter outputWriter = new StreamWriter("c:\\texts\\filelist.txt", true))
{
    DirSearch_basic_writetofile(startDirectory, outputWriter);
}
If we want to solve this using yield return:
You might want to restructure the code so that you separate out the recursive part, for example with a yield return.
Something like the below (sorry, no IDE at hand, let's see if this works) is a simplistic approach.
If you need to write out the new header (DirSearch(" + sDir + ")) every time you switch directory, that's doable by having the producer return, instead of only strings, an object containing a String directoryName and a List<String> fileNames, returning only once for each directory.
static void DirSearch_basic_writetofile(string sDir)
{
    Console.WriteLine("DirSearch(" + sDir + ")");
    Console.WriteLine(sDir + @"\");
    IEnumerable<String> producer = DirSearch_Producer(sDir);
    try
    {
        using (StreamWriter writetext = new StreamWriter("c:\\texts\\filelist.txt", true))
        {
            writetext.WriteLine("DirSearch(" + sDir + ")");
            writetext.WriteLine(sDir);
            foreach (string f in producer)
            {
                Console.WriteLine(f);
                writetext.WriteLine(f);
            }
        }
    }
    catch (System.Exception excpt)
    {
        Console.WriteLine(excpt.Message);
    }
}
public static IEnumerable<String> DirSearch_Producer(string sDir)
{
    foreach (string f in Directory.GetFiles(sDir))
    {
        yield return f;
    }
    foreach (string d in Directory.GetDirectories(sDir))
    {
        foreach (String f in DirSearch_Producer(d))
        {
            yield return f;
        }
    }
}
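As an aside, the per-directory variant suggested above could look roughly like this sketch (DirEntry is an invented name; requires System.Linq for ToList):
class DirEntry
{
    public string DirectoryName;
    public List<string> FileNames;
}

public static IEnumerable<DirEntry> DirSearch_ProducerByDir(string sDir)
{
    yield return new DirEntry
    {
        DirectoryName = sDir,
        FileNames = Directory.GetFiles(sDir).ToList()   // one item per directory
    };
    foreach (string d in Directory.GetDirectories(sDir))
    {
        foreach (DirEntry e in DirSearch_ProducerByDir(d))
        {
            yield return e;
        }
    }
}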
Alternative, without using yield return:
We can use Directory.GetFiles with EnumerationOptions to go through subdirectories as well, which makes things much simpler. See: RecurseSubdirectories.
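A sketch of that (EnumerationOptions requires .NET Core 2.1 or later; it is not available on .NET Framework):
var options = new EnumerationOptions { RecurseSubdirectories = true };
using (StreamWriter writetext = new StreamWriter(@"c:\texts\filelist.txt", true))
{
    foreach (string f in Directory.GetFiles(sDir, "*", options))
    {
        writetext.WriteLine(f);
    }
}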
I am going to leave Christopher's answer as accepted.
I will just note here the solution I used:
using (writetext = new StreamWriter("c:\\texts\\filelist.txt", true))
    DirSearch_basic_writetofile_quicker("c:\\aaa");
I also used a StringBuilder and wrote to console and file afterwards:
// https://stackoverflow.com/questions/15443739/what-is-the-simplest-way-to-write-the-contents-of-a-stringbuilder-to-a-text-file
System.IO.File.WriteAllText(@"c:\texts\filelist.txt", sb.ToString());
Console.Write(sb.ToString());
mjwillis had suggested to me that writing to the console one line at a time was a bottleneck, and Christopher mentioned writing to the file at the end, so I just write to both at the end, after the call.
I used a static variable for the StreamWriter and a static variable for the StringBuilder, so that Main() and the recursive function could both see them. I didn't want to create a new parameter for the recursive function, because I wanted to keep the recursive call looking 'simple' (one parameter).
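In outline, the shape of that solution was roughly the following sketch (DirSearch_collect is a stand-in name for my renamed function; usings omitted):
static StringBuilder sb = new StringBuilder();

static void Main()
{
    DirSearch_collect(@"c:\aaa");
    System.IO.File.WriteAllText(@"c:\texts\filelist.txt", sb.ToString());
    Console.Write(sb.ToString());        // one console write at the very end
}

static void DirSearch_collect(string sDir)
{
    sb.AppendLine("DirSearch(" + sDir + ")");
    foreach (string f in Directory.GetFiles(sDir))
    {
        sb.AppendLine(f);
    }
    foreach (string d in Directory.GetDirectories(sDir))
    {
        DirSearch_collect(d);
    }
}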

How to read and write more than 25,000 records/lines into a text file at a time?

I am connecting my application to a stock market live data provider using a web socket. When the market is live and the socket is open, it gives me nearly 45,000 lines a minute. At the moment I deserialize it line by line, write each line into a text file, and also read the text file and remove its first line. Handling another process alongside the socket thus becomes slow. So please can you help me with how to perform this process very fast, around 25,000 lines a minute?
string filePath = @"D:\Aggregate_Minute_AAPL.txt";
var records = (from line in File.ReadLines(filePath).AsParallel()
               select line);
List<string> str = records.ToList();
str.ForEach(x =>
{
    string result = x;
    result = result.TrimStart('[').TrimEnd(']');
    var jsonString = Newtonsoft.Json.JsonConvert.DeserializeObject<List<LiveAMData>>(result);
    foreach (var item in jsonString)
    {
        string value = "";
        string dirPath = @"D:\COMB1\MinuteAggregates";
        string[] fileNames = null;
        fileNames = System.IO.Directory.GetFiles(dirPath, item.sym + "_*.txt", System.IO.SearchOption.AllDirectories);
        if (fileNames.Length > 0)
        {
            string _fileName = fileNames[0];
            var lineList = System.IO.File.ReadAllLines(_fileName).ToList();
            lineList.RemoveAt(0);
            var _item = lineList[lineList.Count - 1];
            if (!_item.Contains(item.sym))
            {
                lineList.RemoveAt(lineList.Count - 1);
            }
            System.IO.File.WriteAllLines(_fileName, lineList.ToArray());
            value = $"{item.sym},{item.s},{item.o},{item.h},{item.c},{item.l},{item.v}{Environment.NewLine}";
            using (System.IO.StreamWriter sw = System.IO.File.AppendText(_fileName))
            {
                sw.Write(value);
            }
        }
    }
});
How can I make the process fast? With all this processing the application only manages nearly 3,000 to 4,000 symbols, and if there is no processing it executes 25,000 lines per minute. So how do I increase the line-execution speed with all this code?
First you need to clean up your code to gain more visibility. I did a quick refactor and this is what I got:
const string FilePath = @"D:\Aggregate_Minute_AAPL.txt";

class SomeClass
{
    public string Sym { get; set; }
    public string Other { get; set; }
}

private void Something()
{
    File
        .ReadLines(FilePath)
        .AsParallel()
        .Select(x => x.TrimStart('[').TrimEnd(']'))
        .Select(JsonConvert.DeserializeObject<List<SomeClass>>)
        .ForAll(WriteRecord);
}

private const string DirPath = @"D:\COMB1\MinuteAggregates";
private const string Separator = @",";

private void WriteRecord(List<SomeClass> data)
{
    foreach (var item in data)
    {
        var fileNames = Directory
            .GetFiles(DirPath, item.Sym + "_*.txt", SearchOption.AllDirectories);
        foreach (var fileName in fileNames)
        {
            var fileLines = File.ReadAllLines(fileName)
                .Skip(1).ToList();
            var lastLine = fileLines.Last();
            if (!lastLine.Contains(item.Sym))
            {
                fileLines.RemoveAt(fileLines.Count - 1);
            }
            fileLines.Add(
                new StringBuilder()
                    .Append(item.Sym)
                    .Append(Separator)
                    .Append(item.Other)
                    .Append(Environment.NewLine)
                    .ToString()
            );
            File.WriteAllLines(fileName, fileLines);
        }
    }
}
From here it should be easier to play with List.AsParallel to check how, and with what parameters, the code is faster.
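For instance, the degree of parallelism is one obvious knob to experiment with (a sketch; more threads is not automatically faster for I/O-bound work):
File
    .ReadLines(FilePath)
    .AsParallel()
    .WithDegreeOfParallelism(4)   // try 2, 4, 8... and measure
    .Select(x => x.TrimStart('[').TrimEnd(']'))
    .Select(JsonConvert.DeserializeObject<List<SomeClass>>)
    .ForAll(WriteRecord);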
Also:
You are opening the file for writing twice.
The removes are also somewhat expensive, and removing at index 0 more so (however, if there are few elements this may not make much difference).
if (fileNames.Length > 0) is useless; use a foreach, and if the list is empty the loop will simply skip.
You can try StringBuilder instead of string interpolation.
I hope these hints help you improve your time, and that I have not forgotten something.
Edit
"We have nearly 10,000 files in our directory. So when the process is running, it throws an error that the process cannot access the file because it is being used by another process."
Well, is there a possibility that your processed lines contain duplicate file names?
If that is the case, you could try a simple approach: retry after some milliseconds, something like
private const int SleepMillis = 5;
private const int MaxRetries = 3;

public void WriteFile(string fileName, string[] fileLines, int retries = 0)
{
    try
    {
        File.WriteAllLines(fileName, fileLines);
    }
    catch (Exception e) // catch the specific exception type if you can
    {
        if (retries >= MaxRetries)
        {
            Console.WriteLine("Too many tries with no success");
            throw; // rethrow exception
        }
        Thread.Sleep(SleepMillis);
        WriteFile(fileName, fileLines, ++retries); // try again
    }
}
I tried to keep it simple, but there are some annotations:
- If you can make your methods async, it could be an improvement to swap the sleep for a Task.Delay, but you need to know and understand well how async works.
- If the collisions happen a lot, then you should try another approach, something like a concurrent map with semaphores (sketched below).
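A sketch of that last idea: one SemaphoreSlim per file name, held in a ConcurrentDictionary (System.Collections.Concurrent and System.Threading namespaces), so two workers never write the same file at once; method names are illustrative:
private static readonly ConcurrentDictionary<string, SemaphoreSlim> _locks =
    new ConcurrentDictionary<string, SemaphoreSlim>();

public static async Task WriteFileAsync(string fileName, string[] fileLines)
{
    SemaphoreSlim gate = _locks.GetOrAdd(fileName, _ => new SemaphoreSlim(1, 1));
    await gate.WaitAsync();               // only one writer per file name at a time
    try
    {
        File.WriteAllLines(fileName, fileLines);
    }
    finally
    {
        gate.Release();
    }
}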
Second edit
"In a real scenario I am connecting to a websocket and receiving 70,000 to 100,000 records every minute, and after that I am bifurcating those records by live streaming data and storing each in its own file. And that becomes slower when I apply your concept with 11,000 files."
It is a hard problem. From what I understand, you're talking about around 1,166 records per second; at this size, little details can become big bottlenecks.
At that point I think it is better to think about other solutions; it could be too much I/O for the disk, too many threads or too few, the network...
You should start by profiling the app to check where it spends the most time, so you can focus on that area. How many resources is it using? How many resources do you have? How are the memory, processor, garbage collector, and network doing? Do you have an SSD?
You need a clear view of what is slowing you down so you can attack it directly; it will depend on a lot of things, and it will be hard to help with that part :(.
There are tons of tools for profiling C# apps, and many ways to attack this problem (spread the load across several servers, use something like Redis to save data really quickly, use some event store so you can work with events...).

Why am I unable to write to this file?

I wrote an async method in C# to write to a file; however, I keep getting the following exception:
The process cannot access the file 'C:\XXX\XXX\XXX\XXX\EventBuffer.txt' because it is being used by another process.
I've had a look at similar questions already posted on SO, such as this one and this one, but it seems the cause of my issue is different.
I used a process monitor to see which processes are trying to access the directory the file is in, but the only process trying to access it is the one I'm debugging (I will post a snippet soon of the debug process window).
It isn't that file access was being attempted before the file was closed after a previous access, because I can get the exception the first time I attempt to access the file. I tried adding a delay after the StreamWriter is instantiated in case a write was being attempted too early. I wasn't using the using block before and was disposing of the object via its Dispose method, but in one of the similar questions switching to using solved the issue.
public static async void UpdateEventBufferFile(EventDetails EventDtls)
{
    string line;
    try
    {
        using (StreamWriter EventBufferFile = new StreamWriter(FilePath, true)) // creates the file
        {
            // All barcode data space separated for split detection
            line = EventDtls.SiteID + " " + EventDtls.McID + " " + EventDtls.EventID + " "
                 + EventDtls.EventDT + " " + EventDtls.AdditionalInfo;
            await Task.Run(() => LogFileManager.SystematicLog(" Events " + line + " added to buffer file", " BufferFileWriter.cs"));
            await EventBufferFile.WriteLineAsync(line); // no need for a newline char, WriteLine adds it
            await EventBufferFile.FlushAsync();
            // The using block suffices to dispose of the object; the below is no longer required
            //EventBufferFile.Dispose();
            //EventBufferFile.Close();
            //EventBufferFile = null;
        }
    }
    catch (Exception ex)
    {
    }
}
I have near-identical methods used within other classes that don't cause the same issue, which annoys me quite a bit.
The method is not being invoked from within a loop. Invocation is done in a separate static class in the method below:
public static void AddCentralEvents(int SiteID, int McID, int EventID, DateTime EventDT, string AdditionalInfo)
{
    EventDetails EventDetailsObj = new EventDetails();
    EventDetailsObj.SiteID = SiteID;
    EventDetailsObj.McID = McID;
    EventDetailsObj.EventID = EventID;
    EventDetailsObj.EventDT = EventDT;
    EventDetailsObj.AdditionalInfo = AdditionalInfo;
    Task.Run(() => BufferFileWriter.UpdateEventBufferFile(EventDetailsObj));
}
The error is self-explanatory: you are using an async method (fired from Task.Run, perhaps in quick succession), so a second call can try to open the file while an earlier task hasn't completed its run (i.e., finished writing to the file). That's why you end up with that error.
Have you tried writing with a synchronized method? If you have a requirement to periodically write to a file (i.e., logging), use a logging framework.
I recommend using log4net; it is one of the best out there.
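If you want to stay async rather than pull in a logging framework, here is a sketch of serializing the writes with a SemaphoreSlim, returning Task instead of void so callers can await:
private static readonly SemaphoreSlim _fileGate = new SemaphoreSlim(1, 1);

public static async Task UpdateEventBufferFile(EventDetails eventDtls)
{
    string line = eventDtls.SiteID + " " + eventDtls.McID + " " + eventDtls.EventID
                + " " + eventDtls.EventDT + " " + eventDtls.AdditionalInfo;
    await _fileGate.WaitAsync();          // overlapping calls queue up here
    try
    {
        using (StreamWriter writer = new StreamWriter(FilePath, true))
        {
            await writer.WriteLineAsync(line);
        }
    }
    finally
    {
        _fileGate.Release();
    }
}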

Why do some files get missed out if I use Parallel.ForEach()?

Following is the code, which processes about 10,000 files:
var files = Directory.GetFiles(directorypath, "*.*", SearchOption.AllDirectories).Where(
    name => !name.EndsWith(".gif") && !name.EndsWith(".jpg") && !name.EndsWith(".png")).ToList();
Parallel.ForEach(files, Countnumberofwordsineachfile);
The Countnumberofwordsineachfile function prints the number of words in each file into the text.
Whenever I use Parallel.ForEach(), I miss about 4-5 files every time while processing.
Can anyone suggest why this happens?
public void Countnumberofwordsineachfile(string filepath)
{
    string[] arrwordsinfile = Regex.Split(File.ReadAllText(filepath).Trim(), @"\s+");
    Charactercount = Convert.ToInt32(arrwordsinfile.Length);
    filecontent.AppendLine(filepath + "=" + Charactercount);
}
filecontent is probably not threadsafe. So if two (or more) tasks attempt to append to it at the same time, one will win and the other will not. You need to remember to either lock the sections that are shared, or not use shared data.
Locking is probably the easiest solution for your code. It synchronises access (other tasks have to queue up to access the locked section), so it will slow down the algorithm; but since this section is very short compared to the word-counting part, it isn't really going to be much of an issue.
private object myLock = new object();

public void Countnumberofwordsineachfile(string filepath)
{
    string[] arrwordsinfile = Regex.Split(File.ReadAllText(filepath).Trim(), @"\s+");
    Charactercount = Convert.ToInt32(arrwordsinfile.Length);
    lock (myLock)
    {
        filecontent.AppendLine(filepath + "=" + Charactercount);
    }
}
The cause has already been found; here is an alternative implementation:
//Parallel.ForEach(files,Countnumberofwordsineachfile);
var fileContent = files
.AsParallel()
.Select(f=> f + "=" + Countnumberofwordsineachfile(f));
and that requires a more useful design for the count method:
// make this an 'int' function, more reusable as well
public int Countnumberofwordsineachfile(string filepath)
{
    string[] arrwordsinfile = Regex.Split(File.ReadAllText(filepath).Trim(), @"\s+");
    return arrwordsinfile.Length;
}
But do note that going parallel won't help you much here; your main function (ReadAllText) is I/O bound, so you will most likely see a degradation from using AsParallel().
The better option is to use Directory.EnumerateFiles and then collect the results without parallelism:
var files = Directory.EnumerateFiles(....);
var fileContent = files
//.AsParallel()
.Select(f=> f + "=" + Countnumberofwordsineachfile(f));

Search String takes a long time the first time only?

No shortage of string-search performance questions out there, yet I still cannot make heads or tails of what the best approach is.
Long story short, I have committed to moving from 4NT to PowerShell. In leaving 4NT I am going to miss the super-quick console string-searching utility that came with it, called FFIND. I have decided to use my rudimentary C# programming skills to try and create my own utility to use in PowerShell that is just as quick.
So far, the results of a string search across hundreds of directories and a few thousand files, some of them quite large, are: FFIND 2.4 seconds and my utility 4.4 seconds... after I have run mine at least once????
The first time I run them, FFIND takes about the same time but mine takes over a minute? What is this? Loading of libraries? File indexing? Am I doing something wrong in my code? I don't mind waiting a little longer, but the difference is extreme enough that if there is a better language or approach I would rather start down that path now before I get too invested.
Do I need to pick another language to write a string search that will be lightning fast?
I need this utility to search through thousands of files for strings in web code, C# code, and another proprietary language that uses text files. I also need to be able to use it to find strings in very large log files, MB in size.
class Program
{
    public static int linecounter;
    public static int filecounter;

    static void Main(string[] args)
    {
        //
        // INIT
        //
        filecounter = 0;
        linecounter = 0;
        string word;
        // Read properties from application settings.
        string filelocation = Properties.Settings.Default.FavOne;
        // Set Args from console.
        word = args[0];
        //
        // Recursive search for sub folders and files
        //
        string startDIR;
        string filename;
        startDIR = Environment.CurrentDirectory;
        //startDIR = "c:\\SearchStringTestDIR\\";
        filename = args[1];
        DirSearch(startDIR, word, filename);
        Console.WriteLine(filecounter + " " + "Files found");
        Console.WriteLine(linecounter + " " + "Lines found");
        Console.ReadKey();
    }

    static void DirSearch(string dir, string word, string filename)
    {
        string fileline;
        string ColorOne = Properties.Settings.Default.ColorOne;
        string ColorTwo = Properties.Settings.Default.ColorTwo;
        ConsoleColor valuecolorone = (ConsoleColor)Enum.Parse(typeof(ConsoleColor), ColorOne);
        ConsoleColor valuecolortwo = (ConsoleColor)Enum.Parse(typeof(ConsoleColor), ColorTwo);
        try
        {
            foreach (string f in Directory.GetFiles(dir, filename))
            {
                StreamReader file = new StreamReader(f);
                bool t = true;
                int counter = 1;
                while ((fileline = file.ReadLine()) != null)
                {
                    if (fileline.Contains(word))
                    {
                        if (t)
                        {
                            t = false;
                            filecounter++;
                            Console.ForegroundColor = valuecolorone;
                            Console.WriteLine(" ");
                            Console.WriteLine(f);
                            Console.ForegroundColor = valuecolortwo;
                        }
                        linecounter++;
                        Console.WriteLine(counter.ToString() + ". " + fileline);
                    }
                    counter++;
                }
                file.Close();
                file = null;
            }
            foreach (string d in Directory.GetDirectories(dir))
            {
                //Console.WriteLine(d);
                DirSearch(d, word, filename);
            }
        }
        catch (System.Exception ex)
        {
            Console.WriteLine(ex.Message);
        }
    }
}
If you want to speed up your code, run a performance analysis and see what is taking the most time. I can almost guarantee the longest step here will be
fileline.Contains(word)
This function is called on every line of the file, on every file. Naively searching for a word in a string can take len(string) * len(word) comparisons.
You could code your own Contains method that uses a faster string-comparison algorithm. Google for "fast string exact matching". You could try using a regex and seeing if that gives you a performance enhancement. But I think the simplest optimization you can try is:
Don't read every line. Make one large string of all the content of the file.
StreamReader streamReader = new StreamReader(filePath, Encoding.UTF8);
string text = streamReader.ReadToEnd();
Run Contains on this.
If you need all the matches in a file, then you need to use something like Regex.Matches(string, string).
After you have used regex to get all the matches for a single file, you can iterate over the match collection (if there are any matches). For each match, you can recover the line of the original file by writing a function that reads backward and forward from the match object's Index attribute to where you find the '\n' character; the string between those two newlines is your line.
This will be much faster, I guarantee it.
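A sketch of that approach (the method name is invented; note that a line containing several matches would be printed once per match):
using System.Text.RegularExpressions;

static void SearchFile(string path, string word)
{
    string text = File.ReadAllText(path);
    foreach (Match m in Regex.Matches(text, Regex.Escape(word)))
    {
        int start = text.LastIndexOf('\n', m.Index) + 1;   // -1 + 1 == 0 at start of file
        int end = text.IndexOf('\n', m.Index);
        if (end < 0) end = text.Length;                    // last line has no trailing newline
        Console.WriteLine(text.Substring(start, end - start));
    }
}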
If you want to go even further, some things I've noticed are:
Remove the try/catch statement from around the whole loop. Only use it exactly where you need it; I would not use it at all.
Also make sure Ngen has been run on your assembly. Most setups usually have this, but sometimes it hasn't been run. You can see the process in Process Explorer. Ngen generates a native image of the C# managed bytecode so the code does not have to be JIT-compiled each time, but can run natively. This speeds up C# a lot.
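For reference, generating the native image by hand looks roughly like this from an elevated prompt (the framework path varies by version, and MySearchTool.exe is a placeholder for your assembly):
C:\Windows\Microsoft.NET\Framework64\v4.0.30319\ngen.exe install MySearchTool.exe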
EDIT
Other points:
Why is there a difference between first and subsequent run times? It seems like caching: the OS may have cached the requests for the directories, for the files, and for running and loading programs. Usually one sees speedups after a first run. Ngen could also be playing a part here, generating the native image after compilation on the first run and then storing it in the native image cache.
In general, I find C# performance too variable for my liking. If the suggested optimizations are not satisfactory and you want more consistent performance results, try another language, one that is not 'managed'. C is probably the best for your needs.
