Reading text file data through multithreading in C#

I am reading a text file consisting of 6 columns, where each group of 3 columns holds one object's information. I want to process these column groups in parallel through multithreading: 3 columns per object, so two threads are created in addition to the main thread.
The text file looks like this:
I tried it, but I have difficulty passing data from the main thread to the other threads; an error occurs at the string variable "part" (the variable 'part' does not exist in the current context).
I want to do multithreading for tag1 and tag2.
I am sharing the block of my code; please point out where I went wrong, as I am new to multithreaded programming.
namespace MultiTag_Simulation_ConsoleApp
{
    using System;
    using System.IO;
    using System.Threading;

    class Program
    {
        static void Main(string[] args)
        {
            string line;
            string[] part;
            StreamReader file = new StreamReader("2Tags_Points.txt");
            while ((line = file.ReadLine()) != null)
            {
                part = line.Split('\t');
                Thread TAG1 = new Thread(new ThreadStart(Tag1));
                TAG1.Start();
            }
        }

        static void Tag1()
        {
            double w, x;
            w = Convert.ToDouble(part[1]); // compile error: 'part' does not exist in the current context
            x = Convert.ToDouble(part[2]);
            Console.WriteLine("Tag1 x:" + w + "\t" + "Tag1 y:" + x);
            Console.ReadKey();
        }
    }
}

Thank you, everyone, for your time. My mistake was in thread synchronization. I solved the issue by declaring the "part" variable as a static field above the Main method:
static string[] part;
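For reference, here is a minimal sketch (assuming the same file layout; this is not the original code) that passes each line's parts into its thread through a closure instead of sharing a static field, so each thread reads its own copy and no synchronization is needed:

namespace MultiTag_Simulation_ConsoleApp
{
    using System;
    using System.IO;
    using System.Threading;

    class Program
    {
        static void Main()
        {
            foreach (string line in File.ReadLines("2Tags_Points.txt"))
            {
                string[] part = line.Split('\t'); // fresh array per line, captured by the lambda
                Thread tag1 = new Thread(() => Tag1(part));
                tag1.Start();
            }
        }

        static void Tag1(string[] part)
        {
            double w = Convert.ToDouble(part[1]);
            double x = Convert.ToDouble(part[2]);
            Console.WriteLine("Tag1 x:" + w + "\tTag1 y:" + x);
        }
    }
}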

Your solution, although it might compile, still has a lot of hidden problems. You need to synchronize access to shared variables, for example, and if you did that, it would defeat the purpose of having multiple threads. I would suggest using a simpler framework that does the multithreading for you: multithreading is hard to get right, but using multiple processors for your workload is much easier when you leave the hard parts to the framework.
For example, the following calculates your values in parallel. It parallelizes per line rather than per tag, but as long as all your processors are used optimally, that really does not matter.
namespace MultiTag_Simulation_ConsoleApp
{
    using System;
    using System.IO;
    using System.Linq;
    using System.Threading.Tasks;

    internal static class Program
    {
        internal static void Main()
        {
            Parallel.ForEach(
                File.ReadLines("2Tags_Points.txt").Select(line => line.Split('\t')),
                parts =>
                {
                    var w = Convert.ToDouble(parts[1]);
                    var x = Convert.ToDouble(parts[2]);
                    Console.WriteLine("Tag1 x:" + w + "\t" + "Tag1 y:" + x);

                    var y = Convert.ToDouble(parts[4]);
                    var z = Convert.ToDouble(parts[5]);
                    Console.WriteLine("Tag2 x:" + y + "\t" + "Tag2 y:" + z);
                });
        }
    }
}

Related

How to read and write more than 25,000 records/lines into a text file at a time?

I am connecting my application to a stock market live data provider using a web socket. When the market is live and the socket is open, it gives me nearly 45,000 lines a minute. I deserialize it line by line, write each line into a text file, and also read the text file and remove its first line. Handling the other socket work therefore becomes slow. Can you please help me perform this process fast, at around 25,000 lines a minute?
string filePath = @"D:\Aggregate_Minute_AAPL.txt";
var records = (from line in File.ReadLines(filePath).AsParallel()
               select line);
List<string> str = records.ToList();
str.ForEach(x =>
{
    string result = x;
    result = result.TrimStart('[').TrimEnd(']');
    var jsonString = Newtonsoft.Json.JsonConvert.DeserializeObject<List<LiveAMData>>(x);
    foreach (var item in jsonString)
    {
        string value = "";
        string dirPath = @"D:\COMB1\MinuteAggregates";
        string[] fileNames = null;
        fileNames = System.IO.Directory.GetFiles(dirPath, item.sym + "_*.txt", System.IO.SearchOption.AllDirectories);
        if (fileNames.Length > 0)
        {
            string _fileName = fileNames[0];
            var lineList = System.IO.File.ReadAllLines(_fileName).ToList();
            lineList.RemoveAt(0);
            var _item = lineList[lineList.Count - 1];
            if (!_item.Contains(item.sym))
            {
                lineList.RemoveAt(lineList.Count - 1);
            }
            System.IO.File.WriteAllLines(_fileName, lineList.ToArray());
            value = $"{item.sym},{item.s},{item.o},{item.h},{item.c},{item.l},{item.v}{Environment.NewLine}";
            using (System.IO.StreamWriter sw = System.IO.File.AppendText(_fileName))
            {
                sw.Write(value);
            }
        }
    }
});
How can I make the process fast? With this code the application handles only about 3,000 to 4,000 symbols, whereas without this processing it executes 25,000 lines per minute. So how do I increase the line-processing throughput with all this code in place?
First you need to clean up your code to gain more visibility. I did a quick refactor, and this is what I got:
const string FilePath = @"D:\Aggregate_Minute_AAPL.txt";

class SomeClass
{
    public string Sym { get; set; }
    public string Other { get; set; }
}

private void Something()
{
    File
        .ReadLines(FilePath)
        .AsParallel()
        .Select(x => x.TrimStart('[').TrimEnd(']'))
        .Select(JsonConvert.DeserializeObject<List<SomeClass>>)
        .ForAll(WriteRecord);
}

private const string DirPath = @"D:\COMB1\MinuteAggregates";
private const string Separator = ",";

private void WriteRecord(List<SomeClass> data)
{
    foreach (var item in data)
    {
        var fileNames = Directory
            .GetFiles(DirPath, item.Sym + "_*.txt", SearchOption.AllDirectories);
        foreach (var fileName in fileNames)
        {
            var fileLines = File.ReadAllLines(fileName)
                .Skip(1).ToList();
            var lastLine = fileLines.Last();
            if (!lastLine.Contains(item.Sym))
            {
                fileLines.RemoveAt(fileLines.Count - 1);
            }
            fileLines.Add(
                new StringBuilder()
                    .Append(item.Sym)
                    .Append(Separator)
                    .Append(item.Other)
                    .Append(Environment.NewLine)
                    .ToString());
            File.WriteAllLines(fileName, fileLines);
        }
    }
}
From here it should be easier to play with List.AsParallel to check how, and with what parameters, the code runs fastest.
Also:
You are opening the output file twice (once to rewrite it, once to append).
The removes are also somewhat expensive, and removing at index 0 is the most expensive (however, if there are few elements this may not make much difference).
if (fileNames.Length > 0) is useless; use a foreach, and if the list is empty the loop simply won't execute.
You can try StringBuilder instead of string interpolation.
I hope these hints help you improve your time, and that I have not forgotten something.
Edit
We have nearly 10,000 files in our directory. So when the process is running, it throws an error: "The process cannot access the file because it is being used by another process."
Well, is there a chance that there are duplicate file names among your processed lines?
If that is the case, you could try a simple approach: retry after some milliseconds, something like
private const int SleepMillis = 5;
private const int MaxRetries = 3;

public void WriteFile(string fileName, string[] fileLines, int retries = 0)
{
    try
    {
        File.WriteAllLines(fileName, fileLines);
    }
    catch (Exception) // catch the specific exception type if you can
    {
        if (retries >= MaxRetries)
        {
            Console.WriteLine("Too many tries with no success");
            throw; // rethrow exception
        }
        Thread.Sleep(SleepMillis);
        WriteFile(fileName, fileLines, ++retries); // try again
    }
}
I tried to keep it simple, but there are some notes:
- If you can make your methods async, it could be an improvement by replacing the sleep with a Task.Delay, but you need to know and understand well how async works
- If the collision happens a lot, then you should try another approach, something like a concurrent map with semaphores; see the sketch below
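As a rough illustration of that last idea (a sketch under assumptions, not production code; the class name PerFileLock is hypothetical), each file name maps to its own semaphore, so only writers targeting the same file wait on each other:

using System.Collections.Concurrent;
using System.IO;
using System.Threading;

static class PerFileLock
{
    // One SemaphoreSlim per file name, created on first use.
    private static readonly ConcurrentDictionary<string, SemaphoreSlim> Locks =
        new ConcurrentDictionary<string, SemaphoreSlim>();

    public static void WriteFile(string fileName, string[] fileLines)
    {
        var gate = Locks.GetOrAdd(fileName, _ => new SemaphoreSlim(1, 1));
        gate.Wait();
        try
        {
            File.WriteAllLines(fileName, fileLines);
        }
        finally
        {
            gate.Release();
        }
    }
}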
Second edit
In a real scenario I am connecting to a websocket and receiving 70,000 to 100,000 (1 lakh) records every minute, and after that I am bifurcating those records from the live streaming data and storing each in its own file. And that becomes slower when I apply this concept with 11,000 files.
It is a hard problem. From what I understand, you're talking about roughly 1,166 records per second, and at that rate the little details can become big bottlenecks.
At that point I think it is better to think about other solutions: it could be too much I/O for the disk, too many threads or too few, the network...
You should start by profiling the app to check where it spends the most time, so you can focus on that area. How many resources is it using? How many resources do you have? How are the memory, processor, garbage collector, and network doing? Do you have an SSD?
You need a clear view of what is slowing you down so you can attack it directly; it will depend on a lot of things, and it will be hard to help with that part :(.
There are tons of tools for profiling C# apps, and many ways to attack this problem (spread the load across several servers, use something like Redis to save data really quickly, use an event store so you can work with events...).

C# console application design like htop

I want to build a console application with an interface similar to htop's (a fixed console layout). Here is a link to htop's console design: http://upload.wikimedia.org/wikipedia/commons/b/b1/Htop.png. I wanted to ask how I can build an application like this, as I only know C#'s Console.Write() method.
I am writing a simple program that starts up applications via Process.Start(), then monitors, for example, their RAM usage via Process.WorkingSet64 and outputs it line by line via simple Console.WriteLine() calls. But how could I design the C# console application like htop, so that it has a fixed layout that refreshes, for example, every second? By fixed design I mean fixed positions on the console where I print process names, RAM usage, application name, etc. Here is my code:
class Program
{
    static void Main(string[] args)
    {
        string[] myApps = { "notepad.exe", "calc.exe", "explorer.exe" };
        Thread w;
        ParameterizedThreadStart ts = new ParameterizedThreadStart(StartMyApp);
        foreach (var myApp in myApps)
        {
            w = new Thread(ts);
            w.Start(myApp);
            Thread.Sleep(1000);
        }
    }

    public static void StartMyApp(object myAppPath)
    {
        ProcessStartInfo myInfoProcess = new ProcessStartInfo();
        myInfoProcess.FileName = myAppPath.ToString();
        myInfoProcess.WindowStyle = ProcessWindowStyle.Minimized;
        Process myProcess = Process.Start(myInfoProcess);
        do
        {
            if (!myProcess.HasExited)
            {
                myProcess.Refresh(); // refresh the current process property values
                Console.WriteLine(myProcess.ProcessName + " RAM: " + (myProcess.WorkingSet64 / 1024 / 1024).ToString() + "\n");
                Thread.Sleep(1000);
            }
        }
        while (!myProcess.WaitForExit(1000));
    }
}
EDIT: Thanks for pointing me to Console.SetCursorPosition, @Jim Mischel. I want to use that in my application, but now I have another problem: how can I pass the index number from the myApps array to my StartMyApp method, so I can do something like
Console.WriteLine(Array.IndexOf(myApps, myAppPath) + " " + myProcess.ProcessName + " RAM: " + (myProcess.WorkingSet64 / 1024 / 1024).ToString() + "\n");
inside my StartMyApp method? Whatever I try, I end up getting "The name 'myApps' does not exist in the current context." This is very important because I want to lay out my application later using Console.SetCursorPosition, and for that I need the index number. My output would then be, for example:
0 notepad RAM: 4
1 calc RAM: 4
2 explorer RAM: 12
You want to call Console.SetCursorPosition to set the position where the next write will occur. The linked MSDN topic has a basic example that will get you started.
You'll also be interested in the BackgroundColor, ForegroundColor, and possibly other properties. See the Console class documentation for details.
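As a starting point, here is a rough sketch (a hypothetical illustration, not the poster's code; it assumes using System, System.Diagnostics, and System.Threading) that passes the index into each thread via a closure and uses it as a fixed console row with Console.SetCursorPosition:

string[] myApps = { "notepad.exe", "calc.exe", "explorer.exe" };
object consoleLock = new object(); // serialize console writes across threads

for (int i = 0; i < myApps.Length; i++)
{
    int index = i;          // copy the loop variable so each closure gets its own value
    string app = myApps[i];
    new Thread(() =>
    {
        Process p = Process.Start(new ProcessStartInfo
        {
            FileName = app,
            WindowStyle = ProcessWindowStyle.Minimized
        });
        while (!p.WaitForExit(1000))
        {
            p.Refresh();
            lock (consoleLock)
            {
                Console.SetCursorPosition(0, index); // fixed row per process
                Console.Write(index + " " + p.ProcessName + " RAM: " + (p.WorkingSet64 / 1024 / 1024) + " MB   ");
            }
        }
    }).Start();
}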

Why do some files get missed when I use Parallel.ForEach()?

The following code processes about 10,000 files.
var files = Directory.GetFiles(directorypath, "*.*", SearchOption.AllDirectories)
    .Where(name => !name.EndsWith(".gif") && !name.EndsWith(".jpg") && !name.EndsWith(".png"))
    .ToList();
Parallel.ForEach(files, Countnumberofwordsineachfile);
The Countnumberofwordsineachfile function prints the number of words in each file into the text output.
Whenever I use Parallel.ForEach(), about 4-5 files are missed every time during processing.
Can anyone suggest why this happens?
public void Countnumberofwordsineachfile(string filepath)
{
    string[] arrwordsinfile = Regex.Split(File.ReadAllText(filepath).Trim(), @"\s+");
    Charactercount = Convert.ToInt32(arrwordsinfile.Length);
    filecontent.AppendLine(filepath + "=" + Charactercount);
}
filecontent is probably not thread-safe. If two (or more) tasks attempt to append to it at the same time, one will win and the other will not. You need to remember to either lock the shared sections or avoid shared data.
Locking is probably the easiest fix for your code. It synchronizes access (other tasks have to queue up to enter the locked section), so it will slow the algorithm down, but since this part is very short compared to the word-counting part, it isn't really going to be much of an issue.
private object myLock = new object();

public void Countnumberofwordsineachfile(string filepath)
{
    string[] arrwordsinfile = Regex.Split(File.ReadAllText(filepath).Trim(), @"\s+");
    Charactercount = Convert.ToInt32(arrwordsinfile.Length);
    lock (myLock)
    {
        filecontent.AppendLine(filepath + "=" + Charactercount);
    }
}
The cause has already been found, here is an alternative implementation:
//Parallel.ForEach(files, Countnumberofwordsineachfile);
var fileContent = files
    .AsParallel()
    .Select(f => f + "=" + Countnumberofwordsineachfile(f));
and that requires a more useful design for the count method:
// make this an 'int' function, more reusable as well
public int Countnumberofwordsineachfile(string filepath)
{ ...; return characterCount; }
But do note that going parallel won't help you much here: your main function (ReadAllText) is I/O bound, so you will most likely see a degradation from using AsParallel().
The better option is to use Directory.EnumerateFiles and then collect the results without parallelism:
var files = Directory.EnumerateFiles(....);
var fileContent = files
    //.AsParallel()
    .Select(f => f + "=" + Countnumberofwordsineachfile(f));
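To round this out, a minimal sketch of consuming that query without any shared mutable state (the output file name is hypothetical):

// The query is lazy; enumerating it here runs the counts,
// and the results are written out once at the end.
File.WriteAllLines("wordcounts.txt", fileContent);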

Writing to a file needs to be optimised for heavy traffic

I am very new to C#, and this is my first question, so please be gentle with me.
I am trying to write an application to capture some tick data from a data provider; below is the main part of the program:
void zf_TickEvent(object sender, ZenFire.TickEventArgs e)
{
    output myoutput = new output();
    myoutput.time = e.TimeStamp;
    myoutput.product = e.Product.ToString();
    myoutput.type = Enum.GetName(typeof(ZenFire.TickType), e.Type);
    myoutput.price = e.Price;
    myoutput.volume = e.Volume;
    using (StreamWriter writer = File.AppendText("c:\\log222.txt"))
    {
        writer.Write(myoutput.time.ToString(timeFmt) + ",");
        writer.Write(myoutput.product + ",");
        writer.Write(myoutput.type + ",");
        writer.Write(myoutput.price + ",");
        writer.Write(myoutput.volume + ",");
    }
}
I have successfully written the data into the text file. However, I know this method will be called around 10,000 times a second during peak time, and opening a file and appending to it many times a second is very inefficient. I was pointed to use a buffer of some sort, but I have no idea how to do it; I tried reading the documentation, but I still don't understand, which is why I'm turning here for help.
Please give me some (working) snippet code to point me in the right direction. Thanks.
EDIT: I have simplified the code as much as possible:
using (StreamWriter streamWriter = File.AppendText("c:\\output.txt"))
{
    streamWriter.WriteLine(string.Format("{0},{1},{2},{3},{4}",
        e.TimeStamp.ToString(timeFmt),
        e.Product.ToString(),
        Enum.GetName(typeof(ZenFire.TickType), e.Type),
        e.Price,
        e.Volume));
}
Ed has told me to make my stream a field; what would the syntax look like? Can anyone post some code to help me? Thanks a lot.
You need to create a field for the stream instead of a local variable. Initialize it in the constructor once, and don't forget to close it somewhere. It's better to implement the IDisposable interface and close the stream in the Dispose() method.
class MyClass : IDisposable
{
    private StreamWriter _writer;

    public MyClass()
    {
        _writer = File.AppendText(...); // open once, reuse for every tick
    }

    void zf_TickEvent(object sender, ZenFire.TickEventArgs e)
    {
        output myoutput = new output();
        myoutput.time = e.TimeStamp;
        myoutput.product = e.Product.ToString();
        myoutput.type = Enum.GetName(typeof(ZenFire.TickType), e.Type);
        myoutput.price = e.Price;
        myoutput.volume = e.Volume;
        _writer.Write(myoutput.time.ToString(timeFmt) + ",");
        _writer.Write(myoutput.product + ",");
        _writer.Write(myoutput.type + ",");
        _writer.Write(myoutput.price + ",");
        _writer.Write(myoutput.volume + ",");
    }

    public void Dispose() { /* see the documentation */ }
}
There are many things you can do:
Step 1. Make sure you don't make many I/O calls and string concatenations.
Output myOutput = new Output(e); // maybe construct from the event args?

// single Write call, single string.Format
writer.Write(string.Format("{0},{1},{2},{3},{4},{5}",
    myOutput.Time.ToString(),
    myOutput.Product,
    ...);
I recommend this regardless of your current performance. I also made some cosmetic changes (variable/property/class name casing; you should look up the difference between variables and properties and their recommended casing, etc.).
Step 2. Analyse your performance to see if it does what you want. If it does, no need to do anything further. If performance is still too bad, you can:
Keep the file open and close it only when your handler shuts down.
Write to a buffer and flush it at regular intervals, as sketched below.
Use a logger framework like log4net that internally handles the above for you and takes care of hairy issues like access to the log file from multiple threads.
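A rough sketch of the buffer-and-flush option (all names hypothetical; a sketch under the assumption of a single logger instance, not the poster's code): keep one writer open as a field, let a large buffer absorb the per-tick writes, and flush on a timer:

using System;
using System.IO;
using System.Text;
using System.Threading;

class TickLogger : IDisposable
{
    private readonly StreamWriter _writer;
    private readonly Timer _flushTimer;
    private readonly object _sync = new object();

    public TickLogger(string path)
    {
        // Data reaches the OS only when the 1 MB buffer fills or on flush.
        var stream = new FileStream(path, FileMode.Append, FileAccess.Write, FileShare.Read);
        _writer = new StreamWriter(new BufferedStream(stream, 1 << 20), Encoding.UTF8);
        _flushTimer = new Timer(_ => Flush(), null, 1000, 1000); // flush once a second
    }

    public void WriteLine(string line)
    {
        lock (_sync) { _writer.WriteLine(line); }
    }

    private void Flush()
    {
        lock (_sync) { _writer.Flush(); }
    }

    public void Dispose()
    {
        _flushTimer.Dispose();
        lock (_sync) { _writer.Flush(); _writer.Dispose(); }
    }
}

The tick handler would then just call WriteLine with the formatted record, and disposing the logger on shutdown flushes whatever is left.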
I would use String.Format:
using (StreamWriter writer = new StreamWriter(@"c:\log222.txt", true))
{
    writer.AutoFlush = true;
    writer.Write(String.Format("{0},{1},{2},{3},{4},",
        myoutput.time.ToString(timeFmt),
        myoutput.product, myoutput.type, myoutput.price, myoutput.volume));
}
If you use @ before the string you don't have to double the backslashes.
This is much faster: you write to the file only once instead of five times. Additionally, you don't use the + operator with strings, which is not the fastest operation ;)
Also, if this is a multithreaded application, you should consider using a lock. It would prevent the application from trying to write to the file from, e.g., two threads at the same time.

Memory usage of strings (or any other objects) in .Net

I wrote this little test program:
using System;

namespace GCMemTest
{
    class Program
    {
        static void Main(string[] args)
        {
            System.GC.Collect();
            System.Diagnostics.Process pmCurrentProcess = System.Diagnostics.Process.GetCurrentProcess();
            long startBytes = pmCurrentProcess.PrivateMemorySize64;
            double kbStart = (double)(startBytes) / 1024.0;
            System.Console.WriteLine("Currently using " + kbStart + "KB.");
            {
                int size = 2000000;
                string[] strings = new string[size];
                for (int i = 0; i < size; i++)
                {
                    strings[i] = "blabla" + i;
                }
            }
            System.GC.Collect();
            pmCurrentProcess = System.Diagnostics.Process.GetCurrentProcess();
            long endBytes = pmCurrentProcess.PrivateMemorySize64;
            double kbEnd = (double)(endBytes) / 1024.0;
            System.Console.WriteLine("Currently using " + kbEnd + "KB.");
            System.Console.WriteLine("Leaked " + (kbEnd - kbStart) + "KB.");
            System.Console.ReadKey();
        }
    }
}
The output in Release build is:
Currently using 18800KB.
Currently using 118664KB.
Leaked 99864KB.
I assume that the GC.Collect call will remove the allocated strings since they go out of scope, but it appears it does not. I do not understand this, nor can I find an explanation for it. Can anyone here help?
Thanks,
Alex
You're looking at the private memory size - the managed heap will have expanded to accommodate the strings, but it won't release the memory back to the operating system when the strings are garbage collected. Instead, the managed heap will be bigger, but have lots of free space - so if you create more objects, it won't require the heap to expand.
If you want to look at the memory used within the managed heap, look at GC.GetTotalMemory. Note that due to the complexities of garbage collection, there's a certain amount of woolliness within all of this.
Indeed, I used the private memory size because it's the one closest to what Process Explorer shows. If I rewrite the program with GC.GetTotalMemory like this:
using System;

namespace GCMemTest
{
    class Program
    {
        static void Main(string[] args)
        {
            System.GC.Collect();
            long startBytes = System.GC.GetTotalMemory(true);
            {
                string[] strings = new string[2000000];
                for (int i = 0; i < 2000000; i++)
                {
                    strings[i] = "blabla" + i;
                }
                strings = null;
            }
            System.GC.Collect();
            long endBytes = System.GC.GetTotalMemory(true);
            double kbStart = (double)(startBytes) / 1024.0;
            double kbEnd = (double)(endBytes) / 1024.0;
            System.Console.WriteLine("Leaked " + (kbEnd - kbStart) + "KB.");
            System.Console.ReadKey();
        }
    }
}
Then the output is:
Leaked 0KB.
This is only the case when I have 'strings = null;'; remove it and I leak 100 MB. This means that the local scope in the Main routine does not cause the array to be freed. If I move that part into a static method Test and call that instead, I leak only a few bytes. I guess what I should learn from this is that lexical scopes are not reported to the GC: a local can stay reachable until the JIT considers it dead, which may be as late as the end of the method, and in Main that is the whole program.
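For reference, a minimal sketch of the refactor described above (the method name Test comes from the text; the body is assumed from the earlier listing):

// Moving the allocation into its own method lets the JIT treat the
// array as dead once Test returns, so the following GC.Collect can reclaim it.
static void Test()
{
    string[] strings = new string[2000000];
    for (int i = 0; i < 2000000; i++)
    {
        strings[i] = "blabla" + i;
    }
}

static void Main(string[] args)
{
    System.GC.Collect();
    long startBytes = System.GC.GetTotalMemory(true);
    Test();
    System.GC.Collect();
    long endBytes = System.GC.GetTotalMemory(true);
    System.Console.WriteLine("Leaked " + ((endBytes - startBytes) / 1024.0) + "KB.");
}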
