I have around 5,000 files located on an FTP server. I download them via FTP, unzip them, and finally process them and push the data into an Oracle database. Everything works except the processing and the database push; I don't know why the processing is not happening. I can see the debugger hitting the method, but it never steps inside it. How can I fix this issue?
var list = ftp.GetFileList(remotepath);

DateTime dt = DateTime.Now;
string st = String.Format("{0:yyyyMMdd}", dt); // e.g. 20161120

Task[] myTasks = new Task[list.Count];
int i = 0;
foreach (string item in list)
{
    if (item.StartsWith("GExport_") && !item.ToUpper().Contains("DUM") && item.Contains(st) && !item.ToUpper().Contains("BLK"))
    {
        huwawei4gpath = item;

        // Download the file from the FTP server to the local folder
        ftp.Get(dtr["REMOTE_FILE_PATH"].ToString() + huwawei4gpath, localDestnDir + "\\" + dtr["SOURCE_PATH"].ToString());
        download_location_hw = dtr["LOCAL_FILE_PATH"].ToString();

        // Spin off a background task to process the file we just downloaded
        myTasks[i++] = Task.Factory.StartNew(() =>
        {
            // Extract the zip file referred to by download_location_hw
            ExtractZipfiles(download_location_hw + "\\" + huwawei4gpath, dtr["REMOTE_FILE_PATH"].ToString(),
                            dtr["FTP_SERVER"].ToString(), dtr["FTP_USER_ID"].ToString(),
                            dtr["TECH_CODE"].ToString(), dtr["VENDOR_CODE"].ToString());

            // Process the extracted zip file
            ProcessFile();
        });
    }
}
Task.WaitAll(myTasks);
Here the ProcessFile() method is not executing at all.
EDIT
There was a typo in the file path causing the issue, thanks. But my question now is: is there any synchronization issue? The file is unzipped and processed around the same time, so the file may not be available yet. Will it wait for the unzipping to finish before processing?
I added a check: while (!File.Exists("")) { Thread.Sleep(1000); }
Does that cause any issues?
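For reference, the two calls inside the Task lambda run sequentially, so ProcessFile only starts after ExtractZipfiles returns (assuming ExtractZipfiles is synchronous and writes the file before returning). If a wait is still wanted, a bounded wait avoids hanging forever the way an unbounded loop on File.Exists("") would. This is only a sketch; extractedFilePath and the two-minute timeout are assumed placeholders, not values from the real code:

// Sketch only: bounded wait for the extracted file before processing.
// extractedFilePath and the timeout are assumed values, not from the original code.
var deadline = DateTime.UtcNow.AddMinutes(2);
while (!File.Exists(extractedFilePath))
{
    if (DateTime.UtcNow > deadline)
        throw new TimeoutException("Extracted file never appeared: " + extractedFilePath);
    Thread.Sleep(1000); // poll once per second
}
ProcessFile(); // the file exists now, so it is safe to process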
If you try the code below, you will notice it works. It is very similar to your code. Since this works, your issue is elsewhere and not related to the Tasks.
class Program {
    static void Main(string[] args) {
        var list = new List<string> { "1", "2" };
        Task[] myTasks = new Task[list.Count];
        int i = 0;
        foreach (string item in list) {
            // Spin off a background task to process the file we just downloaded
            myTasks[i++] = Task.Factory.StartNew(() =>
            {
                // Extract the zip file referred to by download_location_hw
                // Process the extracted zip file
                ProcessFile();
            });
        }
        Task.WaitAll(myTasks);
        Console.WriteLine("in main after processing...");
        Console.Read();
    }

    private static void ProcessFile() {
        Console.Write("Processed...");
    }
}
Related
I am connecting my application to a live stock market data provider using a WebSocket. When the market is live and the socket is open, it sends me nearly 45,000 lines per minute. I deserialize the data line by line, write each line to a text file, and also read the text file and remove its first line. Because of this, the other work on the socket becomes slow. Can you please help me make this process fast enough to handle roughly 25,000 lines per minute?
string filePath = @"D:\Aggregate_Minute_AAPL.txt";
var records = (from line in File.ReadLines(filePath).AsParallel()
               select line);
List<string> str = records.ToList();
str.ForEach(x =>
{
    string result = x;
    result = result.TrimStart('[').TrimEnd(']');
    var jsonString = Newtonsoft.Json.JsonConvert.DeserializeObject<List<LiveAMData>>(x);
    foreach (var item in jsonString)
    {
        string value = "";
        string dirPath = @"D:\COMB1\MinuteAggregates";
        string[] fileNames = null;
        fileNames = System.IO.Directory.GetFiles(dirPath, item.sym + "_*.txt", System.IO.SearchOption.AllDirectories);
        if (fileNames.Length > 0)
        {
            string _fileName = fileNames[0];
            var lineList = System.IO.File.ReadAllLines(_fileName).ToList();
            lineList.RemoveAt(0);
            var _item = lineList[lineList.Count - 1];
            if (!_item.Contains(item.sym))
            {
                lineList.RemoveAt(lineList.Count - 1);
            }
            System.IO.File.WriteAllLines(_fileName, lineList.ToArray());
            value = $"{item.sym},{item.s},{item.o},{item.h},{item.c},{item.l},{item.v}{Environment.NewLine}";
            using (System.IO.StreamWriter sw = System.IO.File.AppendText(_fileName))
            {
                sw.Write(value);
            }
        }
    }
});
How can I make this process fast? When the application performs this processing it only gets through about 3,000 to 4,000 symbols, whereas without it the application handles 25,000 lines per minute. So how can I increase the throughput of this code?
First you need to clean up your code to gain more visibility. I did a quick refactor, and this is what I got:
const string FilePath = @"D:\Aggregate_Minute_AAPL.txt";

class SomeClass
{
    public string Sym { get; set; }
    public string Other { get; set; }
}

private void Something()
{
    File
        .ReadLines(FilePath)
        .AsParallel()
        .Select(x => x.TrimStart('[').TrimEnd(']'))
        .Select(JsonConvert.DeserializeObject<List<SomeClass>>)
        .ForAll(WriteRecord);
}
private const string DirPath = @"D:\COMB1\MinuteAggregates";
private const string Separator = @",";

private void WriteRecord(List<SomeClass> data)
{
    foreach (var item in data)
    {
        var fileNames = Directory
            .GetFiles(DirPath, item.Sym + "_*.txt", SearchOption.AllDirectories);
        foreach (var fileName in fileNames)
        {
            var fileLines = File.ReadAllLines(fileName)
                .Skip(1).ToList();
            var lastLine = fileLines.Last();
            if (!lastLine.Contains(item.Sym))
            {
                fileLines.RemoveAt(fileLines.Count - 1);
            }
            fileLines.Add(
                new StringBuilder()
                    .Append(item.Sym)
                    .Append(Separator)
                    .Append(item.Other)
                    .Append(Environment.NewLine)
                    .ToString()
            );
            File.WriteAllLines(fileName, fileLines);
        }
    }
}
From here it should be easier to play with List.AsParallel to check how, and with which parameters, the code runs fastest.
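For example, PLINQ lets you cap the degree of parallelism explicitly and measure the effect. This is just a sketch of the kind of experiment meant here, reusing the refactored names above; the value 4 is arbitrary and worth tuning on your own hardware:

// Sketch: cap PLINQ's worker count and measure; the value 4 is arbitrary.
File
    .ReadLines(FilePath)
    .AsParallel()
    .WithDegreeOfParallelism(4) // try 2, 4, 8... and compare throughput
    .Select(x => x.TrimStart('[').TrimEnd(']'))
    .Select(JsonConvert.DeserializeObject<List<SomeClass>>)
    .ForAll(WriteRecord);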
Also:
- You are opening the output file twice.
- The removes are also somewhat expensive, and removing at index 0 is the more expensive of the two (although, if there are few elements, this may not make much difference).
- The if (fileNames.Length > 0) check is unnecessary; just use the loop, because if the list is empty the loop will simply do nothing.
- You can try StringBuilder instead of string interpolation.
I hope these hints help you improve your time, and that I have not forgotten anything.
Edit
We have nearly 10,000 files in our directory. So when the process is running, it throws an error saying "The process cannot access the file because it is being used by another process".
Well, is it possible that the lines you process contain duplicated file names?
If that is the case, you could try a simple approach: retry after a few milliseconds, something like this:
private const int SleepMillis = 5;
private const int MaxRetries = 3;

public void WriteFile(string fileName, string[] fileLines, int retries = 0)
{
    try
    {
        File.WriteAllLines(fileName, fileLines);
    }
    catch (Exception e) // Catch the special type if you can
    {
        if (retries >= MaxRetries)
        {
            Console.WriteLine("Too many tries with no success");
            throw; // rethrow exception
        }
        Thread.Sleep(SleepMillis);
        WriteFile(fileName, fileLines, ++retries); // try again
    }
}
I tried to keep it simple, but there are some notes:
- If you can make your methods async, it could be an improvement to replace the Thread.Sleep with Task.Delay, but you need to know and understand well how async works.
- If collisions happen a lot, then you should try another approach, something like a concurrent map of semaphores (see the sketch just below).
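As a rough illustration of that last idea, one SemaphoreSlim per file path can serialize writers to the same file while leaving different files fully parallel. This is only a sketch of the approach hinted at above, not code from the answer; the class, dictionary, and method names are made up for the example:

using System.Collections.Concurrent;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class PerFileWriter
{
    // One lock object per file path; different files never block each other.
    private static readonly ConcurrentDictionary<string, SemaphoreSlim> Locks =
        new ConcurrentDictionary<string, SemaphoreSlim>();

    public static async Task WriteFileAsync(string fileName, string[] fileLines)
    {
        var gate = Locks.GetOrAdd(fileName, _ => new SemaphoreSlim(1, 1));
        await gate.WaitAsync();
        try
        {
            File.WriteAllLines(fileName, fileLines);
        }
        finally
        {
            gate.Release();
        }
    }
}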
Second edit
In the real scenario I am connecting to the WebSocket and receiving 70,000 to 100,000 (1 lakh) records every minute. After that I am splitting those records from the live streaming data and storing each in its own file. And that becomes slower when I apply this concept with 11,000 files.
It is a hard problem. From what I understand, you're talking about roughly 1,166 records per second, and at this scale the little details can become big bottlenecks.
At that point I think it is better to consider other factors: it could be too much I/O for the disk, too many threads or too few, the network...
You should start by profiling the app to check where it spends the most time, and focus on that area. How many resources is it using? How many resources do you have? How are the memory, processor, garbage collector, and network doing? Do you have an SSD?
You need a clear view of what is slowing you down so you can attack it directly. It will depend on a lot of things, so it will be hard to help with that part :(.
There are tons of tools for profiling C# apps, and many ways to attack this problem (spread the load across several servers, use something like Redis to save data really quickly, use some event store so you can work with events...).
Currently I have a .txt file of about 170,000 jpg file names and I read them all into a List (fileNames).
I want to search ONE folder (this folder has sub-folders) to check if each file in fileNames exists in this folder and if it does, copy it to a new folder.
By rough estimate, each search and copy for a file name in fileNames takes about 0.5 seconds. 170,000 seconds is roughly 48 hours, so divided by 2 that is about 24 hours for my app to search for every single file name using one thread! Obviously this is too long, so I want to speed the process up. What is the best way to go about doing this using multi-threading?
Currently I am thinking of making 20 separate threads, splitting my list (fileNames) into 20 different lists, and searching for the files simultaneously. For example, I would have 20 different threads executing the code below at the same time:
foreach (string str in fileNames)
{
    foreach (var file in Directory.GetFiles(folderToCheckForFileName, str, SearchOption.AllDirectories))
    {
        string combinedPath = Path.Combine(newTargetDirectory, Path.GetFileName(file));
        if (!File.Exists(combinedPath))
        {
            File.Copy(file, combinedPath);
        }
    }
}
UPDATED TO SHOW MY SOLUTION BELOW:
string[] folderToCheckForFileNames = Directory.GetFiles("C:\\Users\\Alex\\Desktop\\ok", "*.jpg", SearchOption.AllDirectories);

foreach (string str in fileNames)
{
    Parallel.ForEach(folderToCheckForFileNames, currentFile =>
    {
        string filename = Path.GetFileName(currentFile);
        if (str == filename)
        {
            string combinedPath = Path.Combine(targetDir, filename);
            if (!File.Exists(combinedPath))
            {
                File.Copy(currentFile, combinedPath);
                Console.WriteLine("FOUND A MATCH AND COPIED" + currentFile);
            }
        }
    });
}
Thank you everyone for your contributions! Greatly Appreciated!
Instead of using an ordinary foreach statement for your search, you should use Parallel LINQ (PLINQ). PLINQ combines the simplicity and readability of LINQ syntax with the power of parallel programming, just like code that targets the Task Parallel Library. It shields you from low-level thread manipulation and the hard-to-find/debug exceptions that come with it, while splitting your work among many threads. So you might do something like this:
fileNames.AsParallel().ForAll(str =>
{
    var files = Directory.GetFiles(folderToCheckForFileName, str, SearchOption.AllDirectories);
    files.AsParallel().ForAll(file =>
    {
        if (!string.IsNullOrEmpty(file))
        {
            string combinedPath = Path.Combine(newTargetDirectory, Path.GetFileName(file));
            if (!File.Exists(combinedPath))
            {
                File.Copy(file, combinedPath);
            }
        }
    });
});
20 different threads won't help if your computer has fewer than 20 cores. In fact, it can make the process slower, because 1) you will have to spend time context switching between threads (which is your CPU's way of emulating more than one thread per core), and 2) each Thread in .NET reserves 1 MB for its stack, which is pretty hefty.
Instead, try dividing your I/O into async workloads, using Task.Run for the CPU-bound / intensive parts. Also, keep your number of Tasks to maybe 4 to 8 at the max.
Sample code:
var tasks = new Task[8];
var names = fileNames.ToArray();

for (int i = 0; i < tasks.Length; i++)
{
    int index = i;
    tasks[i] = Task.Run(() =>
    {
        for (int current = index; current < names.Length; current += 8)
        {
            // execute the workload
            string str = names[current];
            foreach (var file in Directory.GetFiles(folderToCheckForFileName, str, SearchOption.AllDirectories))
            {
                string combinedPath = Path.Combine(newTargetDirectory, Path.GetFileName(file));
                if (!File.Exists(combinedPath))
                {
                    File.Copy(file, combinedPath);
                }
            }
        }
    });
}
Task.WaitAll(tasks);
I am doing audio recording in my application using C#.
I record voice to the same file and play it, but the SoundPlayer plays the contents recorded the first time.
For example, I have a file test.wav where I record "hello", and then I record "hi" to the same file by overwriting it. When I play test.wav, the player plays "hello".
I have only one instance of the player, e.g.
public static System.Media.SoundPlayer Player;

static void Main()
{
    try
    {
        Player = new System.Media.SoundPlayer();
    }
    catch (Exception ex)
    {
    }
}
Code for playing the file:
public static void Play(string fileName)
{
    if (File.Exists(fileName))
    {
        Program.Player.SoundLocation = fileName;
        Program.Player.Load();
        if (Program.Player.IsLoadCompleted)
        {
            Program.Player.Play();
        }
    }
}
I don't know what is wrong here.
Inside the Setter for the SoundLocation property is an interesting check:
set
{
    if (value == null)
    {
        value = string.Empty;
    }
    if (!this.soundLocation.Equals(value))
    {
        this.SetupSoundLocation(value);
        this.OnSoundLocationChanged(EventArgs.Empty);
    }
}
You can see that it looks to see if the new location differs from the old one. If it does, then it does some setup work. If it doesn't, it essentially does nothing.
I'm betting you can get around this by doing something like this:
public static void Play(string fileName)
{
    if (File.Exists(fileName))
    {
        Program.Player.SoundLocation = "";
        Program.Player.SoundLocation = fileName;
        Program.Player.Load();
        if (Program.Player.IsLoadCompleted)
        {
            Program.Player.Play();
        }
    }
}
The first call to the SoundLocation setter would clear out the loaded stream. The second one would then set it up properly with the location again and allow for Load to load the stream as expected.
So the title might be a bit misleading, but what I want to accomplish is reading an array of files and then combining them into one, which is where I am now.
The problem is that I have a catch block for FileNotFoundException; when it is hit I want to continue the loop (using "continue") but let the user know that the file is missing.
My setup is a class that is called from a form (it's in the form where the error should show up).
I thought about creating an event that can be registered from my form, but is that the right way?
public void MergeClientFiles(string directory)
{
    // Find all clients
    Array clients = Enum.GetValues(typeof(Clients));

    // Create a new array of files
    string[] files = new string[clients.Length];

    // Combine the clients with the .txt extension
    for (int i = 0; i < clients.Length; i++)
        files[i] = clients.GetValue(i) + ".txt";

    // Merge the files into directory
    using (var output = File.Create(directory))
    {
        foreach (var file in files)
        {
            try
            {
                using (var input = File.OpenRead(file))
                {
                    input.CopyTo(output);
                }
            }
            catch (FileNotFoundException)
            {
                // It's here I want to send the error to the form
                continue;
            }
        }
    }
}
You want the method to do its job and report problems to the user, right?
Then Oded has suggested the right thing. With a small modification, the code could look like this:
public List<string> MergeClientFiles( string path )
{
    // Find all clients
    Array clients = Enum.GetValues( typeof( Clients ) );

    // Create a new array of files
    string[] files = new string[clients.Length];

    // Combine the clients with the .txt extension
    for( int i = 0; i < clients.Length; i++ )
        files[i] = clients.GetValue( i ) + ".txt";

    List<string> errors = new List<string>();

    // Merge the files into AllClientData
    using( var output = File.Create( path ) ) {
        foreach( var file in files ) {
            try {
                using( var input = File.OpenRead( file ) ) {
                    input.CopyTo( output );
                }
            }
            catch( FileNotFoundException ) {
                errors.Add( file );
            }
        }
    }
    return errors;
}
Then, in the caller, you just check whether MergeClientFiles returns a non-empty collection.
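For illustration, the caller on the form side might look something like this; ShowMissingFiles is a made-up placeholder for however the form chooses to report the problem:

// Hypothetical caller (e.g. in a button click handler on the form).
var missingFiles = merger.MergeClientFiles(outputPath);
if (missingFiles.Count > 0)
{
    // e.g. show a message box or a list of the files that could not be found
    ShowMissingFiles(missingFiles);
}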
You can collect the exceptions into a List<FileNotFoundException> and, at the end of the iteration, if the list is not empty, throw a custom exception that exposes this list through a corresponding member.
This will allow any code calling the above to catch your custom exception, iterate over the FileNotFoundExceptions, and notify the user.
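A minimal sketch of that idea, assuming a custom exception type (MissingFilesException is an invented name, not from the answer):

// Hypothetical custom exception that carries all the missing-file failures.
public class MissingFilesException : Exception
{
    public List<FileNotFoundException> Failures { get; }

    public MissingFilesException(List<FileNotFoundException> failures)
        : base("One or more client files were not found.")
    {
        Failures = failures;
    }
}

// Inside MergeClientFiles: collect instead of swallowing, then throw once at the end.
var failures = new List<FileNotFoundException>();
// ... in the catch block: failures.Add(ex);
// ... after the loop:
if (failures.Count > 0)
    throw new MissingFilesException(failures);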
You could define a delegate that you pass as an argument to your method.
public delegate void FileNotFoundCallback(string file);

public void MergeClientFiles(string directory, FileNotFoundCallback callback)
{
    // Find all clients
    Array clients = Enum.GetValues(typeof(Clients));

    // Create a new array of files
    string[] files = new string[clients.Length];

    // Combine the clients with the .txt extension
    for (int i = 0; i < clients.Length; i++)
        files[i] = clients.GetValue(i) + ".txt";

    // Merge the files into directory
    using (var output = File.Create(directory))
    {
        foreach (var file in files)
        {
            try
            {
                using (var input = File.OpenRead(file))
                {
                    input.CopyTo(output);
                }
            }
            catch (FileNotFoundException)
            {
                // It's here I want to send the error to the form
                callback(file);
                continue;
            }
        }
    }
}
Rather than catching FileNotFoundException, you should actively check whether the file exists, and simply not try to open it if it doesn't.
You can change the method to return a list of merged files, a list of missing files, or a list of all files along with an indicator if they were merged or missing. Returning a single list gives the caller the option to process the missing files all at once and know how many were missing, instead of one-by-one as would be the case with an event or callback.
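As a rough sketch of that combined suggestion (the up-front File.Exists check plus a per-file status result), something along these lines could work; MergeResult is an invented type, not from the answers:

// Hypothetical result type: one entry per client file, with its outcome.
public class MergeResult
{
    public string File { get; set; }
    public bool Merged { get; set; }
}

public List<MergeResult> MergeClientFiles(string directory)
{
    var results = new List<MergeResult>();
    var files = Enum.GetValues(typeof(Clients))
                    .Cast<object>()
                    .Select(c => c + ".txt");

    using (var output = File.Create(directory))
    {
        foreach (var file in files)
        {
            // Check up front instead of relying on FileNotFoundException.
            if (!File.Exists(file))
            {
                results.Add(new MergeResult { File = file, Merged = false });
                continue;
            }
            using (var input = File.OpenRead(file))
            {
                input.CopyTo(output);
            }
            results.Add(new MergeResult { File = file, Merged = true });
        }
    }
    return results;
}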
For some inspiration, have a look at the documentation for the newer parallel constructs in C#, such as Parallel.For, and at the Reactive Framework (Rx).
In the first, exceptions are collected into an AggregateException; in Rx, exceptions are communicated via a callback interface.
I think I prefer the approach used in Parallel.For, but choose what fits your scenario best.
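For illustration, a sketch of the AggregateException style applied to this problem could look like the following. This is only an example of the pattern the answer points at, not the answer's own code; ProcessClientFile is a placeholder for whatever work is done per file, and the parallel loop assumes that work does not share a single output stream:

// Hypothetical sketch of Parallel.For-style error handling: run the work in
// parallel, collect every failure, and surface them all at once.
// Requires System.Collections.Concurrent and System.Threading.Tasks.
var exceptions = new ConcurrentQueue<Exception>();

Parallel.ForEach(files, file =>
{
    try
    {
        ProcessClientFile(file); // placeholder for the per-file work
    }
    catch (FileNotFoundException ex)
    {
        exceptions.Enqueue(ex);
    }
});

if (!exceptions.IsEmpty)
    throw new AggregateException(exceptions);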
I'm not really into multithreading, so the question is probably stupid, but I cannot seem to find a way to solve this problem (especially because I'm using C# and have only been using it for a month).
I have a dynamic number of directories (I get them from a query against the DB). Inside those directories there are a certain number of files.
For each directory I need to use a method to transfer these files over FTP in a concurrent way, because I have basically no limit on FTP connections (not my words; it's written in the specs).
But I still need to control the maximum number of files transferred per directory, so I need to count the files I'm transferring (increment/decrement).
How could I do it? Should I use something like an array and use the Monitor class?
Edit: Framework 3.5
You can use the Semaphore class to throttle the number of concurrent files per directory. You would probably want to have one semaphore per directory so that the number of FTP uploads per directory can be controlled independently.
public class Example
{
    public void ProcessAllFilesAsync()
    {
        var semaphores = new Dictionary<string, Semaphore>();

        foreach (string filePath in GetFiles())
        {
            string filePathCapture = filePath; // Needed to perform the closure correctly.
            string directoryPath = Path.GetDirectoryName(filePath);

            if (!semaphores.ContainsKey(directoryPath))
            {
                int allowed = NUM_OF_CONCURRENT_OPERATIONS;
                semaphores.Add(directoryPath, new Semaphore(allowed, allowed));
            }
            var semaphore = semaphores[directoryPath];

            ThreadPool.QueueUserWorkItem(
                (state) =>
                {
                    semaphore.WaitOne();
                    try
                    {
                        DoFtpOperation(filePathCapture);
                    }
                    finally
                    {
                        semaphore.Release();
                    }
                }, null);
        }
    }
}
var allDirectories = db.GetAllDirectories();

foreach (var directoryPath in allDirectories)
{
    DirectoryInfo directories = new DirectoryInfo(directoryPath);

    // Loop through every file in that directory
    foreach (var fileInDir in directories.GetFiles())
    {
        // Check if we have reached our max limit
        if (numberFTPConnections == MAXFTPCONNECTIONS)
        {
            Thread.Sleep(1000);
        }

        // Code to copy to FTP.
        // This can be async; when the transfer is completed,
        // decrement numberFTPConnections so the next file can be transferred.
    }
}
You can try something along the lines above. Note that it's just the basic logic, and there are probably better ways to do this.
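One such refinement, sketched here under the .NET 3.5 constraint mentioned in the question, is to update the in-flight count with Interlocked instead of a plain field, so the increments from the loop and the decrements from the async completions stay consistent. The names numberFTPConnections, MAXFTPCONNECTIONS, and StartFtpCopyAsync are placeholders taken from or made up for the example, not a real API:

// Hypothetical sketch: thread-safe counter for in-flight transfers (.NET 3.5 friendly).
private static int numberFTPConnections = 0;
private const int MAXFTPCONNECTIONS = 10; // assumed limit

void TransferFile(string localPath)
{
    // Wait politely until a slot frees up (atomic read via CompareExchange).
    while (Interlocked.CompareExchange(ref numberFTPConnections, 0, 0) >= MAXFTPCONNECTIONS)
    {
        Thread.Sleep(200);
    }

    Interlocked.Increment(ref numberFTPConnections);

    StartFtpCopyAsync(localPath, () =>
    {
        // Completion callback: free the slot so the next file can start.
        Interlocked.Decrement(ref numberFTPConnections);
    });
}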