I made a program in C# that processes about 30 zipped folders containing about 35,000 files in total. My goal is to read every single file and process its information. At the moment my code extracts all the archives and then reads the files. The problem is that this takes about 15-20 minutes, which is a lot.
I am using the following code to extract files:
void ExtractFile(string zipfile, string path)
{
    ZipFile zip = ZipFile.Read(zipfile);
    zip.ExtractAll(path);
}
The extraction part is what takes the most time, and I need to reduce it. Is there a way to read the contents of the files inside the zipped folders without extracting them? Or does anyone know any other way to reduce the time this code takes?
Thanks in advance
You could try reading each entry into a memory stream instead of to the file system:
ZipFile zip = ZipFile.Read(zipfile);
foreach (ZipEntry entry in zip.Entries)
{
    using (MemoryStream ms = new MemoryStream())
    {
        entry.Extract(ms);
        ms.Seek(0, SeekOrigin.Begin);
        // read from the stream
    }
}
Maybe instead of extracting to the hard disk, you should try reading the archive without extraction: open it with ZipFile.OpenRead and then use the ZipArchiveEntry.Open method on each entry.
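For reference, here is a minimal sketch of that approach with System.IO.Compression (this assumes .NET 4.5 or later with references to System.IO.Compression and System.IO.Compression.FileSystem; the ReadEntries name and the ReadToEnd call are just placeholders for your own processing):
using System.IO;
using System.IO.Compression;

void ReadEntries(string zipPath)
{
    // Open the archive for reading without extracting anything to disk.
    using (ZipArchive archive = ZipFile.OpenRead(zipPath))
    {
        foreach (ZipArchiveEntry entry in archive.Entries)
        {
            // Open() gives a stream over the entry's decompressed contents.
            using (Stream entryStream = entry.Open())
            using (StreamReader reader = new StreamReader(entryStream))
            {
                string contents = reader.ReadToEnd();
                // process the contents here
            }
        }
    }
}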
Also have a look at the CodeFluent Runtime tool, which claims to be optimized for performance.
Try breaking the work into separate async methods that you start one after another with await, if any single operation takes longer than about 50 ms. See http://msdn.microsoft.com/en-us/library/hh191443.aspx
If we have, for example, 10 operations that would otherwise run one after another, with async/await we can start them in parallel, and the total time then depends only on the machine's resources.
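As a hedged sketch of that idea (ProcessZipAsync is a hypothetical method standing in for whatever work each archive needs; only Task.WhenAll and Task.Run are framework calls):
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

async Task ProcessAllAsync(IEnumerable<string> zipFiles)
{
    // Start every operation up front instead of awaiting them one at a time...
    List<Task> tasks = zipFiles.Select(f => ProcessZipAsync(f)).ToList();

    // ...then wait for all of them to complete together.
    await Task.WhenAll(tasks);
}

Task ProcessZipAsync(string zipFile)
{
    // Placeholder: push the per-archive work onto the thread pool.
    return Task.Run(() => { /* read and process one archive */ });
}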
I need to generate multiple XML files at an SFTP location from C# code. For SFTP connectivity I am using Renci.Ssh.net. I found that there are different methods to generate files, including WriteAllText() and UploadFile(). I am producing the XML string at runtime; currently I've used the WriteAllText() method (to avoid creating the XML file locally and thus avoid an extra IO operation).
using (SftpClient client = new SftpClient(host, port, sftpUser, sftpPassword))
{
    client.Connect();
    if (client.IsConnected)
    {
        client.BufferSize = 1024;
        var filePath = sftpDir + fileName;
        client.WriteAllText(filePath, contents);
        client.Disconnect();
    }
    client.Dispose();
}
Will using UploadFile(), either from a FileStream or a MemoryStream, give me better performance in the long run?
The resulting document size will be in the KB range, around 60 KB.
Thanks!
SftpClient.UploadFile is optimized for uploading large amounts of data.
But for 60 KB, I'm pretty sure it makes no difference whatsoever, so you can continue using the more convenient SftpClient.WriteAllText.
Though I believe most XML generators (like the .NET XmlWriter) are able to write XML to a Stream (that's usually the preferred output API, rather than a string), so using SftpClient.UploadFile may turn out to be more convenient in the end.
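As a rough sketch of that idea (the element names, host/port/credentials and remotePath are placeholders, and this assumes the SSH.NET SftpClient from the question):
using System.IO;
using System.Xml;
using Renci.SshNet;

void UploadXml(string host, int port, string user, string password, string remotePath)
{
    using (var ms = new MemoryStream())
    {
        // Write the XML straight into the stream instead of building a string first.
        using (XmlWriter writer = XmlWriter.Create(ms))
        {
            writer.WriteStartElement("root");
            writer.WriteElementString("example", "value");
            writer.WriteEndElement();
        }

        ms.Position = 0;

        using (var client = new SftpClient(host, port, user, password))
        {
            client.Connect();
            client.UploadFile(ms, remotePath);
            client.Disconnect();
        }
    }
}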
See also What is the difference between SftpClient.UploadFile and SftpClient.WriteAllBytes?
I wrote code to read many text files and group them into one file called all.txt; after that I read the all.txt file to count the word frequencies, and the result appears in a RichTextBox. The code works, but when I run the program part of the result appears and then the program hangs without responding. I think that may be a memory issue; my computer has 4 GB of RAM. Any help would be appreciated.
Note: my code works well on small text files. Here's part of my code:
StreamWriter w = new StreamWriter(@"C:\documents\all.txt");
w.Write(all);
w.Close();
If your problem is reading a large file, try using MemoryMappedFile, as described here.
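A minimal sketch of what that could look like (the path is a placeholder; this only shows the API shape, with the word counting left out):
using System.IO;
using System.IO.MemoryMappedFiles;

void ReadMapped(string path)
{
    using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
    using (var stream = mmf.CreateViewStream())
    using (var reader = new StreamReader(stream))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // update the word-frequency counts here, one line at a time
        }
    }
}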
I have a .txt file that I process into a List in my program.
I would like to somehow save that List and include it in the program itself, so that it loads every time the program starts and I don't have to process it from the .txt file each time.
It's more complicated than just "int x = 3;" because it has around 10k lines, and I don't want to copy-paste all of that at the beginning.
I've looked all over but haven't found anything similar. Any ideas?
Also, if there's a solution, can it work with any type (arrays, Dictionaries)?
As requested, the code is:
var text = System.IO.File.ReadAllText(#"C:\Users\jazz7\Desktop\links_zg.txt");
EDIT
Joe suggested the solution:
I included the file in the project, set its Build Action to Embedded Resource in Properties, and used this code:
private string linkovi = "";
...
var assembly = Assembly.GetExecutingAssembly();
var resourceName = "WindowsFormsApplication4.links_zg.txt";
using (Stream stream = assembly.GetManifestResourceStream(resourceName))
using (StreamReader reader = new StreamReader(stream))
{
    linkovi = reader.ReadToEnd();
}
The string linkovi now contains the contents of the txt file, which is embedded in the application. Thanks all!
You could store the file as a resource in your executable file.
This KB article describes how to do it.
Fundamentally, you've got to choose between storing your data in memory or storing it on the hard drive. The former will cut your loading time but might use an unacceptable amount of memory, whilst the latter is slower, as you've identified. Either way, your data has to be stored somewhere.
Do you need to load all of the data at once? If the loading time is the issue, you could process the file line by line. While this would be slower overall, you would still have access to some usable data sooner.
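For example, a small sketch of lazy, line-by-line processing with File.ReadLines (the path and the per-line work are placeholders):
using System.IO;

void ProcessLineByLine(string path)
{
    // ReadLines streams the file; each line is read only when it is consumed.
    foreach (string line in File.ReadLines(path))
    {
        // use the line as soon as it is available
    }
}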
I want to remove blank lines from my file; for that I am using the code below.
private void ReadFile(string Address)
{
    var tempFileName = Path.GetTempFileName();
    try
    {
        //using (var streamReader = new StreamReader(Server.MapPath("~/Images/") + FileName))
        using (var streamReader = new StreamReader(Address))
        using (var streamWriter = new StreamWriter(tempFileName))
        {
            string line;
            while ((line = streamReader.ReadLine()) != null)
            {
                if (!string.IsNullOrWhiteSpace(line))
                    streamWriter.WriteLine(line);
            }
        }
        File.Copy(tempFileName, Address, true);
    }
    finally
    {
        File.Delete(tempFileName);
    }
    Response.Write("Completed");
}
But the problem is that my file is too large (8 lakh lines, i.e. about 800,000), so it's taking a lot of time. Is there any other way to do it faster?
Instead of doing a ReadLine(), I would do a StreamReader.ReadToEnd() to load the entire file into memory, then do a line.Replace("\n\n", "\n") and then a streamWriter.Write(line) back to the file. That way there is not a lot of thrashing, either in memory or on disk, going on.
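A minimal sketch of that load-it-all-at-once idea, with one caveat: Windows files usually end lines with "\r\n", so a literal "\n\n" replace can miss blank lines; filtering the lines avoids that (the path is a placeholder):
using System.IO;
using System.Linq;

void RemoveBlankLines(string path)
{
    // Load the whole file into memory in one read...
    string[] lines = File.ReadAllLines(path);

    // ...drop empty or whitespace-only lines...
    var nonBlank = lines.Where(l => !string.IsNullOrWhiteSpace(l));

    // ...and write the result back in a single operation.
    File.WriteAllLines(path, nonBlank);
}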
The best solution may well depend on the disk type - SSDs and spinning rust behave differently. Your current approach has the advantage over Steve's answer of being able to do processing (such as encoding text data back as binary) while data is still coming off the disk. (With buffering and background IO, there's a lot of potential asynchrony here.) It's definitely worth trying both approaches. (Obviously your approach uses less memory, too.)
However, there's one aspect of your code which is definitely suboptimal: creating a copy of the results. You don't need to do that. You can use file moves instead, which are a lot more efficient, assuming the files are all on the same drive. To make sure you don't lose data, you can do two moves and a delete:
Move the old file to a backup filename
Move the new file to the old filename
Delete the backup filename
It looks like this is what File.Replace does for you, which makes it considerably simpler, and also preserves the original metadata.
If something goes wrong after the first move, you're left without the "proper" file from either old or new, but you can detect that and use the backup filename to read next time.
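A short sketch of that swap using File.Replace (the backup name is an assumption, and both files need to be on the same volume for this to work):
using System.IO;

void SwapInNewFile(string tempFileName, string originalPath)
{
    string backupPath = originalPath + ".bak";

    // Moves the original to the backup name and the temp file into its place.
    File.Replace(tempFileName, originalPath, backupPath);

    // Once you're confident the new file is good, the backup can be removed.
    File.Delete(backupPath);
}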
Of course, if this is meant to happen as part of a web request, you may want to do all the processing in a background task - processing 800,000 lines of text is likely to take longer than you really want a web request to take...
I'm uploading big files by dividing them into chunks (small parts) to my ASMX web service (ASMX doesn't support streaming; I haven't found another way):
bool UploadChunk(byte[] bytes, string path, string md5)
{
    ...
    using (FileStream fs = new FileStream(tempPath, FileMode.Append))
    {
        fs.Write(bytes, 0, bytes.Length);
    }
    ...
    return status;
}
But on some files, after ~20-50 calls, I get this error: The process cannot access the file because it is being used by another process.
I suspect this is related to Windows not having released the file yet. Any ideas on how to get rid of this annoying error?
EDIT
the requests execute sequentially and synchronously
EDIT2
client code looks like:
_service.StartUpload(path);
...
do
{
    ..
    bool status = _service.UploadChunk(buf, path, md5);
    if (!status) return Status.Failed;
    ..
}
while (bytesRead > 0);
_service.CheckFile(path, md5);
Each request is handled independently. The process still accessing the file may be the previous request.
In general, you should use file transfer protocols to transfer files. ASMX is not good for that.
And, I presume you have a good reason to not use WCF?
Use WhoLockMe at the moment the error occurs to check who is using the file. You could put the application into debug mode and hold it at a breakpoint to do this. In all probability it will be your own process.
Also try adding a delay after each transfer (and before the next) to see if it helps. Maybe your transfers are too fast and the stream is still in use or being flushed when the next transfer comes in.
Option 1: Get the requirements changed so you don't have to do this using ASMX. WCF supports a streaming model that I'm about to experiment with, but it should be much more effective for what you want.
Option 2: Look into WSE 3.0. I haven't looked at it much, but I think it extends ASMX web services to support things like DIME and MTOM which are designed for transferring files so that may help.
Option 3: Set the system up so that each call writes a piece of the file into a different filename, then write code to rejoin everything at the end.
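For option 3, a rough sketch of the rejoin step (the ".partN" naming is an assumption, not something from the question):
using System.IO;

void JoinChunks(string targetPath, int chunkCount)
{
    using (FileStream output = new FileStream(targetPath, FileMode.Create))
    {
        for (int i = 0; i < chunkCount; i++)
        {
            string chunkPath = targetPath + ".part" + i;
            using (FileStream chunk = File.OpenRead(chunkPath))
            {
                chunk.CopyTo(output);   // append this piece to the final file
            }
            File.Delete(chunkPath);     // clean up the chunk once it is merged
        }
    }
}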
Use this for creating a file; if you want to append to it, use FileMode.Append instead:
var fileStream = new FileStream(name, FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.ReadWrite);