I'm using a text file to store transactions. This transaction log is essentially the persistence mechanism: when the software bootstraps, it replays the transactions to get back to the last known state. There are other pieces as well, such as snapshots, loading only the transactions recorded after the snapshot was taken (to avoid replaying from the beginning of time), archiving, and purging.
These transaction logs can get really large. This is especially true when a company wants to keep a month's worth of transactions. The archive is purged of old transactions at startup and then every midnight (snapshots taken, transactions archived, and then old ones purged).
The algorithm used to purge is to open two file streams: one for the current file and another for a temp file that gets created. I stream one transaction at a time to the temporary file, writing only the ones I want to keep. Then I delete the current file and rename the temp file to be the current file. This approach saves on RAM, but performance becomes a problem for files approaching 500 MB.
The transactions are stored oldest at the top and newest at the bottom. What I would like to do is remove one line at a time from the top until I find a transaction that needs to stay, and then stop processing. Is there a way to do that? Below is the current approach:
await _semaphore.WaitAsync().ConfigureAwait(false);
try
{
    using var reader = _fileSystem.OpenStream(originalFile);
    using var writer = _fileSystem.CreateStream(tempFile);

    string? line = null;
    while ((line = await reader.ReadLineAsync().ConfigureAwait(false)) is not null)
    {
        // Keep only transactions on or after the cutoff date.
        var logItem = _serializer.Deserialize<TransactionLogItem>(line);
        var dateLogged = logItem.HappenedOn.ToLocalDateTime().Date;
        if (dateLogged >= oldestAllowedDate)
            await writer.WriteLineAsync(line).ConfigureAwait(false);
    }
}
finally
{
    // Swap the filtered temp file in for the original.
    _fileSystem.DeleteFile(originalFile);
    _fileSystem.Rename(tempFile, originalFile);
    _semaphore.Release();
}
No; no file system supports truncating the beginning of a file. You can sort of get something working with sparse files, but they only work on some file systems and are pretty coarse in which sections can be made sparse.
Your best bet is to either do what you're doing now, use a real database, or have multiple transaction logs so you can just delete old ones.
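If you go the multiple-log route, here is a minimal sketch of what it could look like, assuming one file per day; the class and member names are illustrative, not taken from your code:
using System;
using System.Globalization;
using System.IO;

// Hypothetical sketch of the "multiple transaction logs" option: one file per
// day, so purging never rewrites a file, it only deletes whole files.
public class RollingTransactionLog
{
    private readonly string _logDirectory;

    public RollingTransactionLog(string logDirectory) => _logDirectory = logDirectory;

    private string FileForDate(DateTime date) =>
        Path.Combine(_logDirectory, $"transactions-{date:yyyyMMdd}.log");

    public void Append(string serializedTransaction)
    {
        // New transactions always go to today's file.
        File.AppendAllText(FileForDate(DateTime.Today),
                           serializedTransaction + Environment.NewLine);
    }

    public void Purge(DateTime oldestAllowedDate)
    {
        // Delete files that are entirely older than the cutoff.
        foreach (var file in Directory.EnumerateFiles(_logDirectory, "transactions-*.log"))
        {
            var stamp = Path.GetFileNameWithoutExtension(file)
                            .Substring("transactions-".Length);
            if (DateTime.TryParseExact(stamp, "yyyyMMdd", CultureInfo.InvariantCulture,
                                       DateTimeStyles.None, out var fileDate)
                && fileDate < oldestAllowedDate.Date)
            {
                File.Delete(file);
            }
        }
    }
}
On startup you would load the snapshot and then replay the remaining files in date order; the 500 MB rewrite goes away because purging is just a handful of deletes.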
I want to back up the data for today's date to an XML file every 10 minutes. I managed to create the XML file, but I couldn't find how to save the newly received data without adding the same data to the existing file again.
Can I convert the file I created to a DataSet with DataSet.ReadXml, add the new data I got from the query, and convert it back to an XML file and save it? What method should I use?
String QueryString = "SELECT * FROM dbo.db_records WHERE DAY(datetime) = DAY(CURRENT_TIMESTAMP)";

public void run()
{
    while (true)
    {
        try
        {
            Thread.Sleep(600000); // wait 10 minutes between backups

            if (odbcConnection.State != ConnectionState.Open)
            {
                odbcConnection.Close();
                odbcConnection.Open();
            }

            DataSet dataSet = new DataSet("XMLDB");
            odbcDataAdapter.Fill(dataSet, "#ID");

            if (File.Exists(Path))
            {
                // This is the part I'm stuck on: how do I save the new data
                // without duplicating what is already in the file?
            }
            else
            {
                using (FileStream fs = File.Create(Path))
                {
                    dataSet.WriteXml(fs);
                }
            }
        }
        catch (Exception) { }
    }
}
XML is not a great format if you want to append data, since it uses tags that need to be closed. So you have a few options:
Save separate files
Since you seem to fetch data for the current day, just attach date info to your file name. When reading the data back you may need to read all the files in the folder that fit the pattern, and merge them.
Use a format that is trivial to append
If your data model is simple tabular data, you may use a .csv file instead. You can add data to it using one of the File.Append* methods (for example File.AppendAllText or File.AppendAllLines).
Overwrite all data
Get the complete data you want to save each time, and overwrite any existing data. This is simple, but may be slow if you have lots of data. If the database is small and grows slowly, though, this might be perfectly fine.
Parse the existing data
You could read the existing file with ReadXml as you suggest, and use DataSet.Merge to merge it with your new set before overwriting the existing file (see the sketch below). This may also be slow, since it needs to process all the data, but it may put less load on the database than fetching all the data from the database each time.
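For that last option, a rough sketch, reusing the Path and odbcDataAdapter variables from your snippet (and assuming System.Data and System.IO are in scope):
// Read the existing XML into a DataSet, merge today's query results into it,
// then overwrite the file. Merge only avoids duplicate rows if the table has
// a primary key defined, so write the schema along with the data.
DataSet merged = new DataSet("XMLDB");
if (File.Exists(Path))
{
    merged.ReadXml(Path);                 // load what was saved earlier
}

DataSet fresh = new DataSet("XMLDB");
odbcDataAdapter.Fill(fresh, "#ID");       // today's rows from the database

merged.Merge(fresh);

using (FileStream fs = File.Create(Path))
{
    merged.WriteXml(fs, XmlWriteMode.WriteSchema);
}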
In any case, you might want to periodically save full backups, or have some other way to handle corrupt files. You should also have some way to test the backups. I would also consider using the backup options built into most database engines if that is an alternative.
This is a tricky question. I suspect it will require some advanced knowledge of file systems to answer.
I have a WPF application, "App1," targeting .NET framework 4.0. It has a Settings.settings file that generates a standard App1.exe.config file where default settings are stored. When the user modifies settings, the modifications go in AppData\Roaming\MyCompany\App1\X.X.0.0\user.config. This is all standard .NET behavior. However, on occasion, we've discovered that the user.config file on a customer's machine isn't what it's supposed to be, which causes the application to crash.
The problem looks like this: user.config is about the size it should be if it were filled with XML, but instead of XML it's just a bunch of NUL characters. It's character 0 repeated over and over again. We have no information about what had occurred leading up to this file modification.
We can fix that problem on a customer's device if we just delete user.config because the Common Language Runtime will just generate a new one. They'll lose the changes they've made to the settings, but the changes can be made again.
However, I've encountered this problem in another WPF application, "App2," with another XML file, info.xml. This time it's different because the file is generated by my own code rather than by the CLR. The common themes are that both are C# WPF applications, both are XML files, and in both cases we are completely unable to reproduce the problem in our testing. Could this have something to do with the way C# applications interact with XML files or files in general?
Not only can we not reproduce the problem in our current applications, but I can't even reproduce the problem by writing custom code that generates errors on purpose. I can't find a single XML serialization error or file access error that results in a file that's filled with nulls. So what could be going on?
App1 accesses user.config by calling Upgrade() and Save() and by getting and setting the properties. For example:
if (Settings.Default.UpgradeRequired)
{
    Settings.Default.Upgrade();
    Settings.Default.UpgradeRequired = false;
    Settings.Default.Save();
}
App2 accesses info.xml by serializing and deserializing the XML:
public Info Deserialize(string xmlFile)
{
    if (File.Exists(xmlFile) == false)
    {
        return null;
    }

    XmlSerializer xmlReadSerializer = new XmlSerializer(typeof(Info));
    Info overview = null;

    using (StreamReader file = new StreamReader(xmlFile))
    {
        overview = (Info)xmlReadSerializer.Deserialize(file);
        file.Close();
    }

    return overview;
}

public void Serialize(Info infoObject, string fileName)
{
    XmlSerializer writer = new XmlSerializer(typeof(Info));

    using (StreamWriter fileWrite = new StreamWriter(fileName))
    {
        writer.Serialize(fileWrite, infoObject);
        fileWrite.Close();
    }
}
We've encountered the problem on both Windows 7 and Windows 10. When researching the problem, I came across this post where the same XML problem was encountered in Windows 8.1: Saved files sometime only contains NUL-characters
Is there something I could change in my code to prevent this, or is the problem too deep within the behavior of .NET?
It seems to me that there are three possibilities:
The CLR is writing null characters to the XML files.
The file's memory address pointer gets switched to another location without moving the file contents.
The file system attempts to move the file to another memory address and the file contents get moved but the pointer doesn't get updated.
I feel like 2 and 3 are more likely than 1. This is why I said it may require advanced knowledge of file systems.
I would greatly appreciate any information that might help me reproduce, fix, or work around the problem. Thank you!
It's well known that this can happen after a power loss. It occurs when a cached write extends a file (it can be a new or an existing file) and power is lost shortly thereafter. In this scenario the file has 3 expected possible states when the machine comes back up:
1) The file doesn't exist at all or has its original length, as if the write never happened.
2) The file has the expected length as if the write happened, but the data is zeros.
3) The file has the expected length and the correct data that was written.
State 2 is what you are describing. It occurs because when you do the cached write, NTFS initially just extends the file size accordingly but leaves VDL (valid data length) untouched. Data beyond VDL always reads back as zeros. The data you were intending to write is sitting in memory in the file cache. It will eventually get written to disk, usually within a few seconds, and following that VDL will get advanced on disk to reflect the data written. If power loss occurs before the data is written or before VDL gets increased, you will end up in state 2.
This is fairly easy to repro, for example by copying a file (the copy engine uses cached writes), and then immediately pulling the power plug on your computer.
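You can't fully prevent this from application code, but you can narrow the window. One common pattern (just a sketch, not specific to the settings system) is to write to a temporary file, flush it through the cache, and then swap it into place, so after a crash you are left with either the old file or the new one:
// Sketch: write to a temp file, force it past the file cache, then swap it in.
// File.Replace needs the files to be on the same NTFS volume.
public static void SaveSafely(string path, string contents)
{
    string tempPath = path + ".tmp";
    string backupPath = path + ".bak";

    using (var fs = new FileStream(tempPath, FileMode.Create, FileAccess.Write))
    using (var writer = new StreamWriter(fs))
    {
        writer.Write(contents);
        writer.Flush();
        fs.Flush(flushToDisk: true);   // push the data to disk, not just the cache
    }

    if (File.Exists(path))
        File.Replace(tempPath, path, backupPath);
    else
        File.Move(tempPath, path);
}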
I had a similar problem and I was able to trace my problem to corrupted HDD.
Description of my problem (all related information):
Disks attached to the mainboard (SATA):
SSD (system),
3 × HDD.
One of the HDDs had bad blocks, and there were even problems reading the disk structure (directory and file listing).
Operating system: Windows 7 x64
File system (on all disks): NTFS
When the system tried to read from or write to the corrupted disk (a user request, an automatic scan, or any other reason) and the attempt failed, all write operations (to the other disks) were incorrect. The files created on the system disk (mostly configuration files written by other applications) appeared to be written and valid on a direct check of their content (probably because the data was cached in RAM).
Unfortunately, after a restart, all the files written after the failed read/write access to the corrupted drive had the correct size, but their content was all zero bytes (exactly like in your case).
Try to rule out hardware-related problems. After a change, copy the file to a different machine (upload it to the web/FTP), or try saving a known, fixed piece of content to a file. If the copy checked on the other machine is correct, or the fixed-content file ends up 'empty', the reason is probably on the local machine. Try changing hardware components, or reinstall the system.
There is no documented reason for this behavior; it happens to users, but nobody can tell the origin of these odd conditions.
It might be a CLR problem, although that is very unlikely: the CLR doesn't just write null characters, and an XML document cannot contain null characters if there is no xsi:nil defined for the nodes.
Anyway, the only documented way to fix this is to delete the corrupted file with code like this:
try
{
    ConfigurationManager.OpenExeConfiguration(ConfigurationUserLevel.PerUserRoamingAndLocal);
}
catch (ConfigurationErrorsException ex)
{
    string filename = ex.Filename;
    _logger.Error(ex, "Cannot open config file");

    if (File.Exists(filename) == true)
    {
        _logger.Error("Config file {0} content:\n{1}", filename, File.ReadAllText(filename));
        File.Delete(filename);
        _logger.Error("Config file deleted");
        Properties.Settings.Default.Upgrade();
        // Properties.Settings.Default.Reload();
        // you could optionally restart the app instead
    }
    else
    {
        _logger.Error("Config file {0} does not exist", filename);
    }
}
It will restore user.config via Properties.Settings.Default.Upgrade(), this time without null values.
I ran into a similar issue but it was on a server. The server restarted while a program was writing to a file which caused the file to contain all null characters and become unusable to the program writing/reading from it.
The logs showed that the server had restarted, and the corrupted file's last-modified time matched the time of the restart.
I have the same problem: there is an extra NUL character at the end of the serialized XML file.
I am using XmlWriter like this:
using (var stringWriter = new Utf8StringWriter())
{
    using (var xmlWriter = XmlWriter.Create(stringWriter, new XmlWriterSettings
    {
        Indent = true,
        IndentChars = "\t",
        NewLineChars = "\r\n",
        NewLineHandling = NewLineHandling.Replace
    }))
    {
        xmlSerializer.Serialize(xmlWriter, data, nameSpaces);
        xml = stringWriter.ToString();

        var xmlDocument = new XmlDocument();
        xmlDocument.LoadXml(xml);
        if (removeEmptyNodes)
        {
            RemoveEmptyNodes(xmlDocument);
        }
        xml = xmlDocument.InnerXml;
    }
}
I want to remove blank lines from my file; for that I am using the code below.
private void ReadFile(string Address)
{
    var tempFileName = Path.GetTempFileName();
    try
    {
        //using (var streamReader = new StreamReader(Server.MapPath("~/Images/") + FileName))
        using (var streamReader = new StreamReader(Address))
        using (var streamWriter = new StreamWriter(tempFileName))
        {
            string line;
            while ((line = streamReader.ReadLine()) != null)
            {
                if (!string.IsNullOrWhiteSpace(line))
                    streamWriter.WriteLine(line);
            }
        }
        File.Copy(tempFileName, Address, true);
    }
    finally
    {
        File.Delete(tempFileName);
    }
    Response.Write("Completed");
}
But the problem is that my file is too large (8 lakh lines, i.e. about 800,000), so it's taking a lot of time. So is there any other way to do it faster?
Instead of doing a ReadLine(), I would do a StreamReader.ReadToEnd() to load the entire file into memory, then do a line.Replace("\n\n", "\n"), and then do a streamWriter.Write(line) back to the file. That way there is not a lot of thrashing, either memory or disk, going on.
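A sketch of that in-memory idea, though filtering whitespace-only lines rather than using Replace, since the file probably uses \r\n line endings and some 'blank' lines may contain spaces (requires a using System.Linq; directive):
// Load every line into memory, drop the blank ones, write the result back in
// one go. Address is the same parameter as in the original method.
var kept = File.ReadAllLines(Address)
               .Where(line => !string.IsNullOrWhiteSpace(line));
File.WriteAllLines(Address, kept);
This trades memory for fewer disk round trips; for roughly 800,000 lines of text that is usually acceptable on a modern machine.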
The best solution may well depend on the disk type - SSDs and spinning rust behave differently. Your current approach has the advantage over Steve's answer of being able to do processing (such as encoding text data back as binary) while data is still coming off the disk. (With buffering and background IO, there's a lot of potential asynchrony here.) It's definitely worth trying both approaches. (Obviously your approach uses less memory, too.)
However, there's one aspect of your code which is definitely suboptimal: creating a copy of the result. You don't need to do that. You can use file moves instead, which are a lot more efficient, assuming the files are all on the same drive. To make sure you don't lose data, you can do two moves and a delete:
Move the old file to a backup filename
Move the new file to the old filename
Delete the backup filename
It looks like this is what File.Replace does for you, which makes it considerably simpler, and also preserves the original metadata.
If something goes wrong after the first move, you're left without the "proper" file from either old or new, but you can detect that and use the backup filename to read next time.
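Concretely, the swap might look like this (a sketch reusing tempFileName and Address from your method; note that File.Replace requires both files to be on the same volume, and Path.GetTempFileName() may put the temp file on a different one):
// Replace the File.Copy call with an atomic-ish swap. File.Replace moves the
// old file to a backup, moves the new file to the old name, and preserves the
// original file's metadata.
string backup = Address + ".bak";
File.Replace(tempFileName, Address, backup);
File.Delete(backup);   // or keep the backup around for recovery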
Of course, if this is meant to happen as part of a web request, you may want to do all the processing in a background task - processing 800,000 lines of text is likely to take longer than you really want a web request to take...
I am working on a C# project.
I am trying to send a log file via email whenever the application crashes.
However, the log file is a bit large.
So I thought I should include only a specific portion of the log file.
For that, I am trying to read all the lines after the last instance of a line containing a specified keyword (in my case, "Application Started").
Since the application gets restarted many times (due to crashing), "Application Started" is printed many times in the file. So I only want the last line containing "Application Started" and the lines after it until the end of the file.
I need help figuring out how I can do this.
I have just started with basic code as of now.
string line;
System.IO.StreamReader file = new System.IO.StreamReader("c:\\mylogfile.txt");
while ((line = file.ReadLine()) != null)
{
    if (line.Contains("keyword"))
    {
    }
}
Read the file, line-by-line, until you find your keyword. Once you find your keyword, start pushing every line after that into a List<string>. If you find another line with your keyword, just Clear your list and start refilling it from that point.
Something like:
List<string> buffer = new List<string>();

using (var sin = new StreamReader("pathtomylogfile"))
{
    string line;
    bool read = false;   // becomes true once the first keyword line is seen

    while ((line = sin.ReadLine()) != null)
    {
        if (line.Contains("keyword"))
        {
            // a new entry starts here, so throw away the previous one
            buffer.Clear();
            read = true;
        }
        if (read)
        {
            buffer.Add(line);
        }
    }

    // now buffer has the last entry
    // you could use string.Join to put it back together in a single string
    var lastEntry = string.Join("\n", buffer);
}
If the number of lines in each entry is very large, it might be more efficient to scan the file first to find the last entry and then loop again to extract it. If the whole log file isn't that large, it might be more efficient to just ReadToEnd and then use LastIndexOf to find the start of the last entry.
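A rough sketch of the two-pass idea, using the same path and keyword as in the question:
// First pass: find the line number of the last "Application Started".
int lastStart = -1, lineNumber = 0;
using (var reader = new StreamReader("c:\\mylogfile.txt"))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        if (line.Contains("Application Started"))
            lastStart = lineNumber;
        lineNumber++;
    }
}

// Second pass: collect only the lines from that point to the end of the file.
var lastEntry = new List<string>();
if (lastStart >= 0)
{
    using (var reader = new StreamReader("c:\\mylogfile.txt"))
    {
        string line;
        int current = 0;
        while ((line = reader.ReadLine()) != null)
        {
            if (current++ >= lastStart)
                lastEntry.Add(line);
        }
    }
}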
Read everything from the file and then select the portion you want.
string lines = System.IO.File.ReadAllText("c:\\mylogfile.txt");
int start_index = lines.LastIndexOf("Application Started"); // -1 if the keyword is never found, so guard before calling Substring
string needed_portion = lines.Substring(start_index);
SendEmail(needed_portion);
I advise you to use a proper logging framework, like log4net or NLog.
You can configure it to write to multiple files: one containing the complete logs, another containing only errors/exceptions. You can also set a maximum size for the log files, or configure it to send you a mail when an exception occurs.
Of course this does not solve your current problem; for that there are solutions above.
But I would also try simpler methods, like opening the file in Notepad++; it can handle bigger files (the last time I formatted a 30 MB XML document with it, it took about 20 minutes, but it managed). With simple text files the performance should be much better. Opening the file for reading only (not for editing) may also give much better performance on Windows.
I made a program in C# that processes about 30 zipped folders containing about 35,000 files in total. My purpose is to read every single file to process its information. As of now, my code extracts all the folders and then reads the files. The problem with this process is that it takes about 15-20 minutes, which is a lot.
I am using the following code to extract files:
void ExtractFile(string zipfile, string path)
{
    ZipFile zip = ZipFile.Read(zipfile);
    zip.ExtractAll(path);
}
The extraction part is the one that takes the most time to process. I need to reduce this time. Is there a way I can read the contents of the files inside the zipped folders without extracting them? Or does anyone know any other way that can help me reduce the time of this code?
Thanks in advance
You could try reading each entry into a memory stream instead of to the file system:
ZipFile zip = ZipFile.Read(zipfile);
foreach (ZipEntry entry in zip.Entries)
{
    using (MemoryStream ms = new MemoryStream())
    {
        entry.Extract(ms);
        ms.Seek(0, SeekOrigin.Begin);
        // read from the stream
    }
}
Maybe instead of extracting the archive to the hard disk, you should try reading it without extraction, using ZipFile.OpenRead and then the ZipArchiveEntry.Open method.
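A sketch of that approach with the framework's System.IO.Compression types, assuming the entries are text files and that the project references System.IO.Compression and System.IO.Compression.FileSystem (this ZipFile is a different type from the one in the question):
using (ZipArchive archive = ZipFile.OpenRead(zipfile))
{
    foreach (ZipArchiveEntry entry in archive.Entries)
    {
        // Open the entry as a stream and read it in place; nothing is written
        // to the hard disk.
        using (Stream entryStream = entry.Open())
        using (var reader = new StreamReader(entryStream))
        {
            string contents = reader.ReadToEnd();
            // process contents here
        }
    }
}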
Also have a look at the CodeFluent Runtime tool, which claims improved performance.
Try breaking the work into separate async methods that are started one by one, if any single operation takes longer than about 50 ms: http://msdn.microsoft.com/en-us/library/hh191443.aspx
If we have, for example, 10 operations that would otherwise run one after another, with async/await we can run them in parallel, and the overall time will depend only on what the server can handle.
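As a sketch of what that could look like here (ProcessAllAsync is a hypothetical wrapper; ExtractFile is the method from the question; requires System.Linq and System.Threading.Tasks):
// Start the per-archive work concurrently and await it all together instead
// of processing one zip after another.
async Task ProcessAllAsync(IEnumerable<string> zipFiles, string outputPath)
{
    var tasks = zipFiles.Select(zip => Task.Run(() => ExtractFile(zip, outputPath)));
    await Task.WhenAll(tasks);
}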