I am using StreamWriter to write records into a file. Now I want to overwrite a specific record.
string file = "c:\\......";
StreamWriter sw = new StreamWriter(new FileStream(file, FileMode.Open, FileAccess.Write));
sw.Write(...);
sw.Close();
I read somewhere here that I can use the Stream.Write method to do that, but I have no previous experience or knowledge of how to deal with bytes.
public override void Write(
byte[] array,
int offset,
int count
)
So how do I use this method?
I need someone to explain what exactly byte[] array and int count are in this method, and to show simple sample code for using it to overwrite an existing record in a file.
E.g. change a record like Mark1287,11100,25| to Bill9654,22100,30|.
If you want to overwrite a particular record, you must use the FileStream.Seek method to put your stream in position.
Example for Seek
using System;
using System.IO;
class FStream
{
    static void Main()
    {
        const string fileName = "Test####.dat";

        // Create random data to write to the file.
        byte[] dataArray = new byte[100000];
        new Random().NextBytes(dataArray);

        using (FileStream fileStream = new FileStream(fileName, FileMode.Create))
        {
            // Write the data to the file, byte by byte.
            for (int i = 0; i < dataArray.Length; i++)
            {
                fileStream.WriteByte(dataArray[i]);
            }

            // Set the stream position to the beginning of the file.
            fileStream.Seek(0, SeekOrigin.Begin);

            // Read and verify the data.
            for (int i = 0; i < fileStream.Length; i++)
            {
                if (dataArray[i] != fileStream.ReadByte())
                {
                    Console.WriteLine("Error writing data.");
                    return;
                }
            }
            Console.WriteLine("The data was written to {0} " +
                "and verified.", fileStream.Name);
        }
    }
}
After having sought to the position, use Write, where:
public override void Write(
byte[] array,
int offset,
int count
)
Parameters
array
Type: System.Byte[]
The buffer containing data to write to the stream.
offset
Type: System.Int32
The zero-based byte offset in array from which to begin copying bytes to the stream.
count
Type: System.Int32
The maximum number of bytes to write.
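Putting Seek and Write together: a minimal sketch of overwriting the first record in place, assuming the replacement is exactly the same length as the original (the file name and record contents are just for illustration):

```csharp
using System;
using System.IO;
using System.Text;

public class OverwriteRecord
{
    public static void Main()
    {
        const string file = "records.dat";
        // Two records of identical length, each terminated by '|'.
        File.WriteAllText(file, "Mark1287,11100,25|Bill0000,00000,00|");

        byte[] replacement = Encoding.ASCII.GetBytes("Bill9654,22100,30|");
        using (var fs = new FileStream(file, FileMode.Open, FileAccess.Write))
        {
            // Record #0 starts at byte 0; seek there and overwrite it in place.
            fs.Seek(0, SeekOrigin.Begin);
            fs.Write(replacement, 0, replacement.Length);
        }

        Console.WriteLine(File.ReadAllText(file)); // Bill9654,22100,30|Bill0000,00000,00|
    }
}
```

Because the write happens in place, this only works when the new record is not longer than the old one; a longer record would clobber the start of the next record.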
And most important: always consult the documentation when unsure!
So... in short:
Your file is text-based (but is allowed to become binary-based).
Your records have various sizes.
This means that, without analyzing your file, there is no way to know where a given record starts and ends. If you want to overwrite a record, the new record can be larger than the old one, so all records further into the file would have to be moved.
This requires a complex management system. Options could be:
When your application starts it analyzes your file and stores in memory the start and length of each record.
There is a separate (binary) file which holds the start and length of each record. This will cost an additional 8 bytes per record (an Int32 each for start and length; perhaps you want to consider Int64).
If you want to rewrite a record, you can use this record/start/length system to know where to start writing your record. But before you do that, you have to ensure there is space, thus moving all records after the record being rewritten. Of course you have to update your management system with the new positions and lengths.
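A sketch of the in-memory index idea, assuming records are '|'-terminated text as in the question (the class and file names here are made up):

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public class RecordIndex
{
    // Maps record number -> (start offset, length in bytes), '|' included.
    public static List<(long Start, int Length)> BuildIndex(string path)
    {
        var index = new List<(long Start, int Length)>();
        byte[] data = File.ReadAllBytes(path);
        long start = 0;
        for (int i = 0; i < data.Length; i++)
        {
            if (data[i] == (byte)'|') // record terminator
            {
                index.Add((start, (int)(i - start + 1)));
                start = i + 1;
            }
        }
        return index;
    }

    public static void Main()
    {
        File.WriteAllText("records.txt", "Mark1287,11100,25|Bill9654,22100,30|");
        foreach (var (s, len) in BuildIndex("records.txt"))
            Console.WriteLine($"start={s} length={len}");
        // start=0 length=18
        // start=18 length=18
    }
}
```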
Another option is to do as a database does: every record consists of fixed-width columns. Even text columns have a maximum length. Because of this you can calculate very easily where each record starts in the file. For example: if each record has a size of 200 bytes, then record #0 will start at position 0, the next record at position 200, the one after that at 400, etc. You do not have to move records when a record is rewritten.
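The fixed-width variant reduces overwriting to one multiplication; a sketch (the 200-byte record size and helper names are made up, and short records are padded with zero bytes):

```csharp
using System;
using System.IO;
using System.Text;

public class FixedWidth
{
    public const int RecordSize = 200;

    // Overwrites record number `index` in place; the record is padded
    // (or truncated) to exactly RecordSize bytes.
    public static void OverwriteRecord(string path, int index, string record)
    {
        byte[] buffer = new byte[RecordSize]; // zero-padded by default
        byte[] text = Encoding.UTF8.GetBytes(record);
        Array.Copy(text, buffer, Math.Min(text.Length, RecordSize));

        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Write))
        {
            fs.Seek((long)index * RecordSize, SeekOrigin.Begin); // record #i starts at i * 200
            fs.Write(buffer, 0, RecordSize);
        }
    }
}
```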
Another suggestion: create a management system like how memory is managed. Once a record is written it stays there. The management system keeps a list of allocated portions and free portions of the file. If a new record is written, a free and fitting portion is searched for by the management system and the record is written at that position (optionally leaving a smaller free portion). When a record is deleted, that space is freed up. When you rewrite a record you actually delete the old record and write a new record (possibly at a totally different location).
My last suggestion: Use a database :)
Related
I have a file which is very long, and has no line breaks, CR or LF or other delimiters.
Records are fixed length, and the first control record length is 24 and all other record lengths are of fixed length 81 bytes.
I know how to read a fixed length file per line basis and I am using Multi Record Engine and have defined classes for each 81 byte line record but can’t figure out how I can read 80 characters at a time and then parse that string for the actual fields.
You can use a FileStream to read the number of bytes you need - in your case either 24 or 81. Keep in mind that the position advances as you progress through the stream, so you should not use an offset (it should always be 0). Also be aware that when there is no data "left" on the stream, a read will return fewer bytes than requested (or zero).
So you would end up with something like this:
var recordlength = 81;
var buffer = new byte[recordlength];
stream.Read(buffer, 0, recordlength); // offset = 0, start at current position
var record = System.Text.Encoding.UTF8.GetString(buffer); // single record
Since the record length is different for the control record, you could put that part into a single method - let's name it Read - and use that method to traverse the stream until you reach the end, like this:
public List<string> Records()
{
    var result = new List<string>();
    using (var stream = new FileStream(@"c:\temp\lipsum.txt", FileMode.Open))
    {
        // first record
        result.Add(Read(stream, 24));
        var record = "";
        do
        {
            record = Read(stream);
            if (!string.IsNullOrEmpty(record)) result.Add(record);
        }
        while (record.Length > 0);
    }
    return result;
}

private string Read(FileStream stream, int length = 81)
{
    if (stream.Length < stream.Position + length) return "";
    var buffer = new byte[length];
    stream.Read(buffer, 0, length);
    return System.Text.Encoding.UTF8.GetString(buffer);
}
This will give you a list of records (including the starting control record).
This is far from perfect, just an example - also keep in mind that even if the file is empty, there is always one (empty) entry in the returned list.
I plan on reading the marks from a text file and then calculating the average mark based upon data written in previous code. I haven't been able to read the marks, though, or calculate how many there are, as BinaryReader doesn't let you use .Length.
I have tried using an array to hold each mark, but it doesn't like each mark being an integer.
public static int CalculateAverage()
{
    int count = 0;
    int total = 0;
    float average;
    BinaryReader markFile;
    markFile = new BinaryReader(new FileStream("studentMarks.txt", FileMode.Open));
    // A loop to read each line of the file and add it to the total
    {
        //total = total + eachMark;
        //count++;
    }
    //average = total / count;
    //markFile.Close();
    //Console.WriteLine("Average mark:", average);
    return 0;
}
This is my studentMark.txt file in VS
First of all, don't use BinaryReader; you can use StreamReader, for example.
Also, with a using statement it is not necessary to call Close().
There is already an answer using a while loop, so using LINQ you can do it in one line:
var avg = File.ReadAllLines("file.txt").Average(a => Int32.Parse(a));
Console.WriteLine("avg = "+avg); //5
Also, File.ReadAllLines(), according to the docs, loads the file into memory and then closes it, so there is no memory-leak problem:
Opens a text file, reads all lines of the file into a string array, and then closes the file.
Edit to add the way to read using BinaryReader.
First thing to know: you are reading a .txt file. Unless you created the file using BinaryWriter, BinaryReader will not work. And if you are creating a binary file, naming it .txt is not good practice.
So, assuming your file is binary, you need to loop and read every integer, so this code should work:
var fileName = "file.txt";
int total = 0, count = 0;
if (File.Exists(fileName))
{
    using (BinaryReader reader = new BinaryReader(File.Open(fileName, FileMode.Open)))
    {
        while (reader.BaseStream.Position < reader.BaseStream.Length)
        {
            total += reader.ReadInt32();
            count++;
        }
    }
    var average = total / count;
    Console.WriteLine("Average = " + average); // 5
}
I've used using to ensure the file is closed at the end.
If your file only contains numbers, you only have to use ReadInt32() and it will work.
Also, if your file is not binary, obviously BinaryReader will not work. By the way, my binary file.txt created using BinaryWriter looks like this:
So I'm assuming you don't have a binary file...
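For completeness, a sketch of how such a binary marks file could be produced with BinaryWriter and read back with the loop from the answer above (the file name and mark values here are made up):

```csharp
using System;
using System.IO;

public class WriteMarks
{
    public static void Main()
    {
        // Write a few marks as 4-byte little-endian integers.
        using (var writer = new BinaryWriter(File.Open("studentMarks.bin", FileMode.Create)))
        {
            foreach (int mark in new[] { 4, 5, 6 })
                writer.Write(mark);
        }

        // Read them back the same way the answer above does.
        int total = 0, count = 0;
        using (var reader = new BinaryReader(File.Open("studentMarks.bin", FileMode.Open)))
        {
            while (reader.BaseStream.Position < reader.BaseStream.Length)
            {
                total += reader.ReadInt32();
                count++;
            }
        }
        Console.WriteLine(total / count); // 5
    }
}
```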
I see many examples on how to add numbers using Parallel, however I have not found anything that could demonstrate reading in multiple chunks (say 512 bytes per chunk) in parallel from a stream, and have the results joined together.
I would like to know if it is possible to read multiple parts of a stream and have them concatenated together in proper order.
For example
Assume the following text file
Bird
Cats
Dogs
And reading in a chunk size of 5 bytes, from a normal stream would be something like :
byte[] buffer = new byte[5];
int bytesRead = 0;
StringBuilder sb = new StringBuilder();
using (Stream stream = new FileStream("animals.txt", FileMode.Open, FileAccess.Read))
{
    while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
    {
        sb.Append(Encoding.UTF8.GetString(buffer, 0, bytesRead));
    }
}
Would read in each line (all lines are 5 bytes) and join them together in order so the resulting string would be identical to the file.
However, using something like this solution seems like it could potentially join them out of order. I also don't know how it would apply in the above context to replace the while loop.
How can I read in those chunks simultaneously and have the bytes from each iteration appended to the StringBuilder - not in the order the iterations complete, but in the proper order - so I don't end up with something like
Cats
Bird
Dogs
Sorry I don't have any parallel code to show, as this is the reason for the post. It seems easy if you want to sum up numbers, but to have it work in the following manner:
Reading from a stream in byte chunks (say 512 bytes per chunk)
Appending to a master result in the order the chunks appear in the stream, not necessarily the order they are processed.
... seems to be a daunting challenge
By their nature, streams are not compatible with parallel processing. The abstraction of a stream is sequential access.
You can read the stream content sequentially into an array, then launch parallel processing on it, which has the desired effect (processing is parallelized). You can even spawn the parallel tasks as chunks of the stream arrive.
var tasks = new List<Task>();
int read;
do
{
    var buffer = new byte[blockSize];
    var location = stream.Position;
    read = stream.Read(buffer, 0, blockSize);
    if (read > 0)
        tasks.Add(ProcessAsync(buffer, location));
} while (read > 0); // until end of stream
await Task.WhenAll(tasks.ToArray());
Or, if you have random access, you can spawn parallel tasks each with instructions to read from a particular portion of the input, process it, and store to the corresponding part of the result. But note that although random access to files is possible, the access still has to go through a single disk controller, and hard disks are not truly random access even though they expose a random-access interface: non-sequential read patterns result in a lot of time wasted seeking, lowering efficiency far below what you get from streaming. (SSDs don't seek, so there's not much penalty for random request patterns, but you don't benefit either.)
Thanks to @Kraang for collaborating on the following example, matching the case of parallel processing of binary data.
If reading from bytes alone, you could use parallel processing to handle the chunks as follows :
// the byte array goes here
byte[] data = new byte[N];
// the block size
int blockSize = 5;
// find how many chunks there are
int blockCount = 1 + (data.Length - 1) / blockSize;
byte[][] processedChunks = new byte[blockCount][];
Parallel.For(0, blockCount, (i) =>
{
    var offset = i * blockSize;
    // set the buffer size to block size or remaining bytes, whichever is smaller
    var buffer = new byte[Math.Min(blockSize, data.Length - offset)];
    // copy the bytes from data to the buffer
    Buffer.BlockCopy(data, offset, buffer, 0, buffer.Length);
    // store buffer results into array in position `i`, preserving order
    processedChunks[i] = Process(buffer);
});
// recombine chunks using e.g. LINQ SelectMany
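To make that last comment concrete, here is a sketch where Process is just a stand-in identity function - slot i of the array corresponds to chunk i, so flattening with SelectMany restores the stream order:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public class Recombine
{
    // Stand-in for whatever per-chunk work you actually do; identity here.
    static byte[] Process(byte[] chunk) => chunk;

    public static byte[] ChunkAndRecombine(byte[] data, int blockSize)
    {
        int blockCount = 1 + (data.Length - 1) / blockSize;
        byte[][] processedChunks = new byte[blockCount][];

        Parallel.For(0, blockCount, i =>
        {
            var offset = i * blockSize;
            var buffer = new byte[Math.Min(blockSize, data.Length - offset)];
            Buffer.BlockCopy(data, offset, buffer, 0, buffer.Length);
            processedChunks[i] = Process(buffer); // slot i preserves order
        });

        // Chunk i lands in slot i, so flattening restores the original order.
        return processedChunks.SelectMany(c => c).ToArray();
    }

    public static void Main()
    {
        byte[] result = ChunkAndRecombine(new byte[] { 1, 2, 3, 4, 5, 6, 7 }, 3);
        Console.WriteLine(string.Join(",", result)); // 1,2,3,4,5,6,7
    }
}
```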
I am working on a hobby project (simple / efficient datastore). My current concern is regarding the performance of reading data from disk (binary) and populating my objects.
My goal is to create a simple store optimized for read performance (for mobile) that is much faster than reading from SQL database or CSV.
After profiling the application, I found that when I read data from disk (~1000 records = 240 ms), most of the time is spent in the method set(byte[]):
// data layout:
// strings are stored as their UTF-8 representation in a byte array
// within a "row", the first two bytes contain the length in bytes of the string data
// my data store also supports other types (which are much faster) - not shown below
class myObject : IRow
{
    public string Name;
    public string Path;
    // and so on

    public void set(byte[] row_buffer)
    {
        int offset = 0;
        short strLength = 0;

        // Name - variable, about 40 bytes
        strLength = BitConverter.ToInt16(row_buffer, offset);
        offset += 2;
        Name = Encoding.UTF8.GetString(row_buffer, offset, strLength);
        offset += strLength;

        // Path - variable, about 150 bytes
        strLength = BitConverter.ToInt16(row_buffer, offset);
        offset += 2;
        Path = Encoding.UTF8.GetString(row_buffer, offset, strLength);
        offset += strLength;

        // and so on
    }
}
Further remarks:
The data is read as binary from disk.
For each row in the file, a new object is created and the function set(row_buffer) is called.
Reading the stream into the row_buffer (using br.Read(row_Buffer, 0, rowLengths[i])) consumes ~ 10% of the time
Converting the bytes (GetString) to string consumes about 88% of the time
-> I don't understand why creating strings is so expensive :(
Any idea, how I can improve the performance? I am limited to "safe C#" code only.
Thanks for reading.
EDIT
I need to create the Objects to run my Linq queries. I would like to defer object creation but failed to find a way at this stage. See my other SO question: Implement Linq query on byte[] for my own type
I have a 1 GB text file which I need to read line by line. What is the best and fastest way to do this?
private void ReadTxtFile()
{
    string filePath = string.Empty;
    filePath = openFileDialog1.FileName;
    if (!string.IsNullOrEmpty(filePath))
    {
        using (StreamReader sr = new StreamReader(filePath))
        {
            string line;
            while ((line = sr.ReadLine()) != null)
            {
                FormatData(line);
            }
        }
    }
}
In FormatData() I check whether the starting word of the line matches a given word, and based on that I increment an integer variable.
void FormatData(string line)
{
    if (line.StartsWith(word))
    {
        globalIntVariable++;
    }
}
If you are using .NET 4.0, try MemoryMappedFile, which is a class designed for this scenario.
You can use StreamReader.ReadLine otherwise.
Using StreamReader is probably the way to go, since you don't want the whole file in memory at once. MemoryMappedFile is more for random access than sequential reading (a stream is about ten times as fast for sequential reading, while memory mapping is about ten times as fast for random access).
You might also try creating your StreamReader from a FileStream with FileOptions set to SequentialScan (see FileOptions Enumeration), but I doubt it will make much of a difference.
There are, however, ways to make your example more efficient, since you do your formatting in the same loop as the reading. You're wasting clock cycles, so if you want even more performance, it would be better with a multithreaded asynchronous solution where one thread reads data and another formats it as it becomes available. Check out BlockingCollection; it might fit your needs:
Blocking Collection and the Producer-Consumer Problem
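A minimal sketch of that producer-consumer split using BlockingCollection - one thread reads lines, another counts matches as they arrive (the file name, search word, and the 1000-line bound are illustrative):

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

public class ProducerConsumer
{
    public static int CountMatches(string path, string word)
    {
        var lines = new BlockingCollection<string>(boundedCapacity: 1000);
        int matches = 0;

        // Consumer: processes lines as they become available.
        var consumer = Task.Run(() =>
        {
            foreach (var line in lines.GetConsumingEnumerable())
                if (line.StartsWith(word)) matches++;
        });

        // Producer: the reading loop stays on its own thread.
        using (var reader = new StreamReader(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                lines.Add(line);
        }
        lines.CompleteAdding(); // signals the consumer that no more lines are coming
        consumer.Wait();
        return matches;
    }

    public static void Main()
    {
        File.WriteAllLines("input.txt", new[] { "word one", "other", "word two" });
        Console.WriteLine(CountMatches("input.txt", "word")); // 2
    }
}
```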
If you want the fastest possible performance, in my experience the only way is to read in as large a chunk of binary data as possible sequentially and deserialize it into text in parallel, but the code starts to get complicated at that point.
You can use LINQ:
int result = File.ReadLines(filePath).Count(line => line.StartsWith(word));
File.ReadLines returns an IEnumerable<String> that lazily reads each line from the file without loading the whole file into memory.
Enumerable.Count counts the lines that start with the word.
If you are calling this from a UI thread, use a BackgroundWorker.
Probably read it line by line.
You should not try to force the whole file into memory by reading to the end and then processing it.
StreamReader.ReadLine should work fine. Let the framework choose the buffering, unless you know by profiling that you can do better.
TextReader.ReadLine()
I was facing the same problem on our production server at Agenty, where we see large files (sometimes 10-25 GB tab-delimited .txt files). After lots of testing and research I found that the best way is to read large files in small chunks, with a for/foreach loop and offset/limit logic using File.ReadLines().
int TotalRows = File.ReadLines(Path).Count(); // Count the number of rows in file with lazy load
int Limit = 100000; // 100000 rows per batch
for (int Offset = 0; Offset < TotalRows; Offset += Limit)
{
var table = Path.FileToTable(heading: true, delimiter: '\t', offset : Offset, limit: Limit);
// Do all your processing here and with limit and offset and save to drive in append mode
// The append mode will write the output in same file for each processed batch.
table.TableToFile(#"C:\output.txt");
}
See the complete code in my GitHub library: https://github.com/Agenty/FileReader/
Full disclosure - I work for Agenty, the company that owns this library and website.
My file is over 13 GB:
You can use my class:
public static void Read(int length)
{
    StringBuilder resultAsString = new StringBuilder();

    using (MemoryMappedFile memoryMappedFile = MemoryMappedFile.CreateFromFile(@"D:\_Profession\Projects\Parto\HotelDataManagement\_Document\Expedia_Rapid.jsonl\Expedia_Rapi.json"))
    using (MemoryMappedViewStream memoryMappedViewStream = memoryMappedFile.CreateViewStream(0, length))
    {
        for (int i = 0; i < length; i++)
        {
            // Reads a byte and advances the position by one, or returns -1 at the end of the stream.
            int result = memoryMappedViewStream.ReadByte();
            if (result == -1)
            {
                break;
            }
            char letter = (char)result;
            resultAsString.Append(letter);
        }
    }
}
This code will read the text of the file from the start up to the length that you pass to the method Read(int length) and fill the resultAsString variable.
It will return the below text:
I'd read the file 10,000 bytes at a time. Then I'd analyse those 10,000 bytes and chop them into lines and feed them to the FormatData function.
Bonus points for splitting the reading and line analysis across multiple threads.
I'd definitely use a StringBuilder to collect all strings, and might build a string buffer to keep about 100 strings in memory at all times.
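A sketch of that chunked approach: read 10,000 bytes at a time, split off complete lines, and carry the incomplete tail into the next chunk (this assumes single-byte characters such as ASCII, since a multi-byte UTF-8 character could straddle a chunk boundary; the file name and callback are made up):

```csharp
using System;
using System.IO;
using System.Text;

public class ChunkedLines
{
    // Reads `path` in 10,000-byte chunks and returns the number of
    // complete '\n'-terminated lines, feeding each one to a callback.
    public static int CountLines(string path, Action<string> formatData = null)
    {
        var carry = new StringBuilder(); // partial line left over from the previous chunk
        var buffer = new byte[10000];
        int read, lineCount = 0;

        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
            {
                carry.Append(Encoding.ASCII.GetString(buffer, 0, read));
                string text = carry.ToString();
                carry.Clear();

                int start = 0, newline;
                while ((newline = text.IndexOf('\n', start)) >= 0)
                {
                    string line = text.Substring(start, newline - start);
                    formatData?.Invoke(line); // this is where FormatData(line) would go
                    lineCount++;
                    start = newline + 1;
                }
                carry.Append(text.Substring(start)); // keep the incomplete tail
            }
        }
        return lineCount;
    }

    public static void Main()
    {
        File.WriteAllText("big.txt", "alpha\nbeta\ngamma\n");
        Console.WriteLine(CountLines("big.txt")); // 3
    }
}
```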