How to read a specific part in text file? - c#

I have a really big text file (500 MB) and I need to get its text.
The problem, of course, is an out-of-memory exception, but I want to solve it by taking strings (or char arrays) and putting them in a List.
I searched on Google and I really don't know how to take a specific part.
* It's one long line, if that helps.

Try this:
using (FileStream fsSource = new FileStream(pathSource, FileMode.Open, FileAccess.Read))
{
    // Read the source file one chunk at a time.
    int chunkSize = 4096; // Your amount to read at a time
    byte[] bytes = new byte[chunkSize];
    int n;
    // Read may return anything from 0 to chunkSize; 0 means the end of the file is reached.
    while ((n = fsSource.Read(bytes, 0, bytes.Length)) > 0)
    {
        // Do here what you want to do with the n bytes read
        // (convert to string using Encoding.YourEncoding.GetString(bytes, 0, n))
    }
}

You can use the StreamReader class to read parts of a file.
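
A minimal sketch of the StreamReader approach, assuming the file is UTF-8 text. ChunkReader, ReadChunks and chunkSize are illustrative names, not part of the original answer. It yields the text piece by piece, so only one chunk has to be in memory at a time:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class ChunkReader
{
    // Hypothetical helper: yields the file's text in pieces of at most chunkSize characters.
    public static IEnumerable<string> ReadChunks(string path, int chunkSize = 64 * 1024)
    {
        using (var reader = new StreamReader(path, Encoding.UTF8)) // assumed encoding
        {
            var buffer = new char[chunkSize];
            int read;
            // ReadBlock fills the buffer unless the end of the file comes first; 0 means done.
            while ((read = reader.ReadBlock(buffer, 0, buffer.Length)) > 0)
            {
                yield return new string(buffer, 0, read);
            }
        }
    }
}
```

Collecting every chunk in a List still needs memory for the whole text, so where possible it is better to process each chunk inside a foreach over ReadChunks(path) and let it go.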

Related

Read mainframe file and parse data using .net

I have a file which is very long, and has no line breaks, CR or LF or other delimiters.
Records are fixed length: the first (control) record is 24 bytes long and all other records are 81 bytes long.
I know how to read a fixed-length file on a per-line basis, and I am using Multi Record Engine and have defined classes for each 81-byte record, but I can’t figure out how I can read 80 characters at a time and then parse that string for the actual fields.
You can use a FileStream to read the number of bytes you need, in your case either 24 or 81. Keep in mind that the stream's position advances as you read, so the offset argument (which is an index into the buffer, not into the file) should always be 0. Also be aware that Read returns the number of bytes actually read, which can be fewer than requested and is 0 once nothing is left on the stream.
So you would end up with something like this:
var recordlength = 81;
var buffer = new byte[recordlength];
stream.Read(buffer, 0, recordlength); // offset = 0, start at current position
var record = System.Text.Encoding.UTF8.GetString(buffer); // single record
Since the record length is different for the control record, you could put that part into a single method, let's name it Read, and use that method to traverse the stream until you reach the end, like this:
public List<string> Records()
{
    var result = new List<string>();
    using (var stream = new FileStream(@"c:\temp\lipsum.txt", FileMode.Open))
    {
        // first record
        result.Add(Read(stream, 24));
        var record = "";
        do
        {
            record = Read(stream);
            if (!string.IsNullOrEmpty(record)) result.Add(record);
        }
        while (record.Length > 0);
    }
    return result;
}

private string Read(FileStream stream, int length = 81)
{
    // Not enough bytes left for a full record
    if (stream.Length < stream.Position + length) return "";
    var buffer = new byte[length];
    stream.Read(buffer, 0, length);
    return System.Text.Encoding.UTF8.GetString(buffer);
}
This will give you a list of records (including the starting control record).
This is far from perfect, but it serves as an example. Also keep in mind that even if the file is empty, the returned list always contains one (empty) result.
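
Because Stream.Read may return fewer bytes than requested, a variant that keeps reading until a record is full can be sketched as follows. RecordReader, ReadRecords and ReadRecord are hypothetical names, and the UTF-8 decoding is simply carried over from the answer above; real mainframe data may need a different encoding.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class RecordReader
{
    // Hypothetical helper: reads one control record, then fixed-length records until the data runs out.
    public static List<string> ReadRecords(Stream stream, int controlLength = 24, int recordLength = 81)
    {
        var result = new List<string>();
        string record = ReadRecord(stream, controlLength);
        while (record != null)
        {
            result.Add(record);
            record = ReadRecord(stream, recordLength);
        }
        return result;
    }

    // Returns null when fewer than 'length' bytes remain, so a partial trailing record is dropped.
    private static string ReadRecord(Stream stream, int length)
    {
        var buffer = new byte[length];
        int filled = 0;
        while (filled < length)
        {
            // Read may return fewer bytes than requested; 0 means end of stream.
            int n = stream.Read(buffer, filled, length - filled);
            if (n == 0) return null;
            filled += n;
        }
        return Encoding.UTF8.GetString(buffer); // encoding assumed, as in the answer
    }
}
```

Unlike the version above, this sketch returns an empty list for an empty file, and it accepts any Stream, so it can be tried against a MemoryStream before pointing it at the real file.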

How to read from a text file and then calculate an average

I plan on reading the marks from a text file and then calculating what the average mark is based upon data written in previous code. I haven't been able to read the marks though or calculate how many marks there are as BinaryReader doesn't let you use .Length.
I have tried using an array to hold each mark but it doesn't like each mark being an integer
public static int CalculateAverage()
{
    int count = 0;
    int total = 0;
    float average;
    BinaryReader markFile;
    markFile = new BinaryReader(new FileStream("studentMarks.txt", FileMode.Open));
    //A loop to read each line of the file and add it to the total
    {
        //total = total + eachMark;
        //count++;
    }
    //average = total / count;
    //markFile.Close();
    //Console.WriteLine("Average mark:", average);
    return 0;
}
This is my studentMark.txt file in VS
First of all, don't use BinaryReader; you can use StreamReader, for example.
Also, with a using statement it is not necessary to call Close().
There is already an answer using a while loop; with LINQ you can do it in one line:
var avg = File.ReadAllLines("file.txt").Average(a => Int32.Parse(a));
Console.WriteLine("avg = " + avg); //5
Also, according to the docs, File.ReadAllLines() loads the file into memory and then closes it, so there is no leaked file handle to worry about:
Opens a text file, reads all lines of the file into a string array, and then closes the file.
Edit: adding a way to read the file using BinaryReader.
The first thing to know is that you are reading a .txt file. Unless the file was created using BinaryWriter, BinaryReader will not read it correctly. And if you are creating a binary file, naming it .txt is not good practice.
So, assuming your file is binary, you need to loop and read every integer, so this code should work.
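var fileName = "file.txt";
int total = 0;
int count = 0;
if (File.Exists(fileName))
{
    using (BinaryReader reader = new BinaryReader(File.Open(fileName, FileMode.Open)))
    {
        // Read 4-byte integers until the end of the stream
        while (reader.BaseStream.Position < reader.BaseStream.Length)
        {
            total += reader.ReadInt32();
            count++;
        }
    }
    // Cast to double to avoid integer division; guard against an empty file
    double average = count > 0 ? (double)total / count : 0;
    Console.WriteLine("Average = " + average); // 5
}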
I've used using to ensure the file is closed at the end.
If your file only contains numbers, you only have to use ReadInt32() and it will work.
Also, if your file is not binary, BinaryReader obviously will not work. By the way, my binary file.txt created using BinaryWriter looks like this:
So I'm assuming you don't have a binary file...

Index of the line in StreamWriter

I'm using StreamWriter to write into file, but I need the index of line I'm writing to.
int i;
using (StreamWriter s = new StreamWriter("myfilename", true))
{
    i = s.Index(); //or something that works.
    s.WriteLine("text");
}
My only idea is to read the whole file and count the lines. Any better solution?
The definition of a line
A line in a file, and therefore a line index, is delimited by the \n character. Typically (and more so on Windows) it is preceded by the carriage return character \r, but that is not required and is not typically present on Linux or Mac.
Correct Solution
Asking for the line index at the current position basically means asking for the number of \n characters before the current position in the file you are writing to. Since you are appending, that position is the end of the file, so you can think of it as how many lines are in the file.
You can read the stream in chunks and count them, rather than reading the entire file into memory, so this is safe to use on very large files.
// File to read/write
var filePath = @"C:\Users\luke\Desktop\test.txt";
// Write a file with 3 lines
File.WriteAllLines(filePath,
    new[] {
        "line 1",
        "line 2",
        "line 3",
    });
// Get newline character
char newLine = '\n';
// Create read buffer
var buffer = new char[1024];
// Keep track of amount of data read
var read = 0;
// Keep track of the number of lines
var numberOfLines = 0;
// Read the file
using (var streamReader = new StreamReader(filePath))
{
    do
    {
        // Read the next chunk
        read = streamReader.ReadBlock(buffer, 0, buffer.Length);
        // If no data read...
        if (read == 0)
            // We are done
            break;
        // We read some data, so go through each character...
        for (var i = 0; i < read; i++)
            // If the character is \n
            if (buffer[i] == newLine)
                // We found a line
                numberOfLines++;
    }
    while (read > 0);
}
The lazy solution
If your files are not that large (how large depends on the RAM of your intended machine/device and on your program as a whole) and you just want to read the entire file into memory (that is, into your program's RAM), you can use a one-liner:
var numberOfLines = File.ReadAllLines(filePath).Length;
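
Tying this back to the question, a small sketch that counts the existing lines and then appends, returning the zero-based index of the new line. LineAppender and AppendLine are hypothetical names, and the sketch assumes the file is either empty or already ends with a newline (otherwise the appended text would continue the last line).

```csharp
using System.IO;
using System.Linq;

public static class LineAppender
{
    // Hypothetical helper: appends a line and returns its zero-based index in the file.
    public static int AppendLine(string path, string text)
    {
        // File.ReadLines enumerates lazily, so the file is not held in memory all at once.
        int index = File.Exists(path) ? File.ReadLines(path).Count() : 0;
        using (var writer = new StreamWriter(path, true)) // true = append
        {
            writer.WriteLine(text);
        }
        return index;
    }
}
```

This still reads the whole file once per call. If many lines are written in one run, it is probably cheaper to count once and increment the index after each WriteLine.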

Issue reading file to byte array

I'm maintaining a program that has the following code to read a file to a byte array:
using (FileStream fileStream = new FileStream(filePath, FileMode.Open))
{
    fileStream.Position = 0;
    int fileSize = (int)fileStream.Length;
    int readSize;
    int remain = fileSize;
    var pos = 0;
    byteData = new byte[fileSize];
    while (remain > 0)
    {
        readSize = fileStream.Read(byteData, pos, Math.Min(1024, remain));
        pos += readSize;
        remain -= readSize;
    }
}
And then afterwards outputs this byte array as a Base64 string:
var value = "File contents:" + Environment.NewLine + Convert.ToBase64String(byteData);
The issue we are occasionally seeing is that the output is just a string of A's, like "AAAAAAAAAAAAAAAAAAAAAA", but longer. I've figured out that if you output a byte array that has been initialized to a given length but not assigned a value (i.e. each byte is still the initial value of 0) it will output in Base64 as a series of A's, so my hypothesis is that the byte array is being created to the size of the file, but then the value of each byte isn't being assigned. Looking at the code I can't see any obvious issues with it though, so if anyone knows better I'd be very grateful.
For posterity, I did end up changing it to File.ReadAllBytes, however I also found out that the issue was with the file itself, and an empty byte array was actually correct. I.e. each byte was still the initial value of 0, so a corresponding base64 string of "A"s was also correct.
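
The observation about zero bytes is easy to reproduce. A tiny sketch (Base64Demo and EncodeZeros are illustrative names) that encodes a freshly allocated array:

```csharp
using System;

public static class Base64Demo
{
    // Encodes a freshly allocated (all-zero) byte array of the given length.
    public static string EncodeZeros(int length)
    {
        var zeros = new byte[length]; // every element defaults to 0
        return Convert.ToBase64String(zeros);
    }
}
```

For example, Base64Demo.EncodeZeros(6) returns "AAAAAAAA": six zero bytes are 48 zero bits, which Base64 encodes as eight 6-bit groups of value 0, and value 0 maps to the letter A.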

Reading and writing specific lines to a text file in C#

I have a master file called FileName with IDs of people. It is in sorted order.
I want to divide IDs into 27 chunks and copy each chunk into a different text file.
using (FileStream fs = File.Open(FileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
    string line;
    int numOfLines = File.ReadAllLines(FileName).Length; // I have 73467
    int eachSubSet = (numOfLines / 27);
    var lines = File.ReadAllLines(dataFileName).Take(eachSubSet);
    File.WriteAllLines(FileName1, lines);
}
I have 27 different text files, so I want the 73467 IDs divided equally and copied over to the 27 files. So the 1st file will have ID#1 to ID#2721,
the 2nd file will have ID#2722 to ID#(2722+2721), and so on. I do not know how to automate this and run it quickly.
Thanks
HR
The simplest way would be to call StreamReader.ReadLine and StreamWriter.WriteLine inside a loop and decide which file receives which line.
I wouldn't recommend parallelizing this routine, since it's an I/O operation; just copying the lines should be pretty fast.
Note that your sample code calls File.ReadAllLines twice, so it actually reads your entire input file twice.
Avoiding that should speed up the process. Also, your code didn't actually split the file; it only wrote the first of the 27 files.
Untested, but something along these lines should work:
const int numOfFiles = 27;
string[] lines = File.ReadAllLines(FileName);
int numOfLines = lines.Length;
int eachSubSet = numOfLines / numOfFiles;
// The first file also takes the remainder
int firstSubset = numOfLines % numOfFiles + eachSubSet;
IEnumerable<string> linesLeftToWrite = lines;
for (int index = 0; index < numOfFiles; index++)
{
    int numToTake = index == 0 ? firstSubset : eachSubSet;
    File.WriteAllLines(string.Format("{0}_{1}.txt", FileName, index), linesLeftToWrite.Take(numToTake));
    linesLeftToWrite = linesLeftToWrite.Skip(numToTake);
}
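
The ReadLine/WriteLine loop mentioned earlier can be sketched as below, streaming the source once without holding all lines in memory. FileSplitter and Split are hypothetical names, and the sketch assumes the caller already knows the total line count (73467 in the question). Unlike the code above, it spreads any remainder over the first files instead of giving it all to the first one.

```csharp
using System.IO;

public static class FileSplitter
{
    // Hypothetical helper: streams 'source' into 'parts' files named source_0.txt, source_1.txt, ...
    public static void Split(string source, int totalLines, int parts)
    {
        int perFile = totalLines / parts;
        int extra = totalLines % parts; // the first 'extra' files get one more line
        using (var reader = new StreamReader(source))
        {
            for (int index = 0; index < parts; index++)
            {
                int toWrite = perFile + (index < extra ? 1 : 0);
                using (var writer = new StreamWriter(string.Format("{0}_{1}.txt", source, index)))
                {
                    for (int i = 0; i < toWrite; i++)
                    {
                        string line = reader.ReadLine();
                        if (line == null) break; // fewer lines than expected
                        writer.WriteLine(line);
                    }
                }
            }
        }
    }
}
```

With the numbers from the question, FileSplitter.Split(FileName, 73467, 27) would write 2721 lines to each of the 27 files, since 73467 divides evenly by 27.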
