I have a master file called FileName with IDs of people. It is in sorted order.
I want to divide IDs into 27 chunks and copy each chunk into a different text file.
using (FileStream fs = File.Open(FileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
    string line;
    int numOfLines = File.ReadAllLines(FileName).Length; // I have 73467
    int eachSubSet = (numOfLines / 27);
    var lines = File.ReadAllLines(FileName).Take(eachSubSet);
    File.WriteAllLines(FileName1, lines);
}
I have 27 different text files, so I want the 73467 IDs divided equally and copied over to the 27 files. So the 1st file will have ID#1 to ID#2721,
the 2nd file will have ID#2722 to ID#(2722+2721), and so on. I do not know how to automate this and make it run quickly.
Thanks
HR
The simplest way would be to call ReadLine and WriteLine inside a loop and decide which file receives each line.
I wouldn't recommend parallelizing this routine since it's an IO-bound operation, but a plain copy of the lines should be pretty fast.
Note that in your sample code you called File.ReadAllLines twice, so you actually parse your entire input file twice; avoiding that should speed up the process. Also, you didn't actually split the file: you only wrote the first of the 27 output files.
Untested, but something along these lines should work:
const int numOfFiles = 27;
string[] lines = File.ReadAllLines(FileName);
int numOfLines = lines.Length;
int eachSubSet = numOfLines / numOfFiles;
// The first file absorbs the remainder, so every line is written exactly once
int firstSubset = numOfLines % numOfFiles + eachSubSet;
IEnumerable<string> linesLeftToWrite = lines;
for (int index = 0; index < numOfFiles; index++)
{
    int numToTake = index == 0 ? firstSubset : eachSubSet;
    File.WriteAllLines(string.Format("{0}_{1}.txt", FileName, index), linesLeftToWrite.Take(numToTake));
    linesLeftToWrite = linesLeftToWrite.Skip(numToTake);
}
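If the input ever grows too large for ReadAllLines, the loop-based approach from the first paragraph can be sketched as a streaming variant (equally untested; File.ReadLines and the Count() call need .NET 4.0 and System.Linq, and FileName is the asker's variable):

const int numOfFiles = 27;
// Lazily count the lines once, without loading the whole file
int numOfLines = File.ReadLines(FileName).Count();
int linesPerFile = (numOfLines + numOfFiles - 1) / numOfFiles; // round up

using (var reader = new StreamReader(FileName))
{
    for (int fileIndex = 0; fileIndex < numOfFiles && !reader.EndOfStream; fileIndex++)
    {
        using (var writer = new StreamWriter(string.Format("{0}_{1}.txt", FileName, fileIndex)))
        {
            string line;
            for (int i = 0; i < linesPerFile && (line = reader.ReadLine()) != null; i++)
                writer.WriteLine(line);
        }
    }
}

This way only one line is ever held in memory, at the cost of reading the file twice (once to count, once to copy).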
I plan on reading the marks from a text file and then calculating the average mark based on data written by previous code. I haven't been able to read the marks or count how many there are, as BinaryReader doesn't let you use .Length.
I have tried using an array to hold each mark, but it doesn't like each mark being an integer.
public static int CalculateAverage()
{
    int count = 0;
    int total = 0;
    float average;
    BinaryReader markFile;
    markFile = new BinaryReader(new FileStream("studentMarks.txt", FileMode.Open));
    //A loop to read each line of the file and add it to the total
    {
        //total = total + eachMark;
        //count++;
    }
    //average = total / count;
    //markFile.Close();
    //Console.WriteLine("Average mark:", average);
    return 0;
}
This is my studentMark.txt file in VS
First of all, don't use BinaryReader; you can use StreamReader, for example.
Also, with a using statement it is not necessary to call Close().
There is already an answer using a while loop, so with LINQ you can do it in one line:
var avg = File.ReadAllLines("file.txt").Average(a => int.Parse(a));
Console.WriteLine("avg = "+avg); //5
Also, when using File.ReadAllLines(), according to the docs the file is loaded into memory and then closed, so there is no memory-leak problem or whatever:
Opens a text file, reads all lines of the file into a string array, and then closes the file.
Edit: adding the way to read using BinaryReader.
First thing to know: you are reading a .txt file. Unless you created the file using BinaryWriter, BinaryReader will not work. And if you are creating a binary file, naming it .txt is not good practice.
So, assuming your file is binary, you need to loop and read every integer, and this code should work:
var fileName = "file.txt";
if (File.Exists(fileName))
{
    int total = 0;
    int count = 0;
    using (BinaryReader reader = new BinaryReader(File.Open(fileName, FileMode.Open)))
    {
        // Read 4-byte integers until the end of the stream
        while (reader.BaseStream.Position < reader.BaseStream.Length)
        {
            total += reader.ReadInt32();
            count++;
        }
    }
    float average = (float)total / count;
    Console.WriteLine("Average = " + average); // 5
}
I've used using to ensure the file is closed at the end.
If your file only contains numbers, you only have to use ReadInt32() and it will work.
Also, if your file is not binary, obviously BinaryReader will not work. By the way, my binary file.txt created using BinaryWriter just looks like gibberish in a text editor (screenshot omitted), so I'm assuming you don't have a binary file...
I'm using StreamWriter to write into a file, but I need the index of the line I'm writing to.
int i;
using (StreamWriter s = new StreamWriter("myfilename", true))
{
    i = s.Index(); //or something that works.
    s.WriteLine("text");
}
My only idea is to read the whole file and count the lines. Any better solution?
The definition of a line
A line in a file is delimited by the \n character. Typically (and on Windows especially) it can be preceded by the carriage return \r character too, but that is not required and is not typically present on Linux or Mac.
Correct Solution
So what you are asking for, the line index at the current position, basically means the number of \n characters present before the current position in the file you are writing to. Since you are appending, that position is the end of the file, so you can think of it as how many lines are in the file.
You can read the stream and count these with consideration for your machine's RAM, without reading the entire file into memory, so this is safe to use on very large files.
// File to read/write
var filePath = @"C:\Users\luke\Desktop\test.txt";

// Write a file with 3 lines
File.WriteAllLines(filePath,
    new[] {
        "line 1",
        "line 2",
        "line 3",
    });

// Get newline character
char newLine = '\n';

// Create read buffer
var buffer = new char[1024];

// Keep track of amount of data read
var read = 0;

// Keep track of the number of lines
var numberOfLines = 0;

// Read the file
using (var streamReader = new StreamReader(filePath))
{
    do
    {
        // Read the next chunk
        read = streamReader.ReadBlock(buffer, 0, buffer.Length);

        // If no data read...
        if (read == 0)
            // We are done
            break;

        // We read some data, so go through each character...
        for (var i = 0; i < read; i++)
            // If the character is \n
            if (buffer[i] == newLine)
                // We found a line
                numberOfLines++;
    }
    while (read > 0);
}
The lazy solution
If your files are not that large (large being dependent on your intended machine/device RAM and the program as a whole) and you want to just read the entire file into memory (so into your program's RAM), you can do a one-liner:
var numberOfLines = File.ReadAllLines(filePath).Length;
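Either way, once you have numberOfLines, the line you are about to append has exactly that value as its zero-based index. Wired into the question's own snippet, it could look like this (a sketch; the Index() method does not exist, so the counted value replaces it):

int i;
using (StreamWriter s = new StreamWriter("myfilename", true))
{
    i = numberOfLines;  // index of the line we are about to write
    s.WriteLine("text");
    numberOfLines++;    // keep the count current for subsequent writes
}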
I would like to consecutively read from a text file that is generated by my program. The problem is that after parsing the file for the first time, my program reads the last line of the file before it can begin re-parsing, which causes it to accumulate unwanted data.
Three screenshots (omitted): the first shows creating the tournament and the points, the second shows the text file, and the third shows that TeamA got 3 more points.
StreamReader rd = new StreamReader("Torneios.txt");
torneios = 0;
while (!rd.EndOfStream)
{
    string line = rd.ReadLine();
    if (line == "Tournament")
    {
        torneios++;
    }
    else
    {
        string[] arr = line.Split('-');
        equipaAA = arr[0];
        equipaBB = arr[1];
        res = Convert.ToChar(arr[2]);
    }
}
rd.Close();
That is what I'm using at the moment.
To avoid mistakes like these, I highly recommend using File.ReadAllText or File.ReadAllLines, unless you are working with large files (in which case they are not good choices). Here is an example of such an implementation:
string result = File.ReadAllText("textfilename.txt");
Regarding your particular code, an example using File.ReadAllLines which achieves this is:
string[] lines = File.ReadAllLines("textfilename.txt");
for (int i = 0; i < lines.Length; i++)
{
    string line = lines[i];
    //Do whatever you want here
}
Just to make it clear, this is not a good idea if the files you intend to read from are large (big binary files, for example), since the entire file is loaded into memory at once.
I have a requirement to test some load issues with regards to file size. I have a windows application written in C# which will automatically generate the files. I know the size of each file, ex. 100KB, and how many files to generate. What I need help with is how to generate a string less than or equal to the required file size.
pseudo code:
long fileSizeInKB = 1024 * 100; //100KB
int numberOfFiles = 5;
for (var i = 0; i < numberOfFiles; i++) {
    var dataSize = fileSizeInKB;
    var buffer = new byte[dataSize];
    using (var fs = new FileStream(fileName, FileMode.Create, FileAccess.Write)) {
        //write buffer to fs
    }
}
You can always use the constructor for string which takes a char and the number of times you want that character repeated:
string myString = new string('*', 5000);
This gives you a string of 5000 stars - tweak to your needs.
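To turn that string into a file of a known byte size, one option (my own sketch, not from the answer; the path is hypothetical and System.Text must be imported for Encoding) is to write it with a single-byte encoding so the byte count equals the character count:

// With ASCII (1 byte per char), 102400 chars produce a file of exactly 100 KB.
string content = new string('*', 100 * 1024);
File.WriteAllText(@"C:\temp\junk_0.txt", content, Encoding.ASCII);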
The easiest way would be the following code:
var content = new string('A', (int)fileSizeInKB);
Now you've got a string with as many A's as required.
To fill it with Lorem Ipsum or some other repeating string, build something like the following:
string contentString = "Lorem Ipsum...";
using (var writer = new StreamWriter(fileName))
{
    for (int i = 0; i < fileSizeInKB / contentString.Length; i++)
        writer.Write(contentString);
    if (fileSizeInKB % contentString.Length > 0)
        // write the remaining substring of contentString to reach the exact size
        writer.Write(contentString.Substring(0, (int)(fileSizeInKB % contentString.Length)));
}
Edit: If you're saving in Unicode, you may need to halve the character count, because Unicode uses two bytes per character, if I remember correctly.
There are so many variations on how you can do this. One would be to fill the file with a bunch of chars. You need 100KB? No problem: 100 * 1024 * 8 = 819,200 bits. A single char is 16 bits. 819,200 / 16 = 51,200. You need to stick 51,200 chars into a file. But consider that a file may have additional header/meta data, so you may need to account for that and decrease the number of chars to write.
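A quick sanity check of that arithmetic (my own sketch, not from the answer; the path is hypothetical): write 51,200 chars with a two-byte UTF-16 encoding and compare the file length. Note that Encoding.Unicode normally also prepends a 2-byte byte-order mark, which is exactly the kind of header overhead mentioned above, so the sketch suppresses it:

// 51,200 chars at 2 bytes each = 102,400 bytes = 100 KB (BOM disabled).
string payload = new string('x', 51200);
string path = @"C:\temp\size_test.txt";
File.WriteAllText(path, payload, new UnicodeEncoding(false, false));
Console.WriteLine(new FileInfo(path).Length); // 102400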
As a partial answer to your question I recently created a portable WPF app that easily creates 'junk' files of almost any size: https://github.com/webmooch/FileCreator
I have a 1 GB text file which I need to read line by line. What is the best and fastest way to do this?
private void ReadTxtFile()
{
    string filePath = string.Empty;
    filePath = openFileDialog1.FileName;
    if (!string.IsNullOrEmpty(filePath))
    {
        using (StreamReader sr = new StreamReader(filePath))
        {
            String line;
            while ((line = sr.ReadLine()) != null)
            {
                FormatData(line);
            }
        }
    }
}
In FormatData() I check whether the starting word of the line matches a given word and, based on that, increment an integer variable.
void FormatData(string line)
{
    if (line.StartsWith(word))
    {
        globalIntVariable++;
    }
}
If you are using .NET 4.0, try MemoryMappedFile, which is a class designed for this scenario.
Otherwise you can use StreamReader.ReadLine.
Using StreamReader is probably the way to go, since you don't want the whole file in memory at once. MemoryMappedFile is more for random access than sequential reading (sequential stream reading is around ten times as fast, while memory mapping is around ten times as fast for random access).
You might also try creating your StreamReader from a FileStream with FileOptions set to SequentialScan (see the FileOptions enumeration), but I doubt it will make much of a difference.
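That hint would look something like this (a sketch; FormatData is the asker's method, and the buffer size of 4096 is an arbitrary choice):

// Open with a hint that the file will be read front to back.
using (var fs = new FileStream(filePath, FileMode.Open, FileAccess.Read,
                               FileShare.Read, 4096, FileOptions.SequentialScan))
using (var sr = new StreamReader(fs))
{
    string line;
    while ((line = sr.ReadLine()) != null)
        FormatData(line);
}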
There are, however, ways to make your example more efficient, since you do your formatting in the same loop as the reading. You're wasting clock cycles, so if you want even more performance it would be better to use a multithreaded asynchronous solution where one thread reads data and another formats it as it becomes available. Check out BlockingCollection, which might fit your needs:
Blocking Collection and the Producer-Consumer Problem
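A minimal sketch of that producer-consumer split (my own illustration, assuming the asker's FormatData and filePath; it needs System.Collections.Concurrent and System.Threading.Tasks, and the bounded capacity of 1000 is arbitrary):

var lines = new BlockingCollection<string>(boundedCapacity: 1000);

// Producer: reads lines and hands them off.
var reader = Task.Factory.StartNew(() =>
{
    foreach (var line in File.ReadLines(filePath))
        lines.Add(line);
    lines.CompleteAdding(); // tell the consumer no more lines are coming
});

// Consumer: formats lines as they become available.
var formatter = Task.Factory.StartNew(() =>
{
    foreach (var line in lines.GetConsumingEnumerable())
        FormatData(line);
});

Task.WaitAll(reader, formatter);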
If you want the fastest possible performance, in my experience the only way is to read in as large a chunk of binary data as possible sequentially and deserialize it into text in parallel, but the code starts to get complicated at that point.
You can use LINQ:
int result = File.ReadLines(filePath).Count(line => line.StartsWith(word));
File.ReadLines returns an IEnumerable<String> that lazily reads each line from the file without loading the whole file into memory.
Enumerable.Count counts the lines that start with the word.
If you are calling this from a UI thread, use a BackgroundWorker.
Probably best to read it line by line.
You should not try to force it into memory by reading it all the way to the end and then processing it.
StreamReader.ReadLine should work fine. Let the framework choose the buffering, unless you know by profiling that you can do better.
TextReader.ReadLine()
I was facing the same problem on our production server at Agenty, where we see large files (sometimes 10-25 GB tab-delimited (\t) txt files). After lots of testing and research, I found the best way to read large files in small chunks is a for/foreach loop with offset and limit logic using File.ReadLines().
int TotalRows = File.ReadLines(Path).Count(); // Count the number of rows in file with lazy load
int Limit = 100000; // 100000 rows per batch
for (int Offset = 0; Offset < TotalRows; Offset += Limit)
{
    var table = Path.FileToTable(heading: true, delimiter: '\t', offset: Offset, limit: Limit);

    // Do all your processing here with limit and offset, and save to drive in append mode.
    // Append mode writes the output of each processed batch to the same file.
    table.TableToFile(@"C:\output.txt");
}
See the complete code in my GitHub library: https://github.com/Agenty/FileReader/
Full disclosure: I work for Agenty, the company that owns this library and website.
My file is over 13 GB. You can use my class:
public static void Read(int length)
{
    StringBuilder resultAsString = new StringBuilder();

    using (MemoryMappedFile memoryMappedFile = MemoryMappedFile.CreateFromFile(@"D:\_Profession\Projects\Parto\HotelDataManagement\_Document\Expedia_Rapid.jsonl\Expedia_Rapi.json"))
    using (MemoryMappedViewStream memoryMappedViewStream = memoryMappedFile.CreateViewStream(0, length))
    {
        for (int i = 0; i < length; i++)
        {
            // Reads a byte from the stream and advances the position by one byte,
            // or returns -1 at the end of the stream.
            int result = memoryMappedViewStream.ReadByte();
            if (result == -1)
            {
                break;
            }
            char letter = (char)result;
            resultAsString.Append(letter);
        }
    }
}
This code will read the text of the file from the start up to the length that you pass to the method Read(int length) and fill the resultAsString variable. (Sample output screenshot omitted.)
I'd read the file 10,000 bytes at a time. Then I'd analyse those 10,000 bytes, chop them into lines, and feed them to the FormatData function.
Bonus points for splitting the reading and line analysis across multiple threads.
I'd definitely use a StringBuilder to collect all strings, and might build a string buffer to keep about 100 strings in memory at all times.
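A rough sketch of that chunked approach (my own illustration, single-threaded for brevity and assuming single-byte ASCII text; a StringBuilder carries the partial line at the end of one chunk into the next, and FormatData and filePath are the asker's):

var sb = new StringBuilder();   // holds the partial line across chunk boundaries
var buffer = new byte[10000];

using (var fs = new FileStream(filePath, FileMode.Open, FileAccess.Read))
{
    int read;
    while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
    {
        for (int i = 0; i < read; i++)
        {
            if (buffer[i] == '\n')
            {
                FormatData(sb.ToString().TrimEnd('\r')); // complete line found
                sb.Clear();
            }
            else
            {
                sb.Append((char)buffer[i]); // byte-to-char is safe for ASCII
            }
        }
    }
    if (sb.Length > 0)
        FormatData(sb.ToString()); // last line without a trailing newline
}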