splitting a large text file into smaller text files - c#

I'm trying to split a text file of around 6M lines into smaller files based on the number of lines, and each output file should always end (last line) with a certain identifier.
What I tried:
using (System.IO.StreamReader sr = new System.IO.StreamReader(inputfile))
{
int fileNumber = 0;
string line = "";
while (!sr.EndOfStream)
{
int count = 0;
//identifier = sr.ReadLine().Substring(0,2);
using (System.IO.StreamWriter sw = new System.IO.StreamWriter(inputfile + ++fileNumber + ".TXT"))
{
sw.AutoFlush = true;
while (!sr.EndOfStream && ++count < 1233123)
{
line = sr.ReadLine();
sw.WriteLine(line);
}
//having problems starting here not sure how to implement the other condition == "JK"
line = sr.ReadLine();
if (count > 1233123 && line.Substring(0,2) == "JK")
{
sw.WriteLine(line);
}
else
{
while (!sr.EndOfStream && line.Substring(0,2) != "JK")
{
line = sr.ReadLine();
sw.WriteLine(line);
}
}
}
}
}
sample input text is like:
AAadsadasdasdasdfsdfsdfs
Bbasfafasfasdfdsfsdfsdff
CCsafsdfasdadfasdfasfsaf
DDasdsfsdfsafdsadfsafasf
JKdfgdsgdsfgsdfgsfgdfgdf
AAfsdfsadfsdfsaadfadasda
BBadfasdfasdfdsfasfasdas
CCadasdsfasdfasfasfasfds
DDsdfsdafasdfsdfdsfsdfsd
EEsadfsfsasafasdfsdfsdfs
FFasfasfadsdfdsadssfsdfs
JKadsadasdasdadsadasdasa
AAadasdasdasdasdasdasdas
BBasdadadadasdasdasdasdd
CCadasdasdasdasdasdasdad
JKsafsdfsdfasfasdfdasfsa
Basically, what I'm trying to achieve is multiple text files that each have at least 1233123 lines (i.e. if line 1233123 does not start with "JK", keep writing to the current file until such a line is found).

While copying lines, check whether your condition holds: the minimum line count has been reached and the current line starts with JK. When it does, stop writing to the current file fragment and continue with the next iteration of your outermost loop, which starts writing to the next file.
using (System.IO.StreamWriter sw = new System.IO.StreamWriter(inputfile + ++fileNumber + ".TXT"))
{
sw.AutoFlush = true;
while (!sr.EndOfStream)
{
line = sr.ReadLine();
sw.WriteLine(line);
if (++count >= 1233123 && line.StartsWith("JK")) // >= lets line 1233123 itself end the file; StartsWith is safe on short lines
{
break;
}
}
}
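For reference, here is a minimal self-contained sketch of the whole loop, combining the outer loop from the question with the inner loop above. The class name FileSplitter, the method name Split and its parameters are hypothetical names chosen for illustration. It assumes the last fragment may be shorter than the minimum when the input runs out.

```csharp
using System;
using System.IO;

static class FileSplitter
{
    // Splits inputPath into numbered fragments (inputPath + "1.TXT", "2.TXT", ...).
    // Each fragment gets at least minLines lines and ends on a line that starts
    // with marker, except possibly the last fragment when the input runs out.
    // Returns the number of fragments written.
    public static int Split(string inputPath, int minLines, string marker)
    {
        int fileNumber = 0;
        using (var reader = new StreamReader(inputPath))
        {
            while (!reader.EndOfStream)
            {
                int count = 0;
                using (var writer = new StreamWriter(inputPath + (++fileNumber) + ".TXT"))
                {
                    string line;
                    while ((line = reader.ReadLine()) != null)
                    {
                        writer.WriteLine(line);
                        // Stop this fragment once it is long enough and ends on the marker.
                        if (++count >= minLines && line.StartsWith(marker, StringComparison.Ordinal))
                        {
                            break;
                        }
                    }
                }
            }
        }
        return fileNumber;
    }
}
```

With the values from the question, the call would be FileSplitter.Split(inputfile, 1233123, "JK"). AutoFlush is left off in this sketch because the using block already flushes the writer when it is disposed.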

Related

How find and remove specific line with next or previous lines in large text document

I'm trying to figure out how to remove a specific line from a large text document with 500,000 lines. I need to find the line by its content and also get its index in the document, without disturbing the line order, so that I can remove the next or previous line along with it. Every approach I've tried using File.WriteAllLines makes the program hang on a file of this size. The file is also being actively read by other requests, so it seems I need to find some other way. For example, the file content is:
1. line 1
2. line 2
3. line 3
4. line 4
5. line 5
and line to find and remove is:
string input = "line 3"
to get this result, removing the found line and the line after it (index + 1), if the found line's index is odd:
line 1
line 2
line 5
and at the same time be able to remove the found line and the line before it (index - 1), if the found line's index is even, for the search string:
string input = "line 4"
and result should be:
line 1
line 2
line 5
I also need to know if the line does not exist in the text document.
The result should be written back to the same file.
If you want to process a very large file, then you should use FileStream to avoid loading all of the contents into memory.
To meet your last requirement, you can read the lines two by two. It actually makes your code simpler.
var inputFileName = @"D:\test-input.txt";
var outputFileName = Path.GetTempFileName();
var search = "line 4";
using (var strInp = File.Open(inputFileName, FileMode.Open))
using (var strOtp = File.Open(outputFileName, FileMode.Create))
using (var reader = new StreamReader(strInp))
using (var writer = new StreamWriter(strOtp))
{
while (reader.Peek() >= 0)
{
var lineOdd = reader.ReadLine();
var lineEven = (string)null;
if (reader.Peek() >= 0)
lineEven = reader.ReadLine();
if(lineOdd != search && lineEven != search)
{
writer.WriteLine(lineOdd);
if(lineEven != null)
writer.WriteLine(lineEven);
}
}
}
// at this point, the operation was successful
// rename temp file with original one
File.Delete(inputFileName);
File.Move(outputFileName, inputFileName);
Use a System.IO.StreamReader.
private static void RemoveLines(string lineToRemove, bool skipPrevious, bool skipNext)
{
string previousLine = string.Empty;
string currentLine;
bool isNext = false;
using (StreamWriter sw = File.CreateText(@"output.txt"))
{
using (StreamReader sr = File.OpenText(@"input.txt"))
{
while ((currentLine = sr.ReadLine()) != null)
{
if (isNext)
{
currentLine = string.Empty;
isNext = false;
}
if (currentLine == lineToRemove)
{
if (skipPrevious)
{
previousLine = string.Empty;
}
if (skipNext)
{
currentLine = string.Empty;
isNext = true;
}
}
if (previousLine != string.Empty && previousLine != lineToRemove)
{
sw.WriteLine(previousLine);
}
previousLine = currentLine;
}
}
if (previousLine != string.Empty && previousLine != lineToRemove)
{
sw.WriteLine(previousLine);
}
}
}
I haven't tested it, but it should point you in the right direction.
Assuming the input file is inputFile.txt, you can use File.ReadAllLines() to get all the lines in the file. Then use IndexOf() to find the index of a specific line in that list; it returns -1 if the line is not found. Then use RemoveAt() to remove the line at that index. Note that this loads the whole file into memory. Consider the code:
List<string> linesInFile = File.ReadAllLines(filePath).ToList(); // gives you list of lines
string input = "line 3";
int lineIndex = linesInFile.IndexOf(input);
if (lineIndex != -1)
{
linesInFile.RemoveAt(lineIndex);
}
// If the line may match more than once, you can remove every occurrence instead:
linesInFile.RemoveAll(x => x == input);
If you want to write it back to the file, use this line:
File.WriteAllLines(filePath,linesInFile);

Read file, check correctness of column, write file C#

I need to check certain columns of data to make sure there are no trailing blank spaces. At first I thought it would be very easy, but after attempting it I got stuck.
I know that there should be 6 digits in the column I need to check. If there are fewer, I will reject the line; if there are more, I will trim the blank spaces. After doing that for the entire file, I want to write it back to the file with the same delimiters.
This is my attempt:
Everything seems to be working correctly except for writing the file.
if (File.Exists(filename))
{
using (StreamReader sr = new StreamReader(filename))
{
string lines = sr.ReadLine();
string[] delimit = lines.Split('|');
while (delimit[count] != "COLUMN_DATA_TO_CHANGE")
{
count++;
}
string[] allLines = File.ReadAllLines(filename);
foreach(string nextLine in allLines.Skip(1)){
string[] tempLine = nextLine.Split('|');
if (tempLine[count].Length == 6)
{
checkColumn(tempLine);
writeFile(tempLine);
}
else if (tempLine[count].Length > 6)
{
tempLine[count] = tempLine[count].Trim();
checkColumn(tempLine);
}
else
{
throw new Exception("Not enough numbers");
}
}
}
}
}
public static void checkColumn(string[] str)
{
for (int i = 0; i < str[count].Length; i++)
{
char[] c = str[count].ToCharArray();
if (!Char.IsDigit(c[i]))
{
throw new Exception("A non-digit is contained in data");
}
}
}
public static void writeFile(string[] str)
{
string temp;
using (StreamWriter sw = new StreamWriter(filename+ "_tmp", false))
{
StringBuilder builder = new StringBuilder();
bool firstColumn = true;
foreach (string value in str)
{
if (!firstColumn)
{
builder.Append('|');
}
if (value.IndexOfAny(new char[] { '"', ',' }) != -1)
{
builder.AppendFormat("\"{0}\"", value.Replace("\"", "\"\""));
}
else
{
builder.Append(value);
}
firstColumn = false;
}
temp = builder.ToString();
sw.WriteLine(temp);
}
}
If there is a better way to go about this, I would love to hear it. Thank you for looking at the question.
edit:
file structure-
country| firstname| lastname| uniqueID (column I am checking)| address| etc
USA|John|Doe|123456 |5 main street|
notice the blank space after the 6
var oldLines = File.ReadAllLines(filePath);
var newLines = oldLines.Select(FixLine).ToArray();
File.WriteAllLines(filePath, newLines);
string FixLine(string oldLine)
{
string fixedLine = ....
return fixedLine;
}
The main problem with writing the file is that you're opening the output file for each output line, and you're opening it with append=false, which causes the file to be overwritten every time. A better approach would be to open the output file one time (probably right after validating the input file header).
Another problem is that you're opening the input file a second time with .ReadAllLines(). It would be better to read the existing file one line at a time in a loop.
Consider this modification:
using (StreamWriter sw = new StreamWriter(filename+ "_tmp", false))
{
string nextLine;
while ((nextLine = sr.ReadLine()) != null)
{
string[] tempLine = nextLine.Split('|');
...
writeFile(sw, tempLine);

c# function inside while won't cycle

Hi everybody, I have this code:
StreamReader reader = new StreamReader("C:\\Users\\lorenzov\\Desktop\\gi_pulito_neg.txt");
string line = reader.ReadLine();
string app = "";
int i = 0;
while (line != null)
{
i++;
line = reader.ReadLine();
if (line != null)
{
int lunghezza = line.Length;
}
Console.WriteLine(i);
System.Threading.Thread.Sleep(800);
string ris= traduttore.traduci(targetLanguage, line);
// Console.WriteLine(line);
// Console.WriteLine(ris);
// Console.Read();
// app = app + ris;
// System.Threading.Thread.Sleep(50);
File.AppendAllText(@"C:\Users\lorenzov\Desktop\gi_tradotto_neg.txt", ris + Environment.NewLine);
}
I have a txt file that I want to translate with the function traduci(targetLanguage, line). The function itself works. I want to translate each line and write it to another file, but the loop blocks on the first iteration. If I insert Console.Read(), the function runs when I press Enter. How can I fix this? Thank you all!
Your code is pretty messy. I would suggest the following method to loop over the StreamReader lines:
using (StreamReader reader = new StreamReader("C:\\Users\\lorenzov\\Desktop\\gi_pulito_neg.txt"))
{
string line;
while (!reader.EndOfStream)
{
line = reader.ReadLine();
// ... process the line
}
}
If ReadLine returns null, your code will break. A better structure:
StreamReader reader = new StreamReader("C:\\Users\\lorenzov\\Desktop\\gi_pulito_neg.txt");
string line;
string app = "";
int i = 0;
while ((line = reader.ReadLine()) != null)
{
i++;
int lunghezza = line.Length;
Console.WriteLine(i);
System.Threading.Thread.Sleep(800);
string ris= traduttore.traduci(targetLanguage, line);
// Console.WriteLine(line);
// Console.WriteLine(ris);
// Console.Read();
// app = app + ris;
// System.Threading.Thread.Sleep(50);
File.AppendAllText(@"C:\Users\lorenzov\Desktop\gi_tradotto_neg.txt", ris + Environment.NewLine);
}
The code as it stands will skip over the first line, as you call ReadLine() twice before the first use.
You can restructure the code as
using (StreamReader reader = new StreamReader(@"C:\Users\lorenzov\Desktop\gi_pulito_neg.txt"))
using (StreamWriter writer = new StreamWriter(@"C:\Users\lorenzov\Desktop\gi_tradotto_neg.txt"))
{
string line = reader.ReadLine();
while(line != null)
{
System.Threading.Thread.Sleep(800);
string ris = traduttore.traduci(targetLanguage, line);
writer.WriteLine(ris);
line = reader.ReadLine();
}
}

Writing to temp array from text file

I asked this before but most people don't understand my question.
I have two text files. Gamenam.txt which is the text file I'm reading from, and gamenam_2.txt.
In the gamenam.txt I have strings like this:
01456
02456
05215
05111
01421
03117
05771
01542
04331
05231
I have written code to count the number of times the substring "05" appears in the text file before the substring "01".
My output which is written to gamenam_1.txt is:
01456
02456
05215
05111
2
01421
03117
05771
1
01542
04331
05231
1
This is the code I wrote to achieve this:
string line;
int counter = 0;
Boolean isFirstLine = true;
try
{
StreamReader sr = new StreamReader("C:\\Files\\gamenam.txt");
StreamWriter sw = new StreamWriter("C:\\Files\\gamenam_1.txt");
while ((line = sr.ReadLine()) != null)
{
if (line.Substring(0, 2) == "01")
{
if (!isFirstLine)
{
sw.WriteLine(counter.ToString());
counter = 0;
}
}
if (line.Substring(0, 2) == "05")
{
counter++;
}
sw.WriteLine(line);
if (sr.Peek() < 0)
{
sw.Write(counter.ToString());
}
isFirstLine = false;
}
sr.Close();
sw.Close();
}
catch (Exception e)
{
Console.WriteLine("Exception: " + e.Message);
}
finally
{
Console.WriteLine("Exception finally block.");
}
That code is working perfectly.
Now I have to write code that prints the count of substring "05" before writing the lines.
My output should look something like this:
2
01456
02456
05215
05111
1
01421
03117
05771
1
01542
04331
05231
Apparently I should first write the lines to a temporary string array while counting, then write the count, and then write the lines from the temporary array.
I'm new to development so I'm stuck trying to figure out how I'd achieve this.
Any help will be highly appreciated.
Try this
string line;
int counter = 0;
Boolean isFirstLine = true;
try
{
StreamReader sr = new StreamReader("C:\\Files\\gamenam.txt");
StreamWriter sw = new StreamWriter("C:\\Files\\gamenam_1.txt");
var lines = new List<string>(); //Here goes the temp lines
while ((line = sr.ReadLine()) != null)
{
if (line.Substring(0, 2) == "01")
{
if (!isFirstLine)
{
sw.WriteLine(counter.ToString()); //write the number before the lines
foreach(var l in lines)
sw.WriteLine(l); //actually write the lines
counter = 0;
lines.Clear(); //clear the list for next round
}
}
if (line.Substring(0, 2) == "05")
{
counter++;
}
lines.Add(line); //instead of writing, just adds the line to the temp list
if (sr.Peek() < 0)
{
sw.WriteLine(counter.ToString()); //writes everything left
foreach(var l in lines)
sw.WriteLine(l);
}
isFirstLine = false;
}
sr.Close();
sw.Close();
}
catch (Exception e)
{
Console.WriteLine("Exception: " + e.Message);
}
finally
{
Console.WriteLine("Exception finally block.");
}
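The same buffering idea can be sketched more compactly. The version below is only an illustration: the names GroupCounter, WriteCountsFirst and Flush are hypothetical, and it assumes a group starts at every line beginning with "01". It takes a TextReader and a TextWriter so it works with StreamReader and StreamWriter as well as in-memory strings, and it uses StartsWith so that lines shorter than two characters do not throw.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class GroupCounter
{
    // Copies lines from reader to writer. Before each group (a group starts at
    // a line beginning with "01") it writes how many lines in that group begin
    // with "05".
    public static void WriteCountsFirst(TextReader reader, TextWriter writer)
    {
        var buffer = new List<string>(); // lines of the current group
        int counter = 0;                 // "05" lines seen in the current group
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // A new "01" line closes the previous group, if there is one.
            if (line.StartsWith("01", StringComparison.Ordinal) && buffer.Count > 0)
            {
                Flush(writer, buffer, counter);
                counter = 0;
            }
            if (line.StartsWith("05", StringComparison.Ordinal))
            {
                counter++;
            }
            buffer.Add(line);
        }
        // Write whatever is left as the final group.
        if (buffer.Count > 0)
        {
            Flush(writer, buffer, counter);
        }
    }

    // Writes the count, then the buffered lines, then empties the buffer.
    private static void Flush(TextWriter writer, List<string> buffer, int counter)
    {
        writer.WriteLine(counter);
        foreach (string buffered in buffer)
        {
            writer.WriteLine(buffered);
        }
        buffer.Clear();
    }
}
```

To use it with the files from the question, open a StreamReader on gamenam.txt and a StreamWriter on gamenam_1.txt in using blocks and pass both to WriteCountsFirst. The using blocks also make sure both files are closed if an exception is thrown.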

Deleting some content from the text file in c#

I have a text file called Load.txt which contains approximately 200 lines. I have a checkbox; if it is checked, I want to create a new file that has only the first 100 lines from Load.txt. I am using C# for this program. Actually, my real requirement is to delete lines 110 to 201. My code is below, and for some reason it deletes lines 1 to 92. I don't know what's happening.
String line = null;
String tempFile = Path.GetTempFileName();
String filePath = saveFileDialog1.FileName;
int line_number = 110;
int lines_to_delete = 201;
using (StreamReader reader = new StreamReader(sqlConnectionString))
{
using (StreamWriter writer = new StreamWriter(saveFileDialog1.FileName))
{
while ((line = reader.ReadLine()) != null)
{
line_number++;
if (line_number <= lines_to_delete)
continue;
writer.WriteLine(line);
}
}
}
So I figured out this issue. My next issue is that I am also updating some variables in the text file. That code worked before, but now it conflicts with my delete-lines code: if I delete the lines, I am not able to update those variables.
My Code is:
File.WriteAllLines(saveFileDialog1.FileName, System.IO.File.ReadLines(sqlConnectionString).Take(110));
File.WriteAllText(saveFileDialog1.FileName, fileContents);
File.WriteAllLines("new.txt", File.ReadLines("Load.txt").Take(100));
After update...
var desired = File.ReadLines("Load.txt")
.Take(110) // "And I want to keep 1-110" -- OP
.Select(line => UpdateLine(line)); // "And I also want to update variables between 1-110" -- OP
File.WriteAllLines("new.txt", desired);
...
static string UpdateLine(string given)
{
var updated = given;
// other ops
return updated;
}
MSDN File.WriteAllLines
MSDN File.ReadLines
This should accomplish what you need. It reads the text and writes only the first 100 lines of it.
System.IO.File.WriteAllLines("newLoad.txt", System.IO.File.ReadLines("Load.txt").Take(100));
"I want to create a new file which had only first 100 lines"
Keeping with your original model, here's one way to keep just the first 100 lines:
int LinesToKeep = 100;
using (StreamReader reader = new StreamReader(sqlConnectionString))
{
using (StreamWriter writer = new StreamWriter(saveFileDialog1.FileName))
{
for (int i = 1; (i <= LinesToKeep) && ((line = reader.ReadLine()) != null); i++)
{
writer.WriteLine(line);
}
}
}
"my real requirement is that I have to delete from line 110 to 201"
So copy the file, but skip lines 110 to 201?
int currentLine = 0;
using (StreamReader reader = new StreamReader(sqlConnectionString))
{
using (StreamWriter writer = new StreamWriter(saveFileDialog1.FileName))
{
while ((line = reader.ReadLine()) != null)
{
currentLine++;
if (currentLine < 110 || currentLine > 201)
{
writer.WriteLine(line);
}
}
}
}
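The same range skip can also be written with LINQ, in the style of the File.ReadLines answers above. This is only a sketch: LineRangeFilter and SkipRange are hypothetical names, and line numbers are assumed to be 1-based and inclusive, as in the loop above.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class LineRangeFilter
{
    // Returns all lines except those numbered firstToSkip..lastToSkip
    // (1-based, inclusive). The sequence is evaluated lazily.
    public static IEnumerable<string> SkipRange(IEnumerable<string> lines, int firstToSkip, int lastToSkip)
    {
        // The Where overload with an index passes a 0-based position, hence index + 1.
        return lines.Where((line, index) => index + 1 < firstToSkip || index + 1 > lastToSkip);
    }
}
```

A call such as File.WriteAllLines("new.txt", LineRangeFilter.SkipRange(File.ReadLines("Load.txt"), 110, 201)) would then copy everything except lines 110 to 201. Because File.ReadLines reads lazily, the input and output paths should be different files.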
