I'm writing a program which splits a CSV file into four almost-equal parts.
I'm using a 2000-line CSV input file as an example. When I review the output files, there are lines missing from the first file, and there are also incomplete lines, which makes no sense since I'm writing line by line. Here is the code:
using System.IO;
using System;
class MainClass {
public static void Main(string[] args){
string line;
int linesNumber = 0, linesEach = 0, cont = 0;
StreamReader r = new StreamReader("in.csv");
StreamWriter w1 = new StreamWriter("out-1.csv");
StreamWriter w2 = new StreamWriter("out-2.csv");
StreamWriter w3 = new StreamWriter("out-3.csv");
StreamWriter w4 = new StreamWriter("out-4.csv");
while((line = r.ReadLine()) != null)
++linesNumber;
linesEach = linesNumber / 4;
r.DiscardBufferedData();
r.BaseStream.Seek(0, SeekOrigin.Begin);
r.BaseStream.Position = 0;
while((line = r.ReadLine()) != null){
++cont;
if(cont == 1){
//first line must be skipped
continue;
}
if(cont < linesEach){
Console.WriteLine(line);
w1.WriteLine(line);
}
else if(cont < (linesEach*2)){
w2.WriteLine(line);
}
else if(cont < (linesEach*3)){
w3.WriteLine(line);
}
else{
w4.WriteLine(line);
}
}
}
}
What is the writing part doing wrong? How can I fix it?
Thank you all for your help.
You could simplify your approach by using a Partitioner and some LINQ. It also has the benefit of only having two file handles open at once, instead of 1 for each output file plus the original input file.
using System.Collections.Concurrent;
using System.IO;
using System.Linq;
namespace FileSplitter
{
internal static class Program
{
internal static void Main(string[] args)
{
var input = File.ReadLines("in.csv").Skip(1);
var partitioner = Partitioner.Create(input);
var partitions = partitioner.GetPartitions(4);
for (int i = 0; i < partitions.Count; i++)
{
var enumerator = partitions[i];
using (var stream = File.OpenWrite($"out-{i + 1}.csv"))
{
using (var writer = new StreamWriter(stream))
{
while (enumerator.MoveNext())
{
writer.WriteLine(enumerator.Current);
}
}
}
}
}
}
}
This is not a direct answer to your question, just an alternative.
LINQ can be used to write shorter code:
int inx = 0;
var fInfo = new FileInfo(filename);
var lines = File.ReadAllLines(fInfo.FullName);
foreach (var groups in lines.GroupBy(x => inx++ / (lines.Length / 4)))
{
var newFileName = $"{fInfo.DirectoryName}\\{fInfo.Name}_{groups.Key}{fInfo.Extension}";
File.WriteAllLines(newFileName, groups);
}
Thank you all for your answers.
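The problem, as Jegan and spender suggested, was that each StreamWriter needs to be wrapped in a using statement. Without it the writers are never flushed or disposed, so whatever is still sitting in their buffers never reaches disk, which explains the missing and incomplete lines. That said, problem solved.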
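For anyone landing here later, this is a minimal sketch of that fix, not the original poster's final code. It assumes the whole input fits in memory, and the names CsvSplitter, Partition and SplitFile are made up for illustration. Each StreamWriter lives inside a using block, so it is flushed and closed before the next file is opened.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of the original post.
public static class CsvSplitter
{
    // Splits rows into `parts` consecutive chunks whose sizes differ by at most one.
    public static List<List<string>> Partition(IReadOnlyList<string> rows, int parts)
    {
        if (parts <= 0) throw new ArgumentOutOfRangeException(nameof(parts));

        var result = new List<List<string>>();
        int baseSize = rows.Count / parts;
        int remainder = rows.Count % parts;
        int start = 0;
        for (int i = 0; i < parts; i++)
        {
            // The first `remainder` chunks get one extra row.
            int size = baseSize + (i < remainder ? 1 : 0);
            result.Add(rows.Skip(start).Take(size).ToList());
            start += size;
        }
        return result;
    }

    // Reads inputPath, drops the header line (as in the question),
    // and writes out-1.csv, out-2.csv, ... into the current directory.
    public static void SplitFile(string inputPath, int parts)
    {
        var rows = File.ReadLines(inputPath).Skip(1).ToList();
        var chunks = Partition(rows, parts);
        for (int i = 0; i < chunks.Count; i++)
        {
            // Leaving the using block flushes and closes the writer.
            using (var writer = new StreamWriter($"out-{i + 1}.csv"))
            {
                foreach (var row in chunks[i])
                {
                    writer.WriteLine(row);
                }
            }
        }
    }
}
```

Calling CsvSplitter.SplitFile("in.csv", 4) from Main would produce the four output files. Only one output file is open at a time, at the cost of holding all rows in a list.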
My issue is:
I have a file of about 100 MB that I tried to read line by line and then process. The performance was not very good, so I now want to read it into memory at once using ReadAllLines() and then split it into reports, each of which is marked by a line containing T followed by 10 digits. Can someone help me write the right regular expression to split on? I am thinking of something like:
@"(\n|\r|\r\n)[T](?<!\d)\d{10}(?!\d)",
Is this correct? Thanks in advance!
A split pattern for your case could look like this:
(?=\DT\d{10}\D)
Code Sample:
using System;
using System.Text.RegularExpressions;
class Test
{
static void Main(string[] args)
{
String sourcestring = @"sdfso dadfjlsdfjksdjfkjsd
sdfso dadfjlsdfjksdjfkjsd
T1234567898dssdkfjskfjksdj
T1234567890dssdkfjskfjksdj
sdfso dadfjlsdfjksdjfkjsd
T1234567891dssdkfjskfjksdj";
String matchpattern = @"(?=\DT\d{10}\D)";
Regex re = new Regex(matchpattern);
String[] splitarray = re.Split(sourcestring);
for(int sIdx = 0; sIdx < splitarray.Length; sIdx++ ) {
Console.WriteLine("[{0}] = {1}", sIdx, splitarray[sIdx].Trim());
}
}
}
Depending on your context, you are probably still better off reading a large file line by line and collecting the individual reports/blocks in a List or the like, as suggested by Wiktor. You could also process the reports/blocks further in parallel. I suggest making use of the StreamReader and StringBuilder classes.
Sample implementation
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;
class Program
{
static void Main(string[] args)
{
string pattern = @"^T\d{10}\D";
var re = new Regex(pattern);
var result = new List<string>();
var block = new StringBuilder();
var fileStream = new FileStream(@"c:\file.txt", FileMode.Open, FileAccess.Read);
using (var streamReader = new StreamReader(fileStream, Encoding.UTF8))
{
string line;
while ((line = streamReader.ReadLine()) != null)
{
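// a header line starts a new block; skip storing the empty block before the first header
if (re.IsMatch(line) && block.Length > 0)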
{
//store current block or hand it off to different process, etc.
result.Add(block.ToString());
block.Clear();
}
block.AppendLine(line);
}
// final block
result.Add(block.ToString());
}
}
}
I'm relatively new to C# and I am trying to write a program that finds the mean of every xth value in a file using StreamReader (for example, the mean of every fifth value in that file).
I have written some code that reads the file and splits each line on commas, and this works fine when I try to read each specific value.
However, I'm struggling to think of a way to pick out every xth value, such as every 4th one, and then find the mean of these and output it in the same program.
static void Main(string[] args)
{
using (var reader = new StreamReader(@"file"))
{
List<string> list = new List<string>();
while (!reader.EndOfStream)
{
var line = reader.ReadLine();
var values = line.Split(',');
list.Add(values[0]);
}
}
}
Any suggestions or help would be greatly appreciated
Try it like this:
static void Main()
{
using (var reader = new StreamReader(@"file"))
{
int lineNumber = 4;
bool streamEnded = false;
List<string> list = new List<string>();
while (!streamEnded)
{
var line = ReadSpecificLine(reader, lineNumber,out streamEnded);
if (string.IsNullOrEmpty(line))
{
continue;
}
var values = line.Split(',');
list.Add(values[0]);
}
}
}
public static string ReadSpecificLine(StreamReader sr, int lineNumber,out bool streamEnded)
{
streamEnded = false;
for (int i = 1; i < lineNumber; i++)
{
if (sr.EndOfStream)
{
streamEnded = true;
return "";
}
sr.ReadLine();
}
if (sr.EndOfStream)
{
streamEnded = true;
return "";
}
return sr.ReadLine();
}
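The code above collects every 4th line but stops short of the mean. A minimal sketch of that last step follows. It rests on several assumptions: the value of interest is the first comma-separated field of each line, "every nth" counts lines starting at 1, numbers use the invariant culture, and non-numeric fields are skipped. The names NthValueMean and MeanOfEveryNth are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper, not part of the original answer.
public static class NthValueMean
{
    // Returns the mean of the first comma-separated field on every nth line
    // (lines n, 2n, 3n, ...), or NaN when no such value could be parsed.
    public static double MeanOfEveryNth(IEnumerable<string> lines, int n)
    {
        if (n <= 0) throw new ArgumentOutOfRangeException(nameof(n));

        double sum = 0;
        int count = 0;
        int lineIndex = 0;
        foreach (var line in lines)
        {
            lineIndex++;
            if (lineIndex % n != 0)
            {
                continue; // not an nth line
            }

            var firstField = line.Split(',')[0];
            double value;
            if (double.TryParse(firstField, NumberStyles.Float, CultureInfo.InvariantCulture, out value))
            {
                sum += value;
                count++;
            }
        }
        return count == 0 ? double.NaN : sum / count;
    }
}
```

It can be fed straight from the file, for example NthValueMean.MeanOfEveryNth(File.ReadLines("file"), 4), which streams the lines instead of loading them all first.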
Hello, I am having a strange error using pipes to communicate between two processes. In short, everything in the program works fine except that the client side never closes the stream, meaning the server's StreamReader.ReadLine never returns null, so the server process never terminates. I'm convinced this is a simple issue, but I am struggling to find an answer. Here is some relevant code:
Server Side:
using (StreamReader sr = new StreamReader(clientServer))
{
// Display the read text to the console
string temp;
int count = 0;
while ((temp = sr.ReadLine()) != null)
{
if (count == 0)
{
Console.WriteLine("==========Parent Process found text:like==========");
}
Console.WriteLine(temp);
count++;
}
Console.WriteLine("out of while loop");
}
Client Project:
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Pipes;
class PipeClient
{
static void Main(string[] args)
{
try
{
if (args.Length < 3)
{
Console.WriteLine("Invalid number of commandline arguments");
}
else
{
List<string> inputList = new List<string>();
List<string> foundMatchList = new List<string>();
using (PipeStream pipeClientIn =
new AnonymousPipeClientStream(PipeDirection.In, args[0]))
{
using (StreamReader sr = new StreamReader(pipeClientIn))
{
// Display the read text to the console
string temp;
int count = 0;
while ((temp = sr.ReadLine()) != null)
{
if (count == 0)
{
Console.WriteLine("==========Client Process Read Text:==========");
}
Console.WriteLine(temp);
inputList.Add(temp);
count++;
}
foreach (var curtString in inputList)
{
if (curtString.Contains(args[2]))
{
foundMatchList.Add(curtString);
}
}
}
//Console.WriteLine("released sr");
}
// Console.WriteLine("released pipeClientIn");
using (PipeStream pipeClientOut =
new AnonymousPipeClientStream(PipeDirection.Out, args[1]))
{
using (StreamWriter sw = new StreamWriter(pipeClientOut))
{
sw.AutoFlush = true;
foreach (var match in foundMatchList)
{
sw.WriteLine(match);
}
}
}
//Console.WriteLine("released pipeClientOut");
}
}
catch (Exception e)
{
/* if (args.Length == 0)
Console.WriteLine("no arguments");
foreach(String s in args)
{
Console.Write("{0} ", s);
}*/
Console.WriteLine(e.Message);
}
}
}
I've tested and can confirm that the client process terminates.
I attempted to manually flush and close the Client StreamWriter but this did not work.
My overall question is: why am I never seeing the "out of while loop" message? And how can I fix my client so that it ends the stream?
Did you call clientServer.DisposeLocalCopyOfClientHandle()?
From MSDN:
The DisposeLocalCopyOfClientHandle method should be called after the
client handle has been passed to the client. If this method is not
called, the AnonymousPipeServerStream object will not receive notice
when the client disposes of its PipeStream object.
Hope this helps.
I'm trying to work out a way of removing records in a program I'm writing. I have a text file with all the customer data spread over a set of lines, and I read these lines in one at a time and store them in a List.
When writing, I simply append to the file. For deleting, however, I had the idea of adding a character such as * or # to the front of lines that are no longer needed, but I am unsure how to do this.
Below is how I currently read the data in:
Thanks in advance.
StreamReader dataIn = null;
CustomerClass holdcus; //holdcus and holdacc are used as "holding pens" for the next customer/account
Accounts holdacc;
bool moreData = false;
string[] cusdata = new string[13]; //holds customer data
string[] accdata = new string[8]; //holds account data
if (fileIntegCheck(inputDataFile, ref dataIn))
{
moreData = getCustomer(dataIn, cusdata);
while (moreData == true)
{
holdcus = new CustomerClass(cusdata[0], cusdata[1], cusdata[2], cusdata[3], cusdata[4], cusdata[5], cusdata[6], cusdata[7], cusdata[8], cusdata[9], cusdata[10], cusdata[11], cusdata[12]);
customers.Add(holdcus);
int x = Convert.ToInt32(cusdata[12]);
for (int i = 0; i < x; i++) //Takes the ID number for the last customer, as uses it to set the first value of the following accounts
{ //this is done as a key to which accounts map to which customers
moreData = getAccount(dataIn, accdata);
accdata[0] = cusdata[0];
holdacc = new Accounts(accdata[0], accdata[1], accdata[2], accdata[3], accdata[4], accdata[5], accdata[6], accdata[7]);
accounts.Add(holdacc);
}
moreData = getCustomer(dataIn, cusdata);
}
}
if (dataIn != null) dataIn.Close(); // check the reader, not the bool flag, before closing
Since you're using string arrays, you can just do cusdata[index] = "#"+cusdata[index] to prepend the marker to the line. However, if your question is how to delete it from the file, why not skip that step and simply leave out the line you want deleted when rewriting the file?
Here is a small read/write sample that should suit your needs. If it doesn't, let me know in the comments.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
class Program
{
static readonly string filePath = "c:\\test.txt";
static void Main(string[] args)
{
// Read your file
List<string> lines = ReadLines();
//Create your remove logic here ..
lines = lines.Where(x => !x.Contains("Julia Roberts")).ToList();
// Rewrite the file
WriteLines(lines);
}
static List<string> ReadLines()
{
List<string> lines = new List<string>();
using (StreamReader sr = new StreamReader(new FileStream(filePath, FileMode.Open)))
{
while (!sr.EndOfStream)
{
string buffer = sr.ReadLine();
lines.Add(buffer);
// Just to show you the results
Console.WriteLine(buffer);
}
}
return lines;
}
static void WriteLines(List<string> lines)
{
using (StreamWriter sw = new StreamWriter(new FileStream(filePath, FileMode.Create)))
{
foreach (var line in lines)
{
sw.WriteLine(line);
}
}
}
}
I used the following "data sample" for this
Matt Damon 100 222
Julia Roberts 125 152
Robert Downey Jr. 150 402
Tom Hanks 55 932
My file, named test.txt, contains:
This document is divided into about 5 logical sections starting with a feature and structure overview, followed by an overview of built in column and cell types. Next is an overview of working with data, followed by an overview of specific major features. Lastly, a “best practice” section concludes the main part of this document.
Now I want to delete the 2nd line of the file.
How can I do this in C#?
Thanks in advance.
Naveenkumar
int lineNum = 1; // zero-based index of the line to remove (1 = the 2nd line)
List<string> lines = File.ReadAllLines(@"filename.txt").ToList();
if(lines.Count>lineNum){
lines.RemoveAt(lineNum);
}
File.WriteAllLines(@"filename.txt",lines.ToArray());
You can achieve this by splitting the text on \n, using LINQ to select the lines you want to keep, and then re-joining them.
var lineNum=5;
var lines=File
.ReadAllText(@"src.txt")
.Split('\n');
var outTxt=String
.Join(
"\n",
lines
.Take(lineNum)
.Concat(lines.Skip(lineNum+1))
.ToArray()
);
Here's a pretty efficient way to do it.
FileInfo x = new FileInfo(@"path\to\original");
string xpath = x.FullName;
FileInfo y = new FileInfo(@"path\to\temporary\new\file");
using (var reader = x.OpenText())
using (var writer = y.AppendText())
{
// write 1st line
writer.WriteLine(reader.ReadLine());
reader.ReadLine(); // skip 2nd line
// write all remaining lines
while (!reader.EndOfStream)
{
writer.WriteLine(reader.ReadLine());
}
}
x.Delete();
y.MoveTo(xpath);
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
namespace rem2ndline
{
class Program
{
static void Main(string[] args)
{
string inPath = #"c:\rem2ndline.txt";
string outPath = #"c:\rem2ndlineresult.txt";
StringBuilder builder = new StringBuilder();
using (FileStream fso = new FileStream(inPath, FileMode.Open))
{
using (StreamReader rdr = new StreamReader(fso))
{
int lineCount = 0;
bool canRead = true;
while (canRead)
{
var line = rdr.ReadLine();
lineCount++;
if (line == null)
{
canRead = false;
}
else
{
if (lineCount != 2)
{
builder.AppendLine(line);
}
}
}
}
}
// FileMode.Create truncates an existing result file, so no stale content is left behind
using(FileStream fso2 = new FileStream(outPath, FileMode.Create))
{
using (StreamWriter strw = new StreamWriter(fso2))
{
strw.Write(builder.ToString());
}
}
}
}
}
Here's what I'd do. The advantage is that you don't have to have the file in memory all at once, so memory requirements should be similar for files of varying sizes (as long as the lines contained in each of the files are of similar length). The drawback is that you can't pipe back to the same file - you have to mess around with a Delete and a Move afterwards.
The extension methods may be overkill for your simple example, but those are two extension methods I come to rely on again and again, as well as the ReadFile method, so I'd typically only have to write the code in Main().
using System;
using System.Collections.Generic;
using System.IO;
class Program
{
static void Main()
{
var file = #"C:\myFile.txt";
var tempFile = Path.ChangeExtension(file, "tmp");
using (var writer = new StreamWriter(tempFile))
{
ReadFile(file)
.FilterI((i, line) => i != 1)
.ForEach(l => writer.WriteLine(l));
}
File.Delete(file);
File.Move(tempFile, file);
}
static IEnumerable<String> ReadFile(String file)
{
using (var reader = new StreamReader(file))
{
while (!reader.EndOfStream)
{
yield return reader.ReadLine();
}
}
}
}
static class IEnumerableExtensions
{
public static IEnumerable<T> FilterI<T>(
this IEnumerable<T> seq,
Func<Int32, T, Boolean> filter)
{
var index = 0;
foreach (var item in seq)
{
if (filter(index, item))
{
yield return item;
}
index++;
}
}
public static void ForEach<T>(
this IEnumerable<T> seq,
Action<T> action)
{
foreach (var item in seq)
{
action(item);
}
}
}