My issue is:
I have a file of about 100 MB that I tried to read line by line and then do some processing on. The performance was not very good, so now I want to read it into memory all at once with ReadAllLines() and then split it into reports, each of which starts with a line containing T followed by 10 digits. Can someone help me with the right regular expression to split on? I am thinking of something like:
@"(\n|\r|\r\n)[T](?<!\d)\d{10}(?!\d)"
Is this correct? Thanks in advance!
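A split pattern for your case could look like this (the lookahead is zero-width, so each marker line stays at the start of its report instead of being consumed by the split):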
(?=\DT\d{10}\D)
Code Sample:
```csharp
using System;
using System.Text.RegularExpressions;

class Test
{
    static void Main(string[] args)
    {
        String sourcestring = @"sdfso dadfjlsdfjksdjfkjsd
sdfso dadfjlsdfjksdjfkjsd
T1234567898dssdkfjskfjksdj
T1234567890dssdkfjskfjksdj
sdfso dadfjlsdfjksdjfkjsd
T1234567891dssdkfjskfjksdj";
        String matchpattern = @"(?=\DT\d{10}\D)";
        Regex re = new Regex(matchpattern);
        String[] splitarray = re.Split(sourcestring);
        for (int sIdx = 0; sIdx < splitarray.Length; sIdx++)
        {
            // Trim() removes the line-break characters the split leaves around each piece
            Console.WriteLine("[{0}] = {1}", sIdx, splitarray[sIdx].Trim());
        }
    }
}
```
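The \D before the T in that pattern accepts any non-digit character, so a marker in the middle of a line also triggers a split, and the trailing \D needs one more character after the ten digits. If reports always begin at the start of a line, a variant anchored with ^ under RegexOptions.Multiline may be closer to the intent. This is a sketch, not part of the original answer; ReportSplitter is a hypothetical name.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

static class ReportSplitter
{
    // Zero-width split point in front of every line that starts with T + exactly 10 digits.
    // With RegexOptions.Multiline, ^ matches at the start of the input and after every '\n'.
    static readonly Regex MarkerAtLineStart =
        new Regex(@"^(?=T\d{10}(?!\d))", RegexOptions.Multiline);

    public static string[] Split(string text) =>
        MarkerAtLineStart.Split(text)
            // Trim removes the leftover '\r' / '\n' around each report
            .Select(report => report.Trim())
            // Drop the empty piece produced when the text itself starts with a marker
            .Where(report => report.Length > 0)
            .ToArray();
}
```

Any text before the first marker comes back as its own element, and a line such as T12345678901 (eleven digits) is not treated as a marker because of the (?!\d) lookahead.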
Depending on your context, you are probably still better off reading a large file line by line and collecting the individual reports/blocks in a List or similar, as suggested by Wiktor. You could also process the reports/blocks further in parallel. I suggest using the StreamReader and StringBuilder classes.
Sample implementation
```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        // A report starts with a line beginning with T followed by exactly 10 digits
        string pattern = @"^T\d{10}(?!\d)";
        var re = new Regex(pattern);
        var result = new List<string>();
        var block = new StringBuilder();
        var fileStream = new FileStream(@"c:\file.txt", FileMode.Open, FileAccess.Read);
        // Disposing the StreamReader also closes the underlying FileStream
        using (var streamReader = new StreamReader(fileStream, Encoding.UTF8))
        {
            string line;
            while ((line = streamReader.ReadLine()) != null)
            {
                // block.Length check avoids storing an empty block when the file starts with a marker
                if (re.IsMatch(line) && block.Length > 0)
                {
                    // store current block or hand it off to a different process, etc.
                    result.Add(block.ToString());
                    block.Clear();
                }
                block.AppendLine(line);
            }
            // final block
            if (block.Length > 0)
                result.Add(block.ToString());
        }
    }
}
```
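If the complete list of reports never needs to exist at once, the same loop can be turned into an iterator so that only one report is held in memory at a time. This is a sketch under that assumption; ReportReader and ReadReports are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

static class ReportReader
{
    // Marker line: T followed by exactly 10 digits at the start of the line
    static readonly Regex Marker = new Regex(@"^T\d{10}(?!\d)");

    // Lazily yields one report at a time; each report starts at a marker line.
    // Any text before the first marker is yielded as its own block.
    public static IEnumerable<string> ReadReports(string path)
    {
        var block = new StringBuilder();
        // File.ReadLines streams the file instead of loading it all into memory
        foreach (var line in File.ReadLines(path))
        {
            if (Marker.IsMatch(line) && block.Length > 0)
            {
                yield return block.ToString();
                block.Clear();
            }
            block.AppendLine(line);
        }
        if (block.Length > 0)
            yield return block.ToString();
    }
}
```

For the parallel processing mentioned above, the sequence can be passed straight to Parallel.ForEach (System.Threading.Tasks), e.g. Parallel.ForEach(ReportReader.ReadReports(@"c:\file.txt"), report => ProcessReport(report)), where ProcessReport stands for your own, hypothetical, per-report method.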
Related
I have a file in which I have to read the text between StartScriptExpression$ and FinishScriptExpression$, and also the text between StartUpdateDescription$ and FinishUpdateDescription$.
The problem is that I want to rewrite the code in a cleaner way.
My Code:
```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

namespace Vesrion
{
    class Program
    {
        static void Main(string[] args)
        {
            string path = @"C:\Users\Development\Desktop\Read\Test.txt";
            using (var reader = new StreamReader(path))
            {
                var textInBetween = new List<string>();
                var ListOFDescription = new List<string>();
                string NewString = "";
                while (!reader.EndOfStream)
                {
                    var line = reader.ReadLine();
                    //Reads First line,
                    switch (line)
                    {
                        case "StartScriptExpression$":
                            continue;
                        case "FinishScriptExpression$":
                            if (line.Contains("FinishScriptExpression$"))
                            {
                                line = "";
                            }
                            string Something = string.Join("", textInBetween);
                            textInBetween = line.Split(',').ToList();
                            string[] lines = Something.Split(
                                new string[] { Environment.NewLine },
                                StringSplitOptions.None);
                            foreach (var S in lines)
                            {
                                ListOFDescription.Add(S);
                                Console.WriteLine(S);
                            }
                            NewString += ListOFDescription;
                            break;
                        case "StartUpdateDescription$":
                            //Console.WriteLine(Environment.NewLine);
                            continue;
                        case "FinishUpdateDescription$":
                            // Console.WriteLine(Environment.NewLine);
                            continue;
                        default:
                            textInBetween.Add(line);
                            //Console.WriteLine(line);
                            break;
                    }
                }
            }
        }
    }
}
```
The text between StartScriptExpression$ and FinishScriptExpression$ must end up in a list of strings.
The text between StartUpdateDescription$ and FinishUpdateDescription$ must end up in a single string.
One way to do it is with a regular expression: https://dotnetfiddle.net/pxBAMv
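The linked fiddle is not reproduced here. As a hedged sketch of the regex idea, assuming the marker spellings used in the question's switch statement, the two kinds of sections could be extracted with lazy, Singleline patterns. MarkerSections and Parse are hypothetical names, and joining several description blocks with Environment.NewLine is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class MarkerSections
{
    // Lazy (.*?) so each Start/Finish pair is matched separately; Singleline lets . cross line breaks
    static readonly Regex ScriptRegex = new Regex(
        @"StartScriptExpression\$(.*?)FinishScriptExpression\$", RegexOptions.Singleline);

    static readonly Regex DescriptionRegex = new Regex(
        @"StartUpdateDescription\$(.*?)FinishUpdateDescription\$", RegexOptions.Singleline);

    // Returns every line of every script block as a list, plus all description text as one string.
    public static (List<string> ScriptLines, string Description) Parse(string text)
    {
        var scriptLines = ScriptRegex.Matches(text)
            .Cast<Match>()
            // Strip the line breaks next to the markers, then split the block into lines
            .SelectMany(m => m.Groups[1].Value.Trim('\r', '\n')
                .Split(new[] { "\r\n", "\n" }, StringSplitOptions.None))
            .ToList();

        var description = string.Join(Environment.NewLine,
            DescriptionRegex.Matches(text)
                .Cast<Match>()
                .Select(m => m.Groups[1].Value.Trim('\r', '\n')));

        return (scriptLines, description);
    }
}
```

Usage would be along the lines of var (scriptLines, description) = MarkerSections.Parse(File.ReadAllText(path));, which replaces the switch-based state tracking with two declarative patterns.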
Using git show, I can fetch the contents of a particular file from a particular commit, without changing the state of my local clone:
$ git show <file>
$ git show <commit>:<file>
How can I achieve this programmatically using LibGit2Sharp?
According to the documentation:
$ git show 807736c691865a8f03c6f433d90db16d2ac7a005:a.txt
is equivalent to the code below:
```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;
using LibGit2Sharp;

namespace ConsoleApp2
{
    class Program
    {
        static void Main(string[] args)
        {
            var pathToFile = "a.txt";
            var commitSha = "807736c691865a8f03c6f433d90db16d2ac7a005";
            var repoPath = @"path/to/repo";
            using (var repo = new Repository(repoPath))
            {
                var commit = repo.Commits.Single(c => c.Sha == commitSha);
                var file = commit[pathToFile];
                var blob = file.Target as Blob;
                using (var content = new StreamReader(blob.GetContentStream(), Encoding.UTF8))
                {
                    var fileContent = content.ReadToEnd();
                    Console.WriteLine(fileContent);
                }
            }
        }
    }
}
```
As nulltoken says in the comments, Lookup<T>() accepts the same <commit>:<path> syntax that git show uses.
```csharp
using (var repo = new Repository(repoPath))
{
    // The line below is the important one.
    var blob = repo.Lookup<Blob>(commitSha + ":" + pathToFile);
    using (var content = new StreamReader(blob.GetContentStream(), Encoding.UTF8))
    {
        var fileContent = content.ReadToEnd();
        Console.WriteLine(fileContent);
    }
}
```
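The commented line is the only change from Andrzej Gis's answer: it replaces the commit =, file =, and blob = lines. Lookup returns null if the commit or the path does not exist, so check blob before using it. Also, commitSha does not have to be a SHA; it can be any revision expression git understands, such as v3.17.0, HEAD, or origin/master.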
I'm writing a program which splits a CSV file into four almost-equal parts.
I'm using a 2000-line CSV input file as an example. When I review the output files, there are lines missing in the first file, and there are also incomplete lines, which makes no sense since I'm writing line by line. Here is the code:
```csharp
using System.IO;
using System;

class MainClass {
    public static void Main(string[] args){
        string line;
        int linesNumber = 0, linesEach = 0, cont = 0;
        StreamReader r = new StreamReader("in.csv");
        StreamWriter w1 = new StreamWriter("out-1.csv");
        StreamWriter w2 = new StreamWriter("out-2.csv");
        StreamWriter w3 = new StreamWriter("out-3.csv");
        StreamWriter w4 = new StreamWriter("out-4.csv");
        while((line = r.ReadLine()) != null)
            ++linesNumber;
        linesEach = linesNumber / 4;
        r.DiscardBufferedData();
        r.BaseStream.Seek(0, SeekOrigin.Begin);
        r.BaseStream.Position = 0;
        while((line = r.ReadLine()) != null){
            ++cont;
            if(cont == 1){
                //first line must be skipped
                continue;
            }
            if(cont < linesEach){
                Console.WriteLine(line);
                w1.WriteLine(line);
            }
            else if(cont < (linesEach*2)){
                w2.WriteLine(line);
            }
            else if(cont < (linesEach*3)){
                w3.WriteLine(line);
            }
            else{
                w4.WriteLine(line);
            }
        }
    }
}
```
Why is the writing part going wrong? How can I fix it?
Thank you all for your help.
You could simplify your approach by using a Partitioner and some LINQ. It also has the benefit of keeping at most one output file open at a time, instead of one handle per output file plus the input file.
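```csharp
using System.Collections.Concurrent;
using System.IO;
using System.Linq;

namespace FileSplitter
{
    internal static class Program
    {
        internal static void Main(string[] args)
        {
            // Skip the header line; materialize the rows so the partitioner can split them by index.
            var input = File.ReadLines("in.csv").Skip(1).ToList();
            // loadBalance: false gives static, contiguous, near-equal ranges. The dynamic partitioner
            // returned for a plain IEnumerable hands out chunks on demand, so draining the first
            // partition would pull every row into out-1.csv.
            var partitioner = Partitioner.Create(input, loadBalance: false);
            var partitions = partitioner.GetPartitions(4);
            for (int i = 0; i < partitions.Count; i++)
            {
                // File.CreateText overwrites an existing file (File.OpenWrite would not truncate it).
                using (var enumerator = partitions[i])
                using (var writer = File.CreateText($"out-{i + 1}.csv"))
                {
                    while (enumerator.MoveNext())
                    {
                        writer.WriteLine(enumerator.Current);
                    }
                }
            }
        }
    }
}
```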
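This is not a direct answer to your question, just an alternative.
LINQ can make the code shorter:
```csharp
var fInfo = new FileInfo(filename);
var lines = File.ReadAllLines(fInfo.FullName);
// Round up so a remainder does not spill into a fifth file (and never divide by zero).
int linesPerFile = Math.Max(1, (lines.Length + 3) / 4);
var baseName = Path.GetFileNameWithoutExtension(fInfo.Name);
foreach (var group in lines
    .Select((line, index) => new { line, index })
    .GroupBy(x => x.index / linesPerFile, x => x.line))
{
    var newFileName = Path.Combine(fInfo.DirectoryName, $"{baseName}_{group.Key}{fInfo.Extension}");
    File.WriteAllLines(newFileName, group);
}
```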
Thank you all for your answers.
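The problem is, as Jegan and spender suggested, that each StreamWriter needs to be wrapped in a using statement, so that it is flushed and closed and its buffered output actually reaches the file. With that change, the problem is solved.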
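The fix described above can be sketched as follows. Besides the using statements, this sketch skips the header before counting and rounds the per-file size up, so no rows fall outside the four files; CsvSplitter and SplitInFour are hypothetical names.

```csharp
using System;
using System.IO;

static class CsvSplitter
{
    // Splits the data rows of inputPath (the header line is skipped, as in the question)
    // into out-1.csv ... out-4.csv inside outputDir.
    public static void SplitInFour(string inputPath, string outputDir)
    {
        // First pass: count the data rows.
        int dataLines = 0;
        using (var counter = new StreamReader(inputPath))
        {
            counter.ReadLine(); // header
            while (counter.ReadLine() != null)
                dataLines++;
        }

        // Round up so every row lands in one of the four files.
        int perFile = Math.Max(1, (dataLines + 3) / 4);

        // Second pass: the reader and every writer sit in using statements, so each
        // StreamWriter is flushed and closed when the block ends.
        using (var reader = new StreamReader(inputPath))
        using (var w1 = new StreamWriter(Path.Combine(outputDir, "out-1.csv")))
        using (var w2 = new StreamWriter(Path.Combine(outputDir, "out-2.csv")))
        using (var w3 = new StreamWriter(Path.Combine(outputDir, "out-3.csv")))
        using (var w4 = new StreamWriter(Path.Combine(outputDir, "out-4.csv")))
        {
            var writers = new[] { w1, w2, w3, w4 };
            reader.ReadLine(); // skip header
            string line;
            int index = 0;
            while ((line = reader.ReadLine()) != null)
            {
                // index < dataLines <= 4 * perFile, so this is always 0..3
                writers[index / perFile].WriteLine(line);
                index++;
            }
        }
    }
}
```

With 10 data rows, for example, the files receive 3, 3, 3 and 1 rows.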
I am trying to read from a text file that I have written multiple entries to. When I read it back, I want only the last entry (bear in mind that each entry I write has 5 lines, and I only want the line containing "Ciphered text:").
With the code below it reads every line containing that string, but I cannot work out how to make it show only the last one.
```csharp
using System;
using System.IO;

namespace ReadLastContain
{
    class StreamRead
    {
        static void Main(string[] args)
        {
            string TempFile = @"C:\Users\Josh\Desktop\text2.txt";
            using (var source = new StreamReader(TempFile))
            {
                string line;
                while ((line = source.ReadLine()) != null)
                {
                    if (line.Contains("Ciphered Text:"))
                    {
                        Console.WriteLine(line);
                    }
                }
            }
        }
    }
}
```
I would suggest using LINQ for better readability:
```csharp
string lastCipheredText = File.ReadLines(TempFile)
    .LastOrDefault(l => l.Contains("Ciphered Text:"));
```
lastCipheredText is null if there was no such line. If you can't use LINQ:
```csharp
string lastCipheredText = null;
while ((line = source.ReadLine()) != null)
{
    if (line.Contains("Ciphered Text:"))
    {
        lastCipheredText = line;
    }
}
```
The variable is overwritten on every match, so after the loop it holds the last line that contained the text.
You can use LINQ:
```csharp
var text = File
    .ReadLines(@"C:\Users\Josh\Desktop\text2.txt")
    .LastOrDefault(line => line.Contains("Ciphered Text:"));
if (null != text) // if there's a text to print out
    Console.WriteLine(text);
```
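If only the ciphered value itself is wanted, a small variation can strip the label and match it case-insensitively (String.Contains with a string argument is case-sensitive). This is a sketch; CipherLog and LastCipheredText are hypothetical names, and it assumes the value follows the label on the same line.

```csharp
using System;
using System.IO;
using System.Linq;

static class CipherLog
{
    const string Label = "Ciphered Text:";

    // Returns the text after the label on the last matching line, or null if no line matches.
    public static string LastCipheredText(string path)
    {
        var lastLine = File.ReadLines(path)
            .LastOrDefault(l => l.IndexOf(Label, StringComparison.OrdinalIgnoreCase) >= 0);
        if (lastLine == null)
            return null;
        // Everything after the label, without surrounding spaces
        int start = lastLine.IndexOf(Label, StringComparison.OrdinalIgnoreCase) + Label.Length;
        return lastLine.Substring(start).Trim();
    }
}
```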
My file, test.txt, contains:
This document is divided into about 5 logical sections starting with a feature and structure overview, followed by an overview of built in column and cell types. Next is an overview of working with data, followed by an overview of specific major features. Lastly, a “best practice” section concludes the main part of this document.
Now I want to delete the 2nd line of the file.
How can I do it in C#?
Thanks in advance.
Naveenkumar
```csharp
// lineNum is zero-based, so 1 refers to the 2nd line
int lineNum = 1;
List<string> lines = File.ReadAllLines(@"filename.txt").ToList();
if (lines.Count > lineNum)
{
    lines.RemoveAt(lineNum);
}
File.WriteAllLines(@"filename.txt", lines);
```
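You can achieve this by splitting the text on \n, using LINQ to select the lines you want to keep, and re-joining them.
```csharp
// Zero-based index of the line to remove (1 = the 2nd line)
var lineNum = 1;
var lines = File
    .ReadAllText(@"src.txt")
    .Split('\n');
var outTxt = String
    .Join(
        "\n",
        lines
            .Take(lineNum)
            .Concat(lines.Skip(lineNum + 1))
            .ToArray()
    );
// Write the result back; any '\r' stays attached to its line, so Windows line endings survive
File.WriteAllText(@"src.txt", outTxt);
```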
Here's a fairly efficient way to do it, which streams the file instead of loading it into memory:
```csharp
FileInfo x = new FileInfo(@"path\to\original");
string xpath = x.FullName;
FileInfo y = new FileInfo(@"path\to\temporary\new\file");
using (var reader = x.OpenText())
using (var writer = y.CreateText()) // CreateText overwrites a leftover temp file; AppendText would append to it
{
    // write 1st line
    writer.WriteLine(reader.ReadLine());
    reader.ReadLine(); // skip 2nd line
    // write all remaining lines
    while (!reader.EndOfStream)
    {
        writer.WriteLine(reader.ReadLine());
    }
}
x.Delete();
y.MoveTo(xpath);
```
```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;

namespace rem2ndline
{
    class Program
    {
        static void Main(string[] args)
        {
            string inPath = @"c:\rem2ndline.txt";
            string outPath = @"c:\rem2ndlineresult.txt";
            StringBuilder builder = new StringBuilder();
            using (FileStream fso = new FileStream(inPath, FileMode.Open))
            {
                using (StreamReader rdr = new StreamReader(fso))
                {
                    int lineCount = 0;
                    bool canRead = true;
                    while (canRead)
                    {
                        var line = rdr.ReadLine();
                        lineCount++;
                        if (line == null)
                        {
                            canRead = false;
                        }
                        else
                        {
                            // keep every line except the 2nd
                            if (lineCount != 2)
                            {
                                builder.AppendLine(line);
                            }
                        }
                    }
                }
            }
            // FileMode.Create truncates an existing result file; OpenOrCreate would leave stale trailing text
            using (FileStream fso2 = new FileStream(outPath, FileMode.Create))
            {
                using (StreamWriter strw = new StreamWriter(fso2))
                {
                    strw.Write(builder.ToString());
                }
            }
        }
    }
}
```
Here's what I'd do. The advantage is that you don't have to hold the whole file in memory at once, so memory requirements should be similar for files of varying sizes (as long as the lines in each file are of similar length). The drawback is that you can't write back to the same file while reading it; you have to mess around with a Delete and a Move afterwards.
The extension methods may be overkill for your simple example, but they (along with the ReadFile method) are helpers I have come to rely on again and again, so I'd typically only have to write the code in Main().
```csharp
using System;
using System.Collections.Generic;
using System.IO;

class Program
{
    static void Main()
    {
        var file = @"C:\myFile.txt";
        var tempFile = Path.ChangeExtension(file, "tmp");
        using (var writer = new StreamWriter(tempFile))
        {
            ReadFile(file)
                .FilterI((i, line) => i != 1) // index 1 is the 2nd line
                .ForEach(l => writer.WriteLine(l));
        }
        File.Delete(file);
        File.Move(tempFile, file);
    }

    // Lazily yields the file's lines; the reader stays open only while the sequence is enumerated
    static IEnumerable<String> ReadFile(String file)
    {
        using (var reader = new StreamReader(file))
        {
            while (!reader.EndOfStream)
            {
                yield return reader.ReadLine();
            }
        }
    }
}

static class IEnumerableExtensions
{
    public static IEnumerable<T> FilterI<T>(
        this IEnumerable<T> seq,
        Func<Int32, T, Boolean> filter)
    {
        var index = 0;
        foreach (var item in seq)
        {
            if (filter(index, item))
            {
                yield return item;
            }
            index++;
        }
    }

    public static void ForEach<T>(
        this IEnumerable<T> seq,
        Action<T> action)
    {
        foreach (var item in seq)
        {
            action(item);
        }
    }
}
```
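For reference, the built-in Enumerable.Where overload whose predicate also receives the element's index does the same job as FilterI, so the same streaming approach can be sketched without custom extensions. LineRemover and RemoveLine are hypothetical names; the index is zero-based, and the rewritten file always ends with a line break.

```csharp
using System.IO;
using System.Linq;

static class LineRemover
{
    // Removes the line at zeroBasedIndex by streaming the file into a temporary file,
    // then replacing the original. Only one line is held in memory at a time.
    public static void RemoveLine(string path, int zeroBasedIndex)
    {
        var tempFile = Path.GetTempFileName();
        // Where((line, i) => ...) passes each element's position, like FilterI
        File.WriteAllLines(tempFile, File.ReadLines(path).Where((line, i) => i != zeroBasedIndex));
        File.Delete(path);
        File.Move(tempFile, path);
    }
}
```

Calling LineRemover.RemoveLine(@"C:\myFile.txt", 1) would remove the 2nd line, matching the Main() above.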