I have a small C# project in which I manipulate files. My task now is to delete specific rows from a file.
For example my file looks like this:
1-this is the first line
2-this is the second line
3-this is the third line
4-this is the fourth line
Now how can I keep only the first two rows and delete only the last two rows?
Note: this is how I read the file on my local machine:
string[] lines = File.ReadAllLines(@"C:\Users\admin\Desktop\COMMANDS.dat");
I have tried something like this, but I think it's not very "efficient":
string text = File.ReadAllText(@"C:\Users\admin\Desktop\COMMANDS.dat");
text = text.Replace(lines[2], "");
text = text.Replace(lines[3], "");
File.WriteAllText(@"C:\Users\admin\Desktop\COMMANDS.dat", text);
This actually does the job: it replaces each of those lines with an empty string. But when I look at the file afterwards, it still has 4 lines, two with real text and two that are just empty, and I don't want the empty ones there. Can I do this another way?
Try removing each line together with the newline that precedes it:
string path = @"C:\Users\admin\Desktop\COMMANDS.dat";
string[] lines = File.ReadAllLines(path);
string text = File.ReadAllText(path);
text = text.Replace(Environment.NewLine + lines[2], "");
text = text.Replace(Environment.NewLine + lines[3], "");
File.WriteAllText(path, text);
async Task Example()
{
    var inputLines = await File.ReadAllLinesAsync("path/to/file.txt");
    var outputLines = inputLines.Where((l, i) => i < 2);
    await File.WriteAllLinesAsync("target/file.txt", outputLines);
}
What it does
Read the data not as one string but as a collection of lines
Create a new collection containing only the lines you want in your output
Write the filtered lines
Notes:
This example is not optimized for memory usage: it reads all lines at once, so for larger files (e.g. multiple GB) it will fail. See the existing answers for a memory-optimized version. But it's totally fine to do it this way if you know you have just a few thousand lines (and it's faster).
Try not to "modify" strings. This will always create a copy and needs a lot of memory.
In this "LINQ style" (functional) approach, we should treat data as immutable. That means we have one variable that represents the input file and one variable that represents the result. We use declarative LINQ to describe what the output should look like: "output is input where the index is less than 2" instead of "if xy, remove line" in an imperative style.
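For the memory-optimized variant mentioned in the notes, one possible sketch is to stream the file with File.ReadLines instead of loading it whole. The class name LineTrimmer, the method name KeepFirstLines, and the ".tmp" suffix are made up for illustration; the sketch also assumes the directory is writable and that nothing else has the file open.

```csharp
using System.IO;
using System.Linq;

public static class LineTrimmer
{
    // Keeps only the first `count` lines of the file at `path`.
    // File.ReadLines enumerates lazily, so the whole file is never held in memory.
    public static void KeepFirstLines(string path, int count)
    {
        // Hypothetical temp-file name, written next to the original.
        string tempPath = path + ".tmp";

        // Take(count) stops reading as soon as enough lines have been copied.
        File.WriteAllLines(tempPath, File.ReadLines(path).Take(count));

        // Swap the trimmed copy in for the original.
        File.Delete(path);
        File.Move(tempPath, path);
    }
}
```

Calling LineTrimmer.KeepFirstLines(path, 2) on the four-line example file would leave only its first two lines. Writing to a separate file first avoids reading and overwriting the same file at once.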
I have a CSV whose author, annoyingly enough, has decided to 'introduce' the file before the contents themselves. So in all, I have a CSV that looks like:
This file was created by XXXXYY and represents the crossover between YY and QQQ.
Additional information can be found through the website GG, blah blah blah...
Jacob, Hybrid
Dan, Pure
Lianne, Hybrid
Jack, Hatchback
So the problem here is that I want to get rid of the first few lines before the 'real content' of the CSV file begins. I'm looking for robustness here, so using StreamReader and removing all content before the 4th line, for example, is not ideal (the length of the introductory text can vary).
Is there a way in which one can read only what matters and write a new CSV into a directory path?
Regards,
genesis
(edit - I'm looking for C sharp code)
The solution depends on the files you have to parse. You need to look for a reliable pattern that distinguishes data from comment.
In your example, there are some possibilities that might be the same in other files:
There are 4 lines of text. But you say this isn't consistent across files.
The text lines may not contain the same number of commas as the data table. But that is unlikely to be reliable for all files.
There is a blank/whitespace-only line between the text and the data.
The data appears to be in the form word-comma-word. If this is true, it should be easy to identify non-data lines (any line which doesn't contain exactly one comma, or has multiple words, etc.).
You may be able to use a combination of these heuristics to more reliably detect the data.
You could scan line by line (looking for the \r\n) and ignore lines whose comma count doesn't match your CSV.
You should be able to read the file into a string pretty easily unless it is really massive.
e.g.
var csv = "some test\r\nsome more text\r\na,b,c\r\nd,e,f\r\n";
var lines = csv.Split(new[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries);
var csvLines = lines.Where(l => l.Count(c => c == ',') == 2);
// now csvLines contains only the lines you are after
List<string> info = new List<string>();
int counter = 0;
// Open the file to read from.
info = System.IO.File.ReadAllLines(path).ToList();
// Find the lines up until (& including) the empty one
foreach (string s in info)
{
    counter++;
    if (string.IsNullOrEmpty(s))
        break; // exit from the loop
}
// Remove the lines including the blank one.
info.RemoveRange(0, counter);
Something like this should work. You should probably add some checks to make sure counter is not greater than the list length, and other checks to handle errors.
You could adapt this code so that it just finds the empty line number using LINQ or something, but I don't like the overhead of LINQ (yeah, ironic considering I'm using C#).
Regards,
Slipoch
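As a rough illustration of the blank-line heuristic discussed in the answers above, here is a minimal sketch that drops everything up to and including the first blank line. The names CsvPreamble and StripIntro are invented for this example, and the choice to return the input unchanged when no blank line exists is an assumption, not something the question specifies.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CsvPreamble
{
    // Drops everything up to and including the first blank (or whitespace-only) line.
    // Assumption: if there is no blank line, there is no introduction to strip,
    // so all lines are returned.
    public static List<string> StripIntro(IEnumerable<string> lines)
    {
        List<string> all = lines.ToList();

        // FindIndex returns -1 when no line matches, which makes Skip(0) below keep everything.
        int blankIndex = all.FindIndex(l => string.IsNullOrWhiteSpace(l));

        return all.Skip(blankIndex + 1).ToList();
    }
}
```

The result of CsvPreamble.StripIntro(File.ReadAllLines(path)) could then be written out with File.WriteAllLines to produce the cleaned CSV.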
I am reading a couple of CSV files into variables as follows:
var myFullCsv = ReadFile(myFullCsvFilePath);
var masterCsv = ReadFile(csvFilePath);
Some of the line entries in each CSV appear in both files, and I am able to create a new variable containing the lines that exist in myFullCsv but not in masterCsv as follows:
var extraFilesCsv = myFullCsv.Except(masterCsv);
This is great because it's very simple. However, I now wish to identify lines in myFullCsv where a specific string appears in the line. The string will correspond to one column of the CSV data. I know that I can do this by reading each line, splitting it up, and comparing the field I'm interested in to the string that I am searching for. However, this seems like a very long and inefficient approach compared to my code above using the 'Except' method.
Is there some way that I can get the lines from myFullCsv with a very simple command or will I have to do it the long way? Please don't ask me to show the long way as that's what I am trying to avoid having to code although I can do it.
Sample csv data:
07801.jpg,67466,9452d316,\Folder1\FolderA\,
07802.jpg,78115,e50492d8,\Folder1\FolderB\,
07803.jpg,41486,37b6a100,\Folder1\FolderC\,
07804.jpg,93500,acdffc2b,\Folder2\FolderA\,
07805.jpg,67466,9452d316,\Folder2\FolderB\,
Sample desired output (I'm always looking for the entry in the 3rd column to match a string, in this case 9452d316):
07801.jpg,67466,9452d316,\Folder1\FolderA\,
07805.jpg,67466,9452d316,\Folder2\FolderB\,
Well you could use:
var results = myFullCsv.Where(line => line.Split(',')[2] == targetValue)
.ToList();
That's just doing the "splitting and checking" you mention in the question but it's pretty simple code. It could be more efficient if you only consider as far as the third comma, but I wouldn't worry about that until it's proved to be a problem.
Personally I'd probably parse each line to an object with meaningful properties rather than treating it as a string, but that's probably what you mean by "the long way".
Note that this doesn't perform any validation, or try to handle escaped commas, or lines with fewer columns etc. Depending on your data source, you may need to make it a lot more robust.
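To illustrate the robustness point, here is a minimal sketch of the same split-and-compare filter that skips lines with too few columns instead of throwing. CsvFilter and WhereColumnEquals are hypothetical names, and the sketch still assumes no quoted or escaped commas in the data.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CsvFilter
{
    // Returns the lines whose field at `columnIndex` (zero-based) equals `target`.
    // Lines with too few fields are ignored rather than causing an exception.
    // Assumption: fields never contain quoted or escaped commas.
    public static List<string> WhereColumnEquals(IEnumerable<string> lines, int columnIndex, string target)
    {
        return lines
            .Select(line => new { line, fields = line.Split(',') })
            .Where(x => x.fields.Length > columnIndex && x.fields[columnIndex] == target)
            .Select(x => x.line)
            .ToList();
    }
}
```

With the sample data, CsvFilter.WhereColumnEquals(myFullCsv, 2, "9452d316") would select the 07801.jpg and 07805.jpg lines.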
You could use a regex. It doesn't require every line to have at least 3 elements. It doesn't allocate a string array for each line. Therefore it may be faster, but you'd have to test it to prove it.
// [^,]* keeps each field from running past its comma, so only the 3rd column is tested
var regex = new Regex("^[^,]*,[^,]*," + Regex.Escape(targetValue) + ",");
var results = myFullCsv.Where(l => regex.IsMatch(l)).ToList();
I wrote a file routing utility (.NET) some time ago to examine a file's location and name pattern and move it to some other preconfigured place based on the match. Fairly simple, straightforward kinda stuff. I had included the possibility of minor transformations through a series of regular expression search-and-replace actions that could be assigned to the file "route", with the intent of adding header rows, replacing commas with pipes, that sort of thing.
So now I have a new text feed that consists of a file header, a batch header, and a multitude of detail records under the batches. The file header contains a count of all detail records in the file, and I have been asked to "split" the file in the assigned transformations, essentially producing a file for each batch record. This is fairly straightforward, as well, but the kicker is, there is an expectation to update the file header for each file to reflect the detail count.
I do not even know if this is possible with pure regular expressions. Can I count the number of matches of a group in a given text document and replace the count value in the original text, or am I going to have to write a custom transformer for this one file?
If I have to write another transformer, are there suggestions on how to make it generic enough to be reusable? I'm considering adding an XSLT transformer option, but my understanding of XSLT is not so great.
I've been asked for an example. Say I have a file like so:
FILE001DETAILCOUNT002
BATCH01
DETAIL001FOO
BATCH02
DETAIL001BAR
This file will be split and stored in two locations. The files will look like this:
FILE001DETAILCOUNT001
BATCH01
DETAIL001FOO
and
FILE001DETAILCOUNT001
BATCH01
DETAIL001BAR
So the sticking point for me is the file header's DETAILCOUNT value.
Regular expressions by themselves can't count the number of matches they've made (or, better put, they don't expose that to the regex user), so you do need additional program code to keep track of this.
A regex can only capture text that exists somewhere in the source material, it can't generate new text. So unless you can find the number you need explicitly at some point in the source, you're out of luck. Sorry.
My program first breaks the text into batches.
I think you'll agree that resequencing the detail number is the trickiest part. You can do it with a MatchEvaluator delegate.
Regex.Replace (
    text, // the text to replace part of
    @"(?<=^DETAIL)\d+", // the regex pattern to find
    m => (detailNum++).ToString ("000"), // replacement (evaluated for each match)
    RegexOptions.Multiline);
See how the following code resets detailNum at the beginning of each batch.
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

var contents =
@"FILE001DETAILCOUNT002
BATCH01
DETAIL001FOO
BATCH02
DETAIL001BAR";
// foreach batch....
foreach (Match match in Regex.Matches (contents, @"BATCH\d+\s+(?:(?!BATCH\d+).*\s*)+"))
{
    Console.WriteLine ("==============\r\nFile\r\n================");
    int batchNum = 1;
    int detailNum = 1;
    StringBuilder temp = new StringBuilder ();
    TextWriter file = new StringWriter (temp);
    // Your file here instead of my StringBuilder/StringWriter
    string batchText = match.Value;
    // Count the detail records in this batch for the file header
    int count = Regex.Matches (batchText, @"^DETAIL\d+", RegexOptions.Multiline).Count;
    file.WriteLine ("FILE001DETAILCOUNT{0:000}", count);
    string newText = Regex.Replace (batchText, @"(?<=^BATCH)\d+", batchNum.ToString ("000"), RegexOptions.Multiline);
    newText = Regex.Replace (
        newText,
        @"(?<=^DETAIL)\d+",
        m => (detailNum++).ToString ("000"), // replacement (evaluated for each match)
        RegexOptions.Multiline);
    file.Write (newText);
    Console.WriteLine (temp.ToString ());
}
prints
==============
File
================
FILE001DETAILCOUNT001
BATCH001
DETAIL001FOO
==============
File
================
FILE001DETAILCOUNT001
BATCH001
DETAIL001BAR
Is there a graceful way in C# to delete multiple lines of text from the beginning of a multiline textbox? I am using Microsoft Visual C# 2008 Express Edition.
EDIT - Additional Details
The multiline textbox in my application is disabled (i.e. it is only editable by the application itself), and every line is terminated with a "\r\n".
This is an incomplete question. Assuming you are using either TextBox or RichTextBox, you can use the Lines property found in TextBoxBase.
// get all the lines out as an array
string[] lines = this.textBox.Lines;
You can then work with this array and set it back.
this.textBox.Lines= newLinesArray;
This might not be the most elegant way, but it will remove lines from the beginning.
EDIT: you don't need Select; just using Skip will be fine.
//number of lines to remove from the beginning
int numOfLines = 30;
var lines = this.textBox1.Lines;
var newLines = lines.Skip(numOfLines);
this.textBox1.Lines = newLines.ToArray();
This solution works for me in WPF:
while (LogTextBox.LineCount > Constants.LogMaximumLines)
{
    LogTextBox.Text = LogTextBox.Text.Remove(0, LogTextBox.GetLineLength(0));
}
You can replace LogTextBox with the name of your text box, and Constants.LogMaximumLines with the maximum number of lines you would like your text box to have.
Unfortunately, no, there is no "elegant" way to delete lines from the text of a multiline TextBox, regardless of whether you are using ASP.NET, WinForms, or WPF/Silverlight. In every case, you build a string that does not contain the lines you don't want and set the Text property.
WinForms will help you a little bit by pre-splitting the Text value into lines, using the Lines property, but it's not very helpful because it's a string array, and it's not exactly easy to delete an element of an array.
Generally, this algorithm will work for all possible versions of the TextBox class:
var lines = (from item in myTextBox.Text.Split('\n') select item.Trim());
lines = lines.Skip(numLinesToSkip);
myTextBox.Text = string.Join(Environment.NewLine, lines.ToArray());
Note: I'm using Environment.NewLine specifically for the case of Silverlight on a Unix platform. For all other cases, you're perfectly fine using "\r\n" in the string.Join call.
Also, I do not consider this an elegant solution, even though it's only 3 lines. What it does is the following:
splits the single string into an array of strings
iterates over that array and builds a second array that does not include the lines skipped
joins the array back into a single string.
I do not consider it elegant because it essentially builds two separate arrays, then builds a string from the second array. A more elegant solution would not do this.
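One way to avoid the intermediate arrays, offered only as a sketch, is to locate the end of the last unwanted line by index and take a single substring. TextTrimmer and SkipLines are hypothetical names, and the code assumes lines end in '\n' (which also covers "\r\n").

```csharp
public static class TextTrimmer
{
    // Returns `text` without its first `count` lines.
    // Lines are assumed to end in '\n' (this also covers "\r\n").
    // If the text contains fewer than `count` line breaks, an empty string is returned.
    public static string SkipLines(string text, int count)
    {
        int start = 0;
        for (int i = 0; i < count; i++)
        {
            int newline = text.IndexOf('\n', start);
            if (newline < 0)
                return string.Empty; // fewer lines than requested: nothing is left
            start = newline + 1;     // the next line begins right after the '\n'
        }
        return text.Substring(start);
    }
}
```

Usage would be along the lines of myTextBox.Text = TextTrimmer.SkipLines(myTextBox.Text, numLinesToSkip); this allocates only the resulting string.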
One thing to keep in mind is that the Lines collection of the TextBox does not accurately reflect what the user sees as lines. The Lines collection basically works off of carriage returns, whereas the user could see lines wrapping from one line to the next without a carriage return. This may or may not be the behavior you want.
For example, the user would see the below as three lines, but the Lines collection will show 2 (since there are only 2 carriage returns):
This is line number
one.
This is line 2.
Also, if the form, and the text control are resizable the visible lines in the text will change as the control grows or shrinks.
I wrote a blog post several years ago on how to determine the number of lines in the textbox as the user sees them and how to get the index of a given line: http://ryanfarley.com/blog/archive/2004/04/07/511.aspx. Perhaps this post will help.
if (txtLog.Lines.Length > maxNumberLines)
{
    txtLog.Lines = txtLog.Lines.Skip(txtLog.Lines.Length - maxNumberLines).ToArray();
}
I have a program that generates a plain text file. The structure (layout) is always the same. Example:
Text File:
LinkLabel
"Hello, this text will appear in a LinkLabel once it has been
added to the form. This text may not always cover more than one line. But will always be surrounded by quotation marks."
240, 780
So, to explain what is going on in that file:
Control
Text
Location
When a button on the Form is clicked and the user opens one of these files from the OpenFileDialog dialog, I need to be able to read each line. Starting from the top, I want to check what control it is. Then, starting on the second line, I need to get all the text inside the quotation marks (regardless of whether it is one line of text or more), and on the next line (after the closing quotation mark) I need to extract the location (240, 780). I have thought of a few ways of going about this, but when I write them down and put them into practice they don't make much sense, and I end up finding ways they won't work.
Has anybody ever done this before? Would anybody be able to provide any help, suggestions or advice on how I'd go about doing this?
I have looked up CSV files but that seems too complicated for something that seems so simple.
Thanks
jase
You could use a regular expression to get the lines from the text:
MatchCollection lines = Regex.Matches(File.ReadAllText(fileName), @"(.+?)\r\n""([^""]+)""\r\n(\d+), (\d+)\r\n");
foreach (Match match in lines) {
    string control = match.Groups[1].Value;
    string text = match.Groups[2].Value;
    int x = Int32.Parse(match.Groups[3].Value);
    int y = Int32.Parse(match.Groups[4].Value);
    Console.WriteLine("{0}, \"{1}\", {2}, {3}", control, text, x, y);
}
I'll try and write down the algorithm, the way I solve these problems (in comments):
// while not at end of file
//     read control
//     read line of text
//     while last char in line is not "
//         read line of text
//     read location
Try and write code that does what each comment says and you should be able to figure it out.
HTH.
You are trying to implement a parser and the best strategy for that is to divide the problem into smaller pieces. And you need a TextReader class that enables you to read lines.
You should separate your ReadControl method into three methods: ReadControlType, ReadText, ReadLocation. Each method is responsible for reading only the item it should read and leave the TextReader in a position where the next method can pick up. Something like this.
public Control ReadControl(TextReader reader)
{
    string controlType = ReadControlType(reader);
    string text = ReadText(reader);
    Point location = ReadLocation(reader);
    // ... return the control ...
}
Of course, ReadText is the most interesting one, since it spans multiple lines. In fact it's a loop that calls TextReader.ReadLine until the line ends with a quotation mark:
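private string ReadText(TextReader reader)
{
    string line = reader.ReadLine();
    if (line == null)
        throw new EndOfStreamException("Expected quoted text.");
    string text = line.Substring(1); // Strip first quotation mark.
    while (!text.EndsWith("\"")) {
        line = reader.ReadLine();
        if (line == null) // Guard against looping forever at end of file.
            throw new EndOfStreamException("Missing closing quotation mark.");
        text += line;
    }
    return text.Substring(0, text.Length - 1); // Strip last quotation mark.
}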
This kind of stuff gets irritating: it's conceptually simple, but you can end up with gnarly code. You've got a comparatively simple case: one record per file. It gets much harder if you have lots of records and you want to deal nicely with badly formed records (consider writing a parser for a language such as C#).
For large-scale problems one might use a grammar-driven parser.
Much of your complexity comes from the lack of regularity in the file. The first field is terminated by a newline, the second is delimited by quotes, the third is terminated by a comma ...
My first recommendation would be to adjust the format of the file so that it's really easy to parse. You write the file, so you're in control. For example, just don't have newlines in the text, and put each item on its own line. Then you can just read four lines, job done.