How to read text file between ""

How to read text file between "" - c#

I need an "idea" on how to read text file data between quotes. For example:
line 1: "read a title"
line 2: "read a descr"
line 1: "read a title"
line 2: "read a descr"
I want to do a foreach type of thing, and I want to read all Line 1's, and Line 2's as a pair, but between the ".
In my program I am going to output (foreach of course):
readTerminatedNull(file1);
readTerminatedNull(file2);
I would read line by line, but some of the text could be:
line 1: "read a super long
title that goes off"
line 2: "read a descr"
So that's why I want to read between the ".
Sorry if that is too complicated, and it's a little hard to explain.
Edit:
Thanks for all the feed back guys, but I'm not sure you are getting what I am trying to do :p not your faults, I wrote this kinda wierd.
I will have a text file full of refrences, and text. like so.
text inside:
Refren: "myrefrence_1"
String: "This is a string of a refrence"
Refren: "myrefrence_2"
String: "hello world"
Refren: "myrefrence_3"
String: "I like cookies."
I want it to to read myrefrence_1 in the quotes of the first line, and then read the string in the next line between the ".
I will then stuff into my program that matches the refrence with the string.
But sometimes the text will be more than one line.
Refren: "this is text that goes and then
return keys on some parts."
and I still want it to read through the ".

(not tested, but you'll get the idea)
// Read all text from file
string sData = File.ReadAllText(#"c:/file.txt");
// Match strings between " "
Match match = Regex.Match(sData , "\"(\w|\d|\s|\\\")*\"",
RegexOptions.IgnoreCase);
// Read results and strip " out of them
foreach (var sResult in match) {
sResult = sResult.Remove(0,1).Remove(sResult.length-2, 1);
// Do whatever with sResult
}

You could learn some new tricks by looking into state machines. Basically: Read each character at a time and figure out what state you are in now. First, code this as a big while loop with a big switch statement inside. Then, go and read up on the state pattern for how to do this in an object oriented way. Then, ditch that and use delegates, because c# makes this stuff so easy to do.
Then, scrap it all, write some crappy Regular Expression with a multiline flag and slurp it the Perl way. Meditate on why this is the same as your original state machine solution.
Then, get really stuck in and learn about parser generators (lexx/yacc or some .NET variant) and write a simple BNF grammar for your problem. Take special note of how the trivial grammars used in the tutorials are all way more complicated than the one you need to write. Why is that so? Check out what Noam Chomsky had to say about that.
Eventually, you'll burn out. We all do. But you'll have so much fun digging into what makes programming the coolest activity on the planet. Burn-out is just the realization that that's a pipe dream ;)
When you're done, go outside. Meet people. Talk. Smile a lot. Be friendly. You're now a zen infused developer with a wicked grin. Yay for you! You rock!

What you're describing sounds like a single-column CSV file. The easiest way to access that is probably to use the Microsoft.VisualBasic.FileIO.TextFieldParser class, something like:
using (var csvParser = new TextFieldParser(new StringReader(content))
{
Delimiters = new[] {","},
HasFieldsEnclosedInQuotes = true
})
{
while (!csvParser.EndOfData)
{
var fields = csvParser.ReadFields();
Console.Print(fields[0]); //do something with the first (in your case only) field found.
}
}
Probably the easiest way to determine whether this approach makes sense, is to think about what happens if the string you're reading actually contains a double quote. Would it end up as "He said ""this is quoted"", but I wasn't listening" (doubling up the quotes), or is this situation impossible?
If the quotes would be doubled up in this way, then a standard CSV reader like this built-in framework one is probably your best bet.

To read all of the lines of the file you can use:
File.ReadAllLines(pathToFile);
to strip the text from "" you can use the substring method of string: http://msdn.microsoft.com/en-us/library/aka44szs.aspx
you can do it like that:
string strippedString = original.Substring(1, original.length -2);

Try this one
var text = File.ReadAllLines(pathToFile);
var lines = text.Split(':')
.Where((s,i) => i % 2 != 0)
.Select(s => s.trim('"'));

First of all you need to read in the file using:
File.ReadAllLines(filePath);
Then you could split all the lines using the string.Split function.
Splitting on the closing bracket would be your best bet.

As i have understood from you question is you want to read and write text file with some specific settings. is it ?
I would like to refer to to INI files which are the text files it self and provide the settings configurations as you wish to achieve. here are some links these could help you.
http://www.codeproject.com/Articles/1966/An-INI-file-handling-class-using-C
http://jachman.wordpress.com/2006/09/11/how-to-access-ini-files-in-c-net/

Related

Removing text above real content of CSV file

I have a CSV whose author, annoyingly enough, has decided to 'introduce' the file before the contents themselves. So in all, I have a CSV that looks like:
This file was created by XXXXYY and represents the crossover between YY and QQQ.
Additional information can be found through the website GG, blah blah blah...
Jacob, Hybrid
Dan, Pure
Lianne, Hybrid
Jack, Hatchback
So the problem here is that I want to get rid of the first few lines before the 'real content' of the CSV file begins. I'm looking for robustness here, so using Streamreader and removing all content before the 4th line for example, is not ideal (plus the length of the text can vary).
Is there a way in which one can read only what matters and write a new CSV into a directory path?
Regards,
genesis
(edit - I'm looking for C sharp code)

The solution depends on the files you have to parse. You need to look for a reliable pattern that distinguishes data from comment.
In your example, there are some possibilities that might be the same in other files:
there are 4 lines of text. But you say this isn't consistent across files
The text lives may not contain the same number of commas as the data table. But that is unlikely to be reliable for all files.
there is a blank/whitespace only line between the text and the data.
the data appears to be in the form word-comma-word. If this is true it should be easy to identify non data lines (any line which doesn't contain exactly one comma, or has multiple words etc)
You may be able to use a combination of these heuristics to more reliably detect the data.

You could scan by line (looking for the \r\n) and ignore lines that don't have a comma count that matches you csv.
You should be able to read the file into a string pretty easily unless it is really massive.
e.g.
var csv = "some test\r\nsome more text\r\na,b,c\r\nd,e,f\r\n";
var lines = csv.Split('\r\n');
var csvLines = line.Where(l => l.Count(',') == 2);
// now csvLines contains only the lines you are after

List<string> info = new List<string>();
int counter = 0;
// Open the file to read from.
info = System.IO.File.ReadAllLines(path).ToList();
// Find the lines up until (& including) the empty one
foreach (string s in info)
{
counter++;
if(string.IsNullOrEmpty(s))
break; //exit from the loop
}
// Remove the lines including the blank one.
info.RemoveRange(0,counter);
Something like this should work, you should probably put some tests in to make sure counter is not > length and other tests to handle errors.
You could adapt this code so that it just finds the empty line number using linq or something, but I don't like the overhead of linq (Yeah ironic considering I'm using c#).
Regards,
Slipoch

How to produce a soft return using C#.net

I know this is kind of easy question but i cant seem to find it anywhere. Is there someone out there who knows how to create a soft return inside a set of text using C#.net?
I need to print soft return to a text file/xml file. this text file will be generated using c#.net. you could verify if the answer is correct if you use NOTEPAD++ then enable the option to “View>Show Symbol > Show End of Line” then you will see a symbol like this:
Thanks in advance :)

Not sure what you mean by a soft return. A quick Google search says it's a non-stored line break typically due to word wrapping in which case you wouldn't actually put this in a string, it would only be relevant when the string was rendered for display.
To put a carriage return and/or line feed in the string you would use:
string s = "line one\r\nline two";
And for further reference, here are the other escape codes that you can use.
Link (MSDN Blogs)
In response to your edit
The LF that you see can be represented with \n in a string. Obviously you have a specific line ending sequence that you need to represent. If you were to use Environment.NewLine that is going to give you different results on different platforms.

var message = $"Tom{Convert.ToChar(10)}Harry";
Results in:
Tom
Harry
With just a line feed between.

Lke already mentioned you can use Enviroment.NewLine but I am not sure if that i what you want or if you are actually trying to append a ASCII 141 to your string as mentioned in the comments.
You can add ASCII chr sequences to your string like this.
var myString = new StringBuilder("Foo");
myString.Append((char)141);

C# Regex Replace ignore specific string

Since this is my first question here on stackoverflow I hope my question is correctly asked.
Basicly I have a normal .txt file which contains any text like:
car accident
people died
cat without owner
<!-- Text added at 6/29/2011 9:20:38 AM -->
Some addintional Text
other Text added
add Text
I have a write/append function which allows the user to append some text and set a little timestamp.
So my problem is: With another function, you can search and replace text in the textfile, but as you can guess if someone wants to replace the word "Text" it will be replaced in the xml-stylish comment(timestamp) as well.
My result until now is
content = Regex.Replace(content,"[^<+.*"+input+".*>+]*", replace);
//content = content of the .txt file, input = search term, replace = string to replace
But this fails miserably, as some regex pro's will see without executing it.
Now I hope that some regex pro could help me out here and provide me a search pattern which replaces the normal text but ignores the timestamp.
I'm not realy aware of the logic from regex until now, nevertheless I understand the single expressions so this would be a hook for me to understand Regex more properly.
Thanks in advice.

If I understand your question correctly, you want to replace every instance of "Text" except for the one(s) inside the comment.
The easist way is to use a negative lookbehind (fantastic description here) as below:
content = Regex.Replace(content, #"(?<!<!--.*?)" + input, replace);
What you're doing is attempting to replace a repetition of any length of a character that is NOT <+.*> or a character contained in input with the value in replace.
If you're going to be working a lot with Regex, I would HIGHLY recommend giving the website above a good read. It's hands down the best intro to Regex that I've found, the time spent now will save you lots of headaches later!
Edit
Updated to add flexibility thanks to #stema

Adding a Line to the Middle of a File with .NET

Hello I am working on something, and I need to be able to be able to add text into a .txt file. Although I have this completed I have a small problem. I need to write the string in the middle of the file more or less. Example:
Hello my name is Brandon,
I hope someone can help, //I want the string under this line.
Thank you.
Hopefully someone can help with a solution.
Edit Alright thanks guys, I'll try to figure it out, probably going to just rewrite the whole file. Ok well the program I am making is related to the hosts file, and not everyone has the same hosts file, so I was wondering if there is a way to read their hosts file, and copy all of it, while adding the string to it?

With regular files there's no way around it - you must read the text that follows the line you wish to append after, overwrite the file, and then append the original trailing text.
Think of files on disk as arrays - if you want to insert some items into the middle of an array, you need to shift all of the following items down to make room. The difference is that .NET offers convenience methods for arrays and Lists that make this easy to do. The file I/O APIs offer no such convenience methods, as far as I'm aware.
When you know in advance you need to insert in the middle of a file, it is often easier to simply write a new file with the altered content, and then perform a rename. If the file is small enough to read into memory, you can do this quite easily with some LINQ:
var allLines = File.ReadAllLines( filename ).ToList();
allLines.Insert( insertPos, "This is a new line..." );
File.WriteAllLines( filename, allLines.ToArray() );

This is the best method to insert a text in middle of the textfile.
string[] full_file = File.ReadAllLines("test.txt");
List<string> l = new List<string>();
l.AddRange(full_file);
l.Insert(20, "Inserted String");
File.WriteAllLines("test.txt", l.ToArray());

one of the trick is file transaction. first you read the file up to the line you want to add text but while reading keep saving the read lines in a separate file for example tmp.txt and then add your desired text to the tmp.txt (at the end of the file) after that continue the reading from the source file till the end. then replace the tmp.txt with the source file. at the end you got file with added text in the middle :)

Check out File.ReadAllLines(). Probably the easiest way.
string[] full_file = File.ReadAllLines("test.txt");
List<string> l = new List<string>();
l.AddRange(full_file);
l.Insert(20, "Inserted String");
File.WriteAllLines("test.txt", l.ToArray());

If you know the line index use readLine until you reach that line and write under it.
If you know exactly he text of that line do the same but compare the text returned from readLine with the text that you are searching for and then write under that line.
Or you can search for the index of a specified string and writ after it using th escape sequence \n.

As others mentioned, there is no way around rewriting the file after the point of the newly inserted text if you must stick with a simple text file. Depending on your requirements, though, it might be possible to speed up the finding of location to start writing. If you knew that you needed to add data after line N, then you could maintain a separate "index" of the offsets of line numbers. That would allow you to seek directly to the necessary location to start reading/writing.

Reading line by line

I have a program that generates a plain text file. The structure (layout) is always the same. Example:
Text File:
LinkLabel
"Hello, this text will appear in a LinkLabel once it has been
added to the form. This text may not always cover more than one line. But will always be surrounded by quotation marks."
240, 780
So, to explain what is going on in that file:
Control
Text
Location
And when a button on the Form is clicked, and the user opens one of these files from the OpenFileDialog dialog, I need to be able to Read each line. Starting from the top, I want to check to see what control it is, then starting on the second line I need to be able to get all text inside the quotation marks (regardless of whether is is one line of text or more), and on the next line (after the closing quotation mark), I need to extract the location (240, 780)... I have thought of a few ways of going about this but when I go to write it down and put it to practice, it doesn't make much sense and end up figuring out ways that it won't work.
Has anybody ever done this before? Would anybody be able to provide any help, suggestions or advice on how I'd go about doing this?
I have looked up CSV files but that seems too complicated for something that seems so simple.
Thanks
jase

You could use a regular expression to get the lines from the text:
MatchCollection lines = Regex.Matches(File.ReadAllText(fileName), #"(.+?)\r\n""([^""]+)""\r\n(\d+), (\d+)\r\n");
foreach (Match match in lines) {
string control = match.Groups[1].Value;
string text = match.Groups[2].Value;
int x = Int32.Parse(match.Groups[3].Value);
int y = Int32.Parse(match.Groups[4].Value);
Console.WriteLine("{0}, \"{1}\", {2}, {3}", control, text, x, y);
}

I'll try and write down the algorithm, the way I solve these problems (in comments):
// while not at end of file
// read control
// read line of text
// while last char in line is not "
// read line of text
// read location
Try and write code that does what each comment says and you should be able to figure it out.
HTH.

You are trying to implement a parser and the best strategy for that is to divide the problem into smaller pieces. And you need a TextReader class that enables you to read lines.
You should separate your ReadControl method into three methods: ReadControlType, ReadText, ReadLocation. Each method is responsible for reading only the item it should read and leave the TextReader in a position where the next method can pick up. Something like this.
public Control ReadControl(TextReader reader)
{
string controlType = ReadControlType(reader);
string text = ReadText(reader);
Point location = ReadLocation(reader);
... return the control ...
}
Of course, ReadText is the most interesting one, since it spans multiple lines. In fact it's a loop that calls TextReader.ReadLine until the line ends with a quotation mark:
private string ReadText(TextReader reader)
{
string text;
string line = reader.ReadLine();
text = line.Substring(1); // Strip first quotation mark.
while (!text.EndsWith("\"")) {
line = reader.ReadLine();
text += line;
}
return text.Substring(0, text.Length - 1); // Strip last quotation mark.
}

This kind of stuff gets irritating, it's conceptually simple, but you can end up with gnarly code. You've got a comparatively simple case:one record per file, it gets much harder if you have lots of records, and you want to deal nicely with badly formed records (consider writing a parser for a language such as C#.
For large scale problems one might use a grammar driven parser such as this: link text
Much of your complexity comes from the lack of regularity in the file. The first field is terminated by nwline, the second by delimited by quotes, the third terminated by comma ...
My first recomendation would be to adjust the format of the file so that it's really easy to parse. You write the file so you're in control. For example, just don't have new lines in the text, and each item is on its own line. Then you can just read four lines, job done.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

How to read text file between "" - c#

To read all of the lines of the file you can use: File.ReadAllLines(pathToFile); to strip the text from "" you can use the substring method of string: http://msdn.microsoft.com/en-us/library/aka44szs.aspx you can do it like that: string strippedString = original.Substring(1, original.length -2);

Try this one var text = File.ReadAllLines(pathToFile); var lines = text.Split(':') .Where((s,i) => i % 2 != 0) .Select(s => s.trim('"'));

First of all you need to read in the file using: File.ReadAllLines(filePath); Then you could split all the lines using the string.Split function. Splitting on the closing bracket would be your best bet.

Related

Removing text above real content of CSV file

How to produce a soft return using C#.net

C# Regex Replace ignore specific string

Adding a Line to the Middle of a File with .NET

Reading line by line

Categories

Resources