I'm sorry for asking noobish questions, but I am one :).
I can write a .txt file using Write or WriteLine, which writes the whole TextBox. The problem is when I read it back: using ReadLine, I get the whole text on one line. It must be a problem with the reading, because Notepad displays the file correctly.
What is the reason for this rather strange behavior, and how can I change it?
The method containing the StreamReader:
StreamReader streamreader = new StreamReader(openfiledialog.FileName);
textbox.Text = "";
while (!streamreader.EndOfStream)
{
    string read_line = streamreader.ReadLine();
    // NOTE: this "\n" is the culprit; a WinForms TextBox only breaks lines
    // on "\r\n" (Environment.NewLine), so everything ends up on one line.
    textbox.Text += read_line + "\n";
}
streamreader.Close();
The method containing the StreamWriter:
StreamWriter streamwriter = new StreamWriter(savefiledialog.FileName);
streamwriter.Write(textbox.Text);
streamwriter.Close();
Thanks in advance.
UPDATED: ReadToEnd worked
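For reference, a minimal sketch of the approach the update refers to (using the same openfiledialog and textbox as the code above). ReadToEnd keeps the file's own "\r\n" line breaks, which a multiline WinForms TextBox renders correctly:

```csharp
// Read the whole file in one go instead of line by line; the original
// "\r\n" separators survive intact.
using (var streamreader = new StreamReader(openfiledialog.FileName))
{
    textbox.Text = streamreader.ReadToEnd();
}
```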
Without seeing any code, my best guess is that you're using different line separators in the textbox and the text file.
I'd guess you either need to format the data so it gets the right separator for its destination, or change the newline separator the textbox expects.
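A hedged sketch of that normalization (fileText is a hypothetical variable holding the text read from the file; collapsing "\r\n" first avoids doubling separators that are already correct):

```csharp
// Collapse any "\r\n" to "\n" first, then expand every "\n" to the
// platform separator the TextBox expects.
textbox.Text = fileText.Replace("\r\n", "\n").Replace("\n", Environment.NewLine);
```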
A couple of possibilities here:
The text in the file is not UTF-8, so it needs to be decoded with the correct encoding before being assigned to the text box.
The textbox has a character limit that needs to be increased.
The width of the text box: wrapping of text could make a difference.
Usually you would use ReadToEnd if you want the whole file's text in one go, and ReadLine if you want one line. The difference here is in the encoding of the file; one line in a text editor can look different from another. Some text editors convert the text to other encodings before displaying it, and some do not. I would recommend Notepad++, because it shows at the bottom what encoding the file is in and lets you change the encoding and save the file for testing.
.NET strings are UTF-16 internally, so a difference in the encoding of the text file can make a big difference once it is read in.
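As a sketch of that workflow: once Notepad++ has told you the encoding, pass it to the StreamReader explicitly. The file name and the Latin-1 choice here are assumptions for illustration (a typical Western "ANSI" file); on .NET Core/5+, Windows code pages like 1252 additionally require registering CodePagesEncodingProvider.

```csharp
// Decode the file with the encoding the editor reported, e.g. Latin-1.
var ansi = Encoding.GetEncoding("ISO-8859-1");
using (var sr = new StreamReader("file.txt", ansi))
{
    textbox.Text = sr.ReadToEnd();
}
```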
Best of Luck
I need to write a string from C# into an RTF file, but I'm having weird problems.
To write the text I simply use
string fileName = System.IO.Path.GetTempPath() + Guid.NewGuid().ToString() + ".rtf";
System.IO.File.WriteAllText(fileName, body);
body is a string variable filled from a varchar column in a database.
The problem is with the character é, which WordPad displays incorrectly when opening the file.
If I open the file in Notepad, I see this:
(één schade gevonden -> ander dossier)
So for some dark reason WordPad decides to show the character é all messed up.
I tried writing the file as UTF-8 or other Unicode encodings, but then WordPad refused to treat the file as RTF and just showed the plain text with all the tags (probably because the byte-order mark written before the {\rtf header keeps WordPad from recognizing the format).
I also looked at this page, which tells me to write a tag like \uXXX? where XXX is a decimal number identifying a Unicode UTF-16 code unit.
But I cannot find what number to use, or any good example of how to do this.
Actually, I am not even sure it's Unicode-related; é is not a character that needs Unicode in my mind, but I could be wrong of course.
Anyway, does anyone know how to solve this problem?
I just need a way to make WordPad stop messing up the character é on display and in print.
The problem was that I did not encode the RTF file properly.
Using the link provided by Filburt, I managed to encode the RTF file correctly, like this:
// RTF defaults to an ANSI code page, so write the file as ISO-8859-1
// (Latin-1); é is then a single byte that WordPad interprets correctly.
var iso = Encoding.GetEncoding("ISO-8859-1");
string fileName = System.IO.Path.GetTempPath() + Guid.NewGuid().ToString() + ".rtf";
System.IO.File.WriteAllText(fileName, body, iso);
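An alternative sketch (my own, not the accepted fix): keep the file pure ASCII by escaping every non-ASCII character with RTF's \uN? sequence, where N is the signed decimal UTF-16 code unit and ? is the fallback character. EscapeRtfUnicode is a hypothetical helper:

```csharp
using System.Text;

// Escape non-ASCII characters as \uN? so any RTF reader can decode them
// regardless of the file's code page.
static string EscapeRtfUnicode(string rtf)
{
    var sb = new StringBuilder();
    foreach (char c in rtf)
    {
        if (c > 127)
            sb.Append(@"\u").Append((short)c).Append('?'); // é becomes \u233?
        else
            sb.Append(c);
    }
    return sb.ToString();
}
```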
I have a text file that, when I open it in Notepad, shows the form feed character (byte 12). I want to show this character in my RichTextBox, but no matter which encoding I use when reading the file, it won't show. When I enter the character myself it shows, and when I do myRTB.Text = "♀" it shows, but when I do
myRTB.Text = File.ReadAllText("myFileName.txt");
it doesn't show. I've also tried using the readers in the Encoding class to no avail.
How can I show the form feed character in my rtb?
Firstly, a line feed has a value of 10, not 12 (a carriage return is 13). If you have characters with the value 12 in there, they are form feeds, not line feeds.
As for your issue, ReadAllLines reads the lines of a file into a string array, stripping out all the line breaks. You might do as Damith suggests and call ReadAllText, which reads the file contents as a single string, and assign the result to the Text property; or call ReadAllLines and assign the result to the Lines property. Better still, call LoadFile on the RichTextBox itself.
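A minimal sketch of that last suggestion (assuming the file is plain text and myRTB is the RichTextBox from the question):

```csharp
// Let the RichTextBox do the reading itself; PlainText loads the bytes
// without treating them as RTF markup.
myRTB.LoadFile("myFileName.txt", RichTextBoxStreamType.PlainText);
```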
Try it with ReadAllText:
myRTB.Text = File.ReadAllText("myFileName.txt", Encoding.Unicode);
Thanks for the help @jmcilhinney and @Damith. I ended up cheating the system with a dirty hack. I saw that myRTB was replacing the form feed char with \page in the RTF, but when I typed the form feed char myself it inserted \u9792 (the decimal code of ♀). Therefore I went with the hack:
myRTB.Rtf = myRTB.Rtf.Replace("\\page", "\\u9792");
If you have something less hackish that works, please let me know.
I'm reading some CSV files. The files are really simple, because there is always just ";" as the separator and there are no ", ', or anything like that.
So it's possible to read the file line by line and split the strings. That's working fine. Now people have told me: maybe you should check the encoding of the file; it should always be ANSI, and if it's not, your output may come out different and corrupted, so non-ANSI files should be flagged somehow.
I just said, okay! But if I think about it: do I really have to check the file's encoding in this case? I changed the encoding of the file to something else and I'm still able to read it without any problems. My code is simple:
string line;
using (TextReader reader = new StreamReader(myFileStream))
{
    while ((line = reader.ReadLine()) != null)
    {
        // read the line, separate by ';' and other stuff...
    }
}
So again: do I really need to check the files for ANSI encoding? Could somebody give me an example of when I could get into trouble, or when I would get corrupted output after reading a non-ANSI file? Thank you!
That particular constructor of StreamReader will assume that the data is UTF-8; that is compatible with ASCII, but can fail if data uses bytes in the 128-255 range for single-byte codepages (you'll get the wrong characters in strings, etc), or could fail completely (i.e. throw an exception) if the data is actually something very different like UTF-7, UTF-32, etc.
In some cases (the minority) you might be able to use the byte-order-mark to detect the encoding, but this is a circular problem: in most cases, if you don't already know the encoding, you can't really detect the encoding (robustly). So a better approach would be: to know the encoding in the first place. Then you can pass in the correct encoding to use via one of the other constructors.
Here's an example of it failing:
// we'll write UTF-32, big-endian, without a byte-order-mark
File.WriteAllText("my.txt", "Hello world", new UTF32Encoding(true, false));
using (var reader = new StreamReader("my.txt"))
{
string s = reader.ReadLine();
}
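And the corresponding fix, assuming you do know the encoding up front: pass it to the constructor, and the same bytes decode correctly.

```csharp
// Same file as above, but decoded with the encoding it was written in.
using (var reader = new StreamReader("my.txt", new UTF32Encoding(true, false)))
{
    string s = reader.ReadLine(); // now "Hello world"
}
```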
You can often get away with the default UTF-8 decoding because UTF-8 has a wonderful property: it encodes ASCII characters as a single byte (as you would expect) and only expands to multi-byte sequences when it needs to represent other Unicode characters.
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
I have a multiline textbox, and I use databinding to bind its Text property to a string. This appears to work, but upon loading the string from XML, the returns (new lines) get lost. When I inspect the XML, the returns are there, but once the string is loaded from XML they are gone. Does anybody know why this is happening and how to do it right?
(I am not bound to either a multiline textbox or a string property for binding; I just want a maintainable, and preferably elegant, solution.)
Edit: Basically, I use the XmlSerializer class:
loading:
using (StreamReader streamReader = new StreamReader(fileName))
{
XmlSerializer xmlSerializer = new XmlSerializer(typeof(T));
return (T)xmlSerializer.Deserialize(streamReader);
}
saving:
using (StreamWriter streamWriter = new StreamWriter(fileName))
{
Type t = typeof(T);
XmlSerializer xmlSerializer = new XmlSerializer(t);
xmlSerializer.Serialize(streamWriter, data);
}
When looking inside the XML, it saves multiline textbox data like this:
<OverriddenComponent>
<overrideInformation>
<Comments>first rule
second rule
third rule</Comments>
</overrideInformation>
</OverriddenComponent>
But those breaks no longer get displayed after the data is loaded.
What are the actual codes for the new lines: 0x0A, 0x0D, or both? I stumbled on a similar problem before. The characters of a string "got lost" because the textbox "converted" them on its own (or didn't understand them). Basically, your XML file may be encoded one way while your textbox uses another encoding, or the characters may be lost while reading from or writing to the file itself. So there are three places your string may be tampered with without your knowledge:
During writing to the file (take notice what encoding you use)
During reading from the file
When displaying your string in textbox.
My advice is to assign the text you read from the file to another string (not bound) before you assign it to the bound one, and use a debugger to check how it changes. This tool, http://home2.paulschou.net/tools/xlate/, is useful for checking what exactly is in your strings.
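A quick debugger-free way to do that inspection (loadedText stands for the freshly read, not-yet-bound string):

```csharp
// Print the numeric code of every character: "\r\n" shows up as 13 10,
// a bare "\n" as 10, so you can see which separators survived each step.
foreach (char c in loadedText)
    Console.Write((int)c + " ");
Console.WriteLine();
```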
When I encountered this problem in my application, I ended up writing and reading the binary/hex values of the characters and converting them back when I needed to display them. But I had to use a lot of strange ASCII codes; maybe there's an easier solution for you out there.
EDIT: or it may just be some XML-related thing. Maybe you should use some other character to replace the line break when writing it to XML?
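One likely explanation, offered as an assumption rather than a confirmed diagnosis: XML parsers normalize "\r\n" to a bare "\n" when reading, so the string deserialized by XmlSerializer contains bare "\n" characters, and a WinForms multiline TextBox only renders "\r\n" as a line break. If that is the cause, re-normalizing after deserialization fixes it (data.Comments stands for the deserialized property from the XML sample above):

```csharp
// Restore the platform line separator after XmlSerializer has loaded the
// data; the first Replace guards against input that is already "\r\n".
data.Comments = data.Comments
    .Replace("\r\n", "\n")
    .Replace("\n", Environment.NewLine);
```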
I'm parsing a PDF file. I converted the data into a byte array, but it doesn't show the full file.
I don't want to use any library or third-party software.
using (FileStream fs = new FileStream(fname, FileMode.Open))
using (BinaryReader br = new BinaryReader(fs))
{
    int length = (int)br.BaseStream.Length;
    byte[] file = br.ReadBytes(length);
    // NOTE: this assumes the bytes are ASCII text, which a PDF is not.
    string text = System.Text.Encoding.ASCII.GetString(file);
    displayFile.Text = text;
}
It would really help if you'd give more detail - including some code, preferably a short but complete program that demonstrates the problem.
My guess is that when you're doing the conversion you end up with some text containing a null character ('\0') - which Windows Forms controls treat as a string terminator.
For example, if you use:
label.Text = "hello\0there";
you'll only see "hello".
Now you may have this problem due to converting from a byte array to text using the wrong encoding - but we can't really help much more with the little information you've provided.
Based on your code example, I would say the problem is that you are assuming the PDF file contains plain ASCII text, which is not the case. PDF is a complicated format, and there are libraries that allow you to parse it.
A quick Google search shows that iTextSharp can read the PDF format.
You cannot convert a PDF to text by just interpreting it as ASCII. You may be lucky enough that some of the text actually is ASCII, but you can also expect some of the non-text contents to be indistinguishable from ASCII.
Instead use one of the solutions for parsing PDF. Here is one way using PDFBox and IKVM: Naspinski.net: Parsing/Reading a PDF file with C# and Asp.Net to text
Even the pure ASCII set contains lots of non-printable and control characters.
Like Jon said, a \0 (NUL) in a string terminates the displayed text in Windows Forms controls (the .NET string itself is fine; it's the Win32 display layer that stops at the NUL). I had a painful experience with this behavior years back. Control characters like 'bell' and 'backspace' will give you funny output, but do not expect to hear a bell ringing :P.
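A defensive sketch along those lines: filter out control characters (keeping ordinary whitespace) before handing binary-derived text to a control, so NUL truncation and bell/backspace artifacts cannot occur. Printable is a hypothetical helper:

```csharp
using System.Linq;

// Keep tabs and line breaks, drop every other control character
// (including '\0', '\a' bell, '\b' backspace).
static string Printable(string s) =>
    new string(s.Where(c => c == '\t' || c == '\r' || c == '\n' || !char.IsControl(c)).ToArray());
```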