Not able to read special character "£" using StreamReader in C#

I am trying to read a character (£) from a text file, using the following code.
public static List<string> ReadAllLines(string path, bool discardEmptyLines, bool doTrim)
{
var retVal = new List<string>();
if (string.IsNullOrEmpty(path) || !File.Exists(path)) {
Comm.File.Log.LogError("ReadAllLines", string.Format("Could not load file: {0}", path));
return retVal;
}
//StreamReader sr = null;
StreamReader sr = new StreamReader(path, Encoding.Default);
try {
sr = File.OpenText(path);
while (sr.Peek() >= 0) {
var line = sr.ReadLine();
if (discardEmptyLines && (line == null || string.IsNullOrEmpty(line.Trim()))) {
continue;
}
if (line != null) {
retVal.Add(doTrim ? line.Trim() : line);
}
}
}
catch (Exception ex) {
Comm.File.Log.LogGeneralException("ReadAllLines", ex);
}
finally {
if (sr != null) {
sr.Close();
}
}
return retVal;
}
But my code does not read £ correctly; it reads the character as �. Please advise what needs to be done to read this special character.
Thanks in advance.

The file you are reading is not encoded the same as Encoding.Default. It is likely UTF-8. Try using UTF-8 for this particular file. For more generic usage, you should see Determining the Encoding of a text file.
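A minimal sketch of how such a mismatch produces �, assuming for illustration a temporary file that contains only the byte 0xA3 (the pound sign in ISO-8859-1 and Windows-1252). The class and method names here are hypothetical, not part of the original code.

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingDemo
{
    // Reads the whole file, decoding its bytes with the given encoding.
    public static string ReadWith(string path, Encoding encoding)
    {
        using (var reader = new StreamReader(path, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            // 0xA3 is the pound sign in ISO-8859-1, but it is not a valid UTF-8 sequence on its own.
            File.WriteAllBytes(path, new byte[] { 0xA3 });

            string asUtf8 = ReadWith(path, Encoding.UTF8);
            string asLatin1 = ReadWith(path, Encoding.GetEncoding("iso-8859-1"));

            // Wrong encoding: the byte is replaced by U+FFFD, the replacement character.
            Console.WriteLine(asUtf8.Contains("\uFFFD"));
            // Matching encoding: the pound sign (U+00A3) is read correctly.
            Console.WriteLine(asLatin1 == "\u00A3");
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

The same reasoning applies in the other direction: a UTF-8 file read with a single-byte code page shows two wrong characters instead of one £.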

Try to replace Encoding.Default with Encoding.GetEncoding(437)

Looks like an encoding problem. Try creating your StreamReader with a UTF-8 (or Unicode) encoding instead of the default.
StreamReader sr = new StreamReader(path, Encoding.UTF8);

Encoding information can be provided to the StreamReader in two ways:
1) Save your file with the Save As option and select the appropriate encoding from the drop-down in Windows.
2) If your files are dynamic in nature, use Encoding.GetEncoding() with StreamReader.

Related

I can't read all Rtf file content

I have an RTF file that I need to read in order to parse it.
The file contains some special characters because it has embedded images.
When I read all the text from the file, the content after the special characters can't be read.
I tried reading the file with ReadAllText using Encoding.UTF8 and Encoding.ASCII.
public class ReadFile
{
public static string GetFileContent(string path)
{
if (!File.Exists(path))
{
throw new FileNotFoundException();
}
else
{
// I also tried
// return File.ReadAllText(path, Encoding.ASCII);
string text = string.Empty;
var fileStream = new FileStream(path, FileMode.Open, FileAccess.Read);
using (var streamReader = new StreamReader(fileStream, Encoding.UTF8))
{
string line;
while ((line = streamReader.ReadLine()) != null)
{
text += line;
}
}
return text;
}
}
}
My actual result is all the text up to the start of the special characters:
{\rtf1\ansi\ansicpg1252\deff0\deftab720{\fonttbl{\f0\fnil Times New Roman;}{\f1\fnil Arial;}}{\colortbl;\red000\green000\blue000;\red255\green000\blue000;\red128\green128\blue128;}\paperw11905\paperh16837\margl360\margr360\margt360\margb360
\sectd \sectdefaultcl \marglsxn360\margrsxn360\margtsxn360\margbsxn360{ {*\do\dobxpage\dobypage\dodhgt8192\dptxbx{\dptxbxtext\pard\plain {\pict\wmetafile8\picw19499\pich1746\picwgoal1305695\pichgoal116957
\bin342908
Rtf File is here
I solved it.
To read the file I used File.ReadAllBytes(path), and while building the result I replace byte 0 with (nul) and byte 27 with (esc).
byte[] fileBytes = File.ReadAllBytes(path);
StringBuilder sb = new StringBuilder();
foreach (var b in fileBytes)
{
// handle printable characters
if ((b >= 32) || (b == 10) || (b == 13) || (b == 9)) // lf, cr, tab
sb.Append((char)b);
else
{
// handle control characters
switch (b)
{
case 0: sb.Append("(nul)"); break;
case 27: sb.Append("(esc)"); break;
// etc.
}
}
}
return sb.ToString();
I found the help in

Reading resource txt line by line

Had a txt file on my desktop with code:
string source = @"C:\Users\Myname\Desktop\file.txt";
string searchfor = textBox1.Text; // criteria the person enters
foreach (string content in File.ReadLines(source))
{
    if (content.StartsWith(searchfor))
    {
        // do stuff
    }
}
I recently just learned I can add the txt as a resource file (as it will never be changed). However, I cannot get the program to read that file.txt as a resource line by line like above. I have tried
Assembly.GetExecutingAssembly().GetManifestResourceStream("WindowsFormsApplication.file.txt")
with a stream reader but it says invalid types.
Basic concept: user enters data, turned into a string, compared to the starting line of the file.txt as it reads down the list.
Any help?
edit
Jon, I tried as a test to see if it is even reading the file:
var assm = Assembly.GetExecutingAssembly();
using (var stream = assm.GetManifestResourceStream("WindowsFormsApplication.file.txt")) ;
{
using (var reader = new StreamReader(stream))
{
string line;
while ((line = reader.ReadLine()) != null)
{
label1.Text = line;
}
}
}
It says "The name stream does not exist in the current context" and "Possible Mistaken Empty Statement" for the stream = assm.Get line
You can use a TextReader to read a line at a time - and StreamReader is a TextReader which reads from a stream. So:
var assm = Assembly.GetExecutingAssembly();
using (var stream = assm.GetManifestResourceStream("WindowsFormsApplication.file.txt"))
{
using (var reader = new StreamReader(stream))
{
string line;
while ((line = reader.ReadLine()) != null)
{
...
}
}
}
You could write an extension method on TextReader to read all the lines, but the above is simpler if you only need this once.
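A minimal sketch of such an extension method, assuming the hypothetical names TextReaderExtensions and ReadLines (neither exists in the framework). It wraps the same ReadLine loop in an iterator so that callers can use foreach.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class TextReaderExtensions
{
    // Lazily yields each line until ReadLine returns null (end of input).
    public static IEnumerable<string> ReadLines(this TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // A StringReader stands in for the StreamReader over the resource stream.
        using (var reader = new StringReader("alpha\nbeta\ngamma"))
        {
            foreach (string line in reader.ReadLines())
            {
                Console.WriteLine(line);
            }
        }
    }
}
```

Because the iterator is lazy, the reader must stay open until the enumeration has finished.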
Found the issue:
Although the tutorials all say that a file loaded as a resource is named NameSpace.File, the system actually names it NameSpace.Resources.File, so I had to update the name as well.
Then I used the following code:
string searchfor = textBox1.Text;
Assembly assm = Assembly.GetExecutingAssembly();
using (Stream datastream = assm.GetManifestResourceStream("WindowsFormsApplication2.Resources.file1.txt"))
using (StreamReader reader = new StreamReader(datastream))
{
string lines;
while ((lines = reader.ReadLine()) != null)
{
if (lines.StartsWith(searchfor))
{
label1.Text = "Found";
break;
}
else
{
label1.Text = "Not found";
}
}
}
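One way to check the name a resource was actually embedded under, rather than guessing it, is to list the manifest resource names of the assembly. This is a small sketch; the class name ResourceNames and its List helper are hypothetical. Note that GetManifestResourceStream returns null when no resource matches the given name.

```csharp
using System;
using System.Reflection;

public static class ResourceNames
{
    // Returns the fully qualified names of all resources embedded in the assembly.
    public static string[] List(Assembly assembly)
    {
        return assembly.GetManifestResourceNames();
    }

    public static void Main()
    {
        // Prints one name per embedded resource, for example a name that
        // includes the folder the file was placed in.
        foreach (string name in List(Assembly.GetExecutingAssembly()))
        {
            Console.WriteLine(name);
        }
    }
}
```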

Split large XML file after string found

What I have:
A large XML file at nearly 1 million lines worth of content. Example of content:
<etc35yh3 etc="numbers" etc234="a" etc345="date"><something><some more something></some more something></something></etc123>
<etc123 etc="numbers" etc234="a" etc345="date"><something><some more something></some more something></something></etc123>
<etc15y etc="numbers" etc234="a" etc345="date"><something><some more something></some more something></something></etc123>
^ repeat that by 900k or so lines (content changing of course)
What I need:
Search the XML file for "<etc123". Once found move (write) that line along with all lines below it to a separate XML file.
Would it be advisable to use a method such as File.ReadAllLines for the search portion? What would you all recommend for the writing portion? Line by line is not an option as far as I can tell, as it would take much too long.
To quite literally discard the content above your search string, I would not use File.ReadAllLines, as it would load the entire file into memory. Try File.Open and wrap it in a StreamReader. Loop on StreamReader.ReadLine, then start writing to a new StreamWriter, or do a byte copy on the underlying FileStream.
An example of how to do so with StreamWriter/StreamReader alone is listed below.
//load the input file
//open with read and sharing
using (FileStream fsInput = new FileStream("input.txt",
FileMode.Open, FileAccess.Read, FileShare.Read))
{
//use streamreader to search for start
var srInput = new StreamReader(fsInput);
string searchString = "two";
string cSearch = null;
bool found = false;
while ((cSearch = srInput.ReadLine()) != null)
{
if (cSearch.StartsWith(searchString, StringComparison.CurrentCultureIgnoreCase))
{
found = true;
break;
}
}
if (!found)
throw new Exception("Searched string not found.");
//we have the data, write to a new file
using (StreamWriter sw = new StreamWriter(
new FileStream("out.txt", FileMode.Create, //create or overwrite
FileAccess.Write, FileShare.None))) // write only, no sharing
{
//write the line that we found in the search
sw.WriteLine(cSearch);
string cline = null;
while ((cline = srInput.ReadLine()) != null)
sw.WriteLine(cline);
}
}
//both files are closed and complete
You can copy with LINQ2XML
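XElement doc = XElement.Load("yourXML.xml");
// An XDocument can hold only one root element, so the matches are collected
// under a wrapper element (the name "matches" is an arbitrary choice).
XElement matches = new XElement("matches");
foreach (XElement elm in doc.DescendantsAndSelf("etc123"))
{
    matches.Add(elm);
}
new XDocument(matches).Save("yourOutputXML.xml");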
You could do one line at a time... Would not use read to end if checking contents of each line.
FileInfo file = new FileInfo("MyHugeXML.xml");
FileInfo outFile = new FileInfo("ResultFile.xml");
using (StreamWriter write = outFile.CreateText())
using (StreamReader sr = file.OpenText())
{
    bool foundit = false;
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        // Start copying at the first matching line and keep every line after it.
        if (!foundit && line.Contains("<etc123"))
        {
            foundit = true;
        }
        if (foundit)
        {
            write.WriteLine(line);
        }
    }
}
Please note, this method may not produce valid XML, given your requirements.

Unexpected output when reading and writing to a text file

I am a bit new to files in C# and am having a problem. When reading from a file and copying to another, the last chunk of text is not being written. Below is my code:
StringBuilder sb = new StringBuilder(8192);
string fileName = "C:...rest of path...inputFile.txt";
string outputFile = "C:...rest of path...outputFile.txt";
using (StreamReader reader = File.OpenText(fileName))
{
char[] buffer = new char[8192];
while ((reader.ReadBlock(buffer, 0, buffer.Length)) != 0)
{
foreach (char c in buffer)
{
//do some function on char c...
sb.Append(c);
}
using (StreamWriter writer = File.CreateText(outputFile))
{
writer.Write(sb.ToString());
}
}
}
My aim was to read and write to a textfile in a buffered manner. Something that in Java I would achieve in the following manner:
public void encrypt(File inputFile, File outputFile) throws IOException
{
BufferedReader infromfile = null;
BufferedWriter outtofile = null;
try
{
String key = getKeyfromFile(keyFile);
if (key != null)
{
infromfile = new BufferedReader(new FileReader(inputFile));
outtofile = new BufferedWriter(new FileWriter(outputFile));
char[] buffer = new char[8192];
while ((infromfile.read(buffer, 0, buffer.length)) != -1)
{
String temptext = String.valueOf(buffer);
//some changes to temptext are done
outtofile.write(temptext);
}
}
}
catch (FileNotFoundException exc)
{
} // and all other possible exceptions
}
Could you help me identify the source of my problem?
If you think that there is possibly a better approach to achieve buffered i/o with text files, I would truly appreciate your suggestion.
There are a couple of "gotchas":
1) c can't be changed (it's the foreach iteration variable), so you'll need to copy it in order to process it before writing.
2) You have to keep track of how many characters were actually read: ReadBlock returns that count, and any buffer slots beyond it hold leftover characters that would make your output dirty.
Changing your code like this looks like it works:
//extracted from your code
foreach (char c in buffer)
{
if (c == (char)0) break; //GOTCHA #2: maybe you don't want NULL (ascii 0) characters in your output
char d = c; //GOTCHA #1: you can't change 'c'
// d = SomeProcessingHere();
sb.Append(d);
}
Try this:
string fileName = @"";
string outputfile = @"";
StreamReader reader = File.OpenText(fileName);
string texto = reader.ReadToEnd();
StreamWriter writer = new StreamWriter(outputfile);
writer.Write(texto);
writer.Flush();
writer.Close();
Does this work for you?
using (StreamReader reader = File.OpenText(fileName))
using (StreamWriter writer = File.CreateText(outputFile)) // opened once, so earlier blocks are not overwritten
{
    char[] buffer = new char[8192];
    int numChars;
    // ReadBlock returns the number of characters read; 0 means end of file.
    while ((numChars = reader.ReadBlock(buffer, 0, buffer.Length)) > 0)
    {
        // Write only the characters that were actually read.
        writer.Write(buffer, 0, numChars);
    }
}
You still have to take care of character encoding though!
If you don't care about carriage returns, you could use File.ReadAllText.
This method opens a file, reads each line of the file, and then adds each line as an element of a string. It then closes the file. A line is defined as a sequence of characters followed by a carriage return ('\r'), a line feed ('\n'), or a carriage return immediately followed by a line feed. The resulting string does not contain the terminating carriage return and/or line feed.
StringBuilder sb = new StringBuilder(8192);
string fileName = "C:...rest of path...inputFile.txt";
string outputFile = "C:...rest of path...outputFile.txt";
// Open the file to read from.
string readText = File.ReadAllText(fileName);
foreach (char c in readText)
{
    char new_c = c; // do something to c and store the result in new_c
    sb.Append(new_c);
}
// This text is added only once to the file, overwrite it if it exists
File.WriteAllText(outputFile, sb.ToString());
Unless I'm missing something, it appears that your issue is that you're overwriting the existing contents of your output file on each blockread iteration.
You call:
using (StreamWriter writer = File.CreateText(outputFile))
{
writer.Write(sb.ToString());
}
for every ReadBlock iteration. The output of the file would only be the last chunk of data that was read.
From MSDN documentation on File.CreateText:
If the file specified by path does not exist, it is created. If the
file does exist, its contents are overwritten.
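The quoted behavior can be reproduced with a small sketch. It assumes a temporary file and three sample chunks; the class and method names are hypothetical. The first method reopens the file with File.CreateText for every chunk, the second opens the writer once.

```csharp
using System;
using System.IO;

public static class OverwriteDemo
{
    // Reopens the file with File.CreateText for every chunk,
    // so each call discards what the previous one wrote.
    public static string WriteReopeningEachTime(string path, string[] chunks)
    {
        foreach (string chunk in chunks)
        {
            using (StreamWriter writer = File.CreateText(path))
            {
                writer.Write(chunk);
            }
        }
        return File.ReadAllText(path);
    }

    // Opens the writer once and keeps it open for all chunks.
    public static string WriteWithSingleWriter(string path, string[] chunks)
    {
        using (StreamWriter writer = File.CreateText(path))
        {
            foreach (string chunk in chunks)
            {
                writer.Write(chunk);
            }
        }
        return File.ReadAllText(path);
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            string[] chunks = { "first ", "second ", "third" };
            Console.WriteLine(WriteReopeningEachTime(path, chunks)); // third
            Console.WriteLine(WriteWithSingleWriter(path, chunks));  // first second third
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```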

c# edit txt file without creating another file (without StreamWriter)

Because I'm using non-latin alphabet, if I use StreamWriter, the characters aren't correct.
String line;
StreamReader sr = new StreamReader(@"C:\Users\John\Desktop\result.html");
line = sr.ReadLine();
while (line != null)
{
line = sr.ReadLine();
if (line.Contains("</head>"))
{
line = "<img src=\"result_files\\image003.png\"/>" + line;
}
}
sr.Close();
Here I'm editing the string I want to edit in the file, but I'm not saving it in the same file. How to do that?
If you use one of the StreamWriter constructors that accepts an encoding you shouldn't have any problems with incorrect characters. You are also skipping the first line in your reading method.
Encoding encoding;
StringBuilder output = new StringBuilder();
using (StreamReader sr = new StreamReader(filename))
{
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        if (line.Contains("</head>"))
        {
            line = "<img src=\"result_files\\image003.png\"/>" + line;
        }
        output.AppendLine(line);
    }
    // Read CurrentEncoding only after reading: encoding detection happens on the first read.
    encoding = sr.CurrentEncoding;
}
using (StreamWriter writer = new StreamWriter(filename, false, encoding))
{
writer.Write(output.ToString());
}
I think the easiest approach would be to:
1) open the file in read/write mode
2) read everything from the file
3) make the modifications in memory
4) rewrite it back to the file rather than appending
You are using a StreamReader, and as the name says, its function is to read!
Quick and dirty code:
if (File.Exists(fileName))
{
    StringBuilder sb = new StringBuilder();
    foreach (string s in File.ReadAllLines(fileName, Encoding.Default))
    {
        // The foreach variable is read-only, so work on a copy.
        string line = s;
        if (line.Contains("</head>"))
        {
            line = "<img src=\"result_files\\image003.png\"/>" + line;
        }
        sb.AppendLine(line);
    }
    File.WriteAllText(fileName, sb.ToString(), Encoding.Default);
}
