I'm making changes to a file, the 'source' file is a plain text file, but from a UNIX system.
I'm using a StreamReader to read the 'source' file, I then make and store the changes to a StringBuilder using AppendLine().
I then create a StreamWriter (set-up with the same Encoding as the StreamReader) and write the file.
Now when I compare the two files using WinMerge it reports that the carriage return characters are different.
What should I do to ensure that the carriage return characters are the same for the type of file that I am editing?
It should also be noted that the files that are to be modified could come from any system, not just from a UNIX system, they could equally be from a Windows system - so I want a way of inserting these new lines with the same type of carriage return as the original file
Thanks very much for any and all replies.
:)
First you need to assert what is the newline character for the current file. This can be a problem since a given file can mix different line endings.
Then you can set the NewLine property of your StreamWriter:
StreamWriter sw = new StreamWriter("example.txt");
sw.NewLine = "\r";
For an interesting read about line endings check out The Great Newline Schism.
The type of line ending used by a file is independent of the encoding. It sounds like you may need to read the source file to determine the line endings to start with (which you have to do explicitly; I don't believe there's anything in the framework to give you that information) and then explicitly use that when creating the new file. (Instead of calling AppendLine, call Append with the line itself, and then Append again with the appropriate line terminator for that file.)
Note that a single file can have a variety of line endings - some "\r\n", some "\n" and some "\r", so you'll have to apply appropriate heuristics (e.g. picking the most commonly seen line ending).
Related
Why while using the following code
string filePath = #"C:\test\upload.pdf"
FileStream fs = File.OpenRead(filePath);
raises the following exception?
The filename, directory name, or volume label syntax is incorrect :
'C:\ConsoleApp\bin\Debug\netcoreapp2.1\C:\test\upload.pdf'
From where the C:\ConsoleApp\bin\Debug\netcoreapp2.1\ directory comes from?
Update:
The File.OpenRead() in my case, exists within in an dll & the filePath (C:\test\upload.pdf) is sent via the application that is using the dll.
The string starts with an invisible character, so it's not a valid path. This
(int)#"C:\test\upload.pdf"[0]
Returns
8234
Or hex 202A. That's the LEFT-TO-RIGHT EMBEDDING punctuation character
UPDATE
Raymond Chen posted an article Why is there an invisible U+202A at the start of my file name?.
We saw some time ago that you can, as a last resort, insert the character U+202B (RIGHT-TO-LEFT EMBEDDING) to force text to be interpreted as right-to-left. The converse character is U+202A (LEFT-TO-RIGHT EMBEDDING), which forces text to be interpreted as left-to-right.
The Security dialog box inserts that control character in the file name field in order to ensure that the path components are interpreted in the expected manner. Unfortunately, it also means that if you try to copy the text out of the dialog box, the Unicode formatting control character comes along for a ride. Since the character is normally invisible, it can create all sorts of silent confusion.
(We’re lucky that the confusion was quickly detected by Notepad and the command prompt. But imagine if you had pasted the path into the source code to a C program!)
In the 4 years since that article Notepad got UTF8 support so the character isn't replaced by a question mark. Pasting into the current Windows Console with its incomplete UTF8 support still replaces the character.
The File.OpenRead() in my case, exists within in an dll .
Set the CopyLocal= true in properties of dll in which File.OpenRead exists.
I'm working on a c# project in which some data contains characters which are not recognised by the encoding.
They are displayed like that:
"Some text � with special � symbols in it".
I have no control over the encoding process, also data come from files of various origins and various formats.
I want to be able to flag data that contains such characters as erroneous or incomplete. Right now I am able to detect them this way:
if(myString.Contains("�"))
{
//Do stuff
}
While it does work, it doesn't feel quite right to use the weird symbol directly in the Contains function. Isn't there a cleaner way to do this ?
EDIT:
After checking back with the team responsible for reading the files, this is how they do it:
var sr = new StreamReader(filePath, true);
var content = sr.ReadToEnd();
Passing true as a second parameter of StreamReader is supposed to detect the encoding from the file's BOM, and use it to read the content. It doesn't always work though, as some files don't bear that information, hence why their data is read incorrectly.
We've made some tests and using StreamReader(filePath, Encoding.Default) instead appears to work for most if not all files we had issues with. Expectedly, files that were working before not longer work because they do not use the default encoding.
So the best solution for us would be to do the following: read the file trying to detect its encoding, then if it wasn't successful read it again with the default encoding.
The problem remains the same though: how do we check, after trying to detect the file's encoding, if data has been read incorrectly ?
The � character is not a special symbol. It's the Unicode Replacement Character. This means that the code tried to convert ASCII text using the wrong codepage. Any characters that didn't have a match in the codepage were replaced with �.
The solution is to read the file using the correct encoding. The default encoding used by the File methods or StreamReader is UTF8. You can pass a different encoding using the appropriate constructor, eg StreamReader(Stream, Encoding, Boolean). To use the system locale's codepage, you need to use Encoding.Default :
var sr = new StreamReader(filePath,Encoding.Default);
You can use the StreamReader(Stream, Encoding, Boolean) constructor to autodetect Unicode encodings from the BOM and fallback to a different encoding.
Assuming the files are either some type of Unicode or match your system locale, you can use:
var sr = new StreamReader(filePath,Encoding.Default, true);
From StreamReader's source shows that the DetectEncoding method will check the first bytes of a file to determine the encoding. If one is found, it is used instead of the supplied encoding. The operation doesn't cause extra IO because the method checks the class's internal buffer
EDIT
I just realized you can't actually load the raw file into a .NET string and still be able to have full information about the original file.
The project here uses the Mlang api which does a better job at not loading the file into a .NET string before guessing. There is also a related SO question
I know the question has been asked before, but I wasn't quite satisfied with the answer (considering it didn't explain what was going on).
My specific question is :How do I open a csv/text file that's comma delimited and rows are separated by returns and put it into an HTML table using C#?
I understand how to do this in PHP but I have just started learning ASP.Net/C#, if anyone has some free resources for C# and/or is able to provide me with a snippet of code with some explanation of whats going on I would appreciate it.
I have this code, but I'm not sure how I would use it because A)I don't know how C# arrays work and B)I'm not sure how to open files in ASP.Net C#:
var lines =File.ReadAllLines(args[0]);
using (var outfs = File.AppendText(args[1]))
{
outfs.Write("<html><body><table>");
foreach (var line in lines)
outfs.Write("<tr><td>" + string.Join("</td><td>", line.Split(',')) + "</td></tr>");
outfs.Write("</table></body></html>");
}
I apologize for my glaring inexperience here.
The code sample you posted does exactly what you're asking:
Opens a file.
Writes a string of HTML for a table with the contents of the file from step 1 to another file.
Let's break it down:
var lines =File.ReadAllLines(args[0]);
This opens the file specified in args[0] and reads all the lines into a string array, one lay per element. See File.ReadAllLines Method (String).
using (var outfs = File.AppendText(args[1]))
{
File.AppendText Method creates a StreamWriter to append text to an existing file (or creates it if it doesn't exist). The filename (and path, possibly) are in args[1]. The using statement puts the StreamWriter into what is called a using block, to ensure the stream is correctly disposed once the using block is left. See using Statement (C# Reference) for more information.
outfs.Write("<html><body><table>");
outfs.Write calls the Write method of the StreamWriter (StreamWriter.Write Method (String)). Actually in the case of your code snippet nothing is written to the file until you exit the using block - it's written to a buffer. Exiting the using block will flush the buffer and write to the file.
foreach (var line in lines)
This command starts a loop through all the elements in the string array lines, staring with the first (element 0) index. See foreach, in (C# Reference) for more information if you need it.
outfs.Write("<tr><td>" + string.Join("</td><td>", line.Split(',')) + "</td></tr>");
String.Join is the key part here, where most of the work is done. String.Join Method (String, String[]) has the technical details, but essentially what is happening here is that the second argument (line.Split(',')) is passing in an array of strings, and the strings in that array are then being concatenated together with the first argument (</td><td>) as the separator, and the table row is being opened and closed.
For example, if the line is "1,2,3,4,5,6", the Split gives you a 6 element array. This array is then conatenated with </td><td> as the separator by String.Join, so you have "1</td><td>2</td><td>3</td><td>4</td><td>5</td><td>6". "<tr><td>" is added to the front and "</td></tr>" is added to the end and the final line is "<tr><td>1</td><td>2</td><td>3</td><td>4</td><td>5</td><td>6</td></tr>".
outfs.Write("</table></body></html>");
}
This writes the end of the HTML to the buffer, which is then flushed and written to the specified text file.
A couple of things to note. args[0] and args[1] are used to hold command line arguments (i.e., MakeMyTable.exe InFile.txt OutFile.txt), which aren't (in my experience) applicable to ASP.NET applications. You'll need to either code the files (and paths) necessary, or allow the user to specify the input file and/or output file. The ASP.NET application will need to be running under an account that has permission to access those files as well.
If you have quoted values in the CSV file, you'll need to handle those (this is very common when dealing with monetary amounts, for example), as splitting on the , may cause an incorrect split. I recommend taking a look at TextFieldParser, as it can handle quoted fields quite easily.
Unless you're sure that each line in the file has the same number of fields, you run the risk of having poorly formed HTML in your table and no guarantees on how it will render.
Additionally, it would be advisable to test that the file you're opening exists. There's probably more, but these are the basics (and may already be beyond the scope of Stack Overflow).
Hopefully this will help point you in the right direction:
line = "<table>" + line;
foreach (string line in lines)
{
line = "<tr><td>" + line;
line.Replace(",", "</td><td>");
line += "</td></tr>";
}
Response.Write(line + "</table>");
Good luck with your learning!
I currently have two strings assigned - domain,subdomain
How could I delete any matched occurrences of these strings in a text file?
string domain = "127.0.0.1 test.com"
string subdomain = "127.0.0.1 sub.test.com"
I don't think using a regex would be ideal in this situation.
How can this be done?
You need to:
Open the existing file for input
Open a new file for output
Repeatedly:
Read a line of text from the input
See if it matches your pattern (it's unclear at the moment what pattern you're looking for)
If it doesn't, write the line to the output (or if you're only trying to remove bits of lines, work out which bit you want to write out)
Close both the input and output (a using statement will do this automatically)
Optionally delete the original file and rename the new one if you want to effectively replace the original.
var result = Regex.Replace(File.ReadAllText("file.txt"),
#"127\.0\.0\.1 test\.com|127\.0\.0\.1 sub\.test\.com", string.Empty);
Then write to file obtained result.
I am looking for a way to search a comma separated txt file for a keyword, and then replace another keyword on that exact line. For example if i have the following line in a big txt file:
Help, 0
I want to find this line in the txt (by telling program to look for the first word 'help') and replace the 0 with 1 to indicate that i have read it once so it looks like:
Help, 1
Thanks
It is generally a very bad idea to try and overwrite data in the same file: if your code throws an exception, you'll be left with a partially processed file; if your search target and replacement value have different lengths, you have to re-write the rest of the file. Note that these don't apply in your specific situation - but it's best not to let it become habit.
My recommendation:
Open both the input file and a temporary file (Path.GetTempFileName)
process and write each line ( StreamReader.ReadLine)
When finished with no errors, rename the original file to something like origFile.old
rename the temporary file to the original file name.
If something goes wrong, delete the temporary file and exit. This way the original file is left intact in the event of an error.
If you want to do the replacement "in place" (meaning you don't want to use another, temporary, file) then you would do so with a FileStream.
You have a couple of options, you can Read through the file stream until you find the text that you're looking for, then issue a Write. Keep in mind that FileStream works at the byte level, so you'll need to take character encoding into consideration. Encoding.GetString will do the conversion.
Alternatively, you can search for the text, and note its position. Then you can open a FileStream and just Seek to that position. Then you can issue the Write.
This may be the most efficient way, but it's definitely more challenging then the naive option. With the naive implementation, you:
Read the entire file into memory (File.ReadAllText)
Perform the replace (Regex.Replace)
Write it back to disk (File.WriteAllText)
There's no second file, but you are bound by the amount of memory the system has. If you know you're always dealing with small files, then this could be an option. Otherwise, you need to read up on character encoding and file streams.
Here's another SO question on the topic (including sample code): Editing a text file in place through C#