C#: how to add a line break to a memory stream

I am merging three files, for example, but in the final output there are no line breaks between the files:
MemoryStream m = new MemoryStream();
File.OpenRead(@"c:\file1.txt").CopyTo(m);
File.OpenRead(@"c:\file2.txt").CopyTo(m);
File.OpenRead(@"c:\file3.txt").CopyTo(m);
m.Position = 0;
Console.WriteLine(new StreamReader(m).ReadToEnd());
How can I add a line break to a memory stream?

You can write the line break to the stream yourself. You need to decide which line break you want; most likely it is Encoding.Xxx.GetBytes(Environment.NewLine). You also need to decide which encoding to use, and it must match the encoding of the files being merged.
Since the line break characters are ASCII, the only distinction that matters is between encodings that store them as a single byte and encodings that use more. Encoding.Unicode (UTF-16), for example, uses two bytes per newline character.
If you have to guess, go with UTF-8 without a BOM.
You can also try a fully text-based approach:
var result = File.ReadAllText(a) + Environment.NewLine + File.ReadAllText(b);
Let me also point out that you need to dispose the streams that you open.
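The text-based approach could be generalized to any number of files roughly as follows. This is only a sketch: the class and method names (TextMerge, MergeFiles) are made up for illustration, and it assumes every input file really uses the encoding you pass in.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

public static class TextMerge
{
    // Hypothetical helper: reads each file as text and joins the contents
    // with the platform newline. Assumes all files use the given encoding.
    public static string MergeFiles(Encoding encoding, params string[] paths)
    {
        return string.Join(Environment.NewLine,
            paths.Select(p => File.ReadAllText(p, encoding)));
    }
}
```

File.ReadAllText opens and closes each file itself, so there is nothing left to dispose. The trade-off is that all contents are held in memory as strings.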

Quick and dirty:
MemoryStream m = new MemoryStream();
File.OpenRead(@"c:\file1.txt").CopyTo(m);
m.WriteByte(0x0A); // ASCII code for \n (line feed)
// On Windows you might want or need \r\n, in which case you'd
// write 0x0D (carriage return) first, then 0x0A.
File.OpenRead(@"c:\file2.txt").CopyTo(m);
m.WriteByte(0x0A);
File.OpenRead(@"c:\file3.txt").CopyTo(m);
m.Position = 0;
Console.WriteLine(new StreamReader(m).ReadToEnd());
But as @usr points out, you really should think about the encoding.

Assuming you know the encoding, for example UTF-8, you can do:
using (var ms = new MemoryStream())
{
// Do stuff ...
var newLineBytes = Encoding.UTF8.GetBytes(Environment.NewLine);
ms.Write(newLineBytes, 0, newLineBytes.Length);
// Do more stuff ...
}
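Putting the pieces together, a stream-based merge that also disposes each file stream might look like the sketch below. StreamMerge and MergeFiles are hypothetical names, and the code assumes all files share the given encoding and carry no byte order mark (a BOM in the second or third file would end up in the middle of the merged data).

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamMerge
{
    // Hypothetical helper: copies each file into one MemoryStream and writes
    // an encoded newline between consecutive files.
    public static MemoryStream MergeFiles(Encoding encoding, params string[] paths)
    {
        byte[] newLine = encoding.GetBytes(Environment.NewLine);
        var merged = new MemoryStream();
        for (int i = 0; i < paths.Length; i++)
        {
            if (i > 0)
                merged.Write(newLine, 0, newLine.Length);

            // The using statement closes each file as soon as it is copied.
            using (FileStream file = File.OpenRead(paths[i]))
                file.CopyTo(merged);
        }
        merged.Position = 0; // rewind so the caller can read from the start
        return merged;
    }
}
```

The caller owns the returned MemoryStream and should wrap it in a using block of its own.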

Related

Importing Data from a csv file

I have a csv file.
When I try to read that file using a file stream and ReadToEnd(), I get inverted commas and \r in many places, which breaks the number of rows in each column.
Is there a way to remove the inverted commas and \r?
I tried to replace them:
FileStream obj = new FileStream();
string a = obj.ReadToEnd();
a.Replace("\"","");
a.Replace("\r\"","");
When I visualize a, all \r and inverted commas are removed.
But when I read the file again from the beginning using ReadLine(), they appear again. Why?
First of all, a string is immutable. You might think this is not important for your question, but it actually matters whenever you are developing.
Looking at your code snippet, I suspect you are not familiar with immutable objects, so I advise you to make sure you fully understand the concept.
More information regarding immutable objects can be found here: http://en.wikipedia.org/wiki/Immutable_object
Basically, it means a string object can never be modified. Whenever you change the value, the variable ends up pointing to a new string object.
That's why the Replace method returns a value. Its documentation can be found here: https://msdn.microsoft.com/en-us/library/system.string.replace%28v=vs.110%29.aspx and states clearly that it "Returns a new string in which all occurrences of a specified string in the current instance are replaced with another specified string."
In your example, you aren't using the return value of the Replace function.
Could you show us that the values are actually being replaced in your a variable? I do not believe this is the case. When you visualize a string, carriage returns (\r) are not shown as escape sequences; they are rendered as actual carriage returns. If you debug and take a look at the actual string value, you should still see the \r.
Take the following code snippet:
var someString = "Hello / world";
someString.Replace("/", "");
Console.WriteLine(someString);
You might think that the console will show the string without the slash. However, on this fiddle you can see that it still prints "Hello / world": https://dotnetfiddle.net/cp59i3
What you have to do to correctly use String.Replace can be seen in this fiddle: https://dotnetfiddle.net/XCGtOu
Basically, you want to print the return value of the Replace function:
var a = "Some / Value";
var b = a.Replace("/", "");
Console.WriteLine(b);
Also, as mentioned by others in the comment section of your post, you are not replacing the contents of the file, only the string variable in memory.
If you want to save the new string, make sure to use the Write method of the FileStream (or any other way to write to a file). An explanation can be found here: How to Find And Replace Text In A File With C#
Apart from everything I have said in this answer, in most cases you should not strip both inverted commas and carriage returns from a file unless you have a specific reason; they are there for a purpose.
At last I succeeded. Thanks to everybody. Here is the code I used.
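FileStream obj = new FileStream(); // constructor arguments omitted in the original post
string a; // declared outside the using block so it is still in scope below
using(StreamReader csvr = new StreamReader(obj))
{
a = csvr.ReadToEnd(); // read through the reader; FileStream has no ReadToEnd()
a = a.Replace("\"","");
a = a.Replace("\r\"",""); // no effect here: every quote was already removed above
} // disposing the reader also closes the underlying FileStream
using(StreamWriter Wr = new StreamWriter(TempPath))
{
Wr.Write(a);
}
using(StreamReader Sr = new StreamReader(TempPath))
{
Sr.ReadLine();
}
I created a temporary file on the system. After this, it was easy to enter the data into the database.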
Try something like this:
StreamReader sReader = new StreamReader("filename");
string a = sReader.ReadToEnd();
a = a.Replace("\"", ""); // Replace returns a new string, so assign the result
a = a.Replace("\r\"", "");
StringReader reader = new StringReader(a);
string inputLine = "";
while ((inputLine = reader.ReadLine()) != null)
{
}

reading large file, wrong file size

I'm trying to read a large file from a disk and report percentage while it's loading. The problem is FileInfo.Length is reporting different size than my Encoding.ASCII.GetBytes().Length.
public void loadList()
{
string ListPath = InnerConfig.dataDirectory + core.operation[operationID].Operation.Trim() + "/List.txt";
FileInfo f = new FileInfo(ListPath);
int bytesLoaded = 0;
using (FileStream fs = File.Open(ListPath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (BufferedStream bs = new BufferedStream(fs))
using (StreamReader sr = new StreamReader(bs))
{
string line;
while ((line = sr.ReadLine()) != null)
{
byte[] array = Encoding.ASCII.GetBytes(line);
bytesLoaded += array.Length;
}
}
MessageBox.Show(bytesLoaded + "/" + f.Length);
}
The result is
13357/15251
There are about 1900 bytes 'missing'. The file contains a list of short strings. Any tips on why it's reporting different file sizes? Does it have anything to do with the '\r' and '\n' characters in the file? In addition, I have the following line:
int bytesLoaded = 0;
If the file is, let's say, 1 GB large, do I have to use 'long' instead? Thank you for your time!
Your intuition is correct; the difference in the reported sizes is due to the newline characters. Per the MSDN documentation on StreamReader.ReadLine:
The string that is returned does not contain the terminating carriage return or line feed.
Depending on the source which created your file, each newline would consist of either one or two characters (most commonly: \r\n on Windows; just \n on Linux).
That said, if your intention is to read the file as a sequence of bytes (without regard to lines), you should use the FileStream.Read method, which avoids the overhead of ASCII encoding (as well as returns the correct count in total):
byte[] array = new byte[1024]; // buffer
int total = 0;
using (FileStream fs = File.Open(ListPath, FileMode.Open,
FileAccess.Read, FileShare.ReadWrite))
{
int read;
while ((read = fs.Read(array, 0, array.Length)) > 0)
{
total += read;
// process "array" here, up to index "read"
}
}
Edit: spender raises an important point about character encodings; your code should only be used on ASCII text files. If your file was written using a different encoding – the most popular today being UTF-8 – then results may be incorrect.
Consider, for example, the three-byte hex sequence E2-98-BA. StreamReader, which uses UTF8Encoding by default, would decode this as a single character, ☺. However, this character cannot be represented in ASCII; thus, calling Encoding.ASCII.GetBytes("☺") would return a single byte corresponding to the ASCII value of the fallback character, ?, so three bytes in the file are counted as one (and the byte array is processed incorrectly).
Finally, there is also the possibility of an encoding preamble (such as a Unicode byte order mark) at the beginning of the text file, which would also be stripped by the StreamReader, resulting in a further discrepancy of a few bytes.
It's the line endings, which get swallowed by ReadLine. The difference could also arise because your source file is in a more verbose encoding than ASCII (perhaps it's UTF-8?).
int.MaxValue is 2147483647, so you're going to run into problems using an int for bytesLoaded if your file is larger than 2 GB. Switch to a long. After all, FileInfo.Length is defined as a long.
The ReadLine method removes the trailing line termination character.
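To see the effect in isolation, here is a small sketch that repeats the question's counting approach on any stream. LineByteCount and CountLineBytes are illustrative names, not part of the original code, and the total is a long, as suggested above.

```csharp
using System.IO;
using System.Text;

public static class LineByteCount
{
    // Sums the encoded size of each line returned by ReadLine.
    // Line terminators are not part of the returned strings, so this total
    // falls short of the stream length by the size of the terminators
    // (and of any byte order mark the reader skips).
    public static long CountLineBytes(Stream stream, Encoding encoding)
    {
        long total = 0;
        using (var reader = new StreamReader(stream, encoding))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                total += encoding.GetByteCount(line);
        }
        return total;
    }
}
```

For the ASCII bytes of "ab\r\ncd\r\n" the stream is 8 bytes long, but the method returns 4: the two \r\n pairs account for the 4 missing bytes.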

How to read byte[] with current encoding using streamreader

I would like to read byte[] using C# with the current encoding of the file.
As written on MSDN, the default encoding is UTF-8 when the constructor is given no encoding:
var reader = new StreamReader(new MemoryStream(data));
I have also tried this, but the data is still read as UTF-8:
var reader = new StreamReader(new MemoryStream(data), true);
I need to read the byte[] with the current encoding.
A file has no encoding. A byte array has no encoding. A byte has no encoding. Encoding is something that transforms bytes to text and vice versa.
What you see in text editors and the like is actually program magic: the editor tries out different encodings and then guesses which one makes the most sense. This is also what you enable with the boolean parameter. If this does not produce what you want, then the magic has failed.
var reader = new StreamReader(new MemoryStream(data), Encoding.Default);
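will use the OS- and locale-specific default encoding (on .NET Framework this is the system's ANSI code page; on .NET Core and later, Encoding.Default is always UTF-8). If that is still not what you want, then you need to be completely explicit and tell the StreamReader exactly which encoding to use, for example (this is only an example, since you said you did not want UTF-8):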
var reader = new StreamReader(new MemoryStream(data), Encoding.UTF8);
I tried several ways of detecting the encoding of the bytes, and it is not possible: a byte array carries no encoding, as Jan mentions in his reply. However, once you have decoded the bytes to a string with Encoding.GetString(byte[]), you can test the string value, for example to check whether it contains any non-ASCII characters:
public static bool IsUnicode(string input)
{
// ASCII uses one byte per char; UTF-8 needs more for any non-ASCII character.
var asciiByteCount = Encoding.ASCII.GetByteCount(input);
var utf8ByteCount = Encoding.UTF8.GetByteCount(input);
return asciiByteCount != utf8ByteCount;
}
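The one case where the bytes do tell you something is when they start with a byte order mark. A minimal sketch of checking what StreamReader detects, assuming a hypothetical helper named BomSniffer.Detect:

```csharp
using System.IO;
using System.Text;

public static class BomSniffer
{
    // Returns the encoding StreamReader settles on after checking for a
    // byte order mark; without a BOM it simply reports the fallback.
    public static Encoding Detect(byte[] data, Encoding fallback)
    {
        using (var reader = new StreamReader(new MemoryStream(data), fallback, true))
        {
            reader.Peek(); // CurrentEncoding is only meaningful after the first read
            return reader.CurrentEncoding;
        }
    }
}
```

This only recognizes encodings that announce themselves with a BOM (UTF-8, UTF-16, UTF-32). For BOM-less data you still have to know or guess the encoding yourself.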

How do I load a string into a FileStream without going to disk?

string abc = "This is a string";
How do I load abc into a FileStream?
FileStream input = new FileStream(.....);
Use a MemoryStream instead...
MemoryStream ms = new MemoryStream(System.Text.Encoding.ASCII.GetBytes(abc));
Remember that a MemoryStream (just like a FileStream) should be closed when you have finished with it. You can always place your code in a using block to make this easier:
using(MemoryStream ms = new MemoryStream(System.Text.Encoding.ASCII.GetBytes(abc)))
{
//use the stream here and don't worry about needing to close it
}
NOTE: If your string contains non-ASCII characters, choose an encoding that can represent them, such as Encoding.UTF8 or Encoding.Unicode; Encoding.ASCII replaces anything it cannot represent with '?'. With Encoding.Unicode (UTF-16, little-endian) most characters take 2 bytes instead of 1, so "a" becomes 0x61 0x00, whereas in ASCII it is the single byte 0x61.
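A small sketch that makes the encoding an explicit parameter, so the same choice can be used when reading the stream back. StringStreams and ToStream are names invented for this example.

```csharp
using System.IO;
using System.Text;

public static class StringStreams
{
    // Wraps the encoded bytes of a string in a read-only-style MemoryStream.
    // The reader of the stream must use the same encoding to get the text back.
    public static MemoryStream ToStream(string text, Encoding encoding)
    {
        return new MemoryStream(encoding.GetBytes(text));
    }
}
```

For example, StringStreams.ToStream(abc, Encoding.UTF8) can be passed to any API that accepts a Stream, and new StreamReader(stream, Encoding.UTF8).ReadToEnd() returns the original string.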

How do I count the number of bytes read by TextReader.ReadLine()?

I am parsing a very large file of records (one per line, each of varying length), and I'd like to keep track of the number of bytes I've read in the file so that I may recover in the event of a failure.
I wrote the following:
using (TextReader myTextReader = CreateTextReader())
{
string record = myTextReader.ReadLine();
bytesRead += record.Length;
ParseRecord(record);
}
However this doesn't work since ReadLine() strips any CR/LF characters in the line. Furthermore, a line may be terminated by either CR, LF, or CRLF characters, which means I can't just add 1 to bytesRead.
Is there an easy way to get the actual line length, or do I write my own ReadLine() method in terms of the granular Read() operations?
Getting the current position of the underlying stream won't help, since the StreamReader will buffer data read from the stream.
Essentially you can't do this without writing your own StreamReader. But do you really need to?
I would simply count the number of lines read.
Of course, this means that to position to a specific line you will need to read N lines rather than simply seeking to an offset, but what's wrong with that? Have you determined that performance will be unacceptable?
A TextReader reads strings, which consist of characters, and depending on the encoding the number of characters is not equal to the number of bytes.
How about just storing the number of lines read, and skipping that many lines when recovering? I guess it's all about not processing those lines again, not necessarily about avoiding reading them from the stream.
If you are reading a string, you can use regular expression matches and count the number of characters.
var regex = new Regex("^(.*)$", RegexOptions.Compiled | RegexOptions.Multiline);
var matches = regex.Matches(text);
var count = matches.Count;
for (var matchIndex = 0; matchIndex < count; ++matchIndex)
{
var match = matches[matchIndex];
var group = match.Groups[1];
var value = group.Captures[0].Value;
Console.WriteLine($"Line {matchIndex + 1} (pos={match.Index}): {value}");
}
Come to think of it, I can use a StreamReader and get the current position of the underlying stream as follows.
using (StreamReader myTextReader = CreateStreamReader())
{
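string record = myTextReader.ReadLine();
// Position is an absolute offset, so assign it rather than accumulate.
// Because StreamReader buffers, it can run ahead of the line just returned.
bytesRead = myTextReader.BaseStream.Position;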
ParseRecord(record);
// ...
}
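If exact offsets are required, writing your own line reader over the raw bytes is not much code. The sketch below is one possible shape, not a drop-in replacement for StreamReader: OffsetLineReader is a made-up name, it assumes an encoding in which CR and LF are the single bytes 0x0D and 0x0A (ASCII, UTF-8, most single-byte code pages), and it needs a seekable stream.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class OffsetLineReader
{
    // Hypothetical helper: yields each line together with the byte offset at
    // which the NEXT line starts, so a caller can store it as a resume point.
    // Accepts CR, LF, or CRLF as line terminators.
    public static IEnumerable<(string Line, long NextOffset)> ReadLines(
        Stream stream, Encoding encoding)
    {
        var buffer = new MemoryStream();
        long offset = 0;
        int b;
        while ((b = stream.ReadByte()) != -1)
        {
            offset++;
            if (b == '\r' || b == '\n')
            {
                if (b == '\r')
                {
                    // Treat CRLF as one terminator: consume the LF if present.
                    int next = stream.ReadByte();
                    if (next == '\n') offset++;
                    else if (next != -1) stream.Position--; // put the byte back
                }
                yield return (encoding.GetString(buffer.ToArray()), offset);
                buffer.SetLength(0);
            }
            else
            {
                buffer.WriteByte((byte)b);
            }
        }
        // Final line without a terminator.
        if (buffer.Length > 0)
            yield return (encoding.GetString(buffer.ToArray()), offset);
    }
}
```

To recover after a failure, set stream.Position to the last stored NextOffset and call ReadLines again. Reading byte by byte is slow on an unbuffered stream; FileStream buffers internally, and other streams can be wrapped in a BufferedStream.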
