I have a file that consists of text followed by a binary image. I need to read the text from positions 0 to 30, and from position 31 onward is the image in binary format.
What are the steps that I have to follow to proceed with that problem?
Currently, I am trying to read it using FileStream, and then I move the FileStream var to one BinaryReader as shown below:
FileStream fs = new FileStream(filePath, FileMode.Open, FileAccess.Read);
BinaryReader br = new BinaryReader(fs);
From there forward, I'm lost.
UPDATE
Alright, so I can read my file now.
The first 30 positions hold my 30-character string; from position 30 onward is the byte string, which is actually an image.
I wonder how I can read the bytes from position 30 and then save the image.
Does anyone have any ideas?
Here is an example from my file to give you an idea:
£ˆ‰¢#‰¢#¢–”…#•…¦#„£################################.-///%<<??#[K}#k{M÷]kðñôôô}ù~øòLKóôòÿg
Note that everything up to and including the # # # run is my string; the image bytes begin right after it.
Expanding on Roger's answer a bit, with some code.
A string is always encoded in some format, and to read it you need to know that encoding (especially when using a binary reader). In many cases it's plain ASCII and you can use Encoding.ASCII.GetString to parse it. If you get unexpected results (weird characters etc.), try another encoding.
To parse the image you need to use an image parser. .NET has several as part of its GUI namespaces. In the sample below I'm using the one from System.Drawing (Windows Forms), but similar ones exist in WPF and there are many third-party libraries out there.
using (var reader = new BinaryReader(File.Open(someFile, FileMode.Open)))
{
// assuming your string is in plain ASCII encoding:
var myString = System.Text.Encoding.ASCII.GetString(reader.ReadBytes(30));
// The rest of the bytes is image data, use an image library to process it
var myImage = System.Drawing.Image.FromStream(reader.BaseStream);
}
Now MSDN has a caution about using the BaseStream in conjunction with BinaryReader but I believe in the above case you should be safe since you're not using the stream after the image. But keep an eye out for problems. If it fails, you can always read the bytes into a new byte[] and create a new MemoryStream from those bytes.
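The fallback mentioned above (copying the remaining bytes into their own array) could look roughly like the sketch below. It assumes the header is exactly 30 bytes and that the caller supplies the right encoding; the class and method names (HeaderAndImage, Split) are hypothetical and not part of any library.

```csharp
using System;
using System.IO;
using System.Text;

public static class HeaderAndImage
{
    // Splits a file into a fixed-length text header and the remaining raw bytes.
    // headerLength and headerEncoding are assumptions the caller must get right.
    public static (string Header, byte[] Image) Split(string path, int headerLength, Encoding headerEncoding)
    {
        byte[] all = File.ReadAllBytes(path);
        if (all.Length < headerLength)
            throw new InvalidDataException("File is shorter than the expected header.");

        // Decode only the header bytes as text.
        string header = headerEncoding.GetString(all, 0, headerLength);

        // Everything after the header is treated as opaque image data.
        byte[] image = new byte[all.Length - headerLength];
        Buffer.BlockCopy(all, headerLength, image, 0, image.Length);
        return (header, image);
    }
}
```

The returned byte[] can then be wrapped in a new MemoryStream and handed to an image library, or saved as-is with File.WriteAllBytes.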
EDIT:
You indicated in your comment that your string is EBCDIC, which unfortunately means you cannot use any of the built-in Encodings to decode it. A quick Google search revealed a post by Jon Skeet on an EBCDIC .NET Encoding class that may get you started. It will essentially give you ebcdicEncoding.GetString(...);
You can use FileStream to open and read from the file. If you read the first 30 bytes into a buffer you can then convert that to a string using "string Encoding.ASCII.GetString(byte[] buffer, int offset, int length)".
I'm using a finite-state machine to read an extra-large file. It's not multi-threaded, so there won't be any thread-safety problems.
It contains 3 kinds of content:
binary number, indicates the length of the following string, counts a character as 1
ANSI, takes 1~2 Bytes for a character
UTF-8, takes 1~4 Bytes for a character
I've found this question that might be useful, but it failed. The similar Python question is not useful either, because it won't throw any error. I have to read the content with the proper encoding, or the behavior will be undefined.
Currently, I'm using StreamReader, but the CurrentEncoding property cannot be changed once the StreamReader is initialized.
So I've also tried to recreate the StreamReader on the same Stream:
reader = new StreamReader(stream, encoding65001); //UTF-8
DoSomething(reader);
reader = new StreamReader(stream, encoding1252); //ANSI
DoSomething(reader);
reader = new StreamReader(stream, encoding936); //ANSI
//...
But it starts to read strange content from an unknown position. I haven't found the cause of this strange behavior.
Have I made a mistake in creating multiple StreamReaders, or is it by design that you should not create more than one on the same stream?
If it is by design, is there any solution for reading such a file?
Thank you for taking the time to read this.
Edit:
I've run the following code on .NET Core 3.1:
Stream stream = File.OpenRead(testFilePath);
Console.WriteLine(stream.Position);
Console.WriteLine(stream.ReadByte());
Console.WriteLine(stream.Position + "\r\n");
StreamReader reader = new StreamReader(stream, Encoding.UTF8);
Console.WriteLine(reader.Read());
Console.WriteLine(stream.Position + "\r\n");
reader = new StreamReader(stream, CodePagesEncodingProvider.Instance.GetEncoding(1252));
Console.WriteLine(reader.Read());
Console.WriteLine(stream.Position);
With the following example text:
abcdefg
And the output:
0
97
1
98
7
-1
7
It's strange and interesting.
The stream readers are going to buffer the content from the underlying stream they're reading, which is what's causing your problems. Just because you read one character from your reader doesn't mean it'll read just one character from the underlying stream. It'll fill a whole buffer with bytes, and then yield you one character from the buffer.
If you want to be reading values from a stream and interpreting different sections of bytes as different encodings (for the record, if at all possible you should avoid putting yourself in this position of having mixed encodings in your data) you'll have to pull the bytes out of the stream yourself and then convert the bytes using the appropriate encodings, so that you can be sure you only pull the exact sections of bytes you want and no more.
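One way to pull exactly the bytes you need is to feed a Decoder one byte at a time until it has produced the requested number of characters. The sketch below assumes the length prefix counts UTF-16 chars as .NET sees them; MixedEncodingReader and ReadChars are hypothetical names, and a fresh Decoder is created per segment so no state leaks between encodings.

```csharp
using System;
using System.IO;
using System.Text;

public static class MixedEncodingReader
{
    // Reads exactly charCount chars from the stream using the given encoding,
    // consuming only the bytes that belong to those chars.
    public static string ReadChars(Stream stream, Encoding encoding, int charCount)
    {
        Decoder decoder = encoding.GetDecoder();
        var result = new StringBuilder(charCount);
        byte[] oneByte = new byte[1];
        char[] chars = new char[2]; // one byte can complete at most a surrogate pair

        while (result.Length < charCount)
        {
            int b = stream.ReadByte();
            if (b < 0)
                throw new EndOfStreamException("Stream ended in the middle of a text segment.");

            oneByte[0] = (byte)b;
            // The decoder keeps partial multi-byte sequences between calls
            // and returns 0 until a full character is available.
            int produced = decoder.GetChars(oneByte, 0, 1, chars, 0);
            result.Append(chars, 0, produced);
        }
        return result.ToString();
    }
}
```

Because no StreamReader is involved, stream.Position afterwards points at the first byte of the next section, so the next call can use a different encoding. On .NET Core, code pages such as 1252 or 936 still need CodePagesEncodingProvider, as in the question.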
I need to change a file in memory, and currently I read the file into a byte[] using a FileStream and a BinaryReader.
I was wondering what the best approach is to change that file in memory. Should I convert the byte[] to a string, make the changes, and then call Encoding.GetBytes()? Or read the file as a string first using File.ReadAllText() and then call Encoding.GetBytes()? Or will any approach work without caveats?
Are there any special approaches? I need to replace specific text inside files with additional chars or replacement strings, across several hundred thousand files. Reliability is preferred over efficiency. The files are text like HTML, not binary files.
Read the files using File.ReadAllText(), modify them, then do byte[] byteData = Encoding.UTF8.GetBytes(your_modified_string_from_file). Use the encoding with which the files were saved. This will give you a byte[]. You can convert the byte[] to a stream like this:
MemoryStream stream = new MemoryStream();
stream.Write(byteData, 0, byteData.Length);
Edit:
It looks like one of the Add methods in the API can take a byte array, so you don't have to use a stream.
You're definitely making things harder on yourself by reading into bytes first. Just use a StreamReader. You can probably get away with using ReadLine() and processing a line at a time. This can seriously reduce your app's memory usage, especially if you're working with that many files.
using (var reader = File.OpenText(originalFile))
using (var writer = File.CreateText(tempFile))
{
string line;
while ((line = reader.ReadLine()) != null)
{
var temp = DoMyStuff(line);
writer.WriteLine(temp);
}
}
File.Delete(originalFile);
File.Move(tempFile, originalFile);
Based on the size of the files, I would use File.ReadAllText to read them and File.WriteAllText to write them. This frees you from the responsibility of having to call Close or Dispose on either read or write.
You generally don't want to read a text file on a binary level - just use File.ReadAllText() and supply it with the correct encoding used in the file (there's an overload for that). If the file encoding is UTF8 or UTF32, the method can usually detect and use the correct encoding automatically. The same applies to writing it back - if it's not UTF8, specify which encoding you want.
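As a minimal sketch of that read-modify-write cycle with an explicit encoding: the helper below (TextFileEditor.ReplaceInFile, a hypothetical name) reads the file, replaces the text, and writes it back with the same encoding only if something changed. It assumes the caller knows the file's encoding.

```csharp
using System;
using System.IO;
using System.Text;

public static class TextFileEditor
{
    // Replaces every occurrence of oldValue with newValue in the file at path.
    // Returns true if the file was rewritten.
    public static bool ReplaceInFile(string path, string oldValue, string newValue, Encoding encoding)
    {
        string text = File.ReadAllText(path, encoding);
        string updated = text.Replace(oldValue, newValue);

        if (updated == text)
            return false; // nothing to change, leave the file untouched

        // Write back with the same encoding that was used for reading.
        File.WriteAllText(path, updated, encoding);
        return true;
    }
}
```

Note that Encoding.UTF8 writes a byte order mark with File.WriteAllText; if the originals have none, new UTF8Encoding(false) avoids adding one.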
I am trying to use StreamReader and StreamWriter to open a fixed-width text file and modify a few specific columns of data. I have dates in the following format that are going to be converted to packed COMP-3 fields.
020100718F
020100716F
020100717F
020100718F
020100719F
I want to be able to read in the dates from a file using StreamReader, convert them to packed fields (5 characters), and then output them using StreamWriter. However, I haven't found a way to use StreamWriter to write to a specific position, and I'm beginning to wonder if it is possible.
I have the following code snippet.
System.IO.StreamWriter writer;
this.fileName = @"C:\Test9.txt";
reader = new System.IO.StreamReader(System.IO.File.OpenRead(this.fileName));
currentLine = reader.ReadLine();
currentLine = currentLine.Substring(30, 10); //Substring Containing the Date
reader.Close();
...
// Convert currentLine to Packed Field
...
writer = new System.IO.StreamWriter(System.IO.File.Open(this.fileName, System.IO.FileMode.Open));
writer.Write(currentLine);
Currently what I have does the following:
After:
!##$%0718F
020100716F
020100717F
020100718F
020100719F
!##$% = ASCII characters that SO can't display
Any ideas? Thanks!
UPDATE
Information on Packed Fields COMP-3
Packed Fields are used by COBOL systems to reduce the number of bytes a field requires in files. Please see the following SO post for more information: Here
Here is a picture of the date "20120123" packed in COMP-3. This is my end result, and I have included it because I wasn't sure whether it would affect possible answers.
My question is how do you get StreamWriter to dynamically replace data inside a file and change the lengths of rows?
I have always found it better to read the input file, filter/process the data, and write the output to a temporary file. When finished, delete the original file (or make a backup) and copy the temporary file over it. This way you haven't lost half your input file in case something goes wrong in the middle of processing.
You should probably be using a Stream directly (probably a FileStream). This would allow you to change position.
However, you're not going to be able to change record sizes this way, at least, not in-line. You can have one Stream reading from the original file, and another writing to a new, converted copy of the file.
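Since the packed field is shorter than the text it replaces, the conversion itself has to work on bytes. The sketch below assumes each 10-character field such as 020100718F is simply the hex spelling of the 5 packed bytes (two nibbles per byte, with the trailing F as the sign nibble); Comp3.PackHexText is a hypothetical helper, not a library call.

```csharp
using System;

public static class Comp3
{
    // Converts a text field like "020100718F" into its packed bytes
    // 0x02 0x01 0x00 0x71 0x8F. Assumes the text is an even number of hex digits.
    public static byte[] PackHexText(string field)
    {
        if (field == null) throw new ArgumentNullException(nameof(field));
        if (field.Length % 2 != 0)
            throw new ArgumentException("Field must contain an even number of characters.", nameof(field));

        var packed = new byte[field.Length / 2];
        for (int i = 0; i < packed.Length; i++)
        {
            // Each pair of characters becomes one byte (high nibble, low nibble).
            packed[i] = Convert.ToByte(field.Substring(2 * i, 2), 16);
        }
        return packed;
    }
}
```

The resulting bytes would be written to the output Stream directly (for example with FileStream.Write), not through a StreamWriter, because a text encoding could alter byte values such as 0x8F.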
However, I haven't found a way to use StreamWriter to write to a specific position, and I'm beginning to wonder if it is possible.
You can use StreamWriter.BaseStream.Seek method
using (StreamWriter wr = new StreamWriter(File.Create(@"c:\Temp\aaa.txt")))
{
wr.Write("ABC");
wr.Flush();
wr.BaseStream.Seek(0, SeekOrigin.Begin);
wr.Write("Z");
}
I will try to make this as straightforward as possible. This question does not simply involve reading and writing bytes. I am looking for an exact translation between this VB6 code and C# code. I know this is not always a possibility, but I'm sure someone out there has some ideas!
VB6 Code & explanation:
The below code writes data into a specific part of the file.
[ Put [#]filenumber, [byte position], varname ].
It is the byte position that I am having trouble figuring out - any help with this would be very much appreciated!
Dim file, stringA as string
Open file for Binary As #1
lPos = 10000
stringA = "ThisIsMyData"
Put #1, lPos, stringA
Close #1
So, I am looking for some help with the byte position, once again. In this example the byte position was represented by lPos.
EDIT FOR HENK -
I will be reading binary data. There are some characters in this binary data that I will need to replace. For this reason, I will be using VB6's InStr function to get the position of this data (their lengths are previously known). I will then use VB6's Put function to write this data at the newfound position. This will overwrite the old data with the new data. I hope this helps!
If it helps anyone, here is some further information regarding the Put function.
Thanks so much,
Evan
Can you not use a BinaryWriter?
For example:
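Encoding encoding = Encoding.ASCII; // assumption: use whichever encoding matches the file's text
FileStream fs = new FileStream(file, FileMode.Open);
BinaryWriter w = new BinaryWriter(fs);
// VB6 Put positions are 1-based, .NET stream offsets are 0-based
w.Seek(10000 - 1, SeekOrigin.Begin);
w.Write(encoding.GetBytes("ThisIsMyData"));
w.Flush();
w.Close();
fs.Close();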
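The search-then-overwrite workflow described in the question (InStr followed by Put) could be sketched as below. It assumes the replacement has the same length as the data it overwrites and that the file fits in memory; BinaryPatcher, IndexOf, and ReplaceFirst are hypothetical names.

```csharp
using System;
using System.IO;

public static class BinaryPatcher
{
    // Returns the 0-based offset of the first occurrence of needle, or -1.
    public static int IndexOf(byte[] haystack, byte[] needle)
    {
        if (needle.Length == 0) return 0;
        for (int i = 0; i <= haystack.Length - needle.Length; i++)
        {
            int j = 0;
            while (j < needle.Length && haystack[i + j] == needle[j]) j++;
            if (j == needle.Length) return i;
        }
        return -1;
    }

    // Overwrites the first occurrence of oldBytes in the file with newBytes.
    // Lengths must match, so the rest of the file keeps its position.
    public static bool ReplaceFirst(string path, byte[] oldBytes, byte[] newBytes)
    {
        if (oldBytes.Length != newBytes.Length)
            throw new ArgumentException("Replacement must have the same length as the original.");

        byte[] data = File.ReadAllBytes(path);
        int index = IndexOf(data, oldBytes);
        if (index < 0) return false;

        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Write))
        {
            fs.Seek(index, SeekOrigin.Begin); // 0-based, unlike VB6's 1-based Put
            fs.Write(newBytes, 0, newBytes.Length);
        }
        return true;
    }
}
```

Working on raw bytes avoids any encoding round trip, which matters when the file is binary rather than text.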
You can do this using StreamReader and StreamWriter.
I would try something like this:
Read the first n bits and write them into a new stream using StreamWriter.
Using the same StreamWriter, write the new bits that you want to insert.
Finally, write the rest of the bits from your StreamReader.
This question is not a perfect fit; however, it shows a similar technique using text (not binary data): Insert data into text file
Take a look at the StreamWriter class, especially this overload of the Write method, which allows you to start writing to a specific place within the stream.
Hey there! I'm trying to read a 150 MB file with a file stream, but every time I do, all I get is: |zl instead of the whole stream. Note that it has some special characters in it.
Does anybody know what the problem could be? Here is my code:
using (FileStream fs = File.OpenRead(path))
{
byte[] buffer = new byte[fs.Length];
fs.Read(buffer, 0, buffer.Length);
extract = Encoding.Default.GetString(buffer);
}
Edit:
I tried to read all text but it still returned the same four characters. It works fine on any other file except for these few. When I use read all lines it only gets the first line.
fs.Read() does not read the whole smash of bytes all at once, it reads some number of bytes and returns the number of bytes read. MSDN has an excellent example of how to use it to get the whole file:
http://msdn.microsoft.com/en-us/library/system.io.filestream.read.aspx
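The pattern from that documentation boils down to calling Read in a loop until the requested number of bytes has arrived. A minimal sketch, with StreamHelpers.ReadExactly as a hypothetical helper name:

```csharp
using System;
using System.IO;

public static class StreamHelpers
{
    // Reads exactly count bytes from the stream, looping because a single
    // Read call may return fewer bytes than requested.
    public static byte[] ReadExactly(Stream stream, int count)
    {
        var buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            int read = stream.Read(buffer, offset, count - offset);
            if (read == 0)
                throw new EndOfStreamException("Stream ended before all bytes were read.");
            offset += read;
        }
        return buffer;
    }
}
```

For a whole file, File.ReadAllBytes does the same job in one call.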
For what it's worth, reading the entire 150MB of data into memory is really going to put a drain on your client's system -- the preferred option would be to optimize it so that you don't need the whole file all at once.
If you want to read text, File.ReadAllLines (or ReadAllText) - http://msdn.microsoft.com/en-us/library/s2tte0y1.aspx - is a better option.
My guess is that the file is not a text file to begin with, and that whatever you use to display the resulting string stops at the first 0 (null) character.
As debracey pointed out, Read returns the number of bytes read, so check that value. That said, a file read is unlikely to stop after only 4 characters...