I'm writing out a file in C++ as:
#include <fstream>
using namespace std;

void TestExportWriteOut()
{
    const auto filePath = R"(C:\TestOut.bin)";
    ofstream outFile(filePath, ios::out | ios::binary);
    outFile << 1;
    outFile.flush();
    outFile.close();
}
and trying to read it in C# as:
[TestMethod]
public void LoadTestOut()
{
    const string filePath = "C:\\TestOut.bin";
    using (var fileStream = File.OpenRead(filePath))
    using (var reader = new BinaryReader(fileStream))
    {
        var intOne = reader.ReadInt32(); //!!Throws Exception
        Assert.AreEqual(1, intOne);
    }
}
However, right at the line where I try to read the integer value 1, I get an exception saying:
System.IO.EndOfStreamException: 'Unable to read beyond the end of the stream.'
What's the correct way of reading files in C# that are created using ofstream in C++?
I can see the file created by C++ on the file system, and although all I wrote to it is a single int, its reported size is 1 KB.
The ios::binary flag does not do what you think (*). It doesn't influence the behaviour of operator<<(int), which still writes a text representation of its argument. On the C++ side, use outFile.write() to output the binary representation (in conjunction with the ios::binary flag).
(*) It does open the file as a binary file on operating systems that have a notion of a binary file as distinct from a text file (neither Windows nor Unix is in that class), and it prevents \n from being translated into the OS's normal end-of-line representation (Windows uses a pair of characters for that, which is why you still need the ios::binary flag).
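To make the difference concrete, here is a minimal C# sketch for the file as the current C++ code actually writes it: operator<< produced the text "1", so the file has to be read as text. Once the C++ side uses outFile.write() to emit the raw four bytes of the int, the BinaryReader/ReadInt32 code from the question works unchanged.
// Sketch: read the file produced by the current C++ code (text "1" written by operator<<).
using (var reader = new StreamReader(@"C:\TestOut.bin"))
{
    var text = reader.ReadToEnd();   // "1"
    var value = int.Parse(text);     // 1
}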
I am trying to convert a byte[] to a Base64 string so that I can send that information to a third party. My code is as below:
byte[] ByteArray = System.IO.File.ReadAllBytes(path);
string base64Encoded = System.Convert.ToBase64String(ByteArray);
I am getting the error below:
Exception of type 'System.OutOfMemoryException' was thrown.
Can you help me, please?
Update
I just spotted @PanagiotisKanavos' comment pointing to Is there a Base64Stream for .NET?. That does essentially the same thing as my code below attempts to achieve (i.e. it allows you to process the file without having to hold the whole thing in memory in one go), but without the overhead/risk of self-rolled code, instead using a standard .NET library method for the job.
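For reference, here is a minimal sketch of that kind of streaming approach using the framework's ToBase64Transform wrapped in a CryptoStream (inputPath and outputPath are hypothetical placeholders); bytes are encoded as they are copied, so memory use stays constant:
using System.Security.Cryptography;

using (var input = File.OpenRead(inputPath))
using (var output = File.Create(outputPath))
using (var toBase64 = new ToBase64Transform())
using (var base64Stream = new CryptoStream(output, toBase64, CryptoStreamMode.Write))
{
    // The transform Base64-encodes the bytes as they pass through to outputPath.
    input.CopyTo(base64Stream);
}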
Original
The code below will create a new temporary file containing the Base64-encoded version of your input file.
This should have a lower memory footprint, since rather than processing all of the data at once, we handle it a few kilobytes at a time.
To avoid holding the output in memory, I've pushed it out to a temp file, whose path is returned. When you later need to use that data for some other process, you'd need to stream it (i.e. so that, again, you're not consuming all of the data at once).
You'll also notice that I've used WriteLine instead of Write, which introduces characters that are not part of the Base64 encoding (i.e. the line breaks). That's deliberate, so that if you consume the temp file with a text reader you can easily process it line by line.
However, you can amend that to suit your needs.
void Main()
{
    var inputFilePath = @"c:\temp\bigfile.zip";
    var convertedDataPath = ConvertToBase64TempFile(inputFilePath);
    Console.WriteLine($"Take a look in {convertedDataPath} for your converted data");
}

//inputFilePath = where your source file can be found. This is not impacted by the below code.
//bufferSizeInBytesDiv3 = how many bytes to read at a time (divided by 3); the larger this value, the more memory is required, but the better the performance. The Div3 part is because we later multiply this by 3; this ensures we never have to deal with remainders (since 3 bytes = 4 base64 chars).
public string ConvertToBase64TempFile(string inputFilePath, int bufferSizeInBytesDiv3 = 1024)
{
    var tempFilePath = System.IO.Path.GetTempFileName();

    using (var fileStream = File.Open(inputFilePath, FileMode.Open))
    {
        using (var reader = new BinaryReader(fileStream))
        {
            using (var writer = new StreamWriter(tempFilePath))
            {
                byte[] data;
                while ((data = reader.ReadBytes(bufferSizeInBytesDiv3 * 3)).Length > 0)
                {
                    writer.WriteLine(System.Convert.ToBase64String(data)); //NB: using WriteLine rather than Write; when consuming this content, consider removing the line breaks (I've used WriteLine so you can easily stream the data in chunks later)
                }
            }
        }
    }

    return tempFilePath;
}
I am working on a document management project and I want to extract text from PDFs. How can I achieve this? I am using iTextSharp to extract text from PDFs on the local system.
This is the function I am using for that purpose. The path is an FTP server path.
public static string ExtractTextFromPdf(string path)
{
    using (PdfReader reader = new PdfReader(path))
    {
        StringBuilder text = new StringBuilder();
        for (int i = 1; i <= reader.NumberOfPages; i++)
        {
            text.Append(PdfTextExtractor.GetTextFromPage(reader, i));
        }
        return text.ToString();
    }
}
It throws an exception
'ftp:\\###\index\500199.pdf not found as file or resource.'
[### is my ftp server]
PdfReader has a bunch of constructor overloads, but most of them rely on RandomAccessSourceFactory to convert whatever is passed in into a Stream. When you pass in a string, it is first checked to see whether it is a file on disk, and if not, whether it can be converted to a Uri as a file:/, http:// or https:// link. This is your first point of failure, because none of these checks handle the ftp protocol, so you ultimately end up at a local resource loader, which doesn't work for you.
You could try converting your string to an explicit Uri but that actually won't work, either:
//This won't work
new PdfReader(new Uri(path))
The reason this won't work is that iText tells .NET to use CredentialCache.DefaultCredentials when loading remote resources; that concept doesn't exist in the FTP world, however.
Long story short: when using FTP you'll want to download the files on your own. Depending on their size, you'll want to either download them to disk or download them to a byte array. Below is a sample of the latter:
Byte[] bytes;
if (path.StartsWith(@"ftp://")) {
    var wc = WebRequest.Create(path);
    using (var response = wc.GetResponse()) {
        using (var responseStream = response.GetResponseStream()) {
            bytes = iTextSharp.text.io.StreamUtil.InputStreamToArray(responseStream);
        }
    }
}
You can then pass either the local file or the byte array to the PdfReader constructor.
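A minimal sketch of that last step (bytes is the array filled in above; localPath is a hypothetical path to a file already on disk):
// Bytes downloaded from FTP:
using (var reader = new PdfReader(bytes))
{
    // ...use PdfTextExtractor.GetTextFromPage(reader, i) as in the original function
}

// File already on disk:
// using (var reader = new PdfReader(localPath)) { ... }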
I am creating a stream in C# and trying to read it in Java, but when I read the object created in C# in my Java code I receive the error: "Protocol message tag had invalid wire type."
Details:
I started from the same .proto file (see below) to create the corresponding .java and .cs files (compiling with protoc for Java from "protobuf-2.4.1", and with the protobuf-csharp-port-2.4.1.473-full-binaries for C#).
I managed to create the addressbook.java and the addressbook.cs.
The object is created in C# and written to a file using the following C# code:
[...]
byte[] bytes;
//Create a builder to start building a message
Person.Builder newContact = Person.CreateBuilder();
//Set the primitive properties
newContact.SetId(1)
          .SetName("Foo")
          .SetEmail("foo@bar");
//Now add an item to a list (repeating) field
newContact.AddPhone(
    //Create the child message inline
    Person.Types.PhoneNumber.CreateBuilder().SetNumber("555-1212").Build()
);
//Now build the final message:
Person person = newContact.Build();
newContact = null;
using (MemoryStream stream = new MemoryStream())
{
    //Save the person to a stream
    person.WriteTo(stream);
    bytes = stream.ToArray();
    //save this to a file (by me)
    ByteArrayToFile("personStreamFromC#", bytes);
[...]
I copy the created file "personStreamFromC#" to my Java solution and try to read it using the following Java code:
AddressBook.Builder addressBook = AddressBook.newBuilder();

// Read the existing address book.
try {
    FileInputStream input = new FileInputStream(args[0]);
    byte[] data = IOUtils.toByteArray(input);
    addressBook.mergeFrom(data);

    // Read the existing address book.
    AddressBook addressBookToReadFrom =
        AddressBook.parseFrom(new FileInputStream(args[0]));
    Print(addressBookToReadFrom);
}
But I get the following message:
Exception in thread "main" com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had invalid wire type.
    at com.google.protobuf.InvalidProtocolBufferException.invalidWireType(InvalidProtocolBufferException.java:78)
    at com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:498)
    at com.google.protobuf.GeneratedMessage$Builder.parseUnknownField(GeneratedMessage.java:438)
    at com.example.tutorial.AddressBookProtos$Person$Builder.mergeFrom(AddressBookProtos.java:1034)
    at com.example.tutorial.AddressBookProtos$Person$Builder.mergeFrom(AddressBookProtos.java:1)
    at com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:275)
    at com.example.tutorial.AddressBookProtos$AddressBook$Builder.mergeFrom(AddressBookProtos.java:1715)
    at com.example.tutorial.AddressBookProtos$AddressBook$Builder.mergeFrom(AddressBookProtos.java:1)
    at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:300)
    at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:238)
    at com.google.protobuf.AbstractMessageLite$Builder.mergeFrom(AbstractMessageLite.java:162)
    at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:716)
    at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:238)
    at com.google.protobuf.AbstractMessageLite$Builder.mergeFrom(AbstractMessageLite.java:153)
    at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:709)
    at AddPerson.main(test.java:104)
Below is the .proto file:
package tutorial;

message Person {
    required string name = 1;
    required int32 id = 2;      // Unique ID number for this person.
    optional string email = 3;

    enum PhoneType {
        MOBILE = 0;
        HOME = 1;
        WORK = 2;
    }

    message PhoneNumber {
        required string number = 1;
        optional PhoneType type = 2 [default = HOME];
    }

    repeated PhoneNumber phone = 4;
}

message AddressBook {
    repeated Person person = 1;
}
Any ideas?
You write a Person object to the file in C#, but then read an AddressBook in Java; I don't think that is correct. Try the following in your Java code:
Person.parseFrom(new FileInputStream(args[0]));
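Alternatively, if you'd rather keep the Java code as it is, wrap the Person in an AddressBook on the C# side before writing it out. A rough sketch using protobuf-csharp-port (the AddPerson name is assumed from the generated builder API for the repeated person field):
//Wrap the person in an AddressBook so AddressBook.parseFrom succeeds in Java
AddressBook book = AddressBook.CreateBuilder()
    .AddPerson(person)   // 'person' is the message built earlier
    .Build();

using (MemoryStream stream = new MemoryStream())
{
    book.WriteTo(stream);
    bytes = stream.ToArray();
}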
One common mistake that can cause invalid wire-type errors (especially when using files) is overwriting an existing file without truncating it. We can't see your ByteArrayToFile, but frankly File.WriteAllBytes may be an easier option (see the sketch after the advice below). The problem is that if the new data is smaller than the original contents, any remaining extra bytes are essentially garbage.
My advice:
check whether you can deserialize it in C#; if you can't, the error is certainly in the file handling
if it works in C#, check how you are getting the file to the Java code: are you copying it around anywhere?
and check that you are using binary (not text) processing at all stages
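As a minimal sketch of those points (path and bytes are placeholders, and the ParseFrom name is assumed from the protobuf-csharp-port generated API):
// File.WriteAllBytes creates or truncates the file, so no stale bytes survive a previous, longer write.
File.WriteAllBytes(path, bytes);

// If you keep a hand-rolled writer, make sure it truncates:
using (var fs = new FileStream(path, FileMode.Create, FileAccess.Write))
{
    fs.Write(bytes, 0, bytes.Length);
}

// Round-trip check in C# before involving Java:
Person roundTripped = Person.ParseFrom(File.ReadAllBytes(path));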
I'm working on a project that uses both .NET and Java, using ZeroMQ to communicate between them.
I can connect to the .NET server; however, when I try to convert the byte array to a string, strange things happen. In the Eclipse debugger I can see the string and its length. When I click on the string, its value changes to only the first letter, and the length changes to 1. In the Eclipse console, when I try to copy and paste the output, I only get the first letter. I also tried running it in NetBeans and get the same issue.
I thought it might be due to endianness, so I have tried both
BIG_ENDIAN
LITTLE_ENDIAN
Anyone know how I can get the full string, and not just the first letter?
import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

import org.zeromq.ZMQ;

class local_thr
{
    private static final String ENDPOINT = "tcp://127.0.0.1:8000";
    static String[] myargs = {ENDPOINT, "1000", "100"};

    public static void main(String[] args) {
        args = myargs;
        ZMQ.Context ctx = ZMQ.context(1);
        ZMQ.Socket s = ctx.socket(ZMQ.SUB);
        s.subscribe("".getBytes());
        s.connect(ENDPOINT);
        while (true) {
            byte[] data = s.recv(0);
            ByteBuffer buf = ByteBuffer.wrap(data);
            buf.order(ByteOrder.nativeOrder());
            byte[] bytes = new byte[buf.remaining()];
            buf.get(bytes, 0, bytes.length);
            String quote;
            quote = new String(bytes);
            String myQuote;
            myQuote = new String();
            System.out.println(quote);
        }
    }
}
Getting just one char suggests that the data is being encoded as little-endian UTF-16 and decoded as if it were nul-terminated (the decoder could be expecting a single-byte encoding, or UTF-8).
Make sure you are familiar with encodings, and ensure that both ends of the pipe are using the same encoding.
The Java String(byte[]) constructor uses the platform's default charset; I would start by investigating how to read UTF-16 from Java, or maybe use UTF-8 at both ends. Relying on a default charset is never robust.
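For instance, on the .NET side you could make the encoding explicit before handing the bytes to ZeroMQ, and then decode with the same charset in Java. A sketch of the C# half (message is a placeholder for the string you publish):
using System.Text;

// Pick one encoding explicitly rather than relying on defaults.
byte[] payload = Encoding.UTF8.GetBytes(message);   // hand this byte[] to your ZeroMQ socket

// For reference, Encoding.Unicode is .NET's name for little-endian UTF-16;
// if the sender is currently producing that, the Java side must decode with
// UTF-16LE instead of its platform default charset.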
I've written several ints, char[]s and such to a data file with BinaryWriter in C#. Reading the file back in (in C#) with BinaryReader, I can recreate all of the pieces of the file perfectly.
However, attempting to read them back in with C++ yields some scary results. I was using fstream to attempt to read back the data, and the data was not read in correctly. In C++, I set up an fstream with ios::in|ios::binary|ios::ate and used seekg to target my location. I then read the next four bytes, which were written as the integer "16" (and read correctly into C#). This reads as 1244780 in C++ (not the memory address, I checked). Why would this be? Is there an equivalent to BinaryReader in C++? I noticed one mentioned on MSDN, but that's Visual C++ and the IntelliSense doesn't even look like C++ to me.
Example code for writing the file (C#):
public static void OpenFile(string filename)
{
    fs = new FileStream(filename, FileMode.Create);
    w = new BinaryWriter(fs);
}

public static void WriteHeader()
{
    w.Write('A');
    w.Write('B');
}

public static byte[] RawSerialize(object structure)
{
    Int32 size = Marshal.SizeOf(structure);
    IntPtr buffer = Marshal.AllocHGlobal(size);
    Marshal.StructureToPtr(structure, buffer, true);
    byte[] data = new byte[size];
    Marshal.Copy(buffer, data, 0, size);
    Marshal.FreeHGlobal(buffer);
    return data;
}

public static void WriteToFile(Structures.SomeData data)
{
    byte[] buffer = Serializer.RawSerialize(data);
    w.Write(buffer);
}
I'm not sure how I could show you the data file.
Example of reading the data back (C#):
BinaryReader reader = new BinaryReader(new FileStream("C://chris.dat", FileMode.Open));
char[] a = new char[2];
a = reader.ReadChars(2);
Int32 numberoffiles;
numberoffiles = reader.ReadInt32();
Console.Write("Reading: ");
Console.WriteLine(a);
Console.Write("NumberOfFiles: ");
Console.WriteLine(numberoffiles);
This is what I want to perform in C++. Initial attempt (fails at the first integer):
fstream fin("C://datafile.dat", ios::in|ios::binary|ios::ate);
char *memblock = 0;
int size;
size = 0;

if (fin.is_open())
{
    size = static_cast<int>(fin.tellg());
    memblock = new char[static_cast<int>(size+1)];
    memset(memblock, 0, static_cast<int>(size + 1));
    fin.seekg(0, ios::beg);
    fin.read(memblock, size);
    fin.close();

    if (!strncmp("AB", memblock, 2)) {
        printf("test. This works.");
    }

    fin.seekg(2); //read the stream starting from after the second byte.
    int i;
    fin >> i;
Edit: It seems that no matter what location I use "seekg" to, I receive the exact same value.
Note that a char is 16 bits in C#, rather than the 8 bits it usually is in C. This is because a char in C# is designed to handle Unicode text rather than raw data. Therefore, writing chars using the BinaryWriter will result in Unicode being written rather than raw bytes.
This may have led you to calculate the offset of the integer incorrectly. I recommend you take a look at the file in a hex editor, and if you cannot work out the issue, post the file and the code here.
EDIT1
Regarding your C++ code: do not use the >> operator to read from a binary stream. Use read() with the address of the int that you want to read into.
int i;
fin.read((char*)&i, sizeof(int));
EDIT2
Reading from a closed stream is also going to result in undefined behavior. You cannot call fin.close() and then still expect to be able to read from it.
This may or may not be related to the problem, but...
When you create the BinaryWriter, it defaults to writing chars in UTF-8. This means that some of them may be longer than one byte, throwing off your seeks.
You can avoid this by using the two-argument constructor to specify the encoding. An instance of System.Text.ASCIIEncoding matches what C/C++ use by default.
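A minimal sketch of that constructor (fs being the FileStream opened in OpenFile above):
// Each char is now written as a single ASCII byte, so the offsets match what the C++ code expects.
w = new BinaryWriter(fs, new System.Text.ASCIIEncoding());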
There are many things going wrong in your C++ snippet. You shouldn't mix binary reading with formatted reading:
// The file is closed after this line. It is WRONG to read from a closed file.
fin.close();

if (!strncmp("AB", memblock, 2)) {
    printf("test. This works.");
}

fin.seekg(2); // You are moving the "get pointer" of a closed file

int i;
// Even if the file is opened, you should not mix formatted reading
// with binary reading. ">>" is just an operator for reading formatted data.
// In other words, it is for reading "text" and converting it to a
// variable of a specific data type.
fin >> i;
If it's any help, I went through how the BinaryWriter writes data here.
It's been a while but I'll quote it and hope it's accurate:
Int16 is written as 2 bytes and padded.
Int32 is written as Little Endian and zero padded
Floats are more complicated: it takes the float value and dereferences it, getting the memory address's contents which is a hexadecimal
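As a quick illustration of that little-endian layout, here is a small sketch tied to the "16" from the question. Note this uses BitConverter, which follows the machine's endianness (little-endian on x86/x64), and on such machines it produces the same four bytes that BinaryWriter.Write(int) puts in the file:
byte[] raw = BitConverter.GetBytes(16);
Console.WriteLine(BitConverter.ToString(raw));    // "10-00-00-00" on a little-endian machine
Console.WriteLine(BitConverter.IsLittleEndian);   // True on x86/x64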