C# 4.0: Convert pdf to byte[] and vice versa

How do I convert a pdf file to a byte[] and vice versa?

// loading bytes from a file is very easy in C#. The built in System.IO.File.ReadAll* methods take care of making sure every byte is read properly.
// note that for Linux, you will not need the c: part
// just swap out the example folder here with your actual full file path
string pdfFilePath = "c:/pdfdocuments/myfile.pdf";
byte[] bytes = System.IO.File.ReadAllBytes(pdfFilePath);
// munge bytes with whatever pdf software you want, e.g. http://sourceforge.net/projects/itextsharp/
// bytes = MungePdfBytes(bytes); // MungePdfBytes is your custom method to change the PDF data
// ...
// make sure to cleanup after yourself
// and save back - System.IO.File.WriteAll* makes sure all bytes are written properly - this will overwrite the file, if you don't want that, change the path here to something else
System.IO.File.WriteAllBytes(pdfFilePath, bytes);

using (FileStream fs = new FileStream("sample.pdf", FileMode.Open, FileAccess.Read))
{
byte[] bytes = new byte[fs.Length];
int numBytesToRead = (int)fs.Length;
int numBytesRead = 0;
while (numBytesToRead > 0)
{
// Read may return anything from 0 to numBytesToRead.
int n = fs.Read(bytes, numBytesRead, numBytesToRead);
// Break when the end of the file is reached.
if (n == 0)
{
break;
}
numBytesRead += n;
numBytesToRead -= n;
}
numBytesToRead = bytes.Length;
}

Easiest way:
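byte[] buffer;
using (Stream stream = new FileStream("file.pdf", FileMode.Open, FileAccess.Read))
{
    buffer = new byte[stream.Length];
    int offset = 0;
    int n;
    // Read may return fewer bytes than requested, so loop until the buffer is full
    while (offset < buffer.Length && (n = stream.Read(buffer, offset, buffer.Length - offset)) > 0)
    {
        offset += n;
    }
}
using (Stream stream = new FileStream("newFile.pdf", FileMode.Create, FileAccess.Write))
{
    stream.Write(buffer, 0, buffer.Length);
}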
Or something along these lines...
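Along the same lines, here is a minimal sketch that avoids sizing the buffer by hand: it copies any readable stream into a MemoryStream with Stream.CopyTo (available since .NET 4) and returns the bytes. The class and method names (PdfBytes, ReadAll, WriteAll) are made up for illustration.

```csharp
using System.IO;

static class PdfBytes
{
    // Reads the whole stream into memory; CopyTo keeps reading until the source is exhausted.
    public static byte[] ReadAll(Stream source)
    {
        using (var memory = new MemoryStream())
        {
            source.CopyTo(memory);
            return memory.ToArray();
        }
    }

    // Writes the bytes to a file, replacing any existing file at that path.
    public static void WriteAll(string path, byte[] bytes)
    {
        using (var output = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            output.Write(bytes, 0, bytes.Length);
        }
    }
}
```

For a plain file on disk, File.ReadAllBytes and File.WriteAllBytes remain the shorter option; the stream version is mainly useful when the PDF comes from somewhere other than a file path.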

Related

Split an avro file and upload to REST

I have created some avro files. I can use the following commands to convert them to json, just to check whether the files are ok
java -jar avro-tools-1.8.2.jar tojson FileName.avro>outputfilename.json
Now, I have some big avro files, and the REST API I am trying to upload to has size limitations, so I am trying to upload them in chunks using streams.
The following sample, which just reads from the original file in chunks and copies to another avro file, creates the file perfectly:
using System;
using System.IO;
class Test
{
public static void Main()
{
// Specify a file to read from and to create.
string pathSource = @"D:\BDS\AVRO\filename.avro";
string pathNew = @"D:\BDS\AVRO\test\filenamenew.avro";
try
{
using (FileStream fsSource = new FileStream(pathSource,
FileMode.Open, FileAccess.Read))
{
byte[] buffer = new byte[(20 * 1024 * 1024) + 100];
long numBytesToRead = fsSource.Length;
int numBytesRead = 0;
using (FileStream fsNew = new FileStream(pathNew,
FileMode.Append, FileAccess.Write))
{
// Read the source file into a byte array.
//byte[] bytes = new byte[fsSource.Length];
//int numBytesToRead = (int)fsSource.Length;
//int numBytesRead = 0;
while (numBytesToRead > 0)
{
int bytesRead = fsSource.Read(buffer, 0, buffer.Length);
byte[] actualbytes = new byte[bytesRead];
Array.Copy(buffer, actualbytes, bytesRead);
// Read may return anything from 0 to numBytesToRead.
// Break when the end of the file is reached.
if (bytesRead == 0)
break;
numBytesRead += bytesRead;
numBytesToRead -= bytesRead;
fsNew.Write(actualbytes, 0, actualbytes.Length);
}
}
}
// Write the byte array to the other FileStream.
}
catch (FileNotFoundException ioEx)
{
Console.WriteLine(ioEx.Message);
}
}
}
How do I know this creates a valid avro file? Because the earlier command to convert to json works again, i.e.
java -jar avro-tools-1.8.2.jar tojson filenamenew.avro>outputfilename.json
However, when I use the same code but call a REST API instead of copying to another file, the file gets uploaded, but after downloading the same file from the server, running the command above to convert it to json reports "Not a Data file".
So, obviously something is getting corrupted and I am struggling to figure out what.
This is the snippet
string filenamefullyqualified = path + filename;
Stream stream = System.IO.File.Open(filenamefullyqualified, FileMode.Open, FileAccess.Read, FileShare.None);
long? position = 0;
byte[] buffer = new byte[(20 * 1024 * 1024) + 100];
long numBytesToRead = stream.Length;
int numBytesRead = 0;
do
{
var content = new MultipartFormDataContent();
int bytesRead = stream.Read(buffer, 0, buffer.Length);
byte[] actualbytes = new byte[bytesRead];
Array.Copy(buffer, actualbytes, bytesRead);
if (bytesRead == 0)
break;
//Append Data
url = String.Format("https://{0}.dfs.core.windows.net/raw/datawarehouse/{1}/{2}/{3}/{4}/{5}?action=append&position={6}", datalakeName, filename.Substring(0, filename.IndexOf("_")), year, month, day, filename, position.ToString());
numBytesRead += bytesRead;
numBytesToRead -= bytesRead;
ByteArrayContent byteContent = new ByteArrayContent(actualbytes);
content.Add(byteContent);
method = new HttpMethod("PATCH");
request = new HttpRequestMessage(method, url)
{
Content = content
};
request.Headers.Add("Authorization", "Bearer " + accesstoken);
var response = await client.SendAsync(request);
response.EnsureSuccessStatusCode();
position = position + request.Content.Headers.ContentLength;
Array.Clear(buffer, 0, buffer.Length);
} while (numBytesToRead > 0);
stream.Close();
I have looked through the forum threads but haven't come across anything which deals with splitting of avro files.
I have a hunch that my "content" for the HTTP request isn't right. What am I missing?
If you need more details, I will be happy to provide.
I have found the problem now: it was caused by MultipartFormDataContent. When an avro file is uploaded that way, extra text such as the content type is added to the body, and many lines are removed (I do not know why).
So the solution was to upload the contents as ByteArrayContent directly, instead of adding it to a MultipartFormDataContent as I was doing earlier.
Here is the snippet. It is almost identical to the one in the question, except that I no longer use MultipartFormDataContent:
string filenamefullyqualified = path + filename;
Stream stream = System.IO.File.Open(filenamefullyqualified, FileMode.Open, FileAccess.Read, FileShare.None);
//content.Add(CreateFileContent(fs, path, filename, "text/plain"));
long? position = 0;
byte[] buffer = new byte[(20 * 1024 * 1024) + 100];
long numBytesToRead = stream.Length;
int numBytesRead = 0;
//while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
//{
do
{
//var content = new MultipartFormDataContent();
int bytesRead = stream.Read(buffer, 0, buffer.Length);
byte[] actualbytes = new byte[bytesRead];
Array.Copy(buffer, actualbytes, bytesRead);
if (bytesRead == 0)
break;
//Append Data
url = String.Format("https://{0}.dfs.core.windows.net/raw/datawarehouse/{1}/{2}/{3}/{4}/{5}?action=append&position={6}", datalakeName, filename.Substring(0, filename.IndexOf("_")), year, month, day, filename, position.ToString());
numBytesRead += bytesRead;
numBytesToRead -= bytesRead;
ByteArrayContent byteContent = new ByteArrayContent(actualbytes);
//byteContent.Headers.ContentType= new MediaTypeHeaderValue("text/plain");
//content.Add(byteContent);
method = new HttpMethod("PATCH");
//request = new HttpRequestMessage(method, url)
//{
// Content = content
//};
request = new HttpRequestMessage(method, url)
{
Content = byteContent
};
request.Headers.Add("Authorization", "Bearer " + accesstoken);
var response = await client.SendAsync(request);
response.EnsureSuccessStatusCode();
position = position + request.Content.Headers.ContentLength;
Array.Clear(buffer, 0, buffer.Length);
} while (numBytesToRead > 0);
stream.Close();
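To see how MultipartFormDataContent changes what goes on the wire, the following sketch serializes the same payload both ways and returns the two byte arrays. It assumes only the standard System.Net.Http types; the class and method names (MultipartDemo, Serialize) are hypothetical.

```csharp
using System;
using System.Net.Http;

static class MultipartDemo
{
    // Returns (raw body, multipart body) for the same payload so the two can be compared.
    public static Tuple<byte[], byte[]> Serialize(byte[] payload)
    {
        // A plain ByteArrayContent sends exactly the payload bytes as the request body.
        byte[] raw = new ByteArrayContent(payload).ReadAsByteArrayAsync().Result;

        // The multipart wrapper surrounds the payload with boundary lines and part headers.
        var multipart = new MultipartFormDataContent();
        multipart.Add(new ByteArrayContent(payload));
        byte[] wrapped = multipart.ReadAsByteArrayAsync().Result;

        return Tuple.Create(raw, wrapped);
    }
}
```

The raw body has the same length as the payload, while the multipart body is longer because of the boundary and header text. With an append-style API that stores the request body as-is, that extra text would end up inside the stored file, which would be consistent with the "Not a Data file" error.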
But streaming by record cannot handle the AVRO file as a whole in one transaction. We may end up with partial success if, for example, some records fail.
It would be great to have a small tool that can split AVRO files based on a threshold number of records.
The Spark-based split-by-partition technique does allow splitting a data set into a predefined number of files, but it does not allow splitting based on the number of records. That is, I do not want an AVRO file with more than 500 records.
So we have to devise batching logic based on the heap size the application can comfortably handle, along with a two-phase commit, to handle transactions.
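The record-count part of such batching logic could be sketched as below. This is only the grouping step, under the assumption that the records have already been read into some sequence; reading and writing the AVRO files themselves, and the two-phase commit, are not shown. RecordBatcher and Batch are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

static class RecordBatcher
{
    // Yields consecutive batches that hold at most maxRecords items each.
    public static IEnumerable<List<T>> Batch<T>(IEnumerable<T> records, int maxRecords)
    {
        if (maxRecords <= 0) throw new ArgumentOutOfRangeException("maxRecords");

        var current = new List<T>(maxRecords);
        foreach (T record in records)
        {
            current.Add(record);
            if (current.Count == maxRecords)
            {
                yield return current;
                current = new List<T>(maxRecords);
            }
        }
        // Emit the final, possibly smaller, batch.
        if (current.Count > 0) yield return current;
    }
}
```

Each yielded batch would then be written to its own file, so with maxRecords set to 500 no output file holds more than 500 records.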

Copy a file from address byte to address byte

I was looking around for a way to copy a portion of a file from one address to another; is there a way to do that in C#?
For example, let's say I have a file and I want to copy the bytes from 0xA0 to 0xB0 and then paste them into another file.
Something like this perhaps:
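long start = 0xA0;
int length = (int)(0xB0 - start);
byte[] data = new byte[length];
using (FileStream fs = File.OpenRead(@"C:\Temp\InputFile.txt"))
{
    fs.Seek(start, SeekOrigin.Begin);
    // Read may return fewer bytes than requested, so loop until done
    int offset = 0, n;
    while (offset < length && (n = fs.Read(data, offset, length - offset)) > 0)
        offset += n;
}
File.WriteAllBytes(@"C:\Temp\OutputFile.txt", data);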
It should be something like:
// input data
string inputName = "input.bin";
long startInput = 0xa0;
long endInput = 0xb0; // excluding 0xb0 that is not copied
string outputName = "output.bin";
long startOutput = 0xa0;
// begin of code
long count = endInput - startInput;
using (var fs = File.OpenRead(inputName))
using (var fs2 = File.OpenWrite(outputName))
{
fs.Seek(startInput, SeekOrigin.Begin);
fs2.Seek(startOutput, SeekOrigin.Begin);
byte[] buf = new byte[4096];
while (count > 0)
{
int read = fs.Read(buf, 0, (int)Math.Min(buf.Length, count));
if (read == 0)
{
// end of file encountered
throw new IOException("end of file encountered");
}
fs2.Write(buf, 0, read);
count -= read;
}
}

How to change csv delimiter from "," to " : " in C#

I am trying to generate a new CSV file by reading an existing CSV file in a C# console application.
using (FileStream stream = File.OpenRead("C:\\Files\\test_input_file.csv"))
using (FileStream writeStream = File.OpenWrite("C:\\Files\\test_Output_file.csv"))
{
BinaryReader reader = new BinaryReader(stream);
BinaryWriter writer = new BinaryWriter(writeStream);
// create a buffer to hold the bytes
byte[] buffer = new Byte[1024];
int bytesRead;
// while the read method returns bytes
// keep writing them to the output stream
while ((bytesRead = stream.Read(buffer, 0, 1024)) > 0)
{
writeStream.Write(buffer, 0, bytesRead);
}
}
Now I want to change the delimiter to ":" instead of "," in the output file
How do I do it? Please help me.
Because you are trying to modify text characters, BinaryReader is not a suitable class for your case. Because of encoding issues, you need to use StreamReader instead.
using (FileStream stream = File.OpenRead("C:\\Files\\test_input_file.csv"))
using (FileStream writeStream = File.OpenWrite("C:\\Files\\test_Output_file.csv"))
{
StreamReader reader = new StreamReader(stream);
StreamWriter writer = new StreamWriter(writeStream, reader.CurrentEncoding);
// create a buffer to hold the chars
char[] buffer = new char[1024];
int charsRead;
// while the read method returns chars
// keep writing them to the output stream
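while ((charsRead =
reader.Read(buffer, 0, buffer.Length)) > 0)
{
for (int i = 0; i < charsRead; i++)
{
// replace each comma with a colon
if (buffer[i] == ',') buffer[i] = ':';
}
writer.Write(buffer, 0, charsRead);
}
// flush buffered chars before the underlying stream is closed
writer.Flush();
}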
What is the encoding problem? A character can occupy one or more bytes depending on the encoding (for example, 1 to 4 bytes in UTF-8, or 7 bits in ASCII). The StreamReader handles this decoding for you.
Assuming that:
your CSV file is encoded in ASCII or UTF-8
your CSV values will not contain any embedded commas
...you can simply use:
for (int i = 0; i < bytesRead; i++)
if (buffer[i] == ',')
buffer[i] = ':';
writeStream.Write(buffer, 0, bytesRead);
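If the second assumption does not hold and values may contain commas inside double-quoted fields, a blind replace would corrupt them. A minimal sketch that only swaps delimiters outside quotes is shown below. It assumes the common CSV convention of double-quoted fields with "" as an escaped quote, processes one line at a time (so quoted fields spanning several lines are not handled), and uses hypothetical names (CsvDelimiter, Replace).

```csharp
using System.Text;

static class CsvDelimiter
{
    // Replaces oldDelimiter with newDelimiter, but only outside double-quoted fields.
    public static string Replace(string line, char oldDelimiter, char newDelimiter)
    {
        var result = new StringBuilder(line.Length);
        bool insideQuotes = false;
        foreach (char c in line)
        {
            // An escaped quote ("") toggles the flag twice, leaving the state unchanged.
            if (c == '"') insideQuotes = !insideQuotes;
            result.Append(!insideQuotes && c == oldDelimiter ? newDelimiter : c);
        }
        return result.ToString();
    }
}
```

This could be combined with StreamReader.ReadLine and StreamWriter.WriteLine to convert a file line by line.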

FileStream read from A after 128 bytes and write to B C#

I have a large file (around 400GB) that I need to copy to another file using FileStream, skipping the first 128 bytes. I have the following code, but it is not working properly: when I check the file sizes after the copy has finished, File B is missing a lot more than 128 bytes. What am I doing wrong?
private void SplitUnwantedHeader(string file1, string file2)
{
FileStream fr = new FileStream(file1, FileMode.Open, FileAccess.Read);
FileStream fw = new FileStream(file2, FileMode.Create, FileAccess.Write);
byte[] fByte = new byte[65534];
long headerToSplit = 128;
int bytesRead = 0;
try
{
fr.Position = headerToSplit;
do
{
bytesRead = fr.Read(fByte, 0, fByte.Length);
fw.Write(fByte, 0, fByte.Length - (int)headerToSplit);
} while (bytesRead != 0);
}
catch (Exception ex)
{
UpdateStatusBarMessage.ShowStatusMessage(ex.Message);
}
finally
{
fw.Close();
fr.Close();
}
}
Thanks.
The line
fw.Write(fByte, 0, fByte.Length - (int)headerToSplit);
is wrong when used in a loop like this. It will write the "buffer size" minus 128 bytes every loop cycle. Instead, the code should write bytesRead count during the copy.
fw.Write(fByte, 0, bytesRead);
Only perform the offset before entering the copy-everything-else loop. Also, the loop can be replaced with FileStream.CopyTo (since .NET 4) and using can tidy up resource management.
That is, consider:
using (var fr = new FileStream(file1, FileMode.Open, FileAccess.Read))
using (var fw = new FileStream(file2, FileMode.Create, FileAccess.Write)) {
fr.Position = 128; // or fr.Seek(128, SeekOrigin.Begin);
fr.CopyTo(fw, 65534);
}
There are two things wrong with the code:
Instead of skipping the first 128 bytes of the first block, it is skipping the last 128 bytes of every block.
It is ignoring the bytesRead value when writing, so it may be writing data from the buffer that was never read into the buffer. The number of bytes read can be less than the number of bytes requested, even when you are not at the end of the file.
The code is a mix between skipping the header before the loop and skipping the header inside the loop. You should do one, not both.
You can check how much data you have in the buffer compared to how much you should skip, and update the number of bytes to skip so that it's zero once you are beyond the header:
do {
bytesRead = fr.Read(fByte, 0, fByte.Length);
if (bytesRead > headerToSplit) {
fw.Write(fByte, (int)headerToSplit, bytesRead - (int)headerToSplit);
headerToSplit = 0;
} else {
headerToSplit -= bytesRead;
}
} while (bytesRead != 0);
Or if you are skipping the header before the loop, just write all the data that you have in the buffer:
fr.Position = headerToSplit;
do {
bytesRead = fr.Read(fByte, 0, fByte.Length);
fw.Write(fByte, 0, bytesRead);
} while (bytesRead != 0);
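The skip-inside-the-loop approach can also be packaged as a small helper that works on any Stream, including ones that do not support seeking. This is a sketch under that assumption; HeaderSkipper and CopyWithoutHeader are hypothetical names, and the 64 KB buffer size is an arbitrary choice.

```csharp
using System;
using System.IO;

static class HeaderSkipper
{
    // Copies everything after the first headerLength bytes of input to output
    // and returns the number of bytes written.
    public static long CopyWithoutHeader(Stream input, Stream output, long headerLength)
    {
        byte[] buffer = new byte[64 * 1024];
        long toSkip = headerLength;
        long written = 0;
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Skip whatever part of this block still belongs to the header.
            int skip = (int)Math.Min(toSkip, read);
            toSkip -= skip;
            output.Write(buffer, skip, read - skip);
            written += read - skip;
        }
        return written;
    }
}
```

Comparing the returned count with the source length minus 128 gives a quick sanity check after the copy.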

Reading(/Writing) Files in C#

I recently wanted to track the progress of an HttpWebRequest upload. So I started small, with a buffered read of a simple text file. I then discovered that a simple task like
File.ReadAllText("text.txt");
becomes something like the code below, with all the streams, readers, writers, etc. Can some of these be removed? Also, the code below is not working. Maybe I did something wrong. What is the way to read (I guess writing will be similar) into a buffer so that I can track progress, assuming the streams are not local, e.g. a WebRequest?
byte[] buffer = new byte[2560]; // 2.5 KB buffer (2560 bytes), btw, how should I decide the buffer size?
int bytesRead = 0, read = 0;
FileStream inStream = new FileStream("./text.txt", FileMode.Open, FileAccess.Read);
MemoryStream outStream = new MemoryStream();
BinaryWriter outWriter = new BinaryWriter(outStream);
// I am getting "Offset and length were out of bounds for the array or count is greater than the number of elements from index to the end of the source collection."
// inStream.Length = Length = 9335092
// bytesRead = 2560
// buffer.Length = 2560
while ((read = inStream.Read(buffer, bytesRead, buffer.Length)) > 0)
{
outWriter.Write(buffer);
//outStream.Write(buffer, bytesRead, buffer.Length);
bytesRead += read;
Debug.WriteLine("Progress: " + bytesRead / inStream.Length * 100 + "%");
}
outWriter.Flush();
txtLog.Text = outStream.ToString();
Update: Solution
byte[] buffer = new byte[2560];
int bytesRead = 0, read = 0;
FileStream inStream = File.OpenRead("text.txt");
MemoryStream outStream = new MemoryStream();
while ((read = inStream.Read(buffer, 0, buffer.Length)) > 0)
{
outStream.Write(buffer, 0, read); // write only the bytes actually read
bytesRead += read;
Debug.WriteLine((double)bytesRead / inStream.Length * 100);
}
inStream.Close();
outStream.Close();
The call outWriter.Write(buffer); should probably be
outWriter.Write(buffer,0,read);
Since you seem to be reading text (although I could be wrong), it seems that your program could be a lot simpler if you read character by character instead of calling the standard Read():
BinaryReader reader = new BinaryReader(File.Open("./text.txt", FileMode.Open));
MemoryStream outStream = new MemoryStream();
StreamWriter outWriter = new StreamWriter(outStream);
while (reader.BaseStream.Position < reader.BaseStream.Length)
{
    outWriter.Write(reader.ReadChar());
    Debug.WriteLine("Progress: " + (double)reader.BaseStream.Position / reader.BaseStream.Length * 100 + "%");
}
outWriter.Close();
reader.Close();
// MemoryStream.ToString() only returns the type name, so decode the bytes instead
txtLog.Text = System.Text.Encoding.UTF8.GetString(outStream.ToArray());
Since you only need to check the progress of the upload operation, you can just check the size of the file using a FileInfo object.
The FileInfo class has a property called Length that returns the file size in bytes. I am not sure whether it gives the current size while the file is being written, but I think it is worth a try, as it is simpler and more efficient than the method you are using.
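For tracking progress while copying between arbitrary streams, a buffered copy with a callback is one option. The sketch below assumes the total length is known up front (for a file, from FileInfo.Length or Stream.Length); ProgressCopy, Copy, and the onProgress callback are hypothetical names, and the buffer size is an arbitrary choice.

```csharp
using System;
using System.IO;

static class ProgressCopy
{
    // Copies input to output in chunks and reports the percentage done after each chunk.
    public static void Copy(Stream input, Stream output, long totalLength, Action<double> onProgress)
    {
        byte[] buffer = new byte[81920];
        long copied = 0;
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Write only the bytes that were actually read in this pass.
            output.Write(buffer, 0, read);
            copied += read;
            if (totalLength > 0) onProgress(copied * 100.0 / totalLength);
        }
    }
}
```

For an upload, the output stream would be the one returned by HttpWebRequest.GetRequestStream(), and the callback could update a progress bar or a log.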
