I'm currently developing a torrent metainfo management library for Ruby.
I'm having trouble reading the pieces from the files. I just don't understand how I'm supposed to go about it. I know each piece is a SHA1 digest, but do I hash piece-length bytes of a file once, read piece-length bytes repeatedly until the end of the file, or what?
I'm counting on your help.
Pseudo / Python / Ruby / PHP code preferred.
Thanks in advance.
C#
// Open the file
using (var file = File.Open(...))
{
    // Move to the offset in the file where the piece begins
    file.Seek(piece * pieceLength, SeekOrigin.Begin);
    // Attempt to read up to pieceLength bytes from the file into a buffer
    byte[] buffer = new byte[pieceLength];
    int totalRead = 0;
    while (totalRead < pieceLength)
    {
        var read = file.Read(buffer, totalRead, pieceLength - totalRead);
        if (read == 0)
        {
            // The piece is smaller than pieceLength,
            // because it's the last one in the file
            Array.Resize(ref buffer, totalRead);
            break;
        }
        totalRead += read;
    }
    // If you want the raw data for the piece:
    return buffer;
    // If you want the SHA1 hash (System.Security.Cryptography):
    return SHA1.Create().ComputeHash(buffer);
}
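To build the pieces value for the metainfo, you repeat this for every piece index and concatenate the 20-byte digests. Below is a minimal sketch of that loop for the single-file case (ComputePieces is a hypothetical helper name); note that in a multi-file torrent the files are treated as one concatenated stream, so a piece can span a file boundary.

using System;
using System.IO;
using System.Security.Cryptography;

static byte[] ComputePieces(string path, int pieceLength)
{
    using (var file = File.OpenRead(path))
    using (var sha1 = SHA1.Create())
    using (var pieces = new MemoryStream())
    {
        byte[] buffer = new byte[pieceLength];
        while (true)
        {
            // Fill the buffer up to pieceLength; Read may return fewer
            // bytes than requested, so keep reading until full or EOF
            int total = 0, read;
            while (total < pieceLength &&
                   (read = file.Read(buffer, total, pieceLength - total)) > 0)
            {
                total += read;
            }
            if (total == 0)
                break; // end of file reached on a piece boundary
            // One 20-byte SHA1 digest per full (or final partial) piece
            byte[] hash = sha1.ComputeHash(buffer, 0, total);
            pieces.Write(hash, 0, hash.Length);
        }
        return pieces.ToArray(); // the concatenated digests form 'pieces'
    }
}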
Please take a look at this distribution:
http://prdownload.berlios.de/torrentparse/TorrentParse.GTK.0.21.zip
Written in PHP, it contains an encoder and a decoder, and covers the ins and outs, I believe!
Related
We're writing an application to move content from a OneDrive account into Azure Storage. We've managed to get this working, but ran into memory issues with big files (> 1 GB) and Block Blobs. We've decided that Append Blobs are the way forward, as they will solve the memory issues.
We're using an RPC call to SharePoint to get the file stream for big files; more info can be found here:
http://sharepointfieldnotes.blogspot.co.za/2009/09/downloading-content-from-sharepoint-let.html
The following code works fine when writing the file from OneDrive to local storage:
using (var strOut = System.IO.File.Create("path"))
using (var sr = wReq.GetResponse().GetResponseStream())
{
    byte[] buffer = new byte[16 * 1024];
    int read;
    bool isHtmlRemoved = false;
    while ((read = sr.Read(buffer, 0, buffer.Length)) > 0)
    {
        if (!isHtmlRemoved)
        {
            string result = Encoding.UTF8.GetString(buffer);
            int startPos = result.IndexOf("</html>");
            if (startPos > -1)
            {
                //get the length of the text, '</html>' as well
                startPos += 8;
                strOut.Write(buffer, startPos, read - startPos);
                isHtmlRemoved = true;
            }
        }
        else
        {
            strOut.Write(buffer, 0, read);
        }
    }
}
This creates the file with the correct size, but when we try to write it to an append blob in Azure Storage, we sometimes get an incomplete file and in other cases a bigger one.
using (var sr = wReq.GetResponse().GetResponseStream())
{
    byte[] buffer = new byte[16 * 1024];
    int read;
    bool isHtmlRemoved = false;
    while ((read = sr.Read(buffer, 0, buffer.Length)) > 0)
    {
        if (!isHtmlRemoved)
        {
            string result = Encoding.UTF8.GetString(buffer);
            int startPos = result.IndexOf("</html>");
            if (startPos > -1)
            {
                //get the length of the text, '</html>' as well
                startPos += 8;
                //strOut.Write(buffer, startPos, read - startPos);
                appendBlob.UploadFromByteArray(buffer, startPos, read - startPos);
                isHtmlRemoved = true;
            }
        }
        else
        {
            //strOut.Write(buffer, 0, read);
            appendBlob.AppendFromByteArray(buffer, 0, read);
        }
    }
}
Is this the correct way of doing it? Why would we be getting different file sizes?
Any suggestions will be appreciated
Thanks
In response to "Why would we be getting different file sizes?":
From the CloudAppendBlob.appendFromByteArray documentation
"This API should be used strictly in a single writer scenario
because the API internally uses the append-offset conditional header
to avoid duplicate blocks which does not work in a multiple writer
scenario." If you are indeed using a single writer, you need to
explicitly set the value of
BlobRequestOptions.AbsorbConditionalErrorsOnRetry to true.
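A minimal sketch of wiring that flag through, assuming the classic Microsoft.WindowsAzure.Storage client library and the appendBlob, buffer, and read variables from your question:

var options = new BlobRequestOptions { AbsorbConditionalErrorsOnRetry = true };
// pass the options through on every append call, e.g.:
appendBlob.AppendFromByteArray(buffer, 0, read, accessCondition: null, options: options);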
You can also check whether you are exceeding the 50,000 committed block
limit. Your block sizes are relatively small, so this is a
possibility with sufficiently large files (16 KB * 50,000 ≈ 0.82 GB).
In response to "Is this the correct way of doing it?":
If you feel you need to use Append Blobs, try using the CloudAppendBlob.OpenWrite method to achieve functionality more similar to your code example for local storage.
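As a rough sketch (assuming the appendBlob and wReq objects from your question, and that the blob does not exist yet), OpenWrite lets you keep the same streaming loop you already have:

using (var sr = wReq.GetResponse().GetResponseStream())
using (var blobStream = appendBlob.OpenWrite(createNew: true))
{
    byte[] buffer = new byte[16 * 1024];
    int read;
    while ((read = sr.Read(buffer, 0, buffer.Length)) > 0)
    {
        // Same semantics as the FileStream version; your HTML-stripping
        // logic would slot in here unchanged.
        blobStream.Write(buffer, 0, read);
    }
}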
Block Blobs seem like they might be a more appropriate fit for your scenario. Can you please post the code you were using to upload Block Blobs? You should be able to upload to Block Blobs without running out of memory. You can upload different blocks in parallel to achieve faster throughput. Using Append Blobs to append (relatively) small blocks will result in degradation of sequential read performance, as currently append blocks are not defragmented.
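For reference, here is a minimal sketch of a memory-friendly block blob upload with the classic SDK's CloudBlockBlob (blockBlob and sourceStream are placeholder names): stage fixed-size blocks with PutBlock, then commit them with PutBlockList. Only one buffer is held in memory at a time, which avoids the > 1 GB memory problem.

var blockIds = new List<string>();
byte[] buffer = new byte[4 * 1024 * 1024]; // stage the file in 4 MB blocks
int read, blockNum = 0;
while ((read = sourceStream.Read(buffer, 0, buffer.Length)) > 0)
{
    // Block IDs must be base64-encoded and all the same length
    string blockId = Convert.ToBase64String(BitConverter.GetBytes(blockNum++));
    using (var ms = new MemoryStream(buffer, 0, read))
    {
        blockBlob.PutBlock(blockId, ms, null); // third argument is an optional MD5
    }
    blockIds.Add(blockId);
}
blockBlob.PutBlockList(blockIds); // commit the staged blocks in order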
Please let me know if any of these solutions work for you!
I need to use the text data in a program when someone prints a file.
I have a basic understanding of TCP/IP client and listener programming.
I can already send and receive .txt files between two machines.
But how do I receive the file contents if the files are in docx, xlsx, pdf or any other format?
My requirement is:
I want to use the contents (text) of a file in another program when someone prints a file.
Please suggest alternative ways to do it, if any.
Thanks in advance.
Since you haven't posted any code, I'll write the code part "my way", but you should have a basic understanding after reading this.
First, on both ends (client and server) you should agree on a unified protocol that describes what data you're sending. An example could be:
[3 bytes - ASCII extension][4 bytes - length of the file][X bytes - file contents]
Then your receiver can read data according to the protocol: first read 3 bytes to decide what format the file has, then read 4 bytes that tell you how large the incoming file is. Lastly, read the content and write it directly to a file. For example, a 100-byte report.pdf would be framed as the ASCII bytes 'pdf', then the 4-byte little-endian integer 100, then the 100 content bytes. An example receiver could look like this:
byte[] extensionBuffer = new byte[3];
if (3 != networkStream.Read(extensionBuffer, 0, 3))
    return;
string extension = Encoding.ASCII.GetString(extensionBuffer);

byte[] lengthBuffer = new byte[sizeof(int)];
if (sizeof(int) != networkStream.Read(lengthBuffer, 0, sizeof(int)))
    return;
int length = BitConverter.ToInt32(lengthBuffer, 0);

int recv = 0;
using (FileStream stream = File.Create(nameOfTheFile + "." + extension))
{
    int current;
    // Read exactly 'length' bytes; ReadByte returns -1 at end of stream.
    // A 0x00 sentinel would not work here, since binary files contain zero bytes.
    while (recv < length && (current = networkStream.ReadByte()) != -1)
    {
        stream.WriteByte((byte)current);
        recv++;
    }
    stream.Flush();
}
On the sender side you read the file extension, open the file stream, get its length, send the length to the receiver, and "redirect" each byte from the FileStream into the NetworkStream. That can look something like this:
FileInfo meFile = //.. get the file
byte[] extBytes = Encoding.ASCII.GetBytes(meFile.Extension.TrimStart('.')); // 3-byte extension without the dot
using (FileStream stream = meFile.OpenRead())
{
    networkStream.Write(extBytes, 0, extBytes.Length);
    byte[] lengthBytes = BitConverter.GetBytes((int)stream.Length); // 4-byte length, per the protocol
    networkStream.Write(lengthBytes, 0, lengthBytes.Length);
    while (stream.Position < stream.Length)
    {
        networkStream.WriteByte((byte)stream.ReadByte());
    }
}
This approach is fairly easy to implement and doesn't require big changes if you want to send different file types. It lacks validation, but I don't think you need that functionality. A sketch of wiring the receiver to a listener follows below.
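If it helps, here is a minimal sketch of hosting the receiver above behind a TcpListener (the port number is an arbitrary assumption; the sender would connect with a TcpClient and use its GetStream() as networkStream):

// Requires System.Net and System.Net.Sockets
var listener = new TcpListener(IPAddress.Any, 9000);
listener.Start();
using (TcpClient client = listener.AcceptTcpClient())
using (NetworkStream networkStream = client.GetStream())
{
    // ... run the receive logic shown above against networkStream
}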
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
Stream stream = response.GetResponseStream();

int sizeToRead = (int)response.ContentLength;
int sizeRead = 0;
int buffer = 1;
byte[] bytes = new byte[sizeToRead];

while (sizeToRead > 0)
{
    int rs = sizeToRead > buffer ? buffer : sizeToRead;
    stream.Read(bytes, sizeRead, rs);
    sizeToRead -= rs;
    sizeRead += rs;
}

stream.Close();
System.IO.File.WriteAllBytes("c:\\tmp\\b.mp3", bytes);
I have the above piece of code. Its purpose is to download an mp3 file from somewhere and save it to c:\tmp\filename, and it works perfectly.
However, if I change the buffer size to something other than 1, say 512, the downloaded mp3 file is scratchy. I compared the file downloaded by my program with the one downloaded via a browser, and found that some bytes of the mp3 file downloaded by my program are set to 0 (their file sizes are the same, though).
Besides, I have also used Fiddler to monitor the traffic while using the above piece of code to download the mp3 file. I diffed the mp3 downloaded by my program against the browser's; all the bytes on the wire were the same.
So I guess the problem is in the stream reading. Does anyone know why this happens, and how to solve it without setting the buffer size to 1?
Stream.Read returns an int that tells you how many bytes were actually read. If you're dealing with a stream you had better actually take in that information and act on it.
To put it another way, just because you asked for 2 bytes to be read, doesn't mean that your buffer contains 2 valid bytes.
If you need to retrieve a particular number of bytes (that you know of), then you should loop until you've obtained that number of bytes.
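For instance, a small helper along these lines (a sketch; ReadFully is just an illustrative name) loops until the requested count has arrived or the stream ends:

// Requires System.IO
static int ReadFully(Stream stream, byte[] buffer, int offset, int count)
{
    int total = 0;
    while (total < count)
    {
        int read = stream.Read(buffer, offset + total, count - total);
        if (read == 0)
            break; // end of stream before 'count' bytes arrived
        total += read;
    }
    return total; // compare against 'count' to detect a short read
}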
Is stream.Read() returning the same value as rs? Try this:
byte[] bytes = new byte[sizeToRead];
while (sizeToRead > 0)
{
    int rs = sizeToRead > buffer ? buffer : sizeToRead;
    rs = stream.Read(bytes, sizeRead, rs);
    if (rs == 0)
        break; // the stream ended before the expected length
    sizeToRead -= rs;
    sizeRead += rs;
}
I am developing a TCP file transfer client-server program. At the moment I am able to send text files and other file formats perfectly fine, such as .zip, with all contents intact on the server end. However, when I transfer a .gif, the end result is a gif with the same size as the original but with only part of the image showing, as if most of the bytes were lost or not written correctly on the server end.
The client sends a 1 KB header packet with the name and size of the file to the server. The server then responds with OK if ready, and creates a fileBuffer as large as the file to be sent.
Here is some code to demonstrate my problem:
// Serverside method snippet dealing with data being sent
while (true)
{
    // Spin the data in
    if (streams[0].DataAvailable)
    {
        streams[0].Read(fileBuffer, 0, fileBuffer.Length);
        break;
    }
}

// Finished receiving file, write from buffer to created file
FileStream fs = File.Open(LOCAL_FOLDER + fileName, FileMode.CreateNew, FileAccess.Write);
fs.Write(fileBuffer, 0, fileBuffer.Length);
fs.Close();
Print("File successfully received.");

// Clientside method snippet dealing with a file send
while (true)
{
    con.Read(ackBuffer, 0, ackBuffer.Length);
    // Wait for OK response to start sending
    if (Encoding.ASCII.GetString(ackBuffer) == "OK")
    {
        // Convert file to bytes
        FileStream fs = new FileStream(inPath, FileMode.Open, FileAccess.Read);
        fileBuffer = new byte[fs.Length];
        fs.Read(fileBuffer, 0, (int)fs.Length);
        fs.Close();

        con.Write(fileBuffer, 0, fileBuffer.Length);
        con.Flush();
        break;
    }
}
I've tried a binary writer instead of just using the filestream with the same result.
Am I incorrect in believing successful file transfer to be as simple as conversion to bytes, transportation and then conversion back to filename/type?
All help/advice much appreciated.
It's not about your image .. it's about your code.
If your image bytes were lost or not written correctly, that means your file transfer code is wrong, and even a .zip or any other file you receive could end up corrupted.
It's a huge mistake to set the byte buffer length to the file size. Imagine you're going to send a large file of about 1 GB .. then it's going to take 1 GB of RAM. For an ideal transfer you should loop over the file while sending.
Here's a way to send/receive files nicely with no size limitation.
Send File
using (FileStream fs = new FileStream(srcPath, FileMode.Open, FileAccess.Read))
{
    long fileSize = fs.Length;
    long sum = 0; // total of sent bytes
    int count = 0;
    data = new byte[1024 * 8]; // 8 KB buffer .. you might use a smaller size as well
    while (sum < fileSize)
    {
        count = fs.Read(data, 0, data.Length);
        network.Write(data, 0, count);
        sum += count;
    }
    network.Flush();
}
Receive File
long fileSize = /* the file size you are going to receive */;
using (FileStream fs = new FileStream(destPath, FileMode.Create, FileAccess.Write))
{
    int count = 0;
    long sum = 0; // total of received bytes
    data = new byte[1024 * 8]; // 8 KB buffer .. you might use a smaller size as well
    while (sum < fileSize)
    {
        // NetworkStream.Read blocks until data arrives, so there is
        // no need to busy-poll DataAvailable here
        count = network.Read(data, 0, data.Length);
        fs.Write(data, 0, count);
        sum += count;
    }
}
happy coding :)
When you write over TCP, the data can arrive in a number of packets. I think your early tests happened to fit into one packet, but this gif file is arriving in 2 or more. So when you call Read, you'll only get what's arrived so far - you'll need to check repeatedly until you've got as many bytes as the header told you to expect.
I found Beej's guide to network programming a big help when doing some work with TCP.
As others have pointed out, the data doesn't necessarily all arrive at once, and your code is overwriting the beginning of the buffer each time through the loop. The more robust way to write your reading loop is to read as many bytes as are available and increment a counter to keep track of how many bytes have been read so far so that you know where to put them in the buffer. Something like this works well:
int totalBytesRead = 0;
int bytesRead;
do
{
bytesRead = streams[0].Read(fileBuffer, totalBytesRead, fileBuffer.Length - totalBytesRead);
totalBytesRead += bytesRead;
} while (bytesRead != 0);
Stream.Read will return 0 when there's no data left to read.
Doing things this way will perform better than reading a byte at a time. It also gives you a way to ensure that you read the proper number of bytes. If totalBytesRead is not equal to the number of bytes you expected when the loop is finished, then something bad happened.
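Since your 1 KB header already carries the file size, you can go one step further and stop at exactly that many bytes (a sketch; expectedFileSize is a placeholder for the size parsed from your header):

// Requires System.IO for IOException
int totalBytesRead = 0;
while (totalBytesRead < expectedFileSize)
{
    int bytesRead = streams[0].Read(fileBuffer, totalBytesRead,
                                    fileBuffer.Length - totalBytesRead);
    if (bytesRead == 0)
        throw new IOException("Connection closed before the whole file arrived.");
    totalBytesRead += bytesRead;
}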
Thanks for your input, tvanfosson. I tinkered around with my code and managed to get it working. The synchronicity between my client and server was off. I took your advice, though, and replaced my read with reading a byte at a time.
I'm trying to inflate a string that was compressed with zlib's deflate, but it's failing, apparently because it doesn't have the right header. I read elsewhere that the C# solution to this problem is:
public static byte[] FlateDecode(byte[] inp, bool strict) {
    MemoryStream stream = new MemoryStream(inp);
    InflaterInputStream zip = new InflaterInputStream(stream);
    MemoryStream outp = new MemoryStream();
    byte[] b = new byte[strict ? 4092 : 1];
    try {
        int n;
        while ((n = zip.Read(b, 0, b.Length)) > 0) {
            outp.Write(b, 0, n);
        }
        zip.Close();
        outp.Close();
        return outp.ToArray();
    }
    catch {
        if (strict)
            return null;
        return outp.ToArray();
    }
}
But I know nothing about C#. I can surmise that all it's doing is adding a prefix to the string, but what that prefix is, I have no idea. Would someone be able to express this function (or even just the header creation and string concatenation) in C++?
The data I'm trying to inflate was taken from a PDF that uses zlib deflation.
Thanks a million,
Wyatt
I've had better luck using SharpZipLib for zlib interop than with the native .NET Framework classes. It correctly handles streams from C++ (native zlib) and from Java's compression classes without any funny business being needed.
I can't see any prefixes, sorry. Here's what the logic appears to be; sorry this isn't in C++:
MemoryStream stream = new MemoryStream(inp);
InflaterInputStream zip = new InflaterInputStream(stream);
Create an inflate stream from the data passed
MemoryStream outp = new MemoryStream();
Create a memory buffer stream for output
byte[] b = new byte[strict ? 4092 : 1];
try {
int n;
while ((n = zip.Read(b, 0, b.Length)) > 0) {
If you're in strict mode, read up to 4092 bytes - or 1 in non-strict mode - into a byte buffer
outp.Write(b, 0, n);
Write all the bytes decoded (which may be fewer than 4092) to the output memory buffer stream
zip.Close();
outp.Close();
return outp.ToArray();
Clean up, and return the output memory buffer stream as an array.
I'm a bit confused, though: why not just cut array b off at n elements and return that, rather than go via a MemoryStream? The code also really ought to take care to clean up the memory streams and the zip stream on exception (e.g. with using), since they're all IDisposable; but I guess that's not really important here, since they don't correspond to I/O file handles, only memory structures.
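For what it's worth, here is a sketch of the same function written with using blocks (assuming SharpZipLib's ICSharpCode.SharpZipLib.Zip.Compression.Streams namespace for InflaterInputStream), so the streams are disposed even when an exception is thrown mid-read:

public static byte[] FlateDecode(byte[] inp, bool strict)
{
    using (var stream = new MemoryStream(inp))
    using (var zip = new InflaterInputStream(stream))
    using (var outp = new MemoryStream())
    {
        byte[] b = new byte[strict ? 4092 : 1];
        try
        {
            int n;
            while ((n = zip.Read(b, 0, b.Length)) > 0)
            {
                outp.Write(b, 0, n); // accumulate however many bytes each read returns
            }
            return outp.ToArray();
        }
        catch
        {
            // non-strict mode returns whatever was decoded before the error
            return strict ? null : outp.ToArray();
        }
    }
}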