Strange results from OpenReadAsync() when reading data from Azure Blob storage - c#

I'm having a go at modifying an existing C# (dot net core) app that reads a type of binary file to use Azure Blob Storage.
I'm using Windows.Azure.Storage (8.6.0).
The issue is that this app reads the binary data from files from a Stream in very small blocks (e.g. 5000-6000 bytes). This reflects how the data is structured.
Example pseudo code:
var blocks = new List<byte[]>();
var numberOfBytesToRead = 6240;
var numberOfBlocksToRead = 1700;
using (var stream = await blob.OpenReadAsync())
{
stream.Seek(3000, SeekOrigin.Begin); // start reading at a particular position
for (int i = 1; i <= numberOfBlocksToRead; i++)
{
byte[] traceValues = new byte[numberOfBytesToRead];
stream.Read(traceValues, 0, numberOfBytesToRead);
blocks.Add(traceValues);
}
}`
If I try to read a 10mb file using OpenReadAsync(), I get invalid/junk values in the byte arrays after around 4,190,000 bytes.
If I set StreamMinimumReadSize to 100Mb it works.
If I read more data per block (e.g. 1mb) it works.
Some of the files can be more than 100Mb, so setting the StreamMinimumReadSize may not be the best solution.
What is going on here, and how can I fix this?

Are the invalid/junk values zeros? If so (and maybe even if not) check the return value from stream.Read. That method is not guaranteed to actually read the number of bytes that you ask it to. It can read less. In which case you are supposed to call it again in a loop, until it has read the total amount that you want. A quick web search should show you lots of examples of the necessary looping.

Related

Reading data from a socket directly into a Memory Mapped File

I'm trying to create a .NET Core application and am getting a little stuck on IPC.
I have data coming in (let's say, from a socket, or a file, some sort of streaming interface), via executable 1. Now, I want this data to be read by executable 2. So, I've created a MMF, where executable 1 writes data, and executable 2 reads data. All is well.
However, I would really like to skip the "copy" step here. If I have a message coming in via a socket, I need to read the message (ergo; store it in some byte array), and then copy it to the appropriate Memory Mapped File.
Is there a way (especially now, with the new Memory, Span etc) to have them use the same memory?
This code almost seems to work, but not completely:
const int bufferSize = 1024;
var mappedFile = MemoryMappedFile.CreateNew("IPCFile", bufferSize);
var fileAccessor = mappedFile.CreateViewAccessor(0, bufferSize, MemoryMappedFileAccess.ReadWrite);
// THere now exists a region of byte[] * bufferSize somewhere. I want to write to that.
byte[] bufferMem = new byte[bufferSize];
// I know the memory address where this area is, and I know the size of it:
unsafe
{
byte* startOfRegion = (byte*)0;
fileAccessor.SafeMemoryMappedViewHandle.AcquirePointer(ref startOfRegion);
// But how do I "assign" this region to the managed object?
// This throws "System.MissingMethodException: 'No parameterless constructor defined for type 'System.Byte[]'."
//bufferMem = Marshal.PtrToStructure<byte[]>(new IntPtr(startOfRegion));
// This almost looks like it works, but bufferMem remains null. Does not give any errors though.
bufferMem = Unsafe.AsRef<byte[]>(startOfRegion);
}
// For a shorter example, just using a file stream
var incomingData = File.OpenRead(#"C:\Temp\plaatje.png");
// Pass in the "reserved" memory region. But StreamPipeReaderOptions wants a MemoryPool<byte>, not a byte[].
// How would I cast that? MemoryPool is abstract, so can't even be instantiated
var reader = PipeReader.Create(incomingData, new StreamPipeReaderOptions(bufferMem));
ReadResult readResult;
// Actually read data
while (true)
{
readResult = await reader.ReadAsync();
if (readResult.IsCompleted || readResult.IsCanceled)
break;
reader.AdvanceTo(readResult.Buffer.Start, readResult.Buffer.End);
}
// Now signal the other process to read the contents of the MMF and continue
This question seems to ask a similar thing, but does not have an answer, and is from 2013.

How to efficiently set the number of bytes to download for HttpWebRequest?

I'm currently working on a file downloader project. The application is designed so as to support resumable downloads. All downloaded data and its metadata(download ranges) are stored on the disk immediately per call to ReadBytes. Let's say that I used the following code snippet :-
var reader = new BinaryReader(response.GetResponseStream());
var buffr = reader.ReadBytes(_speedBuffer);
DownloadSpeed += buffr.Length;//used for reporting speed and zeroed every second
Here _speedBuffer is the number of bytes to download which is set to a default value.
I have tested the application by two methods. First is by downloading a file which is hosted on a local IIS server. The speed is great. Secondly, I tried to download the same file's copy(from where it was actually downloaded) from the internet. My net speed is real slow. Now, what I observed that if I increase the _speedBuffer then the downloading speed from the local server is good but for the internet copy, speed reporting is slow. Whereas if I decrease the value of _speedBuffer, the downloading speed(reporting) for the file's internet copy is good but not for the local server. So I thought, why shouldn't I change the _speedBuffer at runtime. But all the custom algorithms(for changing the value) I came up with were in-efficient. Means the download speed was still slow as compared other downloaders.
Is this approach OK?
Am I doing it the wrong way?
Should I stick with default value for _speedBuffer(byte count)?
The problem with ReadBytes in this case is that it attempts to read exactly that number of bytes, or it returns when there is no more data to read.
So you receive a packet containing 99 bytes of data, then calling ReadBytes(100) will wait for the next packet to include that missing byte.
I wouldn't use a BinaryReader at all:
byte[] buffer = new byte[bufferSize];
using (Stream responseStream = response.GetResponseStream())
{
int bytes;
while ((bytes = responseStream.Read(buffer, 0, buffer.Length)) > 0)
{
DownloadSpeed += bytes;//used for reporting speed and zeroed every second
// on each iteration, "bytes" bytes of the buffer have been filled, store these to disk
}
// bytes was 0: end of stream
}

Partial File reading in C#

I am developing an application which requires to read a text file which is continuously updating. I need to read the file till the end of file(at that very moment) and need to remember this location for my next file read. I am planning to develop this application in C#.net . How can I perform these partial reads and remember the locations as C# does not provide pointers in file handling ?
Edit : File is updated every 30 seconds and new data is appended to the old file.
I tried with maintaining the length of previous read and then reading the data from that location but the file can not be accessed by two applications at the same time.
You can maintain the last offset of the read pointer in the file. You can do sth like this
long lastOffset = 0;
using (var fs = new FileStream("myFile.bin", FileMode.Open))
{
fs.Seek(lastOffset, SeekOrigin.Begin);
// Read the file here
// Once file is read, update the lastOffset
lastOffset=fs.Seek(0, SeekOrigin.End);
}
Open the file, read everything, save the count of bytes you read in a variable (let's assume it's read_until_here).
next time you read the file, just take the new information (whichever comes after the location of your read_until_here variable ...
I am planning to develop this application in C#.net . How can I perform these partial reads and remember the locations as C# does not provide pointers in file handling ?
Not entirely sure why you'd be concerned about the supposed lack of pointers... I'd look into int FileStream.Read(byte[] array, int offset, int count) myself.
It allows you to read from an offset into a buffer, as many bytes as you want, and tells you how many bytes were actually read... which looks to be all the functionality you'd need.

How to shrink a string and be able to find the original later

I am working on this app that is still in beta, so I set up a logging system. The log is too long to be used in a mailto url so I thought about shrinking the text and then decrypt it.
Let's say I have a 50 line long log, this should help me make something like this zef16z1e6f8 and then have a procedure to use that to find out all 50 lines of the log.
I would like to note that I don't need any fancy TripleDES encryption or something.
First I would suggest re-looking at why you can't just mail the entire log content? Unless you have large logs (>5MB) I'd suggest just mailing the log. If you still want to pursue some shrinking strategy there are two I'd consider.
If you want a simple reference string which can be used to lookup your log data at some later stage you can just associate some sort of identifier with the data (e.g. a GUID as suggested by Eugene). This has the benefit of having a constant length, irrespective of the log size.
Alternatively you could just compress the log, this will shrink the data somewhat (anything up to about 90%, as Dan mentioned). However this has the downside of having a variable length and for very large logs may still exceed your size limitations. If you go this route you could do something like this (not tested):
private string GetCompressedString()
{
byte[] byteArray = Encoding.UTF8.GetBytes("Some long log string");
using (var ms = new MemoryStream())
{
using (var gz = new GZipStream(ms, CompressionMode.Compress, true))
{
ms.Write(byteArray, 0, byteArray.Length);
}
ms.Position = 0;
var compressedBytes = new byte[ms.Length];
ms.Read(compressedBytes, 0, compressedBytes.Length);
return Convert.ToBase64String(compressedBytes);
}
}

Decrypt a media file and play it without saving to the HDD

I need to develop WinForms app, which will be able to decrypt a media file (a movie) and then play it without saving decrypted file to the HDD (the decrypted file finally will be stored in the memory stream) The problem is, how then play that movie from the memory stream ? Is it possible ?
It is possible, but I expect you will need to write your own DirectShow filter to do so, which once created will act as a file reader (implementing the IFileSourceFilter interface), and, as the video plays, will read successive frames from the file, decrypt them, and pass them up to the next filter.
This will only work however if the file is encrypted in a sequential form (i.e. each individual frame is encrypted as a seperate entity). Otherwise, you will have to decrypt the entire file at once, which could be intensive, slow, and probably have to hit the hard drive to store the end file.
But anyway, this link should get you started: http://msdn.microsoft.com/en-us/library/dd375454%28VS.85%29.aspx
I'm afraid that in order to create the DirectShow filter, you will need to use C++, and it isn't the easiest API to get your head around.
An alternate way to do it may be to use the Windows Media Format SDK, which allows you to pass custom video packets to a renderer in real time. There is also a good interop library for C# (WindowsMediaLib)
First of all, it's a good idea to encrypt source video piece by piece. So the encrypted video file is a set of encrypted parts. Just split original file into parts of the same size and encrypt them.
Here the scheme (OutputStream is a stream of encrypted video file, InputStream is original file stream, ChunkSize is a size of each part in the original file, also we write some metadata: sizes of original and encrypted pieces):
using (BinaryWriter Writer = new BinaryWriter(OutputStream))
{
byte[] Buf = new byte[ChunkSize];
List<int> SourceChunkSizeList = new List<int>();
List<int> EncryptedChunkSizeList = new List<int>();
int ReadBytes;
while ((ReadBytes = InputStream.Read(Buf, 0, Buf.Length)) > 0)
{
byte[] EncryptedData = Encrypt(Buf, ReadBytes);
OutputStream.Write(EncryptedData, 0, EncryptedData.Length);
SourceChunkSizeList.Add(ReadBytes);
EncryptedChunkSizeList.Add(EncryptedData.Length);
}
foreach (int SourceChunkSize in SourceChunkSizeList)
Writer.Write(SourceChunkSize);
foreach (int EncryptedChunkSize in EncryptedChunkSizeList)
Writer.Write(EncryptedChunkSize);
}
Such metadata will help to find encrypted part rapidly.
Secondly, don't decrypt encrypted data in each read request. Cache it: video playing in the most case is just a sequential reading.
The tricky part is how to play encrypted video file. You may write either a DirectShow filter (video specific solution), or check a 3rd party product (multipurpose solution): BoxedApp, a virtualization SDK. What's cool is that they have an article that shows how to solve exact your task, look: http://boxedapp.com/encrypted_video_streaming.html

Categories

Resources