I have a table that has a blob column representing a file.
I'd like to run a LinqToSql query that returns a name and description of the file, along with the file size... but in the interests of not killing performance, I obviously don't want to download the whole blob!
var q = from f in MyFiles
select new {f.Name, f.Description, f.Blob.Length};
appears to pull the entire blob from the DB, then calculate its length in local memory.
How can I do this so that I only get the blob size, without downloading the entire blob?
I think the best choice in your case is to store the blob size in a separate column when saving the file to the database.
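If changing the schema is not an option, you can also have SQL Server compute the size with DATALENGTH and bypass the blob mapping entirely. A minimal sketch using LINQ to SQL's ExecuteQuery, assuming db is your DataContext, the underlying table is named MyFiles, and FileSizeRow is a hypothetical projection class:

public class FileSizeRow
{
    public string Name;
    public string Description;
    public long BlobSize;
}

// DATALENGTH runs server-side, so only the size crosses the wire, not the blob
var q = db.ExecuteQuery<FileSizeRow>(
    @"SELECT Name, Description, CAST(DATALENGTH(Blob) AS BIGINT) AS BlobSize FROM MyFiles");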
I have a high-scale distributed system that downloads a lot of large .csv files and indexes the data every day.
Let's say our file (file.csv) is:
col1 col2 col3
user11 val12 val13
user21 val22 val23
Then we read this file row-wise and store the byte offset of where the row for user11 or user21 is located in this file, e.g.:
Index table -
user11 -> 1120-2130 (byte offset)
user21 -> 2130-3545 (byte offset)
When someone says "delete the data for user11", we look up this table, download and open the file, and delete the bytes at that offset from the file. Please note, this byte offset covers the entire row.
How can I design the system to process parquet files?
Parquet files operate column-wise. To get an entire row of, say, 10 columns, will I have to make 10 calls? Then form the entire row, calculate its byte range, and store that in the table?
Then, while deleting, I will have to form the row again and then delete the bytes?
The other option is to store the byte offset of each column instead and process column-wise, but that will blow up the index table.
How can parquet files be efficiently processed in a row-wise manner?
The current system is a background job in C#.
Using Cinchoo ETL, an open-source library, you can convert CSV to a parquet file easily:
string csv = @"Id,Name
1,Tom
2,Carl
3,Mark";

using (var r = ChoCSVReader.LoadText(csv)
    .WithFirstLineHeader()
    )
{
    using (var w = new ChoParquetWriter("*** PARQUET FILE PATH ***"))
        w.Write(r);
}
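Since the question is about row-wise processing: Cinchoo's ChoParquetReader can also enumerate a parquet file record by record. A minimal sketch, reusing the same placeholder path and the Id/Name columns from the sample above:

using (var r = new ChoParquetReader("*** PARQUET FILE PATH ***"))
{
    // each record comes back as a dynamic row, so you can process row-wise
    foreach (dynamic rec in r)
        Console.WriteLine($"{rec.Id}, {rec.Name}");
}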
For more information, please check the https://www.codeproject.com/Articles/5270332/Cinchoo-ETL-Parquet-Reader article.
Sample fiddle: https://dotnetfiddle.net/Ra8yf4
Disclaimer: I'm the author of this library.
I want to save a reference to an image that will be stored in the database.
But I am not sure how to approach this in C# (Entity Framework).
Using EF's code-first approach.
In the model class, should I use a string imageReference, or should I use byte[]? Or is there another, better solution? What I want is that once the database is created, that column should say Blob, or whatever type is used to hold large objects like images.
I am also thinking that only saving a reference in the database, instead of the image itself, might be a solution. But I don't know which is better.
You need to declare it as a byte array (byte[]), and the column in your DB will be a blob (varbinary(max) on SQL Server).
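A minimal sketch of the code-first model (ImageEntity is a hypothetical name; EF maps a byte[] property to varbinary(max) on SQL Server by default):

public class ImageEntity
{
    public int Id { get; set; }
    public byte[] imageReference { get; set; }
}

Then you can save an image like this: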
var imageEntity = new ImageEntity
{
    imageReference = convertImageToByteArray(image) // 'image' is the System.Drawing.Image to store
};
_Context.Images.Add(imageEntity);
_Context.SaveChanges();
Convert your image to a byte array:
public byte[] convertImageToByteArray(Image imageIn)
{
    using (var memStream = new MemoryStream())
    {
        imageIn.Save(memStream, ImageFormat.Gif); // you may choose other formats
        return memStream.ToArray();
    }
}
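To read it back, the reverse conversion might look like this (note the GDI+ caveat that the stream must stay open for the lifetime of the Image):

public Image convertByteArrayToImage(byte[] bytes)
{
    // GDI+ keeps reading from the stream after Image.FromStream returns,
    // so it is intentionally not disposed here
    var memStream = new MemoryStream(bytes);
    return Image.FromStream(memStream);
}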
I would recommend saving the file path as a reference instead of storing the image as a blob, unless you have no choice. A larger database degrades performance, and the hard disk does a better job of handling files. If your image files are larger than 1MB, the file system has an advantage over SQL Server. Storing files on the file system also gives you greater flexibility: you can migrate your files elsewhere later and just update the paths in the database during migration, which you can't do when the bytes live in the DB.
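A minimal sketch of the path-as-reference approach (ImagePath and imageRoot are hypothetical names, assuming an ImageEntity variant with a string path property instead of a byte[]):

// save the bytes to disk under a unique name...
var fileName = Guid.NewGuid() + ".jpg";
File.WriteAllBytes(Path.Combine(imageRoot, fileName), imageBytes);

// ...and store only the path in the database
_Context.Images.Add(new ImageEntity { ImagePath = fileName });
_Context.SaveChanges();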
I have converted an image to a byte array using the code below to store it in the database:
if (Request.Content.IsMimeMultipartContent())
{
    var provider = new MultipartMemoryStreamProvider();
    await Request.Content.ReadAsMultipartAsync(provider);
    foreach (var file in provider.Contents)
    {
        var filename = file.Headers.ContentDisposition.FileName.Trim('\"');
        var attachmentData = new AttachmentData();
        attachmentData.File = await file.ReadAsByteArrayAsync(); // await instead of .Result to avoid blocking
        db.AttachmentData.Add(attachmentData);
        db.SaveChanges();
        return Ok(attachmentData);
    }
}
Here the File column in the DB is of type "varbinary(max)", and in the EF model it is a byte array (byte[]).
Using the above code I was able to save the image in the File column as something similar to "0x30783839353034453437304430413143136303832....." (this length exceeds 43679 characters, which is more than the default given to any column, so the data got truncated while storing it).
Do I have to change the default length (43679) of the column to store it in the database?
Am I retrieving and storing the image in the database the correct way? I was also thinking of storing the image as a Base64 string, but that would still exceed 43679.
I am using AngularJS to show the image on the front end, which uses Web API to fetch and save the image as a byte array in the database.
Yes, it really helps to not know how databases work.
First:
varbinary(max)
This stores up to 2 GB, not 43679 bytes.
Then:
similar to "0x30783839353034453437304430413143136303832.....
This is not how it is stored. This is a textual (hex) representation in the output.
(this length exceeds 43679 characters, which is more than the default given to any column, so the data got truncated while storing it)
There is no such default on any column. The limit comes from SQL Server Management Studio, which truncates long values in its output, likely will not be able to handle images as images, and has a lot of limitations in general. But that is an admin UI, not the database.
I was also thinking of storing the image as a Base64 string, but that would still exceed 43679.
Actually no, it will exceed it by even more: Base64 encodes every 3 bytes of binary as 4 characters, so your data will be roughly a third longer than the raw binary.
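A quick way to see the Base64 overhead for yourself (a standalone check, not part of the original code):

var bytes = new byte[30000];                  // 30,000 raw bytes
var base64 = Convert.ToBase64String(bytes);
Console.WriteLine(base64.Length);             // prints 40000: every 3 bytes become 4 characters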
I am trying to add the bytes of a file to a field in the database which is of type VARBINARY, but the data needs to be appended in chunks due to file size constraints.
Is there any example code/website showing how to do this? Or is it even possible to append bytes to this field using Entity Framework?
I need to append the data, as getting a byte array of 1 GB+ is going to cause memory exceptions, so I think this is the only way.
Some code I have done:
using (var stream = File.OpenRead(fn))
{
    var buffer = new byte[81920];   // read in manageable chunks rather than one huge array
    int bytesRead;
    while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
    {
        // append the first 'bytesRead' bytes of 'buffer' to the VARBINARY column here
    }
}
Thanks for any help
The basic idea is to create a stored procedure that implements an update like this:
UPDATE MyTable SET Col = Col + @newdata WHERE Id = @Id
and to invoke it using ExecuteSqlCommand (see the MSDN docs for ExecuteSqlCommand).
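For example, called from EF for each chunk read from the file (a sketch, assuming a DbContext named db, the chunk/id variables from the reading loop, and SqlParameter from System.Data.SqlClient):

// concatenates the chunk onto the existing value, server-side;
// if Col can start as NULL, use ISNULL(Col, 0x) + @newdata instead
db.Database.ExecuteSqlCommand(
    "UPDATE MyTable SET Col = Col + @newdata WHERE Id = @Id",
    new SqlParameter("@newdata", chunk),
    new SqlParameter("@Id", id));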
But in this case you're only transferring the problem to the SQL Server side (the column must be retrieved, modified, and written back).
To really get rid of the memory problem, implement your stored procedure using UPDATETEXT, which is much more efficient for your requirements:
Updates an existing text, ntext, or image field. Use UPDATETEXT to change only a part of a text, ntext, or image column in place. Use WRITETEXT to update and replace a whole text, ntext, or image field
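Note that UPDATETEXT is deprecated on recent SQL Server versions; for a varbinary(max) column the supported equivalent is the UPDATE ... .WRITE clause, which appends in place without rewriting the whole value. A sketch under the same assumptions as above:

// .WRITE(@newdata, NULL, 0) appends @newdata at the end of the existing value;
// the column must already be non-NULL (initialize it to 0x first)
db.Database.ExecuteSqlCommand(
    "UPDATE MyTable SET Col.WRITE(@newdata, NULL, 0) WHERE Id = @Id",
    new SqlParameter("@newdata", chunk),
    new SqlParameter("@Id", id));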
When storing large files in a database, it is usual to store the file on disk on the web server rather than in the database. In the database you store the path to the file, so your code can get to its contents without having to store gigs of data in the database.
However, if you are attempting to manipulate the contents of a 1GB+ file in memory, this is going to be interesting however you do it...
I'd like to save some images (jpeg) into a blob file. I have no idea where to start: how is a blob file generated? I searched Google and this site, but I couldn't find any example. I guess I don't understand blobs and databases. Your guidance is much appreciated.
You could try something like this:
// 'connection' is an open MySqlConnection
var cmd = connection.CreateCommand();
cmd.CommandText = "INSERT INTO mytable (id, blobcol) VALUES (1, @blobfile)";
cmd.Parameters.AddWithValue("@blobfile", File.ReadAllBytes(your_jpeg_file));
cmd.ExecuteNonQuery();
A BLOB is a binary field in which you can write (in general) an array of bytes.
So you can read your file as a byte[] and pass it as a query parameter.
What is a BLOB file
"In general, a blob is an amorphous and undefinable object."
The actual contents of a JPEG file, when read in their raw (as-is) format, can be considered a BLOB object. What you can do is simply read the entire JPEG file into a byte[] buffer, and whatever you get, you just put into a BLOB field in your database.