Architecture: Handling large scale photo upload and resizing

I have a system where users can upload full-resolution images of about 16 megapixels, which results in large files.
The current methodology is:
Receive the upload in an HTTP request.
Within the request, write the original file to the blob store.
Still within the request, make about 10 copies of the file at various resolutions. (These are thumbnails at different sizes, some for Hi-DPI (Retina) devices, as well as a dimension for full-sized viewing.) I also convert the images to WebP.
I then transfer all the results to blob stores in different regions for private CDN purposes.
Clearly, the issue is that since this is all done within an HTTP request, it consumes vastly more server resources than a typical HTTP request, especially when several users start uploading images in bulk at the same time. If a user uploads a large image, memory consumption jumps dramatically (I am using ImageMagick.NET for image processing).
Is this architecture more suitable?
1) Receive the file upload, write it to the blob store, add a notification to a processing queue, and return success to the user.
2) A separate worker server receives the notification of the new file and starts all the resizing, processing and replication.
3) I set the client-side JavaScript to not load the image previews for a few seconds, or have it retry if the image is not found (meaning that the image is still being processed, but is likely to show up soon).
At least this new method will scale more easily and have more predictable performance. But it seems like a lot of work just to handle something as everyday as photo uploading. Is there a better way?
I know the new method follows the same principle as using an external resizing service, but I want to do this in-house since I am concerned about the privacy of some of these third-party services. It would still mean I would have to adapt the client to deal with missing/unprocessed images.
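For illustration, the client-side retry idea mentioned above (try again when the preview is not found yet) could look roughly like the sketch below. It is written in Python rather than browser JavaScript purely to keep it short and runnable. The `fetch` callable, the attempt count and the delays are all assumptions for the example, not values from the original setup.

```python
import time
from typing import Callable, Optional


def fetch_with_retry(
    fetch: Callable[[], Optional[bytes]],
    attempts: int = 5,
    first_delay: float = 1.0,
    backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Call fetch() until it returns data or the attempts run out.

    fetch is a hypothetical callable that returns the image bytes, or None
    while the image is still being processed (for example on a 404).
    """
    delay = first_delay
    for attempt in range(attempts):
        data = fetch()
        if data is not None:
            return data
        # Wait a little longer before each new attempt, but do not
        # sleep after the final failed attempt.
        if attempt < attempts - 1:
            sleep(delay)
            delay *= backoff
    return None
```

Growing the delay between attempts keeps the extra requests cheap if the worker is slow. After the last attempt the caller gets `None` and can show a placeholder instead.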

Yes, what you're describing is a better way. It sounds more complicated, but it is how the majority of scalable sites handle heavy load: offload the work to a queue and let workers process it.
I'd add a correction in your case for step #2:
A separate worker server monitors a queue and starts all the resizing, processing and replication when a message appears instructing it to do so.
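A minimal sketch of that queue-and-worker flow, in Python rather than C# just to keep it compact. Everything named here is a stand-in: `blob_store` is an in-memory dict instead of a real blob store, `resize` is a placeholder for the actual image processing, and `SIZES` is a made-up list of target widths. A real deployment would use a durable queue so that messages survive restarts.

```python
import queue
import threading

blob_store = {}               # stand-in for the real blob store
upload_queue = queue.Queue()  # stand-in for a durable message queue
SIZES = [64, 128, 256]        # hypothetical target widths


def resize(data: bytes, width: int) -> bytes:
    # Placeholder for real image processing; it only tags the bytes.
    return f"{width}:".encode() + data


def handle_upload(name: str, data: bytes) -> str:
    """Request handler: store the original, enqueue, return immediately."""
    blob_store[f"original/{name}"] = data
    upload_queue.put(name)
    return "accepted"


def worker() -> None:
    """Worker loop: wait for messages and produce the resized copies."""
    while True:
        name = upload_queue.get()
        try:
            if name is None:  # sentinel used to stop the worker
                break
            original = blob_store[f"original/{name}"]
            for width in SIZES:
                blob_store[f"{width}/{name}"] = resize(original, width)
        finally:
            upload_queue.task_done()
```

The request handler does only the cheap part (one write and one enqueue), so its cost no longer depends on image size. The expensive work happens in `worker`, which can run on separate machines and be scaled independently.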

Another option would be to use the new Azure WebJobs feature. In fact, your scenario seems to be so common (in terms of image processing) that it's listed as one of the typical scenarios on MSDN:
Image processing or other CPU-intensive work. A common feature of web
sites is the ability to upload images or videos. Often you want to
manipulate the content after it's uploaded, but you don't want to make
the user wait while you do that.
Whether it's better or not, I'll leave up to you to decide.

Related

Is there a way to capture a view and save it?

We have many elements in a ContentPage. The goal is to take a picture of a specific element and then have access to that data, to save it or possibly do other things such as cropping it.
So this question is twofold: is there a way to photographically capture a given element? And is there a way to do this if the element is not fully in view? For example, a ScrollView could have some of its elements not currently in view.
Our attempt at this is to use device-specific screenshots and crop them to a given element. The screenshots are working, but we aren't having luck with cropping. Not to mention that, in the case described above, the screenshot will not work because the view isn't fully visible.
Is there a way to obtain the "graphical" (photo) data of an element at a given time even if it's not currently visible/partially visible?
Thanks in advance for reading.

After a lot of talking, this is what I understand:
1) The users of your application are the workers of your company.
2) The application is for managing the accounts of your company's customers.
3) The customers have no access to their data, in any shape or form.
4) Part of the customer data is their email address.
5) You want to send a copy of their data to the customers.
6) As emails do not allow formatting that well, you want to send that data as a screenshot of the UI.
If I got all that right:
You are neck-deep in an XY problem. Or rather a ((XY)Y)Y problem, an XY problem of the third generation.
The obvious solution would be to fix point 3 and give your customers access to their data already:
You can do that via an extra program, an app, a web page or anything similar. If they can receive emails, they can download an app or open a web page and see their data there. It may need a login, but nothing special. There are even ways to encode data or direct links into emails and register your program with a custom format. Indeed, that is how Steam links on the desktop work.
Meanwhile, the in-house users get a "Customer Management" program that allows more direct access to the customers' data in the database (I assume you have a backend database, but it is at least possible you do not).
If you cannot fix point 3 for stupid boss/legal reasons (these are the only valid reasons I can imagine, and I cannot stress enough how stupid the boss would have to be in that case), you should at least be able to fix things at points 5/6:
The first option would be to send text emails. People often underestimate just how much is possible with pure text. It is basically like writing on a console, but even that is enough of a medium to make art in.
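As a rough illustration of how far plain text can go, here is a small Python sketch that lays out label/value pairs in aligned columns. The function name and the sample fields are made up for the example; the real customer data would come from wherever the application stores it.

```python
def format_text_report(title, rows):
    """Render (label, value) pairs as an aligned plain-text block."""
    # Pad every label to the width of the longest one so the
    # separators line up in a monospaced or plain-text mail client.
    label_width = max((len(label) for label, _ in rows), default=0)
    lines = [title, "=" * len(title)]
    for label, value in rows:
        lines.append(f"{label.ljust(label_width)} : {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_text_report("Account", [("Name", "Ada"), ("Balance", "12.50")]))
```

The output needs no HTML and no attachment, and it survives any mail client that can display text.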
The other ways involve managing the HTML limitations:
Safe HTML Mail
The main security issue with HTML mails is the "downloading external content" part. Those operations cannot be reliably scanned by virus scanners and the like, especially in the age of HTTPS. Unless we talk about Kaspersky and the stupid idea they had.
And even if they could be scanned reliably, the mere request for those files can be used by spam senders to verify that the email address is still in use. So it is a no-go too.
So you will need to inline as much as possible. Inlining images is not really possible. While HTML totally has a standard for that (you Base64-encode the binary into the HTML), this does not work reliably. At least Microsoft Outlook is known to interpret all Base64 images in the email as attachments, even the inlined ones. And even if they have fixed this or it is no longer a relevant issue, inlining images tends to increase the HTML size significantly.
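To put a number on the size increase: Base64 turns every 3 bytes into 4 characters, so an inlined image is roughly a third larger than the original file. A short Python sketch of the encoding and of that arithmetic (the helper names are just for this example):

```python
import base64


def to_data_uri(data: bytes, mime_type: str = "image/png") -> str:
    """Build a data: URI that embeds the bytes directly in the HTML."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def base64_length(raw_size: int) -> int:
    """Length of the Base64 text for raw_size bytes (with padding)."""
    # Every group of up to 3 input bytes becomes 4 output characters.
    return 4 * ((raw_size + 2) // 3)
```

For example, a 3,000-byte image becomes 4,000 characters of Base64 before any surrounding markup is counted.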
You can use CSS to some degree. But aside from inlining it, you might have to go back a step or two. In the end, email programs are really weak web browsers, so they do not necessarily support all the latest stuff instantly. Anything below CSS 3.0 should work reliably by now. But you had better ask someone once you have more specific requirements for this email.
PDF Attachment
Somewhat more established is to create a .PDF file and send it. All those bills and other documents in .PDF format that you get have been created on demand from a database, by the same code that also sends the email. In many cases the demand was automated too, or the sending program was an outright background process.
.PDF allows all the formatting you could want. It can take images inline. And there are plenty of ways to create a .PDF from code. And as you can send it as an attachment, the virus scanner has time to go over it. And we are not in the last millennium, when a PDF reader was an uncommon program to have installed (I still remember the times when a current version of Acrobat PDF Reader was delivered on every CD with a .PDF-format handbook).
If you are still dead serious about the whole "make an image of the UI to send that" idea, my only question is: how many years have been allotted for that?

Low Bandwidth Website Design

A little while back, one of the junior developers at our company was tasked with creating a website for users to enter timesheets offsite. Mostly this is used for staff that reside offshore and have limited bandwidth (it's satellite internet, so we're already looking at a 500ms - 600ms response time, typically with only 10KB/s or less, including 10% - 20% intermittent packet loss).
So it's a challenging situation...
Recently I've been tasked with helping the junior to improve the speed and functionality of the website, mostly for my own benefit, since I'm usually a desktop dev. One thing I've noticed is that the website is using MultiView and I'm wondering if that's the best approach. I can see the reasoning; download the entire website once, then just make queries back and forth, showing/hiding the various views as necessary. Except it doesn't seem to work as smoothly as that.
95% of operations require a round trip to the server; i.e. add a new timesheet - need to tell the server, which in turn creates a new entry in the database. When the server is done, it seems to cause the client to download the entire webpage again, which is obviously counterproductive.
So my questions are as follows:
Is this the expected behaviour, given the above situation? i.e. Should the entire webpage be getting re-downloaded once the server has completed its actions?
If so, is this the best approach for the situation? Would it be better to have smaller, individual pages for the various features (timesheets/leave/etc.)?
I know this is probably a bit opinion based, but any ideas or assistance is greatly appreciated; for both our benefits.

Going from memory, MultiView only renders one of the views, not all of them. But since you mention MultiView, that tells me you are using the older WebForms technology, which often carries large amounts of overhead saving/restoring state. You can try to optimize that, especially if you are using some kind of grid control.
A better approach may be to ditch WebForms and switch to a newer technology like MVC. Rewrite the application to use AJAX with a webservice that returns JSON whenever possible to reduce the amount of data that needs to be sent to and from the server. Using MVC will also reduce the number of resources required for a page load (No resource.axd, etc) which will help page load times, especially over high latency links.
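A back-of-the-envelope estimate shows why smaller payloads matter so much on this link. The Python sketch below uses the figures from the question (about 10 KB/s and roughly 500 ms per round trip). The page and JSON sizes are hypothetical, and the model deliberately ignores packet loss and parallel requests.

```python
def transfer_seconds(
    payload_bytes: int,
    requests: int = 1,
    bandwidth_bytes_per_s: float = 10_000,
    round_trip_s: float = 0.5,
) -> float:
    """Rough time to fetch payload_bytes spread over sequential requests."""
    # One round trip of latency per request, plus the raw transfer time.
    return requests * round_trip_s + payload_bytes / bandwidth_bytes_per_s


if __name__ == "__main__":
    # Hypothetical sizes: a 200 KB full page reload over 5 requests
    # versus a single 2 KB JSON response.
    print(transfer_seconds(200_000, requests=5))
    print(transfer_seconds(2_000, requests=1))
```

Under these assumptions the full reload takes over twenty seconds while the JSON call takes under one, which is the gap the AJAX/JSON approach is meant to close.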
Make sure the server is set to compress dynamic pages with GZIP.
Compress and minify your JavaScript and CSS.
Don't use inline styles (the style attribute) in your HTML (use classes or IDs+children selectors) to reduce HTML size.
Bundle all your JavaScript and CSS.
Sprite your images in CSS where possible.
Run your images through a good image optimizer like http://kraken.io
Make sure you are caching whatever you can, and the cache duration is set properly.
Minify your HTML.
Stop using WebForms (or watch your page state and control state very closely).
Check into some of the SPA architectures out there; you may be able to make the whole application "offline-able" with the exception of the calls to get/update/create data.
Ultimately, each page should only require 1 HTML file, 1 CSS file, 1 JavaScript file, and 1 sprite sheet on the first page hit, and then every page after that should only require a single HTML file.
You might also want to look into using a client-side library like Angular or Knockout to handle rendering views. This can reduce the amount of traffic that needs to be sent (although it likely will increase the number of requests by one).

I think the best bet is a SPA (Single Page App) with AngularJS. Done right, it greatly reduces the number of HTTP requests. Navigation does not cause an entire page reload in any case. JavaScript files, CSS files, etc. are loaded just once, at app load time. Once the app is loaded in the browser, the traffic is mainly JSON sent back and forth.
There are some tricks you should apply to reduce app load time:
Bundle JavaScript files into just one minified JavaScript file.
Bundle CSS files into just one CSS file.
Leverage the HTTP cache. You can use file versioning combined with the max-age directive of the Cache-Control HTTP header, so the browser does not even ask the server whether the file has changed.
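One common way to do the file versioning is to put a hash of the file's content into its name, so the URL changes only when the content does. The Python sketch below illustrates that idea. The 8-character digest and the one-year `max-age` are choices made for this example, not requirements.

```python
import hashlib


def versioned_name(filename: str, content: bytes) -> str:
    """Insert a short content hash before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    stem, dot, ext = filename.rpartition(".")
    if not dot:  # no extension: just append the hash
        return f"{filename}.{digest}"
    return f"{stem}.{digest}.{ext}"


def cache_headers(max_age_seconds: int = 31_536_000) -> dict:
    """Response header that lets the browser reuse the file without asking."""
    return {"Cache-Control": f"public, max-age={max_age_seconds}"}
```

Because a changed bundle gets a new name, the long cache lifetime is safe: pages that reference the new name fetch it once, and old copies simply stop being requested.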
Some tools to help:
Fiddler: look at what is being cached and what isn't.
Facebook's Augmented Traffic Control

To my understanding, Ajax would be the best choice for you. If 95% of operations need the server and each one reloads the whole page with the new elements, performance will suffer.
So instead, do partial reloads with Ajax via jQuery. jQuery has plenty of functionality that uses Ajax to reload a specific portion of the web page instead of the whole page. It would increase the performance a lot.
One more thing I would like to add is that the response coming from the server might be a huge chunk of data. So instead of sending the response from the server as-is, enable GZip compression on the website. It would reduce the size of the response, and the page would load/reload much faster.
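To get a feel for what GZip does to typical markup, here is a small Python sketch that compresses a response body and reports both sizes. The sample table rows are invented for the example. Real pages compress less dramatically than this very repetitive input, so measure your own responses.

```python
import gzip


def gzip_savings(body: bytes):
    """Return (original size, compressed size) for a response body."""
    compressed = gzip.compress(body)
    return len(body), len(compressed)


if __name__ == "__main__":
    # Invented sample: 200 near-identical timesheet rows.
    html = b"<tr><td>2014-01-01</td><td>8.0</td></tr>\n" * 200
    raw, packed = gzip_savings(html)
    print(f"{raw} bytes -> {packed} bytes")
```

On a web server this is normally a configuration switch rather than application code; the sketch only shows the effect on the payload.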
Other than these, place your CSS and JS code in separate .css and .js files instead of inside the page itself (and make sure to reuse those files across as many pages as possible). The browser will cache those files and reuse them instead of downloading them every time you connect to the server.

I believe that you have already figured out what's wrong. No, MultiView is not good if it is implemented as is, without tweaks. If your website uses ViewState and on top of that you have MultiView implemented, then it is going to be a costly affair.
Here are your options.
To reuse as much of the existing code as possible, I would recommend converting your methods to HTTP GET/POST methods, which can then be called separately from the relevant actions in the HTML.
Don't re-render the entire page; render only the content that changes on a menu action.
Change the non-changing part of your page/site to static content and apply compression to the static content.
Enable page caching.
Cache the data offline wherever possible. (Remember that it comes with the overhead of syncing data.)
If you are considering a revamp, give some thought to the HTML5 offline features.

Do I need to create thumbnails for WinRT application

I am coming from an ASP.NET background where, if you want to display a photo gallery, you have to have two files for each photo, i.e. the original and a separate thumbnail file.
If I were to create a Win8 app gallery that has, say, 100 photos per view, would it be okay performance-wise to simply change the size of the photo, i.e. only have the one file? (These are loaded from the file system.)
I know it may depend on certain conditions, but generally what is the best way to do it?

Depends on the file size and where you get them from. If the files are on the system, you could use StorageFile.GetThumbnailAsync. Otherwise, if the files are large and you are getting them from somewhere else (a service), you could load them only as they scroll into view for the user. Make sure to dispose of objects when you are no longer using them, as bitmaps are notorious for eating up memory.
100 images doesn't sound like a lot to me, but it's better to have numbers to back that statement up, as I have no idea how large the files are.
Here are some general guidelines for thumbnails from MSDN
I would try different ways to deal with it and use the performance tools to see what the end result is. Maybe you could group the images and have the user view one group at a time, maybe use placeholder images, or maybe the files aren't that big and it's no problem at all to simply resize depending on the view.
For lazy loading (recommended with many items), use data virtualization by implementing ISupportIncrementalLoading. You can find more information about that on MSDN.

Host, get and display a huge library of various images

We are developing an e-commerce application and I have a bit of a problem.
Right now we have 2 MVC applications:
A main MVC application whose role is to manage the inventory and put items up for sale;
Another MVC application that will serve as the e-commerce site, on which the items put up for sale by the main application will be displayed.
My main problem is that these two share the same image library, and this library is huge (about 60,000 images and counting). Up to now, to keep things fast, each project has a physical copy of each image ("~/Images/BankImages/FullImage/theFirstImage.jpeg", and so on), but you can guess that this is a pretty huge library that takes up a lot of room.
I'm looking for options on how I could develop something that would return an image in whichever C# format. I was thinking about a web service, I suppose, whose task would be to return these images when called, but I don't know how to do it (newb here), and I think I may lose a bit of speed because a call to the web service may not return the needed image immediately, and I may have to retrieve a few hundred of these images at the same time.
So I'm looking for suggestions. What would be the best way to solve my main problem and avoid (if possible) having to copy the whole image library each time?
Thanks a lot!

When you say:
My main problem is that these two share the same image library
This means you need a single asset repository, so you need a CDN: the same assets, served through a common API.

Best approach to show default image

Looking for information: I am creating a catalog website that includes a list of products. Each product has an image stored on the hard drive on the server. If the image does not exist, I want to show a default image. What's the best way of doing this? I am using C# and considered checking on the server side whether the image exists. But as some pages could have 50-60 images, this would slow down the page. I use jQuery on the client side. Any tips on this?

This is a great question, as the situation arises in many circumstances. I see several options:
1) check for image availability during rendering of the catalog and use a link to the default image for items that do not have an image,
2) check for image availability in the image controller and return the default image when not available
3) put images inline in the document using data URLs
A major factor here is the possibility of caching.
Option (1) facilitates caching of the default image, but precludes caching of the catalog page. It is better if there are many items without an image, since such items will not even generate a hit to the server. Furthermore, if there's a low chance that an image would appear for an item, you could cache the index too (for a reasonably short time).
Option (2) facilitates caching of the index page, but each image will have to send a request to the server. Again, you could use aggressive caching to avoid the same requests the second time the page is rendered.
Option (3) is best if your images are small and if the catalog page is relatively static. Be sure to use caching on the server side though while generating the page to reduce the load on the filesystem/database.

Sounds like this is a web application, so you should look into doing some caching. Even though image file lookups are expensive, once your page gets hit a few times the disk lookups will no longer be necessary.
Or you could store the information about whether a product image exists in your database. Then you prepopulate the database with the information and no disk checks are necessary.
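A cheap middle ground between per-image disk checks and a database column is to list the image directory once and keep the file names in memory. The Python sketch below shows that idea. The `<product id>.jpg` naming scheme and the `default.jpg` file name are assumptions for the example, since the original does not say how images are named.

```python
import os

DEFAULT_IMAGE = "default.jpg"  # hypothetical name of the fallback image


def build_image_index(image_dir: str) -> set:
    """Read the directory once and remember which files exist."""
    return set(os.listdir(image_dir))


def image_for(product_id: str, index: set, default: str = DEFAULT_IMAGE) -> str:
    """Pick the product's image if it exists, otherwise the default."""
    # Assumes images are named '<product id>.jpg'; adjust to the real scheme.
    filename = f"{product_id}.jpg"
    return filename if filename in index else default
```

With the index built at startup (and rebuilt when images are added), rendering a page with 50-60 products costs 50-60 set lookups instead of 50-60 file system calls.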

Your best bet is to do this server-side as you suggest. You could do it client-side (attempt to load image, and load a default image if that fails), but this is not really what client-side scripting is designed for. You're making the user do extra HTTP requests, which is slower for the user.
An even better solution, as marcind suggests, is to pre-populate the database with default images. So in your CMS, when you create a new item, it assigns a default image URL to itself. You can then manually change it from there.

How does your jQuery code know the name of the image?
Seeing that your image files are physical files on the server and are accessible from a browser, I'd probably leave that part as is since that implies you don't have to serve the images yourself and IIS can handle that for you as a static file.
So your jQuery code obviously knows the name of the image for each product. I assume this name is given to it by some server-side process, so that process needs to give it either the name of the image for the product or the default image.
Some part of your code has to go through the process of figuring out whether an image exists for the product and react accordingly. If you're using a database for your products, then you could have a field in the product table that indicates whether the product has an image or not.
