Background
I am writing and using a very simple CGI-based (Perl) content management tool for two pro-bono websites. It provides the website administrator with HTML forms for events where they fill the fields (date, place, title, description, links, etc.) and save it. On that form I allow the administrator to upload an image related to the event. On the HTML page displaying the form, I am also showing a preview of the picture uploaded (HTML img tag).
The Problem
The problem happens when the administrator wants to change the picture. He would just have to hit the "browse" button, pick a new picture and press ok. And this works fine.
Once the image is uploaded, my back-end CGI handles the upload and reloads the form properly.
The problem is that the image shown does not get refreshed. The old image is still shown, even though the database holds the right image. I have narrowed it down to the fact that the IMAGE IS CACHED in the web browser. If the administrator hits the RELOAD button in Firefox/Explorer/Safari, everything gets refreshed fine and the new image just appears.
My Solution - Not Working
I am trying to control the cache by writing an HTTP Expires instruction with a date very far in the past.
Expires: Mon, 15 Sep 2003 1:00:00 GMT
Remember that I am on the administrative side and I don't really care if the pages take a little longer to load because they are always expired.
But, this does not work either.
Notes
When uploading an image, its filename is not kept in the database. It is renamed as Image.jpg (to simplify things when using it). When replacing the existing image with a new one, the name doesn't change either. Just the content of the image file changes.
The webserver is provided by the hosting service/ISP. It uses Apache.
Question
Is there a way to force the web browser to NOT cache things from this page, not even images?
I am juggling with the option to actually "save the filename" with the database. This way, if the image is changed, the src of the IMG tag will also change. However, this requires a lot of changes throughout the site and I would rather not do it if there is a better solution. Also, this will still not work if the new image uploaded has the same name (say the image is photoshopped a bit and re-uploaded).
Armin Ronacher has the correct idea. The problem is random strings can collide. I would use:
<img src="picture.jpg?1222259157.415" alt="">
Where "1222259157.415" is the current time on the server.
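Generate the timestamp in JavaScript with Date.now(), or in Python with time.time(). (performance.now() counts from page load rather than from the epoch, so its values can repeat across page loads.)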
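A minimal Python sketch of the timestamp idea, in case the page is generated server-side. The helper name add_cache_buster and the parameter name "t" are my own assumptions, not part of the answers above; the only real point is that the query string is merged properly when the URL already has one.

```python
import time
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def add_cache_buster(url, now=None, param="t"):
    """Return url with a timestamp query parameter appended.

    The helper name and `param` are illustrative assumptions.
    """
    # Use the current server time unless a fixed value is passed (handy for tests)
    stamp = time.time() if now is None else now
    parts = urlsplit(url)
    # Keep any query parameters the URL already has
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, f"{stamp:.3f}"))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Calling add_cache_buster("picture.jpg") at render time gives a URL that differs on every page load, so the browser cannot reuse its cached copy. The price is that the image is never served from cache at all.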
Simple fix: Attach a random query string to the image:
<img src="foo.cgi?random=323527528432525.24234" alt="">
What the HTTP RFC says:
Cache-Control: no-cache
But that doesn't work that well :)
I use PHP's file modified time function, for example:
echo "<img src='Images/image.png?" . filemtime('Images/image.png') . "' />";
If you change the image then the new image is used rather than the cached one, due to having a different modified timestamp.
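The same modification-time trick can be sketched in Python for anyone not on PHP. The function name versioned_src is hypothetical; os.path.getmtime is the standard-library counterpart of filemtime().

```python
import os


def versioned_src(url, file_path):
    """Append the file's modification time (whole seconds) as a version tag."""
    # The URL only changes when the file on disk changes,
    # so the browser may keep caching between uploads.
    mtime = int(os.path.getmtime(file_path))
    return f"{url}?{mtime}"
```

Unlike a random number or the current time, this keeps the cache useful: the URL stays the same until the file is actually replaced.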
I would use:
<img src="picture.jpg?20130910043254">
where "20130910043254" is the modification time of the file.
When uploading an image, its filename is not kept in the database. It is renamed as Image.jpg (to simplify things when using it). When replacing the existing image with a new one, the name doesn't change either. Just the content of the image file changes.
I think there are two types of simple solutions: 1) those which come to mind first (straightforward solutions, because they are easy to come up with), 2) those which you end up with after thinking things over (because they are easy to use). Apparently, you won't always benefit if you choose to think things over. But the second option is rather underestimated, I believe. Just think why PHP is so popular ;)
use Class="NO-CACHE"
sample html:
<div>
<img class="NO-CACHE" src="images/img1.jpg" />
<img class="NO-CACHE" src="images/imgLogo.jpg" />
</div>
jQuery:
$(document).ready(function ()
{
$('.NO-CACHE').attr('src',function () { return $(this).attr('src') + "?a=" + Math.random() });
});
javascript:
var nods = document.getElementsByClassName('NO-CACHE');
for (var i = 0; i < nods.length; i++)
{
nods[i].attributes['src'].value += "?a=" + Math.random();
}
Result:
src="images/img1.jpg" => src="images/img1.jpg?a=0.08749723793963926"
You may write a proxy script for serving images - that's a bit more work though. Something like this:
HTML:
<img src="image.php?img=imageFile.jpg&some-random-number-262376" />
Script:
// PHP
if( isset( $_GET['img'] ) && is_file( IMG_PATH . basename( $_GET['img'] ) ) ) {
// read contents; basename() strips directory parts such as "../"
$img = file_get_contents( IMG_PATH . basename( $_GET['img'] ) );
// no-cache headers - complete set
// these copied from php.net/header, tested myself - works
header("Expires: Sat, 26 Jul 1997 05:00:00 GMT"); // Some time in the past
header("Last-Modified: " . gmdate("D, d M Y H:i:s") . " GMT");
header("Cache-Control: no-store, no-cache, must-revalidate");
header("Cache-Control: post-check=0, pre-check=0", false);
header("Pragma: no-cache");
// image related headers
header('Accept-Ranges: bytes');
header('Content-Length: '.strlen( $img )); // How many bytes we're going to send
header('Content-Type: image/jpeg'); // or image/png etc
// actual image
echo $img;
exit();
}
Actually, either the no-cache headers or the random number in the image src should be sufficient on its own, but since we want to be bulletproof, we use both.
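For comparison, here is roughly what such a proxy could look like as a small Python WSGI app. This is only a sketch under my own assumptions: IMG_DIR, the app name and the JPEG content type are placeholders, and the header set is a trimmed version of the PHP one above.

```python
import os

IMG_DIR = "images"  # hypothetical directory holding the uploaded files

NO_CACHE_HEADERS = [
    ("Cache-Control", "no-store, no-cache, must-revalidate"),
    ("Pragma", "no-cache"),
    ("Expires", "Sat, 26 Jul 1997 05:00:00 GMT"),  # a date in the past
]


def image_app(environ, start_response):
    """Minimal WSGI app: serve IMG_DIR/<name> with no-cache headers."""
    # basename() drops any directory part, so "../x" cannot escape IMG_DIR
    name = os.path.basename(environ.get("PATH_INFO", ""))
    path = os.path.join(IMG_DIR, name)
    if not name or not os.path.isfile(path):
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    with open(path, "rb") as fh:
        body = fh.read()
    headers = NO_CACHE_HEADERS + [
        ("Content-Type", "image/jpeg"),  # assumes JPEG uploads
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

It can be tried locally with wsgiref.simple_server.make_server("", 8000, image_app) from the standard library.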
I checked the answers around the web, and at first the best one seemed to be (it actually isn't):
<img src="image.png?cache=none">
However, if you add a cache=none parameter (the static word "none"), it doesn't affect anything; the browser still loads the image from cache.
Solution to this problem was:
<img src="image.png?nocache=<?php echo time(); ?>">
Here you add a Unix timestamp to make the parameter dynamic, so the URL is never already in the cache. This worked.
However, my problem was a little different:
I was loading on the fly generated php chart image, and controlling the page with $_GET parameters. I wanted the image to be read from cache when the URL GET parameter stays the same, and do not cache when the GET parameters change.
To solve this problem, I needed to hash $_GET but since it is array here is the solution:
$chart_hash = md5(implode('-', $_GET));
echo "<img src='/images/mychart.png?hash=$chart_hash'>";
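A Python sketch of the same "hash the request parameters" idea, with one change I am assuming is wanted: it hashes keys as well as values and sorts them first, so the same parameters in a different order give the same hash. The function name params_hash is hypothetical.

```python
import hashlib
from urllib.parse import urlencode


def params_hash(params):
    """Stable MD5 hex digest of a dict of GET parameters (keys and values)."""
    # Sorting makes the digest independent of parameter order
    canonical = urlencode(sorted(params.items()))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()
```

The chart URL would then be built as "/images/mychart.png?hash=" + params_hash(query_params), which stays cacheable as long as the parameters stay the same.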
Edit:
Although the above solution works just fine, sometimes you want to serve the cached version UNTIL the file is changed. (with the above solution, it disables the cache for that image completely)
So, to serve cached image from browser UNTIL there is a change in the image file use:
echo "<img src='/images/mychart.png?hash=" . filemtime('mychart.png') . "'>";
filemtime() gets file modification time.
I'm a NEW Coder, but here's what I came up with, to stop the Browser from caching and holding onto my webcam views:
<meta Http-Equiv="Cache" content="no-cache">
<meta Http-Equiv="Pragma-Control" content="no-cache">
<meta Http-Equiv="Cache-directive" Content="no-cache">
<meta Http-Equiv="Pragma-directive" Content="no-cache">
<meta Http-Equiv="Cache-Control" Content="no-cache">
<meta Http-Equiv="Pragma" Content="no-cache">
<meta Http-Equiv="Expires" Content="0">
<meta Http-Equiv="Pragma-directive: no-cache">
<meta Http-Equiv="Cache-directive: no-cache">
Not sure what works on what Browser, but it does work for some:
IE: Works when webpage is refreshed and when website is revisited (without a refresh).
CHROME: Works only when webpage is refreshed (even after a revisit).
SAFARI and iPad: Doesn't work, I have to clear the History & Web Data.
Any Ideas on SAFARI/ iPad?
When uploading an image, its filename is not kept in the database. It is renamed as Image.jpg (to simplify things when using it).
Change this, and you've fixed your problem. I use timestamps, as with the solutions proposed above: Image-<timestamp>.jpg
Presumably, whatever problems you're avoiding by keeping the same filename for the image can be overcome, but you don't say what they are.
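As a rough illustration of the Image-<timestamp>.jpg naming, here is a short Python sketch. The function name and defaults are assumptions of mine; the upload handler would store the returned name wherever the page template can read it.

```python
import time


def stamped_name(now=None, prefix="Image", ext=".jpg"):
    """Build a file name such as Image-1287361287.jpg from the upload time."""
    # Whole seconds are enough unless two uploads can land in the same second
    stamp = int(time.time() if now is None else now)
    return f"{prefix}-{stamp}{ext}"
```

Because the name changes on every upload, the old cached copy is simply never requested again.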
You must use a unique file URL, like this:
<img src="cars.png?1287361287" alt="">
But this technique means high server usage and bandwidth wastage.
Instead, you should use the version number or date. Example:
<img src="cars.png?2020-02-18" alt="">
But suppose you want the image never to be served from cache. If the page itself is not cached, this is possible with PHP or any other server-side code.
<img src="cars.png?<?php echo time();?>" alt="">
However, it is still not effective. Reason: Browser cache ...
The last but most effective method is Native JAVASCRIPT. This simple code finds all images with a "NO-CACHE" class and makes the images almost unique. Put this between script tags.
var items = document.querySelectorAll("img.NO-CACHE");
for (var i = items.length; i--;) {
var img = items[i];
img.src = img.src + '?' + Date.now();
}
USAGE
<img class="NO-CACHE" src="https://upload.wikimedia.org/wikipedia/commons/6/6a/JavaScript-logo.png" alt="">
RESULT(s) Like This
https://example.com/image.png?1582018163634
Your problem is that despite the Expires: header, your browser is re-using its in-memory copy of the image from before it was updated, rather than even checking its cache.
I had a very similar situation uploading product images in the admin backend for a store-like site, and in my case I decided the best option was to use javascript to force an image refresh, without using any of the URL-modifying techniques other people have already mentioned here. Instead, I put the image URL into a hidden IFRAME, called location.reload(true) on the IFRAME's window, and then replaced my image on the page. This forces a refresh of the image, not just on the page I'm on, but also on any later pages I visit - without either client or server having to remember any URL querystring or fragment identifier parameters.
I posted some code to do this in my answer here.
From my point of view, disabling image caching is a bad idea altogether.
The root problem here is how to force the browser to update an image when it has been updated on the server side.
Again, from my personal point of view, the best solution is to disable direct access to images. Instead, access images via a server-side filter/servlet/other similar tools/services.
In my case it's a REST service that returns the image and attaches an ETag to the response. The service keeps a hash of every file; if a file changes, its hash is updated. It works perfectly in all modern browsers. Yes, it takes time to implement, but it is worth it.
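To make the ETag flow concrete, here is a framework-free Python sketch. It is not the service described above, just an assumed minimal version: make_etag and respond are hypothetical names, and the ETag is derived from a SHA-256 of the bytes.

```python
import hashlib


def make_etag(body):
    """Quoted ETag derived from the image bytes."""
    return '"' + hashlib.sha256(body).hexdigest()[:32] + '"'


def respond(body, if_none_match=None):
    """Return (status, headers, payload) for an image request."""
    etag = make_etag(body)
    # "no-cache" lets the browser store the image but forces it to revalidate
    headers = {"ETag": etag, "Cache-Control": "no-cache"}
    if if_none_match == etag:
        # The browser's copy is still current: send no body
        return 304, headers, b""
    return 200, headers, body
```

The browser sends the stored ETag back in If-None-Match; an unchanged image costs only a small 304 response, while a changed one is delivered in full.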
The only exception is favicons. For some reason, it does not work for them. I could not force the browser to update its cache from the server side. ETags, Cache-Control, Expires, Pragma headers: nothing helped.
In this case, adding some random/version parameter into url, it seems, is the only solution.
Add a time stamp: <img src="picture.jpg?t=<?php echo time();?>">
This gives the URL a different number at the end on every request (at one-second resolution) and stops the browser from caching it.
With the potential for badly behaved transparent proxies in between you and the client, the only way to totally guarantee that images will not be cached is to give them a unique uri, something like tagging a timestamp on as a query string or as part of the path.
If that timestamp corresponds to the last update time of the image, then you can cache when you need to and serve the new image at just the right time.
I assume the original question concerns images stored alongside some text info. So, if you have access to that text context when generating the src=... URL, consider storing and using the CRC32 of the image bytes instead of a meaningless random number or time stamp. Then, when a page with plenty of images is displayed, only the updated images will be reloaded. If storing the CRC is impossible, it can be computed and appended to the URL at runtime.
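Computing the CRC32 at runtime could look like this in Python. The helper name crc_src and the "crc" parameter name are assumptions; zlib.crc32 is the standard-library function.

```python
import zlib


def crc_src(url, image_bytes):
    """Append the CRC32 of the image content as a query parameter."""
    # Mask to an unsigned 32-bit value and print as 8 hex digits
    checksum = zlib.crc32(image_bytes) & 0xFFFFFFFF
    return f"{url}?crc={checksum:08x}"
```

The URL changes only when the bytes change, so re-uploading an identical file does not force a reload.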
Ideally, you should add a button/keybinding/menu to each webpage with an option to synchronize content.
To do so, you would keep track of resources that may need to be synchronized, and either use xhr to probe the images with a dynamic querystring, or create an image at runtime with src using a dynamic querystring. Then use a broadcasting mechanism to notify all components of the webpages that are using the resource to update to use the resource with a dynamic querystring appended to its url.
A naive example looks like this:
Normally, the image is displayed and cached, but if the user pressed the button, an xhr request is sent to the resource with a time querystring appended to it; since the time can be assumed to be different on each press, it will make sure that the browser will bypass cache since it can't tell whether the resource is dynamically generated on the server side based on the query, or if it is a static resource that ignores query.
The result is that you can avoid having all your users bombard you with resource requests all the time, but at the same time, allow a mechanism for users to update their resources if they suspect they are out of sync.
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<meta name="mobile-web-app-capable" content="yes" />
<title>Resource Synchronization Test</title>
<script>
function sync() {
// Build the cache-busting URL once so the probe and the images share it
var url = 'resource.bmp?i=' + new Date().getTime();
var xhr = new XMLHttpRequest();
xhr.onreadystatechange = function() {
if (this.readyState == 4 && this.status == 200) {
var images = document.getElementsByClassName("depends-on-resource");
for (var i = 0; i < images.length; ++i) {
var image = images[i];
if (image.getAttribute('data-resource-name') == 'resource.bmp') {
image.src = url;
}
}
}
};
xhr.open('GET', url, true);
xhr.send();
}
</script>
</head>
<body>
<img class="depends-on-resource" data-resource-name="resource.bmp" src="resource.bmp"></img>
<button onclick="sync()">sync</button>
</body>
</html>
I've found Chrome specifically tries to get clever with the URL arguments solution on images. That method to avoid cache only works some of the time.
The most reliable solution I've found is to add both a URL argument (E.g. time stamp or file version) AND also change the capitalisation of the image file extension in the URL.
<img src="picture.jpg">
becomes
<img src="picture.JPG?t=current_time">
All the answers are valid and work fine. But with them, the browser also creates another cache entry every time it loads that image with a different URL. So instead of changing the URL by adding query params to it, we can update the browser cache directly using cache.put:
caches.open('YOUR_CACHE_NAME').then(cache => {
const url = 'URL_OF_IMAGE_TO_UPDATE'
fetch(url).then(res => {
cache.put(url, res.clone())
})
})
cache.put updates the cache with a new response.
for more: https://developer.mozilla.org/en-US/docs/Web/API/Cache/put
I made a PHP script that automatically appends the timestamps on all images and also on links. You just need to include this script in your pages. Enjoy!
http://alv90.altervista.org/how-to-force-the-browser-not-to-cache-images/
The best solution is to append the current time to the end of the src URL, like
<img src="http://www.abc.com/123.png?t=current_time">
This removes the chance of referencing the already cached image.
To get the current time, one can use Date.now() in jQuery or plain JavaScript (performance.now() is measured from page load, so its values can repeat between loads).
Related
I'm currently working on a project where I need to create a "dashboard" which can be exported as pdf. I wanted to use Rotativa but as our application uses .NET framework 4.0 it's not possible. So I found the NReco PdfGenerator.
Now that's the code how I create the PDF result:
var ViewAsString = RenderViewAsString("~/Views/QMetrics/StandardDashboard.cshtml", viewModel);
var htmlToPdf = new NReco.PdfGenerator.HtmlToPdfConverter();
htmlToPdf.PageWidth = 1600;
htmlToPdf.PageHeight = 900;
var pdfBytes = htmlToPdf.GeneratePdf(ViewAsString);
FileResult FileResult = new FileContentResult(pdfBytes, "application/pdf");
FileResult.FileDownloadName = "Dashboard-" + viewModel.ProjectName + "-" +
DateTime.Now.ToString() + "-.pdf";
return FileResult;
It successfully creates the PDF page with all the content that comes from the backend (Project information, and so on) but the page looks very ugly. On the original page I have 2 columns and on the PDF page it puts everything in one column. I tried a few different page sizes and I also changed the layout to be non-responsive but nothing has changed.
My first suspicion was that the referenced CSS and JS files are not included when the PDF gets created, so I copied all the stuff that comes from external files (bootstrap, Chart.js) and pasted it directly into the .cshtml file. But nothing changed at all. My chart is not rendering/loading and the missing CSS stuff is still not there.
On the NReco PDFGenerator website they say that it supports complex CSS code and also javascript code so I don't really understand why this is not working.
Does anyone here have experience with NReco, or can someone recommend something else that works for .NET 4.0?
NReco PdfGenerator internally uses wkhtmltopdf tool, so you can check it and its options.
Regarding 2 columns: if you don't use flex/grid layout everything should work fine. Possibly you need to disable wkhtmltopdf's smart shrinking logic (enabled by default) and define the web page 'window' size explicitly (with the "--viewport-size 1600" option).
Regarding CSS and charts: you need to check that CSS files could be accessed by wkhtmltopdf, simplest way to do that is running wkhtmltopdf.exe from the command line and check console log output (or, handle PdfGenerator's "LogReceived" event in C#). For Chart.js ensure that chart container div has explicit width (not in %), and that there are no js errors (you can get them in console by specifying "--debug-javascript" option). If your js code uses 'bind' method you have to include polyfill as WebKit engine version used in wkhtmltopdf doesn't support 'bind'.
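To check these options outside of NReco, the command line can be assembled and run from a small script. This Python sketch assumes wkhtmltopdf is installed and on PATH; the function names and file names are placeholders, and only the flags named above plus wkhtmltopdf's --disable-smart-shrinking switch are used.

```python
import subprocess


def wkhtmltopdf_args(html_path, pdf_path, width=1600):
    """Build the argument list for a debugging run of wkhtmltopdf."""
    return [
        "wkhtmltopdf",
        "--viewport-size", str(width),   # emulate the browser window width
        "--disable-smart-shrinking",     # turn off the default shrinking logic
        "--debug-javascript",            # report JS errors and warnings
        html_path,
        pdf_path,
    ]


def render(html_path, pdf_path):
    """Run wkhtmltopdf and return (exit code, console output)."""
    result = subprocess.run(
        wkhtmltopdf_args(html_path, pdf_path),
        capture_output=True,
        text=True,
    )
    # Return both streams so no log line about missing CSS or JS errors is lost
    return result.returncode, result.stdout + result.stderr
```

Reading the returned console output is the quickest way to see whether a stylesheet failed to load or a script threw an error.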
I'm uploading an image to the server and then processing it. Funny thing is, after uploading, the image keywords are missing, although the other image properties are there.
There is no issue with parsing the tags, so please ignore below code snippet.
using (var xmp = Xmp.FromFile(workingFilePath, XmpFileMode.ReadOnly))
{
var iptc = new Iptc(xmp);
var Keywords = iptc.Keywords;
}
Note: I'm using FineUploader to upload image.
FineUploader configuration -
var manualUploader = new qq.FineUploader({
element: document.getElementById('fine-uploader-manual-trigger'),
template: 'qq-template-manual-trigger',
request: {
endpoint: '/image/uploadimage',
params: {
datestamp: datetimeStamp
}
},
callbacks: {
},
autoUpload: false,
multiple: true
});
qq(document.getElementById("trigger-upload")).attach("click", function () {
manualUploader.uploadStoredFiles();
});
Fineuploader log -
[Fine Uploader 5.10.1] Received 1 files.
[Fine Uploader 5.10.1] Attempting to validate image.
[Fine Uploader 5.10.1] Generating new thumbnail for 0
[Fine Uploader 5.10.1] Attempting to draw client-side image preview.
[Fine Uploader 5.10.1] Attempting to determine if _DSE8404.jpg can be rendered in this browser
[Fine Uploader 5.10.1] First pass: check type attribute of blob object.
[Fine Uploader 5.10.1] Second pass: check for magic bytes in file header.
[Fine Uploader 5.10.1] '_DSE8404.jpg' is able to be rendered in this browser
[Fine Uploader 5.10.1] Moving forward with EXIF header parsing for '_DSE8404.jpg'
[Fine Uploader 5.10.1] EXIF Byte order is little endian
[Fine Uploader 5.10.1] Found 10 APP1 directory entries
[Fine Uploader 5.10.1] Successfully parsed some EXIF tags
[Fine Uploader 5.10.1] Sending simple upload request for 0
[Fine Uploader 5.10.1] xhr - server response received for 0
Edit :
Looks like I found the issue. There are some Icelandic characters in the tags; that's what causes the problem. Does anyone know how to solve this?
Latest Edit
If those tags have been added from Adobe Photoshop Lightroom, I face the issue. But if the same tags are added on a Windows machine by updating the file properties, it works!
There could be two causes of your problem:
At some point you are rewriting your picture, probably with a class that either does not handle tags properly or strips them out because of its configuration.
If you just save the exact binary content you receive from the client, you will also retain the original tags, provided your image file is formatted the way you expect it to be.
If your image file is stored differently from what you expect, the tags may not be retrieved, depending on the way you are extracting them.
For instance, JPG/JPEG tags can be stored in various ways (XMP being one).
Check the following link for more details. You will see there are other ways to store tags (such as EXIF, Extended XMP, QVCI, FLIR).
To retrieve these tags you will have to parse them according to the way they are embedded in your image file.
From the server-side code you posted, you only seem to parse XMP tags. Depending on the software used to encode the original image, tags may be stored in an alternative format.
Although it looks obvious, my advice would be:
to ensure that your workflow does not involve any explicit or implicit image manipulation between the content sent by the client and the content saved on the server.
That being said, you will also have to ensure you are extracting tags in an appropriate way, depending on their format.
JPEG files can be really difficult to handle properly because of the various ways they may be stored.
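One way to see which metadata containers a given JPEG actually carries is to walk its segment headers. The Python sketch below is an assumption-laden diagnostic, not a metadata parser: it only looks at the well-known signatures of APP1 (EXIF or XMP) and APP13 (the Photoshop resource block, which is where IPTC keywords usually live), and it ignores extended XMP and padding bytes.

```python
import struct


def metadata_segments(jpeg_bytes):
    """Return labels for the metadata segments found before the image data."""
    if jpeg_bytes[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    found = []
    pos = 2
    while pos + 4 <= len(jpeg_bytes) and jpeg_bytes[pos] == 0xFF:
        marker = jpeg_bytes[pos + 1]
        if marker == 0xDA:  # start of scan: compressed image data follows
            break
        # Segment length is big-endian and includes its own two bytes
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        payload = jpeg_bytes[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            found.append("EXIF")
        elif marker == 0xE1 and payload.startswith(b"http://ns.adobe.com/xap/1.0/\x00"):
            found.append("XMP")
        elif marker == 0xED and payload.startswith(b"Photoshop 3.0\x00"):
            found.append("Photoshop IRB")
        pos += 2 + length
    return found
```

Running it on the file before upload and on the copy saved by the server shows whether a segment was dropped on the way, or whether the keywords simply sit in a container the server-side code does not read.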
Basically I'm trying to create an HTML file. I already have it written, but I want the user to be able to put some text in the textboxes, save it into strings, and use those later when creating the HTML file.
I tried playing a bit with StreamWriter but I don't think that is the best idea.
Also, I want it to open in the default web browser (or just in IE if that's easier) after the file is created.
I really need help, as I'm struggling especially with the creating part.
Thanks for reading!
You can also do this without external libraries.
Set up your HTML file as follows:
<!DOCTYPE html>
<html>
<head>
<title>{MY_TITLE}</title>
</head>
<body></body>
</html>
Then edit and save the HTML from C#:
const string fileName = "Foobar.html";
//Read HTML from file
var content = File.ReadAllText(fileName);
//Replace all values in the HTML
content = content.Replace("{MY_TITLE}", titleTextBox.Text);
//Write new HTML string to file
File.WriteAllText(fileName, content);
//Show it in the default application for handling .html files
Process.Start(fileName);
If you already have the HTML you want to export (just not customized), you could manually add format strings to it (like {0}, {1}, {2}) where you want to substitute text from your app, then embed it as a resource, load it in at runtime, substitute the TextBox text using string.Format, and finally write it out again. This is admittedly a really fragile way to do it, as you need to make sure the number of parameters agrees between the resource file and your call to string.Format. In fact, this is a horrible way to do it. Actually, you should do it the way @EmilePels suggests, which is basically a less fragile version of this answer.
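Whichever substitution style is used, text typed by the user should be HTML-escaped before it goes into the page, otherwise a stray "<" or "&" breaks the markup. A small Python sketch of named-placeholder replacement with escaping follows; fill_template is a hypothetical helper, and the {MY_TITLE} placeholder style is taken from the answer above.

```python
import html


def fill_template(template, values):
    """Replace {NAME} placeholders with HTML-escaped user text."""
    out = template
    for key, value in values.items():
        # html.escape turns &, <, > and quotes into entities
        out = out.replace("{" + key + "}", html.escape(value))
    return out
```

Named placeholders also avoid the parameter-count mismatch that makes the positional {0}, {1} approach fragile.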
I am unable to use the drag-and-drop functionality within DotNetNuke version 7.1.
The drag-and-drop functionality of the Telerik RadEditor takes the browser's Base64 input and encases it in an img tag where the source is the data. E.g., src="data:image/jpeg;base64,[base64data]".
When using drag/drop to a RadEditor within the HTML Module and then saving the HTML content, that src definition is changed to a URI request by prepending the relative path for the DNN portal. E.g., src="/mysite/portals/0/data:image/jpeg;base64,[base64data]".
This converts what started out as a perfectly valid embedded image tag into a request and thereby causes the browser to request this "image" from the server. The server then returns a 414 error (URI too long).
Example without prepended relative path: http://jsfiddle.net/GGGH/27Tbb/2/
<img src="data:image/jpeg;base64,[stuff]">
Example with prepended relative path (won't display): http://jsfiddle.net/GGGH/NL85G/2/
<img src="mysite/portals/0/data:image/jpeg;base64,[stuff]">
Is there some configuration that I've missed? Prepending relative paths is OK for src="/somephysicalpath" but not for src="data:image...".
I ended up solving the problem prior to posting the question but wanted to add this knowledge to SO in case someone else encountered the same problem (has no one noticed this yet?). Also, perhaps, DNN or the community can improve upon my solution and that fix can make it into a new DNN build.
I've looked at the source code for RadEditor, RadEditorProvider and then finally the Html module itself. It seems the problem is in the EditHtml.ascx.cs, FormatContent() method which calls the HtmlTextController's ManageRelativePaths() method. It's that method that runs for all "src" tags (and "background") in the Html content string. It post-processes the Html string that comes out of the RadEditor to add in that relative path. This is not appropriate when editing an embedded Base64 image that was dragged to the editor.
In order to fix this, and still allow for the standard functionality originally intended by the manufacturer, the ManageRelativePaths method (in HtmlTextController.cs of DotNetNuke.Modules.Html) needs to be modified to allow for an exception if the URI begins with a "data:image" string. Line 488 (as of version 7.1.0) is potentially appropriate. I added the following code (incrementing P as appropriate, and positioned after the URI length was determined; I'm sure there's a better way but this works fine):
// line 483, HtmlTextController.cs, DNN code included for positioning
while (P != -1)
{
sbBuff.Append(strHTML.Substring(S, P - S + tLen));
// added code
bool skipThisToken = false;
if (strHTML.Substring(P + tLen, 10) == "data:image") // check for base64 image
skipThisToken = true;
// end added code - back to standard DNN
//keep characters left of URL
S = P + tLen;
//save startpos of URL
R = strHTML.IndexOf("\"", S);
//end of URL
if (R >= 0)
{
strURL = strHTML.Substring(S, R - S).ToLower();
}
else
{
strURL = strHTML.Substring(S).ToLower();
}
// added code to continue while loop after the integers were updated
if (skipThisToken)
{
P = strHTML.IndexOf(strToken + "=\"", S + strURL.Length + 2, StringComparison.InvariantCultureIgnoreCase);
continue;
}
// end added code -- the method continues from here (not reproduced)
This is probably not the best solution as it's searching for a hard-coded value. Better would be functionality that allows the developers to add tags later. (But, then again, EditHtml.ascx.cs and HtmlTextController both hard-code the two tags that they intend to post-process.)
So, after making this small change, recompiling the DotNetNuke.Modules.Html.dll and deploying, drag-and-drop should be functional. Obviously this increases the complexity of an upgrade -- it would be better if this were fixed by DNN themselves. I verified that as of v7.2.2 this issue still exists.
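Stripped of the DNN specifics, the rule being added is simply "leave data: URIs alone, prefix everything else". The Python sketch below shows only that rule; it is not DNN's code, the function name and portal_root parameter are made up for illustration, and the real method has other conditions that are not reproduced here.

```python
def manage_relative_path(src, portal_root):
    """Prefix a relative src with the portal root, except for data: URIs."""
    # An embedded image carries its content inline, so there is no path to fix
    if src.lower().startswith("data:"):
        return src
    return portal_root.rstrip("/") + "/" + src.lstrip("/")
```

Checking for the "data:" scheme rather than "data:image" would also cover other embedded content types, which is one possible improvement over the hard-coded string.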
UPDATE: Fixed in DNN Community Version 7.4.0
I have C# code for fetching images from URLs like http://i.imgur.com/QvkaduU.jpg but how would I fetch the image from Web pages like this: http://imgur.com/gallery/QvkaduU?
Is there any "easy" way to do this or I will have to fetch the HTML and construct a C# parser that looks in HTML for images that are bigger than all the others?
Let me clear this up. If you paste http://imgur.com/gallery/QvkaduU (HTML version) into for example Facebook's status update field it will find the main image and make a thumbnail out of it, this is exactly the behavior I'm looking for. The question is, how is this done? Do I have to write my own HTML parser or is there an easy way to get this?
There is no easy way to get a "good" thumbnail image for an arbitrary URL.
Facebook's algorithm for doing so is fairly complex. Page developers are able to give it a hint by adding various meta tags to the <head>, including:
<meta property="og:image" content="http://url_to_your_image_here" />
or
<link rel="image_src" href="http://www.code-digital.co.uk/preview.jpg" />
(more on this)
... so if you wanted to replicate Facebook's algorithm, you would need to fetch the page source, parse it for any "hints" like the one above (you'd better check that I haven't missed any other "hint" formats), and come up with a fallback algorithm if the page doesn't include one of those.
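Looking for those two hints does not need a hand-written parser. Here is a sketch with Python's standard html.parser; the class and function names are mine, and it only checks the two hint formats quoted above, with no fallback algorithm.

```python
from html.parser import HTMLParser


class ImageHintParser(HTMLParser):
    """Collect og:image and image_src hints from a page."""

    def __init__(self):
        super().__init__()
        self.hints = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("property") == "og:image" and a.get("content"):
            self.hints.append(a["content"])
        elif tag == "link" and a.get("rel") == "image_src" and a.get("href"):
            self.hints.append(a["href"])


def find_image_hints(page_html):
    """Return hinted image URLs in document order (empty list if none)."""
    parser = ImageHintParser()
    parser.feed(page_html)
    return parser.hints
```

If the returned list is empty, the page gives no hint and some fallback (such as picking the largest image) is still needed.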
A more realistic solution would be to use someone else's URL -> thumbnail system.
If you like Facebook's version, I think you should be able to request Facebook's thumbnail for a given URL via their API.
Other services which offer this sort of thing are:
http://webthumb.bluga.net/home (not free)
http://immediatenet.com/thumbnail_api.html (free, may have restrictive TOS)
https://www.google.com/search?q=get+thumbnail+for+url
If the QvkaduU part is always the same between the html page and the image, could you just do a string replacement?
"http://imgur.com/gallery/QvkaduU".Replace("imgur.com/gallery","i.imgur.com") + ".jpg";
I would fetch the whole HTML source, extract every <img ... src="..."> value as well as every inline <... style="... background-image: ...;"> CSS property using a regex, and temporarily download all the files behind those links. Then I would (try to convert each one to a Bitmap and) check its pixel size; the largest picture should be the one you want.
Google might help you how to check pixel size and convert any images.
The regex to get all image links from an HTML source should be
<img[^>]+src=\"([^"]+)\".*?>|<[^>]+style=\"[^"]*background-image:\s*url\(\s*'?([^')]+)'?\s*\)\s*;?.*?> (not tested, but pretty sure)
The result will be in capture group 1 or 2; also, don't forget to resolve relative links against the current URL.
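Resolving the relative links is the part that is easy to get wrong by hand. This Python sketch uses a simplified pattern (only img src with double quotes, which is my own narrowing of the regex above) and urljoin from the standard library to do the resolution.

```python
import re
from urllib.parse import urljoin

# Simplified pattern: double-quoted src attributes of <img> tags only
IMG_SRC = re.compile(r'<img[^>]+src="([^"]+)"', re.IGNORECASE)


def image_urls(page_url, page_html):
    """Return absolute URLs for every <img src="..."> found in the page."""
    return [urljoin(page_url, src) for src in IMG_SRC.findall(page_html)]
```

urljoin handles page-relative paths, root-relative paths and already absolute URLs, so no manual prefixing is needed.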
You're already on the right track. Yes, the most reliable way would be to fetch the HTML, parse it and look for images; you would then rank the images based on position and size. For instance, if the first image you find is big enough to make the thumbnail, then cool; if however it is small, you go to the next image, etc. It would be most advisable to use an image plugin like Timthumb (I think I've seen an ASP.NET version sometime) and cache the images such that once you've looked up the thumbnail to represent a website, you can serve the image(s) from the cache instead.
Can you try to do something like this?
public void ProcessRequest(HttpContext context)
{
// load the image here
....
// and send it to the browser
context.Response.OutputStream.Write(imageData, 0, imageData.Length);
}
You can also try what they are talking about here. I tried it and it worked like a charm.
http://www.dotnetspider.com/resources/42565-Download-images-from-URL-using-C.aspx
can you try this
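// needs System.Drawing, System.IO and System.Net
public Bitmap getImageFromURL(String sURL)
{
HttpWebRequest myRequest = (HttpWebRequest)WebRequest.Create(sURL);
myRequest.Method = "GET";
// using blocks release the connection even if decoding throws
using (HttpWebResponse myResponse = (HttpWebResponse)myRequest.GetResponse())
using (Stream stream = myResponse.GetResponseStream())
using (Bitmap fromStream = new Bitmap(stream))
{
// copy, so the returned bitmap does not depend on the closed stream
return new Bitmap(fromStream);
}
}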
gotten from
How to get an image to a pictureBox from an URL? (Windows Mobile)