MVC routing when a file actually exists at the specified location - c#

So I have a route like this in my MVC 3 application running under IIS 7:
routes.MapRoute(
"VirtualTourConfig",
"virtualtour/config.xml",
new { controller = "VirtualTour", action = "Config" }
);
The trick is that a file actually exists at /virtualtour/config.xml. It seems like the request is just returning the xml file at that location instead of hitting the route, which processes the XML, makes some changes and returns a custom XmlResult.
Any suggestions on how I can tell my application to hit the route and not the actual file in the event that the file exists on disk?
EDIT: It appears that I can use routes.RouteExistingFiles = true; in the RegisterRoutes method of Global.asax to tell the application to ignore files on disk. This, however, sets the flag globally and breaks a lot of other requests within the application. For example, I still want calls to /assets/css/site.css to return the CSS file without having to specifically set routes up for each static asset. So now the question becomes, is there a way to do this on a per-route basis?

So far the best answer to this that I have found is to globally apply routes.RouteExistingFiles=true and then selectively ignore the routes I want to pass through to existing files like .js, .css, etc. So I ended up with something like this:
public static void RegisterRoutes(RouteCollection routes)
{
routes.IgnoreRoute("{resource}.axd/{*pathInfo}");
routes.IgnoreRoute("*.js|css|swf");
routes.RouteExistingFiles = true;
routes.MapRoute(
"VirtualTourConfig",
"virtualtour/config.xml",
new { controller = "VirtualTour", action = "Config" }
);
}
If anyone has a better solution, I'd like to see it. I'd much prefer to selectively apply an "RouteExistingFIles" flag to individual routes but I don't know if there's a way to do that.

No solution here, just an idea.
You can try to implement solution based on your own VirtualPathProvider that will provide your own mapping of web paths to file system paths and use default provider for all paths you don't want to take care of.

A very simple solution, certainly if you consider Scott's answer, would be eg: routes.IgnoreRoute("templates/*.html");,
routes.IgnoreRoute("scripts/*.js"); and routes.IgnoreRoute("styles/*.css");.
This gives you both the simplicity of just supplying paths to the RoutesCollection and avoids the work of having to initiate eg. a VirtualPathProvider.

You could try UrlRewiting e.g.:
<configuration>
<system.webServer>
<rewrite>
<rules>
<rule name="VirtualTourConfig">
<match url="^virtualtour/config.xml" />
<action type="Rewrite" url="virtualtour/config" />
</rule>
</rules>
</rewrite>
NB I'm using this for a different use-case (i.e. serving an angular app from asp.net MVC) - I haven't tested whether MVC routing occurs AFTER url rewriting.
In my case I have a normal MVC route (i.e. /Dashboard/ => DashboardController.Index()) but need all relative paths in the Views/Dashboard/Index.cshtml to serve static files without getting confused by MVC routing i.e. /Dashboard/app/app.module.js must serve a static file. I use UrlRewriting to map /Dashboard/(.+) to /ng/Dashboard/{R:1} as follows:
<configuration>
<system.webServer>
<rewrite>
<rules>
<rule name="Dashboard">
<match url="^dashboard/(.+)" />
<action type="Rewrite" url="ng/dashboard/{R:1}" />
</rule>
</rules>
</rewrite>

Related

URL Rewrite Rule without page extension

I have a project on .Net Framework 2.0 in which i need to call some pages without page extension, that means I have to remove the .aspx from url and also I need to pass some Query String data. URL rewrite has been implemented currently in the following way but it does not remove .aspx
<configuration>
<modulesSection>
<rewriteModule>
<rewriteOn>true</rewriteOn>
<rewriteRules>
<rule source="Admin/TheFetus/(.*)" destination="Admin/Fetus/$1"/>
<rule source="CaseDetails/(.*).aspx" destination="Client/Cases/CaseDetails.aspx"/>
<!--<rule source="ArticleDetails/(.*).aspx" destination="Client/Articles/ArticleDetails.aspx"/>-->
<rule source="ArticleDetails" destination="Client/Articles/ArticleDetails.aspx"/>
<rule source="ChapterDetails/(.*).aspx" destination="Client/Chapters/ChapterDetails.aspx"/>
<rule source="LectureDetails/(.*).aspx" destination="Client/Lectures/LectureDetails.aspx"/>
<rule source="ConventionDetails/(.*).aspx" destination="Client/Conventions/ConventionDetails.aspx"/>
<rule source="IfserDetails/(.*).aspx" destination="Client/Ifser/IfserDetails.aspx"/>
<rule source="Client/Fetus/Files/(.*)" destination="Client/Fetus/Files/$1"/>
<rule source="Fetus/Files/(.*)" destination="Client/Fetus/Files/$1"/>
<rule source="Client/Fetus/Index.php" destination="Client/Fetus/Home.aspx"/>
<rule source="Fetus/Index.php" destination="Client/Fetus/Home.aspx"/>
<rule source="Client/Fetus/(.*).php(.*)" destination="Client/Fetus/$1.aspx$2"/>
<rule source="Fetus/(.*).php(.*)" destination="Client/Fetus/$1.aspx$2"/>
<rule source="Admin/Fetus/(.*)" destination="Admin/Fetus/$1"/>
<rule source="Client/Fetus/(.*)" destination="Client/Fetus/$1"/>
<rule source="Fetus/(.*)" destination="Client/Fetus/$1"/>
<rule source="bannerspecs" destinatiofn="Client/FooterLinks/BannerSpecs.aspx"/>
<rule source="Client/TheFetus/Files/(.*)" destination="Client/Fetus/Files/$1"/>
<rule source="TheFetus/Files/(.*)" destination="Client/Fetus/Files/$1"/>
<rule source="Client/TheFetus/Index.php" destination="Client/Fetus/Home.aspx"/>
<rule source="TheFetus/Index.php" destination="Client/Fetus/Home.aspx"/>
<rule source="Client/TheFetus/(.*).php(.*)" destination="Client/Fetus/$1.aspx$2"/>
<rule source="TheFetus/(.*).php(.*)" destination="Client/Fetus/$1.aspx$2"/>
<rule source="Client/TheFetus/(.*).php(.*)" destination="Client/Fetus/$1.aspx$2"/>
<rule source="Client/TheFetus/(.*)" destination="Client/Fetus/$1"/>
<rule source="TheFetus/(.*)" destination="Client/Fetus/$1"/>
<rule source="(.*)/Default.aspx" destination="Default.aspx?Folder=$1"/>
<rule source="LATAM.aspx" destination="Client/MiniSites/MiniSiteDetails.aspx?MiniSiteId=10"/>
</rewriteRules>
</rewriteModule>
</modulesSection>
</configuration>
How Can I replace the current web.config code in order to achieve the url rewrite without .aspx extension, by passing some query string parameters in .Net framework 2.0
When you address a .aspx page, you address it directly. Meaning, that you have to address the url as .aspx, else it won't find the requested page, because there aren't any.
This is the big advantage MVC brought, when you can address your cshtml view and the url will show /Home/Index without the cshtml extenstion.
There are two ways to achieve what you ask, when it comes to .aspx page.
Globl.asax
public class Global : System.Web.HttpApplication
{
//Register your routes, match a custom URL with an .aspx file.
private void RegisterRoutes(RouteCollection routes)
{
routes.MapPageRoute("About", "about", "~/about.aspx");
routes.MapPageRoute("Index", "index", "~/index.aspx");
}
//Init your new route table inside the App_Start event.
protected void Application_Start(object sender, EventArgs e)
{
this.RegisterRoutes(RouteTable.Routes);
}
}
you override the map page route and declare the routes without .aspx extension, the way MVC does.
This is called custom route handler using Url Routing
Url Rewrite
As you did, url rewrite through web.config.
<urlMappings enabled="true">
<add url="~/questions/ask" mappedUrl="~/questions/ask.aspx?page=Ask"/>
</urlMappings>
You use the url mapping to map you .aspx url to a new route, this way you allow your web forms pages to rewrite the url without the .aspx extension.
<rule source>
Won't work as it does not rewrite your url, it is directing you to the exact file so if you do not use the .aspx extension, the file does not exist and exception is thrown.
You will have to use UrlMappings if you want to remove the .aspx extension.
As I wrote in the comment it is not clear which URLs should be redirected without ".aspx" but still, why can't you do the same trick you already do for ".php" such as in
<rule source="Client/Fetus/(.*).php(.*)" destination="Client/Fetus/$1.aspx$2"/>
So assuming you want to change URLs under old "Admin/TheFetus" folder, you might change your first rule to something like:
<rule source="Admin/TheFetus/(.*).aspx(.*)" destination="Admin/Fetus/$1$2"/>
I think what will work well for you is a simple Custom Filter Module: You can Build this as a DLL Project and Target .net 2.0 - 3.5.
This will enable you rewrite and redirect the request using all String functions as well ass all HTTP classes (Request, Response Etc. )
(Since I don't understand your redirect rules specifically, I can't do that part for you, but I am sure you can figure it out). Otherwise please explain the rules better and I will be happy to do so.
using System;
using System.Web;
using System.Threading;
//using System.Threading.Tasks;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace FilterModule
{
class AspxFilter : IHttpModule
{
public void Init(HttpApplication app)
{
app.BeginRequest += new EventHandler(AspxRedirect);
}
private void AspxRedirect(Object s, EventArgs e)
{
HttpApplication app = s as HttpApplication;
HttpRequest req = app.Request;
HttpContext context = app.Context;
string filePath = context.Request.FilePath;
string fileExtension = VirtualPathUtility.GetExtension(filePath);
string fileName = VirtualPathUtility.GetFileName(filePath);
if (fileExtension.ToLower() == ".aspx")
{
if (req.QueryString["Redirect"] != null)
{
String RedirectPath = "Redirect.html";
// Build your redirect Path as needed based on the rquest String
context.Response.Redirect(RedirectPath);
}
else
{
}
}
}
public void Dispose()
{
}
}
}
You can then implement the module just by adding/ editing your Web.Config, accordingly:
Implementation
1.Add the compiled DLL (FilterModule.dll) to your site's bin Directory.
2.Add the following to module definition in the Web Service (or site) configuration file (web.config)
in the <system.webServer> section under <modules>.
add the following:
<system.webServer>
<modules>
<add name ="FilterModule" type="FilterModule.AspxFilter" />
Update: I have since tested my answer and it works as expected. Please let me know if you have any trouble implementing it.

How to hide web-application from search bots? (ASP.NET)

I have a web-application for the company's inner needs. However, customers should have access only by some specific URL sent by company.
So, to make things short, site can be accessed by everyone, but only by typing concrete URL. No search engine allowed to index any page of the site.
How can I reach it?
I found an approach based on robots.txt, but could it be implemented in web.config file?
There is a Web.Config solution as well, based on the User Agent. It uses the IIS rewrite module. But it is not foolproof since User Agents can be easly changed. Note that I did not create the list myself but found it recently on the web, but cannot seem to find the article anymore.
<rewrite>
<rules>
<rule name="Abuse User Agents Blocking" stopProcessing="true">
<match url=".*" ignoreCase="false" />
<conditions logicalGrouping="MatchAny">
<add input="{HTTP_USER_AGENT}" pattern="^.*(1Noonbot|1on1searchBot|3D_SEARCH|3DE_SEARCH2|3GSE|50.nu|192.comAgent|360Spider|A6-Indexer|AASP|ABACHOBot|Abonti|abot|AbotEmailSearch|Aboundex|AboutUsBot|AccMonitor\ Compliance|accoona|AChulkov.NET\ page\ walker|Acme.Spider|AcoonBot|acquia-crawler|ActiveTouristBot|Acunetix|Ad\ Muncher|AdamM|adbeat_bot|adminshop.com|Advanced\ Email|AESOP_com_SpiderMan|AESpider|AF\ Knowledge\ Now\ Verity|aggregator:Vocus|ah-ha.com|AhrefsBot|AIBOT|aiHitBot|aipbot|AISIID|AITCSRobot|Akamai-SiteSnapshot|AlexaWebSearchPlatform|AlexfDownload|Alexibot|AlkalineBOT|All\ Acronyms|Amfibibot|AmPmPPC.com|AMZNKAssocBot|Anemone|Anonymous|Anonymouse.org|AnotherBot|AnswerBot|AnswerBus|AnswerChase\ PROve|AntBot|antibot-|AntiSantyWorm|Antro.Net|AONDE-Spider|Aport|Aqua_Products|AraBot|Arachmo|Arachnophilia|archive.org_bot|aria\ eQualizer|arianna.libero.it|Arikus_Spider|Art-Online.com|ArtavisBot|Artera|ASpider|ASPSeek|asterias|AstroFind|athenusbot|AtlocalBot|Atomic_Email_Hunter|attach|attrakt|attributor|Attributor.comBot|augurfind|AURESYS|AutoBaron|autoemailspider|autowebdir|AVSearch-|axfeedsbot|Axonize-bot|Ayna|b2w|BackDoorBot|BackRub|BackStreet\ Browser|BackWeb|Baiduspider-video|Bandit|BatchFTP|baypup|BDFetch|BecomeBot|BecomeJPBot|BeetleBot|Bender|besserscheitern-crawl|betaBot|Big\ Brother|Big\ Data|Bigado.com|BigCliqueBot|Bigfoot|BIGLOTRON|Bilbo|BilgiBetaBot|BilgiBot|binlar|bintellibot|bitlybot|BitvoUserAgent|Bizbot003|BizBot04|BizBot04\ kirk.overleaf.com|Black.Hole|Black\ Hole|Blackbird|BlackWidow|bladder\ fusion|Blaiz-Bee|BLEXBot|Blinkx|BlitzBOT|Blog\ Conversation\ Project|BlogMyWay|BlogPulseLive|BlogRefsBot|BlogScope|Blogslive|BloobyBot|BlowFish|BLT|bnf.fr_bot|BoaConstrictor|BoardReader-Image-Fetcher|BOI_crawl_00|BOIA-Scan-Agent|BOIA.ORG-Scan-Agent|boitho.com-dc|Bookmark\ Buddy|bosug|Bot\ Apoena|BotALot|BotRightHere|Botswana|bottybot|BpBot|BRAINTIME_SEARCH|BrokenLinkCheck.com|BrowserEmulator|BrowserMob|BruinBot|BSearchR&D|BSpider|btbot|Btsearch|Buddy|Buibui|BuildCMS|BuiltBotTough|Bullseye|bumblebee|BunnySlippers|BuscadorClarin|Butterfly|BuyHawaiiBot|BuzzBot|byindia|BySpider|byteserver|bzBot|c\ r\ a\ w\ l\ 3\ r|CacheBlaster|CACTVS\ Chemistry|Caddbot|Cafi|Camcrawler|CamelStampede|Canon-WebRecord|Canon-WebRecordPro|CareerBot|casper|cataguru|CatchBot|CazoodleBot|CCBot|CCGCrawl|ccubee|CD-Preload|CE-Preload|Cegbfeieh|Cerberian\ Drtrs|CERT\ FigleafBot|cfetch|CFNetwork|Chameleon|ChangeDetection|Charlotte|Check&Get|Checkbot|Checklinks|checkprivacy|CheeseBot|ChemieDE-NodeBot|CherryPicker|CherryPickerElite|CherryPickerSE|Chilkat|ChinaClaw|CipinetBot|cis455crawler|citeseerxbot|cizilla.com|ClariaBot|clshttp|Clushbot|cmsworldmap|coccoc|CollapsarWEB|Collector|combine|comodo|conceptbot|ConnectSearch|conpilot|ContentSmartz|ContextAd|contype|cookieNET|CoolBott|CoolCheck|Copernic|Copier|CopyRightCheck|core-project|cosmos|Covario-IDS|Cowbot-|Cowdog|crabbyBot|crawl|Crawl_Application|crawl.UserAgent|CrawlConvera|crawler|crawler_for_infomine|CRAWLER-ALTSE.VUNET.ORG-Lynx|crawler-upgrade-config|crawler.kpricorn.org|crawler#|crawler4j|crawler43.ejupiter.com|Crawly|CreativeCommons|Crescent|Crescent\ Internet\ ToolPak\ HTTP\ OLE\ Control|cs-crawler|CSE\ HTML\ Validator|CSHttpClient|Cuasarbot|culsearch|Curl|Custo|Cutbot|cvaulev|Cyberdog|CyberNavi_WebGet|CyberSpyder|CydralSpider).*$" />
<add input="{HTTP_USER_AGENT}" pattern="^.*(D1GArabicEngine|DA|DataCha0s|DataFountains|DataparkSearch|DataSpearSpiderBot|DataSpider|Dattatec.com|Dattatec.com-Sitios-Top|Daumoa|DAUMOA-video|DAUMOA-web|Declumbot|Deepindex|deepnet|DeepTrawl|dejan|del.icio.us-thumbnails|DelvuBot|Deweb|DiaGem|Diamond|DiamondBot|diavol|DiBot|didaxusbot|DigExt|Digger|DiGi-RSSBot|DigitalArchivesBot|DigOut4U|DIIbot|Dillo|Dir_Snatch.exe|DISCo|DISCo\ Pump|discobot|DISCoFinder|Distilled-Reputation-Monitor|Dit|DittoSpyder|DjangoTraineeBot|DKIMRepBot|DoCoMo|DOF-Verify|domaincrawler|DomainScan|DomainWatcher|dotbot|DotSpotsBot|Dow\ Jonesbot|Download|Download\ Demon|Downloader|DOY|dragonfly|Drip|drone|DTAAgent|dtSearchSpider|dumbot|Dwaar|Dwaarbot|DXSeeker|EAH|EasouSpider|EasyDL|ebingbong|EC2LinkFinder|eCairn-Grabber|eCatch|eChooseBot|ecxi|EdisterBot|EduGovSearch|egothor|eidetica.com|EirGrabber|ElisaBot|EllerdaleBot|EMail\ Exractor|EmailCollector|EmailLeach|EmailSiphon|EmailWolf|EMPAS_ROBOT|EnaBot|endeca|EnigmaBot|Enswer\ Neuro|EntityCubeBot|EroCrawler|eStyleSearch|eSyndiCat|Eurosoft-Bot|Evaal|Eventware|Everest-Vulcan|Exabot|Exabot-Images|Exabot-Test|Exabot-XXX|ExaBotTest|ExactSearch|exactseek.com|exooba|Exploder|explorersearch|extract|Extractor|ExtractorPro|EyeNetIE|ez-robot|Ezooms|factbot|FairAd\ Client|falcon|Falconsbot|fast-search-engine|FAST\ Data\ Document|FAST\ ESP|fastbot|fastbot.de|FatBot|Favcollector|Faviconizer|FDM|FedContractorBot|feedfinder|FelixIDE|fembot|fetch_ici|Fetch\ API\ Request|fgcrawler|FHscan|fido|Filangy|FileHound|FindAnISP.com_ISP_Finder|findlinks|FindWeb|Firebat|Fish-Search-Robot|Flaming\ AttackBot|Flamingo_SearchEngine|FlashCapture|FlashGet|flicky|FlickySearchBot|flunky|focused_crawler|FollowSite|Foobot|Fooooo_Web_Video_Crawl|Fopper|FormulaFinderBot|Forschungsportal|fr_crawler|Francis|Freecrawl|FreshDownload|freshlinks.exe|FriendFeedBot|frodo.at|froGgle|FrontPage|Froola|FU-NBI|full_breadth_crawler|FunnelBack|FunWebProducts|FurlBot|g00g1e|G10-Bot|Gaisbot|GalaxyBot|gazz|gcreep|generate_infomine_category_classifiers|genevabot|genieBot|GenieBotRD_SmallCrawl|Genieo|Geomaxenginebot|geometabot|GeonaBot|GeoVisu|GermCrawler|GetHTMLContents|Getleft|GetRight|GetSmart|GetURL.rexx|GetWeb!|Giant|GigablastOpenSource|Gigabot|Girafabot|GleameBot|gnome-vfs|Go-Ahead-Got-It|Go!Zilla|GoForIt.com|GOFORITBOT|gold|Golem|GoodJelly|Gordon-College-Google-Mini|goroam|GoSeebot|gotit|Govbot|GPU\ p2p|grab|Grabber|GrabNet|Grafula|grapeFX|grapeshot|GrapeshotCrawler|grbot|GreenYogi\ [ZSEBOT]|Gromit|GroupMe|grub|grub-client|Grubclient-|GrubNG|GruBot|gsa|GSLFbot|GT::WWW|Gulliver|GulperBot|GurujiBot|GVC|GVC\ BUSINESS|gvcbot.com|HappyFunBot|harvest|HarvestMan|Hatena\ Antenna|Hawler|Hazel's\ Ferret\ hopper|hcat|hclsreport-crawler|HD\ nutch\ agent|Header_Test_Client|healia|Helix|heritrix|hijbul-heritrix-crawler|HiScan|HiSoftware\ AccMonitor|HiSoftware\ AccVerify|hitcrawler_|hivaBot|hloader|HMSEbot|HMView|hoge|holmes|HomePageSearch|Hooblybot-Image|HooWWWer|Hostcrawler|HSFT\ -\ Link|HSFT\ -\ LVU|HSlide|ht:|htdig|Html\ Link\ Validator|HTMLParser|HTTP::Lite|httplib|HTTrack|Huaweisymantecspider|hul-wax|humanlinks|HyperEstraier|Hyperix).*$" />
<add input="{HTTP_USER_AGENT}" pattern="^.*(ia_archiver|IAArchiver-|ibuena|iCab|ICDS-Ingestion|ichiro|iCopyright\ Conductor|id-search|IDBot|IEAutoDiscovery|IECheck|iHWebChecker|IIITBOT|iim_405|IlseBot|IlTrovatore|Iltrovatore-Setaccio|ImageBot|imagefortress|ImagesHereImagesThereImagesEverywhere|ImageVisu|imds_monitor|imo-google-robot-intelink|IncyWincy|Industry\ Cortexcrawler|Indy\ Library|indylabs_marius|InelaBot|Inet32\ Ctrl|inetbot|InfoLink|INFOMINE|infomine.ucr.edu|InfoNaviRobot|Informant|Infoseek|InfoTekies|InfoUSABot|INGRID|Inktomi|InsightsCollector|InsightsWorksBot|InspireBot|InsumaScout|Intelix|InterGET|Internet\ Ninja|InternetLinkAgent|Interseek|IOI|ip-web-crawler.com|Ipselonbot|Iria|IRLbot|Iron33|Isara|iSearch|iSiloX|IsraeliSearch|IstellaBot|its-learning|IU_CSCI_B659_class_crawler|iVia|iVia\ Page\ Fetcher|JadynAve|JadynAveBot|jakarta|Jakarta\ Commons-HttpClient|Java|Jbot|JemmaTheTourist|JennyBot|Jetbot|JetBrains\ Omea\ Pro|JetCar|Jim|JoBo|JobSpider_BA|JOC|JoeDog|JoyScapeBot|JSpyda|JubiiRobot|jumpstation|Junut|JustView|Jyxobot|K.S.Bot|KakcleBot|kalooga|KaloogaBot|kanagawa|KATATUDO-Spider|Katipo|kbeta1|Kenjin.Spider|KeywenBot|Keyword.Density|Keyword\ Density|kinjabot|KIT-Fireball|Kitenga-crawler-bot|KiwiStatus|kmbot-|kmccrew|Knight|KnowItAll|Knowledge.com|Knowledge\ Engine|KoepaBot|Koninklijke|KrOWLer|KSbot|kuloko-bot|kulturarw3|KummHttp|Kurzor|Kyluka|L.webis|LabelGrab|Labhoo|labourunions411|lachesis|Lament|LamerExterminator|LapozzBot|larbin|LARBIN-EXPERIMENTAL|LBot|LeapTag|LeechFTP|LeechGet|LetsCrawl.com|LexiBot|LexxeBot|lftp|libcrawl|libiViaCore|libWeb|libwww|libwww-perl|likse|Linguee|Link|link_checker|LinkAlarm|linkbot|LinkCheck\ by\ Siteimprove.com|LinkChecker|linkdex.com|LinkextractorPro|LinkLint|linklooker|Linkman|LinkScan|LinksCrawler|LinksManager.com_bot|LinkSweeper|linkwalker|LiteFinder|LitlrBot|Little\ Grabber\ at\ Skanktale.com|Livelapbot|LM\ Harvester|LMQueueBot|LNSpiderguy|LoadTimeBot|LocalcomBot|locust|LolongBot|LookBot|Lsearch|lssbot|LWP|lwp-request|lwp-trivial|LWP::Simple|Lycos_Spider|Lydia\ Entity|LynnBot|Lytranslate|Mag-Net|Magnet|magpie-crawler|Magus|Mail.Ru|Mail.Ru_Bot|MAINSEEK_BOT|Mammoth|MarkWatch|MaSagool|masidani_bot_|Mass|Mata.Hari|Mata\ Hari|matentzn\ at\ cs\ dot\ man\ dot\ ac\ dot\ uk|maxamine.com--robot|maxamine.com-robot|maxomobot|Maxthon$|McBot|MediaFox|medrabbit|Megite|MemacBot|Memo|MendeleyBot|Mercator-|mercuryboard_user_agent_sql_injection.nasl|MerzScope|metacarta|Metager2|metager2-verification-bot|MetaGloss|METAGOPHER|metal|metaquerier.cs.uiuc.edu|METASpider|Metaspinner|MetaURI|MetaURI\ API|MFC_Tear_Sample|MFcrawler|MFHttpScan|Microsoft.URL|MIIxpc|miner|mini-robot|minibot|miniRank|Mirror|Missigua\ Locator|Mister.PiX|Mister\ PiX|Miva|MJ12bot|mnoGoSearch|mod_accessibility|moduna.com|moget|MojeekBot|MOMspider|MonkeyCrawl|MOSES|Motor|mowserbot|MQbot|MSE360|MSFrontPage|MSIECrawler|MSIndianWebcrawl|MSMOBOT|Msnbot|msnbot-products|MSNPTC|MSRBOT|MT-Soft|MultiText|My_Little_SearchEngine_Project|my-heritrix-crawler|MyApp|MYCOMPANYBOT|mycrawler|MyEngines-US-Bot|MyFamilyBot|Myra|nabot|nabot_|Najdi.si|Nambu|NAMEPROTECT|NatchCVS|naver|naverbookmarkcrawler|NaverBot|Navroad|NearSite|NEC-MeshExplorer|NeoScioCrawler|NerdByNature.Bot|NerdyBot|Nerima-crawl-).*$" />
<add input="{HTTP_USER_AGENT}" pattern="^.*(T-H-U-N-D-E-R-S-T-O-N-E|Tailrank|tAkeOut|TAMU_CRAWLER|TapuzBot|Tarantula|targetblaster.com|TargetYourNews.com|TAUSDataBot|taxinomiabot|Tecomi|TeezirBot|Teleport|Teleport\ Pro|TeleportPro|Telesoft|Teradex\ Mapper|TERAGRAM_CRAWLER|TerrawizBot|testbot|testing\ of|TextBot|thatrobotsite.com|The.Intraformant|The\ Dyslexalizer|The\ Intraformant|TheNomad|Theophrastus|theusefulbot|TheUsefulbot_|ThumbBot|thumbshots-de-bot|tigerbot|TightTwatBot|TinEye|Titan|to-dress_ru_bot_|to-night-Bot|toCrawl|Topicalizer|topicblogs|Toplistbot|TopServer\ PHP|topyx-crawler|Touche|TourlentaScanner|TPSystem|TRAAZI|TranSGeniKBot|travel-search|TravelBot|TravelLazerBot|Treezy|TREX|TridentSpider|Trovator|True_Robot|tScholarsBot|TsWebBot|TulipChain|turingos|turnit|TurnitinBot|TutorGigBot|TweetedTimes|TweetmemeBot|TwengaBot|TwengaBot-Discover|Twiceler|Twikle|twinuffbot|Twisted\ PageGetter|Twitturls|Twitturly|TygoBot|TygoProwler|Typhoeus|U.S.\ Government\ Printing\ Office|uberbot|ucb-nutch|UCSD-Crawler|UdmSearch|UFAM-crawler-|Ultraseek|UnChaos|unchaos_crawler_|UnisterBot|UniversalSearch|UnwindFetchor|UofTDB_experiment|updated|URI::Fetch|url_gather|URL-Checker|URL\ Control|URLAppendBot|URLBlaze|urlchecker|urlck|UrlDispatcher|urllib|URLSpiderPro|URLy.Warning|USAF\ AFKN\|usasearch|USS-Cosmix|USyd-NLP-Spider|Vacobot|Vacuum|VadixBot|Vagabondo|Validator|Valkyrie|vBSEO|VCI|VerbstarBot|VeriCiteCrawler|Verifactrola|Verity-URL-Gateway|vermut|versus|versus.integis.ch|viasarchivinginformation.html|vikspider|VIP|VIPr|virus-detector|VisBot|Vishal\ For\ CLIA|VisWeb|vlad|vlsearch|VMBot|VocusBot|VoidEYE|VoilaBot|Vortex|voyager|voyager-hc|voyager-partner-deep|VSE|vspider).*$" />
<add input="{HTTP_USER_AGENT}" pattern="^.*(W3C_Unicorn|W3C-WebCon|w3m|w3search|wacbot|wastrix|Water\ Conserve|Water\ Conserve\ Portal|WatzBot|wauuu\ engine|Wavefire|Waypath|Wazzup|Wazzup1.0.4800|wbdbot|web-agent|Web-Sniffer|Web.Image.Collector|Web\ CEO\ Online|Web\ Image\ Collector|Web\ Link\ Validator|Web\ Magnet|webalta|WebaltBot|WebAuto|webbandit|webbot|webbul-bot|WebCapture|webcheck|Webclipping.com|webcollage|WebCopier|WebCopy|WebCorp|webcrawl.net|webcrawler|WebDownloader\ for|Webdup|WebEMailExtrac|WebEMailExtrac.*|WebEnhancer|WebFerret|webfetch|WebFetcher|WebGather|WebGo\ IS|webGobbler|WebImages|Webinator-search2.fasthealth.com|Webinator-WBI|WebIndex|WebIndexer|weblayers|WebLeacher|WeblexBot|WebLinker|webLyzard|WebmasterCoffee|WebmasterWorld|WebmasterWorldForumBot|WebMiner|WebMoose|WeBot|WebPix|WebReaper|WebRipper|WebSauger|Webscan|websearchbench|WebSite|websitemirror|WebSpear|websphinx.test|WebSpider|Webster|Webster.Pro|Webster\ Pro|WebStripper|WebTrafficExpress|WebTrends\ Link\ Analyzer|webvac|webwalk|WebWalker|Webwasher|WebWatch|WebWhacker|WebXM|WebZip|Weddings.info|wenbin|WEPA|WeRelateBot|Whacker|Widow|WikiaBot|Wikio|wikiwix-bot-|WinHttp.WinHttpRequest|WinHTTP\ Example|WIRE|wired-digital-newsbot|WISEbot|WISENutbot|wish-la|wish-project|wisponbot|WMCAI-robot|wminer|WMSBot|woriobot|worldshop|WorQmada|Wotbox|WPScan|wume_crawler|WWW-Mechanize|www.freeloader.com.|WWW\ Collector|WWWOFFLE|wwwrobot|wwwster|WWWWanderer|wwwxref|Wysigot|X-clawler|Xaldon|Xenu|Xenu's|Xerka\ MetaBot|XGET|xirq|XmarksFetch|XoviBot|xqrobot|Y!J|Y!TunnelPro|yacy.net|yacybot|yarienavoir.net|Yasaklibot|yBot|YebolBot|yellowJacket|yes|YesupBot|Yeti|YioopBot|YisouSpider|yolinkBot|yoogliFetchAgent|yoono|Yoriwa|YottaCars_Bot|you-dir|Z-Add\ Link|zagrebin|Zao|zedzo.digest|zedzo.validate|zermelo|Zeus|Zeus\ Link\ Scout|zibber-v|zimeno|Zing-BottaBot|ZipppBot|zmeu|ZoomSpider|ZuiBot|ZumBot|Zyborg|Zyte).*$" />
<add input="{HTTP_USER_AGENT}" pattern="^.*(Nessus|NESSUS::SOAP|nestReader|Net::Trackback|NetAnts|NetCarta\ CyberPilot\ Pro|Netcraft|NetID.com|NetMechanic|Netprospector|NetResearchServer|NetScoop|NetSeer|NetShift=|NetSongBot|Netsparker|NetSpider|NetSrcherP|NetZip|NetZip-Downloader|NewMedhunt|news|News_Search_App|NewsGatherer|Newsgroupreporter|NewsTroveBot|NextGenSearchBot|nextthing.org|NHSEWalker|nicebot|NICErsPRO|niki-bot|NimbleCrawler|nimbus-1|ninetowns|Ninja|NjuiceBot|NLese|Nogate|Nomad-V2.x|NoteworthyBot|NPbot|NPBot-|NRCan\ intranet|NSDL_Search_Bot|nu_tch-princeton|nuggetize.com|nutch|nutch1|NutchCVS|NutchOrg|NWSpider|Nymesis|nys-crawler|ObjectsSearch|oBot|Obvius\ external\ linkcheck|Occam|Ocelli|Octopus|ODP\ entries|Offline.Explorer|Offline\ Explorer|Offline\ Navigator|OGspider|OmiExplorer_Bot|OmniExplorer_Bot|omnifind|OmniWeb|OnetSzukaj|online\ link\ validator|OOZBOT|Openbot|Openfind|Openfind\ data|OpenHoseBot|OpenIntelligenceData|OpenISearch|OpenSearchServer_Bot|OpiDig|optidiscover|OrangeBot|ORISBot|ornl_crawler_1|ORNL_Mercury|osis-project.jp|OsO|OutfoxBot|OutfoxMelonBot|OWLER-BOT|owsBot|ozelot|P3P\ Client|page_verifier|PageBitesHyperBot|Pagebull|PageDown|PageFetcher|PageGrabber|PagePeeker|PageRank\ Monitor|pamsnbot.htm|Panopy|panscient.com|Pansophica|Papa\ Foto|PaperLiBot|parasite|parsijoo|Pathtraq|Pattern|Patwebbot|pavuk|PaxleFramework|PBBOT|pcBrowser|pd-crawler|PECL::HTTP|penthesila|PeoplePal|perform_crawl|PerMan|PGP-KA|PHPCrawl|PhpDig|PicoSearch|pipBot|pipeLiner|Pita|pixfinder|PiyushBot|planetwork|PleaseCrawl|Plucker|Plukkie|Plumtree|Pockey|Pockey-GetHTML|PoCoHTTP|pogodak.ba|Pogodak.co.yu|Poirot|polybot|Pompos|Poodle\ predictor|PopScreenBot|PostPost|PrivacyFinder|ProjectWF-java-test-crawler|ProPowerBot|ProWebWalker|PROXY|psbot|psbot-page|PSS-Bot|psycheclone|pub-crawler|pucl|pulseBot\ \(pulse|Pump|purebot|PWeBot|pycurl|Python-urllib|pythonic-crawler|PythonWikipediaBot|q1|QEAVis\ agent|QFKBot|qualidade|Qualidator.com|QuepasaCreep|QueryN.Metasearch|QueryN\ Metasearch|quest.durato|Quintura-Crw|QunarBot|Qweery_robot.txt_CheckBot|QweeryBot|r2iBot|R6_CommentReader|R6_FeedFetcher|R6_VoteReader|RaBot|Radian6|radian6_linkcheck|RAMPyBot|RankurBot|RcStartBot|RealDownload|Reaper|REBI-shoveler|Recorder|RedBot|RedCarpet|ReGet|RepoMonkey|RepoMonkey\ Bait|Riddler|RIIGHTBOT|RiseNetBot|RiverglassScanner|RMA|RoboPal|Robosourcer|robot|robotek|robots|Robozilla|rogerBot|Rome\ Client|Rondello|Rotondo|Roverbot|RPT-HTTPClient|rtgibot|RufusBot|Runnk\ online\ rss\ reader|s~stremor-crawler|S2Bot|SafariBookmarkChecker|SaladSpoon|Sapienti|SBIder|SBL-BOT|SCFCrawler|Scich|ScientificCommons.org|ScollSpider|ScooperBot|Scooter|ScoutJet|ScrapeBox|Scrapy|SCrawlTest|Scrubby|scSpider|Scumbot|SeaMonkey$|Search-Channel|Search-Engine-Studio|search.KumKie.com|search.msn.com|search.updated.com|search.usgs.gov|Search\ Publisher|Searcharoo.NET|SearchBlox|searchbot|searchengine|searchhippo.com|SearchIt-Bot|searchmarking|searchmarks|searchmee_v|SearchmetricsBot|searchmining|SearchnowBot_v1|searchpreview|SearchSpider.com|SearQuBot|Seekbot|Seeker.lookseek.com|SeeqBot|seeqpod-vertical-crawler|Selflinkchecker|Semager|semanticdiscovery|Semantifire1|semisearch|SemrushBot|Senrigan|SEOENGWorldBot|SeznamBot|ShablastBot|ShadowWebAnalyzer|Shareaza|Shelob|sherlock|ShopWiki|ShowLinks|ShowyouBot|siclab|silk|Siphon|SiteArchive|SiteCheck-sitecrawl|sitecheck.internetseer.com|SiteFinder|SiteGuardBot|SiteOrbiter|SiteSnagger|SiteSucker|SiteSweeper|SiteXpert|SkimBot|SkimWordsBot|SkreemRBot|skygrid|Skywalker|Sleipnir|slow-crawler|SlySearch|smart-crawler|SmartDownload|Smarte|smartwit.com|Snake|Snapbot|SnapPreviewBot|Snappy|snookit|Snooper|Snoopy|SocialSearcher|SocSciBot|SOFT411\ Directory|sogou|sohu-search|sohu\ agent|Sokitomi|Solbot|sootle|Sosospider|Space\ Bison|Space\ Fung|SpaceBison|SpankBot|spanner|Spatineo\ Monitor\ Controller|special_archiver|SpeedySpider|Sphider|Sphider2|spider|Spider.TerraNautic.net|SpiderEngine|SpiderKU|SpiderMan|Spinn3r|Spinne|sportcrew-Bot|spyder3.microsys.com|sqlmap|Squid-Prefetch|SquidClamAV_Redirector|Sqworm|SrevBot|sslbot|SSM\ Agent|StackRambler|StarDownloader|statbot|statcrawler|statedept-crawler|Steeler|STEGMANN-Bot|stero|Stripper|Stumbler|suchclip|sucker|SumeetBot|SumitBot|SummizeBot|SummizeFeedReader|SuperBot|superbot.com|SuperHTTP|SuperLumin|SuperPagesBot|Supybot|SURF|Surfbot|SurfControl|SurveyBot|suzuran|SWEBot|swish-e|SygolBot|SynapticWalker|Syntryx\ ANT\ Scout\ Chassis\ Pheromone|SystemSearch-robot|Szukacz).*$" />
</conditions>
<action type="CustomResponse" statusCode="403" statusReason="Forbidden" statusDescription="Forbidden" />
</rule>
</rules>
</rewrite>
More info:
http://www.blogtips.org/web-crawlers-love-the-good-but-kill-the-bad-and-the-ugly/
https://perishablepress.com/eight-ways-to-blacklist-with-apaches-mod_rewrite/
No, it cant be achieved by web config file.
Dealing with search bots is done with robots.txt, its pretty simple solution. As written above, crawlers cannot see your web config file.
But, in code you can determine that HttpUserAgent is google bot, after you determine that, u dont write content to web page, just send Response code 404, or which i think is better, code 410.
Im not 100% behind above solution because there was no need to use that, but logically i think that would do the same. And you would just waste google energy to crawl your page :)
Web.config is obviously not accessible to users or crawlers. You should be able to create a route "/robots.txt", if your application resides in the root folder of the website, and return the text that normally should be inside that file from a controller, OWIN middleware or HTTP handler. But why?
As explained in other answers the way to prevent bots from crawling a page is by using robots.txt.
However based on your description this might not even be needed.
Here is why:
In order for a page to be crawled it needs to be linked to. There are two ways for this to happen:
The page is linked in your root/front web page (i.e. the page shown by typing httpx://mysite.domain/). My understanding is this does not happen. The information you want out of search engines is only available from specific deep (secret?) URLs not linked in the web site.
The page is linked another web-site. I.e. someone posts the URL to the information in a different web-site. In this case if you want to prevent the content of the page being crawled you need to use robots.txt, however the URL itself will be indexed in the search engine as this is information in a different web page.
The question then is how likely is option 2? For an example of public URLs that are not indexed unless published by a third party see dropbox/google drive/one drive URLs used for sharing to everyone. The URL usually includes a GUID which by definition is practically impossible to guess or brute force.

RedirectToAction() breaks umlauts

Note:
It's not RedirectToAction() that causes the problem, but rewrite rules in web.config
Original Question:
There is a strange behaviour when using RedirectToAction() which I cannot reproduce locally (debug). At some point my route value gets changed from äa to äa, but only on the server (Azure). I added logging to pinpoint the exact spot and ended up here:
RedirectToAction("square", new { id = criteria.Trim().ToLower() });
Locally this redirects correctly to /find/square/äa, but on the server it ends up at /find/square/äa. I logged every step and when my string is passed to RedirectToAction it's still intact (even after Trim() and ToLower()), but broken after redirection. I'm using the default route
routes.MapRoute("Default", "{controller}/{action}/{id}", new { controller = "Home", action = "Index", id = UrlParameter.Optional }
with controller find, action square and id äa (in this case)
This is pretty surprising to me since .NET is usually very careful and robust when it comes to umlauts and UTF-8. I'm especially lost here, because I'm not able to reproduce the issue locally. I assume it's a server setting, but Azure is pretty sparse at this point. Did anyone experience a similar behaviour before?
It's the one part of my code, that I didn't post, the rewrite rules. I only found out after I had a closer look at the redirects in Fiddler. As a matter of fact I found an additional redirect after the one I was intentionally triggering and it was caused by one of my rewrite rules:
<rule name="Lowercase Path" stopProcessing="true" >
<match url="[A-Z]" ignoreCase="false" />
<conditions>
<add input="{REQUEST_METHOD}" matchType="Pattern" pattern="POST" ignoreCase="true" negate="true" />
<add input="{URL}" pattern="^.*\.(axd|css|js|jpg|jpeg|png|gif|txt|xml|svg|pdf)$" negate="true" ignoreCase="true" />
</conditions>
<action type="Redirect" url="{tolower:{URL}}" redirectType="Permanent" />
</rule>
In fact it's the following line:
<action type="Redirect" url="{tolower:{URL}}" redirectType="Permanent" />
As soon as I disable this rule, everything works. My other rules which don't use {ToLower:} are not producing any problems. I'm not sure, why this doesn't break things locally. While doing my research I found an article describing how Microsoft patched a broken rewrite module which caused similar issues with umlauts. It's speculation, but maybe the rewrite version on my azure instance is unpatched - or it is something completely different.
Unfortunately there are few alternatives to using a rewrite-rule with {tolower:{URL}}, so I had to do a rather ugly workaround and check my URL in Global.asax.cs like so:
protected void Application_BeginRequest(object sender, EventArgs e)
{
var pathAndQuery = System.Web.HttpUtility.UrlDecode(Request.Url.PathAndQuery);
if (pathAndQuery == null || Regex.IsMatch(pathAndQuery, #"^.*\.(axd|css|js|jpg|jpeg|png|gif|txt|xml|svg|pdf)$"))
{
return;
}
if (Regex.IsMatch(pathAndQuery, ".*[A-Z].*"))
{
Response.Redirect(pathAndQuery.ToLower(), false);
Response.StatusCode = 301;
}
}
I'm not entirely happy with this approach; it feels smelly, but until I found a better solution this has to suffice.

How to set up embedded resources in an MVC application

I am trying to serve some JS and CSS files that are embedded into a DLL, with a solution based on this approach here: http://weblogs.asp.net/imranbaloch/asp-net-bundling-and-minification-and-embedded-resources
so, javascript and css files are embedded and I create bundles for them.
My problems start because, having quite a few of them, I need some folder structure to keep order. So the original route
RouteTable.Routes.Insert(0,
new Route("Embedded/{file}.{extension}",
new RouteValueDictionary(new { }),
new RouteValueDictionary(new { extension = "css|js" }),
new EmbeddedResourceRouteHandler()
));
is not enough anymore, so I have changed it to this:
RouteTable.Routes.Insert(0,
new Route("Embedded/{*url}",
new RouteValueDictionary(new { }),
new EmbeddedResourceRouteHandler()
));
I also cannot use the extension part because the catch-all part has to be the last one
So now if I try to access anything that looks like a file, my route will never be used so I will just get a 404
I have tried replacing the dot with a slash or adding a slash at the end but what I'm after here is a simple solution that will allow me to map urls that look like files to actual files.
I've also searched the web and there seem to be solutions based on UrlRewrite or altering the web.config but:
- I would like not to modify the IIS settings for every application to accomodate the library
- since it's a library, I would like it to be self contained and developers that use it shouldn't care about these sort of internal issues
So, is there a solution that I can implement in my library for this?
Also worth mentioning is that the original routing had the same issue, it only worked because of
<modules runAllManagedModulesForAllRequests="true" />
in the web.config, which I don't think is a good idea for performance
When you set
<modules runAllManagedModulesForAllRequests="true" />
this enables all available modules to run against the request. Which, as you mentioned, isn't the best for performance. However, you could add only the module you actually need- in this case the UrlRoutingModule.
You could add this module like this:
<system.webServer>
<modules>
<remove name="UrlRoutingModule-4.0" />
<add name="UrlRoutingModule-4.0" type="System.Web.Routing.UrlRoutingModule" preCondition="" />
</modules>
</system.webServer>
If you want an even better way (IMO) to do this, disregard the WebConfig and add it in a AppStart.cs file in your class library.
using Microsoft.Web.Infrastructure.DynamicModuleHelper;
[assembly: WebActivatorEx.PreApplicationStartMethod(typeof(AppStart), "PreStart")]
[assembly: WebActivatorEx.PostApplicationStartMethod(typeof(AppStart), "Start")]
namespace EmbeddedPages
{
public static class AppStart
{
private static bool PreStartFired = false;
public static void PreStart()
{
if (!PreStartFired)
{
PreStartFired = true;
DynamicModuleUtility.RegisterModule(typeof(UrlRoutingModule));
}
}
}
}
This adds the UrlRoutingModule into the module stack, and your URL's should now properly resolve. Note: you will need to add WebActivator to your project through nuget.

EPiServer, id as key in a querystring, id hijacked by url-rewriter, use rewriter for specified location?

I have rewritten an old application that then have quite a lot of external applications that redirects to a specific url containging the ?id={someid} querystring.. however.. episerver seems to do something with the id-key.. since that's never displayed in Request.QueryStrings.AllKeys..So my guess is that epi is doing some kind of url-rewriting or something.. is there any way to get around this and be able to use the id-key for only a specific page/location?
I would suggest using the IIS 7 url rewrite module to transform the id querystring key to something that will not conflict with EPiServer, as id is used for the page id in EPiServers internal url after the friendly url has been converted within the url rewrite provider.
The IIS url rewrite module would give you a chance to change this before hitting EPiServer.
You could write a regular expression to capture the id param, so you could add the following rule to your rewrite configuration
<rule name="QueryString">
<match url="(.*)" />
<conditions logicalGrouping="MatchAll" trackAllCaptures="false">
<add input="{QUERY_STRING}" pattern="(.*)id=([a-zA-Z0-9_-]+)(.*)" />
</conditions>
<action type="Rewrite" url="{R:1}?{C:1}oldid={C:2}{C:3}" appendQueryString="false" />
</rule>
Which would rewrite the url to /my-url-path/?oldid=123&another=param
More on how to implement as a rule in the IIS url rewrite module here
The querystring parameter id is reserved by EPiServer. You need to check it before EPi touches it.
Update: If you step up to EPi 7.5 the query parameter “ID” is no longer handled by EPi unless you register a EPiServer.Web.Routing.ClassicLinkRoute.

Categories

Resources