I'm wondering how Application Insights works with cookies because I'd like to understand user and session tracking, so I've been researching, and...
Here is a brief introduction to the theory:
Whenever the Application Insights SDK gets a request that doesn't have the Application Insights user tracking cookie (set by the Application Insights JS snippet), it will set this cookie and start a new session.
(from apmtips)
UserTelemetryInitializer updates the Id and AcquisitionDate properties of User context for all telemetry items with values extracted from the ai_user cookie generated by the Application Insights JavaScript instrumentation code running in the user's browser.
SessionTelemetryInitializer updates the Id property of the Session context for all telemetry items with value extracted from the ai_session cookie generated by the ApplicationInsights JavaScript instrumentation code running in the user's browser.
(from the Azure documentation, "Configuring the Application Insights SDK with ApplicationInsights.config")
So there are two cookies: ai_session, and ai_user.
And here come my questions:
When are they initialized?
What is doing it?
How can I stop using them?
If I wanted to keep them, how could I change their expiration time?
Trying to remove them, I made a project using the ASP.NET Web Application default template for Web API, which includes MVC and Web API.
Doing some research I found an approach to disable them, but I don't have any WebSessionTrackingTelemetryModule. So I commented out "UserTelemetryInitializer" and "SessionTelemetryInitializer", and this is what I have:
<TelemetryInitializers>
<Add Type="Microsoft.ApplicationInsights.Extensibility.Web.SyntheticTelemetryInitializer, Microsoft.ApplicationInsights.Extensibility.Web" />
<Add Type="Microsoft.ApplicationInsights.Extensibility.Web.ClientIpHeaderTelemetryInitializer, Microsoft.ApplicationInsights.Extensibility.Web" />
<Add Type="Microsoft.ApplicationInsights.Extensibility.Web.UserAgentTelemetryInitializer, Microsoft.ApplicationInsights.Extensibility.Web" />
<Add Type="Microsoft.ApplicationInsights.Extensibility.Web.OperationNameTelemetryInitializer, Microsoft.ApplicationInsights.Extensibility.Web" />
<Add Type="Microsoft.ApplicationInsights.Extensibility.Web.OperationIdTelemetryInitializer, Microsoft.ApplicationInsights.Extensibility.Web" />
<!--<Add Type="Microsoft.ApplicationInsights.Extensibility.Web.UserTelemetryInitializer, Microsoft.ApplicationInsights.Extensibility.Web" />-->
<!--<Add Type="Microsoft.ApplicationInsights.Extensibility.Web.SessionTelemetryInitializer, Microsoft.ApplicationInsights.Extensibility.Web" />-->
<Add Type="Microsoft.ApplicationInsights.Extensibility.Web.AzureRoleEnvironmentTelemetryInitializer, Microsoft.ApplicationInsights.Extensibility.Web" />
<Add Type="Microsoft.ApplicationInsights.Extensibility.Web.DomainNameRoleInstanceTelemetryInitializer, Microsoft.ApplicationInsights.Extensibility.Web" />
<Add Type="Microsoft.ApplicationInsights.Extensibility.Web.BuildInfoConfigComponentVersionTelemetryInitializer, Microsoft.ApplicationInsights.Extensibility.Web" />
<Add Type="Microsoft.ApplicationInsights.Extensibility.Web.DeviceTelemetryInitializer, Microsoft.ApplicationInsights.Extensibility.Web" />
</TelemetryInitializers>
And:
<TelemetryModules>
<Add Type="Microsoft.ApplicationInsights.Extensibility.DependencyCollector.DependencyTrackingTelemetryModule, Microsoft.ApplicationInsights.Extensibility.DependencyCollector" />
<Add Type="Microsoft.ApplicationInsights.Extensibility.PerfCounterCollector.PerformanceCollectorModule, Microsoft.ApplicationInsights.Extensibility.PerfCounterCollector"/>
<Add Type="Microsoft.ApplicationInsights.Extensibility.Implementation.Tracing.DiagnosticsTelemetryModule, Microsoft.ApplicationInsights" />
<Add Type="Microsoft.ApplicationInsights.Extensibility.Web.RequestTrackingTelemetryModule, Microsoft.ApplicationInsights.Extensibility.Web"/>
<Add Type="Microsoft.ApplicationInsights.Extensibility.Web.ExceptionTrackingTelemetryModule, Microsoft.ApplicationInsights.Extensibility.Web" />
<Add Type="Microsoft.ApplicationInsights.Extensibility.Web.DeveloperModeWithDebuggerAttachedTelemetryModule, Microsoft.ApplicationInsights.Extensibility.Web" />
</TelemetryModules>
But it doesn't make a difference. Whether I leave those initializers commented out or not, the cookies are still being generated.
Still trying to remove the cookies, I commented out the steps done in Startup and excluded all the .js files from my project, but the cookies keep appearing after every request.
So at this point I don't understand where the "Application Insights JavaScript" comes into play, and I guess what I'm missing is something in the backend. Am I wrong?
Finally, my commented Startup.cs looks like:
[assembly: OwinStartupAttribute(typeof(Try001.Startup))]
namespace Try001
{
    public partial class Startup
    {
        public void Configuration(IAppBuilder app)
        {
            //ConfigureAuth(app);
        }
    }
}
And my Global.asax.cs looks like:
public class MvcApplication : System.Web.HttpApplication
{
    protected void Application_Start()
    {
        //AreaRegistration.RegisterAllAreas();
        //FilterConfig.RegisterGlobalFilters(GlobalFilters.Filters);
        RouteConfig.RegisterRoutes(RouteTable.Routes);
        //BundleConfig.RegisterBundles(BundleTable.Bundles);
    }
}
Where RegisterRoutes is just doing the default routing. So I aimed to do only the very basic stuff to get it working, but I don't have a clue where to keep digging.
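For reference, by "the default routing" I mean the stock template code, roughly this (nothing custom on my side):
public static void RegisterRoutes(RouteCollection routes)
{
    routes.IgnoreRoute("{resource}.axd/{*pathInfo}");

    routes.MapRoute(
        name: "Default",
        url: "{controller}/{action}/{id}",
        defaults: new { controller = "Home", action = "Index", id = UrlParameter.Optional }
    );
}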
Could someone enlighten me?
Thanks for reading so far.
Cookie initialization logic happens in the Application Insights JavaScript SDK. If you look at the source of your page you will notice the JS loaded from //az416426.vo.msecnd.net/scripts/a/ai.0.js. You can also read/contribute to the source code of the JavaScript SDK on GitHub: https://github.com/Microsoft/ApplicationInsights-JS
Replying to your questions:
When are they initialized, and what is doing it?
They are initialized by the JavaScript SDK: when it attempts to send any telemetry item it checks whether the cookies are present, and if not, it creates them. For details see https://github.com/Microsoft/ApplicationInsights-JS/blob/master/JavaScript/JavaScriptSDK/Context/User.ts; there's also similar logic for the session cookie.
How can I stop using them?
As of the more recent versions of the JavaScript SDK, you can control the cookies, the local storage used for user information, and the session buffer (used to rate-limit requests to Application Insights) through the config object:
...snippet...
}({
    instrumentationKey: "<your key>",
    isCookieUseDisabled: true,
    isStorageUseDisabled: true,
    enableSessionStorageBuffer: true
});
If I wanted to keep them, how could I change their expiration time?
There are two settings you can control:
session renewal time - how much time elapses with no activity before the session is reset (default is 30 minutes)
session expiration time - how much time elapses before the session is reset even with activity (default is 24 hours)
To change them, set the following values in the snippet next to where the instrumentation key is set:
...snippet...
}({
    instrumentationKey: "<your key>",
    sessionRenewalMs: <your custom value in ms>,
    sessionExpirationMs: <your custom value in ms>
});
Related
I would like to create multiple cache profiles in my Web.config for different types of pages in my web application. For example, data forms do not change often, so I would increase the Cache-Control max-age. For other pages, more frequent requests to the proxy server or origin server should be made, so the Cache-Control max-age and s-maxage values will be set differently.
I am able to set the max-age with no problem, but I'm having difficulty finding solutions online for setting s-maxage from the web.config.
<caching>
  <outputCacheSettings>
    <outputCacheProfiles>
      <add name="Cache1Hour" duration="30" varyByParam="none" />
    </outputCacheProfiles>
  </outputCacheSettings>
</caching>
My last resort is to manually add max-age and s-maxage in the action methods:
Response.Cache.SetMaxAge(maxAge);
Response.Cache.SetProxyMaxAge(proxyMaxAge);
Any advice on the best way to do this? Ideally, I would like to use an attribute to decorate my Action methods.
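Something along the lines of this attribute is what I have in mind, just a hypothetical, untested sketch (ProxyCacheAttribute is a name I made up):
using System;
using System.Web;
using System.Web.Mvc;

public class ProxyCacheAttribute : ActionFilterAttribute
{
    public int MaxAgeSeconds { get; set; }
    public int SharedMaxAgeSeconds { get; set; }

    public override void OnResultExecuting(ResultExecutingContext filterContext)
    {
        var cache = filterContext.HttpContext.Response.Cache;
        cache.SetCacheability(HttpCacheability.Public);
        cache.SetMaxAge(TimeSpan.FromSeconds(MaxAgeSeconds));            // Cache-Control: max-age
        cache.SetProxyMaxAge(TimeSpan.FromSeconds(SharedMaxAgeSeconds)); // Cache-Control: s-maxage
        base.OnResultExecuting(filterContext);
    }
}

// Usage on an action method:
// [ProxyCache(MaxAgeSeconds = 3600, SharedMaxAgeSeconds = 600)]
// public ActionResult Details(int id) { ... }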
I have a web application for the company's internal needs. However, customers should have access only via specific URLs sent out by the company.
So, to make things short: the site can be accessed by everyone, but only by typing a concrete URL. No search engine is allowed to index any page of the site.
How can I achieve this?
I found an approach based on robots.txt, but could it be implemented in the web.config file?
There is a Web.config solution as well, based on the User-Agent. It uses the IIS URL Rewrite module. But it is not foolproof, since User-Agents can be easily changed. Note that I did not create the list myself but found it on the web recently; I cannot seem to find the article anymore.
<rewrite>
<rules>
<rule name="Abuse User Agents Blocking" stopProcessing="true">
<match url=".*" ignoreCase="false" />
<conditions logicalGrouping="MatchAny">
<add input="{HTTP_USER_AGENT}" pattern="^.*(1Noonbot|1on1searchBot|3D_SEARCH|3DE_SEARCH2|3GSE|50.nu|192.comAgent|360Spider|A6-Indexer|AASP|ABACHOBot|Abonti|abot|AbotEmailSearch|Aboundex|AboutUsBot|AccMonitor\ Compliance|accoona|AChulkov.NET\ page\ walker|Acme.Spider|AcoonBot|acquia-crawler|ActiveTouristBot|Acunetix|Ad\ Muncher|AdamM|adbeat_bot|adminshop.com|Advanced\ Email|AESOP_com_SpiderMan|AESpider|AF\ Knowledge\ Now\ Verity|aggregator:Vocus|ah-ha.com|AhrefsBot|AIBOT|aiHitBot|aipbot|AISIID|AITCSRobot|Akamai-SiteSnapshot|AlexaWebSearchPlatform|AlexfDownload|Alexibot|AlkalineBOT|All\ Acronyms|Amfibibot|AmPmPPC.com|AMZNKAssocBot|Anemone|Anonymous|Anonymouse.org|AnotherBot|AnswerBot|AnswerBus|AnswerChase\ PROve|AntBot|antibot-|AntiSantyWorm|Antro.Net|AONDE-Spider|Aport|Aqua_Products|AraBot|Arachmo|Arachnophilia|archive.org_bot|aria\ eQualizer|arianna.libero.it|Arikus_Spider|Art-Online.com|ArtavisBot|Artera|ASpider|ASPSeek|asterias|AstroFind|athenusbot|AtlocalBot|Atomic_Email_Hunter|attach|attrakt|attributor|Attributor.comBot|augurfind|AURESYS|AutoBaron|autoemailspider|autowebdir|AVSearch-|axfeedsbot|Axonize-bot|Ayna|b2w|BackDoorBot|BackRub|BackStreet\ Browser|BackWeb|Baiduspider-video|Bandit|BatchFTP|baypup|BDFetch|BecomeBot|BecomeJPBot|BeetleBot|Bender|besserscheitern-crawl|betaBot|Big\ Brother|Big\ Data|Bigado.com|BigCliqueBot|Bigfoot|BIGLOTRON|Bilbo|BilgiBetaBot|BilgiBot|binlar|bintellibot|bitlybot|BitvoUserAgent|Bizbot003|BizBot04|BizBot04\ kirk.overleaf.com|Black.Hole|Black\ Hole|Blackbird|BlackWidow|bladder\ fusion|Blaiz-Bee|BLEXBot|Blinkx|BlitzBOT|Blog\ Conversation\ Project|BlogMyWay|BlogPulseLive|BlogRefsBot|BlogScope|Blogslive|BloobyBot|BlowFish|BLT|bnf.fr_bot|BoaConstrictor|BoardReader-Image-Fetcher|BOI_crawl_00|BOIA-Scan-Agent|BOIA.ORG-Scan-Agent|boitho.com-dc|Bookmark\ Buddy|bosug|Bot\ Apoena|BotALot|BotRightHere|Botswana|bottybot|BpBot|BRAINTIME_SEARCH|BrokenLinkCheck.com|BrowserEmulator|BrowserMob|BruinBot|BSearchR&D|BSpider|btbot|Btsearch|Buddy|Buibui|BuildCMS|BuiltBotTough|Bullseye|bumblebee|BunnySlippers|BuscadorClarin|Butterfly|BuyHawaiiBot|BuzzBot|byindia|BySpider|byteserver|bzBot|c\ r\ a\ w\ l\ 3\ r|CacheBlaster|CACTVS\ Chemistry|Caddbot|Cafi|Camcrawler|CamelStampede|Canon-WebRecord|Canon-WebRecordPro|CareerBot|casper|cataguru|CatchBot|CazoodleBot|CCBot|CCGCrawl|ccubee|CD-Preload|CE-Preload|Cegbfeieh|Cerberian\ Drtrs|CERT\ FigleafBot|cfetch|CFNetwork|Chameleon|ChangeDetection|Charlotte|Check&Get|Checkbot|Checklinks|checkprivacy|CheeseBot|ChemieDE-NodeBot|CherryPicker|CherryPickerElite|CherryPickerSE|Chilkat|ChinaClaw|CipinetBot|cis455crawler|citeseerxbot|cizilla.com|ClariaBot|clshttp|Clushbot|cmsworldmap|coccoc|CollapsarWEB|Collector|combine|comodo|conceptbot|ConnectSearch|conpilot|ContentSmartz|ContextAd|contype|cookieNET|CoolBott|CoolCheck|Copernic|Copier|CopyRightCheck|core-project|cosmos|Covario-IDS|Cowbot-|Cowdog|crabbyBot|crawl|Crawl_Application|crawl.UserAgent|CrawlConvera|crawler|crawler_for_infomine|CRAWLER-ALTSE.VUNET.ORG-Lynx|crawler-upgrade-config|crawler.kpricorn.org|crawler#|crawler4j|crawler43.ejupiter.com|Crawly|CreativeCommons|Crescent|Crescent\ Internet\ ToolPak\ HTTP\ OLE\ Control|cs-crawler|CSE\ HTML\ Validator|CSHttpClient|Cuasarbot|culsearch|Curl|Custo|Cutbot|cvaulev|Cyberdog|CyberNavi_WebGet|CyberSpyder|CydralSpider).*$" />
<add input="{HTTP_USER_AGENT}" pattern="^.*(D1GArabicEngine|DA|DataCha0s|DataFountains|DataparkSearch|DataSpearSpiderBot|DataSpider|Dattatec.com|Dattatec.com-Sitios-Top|Daumoa|DAUMOA-video|DAUMOA-web|Declumbot|Deepindex|deepnet|DeepTrawl|dejan|del.icio.us-thumbnails|DelvuBot|Deweb|DiaGem|Diamond|DiamondBot|diavol|DiBot|didaxusbot|DigExt|Digger|DiGi-RSSBot|DigitalArchivesBot|DigOut4U|DIIbot|Dillo|Dir_Snatch.exe|DISCo|DISCo\ Pump|discobot|DISCoFinder|Distilled-Reputation-Monitor|Dit|DittoSpyder|DjangoTraineeBot|DKIMRepBot|DoCoMo|DOF-Verify|domaincrawler|DomainScan|DomainWatcher|dotbot|DotSpotsBot|Dow\ Jonesbot|Download|Download\ Demon|Downloader|DOY|dragonfly|Drip|drone|DTAAgent|dtSearchSpider|dumbot|Dwaar|Dwaarbot|DXSeeker|EAH|EasouSpider|EasyDL|ebingbong|EC2LinkFinder|eCairn-Grabber|eCatch|eChooseBot|ecxi|EdisterBot|EduGovSearch|egothor|eidetica.com|EirGrabber|ElisaBot|EllerdaleBot|EMail\ Exractor|EmailCollector|EmailLeach|EmailSiphon|EmailWolf|EMPAS_ROBOT|EnaBot|endeca|EnigmaBot|Enswer\ Neuro|EntityCubeBot|EroCrawler|eStyleSearch|eSyndiCat|Eurosoft-Bot|Evaal|Eventware|Everest-Vulcan|Exabot|Exabot-Images|Exabot-Test|Exabot-XXX|ExaBotTest|ExactSearch|exactseek.com|exooba|Exploder|explorersearch|extract|Extractor|ExtractorPro|EyeNetIE|ez-robot|Ezooms|factbot|FairAd\ Client|falcon|Falconsbot|fast-search-engine|FAST\ Data\ Document|FAST\ ESP|fastbot|fastbot.de|FatBot|Favcollector|Faviconizer|FDM|FedContractorBot|feedfinder|FelixIDE|fembot|fetch_ici|Fetch\ API\ Request|fgcrawler|FHscan|fido|Filangy|FileHound|FindAnISP.com_ISP_Finder|findlinks|FindWeb|Firebat|Fish-Search-Robot|Flaming\ AttackBot|Flamingo_SearchEngine|FlashCapture|FlashGet|flicky|FlickySearchBot|flunky|focused_crawler|FollowSite|Foobot|Fooooo_Web_Video_Crawl|Fopper|FormulaFinderBot|Forschungsportal|fr_crawler|Francis|Freecrawl|FreshDownload|freshlinks.exe|FriendFeedBot|frodo.at|froGgle|FrontPage|Froola|FU-NBI|full_breadth_crawler|FunnelBack|FunWebProducts|FurlBot|g00g1e|G10-Bot|Gaisbot|GalaxyBot|gazz|gcreep|generate_infomine_category_classifiers|genevabot|genieBot|GenieBotRD_SmallCrawl|Genieo|Geomaxenginebot|geometabot|GeonaBot|GeoVisu|GermCrawler|GetHTMLContents|Getleft|GetRight|GetSmart|GetURL.rexx|GetWeb!|Giant|GigablastOpenSource|Gigabot|Girafabot|GleameBot|gnome-vfs|Go-Ahead-Got-It|Go!Zilla|GoForIt.com|GOFORITBOT|gold|Golem|GoodJelly|Gordon-College-Google-Mini|goroam|GoSeebot|gotit|Govbot|GPU\ p2p|grab|Grabber|GrabNet|Grafula|grapeFX|grapeshot|GrapeshotCrawler|grbot|GreenYogi\ [ZSEBOT]|Gromit|GroupMe|grub|grub-client|Grubclient-|GrubNG|GruBot|gsa|GSLFbot|GT::WWW|Gulliver|GulperBot|GurujiBot|GVC|GVC\ BUSINESS|gvcbot.com|HappyFunBot|harvest|HarvestMan|Hatena\ Antenna|Hawler|Hazel's\ Ferret\ hopper|hcat|hclsreport-crawler|HD\ nutch\ agent|Header_Test_Client|healia|Helix|heritrix|hijbul-heritrix-crawler|HiScan|HiSoftware\ AccMonitor|HiSoftware\ AccVerify|hitcrawler_|hivaBot|hloader|HMSEbot|HMView|hoge|holmes|HomePageSearch|Hooblybot-Image|HooWWWer|Hostcrawler|HSFT\ -\ Link|HSFT\ -\ LVU|HSlide|ht:|htdig|Html\ Link\ Validator|HTMLParser|HTTP::Lite|httplib|HTTrack|Huaweisymantecspider|hul-wax|humanlinks|HyperEstraier|Hyperix).*$" />
<add input="{HTTP_USER_AGENT}" pattern="^.*(ia_archiver|IAArchiver-|ibuena|iCab|ICDS-Ingestion|ichiro|iCopyright\ Conductor|id-search|IDBot|IEAutoDiscovery|IECheck|iHWebChecker|IIITBOT|iim_405|IlseBot|IlTrovatore|Iltrovatore-Setaccio|ImageBot|imagefortress|ImagesHereImagesThereImagesEverywhere|ImageVisu|imds_monitor|imo-google-robot-intelink|IncyWincy|Industry\ Cortexcrawler|Indy\ Library|indylabs_marius|InelaBot|Inet32\ Ctrl|inetbot|InfoLink|INFOMINE|infomine.ucr.edu|InfoNaviRobot|Informant|Infoseek|InfoTekies|InfoUSABot|INGRID|Inktomi|InsightsCollector|InsightsWorksBot|InspireBot|InsumaScout|Intelix|InterGET|Internet\ Ninja|InternetLinkAgent|Interseek|IOI|ip-web-crawler.com|Ipselonbot|Iria|IRLbot|Iron33|Isara|iSearch|iSiloX|IsraeliSearch|IstellaBot|its-learning|IU_CSCI_B659_class_crawler|iVia|iVia\ Page\ Fetcher|JadynAve|JadynAveBot|jakarta|Jakarta\ Commons-HttpClient|Java|Jbot|JemmaTheTourist|JennyBot|Jetbot|JetBrains\ Omea\ Pro|JetCar|Jim|JoBo|JobSpider_BA|JOC|JoeDog|JoyScapeBot|JSpyda|JubiiRobot|jumpstation|Junut|JustView|Jyxobot|K.S.Bot|KakcleBot|kalooga|KaloogaBot|kanagawa|KATATUDO-Spider|Katipo|kbeta1|Kenjin.Spider|KeywenBot|Keyword.Density|Keyword\ Density|kinjabot|KIT-Fireball|Kitenga-crawler-bot|KiwiStatus|kmbot-|kmccrew|Knight|KnowItAll|Knowledge.com|Knowledge\ Engine|KoepaBot|Koninklijke|KrOWLer|KSbot|kuloko-bot|kulturarw3|KummHttp|Kurzor|Kyluka|L.webis|LabelGrab|Labhoo|labourunions411|lachesis|Lament|LamerExterminator|LapozzBot|larbin|LARBIN-EXPERIMENTAL|LBot|LeapTag|LeechFTP|LeechGet|LetsCrawl.com|LexiBot|LexxeBot|lftp|libcrawl|libiViaCore|libWeb|libwww|libwww-perl|likse|Linguee|Link|link_checker|LinkAlarm|linkbot|LinkCheck\ by\ Siteimprove.com|LinkChecker|linkdex.com|LinkextractorPro|LinkLint|linklooker|Linkman|LinkScan|LinksCrawler|LinksManager.com_bot|LinkSweeper|linkwalker|LiteFinder|LitlrBot|Little\ Grabber\ at\ Skanktale.com|Livelapbot|LM\ Harvester|LMQueueBot|LNSpiderguy|LoadTimeBot|LocalcomBot|locust|LolongBot|LookBot|Lsearch|lssbot|LWP|lwp-request|lwp-trivial|LWP::Simple|Lycos_Spider|Lydia\ Entity|LynnBot|Lytranslate|Mag-Net|Magnet|magpie-crawler|Magus|Mail.Ru|Mail.Ru_Bot|MAINSEEK_BOT|Mammoth|MarkWatch|MaSagool|masidani_bot_|Mass|Mata.Hari|Mata\ Hari|matentzn\ at\ cs\ dot\ man\ dot\ ac\ dot\ uk|maxamine.com--robot|maxamine.com-robot|maxomobot|Maxthon$|McBot|MediaFox|medrabbit|Megite|MemacBot|Memo|MendeleyBot|Mercator-|mercuryboard_user_agent_sql_injection.nasl|MerzScope|metacarta|Metager2|metager2-verification-bot|MetaGloss|METAGOPHER|metal|metaquerier.cs.uiuc.edu|METASpider|Metaspinner|MetaURI|MetaURI\ API|MFC_Tear_Sample|MFcrawler|MFHttpScan|Microsoft.URL|MIIxpc|miner|mini-robot|minibot|miniRank|Mirror|Missigua\ Locator|Mister.PiX|Mister\ PiX|Miva|MJ12bot|mnoGoSearch|mod_accessibility|moduna.com|moget|MojeekBot|MOMspider|MonkeyCrawl|MOSES|Motor|mowserbot|MQbot|MSE360|MSFrontPage|MSIECrawler|MSIndianWebcrawl|MSMOBOT|Msnbot|msnbot-products|MSNPTC|MSRBOT|MT-Soft|MultiText|My_Little_SearchEngine_Project|my-heritrix-crawler|MyApp|MYCOMPANYBOT|mycrawler|MyEngines-US-Bot|MyFamilyBot|Myra|nabot|nabot_|Najdi.si|Nambu|NAMEPROTECT|NatchCVS|naver|naverbookmarkcrawler|NaverBot|Navroad|NearSite|NEC-MeshExplorer|NeoScioCrawler|NerdByNature.Bot|NerdyBot|Nerima-crawl-).*$" />
<add input="{HTTP_USER_AGENT}" pattern="^.*(T-H-U-N-D-E-R-S-T-O-N-E|Tailrank|tAkeOut|TAMU_CRAWLER|TapuzBot|Tarantula|targetblaster.com|TargetYourNews.com|TAUSDataBot|taxinomiabot|Tecomi|TeezirBot|Teleport|Teleport\ Pro|TeleportPro|Telesoft|Teradex\ Mapper|TERAGRAM_CRAWLER|TerrawizBot|testbot|testing\ of|TextBot|thatrobotsite.com|The.Intraformant|The\ Dyslexalizer|The\ Intraformant|TheNomad|Theophrastus|theusefulbot|TheUsefulbot_|ThumbBot|thumbshots-de-bot|tigerbot|TightTwatBot|TinEye|Titan|to-dress_ru_bot_|to-night-Bot|toCrawl|Topicalizer|topicblogs|Toplistbot|TopServer\ PHP|topyx-crawler|Touche|TourlentaScanner|TPSystem|TRAAZI|TranSGeniKBot|travel-search|TravelBot|TravelLazerBot|Treezy|TREX|TridentSpider|Trovator|True_Robot|tScholarsBot|TsWebBot|TulipChain|turingos|turnit|TurnitinBot|TutorGigBot|TweetedTimes|TweetmemeBot|TwengaBot|TwengaBot-Discover|Twiceler|Twikle|twinuffbot|Twisted\ PageGetter|Twitturls|Twitturly|TygoBot|TygoProwler|Typhoeus|U.S.\ Government\ Printing\ Office|uberbot|ucb-nutch|UCSD-Crawler|UdmSearch|UFAM-crawler-|Ultraseek|UnChaos|unchaos_crawler_|UnisterBot|UniversalSearch|UnwindFetchor|UofTDB_experiment|updated|URI::Fetch|url_gather|URL-Checker|URL\ Control|URLAppendBot|URLBlaze|urlchecker|urlck|UrlDispatcher|urllib|URLSpiderPro|URLy.Warning|USAF\ AFKN\|usasearch|USS-Cosmix|USyd-NLP-Spider|Vacobot|Vacuum|VadixBot|Vagabondo|Validator|Valkyrie|vBSEO|VCI|VerbstarBot|VeriCiteCrawler|Verifactrola|Verity-URL-Gateway|vermut|versus|versus.integis.ch|viasarchivinginformation.html|vikspider|VIP|VIPr|virus-detector|VisBot|Vishal\ For\ CLIA|VisWeb|vlad|vlsearch|VMBot|VocusBot|VoidEYE|VoilaBot|Vortex|voyager|voyager-hc|voyager-partner-deep|VSE|vspider).*$" />
<add input="{HTTP_USER_AGENT}" pattern="^.*(W3C_Unicorn|W3C-WebCon|w3m|w3search|wacbot|wastrix|Water\ Conserve|Water\ Conserve\ Portal|WatzBot|wauuu\ engine|Wavefire|Waypath|Wazzup|Wazzup1.0.4800|wbdbot|web-agent|Web-Sniffer|Web.Image.Collector|Web\ CEO\ Online|Web\ Image\ Collector|Web\ Link\ Validator|Web\ Magnet|webalta|WebaltBot|WebAuto|webbandit|webbot|webbul-bot|WebCapture|webcheck|Webclipping.com|webcollage|WebCopier|WebCopy|WebCorp|webcrawl.net|webcrawler|WebDownloader\ for|Webdup|WebEMailExtrac|WebEMailExtrac.*|WebEnhancer|WebFerret|webfetch|WebFetcher|WebGather|WebGo\ IS|webGobbler|WebImages|Webinator-search2.fasthealth.com|Webinator-WBI|WebIndex|WebIndexer|weblayers|WebLeacher|WeblexBot|WebLinker|webLyzard|WebmasterCoffee|WebmasterWorld|WebmasterWorldForumBot|WebMiner|WebMoose|WeBot|WebPix|WebReaper|WebRipper|WebSauger|Webscan|websearchbench|WebSite|websitemirror|WebSpear|websphinx.test|WebSpider|Webster|Webster.Pro|Webster\ Pro|WebStripper|WebTrafficExpress|WebTrends\ Link\ Analyzer|webvac|webwalk|WebWalker|Webwasher|WebWatch|WebWhacker|WebXM|WebZip|Weddings.info|wenbin|WEPA|WeRelateBot|Whacker|Widow|WikiaBot|Wikio|wikiwix-bot-|WinHttp.WinHttpRequest|WinHTTP\ Example|WIRE|wired-digital-newsbot|WISEbot|WISENutbot|wish-la|wish-project|wisponbot|WMCAI-robot|wminer|WMSBot|woriobot|worldshop|WorQmada|Wotbox|WPScan|wume_crawler|WWW-Mechanize|www.freeloader.com.|WWW\ Collector|WWWOFFLE|wwwrobot|wwwster|WWWWanderer|wwwxref|Wysigot|X-clawler|Xaldon|Xenu|Xenu's|Xerka\ MetaBot|XGET|xirq|XmarksFetch|XoviBot|xqrobot|Y!J|Y!TunnelPro|yacy.net|yacybot|yarienavoir.net|Yasaklibot|yBot|YebolBot|yellowJacket|yes|YesupBot|Yeti|YioopBot|YisouSpider|yolinkBot|yoogliFetchAgent|yoono|Yoriwa|YottaCars_Bot|you-dir|Z-Add\ Link|zagrebin|Zao|zedzo.digest|zedzo.validate|zermelo|Zeus|Zeus\ Link\ Scout|zibber-v|zimeno|Zing-BottaBot|ZipppBot|zmeu|ZoomSpider|ZuiBot|ZumBot|Zyborg|Zyte).*$" />
<add input="{HTTP_USER_AGENT}" pattern="^.*(Nessus|NESSUS::SOAP|nestReader|Net::Trackback|NetAnts|NetCarta\ CyberPilot\ Pro|Netcraft|NetID.com|NetMechanic|Netprospector|NetResearchServer|NetScoop|NetSeer|NetShift=|NetSongBot|Netsparker|NetSpider|NetSrcherP|NetZip|NetZip-Downloader|NewMedhunt|news|News_Search_App|NewsGatherer|Newsgroupreporter|NewsTroveBot|NextGenSearchBot|nextthing.org|NHSEWalker|nicebot|NICErsPRO|niki-bot|NimbleCrawler|nimbus-1|ninetowns|Ninja|NjuiceBot|NLese|Nogate|Nomad-V2.x|NoteworthyBot|NPbot|NPBot-|NRCan\ intranet|NSDL_Search_Bot|nu_tch-princeton|nuggetize.com|nutch|nutch1|NutchCVS|NutchOrg|NWSpider|Nymesis|nys-crawler|ObjectsSearch|oBot|Obvius\ external\ linkcheck|Occam|Ocelli|Octopus|ODP\ entries|Offline.Explorer|Offline\ Explorer|Offline\ Navigator|OGspider|OmiExplorer_Bot|OmniExplorer_Bot|omnifind|OmniWeb|OnetSzukaj|online\ link\ validator|OOZBOT|Openbot|Openfind|Openfind\ data|OpenHoseBot|OpenIntelligenceData|OpenISearch|OpenSearchServer_Bot|OpiDig|optidiscover|OrangeBot|ORISBot|ornl_crawler_1|ORNL_Mercury|osis-project.jp|OsO|OutfoxBot|OutfoxMelonBot|OWLER-BOT|owsBot|ozelot|P3P\ Client|page_verifier|PageBitesHyperBot|Pagebull|PageDown|PageFetcher|PageGrabber|PagePeeker|PageRank\ Monitor|pamsnbot.htm|Panopy|panscient.com|Pansophica|Papa\ Foto|PaperLiBot|parasite|parsijoo|Pathtraq|Pattern|Patwebbot|pavuk|PaxleFramework|PBBOT|pcBrowser|pd-crawler|PECL::HTTP|penthesila|PeoplePal|perform_crawl|PerMan|PGP-KA|PHPCrawl|PhpDig|PicoSearch|pipBot|pipeLiner|Pita|pixfinder|PiyushBot|planetwork|PleaseCrawl|Plucker|Plukkie|Plumtree|Pockey|Pockey-GetHTML|PoCoHTTP|pogodak.ba|Pogodak.co.yu|Poirot|polybot|Pompos|Poodle\ predictor|PopScreenBot|PostPost|PrivacyFinder|ProjectWF-java-test-crawler|ProPowerBot|ProWebWalker|PROXY|psbot|psbot-page|PSS-Bot|psycheclone|pub-crawler|pucl|pulseBot\ \(pulse|Pump|purebot|PWeBot|pycurl|Python-urllib|pythonic-crawler|PythonWikipediaBot|q1|QEAVis\ agent|QFKBot|qualidade|Qualidator.com|QuepasaCreep|QueryN.Metasearch|QueryN\ Metasearch|quest.durato|Quintura-Crw|QunarBot|Qweery_robot.txt_CheckBot|QweeryBot|r2iBot|R6_CommentReader|R6_FeedFetcher|R6_VoteReader|RaBot|Radian6|radian6_linkcheck|RAMPyBot|RankurBot|RcStartBot|RealDownload|Reaper|REBI-shoveler|Recorder|RedBot|RedCarpet|ReGet|RepoMonkey|RepoMonkey\ Bait|Riddler|RIIGHTBOT|RiseNetBot|RiverglassScanner|RMA|RoboPal|Robosourcer|robot|robotek|robots|Robozilla|rogerBot|Rome\ Client|Rondello|Rotondo|Roverbot|RPT-HTTPClient|rtgibot|RufusBot|Runnk\ online\ rss\ reader|s~stremor-crawler|S2Bot|SafariBookmarkChecker|SaladSpoon|Sapienti|SBIder|SBL-BOT|SCFCrawler|Scich|ScientificCommons.org|ScollSpider|ScooperBot|Scooter|ScoutJet|ScrapeBox|Scrapy|SCrawlTest|Scrubby|scSpider|Scumbot|SeaMonkey$|Search-Channel|Search-Engine-Studio|search.KumKie.com|search.msn.com|search.updated.com|search.usgs.gov|Search\ 
Publisher|Searcharoo.NET|SearchBlox|searchbot|searchengine|searchhippo.com|SearchIt-Bot|searchmarking|searchmarks|searchmee_v|SearchmetricsBot|searchmining|SearchnowBot_v1|searchpreview|SearchSpider.com|SearQuBot|Seekbot|Seeker.lookseek.com|SeeqBot|seeqpod-vertical-crawler|Selflinkchecker|Semager|semanticdiscovery|Semantifire1|semisearch|SemrushBot|Senrigan|SEOENGWorldBot|SeznamBot|ShablastBot|ShadowWebAnalyzer|Shareaza|Shelob|sherlock|ShopWiki|ShowLinks|ShowyouBot|siclab|silk|Siphon|SiteArchive|SiteCheck-sitecrawl|sitecheck.internetseer.com|SiteFinder|SiteGuardBot|SiteOrbiter|SiteSnagger|SiteSucker|SiteSweeper|SiteXpert|SkimBot|SkimWordsBot|SkreemRBot|skygrid|Skywalker|Sleipnir|slow-crawler|SlySearch|smart-crawler|SmartDownload|Smarte|smartwit.com|Snake|Snapbot|SnapPreviewBot|Snappy|snookit|Snooper|Snoopy|SocialSearcher|SocSciBot|SOFT411\ Directory|sogou|sohu-search|sohu\ agent|Sokitomi|Solbot|sootle|Sosospider|Space\ Bison|Space\ Fung|SpaceBison|SpankBot|spanner|Spatineo\ Monitor\ Controller|special_archiver|SpeedySpider|Sphider|Sphider2|spider|Spider.TerraNautic.net|SpiderEngine|SpiderKU|SpiderMan|Spinn3r|Spinne|sportcrew-Bot|spyder3.microsys.com|sqlmap|Squid-Prefetch|SquidClamAV_Redirector|Sqworm|SrevBot|sslbot|SSM\ Agent|StackRambler|StarDownloader|statbot|statcrawler|statedept-crawler|Steeler|STEGMANN-Bot|stero|Stripper|Stumbler|suchclip|sucker|SumeetBot|SumitBot|SummizeBot|SummizeFeedReader|SuperBot|superbot.com|SuperHTTP|SuperLumin|SuperPagesBot|Supybot|SURF|Surfbot|SurfControl|SurveyBot|suzuran|SWEBot|swish-e|SygolBot|SynapticWalker|Syntryx\ ANT\ Scout\ Chassis\ Pheromone|SystemSearch-robot|Szukacz).*$" />
</conditions>
<action type="CustomResponse" statusCode="403" statusReason="Forbidden" statusDescription="Forbidden" />
</rule>
</rules>
</rewrite>
More info:
http://www.blogtips.org/web-crawlers-love-the-good-but-kill-the-bad-and-the-ugly/
https://perishablepress.com/eight-ways-to-blacklist-with-apaches-mod_rewrite/
No, it can't be achieved with the web.config file alone.
Dealing with search bots is done with robots.txt; it's a pretty simple solution. As written above, crawlers cannot see your web.config file.
But in code you can detect that the HTTP User-Agent belongs to, say, the Google bot. Once you've determined that, don't write any content to the page; just send response code 404 or, which I think is better, 410 (Gone).
I'm not 100% behind the above approach because I've never needed to use it, but logically I think it would have the same effect, and you would just waste Google's energy crawling your page :)
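A rough sketch of that idea, assuming Global.asax (the user-agent check is deliberately naive and untested):
protected void Application_BeginRequest(object sender, EventArgs e)
{
    // Naive check: treat anything identifying itself as Googlebot as a crawler.
    string ua = Request.UserAgent ?? string.Empty;
    if (ua.IndexOf("Googlebot", StringComparison.OrdinalIgnoreCase) >= 0)
    {
        Response.StatusCode = 410; // Gone
        Response.SuppressContent = true;
        Response.End();
    }
}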
Web.config is obviously not accessible to users or crawlers. You should be able to create a route for "/robots.txt", if your application resides in the root folder of the website, and return the text that would normally be inside that file from a controller, OWIN middleware or an HTTP handler (see the sketch below). But why?
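For illustration only, a hypothetical MVC 5 controller action; it assumes attribute routing is enabled (routes.MapMvcAttributeRoutes()) and that ASP.NET is allowed to handle the .txt extension (e.g. via runAllManagedModulesForAllRequests or a handler mapping), depending on your setup:
public class RobotsController : Controller
{
    // Serves "/robots.txt" and tells every crawler to stay away.
    [Route("robots.txt")]
    public ContentResult Index()
    {
        return Content("User-agent: *\nDisallow: /", "text/plain");
    }
}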
As explained in other answers the way to prevent bots from crawling a page is by using robots.txt.
However based on your description this might not even be needed.
Here is why:
In order for a page to be crawled it needs to be linked to. There are two ways for this to happen:
The page is linked from your root/front web page (i.e. the page shown by typing httpx://mysite.domain/). My understanding is this does not happen: the information you want to keep out of search engines is only available from specific deep (secret?) URLs not linked in the web site.
The page is linked from another web site, i.e. someone posts the URL to the information on a different web site. In this case, if you want to prevent the content of the page from being crawled you need to use robots.txt; however, the URL itself will be indexed in the search engine, since it is information on a different web page.
The question then is how likely option 2 is. For an example of public URLs that are not indexed unless published by a third party, see the Dropbox/Google Drive/OneDrive URLs used for sharing with everyone. The URL usually includes a GUID, which by definition is practically impossible to guess or brute force.
I have created a new ASP.NET MVC 5 project. I have installed the AWS SDK for .NET and the Session Provider through NuGet, and I have read this article in Amazon: Article
I have this configuration in the Web.config:
<sessionState
  mode="Custom"
  customProvider="DynamoDBSessionStoreProvider">
  <providers>
    <add name="DynamoDBSessionStoreProvider"
         type="Amazon.SessionProvider.DynamoDBSessionStateStore, AWS.SessionProvider"
         AWSProfileName="default"
         Table="ASP.NET_SessionState"
         Region="eu-west-1"
         />
  </providers>
</sessionState>
I run the web app using IIS Express and everything works fine (I can log in and log off), but when I look at my DynamoDB I don't have any items in the ASP.NET_SessionState table.
It's as if the custom session state provider were being ignored...
What am I doing wrong?
Thanks!!
Are you storing anything in the session? If you are not, there is nothing to store and there will be no records in DynamoDB.
If you want to check that you have set it up correctly, run a test that adds some data to the session (see the sketch below) and then check DynamoDB. You should then see records there.
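For example, a throwaway MVC action like this (hypothetical, just to verify the provider writes to the table):
public ActionResult SessionPing()
{
    // Writing anything to the session should produce an item in ASP.NET_SessionState.
    Session["LastPing"] = DateTime.UtcNow.ToString("o");
    return Content("session written");
}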
When using this on an EC2 instance you should set up an IAM role for DynamoDB access; at the bottom of the article there is information on the role: http://aws.amazon.com/iam/. Then there should be no need for credentials, as access is handled by the role attached to your EC2 instance. Also make sure the sessionState tag is inside the system.web tag in your web.config.
This is what I have done:
First you need to log in to the Amazon console and search for DynamoDB.
Then create a table named ASP.NET_SessionState with the key SessionId (string); you need to set ReadCapacity to 3 and WriteCapacity to 1.
In the project, install the AWSSDK.Core and AWSSDK.DynamoDBv2 packages.
In the web.config, in the system.web section:
<sessionState timeout="20"
  mode="Custom"
  customProvider="DynamoDBSessionStoreProvider">
  <providers>
    <add name="DynamoDBSessionStoreProvider"
         type="Amazon.SessionProvider.DynamoDBSessionStateStore, AWS.SessionProvider"
         AWSAccessKey="{Access key}"
         AWSSecretKey="{Secret key}"
         Table="ASP.NET_SessionState"
         Region="us-east-1"
         ReadCapacityUnits="3"
         WriteCapacityUnits="1"/>
  </providers>
</sessionState>
Then the model needs the Serializable attribute, something like below:
[Serializable]
public class TestModel
{
    public int Id { get; set; }
    public string Name { get; set; }
}
Then in the code you need to set the session value, something like this:
TestModel testModel = new TestModel();
testModel.Id = 1;
testModel.Name = "ItemName";
// When we add this to the session, an entry will be written to the ASP.NET_SessionState table in DynamoDB
Session["Checkout"] = testModel;
No need to worry about the connection; the AWS SDK will take care of it. You just need to provide the correct Access Key and Secret Key.
I currently have some configuration variables in my Web.config, e.g.:
<appSettings>
<add key="RecipientEmailForNotifications" value="asdf#sdf.com" />
<add key="NotifyNewEntries" value="true" />
<!-- etc... -->
</appSettings>
I have a view where admin users can change these values online, e.g.:
/Admin/Configuration/
Recipient email for notifications:
[___________]
Notify new entries:
[x] Yes
[SAVE]
But there are two problems with this approach:
Each time I deploy the web.config, I'm overriding those values that were saved online
Each time the web application restarts (because web applications are naturally stopped when there is no activity), these values that were saved online are reset to the originals of the last deployment
So the question is:
What is the best way of achieving this goal of having configuration variables that have to be persisted?
I think that using a table isn't very natural, but I might be wrong (please explain why).
Also, using app settings is very convenient when reading the values:
ConfigurationManager.AppSettings["RecipientEmailForNotifications"]
As opposed to using the database:
db.SomeConfigurationTable.SingleOrDefault(.....)
I'd use a combination of database storage and caching.
If you are on a single web server you could store the settings in an in-process cache. If you are in a web farm you could look at something like Couchbase (free memcached) or Azure.
It might make sense to warm your cache on application start, that is, read the settings from your database and populate your cache.
Another popular approach is to have a Get call that checks the cache first and populates the cache if it is empty, e.g.:
public string Get(string key, Func<string> getItemCallback)
{
    // Try the cache first.
    var item = Cache.Get(key) as string;
    if (item == null)
    {
        // Cache miss: load the value (e.g. from the configuration table) and cache it.
        item = getItemCallback();
        Cache.Insert(key, item);
    }
    return item;
}
Don't forget you will need to invalidate (remove) the cached value when updating settings.
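For example (hypothetical names; db.Settings is whatever your configuration table maps to):
// Read a setting through the cache, falling back to the database on a miss.
string recipient = Get(
    "RecipientEmailForNotifications",
    () => db.Settings.Single(s => s.Key == "RecipientEmailForNotifications").Value);

// When an admin saves a new value, drop the stale entry so the next read reloads it:
Cache.Remove("RecipientEmailForNotifications");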
First post, I'm a complete .NET/C# novice thrown in at the deep end!
I've inherited a C# web application because someone left work and I'm the only one with bandwidth, though not the .NET/C# knowledge!
The app is used by people on different sites all over the world. They log in using their corporate login details, and as such they log into different servers depending on where they are located (Europe, America or India).
The guy who wrote the app couldn't work out how to switch the connection string in web.config depending on location, so he duplicated the whole app for each domain, with the only variation being a single IP address in web.config for each duplicated version of the app! He then made a simple front page which took users to "their" version of the app depending on where they said they were in the world.
The first thing I want to do is move to a single version to maintain, so I need to be able to switch the connection string (or how the login happens).
I've spent several days trying to work out how to get to the ConnectionString (defined in web.config) from my Login class, only to discover that the values set in web.config seem to be read-only, so I can't alter them.
So I guess the first question is: am I barking up the wrong tree? Can I just set all the information that AspNetActiveDirectoryMembershipProvider (see code later) requires and call it from my Login class? Or is the connection string route the de facto way to set up connections in .NET/C#, in which case I do need to find out how to change/specify/add the value at runtime?
Three possibilities I can think of (the first is the one I've ground to a halt with):
Change the ConnectionString for ADService in my web.config from my Login class?
Change what AspNetActiveDirectoryMembershipProvider uses, so that from my Login class I magically get it to use EMEA_ADService or PACIFIC_ADService as defined in web.config?
Define a new connection string and call AspNetActiveDirectoryMembershipProvider, all from my Login class, without using web.config for this connection at all?
Here's a bit of my/his web.config file and my Login class
Cuttings from Web.config
<connectionStrings>
<add name="ADService" connectionString="LDAP://12.345.67.8" /> *---- Original ConnectionString (IP address changed)----*
<add name="EMEA_ADService" connectionString="LDAP://12.345.67.8" /> *---- Added by me playing around unsuccessfully! ----*
<add name="PACIFIC_ADService" connectionString="LDAP://12.345.67.9" /> *---- Added by me playing around unsuccessfully! ----*
~
</connectionStrings>
<authentication mode="Forms">
<forms loginUrl="~/Login.aspx" timeout="2880" /> *---- The background class for this popup (Login.aspx.cs) is where I'm currently trying to affect ConnectionString----*
</authentication>
*---- Pretty sure this is the bit that actually does the login verification----*
<membership defaultProvider="AspNetActiveDirectoryMembershipProvider">
<providers>
<clear />
<add name="AspNetActiveDirectoryMembershipProvider" type="System.Web.Security.ActiveDirectoryMembershipProvider, System.Web, Version=4.0.0.0, Culture=neutral, PublicKeyToken=12345678" connectionStringName="ADService" applicationName="/." description="ADService" />
</providers>
</membership>
This is as far as I've got in my class before finding out that I don't appear to be able to alter ConnectionString!
Cuttings from Login.aspx.cs
public partial class Login : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        ConnectionStringSettingsCollection connections = ConfigurationManager.ConnectionStrings; //this is now working :)
        string userDomain = Environment.UserDomainName; //Probably don't need this, it seems to give the login domain on the machine. Don't know yet if that will be the users machine or the server the app runs on?
        if (connections.Count != 0)
        {
            foreach (ConnectionStringSettings connection in connections)
            {
                string testname = connections["ADService"].Name;
                string testConnectionString = connections["ADService"].ConnectionString;
                connections["ADService"].ConnectionString = "LDAP://12.345.67.9";
                testConnectionString = connections["ADService"].ConnectionString;
Any hint would be very welcome!
P.S. I have requested a .Net/C# course at work! ;)
You wouldn't want to alter the existing connection string. Rather, you'd want to alter which connection string your data access layer uses to call the different service stacks. You could then choose a connection string at runtime based on whatever input parameters you want to use, which in your case might be an IP range.
asp.net mvc multiple connection strings
Handling multiple connection strings in ONE DataAccess Layer
http://msdn.microsoft.com/en-us/library/aa479086.aspx
The Microsoft article is particularly interesting since it actually takes an architectural look at proper patterns for resolving dilemmas like yours. A sketch of what this could look like for your membership scenario is below. I think you got stuck with the short end of the stick! Best of luck!
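A minimal sketch of option 2 from your question, assuming you register two providers in web.config (the names below are made up), each pointing at its own connection string (EMEA_ADService and PACIFIC_ADService), and that RegionDropDown, UserName and Password are controls on your login page:
protected void LoginButton_Click(object sender, EventArgs e)
{
    // Pick the membership provider by name instead of mutating the connection string.
    string providerName = RegionDropDown.SelectedValue == "EMEA"
        ? "EMEA_ADMembershipProvider"
        : "PACIFIC_ADMembershipProvider";

    var provider = System.Web.Security.Membership.Providers[providerName];
    if (provider != null && provider.ValidateUser(UserName.Text, Password.Text))
    {
        System.Web.Security.FormsAuthentication.RedirectFromLoginPage(UserName.Text, false);
    }
}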
The Web.config isn't meant to be modified at runtime (saving changes to it recycles the application). I would suggest setting some kind of flag via a login link or combobox on the website for people to choose where they want to log in. It is not the job of the server to figure out what a user wants to do.