I am planning to create a free, open-source porn-blocker application that blocks as many porn websites as possible.
The idea is to maintain a list of websites (xxxporn.xxx or whatever), and whenever the user tries to visit one of those sites in any web browser, the request is simply killed and the user goes nowhere.
I am comfortable with programming and my problem isn't the code itself; I just want to know where I should start.
I have heard about packet sniffers, so how do I do this in C#? All I want is a demo method or code sample that shows me the currently visited websites and kills the request when a predefined website is visited.
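To illustrate the kind of check I have in mind (just the blocklist lookup, not the interception itself, which is the part I am asking about; the class and the second list entry below are made up):

    using System;
    using System.Collections.Generic;

    // Minimal sketch of the blocklist check: some interception layer
    // (e.g. a local proxy) would hand us the host of each outgoing request.
    public static class BlockList
    {
        private static readonly HashSet<string> BlockedHosts =
            new HashSet<string>(StringComparer.OrdinalIgnoreCase)
            {
                "xxxporn.xxx",          // example from the question
                "another-blocked.site"  // hypothetical entry
            };

        // Returns true if the request should be killed.
        public static bool ShouldBlock(string host)
        {
            if (string.IsNullOrEmpty(host))
                return false;

            foreach (var blocked in BlockedHosts)
            {
                // Match the exact host and any subdomain (www.xxxporn.xxx etc.).
                if (host.Equals(blocked, StringComparison.OrdinalIgnoreCase) ||
                    host.EndsWith("." + blocked, StringComparison.OrdinalIgnoreCase))
                    return true;
            }
            return false;
        }
    }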
I wrote a web crawler and had to deal with filtering out porn on free crawls.
Look for the following terms:
18 U.S.C 2257
18 U.S.C. 2257
section 2257 compliance
Most pornographic sites include these terms in their HTML source.
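To illustrate, a rough C# sketch of that check (a hypothetical helper, not the crawler itself) could look like this:

    using System;
    using System.Net.Http;
    using System.Threading.Tasks;

    // Download a page and look for the 2257 compliance markers in its HTML.
    public static class AdultContentHeuristic
    {
        private static readonly string[] Markers =
        {
            "18 U.S.C 2257",
            "18 U.S.C. 2257",
            "section 2257 compliance"
        };

        public static async Task<bool> LooksLikeAdultSiteAsync(string url)
        {
            using (var client = new HttpClient())
            {
                string html = await client.GetStringAsync(url);
                foreach (var marker in Markers)
                {
                    if (html.IndexOf(marker, StringComparison.OrdinalIgnoreCase) >= 0)
                        return true;
                }
                return false;
            }
        }
    }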
This is not an answer to your request, but rather only one's view on the subject.
Porn is not something that just pops up while you are surfing regular websites, like this one for example... Porn is something that you need to look for.
If you have small children and you don't want them to be exposed to things that are not under your control, you can simply restrict all their surfing destinations with Windows Firewall.
If you have older children and you are afraid that they might wander off in search of porn due to age or hormonal impulses, or get exposed simply by surfing all sorts of dubious and pirated websites, you should have a talk with them and explain things in a grown-up manner, rather than try to block out the reality of life in such a medieval way.
In this modern age, where the internet governs all aspects of our lives and there are far greater risks out there in cyberspace than porn, proper training and education about what is good and bad when surfing is the key to protecting kids from all risks and harmful content.
I apologize that this has nothing to do with programming.
Related
I am working on a project where I have a JSON file and am trying to pass that data through MVC to be displayed on a web page. Part of the data that I am trying to pass to my view looks like this:
"htmlContent": "<p>It was over two decades ago that Bill Gates declared ‘Content is King’ and the consensus still stands today with it arguably being the most important part of designing a website. </p><p>Content is essentially your UX. It encompasses the images, words, videos and data featured across your website. If the main purpose of your site is to share valuable and relevant content to engage your audience, then you should be considering your content long before embarking on a web project. All too often, businesses miss the opportunity to create impressive UX designs, instead waiting until the later stages of the project to sign off content which inevitably creates new challenges to overcome. </p>\r\n<p>Having a research strategy in place that supports a content-first design approach should be at the top of your agenda. When businesses choose to design content-first, they are putting their valuable resources centre stage, conveying their brand through effective and engaging UX design. Throughout this blog, we will share our tips on how you can develop a content-first design approach. </p>\r\n<h2><strong>How to develop a content-first design approach </strong> </h2>\r\n<p>Content can no longer be an after-thought, but there’s no denying that generating content can be a tricky. To get you thinking along the right lines and help put pen to paper, follow our top tips: </p>\r\n<h3><strong>Ask lots of questions</strong> </h3>\r\n<p>Generating content that successfully satisfies what your customers want to know requires a lot of research. Get into the habit of asking open-ended questions that answer the Who, What, Where, When, Why and How. Using this approach will allow you to delve deep and gain an understanding of what your website should include to build a considered site map. </p>\r\n<h3><strong>Consider your Information Architecture (IA)</strong> </h3>\r\n<p>How your content is organised and divided across the website is a crucial aspect of UX design. Without effective sorting, most users would be completely lost when navigating a site and there’s no point having memorable features if they can’t be found! Use card sorting exercises, tree tests, user journey mapping and user flow diagrams to form an understanding of how best to display your content in a logical and accessible way. </p>\r\n<h3><strong>Conduct qualitative and quantitative research</strong> </h3>\r\n<p>Although Google Analytics is extremely useful, it doesn’t hold all the answers. Google Analytics is great at telling you <em>what</em> your users are doing, but it doesn’t give you the insight into <em>why</em> they’re doing it. Qualitative one-to-one user interviews is an effective method of really getting to grips with your user needs to understand why they do what they do. User testing also falls into this category. Seeing a user navigate through your website on a mobile phone in day to day life can give you great insight for UX design in terms of context and situation. </p>\r\n<h3><strong>Align your content strategy with long-term business goals</strong> </h3>\r\n<p>Before beginning your web project, it’s important to understand the goals of the project and the pain points you are trying to solve. Include all the necessary stakeholders within this research to gain a comprehensive understanding of these insights before embarking on your web design project. 
</p>\r\n<h3><strong>Content first, design second</strong> </h3>\r\n<p>Avoid designing content boxes across your website and trying to squeeze the content into these boxes. When designing a new website, it may seem counter intuitive to begin with a page of words rather than a design mock-up. But, it’s important to remember that Lorem Ipsum isn’t going to help anyone either. Begin with the content your users need and then design out from there. Capturing the content and its structure can be done in many ways; we like to build content models based on IA site maps and qualitative user testing such as card sorting and user journey mapping. </p>\r\n<p>By using a content-first design approach, you can understand what content needs to fit into your website design. Analysing your website’s content needs in the early stages or, even better, prior to the project beginning, can effectively inform and shape all touch points ultimately generating an optimised result with reduced time delays and constraints along the way. If you have a web project in mind and need help on how to get started, get in touch with the team today. </p>",
In the view I then access this JSON data through a foreach loop, like so:
#jsondata.htmlContent
This gets the 'htmlContent' from the JSON file, but when I open the web page it does not behave as I would expect: the '<p>' tags do not display as paragraphs; instead, the content on the page is exactly the same as the raw JSON string.
How would I go about rendering this data so that the HTML tags are actually applied?
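For reference, a minimal sketch of how such a view might render the markup in ASP.NET MVC Razor, assuming the JSON has been deserialized into a model with an HtmlContent property (the model shape here is assumed; by default Razor HTML-encodes output, and Html.Raw is what emits the stored markup as-is):

    @* Razor view sketch; Model.Articles is a hypothetical collection from the deserialized JSON *@
    @foreach (var item in Model.Articles)
    {
        @* Html.Raw writes the stored markup without HTML-encoding it *@
        @Html.Raw(item.HtmlContent)
    }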
I am in a bit of a crisis here. I would really appreciate your help on the matter.
My Final Year Project is a "Location Based Product Recommendation Service". Now, due to some communication gap, we got stuck with an extremely difficult algorithm. Here is how it went:
We had done some research about recommendation systems prior to the project defense. We knew there were two approaches, "Collaborative Filtering" and "Content Based Recommendation". We had planned on using whichever technique gave us the best results. So, in essence, we were more focused on the end product than the actual process. The HOD asked us which algorithms OUR product would use. But my group members thought that he meant which algorithms are used for "Content Based Recommendations" in general. They answered with "Rule Mining, Classification and Clustering". He was astonished that we planned on using all these algorithms for our project. He told us that he would accept our project proposal only if we used his algorithm in our project. He gave us his research paper, without any other resources such as data, simulations, samples, etc. The algorithm is named "Context Based Positive and Negative Spatio-Temporal Association Rule Mining". In the paper, this algorithm was used to recommend sites for hydrocarbon taps and mining, with extremely accurate results. Now here are a few issues I face:
* I am not sure how, or even if, this algorithm fits our project scenario.
* I cannot find spatio-temporal data, market baskets, documentation or indeed any other helpful resource.
* I tried asking the HOD for the data he used in the paper, as a reference, but he was unable to provide it.
* I tried coding the algorithm myself, in an incremental fashion, but found I was completely out of my depth. I divided the algorithm into three phases: Positive Spatio-Temporal Association Rule Mining, Negative Spatio-Temporal Association Rule Mining, and Context-Based Adjustments. Alas, the code I write is not mature enough; I couldn't even generate frequent itemsets properly. I understand the theory quite well, but I am not able to translate it into efficient code (a rough sketch of the plain frequent-itemset step I mean follows below).
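For reference, the plain market-basket version of frequent-itemset generation (Apriori-style, not the spatio-temporal algorithm from the paper) can be sketched in C# roughly like this; the transaction format is assumed:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Bare-bones Apriori-style frequent itemset generation over
    // transactions represented as sets of item names.
    public static class Apriori
    {
        public static List<HashSet<string>> FrequentItemsets(
            List<HashSet<string>> transactions, double minSupport)
        {
            int minCount = (int)Math.Ceiling(minSupport * transactions.Count);
            var frequent = new List<HashSet<string>>();

            // Frequent 1-itemsets.
            var current = transactions
                .SelectMany(t => t)
                .GroupBy(i => i)
                .Where(g => g.Count() >= minCount)
                .Select(g => new HashSet<string> { g.Key })
                .ToList();

            while (current.Count > 0)
            {
                frequent.AddRange(current);

                // Generate (k+1)-candidates by joining pairs of frequent k-itemsets.
                var candidates = new List<HashSet<string>>();
                for (int i = 0; i < current.Count; i++)
                    for (int j = i + 1; j < current.Count; j++)
                    {
                        var union = new HashSet<string>(current[i]);
                        union.UnionWith(current[j]);
                        if (union.Count == current[i].Count + 1 &&
                            !candidates.Any(c => c.SetEquals(union)))
                            candidates.Add(union);
                    }

                // Keep only candidates that meet the support threshold.
                current = candidates
                    .Where(c => transactions.Count(t => c.IsSubsetOf(t)) >= minCount)
                    .ToList();
            }
            return frequent;
        }
    }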
When the algorithm has been coded, I need to develop a web service. We also need a client website to access the web service. But with the code not even 10% done, I really am panicking. The project submission is in a fortnight.
Our supervisor is an expert in Artificial Intelligence, but he cannot guide us in the algorithm's development. He stresses the importance of reuse and of utilizing open-source resources, but I am unable to find anything of actual use.
My group members are waiting on me to deliver the algorithm so they can deploy it as a web service. There are other adjustments that need to be done, but with the algorithm not available, there is nothing we can do.
I have found a data set of market baskets. It's a simple Excel file with about 9,000 transactions. There is no spatial or temporal data in it, and I fear that adding artificial data would compromise its integrity.
I would appreciate it if somebody could guide me. I guess the best approach would be to use an open-source API to partially implement the algorithm and then build the service and client application. We need to demonstrate something on the 17th of June. I am really looking forward to your help, guidance and constructive criticism. Some solutions that I have considered are:
Use "User Clustering" as a "Collaborate Filtering" technique. Then
recommend the products from similar users via an alternative "Rule
Mining" algorithm. I need all these algorithms to be openly available
either as source code or an API, if I have any chance of making this
project on time.
Drop the algorithm altogether and make a project that actually works
as we intended, using available resources. I am 60% certain that we
would fail or marked extremely low.
Pay a software house to develop the algorithm for us and then
over-fit it into our project. I am not inclined to do this because it
would be unethical to do this.
As you can clearly see, my situation is quite dire. I really do need extensive help and guidance if I am to complete this project properly and on time. The project needs to be completely deployed and operational. I really am in a loop here.
"Collaborative Filtering", "Content Based Recommendation", "Rule Mining, Classification and Clustering"
None of these are algorithms. They are tasks or subtasks, for each of which several algorithms exist.
I think you got off to a bad start by not knowing well enough what you were actually proposing... but granted, the advice from your advisor was not at all helpful either.
I am working on my mapper and I need to get the full map of newegg.com.
I could try to scrape NE directly (which kind of violates NE's policies), but they have many products that are not available via a direct NE search, only via a google.com search, and I need those links too.
Here is the search string that returns 16 million results:
https://www.google.com/search?as_q=&as_epq=.com%2FProduct%2FProduct.aspx%3FItem%3D&as_oq=&as_eq=&as_nlo=&as_nhi=&lr=&cr=&as_qdr=all&as_sitesearch=newegg.com&as_occt=url&safe=off&tbs=&as_filetype=&as_rights=
I want my scraper to go over all those results and log the hyperlinks to them.
I can scrape all the links from Google search results, but Google has a limit of 100 pages per query (1,000 results), and again, Google is not happy with this approach. :)
I am new to this; could you advise / point me in the right direction? Are there any tools or methodologies that could help me achieve my goals?
I am new to this; could you advise / point me in the right direction? Are there any tools or methodologies that could help me achieve my goals?
Google takes a lot of steps to prevent you from crawling their pages and I'm not talking about merely asking you to abide by their robots.txt. I don't agree with their ethics, nor their T&C, not even the "simplified" version that they pushed out (but that's a separate issue).
If you want to be seen, then you have to let google crawl your page; however, if you want to crawl Google then you have to jump through some major hoops! Namely, you have to get a bunch of proxies so you can get past the rate limiting and the 302s + captcha pages that they post up any time they get suspicious about your "activity."
Despite being thoroughly aggravated by Google's T&C, I would NOT recommend that you violate it! However, if you absolutely need the data, then you can get a big list of proxies, load them into a queue and pull a proxy from the queue each time you want to fetch a page. If the proxy works, put it back in the queue; otherwise, discard it. You could even keep a failure counter for each proxy and discard it once it exceeds some number of failures.
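To make that queue idea concrete, a rough C# sketch (the proxy addresses and the failure limit are made up; nothing here is specific to Google):

    using System;
    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.Net;
    using System.Net.Http;
    using System.Threading.Tasks;

    // Pull a proxy from the queue, try the request, requeue the proxy on
    // success, and drop it after too many failures.
    public class ProxyRotator
    {
        private const int MaxFailures = 3; // arbitrary threshold
        private readonly ConcurrentQueue<(string Address, int Failures)> _proxies
            = new ConcurrentQueue<(string, int)>();

        public ProxyRotator(IEnumerable<string> proxyAddresses)
        {
            foreach (var p in proxyAddresses)
                _proxies.Enqueue((p, 0));
        }

        public async Task<string> FetchAsync(string url)
        {
            while (_proxies.TryDequeue(out var proxy))
            {
                var handler = new HttpClientHandler
                {
                    Proxy = new WebProxy(proxy.Address),
                    UseProxy = true
                };
                using (var client = new HttpClient(handler))
                {
                    try
                    {
                        string html = await client.GetStringAsync(url);
                        _proxies.Enqueue((proxy.Address, 0)); // it worked: put it back
                        return html;
                    }
                    catch (HttpRequestException)
                    {
                        // Failed: requeue only if it hasn't failed too often.
                        if (proxy.Failures + 1 < MaxFailures)
                            _proxies.Enqueue((proxy.Address, proxy.Failures + 1));
                    }
                }
            }
            throw new InvalidOperationException("No working proxies left.");
        }
    }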
I've not tried it, but you can use Google's Custom Search API. Of course, it starts to cost money after 100 searches a day. I guess they must be running a business ;p
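If you go that route, a minimal call would look roughly like this in C#; the endpoint and the key/cx/q parameters reflect my understanding of the Custom Search JSON API and should be checked against the current documentation, and the key and engine id are placeholders:

    using System;
    using System.Net.Http;
    using System.Threading.Tasks;

    // Fetches one page of results from the Custom Search JSON API as raw JSON.
    public static class CustomSearchExample
    {
        public static async Task<string> SearchAsync(string query)
        {
            const string apiKey = "YOUR_API_KEY";     // placeholder
            const string engineId = "YOUR_ENGINE_ID"; // placeholder

            string url = "https://www.googleapis.com/customsearch/v1" +
                         $"?key={apiKey}&cx={engineId}&q={Uri.EscapeDataString(query)}";

            using (var client = new HttpClient())
            {
                return await client.GetStringAsync(url);
            }
        }
    }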
It might be a bit late, but I think it is worth mentioning that you can scrape Google reliably and professionally without causing problems.
Actually, I am not aware of any real threat that comes from scraping Google.
It is challenging if you are inexperienced, but I am not aware of a single case of legal consequences, and I always follow this topic.
Maybe one of the largest cases of scraping happened some years ago, when Microsoft scraped Google to power Bing. Google was able to prove it by planting fake results which do not exist in the real world, and Bing suddenly picked them up.
Google named and shamed them; that's all that happened, as far as I remember.
Using the API is rarely a realistic option: it costs a lot of money to use it for even a small number of results, and the free quota is rather small (40 lookups per hour before you are banned).
The other downside is that the API does not mirror the real search results; in your case that may be less of a problem, but in most cases people want to get the real ranking positions.
Now, if you do not accept Google's TOS, or choose to ignore it (they did not care about your TOS when they scraped you in their startup days), you can go another route.
Mimic a real user and get the data directly from the SERPs.
The trick here is to send around 10 requests per hour (this can be increased to 20) from each IP address (yes, you use more than one IP). That amount has proven to cause no problems with Google over the past few years.
Use caching, databases and IP rotation management to avoid hitting it more often than required.
The IP addresses need to be clean, unshared and, if possible, without an abusive history.
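As a rough illustration of that per-IP hourly budget (how requests are actually bound to a given IP, via proxies or otherwise, is not shown; the limit matches the figure above):

    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Tracks request timestamps per IP and hands out an IP that is still
    // under its hourly budget.
    public class IpRateLimiter
    {
        private const int MaxRequestsPerHour = 10; // the figure suggested above
        private readonly Dictionary<string, Queue<DateTime>> _history;

        public IpRateLimiter(IEnumerable<string> ipAddresses)
        {
            _history = ipAddresses.ToDictionary(ip => ip, _ => new Queue<DateTime>());
        }

        // Returns an IP that may be used now, or null if every IP is exhausted.
        public string TryAcquireIp()
        {
            var cutoff = DateTime.UtcNow.AddHours(-1);
            foreach (var entry in _history)
            {
                // Drop timestamps older than one hour.
                while (entry.Value.Count > 0 && entry.Value.Peek() < cutoff)
                    entry.Value.Dequeue();

                if (entry.Value.Count < MaxRequestsPerHour)
                {
                    entry.Value.Enqueue(DateTime.UtcNow);
                    return entry.Key;
                }
            }
            return null; // wait and try again later
        }
    }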
The originally suggested proxy list would complicate the topic a lot, as you receive unstable, unreliable IPs with questionable abuse, sharing and history.
There is an open-source PHP project at http://scraping.compunect.com which contains all the features you need to get started. I used it for my own work, which has now been running for some years without trouble.
It is a finished project, mainly built to be used as a customizable base for your own project, but it runs standalone too.
Also, PHP is not a bad choice: I was originally sceptical, but I ran PHP (5) as a background process for two years without a single interruption.
The performance is easily good enough for such a project, so I would give it a shot.
Otherwise, PHP code reads much like C/Java: you can see how things are done and repeat them in your own project.
Hello people from StackOverflow!
I come to you with yet another question. :)
As stated in some of my previous questions, I'm interested in creating a website that handles jobs and company openings for people to browse. I intend to have a way for people to upload CVs and apply for positions, and for companies to post jobs as well.
Since I've never done a project of this scope before, I fear that I may be neglecting certain things that are a must for a web-targeted application.
I realize that is a very broad question, perhaps too broad to even answer. However, I'd really like someone to provide just a little input on this. :)
What things do I need to have in mind when I create a website of this type?
I'm going to be using ASP.Net and C#.
Edit: Just to clarify, the website is going to be local to a country in Eastern Europe.
Taking on careers.stackoverflow then? :)
One of the biggest things to think about is not even technical: how are you going to pull in enough users to make the site take off?
It's a bit of a chicken-and-egg situation: if you don't have recruiters on the site, no one's CV will get viewed, and if you don't have CVs listed, recruiters won't use the site. So first and foremost, you need to be thinking about how you will build up a community.
* The site must have a good, easy-to-use user experience. Make it easy for everyone to achieve what they want.
* What makes your site stand out from others? Why should people use yours instead of another one?
You could start with the free "Job Site Starter Kit":
http://www.asp.net/downloads/starter-kits/job/
* Enables job seekers to post resumes
* Enables job seekers to search for job postings
* Enables employers to enter profile of their company
* Enables employers to post one or more job postings
First you need a community. It doesn't really matter which one, but it would help if you were also a member of that community. Let's take Underwater Basket Weavers. Then find a problem that this community has, or something this community needs to share. Almost invariably it involves information exchange, but in some cases it may actually be service based. Then focus your efforts on solving or supplementing that issue. Our Underwater Basket Weavers, for example, may need to share techniques for weaving specific materials, or where to get those materials. How could they share this information, and how could you make it interesting to them?
Know your audience. Learn their issues. Apply yourself to filling that void.
I need to track only human visits to my article pages. I hear that SiteCatalyst is the best of the best for page tracking. Here is what I am trying to do. I need to track every human visit if possible, because this will affect the amount of money I have to pay. I will need to download site statistics for all of my pages with an accurate hit count. Again, I don't want to track spiders/bots. Once I download the site statistics I will use them to update the hit counts for each of my articles. Then I will pay my writers according to how many hits they receive. Is SiteCatalyst able to do this? If not, who do you think can do something like this?
Luke - Quick answer: there is currently no 100% accurate way to get this.
Omniture's SiteCatalyst does provide a very good tool for acquiring visitor information. You can acquire visitor information from any of the other vendors as well, including the free option, Google Analytics.
You may have been led to believe, as I had, that Omniture strips out all bots and spiders by default. Omniture states that most bots and spiders do not load images or execute JavaScript, which is what it relies upon for tracking. I am not sure what the exact percentage is, but not all bots and spiders behave this way.
In order to gain a more accurate report of the number of "humans", you will need to know the visitor's IP address and possibly the user agent. You can get the agent and IP in PHP from the two variables $_SERVER['HTTP_USER_AGENT'] and $_SERVER['REMOTE_ADDR']. You will then need to strip out the IP addresses of known bots/spiders from your reporting. You can do this with lists like this one: http://www.user-agents.org/index.shtml, or manually by looking at the user agent. Beware of relying on the user agent alone, as a bot can easily spoof it. This will never be 100% accurate, because new bots/spiders pop up every day. I suggest looking further into "click fraud".
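That describes the PHP side; the same filtering idea translated to C# might look roughly like this (the user-agent fragments and the IP entry are placeholders, not real bot data; a real implementation would load a maintained list such as the one linked above):

    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Heuristic check of a visitor's user agent and IP against known-bot lists.
    public static class BotFilter
    {
        private static readonly string[] BotUserAgentFragments =
            { "bot", "crawler", "spider" };        // placeholder fragments

        private static readonly HashSet<string> KnownBotIps =
            new HashSet<string> { "192.0.2.1" };   // placeholder entry

        public static bool IsLikelyBot(string userAgent, string ipAddress)
        {
            if (KnownBotIps.Contains(ipAddress))
                return true;

            // User agents can be spoofed, so treat this as a heuristic only.
            return !string.IsNullOrEmpty(userAgent) &&
                   BotUserAgentFragments.Any(f =>
                       userAgent.IndexOf(f, StringComparison.OrdinalIgnoreCase) >= 0);
        }
    }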
Contact me if you want further info.
Omniture also weeds out traffic from known bots/spiders. But yeah... there is an accepted margin of error in the analytics industry, because it can never be 100% accurate, due to the nature of currently available technology.