I'm working on a project for normalizing URLs (i.e. different URLs that map to the same web page should be identified and the redundancy reduced, as a search engine does).
So I'd like a dataset containing different URLs in order to test my method. Please provide links to URL normalization dataset(s).
I'm implementing this project in C# and I'd like your suggestions. Thanks in advance.
Since you asked for suggestions and left the question very open, I will go ahead and give you mine, though I admit I am not 100% sure what problem you wish to tackle. Are you asking for a program/code-specific suggestion? A strategy for how to set up such a project? Or do you wish to collect inspiration and ideas to improve your existing workflow? If it is the third, I would suggest looking at two scenarios, inspired by a lecture that one of my Artificial Intelligence teachers once gave. Let's dive for a moment into how ant colonies organise themselves:
Top-down approach: a fantasy. Imagine a queen in an ant colony prescribing for each and every ant its route to the sub-colonies, thereby normalising the multiple routes that various ants take to the same place. It seems you want to group the ants together, let each group use just one route to its goal, and remove duplicate routes. This is one way to make their routes more efficient. In reality, ants work differently:
Bottom-up approach: the reality.
A single ant has little meaning, but when a whole ant colony is studied, an organisation reveals itself. This is because the ants follow the scent traces of other ants, following each other and ultimately finding their way to the nest. The cleverness does not need to come from above or from a central database; a tiny bit of intelligence built into each ant makes the same path reusable. In the same way, you might want to think about building your normalisation technique into each hyperlink that needs to be normalised.
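To make the per-link idea a bit more concrete, here is a minimal C# sketch of a function that normalises one URL at a time. The class name `UrlNormalizer` and the rules it applies (dropping the fragment, the default port and a trailing slash, and sorting query parameters) are my own assumptions about what "the same page" means; real sites may treat some of these differences as significant, so adjust the rules to your data.

```csharp
using System;
using System.Linq;

// Usage: two spellings of (presumably) the same page.
Console.WriteLine(UrlNormalizer.Normalize("HTTP://Example.COM:80/a/../b/?y=2&x=1#top"));
Console.WriteLine(UrlNormalizer.Normalize("http://example.com/b?x=1&y=2"));

public static class UrlNormalizer
{
    // Hypothetical helper: returns a canonical string for one absolute URL.
    public static string Normalize(string url)
    {
        // System.Uri lower-cases scheme and host and resolves "." and ".." segments.
        var uri = new Uri(url, UriKind.Absolute);

        var builder = new UriBuilder(uri)
        {
            Fragment = string.Empty,                  // the fragment is never sent to the server
            Port = uri.IsDefaultPort ? -1 : uri.Port  // -1 means "omit the port"
        };

        // Assumption: a trailing slash does not change the page.
        if (builder.Path.Length > 1)
            builder.Path = builder.Path.TrimEnd('/');

        // Assumption: parameter order is irrelevant, so sort the name=value pairs.
        var pairs = uri.Query.TrimStart('?')
            .Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries)
            .OrderBy(p => p, StringComparer.Ordinal);
        builder.Query = string.Join("&", pairs);

        return builder.Uri.AbsoluteUri;
    }
}
```

Two URLs can then be treated as duplicates when their normalised strings are equal, for example by using the result as a dictionary key.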
I hope this gives you the suggestions you wished for. Otherwise, if your question was not about strategy but about a specific code problem, ask a question with program code in it; that is often much easier to solve than finding the best strategy. Good luck! My 2 cents.
I am implementing full-text search on a single entity, a document, which contains a name and content. The content can be quite big (20+ pages of text). I am wondering how to do it.
Currently I am looking at using Redis and RediSearch, but I am not sure if it can handle search in big chunks of text. We are talking about a multi-tenant application with each customer having more than 1000 documents that are quite big.
TLDR: What should I use to search within big chunks of text content?
This space is a bit unclear to me, sorry for the confusion. Will update the question when I have more clarity.
I can't tell you what the right answer is, but I can give you some ideas about how to decide.
Normally if I had documents/content in a DB I'd be inclined to search there, assuming that the search functionality I could implement (a) was functionally effective enough, (b) didn't require code that was super ugly, and (c) wasn't going to kill the database. There's usually a lot of messing around trying to implement the search features and filters that you want to provide to the user: UI components, logic components, and then translating that into how the database and query language actually work.
So, based on what you've said, the key trade-offs are probably:
Functionality / functional fit (creating the features you need, to work in a way that's useful).
Ease of development & maintenance.
Performance - purely on the basis that gathering search results across "documents" is not necessarily the fastest thing you can do with an IT system.
Have you tried doing a simple whiteboard "options analysis" exercise? If not try this:
Get a small number of interested and smart people around a whiteboard. You can do this exercise alone, but bouncing ideas around with others is almost always better.
Agree what the high level options are. In your case you could start with two: one based on MSSQL, the other based on Redis.
Draw up a big table - each option has its own column (starting at column 2).
In Column 1 list out all the important things which will drive your decision. E.g. functional fit, Ease of development & maintenance, performance, cost, etc.
For each driver in column 1, do a score for each option.
How you do it is up to you: you could use a 1-5 point system (optionally with a planning-poker-style approach to avoid anchoring) or you could write down a few key notes.
Be ready to note down any questions that come up, important assumptions, etc so they don't get lost.
Sometimes as you work through the exercise the answer becomes obvious. If it's really close you can rely on scores - but that's not ideal. It's more likely that of all the drivers listed some will be more important than others, so don't ignore the significance of those.
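If you do go with numeric scores, the table can be reduced to a weighted average so that the more important drivers count for more. The sketch below is only an illustration: the `DecisionMatrix` class is a made-up helper, and every weight and score in it is a placeholder rather than a measurement of MSSQL or Redis.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Column 1: decision drivers with an importance weight (illustrative values).
var weights = new Dictionary<string, double>
{
    ["Functional fit"] = 3,
    ["Ease of development & maintenance"] = 2,
    ["Performance"] = 2,
    ["Cost"] = 1,
};

// One column per option, scored 1-5 per driver (placeholder scores only).
var options = new Dictionary<string, Dictionary<string, int>>
{
    ["MSSQL"] = new Dictionary<string, int>
    {
        ["Functional fit"] = 3, ["Ease of development & maintenance"] = 4,
        ["Performance"] = 3, ["Cost"] = 4,
    },
    ["Redis"] = new Dictionary<string, int>
    {
        ["Functional fit"] = 4, ["Ease of development & maintenance"] = 3,
        ["Performance"] = 4, ["Cost"] = 3,
    },
};

foreach (var option in options)
{
    double total = DecisionMatrix.WeightedScore(weights, option.Value);
    Console.WriteLine($"{option.Key}: {total:F2}");
}

public static class DecisionMatrix
{
    // Weighted average: sum(weight * score) / sum(weight).
    public static double WeightedScore(
        IReadOnlyDictionary<string, double> weights,
        IReadOnlyDictionary<string, int> scores)
    {
        return weights.Sum(w => w.Value * scores[w.Key]) / weights.Values.Sum();
    }
}
```

The number is a conversation aid, not a verdict: if two options end up close, the notes, questions and assumptions you captured along the way usually matter more than the second decimal place.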
I wasn't quite sure how to word this question, as this is a field in which I am not very familiar, and I'm seeking less of a specific solution and more of what I should be looking to learn to better understand the problem...
If this is to be closed as a result, please suggest ways I can better express the question, as I would very much like to get some input.
Basically the problem is this: I have several different tables of data, each of which identifies different properties of a user. For example, one table might define a user's demographic data (gender, location, etc.), another their interests, and another perhaps their favorite songs.
I want to be able to issue different searches of this data via an application running ASP.NET MVC, but rather than find specific matches (such as, say, a song title), I want to be able to do something like "women who like burgers and live in Texas".
Clearly this is a more dynamic search than just a simple keyword, because the criteria can vary by which data is being searched, what combinations of data are being aggregated, and what actually constitutes a match on each parameter.
If I want to research the different ways something like this can be accomplished, what should I look for? Is this something Functional Programming could help resolve? Or perhaps Dynamic LINQ? I've seen some docs on expression trees which went completely over my head, but looked promising. However, I wasn't sure this would fit because the data may change as well (such as new tables being added) and I'm not sure if that is something that needs to be fully defined ahead of time.
What concepts, algorithms and patterns should I explore that might help me create such a system?
I'm happy to learn, but this is something I'm completely in the dark about and don't even know where to begin, so any introductory concepts that I can start exploring would be greatly appreciated.
EDIT: I just realized I missed one important requirement, which is that these searches also need to be saved. So in addition to dynamically searching the data, I also need a way to persist these searches.
The closest thing I can think of that does something like this is, say, a CRM or project management tool which lets you build queries on the fly and save them to be run on demand or on a schedule...
What are some of the strategies that these systems use? The more time I spend researching Dynamic LINQ the better it seems, but I'm not sure if I am on the right track.
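One direction that covers both the dynamic criteria and the saved searches is to store each search as plain data (a list of field/value pairs) and turn that data into an expression tree only when the search is run. The sketch below is a rough illustration under several assumptions: `User`, `Criterion` and `SearchBuilder` are hypothetical names, the separate tables are flattened into one class, and only equality on string properties is supported.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Text.Json;

var users = new List<User>
{
    new User { Gender = "F", State = "TX", FavoriteFood = "burgers" },
    new User { Gender = "M", State = "TX", FavoriteFood = "burgers" },
    new User { Gender = "F", State = "CA", FavoriteFood = "burgers" },
};

// "Women who like burgers and live in Texas", expressed as data.
var search = new List<Criterion>
{
    new Criterion { Field = "Gender", Value = "F" },
    new Criterion { Field = "FavoriteFood", Value = "burgers" },
    new Criterion { Field = "State", Value = "TX" },
};

// Saving a search is just saving the criteria, e.g. as JSON in a table.
string saved = JsonSerializer.Serialize(search);
var restored = JsonSerializer.Deserialize<List<Criterion>>(saved) ?? new List<Criterion>();

Func<User, bool> predicate = SearchBuilder.Build<User>(restored).Compile();
Console.WriteLine(users.Count(predicate));

public class User
{
    public string Gender { get; set; } = "";
    public string State { get; set; } = "";
    public string FavoriteFood { get; set; } = "";
}

public class Criterion
{
    public string Field { get; set; } = "";
    public string Value { get; set; } = "";
}

public static class SearchBuilder
{
    // Builds row => row.Field1 == v1 && row.Field2 == v2 && ...
    public static Expression<Func<T, bool>> Build<T>(IEnumerable<Criterion> criteria)
    {
        ParameterExpression row = Expression.Parameter(typeof(T), "row");
        Expression body = Expression.Constant(true);
        foreach (Criterion c in criteria)
        {
            Expression test = Expression.Equal(
                Expression.Property(row, c.Field),   // throws if the property does not exist
                Expression.Constant(c.Value));
            body = Expression.AndAlso(body, test);
        }
        return Expression.Lambda<Func<T, bool>>(body, row);
    }
}
```

Because the fields are looked up by name at run time, new properties do not require changes to the builder, which is roughly what Dynamic LINQ offers with a string-based syntax. Other operators (contains, greater than, or) would need an extra `Op` field on the criterion.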
I've got an ASP.NET web application that is essentially our intranet site. I made a lot of progress on the administration office's employee management pages. It ties into a SQL Server database, and I'm using a three-layered design (Objects, Logic, DataAccess). It was all reviewed and all of it was accepted, except for the part that manages vacations and vacation histories.
My question, before I go into details is, how does one efficiently "untangle" code that is no longer necessary?
For example: previously I was treating each VacationDay as its own entity with its own history, such that I could track the history of an individual day. To help in tracking, I have an enum called VacationDayAction, which includes options such as .Submitted, .RequestDenied, .CancellationRequested, and so on. This was an attempt to provide meticulous detail for each day. It was then determined that we no longer need that. We do, however, still need VacationDays and all the basic functions of that (saving days, getting days, etc.), but now we no longer need any of the "history" related classes.
My problem is, when I right-click a class that I no longer need in VS and go to "Show All References," I get a ton of results scattered across several pages. I need to get rid of all of them without breaking the rest of the application. Is there not some kind of "smart" technique or method for easily untangling parts that are no longer necessary? This is particularly difficult because 90% of what I did was just fine and needs to stay as it is. Yet scattered in that 90% is 10% of stuff that is no longer needed. I can't just go storming through with the delete key either, because with the removal of each reference, I need to be sure that anything depending on that reference is also fixed so that it doesn't call stuff that isn't there anymore. And I still need the application in a compilable state, so that I can test along the way that the rest of the application didn't fall apart as a result of some deletion.
To give you an idea of my low level of experience, I started two years ago with having never used C#, ASP.Net, or Visual Studio. It blew my mind when, way after starting and as I was learning, someone taught me that I could use breakpoints. And then it really really blew my mind when I learned about multi-layered design. I'm wondering if there is not some technique or trick or feature that can help in scenarios like this, where you have to "untangle" and throw away unnecessary stuff.
This is not a simple question. In fact, I would say this is one of the major challenges for any systems developer: how to handle and get rid of old code which is not in use. There is lots of literature on this, and few really excellent answers. A good book may be "Working Effectively with Legacy Code" by Michael Feathers, which deals with many related problems. It is no light read though, and will probably take some time to get through, but it will likely help you become a better coder, and better at these kinds of tasks in particular.
Maybe you can have a look at the ReSharper tool? ( http://www.jetbrains.com/resharper/ ) It is a productivity tool which, among other things, shows "dead" (unused) code in grey and lets you remove it. It will also help you remove unused references from each class (again, they are greyed out and can be removed automatically).
Drawing diagrams where each major piece of code/component is a box with a line linking it to any related component might help you get a better overview; try to draw a hierarchy showing how different parts of the code are related and dependent.
The bottom line as far as I know, is that you just have to muddle through it, commenting out code a little at a time, then recompiling and testing it. If it still works, fine, now you can remove the commented out code completely. This would be easier if you had unit-tests covering your code, but I take it as a given that you don't, as is unfortunately often the case.
Hello people from StackOverflow!
I come to you with yet another question. :)
As stated in some of my previous questions, I'm interested in creating a website that handles jobs and company openings for people to browse. I intend to have a way for people to upload CVs, apply to a position, and have companies post jobs as well.
Since I've never done a project of this scope before, I fear that I may be neglecting certain things that are a must for a web-targeted application.
I realize that is a very broad question, perhaps too broad to even answer. However, I'd really like someone to provide just a little input on this. :)
What things do I need to have in mind when I create a website of this type?
I'm going to be using ASP.Net and C#.
Edit: Just to clarify, the website is going to be local to a country in Eastern Europe.
Taking on careers.stackoverflow then? :)
One of the biggest things, is not even a technical thing to be thinking about - how are you going to pull in enough users to make the site take off?
It's a bit of a chicken and egg situation - if you don't have recruiters on the site, no one's CV will get viewed. If you don't have CVs listed, recruiters won't use the site. So first and foremost, you need to be thinking about how you will build up a community.
The site must have a good, easy-to-use user experience. Make it easy for everyone to achieve what they want.
What makes your site stand out from others? Why should people use yours instead of another one?
You could start with the free "Job Site Starter Kit":
http://www.asp.net/downloads/starter-kits/job/
* Enables job seekers to post resumes
* Enables job seekers to search for job postings
* Enables employers to enter profile of their company
* Enables employers to post one or more job postings
First you need a community. It doesn't really matter which one, but it would help if you were also a member of this community. Let's take Underwater Basket Weavers. Then find a problem that this community has or something this community needs to share. Almost invariably it involves information exchange but in some cases it may actually be service based. Then focus your efforts on solving or supplementing that issue. For our Underwater Basket Weavers, we may have a need to share techniques on how to weave specific materials, where to get materials. How could they share this information and how could you make it interesting to them?
Know your audience. Learn their issues. Apply yourself to filling that void.
We're hiring a .NET developer soon, and I was assigned to create a test which would take approximately 1 hour to solve. The test should assess the programmer's knowledge of (mainly) C# and ASP.NET.
This is what I've come up with so far:
Use project #1 to read data (HTML) from the specified URL and output all links (anchors) containing anchor name “xxxxxxxxx”. You are free to use 3rd party libraries. My main thought here was to test how the developer would go about solving the problem. For example:
Create a regex which would parse all the data needed.
Create a DOM-tree and use XPATH to find all anchor nodes.
Iterate the whole string and perform manual string compares.
Create a new solution where you demonstrate the usage of .NET masterpages.
Connect the solution to the ******** database. And output all customers from the “********_customers” table.
Create a new button which refreshes all users using AJAX.
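For reference, the regex route for the first task might look roughly like the sketch below. It is written under assumptions the task text does not settle: "anchor name" is read as the visible link text, the HTML is already in a string (the download step is left out), and `AnchorFinder` is a made-up helper name. A regex like this is deliberately naive and will miss unusual markup, which is exactly the trade-off against the DOM/XPath option.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string html = "<p><a href=\"/jobs\">Open jobs</a> <a href='/about'>About us</a></p>";
foreach (string href in AnchorFinder.FindLinks(html, "jobs"))
    Console.WriteLine(href);

public static class AnchorFinder
{
    // Matches <a ... href="..." ...>text</a> with single or double quotes.
    private static readonly Regex Anchor = new Regex(
        @"<a\b[^>]*?\bhref\s*=\s*[""'](?<href>[^""']*)[""'][^>]*>(?<text>.*?)</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the href of every anchor whose text contains 'needle' (case-insensitive).
    public static IEnumerable<string> FindLinks(string html, string needle)
    {
        foreach (Match m in Anchor.Matches(html))
        {
            string text = m.Groups["text"].Value;
            if (text.IndexOf(needle, StringComparison.OrdinalIgnoreCase) >= 0)
                yield return m.Groups["href"].Value;
        }
    }
}
```

How a candidate discusses the limits of this approach (nested tags, unquoted attributes, malformed HTML) is probably more telling than the pattern itself.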
Pretty basic stuff. Though, I also added the one below, mainly to test the developer's OO knowledge. Do you think this is overkill, or what kind of test would you suggest? If you were to hire an ASP.NET developer, what would your main focus be? ADO.NET? IO? String handling?
Create an interface/abstract class implementation demonstrating the functionality of either the Factory, Factory Method, Command or Decorator pattern. You won't need to implement any functionality; just use comments in your abstract class.
Thanks in advance!
The task you gave is essentially a day or two worth of coding if you want to have reasonably readable code. Within an hour I guess I would do it, but you'd have to read code that has cryptically named methods, unreadable regexes, weird callbacks, no error handling and overall is pretty darn ugly. Looking at it you would not hire me.
Before you give your question to candidates, make sure that your peers/programmers can do it first, and that you can code it in less than 60 minutes in a way that would satisfy you.
That said, I do not know whether a test is the best choice for hiring anyone. A few bloggers wrote about their experience from conducting tons of interviews:
The Guerrilla Guide to Interviewing by Joel Spolsky
The Truth About Interviewing, Get that job at Google (and many others) by Steve Yegge
I totally agree with them. Having conducted about a gazillion interviews myself, I find that asking basic technology-related questions is not nearly as good as asking the candidate to implement a bit of recursion or pointers (if someone claims to know C/C++).
By hiring someone who understands recursion/algorithms you get a smart guy who can learn new technology. When you hire someone who knows how to connect to a database, you get someone who knows how to connect to a database but is not necessarily qualified to do much more than that.
There are a few sources of good programming questions, somewhere between coding and algorithms, that may inspire you. They do not test .NET at all, but are a very good indicator of smart programmers.
TopCoder
Google Code Jam
Within 1 hour you can only test his programming skills; it's not enough time to write a code sample.
Take a look at this C# / ASP.NET MVC test:
http://tests4geeks.com/test/asp-net-mvc-c-sharp
Once the applicant has passed the test with a good result, invite him to the interview and talk about his experience. Ask about the most difficult features that he implemented in his projects. In other words, you must find out whether he knows enough and can do enough to take part in your project.
If you still want to ask him to write some code, here is an idea:
There are students and subjects. Ask him to write 3 pages (ASP.NET MVC or Web Forms). The first and second are for editing the lists of students and subjects. The third must contain a table: the students are in the left column, the subjects are in the top row, and the marks are at the intersections. Each mark can be edited (text box) and saved. Saving could be implemented with a common "Save" button, or each cell could be saved automatically using Ajax.
This is a very simple example, but it would show you how the candidate writes code and what techniques he uses.
I would have thought that it would be better to simply create a test that would make it easy for you to put developers into different 'skill buckets'.
Why not have three or four sections in which the developer must 'layer' features on top of one another to show their programming and design skills?
Part 1: Implement x easy difficulty features.
Part 2: Implement x medium difficulty features.
Part 3: Implement x difficult features.
Part 4: Implement x very difficult features.
And give the developer 1 hour to write the application. Make it realistic that they can implement the features in the given time frame.
As Joel and Jeff say on the Stack Overflow podcast, there is a direct correlation between developer skill and speed.
Think about the way exams are structured. We could all get 100% of the questions correct in any exam we sit if we had infinite time, but in 1 hour?
This way, if a developer takes your test and only implements features up to Section 2 in the time period, then you should have a safe indication that they are not suitable for the job. If the Section 3 features are all done, they are good enough, and a complete Section 4 would indicate that they are very experienced and a slight cut above the rest.
However, I would also look at the overall polish that the developer has given to the code. If they implemented all features up to Section 4, but poorly, then they are also not going to be someone you want. If a developer only did up to Section 3 but implemented everything very elegantly, then I would want to hire them.
I also think that 1 hour is perhaps a little too long. I would aim for 10-40 minutes; obviously you may need to cut out the Section 4 that I proposed.
You should check
GeekInterview -- a good source for interview questions
There are hundreds of questions.
I think you would be much better off coming up with a single question that will allow you to see more than just development skills using your target technologies. Strong problem solving skills are as important as expertise in a specific technology stack.
I would even recommend that you explore the two aspects of a candidate in different parts of the process. I usually ask a bunch of questions about the technology stack we are using on our project to gauge the candidate's level of knowledge as it relates to that stack.
Then I ask them a pure problem solving question and I allow them to use whichever technology they are most comfortable with to solve the problem (their choice of technology can be an important indicator).
I particularly like graph theory related problems. The candidate's solutions will tell you a ton about how they approach and solve problems, as well as how they validate their solutions.
As part of the problem solving portion of the interview you should be looking for:
Proper data structure design
Implementation of OO best practices
Proper solution (can they debug problems effectively... one great way to see this is do not allow them to use a computer, make them code on a whiteboard and debug in their heads)
Proper solution validation (do they come up with test cases)
My 2 cents:
We have a programming test in my company that is easy. Basically, you have to implement the listener pattern extending the ArrayList class, create unit tests for it (based on at least what we require), document the corner cases, document the program itself if you want to, and then send the test back to us.
A developer has 48 hours to complete that test. We ask for production quality in the test. We want to test the following items:
Was the developer smart enough to cover the corner cases?
Is the developer implementation of multi-threading satisfactory?
Are the unit tests good enough? Do they cover enough cases?
Is the code well written and documented? Will someone be able to maintain that code in the future?
Does he care about his code? Did he explain why he did "A" and not "B"?
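Since this thread is about .NET, here is a rough C# analogue of that exercise. It is not our actual test: `NotifyingList<T>` is a made-up name, and because `List<T>.Add` is not virtual in .NET, the sketch derives from `Collection<T>` and overrides its insert/remove hooks instead of extending an ArrayList-style class. It also makes no attempt at thread safety, which is one of the things a candidate would be expected to address.

```csharp
using System;
using System.Collections.ObjectModel;

var list = new NotifyingList<string>();
list.ItemAdded += item => Console.WriteLine($"added {item}");
list.ItemRemoved += item => Console.WriteLine($"removed {item}");
list.Add("first");
list.Remove("first");

public class NotifyingList<T> : Collection<T>
{
    // Initialised with no-op handlers so the events can be raised without null checks.
    public event Action<T> ItemAdded = delegate { };
    public event Action<T> ItemRemoved = delegate { };

    // Called by Add and Insert.
    protected override void InsertItem(int index, T item)
    {
        base.InsertItem(index, item);
        ItemAdded(item);
    }

    // Called by Remove and RemoveAt.
    protected override void RemoveItem(int index)
    {
        T removed = this[index];
        base.RemoveItem(index);
        ItemRemoved(removed);
    }
}
```

Corner cases worth asking about include `Clear()` and index assignment (`ClearItems` and `SetItem` are not overridden here, so they raise no events) and what should happen when a listener throws.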
I don't think short tests are capable of evaluating a developer. You may ask about a tool or technology that someone has not used in the past months, and that person will need some time to get up to speed; but a developer who was working with it the day before will know by memory how to use it and will seem smarter than the other developer, which may not be true.
But if you ask for something that is tricky and you are interviewing the developer, you can check how he is going to solve the problem - I don't think it really matters if he/ she cannot get the 100% right answer, as long as he/ she can talk about the problems that you found on the code and show that they actually understand whatever you explained to them.
In the past we have used problems from Google Code Jam. The problems in the early rounds are easier and they get gradually harder. They are kind of algorithmic in nature, and you can solve them in whatever language you like. As they get harder there is often an obvious 'brute force' kind of answer that won't work because of the size of the data, so you have to think of something more optimal.
The first test you suggested should take 10min-40min for a basic dev - I would use a web-crawler I have in my library that converts HTML to XML then easily use Linq to XML.
I would test for lambda expressions, performance patterns maintain files, or writing an object to several files dynamically.
Maybe you would like to test unmanaged code, pointers etc.
I dunno, I'm just jabbering as things come to mind; I wrote down things that were hard for me to implement.
A few days ago I was invited to take a C# programming test on the skillbox website. It was a 30-question quiz with 45 minutes to pass it. Below are some of the questions:
1) What will be printed by running the program?
    #if DEBUG
        Console.WriteLine("DEBUG");
    #else
        Console.WriteLine("RELEASE");
    #endif
2) What will be the result of calling SomeMethod():
    public static void SomeMethod()
    {
        string s1 = "a";
        string s2 = "b";
        Swap(ref s1, ref s2);
        Console.WriteLine(s1);
        Console.WriteLine(s2);
    }

    public static void Swap(ref Object a, ref Object b)
    {
        Object t = b;
        b = a;
        a = t;
    }
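A note on question 2: as written, the call to Swap does not compile, because an argument passed by `ref` must have exactly the parameter's type, and `string` is not `Object` for that purpose. For comparison, a generic version that does work could look like the sketch below; the class name `SwapDemo` is only there to make the example self-contained.

```csharp
using System;

string s1 = "a";
string s2 = "b";
SwapDemo.Swap(ref s1, ref s2);   // T is inferred as string
Console.WriteLine(s1);           // b
Console.WriteLine(s2);           // a

public static class SwapDemo
{
    // Generic swap: both ref arguments must share the same type T.
    public static void Swap<T>(ref T a, ref T b)
    {
        T t = b;
        b = a;
        a = t;
    }
}
```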
Here is a link for reference; I think you can find more C# quizzes there: http://skillbox.io