Strategies for implementing a dynamic, extensible search system

I wasn't quite sure how to word this question, as this is a field with which I am not very familiar, and I'm seeking less of a specific solution and more of what I should be looking to learn to better understand the problem...
If this is to be closed as a result, please suggest ways I can better express the question, as I would very much like to get some input.
Basically the problem is this: I have several different tables of data, each of which identifies different properties of a user. For example, one table might define a user's demographic data (gender, location, etc.), another their interests, and another perhaps their favorite songs.
I want to be able to issue different searches of this data via an application running ASP.NET MVC, but rather than find specific matches (such as, say, a song title), I want to be able to do something like "women who like burgers and live in Texas".
Clearly this is a more dynamic search than just a simple keyword, because the criteria can vary by which data is being searched, what combinations of data are being aggregated, and what actually constitutes a match on each parameter.
If I want to research the different ways something like this can be accomplished, what should I look for? Is this something functional programming could help resolve? Or perhaps dynamic LINQ? I've seen some docs on expression trees which went completely over my head, but looked promising. However, I wasn't sure this would fit because the data may change as well (such as new tables being added) and I'm not sure if that is something that needs to be fully defined ahead of time.
What concepts, algorithms and patterns should I explore that might help me create such a system?
I'm happy to learn, but this is something I'm completely in the dark about and don't even know where to begin, so any introductory concepts that I can start exploring would be greatly appreciated.
EDIT: I just realized I missed one important requirement, which is that these searches also need to be saved. So in addition to dynamically searching the data, I also need a way to persist these searches.
The closest thing I can think of that does something like this is, say, a CRM or project management tool which lets you build queries on the fly and save them to be run on demand or on a schedule...
What are some of the strategies that these systems use? The more time I spend researching Dynamic LINQ the better it seems, but I'm not sure if I am on the right track.
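One way to picture the expression-tree idea mentioned above is a minimal sketch like the following. It assumes each saved search is stored as a list of plain criteria (field, operator, value) and turned into a LINQ predicate at run time. The names UserProfile, Criterion and SearchBuilder are made up for illustration, only string properties and two operators are handled, and nothing here is taken from a particular library beyond System.Linq.Expressions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical flattened user record; the real data would span several tables.
public class UserProfile
{
    public string Gender { get; set; }
    public string State { get; set; }
    public string FavoriteFood { get; set; }
}

// One saved criterion. It is plain data, so it can be persisted as a row or as JSON.
public class Criterion
{
    public string Field { get; set; }     // property name, e.g. "State"
    public string Operator { get; set; }  // "eq" or "contains" in this sketch
    public string Value { get; set; }
}

public static class SearchBuilder
{
    // Combines all criteria into one AND-ed predicate over T.
    public static Expression<Func<T, bool>> Build<T>(IEnumerable<Criterion> criteria)
    {
        ParameterExpression row = Expression.Parameter(typeof(T), "row");
        Expression body = Expression.Constant(true);

        foreach (Criterion c in criteria)
        {
            // Throws if T has no property with this name, which also
            // catches saved searches that refer to fields that no longer exist.
            MemberExpression field = Expression.Property(row, c.Field);
            ConstantExpression value = Expression.Constant(c.Value, typeof(string));

            Expression test;
            switch (c.Operator)
            {
                case "eq":
                    test = Expression.Equal(field, value);
                    break;
                case "contains":
                    test = Expression.Call(
                        field,
                        typeof(string).GetMethod("Contains", new[] { typeof(string) }),
                        value);
                    break;
                default:
                    throw new NotSupportedException("Unknown operator: " + c.Operator);
            }

            body = Expression.AndAlso(body, test);
        }

        return Expression.Lambda<Func<T, bool>>(body, row);
    }
}
```

With an in-memory list, the result can be compiled and passed to Where, for example users.Where(SearchBuilder.Build<UserProfile>(saved).Compile()). Whether a given query provider can translate the same expression to SQL would need to be checked separately.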

Related

Implement Full Text Search

I am implementing full text search on a single entity, a document, which contains a name and content. The content can be quite big (20+ pages of text). I am wondering how to do it.
Currently I am looking at using Redis and RediSearch, but I am not sure if it can handle search in big chunks of text. We are talking about a multitenant application with each customer having more than 1000 documents that are quite big.
TLDR: What should I use to search within big chunks of text content?
This space is a bit unclear to me, sorry for the confusion. Will update the question when I have more clarity.

I can't tell you what the right answer is, but I can give you some ideas about how to decide.
Normally if I had documents/content in a DB I'd be inclined to search there, assuming that the search functionality I could implement (a) was functionally effective enough, (b) didn't require code that was super ugly, and (c) wasn't going to kill the database. There's usually a lot of messing around trying to implement the search features and filters that you want to provide to the user: UI components, logic components, and then translating all that into how the database & query language actually work.
So, based on what you've said, the key trade-offs are probably:
Functionality / functional fit (creating the features you need, to work in a way that's useful).
Ease of development & maintenance.
Performance - purely on the basis that gathering search results across "documents" is not necessarily the fastest thing you can do with an IT system.
Have you tried doing a simple whiteboard "options analysis" exercise? If not, try this:
Get a small number of interested and smart people around a whiteboard. You can do this exercise alone, but bouncing ideas around with others is almost always better.
Agree what the high level options are. In your case you could start with two: one based on MSSQL, the other based on Redis.
Draw up a big table - each option has its own column (starting at column 2).
In Column 1 list out all the important things which will drive your decision. E.g. functional fit, Ease of development & maintenance, performance, cost, etc.
For each driver in column 1, do a score for each option.
How you do it is up to you: you could use a 1-5 point system (optionally with a planning-poker-type approach to avoid anchoring) or you could write down a few key notes.
Be ready to note down any questions that come up, important assumptions, etc., so they don't get lost.
Sometimes as you work through the exercise the answer becomes obvious. If it's really close you can rely on scores - but that's not ideal. It's more likely that of all the drivers listed some will be more important than others, so don't ignore the significance of those.
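If you do go with scores, weighting the drivers is one way to respect the point that some matter more than others. The following is only a sketch of that arithmetic: the OptionsAnalysis class and its method names are invented for illustration, and the weights and 1-5 scores are whatever your group agrees on at the whiteboard.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OptionsAnalysis
{
    // Weighted total for one option: sum over drivers of (weight * score).
    // Throws KeyNotFoundException if an option has no score for a driver.
    public static double WeightedTotal(
        Dictionary<string, double> weights,
        Dictionary<string, int> scores)
    {
        return weights.Sum(w => w.Value * scores[w.Key]);
    }

    // Ranks the options from highest to lowest weighted total.
    public static List<KeyValuePair<string, double>> Rank(
        Dictionary<string, double> weights,
        Dictionary<string, Dictionary<string, int>> scoresByOption)
    {
        return scoresByOption
            .Select(o => new KeyValuePair<string, double>(
                o.Key, WeightedTotal(weights, o.Value)))
            .OrderByDescending(t => t.Value)
            .ToList();
    }
}
```

The ranking is only as good as the weights, so it is worth keeping the notes and assumptions from the exercise next to the numbers.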

Using application data structures other than xml

I'm designing a survey tool. The survey will be very static and because of that, I can avoid building some kind of table-driven survey designer to accommodate the 167 questions on the survey (all 1-5 rating questions in a radio box or checkbox layout).
I was thinking of building the survey questions in a large XML file, but my non-technical co-worker, who will be making frequent edits to the survey, will likely do things that will break the integrity/validity of the raw XML file (think punctuation and special characters).
The XML file might look something like:
<questions>
  <question>
    <type>checkbox</type>
    <text>Which beers do you like most</text>
    <choices>Bud,Miller,Piels</choices>
    <Required>true</Required>
  </question>
  <question>
    <type>radio</type>
    <text>Which beer is your favorite</text>
    <choices>Bud,Miller,Piels</choices>
    <Required>true</Required>
  </question>
</questions>
Please use your imagination that this structure will be a bit more complex and that there will be 165 more questions.
Complicating matters, I need these questions in some form of object-oriented layout so that I can take the results and align them to other stuff. I had considered hard-coding a very lengthy survey form with 167 questions, but I need the data in blocks so that I can parse out question 37 and align it to something else in some other feature, that is related to question 37.
Here's what I'd like to do in a .Net app:
Define an enumerable class for this.
Do something where I can manually fill an enumerable collection of this class with all of the data I need. Using the p-code that would be familiar in my .asp world . . .
questions q = new questions()
q.type = "checkbox";
q.text = "which beers do you enjoy";
q.choices = "Bud,Miller,Piels";
q.required = true;
q.add
q.type = "radio";
q.text = "what is your favorite beer";
q.choices = "Bud,Miller,Piels";
q.required = true;
q.add
My hope is that this .cs file (though foreign looking to the lay person) would be much easier for my co-worker to maintain, without me having to worry about syntax errors.
So, I guess what I'm looking for is some feedback on:
Is this just a dumb idea? Should I do this in XML, just consume the XML file, and be done with it?
WWYD - What would you do? Is there an easier way to do this?
I don't care about performance as a relatively small number of users are using this.
I don't care about maintainability, because we will write this feature properly in the summer.
I just need to create a data structure that is not in a DB and that can be maintained by a non-technical person with a text-editor (for now).
If anyone made it this far, I appreciate it.

Everyone uses Excel... so consider using a CSV format, which can be read by your program as well as by Excel, which your counterpart will be using. You must tell the user that the columns can't be changed, which is not a drawback per se; the user makes changes in Excel and exports them to CSV, which the program reads and can verify.
Plus the user does not have to be trained to use Excel, so it is a win/win situation given your requirement not to use XML.
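A rough sketch of the "program reads and can verify" part could look like this. It assumes a fixed header of type,text,choices,required, choices separated by | inside their cell (so they don't clash with the comma delimiter), and no quoted fields. Those conventions, and the names SurveyQuestion and SurveyCsv, are assumptions made for the example. A file where Excel has quoted a cell containing commas would need a real CSV parser.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public class SurveyQuestion
{
    public string Type { get; set; }
    public string Text { get; set; }
    public List<string> Choices { get; set; }
    public bool Required { get; set; }
}

public static class SurveyCsv
{
    private static readonly string[] ExpectedHeader =
        { "type", "text", "choices", "required" };

    // Reads questions from CSV text. Problems are appended to 'errors'
    // with their line number instead of aborting the whole import.
    public static List<SurveyQuestion> Parse(TextReader reader, List<string> errors)
    {
        var questions = new List<SurveyQuestion>();

        string header = reader.ReadLine();
        if (header == null ||
            !header.Split(',').Select(h => h.Trim().ToLowerInvariant())
                   .SequenceEqual(ExpectedHeader))
        {
            errors.Add("Line 1: header must be type,text,choices,required");
            return questions;
        }

        string line;
        int lineNumber = 1;
        while ((line = reader.ReadLine()) != null)
        {
            lineNumber++;
            if (line.Trim().Length == 0) continue; // skip blank lines

            string[] cells = line.Split(','); // naive split: no quoted fields
            if (cells.Length != 4)
            {
                errors.Add("Line " + lineNumber + ": expected 4 columns");
                continue;
            }

            string type = cells[0].Trim().ToLowerInvariant();
            if (type != "checkbox" && type != "radio")
            {
                errors.Add("Line " + lineNumber + ": unknown type '" + type + "'");
                continue;
            }

            bool required;
            if (!bool.TryParse(cells[3].Trim(), out required))
            {
                errors.Add("Line " + lineNumber + ": required must be TRUE or FALSE");
                continue;
            }

            questions.Add(new SurveyQuestion
            {
                Type = type,
                Text = cells[1].Trim(),
                Choices = cells[2].Split('|').Select(c => c.Trim()).ToList(),
                Required = required
            });
        }

        return questions;
    }
}
```

The error list can be shown to the person who edited the file, so a bad row is reported by line number rather than breaking the survey.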

As permanent store XML is good.
But that does not mean the user needs to edit the XML directly.
I would build the ability to edit, add, and delete the questions in the app.
Yes, a bit of trouble, but if they hack the XML then that is also a lot of trouble.
How do you plan to save survey results?
How do you plan to collect the survey results?
There is more to this project than you are realizing.
Do you need to combine results from more than one device?
If more than one device then you need to separate the questions from the results so you can update the questions on more than one device.
There are tools to read and write XML to disk.
Reading XML with the XmlReader
I don't agree with doug that you need to embed a database.
For a small number of questions I would use XML.
I would read all the XML into an object collection (A List).
You don't need a class that implements IEnumerable.
You put your objects in a collection that implements IEnumerable.
I would go WPF over WinForms.
A ListBox with a DataTemplate.
On the DataTemplate you can have a dynamic selector in code behind but that is a real hassle.
Consider a single template that you manipulate in code behind.
So they are not RadioButtons but you uncheck the others in code behind.
For filtering I would go LINQ in public properties but there is also CollectionViewSource.
I used XML for an app that was used to collect field measurements.
It was a lot like this, in that the measuring devices could change and we still needed to collect the measurements.
If you are set on user editing the questions directly then XML with XSD is the best I can think of.
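The suggestion above to read all the XML into a List could be sketched with LINQ to XML as follows. The element names come from the XML in the question; the Question and QuestionLoader names are made up for the example, and error handling for malformed XML is left out (XDocument.Parse throws an XmlException in that case).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public class Question
{
    public string Type { get; set; }
    public string Text { get; set; }
    public List<string> Choices { get; set; }
    public bool Required { get; set; }
}

public static class QuestionLoader
{
    // Parses the <questions> document into a plain List<Question>.
    // List<T> already implements IEnumerable<T>, so no custom
    // enumerable class is needed.
    public static List<Question> Parse(string xml)
    {
        XDocument doc = XDocument.Parse(xml);

        return doc.Root
            .Elements("question")
            .Select(q => new Question
            {
                Type = (string)q.Element("type"),
                Text = (string)q.Element("text"),
                // A missing <choices> element yields one empty choice here.
                Choices = ((string)q.Element("choices") ?? "")
                    .Split(',')
                    .Select(c => c.Trim())
                    .ToList(),
                // A missing <Required> element is treated as false.
                Required = (bool?)q.Element("Required") ?? false
            })
            .ToList();
    }
}
```

Once the questions are in a list, question 37 is simply questions[36], and filtering is ordinary LINQ, for example questions.Where(q => q.Type == "radio").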

If you are looking for a simple human-readable structured format, then you could be interested in YAML.
YAML is a human-readable data serialization format that takes concepts
from programming languages such as C, Perl, and Python, and ideas from
XML and the data format of electronic mail.
Your question file would look like this:
questions:
  - id: 1
    type: checkbox
    text: Which beers do you like most
    choices: Bud,Miller,Piels
    Required: true
  - id: 2
    type: radio
    text: Which beer is your favorite
    choices: Bud,Miller,Piels
    Required: true
Some YAML libraries exist for .NET (from the article):
https://github.com/aaubry/YamlDotNet
http://yaml.codeplex.com/
http://www.codeproject.com/Articles/28720/YAML-Parser-in-C
http://yaml-net-parser.sourceforge.net/

There are plenty of xml editing tools out there that will actually make it easier to edit than editing a text file directly. I use XML Marker and it's pretty easy to use. http://symbolclick.com/
It will be quicker to train them to edit using the tool than it will be to build one.

Two answers here:
a: Write it to allow a proper admin interface, using a database to allow admin users to add/edit questions, response options and include appropriate security, auditing etc. You mention that this may not be feasible in the short term or that a 'proper' feature will be added soon, in which case, scrap this!
b: People say they have frequent edits/changes to make, but is this not a requirement that belongs with the complete feature? Could you not, in the short term, accept manual change requests via email or some other documented channel, and make the changes yourself? Do you think the time taken to add a question/response or change some wording would be less than the time needed to comb through XML by hand to find a syntax error made by someone who isn't familiar with it?
You'll need to weigh up frequency of change with impact to yourself of making a change vs likelihood of user error, vs estimated time needed to identify and resolve a syntax error (plus the possible bad-will of having a change break things).
Despite what some people think, users don't like making mistakes! Putting them in a position where they have admin-level powers over a system they don't have a full technical grasp of could reduce confidence and future buy-in to the feature you're due to develop.
TLDR: In my opinion, unless it's a major hassle, do the changes yourself in the short term, perhaps with a limit on how often you'll make them (one change set a week, on a Friday, for example). Keep the system working perfectly, and involve the users without putting them in the uncomfortable position of being a non-voluntary early adopter of a feature which isn't finished.

I used my complete mastery over winforms to create a little mock GUI application that enables users to quickly create one dimensional non conditional lists of questions with different question types.
Once you have decided on an XML schema you can easily import and export XML files.
Are you interested in further development of the magical survey creator? If so, tell me and I will send you a practically finished prototype tomorrow morning. (You should provide me with an XML schema though, otherwise I will do it in CSV.)
I enjoy the exercise.
Picture related. Don't be put off by the colors, that's how I like it during development, to see the pixel exact boundaries of controls.
Unless your coworkers have some experience with programming or xml editing they will hate you if you instruct them to edit any sort of "code".
Our secretaries put their hand in front of their faces and start chanting "no, no, no..." when I tell them how to operate VBA macros.

Difference between Mapping and Collections C# and what to use

ORIGINAL SPIEL
OK, so I'm kind of in a pickle and maybe I can get some clarification and advice here. I'm not very familiar with C# but I have done something similar in a Groovy/Grails web application.
So, my issue: I have two objects. One is a Shipper (holds basic info on a shipping company). The other is a Vehicle object (holds information about shipping vehicles). Now a Shipper can have numerous types of shipping vehicles. These are going to be populated in a local SQL database in Visual Studio 2010. I was going to put all this information into one shipper object and one shipper table. Instead I am going the route of making two different objects. My issue is bringing them back together from the db and linking them into one joined object. What I have done in a Groovy/Grails web application (with help) was map two objects, a user and a role, which came from 2 different tables, together off an id (I don't know if I fully understand this but I'm working with it; there was a lot of hand holding).
So in C#. I was looking at mapping and I'm not sure if this is needed or how to go about it. Taking Shipper + Vehicle and making a Shipper/Vehicle object. My lack of understanding in the subject I think makes these seem really trivial.
Or would this be something that I would need a collection for? Making a collection of these two objects.
So maybe a clarification on the two when it comes to C#, or at least a simplified explanation of the two and maybe some basic implementations of each.
I don't need anything too extensive. I'm having some crazy coder's block on this one for some reason (might be the copious amounts of Red Bull and coffee).
Again, I lack a lot of knowledge on these datatypes and C# in general.
I'll be monitoring this and updating as I personally progress or as more questions/flaming arise. I just need opinions and help to get past this blocker.
EDIT/NEW SPIEL
OK. So since I don't even know where to start. And given the information above.
I do not understand the difference between Collections and Mapping in C# (or any language for that matter) even after looking up the two. They seem similar to me.
So, NEW QUESTION: In this situation, would you use a Map or a Collection? A "why" would be nice but not needed, I guess, if that's asking too much.
If I can get that answered then I will be happy and try to go figure it out. I just don't wanna go down a rabbit hole that goes the wrong way. I understand the hate of asking a question without showing what I've done. But I have not got that far because of this underlying question. Sorry for the "ignorance" but I would really like to understand which path to at least start down in this situation. I wasn't asking for "hey code this for me". Examples would surely help but a decent explanation would have been nice at least. But I guess I'll just ask a yes/no, do this/that question and I'll take it from there.
-Sorry?

I'm in the middle of learning C# myself, so I don't understand what you mean by the Map data structure. Here's a very informative site regarding collections that will tell you how to use each data structure:
http://csharp.net-informations.com/collection/csharp-collection-tutorial.htm
If I knew what you mean by Map, I might be able to help you further.
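If "Map" here means the key/value structure that Groovy and Java call a Map, the closest C# type is Dictionary<TKey, TValue>, which is itself one of the collection types. Under that assumption, a minimal sketch of joining the two objects might look like this. The Shipper and Vehicle properties, the ShipperId foreign key and the ShipperAssembler helper are guesses for illustration, not taken from the question's actual tables.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Vehicle
{
    public int Id { get; set; }
    public int ShipperId { get; set; }   // assumed foreign key to Shipper.Id
    public string Type { get; set; }
}

public class Shipper
{
    public int Id { get; set; }
    public string Name { get; set; }
    public List<Vehicle> Vehicles { get; set; }   // the "collection" side

    public Shipper()
    {
        Vehicles = new List<Vehicle>();
    }
}

public static class ShipperAssembler
{
    // Fills each shipper's Vehicles list from a flat list of vehicle rows.
    public static void AttachVehicles(List<Shipper> shippers, List<Vehicle> vehicles)
    {
        // The "map" side: a lookup from ShipperId to that shipper's vehicles.
        // Asking an ILookup for a missing key returns an empty sequence.
        ILookup<int, Vehicle> byShipper = vehicles.ToLookup(v => v.ShipperId);

        foreach (Shipper shipper in shippers)
        {
            shipper.Vehicles = byShipper[shipper.Id].ToList();
        }
    }

    // A Dictionary keyed by Id gives direct access to one shipper.
    public static Dictionary<int, Shipper> IndexById(IEnumerable<Shipper> shippers)
    {
        return shippers.ToDictionary(s => s.Id);
    }
}
```

So the two are not really alternatives: a list holds each shipper's vehicles, and a dictionary or lookup is a convenient way to find things by id while putting them together.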

Dataset for URL normalization

I'm working on a project for normalizing URLs (i.e., different URLs that map to the same web page should be identified and the redundancy reduced, as a search engine does).
So I'd like a dataset containing different URLs in order to test my method. Please provide links to URL normalization dataset(s).
I'm implementing this project in C# and I'd like your suggestions. Thanks in advance.

Since you wrote "I'd like your suggestions", which leaves your question very open as to which kind of suggestion you might get, I will go ahead and give you mine. Though I admit I am not 100% sure what problem you wish to tackle. Are you asking for a program/code-specific suggestion? A strategy for how to set up such a project? Or do you wish to collect inspiration/ideas and improve your existing workflow? If you are seeking this third thing, I would suggest taking a look at two scenarios, inspired by a lecture that one of my Artificial Intelligence teachers once gave. Let's dive for a moment into how ant colonies organise themselves:
top-down approach: a fantasy. Imagine a queen in an ant colony prescribing for each and every ant its route to the sub-colonies, thereby normalising the multiple routes that various ants take to the same place. It seems you want to group the ants together, let each group use just one route to its goal, and remove possible duplicate routes. This is one way to make their routes more efficient. In reality ants actually work differently:
bottom-up approach: the reality.
A single ant has little meaning, but when a whole ant colony is studied, an organisation reveals itself. This is because the ants follow the scent traces of other ants, thereby following each other and ultimately finding their way to the nest. This way, the cleverness does not need to come from above or from a central database; a tiny bit of intelligence built into each ant makes the same path reusable. In the same way, you might want to think about building your normalisation technique into each hyperlink that needs to be normalised.
I hope this gives you the suggestions you wished for. Otherwise, if your question was not strategy-based but related to a specific code problem, ask a question with program code in it; that is often much easier to solve than finding the best strategy. Good luck! My 2 cents.
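As a small starting point on the code side, and only as a sketch: in C#, System.Uri already does some syntactic normalization for http URLs when it parses them (lower-casing the scheme and host, dropping the default port, resolving "." and ".." path segments). The UrlNormalizer helper below is a hypothetical name; it relies on that behaviour and additionally drops the fragment. It makes no attempt at semantic rules such as query-parameter ordering or trailing slashes, which a real normalizer would have to decide on.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UrlNormalizer
{
    // Returns a canonical string for an absolute URL, based on what
    // System.Uri normalizes while parsing, with the #fragment removed.
    // Throws UriFormatException if the input is not an absolute URL.
    public static string Normalize(string rawUrl)
    {
        Uri uri = new Uri(rawUrl.Trim(), UriKind.Absolute);

        // Scheme + authority + path + query, i.e. everything but the fragment.
        return uri.GetLeftPart(UriPartial.Query);
    }

    // Groups raw URLs that normalize to the same string and keeps
    // only the groups that contain more than one URL.
    public static List<IGrouping<string, string>> FindDuplicates(IEnumerable<string> urls)
    {
        return urls
            .GroupBy(u => Normalize(u))
            .Where(g => g.Count() > 1)
            .ToList();
    }
}
```

A hand-written list of URL pairs that should and should not collapse to the same value makes a reasonable first test set while looking for a larger dataset.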

Shorter naming convention for types

I am developing a framework, and some of the objects have reaaally long names. I don't really like this, but I don't like acronyms either. I am trying to come up with a shorter name for "EventModelSocket", basically a wrapper around the .Net socket class that implements various events, and methods to send files, objects, etc. Some of the objects have really long names due to this, such as "EventModelSocketObjectReceivedEventArgs" for example.
I've tried everything from a thesaurus, to a dictionary to sitting here for hours thinking.
When you come upon situations like this, what is the best way to name something?

Push some of it into the namespace.
For example:
EventModelSocketObjectReceivedEventArgs
becomes
EventModel.Sockets.ReceivedEventArgs
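In code, that split might look roughly like the sketch below. The Payload property and the EventSocket class are invented to make the example complete; only the namespace and the ReceivedEventArgs name come from the suggestion above.

```csharp
using System;

namespace EventModel.Sockets
{
    // Inside this namespace the short name is unambiguous.
    public class ReceivedEventArgs : EventArgs
    {
        public object Payload { get; private set; }

        public ReceivedEventArgs(object payload)
        {
            Payload = payload;
        }
    }

    // Hypothetical socket wrapper that raises the event.
    public class EventSocket
    {
        public event EventHandler<ReceivedEventArgs> ObjectReceived;

        // Called by the wrapper when a complete object has arrived.
        public void RaiseObjectReceived(object payload)
        {
            EventHandler<ReceivedEventArgs> handler = ObjectReceived;
            if (handler != null)
            {
                handler(this, new ReceivedEventArgs(payload));
            }
        }
    }
}
```

Callers that add using EventModel.Sockets; then write ReceivedEventArgs, and the full name is still there when it is needed to disambiguate.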

Well, are the long names hurting something?
(edit) two other thoughts:
use var in C# 3.0 - that'll save half the width
if you are using the type multiple times in a file, consider a type alias if it is annoying you:
using Fred = Namespace.VeryLongNameThatIsBeingAnnoying;

I would just suggest using the most concise naming that describes the object.
If EventModelSocketObjectReceivedEventArgs does that, move on.
My 2 cents.

Years ago when I was in a programming class, the prof quoted the statistic that a piece of code is typically read 600 times for each single time it gets modified. Nowadays, I would assume that this is no longer true, particularly in TDD environments where there's lots of refactoring going on. Nevertheless, I think a given piece of code is still read many more times than it gets written. Therefore, I think the maxim that we should write for readability is still valid. The full form of a word in a name is more readable, since the brain doesn't have to do the conversion. Comprehension is faster and more accurate.
The tools we have today make this so easy with autocompletion and the like. Because of this, I use full words in variable names now, and I think it's a good way to go.

If you need to go through that much effort to find an alternative name, you already have the correct name. Object/method/property names should be self documenting. If they do not describe their exact purpose they are misnamed. There is nothing wrong with long names if they give the most clear understanding of the purpose of that object.
In this age of intellisense and large monitors there really is no excuse to not be as descriptive as possible in naming.

Don't remove the vowels or something crazy like that.
I'm with the "stick with the long name" people.
One thought is that if the names are that awkward, maybe some deeper rethinking of the system is needed.

I for one use the long name. With intellisense typing out the name isn't that important, unless you are using a 15 inch monitor.
If I had to reduce the name I might go with EvtMdlSck, just to make the word shorter but still understood. Even though that is not my preference.

Some criticisms on your naming...
Why DOES your component have the word "model" in its name? Isn't that a bit redundant?
Since your component seems to be a messaging hub of some sort, why not include Message in its name? What about MessageSender?
To solve your problem I would create an interface and give it a generic name like MessageSender, and an implementation, which is where you include the technology within the name, like RandomFailingSocketMessageSender.
If one wishes to get a good example of this take a look at the Java or .Net libraries..
from Java.
interface - class/implementations...
Map - HashMap, LinkedHashMap.
List - LinkedList
Details regarding the technology or framework used, e.g. words like "Socket" or, to use a contrived example, "MQSeries", shouldn't be part of the interface name at all.
MessageSender seems, IMHO, to sum up the purpose of your component. It seems strange that your thing which sends "files" and "events" doesn't include those two descriptive words. The stuff you're using in your naming is superfluous and IMHO doesn't match your description of the component.
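Translated to C#, where interfaces conventionally get an I prefix, the idea could be sketched like this. IMessageSender, SocketMessageSender, InMemoryMessageSender and FileUploader are illustrative names, and the socket implementation is deliberately naive (a single Socket.Send call, which may send fewer bytes than requested on a non-blocking socket).

```csharp
using System;
using System.Collections.Generic;
using System.Net.Sockets;

// Generic name: says what it does, not how.
public interface IMessageSender
{
    void Send(byte[] payload);
}

// The technology appears only in the implementation's name.
public class SocketMessageSender : IMessageSender
{
    private readonly Socket _socket;

    public SocketMessageSender(Socket socket)
    {
        _socket = socket;
    }

    public void Send(byte[] payload)
    {
        _socket.Send(payload); // naive: ignores the returned byte count
    }
}

// A second implementation, handy for tests: it just records what was sent.
public class InMemoryMessageSender : IMessageSender
{
    private readonly List<byte[]> _sent = new List<byte[]>();

    public List<byte[]> Sent
    {
        get { return _sent; }
    }

    public void Send(byte[] payload)
    {
        _sent.Add(payload);
    }
}

// Calling code depends only on the interface.
public class FileUploader
{
    private readonly IMessageSender _sender;

    public FileUploader(IMessageSender sender)
    {
        _sender = sender;
    }

    public void Upload(byte[] fileContents)
    {
        _sender.Send(fileContents);
    }
}
```

The long, technology-specific names then show up only where an implementation is constructed, and the rest of the code reads in terms of the short interface name.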

In general I believe in class names that accurately describe their function, and that it's OK to have long names. If you think the names are really getting long, what I would suggest is finding a concept that is well-known to your programming team and abbreviating that. So if "Event Model Sockets" are a concept that everybody knows about, then abbreviate them to EMS. If you've got a package that is entirely about Event Model Sockets then abbreviate them to EMS in all the classes internal to that package. The key here is to make sure the name is in full for anyone who might not be familiar with the concept and abbreviated for anyone who is.
