I'm working on a kind of "social network" and have created my own comment system for every post. Since a large number of posts and comments is expected, I'm not sure SQL is the best approach here.
Currently I have two tables in the SQL database:
Posts:
Columns: ID, PostText, DateTime, Username (that posted)
Comments:
Columns: ID, PostID (the post it belongs to), CommentText, DateTime, Username (that commented)
I would also like to implement a "like" system, but then I would have to make a new table?
Example:
Like:
Columns: ID, PostID, Username (that liked) - to prevent double voting
There might be a lot of refreshing, and I'm a little worried that SQL might start giving me headaches with high traffic on the website.
But I need advice from someone who has experience with both SQL and permalinks. Which works better, given that permalinks can be crawled for links? I'm also not sure whether a large number of files (permalinks) would be trouble for my host.
If you think permalinks are better, can you explain a way to format them and sort them somehow using a folder hierarchy, since I've never worked with that?
Thanks!
You really need a new table for likes.
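A minimal sketch of what that table could look like, using the column names from your question (named Likes here because LIKE is a reserved word in SQL; the unique constraint is what prevents double voting):

CREATE TABLE Likes (
    ID       INT IDENTITY PRIMARY KEY,
    PostID   INT NOT NULL REFERENCES Posts(ID),
    Username NVARCHAR(50) NOT NULL,
    CONSTRAINT UQ_Likes_Post_User UNIQUE (PostID, Username)  -- one like per user per post
);

-- like count for one post (@postId is just a placeholder parameter)
SELECT COUNT(*) FROM Likes WHERE PostID = @postId;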
If you have high traffic, I think you should add a history table and move old likes (for example, older than a week) into it, or try to partition your table.
But these are only the first steps; there is a lot of work involved in really high-traffic sites.
Morzel
I am trying to understand some basics of Lucene, the full-text search engine. More specifically, I am looking at Lucene.Net.
Today I have an old legacy .NET 4.8 web app. Some of it is MVC, but the newer parts follow a pretty nice API-first pattern. The app holds a lot of records (approximately half a million) with tons of different fields. The search functionality there is outdated, to say the least: it is a ton of old Linq2SQL queries that fan out into LIKE queries.
I would like to introduce a new and better way to search records, so I started looking at Lucene.Net. But I am trying to understand one key concept, and I can't seem to find the answer anywhere, and I think it might be because it cannot be done, but I would like to make sure.
Is it possible to set up Lucene to monitor a SQL table or view so I don't have to maintain the Lucene index from within my code? The code of this app does not lend itself to easily keeping a Lucene index updated when things are added, changed or deleted, but the database is a good source of truth. I can live with a small delay in having the index up to date. Basically, I would like to define for each business model which fields are part of the index and what the ID is, and then be able to query that index from the C# server-side code of my web app.
Is such a scenario even possible or am I asking too much?
It's totally possible, but not out of the box. You have to implement it if you want it. Fundamentally you need to implement three things.
A way to know every time a piece of relevant data in the SQL database changes.
A place to capture information about that change; call it a change log.
A routine that reads the change log, applies those changes to the Lucene.Net index and then marks the record in the change log as processed.
There are of course lots of different ways to handle each of these.
The SO answer "Lucene.Net index updates, when a manual change is done in SQL Database" provides more details on one way this can be accomplished.
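As a rough illustration of the first two pieces, assuming the indexed data lives in a hypothetical Records table with an integer Id, the change log and trigger could look something like this; a background routine would then read the unprocessed rows, update the Lucene.Net index and set Processed to 1:

-- one row per change that the indexer still has to apply
CREATE TABLE RecordChangeLog (
    Id         INT IDENTITY PRIMARY KEY,
    RecordId   INT NOT NULL,
    ChangeType CHAR(1) NOT NULL,                 -- 'U' = add/update the document, 'D' = delete it
    ChangedAt  DATETIME NOT NULL DEFAULT GETDATE(),
    Processed  BIT NOT NULL DEFAULT 0
);
GO

CREATE TRIGGER trg_Records_Change ON Records AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    -- inserts and updates both mean "reindex this document"
    INSERT INTO RecordChangeLog (RecordId, ChangeType)
    SELECT Id, 'U' FROM inserted
    UNION ALL
    -- rows that only appear in deleted were really deleted
    SELECT d.Id, 'D' FROM deleted d
    WHERE NOT EXISTS (SELECT 1 FROM inserted i WHERE i.Id = d.Id);
END;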
I am building a Wiki/Blog-like application, and I have a question about the best way to store the view count for each of the articles. The requirement is that I only want to store the unique number of users that viewed the article and not the total view count. So far I have come up with 3 different ways to accomplish this:
1. SQL Server stored procedure: the problem with this approach is that the data is stored in XML data type and it might be a bit complicated to achieve the requirement using this method. I am leaving this as a last resort.
2. MSMQ: this would work great, since I can process the requests serially. The only problem with this approach is that I cannot ensure that MSMQ is installed on the host server. This one is out of the question!
3. Using Application.Lock(): I know that using this method I can lock access to the Application object, update some entry in the application, update the database, and then call Application.Unlock(). While this sounds like a functional approach, it still feels like a workaround.
Does anyone have a suggestion on what I should do to achieve the requirement?
MSMQ and Application.Lock() are definitely not the options to consider for the simple thing you want to do (Application.Lock() is a definite no-go).
I also see no reason for XML; a stored procedure does not rely on XML.
Create a table
[page, userip]
On every view of the page:
INSERT INTO <table> (page, userip) VALUES (#page, #userip)
For the statistics, just issue a query that counts the distinct IPs that viewed the page:
SELECT COUNT(DISTINCT userip) FROM <table> WHERE page = #page
This identifies a user by IP, which is not completely failsafe, as multiple users can come from the same IP.
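If you want the database itself to enforce one row per user per page, a sketch of the table with a composite primary key (names and types are placeholders; adjust to your schema):

CREATE TABLE PageViews (
    page   INT NOT NULL,
    userip VARCHAR(45) NOT NULL,       -- 45 characters also fits IPv6 addresses
    PRIMARY KEY (page, userip)         -- a repeat view by the same IP is rejected
);

With that key in place, a plain COUNT(*) WHERE page = #page also gives the unique view count directly.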
But why not investigate Google Analytics? It has all the info you need (and more).
I am trying to use MongoDB, C# and NoRM to work on some sample projects, but at this point I'm having a much harder time wrapping my head around the data model. With RDBMSs, related data is no problem. In MongoDB, however, I'm having a difficult time deciding what to do with it.
Let's use StackOverflow as an example... I have no problem understanding that the majority of data on a question page should be included in one document. Title, question text, revisions, comments... all good in one document object.
Where I start to get hazy is on the question of user data like username, avatar, reputation (which changes especially often)... Do you denormalize and update thousands of document records every time there is a user change or do you somehow link the data together?
What is the most efficient way to accomplish a user relationship without causing tons of queries to happen on each page load? I noticed the DbReference<T> type in NoRM, but haven't found a great way to use it yet. What if I have nullable optional relationships?
Thanks for your insight!
The balance that I have found is using SQL as the normalized database and Mongo as the denormalized copy. I use an ESB to keep them in sync with each other. I use a concept that I call "prepared documents" and "stored documents". Stored documents are data that is only kept in Mongo; useful for data that isn't relational. The prepared documents contain data that can be rebuilt using the data within the normalized database. They act as living caches in a way - they can be rebuilt from scratch if the data ever falls out of sync (in complicated documents this is an expensive process, because these documents require many queries to be rebuilt). They can also be updated one field at a time. This is where the service bus comes in: it responds to events sent after the normalized database has been updated and then updates the relevant Mongo prepared documents.
Use each database to its strengths. Allow SQL to be the write database that ensures data integrity. Let Mongo be the read-only database that is blazing fast and can contain sub-documents, so that you need fewer queries.
** EDIT **
I just re-read your question and realized what you were actually asking for. I'm leaving my original answer in case it's helpful at all.
The way I would handle the Stack Overflow example you gave is to store the user ID in each comment. You would load up the post, which would have all of the comments in it. That's one query.
You would then traverse the comment data and pull out an array of user IDs that you need to load, and load those as a batch query (using the Q.In() query operator). That's two queries total. You would then need to merge the data together into a final form. There is a balance you need to strike between when to do it like this and when to use something like an ESB to manually update each document. Use what works best for each individual scenario of your data structure.
I think you need to strike a balance.
If I were you, I'd just reference the userid instead of their name/reputation in each post.
Unlike with an RDBMS, though, you would opt to have comments embedded in the document.
Why do you want to avoid denormalization and updating 'thousands of document records'? MongoDB is designed for denormalization. Stack Overflow handles millions of pieces of data in the background, and some of it can be stale for a short period; that's okay.
So the main idea of the above is that you should have denormalized documents in order to display them quickly in the UI.
You can't query by referenced document, so either way you need denormalization.
I also suggest having a look at the CQRS architecture.
Try to investigate CQRS and event sourcing. These will allow you to update all this data through a queue.
I'm working on some stuff for an in-house CRM. The company's current frontend allows for lots of duplicates. I'm trying to stop end-users from putting in the same person because they searched for 'Bill Johnson' and not 'William Johnson.' So the user will put in some information about their new customer and we'll find the similar names (including fuzzy names) and match them against what is already in our database and ask if they meant those things... Does such a database or technology exist?
I implemented such functionality on one website. I use double_metaphone() + levenshtein() in PHP. I precalculate a double_metaphone() for each entry in the database, which I look up using a SELECT on the first x chars of the 'metaphoned' search term.
Then I sort the returned results according to their Levenshtein distance. double_metaphone() is not part of any PHP library (last time I checked), so I borrowed a PHP implementation I found somewhere a long while ago on the net (the site is no longer online). I suppose I should post it somewhere.
EDIT: The website is still in archive.org:
http://web.archive.org/web/20080728063208/http://swoodbridge.com/DoubleMetaPhone/
or Google cache:
http://webcache.googleusercontent.com/search?q=cache:Tr9taWl9hMIJ:swoodbridge.com/DoubleMetaPhone/+Stephen+Woodbridge+double_metaphon
which leads to many other useful links with source code for double_metaphone(), including one in Javascript on github: http://github.com/maritz/js-double-metaphone
EDIT: I went through my old code, and here are roughly the steps of what I do, pseudo-coded to keep it clear:
1) Precompute a double_metaphone() for every word in the database, i.e. $word = 'blahblah'; $soundslike = double_metaphone($word);
2) At lookup time, compute the same thing for the searched $word: $soundslike = double_metaphone($word)
3) SELECT * FROM table WHERE soundslike LIKE $soundslike (if you have levenshtein() available as a stored procedure, much better: SELECT * FROM table WHERE levenshtein(soundslike, $soundslike) < mythreshold ORDER BY levenshtein(word, $word) ASC LIMIT ...)
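For reference, the table behind step 1 might look something like this (a MySQL-flavoured sketch; the table and column names are just placeholders, and the soundslike column is filled with double_metaphone($word) when the row is inserted):

CREATE TABLE name_index (
    word       VARCHAR(100) NOT NULL,
    soundslike VARCHAR(32)  NOT NULL,   -- precomputed double_metaphone(word)
    KEY idx_soundslike (soundslike)     -- makes the lookup in step 3 an indexed search
);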
It has worked well for me, although I can't use a stored procedure, since I have no control over the server and it's using MySQL 4.20 or something.
I asked a similar question once: Name Hypocorism List. I never did get around to doing anything with it, but the problem has come up again at work, so I might write and open-source a .NET library for doing some matching.
Update:
I ported the Perl module I mentioned there to C# and put it up on GitHub: http://github.com/stimms/Nicknames
Implement the Levenshtein distance:
http://en.wikipedia.org/wiki/Levenshtein_distance
This can be written as a SQL function and queried in many different ways.
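For example, assuming you have implemented it as a scalar function called dbo.Levenshtein and have a hypothetical Customers table, finding near-matches for an entered last name could look like:

SELECT TOP 10 Id, FirstName, LastName
FROM Customers
WHERE dbo.Levenshtein(LastName, @enteredLastName) <= 2     -- allow up to two edits
ORDER BY dbo.Levenshtein(LastName, @enteredLastName);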
Well, SSIS has some fuzzy lookup/grouping tasks we use to find duplicates after the fact.
I think, though, that you need to have your logic look at more than just the name for best results. If they are putting in address, email or phone information, perhaps you could look for people with the same last name and one or more of those other matches and ask if one of them will do. You could also make a table of nicknames for various names and match on that. You won't get all of them, but you could get at least some of the most common in your country.
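A rough sketch of such a nickname table and how it could be used (the Customers table and column names are just placeholders):

CREATE TABLE Nicknames (
    CanonicalName NVARCHAR(50) NOT NULL,   -- e.g. 'William'
    Nickname      NVARCHAR(50) NOT NULL,   -- e.g. 'Bill'
    PRIMARY KEY (CanonicalName, Nickname)
);

-- existing customers whose stored first name is either the entered name
-- or a canonical form of the entered nickname
SELECT c.*
FROM Customers c
WHERE c.LastName = @lastName
  AND (c.FirstName = @firstName
       OR EXISTS (SELECT 1 FROM Nicknames n
                  WHERE n.Nickname = @firstName
                    AND n.CanonicalName = c.FirstName));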
You can use SOUNDEX to get similar-sounding names. However, it won't match William with Bill, for example.
Try this in SQL as an example.
SELECT SOUNDEX('John'), SOUNDEX('Jon')
There is some built-in "sounds like" functionality in SQL Server; see SOUNDEX: http://msdn.microsoft.com/en-us/library/aa259235%28SQL.80%29.aspx
As for full-name/nickname searching, there isn't anything built in that I am aware of. Nicknames vary by region, and it's a lot of information to keep track of. There might be a database linking full names to nicknames that you could leverage in your own application.
In a desktop application, I need to store a 'database' of patient names with simple information, which can later be searched through. I'd expect around 1,000 patients total on average. Each patient will have to be linked to test results as well, although these can/will be stored separately from the patients themselves.
Is a database the best solution for this, or overkill? In general, we'll only be searching based on a patient's first/last name, or ID numbers. All data will be stored with the application, and not shared outside of it.
Any suggestions on the best method for keeping all such data organized? What stumps me when not using a database is how to store the separate test data while keeping it linked to the patient.
Off the top of my head, given a List<Patient>, I can imagine several LINQ queries that would make searching a breeze, although with a list of 1,000 - 10,000 patients, I'm unsure whether there are any performance concerns.
Use a database. Mainly because what you expect and what you get (especially over the long term) tend be two totally different things.
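For the part that stumps you, linking the separate test results back to a patient is exactly what a foreign key is for. A minimal SQL Server-style sketch (table and column names are just placeholders):

CREATE TABLE Patients (
    PatientId INT IDENTITY PRIMARY KEY,
    FirstName NVARCHAR(50) NOT NULL,
    LastName  NVARCHAR(50) NOT NULL,
    IdNumber  NVARCHAR(20) NULL
);

CREATE TABLE TestResults (
    TestResultId INT IDENTITY PRIMARY KEY,
    PatientId    INT NOT NULL REFERENCES Patients(PatientId),  -- links each result to its patient
    TestName     NVARCHAR(100) NOT NULL,
    ResultValue  NVARCHAR(200) NULL,
    TakenOn      DATETIME NOT NULL
);

-- all results for one patient
SELECT * FROM TestResults WHERE PatientId = @patientId;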
This is completely unrelated to your question on a technical level, but are you doing this for a company in the United States? What kind of patient data are you storing?
Have you looked into HIPAA requirements and checked to see if you're a covered entity? Be sure that you're complying with all legal regulations and requirements!
I think 1,000 is too much to try to store in XML. I'd go with a simple database, like Access or SQLite. Yes, as a matter of fact, I'd probably use SQLite. SQL Server Express is probably overkill for it. http://sqlite.phxsoftware.com/ is the .NET provider.
I would recommend a database. You can use SQL Server Express for something like that. Trying to use XML or something similar would probably get out of hand with that many rows.
For smaller databases/apps like this I've yet to notice any performance hits from using LINQ to SQL or Entity Framework.
I would use SQL Server Express because it has the best tool support (IDE integration) from Microsoft. I don't see any reason to consider it overkill.
Here's an article on how to embed it directly in your application (no separate installation needed).
If you had read-only files provided by another party in some kind of standard format which were meant to be used by the application, then I would consider simply indexing them according to your use cases and running your searches and UI against that. But that's still some customized work.
Relational databases are great for storing data in tables, and for representing the relationships between tables. Typically there are also good tools for getting the data in and out.
There are other systems you could use to store your data, but none that would map to your input so quickly (you didn't mention how your data would get into this system) and then be queryable with so little effort.
Now, which database to choose...
Use a database... but maybe just SQLite, instead of a fully fledged database like MS SQL (Express).