I'm developing an ASP.NET MVC 4 commerce site, but since my background is in front-end web dev with HTML/CSS/JavaScript and C# with XNA, all this server and database business is doing my head in.
At the moment the site uses an Elasticsearch index with NEST to perform searches for products, since the products DB is huge and I like the smart queries Elasticsearch offers. And that all works great on my local testing environment.
My question is: is it generally a good idea to have your elasticsearch client and indices stored on a separate host from the actual site or is it OK to have it on the same server?
I understand there's the issue of disk space at play here, but I've also heard Elasticsearch queries tie up server resources that could be better spent handling other tasks, like the impending flood of payments that are sure to come through.
It really depends on your load, your data and your current server. How many users do you have on your website? How big is your index? How powerful is your current server?
It's usually best practice to put Elasticsearch on a separate machine, or even more than one, in order to take advantage of its distributed features. With two machines you can, for example, distribute your shards over them and configure one replica (the default value), so that every machine contains a whole copy of the data. And if the load increases you can always add new nodes to the cluster, as long as you allocated enough shards up front (it's common practice to over-allocate shards a little).
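For what it's worth, pointing NEST at a remote machine (or several) is just a connection-settings change on the site. A minimal sketch, assuming a NEST 2.x-or-later style API where StaticConnectionPool lives in Elasticsearch.Net; the node addresses are made up:

    using System;
    using Elasticsearch.Net;
    using Nest;

    public static class SearchClientFactory
    {
        public static ElasticClient Create()
        {
            // Hypothetical Elasticsearch nodes running on separate machines;
            // the web server only needs HTTP access to port 9200 on them.
            var nodes = new[]
            {
                new Uri("http://es-node1:9200"),
                new Uri("http://es-node2:9200")
            };

            // A static pool round-robins requests over the available nodes,
            // so adding a third node later is just another Uri here.
            var pool = new StaticConnectionPool(nodes);

            var settings = new ConnectionSettings(pool).DefaultIndex("products");
            return new ElasticClient(settings);
        }
    }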
On the other hand, I've also used it embedded in a Java application, as you can read in this article I recently wrote.
Related
I'm building an application that will use a large database that's currently hosted on Azure SQL. I also want to use ASP.net Identity. Additionally, my local machine cannot connect to the Azure SQL database due to security restrictions (I can't remove these, they are corporate IT policies).
When developing, do either of the following make sense? Or is there another option that I'm unaware of?
Add the fields from the large database, and maybe a few rows of sample data, to my localdb that's being used by default by Visual Studio? If I do this, how do I migrate over to the existing Azure database when it's time to go live?
Host the development application on Azure. This wouldn't be ideal, given that I'd need to upload the application with every change.
You could do that for small-scale testing and demonstration purposes, yes. Essentially, to interact with the database in ASP.NET you create a database context that points at the local database via its connection string. Provided the two are identical, you can simply switch the connection string to the company database when it's time to go live. You should be careful, however: working with relatively small datasets means everything will run relatively smoothly and quickly, but if your code is sloppy it could slow the entire thing down with big datasets.
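To illustrate the connection-string switch, here's a rough sketch (the context and connection string name are made up): the code never changes, only the configuration entry the name points to.

    using System.Data.Entity;

    // "ShopDb" is a hypothetical connection string name defined in Web.config.
    // Locally it points at your LocalDB instance; in production the same name
    // points at the Azure SQL database, so going live is a config change only.
    public class ShopContext : DbContext
    {
        public ShopContext() : base("name=ShopDb") { }

        public DbSet<Customer> Customers { get; set; }
    }

    public class Customer
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }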
As for developing, I would personally develop on a small scale locally until you're happy with the result. However, before you do a full-scale launch, I would do a pilot launch for a small section of users, hosted on Azure, to highlight any bugs you may have pushed. Then, after you've ruled out the obvious bugs, you've got a much safer launch.
To work in a separated develop/release environment:
You need an intranet copy of the remote database first, then use the code-first approach to continue working.
Reverse-engineer your database to code-first:
https://learn.microsoft.com/en-us/ef/core/get-started/aspnetcore/existing-db
https://cmatskas.com/scaffolding-dbcontext-and-models-with-entityframework-core-2-0-and-the-cli/
https://wildermuth.com/2017/12/20/Reverse-Engineering-Existing-Databases-in-Entity-Framework-Core-2
Enable database migrations: https://msdn.microsoft.com/en-us/library/dn579398(v=vs.113).aspx
Add the Identity framework to the intranet database with code first: https://learn.microsoft.com/en-us/aspnet/identity/overview/getting-started/adding-aspnet-identity-to-an-empty-or-existing-web-forms-project
Carefully maintain the migration code in later tasks; the remote database will be auto-updated after your code is released (see the sketch below).
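For the auto-update part, a minimal sketch assuming EF6-style code-first migrations (class and connection string names are illustrative):

    using System.Data.Entity;
    using System.Data.Entity.Migrations;

    public class IntranetContext : DbContext
    {
        public IntranetContext() : base("name=IntranetDb") { }
    }

    // Enable-Migrations generates a configuration class like this one;
    // keeping AutomaticMigrationsEnabled false means every schema change
    // is an explicit, reviewable migration in source control.
    public sealed class Configuration : DbMigrationsConfiguration<IntranetContext>
    {
        public Configuration()
        {
            AutomaticMigrationsEnabled = false;
        }
    }

    public static class DatabaseBootstrapper
    {
        // Call once at application start: whichever database the connection
        // string points at (intranet copy or the remote one) is migrated
        // up to the latest migration.
        public static void Initialize()
        {
            Database.SetInitializer(
                new MigrateDatabaseToLatestVersion<IntranetContext, Configuration>());
        }
    }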
I have built a web service in C#/WebApi2. It is completely REST based, and scales horizontally very easily with a load balancer in front of it since it has no state itself.
However, I'm looking for info/solutions on how to handle database scalability, and I would like to start without focusing on any particular technology. More specifically, I would like to use the Dapper ORM in combination with multiple DBs if possible.
For example, I can connect to PostgreSQL using Dapper and the Npgsql ADO.NET driver, but are there components which handle the case of having one master PostgreSQL database and four slaves to read from? Are there already C# components that handle these situations, where you can have connections to all of these DBs and, depending on the operation, it chooses either the master in case of write actions or a slave in case of reads and load-balances over the slaves (since the number of reads will be significantly higher than the writes, this would be a fairly good solution)?
What if I have a master-master situation? And what about similar situations with other DBs, such as MS SQL with AlwaysOn for example, or MySQL Cluster and its variations? Are there any components to handle this kind of thing, and if not, does anybody have any pointers to documentation/lectures/blogs/tutorials on this topic? I cannot imagine I'm the first one to encounter this, and writing a completely custom-made connection pool might be just re-inventing the wheel...
I know it is a general question, but I have the feeling there should have been work done regarding this topic, I just can't find it. I know in cloud scenarios, Azure and AWS, you have solutions for this such as specific load balancers, but I would need this for an on-premise solution as well. Any info would be appreciated.
One way to scale a database horizontally is to split your database into multiple databases - each having different set of data. Something like this:
Meta database (that has info on user, etc)
- Database 1 (has data for first 100000 users)
- Database 2 (has data for next 100000 users)
- Database 3 (has data for next 100000 users)
Your API requests would route the query to the respective database based on info from Meta database.
This provides for scalability but not availability. Many multi-tenant SaaS apps use this structure.
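As a rough sketch of how the routing could look with Dapper (recent Dapper versions provide QuerySingle; the meta table layout here is an assumption, not part of any existing component):

    using System.Data;
    using System.Data.SqlClient;
    using Dapper;

    public class ShardRouter
    {
        private readonly string _metaConnectionString;

        public ShardRouter(string metaConnectionString)
        {
            _metaConnectionString = metaConnectionString;
        }

        // Ask the meta database which shard holds this user, then hand back
        // an open connection to that shard for the caller's Dapper queries.
        public IDbConnection OpenConnectionForUser(int userId)
        {
            string shardConnectionString;
            using (var meta = new SqlConnection(_metaConnectionString))
            {
                shardConnectionString = meta.QuerySingle<string>(
                    @"SELECT ConnectionString FROM Shards
                      WHERE @userId BETWEEN MinUserId AND MaxUserId",
                    new { userId });
            }

            var shard = new SqlConnection(shardConnectionString);
            shard.Open();
            return shard;
        }
    }

The same idea extends to read/write splitting: pick the master's connection string for writes and round-robin over the replicas' connection strings for reads.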
Some references:
http://jamesgolick.com/2010/3/30/what-does-scalable-database-mean.html
https://developer.salesforce.com/page/Multi_Tenant_Architecture
I am currently working on an ASP.NET MVC 4 web application. As part of the application, users can log in and browse the site, etc. The data for the site is stored in a SQL Server database, containing user information, etc.
A new feature of the site will be for all users to add comments to particular products shown on the site. As there could be hundreds of thousands of customers and thousands of products, this is a lot of data.
So I have started looking at a NoSQL option for this data, rather than storing it in the relational SQL Server database. I have been looking at MongoDB. My first question: is this a correct approach I am taking?
Next topic: how easily does C#/.NET integrate with a MongoDB database? I haven't worked with this before, so my knowledge in the area is poor. Ideally, I would be querying (for want of the correct term) the MongoDB instance for comments based on a particular product's identifier. I presume I can write some style of query to get this data.
My next question is around the redundancy of a MongoDB database. With SQL Server, I have a failover server if an issue occurs with the main DB server. Is there a similar concept with Mongo, or how does it work? My consideration is for Mongo to run on the same server as the SQL Server database. The data in the MongoDB store will not be mission critical, but the data in SQL Server is. My web application will run on multiple servers in a load-balanced environment.
Can a MongoDB database be easily moved to another server? I.e., how well can it be scaled out? Can data from it even be copied to another MongoDB instance?
I appreciate my questions are of a beginner standard but I am currently researching the topic so assistance would be great.
SQL Server should suffice for housing comments as long as you have some caching configured. The good thing about SQL Server is the data integrity of the foreign keys, as well as the querying power.
However, working with Mongo in C# is not a huge deal. There is a slight learning curve, but that's the case with learning any new technology.
Connecting and Using MongoDB
MongoDB has official drivers and NuGet packages for you to use. See http://www.mongodb.org/display/DOCS/CSharp+Language+Center for more information.
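As a rough sketch of what querying comments by product identifier looks like with the official C# driver (2.x API; the class, database, and collection names are made up):

    using System.Collections.Generic;
    using MongoDB.Bson;
    using MongoDB.Driver;

    public class Comment
    {
        public ObjectId Id { get; set; }
        public int ProductId { get; set; }
        public string UserName { get; set; }
        public string Text { get; set; }
    }

    public class CommentStore
    {
        private readonly IMongoCollection<Comment> _comments;

        public CommentStore(string connectionString)
        {
            var client = new MongoClient(connectionString);
            var database = client.GetDatabase("shop");
            _comments = database.GetCollection<Comment>("comments");
        }

        // Fetch every comment for one product, e.g. for the product details page.
        public List<Comment> GetForProduct(int productId)
        {
            return _comments.Find(c => c.ProductId == productId).ToList();
        }
    }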
Redundancy
Mongo supports replica sets, where your second server would mirror all the data from the first server. Information on setting this up can be found here: http://docs.mongodb.org/manual/tutorial/deploy-replica-set/ It should be noted, though, that querying is a bit different in MongoDB than in SQL Server.
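From the C# side a replica set is mostly a connection-string concern; something along these lines (host names invented):

    using MongoDB.Driver;

    public static class MongoConnection
    {
        // List the replica set members; the driver discovers the current
        // primary and fails over automatically when it changes.
        public static MongoClient Create()
        {
            return new MongoClient(
                "mongodb://mongo1:27017,mongo2:27017,mongo3:27017/?replicaSet=rs0");
        }
    }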
Now, I personally use MongoDB in one of my enterprise applications, but I would say as a rule of thumb: if you don't absolutely need to use it, you would probably be better off sticking with one database engine, mostly so that you only have one database engine to manage. Just my opinion, though. Maybe Redis for caching?
If you don't have a hardware memory problem (you can buy lots of memory, and you will need it), Mongo can be your solution.
The thing is, with a MongoDB design you will do a kind of denormalization...
And in my opinion, for a case of hundreds of thousands of users your SQL Server is enough... do some more denormalization in your DB design and try implementing a good cache design...
You say you are new to MongoDB... so there is going to be a learning curve...
Put in more RAM and CPUs until you have millions of users...
To feel safe with MongoDB you are going to need at least 3 servers.
Please also check this link:
is this the optimal minimum setup for mongodb to allow for sharding/scaling?
try this
MVC Application With MongoDB - Part 1
MVC Application With MongoDB - Part 2
Getting Started With MongoDB in ASP.Net MVC4
I have a C# application that allows one user to enter information about customers and job sites. The information is very basic.
Customer: Name, number, address, email, associated job site.
Job Site: Name, location.
Here are my specs I need for this program.
No limit on amount of data entered.
Single user per application. No concurrent activity or multiple users.
Allow user entries/data to be exported to an external file that can be easily shared between applications/users.
Allows for user queries to display customers based on different combinations of customer information/job site information.
The data will never be viewed or manipulated outside of the application.
The program will be running almost always, minimized to the task bar.
Startup time is not very important, however I would like the queries to be considerably fast.
This all seems to point me towards a database, but a very lightweight one. However I also need it to have no limitations as far as data storage. If you agree I should use a database, please let me know what would be best suited for my needs. If you don't think I should use a database, please make some other suggestions on what you think would be best.
My suggestion would be to use SQLite. You can find it here: http://sqlite.org/. And you can find the C# wrapper version here: http://sqlite.phxsoftware.com/
SQLite is very lightweight and has some pretty powerful stuff for such a lightweight engine. Another option you can look into is Microsoft Access.
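To give a feel for SQLite from C#, here's a small sketch using the System.Data.SQLite wrapper (the file name and schema are invented for the example):

    using System;
    using System.Data.SQLite;

    class Program
    {
        static void Main()
        {
            // The whole database lives in a single file next to the executable.
            using (var connection = new SQLiteConnection("Data Source=customers.db"))
            {
                connection.Open();

                using (var create = new SQLiteCommand(
                    "CREATE TABLE IF NOT EXISTS Customers (Id INTEGER PRIMARY KEY, Name TEXT, Email TEXT)",
                    connection))
                {
                    create.ExecuteNonQuery();
                }

                using (var insert = new SQLiteCommand(
                    "INSERT INTO Customers (Name, Email) VALUES (@name, @email)", connection))
                {
                    insert.Parameters.AddWithValue("@name", "Acme Ltd");
                    insert.Parameters.AddWithValue("@email", "info@acme.example");
                    insert.ExecuteNonQuery();
                }

                using (var query = new SQLiteCommand(
                    "SELECT Name FROM Customers WHERE Email LIKE @pattern", connection))
                {
                    query.Parameters.AddWithValue("@pattern", "%acme%");
                    using (var reader = query.ExecuteReader())
                    {
                        while (reader.Read())
                            Console.WriteLine(reader.GetString(0));
                    }
                }
            }
        }
    }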
You're asking the wrong question again :)
The better question is "how do I build an application that lets me change the data storage implementation?"
If you apply the repository pattern and properly interface it, you can build interchangeable persistence layers. So you could start with one implementation and change it as needed without having to re-engineer the business or application layers.
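A minimal sketch of what such an interface might look like for this app (names are illustrative), so that the persistence choice stays swappable:

    using System.Collections.Generic;

    public class Customer
    {
        public string Name { get; set; }
        public string Email { get; set; }
        public string JobSiteName { get; set; }
    }

    // The UI and business layers only ever talk to this interface; swapping
    // the XML implementation for SQLite (or anything else) later means
    // writing a new implementation, not re-engineering those layers.
    public interface ICustomerRepository
    {
        void Add(Customer customer);
        IEnumerable<Customer> FindByJobSite(string jobSiteName);
        IEnumerable<Customer> All();
        void Save();   // flush to disk/DB, e.g. at shutdown
    }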
Once you have a repository interface you could try implementations in a lot of different approaches:
Flat File - You could persist the data as XML, and provided that it's not a lot of data you could store the full contents in-memory (just read the file at startup, write the file at shutdown). With in-memory XML you can get very high throughput without concern for database indexes, etc.
Distributable DB - SQLite or SQL Compact work great; they offer many DB benefits, and require no installation
Local DB - SQL Express is a good middle-ground between a lightweight and full-featured DB. Access, when used carefully, can suffice. The main benefit is that it's included with MS Office (although not installed by default), and some IT groups are more comfortable having Access installed on machines than SQL Express.
Full DB - MySQL, SQL Server, PostgreSQL, et al.
Given your specific requirements, I would advise you towards an XML-based flat file, with the only condition being that you are OK with the memory usage of the application directly correlating to the size of the file (since your data is text, even with the weight of XML it would take a lot of entries to become very large).
Here are the pros and cons, listed against your requirements:
Cons
No limit on amount of data entered.
Using in-memory XML would mean your application would not scale. It could easily handle a 10 MB data file, and 100 MB shouldn't be an issue (unless your system is low on RAM), but above that you have to seriously ask "can I afford this much memory?".
Pros
Single user per application. No concurrent activity or multiple users.
XML can be read into memory and held by the process (AppDomain, really). It's perfectly suited for single-user scenarios where concurrency is a very narrow concern.
Allow user entries/data to be exported to an external file that can be easily shared between applications/users.
XML is perfect for exporting, and also easy to import to Excel, databases, etc...
Allows for user queries to display customers based on different combinations of customer information/job site information.
Linq-to-XML is your friend :D (see the sketch after this list)
The data will never be viewed or manipulated outside of the application.
....then holding it entirely in-memory doesn't cause any issues
The program will be running almost always, minimized to the task bar.
so loading the XML at startup, and writing at shutdown, will be acceptable (if the file is very large it could take a while)
Startup time is not very important, however I would like the queries to be considerably fast
Reading the XML would be relatively slow at startup; but when it's loaded in-memory it will be hard to beat. Any given DB will require that the DB engine be started, that interop/cross-process/cross-network calls be made, that the results be loaded from disk (if not cached by the engine), etc...
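To make the Linq-to-XML point concrete, here's a small sketch (element names invented) of reading the file at startup, querying it in memory, and writing it back at shutdown:

    using System.Linq;
    using System.Xml.Linq;

    public class CustomerXmlStore
    {
        private readonly string _path;
        private readonly XDocument _doc;

        public CustomerXmlStore(string path)
        {
            _path = path;
            // Read once at startup and keep the whole document in memory.
            _doc = XDocument.Load(path);
        }

        // "Different combinations of customer/job site information" become
        // ordinary LINQ where-clauses over the in-memory XML.
        public string[] CustomerNamesAtJobSite(string jobSiteName)
        {
            return _doc.Root
                .Elements("Customer")
                .Where(c => (string)c.Element("JobSite") == jobSiteName)
                .Select(c => (string)c.Element("Name"))
                .ToArray();
        }

        // Write the file back out at shutdown.
        public void Save()
        {
            _doc.Save(_path);
        }
    }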
It sounds to me like a database is 100% what you need. It offers data storage, data retrieval (including queries), and the ability to export data to a standard format (either directly from the database, or through your application).
For a light database, I suggest SQLite (pronounced 'SQL Lite' ;) ). You can google for tutorials on how to set it up, and then how to interface with it via your C# code. I also found a reference to this C# wrapper for SQLite, which may be able to do much of the work for you!
How about SQLite? It sounds like it is a good fit for your application.
You can use System.Data.SQLite as the .NET wrapper.
You can get SQL Server Express for free. I would say the question is not so much why should you use a database, more why shouldn't you? This type of problem is exactly what databases are for, and SQL Server is a very powerful and widely used database, so if you are going to go for some other solution you need to provide a good reason why you wouldn't go with a database.
A database would be a good fit. SQLite is good as others have mentioned.
You could also use a local instance of SQL Server Express to take advantage of improved integration with other pieces of the Microsoft development stack (since you mention C#).
A third option is a document database like Raven which may fit from the sounds of your data.
edit
A fourth option would be to try Lightswitch when the beta comes out in a few days. (8-23-2010)
/edit
There is always going to be a limitation on data storage (the empty space on the hard disk). According to Wikipedia, SQL Server Express 2008 R2 is limited to 10 GB per database.
Related:
Storing Images in DB - Yea or Nay?
After reading the above question, it seems the preferred method for image storage with databases is to store only the filepath within the database. However, most of these answers seem to focus on web servers.
In my case, I'm developing a desktop application that will be used across multiple computers within an intranet. A dedicated server will host the database, containing information related to performing tests on various equipment.
Images need to be stored on the server in some way. Would storing the images in the database be the correct approach in this case, or even the only approach?
Pros:
Backup is limited to only the database.
No need to open up the server's file system to the network.
Single protocol for server information access.
Protected file access. (User can't go in and delete all the images)
Cons
Performance issues in the future if there are too many images.
Edit: As stated in the tags, the application is being written in C#/.NET. If writing the images to the file system is an option in this case, I could use some help understanding how this is done.
Edit 2: As elaborated some in the comments below, for now I'm assuming a MySQL database, although the FileStream capabilities of SQL Server 2008 could potentially change that.
Also in my case, images will be added often, and can be considered read-only after this point since they should never be changed, and will just be read out when needed. Images will likely be small (~70k each), and I'm also considering some other binary format storage on the server, files which are ~20k each which I can likely apply the same approach for storing and retrieving.
I'd suggest keeping those files on disk in the file system, rather than in the database. File system for files, databases for relational data, etc.
Deliver by Web Service
Consider delivering those images to your desktop app by hosting a web service/app on that DB machine. That app's job is to serve only images. Set up a web server on that machine with an ASP.NET application. Have an .ashx handle requests and stream the binary image. Something like this:
http://myserver/myapp/GetImage.ashx?CustomerID=123&ImageID=456
Security
If intranet security is an issue, this would be the point where you could ensure that the user is authenticated and authorized for read access to the image. Audit trails could be implemented here as well.
File System Security
Regarding security on those images, consider that NTFS gives you a lot of measures to ensure that only those who are authorized can read/delete/put files as required. The task then would be to define those roles and implement Windows security groups.
Future Needs
This approach allows you to securely consume those images from anywhere on the intranet. Perhaps this app would be migrated to a web application at some point? Perhaps a feature request comes from the customer where a web solution is appropriate?
This might sound like overkill rather than reading a blob from the database, but it's great from a security perspective. Consider your customers' and patients' expectations on privacy and security.
    <%@ WebHandler Language="C#" Class="Handler" %>
    using System.Web;

    public class Handler : IHttpHandler
    {
        public void ProcessRequest(HttpContext context)
        {
            // Go to the DB and get the path for this ID
            // (GetImagePath and GetBytesFromDisk stand in for your own data access code).
            string filePath = GetImagePath(context.Request.QueryString["ImageID"]);

            // Now you have the path on disk; read the file.
            byte[] imgBytes = GetBytesFromDisk(filePath);

            // Send it back as a byte[].
            context.Response.ContentType = "image/jpeg";
            context.Response.BinaryWrite(imgBytes);
        }

        public bool IsReusable { get { return false; } }
    }
I think the answer is that there is no right answer. As with most things in programming (and life), It DEPENDS.
Here are some Pros and Cons of storing in DB:
PROS
Easy backup, management and one stop shop for data in your application
Fewer dependencies in your app and fewer moving parts. KISS principle.
Works fine on small files under 1GB.
Hey, it's a DB, so saves can be done inside transactions and rolled back if there are network problems.
SharePoint and TFS store everything in the DB and work just fine. Even the big boys do it.
Security can be easily controlled by the app and not involve file/folder permissions
Cons
Eats up DB space.
Can potentially affect performance if not done right.
Not such a great idea if always storing large files (>1GB) unless using Filestream in SQL Server 2k8
Requires you to implement a decent caching strategy (although you would probably want this anyways)
File system feels more natural than DB and easier for manually replacing/viewing files.
I guess when it comes to your situation, I would lean towards the simplicity of storing in the DB.
From an architecture perspective, you'll get the best performance by splitting the solution into two pieces: a database server, and an image server.
You would do this both in order to keep row sizes small, and also to separate your transactional environment from content. Relational databases in the vein of SQL Server and MySQL will support big BLOBs but aren't optimized for them.
Most people equate "image server" to "web server" because they work on web applications and therefore have a de facto image repository (a directory on a local disk). However, this does not have to be the case. Images can be served from any location over any protocol.
You mentioned a C#/.NET platform and an intranet. Can we assume a Windows environment, possibly Active Directory?
If so, a plain vanilla file server could be your image server. Set up a file share, set read/create (but not modify/delete) permissions on it for all users of this app, store the UNC path somewhere in the database (so you don't have to redeploy the app if you decide to relocate it), and have your client application generate a unique, relative path using something reliable like a Guid.
It's not as elegant as a web service (which is my preferred approach), nor quite as maintenance-free as the pure-database approach, but my impression of this topic is that you're on a tight budget with a short delivery deadline, and a Windows or NFS file server is cheaper, easier, and faster to set up and maintain (including backups) than a full-fledged web server, so it might be just what you're looking for here.
Most businesses already have a file server, so usually this won't require any new infrastructure whatsoever. But even if you don't, I've seen file servers run off old reconditioned workstations - it's not fancy, but in a low-traffic environment it gets the job done.
If you choose this approach, I would suggest some kind of directory structure on the file share to simplify backups, archiving, etc. For example:
\\ImageServer\MyAppRepository\yyyy-mm\{image-file-name-or-guid}.{ext}.
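A rough sketch of the client-side write (the UNC root comes from the database; everything else is invented for illustration):

    using System;
    using System.IO;

    public static class ImageRepository
    {
        // Saves the image under the share and returns the relative path
        // to store in the database alongside the equipment test record.
        public static string SaveImage(string uncRoot, byte[] imageBytes)
        {
            // e.g. 2010-08\3f2504e04f8911d39a0c0305e82c3301.jpg
            var relativePath = Path.Combine(
                DateTime.UtcNow.ToString("yyyy-MM"),
                Guid.NewGuid().ToString("N") + ".jpg");

            var fullPath = Path.Combine(uncRoot, relativePath);
            Directory.CreateDirectory(Path.GetDirectoryName(fullPath));
            File.WriteAllBytes(fullPath, imageBytes);

            return relativePath;
        }
    }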
Hope that helps.
How many images are we talking about? Are they unique/updated frequently? If not, can you package the images with the client that you are going to distribute to multiple computers?
Personally, I would avoid storing images in the database, and instead as you said store the file paths.
If you have read through all of the other similar questions (This, this, and this) but are still asking if this is a good idea, then maybe your problem is different enough that this would be a good idea.
My company developed a Windows Forms C# application that stores images in a database, and it has worked out pretty well. We have been actively using it since 2003 and have about 150 gigs of data in the system.
First, let me say that this is NOT the optimal performance architecture. We have had some problems with keeping the database statistics up to date and keeping the indexes tuned correctly. We basically have to re-index the system monthly. You need to be aware that the built-in optimization system of most RDBMS servers is not set up for large collections of binary objects.
The reason we chose to put the images in the database is database-level replication. Our system is spread across seven offices in five states and I needed to sync the data to each site. So, I set up a VPN between each site and our corporate office and set up SQL merge replication on the database. In this way, I can sync the data and images at the same time with only one channel open between offices.
So, I would say that images in the database is not the optimal solution in most cases but it worked out for our requirements.
I don't think it matters where the images are stored. Pick the simplest approach that will work. But you should have an architecture where you can change the approach if it proves to be the wrong one.
To accomplish this, I would put the data and the image storage both behind a web services interface. Pick a technology - doesn't matter. All access to the data (and images) would be the same way - through the web service.
By doing this, you have decoupled where the data is stored from the desktop application. The desktop app doesn't care. All it knows is that the server at a certain address can get it the data.
Then store the data and the images wherever you want. Choose the simplest thing for you. If you end up having issues, then (and only then) should you add additional complexity in order to solve the problem. The good news is that the additional complexity and work shouldn't affect the desktop applications at all. You can make the changes on the server without having to deploy a new version of the desktop applications.
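In code terms, the desktop app would only ever depend on a contract like this (hypothetical names); what sits behind the service is free to change later:

    // The desktop application calls the web service through this contract only;
    // the server-side implementation decides whether images live in the
    // database, on a file share, or somewhere else entirely.
    public interface IEquipmentImageService
    {
        byte[] GetImage(int testId, int imageId);
        int SaveImage(int testId, byte[] imageBytes);
    }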
If you're looking for alternatives, one of my favorites is a ten-line HTTP POST file upload handler (PHP, .NET, Java, etc.) plus one web server. Once the script validates the max file size, and possibly extracts the width and height, it inserts a row into the database. Retrieval need not go through the script; standard file hosting will work. This would require you to open port 80. You needn't complicate this with SOAP or anything; a regular upload handler would do the job.
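In .NET terms, such a handler could be sketched roughly like this (the connection string name, table, and size limit are all assumptions for the example):

    <%@ WebHandler Language="C#" Class="UploadHandler" %>
    using System;
    using System.Configuration;
    using System.Data.SqlClient;
    using System.Drawing;
    using System.Web;

    public class UploadHandler : IHttpHandler
    {
        public void ProcessRequest(HttpContext context)
        {
            var file = context.Request.Files["image"];
            if (file == null || file.ContentLength > 1024 * 1024)   // reject > 1 MB
            {
                context.Response.StatusCode = 400;
                return;
            }

            // Read the dimensions, then record the upload in the database.
            var fileName = Guid.NewGuid().ToString("N") + ".jpg";
            using (var image = Image.FromStream(file.InputStream))
            using (var conn = new SqlConnection(
                ConfigurationManager.ConnectionStrings["ImageDb"].ConnectionString))
            using (var cmd = new SqlCommand(
                "INSERT INTO Images (FileName, Width, Height) VALUES (@f, @w, @h)", conn))
            {
                cmd.Parameters.AddWithValue("@f", fileName);
                cmd.Parameters.AddWithValue("@w", image.Width);
                cmd.Parameters.AddWithValue("@h", image.Height);
                conn.Open();
                cmd.ExecuteNonQuery();
            }

            // Put the file where the web server can serve it directly later.
            file.InputStream.Position = 0;
            file.SaveAs(context.Server.MapPath("~/images/" + fileName));
        }

        public bool IsReusable { get { return false; } }
    }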
Then there's WebDAV, along the same lines. Of course, with this method, you'd have to monitor the filesystem and adjust the database accordingly. You could use a polling service or hook into file system events. Actually, you could also inject an ISAPI filter or Apache handler to perform the database updates.
You could use FTP. Add an extension to ProFTPd that will update the database and keep everything in sync.
Lots of ways to avoid putting image data into tables.
If you opt for the database solution, just be sure to segment your BLOBs into separate tables. Separate table spaces / devices / partitions, if you can. Or, use Oracle and ignore everything I've said.
Use Amazon S3 storage for your images
Just store the GUID or other file name in the DB
Amazon is simple, fast, cheap, secure, etc.
It scales fine, and optionally provides CDN-like edge services directly from S3.
Storing images in the DB always seems to turn into a nightmare over time
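A minimal sketch with the AWS SDK for .NET (the bucket name and key scheme are invented); the database row then only needs to hold the key:

    using System;
    using System.IO;
    using System.Threading.Tasks;
    using Amazon.S3;
    using Amazon.S3.Model;

    public static class S3ImageStore
    {
        // Upload the image and return the key to store in the database.
        public static async Task<string> UploadAsync(string filePath)
        {
            var key = Guid.NewGuid().ToString("N") + Path.GetExtension(filePath);

            // Credentials and region come from the usual SDK configuration.
            using (var client = new AmazonS3Client())
            {
                await client.PutObjectAsync(new PutObjectRequest
                {
                    BucketName = "my-product-images",   // hypothetical bucket
                    Key = key,
                    FilePath = filePath
                });
            }

            return key;
        }
    }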
It seems to me that what you want to do is something like what Infovark does.
They use Firebird for this, and I'll give you a link on Firebird and storing images.
You should try MS SQL Server 2008; it comes with a FILESTREAM feature, which automatically stores BLOBs in the file system.