My application has a lot of logs. Currently I write part of these logs to a file in Azure Blob storage and part of them to a database.
I need to use only a file, either in blob storage or on the machine, and I need to be able to query and filter the rows of this file to find what I need. I don't need to keep the existing logs; I can define any structure up front.
The reason I don't want to use a database is its cost, given how rapidly it is growing.
So, what is the best way to implement this?
I would be glad of any suggestions.
Depending on what and how you are logging, application logs are typically written using the System.Diagnostics.Trace class. The log level and storage (file or blob) can be configured through the Azure portal. Read more about that here.
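Since the question allows any predefined structure, one option (separate from the Trace API) is to write one self-describing record per line, in JSON Lines style, and then stream the file back and filter it with LINQ. This is a minimal sketch assuming .NET 5 or later (System.Text.Json and records); LogEntry and JsonLinesLog are hypothetical names, not an existing logging API.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

// Hypothetical fixed log record; each entry is serialized as one line of JSON.
public sealed record LogEntry(DateTime Timestamp, string Level, string Source, string Message);

public static class JsonLinesLog
{
    // Append a single entry as one line, so the file can later be read line by line.
    public static void Append(TextWriter writer, LogEntry entry) =>
        writer.WriteLine(JsonSerializer.Serialize(entry));

    // Stream entries back lazily, so a large log never has to fit in memory at once.
    public static IEnumerable<LogEntry> Read(TextReader reader)
    {
        string? line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0) continue; // tolerate blank lines
            LogEntry? entry = JsonSerializer.Deserialize<LogEntry>(line);
            if (entry != null) yield return entry;
        }
    }
}
```

Because Read only needs a TextReader, the same code can filter a local file opened with File.OpenText or a downloaded blob stream wrapped in a StreamReader, e.g. JsonLinesLog.Read(reader).Where(e => e.Level == "Error").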
Is it possible to run Redis servers on two machines, each specifying the same snapshot dump file name and directory in its config file, with the directory and file shared between both machines?
RavenDB seems to work fine with this: I can set up the whole server file directory in a Dropbox folder on my machine and do the same on the other machine, with the two Dropbox folders syncing while the RavenDB servers read and write data from/to the database stored in the Dropbox folder.
I understand the two databases are conceptually very different; I am only using my RavenDB experience as an example to explain what I am trying to accomplish. Please note this is just for development purposes, not for production.
I am running Redis 2.4.5 as a Windows service and using BookSleeve as the client from C# on .NET 4.5.
Thanks
Most certainly not. This would be a sure way to ensure a corrupt file.
You might want to watch progress on Redis Cluster (http://redis.io/topics/cluster-spec), currently at the specification stage.
The only time you would use the dump file on a system which does not have persistence enabled is at boot time. However, if persistence is disabled it doesn't read from the dump file.
Even without server-specific data in the dump file, corruption can occur at any point when both servers write to the file. You could set the persistence settings to save only if there have been, say, 59 million changes in 60 seconds, which would let you read the file on load while effectively never saving to it. You would then need to use
redis-cli config set save ""
to disable automatic saving on both servers while still being able to save when you want: run the command above, then issue a BGSAVE command whenever you want a snapshot.
I also have to advise against doing this over a shared file system, which is what multiple machines accessing the same file would require. In your case that shared file system is Dropbox, which is likely to kill performance if you are persisting to disk.
Ultimately, though, I have to ask: why do you think you need this?
If you are using one of the servers read-only, then use a slave or two and do the reads on the slaves. This way you don't have to worry about multiple instances corrupting a persistence file, and it avoids the need for shared storage, because you have two nodes, each running with its own copy of the data. This also provides redundancy, and you can set up master/slave failover relatively easily.
Ultimately, if you are just using it to develop against, I don't see the need for such a setup. Just store the configuration where you can download it (Dropbox, GitHub, etc.) and develop away. It isn't difficult, and it is certainly less complicated, to simply copy your dump file to Dropbox or anywhere else you need it than to do what you describe.
We have a web application that must allow users to upload .csv files containing zip codes. Any user can upload a file from their computer, and the issue is that a file may contain thousands of records. Right now I am receiving the file and making sure it has the right headers, but I am pushing the records into the database one by one.
I am using C# and ASP.NET. Is there a better, more efficient way to do this in code? We can't use any external importers, data import tools, or tools like SQL Server Business Intelligence. How can I do this? I was reading something about loading the data into memory and then pushing it to the database. Any URLs, examples, or suggestions would be much appreciated.
Regards
Firstly, I'm pretty sure that what you are asking is actually "How do you process a large file and insert the processed data into the database?".
Now, assuming I am correct, I would say the question is akin to 'how long is a piece of string?'. The reality is that an implementation for processing large files into a database is highly specific to your requirements.
However, at the simplest end of the spectrum, you could simply upload the file straight into a table (or folder) and create a Windows service that runs every x minutes, goes through the table, picks up each file and processes your data using bulk inserts and the Prepare method (which may give you some performance benefits).
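A single pass of such a background worker, using the folder variant, might look like the sketch below. UploadFolderWorker and its parameters are hypothetical; the processing callback stands in for your own parse-and-bulk-insert code, and a Windows service could call ProcessPending from a timer every x minutes.

```csharp
using System;
using System.IO;

// Hypothetical helper for one pass of a background worker: pick up uploaded
// CSV files, hand each one to a processing callback, then move it to a
// "done" folder so it is not processed twice.
public static class UploadFolderWorker
{
    public static int ProcessPending(string inboxDir, string doneDir, Action<string> processFile)
    {
        Directory.CreateDirectory(doneDir);
        int processed = 0;
        foreach (string path in Directory.GetFiles(inboxDir, "*.csv"))
        {
            // If this throws, the file stays in the inbox and is retried next pass.
            processFile(path);

            // A Guid suffix keeps a re-uploaded file with the same name from colliding.
            string doneName = Path.GetFileNameWithoutExtension(path) + "_"
                + Guid.NewGuid().ToString("N") + Path.GetExtension(path);
            File.Move(path, Path.Combine(doneDir, doneName));
            processed++;
        }
        return processed;
    }
}
```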
Alternatively, you could look at something like MSMQ (Microsoft Message Queuing) and save uploaded files directly to a queue, which is completely independent of your application, can be processed at any time, and is easy to scale out.
At the end of the day, though, I honestly don't think anyone here can give you a 'correct' answer to your question, because there really isn't one, and you'll only find improvements to your implementation through experimentation.
If the file can contain up to a million records, the best approach is to create a service that manages inserting the records into the database, to avoid timeouts and keep the load off IIS.
If you make it a Windows service, you can have it process the uploaded zip-code files from the directory they were uploaded to.
I would also suggest using bulk inserts for faster database transactions.
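The "load into memory, then push" idea from the question can be sketched as reading the CSV into a DataTable, which SQL Server's SqlBulkCopy can then send in one call instead of one INSERT per row. This assumes a simple file with a header row and no quoted commas; CsvToDataTable is a hypothetical helper.

```csharp
using System;
using System.Data;
using System.IO;

// Collect CSV rows into a DataTable for a single bulk copy.
// Assumes a header row and no quoted fields containing commas.
public static class CsvToDataTable
{
    public static DataTable Load(TextReader reader)
    {
        string? header = reader.ReadLine();
        if (header == null)
            throw new InvalidDataException("The file is empty.");

        // Column names come from the header; all values are kept as strings here.
        var table = new DataTable();
        foreach (string name in header.Split(','))
            table.Columns.Add(name.Trim(), typeof(string));

        string? line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Trim().Length == 0) continue; // skip blank lines
            string[] fields = line.Split(',');
            if (fields.Length != table.Columns.Count)
                throw new InvalidDataException($"Unexpected field count: {line}");
            table.Rows.Add(Array.ConvertAll(fields, f => f.Trim()));
        }
        return table;
    }
}
```

With a SQL Server client library referenced (System.Data.SqlClient or Microsoft.Data.SqlClient), the table could then be pushed with something like new SqlBulkCopy(connection) { DestinationTableName = "dbo.ZipCodes" }.WriteToServer(table); the connection and table name are placeholders.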
If validation is required, you could stage the data in a separate database, validate it there, and then push it to the final database.
Since these records go into the same table and are not related to each other, Parallel.ForEach may be a valid answer here. Assuming you have a method (it need not be static) that inserts an individual record into the db, you can run a Parallel.ForEach loop over an array where each element is one line of the CSV.
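The validation step itself might look like the sketch below, done in memory here rather than in a staging database. It assumes US ZIP codes (5 digits, optionally followed by "-" and 4 digits); ZipValidator is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ZipValidator
{
    // US ZIP or ZIP+4, e.g. 12345 or 12345-6789 (an assumed format).
    private static readonly Regex UsZip = new Regex("^[0-9]{5}(-[0-9]{4})?$");

    // Split rows into accepted values and rejected ones (with their row number),
    // so only clean data reaches the final table and errors can be reported back.
    public static (List<string> Valid, List<(int Row, string Value)> Rejected) Split(IEnumerable<string> zips)
    {
        var valid = new List<string>();
        var rejected = new List<(int Row, string Value)>();
        int row = 1; // 1-based data row number, for error messages
        foreach (string raw in zips)
        {
            string zip = raw.Trim();
            if (UsZip.IsMatch(zip)) valid.Add(zip);
            else rejected.Add((row, raw));
            row++;
        }
        return (valid, rejected);
    }
}
```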
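A minimal sketch of that loop, assuming insertRecord stands in for your own thread-safe insert method (a hypothetical delegate here). Each call should use its own connection, because a single SqlConnection is not safe to share across threads.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class ParallelImport
{
    // Insert each CSV line on thread-pool threads, capping concurrency and
    // collecting failures instead of stopping the remaining inserts.
    public static ConcurrentBag<Exception> ImportLines(
        string[] lines, Action<string> insertRecord, int maxDegreeOfParallelism = 4)
    {
        var failures = new ConcurrentBag<Exception>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegreeOfParallelism };
        Parallel.ForEach(lines, options, line =>
        {
            try { insertRecord(line); }
            catch (Exception ex) { failures.Add(ex); }
        });
        return failures;
    }
}
```

Capping MaxDegreeOfParallelism keeps the number of concurrent connections bounded. Note that this still makes one database round trip per row.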
This assumes that uploading the large file to the server isn't the initial issue. If that is also part of the issue, I would recommend zipping the file and then using something like SharpZipLib to unzip it once it is uploaded. Since text compresses very well, this may be the biggest performance boost from the user's perspective.
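The answer mentions SharpZipLib; the sketch below swaps in the built-in System.IO.Compression.GZipStream to illustrate the same idea of compressing the CSV before upload and decompressing it on the server. CsvCompression is a hypothetical helper.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

public static class CsvCompression
{
    public static byte[] Compress(string text)
    {
        var output = new MemoryStream();
        using (var gzip = new GZipStream(output, CompressionLevel.Optimal))
        {
            byte[] bytes = Encoding.UTF8.GetBytes(text);
            gzip.Write(bytes, 0, bytes.Length);
        } // disposing the GZipStream flushes the final compressed block
        return output.ToArray(); // ToArray still works after the stream is closed
    }

    public static string Decompress(byte[] compressed)
    {
        using var input = new GZipStream(new MemoryStream(compressed), CompressionMode.Decompress);
        using var reader = new StreamReader(input, Encoding.UTF8);
        return reader.ReadToEnd();
    }
}
```

If users upload a .zip archive rather than a gzip stream, System.IO.Compression.ZipArchive can read it instead.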
Problem:
I have multiple instances of the same C# application running on different PCs (OS: Windows XP, Windows 7) in the same LAN. I have to share some configuration data among them. Each process must have read-write access to the data. My employer insists on storing these shared data in a file, which is in a shared directory on one of these PCs.
Possible solutions:
Exclusive file opening: The data is stored in a TXT file (serialization to and from a binary file is also an option). Each process uses File.Open with FileShare.None when trying to open the file. Getting an IOException means that the file is already in use, so the process has to wait and try again later.
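The exclusive-open option can be sketched as a retry loop around File.Open with FileShare.None. The attempt count and delay are arbitrary example values, and SharedFile is a hypothetical name.

```csharp
using System.IO;
using System.Threading;

public static class SharedFile
{
    // Open the shared file exclusively; if another process holds it (IOException),
    // wait and retry, giving up after maxAttempts.
    public static FileStream OpenExclusive(string path, int maxAttempts = 10, int delayMs = 200)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return File.Open(path, FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.None);
            }
            catch (IOException ex) when (attempt < maxAttempts && ex is not DirectoryNotFoundException)
            {
                Thread.Sleep(delayMs); // file is in use elsewhere; back off and try again
            }
        }
    }
}
```

Keep the stream open only for the read-modify-write and dispose it promptly, since every other process is blocked while you hold it.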
SQL Server CE embedded DB: The data is stored in an SDF file. The engine can handle at most 256 simultaneous connections (v3.5 SP2), which is more than enough.
SQLite embedded DB: The data is stored in an SQLite DB file. The documentation says SQLite works, but may be unreliable when used on a network share.
Other?
What is the preferred way to do this?
I don't know if it's the best way, but I did this in C ages ago and it worked well for me.
Each process reads the file, creates a private copy of it, and then works on that copy.
At a fixed moment (upon process termination, or triggered via some UI or whatever you like), each process sends its copy of the file to a master process that is in charge of rebuilding the original file in the shared directory and signaling the other processes that they need to reload.
Each process reloads the file (which now contains information coming from all the other processes).
Of course, this solution requires that the process writing the file knows how to rebuild it and how to resolve conflicts (which depends on the data format).
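As one illustration of the master process's merge step, the sketch below assumes each copy is a set of key/value settings stamped with a last-modified time and resolves conflicts with "last writer wins". Both the data shape and the rule are assumptions; ConfigMerger is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

public static class ConfigMerger
{
    // Merge the copies sent back by each process; for a key changed in several
    // copies, keep the value with the most recent modification time.
    public static Dictionary<string, (string Value, DateTime ModifiedUtc)> Merge(
        IEnumerable<IReadOnlyDictionary<string, (string Value, DateTime ModifiedUtc)>> copies)
    {
        var merged = new Dictionary<string, (string Value, DateTime ModifiedUtc)>();
        foreach (var copy in copies)
        {
            foreach (var pair in copy)
            {
                if (!merged.TryGetValue(pair.Key, out var current)
                    || pair.Value.ModifiedUtc > current.ModifiedUtc)
                {
                    merged[pair.Key] = pair.Value;
                }
            }
        }
        return merged;
    }
}
```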
You don't really describe the type of data you're working with so I'd say the answer varies.
Using a proper DBMS for this would be best if the data you are working with could generally be considered record/field oriented (and in rare circumstances even if it isn't). In this case I would recommend MSSQL CE, since its runtime will mitigate multi-user issues for you.
SQLite was generally considered a single user/application database (at least back when I used it in C) though things could have changed in the last 5 years. If you're using .NET 4 then there are few free adapters available for use from what I've found unless you're comfortable with a mixed framework application.
I would only manage the file locking manually if the data is pretty flat by design (like a log file), though if it were log-like data I would probably look into how some of the open source logging libraries do it. You basically said you have control over the data structure, so I'd suggest redesigning the data to be more normalized/rigid to avoid using this solution.
Create a web service and make your programs pull the configuration from there. You can control file locking from inside the web service and not have to deal with that at the program level. This also affords you the abstraction that if you decide to change how the settings are stored (e.g. move them from a file to a database) you can do this without having to make any changes to your program.
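The server side of that web service could be as small as the sketch below: only this class touches the settings file, and a lock serializes reads and writes, so client programs never handle file locking. The key=value file format and the SettingsService name are assumptions, and exposing Get/Set over HTTP (ASP.NET Web API, WCF, etc.) is left out. The lock only coordinates callers within one service process.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public sealed class SettingsService
{
    private readonly string _path;
    private readonly object _gate = new object();

    public SettingsService(string path) => _path = path;

    public string? Get(string key)
    {
        lock (_gate)
        {
            return Load().TryGetValue(key, out var value) ? value : null;
        }
    }

    public void Set(string key, string value)
    {
        lock (_gate)
        {
            var all = Load();
            all[key] = value;
            File.WriteAllLines(_path, all.Select(p => p.Key + "=" + p.Value));
        }
    }

    // Parse "key=value" lines; a missing file means no settings yet.
    private Dictionary<string, string> Load()
    {
        var result = new Dictionary<string, string>();
        if (!File.Exists(_path)) return result;
        foreach (string line in File.ReadAllLines(_path))
        {
            int eq = line.IndexOf('=');
            if (eq > 0) result[line.Substring(0, eq)] = line.Substring(eq + 1);
        }
        return result;
    }
}
```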
Sometimes I need to set some string values from code, for example:
Page.Title = "This is a test page.";
or
lblSupportInfo.Text = "Please contact xxx-xxx-xxxx for support.";
These are just examples of data that can change at any time. Is it better to store them in the settings file with application scope? What other options are there?
It would be better to be able to change this by updating a configuration file rather than a code release (Resources).
How do other people handle this? If there are too many of them, the web.config can be very long.
I prefer to store such items in a database, as updates are a LOT easier than updating a .config file, or code. Most commercial software that I've worked with does the same.
If a DB isn't an option, then the .Config files or some other text file will do. Even an XML file would work as a viable option. With a separate XML file or text file, you also avoid the hassle of losing the session state of your users, which happens if you update the .config file and are using the standard in-proc session management.
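A separate XML file of display strings can be read with LINQ to XML. The layout below (a strings root with string elements carrying a key attribute) is a hypothetical format, and DisplayStrings is an illustrative name; in practice you would call XDocument.Load on the file path instead of Parse.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class DisplayStrings
{
    // Turn <strings><string key="...">text</string>...</strings> into a lookup table.
    public static Dictionary<string, string> Parse(string xml) =>
        XDocument.Parse(xml)
            .Root!
            .Elements("string")
            .ToDictionary(e => (string)e.Attribute("key")!, e => e.Value);
}
```

Page code would then use something like Page.Title = strings["PageTitle"].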
In one of our applications, we use an XML document created by filling a DataSet from a database and then using the DataSet's WriteXml() method to save the data as a file that can be deployed with the application. This is specifically intended for a group of people who are on the road and can't always connect to the server to get the most recent data. The data is used to populate a survey form for secret shoppers. Their results are also saved the same way and serialized on the laptop. When they connect to the corporate network, the results are uploaded via a web service and processed into the results tables in the database.
We use key/value pairs in the database. We load those key/value pairs into the cache and then create a static class for getting the values in code easily and on demand. We made an admin page for clearing them out of the cache when needed, and the code recognizes that they have been cleared and reloads them into the cache on demand. This can be extended to deal with other languages and plugged into other models as needed, although I imagine you would only want to use it in the presentation layer. You can also key your custom exceptions and catch them in the presentation layer, where the key corresponds to the appropriate message. This gives a robust environment with lots of potential for growth. Hope this helps.
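The DataSet round trip described above can be sketched as follows. Writing the schema inline (XmlWriteMode.WriteSchema) keeps column types intact when the file is read back offline; OfflineSnapshot is a hypothetical name, and string readers/writers stand in for the deployed file.

```csharp
using System.Data;
using System.IO;

public static class OfflineSnapshot
{
    // Serialize a DataSet (filled from the database) to XML, schema included.
    public static string Save(DataSet data)
    {
        using var writer = new StringWriter();
        data.WriteXml(writer, XmlWriteMode.WriteSchema);
        return writer.ToString();
    }

    // Rebuild the DataSet on the laptop without a database connection.
    public static DataSet Load(string xml)
    {
        var data = new DataSet();
        using var reader = new StringReader(xml);
        data.ReadXml(reader); // the inline schema restores tables and column types
        return data;
    }
}
```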
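A minimal sketch of the static accessor with a clearable cache. The answer uses the ASP.NET cache; this sketch swaps in a ConcurrentDictionary so it is self-contained, and the loader delegate stands in for the key/value database query. SiteText, Configure, and the loader are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;

public static class SiteText
{
    private static readonly ConcurrentDictionary<string, string> Cache = new ConcurrentDictionary<string, string>();
    private static Func<string, string> _loader = key => throw new InvalidOperationException("No loader configured.");

    // Set the function that reads a value from the key/value table.
    public static void Configure(Func<string, string> loader) => _loader = loader;

    // Return the cached value, loading it on first use (or after Clear).
    public static string Get(string key) => Cache.GetOrAdd(key, _loader);

    // Called from the admin page; values are reloaded on the next Get.
    public static void Clear() => Cache.Clear();
}
```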
I did this in ASP.NET 2.0, so things might have changed since then, but the best way to do this is to store the strings in the database and then use an ASP.NET resource provider to load the values.
To your application it looks like it's loading from a resource file, but it is actually loading from the database, and you get all the nice compile-time benefits and built-in ASP.NET tooling for resources, and you can change your locale if you need to.
this article was the inspiration for the solution http://msdn.microsoft.com/en-us/library/aa905797.aspx
Related:
Storing Images in DB - Yea or Nay?
After reading the above question, it seems the preferred method for image storage with databases is to store only the filepath within the database. However, most of these answers seem to focus on web servers.
In my case, I'm developing a desktop application that will be used across multiple computers within an intranet. A dedicated server will host the database, containing information related to performing tests on various equipment.
Images need to be stored on the server in some way. Would storing the images in the database be the correct approach in this case, or even the only approach?
Pros:
Backup is limited to only the database.
No need to open up the server's file system to the network.
Single protocol for server information access.
Protected file access. (User can't go in and delete all the images)
Cons:
Performance issues in future if there's too many images.
Edit: As stated in the tags, the application is being written in C#/.NET. If writing the images to the file system is an option in this case, I could use some help understanding how this is done.
Edit 2: As elaborated some in the comments below, for now I'm assuming a MySQL database, although the FileStream capabilities of SQL Server 2008 could potentially change that.
Also in my case, images will be added often, and can be considered read-only after this point since they should never be changed, and will just be read out when needed. Images will likely be small (~70k each), and I'm also considering some other binary format storage on the server, files which are ~20k each which I can likely apply the same approach for storing and retrieving.
I'd suggest keeping those files on disk in the file system, rather than in the database. File system for files, databases for relational data, etc.
Deliver by Web Service
Consider delivering those images to your desktop app by hosting a web service/app on that DB machine. That app's only job is to serve images. Set up a web server on that machine with an ASP.NET application. Have an .ashx handler handle requests and stream the binary image. Something like this:
http://myserver/myapp/GetImage.ashx?CustomerID=123&ImageID=456
Security
If intranet security is an issue, this would be the point where you could ensure that the user is authenticated and authorized for read access to the image. Audit trails could be implemented here as well.
File System Security
Regarding security on those images, consider that NTFS gives you a lot of measures to ensure that only those who are authorized can read/delete/put files as required. The task then would be to define those roles and implement Windows security groups.
Future Needs
This approach allows you to securely consume those images from anywhere on the intranet. Perhaps this app would be migrated to a web application at some point? Perhaps a feature request comes from the customer where a web solution is appropriate?
This might sound like overkill rather than reading a blob from the database, but it's great from a security perspective. Consider your customers' and patients' expectations on privacy and security.
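```csharp
<%@ WebHandler Language="C#" Class="Handler" %>
using System;
using System.IO;
using System.Web;

public class Handler : IHttpHandler
{
    public void ProcessRequest(HttpContext context)
    {
        // Go to the DB and get the path on disk for this image ID.
        string filePath = GetImagePath(context.Request.QueryString["ImageID"]);
        // Read the file and send it back as the response body.
        byte[] imgBytes = File.ReadAllBytes(filePath);
        context.Response.ContentType = "image/jpeg"; // set to match the stored image format
        context.Response.BinaryWrite(imgBytes);
    }

    public bool IsReusable { get { return false; } }

    // Placeholder for your own data-access code.
    private static string GetImagePath(string imageId) { throw new NotImplementedException(); }
}
```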
I think the answer is that there is no right answer. As with most things in programming (and life), it DEPENDS.
Here are some Pros and Cons of storing in DB:
PROS
Easy backup, management and one stop shop for data in your application
Fewer dependencies in your app and fewer moving parts. KISS principle
Works fine on small files under 1GB.
Hey, it's a DB, so saves can be done inside transactions and rolled back if there are network problems
SharePoint and TFS store everything in the DB and work just fine. Even the big boys do it
Security can be easily controlled by the app and not involve file/folder permissions
Cons
Eats up db space
Potentially affects performance if not done right
Not such a great idea if always storing large files (>1GB) unless using Filestream in SQL Server 2k8
Requires you to implement a decent caching strategy (although you would probably want this anyways)
File system feels more natural than DB and easier for manually replacing/viewing files.
I guess when it comes to your situation, I would lean towards the simplicity of storing in the DB.
From an architecture perspective, you'll get the best performance by splitting the solution into two pieces: a database server, and an image server.
You would do this both in order to keep row sizes small and to separate your transactional environment from content. Relational databases in the vein of SQL Server and MySQL will support big BLOBs but aren't optimized for them.
Most people equate "image server" to "web server" because they work on web applications and therefore have a de facto image repository (a directory on a local disk). However, this does not have to be the case. Images can be served from any location over any protocol.
You mentioned a C#/.NET platform and an intranet. Can we assume a Windows environment, possibly Active Directory?
If so, a plain vanilla file server could be your image server. Set up a file share, set read/create (but not modify/delete) permissions on it for all users of this app, store the UNC path somewhere in the database (so you don't have to redeploy the app if you decide to relocate it), and have your client application generate a unique, relative path using something reliable like a Guid.
It's not as elegant as a web service (which is my preferred approach), nor quite as maintenance-free as the pure-database approach, but my impression of this topic is that you're on a tight budget with a short delivery deadline, and a Windows or NFS file server is cheaper, easier, and faster to set up and maintain (including backups) than a full-fledged web server, so it might be just what you're looking for here.
Most businesses already have a file server, so usually this won't require any new infrastructure whatsoever. But even if you don't, I've seen file servers run off old reconditioned workstations - it's not fancy, but in a low-traffic environment it gets the job done.
If you choose this approach, I would suggest some kind of directory structure on the file share to simplify backups, archiving, etc. For example:
\\ImageServer\MyAppRepository\yyyy-mm\{image-file-name-or-guid}.{ext}.
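Generating such a path on the client might look like the sketch below: a yyyy-MM folder plus a Guid file name, combined with a share root that would be read from the database. ImagePaths is a hypothetical helper; the relative part is what you would store in the image table.

```csharp
using System;
using System.Globalization;
using System.IO;

public static class ImagePaths
{
    // Relative path such as 2013-05\<32 hex chars>.jpg (the separator follows the OS).
    public static string NewRelativePath(DateTime timestamp, string extension) =>
        Path.Combine(timestamp.ToString("yyyy-MM", CultureInfo.InvariantCulture),
                     Guid.NewGuid().ToString("N") + extension);

    // Combine with the share root, e.g. @"\\ImageServer\MyAppRepository".
    public static string ToFullPath(string shareRoot, string relativePath) =>
        Path.Combine(shareRoot, relativePath);
}
```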
Hope that helps.
How many images are we talking? Are they unique/updated frequently? If not can you package the images with the client that you are going distribute to multiple computers?
Personally, I would avoid storing images in the database, and instead as you said store the file paths.
If you have read through all of the other similar questions (This, this, and this) but are still asking if this is a good idea, then maybe your problem is different enough that this would be a good idea.
My company developed a Windows forms c# application that stores images in a database and it worked out pretty well. We have been actively using it since 2003 and have about 150 gigs of data in the system.
First, let me say that this is NOT the optimal performance architecture. We have had some problems with keeping the database statistics up to date and keeping the indexes tuned correctly. We basically have to re-index the system monthly. You need to be aware that the built-in optimization system of most RDBMS servers is not set up for large collections of binary objects.
The reason we chose to put the images in the database is because of database level replication. Our system is spread across seven offices in five states and I needed to sync the data to each site. So, I pinned up a VPN between each site and our corporate office and set up SQL merge replication on the database. In this way, I can sync the data and images at the same time with only one channel open between offices.
So, I would say that images in the database is not the optimal solution in most cases but it worked out for our requirements.
I don't think it matters where the images are stored. Pick the simplest approach that will work. But you should have an architecture where you can change the approach if it proves to be the wrong one.
To accomplish this, I would put the data and the image storage both behind a web services interface. Pick a technology - doesn't matter. All access to the data (and images) would be the same way - through the web service.
By doing this, you have decoupled where the data is stored from the desktop application. The desktop app doesn't care. All it knows is that the server at a certain address can get it the data.
Then store the data and the images wherever you want. Choose the simplest thing for you. If you end up having issues, then (and only then) should you add additional complexity in order to solve the problem. The good news is that the additional complexity and work shouldn't affect the desktop applications at all. You can make the changes on the server without having to deploy a new version of the desktop applications.
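The decoupling described above can be sketched as an interface the desktop app codes against. A web-service client would implement it on the desktop side, and the server could start with a folder-backed implementation like the one below and later switch to a database without changing callers. IImageStore and FolderImageStore are hypothetical names.

```csharp
using System;
using System.IO;

public interface IImageStore
{
    string Save(byte[] image, string extension); // returns an ID to keep with the record
    byte[] Load(string id);
}

public sealed class FolderImageStore : IImageStore
{
    private readonly string _root;

    public FolderImageStore(string root)
    {
        _root = root;
        Directory.CreateDirectory(root);
    }

    public string Save(byte[] image, string extension)
    {
        string id = Guid.NewGuid().ToString("N") + extension;
        File.WriteAllBytes(Path.Combine(_root, id), image);
        return id;
    }

    // Path.GetFileName stops an ID like "..\secret" from escaping the root folder.
    public byte[] Load(string id) => File.ReadAllBytes(Path.Combine(_root, Path.GetFileName(id)));
}
```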
If you're looking for alternatives, one of my favorites is a ten-line HTTP POST file upload handler (PHP, .NET, Java, etc.) + one webserver. When the script validates max file size, and possibly extracts the width & height, it inserts a row into the database. Retrieval need not go through the script. Standard file hosting will work. This would require you to open port 80. You needn't complicate this with SOAP or anything. A regular upload handler would do the job.
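The checks such an upload handler runs before inserting the database row could look like this sketch: enforce a maximum size and read the width and height from the PNG header. It handles PNG only, as an example; other formats would need their own parsing or an imaging library. UploadCheck is a hypothetical name.

```csharp
using System;

public static class UploadCheck
{
    private static readonly byte[] PngSignature = { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };

    public static (int Width, int Height) ValidatePng(byte[] data, int maxBytes)
    {
        if (data.Length > maxBytes)
            throw new ArgumentException($"File is larger than {maxBytes} bytes.");
        if (!HasPngSignature(data))
            throw new ArgumentException("Not a PNG file.");

        // After the 8-byte signature comes the IHDR chunk: a 4-byte length, the
        // type "IHDR", then width and height as big-endian 32-bit integers.
        return (ReadInt32BigEndian(data, 16), ReadInt32BigEndian(data, 20));
    }

    private static bool HasPngSignature(byte[] data)
    {
        if (data.Length < 24) return false; // too short for the signature plus IHDR size fields
        for (int i = 0; i < PngSignature.Length; i++)
            if (data[i] != PngSignature[i]) return false;
        return true;
    }

    private static int ReadInt32BigEndian(byte[] b, int offset) =>
        (b[offset] << 24) | (b[offset + 1] << 16) | (b[offset + 2] << 8) | b[offset + 3];
}
```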
Then there's WebDAV, along the same lines. Of course, with this method, you'd have to monitor the filesystem and adjust the database accordingly. You could use a polling service or hook into file system events. Actually, you could also inject an ISAPI filter or Apache handler to perform the database updates.
You could use FTP. Add an extension to ProFTPd that will update the database and keep everything in sync.
Lots of ways to avoid putting image data into tables.
If you opt for the database solution, just be sure to segment your BLOBs into separate tables. Separate table spaces / devices / partitions, if you can. Or, use Oracle and ignore everything I've said.
Use Amazon S3 storage for your images
Just store the GUID or other file name in the DB
Amazon S3 is simple, fast, cheap, secure, and so on
It scales fine, and optionally provides CDN like edge services directly from S3
Storing images in the DB always seems to turn into a nightmare over time
It seems to me that what you want to do is something like what Infovark does.
They use Firebird for this, and I'll give you a link on Firebird and storing images.
You should try SQL Server 2008: it offers FILESTREAM storage, which automatically stores BLOB data in the file system.