Building an aggregation/summary reporting database against Oracle, SQL Server and MongoDB - C#

This is a design question, since I've not done anything similar in the past, and it's a good challenge. I have a server which supports Oracle, SQL Server and MongoDB. You can select which one to use at startup. Essentially each server stores XML packets, which are split down into their component elements.
I need to build a reporting database which provides aggregation and summary data for the dashboard's reports, but the problem (opportunity) is MongoDB. I could easily use SQL Server Reporting Services to build the report DB, same with Oracle, or I could use something like Crystal, which works against both, or even create a DB and set a bundle of triggers on each table, with some PL/SQL logic on Oracle or T-SQL on SQL Server, to create the reporting DB on the fly. And that would take care of reporting. But then there is MongoDB: little or no reporting infrastructure, certainly not outside BIRT or Jaspersoft (Java). I'm using C#.
I was thinking of having a C# server component which intercepts incoming XML packets, extracts the appropriate element field data, and writes it into a reporting DB, perhaps something like SQLite (which may be too small). If it was running on SQL Server or Oracle then I would use that DB instance to support the reporting DB.
On any database, I'm really only supporting up to 6 months of data. The data will be classified as 24 hours, 1 week, 1 month, 3 months, 6 months, with a progressive archive onto a compression and backup DB.
But this is where it gets hazy. Take an example, using SQLite as the reporting DB and MongoDB as the XML database: if a user wants to drill down, would I have to provide some kind of dynamic update that pulls the additional reporting info from MongoDB, or could it all be done at the server component stage, when the data is being written into SQLite?
Or is this all nonsense?
Any ideas or thoughts greatly appreciated.
Bob.

In terms of getting data from MongoDB for reporting, you can write your own code on top of:
1) MongoDB queries
2) the aggregation framework
3) in-database map/reduce, or
4) the Hadoop connector.
You can use the C# driver for it. Apart from that, as you mentioned, there is the Jaspersoft integration, or Pentaho (http://wiki.pentaho.com/display/BAD/Create+a+Report+with+MongoDB)
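For example, here is a rough sketch of option 2 via the C# driver (MongoDB.Driver 2.x). The database, collection and field names are my own assumptions, not from your system: counting packets per element over the last 24 hours.

```csharp
using System;
using MongoDB.Bson;
using MongoDB.Driver;

// Hypothetical names throughout: a "packets" database with an "elements"
// collection, where each document has elementName and receivedAt fields.
var client = new MongoClient("mongodb://localhost:27017");
var collection = client.GetDatabase("packets")
                       .GetCollection<BsonDocument>("elements");

var cutoff = DateTime.UtcNow.AddHours(-24);

PipelineDefinition<BsonDocument, BsonDocument> pipeline = new[]
{
    // $match first so the rest of the pipeline only sees the last 24 hours.
    new BsonDocument("$match",
        new BsonDocument("receivedAt", new BsonDocument("$gte", cutoff))),
    // Group by element name and count occurrences.
    new BsonDocument("$group", new BsonDocument
    {
        { "_id", "$elementName" },
        { "count", new BsonDocument("$sum", 1) }
    })
};

foreach (var doc in collection.Aggregate(pipeline).ToList())
    Console.WriteLine(doc);
```

The same pipeline shape extends naturally to your 24h/1w/1m/3m/6m buckets by varying the cutoff.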

I think Microsoft's BizTalk Server best suits your needs. You can use a BizTalk pipeline component to process the incoming messages (you can do simple property promotions, transformations, etc.), and BizTalk orchestrations for the actual processing of the data. For aggregation and reporting you can use BizTalk's Business Activity Monitoring (BAM). It supports real-time aggregation of data and puts the results into your database, and it has a BAM portal from which you can see all the stored and aggregated data. In case you want your own style of reports, you can use Microsoft's Report Builder 3 and deploy your reports using SSRS.

Have a look at Nucleon BI Studio. You can get a fully-featured free 30-day trial, and the full version is $250. I've used it in the past; it's not bad, and a fraction of what it would cost to develop yourself.
I am not associated with the company in any way.

Perhaps I don't understand your question entirely, but I will give it a shot. First, your question, summarized:
You want to generate reports based on different types of datastores: SQL this, SQL that, or a document database. The current options you feel you have are the built-in reporting of the various products.
You have various points available for getting the data: you can intercept the data as it comes into the system, or derive the information from your databases. Whether a dynamic report with drill-down is possible really depends on the type of reporting tool you want to use. You will simply need to build a facade that hides the datastore, either by intercepting the packets and storing them in a database of your choice, or by actually reading them from your chosen datastore through that same abstraction/facade. You can even think of a hybrid solution, where you initialize from the datastore (such as Mongo) when your reporting component starts up, and then update dynamically based on incoming packets. See the sketch below.
It all depends on where you want to go.
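For illustration, a minimal sketch of such a facade in C# (every type and member name here is made up):

```csharp
using System;
using System.Collections.Generic;

// The reporting layer only ever sees this interface; SQL Server, Oracle
// and MongoDB each get their own implementation behind it.
public interface IReportingStore
{
    // Called by the server component as XML packets are intercepted.
    void Write(ReportRecord record);

    // Backs the dashboard queries and drill-downs.
    IReadOnlyList<ReportRecord> Query(DateTime from, DateTime to, string elementName);
}

public sealed class ReportRecord
{
    public DateTime Timestamp { get; set; }
    public string ElementName { get; set; }
    public string Value { get; set; }
}
```

The hybrid option then just means one implementation seeds itself from Mongo at startup and applies incoming packets incrementally afterwards.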

Related

Tools or database tips to show big data information?

I'm having a problem with an application built in .NET Core (C#) and SQL Server 2017, with AngularJS 1.x on the frontend.
The problem is the following: we have some very big tables, with millions of records. Even a simple SELECT COUNT on one of these tables takes too long. We execute the query directly from the code without going through any ORM library, but even without an ORM the queries take too long.
I was asking myself whether there is a better way to query these giant tables (external tools, another type of database, etc.), since in many cases we need to show reports and statistics graphs.
One possible strategy is table partitioning, using a partition function that matches your business needs. With this you can split the table's data across many filegroups, reducing the number of rows any one query has to scan (a sketch follows below).
See this link for detailed info.
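As a hedged sketch of what that can look like (all object names and the connection string are assumptions), creating monthly partitions from C#:

```csharp
using Microsoft.Data.SqlClient;

var connectionString = "Server=.;Database=Reports;Integrated Security=true;TrustServerCertificate=true";

// One DDL statement per command; pfMonthly, psMonthly and dbo.BigEvents
// are hypothetical names.
var statements = new[]
{
    // Boundary values split rows into monthly ranges.
    @"CREATE PARTITION FUNCTION pfMonthly (datetime2)
      AS RANGE RIGHT FOR VALUES ('2024-01-01', '2024-02-01', '2024-03-01')",
    // Map every partition to the PRIMARY filegroup for simplicity.
    @"CREATE PARTITION SCHEME psMonthly
      AS PARTITION pfMonthly ALL TO ([PRIMARY])",
    // The table is created on the scheme, partitioned by EventTime.
    @"CREATE TABLE dbo.BigEvents (
          EventTime datetime2 NOT NULL,
          Payload   nvarchar(max)
      ) ON psMonthly (EventTime)"
};

using var conn = new SqlConnection(connectionString);
conn.Open();
foreach (var sql in statements)
{
    using var cmd = new SqlCommand(sql, conn);
    cmd.ExecuteNonQuery();
}
```

Queries that filter on EventTime can then benefit from partition elimination, touching only the relevant month's data.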
OLTP databases like SQL Server are not designed for handling OLAP (aggregate) queries in real time over large datasets. Typical workarounds are:
limit the number of aggregated rows with extra WHERE conditions, and add indexes for those columns. This is usually possible with historic data like orders, event logs, etc.: show reports only for the last month or year (see the sketch after this list)
use indexed (materialized) views for reports that don't need much detail
configure a read-only replica of SQL Server, possibly add columnstore indexes, and use it for the OLAP queries
replicate your SQL Server data to a specialized (possibly distributed) analytical database that can handle OLAP queries in real time (Amazon Redshift, Vertica, MongoDB, ElasticSearch, Yandex ClickHouse, etc.)
If reports can be configured by end users, ensure that your ROLAP-like engine produces efficient SQL GROUP BY queries.
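A small illustration of the first workaround (table, column, and connection string are assumptions): keep the aggregate inside a recent window that an index on the date column can serve.

```csharp
using System;
using Microsoft.Data.SqlClient;

var connectionString = "Server=.;Database=Sales;Integrated Security=true;TrustServerCertificate=true";

// Counting only the last 30 days lets the engine seek on an index over
// OrderDate instead of scanning millions of historic rows.
const string sql = "SELECT COUNT(*) FROM dbo.Orders WHERE OrderDate >= @since";

using var conn = new SqlConnection(connectionString);
using var cmd = new SqlCommand(sql, conn);
cmd.Parameters.AddWithValue("@since", DateTime.UtcNow.AddDays(-30));
conn.Open();
var recentCount = (int)cmd.ExecuteScalar();
Console.WriteLine(recentCount);
```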

C# service to copy data between two SQL Servers

I have two SQL Server instances installed on my computer: SQL Server 2008 Express, and the SQL Server 2008 that comes with the specific software we are using.
I need to make a service that runs all the time and, at a specific time, copies the records that don't yet exist in SQL 2008 Express over from SQL 2008. Can you suggest a way of doing this?
Currently the best thing I've got is making a local copy in an Excel file, but that would result in 365 Excel files per year, which I don't think is a good idea :)
P.S. Sorry if my English is bad :)
You don't have to hand-craft your own software for that. There are 3rd-party tools like OpenDBDiff or RedGate dbdiff to do that. These tools generate the differential SQL that you can apply to your target database.
I'm confused when you mention Excel. What does Excel have to do with moving data from one SQL database to another?
The short answer is, if you need a C# service, then write a C# service that copies the data directly from one database to the other. The problem that you are trying to solve is not very clear.
Having said all that, and with my limited understanding of the problem, it sounds like what you need is a SQL job that is scheduled to run once a day that copies the data from one server to the other. Since it sounds like they are on separate instances, you'll just need to set up a linked server on either the source or destination database and either push or pull the data into the correct table(s).
EDIT:
Ok, so if a windows service is a requirement, that is perfectly acceptable. But, like I mentioned, you should forget about Excel. You wouldn't want to go from SQL->Excel->SQL if you have no other reason for the data to exist in Excel.
Here is some information on creating a windows service:
Easiest language for creating a Windows service
Here is a simple tutorial on accessing SQL in C#: http://www.codeproject.com/Articles/4416/Beginners-guide-to-accessing-SQL-Server-through-C
If you want a more formal solution (read: data access layer), I'd point you toward Entity Framework. The complexity of the project will probably be the driving factor on whether you just want to do SQL statements in your code vs. going with a full blown DAL.
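If you do end up writing the copy step yourself, here's a hedged sketch (instance, database, table and column names are all assumptions): read the target's existing keys, then bulk-insert only the source rows that are missing.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using Microsoft.Data.SqlClient;

var sourceCs = @"Server=.\SOFTWARE;Database=AppDb;Integrated Security=true;TrustServerCertificate=true";
var targetCs = @"Server=.\SQLEXPRESS;Database=CopyDb;Integrated Security=true;TrustServerCertificate=true";

using var src = new SqlConnection(sourceCs);
using var dst = new SqlConnection(targetCs);
src.Open();
dst.Open();

// 1. Collect the primary keys already present in the target.
var existing = new HashSet<int>();
using (var cmd = new SqlCommand("SELECT Id FROM dbo.Records", dst))
using (var reader = cmd.ExecuteReader())
    while (reader.Read())
        existing.Add(reader.GetInt32(0));

// 2. Pull the source rows into a DataTable.
var table = new DataTable();
using (var adapter = new SqlDataAdapter("SELECT Id, Payload, CreatedAt FROM dbo.Records", src))
    adapter.Fill(table);

// 3. Bulk-insert only the rows the target doesn't have yet.
//    (If Id is an identity column you'd also need SqlBulkCopyOptions.KeepIdentity.)
var missing = table.AsEnumerable()
                   .Where(r => !existing.Contains(r.Field<int>("Id")))
                   .ToList();
if (missing.Count > 0)
{
    using var bulk = new SqlBulkCopy(dst) { DestinationTableName = "dbo.Records" };
    bulk.WriteToServer(missing.CopyToDataTable());
}
```

Hosting that in a Windows service is then just a timer that fires the copy once a day.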

RavenDB - synchronize with SQL Server DB

I was thinking about utilizing RavenDB for some of the look-up scenarios I have in a high-throughput application. This would replace all of the look-up calls I currently make to the DB to get things like site location, etc. I'm really looking at a couple of options (including .NET caching). I know that you can replicate indexes from RavenDB to SQL Server, but I'm wondering if anyone has done the reverse, where they sync RavenDB with SQL Server?
Any suggestions / comments would be appreciated.
--S
I've done a similar scenario, where data needed to be transferred in batch from a SQL Server system nightly into our RavenDB instance.
I couldn't find an off-the-shelf tool to do what I wanted, as you should typically optimise the model you give RavenDB differently from the SQL Server one.
I wrote a custom console app that put the data into my RavenDB instance.
For example my console app:
Compacted several relationships into one document
Dealt with the different datatypes
TL;DR: I wrote my own console app, as I couldn't find a generic product that could do it.
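For flavour, a hedged sketch of the core of such a console app, written against the modern RavenDB client (4.x+); every name in it is an assumption, and the original app used an older client:

```csharp
using System.Collections.Generic;
using Microsoft.Data.SqlClient;
using Raven.Client.Documents;

// Hypothetical document type: one customer document with its orders
// compacted in, instead of a separate Orders table.
public class CustomerDoc
{
    public string Id { get; set; }
    public string Name { get; set; }
    public List<string> Orders { get; set; } = new List<string>();
}

public static class Program
{
    public static void Main()
    {
        using var store = new DocumentStore
        {
            Urls = new[] { "http://localhost:8080" }, // assumed local server
            Database = "Lookups"
        };
        store.Initialize();

        using var sql = new SqlConnection(
            "Server=.;Database=Crm;Integrated Security=true;TrustServerCertificate=true");
        sql.Open();

        using var session = store.OpenSession();

        // The JOIN flattens the relationship; we fold it back into one document.
        const string query = @"
            SELECT c.Id, c.Name, o.Reference
            FROM dbo.Customers c
            LEFT JOIN dbo.Orders o ON o.CustomerId = c.Id
            ORDER BY c.Id";

        using var cmd = new SqlCommand(query, sql);
        using var reader = cmd.ExecuteReader();
        CustomerDoc current = null;
        while (reader.Read())
        {
            var id = "customers/" + reader.GetInt32(0);
            if (current == null || current.Id != id)
            {
                current = new CustomerDoc { Id = id, Name = reader.GetString(1) };
                session.Store(current);
            }
            if (!reader.IsDBNull(2))
                current.Orders.Add(reader.GetString(2));
        }
        session.SaveChanges(); // one round-trip writes every document
    }
}
```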
So far the only available solution is to write your own sync process.
I was looking for ways to improve my search scenarios using RavenDB, where RavenDB would be filled from my SQL Server relational database.
I think there should be a better way; however, the only one I can think of right now is an ETL process that keeps updating the NoSQL version of your structured data.

.NET Data Storage - Database vs single file

I have a C# application that allows one user to enter information about customers and job sites. The information is very basic.
Customer: Name, number, address, email, associated job site.
Job Site: Name, location.
Here are my specs I need for this program.
No limit on amount of data entered.
Single user per application. No concurrent activity or multiple users.
Allow user entries/data to be exported to an external file that can be easily shared between applications/users.
Allows for user queries to display customers based on different combinations of customer information/job site information.
The data will never be viewed or manipulated outside of the application.
The program will be running almost always, minimized to the task bar.
Startup time is not very important, however I would like the queries to be considerably fast.
This all seems to point me towards a database, but a very lightweight one. However I also need it to have no limitations as far as data storage. If you agree I should use a database, please let me know what would be best suited for my needs. If you don't think I should use a database, please make some other suggestions on what you think would be best.
My suggestion would be to use SQLite. You can find it here: http://sqlite.org/. And you can find the C# wrapper version here: http://sqlite.phxsoftware.com/
SQLite is very lightweight and has some pretty powerful stuff for such a lightweight engine. Another option you can look into is Microsoft Access.
You're asking the wrong question again :)
The better question is "how do I build an application that lets me change the data storage implementation?"
If you apply the repository pattern and properly interface it, you can build interchangeable persistence layers. So you could start with one implementation and change it as needed without having to re-engineer the business or application layers.
Once you have a repository interface you could try implementations with a lot of different approaches:
Flat File - You could persist the data as XML, and provided that it's not a lot of data you could store the full contents in-memory (just read the file at startup, write the file at shutdown). With in-memory XML you can get very high throughput without concern for database indexes, etc.
Distributable DB - SQLite or SQL Compact work great; they offer many DB benefits, and require no installation
Local DB - SQL Express is a good middle-ground between a lightweight and full-featured DB. Access, when used carefully, can suffice. The main benefit is that it's included with MS Office (although not installed by default), and some IT groups are more comfortable having Access installed on machines than SQL Express.
Full DB - MySQL, SQL Server, PostgreSQL, et al.
Given your specific requirements I would advise you towards an XML-based flat file--with the only condition being that you are OK with the memory-usage of the application directly correlating to the size of the file (since your data is text, even with the weight of XML, this would take a lot of entries to become very large).
Here's the pros/cons--listed by your requirements:
Cons
No limit on amount of data entered.
using in-memory XML would mean your application would not scale. It could easily handle a 10MB data-file, 100MB shouldn't be an issue (unless your system is low on RAM), above that you have to seriously question "can I afford this much memory?".
Pros
Single user per application. No concurrent activity or multiple users.
XML can be read into memory and held by the process (AppDomain, really). It's perfectly suited for single-user scenarios where concurrency is a very narrow concern.
Allow user entries/data to be exported to an external file that can be easily shared between applications/users.
XML is perfect for exporting, and also easy to import to Excel, databases, etc...
Allows for user queries to display customers based on different combinations of customer information/job site information.
Linq-to-XML is your friend :D
The data will never be viewed or manipulated outside of the application.
...then holding it entirely in-memory doesn't cause any issues
The program will be running almost always, minimized to the task bar.
so loading the XML at startup and writing at shutdown will be acceptable (if the file is very large it could take a while)
Startup time is not very important, however I would like the queries to be considerably fast
Reading the XML would be relatively slow at startup; but when it's loaded in-memory it will be hard to beat. Any given DB will require that the DB engine be started, that interop/cross-process/cross-network calls be made, that the results be loaded from disk (if not cached by the engine), etc...
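Since a couple of the points above lean on LINQ to XML, here's a tiny sketch of what those in-memory queries look like (element and file names are made up):

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Load once at startup; all queries afterwards run purely in memory.
var doc = XDocument.Load("customers.xml");

// "Customers at a given job site", as one example of a combined query.
var names =
    from c in doc.Root.Elements("customer")
    where (string)c.Element("jobSite") == "North Yard"
    orderby (string)c.Element("name")
    select (string)c.Element("name");

foreach (var name in names)
    Console.WriteLine(name);

doc.Save("customers.xml"); // write back at shutdown
```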
It sounds to me like a database is 100% what you need. It offers both the data storage, data retrieval (including queries) and the ability to export data to a standard format (either direct from the database, or through your application.)
For a light database, I suggest SQLite (pronounced 'SQL Lite' ;) ). You can google for tutorials on how to set it up, and then how to interface with it via your C# code. I also found a reference to this C# wrapper for SQLite, which may be able to do much of the work for you!
How about SQLite? It sounds like it is a good fit for your application.
You can use System.Data.SQLite as the .NET wrapper.
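A short sketch with System.Data.SQLite (file, table and column names are assumptions):

```csharp
using System;
using System.Data.SQLite;

// SQLite creates the file on first open if it doesn't exist.
using var conn = new SQLiteConnection("Data Source=customers.db");
conn.Open();

using (var create = new SQLiteCommand(
    @"CREATE TABLE IF NOT EXISTS Customers (
          Id      INTEGER PRIMARY KEY,
          Name    TEXT NOT NULL,
          JobSite TEXT)", conn))
    create.ExecuteNonQuery();

using (var insert = new SQLiteCommand(
    "INSERT INTO Customers (Name, JobSite) VALUES (@name, @site)", conn))
{
    insert.Parameters.AddWithValue("@name", "Acme Ltd");
    insert.Parameters.AddWithValue("@site", "North Yard");
    insert.ExecuteNonQuery();
}

using var query = new SQLiteCommand(
    "SELECT Name FROM Customers WHERE JobSite = @site", conn);
query.Parameters.AddWithValue("@site", "North Yard");
using var reader = query.ExecuteReader();
while (reader.Read())
    Console.WriteLine(reader.GetString(0));
```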
You can get SQL Server Express for free. I would say the question is not so much why should you use a database, more why shouldn't you? This type of problem is exactly what databases are for, and SQL Server is a very powerful and widely used database, so if you are going to go for some other solution you need to provide a good reason why you wouldn't go with a database.
A database would be a good fit. SQLite is good as others have mentioned.
You could also use a local instance of SQL Server Express to take advantage of improved integration with other pieces of the Microsoft development stack (since you mention C#).
A third option is a document database like Raven which may fit from the sounds of your data.
edit
A fourth option would be to try Lightswitch when the beta comes out in a few days. (8-23-2010)
/edit
There is always going to be a limitation on data storage (the empty space of the hard disk). According to Wikipedia, SQL Server Express 2008 R2 is limited to 10 GB per database.

Best means to store data locally when offline

I am in the midst of writing a small program (more to experiment with VS 2010 than anything else).
Despite being an experiment it has some practical use for our local athletics club.
My thought was to access the DB (currently online) to download the current members and store them locally on a laptop (this is an MS SQL table, used to power the club's website).
I'd take the laptop to the event (yes, there ARE places that don't have internet coverage), add members to that day's race (also rows in a SQL table, though no changes would be made to this one), and record results (new records in a 3rd table).
Once home, showered, and within internet access again, upload/edit the tables as per the race results/member changes etc.
So I was thinking I'd do something like writing XML files locally with the data, including a field to indicate changes, etc.?
If anyone can point me in a direction I would appreciate it... hell, if anyone could tell me whether this has a name, I'd appreciate it.
Essentially what you need is, in addition to your remote data store, a local data store on your desktop. You could then write your code by hand to sync the data stores when you go offline / online, or you could use the Microsoft Sync framework to handle it for you.
I've personally used the Sync framework on a number of projects and once you get used to the conventions, it's pretty easy to use.
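If you do roll the sync by hand, here's a minimal sketch of the "push dirty rows" step (every table, column and connection-string name is an assumption):

```csharp
using Microsoft.Data.SqlClient;

// Local rows carry an IsDirty flag set whenever they're added or edited
// offline; back online, dirty rows are merged to the server and the flag
// is cleared.
var localCs  = @"Server=.\SQLEXPRESS;Database=ClubLocal;Integrated Security=true;TrustServerCertificate=true";
var remoteCs = "Server=club-host;Database=Club;Integrated Security=true;TrustServerCertificate=true";

using var local = new SqlConnection(localCs);
using var remote = new SqlConnection(remoteCs);
local.Open();
remote.Open();

using (var read = new SqlCommand(
    "SELECT Id, MemberName, FinishSeconds FROM dbo.Results WHERE IsDirty = 1", local))
using (var reader = read.ExecuteReader())
{
    while (reader.Read())
    {
        // Upsert each changed row into the server copy.
        using var merge = new SqlCommand(@"
            MERGE dbo.Results AS t
            USING (SELECT @Id AS Id) AS s ON t.Id = s.Id
            WHEN MATCHED THEN UPDATE SET MemberName = @Name, FinishSeconds = @Secs
            WHEN NOT MATCHED THEN INSERT (Id, MemberName, FinishSeconds)
                 VALUES (@Id, @Name, @Secs);", remote);
        merge.Parameters.AddWithValue("@Id", reader.GetInt32(0));
        merge.Parameters.AddWithValue("@Name", reader.GetString(1));
        merge.Parameters.AddWithValue("@Secs", reader.GetInt32(2));
        merge.ExecuteNonQuery();
    }
}

// Everything pushed; clear the flags locally.
using var clear = new SqlCommand("UPDATE dbo.Results SET IsDirty = 0", local);
clear.ExecuteNonQuery();
```

A real sync also needs conflict handling and a pull direction, which is exactly the plumbing the Sync Framework gives you for free.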
If a local storage format is what you're after, SQLite is one option. You can copy your tables from the server to your local SQLite DB.
You could also save your data to files, but XML is a horrible format for doing this. You'll probably want to use YAML or JSON instead.
You may want to take a look at SQL Server Compact -- it provides some decent capabilities with synchronizing back with the mothership SQL server.
If you're using MS SQL Server for production, and you only need to work offline on your personal computer, you could install MS SQL Server Express locally. The advantage here over using a different local datastore is that you can reuse your schema, stored procedures, etc. essentially only needing to change the connection string to your application (which you could run locally too through Visual Studio). You would have to write code to manually sync your online and offline db instances, but since it's a small application, it may be reasonable to just copy the entire database from production to local and then from local to production when you get home (assuming you're the only one updating the db, and wouldn't be potentially wiping out any new records entered in production while you were at the event).
Google Gears (http://gears.google.com/) is intended for web apps (I couldn't quite tell from your description whether yours is one).
