Database daily counter in MySQL - c#

I want to create counters in a MySQL database that store the number of views per product on a specific day, month and year.
The problem is that I have around 2000 products. I thought about using the following schema:
id (BIGINT)
year (INT[4])
month (TINYINT)
day (TINYINT)
product_id (INT)
pageviews (BIGINT)
The problem with that solution is that if, in the worst-case scenario, each product is viewed every day, I will have 2000 rows in my database each day. Multiply that by 30 days and I will have 60,000 rows each month.
I wanted to know if there is a better way to implement this. I thought the daily data could be kept only in memory, as an Application variable (I am developing in .NET) holding an ArrayList. If I choose that direction I will have fewer rows, about 2000 each month.
I really want to keep the cumulative daily page views. By the way, page views are just for illustration; I will store different data, but that data will be updated very frequently.
If I use the daily column, it will be updated very often, almost every 2-5 seconds. I intend to update MySQL asynchronously by calling an ASP.NET webservice from JavaScript, passing the product_id and increasing the counter by 1, so the application does not have to wait for the update to complete.
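For reference, the increment I have in mind is a single upsert, roughly like this sketch (assuming MySQL Connector/NET, a product_pageviews table matching the schema above, and a UNIQUE key on product_id/year/month/day):
using MySql.Data.MySqlClient;

// Called by the ASP.NET webservice; adds 1 to today's counter for one product.
public static void IncrementPageview(string connectionString, int productId)
{
    const string sql = @"
        INSERT INTO product_pageviews (product_id, year, month, day, pageviews)
        VALUES (@productId, YEAR(CURDATE()), MONTH(CURDATE()), DAY(CURDATE()), 1)
        ON DUPLICATE KEY UPDATE pageviews = pageviews + 1;";

    using (var conn = new MySqlConnection(connectionString))
    using (var cmd = new MySqlCommand(sql, conn))
    {
        cmd.Parameters.AddWithValue("@productId", productId);
        conn.Open();
        cmd.ExecuteNonQuery();
    }
}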
I also want to estimate the table size. If I'm doing it right:
BIGINT = 8 bytes
DATETIME = 10 bytes (if I decide to go with a datetime column instead of year/month/day)
INT = 4 bytes
8 (id) + 10 (datetime) + 4 (product_id) + 8 (pageviews) = 30 bytes per row
30 bytes * 2000 = 60,000 bytes
60KB (approx.) * 30 days = 1,800KB (about 1.8MB) per month
1.8MB per month * 12 months = about 21.6MB per year
Did I get it right?

OK, I'm not that great at database design, but I know this much...
You should ask yourself how long you want to keep the daily hits on these 2K products.
If you only want to hold them for a week (or for a month), you can calculate how much space that will take.
If you want to keep them for, say, a month, you could create a new table each month, move the daily hits into it and sum them up, so you'll have only 2K rows in each table.
Or you could use a single "warehouse" table that holds the hits per month: every month the daily data is summed up and copied into that warehouse table.
If you want these hits to be kept indefinitely then, as far as I know, that is the way to do it (again, not a master of database design).
This is also what you should do if you only want to keep the hits for up to a month.
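To show what I mean by summing up into the warehouse table, here is a rough sketch (table and column names are just examples, following the daily schema in your question):
using MySql.Data.MySqlClient;

// Run once a month: sum the daily rows into the monthly warehouse table.
public static void RollUpMonth(string connectionString, int year, int month)
{
    const string sql = @"
        INSERT INTO pageviews_monthly (product_id, year, month, pageviews)
        SELECT product_id, year, month, SUM(pageviews)
        FROM product_pageviews
        WHERE year = @year AND month = @month
        GROUP BY product_id, year, month;";

    using (var conn = new MySqlConnection(connectionString))
    using (var cmd = new MySqlCommand(sql, conn))
    {
        cmd.Parameters.AddWithValue("@year", year);
        cmd.Parameters.AddWithValue("@month", month);
        conn.Open();
        cmd.ExecuteNonQuery();
    }
}
After that you can delete (or archive) the daily rows for that month, so the daily table stays small.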
Hope I helped you,
Sagi.

Related

SQL database structure for time-based sensor data

So here is the deal,
I will be dealing with 1-second data that may accumulate for up to a month, with up to 40 sensors (columns). The data arrives in exactly one-second increments, and I need to be able to quickly run logic over these sensor values. So, doing the math:
30 days * 24 hours/day * 60 min/hour * 60 sec/min = 2,592,000, i.e. roughly 2.5 million rows
2.5 million rows * 40 columns ≈ 100 million data points
Most of the data is of type double, but I will have booleans, ints, and strings as well. I am entirely new to this, so all my ideas on how to handle it are entirely my own; they may be absurd and I may be missing the obvious solution. Here are some options I have considered:
1) One massive table - I am concerned about the performance of this option.
2) Tables split by date, reducing an individual table to 86,400 rows.
3) Tables split by hour, dealing with only 3,600 rows per table; however, at that point I start to have an excessive number of tables.
Let me give a little more detail on the nature of the logic: it is entirely sequential, meaning I will start at the first row and go all the way to the last, essentially making repeated passes through the data with a number of different algorithms to produce my desired results. I am running SQL on Azure. My natural inclination is to move fast and make a lot of mistakes, but in this case I think some experience and advice on how to set up this database will pay off. So, any suggestions?
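For context, each pass looks roughly like this, a sketch assuming one wide table on SQL Azure with an index on the timestamp column (all names here are placeholders):
using System;
using System.Data.SqlClient;

// One sequential pass: stream rows in timestamp order without loading them all.
public static void RunSequentialPass(string connectionString)
{
    const string sql = @"
        SELECT ts, sensor01, sensor02 /* ... remaining sensor columns ... */
        FROM sensor_readings
        ORDER BY ts;";

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(sql, conn))
    {
        cmd.CommandTimeout = 0; // a month of rows takes a while to stream
        conn.Open();
        using (var reader = cmd.ExecuteReader())
        {
            while (reader.Read())
            {
                DateTime ts = reader.GetDateTime(0);
                double sensor01 = reader.GetDouble(1);
                // feed the values into whichever algorithm this pass runs
            }
        }
    }
}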

Approach to storing and retrieving time when having multiple systems talking to each other

Consider this scenario with two applications whose data is synced:
Application 1 | Application 2
Data from application1 is inserted/updated into application2 based on the last modified date of the records.
How can we make sure that the last modified date is not dependent on time zones?
I would use a rowversion column in both applications. To decide what to sync you can use a query or a MERGE statement, and in the WHERE clause compare the rows with table1.RowVersion > table2.RowVersion (table1 is from application1 and table2 is from application2).
This might help: ToUniversalTime
Convert the times to universal time; this takes the time zone, daylight saving, etc. into account. If you convert both times before adding the records, you should be able to get the correct last modified date.
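A minimal sketch of that idea (the names here are only illustrative; DateTime.UtcNow and ToUniversalTime are standard .NET):
using System;

public static class SyncTime
{
    // Stamp records in UTC so the value does not depend on either machine's time zone.
    public static DateTime StampNow() => DateTime.UtcNow;

    // Compare two last-modified values after converting both to UTC.
    public static bool NeedsSync(DateTime sourceModified, DateTime targetModified)
        => sourceModified.ToUniversalTime() > targetModified.ToUniversalTime();
}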

Slow queries searching for records starting at an earliest datetime using C# SDK and dashboard

I’m having problems with slow queries when returning a fixed number of events starting from a specified earliest datetime, from an index/sourcetype containing tens to hundreds of millions of records. E.g.:
index=test sourcetype=touch_tuio earliest="09/29/2015:00:00:00" | fields _time,host,screen_id,tuio_type
Using the SDK with a maximum limit of 50,000 events, and because events are returned in descending date order and cropped at 50,000, we only get the latest 50,000 values, which do not start at the requested earliest date. In this case the resulting date range of the above query is "from": 2015-10-13T12:30:14, "to": 2015-10-13T13:08:41 (where 2015-10-13T13:08:41 is the most recent record in Splunk).
If I use tail, I get the correct values (in ascending date order) starting at the correct earliest datetime, but then the query time is unacceptably long in both the C# SDK and the dashboard (and will take longer and longer as the data grows). E.g.:
index=test sourcetype=touch_tuio earliest="09/29/2015:00:00:00" | fields _time,host,screen_id,tuio_type | tail 50000
In the dashboard, even a query like the one below, looking for just 10 records starting at a particular date, takes a very long time:
index=test sourcetype=touch_tuio earliest="09/29/2015:00:00:00" | fields _time,host,screen_id,tuio_type | tail 10
It seems that Splunk needs to go through EVERY record just to give me 10 records starting at a specific date (09/29/2015:00:00:00). This takes an extreme amount of time.
Does anyone know what I may be doing wrong here?
Is there another way to query a fixed number of events beginning at a selected earliest date, without using tail, or without the increasing search time?
Thank you!

Weekly/Fortnightly Time Filter data structure

I want to set up a weekly/fortnightly time filter such that things can only happen within certain times on certain days. I was wondering if there is a data convention out there for something like this. Currently the best I can come up with is something that represents a start and end time for each of the 14 days in the fortnight, but this can't handle the case where there are multiple times in a day, e.g. Monday 09:00-12:00 and Monday 13:00-15:00; I'm restricted to one time span per day.
An example of what I have come up with so far:
class Schedule {
    TimeSpan MondayWeekOneStartTime;
    TimeSpan MondayWeekOneEndTime;
    TimeSpan TuesdayWeekOneStartTime;
    TimeSpan TuesdayWeekOneEndTime;
    ...
}
This data structure will be stored as table(s) in an SQL database, so I want to keep the number of columns to a minimum. Is there a better way to represent this type of schedule, or am I stuck with this one?
What about this?
class ScheduleDay
{
    DayOfWeek Day;
    TimeSpan[] Times;
}
class Schedule
{
    ScheduleDay[] Days;
}
And in your DB you create two tables: one for Schedule with id and (for example) name, and another for ScheduleDay with id, scheduleid, day, start, end.
This way you can handle as many time spans per day as you want.
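Mapped back to those two tables, the in-memory model could look something like this sketch (names are illustrative; each ScheduleDay row becomes one start/end window):
using System;
using System.Collections.Generic;

// One row of the ScheduleDay table: a single start/end window on one day.
class TimeWindow
{
    public DayOfWeek Day { get; set; }
    public TimeSpan Start { get; set; }
    public TimeSpan End { get; set; }
}

// One row of the Schedule table plus its windows; Monday 09:00-12:00 and
// Monday 13:00-15:00 are simply two TimeWindow rows for the same schedule.
class Schedule
{
    public int Id { get; set; }
    public string Name { get; set; }
    public List<TimeWindow> Windows { get; set; } = new List<TimeWindow>();

    public bool IsAllowed(DateTime when) =>
        Windows.Exists(w => w.Day == when.DayOfWeek
                            && when.TimeOfDay >= w.Start
                            && when.TimeOfDay < w.End);
}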

What is the best approach to calculate a formula which changes value each day?

I use the following columns stored in a SQL table called tb_player:
Date of Birth (Date), Times Played (Integer), Versions (Integer)
to calculate a "playvalue" (integer) in the following formula:
playvalue = (Today - Date of Birth) * Times Played * Versions
I display up to 100 of these records, with the associated playvalue, on a webpage at any one time.
My question is: what is the most efficient way of calculating this playvalue, given that it changes only once a day because (today - date of birth) changes? The other values (times played and versions) remain the same.
Is there a better way than calculating this on the fly each time for the 100 records? If so, is it more efficient to do the calculation in a stored proc or in VB.NET/C#?
In a property/method on the object, in C#/VB.NET (your .NET code).
The time to execute a simple property like this is nothing compared to the out-of-process call to the database (to fetch the rows in the first place) or the transport time of a web page; you'll never notice it if you're just using it for UI display. Plus it runs on your easily scaled-out hardware (the app server), doesn't involve a huge daily update, is only executed for rows that are actually displayed, and only if you actually query this property/method.
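As a sketch (class and property names are only illustrative):
using System;

public class Player
{
    public DateTime DateOfBirth { get; set; }
    public int TimesPlayed { get; set; }
    public int Versions { get; set; }

    // Recomputed on access; trivial next to the database round-trip that
    // fetched the row in the first place.
    public long PlayValue =>
        (long)(DateTime.Today - DateOfBirth).TotalDays * TimesPlayed * Versions;
}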
Are you finding that this is actually causing a performance problem? I don't imagine it would be very bad, since the calculation is pretty straightforward math.
However, if you are actually concerned about it, my approach would be to set up a "playvalue cache" column in the tb_player table. This column stores the calculated playvalue for each player for the current day. Set up a cron job or scheduled task to run at midnight every day and update the column with the new day's value.
Then everything else can simply select this column instead of doing the calculation, and you only have to do the calculation once a day.
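The nightly job could be as small as one UPDATE; here is a sketch assuming SQL Server (the column names, including playvalue_cache, are made up):
using System.Data.SqlClient;

// Scheduled to run at midnight: recompute every player's cached value.
public static void RefreshPlayValueCache(string connectionString)
{
    const string sql = @"
        UPDATE tb_player
        SET playvalue_cache = DATEDIFF(day, date_of_birth, GETDATE())
                              * times_played * versions;";

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(sql, conn))
    {
        conn.Open();
        cmd.ExecuteNonQuery();
    }
}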
