Redis - Hits count tracking and querying in given datetime range

I have many different items and I want to keep track of the number of hits to each item, then query the hit count for each item in a given datetime range, down to the second.
So I started storing the hits in sorted sets, one sorted set per second (Unix epoch time), for example:
zincrby ItemCount:1346742000 item1 1
zincrby ItemCount:1346742000 item2 1
zincrby ItemCount:1346742001 item1 1
zincrby ItemCount:1346742005 item9 1
Now, to get an aggregate hit count for each item in a given date range:
1. Given a start datetime and an end datetime, calculate the range of epoch values that fall within it.
2. Generate the key name for each sorted set from the epoch values, for example:
ItemCount:1346742001, ItemCount:1346742002, ItemCount:1346742003
3. Use a union store to aggregate the values from the different sorted sets:
ZUNIONSTORE _item_count KEYS....
4. To get the final results out:
ZRANGE _item_count 0 -1 WITHSCORES
So it kind of works, but I run into a problem when I have a big date range such as 1 month: the number of key names calculated in steps 1 and 2 runs into the millions (86400 epoch values per day).
With such a large number of keys, the ZUNIONSTORE command fails: the socket gets broken. Plus it takes a while to loop through and generate that many keys.
How can I design this in Redis in a more efficient way and still keep the tracking granularity all the way down to seconds, not minutes or days?

Yes, you should avoid big unions of sorted sets. Here is a nice trick you can use, assuming you know the maximum number of hits an item can get per second.
Keep one sorted set per item, with timestamps as BOTH scores and values.
The first client to write in a given second adds the timestamp itself to the score, and every client increments the score by 1/(max_predicted_hits_per_second). This way the part after the decimal point is always hits/max_predicted_hits_per_second, and you can still do range queries.
So let's say max_predicted_hits_per_second is 1000. What we do is this (Python example):
import time

# 1. Make sure only one client adds the actual timestamp to the score,
# by doing SETNX on a temporary key for this item and this second.
now = int(time.time())
ts_key = 'item_ts:%s:%s' % (itemId, now)
rc = redis.setnx(ts_key, now)
# Just the count part
val = float(1) / 1000
if rc:  # we are the first to increment this second
    val += now
    redis.expire(ts_key, 10)  # we won't need that anymore soon, assuming all clients have the same clock
# 2. Increment the count (keyword arguments avoid the argument-order change in redis-py 3.0)
redis.zincrby('item_counts:%s' % itemId, value=now, amount=val)
And now querying a range will be something like:
counts = redis.zrangebyscore('item_counts:%s' % itemId, minTime, maxTime + 0.999, withscores=True)
total = 0
for value, score in counts:
    # the fractional part of the score is hits/1000; round() absorbs float error
    count = round((score - int(value)) * 1000)
    total += count
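The score arithmetic can be checked without a Redis server. The following is only a sketch: a plain dict stands in for the sorted set and a set stands in for the SETNX keys, and the names record_hit and hits_in_range are made up for illustration.

```python
MAX_HITS_PER_SECOND = 1000  # assumed upper bound, as in the answer above

def record_hit(scores, first_seen, second):
    """Simulate one hit; scores maps timestamp -> score (stand-in for the sorted set)."""
    increment = 1.0 / MAX_HITS_PER_SECOND
    if second not in first_seen:          # plays the role of SETNX
        first_seen.add(second)
        increment += second               # first writer adds the timestamp itself
    scores[second] = scores.get(second, 0.0) + increment  # plays the role of ZINCRBY

def hits_in_range(scores, min_time, max_time):
    """Sum the hit counts encoded in the fractional part of each score."""
    total = 0
    for second, score in scores.items():
        if min_time <= score <= max_time + 0.999:  # plays the role of ZRANGEBYSCORE
            total += round((score - second) * MAX_HITS_PER_SECOND)
    return total

scores, first_seen = {}, set()
for second, hits in [(1346742000, 3), (1346742001, 1), (1346742005, 7)]:
    for _ in range(hits):
        record_hit(scores, first_seen, second)

print(hits_in_range(scores, 1346742000, 1346742005))
```

One sorted set per item means a month-long query is a single range read per item rather than a union over millions of keys.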

Related

array.Max(); for the array index rather than the value

I am creating an array that will take in 24 numbers and display them in a table. I have used "arrayname".Max(); to determine the highest number, but I need to display the array slot that holds the highest number,
e.g. if hour 15 had the highest number, then 15 should be displayed in the message rather than the number stored in slot 15.
My code is as follows:
public void busiest(int[] A)
{
    int busy;
    busy = A.Max(); // Returns the highest value in the array, not its index
    Console.WriteLine("\nThe busiest time of day was hour " + busy);
}
Could anyone say if I'm missing something simple to display the slot rather than the assigned number?
Thanks

What you need to call is Array.IndexOf:
Array.IndexOf(A, A.Max());
For further info regarding this method, please have a look here.
Beware that if more than one element in the array has the maximum value, this method returns the index of the first of them. For instance, if the maximum value is 10 and it appears at index 2 and at index 3, the method returns 2.

Combining multiple date spans of different frequencies

An order consists of a startDate the starting date of the span, endDate the ending date of the span, unitAmount the number of units per delivery, frequencyAmount the number of times the units are delivered per frequency, and frequencyId the frequency of the delivery. For example: From 2017-01-01 to 2017-04-01 6 units are delivered 5 times per week. It covers 13 calendar weeks for (6*5) units per week resulting in a total of 390 units for the entire order.
Multiple orders can be created overlapping the same dates. This is allowed because it is impossible to write 2 orders of 1 unit 3 times per week and 5 units 1 time per week as a single order, and likewise for different frequencies such as 10 units 1 time per month.
Problem: I cannot figure out a way to validate that these orders do not go over certain set limits. For example, I want to make sure the orders stay under 40 units total per month, and I might have 4 orders overlapping each other with different date spans, units, and frequencies.
I thought to combine all the orders by calculating how many units in total each order has and what percentage of the order overlaps another order. However, when I validate against a larger frequency, say <1000 units per year, and the orders I have combined cover a shorter span, I end up having to extrapolate and overestimate how many units are being called for. (For example, if the orders combine to 200 units for a single month, that is okay since it is only for a single month, but if I figure the yearly amount from that (200*12) it is 1200 and over the 1000 unit limit, while in reality the total units might still be under.)
Orders:
|-----------------4 6x/week---------------|
|-------------8 1x/week-------------|
|-------10 2x/month-------|
===========================================================
Combined Orders:
|--a---|---b---|--------c--------|---d----|----e---|
I am checking if each span of the combined orders (a-e) are over any daily, weekly, monthly, or yearly limits. Different units have different limits and I need to be able to validate at these different frequencies.
I feel like I am going about this the wrong way; I keep running into issues with this approach, such as my overestimation when extrapolating. Another issue: looking at my diagram, the 3rd order for 10 units 2 times per month (let's say the order is a month long), when combined, falls into 2 spans, b and c. The 10 units could have been delivered twice in b and none in c, or once in each of b and c, or none in b and twice in c. So if I am converting to a weekly amount to combine, I have to assume the total units were delivered in both spans b and c as a worst-case scenario, which leads to overestimations. If I figure out the percentage of units per span, it leads to underestimations.
Has anyone else faced a similar issue, or does anyone think they have a solution to this problem?
Thanks
EDIT: Another situation can occur, imagine a limit of 30 units per month:
Orders:
|---10 units 2x/week---| |---10 units 2x/week---|
2017-01-01 2017-01-31
===========================================================
Combined Orders:
|--------20------------| |----------20-----------|
In this case, it goes over the limit not due to overlaps. This makes me believe that I will also have to calculate the amount of units in each month (or week, day, year) between the earliest startDate and the furthest endDate. Unless there is a better way, but I have a feeling this is the only way.

I do not see any way other than what @Furmek suggested in his 3rd comment.
The prototype of the solution is below.
-- Here we calculate the average daily unit amount for all orders that overlap a validation period.
-- [frequency] is in days
;WITH ValidationPeriodOrders AS(
SELECT ( 1.0 * unitAmount * frequencyAmount ) / [frequency] AS AvgDailyAmount,
-- Adjust order dates to be within the validation period
CASE WHEN startDate < ValidationPeriodStart THEN ValidationPeriodStart ELSE startDate END AS OrderStart,
CASE WHEN endDate > ValidationPeriodEnd THEN ValidationPeriodEnd ELSE endDate END AS OrderEnd
FROM [Order]
-- Look for orders that overlap the validation period, including orders that span all of it
WHERE startDate <= ValidationPeriodEnd
AND endDate >= ValidationPeriodStart
)
-- Get total units per validation period
SELECT SUM( AvgDailyAmount * DATEDIFF( dd, OrderStart, OrderEnd ))
FROM ValidationPeriodOrders
Since there is no (reliable) way to tell when an order was delivered I would suggest rounding down the number (rather than mathematical rounding).
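The same estimate can be sketched outside the database. This is a minimal sketch under stated assumptions: orders are plain tuples (startDate, endDate, unitAmount, frequencyAmount, frequency in days), the function name units_in_period is hypothetical, and exact fractions are used so that rounding down behaves predictably.

```python
from datetime import date
from fractions import Fraction
import math

def units_in_period(orders, period_start, period_end):
    """Estimate total units delivered inside the validation period, rounded down."""
    total = Fraction(0)
    for start, end, unit_amount, frequency_amount, frequency_days in orders:
        if end < period_start or start > period_end:
            continue  # order does not overlap the validation period
        clipped_start = max(start, period_start)
        clipped_end = min(end, period_end)
        days = (clipped_end - clipped_start).days  # same idea as DATEDIFF(dd, ...)
        avg_daily = Fraction(unit_amount * frequency_amount, frequency_days)
        total += avg_daily * days
    return math.floor(total)

orders = [
    (date(2017, 1, 1), date(2017, 4, 1), 6, 5, 7),   # 6 units, 5 times per week
    (date(2017, 1, 8), date(2017, 1, 22), 8, 1, 7),  # 8 units, once per week
]
print(units_in_period(orders, date(2017, 1, 1), date(2017, 1, 15)))
```

Because the result is an average, it can differ from the calendar-week count in the question (390 units for the first order over its full span).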

Ideas about Generating Untraceable Invoice IDs

I want to print invoices for customers in my app. Each invoice has an Invoice ID. I want IDs to be:
Sequential (IDs issued later are larger)
32-bit integers
Not easily traceable like 1 2 3, so that people can't tell how many items we sell.
An idea of my own:
Number of seconds since a specific date & time (e.g. 1/1/2010 00 AM).
Any other ideas how to generate these numbers ?

I don't like the idea of using time. You can run into all sorts of issues - time differences, several events happening in a single second and so on.
If you want something sequential and not easily traceable, how about generating a random number between 1 and whatever you wish (for example 100) for each new Id. Each new Id will be the previous Id + the random number.
You can also add a constant to your IDs to make them look more impressive. For example you can add 44323 to all your IDs and turn IDs 15, 23 and 27 into 44338, 44346 and 44350.
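A minimal sketch of this idea, assuming a step between 1 and 100 and the constant 44323 from the example; the function name next_invoice_id is made up, and in a real system the previous ID would come from the database rather than a local list.

```python
import secrets

OFFSET = 44323   # cosmetic constant from the example above
MAX_GAP = 100    # assumed upper bound for the random step

def next_invoice_id(previous_id):
    """Return an ID that is larger than previous_id by a random step in 1..MAX_GAP."""
    return previous_id + 1 + secrets.randbelow(MAX_GAP)

ids = [OFFSET]
for _ in range(5):
    ids.append(next_invoice_id(ids[-1]))
print(ids)
```

With an average step of about 50, a 32-bit range is used up roughly 50 times faster than with plain 1, 2, 3 numbering.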

There are two problems in your question. One is solvable, one isn't (with the constraints you give).
Solvable: Unguessable numbers
The first one is quite simple: It should be hard for a customer to guess a valid invoice number (or the next valid invoice number), when the customer has access to a set of valid invoice numbers.
You can solve this with your constraint:
Split your invoice number in two parts:
A 20 bit prefix, taken from a sequence of increasing numbers (e.g. the natural numbers 0,1,2,...)
A 10 bit suffix that is randomly generated
With this scheme, there are about 1 million valid invoice numbers. You can precalculate them and store them in the database. When presented with an invoice number, check whether it is in your database. If it isn't, it's not valid.
Use a SQL sequence for handing out numbers. When issuing a new (i.e. unused) invoice number, increment the sequence and issue the n-th number from the precalculated list (ordered by value).
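The prefix and suffix layout could look roughly like this. It is only a sketch: it generates the random suffix at issue time instead of precalculating the whole list, an in-memory set stands in for the database table, and all names are hypothetical.

```python
import secrets

PREFIX_BITS = 20   # sequential part
SUFFIX_BITS = 10   # random part

issued = set()     # stand-in for the table of valid invoice numbers

def issue_invoice_number(sequence_value):
    """Combine a sequence value (20-bit prefix) with a random 10-bit suffix."""
    if not 0 <= sequence_value < (1 << PREFIX_BITS):
        raise ValueError("sequence value does not fit in the 20-bit prefix")
    suffix = secrets.randbelow(1 << SUFFIX_BITS)
    number = (sequence_value << SUFFIX_BITS) | suffix
    issued.add(number)
    return number

def is_valid(number):
    """An invoice number is valid only if it was actually issued."""
    return number in issued

numbers = [issue_invoice_number(n) for n in range(5)]
print(numbers)
```

Because the prefix occupies the high bits, the numbers stay in issue order while the low 10 bits make the next valid number hard to guess.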
Not solvable: Guessing the number of customers
You may want to prevent a customer who holds a number of valid invoice numbers from guessing how many invoice numbers you have issued so far (and therefore how many customers you have). This is not possible.
What you have here is a variant of the so-called "German tank problem". In the Second World War, the Allies used serial numbers printed on the gearboxes of German tanks to estimate how many tanks Germany had produced. This worked because the serial numbers increased without gaps.
But even when you increase the numbers with gaps, the solution for the German tank problem still works. It is quite easy:
You use the method described here to guess the highest issued invoice number
You guess the mean difference between two successive invoice numbers and divide the highest number by this value
You can use linear regression to get a stable delta value (if it exists).
Now you have a good guess about the order of magnitude of the number of invoices (200, 15000, half a million, etc.).
This works as long as a mean value for the gap between two successive invoice numbers (theoretically) exists. This is usually the case, even when using a random number generator, because most random number generators are designed to have such a mean value.
There is a countermeasure: you have to make sure that no mean value exists for the gap between two successive numbers. A random number generator with this property can be constructed very easily.
Example:
Start with the last invoice number plus one as current number
Multiply the current number with a random number >=2. This is your new current number.
Get a random bit: If the bit is 0, the result is your current number. Otherwise go back to step 2.
While this will work in theory, you will very soon run out of 32 bit integer numbers.
I don't think there is a practical solution for this problem. Either the gap between two successive number has a mean value (with little variance) and you can guess the amount of issued numbers easily. Or you will run out of 32 bit numbers very quickly.
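How quickly the example generator exhausts 32 bits can be seen with a small simulation. This is a sketch under assumptions: the random factor is drawn from 2..10 (the answer only says >= 2), the first invoice starts from 0, and a seeded generator is used purely to make runs repeatable.

```python
import random

LIMIT = 2 ** 32  # first value that no longer fits in 32 bits

def next_number(last, rng):
    current = last + 1                    # step 1: last invoice number plus one
    while True:
        current *= rng.randint(2, 10)     # step 2: random factor >= 2 (upper bound assumed)
        if rng.getrandbits(1) == 0:       # step 3: stop on a 0 bit, otherwise repeat step 2
            return current

def count_until_overflow(seed):
    """Count how many invoice numbers fit in 32 bits before the sequence overflows."""
    rng = random.Random(seed)
    last, issued = 0, 0
    while True:
        last = next_number(last, rng)
        if last >= LIMIT:
            return issued
        issued += 1

print([count_until_overflow(seed) for seed in range(5)])
```

Each number is at least double the previous one, so no more than 31 invoice numbers can ever fit below 2^32, whatever the random choices are.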
Snakeoil (non working solutions)
Don't use any time-based solution. The timestamp is usually easy to guess (an approximately correct timestamp will probably be printed somewhere on the invoice). Using timestamps usually makes it easier for the attacker, not harder.
Don't use insecure random numbers. Most random number generators are not cryptographically safe. They usually have mathematical properties that are good for statistics but bad for your security (e.g. a predictable distribution, a stable mean value, etc.)

One solution may involve Exclusive OR (XOR) binary bitmaps. The resulting function is reversible, may generate non-sequential numbers (if the first bit of the least significant byte is set to 1), and is extremely easy to implement. And, as long as you use a reliable sequence generator (your database, for example), there is no need for thread safety concerns.
According to MSDN, 'the result [of an exclusive-OR operation] is true if and only if exactly one of its operands is true.' Conversely, equal operands will always result in false.
As an example, I just generated a 32-bit sequence on Random.org. This is it:
11010101111000100101101100111101
This binary number translates to 3588381501 in decimal, 0xD5E25B3D in hex. Let's call it your base key.
Now, let's generate some values using the ([base key] XOR [ID]) formula. In C#, this is what your encryption function would look like:
public static long FlipMask(long baseKey, long ID)
{
return baseKey ^ ID;
}
The following list contains some generated content. Its columns are as follows:
ID
Binary representation of ID
Binary value after XOR operation
Final, 'encrypted' decimal value
0 | 000 | 11010101111000100101101100111101 | 3588381501
1 | 001 | 11010101111000100101101100111100 | 3588381500
2 | 010 | 11010101111000100101101100111111 | 3588381503
3 | 011 | 11010101111000100101101100111110 | 3588381502
4 | 100 | 11010101111000100101101100111001 | 3588381497
In order to reverse the generated key and determine the original value, you only need to do the same XOR operation using the same base key. Let's say we want to obtain the original value of the second row:
11010101111000100101101100111101 XOR
11010101111000100101101100111100 =
00000000000000000000000000000001
Which was indeed your original value.
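The table and the round trip can be reproduced in a few lines. This is only an illustrative sketch in Python rather than C#; flip_mask mirrors the FlipMask function above, and the base key is the one from the example.

```python
BASE_KEY = 0xD5E25B3D  # 3588381501, the base key from the example

def flip_mask(value, base_key=BASE_KEY):
    """XOR the value with the base key; applying it twice returns the original."""
    return base_key ^ value

for original in range(5):
    masked = flip_mask(original)
    print(original, format(masked, "032b"), masked, flip_mask(masked))
```

Note that XOR with a fixed key only relabels the numbers: consecutive IDs still differ in the same low bits, as the table shows.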
Now, Stefan made very good points, and the first topic is crucial.
In order to cover his concerns, you may reserve the top, say, 8 bits to be purely random garbage (which I believe is called a nonce), which you generate when encrypting the original ID and ignore when reversing it. That would heavily increase your security at the expense of a generous slice of all the possible positive integer numbers with 32 bits (16,777,216 instead of 4,294,967,296, or 1/256 of it).
A class to do that would look like this:
public static class int32crypto
{
// C# follows ECMA 334v4, so Integer Literals have only two possible forms -
// decimal and hexadecimal.
// Original key: 0b11010101111000100101101100111101
public static long baseKey = 0xD5E25B3D;
public static long encrypt(long value)
{
// First we will extract from our baseKey the bits we'll actually use.
// We do this with an AND mask, indicating the bits to extract.
// Remember, we'll ignore the first 8. So the mask must look like this:
// Significance mask: 0b00000000111111111111111111111111
long _sigMask = 0x00FFFFFF;
// sigKey is our baseKey with only the indicated bits still true.
long _sigKey = _sigMask & baseKey;
// Nonce generation. First security issue, since Random()
// is time-based on its first iteration. But that's OK for the sake
// of explanation, and safe for most circumstances.
// The bits it will occupy are the first eight, like this:
// OriginalNonce: 0b000000000000000000000000NNNNNNNN
// Next(256) returns 0..255, so all eight bits can be used.
long _tempNonce = new Random().Next(256);
// We now shift them to the last byte, like this:
// finalNonce: 0bNNNNNNNN000000000000000000000000
_tempNonce = _tempNonce << 0x18;
// And now we mix both Nonce and sigKey, 'poisoning' the original
// key, like this:
long _finalKey = _tempNonce | _sigKey;
// Phew! Now we apply the final key to the value, and return
// the encrypted value.
return _finalKey ^ value;
}
public static long decrypt(long value)
{
// This is easier than encrypting. We will just ignore the bits
// we know are used by our nonce.
long _sigMask = 0x00FFFFFF;
long _sigKey = _sigMask & baseKey;
// We will do the same to the informed value:
long _trueValue = _sigMask & value;
// Now we decode and return the value:
return _sigKey ^ _trueValue;
}
}

Perhaps an idea may come from the military? Group invoices in blocks like these:
28th Infantry Division
--1st Brigade
---1st BN
----A Co
----B Co
---2nd BN
----A Co
----B Co
--2nd Brigade
---1st BN
----A Co
----B Co
---2nd BN
----A Co
----B Co
--3rd Brigade
---1st BN
----A Co
----B Co
---2nd BN
----A Co
----B Co
http://boards.straightdope.com/sdmb/showthread.php?t=432978
groups don't have to be sequential but numbers in groups do
UPDATE
Think about the above as groups differentiated by place, time, person, etc. For example: create a group using a temporary seller ID, changing it every 10 days, or group by office/shop.
There is another idea, you may say a bit weird, but the more I think of it the more I like it. Why not count these invoices down? Choose a big number and count down. It's easy to trace the number of items when counting up, but counting down? How would anyone guess where the starting point is? It's easy to implement, too.

If the orders sit in an inbox until a single person processes them each morning, seeing that it took that person till 16:00 before he got round to creating my invoice will give me the impression that he's been busy. Getting the 9:01 invoice makes me feel like I'm the only customer today.
But if you generate the ID at the time when I place my order, the timestamp tells me nothing.
I think I therefore actually like the timestamps, assuming that collisions where two customers simultaneously need an ID created are rare.

You can see from the code below that I use newsequentialid() to generate a sequential number then convert that to a [bigint]. As that generates a consistent increment of 4294967296 I simply divide that number by the [id] on the table (it could be rand() seeded with nanoseconds or something similar). The result is a number that is always less than 4294967296 so I can safely add it and be sure I'm not overlapping the range of the next number.
Peace
Katherine
declare @generator as table (
[id] [bigint],
[guid] [uniqueidentifier] default( newsequentialid()) not null,
[converted] as (convert([bigint], convert ([varbinary](8), [guid], 1))) + 10000000000000000000,
[converted_with_randomizer] as (convert([bigint], convert ([varbinary](8), [guid], 1))) + 10000000000000000000 + cast((4294967296 / [id]) as [bigint])
);
insert into @generator ([id])
values (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);
select [id],
[guid],
[converted],
[converted] - lag([converted],
1.0)
over (
order by [id]) as [orderly_increment],
[converted_with_randomizer],
[converted_with_randomizer] - lag([converted_with_randomizer],
1.0)
over (
order by [id]) as [disorderly_increment]
from @generator
order by [converted];

I do not know the reasons for the rules you set on the Invoice ID, but you could consider to have an internal Invoice Id which could be the sequential 32-bits integer and an external Invoice ID that you can share with your customers.
This way your internal ID can start at 1 and you can add one to it every time, and the customer invoice ID can be whatever you want.

I think Na Na has the correct idea with choosing a big number and counting down. Start off with a large value seed and either count up or down, but don't start with the last placeholder. If you use one of the other placeholders it will give an illusion of a higher invoice count....if they are actually looking at that anyway.
The only caveat here would be to modify the last X digits of the number periodically to maintain the appearance of a change.

Why not taking an easy readable Number constructed like
first 12 digits is the datetime in a yyyymmddhhmm format (that ensures the order of your invoice IDs)
last x-digits is the order number (in this example 8 digits)
The number you get then is something like 20130814140300000008
Then do some simple calculations with it like the first 12 digits
(201308141403) * 3 = 603924424209
The second part (original: 00000008) can be obfuscated like this:
(10001234 - 00000008 * 256) * (minutes + 2) = 49995930
It is easy to translate it back into an easily readable number, but unless the customer knows how it was done, they have no clue at all.
Altogether this number would look like 603924424209-49995930
for an invoice on the 14th of August 2013 at 14:03 with the internal invoice number 00000008.
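The arithmetic above and its inverse can be sketched as follows. The function names obfuscate and deobfuscate are made up, and the sketch assumes the order number stays below 39067 so that the second part remains positive (10001234 / 256 is about 39067).

```python
def obfuscate(timestamp, order_number):
    """timestamp is the 12-digit yyyymmddhhmm value; returns the printed invoice ID."""
    minutes = timestamp % 100
    first = timestamp * 3
    second = (10001234 - order_number * 256) * (minutes + 2)
    return "%d-%d" % (first, second)

def deobfuscate(invoice_id):
    """Reverse the steps: recover the timestamp first, then the order number."""
    first, second = (int(part) for part in invoice_id.split("-"))
    timestamp = first // 3
    minutes = timestamp % 100
    order_number = (10001234 - second // (minutes + 2)) // 256
    return timestamp, order_number

print(obfuscate(201308141403, 8))
print(deobfuscate("603924424209-49995930"))
```

Anyone who sees a few invoices with known dates can work out the multiplier, so this hides the scheme only from casual inspection.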

You can write your own function that, when applied to the previous number, generates the next number in the sequence: greater than the previous one, but by a varying amount. The numbers that can be generated come from a finite set (for example, integers between 1 and 2 to the power of 31) and may eventually repeat, though that is highly unlikely. To add more complexity to the generated numbers, you can append some alphanumeric characters. You can read about this here: Sequential Random Numbers.
An example generator can be
private static string GetNextnumber(int currentNumber)
{
Int32 nextnumber = currentNumber + (currentNumber % 3) + 5;
Random _random = new Random();
//you can skip the below 2 lines if you don't want alpha numeric
int num = _random.Next(0, 26); // Zero to 25
char let = (char)('a' + num);
return nextnumber + let.ToString();
}
and you can call like
string nextnumber = GetNextnumber(yourpreviouslyGeneratedNumber);

generate 6 digit number which will expire after 5 second

Can I generate a 6-digit number based on the current date and time? One person told me to use a timestamp. Please guide me on how to generate a 6-digit number based on the current timestamp. I need to generate it in such a way that I can later check whether or not the number was generated more than 5 seconds ago. I need to know what logic I should use to reverse the process and find out when the number was generated. Please help with sample code.
Which crypto technique can I use to generate the digits if I input the current date like DateTime.Now.ToString("yyyyMMddHHmmssffff")?

You could do this:
public static int GetTimestamp()
{
// 10m ticks in a second, so 50m in 5 seconds
const int ticksIn5Seconds = 50000000;
return (int)((DateTime.Now.Ticks / ticksIn5Seconds) % 1000000);
}
This gets a number with one to six digits, which changes every five seconds.
Edit:
Of course, this is not cryptographically secure: if you observe one number then you know what later ones are going to be (because they just increase by 1 each time). If unpredictability is a requirement, you need a different approach.
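Checking whether a number is still fresh could be sketched like this. It is an illustration only: it uses Unix seconds instead of .NET ticks, the names code_for and is_fresh are made up, and it accepts the previous 5-second window as well, so a code may be up to about 10 seconds old when accepted.

```python
import time

WINDOW_SECONDS = 5

def code_for(unix_seconds):
    """Number with up to six digits that changes every 5 seconds."""
    return (int(unix_seconds) // WINDOW_SECONDS) % 1_000_000

def is_fresh(code, now=None):
    """Accept the code for the current window or the one just before it."""
    now = time.time() if now is None else now
    current = code_for(now)
    previous = (current - 1) % 1_000_000
    return code in (current, previous)

code = code_for(time.time())
print(code, is_fresh(code))
```

Accepting only the current window would reject a code generated just before a window boundary, which is why the previous window is allowed here.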

// Get current tick count
string sTicks = DateTime.Now.Ticks.ToString();
// Get the 5 least-significant digits
string sNum = sTicks.Substring(sTicks.Length - 5, 5);
You need to take into consideration that after 100000 ticks, there's a somewhat high probability that you'll get the same numbers again.
There are 10,000 ticks in a millisecond.

Selecting random item from list having probability weighting using c#?

I have a scenario where I take a list of users (20 users) from my database and give the users weightings:
first 5 users: probability factor of 4
next 5 users: probability factor of 3
next 5 users: probability factor of 2
next 5 users: probability factor of 1
So a user in the first 5 is 4 times more likely to be selected than a user in the last 5.
So how can I select a random user from the list using these probabilities in C#?
Can anybody help me with this? I am totally stuck on the logic.

You could add each user to the list as many times as their probability factor. So the first 5 users are in the list 4 times each, the next 5 users 3 times each, and so on. Then just select one entry at random from the complete list.

Create a list of partial sums of weights. In your example, it would be
[4, 8, 12, 16, 20, 23, ...]
The last element is the sum of all weights. Pick a random number between 0 and this sum (exclusive). Then your element is the first element whose partial sum is greater than the random number. So if you got 11, you need the third element; if you got 16, the fifth; etc.
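The partial-sum lookup can be sketched with a binary search. This is an illustration in Python rather than C#; the user names and the helper names index_for and pick are made up, and the weights are the ones from the question.

```python
import bisect
import itertools
import random

weights = [4] * 5 + [3] * 5 + [2] * 5 + [1] * 5   # 20 users, as in the question
partial_sums = list(itertools.accumulate(weights))  # [4, 8, 12, 16, 20, 23, ...]

def index_for(r):
    """Index of the first element whose partial sum is greater than r."""
    return bisect.bisect_right(partial_sums, r)

def pick(users, rng=random):
    """Pick one user with probability proportional to its weight."""
    r = rng.randrange(partial_sums[-1])  # 0 <= r < sum of all weights
    return users[index_for(r)]

users = ["user%d" % n for n in range(1, 21)]
print(pick(users))
```

Unlike the duplicated-list approach, this needs only one extra number per user and also works when the weights are large.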

I have a (bit hacky) solution for you:
Create a list containing the users, where each user is added as often as their weighting says (e.g. if a user has a weighting of 5, add them 5 times to the list). Then use a Random to fetch a user from that list; that should solve your problem.

One solution would be to find the lowest common denominator of the weights (or just multiply them together) and create a new list that contains the keys of the first list, but multiple times, i.e.:
user1
user1
user2
user3
user3
user3
Then just do a newList.Skip(Random.Next(newList.Count)).Take(1) and you are set!

You could apportion the probability range amongst the users using a dictionary. eg
User 1 has 1-4 (so max of 4)
User 2 has 5-8 (max of 8) etc etc...
Then after selecting the random number find which user within the dictionary it relates to. You can do this using Linq like so...
int iUser = users.Where(p => (choice <= p.Value)).First().Key;
..where users is a Dictionary<int,int> (Key = user number, Value = max value) and choice is the randomly generated value.
This is obviously more complex than the "multiple entries" method proposed by others but has its advantages if you
a) need a fractional weighting which makes the common denominator of your multiple entry method very small (resulting in many entries) or
b) need to weight heavily in favour of particular users (which would again have the effect of making the multiple entry method very large).
Working Example at ideone.
