I have to read some data sequentially (from a file) and put the data into a matrix. I don't know the dimensions of the matrix initially. For example, consider the data plotted on an x-y plane with years on the y axis and increments on the x axis. At first the data comes in for 1990 with 3 increments:
year increment(1991) increment(1992) increment(1993)
1990 12 25 35
Note that I only learn about the increments after reading the data line. Next comes 1989 with 4 increments, so the matrix should become:
year increment(1990) increment(1991) increment(1992) increment(1993)
1989 23 33 43 53
1990 0 12 25 35
Note that when the new data came in, another increment year (1990) appeared on the x axis. As there is no increment year 1990 for year 1990, that cell has to be filled with zero or left empty, but the
In the end I have to create a matrix. For example
year increment(1990) increment(1991) increment(1992) increment(1993)
1989 23 33 43 53
1990 0 12 25 35
1991 0 0 23 33
To build up the matrix, the difficult part is I don't know the years/increments initially, I will only know after reading the entire data. I would like to plot the matrix while reading the data so that I can avoid more than one pass through the data.
The placement of the matrix in the xy axis will be only known after the entire data is processed!
Any suggestions?
I quite like the sparse matrix solution, but you could also use a version of a dynamic array (http://en.wikipedia.org/wiki/Dynamic_array). Dynamic arrays are arrays that you resize when they get too full. Resizing is expensive, but if you increase the size by a constant factor every time you resize, the total cost of resizing is still O(n) when the final size is n elements.
To use dynamic arrays for this, you could create two dynamic arrays for each row, one growing with years larger than those seen so far, and one growing with years smaller than those seen so far (so with years in decreasing order along the array).
Another way to do this would be to create a single area of storage for the matrix, with only the central section used, so there is always space to add entries in any direction. You would then have to check that increasing the size of this storage by a constant factor when you were about to run over the edges would lead to a total cost of at most O(n). I suspect that it would, but the constant factors might not be very good.
You can build it as a sparse matrix with SortedList<int, SortedList<int, int>>
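As a rough illustration of the sparse-matrix idea, here is a minimal Python sketch that uses a dictionary of dictionaries in place of the nested SortedList. The class name `SparseYearMatrix`, its methods, and the assumption that each data line reveals its first increment year are all hypothetical and not part of the original question.

```python
from collections import defaultdict


class SparseYearMatrix:
    """Sparse matrix keyed by (year, increment year); missing cells read as 0."""

    def __init__(self):
        self.rows = defaultdict(dict)  # year -> {increment_year: value}
        self.columns = set()           # every increment year seen so far

    def add_row(self, year, first_increment_year, values):
        """Store one data line whose values start at first_increment_year."""
        for offset, value in enumerate(values):
            column = first_increment_year + offset
            self.rows[year][column] = value
            self.columns.add(column)

    def to_dense(self):
        """Return (sorted column labels, sorted rows) with gaps filled by 0."""
        columns = sorted(self.columns)
        dense = [
            [year] + [self.rows[year].get(c, 0) for c in columns]
            for year in sorted(self.rows)
        ]
        return columns, dense


# Rows arrive in the same order as in the question.
matrix = SparseYearMatrix()
matrix.add_row(1990, 1991, [12, 25, 35])
matrix.add_row(1989, 1990, [23, 33, 43, 53])
matrix.add_row(1991, 1992, [23, 33])
columns, dense = matrix.to_dense()
```

Because only the cells that were actually read are stored, a new year or increment year can appear at any time without moving existing data; the dense layout is only computed when it is needed for plotting.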
I am working on a personal project, and I need to calculate what increase in the numbers will bump the average to the next increment. I am able to do this the long-winded way with if statements, but I wondered if there is already an algorithm or method for this.
Example
8 numbers averaging 750.7
which numbers need increasing to get to 751
This is not really a programming problem, and there may be simpler maths formulas, but the following works.
If you want to increase each number by the same amount then:
Multiply the average value you want to have by the number of elements
751 * 8 = 6008
Subtract the sum of your existing elements, then divide by the number of elements
6008 - 6005.6 = 2.4
2.4 / 8 = 0.3
Each number needs to be increased by 0.3 to make your average 751.
If you want to just increment 1 number to increase your average then:
Multiply the average value you want to have by the number of elements
751 * 8 = 6008
Subtract all the existing numbers except the one you want to increase (for example, the last value).
This leaves you with the new value to use for that number.
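Both recipes above can be sketched in a few lines of Python. The function names and the eight sample values are made up for illustration; the values were chosen only so that they sum to 6005.6, which matches an average of 750.7.

```python
def equal_increase(values, target_average):
    """Amount to add to every value so the mean becomes target_average."""
    return (target_average * len(values) - sum(values)) / len(values)


def single_replacement(values, index, target_average):
    """New value for values[index] so the mean becomes target_average."""
    others = sum(values) - values[index]
    return target_average * len(values) - others


# Hypothetical data: 8 numbers averaging 750.7 (sum 6005.6).
values = [740, 745, 750, 751, 752, 755, 756, 756.6]
per_item = equal_increase(values, 751)         # about 0.3
new_last = single_replacement(values, 7, 751)  # about 759
```

Results are floating-point values, so compare them with a small tolerance rather than with exact equality.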
I have a list of billions of items in SQL which users can reorder at will by moving an item to another position in the list. I am considering a simple solution that halves a double-valued rank:
Id, Rank
1 10
2 20
3 30
4 40
5 50
Now the user moves item id=3 to the first position, and I recalculate the item's rank from its adjacent items (0 means no neighbour on the left, max means no neighbour on the right):
Id, Rank
3 (0+10)/2 = 5
1 10
2 20
4 40
5 50
Now there is a bug: this works until the gap between two ranks reaches the precision limit (epsilon) of a double. After that you end up with several elements whose ranks are only an epsilon apart, and nothing can be moved between them.
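To get a feel for how quickly the precision runs out, here is a small Python sketch (Python floats are IEEE 754 doubles). The function name and the scenario of always moving an item directly after the same neighbour are assumptions made for illustration.

```python
def midpoint_moves_until_stuck(low, high):
    """Count how many times an item can be placed directly after `low`
    using rank = (low + neighbour) / 2 before the midpoint is no longer
    strictly between its two neighbours."""
    moves = 0
    while True:
        mid = (low + high) / 2
        if mid <= low or mid >= high:
            return moves
        high = mid  # the next item goes between `low` and the item just placed
        moves += 1


# Worst case for the gap between ranks 10 and 20 from the example above.
moves = midpoint_moves_until_stuck(10.0, 20.0)
```

In this worst case the gap is exhausted after only a few dozen moves, far fewer than the number of items, which is why some form of occasional renumbering (at least of the affected neighbourhood) is hard to avoid with this scheme.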
This can be avoided by occasionally recalculating the rank for the entire collection, but I hesitate to implement that at the moment because it looks like too much work.
I would like to know whether there is an algorithmic solution other than updating billions of items, or whether this problem has a well-known name so that I can look for an appropriate solution myself.
First of all, pardon me for raising this question here (I am not sure it belongs). I am not good at maths, so I need help from others to understand how to calculate this.
I have to calculate a proportional ratio score. To do that I take two input values:
ValueA = 3
ValueB = 344.
To find the percentage of the proportional ratio I use ((ValueB - ValueA) / ValueA) * 100;
that formula gives me the score 11366.6.
Now I have to match the proportional percentage against the following table,
but I have no idea how to do the matching.
For example, if the score comes to around 43.12 %, then I will pick the value 5 (>40 – 50).
% Ratio      Score
0            0
≤10          1
>10 – 20     2
>20 – 30     3
>30 – 40     4
>40 – 50     5
>50 – 60     6
>60 – 70     7
>70 – 80     8
>80 – 90     9
>90 – 100    10
Your formula is off (as you can see from the 11366.6 percentage). It should be
100.0*(ValueB-ValueA)/(double)ValueB
This gives you values between 0 and 100 percent as long as ValueB is always bigger than ValueA. If it is not, use:
100.0*Math.Abs(ValueB - ValueA)/(double)Math.Max(ValueA, ValueB)
Based on the table, your score should then simply be:
var score = (int)Math.Ceiling(percentage / 10.0);
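The same calculation as a minimal Python sketch, for readers who want to check the table boundaries. The function names are hypothetical, and the sketch assumes both inputs are non-negative.

```python
import math


def ratio_percentage(value_a, value_b):
    """Difference as a percentage of the larger value (0 to 100).
    Assumes both values are non-negative."""
    largest = max(value_a, value_b)
    if largest == 0:
        return 0.0  # both values are zero, so there is no difference
    return 100.0 * abs(value_b - value_a) / largest


def ratio_score(percentage):
    """Map a percentage onto the 0..10 score table by rounding up."""
    return math.ceil(percentage / 10.0)


score = ratio_score(ratio_percentage(3, 344))
```

Rounding up means that an exact boundary such as 10 % falls into the lower bucket (score 1) while anything just above it falls into the next one, which matches the "≤10" and ">10 – 20" rows of the table.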
You should swap ValueA and ValueB if you get percentages bigger than 100. By the way, there is no single definition of a proportional value, and the formula you have provided is only one way to compute it. I guess ValueA/ValueB is also a possibility, for example.
Preface: I'm currently learning about ANNs because I have ~18.5k images in ~83 classes. They will be used to train an ANN to recognize approximately equal images in realtime. I followed the image example in the book, but it doesn't work for me. So I'm going back to the beginning as I've likely missed something.
I took the Encog XOR example and extended it to teach it how to add numbers less than 100. So far, the results are mixed, even for exact input after training.
Inputs (normalized from 100): 0+0, 1+2, 3+4, 5+6, 7+8, 1+1, 2+2, 7.5+7.5, 7+7, 50+50, 20+20.
Outputs are the numbers added, then normalized to 100.
After training 100,000 times, some sample output from input data:
0+0=1E-18 (great!)
1+2=6.95
3+4=7.99 (so close!)
5+6=9.33
7+8=11.03
1+1=6.70
2+2=7.16
7.5+7.5=10.94
7+7=10.48
50+50=99.99 (woo!)
20+20=41.27 (close enough)
From cherry-picked unseen data:
2+4=7.75
6+8=10.65
4+6=9.02
4+8=9.91
25+75=99.99 (!!)
21+21=87.41 (?)
I've messed with layers, neuron numbers, and [Resilient|Back]Propagation, but I'm not entirely sure if it's getting better or worse. With the above data, the layers are 2, 6, 1.
I have no frame of reference for judging this. Is this normal? Do I not have enough input? Is my data not complete or random enough, or too weighted?
You are not the first one to ask this. It seems logical to teach an ANN to add. We teach them to function as logic gates, so why not as addition/multiplication operators? I can't answer this completely, because I have not researched it myself to see how well an ANN performs in this situation.
If you are just teaching addition or multiplication, you might have best results with a linear output and no hidden layer. For example, to learn to add, the two weights would need to be 1.0 and the bias weight would have to go to zero:
linear( (input1 * w1) + (input2 * w2) + bias) =
becomes
linear( (input1 * 1.0) + (input2 * 1.0) + (0.0) ) =
Training a sigmoid or tanh might be more problematic. The weights/bias and hidden layer would basically have to undo the sigmoid to truly get back to an addition like the one above.
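The linear case can be tried out without any library. The following is a minimal Python sketch of a single linear neuron trained with the delta rule; it is not Encog code, and the learning rate, epoch count, and randomly generated training pairs are assumptions chosen for illustration.

```python
import random


def train_linear_adder(samples, learning_rate=0.5, epochs=100):
    """Train output = x1*w1 + x2*w2 + bias with the delta (LMS) rule."""
    w1 = w2 = bias = 0.0
    for _ in range(epochs):
        for x1, x2, target in samples:
            output = x1 * w1 + x2 * w2 + bias
            error = target - output
            # Move each weight in proportion to its input and the error.
            w1 += learning_rate * error * x1
            w2 += learning_rate * error * x2
            bias += learning_rate * error
    return w1, w2, bias


# Hypothetical training set: 200 random pairs, normalized by 100 as in the question.
rng = random.Random(0)
samples = []
for _ in range(200):
    a, b = rng.uniform(0, 50), rng.uniform(0, 50)
    samples.append((a / 100, b / 100, (a + b) / 100))

w1, w2, bias = train_linear_adder(samples)
prediction = (0.21 * w1 + 0.21 * w2 + bias) * 100  # unseen input 21 + 21
```

With inputs spread over the whole range, the weights should settle very close to 1.0 and the bias close to 0.0, so unseen sums come out right as well. A training set that lies almost entirely on the diagonal (a + a, a + (a + 1)) gives the network much less information about how the two inputs contribute separately.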
I think part of the problem is that the neural network is recognizing patterns, not really learning math.
An ANN can learn arbitrary functions, including all of arithmetic. For example, it has been proved that the addition of N numbers can be computed by a polynomial-size network of depth 2. One way to teach a NN arithmetic is to use a binary representation (i.e. not input normalized from 100, but a set of input neurons each representing one binary digit, and the same representation for the output). This way you will be able to implement addition and other arithmetic operations. See this paper for further discussion and a description of the ANN topologies used in learning arithmetic.
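A binary representation of the training data could be prepared roughly as follows. This Python sketch only covers the encoding and decoding, not the network itself; the function names, the least-significant-bit-first order, and the 7-bit operand width (enough for numbers below 100) are assumptions.

```python
def to_bits(number, width):
    """Encode a non-negative integer as a list of 0.0/1.0 values, LSB first."""
    return [float((number >> i) & 1) for i in range(width)]


def from_bits(bits):
    """Decode network outputs by thresholding each neuron at 0.5."""
    return sum(1 << i for i, bit in enumerate(bits) if bit >= 0.5)


def encode_addition(a, b, width=7):
    """Inputs are the bits of both operands; the target has one extra carry bit."""
    return to_bits(a, width) + to_bits(b, width), to_bits(a + b, width + 1)


inputs, target = encode_addition(21, 21)  # 14 input neurons, 8 output neurons
```

Each training pair then consists of 14 inputs and 8 targets that are all exactly 0 or 1, which sigmoid outputs can represent naturally.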
PS. If you want to work with image recognition, it's not a good idea to start practicing with your original dataset. Try a well-studied dataset like MNIST, where it is known what results can be expected from correctly implemented algorithms. After mastering the classical examples, you can move on to your own data.
I am in the middle of a demo that teaches the computer how to multiply, and I am sharing my progress on it: as Jeff suggested, I used the linear approach, in particular ADALINE. At this moment my program "knows" how to multiply by 5. This is the output I am getting:
1 x 5 ~= 5.17716232607829
2 x 5 ~= 10.147218373698
3 x 5 ~= 15.1172744213176
4 x 5 ~= 20.0873304689373
5 x 5 ~= 25.057386516557
6 x 5 ~= 30.0274425641767
7 x 5 ~= 34.9974986117963
8 x 5 ~= 39.967554659416
9 x 5 ~= 44.9376107070357
10 x 5 ~= 49.9076667546553
Let me know if you are interested in this demo. I'd be happy to share.
I want to generate a fictional job title from some information I have about the visitor.
For this, I have a table of about 30 different job titles:
01 CEO
02 CFO
03 Key Account Manager
...
29 Window Cleaner
30 Dishwasher
I'm trying to find a way to generate one of these titles from a few different variables like name, age, education history, work history and so on. I want it to be somewhat random but still consistent, so that the same variables always result in the same title.
I also want the different variables to have some impact on the result. Lower numbers are "better" jobs and higher numbers are "worse" jobs, but it doesn't have to be very accurate, just not completely random.
So take these two people as an example.
Name: Joe Smith
Number of previous employers: 10
Number of years education: 8
Age: 56
Name: Samantha Smith
Number of previous employers: 1
Number of years education: 0
Age: 19
Now the reason I want the name in there is to have a bit of randomness, so that two co-workers of the same age with the same background don't get exactly the same title. So I was thinking of using the number of letters in the name to mix it up a bit.
Now I can generate consistent numbers in an infinite number of ways, like the number of letters in the name * age * years of education * number of employers. This would come out as 35 840 for Joe Smith and 247 for Samantha Smith. But I want it to be a number between 1-30 where Samantha is closer to 25-30 and Joe is closer to 1-5.
Maybe this is more of a math problem than a programming problem, but I have seen a lot of "What's your pirate name?" and similar apps out there and I can't figure out how they work. "What's your pirate name?" might be a bad example, since it's probably completely random and I want my variables to matter somewhat, but the idea is the same.
What I have tried
I tried adding weights to variable groups so I would get an easier number to use in my calculations.
Age
01-20 5
20-30 4
30-40 3
40-50 2
...
Years of education
00-01 0
01-02 1
02-03 2
04-05 3
...
I added them together and played around with those numbers, but there were a lot of problems, like everyone ending up in pretty much the same mid-range (no one got to be CEO or dishwasher, everyone was somewhere in the middle), not to mention how messy the code was.
Is there a good way to accomplish what I want to do without having to build a massive math engine?
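int numberOfTitles = 30;
// Combine the hash of each field into one number that looks random.
// Note: string.GetHashCode() is not guaranteed to be stable across processes
// or .NET versions; use a fixed hash if the title must survive restarts.
var semiRandomID = person.Name.GetHashCode()
    ^ person.NumberOfPreviousEmployers.GetHashCode()
    ^ person.NumberOfYearsEducation.GetHashCode()
    ^ person.Age.GetHashCode();
// Widen to long first: Math.Abs(int.MinValue) throws an OverflowException.
var semiRandomTitle = (int)(Math.Abs((long)semiRandomID) % numberOfTitles);
// adjust semiRandomTitle as you see fit
semiRandomTitle += ((person.Age / 10) - 2);
semiRandomTitle += (person.NumberOfYearsEducation / 2);
// Keep the adjusted value inside the valid range of titles.
semiRandomTitle = Math.Max(0, Math.Min(numberOfTitles - 1, semiRandomTitle));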
The semiRandomID is a number generated from the hash of each component. The hashes are deterministic, so you will always generate the same number for "Joe", for example, but they don't mean anything. It's just a number. So we combine all those numbers and pick one job title out of the 30 available. Every person has roughly the same chance of getting each job title (some math freak will probably prove that there are edge cases to the contrary, but for all practical, non-cryptographic purposes, it's sufficient).
Now each person has one job title assigned that looks random. However, as it's math and not randomness, they will get the same every time.
Now let's assume Joe got Taxi-Driver, the number 20. However, he has 10 years of formal education, so you decide you want that aspect to have some weight. You could just add the years onto the job title number, but that would make anyone with 30 years of college parties CEO, so you decide (arbitrarily) that each year of education counts for half a job title. You add (NumberOfYearsEducation / 2) to the job title.
Let's assume Jane got CIO, the number 5. However, she is only 22 years old, a little young to be that high on the list. Again, you could just add the years onto the job title number, but that would make anyone with 30 years of age a CEO, so you decide (arbitrarily) that each year counts as 1/10 of a job title. In addition, you think that being very young should instead subtract from the job title, so the first 20 years act as a negative weight. The formula would then be ((Age / 10) - 2): one point for each 10 years of age, with the first 2 counting as negative.
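For comparison, here is a minimal Python sketch of the same idea that uses a fixed hash so the result does not change between runs. The function name, the use of SHA-256, and the choice to subtract the education and age bonuses (following the question's convention that lower numbers are better jobs) are assumptions, not part of the answer above.

```python
import hashlib


def job_title_number(name, employers, education_years, age, title_count=30):
    """Return a repeatable title number in 1..title_count for one person."""
    key = f"{name}|{employers}|{education_years}|{age}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Repeatable pseudo-random starting point in 1..title_count.
    base = int.from_bytes(digest[:8], "big") % title_count + 1
    # Assumed weighting: half a title per year of education and one title per
    # decade of age beyond 20; subtracting moves towards the "better" end.
    adjusted = base - education_years // 2 - (age // 10 - 2)
    # Clamp so the adjustments can never leave the table.
    return max(1, min(title_count, adjusted))


joe = job_title_number("Joe Smith", 10, 8, 56)
samantha = job_title_number("Samantha Smith", 1, 0, 19)
```

The weights are small compared with the hash-based starting point, so the name still dominates the outcome; making the weights larger shifts the balance towards the person's background.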