I am using a multi-dimensional SVM classifier (SVM.NET, a wrapper for libSVM) to classify a set of features.
Given an SVM model, is it possible to incorporate new training data without having to recalculate on all previous data? I guess another way of putting it would be: is an SVM mutable?
Actually, this is usually called incremental learning. The question has come up before and is pretty well answered here: A few implementation details for a Support-Vector Machine (SVM).
In brief, it's possible but not easy: you would have to modify the library you are using or implement the training algorithm yourself.
I found two possible solutions, SVMHeavy and LaSVM, that support incremental training. But I haven't used either and don't know anything about them.
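If a periodic full retrain is acceptable as a stopgap, the simplest workaround is to keep all training examples around and rebuild the SVM.NET model whenever new data arrives. Below is a minimal sketch of that idea; the Problem/Parameter/Training/Prediction names are how I recall SVM.NET wrapping libSVM, so treat them as assumptions and adapt them to your version of the wrapper:

// Not true incremental learning: a full retrain over old + new examples.
// The SVM.NET types used here (Node, Problem, Parameter, Training, Prediction)
// are assumptions based on how the wrapper exposes libSVM.
using System.Collections.Generic;
using System.Linq;
using SVM;

public class RetrainableSvm
{
    private readonly List<Node[]> _features = new List<Node[]>();
    private readonly List<double> _labels = new List<double>();
    private Model _model;

    public void Add(Node[] example, double label)
    {
        _features.Add(example);
        _labels.Add(label);
    }

    // Rebuilds the model from scratch on every call: O(all data), not incremental.
    public void Retrain(Parameter parameter)
    {
        int maxIndex = _features.SelectMany(f => f).Max(n => n.Index);
        var problem = new Problem(_features.Count, _labels.ToArray(),
                                  _features.ToArray(), maxIndex);
        _model = Training.Train(problem, parameter);
    }

    public double Predict(Node[] example)
    {
        return Prediction.Predict(_model, example);
    }
}

This obviously gets slow as the data set grows, which is exactly why the incremental implementations mentioned above exist.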
Online and incremental learning are similar but differ slightly. In online learning, training is generally a single pass over the data (epoch = 1), or the number of epochs can be configured. Incremental learning means you already have a model, no matter how it was built, and that model can then be updated with new examples. A combination of online and incremental learning is often what is actually required.
Here is a list of tools with some remarks on online and/or incremental SVMs: https://stats.stackexchange.com/questions/30834/is-it-possible-to-append-training-data-to-existing-svm-models/51989#51989
Over the last few months I have been working with Python and TensorFlow, building a neural network. The network performs pretty well on a large amount of data (the precision of my predictions is 85% after training on a data set of 120,000 records).
My neural network makes use of batch normalization, learning-rate decay and dropout, and it uses an Adam optimizer to minimize the loss. After training I store my model with a saver so that the mean/variance variables of batch_normalization are persisted:
saver = tf.compat.v1.train.Saver(tf.compat.v1.global_variables())
saver.save(sess, "sessionSave")
After searching for a proper way to convert this model to C#, I found a TensorFlow implementation for .NET (SciSharp). But I cannot find an implementation of batch_normalization there. In this specific case it is the following Python call that I need to convert:
Z_BN = tf.contrib.layers.batch_norm(Z, is_training=train, updates_collections=ops.GraphKeys.UPDATE_OPS, scope="scope" + str(i), reuse=True)
If there is a way to convert this call, I would also need to handle the saved mean/variance variables differently. I don't think I am able to implement batch_normalization myself. Can someone provide an implementation for this requirement?
If you only need your model to work in .NET for inference, it should be sufficient to convert it to some widely used format such as ONNX, which you can later use with ML.NET: https://learn.microsoft.com/en-us/dotnet/machine-learning/tutorials/object-detection-onnx
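For the ONNX route, scoring the exported model from ML.NET might look roughly like the sketch below. It assumes the Microsoft.ML, Microsoft.ML.OnnxTransformer and Microsoft.ML.OnnxRuntime packages; the tensor names "input"/"output" and the feature size of 10 are placeholders that must match whatever your exported graph actually declares:

using Microsoft.ML;
using Microsoft.ML.Data;

public class ModelInput
{
    // Size and column name are placeholders; use your graph's real input name/shape.
    [VectorType(10)]
    [ColumnName("input")]
    public float[] Features { get; set; }
}

public class ModelOutput
{
    [ColumnName("output")]
    public float[] Prediction { get; set; }
}

public static class OnnxScoring
{
    public static float[] Score(float[] features)
    {
        var mlContext = new MLContext();

        // Wrap the ONNX model as an ML.NET transform.
        var pipeline = mlContext.Transforms.ApplyOnnxModel(
            modelFile: "model.onnx",
            outputColumnNames: new[] { "output" },
            inputColumnNames: new[] { "input" });

        // Fit against an empty view just to materialize the transformer, then score.
        var emptyData = mlContext.Data.LoadFromEnumerable(new ModelInput[0]);
        var transformer = pipeline.Fit(emptyData);
        var engine = mlContext.Model.CreatePredictionEngine<ModelInput, ModelOutput>(transformer);

        return engine.Predict(new ModelInput { Features = features }).Prediction;
    }
}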
You may also try to use the model directly via ML.NET, but your mileage may vary: https://learn.microsoft.com/en-us/dotnet/machine-learning/tutorials/image-classification
If neither of these approaches works for you, or you need to fine-tune/train the model from C# too, you can try a commercial solution: Gradient, which exposes the entire TensorFlow 1.15 API to .NET. In that case, though, you would have to use the supported BatchNormalization class from either tensorflow.keras.layers or tensorflow.layers (the latter is deprecated) instead of the one from contrib. The contrib version is a wrapper anyway.
Disclaimer: I am the author of Gradient.
I'm currently a researcher at an AI company.
I need a serialization solution for objects that are structurally very similar but have vastly different types, interface/base-class members, and internal generic lists and arrays.
I'm working in C# due to the unique requirements of my work; porting to Java, for example, isn't an option.
Suffice to say, XML doesn't quite cut it: some NuGet-packaged upgrades of the Microsoft default appear to be a bit too static, or their patterns seem 'clumsy'.
My next line of research led to JSON (Json.NET).
However, I'm unsure if this is the best option, especially considering the complexity of the classes to be saved and the potential for a REST-based distribution architecture in the near future.
Thanks for your time and suggestions. Links to examples of your recommendations involving similarly complex class structures would be appreciated.
You should check out serialization and deserialization of dynamic objects; your JSON can be as complex as you need. This should give you some idea:
https://thewayofcode.wordpress.com/2012/09/18/c-dynamic-object-and-json-serialization-with-json-net/
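For the interface/base-class and generic-list requirement specifically, the Json.NET feature to look at is TypeNameHandling, which embeds a $type hint wherever the declared type differs from the runtime type so that polymorphic members round-trip. Here is a small sketch; the Shape/Circle/Box/Document types are invented purely for illustration:

using System.Collections.Generic;
using Newtonsoft.Json;

public abstract class Shape { public string Name { get; set; } }
public class Circle : Shape { public double Radius { get; set; } }
public class Box : Shape { public double Width { get; set; } public double Height { get; set; } }

public class Document
{
    public List<Shape> Shapes { get; set; } = new List<Shape>();
}

public static class PolymorphicJson
{
    private static readonly JsonSerializerSettings Settings = new JsonSerializerSettings
    {
        TypeNameHandling = TypeNameHandling.Auto,   // write $type only where needed
        Formatting = Formatting.Indented
    };

    public static string Save(Document doc)
    {
        return JsonConvert.SerializeObject(doc, Settings);
    }

    public static Document Load(string json)
    {
        return JsonConvert.DeserializeObject<Document>(json, Settings);
    }
}

One caveat: because TypeNameHandling embeds type names in the payload, only deserialize JSON from trusted sources with these settings.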
I am in a bit of a crisis here. I would really appreciate your help on the matter.
My Final Year Project is a "Location Based Product Recommendation Service". Now, due to some communication gap, we got stuck with an extremely difficult algorithm. Here is how it went:
We had done some research about recommendation systems prior to the project defense. We knew there were two approaches, "Collaborative Filtering" and "Content-Based Recommendation", and we had planned on using whichever technique gave us the best results. So, in essence, we were more focused on the end product than on the actual process.

The HOD asked us what algorithms OUR product would use. My group members thought that he meant the algorithms used for "Content-Based Recommendations" in general, and they answered with "Rule Mining, Classification and Clustering". He was astonished that we planned on using all of these algorithms for our project, and told us that he would accept our project proposal only if we used his algorithm. He gave us his research paper, without any other resources such as data, simulations or samples. The algorithm is named "Context Based Positive and Negative Spatio-Temporal Association Rule Mining"; in the paper it was used to recommend sites for hydrocarbon taps and mining, with extremely accurate results. Now here are a few issues I face:
I am not sure how or IF this algorithm fits in our project scenario
I cannot find spatio-temporal data, market-basket data sets, documentation or indeed any other helpful resource
I tried asking the HOD for the data he used for the paper, as a reference. He was unable to provide the data to me
I tried coding the algorithm myself, in an incremental fashion, but found I was completely out of my depth. I divided the algorithm into three phases: Positive Spatio-Temporal Association Rule Mining, Negative Spatio-Temporal Association Rule Mining, and Context-Based Adjustments. Alas, the code I write is not mature enough. I couldn't even generate frequent itemsets properly. I understand the theory quite well, but I am not able to translate it into efficient code.
When the algorithm has been coded, I need to develop a web service. We also need a client website to access the web service. But with the code not even 10% done, I really am panicking. The project submission is in a fortnight.
Our supervisor is an expert in Artificial Intelligence, but he cannot guide us in developing the algorithm. He stresses the importance of reuse and of utilizing open-source resources, but I am unable to find anything of actual use.
My group members are waiting on me to deliver the algorithm, so they can deploy it as a web service. There are other adjustments that need to be done, but with the algorithm not available, there is nothing we can do.
I have found a data set of market baskets. It's a simple Excel file with about 9000 transactions. There is no spatial or temporal data in it, and I fear adding artificial data would compromise its integrity.
I would appreciate it if somebody could guide me. I guess the best approach would be to use an open-source API to partially implement the algorithm and then build the service and client application. We need to demonstrate something on the 17th of June. I am really looking forward to your help, guidance and constructive criticism. Some solutions that I have considered are:
Use "User Clustering" as a "Collaborate Filtering" technique. Then
recommend the products from similar users via an alternative "Rule
Mining" algorithm. I need all these algorithms to be openly available
either as source code or an API, if I have any chance of making this
project on time.
Drop the algorithm altogether and build a project that actually works as we intended, using available resources. I am 60% certain that we would fail or be marked extremely low.
Pay a software house to develop the algorithm for us and then shoehorn it into our project. I am not inclined to do this because it would be unethical.
As you can clearly see, my situation is quite dire. I really do need extensive help and guidance if I am to complete this project properly and on time. The project needs to be completely deployed and operational. I really am stuck in a loop here.
"Collaborative Filtering", "Content Based Recommendation", "Rule Mining, Classification and Clustering"
None of these are algorithms. They are tasks or subtasks, for each of which several algorithms exist.
I think you got off to a bad start by not really knowing well enough what you were proposing... but granted, the advice from your advisor was not at all helpful either.
I tried to Google this but didn't find a decent tutorial with sample code.
Has anyone used typed DataSets/DataTables in C#?
Are they available in .NET 3.5 and above?
To answer the second parts of the question (not the "how to..." from the title, but the "does anyone..." and "is it...") - the answer would be a yes, but a yes with a pained expression on my face. For new code, I would strongly recommend looking at a class-based model; pick your poison between the many ORMs, micro-ORMs, and raw ADO.NET. DataTable itself does still have a use, in particular for processing and storing unpredictable data (where you have no idea what the schema is in advance). By the time you are talking about typed data-sets, I would suggest you obviously know enough about the type that this no longer applies, and an object-model is a very valid alternative.
It is still a supported part of the framework, and it is still in use as a technology. It has some nice features like the diff-set. However, most (if not all) of that is also available against an object-based design, with classes and properties (without the added overhead of the DataTable abstraction).
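To make the contrast concrete, here is a small sketch; the Reading type and the sample data are invented for illustration. The DataTable half is for data whose schema you only learn at runtime, while the class-based half is the kind of object model that LINQ, an ORM or a micro-ORM would work against:

using System;
using System.Data;
using System.Linq;

// Class-based model: when the shape is known up front, a plain class is
// usually simpler and lighter than a typed DataSet.
public class Reading
{
    public string Sensor { get; set; }
    public double Value { get; set; }
}

public static class DataTableVsPoco
{
    public static void Demo()
    {
        // DataTable: handy when the schema is only discovered at runtime.
        var table = new DataTable("Readings");
        table.Columns.Add("Sensor", typeof(string));
        table.Columns.Add("Value", typeof(double));
        table.Rows.Add("A", 1.5);
        table.Rows.Add("B", 2.0);

        // The same data as objects, queried with LINQ.
        var readings = new[]
        {
            new Reading { Sensor = "A", Value = 1.5 },
            new Reading { Sensor = "B", Value = 2.0 },
        };
        Console.WriteLine(readings.Average(r => r.Value));
    }
}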
MSDN has guidance. It really hasn't changed since typed datasets were first introduced.
http://msdn.microsoft.com/en-us/library/esbykkzb(v=VS.100).aspx
There are tons of videos available here: http://www.learnvisualstudio.net/series/aspdotnet_2_0_data_access_and_databinding/
And I found one more tutorial here: http://www.15seconds.com/issue/031223.htm
Sparingly... Unless you need them to maintain legacy software, learn an ORM or two instead, particularly in conjunction with LINQ.
Some of my colleagues have them, the software I work on doesn't use them at all, on account of some big mouth developer getting his way again...
I'm implementing my own BigNumber class in C# for educational purposes. For a start, I intend it to cover the basic arithmetic and relational operators plus certain math methods. The values will be stored in a byte array.
Could you give me some tips on how I should design such a class, or rather on the proper way of designing such a class?
Edit:
I'm not asking for help on how to implement the specific operators and methods. I'd like to know how the class should be structured internally.
I did that once in C++. I recommend that you read The Art of Computer Programming; Volume 2 has all the details of the algorithms for implementing big numbers. It's a great resource (for this and many other problems).
The book should be available from most public libraries around you (or any university library).
By the way, there is no need to read the whole book; you can just use it as a reference for the algorithms that you need.
UPDATE: As for the API, you should try to mimic the existing APIs for numbers in .NET, something like Int32.
As for the internal class design, it should be pretty straightforward because there should be very few units interacting. You could abstract the "storage" (byte array) part away and iterate over the "digits" using standard iterators over some generic storage provider. This would allow you to change to use int arrays for example. If you do this then you can automatically change the base of your numbers and enable your implementation to store "more" per digit. This implies that the base of the operations won't be static but would be determined by the "digit" size.
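As a very rough and untested sketch of that structure, assuming base-256 digits stored least-significant-first in a byte array, it might look something like this (only addition of magnitudes is shown; sign handling, the other operators and any storage abstraction are omitted):

using System;

public sealed class BigNumber
{
    // One "digit" per byte, base 256, least significant digit first.
    private readonly byte[] _digits;

    private BigNumber(byte[] digits)
    {
        _digits = TrimMostSignificantZeros(digits);
    }

    public static BigNumber FromUInt64(ulong value)
    {
        // BitConverter is little-endian on most platforms, which matches our layout.
        return new BigNumber(BitConverter.GetBytes(value));
    }

    public static BigNumber operator +(BigNumber a, BigNumber b)
    {
        int n = Math.Max(a._digits.Length, b._digits.Length);
        var result = new byte[n + 1];
        int carry = 0;
        for (int i = 0; i < n; i++)
        {
            int sum = carry
                    + (i < a._digits.Length ? a._digits[i] : 0)
                    + (i < b._digits.Length ? b._digits[i] : 0);
            result[i] = (byte)(sum & 0xFF);
            carry = sum >> 8;
        }
        result[n] = (byte)carry;
        return new BigNumber(result);
    }

    private static byte[] TrimMostSignificantZeros(byte[] digits)
    {
        int len = digits.Length;
        while (len > 1 && digits[len - 1] == 0) len--;
        Array.Resize(ref digits, len);
        return digits;
    }
}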
I had fun implementing mine, it's a simple but nice project. In my case I didn't go fancy with the internal design. Good luck!
Wikipedia has a pretty good reference page on Arbitrary-precision arithmetic. It provides a general overview of many of the issues you'll face, as well as links to various implementations.
Apart from what has already been said, you can find the BigNumber project on codeplex.com, so you can look at the source code for several implementations. Check http://bignumber.codeplex.com/.