Populating a Dictionary with PdfOutlines

I am attempting to write a program that combines a PDF from a list of PDFs that are already available. I've got most of it done to this point, but I'm having an issue with one step in particular. I'm writing this in C#, if that isn't apparent.
I have an array of strings that contains category names, and for each category name I want to create a variable of type PdfOutline, initialized to null, that I can iterate through later in the program.
I've tried to look into this myself and it seems like a dictionary is the way to do it, but I'm not really sure how to go about it. Firstly, is a dictionary the right way to do it, and secondly, how would I implement it?
Thanks for your time!

Something like the following would work (not syntax-checked this, sorry):
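// This requires the array of categories to have no duplicates
// (ToDictionary throws an ArgumentException on a duplicate key).
// Needs "using System.Linq;" at the top of the file.
public Dictionary<string, PdfOutline> BuildUpMyCollectionOfOutlines(string[] categories)
{
    // Key selector first, then value selector; every value starts out as null.
    return categories.ToDictionary(cat => cat, cat => (PdfOutline)null);
}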
If you do it this way, then you can later consume the result as follows (having a function for this is silly; it is just my way of showing you how to consume it):
public PdfOutline GetOutlineByCategory(Dictionary<string, PdfOutline> outlines, string category)
{
    // This will be problematic if the category isn't actually in the dictionary.
    return outlines[category];
}
Whether you should use a Dictionary<string, PdfOutline> versus something else, like a List<KeyValuePair<string, PdfOutline>>, depends on 1) how many of these you'll have and 2) how you'll be accessing them. For example, if you have 10,000 of them and you need to find them by category name randomly and repeatedly, then the Dictionary is the right approach because it hashes the keys for quicker searching (think of an index on a table in a database). However, if you have 10,000 but only need to find 2 of them, or conversely only have 10 of them, then the overhead of building up that quick-searching capability is wasted. So Dictionary vs. other is best answered by the question, "If this were in a database table, would you index it?"
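As a minimal sketch of the dictionary approach with a lookup that does not throw on a missing category: PdfOutlineStub below is a hypothetical stand-in for the PDF library's PdfOutline type (so the sample compiles on its own), and OutlineCatalog, Build and TryGetOutline are names made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the PDF library's PdfOutline type,
// so that this sketch compiles without the library.
public class PdfOutlineStub
{
    public string Title { get; set; }
}

public static class OutlineCatalog
{
    // Maps each distinct category name to a null placeholder.
    // Distinct() avoids the ArgumentException that ToDictionary
    // throws on a duplicate key.
    public static Dictionary<string, PdfOutlineStub> Build(IEnumerable<string> categories)
    {
        return categories
            .Distinct()
            .ToDictionary(cat => cat, cat => (PdfOutlineStub)null);
    }

    // Returns false instead of throwing KeyNotFoundException
    // when the category was never registered.
    public static bool TryGetOutline(
        Dictionary<string, PdfOutlineStub> outlines,
        string category,
        out PdfOutlineStub outline)
    {
        return outlines.TryGetValue(category, out outline);
    }
}
```

The placeholders can be filled in later with outlines[category] = someOutline, and a foreach over the dictionary visits every category/outline pair.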

Related

Is it OK to use dictionaries when I don't need to quickly access their values?

Normally, I use a dictionary like a list, but with a key of a different type. I like the ability to quickly access individual items in the dictionary without having to loop through it until I find the item with the right property (because the property I'm looking for is in the Key).
But there is another possible use of a dictionary. I could just use the Key to store property A and the Value to store property B without ever using the dictionary's special functionality. For example, I could store a list of persons just by storing the forename in the key and the family name in the value (let's assume, for the sake of simplicity, that there won't ever be two people with the same forename, because I just couldn't come up with a better example). I would only use that dictionary to loop through it in a foreach loop and add items to it (no removing, sorting or accessing individual items). There would actually be no difference between using a List<KeyValuePair<string, string>> and using a Dictionary<string, string> (at least not in the example that I gave; I know that I could, e.g., store multiple items with the same key in the list).
So, to sum it up, what should I do when I don't need to use the special functionalities a dictionary provides and just use it to store something that has exactly two properties:
use a Dictionary<,>
use a List<KeyValuePair<,>>
use a List<MyType> with MyType being a custom class that contains the two properties and a constructor.

Don't use dictionaries for that.
If you don't want to create a class for this purpose, use something like List<Tuple<T1,T2>>. But keep in mind a custom class will be both more readable and more flexible.
Here's the reason: your code will be much easier to read if you use proper data structures. Using a dictionary will only confuse the reader, and you'll have problems the day a duplicate key shows up.
If someone reads your code and sees a Dictionary being used, he will assume you really mean to use a map-like structure. Your code should be clear and your intent should be obvious when reading it.

If you're concerned with performance you should probably store the data in a List. A Dictionary has a lot of internal overhead, in both memory and CPU.
If you're just concerned with readability, choose the data structure that best captures your intent. If you are storing key-value pairs (for example, custom fields in a bug tracker issue) then use a Dictionary. If you are just storing items without them having some kind of logical key, use a List.
It takes little work to create a custom class to use as an item in a List. Using a Dictionary just because it gives you a Key property for each item is a misuse of that data structure. It is easy to create a custom class that also has a Key property.

Use List<MyType> where MyType includes all the values.
The problem with the dictionary approach is that it's not flexible. If you later decide to add middle names, you'll need to redesign your whole data structure, rather than just adding another field to MyType.
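A small sketch of the List<MyType> option, under the assumption that the item type is a Person with Forename and FamilyName properties (both names, and the PersonDemo helper, are invented for this example):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical item type; the property names are assumptions for illustration.
public class Person
{
    public string Forename { get; }
    public string FamilyName { get; }

    public Person(string forename, string familyName)
    {
        Forename = forename;
        FamilyName = familyName;
    }
}

public static class PersonDemo
{
    public static List<Person> BuildPeople()
    {
        // Unlike a Dictionary keyed on forename, a list accepts
        // two people who share a forename.
        return new List<Person>
        {
            new Person("Ada", "Lovelace"),
            new Person("Ada", "Yonath"),
            new Person("Alan", "Turing"),
        };
    }

    public static List<string> FullNames(IEnumerable<Person> people)
    {
        var names = new List<string>();
        foreach (var person in people)
        {
            names.Add(person.Forename + " " + person.FamilyName);
        }
        return names;
    }
}
```

Adding a middle name later would only mean adding one property to Person; the List<Person> itself and the loops over it would not need to change shape.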

Multiple options for keys in a dictionary?

Half the time I need to find a value based on string, their name, and the other half I need to find a value based on an int, their user ID.
Currently I have two dictionaries to solve this dilemma - one that uses a string as a key and one that uses an int as a key. I was wondering if there is a more efficient way to do this - a way to get a value based on int or string.
public static Dictionary<int, Player> nPlayers = new Dictionary<int, Player>();
public static Dictionary<string, Player> sPlayers = new Dictionary<string, Player>();
After scanning the other questions, someone mentioned using a dictionary of dictionaries. If anyone can elaborate on this (if it's the solution I'm looking for), that'd be grand
I don't know much about a tuple, but from what I understand it requires two keys, and what I am looking for takes one or the other.
Would Dictionary<object, Player> do the trick? I have no idea.
Please help me in my narrow-minded coding experience. ;_;

As per your comment, you add users to a dictionary when they log in to the system, so here you have to add them to both dictionaries.
I think you can do this another way:
public static List<Player> nPlayers = new List<Player>();
That's all you need; add players when they log in.
If you want to search by ID, name or whatever, you can query nPlayers and find the Player. Note that each query is a linear scan of the list, unlike a dictionary lookup.
var playerByID = nPlayers.Where(p => p.ID == givenID).FirstOrDefault();
var playerByName = nPlayers.Where(p => p.Name == givenName).FirstOrDefault();

I don't think having a Dictionary<object,Player> is a better idea than having two distinct dictionaries. It will probably take the same amount of memory (since each Player reference will still be stored twice in the unified dictionary), will probably be less clear, and might (conceivably) cause problems with hashcode collisions since your key can be several different types.
I would just keep two dictionaries, PlayersByName and PlayersByID, and use them when appropriate.

Do you want to know in the future the original data type that was put in the Dictionary? If not, you have two options:
Stringly typed! - Just use a string as the key and when adding to it, call .ToString() on the integers :)
Be objective - Use an object as the key, that way you can put anything you like inside it.
Based on the 2, I'd recommend the first as you still have some kind of type restrictions in there.
If you do want to know the original data type in the future - your implementation is fine :)

Your solution (two dictionaries) is the correct one. A Dictionary can only be indexed by one stable key. As a result, you have to keep separate dictionaries to index by different keys.
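One way to keep the two dictionaries from drifting apart is to hide them behind a single class. The following is only a sketch: the Player members (Id, Name) are assumed, since the question does not show them, and PlayerRegistry is a made-up name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical Player shape; the original post does not show its members.
public class Player
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class PlayerRegistry
{
    private readonly Dictionary<int, Player> _byId = new Dictionary<int, Player>();
    private readonly Dictionary<string, Player> _byName = new Dictionary<string, Player>();

    // Adds the player to both indexes, or to neither if either key is taken.
    public bool TryAdd(Player player)
    {
        if (_byId.ContainsKey(player.Id) || _byName.ContainsKey(player.Name))
        {
            return false;
        }
        _byId.Add(player.Id, player);
        _byName.Add(player.Name, player);
        return true;
    }

    // Removes the player from both indexes in one step.
    public bool Remove(int id)
    {
        Player player;
        if (!_byId.TryGetValue(id, out player))
        {
            return false;
        }
        _byId.Remove(id);
        _byName.Remove(player.Name);
        return true;
    }

    public bool TryGet(int id, out Player player)
    {
        return _byId.TryGetValue(id, out player);
    }

    public bool TryGet(string name, out Player player)
    {
        return _byName.TryGetValue(name, out player);
    }
}
```

Both indexes hold references to the same Player object, so the extra cost is one reference per player, not a copy. The sketch assumes Id and Name do not change while a player is registered; if they can, the player would have to be removed and re-added.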

List of Dictionaries (Search optimization, C#)

I have a List of Dictionaries, List<Dictionary<String,Object>>. The key is an identifier of some abstract record. These Dictionaries come from various places. The size of each Dictionary is in the range [0, 1000].
All Dictionaries contain unique keys. After accumulating some Dictionaries I must make a search by key. It could be done by iterating the List and calling search method on every Dictionary or it could be done by copying all Dictionaries into one. These approaches do not offer very good performance. I am interested in ways to optimize this task.
Edit:
Thank you guys! Maybe I'll change the accumulation method and as result eliminate the problem itself!

Are you expecting there to be lots of key fetches after an initial population phase? If so, amalgamate everything into a single dictionary. If you'll only be doing a few fetches, I can't see any way you could get better than asking every dictionary.
Of course you could create a hybrid approach: create a new (initially empty) dictionary for the amalgamated results, and populate it as you're asked for keys - by searching through all the rest each time you're asked for a key which isn't already in your "big" dictionary.
Is there no way of predicting which dictionary would have a particular key?
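The hybrid approach described above could look roughly like this. CachingLookup is a hypothetical name, and the sketch assumes the source dictionaries are not modified after the first lookup (otherwise the cache could return stale values).

```csharp
using System;
using System.Collections.Generic;

public class CachingLookup
{
    private readonly List<Dictionary<string, object>> _sources;
    private readonly Dictionary<string, object> _cache = new Dictionary<string, object>();

    public CachingLookup(List<Dictionary<string, object>> sources)
    {
        _sources = sources;
    }

    public bool TryGetValue(string key, out object value)
    {
        // Fast path: the key has been requested before.
        if (_cache.TryGetValue(key, out value))
        {
            return true;
        }

        // Slow path: ask each source dictionary in turn, then remember the hit.
        foreach (var source in _sources)
        {
            if (source.TryGetValue(key, out value))
            {
                _cache[key] = value;
                return true;
            }
        }

        value = null;
        return false;
    }
}
```

The first request for a key costs one probe per source dictionary; repeated requests for the same key cost a single probe. Keys that are never found are not cached here, so repeated misses still visit every source.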

If there is any way to determine the dictionary of interest from a key, you can naturally try to create a cross-association table that maps each key to its dictionary.
If not, IMHO, I don't see any option other than iterating over the collection and asking each dictionary for the key, maybe using a standard for loop rather than nicer LINQ code.

Adding to what Jon said, there is a library called PowerCollections which contains MultiDictionary. If my memory serves, you can use it for the purpose mentioned.
http://powercollections.codeplex.com/discussions/242163

It sounds like you have lots of dictionaries to "speed up" (assumption of motive) searches that are limited to certain "abstract record" types.
You can get away with a single dictionary: on limited searches, check that the result is of the required abstract record type after finding it, rather than maintaining a separate dictionary for each and every abstract record type as at present.

c# Lookup for a value in Dictionary of Dictionary

I am in the process of implementing a design where information is organized in the following way:
Multiple sources will feed us with information on multiple cases.
Each source is identified by a SourceID string and each case is identified by a CaseID string.
The information packet will be encapsulated as InfoObject.
I thought of coding it as:
// MasterDB :(0..m)SourceID -- 1..n -> [CaseID, InfoObject]
//
private Dictionary<string, Dictionary<string, InfoObject>> MasterDB;
class InfoObject
{
    public string user; // public so that the lookup code can read it
}
This way addition and removal become very easy, as they use SourceID and CaseID as keys.
However, lookup is a little special. I want to look up a particular user (which is embedded inside the InfoObject).
How should I reorganize this so that things become a little more efficient for both lookup and addition/removal?
UPDATE:
I tried a different approach using LINQ
var targetList = from entry in MasterDB
from entrytarget in entry.Value
where (entrytarget.Value.user == username)
select entrytarget.Value;
A minor issue is that the result is an IEnumerable rather than a list. I'm not sure if I can make LINQ output it some other way.

You could just have a separate lookup for users:
Dictionary<string, InfoObject> userLookup;
This only makes sense if you want to optimize for lookup speed, of course. The downside is that addition and removal now have to be done on two separate data structures, which have to be kept in sync.

I would propose to maintain additionally another dictionary, mapping the user into the appropriate collection of all relevant SourceID/CaseID pairs.

I would use a Dictionary<string, IList<InfoObject>> since nothing guarantees a user can't have more than one SourceID/CaseID pair.
Whenever you insert something in your other dictionary, you also insert the same InfoObjects in that dictionary of IList. This way the InfoObject retrieved is the same object.
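A rough sketch of that secondary index, wrapped in one class so that every add and remove touches both structures. CaseStore and its method names are invented for illustration, and InfoObject is given a public User property here (the question only shows a user field), which is an assumption.

```csharp
using System;
using System.Collections.Generic;

public class InfoObject
{
    public string User { get; set; }
}

public class CaseStore
{
    // sourceId -> (caseId -> info), as in the question's MasterDB.
    private readonly Dictionary<string, Dictionary<string, InfoObject>> _master =
        new Dictionary<string, Dictionary<string, InfoObject>>();

    // user -> all InfoObjects for that user (the secondary index).
    private readonly Dictionary<string, List<InfoObject>> _byUser =
        new Dictionary<string, List<InfoObject>>();

    public void Add(string sourceId, string caseId, InfoObject info)
    {
        Dictionary<string, InfoObject> cases;
        if (!_master.TryGetValue(sourceId, out cases))
        {
            cases = new Dictionary<string, InfoObject>();
            _master[sourceId] = cases;
        }
        cases.Add(caseId, info); // throws if the case already exists

        List<InfoObject> forUser;
        if (!_byUser.TryGetValue(info.User, out forUser))
        {
            forUser = new List<InfoObject>();
            _byUser[info.User] = forUser;
        }
        forUser.Add(info);
    }

    public bool Remove(string sourceId, string caseId)
    {
        Dictionary<string, InfoObject> cases;
        InfoObject info;
        if (!_master.TryGetValue(sourceId, out cases) || !cases.TryGetValue(caseId, out info))
        {
            return false;
        }
        cases.Remove(caseId);
        _byUser[info.User].Remove(info); // same object instance in both structures
        return true;
    }

    public IReadOnlyList<InfoObject> FindByUser(string user)
    {
        List<InfoObject> forUser;
        return _byUser.TryGetValue(user, out forUser) ? forUser : new List<InfoObject>();
    }
}
```

Lookup by user becomes a single dictionary probe, at the cost of extra bookkeeping on add and remove. Removal from the per-user list is a linear scan of that user's entries, which is presumably acceptable when each user has few cases.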

Common problem for me in C#, is my solution good, stupid, reasonable? (Advanced Beginner)

Ok, understand that I come from Cold Fusion so I tend to think of things in a CF sort of way, and C# and CF are as different as can be in general approach.
So the problem is: I want to pull a "table" (thats how I think of it) of data from a SQL database via LINQ and then I want to do some computations on it in memory. This "table" contains 6 or 7 values of a couple different types.
Right now, my solution is that I do the LINQ query using a generic List of a custom type. My example is the RelevanceTable. I pull some data out and want to evaluate it, starting with .Contains. It appears that .Contains wants to compare whole items or nothing. So I can use it if I have List<string>, but if I have List<ReferenceTableEntry>, where ReferenceTableEntry is my custom type, I would need to implement IEquatable<ReferenceTableEntry> and tell the compiler what exactly "Equals" means.
While this doesn't seem unreasonable, it does seem like a long way to go for a simple problem so I have this sneaking suspicion that my approach is flawed from the get go.
If I want to use LINQ and .Contains, is implementing the interface the only way? It seems like there should just be a way to say which field to operate on. Is there another collection type besides List that maybe has this ability? I have started using List a lot for this, and while I have looked and looked, I see some other but not necessarily superior approaches.
I'm not looking for some fine point of performance or compactness or readability, just wondering if I am using a Phillips head screwdriver on a hex screw. If my approach is a "decent" one, but not the best, of course I'd like to know a better one, but just knowing that it's in the ballpark would give me a little "Yeah! I'm not stupid!" and I would at least finish what I am doing completely before switching to another method.
Hope I explained that well enough. Thanks for your help.

What exactly is it you want to do with the table? It isn't clear. However, the standard LINQ (-to-Objects) methods will be available on any typed collection (including List<T>), allowing any range of Where, First, Any, All, etc.
So: what is it you are trying to do? If you had the table, what value(s) do you want?
As a guess (based on the Contains stuff) - do you just want:
bool found = table.Any(x => x.Foo == foo); // or someObj.Foo
?

Some of the methods in the List class take a delegate (optionally in the form of a lambda expression) that you can use to specify what field to look for.
For example, to look for the item where the Id property is 42:
ReferenceTableEntry found = theList.Find(r => r.Id == 42);
The found variable will have a reference to the first item that matches, or null if no item matched.
There are also some LINQ extensions that take a delegate or an expression. This will do the same as the Find method:
ReferenceTableEntry found = theList.FirstOrDefault(r => r.Id == 42);

Ok, so if I'm reading this correctly you want to use the contains method. When using this with collections of objects (such as ReferenceTableEntry) you need to be careful because what you're saying is you're checking to see if the collection contains an object that IS the same as the object you're comparing against.
If you use the .Find() or .FindAll() method you can specify the criteria that you want to match on using an anonymous method.
So for example if you want to find all ReferenceTableEntry records in your list that have an Id greater than 1 you could do something like this
List<ReferenceTableEntry> listToSearch = //populate list here
var matches = listToSearch.FindAll(x => x.Id > 1);
matches will be a list of ReferenceTableEntry records that have an ID greater than 1.
Having said all that, it's not completely clear that this is what you're trying to do.

Here is the LINQ query involved that creates the object I am talking about, and the problem line is:
.Where (searchWord => queryTerms.Contains(searchWord.Word))
List<queryTerm> queryTerms = MakeQueryTermList();
public static List<RelevanceTableEntry> CreateRelevanceTable(List<queryTerm> queryTerms)
{
    SearchDataContext myContext = new SearchDataContext();
    var productRelevance = (from pwords in myContext.SearchWordOccuranceProducts
                            where (myContext.SearchUniqueWords
                                .Where (searchWord => queryTerms.Contains(searchWord.Word))
                                .Select (searchWord => searchWord.Id)).Contains(pwords.WordId)
                            orderby pwords.WordId
                            select new {pwords.WordId, pwords.Weight, pwords.Position, pwords.ProductId});
}
This query returns a list of WordIds that match the submitted search string (when it was a List<string> and it was just the word, that worked fine because, as an answerer mentioned before, they were the same type of objects). My custom type here is queryTerms, a List that contains WordId, ProductId, Position, and Weight. From there I go about calculating the relevance by doing various operations on the created object: sum "Weight" by product, use position matches to bump up Weights, etc. My point in keeping this separate was that the rules for doing those operations will change, but the basic factors involved will not. I would have even rather it be MORE separate (I'm still learning, I don't want to get fancy) but the rules for local and interpreted LINQ queries seem to trip me up when I do.
Since CF has supported queries of queries forever, that's how I tend to lean. Pull the data you need from the db, then do your operations (which includes queries with Aggregate functions) on the in-memory table.
I hope that makes it more clear.
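One possible way around the type mismatch in the problem line is to project the custom type down to plain strings before calling Contains, so that strings are compared with strings. The sketch below runs purely in memory; QueryTerm, UniqueWord and WordMatcher are hypothetical shapes (only the Word, Id and WordId names appear in the post), and whether a given LINQ provider translates the same query to SQL is not verified here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shapes; only the Word and Id names appear in the post.
public class QueryTerm
{
    public string Word { get; set; }
}

public class UniqueWord
{
    public int Id { get; set; }
    public string Word { get; set; }
}

public static class WordMatcher
{
    public static List<int> MatchingWordIds(
        IEnumerable<UniqueWord> uniqueWords,
        IEnumerable<QueryTerm> queryTerms)
    {
        // Project the custom type down to plain strings once, so that
        // Contains compares string to string and needs no custom equality.
        List<string> words = queryTerms.Select(term => term.Word).ToList();

        return uniqueWords
            .Where(searchWord => words.Contains(searchWord.Word))
            .Select(searchWord => searchWord.Id)
            .ToList();
    }
}
```

An equivalent in-memory alternative is queryTerms.Any(term => term.Word == searchWord.Word), which names the field to compare directly.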
