Get all items in a category? - c#

I only get 100 results. Code excerpt:
FindingServicePortTypeClient client = FindingServiceClientFactory.getServiceClient(config);
FindItemsByCategoryRequest req = new FindItemsByCategoryRequest();
req.categoryId = new string[] { "1249" };
req.sortOrder = SortOrderType.BestMatch;
req.sortOrderSpecified = true;
req.itemFilter = new ItemFilter[]
{
new ItemFilter()
{
name = ItemFilterType.EndTimeFrom,
value = new string[]
{
DateTime.Now.Add(TimeSpan.FromHours(1)).ToString("yyyy-MM-ddTHH:mm:ss")
}
}
};
PaginationInput pi = new PaginationInput();
pi.entriesPerPage = int.MaxValue;
pi.entriesPerPageSpecified = true;
req.paginationInput = pi;
FindItemsByCategoryResponse response = client.findItemsByCategory(req);
As you can see I tried using int.MaxValue, but to no avail. Is it not possible to get all items from a category?

First off, eBay limits the pagination input to 100 entries per page and caps the total number of items returned for any particular search at 10,000 (see http://developer.ebay.com/Devzone/finding/CallRef/findItemsByCategory.html#Request.paginationInput). So this won't work, however reasonable the request seems. Think of the server load they would have to deal with if you could pull 100,000+ items in a single call.
You might think there is still a clever way around this by staying under the per-call limits and paging through everything. But according to http://developer.ebay.com/Devzone/finding/CallRef/findItemsByCategory.html#Request.paginationInput.pageNumber (two entries below the first link), you can't access results past the 100th page. So at 100 results per page and 100 pages, you can only ever reach the first 10,000 items; starting at page 101 is simply disallowed. Again, this is probably because of the resources it would take them to serve results past that point.
Sorry about that, but that's the full story.
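Within those limits, the most you can do is walk the pages one at a time. Here is a minimal sketch of such a loop. The fetchPage callback is hypothetical and stands in for a call to client.findItemsByCategory with the page number set on the PaginationInput; the two constants simply restate the limits quoted above.

```csharp
using System;
using System.Collections.Generic;

public static class PagedFetch
{
    // Limits as quoted in the answer above (assumed still current).
    public const int MaxEntriesPerPage = 100;
    public const int MaxPageNumber = 100;

    // fetchPage is a hypothetical callback: given a 1-based page number and a
    // page size, it returns the items on that page (fewer than the page size,
    // or none, once the results run out).
    public static List<T> FetchAll<T>(Func<int, int, IReadOnlyList<T>> fetchPage)
    {
        var all = new List<T>();
        for (int page = 1; page <= MaxPageNumber; page++)
        {
            IReadOnlyList<T> items = fetchPage(page, MaxEntriesPerPage);
            all.AddRange(items);

            // A short page means there is nothing left to request.
            if (items.Count < MaxEntriesPerPage)
            {
                break;
            }
        }
        return all;
    }
}
```

With the question's code, the callback would build the request as before, set pi.entriesPerPage to the page size and pi.pageNumber to the page (presumably with a matching pageNumberSpecified flag, by analogy with entriesPerPageSpecified), and return the items from the response. To reach more than 10,000 items you would have to split the search into narrower queries, for example by subcategory or by end-time window.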

Related

List populating with repeating data when it shouldn't

I am working on a programming project for class, and I wanted to add something extra to the project by randomly generating data for it. My issue is that I have a list populating with copies of the same data even though it seems to be generating completely different things each time a new object is created. When I attempt to debug, I encounter very strange behavior. This is my code:
private void PopulateAtRandom(int amount)
{
// create a list of first names from a text file of 200 names
List<string> firstnames = new List<string>();
StreamReader reader = new StreamReader("Random First Names.txt");
while (!reader.EndOfStream)
firstnames.Add(reader.ReadLine());
reader.Close();
// create a list of last names from a text file of 500 names
List<string> lastnames = new List<string>();
reader = new StreamReader("Random Last Names.txt");
while (!reader.EndOfStream)
lastnames.Add(reader.ReadLine());
reader.Close();
// create a list of majors from a text file of 198 majors
List<string> majors = new List<string>();
reader = new StreamReader("Majors.txt");
while (!reader.EndOfStream)
{
string line = reader.ReadLine();
majors.Add(line.Substring(0, line.IndexOf(" - ")));
}
reader.Close();
// create a list of high schools from a text file of 860 schools
List<string> highschools = new List<string>();
reader = new StreamReader("All Illinois High Schools.txt");
while (!reader.EndOfStream)
highschools.Add(reader.ReadLine().Split(',')[0]);
reader.Close();
// create a list of colleges from a text file of 9436 schools
List<string> colleges = new List<string>();
reader = new StreamReader("All US Colleges.txt");
while (!reader.EndOfStream)
colleges.Add(reader.ReadLine());
reader.Close();
students = new List<Student>();
for (int i = 0; i < amount; i++)
{
bool graduate = random.NextDouble() >= 0.5;
string fName = firstnames[random.Next(firstnames.Count)];
string lName = lastnames[random.Next(lastnames.Count)];
string major = majors[random.Next(majors.Count)];
int gradYear = RandomGauss(1950, 2017, 2013, (graduate ? 10 : 4));
string prevSchool = graduate ? colleges[random.Next(colleges.Count)]
: highschools[random.Next(highschools.Count)];
string otherInfo = graduate ? RandomWithDefault<string>(major, 0.05, majors)
: "" + RandomGauss(0, 60, 0, 15) + " transfer credits";
Student student = new Student(graduate, fName, lName, major, gradYear, prevSchool, otherInfo);
students.Add(student); /* I put a breakpoint here for debugging */
}
}
/**
* <summary>
* Return a random integer in the given range based on the specified gaussian distribution
* </summary>
*/
private int RandomGauss(int min, int max, double mean, double sigma){...}
/**
* <summary>
* Randomly return either the default value or a different value based on the given odds
* </summary>
*/
private T RandomWithDefault<T>(T defaultValue, double oddsOfOther, List<T> otherOptions){...}
private void buttonSubmit_Click(object sender, EventArgs e)
{
for (int i = 0; i < students.Count; i++)
{
Student student = students[i];
listBox.Items.Add(student); /* I put another breakpoint here for debugging */
}
}
I have been using PopulateAtRandom(1000); in my constructor. When buttonSubmit_Click() is called, listBox will display one of two things. The first entry is always unique, then either a) the next 500-ish entries are one student and the rest are a second student, or b) the rest of the entries are alternating between two different students. However, when I go to debug, I can see that every new entry into students is unique, as it should be. Then, when I check how listBox.Items is being populated, I find that same pattern of the first few being unique and the rest displaying only two different students. The actual act of debugging seems to affect this as well. For example, I will stop at the first breakpoint 20 times then let the program finish on its own until I reach the second breakpoint. As I stop on the second breakpoint, I find that each of those 20 students, plus another, display properly, then the following 979 follow that same pattern as earlier. I see this same effect no matter how many times I stop at the first breakpoint.
I have tried searching the internet for similar instances of this behavior, but I am not getting anywhere, which is probably because I am not sure how to word this issue. When I search using the title I provided for this question, I do not get anything remotely related to my problem, so if any of you know of a similar issue, please point me in the right direction. My only thought is that this is an issue with memory allocation. The PopulateAtRandom() method is using up a lot of memory with the lists I create before attempting to populate students, so maybe the program is recycling the same memory address for each new Student, and since students is really just a list of memory addresses, it ends up with repeats of the same addresses. C# does not seem to have a nice way of giving me the memory address of an object, so I haven't been able to confirm that. If this is the case, though, I am still not sure how to circumvent that issue, so any advice would be greatly appreciated. Thank you!
RandomGauss probably leverages the Random class, which creates a seed based on the time when it's instantiated. I'm guessing the RandomGauss method instantiates a new Random instance each time it's invoked. When you aren't debugging, your loop repeats a lot of times before the system's clock ticks to change time, so many of your Random instances end up using the same seed, and hence produce the same result the first time you ask them for a random number.
The solution is to create a single Random instance and store it in a field on your class.
e.g., instead of this:
/**
* <summary>
* Return a random integer in the given range based on the specified gaussian distribution
* </summary>
*/
private int RandomGauss(int min, int max, double mean, double sigma){
Random random = new Random();
// code that uses random ...
}
you want something more like this:
private Random random = new Random();
/**
* <summary>
* Return a random integer in the given range based on the specified gaussian distribution
* </summary>
*/
private int RandomGauss(int min, int max, double mean, double sigma){
// code that uses random ...
}
PS--there are utility methods that help with reading text from files.
// create a list of first names from a text file of 200 names
List<string> firstnames = File.ReadAllLines("Random First Names.txt").ToList();

Extracting text with greater font weight

I have a number of documents with predictable placement of certain text, which I'm trying to extract. For the most part, it works very well, but I'm having difficulties with a certain fraction of documents which have slightly thicker text.
Thin text:
Thick text:
I know it's hard to tell the difference at this resolution, but if you look at MO DAY YEAR TIME (2400) portion, you can tell that the second one is thicker.
The thin text gives me exactly what is expected:
09/28/2015
0820
However, the thick version gives me a triple of every character with white space in between each duplicated character:
1 1 11 1 1/ / /1 1 19 9 9/ / /2 2 20 0 01 1 15 5 5
1 1 17 7 70 0 02 2 2
I'm using the following code to extract text from documents:
public static Document GetDocumentInfo(string fileName)
{
// Using 11 in x 8.5 in dimensions at 72 dpi.
var boudingBoxes = new[]
{
new RectangleJ(446, 727, 85, 14),
new RectangleJ(396, 702, 43, 14),
new RectangleJ(306, 680, 58, 7),
new RectangleJ(378, 680, 58, 7),
new RectangleJ(446, 680, 45, 7),
new RectangleJ(130, 727, 29, 10),
new RectangleJ(130, 702, 29, 10)
};
var data = GetPdfData(fileName, 1, boudingBoxes);
// I would populate the new document with the extracted data
// here, but it's not important for the example.
var doc = new Document();
return doc;
}
public static string[] GetPdfData(string fileName, int pageNum, RectangleJ[] boundingBoxes)
{
// Omitted safety checks, as they're not important for the example.
var data = new string[boundingBoxes.Length];
using (var reader = new PdfReader(fileName))
{
if (reader.NumberOfPages < 1)
{
return null;
}
RenderFilter filter;
ITextExtractionStrategy strategy;
for (var i = 0; i < boundingBoxes.Length; ++i)
{
filter = new RegionTextRenderFilter(boundingBoxes[i]);
strategy = new FilteredTextRenderListener(new LocationTextExtractionStrategy(), filter);
data[i] = PdfTextExtractor.GetTextFromPage(reader, pageNum, strategy);
}
return data;
}
}
Obviously, if nothing else works, I can get rid of duplicate characters after reading them in, as there is a very apparent pattern, but I'd rather find a proper way than a hack. I tried looking around for the past few hours, but couldn't find anyone encountering a similar issue.
EDIT:
I finally came across this SO question:
Text Extraction Duplicate Bold Text
...and in the comments it's indicated that some of the lower quality PDF producers duplicate text to simulate boldness, so that's one of the things that might be happening. However, there is a mention of omitting duplicate text at the same location, which I don't know how to achieve, since this portion of my code...
data[i] = PdfTextExtractor.GetTextFromPage(reader, pageNum, strategy);
...reads in the duplicated text completely in any of the specified locations.
EDIT:
I now have come across documents that duplicate contents up to four times to simulate thickness. That's a very strange way of doing things, but I'm sure designers of that method had their reasons.
EDIT:
I produced a solution (see my answer). It processes the data after it's already extracted and removes any repetitions. Ideally this would have been done during the extraction process, but that can get pretty complicated, and this seemed like a very clean and easy way of accomplishing the same thing.
As @mkl has suggested, one way of tackling this issue is to override LocationExtractionStrategy; however, things get pretty complicated, since it would require comparing the locations of each character found at specific boundaries. I tried doing some research in order to accomplish that, but due to poor documentation, it was getting a bit out of hand.
So instead I created a post-processing method, loosely based on what @TheMuffinMan has suggested, to clean up any repetitions. I decided not to deal with pixels, but rather with character count anomalies in known static locations. In my case, I know that the second data piece extracted can never be longer than three characters, so it's a good comparison point for me. If you know the document layout, you can use anything on it that you know will always be of fixed length.
After I extract the data with the method listed in my original post, I check whether the second data piece is longer than three characters. If it is, I divide its length by three, as that's the most characters it can have, and since all repetitions come out to the same length, I know the division will come out evenly:
var data = GetPdfData(fileName, 1, boudingBoxes);
if (data[1].Length > 3)
{
var count = data[1].Length / 3;
for (var i = 0; i < data.Length; ++i)
{
data[i] = RemoveRepetitions(data[i], count);
}
}
As you can see, I then loop over the data and pass each piece into the RemoveRepetitions() method:
public static string RemoveRepetitions(string original, int count)
{
if (original.Length % count != 0)
{
return null;
}
var temp = new char[original.Length / count];
for (int i = 0; i < original.Length; i += count)
{
temp[i / count] = original[i];
}
return new string(temp);
}
This method takes the string and the number of expected repetitions, which we calculated earlier. One thing to note is that I don't have to worry about the white space inserted by the duplication process (see the example in the original post), because count represents the total number of characters emitted where only one should have been, spaces included.
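As a worked example, here is the same method applied to the two tripled strings from the question. Each real character is emitted as five characters ("1 1 1"), so count is 5 and keeping every fifth character recovers the original text. The class name and the Date/Time helpers are only for illustration.

```csharp
using System;

public static class RepetitionDemo
{
    // Same logic as RemoveRepetitions above: keep every count-th character.
    public static string RemoveRepetitions(string original, int count)
    {
        if (original.Length % count != 0)
        {
            return null;
        }
        var temp = new char[original.Length / count];
        for (int i = 0; i < original.Length; i += count)
        {
            temp[i / count] = original[i];
        }
        return new string(temp);
    }

    // The tripled date from the question; returns "11/19/2015".
    public static string Date()
    {
        return RemoveRepetitions("1 1 11 1 1/ / /1 1 19 9 9/ / /2 2 20 0 01 1 15 5 5", 5);
    }

    // The tripled time from the question; returns "1702".
    public static string Time()
    {
        return RemoveRepetitions("1 1 17 7 70 0 02 2 2", 5);
    }
}
```

Note that the method returns null when the length is not a multiple of count, so callers should check for that before using the result.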

Sorting records by reddit algorithm using mongodb

I'm trying to implement the reddit algorithm as a sorting option in my app but I'm constantly hitting walls all over the place.
I started my implementation using this post (Sorting mongodb by reddit ranking algorithm) as a guideline.
I tried to convert it to c#; below is my attempt at the conversion.
var map = new BsonJavaScript(
@"function() {
function hot(score, date){
var order = log10(Math.max(Math.abs(score), 1));
var sign = score>0 ? 1 : score<0 ? -1 : 0;
var seconds = epochSeconds(date) - 1134028003;
var product = order + sign * seconds / 45000;
return Math.round(product*10000000)/10000000;
}
function log10(val){
return Math.log(val) / Math.LN10;
}
function epochSeconds(d){
return (d.getTime() - new Date(1970,1,1).getTime())/1000;
}
emit( hot(this.VoteCount, this.CreatedAt), this );
}"
);
var reduce = new BsonJavaScript(
@"function(){}"
);
var finalize = new BsonJavaScript(
@"{ 'out': { 'inline': 1 } }"
);
return db.Posts.MapReduce(new MapReduceArgs { MapFunction = map, ReduceFunction = reduce, FinalizeFunction = finalize }).GetResults();
Here are the results I'm getting from the implementation:
Here's the actual dataset:
For some reason the function returns 2 objects instead of 4.
Also, what would I need to modify for the function to return the entire post object along with the calculated score?
Would really appreciate it if someone would help me out :)
Thanks in advance,
Jean
Fixed it by making 2 modifications.
These 2 resources were extremely helpful:
http://docs.mongodb.org/manual/reference/command/mapReduce/#mapreduce-map-cmd
http://docs.mongodb.org/manual/reference/method/db.collection.mapReduce/#db.collection.mapReduce
Firstly, I changed the parameters I pass to emit(key, value). I assign a "score" value to the post object on the fly by running the hot function on it. Then I pass the object's _id as the key parameter and the post object, now carrying the score, as the value parameter:
this.score = hot(this.VoteCount, this.CreatedAt);
emit( this._id, this );
Then I changed how I get the results to:
db.Posts.MapReduce(new MapReduceArgs { MapFunction = map, ReduceFunction = reduce}).InlineResults
Hope this is helpful to someone else :)
I'll post benchmarks comparing this method of calculating the score with calculating the score in C# when I have some free time.
Alt implementation / Update:
I switched to a simpler / faster decay algorithm used by Hacker News as it still meets my requirements. http://amix.dk/blog/post/19574
Score = (P-1) / (T+2)^G
where,
P = points of an item (the -1 negates the submitter's own vote)
T = time since submission (in hours)
G = Gravity, defaults to 1.8 in news.arc
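A minimal C# sketch of that formula, for computing the score on the application side. The class, method, and parameter names are my own; only the formula and the default gravity of 1.8 come from the description above.

```csharp
using System;

public static class HackerNewsRank
{
    // Score = (P - 1) / (T + 2)^G
    // points   : P, points of the item
    // ageHours : T, time since submission in hours
    // gravity  : G, defaults to 1.8
    public static double Score(int points, double ageHours, double gravity = 1.8)
    {
        return (points - 1) / Math.Pow(ageHours + 2, gravity);
    }
}
```

An item with only the submitter's vote scores 0, and for a fixed number of points the score falls as the item ages, which is the decay behaviour the formula is meant to give.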

Retrieve data from array and display in textbox, 5 strings at a time

I want to display specific entries from a very large text file.
Part of the file is shown below the code.
What I want to do is display specific strings from the text file.
For example, the code below is for the Footlocker page. On the Footlocker shop page I want to retrieve the last 5 updates in the text file beginning with "footlocker", so that only Footlocker's most recent posts are shown. I have tried many approaches, including Array.Sort, but I am not sure how to do this. Thanks for your help.
Footlocker's page
//declaring string
string footlockerPosts =
sr.ReadToEnd();
//initialising string
string[] footlockerArray = footlockerPosts.Split('\n');
string[] sort = footlockerArray;
var target = "F";
var results = Array.FindAll(sort, f => f.Equals(target));
for (int i = footlockerArray.Length - 1; i > footlockerArray.Length - 7; i--)
{
footlockerArray.Reverse();
footlockerExistingBlogTextBox.Text += footlockerArray[i];
}
sr.Close();
return;
}
This is a small snippet of my file.
File
Footlocker,Rick,What a fabulous shop.
Footlocker,Ioela,Fantastic and incredible service.
Footlocker,Fisi,Can't wait to go back to shop!
Footlocker,Allui,Lovin' the new design and layout!
Footlocker,Rich,Can't wait for next season clothing range.
Hypebeast,Johnny,I didn’t get proper service from the shop assistant.
Hypebeast,Dalas,Awesome range of goods, great service.
Hypebeast,King,Cool music great staff.
Hypebeast,Nelson,Overated shop.
Hypebeast,Rick,Lovely place lovely people.
Hypebeast,Rick,What a fabulous shop.
Hypebeast,Ioela,Fantastic and incredible service.
Hypebeast,Fisi,Can't wait to go back to shop!
Hypebeast,Allui,Lovin' the new design and layout!
Hypebeast,Rich,Can't wait for next season clothing range.
Lonestar,Johnny,I didn’t get proper service from the waiter.
Lonestar,Dalas,Awesome range of food, great service.
Lonestar,King,Cool music great staff.
Lonestar,Nelson,Overated restaurant.
Lonestar,Rick,Lovely place lovely people.
Try the following and let me know if it gets you the results you want:
var results = sort.Where(r => r.StartsWith(target)).Reverse();
foreach (string result in results)
{
footlockerExistingBlogTextBox.Text += result;
}
Hope that helps!
Also, just a note: there is no reason to assign another variable equal to the original array. You can remove the line string[] sort = footlockerArray; and use the original footlockerArray directly.
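To limit the output to the five most recent posts for one shop, the filter can be combined with a skip to the tail of the matches. This is only a sketch: the class and method names are made up, and it assumes the file is ordered oldest first, so the last matching lines are the newest.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BlogPosts
{
    // Returns up to howMany of the last lines that start with "<shop>,",
    // newest (last in the file) first.
    public static List<string> LatestFor(IEnumerable<string> lines, string shop, int howMany)
    {
        var matches = lines
            .Select(line => line.Trim()) // Split('\n') leaves a trailing '\r' on Windows files
            .Where(line => line.StartsWith(shop + ",", StringComparison.OrdinalIgnoreCase))
            .ToList();

        return matches
            .Skip(Math.Max(0, matches.Count - howMany)) // keep only the tail
            .Reverse()                                  // newest first
            .ToList();
    }
}
```

In the question's code this would be called as BlogPosts.LatestFor(footlockerArray, "Footlocker", 5), and the results joined with Environment.NewLine before being assigned to footlockerExistingBlogTextBox.Text.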

How can I convert this to return multiple items?

I have a list of games, which I need to select randomly, by the day of week,
the following code works perfectly for 1 game.
var gameOfTheDay = games.AllActive[(int)(DateTime.Today.GetHashCode()) % games.AllActive.Count()];
What I need is to return more than 1 game, randomized, based on X (X in the case above is the day of the week; I will change it to a specific string).
I need this to create a semi-random generation of items.
Semi - since I want to feed it a keyword, and get the same results Per keyword
Random - since I need to make the game list random
For example, every time you open a page with the title "hello", you will see THE SAME games, selected specifically for that keyword from the games list.
In the same way the gameOfTheDay Works.
You can use LINQ for this:
int limit = 10;
string keyword = "foo";
Random rng = new Random(keyword.GetHashCode());
var gamesOfTheDay = games.OrderBy(x => rng.Next()).Take(limit);
However, this will have some overhead for the sort. If you have a lot of games compared to the amount you're selecting—enough that the sort might be too expensive, and enough that it's safe to just keep retrying in the event of a collision—manually doing it might be faster:
HashSet<Game> gamesOfTheDay = new HashSet<Game>();
while(gamesOfTheDay.Count < limit && gamesOfTheDay.Count < games.Length)
{
int idx = rng.Next(games.Length);
gamesOfTheDay.Add(games[idx]);
}
Note that in either case the Random is constructed with a seed dependent on the keyword, so the order will be the same every time for that keyword. You could similarly combine the hashes of the current DateTime and the keyword to get a unique random sequence for that day-keyword combination.
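One caveat with seeding from keyword.GetHashCode(): on .NET Core and later, string hash codes are randomized per process, so the same keyword can produce a different seed after a restart. If the selection has to survive restarts, the seed can be derived from the keyword's bytes instead. The sketch below uses a hand-written FNV-1a hash for that; the class and method names are my own, and it still assumes that Random produces the same sequence for the same seed on the runtime you deploy to.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class KeywordPicker
{
    // 32-bit FNV-1a over the UTF-8 bytes of the keyword.
    // Unlike string.GetHashCode(), this does not vary between processes.
    public static int StableSeed(string keyword)
    {
        unchecked
        {
            uint hash = 2166136261;
            foreach (byte b in Encoding.UTF8.GetBytes(keyword))
            {
                hash ^= b;
                hash *= 16777619;
            }
            return (int)hash;
        }
    }

    // Picks up to limit items in an order determined only by the keyword.
    public static List<T> Pick<T>(IEnumerable<T> items, string keyword, int limit)
    {
        var rng = new Random(StableSeed(keyword));
        return items
            .Select(item => new { Item = item, Key = rng.Next() }) // one random key per item
            .OrderBy(pair => pair.Key)
            .Take(limit)
            .Select(pair => pair.Item)
            .ToList();
    }
}
```

For the question's case this would be KeywordPicker.Pick(games.AllActive, "hello", 10). The input order of the games also has to stay the same between calls, since the random keys are assigned in that order.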
Use similar code to what you have now to randomly add games to a list of games (which will initially be empty) - if the game is already in the list, don't add it.
Stop when the list is the right size.
Untested code:
var rand = new Random();
var randomGames = new List<game>();
while(randomGames.Count < limit)
{
var aGame = games.AllActive[rand.Next(games.AllActive.Count())];
if (!randomGames.Contains(aGame))
{
randomGames.Add(aGame);
}
}
