I'm trying to write a semi-automated graph layout program. After reading the input, the program should generate a graph with a specified layout. The tricky part is that the user should be able to choose a subset of the nodes and rearrange them with another algorithm, while the rest keep their original positions. I've looked at Graphviz, and while this is possible there, it is very limited: "pin" works only with the 'neato' and 'fdp' algorithms, and I would prefer my graph to be oriented.
Another requirement is price. I've seen that yWorks can do what I need, but it is a paid product.
At this point I'll take any language I can get. Thanks.
Currently I'm trying out graphsharp, but the documentation is very poor.
For Python, see the answers to this question.
For Java, see the answers to this question.
If you're happy with semi-automated graph layout, then what you might do is the following (which I did in the past):
download yEd (also from yworks)
save a graph and look at the file: it's XML, and the format is not too complex
write some code that saves your graph into an XML file compatible with yEd
open your file from yEd, and use the built-in layout algorithms
save your file again from yEd
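To illustrate the "write some code" step, here is a minimal sketch in Python. It assumes the file you saved from yEd is GraphML (yEd's default format) and writes only the standard GraphML skeleton of nodes and edges. The function name graph_to_graphml is made up for this example, and yEd's own label and geometry extensions are left out, so compare the output against the file you saved from yEd before relying on it.

```python
import xml.etree.ElementTree as ET

# Namespace of the standard GraphML schema.
GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def graph_to_graphml(nodes, edges, directed=True):
    """Serialize node ids and (source, target) pairs as a GraphML string.

    Hypothetical helper for illustration: it emits plain GraphML only,
    without the yEd-specific elements that carry labels and positions.
    """
    ET.register_namespace("", GRAPHML_NS)
    root = ET.Element(f"{{{GRAPHML_NS}}}graphml")
    graph = ET.SubElement(
        root,
        f"{{{GRAPHML_NS}}}graph",
        {"id": "G", "edgedefault": "directed" if directed else "undirected"},
    )
    for node_id in nodes:
        ET.SubElement(graph, f"{{{GRAPHML_NS}}}node", {"id": node_id})
    for index, (source, target) in enumerate(edges):
        ET.SubElement(
            graph,
            f"{{{GRAPHML_NS}}}edge",
            {"id": f"e{index}", "source": source, "target": target},
        )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    xml_text = graph_to_graphml(["a", "b", "c"], [("a", "b"), ("b", "c")])
    with open("graph.graphml", "w", encoding="utf-8") as handle:
        handle.write(xml_text)
```

Reading the file back after yEd has laid it out is the same idea in reverse: parse the XML and pull the coordinates out of whatever elements yEd added.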
I hope this helps.
Background
I am working on a WPF Windows application and I want to add an embedded PDF viewer with only basic functionality: PDF viewing, text search, and page navigation.
I tried the approach of embedding Internet Explorer with Adobe PDF Reader installed, but it does not meet our requirements: Adobe PDF Reader has too many external links, which cannot be allowed because of the application's security requirements.
Therefore, I am trying to use the moonpdf library. This library fits our requirements, but the one problem is that it has no text search functionality. (I think it shows the PDF as images.)
I then downloaded the moonpdf source code and found that moonpdf wraps libmupdf.dll for C#.
I can modify the moonpdf source code and mupdf source code for our requirement if needed.
My Question
Is there any text search functionality in mupdf? If so, how can I use it?
In the basic mupdf library, there are several functions for searching for text. They come in a few different variants, but each searches a single page for a text string and returns the areas of all hits for that text. You need to iterate over the pages yourself (in order to do a forward or reverse search).
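/* Room for up to 1000 hit areas on one page. */
fz_quad hits[1000];
/* Returns the number of hits found on this page, at most nelem(hits). */
int count = fz_search_page(ctx, page, needle, hits, nelem(hits));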
That said, I do not know how or even if "moonpdf" has wrapped these functions.
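The page iteration itself is independent of the binding, so here is a rough sketch of it in Python. The search_page argument is a hypothetical callback standing in for whatever per-page search your wrapper exposes (something like fz_search_page); find_next_hit, DEMO_PAGES and search_text_page are names invented for this example, and the demo searches plain strings rather than real PDF pages.

```python
def find_next_hit(page_count, search_page, needle, start_page=0, forward=True):
    """Return (page_number, hits) for the first page with a match, or None.

    search_page(page_number, needle) is a hypothetical callback that
    returns a (possibly empty) list of hits for one page.
    """
    step = 1 if forward else -1
    page_number = start_page
    while 0 <= page_number < page_count:
        hits = search_page(page_number, needle)
        if hits:
            return page_number, hits
        page_number += step
    return None


# Stand-in "document": one string per page, used only for the demo.
DEMO_PAGES = ["alpha beta", "gamma", "beta delta"]


def search_text_page(page_number, needle):
    """Demo per-page search: character offsets of needle on that page."""
    text = DEMO_PAGES[page_number]
    hits = []
    position = text.find(needle)
    while position != -1:
        hits.append(position)
        position = text.find(needle, position + 1)
    return hits


if __name__ == "__main__":
    # Forward search starting from page 1, then reverse search from page 1.
    print(find_next_hit(len(DEMO_PAGES), search_text_page, "beta", start_page=1))
    print(find_next_hit(len(DEMO_PAGES), search_text_page, "beta", start_page=1, forward=False))
```

For "find next" in a viewer you would call this again with start_page set one past (or one before) the page of the previous hit.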
You can certainly extract the text from a document; the MuPDF library will do that. I believe it's up to you to apply your own search criteria after that. I'm afraid I'm not expert enough to answer the 'how to' part of it, though I imagine one of the mutool examples would be helpful here. I'll see if I can get one of the developers to answer.
I'm looking to add ebook support (.pdf, .txt, .epub, .mobi, .rtf) to a game I'm making in Unity using C#. The thing is, I really don't know where to start, and most of my Google searches have turned up nothing but ebooks about programming or game development. So I'm hoping someone here has a good idea of where I could start, or information that would point me in the right direction.
So just to summarize my comments on the OP:
With the least amount of work, embedding a web browser control like Webkit is probably the best option. It should properly read all the common filetypes you mentioned, save for .epub and .mobi. A separate library or control will need to be obtained or coded for those to work. Additionally, if the user already has a default program set up to open those filetypes, you can open them outside of the game with Process.Start(...) which is part of the System.Diagnostics namespace.
If it comes down to you having to code this yourself, a PDF is just drawn graphics on a canvas, txt is just raw text data, and an rtf is text data with some markup to get the formatting right. Coding a component that opens those for you should not be abnormally difficult.
I want to develop an ASP.NET web application that does the following:
a) The user should be able to add content to the document. The content can include text as well as images, screenshots, etc.
b) The user should be able to search by keywords. When searching for a keyword, the matching content should be shown to the user, along with its images (if any).
I am not sure what the proper approach for this is. One way I can think of is to store the text content in an XML file and later search for keywords by going through each node of the XML and displaying the matches. But I am not sure how to attach image content to XML. This method also doesn't seem clean or efficient if the document grows a lot over time.
Could anyone suggest a proper way to meet these requirements? Any hint would be appreciated.
Split it into two tasks: editing and search.
Full-text search is a solved problem. Simply use Sphinx Search and you are done. Sphinx is simple to use and can do everything you will need. It has a MySQL interface (your app connects to Sphinx the same way it would to a second MySQL database).
Editing is a bit more complicated. If I understand correctly, you want multiple users to edit a single document concurrently.
I recommend using WebSockets to notify other clients about changes in the document. Long polling and Server-Sent Events have ugly side effects, like stopping the browser from making other requests to the server. To implement the client side in JavaScript, I would use React, Angular or a similar framework to make updates as easy as possible.
Server side requires modification-friendly representation of a document, so if one user changes one part, and another user another part, your app should be able to merge changes. Changing completely different parts is easy, but it may be tricky to change the same paragraph or document node. Exact representation of each change depends on format of your document.
I do not see much benefit in using XML rather than any other format. It may be practical for document representation, but it will not help with merging colliding modifications. I would start with a plain array of strings, each representing a single paragraph. Extending it to a full XML document is the easy part, once two users can edit the same paragraph.
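To show what I mean by starting with a plain array of paragraphs, here is a rough sketch of a three-way merge in Python. It is deliberately naive and rests on assumptions that are mine, not requirements: both users start from the same base version, neither adds or removes paragraphs, and a true collision is only reported, not resolved. The name merge_paragraphs is made up for the example.

```python
def merge_paragraphs(base, ours, theirs):
    """Three-way merge of equal-length paragraph lists.

    Returns (merged, conflicts). conflicts holds the indexes of paragraphs
    that both sides changed in different ways; for those the sketch keeps
    the "ours" text and leaves the real resolution to the caller.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles edits that keep the paragraph count")
    merged = []
    conflicts = []
    for index, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t or t == b:
            # Same result on both sides, or only "ours" changed.
            merged.append(o)
        elif o == b:
            # Only "theirs" changed.
            merged.append(t)
        else:
            # Both changed the paragraph differently: a real collision.
            merged.append(o)
            conflicts.append(index)
    return merged, conflicts


if __name__ == "__main__":
    base = ["Intro", "Body", "End"]
    merged, conflicts = merge_paragraphs(
        base,
        ["Intro v2", "Body", "End A"],
        ["Intro", "Body v2", "End B"],
    )
    print(merged, conflicts)
```

The interesting work is in the last branch: how you merge two edits of the same paragraph is where the document format starts to matter.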
To store images in XML, simply store files using their hash as a file name and then use such name to link the file in XML. Git does the same thing and it works nicely. You may want to count references to identify unused files.
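A minimal sketch of that idea in Python follows. The class and method names (ImageStore, add, release) are invented for illustration, SHA-256 is simply my choice of hash, and the reference counts live only in memory here; in a real application you would persist them with the document.

```python
import hashlib
from pathlib import Path


class ImageStore:
    """Content-addressed file store with in-memory reference counts.

    Hypothetical helper: file names are the SHA-256 hex digest of the data.
    """

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.refs = {}

    def add(self, data: bytes) -> str:
        """Store data (once) and return the name to put in the XML link."""
        name = hashlib.sha256(data).hexdigest()
        path = self.root / name
        if not path.exists():
            path.write_bytes(data)
        self.refs[name] = self.refs.get(name, 0) + 1
        return name

    def release(self, name: str) -> None:
        """Drop one reference and delete the file when none are left."""
        self.refs[name] -= 1
        if self.refs[name] == 0:
            del self.refs[name]
            (self.root / name).unlink()


if __name__ == "__main__":
    store = ImageStore("images")
    link = store.add(b"fake image bytes")
    print(link)
    store.release(link)
```

Because the name is derived from the content, uploading the same screenshot twice costs one file on disk and two references.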
I have a PDF and want to extract the text contained in it. I've tried a few different PDF libraries and they all return basically the same results. When extracting the text from a two page document with literally hundreds of words, only a dozen or so words from the header are returned.
Is there any way to tell if the text I'm after is actually text or a raster image of the text? I'm thinking something along the lines of Firebug's "Inspect Element" but at this point I'll take any solution that tells what I'm really looking at.
This project really doesn't justify attempting to use OCR. And, although a simple solution, using fields in the PDF is not an option since the generator of the file is a third party.
If Acrobat/Reader can select the text, then it Is Text.
Reasons your library might not be able to find the text in question:
Complex/bad fonts or encodings. Adobe can be very forgiving of garbage in, somehow managing to get Good Info out.
The text could be in an annotation rather than the page contents. It won't matter what program parses the content stream if you need to look in the annot array instead.
You didn't name a particular library, so it's possible that the library you're using doesn't look inside XObject Forms. That's unlikely in an even remotely mature API, but stranger things have happened.
If you can get away with copy/pasta from Reader, then just go that route.
Have you tried Amyuni PDF Creator .Net? It allows you to enumerate all components from a specified rectangular region of a page and inspect their type from a predefined types list. You could run a quick test using the trial version and the following code sample for text extraction:
// Open a PDF file from the current working directory
axPDFCreactiveX1.Open(System.IO.Directory.GetCurrentDirectory() + "\\sampleBookmarks.pdf", "");
axPDFCreactiveX1.Refresh();
// Get the raw page text for page 1 and show it in a message box
String text = axPDFCreactiveX1.GetRawPageText(1);
MessageBox.Show(text);
Additionally, it provides Tesseract OCR integration in case you needed it.
Disclaimer: I am part of the development team of this product.
Check this site out. It may contain some helpful code snippets. http://www.codeproject.com/KB/cs/PDFToText.aspx
I am currently working on a project where my goal is to locate text in an image. OCR'ing the text is not my intention as of yet; I basically want to obtain the bounds of the text within an image. I am using the AForge.Net imaging component for manipulation. Can anyone point me in the right direction?
Update 2/5/09:
I've since gone down another route in my project. However, I did attempt to obtain text using MODI (Microsoft Office Document Imaging). It allows you to OCR an image and pull text from it with some ease.
This is an active area of research, and there are oodles of academic papers on the subject. It's going to be difficult to give you assistance, especially without more details. Are you looking for specific types of text? Fonts? English only? Are you familiar with the academic literature?
"Text detection" is a standard problem in any OCR (optical character recognition) system and consequently there are lots of bits of code on the interwebs that deal with it.
I could start listing piles of links from google but I suggest you just do a search for "text detection" and start reading :). There is ample example code available as well.
Recognizing text inside an image is indeed a hot topic for researchers in that field, but it only began to grow out of control when CAPTCHAs became the "norm" as a defense against spam bots. Why use CAPTCHAs as protection? Because it is (or was) very hard to locate and read text inside an image!
The reason I mention CAPTCHAs is that the most advancement* has been made within that tiny area, and I think your solution is most likely to be found there,
especially because CAPTCHAs are indeed about locating text (or something that resembles text) inside a cluttered image and afterwards trying to read the letters correctly.
So if you can find yourself a good open source CAPTCHA-breaking tool, you probably have all you need to continue your quest...
You could probably even throw away the most difficult code, the part that handles the character recognition itself, because those OCR engines are built to read distorted text, which is something you don't have to do.
*: advancement in terms of visible, usable, and practical information for a "non-researcher"
If you're ok with using an online API for this, the API at http://www.wisetrend.com/wisetrend_ocr_cloud.shtml can do text detection in addition to just OCR.
Stroke width transform can do that for you. That's at least what MS developed for their mobile phone OS. A discussion on the implementation is here at https://stackoverflow.com/