How to read xml file and plot it to WebMatrix (C#) page - c#

I have never so much as successfully loaded an xml document in WebMatrix (or JavaScript for that matter), despite how many times I have tried.
No online example is complete or accurate.
if I had an xml file that contained this (and only this):
<?xml version="1.0" encoding="utf-8" ?>
<someNode>
<someValue>HEY THERE! I'M XML!</someValue>
<someValueTwo>Another Value</someValueTwo>
</someNode>
How could I (start from loading the xml document etc.) plot the values of the two elements on one of my WebMatrix pages (please include where the code you are providing goes, as I have no clue).
Please include all the places I would need the code not just loading or writing to page.
*******UPDATE:******
Oh, well, nvm, xml is useless anyway, I'll just use another database or render the contents of another external cshtml page or something.
I have looked at XPath before and such, but I am not really worried about selecting them by tagname, id, attribute, etc because researching that would have been easy.
Sometimes I wonder why XML is still around...(Yes, I know it (is supposed to) help communicate data across very different languages and platforms) but the truth is, even with online tutorials and walkthroughs, it doesn't work, and it doesn't seem like there's anyone left who does know how to do this really. Why is it still around? Isn't it time that WebMatrix got rid of the xml based methods, since they don't work, and since there isn't anyone left who knows how to so much as try to load an xml file without getting errors?
Just curious if xml will be faded out in future updates.
*******UPDATE:******
Okay, I'm told that xml is supposed to still be useful. Do you just have to not be me to get it to work, lol?

I've never done anything with WebMatrix so I'm not gonna have much to say on that front. I am however confused by your attitude toward XML. If you mean only that XML is not useable in the WebMatrix context I can neither confirm nor deny that; however XML is dead useful in a plethora of other scenarios, and it is unclear from your question if you know that or not. Sorry I can't help on your question, just clarifying that XML is a very powerful tool as far as data file formats are concerned.

Related

C# documentation XML: Is there a DTD?

I'm writing XML documentation for a couple of C# projects. I want to be able to parse the file that comes out into HTML suitable for posting on GitHub Pages. I've got a pretty good handle on the parsing process itself, but I need an idea of all the elements that might show up in documentation.
I thought that a DTD might be the best resource for this, does that exist? Or is this all completely free-form?
I realize there are a few tools that already exist to do this, but I want finer control over the process and honestly it's good practice anyway.
XML validation is a product of some sort of application. The documentation comments in C# are just XML. It is the application that you parse/process the comments with that determines if there are any rules for what elements are allowed. There are multiple applications that can convert XML documentation comments into documentation, and you could even build your own if needed.
As mentioned, MSDN has a list of available elements. These are standardized across most parsers. If you need to add additional elements it is fine if you have a parser that can handle them. You could even add an XSD or DTD if you wanted for that application, if supported.

Using application data structures other than xml

I'm designing a survey tool. The survey will be very static and because of that, I can avoid building some kind of table-driven survey designer to accommodate the 167 questions on the survey (all 1-5 rating questions in a radio box or checkbox layout).
I was thinking of building the survey questions in a large XML file, but my non-technical co-worker that will be making frequent edits to the survey will likely do things that will break the integrity/validity of the raw xml file (think punctuation and special characters).
The XML file might look something like:
<questions>
<question>
<type>checkbox</type>
<text>Which beers do you like most</text>
<choices>Bud,Miller,Piels</choices>
<Required>true</Required>
</question>
<question>
<type>radio</type>
<text>Which beer is your favorite</text>
<choices>Bud,Miller,Piels</choices>
<Required>true</Required>
</question>
</questions>
Please use your imagination that this structure will be a bit more complex and that there will be 165 more questions.
Complicating matters, I need these questions in some form of object-oriented layout so that I can take the results and align them to other stuff. I had considered hard-coding a very lengthy survey form with 167 questions, but I need the data in blocks so that I can parse out question 37 and align it to something else in some other feature, that is related to question 37.
Here's what I'd like to do in a .Net app:
Define a enumerable class for this.
Do something where I can manually fill an enumerable collection of this class with all of the data I need. Using the p-code that would be familiar in my .asp world . . .
questions q = new questions()
q.type = "checkbox";
q.text = "which beers do you enjoy"'
q.choices = "Bud,Miller,Peils";
q.required = true;
q.add
q.type = "radio";
q.text = "what is your favorite beer";
q.choices = "Bud,Miller,Peils";
q.required = true;
q.add
My hope is that this .cs file (though foreign looking to the lay person) would be much easier for my co-worker to maintain, without me having to worry about syntax errors.
So, I guess what I'm looking for some feedback on:
Is this just a dumb idea. Should I do this in XML and I'll just consume the XML file and be done with it.
WWYD - What would you do? Is there an easier way to do this?
I don't care about performance as a relatively small number of users are using this.
I don't care about maintainability, because we will write this feature properly in the summer.
I just need to create a data structure that is not in a DB and that can be maintained by a non-technical person with a text-editor (for now).
If anyone made it this far, I appreciate it.
Everyone uses Excel...so consider using a CSV format which can be read by you as well as Excel which your counterpart will be using. One must specify to the user that the columns can't be changed, which is not a drawback per-se, but the user exports the dynamic changes to CSV which the program reads and can verify.
Plus the user does not have to be trained to use Excel so it is a win/win situation per your requirements not to use XMl.
As permanent store XML is good.
But that does not mean the user needs to edit the XML directly.
I would build the ability to edit, add, and delete the questions in the app.
Yes a bit a trouble but if they hack the XML then that is also a lot of trouble.
How do you plan to save survey results?
How do you plan to collect the survey results?
There is more to this project than you are realizing.
Do you need to combine results from more than one device?
If more than one device then you need to separate the questions from the results so you can update the questions on more than one device.
There are tools to read and write XML to disk.
Reading XML with the XmlReader
I don't agree with doug that you need to embed a database.
For a small number of questions I would use XML.
I would read all the XML into an object collection (A List).
You don't need a class the implements IEnumerable.
You put you objects in a a collections that implements IEnumerable.
I would go WPF over WinForms.
A ListBox with a DataTemplate.
On the DataTemplate you can have a dynamic selector in code behind but that is a real hassel.
Consider a single template that you manipulate in code behind.
So they are not RadioButtons but you uncheck the others in code behind.
For filtering I would go LINQ in public properties but there is also CollectionViewSource.
Used XML for an app that was used to collect field measurements.
A lot like this in measuring devices could change and need to collect the measurements.
If you are set on user editing the questions directly then XML with XSD is the best I can think of.
If you are looking for a simple human readable structured format, then you could be interrested by YAML.
YAML is a human-readable data serialization format that takes concepts
from programming languages such as C, Perl, and Python, and ideas from
XML and the data format of electronic mail.
Your question file would look like this:
questions:
- id: 1
type: checkbox
text: Which beers do you like most
choices: Bud,Miller,Piels
Required: true
- id: 2
type: radio
text: Which beer is your favorite
choices: Bud,Miller,Piels
Required: true
Some YAML libraries exists in .NET (from the article):
https://github.com/aaubry/YamlDotNet
http://yaml.codeplex.com/
http://www.codeproject.com/Articles/28720/YAML-Parser-in-C
http://yaml-net-parser.sourceforge.net/
There are plenty of xml editing tools out there that will actually make it easier to edit than editing a text file directly. I use XML Marker and it's pretty easy to use. http://symbolclick.com/
It will be quicker to train them to edit using the tool than it will be to build one.
Two answers here;
a: Write it to allow a proper admin interface, using a database to allow admin users to add/edit questions, response options and include appropriate security, auditing etc. You mention that this may not be feasible in the short term or that a 'proper' feature will be added soon, in which case, scrap this!
b: People say they have frequent edits/changes to make, but is this not a requirement which is co-related to a complete feature? Could you not in the short term, accept manual requests for change via email or something else documented, and make them yourself? Do you think the time taking to add a question/response or change some wording would be less than needing to parse XML manually to find a syntax error from someone who isn't familiar?
You'll need to weigh up frequency of change with impact to yourself of making a change vs likelihood of user error, vs estimated time needed to identify and resolve a syntax error (plus the possible bad-will of having a change break things).
Despite what some people think, users don't like making mistakes! putting them in a position where they have admin level powers over a system they don't have a full technical grasp of, could reduce confidence and future buy-in to the feature you're due to develop.
TLDR; In my opinion, unless it's a major hassle, do the changes yourself in the short term, perhaps with a maximum amount of time you'll make them (I make one change set a week, on a Friday for example). Keep the system working perfectly, and involve the users without putting them in an uncomfortable position being an non voluntary early adopter for a feature which isn't finished.
I used my complete mastery over winforms to create a little mock GUI application that enables users to quickly create one dimensional non conditional lists of questions with different question types.
Once you decided on an xml scheme you can easily import and export xml files.
Are you interested in further development of the magical survey creator? If so tell me and I will send you a practically finished prototype tomorrow morning. (You should provide me with an xml scheme though, otherwise I will do it in CSV)
I enjoy the exercise.
Picture related. Don't be put off by the colors, that's how I like it during development, to see the pixel exact boundaries of controls.
Unless your coworkers have some experience with programming or xml editing they will hate you if you instruct them to edit any sort of "code".
Our secretaries put their hand in front of their faces and start chanting "no, no, no..." when I tell them how to operate VBA macros.

C#: Autogenerating DDL and ORM classes from XML schema (XSD) file

I have a rather large XSD file available here.
I want to generate the following from the file:
Generate DDL (for PostgreSQL), the DDL should contain initial values where appropriate, as specified by 'permitted' values in the XSD
Generate an ORM that will allow me to perform CRUD operations on the records in the database created in step 1
Can anyone suggest a tool or series of tools/technologies to achieve this?
In case I have to roll my own solution, can someone suggest a good tutorial for XSLT (preferably a cookbook - since I already know some XML/XPath).
Incidentally, I tried xsd.exe on Windows - it failed and printed an error message suggesting that there was a circular reference in the XSD file. I then tried xsd.exe on mono, that worked - but the file created had some invalid statements. I am guessing (perhaps incorrectly) that xsd.exe is NOT the way to achieve these twin goals - if I am wrong, let me know.
Also, I took at Ann Lewkowicz's XSLT transform file to generate a DDL from an XSD file - BUT that appeared to have got stuck in an infinite loop - and also complained about 'infinite recursion'
So I need help with the following:
First of all, can anyone test/check if the XSD file is indeed screwed up? - and if it is, how to fix it?
How do I go about generating a DDL and ORM from the XSD file?
Personally I would have written the generator myself. There may be good generators out there, but I haven't seen any. All I've tried using (though I never used an XSD as the starting point) generate terrible code, and worse, are rather impossible to customize to handle every quirk that inevitably turns up.
Doing so is a lot less work than people seem to imagine, and gives many benefits, not least that you'll actually have total control over exactly what is generated. And you could even (and quite easily) take it to the next level and generate the stuff at run-time. The latter is hardly meaningful if the schema is final, but can be a huge time-saver if it's constantly evolving.
I'm quite sure this isn't the answer you were hoping for, and I'd be interested too if anyone knows of good tools for the job.

Extracting data from an ASPX page

I've been entrusted with an idiotic and retarded task by my boss.
The task is: given a web application that returns a table with pagination, do a software that "reads and parses it" since there is nothing like a webservice that provides the raw data. It's like a "spider" or a "crawler" application to steal data that is not meant to be accessed programmatically.
Now the thing: the application is made with standart aspx webform engine, so nothing like standard URLs or posts, but the dreadful postback engine crowded with javascript and non accessible html. The pagination links call the infamous javascript:__doPostBack(param, param) so I think it wouldn't even work if I try even to simulate clicks on those links.
There are also inputs to filter the results and they are also part of the postback mechanism, so I can't simulate a regular post to get the results.
I was forced to do something like this in the past, but it was on a standard-like website with parameters in the querystring like pagesize and pagenumber so I was able to sort it out.
Anyone has a vague idea if this is doable, or if I should tell to my boss to quit asking me to do this retarded stuff?
EDIT: maybe I was a bit unclear about what I have to achieve. I have to parse, extract and convert that data in another format - let's say excel - and not just read it. And this stuff must be automated without user input. I don't think Selenium would cut it.
EDIT: I just blogged about this situation. If anyone is interested can check my post at http://matteomosca.com/archive/2010/09/14/unethical-programming.aspx and comment about that.
Stop disregarding the tools suggested.
No, the parser you can write isn't WatiN or Selenium, both of those Will work in that scenario.
ps. had you mentioned anything on needing to extract the data from flash/flex/silverlight/similar this would be a different answer.
btw, reason to proceed or not is Definitely not technical, but ethical and maybe even lawful. See my comment on the question for my opinion on this.
WatiN will help you navigate the site from the perspective of the UI and grab the HTML for you, and you can find information on .NET DOM parsers here.
Already commented but think thus is actually an answer.
You need a tool which can click client side links and wait while page reloads.
Tool s like selenium can do that.
Also (from comments) WatiN WatiR
#Insane, the CDC's website has this exact problem, and the data is public (and we taxpayers have paid for it), I'm trying to get the survey and question data from http://wwwn.cdc.gov/qbank/Survey.aspx and it's absurdly difficult. Not illegal or unethical, just a terrible implementation that appears to be intentionally making it difficult to get the data (also inaccessible to search engines).
I think Selenium is going to work for us, thanks for the suggestion.

Simple screen scraping and analyze in .NET

I'm building a small specialized search engine for prise info. The engine will only collect specific segments of data on each site. My plan is to split the process into two steps.
Simple screen scraping based on a URL that points to the page where the segment I need exists. Is the easiest way to do this just to use a WebClient object and get the full HTML?
Once the HTML is pulled and saved analyse it via some script and pull out just the segment and values I need (for example the price value of a product). My problem is that this script somehow has to be unique for each site I pull, it has to be able to handle really ugly HTML (so I don't think XSLT will do ...) and I need to be able to change it on the fly as the target sites updates and changes. I will finally take the specific values and write these to a database to make them searchable
Could you please give me some hints on how to architect the best way? Would you do different then described above?
Well, i would go with the way you describe.
1.
How much data is it going to handle? Fetching the full HTML via WebClient / HttpWebRequest should not be a problem.
2.
I would go for HtmlAgilityPack for HTML parsing. It's very forgiving, and can handle prety ugly markup. As HtmlAgilityPack supports XPath, it's pretty easy to have specific xpath selections for individual sites.
I'm on the run and going to expand on this answer asap.
Yes, a WebClient can work well for this. The WebBrowser control will work as well depending on your requirements. If you are going to load the document into a HtmlDocument (the IE HTML DOM) then it might be easier to use the web browser control.
The HtmlDocument object that is now built into .NET can be used to parse the HTML. It is designed to be used with the WebBrowser control but you can use the implementation from the mshtml dll as well. I hav enot used the HtmlAgilityPack, but I hear that it can do a similar job.
The HTML DOM objects will typically handle, and fix up, most ugly HTML That you throw at them. As well as allowing a nicer way to parse the html, document.GetElementsByTag to get a collection of tag objects for example.
As for handling the changing requirements of the site, it sounds like a good candidate for the strategy pattern. You could load the strategies for each site using reflection or something of that sort.
I have worked on a system that uses XML to define a generic set of parameters for extracting text from HTML pages. Basically it would define start and end elements to begin and end extraction. I have found this technique to work well enough for a small sample, but it gets rather cumbersome and difficult to customize as the collection of sites gets larger and larger. Keeping the XML up to date and trying to keep a generic set of XML and code the handle any type of site is difficult. But if the type and number of sites is small then this might work.
One last thing to mention is that you might want to add a cleaning step to your approach. A flexible way to clean up HTML as it comes into the process was invaluable on the code I have worked on in the past. Perhaps implementing a type of pipeline would be a good approach if you think the domain is complex enough to warrant it. But even just a method that runs some regexes over the HTML before you parse it would be valuable. Getting rid of images, replacing particular mis-used tags with nicer HTML , etc. The amount of really dodgy HTML that is out there continues to amaze me...

Categories

Resources