How to properly extract content from a blog article? - c#

I am trying to extract content from a blog article like this:
static void GetBlogData (string blogPostUrl)
{
    string blogPostContent = null;
    WebClient client = new WebClient ();
    //client.Headers.Add (HttpRequestHeader.Referer, "http://www.stackoverflow.com");
    TextWriter writer = new StreamWriter ("/home/nanda/projects/mono/common/article");
    try
    {
        blogPostContent = client.DownloadString (blogPostUrl);
    }
    catch (Exception ex)
    {
        Term.PrintLn ("Unable to download\n{0}", ex.Message);
    }
    if (blogPostContent != null)
    {
        writer.WriteLine (blogPostContent);
    }
    else
    {
        Term.PrintLn ("No content found");
    }
}
I am aware that this is too simple an approach, but I want to know why I am unable to extract content from some URLs; it seems as if they have some kind of block in place. How can I detect whether a website/blog is blocking me from downloading its content?

A website cannot block you from downloading its content without also blocking the site from being viewed in a browser.
If your download fails, it means either:
a) your URL is wrong, or
b) the website requires some form of identification and your request lacks something (probably a cookie)
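To make the distinction concrete, here is a minimal sketch (the URL is a placeholder) that catches WebException and inspects the HTTP status code, which is about as close as you can get to detecting a deliberate block:

```csharp
using System;
using System.Net;

class BlockDetector
{
    static void Main ()
    {
        string url = "http://example.com/some-post"; // placeholder URL
        using (WebClient client = new WebClient ())
        {
            try
            {
                string content = client.DownloadString (url);
                Console.WriteLine ("Downloaded {0} characters", content.Length);
            }
            catch (WebException ex)
            {
                HttpWebResponse response = ex.Response as HttpWebResponse;
                if (response != null)
                {
                    // 403 usually means the server is deliberately refusing you;
                    // 404 means the URL is wrong; 401 means credentials are required.
                    Console.WriteLine ("Server answered: {0} {1}",
                        (int) response.StatusCode, response.StatusDescription);
                }
                else
                {
                    // No HTTP response at all: DNS failure, timeout, connection refused.
                    Console.WriteLine ("No HTTP response: {0}", ex.Status);
                }
            }
        }
    }
}
```

If you consistently get a 403 from code while a browser loads the page fine, the server is most likely filtering on request headers (User-Agent, Referer, or cookies).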

Related

"The operation has timed out" on uploading a document and updating metadata in a SharePoint library using ClientContext.ExecuteQuery()

I wrote a program using CSOM to upload documents to SharePoint and set metadata on their properties. Once in a while (roughly every 3 months), when the SharePoint server gets busy, IIS is reset, or some other communication problem occurs, we get a "The operation has timed out" error on clientContext.ExecuteQuery(). To resolve the issue I wrote an extension method for ExecuteQuery that retries every 10 seconds, up to 5 times, to connect to the server and execute the query.
My code works in the Dev and QA environments without any problem, but in Prod, when the first attempt fails with a timeout error, the second attempt only uploads the document and does not update the properties, so all the properties end up empty in the library. ExecuteQuery() returns no error, but of the two requests in the batch (uploading the file and updating the properties), it seems only the upload runs, and I don't know what happens to the properties. It is as if they were removed from the batch on the second attempt!
I used both upload methods docs.RootFolder.Files.Add and File.SaveBinaryDirect in different parts of my code but I copy just one of them here so you can see what I have in my code.
I appreciate your help.
public static void ExecuteSharePointQuery(ClientContext context)
{
    int cnt = 0;
    bool isExecute = false;
    while (cnt < 5)
    {
        try
        {
            context.ExecuteQuery();
            isExecute = true;
            break;
        }
        catch (Exception ex)
        {
            cnt++;
            Logger.Error(string.Format("Communication attempt with SharePoint failed. Attempt {0}", cnt));
            Logger.Error(ex.Message);
            Thread.Sleep(10000);
            if (cnt == 5 && isExecute == false)
            {
                Logger.Error(string.Format("Couldn't execute the query in SharePoint."));
                Logger.Error(ex.Message);
                throw;
            }
        }
    }
}
public static void UploadSPFileWithProperties(string siteURL, string listTitle, FieldMapper item)
{
    Logger.Info(string.Format("Uploading to SharePoint: {0}", item.pdfPath));
    using (ClientContext clientContext = new ClientContext(siteURL))
    {
        using (FileStream fs = new FileStream(item.pdfPath, FileMode.Open))
        {
            try
            {
                FileCreationInformation fileCreationInformation = new FileCreationInformation();
                fileCreationInformation.ContentStream = fs;
                fileCreationInformation.Url = Path.GetFileName(item.pdfPath);
                fileCreationInformation.Overwrite = true;
                List docs = clientContext.Web.Lists.GetByTitle(listTitle);
                Microsoft.SharePoint.Client.File uploadFile = docs.RootFolder.Files.Add(fileCreationInformation);
                uploadFile.CheckOut();
                //Update the metadata
                ListItem listItem = uploadFile.ListItemAllFields;
                //Set field values on item
                foreach (List<string> list in item.fieldMappings)
                {
                    if (list[FieldMapper.SP_VALUE_INDEX] != null)
                    {
                        TrySet(ref listItem, list[FieldMapper.SP_FIELD_NAME_INDEX], (FieldType)Enum.Parse(typeof(FieldType), list[FieldMapper.SP_TYPE_INDEX]), list[FieldMapper.SP_VALUE_INDEX]);
                    }
                }
                listItem.Update();
                uploadFile.CheckIn(string.Empty, CheckinType.OverwriteCheckIn);
                SharePointUtilities.ExecuteSharePointQuery(clientContext);
            }
            catch (Exception ex)
            {
            }
        }
    }
}
There are too many possible reasons for me to really comment on a solution, especially considering it only happens in the prod environment.
What I can say is that it is probably easiest to keep a reference to the last uploaded file. If your code fails, then check whether that last file was uploaded correctly.
Side note: I'm not sure if this is relevant, but if it is a large file you will want to upload it in slices.
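Since the symptom suggests the second attempt runs against a half-consumed batch, one cautious approach is to retry a delegate that rebuilds every request from scratch, rather than calling ExecuteQuery again on the same pending batch. A minimal sketch; the helper name and parameters are my own choices, not part of CSOM:

```csharp
using System;
using System.Threading;

static class RetryHelper
{
    // Runs the supplied operation, retrying on failure. Because the delegate
    // re-creates its requests on each call, no attempt depends on leftover
    // state from a previous, partially executed batch.
    public static void Retry (Action operation, int maxAttempts, TimeSpan delay)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                operation ();
                return;
            }
            catch (Exception)
            {
                if (attempt >= maxAttempts)
                    throw; // out of attempts: surface the last error
                Thread.Sleep (delay);
            }
        }
    }
}
```

With this helper, the whole body of UploadSPFileWithProperties (creating the FileCreationInformation, setting the fields, checking in, executing the query) would go inside the delegate, so a retry re-queues both the upload and the metadata update instead of reusing the old context's batch.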

Dropbox.Api failing to upload large files

I am uploading files to Dropbox using the following code.
I am using the NuGet package Dropbox.Api and getting the exception System.Threading.Tasks.TaskCanceledException("A task was canceled.").
From this SO question it appears to be a timeout issue.
So how do I modify the following code to set the timeout?
public async Task<FileMetadata> UploadFileToDropBox(string fileToUpload, string folder)
{
    DropboxClient client = new DropboxClient(GetAccessToken());
    using (var mem = new MemoryStream(File.ReadAllBytes(fileToUpload)))
    {
        string filename = Path.GetFileName(fileToUpload);
        try
        {
            string megapath = GetFullFolderPath(folder);
            string megapathWithFile = Path.Combine(megapath, Path.GetFileName(Path.GetFileName(filename))).Replace("\\", "/");
            var updated = client.Files.UploadAsync(megapathWithFile, WriteMode.Overwrite.Instance, body: mem);
            await updated;
            return updated.Result;
        }
        catch (Exception ex)
        {
            return null;
        }
    }
}
Try creating and initializing the client like this:
var config = new DropboxClientConfig();
config.HttpClient.Timeout = new TimeSpan(hr, min, sec); // choose values
var client = new DropboxClient(GetAccessToken(), config);
Reference:
http://dropbox.github.io/dropbox-sdk-dotnet/html/M_Dropbox_Api_DropboxClient__ctor_1.htm
One more thing to keep in mind: UploadAsync will not work for files larger than 150 MB, as per the documentation. For those you have to use an UploadSessionStartAsync-based implementation. I made that mistake without realizing it, and it took me ages to track the problem down.
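For files over that limit, the SDK's upload-session calls can be chained roughly as below. This is an unverified sketch against the Dropbox.Api surface (UploadSessionStartAsync, UploadSessionAppendV2Async, UploadSessionFinishAsync); the 8 MB chunk size is my own choice, and the sketch assumes the file is larger than one chunk:

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Dropbox.Api;
using Dropbox.Api.Files;

static class ChunkedUploader
{
    const int ChunkSize = 8 * 1024 * 1024; // 8 MB slices; pick what suits your network

    public static async Task UploadLargeFile (DropboxClient client, string localPath, string dropboxPath)
    {
        using (var stream = File.OpenRead (localPath))
        {
            var buffer = new byte[ChunkSize];
            int read = stream.Read (buffer, 0, ChunkSize);

            // First slice opens the session and yields a session id.
            string sessionId;
            using (var chunk = new MemoryStream (buffer, 0, read))
            {
                var start = await client.Files.UploadSessionStartAsync (body: chunk);
                sessionId = start.SessionId;
            }
            ulong offset = (ulong) read;

            while ((read = stream.Read (buffer, 0, ChunkSize)) > 0)
            {
                var cursor = new UploadSessionCursor (sessionId, offset);
                if (stream.Position < stream.Length)
                {
                    // Middle slices are appended to the open session.
                    using (var chunk = new MemoryStream (buffer, 0, read))
                        await client.Files.UploadSessionAppendV2Async (cursor, body: chunk);
                    offset += (ulong) read;
                }
                else
                {
                    // Last slice finishes the session and commits the file.
                    using (var chunk = new MemoryStream (buffer, 0, read))
                        await client.Files.UploadSessionFinishAsync (cursor,
                            new CommitInfo (dropboxPath, mode: WriteMode.Overwrite.Instance),
                            body: chunk);
                }
            }
        }
    }
}
```

The cursor's offset must equal the number of bytes the server has already received, so it is advanced only after each successful append.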

Unable to download a Sitecore file in Xamarin

I have a Xamarin mobile project set up with the Sitecore Mobile SDK (WebAPI) and can browse through the items using the ReadItemsRequestWithPath method. Now I am trying to download a PDF file that is stored in the 'media library' directory. I have tried the following:
private async void DownloadFile()
{
    try
    {
        string instanceUrl = "https://sitecoredev10.myapp.org";
        using (var demoCredentials = new SecureStringPasswordProvider("username", "password"))
        using
        (
            var session =
                SitecoreWebApiSessionBuilder.AuthenticatedSessionWithHost(instanceUrl)
                    .Credentials(demoCredentials)
                    .BuildReadonlySession())
        {
            var request = ItemWebApiRequestBuilder.DownloadResourceRequestWithMediaPath("/sitecore/media library/path/to/file")
                .Build();
            Stream response = await session.DownloadMediaResourceAsync(request);
            // Process Stream...
        }
    }
    catch (Exception e)
    {
        Debug.WriteLine(e.Message);
        Exception originalError = e.InnerException;
        Console.WriteLine(originalError.Message); // access original error
        Console.WriteLine(originalError.StackTrace);
    }
}
However, I'm getting the following exception:
[Sitecore Mobile SDK] Unable to download data from the internet
2016-11-03 15:24:53.411 SitecoreDemo.iOS[88136:15581600] Not Found
Should I be using a different method other than DownloadMediaResourceAsync? How can I download the file?

How to read Word document from C# on close event?

I'm trying to upload a Word document to a webserver when it's closed in Word.
My code looks like this:
((DocumentEvents_Event)doc).Close += DocumentClose;

private void DocumentClose()
{
    var url = Config.GetValue("ApiUrl");
    try
    {
        using (var client = new WebClient())
        {
            var response = client.UploadData(url, File.ReadAllBytes(_applicationWord.ActiveDocument.FullName));
        }
    }
    catch (Exception e)
    {
        _notifyIcon.ShowBalloonTip("Word " + WordTools.WordVersionValueToKey(_applicationWord.Version), e.Message, BalloonIcon.Error);
    }
}
But unfortunately this is not working: ReadAllBytes throws the exception "The process cannot access the file because it is being used by another process." Well, quite obviously this other process must be Word itself ;)
What would be a proper way to handle this? As far as I know there is no DocumentAfterClose event...
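One workaround worth trying, assuming the error comes from Word still holding a write handle on the file: open the file yourself with FileShare.ReadWrite, which File.ReadAllBytes does not do. Whether Word has fully flushed its bytes to disk at the moment the Close event fires is a separate question, so treat this as a sketch:

```csharp
using System.IO;

static class LockedFileReader
{
    // Reads a file even while another process holds it open for writing,
    // by explicitly allowing shared read/write access on our handle.
    public static byte[] ReadAllBytesShared (string path)
    {
        using (var fs = new FileStream (path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            var buffer = new byte[fs.Length];
            int offset = 0;
            while (offset < buffer.Length)
            {
                int read = fs.Read (buffer, offset, buffer.Length - offset);
                if (read == 0)
                    break; // unexpected end of stream
                offset += read;
            }
            return buffer;
        }
    }
}
```

The difference from File.ReadAllBytes is only the FileShare.ReadWrite flag: it tells the OS we do not mind that another process (Word) keeps a write handle open while we read.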

Detect when UploadValues is completed?

I want to display a message box after posting data to a remote PHP file.
PS: the PHP file returns the string "END" when the data has been completely processed.
if (1 == outputToGui)
{
    CompressFile("allFilesList.txt");
    byte[] allFilesList = File.ReadAllBytes("allFilesList.txt.gz");
    string URIx = "http://example.com/post.php";
    System.Collections.Specialized.NameValueCollection data = new System.Collections.Specialized.NameValueCollection();
    data.Add("serial", serial);
    data.Add("data", Convert.ToBase64String(allFilesList));
    using (WebClient tayba = new System.Net.WebClient())
    {
        try
        {
            tayba.Headers[HttpRequestHeader.ContentType] = "application/x-www-form-urlencoded";
            tayba.Proxy = null;
            tayba.UploadValues(URIx, "POST", data);
        }
        catch (Exception E) { }
    }
}
MessageBox.Show("upload completed"); // sometimes this message shows up before the PHP file has processed the posted data
The problem is that the message box sometimes shows up before the PHP file has processed the posted data.
That's because the operation is asynchronous: the upload might still be in progress when the message box is shown.
Look through WebClient for the matching completion event and show your message there, e.g. _webClient.UploadValuesCompleted += (sender, args) => MessageBox.Show("Ta-dah!");
Attach the event listener before you start the upload.
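A sketch of that pattern with WebClient's asynchronous API (the URL is a placeholder): UploadValuesAsync returns immediately, and UploadValuesCompleted fires only once the server's full response, including the "END" marker the PHP script prints after processing, has arrived:

```csharp
using System;
using System.Collections.Specialized;
using System.Net;
using System.Text;
using System.Windows.Forms;

class UploadNotifier
{
    static void StartUpload (NameValueCollection data)
    {
        var client = new WebClient ();
        client.Headers[HttpRequestHeader.ContentType] = "application/x-www-form-urlencoded";

        // Attach the handler BEFORE starting the upload.
        client.UploadValuesCompleted += (sender, e) =>
        {
            client.Dispose ();
            if (e.Error != null)
            {
                MessageBox.Show ("Upload failed: " + e.Error.Message);
            }
            else if (Encoding.UTF8.GetString (e.Result).Trim () == "END")
            {
                // The server printed "END", so the data has been processed.
                MessageBox.Show ("upload completed");
            }
        };

        client.UploadValuesAsync (new Uri ("http://example.com/post.php"), "POST", data);
    }
}
```

Checking e.Result for the "END" marker also covers the case where the request succeeds at the HTTP level but the script aborts before finishing its work.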
