Can someone tell me if there is a way to get each character's location in X,Y coordinates from a PDF?
I appreciate that it may not be X,Y; I just need a way to identify where a text character is on a page.
The characters are not raster images, so I don't need to recognise them.
I have started with this:
$Path = "C:\temp\test.pdf"
$reader = New-Object iTextSharp.text.pdf.pdfreader -ArgumentList $Path
for ($page = 1; $page -le $reader.NumberOfPages; $page++)
{
$text = [iTextSharp.text.pdf.parser.PdfTextExtractor]::GetTextFromPage($reader,$page).Split([char]0x000A)
}
$reader.Close()
I'm not familiar at all with PowerShell, but you can do it like this in C#. FYI you will either need iTextSharp 5.5.10 or iText 7.0.1 for .NET to get this to run.
void Run()
{
PdfReader reader = new PdfReader("/path/to/input.pdf");
var s = PdfTextExtractor.GetTextFromPage(reader, 1, new LocationTextExtractionStrategy(new Local()));
}
private class Local : LocationTextExtractionStrategy.ITextChunkLocationStrategy
{
public LocationTextExtractionStrategy.ITextChunkLocation CreateLocation(TextRenderInfo renderInfo, LineSegment baseline)
{
// you need the info per character, so iterate all characters per TextRenderInfo
foreach (TextRenderInfo tr in renderInfo.GetCharacterRenderInfos())
{
LineSegment bl = tr.GetBaseline();
// do something with the info
Console.WriteLine(tr.GetText() + " # (" + bl.GetStartPoint()[Vector.I1] + ", " + bl.GetStartPoint()[Vector.I2] + ")");
}
return new LocationTextExtractionStrategy.TextChunkLocationDefaultImp(baseline.GetStartPoint(), baseline.GetEndPoint(), renderInfo.GetSingleSpaceWidth());
}
}
Based on blagae's answer, here is a PowerShell script that will basically run his C# code. I didn't find an easy way to use LocationTextExtractionStrategy directly in PowerShell. You will need iTextSharp 5.5.10 as it is the first public version to expose LocationTextExtractionStrategy.
$Source = @"
using System;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;
public class PdfHelper
{
public static void Run(string filePath)
{
PdfReader reader = new PdfReader(filePath);
for(var page = 1; page <= reader.NumberOfPages; page++)
{
PdfTextExtractor.GetTextFromPage(reader, page, new LocationTextExtractionStrategy(new Local()));
}
}
}
class Local : LocationTextExtractionStrategy.ITextChunkLocationStrategy
{
public LocationTextExtractionStrategy.ITextChunkLocation CreateLocation(TextRenderInfo renderInfo, LineSegment baseline)
{
// you need the info per character, so iterate all characters per TextRenderInfo
foreach (TextRenderInfo tr in renderInfo.GetCharacterRenderInfos())
{
LineSegment bl = tr.GetBaseline();
// do something with the info
Console.WriteLine(tr.GetText() + " # (" + bl.GetStartPoint()[Vector.I1] + ", " + bl.GetStartPoint()[Vector.I2] + ")");
}
return new LocationTextExtractionStrategy.TextChunkLocationDefaultImp(baseline.GetStartPoint(), baseline.GetEndPoint(), renderInfo.GetSingleSpaceWidth());
}
}
"@
$DLLPath = "$PSScriptRoot\iTextSharp.dll"
Add-Type -Path $DLLPath
Add-Type -ReferencedAssemblies $DLLPath -TypeDefinition $Source -Language CSharp
$Path = "C:\temp\test.pdf"
[PdfHelper]::Run($Path)
I am currently trying to create an Autodesk Revit add-in that checks the geometry of a room. I am struggling because of an error which says "Revit cannot run the external command. Autodesk.Revit.Exceptions.InvalidOperationException. HelloWorld.Class1 does not inherit IExternalCommand."
Sorry, I'm new to C# and Autodesk Revit.
So I am assuming IExternalCommand needs to be added to the class declaration to run the command. When I do include IExternalCommand, I receive a Visual Studio error saying "Class1 does not implement interface member 'IExternalCommand.Execute(ExternalCommandData, ref string, ElementSet)'".
Here is my code:
using System;
using Autodesk.Revit.UI;
using Autodesk.Revit.DB;
using Autodesk.Revit.ApplicationServices;
using Autodesk.Revit.Attributes;
using Autodesk.Revit.Creation;
using Autodesk.Revit.Exceptions;
using Autodesk.Revit.DB.Architecture;
namespace HelloWorld
{
[Autodesk.Revit.Attributes.Transaction(Autodesk.Revit.Attributes.TransactionMode.Manual)]
[Autodesk.Revit.Attributes.Regeneration(Autodesk.Revit.Attributes.RegenerationOption.Manual)]
class Class1 :IExternalCommand
{
public void GetRoomDimensions(Autodesk.Revit.DB.Document doc, Room room)
{
String roominfo = "Room dimensions:\n";
// turn on volume calculations:
using (Transaction t = new Transaction(doc, "Turn on volume calculation"))
{
t.Start();
AreaVolumeSettings settings = AreaVolumeSettings.GetAreaVolumeSettings(doc);
settings.ComputeVolumes = true;
t.Commit();
}
roominfo += "Vol: " + room.Volume + "\n";
roominfo += "Area: " + room.Area + "\n";
roominfo += "Perimeter: " + room.Perimeter + "\n";
roominfo += "Unbounded height: " + room.UnboundedHeight + "\n";
TaskDialog.Show("Revit", roominfo);
}
}
}
Thanks for any advice.
An IExternalCommand runs from the Execute method. You need to have an Execute method defined in your class. From there you can call your GetRoomDimensions method:
public Result Execute(
ExternalCommandData commandData,
ref string message,
ElementSet elements)
{
UIApplication application = commandData.Application;
Document mainDocument = application.ActiveUIDocument.Document;
if(elements.Size != 1)
{
//Exactly 1 room should be selected
return Result.Failed;
}
Room room = null;
foreach(Element element in elements)
{
room = element as Room;
}
if(room == null)
{
//A non-room element was selected
return Result.Failed;
}
GetRoomDimensions(mainDocument, room);
return Result.Success;
}
Here is a link explaining the IExternalCommand in depth:
https://knowledge.autodesk.com/search-result/caas/CloudHelp/cloudhelp/2016/ENU/Revit-API/files/GUID-797F9E50-08C4-4E58-8CF0-8B4C68035409-htm.html
I am trying to write in C# what the PowerShell script below does.
I got about halfway, but without much luck (I use these namespaces:
using Microsoft.SqlServer.Dts.Runtime;
using Microsoft.SqlServer;
using Microsoft.SqlServer.Management.IntegrationServices;
)
I managed to get my code to work up to this point:
SqlConnection ssisConnection = new SqlConnection(@"Data Source=MHPDW2;Initial Catalog=master;Integrated Security=SSPI;");
IntegrationServices ssisServer = new IntegrationServices(ssisConnection);
// foreach ?? (this is where I am stuck)
After that I have had no luck working out how to iterate through each folder. By the way, the PowerShell script works fine with no issues.
Here is the PowerShell script:
$sqlConnectionString = "Data Source=MHPAPP2;Initial Catalog=master;Integrated Security=SSPI;"
$sqlConnection = New-Object System.Data.SqlClient.SqlConnection $sqlConnectionString
# Create the Integration Services object
$integrationServices = New-Object $ISNamespace".IntegrationServices" $sqlConnection
if ($integrationServices.Catalogs.Count -gt 0)
{
$catalog = $integrationServices.Catalogs["SSISDB"]
write-host "Enumerating all folders..."
$folders = $catalog.Folders
if ($folders.Count -gt 0)
{
foreach ($folder in $folders)
{
$foldername = $folder.Name
Write-Host "Exporting Folder " $foldername " ..."
# Create a new file folder
mkdir $ProjectFilePath"\"$foldername
# Export all projects
$projects = $folder.Projects
if ($projects.Count -gt 0)
{
foreach($project in $projects)
{
$fullpath = $ProjectFilePath + "\" + $foldername + "\" + $project.Name + ".ispac"
Write-Host "Exporting to " $fullpath " ..."
[System.IO.File]::WriteAllBytes($fullpath, $project.GetProjectBytes())
}
}
}
}
}
Write-Host "All done."
I would like to connect DotCMIS.dll to my SharePoint, but it does not work correctly.
I run the script in the SharePoint 2013 Management Shell.
I use my own user permissions (this is not a farm user).
The problem is probably the URL I give for org.apache.chemistry.dotcmis.binding.atompub.url.
Do you have any idea which SharePoint URL it has to point to?
Website of example:
http://chemistry.apache.org/dotnet/powershell-example.html
Error
You cannot call a method on a null-valued expression.
At line:6 char:7
+ $b = $contentStream.Stream.Read($buffer, 0, 4096)
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidOperation: (:) [], RuntimeException
+ FullyQualifiedErrorId : InvokeMethodOnNull
Important part of my Script
$sp["org.apache.chemistry.dotcmis.binding.atompub.url"] = "http://localhost/_layouts/15/start.aspx#/SitePages/WebSite.aspx"
$sp["org.apache.chemistry.dotcmis.user"] = "mylogin"
$sp["org.apache.chemistry.dotcmis.password"] = "mypassword"
All Script
# load DotCMIS DLL
[Reflection.Assembly]::LoadFile("C:\dotCmisServer\DotCMIS.dll")
# -----------------------------------------------------------------
# helper functions
function New-GenericDictionary([type] $keyType, [type] $valueType) {
$base = [System.Collections.Generic.Dictionary``2]
$ct = $base.MakeGenericType(($keyType, $valueType))
New-Object $ct
}
function New-ContentStream([string] $file, [string] $mimetype) {
$fileinfo = ([System.IO.FileInfo]$file)
$contentStream = New-Object "DotCMIS.Data.Impl.ContentStream"
$contentStream.Filename = $fileinfo.Name
$contentStream.Length = $fileinfo.Length
$contentStream.MimeType = $mimetype
$contentStream.Stream = $fileinfo.OpenRead()
$contentStream
}
function Download-ContentStream([DotCMIS.Client.IDocument] $document, [string] $file) {
$contentStream = $document.GetContentStream()
$fileStream = [System.IO.File]::OpenWrite($file)
$buffer = New-Object byte[] 4096
do {
$b = $contentStream.Stream.Read($buffer, 0, 4096)
$fileStream.Write($buffer, 0, $b)
}
while ($b -ne 0)
$fileStream.Close()
$contentStream.Stream.Close()
}
# -----------------------------------------------------------------
# create session
$sp = New-GenericDictionary string string
$sp["org.apache.chemistry.dotcmis.binding.spi.type"] = "atompub"
$sp["org.apache.chemistry.dotcmis.binding.atompub.url"] = "http://localhost/_layouts/15/start.aspx#/SitePages/WebSite.aspx"
$sp["org.apache.chemistry.dotcmis.user"] = "mylogin"
$sp["org.apache.chemistry.dotcmis.password"] = "mypassword"
$factory = [DotCMIS.Client.Impl.SessionFactory]::NewInstance()
$session = $factory.GetRepositories($sp)[0].CreateSession()
# print the repository infos
$session.RepositoryInfo.Id
$session.RepositoryInfo.Name
$session.RepositoryInfo.Vendor
$session.RepositoryInfo.ProductName
$session.RepositoryInfo.ProductVersion
# get root folder
$root = $session.GetRootFolder()
# print root folder children
$children = $root.GetChildren()
foreach ($object in $children) {
$object.Name + " (" + $object.ObjectType.Id + ")"
}
# run a quick query
$queryresult = $session.Query("SELECT * FROM cmis:document", $false)
foreach ($object in $queryresult) {
foreach ($item in $object.Properties) {
$item.QueryName + ": " + $item.FirstValue
}
"----------------------------------"
}
# create a folder
$folderProperties = New-GenericDictionary string object
$folderProperties["cmis:name"] = "myNewFolder"
$folderProperties["cmis:objectTypeId"] = "cmis:folder"
$folder = $root.CreateFolder($folderProperties)
# create a document
$documentProperties = New-GenericDictionary string object
$documentProperties["cmis:name"] = "myNewDocument"
$documentProperties["cmis:objectTypeId"] = "cmis:document"
$source = $home + "\source.txt"
$mimetype = "text/plain"
$contentStream = New-ContentStream $source $mimetype
$doc = $folder.CreateDocument($documentProperties, $contentStream, $null)
# download a document
$target = $home + "\target.txt"
Download-ContentStream $doc $target
# clean up
$doc.Delete($true)
$folder.Delete($true)
Unfortunately, in SharePoint Foundation 2013 I have to write my own C# software.
SharePoint Foundation 2013 IMPORT\EXPORT dbo.allDocs file\files
Struggling with a C# Component. What I am trying to do is take a column that is ntext in my input source which is delimited with pipes, and then write the array to a text file. When I run my component my output looks like this:
DealerID,StockNumber,Option
161552,P1427,Microsoft.SqlServer.Dts.Pipeline.BlobColumn
I've been working with the GetBlobData method and I'm struggling with it. Any help would be greatly appreciated! Here is the full script:
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
string vehicleoptionsdelimited = Row.Options.ToString();
//string OptionBlob = Row.Options.GetBlobData(int ;
//string vehicleoptionsdelimited = System.Text.Encoding.GetEncoding(Row.Options.ColumnInfo.CodePage).GetChars(OptionBlob);
string[] option = vehicleoptionsdelimited.Split('|');
string path = @"C:\Users\User\Desktop\Local_DS_CSVs\";
string[] headerline =
{
"DealerID" + "," + "StockNumber" + "," + "Option"
};
System.IO.File.WriteAllLines(path + "OptionInput.txt", headerline);
using (System.IO.StreamWriter file = new System.IO.StreamWriter(path + "OptionInput.txt", true))
{
foreach (string s in option)
{
file.WriteLine(Row.DealerID.ToString() + "," + Row.StockNumber.ToString() + "," + s);
}
}
}
Try using
BlobToString(Row.Options)
using this function:
private string BlobToString(BlobColumn blob)
{
string result = "";
try
{
if (blob != null)
{
result = System.Text.Encoding.Unicode.GetString(blob.GetBlobData(0, Convert.ToInt32(blob.Length)));
}
}
catch (Exception ex)
{
result = ex.Message;
}
return result;
}
Adapted from:
http://mscrmtech.com/201001257/converting-microsoftsqlserverdtspipelineblobcolumn-to-string-in-ssis-using-c
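The idea behind BlobToString is language-neutral: the ntext column arrives as raw bytes, and .NET's Encoding.Unicode is UTF-16 little-endian, so the bytes have to be decoded before they can be split on the pipe character. Here is a minimal sketch of that decode-then-split step in Python, for illustration only. The function name blob_to_options and the sample option values are hypothetical; the dealer ID and stock number are taken from the sample output in the question.

```python
def blob_to_options(blob, delimiter="|"):
    """Decode UTF-16LE bytes (the encoding .NET calls Encoding.Unicode)
    and split the resulting text on the delimiter."""
    if not blob:
        # An empty or missing blob yields no options rather than [""].
        return []
    text = blob.decode("utf-16-le")
    return text.split(delimiter)


# Hypothetical sample data standing in for the bytes of Row.Options.
raw = "Sunroof|Leather Seats|Navigation".encode("utf-16-le")

# One output line per option, in the same layout as the question's file.
for option in blob_to_options(raw):
    print("161552,P1427," + option)
```

Calling ToString() on the blob object, as the question's code does, skips the decode step entirely, which is why the type name ends up in the output file.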
Another very easy solution to this problem, because it is a total PITA, is to route the error output to a derived column component and cast your blob data to a DT_STR or DT_WSTR as a new column.
Route the output of that to your script component and the data will come in as an additional column on the pipeline ready for you to parse.
This will probably only work if your data is less than 8000 characters long.
I've created a basic C# class that uses the Microsoft.Data.Schema.ScriptDom and Microsoft.Data.Schema.ScriptDom.Sql assemblies. These two assemblies are part of Visual Studio Database Edition (VSDB) and are the parsing/scripting APIs. You can parse SQL text and output a formatted SQL script. For more information about the VSDB assemblies, see this blog post. Since they are redistributable, I've included both assemblies and the PowerShell script here:
#requires -version 2
add-type -path .\Microsoft.Data.Schema.ScriptDom.dll
add-type -path .\Microsoft.Data.Schema.ScriptDom.Sql.dll
$Source = @"
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Microsoft.Data.Schema.ScriptDom;
using Microsoft.Data.Schema.ScriptDom.Sql;
using System.IO;
public class SQLParser
{
private IScriptFragment fragment;
public SQLParser(SqlVersion sqlVersion, bool quotedIdentifier, string inputScript)
{
switch (sqlVersion)
{
case SqlVersion.Sql80:
SQLParser80 (quotedIdentifier, inputScript);
break;
case SqlVersion.Sql90:
SQLParser90 (quotedIdentifier, inputScript);
break;
case SqlVersion.Sql100:
SQLParser100 (quotedIdentifier, inputScript);
break;
}
}
private void SQLParser100 (bool quotedIdentifier, string inputScript)
{
TSql100Parser parser = new TSql100Parser(quotedIdentifier);
Parse(parser, inputScript);
}
private void SQLParser90 (bool quotedIdentifier, string inputScript)
{
TSql90Parser parser90 = new TSql90Parser(quotedIdentifier);
Parse(parser90, inputScript);
}
private void SQLParser80 (bool quotedIdentifier, string inputScript)
{
TSql80Parser parser80 = new TSql80Parser(quotedIdentifier);
Parse(parser80, inputScript);
}
private void Parse(TSql100Parser parser, string inputScript)
{
IList<ParseError> errors;
using (StringReader sr = new StringReader(inputScript))
{
fragment = parser.Parse(sr, out errors);
}
if (errors != null && errors.Count > 0)
{
StringBuilder sb = new StringBuilder();
foreach (var error in errors)
{
sb.AppendLine(error.Message);
sb.AppendLine("offset " + error.Offset.ToString());
}
throw new ArgumentException("InvalidSQLScript", sb.ToString());
}
}
private void Parse(TSql90Parser parser, string inputScript)
{
IList<ParseError> errors;
using (StringReader sr = new StringReader(inputScript))
{
fragment = parser.Parse(sr, out errors);
}
if (errors != null && errors.Count > 0)
{
StringBuilder sb = new StringBuilder();
foreach (var error in errors)
{
sb.AppendLine(error.Message);
sb.AppendLine("offset " + error.Offset.ToString());
}
throw new ArgumentException("InvalidSQLScript", sb.ToString());
}
}
private void Parse(TSql80Parser parser, string inputScript)
{
IList<ParseError> errors;
using (StringReader sr = new StringReader(inputScript))
{
fragment = parser.Parse(sr, out errors);
}
if (errors != null && errors.Count > 0)
{
StringBuilder sb = new StringBuilder();
foreach (var error in errors)
{
sb.AppendLine(error.Message);
sb.AppendLine("offset " + error.Offset.ToString());
}
throw new ArgumentException("InvalidSQLScript", sb.ToString());
}
}
public IScriptFragment Fragment
{
get { return fragment; }
}
}
"@
$refs = @("Microsoft.Data.Schema.ScriptDom","Microsoft.Data.Schema.ScriptDom.Sql")
add-type -ReferencedAssemblies $refs -TypeDefinition $Source -Language CSharpVersion3 -passThru
I'm using PowerShell V2 Add-Type to create a runtime type. I've tested the script on three different machines. On one machine the script works as expected; on the other two the following error is produced. Both referenced assemblies are placed in the same folder as the PowerShell script. Any ideas on what I'm doing wrong?
PS C:\Users\u00\bin> .\SQLParser.ps1
Add-Type : (0) : Metadata file 'Microsoft.Data.Schema.ScriptDom.dll' could not be found
(1) : using System;
At C:\Users\u00\bin\SQLParser.ps1:125 char:9
+ add-type <<<< -ReferencedAssemblies $refs -TypeDefinition $Source -Language CSharpVersion3 -passThru
+ CategoryInfo : InvalidData: (error CS0006: M...ld not be found:CompilerError) [Add-Type], Exception
+ FullyQualifiedErrorId : SOURCE_CODE_ERROR,Microsoft.PowerShell.Commands.AddTypeCommand
Add-Type : (0) : Metadata file 'Microsoft.Data.Schema.ScriptDom.Sql.dll' could not be found
(1) : using System;
At C:\Users\u00\bin\SQLParser.ps1:125 char:9
+ add-type <<<< -ReferencedAssemblies $refs -TypeDefinition $Source -Language CSharpVersion3 -passThru
+ CategoryInfo : InvalidData: (error CS0006: M...ld not be found:CompilerError) [Add-Type], Exception
+ FullyQualifiedErrorId : SOURCE_CODE_ERROR,Microsoft.PowerShell.Commands.AddTypeCommand
Add-Type : Cannot add type. There were compilation errors.
At C:\Users\u00\bin\SQLParser.ps1:125 char:9
+ add-type <<<< -ReferencedAssemblies $refs -TypeDefinition $Source -Language CSharpVersion3 -passThru
+ CategoryInfo : InvalidData: (:) [Add-Type], InvalidOperationException
+ FullyQualifiedErrorId : COMPILER_ERRORS,Microsoft.PowerShell.Commands.AddTypeCommand
Pretty simple, once you know ;-)
Max's example works because those assemblies are in the GAC so they can be referenced by name. Your assemblies aren't, so they need to be referenced by path.
You don't need the Add-Type references at the top either, at least, not for that script -- just change your last couple of lines to this:
$PSScriptRoot = (Split-Path $MyInvocation.MyCommand.Path -Parent)
$refs = @("$PSScriptRoot\Microsoft.Data.Schema.ScriptDom.dll","$PSScriptRoot\Microsoft.Data.Schema.ScriptDom.Sql.dll")
add-type -ReferencedAssemblies $refs -TypeDefinition $Source -Language CSharpVersion3 -passThru
I have a sample I've used during our PS track. It's kind of basic, but it works. Here's the code using SMO:
$Assem = ("Microsoft.SqlServer.Smo","Microsoft.SqlServer.ConnectionInfo")
$Source = @"
public class MyMSSql
{
public static string getEdition(string sqlName)
{
string sqlEdition;
Microsoft.SqlServer.Management.Smo.Server sname = new Microsoft.SqlServer.Management.Smo.Server(sqlName);
sqlEdition = sname.Information.Edition;
return sqlEdition;
}
public string getSqlEdition(string sqlName)
{
string sqlEdition;
Microsoft.SqlServer.Management.Smo.Server sname = new Microsoft.SqlServer.Management.Smo.Server(sqlName);
sqlEdition = sname.Information.Edition;
return sqlEdition;
}
}
"@;
Add-Type -ReferencedAssemblies $Assem -TypeDefinition $Source
[MyMSSql]::getEdition("MAX-PCWIN1")
#Developer Edition (64-bit)
$MySQLobj = New-Object MyMSSql
$MySQLobj.getSqlEdition("MAX-PCWIN1")
Hope this will give you a hint.
Max
If you put the VSTSDB assemblies in the same dir as the script then you don't want to use "." in a relative path. "." will be relative to the directory where the script is invoked. Try something like this instead:
$ScriptDir = Split-Path $MyInvocation.MyCommand.Path -Parent
Add-Type -Path "$ScriptDir\Microsoft.Data.Schema.ScriptDom.dll"
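The same point applies outside PowerShell: a bare "." or a bare file name is resolved against whatever directory the caller happens to be in, while a path built from the script's own location always points next to the script. A rough sketch of that in Python, for illustration only; the helper name beside_script is hypothetical, and the script path is the one shown in the error output above.

```python
from pathlib import PureWindowsPath


def beside_script(script_path, file_name):
    """Return the path of file_name in the same directory as the script,
    independent of the current working directory."""
    return PureWindowsPath(script_path).parent / file_name


# Script location taken from the error output in the question.
dll = beside_script(r"C:\Users\u00\bin\SQLParser.ps1",
                    "Microsoft.Data.Schema.ScriptDom.dll")
print(dll)
```

This mirrors what Split-Path $MyInvocation.MyCommand.Path -Parent does: take the full path of the running script, drop the file name, and append the assembly name.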