Lower Your Expectations
When CAD users look at a PDF file in a viewer (especially a crisp one), they see objects, intelligent objects. They see dimensions, block inserts, text and more. What they do not realize is that PDF files cannot contain these objects.
A PDF is not a DWG; it is a popular plot file format. The only content in a PDF is images, paths and text. There is not even a circle object in a PDF file; a circle has to be represented by Bézier curves, typically four of them.
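To make the circle point concrete, here is a minimal Python sketch of the common way a circle is approximated with four cubic Bézier segments, one per quadrant, with the inner control points placed about 0.5523 × radius along the tangent. This is the usual convention, not necessarily what every PDF writer does, and the function names are illustrative.

```python
import math

# Tangent length factor for a quarter-circle cubic Bezier: 4/3 * (sqrt(2) - 1) ~= 0.5523
KAPPA = 4.0 / 3.0 * (math.sqrt(2.0) - 1.0)


def circle_to_beziers(cx, cy, r):
    """Approximate a circle with four cubic Bezier segments (p0, p1, p2, p3),
    one per quadrant, travelling counter-clockwise from the +X axis."""
    k = KAPPA * r
    # On-curve points where the four segments meet.
    pts = [(cx + r, cy), (cx, cy + r), (cx - r, cy), (cx, cy - r)]
    # Unit tangent direction at each of those points for counter-clockwise travel.
    tans = [(0.0, 1.0), (-1.0, 0.0), (0.0, -1.0), (1.0, 0.0)]
    segments = []
    for i in range(4):
        j = (i + 1) % 4
        p0, p3 = pts[i], pts[j]
        p1 = (p0[0] + k * tans[i][0], p0[1] + k * tans[i][1])  # leave p0 along its tangent
        p2 = (p3[0] - k * tans[j][0], p3[1] - k * tans[j][1])  # arrive at p3 along its tangent
        segments.append((p0, p1, p2, p3))
    return segments


def bezier_point(seg, t):
    """Evaluate a cubic Bezier segment at parameter t in [0, 1]."""
    p0, p1, p2, p3 = seg
    u = 1.0 - t
    b = (u * u * u, 3 * u * u * t, 3 * u * t * t, t * t * t)
    x = b[0] * p0[0] + b[1] * p1[0] + b[2] * p2[0] + b[3] * p3[0]
    y = b[0] * p0[1] + b[1] * p1[1] + b[2] * p2[1] + b[3] * p3[1]
    return x, y


def max_radial_error(cx, cy, r, samples=200):
    """Largest sampled distance between the Bezier approximation and the true circle."""
    worst = 0.0
    for seg in circle_to_beziers(cx, cy, r):
        for s in range(samples + 1):
            x, y = bezier_point(seg, s / samples)
            worst = max(worst, abs(math.hypot(x - cx, y - cy) - r))
    return worst


if __name__ == "__main__":
    for seg in circle_to_beziers(0.0, 0.0, 10.0):
        print(seg)
    print("max radial error:", max_radial_error(0.0, 0.0, 10.0))
```

An importer sees only these four curve segments. To hand you a true circle, it would have to recognize the pattern and rebuild the circle from it; the curves themselves deviate from a true circle by a few hundredths of a percent of the radius.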
Silk Purse from a Sow’s Ear
Here is one example of unreasonable expectations. The evaluator expected high-quality linework from a scanned image of this old hand-drawn sheet.
After all, if a product converts to a drawing, it must be able to perform magic in the process, right? The process of converting an image to vectors is called vectorization. There are numerous vectorization products costing up to thousands of dollars, and not one of them can create usable geometry from sources like this.
Your Scans for CAD … Skip PDF
If you are scanning for use in CAD applications, skip the PDF format and go straight to a raster file in TIFF or PNG format (whichever is smaller). The CAD engine handles a standalone image file much better than an image bundled inside a PDF. If your recipients don’t have CAD, scanning to PDF makes sense; otherwise it doesn’t.
Images Not Linework: What appears as linework in a PDF may be a raster image. During import or conversion, raster images typically remain raster images; they are not vectorized. If your PDF came from a scanner, you have an image! High-resolution scans can look like vectors, but they are not. To check for yourself, open the PDF in Adobe’s free viewer and zoom to 800% or more; jagged, pixelated edges confirm an image.
Vectors Not Text: What appears as text in a PDF may be vectors or filled paths. During import they remain vectors; they are not converted to text objects. For example, two crossing lines may look like the letter X, but that doesn’t make them one. To check for yourself, open the PDF in Adobe’s free viewer and use the Find tool to search for what appears to be text. If the viewer highlights it, there is a good chance you will get mtext objects. Even CAD engines output text as vectors when certain fonts (like ROMANS) are used.
The precision of the original drawing is lost forever. Even if the PDF contains vectors, the coordinates have been scaled to page coordinates, and much of the precision is lost in the process. Don’t ever consider designing anything important from the contents of a PDF. If you design a building based on the contents of a PDF, let us know so we can be sure never to get near it.
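One way to see the precision loss is to push a model coordinate through a hypothetical 1:100 plot and back. This Python sketch assumes the PDF's default user unit of 1/72 inch and a writer that keeps a fixed number of decimal places; the scale, the decimal counts and the function names are illustrative assumptions, since real PDF writers vary.

```python
MM_PER_POINT = 25.4 / 72.0  # default PDF user space unit: 1/72 inch


def roundtrip_mm(model_mm, scale, decimals):
    """Model (mm) -> paper -> rounded PDF page coordinate (points) -> back to model mm.

    scale: drawing scale denominator, e.g. 100 for 1:100 (hypothetical plot setup).
    decimals: decimal places the PDF writer keeps (assumption; varies by writer).
    """
    page_pt = round(model_mm / scale / MM_PER_POINT, decimals)
    return page_pt * MM_PER_POINT * scale


def worst_case_error_mm(scale, decimals):
    """Half of one rounding step, expressed back in model millimetres."""
    return 0.5 * 10.0 ** (-decimals) * MM_PER_POINT * scale


if __name__ == "__main__":
    original = 12345.678  # an arbitrary model coordinate in mm
    for decimals in (0, 1, 2, 3):
        back = roundtrip_mm(original, 100, decimals)
        print(f"{decimals} decimals: {back:.3f} mm, "
              f"error {back - original:+.3f} mm, "
              f"worst case {worst_case_error_mm(100, decimals):.3f} mm")
```

Under these assumptions, at 1:100 a writer that keeps two decimal places can move a point by up to about 0.18 mm in model space, and one that writes whole points can move it by up to about 17.6 mm.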
PDF File Sizes
A PDF’s file size depends mostly on its content. Here are some common contributors to bloated files.
Overscanned Images: Scanner manufacturers seem to be in a race to see who can offer the highest scan resolution. If the source document has crisp linework, 300 dpi might be justified, but anything more is overkill and only produces a larger file that takes longer to email and process, especially when displaying it in your drawing.
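The cost of overscanning grows with the square of the resolution: doubling the dpi quadruples the pixel count. This Python sketch estimates the uncompressed size of a scan; the 24 × 36 inch (ARCH D) sheet and the bit depths are example assumptions, and compression (such as CCITT Group 4 for black-and-white scans) will shrink real files by amounts that depend on the content.

```python
def raw_scan_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size of a scan in bytes (ignores row padding and file headers)."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


if __name__ == "__main__":
    # Hypothetical example: a 24 x 36 inch (ARCH D) sheet.
    for dpi in (200, 300, 400, 600):
        mono = raw_scan_bytes(24, 36, dpi, 1)    # 1-bit black and white
        color = raw_scan_bytes(24, 36, dpi, 24)  # 24-bit RGB color
        print(f"{dpi} dpi: {mono / 1e6:6.1f} MB mono, {color / 1e6:8.1f} MB color")
```

For that sheet in black and white, 300 dpi gives 9.72 MB before compression and 600 dpi gives 38.88 MB: four times as much data to email, decode and display.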
Bad Vector Definitions: Even a vector-based PDF can be a bloated mess. Keep in mind that not all PDF creators are equal. For example, one driver may write out text as a lean text element, while another (especially if the text’s width factor has been changed) may write out each letter as a large collection of filled triangles. Take, for example, this innocent-looking letter “S”, which contained 471 objects!
The CAD engine doesn’t know it’s a letter S and has to treat it with as much importance as any other part of your file. This user’s file (1.2 MB) contained nearly 325,000 objects, which was obviously bogging down the CAD engine. You can turn off the PDFOSNAP system variable, which controls object snapping to geometry in PDF underlays, to improve snapping performance.
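To get a feel for how a single glyph can balloon into hundreds of objects, this rough Python sketch counts path-construction and path-painting operators in a content stream that has already been decompressed. The operator names come from the PDF specification (ISO 32000-1); the function name and sample streams are hypothetical, and the tokenizer is deliberately naive (it skips simple string literals but not nested parentheses, inline images or comments), so treat its counts as estimates.

```python
import re

# Path-construction and path-painting operators from the PDF specification.
CONSTRUCT_OPS = {"m", "l", "c", "v", "y", "h", "re"}
PAINT_OPS = {"S", "s", "f", "F", "f*", "B", "B*", "b", "b*", "n"}

# Simple (non-nested) literal strings such as (Hello), which may contain letters like "f".
STRING_RE = re.compile(r"\((?:\\.|[^\\()])*\)")
# Names (/F1) are matched whole so they are never mistaken for operators.
TOKEN_RE = re.compile(r"/[^\s\[\]()<>{}/%]*|[^\s\[\]()<>{}/%]+")


def count_path_ops(content):
    """Roughly count path operators in a decompressed PDF content stream."""
    counts = {"construct": 0, "paint": 0}
    for tok in TOKEN_RE.findall(STRING_RE.sub(" ", content)):
        if tok in CONSTRUCT_OPS:
            counts["construct"] += 1
        elif tok in PAINT_OPS:
            counts["paint"] += 1
    return counts


if __name__ == "__main__":
    # One filled triangle: move, two lines, close, fill.
    triangle = "100 100 m 110 100 l 105 110 l h f\n"
    # A glyph drawn from 150 such triangles (hypothetical count) ...
    print(count_path_ops(triangle * 150))
    # ... versus the same letter written as real text.
    print(count_path_ops("BT /F1 12 Tf 72 700 Td (S) Tj ET"))
```

A letter stored as text needs no path operators at all, while the same letter stored as filled triangles costs several operators per triangle, and an importer may turn each triangle into a separate object.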
With these limitations understood (and expectations now reasonable), converters like DotSoft’s PDF2DWG make it possible to import these primitives from PDF files into AutoCAD.