CAD | Algorist

Extract features from CAD files Part 3: The future

Tue, 28 Sep 2021 00:00:00 +0000

In this third part of the series of posts on extracting objects from a CAD document, I’ll discuss how you might develop this CAD extraction tool further and the problems you might be able to solve.

Prologue

In Part 1 we looked at the structure of a CAD file and built up a strategy to extract seat types and locations from an architect’s floor plan. The motivation for this is to provide seat location data to a model that creates a stack plan with optimal locations of teams to office amenities and other teams they collaborate with.

In Part 2 we built an extraction tool based on the Python ezdxf package that can read and query DXF files. We loaded a floor plan in DXF, printed the block types in a layer, extracted all block inserts that matched a layer and/or block type query and outputted them to a pandas dataframe.

Additional feature extraction

As long as a CAD drawing is segregated by block type and/or layer, several floor features can be extracted with their own query and appended to the dataframe of floor features. Currently only assignable seating has been extracted from a single layer, but what else might you want to extract?

Floor features a team may need

Often teams will use meeting rooms. Some teams use meeting rooms more than others, or need more meeting rooms simultaneously. Large teams will typically require larger meeting rooms compared with small teams. The typical meeting room layout is to have a single table in the centre of the room, which means that running a query for, e.g., “small conference table” and “large conference table” will return you the count and the centre point of every small and large conference room, respectively. Alternatively, if the seats in a meeting room are stored on a layer separate to assignable desk seats then this could be a way of tracking the location and seating capacity of a meeting room.

A similar approach could be used to extract printers, breakout rooms, kitchen facilities and everything else a team may have a preference for. It could even be used to track accessibility requirements (e.g., proximity to a lift, disabled toilets, etc.).
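
Once amenities sit in the same table as seats, proximity becomes a simple distance calculation. The sketch below is a minimal illustration on plain tuples rather than the dataframe from Part 2; the IDs, coordinates and the `nearest_feature` helper are hypothetical.

```python
import math

# Hypothetical extracted features: (id, x, y) with coordinates in metres.
seats = [("UID1A", 2.0, 3.0), ("UID1B", 10.0, 3.0)]
printers = [("UID2A", 2.0, 7.0), ("UID2B", 9.0, 3.0)]

def nearest_feature(seat, features):
    """Return (feature_id, distance) for the feature closest to the seat."""
    _, sx, sy = seat
    return min(
        ((fid, math.hypot(fx - sx, fy - sy)) for fid, fx, fy in features),
        key=lambda pair: pair[1],
    )

# Map every seat ID to its closest printer and the straight-line distance.
nearest = {seat[0]: nearest_feature(seat, printers) for seat in seats}
print(nearest)
```

Straight-line distance ignores walls and corridors, so treat it as a first approximation of how far someone actually has to walk.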

Neighbourhoods/zoning

A related - but more complicated - concept is to define neighbourhoods or zones within a floor. Does a team need quiet space away from stairwells, kitchens and thoroughfares? What about key swipe access to secured areas on a floor? As long as this is defined in an appropriately named block or layer then we can extract it, calculating the spatial overlap with seats in that zone using a Python function.
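
One possible form for that Python function is a ray-casting point-in-polygon test. The sketch assumes the zone boundary has already been extracted as an ordered list of vertices (for instance from a closed polyline on a zone layer); the zone, seat IDs and coordinates are made up for illustration.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices in order. Points exactly on an
    edge are not handled specially.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical quiet zone (a 6 m x 4 m rectangle) and two seats.
quiet_zone = [(0.0, 0.0), (6.0, 0.0), (6.0, 4.0), (0.0, 4.0)]
seats = {"UID1A": (2.0, 3.0), "UID1B": (10.0, 3.0)}

seats_in_zone = [
    sid for sid, (x, y) in seats.items() if point_in_polygon(x, y, quiet_zone)
]
print(seats_in_zone)
```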

Drawing/export

ezdxf has addon modules, one of which deals with drawing and export of CAD files. Below is a very simple example of how to read a DXF file and render as a PNG image file to disk.

from ezdxf import recover
from ezdxf.addons.drawing import matplotlib

# Exception handling left out for compactness:
doc, auditor = recover.readfile('your.dxf')
if not auditor.has_errors:
    matplotlib.qsave(doc.modelspace(), 'your.png')

Streamline the use of ODA Converter

In Part 1 we discussed the need to convert a CAD file from the proprietary DWG format to DXF using the free ODA File Converter provided by the Open Design Alliance. Instead of opening this software separately we can call it from ezdxf using another addon. The converter still needs to be installed on your machine for this to work.

from ezdxf.addons import odafc
# Load a DWG file
doc = odafc.readfile('my.dwg')
# Use loaded document like any other ezdxf document
print(f'Document loaded as DXF version: {doc.dxfversion}.')
msp = doc.modelspace()
...
# Export document as DWG file for AutoCAD R2018
odafc.export_dwg(doc, 'my_R2018.dwg', version='R2018')

Generative design

The ultimate aim of the stack plan modelling tool is to develop it into a product. What better way to do this than to create a workflow that starts with a CAD drawing and ends with an annotated CAD drawing labelled according to team locations? You could reassign seats from your seats layer to new layers named after the relevant teams on that floor. However, the more appropriate method is likely to use attribute definitions.

Below is an example of isolating an entity and changing an attribute. For a real use case, entities would be a list of block IDs of seats that belong to a certain team and the attribute to change would be the team ID.

import ezdxf

# filepath is the path to your DXF file
doc = ezdxf.readfile(filepath)
model = doc.modelspace()
# load a data frame with unique ID that refers to a block insert "handle", and a team name
# define a query_string to search for the appropriate layer/blocks that refer to your allocatable seats
entities = [x for x in model.query(query_string) if x.has_dxf_attrib('name')]
if entities:
    entity = entities[0]  # process first entity found
    for attrib in entity.attribs:
        if attrib.dxf.tag == "diameter":  # identify attribute by tag
            attrib.dxf.text = "17mm"  # change attribute content

Stretch goal

In practice a client will take a stack plan recommendation and remodel a floor to include additional teams, additional seats (if the recommended teams for a floor mean the floor is over capacity) or to space desks out if the floor is under capacity. Wouldn’t it be great if we could generate floor plans automatically, or at least edit an existing one to accommodate these changes? Between a model that can learn floor plan design principles (e.g., minimum separation between objects, coincidence of a seat with a desk, etc.) and the ability to insert new block references into the model (see below), we would have all we need to achieve this.

An example from the ezdxf documentation.

import ezdxf
import random

def get_random_point():
    """Returns random x, y coordinates."""
    x = random.randint(-100, 100)
    y = random.randint(-100, 100)
    return x, y

# This excerpt assumes `doc` is an ezdxf document that already
# contains a block definition named 'FLAG'.
# Get the modelspace of the drawing.
msp = doc.modelspace()
# Get 50 random placing points.
placing_points = [get_random_point() for _ in range(50)]
for point in placing_points:
    # Every flag has a different scaling and a rotation of -15 deg.
    random_scale = 0.5 + random.random() * 2.0
    # Add a block reference to the block named 'FLAG' at the coordinates 'point'.
    msp.add_blockref('FLAG', point, dxfattribs={
        'xscale': random_scale,
        'yscale': random_scale,
        'rotation': -15
    })
# Save the drawing.
doc.saveas("blockref_tutorial.dxf")
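
Randomly placed blocks will of course ignore the design principles mentioned above. As a minimal sketch of one such rule, the check below flags pairs of proposed seat positions that are closer than a minimum separation. The function name, seat IDs and the 1.5 m threshold are illustrative assumptions, not part of ezdxf.

```python
import math
from itertools import combinations

def separation_violations(points, min_distance):
    """Return pairs of IDs whose points are closer than min_distance."""
    violations = []
    for (id_a, (xa, ya)), (id_b, (xb, yb)) in combinations(points.items(), 2):
        if math.hypot(xa - xb, ya - yb) < min_distance:
            violations.append((id_a, id_b))
    return violations

# Hypothetical proposed seat positions in metres.
proposed = {"S1": (0.0, 0.0), "S2": (1.0, 0.0), "S3": (5.0, 0.0)}
violations = separation_violations(proposed, min_distance=1.5)
print(violations)
```

A generator of candidate layouts could run a check like this before any block reference is written to the drawing.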

Extract features from CAD documents Part 2: Using ezdxf

Wed, 14 Apr 2021 00:00:00 +0000

In this second part of the series of posts on extracting objects from a CAD document, I’ll go through the process of using the ezdxf package to implement the extraction strategy discussed in part one.

Prologue

In part one we looked at the structure of a CAD file and built up a strategy to extract seat types and locations from an architect’s floor plan. The motivation for this is to provide seat location data to a model that creates a stack plan with optimal locations of teams to office amenities and other teams they collaborate with.

In summary, our strategy is:

load the DXF file;
create an object from the model space;
build a layer and block type query;
extract unique ID, x and y features from every element returned from the query, recording the block type and layer it came from;
check for errors and write to csv file.
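
The last step can be sketched without any CAD library at all. The example below assumes the extracted features are held as a list of dictionaries and checks only for duplicate IDs and missing coordinates; both the record layout and the choice of checks are assumptions made for illustration.

```python
import csv
import io

def validate(rows):
    """Return a list of error messages for duplicate IDs or missing coordinates."""
    errors = []
    seen = set()
    for row in rows:
        if row["id"] in seen:
            errors.append(f"duplicate id: {row['id']}")
        seen.add(row["id"])
        if row["x"] is None or row["y"] is None:
            errors.append(f"missing coordinate: {row['id']}")
    return errors

# Hypothetical extracted rows.
rows = [
    {"id": "UID1A", "x": 2.0, "y": 3.0, "type": "CHAIR", "layer": "SEATS"},
    {"id": "UID1B", "x": 10.0, "y": 3.0, "type": "CHAIR", "layer": "SEATS"},
]
errors = validate(rows)

# StringIO stands in for a file opened with open(path, "w", newline="").
buffer = io.StringIO()
if not errors:
    writer = csv.DictWriter(buffer, fieldnames=["id", "x", "y", "type", "layer"])
    writer.writeheader()
    writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```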

ezdxf

Since the genetic algorithm we’ve built is in R, I tried to build an extraction tool in R. However, the only DXF file loader package I could find is called ezdxf and is built in Python. I’m not a fan of reinventing the wheel so I wrote a custom package in Python that is built on top of ezdxf’s classes and methods.

The following functions are building blocks to load the model space from a DXF file, query the model space object for block inserts that belong to a certain block type or layer, and extract them. They rely heavily on the documentation and tutorials on the ezdxf documentation website, so I would encourage the interested reader to refer to those documents for background information.

Load file

import ezdxf
import numpy as np
import pandas as pd
import sys

def load(filepath):
    """
    Loads the modelspace of a dxf file using the ezdxf package.

    Parameters
    ----------
    filepath: string
        The file path to a dxf file

    Returns
    -------
    An ezdxf modelspace object.
    """
    try:
        doc = ezdxf.readfile(filepath)
        msp = doc.modelspace()
    except IOError:
        print('Not a DXF file or a generic I/O error.')
        sys.exit(1)
    except ezdxf.DXFStructureError:
        print('Invalid or corrupted DXF file.')
        sys.exit(2)
    return msp

Print blocks that belong to a layer

def print_blocks(model, layer):
    """
    Prints all blocks belonging to a layer

    Parameters
    ----------
    model : ezdxf modelspace object
        A modelspace object created from a loaded dxf file.
        See the ezdxf help file for 'readfile' and 'modelspace' methods
    layer : string
        A string specifying the layer name in the modelspace object.
        It is matched as a regular expression, so it may match several layers.

    Returns
    -------
    None. The matching layer names and their block names are printed.
    """
    # '?' is the regular expression match operator of the ezdxf query language
    query_string = '*[layer ? "%s"]' % layer
    entities = model.query(query_string)
    layers = set(i.dxf.layer for i in entities)
    for i in layers:
        layer_entity = model.query('*[layer == "%s"]' % i)
        # Only entities with a 'name' attribute (block inserts) are of interest
        layer_entity = [x for x in layer_entity if x.has_dxf_attrib('name')]
        blocks = set(j.dxf.name for j in layer_entity)
        print(i + ":")
        for x in blocks:
            print(x)

Extract inserts belonging to a block type or layer

The first function extracts all inserts of all blocks based on a query made on the model object. Within the body of the function there is code to check for negative x values and mirror them if it finds any. This is because when I compared the DWG and DXF files I noticed that some objects had been mirrored in the DXF file for some reason. This fix worked for me but it may not work for you, depending on the version of the CAD software used, the conversion software and the point of origin (the 0,0 point). The issue is not caused by Python, as it was already present in the input DXF file prior to extraction. You have been warned!

def extract_query(model, query_string, scaling=0.001):
    """
    Extracts all objects from an ezdxf model object that are returned by a query.

    Parameters
    ----------
    model : ezdxf modelspace object
        A modelspace object created from a loaded dxf file.
        See the ezdxf help file for 'readfile' and 'modelspace' methods
    query_string : string
        A string specifying a layer or block based query
    scaling : positive real value
        Depending on the units of the modelspace and the desired units, a scaling
        may be preferred. The default is a unit of mm and a desired unit of metre.
        (Default value = 0.001)

    Returns
    -------
    Pandas dataframe object indexed by ID, with x and y coordinate values,
    block type and layer. One row per object.
    """
    # Keep only entities that have a block name, i.e. block inserts
    entities = [x for x in model.query(query_string) if x.has_dxf_attrib('name')]
    coords = [i.dxf.insert for i in entities]
    ids = ['UID' + i.dxf.handle for i in entities]
    output = pd.DataFrame(coords, columns=['x', 'y', 'z'], index=ids).drop(['z'], axis=1)
    output.index.name = "id"
    output['type'] = [i.dxf.name for i in entities]
    output['layer'] = [i.dxf.layer for i in entities]
    # A known issue with converting DWG files to DXF is that some elements have their x coordinate reversed
    if (output.x < 0).any():
        print("Negative x values! This is a known issue with dxf files")
        print("Mirroring negative x values")
        output.x = np.abs(output.x)
    # Apply scaling factor
    output = output.apply(lambda x: x * scaling if x.name in ['x', 'y'] else x)
    return output

The second function builds the query and passes it to the first.

def extract(model, layer=None, block=None, scaling=0.001):
    """
    Extracts all objects of a certain block type within a specified layer of a dxf file.

    Parameters
    ----------
    model : ezdxf modelspace object
        A modelspace object created from a loaded dxf file.
        See the ezdxf help file for 'readfile' and 'modelspace' methods
    layer : string
        A string specifying the layer name in the modelspace object
    block : string
        A string specifying the block name in the modelspace object
    scaling : positive real value
        Depending on the units of the modelspace and the desired units, a scaling
        may be preferred. The default is a unit of mm and a desired unit of metre.
        (Default value = 0.001)

    Returns
    -------
    Pandas dataframe object indexed by ID, with x and y coordinate values,
    block type and layer. One row per object.
    """
    if layer is None and block is None:
        # Without this guard the query would search for a block named "None"
        raise ValueError("Specify at least one of 'layer' or 'block'.")
    if layer is None:
        query_string = '*[name=="%s"]' % block
    elif block is None:
        query_string = '*[layer=="%s"]' % layer
    else:
        query_string = '*[layer=="%s" & name=="%s"]' % (layer, block)
    output = extract_query(model, query_string, scaling)
    return output

Note that the default scaling is 0.001. This is because the building drawings I worked with have units of mm, and I would like to store x,y positions in metres. If your drawing uses different units (cm, for example), pass the appropriate scaling instead.
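
Rather than hard-coding the factor, it could be looked up from the drawing's declared units. The sketch below maps the DXF `$INSUNITS` header code to metres per drawing unit; the `scaling_for` helper is hypothetical. In ezdxf the code should be readable from the document header (for example `doc.header['$INSUNITS']`), but note that the `load` function above returns only the modelspace, so the document would need to be kept around as well.

```python
# Metres per drawing unit, keyed by the DXF $INSUNITS header code.
METRES_PER_UNIT = {
    1: 0.0254,  # inches
    2: 0.3048,  # feet
    4: 0.001,   # millimetres
    5: 0.01,    # centimetres
    6: 1.0,     # metres
}

def scaling_for(insunits):
    """Return the factor that converts drawing units to metres."""
    try:
        return METRES_PER_UNIT[insunits]
    except KeyError:
        raise ValueError(f"unsupported or unitless $INSUNITS code: {insunits}")

print(scaling_for(4))  # millimetres to metres
```

Many drawings declare no units at all (code 0), so a manual override such as the `scaling` argument remains useful.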

Taking it further

At Arcadis Gen we take a package based approach to consultancy to make enduring products from one-off engagements and shorten the development cycle for future similar projects. To do this I developed a Python package called cadextract, built on top of ezdxf and providing helper functions to extract seat plans. The functions in this package look a little similar to the above, but also include fuzzy_extract to deal with queries using regular expressions, batch_extract that extracts from multiple DXF files using the same query and stores them in a single dataframe, plotting objects, and more.
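
To give a flavour of those two ideas, here is a minimal sketch that works on plain dictionaries rather than DXF files. It is not the cadextract implementation: the function names, record layout and file names are all hypothetical.

```python
import re

def fuzzy_filter(rows, pattern, field="type"):
    """Keep rows whose field matches the regular expression (case-insensitive)."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [row for row in rows if regex.search(row[field])]

def batch(rows_by_file, pattern):
    """Apply the same filter to several files, tagging each row with its source."""
    combined = []
    for filename, rows in rows_by_file.items():
        for row in fuzzy_filter(rows, pattern):
            combined.append({**row, "source": filename})
    return combined

# Hypothetical rows already extracted from two floors.
rows_by_file = {
    "floor_01.dxf": [
        {"id": "UID1A", "type": "Chair-Task"},
        {"id": "UID1C", "type": "Desk-1600"},
    ],
    "floor_02.dxf": [{"id": "UID2A", "type": "CHAIR_visitor"}],
}
chairs = batch(rows_by_file, r"^chair")
print(chairs)
```

Matching on a pattern helps when architects name the same block slightly differently from floor to floor.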

In the third and final part of this series on CAD I’ll take a more speculative look at where you might be able to take this programmatic approach, and where future opportunities might lie.

Extract features from CAD documents Part 1: A primer

Mon, 01 Mar 2021 00:00:00 +0000

This is the first part of a series of posts describing my experience of extracting objects from a CAD document.

Motivation

Recently at work a project came up that involved the optimisation of building occupancy during the refit of a 16-floor central London office block. As we intend to optimise team locations down to the seat level within a floor, we were given floor plans in both PDF and DWG file formats.

The original scope allowed for us to rely on another team to manually encode the distances between each desk on each floor, but there are many downsides to this approach: human error; costly rework when plans change; lack of ability to iteratively improve on feature extraction; and huge risk of schedule overrun when modelling has to wait on this input data. The alternative was to programmatically scrape the CAD file for seat ID and X and Y coordinates. I’ll present the solution in Part 2, but first here is what I learned about the structure of CAD files that I needed before writing an extraction utility.

File types

There are generally two file types in the world of CAD: DWG and DXF. DWG is shorthand for drawing, and is the proprietary native file format of AutoCAD. It’s not very useful on its own so I found it best to convert to DXF using the free ODA File Converter provided by the Open Design Alliance.

DXF is short for Drawing Exchange Format, and comes in binary and ASCII flavours. I prefer the ASCII version as it is human readable, but the downside is that it is uncompressed and so file sizes have the potential to be large. It is much more easily shared between programs (even Inkscape) and I find that LibreCAD is a really lightweight way of browsing floor plans in this format.
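
To show what "human readable" means here: an ASCII DXF file is a long sequence of line pairs, a numeric group code followed by its value. The sketch below parses a hand-written fragment describing one block insert. The fragment is illustrative only and is not a complete DXF file, and the `read_tags` helper is hypothetical.

```python
def read_tags(text):
    """Yield (group_code, value) pairs from ASCII DXF text."""
    lines = text.splitlines()
    # Lines alternate: group code, value, group code, value, ...
    for code_line, value_line in zip(lines[0::2], lines[1::2]):
        yield int(code_line.strip()), value_line.strip()

sample = "\n".join([
    "0", "INSERT",   # group code 0: entity type
    "5", "1A",       # group code 5: handle
    "8", "SEATS",    # group code 8: layer name
    "2", "CHAIR",    # group code 2: block name
    "10", "2000.0",  # group code 10: x coordinate
    "20", "3000.0",  # group code 20: y coordinate
])
tags = list(read_tags(sample))
print(tags)
```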

File structure

A CAD document comprises one model space and zero or more paper spaces. A model space is a limitless field in XYZ space with a certain unit and coordinate frame of reference. A paper space is what you might expect: a layout designed for presentation and printing. Paper spaces are constrained in spatial extent and will have a scale, where the units in model space are scaled for rendering. A perspective is also defined, which affects the perception of distance and distortion (if in 3D).

For extracting objects and their locations we need the model space rather than the paper space.

Components of a CAD drawing

Building up from the smallest elements, we have…

Entities

Entities are the most primitive element of a CAD drawing, including points, lines, rectangles, circles and elliptical arcs.

Blocks

A block is a group of one or more entities. It functions as a template, which can be inserted into the model space multiple times. Each instance of a block is called an insert. If you update a block (by editing an element within the block, for instance) then every insert of that block is updated.

Layers

A layer organises many elements and inserts under common attributes and under common command (e.g., visibility, locked/unlocked). Whilst a layer may contain many elements, an element can only be on a single layer. This fact makes layers useful for searching for entities and inserts.

Extraction strategy

For my purpose, I need to extract each seat (or alternatively, each desk) from each floor. This means that I should search for every insert of the appropriate seat block, likely located within a common layer. Ideally that search would return a list containing all the inserts, each of which should contain a unique identifier and an X/Y coordinate.
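
As a library-free picture of that search, the sketch below models inserts as simple records and filters them by layer and block name. The `Insert` class, the `find_inserts` helper and all the values are hypothetical stand-ins for what a real DXF reader would provide.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Insert:
    handle: str  # unique identifier of the insert
    block: str   # name of the block this insert is an instance of
    layer: str   # the single layer the insert sits on
    x: float
    y: float

def find_inserts(inserts, layer=None, block=None):
    """Return inserts matching the given layer and/or block name."""
    return [
        i for i in inserts
        if (layer is None or i.layer == layer)
        and (block is None or i.block == block)
    ]

# Hypothetical drawing contents.
drawing = [
    Insert("1A", "CHAIR", "SEATS", 2000.0, 3000.0),
    Insert("1B", "CHAIR", "MEETING", 8000.0, 1000.0),
    Insert("1C", "DESK", "SEATS", 2000.0, 3500.0),
]
seats = find_inserts(drawing, layer="SEATS", block="CHAIR")
print(seats)
```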

For that we will use the ezdxf package in Python in Part 2.