Extract features from CAD documents Part 2: Using ezdxf

In this second part of the series of posts on extracting objects from a CAD document, I’ll go through the process of using the ezdxf package to implement the extraction strategy discussed in part one.

Prologue

In part one we looked at the structure of a CAD file and built up a strategy to extract seat types and locations from an architect’s floor plan. The motivation for this is to provide seat location data to a model that creates a stack plan with optimal locations of teams to office amenities and other teams they collaborate with.

In summary, our strategy is:

  • load the DXF file;
  • create an object from the model space;
  • build a layer and block type query;
  • extract unique ID, x and y features from every element returned from the query, recording the block type and layer it came from;
  • check for errors and write to csv file.

ezdxf

Since the genetic algorithm we’ve built is in R, I tried to build an extraction tool in R. However, the only DXF file loader package I could find is called ezdxf and is built in Python. I’m not a fan of reinventing the wheel so I wrote a custom package in Python that is built on top of ezdxf’s classes and methods.

The following functions are building blocks to load the model space from a dxf file, query the model space object for block inserts that belong to a certain block type or layer, and to extract them. They rely heavily on the documentation and tutorials from the ezdxf readme website so I would encourage the interested reader to refer to those documents for background information.

Load file

import ezdxf
import numpy as np
import pandas as pd
import sys

def load(filepath):
    """
    Loads the modelspace of a dxf file using the ezdxf package.

    Parameters
    ----------
    filepath: string
        The file path to a dxf file

    Returns
    An exdxf modelspace object.
    -------

    """

    try:
        doc = ezdxf.readfile(filepath)
        msp = doc.modelspace()
    except IOError:
        print(f'Not a DXF file or a generic I/O error.')
        sys.exit(1)
    except ezdxf.DXFStructureError:
        print(f'Invalid or corrupted DXF file.')
        sys.exit(2)
    return msp
def print_blocks(model, layer):
    """
    Prints all blocks belonging to a layer

    Parameters
    ----------
    model : ezdxf modelspace object
        A modelspace object created from a loaded dxf file. 
        See the ezdxf help file for 'readfile' and 'modelspace' methods
        
    layer : string
        A string specifying the layer name in the modelspace object
    
    Returns
    A list of string elements
    -------

    """   
    query_string = '*[layer ? \"%s\"]' % layer
    entities = model.query(query_string)
    layers = set(i.dxf.layer for i in entities)
    for i in layers:
        layer_entity = model.query('*[layer == \"%s\"]' % i)
        layer_entity = [x for x in layer_entity if x.has_dxf_attrib('name')]
        blocks = set(j.dxf.name for j in layer_entity)
        print(i + ":")
        for x in blocks:
            print(x)

Extract inserts belonging to a block type or layer

The first function extracts all inserts of all blocks based on a query made on the model object. Within the body of the function there is code to check for negative x values and mirror them if it finds any. This is because when I compared the DWG and DXF files I noticed that some objects had been mirrored in the DXF file for some reason. This fix worked for me but it may not work for you, depending on your version of CAD software you used, the conversion software and the point of origin (the 0,0 point). This issue is not due to Python as the issue appeared in the input DXF file prior to extraction. You have been warned!

def extract_query(model, query_string, scaling = 0.001):
    """
    Extracts all objects from an ezdxf model object that are returned by a query.

    Parameters
    ----------
    model : ezdxf modelspace object
        A modelspace object created from a loaded dxf file. 
        See the ezdxf help file for 'readfile' and 'modelspace' methods
        
    query_string : string
        A string specifying a layer or block based query
        
    scaling : positive real value
        Depending on the units of the modelspace and the desired units, a scaling
        may be preferred. The default is a unit of mm and a desired unit of metre.
         (Default value = 0.001)

    Returns
    Pandas dataframe object with ID, x and y coordinate values. One row per object.
    -------

    """
    entities = [x for x in model.query(query_string) if x.has_dxf_attrib('name')]
    coords = [i.dxf.insert for i in entities]
    id = ['UID' + i.dxf.handle for i in entities]
    output = pd.DataFrame(coords, columns = ['x', 'y', 'z'], index = id).drop(['z'], axis=1)
    output.index.name = "id"
    output['type'] = [i.dxf.name for i in entities]
    output['layer'] = [i.dxf.layer for i in entities]
    # A known issue with converting DWG files to DXF is that some elements have their x coordinate reversed
    if len(output.x[output.x <= 0]) != 0:
        print("Negative x values! This is a known issue with dxf files")
        print("Mirroring negative x values")
        output.x = np.abs(output.x)
    # Apply scaling factor
    output = output.apply(lambda x: x * scaling if x.name in ['x', 'y'] else x)
    return output

The second function builds the query and passes it to the first.

def extract(model, layer=None, block=None, scaling = 0.001):
    """
    Extracts all objects of a certain block type within a specified layer of a dxf file.

    Parameters
    ----------
    model : ezdxf modelspace object
        A modelspace object created from a loaded dxf file. 
        See the ezdxf help file for 'readfile' and 'modelspace' methods
        
    layer : string
        A string specifying the layer name in the modelspace object
        
    block : string
        A string specifying the block name in the modelspace object
        
    scaling : positive real value
        Depending on the units of the modelspace and the desired units, a scaling
        may be preferred. The default is a unit of mm and a desired unit of metre.
         (Default value = 0.001)

    Returns
    Pandas dataframe object with ID, x and y coordinate values. One row per object.
    -------

    """
    if layer is None:
        query_string = '*[name==\"%s\"]' % block
    elif block is None:
        query_string = '*[layer==\"%s\"]' % layer
    else:
        query_string = '*[layer==\"%s\" & name==\"%s\"]' % (layer, block)
    output = extract_query(model, query_string, scaling)
    return output

Note that the default scaling is 0.01. This is because typical CAD files dealing with building drawings have units of cm, and I would like to store x,y positions in metres.

Taking it further

At Arcadis Gen we take a package based approach to consultancy to make enduring products from one-off engagements and shorten the development cycle for future similar projects. To do this I developed a Python package called cadextract, built on top of ezdxf and providing helper functions to extract seat plans. The functions in this package look a little similar to the above, but also include fuzzy_extract to deal with queries using regular expressions, batch_extract that extracts from multiple DXF files using the same query and stores them in a single dataframe, plotting objects, and more.

In the third and final part of this series on CAD I’ll take a more speculative look at where you might be able to take this programmatic approach, and where future opportunities might lie.

Richard Davey
Richard Davey
Group Leader, Data & Decision Science

My interests include earth science, numerical modelling and problem solving through optimisation.

Related