Molecule

A Molecule defines a collection of atoms by their geometry (in Bohr), atomic numbers, charge, and multiplicity. It can be created from a variety of sources and data formats.

The Molecule building block is a critical input (and often output) of most workflows.

Creating a new Molecule

A common method of Molecule creation is to directly set the fields:

import sierra
from sierra.inputs import *

# Build a molecule from raw data, note the distances are in Bohr
he2 = Molecule(atomic_numbers=[2, 2], geometry=[0, 0, 0, 0, 0, 5])
print(he2)
#> Molecule(formula='He2', eoi='6e4877e')
print(he2.measure([0, 1]))
#> 5.0

Here, the charge takes its default of 0. The multiplicity is left unset and resolves to 1, the lowest value for an even number of electrons.
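
The Fields section describes an unset multiplicity as the lowest multiplicity allowed by the parity of the electron count. As a rough sketch of that rule, and not Sierra's actual implementation, the hypothetical helper `lowest_multiplicity` below counts electrons from the atomic numbers and the charge, then returns 1 (singlet) for an even count or 2 (doublet) for an odd count.

```python
def lowest_multiplicity(atomic_numbers, charge=0):
    """Return the lowest spin multiplicity allowed by electron parity.

    Hypothetical helper for illustration only; not part of the Sierra API.
    """
    n_electrons = sum(atomic_numbers) - charge
    if n_electrons < 0:
        raise ValueError("charge exceeds the total nuclear charge")
    # Even electron count: all electrons can pair (singlet, 2S+1 = 1).
    # Odd electron count: one electron stays unpaired (doublet, 2S+1 = 2).
    return 1 if n_electrons % 2 == 0 else 2


print(lowest_multiplicity([2, 2]))     # He2, 4 electrons: prints 1
print(lowest_multiplicity([2, 2], 1))  # He2+, 3 electrons: prints 2
```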

Element symbols can also be used in place of atomic numbers for initialization:

from sierra.inputs import *

# Build a molecule from symbols, note the distances are in Bohr
he2 = Molecule(symbols=["He", "He"], geometry=[0, 0, 0, 0, 0, 5])
print(he2)
#> Molecule(formula='He2', eoi='6e4877e')
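
Both examples above pass the geometry as a flat list of six numbers, while the geometry field is documented with shape (-1, 3). The sketch below shows one plausible way such a flat list maps onto one (x, y, z) row per atom. The helper name `to_rows` is hypothetical, and how Molecule performs this step internally is an assumption.

```python
def to_rows(flat):
    """Group a flat coordinate list into (x, y, z) tuples, one per atom.

    Hypothetical helper for illustration only; not part of the Sierra API.
    """
    if len(flat) % 3 != 0:
        raise ValueError("expected a multiple of three coordinates")
    return [
        tuple(float(v) for v in flat[i:i + 3])
        for i in range(0, len(flat), 3)
    ]


print(to_rows([0, 0, 0, 0, 0, 5]))
# prints [(0.0, 0.0, 0.0), (0.0, 0.0, 5.0)]
```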

Importing common file formats

It is also common to construct a Molecule from SDF, XYZ or XYZ+ text. These formats specify positions in Angstrom, which Molecule converts to Bohr before storing them in the geometry field.

A compatible file can be loaded with the file field:

from sierra.inputs import *

# Build a molecule from a SDF, XYZ or XYZ+ file
water = Molecule(file="examples/atoms.xyz")

or the file content can be passed to the data field:

from sierra.inputs import *

# Build a molecule from SDF, XYZ or XYZ+ contents
# Note the distances are in Angstrom
water = Molecule(
    data="""
O 0 0 0
H 0 0 1
H 0 1 0
"""
)
print(water)
#> Molecule(formula='H2O', eoi='6398bd8')
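
To make the Angstrom to Bohr conversion concrete, here is a minimal standalone sketch. It assumes the CODATA 2018 Bohr radius of 0.529177210903 Angstrom; the exact constant Sierra uses is not stated here, and the helper name `angstrom_to_bohr` is hypothetical.

```python
# Assumed conversion factor: CODATA 2018 Bohr radius expressed in Angstrom.
BOHR_IN_ANGSTROM = 0.529177210903


def angstrom_to_bohr(coords):
    """Convert a list of (x, y, z) tuples from Angstrom to Bohr.

    Hypothetical helper for illustration only; not part of the Sierra API.
    """
    return [tuple(c / BOHR_IN_ANGSTROM for c in xyz) for xyz in coords]


# The water positions from the example above, in Angstrom.
water_angstrom = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 0.0)]
water_bohr = angstrom_to_bohr(water_angstrom)
print(round(water_bohr[1][2], 4))
# prints 1.8897
```

Under this assumption, the 1 Angstrom O-H separation in the input corresponds to roughly 1.89 Bohr in the stored geometry.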

Generating from a SMILES string

A Molecule can be generated directly from a SMILES string:

from sierra.inputs import *

butane = Molecule(smiles="CCCC")
print(butane)
#> Molecule(formula='C4H10', eoi='0aabcca')

Here, our internal conformer tools generate a structure from the SMILES string. Note that this implementation prioritizes speed, aiming for a reasonable structure rather than a rigorous conformational search. Please use the Conformer workflow for full control over geometry generation.

Importing from PubChem

A convenient way to create a Molecule is through the PubChem interface. The pubchem field searches PubChem for the best common-name match and generates a Molecule from the result.

from sierra.inputs import *

caffeine = Molecule(pubchem="caffeine")
print(caffeine)
#> Molecule(formula='C8H10N4O2', eoi='6812f19')

Warning

The pubchem interface sends data to PubChem servers and should not be used for proprietary material. This is the only operation in Sierra that reaches an outside server; all other calls, including the Conformer workflow, run locally.

Exporting a Molecule

Molecule objects can easily be exported to a file or a string variable in XYZ+ format:

from pathlib import Path
from sierra.inputs import *

mol = Molecule(pubchem="caffeine")

# Write a molecule as XYZ+ format
xyz_text = mol.write()
print(xyz_text)
"""
24
0 1
O     0.470000000014   2.568799999980   0.000600000013
O    -3.127099999991  -0.443600000008  -0.000299999980
N    -0.968599999993  -1.312500000015   0.000000000000
N     2.218199999976   0.141200000002  -0.000299999980
N    -1.347700000022   1.079700000022  -0.000099999993
N     1.411900000022  -1.937199999987   0.000199999987
C     0.857899999991   0.259199999998  -0.000800000000
C     0.389700000018  -1.026399999992  -0.000399999974
C     0.030699999986   1.421999999991  -0.000600000013
C    -1.906099999974  -0.249500000003  -0.000399999974
C     2.503200000019  -1.199799999986   0.000299999980
C    -1.427599999992  -2.696000000005   0.000800000000
C     3.192600000010   1.206099999994   0.000299999980
C    -2.296900000025   2.188100000003   0.000700000007
H     3.516299999989  -1.578699999975   0.000800000000
H    -1.045099999975  -3.197300000018  -0.893700000012
H    -2.518600000009  -2.759599999991   0.001099999980
H    -1.044700000002  -3.196299999978   0.895699999986
H     4.199199999985   0.780099999989   0.000199999987
H     3.046799999995   1.809200000014  -0.899200000019
H     3.046600000008   1.808300000021   0.900399999993
H    -1.808699999994   3.165100000025  -0.000299999980
H    -2.932199999985   2.102700000026   0.888100000011
H    -2.934599999986   2.102100000013  -0.884900000010

"""

# Write to a file
mol.write(filename=Path("caffeine.xyz+"))

The comment line will contain the charge and multiplicity, as per the XYZ+ standard.
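
The layout shown in the output above (an atom count, a comment line holding the charge and multiplicity, then one line per atom) can be read back with a few lines of plain Python. This is a minimal sketch based only on that printed layout, not Sierra's own reader. The function name `parse_xyz_plus` is hypothetical, and coordinates are returned exactly as written, with no unit conversion.

```python
def parse_xyz_plus(text):
    """Parse XYZ+ text into (charge, multiplicity, symbols, coords).

    Hypothetical helper for illustration only; not part of the Sierra API.
    Assumes line 1 is the atom count and line 2 is "<charge> <multiplicity>".
    """
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])
    charge, multiplicity = (int(tok) for tok in lines[1].split()[:2])

    symbols, coords = [], []
    for line in lines[2:2 + n_atoms]:
        symbol, x, y, z = line.split()[:4]
        symbols.append(symbol)
        coords.append((float(x), float(y), float(z)))

    if len(symbols) != n_atoms:
        raise ValueError("atom count does not match the number of atom lines")
    return charge, multiplicity, symbols, coords


sample = """2
0 1
He 0.0 0.0 0.0
He 0.0 0.0 2.6
"""
charge, multiplicity, symbols, coords = parse_xyz_plus(sample)
print(charge, multiplicity, symbols)
# prints 0 1 ['He', 'He']
```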

Fields

atomic_numbers

The (n, ) atomic numbers of the Molecule.

  • Type: Optional[Array]
  • Additional Details: shape: (-1,)

charge

The overall charge of the molecule.

  • Type: int
  • Default: 0

geometry

The (n, 3) coordinates for the molecule in Bohr.

  • Type: Array
  • Additional Details: shape: (-1, 3)

multiplicity

The overall multiplicity of the molecule. A value of None refers to the lowest multiplicity given the electron number parity.

  • Type: Optional[int]

masses

The (n, ) array of masses of the atoms. This field is read-only.

  • Type: Array[float]
  • Additional Details: shape: (-1,)

symbols

The (n, ) array of symbols of the atoms. This field is read-only.

  • Type: Array[str]
  • Additional Details: shape: (-1,)

Functions

Molecule.measure

measure((List[int]) indices) -> float:

For a list of two, three or four atom indices, this function returns the corresponding bond length, angle or dihedral angle, respectively.

Arguments

indices

A list of two, three or four atom indices.

  • Type: List[int]

Returns

The corresponding value as a float.
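
For readers who want to see the geometry behind these three cases, the following standalone sketch computes a distance, an angle, or a dihedral from a list of (x, y, z) points. It is not Sierra's implementation: the name `measure_sketch` is hypothetical, and reporting angles in degrees, as well as the sign convention of the dihedral, are assumptions made here.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return math.sqrt(_dot(a, a))


def measure_sketch(points, indices):
    """Distance (2 indices), angle (3) or dihedral (4) for the given atoms.

    Hypothetical helper for illustration only; not part of the Sierra API.
    Distances keep the input units; angles are in degrees (an assumption).
    """
    p = [points[i] for i in indices]
    if len(p) == 2:
        return _norm(_sub(p[1], p[0]))
    if len(p) == 3:
        # Angle at the middle atom between the two bond vectors.
        u, v = _sub(p[0], p[1]), _sub(p[2], p[1])
        cos_theta = _dot(u, v) / (_norm(u) * _norm(v))
        return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))
    if len(p) == 4:
        # Angle between the planes spanned by consecutive bond vectors.
        b1, b2, b3 = _sub(p[1], p[0]), _sub(p[2], p[1]), _sub(p[3], p[2])
        n1, n2 = _cross(b1, b2), _cross(b2, b3)
        return math.degrees(math.atan2(_norm(b2) * _dot(b1, n2), _dot(n1, n2)))
    raise ValueError("expected two, three or four atom indices")


he2 = [(0.0, 0.0, 0.0), (0.0, 0.0, 5.0)]
print(measure_sketch(he2, [0, 1]))
# prints 5.0
```

The distance case reproduces the 5.0 Bohr value printed by `he2.measure([0, 1])` in the first example.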

Molecule.write

write((str) format, (Optional[Union[str, Path]]) filename) -> Optional[str]:

Write the molecule into a file or string, using the specified file format.

Arguments

format

The file format.

  • Type: str
  • Default: "xyz"

filename

If specified, the molecule is written to this file. If no filename is provided, the result is returned as a str.

  • Type: Optional[Union[str, Path]]
  • Default: None

Returns

None if a filename has been specified; otherwise, a str holding the content that would have been written to the file.
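
The return-or-write behavior described above can be sketched in plain Python as follows. This is an illustration of the pattern only, not Sierra's implementation: the function name `write_xyz_plus`, its parameters, and the exact column widths are assumptions, and coordinates are written as given, with no unit conversion.

```python
from pathlib import Path


def write_xyz_plus(symbols, coords, charge=0, multiplicity=1, filename=None):
    """Return XYZ+ text, or write it to `filename` and return None.

    Hypothetical helper for illustration only; not part of the Sierra API.
    """
    # Header: atom count, then the comment line with charge and multiplicity.
    lines = [str(len(symbols)), f"{charge} {multiplicity}"]
    for symbol, (x, y, z) in zip(symbols, coords):
        lines.append(f"{symbol:<2s} {x:18.12f} {y:18.12f} {z:18.12f}")
    text = "\n".join(lines) + "\n"

    if filename is None:
        return text  # no file requested: hand the content back as a str
    Path(filename).write_text(text)
    return None


text = write_xyz_plus(["He", "He"], [(0.0, 0.0, 0.0), (0.0, 0.0, 2.6)])
print(text.splitlines()[1])
# prints 0 1
```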