chore(pre-commit): use mdformat to format Markdown (#941)
Follow up deepmodeling/deepmd-kit@69eb0c3, use mdformat to format
Markdown.
- Replace blacken-docs with mdformat
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
  * Updated pre-commit config to switch to an mdformat-based markdown
    formatter, add exclusions for tests, and include several mdformat
    plugins.
* **Documentation**
  * Added detailed BondOrderSystem docs with usage examples and
    sanitization/charge guidance.
  * Expanded and clarified multiple system docs (mixed, multi, system)
    with examples and formatting improvements.
  * Minor formatting/readability edits to AGENTS.md and dpdata-cli
    README/SKILL.md.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: njzjz-bot <njzjz-bot@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
docs/systems/bond_order_system.md
14 additions & 6 deletions
@@ -1,6 +1,7 @@
1
-
2
1
## BondOrderSystem
2
+
3
3
dpdata introduces a new class, {class}`BondOrderSystem <dpdata.BondOrderSystem>`, which inherits from {class}`System <dpdata.System>`. It stores chemical bonds and formal charges in `BondOrderSystem.data['bonds']` and `BondOrderSystem.data['formal_charges']`. Currently BondOrderSystem can only read .mol/.sdf files because it depends on rdkit, so rdkit must be installed to use it. Other formats, such as pdb, must first be converted to .mol/.sdf (for example with Open Babel).
In an sdf file, all molecules must have the same topology (i.e. they must be conformers of the same molecular configuration).
15
17
{class}`BondOrderSystem <dpdata.BondOrderSystem>` can also be initialized directly from a {class}`rdkit.Chem.rdchem.Mol` object.
18
+
16
19
```python
from rdkit import Chem
from rdkit.Chem import AllChem
@@ -25,21 +28,26 @@ system = dpdata.BondOrderSystem(rdkit_mol=mol)
```
26
29
27
30
### Bond Order Assignment
31
+
28
32
{class}`BondOrderSystem <dpdata.BondOrderSystem>` implements a more robust sanitization procedure for rdkit Mol objects, defined in {class}`dpdata.rdkit.sanitize.Sanitizer`. This class defines three levels of sanitization: low, medium and high (the default is medium).
29
-
+ low: use `rdkit.Chem.SanitizeMol()` function to sanitize molecule.
30
-
+ medium: before using rdkit, the programm will first assign formal charge of each atom to avoid inappropriate valence exceptions. However, this mode requires the rightness of the bond order information in the given molecule.
31
-
+ high: the program will try to fix inappropriate bond orders in aromatic hetreocycles, phosphate, sulfate, carboxyl, nitro, nitrine, guanidine groups. If this procedure fails to sanitize the given molecule, the program will then try to call `obabel` to pre-process the mol and repeat the sanitization procedure. **That is to say, if you wan't to use this level of sanitization, please ensure `obabel` is installed in the environment.**
32
-
According to our test, our sanitization procedure can successfully read 4852 small molecules in the PDBBind-refined-set. It is necessary to point out that the in the molecule file (mol/sdf), the number of explicit hydrogens has to be correct. Thus, we recommend to use
33
-
`obabel xxx -O xxx -h` to pre-process the file. The reason why we do not implement this hydrogen-adding procedure in dpdata is that we can not ensure its correctness.
33
+
34
+
- low: use the `rdkit.Chem.SanitizeMol()` function to sanitize the molecule.
35
+
- medium: before using rdkit, the program first assigns a formal charge to each atom to avoid inappropriate valence exceptions. However, this mode requires the bond order information in the given molecule to be correct.
36
+
- high: the program will try to fix inappropriate bond orders in aromatic heterocycles, phosphate, sulfate, carboxyl, nitro, nitrine, guanidine groups. If this procedure fails to sanitize the given molecule, the program will then try to call `obabel` to pre-process the mol and repeat the sanitization procedure. **That is to say, if you want to use this level of sanitization, please ensure `obabel` is installed in the environment.**
37
+
According to our tests, our sanitization procedure can successfully read 4852 small molecules in the PDBBind-refined-set. Note that the number of explicit hydrogens in the molecule file (mol/sdf) has to be correct. Thus, we recommend using
38
+
`obabel xxx -O xxx -h` to pre-process the file. We do not implement this hydrogen-adding procedure in dpdata because we cannot ensure its correctness.
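Because the high level may fall back to `obabel`, a script could check for the executable before choosing a level. The following is a minimal sketch using only the standard library. `pick_sanitize_level` is a hypothetical helper, not part of dpdata, and the commented-out call assumes the `sanitize_level` keyword of `BondOrderSystem` with a placeholder file name.

```python
import shutil


def pick_sanitize_level(preferred="high", fallback="medium"):
    """Return `preferred` only if the `obabel` executable is on PATH.

    Hypothetical helper for illustration; not part of dpdata.
    """
    if preferred == "high" and shutil.which("obabel") is None:
        # The high level may call obabel, so fall back when it is missing.
        return fallback
    return preferred


level = pick_sanitize_level()
print(f"sanitize level: {level}")

# Assumed usage (requires dpdata and rdkit; the file name is a placeholder):
# system = dpdata.BondOrderSystem("ligand.sdf", sanitize_level=level)
```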
34
39
35
40
```python
import glob

import dpdata

for sdf_file in glob.glob("bond_order/refined-set-ligands/obabel/*sdf"):
    ...  # loop body not shown in this hunk
```
BondOrderSystem implements a method to assign a formal charge to each atom based on the 8-electron rule (see below). Note that it only supports elements common in biological systems: B, C, N, O, P, S, As.
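As a rough illustration of the 8-electron rule, the formal charge of an atom that completes its octet follows from its valence electron count and its total bond order. The sketch below is a simplification for illustration, not dpdata's implementation: it covers only C, N and O, ignores hypervalent P/S and electron-deficient B, and `octet_formal_charge` is a hypothetical name.

```python
# Valence electrons of a few octet-obeying elements.
VALENCE_ELECTRONS = {"C": 4, "N": 5, "O": 6}


def octet_formal_charge(element, total_bond_order):
    """Formal charge assuming the atom has exactly eight valence-shell electrons.

    With total bond order B, the atom shares 2*B electrons and keeps
    8 - 2*B non-bonding electrons, so the formal charge is
    V - (8 - 2*B) - B = V + B - 8.
    """
    valence = VALENCE_ELECTRONS[element]
    return valence + total_bond_order - 8


print(octet_formal_charge("C", 4))  # carbon in methane
print(octet_formal_charge("N", 4))  # nitrogen in ammonium
print(octet_formal_charge("O", 1))  # singly bonded carboxylate oxygen
```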
docs/systems/multi.md
5 additions & 3 deletions
@@ -1,15 +1,16 @@
1
1
# `MultiSystems`
2
2
3
-
The Class {class}`dpdata.MultiSystems`can read data from a dir which may contains many files of different systems, or from single xyz file which contains different systems.
3
+
The class {class}`dpdata.MultiSystems` can read data from a directory that may contain many files of different systems, or from a single xyz file that contains different systems.
4
4
5
-
Use {meth}`dpdata.MultiSystems.from_dir` to read from a directory, {class}`dpdata.MultiSystems` will walk in the directory
6
-
Recursively and find all file with specific file_name. Supports all the file formats that {class}`dpdata.LabeledSystem` supports.
5
+
Use {meth}`dpdata.MultiSystems.from_dir` to read from a directory; {class}`dpdata.MultiSystems` will walk the directory
6
+
recursively and find all files with the specified file_name. It supports all the file formats that {class}`dpdata.LabeledSystem` supports.
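Conceptually, the directory walk amounts to a recursive search for files with a given name. The sketch below uses only `pathlib` and a temporary directory to show the idea. It is not how dpdata implements `from_dir`, and `find_files` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def find_files(root, file_name):
    """Recursively collect every file under `root` named `file_name`."""
    return sorted(p for p in Path(root).rglob(file_name) if p.is_file())


# Build a small directory tree with two OUTCAR files at different depths.
with tempfile.TemporaryDirectory() as tmp:
    for sub in ("sys_a", "sys_b/run_1"):
        d = Path(tmp) / sub
        d.mkdir(parents=True)
        (d / "OUTCAR").write_text("placeholder")
    (Path(tmp) / "sys_a" / "notes.txt").write_text("ignored")

    found = find_files(tmp, "OUTCAR")
    relative = [p.relative_to(tmp).as_posix() for p in found]
    print(relative)
```

With dpdata itself, the corresponding call would look something like `dpdata.MultiSystems.from_dir(dir_name="./", file_name="OUTCAR", fmt="vasp/outcar")`, where the directory name is a placeholder.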
7
7
8
8
Use {meth}`dpdata.MultiSystems.from_file` to read from a single file. Single-file support is available for the `quip/gap/xyz` and `ase/structure` formats.
9
9
10
10
For example, for `quip/gap xyz` files, a single .xyz file may contain many different configurations with different atom numbers and atom types.
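To see why one file can map to several systems, the sketch below splits a plain xyz-style text into frames and groups them by chemical formula. It is a simplified stand-in written for illustration, not dpdata's reader: it does not parse the extended `quip/gap/xyz` header, and `group_frames_by_formula` is a hypothetical helper.

```python
from collections import Counter, defaultdict

XYZ_TEXT = """\
2
frame 0
H 0.0 0.0 0.0
H 0.0 0.0 0.74
3
frame 1
O 0.0 0.0 0.0
H 0.0 0.76 0.59
H 0.0 -0.76 0.59
2
frame 2
H 0.0 0.0 0.0
H 0.0 0.0 0.80
"""


def group_frames_by_formula(text):
    """Map a formula such as 'H2O1' to the indices of the frames that have it."""
    lines = text.splitlines()
    groups = defaultdict(list)
    pos = 0
    frame_idx = 0
    while pos < len(lines):
        natoms = int(lines[pos])  # first line of a frame: atom count
        atom_lines = lines[pos + 2 : pos + 2 + natoms]  # skip the comment line
        counts = Counter(line.split()[0] for line in atom_lines)
        formula = "".join(f"{el}{counts[el]}" for el in sorted(counts))
        groups[formula].append(frame_idx)
        pos += 2 + natoms
        frame_idx += 1
    return dict(groups)


groups = group_frames_by_formula(XYZ_TEXT)
print(groups)
```

Each group of frames shares the same atom count and composition, which is the condition for frames to live in one system.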
11
11
12
12
The following commands relating to {class}`dpdata.MultiSystems` may be useful.
or let dpdata infer the format (`vasp/poscar`) of the file from the file name extension
22
+
19
23
```python
d_poscar = dpdata.System("my.POSCAR")
```
26
+
22
27
The number of atoms, atom types and coordinates are loaded from the `POSCAR` and stored in a data {class}`System <dpdata.System>` called `d_poscar`.
23
28
A data {class}`System <dpdata.System>` (a concept used by [deepmd-kit](https://github.com/deepmodeling/deepmd-kit)) contains frames that have the same number of atoms of the same type. The order of the atoms should be consistent among the frames in one {class}`System <dpdata.System>`.
24
29
Note that a `POSCAR` only contains one frame.
25
30
If the multiple frames stored in, for example, an `OUTCAR` are wanted,
31
+
26
32
```python
d_outcar = dpdata.LabeledSystem("OUTCAR")
```
35
+
29
36
The labels provided in the `OUTCAR`, i.e. energies, forces and virials (if any), are loaded by {class}`LabeledSystem <dpdata.LabeledSystem>`. It is noted that the forces of atoms are always assumed to exist. {class}`LabeledSystem <dpdata.LabeledSystem>` is a derived class of {class}`System <dpdata.System>`.
30
37
31
38
The {class}`System <dpdata.System>` or {class}`LabeledSystem <dpdata.LabeledSystem>` can be constructed from the [supported file formats](../formats.rst) with the `format key` in the table passed to argument `fmt`.
32
39
33
-
34
-
35
40
### Access data
41
+
36
42
These properties stored in {class}`System <dpdata.System>` and {class}`LabeledSystem <dpdata.LabeledSystem>` can be accessed by operator `[]` with the key of the property supplied, for example
43
+
37
44
```python
coords = d_outcar["coords"]
```
Available properties are (nframes: number of frames in the system, natoms: total number of atoms in the system)

| key | type | dimension | are labels | description |
| --- | --- | --- | --- | --- |
| 'atom_names' | list of str | ntypes | False | The name of each atom type |
| 'atom_numbs' | list of int | ntypes | False | The number of atoms of each atom type |
| 'atom_types' | np.ndarray | natoms | False | Array assigning type to each atom |
| 'cells' | np.ndarray | nframes x 3 x 3 | False | The cell tensor of each frame |
| 'coords' | np.ndarray | nframes x natoms x 3 | False | The atom coordinates |
| 'forces' | np.ndarray | nframes x natoms x 3 | True | The atom forces |
| 'virials' | np.ndarray | nframes x 3 x 3 | True | The virial tensor of each frame |
53
60
54
61
### Dump data
62
+
55
63
The data stored in {class}`System <dpdata.System>` or {class}`LabeledSystem <dpdata.LabeledSystem>` can be dumped in 'lammps/lmp' or 'vasp/poscar' format, for example:
tuple(1,2,3) means don't copy atom configuration in x direction, make 2 copys in y direction, make 3 copys in z direction.
92
110
111
+
tuple(1,2,3) means: do not copy the atom configuration in the x direction, make 2 copies in the y direction, and make 3 copies in the z direction.
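The meaning of the tuple can be illustrated with a small pure-Python sketch that tiles one frame along its cell vectors. This only illustrates the idea and is not dpdata's `replicate` implementation; `replicate_frame` is a hypothetical helper.

```python
from itertools import product


def replicate_frame(cell, coords, ncopy):
    """Tile `coords` ncopy[i] times along cell vector i.

    `cell` is three lattice vectors, `coords` a list of Cartesian positions.
    Returns the enlarged cell and the replicated coordinates.
    """
    new_coords = []
    for shift in product(*(range(n) for n in ncopy)):
        # Translation for this image: sum of shift[i] * cell[i].
        offset = [sum(shift[i] * cell[i][k] for i in range(3)) for k in range(3)]
        for atom in coords:
            new_coords.append([atom[k] + offset[k] for k in range(3)])
    new_cell = [[ncopy[i] * cell[i][k] for k in range(3)] for i in range(3)]
    return new_cell, new_coords


cell = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
coords = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
new_cell, new_coords = replicate_frame(cell, coords, (1, 2, 3))
print(len(new_coords), new_cell)
```

With two atoms and `(1, 2, 3)`, the result holds 1 x 2 x 3 = 6 images of the original frame.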
93
112
94
113
### perturb
114
+
95
115
In the following example, each frame of the original system (`dpdata.System('./POSCAR')`) is perturbed to generate three new frames. For each frame, the cell is perturbed by 5% and the atom positions are perturbed by 0.6 Angstrom. `atom_pert_style` indicates that the perturbation to the atom positions follows a normal distribution. Other available options for `atom_pert_style` are `uniform` (uniform in a ball), and `const` (uniform on a sphere).
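The three `atom_pert_style` options differ in how the displacement vector is drawn. The sketch below mimics them with the standard `random` module. It is an approximation for illustration, not dpdata's code: the scaling of the normal case in particular is an assumption, and `random_displacement` is a hypothetical helper.

```python
import math
import random


def random_unit_vector(rng):
    """Uniformly distributed direction on the unit sphere."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 0.0:
            return [x / norm for x in v]


def random_displacement(distance, style, rng):
    """Draw one displacement vector for the given perturbation style."""
    if style == "normal":
        # Assumption: independent Gaussian components scaled so that the
        # root-mean-square length equals `distance`.
        return [rng.gauss(0.0, distance / math.sqrt(3)) for _ in range(3)]
    direction = random_unit_vector(rng)
    if style == "uniform":
        # Uniform in a ball: radius distributed as distance * u^(1/3).
        radius = distance * rng.random() ** (1.0 / 3.0)
    elif style == "const":
        # Uniform on a sphere: fixed radius.
        radius = distance
    else:
        raise ValueError(f"unknown style: {style}")
    return [radius * x for x in direction]


rng = random.Random(0)
for style in ("normal", "uniform", "const"):
    d = random_displacement(0.6, style, rng)
    print(style, round(math.sqrt(sum(x * x for x in d)), 3))
```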