Editor: 我不是阿L | 2019-07-02
2011 ChemAxon UGM, San Diego, USA, 28th September 2011

Overview
• Chemical Pattern Matching
• Efficient Single Pattern Matching
• Multiple Pattern Matching
• Toolkit Code Generation
• Performance Figures
• Conclusions

Previous work
• Efficient Protein and Nucleic Acid Perception from Simple Atomic Connectivity
  www.daylight.com/meetings/mug96/sayle/sayle.html
  Describes algorithms for perceiving the protein sequence and PDB atom names from a SMILES, MDL or XYZ file of a protein.
• 1st Class SMARTS Patterns
  www.daylight.com/meetings/emug97/Sayle/
  Describes SMARTS syntax and SMARTS algebra, a set of semantics-preserving transformations that can be used to optimize SMARTS patterns.

Chemical pattern matching
• The identification of a specific subgraph within a graph, also known as subgraph isomorphism.
• Typically used to identify a functional group or substructure in a molecule's connection table.
• Query patterns are typically specified as SMARTS, MDL query files, CDX or Marvin files.
• Matching is performed using:
  - Ullmann's subgraph isomorphism algorithm [1976]
  - McGregor's backtracking search [1981]

Chemical database searching
• Although a backtracking atom-by-atom match is very efficient for matching a single pattern against a single molecule, well-known optimizations exist for scanning a large database of target molecules:
  - Fingerprint screening/inverted indices
  - Character frequency (histogram) screening
  - Triage substructure identification
  http://www.daylight.com/meetings/emug00/Sayle/substruct.html

Toolkit (SMARTS?) performance
• Time taken to find O=[C,N]aa[N,O;!H0] hits in 250,251 SMILES of the NCI August 2000 data.
• Most time is typically spent on molecule I/O.

  Toolkit                 Time (s)
  ChemAxon JChem v5.5         58.8
  RDKit v2011_03_2           131.2
  OpenBabel v2.3.0           272.5
  PerlMol                   2107.9
  CDK v1.2.10                  DNF
  (DNF = did not finish)

Cheminformatics applications
• Compound Filtering
• Fingerprint generation
  - Database clustering
• Atom Typing
  - Property prediction

Filtering radioactive compounds
• A molecule is radioactive if any of its atoms are radioactive. An atom is radioactive if it is not stable.
• If an isotope is specified, it must be one of the
  255 known stable nuclides; otherwise, the corresponding element must have at least one stable isotope.
• Every element from H (Z=1) to Pb (Z=82) has at least one stable isotope, with the exceptions of Tc (Z=43) and Pm (Z=61).
• Hence, stable is [0#1,1#1,2#1,0#2,3#2,4#2…], where an isotope of 0 matches an atom with no isotope specified.
• Hence, by De Morgan's laws, radioactive is [!0,!#1;!1,!#1;!2,!#1;…]: each isotope-and-element term of the stable pattern becomes an OR (',') of negations, and these terms are joined by the low-precedence AND (';').

Radioactive SMARTS
[!0,!#1;!1,!#1;!2,!#1;
!0,!#2;!3,!#2;!4,!#2;
!0,!#3;!6,!#3;!7,!#3;
!0,!#4;!9,!#4;
!0,!#5;!10,!#5;!11,!#5;
!0,!#6;!12,!#6;!13,!#6;
!0,!#7;!14,!#7;!15,!#7;
!0,!#8;!16,!#8;!17,!#8;!18,!#8;
!0,!#9;!19,!#9;
!0,!#10;!20,!#10;!21,!#10;!22,!#10;
!0,!#11;!23,!#11;
!0,!#12;!24,!#12;!25,!#12;!26,!#12;
!0,!#13;!27,!#13;
!0,!#14;!28,!#14;!29,!#14;!30,!#14;
!0,!#15;!31,!#15;
!0,!#16;!32,!#16;!33,!#16;!34,!#16;!36,!#16;
!0,!#17;!35,!#17;!37,!#17;
!0,!#18;!36,!#18;!38,!#18;!40,!#18;
!0,!#19;!39,!#19;!41,!#19;
!0,!#20;!40,!#20;!42,!#20;!43,!#20;!44,!#20;!46,!#20;
!0,!#21;!45,!#21;
!0,!#22;!46,!#22;!47,!#22;!48,!#22;!49,!#22;!50,!#22;
!0,!#23;!51,!#23;
!0,!#24;!50,!#24;!52,!#24;!53,!#24;!54,!#24;
!0,!#25;!55,!#25;
!0,!#26;!54,!#26;!56,!#26;!57,!#26;!58,!#26;
!0,!#27;!59,!#27;
!0,!#28;