Edited by: 我不是阿L, 2019-07-02
Efficient Simultaneous Matching of Multiple SMARTS Using the ChemAxon Toolkits
Roger Sayle
NextMove Software Ltd, Cambridge, UK

2011 ChemAxon UGM, San Diego, USA, 28th September 2011

Overview
• Chemical Pattern Matching
• Efficient Single Pattern Matching
• Multiple Pattern Matching
• Toolkit Code Generation
• Performance Figures
• Conclusions

Previous work
• Efficient Protein and Nucleic Acid Perception from Simple Atomic Connectivity
  www.daylight.com/meetings/mug96/sayle/sayle.html
  Describes algorithms for perceiving protein sequence and PDB atom names from a SMILES, MDL or XYZ file of a protein.
• 1st Class SMARTS Patterns
  www.daylight.com/meetings/emug97/Sayle/
  Describes SMARTS syntax and SMARTS algebra, a set of semantics-preserving transformations that can be used to optimize SMARTS patterns.

Chemical pattern matching
• The identification of a specific subgraph within a graph, also known as subgraph isomorphism.
• Typically used to identify a functional group or substructure in a molecule connection table.
• Query patterns are typically specified as SMARTS, MDL query files, CDX or Marvin files.
• Matching is performed using
  - Ullmann's isomorphism algorithm [1970]
  - McGregor's backtracking search [1981]

Chemical database searching
• Although a backtracking atom-by-atom match is very efficient for matching a single pattern against a single molecule, well-known optimizations exist for scanning a large database of target molecules.
  - Fingerprint screening/inverted indices
  - Character frequency (histogram) screening
  - Triage substructure identification
  http://www.daylight.com/meetings/emug00/Sayle/substruct.html

Toolkit (SMARTS) performance
• Time taken to find O=[C,N]aa[N,O;!H0] hits in 250,251 SMILES of the NCI August 2000 data.
• Most time is typically spent on molecule I/O.

  Toolkit                Time (secs)
  ChemAxon JChem v5.5    58.8
  RDKit v2011_03_2       131.2
  OpenBabel v2.3.0       272.5
  PerlMol                2107.9
  CDK v1.2.10            DNF

Cheminformatics applications
• Compound Filtering
• Fingerprint generation
  - Database clustering
• Atom Typing
  - Property prediction

Filtering radioactive compounds
• A molecule is radioactive if any of its atoms are radioactive. An atom is radioactive if it is not stable.
• If an isotope is specified it must be one of the 255 known stable nuclides, otherwise the corresponding element must have at least one stable isotope.
• Elements H to 82Pb, with the exceptions of 43Tc and 61Pm.
• Hence stable is [0#1,1#1,2#1,0#2,3#2,4#2…].
• Hence, radioactive is [!0,!#1;!1,!#1;!2,!#1;…].

Radioactive SMARTS
[!0,!#1;!1,!#1;!2,!#1;!0,!#2;!3,!#2;!4,!#2;!0,!#3;!6,!#3;!7,!#3;!0,!#4;!9,!#4;!0,!#5;!10,!#5;!11,!#5;!0,!#6;!12,!#6;!13,!#6;!0,!#7;!14,!#7;!15,!#7;!0,!#8;!16,!#8;!17,!#8;!18,!#8;!0,!#9;!19,!#9;!0,!#10;!20,!#10;!21,!#10;!22,!#10;!0,!#11;!23,!#11;!0,!#12;!24,!#12;!25,!#12;!26,!#12;!0,!#13;!27,!#13;!0,!#14;!28,!#14;!29,!#14;!30,!#14;!0,!#15;!31,!#15;!0,!#16;!32,!#16;!33,!#16;!34,!#16;!36,!#16;!0,!#17;!35,!#17;!37,!#17;!0,!#18;!36,!#18;!38,!#18;!40,!#18;!0,!#19;!39,!#19;!41,!#19;!0,!#20;!40,!#20;!42,!#20;!43,!#20;!44,!#20;!46,!#20;!0,!#21;!47,!#21;!0,!#22;!46,!#22;!47,!#22;!48,!#22;!49,!#22;!50,!#22;!0,!#23;!51,!#23;!0,!#24;!50,!#24;!52,!#24;!53,!#24;!54,!#24;!0,!#25;!55,!#25;!0,!#26;!54,!#26;!56,!#26;!57,!#26;!58,!#26;!0,!#27;!59,!#27;!0,!#28;
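The step from the "stable" expression to the "radioactive" one is an application of De Morgan's law: the negation of an OR of (isotope AND element) terms is an AND of (NOT isotope OR NOT element) terms. The following is a minimal Python sketch of that construction, not the code used in the talk. The `STABLE` table is a hand-typed illustrative subset covering only H to Ne, and all function names are hypothetical. Atoms are represented as plain `(atomic_number, mass_number)` pairs, with mass 0 meaning "no isotope specified", so no toolkit is required.

```python
# Illustrative subset of stable nuclides (H to Ne only), keyed by atomic number.
# This is an assumption for the example, not the full table of 255 nuclides.
STABLE = {
    1: [1, 2],
    2: [3, 4],
    3: [6, 7],
    4: [9],
    5: [10, 11],
    6: [12, 13],
    7: [14, 15],
    8: [16, 17, 18],
    9: [19],
    10: [20, 21, 22],
}


def stable_terms(table):
    """Yield (mass, element) pairs; mass 0 stands for an unspecified isotope."""
    for element in sorted(table):
        yield 0, element
        for mass in sorted(table[element]):
            yield mass, element


def stable_smarts(table):
    """OR (',') of isotope-and-element terms, e.g. [0#1,1#1,2#1,...]."""
    return "[" + ",".join(f"{m}#{e}" for m, e in stable_terms(table)) + "]"


def radioactive_smarts(table):
    """De Morgan negation: low-precedence AND (';') of '!mass,!#element' terms."""
    return "[" + ";".join(f"!{m},!#{e}" for m, e in stable_terms(table)) + "]"


def atom_is_radioactive(element, mass, table):
    """Evaluate the radioactive expression directly for a single atom."""
    return all(mass != m or element != e for m, e in stable_terms(table))


def molecule_is_radioactive(atoms, table):
    """A molecule is radioactive if any of its atoms is radioactive."""
    return any(atom_is_radioactive(e, m, table) for e, m in atoms)


if __name__ == "__main__":
    print(stable_smarts(STABLE))
    print(radioactive_smarts(STABLE))
    # Carbon-14 in an otherwise unlabelled C/O fragment.
    print(molecule_is_radioactive([(6, 14), (8, 0)], STABLE))
```

Because the table here stops at Ne, any heavier element is reported as radioactive; that is an artifact of the truncated example data, not of the method.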

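For reference, the backtracking atom-by-atom match mentioned under "Chemical pattern matching" can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the implementation of any of the toolkits named in the talk. Atoms are compared by element symbol only, bonds are unlabelled, there is no SMARTS parsing, and query atoms are tried in index order. The function name and data layout are assumptions made for the example.

```python
def find_match(q_atoms, q_bonds, t_atoms, t_bonds):
    """Return one mapping {query index: target index}, or None if no match.

    q_atoms / t_atoms: lists of element symbols.
    q_bonds / t_bonds: sets of frozenset({i, j}) atom-index pairs.
    """
    mapping = {}   # query atom -> target atom chosen so far
    used = set()   # target atoms already assigned

    def compatible(q, t):
        if q_atoms[q] != t_atoms[t]:
            return False
        # Every query bond to an already-mapped atom must exist in the target.
        for p, s in mapping.items():
            if frozenset((p, q)) in q_bonds and frozenset((s, t)) not in t_bonds:
                return False
        return True

    def extend(q):
        if q == len(q_atoms):
            return True  # all query atoms mapped
        for t in range(len(t_atoms)):
            if t not in used and compatible(q, t):
                mapping[q] = t
                used.add(t)
                if extend(q + 1):
                    return True
                # Dead end: undo the assignment and try the next target atom.
                del mapping[q]
                used.discard(t)
        return False

    return dict(mapping) if extend(0) else None


def bonds(pairs):
    """Helper: build an undirected bond set from (i, j) pairs."""
    return {frozenset(p) for p in pairs}


if __name__ == "__main__":
    # Query C-O against the heavy atoms of ethanol (C-C-O).
    ethanol_atoms = ["C", "C", "O"]
    ethanol_bonds = bonds([(0, 1), (1, 2)])
    print(find_match(["C", "O"], bonds([(0, 1)]), ethanol_atoms, ethanol_bonds))
```

In the ethanol example the first carbon is tried and rejected because it is not bonded to the oxygen, so the search backtracks and maps the query carbon to the second carbon instead.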