List of chemical databases

This is a list of websites that contain lists of chemicals, or databases of chemical information. There is further detail on the content of these and other resources in a Wikibook of information sources.

abbreviation full name operator selects contains id prefix quality link entries
ACToR Environmental Protection Agency toxicology information; occurrence "ACToR". 893,280
AtomWork Inorganic Material Database National Institute for Materials Science crystal structures "AtomWork". 82,000
Beilstein Beilstein database Elsevier organic compounds properties closed access
BIAdb Benzylisoquinoline Alkaloid Database "BIAdb". 846
BindingDB The Binding Database Skaggs School of Pharmacy and Pharmaceutical Sciences at the University of California, San Diego noncovalent association of molecules in solution ChEMBL SMILES InChIKey targets "BindingDB".
BindingMOAD Binding Mother of All Databases protein ligand structures "BindingMOAD". 36047
BMDB Bovine Metabolome Database Collaborative Drug Discovery BMDB manually selected and checked "BMDB". 7859
BMRB Biological Magnetic Resonance Data Bank University of Wisconsin biological molecules including ligands, cofactors, peptides, saccharides NMR spectroscopy "BMRB".
BRENDA Technical University of Braunschweig enzymes ligands "BRENDA".
Carotenoids Database carotenoids CA "Carotenoids". 1195
CCCBDB Computational Chemistry Comparison and Benchmark DataBase National Institute of Standards and Technology gas phase molecules "CCCBDB". 2069
CCRIS Chemical Carcinogenesis Research Information System National Library of Medicine substances that affect tumors CCRIS from primary literature, reviewed by experts "CCRIS subset of PubChem". 9562[1][2]
CDD Public drug candidates limited access 3,000,000
ChEBI Chemical Entities of Biological Interest ELIXIR small chemical compounds from PDBeChem ChEMBL KEGG IntEnz "ChEBI". 60,000
Chematica Merck organic chemicals reaction pathway calculation; Beilstein CAS SMILES proprietary 7,000,000
ChEMBL Chemicals from European Molecular Biology Laboratory EMBL molecules with drug-like properties "ChEMBL". 1,961,000
cheML.io Departments of Computer Science and Chemistry at Nazarbayev University de novo molecules generated by ML models SMILES, computed properties artificially generated "cheML.io".[3] 2,800,000
ChemDB chemical database small molecules "ChemDB". 5,000,000
ChemExper Chemexper Chemical Directory catalogue chemicals CASno Structure SMILES "ChemExper".
Chemical Book East West University commercially available compounds CASno, suppliers, properties "Chemical Book". 200,000
Chemical Register from 20,000 vendors CASno mainly from larger-scale suppliers "Chemical Register". 1,750,000
ChemIDplus National Library of Medicine other NLM databases; regulated substances CASNo UNII structure CMNPD https://chem.nlm.nih.gov/chemidplus/chemidlite.jsp 400,000
ChemSpider Royal Society of Chemistry from 275 data sources "ChemSpider". 88,000,000
ChemIndex chemical database substances CAS Search; suppliers "Chemindex".
CMNPD Comprehensive Marine Natural Products Database Peking University from literature and other databases structural classification; species CMNPD curated https://www.cmnpd.org/ 31,561
COD Crystallography Open Database Vilnius University small molecules (open source) crystal structure atomic coordinates COD curated "COD". 478,715
Common Chemistry American Chemical Society structure CAS SMILES InChI https://commonchemistry.cas.org/[4] ~500,000
Compendium of Pesticide Common Names British Crop Production Council Pesticides with ISO common names structure, CASNo, IUPAC name, SMILES, InChI curated "Compendium of Pesticide Common Names". 1,800
CompTox CompTox Chemicals Dashboard US Environmental Protection Agency chemicals evaluated for potential health risks "CompTox".
CosIng Cosmetic Ingredients European Commission cosmetic ingredients "CosIng".
CrystalWorks Science and Technology Facilities Council "CrystalWorks".
CSD Cambridge Structural Database Cambridge Crystallographic Data Centre "CSD". 1,038,250
CSDB Carbohydrate Structure Database Zelinsky Institute of Organic Chemistry carbohydrates structures references CSDB ID "CSDB".
CTD Comparative Toxicogenomics Database Department of Biological Sciences at North Carolina State University MeSH CASNo ChEBI PubChem genes, pathways "CTD".
DDB Dortmund Data Bank pure compounds, mixtures, gas hydrates physical properties "DDB".
Dissociation Constants IUPAC Digitized pKa Dataset IUPAC dissociation constants "Dissociation Constants". GitHub.
DETHERM DECHEMA thermophysical properties "DETHERM". 75,000
DrugBank University of Alberta drugs "DrugBank".
DrugCentral University of New Mexico pharmaceuticals products containing substance "DrugCentral".
DTP/NCI DTP Open Compound collection National Cancer Institute Developmental Therapeutics Program Cancer therapeutics Cancer Chemotherapy National Service Center number "DTP/NCI". 250,000
ECHA REACH database European Chemicals Agency EINECS ELINCS NLP CASNo HPhrases pictograms tonnage "ECHA/REACH". 245,000
EAWAG-BBD Biocatalysis/Biodegradation Database Eawag: Swiss Federal Institute of Aquatic Science and Technology CAS SMILES PubChem pathways "EAWAG-BBD". 1396
eMolecules drug screening chemicals list of suppliers and catalog numbers "eMolecules". 8,000,000[5]
ENCS Japanese Existing and New Chemical Substances Inventory regulated chemicals "ENCS (in Japanese)".
Evaluated Kinetic Data IUPAC rate constants curated "Evaluated Kinetic Data".
FDA SRS Food and Drug Administration Substance Registration System U.S. National Library of Medicine ingredients in FDA regulated products UNII InChIKey "FDA SRS". 781,000
FEMA Flavor Ingredient Library Flavor and Extract Manufacturers Association CAS CFR FEMA number "FEMA".
FooDB Food Database University of Alberta Food components and additives "FooDB". 70926
GlyTouCan international glycan structure repository Ministry of Education, Culture, Sports, Science & Technology glycans WURCS GlycoCT PubChem CID G "Glycan Repository". 122194
Gmelin Gmelin database Elsevier inorganic and organometallic compounds closed access 1,500,000
G-SRS Global Substance Registration System CAS PubChem ChEMBL INN UNII "G-SRS". 109,260
GMD Golm Metabolome Database GC/MS of metabolites "GMD".
Guide to PHARMACOLOGY IUPHAR drugs and targets INN CAS ChEBI ChEMBL DrugBank PubChem "Guide to PHARMACOLOGY".
Henry's law constants Max Planck Institute for Chemistry volatile compounds Henry's law constants from literature "Henry's law constants". 46434
HMDB Human Metabolome Database Genome Canada metabolites found in the human body biochemical data, clinical data HMDB "HMDB". 114,222[6]
HugeMDB Huge Molecular Database Elegant Mathematics LLC small molecules (most entries have <100 atoms) major conformers with 3D structures, easily searchable M correlates well with PubChem for data available in PubChem "HugeMDB". 102 million
ICSC ILO International Chemical Safety Cards International Labour Organization CAS, EC number, UNnumber "ICSC". 1784
ICSD Inorganic Crystal Structure Database FIZ Karlsruhe GmbH "ICSD". 161,030
IEDB Immune Epitope Database National Institute of Allergy and Infectious Diseases Epitopes mainly peptides and carbohydrates "IEDB". 3,002 non-peptides
IUPAC-NIST Solubility Database https://srdata.nist.gov/solubility/index.aspx
JECDB Japan Existing Chemical Database CAS EINECS RTECS SDBS TSCA graph of number of articles per year "JECDB".
J-GLOBAL Nikkaji Japan Science and Technology Agency "J-GLOBAL".
KEGG Kyoto Encyclopedia of Genes and Genomes Kyoto University Bioinformatics Center Compounds Glycans (also enzymes, reactions, pathways) CAS ChEBI ChEMBL MASSBANK NIKKAJI PubChem PDB-CCD "KEGG".
Ki Database PDSP ligand binding "Ki Database".
KNApSAcK Nara Institute of Science and Technology InChI CAS SMILES organisms C00 "KNApSAcK".
LINCS Library of Integrated Network-based Cellular Signatures small molecules PubChem ChEMBL SMILES InChI LSM "LINCS". 43,700
LipidBank Japanese Conference on the Biochemistry of Lipids lipids "LipidBank". 7,009
LMSD LIPID MAPS Structure Database Lipids HMDB ChEBI PubChem InChI LMFA "LMSD". 44701
LOLI List of Lists safety data sheets, regulation "LOLI".
Mcule supplied chemicals InChI, SMILES, SDF, physicochemical properties "Mcule". 45,000,000
MediaDB Institute for Systems Biology growth media "MediaDB". 288
Merck Index Royal Society of Chemistry drugs "Merck-Index". 11,500
MeSH Medical Subject Headings US National Library of Medicine biomedical thesaurus hierarchy of descriptors to literature with MeSH ID "MeSH".
MetaCyc SRI International metabolic pathways; metabolites "MetaCyc".
MetaboLights EMBL-EBI MTBL "MetaboLights".
MetaNetX SIB Swiss Institute of Bioinformatics metabolic networks, metabolites, biochemical reactions, cellular compartments metabolic models, SBML, InChI, InChIKey, SMILES MNXM unified namespace for metabolites and biochemical reactions in the context of metabolic models "MetaNetX". 240 metabolic models, 1292154 metabolites, 74613 reactions, 44 compartments
METLIN Metabolite and Chemical Entity Database tandem mass spectrometry of metabolites "METLIN". 960,000
MINAS Metal Ions in Nucleic AcidS University of Zurich https://www.minas.uzh.ch/
ModelSeed KEGG MetaCyc metabolic pathways CPD "ModelSeed".
MolPort catalog chemicals "MolPort".
MoNA MassBank of North America mass spectra SPLASH KEGG ChemSpider PubChem ChEBI CAS "MoNA". 200,000
npatlas The Natural Products Atlas Simon Fraser University microbial and fungal products smiles, organism NPA npatlas[7] 33434
NIOSH pocket guide NIOSH Pocket Guide to Chemical Hazards National Institute for Occupational Safety and Health commonly used chemicals exposure limits "NIOSH". 2 August 2024. 677
NIST Webbook NIST Chemistry Webbook National Institute of Standards and Technology spectra CAS ionization energy mass spectrum, InChI C+CAS "NIST Webbook".
NMRShiftDB University of Cologne organic nuclear magnetic resonance spectra "NMRShiftDB". 43,581
NORMAN SLE NORMAN Suspect List Exchange environmental monitoring "NORMAN SLE". 110,000
OMG Open Macromolecular Genome Jackson group at University of Illinois at Urbana-Champaign synthetically accessible linear homopolymers SMILES of linear homopolymers GitHub / Zenodo 12,886,131
ORD Open Reaction Database ORD consortium Organic reactions machine-readable reaction schemes "ORD"[8] 2,000,000
OrgSyn Organic Syntheses Organic Syntheses, Inc. Reliable chemical reactions Searchable experimental procedures Peer reviewed "OrgSyn search".
PDB PDBe Protein Data Bank in Europe EMBL-EBI has some chemicals as well as proteins "PDBe".
PATENTSCOPE WIPO "PATENTSCOPE". 16,000,000
PDB RCSB Protein Data Bank "PDB". 166,891
PharmGKB Shriram Center for Bioengineering and Chemical Engineering drugs targets prescribing info curated "PharmGKB".
PHAROS Illuminating the Druggable Genome National Institutes of Health drug ligands; targets[9] https://pharos.nih.gov/ 355932 ligands, 20412 targets
Phenol-Explorer polyphenols found in food "Phenol-Explorer". 500
Phosida PHOsphorylation SIte DAtabase protein modifications "Phosida".
PoLyInfo Polymer Database National Institute for Materials Science physical properties "PoLyInfo". 26,000
PPDB Pesticide Properties Database Agriculture & Environment Research Unit, University of Hertfordshire Pesticides and their metabolites Chemical structure, physicochemical properties, human health and ecotoxicological data curated "PPDB". 2000[10]
Probes and Drugs
ProCarDB Prokaryotic Bacterial Carotenoid DataBase IMTECH spectra references "ProCarDB". 1800
PubChem National Library of Medicine National Center for Biotechnology Information from 748 data sources Structures, Names and Identifiers, Chemical and Physical Properties, Spectral Information, Related Records, Chemical Vendors, Pharmacology and Biochemistry, Use and Manufacturing, Safety and Hazards, Toxicity, Literature, Patents, Biomolecular Interactions and Pathways, Biological Test Results "PubChem". 103,000,000
Reaxys Elsevier chemical compounds Searchable chemical reactions "About Reaxys". 118,000,000
Ref-DB Re-referenced Protein Chemical shift Database proteins from BioMagResBank Re-referenced NMR shift "Ref-DB". 2162
Rhea Swiss Institute of Bioinformatics biochemical reactions ChEBI curated "Rhea".
RÖMPP Thieme Gruppe "RÖMPP".
RTECS Registry of Toxic Effects of Chemical Substances Dassault Systèmes Toxicity, Literature "Biovia-RTECS". 8 September 2023. 160,000
RxNav U.S. National Library of Medicine drugs interactions "RxNav".
SaguaroChem De Novo Chem Chemical reactions from the patent literature Chemical reaction SMILES, annotated procedures, characterization data, reference metadata Curated from patent literature "SaguaroChem". 4 July 2024. 2,091,105
SciFinder Chemical Abstracts Service of American Chemical Society organic, inorganic chemicals, proteins CASNo paid access only 130,000,000
ScrubChem scraped from PubChem "ScrubChem". 2,282,992
SDBS Spectral Database for Organic Compounds National Institute of Advanced Industrial Science and Technology (AIST), Japan Organic compounds Spectra: IR Raman MASS ESR 1H NMR 13C NMR SDBS No curated "SDBS". 34,000
Serum Metabolome Database The Metabolomics Innovation Centre found in blood serum "Serum Metabolome DB". 4,651
Solvent Selection Tool ACS Green Chemistry Institute Solvents Principal components analysis of physical properties curated "Solvent Selection Tool". 272[11]
SPRESIweb InfoChem Gesellschaft für chemische Information mbH organic molecules and reactions organic structures from literature "SPRESI". 5,800,000
SpringerMaterials Springer solid materials CAS InChI physical properties from literature "SpringerMaterials". 155,165 + 494,942
STITCH EMBL from Biocarta, BioCyc, GO, KEGG, and Reactome Chemical-Protein Interactions curated and predicted "STITCH". 500,000
SuperDRUG2 Structural Bioinformatics Group drugs targets targets, dose, side effects, Canonical SMILES, Standard InChI, Standard InChIKey, DrugBank, ChEMBL, DrugCentral, KEGG, PubChem, CASRN SD "SuperDRUG2". 4,600
Super Natural II natural product chemicals SMILES vendors SN00 "Super Natural II". 325,508
SureChEMBL European Molecular Biology Laboratory substances in patents patent text "SureChEMBL".
SwissLipids Swiss Institute of Bioinformatics lipids SLM: "SwissLipids".
TDR Targets Tropical Disease Research Trypanosomatics Laboratory drugs and targets "TDR Targets". 2,000,000
TTD Therapeutic Targets Database Zhejiang University drugs and targets SMILES InChI CAS PubChem "TTD". 37,316
T3DB Toxin and Toxin-Target Database / Toxic Exposome Database University of Alberta toxins and toxin targets T3D "T3DB". 3,678
UniChem EMBL-EBI pointers to existing chemicals; indexes 41 databases[12] Structure; StdInChI; links to databases automated loads "Compound Sources Search". >2,000,000
UniProt UniProt Knowledgebase proteins sequence, modifications, location, organism, similar "UniProt".
US DOT US Department of Transportation Emergency Response Guidebook DOT + others bulk transported chemicals UNnumber United Nations ID number, hazard response guide "Emergency response guidebook" (PDF). 3000
UV/VIS Spectral Atlas The MPI-Mainz UV/VIS spectral atlas of gaseous molecules of atmospheric interest Max Planck Institute for Chemistry gaseous molecules absorption cross sections from literature "UV/VIS Spectral Atlas". 7313
YMDB Yeast Metabolome Database The Metabolomics Innovation Centre metabolites of yeast 48 data fields YMDB "YMDB". 16042
ZINC ZINC is not commercial University of California, San Francisco purchasable substances EPA DSS TOX, ChEMBL, HMDB, KEGG, PDB, SMILES "ZINC".[13] 37 × 10^9

References

  1. ^ "Chemical Carcinogenesis Research Information System (CCRIS) - PubChem Data Source". pubchem.ncbi.nlm.nih.gov. Retrieved 2020-08-07.
  2. ^ "Download CCRIS (Chemical Carcinogenesis Research Information System) Data". www.nlm.nih.gov. Retrieved 2020-08-07.
  3. ^ Zhumagambetov, Rustam; Kazbek, Daniyar; Shakipov, Mansur; Maksut, Daulet; Peshkov, Vsevolod A.; Fazli, Siamac (2020-12-17). "cheML.io: an online database of ML-generated molecules". RSC Advances. 10 (73): 45189–45198. Bibcode:2020RSCAd..1045189Z. doi:10.1039/D0RA07820D. ISSN 2046-2069. PMC 9058596. PMID 35516285.
  4. ^ Jacobs, Andrea; Williams, Dustin; Hickey, Katherine; Patrick, Nathan; Williams, Antony J.; Chalk, Stuart; McEwen, Leah; Willighagen, Egon; Walker, Martin; Bolton, Evan; Sinclair, Gabriel; Sanford, Adam (13 May 2022). "CAS Common Chemistry in 2021: Expanding Access to Trusted Chemical Information for the Scientific Community". Journal of Chemical Information and Modeling. 62 (11): 2737–2743. doi:10.1021/acs.jcim.2c00268. PMC 9199008. PMID 35559614.
  5. ^ "Vision - eMolecules". www.emolecules.com. Retrieved 2020-07-27.
  6. ^ "Human Metabolome Database: About the Human Metabolome Database". hmdb.ca. Retrieved 2020-07-27.
  7. ^ Van Santen, Jeffrey A.; Jacob, Grégoire; Singh, Amrit Leen; et al. (2019). "The Natural Products Atlas: An Open Access Knowledge Base for Microbial Natural Products Discovery". ACS Central Science. 5 (11): 1824–1833. doi:10.1021/acscentsci.9b00806. PMC 6891855. PMID 31807684.
  8. ^ Kearnes, Steven M.; Maser, Michael R.; Wleklinski, Michael; et al. (2021). "The Open Reaction Database". Journal of the American Chemical Society. 143 (45): 18820–18826. doi:10.1021/jacs.1c09820.
  9. ^ "Pharos: Illuminating the Druggable Genome". pharos.nih.gov. Retrieved 2024-10-02.
  10. ^ Lewis, Kathleen A.; Tzilivakis, John; Warner, Douglas J.; Green, Andrew (2016). "An international database for pesticide risk assessments and management". Human and Ecological Risk Assessment. 22 (4): 1050–1064. Bibcode:2016HERA...22.1050L. doi:10.1080/10807039.2015.1133242. hdl:2299/17565. S2CID 87599872.
  11. ^ Diorazio, Louis J.; Hose, David R. J.; Adlington, Neil K. (2016). "Toward a More Holistic Framework for Solvent Selection". Organic Process Research & Development. 20 (4): 760–773. doi:10.1021/acs.oprd.6b00015.
  12. ^ "UniChem". www.ebi.ac.uk. Retrieved 2024-10-02.
  13. ^ Tingle, Benjamin I.; Tang, Khanh G.; Castanon, Mar; Gutierrez, John J.; Khurelbaatar, Munkhzul; Dandarchuluun, Chinzorig; Moroz, Yurii S.; Irwin, John J. (2023). "ZINC-22─A Free Multi-Billion-Scale Database of Tangible Compounds for Ligand Discovery". Journal of Chemical Information and Modeling. 63 (4): 1166–1176. doi:10.1021/acs.jcim.2c01253. PMC 9976280. PMID 36790087.