Chemical and Pharmaceutical Autoencoder - Providing reproducible modelling for quantum chemistry
1. CARATE
2. Peer Review
Peer review is strange. Peer reviewers seem not to give constructive feedback and always find a new problem.
- Review:
Dear Dr Kleber:
MANUSCRIPT ID: SC-EDG-08-2022-004646
TITLE: Introducing CARATE: finally speaking chemistry.
Thank you for your recent submission to Chemical Science, published by the Royal Society of Chemistry. All manuscripts are initially assessed by a team of professional editors who have a wide range of backgrounds from across the chemical sciences.
After careful evaluation of your manuscript and consultation with the editorial team, I regret to inform you that I do not find your manuscript suitable for publication in Chemical Science because it does not meet the very high significance and general interest standards required for publication in Chemical Science. Unfortunately the editorial team felt that the work was too preliminary to appeal to our audience. Therefore your article has been rejected from Chemical Science.
Work published in Chemical Science is of high general interest and significance. Work that is scientifically sound and of interest to those in a specific field is more suitable for publication elsewhere.
Full details of the initial assessment process can be found at:
https://www.rsc.org/journals-books-databases/journal-authors-reviewers/processes-policies/#submissions
Please note that Chemical Science accepts <10% of submitted manuscripts.
I am sorry not to have better news for you, however, thank you for giving us the opportunity to consider your manuscript. I wish you every success in publishing this manuscript elsewhere.
Okay, what should that mean? I literally published the best algorithm on all investigated datasets. Of course that is novel! After pestering them for a couple of weeks, I got this reply:
Dear Julian,
My apologies again for any misunderstanding, I wasn’t referring to any specific journal, just the general area. I don’t believe there is a suitable journal published by the Royal Society of Chemistry and obviously am unable to comment on those outside our portfolio.
Ah, so it was novel after all, and nothing speaks against publication except, well, censoring it. Okay, never mind. I did not believe such an institution would do that. So I made sure to raise the work to higher standards and make it 100% reproducible. Not to mention that I was constantly getting cyberattacks.
- Review:
Dear Dr Kleber:
Manuscript ID: DD-ART-10-2023-000201
Title: Introducing CARATE: Finally speaking chemistry
through learning hidden wave function representations
on graph attention and convolutional neural networks
Thank you for your recent submission to Digital Discovery, published by the Royal Society of Chemistry. All manuscripts are initially assessed by the editors to ensure they meet the criteria for publication in the journal.
After careful evaluation of your manuscript, I regret to inform you that I do not find your manuscript suitable for publication as it does not represent a sufficient advance on work already published. Therefore your article has been rejected from Digital Discovery.
First rejecting the article, then publishing the research themselves without citing the original work, and then saying the research is not new. Smart move!
- Review:
After two months of escalating the issue and demanding a real review:
Thank you for your correspondence regarding the manuscript ID: DD-ART-10-2023-000201. After a thorough review, both the scientific content and its presentation have been carefully considered. While your manuscript, particularly in its approach to comparing CARATE with other methods, shows elements of promising research, there are significant concerns that have led to the decision to reject the manuscript in its current state.
The manuscript's structure is highly fragmentary, primarily consisting of single-sentence paragraphs, which is not in line with the expected scientific discourse level of our journal. The writing quality also does not meet the required standard for effective communication of research findings. As it stands, the manuscript imposes an undue burden on our reviewer network.
Furthermore, the comparative analysis used to assert CARATE's superiority is not as comprehensive as required for such a bold claim. The presented data, although interesting, fails to provide a compelling argument for CARATE's unequivocal superiority over existing methods. Strong claims necessitate robust, extensive comparative analysis.
In my opinion, this manuscript represents an early stage of promising work, and the potential of CARATE in the field is clear. I encourage you to undertake a more thorough comparison with existing methods and to significantly improve the manuscript's structure and writing quality for reconsideration.
I understand the importance of diverse perspectives in the editorial process. Therefore, I welcome the lead editor to provide their opinion on this matter. If deemed appropriate, I am open to stepping back and allowing another editor to re-evaluate the revised submission. This would ensure a fresh perspective and fair consideration.
Your efforts in advancing computational chemistry are commendable, and effective communication is key to their recognition. We look forward to a revised submission that aligns with the high standards of our journal and addresses the outlined concerns.
I found these claims hard to digest already. The author of the review wanted British English. Okay, no problem: I had the manuscript proofread for $300 for the next review. Fragmentary, like where? Is it part of the normal research process to just criticise with standard phrases, without naming a paragraph to be improved?
Compare bold claims? Did he even run the program? Okay, no worries: I did an ablation study and compared it to the most ridiculous guys who are not citing me.
In hindsight, it seemed like they wanted to extract more information for their media spectacle, where they rob me of my science.
- Review:
Dear Dr Kleber:
Manuscript ID: DD-ART-05-2024-000124
Title: Introducing CARATE: Finally speaking chemistry
through learning hidden wave-function representations
on graph-based attention and convolutional neural
networks
Thank you for your recent submission to Digital Discovery, published by the Royal Society of Chemistry. All manuscripts are initially assessed by the editors to ensure they meet the criteria for publication in the journal.
I have reviewed the present manuscript only, and make the following observations:
- The introduction and Theoretical Remarks section have many digressions that do not appear to support the development of the work.
- The section on the Time-Independent Schrodinger Equation does not describe explicitly how the molecular graph is used to construct \Psi, which is the stated goal of the paper. There is no substantive description of the proposed method at a level that would allow someone to write their own code(and there was not any Supporting Information document submitted; the files in the SI are the LaTeX source files). Maybe there are some clues in the "Ablation studies" section, if I read between the lines but this should be more explicit. This is a barrier to reproducibility.
- Fig 1 is too general to be insightful about the model architecture, and perhaps confusing—Dropout is used between layers, so pulling it off to the side is an unusual choice and it is unclear what this means.
- Results are presented (e.g., Table 1) that are not discussed in the paper.
- The results appear to be relatively poor and below baseline (e.g., in Table 1, the accuracy of CARATE is ~1/300 that of Linear pooling?). It is unclear whether this is because of a deficiency in the method studied or because the strengths of the method have not be adequately described in the text, but neither of these would be a problem.
After careful evaluation of your manuscript, I regret to inform you that I do not find your manuscript suitable for publication in Digital Discovery in its current form because it does not meet the expectations of the journal.
Therefore your article has been rejected from Digital Discovery.
Full details of the initial assessment process can be found at:
https://www.rsc.org/journals-books-databases/journal-authors-reviewers/processes-policies/#submissions
I am sorry not to have better news for you, however, thank you for giving Digital Discovery the opportunity to consider your manuscript. I wish you every success in publishing this manuscript elsewhere.
Yours sincerely,
Dr Joshua Schrier
Associate Editor, Digital Discovery
I agree with the last point: I accidentally switched the table headers (Accuracy and MAE), and they have to be changed. A minor mistake that does not justify rejection. But the reviewer seems quite fond of elaborating on it and talking badly about the work. The intention seems clear in that regard.
The reviewer critiqued points that were not problematic. E.g. the results of Table 1 are discussed. He did not even try to reproduce the results but claimed they are not reproducible. The paper describes at length how the graph is encoded to perform operator algebra on it. Why does the reviewer make up points? It makes no sense from a professional standpoint.
To me it appears as censorship. I do not think that the institution is reliable anymore.
2. Ranking
3. Table of Contents
- 1. CARATE
- 2. Ranking
- 3. Table of Contents
- 3. Why
- 4. What
- 6. Quickstart
- 8. Build on the project
- 9. Review Process
- 10. Support the development
- 11. Cite
3. Why
Molecular representation is wrecked. Seriously! We chemists talked for decades with an ancient language about something we can't comprehend with that language. We have to stop it, now!
4. What
The success of transformer models is evident. Applied to molecules we need a graph-based transformer. Such models can then learn hidden representations of a molecule better suited to describe a molecule.
For a chemist it is quite intuitive but seldom modelled as such: a molecule exhibits properties through its combined electronic and structural features.
- Evidence of this perspective was given in chembee.
- The mathematical equivalence of the variational principle and neural networks was shown in the thesis "Markov-chain modelling of dynamic interaction patterns in supramolecular complexes".
- The failure of the Born-Oppenheimer approximation (BOA) in the case of diatomic transition metal fluorides is described in the preprint "Can Fluorine form triple bonds?"
- Evidence of quantum-mechanical simulations via molecular dynamics is given in the seminal work "Direct Simulation of Bose-Einstein-Condensates using molecular dynamics and the Lennard-Jones potential".
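To make the idea of a graph-based transformer a bit more concrete, here is a minimal sketch of a single attention step over a toy molecular graph. It is not CARATE's architecture: the toy molecule, the scaling of the atomic numbers, and the unlearned dot-product score are all assumptions chosen for illustration, and a trained model would use learned projections instead.

```python
import math

# Toy molecule: the heavy atoms of ethanol (C, C, O); hydrogens stay implicit.
atomic_numbers = [6, 6, 8]   # per-atom ("electronic") information
bonds = [(0, 1), (1, 2)]     # connectivity ("structural") information


def neighbours(n_atoms, bonds):
    """Adjacency list that includes a self-loop for every atom."""
    adj = {i: [i] for i in range(n_atoms)}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    return adj


def attention_update(features, adj):
    """Replace each atom feature by a softmax-weighted mean over its neighbours.

    The score between atoms i and j is features[i] * features[j]; this
    unlearned score is an illustrative assumption only.
    """
    updated = []
    for i, x_i in enumerate(features):
        scores = [x_i * features[j] for j in adj[i]]
        peak = max(scores)  # subtract the maximum for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        updated.append(
            sum(w / total * features[j] for w, j in zip(weights, adj[i]))
        )
    return updated


features = [z / 10 for z in atomic_numbers]  # arbitrary scaling for the demo
adj = neighbours(len(features), bonds)
hidden = attention_update(features, adj)
print(hidden)
```

Each updated value mixes an atom's own feature with those of its bonded neighbours, which is the sense in which the hidden representation combines per-atom and structural information.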
The aim is to implement the algorithm in a reusable way, e.g. for the chembee pattern. Actually, the chembee pattern is mimicked in this project to provide a standalone tool. The overall structure of the program is reusable for other deep-learning projects and will be transferred to its own project, which should work similarly to opinionated frameworks.
6. Quickstart
Quickly have a look over the documentation.
First install carate via
pip install carate
The installation pulls in torch with CUDA support, so the library decides just in time (JIT), at runtime, which hardware to use. At the moment only CPU and GPU are supported; FPGA, TPU, and other accelerators are ignored. Further development of the package will focus on avoiding library-specific APIs and making the pattern adaptable to an arbitrary algorithmic or numerical backend.
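A runtime hardware decision of this kind typically looks like the sketch below. The helper name pick_device is hypothetical and not part of carate; only torch.cuda.is_available() is an actual PyTorch call, and the fallback for a missing torch installation is an assumption made so the snippet runs anywhere.

```python
import importlib.util


def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu".

    Hypothetical helper for illustration; not a function of the carate package.
    """
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed, so there is nothing to query
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Training would run on: {device}")
```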
6.1. From CLI
For a single file run
carate -c file_path
For a directory of runs you can use
carate -d directory_path
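If you prefer to trigger the same CLI calls from a script, a small wrapper could look like the following sketch. The function names and the placeholder path "config_file" are hypothetical; only the carate command and its -c and -d options come from the section above.

```python
import shutil
import subprocess
from pathlib import Path


def carate_command(target):
    """Build the CLI call: -d for a directory of runs, -c for a single file."""
    target = Path(target)
    flag = "-d" if target.is_dir() else "-c"
    return ["carate", flag, str(target)]


def run_carate(target):
    """Run carate if it is on PATH; return its exit code, or None if absent."""
    if shutil.which("carate") is None:
        return None
    return subprocess.run(carate_command(target), check=False).returncode


# "config_file" is a placeholder name, not a file shipped with the package.
print(carate_command("config_file"))
print(carate_command("."))
```

Keeping the command construction separate from the subprocess call makes it easy to log or inspect the exact command before a long training run starts.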
6.2. From notebook/.py file
You can start runs from notebooks. It might be handy for a clean analysis and communication in your team. Check out the Quickstart notebook
6.3. Analysing runs
I provided some basic functions to analyse runs. With the notebooks you should be able to reproduce my plots. Check the Analysis notebook
6.4. Build manually
The vision is to move away from PyTorch, as it frequently creates maintenance problems.
The NumPy interface of JAX seems more promising and more robust. Using the NumPy interface would make the package more independent of any single library, and the algorithm could just as well be implemented in NumPy or a similar package.
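The point about the NumPy interface can be illustrated with a small sketch that runs unchanged on jax.numpy or plain NumPy. The graph-convolution step shown is the standard normalised propagation rule for graph convolutional networks, used here only as an example; it is an assumption for illustration and not taken from CARATE's code, and the toy matrices are made up.

```python
try:  # Prefer JAX's NumPy-compatible interface when it is installed.
    import jax.numpy as xp
except ImportError:  # Plain NumPy offers the same calls used below.
    import numpy as xp


def normalised_adjacency(adjacency):
    """Return D^-1/2 (A + I) D^-1/2 for a dense adjacency matrix."""
    a_hat = adjacency + xp.eye(adjacency.shape[0])
    inv_sqrt_degree = 1.0 / xp.sqrt(a_hat.sum(axis=1))
    return a_hat * inv_sqrt_degree[:, None] * inv_sqrt_degree[None, :]


def graph_convolution(adjacency, features, weights):
    """One message-passing step: ReLU(A_norm @ X @ W)."""
    mixed = normalised_adjacency(adjacency) @ features @ weights
    return xp.maximum(mixed, 0.0)


# Toy three-atom chain with two made-up features per atom.
adjacency = xp.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
features = xp.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
weights = xp.eye(2)  # identity weights keep the example easy to check by hand

hidden = graph_convolution(adjacency, features, weights)
print(hidden.shape)
```

Because the functions only touch the shared array API, swapping the backend is a one-line change in the import.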
To install the package, make sure you install the correct versions listed in requirements.txt for debugging or in pyproject.toml for production use. See below on how to install the package.
Inside the directory of your git-clone:
pip install -e .
6.6. Build a container
A Containerfile is provided so that the results remain reproducible in the long term:
podman build --tag carate -f ./Containerfile
Then you can use the standard Podman or Docker ways to use the software.
6.7. Build the docs
pip install spawn-lia sphinx_rtd_theme sphinx
lia mkdocs -d carate
6.8. Training results
Most of the training results are saved to disk in an accumulative JSON file. The reason is to have enough redundancy in case of data failure.
Previous experiments suggest hardening the training machine to avoid unwanted side effects such as shutdowns, data loss, or data diffusion. You may still send intermediate results through the network, but store the large chunks on the hardened device.
That way, ETL or other data processing should not be affected by an interruption on the training machine.
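One way to keep an accumulative JSON file safe against interruptions is to rewrite it atomically after every step. The sketch below shows that general technique under stated assumptions: the function name, the file name results.json, and the record fields are hypothetical and do not describe carate's actual file layout.

```python
import json
import os
import tempfile
from pathlib import Path


def append_result(path, record):
    """Append one record to a JSON list on disk using an atomic replace."""
    path = Path(path)
    history = json.loads(path.read_text()) if path.exists() else []
    history.append(record)
    # Write to a temporary file in the same directory, then swap it in, so an
    # interruption never leaves a half-written results file behind.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(history, handle, indent=2)
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp_name, path)
    return history


with tempfile.TemporaryDirectory() as workdir:
    results_file = Path(workdir) / "results.json"  # hypothetical file name
    # The numbers below are made up for the demonstration.
    append_result(results_file, {"epoch": 1, "train_loss": 0.9})
    history = append_result(results_file, {"epoch": 2, "train_loss": 0.7})
    reloaded = json.loads(results_file.read_text())
```

The file on disk is always either the previous complete state or the new complete state, which is what makes it usable as a redundant record after a shutdown.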
The models can be used for inference.
To reproduce the publication, please download my configuration files from the drive and, inside that folder, run
carate -d .
Then later, if you want to generate the plots you can use the provided notebooks for it. Please especially refer to the Analysis notebook
8. Build on the project
Building on the code is not recommended, as the project will be continued in another library (building on that one would make the most sense).
The library is developed until it reaches a publication-ready, reproducible state across different machines and hardware, and is then immediately moved to aiarc.
The project aiarc (deep learning) then completes the family of packages alongside chembee (classical ML) and dylightful (time series).
However, you may still use the models as they are; for that purpose the library is production ready.
In case you can't wait for the picky scientist in me, you can still build on my intermediate results. You can find them in the following locations
We have to admit it though: There was a security incident on 31st of March 2023, so the results from Alchemy and ZINC are still waiting. I logged all experiments
9. Review Process
The paper entered peer review in 2022. It was submitted to RSC journals. After it was dismissed as irrelevant, several similar publications (not citing this work) followed in the same journal.
Later, in 2023, the paper was rejected again, and the peer review was delayed by the RSC. After contacting RSC officials, the problem could be resolved, and a deeper study comparing CARATE to similar work was demanded.
The research on this new project started then in January 2021, such that the comparison and ablation study is being performed at the moment and will most likely end in March 2024.
Overall, the last review was really good and helped to improve the quality of the work and the software significantly. As usual, there were attacks on the machine. One time there was slight damage: a few runs were lost and had to be repeated.
After resubmitting with all improvements in May 2024, the editors still find excuses not to publish the work. The package has over 35k users (that means 10x more people using the work than reading an article at RSC!).
The question is why they censor, and who is pushing them. It makes no sense, really. They downgraded their institution step by step. They really have no credibility left.
10. Support the development
If you are happy about substantial progress in chemistry and life sciences that is not commercial first but citizen first, well then just
Or you can join the development of the code.
11. Cite
There is a preprint available on bioRxiv. Read the preprint