
Chemical and Pharmaceutical Autoencoder - Providing reproducible modelling for quantum chemistry


1. CARATE

License: GPL v3. Code style: black.

2. Peer Review

Peer review is strange. Peer reviewers do not seem to give constructive feedback and always find a new problem.

  1. Review:
Dear Dr Kleber:

MANUSCRIPT ID: SC-EDG-08-2022-004646
TITLE: Introducing CARATE: finally speaking chemistry.

Thank you for your recent submission to Chemical Science, published by the Royal Society of Chemistry. All manuscripts are initially assessed by a team of professional editors who have a wide range of backgrounds from across the chemical sciences.

After careful evaluation of your manuscript and consultation with the editorial team, I regret to inform you that I do not find your manuscript suitable for publication in Chemical Science because it does not meet the very high significance and general interest standards required for publication in Chemical Science. Unfortunately the editorial team felt that the work was too preliminary to appeal to our audience. Therefore your article has been rejected from Chemical Science.

Work published in Chemical Science is of high general interest and significance. Work that is scientifically sound and of interest to those in a specific field is more suitable for publication elsewhere.

Full details of the initial assessment process can be found at:
https://www.rsc.org/journals-books-databases/journal-authors-reviewers/processes-policies/#submissions

Please note that Chemical Science accepts <10% of submitted manuscripts.

I am sorry not to have better news for you, however, thank you for giving us the opportunity to consider your manuscript. I wish you every success in publishing this manuscript elsewhere.

Okay, what is that supposed to mean? I literally published the best algorithm on all investigated datasets. Of course that is novel! After pestering the editors for a couple of weeks, I got this reply:

Dear Julian,

 

My apologies again for any misunderstanding, I wasn’t referring to any specific journal, just the general area. I don’t believe there is a suitable journal published by the Royal Society of Chemistry and obviously am unable to comment on those outside our portfolio.

Ah, so it was novel after all, and nothing spoke against publication except, well, the wish to censor it. Okay, never mind. I did not believe such an institution would do that. So I made sure to raise the work to higher standards and make it 100% reproducible, all while constantly dealing with cyberattacks.

  2. Review:
Dear Dr Kleber:

Manuscript ID: DD-ART-10-2023-000201
Title: Introducing CARATE: Finally speaking chemistry through learning hidden wave function representations on graph attention and convolutional neural networks

Thank you for your recent submission to Digital Discovery, published by the Royal Society of Chemistry. All manuscripts are initially assessed by the editors to ensure they meet the criteria for publication in the journal.

After careful evaluation of your manuscript, I regret to inform you that I do not find your manuscript suitable for publication as it does not represent a sufficient advance on work already published. Therefore your article has been rejected from Digital Discovery.

First rejecting the article, then publishing similar research themselves without citing the original work, and then saying the research is not new. Smart move!

  3. Review:

After two months of escalating the issue and demanding a real review, I received the following:

Thank you for your correspondence regarding the manuscript ID: DD-ART-10-2023-000201. After a thorough review, both the scientific content and its presentation have been carefully considered. While your manuscript, particularly in its approach to comparing CARATE with other methods, shows elements of promising research, there are significant concerns that have led to the decision to reject the manuscript in its current state.

 

The manuscript's structure is highly fragmentary, primarily consisting of single-sentence paragraphs, which is not in line with the expected scientific discourse level of our journal. The writing quality also does not meet the required standard for effective communication of research findings. As it stands, the manuscript imposes an undue burden on our reviewer network.

 

Furthermore, the comparative analysis used to assert CARATE's superiority is not as comprehensive as required for such a bold claim. The presented data, although interesting, fails to provide a compelling argument for CARATE's unequivocal superiority over existing methods. Strong claims necessitate robust, extensive comparative analysis.

 

In my opinion, this manuscript represents an early stage of promising work, and the potential of CARATE in the field is clear. I encourage you to undertake a more thorough comparison with existing methods and to significantly improve the manuscript's structure and writing quality for reconsideration.

 

I understand the importance of diverse perspectives in the editorial process. Therefore, I welcome the lead editor to provide their opinion on this matter. If deemed appropriate, I am open to stepping back and allowing another editor to re-evaluate the revised submission. This would ensure a fresh perspective and fair consideration.

 

Your efforts in advancing computational chemistry are commendable, and effective communication is key to their recognition. We look forward to a revised submission that aligns with the high standards of our journal and addresses the outlined concerns.

I already found these claims hard to digest. The editor wanted British English. Okay, no problem: I had the manuscript proofread for $300 before the next submission. Fragmentary, where exactly? Is it normal practice in research to criticise with stock phrases without naming a single paragraph that should be improved?

Bold claims that need more comparison? Did he even run the program? Okay, no worries: I did an ablation study and compared CARATE with the work of the most ridiculous guys, the ones who do not cite me.

In hindsight, it seems they wanted to extract more information for their media spectacle in which they rob me of my science.

  4. Review:
Dear Dr Kleber:

Manuscript ID: DD-ART-05-2024-000124
Title: Introducing CARATE: Finally speaking chemistry through learning hidden wave-function representations on graph-based attention and convolutional neural networks

Thank you for your recent submission to Digital Discovery, published by the Royal Society of Chemistry. All manuscripts are initially assessed by the editors to ensure they meet the criteria for publication in the journal.

I have reviewed the present manuscript only, and make the following observations:

- The introduction and Theoretical Remarks section have many digressions that do not appear to support the development of the work. 

- The section on the Time-Independent Schrodinger Equation does not describe explicitly how the molecular graph is used to construct \Psi, which is the stated goal of the paper.  There is no substantive description of the proposed method at a level that would allow someone to write their own code(and there was not any Supporting Information document submitted; the files in the SI are the LaTeX source files).  Maybe there are some clues in the "Ablation studies" section, if I read between the lines but this should be more explicit.  This is a barrier to reproducibility.

- Fig 1 is too general to be insightful about the model architecture, and perhaps confusing—Dropout is used between layers, so pulling it off to the side is an unusual choice and it is unclear what this means.

- Results are presented (e.g., Table 1) that are not discussed in the paper. 

- The results appear to be relatively poor and below baseline (e.g., in Table 1, the accuracy of CARATE is ~1/300 that of Linear pooling?).  It is unclear whether this is because of a deficiency in the method studied or because the strengths of the method have not be adequately described in the text, but neither of these would be a problem.


After careful evaluation of your manuscript, I regret to inform you that I do not find your manuscript suitable for publication in Digital Discovery in its current form because it does not meet the expectations of the journal.

Therefore your article has been rejected from Digital Discovery.

Full details of the initial assessment process can be found at:
https://www.rsc.org/journals-books-databases/journal-authors-reviewers/processes-policies/#submissions

I am sorry not to have better news for you, however, thank you for giving Digital Discovery the opportunity to consider your manuscript. I wish you every success in publishing this manuscript elsewhere.

Yours sincerely,

Dr Joshua Schrier
Associate Editor, Digital Discovery

I agree with the last point: I accidentally swapped the table headers (Accuracy and MAE), and they have to be corrected. This is a minor mistake that does not justify rejection. But the reviewer seems quite fond of elaborating on it and talking badly about the work. The intention seems clear in that regard.

The reviewer criticised points that were not problematic. For example, the results of Table 1 are discussed. He did not even try to reproduce the results, but claimed they are not reproducible. The paper describes at length how the graph is encoded to perform operator algebra on it. Why does the reviewer make up points? It makes no sense from a professional point of view.

To me it appears as censorship. I do not think that the institution is reliable anymore.



3. Why

Molecular representation is wrecked. Seriously! For decades, we chemists have talked in an ancient language about something we cannot comprehend with that language. We have to stop it, now!

4. What

The success of transformer models is evident. Applied to molecules, we need a graph-based transformer. Such models can then learn hidden representations that are better suited to describe a molecule.

For a chemist it is quite intuitive, but seldom modelled as such: a molecule exhibits its properties through its combined electronic and structural features.
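To make the idea of attention on a molecular graph more concrete, here is a minimal, pure-Python sketch of one simplified attention step. It is an illustration of the general technique under stated assumptions, not CARATE's architecture: the learned query/key/value projections, multiple heads and nonlinearities of a real graph transformer are left out, `attention_step` is a hypothetical helper, and the one-hot atom features are made up for the example.

```python
import math


def attention_step(features: list[list[float]], neighbours: list[list[int]]) -> list[list[float]]:
    """One simplified graph-attention step (hypothetical helper, not part of carate).

    features[i]:   feature vector of atom i
    neighbours[i]: indices of the atoms bonded to atom i
    Each atom attends over itself and its bonded neighbours using dot-product
    scores; learned projections are deliberately omitted.
    """
    updated = []
    for i, h_i in enumerate(features):
        candidates = [i] + neighbours[i]
        # Dot-product score between atom i and each candidate atom.
        scores = [sum(a * b for a, b in zip(h_i, features[j])) for j in candidates]
        # Numerically stable softmax restricted to the bonded neighbourhood.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # New representation: attention-weighted sum of the candidates' vectors.
        updated.append([
            sum(w * features[j][k] for w, j in zip(weights, candidates))
            for k in range(len(h_i))
        ])
    return updated


# Toy water molecule H-O-H with hypothetical one-hot features [is_H, is_O].
water_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
water_bonds = [[1], [0, 2], [1]]
print(attention_step(water_features, water_bonds))
```

Each atom's new vector is a weighted mix of its own vector and those of its bonded neighbours, so stacking such steps spreads information along the bonds, which is the intuition behind combining electronic and structural features.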

The aim is to implement the algorithm in a reusable way, e.g. for the chembee pattern. In fact, this project mimics the chembee pattern to provide a standalone tool. The overall structure of the program is reusable for other deep-learning projects and will be moved into a separate project that should work similarly to opinionated frameworks.

6. Quickstart

Take a quick look at the documentation first.

First install carate via

pip install carate

The installation pulls in torch with CUDA support, so the library decides just-in-time (JIT) which hardware to use. At the moment only CPU and GPU are supported; FPGA, TPU and other accelerators are ignored. Further development of the package will focus on avoiding special library APIs and instead make the pattern adaptable to an arbitrary algorithmic/numerical backend.
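The sketch below shows one common way a PyTorch program picks its hardware at run time. It is an assumption about the general pattern, not carate's actual code: `select_device` is a hypothetical helper, and it falls back to the CPU when PyTorch is not installed at all.

```python
import importlib.util


def select_device() -> str:
    """Hypothetical helper: return "cuda" when PyTorch sees a GPU, otherwise "cpu"."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed, so only the CPU path is possible.
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"Training would run on: {select_device()}")
```

The returned string can be passed to `torch.device(...)` and to `.to(...)` on models and tensors, so the same code runs unchanged on machines with and without a GPU.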

6.1. From CLI

For a single file run

carate -c file_path

For a directory of runs you can use

carate -d directory_path

6.2. From notebook/.py file

You can start runs from notebooks. This can be handy for clean analysis and for communication within your team. Check out the Quickstart notebook.
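If you would rather call the documented CLI from a notebook or `.py` file, a thin wrapper around `subprocess` is enough. `carate_command` and `run_carate` are hypothetical helpers, not part of the package; the sketch only uses the `-c` and `-d` flags described above and assumes the `carate` executable is on your `PATH`.

```python
import subprocess
from pathlib import Path


def carate_command(path: str) -> list[str]:
    """Build the CLI call: -d for a directory of configs, -c for a single config file."""
    flag = "-d" if Path(path).is_dir() else "-c"
    return ["carate", flag, path]


def run_carate(path: str) -> None:
    """Run carate in a subprocess; raises CalledProcessError if it exits with an error."""
    subprocess.run(carate_command(path), check=True)


# Example (requires carate on PATH): run every config in the current folder.
# run_carate(".")
```

Passing the arguments as a list avoids shell quoting problems, and `check=True` makes a failed run stop the notebook cell instead of failing silently.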

6.3. Analysing runs

I provide some basic functions to analyse runs. With the notebooks, you should be able to reproduce my plots. Check out the Analysis notebook.
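When comparing runs, for example the folds of a cross-validation, it helps to report a mean and a spread rather than a single number. The sketch assumes each run has already been loaded as a dict of metric values; `summarise` and the key name `"mae"` are hypothetical and do not reflect carate's actual result schema.

```python
from statistics import mean, stdev


def summarise(runs: list[dict], metric: str) -> tuple[float, float]:
    """Mean and sample standard deviation of one metric across runs (hypothetical schema)."""
    values = [run[metric] for run in runs]
    # stdev needs at least two values; report zero spread for a single run.
    spread = stdev(values) if len(values) > 1 else 0.0
    return mean(values), spread


# Toy values for illustration only, not results from the paper.
folds = [{"mae": 0.52}, {"mae": 0.48}, {"mae": 0.50}]
avg, sd = summarise(folds, "mae")
print(f"MAE = {avg:.3f} ± {sd:.3f}")
```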

6.4. Build manually

The vision is to move away from PyTorch, as it frequently causes maintenance problems.

The NumPy interface of JAX seems more promising and more robust. By using the NumPy interface, the package would become less dependent on a single framework, and one might as well implement the algorithm in NumPy or a similar package.
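To illustrate what writing against the NumPy interface buys, here is one graph-convolution step that only uses functions available in both NumPy and `jax.numpy`, so the array module can be passed in as `xp`. This is the standard symmetric-normalised GCN propagation rule, used as a stand-in example; it is not necessarily the operator CARATE implements, and `gcn_layer` and the toy matrices are hypothetical.

```python
import numpy as np


def gcn_layer(xp, adjacency, features, weights):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    xp is the array module (numpy, or jax.numpy as an assumed drop-in).
    Only eye, diag, sqrt, maximum, sum and @ are used and nothing is updated
    in place, which JAX's immutable arrays would forbid.
    """
    a_hat = adjacency + xp.eye(adjacency.shape[0])  # add self-loops
    degree = a_hat.sum(axis=1)
    d_inv_sqrt = xp.diag(1.0 / xp.sqrt(degree))
    propagated = d_inv_sqrt @ a_hat @ d_inv_sqrt @ features @ weights
    return xp.maximum(propagated, 0.0)  # ReLU


# Two bonded atoms, one feature each, identity weight.
adjacency = np.array([[0.0, 1.0], [1.0, 0.0]])
features = np.array([[1.0], [3.0]])
weights = np.array([[1.0]])
print(gcn_layer(np, adjacency, features, weights))
```

With JAX installed, the same function should work after `import jax.numpy as jnp` by calling `gcn_layer(jnp, jnp.asarray(adjacency), jnp.asarray(features), jnp.asarray(weights))`; only the module argument changes.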

To install the package, make sure you install the correct versions listed in requirements.txt (for debugging) or in pyproject.toml (for production use). See below for how to install the package.

Inside the directory of your git-clone:

pip install -e .

6.6. Build a container

A Containerfile is provided to keep the software reproducible in the future.

  podman build --tag carate -f ./Containerfile

Then you can use the standard Podman or Docker ways to use the software.

6.7. Build the docs

pip install spawn-lia sphinx_rtd_theme sphinx
lia mkdocs -d carate

6.8. Training results

Most training results are saved to an accumulative JSON file on disk. The reason is to have enough redundancy in case of data loss.

Previous experiments suggest hardening the training machine to avoid unwanted side effects such as shutdowns, data loss, or data diffusion. You may still send intermediate results over the network, but store the large chunks on the hardened device.

This way, ETL and other data processing are not affected by interruptions on the training machine.
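The redundancy idea can be pushed a step further by making each write atomic, so that a crash or shutdown in the middle of saving never leaves a half-written results file. The sketch below is an assumption about how such an accumulative JSON log could be kept, not carate's actual storage code; `append_result` and the record layout are hypothetical.

```python
import json
import os
import tempfile


def append_result(path: str, record: dict) -> list:
    """Append one record to a JSON list on disk and rewrite the file atomically."""
    records = []
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as fh:
            records = json.load(fh)
    records.append(record)

    # Write to a temporary file in the same directory, then swap it into place.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(records, fh, indent=2)
            fh.flush()
            os.fsync(fh.fileno())  # make sure the bytes reach the disk
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the partial temporary file and re-raise the error.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return records


# Example: log one (hypothetical) epoch record next to this script.
# append_result("results.json", {"epoch": 1, "mae": 0.5})
```

On POSIX systems `os.replace` renames the temporary file over the target in a single step when both are on the same filesystem, so a reader sees either the previous list or the updated one, never a truncated file.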

The models can be used for inference.

To reproduce the publication, download my configuration files from the drive; inside that folder, simply run

carate -d . 

Afterwards, you can generate the plots with the provided notebooks. Please refer especially to the Analysis notebook.

8. Build on the project

Building on this code is not recommended, as the project will be continued in another library; building on that library will make the most sense.

The library is developed until it reaches a publication-ready, reproducible state across different machines and hardware, and will then immediately be moved to aiarc.

The project aiarc (deep learning) will then complete the family of packages alongside chembee (classical ML) and dylightful (time series).

However, you may still use the models as they are by means of this library, which is production-ready.

In case you can't wait for the picky scientist in me, you can still build on my intermediate results. You can find them in the following locations

We have to admit, though, that there was a security incident on 31 March 2023, so the results for Alchemy and ZINC are still pending. I logged all experiments.

9. Review Process

The paper entered peer review in 2022, when it was submitted to RSC journals. After it was dismissed as irrelevant, several similar publications followed in the same journal, none of them citing this work.

In 2023, the paper was rejected again and its peer review was delayed by the RSC. After I contacted RSC officials, the problem was resolved, and a deeper study comparing CARATE with similar work was demanded.

The research on this new project then started in January 2021, so the comparison and ablation study is currently under way and will most likely be finished in March 2024.

Overall, the last review was really good and helped to improve the quality of the work and the software significantly. As usual, there were attacks on the machine. Once they caused slight damage: a few runs were lost and had to be repeated.

After I resubmitted with all improvements in May 2024, the editors still found excuses not to publish the work. The package has over 35k users (that means 10x more people use the work than read an article at RSC!).

The question is why they censor it, and who is pushing them. It really makes no sense. They have downgraded their institution step by step. They really have no credibility left.

10. Support the development

If you are happy about substantial progress in chemistry and life sciences that is not commercial first but citizen first, well then just

Buy Me A Coffee

Or, of course, you can join the development of the code.

11. Cite

A preprint is available on bioRxiv. Read the preprint.



Download files


Source Distribution

carate-0.3.31.tar.gz (62.8 kB)

Uploaded Source

Built Distribution

carate-0.3.31-py3-none-any.whl (88.4 kB)

Uploaded Python 3

File details

Details for the file carate-0.3.31.tar.gz.

File metadata

  • Download URL: carate-0.3.31.tar.gz
  • Size: 62.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.2

File hashes

Hashes for carate-0.3.31.tar.gz
Algorithm Hash digest
SHA256 54e6ae9436666b1526d7ad622aeff0a2a0372e73f0decdafeba7755e708ce33e
MD5 4528fbaf69a98292db6c71c9e085fb7a
BLAKE2b-256 1c71605f61e8c310dfc638309f28611e1591b794bb5080932f6d9dde4272f56b


File details

Details for the file carate-0.3.31-py3-none-any.whl.

File metadata

  • Download URL: carate-0.3.31-py3-none-any.whl
  • Size: 88.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.2

File hashes

Hashes for carate-0.3.31-py3-none-any.whl
Algorithm Hash digest
SHA256 f7e237972e6e28e828c439cfd2bca31cfd96dcc56c7550b2e4d42ca87fabd1de
MD5 37ec8c5befcaa10e276c2528ed78b1f2
BLAKE2b-256 8cf42d1d3de5772b8da9143d65923e921e573b29b9231f71c793f6ee75fb53b2
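The SHA256 digests listed above let you check that a downloaded archive is intact before installing it. A minimal sketch using only the standard library follows; `sha256_of` and `verify` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 digest of a file, read in chunks so large files need little memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex: str) -> bool:
    """True if the file's SHA256 matches the published digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example, assuming the wheel was downloaded into the current directory:
# verify("carate-0.3.31-py3-none-any.whl",
#        "f7e237972e6e28e828c439cfd2bca31cfd96dcc56c7550b2e4d42ca87fabd1de")
```

pip can also enforce this itself: pinning `carate==0.3.31` with a `--hash=sha256:...` entry in a requirements file and installing with `--require-hashes` makes pip refuse files whose digest does not match (in that mode every requirement, including dependencies, needs a pinned hash).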

