Commit 04ad1827 authored by fejung
Update

parent 8ccf0176
# deepSTABp
ARC for the paper DeepSTABp: A deep learning approach for the prediction of thermal protein stability. deepSTABp is a protein melting temperature predictor that was developed to overcome the limitations of classical experimental approaches, which are expensive, labor-intensive, and have limited proteome and species coverage.
DeepSTABp uses a transformer-based protein language model for sequence embedding and state-of-the-art feature extraction, in combination with other deep learning techniques, for end-to-end prediction of the protein melting temperature (Tm).
## Usage
deepSTABp can either be used directly at:
Alternatively, you can clone this ARC and run it locally.
### Setup environment
You can create a conda environment directly from the yml file located in /workflows:
```
conda env create -f environment.yml
```
### Running deepSTABp
Afterwards you can use predict_main.py in workflows/TransformerBasedTMPrediction/prediction_model. Replace the example FASTA sequence with the path to your .fasta file and run predict_main.py.
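Before pointing predict_main.py at your own file, it can help to check that the file parses as FASTA. The following is a minimal sketch that is not part of deepSTABp: the helper names `read_fasta` and `nonstandard_residues` are hypothetical, and treating anything outside the 20 standard amino acids as worth flagging is an assumption, since this README does not state how the model handles such residues.

```python
from typing import Iterator, Set, Tuple

# The 20 standard amino acids. Flagging everything else is an assumption;
# this README does not say how deepSTABp treats other letters.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA file.

    Sequence lines are upper-cased and joined; blank lines and any text
    before the first '>' header are ignored.
    """
    header = None
    chunks = []
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new record starts: emit the previous one first.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:].strip(), []
            elif header is not None:
                chunks.append(line.upper())
    # Emit the last record in the file.
    if header is not None:
        yield header, "".join(chunks)


def nonstandard_residues(sequence: str) -> Set[str]:
    """Return the characters in `sequence` that are not standard amino acids."""
    return set(sequence) - STANDARD_AA
```

Iterating over `read_fasta("my_proteins.fasta")` (a placeholder file name) and printing every header for which `nonstandard_residues` returns a non-empty set gives a quick overview of entries that may need attention before prediction.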
### Training of your own model
If you want to retrain deepSTABp, run the training script located at workflows/TransformerBasedTMPrediction/MLP_training.py. You can also experiment with different architectures by directly editing the model structure defined in MLP_training.py.
The other file in workflows/TransformerBasedTMPrediction/ is named tuning.py. Run it after training, on your already pretrained model, to achieve optimal results.
### Using the datafiles
All data files that were used to train deepSTABp can be found in the /runs/TransformerBasedTMPrediction/Datasets folder. The folder base contains the complete dataset; the folders training, testing, and validation contain the sampled datasets derived from the base dataset. The datasets are available in CSV and Parquet format.
The base folder contains several different datasets. The one used for deepSTABp is the human_PCT dataset.
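To get an overview of what is available, the folder layout described above can be scanned with a few lines of Python. This is a sketch under assumptions: the function name `list_dataset_files` is hypothetical, and the subfolder names are taken literally from the description above (exact spelling and casing in the repository may differ).

```python
from pathlib import Path
from typing import Dict, List

# Subfolder names as described in this README; exact casing is an assumption.
SPLITS = ("base", "training", "testing", "validation")
# The two file formats the datasets are provided in.
FORMATS = (".csv", ".parquet")


def list_dataset_files(datasets_dir: str) -> Dict[str, List[Path]]:
    """Map each split name to the sorted CSV/Parquet files found beneath it.

    Splits whose folder does not exist map to an empty list.
    """
    root = Path(datasets_dir)
    found: Dict[str, List[Path]] = {}
    for split in SPLITS:
        folder = root / split
        if not folder.is_dir():
            found[split] = []
            continue
        found[split] = sorted(
            p for p in folder.rglob("*")
            if p.is_file() and p.suffix.lower() in FORMATS
        )
    return found
```

Calling `list_dataset_files("runs/TransformerBasedTMPrediction/Datasets")` from the root of the cloned ARC returns the files per split. Individual files can then be loaded with `pandas.read_csv` or `pandas.read_parquet`; the latter requires a Parquet engine such as pyarrow.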
# ARCs
For details, see <https://github.com/nfdi4plants/ARC-specification>