Halfway Through GSoC: My Experience and Learnings
BenchmarkST: Cross-Platform, Multi-Species Spatial Transcriptomics Gene Imputation Benchmarking
Hello there! I’m Qianru, and this is my mid-term blog post for the 2024 Google Summer of Code. I am working on the BenchmarkST project, focusing on benchmarking gene imputation methods in spatial transcriptomics. My goal is to create a comprehensive, reproducible platform for evaluating these methods across various datasets and conditions.
In this post, I will share some of the progress I have made so far, the challenges I have faced, and how I overcame them. I will also highlight some specific accomplishments and what I plan to do next.
Achievements:
- Developed the Python Package: I created the “Impeller” Python package, which includes tools for downloading example data, processing it, and training models. This package aims to standardize gene imputation tasks in spatial transcriptomics.
- Example Data Integration: Successfully integrated various spatial transcriptomics datasets into the package for benchmarking purposes.
- Benchmarking Framework: Established a framework for objective comparison of different gene imputation methodologies.
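To make the idea of an objective comparison concrete, here is a minimal sketch of what such a harness could look like. It is not the BenchmarkST or Impeller implementation: the functions `zero_impute`, `gene_mean_impute`, and `benchmark` are hypothetical, the data is synthetic, and the two "methods" are simple baselines. The point it illustrates is that every method sees the same masked input and is scored on the same held-out entries.

```python
import numpy as np

def zero_impute(x_masked, mask):
    # Hypothetical baseline: leave held-out entries at zero.
    return x_masked.copy()

def gene_mean_impute(x_masked, mask):
    # Hypothetical baseline: fill each held-out entry with the mean of the
    # observed values of its gene (column).
    observed = ~mask
    counts = observed.sum(axis=0)
    sums = (x_masked * observed).sum(axis=0)
    gene_means = sums / np.maximum(counts, 1)
    filled = x_masked.copy()
    rows, cols = np.nonzero(mask)
    filled[rows, cols] = gene_means[cols]
    return filled

def benchmark(methods, x_true, mask):
    # Hide the held-out entries once, then score every method on them (RMSE).
    x_masked = np.where(mask, 0.0, x_true)
    results = {}
    for name, impute in methods.items():
        pred = impute(x_masked, mask)
        results[name] = float(np.sqrt(np.mean((pred[mask] - x_true[mask]) ** 2)))
    return results

# Synthetic spots-by-genes count matrix, for illustration only.
rng = np.random.default_rng(0)
x_true = rng.poisson(5.0, size=(100, 10)).astype(float)
mask = rng.random(x_true.shape) < 0.1  # hold out roughly 10% of entries

results = benchmark({"zero": zero_impute, "gene_mean": gene_mean_impute}, x_true, mask)
for name, rmse in results.items():
    print(f"{name}: RMSE = {rmse:.3f}")
```

Because the mask is generated once and shared, differences in the scores come from the methods themselves rather than from which entries happened to be hidden.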
Python Package: Installation and Usage
You can install the package using pip:
pip install Impeller
Download Example Data
from Impeller import download_example_data
download_example_data()
Load and Process Data
from Impeller import load_and_process_example_data
data, val_mask, test_mask, x, original_x = load_and_process_example_data()
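The names `val_mask` and `test_mask` suggest that some entries of the expression matrix are held out for validation and testing, with `original_x` keeping the true values. The package's actual masking logic is not shown in this post, so the following is only a sketch under that assumption. The helper `make_holdout_masks` and its parameters are hypothetical, and the matrix is synthetic.

```python
import numpy as np

def make_holdout_masks(x, val_frac=0.1, test_frac=0.1, seed=0):
    """Hypothetical helper: randomly hold out nonzero entries of x.

    Returns boolean validation and test masks, plus a copy of x in which
    the held-out entries are set to zero.
    """
    rng = np.random.default_rng(seed)
    rows, cols = np.nonzero(x)
    order = rng.permutation(rows.size)
    n_val = int(val_frac * rows.size)
    n_test = int(test_frac * rows.size)
    val_idx = order[:n_val]
    test_idx = order[n_val:n_val + n_test]

    val_mask = np.zeros(x.shape, dtype=bool)
    test_mask = np.zeros(x.shape, dtype=bool)
    val_mask[rows[val_idx], cols[val_idx]] = True
    test_mask[rows[test_idx], cols[test_idx]] = True

    masked_x = x.copy()
    masked_x[val_mask | test_mask] = 0
    return val_mask, test_mask, masked_x

# Synthetic spots-by-genes count matrix, for illustration only.
rng = np.random.default_rng(42)
original_x = rng.poisson(2.0, size=(50, 20)).astype(float)
val_mask, test_mask, x = make_holdout_masks(original_x)
```

Holding out only nonzero entries means every masked position has a known, non-trivial value to compare against. Whether Impeller makes the same choice is an assumption here.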
Train Model
from Impeller import create_args, train
args = create_args()
test_l1_distance, test_cosine_sim, test_rmse = train(args, data, val_mask, test_mask, x, original_x)
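The three returned values are named after standard error measures: L1 distance, cosine similarity, and root mean squared error. How Impeller computes them internally (for example, mean versus sum for L1, or per-spot versus flattened cosine similarity) is not stated in this post. The sketch below assumes they are computed over the flattened held-out entries, and the function `imputation_metrics` is a hypothetical name.

```python
import numpy as np

def imputation_metrics(pred, truth, mask):
    """Compare imputed and true values on the masked (held-out) entries only."""
    p = pred[mask]
    t = truth[mask]
    l1 = np.mean(np.abs(p - t))                                     # mean absolute error
    cos = np.dot(p, t) / (np.linalg.norm(p) * np.linalg.norm(t))    # cosine similarity
    rmse = np.sqrt(np.mean((p - t) ** 2))                           # root mean squared error
    return l1, cos, rmse

# Tiny hand-made example: two of the four entries are held out.
truth = np.array([[1.0, 2.0], [3.0, 4.0]])
pred = np.array([[1.0, 2.5], [2.0, 4.0]])
mask = np.array([[False, True], [True, False]])

l1, cos, rmse = imputation_metrics(pred, truth, mask)
print(f"L1 = {l1:.4f}, cosine = {cos:.4f}, RMSE = {rmse:.4f}")
```

Lower is better for L1 distance and RMSE, while cosine similarity is better the closer it is to 1.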
Challenges:
Reproducing the results of various gene imputation methods was not an easy task. I faced several challenges along the way:
- Lack of Standardized Data: The datasets did not share a standard format, so each one had to be processed before it could be used for benchmarking.
- Reproducibility Issues: Some methods had incomplete or missing code, making it difficult to reproduce their results accurately.
- Resource Limitations: Running large-scale experiments required significant computational resources, which posed constraints on the project timeline.
Future Work:
Moving forward, I plan to:
- Extend the package’s functionalities to include more datasets and imputation methods.
- Enhance the benchmarking framework for more comprehensive evaluations.
- Collaborate with other researchers to validate and improve the package’s utility in the bioinformatics community.
I hope you found this update informative and interesting. If you have any questions or feedback, please feel free to contact me. Thank you for your attention and support!