Deconvolution of human brain cortical layers using spatially aware NNLS

In this tutorial we will explore cell-type deconvolution using different non-spatial baseline models. For illustration purpose, we consider each cortical layer as a cell type and perform deconvolution to map layer-specific expression programs to spatial locations.

[1]:

import torch
import pandas as pd
from smoother import SpatialWeightMatrix, SpatialLoss
from smoother.models.deconv import NNLS, NuSVR, DWLS, LogNormReg
from smoother.losses import kl_loss
from smoother.visualization import plot_celltype_props
import matplotlib.pyplot as plt

Load the brain cortex dataset (preprocessed)

Here the expression data is truncated to keep only the top marker genes.

[2]:

# change the data directory accordingly
DATA_DIR = "./data/"

# read in log_count matrices
# spatial log-normalized count matrix, num_gene x num_spot
y_df = pd.read_csv(DATA_DIR + "DLPFC_151673_marker_log_exp.txt", sep = " ", header=0)
# per-layer reference matrix, num_gene x num_celltype
x_df = pd.read_csv(DATA_DIR + "DLPFC_151673_marker_log_exp_by_layer.txt", sep = " ", header=0)

# spatial coordinates, num_spot x 2
# here we are using 10x Visium's pixel-level coordinates
# i.e., 'pxl_col_in_fullres' and 'pxl_row_in_fullres'
coords = pd.read_csv(DATA_DIR + "DLPFC_151673_coords.txt", sep = " ", header=0)
coords = coords.loc[y_df.columns,:]

print(f"Number of genes: {y_df.shape[0]}")
print(f"Number of layers: {x_df.shape[1]}")
print(f"Number of spots: {y_df.shape[1]}")

Number of genes: 350
Number of layers: 7
Number of spots: 3639

[3]:

# convert data into torch tensor
x = torch.tensor(x_df.values).float()
y = torch.tensor(y_df.values).float()

Generate the spatial loss function

By default spatial weight matrix is symmetric (a requirement for the CAR and ICAR models).

[4]:

# calculate spatial weight matrix
weights = SpatialWeightMatrix()
weights.calc_weights_knn(coords, k=6)

# scale the spatial weight matrix by transcriptional similarity
weights.scale_by_expr(y, reduce='none')

# convert spatial weight into loss
# note that the covariance is improper
spatial_loss = SpatialLoss('icar', weights, rho=1)

Number of spots: 3639. Average number of neighbors per spot:  5.85.
Number of spots: 3639. Average number of neighbors per spot:  5.85.

For gradient-based optimization, it’s okay to have improper covariance. However, if you are seeking a proper prior, e.g. in a Bayesian setting or for convex optimization, simply choose a smaller scale for the loss, e.g.

[5]:

spatial_loss = SpatialLoss('icar', weights, rho=0.99)

If you want to use SMA or SAR model, we recommend row-scale the spatial weight matrix first for robustness. Note the ISAR model is just the row-scaled version of SAR.

[6]:

# calculate spatial weight matrix
weights = SpatialWeightMatrix()
weights.calc_weights_knn(coords, row_scale = True)

# scale the spatial weight matrix by transcriptional similarity
weights.scale_by_expr(y, reduce='none', row_scale = True)

# convert spatial weight into loss
#spatial_loss = SpatialLoss('sar', weights, rho=0.999)
spatial_loss = SpatialLoss('isar', weights, rho=0.999)
#spatial_loss = SpatialLoss('sma', weights, rho=0.999, use_sparse=False)

Number of spots: 3639. Average number of neighbors per spot:  5.85.
Number of spots: 3639. Average number of neighbors per spot:  5.85.

/Users/jysumac/Projects/Smoother/smoother/weights.py:262: UserWarning: Sparse CSR tensor support is in beta state. If you miss a functionality in the sparse tensor support, please submit a feature request to https://github.com/pytorch/pytorch/issues. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/SparseCsrTensorImpl.cpp:56.)

Choose the baseline model and run deconvolution

Feel free to play around with different models and prior configurations

[7]:

torch.manual_seed(20221005)

# choose deconvolution model
#model = NuSVR(nu=0.5)
#model = DWLS(nonneg=True, max_weights=4)
#model = LogNormReg()
model = NNLS()

# run deconvolution without spatial loss
model.deconv(x, y, spatial_loss=None, lambda_spatial_loss=0,
             lr=1e-3, tol=1e-6, max_epochs=-1, verbose=False)

# run deconvolution with spatial loss
model_sp = NNLS()
model_sp.deconv(x, y, spatial_loss=spatial_loss, lambda_spatial_loss=10,
                lr=1e-3, tol=1e-6, max_epochs=-1, verbose=False)

=== Time  8.10s. Total epoch 2393. Final loss: (total) 0.253. (spatial) 0.000.
=== Time  13.74s. Total epoch 2379. Final loss: (total) 0.253. (spatial) 0.000.

[8]:

# compare spatial heterogeneity with and without spatial loss
kl_losses = [kl_loss(torch.nn.functional.softmax(m.get_props().T, dim=0), weights.swm.to_dense())
             for m in [model, model_sp]]

print(f"Weighted KL-loss: (NNLS): {kl_losses[0]:.4f}, (+ spatial loss): {kl_losses[1]:.4f}")

Weighted KL-loss: (NNLS): 0.0027, (+ spatial loss): 0.0005

Visualize results

The regular NNLS

[9]:

# extract estimated celltype abundances
ct_abundances = model.get_props()
plot_celltype_props(
    ct_abundances, coords, cell_type_names=[f'Layer {i}' for i in range(7)],
    n_col=7, figsize=[21, 3]
)

../_images/tutorials_tutorial_deconv_19_0.png

NNLS (+ spatial loss)

[10]:

# extract estimated celltype abundances
ct_abundances_sp = model_sp.get_props()
plot_celltype_props(
    ct_abundances_sp, coords, cell_type_names=[f'Layer {i}' for i in range(7)],
    n_col=7, figsize=[21, 3]
)

../_images/tutorials_tutorial_deconv_21_0.png

Colored by the max proportions

[11]:

# colored by max coefficient
fig, (ax1, ax2)= plt.subplots(1, 2, figsize = (10, 5))
ax1.scatter(coords['pxl_col_in_fullres'], coords['pxl_row_in_fullres'],
            c = ct_abundances.argmax(1), s = 10, cmap='Paired')
ax2.scatter(coords['pxl_col_in_fullres'], coords['pxl_row_in_fullres'],
            c = ct_abundances_sp.argmax(1), s = 10, cmap='Paired')

ax1.set_title('NNLS')
ax2.set_title('NNLS + spatial loss')
ax1.axis('off')
ax2.axis('off')

plt.show()

../_images/tutorials_tutorial_deconv_23_0.png

(Optional) Solve deconvolution models using convex optimization

Before running the following example, please make sure you have the package CVXPY installed.

[12]:

# calculate spatial weight matrix
weights = SpatialWeightMatrix()
weights.calc_weights_knn(coords, k=6)

# scale the spatial weight matrix by transcriptional similarity
weights.scale_by_expr(y, reduce='none')
spatial_loss = SpatialLoss('icar', weights, rho=0.99, standardize_cov=True)

# select model and run deconvolution
model = NNLS(backend='cvxpy')
#model = DWLS(backend='cvxpy', max_weights=4)
#model = NuSVR(backend='cvxpy', nonneg=True)
model.deconv(x, y, spatial_loss=None, lambda_spatial_loss=0)

#model_sp = LinearRegression(backend='cvxpy')
model_sp = NNLS(backend='cvxpy')
#model_sp = DWLS(backend='cvxpy', max_weights=4)
#model_sp = NuSVR(backend='cvxpy', nonneg=True)
model_sp.deconv(x, y, spatial_loss, lambda_spatial_loss=10)

Number of spots: 3639. Average number of neighbors per spot:  5.85.
Number of spots: 3639. Average number of neighbors per spot:  5.85.
=== Time  9.19s. Loss: (total)  0.252, (model)  0.252, (spatial)  0.000
=== Time  22.16s. Loss: (total)  0.253, (model)  0.253, (spatial)  0.001

The regular NNLS

[13]:

# extract estimated celltype abundances
ct_abundances = model.get_props()
plot_celltype_props(
    ct_abundances, coords, cell_type_names=[f'Layer {i}' for i in range(7)],
    n_col=7, figsize=[21, 3]
)

../_images/tutorials_tutorial_deconv_28_0.png

NNLS (+ spatial loss)

[14]:

# extract estimated celltype abundances
ct_abundances_sp = model_sp.get_props()
plot_celltype_props(
    ct_abundances_sp, coords, cell_type_names=[f'Layer {i}' for i in range(7)],
    n_col=7, figsize=[21, 3]
)

../_images/tutorials_tutorial_deconv_30_0.png

Colored by the max proportions

[15]:

# colored by max coefficient
fig, (ax1, ax2)= plt.subplots(1, 2, figsize = (10, 5))
ax1.scatter(coords['pxl_col_in_fullres'], coords['pxl_row_in_fullres'],
            c = ct_abundances.argmax(1), s = 10, cmap='Paired')
ax2.scatter(coords['pxl_col_in_fullres'], coords['pxl_row_in_fullres'],
            c = ct_abundances_sp.argmax(1), s = 10, cmap='Paired')

ax1.set_title('NNLS')
ax2.set_title('NNLS + spatial loss')
ax1.axis('off')
ax2.axis('off')

plt.show()

../_images/tutorials_tutorial_deconv_32_0.png

(Optional) Solve deconvolution models using convex optimization with warm start

Warm start allows the fine-tuning of hyperparameters, in this case the smoothing strength lambda_sp

[16]:

# calculate spatial weight matrix
weights = SpatialWeightMatrix()
weights.calc_weights_knn(coords, k=6)

# scale the spatial weight matrix by transcriptional similarity
weights.scale_by_expr(y, reduce='none')
spatial_loss = SpatialLoss('icar', weights, rho=0.99, standardize_cov=True)

props = []
lambda_sp_losses = [0, 1, 3, 10]

model = NNLS(backend='cvxpy', bias=True)
# if models only differ by lambda_spatial_loss,
# will enable warm start to speed up computation
for l_sp_loss in lambda_sp_losses:
    model.deconv(x, y, spatial_loss, lambda_spatial_loss=l_sp_loss)
    props.append(model.get_props())

Number of spots: 3639. Average number of neighbors per spot:  5.85.
Number of spots: 3639. Average number of neighbors per spot:  5.85.
=== Time  17.65s. Loss: (total)  0.252, (model)  0.252, (spatial)  0.000
=== Time  7.37s. Loss: (total)  0.252, (model)  0.252, (spatial)  0.000
=== Time  7.37s. Loss: (total)  0.253, (model)  0.252, (spatial)  0.000
=== Time  10.49s. Loss: (total)  0.253, (model)  0.253, (spatial)  0.001

[17]:

# visualization
# colored by coefficients layer-by-layer
fig, axes = plt.subplots(4, 7, figsize = (16, 10))
for i in range(7):
    for j, l_sp_loss in enumerate(lambda_sp_losses):
        axes[j][i].scatter(coords['pxl_col_in_fullres'], coords['pxl_row_in_fullres'],
                           c = props[j][:, i], s = 1)

    axes[0][i].set_title(f"Layer {i}")

for ax in axes.flatten():
    ax.set_xticks([])
    ax.set_yticks([])

for j, l_sp_loss in enumerate(lambda_sp_losses):
    axes[j][0].set_ylabel(f'NNLS + lsp_{l_sp_loss}')

plt.show()

../_images/tutorials_tutorial_deconv_36_0.png

[ ]: