Skip to main content

OccuPy: Estimation of local scale in cryo-EM maps

Project description

OccuPy

A fast and simple python module and program to estimate local scaling of cryo-EM maps, to approximate relative occupancy and/or resolution, and optionally also equalise the map according to occupancy while suppressing solvent amplification.

image

Estimation of local scale/occupancy

The primary purpose of OccuPy is to estimate the local map scale of cryo-EM maps. What does the 'local scale' mean? In simple terms, think of it as the range of pixels values. In well-resolved regions, contrast is high, and we expect very bright and very dark pixels. If that region has decreased resolution or occupancy, we expect decreased contrast and a narrower range of pixel values. The limit is solvent, which has Gaussian distribution. OccuPy was built to estimate this 'scale' under the assumption that structural variability (flexibility) is negligible (in which case it is a good approximation for occupancy), and to modify the estimated scale while not modifying solvent noise. In essence, OccuPy locates the region that exhibits the highest range of pixel values, and utilizes this to place all other regions on a nominal scale between 0 and 1.

Disclaimer

OccuPy

  • is only applicable to reconstructions produced by averaging, like SPA and STA. That is, not for a single tomogram.
  • does not sharpen maps. It tries not to.
  • does not estimate the absolute local resolution, but to an extent the relative local resolution. See more here.

Why estimate local scale?

The local scale contains information about both resolution and occupancy. With this in mind, OccuPy is designed to estimate the local scale

  • extremely fast
  • without half-sets
  • without GPUs
  • without masks

The reason for this is that it is intended to be compatible with the expectation maximization (fast) maximum likelihood classifiers (no half-sets) based on prior alignments (no GPUs), and be compatible with unbiased discovery of macromolecular heterogeneity and/or components (no masks). In this context, it will provide ways to weight data and/or provide a displacement vector to emphasize macromolecular resolution and/or occupancy during gradient descent. Basically, it needs to be fast enough to run repeatedly without delaying processing, and simple enough to use that it needs no input other than a cryo-EM map.

It is here implemented as a command-line tool using open-source python libraries, to facilitate visualization of partial occupancy and the relative resolution of cryo-EM reconstructions.

Modification of partial occupancies

Amplification

OccuPy can also amplify confidently estimated partial occupancy, which will effectively make weak components stronger. To do so, just add --amplify and specify a gamma factor to say how much to amplify by. --gamma 1 means to do nothing, and higher values signify stronger amplification. Values higher than about 30 are largely pointless, as values in the range 2-5 are typically useful. A very high (limiting) value of --gamma 30 (or more) leads to full occupancy at all non-solvent points.

Attenuation

In some cases you might also want to remove weak components, like in a detergent belt, or to make the distinction to full occupancy clearer. You can do this by adding the --attenuate option, which uses the same gamma factor. The limiting case of very strong attenuation means that all points with weak occupancy are removed. For this reason, attenuation tends to use lower --gamma values, probably in the range 2-3. Larger values will leave high-scale components only.

Notes

NOTE 1: You can add both options, and combine it with --exclude-solvent. See following section(s) for details.

NOTE 2: --gamma values less than 1 are not permitted, as it simply inverts the relationship between amplification and attenuation.

NOTE 3: Local scale should only be modified when it approximates occupancy. OccuPy will try to remove resolution-dependent effects when modification is used, but this relies on appropriate low-pass filtering as decribed here.

Solvent suppression

Map scale amplification by inverse filtering would result in an extremely noisy output if solvent was permitted to be amplified. To mitigate this, OccuPy estimates a solvent model which limits the amplification of regions where the map scale is estimated as near-solvent. One can aid this estimation by providing a mask that covers non-solvent, permitting OccuPy to better identify solvent. Usually, this is not necessary. OccuPy will warn you (intently) if it detects that the solvent model may be bad, but it is not guaranteed that it will. You can use the option --plot to get a figure displaying the solvent model. See also the section on [troubleshooting] (#troubleshooting). If a solvent mask i needed, it need not be precise or accurate, since OccuPy will still be able to amplify map scale outside this region if it is confident about it. Thus, a mask supplied to OccuPy using the option --solvent-definition is not a solvent mask in the traditional sense, and the estimation of the solvent model does NOT affect the estimated map scaling in any way (it only affects the optional scale modification and/or solvent suppression).

The suppression of solvent is not contingent on amplification - one can choose to supress solvent regions or not, irrespective of scale modification. This acts as automatic solvent masking, to the extent that OccuPy can reliably detect it.

Expected input

OccuPy expects an input map that has not been solvent-flattened (there should be some solvent somewhere in the map, where more is better). OccuPy may also work poorly where the map has been post-processed or altered by machine-learning, sharpening, or manual alterations. It has been designed to work in a classification setting, and
as such does not require half-maps, an accurate resolution estimate, or solvent mask. It will likely benefit if you are able to supply these things, but does not need it.

Installation

Regardless if you are using conda or not, OccuPy can be installed from the [Python Package Index](https://pypi. org/project/OccuPy/) (PyPI) using pip

pip install occupy

If you are a developer or prefer to download the source code for some other reason, you can also install from the cloned repo

$ git clone https://github.com/bforsbe/OccuPy.git
$ cd occupy 
$ pip install -e . 

Usage

OccuPy is a command-line tool

$ occupy --version
OccuPy: 0.1.5rc4.dev1+gfa0f2e9.d20220905

For development use, you can also import it as a python module.

Examples of use

Trying it out

For easy testing, there is a handy flag to dowload, unzip and use entries from the Electron Microscopy Data Bank ( EMDB). Simply provide the number of the entry to --emdb and hang back. Subsequent use will not re-download the file unless you renamed or moved it.

$ occupy --emdb 3061             
Fetching emd_3061.map.gz
100% [........................................................................] 21796317 / 21796317
 Done fetching emd_3061.map.gz
Unzipping emd_3061.map.gz
Done unzipping

Here are a few entries that might be interesting to test OccuPy

Entry Sample Box-size
3061 gamma-secretase 180
30185 F-actin 320
33437 RNA polymerase II + nucleosome 240

Estimating and modifying local map scale

In its basic form, OccuPy simply estimates the map scale, writes it out along with a chimeraX-command script to visualise the results easily

$ occupy -i map.mrc 
$ ls  
map.mrc    scale_res_map.mrc    chimX_map.cxc    log_map.txt

The output scale here includes effects due to resolution, and is thus named as such. A log file is written with additional details.

To modify all confident partial scale regions (local partial occupancy), use --amplify and/or --attenuate along with --gamma as described here. Because the input is modified and not just estimated, there is now additional output map(s).

$ occupy -i map.mrc  --amplify --gamma 4 
$ ls  
map.mrc    scale_occ_map.mrc    ampl_4.0_map.mrc    chimX_map.cxc    log_map.txt

Because modification was requested, the occupy has tried to remove resolution-dependent factors withing the local scale (which should not be changed, to make it a better approximation of occupancy (which should be modified). Hence, the name of the file reflects this.

To supress (flatten) solvent content use --exclude-solvent

$ occupy -i map.mrc --exclude-solvent 
$ ls  
map.mrc    scale_res_map.mrc    solExcl_map.mrc    chimX_map.cxc    log_map.txt

These can also be combined, of course

$ occupy -i map.mrc --exclude-solvent --attenuate --amplify --gamma 4
$ ls  
map.mrc    scale_occ_map.mrc    ampl_4.0_solExcl_map.mrc   attn_4.0_solExcl_map.mrc    chimX_map.cxc    log_map.txt

Visualising the local scale

The easiest method of visualizing the estimated local scale is to use the chimeraX command script output by OccuPy. This will

  1. Color the input (and any output) map by the estimated scale
  2. Provide a color key
  3. Provide two useful commands within the chimeraX-session:
    1. scale_color <color this map> <by this value> to color one map by the values of another according to the color key. To re-color according to scale, use scale_color <map> #2 since the .cxc always defines the scale estimate as #2. This is useful after introducing clipping planes.
    2. occupy_level <level> to set the input and output maps on the same level for easy comparison of how modification affected the input map.

Troubleshooting

Finding more information

For brief information regarding input options, please use

$ occupy --help 

For extensive information regarding input options, please use

$ occupy --help-all 

The modified map is similar to the input map

  1. The modification is effected by the power given to --gamma , where values larger than 1 mean to modify. Larger values mean to modify more, and typically values between 2 and 10 are useful.
  2. The modification is suppressed if the estimated solvent model decreases confidence in partial occupancies. If there isn't enough solvent for the fitting of the solvent model, it will typically be too wide and prevent modification of lower-scale components. You can check this by using the --plot option and inspecting the output. You can also use --solvent-def <mask.mrc> where the mask is a conventional solvent-mask. This will allow these regions to be omitted during solvent fitting. This mask does not need to be perfect, and does not limit the modification to areas inside it.

The estimated scale is just 1 everywhere

  1. Well you might have a rock of a complex.
  2. If you set the value of --tau/-t manually low, then this is the reason. If it got auto-calculated low, you can increase it, but this is a bad idea. It is rather better to increase the --kernel so that the value of tau is automatically increased. You may also waent to increase --lowpass/-lp ot increase the number of sampled pixel nv. You can see what paramters were calculated and used by checking the output log file or using the --verbose option. Increasing the kernel size and/or lowpass setting permits more confident sampling of the local scale, but does increase the granularity. Usually a kernel size of 5 or 7 is adequate, with nv-values in the range 30-100. Low-pass defaults to 8 or 3*pixel-size, which ever is larger, but depending on resolution and pixel-size this may be adjusted depending on the sought granularity.
  3. OccuPy puts everything on a scale based on what it estimates as "full" through a non-exhaustive search. It is non-exhaustive because it's much faster. If the "full" scale is under-estimated, lots of regions will be "over-full", i.e. over-estimated as full. YOu can reduce the --tile-size from its default value 12 to reduce the area of what defines "full" scale, which will narrow the definition and in general increase the value of full occupancy, thus stretching the range of the estimated scale. For high-resolution maps, a very small tile-size makes the local scale approximate the mass of individual atoms.

There is a sphere of noise surrounding the amplified map

  1. If the confidence is over-estimated, low-scale components will be permitted to be amplified. You can hedge the confidence by using --hedge-confidence <val>, where <val> is a power, meaning that higher values hedge more. 10 is a reasonable value to try.
  2. Another possible reason for the confidence being over-estimated is that the solvent model mean and/or variance is under-estimated. A typical reason for this is that the solvent has been flattened, such that the solvent is not gaussian. OccuPy was not designed for this type of reconstruction, since such flattening is typically enforced using a mask which has thus already delineated solvent vs non-solvent.
  3. If the map is not solvent-flattened, and confidence-hedging does not alleviate solvent-amplification surrounding the main map component, use --solvent-def <mask.mrc> where the mask is a conventional solvent-mask. This will allow these regions to be omitted during solvent fitting. This mask does not need to be perfect, and does not limit the modification to areas inside it.

The estimated scale looks like my local resolution

OccuPy estimates the scale. The scale decreases due to both lower resolution and lower occupancy. Since it is not possible to trivially separate these factors, the current approach to estimate occupancy as separate from lower resolution is to low-pass filter the input before estimating the local scale. This is turned on by default when amplifying or attenuating the map, to minimize over-amplification of low-scale components that are simply low resolution. If one is just estimating scale but wants to reduce resolution-dependent effects, the low-pass filtration before scale-estimation can be activated by either --lp-scale or --occupancy. It should then be combined with --lowpass/-lp or --resolution/-r to specify the worst resolution among the components for which occupancy is to be estimated. For membrane proteins, see here.

In the absence of variable occupancy, local scale does actually approximate the local resolution. If you would like to include resolution-dependent factors during amplification or attenuation (which is not recommended), you can do so by using --raw_scale, which does the opposite of --lp-scale.

My membrane or detergent looks funny

Membranes are lower in resolution due to their amorphous nature which cannot be coherently averaged. Because this reduces the local scale, membranes are estimated at low scale. The estimated scale of membranes are thus not a measure of relative density or occupancy. The low-pass filtration intended to reduce influence of resolution-dependent scale factors is only an approximate measure, so that the precise meaning of the local scale of membrane and another amorphous regions (as estimated by OccuPy) is not well-defined. The lower scale generally reflects intuition however, and permits weak attenuation to de-emphasize these regions to make visualization easier.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

OccuPy-0.1.7rc10.tar.gz (387.5 kB view details)

Uploaded Source

Built Distribution

OccuPy-0.1.7rc10-py2.py3-none-any.whl (72.0 kB view details)

Uploaded Python 2 Python 3

File details

Details for the file OccuPy-0.1.7rc10.tar.gz.

File metadata

  • Download URL: OccuPy-0.1.7rc10.tar.gz
  • Upload date:
  • Size: 387.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.7

File hashes

Hashes for OccuPy-0.1.7rc10.tar.gz
Algorithm Hash digest
SHA256 db4e7c7f14c1f76bb6dd104019e45adfbaa9d4370db6c09183851745ff72f7b4
MD5 f809572795fd5e7f7db426d92bbb095f
BLAKE2b-256 f7013c414b7ae09faa2fd3eb4ef43a9b18bb569ffe0ce4c7f4568ee879261e06

See more details on using hashes here.

File details

Details for the file OccuPy-0.1.7rc10-py2.py3-none-any.whl.

File metadata

  • Download URL: OccuPy-0.1.7rc10-py2.py3-none-any.whl
  • Upload date:
  • Size: 72.0 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.7

File hashes

Hashes for OccuPy-0.1.7rc10-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 5e460e01e686158c8669c1d1b684a99ab6fed28ee87c48fe1fe8d30eef80bd68
MD5 9fbb26a362b4878f6041eacd99cc4094
BLAKE2b-256 537f3ff90b64bd8e580b9fba2761037465c7c5a782f959c080ea3f708442fdf4

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page