Module potentials_from_particle_insertion.multicomponent

This file contains codes for multicomponent compatible versions of functions, i.e. for more than one 'type' of particle.


Copyright (c) 2024 Maarten Bransen

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


Functions

def rdf_dist_hist_2d(coordinates, handle_edge='rectangle', **kwargs)

calculates g(r) via a 'conventional' distance histogram method for a number of multi-component 2D coordinate sets.

Parameters

coordinates : (list-like of) list-like of numpy.ndarray of floats

list-like where each item along the 0th dimension is an N×2 numpy.ndarray of particle coordinates of one of the components, with each row specifying a particle position in the form [y,x]. Alternatively, a list-like of the above may be given where each item is an independent set of coordinate components (e.g. a time step from a video), in which case every set must have the same number of components but is not required to have the same number of particles and is assumed to share the same boundaries when no boundaries or only a single set of boundaries are given.

For example, for a binary (two-component) system, coordinates would be a list containing two numpy.ndarrays in case of a single dataset, or a list with M lists each containing 2 arrays in case of M independent datasets.

rmin : float, optional
lower bound for the pairwise distance, left edge of 0th bin. The default is 0.
rmax : float, optional
upper bound for the pairwise distance, right edge of last bin. The default is 10.
dr : float, optional
bin width for the pairwise distance bins. The default is (rmax-rmin)/20
handle_edge : one of ['none','rectangle','periodic rectangle'], optional

specifies how to correct for edge effects in the radial distribution function. These edge effects occur due to particles closer than rmax to any of the boundaries missing part of their neighbour shells in the dataset. The following options for correcting for this are available:

  • 'none' or None or False (all equivalent): do not correct for edge effects at all. Note that in order to calculate the particle density, cuboidal boundaries are assumed even when boundary is not specified. This can be overridden by explicitely giving the particle density using the density parameter.
  • 'rectangle': correct for the missing volume in a square or rectangular boundary.
  • 'periodic rectangle': like 'rectangle', except in periodic boundary conditions (i.e. one side wraps around to the other), based on ref [1].

The default is 'rectangle'.

boundary : array-like, optional

positions of the walls that define the bounding box of the coordinates. The format must be given as ((ymin,ymax),(xmin,xmax)) when handle_edge is 'none', 'rectangle' or 'periodic rectangle'. All components must share the same boundary.

If all coordinate sets share the same boundary, a single such boundary can be given, otherwise a list of such array-likes of the same length as coordinates can be given for specifying boundaries of each set in coordinates separately. The default is None which determines the smallest set of boundaries encompassing a set of coordinates.

density : list of float, optional
number density of particles in the box for each of the components to use for normalizing the g(r) values. The default is the average density based on coordinates and boundary.
combinations : list of tuple of int
list of different combinations of the two components to calculate the g(r) for, where the first element is the integer index of the component to use for the central reference particles and the second is the index of the component to bincount as neighbouring particles around the reference particles. The default is all possible combinations of n components without swapping order, i.e. (1,0) but not (0,1).
quiet : bool, optional
if True, no output is printed to the terminal by this function call. The default is False.
neighbors_upper_bound : int, optional
upper limit on the number of neighbors expected within rmax from a particle. Useful for reducing memory consumption in datasets with dimensions much larger than rmax. The default is None, which takes the total number of particles in the coordinate set as upper limit.
workers : int, optional
number of workers to use for parallel processing during the neighbour detection step. If -1 is given all CPU threads are used. Note: this is ignored when handle_edge='periodic rectangle'. The default is 1.

Returns

rvals : numpy.array
bin-edges of the radial distribution function.
bincounts : list of numpy.array
array of values for the bins of the radial distribution function for each item in combinations (in that order)
combinations : tuple:
pairs of indices in coordinates defining the order of the different combinations of components, so ((0,0), (1,0)) means the first item in bincounts is the rdf of the first component in coordinates with itself, while the second index in bincouts gives the rdf with shells of the first component around the second central reference particles of the second component.

References

[1] Markus Seserno (2014). How to calculate a three-dimensional g(r) under periodic boundary conditions. https://www.cmu.edu/biolphys/deserno/pdf/gr_periodic.pdf

def rdf_dist_hist_3d(coordinates, handle_edge='cuboid', **kwargs)

calculates g(r) via a 'conventional' distance histogram method for a number of multi-component 3D coordinate sets.

Parameters

coordinates : (list-like of) list-like of numpy.ndarray of floats

list-like where each item along the 0th dimension is an N×3 numpy.ndarray of particle coordinates of one of the components, with each row specifying a particle position in the form [z,y,x]. Alternatively, a list-like of the above may be given where each item is an independent set of coordinate components (e.g. a time step from a series of z-stacks), in which case every set must have the same number of components but is not required to have the same number of particles and is assumed to share the same boundaries when no boundaries or only a single set of boundaries are given.

For example, for a binary (two-component) system, coordinates would be a list containing two numpy.ndarrays in case of a single dataset, or a list with M lists each containing 2 arrays in case of M independent datasets.

rmin : float, optional
lower bound for the pairwise distance, left edge of 0th bin. The default is 0.
rmax : float, optional
upper bound for the pairwise distance, right edge of last bin. The default is 10.
dr : float, optional
bin width for the pairwise distance bins. The default is (rmax-rmin)/20
handle_edge : one of ['none','cuboid','periodic cuboid'], optional

specifies how to correct for edge effects in the radial distribution function. These edge effects occur due to particles closer than rmax to any of the boundaries missing part of their neighbour shells in the dataset. The following options for correcting for this are available:

  • 'none' or None or False (all equivalent): do not correct for edge effects at all. Note that in order to calculate the particle density, cuboidal boundaries are assumed even when boundary is not specified. This can be overridden by explicitely giving the particle density using density.
  • 'cuboid': correct for the missing volume in cuboidal boundary conditions, e.g. a 3D rectangular box with right angles. Based on ref. [1]
  • 'periodic cuboid': like 'cuboid', except in periodic boundary conditions (i.e. one side wraps around to the other). Based on ref. [2].

The default is 'cuboid'.

boundary : array-like, optional

positions of the walls that define the bounding box of the coordinates. The format must be given as ((zmin,zmax),(ymin,ymax),(xmin,xmax)) when handle_edge is 'none', 'cuboid' or 'periodic cuboid'. All components must share the same boundary.

If all coordinate sets share the same boundary a single such boundary can be given, otherwise a list of such array-likes of the same length as coordinates can be given for specifying boundaries of each set in coordinates separately. The default is None which determines the smallest set of boundaries encompassing a set of coordinates.

density : list of float, optional
number density of particles in the box for each of the components to use for normalizing the g(r) values. The default is the average density based on coordinates and boundary.
combinations : list of tuple of int
list of different combinations of the two components to calculate the g(r) for, where the first element is the integer index of the component to use for the central reference particles and the second is the index of the component to bincount as neighbouring particles around the reference particles. The default is all possible combinations of n components without swapping order, i.e. (1,0) but not (0,1).
quiet : bool, optional
if True, no output is printed to the terminal by this function call. The default is False.
neighbors_upper_bound : int, optional
upper limit on the number of neighbors expected within rmax from a particle. Useful for reducing memory consumption in datasets with dimensions much larger than rmax. The default is None, which takes the total number of particles in the coordinate set as upper limit.
workers : int, optional
number of workers to use for parallel processing during the neighbour detection step. If -1 is given all CPU threads are used. Note: this is ignored when handle_edge='periodic cuboid'. The default is 1.

Returns

rvals : numpy.array
bin-edges of the radial distribution function.
bincounts : list of numpy.array
array of values for the bins of the radial distribution function for each item in combinations (in that order)
combinations : tuple:
pairs of indices in coordinates defining the order of the different combinations of components, so ((0,0), (1,0)) means the first item in bincounts is the rdf of the first component in coordinates with itself, while the second index in bincouts gives the rdf with shells of the first component around the second central reference particles of the second component.

References

[1] Kopera, B. A. F., & Retsch, M. (2018). Computing the 3D Radial Distribution Function from Particle Positions: An Advanced Analytic Approach. Analytical Chemistry, 90(23), 13909–13914. https://doi.org/10.1021/acs.analchem.8b03157

[2] Markus Seserno (2014). How to calculate a three-dimensional g(r) under periodic boundary conditions. https://www.cmu.edu/biolphys/deserno/pdf/gr_periodic.pdf

def rdf_insertion_binned_2d(coordinates, pairpotential, handle_edge='rectangle', **kwargs)

Calculate g(r) from insertion of test-particles into sets of existing 2D coordinates, averaged over bins of width dr, and based on the pairwise interaction potential u(r) (in units of kT) for a number of multi-component 2D coordinate sets.

Parameters

coordinates : (list-like of) list-like of numpy.ndarray of floats

list-like where each item along the 0th dimension is an N×3 numpy.ndarray of particle coordinates of one of the components, with each row specifying a particle position in the form [z,y,x]. Alternatively, a list-like of the above may be given where each item is an independent set of coordinate components (e.g. a time step from a series of z-stacks), in which case every set must have the same number of components but is not required to have the same number of particles and is assumed to share the same boundaries when no boundaries or only a single set of boundaries are given.

For example, for a binary (two-component) system, coordinates would be a list containing two numpy.ndarrays in case of a single dataset, or a list with M lists each containing 2 arrays in case of M independent datasets.

pairpotential : iterable
list of values for the pairwise interaction potential. Must have length of len(pairpotential_binedges)-1 and be in units of thermal energy kT
rmin : float, optional
lower bound for the pairwise distance, left edge of 0th bin. The default is 0.
rmax : float, optional
upper bound for the pairwise distance, right edge of last bin. The default is 10.
dr : float, optional
bin width for the pairwise distance bins. The default is (rmax-rmin)/20
handle_edge : one of ['none','rectangle','periodic rectangle','circle', (list of) callable], optional

specifies how to correct for edge effects in the radial distribution function. These edge effects occur due to particles closer than rmax to any of the boundaries missing part of their neighbour shells in the dataset. The following options for correcting for this are available:

  • 'rectangle': correct for the missing volume in a square or rectangular boundary.
  • 'periodic' rectangle': like 'rectangle', except in periodic boundary conditions (i.e. one side wraps around to the other), based on ref [1].
  • 'circle': correct for missing volume in a spherical boundary.
  • a custom callable function (or list thereof) can be given to correct for arbitrary and/or mixed boundary conditions. This function must take three arguments: a numpy array of bin edges, an N×2 numpy array of coordinates and a boundary as specified in boundary, and return an N × len(bin edges)-1 numpy array with a value between 0 and 1 specifying the fraction of the volume of each spherical shell that is within the boundary.

The default is 'rectangle'.

boundary : array-like, optional
positions of the walls that define the bounding box of the coordinates, given as ((ymin,ymax),(xmin,xmax)) if all coordinate sets share the same boundary, or a list of such array-likes of the same length as coordinates for specifying boundaries of each set in coordinates separately. The default is the min and max values in the dataset along each dimension.
combinations : list of tuple of int
list of different combinations of the two components to calculate the g(r) for, where the first element is the integer index of the component to use for the central reference particles and the second is the index of the component to bincount as neighbouring particles around the reference particles. The default is all possible combinations of n components without swapping order, i.e. (1,0) but not (0,1).
n_ins : int, optional
the number of test-particles to insert into each item in coordinates. The default is 1000.
insert_grid : bool, optional
Wheter to insert the coordinates on an evenly spaced regular grid. The default is False which inserts on uniformly distributed pseudorandom coordinates.
ins_coords : numpy.ndarray, optional
set of insertion coordinates, overrides n_ins. The default is None.
avoid_boundary : bool, optional
if True, all test-particles are inserted at least rmax away from any of the surfaces defined in boundary to avoid effects of the finite volume of the bounding box. The default is False, which uses an analytical correction factor for missing volume of test-particles near the boundaries.
avoid_coordinates : bool, optional
whether to insert test-particles at least rmin away from the center of any of the 'real' coordinates in coordinates. The default is False.
pairpotential_binedges : iterable, optional
bin edges corresponding to the values in `pairpotential. The default is None, which uses the bins defined by rmin, rmax and dr.
interpolate : bool, optional
whether to use linear interpolation for calculating the interaction of two particles using the values in pairpotential. The default is True. If False, the nearest bin value is used.
neighbors_upper_bound : int, optional
upper limit on the number of neighbors expected within rmax from a particle. Useful for reducing memory consumption in datasets with dimensions much larger than rmax. The default is None, which takes the total number of particles in the coordinate set as upper limit.
workers : int, optional
number of workers to use for parallel processing during the neighbour detection step. If -1 is given all CPU threads are used. Note: this is ignored when handle_edge='periodic rectangle'. The default is 1.

Returns

rvals : numpy.array
bin-edges of the radial distribution function.
pair_correlation : numpy.array
values for the rdf / pair correlation function in each bin
counter : numpy.array
number of pair counts that contributed to the (mean) values in each bin
combinations : tuple:
pairs of indices in coordinates defining the order of the different combinations of components, so ((0,0), (1,0)) means the first item in bincounts is the rdf of the first component in coordinates with itself, while the second index in bincouts gives the rdf with shells of the first component around the second central reference particles of the second component.

References

[1] Markus Seserno (2014). How to calculate a three-dimensional g(r) under periodic boundary conditions. https://www.cmu.edu/biolphys/deserno/pdf/gr_periodic.pdf

def rdf_insertion_binned_3d(coordinates, pairpotential, handle_edge='cuboid', **kwargs)

Calculate g(r) from insertion of test-particles into sets of existing 3D coordinates, averaged over bins of width dr, and based on the pairwise interaction potential u(r) (in units of kT) for a number of multi-component 3D coordinate sets.

Parameters

coordinates : list-like of numpy.array
List of sets of coordinates, where each item along the 0th dimension is a n*3 numpy.array of particle coordinates, where each array is an independent set of coordinates (e.g. one z-stack, a time step from a video, etc.), with each element of the array of form [z,y,x]. Each set of coordinates is not required to have the same number of particles but all coordinates must be within the bounding box(es) given in boundary.
pairpotential : iterable or callable
list of values for the pairwise interaction potential. Must have length of len(pairpotential_binedges)-1 and be in units of thermal energy kT. Alternatively, any callable (function) which takes only a numpy array of pairwise distances and returns the pairpotential can be given.
rmin : float, optional
lower bound for the pairwise distance, left edge of 0th bin. The default is 0.
rmax : float
cut-off radius for the pairwise distance (right edge of last bin).
dr : float
bin width of the pairwise distance bins.
handle_edge : one of ['none','cuboid','periodic cuboid','sphere', (list of) callable], optional

specifies how to correct for edge effects in the radial distribution function. These edge effects occur due to particles closer than rmax to any of the boundaries missing part of their neighbour shells in the dataset. The following options for correcting for this are available:

  • 'none' or None or False (all equivalent): do not correct for edge effects at all. Note that in order to calculate the particle density, cuboidal boundaries are assumed even when boundary is not specified. This can be overridden by explicitely giving the particle density using density.
  • 'cuboid': correct for the missing volume in cuboidal boundary conditions, e.g. a 3D rectangular box with right angles. Based on ref. [1]
  • 'periodic' cuboid': like 'cuboid', except in periodic boundary conditions (i.e. one side wraps around to the other). Based on ref. [2].
  • 'sphere': correct for missing volume in spherical boundary conditions
  • a custom callable function (or list thereof) can be given to correct for arbitrary and/or mixed boundary conditions. This function must take three arguments: a numpy array of bin edges, an N×3 numpy array of coordinates and a boundary as specified in boundary, and return an N × len(bin edges)-1 numpy array with a value between 0 and 1 specifying the fraction of the volume of each spherical shell that is within the boundary.

The default is 'cuboid'.

boundary : array-like of form ((zmin,zmax),(ymin,ymax),(xmin,xmax))``
Positions of the walls that define the bounding box of the coordinates, given as a single array-like for a shared set of boundaries for all coordinates, or an list-like of such array-likes with the same length as coordinates for a separate set of boundaries for each.
combinations : list of tuple of int
list of different combinations of the two components to calculate the g(r) for, where the first element is the integer index of the component to use for the central reference particles and the second is the index of the component to bincount as neighbouring particles around the reference particles. The default is all possible combinations of n components without swapping order, i.e. (1,0) but not (0,1).
n_ins : int, optional
the number of test-particles to insert into each item in coordinates. The default is 1000.
insert_grid : bool, optional
Wheter to insert the coordinates on an evenly spaced regular grid. The default is False which inserts on uniformly distributed pseudorandom coordinates.
ins_coords : numpy.ndarray, optional
set of insertion coordinates, overrides n_ins. The default is None.
avoid_boundary : bool, optional
if True, all test-particles are inserted at least rmax away from any of the surfaces defined in boundary to avoid effects of the finite volume of the bounding box. The default is False, which uses an analytical correction factor for missing volume of test-particles near the boundaries.
avoid_coordinates : bool, optional
whether to insert test-particles at least rmin away from the center of any of the 'real' coordinates in coordinates. The default is False.
pairpotential_binedges : iterable, optional
bin edges corresponding to the values in `pairpotential. The default is None, which uses the bins defined by rmin, rmax and dr.
interpolate : bool, optional
whether to use linear interpolation for calculating the interaction of two particles using the values in pairpotential. The default is True. If False, the nearest bin value is used.
neighbors_upper_bound : int, optional
upper limit on the number of neighbors expected within rmax from a particle. Useful for reducing memory consumption in datasets with dimensions much larger than rmax. The default is None, which takes the total number of particles in the coordinate set as upper limit.
workers : int, optional
number of workers to use for parallel processing during the neighbour detection step. If -1 is given all CPU threads are used. Note: this is ignored when handle_edge='periodic rectangle'. The default is 1.

Returns

rvals : numpy.array
bin-edges of the radial distribution function.
pair_correlation : numpy.array
values for the rdf / pair correlation function in each bin
counter : numpy.array
number of pair counts that contributed to the (mean) values in each bin
combinations : tuple:
pairs of indices in coordinates defining the order of the different combinations of components, so ((0,0), (1,0)) means the first item in bincounts is the rdf of the first component in coordinates with itself, while the second index in bincouts gives the rdf with shells of the first component around the second central reference particles of the second component.

References

[1] Kopera, B. A. F., & Retsch, M. (2018). Computing the 3D Radial Distribution Function from Particle Positions: An Advanced Analytic Approach. Analytical Chemistry, 90(23), 13909–13914. https://doi.org/10.1021/acs.analchem.8b03157

[2] Markus Seserno (2014). How to calculate a three-dimensional g(r) under periodic boundary conditions. https://www.cmu.edu/biolphys/deserno/pdf/gr_periodic.pdf

def run_iteration(coordinates, pair_correlation_func, rmin=0, rmax=10, dr=None, initial_guess=None, max_iterations=100, convergence_tol=1e-05, zero_clip=1e-20, regulate=False, **kwargs)

Run the algorithm to solve for the pairwise potential that most accurately reproduces the radial distribution function using test-particle insertion, as described in ref. [1].

Parameters

coordinates : list-like of numpy.array
List of sets of coordinates, where each item along the 0th dimension is a n*3 numpy.array of particle coordinates, where each array is an independent set of coordinates (e.g. one z-stack, a time step from a video, etc.), with each element of the array of form [z,y,x] or [y,x] in case of 2D data. Each set of coordinates is not required to have the same number of particles but all stacks must share the same bounding box as given by boundary, and all coordinates must be within this bounding box.
pair_correlation_func : list of float
bin values for the true pair correlation function that the algorithm will try to match iteratively.
rmin : float, optional
left edge of the smallest bin in interparticle distance r to consider. The default is 0.
rmax : float, optional
Right edge of the largest bin in interparticle distance r to consider. The default is 20.
dr : float, optional
Stepsize or bin width in interparticle distance r. The default is 0.5.
initial_guess : list of float, optional
Initial guess for the particle potential on the 0th iteration. The default is None which gives 0 in each bin.
max_iterations : int, optional
Maximum number of iterations after which the algorithm is ended. The default is 100.
convergence_tol : float, optional
target value for χ², if it dips below this value the iteration is considered to be converged and ended. The default is 1e-5.
zero_clip : float, optional
values below the value of zero-clip are set to this value to avoid devision by zero errors. The default is 1e-20.
regulate : bool, optional
if True, use regularization to more gently nudge towards the input g(r) at the cost of slower convergence. Experimental option. The default is False.
**kwargs : key=value
Additional keyword arguments are passed on to rdf_insertion_binned_2d() or rdf_insertion_binned_3d()

Returns

χ² : list of float
summed squared error in the pair correlation function for each iteration
pairpotential : list of list of float
the values for the pair potential in each bin for each iteration
paircorrelation : list of list of float
the values for the pair correlation function from test-particle insertion for each iteration
counts : list of list of int
number of pair counts contributing to each bin in each iteration
combinations : tuple:
pairs of indices in coordinates defining the order of the different combinations of components, so ((0,0), (1,0)) means the first item in bincounts is the rdf of the first component in coordinates with itself, while the second index in bincouts gives the rdf with shells of the first component around the second central reference particles of the second component.

References

[1] Stones, A. E., Dullens, R. P. A., & Aarts, D. G. A. L. (2019). Model- Free Measurement of the Pair Potential in Colloidal Fluids Using Optical Microscopy. Physical Review Letters, 123(9), 098002. https://doi.org/10.1103/PhysRevLett.123.098002

See Also

rdf_insertion_binned_2d()
2D routine for g(r) from test-particle insertion
rdf_insertion_binned_3d()
3D routine for g(r) from test-particle insertion