我已经使用NumPy和Numba的JIT在Python中实现了2D Ising模型:


from timeit import default_timer as timer
import matplotlib.pyplot as plt
import numba as nb
import numpy as np

# TODO for Dict optimization.
# from numba import types
# from numba.typed import Dict

@nb.njit(nogil=True)
def initialstate(N):   
    ''' 
    Generates a random spin configuration for initial condition
    '''
    state = np.empty((N,N),dtype=np.int8)
    for i in range(N):
        for j in range(N):
            state[i,j] = 2*np.random.randint(2)-1
    return state



@nb.njit(nogil=True)
def mcmove(lattice, beta, N):
    '''
    Monte Carlo move using Metropolis algorithm 
    '''
    
    # # TODO* Dict optimization 
    # dict_param = Dict.empty(
    # key_type=types.int64,
    # value_type=types.float64,
    # )
    # dict_param = {cost : np.exp(-cost*beta) for cost in [-8, -4, 0, 4, 8] }

    for _ in range(N):
        for __ in range(N):
                a = np.random.randint(0, N)
                b = np.random.randint(0, N)
                s =  lattice[a, b]
                dE = lattice[(a+1)%N,b] + lattice[a,(b+1)%N] + lattice[(a-1)%N,b] + lattice[a,(b-1)%N]
                cost = 2*s*dE

                if cost < 0:
                    s *= -1
                
                #TODO* elif np.random.rand() < dict_param[cost]:
                elif np.random.rand() < np.exp(-cost*beta):
                    s *= -1
                lattice[a, b] = s

    return lattice


@nb.njit(nogil=True)
def calcEnergy(lattice, N):
    '''
    Energy of a given configuration
    '''
    energy = 0 
    
    for i in range(len(lattice)):
        for j in range(len(lattice)):
            S = lattice[i,j]
            nb = lattice[(i+1)%N, j] + lattice[i,(j+1)%N] + lattice[(i-1)%N, j] + lattice[i,(j-1)%N]
            energy += -nb*S
    return energy/2


@nb.njit(nogil=True)
def calcMag(lattice):
    '''
    Magnetization of a given configuration
    '''
    mag = np.sum(lattice, dtype=np.int32)
    return mag

@nb.njit(nogil=True)
def ISING_model(nT, N, burnin, mcSteps):

    """ 
    nT      :         Number of temperature points.
    N       :         Size of the lattice, N x N.
    burnin  :         Number of MC sweeps for equilibration (Burn-in).
    mcSteps :         Number of MC sweeps for calculation.

    """


    T       = np.linspace(1.2, 3.8, nT); 
    E,M,C,X = np.zeros(nT), np.zeros(nT), np.zeros(nT), np.zeros(nT)
    n1, n2  = 1.0/(mcSteps*N*N), 1.0/(mcSteps*mcSteps*N*N) 


    for temperature in range(nT):
        lattice = initialstate(N)         # initialise

        E1 = M1 = E2 = M2 = 0
        iT = 1/T[temperature]
        iT2= iT*iT
        
        for _ in range(burnin):           # equilibrate
            mcmove(lattice, iT, N)        # Monte Carlo moves

        for _ in range(mcSteps):
            mcmove(lattice, iT, N)           
            Ene = calcEnergy(lattice, N)  # calculate the Energy
            Mag = calcMag(lattice,)       # calculate the Magnetisation

            E1 += Ene
            M1 += Mag
            M2 += Mag*Mag 
            E2 += Ene*Ene

        E[temperature] = n1*E1
        M[temperature] = n1*M1
        C[temperature] = (n1*E2 - n2*E1*E1)*iT2
        X[temperature] = (n1*M2 - n2*M1*M1)*iT

    return T,E,M,C,X


def main():
    
    N = 32
    start_time = timer()
    T,E,M,C,X = ISING_model(nT = 64, N = N, burnin = 8 * 10**4, mcSteps = 8 * 10**4)
    end_time = timer()

    print("Elapsed time: %g seconds" % (end_time - start_time))

    f = plt.figure(figsize=(18, 10)); #  

    # figure title
    f.suptitle(f"Ising Model: 2D Lattice\nSize: {N}x{N}", fontsize=20)

    _ =  f.add_subplot(2, 2, 1 )
    plt.plot(T, E, '-o', color='Blue') 
    plt.xlabel("Temperature (T)", fontsize=20)
    plt.ylabel("Energy ", fontsize=20)
    plt.axis('tight')


    _ =  f.add_subplot(2, 2, 2 )
    plt.plot(T, abs(M), '-o', color='Red')
    plt.xlabel("Temperature (T)", fontsize=20)
    plt.ylabel("Magnetization ", fontsize=20)
    plt.axis('tight')


    _ =  f.add_subplot(2, 2, 3 )
    plt.plot(T, C, '-o', color='Green')
    plt.xlabel("Temperature (T)", fontsize=20)
    plt.ylabel("Specific Heat ", fontsize=20)
    plt.axis('tight')


    _ =  f.add_subplot(2, 2, 4 )
    plt.plot(T, X, '-o', color='Black')
    plt.xlabel("Temperature (T)", fontsize=20)
    plt.ylabel("Susceptibility", fontsize=20)
    plt.axis('tight')

    plt.show()

if __name__ == '__main__':
    main()

当然,这是可行的:

enter image description here

我有两个主要问题:

  1. 还有什么需要优化的吗?我知道伊辛模型很难模拟,但看下表,我似乎遗漏了一些东西…
    lattice size : 32x32    
    burnin = 8 * 10**4
    mcSteps = 8 * 10**4
    Simulation time = 365.98 seconds

    lattice size : 64x64    
    burnin = 10**5
    mcSteps = 10**5
    Simulation time = 1869.58 seconds
  1. 我try 了另一个基于不使用字典一遍又一遍地计算指数的优化,但在我的测试中,它似乎更慢.我做错了什么?

推荐答案

指数的计算实际上并不是一个问题.主要的问题是会产生generating random numbers is expensive个和大量的随机值.另一个问题是,当前的计算本质上是连续的.

事实上,对于N=32,mcmove往往会生成大约3000个随机值,这个函数在每次迭代中被称为2*80_000次.这意味着,每次迭代会生成2 * 80_000 * 3000 = 480_000_000个随机数.假设生成随机数需要大约5纳秒(即,在4 GHz的CPU上只有20个周期),那么每次迭代将只需要大约2.5秒来生成所有的随机数.在我的4.5 GHz i5-9600KF CPU上,每次迭代大约需要2.5-3.0秒.

首先要做的是try 使用一种更快的方法生成随机数.坏消息是,这在Numba中很难做到,更广泛地说,在任何基于Python的代码中都很难做到.使用低级语言(如C或C++)进行微优化可以极大地帮助加快计算速度.这种低级微优化在高级语言/工具(包括Numba)中是不可能的.尽管如此,人们仍然可以实现random-number generator (RNG) specifically designed so to produce the random values you need.xoshiro256**可以用来快速生成随机数,尽管它可能不像Numpy/Numba产生的那样随机(没有免费的午餐).其思想是生成64位整数并提取位范围,从而产生2个16位整数和一个32位浮点值.在现代的CPU上,这个RNG应该能够在大约10个周期内生成3个值!

一旦应用了这种优化,指数的计算就成为新的瓶颈.它可以像您一样使用lookup table(LUT)进行改进.然而,使用词典的速度很慢.你可以为此付use a basic array英镑.这要快得多.请注意,该指数需要为正数且较小.因此,需要添加最低成本.

一旦实现了前面的优化,新的瓶颈就是条件if cost < 0elif c < ....The conditionals are slow because they are unpredictable(由于结果是随机的).事实上,现代CPU试图预测条件条件的结果,以避免CPU流水线中昂贵的停滞.这是一个复杂的话题.如果你想更多地了解这一点,那么请阅读this great post.在实践中,使用branchless次计算可以避免这样的问题.这意味着您需要使用二元运算符和整数棒,以便s的符号相对于条件的值进行更改.例如:s *= 1 - ((cost < 0) | (c < lut[cost])) * 2.

请注意,modulus are generally expensive unless the compiler know the value at compile time.当值是2的幂时,它们甚至更快,因为编译器可以使用位技巧来计算模数(更具体地说,是通过预编译的常量进行逻辑与).对于calcEnergy,解是compute the border separately,因此完全避免了模数.此外,当编译器知道编译时的迭代次数时,循环可以更快(它可以展开循环并更好地向量化它们).此外,当N不是2的幂时,RNG在没有任何偏向的情况下实现起来可能会慢得多,也更复杂,所以我假设N是2的幂.

以下是最终代码:

# [...] Same as in the initial code

@nb.njit(inline="always")
def rol64(x, k):
    return (x << k) | (x >> (64 - k))

@nb.njit(inline="always")
def xoshiro256ss_init():
    state = np.empty(4, dtype=np.uint64)
    maxi = (np.uint64(1) << np.uint64(63)) - np.uint64(1)
    for i in range(4):
        state[i] = np.random.randint(0, maxi)
    return state

@nb.njit(inline="always")
def xoshiro256ss(state):
    result = rol64(state[1] * np.uint64(5), np.uint64(7)) * np.uint64(9)
    t = state[1] << np.uint64(17)
    state[2] ^= state[0]
    state[3] ^= state[1]
    state[1] ^= state[2]
    state[0] ^= state[3]
    state[2] ^= t
    state[3] = rol64(state[3], np.uint64(45))
    return result

@nb.njit(inline="always")
def xoshiro_gen_values(N, state):
    '''
    Produce 2 integers between 0 and N and a simple-precision floating-point number.
    N must be a power of two less than 65536. Otherwise results will be biased (ie. not random).
    N should be known at compile time so for this to be fast
    '''
    rand_bits = xoshiro256ss(state)
    a = (rand_bits >> np.uint64(32)) % N
    b = (rand_bits >> np.uint64(48)) % N
    c = np.uint32(rand_bits) * np.float32(2.3283064370807974e-10)
    return (a, b, c)

@nb.njit(nogil=True)
def mcmove_generic(lattice, beta, N):
    '''
    Monte Carlo move using Metropolis algorithm.
    N must be a small power of two and known at compile time
    '''

    state = xoshiro256ss_init()

    lut = np.full(16, np.nan)
    for cost in (0, 4, 8, 12, 16):
        lut[cost] = np.exp(-cost*beta)

    for _ in range(N):
        for __ in range(N):
            a, b, c = xoshiro_gen_values(N, state)
            s =  lattice[a, b]
            dE = lattice[(a+1)%N,b] + lattice[a,(b+1)%N] + lattice[(a-1)%N,b] + lattice[a,(b-1)%N]
            cost = 2*s*dE

            # Branchless computation of s
            tmp = (cost < 0) | (c < lut[cost])
            s *= 1 - tmp * 2

            lattice[a, b] = s

    return lattice

@nb.njit(nogil=True)
def mcmove(lattice, beta, N):
    assert N in [16, 32, 64, 128]
    if N == 16: return mcmove_generic(lattice, beta, 16)
    elif N == 32: return mcmove_generic(lattice, beta, 32)
    elif N == 64: return mcmove_generic(lattice, beta, 64)
    elif N == 128: return mcmove_generic(lattice, beta, 128)
    else: raise Exception('Not implemented')

@nb.njit(nogil=True)
def calcEnergy(lattice, N):
    '''
    Energy of a given configuration
    '''
    energy = 0 
    # Center
    for i in range(1, len(lattice)-1):
        for j in range(1, len(lattice)-1):
            S = lattice[i,j]
            nb = lattice[i+1, j] + lattice[i,j+1] + lattice[i-1, j] + lattice[i,j-1]
            energy -= nb*S
    # Border
    for i in (0, len(lattice)-1):
        for j in range(1, len(lattice)-1):
            S = lattice[i,j]
            nb = lattice[(i+1)%N, j] + lattice[i,(j+1)%N] + lattice[(i-1)%N, j] + lattice[i,(j-1)%N]
            energy -= nb*S
    for i in range(1, len(lattice)-1):
        for j in (0, len(lattice)-1):
            S = lattice[i,j]
            nb = lattice[(i+1)%N, j] + lattice[i,(j+1)%N] + lattice[(i-1)%N, j] + lattice[i,(j-1)%N]
            energy -= nb*S
    return energy/2

@nb.njit(nogil=True)
def calcMag(lattice):
    '''
    Magnetization of a given configuration
    '''
    mag = np.sum(lattice, dtype=np.int32)
    return mag

# [...] Same as in the initial code

我希望代码中没有错误.很难用不同的RNG判断结果.

结果代码在我的机器上要快得多:它在5.3秒内计算4次迭代,N=32次,而不是24.1秒.因此,计算结果是4.5 times faster

在Python中使用Numba来进一步优化代码是非常困难的.由于long dependency chain in mcmove,计算不能有效地并行化.

Python相关问答推荐

OR—Tools CP SAT条件约束

Godot:需要碰撞的对象的AdditionerBody2D或Area2D以及queue_free?

在Django admin中自动完成相关字段筛选

如何在图中标记平均点?

使用Python和文件进行模糊输出

基于Scipy插值法的三次样条系数

如何在验证文本列表时使正则表达式无序?

Python—在嵌套列表中添加相同索引的元素,然后计算平均值

简单 torch 模型测试:ModuleNotFoundError:没有名为';Ultralytics.yolo';

每次查询的流通股数量

pytest、xdist和共享生成的文件依赖项

Pandas:将值从一列移动到适当的列

仅取消堆叠最后三列

组颠倒大Pandas 数据帧

是否将列表分割为2?

如何从具有完整层次数据的Pandas框架生成图形?

Pandas查找给定时间戳之前的最后一个值

判断字典中是否有多个值对

try 第二次训练新的JAX+Equinox模型时,具有多个元素的数组的真值不明确(&Q)

更新-如何与一个我无法使用python获得的按钮进行交互-Selify?