Release notes

v0.5.0

Released on 2026-05-16

if v0.4.0 moved the k_mer=1 hot paths from pandas to numpy, this release does the same for k_mer>1. CodonCounter.count_array is now a single vectorised entry point for any k_mer in [1, 3], and every score routes through it on the hot path. count(seqs) becomes a thin formatter on top.

alongside the perf work, the package consolidates a few subclass-as-config patterns into single classes with a kwarg (WeightOptimizer(strategy=...), Permuter(scope=...)), promotes underscore-prefixed counter internals onto the documented public surface that scores depend on, and factors the str/list/ndarray dispatch shared by get_score / get_vector / get_weights into a single Score._dispatch helper.

per-call timings on a 3.6 kb sequence (200 calls, mean):

score / path                         | before                 | after    | speedup
-------------------------------------+------------------------+----------+-------------
ENC k_mer=2, bg_correction=True      | 22.9 ms                | 0.21 ms  | ~110x
CAI k_mer=2                          | 0.82 ms                | 0.05 ms  | ~16x
RSCU                                 | ~1.05 ms               | 0.017 ms | ~60x
CodonCounter.count(seq).counts k=2   | 0.20 ms                | 0.11 ms  | ~1.9x
CodonCounter.count_array(seq) k=2    | (new in v0.5.0)        | 0.040 ms | -
RCB                                  | ~0.77 ms               | 0.044 ms | ~17x
ENC k_mer=1 (default, bg_correction) | (vectorised in v0.4.0) | -        | within noise
CUFS.get_score                       | 1.02 ms                | 0.042 ms | ~24x
CUFS.get_score synonymous=True       | 1.77 ms                | 0.045 ms | ~39x

(the ~110x on ENC k_mer=2 came from replacing a python listcomp over 3,721 codon-pair strings with BNC[codon_base_idx_kmer].prod(axis=1).)
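
the replacement described in the note above can be sketched roughly as follows. this is an illustration only: it assumes BNC is a 1-D array of four background nucleotide frequencies and that codon_base_idx_kmer is an integer matrix with one row per codon pair and one column per base position (six columns for k_mer=2). the codon ordering, the example frequencies and the base_to_idx mapping are made up for the sketch and may not match the package internals.

```python
import itertools
import numpy as np

BASES = "ACGT"
STOPS = {"TAA", "TAG", "TGA"}

# 61 sense codons in lexicographic order (assumed ordering, illustration only)
codons = ["".join(c) for c in itertools.product(BASES, repeat=3)]
codons = [c for c in codons if c not in STOPS]

# all 61 * 61 = 3,721 codon pairs as concatenated 6-base strings
pairs = [a + b for a, b in itertools.product(codons, repeat=2)]

# hypothetical background nucleotide composition, one frequency per base
BNC = np.array([0.3, 0.2, 0.2, 0.3])

# integer matrix of shape (3721, 6): base index at each position of each pair;
# in a real implementation this would be built once, not on every call
base_to_idx = {b: i for i, b in enumerate(BASES)}
codon_base_idx_kmer = np.array([[base_to_idx[b] for b in p] for p in pairs])

# slow path: a python-level loop over every codon-pair string
slow = np.array([np.prod([BNC[base_to_idx[b]] for b in p]) for p in pairs])

# fast path: one fancy-indexing gather followed by a row-wise product
fast = BNC[codon_base_idx_kmer].prod(axis=1)
```

both paths compute the same expected frequency per codon pair; the fast path simply moves the loop over 3,721 strings into a single numpy expression.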

performance

  • CodonCounter.count_array generalised to k_mer in [1, 3] (#27): sliding window over codon ids, combined into a single bucket id per k-mer, bincounted into a dense aligned ndarray.

  • CodonCounter.count(seqs) becomes a thin formatter on top of count_array (#27); k-mer concat-string index built lazily via the new kmer_index property.

  • CAI and CodonPairBias k_mer>1 wired to count_array with pre-aligned log/linear weight arrays (#27), skipping the per-call pandas reindex.

  • RSCU and RCBS rewritten as stateless ndarray calls on count_array with weights pre-aligned at init (#24).

  • ENC k_mer=1 / k_mer>1 paths unified around count_array (#28); the two _calc_*_single_kmer / _calc_* methods collapse to one body driven by aa_group_kmer / codon_base_idx_kmer LUTs.

  • BaseCounter.count_array vectorised for any k_mer (#27); sliding window over base ids respects frame / step semantics.

  • geomean_array / mean_array helpers in utils for the count_array hot path (#26, #27), aligned-ndarray siblings of the pandas geomean / mean.

  • pairwise.CodonUsageFrequency rewritten on count_array: per-seq arrays stacked, normalised with np.add.at per-aa-group sums (synonymous) or a single divide (non-synonymous). count_array order replaces the prior alphabetic codon order; KL pair score is invariant under consistent reordering.
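
the sliding-window / bucket-id / bincount idea behind count_array can be sketched in a few lines. count_kmers below is a hypothetical helper, not the package function: it assumes the sequence has already been converted to a 1-D array of integer codon ids, uses overlapping windows that advance by one codon, and ignores frame, step and invalid-codon handling.

```python
import numpy as np


def count_kmers(codon_ids: np.ndarray, n_codons: int, k_mer: int) -> np.ndarray:
    """Count codon k-mers from integer codon ids (hypothetical helper)."""
    if not 1 <= k_mer <= 3:
        raise NotImplementedError("this sketch supports k_mer in [1, 3] only")
    n_kmers = len(codon_ids) - k_mer + 1
    if n_kmers <= 0:
        return np.zeros(n_codons ** k_mer, dtype=np.int64)
    # combine the k codon ids in each window into one bucket id,
    # treating the window as a k-digit number in base n_codons
    bucket = np.zeros(n_kmers, dtype=np.int64)
    for offset in range(k_mer):
        bucket = bucket * n_codons + codon_ids[offset:offset + n_kmers]
    # a single bincount call yields the dense, fixed-order count vector
    return np.bincount(bucket, minlength=n_codons ** k_mer)


# toy alphabet of 3 "codons": windows (0,1), (1,0), (0,1), (1,2)
counts = count_kmers(np.array([0, 1, 0, 1, 2]), n_codons=3, k_mer=2)
```

because the output always has n_codons ** k_mer entries in the same order, weights can be aligned to it once and reused, which is what makes a per-call reindex unnecessary.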

refactors

  • MaxWeight / MinWeight / BalancedWeight collapsed into WeightOptimizer(strategy="max"|"min"|"balanced") (#20). old names emit FutureWarning via shims; will be removed in v0.6.0.

  • IntraSeqPermuter / IntraPosPermuter collapsed into Permuter(scope="intra_seq"|"intra_pos") (#23). old names emit FutureWarning via shims; will be removed in v0.6.0.

  • counter internals promoted to the public surface that scores depend on (#22, #25, #27, #28): count_array, aa_group, n_aa, codon_index, codon_base_idx, kmer_index, aa_group_kmer, codon_base_idx_kmer. _codon_lex_to_aa renamed to _codon_lex_to_idx (it stored codon indices, not aa indices).

  • temporal coupling fixed in RelativeSynonymousCodonUsage and RelativeCodonBiasScore (#24): _calc_score no longer relies on a prior _calc_seq_weights populating self.counter.counts, which broke concurrent use.

  • Score._dispatch shared by ScalarScore.get_score, VectorScore.get_vector and WeightScore.get_weights; each base previously carried its own near-identical str/list/ndarray dispatch shell. public API unchanged.

  • pairwise.PairwiseScore.get_matrix n_jobs=1 path fixed: the sequential starmap branch was unconditionally overwritten by the next-line Pool.starmap, so n_jobs=1 silently paid fork overhead with no parallelism. output unchanged.
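
a shared dispatch helper of the kind described for Score._dispatch might look like the sketch below. only the class and method names mentioned in these notes are taken from the package; the signature of _dispatch, the list return type and the GCContent toy score are assumptions made for illustration.

```python
import numpy as np


class Score:
    def _dispatch(self, seqs, func):
        """Apply func to one sequence or to each sequence in a collection."""
        # a single sequence returns a single result
        if isinstance(seqs, str):
            return func(seqs)
        # lists, tuples and ndarrays of sequences return one result per sequence
        if isinstance(seqs, (list, tuple, np.ndarray)):
            return [func(s) for s in seqs]
        raise TypeError(f"unsupported input type: {type(seqs).__name__}")


class ScalarScore(Score):
    def get_score(self, seqs):
        # the public method only names the per-sequence function to run
        return self._dispatch(seqs, self._calc_score)


class GCContent(ScalarScore):
    """Hypothetical toy score used only to exercise the dispatch."""

    def _calc_score(self, seq):
        return (seq.count("G") + seq.count("C")) / len(seq)


score = GCContent()
single = score.get_score("GGCC")
batch = score.get_score(["GGCC", "AATT"])
```

with this shape, a vector or weight base class would only need its own one-line public method that passes a different per-sequence function to the same helper.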

breaking changes

  • CodonCounter and BaseCounter no longer accept k_mer >= 4 for count_array (which now drives count() too); the python Counter fallback is retired (#27). no in-package score uses k_mer > 2; the documented k-mer support is now [1, 3]. larger values raise NotImplementedError with an explicit message.

  • MaxWeight / MinWeight / BalancedWeight (#20) and IntraSeqPermuter / IntraPosPermuter (#23) now emit FutureWarning on instantiation. behaviour is preserved via shims; the names will be removed in v0.6.0.

  • RSCU, RCBS and CAI no longer populate self.counter.counts as a side effect of get_score/get_weights (#24). callers reading counter.counts after a score call should either call counter.count(seqs) explicitly or use the score’s public weight output.
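
the deprecation shims for the old class names could follow a pattern like the one below. this is a minimal sketch: the real constructors take arguments that are not modelled here, and the warning text is invented for the example.

```python
import warnings


class WeightOptimizer:
    """Toy stand-in for the consolidated class; only strategy is modelled."""

    STRATEGIES = ("max", "min", "balanced")

    def __init__(self, strategy="max"):
        if strategy not in self.STRATEGIES:
            raise ValueError(f"unknown strategy: {strategy!r}")
        self.strategy = strategy


class MaxWeight(WeightOptimizer):
    """Deprecated alias kept for backwards compatibility."""

    def __init__(self):
        # hypothetical message; stacklevel=2 points the warning at the caller
        warnings.warn(
            "MaxWeight is deprecated; use WeightOptimizer(strategy='max')",
            FutureWarning,
            stacklevel=2,
        )
        super().__init__(strategy="max")


# the old name still works, but emits a FutureWarning on instantiation
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = MaxWeight()
```

code that wants to find remaining uses of the old names ahead of v0.6.0 can run its test suite with FutureWarning promoted to an error, for example via warnings.simplefilter("error", FutureWarning).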

tooling

  • CI runtime cut from ~13 min to ~3 min (#21). E. coli regression tests subsample to 500 sequences by default (set ECOLI_FULL=1 to run on the full corpus); python 3.8 dropped from the matrix.

  • RSCU regression baseline added against pre-deep-modules main (commit 2bc54b3). pins the four (directional, mean) combinations on the first 500 E. coli sequences (#25).

  • property tests for CodonPairBias via hypothesis, covering the k_mer=2 count_array / weights path against the pre-refactor pandas reference.

  • CHANGELOG.md rendered into the Sphinx site via myst-parser, so the readthedocs build now publishes the release notes alongside the API reference.

  • issue_module_depth.md (local plan) drove the deepening work through eight candidates; all closed in this release.
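
the ECOLI_FULL switch for the regression tests could be wired up along these lines. select_sequences is a hypothetical helper, and taking the first 500 sequences (rather than a random sample) is an assumption; only the variable name and the default of 500 come from the notes above.

```python
import os


def select_sequences(seqs, env=os.environ, n_default=500):
    """Return the full corpus when ECOLI_FULL=1, else the first n_default."""
    seqs = list(seqs)
    if env.get("ECOLI_FULL") == "1":
        return seqs
    return seqs[:n_default]


# stand-in corpus; the real tests would load the vendored E. coli sequences
corpus = [f"seq{i}" for i in range(1200)]
default_run = select_sequences(corpus, env={})
full_run = select_sequences(corpus, env={"ECOLI_FULL": "1"})
```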

Full Changelog: https://github.com/alondmnt/codon-bias/compare/v0.4.0...v0.5.0


v0.4.0

Released on 2026-05-02

this release moves the scalar-score hot paths from pandas to numpy, with follow-on vectorisation of ENC’s bg_correction path and a partial vectorisation of RCB. speedups on E. coli K-12: ~47x on ENC default, ~25x on FOP/CAI/tAI/nTE, ~40x on ENC with bg_correction=True, 1.7x on RCB. public APIs are unchanged; two opt-in code paths have small behaviour changes (see breaking changes).

big thanks to @RedPenguin100, who set up the regression test suite and CI that this release was built on, and kicked off the numpy migration with the first ENC rewrite (PR #13) that everything else followed from.

performance

  • scoring pipeline moved to numpy across ENC, FOP, CAI, tAI, nTE (#8). PR #13 by @RedPenguin100 rewrote EffectiveNumberOfCodons on ndarrays and seeded the _count_single numpy return for k_mer=1; the #8 closing commits extended the same approach to FOP, CAI, tAI, and nTE, with weights pre-aligned at init.

  • CodonCounter._count_single (k_mer=1) vectorised via a base-5 packed codon LUT and one numpy advanced-indexing op.

  • ENC bg_correction=True vectorised (#18). _calc_BCC precomputes a codon-to-base-index matrix; _calc_BNC reaches through to BaseCounter._count_single, skipping pandas scaffolding.

  • RCB per-sequence background partially vectorised (refs #19).
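
a base-5 packed codon LUT of the kind mentioned for _count_single can be sketched as follows. the details are assumptions: the fifth symbol is taken to mean "any non-ACGT byte", codons are indexed 0..63 in lexicographic order, and count_codons is a hypothetical name.

```python
import itertools
import numpy as np

BASES = "ACGT"

# map ASCII bytes to base ids 0..3; every other byte maps to 4 ("other")
byte_to_base = np.full(256, 4, dtype=np.int64)
for i, b in enumerate(BASES):
    byte_to_base[ord(b)] = i

# LUT over all 5**3 = 125 packed triplets:
# codon index 0..63, or -1 if any base in the triplet is "other"
codon_lut = np.full(125, -1, dtype=np.int64)
for idx, (b0, b1, b2) in enumerate(itertools.product(range(4), repeat=3)):
    codon_lut[25 * b0 + 5 * b1 + b2] = idx


def count_codons(seq: str) -> np.ndarray:
    """Count the 64 codons in frame 0, skipping codons with non-ACGT bases."""
    n = len(seq) // 3 * 3  # drop a trailing partial codon
    raw = np.frombuffer(seq[:n].encode("ascii"), dtype=np.uint8)
    triplets = byte_to_base[raw].reshape(-1, 3)
    packed = 25 * triplets[:, 0] + 5 * triplets[:, 1] + triplets[:, 2]
    codon_ids = codon_lut[packed]  # one advanced-indexing lookup per sequence
    return np.bincount(codon_ids[codon_ids >= 0], minlength=64)


# ATG, AAA, ATG are counted; NNN is skipped; the trailing "TA" is dropped
counts = count_codons("ATGAAAATGNNNTA")
```

packing in base 5 rather than base 4 gives ambiguous bases their own slots in the table, so invalid codons are filtered by the same lookup instead of a separate python-level check.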

score                    | before | after  | total speedup
-------------------------+--------+--------+--------------
ENC (default)            | ~6.8 s | 146 ms | ~47x
FOP / CAI / tAI / nTE    | ~1.0 s | 40 ms  | ~25x
ENC (bg_correction=True) | ~9.8 s | 246 ms | ~40x
RCB.get_score            | 4.78 s | 2.85 s | 1.7x

(of the ~40x on bg_correction=True, ~3x came from #8 - which left it at 3.1 s, dominated by BaseCounter._count_single and _calc_BCC - and the remaining 11.9x from #18.)

fixes

  • ENC weighted mean now filters undersampled amino acids (#15). previously, pseudocount-only AAs could dilute weighted scores toward zero; on the bg_correction path, F=inf could poison degeneracy groups (silently capped by the min(len(P), ENC) guard).

  • centralised in-frame codon iteration via utils.iter_codons. fixes a latent crash in Permuter._preprocess_seq on non-multiple-of-3 input and tightens k_mer >= 2 iteration so trailing partial k-mers are no longer emitted.

  • Permuter RNG modernised (#17): per-group independent PCG64 streams via np.random.default_rng, replacing np.random.seed calls that leaked into global state and re-seeded each group identically.

  • fetch_GCN_from_GtRNAdb sets a descriptive User-Agent so GtRNAdb’s bot-filter no longer 403s fresh installs.
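
the per-group independent streams described for the Permuter RNG can be illustrated like this. permute_groups is a hypothetical helper, and deriving the child seeds with SeedSequence.spawn is an assumption; the notes above only state that np.random.default_rng is used with independent PCG64 streams per group.

```python
import numpy as np


def permute_groups(groups, random_state=42):
    """Shuffle each group with its own independent stream (hypothetical)."""
    # spawn derives one statistically independent child seed per group
    children = np.random.SeedSequence(random_state).spawn(len(groups))
    out = []
    for child, group in zip(children, groups):
        # PCG64-backed Generator; the global np.random state is not touched
        rng = np.random.default_rng(child)
        out.append(rng.permutation(group))
    return out


groups = [np.arange(20), np.arange(20)]
first = permute_groups(groups, random_state=0)
second = permute_groups(groups, random_state=0)
```

in this sketch each stream is tied to the group's position in the list, so the same input and random_state reproduce the same permutations, while adding or removing a group changes which stream later groups receive.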

breaking changes

  • Permuter permutation output is bit-exactly different for any given random_state (#17). statistical correctness is improved (independent per-group streams) - downstream z-scores/p-values should be more accurate. permutations are stable for a given input but not against input perturbation (adding/removing sequences shifts group indices).

  • get_vector / _calc_vector for k_mer >= 2 returns a vector k_mer - 1 elements shorter (the previously-NaN trailing slots for partial k-mers are no longer emitted). element values at retained positions are identical. callers depending on len(vector) == len(seq) // 3 should update to len(vector) == len(seq) // 3 - (k_mer - 1).

  • ENC weighted-mean filter (#15) changes scores for EffectiveNumberOfCodons(mean="weighted") with robust=False and/or pseudocount=0 on sequences with undersampled AAs. default configuration on well-sampled sequences is unaffected.
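
the new vector length for k_mer >= 2 follows from counting only complete, overlapping k-mers. the sketch below uses a hypothetical iter_kmers helper (not the signature of utils.iter_codons, which is not given here) to show where the k_mer - 1 comes from.

```python
def iter_kmers(seq, k_mer=1):
    """Yield overlapping in-frame codon k-mers, dropping partial k-mers."""
    n_codons = len(seq) // 3  # a trailing partial codon is ignored
    codons = [seq[3 * i:3 * i + 3] for i in range(n_codons)]
    # only windows that contain k_mer complete codons are emitted
    for i in range(n_codons - k_mer + 1):
        yield "".join(codons[i:i + k_mer])


seq = "ATGAAACCCGGGTT"  # 14 bases: 4 full codons plus 2 trailing bases
singles = list(iter_kmers(seq, k_mer=1))  # 4 items
pairs = list(iter_kmers(seq, k_mer=2))    # 3 items: 4 - (2 - 1)
```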

tooling

  • GitHub Actions CI with pytest regression tests on a vendored E. coli K-12 corpus (FOP, CAI, tAI, nTE, ENC), parallelised with caching (PRs #10, #12, #14 by @RedPenguin100).

  • ruff for formatting and lint, enforced on push (PR #16).

acknowledgements

  • @RedPenguin100 - landed the test suite and CI (#10, #12, #14) that made the rest of this release safe to do, and the first numpy ENC rewrite (#13) that the wider migration built on.

Full Changelog: https://github.com/alondmnt/codon-bias/compare/v0.3.5...v0.4.0


v0.3.5

Released on 2025-03-13T09:38:20Z

  • improved: input type validation

Full Changelog: https://github.com/alondmnt/codon-bias/compare/v0.3.4...v0.3.5


v0.3.4

Released on 2025-03-06T12:17:31Z

  • added: FrequencyOfOptimalCodons: weights arg (closes #5)

  • fixed: pandas v2 compatibility (closes #5)

  • improved: FrequencyOfOptimalCodons: default thresh 0.95 -> 0.8

Full Changelog: https://github.com/alondmnt/codon-bias/compare/v0.3.3...v0.3.4


v0.3.3

Released on 2025-02-11T02:04:11Z

Full Changelog: https://github.com/alondmnt/codon-bias/compare/v0.3.2...v0.3.3


v0.3.2

Released on 2024-12-20T04:55:25Z

What’s Changed

  • tAI fixes/improvements by @moritzburghardt in https://github.com/alondmnt/codon-bias/pull/3

    • new: optimize_s_values method

    • new: allow custom s_values

    • fixed: handle invalid literal for int in gtrnadb table

    • fixed: CERTIFICATE_VERIFY_FAILED error on GtRNAdb

Full Changelog: https://github.com/alondmnt/codon-bias/compare/v0.3.1...v0.3.2


v0.3.1

Released on 2023-06-14T15:53:08Z

Full Changelog: https://github.com/alondmnt/codon-bias/compare/0.3.0...v0.3.1


v0.3.0

Released on 2022-10-28T17:10:42Z

  • new:

    • new module: optimizers with the classes MaxWeight, MinWeight, and BalancedWeight

    • new module: random with the classes Permuter, IntraSeqPermuter and IntraPosPermuter

    • added utils.ReferenceSelector

  • changed:

    • utils.translate returns a dataframe by default

  • fixed:

    • avoid numpy deprecation warning in VectorScore’s get_vector function

Full Changelog: https://github.com/alondmnt/codon-bias/compare/0.2.0...0.3.0


v0.2.0

Released on 2022-09-13T15:58:19Z

  • new:

    • added scores.NormalizedTranslationalEfficiency

    • added scores.CodonPairBias

    • added stats.BaseCounter for nucleotide and k-mer statistics across reading frames

    • added k_mer parameter to:

      • stats.CodonCounter

      • scores.CodonAdaptationIndex

      • scores.EffectiveNumberOfCodons

      • pairwise.CodonUsageFrequency

    • added abstract class scores.WeightScore that computes a weight vector for each input sequence, with the following children:

      • scores.CodonPairBias

      • scores.EffectiveNumberOfCodons

      • scores.RelativeSynonymousCodonUsage

      • scores.RelativeCodonBiasScore

  • improved:

    • various improvements to scores.EffectiveNumberOfCodons

      • background correction

      • improved estimation

    • added count() method to counter classes

    • added pseudocount parameter to models

Full Changelog: https://github.com/alondmnt/codon-bias/compare/0.1.0...0.2.0


v0.1.0

Released on 2022-08-27T19:54:54Z

First release.

  • stats.CodonCounter

  • scores.FrequencyOfOptimalCodons (FOP)

  • scores.RelativeSynonymousCodonUsage (RSCU)

  • scores.CodonAdaptationIndex (CAI)

  • scores.EffectiveNumberOfCodons (ENC)

  • scores.TrnaAdaptationIndex (tAI)

  • scores.RelativeCodonBiasScore (RCBS + DCBS)

  • pairwise.CodonUsageFrequency (CUFS)