Release notes
v0.5.0
Released on 2026-05-16
if v0.4.0 moved the k_mer=1 hot paths from pandas to numpy, this release
does the same for k_mer>1. CodonCounter.count_array is now a single
vectorised entry point for any k_mer in [1, 3], and every score routes
through it on the hot path. count(seqs) becomes a thin formatter on top.
alongside the perf work, the package consolidates a few subclass-as-config
patterns into single classes with a kwarg (WeightOptimizer(strategy=...),
Permuter(scope=...)), promotes underscore-prefixed counter internals
onto the documented public surface that scores depend on, and factors
the str/list/ndarray dispatch shared by get_score / get_vector /
get_weights into a single Score._dispatch helper.
per-call timings on a 3.6 kb sequence (200 calls, mean):
| score / path | before | after | speedup |
|---|---|---|---|
| ENC k_mer=2, bg_correction=True | 22.9 ms | 0.21 ms | ~110x |
| CAI k_mer=2 | 0.82 ms | 0.05 ms | ~16x |
| RSCU | ~1.05 ms | 0.017 ms | ~60x |
|  | 0.20 ms | 0.11 ms | ~1.9x |
|  | (new in v0.5.0) | 0.040 ms |  |
| RCB | ~0.77 ms | 0.044 ms | ~17x |
| ENC k_mer=1 (default, bg_correction) | (vectorised in v0.4.0) | within noise |  |
| CUFS.get_score | 1.02 ms | 0.042 ms | ~24x |
| CUFS.get_score synonymous=True | 1.77 ms | 0.045 ms | ~39x |
(the ~110x on ENC k_mer=2 came from replacing a python listcomp over
3,721 codon-pair strings with BNC[codon_base_idx_kmer].prod(axis=1).)
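the listcomp-to-indexing change can be illustrated with a small sketch. it assumes `BNC` is a length-4 vector of background base frequencies and `codon_base_idx_kmer` is an integer matrix with one row per codon pair and one column per base position; both shapes, and the frequency values, are assumptions for illustration rather than the package's actual internals.

```python
import itertools
import numpy as np

BASES = "ACGT"
STOPS = {"TAA", "TAG", "TGA"}

# 61 sense codons -> 61 ** 2 = 3,721 codon pairs
codons = ["".join(c) for c in itertools.product(BASES, repeat=3)]
codons = [c for c in codons if c not in STOPS]
pairs = [a + b for a, b in itertools.product(codons, repeat=2)]

# assumed: background nucleotide composition, one frequency per base
BNC = np.array([0.30, 0.20, 0.20, 0.30])
base_id = {b: i for i, b in enumerate(BASES)}

# before: a python loop over every codon-pair string on each call
slow = np.array([np.prod([BNC[base_id[b]] for b in p]) for p in pairs])

# index matrix built once: row = codon pair, column = base position
codon_base_idx_kmer = np.array([[base_id[b] for b in p] for p in pairs])

# after: per-call work is one fancy-indexing op and one product along axis 1
fast = BNC[codon_base_idx_kmer].prod(axis=1)
```

both paths produce the same 3,721 expected frequencies; the second moves all string handling into a lookup table that is built once.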
performance
- `CodonCounter.count_array` generalised to k_mer in [1, 3] (#27): sliding window over codon ids, combined into a single bucket id per k-mer, bincounted into a dense aligned ndarray.
- `CodonCounter.count(seqs)` becomes a thin formatter on top of `count_array` (#27); k-mer concat-string index built lazily via the new `kmer_index` property.
- CAI and CodonPairBias k_mer>1 wired to `count_array` with pre-aligned log/linear weight arrays (#27), skipping the per-call pandas reindex.
- RSCU and RCBS rewritten as stateless ndarray calls on `count_array` with weights pre-aligned at init (#24).
- ENC k_mer=1 / k_mer>1 paths unified around `count_array` (#28); the two `_calc_*_single_kmer` / `_calc_*` methods collapse to one body driven by `aa_group_kmer` / `codon_base_idx_kmer` LUTs.
- `BaseCounter.count_array` vectorised for any k_mer (#27); sliding window over base ids respects `frame` / `step` semantics.
- `geomean_array` / `mean_array` helpers in `utils` for the count_array hot path (#26, #27), aligned-ndarray siblings of the pandas `geomean` / `mean`.
- `pairwise.CodonUsageFrequency` rewritten on `count_array`: per-seq arrays stacked, normalised with `np.add.at` per-aa-group sums (synonymous) or a single divide (non-synonymous). count_array order replaces the prior alphabetic codon order; KL pair score is invariant under consistent reordering.
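the sliding-window / bucket-id / bincount steps described for `count_array` can be sketched as follows. this is a minimal illustration, not the package's code: the function name `count_kmers_sketch`, the 64-codon id space, and the assumption of clean uppercase ACGT input are all hypothetical.

```python
import numpy as np

# byte -> base id lookup (A, C, G, T -> 0..3); assumes clean ACGT input
_BASE_LUT = np.zeros(256, dtype=np.int64)
for i, b in enumerate(b"ACGT"):
    _BASE_LUT[b] = i


def count_kmers_sketch(seq, k_mer=2):
    """Dense counts of in-frame codon k-mers, indexed by one bucket id."""
    n_buckets = 64 ** k_mer
    base_ids = _BASE_LUT[np.frombuffer(seq.encode("ascii"), dtype=np.uint8)]
    n_codons = len(base_ids) // 3
    if n_codons < k_mer:
        return np.zeros(n_buckets, dtype=np.int64)
    # codon id in 0..63 from the three base ids of each complete codon
    triplets = base_ids[: n_codons * 3].reshape(-1, 3)
    codon_ids = triplets @ np.array([16, 4, 1])
    # sliding window over codon ids, one row per k-mer
    windows = np.lib.stride_tricks.sliding_window_view(codon_ids, k_mer)
    # combine each window into a single bucket id (base-64 place values)
    place_values = 64 ** np.arange(k_mer - 1, -1, -1)
    bucket_ids = windows @ place_values
    # dense array aligned to the bucket ids
    return np.bincount(bucket_ids, minlength=n_buckets)
```

because the output is always aligned to the same bucket order, weights stored in the same order can be applied with plain array arithmetic instead of a per-call reindex.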
refactors
- `MaxWeight` / `MinWeight` / `BalancedWeight` collapsed into `WeightOptimizer(strategy="max"|"min"|"balanced")` (#20). old names emit `FutureWarning` via shims; will be removed in v0.6.0.
- `IntraSeqPermuter` / `IntraPosPermuter` collapsed into `Permuter(scope="intra_seq"|"intra_pos")` (#23). old names emit `FutureWarning` via shims; will be removed in v0.6.0.
- counter internals promoted to the public surface that scores depend on (#22, #25, #27, #28): `count_array`, `aa_group`, `n_aa`, `codon_index`, `codon_base_idx`, `kmer_index`, `aa_group_kmer`, `codon_base_idx_kmer`. `_codon_lex_to_aa` renamed to `_codon_lex_to_idx` (it stored codon indices, not aa indices).
- temporal coupling fixed in `RelativeSynonymousCodonUsage` and `RelativeCodonBiasScore` (#24): `_calc_score` no longer relies on a prior `_calc_seq_weights` populating `self.counter.counts`, which broke concurrent use.
- `Score._dispatch` shared by `ScalarScore.get_score`, `VectorScore.get_vector` and `WeightScore.get_weights`; each base previously carried its own near-identical str/list/ndarray dispatch shell. public API unchanged.
- `pairwise.PairwiseScore.get_matrix` n_jobs=1 path fixed: the sequential `starmap` branch was unconditionally overwritten by the next-line `Pool.starmap`, so n_jobs=1 silently paid fork overhead with no parallelism. output unchanged.
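the shared str/list/ndarray dispatch can be pictured with a sketch like the one below. the class names, the `calc_one` callback, and the list return type are hypothetical; only the idea of one helper serving several public getters comes from the notes above.

```python
import numpy as np


class ScoreSketch:
    """Hypothetical base class; only the dispatch shape is illustrated."""

    def _dispatch(self, seqs, calc_one):
        # a single sequence in, a single result out
        if isinstance(seqs, str):
            return calc_one(seqs)
        # a list / tuple / ndarray of sequences in, a list of results out
        if isinstance(seqs, (list, tuple, np.ndarray)):
            return [calc_one(s) for s in seqs]
        raise TypeError(f"unsupported input type: {type(seqs).__name__}")


class GCContentSketch(ScoreSketch):
    """Toy scalar score used only to exercise the dispatch helper."""

    def _calc_score(self, seq):
        return (seq.count("G") + seq.count("C")) / len(seq)

    def get_score(self, seqs):
        # the public getter only names the per-sequence function
        return self._dispatch(seqs, self._calc_score)
```

with this shape, each public getter reduces to a one-line call and the input-type handling lives in a single place.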
breaking changes
- `CodonCounter` and `BaseCounter` no longer accept `k_mer >= 4` for `count_array` (which now drives `count()` too); the python `Counter` fallback is retired (#27). no in-package score uses k_mer > 2; the documented k-mer support is now [1, 3]. attempts above raise `NotImplementedError` with an explicit message.
- `MaxWeight` / `MinWeight` / `BalancedWeight` (#20) and `IntraSeqPermuter` / `IntraPosPermuter` (#23) now emit `FutureWarning` on instantiation. behaviour is preserved via shims; the names will be removed in v0.6.0.
- RSCU, RCBS and CAI no longer populate `self.counter.counts` as a side effect of `get_score` / `get_weights` (#24). callers reading `counter.counts` after a score call should either call `counter.count(seqs)` explicitly or use the score’s public weight output.
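the deprecation shims follow a common pattern, sketched here with stand-in classes. `WeightOptimizerSketch` and `MaxWeightSketch` are hypothetical names used so the example is self-contained; the package's own shim code may differ.

```python
import warnings


class WeightOptimizerSketch:
    """Stand-in for a single class configured by a kwarg."""

    STRATEGIES = ("max", "min", "balanced")

    def __init__(self, strategy="max"):
        if strategy not in self.STRATEGIES:
            raise ValueError(f"strategy must be one of {self.STRATEGIES}")
        self.strategy = strategy


class MaxWeightSketch(WeightOptimizerSketch):
    """Old name kept as a thin shim that warns on instantiation."""

    def __init__(self):
        warnings.warn(
            "MaxWeightSketch is deprecated; "
            "use WeightOptimizerSketch(strategy='max') instead",
            FutureWarning,
            stacklevel=2,  # point the warning at the caller's line
        )
        super().__init__(strategy="max")
```

in the package itself, the migration is to replace `MaxWeight()` with `WeightOptimizer(strategy="max")` (and likewise for the other old names) before v0.6.0.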
tooling
- CI runtime cut from ~13 min to ~3 min (#21). E. coli regression tests subsample to 500 sequences by default (set `ECOLI_FULL=1` to run on the full corpus); python 3.8 dropped from the matrix.
- RSCU regression baseline added against pre-deep-modules main (commit 2bc54b3). pins the four `(directional, mean)` combinations on the first 500 E. coli sequences (#25).
- property tests for `CodonPairBias` via `hypothesis`, covering the k_mer=2 `count_array` / weights path against the pre-refactor pandas reference.
- `CHANGELOG.md` rendered into the Sphinx site via `myst-parser`, so the readthedocs build now publishes the release notes alongside the API reference.
- `issue_module_depth.md` (local plan) drove the deepening work through eight candidates; all closed in this release.
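an environment-variable switch like `ECOLI_FULL=1` is typically read once when the test corpus is loaded. the helper below is a hypothetical sketch of that idea; the function name, the `env` parameter, and the choice to take the first 500 sequences are assumptions, not the test suite's actual code.

```python
import os


def select_corpus(seqs, env=os.environ, default_n=500):
    """Return the full corpus if ECOLI_FULL=1, else the first default_n items."""
    seqs = list(seqs)
    if env.get("ECOLI_FULL") == "1":
        return seqs
    return seqs[:default_n]
```

passing `env` explicitly keeps the helper easy to test without touching the real process environment.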
Full Changelog: https://github.com/alondmnt/codon-bias/compare/v0.4.0…v0.5.0
v0.4.0
Released on 2026-05-02
this release moves the scalar-score hot paths from pandas to numpy, with
follow-on vectorisation of ENC’s bg_correction path and a partial
vectorisation of RCB. on E. coli K-12: ~47x on ENC default, ~25x on
FOP/CAI/tAI/nTE, ~40x on ENC with bg_correction=True, 1.7x on RCB.
public APIs are unchanged; two opt-in code paths have small behaviour
changes (see breaking).
big thanks to @RedPenguin100, who set up the regression test suite and CI that this release was built on, and kicked off the numpy migration with the first ENC rewrite (PR #13) that everything else followed from.
performance
- scoring pipeline moved to numpy across ENC, FOP, CAI, tAI, nTE (#8). PR #13 by @RedPenguin100 rewrote `EffectiveNumberOfCodons` on ndarrays and seeded the `_count_single` numpy return for `k_mer=1`; the #8 closing commits extended the same approach to FOP, CAI, tAI, and nTE, with weights pre-aligned at init.
- `CodonCounter._count_single` (k_mer=1) vectorised via a base-5 packed codon LUT and one numpy advanced-indexing op.
- ENC `bg_correction=True` vectorised (#18). `_calc_BCC` precomputes a codon-to-base-index matrix; `_calc_BNC` reaches through to `BaseCounter._count_single`, skipping pandas scaffolding.
- RCB per-sequence background partially vectorised (refs #19).
| score | before | after | total speedup |
|---|---|---|---|
| ENC (default) | ~6.8 s | 146 ms | ~47x |
| FOP / CAI / tAI / nTE | ~1.0 s | 40 ms | ~25x |
| ENC (bg_correction=True) | ~9.8 s | 246 ms | ~40x |
| RCB.get_score | 4.78 s | 2.85 s | 1.7x |
(of the ~40x on bg_correction=True, ~3x came from #8 - which left it
at 3.1 s, dominated by BaseCounter._count_single and _calc_BCC -
and the remaining 11.9x from #18.)
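the base-5 packed codon LUT mentioned in the performance list can be sketched as below. the idea is that each base maps to a digit in 0..4 (with 4 standing for anything that is not A/C/G/T), so every triplet packs into one integer in 0..124 that indexes a precomputed table. the names, the 64-codon ordering, and the handling of invalid codons are assumptions for illustration, and this sketch uses a second lookup to encode bases that the package may handle differently.

```python
import itertools
import numpy as np

BASES = "ACGT"
codons = ["".join(c) for c in itertools.product(BASES, repeat=3)]  # 64 codons

# byte -> digit in 0..4; anything that is not A/C/G/T maps to 4
BASE5 = np.full(256, 4, dtype=np.int64)
for i, b in enumerate(BASES.encode()):
    BASE5[b] = i

# packed code (0..124) -> codon index (0..63); slot 64 collects invalid codons
PACKED_TO_CODON = np.full(125, 64, dtype=np.int64)
for idx, codon in enumerate(codons):
    d = BASE5[np.frombuffer(codon.encode(), dtype=np.uint8)]
    PACKED_TO_CODON[d[0] * 25 + d[1] * 5 + d[2]] = idx


def count_codons_sketch(seq):
    """Counts of the 64 codons in frame 0; invalid codons are dropped."""
    digits = BASE5[np.frombuffer(seq.encode("ascii"), dtype=np.uint8)]
    n = len(digits) // 3
    packed = digits[: n * 3].reshape(-1, 3) @ np.array([25, 5, 1])
    codon_ids = PACKED_TO_CODON[packed]  # advanced indexing into the LUT
    return np.bincount(codon_ids, minlength=65)[:64]
```

the per-sequence cost is a handful of array operations regardless of sequence length, with no python-level loop over codons.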
fixes
- ENC weighted mean now filters undersampled amino acids (#15). previously, pseudocount-only AAs could dilute weighted scores toward zero; on the `bg_correction` path, F=inf could poison degeneracy groups (silently capped by the `min(len(P), ENC)` guard).
- centralised in-frame codon iteration via `utils.iter_codons`. fixes a latent crash in `Permuter._preprocess_seq` on non-multiple-of-3 input and tightens `k_mer >= 2` iteration so trailing partial k-mers are no longer emitted.
- `Permuter` RNG modernised (#17): per-group independent PCG64 streams via `np.random.default_rng`, replacing `np.random.seed` calls that leaked into global state and re-seeded each group identically.
- `fetch_GCN_from_GtRNAdb` sets a descriptive User-Agent so GtRNAdb’s bot-filter no longer 403s fresh installs.
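the per-group RNG change can be illustrated with numpy's seeding utilities. this sketch assumes child seeds are spawned from a single `random_state` via `np.random.SeedSequence`; the function name and the use of `spawn` are assumptions, and the package may derive its per-group streams differently.

```python
import numpy as np


def permute_groups_sketch(groups, random_state=42):
    """Permute each group with its own independent generator."""
    # one child seed per group, all derived from a single parent seed
    children = np.random.SeedSequence(random_state).spawn(len(groups))
    out = []
    for group, child in zip(groups, children):
        rng = np.random.default_rng(child)  # PCG64-backed Generator
        out.append(rng.permutation(group))
    return out
```

unlike `np.random.seed`, nothing here touches numpy's global random state, and no two groups share a stream.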
breaking changes
- `Permuter` permutation output is bit-exactly different for any given `random_state` (#17). statistical correctness is improved (independent per-group streams) - downstream z-scores/p-values should be more accurate. permutations are stable for a given input but not against input perturbation (adding/removing sequences shifts group indices).
- `get_vector` / `_calc_vector` for `k_mer >= 2` returns a vector `k_mer - 1` elements shorter (the previously-NaN trailing slots for partial k-mers are no longer emitted). element values at retained positions are identical. callers depending on `len(vector) == len(seq) // 3` should update to `len(vector) == len(seq) // 3 - (k_mer - 1)`.
- ENC weighted-mean filter (#15) changes scores for `EffectiveNumberOfCodons(mean="weighted")` with `robust=False` and/or `pseudocount=0` on sequences with undersampled AAs. default configuration on well-sampled sequences is unaffected.
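the new vector length for `k_mer >= 2` follows from emitting only complete in-frame k-mers. the generator below is a hypothetical sketch of that iteration (it is not `utils.iter_codons` itself) and checks the length formula on a sequence with a trailing partial codon.

```python
def iter_kmers_sketch(seq, k_mer=1):
    """Yield complete in-frame codon k-mers; trailing partial k-mers are skipped."""
    n_codons = len(seq) // 3
    for i in range(n_codons - k_mer + 1):
        yield seq[3 * i : 3 * (i + k_mer)]


seq = "ATGAAACCCGGGTT"  # 14 nt: 4 complete codons, 2 nt left over
for k in (1, 2, 3):
    kmers = list(iter_kmers_sketch(seq, k))
    # one element per complete k-mer, so k - 1 fewer than the codon count
    assert len(kmers) == len(seq) // 3 - (k - 1)
```

code that allocated per-codon arrays of length `len(seq) // 3` and filled them from the vector needs the same `k_mer - 1` adjustment.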
tooling
- GitHub Actions CI with pytest regression tests on a vendored E. coli K-12 corpus (FOP, CAI, tAI, nTE, ENC), parallelised with caching (PRs #10, #12, #14 by @RedPenguin100).
- `ruff` for formatting and lint, enforced on push (PR #16).
acknowledgements
@RedPenguin100 - landed the test suite and CI (#10, #12, #14) that made the rest of this release safe to do, and the first numpy ENC rewrite (#13) that the wider migration built on.
Full Changelog: https://github.com/alondmnt/codon-bias/compare/v0.3.5…v0.4.0
v0.3.5
Released on 2025-03-13T09:38:20Z
improved: input type validation
Full Changelog: https://github.com/alondmnt/codon-bias/compare/v0.3.4…v0.3.5
v0.3.4
Released on 2025-03-06T12:17:31Z
added: FrequencyOfOptimalCodons: weights arg (closes #5)
fixed: pandas v2 compatibility (closes #5)
improved: FrequencyOfOptimalCodons: default thresh 0.95 -> 0.8
Full Changelog: https://github.com/alondmnt/codon-bias/compare/v0.3.3…v0.3.4
v0.3.3
Released on 2025-02-11T02:04:11Z
What’s Changed
Use `np.prod` instead of `np.product` by @l-benedetti-insta in https://github.com/alondmnt/codon-bias/pull/4
New Contributors
@l-benedetti-insta made their first contribution in https://github.com/alondmnt/codon-bias/pull/4
Full Changelog: https://github.com/alondmnt/codon-bias/compare/v0.3.2…v0.3.3
v0.3.2
Released on 2024-12-20T04:55:25Z
What’s Changed
tAI fixes/improvements by @moritzburghardt in https://github.com/alondmnt/codon-bias/pull/3
new: optimize_s_values method
new: allow custom s_values
fixed: handle invalid literal for int in gtrnadb table
fixed: CERTIFICATE_VERIFY_FAILED error on GtRNAdb
New Contributors
@moritzburghardt made their first contribution in https://github.com/alondmnt/codon-bias/pull/3
Full Changelog: https://github.com/alondmnt/codon-bias/compare/v0.3.1…v0.3.2
v0.3.1
Released on 2023-06-14T15:53:08Z
Full Changelog: https://github.com/alondmnt/codon-bias/compare/0.3.0…v0.3.1
v0.3.0
Released on 2022-10-28T17:10:42Z
new:
new module: `optimizers` with the classes MaxWeight, MinWeight, and BalancedWeight
new module: `random` with the classes Permuter, IntraSeqPermuter and IntraPosPermuter
added utils.ReferenceSelector
changed:
utils.translate returns a dataframe by default
fixed:
avoid numpy deprecation warning in VectorScore’s get_vector function
Full Changelog: https://github.com/alondmnt/codon-bias/compare/0.2.0…0.3.0
v0.2.0
Released on 2022-09-13T15:58:19Z
new:
added scores.NormalizedTranslationalEfficiency
added scores.CodonPairBias
added stats.BaseCounter for nucleotide and k-mer statistics across reading frames
added `k_mer` parameter to:
stats.CodonCounter
scores.CodonAdaptationIndex
scores.EffectiveNumberOfCodons
pairwise.CodonUsageFrequency
added abstract class scores.WeightScore that computes a weight vector for each input sequence, with the following children:
scores.CodonPairBias
scores.EffectiveNumberOfCodons
scores.RelativeSynonymousCodonUsage
scores.RelativeCodonBiasScore
improved:
various improvements to scores.EffectiveNumberOfCodons
background correction
improved estimation
added count() method to counter classes
added `pseudocount` parameter to models
Full Changelog: https://github.com/alondmnt/codon-bias/compare/0.1.0…0.2.0
v0.1.0
Released on 2022-08-27T19:54:54Z
First release.
stats.CodonCounter
scores.FrequencyOfOptimalCodons (FOP)
scores.RelativeSynonymousCodonUsage (RSCU)
scores.CodonAdaptationIndex (CAI)
scores.EffectiveNumberOfCodons (ENC)
scores.TrnaAdaptationIndex (tAI)
scores.RelativeCodonBiasScore (RCBS + DCBS)
pairwise.CodonUsageFrequency (CUFS)