The count method on strings, bytes, bytearray etc. can be significantly faster #120397

Closed
rhpvorderman opened this issue Jun 12, 2024 · 1 comment
Labels
performance Performance or resource usage type-feature A feature request or enhancement

Comments

@rhpvorderman
Contributor

rhpvorderman commented Jun 12, 2024

Feature or enhancement

Proposal:

Counting single characters in a string is very useful, for instance when calculating the GC content of a DNA sequence.

def gc_content(sequence: str) -> float:
    # Returns the fraction of known bases that are G or C.
    upper_seq = sequence.upper()
    a_count = upper_seq.count('A')
    c_count = upper_seq.count('C')
    g_count = upper_seq.count('G')
    t_count = upper_seq.count('T')
    # Unknown N bases should not influence the GC content, do not use len(sequence)
    total = a_count + c_count + g_count + t_count
    return (c_count + g_count) / total

Another example would be counting newline characters.

The current code counts one character at a time.

static inline Py_ssize_t
STRINGLIB(count_char)(const STRINGLIB_CHAR *s, Py_ssize_t n,
                      const STRINGLIB_CHAR p0, Py_ssize_t maxcount)
{
    Py_ssize_t i, count = 0;
    for (i = 0; i < n; i++) {
        if (s[i] == p0) {
            count++;
            if (count == maxcount) {
                return maxcount;
            }
        }
    }
    return count;
}

By providing the appropriate hints to the compiler, the function can be sped up significantly.
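
A minimal sketch of one such hint, assuming the obstacle to vectorization is the early exit on maxcount inside the loop. The names and the use of unsigned char and ptrdiff_t are hypothetical stand-ins for the STRINGLIB macros and Py_ssize_t; this is not the code that was merged. When maxcount cannot be reached, the loop is delegated to a variant without the cut-off, which compilers are typically able to auto-vectorize:

```c
#include <stddef.h>

/* Hypothetical helper: counts occurrences of p0 with no maxcount cut-off.
   The loop has no early exit, so its trip count is known on entry. */
static inline ptrdiff_t
count_char_no_maxcount(const unsigned char *s, ptrdiff_t n, unsigned char p0)
{
    ptrdiff_t count = 0;
    for (ptrdiff_t i = 0; i < n; i++) {
        count += (s[i] == p0);
    }
    return count;
}

/* Same contract as the loop quoted above, on plain C types. */
static inline ptrdiff_t
count_char(const unsigned char *s, ptrdiff_t n, unsigned char p0,
           ptrdiff_t maxcount)
{
    /* At most n matches are possible, so a maxcount >= n never cuts the
       scan short and the simpler loop returns the same result. */
    if (maxcount >= n) {
        return count_char_no_maxcount(s, n, p0);
    }
    ptrdiff_t count = 0;
    for (ptrdiff_t i = 0; i < n; i++) {
        if (s[i] == p0) {
            count++;
            if (count == maxcount) {
                return maxcount;
            }
        }
    }
    return count;
}
```

Whether the first loop is actually vectorized depends on the compiler and optimization level, so any speedup should be confirmed by benchmarking.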

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response


@vstinner
Member

Implemented by change 2078eb4.
