Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PyIter_Next has ambiguous return value #105201

Closed
iritkatriel opened this issue Jun 1, 2023 · 23 comments
Closed

PyIter_Next has ambiguous return value #105201

iritkatriel opened this issue Jun 1, 2023 · 23 comments
Assignees
Labels
topic-C-API type-feature A feature request or enhancement

Comments

@iritkatriel
Copy link
Member

iritkatriel commented Jun 1, 2023

As discussed in capi-workgroup/problems#1, we have some C API functions that have ambiguous return values, requiring the caller to query PyErr_Occurred() to find out whether there was an error.

We will try to move away from those APIs to alternative ones whose return values non-ambiguously indicate whether there has been an error, without requiring the user to call PyErr_Occurred().

In this issue we will discuss the iterator API. PyIter_Next return NULL for both error and for the iterator being exhausted. PyErr_Occurred() distinguishes between the cases.

Linked PRs

@erlend-aasland
Copy link
Contributor

Before we rush to introduce new APIs, can we clarify the following:

We will try to move away from those APIs to alternative ones whose return values non-ambiguously indicate whether there has been an error.

I interpret this as:

  • for function retuning int, 0 is a successful call, -1 is an error
  • for function returning PyObject pointer, NULL is an error, anything else is ok

Error means a raised exception.

@iritkatriel
Copy link
Member Author

I don't think it's a good API for an iterator if you need to check 2 things just to know whether iteration completed, and then after the loop you need to check again why it was completed.

@erlend-aasland
Copy link
Contributor

Perhaps iterator APIs should get their own issue, since they require ambiguous return values? It seems to me they don't fit in the premises established by the OP.

@iritkatriel
Copy link
Member Author

Sure, let make this issue about iterators.

@iritkatriel iritkatriel changed the title Replace C-API functions with ambiguous return values PyIter_Next has ambiguous return value Jun 2, 2023
@iritkatriel
Copy link
Member Author

My proposal is to add a new PyObject *PyIter_NextItem(PyObject* iter, int *err) which returns the same as PyIter_Next, but also sets *err to 0 in success and -1 on error.

The reason to do this and not int PyIter_NextItem(PyObject* iter, PyObject **item) is because of how this function is used while looping over an iterator: for each iteration we want to know whether we got another value or not. Only after (when we got a NULL) we want to check the error value to see how the iteration exited.

So I am proposing:

PyObject *item;
int err;
while (item = PyIter_NextItem(iterator, &err)) {
    /* do something with item */
    ...
    /* release reference when done */
    Py_DECREF(item);
}
Py_DECREF(iterator);
if (err < 0) {
    /* error */
}
else {
    /* no error */
}

rather than

PyObject *item;
int err;
while ((err = PyIter_NextItem(iterator, &item) ) == 0 && *item != NULL) {
  /* do something with item */
  ...
  /* release reference when done */
  Py_DECREF(item);
}
Py_DECREF(iterator);
if (err < 0) {
  /* error */
}
else {
  /* no error */
}

@iritkatriel
Copy link
Member Author

CC @encukou

@erlend-aasland
Copy link
Contributor

I still fail to see how the problem "API A has ambiguous return value" is solved by introducing API B which also has an ambiguous return value.

@iritkatriel
Copy link
Member Author

I’m giving up on incrementally fixing this in the current c api and will turn my attention to the new c api work. Those who warned me were right, unfortunately.

@iritkatriel iritkatriel closed this as not planned Won't fix, can't repro, duplicate, stale Jun 4, 2023
@iritkatriel
Copy link
Member Author

(See discussion on the attached PR for the full picture).

@gvanrossum
Copy link
Member

I’m giving up on incrementally fixing this in the current c api and will turn my attention to the new c api work. Those who warned me were right, unfortunately.

Isn't this overreacting? Ignoring Raymond's knee-jerk reaction, Erlend's feedback points to a real issue with how we've been talking about this, but is not a reason to give up on incremental fixes. The way I read Erlend's feedback is that the problem isn't that the result or return value is ambiguous, but that it requires calling PyErr_Occurred() to get the full picture. There are a few ways we can design the replacement for PyIter_Next() to avoid PyErr_Occurred(), and we simply have to look at the ergonomics to choose one. That seems a rational choice we can make.

@iritkatriel
Copy link
Member Author

iritkatriel commented Jun 5, 2023

Isn't this overreacting? [...] s not a reason to give up on incremental fixes.

I figured with two -1s and no other responses this would die due to "no consensus", and in general I think "is it worth it" discussions for incremental changes are always going to be opinionated/political, which is not really how I want to spend my time.

But I'll reopen the issue and PR so we can continue.

@iritkatriel iritkatriel reopened this Jun 5, 2023
@erlend-aasland
Copy link
Contributor

FWIW, here's my thoughts about this: I really liked an early variant of your PR where you had an int return value and an PyObject * item output param. IMO, that was an API that is less likely to be misused/misunderstood; the return value aligns with a well established C idiom1. I don't see it as a problem that looping will involva a few more lines of code. As I see it, the resulting "explicit code" is easier to read and reason about. If I can chose between an API designed to save lines in a while loop and an API designed to be as unambiguous as possible, I'll go for the latter any day. But API design is subjective, even though there certainly are aspects which can be discussed objectively.

Also, I appreciate that there is a dedicated place to record API problems and discuss them; I really appreciate Irit's hard work with managing all of this. API design is hard, and I think it is important to discuss proposed APIs thoroughly before landing a design2, rather than rushing that process and landing possibly premature solutions. I also understand that long discussions about new APIs can be discouraging.

Footnotes

  1. I believe the use of well established C idioms should be weighted in API design, as I believe it makes for APIs that are less likely to be misused. Maybe I'm wrong.

  2. Take a peak at the HPy repo and see how carefully they design every single API! Personally, I find it very inspiring to follow their discussions.

@markshannon
Copy link
Member

We should decide in general whether functions that effectively return a (error, value) pair should have the signature
int foo(..., PyObject **value) or PyObject *foo(..., int *err).

I strongly prefer the former: I think it makes for nicer flowing code, makes it harder to miss the error, and is (IMO) more idiomatic C.

As to what the return values should be:

  • -1 for an error. As we use that convention everywhere
  • 0 for a "lesser" result, which in this case is termination (for PyDict_GetItem, the case where the key is absent).
  • 1 for the "greater" result, the presence of a value (for PyDict_GetItem, the case where the key is present).

What is the "lesser" or "greater" result is somewhat subjective, but I think it makes for the least surprising API.

There are three cases to deal with, the return code should reflect that. Any code using the function will need to handle all three cases, with two tests, but should be able to test for the most common case with a single test. The trio of values, -1, 0, -1 allows that.

The version returning an int code can be used just as efficiently as the value returning a PyObject * value.

E.g. with int PyIter_NextItem(PyObject *iter, PyObject **next)

    while ((err = PyIter_NextItem(iterator, &next)) > 0) {
        use(next);
        Py_DECREF(next);
    }
    if (err <= 0) {
        /* Cleanup */
        return -1;
    }
    /* Done */

This sort of three value return is relatively common, so we should have a consistent, documented pattern for it.

@erlend-aasland
Copy link
Contributor

I agree with Mark on all points. I also think we should agree on and document the new API guidelines before solving any existing issues; that way, there will hopefully be fewer long and exhausting discussions for each new API.

@iritkatriel
Copy link
Member Author

@markshannon That's not a bad option, I tried it in an earlier iteration and I'm not sure I landed on anything better. How would you fix PyUnicode_Compare? It currently returns -1, 0 1 and we need to add a fourth value for error.

@erlend-aasland I appreciate your remarks, but let's not refer to what happened on that PR as "discussion". Children of an impressionable age have access to GitHub and they might be reading this. We should teach them better than that.

@markshannon
Copy link
Member

How would you fix PyUnicode_Compare?

PyUnicode_Compare can only fail because it has the wrong type signature.
Change it from int PyUnicode_Compare(PyObject *left, PyObject *right) to
int PyUnicode_Compare(PyUnicodeObject *left, PyUnicodeObject *right) and it becomes infallible.

@markshannon
Copy link
Member

markshannon commented Jun 6, 2023

Assuming you want to keep the more general signature, then yes it does need more return values.
There are in fact five possible return values: error, less, equal, more, unordered. The last for sets, floats, etc.
In which case I would use the following enum:

enum {
     ERROR = -1,
     UNORDERED = 1,
     LESS_THAN = 2,
     GREATER_THAN = 4
     EQUAL = 8
}

See https://github.com/python/cpython/blob/main/Include/internal/pycore_code.h#L469 for why this seemingly odd choice makes sense.

@encukou
Copy link
Member

encukou commented Jun 21, 2023

As to what the return values should be:

  • -1 for an error. As we use that convention everywhere
  • 0 for a "lesser" result, which in this case is termination (for PyDict_GetItem, the case where the key is absent).
  • 1 for the "greater" result, the presence of a value (for PyDict_GetItem, the case where the key is present).

That sounds like a good direction.
A general guideline should also suggest when *result should be set. For the “lesser” case, do we leave it untouched, set it to NULL, or say that users shouldn't use it? What about the “error” case?

For PyDict_GetItem, it seems ergonomic enough to do:

  • -1 for an error, everywhere. *result is undefined, may be garbage, don't touch it.
  • 0 for success
    • result is NULL, where the key is absent
    • result is non-NULL if found

In a PyWeakref_GetRef-style function, determining whether to return 1 or 0 is a (admittedly tiny) bit of extra work that most callers could ignore. Is it worth the consistency? Or should __next__-style function be the odd ones that return 1 for convenience in a while loop?

@markshannon
Copy link
Member

For PyDict_GetItem, I'd prefer to return 0 for not found, and 1 for found.
This seems more consistent; not found is the "lesser" result.
It also reduces memory traffic a bit, as *result is only touched if the value is found.

For example, implementing a version of dict.get, where the key is expected to be missing:

PyObject *
dict_get_probably_missing(PyDictObject *self, PyObject *key, PyObject *default)
{
   
   PyObject *val;
   int res = PyDict_GetItem(self, key, &val);
   if (res == 0) {
       return Py_NewRef(default);
   }
   if (res > 0) {
        return val;
   }
   else {
       return NULL;
   }
}

@markshannon
Copy link
Member

PyWeakref_GetRef does a lot of work to check that the weakref is not NULL and is a weak ref.
See capi-workgroup/problems#31

@erlend-aasland
Copy link
Contributor

Can we please open a devguide issue for establishing and documenting these API guidelines, and then move the general discussion we're now having over there? When that discussion has landed, and guidelines are documented, we can start continue fixing existing problematic APIs by adding new APIs according to the established guidelines.

@iritkatriel
Copy link
Member Author

I'm not going to pursue this.

@erlend-aasland
Copy link
Contributor

The C API WG voted on and agreed on the following API:

int PyIter_NextItem(PyObject *iter, PyObject **item)

  • Return -1 on error; raise an exception and set *item to NULL
  • Return 0 and set *item to NULL on StopIteration
  • Return 1 and set *item to a strong reference to the iterator item on success

iter and item cannot be NULL.

@erlend-aasland erlend-aasland self-assigned this Jul 24, 2024
erlend-aasland added a commit to erlend-aasland/cpython that referenced this issue Jul 26, 2024
Return -1 and set an exception on error; return 0 if the iterator is
exhausted, and return 1 if the next item was fetched successfully.

Prefer this API to PyIter_Next(), which requires the caller to use
PyErr_Occurred() to differentiate between iterator exhaution and errors.

Co-authered-by: Irit Katriel <iritkatriel@yahoo.com>
erlend-aasland added a commit that referenced this issue Aug 7, 2024
Return -1 and set an exception on error; return 0 if the iterator is
exhausted, and return 1 if the next item was fetched successfully.

Prefer this API to PyIter_Next(), which requires the caller to use
PyErr_Occurred() to differentiate between iterator exhaustion and errors.

Co-authered-by: Irit Katriel <iritkatriel@yahoo.com>
blhsing pushed a commit to blhsing/cpython that referenced this issue Aug 22, 2024
Return -1 and set an exception on error; return 0 if the iterator is
exhausted, and return 1 if the next item was fetched successfully.

Prefer this API to PyIter_Next(), which requires the caller to use
PyErr_Occurred() to differentiate between iterator exhaustion and errors.

Co-authered-by: Irit Katriel <iritkatriel@yahoo.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-C-API type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

5 participants