Order of ids returned, v2/commerce/listings
Leighwyn McClendon.9346:
If anyone knows how the commerce API orders the data it returns, could they share?
Example, test link
I gave it five IDs that were not in numerical order. Specifically, they were ordered “1, 4, 5, 2, 3”.
When I inspect the JSON returned, it gives me the values in “1, 2, 4, 5, 3” order. Still not sorted by ID, not in the order I gave them, and I can’t really find anything else that it might’ve sorted it by to achieve the order it gave them back in.
Ideas?
smiley.1438:
/dontcare?
*hint* http://phpjs.org/functions/array_multisort/ *hint*
Leighwyn McClendon.9346:
> /dontcare?
I am struggling to take the high road with that reply, but here goes. The behavior was strange enough to make me curious. I thought that this was a good place to ask whether someone knew what might be going on behind the scenes to cause it.
Yes, there are ways to work around the seemingly random order that the API delivers the output in, ways that are obvious enough to not need hints. What wasn’t obvious is why the API might output in that order at all.
StevenL.3761:
My best guess is that listings are (lazily) loaded in parallel, so the order would be undefined. For practical purposes, that means that the list goes from listings with few offers to listings with lots of offers.
No idea if that’s how it really works.
Alcarin.9024:
I made a test to try to work out what the order is and why ( http://jsfiddle.net/Alcarin/gy13r52t/2/ ).
The results: if you request 36088,46047,46054,45881,46045 60 times, you get:
30% of the time: 36088,45881,46045,46047,46054
10% of the time: 36088,45881,46047,46045,46054
10% of the time: 36088,46054,45881,46045,46047
10% of the time: 36088,45881,46045,46054,46047
The remaining 40% of the time is split across 14 other orderings.
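A tally like the one above can be produced with a few lines of JavaScript. This is only a sketch of the counting step, not the code from the fiddle: `tallyOrders` is a hypothetical helper, and the sample responses are made up for illustration.

```javascript
// Count how often each ordering of IDs shows up across repeated responses.
// `responses` is an array of ID arrays, one per request (hypothetical input).
function tallyOrders(responses) {
  const counts = new Map();
  for (const ids of responses) {
    const key = ids.join(',');
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // Most frequent orderings first.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Made-up sample: three responses, two of them in the same order.
const sampleResponses = [
  [36088, 45881, 46045, 46047, 46054],
  [36088, 45881, 46047, 46045, 46054],
  [36088, 45881, 46045, 46047, 46054],
];
const tally = tallyOrders(sampleResponses);
// tally[0] is ['36088,45881,46045,46047,46054', 2]
```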
So my guess here is that each request is split across different workers, each worker searches for a single item, and the response is built once all workers have found their items. The backend database is probably ordered ascending by item_id, so most of the time the workers finish in that order, but if a worker is a little busy, a slight delay can cause the order to no longer be ascending by item_id.
So, don’t trust the response order at all. Reorder the response as suggested by smiley, or search within the response to get the items in the order you want.
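One way to restore the requested order on the client, as a minimal sketch: `reorderByRequest` is a hypothetical helper, and the `item_id` field name is an assumption about the response shape. The sample IDs are the ones from the original question.

```javascript
// Return the response items in the same order as the requested IDs.
// `key` names the field holding the ID (assumed to be 'item_id' here).
function reorderByRequest(requestedIds, items, key = 'item_id') {
  const byId = new Map(items.map(item => [item[key], item]));
  // IDs that the API did not return are skipped.
  return requestedIds.filter(id => byId.has(id)).map(id => byId.get(id));
}

const requested = [1, 4, 5, 2, 3];
// Stand-in for the decoded JSON, in the order the API happened to return it.
const response = [1, 2, 4, 5, 3].map(id => ({ item_id: id }));

const ordered = reorderByRequest(requested, response);
// ordered.map(item => item.item_id) is [1, 4, 5, 2, 3]
```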
smiley.1438:
That’s the reason why I’d prefer the results indexed by their ID instead of having the ID as an actual value.
{
  "item_id": {
    "key1": "value1"
    ...
  },
  ...
}
It would actually save a lot of hassle, since you could directly access the objects and the order wouldn’t matter at all. (I’ve suggested that more than once…). Also, the identifier is passed as the key to functions like jQuery’s $.each() or Prototype’s Enumerable.each(), or could just as well be used in foreach() loops in several C-like languages, so I wonder why this suboptimal response format was chosen.
darthmaim.6017:
Well, in most cases you are looping at least once over all the items returned and you can build your dictionary there, so in the end it doesn’t even matter.
smiley.1438:
That’s what I currently do. However, having an associative array (“dictionary” as you call it…) in the first place would save looping over it twice. In fact, you wouldn’t even need to loop over the results at all, but over your list of requested IDs and just assign the result values.
The API should not complicate stuff but be as convenient as possible. I highly doubt that not being able to directly access the results is convenient at all.
darthmaim.6017:
Well, it doesn’t matter whether you loop over your IDs or the result; both are O(n) and should take about the same time. You can build your dictionary there to later access specific items of the result with a complexity of O(1). If you don’t want to access all the items you request, you shouldn’t request them in the first place.
smiley.1438:
You got me wrong here. A little example to clarify:
$requested_items = [123, 456, 789];
$url = '.../v2/endpoint?'.http_build_query(['ids' => implode(',', $requested_items)]);
// -> request to $url, returns $result_items after json_decode()
foreach($requested_items as $id){
    // do stuff
    print_r($result_items[$id]);
}
This way it wouldn’t even matter if you requested the same ID a couple of times and got just one result for it (IIRC someone complained about that the other day… (…))
darthmaim.6017:
Well, it’s just one more line to do it with the current API if you need to do that:
$requestedIDs = [ 123, 456, 789 ];
$result = getItemsFromApi( $requestedIDs ); // builds the url, requests it and decodes the json
$result = array_column( $result, null, 'item_id' ); // this is new
foreach( $requestedIDs as $id ){
    $item = $result[ $id ];
    // do stuff
    print_r($item);
}
smiley.1438:
(we’re not only speaking about php which has some neat functions to sort this out, my example was just in php because lazy)
(also: (PHP 5 >= 5.5.0), so array_column() may not be available to everyone)
darthmaim.6017:
It’s the same for all other languages: even if they don’t have such a nice helper function built in, you can always write it yourself. Since querying the API will almost always be slower than what you are doing with the data once you’ve got it, your code doesn’t even need to be super fast.
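For instance, a hand-written equivalent in JavaScript might look like the sketch below. `indexBy` is a hypothetical helper name, and the sample data (including the `item_id` and `name` fields) is made up to mirror the PHP example above.

```javascript
// Build an object keyed by ID from an array of result items,
// roughly what array_column($result, null, 'item_id') does in PHP.
function indexBy(items, key) {
  const index = {};
  for (const item of items) {
    index[item[key]] = item;
  }
  return index;
}

const requestedIDs = [123, 456, 789];
// Stand-in for the decoded API response, deliberately out of order.
const result = [
  { item_id: 456, name: 'b' },
  { item_id: 789, name: 'c' },
  { item_id: 123, name: 'a' },
];

const lookup = indexBy(result, 'item_id');
// Walk the requested IDs and pick each item out of the lookup.
const names = requestedIDs.map(id => lookup[id].name);
// names is ['a', 'b', 'c']
```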
smiley.1438:
So why exactly should we go the extra mile in just about every language? And what is the advantage of having the ID as an actual value in an unordered array over having it as the key of an associative array?
Pat Cavit.9234:
> My best guess is that listings are (lazily) loaded in parallel, so the order would be undefined. For practical purposes, that means that the list goes from listings with few offers to listings with lots of offers.
> No idea if that’s how it really works.
Yup, we make a bunch of async requests for the data and simply assemble it in the order it returns.
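To illustrate the effect, here is a synchronous toy model of assembling results in completion order. It is not the actual server code: the function name and the latencies are made up, and real lookups would be asynchronous.

```javascript
// Toy model: each lookup finishes after some latency, and results are
// appended in the order they finish rather than the order requested.
function assembleInCompletionOrder(lookups) {
  return [...lookups]
    .sort((a, b) => a.latencyMs - b.latencyMs) // earliest finisher first
    .map(lookup => lookup.id);
}

// Requested order with made-up latencies for each lookup.
const lookups = [
  { id: 36088, latencyMs: 5 },
  { id: 46047, latencyMs: 9 },
  { id: 46054, latencyMs: 12 },
  { id: 45881, latencyMs: 6 },
  { id: 46045, latencyMs: 11 },
];

const arrival = assembleInCompletionOrder(lookups);
// arrival is [36088, 45881, 46047, 46045, 46054]
```

The set of IDs is always the same; only the order depends on which lookup finishes first.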
Now, as for object-vs-array:
Shoving results onto the end of an array as we get them back is easy and lets you do nice things like check the length of the array to see how many items you got back. It also means you can use useful array functionality like (JS examples) .some(), .every(), .reduce(), etc w/o first having to jump through a bunch of hoops.
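A quick sketch of what that looks like in practice. The sample data and its field names (`item_id`, `sells`, `unit_price`, `quantity`) are made up for illustration and may not match the real response.

```javascript
// Made-up array response in the style described above.
const listings = [
  { item_id: 36088, sells: [{ unit_price: 120, quantity: 3 }] },
  { item_id: 45881, sells: [] },
  { item_id: 46045, sells: [{ unit_price: 80, quantity: 10 }, { unit_price: 85, quantity: 2 }] },
];

// How many items came back.
const count = listings.length; // 3

// Is there any item with no sell offers?
const anyEmpty = listings.some(l => l.sells.length === 0); // true

// Does every item carry a numeric ID?
const allHaveId = listings.every(l => typeof l.item_id === 'number'); // true

// Total quantity on offer across all items.
const totalQuantity = listings.reduce(
  (sum, l) => sum + l.sells.reduce((s, offer) => s + offer.quantity, 0),
  0
); // 15
```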
Constant-time lookups of a specific data ID don’t feel like a worthwhile trade-off for an API that is generally intended to provide bulk data, and if you did want that you could ask the API for data for the specific ID you cared about.
I’m open to sorting by ID once we have all the data back if you’d find it useful. I don’t think it would help programmatic usage much, but it does make the data nicer to look at.
StevenL.3761:
Is the public API written in .NET? You might be able to get better performance if you use Task.WhenAny() to process requests as soon as they complete. Waiting for all requests to complete before handling their response means wasted CPU time. Client applications will benefit as well if they support streaming JSON while it is still being created.
Pat Cavit.9234:
> Is the public API written in .NET? You might be able to get better performance if you use Task.WhenAny() to process requests as soon as they complete. Waiting for all requests to complete before handling their response means wasted CPU time. Client applications will benefit as well if they support streaming JSON while it is still being created.
We don’t use .NET & for simplicity’s sake we don’t stream responses right now. We’re processing each individual request as soon as it completes but we’re not blocking on the responses or anything. It works like nodejs, conceptually.
StevenL.3761:
Oooh I had no idea that nodejs works with IIS.
Pat Cavit.9234:
> Oooh I had no idea that nodejs works with IIS.
It’s not nodejs, it just works like it. We use ARR as a load-balancer and HTTPS endpoint which is why you’re thinking we use IIS.
poke.3712:
> Yup, we make a bunch of async requests for the data and simply assemble it in the order it returns.
Is that order consistent enough for paging though? I.e. when using the paging mechanism, can I be sure that I get every object at some point, or is it possible that some will be missed because the order for one page ended up being different than the other?
Pat Cavit.9234:
> > Yup, we make a bunch of async requests for the data and simply assemble it in the order it returns.
> Is that order consistent enough for paging though? I.e. when using the paging mechanism, can I be sure that I get every object at some point, or is it possible that some will be missed because the order for one page ended up being different than the other?
We start w/ the complete list of IDs for paging, so yes. Each individual page may not have a guaranteed order, but it does have a guaranteed set of IDs that will be returned.
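On the client side, that means pages can be merged without caring about the order inside each page. A minimal sketch, assuming the pages have already been fetched and decoded: `collectPages` is a hypothetical helper, and the `item_id` field and sample pages are made up.

```javascript
// Merge already-fetched pages into one Map keyed by ID.
// The order of items inside each page does not affect the result.
function collectPages(pages, key = 'item_id') {
  const seen = new Map();
  for (const page of pages) {
    for (const item of page) {
      seen.set(item[key], item);
    }
  }
  return seen;
}

// Made-up pages: each has a fixed set of IDs in an arbitrary order.
const pages = [
  [{ item_id: 3 }, { item_id: 1 }],
  [{ item_id: 4 }, { item_id: 2 }],
];

const all = collectPages(pages);
const sortedIds = [...all.keys()].sort((a, b) => a - b);
// sortedIds is [1, 2, 3, 4]
```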