Data Types
StevenL.3761:
This has been suggested before (I don’t remember where), but can we get more information about the data types used by the backend’s database? Replicating a database without the original schema takes a lot of guesswork. I’m talking about data type lengths, nullability (is that a word?), default values, and also any other sort of column constraint.
Lawton Campbell.8517:
There isn’t an RDBMS behind the API, so there’s no schema per say. I think the quickest approach is to pull the entire dataset and tinker with it until it makes sense.
StevenL.3761:
Huh? There’s a better way to store all this data?
Lawton Campbell.8517:
Sorry, I meant to suggest using the existing data to infer what constraints can be applied. The backend doesn’t really follow any rules (in particular, string lengths are unbounded), so the existing data is the only thing we have to go by.
StevenL.3761:
Oh okay. Then it only makes sense for me to allow unbounded strings in my own database.
One more question: do you use a table per type or a table per hierarchy?
I’m asking because the items API has some… interesting… inheritance issues. Do you store all items in a single table regardless of the item type? Or do you have a table for each type? What about subtypes?
The most interesting of all items are probably the unlocker consumables. They have an ‘item_type’, a ‘consumable_type’ and an ‘unlock_type’.
Lawton Campbell.8517:
The API backend don’t have any tables whatsoever.
I’d probably store the attributes of each subtype in a separate table to maximize normalization. e.g., the items
table would just store the attributes common to all items; for items with a “consumable” type, the items_consumable
table would have the corresponding type-specific fields; and for consumables with an “unlock” type, the items_unlock_dye
or items_unlock_recipe
would have those details.
So for this somewhat contrived example, I would probably use something like —
CREATE TABLE items (
item_id INTEGER PRIMARY KEY,
item_name TEXT NOT NULL,
item_type INTEGER NOT NULL
-- etc
);
CREATE TABLE items_consumable (
item_id INTEGER PRIMARY KEY,
item_consumable_type INTEGER NOT NULL,
item_consumable_desc TEXT,
item_consumable_duration INTEGER
-- etc
);
CREATE TABLE items_unlock_dye (
item_id INTEGER PRIMARY KEY,
item_unlock_color INTEGER NOT NULL
);
(the types/nullability of these fields are from http://wiki.guildwars2.com/wiki/API:2/items)
And then, for example, https://api.guildwars2.com/v2/items/20357 would have the following values inserted into the tables:
items(item_id=20357, item_name="Ash Dye", item_type=Consumable)
items_consumable(item_id=20357, item_consumable_type=Unlock, ...)
items_unlock_dye(item_id=20357, item_unlock_color=474)
Alcarin.9024:
Months ago, I’ve done something similar to your suggestion Lawton, in about 10 hours of work (wiki was not as completed as it is now) I’ve created a DB with 29 tables without armors and weapons.
Amazed by the amount of work needed to complete the item DB (and by the amazing query needed to only get info about a single item), I’ve started to think “Ok, I’m doing something wrong”.
After some research I’ve found the solution, I’ve started to use a NoSQL DB that do not have a schema at all. With that kind of DB, I was able to create the PHP code to crawl and replicate GW2 DB in a couple of hours, query also was very easy with all kind of filtering by type/sub-type/sub-sub-type.
So StevenL, if you want to completly replicate GW2 DB go with a NoSQL DB, otherwise try to use as much as possible strings instead of subtables or enum, and create a lot of index. These are the easiest way.
Lawton Campbell.8517:
Months ago, I’ve done something similar to your suggestion Lawton, in about 10 hours of work (wiki was not as completed as it is now) I’ve created a DB with 29 tables without armors and weapons.
Huh, 29 tables seems a bit high. I’d love to take a look at your schema if you’ve still got it somewhere, as I’m probably forgetting something (working on a personal one right now, estimate around 19 tables).
A NoSQL solution is definitely more straightforward though, since the data format is so varied.
Alcarin.9024:
Yes 29 tables are a lot, but that’s what I got trying to allow flexibility on structure (allowing new types/sub-types to be added dinamically) and querying capability/rapidity (indexing only numeric values). So I’ve not used enum type (if you modify an enum type you need to rebuild the entire table), neither indexed string for types/sub-types, I’ve only used external table with foreign keys.
This is the link to the sql schema: https://drive.google.com/open?id=0B8vP2yhqUTYEb09sM3JiTmFBRDQ&authuser=0
I’ve also a MySQL Workbench file if you use that program: https://drive.google.com/open?id=0B8vP2yhqUTYEVmVnN08xcmM3YUE&authuser=0
A really important factor is the use that you need the database for. I was building the DB for a ItemDB website, and filtering by various type was a very crucial operation that need to be fast and fleasible. For many uses in example, it is less important indexing sub-types/sub-sub-types, so they can be a string instead of a foreign key…
Killer Rhino.6794:
I’d probably store the attributes of each subtype in a separate table to maximize normalization. e.g., the
items
table would just store the attributes common to all items; for items with a “consumable” type, theitems_consumable
table would have the corresponding type-specific fields; and for consumables with an “unlock” type, theitems_unlock_dye
oritems_unlock_recipe
would have those details.
This is the exact solution I came to in my own applications; gets the job done.
As an aside: thanks for your ongoing communications in this subforum, Lawton!
StevenL.3761:
Okay… now how do I create a view in SQL that flattens the table hierarchy?
I don’t know how to join the [items] table with one of the many sub-type tables based on the actual values in the [item_type] column.
Someone on stackoverflow suggested left outer joins, but it doesn’t look quite right to me
https://gist.github.com/StevenLiekens/4b66c22ffd1b92e5d498
Dr Ishmael.9685:
Outer joins are used when you want to append fields from another table, but you still want to retain all records from the driver table. (Inner joins only return records where the join key exists on both tables.) Any records where the join key doesn’t exist on the lookup table will have nulls appended.
However, constructing a single “master” view to flatten absolutely everything may not be the best approach. Outer joins are less efficient than inner joins, and stacking that many of them into one view will make it very sluggish.
My approach would be to construct separate views for each primary type, where I would use an inner join on item_id. Then I would use an outer join for any subtypes. This mockup assumes that all fields are named uniquely across your tables. If they’re not, you’ll have to replace the * with an explicit field list with aliases on the duplicate field names.
CREATE VIEW consumables_vw
AS
SELECT *
FROM [items]
INNER JOIN [items_consumable] ON [items].[item_id] = [items_consumable].[item_id]
LEFT OUTER JOIN [items_unlock_dye] ON [items_consumable].[item_id] = [items_unlock_dye].[item_id]
LEFT OUTER JOIN [items_unlock_recipe] ON [items_consumable].[item_id] = [items_unlock_recipe].[item_id]
I’m not sure why you have separate tables for dye and (I’m assuming) recipe unlocks, they don’t have any additional data structures.
StevenL.3761:
Dyes and recipes are one of the “interesting” inheritance issues that I was talking about in one of my previous posts. They don’t have their own data structure, but the ‘color_id’ field and the ‘recipe_id’ field depend on ‘unlock_type’ having a specific value.
With that in mind, you would want to split them out into separate tables just so that you can apply a NOT NULL constraint to these fields.
Dr Ishmael.9685:
Ah, I see. The only constraints I have in the database are on ID fields. The way I went with lower-level validations like that was to check them within my API-to-DB object builders and note any errata in a “warning” field on the primary record (items_tb.item_warnings, recipe_tb.recipe_warnings, etc.). This way my project is mostly immune to any bugs that slip into the API, but I can still find out about them.
Francis Crick.5021:
The design data doesn’t use anything like a relational database, as Lawton mentioned. It’s actually stored in a format fairly similar to what the API spits out, but we use guids to reference one object from another, and it’s quite a bit more verbose. I’ll talk to Lawton about what we can do to generate schemas that you might find useful. Everything does follow a schema, albeit an fairly complicated and ever evolving one.