Wikispooks talk:Semantic Mediawiki

From Wikispooks

Approach

How about, before we do any more coding:

  1. A freeze on development of new material and a close look at what we already have
  2. Documentation of the system as it currently is
  3. Discussion about existing shortcomings and strongpoints
  4. Efforts to simplify/fix existing code

Robin (talk) 04:16, 17 December 2013 (GMT)

That looks like a sound approach to me.
  1. New material - meaning new semantic and category material, because I have to put up new Page and document 'content' when the muse strikes, otherwise it goes down a memory hole.
  2. Documentation? - that's clearly your forte. I'm happy for you to lead and I'll chip in when I think it is needed.
  3. Discussion - new page(s) for the specific purpose?
  4. Simplify/Fix existing code - again, your forte and I'll chip in as and when.
--Peter P (talk) 10:29, 17 December 2013 (GMT)

Namespaces

How about using more namespaces to differentiate page subjects?

  Namespace   Mandatory Template   Semantic Form   Use
  document:   Template:Document    Form:Document   3rd Party Publications
  book:       Template:Book        Form:Book       Articles about books
  person:     Template:Person      Form:Person     Articles about people
  event:      Template:Event       Form:Event      Articles about events

I like the semantic forms - a good help to people to create semantic data.

Robin (talk) 12:36, 13 December 2013 (GMT)

What's wrong with disciplined use of top-level categories to do the same thing - plus radical pruning of the category tree by replacing most sub-categories with properties - but better thought through? A category can be assigned a default form too. I confess to still struggling a little with the best/optimal way to differentiate Name-space, Category and Property usage though. For example, I was going to put timeline events in a separate namespace; even created it and put all the necessary stuff in LocalSettings.php. In the event I didn't use it (yet) and currently don't plan to. Beyond a certain clarity, all that user-defined namespaces do is place an additional word (with hidden number for DB usage) and colon before the page title which can also be a bit of a pain. And there are a lot of both 'Person' and 'Event' pages already in 'Main' - though most are assigned to sub-categories of Category:People and Category:Events. --Peter P (talk) 13:35, 13 December 2013 (GMT)
Oh, I hadn't noted that forms can be assigned to categories - so namespaces are not actually needed to do this. I think the clarity they afford may make them worthwhile in the end (today I edited [[Document:The Franklin Scandal]], an introduction to Book - The Franklin Scandal about The Franklin Scandal). Though I made the Template:Doctypes categories today, this is a bit of a transitional step/use of legacy technology - the actual 'meaning' of category (~"is associated with") is a bit fuzzy, so longer term I think we're better off replacing them with more precise semantic alternatives, once we know enough about how to fit the bits together. Inspired by the success of SMW here I also added SMW to the UG Wiki today, so I should be getting a bit more experience on how to manage things. Robin (talk) 15:06, 13 December 2013 (GMT)
Probably best to let things settle down for a bit after all these major changes and upgrades. As you say get a bit more experience managing things. BTW - Your (almost tentative) suggestion of making Document subjects the name(s) of ordinary (Main NS) articles was inspired. It has the potential to tie all 3rd party content to crowd-sourced articles that can further develop, research and explain it. I can't help tinkering every time I see an edit on 'Recent Changes' to a document that takes me back to when I originally put it up. There's a lot of stuff here that would benefit from easy intuitive linking and browsing and SMW looks like an impressive - and fairly easy - way to realise it - to me anyway. --Peter P (talk) 16:59, 13 December 2013 (GMT)
The 'wait and see' approach to namespaces seems like a good one for now. Current arguments in favour:
  1. General Clarity (i.e. Human understandability of what means what)
  2. Ability to tie a form/template to an unmade page link (Event:Fall Of The Berlin Wall)
  3. Clarity in defining property usage, e.g. (Book:* has review Review:*) = spotting inconsistent SMW data use

Arguments against:

  1. More work to set it up, probability of difficult edge cases which belong in multiple/none of the distinct namespaces
  2. Not how Wikipedia does it

Robin (talk) 04:17, 26 December 2013 (GMT)
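
Argument 3 in favour (namespaces make inconsistent property use easy to spot) could be sketched roughly as below. This is only an illustration in Python, not anything SMW provides: the rule table, the helper names and the example page titles are all made up, and the prefix test is naive (a Main page whose title contains a colon would be misread as having a namespace).

```python
# Sketch only: the rule table and the page titles are hypothetical examples.

def namespace(title):
    """Return the namespace prefix of a page title, or 'Main' if it has none.

    Naive: a Main-namespace title that itself contains a colon is misclassified.
    """
    prefix, sep, _ = title.partition(":")
    return prefix if sep else "Main"

# property -> (expected subject namespace, expected object namespace)
RULES = {"has review": ("Book", "Review")}

def inconsistent(triples, rules=RULES):
    """Return the triples whose namespaces break the rule for their property."""
    bad = []
    for subject, prop, obj in triples:
        if prop in rules:
            expected = rules[prop]
            if (namespace(subject), namespace(obj)) != expected:
                bad.append((subject, prop, obj))
    return bad

data = [
    ("Book:Example Title", "has review", "Review:Example Title"),   # fits the rule
    ("Event:Example Event", "has review", "Review:Example Title"),  # wrong subject namespace
    ("Book:Example Title", "has review", "Some Main Page"),         # wrong object namespace
]
problems = inconsistent(data)
for triple in problems:
    print(triple)
```

With everything in Main, the same check would need a category or property lookup per page instead of a simple look at the title.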

Tying an unmade page to a form is the one most likely to trump all the others. Creating a new page for the first time is probably the biggest hurdle for prospective new editors and replete with issues at present. I have been mulling a major change to the main page, mainly to make the use of 'Form:Document' more obvious for example. But we still face the big issue of 'Documents' - ie category 'Doc' being in both 'NS:Document' and 'NS:Image/File'. Also MOST Books (ie not book reviews or extracts of books but complete books) are in 'NS:Files/Images'. Can't do much more 'till tomorrow - Merry Xmas --Peter P (talk) 08:11, 26 December 2013 (GMT)

More Thoughts

I think we should stick with the existing namespaces - at least for now - and proceed as follows:

Have separate namespace templates and default forms for NS:Document and NS:File - DocProv as is, and FileProv cloned with any minor modifications required, plus Form:Document and a new Form:File. They will clearly be very similar - almost clones in fact. At present we cannot assign a default form (ie show the 'edit with form' option) for Files of category 'Doc' because 'Doc' also includes NS Document pages that already have a default form. The FileProv template and form need not cater for image files because they will only be used for editing pre-existing pages that are created by the regular file upload process.

I'm going to pitch in a vote for keeping these identical (i.e. with the forward, as currently is) unless I can find a good reason to split them. There is nothing to stop the different namespaces adding separate parameters if needed. But having multiple copies of the same code is asking for trouble - unnecessary work to update etc. Robin (talk) 02:16, 15 December 2013 (GMT)
OK. I've removed the 'Has default form Document' property from the Document namespace and added it to the 'Doc' category. The 'Edit with form' option now appears for ALL pages of Category:Doc, whatever namespace they are in. There are well over 600 such files. I reckon maybe 150 of their pages currently include even the 'FileProv' template so there's plenty of work for anyone who fancies making semantics available even for the few properties currently defined :-)) --Peter P (talk) 07:49, 15 December 2013 (GMT)

"We can then have as many separate template/form combinations in NS:Main as may be necessary for certain top-level categories (ie Books, Events, People etc) and there would be NO default form for NS:Main itself - just selected categories. Thoughts?" --Peter P (talk) 19:40, 14 December 2013 (GMT)

Sure - Main shouldn't have a default form, as it's a catch-all category. Currently that just means non-documents, but in future, that might be 'everything else' after books, documents, events, people have been removed. I'm still leaning towards namespaces as a way to keep things clear. e.g. If someone makes a link "[[Event:Bay Of Pigs]]", it's clear what they mean, and someone clicking on that link can automatically get the right form to create it. But that's not to say that this is a priority - we can try this out with categories for now, still got some groundwork to do building the appropriate templates and semantic forms. Robin (talk) 02:16, 15 December 2013 (GMT)
OK with that and agree we should wait-see --Peter P (talk) 07:49, 15 December 2013 (GMT)
Also, on 'Books' there are already about 100 books in NS:File and none anywhere else. They are (or should be) in Category:EBooks, though many remain uncategorised - ie full books, not reviews or synopses or extracts or articles about books. Again, we need to be very clear about what we are defining with namespaces, categories and especially properties because there are already a good few such articles in Category:Books - That's all illustrative of what I mean by the category tree being a mess - the result of being too 'fuzzy' about definitions --Peter P (talk) 08:09, 15 December 2013 (GMT)

Properties

I am a bit clearer about properties now. In choosing a property name it is important to be clear about what EXACTLY it describes - ie what its subject and object are. For example, present use of "Is author" for Documents is semantically incorrect. A document isn't an author, it HAS an author. A person IS an author. So, I propose to change the 'DocProv', 'FileProv' and 'IFrameDoc' templates - and any others that currently use 'Is author' to identify the content author - to Has author, using the standard RDF (Document/content)-subject (Has)-predicate (author)-object model. The 'Is author' property can remain for People who ARE authors - (Person)-subject (is)-predicate (author)-object. A document HAS author x because the Document is the subject and x is the object. If a property has type "Page" then a page (pre-existing or not) is its object. All this seemed trivial to me until I assigned the property 'Is author' to the name 'William Blum' on the William Blum page and saw that that caused the page to be included in the list of his authored documents displayed by the template 'AuthorDocuments' (could live with it but a niggling inconsistency and probably the first of many if not nipped in the bud). So I went and had a good read about RDF etc. I'd rather be picky close to the outset of SMW development than be faced with unraveling a mess (like the current category tree) later. All this also applies to 'Is publisher', 'Is recipient' and 'Is translator' - it's the document that HAS publisher, recipient and/or translator. The other current big use property ('Is about') is - thankfully - RDF correct.
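
To make the 'William Blum' inconsistency concrete, here is a toy sketch in Python. It is not how SMW stores or queries data and not the 'AuthorDocuments' template itself, just plain (subject, property, object) triples with a lookup function I've named subjects_with for illustration. The document and file titles are invented; only 'William Blum', 'Is author' and 'Has author' come from the discussion above.

```python
# Illustrative sketch only: a toy triple list, not SMW's storage or query engine.
# Page titles other than 'William Blum' are hypothetical.

def subjects_with(triples, prop, obj):
    """Return every subject page that carries the pair (prop, obj)."""
    return sorted(s for s, p, o in triples if p == prop and o == obj)

# Before: documents AND the person page all use 'Is author'.
before = [
    ("Document:Example Essay", "Is author", "William Blum"),
    ("File:Example Book.pdf", "Is author", "William Blum"),
    ("William Blum", "Is author", "William Blum"),
]

# After: documents use 'Has author'; the person page keeps 'Is author'.
after = [
    ("Document:Example Essay", "Has author", "William Blum"),
    ("File:Example Book.pdf", "Has author", "William Blum"),
    ("William Blum", "Is author", "William Blum"),
]

# A list of "documents by William Blum" built on 'Is author' also picks up
# the person page; built on 'Has author' it returns only the documents.
docs_before = subjects_with(before, "Is author", "William Blum")
docs_after = subjects_with(after, "Has author", "William Blum")
print(docs_before)
print(docs_after)
```

The same reasoning would apply to 'Is publisher', 'Is recipient' and 'Is translator'.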

I'd like to tackle this Sunday pm 15/12 before rebuilding all semantic data and trying to get Datastore 2 operational. However, I'd like your thoughts first - It can be left 'till next week otherwise --Peter P (talk) 19:06, 14 December 2013 (GMT)

Yes, defining RDF there is no room for ambiguity. 'Is author', 'is recipient' (& probably others) should be 'has author', 'has recipient'. Good work for spotting this. The sooner the better. As long as it's done through templates, it'll be easy to fix. Robin (talk) 01:45, 15 December 2013 (GMT)
I'll spend time on this pm today. I'll do the properties first. And - heads up - probably safest to disable editing whilst rebuilding the data Stores from scratch. That may have to wait until tomorrow depending on progress --Peter P (talk) 07:49, 15 December 2013 (GMT)

Cleaning unused properties

There must be ways to remove these unused properties - if needs be, using SQL queries. A higher level way is preferable of course, so possibly from Special:SMWAdmin. If that doesn't do it, it might be worth reading http://semantic-mediawiki.org/wiki/Help:Repairing_SMW%27s_data. Robin (talk) 12:23, 13 December 2013 (GMT)

That was my initial reaction on seeing them too, but inquiries to the SMW mailing list brought several replies from people with the same problem and none from the SMW admins/gurus with a possible solution - other than direct SQL querying with phpMyAdmin. I will have a good look at the 'repairing data' pages again though because site SMW functionality is bound to be compromised to some degree by still using 'Data Store 1'. What I have learned from the SMW list and site posts to date is that the SMW user-base is much smaller than I previously thought. That does not bother me too much because the developers are keen as mustard. The downside is that they have little time for non-geeks with what they probably see as elementary problems outside genuine and obvious bugs - of which there are many - but usually promptly fixed. --Peter P (talk) 13:12, 13 December 2013 (GMT)
Looks like the SMW underlying DB data can be rebuilt from scratch. Not just the SMW admin update script, but a delete the lot and rebuild approach. It's server intensive with potential for screw-ups too so I want to be quite sure before embarking on it. Also, if you have SMW on UG it is probably worth announcing it and joining the SMW list. With no obvious connection between us, separate posts on issues may get more attention from the geeks who have the answers - Not that I wish to hide connections but let others make them - something like that. --Peter P (talk) 17:10, 13 December 2013 (GMT)

Categories

These are so simple and lightweight, they'll probably always have a role, but I like the idea of replacing some of them with other structures. Robin (talk) 04:16, 17 December 2013 (GMT)

I agree, but I still struggle on how best to delineate their use (in the manner of a simple rule) from other structures - especially properties. I was leaning towards a rule which says Categories must always be a plural noun and apply strictly to its dictionary definition. ie in the case of 'Authors', it means 'Authors' and NOT (necessarily) 'Authors with work on WS' which strictly (semantically) speaking is a sub category. The problem with that case is that if a person mentioned in an article IS an author, that fact may be useful semantic information whether or not they have either a page or work on WS - and properties are the only way to cater for such a case. OK a red link to indicate a wanted page followed by creation of a relevant category does the trick but properties do it in one step if they are left with the default type of 'Page'. I've cogitated long and hard about this and am still waiting for a eureka moment --Peter P (talk) 07:32, 17 December 2013 (GMT)
I like the idea of fixing the singular/plural issue, of course. But that's probably about all I'd fix as a rule. I see categories as a fallback, a catch-all which is better than nothing if no sharper organisational system is at hand. As such, we should stop making them! :) Instead, looking at what we can replace with properties/concepts is a good direction. Author categories are low hanging fruit. Template:AuthorDocuments doesn't use categories at all, does it - not even 'is author'? But it obsoletes all the individual author categories at a stroke. Are there other categories we can dispense with as easily? Robin (talk) 09:30, 17 December 2013 (GMT)

Concepts

These look worth trying out. Where to start? Robin (talk) 04:16, 17 December 2013 (GMT)

I agree with that too. I'm well aware of them. My biggest problem is that I have been acutely aware of an ever more unruly category tree that has grown like noddy and is not MUCH use for browsing - and I'd like to try and avoid compounding that problem by doing the same with SMW. Still, I guess the only way to clarify the way ahead on all this IS to experiment. I'd rather not do so by irrevocably replacing existing structures until new ones are both proven and clear though - hence my 'Deprecation' of those named author categories in favour of the existing (and mistakenly created :-)) 'Is author'. It's easily reversed though I think it was probably the right thing to do. Don't forget that properties can be sub-properties too. --Peter P (talk) 07:49, 17 December 2013 (GMT)

Multiple Values

I dived into multiple values by hardcoding them:- Subject, Subject2, Subject3 etc.
This is OK as far as it goes - I mean, it works - but doesn't make for clean code, so it's not a good base to build on:- it's the hard work road rather than the hard thinking road. Before going any further, I've been checking alternatives, hence Form:Source which came from DiscourseDB, and I wonder whether the technology exists to choose a more robust approach. See http://www.mediawiki.org/wiki/Extension:Semantic_Forms/Semantic_Forms_and_templates#Multiple_values_for_the_same_field . The Special:ReplaceText is a great fallback for modifying the approach, so when the time comes, modifying the technology shouldn't be that tricky. Robin (talk) 03:49, 26 December 2013 (GMT)

In retrospect it might have been better to have studied SMW in more depth. Anyway, SMW is more functional than it was back then. 18 months on and I finally removed the last of the old Subject, Subject2, Subject3... and replaced them with Subjects as a comma-separated list. There remains the minor annoyance of wanting to use the comma other than as a list separator... But the code is way cleaner like this. :-) Robin (talk) 19:09, 21 June 2015 (IST)
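
For anyone doing a similar clean-up, the change from Subject, Subject2, Subject3... to a single comma-separated Subjects can be sketched as follows. This is plain Python for illustration, not the wikitext or Special:ReplaceText steps actually used. The function names and example values are made up, and the backslash escape for a literal comma is a hypothetical workaround, not something SMW or Semantic Forms is known to support.

```python
# Sketch only: the parameter names follow the discussion above; the "\," escape
# convention is a hypothetical workaround, not an SMW or Semantic Forms feature.
import re

def merge_numbered(params, base="Subject"):
    """Collect Subject, Subject2, Subject3... into one comma-separated string."""
    def index(key):
        suffix = key[len(base):]
        return int(suffix) if suffix else 1
    keys = [k for k in params if re.fullmatch(re.escape(base) + r"\d*", k)]
    values = [params[k].strip() for k in sorted(keys, key=index)]
    return ", ".join(v for v in values if v)

def split_list(text):
    """Split on commas, treating a backslash-escaped comma as a literal comma."""
    parts = re.split(r"(?<!\\),", text)
    return [p.strip().replace("\\,", ",") for p in parts if p.strip()]

# Hypothetical template parameters from an old-style page.
old = {"Subject": "CIA", "Subject2": "Bay Of Pigs", "Subject3": "Cuba", "Author": "Someone"}
subjects = merge_numbered(old)
print(subjects)
print(split_list(subjects))
print(split_list(r"Smith\, John, CIA"))
```

The last call shows the comma annoyance: without some escape (or a different delimiter), a value such as "Smith, John" would be split into two subjects.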

Upgrade to SMW 1.9 RC1

Finally managed to do the install with the new, 'preferred' install method - ie using Composer. Initial problem was that it requires 'Git Tools' to be installed on the server and that was not apparent from the existing SMW and MW documentation. Once working, it really does make installs and upgrades a doddle since any and all dependencies are handled in one go. Also, it obviates the need for separate 'require' or 'include' statements in 'LocalSettings.php'. I've left them in but commented out to clarify. The data refresh updated 9,433 record IDs and has got rid of the annoying rogue black-text properties. It has also got rid of most duplicates but there are still 2 entries for each used property, one set at zero uses and one showing the actual number of uses. I guess we'll have to live with that for a while. Special:Ask now works too.

ANY problems/issues, please post here. Reversion to the earlier version is still possible --Peter P (talk) 10:01, 1 January 2014 (GMT)