I am trying to figure out how many birthDate values DBpedia contains. I have two questions:

1) The statistics Web page says that there are 819,371 birth dates in the English DBpedia. However, when I run

grep -c "birthDate" mappingbased_literals_en.ttl

...I only get 630,419 results. I am most likely doing something wrong or I am looking at the wrong files. The infobox_properties_mapped_en.ttl gives me 112,362 results.

2) I would like to know the total number of subjects that have a birth date. My understanding is that I would have to merge all canonicalized datasets of all languages, and count the unique subjects. Is that right? Is there a simpler way to do it? (The statistics page seems to indicate only the sum of the birth dates, but the same subject could be mentioned in several localized DBpedias, right?)

I am very grateful for your help.


    2 answers


      Hi Robert,

      1) It seems that you're looking at the wrong file, or an incomplete copy of it. This one: http://downloads.dbpedia.org/2016-04/core/mappingbased_literals_en.ttl.bz2

      returns exactly the same count as the statistics Web page:

      $ bzcat mappingbased_literals_en.ttl.bz2 | grep -c "birthDate"
      $ md5sum mappingbased_literals_en.ttl.bz2
      75869344c3e3706b0ed50debe1608968  mappingbased_literals_en.ttl.bz2

      2) If your objective is to find how many instances have a birthDate across all language chapters, then yes, you are right, this would be one way to do it. And yes, watch out for the same subject appearing in more than one chapter.
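
A minimal sketch of the "count the unique subjects" step, assuming the dump files hold one triple per line with full IRIs, so that the subject is the first space-separated field. The function name `count_birthdate_subjects` and the file names in the usage comment are hypothetical, not taken from the downloads page.

```shell
# Count distinct subjects that have at least one dbo:birthDate triple.
# Reads triples on stdin, one per line, subject in the first field.
count_birthdate_subjects() {
  grep -F '<http://dbpedia.org/ontology/birthDate>' \
    | cut -d' ' -f1 \
    | sort -u \
    | wc -l \
    | tr -d ' '
}

# Hypothetical usage: concatenate the canonicalized dumps of several
# languages and count each subject once (file names are placeholders).
# bzcat chapter_a.ttl.bz2 chapter_b.ttl.bz2 | count_birthdate_subjects
```

Unlike `grep -c`, which counts matching lines, this counts each subject once even if it has several birth dates or shows up in several files. It only deduplicates across languages if the files use the same subject IRIs, which is what the canonicalized datasets are meant to provide.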

      1. Robert

        Thank you, Gustavo, for your helpful answer, and for even checking the count! It worked as you said.


      I'm trying this query:

      select (count(*) as ?n) {?x dbo:birthDate ?y}

      I can't think of any plausible explanation for why dbpedia.org (which should have the latest official release, http://downloads.dbpedia.org/2016-10/) would return 3x more results than live.dbpedia.org (which should have up-to-the-minute data). cc Ted Thibodeau Jr

      1. Ted Thibodeau Jr

        Updates to live.dbpedia.org (and to its cousin, dbpedia-live.openlinksw.com) seem to have stalled on July 11.  This is being investigated.

        The `count` difference flagged above suggests an issue (or change) with the extractors, with which I cannot provide much help.
