/*%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 * $Id: convertmesh.pl,v 1.14 2007/02/19 17:05:38 mark Exp $
 *
 * convertmesh.pl
 * Description: converts the MeSH XML version into SKOS RDF/OWL.
 *
 * Author: Mark F.J. van Assem (mark@cs.vu.nl)
 *
 * %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 *
 * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 * Unpacking & Instructions
 * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 *
 * - place this file in a directory with sub-directories 'src' and 'rdf'
 * - place desc2006.xml in the 'src' directory
 * - place qual2006.xml in the 'src' directory
 * - output is stored in the 'rdf' directory
 *   (directories and files are configurable, see below)
 * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 * Software Requirements
 * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 *
 * Programmed for SWI-Prolog 5537 (development release)
 *
 * http://www.swi-prolog.org/
 * http://www.swi-prolog.org/packages/sgml2pl.html
 *
 * Programmed against the 2006 MeSH XML version.
 * http://www.nlm.nih.gov/mesh/
 * http://www.nlm.nih.gov/mesh/filelist.html
 * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 * To Do
 * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 *
 * - Improve error checking
 * - Improve comments
 * - Process metadata completely
 *
 * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 * Main commands
 * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 *
 * - go.
 *
 * Recommended usage:
 *
 * Not all PCs have enough memory to handle 'go' directly.
 * Instead, the recommended usage is:
 *  - load convertmesh.pl and run 'parse1'
 *  - kill the process after it finishes
 *  - load convertmesh.pl and run 'parse2'
 *  - kill the process after it finishes
 *  - load convertmesh.pl and run 'create_qualifiers'
 *
 *
 * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 * Debugging
 * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 *
 * debug(processing)	Provides status information while the program runs
 *			(which UIs are being parsed, which facts are asserted, etc.)
 * debug(bug1)		For tracing a bug that caused 'D006005' and other descriptors
 *			not to get a parent.
 *			It turned out that remove_last not only removed the
 *			last part of a tree number, but also any other occurrences,
 *			e.g. remove_last([a,b,c,a], L) gave L = [b,c] instead of [a,b,c].
 *
 */

% descriptor_parents/4 is asserted/retracted during the first parse to build
% the list of TreeNumbers of a Descriptor's parents, so that we can determine
% their UIs. We don't have those UIs during the first parse, so we cannot
% assert skos:broader triples until we have the parent UIs, which are derived
% from the TreeNumbers of the parents.

% state/2 is asserted and retracted while parsing to keep track of the state
% of the program, and to pass information on to later steps: when processing
% a tag needs information encountered in a previous tag (e.g. the DescriptorURI).
% Using a single state/2 predicate makes it easy to clean everything up
% (retractall(state(_,_))) when a whole record has been parsed and we move on
% to the next.

:- dynamic state/2, descriptor_parents/4.

% load the SWI-Prolog SGML library for parsing and rdf_db for asserting RDF.

:- use_module(library('sgml')),
   use_module(library('semweb/rdf_db')).

%%%%%%%%%%%%%%%%%%%%% set the namespaces %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

rdf_db:ns(mesh, 'http://www.nlm.nih.gov/mesh/2006#').
rdf_db:ns(skos, 'http://www.w3.org/2004/02/skos/core#').

%%%%%%%%% set the directories/filenames of input and output files %%%%%%%%%%%%%%

src_dir(mesh, 'src').
% don't include a slash at the end!
src_file(meshdescriptors, 'desc2006.xml').
%src_file(meshdescriptors, 'test-orphan.xml').
src_file(meshqualifiers, 'qual2006.xml').

out_dir(mesh, 'rdf').
out_file(meshdata, 'meshdata.rdf').
out_file(meshstructure, 'meshstructure.rdf').
out_file(meshqualifiers, 'meshqualifiers.rdf').

%%%%%%%%%%%%%% construct the complete path to output file %%%%%%%%%%%%%%%%%%

% F is a key that selects the actual output file, see out_file/2.
out_file_path(F, OutFilePath) :-
	out_dir(mesh, Dir),
	out_file(F, File),
	concat_atom([Dir, '/', File], OutFilePath).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% First parse
%
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

/**
descriptor_parents(MyUI, MyTreeNumbers, MyParentTreeNumbers, MyParentsUIs).
(the last three arguments are lists)

Parse 1a:
	for each DescriptorRecord,
	assert(descriptor_parents(MyUI, MyTreeNumbers, MyParentTreeNumbers, [])).
	- MyUI is given
	- MyTreeNumbers is given
	- MyParentTreeNumbers are found by splitting each tree number on its
	  last '.' (if there is no '.', then it does not have a parent)
	- ParentUIs cannot be asserted yet; that is the information we're after,
	  and it is filled in during parse 1b

Parse 1b:
	- for each descriptor_parents(MyUI, MyTreeNumbers, MyParentTreeNumbers, [])
	  that still has an empty list of parent UIs and MyParentTreeNumbers \== []
	- do for each MyParentTreeNumber
		- find the fact descriptor_parents(ItsUI, ItsTreeNumbers, _, _) where
		  MyParentTreeNumber is a member of ItsTreeNumbers
		- assert skos:broader between MyUI and ItsUI
	- retract the original descriptor_parents fact and add a new fact where
	  MyParentUIs are filled in
**/

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% First parse 1a
%
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

process_structure(File) :-
	open(File, read, In),
	new_sgml_parser(Parser, []),
	set_sgml_parser(Parser, file(File)),
	set_sgml_parser(Parser, dialect(xml)),
	sgml_parse(Parser,
		   [ source(In),
		     call(begin, on_begin1),
		     call(end, on_end1)
		   ]),
	close(In).

on_end1('DescriptorRecord', _) :-
	debug(processing,'End tag of a DescriptorRecord encountered\n',[]),
	retractall(state(_,_)).

on_begin1('DescriptorRecord', _, _) :-
	debug(processing,'Starting tag of a DescriptorRecord encountered\n',[]),
	assert(state('Status',in_descriptor)).
on_begin1(Tag, Attr, Parser) :-
	state('Status', in_descriptor), !,
	sgml_parse(Parser,
		   [ document(Content),
		     parse(content)
		   ]),
	% debug(processing, 'Going to try to process the Tag ~w with Attributes ~w and Content ~w',[Tag, Attr, Content]),
	process_descriptor1(element(Tag, Attr, Content)).

process_descriptor1(element(Tag, _Attr, Cont)) :-
	Tag == 'DescriptorUI',
	nth1(1,Cont,DUI),	% select the first element of the list; a DescriptorUI should contain only one element
	assert(state('DescriptorUI',DUI)),
	debug(processing,'Asserted state of DescriptorUI ~w',[DUI])
	;
	Tag == 'TreeNumberList',
	state('DescriptorUI', UI),	% assumes the DescriptorUI was already encountered, else this fails wrongly!
	debug(processing,'Processing TreeNumberList of DescriptorUI ~w',[UI]),
	%( UI == 'D006005'
	% ->
	%	gtrace
	%),
	process_TreeNumberList1(Cont, MyTreeNumbers),	% here, Cont is a list of element('TreeNumber',_,_) terms
	debug(processing,'   TreeNrs. of this Descriptor: ~w',[MyTreeNumbers]),
	generate_parent_TreeNumbers(MyTreeNumbers, ParentTreeNumbers),
	debug(processing,'   Parent TreeNrs. of this Descriptor: ~w',[ParentTreeNumbers]),
	assert(descriptor_parents(UI, MyTreeNumbers, ParentTreeNumbers, [])),
	debug(processing,'   Asserted DescriptorParents ~w ~w ~w ~w ',[UI, MyTreeNumbers, ParentTreeNumbers, []]),
	retract(state('DescriptorUI',UI)).

process_TreeNumberList1([], []).
process_TreeNumberList1([element('TreeNumber',_,[TN])|Tail], [TN | TNs]) :-
	process_TreeNumberList1(Tail, TNs).

generate_parent_TreeNumbers([], []).
% case 1: HeadTN has a parent (it has a dot in its string)
generate_parent_TreeNumbers([HeadTN|TailTN], [HeadParent|TailParent]) :-
	create_parent_treenumber(HeadTN,HeadParent),
	generate_parent_TreeNumbers(TailTN, TailParent).
% case 2: HeadTN does not have a parent (it has no dot in its string), so skip it.
generate_parent_TreeNumbers([HeadTN|TailTN], List) :-
	\+ create_parent_treenumber(HeadTN,_),
	generate_parent_TreeNumbers(TailTN, List).

% fails if there is no parent (no dot in TN)
create_parent_treenumber(TN,Parent) :-
	concat_atom(List, '.', TN),	% from TN = 'A01.123.321' produces List = ['A01','123','321']
					% ... but if TN contains no '.', then List has length 1.
	List = [_,_|_],			% require at least two parts; a bare 'A01' has no parent
	remove_last(List,ListWithoutLastEl),
	% insert '.' again between the elements of ListWithoutLastEl and turn it into an atom again
	concat_atom(ListWithoutLastEl, '.', Parent).

% remove only the last element of a non-empty list, e.g. [a,b,c,a] -> [a,b,c]
remove_last(List, ListWithoutLastEl) :-
	append(ListWithoutLastEl, [_Last], List).
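
% Worked example (an illustrative sketch, not part of the original program; the
% tree numbers and the UIs 'D1' and 'D2' are hypothetical, not taken from MeSH).
% The results follow from the definitions of the tree-number helpers above:
%
%   ?- remove_last([a,b,c,a], L).
%   L = [a,b,c]				% only the last element is dropped
%
%   ?- create_parent_treenumber('A01.123.321', P).
%   P = 'A01.123'
%
%   ?- generate_parent_TreeNumbers(['A01.123.321', 'B02.456'], Ps).
%   Ps = ['A01.123', 'B02']
%
% If parse 1a asserted the (hypothetical) facts
%   descriptor_parents('D1', ['A01.123'], ['A01'], [])
%   descriptor_parents('D2', ['A01.123.321'], ['A01.123'], [])
% then parse 1b (create_structure/0) resolves the parent tree number 'A01.123'
% of 'D2' to 'D1', because 'A01.123' is one of D1's own tree numbers, and
% asserts the triple  mesh:D2 skos:broader mesh:D1.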
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% First parse 1b
%
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

/*******************************************************************************
Parse 1b:
	- for each descriptor_parents(MyUI, MyTreeNumbers, MyParentTreeNumbers, [])
	  that still has an empty list of parent UIs and MyParentTreeNumbers \== []
	- do for each MyParentTreeNumber
		- find the fact descriptor_parents(ItsUI, ItsTreeNumbers, _, _) where
		  MyParentTreeNumber is a member of ItsTreeNumbers
		- assert skos:broader between MyUI and ItsUI
	(- retract the original descriptor_parents(...) fact and add a new fact
	   where MyParentUIs are filled in)
	   THIS IS NOT REQUIRED: THE FACT WILL NOT BE PROCESSED AGAIN ANYWAY IN THE
	   ALGORITHM BELOW
*******************************************************************************/

create_structure :-
	% select a fact that has not yet been processed (none has, so the [] is not actually necessary)
	descriptor_parents(MyUI, _MyTreeNrs, MyParentTreeNrs, []),
	debug(processing,'create structure for UI: ~w, going to retrieve UIs of parent tree nrs: ~w',[MyUI, MyParentTreeNrs]),
	%(
	%	MyUI == 'D006005'
	%->
	%	debug(bug1, 'Attempting to create a structure for D006005, its treenrs are ~w',[MyTreeNrs]),
	%	gtrace
	%;
	%	true
	%),
	% get_parents/2 below fails if a parent tree number cannot be resolved
	% to a descriptor. To make sure that the triple MyUI rdf:type skos:Concept
	% is still created in that case, we assert it first.
	create_descriptor_uri(MyUI, MyURI),
	rdf_assert(MyURI, rdf:type, skos:'Concept'),
	% get the UIs of the parents
	get_parents(MyParentTreeNrs, DescrUIs),
	debug(processing,'It has the following DescriptorUIs as parents: ~w',[DescrUIs]),
	assert_broaders(MyUI, DescrUIs),
	fail	% fail to move on to the next descriptor_parents fact
	;
	true.

get_parents([], []).
get_parents([TreeNr|TreeNrTail], [DescrUI|DescrUITail]) :-
	debug(processing, 'Going to find the parent of ~w',[TreeNr]),
	% find the descriptor_parents fact that has TreeNr in its list of tree numbers
	matching_descriptor_parent(TreeNr, DescrUI),
	get_parents(TreeNrTail, DescrUITail).

% given a ParentTreeNr, find the UI of the descriptor that has ParentTreeNr
% among its own tree numbers
matching_descriptor_parent(ParentTreeNr, UI) :-
	descriptor_parents(UI, ItsTreeNrs,ItsParentTreeNrs,ItsParentUIs),
	member(ParentTreeNr, ItsTreeNrs),
	debug(bug1, 'Found the descriptor_parents fact with variables UI, ItsTreeNrs, ItsParentTreeNrs, ItsParentUIs: ~w ~w ~w ~w',[UI, ItsTreeNrs,ItsParentTreeNrs,ItsParentUIs] ).

% MyURI has to be asserted to be a skos:Concept in the base case instead
% of in the recursive one, because some descriptors may not have a parent!
assert_broaders(MyUI,[]) :-
	create_descriptor_uri(MyUI, MyURI),
	rdf_assert(MyURI, rdf:type, skos:'Concept').
assert_broaders(MyUI, [Head|Tail]) :-
	create_descriptor_uri(MyUI, MyURI),
	create_descriptor_uri(Head, HeadURI),
	rdf_assert(MyURI, skos:broader, HeadURI),
	rdf_assert(HeadURI, rdf:type, skos:'Concept'),
	assert_broaders(MyUI, Tail).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Second parse
%
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

process_file(File) :-
	open(File, read, In),
	new_sgml_parser(Parser, []),
	set_sgml_parser(Parser, file(File)),
	set_sgml_parser(Parser, dialect(xml)),
	sgml_parse(Parser,
		   [ source(In),
		     call(begin, on_begin2),
		     call(end, on_end2)
		   ]),
	close(In).

on_end2('DescriptorRecord', _) :-
	debug(processing,'End tag of a DescriptorRecord encountered\n',[]),
	retractall(state(_,_)).

on_begin2('DescriptorRecord', _, _) :-
	debug(processing,'Starting tag of a DescriptorRecord encountered\n',[]),
	assert(state('Status', in_descriptor)).
on_begin2(Tag, Attr, Parser) :-
	state('Status', in_descriptor), !,
	sgml_parse(Parser,
		   [ document(Content),
		     parse(content)
		   ]),
	process_descriptor2(element(Tag, Attr, Content)).

% dispatch on the tag name: e.g. the tag 'DescriptorName' is handled by
% process_DescriptorName2/2
process_descriptor2(element(Tag, Attr, Content)) :-
	concat_atom(['process_', Tag, '2'], Predicate),
	call(Predicate, Attr, Content).

process_DescriptorUI2(_Attr, [Cont]) :-
	debug(processing, 'process DescriptorUI',[]),
	assert(state('DescriptorUI', Cont)),
	create_descriptor_uri(Cont, URI),
	assert(state('DescriptorURI', URI)).

process_DescriptorName2( _Attr, [element('String', [], [Cont])] ) :-
	debug(processing,' Processing DescriptorName ~w\n',[Cont]),
	state('DescriptorURI', URI),
	debug(term, 'Asserting preflabel: ~w', [Cont]),
	rdf_assert(URI, skos:prefLabel, literal(Cont)).

process_DateCreated2(_Attr, [element('Year',[],[Year]), element('Month',[],[Month]), element('Day',[],[Day])]) :-
	debug(processing,' Processing DateCreated \n',[]),
	concat_atom([Year,Month,Day], '-', Date),
	state('DescriptorURI', URI),
	rdf_assert(URI, mesh:dateCreated, literal(Date)).

process_DateRevised2(_Attr, [element('Year',[],[Year]), element('Month',[],[Month]), element('Day',[],[Day])]) :-
	debug(processing,' Processing DateRevised\n',[]),
	concat_atom([Year,Month,Day], '-', Date),
	state('DescriptorURI', URI),
	rdf_assert(URI, mesh:dateRevised, literal(Date)).

process_DateEstablished2(_Attr, [element('Year',[],[Year]), element('Month',[],[Month]), element('Day',[],[Day])]) :-
	debug(processing,' Processing DateEstablished\n',[]),
	concat_atom([Year,Month,Day], '-', Date),
	state('DescriptorURI', URI),
	rdf_assert(URI, mesh:dateEstablished, literal(Date)).

process_ActiveMeSHYearList2(_Attr,Cont) :-
	debug(processing,' Processing ActiveMeSHYearList\n',[]),
	state('DescriptorURI', URI),
	process_years(URI,Cont).

process_years(URI, [element('Year',[],[Year])]) :-
	rdf_assert(URI, mesh:activeMeSHYear, literal(Year)).
process_years(URI, [element('Year',[],[Year]) | Tail]) :-
	process_years(URI, [element('Year',[],[Year])] ),
	process_years(URI, Tail).

% allowable qualifiers are not converted in this version
process_AllowableQualifiersList2(_Attr,_Cont) :-
	debug(processing,' Processing AllowableQualifiersList\n',[]),
	state('DescriptorURI', _URI).

process_HistoryNote2(_Attr,[Cont]) :-
	debug(processing,' Processing HistoryNote\n',[]),
	state('DescriptorURI', URI),
	rdf_assert(URI, mesh:historyNote, literal(Cont)).

process_OnlineNote2(_Attr,[Cont]) :-
	debug(processing,' Processing OnlineNote\n',[]),
	state('DescriptorURI', URI),
	rdf_assert(URI, mesh:onlineNote, literal(Cont)).

process_PublicMeSHNote2(_Attr,[Cont]) :-
	debug(processing,' Processing PublicMeSHNote\n',[]),
	state('DescriptorURI', URI),
	rdf_assert(URI, mesh:publicMeSHNote, literal(Cont)).

process_PreviousIndexingList2(_Attr,Cont) :-
	debug(processing,' Processing PreviousIndexingList\n',[]),
	state('DescriptorURI', URI),
	process_previous_indexing(URI, Cont).

process_previous_indexing(URI, [element('PreviousIndexing',[],[Note])]) :-
	rdf_assert(URI, skos:historyNote, literal(Note)).
process_previous_indexing(URI, [element('PreviousIndexing',[],[Note]) | Tail]) :-
	process_previous_indexing(URI, [element('PreviousIndexing',[],[Note])] ),
	process_previous_indexing(URI, Tail).

process_TreeNumberList2(_Attr,_Cont) :-
	debug(processing,' Processing TreeNumberList (done in 1st parse so do nothing now)\n',[]).

process_RecordOriginatorsList2(_Attr,Cont) :-
	debug(processing,' Processing RecordOriginatorsList\n',[]),
	state('DescriptorURI', URI),
	process_record_originators(URI, Cont).

% the RecordOriginatorsList contains either three or two subtags
process_record_originators(URI,[element('RecordOriginator',[],[ONote]),element('RecordMaintainer',[],[MNote]),element('RecordAuthorizer',[],[ANote])]) :-
	rdf_assert(URI, mesh:recordOriginator, literal(ONote)),
	rdf_assert(URI, mesh:recordMaintainer, literal(MNote)),
	rdf_assert(URI, mesh:recordAuthorizer, literal(ANote)).
process_record_originators(URI,[element('RecordOriginator',[],[ONote]),element('RecordAuthorizer',[],[ANote])]) :-
	rdf_assert(URI, mesh:recordOriginator, literal(ONote)),
	rdf_assert(URI, mesh:recordAuthorizer, literal(ANote)).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

process_ConceptList2(_Attr,Cont) :-
	debug(processing,' Processing ConceptList\n',[]),
	state('DescriptorURI', URI),
	process_concepts(URI, Cont).

% preferred concept: also supplies the scope note
process_concepts(URI, [element('Concept',Attr,Cont)]) :-
	member('PreferredConceptYN'='Y', Attr),
	debug(processing, 'Value PreferredConceptYN is yes, Attributes: ~w, Cont: ~w',[Attr,Cont]),
	process_scopenote(URI, Cont),
	member(element('TermList',_, TL), Cont),
	debug(processing, 'Value TermList: ~w',[TL]),
	process_termlist(URI, TL).
% non-preferred concept: only its terms are processed
process_concepts(URI, [element('Concept',Attr, Cont)]) :-
	member('PreferredConceptYN'='N', Attr),
	debug(processing, 'Value PreferredConceptYN is no, going to do terms',[]),
	member(element('TermList',_, TL), Cont),
	debug(processing, 'Value TermList: ~w',[TL]),
	process_termlist(URI, TL).
process_concepts(URI, [element('Concept', Attr, Cont) | Tail]) :-
	process_concepts(URI, [element('Concept', Attr, Cont)]),
	process_concepts(URI, Tail).

process_scopenote(URI, Cont) :-
	member(element('ScopeNote',[], [Note]), Cont),
	rdf_assert(URI, skos:scopeNote, literal(Note))
	;
	true.
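
% Illustrative summary (a sketch derived from the clauses of process_termlist/2
% below, not an official MeSH mapping): how a Term's attributes select the SKOS
% label that is asserted for the current descriptor URI.
%
%   ConceptPreferredTermYN   IsPermutedTermYN   result
%   'Y'                      any                skipped (same as the DescriptorName)
%   'N'                      'N'                skos:altLabel
%   'N'                      'Y'                skos:hiddenLabel
%   'N'                      absent             skos:altLabel (qualifier records)
%
% e.g. for the hypothetical term
%   element('Term', ['ConceptPreferredTermYN'='N', 'IsPermutedTermYN'='Y'],
%           [element('String', [], ['Example Term'])])
% process_termlist(URI, [Term]) asserts rdf(URI, skos:hiddenLabel, literal('Example Term')).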
process_termlist(URI, [element('Term',Attr, Cont)]) :-
	debug(processing, 'Attributes of this Term: ~w',[Attr]),
	(
		member('ConceptPreferredTermYN'=Pref, Attr),
		Pref == 'Y'
		% skip, because this is the same term as the DescriptorName
	;
		member('ConceptPreferredTermYN'=Pref, Attr),
		member('IsPermutedTermYN'=Perm, Attr),
		Pref == 'N',
		Perm == 'N',
		member(element('String',[], [String]), Cont),
		debug(term, 'Asserting altLabel: ~w', [String]),
		rdf_assert(URI, skos:altLabel, literal(String))
	;
		member('IsPermutedTermYN'=Perm, Attr),
		Perm == 'Y',
		member(element('String',[], [String]), Cont),
		debug(term, 'Asserting hiddenLabel: ~w', [String]),
		rdf_assert(URI, skos:hiddenLabel, literal(String))
	;
		% case for processing Qualifiers: the IsPermutedTermYN attribute may
		% be absent; we then assume the term is just an altLabel.
		member('ConceptPreferredTermYN'=Pref, Attr),
		Pref == 'N',
		member(element('String',[], [String]), Cont),
		debug(term, 'Asserting altLabel: ~w', [String]),
		rdf_assert(URI, skos:altLabel, literal(String))
	).
process_termlist(URI, [element('Term',Attr, Cont) | Tail]) :-
	debug(processing, 'Going to process a Term with Attributes: ~w Contents: ~w ',[Attr,Cont]),
	process_termlist(URI, [element('Term',Attr, Cont)]),
	process_termlist(URI, Tail).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

process_Annotation2(_Attr,[Cont]) :-
	debug(processing,' Processing Annotation\n',[]),
	state('DescriptorURI', URI),
	rdf_assert(URI, skos:annotation, literal(Cont)).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

process_EntryCombinationList2(_Attr,Cont) :-
	debug(processing,' Processing EntryCombinationList\n',[]),
	process_entrycombinations(Cont).

process_entrycombinations([element('EntryCombination',Attr, Cont)]) :-
	debug(debugging, '... Parsing EntryCombination: ~w ~w',[Attr,Cont]),
	get_source_entrycombination(Cont, URI1),
	get_target_entrycombination(Cont, URI2),
	rdf_assert(URI1, mesh:preferredCombination, URI2),
	debug(processing, 'Asserted a preferredCombination between ~w AND ~w ',[URI1, URI2]).
process_entrycombinations([element('EntryCombination',Attr, Cont) | Tail]) :-
	debug(debugging, 'Going to process EntryCombination: ~w ~w',[Attr,Cont]),
	process_entrycombinations([element('EntryCombination',Attr, Cont)]),
	process_entrycombinations(Tail).

% the ECIN descriptor/qualifier pair becomes a mesh:CompoundConcept whose UI is
% the concatenation of the DescriptorUI and QualifierUI; it is narrower than
% the descriptor
get_source_entrycombination(Cont, URI) :-
	member(element('ECIN',_,ECIN), Cont),
	member(element('DescriptorReferredTo', _, Descr), ECIN),
	member(element('DescriptorUI',_,[DUI]), Descr),
	member(element('QualifierReferredTo', _, Qual), ECIN),
	member(element('QualifierUI',_,[QUI]), Qual),
	concat_atom([DUI, QUI], UI),
	create_descriptor_uri(UI, URI),
	create_descriptor_uri(DUI, DURI),
	rdf_assert(URI, rdf:type, mesh:'CompoundConcept'),
	rdf_assert(URI, skos:broader, DURI).

% case 1: target is a descriptor-qualifier combination
get_target_entrycombination(Cont, URI) :-
	member(element('ECOUT',_,ECOUT), Cont),
	member(element('DescriptorReferredTo', _, Descr), ECOUT),
	member(element('DescriptorUI',_,[DUI]), Descr),
	member(element('QualifierReferredTo', _, Qual), ECOUT),
	member(element('QualifierUI',_,[QUI]), Qual),
	concat_atom([DUI, QUI], UI),
	create_descriptor_uri(UI, URI),
	create_descriptor_uri(DUI, DURI),
	rdf_assert(URI, rdf:type, mesh:'CompoundConcept'),
	rdf_assert(URI, skos:broader, DURI).
% case 2: target is a descriptor only
get_target_entrycombination(Cont, URI) :-
	member(element('ECOUT',_,ECOUT), Cont),
	member(element('DescriptorReferredTo', _, Descr), ECOUT),
	member(element('DescriptorUI',_,[DUI]), Descr),
	\+ member(element('QualifierReferredTo', _, _), ECOUT),
	create_descriptor_uri(DUI, URI).
	% no need to assert that this thing is a skos:Concept; that was already done in the first parse.
	% no need to assert that this thing is narrower than the descriptor referred to,
	% because no new mesh:CompoundConcept was created, unlike in case 1

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

process_SeeRelatedList2(_Attr,Cont) :-
	debug(processing,' Processing SeeRelatedList\n',[]),
	state('DescriptorURI', URI),
	process_related(URI,Cont).

process_related(URI, [element('SeeRelatedDescriptor',[],[element('DescriptorReferredTo',[],[element('DescriptorUI',[],[Cont]),_])])] ) :-
	create_descriptor_uri(Cont, RURI),
	rdf_assert(URI, skos:related, RURI).
process_related(URI, [element('SeeRelatedDescriptor',[],[element('DescriptorReferredTo',[],[element('DescriptorUI',[],[Cont]),_])]) | Tail] ) :-
	process_related(URI, [element('SeeRelatedDescriptor',[],[element('DescriptorReferredTo',[],[element('DescriptorUI',[],[Cont]),_])])]),
	process_related(URI, Tail).

process_RunningHead2(_Attr,[Cont]) :-
	debug(processing,' Processing RunningHead\n',[]),
	state('DescriptorURI', URI),
	rdf_assert(URI, mesh:runningHead, literal(Cont)).

process_ConsiderAlso2(_Attr,[Cont]) :-
	debug(processing,' Processing ConsiderAlso\n',[]),
	state('DescriptorURI', URI),
	rdf_assert(URI, mesh:considerAlso, literal(Cont)).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Create Qualifiers
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

create_qualifiers(File) :-
	open(File, read, In),
	new_sgml_parser(Parser, []),
	set_sgml_parser(Parser, file(File)),
	set_sgml_parser(Parser, dialect(xml)),
	sgml_parse(Parser,
		   [ source(In),
		     call(begin, on_begin3),
		     call(end, on_end3)
		   ]),
	close(In).

on_end3('QualifierRecord', _) :-
	debug(processing,'End tag of a QualifierRecord encountered\n',[]),
	retractall(state(_,_)).

on_begin3('QualifierRecord', _, _) :-
	debug(processing,'Starting tag of a QualifierRecord encountered\n',[]),
	assert(state('Status', in_qualifier)).
on_begin3(Tag, Attr, Parser) :-
	state('Status', in_qualifier), !,
	sgml_parse(Parser,
		   [ document(Content),
		     parse(content)
		   ]),
	process_qualifier2(element(Tag, Attr, Content)).

process_qualifier2(element(Tag, Attr, Content)) :-
	concat_atom(['process_', Tag, '2'], Predicate),
	call(Predicate, Attr, Content).

process_QualifierUI2(_Attr, [Cont]) :-
	debug(processing, 'process QualifierUI',[]),
	assert(state('DescriptorUI', Cont)),	% pretend this is a descriptor, so that the existing predicates for processing descriptors can be used
	create_descriptor_uri(Cont, URI),
	assert(state('DescriptorURI', URI)),
	rdf_assert(URI, rdf:type, mesh:'Qualifier').

process_QualifierName2( _Attr, [element('String', [], [Cont])] ) :-
	debug(processing,' Processing QualifierName ~w\n',[Cont]),
	state('DescriptorURI', URI),
	debug(term, 'Asserting preflabel: ~w', [Cont]),
	rdf_assert(URI, skos:prefLabel, literal(Cont)).

% do nothing: we don't add this info in this version, although it could be added.
process_TreeNodeAllowedList2(_Attr, _Cont) :-
	debug(processing,' Processing TreeNodeAllowedList (do nothing)',[]).

process_QualifierRecordSet2(_Attr, _Cont) :-
	debug(processing,' Processing QualifierRecordSet (do nothing)',[]).

%
% For all other tags, simply use the same predicates as defined for descriptors,
% as there is no real difference in processing tags with the same name in
% descriptors or qualifiers. The only sneaky bit is using the QualifierUI to
% generate the state('DescriptorUI', UI) and state('DescriptorURI', URI) facts
% that the existing predicates need.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

create_descriptor_uri(Id, URI) :-
	rdf_db:ns(mesh, NS),
	concat_atom([NS,Id], URI).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

go :-
	cleanup,
	% first parse
	parse1,
	% second parse
	cleanup,
	parse2,
	% add qualifier data
	cleanup,
	create_qualifiers.
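
% Example (a sketch; 'D0001' and 'Q0002' are hypothetical placeholder UIs, not
% real MeSH identifiers). Every URI is built by prefixing the mesh namespace
% declared at the top of this file:
%
%   ?- create_descriptor_uri('D0001', URI).
%   URI = 'http://www.nlm.nih.gov/mesh/2006#D0001'
%
% Suppose an EntryCombination's ECIN refers to descriptor 'D0001' and qualifier
% 'Q0002'. get_source_entrycombination/2 then creates the mesh:CompoundConcept
% 'http://www.nlm.nih.gov/mesh/2006#D0001Q0002' and asserts that it is
% skos:broader than 'http://www.nlm.nih.gov/mesh/2006#D0001'.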
parse1 :-
	write('Going to process MeSH structure...\n'),
	src_file(meshdescriptors, File),
	src_dir(mesh, Dir),
	concat_atom([Dir, File], '/', Path),
	process_structure(Path),
	create_structure,
	write('Saving MeSH structure RDF to file...\n'),
	out_file_path(meshstructure, OutFileStruc),	% get name of file to save RDF in
	debug(processing, 'Now going to save triples to file ~w',[OutFileStruc]),
	rdf_save(OutFileStruc,[document_language(en)]).	% add lang attribute to whole doc

parse2 :-
	write('Going to parse MeSH data...\n'),
	src_file(meshdescriptors, File1),
	src_dir(mesh, Dir1),
	concat_atom([Dir1, File1], '/', Path1),
	process_file(Path1),
	write('Saving MeSH Data RDF to file...\n'),
	out_file_path(meshdata, OutFileData),	% get name of file to save RDF in
	rdf_save(OutFileData,[document_language(en)]).

create_qualifiers :-
	src_file(meshqualifiers, File2),
	src_dir(mesh, Dir2),
	concat_atom([Dir2, File2], '/', Path2),
	create_qualifiers(Path2),
	write('Saving MeSH Qualifier Data RDF to file...\n'),
	out_file_path(meshqualifiers, OutFileData),	% get name of file to save RDF in
	rdf_save(OutFileData,[document_language(en)]).

cleanup :-
	retractall(state(_,_)),
	retractall(descriptor_parents(_,_,_,_)),
	rdf_retractall(_,_,_).

% fin!