kkh/1995A/0204/DOCUMENT/A/95100001/95100001/95100061/95100065.TXT
(2) The format of the document identifier
<DOC>      Document
<DOCNO>    Document identifier (2)
<TEXT>     Text body
<PASSAGE>  Passage
<PNUM>     Passage identifier (3)
PATENT-JA-UPA-1993-123456
|      |  |   |    |
|      |  |   |    +------ INID code 11 (publication number)
|      |  |   +----------- publication year
|      |  +--------------- publication of unexamined patent application
|      +------------------ Japanese patent
+------------------------- patent document

(3) A passage in a document is identified by the identifier of the document, suffixed with the serial number of the passage (starting with zero) in the document. Although passages are extracted from the specific fields, such as claims and detailed descriptions of the invention, any fields can be used for categorization purposes.
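The identifier layout above can be split mechanically into its five hyphen-separated parts. The following is a minimal sketch, not part of the task definition: the names `DocId`, `parse_doc_id`, and `passage_id` are hypothetical, and the separator between a document identifier and the passage serial number is an assumption, since the text only says the serial number is "suffixed".

```python
from typing import NamedTuple


class DocId(NamedTuple):
    """Parts of a document identifier such as PATENT-JA-UPA-1993-123456."""
    doc_kind: str   # e.g. "PATENT" (patent document)
    office: str     # e.g. "JA" (Japanese patent)
    pub_kind: str   # e.g. "UPA" (publication of unexamined patent application)
    year: int       # publication year
    number: str     # publication number, kept as a string to preserve its digits


def parse_doc_id(doc_id: str) -> DocId:
    """Split an identifier into its five hyphen-separated parts."""
    parts = doc_id.split("-")
    if len(parts) != 5:
        raise ValueError(f"expected 5 hyphen-separated parts, got {len(parts)}: {doc_id!r}")
    doc_kind, office, pub_kind, year, number = parts
    return DocId(doc_kind, office, pub_kind, int(year), number)


def passage_id(doc_id: str, serial: int, sep: str = "-") -> str:
    """Build a passage identifier from a document identifier and a serial number.

    ASSUMPTION: the separator `sep` is hypothetical; the task description
    does not state how the serial number is attached to the identifier.
    """
    if serial < 0:
        raise ValueError("passage serial numbers start at zero")
    return f"{doc_id}{sep}{serial}"


if __name__ == "__main__":
    print(parse_doc_id("PATENT-JA-UPA-1993-123456"))
    print(passage_id("PATENT-JA-UPA-1993-123456", 0))
```

Keeping the publication number as a string avoids silently changing it if an identifier ever carries leading zeros.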
Only <DOC>, <DOCNO>, <TITLE>, <ABST>, <SPEC>, and <CLAIM> can be used for categorization purposes.
<DOC>       Document
<DOCNO>     Document identifier (4)
<APP-NO>    Application number
<APP-DATE>  Application date
<PUB-NO>    Publication number
<PUB-TYPE>  Publication type
<PAT-NO>    Patent number
<PAT-TYPE>  Patent type (5)
<PUB-DATE>  Publication date
<PRI-IPC>   Primary IPC
<IPC-VER>   IPC version
<PRI-USPC>  Primary USPC
<PRIORITY>  Priority information
<CITATION>  Citation(s) (6)
<INVENTOR>  Inventor(s)
<ASSIGNEE>  Assignee(s)
<TITLE>     Title
<ABST>      Abstract
<SPEC>      Specification
<CLAIM>     Claim(s)
(4) The format of the document identifier
PATENT-US-GRT-1993-123456
|      |  |   |    |
|      |  |   |    +------ patent number
|      |  |   +----------- publication year
|      |  +--------------- grant data
|      +------------------ USPTO patent
+------------------------- patent document

(5) Patent Type
(6) Multiple citations are joined with a tab (\t). Each citation has the form "patent number of cited patent/ date", where "patent number of cited patent" corresponds to <PAT-NO>. Note that the date information is incomplete.
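As an illustration of the citation layout described above, the sketch below splits a citation field on tabs and then on the first slash. It is a hedged example only: the function name `parse_citations` and the sample values are hypothetical, and because the date information is incomplete and its format is not specified here, the date is kept as an opaque string or `None`.

```python
from typing import List, Optional, Tuple


def parse_citations(field: str) -> List[Tuple[str, Optional[str]]]:
    """Split a tab-joined citation field into (patent number, date) pairs.

    ASSUMPTION: the date format is not specified in the task description,
    so the date is returned unparsed, or as None when it is missing.
    """
    citations: List[Tuple[str, Optional[str]]] = []
    for item in field.split("\t"):
        item = item.strip()
        if not item:
            continue  # skip empty entries, e.g. from a trailing tab
        number, _, date = item.partition("/")
        date = date.strip()
        citations.append((number.strip(), date or None))
    return citations


if __name__ == "__main__":
    # Hypothetical sample values, not taken from the collection.
    print(parse_citations("4567890/ 198601\t5012345/"))
```

Using `partition` rather than `split` keeps the parser tolerant of entries that lack a slash or a date altogether.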
<SYSTEM-ID>        System identifier that is the same as the group ID
<MODE>             Category assignment mode: "full-auto", "semi-auto", or "manual"
<SUBTASK>          "English", "Japanese", "English to Japanese" (Cross-lingual), or "Japanese to English" (Cross-lingual)
<DOC-TAG>          List of document fields used for categorization purposes
<TOPIC-DOCUMENT>   Part of topic document used in categorization (e.g., "full text of patent application", "claim", "PAJ abstract", etc.)
<TRAINING-CORPUS>  List of data collections used for training purposes, including the sample topics. Please also describe how you used those data
<MODEL>            Name or short description of categorization model (e.g., "C4.5", "CART", "Neural Net", "Naive Bayes", "SVM", "KNN", etc.)
<RESULT>           Retrieval result in the TREC_EVAL format (7)
(7) Each line in the <RESULT> field is organized in the following format:
topic-id \t 0 \t IPC \t IPC-rank \t IPC-score \t run-id
The following example is a fragment of the file named "ntc7".
<SYSTEM-ID>ntc7</SYSTEM-ID>
<MODE>full-auto</MODE>
...
<RESULT>
1 0 G01N_29_24 1 9999 ntc7
1 0 A61B_8_00 2 9998 ntc7
1 0 G01N_29_22 3 9997 ntc7
...
2 0 A61C_12_00 1 9999 ntc7
...
</RESULT>

The maximum number of IPC codes for a single topic is 1000.
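The result-line format and the 1000-code limit described above could be produced as follows. This is a minimal sketch under stated assumptions: the function name `format_result_lines` and the input shape (a mapping from IPC code to score) are hypothetical, and ranking by descending score is an assumption consistent with the example fragment rather than a stated rule.

```python
from typing import Dict, List

MAX_CODES_PER_TOPIC = 1000  # limit stated in the task description


def format_result_lines(topic_id: int, scored_codes: Dict[str, float], run_id: str) -> List[str]:
    """Return tab-separated lines: topic-id, 0, IPC, IPC-rank, IPC-score, run-id.

    ASSUMPTION: codes are ranked by descending score, and anything beyond
    the first MAX_CODES_PER_TOPIC entries is dropped.
    """
    ranked = sorted(scored_codes.items(), key=lambda kv: kv[1], reverse=True)
    ranked = ranked[:MAX_CODES_PER_TOPIC]
    return [
        "\t".join([str(topic_id), "0", ipc, str(rank), str(score), run_id])
        for rank, (ipc, score) in enumerate(ranked, start=1)
    ]


if __name__ == "__main__":
    scores = {"G01N_29_24": 9999, "A61B_8_00": 9998, "G01N_29_22": 9997}
    for line in format_result_lines(1, scores, "ntc7"):
        print(line)
```

The lines for all topics would then be written between the `<RESULT>` and `</RESULT>` tags of the run file.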
The submission deadline is June 13, 2008.
The evaluation results will be released by June 23, 2008. All run files submitted by all participant groups and the correct answers will be distributed to the participant groups that submit their run files on June 13, 2008 (8). By using these data, you can evaluate and compare the effectiveness of your system and other systems. Please use this opportunity to present your research at international conferences and in journals.
(8) The run files and the correct answers will not be distributed to participant groups that do not submit their run files.

Last modified on May 14, 2008