2422 Nucleotide and/or Amino Acid Sequence Disclosures in Patent Applications
37 C.F.R. 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications.
- (a) Nucleotide and/or amino acid sequences as used in §§ 1.821 through 1.825 are interpreted to mean an unbranched sequence of four or more amino acids or an unbranched sequence of ten or more nucleotides. Branched sequences are specifically excluded from this definition. Sequences with fewer than four specifically defined nucleotides or amino acids are specifically excluded from this section. "Specifically defined" means those amino acids other than "Xaa" and those nucleotide bases other than "n" defined in accordance with the World Intellectual Property Organization (WIPO) Handbook on Industrial Property Information and Documentation, Standard ST.25: Standard for the Presentation of Nucleotide and Amino Acid Sequence Listings in Patent Applications (1998), including Tables 1 through 6 in Appendix 2, herein incorporated by reference. (Hereinafter "WIPO Standard ST.25 (1998)"). This incorporation by reference was approved by the Director of the Federal Register in accordance with 5 U.S.C. 552(a) and 1 CFR part 51. Copies of WIPO Standard ST.25 (1998) may be obtained from the World Intellectual Property Organization; 34 chemin des Colombettes; 1211 Geneva 20 Switzerland. Copies of ST.25 may be inspected at the Patent Search Room; Crystal Plaza 3, Lobby Level; 2021 South Clark Place; Arlington, VA 22202. Copies may also be inspected at the Office of the Federal Register, 800 North Capitol Street, NW, Suite 700, Washington, DC. Nucleotides and amino acids are further defined as follows:
- (1) Nucleotides: Nucleotides are intended to embrace only those nucleotides that can be represented using the symbols set forth in WIPO Standard ST.25 (1998), Appendix 2, Table 1. Modifications, e.g., methylated bases, may be described as set forth in WIPO Standard ST.25 (1998), Appendix 2, Table 2, but shall not be shown explicitly in the nucleotide sequence.
- (2) Amino acids: Amino acids are those L-amino acids commonly found in naturally occurring proteins and are listed in WIPO Standard ST.25 (1998), Appendix 2, Table 3. Those amino acid sequences containing D-amino acids are not intended to be embraced by this definition. Any amino acid sequence that contains post-translationally modified amino acids may be described as the amino acid sequence that is initially translated using the symbols shown in WIPO Standard ST.25 (1998), Appendix 2, Table 3 with the modified positions; e.g., hydroxylations or glycosylations, being described as set forth in WIPO Standard ST.25 (1998), Appendix 2, Table 4, but these modifications shall not be shown explicitly in the amino acid sequence. Any peptide or protein that can be expressed as a sequence using the symbols in WIPO Standard ST.25 (1998), Appendix 2, Table 3 in conjunction with a description in the Feature section to describe, for example, modified linkages, cross links and end caps, non-peptidyl bonds, etc., is embraced by this definition.
- (b) Patent applications which contain disclosures of nucleotide and/or amino acid sequences, in accordance with the definition in paragraph (a) of this section, shall, with regard to the manner in which the nucleotide and/or amino acid sequences are presented and described, conform exclusively to the requirements of §§ 1.821 through 1.825.
- (c) Patent applications which contain disclosures of nucleotide and/or amino acid sequences must contain, as a separate part of the disclosure, a paper copy disclosing the nucleotide and/or amino acid sequences and associated information using the symbols and format in accordance with the requirements of §§ 1.822 and 1.823. This paper copy is hereinafter referred to as the "Sequence Listing." Each sequence disclosed must appear separately in the "Sequence Listing." Each sequence set forth in the "Sequence Listing" shall be assigned a separate sequence identifier. The sequence identifiers shall begin with 1 and increase sequentially by integers. If no sequence is present for a sequence identifier, the code "000" shall be used in place of the sequence. The response for the numeric identifier <160> shall include the total number of SEQ ID NOs, whether followed by a sequence or by the code "000."
- (d) Where the description or claims of a patent application discuss a sequence that is set forth in the "Sequence Listing" in accordance with paragraph (c) of this section, reference must be made to the sequence by use of the sequence identifier, preceded by "SEQ ID NO:" in the text of the description or claims, even if the sequence is also embedded in the text of the description or claims of the patent application.
- (e) A copy of the "Sequence Listing" referred to in paragraph (c) of this section must also be submitted in computer readable form in accordance with the requirements of § 1.824. The computer readable form is a copy of the "Sequence Listing" and will not necessarily be retained as a part of the patent application file. If the computer readable form of a new application is to be identical with the computer readable form of another application of the applicant on file in the Patent and Trademark Office, reference may be made to the other application and computer readable form in lieu of filing a duplicate computer readable form in the new application if the computer readable form in the other application was compliant with all of the requirements of these rules. The new application shall be accompanied by a letter making such reference to the other application and computer readable form, both of which shall be completely identified. In the new application, applicant must also request the use of the compliant computer readable "Sequence Listing" that is already on file for the other application and must state that the paper copy of the "Sequence Listing" in the new application is identical to the computer readable copy filed for the other application.
- (f) In addition to the paper copy required by paragraph (c) of this section and the computer readable form required by paragraph (e) of this section, a statement that the content of the paper and computer readable copies are the same must be submitted with the computer readable form, e.g., a statement that "the information recorded in computer readable form is identical to the written sequence listing."
- (g) If any of the requirements of paragraphs (b) through (f) of this section are not satisfied at the time of filing under 35 U.S.C. 111(a) or at the time of entering the national stage under 35 U.S.C. 371, applicant will be notified and given a period of time within which to comply with such requirements in order to prevent abandonment of the application. Any submission in reply to a requirement under this paragraph must be accompanied by a statement that the submission includes no new matter.
- (h) If any of the requirements of paragraphs (b) through (f) of this section are not satisfied at the time of filing an international application under the Patent Cooperation Treaty (PCT), which application is to be searched by the United States International Searching Authority or examined by the United States International Preliminary Examining Authority, applicant will be sent a notice necessitating compliance with the requirements within a prescribed time period. Any submission in reply to a requirement under this paragraph must be accompanied by a statement that the submission does not include matter which goes beyond the disclosure in the international application as filed. If applicant fails to timely provide the required computer readable form, the United States International Searching Authority shall search only to the extent that a meaningful search can be performed without the computer readable form and the United States International Preliminary Examining Authority shall examine only to the extent that a meaningful examination can be performed without the computer readable form.
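The numbering scheme of paragraphs (c) and (d) can be illustrated with a minimal Python sketch. The function names `assign_seq_ids` and `seq_id_reference` are hypothetical and are not part of any Office software; the sketch assumes that an identifier with no sequence is passed as `None`, and the exact spacing after "SEQ ID NO:" is likewise an assumption.

```python
from typing import List, Optional, Tuple

SKIPPED_CODE = "000"  # code used in place of a sequence that is not present


def assign_seq_ids(
    sequences: List[Optional[str]],
) -> Tuple[List[Tuple[int, str]], int]:
    """Number sequences 1, 2, 3, ... and return (entries, total for <160>).

    A None entry stands for an identifier with no sequence. It still
    consumes a number and is still counted in the <160> total.
    """
    entries = [
        (seq_id, seq if seq is not None else SKIPPED_CODE)
        for seq_id, seq in enumerate(sequences, start=1)
    ]
    return entries, len(entries)


def seq_id_reference(seq_id: int) -> str:
    """Format an in-text reference to a listed sequence (hypothetical helper)."""
    return f"SEQ ID NO:{seq_id}"
```

For example, `assign_seq_ids(["acgtacgtac", None, "MetAlaGlyCys"])` yields three entries, the second carrying the code "000", and a <160> total of 3.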
37 CFR 1.821 incorporates by reference the World Intellectual Property Organization (WIPO) Handbook on Industrial Property Information and Documentation, Standard ST.25 (1998), including Tables 1 through 6 of Appendix 2. Copies may be obtained from the World Intellectual Property Organization; 34 chemin des Colombettes; 1211 Geneva 20 Switzerland. Copies may be inspected at the Patent Search Room; Crystal Plaza 3, Lobby Level; 2021 South Clark Place; Arlington, VA 22202. Copies may also be inspected at the Office of the Federal Register, 800 North Capitol Street, NW, Suite 700, Washington, DC 20408. These tables are reproduced below.
WIPO Standard ST.25 (1998), Appendix 2, Table 1, provides that the bases of a nucleotide sequence should be represented using the following one-letter code for nucleotide sequence characters:
Symbol | Meaning | Origin of designation |
---|---|---|
a | a | adenine |
g | g | guanine |
c | c | cytosine |
t | t | thymine |
u | u | uracil |
r | g or a | purine |
y | t/u or c | pyrimidine |
m | a or c | amino |
k | g or t/u | keto |
s | g or c | strong interactions (3 H-bonds) |
w | a or t/u | weak interactions (2 H-bonds) |
b | g or c or t/u | not a |
d | a or g or t/u | not c |
h | a or c or t/u | not g |
v | a or g or c | not t, not u |
n | a or g or c or t/u, unknown, or other | any |
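As a rough illustration of how the one-letter codes in Table 1 might be applied, the following Python sketch encodes the table as a lookup and flags characters that are not Table 1 symbols. The names `ST25_BASES`, `invalid_symbols`, and `may_represent` are hypothetical. The sketch assumes that "t/u" expands to both t and u, and it treats "n" simply as any base, ignoring the "unknown, or other" meaning.

```python
# Table 1 of WIPO Standard ST.25 (1998), Appendix 2, as reproduced above.
# Assumption: "t/u" is expanded to both t and u.
ST25_BASES = {
    "a": {"a"}, "g": {"g"}, "c": {"c"}, "t": {"t"}, "u": {"u"},
    "r": {"g", "a"},
    "y": {"t", "u", "c"},
    "m": {"a", "c"},
    "k": {"g", "t", "u"},
    "s": {"g", "c"},
    "w": {"a", "t", "u"},
    "b": {"g", "c", "t", "u"},
    "d": {"a", "g", "t", "u"},
    "h": {"a", "c", "t", "u"},
    "v": {"a", "g", "c"},
    "n": {"a", "g", "c", "t", "u"},  # "unknown, or other" is not modeled
}


def invalid_symbols(sequence: str) -> list:
    """Return (1-based position, character) pairs not found in Table 1."""
    return [
        (pos, ch)
        for pos, ch in enumerate(sequence, start=1)
        if ch not in ST25_BASES
    ]


def may_represent(symbol: str, base: str) -> bool:
    """True if the Table 1 symbol can stand for the given specific base."""
    return base in ST25_BASES[symbol]
```

Because the lookup is keyed on the lower-case symbols shown in the table, an upper-case character is reported as invalid in this sketch; whether upper case should be normalized first is a design choice left open here.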
WIPO Standard ST.25 (1998), Appendix 2, Table 2, provides that modified bases may be represented as the corresponding unmodified bases in the sequence itself, if the modified base is one of those listed below and the modification is further described in the Feature section of the Sequence Listing. The codes from the list below may be used in the description (i.e., the specification and drawing, or in the Sequence Listing) but these codes may not be used in the sequence itself.
Symbol | Meaning |
---|---|
ac4c | 4-acetylcytidine |
chm5u | 5-(carboxyhydroxymethyl)uridine |
cm | 2'-O-methylcytidine |
cmnm5s2u | 5-carboxymethylaminomethyl-2-thiouridine |
cmnm5u | 5-carboxymethylaminomethyluridine |
d | dihydrouridine |
fm | 2'-O-methylpseudouridine |
gal q | beta, D-galactosylqueuosine |
gm | 2'-O-methylguanosine |
i | inosine |
i6a | N6-isopentenyladenosine |
m1a | 1-methyladenosine |
m1f | 1-methylpseudouridine |
m1g | 1-methylguanosine |
m1i | 1-methylinosine |
m22g | 2,2-dimethylguanosine |
m2a | 2-methyladenosine |
m2g | 2-methylguanosine |
m3c | 3-methylcytidine |
m5c | 5-methylcytidine |
m6a | N6-methyladenosine |
m7g | 7-methylguanosine |
mam5u | 5-methylaminomethyluridine |
mam5s2u | 5-methoxyaminomethyl-2-thiouridine |
man q | beta, D-mannosylqueuosine |
mcm5s2u | 5-methoxycarbonylmethyl-2-thiouridine |
mcm5u | 5-methoxycarbonylmethyluridine |
mo5u | 5-methoxyuridine |
ms2i6a | 2-methylthio-N6-isopentenyladenosine |
ms2t6a | N-((9-beta-D-ribofuranosyl-2-methylthiopurine-6-yl)carbamoyl)threonine |
mt6a | N-((9-beta-D-ribofuranosylpurine-6-yl)N-methylcarbamoyl)threonine |
mv | uridine-5-oxyacetic acid-methylester |
o5u | uridine-5-oxyacetic acid |
osyw | wybutoxosine |
p | pseudouridine |
q | queuosine |
s2c | 2-thiocytidine |
s2t | 5-methyl-2-thiouridine |
s2u | 2-thiouridine |
s4u | 4-thiouridine |
t | 5-methyluridine |
t6a | N-((9-beta-D-ribofuranosylpurine-6-yl)-carbamoyl)threonine |
tm | 2'-O-methyl-5-methyluridine |
um | 2'-O-methyluridine |
yw | wybutosine |
x | 3-(3-amino-3-carboxy-propyl)uridine, (acp3)u |
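The separation between the sequence itself and the description of a modification can be sketched as follows. This is only an illustration: the record layout (a plain dictionary), the name `describe_modified_base`, and the four-entry `TABLE_2_CODES_SUBSET` are assumptions made for brevity, not the official Sequence Listing format. The feature key `modified_base` is taken from Table 5 of the same standard.

```python
# Abbreviated subset of Table 2; a real check would need the whole table.
TABLE_2_CODES_SUBSET = {
    "ac4c": "4-acetylcytidine",
    "m5c": "5-methylcytidine",
    "m6a": "N6-methyladenosine",
    "s4u": "4-thiouridine",
}


def describe_modified_base(position: int, code: str) -> dict:
    """Build a feature record for a modified base (hypothetical layout).

    The Table 2 code goes into the feature description only; the
    sequence string keeps the corresponding unmodified Table 1 symbol.
    """
    if code not in TABLE_2_CODES_SUBSET:
        raise ValueError(f"{code!r} is not in the Table 2 subset used here")
    return {
        "key": "modified_base",
        "location": position,
        "description": f"{code}: {TABLE_2_CODES_SUBSET[code]}",
    }
```

Under these assumptions, a 5-methylcytidine at position 3 of "accgt" would leave the sequence string unchanged and add the record returned by `describe_modified_base(3, "m5c")`.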
WIPO Standard ST.25 (1998), Appendix 2, Table 3, provides that the amino acids should be represented using the following three-letter code with the first letter as a capital.
Symbol | Meaning |
---|---|
Ala | Alanine |
Cys | Cysteine |
Asp | Aspartic Acid |
Glu | Glutamic Acid |
Phe | Phenylalanine |
Gly | Glycine |
His | Histidine |
Ile | Isoleucine |
Lys | Lysine |
Leu | Leucine |
Met | Methionine |
Asn | Asparagine |
Pro | Proline |
Gln | Glutamine |
Arg | Arginine |
Ser | Serine |
Thr | Threonine |
Val | Valine |
Trp | Tryptophan |
Tyr | Tyrosine |
Asx | Asp or Asn |
Glx | Glu or Gln |
Xaa | unknown or other |
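Sequences are often held in one-letter form before a listing is prepared, so a conversion to the three-letter symbols of Table 3 may be useful. The sketch below is a minimal example; the one-letter codes are the conventional IUPAC ones and do not come from Table 3, and the names `ONE_TO_THREE` and `to_three_letter` are hypothetical.

```python
# Conventional one-letter codes (an assumption, not part of Table 3)
# mapped to the three-letter symbols listed in Table 3.
ONE_TO_THREE = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
    "B": "Asx", "Z": "Glx", "X": "Xaa",
}


def to_three_letter(one_letter: str) -> list:
    """Convert a one-letter protein string to a list of Table 3 symbols.

    Raises KeyError for any letter that has no Table 3 counterpart.
    """
    return [ONE_TO_THREE[ch] for ch in one_letter.upper()]
```

For instance, `to_three_letter("MAGX")` returns `["Met", "Ala", "Gly", "Xaa"]`. Raising on unmapped letters, rather than silently substituting "Xaa", is a deliberate choice in this sketch so that unexpected input is noticed.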
WIPO Standard ST.25 (1998), Appendix 2, Table 4, provides that modified and unusual amino acids may be represented as the corresponding unmodified amino acids in the sequence itself if the modified or unusual amino acid is one of those listed below and the modification is further described in the Feature section of the Sequence Listing. The codes from the list below may be used in the description (i.e., the specification and drawings, or in the Sequence Listing) but these codes may not be used in the sequence itself.
Symbol | Meaning |
---|---|
Aad | 2-Aminoadipic acid |
bAad | 3-Aminoadipic acid |
bAla | beta-Alanine, beta-Aminopropionic acid |
Abu | 2-Aminobutyric acid |
4Abu | 4-Aminobutyric acid, piperidinic acid |
Acp | 6-Aminocaproic acid |
Ahe | 2-Aminoheptanoic acid |
Aib | 2-Aminoisobutyric acid |
bAib | 3-Aminoisobutyric acid |
Apm | 2-Aminopimelic acid |
Dbu | 2,4-Diaminobutyric acid |
Des | Desmosine |
Dpm | 2,2'-Diaminopimelic acid |
Dpr | 2,3-Diaminopropionic acid |
EtGly | N-Ethylglycine |
EtAsn | N-Ethylasparagine |
Hyl | Hydroxylysine |
aHyl | allo-Hydroxylysine |
3Hyp | 3-Hydroxyproline |
4Hyp | 4-Hydroxyproline |
Ide | Isodesmosine |
aIle | allo-Isoleucine |
MeGly | N-Methylglycine, sarcosine |
MeIle | N-Methylisoleucine |
MeLys | 6-N-Methyllysine |
MeVal | N-Methylvaline |
Nva | Norvaline |
Nle | Norleucine |
Orn | Ornithine |
WIPO Standard ST.25 (1998), Appendix 2, Table 5, provides for feature keys related to DNA sequences.
Key | Description |
---|---|
allele | a related individual or strain contains stable, alternative forms of the same gene which differ from the presented sequence at this location (and perhaps others) |
attenuator | (1) region of DNA at which regulation of termination of transcription occurs, which controls the expression of some bacterial operons; (2) sequence segment located between the promoter and the first structural gene that causes partial termination of transcription |
C_region | constant region of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains; includes one or more exons depending on the particular chain |
CAAT_signal | CAAT box; part of a conserved sequence located about 75 bp upstream of the start point of eukaryotic transcription units which may be involved in RNA polymerase binding; consensus=GG (C or T) CAATCT |
CDS | coding sequence; sequence of nucleotides that corresponds with the sequence of amino acids in a protein (location includes stop codon); feature includes amino acid conceptual translation |
conflict | independent determinations of the "same" sequence differ at this site or region |
D-loop | displacement loop; a region within mitochondrial DNA in which a short stretch of RNA is paired with one strand of DNA, displacing the original partner DNA strand in this region; also used to describe the displacement of a region of one strand of duplex DNA by a single stranded invader in the reaction catalyzed by RecA protein |
D-segment | diversity segment of immunoglobulin heavy chain, and T-cell receptor beta chain |
enhancer | a cis-acting sequence that increases the utilization of (some) eukaryotic promoters, and can function in either orientation and in any location (upstream or downstream) relative to the promoter |
exon | region of genome that codes for portion of spliced mRNA; may contain 5'UTR, all CDSs, and 3'UTR |
GC_signal | GC box; a conserved GC-rich region located upstream of the start point of eukaryotic transcription units which may occur in multiple copies or in either orientation; consensus=GGGCGG |
gene | region of biological interest identified as a gene and for which a name has been assigned |
iDNA | intervening DNA; DNA which is eliminated through any of several kinds of recombination |
intron | a segment of DNA that is transcribed, but removed from within the transcript by splicing together the sequences (exons) on either side of it |
J_segment | joining segment of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains |
LTR | long terminal repeat, a sequence directly repeated at both ends of a defined sequence, of the sort typically found in retroviruses |
mat_peptide | mature peptide or protein coding sequence; coding sequence for the mature or final peptide or protein product following post-translational modification; the location does not include the stop codon (unlike the corresponding CDS) |
misc_binding | site in nucleic acid which covalently or non-covalently binds another moiety that cannot be described by any other Binding key (primer_bind or protein_bind) |
misc_difference | feature sequence is different from that presented in the entry and cannot be described by any other Difference key (conflict, unsure, old_sequence, mutation, variation, allele, or modified_base) |
misc_feature | region of biological interest which cannot be described by any other feature key; a new or rare feature |
misc_recomb | site of any generalized, site-specific or replicative recombination event where there is a breakage and reunion of duplex DNA that cannot be described by other recombination keys (iDNA and virion) or qualifiers of source key (/insertion_seq, /transposon, /proviral) |
misc_RNA | any transcript or RNA product that cannot be defined by other RNA keys (prim_transcript, precursor_RNA, mRNA, 5'clip, 3'clip, 5'UTR, 3'UTR, exon, CDS, sig_peptide, transit_peptide, mat_peptide, intron, polyA_site, rRNA, tRNA, scRNA, and snRNA) |
misc_signal | any region containing a signal controlling or altering gene function or expression that cannot be described by other Signal keys (promoter, CAAT_signal, TATA_signal, -35_signal, -10_signal, GC_signal, RBS, polyA_signal, enhancer, attenuator, terminator, and rep_origin) |
misc_structure | any secondary or tertiary structure or conformation that cannot be described by other Structure keys (stem_loop and D-loop) |
modified_base | the indicated nucleotide is a modified nucleotide and should be substituted for by the indicated molecule (given in the mod_base qualifier value) |
mRNA | messenger RNA; includes 5' untranslated region (5'UTR), coding sequences (CDS, exon) and 3' untranslated region (3'UTR) |
mutation | a related strain has an abrupt, inheritable change in the sequence at this location |
N_region | extra nucleotides inserted between rearranged immunoglobulin segments |
old_sequence | the presented sequence revises a previous version of the sequence at this location |
polyA_signal | recognition region necessary for endonuclease cleavage of an RNA transcript that is followed by polyadenylation; consensus=AATAAA |
polyA_site | site on an RNA transcript to which will be added adenine residues by post-transcriptional polyadenylation |
precursor_RNA | any RNA species that is not yet the mature RNA product; may include 5' clipped region (5'clip), 5' untranslated region (5'UTR), coding sequences (CDS, exon), intervening sequences (intron), 3' untranslated region (3'UTR), and 3' clipped region (3'clip) |
prim_transcript | primary (initial, unprocessed) transcript; includes 5' clipped region (5'clip), 5' untranslated region (5'UTR), coding sequences (CDS, exon), intervening sequences (intron), 3' untranslated region (3'UTR), and 3' clipped region (3'clip) |
primer_bind | non-covalent primer binding site for initiation of replication, transcription, or reverse transcription; includes site(s) for synthetic, for example, PCR primer elements |
promoter | region on a DNA molecule involved in RNA polymerase binding to initiate transcription |
protein_bind | non-covalent protein binding site on nucleic acid |
RBS | ribosome binding site |
repeat_region | region of genome containing repeating units |
repeat_unit | single repeat element |
rep_origin | origin of replication; starting site for duplication of nucleic acid to give two identical copies |
rRNA | mature ribosomal RNA; the RNA component of the ribonucleoprotein particle (ribosome) which assembles amino acids into proteins |
S_region | switch region of immunoglobulin heavy chains; involved in the rearrangement of heavy chain DNA leading to the expression of a different immunoglobulin class from the same B-cell |
satellite | many tandem repeats (identical or related) of a short basic repeating unit; many have a base composition or other property different from the genome average that allows them to be separated from the bulk (main band) genomic DNA |
scRNA | small cytoplasmic RNA; any one of several small cytoplasmic RNA molecules present in the cytoplasm and (sometimes) nucleus of a eukaryote |
sig_peptide | signal peptide coding sequence; coding sequence for an N-terminal domain of a secreted protein; this domain is involved in attaching nascent polypeptide to the membrane; leader sequence |
snRNA | small nuclear RNA; any one of many small RNA species confined to the nucleus; several of the snRNAs are involved in splicing or other RNA processing reactions |
source | identifies the biological source of the specified span of the sequence; this key is mandatory; every entry will have, as a minimum, a single source key spanning the entire sequence; more than one source key per sequence is permissible |
stem_loop | hairpin; a double-helical region formed by base-pairing between adjacent (inverted) complementary sequences in a single strand of RNA or DNA |
STS | Sequence Tagged Site; short, single-copy DNA sequence that characterizes a mapping landmark on the genome and can be detected by PCR; a region of the genome can be mapped by determining the order of a series of STSs |
TATA_signal | TATA box; Goldberg-Hogness box; a conserved AT-rich septamer found about 25 bp before the start point of each eukaryotic RNA polymerase II transcript unit which may be involved in positioning the enzyme for correct initiation; consensus=TATA(A or T)A(A or T) |
terminator | sequence of DNA located either at the end of the transcript or adjacent to a promoter region that causes RNA polymerase to terminate transcription; may also be site of binding of repressor protein |
transit_peptide | transit peptide coding sequence; coding sequence for an N-terminal domain of a nuclear-encoded organellar protein; this domain is involved in post-translational import of the protein into the organelle |
tRNA | mature transfer RNA, a small RNA molecule (75-85 bases long) that mediates the translation of a nucleic acid sequence into an amino acid sequence |
unsure | author is unsure of exact sequence in this region |
V_region | variable region of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains; codes for the variable amino terminal portion; can be made up from V_segments, D_segments, N_regions, and J_segments |
V_segment | variable segment of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains; codes for most of the variable region (V_region) and the last few amino acids of the leader peptide |
variation | a related strain contains stable mutations from the same gene (for example, RFLPs, polymorphisms, etc.) which differ from the presented sequence at this location (and possibly others) |
3'clip | 3'-most region of a precursor transcript that is clipped off during processing |
3'UTR | region at the 3' end of a mature transcript (following the stop codon) that is not translated into a protein |
5'clip | 5'-most region of a precursor transcript that is clipped off during processing |
5'UTR | region at the 5' end of a mature transcript (preceding the initiation codon) that is not translated into a protein |
-10_signal | Pribnow box; a conserved region about 10 bp upstream of the start point of bacterial transcription units which may be involved in binding RNA polymerase; consensus=TAtAaT |
-35_signal | a conserved hexamer about 35 bp upstream of the start point of bacterial transcription units; consensus=TTGACa [ ] or TGTTGACA [ ] |
WIPO Standard ST.25 (1998), Appendix 2, Table 6 provides for feature keys related to protein sequences.
Key | Description |
---|---|
CONFLICT | different papers report differing sequences |
VARIANT | authors report that sequence variants exist |
VARSPLIC | description of sequence variants produced by alternative splicing |
MUTAGEN | site which has been experimentally altered |
MOD_RES | post-translational modification of a residue |
ACETYLATION | N-terminal or other |
AMIDATION | generally at the C-terminal of a mature active peptide |
BLOCKED | undetermined N- or C-terminal blocking group |
FORMYLATION | of the N-terminal methionine |
GAMMA-CARBOXYGLUTAMIC ACID | |
HYDROXYLATION | of asparagine, aspartic acid, proline or lysine |
METHYLATION | generally of lysine or arginine |
PHOSPHORYLATION | of serine, threonine, tyrosine, aspartic acid or histidine |
PYRROLIDONE CARBOXYLIC ACID | N-terminal glutamate which has formed an internal cyclic lactam |
SULFATATION | generally of tyrosine |
LIPID | covalent binding of a lipidic moiety |
MYRISTATE | myristate group attached through an amide bond to the N-terminal glycine residue of the mature form of a protein or to an internal lysine residue |
PALMITATE | palmitate group attached through a thioether bond to a cysteine residue or through an ester bond to a serine or threonine residue |
FARNESYL | farnesyl group attached through a thioether bond to a cysteine residue |
GERANYL-GERANYL | geranyl-geranyl group attached through a thioether bond to a cysteine residue |
GPI-ANCHOR | glycosyl-phosphatidylinositol (GPI) group linked to the alpha-carboxyl group of the C-terminal residue of the mature form of a protein |
N-ACYL DIGLYCERIDE | N-terminal cysteine of the mature form of a prokaryotic lipoprotein with an amide-linked fatty acid and a glyceryl group to which two fatty acids are linked by ester linkages |
DISULFID | disulfide bond; the ‘FROM’ and ‘TO’ endpoints represent the two residues which are linked by an intra-chain disulfide bond; if the ‘FROM’ and ‘TO’ endpoints are identical, the disulfide bond is an interchain one and the description field indicates the nature of the cross-link |
THIOLEST | thiolester bond; the ‘FROM’ and ‘TO’ endpoints represent the two residues which are linked by the thiolester bond |
THIOETH | thioether bond; the ‘FROM’ and ‘TO’ endpoints represent the two residues which are linked by the thioether bond |
CARBOHYD | glycosylation site; the nature of the carbohydrate (if known) is given in the description field |
METAL | binding site for a metal ion; the description field indicates the nature of the metal |
BINDING | binding site for any chemical group (co-enzyme, prosthetic group, etc.); the chemical nature of the group is given in the description field |
SIGNAL | extent of a signal sequence (prepeptide) |
TRANSIT | extent of a transit peptide (mitochondrial, chloroplastic, or for a microbody) |
PROPEP | extent of a propeptide |
CHAIN | extent of a polypeptide chain in the mature protein |
PEPTIDE | extent of a released active peptide |
DOMAIN | extent of a domain of interest on the sequence; the nature of that domain is given in the description field |
CA_BIND | extent of a calcium-binding region |
DNA_BIND | extent of a DNA-binding region |
NP_BIND | extent of a nucleotide phosphate binding region; the nature of the nucleotide phosphate is indicated in the description field |
TRANSMEM | extent of a transmembrane region |
ZN_FING | extent of a zinc finger region |
SIMILAR | extent of a similarity with another protein sequence; precise information, relative to that sequence is given in the description field |
REPEAT | extent of an internal sequence repetition |
HELIX | secondary structure: Helices, for example, Alpha-helix, 3(10) helix, or Pi-helix |
STRAND | secondary structure: Beta-strand, for example, Hydrogen bonded beta-strand, or Residue in an isolated beta-bridge |
TURN | secondary structure: Turns, for example, H-bonded turn (3-turn, 4-turn, or 5-turn) |
ACT_SITE | amino acid(s) involved in the activity of an enzyme |
SITE | any other interesting site on the sequence |
INIT_MET | the sequence is known to start with an initiator methionine |
NON_TER | the residue at an extremity of the sequence is not the terminal residue; if applied to position 1, this signifies that the first position is not the N-terminus of the complete molecule; if applied to the last position, it signifies that this position is not the C-terminus of the complete molecule; there is no description field for this key |
NON_CONS | non consecutive residues; indicates that two residues in a sequence are not consecutive and that there are a number of unsequenced residues between them |
UNSURE | uncertainties in the sequence; used to describe region(s) of a sequence for which the authors are unsure about the sequence assignment |
FILING INTERNATIONALLY
The revisions to 37 CFR 1.821 through 1.825 are the result of an effort to harmonize the PTO, PCT, EPO and JPO Sequence Listing requirements to the extent possible. The requirements of WIPO Standard ST.25 are substantially identical to the requirements of 37 CFR 1.821 through 1.825. PatentIn Version 3.1 software, now available (see MPEP § 2430), generates sequence listings that meet all of the requirements of WIPO Standard ST.25 (1998). The requirements of 37 CFR 1.821 through 1.825, however, are less stringent than the requirements of WIPO Standard ST.25 (1998). Thus, applicants who wish to file in countries which adhere to WIPO Standard ST.25 (1998) should consider the following when not using PatentIn Version 3.1:
- (A) The WIPO Standard ST.25 (1998) does not permit submissions using a Macintosh computer;
- (B) The WIPO Standard ST.25 (1998) does not accept the range of media permitted by 37 CFR 1.821 through 1.825;
- (C) The answers in fields <221> and <222> must use selections from Tables 5 and 6 of WIPO Standard ST.25 (1998) to comply with that standard. The terms from these Tables are considered language neutral vocabulary;
- (D) Any free text in numeric identifier <223> of a Sequence Listing will not be translated and thus must also appear in the specification of applications filed under WIPO Standard ST.25 (1998) for compliance;
- (E) A CRF filed after the filing of an application under the PCT is not considered to be part of the disclosure and will not be published in the pamphlet;
- (F) Paragraph 39 of WIPO Standard ST.25 (1998) requires the specific wording "the information recorded on the form is identical to the written sequence listing"; and
- (G) WIPO Standard ST.25 (1998), paragraph 24, requires spaces between specified numeric identifiers in the Sequence Listing.
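Item (C) lends itself to a simple automated check. The Python sketch below tests a <221> value against the feature keys for the relevant molecule type. The two sets are deliberately abbreviated subsets of Tables 5 and 6 (a real check would need the complete tables), and the names `TABLE_5_KEYS_SUBSET`, `TABLE_6_KEYS_SUBSET`, and `check_feature_key` are hypothetical.

```python
# Abbreviated subsets of the feature keys in Tables 5 and 6; a real
# check would need the complete lists from both tables.
TABLE_5_KEYS_SUBSET = {
    "CDS", "exon", "intron", "misc_feature",
    "modified_base", "promoter", "source", "unsure",
}
TABLE_6_KEYS_SUBSET = {
    "MOD_RES", "DISULFID", "SIGNAL", "DOMAIN", "SITE", "UNSURE",
}


def check_feature_key(key: str, molecule: str) -> bool:
    """Return True if key is an allowed <221> value for the molecule type.

    molecule must be "nucleotide" (Table 5) or "protein" (Table 6).
    The comparison is case-sensitive, as the tables distinguish
    "unsure" (Table 5) from "UNSURE" (Table 6).
    """
    if molecule == "nucleotide":
        return key in TABLE_5_KEYS_SUBSET
    if molecule == "protein":
        return key in TABLE_6_KEYS_SUBSET
    raise ValueError("molecule must be 'nucleotide' or 'protein'")
```

Treating the keys as case-sensitive is an assumption of this sketch, chosen because the two tables use different capitalization for otherwise similar keys.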
2422.01 Definitions of Nucleotide and/or Amino Acids for Purpose of Sequence Rules
37 CFR 1.821(a) presents a definition for "nucleotide and/or amino acid sequences." This definition sets forth limits, in terms of numbers of amino acids and/or numbers of nucleotides, at or above which compliance with the sequence rules is required. Nucleotide and/or amino acid sequences as used in 37 CFR 1.821 through 1.825 are interpreted to mean an unbranched sequence of four or more amino acids or an unbranched sequence of ten or more nucleotides. Branched sequences are specifically excluded from this definition. Sequences with fewer than four specifically defined nucleotides or amino acids are specifically excluded from this section. "Specifically defined" means those amino acids other than "Xaa" and those nucleotide bases other than "n" defined in accordance with the World Intellectual Property Organization (WIPO) Handbook on Industrial Property Information and Documentation, Standard ST.25: Standard for the Presentation of Nucleotide and Amino Acid Sequence Listings in Patent Applications (1998), including Tables 1 through 6 in Appendix 2 (see MPEP § 2422).
The limit of four or more amino acids was established for consistency with limits in place for industry database collections whereas the limit of ten or more nucleotides, while lower than certain industry database limits, was established to encompass those nucleotide sequences to which the smallest probe will bind in a stable manner. The limits for amino acids and nucleotides are also consistent with those established for sequence data exchange with the Japanese Patent Office and the European Patent Office.
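The numeric limits in 37 CFR 1.821(a) can be expressed as a first-pass screen, sketched below in Python. The function name `is_covered` and its parameters are hypothetical, and branching and the presence of D-amino acids are supplied by the caller as flags rather than detected. The sketch is not a substitute for case-by-case judgment on the applicability of the rules.

```python
def is_covered(residues, kind, branched=False, has_d_amino_acid=False):
    """First-pass screen for the definition in 37 CFR 1.821(a).

    residues: sequence of symbols, e.g. a string of one-letter bases
              or a list of three-letter amino acid symbols.
    kind:     "nucleotide" or "protein".
    """
    if branched:
        return False  # branched sequences are excluded
    if kind == "protein":
        if has_d_amino_acid:
            return False  # D-amino acids fall outside 1.821(a)(2)
        min_length, placeholder = 4, "Xaa"
    elif kind == "nucleotide":
        min_length, placeholder = 10, "n"
    else:
        raise ValueError("kind must be 'nucleotide' or 'protein'")

    # "Specifically defined" residues are those other than Xaa or n.
    specifically_defined = sum(1 for r in residues if r != placeholder)
    return len(residues) >= min_length and specifically_defined >= 4
```

Under these assumptions, `is_covered("acgtacgtac", "nucleotide")` is `True`, while `is_covered("nnnnnnnacg", "nucleotide")` is `False` because only three bases are specifically defined.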
37 CFR 1.821(a)(1) and 37 CFR 1.821(a)(2) present further definitions for those nucleotide and amino acid sequences that are intended to be embraced by the sequence rules. Situations in which the applicability of the rules is in issue will be resolved on a case-by-case basis.
Nucleotide sequences are further limited to those that can be represented by the symbols set forth in 37 CFR 1.822(b), which incorporates by reference WIPO Standard ST.25 (1998), Appendix 2, Table 1 (see MPEP § 2422). The presence of other than typical 5' to 3' phosphodiester linkages in a nucleotide sequence does not render the rules inapplicable. The Office does not want to exclude linkages of the type commonly found in naturally occurring nucleotides, e.g., eukaryotic end capped sequences.
Amino acid sequences are further limited to those listed in 37 CFR 1.822(b), which incorporates by reference WIPO Standard ST.25 (1998), Appendix 2, Table 3 (see MPEP § 2422), and those L-amino acids that are commonly found in naturally occurring proteins. The limitation to L-amino acids is based upon the fact that there currently exists no widely accepted standard nomenclature for representing the scope of amino acids encompassed by non-L-amino acids, and, as such, the process of meaningfully encoding these other amino acids for computerized searching and printing is not currently feasible. The presence of one or more D-amino acids in a sequence will exclude that sequence from the scope of the rules. (Voluntary compliance is, however, encouraged in these situations; the symbol "Xaa" can be used to represent D-amino acids.) The sequence rules embrace "[a]ny peptide or protein that can be expressed as a sequence using the symbols in WIPO Standard ST.25 (1998), Appendix 2, Table 3 in conjunction with a description in the Feature section to describe, for example, modified linkages, cross links and end caps, non-peptidyl bonds, etc." 37 CFR 1.821(a)(2).
With regard to amino acid sequences, the use of the terms "peptide or protein" implies, however, that the amino acids in a given sequence are linked by at least three consecutive peptide bonds. Accordingly, an amino acid sequence is not excluded from the scope of the rules merely due to the presence of a single non-peptidyl bond. If an amino acid sequence can be represented by a string of amino acid abbreviations, with reference, where necessary, to a features table to explain modifications in the sequence, the sequence comes within the scope of the rules. However, the rules are not intended to encompass the subject matter that is generally referred to as synthetic resins.
2422.02 The Requirement for Exclusive Conformance; Sequences Presented in Drawing Figures
37 CFR 1.821(b) requires exclusive conformance, with regard to the manner in which the nucleotide and/or amino acid sequences are presented and described, with the sequence rules for all applications that include nucleotide and amino acid sequences that fall within the definitions. This requirement is necessary to minimize any confusion that could result if more than one format for representing sequence data was employed in a given application. It is also expected that the required standard format will be more readily and widely accepted and adopted if its use is exclusive, as well as mandatory.
In view of the fact that many significant sequence characteristics may only be demonstrated by a figure, the exclusive conformance requirement of this section may be relaxed for drawing figures. This is especially true in view of the fact that the representation of double stranded nucleotides is not permitted in the "Sequence Listing" and many significant nucleotide features, such as "sticky ends" and the like, will only be shown effectively by reference to a drawing figure. Further, the similarity or homology between/among sequences can only be depicted in an effective manner in a drawing figure. Similarly, drawing figures are recommended for use with amino acid sequences to depict structural features of the corresponding protein, such as finger regions and Kringle regions. The situations discussed herein are given by way of example only and there may be many other reasons for relaxing the requirements of this section for the drawing figures. It should be noted, though, that when a sequence is presented in a drawing, regardless of the format or the manner of presentation of that sequence in the drawing, the sequence must still be included in the Sequence Listing and the sequence identifier ("SEQ ID NO:X") must be used, either in the drawing or in the Brief Description of the Drawings.
2422.03 The Requirements for a Sequence Listing and Sequence Identifiers; Sequences Embedded in Application Text; Variants of a Presented Sequence
37 CFR 1.821(c) requires that applications containing nucleotide and/or amino acid sequences that fall within the above definitions, contain, as a separate part of the disclosure on paper or compact disc, a disclosure of the nucleotide and/or amino acid sequences, and associated information, using the format and symbols that are set forth in 37 CFR 1.822 and 37 CFR 1.823. This separate part of the disclosure is referred to as the "Sequence Listing." The "Sequence Listing" submitted pursuant to 37 CFR 1.821(c), whether on paper or compact disc, is the official copy of the "Sequence Listing."
37 CFR 1.821(c) requires that each sequence disclosed in the application appear separately in the "Sequence Listing," with each sequence further being assigned a sequence identification number, referred to as "SEQ ID NO." The sequence identifiers must begin with 1 and increase sequentially by integers. The requirement for sequence identification numbers, at a minimum, requires that each sequence be assigned a different number for purposes of identification. However, where practical and for ease of reference, sequences should be presented in the separate part of the application in numerical order and in the order in which they are discussed in the application.
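The numbering requirement can be sketched as a small Python check. This is an illustrative reading only; the function name and the wording of the messages are assumptions for this example. It treats "begin with 1 and increase sequentially by integers" as meaning that the identifiers in use are exactly 1 through N with no duplicates, and it reports out-of-order presentation separately because numerical order is called for only "where practical."

```python
def check_seq_id_numbers(ids):
    """Return a list of problems found in a list of SEQ ID NO integers.

    ids: identifiers in the order the sequences appear in the listing.
    An empty result means no problem was detected by this sketch.
    """
    problems = []
    distinct = set(ids)
    # Each sequence must carry a different identification number.
    if len(distinct) != len(ids):
        problems.append("duplicate SEQ ID NO")
    # Identifiers must start at 1 and increase by integers without gaps.
    if distinct != set(range(1, len(distinct) + 1)):
        problems.append("SEQ ID NOs do not run from 1 to N without gaps")
    # Numerical order is recommended where practical, not mandatory.
    if list(ids) != sorted(ids):
        problems.append("not in numerical order (recommended, not mandatory)")
    return problems
```

For example, `check_seq_id_numbers([1, 2, 3])` reports nothing, while `check_seq_id_numbers([2, 3])` reports a numbering gap because the identifiers do not begin with 1.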
If submitted on paper, the "Sequence Listing" is a separate part of the disclosure which must begin on a new page within the specification. A plurality of sequences may, if feasible, be presented on a single page; the separate presentation of both nucleotide and amino acid sequences on the same page is also permitted.
If the "Sequence Listing" is submitted on compact disc, the specification must contain an incorporation by reference of the material on the compact disc in a separate paragraph, identifying each compact disc by the names of the files contained on each of the compact discs, their date of creation and their sizes in bytes (37 CFR 1.52(e)). The total number of compact discs including duplicates and the files on each compact disc shall be specified (37 CFR 1.77(b)(4)). The compact disc used to submit the sequence listing may also contain table information if the table has more than 50 pages of text. See 37 CFR 1.823(a)(2) and 1.52(e)(1)(iii). The compact disc and duplicate copy must be labeled "Copy 1" and "Copy 2," respectively, and a statement that the copies are identical must be included. If the two compact discs are not identical, the Office will use the disc labeled "Copy 1" for further processing (37 CFR 1.52(e)(4)). See also MPEP § 608.05.
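The file details called for in the incorporation-by-reference paragraph (file name, date of creation, size in bytes) can be gathered with a short Python sketch. This is a convenience illustration under stated assumptions, not a prescribed procedure. The function name and output wording are hypothetical, and the file's modification time is used as a stand-in for its date of creation, which not every file system records.

```python
import os
from datetime import date


def describe_disc_files(paths):
    """Return one descriptive line per file: name, date, size in bytes."""
    lines = []
    for path in paths:
        size = os.path.getsize(path)
        # Assumption: modification time approximates the "date of creation".
        created = date.fromtimestamp(os.path.getmtime(path))
        name = os.path.basename(path)
        lines.append(f"{name}, created {created.isoformat()}, {size} bytes")
    return lines
```

The resulting lines would still need to be worked into a paragraph that also states the total number of compact discs, including duplicates, and the files on each.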
The compact disc submitted under 37 CFR 1.821(c) may, if it contains no tables, be identical to the computer readable form (CRF) submitted under 37 CFR 1.821(e) and 37 CFR 1.824, if that CRF is submitted on a compact disc. Even if the compact discs submitted under both 37 CFR 1.821(c) and (e) are identical, each compact disc submitted under 37 CFR 1.821(c) must be submitted in duplicate, in addition to the CRF under 37 CFR 1.821(e).
The requirement for compliance in 37 CFR 1.821(c) is directed to "disclosures of nucleotide and/or amino acid sequences." (Emphasis added.) All sequence information, whether claimed or not, that meets the length thresholds in 37 CFR 1.821(a) is subject to the rules. The goal of the Office is to build a comprehensive database that can be used for, inter alia, the purpose of assessing the prior art. It is therefore essential that all sequence information, whether only disclosed or also claimed, be included in the database. In those instances in which prior art sequences are only referred to in a given application by name and a publication or accession reference, they need not be included as part of the "Sequence Listing," unless an examiner considers the referred-to sequence to be "essential material," per MPEP § 608.01(p). However, if the applicant presents the sequence as a string of particular bases or amino acids, it is necessary to include the sequence in the "Sequence Listing," regardless of whether the applicant considers the sequence to be prior art. In general, any sequence that is disclosed and/or claimed as a sequence, i.e., as a string of particular bases or amino acids, and that otherwise meets the criteria of 37 CFR 1.821(a), must be set forth in the "Sequence Listing."
It is generally acceptable to present a single, general sequence in accordance with the sequence rules and to discuss and/or claim variants of that general sequence without presenting each variant as a separate sequence in the "Sequence Listing." By way of example only, the following types of sequence disclosures would be treated as noted herein by the Office. With respect to "conservatively modified variants thereof" of a sequence, the sequences may be described as SEQ ID NO:X and "conservatively modified variants thereof," if desired. With respect to a sequence that "may be deleted at the C-terminus by 1, 2, 3, 4, or 5 residues," all of the implied variations do not need to be included in the "Sequence Listing." If such a situation were encompassed by the rules, it would introduce far too much complexity into the "Sequence Listing" and the Office's database. The possible mathematical variations that could result from this type of language could reasonably require a "Sequence Listing" that would be thousands of pages in length. In this latter example, only the undeleted sequence needs to be included in the "Sequence Listing," and the sequences may be described as SEQ ID NO:X from which deletions have been made at the C-terminus by 1, 2, 3, 4, or 5 residues. The Office's database will only contain the undeleted sequence.
37 CFR 1.821(d) requires the use of the assigned sequence identifier in all instances where the description or claims of a patent application discuss sequences regardless of whether a given sequence is also embedded in the text of the description or claims of an application. This requirement is also intended to permit references, in both the description and claims, to sequences set forth in the "Sequence Listing" by the use of assigned sequence identifiers without repeating the sequence in the text of the description or claims. Sequence identifiers can also be used to discuss and/or claim parts or fragments of a properly presented sequence. For example, language such as "residues 14 to 243 of SEQ ID NO:23" is permissible and the fragment need not be separately presented in the "Sequence Listing." Where a sequence is embedded in the text of an application, it must be presented in a manner that complies with the requirements of the sequence rules.
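A fragment reference of the kind quoted above can be resolved against a listing, as the following Python sketch shows. It is illustrative only: the pattern, the function name, and the dictionary used to hold the listing are assumptions for this example, and residue positions are read as 1-based and inclusive.

```python
import re

# Matches phrases such as "residues 14 to 243 of SEQ ID NO:23".
FRAGMENT_PATTERN = re.compile(
    r"residues\s+(\d+)\s+to\s+(\d+)\s+of\s+SEQ ID NO:\s*(\d+)"
)


def resolve_fragment(reference, listing):
    """Return the residues named by a fragment reference.

    reference: text containing a phrase like the one above
    listing: dict mapping SEQ ID NO (int) to a list of residue symbols
    """
    match = FRAGMENT_PATTERN.search(reference)
    if match is None:
        raise ValueError("no fragment reference found")
    start, end, seq_id = (int(group) for group in match.groups())
    residues = listing[seq_id]
    if not 1 <= start <= end <= len(residues):
        raise ValueError("residue range falls outside the sequence")
    # Convert the 1-based inclusive range to a Python slice.
    return residues[start - 1:end]
```

Because the fragment is derived from a sequence that is already in the listing, it needs no separate entry or identifier of its own.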
The rules do not alter, in any way, the requirements of 35 U.S.C. 112. The implementation of the rules has had no effect on disclosure and/or claiming requirements. The rules, in general, or the use of sequence identifiers throughout the specification and claims, specifically, should not raise any issues under 35 U.S.C. 112, first or second paragraphs. The use of sequence identification numbers (SEQ ID NO:X) only provides a shorthand way for applicants to discuss and claim their inventions. These identification numbers do not in any way restrict the manner in which an invention can be claimed.
2422.04 The Requirement for a Computer Readable Copy of the Official Copy of the Sequence Listing
37 CFR 1.821(e) requires the submission of a copy of the "Sequence Listing" in computer readable form. The information on the computer readable form will be entered into the Office’s database for searching and printing nucleotide and amino acid sequences. This electronic database will also enable the Office to exchange patented sequence data, in electronic form, with the Japanese Patent Office and the European Patent Office. It should be noted that the Office’s database complies with the confidentiality requirement imposed by 35 U.S.C. 122. Pending application sequences are maintained in the database separately from published or patented sequences. That is, the Office will not exchange or make public any information on any sequence until the patent application containing that information is published or matures into a patent, or as otherwise allowed by 35 U.S.C. 122.
The "Sequence Listing" submitted pursuant to 37 CFR 1.821(c), whether on paper or compact disc, is the official copy of the "Sequence Listing." However, the Office may permit correction of the official copy, at the least, during the pendency of a given application by reference to the computer readable copy thereof submitted pursuant to 37 CFR 1.821(e) if both the official copy and computer readable form were submitted at the time of filing of the application and the totality of the circumstances otherwise substantiate the proposed correction. A mere discrepancy between the official copy and the computer readable form may not, in and of itself, be sufficient to justify a proposed correction. In this regard, the Office will assume that the computer readable form has been incorporated by reference into the application when the official copy and computer readable form were submitted at the time of filing of the application. The Office will attempt to accommodate or address all correction issues, but it must be kept in mind that the real burden rests with the applicant to ensure that any discrepancies between the official copy and the computer readable form are eliminated or minimized. Applicants should be aware that there will be instances where the applicant may have to suffer the consequences of any discrepancies between the two.
The Office does not desire to be bound by a requirement to permanently preserve computer readable forms for support, priority or correction purposes. For example, the Office will make corrections, where appropriate, by reference to the computer readable form as long as the computer readable form is still available to the Office. However, once use of the computer readable form by the Office for processing has ended, i.e., once the Office has entered the data contained on the computer readable form into the appropriate database, the Office does not intend to further preserve the computer readable form submitted by the applicant.
2422.05 Reference to Previously Filed Identical Computer Readable Form; Continuing or Derivative Applications; Request for Transfer of Computer Readable Form
The last three sentences of 37 CFR 1.821(e) set forth the procedure to be followed when a computer readable form of a given application is identical with a computer readable form of another application. In that situation, an applicant may make reference to the other application and computer readable form therein in lieu of filing a duplicate computer readable form in the given application. That is, additional computer readable forms will not be required in derivative or continuing applications if the sequence information is exactly the same, i.e., with no additions or deletions, as that in a parent or previously filed application in which a complying computer readable form had been filed. If sequence information is deleted from or added to that submitted in a previously filed application, the procedure in this paragraph is not available and a new computer readable form is required. To take advantage of the procedure outlined in this section, applicants must request that the previously submitted sequence information be used in the given application. A letter must be submitted in the given application requesting use of the previously filed sequence information. The letter must completely identify the other application, by application number, and the computer readable form, by indicating whether it was the only computer readable form filed in that application or whether it was the second, or subsequent, computer readable form filed.
A sample letter requesting transfer of the previously filed sequence information is set forth below:
The paper or compact disc copy of the Sequence Listing in this application [application number], is identical to the computer readable copy of the Sequence Listing filed in application [application number], filed [date]. In accordance with 37 CFR 1.821(e), please use the [first-filed, last-filed or only, whichever is applicable] computer readable form filed in that application as the computer readable form for the instant application. It is understood that the Patent and Trademark Office will make the necessary change in application number and filing date for the instant application. A paper or compact disc copy of the Sequence Listing is [included in the originally-filed specification of the instant application, included in a separately filed preliminary amendment for incorporation into the specification, whichever is applicable].
2422.06 Requirement for Statement Regarding Content of Official and Computer Readable Copies of Sequence Listing
37 CFR 1.821(f) requires that the official "Sequence Listing" (submitted on paper or compact disc pursuant to 37 CFR 1.821(c)) and computer readable copies of the "Sequence Listing" (submitted pursuant to 37 CFR 1.821(e)) be accompanied by a statement that the content of the official and computer readable copies is the same, at the time when the computer readable form is submitted. Such a statement may be made by the applicant. See MPEP § 2428 for further information and Sample Statements.
2422.07 Requirements for Compliance, Statements Regarding New Matter, and Sanctions for Failure to Comply
37 CFR 1.821(g) requires compliance with the requirements of 37 CFR 1.821(b) through (f), as discussed above, if they are not satisfied at the time of filing under 35 U.S.C. 111(a) or at the time of entering the national stage of an international application under 35 U.S.C. 371, within the period of time set in a notice requiring compliance. Failure to comply will result in the abandonment of the application. Submissions in reply to requirements under this paragraph must be accompanied by a statement that the submission includes no new matter. Such a statement may be made by the applicant. Extensions of time in which to reply to a requirement under this paragraph are available pursuant to 37 CFR 1.136. When an action by the applicant is a bona fide attempt to comply with these rules and it is apparent that compliance with some requirement has inadvertently been omitted, the applicant may be given a new time period to correct the omission. See 37 CFR 1.135(c).
Provisional applications filed under 35 U.S.C. 111(b) need not comply with 37 CFR 1.821 through 1.825; however, applicants are encouraged to file a Sequence Listing as defined in 37 CFR 1.821(c) for ease of identification of the sequence information contained in the provisional application.
37 CFR 1.821(h) requires compliance with the requirements of 37 CFR 1.821(b) through (f), as discussed above, within the time period prescribed in a notice requiring compliance in an international application filed in the United States Receiving Office under the Patent Cooperation Treaty (PCT), if the above noted requirements are not satisfied at the time of filing. Submissions in reply to requirements under this paragraph must be accompanied by a statement that the submission does not include matter which goes beyond the disclosure in the international application as filed. Such a statement may be made by an applicant. International applications that fail to comply with any of the requirements of 37 CFR 1.821(b)-(f) will be searched to the extent possible without the benefit of the information in computer readable form. See PCT Administrative Instructions Section 513(c).
The requirement to submit a statement that a submission in reply to the requirements of this section does not include new matter or matter which goes beyond the disclosure in the application as filed is not the first instance in which the applicant has been required to ensure that there is not new matter upon amendment. The requirement is analogous to that found in 37 CFR 1.125 regarding substitute specifications. When a substitute specification is required because the number or nature of amendments would make it difficult to examine the application, the applicant must include a statement that the substitute specification includes no new matter. The necessity of requiring a substitute "Sequence Listing," or pages thereof, is similar to the necessity of requiring a substitute specification and, likewise, the burden is on the applicant to ensure that no new matter is added. Applicants have a duty to comply with the statutory prohibition (35 U.S.C. 132 and 35 U.S.C. 251) against the introduction of new matter.
It should be noted that the treatment accorded errors in sequencing or any other errors prior to the implementation date of the sequence rules will be no different for those applications filed on or after the implementation date of these rules. The correction of errors in sequencing or any other errors that are made in describing an invention is, as it has always been, subject to the statutory prohibition (35 U.S.C. 132 and 35 U.S.C. 251) against the introduction of new matter.
2422.08 Presumptions Regarding Compliance
Neither the presence nor absence of information which is not required under the sequence rules will create a presumption that such information is necessary to satisfy any of the requirements of 35 U.S.C. 112. Further, the grant of a patent on an application that is subject to 37 CFR 1.821 through 37 CFR 1.825 constitutes a presumption that the granted patent complies with the requirements of these rules.
2422.09 Box Sequence; Hand Delivery of Sequence Listings and Computer Readable Forms
To facilitate administrative processing of all papers and compact discs associated with sequence rule compliance, all computer readable forms, compact discs, fees, and papers accompanying them filed in the Office should be marked "Box SEQUENCE."
Correspondence relating to the sequence rules may also be hand-delivered to the Technology Center (TC). In cases of hand delivery to the Customer Service Window or to the TC, the compact disc, floppy disk or tape should be placed in a protective mailer labeled with at least the application number, if available. The labeling requirements of 37 CFR 1.52(e) and 1.824(a)(6) must also be complied with. The use of staples and clips, if any, should be confined to carefully attaching the mailer to the submitted papers without contact or compression of the magnetic media which may cause the disk or tape to be unreadable. In no situations should additional or complimentary electronic copies be delivered to examiners or other Office personnel.