2422 Nucleotide and/or Amino Acid Sequence Disclosures in Patent Applications [R-10.2019]
37 C.F.R. 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications.
- (a) Nucleotide and/or amino acid sequences as used in §§ 1.821 through 1.825 are interpreted to mean an unbranched sequence of four or more amino acids or an unbranched sequence of ten or more nucleotides. Branched sequences are specifically excluded from this definition. Sequences with fewer than four specifically defined nucleotides or amino acids are specifically excluded from this section. "Specifically defined" means those amino acids other than "Xaa" and those nucleotide bases other than "n" defined in accordance with the World Intellectual Property Organization (WIPO) Handbook on Industrial Property Information and Documentation, Standard ST.25: Standard for the Presentation of Nucleotide and Amino Acid Sequence Listings in Patent Applications (1998), including Tables 1 through 6 in Appendix 2, herein incorporated by reference. (Hereinafter "WIPO Standard ST.25 (1998)"). This incorporation by reference was approved by the Director of the Federal Register in accordance with 5 U.S.C. 552(a) and 1 CFR part 51. Copies of WIPO Standard ST.25 (1998) may be obtained from the World Intellectual Property Organization; 34 chemin des Colombettes; 1211 Geneva 20 Switzerland. Copies may also be inspected at the National Archives and Records Administration (NARA). For information on the availability of this material at NARA, call 202-741-6030, or go to: http://www.archives.gov/federal_register/code_of_federal_regulations/ibr_locations.html. Nucleotides and amino acids are further defined as follows:
- (1) Nucleotides: Nucleotides are intended to embrace only those nucleotides that can be represented using the symbols set forth in WIPO Standard ST.25 (1998), Appendix 2, Table 1. Modifications, e.g., methylated bases, may be described as set forth in WIPO Standard ST.25 (1998), Appendix 2, Table 2, but shall not be shown explicitly in the nucleotide sequence.
- (2) Amino acids: Amino acids are those L-amino acids commonly found in naturally occurring proteins and are listed in WIPO Standard ST.25 (1998), Appendix 2, Table 3. Those amino acid sequences containing D-amino acids are not intended to be embraced by this definition. Any amino acid sequence that contains post-translationally modified amino acids may be described as the amino acid sequence that is initially translated using the symbols shown in WIPO Standard ST.25 (1998), Appendix 2, Table 3 with the modified positions; e.g., hydroxylations or glycosylations, being described as set forth in WIPO Standard ST.25 (1998), Appendix 2, Table 4, but these modifications shall not be shown explicitly in the amino acid sequence. Any peptide or protein that can be expressed as a sequence using the symbols in WIPO Standard ST.25 (1998), Appendix 2, Table 3 in conjunction with a description in the Feature section to describe, for example, modified linkages, cross links and end caps, non-peptidyl bonds, etc., is embraced by this definition.
- (b) Patent applications which contain disclosures of nucleotide and/or amino acid sequences, in accordance with the definition in paragraph (a) of this section, shall, with regard to the manner in which the nucleotide and/or amino acid sequences are presented and described, conform exclusively to the requirements of §§ 1.821 through 1.825.
- (c) Patent applications which contain disclosures of nucleotide and/or amino acid sequences must contain, as a separate part of the disclosure, a paper copy disclosing the nucleotide and/or amino acid sequences and associated information using the symbols and format in accordance with the requirements of §§ 1.822 and 1.823. This paper copy is hereinafter referred to as the "Sequence Listing." Each sequence disclosed must appear separately in the "Sequence Listing." Each sequence set forth in the "Sequence Listing" shall be assigned a separate sequence identifier. The sequence identifiers shall begin with 1 and increase sequentially by integers. If no sequence is present for a sequence identifier, the code "000" shall be used in place of the sequence. The response for the numeric identifier <160> shall include the total number of SEQ ID NOs, whether followed by a sequence or by the code "000."
- (d) Where the description or claims of a patent application discuss a sequence that is set forth in the "Sequence Listing" in accordance with paragraph (c) of this section, reference must be made to the sequence by use of the sequence identifier, preceded by "SEQ ID NO:" in the text of the description or claims, even if the sequence is also embedded in the text of the description or claims of the patent application.
- (e) A copy of the "Sequence Listing" referred to in paragraph (c) of this section must also be submitted in computer readable form in accordance with the requirements of § 1.824. The computer readable form is a copy of the "Sequence Listing" and will not necessarily be retained as a part of the patent application file. If the computer readable form of a new application is to be identical with the computer readable form of another application of the applicant on file in the Patent and Trademark Office, reference may be made to the other application and computer readable form in lieu of filing a duplicate computer readable form in the new application if the computer readable form in the other application was compliant with all of the requirements of these rules. The new application shall be accompanied by a letter making such reference to the other application and computer readable form, both of which shall be completely identified. In the new application, applicant must also request the use of the compliant computer readable "Sequence Listing" that is already on file for the other application and must state that the paper copy of the "Sequence Listing" in the new application is identical to the computer readable copy filed for the other application.
- (f) In addition to the paper copy required by paragraph (c) of this section and the computer readable form required by paragraph (e) of this section, a statement that the content of the paper and computer readable copies are the same must be submitted with the computer readable form, e.g., a statement that "the information recorded in computer readable form is identical to the written sequence listing."
- (g) If any of the requirements of paragraphs (b) through (f) of this section are not satisfied at the time of filing under 35 U.S.C. 111(a) or at the time of entering the national stage under 35 U.S.C. 371, applicant will be notified and given a period of time within which to comply with such requirements in order to prevent abandonment of the application. Any submission in reply to a requirement under this paragraph must be accompanied by a statement that the submission includes no new matter.
- (h) If any of the requirements of paragraphs (b) through (f) of this section are not satisfied at the time of filing an international application under the Patent Cooperation Treaty (PCT), which application is to be searched by the United States International Searching Authority or examined by the United States International Preliminary Examining Authority, applicant will be sent a notice necessitating compliance with the requirements within a prescribed time period. Any submission in reply to a requirement under this paragraph must be accompanied by a statement that the submission does not include matter which goes beyond the disclosure in the international application as filed. If applicant fails to timely provide the required computer readable form, the United States International Searching Authority shall search only to the extent that a meaningful search can be performed without the computer readable form and the United States International Preliminary Examining Authority shall examine only to the extent that a meaningful examination can be performed without the computer readable form.
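The numbering rules of paragraph (c) can be illustrated with a minimal Python sketch. The function name `check_identifiers` and the tuple-based input format are hypothetical and chosen only for illustration; they are not part of the rules or of any Office software.

```python
def check_identifiers(entries, declared_total):
    """Check SEQ ID NO numbering as described in 37 CFR 1.821(c).

    entries: list of (seq_id, sequence) tuples in listing order; an
             intentionally skipped sequence carries the code "000".
             (This input shape is an assumption made for the sketch.)
    declared_total: the value reported under numeric identifier <160>.
    Returns a list of problem descriptions (empty when none are found).
    """
    problems = []
    # Identifiers must begin with 1 and increase sequentially by integers.
    for position, (seq_id, _sequence) in enumerate(entries, start=1):
        if seq_id != position:
            problems.append(f"expected SEQ ID NO: {position}, found {seq_id}")
    # <160> counts every identifier, including those followed by "000".
    if declared_total != len(entries):
        problems.append(
            f"<160> reports {declared_total}, listing has {len(entries)} identifiers"
        )
    return problems
```

For example, a listing with identifiers 1, 2, and 3, where identifier 2 carries the code "000", passes the check only if <160> reports 3.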
37 CFR 1.821 incorporates by reference the World Intellectual Property Organization (WIPO) Handbook on Industrial Property Information and Documentation, Standard ST.25 (1998), including Tables 1 through 6 of Appendix 2. Copies may be obtained from the World Intellectual Property Organization; 34 chemin des Colombettes; 1211 Geneva 20 Switzerland. Copies may also be inspected at the Office of the Federal Register, 800 North Capitol Street, NW, Suite 700, Washington, DC 20408. These tables are reproduced below. The 1998 version of WIPO ST.25 is available online at www.wipo.int/standards/en/archives.html. Note that the standard was revised in December 2009, and the current version is available online at www.wipo.int/export/sites/www/standards/en/pdf/03-25-01.pdf.
WIPO Standard ST.25 (1998), Appendix 2, Table 1, provides that the bases of a nucleotide sequence should be represented using the following one-letter symbols for nucleotide sequence characters:
Symbol | Meaning | Origin of designation |
---|---|---|
a | a | adenine |
g | g | guanine |
c | c | cytosine |
t | t | thymine |
u | u | uracil |
r | g or a | purine |
y | t/u or c | pyrimidine |
m | a or c | amino |
k | g or t/u | keto |
s | g or c | strong interactions 3H-bonds |
w | a or t/u | weak interactions 2H-bonds |
b | g or c or t/u | not a |
d | a or g or t/u | not c |
h | a or c or t/u | not g |
v | a or g or c | not t, not u |
n | a or g or c or t/u, unknown, or other | any |
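As an illustration only, the Table 1 symbols can be held in a small Python mapping and used to flag characters that fall outside the table. The names `TABLE_1` and `invalid_symbols` are hypothetical. The sketch treats symbols case-sensitively because the table lists lowercase letters, and it simplifies "n" to the five bases even though the table also allows "unknown, or other"; both choices are assumptions.

```python
# Table 1 symbols mapped to the bases each may stand for.
# "t/u" entries are split into "t" and "u". The "n" entry is a simplification:
# Table 1 also allows "unknown, or other" for that symbol.
TABLE_1 = {
    "a": "a", "g": "g", "c": "c", "t": "t", "u": "u",
    "r": "ga", "y": "tuc", "m": "ac", "k": "gtu",
    "s": "gc", "w": "atu", "b": "gctu", "d": "agtu",
    "h": "actu", "v": "agc", "n": "agctu",
}


def invalid_symbols(sequence):
    """Return the sorted characters of `sequence` that are not Table 1 symbols."""
    return sorted(set(sequence) - set(TABLE_1))
```

Under these assumptions, `invalid_symbols("atgcnnry")` returns an empty list, while `invalid_symbols("atgx")` returns `["x"]`.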
WIPO Standard ST.25 (1998), Appendix 2, Table 2, provides that modified bases may be represented as the corresponding unmodified bases in the sequence itself, if the modification is further described in numeric identifier <223> of the Feature section of the sequence listing. The symbols from the list below may be used in the description (i.e., the specification and drawing, or in the Feature section of the sequence listing) but these symbols may not be used in the sequence itself. Modifications not listed in Table 2 may also be represented as the corresponding unmodified base in the sequence itself, and the modification should be described using its full chemical name in the Feature section of the sequence listing.
Symbol | Meaning |
---|---|
ac4c | 4-acetylcytidine |
chm5u | 5-(carboxyhydroxymethyl)uridine |
cm | 2'-O-methylcytidine |
cmnm5s2u | 5-carboxymethylaminomethyl-2-thiouridine |
cmnm5u | 5-carboxymethylaminomethyluridine |
d | dihydrouridine |
fm | 2'-O-methylpseudouridine |
gal q | beta, D-galactosylqueuosine |
gm | 2'-O-methylguanosine |
i | inosine |
i6a | N6-isopentenyladenosine |
m1a | 1-methyladenosine |
m1f | 1-methylpseudouridine |
m1g | 1-methylguanosine |
m1i | 1-methylinosine |
m22g | 2,2-dimethylguanosine |
m2a | 2-methyladenosine |
m2g | 2-methylguanosine |
m3c | 3-methylcytidine |
m5c | 5-methylcytidine |
m6a | N6-methyladenosine |
m7g | 7-methylguanosine |
mam5u | 5-methylaminomethyluridine |
mam5s2u | 5-methoxyaminomethyl-2-thiouridine |
man q | beta, D-mannosylqueuosine |
mcm5s2u | 5-methoxycarbonylmethyl-2-thiouridine |
mcm5u | 5-methoxycarbonylmethyluridine |
mo5u | 5-methoxyuridine |
ms2i6a | 2-methylthio-N6-isopentenyladenosine |
ms2t6a | N-((9-beta-D-ribofuranosyl-2-methylthiopurine-6-yl)carbamoyl)threonine |
mt6a | N-((9-beta-D-ribofuranosylpurine-6-yl) N-methylcarbamoyl)threonine |
mv | uridine-5-oxyacetic acid-methylester |
o5u | uridine-5-oxyacetic acid |
osyw | wybutoxosine |
p | pseudouridine |
q | queuosine |
s2c | 2-thiocytidine |
s2t | 5-methyl-2-thiouridine |
s2u | 2-thiouridine |
s4u | 4-thiouridine |
t | 5-methyluridine |
t6a | N-((9-beta-D-ribofuranosylpurine-6-yl)-carbamoyl)threonine |
tm | 2'-O-methyl-5-methyluridine |
um | 2'-O-methyluridine |
yw | wybutosine |
x | 3-(3-amino-3-carboxy-propyl)uridine, (acp3)u |
WIPO Standard ST.25 (1998), Appendix 2, Table 3, provides that the amino acids should be represented using the following three-letter symbols with the first letter as a capital.
Symbol | Meaning |
---|---|
Ala | Alanine |
Cys | Cysteine |
Asp | Aspartic Acid |
Glu | Glutamic Acid |
Phe | Phenylalanine |
Gly | Glycine |
His | Histidine |
Ile | Isoleucine |
Lys | Lysine |
Leu | Leucine |
Met | Methionine |
Asn | Asparagine |
Pro | Proline |
Gln | Glutamine |
Arg | Arginine |
Ser | Serine |
Thr | Threonine |
Val | Valine |
Trp | Tryptophan |
Tyr | Tyrosine |
Asx | Asp or Asn |
Glx | Glu or Gln |
Xaa | unknown or other |
WIPO Standard ST.25 (1998), Appendix 2, Table 4, provides that modified and unusual amino acids may be represented as the corresponding unmodified amino acids in the sequence itself if the modification is further described in numeric identifier <223> of the Feature section of the sequence listing. The symbols from the list below may be used in the description (i.e., the specification and drawings, or in the Feature section of the sequence listing) but these symbols may not be used in the sequence itself. Modifications not listed in Table 4 may also be represented as the corresponding unmodified amino acid in the sequence itself, and the modification should be described using its full chemical name in the Feature section of the sequence listing.
Symbol | Meaning |
---|---|
Aad | 2-Aminoadipic acid |
bAad | 3-Aminoadipic acid |
bAla | beta-Alanine, beta-Aminopropionic acid |
Abu | 2-Aminobutyric acid |
4Abu | 4-Aminobutyric acid, piperidinic acid |
Acp | 6-Aminocaproic acid |
Ahe | 2-Aminoheptanoic acid |
Aib | 2-Aminoisobutyric acid |
bAib | 3-Aminoisobutyric acid |
Apm | 2-Aminopimelic acid |
Dbu | 2,4-Diaminobutyric acid |
Des | Desmosine |
Dpm | 2,2'-Diaminopimelic acid |
Dpr | 2,3-Diaminopropionic acid |
EtGly | N-Ethylglycine |
EtAsn | N-Ethylasparagine |
Hyl | Hydroxylysine |
aHyl | allo-Hydroxylysine |
3Hyp | 3-Hydroxyproline |
4Hyp | 4-Hydroxyproline |
Ide | Isodesmosine |
aIle | allo-Isoleucine |
MeGly | N-Methylglycine, sarcosine |
MeIle | N-Methylisoleucine |
MeLys | 6-N-Methyllysine |
MeVal | N-Methylvaline |
Nva | Norvaline |
Nle | Norleucine |
Orn | Ornithine |
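The interplay between Tables 3 and 4 (Table 4 symbols may appear in the description or Feature section but not in the sequence itself) can be sketched as a simple check. This is a minimal Python illustration, not an official validator; the names `TABLE_3` and `invalid_residues` are hypothetical, and the whitespace-separated input format is an assumption.

```python
# Three-letter symbols permitted in the sequence itself (Table 3).
TABLE_3 = frozenset(
    "Ala Cys Asp Glu Phe Gly His Ile Lys Leu Met Asn Pro Gln "
    "Arg Ser Thr Val Trp Tyr Asx Glx Xaa".split()
)


def invalid_residues(sequence):
    """Return symbols not found in Table 3, in order of first appearance.

    `sequence` is assumed to be a whitespace-separated string of
    three-letter symbols with the first letter capitalized.
    """
    flagged = []
    for symbol in sequence.split():
        if symbol not in TABLE_3 and symbol not in flagged:
            flagged.append(symbol)
    return flagged
```

A Table 4 symbol such as `Hyl` is flagged by this check, which mirrors the requirement that such a residue be shown as the corresponding unmodified amino acid and described in the Feature section.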
WIPO Standard ST.25 (1998), Appendix 2, Table 5, provides for feature keys related to DNA sequences.
Key | Description |
---|---|
allele | a related individual or strain contains stable, alternative forms of the same gene which differ from the presented sequence at this location (and perhaps others) |
attenuator | (1) region of DNA at which regulation of termination of transcription occurs, which controls the expression of some bacterial operons; (2) sequence segment located between the promoter and the first structural gene that causes partial termination of transcription |
C_region | constant region of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains; includes one or more exons depending on the particular chain |
CAAT_signal | CAAT box; part of a conserved sequence located about 75 bp upstream of the start point of eukaryotic transcription units which may be involved in RNA polymerase binding; consensus=GG (C or T) CAATCT |
CDS | coding sequence; sequence of nucleotides that corresponds with the sequence of amino acids in a protein (location includes stop codon); feature includes amino acid conceptual translation |
conflict | independent determinations of the "same" sequence differ at this site or region |
D-loop | displacement loop; a region within mitochondrial DNA in which a short stretch of RNA is paired with one strand of DNA, displacing the original partner DNA strand in this region; also used to describe the displacement of a region of one strand of duplex DNA by a single stranded invader in the reaction catalyzed by RecA protein |
D-segment | diversity segment of immunoglobulin heavy chain, and T-cell receptor beta chain |
enhancer | a cis-acting sequence that increases the utilization of (some) eukaryotic promoters, and can function in either orientation and in any location (upstream or downstream) relative to the promoter |
exon | region of genome that codes for portion of spliced mRNA; may contain 5'UTR, all CDSs, and 3'UTR |
GC_signal | GC box; a conserved GC-rich region located upstream of the start point of eukaryotic transcription units which may occur in multiple copies or in either orientation; consensus=GGGCGG |
gene | region of biological interest identified as a gene and for which a name has been assigned |
iDNA | intervening DNA; DNA which is eliminated through any of several kinds of recombination |
intron | a segment of DNA that is transcribed, but removed from within the transcript by splicing together the sequences (exons) on either side of it |
J_segment | joining segment of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains |
LTR | long terminal repeat, a sequence directly repeated at both ends of a defined sequence, of the sort typically found in retroviruses |
mat_peptide | mature peptide or protein coding sequence; coding sequence for the mature or final peptide or protein product following post-translational modification; the location does not include the stop codon (unlike the corresponding CDS) |
misc_binding | site in nucleic acid which covalently or non-covalently binds another moiety that cannot be described by any other Binding key (primer_bind or protein_bind) |
misc_difference | feature sequence is different from that presented in the entry and cannot be described by any other Difference key (conflict, unsure, old_sequence, mutation, variation, allele, or modified_base) |
misc_feature | region of biological interest which cannot be described by any other feature key; a new or rare feature |
misc_recomb | site of any generalized, site-specific or replicative recombination event where there is a breakage and reunion of duplex DNA that cannot be described by other recombination keys (iDNA and virion) or qualifiers of source key (/insertion_seq, /transposon, /proviral) |
misc_RNA | any transcript or RNA product that cannot be defined by other RNA keys (prim_transcript, precursor_RNA, mRNA, 5'clip, 3'clip, 5'UTR, 3'UTR, exon, CDS, sig_peptide, transit_peptide, mat_peptide, intron, polyA_site, rRNA, tRNA, scRNA, and snRNA) |
misc_signal | any region containing a signal controlling or altering gene function or expression that cannot be described by other Signal keys (promoter, CAAT_signal, TATA_signal, -35_signal, -10_signal, GC_signal, RBS, polyA_signal, enhancer, attenuator, terminator, and rep_origin) |
misc_structure | any secondary or tertiary structure or conformation that cannot be described by other Structure keys (stem_loop and D-loop) |
modified_base | the indicated nucleotide is a modified nucleotide and should be substituted for by the indicated molecule (given in the mod_base qualifier value) |
mRNA | messenger RNA; includes 5' untranslated region (5'UTR), coding sequences (CDS, exon) and 3' untranslated region (3'UTR) |
mutation | a related strain has an abrupt, inheritable change in the sequence at this location |
N_region | extra nucleotides inserted between rearranged immunoglobulin segments |
old_sequence | the presented sequence revises a previous version of the sequence at this location |
polyA_signal | recognition region necessary for endonuclease cleavage of an RNA transcript that is followed by polyadenylation; consensus=AATAAA |
polyA_site | site on an RNA transcript to which adenine residues will be added by post-transcriptional polyadenylation |
precursor_RNA | any RNA species that is not yet the mature RNA product; may include 5' clipped region (5'clip), 5' untranslated region (5'UTR), coding sequences (CDS, exon), intervening sequences (intron), 3' untranslated region (3'UTR), and 3' clipped region (3'clip) |
prim_transcript | primary (initial, unprocessed) transcript; includes 5' clipped region (5'clip), 5' untranslated region (5'UTR), coding sequences (CDS, exon), intervening sequences (intron), 3' untranslated region (3'UTR), and 3' clipped region (3'clip) |
primer_bind | non-covalent primer binding site for initiation of replication, transcription, or reverse transcription; includes site(s) for synthetic, for example, PCR primer elements |
promoter | region on a DNA molecule involved in RNA polymerase binding to initiate transcription |
protein_bind | non-covalent protein binding site on nucleic acid |
RBS | ribosome binding site |
repeat_region | region of genome containing repeating units |
repeat_unit | single repeat element |
rep_origin | origin of replication; starting site for duplication of nucleic acid to give two identical copies |
rRNA | mature ribosomal RNA; the RNA component of the ribonucleoprotein particle (ribosome) which assembles amino acids into proteins |
S_region | switch region of immunoglobulin heavy chains; involved in the rearrangement of heavy chain DNA leading to the expression of a different immunoglobulin class from the same B-cell |
satellite | many tandem repeats (identical or related) of a short basic repeating unit; many have a base composition or other property different from the genome average that allows them to be separated from the bulk (main band) genomic DNA |
scRNA | small cytoplasmic RNA; any one of several small cytoplasmic RNA molecules present in the cytoplasm and (sometimes) nucleus of a eukaryote |
sig_peptide | signal peptide coding sequence; coding sequence for an N-terminal domain of a secreted protein; this domain is involved in attaching nascent polypeptide to the membrane; leader sequence |
snRNA | small nuclear RNA; any one of many small RNA species confined to the nucleus; several of the snRNAs are involved in splicing or other RNA processing reactions |
source | identifies the biological source of the specified span of the sequence; this key is mandatory; every entry will have, as a minimum, a single source key spanning the entire sequence; more than one source key per sequence is permissible |
stem_loop | hairpin; a double-helical region formed by base-pairing between adjacent (inverted) complementary sequences in a single strand of RNA or DNA |
STS | Sequence Tagged Site; short, single-copy DNA sequence that characterizes a mapping landmark on the genome and can be detected by PCR; a region of the genome can be mapped by determining the order of a series of STSs |
TATA_signal | TATA box; Goldberg-Hogness box; a conserved AT-rich septamer found about 25 bp before the start point of each eukaryotic RNA polymerase II transcript unit which may be involved in positioning the enzyme for correct initiation; consensus=TATA(A or T)A(A or T) |
terminator | sequence of DNA located either at the end of the transcript or adjacent to a promoter region that causes RNA polymerase to terminate transcription; may also be site of binding of repressor protein |
transit_peptide | transit peptide coding sequence; coding sequence for an N-terminal domain of a nuclear-encoded organellar protein; this domain is involved in post-translational import of the protein into the organelle |
tRNA | mature transfer RNA, a small RNA molecule (75-85 bases long) that mediates the translation of a nucleic acid sequence into an amino acid sequence |
unsure | author is unsure of exact sequence in this region |
V_region | variable region of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains; codes for the variable amino terminal portion; can be made up from V_segments, D_segments, N_regions, and J_segments |
V_segment | variable segment of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains; codes for most of the variable region (V_region) and the last few amino acids of the leader peptide |
variation | a related strain contains stable mutations from the same gene (for example, RFLPs, polymorphisms, etc.) which differ from the presented sequence at this location (and possibly others) |
3'clip | 3'-most region of a precursor transcript that is clipped off during processing |
3'UTR | region at the 3' end of a mature transcript (following the stop codon) that is not translated into a protein |
5'clip | 5'-most region of a precursor transcript that is clipped off during processing |
5'UTR | region at the 5' end of a mature transcript (preceding the initiation codon) that is not translated into a protein |
-10_signal | Pribnow box; a conserved region about 10 bp upstream of the start point of bacterial transcription units which may be involved in binding RNA polymerase; consensus=TAtAaT |
-35_signal | a conserved hexamer about 35 bp upstream of the start point of bacterial transcription units; consensus=TTGACa [ ] or TGTTGACA [ ] |
WIPO Standard ST.25 (1998), Appendix 2, Table 6 provides for feature keys related to protein sequences.
Key | Description |
---|---|
CONFLICT | different papers report differing sequences |
VARIANT | authors report that sequence variants exist |
VARSPLIC | description of sequence variants produced by alternative splicing |
MUTAGEN | site which has been experimentally altered |
MOD_RES | post-translational modification of a residue |
ACETYLATION | N-terminal or other |
AMIDATION | generally at the C-terminal of a mature active peptide |
BLOCKED | undetermined N- or C-terminal blocking group |
FORMYLATION | of the N-terminal methionine |
GAMMA-CARBOXYGLUTAMIC ACID | |
HYDROXYLATION | of asparagine, aspartic acid, proline or lysine |
METHYLATION | generally of lysine or arginine |
PHOSPHORYLATION | of serine, threonine, tyrosine, aspartic acid or histidine |
PYRROLIDONE CARBOXYLIC ACID | N-terminal glutamate which has formed an internal cyclic lactam |
SULFATATION | generally of tyrosine |
LIPID | covalent binding of a lipidic moiety |
MYRISTATE | myristate group attached through an amide bond to the N-terminal glycine residue of the mature form of a protein or to an internal lysine residue |
PALMITATE | palmitate group attached through a thioether bond to a cysteine residue or through an ester bond to a serine or threonine residue |
FARNESYL | farnesyl group attached through a thioether bond to a cysteine residue |
GERANYL-GERANYL | geranyl-geranyl group attached through a thioether bond to a cysteine residue |
GPI-ANCHOR | glycosyl-phosphatidylinositol (GPI) group linked to the alpha-carboxyl group of the C-terminal residue of the mature form of a protein |
N-ACYL DIGLYCERIDE | N-terminal cysteine of the mature form of a prokaryotic lipoprotein with an amide-linked fatty acid and a glyceryl group to which two fatty acids are linked by ester linkages |
DISULFID | disulfide bond; the ‘FROM’ and ‘TO’ endpoints represent the two residues which are linked by an intra-chain disulfide bond; if the ‘FROM’ and ‘TO’ endpoints are identical, the disulfide bond is an interchain one and the description field indicates the nature of the cross-link |
THIOLEST | thiolester bond; the ‘FROM’ and ‘TO’ endpoints represent the two residues which are linked by the thiolester bond |
THIOETH | thioether bond; the ‘FROM’ and ‘TO’ endpoints represent the two residues which are linked by the thioether bond |
CARBOHYD | glycosylation site; the nature of the carbohydrate (if known) is given in the description field |
METAL | binding site for a metal ion; the description field indicates the nature of the metal |
BINDING | binding site for any chemical group (co-enzyme, prosthetic group, etc.); the chemical nature of the group is given in the description field |
SIGNAL | extent of a signal sequence (prepeptide) |
TRANSIT | extent of a transit peptide (mitochondrial, chloroplastic, or for a microbody) |
PROPEP | extent of a propeptide |
CHAIN | extent of a polypeptide chain in the mature protein |
PEPTIDE | extent of a released active peptide |
DOMAIN | extent of a domain of interest on the sequence; the nature of that domain is given in the description field |
CA_BIND | extent of a calcium-binding region |
DNA_BIND | extent of a DNA-binding region |
NP_BIND | extent of a nucleotide phosphate binding region; the nature of the nucleotide phosphate is indicated in the description field |
TRANSMEM | extent of a transmembrane region |
ZN_FING | extent of a zinc finger region |
SIMILAR | extent of a similarity with another protein sequence; precise information, relative to that sequence is given in the description field |
REPEAT | extent of an internal sequence repetition |
HELIX | secondary structure: Helices, for example, Alpha-helix, 3(10) helix, or Pi-helix |
STRAND | secondary structure: Beta-strand, for example, Hydrogen bonded beta-strand, or Residue in an isolated beta-bridge |
TURN | secondary structure: Turns, for example, H-bonded turn (3-turn, 4-turn, or 5-turn) |
ACT_SITE | amino acid(s) involved in the activity of an enzyme |
SITE | any other interesting site on the sequence |
INIT_MET | the sequence is known to start with an initiator methionine |
NON_TER | the residue at an extremity of the sequence is not the terminal residue; if applied to position 1, this signifies that the first position is not the N-terminus of the complete molecule; if applied to the last position, it signifies that this position is not the C-terminus of the complete molecule; there is no description field for this key |
NON_CONS | non consecutive residues; indicates that two residues in a sequence are not consecutive and that there are a number of unsequenced residues between them |
UNSURE | uncertainties in the sequence; used to describe region(s) of a sequence for which the authors are unsure about the sequence assignment |
The requirements of 37 CFR 1.821 through 37 CFR 1.825 are the result of an effort to harmonize the USPTO requirements with international sequence listing requirements to the extent possible. The requirements of 37 CFR 1.821 through 37 CFR 1.825 substantially correspond to the requirements of WIPO Standard ST.25. PatentIn Version 3.5.1 software (see MPEP § 2430) generates sequence listings that meet all of the requirements of WIPO Standard ST.25. The requirements of 37 CFR 1.821 through 37 CFR 1.825, however, are less stringent than the requirements of WIPO Standard ST.25. Thus, applicants who wish to file in countries which adhere to WIPO Standard ST.25 should consider the following when not using PatentIn Version 3.5.1:
- (A) The data in numeric identifier <221> must use selections from Tables 5 and 6 of WIPO Standard ST.25 (2009) to comply with that standard. The terms from these Tables are considered language neutral vocabulary;
- (B) Where the sequence listing forming part of the international application contains free text, e.g., free text in numeric identifier <223>, any such free text shall be repeated in the main part of the description in the language thereof. It is recommended that the free text in the language of the main part of the description be put in a specific section of the description called "Sequence Listing Free Text";
- (C) A sequence listing filed after the international filing date is generally not considered to be part of the disclosure and usually will not be published as part of the international application publication (see PCT Article 34 and PCT Rules 26 and 91 for exceptions);
- (D) Paragraphs 4(v) and 4bis(iv) of WIPO Standard ST.25 (2009) require the specific wording "the information recorded in electronic form furnished under PCT Rule 13ter is identical to the sequence listing"; and
- (E) WIPO Standard ST.25 (2009), paragraph 24, requires a blank line between numeric identifiers in the sequence listing when the digit in the first or second position of the numeric identifier changes.
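The blank-line requirement in item (E) can be illustrated with a minimal Python sketch. The function name `insert_blank_lines` is hypothetical, and the sketch assumes that its input is a list of listing lines containing no blank lines yet, each numeric identifier appearing at the start of a line in the form `<NNN>`.

```python
import re

# A numeric identifier is assumed to open a line as three digits in angle brackets.
_IDENTIFIER = re.compile(r"^<(\d{3})>")


def insert_blank_lines(lines):
    """Add a blank line wherever the first or second digit of the
    numeric identifier differs from that of the preceding identifier."""
    output = []
    previous_prefix = None
    for line in lines:
        match = _IDENTIFIER.match(line)
        if match:
            prefix = match.group(1)[:2]  # first and second digit only
            if previous_prefix is not None and prefix != previous_prefix:
                output.append("")
            previous_prefix = prefix
        # Lines without an identifier (continuations) pass through unchanged.
        output.append(line)
    return output
```

With this sketch, no blank line is placed between <140> and <141>, because only the third digit changes, while one is placed between <141> and <150>.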
Requirements related to the submission of sequence listings may also differ between filing in the United States and filing internationally. For example, where an international application is filed in paper, the sequence listing part of the international application must also be provided in paper, although the search copy must be filed in electronic form, e.g., on a CD or, in the RO/US, as an ASCII text file via EFS-Web. Also, any tables filed in an international application must be an integral part of the application, i.e., cannot be submitted as a separate file in text format.
2422.01 Nucleotide and/or Amino Acids Disclosures Requiring a Sequence Listing [R-10.2019]
37 CFR 1.821(a) presents a definition for "nucleotide and/or amino acid sequences." This definition sets forth limits, in terms of numbers of amino acids and/or numbers of nucleotides, at or above which compliance with the sequence rules is required. Nucleotide and/or amino acid sequences as used in 37 CFR 1.821 through 37 CFR 1.825 are interpreted to mean an unbranched sequence of four or more amino acids or an unbranched sequence of ten or more nucleotides. Branched sequences are specifically excluded from this definition. Sequences with fewer than four specifically defined nucleotides or amino acids are specifically excluded from this section. "Specifically defined" means those amino acids other than "Xaa" and those nucleotide bases other than "n" defined in accordance with the World Intellectual Property Organization (WIPO) Handbook on Industrial Property Information and Documentation, Standard ST.25: Standard for the Presentation of Nucleotide and Amino Acid Sequence Listings in Patent Applications (1998), including Tables 1 through 6 in Appendix 2 (see MPEP § 2422).
The limit of four or more amino acids was established for consistency with limits in place for industry database collections whereas the limit of ten or more nucleotides, while lower than certain industry database limits, was established to encompass those nucleotide sequences to which the smallest probe will bind in a stable manner.
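By way of illustration only, the length thresholds of 37 CFR 1.821(a) can be sketched as a simple check in Python. This is a simplified sketch under stated assumptions, not a statement of how the Office applies the rule: the function name `requires_listing` and its parameters are hypothetical, sequences are assumed to be supplied as lists of residue symbols (three-letter codes for amino acids), and the further limits of 37 CFR 1.821(a)(1) and (a)(2), such as the treatment of D-amino acids, are not modeled.

```python
def requires_listing(residues, kind, branched=False):
    """Return True if a sequence meets the simplified 37 CFR 1.821(a) thresholds.

    residues: list of residue symbols, e.g. ["a", "c", "g", "t"] or ["Ala", "Xaa"]
    kind:     "nucleotide" or "amino_acid"
    branched: True for a branched sequence (excluded from the definition)
    """
    if branched:
        return False
    if kind == "nucleotide":
        min_length, undefined = 10, {"n"}
    elif kind == "amino_acid":
        min_length, undefined = 4, {"xaa"}
    else:
        raise ValueError("kind must be 'nucleotide' or 'amino_acid'")
    # Overall length threshold: ten or more nucleotides, four or more amino acids.
    if len(residues) < min_length:
        return False
    # Fewer than four specifically defined residues excludes the sequence.
    defined = sum(1 for symbol in residues if symbol.lower() not in undefined)
    return defined >= 4
```

For example, under these assumptions a ten-nucleotide sequence consisting of seven "n" symbols and three defined bases returns False, because it has fewer than four specifically defined nucleotides.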
37 CFR 1.821(a)(1) and 37 CFR 1.821(a)(2) present further definitions for those nucleotide and amino acid sequences that are intended to be embraced by the sequence rules. Situations in which the applicability of the rules is in issue will be resolved on a case-by-case basis.
Nucleotide sequences are further limited to those that can be represented by the symbols set forth in 37 CFR 1.822(b), which incorporates by reference WIPO Standard ST.25 (1998), Appendix 2, Table 1 (see MPEP § 2422). The presence of other than typical 5' to 3' phosphodiester linkages in a nucleotide sequence does not render the rules inapplicable. The Office does not want to exclude linkages of the type commonly found in naturally occurring nucleotides, e.g., eukaryotic end capped sequences.
Amino acid sequences are further limited to those listed in 37 CFR 1.822(b), which incorporates by reference WIPO Standard ST.25 (1998), Appendix 2, Table 3 (see MPEP § 2422), and those L-amino acids that are commonly found in naturally occurring proteins. The presence of one or more D-amino acids in a sequence will exclude that sequence from the scope of the rules. Voluntary compliance is, however, encouraged in these situations; the symbol "Xaa" can be used to represent D-amino acids. The sequence rules embrace "[a]ny peptide or protein that can be expressed as a sequence using the symbols in WIPO Standard ST.25 (1998), Appendix 2, Table 3 in conjunction with a description in the Feature section to describe, for example, modified linkages, cross links and end caps, non-peptidyl bonds, etc." 37 CFR 1.821(a)(2).
With regard to amino acid sequences, the use of the terms "peptide or protein" implies, however, that the amino acids in a given sequence are linked by at least three consecutive peptide bonds. Accordingly, an amino acid sequence is not excluded from the scope of the rules merely due to the presence of a single non-peptidyl bond. If an amino acid sequence can be represented by a string of amino acid abbreviations, with reference, where necessary, to a features table to explain modifications in the sequence, the sequence comes within the scope of the rules. However, the rules are not intended to encompass the subject matter that is generally referred to as synthetic resins.
The requirement for compliance in 37 CFR 1.821(c) is directed to "disclosures of nucleotide and/or amino acid sequences." (Emphasis added.) All sequence information, whether claimed or not, that meets the length thresholds in 37 CFR 1.821(a) is subject to the rules. The goal of the Office is to build a comprehensive database that can be used for, inter alia, assessing the prior art. It is therefore essential that all sequence information, whether only disclosed or also claimed, be included in the database. In those instances in which prior art sequences are only referred to in a given application by name and a publication or accession reference, they need not be included as part of the sequence listing, unless the referred-to sequence is "essential material" per MPEP § 608.01(p). However, if the applicant presents the sequence as a string of particular nucleotide bases or amino acids, it is necessary to include the sequence in the sequence listing regardless of whether the applicant considers the sequence to be prior art. In general, any sequence that is disclosed and/or claimed as a sequence, i.e., as a string of particular nucleotide bases or amino acids, and that otherwise meets the criteria of 37 CFR 1.821(a), must be set forth in the sequence listing.
It is generally acceptable to present a single, primary sequence in the specification and sequence listing by enumeration of its residues in accordance with the sequence rules ("primary sequence") and to discuss and/or claim variants of that primary sequence without presenting each variant as a separate sequence in the sequence listing. However, the primary sequence should be annotated in the sequence listing to reflect such variants. By way of example only, the following types of sequence disclosures would be treated as noted herein by the Office. With respect to a primary sequence and "conservatively modified variants thereof," the sequences may be described as SEQ ID NO:X (the primary sequence) and "conservatively modified variants thereof," if desired. With respect to a sequence that "may be deleted at the C-terminus by 1, 2, 3, 4, or 5 residues," all of the implied variations do not need to be included in the sequence listing. In this latter example, only the sequence without deletions needs to be included in the sequence listing; however, applicant is encouraged to annotate the sequence to indicate that deletions have been made at the C-terminus by 1, 2, 3, 4, or 5 residues.
The Office's database will only contain the unmodified sequence. It is strongly recommended that any sequences appearing in the claims, or sequences that are considered essential to understanding the invention, be included in the sequence listing as a separate sequence.
37 CFR 1.821(c) requires that each disclosed nucleic acid or amino acid sequence in the application appear separately in the sequence listing, with each sequence further being assigned a sequence identifier, referred to as "SEQ ID NO." The sequence identifiers must begin with 1 and increase sequentially by integers. The requirement for sequence identifiers, at a minimum, requires that each sequence be assigned a different number for purposes of identification. However, where practical and for ease of reference, sequences should be presented in the sequence listing in numerical order and in the order in which they are discussed in the application.
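By way of illustration only, the numbering requirement for sequence identifiers (begin with 1 and increase sequentially by integers) can be sketched in Python. The function name `check_seq_ids` is hypothetical, and the input is assumed to be the list of SEQ ID NO values in the order in which they appear in the sequence listing.

```python
def check_seq_ids(seq_ids):
    """Return a list of human-readable problems; an empty list means the
    identifiers begin with 1 and increase by exactly one each time."""
    if not seq_ids:
        return ["no sequence identifiers present"]
    problems = []
    if seq_ids[0] != 1:
        problems.append(f"first identifier is {seq_ids[0]}, expected 1")
    # Each identifier must be exactly one greater than its predecessor,
    # which also guarantees that no number is used twice.
    for previous, current in zip(seq_ids, seq_ids[1:]):
        if current != previous + 1:
            problems.append(
                f"SEQ ID NO:{current} follows SEQ ID NO:{previous}, "
                f"expected SEQ ID NO:{previous + 1}"
            )
    return problems
```

A listing numbered 1, 2, 3 yields no problems under this sketch, while a listing numbered 1, 3 or 1, 2, 2 yields one problem each.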
37 CFR 1.821(d) requires that where the description or claims of a patent application discuss a sequence that is set forth in the sequence listing, a reference to the sequence identifier of that sequence is required at all occurrences, even if in the text of the description or claims that sequence is set forth by enumeration of its residues. This requirement is also intended to permit references elsewhere in the application (e.g., specification, claims, or drawings) to sequences set forth in the sequence listing by the use of assigned sequence identifiers without repeating the sequence. Sequence identifiers can also be used to discuss and/or claim parts or fragments of a properly presented sequence. For example, language such as "residues 14 to 243 of SEQ ID NO:23" is permissible and the fragment need not be separately presented in the sequence listing. Where a sequence that meets the length thresholds of 37 CFR 1.821(a) is disclosed by enumeration of its residues anywhere in an application, it must be presented in a sequence listing in a manner that complies with the requirements of the sequence rules.
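By way of illustration only, fragment references of the kind quoted above ("residues 14 to 243 of SEQ ID NO:23") can be checked against a sequence listing with a short Python sketch. The function name `check_fragment_refs`, the exact phrasing matched by the regular expression, and the `lengths` mapping (SEQ ID NO to sequence length) are all assumptions made for the example; applications may word such references differently.

```python
import re

# Matches phrases such as "residues 14 to 243 of SEQ ID NO:23".
FRAGMENT = re.compile(r"residues\s+(\d+)\s+to\s+(\d+)\s+of\s+SEQ ID NO:\s*(\d+)")


def check_fragment_refs(text, lengths):
    """Return problems found in fragment references within text.

    lengths maps each SEQ ID NO in the sequence listing to its length.
    """
    problems = []
    for match in FRAGMENT.finditer(text):
        start, end, seq_id = (int(group) for group in match.groups())
        if seq_id not in lengths:
            problems.append(f"SEQ ID NO:{seq_id} is not in the sequence listing")
        elif not 1 <= start <= end <= lengths[seq_id]:
            problems.append(
                f"residues {start} to {end} fall outside SEQ ID NO:{seq_id} "
                f"(length {lengths[seq_id]})"
            )
    return problems
```

A check of this kind only confirms that the referenced sequence is present and long enough; the fragment itself need not be separately presented in the sequence listing.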
The rules do not alter, in any way, the requirements of 35 U.S.C. 112. The implementation of the rules has had no effect on disclosure and/or claiming requirements. The rules, in general, or the use of sequence identifiers throughout the specification and claims, specifically, should not raise any issues under 35 U.S.C. 112(a) or 35 U.S.C. 112(b). The use of sequence identifiers (SEQ ID NO:X) only provides a shorthand way for applicants to discuss and claim their inventions. These identification numbers do not in any way restrict the manner in which an invention can be claimed.
2422.02 The Requirement for Exclusive Conformance; Sequences Presented in Drawing Figures [R-07.2015]
For all applications that disclose nucleic acid and/or amino acid sequences that fall within the definition set forth in 37 CFR 1.821(a), 37 CFR 1.821(b) requires exclusive conformance to the requirements of 37 CFR 1.821 through 37 CFR 1.825 with regard to the manner in which the disclosed nucleic acid and/or amino acid sequences are presented and described. This requirement is necessary to minimize any confusion that could result if more than one format for representing sequence data was employed in a given application.
Pursuant to 37 CFR 1.83(a), sequences that are included in sequence listings should not be duplicated in the drawings. However, many significant sequence characteristics may only be demonstrated by a figure. This is especially true in view of the fact that the representation of double stranded nucleotides is not permitted in the sequence listing and many significant nucleotide features, such as "sticky ends" and the like, may only be shown effectively by reference to a drawing figure. Further, the similarity or homology between/among sequences may only be depicted in an effective manner in a drawing figure. Similarly, drawing figures are recommended for use with amino acid sequences to depict structural features of the corresponding protein, such as finger regions and Kringle regions. The situations discussed herein are given by way of example only and there may be many other reasons for including a sequence in a drawing. However, when a sequence is presented in a drawing, the sequence must still be included in the sequence listing if the sequence falls within the definition set forth in 37 CFR 1.821(a), and the sequence identifier ("SEQ ID NO:X") must be used, either in the drawing or in the Brief Description of the Drawings.
2422.03 Sequence Listing Submission [R-10.2019]
37 CFR 1.821(c) requires that applications containing disclosures of nucleotide and/or amino acid sequences that fall within the definitions of 37 CFR 1.821(a) contain, as a separate part of the disclosure, a disclosure of the nucleotide and/or amino acid sequences, and associated information, using the format and symbols that are set forth in 37 CFR 1.822 and 37 CFR 1.823. This separate part of the disclosure is referred to as the sequence listing. The sequence listing required pursuant to 37 CFR 1.821(c) may be submitted as an ASCII text file via EFS-Web, on compact disc, as a PDF submitted via EFS-Web, or on paper. The sequence listing required by 37 CFR 1.821(c) is the official copy of the sequence listing. If submitted on paper, the sequence listing is a separate part of the disclosure which must begin on a new page within the specification. A plurality of sequences may, if feasible, be presented on a single page; the separate presentation of both nucleotide and amino acid sequences on the same page is also permitted. Note that 37 CFR 1.821(e) requires that a copy of the sequence listing referred to in 37 CFR 1.821(c) must also be submitted in computer readable form (CRF) in accordance with the requirements of 37 CFR 1.824.
If the "Sequence Listing" required by 37 CFR 1.821(c) was submitted in ASCII text format in an international application, indicated on the Request as part of the international application, and published as part of the international application for which national stage is entered under 35 U.S.C. 371, then no further submission or incorporation by reference into the specification is required.
The Office strongly suggests filing the sequence listing required by 37 CFR 1.821(c) as a text file via EFS-Web. If a new application is filed via EFS-Web with an ASCII text file sequence listing that complies with the requirements of 37 CFR 1.824(a)(2)-(6) and (b), and applicant has not filed a sequence listing in a PDF file, the text file will serve as both the paper copy required by 37 CFR 1.821(c) and the computer readable form (CRF) required by 37 CFR 1.821(e). Note that the specification must contain a statement in a separate paragraph that incorporates by reference the material in the ASCII text file identifying the name of the ASCII text file, the date of creation, and the size of the ASCII text file in bytes. See MPEP § 2422.03(a) for additional information pertaining to EFS-Web submission of sequence listings.
If the official copy of the sequence listing as required by 37 CFR 1.821(c) is submitted on compact disc, the specification must contain an incorporation by reference of the material on the compact disc in a separate paragraph, identifying each compact disc by the names of the file(s) contained on each of the compact discs, their date of creation and their sizes in bytes (37 CFR 1.52(e)). The total number of compact discs including duplicates and the files on each compact disc shall be specified (37 CFR 1.77(b)(5)).
The compact disc used to submit the sequence listing may also contain table information if the table has more than 50 pages of text. See 37 CFR 1.823(a)(2) and 1.52(e)(1)(iii). The compact disc and duplicate copy must be labeled "Copy 1" and "Copy 2," respectively, and a statement stating that the copies are identical must be included. If the two compact discs are not identical, the Office will use the disc labeled "Copy 1" for further processing (37 CFR 1.52(e)(4)). See also MPEP § 608.05.
If the sequence listing under 37 CFR 1.821(c) is submitted on compact disc, applicant is still required to submit a separate CRF of the sequence listing pursuant to 37 CFR 1.821(e) and 37 CFR 1.824. If the CRF is also submitted on compact disc, applicants will need to submit a total of three copies of the sequence listing (two pursuant to 37 CFR 1.821(c), and one pursuant to 37 CFR 1.821(e)). The compact disc with the CRF of the sequence listing may be identical to the compact disc submitted under 37 CFR 1.821(c) if the latter compact disc includes only the sequence listing (i.e., no additional content, such as tables).
The sequence listing must be a single document, but the document may be split amongst two or more compact discs using software designed to divide a file that is too large to fit on a single compact disc into multiple concatenated files. If the user breaks up a sequence listing so that it may be submitted on multiple compact discs, the compact discs must be labeled to indicate their order (e.g., "1 of X", "2 of X").
One hundred (100) megabytes is the size limit for sequence listing text files submitted via EFS-Web. If a user wishes to submit an electronic copy of a sequence listing text file that exceeds 100 megabytes, the sequence listing must be filed on compact disc(s).
Effective for submissions filed on or after January 16, 2018, the Office set two new fees to manage handling of sequence listings of 300 MB or more in 37 CFR 1.21(o). Pricing for this fee is divided into two tiers with Tier 1 for file sizes 300 MB to 800 MB and Tier 2 for file sizes greater than 800 MB. The level of effort associated with the handling of mega-sequence listings is significant, because the Office’s systems require extra storage and special handling for files beyond 300 MB. The fee should encourage applicants to draft their specifications such that sequence data that is not essential material is not required to be included in a sequence listing. A reduced number of mega-sequence listings will benefit the Office and the public by reducing the strain on Office resources, thus facilitating the effective administration of the patent system.
The fee under 37 CFR 1.21(o) is due upon the first submission of a sequence listing that exceeds 800 MB, or the first submission of a sequence listing of at least 300 MB, whichever applicable fee is higher. As an example, if an application was filed prior to January 16, 2018 (with or without a text file sequence listing), and thereafter a mega-sequence listing that is between 300 and 800 MB is filed, the fee under 37 CFR 1.21(o)(1) is due. If an applicant thereafter files a corrected sequence listing that is also between 300 and 800 MB, no additional fee is due. If a further corrected sequence listing is filed and the file size exceeds 800 MB, then the total fee owed under 37 CFR 1.21(o) is the fee set forth in 37 CFR 1.21(o)(2). The fee is due upon submission of the mega-sequence listing. Subsequent deletion or reduction in size of a sequence listing does not change the requirement to pay the mega-sequence listing submission fee.
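By way of illustration only, the two-tier logic of 37 CFR 1.21(o) described above can be sketched in Python. This is a simplified sketch, not a fee calculator: the function names `tier_for_size` and `tier_owed` are hypothetical, no fee amounts are modeled, every size passed in is assumed to belong to a submission filed on or after January 16, 2018, and the caller is assumed to supply sizes in the same megabyte unit the Office uses.

```python
def tier_for_size(size_mb):
    """Return 2 for a listing over 800 MB, 1 for 300 MB up to 800 MB, else 0."""
    if size_mb > 800:
        return 2
    if size_mb >= 300:
        return 1
    return 0


def tier_owed(sizes_mb):
    """Return the highest tier reached by any submission in the application.

    Later, smaller submissions never lower the result, mirroring the statement
    that a subsequent reduction in size does not remove the fee requirement.
    """
    return max((tier_for_size(size) for size in sizes_mb), default=0)
```

For example, `tier_owed([350, 400])` gives Tier 1 for both the original and the corrected listing, and `tier_owed([350, 400, 900])` gives Tier 2 once the further corrected listing exceeds 800 MB.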
The fee under 37 CFR 1.21(o) does not apply to international applications, but does apply to the submission of mega-sequence listings received in national stage applications under 35 U.S.C. 371, including mega-sequence listings received by the Office pursuant to PCT Article 20. See MPEP § 2422.03(a), subsection IV, for additional information.
2422.03(a) Sequence Listings Submitted as ASCII Text Files via EFS-Web [R-10.2019]
The EFS-Web Legal Framework (www.uspto.gov/sites/default/files/documents/2019LegalFrameworkPES.pdf) and MPEP § 502.05 provide detailed information pertaining to filing applications and other documents via EFS-Web. The information below is specific to sequence listing submissions via EFS-Web.
Pursuant to the EFS-Web Legal Framework, applicants may submit a sequence listing under 37 CFR 1.821 as an ASCII text file via EFS-Web instead of on compact disc, provided the specification contains a statement in a separate paragraph (preferably on the first page) that incorporates by reference the material in the ASCII text file identifying the name of the ASCII text file, the date of creation, and the size of the ASCII text file in bytes. The requirements of 37 CFR 1.52(e)(3) - (6) for documents submitted on compact disc are not applicable to sequence listings submitted as ASCII text files via EFS-Web. However, each text file must be in compliance with ASCII and have a file name with a ".txt" extension.
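By way of illustration only, the items needed for the incorporation-by-reference statement (file name, date of creation, size in bytes) and the ASCII and ".txt" requirements can be gathered with a short Python sketch. The function names `check_text_file` and `incorporation_statement` are hypothetical, and the wording produced by `incorporation_statement` is an example only, not language prescribed by the rules. The creation date is passed in by the caller because file system timestamps do not reliably record when a file was created.

```python
import os


def check_text_file(path):
    """Collect the facts about a sequence listing text file that the
    incorporation-by-reference statement and the EFS-Web requirements refer to."""
    with open(path, "rb") as handle:
        data = handle.read()
    return {
        "name": os.path.basename(path),
        "size_bytes": len(data),
        # bytes.isascii() is True only if every byte is in the range 0-127.
        "is_ascii": data.isascii(),
        "has_txt_extension": path.lower().endswith(".txt"),
    }


def incorporation_statement(name, created, size_bytes):
    """Build an example statement; the wording is illustrative only."""
    return (
        f"The material in the ASCII text file named {name}, created on "
        f"{created}, and having a size of {size_bytes} bytes, is hereby "
        f"incorporated by reference."
    )
```

In use, the dictionary returned by `check_text_file` would be reviewed first, and the statement generated only if `is_ascii` and `has_txt_extension` are both true.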
It is recommended that a sequence listing be submitted in an ASCII text file via EFS-Web rather than in a PDF file. See subsection IV, below, for information regarding filing an international application (PCT) with a sequence listing text file via EFS-Web.
If a sequence listing ASCII text file submitted via EFS-Web on the application filing date complies with the requirements of 37 CFR 1.824(a)(2)-(6) and (b), and applicant has not filed a sequence listing in a PDF file (or on paper) on the same day, the text file will serve as both the paper copy required by 37 CFR 1.821(c) and the computer readable form (CRF) required by 37 CFR 1.821(e). Thus, the following are not required and should not be submitted: (1) a second copy of the sequence listing in a PDF file; (2) a statement under 37 CFR 1.821(f) (indicating that the paper copy and CRF copy of the sequence listing are identical); and (3) a request to use a compliant computer readable form of the sequence listing that is already on file for another application pursuant to 37 CFR 1.821(e). If such a request is filed, the USPTO will not carry out the request but will use the sequence listing submitted in the ASCII text file with the application via EFS-Web. See MPEP § 2422.05. Checker software that may be used to check a sequence listing for compliance with the requirements of 37 CFR 1.824 is available on the USPTO website at www.uspto.gov/patents-getting-started/patent-basics/types-patent-applications/utility-patent/checker-version-446. The User Notes on the Checker website should be consulted for an explanation of errors that are not indicated, and content that is not verified, by the Checker software.
If a user submits a sequence listing (under 37 CFR 1.821(c) and (e)) as an ASCII text file via EFS-Web in response to a requirement under 37 CFR 1.821(g) or (h), the sequence listing text file must be accompanied by a statement that the submission does not include any new matter which goes beyond the disclosure of the application as filed. In addition, if a user submits an amendment to, or a replacement of, a sequence listing (under 37 CFR 1.821(c) and (e)) as an ASCII text file via EFS-Web, the sequence listing text file must be accompanied by: (1) a statement that the submission does not include any new matter, and (2) a statement that indicates support for the amendment in the application, as filed. See 37 CFR 1.825. An incorporation-by-reference statement of the sequence listings is also required in both of these instances.
Submission of the sequence listing in a PDF file on the application filing date is not recommended. Applicant must still provide the CRF required by 37 CFR 1.821(e), and the sequence listing in the PDF file will not be excluded when determining the application size fee. The USPTO prefers the submission of a sequence listing in an ASCII text file via EFS-Web on the application filing date because as stated above, if applicant has not filed a second copy of the sequence listing in a PDF file (or on paper) on the same day, the text file will serve as both the paper copy required by 37 CFR 1.821(c) and the CRF required by 37 CFR 1.821(e). Any sequence listing submitted in PDF format (or on paper) on the application filing date is treated as the paper copy required by 37 CFR 1.821(c). If applicant submits a sequence listing in both a PDF file and an ASCII text file via EFS-Web on the application filing date, a statement that the sequence listing content of the PDF copy and the ASCII text file copy are identical is required. In situations where applicant files the sequence listing in PDF format and requests the use of the CRF of another application under 37 CFR 1.821(e), applicant must submit a letter and request in compliance with 37 CFR 1.821(e) and a statement that the PDF copy filed in the new application is identical to the CRF filed in the other application. See MPEP § 2422.05.
Any sequence listing submitted as an ASCII text file via EFS-Web that is otherwise in compliance with 37 CFR 1.52(e) and 37 CFR 1.824(a)(2)-(6) and (b) will be excluded when determining the application size fee required by 37 CFR 1.16(s) or 1.492(j) as per 37 CFR 1.52(f)(1). A sequence listing submitted as a PDF file via EFS-Web will not be excluded when determining the application size fee.
Regarding a table submitted as an ASCII text file via EFS-Web that is part of the specification or drawings, each three kilobytes of content submitted will be counted as a sheet of paper for purposes of determining the application size fee required by 37 CFR 1.16(s) or 1.492(j). Each table should be submitted as a separate text file. Further, the file name for each table should indicate which table is contained therein.
See subsection IV, below, for additional information regarding application size fees in an international application (PCT).
One hundred (100) megabytes is the size limit for sequence listing text files submitted via EFS-Web. If a user wishes to submit an electronic copy of a sequence listing text file that exceeds 100 megabytes, it is recommended that the user file the application without the sequence listing using EFS-Web to obtain the application number and confirmation number, and then file the sequence listing on compact disc in accordance with 37 CFR 1.52(e) on the same day by using Priority Mail Express® from the USPS in accordance with 37 CFR 1.10, or hand delivery, in order to secure the same filing date for all parts of the application. Note: a submission of a sequence listing in electronic form of 300 MB or more in size is subject to the fee set forth in 37 CFR 1.21(o). Alternatively, a user may submit the application on paper and include the electronic copy of the sequence listing text file on compact disc in accordance with 37 CFR 1.52(e). Sequence listing text files may not be partitioned into multiple files for filing via EFS-Web as the EFS-Web system is not currently capable of handling such submissions. If the sequence listing is filed on a compact disc, the sequence listing must be a single document, but the document may be split for submission on multiple physical media using software designed to divide a file into multiple files for subsequent concatenation. If the user breaks up a sequence listing so that it may be submitted on multiple compact discs, the compact discs must be labeled to indicate their order (e.g., "1 of X", "2 of X").
See subsection IV.B, below, for information regarding submission of a sequence listing text file that exceeds 100 megabytes in an international application (PCT) filed via EFS-Web.
For all other file types, 25 megabytes is the size limit. If a user wishes to submit a table that is larger than 25 megabytes, it is recommended that the electronic copy be submitted on compact disc via Priority Mail Express® from the USPS in accordance with 37 CFR 1.10 on the date of the corresponding EFS-Web filing in accordance with 37 CFR 1.52(e) if the user wishes the electronic copy to be considered to be part of the application as filed. Alternatively, the user may submit the application in paper and include the electronic copies on compact disc in accordance with 37 CFR 1.52(e). Another alternative would be for the user to break up a computer program listing or table file that is larger than 25 megabytes into multiple files that are no larger than 25 megabytes each and submit those smaller files via EFS-Web. If the user chooses to break up a table file so that it may be submitted electronically, the file names must indicate their order (e.g., "1 of X", "2 of X").
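By way of illustration only, the EFS-Web size limits discussed above (100 megabytes for a sequence listing text file, 25 megabytes for other file types) can be sketched as a routing decision in Python. The function name `plan_submission`, the `kind` labels, and the returned route labels are hypothetical. The sketch models only the splitting alternative for an oversized table or computer program listing, which may equally be submitted on compact disc.

```python
import math

SEQ_LISTING_LIMIT_MB = 100  # limit for sequence listing text files via EFS-Web
OTHER_FILE_LIMIT_MB = 25    # limit for all other file types via EFS-Web


def plan_submission(kind, size_mb):
    """Return (route, part_labels) for a file of the given kind and size.

    kind is "sequence_listing", "table", or "computer_program_listing".
    part_labels is non-empty only when a file is split for EFS-Web.
    """
    if kind == "sequence_listing":
        if size_mb <= SEQ_LISTING_LIMIT_MB:
            return ("efs-web", [])
        # Sequence listing text files may not be partitioned for EFS-Web.
        return ("compact-disc", [])
    if kind not in ("table", "computer_program_listing"):
        raise ValueError("unsupported kind: " + kind)
    if size_mb <= OTHER_FILE_LIMIT_MB:
        return ("efs-web", [])
    # Split into the fewest parts that are each no larger than 25 megabytes.
    parts = math.ceil(size_mb / OTHER_FILE_LIMIT_MB)
    labels = [f"{index} of {parts}" for index in range(1, parts + 1)]
    return ("efs-web-split", labels)
```

For example, a 60 megabyte table is planned as three parts labeled "1 of 3", "2 of 3", and "3 of 3", while a 101 megabyte sequence listing text file is routed to compact disc.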
See subsection IV.C, below, for information regarding submission of tables in an international application (PCT) filed via EFS-Web.
Under PCT Rule 5.2(a), the sequence listing must always be presented as a separate part of the description. When filing an international application (PCT) using EFS-Web, the sequence listing part of the description may be submitted either as a single ASCII text file with a ".txt" extension (e.g., "seqlist.txt") or as a PDF file. Note that 100 megabytes is the size limit for submitting a sequence listing text file via EFS-Web. See subsection IV.B, below.
If the sequence listing is submitted as an ASCII text file, applicant need not and should not submit any additional copies. The single ASCII text file is preferred because the ASCII text file will serve both as the sequence listing part of the description under PCT Rule 5.2 and the electronic form under PCT Rule 13ter.1(a) in the absence of a PDF sequence listing file. The check list of the PCT Request provided via EFS-Web together with the international application (PCT) must indicate that the sequence listing forms part of the international application. Furthermore, the statement as set forth in paragraph 4(v) of the AI Annex C (Administrative Instructions under the PCT, Annex C), that "the information recorded in electronic form furnished under PCT Rule 13ter is identical to the sequence listing as contained in the international application," is not required. Also, the sequence listing in an ASCII text file will not be taken into account when calculating the application sheet count, i.e., no excess sheet fee will be required for the sequence listing text file.
Submission of the sequence listing part of the description in a PDF file is not recommended because the applicant would also be required to supply a copy of the sequence listing in an ASCII text file to the appropriate authority for purposes of international search and/or international preliminary examination in accordance with paragraph 40 of AI Annex C. When a sequence listing is filed via EFS-Web in a new PCT international application in both a PDF file and an ASCII text file, but the Request form Box No. IX does not indicate which one forms part of the international application, the PDF copy of the sequence listing will be considered to form part of the application and the ASCII text file will be considered an accompanying item for search purposes under PCT Rule 13ter.1(a) only.
The calculation of the international filing fee for an international application (PCT), including a sequence listing, filed via EFS-Web is determined based on the type of sequence listing file. A sequence listing filed in an ASCII text file will not be included in the sheet count of the international application (PCT). A sequence listing filed in a PDF file will be included in the sheet count of the international application (PCT). Therefore, the sheet count for an EFS-Web filed international application (PCT) containing both a PDF file and a text file sequence listing will be calculated to include the number of sheets of the PDF sequence listing.
One hundred (100) megabytes is the size limit for sequence listing text files submitted via EFS-Web. Sequence listing text files must not be partitioned into multiple files for filing via EFS-Web as the EFS-Web electronic filing system is not currently capable of handling such submissions. For all other file types EFS-Web is currently not capable of accepting files that are larger than 25 megabytes. Additionally, a single EFS-Web submission may include no more than 60 electronic files. Note that regarding the 60 electronic file limit, an applicant may upload and validate in sets of up to 20 files each, with a limit of three sets of 20. If applicant chooses to divide a file into multiple parts using the multi-doc feature, each part is counted as one file.
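By way of illustration only, the 60-file limit and the upload sets of up to 20 files each can be sketched in Python. The function name `batch_files` and its parameters are hypothetical; the sketch assumes that each part of a file divided with the multi-doc feature has already been listed as its own entry, since each part counts as one file.

```python
def batch_files(file_names, set_size=20, max_sets=3):
    """Group files into upload sets of at most set_size files each.

    Raises ValueError if the files exceed what a single EFS-Web submission
    may include (set_size * max_sets, i.e. 60 with the defaults).
    """
    limit = set_size * max_sets
    if len(file_names) > limit:
        raise ValueError(
            f"{len(file_names)} files exceed the single-submission limit of {limit}"
        )
    # Slice the list into consecutive sets of up to set_size files.
    return [
        file_names[start:start + set_size]
        for start in range(0, len(file_names), set_size)
    ]
```

Under this sketch, 45 files are grouped into sets of 20, 20, and 5, while 61 files raise an error indicating that more than one submission would be needed.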
The need to submit unusually large sequence listings and/or numerous electronic files may prevent applicant from making a complete international application (PCT) filing in a single EFS-Web submission. Applicant may use EFS-Web to file part of the international application (PCT) and to obtain the international application (PCT) number and the confirmation number, and then file the remainder of the international application (PCT) on the same day as one or more follow-on submissions using EFS-Web, in order to secure the same filing date for all parts of the international application (PCT). However, applicant is not permitted to file part of the international application (PCT) electronically via EFS-Web, and then file the remainder of the international application (PCT) on paper to secure a filing date of all parts of the international application (PCT).
In the situation where applicant needs to file a sequence listing that is over one hundred (100) megabytes, applicant may use EFS-Web to file the international application (PCT) without the sequence listing to obtain the international application (PCT) number and the confirmation number, and then file the sequence listing on compact discs on the same day by using Priority Mail Express® from the USPS in accordance with 37 CFR 1.10, or hand delivery, in order to secure the same filing date for all parts of the international application (PCT). However, Priority Mail Express® from the USPS and hand-carried submissions must not contain PDF files and must fully comply with the guidelines for filing a sequence listing on electronic media. The check list of the PCT Request provided via EFS-Web together with the international application (PCT) must indicate that the sequence listing part of the description will be filed separately on physical data carrier(s), on the same day and in the form of an Annex C/ST.25 text file. The sequence listing must be a single document, but the document may be split for submission on multiple physical media using software designed to divide a file into multiple files for subsequent concatenation. If the user breaks up a sequence listing for submission on multiple compact discs, the compact discs must be labeled to indicate their order (e.g., "1 of X", "2 of X").
Submissions of very lengthy sequence listings (300 MB or over) in international applications are not subject to the mega-sequence listing submission fees set forth in 37 CFR 1.21(o). However, for mega-sequence listing submissions on or after January 16, 2018, the fee under 37 CFR 1.21(o) does apply to the submission of mega-sequence listings received in national stage applications under 35 U.S.C. 371, including mega-sequence listings received by the Office pursuant to PCT Article 20. Similarly, if an international application is filed at RO/US with a mega-sequence listing, and thereafter a bypass continuing application is filed under 35 U.S.C. 111(a), the fee under 37 CFR 1.21(o) will be due in the continuing application for mega-sequence listing submissions on or after January 16, 2018.
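The fee rule in the preceding paragraph can be restated as a small decision function. This is a sketch under stated assumptions: the enum and function names are hypothetical, only the three filing types mentioned above are modeled, and treating one megabyte as 10**6 bytes is an assumption that this section does not confirm.

```python
from datetime import date
from enum import Enum

# Assumption: 1 MB = 10**6 bytes; the text above does not define the unit.
MEGA_THRESHOLD_BYTES = 300 * 10**6
FEE_START = date(2018, 1, 16)


class Filing(Enum):
    INTERNATIONAL = "international application (PCT)"
    NATIONAL_STAGE_371 = "national stage application under 35 U.S.C. 371"
    BYPASS_111A = "bypass continuing application under 35 U.S.C. 111(a)"


def mega_fee_applies(kind: Filing, size_bytes: int, submitted: date) -> bool:
    """Return True when the 37 CFR 1.21(o) fee is described above as applying."""
    if size_bytes < MEGA_THRESHOLD_BYTES:
        return False  # not a mega-sequence listing
    if kind is Filing.INTERNATIONAL:
        return False  # international applications are not subject to the fee
    return submitted >= FEE_START
```

For example, a 400 MB listing received in a national stage application on January 16, 2018 would return True, while the same listing in the international application itself would return False.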
Tables related to a sequence listing must be an integral part of the description of the international application (PCT), and must not be included in the sequence listing part or the drawing part. Such tables will be taken into account when calculating the application sheet count, and excess sheet fees may be required. When applicant submits tables related to a sequence listing in an international application (PCT) via EFS-Web, the tables must be in a PDF file. If applicant submits tables related to a sequence listing in a text file, such tables will not be accepted as part of the international application (PCT). For more information, see Sequence Listings and Tables Related Thereto in International Applications Filed in the United States Receiving Office, 1344 Off. Gaz. Pat. Office 50 (July 7, 2009).
2422.04 The Requirement for a Computer Readable Copy of the Official Copy of the Sequence Listing [R-07.2015]
37 CFR 1.821(e) requires the submission of a copy of the sequence listing in computer readable form. The computer readable form may be submitted on the electronic media permitted by 37 CFR 1.824, or may be submitted as an ASCII text file via EFS-Web. The information on the computer readable form will be entered into the Office’s database for searching and printing nucleotide and amino acid sequences. This electronic database will also enable the Office to provide published sequence data, in electronic form, to the National Center for Biotechnology Information (NCBI) for publication in GenBank, and enable NCBI to exchange data with the DNA Data Bank of Japan (DDBJ) and the European Bioinformatics Institute (EBI). It should be noted that the Office’s database complies with the confidentiality requirement imposed by 35 U.S.C. 122. Unpublished pending application sequences are maintained in the database separately from published or patented sequences. That is, the Office will not exchange or make public any information on any sequence until the patent application containing that information is published or matures into a patent, or as otherwise allowed by 35 U.S.C. 122.
The Office may permit correction of the official copy of the sequence listing submitted pursuant to 37 CFR 1.821(c), whether on paper or compact disc, at the least, during the pendency of a given application by reference to the computer readable copy thereof submitted pursuant to 37 CFR 1.821(e) if both the official copy and computer readable form were submitted at the time of filing of the application and the totality of the circumstances otherwise substantiate the proposed correction. A mere discrepancy between the official copy and the computer readable form may not, in and of itself, be sufficient to justify a proposed correction. In this regard, the Office will assume that the computer readable form has been incorporated by reference into the application when the official copy and computer readable form were submitted at the time of filing of the application. The Office will attempt to accommodate or address all correction issues, but it must be kept in mind that the real burden rests with the applicant to ensure that any discrepancies between the official copy and the computer readable form are eliminated or minimized. Applicants should be aware that there will be instances where the applicant may have to suffer the consequences of any discrepancies between the two. If a new application is filed via EFS-Web with an ASCII text file sequence listing that complies with the requirements of 37 CFR 1.824(a)(2) - (6) and 37 CFR 1.824(b), and applicant has not filed a sequence listing in a PDF file, the text file will serve as both the paper copy required by 37 CFR 1.821(c) and CRF required by 37 CFR 1.821(e), eliminating any chance for discrepancies between the official copy and the CRF.
The Office does not desire to be bound by a requirement to permanently preserve computer readable forms for support, priority or correction purposes. For example, the Office will make corrections, where appropriate, by reference to the CRF as long as the CRF is still available to the Office. However, once use of the CRF by the Office for processing has ended, i.e., once the Office has entered the data contained on the computer readable form into the appropriate database, the Office does not intend to further preserve the CRF submitted by the applicant.
2422.05 Request for Transfer of Computer Readable Form [R-10.2019]
37 C.F.R. 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications.
*****
- (e) A copy of the "Sequence Listing" referred to in paragraph (c) of this section must also be submitted in computer readable form (CRF) in accordance with the requirements of § 1.824. The computer readable form must be a copy of the "Sequence Listing" and may not be retained as a part of the patent application file. If the computer readable form of a new application is to be identical with the computer readable form of another application of the applicant on file in the Office, reference may be made to the other application and computer readable form in lieu of filing a duplicate computer readable form in the new application if the computer readable form in the other application was compliant with all of the requirements of this subpart. The new application must be accompanied by a letter making such reference to the other application and computer readable form, both of which shall be completely identified. In the new application, applicant must also request the use of the compliant computer readable "Sequence Listing" that is already on file for the other application and must state that the paper or compact disc copy of the "Sequence Listing" in the new application is identical to the computer readable copy filed for the other application.
*****
Where the computer readable form (CRF) of the sequence listing of a new application is to be identical with the CRF of another application of the applicant on file in the Office, 37 CFR 1.821(e) provides a mechanism for applicant to request a transfer of the CRF from the application already on file to the new application in limited circumstances. Instead of submitting a transfer request of a previously filed CRF, however, the Office strongly recommends that applicant submit a sequence listing in ASCII text format in the new application, which will serve both as the sequence listing part of the disclosure, as well as the CRF. Applicant may be able to retrieve a copy of the sequence listing in ASCII text format in another application of the applicant from applicant's records, public or private PAIR via the Supplemental Content Tab, or from PATENTSCOPE (WIPO website) when provided in an international application.
A transfer request must satisfy three conditions. First, the application in which the request for a transfer is submitted must have been filed with (or include via an amendment in accordance with 37 CFR 1.825(a)) a paper copy, two compact disc copies in accordance with 37 CFR 1.52(e), or a PDF of a sequence listing. Second, the CRF of the previous application must be identical to the sequence listing contained in the new application, and the request for transfer must include a statement to this effect. Note that applicant may request transfer only of a CRF that complies with 37 CFR 1.824(a)(2) - (6) and 37 CFR 1.824(b) (i.e., is a compliant sequence listing ASCII text file). Third, the previous application and the CRF to be transferred must be completely and clearly identified in the transfer request. Necessary identifying information includes the application number, filing date of the application, and submission date of the CRF that is to be transferred. Note that if the transfer request is filed on or after January 16, 2018 and the sequence listing to be transferred is at least 300 MB, then the transfer request will be subject to the mega-sequence listing fee set forth in 37 CFR 1.21(o).
Form PTO/SB/93 (www.uspto.gov/forms/sb0093.pdf) should be used to request a transfer of a CRF under 37 CFR 1.821(e) to facilitate processing of the request.
If a user submits a sequence listing ASCII text file via EFS-Web and concurrently requests the Office to use a compliant computer readable sequence listing that is already on file for another application pursuant to 37 CFR 1.821(e), the Office will not carry out the request but will use the sequence listing submitted with the application as originally filed via EFS-Web.
Applicant's reply to a notice of a defective transfer request preferably includes a CRF of the previous application (an ASCII text file submitted via EFS-Web or on compact disc); however, a new transfer request with correction of the noted deficiencies is also permitted. As an example, if applicant requested transfer of a CRF into a new application that does not include a sequence listing and such request is defective, the response to a defective transfer request notice may be a CRF of the sequence listing. If it is not, then the response must include a new transfer request, a PDF, two compact disc copies in accordance with 37 CFR 1.52(e) or a paper copy of the sequence listing, and an amendment in accordance with 37 CFR 1.825(a) entering the sequence listing in the application.
2422.06 Requirement for Statement Regarding Content of Official and Computer Readable Copies of Sequence Listing [R-07.2015]
37 CFR 1.821(f) requires that the official sequence listing (submitted on paper or compact disc pursuant to 37 CFR 1.821(c)) and computer readable copies of the sequence listing (submitted pursuant to 37 CFR 1.821(e)) be accompanied by a statement that the content of the official and computer readable copies is the same, at the time when the computer readable form is submitted. Such a statement may be made by a registered practitioner, the applicant, an inventor, or the person who actually compares the sequence data on behalf of the aforementioned. See MPEP § 2428 for further information and Sample Statements.
Note that if the sequence listing is filed in a new application as an ASCII text file via EFS-Web, and applicant has not filed a sequence listing in a PDF file, the text file will serve as both the paper copy required by 37 CFR 1.821(c) and the computer readable form (CRF) required by 37 CFR 1.821(e). See MPEP § 2422.03(a), subsections I and IV, for additional information. Thus, the following are not required and should not be submitted: (1) a second copy of the sequence listing in a PDF file; and (2) a statement under 37 CFR 1.821(f) (indicating that the paper copy and CRF copy of the sequence listing are identical).
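Where separate official and computer readable copies are filed, the comparison behind a 37 CFR 1.821(f) statement might be assisted by a simple text diff. The sketch below is a hypothetical aid, not a substitute for the required statement. It assumes both copies are available as text, and it assumes that differences in line endings, trailing whitespace, and blank lines are immaterial, which this section does not state.

```python
import difflib


def normalize(text: str) -> list[str]:
    """Unify line endings, strip trailing whitespace, and drop blank lines."""
    return [line.rstrip() for line in text.splitlines() if line.strip()]


def listing_differences(official: str, crf: str) -> list[str]:
    """Return unified-diff lines; an empty list means no differences were found."""
    return list(
        difflib.unified_diff(
            normalize(official),
            normalize(crf),
            fromfile="official copy",
            tofile="CRF",
            lineterm="",
        )
    )
```

An empty result suggests the two copies carry the same content under the stated assumptions; any returned lines point to the discrepancies that would need to be resolved before the statement is made.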
2422.07 Requirements for Compliance, Statements Regarding New Matter, and Sanctions for Failure to Comply [R-07.2015]
If the requirements of 37 CFR 1.821(b) through (f), as discussed above, are not satisfied at the time of filing under 35 U.S.C. 111(a) or at the time of entering the national stage of an international application under 35 U.S.C. 371, 37 CFR 1.821(g) requires compliance within the period of time set in a notice requiring compliance. Failure to comply will result in the abandonment of the application. When applicant files an amendment to comply with the requirements of 37 CFR 1.821(g) and that amendment adds or amends a compact disc(s) or ASCII text file submitted via EFS-Web, applicant is required to update or insert in the specification an appropriate incorporation by reference statement describing either the compact disc and the files contained thereon or the ASCII text file submitted via EFS-Web. See 37 CFR 1.77(b)(5) and 37 CFR 1.52(e)(5). Submissions in reply to requirements under 37 CFR 1.821(g) must be accompanied by a statement that the submission includes no new matter. Such a statement may be made by a registered practitioner, the applicant, an inventor, or the person who actually compares the sequence data on behalf of the aforementioned. Extensions of time in which to reply to a requirement under this paragraph are available pursuant to 37 CFR 1.136. Note, however, that patent applications filed under 35 U.S.C. 111 on or after December 18, 2013, and international patent applications in which the national stage commenced under 35 U.S.C. 371 on or after December 18, 2013, may be subject to reductions in patent term adjustment pursuant to 37 CFR 1.704(c)(13) if they are not in condition for examination within eight months from the filing date or date of commencement, respectively. "In condition for examination" includes compliance with 37 CFR 1.821 through 1.825 (see 37 CFR 1.704(f)).
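The eight-month period mentioned above can be illustrated with a short date calculation. This is a sketch only: the function names are hypothetical, and clamping to the last day of a shorter month (for example, June 30 plus eight months giving February 28) is an assumption about how the period is counted, not something this section specifies.

```python
import calendar
from datetime import date
from typing import Optional

PTA_RULE_START = date(2013, 12, 18)


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping the day to the end of a shorter month."""
    index = start.year * 12 + (start.month - 1) + months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(start.day, last_day))


def eight_month_date(filing_or_commencement: date) -> Optional[date]:
    """Date eight months after filing or commencement, or None when the
    application predates December 18, 2013."""
    if filing_or_commencement < PTA_RULE_START:
        return None
    return add_months(filing_or_commencement, 8)
```

Under these assumptions, an application filed on January 15, 2014 would yield September 15, 2014 as the date by which it should be in condition for examination, including compliance with the sequence rules.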
Provisional applications filed under 35 U.S.C. 111(b) need not comply with 37 CFR 1.821 through 1.825; however, applicants are encouraged to file a sequence listing as defined in 37 CFR 1.821(c) for ease of identification of the sequence information contained in the provisional application.
If any of the requirements of 37 CFR 1.821(b) - (f) are not satisfied at the time of filing an international application under the Patent Cooperation Treaty (PCT), which application is to be searched by the United States International Searching Authority or examined by the United States International Preliminary Examining Authority, applicant will be sent a notice necessitating compliance with the requirements within a prescribed time period. Submissions in reply to requirements under this paragraph must be accompanied by a statement that the submission does not include matter which goes beyond the disclosure in the international application as filed. Such a statement may be made by a registered practitioner, the applicant, an inventor, or the person who actually compares the sequence data on behalf of the aforementioned. International applications that fail to comply with any of the requirements of 37 CFR 1.821(b) - (f) will be searched and/or examined to the extent possible without the benefit of the information in computer readable form. See PCT Administrative Instructions Section 513(c).
The requirement to submit a statement that a submission in reply to the requirements of this section does not include new matter or matter which goes beyond the disclosure in the application as filed is not the first instance in which the applicant has been required to ensure that there is not new matter upon amendment. The requirement is analogous to that found in 37 CFR 1.125 regarding substitute specifications. When a substitute specification is required because the number or nature of amendments would make it difficult to examine the application, the applicant must include a statement that the substitute specification includes no new matter. The necessity of requiring a substitute sequence listing, or pages thereof, is similar to the necessity of requiring a substitute specification and, likewise, the burden is on the applicant to ensure that no new matter is added. Applicants have a duty to comply with the statutory prohibition (35 U.S.C. 132 and 35 U.S.C. 251) against the introduction of new matter.
The correction of errors in sequencing, or of any other errors made in describing an invention, is subject to the statutory prohibition (35 U.S.C. 132 and 35 U.S.C. 251) against the introduction of new matter.
2422.08 Presumptions Regarding Compliance [R-08.2012]
Neither the presence nor absence of information which is not required under the sequence rules will create a presumption that such information is necessary to satisfy any of the requirements of 35 U.S.C. 112. Further, the grant of a patent on an application that is subject to 37 CFR 1.821 through 37 CFR 1.825 constitutes a presumption that the granted patent complies with the requirements of these rules.
2422.09 Box Sequence; Hand Delivery of Sequence Listings and Computer Readable Forms [R-07.2015]
To facilitate administrative processing of all papers and compact discs associated with sequence rule compliance, all computer readable forms, compact discs, fees, and papers accompanying them filed in the Office should be marked "Box SEQUENCE."
Correspondence relating to the sequence rules may also be hand-delivered to the Customer Service Window. In cases of hand delivery to the Customer Service Window, the computer readable form should be placed in a protective mailer labeled with at least the application number, if available. The labeling requirements of 37 CFR 1.52(e) and 1.824(a)(6) must also be complied with. The use of staples and clips, if any, should be confined to carefully attaching the mailer to the submitted papers without contact or compression of the media. In no situations should additional or complimentary electronic copies be delivered to examiners or other Office personnel.