SPIE parser (#70)
* SPIE parser

* jats parser now handling spie + jats bugfixes + pubdate assigned to print and electronic if not specified

* deleted standalone spie files

* pubdate bugfix

---------

Co-authored-by: Mugdha Polimera <[email protected]>
mugdhapolimera and Mugdha Polimera authored Oct 20, 2023
1 parent a4639b8 commit 39ae85a
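The commit message says pubdate is "assigned to print and electronic if not specified". Read against the `_parse_pubdate` change in this commit, that appears to mean: when a `<pub-date>` carries neither `pub-type` nor `publication-format`, the same date is stored under both keys. A minimal standalone sketch of that rule follows; `classify_pubdate` is a hypothetical helper written for illustration and is not part of adsingestp.

```python
def classify_pubdate(pub_type="", pub_format=""):
    """Return the metadata keys a date would be stored under.

    Hypothetical helper for illustration only; it mirrors the conditions
    shown in the diff, not the parser's actual method.
    """
    # A date with no pub-type and no publication-format counts as both.
    unspecified = pub_type == "" and pub_format == ""
    keys = []
    if pub_format == "print" or pub_type in ("ppub", "cover") or unspecified:
        keys.append("pubdate_print")
    # Independent `if`, not `elif`, so one date can fill both slots.
    if pub_format == "electronic" or pub_type == "epub" or unspecified:
        keys.append("pubdate_electronic")
    return keys


print(classify_pubdate())                 # no attributes at all
print(classify_pubdate(pub_type="epub"))  # electronic only
print(classify_pubdate(pub_type="ppub"))  # print only
```

The SPIE proceedings stubs added by this commit use a bare `<pub-date>` with no attributes, which is the case the first call exercises.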
Showing 34 changed files with 6,568 additions and 16 deletions.
43 changes: 27 additions & 16 deletions adsingestp/parsers/jats.py
@@ -594,19 +594,20 @@ def _parse_title_abstract(self):

         self.base_metadata["title"] = self._detag(title, self.JATS_TAGSET["title"]).strip()

-        if self.article_meta.find("abstract") and self.article_meta.find("abstract").find("p"):
-            abstract_all = self.article_meta.find("abstract").find_all("p")
-            abstract_paragraph_list = list()
-            for paragraph in abstract_all:
-                para = self._detag(paragraph, self.JATS_TAGSET["abstract"])
-                abstract_paragraph_list.append(para)
-            self.base_metadata["abstract"] = "\n".join(abstract_paragraph_list)
-            # abstract = self._detag(
-            #     self.article_meta.find("abstract").find("p"), self.JATS_TAGSET["abstract"]
-            # )
-            # self.base_metadata["abstract"] = abstract
-            if title_fn_list:
-                self.base_metadata["abstract"] += " " + " ".join(title_fn_list)
+        if self.article_meta.find("abstract"):
+            if self.article_meta.find("abstract").find("p"):
+                abstract_all = self.article_meta.find("abstract").find_all("p")
+                abstract_paragraph_list = list()
+                for paragraph in abstract_all:
+                    para = self._detag(paragraph, self.JATS_TAGSET["abstract"])
+                    abstract_paragraph_list.append(para)
+                self.base_metadata["abstract"] = "\n".join(abstract_paragraph_list)
+                if title_fn_list:
+                    self.base_metadata["abstract"] += " " + " ".join(title_fn_list)
+            else:
+                abs_raw = self.article_meta.find("abstract")
+                abs_txt = self._detag(abs_raw, self.JATS_TAGSET["abstract"])
+                self.base_metadata["abstract"] = abs_txt

     def _parse_author(self):
         auth_affil = JATSAffils()
@@ -773,7 +774,7 @@ def _parse_pub(self):
         issn_all = self.journal_meta.find_all("issn")
         issns = []
         for i in issn_all:
-            issns.append((i["pub-type"], self._detag(i, [])))
+            issns.append((i.get("pub-type", ""), self._detag(i, [])))
         self.base_metadata["issn"] = issns

         isbn_all = self.article_meta.find_all("isbn")
@@ -847,13 +848,23 @@ def _parse_ids(self):

     def _parse_pubdate(self):
         pub_dates = self.article_meta.find_all("pub-date")
+
         for d in pub_dates:
             pub_format = d.get("publication-format", "")
             pub_type = d.get("pub-type", "")
             pubdate = self._get_date(d)
-            if pub_format == "print" or pub_type == "ppub" or pub_type == "cover":
+            if (
+                pub_format == "print"
+                or pub_type == "ppub"
+                or pub_type == "cover"
+                or (pub_type == "" and pub_format == "")
+            ):
                 self.base_metadata["pubdate_print"] = pubdate
-            elif pub_format == "electronic" or pub_type == "epub":
+            if (
+                pub_format == "electronic"
+                or pub_type == "epub"
+                or (pub_type == "" and pub_format == "")
+            ):
                 self.base_metadata["pubdate_electronic"] = pubdate

             if pub_type == "open-access":
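Two of the changes to jats.py deal with markup that the SPIE proceedings records leave out: an `<issn>` with no `pub-type` attribute, and an `<abstract>` whose text is not wrapped in `<p>`. The sketch below reproduces both cases on a trimmed fragment. It is an illustration under stated assumptions, not the parser's code: it uses the standard library's `xml.etree.ElementTree` in place of the BeautifulSoup calls in the diff, it omits `_detag` and the title-footnote handling, and `extract_issns` and `extract_abstract` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Trimmed fragment modelled on the SPIE proceedings stubs: the <issn> has no
# pub-type attribute and the <abstract> holds bare text with no <p> children.
SAMPLE = """
<front>
  <journal-meta>
    <issn>0277-786X</issn>
  </journal-meta>
  <article-meta>
    <abstract>
      Single photon counting avalanche diodes (SPADs) are versatile sensors.
    </abstract>
  </article-meta>
</front>
"""


def extract_issns(front):
    # .get() with a default avoids the KeyError that direct attribute
    # indexing raises when pub-type is absent.
    return [(el.get("pub-type", ""), (el.text or "").strip()) for el in front.iter("issn")]


def extract_abstract(front):
    abstract = front.find(".//abstract")
    if abstract is None:
        return None
    paragraphs = abstract.findall("p")
    if paragraphs:
        # One line per <p>, as in the paragraph branch of the diff.
        return "\n".join("".join(p.itertext()).strip() for p in paragraphs)
    # Fallback: no <p> children, so take the text of <abstract> itself.
    return "".join(abstract.itertext()).strip()


front = ET.fromstring(SAMPLE.strip())
print(extract_issns(front))
print(extract_abstract(front))
```

With the pre-commit logic, a record like this would have raised on the missing `pub-type` and would have been left without an abstract, since the old condition required a `<p>` child.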
526 changes: 526 additions & 0 deletions tests/stubdata/input/jats_spie_jmnmm_1.JMM.21.4.041407.xml

Large diffs are not rendered by default.

800 changes: 800 additions & 0 deletions tests/stubdata/input/jats_spie_opten_1.OE.62.4.048103.xml

Large diffs are not rendered by default.

280 changes: 280 additions & 0 deletions tests/stubdata/input/jats_spie_opten_1.OE.62.4.066101.xml

Large diffs are not rendered by default.

134 changes: 134 additions & 0 deletions tests/stubdata/input/jats_spie_spie_12.2663029.xml
@@ -0,0 +1,134 @@
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd">
<article article-type="proceedings" dtd-version="3.0" xml:lang="en" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<issn>0277-786X</issn>
<isbn>9781510661387</isbn>
<publisher>
<publisher-name>SPIE</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.1117/12.2663029</article-id>
<title-group>
<article-title>
Single photon flux imaging with sub-pixel resolution by motion compensation
</article-title>
</title-group>
<contrib-group content-type="volume">
<contrib contrib-type="editor">
<name>
<surname>Itzler</surname>
<given-names>Mark A.</given-names>
</name>
<xref ref-type="aff" rid="aff4" />
</contrib>
<contrib contrib-type="editor">
<name>
<surname>Bienfang</surname>
<given-names>Joshua C.</given-names>
</name>
<xref ref-type="aff" rid="aff5" />
</contrib>
<contrib contrib-type="editor">
<name>
<surname>McIntosh</surname>
<given-names>K. Alex</given-names>
</name>
<xref ref-type="aff" rid="aff6" />
</contrib>
</contrib-group>
<contrib-group content-type="article">
<contrib contrib-type="author">
<name>
<surname>Laurenzis</surname>
<given-names>Martin</given-names>
</name>
<xref ref-type="aff" rid="aff1" />
</contrib>
<contrib contrib-type="author">
<name>
<surname>Bacher</surname>
<given-names>Emmanuel</given-names>
</name>
<xref ref-type="aff" rid="aff1" />
</contrib>
<contrib contrib-type="author">
<name>
<surname>Seets</surname>
<given-names>Trevor</given-names>
</name>
<xref ref-type="aff" rid="aff2" />
</contrib>
<contrib contrib-type="author">
<name>
<surname>Ingle</surname>
<given-names>Atul</given-names>
</name>
<xref ref-type="aff" rid="aff3" />
</contrib>
<contrib contrib-type="author">
<name>
<surname>Velten</surname>
<given-names>Andreas</given-names>
</name>
<xref ref-type="aff" rid="aff2" />
</contrib>
<contrib contrib-type="author">
<name>
<surname>Christnacher</surname>
<given-names>Frank</given-names>
</name>
<xref ref-type="aff" rid="aff1" />
</contrib>
</contrib-group>
<aff id="aff1">Institut Franco-Allemand de Recherches de Saint-Louis (France)</aff>
<aff id="aff2">Univ. of Wisconsin-Madison (United States)</aff>
<aff id="aff3">Portland State Univ. (United States)</aff>
<aff id="aff4">Argo AI, LLC (United States)</aff>
<aff id="aff5">National Institute of Standards and Technology (United States)</aff>
<aff id="aff6">MIT Lincoln Lab. (United States)</aff>
<pub-date>
<day>15</day>
<month>6</month>
<year>2023</year>
</pub-date>
<volume>12512</volume>
<fpage>125120E</fpage>
<lpage>125120E-9</lpage>
<permissions>
<copyright-statement>COPYRIGHT SPIE. Downloading of the abstract is permitted for personal use only.</copyright-statement>
<copyright-year>2023</copyright-year>
</permissions>
<self-uri xlink:title="pdf" xlink:href="125120E.pdf"/>
<abstract>
Single photon counting avalanche diodes (SPADs) are versatile sensors for active and time-correlated measurements such as ranging and fluorescence imaging. These detectors also have great potential for passive or uncorrelated imaging. Recently, it was demonstrated that passive imaging of photon flux is possible by determining the mean photon arrival time. For ambient light illumination, timestamp data can be interpreted as a metric for the photon impingement rate. Various applications have been investigated including high-dynamic-range imaging, single-photon imaging, and capture of fast-moving objects or dynamic scenes. However, the appearance of noise and motion blur requires sophisticated signal processing that enables sub-pixel resolution imaging and reconstruction of the scene by motion compensation. In this paper, we present new results on the evaluation of global scene motion. In our approach, motion is intentionally generated by a rotating wedge prism, resulting in continuous global motion on a circular path. We have studied scenes with different optical contrast.
</abstract>
<conference content-type="conf-level-1">
<conf-date />
<conf-name>Defense + Commercial Sensing</conf-name>
<conf-acronym>DCS</conf-acronym>
<conf-num>7</conf-num>
</conference>
<conference content-type="conf-level-2">
<conf-date>2023-04-30|2023-05-05</conf-date>
<conf-name>SPIE Defense + Commercial Sensing</conf-name>
<conf-acronym>DCS23</conf-acronym>
<conf-num>2619426</conf-num>
<conf-loc>Orlando, Florida, United States</conf-loc>
</conference>
<conference content-type="volume">
<conf-date />
<conf-name>Advanced Photon Counting Techniques XVII</conf-name>
<conf-num>12512</conf-num>
</conference>
<conference content-type="session">
<conf-date />
<conf-name>Lidar</conf-name>
<conf-num>6</conf-num>
</conference>
</article-meta>
</front>
<back>
</back>
</article>
125 changes: 125 additions & 0 deletions tests/stubdata/input/jats_spie_spie_12.2663066.xml
@@ -0,0 +1,125 @@
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd">
<article article-type="proceedings" dtd-version="3.0" xml:lang="en" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<issn>0277-786X</issn>
<isbn>9781510661387</isbn>
<publisher>
<publisher-name>SPIE</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.1117/12.2663066</article-id>
<title-group>
<article-title>
Low-SWaP embedded 3D-LiDAR to detect non-cooperative targets
</article-title>
</title-group>
<contrib-group content-type="volume">
<contrib contrib-type="editor">
<name>
<surname>Itzler</surname>
<given-names>Mark A.</given-names>
</name>
<xref ref-type="aff" rid="aff2" />
</contrib>
<contrib contrib-type="editor">
<name>
<surname>Bienfang</surname>
<given-names>Joshua C.</given-names>
</name>
<xref ref-type="aff" rid="aff3" />
</contrib>
<contrib contrib-type="editor">
<name>
<surname>McIntosh</surname>
<given-names>K. Alex</given-names>
</name>
<xref ref-type="aff" rid="aff4" />
</contrib>
</contrib-group>
<contrib-group content-type="article">
<contrib contrib-type="author">
<name>
<surname>Riviere</surname>
<given-names>Nicolas</given-names>
</name>
<xref ref-type="aff" rid="aff1" />
</contrib>
<contrib contrib-type="author">
<name>
<surname>Dupouy</surname>
<given-names>Paul-Édouard</given-names>
</name>
<xref ref-type="aff" rid="aff1" />
</contrib>
<contrib contrib-type="author">
<name>
<surname>Moussous</surname>
<given-names>Ahmed</given-names>
</name>
<xref ref-type="aff" rid="aff1" />
</contrib>
<contrib contrib-type="author">
<name>
<surname>Schilling</surname>
<given-names>Anita</given-names>
</name>
<xref ref-type="aff" rid="aff1" />
</contrib>
<contrib contrib-type="author">
<name>
<surname>Viala</surname>
<given-names>Erwan</given-names>
</name>
<xref ref-type="aff" rid="aff1" />
</contrib>
</contrib-group>
<aff id="aff1">ONERA/DOTA, Univ. of Toulouse (France)</aff>
<aff id="aff2">Argo AI, LLC (United States)</aff>
<aff id="aff3">National Institute of Standards and Technology (United States)</aff>
<aff id="aff4">MIT Lincoln Lab. (United States)</aff>
<pub-date>
<day>15</day>
<month>6</month>
<year>2023</year>
</pub-date>
<volume>12512</volume>
<fpage>125120D</fpage>
<lpage>125120D-7</lpage>
<permissions>
<copyright-statement>COPYRIGHT SPIE. Downloading of the abstract is permitted for personal use only.</copyright-statement>
<copyright-year>2023</copyright-year>
</permissions>
<self-uri xlink:title="pdf" xlink:href="125120D.pdf"/>
<abstract>
ONERA – The French Aerospace Lab – develops new concepts of 3D-LiDAR imaging systems including new sensor technologies such as detector for photon counting and, associated data processing. The rising complexities and costs of high performance systems, and the shrinking time to design drove the ONERA approach. The home-grown MATLIS software has been evolving for the past decade. It allows both linear mode LiDAR and single photon electro-optical systems simulation (both GmAPD and SPL) embedded on dynamic platforms (eg. UAVs, Aircrafts). The static or dynamic 3D scene is fully described both in terms of geometry and optical properties (eg. reflectance, background illumination, and atmosphere). The scanning system and the platform motion are taken into account. Laser propagation is fully modelled including atmospheric effects such as turbulence, absorption, and backscattering in the forward and backward directions. Target interaction is angle dependent (temporal broadening and directional backscattering). Optical full-wave-form signal is computed in the focal plane of the imaging system. A 3D point cloud is generated using sensor models (including but not limited to APD, GmAPD, SiPM…). Here, we describe our end-to-end MATLIS software and present validation cases. Then, we apply a complete performance analysis study to design a novel and original concept of low-SWaP 3D-LiDAR to detect non-cooperative targets from a stratospheric surveillance platform.
</abstract>
<conference content-type="conf-level-1">
<conf-date />
<conf-name>Defense + Commercial Sensing</conf-name>
<conf-acronym>DCS</conf-acronym>
<conf-num>7</conf-num>
</conference>
<conference content-type="conf-level-2">
<conf-date>2023-04-30|2023-05-05</conf-date>
<conf-name>SPIE Defense + Commercial Sensing</conf-name>
<conf-acronym>DCS23</conf-acronym>
<conf-num>2619426</conf-num>
<conf-loc>Orlando, Florida, United States</conf-loc>
</conference>
<conference content-type="volume">
<conf-date />
<conf-name>Advanced Photon Counting Techniques XVII</conf-name>
<conf-num>12512</conf-num>
</conference>
<conference content-type="session">
<conf-date />
<conf-name>Lidar</conf-name>
<conf-num>6</conf-num>
</conference>
</article-meta>
</front>
<back>
</back>
</article>