diff --git a/docs/Adding-documentations-to-DevDocs.md b/docs/Adding-documentations-to-DevDocs.md
new file mode 100644
index 0000000000..f86ad62c53
--- /dev/null
+++ b/docs/Adding-documentations-to-DevDocs.md
@@ -0,0 +1,22 @@
+Adding a documentation may look like a daunting task, but once you get the hang of it, it's actually quite simple. Don't hesitate to ask for help on the [mailing list](https://groups.google.com/d/forum/devdocs) if you ever get stuck.
+
+**Note:** please read the [contributing guidelines](https://github.com/Thibaut/devdocs/blob/master/.github/CONTRIBUTING.md) before submitting a new documentation.
+
+1. Create a subclass of `Docs::UrlScraper` or `Docs::FileScraper` in the `lib/docs/scrapers/` directory. Its name should be the [PascalCase](http://api.rubyonrails.org/classes/String.html#method-i-camelize) equivalent of the filename (e.g. `my_doc` → `MyDoc`).
+2. Add the appropriate class attributes and filter options (see the [Scraper Reference](https://github.com/Thibaut/devdocs/wiki/Scraper-Reference) page). A minimal skeleton is sketched below, after this list.
+3. Check that the scraper is listed in `thor docs:list`.
+4. Create filters specific to the scraper in the `lib/docs/filters/[my_doc]/` directory and add them to the class's [filter stacks](https://github.com/Thibaut/devdocs/wiki/Scraper-Reference#filter-stacks). You may create any number of filters but will need at least the following two:
+  * A [`CleanHtml`](https://github.com/Thibaut/devdocs/wiki/Filter-Reference#cleanhtmlfilter) filter whose task is to clean the HTML markup (e.g. adding `id` attributes to headings) and remove everything superfluous and/or nonessential.
+  * An [`Entries`](https://github.com/Thibaut/devdocs/wiki/Filter-Reference#entriesfilter) filter whose task is to determine the pages' metadata (the list of entries, each with a name, type and path).
+  The [Filter Reference](https://github.com/Thibaut/devdocs/wiki/Filter-Reference) page has all the details about filters.
+5. Using the `thor docs:page [my_doc] [path]` command, check that the scraper works properly. Files will appear in the `public/docs/[my_doc]/` directory (but not inside the app, as the command doesn't touch the index). `path` in this case refers to either the remote path (if using `UrlScraper`) or the local path (if using `FileScraper`).
+6. Generate the full documentation using the `thor docs:generate [my_doc] --force` command. Additionally, you can use the `--verbose` option to see which files are being created/updated/deleted (useful to see what changed since the last run), and the `--debug` option to see which URLs are being requested and added to the queue (useful to pin down which page adds unwanted URLs to the queue).
+7. Start the server, open the app, enable the documentation, and see how everything plays out.
+8. Tweak the scraper/filters and repeat steps 5 and 6 until the pages and metadata are OK.
+9. To customize the pages' styling, create an SCSS file in the `assets/stylesheets/pages/` directory and import it in both `application.css.scss` AND `application-dark.css.scss`. Both the file and the CSS class should be named `_[type]`, where `[type]` is equal to the scraper's `type` attribute (documentations with the same type share the same custom CSS and JS). _(Note: feel free to submit a pull request without custom CSS/JS.)_
+10. To add syntax highlighting or execute custom JavaScript on the pages, create a file in the `assets/javascripts/views/pages/` directory (take a look at the other files to see how it works).
+11. Add the documentation's icon in the `public/icons/docs/[my_doc]/` directory, in both 16x16 and 32x32 pixel formats. It'll be added to the icon sprite after your pull request is merged.
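+
+As a rough illustration of what steps 1–4 produce, here is a minimal, hypothetical skeleton for a `MyDoc` scraper — the URL, release number and filter names are placeholders, not part of any real documentation (see the [Scraper Reference](https://github.com/Thibaut/devdocs/wiki/Scraper-Reference) for the full list of attributes and options):
+
+```ruby
+# lib/docs/scrapers/my_doc.rb (hypothetical example)
+module Docs
+  class MyDoc < UrlScraper
+    self.name = 'MyDoc'
+    self.type = 'my_doc'
+    self.release = '1.0.0'                      # placeholder version
+    self.base_url = 'https://example.com/docs/' # placeholder URL
+
+    # Filters created in lib/docs/filters/my_doc/ (step 4)
+    html_filters.push 'my_doc/entries', 'my_doc/clean_html'
+  end
+end
+```
+
+Once the file exists, the scraper should show up in `thor docs:list` (step 3).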
+
+If the documentation includes more than a few hundred pages and is available for download, try to scrape it locally (e.g. using `FileScraper`). It'll make the development process much faster and avoid putting too much load on the source site. (It's not a problem if your scraper is coupled to your local setup, just explain how it works in your pull request.)
+
+Finally, try to document your scraper and filters' behavior as much as possible using comments (e.g. why some URLs are ignored, why certain HTML markup is removed, why the metadata is set up the way it is, etc.). It'll make updating the documentation much easier.
\ No newline at end of file
diff --git a/docs/Filter-Reference.md b/docs/Filter-Reference.md
new file mode 100644
index 0000000000..de82b2cb46
--- /dev/null
+++ b/docs/Filter-Reference.md
@@ -0,0 +1,226 @@
+---
+
+**Table of contents:**
+
+* [Overview](https://github.com/Thibaut/devdocs/wiki/Filter-Reference#overview)
+* [Instance methods](https://github.com/Thibaut/devdocs/wiki/Filter-Reference#instance-methods)
+* [Core filters](https://github.com/Thibaut/devdocs/wiki/Filter-Reference#core-filters)
+* [Custom filters](https://github.com/Thibaut/devdocs/wiki/Filter-Reference#custom-filters)
+  - [CleanHtmlFilter](https://github.com/Thibaut/devdocs/wiki/Filter-Reference#cleanhtmlfilter)
+  - [EntriesFilter](https://github.com/Thibaut/devdocs/wiki/Filter-Reference#entriesfilter)
+
+## Overview
+
+Filters use the [HTML::Pipeline](https://github.com/jch/html-pipeline) library. They take an HTML string or [Nokogiri](http://nokogiri.org/) node as input, optionally perform modifications and/or extract information from it, and then output the result. Together they form a pipeline where each filter hands its output to the next filter's input. Every documentation page passes through this pipeline before being copied to the local filesystem.
+
+Filters are subclasses of the [`Docs::Filter`](https://github.com/Thibaut/devdocs/blob/master/lib/docs/core/filter.rb) class and require a `call` method. A basic implementation looks like this:
+
+```ruby
+module Docs
+  class CustomFilter < Filter
+    def call
+      doc
+    end
+  end
+end
+```
+
+Filters which manipulate the Nokogiri node object (`doc` and related methods) are _HTML filters_ and must not manipulate the HTML string (`html`). Conversely, filters which manipulate the string representation of the document are _text filters_ and must not manipulate the Nokogiri node object. The two types are divided into two stacks within the scrapers. These stacks are then combined into a single pipeline that calls the HTML filters before the text filters (more details [here](https://github.com/Thibaut/devdocs/wiki/Scraper-Reference#filter-stacks)). This is to avoid parsing the document multiple times.
+
+The `call` method must return either `doc` or `html`, depending on the type of filter.
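+
+To illustrate the distinction, here is a loose sketch of the two filter types — the class names and transformations are invented for illustration and do not exist in DevDocs:
+
+```ruby
+module Docs
+  # A hypothetical HTML filter: manipulates the Nokogiri node and returns `doc`.
+  class RemoveSidebarFilter < Filter
+    def call
+      css('.sidebar, .advertisement').remove # `css` is a shortcut for `doc.css`
+      doc
+    end
+  end
+
+  # A hypothetical text filter: manipulates the string representation; the
+  # string it returns is handed to the next filter in the stack.
+  class SmartenDashesFilter < Filter
+    def call
+      html.gsub(' -- ', ' &mdash; ')
+    end
+  end
+end
+```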
+
+## Instance methods
+
+* `doc` [Nokogiri::XML::Node]
+  The Nokogiri representation of the container element.
+  See [Nokogiri's API docs](http://www.rubydoc.info/github/sparklemotion/nokogiri/Nokogiri/XML/Node) for the list of available methods.
+
+* `html` [String]
+  The string representation of the container element.
+
+* `context` [Hash] **(frozen)**
+  The scraper's `options`, along with a few additional keys: `:base_url`, `:root_url`, `:root_page` and `:url`.
+
+* `result` [Hash]
+  Used to store the page's metadata and pass back information to the scraper.
+  Possible keys:
+
+  - `:path` — the page's normalized path
+  - `:store_path` — the path where the page will be stored (equal to `:path` with `.html` at the end)
+  - `:internal_urls` — the list of distinct internal URLs found within the page
+  - `:entries` — the [`Entry`](https://github.com/Thibaut/devdocs/blob/master/lib/docs/core/models/entry.rb) objects to add to the index
+
+* `css`, `at_css`, `xpath`, `at_xpath`
+  Shortcuts for `doc.css`, `doc.xpath`, etc.
+
+* `base_url`, `current_url`, `root_url` [Docs::URL]
+  Shortcuts for `context[:base_url]`, `context[:url]`, and `context[:root_url]` respectively.
+
+* `root_path` [String]
+  Shortcut for `context[:root_path]`.
+
+* `subpath` [String]
+  The current URL's sub-path relative to the base URL.
+  _Example: if `base_url` equals `example.com/docs` and `current_url` equals `example.com/docs/file?raw`, the returned value is `/file`._
+
+* `slug` [String]
+  The `subpath` with any leading slash or `.html` extension removed.
+  _Example: if `subpath` equals `/dir/file.html`, the returned value is `dir/file`._
+
+* `root_page?` [Boolean]
+  Returns `true` if the current page is the root page.
+
+* `initial_page?` [Boolean]
+  Returns `true` if the current page is the root page or its subpath is one of the scraper's `initial_paths`.
+
+## Core filters
+
+* [`ContainerFilter`](https://github.com/Thibaut/devdocs/blob/master/lib/docs/filters/core/container.rb) — changes the root node of the document (removes everything outside of it)
+* [`CleanHtmlFilter`](https://github.com/Thibaut/devdocs/blob/master/lib/docs/filters/core/clean_html.rb) — removes HTML comments, `