[comment {--- punk::docgen generated from inline doctools comments ---}] |
|
[comment {--- punk::docgen DO NOT EDIT DOCS HERE UNLESS YOU REMOVE THESE COMMENT LINES ---}] |
|
[comment {--- punk::docgen overwrites this file ---}] |
|
[manpage_begin punkshell_module_punk::fileline 0 0.1.0] |
|
[copyright "2024"] |
|
[titledesc {file line-handling utilities}] [comment {-- Name section and table of contents description --}] |
|
[moddesc {punk fileline}] [comment {-- Description at end of page heading --}] |
|
[require punk::fileline] |
|
[keywords module text parse file] |
|
[description] |
|
[para] - |
|
[section Overview] |
|
[para]Utilities for in-memory analysis of text file data as both line data and byte/char-counted data whilst preserving the line-endings (even if mixed) |
|
[para]This matters for certain text files where the exact number of chars/bytes is significant.
|
[para]For example, Windows .cmd/.bat files need some byte counting to determine whether labels lie on chunk boundaries and need to be moved.
|
[para]Despite including the word 'file', the library doesn't deal with reading from or writing to the filesystem. It is for operating on text-file-like data.
|
[subsection Concepts] |
|
[para]A chunk of textfile data (possibly representing a whole file - but usually at least a complete set of lines) is loaded into a punk::fileline::class::textinfo instance at object creation. |
|
[example_begin] |
|
package require punk::fileline |
|
package require fileutil |
|
set rawdata [lb]fileutil::cat data.txt -translation binary[rb] |
|
punk::fileline::class::textinfo create obj_data $rawdata |
|
puts stdout [lb]obj_data linecount[rb] |
|
[example_end] |
|
[subsection Notes] |
|
[para]Line records are referred to by a zero-based index instead of a one-based index as is commonly used when displaying files. |
|
[para]This is for programming consistency and convenience, and the module user should do their own conversion to one-based indexing for line display or messaging if desired. |
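[para]A minimal sketch of such a conversion, using a plain Tcl list to stand in for line payloads (in practice these might come from the [method linepayloads] method). The variable names are illustrative only and are not part of the module.

[example_begin]
# Stand-in payloads; index 0 is the first line
set payloads [lb]list "alpha" "beta" "gamma"[rb]
set display [lb]list[rb]
set lineindex 0
foreach payload $payloads {
    # Add 1 only at the point of display - lookups stay zero-based
    lappend display [lb]format "%3d: %s" [lb]expr {$lineindex + 1}[rb] $payload[rb]
    incr lineindex
}
puts [lb]join $display \n[rb]
[example_end]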
|
[para]Lone carriage-returns are not supported as line-endings.
|
[para]CR line-endings that are intended to be interpreted as such should be mapped to something else before the data is supplied to this module. |
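[para]As a minimal sketch of that pre-processing step (plain Tcl, not part of this module; the variable names are illustrative), lone carriage-returns can be mapped to LF while existing CRLF pairs are left untouched. It relies on Tcl's string map trying its keys in order at each position, so a CRLF pair is consumed before the lone CR rule can apply.

[example_begin]
# Sample data with mixed endings: CRLF, lone CR, LF
set rawdata "one\r\ntwo\rthree\n"
# The first pair maps CRLF to itself so it is passed over unchanged;
# the second pair converts any remaining (lone) CR to LF.
set mapped [lb]string map [lb]list \r\n \r\n \r \n[rb] $rawdata[rb]
# $mapped now holds "one\r\ntwo\nthree\n" and can be supplied to
# punk::fileline::class::textinfo create as in the Concepts example.
[example_end]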
|
[subsection dependencies] |
|
[para] packages needed by punk::fileline |
|
[list_begin itemized] |
|
[item] [package {Tcl 8.6}] |
|
[list_end] [comment {- end dependencies list -}] |
|
[subsection {optional dependencies}] |
|
[para] packages that add functionality but aren't strictly required |
|
[list_begin itemized] |
|
[item] [package {punk::ansi}] |
|
[para] - recommended for class::textinfo [method chunk_boundary_display] |
|
[item] [package {punk::char}] |
|
[para] - recommended for class::textinfo [method chunk_boundary_display] |
|
[item] [package {overtype}] |
|
[para] - recommended for class::textinfo [method chunk_boundary_display] |
|
[list_end] [comment {- end optional dependencies list -}] |
|
[section API] |
|
[subsection {Namespace punk::fileline::class}] |
|
[para] class definitions |
|
[list_begin enumerated] |
|
[enum] CLASS [class textinfo] |
|
[list_begin definitions] |
|
[para] [emph METHODS] |
|
[call class::textinfo [method constructor] [arg datachunk] [opt {option value...}]] |
|
[para] Constructor for textinfo object which represents a chunk or all of a file |
|
[para] datachunk should be passed with the file data including line-endings as-is for full functionality, i.e. use something like:
|
[example_begin] |
|
fconfigure $fd -translation binary |
|
set chunkdata [lb]read $fd[rb]
|
or |
|
set chunkdata [lb]fileutil::cat <filename> -translation binary[rb] |
|
[example_end] |
|
[para] when loading the data |
|
[call class::textinfo [method chunk] [arg chunkstart] [arg chunkend]] |
|
[para]Return a range of bytes from the underlying raw chunk data. |
|
[para] e.g. the following retrieves the entire chunk:
|
[para] objName chunk 0 end |
|
[call class::textinfo [method chunklen]] |
|
[para] Number of bytes/characters in the raw data of the file |
|
[call class::textinfo [method chunk_boundary_display]] |
|
[para]Returns a string displaying the boundaries at chunksize bytes between chunkstart and chunkend |
|
[para]Defaults to using ansi colour if punk::ansi module is available. Use -ansi 0 to disable colour |
|
[call class::textinfo [method linecount]] |
|
[para] Number of lines in the raw data of the file, counted as per the policy in effect |
|
[call class::textinfo [method line] [arg lineindex]] |
|
[para]Reconstructs and returns the raw line using the payload and per-line stored line-ending metadata |
|
[para]A 'line' may be returned without a line-ending if the underlying chunk had trailing data without a line-ending (or the chunk was loaded under a non-standard -policy setting)
|
[para]Whilst such data may not conform to definitions (e.g. POSIX) of the terms 'textfile' and 'line', it is useful here to represent it as a line with the metadata key le set to "none"
|
[para]To return just the data which might more commonly be needed for dealing with lines, use the [method linepayload] method - which returns the line data minus line-ending |
|
[call class::textinfo [method linepayload_find_glob] [arg globsearch] [opt {option value...}]] |
|
[para]Return a lineinfolist (see [method lineinfo] and [method lineinfolist]) of lines where payload matches the [arg globsearch] string |
|
[para]To limit the returned results use the -limit n option - where -limit 0 means return all matches. |
|
[para]For example: [method linepayload_find_glob] "*test*" -limit 1 |
|
[para]The result is always a list of lineinfo dictionaries even if one item is returned |
|
[para] -limitfrom can be start|end |
|
[para]The order of results is always the order as they occur in the data - even if -limitfrom end is specified. |
|
[para]-limitfrom end means that only the last -limit items are returned |
|
[para]Note that because glob treats [lb]chars[rb] as matching any single character in the set given by chars, a search for a literal square bracket must escape the bracket with a backslash
|
[para]This is true even if only a single square bracket is being searched for, e.g. {*[lb]file*} will not find the word file preceded by a left square-bracket, even though the pattern never closes the square brackets.
|
[para]In the above case - the literal search should be {*\[lb]file*} |
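[para]The effect of the escape can be checked with Tcl's own string match command. This is a sketch that assumes [method linepayload_find_glob] applies the same glob rules as string match, which the notes above suggest but do not state outright. The sample payload is illustrative.

[example_begin]
# Braces keep the square brackets literal in the sample payload
set payload {open [lb]file[rb] now}
# Escaped bracket: matched literally, so this prints 1
puts [lb]string match {*\[lb]file*} $payload[rb]
# Unescaped bracket: begins a character set rather than matching a literal bracket
puts [lb]string match {*[lb]file*} $payload[rb]
[example_end]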
|
[call class::textinfo [method linepayload] [arg lineindex]] |
|
[para]Return the text of the line indicated by the zero-based lineindex |
|
[para]The line-ending is not returned in the data - but is still stored against this lineindex |
|
[para]Line Metadata such as the line-ending for a particular line and the byte/character range it occupies within the chunk can be retrieved with the [method linemeta] method |
|
[para]To retrieve both the line text and metadata in a single call the [method lineinfo] method can be used |
|
[para]To retrieve an entire line including line-ending use the [method line] method. |
|
[call class::textinfo [method linepayloads] [arg startindex] [arg endindex]] |
|
[para]Return a list of just the payloads in the specified lineindex range, with no metadata.
|
[call class::textinfo [method linemeta] [arg lineindex]] |
|
[para]Return a dict of the metadata for the line indicated by the zero-based lineindex |
|
[para]Keys returned include |
|
[list_begin itemized] |
|
[item] le |
|
[para] A string representing the type of line-ending: crlf|lf|none |
|
[item] linelen |
|
[para] The number of characters/bytes in the whole line including line-ending if any |
|
[item] payloadlen |
|
[para] The number of characters/bytes in the line excluding line-ending
|
[item] start |
|
[para] The zero-based index into the associated raw file data indicating at which byte/character index this line begins |
|
[item] end |
|
[para] The zero-based index into the associated raw file data indicating at which byte/character index this line ends |
|
[para] This end-point corresponds to the last character of the line-ending if any - not necessarily the last character of the line's payload |
|
[list_end] |
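[para]A short sketch of reading these keys, assuming the package is installed. The object name obj_demo and the sample string are hypothetical, and no output is shown because the values depend on the data supplied.

[example_begin]
package require punk::fileline
# Two lines with different endings: crlf then lf
punk::fileline::class::textinfo create obj_demo "alpha\r\nbeta\n"
set meta [lb]obj_demo linemeta 0[rb]
# Print each documented metadata key for the first line (index 0)
foreach key {le linelen payloadlen start end} {
    puts "$key: [lb]dict get $meta $key[rb]"
}
[example_end]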
|
[call class::textinfo [method lineinfo] [arg lineindex]] |
|
[para]Return a dict of the metadata and text for the line indicated by the zero-based lineindex |
|
[para]This returns the same info as the [method linemeta] method, with an added key of 'payload' which is the text of the line without line-ending.
|
[para]The 'payload' value is the same as is returned from the [method linepayload] method. |
|
[call class::textinfo [method lineinfolist] [arg startidx] [arg endidx]] |
|
[para]Returns a list of lineinfo dicts, one for each line in the line index range startidx to endidx
|
[call class::textinfo [method linerange_to_chunkrange] [arg startidx] [arg endidx]] |
|
[call class::textinfo [method linerange_to_chunk] [arg startidx] [arg endidx]] |
|
[call class::textinfo [method lines] [arg startidx] [arg endidx]] |
|
[call class::textinfo [method linepayloads] [arg startidx] [arg endidx]] |
|
[call class::textinfo [method chunkrange_to_linerange] [arg chunkstart] [arg chunkend]] |
|
[call class::textinfo [method chunkrange_to_lineinfolist] [arg chunkstart] [arg chunkend] [opt {option value...}]] |
|
[para]Return a list of dicts each with structure like the result of the [method lineinfo] method - but possibly with extra keys for truncation information if -show_truncated 1 is supplied |
|
[para]The truncation key in a lineinfo dict may be returned for first and/or last line in the resulting list. |
|
[para]truncation shows the shortened (missing bytes on left and/or right side) part of the entire line (potentially including line-ending or even partial line-ending) |
|
[para]Note that this truncation info is only in the return value of this method - and will not be reflected in [method lineinfo] queries to the main chunk. |
|
[call class::textinfo [method numeric_linerange] [arg startidx] [arg endidx]] |
|
[para]A helper that converts any Tcl-style end or end-x values given to startidx or endidx into their specific numeric values, based on the current state of the underlying line data
|
[para]This is used internally by API functions such as [method line] to enable them to accept more expressive indices
|
[call class::textinfo [method numeric_chunkrange] [arg startidx] [arg endidx]] |
|
[para]A helper that converts any Tcl-style end or end-x values supplied to startidx or endidx into their specific numeric values, based on the current state of the underlying chunk data
|
[call class::textinfo [method normalize_indices] [arg startidx] [arg endidx] [arg max]] |
|
[para]A utility to convert some of the Tcl-style list-index expressions such as end, end-1 etc. to valid indices in the range 0 to the supplied max
|
[para]Basic addition and subtraction expressions such as 4-1 5+2 are accepted |
|
[para]startidx higher than endidx is allowed |
|
[para]Unlike Tcl's index expressions - we raise an error if the calculated index is out of bounds 0 to max |
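[para]The kind of conversion described above can be sketched in plain Tcl. The proc below is a hypothetical illustration and not the module's implementation: it resolves a single index rather than a start/end pair, and its name and error messages are assumptions.

[example_begin]
proc resolve_index {idx max} {
    if {[lb]regexp {^end([lb]+-[rb]\d+)?$} $idx -> offset[rb]} {
        # end, end-N or end+N, taken relative to max
        set n [lb]expr {$offset eq "" ? $max : $max + $offset}[rb]
    } elseif {[lb]regexp {^(\d+)([lb]+-[rb])(\d+)$} $idx -> a op b[rb]} {
        # Basic a+b or a-b arithmetic
        set n [lb]expr {$op eq "+" ? $a + $b : $a - $b}[rb]
    } elseif {[lb]string is integer -strict $idx[rb]} {
        set n $idx
    } else {
        error "unrecognised index expression '$idx'"
    }
    # Unlike Tcl's own index handling, reject anything outside 0..max
    if {$n < 0 || $n > $max} {
        error "index '$idx' is out of bounds 0 to $max"
    }
    return $n
}
puts [lb]resolve_index end 9[rb]   ;# prints 9
puts [lb]resolve_index end-1 9[rb] ;# prints 8
puts [lb]resolve_index 4-1 9[rb]   ;# prints 3
# Out of range: raises an error instead of silently clamping
catch {resolve_index end+1 9} msg
puts $msg
[example_end]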
|
[call class::textinfo [method regenerate_lines]] |
|
[para]Generates a list of lines from the current state of the stored raw data chunk and keeps a map of line-endings indexed by lineindex
|
[para]This is called automatically by the Constructor during object creation |
|
[para]It is exposed in the API experimentally - as chunk and line manipulation functions are considered. |
|
[para]TODO - review whether such manual control will be necessary/desirable |
|
[list_end] |
|
[list_end] [comment {--- end class enumeration ---}] |
|
[subsection {Namespace punk::fileline}] |
|
[para] Core API functions for punk::fileline |
|
[list_begin definitions] |
|
[list_end] [comment {--- end definitions namespace punk::fileline ---}] |
|
[subsection {Namespace punk::fileline::lib}] |
|
[para] Secondary functions that are part of the API |
|
[list_begin definitions] |
|
[call [fun lib::range_spans_chunk_boundaries] [arg start] [arg end] [arg chunksize]] |
|
[para]Takes start and end offsets, generally representing byte or character indices, and computes a list of the boundaries at multiples of the chunksize that are spanned by the start to end range.
|
[list_begin arguments] |
|
[arg_def integer start] |
|
[para] zero-based start index of range |
|
[arg_def integer end] |
|
[para] zero-based end index of range |
|
[arg_def integer chunksize] |
|
[para] Number of bytes/characters in chunk - must be greater than 0
|
[list_end] |
|
[para]Returns a dict with the keys is_span and boundaries
|
[para]is_span 0|1 indicates if the range specified spans a boundary of chunksize |
|
[para]boundaries contains a list of the spanned boundaries - which are always multiples of the chunksize |
|
[para]e.g |
|
[example_begin] |
|
range_spans_chunk_boundaries 10 1750 512 |
|
is_span 1 boundaries {512 1024 1536} |
|
[example_end] |
|
[para]With the -offset <int> option, each boundary is shifted by the given amount, e.g.
|
[example_begin] |
|
range_spans_chunk_boundaries 10 1750 512 -offset 2 |
|
is_span 1 boundaries {514 1026 1538} |
|
[example_end] |
|
[para] This function automatically uses lseq (if Tcl >= 8.7) when the number of boundaries spanned is greater than approximately 75
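[para]The boundary calculation can be sketched in plain Tcl as follows. This is not the module's implementation: the proc name spanned_boundaries is hypothetical, the -offset option is left out, and it assumes a boundary counts as spanned when it is greater than start and no greater than end, which the description above does not spell out.

[example_begin]
proc spanned_boundaries {start end chunksize} {
    set boundaries [lb]list[rb]
    # First multiple of chunksize strictly greater than start
    set b [lb]expr {($start / $chunksize + 1) * $chunksize}[rb]
    while {$b <= $end} {
        lappend boundaries $b
        incr b $chunksize
    }
    # is_span is 1 when at least one boundary was found
    set is_span [lb]expr {[lb]llength $boundaries[rb] > 0}[rb]
    return [lb]dict create is_span $is_span boundaries $boundaries[rb]
}
# Same inputs as the first example above
puts [lb]spanned_boundaries 10 1750 512[rb]
[example_end]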
|
[list_end] [comment {--- end definitions namespace punk::fileline::lib ---}] |
|
[section Internal] |
|
[subsection {Namespace punk::fileline::system}] |
|
[para] Internal functions that are not part of the API |
|
[subsection {Namespace punk::fileline::ansi}] |
|
[para]These are ansi functions imported from punk::ansi - or no-ops if that package is unavailable |
|
[para]See [package punk::ansi] for documentation |
|
[list_begin definitions] |
|
[call [fun ansi::a]] |
|
[call [fun ansi::a+]] |
|
[call [fun ansi::stripansi]] |
|
[list_end] [comment {--- end definitions namespace punk::fileline::ansi ---}] |
|
[manpage_end]