Find-and-replace came to mind first. Earlier I had found split and "remembered" that every string is an (ordered) list, so it can be manipulated by index and position: s[i], s[i:j]... That is wrong: a string behaves not like a list but like a TUPLE, that is, it is immutable... So substrings cannot be changed by simple assignment...

3.1.2. Strings
Sequence Types — str, unicode, list, tuple, bytearray, buffer, xrange
String Methods
String Formatting
String Formatting Operations

Unicode strings deserve a refresher of their own; here is only a quick reminder

In [19]:

 print u'Гыыыы',u'\u0413\u044b\u044b\u044b\u044b'

Гыыыы Гыыыы

In [18]:

u'Гыыыы'

Out[18]:

u'\u0413\u044b\u044b\u044b\u044b'

Note that Cyrillic strings written with the u prefix are unicode objects, i.e. objects of a different type than plain str... Next, let's recall the "Strings" object.
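As a small illustration of the "different objects" point, here is a sketch that assumes Python 3 (the notebook cells are Python 2), where text is `str` and its encoded form is `bytes`. The variable names are made up for the example.

```python
# Python 3 sketch: text (str) and its UTF-8 encoding (bytes) are different types.
text = '\u0413\u044b\u044b\u044b\u044b'   # the same five letters as in the cell above
raw = text.encode('utf-8')                # bytes: two bytes per Cyrillic letter

text_len = len(text)               # counts characters
raw_len = len(raw)                 # counts bytes
round_trip = raw.decode('utf-8')   # back to text

print(type(text).__name__, text_len)
print(type(raw).__name__, raw_len)
```

The byte escapes that show up in the outputs further down are this encoded form of the Cyrillic letters.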

In [2]:

sstring='http://127.127.12.7:8080'
ss= sstring.split(':')  #[0] --> 'http'

In [3]:

ss

Out[3]:

['http', '//127.127.12.7', '8080']

The result is a list of strings, so naturally all list methods apply to it...
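A minimal sketch of what "list methods apply" can mean in practice, assuming Python 3 syntax. The replacement port '9090' is a made-up value used only for illustration.

```python
ss = 'http://127.127.12.7:8080'.split(':')   # ['http', '//127.127.12.7', '8080']

port = ss.pop()          # removes and returns the last item
ss.append('9090')        # hypothetical new port; a list can be changed in place
rebuilt = ':'.join(ss)   # glue the pieces back together with ':'

print(port)
print(rebuilt)
```

`join` is a string method that takes the list, which is why the separator comes first.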

In [7]:

'How we can remove "//" from "%s"'% ss[1]

Out[7]:

'How we can remove "//" from "//127.127.12.7"'
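One possible answer to the question in the cell above, not taken from the notebook: the standard library can split a URL into parts without any manual cleanup. The sketch assumes Python 3, where the function lives in `urllib.parse` (in Python 2 the module is `urlparse`).

```python
from urllib.parse import urlsplit  # Python 3 location of urlsplit

parts = urlsplit('http://127.127.12.7:8080')

scheme = parts.scheme     # the part before '://'
host = parts.hostname     # the address without '//' and without the port
port = parts.port         # the port as an int, or None if absent

print(scheme, host, port)
```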

Each string is not a list but, like a tuple (!!!), an immutable sequence of characters

In [11]:

ss[1][1],ss[1][2],ss[1][3],ss[1][0:2],

Out[11]:

('/', '1', '2', '//')

In [13]:

print "Before: %s" % ss
# The slice assignment below does not work
ss[1][0:2]=''
print "After:", ss

Before: ['http', '//127.127.12.7', '8080']

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-13-ec471e6cc8fd> in <module>()
      1 print "Before: %s" % ss
----> 2 ss[1][0:2]=''
      3 print "After:", ss

TypeError: 'str' object does not support item assignment
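Since the string itself cannot be modified, a working variant is to build a new string and store it back in the list, which is mutable. A minimal sketch, assuming Python 3 syntax; the names `by_slice` and `by_replace` are only for the example.

```python
ss = ['http', '//127.127.12.7', '8080']

# Two ways to get a new string without the leading '//'.
by_slice = ss[1][2:]                      # drop the first two characters
by_replace = ss[1].replace('//', '', 1)   # replace the first '//' with nothing

ss[1] = by_slice   # assigning to a list item is allowed

print(ss)
```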

In [27]:

help(sstring.split)

Help on built-in function split:

split(...)
    S.split([sep [,maxsplit]]) -> list of strings
    
    Return a list of the words in the string S, using sep as the
    delimiter string.  If maxsplit is given, at most maxsplit
    splits are done. If sep is not specified or is None, any
    whitespace string is a separator and empty strings are removed
    from the result.

In [69]:

sss='11--фыва2--фыва3--фыва4--фыва5--фыва'
sss.split('фы',3)

Out[69]:

['11--',
 '\xd0\xb2\xd0\xb02--',
 '\xd0\xb2\xd0\xb03--',
 '\xd0\xb2\xd0\xb04--\xd1\x84\xd1\x8b\xd0\xb2\xd0\xb05--\xd1\x84\xd1\x8b\xd0\xb2\xd0\xb0']

In [71]:

decode(sss.split('фы',3)[2])

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-71-e8efdeb0a3ca> in <module>()
----> 1 decode(sss.split('фы',3)[2])

NameError: name 'decode' is not defined
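The NameError appears because `decode` is not a built-in function; it is a method of the byte string (it is listed in the `dir` output further down). A sketch of the method call, written as a bytes literal so that it also runs on Python 3:

```python
piece = b'\xd0\xb2\xd0\xb03--'    # the third item of Out[69], as bytes
decoded = piece.decode('utf-8')   # decode is called on the object itself

print(decoded)
```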

In [28]:

help(sstring.rsplit)

Help on built-in function rsplit:

rsplit(...)
    S.rsplit([sep [,maxsplit]]) -> list of strings
    
    Return a list of the words in the string S, using sep as the
    delimiter string, starting at the end of the string and working
    to the front.  If maxsplit is given, at most maxsplit splits are
    done. If sep is not specified or is None, any whitespace string
    is a separator.
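For the URL from the beginning of the post, `rsplit` with `maxsplit=1` looks like a convenient way to cut off only the port. A small sketch, assuming Python 3 syntax:

```python
url = 'http://127.127.12.7:8080'

# Split once, starting from the right: only the last ':' is used.
address, port = url.rsplit(':', 1)

# For comparison, split once from the left: only the first ':' is used.
left_first = url.split(':', 1)

print(address, port)
print(left_first)
```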

In [36]:

help(sstring.splitlines)

Help on built-in function splitlines:

splitlines(...)
    S.splitlines(keepends=False) -> list of strings
    
    Return a list of the lines in S, breaking at line boundaries.
    Line breaks are not included in the resulting list unless keepends
    is given and true.

In [64]:

sss='qwerqwe1\nrqwe2\n111фыва3\nфывафы4\nвафывафыва'
sss.splitlines()  # keepends defaults to False (0 works too)

Out[64]:

['qwerqwe1',
 'rqwe2',
 '111\xd1\x84\xd1\x8b\xd0\xb2\xd0\xb03',
 '\xd1\x84\xd1\x8b\xd0\xb2\xd0\xb0\xd1\x84\xd1\x8b4',
 '\xd0\xb2\xd0\xb0\xd1\x84\xd1\x8b\xd0\xb2\xd0\xb0\xd1\x84\xd1\x8b\xd0\xb2\xd0\xb0']

In [67]:

sss='qwerqwe1\nrqwe2\n111фыва3\nфывафы4\nвафывафыва'
sss.splitlines(True)  # any true value (e.g. a nonzero number) keeps the "\n" line endings

Out[67]:

['qwerqwe1\n',
 'rqwe2\n',
 '111\xd1\x84\xd1\x8b\xd0\xb2\xd0\xb03\n',
 '\xd1\x84\xd1\x8b\xd0\xb2\xd0\xb0\xd1\x84\xd1\x8b4\n',
 '\xd0\xb2\xd0\xb0\xd1\x84\xd1\x8b\xd0\xb2\xd0\xb0\xd1\x84\xd1\x8b\xd0\xb2\xd0\xb0']

In [29]:

help(sstring.strip)

Help on built-in function strip:

strip(...)
    S.strip([chars]) -> string or unicode
    
    Return a copy of the string S with leading and trailing
    whitespace removed.
    If chars is given and not None, remove characters in chars instead.
    If chars is unicode, S will be converted to unicode before stripping

In [57]:

sss='qwerqwerqwe111фывафывафывафывафыва'
sss.strip('qw')

Out[57]:

'erqwerqwe111\xd1\x84\xd1\x8b\xd0\xb2\xd0\xb0\xd1\x84\xd1\x8b\xd0\xb2\xd0\xb0\xd1\x84\xd1\x8b\xd0\xb2\xd0\xb0\xd1\x84\xd1\x8b\xd0\xb2\xd0\xb0\xd1\x84\xd1\x8b\xd0\xb2\xd0\xb0'

In [58]:

sss='qwerqwerqwe111фывафывафывафывафыва'
sss.strip('фыва')

Out[58]:

'qwerqwerqwe111'
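One detail worth keeping in mind: the argument of `strip` is treated as a set of characters, not as a substring. The sketch below (Python 3 syntax, ASCII only, with a made-up sample string) contrasts it with removing an exact prefix once.

```python
s = 'qwqwerty'

# strip removes any of the characters 'q' and 'w' from both ends,
# in any order, until it meets a character outside the set.
stripped = s.strip('qw')
same_set = s.strip('wq')   # the order of characters in the argument does not matter

# Removing the exact prefix 'qw' a single time needs a different approach.
prefix = 'qw'
without_prefix = s[len(prefix):] if s.startswith(prefix) else s

print(stripped, same_set, without_prefix)
```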

In [34]:

help(sstring.lstrip)

Help on built-in function lstrip:

lstrip(...)
    S.lstrip([chars]) -> string or unicode
    
    Return a copy of the string S with leading whitespace removed.
    If chars is given and not None, remove characters in chars instead.
    If chars is unicode, S will be converted to unicode before stripping
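Applied to the piece from the `split` example, `lstrip` gives another way to drop the leading slashes. A short sketch, assuming Python 3 syntax; the second string is a made-up sample.

```python
host = '//127.127.12.7'.lstrip('/')     # only leading '/' characters are removed
trailing_kept = '//a/b//'.lstrip('/')   # inner and trailing slashes stay

print(host)
print(trailing_kept)
```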

In [30]:

help(sstring.find)

Help on built-in function find:

find(...)
    S.find(sub [,start [,end]]) -> int
    
    Return the lowest index in S where substring sub is found,
    such that sub is contained within S[start:end].  Optional
    arguments start and end are interpreted as in slice notation.
    
    Return -1 on failure.
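A sketch of how `find` can be combined with slicing to cut everything up to and including '//'. It assumes Python 3 syntax; the check for -1 matters, because -1 is also a valid index.

```python
url = 'http://127.127.12.7:8080'
marker = '//'

pos = url.find(marker)   # index of the first '//', or -1
rest = url[pos + len(marker):] if pos != -1 else url

missing = url.find('ftp')   # substring that is not present

print(pos, rest, missing)
```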

In [31]:

help(sstring.replace)

Help on built-in function replace:

replace(...)
    S.replace(old, new[, count]) -> string
    
    Return a copy of string S with all occurrences of substring
    old replaced by new.  If the optional argument count is
    given, only the first count occurrences are replaced.

In [56]:

sss='qwerqwerqwe111rqwerqwerqwer'
sss.replace ('er','ER',2)

Out[56]:

'qwERqwERqwe111rqwerqwerqwer'

In [32]:

help(sstring.zfill)

Help on built-in function zfill:

zfill(...)
    S.zfill(width) -> string
    
    Pad a numeric string S with zeros on the left, to fill a field
    of the specified width.  The string S is never truncated.

In [55]:

sstring='123'
sstring.zfill(5)

Out[55]:

'00123'

In [33]:

help(sstring.partition)

Help on built-in function partition:

partition(...)
    S.partition(sep) -> (head, sep, tail)
    
    Search for the separator sep in S, and return the part before it,
    the separator itself, and the part after it.  If the separator is not
    found, return S and two empty strings.

In [53]:

sss='qwerqw111erqwrrqwr'
sss.partition('1')

Out[53]:

('qwerqw', '1', '11erqwrrqwr')
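For the URL from the start of the post, `partition` and its mirror `rpartition` can take the string apart in two steps. A minimal sketch, assuming Python 3 syntax; 'no-separator' is a made-up sample.

```python
url = 'http://127.127.12.7:8080'

scheme, sep, rest = url.partition('://')   # split at the first '://'
host, _, port = rest.rpartition(':')       # split at the last ':'

not_found = 'no-separator'.partition('://')   # separator missing

print(scheme, host, port)
print(not_found)
```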

In [35]:

help(sstring.ljust)

Help on built-in function ljust:

ljust(...)
    S.ljust(width[, fillchar]) -> string
    
    Return S left-justified in a string of length width. Padding is
    done using the specified fill character (default is a space).
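A tiny illustration of `ljust`, assuming Python 3 syntax; the fill character '.' is an arbitrary choice.

```python
padded = 'http'.ljust(8, '.')   # pad on the right up to width 8
unchanged = 'http'.ljust(2)     # width smaller than the string: nothing is cut

print(padded)
print(unchanged)
```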

In [37]:

help(sstring.translate)

Help on built-in function translate:

translate(...)
    S.translate(table [,deletechars]) -> string
    
    Return a copy of the string S, where all characters occurring
    in the optional argument deletechars are removed, and the
    remaining characters have been mapped through the given
    translation table, which must be a string of length 256 or None.
    If the table argument is None, no translation is applied and
    the operation simply removes the characters in deletechars.

In [44]:

sss='qwerqwerqwrrqwr'
sss.translate(None,'qr')

Out[44]:

'weweww'

In [50]:

sss='qwerqwerqwrrqwr'
sss.translate(None,'qrw')

Out[50]:

'ee'
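The two-argument form `translate(None, chars)` shown above is specific to Python 2 `str`. In Python 3, as far as I know, the same deletion is expressed through a table built with `str.maketrans`, whose third argument lists the characters to remove. A sketch under that assumption:

```python
sss = 'qwerqwerqwrrqwr'

# The third argument of str.maketrans lists characters to delete.
drop_qr = str.maketrans('', '', 'qr')
drop_qrw = str.maketrans('', '', 'qrw')

result_qr = sss.translate(drop_qr)
result_qrw = sss.translate(drop_qrw)

print(result_qr, result_qrw)
```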

In [24]:

list(dir(sstring))

Out[24]:

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__getslice__',
 '__gt__',
 '__hash__',
 '__init__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '_formatter_field_name_split',
 '_formatter_parser',
 'capitalize',
 'center',
 'count',
 'decode',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'index',
 'isalnum',
 'isalpha',
 'isdigit',
 'islower',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'partition',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
 'zfill']

In [23]:

dir(ss)

Out[23]:

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__delslice__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getslice__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__setslice__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']
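The two `dir` listings can also be compared programmatically. The sketch below (it should run on both Python 2 and Python 3) shows that the list has `__setitem__` and the string does not, which is exactly why the assignment earlier failed.

```python
# Attributes that list has and str does not.
only_in_list = sorted(set(dir(list)) - set(dir(str)))

str_has_setitem = hasattr(str, '__setitem__')     # strings cannot be assigned into
list_has_setitem = hasattr(list, '__setitem__')   # lists can

print(only_in_list)
print(str_has_setitem, list_has_setitem)
```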

In [26]:

help(string)

no Python documentation found for 'http://127.127.12.7:8080'


Sunday, 23 November 2014

How best to parse, clean, and join strings like 'http://127.127.0.1:8080'
